From 35e5c9d624348c8f5716842e2d05e7392856b8e0 Mon Sep 17 00:00:00 2001
From: Gil <12395516+gf712@users.noreply.github.com>
Date: Wed, 24 Jun 2026 09:32:06 +0100
Subject: [PATCH] ci: use claude for PR reviews and respond to comments

---
 .github/workflows/claude-code-review.yml |  35 +++
 .github/workflows/claude.yml             |  39 +++
 CLAUDE.md                                | 352 +++++++++++++++++++++++
 README.md                                |   2 +-
 4 files changed, 427 insertions(+), 1 deletion(-)
 create mode 100644 .github/workflows/claude-code-review.yml
 create mode 100644 .github/workflows/claude.yml
 create mode 100644 CLAUDE.md

diff --git a/.github/workflows/claude-code-review.yml b/.github/workflows/claude-code-review.yml
new file mode 100644
index 00000000..bf45b5cd
--- /dev/null
+++ b/.github/workflows/claude-code-review.yml
@@ -0,0 +1,35 @@
+name: Claude Code Review
+
+on:
+  pull_request:
+    types: [opened, synchronize, ready_for_review, reopened]
+
+jobs:
+  claude-review:
+    # Only review PRs opened by the repository owner
+    if: github.event.pull_request.user.login == 'gf712'
+
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: read
+      issues: read
+      id-token: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 1
+
+      - name: Run Claude Code Review
+        id: claude-review
+        uses: anthropics/claude-code-action@v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+          plugin_marketplaces: 'https://github.com/anthropics/claude-code.git'
+          plugins: 'code-review@claude-code-plugins'
+          prompt: '/code-review:code-review ${{ github.repository }}/pull/${{ github.event.pull_request.number }}'
+          # See https://github.com/anthropics/claude-code-action/blob/main/docs/usage.md
+          # or https://code.claude.com/docs/en/cli-reference for available options
+
diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml
new file mode 100644
index 00000000..8386fa19
--- /dev/null
+++ b/.github/workflows/claude.yml
@@ -0,0 +1,39 @@
+name: Claude Code
+
+on:
+  pull_request_review_comment:
+    types: [created]
+  pull_request_review:
+    types: [submitted]
+
+jobs:
+  claude:
+    if: |
+      github.actor == 'gf712' && (
+        (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
+        (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
+        (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
+        (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
+      )
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: read
+      issues: read
+      id-token: write
+      actions: read # Required for Claude to read CI results on PRs
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 1
+
+      - name: Run Claude Code
+        id: claude
+        uses: anthropics/claude-code-action@v1
+        with:
+          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+
+          # This is an optional setting that allows Claude to read CI results on PRs
+          additional_permissions: |
+            actions: read
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 00000000..cd406ece
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,352 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is an experimental Python 3.9-compatible interpreter implementation in C++. Unlike CPython, this interpreter uses a **register-based VM** instead of a stack-based VM, implements Python objects as C++ classes, and includes MLIR integration for advanced optimizations.
+
+## Build System
+
+### Prerequisites
+- CMake 3.25+
+- C++23 compiler
+- LLVM 23+ with MLIR (required for MLIR backend)
+- GMP (GNU Multiple Precision library)
+- ICU (International Components for Unicode)
+
+Install LLVM/MLIR on Ubuntu:
+```bash
+wget https://apt.llvm.org/llvm.sh
+chmod +x llvm.sh
+sudo ./llvm.sh 23 all
+sudo apt install libmlir-23-dev mlir-23-tools
+```
+
+### Build Commands
+
+**Configure and build:**
+```bash
+cmake --preset release
+cmake --build --preset release
+```
+
+**Run tests:**
+```bash
+# Run all tests (unit tests + integration tests)
+ctest --preset release
+
+# Run just integration tests
+ctest --preset release -R integration-tests
+
+# Run just unittests
+ctest --preset release -E integration-tests
+```
+
+**Run the Python interpreter:**
+```bash
+# The binary is named `python` and lives under the preset's build dir
+./build/release/src/python
+
+# Stress the garbage collector while running (recommended when debugging
+# object-lifetime issues); unit is number of allocations, default 10000
+./build/release/src/python --gc-frequency 1000000
+```
+
+Useful diagnostic flags: `-t/--tokenize` (print tokens), `-a/--ast` (print AST),
+`-b/--bytecode` (print generated bytecode), `-d/--debug` / `--trace` (logging).
+
+**Development builds with sanitizers:**
+```bash
+# Address sanitizer
+cmake -B build -DCMAKE_BUILD_TYPE=Debug -DENABLE_SANITIZER_ADDRESS=ON
+cmake --build build
+
+# Undefined behavior sanitizer
+cmake -B build -DCMAKE_BUILD_TYPE=Debug -DENABLE_SANITIZER_UNDEFINED_BEHAVIOR=ON
+cmake --build build
+```
+
+
+## Architecture Overview
+
+### Execution Pipeline
+
+**Source → Lexer → Parser → AST → Compiler → Program → VM → Runtime**
+
+1. **Lexer** (`src/lexer/`) tokenizes Python source using CPython-compatible tokens
+2. **Parser** (`src/parser/`) builds an AST using the same grammar spec as CPython
+3. **AST** (`src/ast/`) represents code with the same node types as CPython
+4. **Compiler** has three backends (`compiler::Backend` in `src/executable/Program.cpp`):
+   - **MLIR** (current default): the `python` binary always compiles via `Backend::MLIR`. Uses MLIR dialects for optimization, then lowers to bytecode
+   - **BytecodeGenerator**: Register-based bytecode generated directly from the AST
+   - **LLVM**: JIT compilation (incomplete/experimental). Must be compiled in by configuring with `-DENABLE_LLVM_BACKEND=ON` (which defines the `USE_LLVM` macro), then selected at runtime with `--use-llvm`
+5. **VM** (`src/vm/`) executes instructions with register-based architecture
+6. **Interpreter** (`src/interpreter/`) manages execution state, frames, modules
+7. **Runtime** (`src/runtime/`) implements Python objects as C++ classes
+
+### Register-Based VM Architecture
+
+Unlike CPython's stack-based VM, this interpreter uses registers for intermediate values:
+
+**StackFrame structure:**
+- `registers`: Vector of `py::Value` acting like CPU registers
+- `locals`: Stack-allocated local variables (separate from registers)
+- `stack_pointer`: For runtime stack management
+
+**Instructions specify register operands explicitly:**
+```cpp
+// Example: BINARY_OPERATION r5 r3 r4 means r5 = r3 + r4
+const auto &lhs = vm.reg(m_lhs);
+const auto &rhs = vm.reg(m_rhs);
+vm.reg(m_destination) = result.unwrap();
+```
+
+**Benefits over stack-based:**
+- Fewer memory accesses
+- More optimization opportunities
+- Closer to actual CPU architectures
+
+**Trade-offs:**
+- Larger instruction encoding (includes register indices)
+- Currently no register reuse optimization (allocated sequentially)
+
+### MLIR Integration
+
+MLIR provides an optimization infrastructure and alternative compilation path.
+
+**Compilation flow:**
+```
+AST → MLIR Python Dialect → Optimizations → MLIR PythonBytecode Dialect → Bytecode
+```
+
+**Key components:**
+- **Python Dialect** (`src/executable/mlir/Dialect/Python/`): High-level Python operations (py.add, py.call, etc.) defined in TableGen
+- **MLIRGenerator** (`src/executable/mlir/Dialect/Python/MLIRGenerator.hpp`): Visitor over AST nodes that generates MLIR operations
+- **PythonBytecode Dialect** (`src/executable/mlir/Dialect/EmitPythonBytecode/`): Lower-level operations closer to final bytecode
+- **Conversion Pass** (`src/executable/mlir/Conversion/PythonToPythonBytecode/`): Lowers Python dialect → PythonBytecode dialect
+- **Bytecode Emitter** (`src/executable/mlir/Target/PythonBytecode/`): Translates MLIR to BytecodeProgram
+
+**Why MLIR?**
+- Enables sophisticated optimizations (constant folding, DCE, inlining)
+- Infrastructure for future JIT compilation
+- Clean separation between frontend (Python semantics) and backend (codegen)
+- Can leverage MLIR's ecosystem of transformation passes
+
+### Python Objects as C++ Classes
+
+All Python objects inherit from `PyObject` (`src/runtime/PyObject.hpp`):
+
+```cpp
+class PyObject : public Cell { // Cell enables garbage collection
+    TypePrototype &m_type; // Type information
+    PyDict *m_attributes; // Instance __dict__
+};
+```
+
+**TypePrototype pattern:**
+- Template-based compile-time introspection
+- Slot functions for protocols (`__add__`, `__getitem__`, etc.)
+- Supports both C++ lambdas and PyObject methods
+
+**Value representation (`src/runtime/Value.hpp`):**
+- `py::Value` is a discriminated union to avoid heap allocations for primitives
+- Can hold `PyObject*`, inline `Number`, `String`, or `Bytes`
+
+**Concrete types** (`src/runtime/`):
+- Each Python type is a C++ class: PyInteger, PyString, PyList, PyDict, PyTuple, etc.
+- Implement Python protocols via methods
+
+### Interpreter and Runtime Interaction
+
+**Interpreter** (`src/interpreter/Interpreter.hpp`) manages:
+- Current execution frame (`m_current_frame: PyFrame*`)
+- Module registry and import machinery
+- Global frame for module-level code
+- Exception state
+
+**Runtime** provides object implementations and delegates protocol operations:
+```cpp
+// VM executes instruction, calls interpreter for object operations
+PyResult execute(VirtualMachine &vm, Interpreter &interpreter) {
+    const auto &lhs = vm.reg(m_lhs);
+    return add(lhs, rhs, interpreter); // delegates to runtime
+}
+```
+
+**Frame management:**
+- `PyFrame`: Python execution context (locals, globals, builtins)
+- `StackFrame`: VM state (registers, stack pointer)
+- Interpreter maintains frame chain for tracebacks
+
+## Important Patterns & Conventions
+
+### Result Type for Error Handling
+
+All runtime operations return `PyResult` for error propagation:
+```cpp
+template <typename T> class PyResult; // Either Ok(T) or Err(BaseException*)
+
+PyResult add(const PyObject*, const PyObject*);
+```
+
+Never throw exceptions from runtime code - use PyResult.
+
+### Visitor Pattern
+
+Used extensively for:
+- **AST traversal**: `ast::CodeGenerator` with `visit()` methods for each AST node type
+- **Garbage collection**: `Cell::Visitor` for graph traversal
+- Both use double-dispatch pattern
+
+### Scoping and Variables Resolution
+
+**VariablesResolver** (`src/executable/bytecode/codegen/VariablesResolver.hpp`):
+- Pre-pass before bytecode generation
+- Analyzes variable scope (local, global, free variables, cell variables)
+- Critical for correct closure and nested function implementation
+
+**Name mangling** (`src/executable/Mangler.hpp`):
+- Implements Python's private name mangling for class attributes (e.g., `__private` → `_ClassName__private`)
+- Used during bytecode generation
+
+### Control Flow
+
+- Uses `Label` objects for jumps and branches
+- Two-pass compilation: generate code with labels, then relocate to instruction positions
+- See `src/executable/Label.hpp`
+
+### Memory Management
+
+**Garbage Collection** (`src/memory/`):
+- Mark-sweep collector
+- All objects inherit from `Cell` to participate in GC
+- Slab allocator for efficient small object allocation
+
+**Factory functions:**
+```cpp
+static PyObject* create(...); // Allocates via VirtualMachine::heap()
+```
+
+## Directory Structure
+
+### Core Components
+
+**Execution:**
+- `src/vm/` - Register-based virtual machine
+- `src/interpreter/` - Execution control, frame management, module system
+- `src/executable/` - Compiled program representations (BytecodeProgram, etc.)
+
+**Frontend (CPython-compatible):**
+- `src/lexer/` - Tokenization
+- `src/parser/` - Recursive descent parser
+- `src/ast/` - Abstract syntax tree nodes
+
+**Compilation:**
+- `src/executable/bytecode/codegen/` - Register bytecode generator
+- `src/executable/bytecode/instructions/` - ~80 instruction types
+- `src/executable/mlir/` - MLIR compilation pipeline
+  - `Dialect/Python/` - High-level Python dialect (TableGen definitions)
+  - `Dialect/EmitPythonBytecode/` - Low-level bytecode dialect
+  - `Conversion/` - Lowering passes between dialects
+  - `Target/` - Final bytecode emission from MLIR
+
+**Runtime:**
+- `src/runtime/` - Python object implementations (PyInteger, PyList, PyDict, etc.)
+- `src/runtime/types/` - Built-in type definitions
+- `src/runtime/modules/` - Standard library modules (sys, builtins, math, etc.)
+
+**Memory:**
+- `src/memory/` - Mark-sweep garbage collector, slab allocator
+
+**Other:**
+- `src/utilities/` - Helper utilities and freeze tool
+- `src/repl/` - Interactive shell (uses linenoise)
+- `src/testing/` - Test infrastructure
+
+### Integration Tests
+
+**Location:** `integration/`
+
+**Run integration tests:**
+```bash
+# Language-feature test suite
+./integration/run_python_tests.sh ./build/release/src/python
+
+# Full integration run (examples + run_python_tests.sh + LLVM backend)
+./integration/run_integration_tests.sh ./build/release/src/python
+```
+
+Test categories:
+- `integration/tests/` - Python scripts testing various language features
+- `integration/aoc/` - Advent of Code solutions used as larger programs
+- `integration/fibonacci/` - Fibonacci example
+- `integration/mandelbrot/` - Mandelbrot set computation
+- `integration/llvm/` - LLVM backend tests (experimental)
+
+**Test structure:**
+- Tests should assert using Python's `assert` statement
+- Scripts exit with code 0 on success, non-zero on failure
+- Tests run with `--gc-frequency` flag to stress-test garbage collector
+
+## Development Workflow
+
+### Adding a New Bytecode Instruction
+
+1. Define instruction in `src/executable/bytecode/instructions/`
+2. Add to instruction set enumeration
+3. Implement `execute()` method that takes VM and Interpreter
+4. Register in instruction decoder
+5. Update BytecodeGenerator to emit the instruction when visiting relevant AST nodes
+
+### Adding a New MLIR Operation
+
+1. Define operation in TableGen: `src/executable/mlir/Dialect/Python/IR/PythonOps.td`
+2. Build to generate C++ code from TableGen
+3. Add emission in MLIRGenerator when visiting AST nodes
+4. Add lowering to PythonBytecode dialect in conversion pass
+5. Add bytecode emission in Target
+
+### Adding a New Python Type
+
+1. Create class inheriting from `PyObject` in `src/runtime/`
+2. Implement Python protocols as methods
+3. Create `TypePrototype` registration
+4. Add factory function using `VirtualMachine::heap()`
+5. Implement GC visitor if type contains references to other objects
+6. Add to builtins in `src/runtime/modules/BuiltinsModule.cpp`
+
+### Debugging
+
+**GC debugging:**
+- Use `--gc-frequency N` to trigger GC every N allocations
+- Useful for finding object lifetime bugs
+
+**Bytecode inspection:**
+- Run with `--bytecode` (or `-b`) to print generated instructions; `--ast`/`-a` and `--tokenize`/`-t` dump the AST and token stream
+
+**MLIR pipeline debugging:**
+- Set `MLIR_PRINT_IR_AFTER_ALL=1` when running the `python` binary to dump the
+  IR after every pass (e.g. `MLIR_PRINT_IR_AFTER_ALL=1 ./build/release/src/python `).
+  The interpreter parses its own args with cxxopts and does not expose MLIR's
+  `-mlir-print-*` command-line flags directly.
+- The standalone `python-mlir-opt` tool (`src/executable/mlir/tools/python-mlir-opt/`)
+  is a regular `mlir-opt`-style driver and does accept MLIR's CL flags.
+
+## Compatibility with CPython
+
+**What's the same:**
+- Token types from the lexer
+- Grammar specification for the parser
+- AST node types
+- Python 3.9 language semantics
+
+**What's different:**
+- VM architecture (register-based vs stack-based)
+- Runtime implementation (C++ classes vs C structs)
+- Bytecode format (incompatible with CPython .pyc files)
+- Performance characteristics (no JIT yet, but register VM may have different trade-offs)
+
+## Testing Philosophy
+
+The codebase maintains compatibility by keeping the frontend (lexer, parser, AST) identical to CPython while innovating in the backend (VM, runtime). Integration tests in `integration/tests/` verify Python semantics are preserved.
diff --git a/README.md b/README.md
index 4189825c..b7d33d9a 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Python C++ (EXPERIMENTAL + IN PROGRESS)

-A Python interpreter implementation in C++. The current aim is to be compliant with the Python 3.10 spec and have releases inline with future Python versions.
+A Python interpreter implementation in C++. The current aim is to be compliant with the Python 3.9 spec and have releases inline with future Python versions.

 # What is different from CPython?