opt: parallelize FRI fold with rayon by MauroToscano · Pull Request #448 · yetanotherco/lambda_vm

MauroToscano · 2026-03-18T19:19:30Z

Summary

Parallelize fold_evaluations_in_place in the FRI commit phase using rayon's par_iter. The fold loop previously ran sequentially over N/2 extension field elements. Because the in-place fold aliases its input and output (evals[j] is computed from evals[2*j], which another thread may already have overwritten), the parallel version writes into a temporary buffer and copies the result back with clone_from_slice.
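A minimal sketch of the aliasing issue and the temporary-buffer workaround, not the code from this PR. It makes several assumptions: a toy prime field over `u64` stands in for the extension field, the fold rule `evals[2j] + beta * evals[2j+1]` is a simplified placeholder for the real FRI fold, and `std::thread::scope` is swapped in for rayon so the example has no dependencies. The names `fold_pair`, `fold_sequential` and `fold_parallel` are hypothetical.

```rust
use std::thread;

// Toy modulus (2^31 - 1), chosen so products of two elements fit in a u64.
const P: u64 = 2_147_483_647;

// Simplified placeholder for the per-pair fold; inputs must be < P.
fn fold_pair(a: u64, b: u64, beta: u64) -> u64 {
    (a + beta * b % P) % P
}

// Sequential in-place fold. Writing index j only after reading 2j and 2j+1
// is safe here because j <= 2j and the loop runs in ascending order.
fn fold_sequential(evals: &mut Vec<u64>, beta: u64) {
    let half = evals.len() / 2;
    for j in 0..half {
        evals[j] = fold_pair(evals[2 * j], evals[2 * j + 1], beta);
    }
    evals.truncate(half);
}

// Parallel fold. Threads run in no fixed order, so a write to evals[j] could
// clobber an input another thread has not read yet. Each thread therefore
// writes into its own chunk of a temporary buffer, copied back afterwards.
fn fold_parallel(evals: &mut Vec<u64>, beta: u64, workers: usize) {
    let half = evals.len() / 2;
    let workers = workers.max(1);
    let chunk = ((half + workers - 1) / workers).max(1);
    let mut tmp = vec![0u64; half];
    let src: &[u64] = &evals[..];
    thread::scope(|s| {
        for (i, out) in tmp.chunks_mut(chunk).enumerate() {
            let base = i * chunk;
            s.spawn(move || {
                for (k, slot) in out.iter_mut().enumerate() {
                    let j = base + k;
                    *slot = fold_pair(src[2 * j], src[2 * j + 1], beta);
                }
            });
        }
    });
    evals[..half].clone_from_slice(&tmp);
    evals.truncate(half);
}
```

Both functions should leave the same folded values in `evals`. The parallel one pays for an extra allocation and a copy per layer.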

Benchmark (Apple Silicon M3 Pro, `PARALLEL_TABLES=1`, `fib_iterative_1M`)

| Metric | Before | After | Delta |
|---|---|---|---|
| Prove time (median, 3 samples) | 62.6s | 60.3s | -3.7% |
| CV | - | 0.4% | |
| Heap | 23,300 MB | 23,336 MB | +0.2% |
| Verification | - | PASS | |

The temporary buffer for the parallel version is about 36 MB (N/2 extension elements), which is small compared with the 23 GB working set.

Test plan

cargo test --release -p stark (all passed)
Proof verified against baseline verifier binary
/bench on CI runner

github-actions · 2026-03-18T19:21:02Z

Codex Code Review

Medium - Potential data races / cross-request contamination in instrumentation state
- The new instrumentation uses process-global atomics and partial thread-local reset, but multi_prove can be called concurrently in a library context.
- reset_all() zeroes shared counters globally, so overlapping proofs can clobber each other's measurements; take_r1_sub() then calls swap(0, ...), which drains the shared state into whichever proof reads first.
- References: crypto/stark/src/instruments.rs:55, crypto/stark/src/instruments.rs:101, crypto/stark/src/instruments.rs:90, crypto/stark/src/prover.rs:1475, crypto/stark/src/prover.rs:1875.
- Action: scope instrumentation to a per-proof context (passed explicitly) or gate it behind a global mutex to prevent concurrent runs.
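One way to picture the "per-proof context" option is sketched below. This is an illustration under assumptions, not the repository's `instruments.rs`: the `ProofMetrics` type, its single counter and the `prove` stand-in are hypothetical names. The idea is that each proof owns its own atomics, so concurrent proofs cannot reset or drain each other's counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

// Hypothetical per-proof instrumentation context, owned by one proof run.
#[derive(Default)]
struct ProofMetrics {
    r1_main_ns: AtomicU64,
}

impl ProofMetrics {
    // Accumulate a duration; worker threads can share `&self`.
    fn accum_r1_main(&self, d: Duration) {
        let ns = u64::try_from(d.as_nanos()).unwrap_or(u64::MAX);
        self.r1_main_ns.fetch_add(ns, Ordering::Relaxed);
    }

    // Read and reset. Only this proof's counter is drained.
    fn take_r1_main(&self) -> u64 {
        self.r1_main_ns.swap(0, Ordering::Relaxed)
    }
}

// Stand-in for a prover entry point that receives its context explicitly.
fn prove(metrics: &ProofMetrics, simulated_work_ns: u64) {
    metrics.accum_r1_main(Duration::from_nanos(simulated_work_ns));
}
```

With this shape, two overlapping calls that each receive their own `ProofMetrics` read back only their own totals. The cost is that the context has to be passed through the prover call chain.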
Medium - Hot-path performance regression in FRI fold
- The parallel version of fold_evaluations_in_place now allocates a new Vec and then copies it back on every fold layer.
- FRI folding is on the proving hot path and called repeatedly; repeated alloc+copy can materially increase runtime and allocator pressure on large proofs.
- Reference: crypto/stark/src/fri/fri_functions.rs:68, crypto/stark/src/fri/fri_functions.rs:79.
- Action: reuse a scratch buffer across rounds or use a two-buffer strategy to avoid per-round allocations.
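The two-buffer strategy suggested here could look roughly like the sketch below. It assumes the same toy field and simplified fold rule as a placeholder for the real extension-field fold, and the function names are hypothetical. The scratch buffer is allocated once and then swapped with the current layer after every round, so no round allocates.

```rust
// Toy modulus (2^31 - 1); inputs are assumed to be < P.
const P: u64 = 2_147_483_647;

// Fold `src` into `dst`, reusing whatever capacity `dst` already has.
fn fold_into(src: &[u64], dst: &mut Vec<u64>, beta: u64) {
    dst.clear();
    dst.extend(src.chunks_exact(2).map(|p| (p[0] + beta * p[1] % P) % P));
}

// Fold once per challenge in `betas`, ping-ponging between two buffers.
fn fold_all_layers(mut cur: Vec<u64>, betas: &[u64]) -> Vec<u64> {
    // Single up-front allocation, sized for the largest folded layer.
    let mut next = Vec::with_capacity(cur.len() / 2);
    for &beta in betas {
        fold_into(&cur, &mut next, beta);
        // `next` now holds the new layer; the old layer becomes the scratch.
        std::mem::swap(&mut cur, &mut next);
    }
    cur
}
```

Because layers only shrink, both buffers always have enough capacity after the first round. The `map` inside `fold_into` is also where a parallel iterator could be substituted.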

No direct security vulnerabilities (unsafe/memory-safety/crypto correctness/VM privilege issues) stood out in the changed diff.

claude · 2026-03-18T19:24:35Z

Review: opt: parallelize FRI fold with rayon

FRI Fold Parallelization is correct. Using a temporary buffer to avoid aliasing (reading evals[2*j] while writing evals[j]) is the right approach. Both sequential and parallel paths produce identical results.

Issues Found:

[Low] Dead code in bench script
stddev() is defined in scripts/bench_prove.sh but never called. Only cv_pct() is used. Remove it.

[Low] Lossy as u64 cast on nanosecond accumulator
In crypto/stark/src/instruments.rs, as_nanos() returns u128 but is cast to u64 in accum_r1_main/accum_r1_aux. Benign in practice but consider .min(u64::MAX as u128) as u64 to be explicit.
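For illustration, the difference between a plain `as u64` cast and an explicit saturating conversion can be shown with standard library calls only. The helper names are hypothetical. A `u64` of nanoseconds covers roughly 584 years, which is why the cast is benign for proof timings.

```rust
use std::time::Duration;

// Plain cast: silently keeps only the low 64 bits of the u128 value.
fn nanos_truncating(d: Duration) -> u64 {
    d.as_nanos() as u64
}

// Explicit saturation: clamps to u64::MAX instead of wrapping.
fn nanos_saturating(d: Duration) -> u64 {
    u64::try_from(d.as_nanos()).unwrap_or(u64::MAX)
}
```

`u64::try_from(..).unwrap_or(u64::MAX)` behaves the same as the suggested `.min(u64::MAX as u128) as u64` and states the intent directly.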

[Low] Misleading variable name r4_merkle_dur
This variable captures time for commit_phase_from_evaluations (the full FRI commit phase), not just Merkle hashing. It maps correctly to the fri_commit field in TableSubOps, but the intermediate name is confusing. Consider r4_fri_commit_dur.

[Informational] reset_all() only clears main-thread TLS
reset_all() resets the four global atomics but only clears thread-local slots on the calling thread. Rayon worker threads retain stale TLS across successive multi_prove calls. Safe in the current design since store always precedes take within the same rayon closure, but a panic inside prove_rounds_2_to_4 would leave stale data on that worker. The unwrap_or_default() guard on take_round_sub_ops() silently zeroes it -- worth a comment explaining this invariant.
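The store-then-take invariant described above can be sketched with a thread-local slot. This is an assumption-laden illustration rather than the code in `instruments.rs`: `RoundSubOps`, its field and `store_round_sub_ops` are hypothetical names modelled on the identifiers in this comment.

```rust
use std::cell::RefCell;

// Hypothetical per-round measurement record.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct RoundSubOps {
    fri_commit_ns: u64,
}

thread_local! {
    // One slot per thread. A reset on one thread does not touch the others.
    static ROUND_SUB_OPS: RefCell<Option<RoundSubOps>> = RefCell::new(None);
}

// Called by the worker that ran the round.
fn store_round_sub_ops(v: RoundSubOps) {
    ROUND_SUB_OPS.with(|slot| *slot.borrow_mut() = Some(v));
}

// Invariant: within one closure, store always precedes take on the same
// thread. If nothing was stored, the default (all zeros) is returned.
fn take_round_sub_ops() -> RoundSubOps {
    ROUND_SUB_OPS
        .with(|slot| slot.borrow_mut().take())
        .unwrap_or_default()
}
```

Taking the value empties the slot, so a normal store/take pair leaves nothing stale behind. A value can only linger on a worker when the take is skipped, for example after a panic.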

No correctness or security issues.

MauroToscano · 2026-03-18T19:49:29Z

/bench

The FRI fold loop was sequential over half the evaluation points. Parallelize with rayon par_iter for the first few layers where domain_size is large enough to benefit.

diegokingston · 2026-03-20T14:34:34Z

/bench

github-actions · 2026-03-20T14:46:00Z

Benchmark — fib_iterative_8M (median of 3)

Table parallelism: 32 (auto = cores / 3)

| Metric | main | PR | Δ |
|---|---|---|---|
| Peak heap | 67626 MB | 66991 MB | -635 MB (-0.9%) ⚪ |
| Prove time | 35.276s | 35.686s | +0.410s (+1.2%) ⚪ |

✅ No significant change.

⚠️ Heap spread: 9.1% (72145 MB / 66074 MB / 66991 MB)
Consider re-running /bench

Commit: f4fbb09 · Baseline: cached · Runner: self-hosted bench

nicole-graus · 2026-03-30T18:58:14Z

/bench

nicole-graus · 2026-03-30T19:19:12Z

/bench

nicole-graus · 2026-03-30T19:30:53Z

/bench

MauroToscano · 2026-04-07T21:03:40Z

/bench 3 1

MauroToscano · 2026-04-07T21:04:48Z

/bench 3 1

MauroToscano · 2026-04-15T20:49:31Z

/bench

MauroToscano · 2026-06-24T14:02:30Z

Closing this — superseded by #597, which is the newer take on the same FRI fold parallelization and adds a size threshold to avoid Rayon overhead on the small final layers. #597 still needs a more thorough review (rebase onto current main, fmt fix, and a real /bench on the runner), but it's the version we'll carry forward rather than this one.
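As a rough picture of what a size threshold does, the sketch below lists the fold layers for a given domain size and marks which of them would take the parallel path. The threshold value and the function name are hypothetical; the actual cutoff and logic in #597 are not shown in this thread.

```rust
// For each fold layer, return (output_len, use_parallel).
// A layer runs in parallel only if it produces at least `threshold` elements.
fn plan_fold_layers(domain_size: usize, threshold: usize) -> Vec<(usize, bool)> {
    let mut plan = Vec::new();
    let mut n = domain_size;
    while n > 1 {
        let half = n / 2;
        plan.push((half, half >= threshold));
        n = half;
    }
    plan
}
```

Since every fold halves the layer, only the first few layers clear any fixed threshold. The small final layers stay sequential and avoid the scheduling overhead.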

diegokingston force-pushed the opt/11-parallel-fri-fold branch from 76dd9cc to ddaca1a Compare March 20, 2026 14:07

opt: parallelize FRI fold_evaluations_in_place with rayon

373be35

The FRI fold loop was sequential over half the evaluation points. Parallelize with rayon par_iter for the first few layers where domain_size is large enough to benefit.

diegokingston force-pushed the opt/11-parallel-fri-fold branch from 55b1a19 to 373be35 Compare March 20, 2026 14:15

gabrielbosio and others added 2 commits March 20, 2026 11:15

Merge branch 'main' into opt/11-parallel-fri-fold

ba796ac

Merge branch 'main' into opt/11-parallel-fri-fold

ed72ad9

Merge branch 'main' into opt/11-parallel-fri-fold

f646464

Merge branch 'main' into opt/11-parallel-fri-fold

2f15abb

MauroToscano added 2 commits April 15, 2026 17:45

Merge branch 'main' into opt/11-parallel-fri-fold

f29f87f

Merge branch 'main' into opt/11-parallel-fri-fold

f4fbb09

MauroToscano closed this Jun 24, 2026


MauroToscano wants to merge 7 commits into main from opt/11-parallel-fri-fold


4 participants
