feat(blockchain): pre-build proposer block one interval early#445
feat(blockchain): pre-build proposer block one interval early#445MegaRedHand wants to merge 7 commits into
Conversation
Split out of #445 (proposer pre-build). `gate_proposer` was a one-line wrapper over `duties_allowed()` with a single call site. This inlines it as `scheduled_proposer.filter(|_| self.sync_status.duties_allowed())` and removes the method, so the sync gate reads the same way everywhere. **Behavior-preserving** — `proposer_validator_id` is computed identically; every downstream use (the skip-while-syncing log, the `store::on_tick` proposal flag, the proposal call) is untouched. The three `gate_proposer`-only unit tests collapse onto the `duties_allowed()` assertions they already made. Stacked under the on_tick-regrouping PR; #445 will drop these hunks once both land. Co-authored-by: Pablo Deymonnaz <pdeymon@fi.uba.ar>
on_tick ran its per-interval blocks out of order (4-snapshot, tick, 2, 0, 1). Regroup them into ascending interval order behind `==== interval N ====` markers so the slot timeline reads top to bottom and matches the duty schedule. Pure reorder, behavior unchanged. The interval-4 new_payloads snapshot is the one block that stays ahead of store::on_tick: the interval-4 tick promotes new_payloads out, so it cannot move into a post-tick group. A comment now pins that constraint.
…ick flag Compute scheduled_proposer just before store::on_tick and pass the raw is_proposer (= scheduled_proposer.is_some()) to it; gate the actual proposal on duties_allowed() at the call site instead. While syncing and scheduled to propose, on_tick now accepts attestations early at interval 0 (it did not before); the proposal itself is still skipped.
d4a251e to
4955800
Compare
At interval 4, if one of our validators proposes the next slot, build and sign its block synchronously on the actor so the heavy leanVM aggregation (~1-2s) is done before interval 0. The proposer then only has to publish. Shares one block-build core (produce_block_on_head) across the prebuild and normal proposal paths; a publish-time usability gate falls back to a fresh build if the prebuilt block is stale.
4955800 to
2c6eb1c
Compare
Split out of #445 (proposer pre-build). **Stacked on #449** (`gate_proposer` removal) — review/merge that first; this PR's base retargets to `main` once #449 lands. `on_tick` ran its per-interval blocks out of order (interval-4 snapshot, then tick, then 2, 0, 1). This regroups them into ascending interval order behind `==== interval N ====` markers so the slot timeline reads top-to-bottom and lines up with the duty schedule. **Pure reorder, behavior unchanged.** ### One block intentionally does *not* move The interval-4 `new_payloads` snapshot stays **ahead** of `store::on_tick`. At interval 4 the tick runs `accept_new_attestations → promote_new_aggregated_payloads()`, which drains `new_payloads`; snapshotting after the tick would capture an already-drained set and silently break the post-block coverage report's "timely" cohort. A comment now pins that ordering constraint. > Note: commit `7474c15` on #445 currently moves this snapshot *into* the post-tick interval-4 group, which hits exactly that regression (observability-only, so devnet doesn't surface it). #445 should adopt this PR's placement when it rebases.
Greptile SummaryThis PR pre-builds the next slot's block at interval 4 of the previous slot, stashing a
Confidence Score: 3/5The fallback path can call sign_block_root for the same slot twice — once at interval 4 and again at interval 0 — and the has_proposal flag change alters store attestation-promotion for syncing nodes; both need to be resolved before merging. The core block-builder refactor and the crates/blockchain/src/lib.rs — the fallback signing path and the has_proposal flag change both need attention
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/lib.rs | Core actor refactor: introduces pre-build at interval 4, fallback at interval 0, and three shared helpers. The is_proposer flag passed to store::on_tick no longer filters by duties_allowed() (behavioral regression), and the shared build_signed_block helper signs for slot S at interval 4 and potentially again at interval 0 (XMSS OTS reuse risk in the fallback path). |
| crates/blockchain/src/block_builder.rs | Adds PreparedBlock struct and prebuilt_block_is_usable predicate with thorough unit tests. Logic is clean and well-exercised; no issues found. |
| crates/blockchain/src/store.rs | Splits produce_block_with_signatures into a new produce_block_on_head variant that accepts an already-resolved head root without ticking the store clock. get_proposal_head is made pub(crate) with clear documentation. Changes are correct and well-scoped. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Slot S-1 Interval 4] --> B[prebuild_block next_slot=S]
B --> C[produce_block_on_head READ-ONLY store.head]
C --> D[sign_block_root vid slot=S root_A OTS index S consumed]
D --> E[stash PreparedBlock slot=S]
E --> F[advance_keys_to S]
F --> G[Slot S Interval 0]
G --> H[get_proposal_head ticks store clock]
H --> I{prebuilt_block_is_usable?}
I -->|yes| J[process_block prepared]
J -->|ok| K[publish pre-built block FAST PATH]
J -->|fail| L[fallback build_signed_block]
I -->|no - head moved| L
L --> M[sign_block_root vid slot=S root_B WARNING same OTS index S reused]
M --> N[publish fallback block]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[Slot S-1 Interval 4] --> B[prebuild_block next_slot=S]
B --> C[produce_block_on_head READ-ONLY store.head]
C --> D[sign_block_root vid slot=S root_A OTS index S consumed]
D --> E[stash PreparedBlock slot=S]
E --> F[advance_keys_to S]
F --> G[Slot S Interval 0]
G --> H[get_proposal_head ticks store clock]
H --> I{prebuilt_block_is_usable?}
I -->|yes| J[process_block prepared]
J -->|ok| K[publish pre-built block FAST PATH]
J -->|fail| L[fallback build_signed_block]
I -->|no - head moved| L
L --> M[sign_block_root vid slot=S root_B WARNING same OTS index S reused]
M --> N[publish fallback block]
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
crates/blockchain/src/lib.rs:560-562
**XMSS OTS key reuse in fallback path**
`prebuild_block` at interval 4 calls `sign_block_root(validator_id, slot=S, &block_root_A)`, then if `prebuilt_block_is_usable` returns false at interval 0, the fallback calls `build_signed_block` → `sign_block_root(validator_id, slot=S, &block_root_B)` — same `slot` argument, different message. `ValidatorSecretKey::sign` takes `&self` (non-mutating) and calls `LeanSignatureScheme::sign(&self.inner, slot, …)` deterministically from the OTS leaf at index `slot`, with no internal "already used" guard. Two distinct messages signed under the same OTS leaf violates the W-OTS+/XMSS single-use guarantee; if both signatures are ever visible to an attacker (e.g., via in-memory recovery or a future change that inadvertently publishes the pre-build), signing keys are compromisable. The PR's own description flags OTS-index reuse as a non-negotiable constraint, yet this fallback path triggers it whenever the head changes between interval 4 and interval 0.
### Issue 2 of 3
crates/blockchain/src/lib.rs:259-265
**`has_proposal` flag now ignores sync status**
The old code filtered `scheduled_proposer` through `sync_status.duties_allowed()` before passing the flag to `store::on_tick`. The new code passes `is_proposer = scheduled_proposer.is_some()` without that filter, so a syncing node that happens to be the scheduled proposer now passes `has_proposal = true`, causing `store::on_tick` to call `accept_new_attestations` at interval 0 even though no block will be built. The old behavior deliberately withheld this early promotion while syncing.
```suggestion
let scheduled_proposer = (interval == 0 && slot > 0)
.then(|| self.get_our_proposer(slot))
.flatten();
let is_proposer = scheduled_proposer.is_some()
&& self.sync_status.duties_allowed();
// Tick the store first - this accepts attestations at interval 0 if we have a proposal
store::on_tick(&mut self.store, timestamp_ms, is_proposer);
```
### Issue 3 of 3
crates/blockchain/src/lib.rs:460-464
**Double `coverage::emit_proposal_coverage` emission**
`build_signed_block` calls `coverage::emit_proposal_coverage` unconditionally. This function is invoked at interval 4 (pre-build) and potentially again at interval 0 (fallback). If the pre-built block passes `prebuilt_block_is_usable` but `process_and_publish_block` returns false (local import failure), the fallback calls `build_signed_block` a second time, emitting proposal-coverage metrics twice for the same slot. Consider moving the coverage emission into `process_and_publish_block` (after a successful local import) so it fires exactly once per published block.
Reviews (1): Last reviewed commit: "feat(blockchain): pre-build proposer blo..." | Re-trigger Greptile
| if let Some((signed_block, _)) = self.build_signed_block(slot, validator_id, head_root) { | ||
| self.process_and_publish_block(slot, validator_id, signed_block, "Published block"); | ||
| } |
There was a problem hiding this comment.
XMSS OTS key reuse in fallback path
prebuild_block at interval 4 calls sign_block_root(validator_id, slot=S, &block_root_A), then if prebuilt_block_is_usable returns false at interval 0, the fallback calls build_signed_block → sign_block_root(validator_id, slot=S, &block_root_B) — same slot argument, different message. ValidatorSecretKey::sign takes &self (non-mutating) and calls LeanSignatureScheme::sign(&self.inner, slot, …) deterministically from the OTS leaf at index slot, with no internal "already used" guard. Two distinct messages signed under the same OTS leaf violates the W-OTS+/XMSS single-use guarantee; if both signatures are ever visible to an attacker (e.g., via in-memory recovery or a future change that inadvertently publishes the pre-build), signing keys are compromisable. The PR's own description flags OTS-index reuse as a non-negotiable constraint, yet this fallback path triggers it whenever the head changes between interval 4 and interval 0.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/lib.rs
Line: 560-562
Comment:
**XMSS OTS key reuse in fallback path**
`prebuild_block` at interval 4 calls `sign_block_root(validator_id, slot=S, &block_root_A)`, then if `prebuilt_block_is_usable` returns false at interval 0, the fallback calls `build_signed_block` → `sign_block_root(validator_id, slot=S, &block_root_B)` — same `slot` argument, different message. `ValidatorSecretKey::sign` takes `&self` (non-mutating) and calls `LeanSignatureScheme::sign(&self.inner, slot, …)` deterministically from the OTS leaf at index `slot`, with no internal "already used" guard. Two distinct messages signed under the same OTS leaf violates the W-OTS+/XMSS single-use guarantee; if both signatures are ever visible to an attacker (e.g., via in-memory recovery or a future change that inadvertently publishes the pre-build), signing keys are compromisable. The PR's own description flags OTS-index reuse as a non-negotiable constraint, yet this fallback path triggers it whenever the head changes between interval 4 and interval 0.
How can I resolve this? If you propose a fix, please make it concise.| let scheduled_proposer = (interval == 0 && slot > 0) | ||
| .then(|| self.get_our_proposer(slot)) | ||
| .flatten(); | ||
| let is_proposer = scheduled_proposer.is_some(); | ||
|
|
||
| // Tick the store first - this accepts attestations at interval 0 if we have a proposal | ||
| store::on_tick( | ||
| &mut self.store, | ||
| timestamp_ms, | ||
| proposer_validator_id.is_some(), | ||
| ); | ||
| store::on_tick(&mut self.store, timestamp_ms, is_proposer); |
There was a problem hiding this comment.
has_proposal flag now ignores sync status
The old code filtered scheduled_proposer through sync_status.duties_allowed() before passing the flag to store::on_tick. The new code passes is_proposer = scheduled_proposer.is_some() without that filter, so a syncing node that happens to be the scheduled proposer now passes has_proposal = true, causing store::on_tick to call accept_new_attestations at interval 0 even though no block will be built. The old behavior deliberately withheld this early promotion while syncing.
| let scheduled_proposer = (interval == 0 && slot > 0) | |
| .then(|| self.get_our_proposer(slot)) | |
| .flatten(); | |
| let is_proposer = scheduled_proposer.is_some(); | |
| // Tick the store first - this accepts attestations at interval 0 if we have a proposal | |
| store::on_tick( | |
| &mut self.store, | |
| timestamp_ms, | |
| proposer_validator_id.is_some(), | |
| ); | |
| store::on_tick(&mut self.store, timestamp_ms, is_proposer); | |
| let scheduled_proposer = (interval == 0 && slot > 0) | |
| .then(|| self.get_our_proposer(slot)) | |
| .flatten(); | |
| let is_proposer = scheduled_proposer.is_some() | |
| && self.sync_status.duties_allowed(); | |
| // Tick the store first - this accepts attestations at interval 0 if we have a proposal | |
| store::on_tick(&mut self.store, timestamp_ms, is_proposer); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/lib.rs
Line: 259-265
Comment:
**`has_proposal` flag now ignores sync status**
The old code filtered `scheduled_proposer` through `sync_status.duties_allowed()` before passing the flag to `store::on_tick`. The new code passes `is_proposer = scheduled_proposer.is_some()` without that filter, so a syncing node that happens to be the scheduled proposer now passes `has_proposal = true`, causing `store::on_tick` to call `accept_new_attestations` at interval 0 even though no block will be built. The old behavior deliberately withheld this early promotion while syncing.
```suggestion
let scheduled_proposer = (interval == 0 && slot > 0)
.then(|| self.get_our_proposer(slot))
.flatten();
let is_proposer = scheduled_proposer.is_some()
&& self.sync_status.duties_allowed();
// Tick the store first - this accepts attestations at interval 0 if we have a proposal
store::on_tick(&mut self.store, timestamp_ms, is_proposer);
```
How can I resolve this? If you propose a fix, please make it concise.| coverage::emit_proposal_coverage( | ||
| &self.store, | ||
| self.attestation_committee_count, | ||
| block.body.attestations.iter(), | ||
| ); |
There was a problem hiding this comment.
Double
coverage::emit_proposal_coverage emission
build_signed_block calls coverage::emit_proposal_coverage unconditionally. This function is invoked at interval 4 (pre-build) and potentially again at interval 0 (fallback). If the pre-built block passes prebuilt_block_is_usable but process_and_publish_block returns false (local import failure), the fallback calls build_signed_block a second time, emitting proposal-coverage metrics twice for the same slot. Consider moving the coverage emission into process_and_publish_block (after a successful local import) so it fires exactly once per published block.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/blockchain/src/lib.rs
Line: 460-464
Comment:
**Double `coverage::emit_proposal_coverage` emission**
`build_signed_block` calls `coverage::emit_proposal_coverage` unconditionally. This function is invoked at interval 4 (pre-build) and potentially again at interval 0 (fallback). If the pre-built block passes `prebuilt_block_is_usable` but `process_and_publish_block` returns false (local import failure), the fallback calls `build_signed_block` a second time, emitting proposal-coverage metrics twice for the same slot. Consider moving the coverage emission into `process_and_publish_block` (after a successful local import) so it fires exactly once per published block.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
#450 (the on_tick interval regrouping) landed in main via squash, so both sides added the same interval markers/blocks relative to the pre-#450 merge-base. Git kept both with no textual conflict, duplicating the interval 2/3/4 sections (start_aggregation_session would have run twice). Removed the duplicate set, keeping a single interval 0-4 sequence with the prebuild trigger in interval 4.
c5812bb to
d940222
Compare
What
Proposers do the heavy leanVM block-building work (attestation compaction + the Type-1 → Type-2 merge, ~1–2s) synchronously at interval 0 today, squeezed into the slot's first interval. This builds the next slot's block one interval early — at the previous slot's interval 4, synchronously on the blockchain actor — and publishes it at interval 0 after a revalidation. On any mismatch or failure it falls back to the existing synchronous build, so there is no regression.
How
prebuild_blockbuilds + signs + merges the full block against the current (read-only) head and stashes it as aPreparedBlock.propose_blockpublishes the prepared block iff it is still valid (prebuilt_block_is_usable: same slot/proposer, parent still the canonical head, and the built justified slot not behind the store's). Otherwise it rebuilds synchronously.assemble_signed_block/process_and_publish_blockhelpers back both the pre-build and the fallback path.sign_block_roottakes&mut self), so the build cannot be moved off-thread without risking OTS-index reuse.Finalization fix (included)
Capturing the pre-build head via
get_proposal_head— which is not read-only, it ticks the store clock — one interval early advanced the clock prematurely and stalled finalization. Fixed by capturing the head read-only viastore.head().Validation (local devnet: 4 ethlambda nodes, 1 aggregator)
Tradeoff
Building synchronously blocks the actor for the build duration at interval 4. This is acceptable: between interval 4 and the next slot the actor has no other consensus-critical duty, and the devnet run showed zero skipped ticks.
Tests
make fmt+cargo clippy -p ethlambda-blockchain --all-targets -- -D warningsclean; lib tests pass, including the newblock_builder::prebuild_testscovering the usability predicate.