[Perf] Add performance improvements and benchmark harness #581

Open
tywalch wants to merge 18 commits into master from perf/audit-findings
tywalch (Owner) commented Jun 11, 2026

No description provided.

tywalch and others added 15 commits June 9, 2026 22:36
Lifts makeMockV2Client out of offline.audit-fixes.spec.js into
test/fixtures/mock-client.js and adds a paging query handler plus a
representative fixture entity (test/fixtures/entities.js) so upcoming
guard tests and benchmark scenarios can share them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
executeQuery rebuilt its full result accumulator on every page
(results = [...results, ...items] and the collection/hydration
equivalents), making auto-paging O(pages²) in copies. Push items in
place instead; slicing for `count` and cursor derivation are untouched,
so order, truncation, and resumability are identical.

Guard tests pin multi-page union order, count truncation straddling a
page boundary with cursor resume, and per-entity collection demixing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
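
For readers skimming this commit, the accumulation change can be pictured with a minimal sketch. It is not the actual `executeQuery` code: the function names and the `pages` array (standing in for successive query responses) are assumptions made for illustration.

```javascript
// Hypothetical sketch, not ElectroDB's executeQuery: `pages` stands in for
// successive query responses, each an array of items.

// Before: rebuilds the accumulator on every page, copying everything so far.
function accumulateByCopy(pages) {
  let results = [];
  for (const items of pages) {
    results = [...results, ...items];
  }
  return results;
}

// After: appends in place, so each item is copied once.
function accumulateInPlace(pages, count = Infinity) {
  const results = [];
  for (const items of pages) {
    for (const item of items) {
      results.push(item);
    }
    // Stop fetching once enough items have been gathered.
    if (results.length >= count) break;
  }
  // Truncate once at the end; order matches the copying version.
  return results.length > count ? results.slice(0, count) : results;
}

const pages = [[1, 2], [3, 4], [5]];
console.log(accumulateInPlace(pages));    // [ 1, 2, 3, 4, 5 ]
console.log(accumulateInPlace(pages, 3)); // [ 1, 2, 3 ]
```

The second call shows a `count` that straddles a page boundary: the loop stops after the second page and the slice trims the overshoot.
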
…irst

validateModel ran the ModelBeta jsonschema pass for every model even
though modern v1 models can never satisfy it (it requires a root
`entity` string), enumerating and discarding errors on every Entity
construction. Skip the beta pass when the model has no root entity
string; every other path (valid beta, invalid beta-shaped, invalid v1,
garbage) flows exactly as before, so thrown messages stay byte-identical
— pinned by fixtures captured from the pre-fix implementation.

getInstanceType similarly ran full schema validation before the cheap
`_instance` symbol checks; symbols are now checked first and testModel
only runs for the bare-model fallback.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
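
A rough sketch of the "cheap check first" ordering described above. The symbol, the returned type names, and `expensiveSchemaCheck` are hypothetical stand-ins; the real `getInstanceType`, `testModel`, and `validateModel` may be shaped differently.

```javascript
// Hypothetical sketch of "cheap check first". The symbol, the type names, and
// expensiveSchemaCheck are illustrative stand-ins, not ElectroDB internals.
const INSTANCE = Symbol("instance");
let schemaChecks = 0;

function expensiveSchemaCheck(value) {
  schemaChecks += 1; // counts how often the slow path runs
  return typeof value === "object" && value !== null && "model" in value;
}

function getInstanceTypeSketch(value) {
  // Fast path: a single property read settles the common case.
  if (value !== null && typeof value === "object" && value[INSTANCE]) {
    return value[INSTANCE];
  }
  // Slow path: only the bare-model fallback pays for full validation.
  return expensiveSchemaCheck(value) ? "model" : "unknown";
}

// Skip a validation pass that cannot succeed: the beta shape requires a root
// `entity` string, so a model without one never needs the beta pass.
function shouldRunBetaPass(model) {
  return model !== null && typeof model === "object" && typeof model.entity === "string";
}

console.log(getInstanceTypeSketch({ [INSTANCE]: "entity" })); // entity
console.log(getInstanceTypeSketch({ model: {} }));            // model
console.log(shouldRunBetaPass({ model: { entity: "user" } })); // false
```
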
formatResponse constructed an ElectroError (including stack capture) on
every invocation when originalErr was unset — once per item across
batchGet formatting, collection demixing, and transaction loops — and
discarded it whenever formatting succeeded. Build it inside the catch
instead: zero allocations on success, identical message/cause/code on
failure. formatResponse is synchronous, so the stack captured in the
catch is the same frame the eager capture rooted at.

Guard tests pin the previously-untested wrapping contract: plain errors
wrap with cause + exact message, ElectroErrors rethrow unwrapped,
originalErr rethrows raw, and parse() surfaces wrapped errors.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
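
The shape of this change, sketched under assumptions: `WrappedError`, `formatSketch`, and the message text are invented for illustration and do not reproduce `ElectroError` or the real `formatResponse` wording.

```javascript
// Hypothetical sketch: WrappedError and formatSketch are illustrative names,
// and the message text is made up for this example.
class WrappedError extends Error {
  constructor(message, cause) {
    super(message);
    this.name = "WrappedError";
    this.cause = cause;
  }
}

let errorsConstructed = 0;

function formatSketch(item, transform) {
  try {
    return transform(item);
  } catch (err) {
    // Already-wrapped errors pass through unchanged.
    if (err instanceof WrappedError) throw err;
    // The wrapper (and its stack capture) is built only on failure.
    errorsConstructed += 1;
    throw new WrappedError(`Error thrown by formatter: ${err.message}`, err);
  }
}

const doubled = [1, 2, 3].map((n) => formatSketch(n, (x) => x * 2));
console.log(doubled, errorsConstructed); // [ 2, 4, 6 ] 0
```

On the success path no error object is allocated at all, which is the saving the commit describes for per-item loops.
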
…regex

_applyAttributeMutation rebuilt a full copy of the payload every time
any attribute's getter/setter asked for its siblings — once per
attribute, per pass, per item. The snapshot is now built lazily on
first use and shared across the pass; the payload is never mutated
within a pass (writes land on the separate `data` object), so the
shared snapshot holds the same values the per-call copies did.

genericizeJSONPath ran its [digits]→[*] regex on every attribute path
lookup; it now returns bracket-free paths (the per-attribute common
case) untouched.

Guard tests pin sibling visibility (setters see original values,
watchers fire once and see the watched setter's output, getters see
siblings on parse) and bracketed list-path update resolution.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
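
Both ideas in this commit can be sketched briefly. The helper names, the payload shape, and the shallow copy are assumptions for illustration; the real `_applyAttributeMutation` and `genericizeJSONPath` may differ in detail.

```javascript
// Hypothetical sketch: function names and the payload shape are illustrative.

// Build the sibling snapshot at most once per pass, and only if asked for.
// This is safe only while the payload is not mutated during the pass.
function makeLazySnapshot(payload) {
  let snapshot;
  let builds = 0;
  return {
    get() {
      if (snapshot === undefined) {
        snapshot = { ...payload }; // shallow copy as a stand-in
        builds += 1;
      }
      return snapshot;
    },
    get builds() {
      return builds;
    },
  };
}

// Replace numeric list indexes with a wildcard, skipping the regex entirely
// when the path contains no bracket.
function genericizePathSketch(path) {
  if (!path.includes("[")) return path;
  return path.replace(/\[\d+\]/g, "[*]");
}

const siblings = makeLazySnapshot({ a: 1, b: 2 });
siblings.get();
siblings.get();
console.log(siblings.builds);                            // 1
console.log(genericizePathSketch("tags[3].labels[10]")); // tags[*].labels[*]
console.log(genericizePathSketch("profile.name"));       // profile.name
```
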
Every chain construction — including get/query/scan — eagerly built an
AttributeOperationProxy, which defines a property per attribute and per
operation and dominates chain-construction cost, despite read chains
never using it. ChainState now exposes `updateProxy` as a cached lazy
getter: the cache preserves the instance identity write clauses rely on
to accumulate expression state, and `update`/the FilterExpressions stay
eager since query paths read them.

entity._params previously destructured updateProxy from state.query for
every method, which would have triggered the getter on all reads; it is
now read only inside the upsert case.

Guard tests pin the accessor + identity stability, that plain read
chains construct zero proxies while an update constructs exactly one,
and byte-identical update/upsert/query params against fixtures captured
from the eager implementation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
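
A minimal sketch of a cached lazy getter, assuming hypothetical `ChainStateSketch` and `ExpensiveProxy` classes in place of the real `ChainState` and `AttributeOperationProxy`.

```javascript
// Hypothetical sketch: ChainStateSketch and ExpensiveProxy are illustrative.
let proxiesBuilt = 0;

class ExpensiveProxy {
  constructor(attributes) {
    proxiesBuilt += 1;
    // Stand-in for defining a property per attribute and per operation.
    for (const name of attributes) {
      this[name] = { name };
    }
  }
}

class ChainStateSketch {
  constructor(attributes) {
    this.attributes = attributes;
    this._updateProxy = undefined;
  }

  // Built on first access, then cached so callers always see one instance.
  get updateProxy() {
    if (this._updateProxy === undefined) {
      this._updateProxy = new ExpensiveProxy(this.attributes);
    }
    return this._updateProxy;
  }
}

const readState = new ChainStateSketch(["id", "name"]);
const writeState = new ChainStateSketch(["id", "name"]);
const sameInstance = writeState.updateProxy === writeState.updateProxy;
console.log(proxiesBuilt, sameInstance); // 1 true
```

Note that destructuring counts as an access: `const { updateProxy } = readState` would invoke the getter and build a proxy, which is why the commit moves the read in `entity._params` into the upsert case.
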
Adds manual/local benchmarks (npm run benchmark / :json / :update /
:compare) covering the hot paths touched by the perf fixes: entity
construction, chain params building, parse/format at size, batchGet
formatting, and multi-page query accumulation. Scenarios are dropped
into benchmark/scenarios/*.bench.js with no registration list; results
are normalized against a fixed reference task so the committed
baseline.json is roughly machine-independent. Compare is advisory
(exit 0); --strict is the one-flag hook for promoting it to CI later.
Not wired into the test gate or any workflow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
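
One way to picture normalization against a reference task is the sketch below. The result shape, the task names, and the direction of the ratio (task throughput divided by reference throughput) are assumptions; the harness may compute it differently.

```javascript
// Hypothetical sketch: result shape, task names, and ratio direction are
// assumptions, not the harness's actual code.
function normalizeResults(results, referenceName) {
  const reference = results.find((r) => r.name === referenceName);
  if (!reference) throw new Error(`missing reference task: ${referenceName}`);
  return results
    .filter((r) => r.name !== referenceName)
    .map((r) => ({
      name: r.name,
      // Ratio to the reference: a uniformly faster machine scales both
      // numbers and leaves the ratio roughly unchanged.
      normalized: r.operationsPerSecond / reference.operationsPerSecond,
    }));
}

const slowMachine = [
  { name: "reference", operationsPerSecond: 1000 },
  { name: "entity construction", operationsPerSecond: 250 },
];
const fastMachine = slowMachine.map((r) => ({
  ...r,
  operationsPerSecond: r.operationsPerSecond * 2,
}));

console.log(normalizeResults(slowMachine, "reference")[0].normalized); // 0.25
console.log(normalizeResults(fastMachine, "reference")[0].normalized); // 0.25
```

The ratio only cancels uniform speed differences, which is consistent with the commit calling the baseline "roughly" machine-independent.
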
…Script

Fixtures (test/fixtures/*.ts) gain typed exports; the guard spec and
benchmark scenarios import them, while untyped src internals stay as
require() per the existing ts_connected spec convention. The runner
becomes benchmark/run.ts (scripts now use ts-node) and exports the
ScenarioEntry type that scenario files default-export. No behavior
change; the JS audit spec resolves the TS fixtures through ts-node's
require hook in the existing mocha setup.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Compare previously applied a blind 20% tolerance to normalized
throughput, so it could neither distinguish a real 15% regression from
noise nor confirm a real 4% change. Verdicts now apply two gates per task:

statistical — the delta must beat a noise floor where the baseline and
current 95% confidence intervals would overlap. With --runs > 1 (the
default 3 for compare/update) the suite repeats and the interval comes
from the t-based between-run spread of the normalized value, which
captures GC phasing, JIT state, and thermal drift that within-run
sampling underestimates — measured here: the pagination tasks swing
±20% between runs while their within-run margin reads ±1%, and
single-run CIs produced false REGRESSIONs on identical code. With
--runs 1 (quick iteration) it falls back to within-run tinybench rme
combined with the reference task's, since normalized is a ratio.

practical — a real delta must also exceed --threshold (default 5%) to
be labeled REGRESSION/improved; smaller real changes report as
"within threshold", sub-floor deltas as "~noise".

Compare also reports the reference task's raw drift since baseline
(non-uniform machine-condition changes make borderline verdicts
suspect) and warns when noise floors exceed the threshold. The
committed baseline moves to schemaVersion 2 (normalized + normalizedRme
+ samples, captured at 3 runs); filtered compares skip unmatched
baseline tasks. Verified: same-code compares report no regressions;
injected 30%/3% baseline shifts produce REGRESSION and
within-threshold verdicts respectively; --strict exits 1 only on
REGRESSION.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
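
The two gates can be sketched as follows. This is an illustration under assumptions, not the harness's code: the function names, the sample arrays, and the exact noise-floor formula (the sum of the two relative margins) are hypothetical. The t critical values are the standard two-sided 95% ones.

```javascript
// Hypothetical sketch of a two-gate verdict; names, shapes, and the exact
// noise-floor formula are assumptions.

// Two-sided 95% Student-t critical values, keyed by degrees of freedom.
const T_95 = { 1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571 };

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Half-width of the 95% interval as a percentage of the mean, computed from
// the between-run spread. This sketch supports 2 to 6 samples.
function relativeMarginPercent(samples) {
  const n = samples.length;
  const m = mean(samples);
  const variance = samples.reduce((sum, v) => sum + (v - m) ** 2, 0) / (n - 1);
  const halfWidth = T_95[n - 1] * Math.sqrt(variance / n);
  return (halfWidth / m) * 100;
}

function verdict(baselineSamples, currentSamples, thresholdPercent = 5) {
  const base = mean(baselineSamples);
  const deltaPercent = ((mean(currentSamples) - base) / base) * 100;
  // Statistical gate: treat the delta as noise while it is no larger than
  // the combined margins (an approximation of interval overlap).
  const noiseFloor =
    relativeMarginPercent(baselineSamples) + relativeMarginPercent(currentSamples);
  if (Math.abs(deltaPercent) <= noiseFloor) return "~noise";
  // Practical gate: a real but small change is not worth flagging.
  if (Math.abs(deltaPercent) <= thresholdPercent) return "within threshold";
  // Values are throughput, so a drop is a regression.
  return deltaPercent < 0 ? "REGRESSION" : "improved";
}

console.log(verdict([100, 101, 99], [70, 71, 69]));         // REGRESSION
console.log(verdict([100, 100.1, 99.9], [97, 97.1, 96.9])); // within threshold
console.log(verdict([100, 120, 80], [95, 110, 85]));        // ~noise
```

The third call mirrors the pagination case described above: a ±20% swing between runs makes a 3% delta indistinguishable from noise, whereas the same delta with tight runs is reported as real but within threshold.
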
aggregateRuns defined a loop-local avg closure and a second module-level
avg2 doing the same thing; one module-level average() now serves both
sites. Also drops the missing-task filter in the per-run aggregation —
collectResults aborts the process on any task failure, so every run is
guaranteed to contain every task.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
rme→relativeMarginOfErrorPercent, hz→operationsPerSecond,
p99→percentile99, fn/opts→benchmarkFunction/options, CliArgs→
CommandLineArguments, lo/hi→lowerBound/upperBound, config knobs gain
units (_MILLISECONDS/_PERCENT), scenario files rename to *.benchmark.ts.
Field renames reach baseline.json and --json output, so the baseline
schema bumps to v3 and the committed baseline is regenerated. External
names that mirror third-party APIs (tinybench result fields, DynamoDB
pk/sk) are kept at their access sites.
tywalch changed the base branch from fix/documentation-implementation-divergences to master June 13, 2026 13:30
…rf/audit-findings

# Conflicts:
#	test/fixtures/mock-client.ts
netlify Bot commented Jun 13, 2026

Deploy Preview for electrodb-dev canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 60abea3 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electrodb-dev/deploys/6a2d5d53945d7000086d9c13 |
