[Perf] Add performance improvements and benchmark harness #581
Open
tywalch wants to merge 18 commits into
…vered by Claude Fable.
Lifts makeMockV2Client out of offline.audit-fixes.spec.js into test/fixtures/mock-client.js and adds a paging query handler plus a representative fixture entity (test/fixtures/entities.js) so upcoming guard tests and benchmark scenarios can share them. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
executeQuery rebuilt its full result accumulator on every page (results = [...results, ...items] and the collection/hydration equivalents), making auto-paging O(pages²) in copies. Push items in place instead; slicing for `count` and cursor derivation are untouched, so order, truncation, and resumability are identical. Guard tests pin multi-page union order, count truncation straddling a page boundary with cursor resume, and per-entity collection demixing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
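As a rough illustration of the accumulation change described above (not the actual executeQuery code), the sketch below contrasts the two strategies. `makePager`, `collectBySpread`, and `collectByPush` are hypothetical names, and the pager is a stand-in for a paged query.

```javascript
// Hypothetical pager: returns { items, cursor }; a null cursor means "no more pages".
function makePager(pages) {
  return (cursor) => {
    const index = cursor === null ? 0 : cursor;
    const next = index + 1 < pages.length ? index + 1 : null;
    return { items: pages[index], cursor: next };
  };
}

// Before (sketch): the accumulator is copied in full on every page,
// so total element copies grow quadratically with the page count.
function collectBySpread(fetchPage) {
  let results = [];
  let cursor = null;
  do {
    const page = fetchPage(cursor);
    results = [...results, ...page.items];
    cursor = page.cursor;
  } while (cursor !== null);
  return results;
}

// After (sketch): items are appended in place, so each item is copied once.
// A loop is used instead of push(...items) to avoid spreading very large
// pages into a single call's argument list.
function collectByPush(fetchPage) {
  const results = [];
  let cursor = null;
  do {
    const page = fetchPage(cursor);
    for (const item of page.items) results.push(item);
    cursor = page.cursor;
  } while (cursor !== null);
  return results;
}

const fetchPage = makePager([[1, 2], [3, 4], [5]]);
const spread = collectBySpread(fetchPage);
const pushed = collectByPush(fetchPage);
```

Both functions return the items in the same page order; only the number of copies differs.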
…irst validateModel ran the ModelBeta jsonschema pass for every model even though modern v1 models can never satisfy it (it requires a root `entity` string), enumerating and discarding errors on every Entity construction. Skip the beta pass when the model has no root entity string; every other path (valid beta, invalid beta-shaped, invalid v1, garbage) flows exactly as before, so thrown messages stay byte-identical — pinned by fixtures captured from the pre-fix implementation. getInstanceType similarly ran full schema validation before the cheap `_instance` symbol checks; symbols are now checked first and testModel only runs for the bare-model fallback. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
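The guard described above can be sketched roughly as follows. This is an illustration under assumptions, not ElectroDB's validator: `validateBeta`, `validateV1`, and `validateModelSketch` are hypothetical, the hand-written checks stand in for the jsonschema passes, and the error strings are invented.

```javascript
let betaPasses = 0; // counts how often the (stand-in) beta pass runs

function isObject(value) {
  return value !== null && typeof value === "object";
}

// Stand-in for the expensive beta schema pass.
function validateBeta(model) {
  betaPasses += 1;
  const errors = [];
  if (typeof model.entity !== "string") errors.push('requires property "entity"');
  if (!isObject(model.attributes)) errors.push('requires property "attributes"');
  return errors;
}

// Stand-in for the v1 schema pass.
function validateV1(model) {
  const errors = [];
  if (!isObject(model) || !isObject(model.model)) {
    errors.push('requires property "model"');
  } else if (typeof model.model.entity !== "string") {
    errors.push('requires property "model.entity"');
  }
  return errors;
}

function validateModelSketch(model) {
  // The beta pass can only succeed when a root `entity` string exists,
  // so it is skipped entirely for models that lack one.
  if (isObject(model) && typeof model.entity === "string") {
    if (validateBeta(model).length === 0) return "beta";
  }
  const errors = validateV1(model);
  if (errors.length > 0) throw new Error(errors.join(", "));
  return "v1";
}

const v1Kind = validateModelSketch({
  model: { entity: "task", version: "1", service: "app" },
  attributes: {},
});
const betaPassesAfterV1 = betaPasses;
const betaKind = validateModelSketch({ entity: "task", attributes: {} });
```

In this sketch a v1-shaped model never reaches the beta pass, while a beta-shaped model still goes through it.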
formatResponse constructed an ElectroError (including stack capture) on every invocation when originalErr was unset — once per item across batchGet formatting, collection demixing, and transaction loops — and discarded it whenever formatting succeeded. Build it inside the catch instead: zero allocations on success, identical message/cause/code on failure. formatResponse is synchronous, so the stack captured in the catch is the same frame the eager capture rooted at. Guard tests pin the previously-untested wrapping contract: plain errors wrap with cause + exact message, ElectroErrors rethrow unwrapped, originalErr rethrows raw, and parse() surfaces wrapped errors. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
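A minimal sketch of the "build the error inside the catch" idea, assuming a simplified wrapper: `WrappedError` and `formatLazy` are hypothetical stand-ins for ElectroError and formatResponse, and the message text is invented.

```javascript
let constructed = 0; // counts wrapper-error allocations

// Hypothetical wrapper error; stands in for the library's error type.
class WrappedError extends Error {
  constructor(message, cause) {
    super(message);
    this.name = "WrappedError";
    this.cause = cause;
    constructed += 1;
  }
}

// The wrapper is only constructed on the failure path, so a successful
// format call allocates no error object and captures no stack.
function formatLazy(item, format) {
  try {
    return format(item);
  } catch (err) {
    if (err instanceof WrappedError) throw err; // already wrapped: rethrow as-is
    throw new WrappedError(`Error thrown by format: ${err.message}`, err);
  }
}

const formatted = [1, 2, 3].map((n) => formatLazy(n, (x) => x * 2));
const constructedAfterSuccess = constructed;

let caught = null;
try {
  formatLazy(1, () => {
    throw new Error("boom");
  });
} catch (err) {
  caught = err;
}
```

Here the success path constructs no wrapper at all, and the failure path constructs exactly one that carries the original error as its cause.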
…regex _applyAttributeMutation rebuilt a full copy of the payload every time any attribute's getter/setter asked for its siblings — once per attribute, per pass, per item. The snapshot is now built lazily on first use and shared across the pass; the payload is never mutated within a pass (writes land on the separate `data` object), so the shared snapshot holds the same values the per-call copies did. genericizeJSONPath ran its [digits]→[*] regex on every attribute path lookup; it now returns bracket-free paths (the per-attribute common case) untouched. Guard tests pin sibling visibility (setters see original values, watchers fire once and see the watched setter's output, getters see siblings on parse) and bracketed list-path update resolution. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
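The two optimizations above can be sketched as follows. This is an illustration only: `createLazySnapshot`, `applySetters`, and `genericizePath` are hypothetical names, the shallow copy is an assumption about what a snapshot contains, and the regex is a guess at the `[digits]` to `[*]` rewrite rather than the library's own pattern.

```javascript
let snapshotCopies = 0; // counts how many payload copies are made

// Returns an accessor that copies the payload on first use and reuses
// that copy afterwards. Safe only while the payload is not mutated.
function createLazySnapshot(payload) {
  let snapshot = null;
  return () => {
    if (snapshot === null) {
      snapshot = { ...payload };
      snapshotCopies += 1;
    }
    return snapshot;
  };
}

// One pass over the setters: writes land on `data`, never on `payload`,
// so every setter can share the same sibling snapshot.
function applySetters(payload, setters) {
  const getSiblings = createLazySnapshot(payload);
  const data = {};
  for (const [name, setter] of Object.entries(setters)) {
    data[name] = setter(payload[name], getSiblings);
  }
  return data;
}

// Fast path: bracket-free paths are returned untouched, skipping the regex.
function genericizePath(path) {
  if (!path.includes("[")) return path;
  return path.replace(/\[\d+\]/g, "[*]");
}

const payload = { first: "Ada", last: "Lovelace", full: "", initials: "" };
const data = applySetters(payload, {
  first: (value) => value.toUpperCase(),
  full: (_value, siblings) => `${siblings().first} ${siblings().last}`,
  initials: (_value, siblings) => siblings().first[0] + siblings().last[0],
});
```

In this sketch the setters that ask for siblings see the original payload values, and the payload is copied once for the whole pass rather than once per attribute.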
Every chain construction — including get/query/scan — eagerly built an AttributeOperationProxy, which defines a property per attribute and per operation and dominates chain-construction cost, despite read chains never using it. ChainState now exposes `updateProxy` as a cached lazy getter: the cache preserves the instance identity write clauses rely on to accumulate expression state, and `update`/the FilterExpressions stay eager since query paths read them. entity._params previously destructured updateProxy from state.query for every method, which would have triggered the getter on all reads; it is now read only inside the upsert case. Guard tests pin the accessor + identity stability, that plain read chains construct zero proxies while an update constructs exactly one, and byte-identical update/upsert/query params against fixtures captured from the eager implementation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
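A minimal sketch of a cached lazy getter, and of why destructuring it defeats the laziness. `ChainStateSketch` and `buildUpdateProxy` are hypothetical; the real AttributeOperationProxy is considerably more involved.

```javascript
let proxiesBuilt = 0; // counts proxy constructions

// Stand-in for an expensive proxy that defines a property per attribute.
function buildUpdateProxy(attributeNames) {
  proxiesBuilt += 1;
  const proxy = { operations: [] };
  for (const name of attributeNames) {
    Object.defineProperty(proxy, name, {
      enumerable: true,
      get() {
        return { name };
      },
    });
  }
  return proxy;
}

class ChainStateSketch {
  constructor(attributeNames) {
    this._attributeNames = attributeNames;
    this._updateProxy = null;
  }

  // Built on first access, then cached so repeated reads return the same
  // instance and any state accumulated on it is preserved.
  get updateProxy() {
    if (this._updateProxy === null) {
      this._updateProxy = buildUpdateProxy(this._attributeNames);
    }
    return this._updateProxy;
  }
}

// A read chain never touches the getter, so nothing is built.
const readState = new ChainStateSketch(["id", "name"]);
const builtAfterRead = proxiesBuilt;

// A write chain touches it repeatedly but builds only once.
const writeState = new ChainStateSketch(["id", "name"]);
const firstAccess = writeState.updateProxy;
const secondAccess = writeState.updateProxy;
const builtAfterWrite = proxiesBuilt;

// Destructuring reads the property, which fires the getter.
const { updateProxy: viaDestructure } = readState;
const builtAfterDestructure = proxiesBuilt;
```

The last step shows the pitfall the commit avoids: pulling the property out of the state object up front triggers construction even when the value is never used.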
Adds manual/local benchmarks (npm run benchmark / :json / :update / :compare) covering the hot paths touched by the perf fixes: entity construction, chain params building, parse/format at size, batchGet formatting, and multi-page query accumulation. Scenarios are dropped into benchmark/scenarios/*.bench.js with no registration list; results are normalized against a fixed reference task so the committed baseline.json is roughly machine-independent. Compare is advisory (exit 0); --strict is the one-flag hook for promoting it to CI later. Not wired into the test gate or any workflow. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
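To illustrate why normalizing against a reference task makes results roughly machine-independent, here is a small sketch. `normalizeAgainstReference` and the task names are hypothetical, and dividing task throughput by reference throughput is an assumption about how the normalized value is formed.

```javascript
// Divides each task's throughput by the reference task's throughput,
// so a uniformly faster or slower machine yields about the same ratios.
function normalizeAgainstReference(results, referenceName) {
  const reference = results.find((result) => result.name === referenceName);
  if (!reference || !(reference.operationsPerSecond > 0)) {
    throw new Error(`reference task "${referenceName}" missing or invalid`);
  }
  return results
    .filter((result) => result.name !== referenceName)
    .map((result) => ({
      name: result.name,
      normalized: result.operationsPerSecond / reference.operationsPerSecond,
    }));
}

// Made-up numbers for two machines running the same code.
const fastMachine = normalizeAgainstReference(
  [
    { name: "reference", operationsPerSecond: 1000000 },
    { name: "parse", operationsPerSecond: 250000 },
  ],
  "reference",
);
const slowMachine = normalizeAgainstReference(
  [
    { name: "reference", operationsPerSecond: 400000 },
    { name: "parse", operationsPerSecond: 100000 },
  ],
  "reference",
);
```

With these illustrative inputs both machines produce a normalized value of 0.25 for the `parse` task, even though their raw throughput differs by 2.5x. Real machines rarely scale this uniformly, which is why the baseline is only "roughly" machine-independent.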
…Script Fixtures (test/fixtures/*.ts) gain typed exports; the guard spec and benchmark scenarios import them, while untyped src internals stay as require() per the existing ts_connected spec convention. The runner becomes benchmark/run.ts (scripts now use ts-node) and exports the ScenarioEntry type that scenario files default-export. No behavior change; the JS audit spec resolves the TS fixtures through ts-node's require hook in the existing mocha setup. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The compare previously used a blind 20% tolerance on normalized throughput, so it could neither distinguish a real 15% regression from noise nor confirm a real 4% change. Verdicts now apply two gates per task:

- statistical: the delta must beat a noise floor below which the baseline and current 95% confidence intervals would overlap. With --runs > 1 (the default is 3 for compare/update) the suite repeats and the interval comes from the t-based between-run spread of the normalized value, which captures GC phasing, JIT state, and thermal drift that within-run sampling underestimates. Measured here: the pagination tasks swing ±20% between runs while their within-run margin reads ±1%, and single-run CIs produced false REGRESSIONs on identical code. With --runs 1 (quick iteration) it falls back to the within-run tinybench rme combined with the reference task's, since normalized is a ratio.
- practical: a real delta must also exceed --threshold (default 5%) to be labeled REGRESSION/improved; smaller real changes report as "within threshold", sub-floor deltas as "~noise".

Compare also reports the reference task's raw drift since baseline (non-uniform machine-condition changes make borderline verdicts suspect) and warns when noise floors exceed the threshold. The committed baseline moves to schemaVersion 2 (normalized + normalizedRme + samples, captured at 3 runs); filtered compares skip unmatched baseline tasks.

Verified: same-code compares report no regressions; injected 30%/3% baseline shifts produce REGRESSION and within-threshold verdicts respectively; --strict exits 1 only on REGRESSION.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
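The two-gate verdict can be sketched roughly as below. This is an illustration under assumptions, not the runner's code: `betweenRunRmePercent` and `verdict` are hypothetical names, the interval-overlap test is one plausible reading of "noise floor", and higher normalized values are assumed to mean higher throughput. The t critical values are the standard two-sided 95% values for 1 to 4 degrees of freedom.

```javascript
// Two-sided 95% Student's t critical values, keyed by degrees of freedom.
const T_CRITICAL_95 = { 1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776 };

// Relative margin of error (percent) from the spread between repeated runs.
function betweenRunRmePercent(samples) {
  const count = samples.length;
  const t = T_CRITICAL_95[count - 1];
  if (t === undefined) throw new Error("this sketch supports 2 to 5 runs");
  const mean = samples.reduce((sum, value) => sum + value, 0) / count;
  const variance =
    samples.reduce((sum, value) => sum + (value - mean) ** 2, 0) / (count - 1);
  const margin = t * Math.sqrt(variance / count);
  return (margin / mean) * 100;
}

// Gate 1 (statistical): overlapping intervals are reported as noise.
// Gate 2 (practical): a real delta must also exceed the threshold.
function verdict(baseline, current, thresholdPercent = 5) {
  const baselineMargin = (baseline.normalized * baseline.rmePercent) / 100;
  const currentMargin = (current.normalized * current.rmePercent) / 100;
  const delta = current.normalized - baseline.normalized;
  if (Math.abs(delta) <= baselineMargin + currentMargin) return "~noise";
  const deltaPercent = (delta / baseline.normalized) * 100;
  if (Math.abs(deltaPercent) <= thresholdPercent) return "within threshold";
  return deltaPercent < 0 ? "REGRESSION" : "improved";
}

const baseline = { normalized: 1.0, rmePercent: 1 };
const verdicts = {
  bigDrop: verdict(baseline, { normalized: 0.7, rmePercent: 1 }),
  smallDrop: verdict(baseline, { normalized: 0.97, rmePercent: 1 }),
  bigGain: verdict(baseline, { normalized: 1.3, rmePercent: 1 }),
  noisy: verdict(
    { normalized: 1.0, rmePercent: 20 },
    { normalized: 0.85, rmePercent: 20 },
  ),
};
const spreadOfThreeRuns = betweenRunRmePercent([0.9, 1.0, 1.1]);
```

With these made-up inputs, a 30% drop is labeled REGRESSION, a 3% drop with tight intervals is "within threshold", and a 15% drop with ±20% intervals is "~noise". Three runs spread ±10% around the mean give a margin of roughly ±25%, which shows how wide a three-run t interval is.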
aggregateRuns defined a loop-local avg closure and a second module-level avg2 doing the same thing; one module-level average() now serves both sites. Also drops the missing-task filter in the per-run aggregation — collectResults aborts the process on any task failure, so every run is guaranteed to contain every task. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
rme→relativeMarginOfErrorPercent, hz→operationsPerSecond, p99→percentile99, fn/opts→benchmarkFunction/options, CliArgs→CommandLineArguments, lo/hi→lowerBound/upperBound, config knobs gain units (_MILLISECONDS/_PERCENT), scenario files rename to *.benchmark.ts. Field renames reach baseline.json and --json output, so the baseline schema bumps to v3 and the committed baseline is regenerated. External names that mirror third-party APIs (tinybench result fields, DynamoDB pk/sk) are kept at their access sites.
…rf/audit-findings # Conflicts: # test/fixtures/mock-client.ts
✅ Deploy Preview for electrodb-dev canceled.
No description provided.