fix(layout): don't panic collecting an empty stream #8472

Open
miniex wants to merge 2 commits into vortex-data:develop from miniex:fix/empty-struct-write-panic

Conversation

@miniex miniex commented Jun 17, 2026

Contributor

Summary

`CollectStrategy` collects its whole input into a single chunk for a child that requires exactly one chunk. When the input stream is empty (for example, when writing a zero-row table with a nullable struct column whose validity substream is empty), no chunk supplies a sequence id, and the strategy panicked with "must have visited at least one chunk". `CollectStrategy` now yields nothing on empty input, and `FlatLayoutStrategy` returns an empty `ChunkedLayout` instead of requiring a single chunk, so no segment is written for an empty array.
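To make the failure mode concrete, here is a minimal, self-contained sketch of the collect step. It is a simplified model and not the Vortex implementation: `Chunk`, `seq_id`, and `collect_all` are hypothetical names, and reusing the last visited chunk's id is an assumption made only for illustration.

```rust
/// Hypothetical stand-in for a chunk tagged with a sequence id (not a Vortex type).
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    seq_id: u64,
    rows: Vec<i64>,
}

/// Collects every input chunk into a single chunk.
/// Returns `None` on empty input instead of panicking.
fn collect_all(input: impl IntoIterator<Item = Chunk>) -> Option<Chunk> {
    // Assumption: the collected chunk reuses the id of the last chunk visited.
    let mut last_id: Option<u64> = None;
    let mut rows = Vec::new();
    for chunk in input {
        last_id = Some(chunk.seq_id);
        rows.extend(chunk.rows);
    }
    // The panicking variant would be the equivalent of
    // `last_id.expect("must have visited at least one chunk")`.
    // Mapping over the option yields nothing when no chunk was seen.
    last_id.map(|seq_id| Chunk { seq_id, rows })
}
```

With this shape, an empty input produces `None`, so downstream code has to handle the "no chunk at all" case explicitly rather than receiving a synthetic empty chunk.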

The issue mentions that the fuzzer's `assume()` guard could be dropped once this is fixed. I left it in place here: reading back a nullable struct nested in a struct is a separate bug (#8348), so the round-trip only works once both are fixed.

Closes: #8347

Testing

Added a regression test in `vortex-file/tests/test_write_table.rs` that writes a zero-row nullable struct column; it panicked before this change and now writes and reads back as zero rows. The issue's Python repro (`vx.io.write` of an empty struct column) no longer panics. `cargo nextest run -p vortex-layout -p vortex-file` passes (177 tests, including the segment-ordering tests). `cargo fmt --all` and `cargo clippy --all-targets --all-features` are clean.


I'm Korean, so sorry if any wording reads a little awkward.

@miniex miniex requested a review from a team June 17, 2026 14:37
@joseph-isaacs joseph-isaacs requested a review from onursatici June 17, 2026 14:53

codspeed-hq Bot commented Jun 17, 2026

Merging this PR will degrade performance by 25.57%

⚡ 3 improved benchmarks
❌ 13 regressed benchmarks
✅ 1493 untouched benchmarks
⏩ 83 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | slice_empty_vortex | 368.3 ns | 2,628.6 ns | -85.99% |
| Simulation | slice_vortex_buffer[1024] | 871.4 ns | 1,335 ns | -34.73% |
| Simulation | slice_vortex_buffer[16384] | 871.4 ns | 1,335 ns | -34.73% |
| Simulation | slice_vortex_buffer[2048] | 871.4 ns | 1,335 ns | -34.73% |
| Simulation | slice_vortex_buffer[128] | 871.4 ns | 1,335 ns | -34.73% |
| Simulation | slice_vortex_buffer[65536] | 871.4 ns | 1,335 ns | -34.73% |
| Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 162.2 µs | 198 µs | -18.08% |
| Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 177.7 µs | 213.5 µs | -16.78% |
| Simulation | search_index_below_min_chunked | 1.3 ms | 1.5 ms | -13.61% |
| Simulation | search_index_mixed_out_of_range_chunked | 1.3 ms | 1.5 ms | -13.31% |
| Simulation | count_i32_clustered_nulls | 47 µs | 53.8 µs | -12.68% |
| Simulation | search_index_full_range_random_chunked | 1.4 ms | 1.6 ms | -12.06% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 273.2 µs | 307.9 µs | -11.27% |
| Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] | 229.1 µs | 192.7 µs | +18.86% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 186.1 ns | +15.67% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 275.6 ns | 246.4 ns | +11.84% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing miniex:fix/empty-struct-write-panic (174774e) with develop (85aad72) [2]

Footnotes

  1. 83 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on develop (0ed06b3) during the generation of this report, so 85aad72 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@miniex miniex marked this pull request as draft June 17, 2026 22:40
@miniex miniex marked this pull request as ready for review June 17, 2026 22:44

@onursatici onursatici left a comment

Contributor

Thanks for the contribution!
This seems correct, but I think we should fix this at a lower level.
We generally shouldn't use eof unless we intend the chunk to go towards the end of the file. So although minting the id from eof makes it possible to write an empty chunk, I think the right behaviour for the collect strategy is to not yield anything on empty input.
If you make only that change, the empty stream propagates to the children and the flat layout then fails. I think that when the flat layout receives no chunk, it should return an empty ChunkedLayout, which allows no segments to exist at all, unlike a flat layout, which enforces one segment. That way we avoid writing a segment just to store an empty array.

So the shape would be:

  • CollectStrategy: empty input yields no chunks
  • FlatLayoutStrategy: empty input returns a zero-row, zero-child ChunkedLayout
  • non-empty flat input keeps the existing exactly-one-chunk behavior
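The shape above can be sketched as a small dispatch on the number of incoming chunks. This is a hedged model, not the Vortex API: the `Layout` enum, `write_flat`, and the segment counter are hypothetical, and treating more than one chunk as an error is an assumption based on the "exactly-one-chunk" wording.

```rust
/// Hypothetical model of the two layout shapes discussed above (not Vortex types).
#[derive(Debug, PartialEq)]
enum Layout {
    /// Exactly one segment holding `row_count` rows.
    Flat { segment_id: u32, row_count: u64 },
    /// Any number of children; zero children means no segments exist at all.
    Chunked { children: Vec<Layout>, row_count: u64 },
}

/// `chunks` holds the row count of each incoming chunk.
/// `next_segment` counts how many segments have been written so far.
fn write_flat(chunks: &[u64], next_segment: &mut u32) -> Result<Layout, String> {
    match chunks {
        // Empty input: zero-row, zero-child chunked layout, no segment written.
        [] => Ok(Layout::Chunked { children: Vec::new(), row_count: 0 }),
        // Non-empty input keeps the exactly-one-chunk path and writes one segment.
        [rows] => {
            let segment_id = *next_segment;
            *next_segment += 1;
            Ok(Layout::Flat { segment_id, row_count: *rows })
        }
        // Assumption: more than one chunk is rejected by the flat strategy.
        _ => Err(format!("flat layout expects at most one chunk, got {}", chunks.len())),
    }
}
```

Calling `write_flat(&[], &mut counter)` leaves the counter untouched, which is the point of the suggestion: an empty array costs no segment.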

miniex added 2 commits June 19, 2026 09:12
`CollectStrategy` panicked when its input stream was empty (writing a zero-row
nullable struct column) because no chunk supplied a sequence id. mint the
collected chunk's id from `eof` instead.

Closes vortex-data#8347

Signed-off-by: Han Damin <miniex@daminstudio.net>

instead of minting a sequence id from `eof`, `CollectStrategy` now yields nothing
on empty input and `FlatLayoutStrategy` returns an empty `ChunkedLayout`, so no
segment is written for an empty array.

Signed-off-by: Han Damin <miniex@daminstudio.net>
@miniex miniex force-pushed the fix/empty-struct-write-panic branch from 174774e to 1d1792b Compare June 19, 2026 00:15
miniex commented Jun 19, 2026

Contributor Author

Thanks for the steer - done.

CollectStrategy now yields nothing on empty input, and FlatLayoutStrategy returns an empty ChunkedLayout instead of requiring a single chunk, so no segment is written for an empty array. Dropped the eof minting as you suggested. Non-empty flat input keeps the exactly-one-chunk path.

Verified the empty file still opens and scans back to zero rows; full vortex-layout + vortex-file suites green.

@miniex miniex requested a review from onursatici June 19, 2026 00:16

Development

Successfully merging this pull request may close these issues.

Writing an empty table with a top-level struct column panics: "must have visited at least one chunk"

2 participants