fix(layout): don't panic collecting an empty stream#8472
Conversation
Merging this PR will degrade performance by 25.57%
Warning Please fix the performance issues or acknowledge them on CodSpeed. Performance Changes
Tip Investigate this regression by commenting Comparing Footnotes
|
There was a problem hiding this comment.
Thanks for the contribution!
This seems correct, but I think we should fix this on a lower level.
We generally shouldn't use eof if we do not intend the chunk to go towards the end of file, so although it makes it possible to write an empty chunk, I think the right behaviour of collect strategy is to not yield anything on empty input.
If you just make that change the empty stream would propagate to the children and the flat layout would then fail. I think in the flat layout if there is no chunk we should return an empty ChunkedLayout which allows for no segments to exist at all, unlike flat layout which enforces one segment. If we do this we can save from writing a segment just to store an empty array
So the shape would be:
CollectStrategy: empty input yields no chunksFlatLayoutStrategy: empty input returns a zero-row, zero-childChunkedLayout- non-empty flat input keeps the existing exactly-one-chunk behavior
`CollectStrategy` panicked when its input stream was empty (writing a zero-row nullable struct column) because no chunk supplied a sequence id. mint the collected chunk's id from `eof` instead. Closes vortex-data#8347 Signed-off-by: Han Damin <miniex@daminstudio.net>
instead of minting a sequence id from `eof`, `CollectStrategy` now yields nothing on empty input and `FlatLayoutStrategy` returns an empty `ChunkedLayout`, so no segment is written for an empty array. Signed-off-by: Han Damin <miniex@daminstudio.net>
174774e to
1d1792b
Compare
|
Thanks for the steer - done.
Verified the empty file still opens and scans back to zero rows; full |
Summary
CollectStrategycollects its whole input into a single chunk for a child that requires exactly one chunk. When the input stream is empty, e.g. writing a zero-row table with a nullable struct column whose validity substream is empty, no chunk supplied a sequence id and it panicked withmust have visited at least one chunk.CollectStrategynow yields nothing on empty input, andFlatLayoutStrategyreturns an emptyChunkedLayoutinstead of requiring a single chunk, so no segment is written for an empty array.The issue mentions the fuzzer's
assume()guard could be dropped once this is fixed. I left it in place here: reading a nullable struct nested in a struct back is a separate bug (#8348), so the round-trip only works once both are fixed.Closes: #8347
Testing
Added a regression test in
vortex-file/tests/test_write_table.rsthat writes a zero-row nullable struct column; it panicked before this change and now writes and reads back as zero rows. The issue's Python repro (vx.io.writeof an emptystructcolumn) no longer panics.cargo nextest run -p vortex-layout -p vortex-filepasses (177 tests, including the segment-ordering tests).fmt --all+clippy --all-targets --all-featuresclean.I'm Korean, so sorry if any wording reads a little awkward.