Describe the bug, including details regarding any error messages, version, and platform.
ListVector and LargeListVector can expose an invalid offset buffer state when valueCount == 0.
For an empty list vector, the logical offset buffer should still contain the leading offset entry:
- ListVector: (valueCount + 1) * 4 == 4 bytes
- LargeListVector: (valueCount + 1) * 8 == 8 bytes
However, in the empty-vector path, the offset buffer can have:
readerIndex: 0
writerIndex: 4
capacity: 0
or the equivalent writerIndex: 8, capacity: 0 for LargeListVector.
This violates the normal buffer invariant:
0 <= readerIndex <= writerIndex <= capacity
Downstream consumers that unwrap or serialize the Arrow buffer through Netty can then fail with:
IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
(expected: 0 <= readerIndex <= writerIndex <= capacity(0))
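As a minimal sketch of why the reported state is rejected, the invariant above can be written as a plain predicate. The class and method names here are hypothetical and only illustrate the check; this is not Netty's or Arrow's actual validation code.

```java
/** Illustrative only: mirrors the index invariant quoted above, not Netty's or Arrow's code. */
final class BufferIndexCheck {
    private BufferIndexCheck() {}

    /** Returns true when 0 <= readerIndex <= writerIndex <= capacity. */
    static boolean isValid(long readerIndex, long writerIndex, long capacity) {
        return 0 <= readerIndex && readerIndex <= writerIndex && writerIndex <= capacity;
    }

    public static void main(String[] args) {
        // State reported for an empty ListVector offset buffer.
        System.out.println(isValid(0, 4, 0)); // false
        // Equivalent state reported for an empty LargeListVector offset buffer.
        System.out.println(isValid(0, 8, 0)); // false
        // State this issue expects: capacity covers the single 4-byte offset slot.
        System.out.println(isValid(0, 4, 4)); // true
    }
}
```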
The issue is that setReaderAndWriterIndex() sets the offset buffer writer index based on valueCount * OFFSET_WIDTH, which is 0 for empty vectors. But list vectors still require one offset slot even when there are no values.
The same issue applies to both:
org.apache.arrow.vector.complex.ListVector
org.apache.arrow.vector.complex.LargeListVector
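The difference between the two formulas described above can be shown with plain arithmetic. This is a sketch with hypothetical helper names; the constants assume the 4-byte and 8-byte offset widths stated earlier in this report.

```java
/** Illustrative arithmetic only; helper names are hypothetical, not Arrow APIs. */
final class OffsetWriterIndex {
    static final int LIST_OFFSET_WIDTH = 4;       // ListVector offsets are 4 bytes
    static final int LARGE_LIST_OFFSET_WIDTH = 8; // LargeListVector offsets are 8 bytes

    private OffsetWriterIndex() {}

    /** valueCount * OFFSET_WIDTH: leaves no room for the leading offset. */
    static long withoutLeadingSlot(int valueCount, int offsetWidth) {
        return (long) valueCount * offsetWidth;
    }

    /** (valueCount + 1) * OFFSET_WIDTH: always includes the leading offset. */
    static long withLeadingSlot(int valueCount, int offsetWidth) {
        return ((long) valueCount + 1) * offsetWidth;
    }

    public static void main(String[] args) {
        System.out.println(withoutLeadingSlot(0, LIST_OFFSET_WIDTH));    // 0
        System.out.println(withLeadingSlot(0, LIST_OFFSET_WIDTH));       // 4
        System.out.println(withLeadingSlot(0, LARGE_LIST_OFFSET_WIDTH)); // 8
    }
}
```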
Expected behavior
For valueCount == 0, the offset buffer should still have enough capacity and readable bytes for the leading zero offset:
(valueCount + 1) * OFFSET_WIDTH
So:
- empty ListVector should expose at least 4 bytes for offset [0]
- empty LargeListVector should expose at least 8 bytes for offset [0]
The first offset value should be zero.
Actual behavior
An empty list vector can expose an offset buffer with a non-zero writer index but zero capacity, causing Netty buffer validation to fail when the buffer is unwrapped or consumed.
Suggested fix
Update ListVector.setReaderAndWriterIndex() and LargeListVector.setReaderAndWriterIndex() so the offset buffer writer index is based on:
(valueCount + 1) * OFFSET_WIDTH
For the valueCount == 0 case, ensure the offset buffer has enough capacity for the leading zero offset before setting the writer index.
Care should be taken not to shrink the vector's future offset allocation size when allocating this empty sentinel offset buffer.
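The suggested fix could look roughly like the following. This is a self-contained model, not a patch against Arrow: the class, its fields, and the use of java.nio.ByteBuffer in place of ArrowBuf are all assumptions made for illustration. It shows the two points above: the writer index uses (valueCount + 1) * OFFSET_WIDTH, and the sentinel allocation leaves the future allocation size untouched.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a list vector's offset state; not Arrow's actual classes. */
final class EmptyListOffsetSketch {
    private final int offsetWidth;            // 4 for ListVector, 8 for LargeListVector
    private final long allocationSizeHint;    // size intended for future offset allocations
    private ByteBuffer offsetBuffer = ByteBuffer.allocate(0); // empty vector: capacity 0
    private int writerIndex;

    EmptyListOffsetSketch(int offsetWidth, long allocationSizeHint) {
        this.offsetWidth = offsetWidth;
        this.allocationSizeHint = allocationSizeHint;
    }

    /**
     * Models only the offset-buffer part of the fix. Non-empty vectors are
     * assumed to have allocated their offsets already.
     */
    void setReaderAndWriterIndex(int valueCount) {
        int required = (valueCount + 1) * offsetWidth;
        if (valueCount == 0 && offsetBuffer.capacity() < required) {
            // Allocate just the sentinel slot. ByteBuffer.allocate zero-fills,
            // so offset [0] reads as zero. The allocation size hint is not modified.
            offsetBuffer = ByteBuffer.allocate(required);
        }
        writerIndex = required;
    }

    int writerIndex() { return writerIndex; }
    int capacity() { return offsetBuffer.capacity(); }
    long allocationSizeHint() { return allocationSizeHint; }

    /** Reads offset [0] using the vector's offset width. */
    long firstOffset() {
        return offsetWidth == 4 ? offsetBuffer.getInt(0) : offsetBuffer.getLong(0);
    }
}
```

With this model, calling setReaderAndWriterIndex(0) on an instance built with width 4 leaves the writer index at 4 with a capacity of at least 4, so the invariant 0 <= readerIndex <= writerIndex <= capacity holds.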
Additional context
This was observed downstream in Dremio after upgrading Arrow Java. The failure occurred while sending a record batch containing an empty list vector, where the send path unwraps Arrow buffers through Netty.
The downstream error was:
SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
(expected: 0 <= readerIndex <= writerIndex <= capacity(0))
This issue is distinct from #1125. That issue involves UnionListReader.setPosition on a post-IPC empty list. This issue is about the offset buffer exported by empty ListVector / LargeListVector instances having an invalid writer-index/capacity relationship.