Empty ListVector and LargeListVector can expose offset buffers with writerIndex greater than capacity #1194

Description

@prashanthbdremio

Describe the bug, including details regarding any error messages, version, and platform.

ListVector and LargeListVector can expose an invalid offset buffer state when valueCount == 0.

For an empty list vector, the logical offset buffer should still contain the leading offset entry:

  • ListVector: (valueCount + 1) * 4 == 4 bytes
  • LargeListVector: (valueCount + 1) * 8 == 8 bytes
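The sizing rule above can be sketched in plain Java. This is only an illustration, not Arrow code: `OffsetSizing` and `requiredOffsetBytes` are hypothetical names, and the widths 4 and 8 are taken from the two bullets above.

```java
/** Hypothetical helper, not part of Arrow: models the offset-buffer sizing rule. */
final class OffsetSizing {
  static final int LIST_OFFSET_WIDTH = 4;       // ListVector offset width, per the issue text
  static final int LARGE_LIST_OFFSET_WIDTH = 8; // LargeListVector offset width, per the issue text

  /** A vector holding n lists needs n + 1 offsets, so an empty vector still needs one slot. */
  static long requiredOffsetBytes(int valueCount, int offsetWidth) {
    if (valueCount < 0) {
      throw new IllegalArgumentException("valueCount must be >= 0");
    }
    // Widen to long before multiplying to avoid int overflow for large counts.
    return ((long) valueCount + 1) * offsetWidth;
  }
}
```

With `valueCount == 0`, this yields 4 bytes for the 4-byte width and 8 bytes for the 8-byte width, matching the sizes listed above.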

However, in the empty-vector path, the offset buffer can have:

readerIndex: 0
writerIndex: 4
capacity: 0

or the equivalent writerIndex: 8, capacity: 0 for LargeListVector.

This violates the normal buffer invariant:

0 <= readerIndex <= writerIndex <= capacity

Downstream consumers that unwrap or serialize the Arrow buffer through Netty can then fail with:

IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
(expected: 0 <= readerIndex <= writerIndex <= capacity(0))
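To show why this state is rejected, here is a minimal stand-in for the bounds check, written in plain Java. It is not Netty's implementation: `BufferInvariant` and `checkIndexBounds` are hypothetical names, and the message format simply mirrors the error quoted above.

```java
/** Hypothetical stand-in for the index check described in this issue; not Netty's code. */
final class BufferInvariant {
  /** Enforces 0 <= readerIndex <= writerIndex <= capacity. */
  static void checkIndexBounds(long readerIndex, long writerIndex, long capacity) {
    if (readerIndex < 0 || readerIndex > writerIndex || writerIndex > capacity) {
      throw new IndexOutOfBoundsException(String.format(
          "readerIndex: %d, writerIndex: %d "
              + "(expected: 0 <= readerIndex <= writerIndex <= capacity(%d))",
          readerIndex, writerIndex, capacity));
    }
  }
}
```

Calling `checkIndexBounds(0, 4, 0)` throws, because the writer index exceeds the capacity. `checkIndexBounds(0, 4, 4)` passes, which is the state an empty ListVector should expose.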

The issue is that setReaderAndWriterIndex() sets the offset buffer writer index based on valueCount * OFFSET_WIDTH, which is 0 for empty vectors. But list vectors still require one offset slot even when there are no values.

The same issue applies to both:

  • org.apache.arrow.vector.complex.ListVector
  • org.apache.arrow.vector.complex.LargeListVector

Expected behavior

For valueCount == 0, the offset buffer should still have enough capacity and readable bytes for the leading zero offset:

(valueCount + 1) * OFFSET_WIDTH

So:

  • empty ListVector should expose at least 4 bytes for offset [0]
  • empty LargeListVector should expose at least 8 bytes for offset [0]

The first offset value should be zero.

Actual behavior

An empty list vector can expose an offset buffer with a non-zero writer index but zero capacity, causing Netty buffer validation to fail when the buffer is unwrapped or consumed.

Suggested fix

Update ListVector.setReaderAndWriterIndex() and LargeListVector.setReaderAndWriterIndex() so the offset buffer writer index is based on:

(valueCount + 1) * OFFSET_WIDTH

For the valueCount == 0 case, ensure the offset buffer has enough capacity for the leading zero offset before setting the writer index.

Care should be taken not to shrink the vector's future offset allocation size when allocating this empty sentinel offset buffer.
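The suggested fix can be sketched against a simplified model rather than the real classes. Everything here is an assumption made for illustration: `OffsetBufferModel` is hypothetical, a `byte[]` stands in for the offset buffer, and `allocationSizeHint` stands in for whatever field the vectors use to size future offset allocations.

```java
/** Hypothetical model of a list vector's offset state; names do not match Arrow internals. */
final class OffsetBufferModel {
  private final int offsetWidth;         // 4 for ListVector, 8 for LargeListVector
  private final long allocationSizeHint; // size used for future allocations; never shrunk here
  private byte[] offsets = new byte[0];  // stands in for the offset buffer; length == capacity
  private long writerIndex = 0;
  private int valueCount = 0;

  OffsetBufferModel(int offsetWidth, long allocationSizeHint) {
    this.offsetWidth = offsetWidth;
    this.allocationSizeHint = allocationSizeHint;
  }

  void setValueCount(int valueCount) {
    this.valueCount = valueCount;
  }

  /** Sets the writer index to (valueCount + 1) * offsetWidth, growing capacity first if needed. */
  void setReaderAndWriterIndex() {
    long required = ((long) valueCount + 1) * offsetWidth;
    if (offsets.length < required) {
      // Grow only the buffer itself. copyOf keeps existing bytes and zero-fills
      // the new ones, so for an empty vector offset [0] reads as zero.
      offsets = java.util.Arrays.copyOf(offsets, Math.toIntExact(required));
    }
    writerIndex = required;
  }

  long writerIndex() { return writerIndex; }
  long capacity() { return offsets.length; }
  long allocationSizeHint() { return allocationSizeHint; }

  /** True when the buffer holds at least one offset slot and its bytes are all zero. */
  boolean firstOffsetIsZero() {
    if (offsets.length < offsetWidth) {
      return false;
    }
    for (int i = 0; i < offsetWidth; i++) {
      if (offsets[i] != 0) {
        return false;
      }
    }
    return true;
  }
}
```

In this model the capacity is raised before the writer index is set, so the invariant holds at every step. The allocation hint is deliberately left untouched, reflecting the caution above about not shrinking future offset allocations.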

Additional context

This was observed downstream in Dremio after upgrading Arrow Java. The failure occurred while sending a record batch containing an empty list vector, where the send path unwraps Arrow buffers through Netty.

The downstream error was:

SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
(expected: 0 <= readerIndex <= writerIndex <= capacity(0))

This issue is distinct from #1125. That issue involves UnionListReader.setPosition on a post-IPC empty list. This issue is about the offset buffer exported by empty ListVector / LargeListVector instances having an invalid writer-index/capacity relationship.
