INTERVAL filter pushdown unsupported: PyArrow has no equal kernel for any interval type #35

@rustyconover

Summary

PyArrow has no pc.equal (or not_equal, greater, etc.) kernel for any
of the interval types (month_day_nano_interval, day_time_interval,
month_interval). That means a filter pushdown like:

SELECT * FROM example.echo((SELECT INTERVAL 1 DAY AS i)) WHERE i = INTERVAL 1 DAY;

…cannot be evaluated on the worker side regardless of any extension-type
unwrapping. There is simply no kernel to dispatch to.

Reproduction

Pinned in
tests/test_filter_pushdown_extension.py::test_pyarrow_interval_kernel_gap:

import pyarrow as pa
import pyarrow.compute as pc

# Compare an interval array against an interval scalar of the same type.
arr = pa.array([(1, 1, 1000), (2, 2, 2000)], type=pa.month_day_nano_interval())
val = pa.scalar((1, 1, 1000), type=pa.month_day_nano_interval())
pc.equal(arr, val)
# Raises pyarrow.lib.ArrowNotImplementedError:
#   Function 'equal' has no kernel matching input types
#   (month_day_nano_interval, month_day_nano_interval)

Two paths forward

  1. Refuse INTERVAL filters at serialisation in the DuckDB extension.
    Have vgi/src/vgi_table_function_impl.cpp's FilterSerializer skip
    any filter whose value type is INTERVAL, so DuckDB falls back to
    evaluating the predicate after fetching. Smallest, least surprising
    change; cost is no pushdown for INTERVAL.

  2. Implement a custom comparison in ConstantFilter.evaluate for
    interval types: split into (months, days, nanos) and compare
    field-by-field. Requires care for equality as well as ordered
    comparisons, because two intervals like 30 days and 1 month are
    conventionally equal but their fields aren't.

(1) is probably the right call. INTERVAL pushdown is rare in practice and
the field-comparison semantics are murky enough that doing it right needs
explicit thought.
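
To make the murkiness in (2) concrete, here is a minimal pure-Python sketch of a normalising comparison. It assumes the "1 month == 30 days, 1 day == 24 hours" convention; whether that matches DuckDB's interval comparison exactly would need to be verified before relying on it. The names Interval, normalized_nanos and compare_intervals are hypothetical and do not exist in the codebase or in PyArrow.

```python
import operator
from typing import Callable, NamedTuple

NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000
DAYS_PER_MONTH = 30  # assumption: the "1 month == 30 days" convention


class Interval(NamedTuple):
    """Hypothetical stand-in for a (months, days, nanos) interval value."""
    months: int
    days: int
    nanos: int


def normalized_nanos(iv: Interval) -> int:
    """Collapse an interval to one nanosecond count under the assumed convention."""
    total_days = iv.months * DAYS_PER_MONTH + iv.days
    return total_days * NANOS_PER_DAY + iv.nanos


def compare_intervals(op: Callable[[int, int], bool],
                      left: Interval, right: Interval) -> bool:
    """Apply a comparison operator to the normalised values, not the raw fields."""
    return op(normalized_nanos(left), normalized_nanos(right))


one_month = Interval(months=1, days=0, nanos=0)
thirty_days = Interval(months=0, days=30, nanos=0)

# Field-by-field (tuple) equality says the two intervals differ...
fieldwise_equal = one_month == thirty_days
# ...while the normalised comparison treats them as equal.
normalized_equal = compare_intervals(operator.eq, one_month, thirty_days)
```

A plain field-by-field comparison and the normalised one disagree on this pair, which is the semantic choice any implementation of (2) would have to make explicitly.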

Coverage

test_pyarrow_interval_kernel_gap documents the gap so we notice if a
future PyArrow release fills it.

Priority

Low — INTERVAL filter pushdown is uncommon in real queries. The current
behaviour (worker raises, DuckDB falls back) is suboptimal but not silently
wrong; a clean refusal at serialisation would make the fallback explicit.

Labels: enhancement
