Summary
PyArrow has no pc.equal (or not_equal, greater, etc.) kernel for any
of the interval types (month_day_nano_interval, day_time_interval,
month_interval). That means a filter pushdown like:
SELECT * FROM example.echo((SELECT INTERVAL 1 DAY AS i)) WHERE i = INTERVAL 1 DAY;
…cannot be evaluated on the worker side regardless of any extension-type
unwrapping. There is simply no kernel to dispatch to.
Reproduction
Pinned in
tests/test_filter_pushdown_extension.py::test_pyarrow_interval_kernel_gap:
import pyarrow as pa, pyarrow.compute as pc
arr = pa.array([(1, 1, 1000), (2, 2, 2000)], type=pa.month_day_nano_interval())
val = pa.scalar((1, 1, 1000), type=pa.month_day_nano_interval())
pc.equal(arr, val)
# pyarrow.lib.ArrowNotImplementedError:
# Function 'equal' has no kernel matching input types
# (month_day_nano_interval, month_day_nano_interval)
Two paths forward
1. Refuse INTERVAL filters at serialisation in the DuckDB extension.
   Have vgi/src/vgi_table_function_impl.cpp's FilterSerializer skip
   any filter whose value type is INTERVAL, so DuckDB falls back to
   evaluating the predicate after fetching. Smallest, least surprising
   change; cost is no pushdown for INTERVAL.
2. Implement a custom comparison in ConstantFilter.evaluate for
   interval types: split into (months, days, nanos) and compare
   field-by-field. Requires care for ordered comparisons because two
   intervals like 30 days and 1 month are conventionally equal but
   their fields aren't.
(1) is probably the right call. INTERVAL pushdown is rare in practice and
the field-comparison semantics are murky enough that doing it right needs
explicit thought.
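To make the murkiness of the second path concrete, here is a minimal pure-Python sketch of a normalising comparison. It is not the existing ConstantFilter.evaluate code. The names Interval, interval_key, intervals_equal and interval_less are hypothetical, and the convention used (1 month = 30 days, 1 day = 24 hours) is an assumption that would need checking against DuckDB's actual INTERVAL comparison rules before anything like this is relied on.

```python
from typing import NamedTuple

# Assumed normalisation constants; verify against DuckDB before relying on them.
DAYS_PER_MONTH = 30
NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000


class Interval(NamedTuple):
    """Hypothetical stand-in for a (months, days, nanos) interval value."""
    months: int
    days: int
    nanos: int


def interval_key(iv: Interval) -> int:
    """Collapse the three fields into one total in nanoseconds.

    Python integers are arbitrary precision, so this cannot overflow.
    """
    total_days = iv.months * DAYS_PER_MONTH + iv.days
    return total_days * NANOS_PER_DAY + iv.nanos


def intervals_equal(a: Interval, b: Interval) -> bool:
    # Compare normalised totals rather than the raw fields.
    return interval_key(a) == interval_key(b)


def interval_less(a: Interval, b: Interval) -> bool:
    return interval_key(a) < interval_key(b)


one_month = Interval(months=1, days=0, nanos=0)
thirty_days = Interval(months=0, days=30, nanos=0)

# The raw field tuples differ, but the normalised values match.
fields_match = one_month == thirty_days
normalised_match = intervals_equal(one_month, thirty_days)
```

Under the assumed convention, fields_match is False while normalised_match is True, which is exactly the case where a naive field-by-field comparison would give a different answer from the normalised one.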
Coverage
test_pyarrow_interval_kernel_gap documents the gap so we notice if a
future PyArrow release fills it.
Priority
Low — INTERVAL filter pushdown is uncommon in real queries. The current
behaviour (worker raises, DuckDB falls back) is suboptimal but not silently
wrong; a clean refusal at serialisation would make the fallback explicit.