Symptom
Filter pushdown on TIME WITH TIME ZONE columns returns rows that
DuckDB-native evaluation does not match. Specifically, the worker
treats two TIMETZ values as equal whenever their absolute UTC instant
matches (e.g. 12:00:00+00 and 13:00:00+01), while DuckDB-native
compares them as distinct because their (wall, offset) tuples differ.
Reproduction
ATTACH 'example' AS example (TYPE vgi, LOCATION '${VGI_TEST_WORKER}');
-- DuckDB-native: 1 row (12:00+00 only)
SELECT n FROM (VALUES
(1, '12:00:00+00'::TIMETZ),
(2, '13:00:00+01'::TIMETZ),
(3, '12:00:00+02'::TIMETZ)
) t(n, t)
WHERE t = '12:00:00+00'::TIMETZ;
-- Through example.echo (pushdown): 2 rows (12:00+00 AND 13:00+01)
SELECT n FROM example.echo((SELECT * FROM (VALUES
(1, '12:00:00+00'::TIMETZ),
(2, '13:00:00+01'::TIMETZ),
(3, '12:00:00+02'::TIMETZ)
) v(n, t)))
WHERE t = '12:00:00+00'::TIMETZ;
Root cause
DuckDB.time_tz is registered as extension<DuckDB.time_tz> storage type
fixed_size_binary(8) (see
vgi/duckdb/src/common/arrow/arrow_type_extension.cpp:561).
After the new normalisation in
vgi-python/vgi/table_filter_pushdown.py
_normalize_for_compare() strips the wrapper to fixed_size_binary(8) on
both sides, then PyArrow does byte-wise equality. That's almost right —
but DuckDB's TIMETZ storage layout encodes the absolute UTC instant in the
low 40 bits and the original offset in the high 24 bits, so two TIMETZ
values that represent the same UTC instant should differ in the high
bits if they had different originating offsets. The fact that the worker
finds them equal suggests DuckDB is normalising the bytes to UTC before
serialisation (so the offset bytes are identical), and the worker's
byte-equality is then comparing UTC instants only.
So fixing this requires either:
- The worker to do a TIMETZ-aware comparison that splits the 8 bytes into
(wall, offset) and compares both fields.
- The DuckDB extension to serialise the original
(wall, offset)
uninterpreted (so byte-equality matches DuckDB-native semantics).
Option (1) is the right one — the wire format should be canonical UTC for
ordering/range queries to make sense; the worker just needs to know that
= on TIMETZ is tuple-equality, not instant-equality.
Coverage gap
vgi/test/sql/integration/filter_pushdown/time_tz.test
currently scopes itself to same-offset comparisons to keep the kernel-pair
fix pinned without surfacing this divergence. Adding cross-offset cases
once this is resolved would close the gap.
Priority
Medium — only affects users who write WHERE timetz_col = literal across
different time zone offsets, which is uncommon, but silent wrong-row
results on a comparison are worth fixing.
Symptom
Filter pushdown on
TIME WITH TIME ZONEcolumns returns rows thatDuckDB-native evaluation does not match. Specifically, the worker
treats two TIMETZ values as equal whenever their absolute UTC instant
matches (e.g.
12:00:00+00and13:00:00+01), while DuckDB-nativecompares them as distinct because their
(wall, offset)tuples differ.Reproduction
Root cause
DuckDB.time_tzis registered asextension<DuckDB.time_tz>storage typefixed_size_binary(8)(seevgi/duckdb/src/common/arrow/arrow_type_extension.cpp:561).After the new normalisation in
vgi-python/vgi/table_filter_pushdown.py_normalize_for_compare()strips the wrapper tofixed_size_binary(8)onboth sides, then PyArrow does byte-wise equality. That's almost right —
but DuckDB's TIMETZ storage layout encodes the absolute UTC instant in the
low 40 bits and the original offset in the high 24 bits, so two TIMETZ
values that represent the same UTC instant should differ in the high
bits if they had different originating offsets. The fact that the worker
finds them equal suggests DuckDB is normalising the bytes to UTC before
serialisation (so the offset bytes are identical), and the worker's
byte-equality is then comparing UTC instants only.
So fixing this requires either:
(wall, offset)and compares both fields.(wall, offset)uninterpreted (so byte-equality matches DuckDB-native semantics).
Option (1) is the right one — the wire format should be canonical UTC for
ordering/range queries to make sense; the worker just needs to know that
=on TIMETZ is tuple-equality, not instant-equality.Coverage gap
vgi/test/sql/integration/filter_pushdown/time_tz.testcurrently scopes itself to same-offset comparisons to keep the kernel-pair
fix pinned without surfacing this divergence. Adding cross-offset cases
once this is resolved would close the gap.
Priority
Medium — only affects users who write
WHERE timetz_col = literalacrossdifferent time zone offsets, which is uncommon, but silent wrong-row
results on a comparison are worth fixing.