Skip to content

TIMETZ pushdown comparison treats values across different offsets as equal; DuckDB-native compares (wall, offset) tuple #33

@rustyconover

Description

@rustyconover

Symptom

Filter pushdown on TIME WITH TIME ZONE columns returns rows that
DuckDB-native evaluation does not match. Specifically, the worker
treats two TIMETZ values as equal whenever their absolute UTC instant
matches (e.g. 12:00:00+00 and 13:00:00+01), while DuckDB-native
compares them as distinct because their (wall, offset) tuples differ.

Reproduction

ATTACH 'example' AS example (TYPE vgi, LOCATION '${VGI_TEST_WORKER}');

-- DuckDB-native: 1 row (12:00+00 only)
SELECT n FROM (VALUES
  (1, '12:00:00+00'::TIMETZ),
  (2, '13:00:00+01'::TIMETZ),
  (3, '12:00:00+02'::TIMETZ)
) t(n, t)
WHERE t = '12:00:00+00'::TIMETZ;

-- Through example.echo (pushdown): 2 rows (12:00+00 AND 13:00+01)
SELECT n FROM example.echo((SELECT * FROM (VALUES
  (1, '12:00:00+00'::TIMETZ),
  (2, '13:00:00+01'::TIMETZ),
  (3, '12:00:00+02'::TIMETZ)
) v(n, t)))
WHERE t = '12:00:00+00'::TIMETZ;

Root cause

DuckDB.time_tz is registered as extension<DuckDB.time_tz> storage type
fixed_size_binary(8) (see
vgi/duckdb/src/common/arrow/arrow_type_extension.cpp:561).
After the new normalisation in
vgi-python/vgi/table_filter_pushdown.py
_normalize_for_compare() strips the wrapper to fixed_size_binary(8) on
both sides, then PyArrow does byte-wise equality. That's almost right —
but DuckDB's TIMETZ storage layout encodes the absolute UTC instant in the
low 40 bits and the original offset in the high 24 bits, so two TIMETZ
values that represent the same UTC instant should differ in the high
bits if they had different originating offsets. The fact that the worker
finds them equal suggests DuckDB is normalising the bytes to UTC before
serialisation (so the offset bytes are identical), and the worker's
byte-equality is then comparing UTC instants only.

So fixing this requires either:

  1. The worker to do a TIMETZ-aware comparison that splits the 8 bytes into
    (wall, offset) and compares both fields.
  2. The DuckDB extension to serialise the original (wall, offset)
    uninterpreted (so byte-equality matches DuckDB-native semantics).

Option (1) is the right one — the wire format should be canonical UTC for
ordering/range queries to make sense; the worker just needs to know that
= on TIMETZ is tuple-equality, not instant-equality.

Coverage gap

vgi/test/sql/integration/filter_pushdown/time_tz.test
currently scopes itself to same-offset comparisons to keep the kernel-pair
fix pinned without surfacing this divergence. Adding cross-offset cases
once this is resolved would close the gap.

Priority

Medium — only affects users who write WHERE timetz_col = literal across
different time zone offsets, which is uncommon, but silent wrong-row
results on a comparison are worth fixing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions