feat[gpu]: arrow device array stream support #8483

Draft
0ax1 wants to merge 23 commits into develop from ad/arrow-device-array-stream

0ax1 (Contributor) commented Jun 17, 2026

No description provided.

0ax1 added 9 commits June 17, 2026 18:46
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
This reverts commit 52952d2.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
0ax1 added the changelog/feature (A new feature) label Jun 17, 2026
codspeed-hq Bot commented Jun 17, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


⚡ 7 improved benchmarks
❌ 6 regressed benchmarks
✅ 1568 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | `take_10k_random` | 197.9 µs | 255.8 µs | -22.61% |
| Simulation | `take_10k_contiguous` | 218.5 µs | 276.3 µs | -20.93% |
| Simulation | `patched_take_10k_contiguous_patches` | 232.2 µs | 290.9 µs | -20.18% |
| Simulation | `patched_take_10k_random` | 244.2 µs | 303 µs | -19.4% |
| Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 178 µs | 213.8 µs | -16.78% |
| Simulation | `chunked_varbinview_opt_into_canonical[(1000, 10)]` | 193.4 µs | 229.6 µs | -15.78% |
| Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 34.9 µs | 20.3 µs | +71.95% |
| Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 198.5 µs | 162.2 µs | +22.35% |
| Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 214.4 µs | 178 µs | +20.48% |
| WallTime | `cuda/bitpacked_u8/unpack/3bw[100M]` | 352.6 µs | 298.7 µs | +18.04% |
| Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 308.7 µs | 273.1 µs | +13.02% |
| Simulation | `chunked_varbinview_into_canonical[(100, 100)]` | 367.7 µs | 332.8 µs | +10.48% |
| Simulation | `eq_i64_constant` | 322.6 µs | 292.8 µs | +10.21% |
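
For readers checking the table by hand: the Efficiency values look consistent with `BASE / HEAD - 1`, up to rounding of the displayed timings. That formula is inferred from the numbers in this report, not taken from CodSpeed documentation, and `efficiency_percent` is a hypothetical helper name. A minimal sketch:

```rust
/// Relative change between a baseline timing and a new timing, in percent.
/// Positive values mean HEAD is faster than BASE.
///
/// Assumption: this mirrors how the Efficiency column appears to be derived;
/// the formula is inferred from the reported numbers only.
pub fn efficiency_percent(base_us: f64, head_us: f64) -> f64 {
    (base_us / head_us - 1.0) * 100.0
}
```

Feeding in the displayed timings for `take_10k_random` (197.9 µs and 255.8 µs) gives roughly -22.6%, which matches the reported -22.61% within the precision of the rounded inputs.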

Tip

Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/arrow-device-array-stream (b89a5c9) with develop (d020924)


0ax1 added 5 commits June 17, 2026 21:28
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
0ax1 force-pushed the ad/arrow-device-array-stream branch from 3ae0d00 to 07926a0 June 18, 2026 14:37
0ax1 added 5 commits June 18, 2026 14:59
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
vortex-data deleted a comment from github-actions Bot Jun 18, 2026
0ax1 added 4 commits June 18, 2026 16:38
The Arrow C device array stream export drove the Vortex stream on a private
CurrentThreadRuntime, but a partition scan spawns its decode work onto the
session's runtime (vortex-ffi's RUNTIME). Nothing ever drove that runtime
during streaming, so the first get_next on a real partition deadlocked
waiting on tasks that never ran. The existing tests only exercise an inert
in-memory stream, so they never hit it.

Thread the session's runtime through export_device_array_stream and drive
the stream and per-array exports on it, removing the private runtime and
worker pool. Expose vortex_ffi::runtime() so layered FFI crates can pass the
same runtime the partition's scan spawns onto.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
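
The deadlock described in the commit above can be modeled without any of the real types. The sketch below is a toy, assuming a cooperative runtime whose tasks only run when something drives it; `ToyRuntime`, `start_scan`, and both `poll_*` functions are hypothetical names for illustration and are not the Vortex or vortex-ffi API. It uses `try_recv` so the buggy path reports "nothing arrived" instead of actually hanging.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, TryRecvError};

/// Toy stand-in for a cooperative runtime: spawned tasks only run when
/// someone calls `drive`. Hypothetical type, not part of Vortex.
#[derive(Default)]
pub struct ToyRuntime {
    queue: VecDeque<Box<dyn FnOnce()>>,
}

impl ToyRuntime {
    /// Queue a task; nothing executes until `drive` is called.
    pub fn spawn(&mut self, task: impl FnOnce() + 'static) {
        self.queue.push_back(Box::new(task));
    }

    /// Run every queued task and report how many ran.
    pub fn drive(&mut self) -> usize {
        let mut ran = 0;
        while let Some(task) = self.queue.pop_front() {
            task();
            ran += 1;
        }
        ran
    }
}

/// Mimics a partition scan: the decode work is spawned onto the session
/// runtime and its result arrives over a channel.
pub fn start_scan(session: &mut ToyRuntime) -> Receiver<u32> {
    let (tx, rx) = channel();
    session.spawn(move || {
        // Pretend this value is the decoded chunk.
        let _ = tx.send(42);
    });
    rx
}

/// Buggy shape: drive only a private runtime, then look for the result.
pub fn poll_on_private_runtime(session: &mut ToyRuntime) -> Result<u32, TryRecvError> {
    let rx = start_scan(session);
    let mut private = ToyRuntime::default();
    private.drive(); // Runs nothing: the decode task is queued on `session`.
    rx.try_recv() // A blocking `recv()` here would wait forever.
}

/// Fixed shape: drive the same runtime the scan spawned onto.
pub fn poll_on_session_runtime(session: &mut ToyRuntime) -> Result<u32, TryRecvError> {
    let rx = start_scan(session);
    session.drive();
    rx.try_recv()
}
```

In this model `poll_on_private_runtime` returns `Err(TryRecvError::Empty)` because the task never ran, while `poll_on_session_runtime` returns `Ok(42)`. An in-memory stream that never spawns onto the session runtime would pass under either shape, which is consistent with the existing tests not catching the problem.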
The device stream derives its schema from the first array and rejects any
later array whose Arrow schema differs. The Arrow C stream contract requires
this, but it means a stream whose chunks vary their encoding (for example, a
dictionary-encoded chunk among plain chunks) fails mid-stream. Document this
on the trait, note that an empty stream reports a dtype-derived schema that
can differ from the schema of a non-empty run, and sharpen the mismatch error
to name the cause.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
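
A minimal sketch of the first-array-wins schema check described in the commit above, under simplifying assumptions: `ToySchema`, `SchemaMismatch`, and `validate_stream` are hypothetical names, and a two-variant enum stands in for a real Arrow schema. The actual trait and error type in the PR may look different.

```rust
use std::fmt;

/// Simplified stand-in for an Arrow schema (hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ToySchema {
    Utf8,
    DictionaryUtf8,
}

/// Error that names the cause of the mismatch (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct SchemaMismatch {
    pub index: usize,
    pub expected: ToySchema,
    pub found: ToySchema,
}

impl fmt::Display for SchemaMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "chunk {} exported as {:?}, but the stream schema was fixed to {:?} by the first chunk",
            self.index, self.found, self.expected
        )
    }
}

/// Fix the stream schema from the first chunk and reject any later chunk
/// that differs. Returns `None` for an empty stream; in that case a caller
/// would have to fall back to a schema derived from the dtype.
pub fn validate_stream(chunks: &[ToySchema]) -> Result<Option<ToySchema>, SchemaMismatch> {
    let (first, rest) = match chunks.split_first() {
        Some(parts) => parts,
        None => return Ok(None),
    };
    for (offset, found) in rest.iter().enumerate() {
        if found != first {
            return Err(SchemaMismatch {
                index: offset + 1, // `rest` starts at the second chunk
                expected: first.clone(),
                found: found.clone(),
            });
        }
    }
    Ok(Some(first.clone()))
}
```

With this shape, a run of `Utf8` chunks passes, while a `DictionaryUtf8` chunk in second position fails with an error pointing at chunk 1, which is the mid-stream failure the commit documents.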
Add a shared ArrowDeviceArray::empty() constructor and build the end-of-stream
marker from it, replacing the hand-rolled struct literal. The stream tests now
call the module-level release_schema/release_device_array helpers instead of
redefining byte-for-byte copies, and drop the duplicate empty_device_array
placeholder in favor of ArrowDeviceArray::empty().

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
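
For context on the commit above, here is a sketch of what a shared `empty()` constructor could look like. The struct layouts follow the Arrow C Data Interface and C Device Data Interface as I understand them; the definitions in this PR may differ, and the placeholder values (zeros and nulls) and the `is_released` helper are assumptions for illustration. The Arrow C stream convention is that a released array (null `release` callback) marks end of stream.

```rust
use std::ffi::c_void;
use std::ptr;

/// Mirror of the Arrow C Data Interface `ArrowArray` struct (sketch).
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Mirror of the Arrow C Device Data Interface `ArrowDeviceArray` (sketch).
#[repr(C)]
pub struct ArrowDeviceArray {
    pub array: ArrowArray,
    pub device_id: i64,
    pub device_type: i32,
    pub sync_event: *mut c_void,
    pub reserved: [i64; 3],
}

impl ArrowDeviceArray {
    /// A released, all-zero array. Consumers are expected to check
    /// `array.release` first and ignore the other fields when it is null.
    pub fn empty() -> Self {
        ArrowDeviceArray {
            array: ArrowArray {
                length: 0,
                null_count: 0,
                offset: 0,
                n_buffers: 0,
                n_children: 0,
                buffers: ptr::null_mut(),
                children: ptr::null_mut(),
                dictionary: ptr::null_mut(),
                release: None, // null callback: the array counts as released
                private_data: ptr::null_mut(),
            },
            device_id: 0,   // placeholder value (assumption)
            device_type: 0, // placeholder value (assumption)
            sync_event: ptr::null_mut(),
            reserved: [0; 3],
        }
    }

    /// Hypothetical helper: true when this value is an end-of-stream marker.
    pub fn is_released(&self) -> bool {
        self.array.release.is_none()
    }
}
```

Keeping the marker in one constructor means the stream implementation and its tests build the same value, instead of each maintaining its own struct literal.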
Several doc and line comments added for the Arrow device array stream exceeded
the 100-column limit. Wrap them; no behavior change.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Labels

changelog/feature A new feature
