Add list_length scalar function #8495

Open
mhk197 wants to merge 1 commit into develop from mk/list-length

@mhk197 mhk197 commented Jun 18, 2026

Adds a list_length scalar function returning the number of elements in each list of a List array.

  • Computed purely from the list's offsets/sizes; it never reads elements. List and ListView arrays take separate code paths.
  • Returns a U64 array; a null list yields a null length.
  • Registered as a built-in (vortex.list.length) alongside list_contains, and exposed via the list_length(expr) expression constructor.
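
A minimal sketch of the offsets/sizes idea described above, using plain slices rather than Vortex arrays. It assumes a List stores n + 1 monotonic offsets and a ListView stores one size per list (as in the Arrow layouts of the same names); the function names `lengths_from_offsets` and `lengths_from_sizes` are hypothetical and are not part of this PR.

```rust
/// Offsets-based layout (assumed for List): list `i` spans
/// `offsets[i]..offsets[i + 1]`, so its length is the difference.
/// `validity[i] == false` marks a null list, which yields a null length.
fn lengths_from_offsets(offsets: &[u64], validity: &[bool]) -> Vec<Option<u64>> {
    offsets
        .windows(2)
        .zip(validity)
        .map(|(w, &valid)| valid.then(|| w[1] - w[0]))
        .collect()
}

/// View-based layout (assumed for ListView): each list carries its own
/// size, so the sizes already are the lengths and offsets are not needed.
fn lengths_from_sizes(sizes: &[u64], validity: &[bool]) -> Vec<Option<u64>> {
    sizes
        .iter()
        .zip(validity)
        .map(|(&size, &valid)| valid.then_some(size))
        .collect()
}
```

Neither function touches element values, which is the property the description calls out. For example, offsets `[0, 2, 2, 5]` with the middle list marked null give lengths 2, null, 3.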

@mhk197 mhk197 requested a review from a team June 18, 2026 16:11
@mhk197 mhk197 marked this pull request as draft June 18, 2026 16:19
codspeed-hq Bot commented Jun 18, 2026

Merging this PR will improve performance by 13.97%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 13 improved benchmarks
❌ 2 regressed benchmarks
✅ 1566 untouched benchmarks
🆕 6 new benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

|    | Mode | Benchmark | BASE | HEAD | Efficiency |
|----|------|-----------|------|------|------------|
|    | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 223.8 µs | 258.7 µs | -13.46% |
|    | Simulation | chunked_varbinview_opt_canonical_into[(100, 100)] | 304.7 µs | 338.7 µs | -10.04% |
|    | Simulation | decompress_rd[f64, (10000, 0.01)] | 139 µs | 108.8 µs | +27.79% |
|    | Simulation | decompress_rd[f64, (10000, 0.1)] | 139.3 µs | 109.1 µs | +27.73% |
|    | Simulation | decompress_rd[f64, (10000, 0.0)] | 138.9 µs | 108.8 µs | +27.68% |
|    | Simulation | take_10k_first_chunk_only | 271.8 µs | 226.8 µs | +19.86% |
|    | Simulation | chunked_bool_canonical_into[(1000, 10)] | 33 µs | 27.6 µs | +19.56% |
|    | Simulation | take_10k_dispersed | 285.5 µs | 240.7 µs | +18.62% |
|    | Simulation | decompress_rd[f32, (100000, 0.0)] | 583.6 µs | 495.9 µs | +17.68% |
|    | Simulation | decompress_rd[f32, (10000, 0.1)] | 91 µs | 78 µs | +16.67% |
|    | Simulation | decompress_rd[f32, (10000, 0.01)] | 90.8 µs | 78 µs | +16.35% |
|    | Simulation | decompress_rd[f32, (10000, 0.0)] | 91 µs | 78.5 µs | +15.96% |
|    | Simulation | patched_take_10k_adversarial | 260.4 µs | 230 µs | +13.2% |
|    | Simulation | patched_take_10k_first_chunk_only | 303.5 µs | 273.1 µs | +11.14% |
|    | Simulation | patched_take_10k_dispersed | 317.6 µs | 287.2 µs | +10.59% |
| 🆕 | Simulation | list_large | N/A | 9.9 ms | N/A |
| 🆕 | Simulation | list_medium | N/A | 143.4 µs | N/A |
| 🆕 | Simulation | list_small | N/A | 58.1 µs | N/A |
| 🆕 | Simulation | listview_large | N/A | 6 ms | N/A |
| 🆕 | Simulation | listview_medium | N/A | 98.2 µs | N/A |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing mk/list-length (0cdbea4) with develop (ed69077)

@mhk197 mhk197 force-pushed the mk/list-length branch 2 times, most recently from 0a2f1f1 to 1ed27e1 Compare June 18, 2026 17:19
@mhk197 mhk197 added the changelog/feature A new feature label Jun 18, 2026
Computes the number of elements in each list from the offsets/sizes only (never reading element values), returning a U64 array; a null list yields a null length. Registered as a built-in scalar function (vortex.list.length) alongside list_contains.

Signed-off-by: Matt Katz <mhkatz97@gmail.com>
@mhk197 mhk197 marked this pull request as ready for review June 18, 2026 20:46
@mhk197 mhk197 changed the title Add list_length scalar function Add list_length scalar function Jun 18, 2026
@mhk197 mhk197 requested review from AdamGS and gatesn June 18, 2026 20:47
use vortex_session::VortexSession;

fn main() {
    divan::main();

I think we use criterion these days, right?

@mhk197 mhk197 Jun 18, 2026

I think it's divan in the vortex-array crate? Can switch to criterion if wanted, though.

We only use criterion for CUDA benchmarks, divan for the rest. Divan seems largely unmaintained, so we might need to build our own :/

fn return_dtype(&self, _options: &Self::Options, arg_dtypes: &[DType]) -> VortexResult<DType> {
    match &arg_dtypes[0] {
        DType::List(_, nullable) => Ok(DType::Primitive(PType::U64, *nullable)),
        other => vortex_bail!("list_length() requires List, got {other}"),

May as well support FixedList too, then implement reduce to collapse it into a constant
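
A rough sketch of this suggestion, using a toy type enum rather than Vortex's `DType`. Everything here (`ToyDType`, its `FixedSizeList` variant, `toy_return_dtype`, `toy_constant_length`) is a hypothetical stand-in; the real `reduce` hook and its signature are not shown in this thread and are not modeled here.

```rust
/// Toy stand-in for a dtype enum; not the Vortex `DType`.
#[derive(Debug, Clone, PartialEq)]
enum ToyDType {
    U64 { nullable: bool },
    List { nullable: bool },
    FixedSizeList { size: u32, nullable: bool },
    Utf8,
}

/// The result is always U64 and inherits the nullability of the input
/// list type, for variable-size and fixed-size lists alike.
fn toy_return_dtype(arg: &ToyDType) -> Result<ToyDType, String> {
    match arg {
        ToyDType::List { nullable } | ToyDType::FixedSizeList { nullable, .. } => {
            Ok(ToyDType::U64 { nullable: *nullable })
        }
        other => Err(format!("list_length() requires a list type, got {other:?}")),
    }
}

/// For a fixed-size list, the length of every valid row is known from the
/// type alone, so the call could be rewritten to a constant without
/// looking at any buffers. Null rows would still need a null length.
fn toy_constant_length(arg: &ToyDType) -> Option<u64> {
    match arg {
        ToyDType::FixedSizeList { size, .. } => Some(u64::from(*size)),
        _ => None,
    }
}
```

The second function captures the point of the suggestion: a variable-size list needs its offsets or sizes, while a fixed-size list needs only its type.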

) -> VortexResult<ArrayRef> {
    // TODO(mk): short-circuit when array is all null

    let (lengths, validity) = if let Some(list) = array.as_opt::<ListView>() {

I think the "proper" way to do this would be to define an AnyList matcher, then execute_until::<AnyList>?
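
One way to picture the matcher idea, as a toy sketch only. `ToyArray`, `AnyListMatch`, `match_any_list`, and `toy_list_lengths` are hypothetical names; the actual matcher trait and `execute_until` machinery in Vortex are not reproduced here, and validity is left out for brevity.

```rust
/// Toy stand-in for an array that may or may not be list-encoded.
enum ToyArray {
    List { offsets: Vec<u64> },
    ListView { sizes: Vec<u64> },
    Other,
}

/// Result of a single "any list" match: one variant per list encoding,
/// so callers do not chain one downcast attempt per encoding.
enum AnyListMatch<'a> {
    List(&'a [u64]),
    ListView(&'a [u64]),
}

/// Returns `Some` for either list encoding and `None` for anything else.
fn match_any_list(array: &ToyArray) -> Option<AnyListMatch<'_>> {
    match array {
        ToyArray::List { offsets } => Some(AnyListMatch::List(offsets)),
        ToyArray::ListView { sizes } => Some(AnyListMatch::ListView(sizes)),
        ToyArray::Other => None,
    }
}

/// Computes lengths by dispatching on the matched encoding.
fn toy_list_lengths(array: &ToyArray) -> Option<Vec<u64>> {
    match match_any_list(array)? {
        // Offsets layout: length is the difference of adjacent offsets.
        AnyListMatch::List(offsets) => {
            Some(offsets.windows(2).map(|w| w[1] - w[0]).collect())
        }
        // View layout: the stored sizes are the lengths.
        AnyListMatch::ListView(sizes) => Some(sizes.to_vec()),
    }
}
```

With a matcher of this shape, the kernel would express "either list encoding" once and then branch exhaustively on the result.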

3 participants