DRILL-8239: Convert JSON UDF to EVF #2567

Open
cgivre wants to merge 1 commit into apache:master from cgivre:Convert_to_json_udf

@cgivre cgivre commented May 29, 2022

DRILL-8239: Convert JSON UDF to EVF

Description

This PR switches the convert_fromJSON UDFs over to the EVF JsonLoader instead of the old JsonReader. The bulk of the work was in ProjectRecordBatch, which had to learn how to handle more than one complex-writer function per query, top-level JSON scalars and arrays, and multi-row batches — none of which the existing EVF path supported. Tests that broke along the way (TestComplexTypeWriter, TestConvertFunctions, the HTTP UDFs, and others) all pass again.
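
To illustrate the top-level scalar and array case, here is a minimal sketch of one way a value can be wrapped in a single-field object so that a record-oriented reader always sees an object. The class name, the `wrap` helper, and the marker name `__value` are hypothetical and chosen only for illustration; they are not taken from this PR's code.

```java
// Illustrative sketch only: not the PR's implementation.
final class MarkerWrapSketch {
  // Hypothetical marker field name (an assumption for this example).
  static final String MARKER = "__value";

  /**
   * Wraps a raw JSON value (scalar, array, or object) in a one-field object.
   * Null or blank input becomes an explicit JSON null, so the reader still
   * produces exactly one row and stays aligned with the surrounding batch.
   */
  static String wrap(String json) {
    String value = (json == null || json.trim().isEmpty()) ? "null" : json.trim();
    return "{\"" + MARKER + "\": " + value + "}";
  }

  public static void main(String[] args) {
    System.out.println(wrap("42"));       // scalar
    System.out.println(wrap("[1, 2]"));   // array
    System.out.println(wrap("{\"a\":1}")); // object
    System.out.println(wrap(null));       // null input
  }
}
```

After reading, the consumer would unwrap the marker column so the value keeps its own type rather than appearing as a nested field.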

Documentation

No user-facing changes.

Testing

Ran unit tests.

@cgivre cgivre self-assigned this May 29, 2022
@cgivre cgivre marked this pull request as draft May 29, 2022 03:55
@cgivre cgivre changed the title DRILL-8239: Convert JSON UDF to EVF [WIP] DRILL-8239: Convert JSON UDF to EVF May 29, 2022
@cgivre cgivre force-pushed the Convert_to_json_udf branch 2 times, most recently from fc1a320 to 8dd00f8 Compare June 7, 2022 15:46
@cgivre cgivre changed the title [WIP] DRILL-8239: Convert JSON UDF to EVF DRILL-8239: Convert JSON UDF to EVF Jun 7, 2022
@vdiravka vdiravka added the udf DrilllFunc label Jun 9, 2022
@cgivre cgivre force-pushed the Convert_to_json_udf branch 2 times, most recently from 63023ef to 5ab0489 Compare June 19, 2022 02:30
@cgivre cgivre force-pushed the Convert_to_json_udf branch from 87a0d60 to 25c9069 Compare January 3, 2024 18:56
@cgivre cgivre force-pushed the Convert_to_json_udf branch from 6b65419 to 3bfb116 Compare June 26, 2026 15:23
@cgivre cgivre marked this pull request as ready for review June 26, 2026 15:28
Rewrite the convert_fromJSON UDFs to use the EVF JsonLoader (ResultSetLoader)
instead of the legacy JsonReader, mirroring the HTTP storage plugin UDFs.
JsonConverterUtils builds the loader from either the system JSON options or the
explicit allTextMode/readNumbersAsDouble arguments, and centralises the per-row
conversion.
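
As a rough sketch of the "system options or explicit arguments" choice described above: the `JsonOptions` holder and `resolve` method below are hypothetical names, not the actual JsonConverterUtils API, and the rule that explicit arguments apply only when both are supplied is an assumption made for the example.

```java
// Illustrative sketch only: names and fallback rule are assumptions.
final class JsonOptionsSketch {
  /** Hypothetical holder for the two flags named in the commit message. */
  static final class JsonOptions {
    final boolean allTextMode;
    final boolean readNumbersAsDouble;

    JsonOptions(boolean allTextMode, boolean readNumbersAsDouble) {
      this.allTextMode = allTextMode;
      this.readNumbersAsDouble = readNumbersAsDouble;
    }
  }

  /**
   * Uses the explicit arguments when both are supplied; otherwise falls back
   * to the system-level defaults unchanged.
   */
  static JsonOptions resolve(JsonOptions systemDefaults,
                             Boolean allTextMode,
                             Boolean readNumbersAsDouble) {
    if (allTextMode != null && readNumbersAsDouble != null) {
      return new JsonOptions(allTextMode, readNumbersAsDouble);
    }
    return systemDefaults;
  }
}
```

Keeping this decision in one place means every UDF variant builds its loader the same way, which is the centralisation the message refers to.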

To preserve the full convert_fromJSON contract, the EVF complex-writer support
in ProjectRecordBatch is extended:

* Multiple complex-writer functions per project list. addLoader now keeps a
  list of (loader, output-column) pairs -- captured at codegen in
  DrillComplexWriterFuncHolder -- so each loader's output lands in the column
  reserved for it (fixes SELECT convert_from(a) m1, convert_from(b) m2 and
  cases that project columns before/after the function).

* Top-level scalars and arrays. The UDF wraps each input value in a single
  marker field so the record-oriented loader reads {scalar, array, object}
  uniformly; ProjectRecordBatch unwraps that marker column by transferring it
  directly (preserving the value's own type), and otherwise wraps the loader's
  columns in a map (the HTTP-UDF behaviour).

* Per-output-batch lifecycle. Loaders are re-started before each batch, fixing
  the "Unexpected state: HARVESTED" failure on multi-row/multi-batch input.

* Null/empty input writes an aligned (null) row so the loader row count matches
  the surrounding batch.
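
The (loader, output-column) pairing and the per-batch restart can be modelled with a toy example. `ToyLoader` below is a simplified stand-in with a three-state lifecycle, not Drill's ResultSetLoader, and `LoaderBinding` and `runBatch` are hypothetical names; the sketch only shows why a loader that is not restarted after a harvest rejects further writes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a toy model, not ProjectRecordBatch.
final class LoaderLifecycleSketch {
  enum State { START, ACTIVE, HARVESTED }

  /** Toy batch-oriented loader that accepts rows only while ACTIVE. */
  static final class ToyLoader {
    private State state = State.START;
    private final List<String> rows = new ArrayList<>();

    void startBatch() {
      rows.clear();
      state = State.ACTIVE;
    }

    void writeRow(String row) {
      if (state != State.ACTIVE) {
        throw new IllegalStateException("Unexpected state: " + state);
      }
      rows.add(row);
    }

    List<String> harvest() {
      state = State.HARVESTED;
      return new ArrayList<>(rows);
    }
  }

  /** Pairs a loader with the output column reserved for it. */
  static final class LoaderBinding {
    final ToyLoader loader;
    final String outputColumn;

    LoaderBinding(ToyLoader loader, String outputColumn) {
      this.loader = loader;
      this.outputColumn = outputColumn;
    }
  }

  /** Restarts every loader, writes the batch, and routes output by column. */
  static List<String> runBatch(List<LoaderBinding> bindings, List<String> inputRows) {
    List<String> out = new ArrayList<>();
    for (LoaderBinding b : bindings) {
      b.loader.startBatch(); // restart before each batch
      for (String row : inputRows) {
        b.loader.writeRow(row);
      }
      out.add(b.outputColumn + "=" + b.loader.harvest());
    }
    return out;
  }
}
```

Calling `runBatch` twice with the same bindings succeeds because each call restarts the loaders; writing to a `ToyLoader` directly after `harvest()` without `startBatch()` throws the state error.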
@cgivre cgivre force-pushed the Convert_to_json_udf branch from 3bfb116 to 7d3bfee Compare June 26, 2026 18:57

2 participants