
DRILL-8543: Add Support for Materialized Views #3036

Merged
cgivre merged 11 commits into apache:master from cgivre:views on Jun 25, 2026


@cgivre (Contributor) commented Feb 2, 2026

DRILL-8543: Add Support for Materialized Views

Description

This PR adds materialized view support to Apache Drill, enabling users to store pre-computed query results for improved query performance.

Features

  • SQL Commands: CREATE [OR REPLACE] MATERIALIZED VIEW, DROP MATERIALIZED VIEW, and REFRESH MATERIALIZED VIEW
  • Query Rewriting: Automatic query optimization using Calcite's SubstitutionVisitor to transparently rewrite queries to use materialized views when beneficial
  • Parquet Storage: MV data stored as Parquet files for efficient columnar access
  • Metastore Integration: Optional synchronization of MV metadata to Drill Metastore (Iceberg, RDBMS, MongoDB backends)

Implementation

  • New SQL parser classes for MV statements
  • MaterializedView data model with JSON serialization (.materialized_view.drill files)
  • MaterializedViewHandler for CREATE/DROP/REFRESH operations
  • MaterializedViewRewriter for query plan substitution
  • DrillMaterializedViewTable implementing Calcite's TranslatableTable
  • Metastore API extensions: MaterializedViews interface and MaterializedViewMetadataUnit
  • Iceberg metastore backend implementation for MV metadata
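The data model behind the .materialized_view.drill files can be pictured roughly as below. This is a minimal sketch, not the PR's class: the record name MaterializedViewDef, its fields, and the {name}.materialized_view.drill file naming (extrapolated from Drill's existing {name}.view.drill files for regular views) are assumptions, and the JSON serialization step is omitted.

```java
import java.util.List;

/**
 * Hypothetical, simplified stand-in for an MV definition.
 * The PR's real MaterializedView class is serialized to JSON; this sketch
 * only models a few plausible fields and the definition file name.
 */
record MaterializedViewDef(String name, String sql, List<String> columns) {

  // Assumed suffix, modeled on Drill's ".view.drill" convention for views.
  static final String DEFINITION_SUFFIX = ".materialized_view.drill";

  MaterializedViewDef {
    if (name == null || name.isBlank()) {
      throw new IllegalArgumentException("name must be non-empty");
    }
    columns = List.copyOf(columns); // defensive, immutable copy
  }

  /** File that would hold the serialized definition, e.g. "sales_mv.materialized_view.drill". */
  String definitionFileName() {
    return name + DEFINITION_SUFFIX;
  }
}
```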

Configuration

  • planner.enable_materialized_view_rewrite (default: true) - Controls automatic query rewriting

Documentation

Added docs/dev/MaterializedViews.md with complete feature documentation

Testing

Added additional unit tests.

@cgivre self-assigned this on Feb 2, 2026
@cgivre added the enhancement, doc-impacting, performance, and major-update labels on Feb 2, 2026

@letian-jiang (Contributor) left a comment

LGTM. Materialized views are a powerful feature for an analytic engine. 🥳

Review thread on docs/dev/MaterializedViews.md
Review thread on docs/dev/MaterializedViews.md (outdated)

@letian-jiang (Contributor) left a comment

We could also add a plan-asserting test to ensure the query is correctly rewritten to use the MV.
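A plan-asserting test essentially runs EXPLAIN on the query and checks which data source the plan scans. The framework-free sketch below shows only that check, assuming the plan is available as text and that MV data lives in a {name}_mv_data directory; the helper names and the sample plan strings in the test are hypothetical, and a real Drill test would more likely use the project's own test-framework plan matching.

```java
/** Hypothetical helper: checks whether EXPLAIN PLAN text reads from an MV's data directory. */
final class MvPlanAssertions {
  private MvPlanAssertions() {}

  /** True if the plan text mentions the MV's "{name}_mv_data" directory (assumed naming). */
  static boolean usesMaterializedView(String planText, String mvName) {
    return planText.contains(mvName + "_mv_data");
  }

  /** Throws if the plan does not read from the MV, or if it still scans the base table file. */
  static void assertRewritten(String planText, String mvName, String baseTableFile) {
    if (!usesMaterializedView(planText, mvName)) {
      throw new AssertionError("Plan does not scan " + mvName + "_mv_data:\n" + planText);
    }
    if (planText.contains(baseTableFile)) {
      throw new AssertionError("Plan still scans base table " + baseTableFile + ":\n" + planText);
    }
  }
}
```

Checking both directions (the MV is scanned and the base table is not) catches the failure mode where the rewrite silently falls back to the original plan.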

@cgivre (Contributor, Author) commented Feb 8, 2026

@letian-jiang I believe I addressed your review comments. Could you please mark the review as complete so we can merge?
Thanks!

@cgivre requested a review from pjfanning on February 8, 2026, 15:16

@letian-jiang (Contributor) left a comment

LGTM

@cgivre requested a review from rymarm on June 15, 2026, 13:28

@rymarm (Member) left a comment

@cgivre Thank you for implementing this feature! It looks great overall. I found a few issues; please take a look. They relate to:

  • MV dataStoragePath: making its format strict and accessing it from the object instead of relying on duck typing.
  • Using hardcoded backticks during query building.
  • Optimizing code syntax.
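The first point, a strict dataStoragePath owned by the MV object, might look roughly like the sketch below. It is an illustration under assumptions: the class name MvStorage, the identifier-only naming rule, and the _mv_data suffix (taken from the later commit message) are not confirmed to match the merged code.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: the MV object owns its data directory name, so callers
 * never rebuild "{name}_mv_data" by string concatenation at each call site.
 */
final class MvStorage {
  // Assumed rule: a plain identifier, so the derived path is a single directory level.
  private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
  private static final String DATA_SUFFIX = "_mv_data";

  private final String name;
  private final String dataStoragePath;

  MvStorage(String name) {
    if (name == null || !VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid materialized view name: " + name);
    }
    this.name = name;
    this.dataStoragePath = name + DATA_SUFFIX; // computed once, in one place
  }

  String getName() {
    return name;
  }

  /** Single source of truth for where the MV's Parquet data lives. */
  String getDataStoragePath() {
    return dataStoragePath;
  }
}
```

Validating once in the constructor means every consumer that calls getDataStoragePath() can rely on a well-formed path instead of re-checking it.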

Review thread on exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/MaterializedView.java (outdated)
Review thread on exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/MaterializedView.java (outdated)
cgivre added 10 commits June 22, 2026 17:29
The MV rewriter had three bugs preventing query rewriting from working:
1. Schema discovery used the lazy-loaded schema tree (always empty); it now
   iterates StoragePluginRegistry for FileSystemPlugin instances.
2. SubstitutionVisitor arguments were swapped; the rewriter now uses Calcite's
   RelOptMaterializations.useMaterializedViews() API, which handles
   normalization and correct argument order internally.
3. buildMvScanRel used SELECT *, causing a DYNAMIC_STAR type mismatch; it
   now selects explicit columns from the MV field definitions.

Also adds plan verification tests to both TestMaterializedViewSupport
and TestMaterializedViewRewriting to assert that query plans actually
reference _mv_data (Parquet) or region.json as expected.
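Fix 3 amounts to generating the MV scan with an explicit projection instead of SELECT *. A minimal sketch, assuming the MV's field names are available as a list, that identifiers are backtick-quoted (Drill's default) with an embedded backtick escaped by doubling, and that the workspace prefix is passed in already formatted; the class and method names and the dfs.tmp workspace are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: build the MV data scan with explicit columns instead of SELECT *. */
final class MvScanSql {
  private MvScanSql() {}

  // Backtick-quotes an identifier, doubling any embedded backtick.
  static String bt(String ident) {
    return "`" + ident.replace("`", "``") + "`";
  }

  static String buildScanSql(String workspace, String dataDir, List<String> fieldNames) {
    if (fieldNames.isEmpty()) {
      throw new IllegalArgumentException("MV must define at least one field");
    }
    // An explicit column list gives the planner a concrete row type; SELECT *
    // yields a dynamic-star type, which the commit above names as the mismatch cause.
    String projection = fieldNames.stream()
        .map(MvScanSql::bt)
        .collect(Collectors.joining(", "));
    return "SELECT " + projection + " FROM " + workspace + "." + bt(dataDir);
  }
}
```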
- Make MaterializedView.dataStoragePath the single source of truth for the
  data directory ({name}_mv_data) and use getDataStoragePath() everywhere
  instead of hardcoding the _mv_data suffix.
- Quote generated identifiers using the session's configured quoting
  character so MV data scans work when planner.parser.quoting_identifiers
  is not the default backtick.
- createMaterializedViewDataWriter now takes the MaterializedView object.
- Simplify isTable check and use a declarative findFirst in
  getMaterializedView.
- Use pattern-matching instanceof in RecordCollector.
- Bump maven.compiler release/source/target 11 -> 17 (Drill no longer
  supports Java 11; matches the existing enforcer rule and CI matrix).
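Quoting generated identifiers with the session's configured character could be sketched as below. This assumes the option value is one of the three characters Drill documents for planner.parser.quoting_identifiers (backtick, double quote, or opening bracket) and that an embedded closing delimiter is escaped by doubling it; the class and method names are hypothetical, and the PR may instead delegate to Calcite's own quoting utilities.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: quote identifiers according to the session's quoting option. */
final class IdentifierQuoter {
  private IdentifierQuoter() {}

  /** @param quotingOption value of planner.parser.quoting_identifiers: "`", "\"" or "[" */
  static String quote(String identifier, String quotingOption) {
    final String open;
    final String close;
    if ("`".equals(quotingOption)) {
      open = "`";
      close = "`";
    } else if ("\"".equals(quotingOption)) {
      open = "\"";
      close = "\"";
    } else if ("[".equals(quotingOption)) {
      open = "[";
      close = "]"; // brackets are the one style with distinct open/close characters
    } else {
      throw new IllegalArgumentException("Unsupported quoting option: " + quotingOption);
    }
    // Escape an embedded closing delimiter by doubling it.
    return open + identifier.replace(close, close + close) + close;
  }

  /** Quotes each part of a compound name and joins the parts with dots. */
  static String quotePath(List<String> parts, String quotingOption) {
    return parts.stream()
        .map(p -> quote(p, quotingOption))
        .collect(Collectors.joining("."));
  }
}
```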
@cgivre (Contributor, Author) commented Jun 22, 2026

@rymarm Thanks for the review. I believe I addressed all your comments.

@rymarm (Member) commented Jun 25, 2026

@cgivre Thank you for the changes! Everything looks good. There is just one point from my review that seems to have been misunderstood. Why do you use the fully qualified class name:

org.apache.calcite.sql.SqlNode parsedNode = sqlConverter.parse(scanSql.toString());
org.apache.calcite.sql.SqlNode validatedNode = sqlConverter.validate(parsedNode);

instead of an import:

import org.apache.calcite.sql.SqlNode;

Here:
https://github.com/cgivre/drill/blob/cf26170180103a55ab4ec7ba3fc2c01a0bc6855e/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/MaterializedViewRewriter.java#L171-L171
https://github.com/cgivre/drill/blob/cf26170180103a55ab4ec7ba3fc2c01a0bc6855e/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/MaterializedViewRewriter.java#L207-L208

@rymarm (Member) left a comment

+1 from me.

@cgivre (Contributor, Author) commented Jun 25, 2026


Oops, I fixed that.

@cgivre merged commit 8f06c30 into apache:master on Jun 25, 2026
6 checks passed
@cgivre deleted the views branch on June 25, 2026, 17:37