
Add MySQL bulk load and dialect-controlled engine creation #67

Merged

martinv13 merged 5 commits into cre-dev:main from martinv13:claude/mysql-bulk-load on Jun 20, 2026

@martinv13

Collaborator

Summary

  • Add LOAD DATA LOCAL INFILE bulk loading for MySQL/MariaDB via the pymysql and mysqlclient drivers, with hex-encoded binary columns decoded server-side via UNHEX()
  • Move engine creation into each dialect so that MySQL can inject local_infile=True, and MSSQL can inject fast_executemany and the SERIALIZABLE isolation level, automatically when a connection string is provided
  • Add bulk_load (True/False/None) and bulk_load_threshold parameters to Document.insert_into_target_tables(): None uses the native path with silent fallback, True requires it and raises RuntimeError with an actionable message if unavailable, and False forces executemany
  • Remove use_bcp (never released publicly)
  • Default thresholds: 100 rows for MySQL/MSSQL/DuckDB (file-based), 0 for PostgreSQL (in-protocol COPY)
  • Document bulk loading in how_it_works.md
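A minimal sketch of how the tri-state bulk_load switch and the per-dialect default thresholds listed above could combine. The helper name choose_load_path and its arguments are hypothetical, not the library's API. The sketch also assumes that bulk_load=True ignores the threshold and that the threshold is inclusive (n_rows >= threshold); the PR does not state either.

```python
from typing import Optional

# Default thresholds quoted in this PR: file-based loaders (MySQL, MSSQL, DuckDB)
# start at 100 rows; PostgreSQL COPY runs in-protocol, so its threshold is 0.
DEFAULT_THRESHOLDS = {"mysql": 100, "mssql": 100, "duckdb": 100, "postgresql": 0}


def choose_load_path(
    dialect: str,
    n_rows: int,
    native_available: bool,
    bulk_load: Optional[bool] = None,
    bulk_load_threshold: Optional[int] = None,
) -> str:
    """Return "native" or "executemany" (hypothetical helper for illustration)."""
    if bulk_load is False:
        # Explicit opt-out: always use SQLAlchemy executemany.
        return "executemany"
    if bulk_load is True:
        # Explicit opt-in: fail loudly instead of silently falling back.
        # Assumption: the row threshold is not consulted in this mode.
        if not native_available:
            raise RuntimeError(
                f"bulk_load=True but native bulk loading is unavailable for "
                f"{dialect}; install a supported driver or pass bulk_load=None/False"
            )
        return "native"
    # bulk_load is None: use the native path when available and the batch is
    # large enough, otherwise fall back to executemany without complaint.
    threshold = (
        bulk_load_threshold
        if bulk_load_threshold is not None
        else DEFAULT_THRESHOLDS.get(dialect, 0)
    )
    if native_available and n_rows >= threshold:
        return "native"
    return "executemany"
```

Keeping the default at None means existing callers get the faster path where it exists without new failure modes; only callers who pass True accept the risk of a RuntimeError.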

claude added 5 commits June 20, 2026 10:04
Implements MySQLDialect.bulk_insert() using MySQL's LOAD DATA LOCAL INFILE
for pymysql and mysqldb drivers. Binary columns are hex-encoded in the temp
file and decoded server-side with UNHEX(). Falls back to SQLAlchemy
executemany for unsupported drivers. Adds integration tests and updates the
MySQL CI workflow to enable local_infile on the server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QvgSwLLGgKaDbZaEYYp7BJ
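The hex round-trip described in this commit can be sketched as follows. This is a minimal illustration that assumes MySQL's default tab-separated LOAD DATA format with backslash escaping. The helper names write_load_file and load_data_sql are hypothetical, and identifier quoting is deliberately naive. Binary columns are read into user variables and converted with UNHEX() in the SET clause.

```python
import os
import tempfile
from typing import Any, Iterable, Sequence, Set


def _encode_field(value: Any, is_binary: bool) -> str:
    """Encode one value for LOAD DATA's tab-separated format."""
    if value is None:
        return r"\N"  # NULL marker recognised when the escape char is backslash
    if is_binary:
        return bytes(value).hex()  # hex text, decoded server-side by UNHEX()
    text = str(value)
    # Escape the backslash first, then the field and line separators.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def write_load_file(
    rows: Iterable[Sequence[Any]], columns: Sequence[str], binary_columns: Set[str]
) -> str:
    """Write rows to a temp file and return its path (the caller deletes it)."""
    flags = [c in binary_columns for c in columns]
    fd, path = tempfile.mkstemp(suffix=".tsv")
    # newline="" stops Windows from turning "\n" into "\r\n".
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write("\t".join(_encode_field(v, b) for v, b in zip(row, flags)) + "\n")
    return path


def load_data_sql(
    table: str, columns: Sequence[str], binary_columns: Set[str], path: str
) -> str:
    """Build the LOAD DATA statement; binary columns go through @user variables."""
    targets = [f"@{c}" if c in binary_columns else f"`{c}`" for c in columns]
    # Escape backslashes (Windows paths) and quotes for the MySQL string literal.
    literal_path = path.replace("\\", "\\\\").replace("'", "\\'")
    sql = (
        f"LOAD DATA LOCAL INFILE '{literal_path}' INTO TABLE `{table}` "
        "CHARACTER SET utf8mb4 "
        "FIELDS TERMINATED BY '\\t' ESCAPED BY '\\\\' "
        "LINES TERMINATED BY '\\n' "
        f"({', '.join(targets)})"
    )
    sets = [f"`{c}` = UNHEX(@{c})" for c in columns if c in binary_columns]
    if sets:
        sql += " SET " + ", ".join(sets)
    return sql
```

The statement would be executed on a connection opened with local_infile enabled, and the temporary file removed afterwards. Hex encoding keeps arbitrary bytes (tabs, newlines, NUL) out of the text format entirely, at the cost of doubling the file size for binary columns.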
- Move engine creation into each dialect (create_engine() override) so
  MySQL can inject local_infile=True and MSSQL can inject fast_executemany
  and SERIALIZABLE isolation level automatically when a connection string is
  provided via DataModel.
- Add bulk_load (bool | None) and bulk_load_threshold (int | None) keyword
  arguments to Document.insert_into_target_tables() and bulk_insert() on
  all dialects. None (default) = use bulk loading when available with silent
  fallback; True = require it and raise RuntimeError with an actionable
  message if unavailable; False = always use executemany.
- Remove use_bcp parameter (never publicly released).
- Default thresholds: 100 rows for MySQL / MSSQL / DuckDB (file-based),
  0 for PostgreSQL (in-protocol COPY, no temp file).
- Add corresponding tests for bulk_load=True/False/RuntimeError paths on
  MySQL and MSSQL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QvgSwLLGgKaDbZaEYYp7BJ
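A sketch of the kind of per-dialect defaults a create_engine() override might inject. The function engine_kwargs is hypothetical. It assumes fast_executemany is added only for the pyodbc driver (the SQLAlchemy flag is pyodbc-specific) and that MariaDB URLs get the same treatment as MySQL; neither detail is confirmed by the PR.

```python
from typing import Any, Dict


def engine_kwargs(url: str) -> Dict[str, Any]:
    """Hypothetical: extra create_engine() kwargs a dialect might inject."""
    scheme = url.split("://", 1)[0]  # e.g. "mysql+pymysql"
    backend, _, driver = scheme.partition("+")
    if backend in ("mysql", "mariadb"):
        # pymysql and mysqlclient both read local_infile from connect();
        # the client must opt in before LOAD DATA LOCAL INFILE is allowed.
        return {"connect_args": {"local_infile": True}}
    if backend == "mssql":
        kwargs: Dict[str, Any] = {"isolation_level": "SERIALIZABLE"}
        if driver == "pyodbc":
            kwargs["fast_executemany"] = True  # pyodbc-only SQLAlchemy option
        return kwargs
    return {}
```

The result would be unpacked into sqlalchemy.create_engine(url, **engine_kwargs(url)). connect_args is forwarded to the DBAPI connect() call, which is where the MySQL drivers pick up local_infile. Placing this logic in the dialect rather than in DataModel keeps backend-specific flags next to the bulk-loading code that depends on them.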
Add a "Bulk loading" subsection to how_it_works.md explaining the native
bulk-loading mechanism used by each backend (COPY, LOAD DATA LOCAL INFILE,
BCP, read_csv), the required drivers/tools, the default per-dialect row
threshold, and how the new bulk_load / bulk_load_threshold parameters on
Document.insert_into_target_tables() control the behaviour.

Expand the driver lists in index.md, getting_started.md, and README.md to
include psycopg (psycopg3) alongside psycopg2 for PostgreSQL, and
mysqlclient alongside pymysql for MySQL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QvgSwLLGgKaDbZaEYYp7BJ
Swap "value — description" patterns for "value to/: description" phrasing
throughout dialect and document docstrings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QvgSwLLGgKaDbZaEYYp7BJ
test_mysql_bulk_insert_numeric_types uses a table with only i/bi/si/d
columns; order_by(table.c.id) raised AttributeError. Apply ordering
only when the id column is present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QvgSwLLGgKaDbZaEYYp7BJ
martinv13 merged commit 737da3d into cre-dev:main on Jun 20, 2026