Add MySQL bulk load and dialect-controlled engine creation #67
Merged
Implements MySQLDialect.bulk_insert() using MySQL's LOAD DATA LOCAL INFILE for the pymysql and mysqldb drivers. Binary columns are hex-encoded in the temp file and decoded server-side with UNHEX(). Falls back to SQLAlchemy executemany for unsupported drivers. Adds integration tests and updates the MySQL CI workflow to enable local_infile on the server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QvgSwLLGgKaDbZaEYYp7BJ
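The binary-column handling described above can be sketched as two steps: write rows to a tab-separated temp file with bytes rendered as hex, then load each binary field into a user variable that `UNHEX()` decodes. This is a minimal sketch under assumptions, not the PR's implementation: the helper names (`encode_rows`, `write_temp_file`, `build_load_statement`), the `@vN` variable naming, and the explicit `CHARACTER SET`/`FIELDS`/`LINES` clauses are illustrative choices.

```python
import os
import tempfile
from typing import Iterable, Sequence, Set

# Hypothetical helpers illustrating LOAD DATA LOCAL INFILE with UNHEX() decoding.


def _escape_text(value: str) -> str:
    """Escape characters that LOAD DATA interprets under ESCAPED BY '\\'."""
    return (
        value.replace("\\", "\\\\")  # backslash first so later escapes survive
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\0", "\\0")
    )


def encode_field(value: object) -> str:
    if value is None:
        return "\\N"  # LOAD DATA reads \N as SQL NULL
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value).hex()  # hex digits never need escaping
    if isinstance(value, bool):
        return "1" if value else "0"
    return _escape_text(str(value))


def encode_rows(rows: Iterable[Sequence[object]]) -> str:
    """Render rows as tab-separated, newline-terminated lines."""
    return "".join("\t".join(encode_field(v) for v in row) + "\n" for row in rows)


def write_temp_file(rows: Iterable[Sequence[object]]) -> str:
    """Write encoded rows to a temp file and return its path (caller deletes it)."""
    fd, path = tempfile.mkstemp(suffix=".tsv")
    # newline="" stops Python translating "\n" into "\r\n" on Windows
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as fh:
        fh.write(encode_rows(rows))
    return path


def _quote_ident(name: str) -> str:
    return "`" + name.replace("`", "``") + "`"


def _quote_string(value: str) -> str:
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_load_statement(
    path: str, table: str, columns: Sequence[str], binary_columns: Set[str]
) -> str:
    targets, assignments = [], []
    for i, col in enumerate(columns):
        if col in binary_columns:
            # Read the hex text into a user variable, then decode it server-side
            targets.append(f"@v{i}")
            assignments.append(f"{_quote_ident(col)} = UNHEX(@v{i})")
        else:
            targets.append(_quote_ident(col))
    sql = (
        f"LOAD DATA LOCAL INFILE {_quote_string(path)}\n"
        f"INTO TABLE {_quote_ident(table)}\n"
        "CHARACTER SET utf8mb4\n"
        "FIELDS TERMINATED BY '\\t' ESCAPED BY '\\\\'\n"
        "LINES TERMINATED BY '\\n'\n"
        f"({', '.join(targets)})"
    )
    if assignments:
        sql += "\nSET " + ", ".join(assignments)
    return sql
```

To run it, the statement would be executed on a pymysql connection opened with `local_infile=True` (the server must also permit `local_infile`), and the temp file removed afterwards. Hex keeps arbitrary bytes such as tabs, newlines and NULs out of the delimiter and escape rules entirely, at the cost of doubling the on-disk size of those columns.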
- Move engine creation into each dialect (create_engine() override) so MySQL can inject local_infile=True and MSSQL can inject fast_executemany and the SERIALIZABLE isolation level automatically when a connection string is provided via DataModel.
- Add bulk_load (bool | None) and bulk_load_threshold (int | None) keyword arguments to Document.insert_into_target_tables() and bulk_insert() on all dialects. None (default) = use bulk loading when available, with silent fallback; True = require it and raise RuntimeError with an actionable message if unavailable; False = always use executemany.
- Remove the use_bcp parameter (never publicly released).
- Default thresholds: 100 rows for MySQL / MSSQL / DuckDB (file-based), 0 for PostgreSQL (in-protocol COPY, no temp file).
- Add corresponding tests for the bulk_load=True/False/RuntimeError paths on MySQL and MSSQL.

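The bulk_load / bulk_load_threshold semantics above reduce to a small decision function. This is a hedged sketch: `choose_insert_path`, its `bulk_available` flag and the returned strings are hypothetical, and two details are assumptions the commit message does not settle (an explicit `True` bypasses the threshold, and the threshold comparison is inclusive).

```python
from typing import Optional

# Default per-dialect row thresholds as listed in the commit message.
DEFAULT_BULK_LOAD_THRESHOLDS = {"mysql": 100, "mssql": 100, "duckdb": 100, "postgresql": 0}


def choose_insert_path(
    dialect: str,
    n_rows: int,
    *,
    bulk_available: bool,
    bulk_load: Optional[bool] = None,
    bulk_load_threshold: Optional[int] = None,
    unavailable_reason: str = "no supported driver or tool found",
) -> str:
    """Return "bulk" or "executemany" (hypothetical helper)."""
    if bulk_load is False:
        return "executemany"  # caller opted out explicitly
    if not bulk_available:
        if bulk_load is True:
            raise RuntimeError(
                f"bulk_load=True was requested, but native bulk loading is unavailable "
                f"for {dialect}: {unavailable_reason}. Install a supported driver/tool, "
                "or pass bulk_load=None to fall back to executemany."
            )
        return "executemany"  # bulk_load=None: silent fallback
    if bulk_load is True:
        return "bulk"  # assumption: an explicit True ignores the threshold
    threshold = (
        DEFAULT_BULK_LOAD_THRESHOLDS[dialect]
        if bulk_load_threshold is None
        else bulk_load_threshold
    )
    # assumption: batches at or above the threshold use the native path
    return "bulk" if n_rows >= threshold else "executemany"
```

A PostgreSQL threshold of 0 means every non-empty batch goes through COPY, which matches the rationale that COPY needs no temp file and so has no fixed setup cost to amortize.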
Add a "Bulk loading" subsection to how_it_works.md explaining the native bulk-loading mechanism used by each backend (COPY, LOAD DATA LOCAL INFILE, BCP, read_csv), the required drivers/tools, the default per-dialect row threshold, and how the new bulk_load / bulk_load_threshold parameters on Document.insert_into_target_tables() control the behaviour.

Expand the driver lists in index.md, getting_started.md, and README.md to include psycopg (psycopg3) alongside psycopg2 for PostgreSQL, and mysqlclient alongside pymysql for MySQL.
Swap "value — description" patterns for "value to/: description" phrasing throughout dialect and document docstrings.
test_mysql_bulk_insert_numeric_types uses a table with only i/bi/si/d columns, so order_by(table.c.id) raised AttributeError. Apply ordering only when the id column is present.
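A sketch of that fix in SQLAlchemy Core, assuming the test reads rows back with `select()`. `select_for_comparison` is a hypothetical helper name; the check relies on `ColumnCollection` supporting `in` with a string key, which avoids the `AttributeError` that `table.c.id` raises when no such column exists.

```python
import sqlalchemy as sa


def select_for_comparison(table: sa.Table):
    """Build a SELECT for reading rows back (hypothetical test helper)."""
    stmt = sa.select(table)
    # Tables such as the numeric-types fixture have no `id` column, so
    # `table.c.id` would raise AttributeError; order only when it exists.
    if "id" in table.c:
        stmt = stmt.order_by(table.c.id)
    return stmt
```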
Summary
- Add `LOAD DATA LOCAL INFILE` bulk loading for MySQL/MariaDB via the `pymysql` and `mysqlclient` drivers, with hex-encoded binary columns decoded server-side via `UNHEX()`
- Move engine creation into each dialect (`create_engine()` override) so MySQL can inject `local_infile=True` and MSSQL can inject `fast_executemany`/`SERIALIZABLE` automatically when a connection string is provided
- Add `bulk_load` (`True`/`False`/`None`) and `bulk_load_threshold` parameters to `Document.insert_into_target_tables()`: `None` uses the native path with silent fallback, `True` requires it and raises `RuntimeError` with an actionable message if unavailable, and `False` forces executemany
- Remove `use_bcp` (never released publicly)