[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table #54450

Open
EnricoMi wants to merge 17 commits into apache:master from G-Research:jdbc-upsert-merge-temp-table

@EnricoMi

What changes were proposed in this pull request?

Implements upsert mode for SaveMode.Append of the MsSql, Postgres, Derby, H2 and Oracle JDBC sources.

This uses MERGE INTO in combination with a temporary table. A batch of rows is inserted into the temporary table (rather than the target table) and merged into the target table with one MERGE INTO command per batch.
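The per-batch merge step described above can be illustrated with a small, hedged sketch. It only builds a generic MERGE INTO statement as a string; the helper name `build_merge_sql`, the aliases `t` and `s`, and the table and column names are hypothetical, and the SQL the PR actually generates may differ per dialect (terminators, alias syntax, quoting).

```python
def build_merge_sql(target, temp, key_columns, columns):
    """Build a generic MERGE INTO statement that upserts rows from `temp` into `target`.

    Illustrative only: real dialects differ in details, and identifiers
    are not quoted or escaped here.
    """
    # Columns that are not part of the key get overwritten on a match.
    non_keys = [c for c in columns if c not in key_columns]
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    updates = ", ".join(f"{c} = s.{c}" for c in non_keys)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)

    parts = [f"MERGE INTO {target} t", f"USING {temp} s", f"ON ({on})"]
    if updates:  # skip the update branch when every column is a key column
        parts.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    parts.append(
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
    return "\n".join(parts)


# One such statement would run per batch, after the batch has been
# inserted into the temporary table.
print(build_merge_sql("target_table", "temp_table", ["id"], ["id", "name"]))
```

Staging the batch in a temporary table lets the database resolve matches in a single statement per batch instead of one round trip per row.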

See #41518 for an alternative for databases not supporting MERGE INTO syntax.

Why are the changes needed?

The JDBC writer only supports either truncating the existing table or inserting. Duplicates, i.e. rows with identical values in the primary or unique index columns, cause an exception. Upsert mode avoids this by updating existing rows and inserting new ones.

Re-evaluating a partition due to executor loss will attempt to insert rows that have already been inserted in an earlier attempt, which kills the entire Spark job.

Does this PR introduce any user-facing change?

This adds upsert and upsertKeyColumns options for SaveMode.Append of the JDBC source.
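A minimal PySpark sketch of how these options might be used, assuming this PR is applied. The option names `upsert` and `upsertKeyColumns` come from the description above; the value formats (`"true"` and a comma-separated column list) and the helper name `write_with_upsert` are assumptions for illustration, not confirmed by the PR text.

```python
def write_with_upsert(df, url, table, key_columns):
    """Append `df` to a JDBC table with upsert enabled.

    Assumptions: `upsert` takes "true" and `upsertKeyColumns` takes a
    comma-separated list of column names. `df` is a PySpark DataFrame.
    """
    (
        df.write.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("upsert", "true")                            # assumed value format
        .option("upsertKeyColumns", ",".join(key_columns))   # assumed value format
        .mode("append")                                      # SaveMode.Append
        .save()
    )
```

A call could look like `write_with_upsert(df, jdbc_url, "target_table", ["id"])`, where `jdbc_url` and the table name are placeholders.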

How was this patch tested?

Tests in JdbcSuite and integration suites.

Re-opens #41611
