[SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC #54449

Open
EnricoMi wants to merge 15 commits into apache:master from G-Research:jdbc-upsert-2

@EnricoMi
Contributor

What changes were proposed in this pull request?

This is a follow-up on #16685 and #16692.

Implements upsert mode for SaveMode.Append of the MySql, MsSql, and Postgres JDBC source.

See #41611 for an alternative using the MERGE INTO command (not supported by MySql).
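To illustrate the difference from a MERGE INTO based approach, here is a minimal sketch of the native upsert statements that Postgres and MySQL offer. It is not the SQL generation code of this PR: the function `build_upsert_sql` and its parameters are hypothetical, identifiers are not quoted, and MsSql is left out.

```python
def build_upsert_sql(dialect, table, columns, key_columns):
    """Build a parameterized upsert statement (illustrative, identifiers unquoted)."""
    # Columns that are not part of the key are the ones updated on conflict.
    update_columns = [c for c in columns if c not in key_columns]
    if not key_columns or not update_columns:
        raise ValueError("need at least one key column and one non-key column")

    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    insert = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"

    if dialect == "postgres":
        # Postgres names the conflict target explicitly.
        keys = ", ".join(key_columns)
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_columns)
        return f"{insert} ON CONFLICT ({keys}) DO UPDATE SET {updates}"
    if dialect == "mysql":
        # MySQL reacts to a violation of any primary or unique key.
        updates = ", ".join(f"{c} = VALUES({c})" for c in update_columns)
        return f"{insert} ON DUPLICATE KEY UPDATE {updates}"
    raise ValueError(f"unsupported dialect: {dialect}")


print(build_upsert_sql("postgres", "users", ["id", "name"], ["id"]))
print(build_upsert_sql("mysql", "users", ["id", "name"], ["id"]))
```

Note that the MySQL form takes no explicit conflict target, so the key columns only determine which columns are left out of the update list.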

Why are the changes needed?

The JDBC writer currently supports only truncating the existing table or inserting into it. Duplicates, i.e. rows with identical values in the primary key or unique index columns, cause an exception, which prevents updating existing rows while inserting new ones.

Re-evaluating a partition after executor loss re-inserts rows that were already inserted by an earlier attempt, which fails the entire Spark job.
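The retry problem can be reproduced without Spark. The sketch below uses an in-memory SQLite database as a stand-in for the target database (an assumption made only for this illustration; SQLite 3.24 or later is needed for its upsert syntax). The table `target` and the helpers `write_partition` and `simulate_retry` are hypothetical names.

```python
import sqlite3


def write_partition(conn, rows, upsert):
    """Write one partition's rows, either as plain inserts or as upserts."""
    sql = "INSERT INTO target (id, value) VALUES (?, ?)"
    if upsert:
        sql += " ON CONFLICT (id) DO UPDATE SET value = excluded.value"
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)


def simulate_retry(upsert):
    """Write the same partition twice, as a re-evaluated task would."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
    partition = [(1, "a"), (2, "b")]

    write_partition(conn, partition, upsert)  # first attempt
    try:
        write_partition(conn, partition, upsert)  # second attempt, same rows
        outcome = "ok"
    except sqlite3.IntegrityError:
        outcome = "failed"

    row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return outcome, row_count


print(simulate_retry(upsert=False))
print(simulate_retry(upsert=True))
```

With plain inserts the second attempt hits the primary key constraint, while the upsert variant can be replayed and leaves the same two rows in place.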

Does this PR introduce any user-facing change?

This adds upsert and upsertKeyColumns options for SaveMode.Append of the JDBC source.
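A write using these options might look like the following PySpark sketch. The option names `upsert` and `upsertKeyColumns` come from this description; the value formats (`"true"` and a comma-separated column list) are assumptions, as is the helper name `write_jdbc_upsert`.

```python
def write_jdbc_upsert(df, url, table, key_columns):
    """Append df to a JDBC table with upsert enabled (option values are assumed)."""
    (
        df.write.format("jdbc")
        .mode("append")  # upsert applies to SaveMode.Append
        .option("url", url)
        .option("dbtable", table)
        .option("upsert", "true")  # assumed value format
        .option("upsertKeyColumns", ",".join(key_columns))  # assumed: comma-separated
        .save()
    )
```

Here `df` is expected to be a Spark DataFrame, and `url` a JDBC URL for one of the supported databases.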

How was this patch tested?

Tests in JdbcSuite and integration suites.

Re-opens #49528.
