
[FLINK-39438][MaxCompute] Add support for sink operation types in MaxCompute options #4377

Open
dion-ricky wants to merge 1 commit into apache:master from dion-ricky:FLINK-39438-maxcompute-pipeline-sink-operation

@dion-ricky dion-ricky commented Apr 15, 2026

Problem

  • Sink operation mode cannot be explicitly configured via pipeline YAML.
  • Behavior is tightly coupled with MaxCompute table type.
  • Schema evolution auto-creation always produces transactional tables, which forces upsert mode.
  • No way to override this behavior for append use cases.

Expected Behavior

  • Users can explicitly define sink behavior (append or upsert) via configuration.
  • Writer selection logic respects the configured option instead of table type.

Proposal

Example YAML Configuration:

sink:
  type: maxcompute
  name: maxcompute
  access-id: ${secret_values.maxcompute_access_id}
  access-key: ${secret_values.maxcompute_access_key}
  endpoint: ${secret_values.maxcompute_endpoint}
  project: maxcompute_project
  quota.name: res_grp
  sink.operation: append 
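
As a rough illustration of how a `sink.operation` string could be mapped to a mode, here is a minimal sketch. It is not the code in this PR: the enum below is a self-contained stand-in, and the case-insensitive, whitespace-tolerant parsing and the error message are assumptions made for the example.

```java
import java.util.Locale;

/** Hypothetical stand-in for the sink operation enum discussed in this PR. */
enum SinkOperation {
    APPEND("append"),
    UPSERT("upsert");

    private final String value;

    SinkOperation(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }

    /**
     * Parses a configured value, ignoring case and surrounding whitespace.
     * Unknown values are rejected with a message that lists the valid choices.
     */
    static SinkOperation fromValue(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("sink.operation must not be null");
        }
        String normalized = raw.trim().toLowerCase(Locale.ROOT);
        for (SinkOperation operation : values()) {
            if (operation.value.equals(normalized)) {
                return operation;
            }
        }
        throw new IllegalArgumentException(
                "Unsupported sink.operation '" + raw + "'; expected 'append' or 'upsert'");
    }
}
```

Rejecting unknown values at parse time, rather than falling back to a default, surfaces a typo in the pipeline YAML at startup.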

Contributor

Copilot AI left a comment


Pull request overview

Adds an explicit MaxCompute sink option to control whether the sink operates in append or upsert mode, decoupling some behavior from MaxCompute table type.

Changes:

  • Introduce sink.operation configuration (append/upsert) wired through MaxComputeDataSinkFactory into MaxComputeOptions.
  • Adjust writer selection and table-creation behavior to honor the configured sink operation.
  • Add an emulator-based E2E test to verify table creation in APPEND mode does not create transactional / PK metadata.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| .../utils/SchemaEvolutionUtilsTest.java | Adds E2E test covering append-mode table creation behavior. |
| .../EmulatorTestBase.java | Adds appendOptions test config using the new sink operation option. |
| .../writer/MaxComputeWriter.java | Uses sinkOperation to decide between upsert vs append writer. |
| .../writer/BatchAppendWriter.java | Avoids constructing PartitionSpec when partition is null/empty. |
| .../utils/SchemaEvolutionUtils.java | Creates transactional table with PKs only when in UPSERT mode. |
| .../options/MaxComputeOptions.java | Adds SinkOperation option to MaxComputeOptions (+ builder + enum). |
| .../MaxComputeDataSinkOptions.java | Defines the new sink.operation ConfigOption. |
| .../MaxComputeDataSinkFactory.java | Reads sink.operation from config and passes it into options. |
Comments suppressed due to low confidence (2)

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-maxcompute/src/main/java/org/apache/flink/cdc/connectors/maxcompute/utils/SchemaEvolutionUtils.java:94

  • When sinkOperation is UPSERT but the provided schema has no primary keys, the table will be created as non-transactional and the sink will effectively behave as append. Consider validating this combination and throwing a clear exception (or otherwise enforcing that UPSERT requires primary keys) to avoid silent misconfiguration.
        if (!CollectionUtil.isNullOrEmpty(schema.primaryKeys())
                && options.getSinkOperation() == MaxComputeOptions.SinkOperation.UPSERT) {
            tableCreator
                    .transactionTable()
                    .withBucketNum(options.getBucketsNum())
                    .withPrimaryKeys(schema.primaryKeys());
        }
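
One way the suggested validation could look is sketched below. It assumes a hypothetical helper (`TableModeCheck.requiresTransactionalTable`) and a local stand-in enum; neither exists in the connector, and the exception type and message are illustrative only.

```java
import java.util.List;

/** Hypothetical helper, not part of the connector: decides the table type up front. */
final class TableModeCheck {

    /** Local stand-in for MaxComputeOptions.SinkOperation. */
    enum SinkOperation {
        APPEND,
        UPSERT
    }

    private TableModeCheck() {}

    /**
     * Returns true when the table should be created as a transactional table.
     * UPSERT without primary keys is rejected instead of silently degrading to append.
     */
    static boolean requiresTransactionalTable(SinkOperation operation, List<String> primaryKeys) {
        if (operation == SinkOperation.APPEND) {
            // Append mode never needs a transactional table, even if the schema has keys.
            return false;
        }
        if (primaryKeys == null || primaryKeys.isEmpty()) {
            throw new IllegalArgumentException(
                    "sink.operation=upsert requires a schema with primary keys; "
                            + "define primary keys or set sink.operation=append");
        }
        return true;
    }
}
```

With a check like this, the caller would apply the transactional-table settings only when the helper returns true, and a misconfigured pipeline fails at table creation rather than at query time.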

flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-maxcompute/src/main/java/org/apache/flink/cdc/connectors/maxcompute/writer/MaxComputeWriter.java:43

  • In UPSERT mode, this falls back to BatchAppendWriter when the target table is non-transactional, which silently ignores update/delete semantics and doesn’t truly “respect the configured option”. Consider failing fast when sinkOperation is UPSERT but isTransactionalTable(...) is false (with an actionable error telling users to create a transactional table or switch to APPEND).
        if (MaxComputeUtils.isTransactionalTable(options, sessionIdentifier)
                && options.getSinkOperation() == MaxComputeOptions.SinkOperation.UPSERT) {
            return new BatchUpsertWriter(options, writeOptions, sessionIdentifier);
        } else {
            return new BatchAppendWriter(options, writeOptions, sessionIdentifier);
        }
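
A fail-fast variant of this selection might look like the following sketch. The `WriterKind` enum and the `selectWriter` signature are hypothetical; the boolean parameter stands in for the result of `MaxComputeUtils.isTransactionalTable(...)`, and the real code would construct writer instances rather than return an enum.

```java
/** Hypothetical sketch of fail-fast writer selection; names are illustrative. */
final class WriterSelection {

    /** Local stand-in for MaxComputeOptions.SinkOperation. */
    enum SinkOperation {
        APPEND,
        UPSERT
    }

    /** Stand-in for the two writer implementations named in the PR. */
    enum WriterKind {
        BATCH_APPEND,
        BATCH_UPSERT
    }

    private WriterSelection() {}

    /**
     * Picks a writer from the configured operation.
     * UPSERT on a non-transactional table throws instead of falling back to append.
     */
    static WriterKind selectWriter(SinkOperation operation, boolean transactionalTable) {
        if (operation == SinkOperation.APPEND) {
            return WriterKind.BATCH_APPEND;
        }
        if (!transactionalTable) {
            throw new IllegalStateException(
                    "sink.operation=upsert requires a transactional table; "
                            + "create a transactional table or set sink.operation=append");
        }
        return WriterKind.BATCH_UPSERT;
    }
}
```

The trade-off is that existing pipelines writing to non-transactional tables under the default mode would start failing, so a change like this would need to be weighed against backward compatibility.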


Comment on lines +133 to +147
    void testCreateTableInAppendMode() {
        try {
            String appendTable = "SCHEMA_EVOLUTION_APPEND_TABLE";
            SchemaEvolutionUtils.createTable(
                    appendOptions,
                    TableId.tableId(appendTable),
                    Schema.newBuilder()
                            .physicalColumn("PK", DataTypes.BIGINT())
                            .physicalColumn("ID1", DataTypes.BIGINT())
                            .primaryKey("PK")
                            .build());
            // In APPEND mode the table should NOT be created as a transactional table,
            // so primary key metadata should be absent even though the schema defines one.
            assertThat(odpsInstance.tables().get(appendTable).getPrimaryKey()).isEmpty();
            odpsInstance.tables().delete(appendTable, true);
@lvyanquan lvyanquan self-assigned this Apr 30, 2026
.defaultValue(4)
.withDescription("The number of concurrent with flush bucket data.");

public static final ConfigOption<MaxComputeOptions.SinkOperation> SINK_OPERATION =

Please add documentation for the newly introduced sink.operation option. This should explain:

  • upsert (default): Requires primary keys in schema. Creates a transactional table and supports update/delete semantics.
  • append: Creates a regular table regardless of primary keys. Only supports insert operations, suitable for append-only scenarios.

This helps users understand the configuration impact and choose the appropriate mode for their use case.

return value;
}

public static SinkOperation fromValue(String value) {

This method is unused and could be removed.

this.value = value;
}

public String getValue() {

This method is unused and could be removed.


3 participants