[Bug] two fixed bucket pk table join to fixed bucket table cause low performance #7385

@blackflash997997

Description

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

Both of my Paimon tables are primary key tables with a fixed bucket count of 16. I use Spark SQL to run a join on them; one table has 80 million records and the other has 150 million. With my left-join SQL, writing the result to a non-primary-key table (no bucket key) is the fastest. If I instead write to a table with a fixed bucket key or with dynamic bucketing, the job becomes more than twice as slow. How can I avoid the requirement that a primary key table use fixed or dynamic bucketing? Otherwise it severely hurts write performance.

Compute Engine

spark 3.5.2

Minimal reproduce step

Tables A and B have the same primary key and the same fixed bucket count; table C has the same fixed bucket count.

insert into C
select * from A left join B on a.id = b.id and a.id1 = b.id2
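A minimal Spark SQL setup that reproduces this layout (a sketch: only `id`, `id1`, and `id2` come from the SQL above; table schemas, the `payload` column, and the exact property values are assumptions based on Paimon's `primary-key` and `bucket` table properties):

```sql
-- Assumed schemas for the two fixed-bucket primary-key source tables
CREATE TABLE A (
  id BIGINT,
  id1 BIGINT,
  payload STRING
) TBLPROPERTIES (
  'primary-key' = 'id,id1',
  'bucket' = '16'
);

CREATE TABLE B (
  id BIGINT,
  id2 BIGINT,
  payload STRING
) TBLPROPERTIES (
  'primary-key' = 'id,id2',
  'bucket' = '16'
);

-- Target table C with the same fixed bucket count; writing here is
-- where the reported slowdown appears
CREATE TABLE C (
  id BIGINT,
  id1 BIGINT,
  payload STRING
) TBLPROPERTIES (
  'primary-key' = 'id,id1',
  'bucket' = '16'
);
```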

What doesn't meet your expectations?

I want to be able to set table C to a no-bucket mode so that the write performs well.
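For reference, in Paimon the bucketing requirement is tied to the primary key: a primary key table must use either fixed buckets or dynamic buckets (`'bucket' = '-1'`). If the primary key constraint on C can be dropped, an append table with `'bucket' = '-1'` (unaware-bucket mode) avoids bucket shuffling on write — a sketch under that assumption, with the schema carried over from the reproduce step:

```sql
-- Assumed workaround: define C as an append table (no primary key),
-- where 'bucket' = '-1' selects Paimon's unaware-bucket mode and no
-- bucket key is required
CREATE TABLE C (
  id BIGINT,
  id1 BIGINT,
  payload STRING
) TBLPROPERTIES (
  'bucket' = '-1'
);
```

This matches the observation in the report that writing to a non-primary-key table without a bucket key is the fastest path, at the cost of losing upsert-by-key semantics on C.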

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
