Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
My two Paimon tables are both primary key tables with a fixed bucket count of 16. I use Spark SQL to run a join between them: one table has 80 million records, the other 150 million. With my left join, writing the result to a non-primary-key table (one without a bucket key) is the fastest. However, writing to a table with a fixed bucket key or with dynamic bucketing is more than twice as slow. How can I remove the requirement that a primary key table use fixed or dynamic bucketing? Otherwise it severely hurts write performance.
Compute Engine
spark 3.5.2
Minimal reproduce step
Tables A and B have the same primary key and the same fixed bucket count; table C has the same fixed bucket count.

```sql
INSERT INTO C
SELECT * FROM A a LEFT JOIN B b ON a.id = b.id AND a.id1 = b.id2;
```
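For context, here is a sketch of table definitions matching the setup described above. The column names beyond the join keys are hypothetical, and the DDL assumes Paimon's Spark catalog conventions, where the primary key and fixed bucket count are declared via the `primary-key` and `bucket` table properties:

```sql
-- Hypothetical DDL for the described setup (column names assumed).
-- A and B: primary key tables with a fixed bucket count of 16.
CREATE TABLE A (id BIGINT, id1 BIGINT, v STRING)
TBLPROPERTIES ('primary-key' = 'id,id1', 'bucket' = '16');

CREATE TABLE B (id BIGINT, id2 BIGINT, v STRING)
TBLPROPERTIES ('primary-key' = 'id,id2', 'bucket' = '16');

-- C: the write target, with the same fixed bucket count.
CREATE TABLE C (id BIGINT, id1 BIGINT, v STRING)
TBLPROPERTIES ('primary-key' = 'id,id1', 'bucket' = '16');

-- The slow write path: the bucketed sink forces a shuffle by bucket key.
INSERT INTO C
SELECT a.* FROM A a LEFT JOIN B b ON a.id = b.id AND a.id1 = b.id2;
```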
What doesn't meet your expectations?
I would like to be able to set table C to a no-bucket mode to improve performance.
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!