[SPARK-55674][PYTHON] Optimize 0-column table conversion in Spark Connect by Yicong-Huang · Pull Request #54468 · apache/spark

Yicong-Huang · 2026-02-25T01:51:31Z

What changes were proposed in this pull request?

Replace pa.Table.from_struct_array(pa.array([{}] * len(data), type=pa.struct([]))) with pa.Table.from_batches([pa.RecordBatch.from_pandas(data)]) in connect/session.py when handling 0-column pandas DataFrames. The new call is an O(1) operation, regardless of how many rows there are.

Why are the changes needed?

The original approach builds a Python list with len(data) entries ([{}] * len(data)) and converts it element by element, which is O(n). pa.RecordBatch.from_pandas is an O(1) operation regardless of the number of rows, as it reads the row count directly from pandas index metadata without allocating per-row Python objects.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.


ueshin · 2026-02-25T02:14:05Z

Can't we apply this to spark/python/pyspark/sql/conversion.py (line 289 in 01bfd80)?

return pa.RecordBatch.from_struct_array(pa.array([{}] * len(data), arrow_type))

Yicong-Huang · 2026-02-25T02:27:23Z

"Can't we apply this to [conversion.py, line 289]?"

Not for this case. In that case "data" is empty but the schema is non-empty, so we cannot convert data directly and rely on the information it carries: the resulting columns would not match the schema.

ueshin

LGTM, pending tests.

HyukjinKwon · 2026-02-25T06:56:19Z

Merged to master.

fix: use pa.RecordBatch.from_pandas for 0-column table in connect session

01bfd80

Yicong-Huang changed the title ~~[SPARK-55674][PYTHON] Use pa.RecordBatch.from_pandas for 0-column table in Spark Connect session~~ [SPARK-55674][PYTHON] Optimize 0-column table conversion in Spark Connect session Feb 25, 2026

Yicong-Huang changed the title ~~[SPARK-55674][PYTHON] Optimize 0-column table conversion in Spark Connect session~~ [SPARK-55674][PYTHON] Optimize 0-column table conversion in Spark Connect Feb 25, 2026

ueshin approved these changes Feb 25, 2026


HyukjinKwon closed this in 8ac083c Feb 25, 2026

Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-55674/followup/unify-zero-column-pandas-arrow-fix
