[SPARK-55059][PYTHON] Remove empty table workaround in toPandas #53824

Closed
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-55059/refactor/remove-empty-table-workaround

@Yicong-Huang
Contributor

@Yicong-Huang Yicong-Huang commented Jan 15, 2026

What changes were proposed in this pull request?

Remove the SPARK-51112 workaround in _convert_arrow_table_to_pandas() that bypassed PyArrow's to_pandas() for empty tables.

Why are the changes needed?

The workaround was added because arrow-java's ListVector.getBufferSizeFor(0) returned 0, causing the offset buffer to be omitted for empty nested arrays in IPC serialization, which led to a segmentation fault in PyArrow.

This has been fixed upstream in arrow-java 19.0.0 (apache/arrow-java#343), which Spark adopted in SPARK-56000 (PR #54820). The workaround is no longer necessary.
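To illustrate the offset-buffer issue described above, here is a minimal pure-Python model of how a list column's offsets are laid out. It is a sketch, not arrow-java or PyArrow code: the helper names `list_offsets` and `offset_buffer_size` are hypothetical, and it assumes the 32-bit offsets used by Arrow's List layout. The point it shows is that a list column with N rows carries N + 1 offsets, so an empty column still has one offset and a non-zero buffer size.

```python
OFFSET_WIDTH = 4  # bytes per 32-bit offset (assumption: Arrow List layout, not LargeList)


def list_offsets(rows):
    """Return Arrow-style offsets for a list column: one entry per row, plus one."""
    offsets = [0]
    for row in rows:
        # Each offset marks where the next row's values end in the child array.
        offsets.append(offsets[-1] + len(row))
    return offsets


def offset_buffer_size(row_count):
    """Bytes needed to hold the offsets of `row_count` list values."""
    return (row_count + 1) * OFFSET_WIDTH


print(list_offsets([[1, 2], [], [3]]))  # [0, 2, 2, 3]
print(list_offsets([]))                 # [0]
print(offset_buffer_size(0))            # 4
```

Under this model, a size of 0 for an empty list column corresponds to leaving out the single leading offset, which matches the omitted-buffer behavior this PR description attributes to the old getBufferSizeFor(0).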

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The existing test test_to_pandas_for_empty_df_with_nested_array_columns passes.

Was this patch authored or co-authored using generative AI tooling?

No.

@Yicong-Huang Yicong-Huang marked this pull request as draft January 15, 2026 23:30
@github-actions

JIRA Issue Information

=== Improvement SPARK-55059 ===
Summary: Remove empty table workaround in toPandas
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@Yicong-Huang Yicong-Huang force-pushed the SPARK-55059/refactor/remove-empty-table-workaround branch from fa4f1a3 to 754c775 Compare January 16, 2026 00:53
@Yicong-Huang
Contributor Author

Yicong-Huang commented Jan 16, 2026

waiting for #53822

@Yicong-Huang Yicong-Huang force-pushed the SPARK-55059/refactor/remove-empty-table-workaround branch from 754c775 to dc3371b Compare January 16, 2026 01:04
@Yicong-Huang
Contributor Author

depends on #54820

@Yicong-Huang Yicong-Huang force-pushed the SPARK-55059/refactor/remove-empty-table-workaround branch from 6421453 to af90c9c Compare March 18, 2026 20:25
@Yicong-Huang Yicong-Huang changed the title [WIP][SPARK-55059][PYTHON] Remove empty table workaround in toPandas [SPARK-55059][PYTHON] Remove empty table workaround in toPandas Mar 18, 2026
@Yicong-Huang Yicong-Huang marked this pull request as ready for review March 18, 2026 20:26
@Yicong-Huang Yicong-Huang force-pushed the SPARK-55059/refactor/remove-empty-table-workaround branch from af90c9c to 2acd216 Compare March 19, 2026 00:43
@Yicong-Huang
Contributor Author

cc @ueshin @HyukjinKwon @zhengruifeng this is ready for review.

@HyukjinKwon
Member

Merged to master.
