[SPARK-47998][PS] Support native pandas inputs in pandas-on-Spark concat#55561

Open
hvph-uyen wants to merge 3 commits into apache:master from hvph-uyen:spark-47998-ps-concat

@hvph-uyen

What changes were proposed in this pull request?

This PR updates pandas-on-Spark concat so that native pandas DataFrame and Series objects are accepted when they are passed inside the input iterable.

It also fixes the error message for unsupported inputs so that it reports the actual invalid element type instead of the outer container type.

The existing behavior for bare inputs such as ps.concat(pdf) and ps.concat(pser) is preserved.
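
The intended handling of the iterable case can be illustrated with a minimal, pure-Python sketch. It is not the code in this patch: `NativeFrame`, `SparkFrame`, `to_spark`, and `coerce_concat_inputs` are hypothetical stand-ins for native pandas objects, pandas-on-Spark objects, the conversion step, and the input check, and the error wording is assumed for illustration only.

```python
from typing import Any, Iterable, List


class NativeFrame:
    """Hypothetical stand-in for a native pandas DataFrame or Series."""

    def __init__(self, name: str) -> None:
        self.name = name


class SparkFrame:
    """Hypothetical stand-in for a pandas-on-Spark DataFrame or Series."""

    def __init__(self, name: str) -> None:
        self.name = name


def to_spark(obj: NativeFrame) -> SparkFrame:
    # Hypothetical conversion step: the real code would build a
    # pandas-on-Spark object from the native pandas one.
    return SparkFrame(obj.name)


def coerce_concat_inputs(objs: Iterable[Any]) -> List[SparkFrame]:
    """Convert native elements; pass pandas-on-Spark elements through."""
    coerced: List[SparkFrame] = []
    for obj in objs:
        if isinstance(obj, SparkFrame):
            coerced.append(obj)
        elif isinstance(obj, NativeFrame):
            coerced.append(to_spark(obj))
        else:
            # Report the type of the offending element, not the type of
            # the outer container (which would be 'list' here).
            raise TypeError(
                f"cannot concatenate object of type '{type(obj).__name__}'"
            )
    return coerced


mixed = coerce_concat_inputs([NativeFrame("a"), SparkFrame("b")])
print([type(item).__name__ for item in mixed])

try:
    coerce_concat_inputs([NativeFrame("a"), 42])
except TypeError as exc:
    print(exc)
```

In this sketch the type check runs per element, which is what lets the message name `int` rather than the enclosing `list`.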

Why are the changes needed?

Currently, ps.concat rejects native pandas objects even when they can be converted in the iterable case.

Also, the current error message is misleading because it can report list instead of the actual invalid input type.

Does this PR introduce any user-facing change?

Yes.

Before this change, ps.concat([pdf, pdf]) rejected native pandas inputs, and unsupported input errors could report list instead of the actual invalid element type.

After this change, native pandas DataFrame and Series inputs are accepted in the iterable case, and unsupported input errors report the actual invalid element type.

How was this patch tested?

  • Added regression coverage in pyspark.pandas.tests.test_namespace.
  • Ran:
    python/run-tests --python-executables python3 --testnames pyspark.pandas.tests.test_namespace

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)

Generative AI tooling was used to help inspect the issue and understand the relevant Spark codebase. The final patch was manually reviewed and tested before submission.

@HyukjinKwon
Member

I think we should not do this. Otherwise, we will have to fix the entire API surface to support pandas instances.


2 participants