SNOW-1051741: df.apply(axis=1) should preserve the original index by sfc-gh-jkew · Pull Request #3955 · snowflakedb/snowpark-python

sfc-gh-jkew · 2025-10-24T19:25:04Z

df.apply(axis=1) should preserve the original index. Previously we would return a RangeIndex regardless of the original index. This approach passes the index data into the underlying UDTF.

The approach is mostly AI-written, with original tests for verification.

Fixes SNOW-1051741
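For reference, the target behavior described above can be illustrated with native pandas. This is a minimal sketch that uses only plain pandas (not Snowpark pandas), and the frame contents are made up for illustration:

```python
import pandas as pd

# A frame with a non-default (string) index. The data is illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# Row-wise apply: native pandas keeps the original index on the result
# rather than replacing it with a RangeIndex.
result = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(result.index.tolist())
```

The result carries the labels of `df.index`, which is the behavior this PR aims to match.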

Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
- If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines

tests/integ/modin/frame/test_apply.py

sfc-gh-helmeleegy

LGTM, just had one question.


sfc-gh-mvashishtha · 2025-10-27T20:22:37Z

src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py

+            if num_index_columns > 0:
+                # Columns after row position are index columns, then data columns
+                index_cols = df.iloc[:, 1 : 1 + num_index_columns]
+                data_cols = df.iloc[:, 1 + num_index_columns :]
+
+                # Set the index using the index columns
+                if num_index_columns == 1:
+                    index = index_cols.iloc[:, 0]
+                    if index_column_pandas_labels:
+                        index.name = index_column_pandas_labels[0]
+                else:
+                    # Multi-index case
+                    index = native_pd.MultiIndex.from_arrays(
+                        [index_cols.iloc[:, i] for i in range(num_index_columns)],
+                        names=index_column_pandas_labels
+                        if index_column_pandas_labels
+                        else None,
+                    )
+                data_cols.index = index
+                df = data_cols
+            else:


Can't we use set_index() in both cases?


sfc-gh-mvashishtha · 2025-10-27T22:06:37Z

src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py

+        input_types: Snowpark column types of the input data columns (including index columns).
+        index_column_pandas_labels: The pandas labels for the index columns, if any.


Suggested change

-        input_types: Snowpark column types of the input data columns (including index columns).
-        index_column_pandas_labels: The pandas labels for the index columns, if any.
+        input_types: Snowpark column types of the input data columns (including index columns).

sfc-gh-mvashishtha · 2025-10-30T18:08:58Z

tests/integ/modin/frame/test_apply.py

+
+
+@sql_count_checker(query_count=5, join_count=2, udtf_count=1)
+def test_apply_axis_1_multiindex_preservation():


Could we also test:
- func with return type annotations. We'll use vectorized UDFs instead of UDTFs.
- func returning a series
- apply() on series (with func typed, untyped, or returning a series)
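The extra cases requested above could look roughly like the following. This is a sketch in native pandas only, so it shows the expected index behavior rather than the Snowpark pandas tests themselves; the function names `typed_sum` and `as_series` and the sample data are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]},
    index=pd.Index(["r1", "r2"], name="key"),
)


def typed_sum(row: pd.Series) -> int:
    # Case 1: func with a return type annotation.
    return int(row.sum())


def as_series(row: pd.Series) -> pd.Series:
    # Case 2: func returning a Series, so apply(axis=1) yields a DataFrame.
    return pd.Series({"total": row.sum(), "diff": row["b"] - row["a"]})


typed_result = df.apply(typed_sum, axis=1)
series_result = df.apply(as_series, axis=1)
# Case 3: apply() on a Series (untyped func).
series_apply_result = df["a"].apply(lambda v: v * 2)

# In every case the original index labels and index name should survive.
for result in (typed_result, series_result, series_apply_result):
    assert result.index.equals(df.index)
    assert result.index.name == "key"
```

In the real test suite these checks would presumably compare Snowpark pandas output against native pandas output, with the query counts adjusted per case.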

sfc-gh-mvashishtha · 2025-10-30T19:45:30Z

src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py

I meant that you can replace most of the code here with set_index(). See #3979.
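To illustrate the suggestion, here is a minimal sketch of how `set_index()` can cover both the single-index and the multi-index case without branching. It is not the code from #3979; the helper name `restore_index`, the column names, and the assumed layout (row position first, then index columns, then data columns) are assumptions based on the hunk under review:

```python
import pandas as pd


def restore_index(df, num_index_columns, index_labels=None):
    """Hypothetical helper, not the PR's implementation.

    Assumes column 0 is the row position, the next `num_index_columns`
    columns hold index data, and the remaining columns hold the data.
    """
    index_cols = list(df.columns[1 : 1 + num_index_columns])
    # set_index builds a plain Index for one key and a MultiIndex for several.
    out = df.drop(columns=df.columns[0]).set_index(index_cols)
    # Replace the temporary column names with the original pandas labels.
    out.index.names = index_labels if index_labels else [None] * num_index_columns
    return out


raw = pd.DataFrame(
    {"pos": [0, 1], "i0": ["x", "y"], "i1": [1, 2], "v": [10.0, 20.0]}
)

single = restore_index(raw[["pos", "i0", "v"]], 1, ["key"])
multi = restore_index(raw, 2, ["outer", "inner"])
```

Here `single` ends up with a plain `Index` named `key` and `multi` with a two-level `MultiIndex`, using the same code path.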

sfc-gh-mvashishtha · 2025-10-30T19:54:44Z

src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py

+        # Determine if we should pass index columns to the UDTF
+        # We pass index columns when the index is not the row position itself


We always pass the index column names here. We can keep doing that, but we should update the comment and make the parameter required, since there don't seem to be any other invocations of that function.

sfc-gh-mvashishtha · 2025-10-31T14:00:24Z

src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py

    column_index: native_pd.Index,
    input_types: list[DataType],
    session: Session,
+    index_column_labels: list[Hashable] | None = None,


It turns out that just passing the number of index columns is enough:

snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py

Line 471 in 849ba40

# columns. We don't care about the index names because `func`
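The quoted source comment is cut off, so the following is only an illustration of a related pandas fact, not a reconstruction of that comment. In native pandas, a function applied with `axis=1` receives each row's index value through `row.name`, while the name of the index itself is not part of the row. The function `describe_row` and the sample data are hypothetical:

```python
import pandas as pd


def describe_row(row):
    # row.name is the row's index *value*; the index *name* is not visible here.
    return f"{row.name}:{row['v']}"


named = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["x", "y"], name="key"))
unnamed = named.rename_axis(None)  # same index values, no index name

named_result = named.apply(describe_row, axis=1)
unnamed_result = unnamed.apply(describe_row, axis=1)

# The per-row output does not depend on the index name.
assert named_result.tolist() == unnamed_result.tolist()
```

Under that reading, the count of index columns would be enough to rebuild each row inside the UDTF, and the index names could be reattached afterwards.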

sfc-gh-jkew added 4 commits October 24, 2025 11:02

Failed tests

e085f83

Tests partially pass test_apply

ab4a197

Merge branch 'main' of github.com:snowflakedb/snowpark-python into jkew/apply.axis.1.row.index.0

751e50c

lint

47ebc94

github-actions bot added the snowpark-pandas label Oct 24, 2025

sfc-gh-jkew added the NO-PANDAS-CHANGEDOC-UPDATES label (This PR does not update Snowpark pandas docs) Oct 24, 2025

sfc-gh-jkew added 3 commits October 24, 2025 13:06

Changelog

a6c70ab

Add test for an index from column

8fda927

Merge branch 'main' into jkew/apply.axis.1.row.index.0

3085ea3

sfc-gh-jkew marked this pull request as ready for review October 24, 2025 20:56

sfc-gh-jkew requested a review from a team as a code owner October 24, 2025 20:56

sfc-gh-jkew requested review from sfc-gh-helmeleegy and sfc-gh-mvashishtha October 24, 2025 20:56

sfc-gh-helmeleegy reviewed Oct 24, 2025

tests/integ/modin/frame/test_apply.py (resolved)

sfc-gh-helmeleegy approved these changes Oct 24, 2025

sfc-gh-jkew commented Oct 24, 2025

src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py (outdated, resolved)

sfc-gh-jkew added 3 commits October 24, 2025 15:30

Remove stale branch

93b282b

Merge branch 'jkew/apply.axis.1.row.index.0' of github.com:snowflakedb/snowpark-python into jkew/apply.axis.1.row.index.0

8a7f8bb

Merge branch 'main' into jkew/apply.axis.1.row.index.0

59c915e

sfc-gh-mvashishtha reviewed Oct 27, 2025

sfc-gh-jkew added 10 commits October 27, 2025 14:05

Clean up some AI stuff

056e54d

Merge branch 'jkew/apply.axis.1.row.index.0' of github.com:snowflakedb/snowpark-python into jkew/apply.axis.1.row.index.0

33fd482

Use set_index

e5b352c

Merge branch 'main' into jkew/apply.axis.1.row.index.0

20f3056

Merge branch 'main' into jkew/apply.axis.1.row.index.0

55d31a4

More cleanup

73961bd

Merge branch 'jkew/apply.axis.1.row.index.0' of github.com:snowflakedb/snowpark-python into jkew/apply.axis.1.row.index.0

39bb90f

Merge branch 'main' into jkew/apply.axis.1.row.index.0

7a6161b

Merge branch 'main' into jkew/apply.axis.1.row.index.0

aa998d5

Merge branch 'main' into jkew/apply.axis.1.row.index.0

8c90e48

Merge branch 'main' into jkew/apply.axis.1.row.index.0

12d2253

sfc-gh-mvashishtha reviewed Oct 30, 2025

sfc-gh-mvashishtha reviewed Oct 31, 2025


sfc-gh-jkew wants to merge 21 commits into main from jkew/apply.axis.1.row.index.0
