Skip to content

Comments

[SPARK-55621][PYTHON] Fix ambiguous and unnecessary unicode usage#54410

Open
gaogaotiantian wants to merge 1 commit intoapache:masterfrom
gaogaotiantian:fix-ascii
Open

[SPARK-55621][PYTHON] Fix ambiguous and unnecessary unicode usage#54410
gaogaotiantian wants to merge 1 commit intoapache:masterfrom
gaogaotiantian:fix-ascii

Conversation

@gaogaotiantian
Copy link
Contributor

@gaogaotiantian gaogaotiantian commented Feb 20, 2026

What changes were proposed in this pull request?

Fixed all the unnecessary and ambiguous unicode character usage.
A set of ruff rules are also added to prevent future regressions.

Why are the changes needed?

We should avoid using non-ascii unicode character usage as much as possible. There are few rationales behind it

  • Sometimes it's just wrong. e.g. ‘index’ vs 'index'
  • Some editor (VSCode) will highlight it as a warning and some editor/terminal might not display it well
  • It's difficult to keep consistency because people don't know how to type that
  • For docstrings, it could actually be displayed somewhere while users are using it and unicode could cause problems

Does this PR introduce any user-facing change?

No.

How was this patch tested?

ruff check passed.

Was this patch authored or co-authored using generative AI tooling?

No.

Mapping correspondence.
na_action : {None, 'ignore'}
If ignore, propagate NA values, without passing them to the mapping correspondence.
If 'ignore', propagate NA values, without passing them to the mapping correspondence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought backticks are legitimate in Sphinx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backticks (`) are legit syntax, is not a backtick. It's a unicode quote.

Copy link
Contributor

@allisonwang-db allisonwang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this! Good to know.

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, although I'm uncertain if we need the comment changes and I think we should be open to dropping it in the future if we find having unicode in comments helpful for illustrating behaviour.

# ==== 3.2 Nullable Extension Types ====
# (data, target_type, expected_values)
nullable_cases = [
# Int types float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in-line unicode comments don't seem as bad as string/docstring issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is actually not enforced by ruff. The added ruff checker only checks for "ambiguous unicode usage" like the quote I mentioned above. This fix is done by myself. It's actually added pretty recently and I believe it's because LLMs like to generate icons like this.

I don't think having such characters in the comments is horrible, and in some case it might actually be helpful. But unicode characters may have issues on some IDEs/machines/editors and it's not worth it to do vs ->. I don't even know how to type by myself :) .

That being said, this enforcement will not block any unicode usages in the future - people can still do that. This specific change is a side effect when I'm trying to clean up unicode character usages in this PR.

# ambiguous unicode character
"RUF001", # string
"RUF002", # docstring
"RUF003", # comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in-line unicode comments don't seem as bad as string/docstring issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants