[SPARK-55621][PYTHON] Fix ambiguous and unnecessary unicode usage#54410
gaogaotiantian wants to merge 1 commit into apache:master
| Mapping correspondence. | ||
| na_action : {None, 'ignore'} | ||
| If ‘ignore’, propagate NA values, without passing them to the mapping correspondence. | ||
| If 'ignore', propagate NA values, without passing them to the mapping correspondence. |
I thought backticks are legitimate in Sphinx.
Backticks (`) are legit syntax, but ‘ is not a backtick. It's a unicode quote (U+2018).
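To make the difference visible, here is a minimal sketch that prints the code point and official name of each character using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration and is not part of this PR.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The backtick, the ASCII apostrophe, and the two unicode quotes
# that look similar in many fonts.
for ch in ["`", "'", "\u2018", "\u2019"]:
    print(describe(ch))
```

Only the first two are ASCII; the last two are the curly quotes that the docstring originally contained.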
allisonwang-db
left a comment
Thanks for fixing this! Good to know.
holdenk
left a comment
Approved, although I'm uncertain if we need the comment changes and I think we should be open to dropping it in the future if we find having unicode in comments helpful for illustrating behaviour.
| # ==== 3.2 Nullable Extension Types ==== | ||
| # (data, target_type, expected_values) | ||
| nullable_cases = [ | ||
| # Int types → float |
in-line unicode comments don't seem as bad as string/docstring issues.
So this is actually not enforced by ruff. The added ruff check only flags "ambiguous unicode usage" like the quote I mentioned above. I made this fix myself. The arrow was actually added pretty recently, and I believe that's because LLMs like to generate icons like this.
I don't think having such characters in comments is horrible, and in some cases it might actually be helpful. But unicode characters may cause issues on some IDEs/machines/editors, and it's not worth using → instead of ->. I don't even know how to type → myself :).
That being said, this enforcement will not block unicode usage in general in the future; people can still do that. This specific change is a side effect of my cleaning up unicode character usage in this PR.
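For anyone who wants to find such characters by hand, a rough sketch of a scanner follows. It is an assumption-laden illustration, not what ruff does: the RUF001/RUF002/RUF003 rules only flag ambiguous characters, while this hypothetical `find_non_ascii` helper reports every non-ASCII character.

```python
def find_non_ascii(text: str):
    """Yield (line_number, column, character) for every non-ASCII character."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not ch.isascii():
                yield lineno, col, ch


# A small sample modeled on the comment discussed above.
sample = "nullable_cases = [\n    # Int types \u2192 float\n]"
for lineno, col, ch in find_non_ascii(sample):
    print(f"line {lineno}, col {col}: U+{ord(ch):04X}")
```

Line and column numbers are 1-based here so they match what most editors display.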
| # ambiguous unicode character | ||
| "RUF001", # string | ||
| "RUF002", # docstring | ||
| "RUF003", # comment |
in-line unicode comments don't seem as bad as string/docstring issues.
What changes were proposed in this pull request?
Fixed all the unnecessary and ambiguous unicode character usage.
A set of ruff rules is also added to prevent future regressions.
Why are the changes needed?
We should avoid non-ASCII unicode characters as much as possible. There are a few rationales behind it:
‘index’ vs 'index'
Does this PR introduce any user-facing change?
No.
How was this patch tested?
ruff check passed.
Was this patch authored or co-authored using generative AI tooling?
No.