Normalize CSV identifiers to ASCII slugs #1438

Open
LoicOuth wants to merge 2 commits into PokeAPI:master from LoicOuth:fix/normalize-csv-identifiers

@LoicOuth LoicOuth commented Mar 12, 2026

Normalize CSV identifiers to ASCII slug format

Fixes the invalid identifiers reported in #1436

Context

This PR normalizes 16 resource identifiers in CSV files to comply with the ASCII slug format (^[a-z0-9-]+$). These identifiers contained Unicode characters, accents, or special characters that caused API endpoint issues.

Changes

items.csv (10 corrections):

  • kofu’s-wallet → kofus-wallet (Unicode apostrophe U+2019)
  • leader’s-crest → leaders-crest (Unicode apostrophe U+2019)
  • jalapeño → jalapeno (Unicode character ñ)
  • steel-bottle-(r) → steel-bottle-r (parentheses)
  • steel-bottle-(y) → steel-bottle-y (parentheses)
  • steel-bottle-(b) → steel-bottle-b (parentheses)
  • plaid-tablecloth-(y) → plaid-tablecloth-y (parentheses)
  • plaid-tablecloth-(b) → plaid-tablecloth-b (parentheses)
  • plaid-tablecloth-(r) → plaid-tablecloth-r (parentheses)
  • b&w-grass-tablecloth → bw-grass-tablecloth (ampersand)

locations.csv (2 corrections):

  • rivière-walk → riviere-walk (accent è)
  • dernière-way → derniere-way (accent è)

move_meta_categories.csv (4 corrections):

  • damage+ailment → damage-ailment (plus sign)
  • damage+lower → damage-lower (plus sign)
  • damage+raise → damage-raise (plus sign)
  • damage+heal → damage-heal (plus sign)

Normalization Rules

  • Unicode apostrophes (’) → removed (kofus-wallet, leaders-crest)
  • Accented characters (è, ñ) → ASCII equivalent (e, n)
  • Parentheses () → removed
  • Ampersand & → removed (b&w → bw)
  • Plus sign + → replaced with hyphen -
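
As a rough illustration of the rules above, they could be sketched in Python as follows. This is not the script used for this PR (the corrections may well have been made by hand); `normalize_identifier` and `SLUG_PATTERN` are hypothetical names introduced only for this example.

```python
import re
import unicodedata

# The slug format quoted in this PR.
SLUG_PATTERN = re.compile(r"[a-z0-9-]+")

# "+" maps to "-"; apostrophes (ASCII and U+2019), parentheses and "&" are deleted.
_TRANSLATION = str.maketrans("+", "-", "'\u2019()&")


def normalize_identifier(identifier: str) -> str:
    """Hypothetical helper applying the normalization rules listed above."""
    # Decompose accented letters (e.g. "è" becomes "e" + combining grave accent),
    # then drop the combining marks so only the base ASCII letter remains.
    decomposed = unicodedata.normalize("NFKD", identifier)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.translate(_TRANSLATION)


if __name__ == "__main__":
    for raw in ("kofu\u2019s-wallet", "jalapeño", "steel-bottle-(r)", "damage+ailment"):
        slug = normalize_identifier(raw)
        # fullmatch is used so a trailing newline cannot slip past the pattern.
        print(raw, "->", slug, bool(SLUG_PATTERN.fullmatch(slug)))
```

The sketch only covers the character classes that appear in this PR; other non-ASCII input would need its own rule.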

Testing

After these changes, the CSV validation test introduced in PR #1437 passes successfully:

$ python manage.py test pokemon_v2.test_models.CSVResourceNameValidationTestCase
...
Ran 2 tests in 0.036s

OK
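
For readers without access to the test from PR #1437, the kind of check it performs can be approximated with a standalone scan. The sketch below is an assumption about the general approach, not the actual test code: `find_invalid_identifiers` is a hypothetical function, and the two-column sample header (`id,identifier`) is simplified for illustration.

```python
import csv
import io
import re

# The slug format quoted in this PR; fullmatch makes the ^ and $ anchors implicit.
SLUG_PATTERN = re.compile(r"[a-z0-9-]+")


def find_invalid_identifiers(csv_text: str, column: str = "identifier") -> list[str]:
    """Hypothetical helper: return every value in `column` that is not an ASCII slug."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[column]
        for row in reader
        if not SLUG_PATTERN.fullmatch(row[column])
    ]


# Simplified sample data (assumed header), using identifiers mentioned in this PR.
sample = (
    "id,identifier\n"
    "1670,scarlet-book\n"
    "1671,violet-book\n"
    "1672,kofu\u2019s-wallet\n"
)

if __name__ == "__main__":
    print(find_invalid_identifiers(sample))
```

Run against the sample, only the identifier containing the U+2019 apostrophe is reported; after the rename to kofus-wallet the list would be empty.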

- Remove Unicode apostrophes (’) from item names
- Remove accents (è) from location names
- Remove Unicode characters (ñ) from item names
- Remove parentheses from item identifiers
- Replace + with - in move meta categories
- Replace & with nothing in item names

Fixes 16 invalid identifiers to comply with ASCII slug pattern ^[a-z0-9-]+$

phalt commented Mar 13, 2026

Fixes the invalid identifiers identified by #[PR number of first PR]

Is there an error here? Which issue?

@LoicOuth

Fixes the invalid identifiers identified by #[PR number of first PR]

Is there an error here? Which issue?

I updated the identifiers in the original message.
It's linked to issue #1436 and discussion #1435.

phalt commented Mar 13, 2026

Just a note on your original PR:

You're absolutely right

I think a lot of your replies in the previous issue and PR are LLM generated. It is really hard to tell what is human and what isn't anymore on this platform.

I strongly suggest writing your own messages back to humans and not relying on a bot to do your thinking. We will not accept contributions from humans who haven't given good thought to what they're doing and just copied and pasted a prompt to a bot and then opened a PR. That isn't contribution, that is noise. Outsourcing effort to machines shifts the effort from you creating the code, to us reviewing it and dealing with what the bot is producing.

  1670,scarlet-book,20,0,,
  1671,violet-book,20,0,,
- 1672,kofu’s-wallet,22,0,,
+ 1672,kofus-wallet,22,0,,
Member

Given these were all originally broken and not returning anything, I think it's okay to just update them?

FWIW - I feel like most people use the hypermedia links in API responses to access items, which uses IDs.

Contributor

Yes I was pretty sure from the beginning that these PRs were both AI, but I figured I'd be thorough with the review since this was something that would benefit future contributions. Since we offer accessing the URLs with the identifiers (which is actually what I tend to use) I think it's best to make sure that the URLs are all accessible.

It's been frustrating that the past three PRs have been all AI, but I see you're updating AI guidelines so I hope things will be different moving forward.

LoicOuth commented Mar 13, 2026

Just a note on your original PR:

You're absolutely right

I think a lot of your replies in the previous issue and PR are LLM generated. It is really hard to tell what is human and what isn't anymore on this platform.

I strongly suggest writing your own messages back to humans and not relying on a bot to do your thinking. We will not accept contributions from humans who haven't given good thought to what they're doing and just copied and pasted a prompt to a bot and then opened a PR. That isn't contribution, that is noise. Outsourcing effort to machines shifts the effort from you creating the code, to us reviewing it and dealing with what the bot is producing.

OK, I understand. It’s true that I used an LLM to write the PR descriptions and some messages to save time. I also understand that this can make the review process take longer for you and that it may raise doubts about my code, but I have no problem discussing it if necessary. I’ll be more careful next time and will write my messages myself. I apologize.
