gh-149891: Add more encoding aliases #149892

Open
serhiy-storchaka wants to merge 2 commits into python:main from serhiy-storchaka:encodings-aliases-iana

Member

@serhiy-storchaka serhiy-storchaka commented May 15, 2026

Support all aliases officially registered in IANA.

New names: Extended_UNIX_Code_Packed_Format_for_Japanese, KSC_5601, KS_C_5601-1989, iso-ir-149, GB_2312-80, windows-936, mac, CCSID00858, CCSID01140, and a number of "cs"-prefixed names.

Fix csHPRoman8, which was not normalized.
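
For context, here is a minimal sketch of how an alias lookup of this kind can be checked. It assumes the table is `encodings.aliases.aliases` and that keys are expected to be lowercase with punctuation collapsed to underscores. `resolve_alias` and `unnormalized_keys` are hypothetical helpers written for illustration; they are not part of this patch.

```python
import encodings
from encodings.aliases import aliases


def resolve_alias(name):
    """Return the codec module name an alias maps to, or None if unknown."""
    # Approximate the normalization applied before the alias table is
    # consulted: collapse punctuation to "_" and lowercase the result.
    key = encodings.normalize_encoding(name).lower()
    return aliases.get(key)


def unnormalized_keys():
    """List alias keys that a normalized lookup could never match."""
    return [k for k in aliases
            if k != encodings.normalize_encoding(k).lower()]


print(resolve_alias("csISOLatin1"))
print(resolve_alias("ISO-8859-1"))
print(unnormalized_keys())
```

A mixed-case key such as `csHPRoman8` would show up in the second list, because a lookup only ever sees the lowercased form of the requested name.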

serhiy-storchaka added the "needs backport to 3.15" label May 15, 2026
Comment thread Misc/NEWS.d/next/Library/2026-05-15-19-52-41.gh-issue-149891.BJUIGB.rst Outdated
…JUIGB.rst

Co-authored-by: Stan Ulbrych <stan@python.org>
Comment thread Lib/encodings/aliases.py
# euc_jp codec
'cseucpkdfmtjapanese' : 'euc_jp',
'eucjp' : 'euc_jp',
'extended_unix_code_packed_format_for_japanese' : 'euc_jp',
Member

Where did you find such a long encoding name?

Member

@StanFromIreland StanFromIreland Jun 4, 2026

Member

Looking at the IANA page at https://www.iana.org/assignments/character-sets/character-sets.xhtml this does not appear to be an encoding name actually used in practice, but just an explanation of what "EUC-JP" stands for.

Safe to remove, I think.

Comment thread Lib/encodings/aliases.py
'euckr' : 'euc_kr',
'iso_ir_149' : 'euc_kr',
'korean' : 'euc_kr',
'ks_c_5601_1987' : 'euc_kr',
Member

See #62825; I think these need to be changed.

Comment thread Lib/encodings/aliases.py
'cseuckr' : 'euc_kr',
'csksc56011987' : 'euc_kr',
'euckr' : 'euc_kr',
'iso_ir_149' : 'euc_kr',
Member

See #62825

Comment thread Lib/encodings/aliases.py
'ks_c_5601_1987' : 'euc_kr',
'ksc_5601' : 'euc_kr',
'ksx1001' : 'euc_kr',
'ks_x_1001' : 'euc_kr',
Member

See #62825

Most of these aliases should really go to cp949.
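
One way to see why the target codec matters, as a rough sketch that assumes CPython's bundled `euc_kr` and `cp949` codecs: text limited to the common KS X 1001 repertoire produces identical bytes under both, while bytes from the cp949 extension area are rejected by the `euc_kr` decoder. The helper `can_decode` and the sample strings are illustrative choices, not taken from the patch or from #62825.

```python
def can_decode(data, codec):
    """Return True if `data` decodes cleanly with `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


common = "한글"    # syllables covered by both codecs
extended = "뷁"    # syllable assumed to live only in the cp949 extension area

same_bytes = common.encode("euc_kr") == common.encode("cp949")
cp949_bytes = extended.encode("cp949")

print(same_bytes)
print(can_decode(cp949_bytes, "cp949"))
print(can_decode(cp949_bytes, "euc_kr"))
```

If data labelled with one of these names is in practice produced by cp949-based software, mapping the alias to `euc_kr` could fail on such input, whereas `cp949` decodes both cases.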

bedevere-app Bot commented Jun 4, 2026

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Labels

awaiting changes, needs backport to 3.15
