Fix censor behavior for Japanese diacritics #47

dgisser · 2025-12-22T04:27:00Z

Restore Japanese Kana With Dakuten

Added a regression test showing that Japanese kana like パピプペポ / ぱぴぷぺぽ currently get normalized to their unvoiced forms (ハヒフヘホ / はひふへほ) by the diacritic stripping pass.
Updated the normalization filter (src/censor.rs:159-175) so the combining dakuten (\u{3099}) and handakuten (\u{309A}) marks are preserved while still removing other nonspacing marks and banned characters. This keeps Japanese kana intact without loosening other filtering behavior.
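For readers without the diff at hand, the idea can be sketched as follows. This is a minimal illustration, not the code in src/censor.rs: the names `KANA_MARKS`, `is_strippable_mark`, and `strip_marks` are hypothetical, the input is assumed to be already decomposed (so パ arrives as ハ followed by U+309A), and the mark check is deliberately narrowed to the Combining Diacritical Marks block rather than the full set of nonspacing marks.

```rust
/// Combining dakuten (U+3099) and handakuten (U+309A), the two marks to keep.
/// Hypothetical name, for illustration only.
const KANA_MARKS: [char; 2] = ['\u{3099}', '\u{309A}'];

/// Simplified stand-in for a "nonspacing mark" test. Assumption: only the
/// Combining Diacritical Marks block (U+0300..=U+036F) and the two kana
/// marks are treated as marks here; a real filter would cover far more.
fn is_strippable_mark(c: char) -> bool {
    let is_mark = matches!(c, '\u{0300}'..='\u{036F}') || KANA_MARKS.contains(&c);
    // A mark is stripped unless it is one of the two kana marks.
    is_mark && !KANA_MARKS.contains(&c)
}

/// Removes strippable marks from already-decomposed text while leaving the
/// kana voicing marks attached to their base characters.
fn strip_marks(decomposed: &str) -> String {
    decomposed
        .chars()
        .filter(|c| !is_strippable_mark(*c))
        .collect()
}
```

With this sketch, `strip_marks("e\u{0301}")` yields `"e"`, while `strip_marks("ハ\u{309A}")` returns its input unchanged, so recomposing it afterwards would still give パ rather than ハ.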

Testing

Before fix:

cargo +nightly test japanese_diacritics_preserved -- --nocapture
...
thread 'censor::tests::japanese_diacritics_preserved' panicked
  left: "パピプペポ"
  right: "ハヒフヘホ"

After fix:

cargo +nightly test japanese_diacritics_preserved -- --nocapture
running 1 test
test censor::tests::japanese_diacritics_preserved ... ok

Full suite:

cargo +nightly test censor

finnbear · 2025-12-22T05:33:08Z

Looks great, thanks! I did initially suggest detecting Japanese (like Devanagari) but this is a better solution as there are only 2 diacritics and they're hard to abuse. This change will take effect in our games next time I update the filter.

dgisser · 2025-12-22T06:38:34Z

for sure, let me know when it pushes to prod and I'll test it out!

dgisser force-pushed the main branch 2 times, most recently from ec27d63 to e84841b on December 22, 2025 04:31

fix censor behavior for japanese diactritics

e84841b

dgisser had a problem deploying to github-pages with GitHub Actions on December 22, 2025 05:31 (Failure)

finnbear merged commit 85b2334 into finnbear:main Dec 22, 2025
1 of 2 checks passed
