
[FEATURE]: Transliterate German Umlauts with Digraphs #253

@weilbith

Description


Feature Description

Hi 👋

It would be great if Codebook could take digraphs into account when checking spelling, so that transliterated diacritical marks are recognized. That would improve spell checking, reduce false positives, and reduce friction.

Use Case

When working with a language that uses diacritical marks such as umlauts, it is sometimes necessary to transliterate them into digraphs. In my case (German), umlauts are expanded like this: ä -> ae, ö -> oe, ü -> ue. Unfortunately, regular Hunspell dictionaries no longer match these spellings, so every German word with an expanded umlaut produces a false positive.

Proposed Solution

Codebook somehow recognizes these expanded spellings and matches them properly. This might be as simple as generating variants of every term with the transliteration reversed (ae -> ä, oe -> ö, ue -> ü) and checking whether any of the variants is spelled correctly.
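A rough sketch of this reverse-transliteration idea in TypeScript follows. It is a hypothetical illustration, not Codebook's implementation: the names umlautVariants and isSpelledCorrectly are made up, and a plain Set stands in for a Hunspell dictionary lookup. Because any "ae", "oe" or "ue" may or may not be a transliterated umlaut, each occurrence is branched both ways.

```typescript
// Hypothetical digraph -> umlaut table, limited to the three pairs from this issue.
// Capitalized pairs cover words at the start of an identifier or sentence.
const DIGRAPHS: ReadonlyArray<[string, string]> = [
  ["ae", "ä"], ["oe", "ö"], ["ue", "ü"],
  ["Ae", "Ä"], ["Oe", "Ö"], ["Ue", "Ü"],
];

// Generate every spelling obtained by either keeping each digraph
// or reversing it to its umlaut. A word with n digraphs yields 2^n variants.
function umlautVariants(word: string): string[] {
  const results: string[] = [];
  const walk = (i: number, prefix: string): void => {
    if (i >= word.length) {
      results.push(prefix);
      return;
    }
    const pair = word.slice(i, i + 2);
    const match = DIGRAPHS.find(([digraph]) => digraph === pair);
    if (match !== undefined) {
      // Skipping both characters is safe: no digraph starts with "e".
      walk(i + 2, prefix + pair);     // keep the digraph as written
      walk(i + 2, prefix + match[1]); // reverse the transliteration
    } else {
      walk(i + 1, prefix + word.charAt(i));
    }
  };
  walk(0, "");
  return results;
}

// Accept the word if any variant is known. The Set stands in for a
// Hunspell dictionary lookup (assumption for illustration only).
function isSpelledCorrectly(word: string, dictionary: ReadonlySet<string>): boolean {
  return umlautVariants(word).some((variant) => dictionary.has(variant));
}
```

For example, umlautVariants("buecher") returns ["buecher", "bücher"], and the second spelling can then be looked up in the German dictionary. Since the unmodified spelling is always one of the variants, words that legitimately contain one of these letter pairs, such as "Feuer", are still accepted exactly as before.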

Alternative Solutions

Currently, this can only be worked around by adding every affected term containing an umlaut to the words list in codebook.toml.

Examples

codebook.toml:

dictionaries = ["en_us", "de"]

Source file:

export const buecherListe = [] // Bücher (books) -> buecher: flagged as an issue

Currently, this requires changing codebook.toml to:

dictionaries = ["en_us", "de"]
words = [
   "buecher",
   # ... long list of words with umlauts used in the codebase
]
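To see why the example identifier is flagged, recall that a code spell checker typically splits identifiers into words before looking them up. The TypeScript sketch below assumes a simple camelCase split; splitIdentifier and flaggedWords are hypothetical names, and Codebook's actual tokenizer may behave differently.

```typescript
// Hypothetical tokenizer: split a camelCase identifier before each uppercase
// letter (including Ä, Ö, Ü). Real tokenizers also handle acronyms, digits, etc.
function splitIdentifier(identifier: string): string[] {
  return identifier.split(/(?=[A-ZÄÖÜ])/).filter((part) => part.length > 0);
}

// Return the words of an identifier that the given checker does not accept.
// `isKnown` stands in for a dictionary lookup (an assumption, not Codebook's API).
function flaggedWords(identifier: string, isKnown: (word: string) => boolean): string[] {
  return splitIdentifier(identifier).filter((word) => !isKnown(word));
}

// Usage: a dictionary that only contains the umlaut spelling "bücher".
const germanWords = new Set(["bücher", "liste"]);
const flagged = flaggedWords("buecherListe", (word) => germanWords.has(word.toLowerCase()));
console.log(flagged); // flagged contains only "buecher"
```

Here "Liste" passes while the transliterated "buecher" does not. Passing an isKnown checker that also tries the umlaut variants of each word would clear the warning without any entries in the words list.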

Additional Context

No response

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
