Normalize text before running the converter #18

Before running the cyr-lat or lat-cyr conversion, I think it would be beneficial to normalize the input text. What I mean is that words sometimes contain characters that are not valid Karakalpak letters: they look similar or even identical to the correct ones, but their underlying Unicode values are different. For example, I have seen a Cyrillic Karakalpak word that contained the wrong letter `ӊ`, which looks similar to the correct `ң` but is a different letter.
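To make the difference visible, here is a small sketch that prints the code point and Unicode name of a character using Python's standard `unicodedata` module. The helper name `describe` is hypothetical and only for illustration; it is not part of the converter.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    # unicodedata.name() takes a default that is returned for unnamed characters.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


# The two letters from the example above: similar shape, different code points.
for ch in ("ӊ", "ң"):
    print(ch, describe(ch))
```

Running this on a suspicious word, character by character, is a quick way to spot a look-alike letter.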

So I suggest collecting a map of characters that have similar-looking equivalents. Then, when a conversion happens, we first normalize the string by replacing all wrong characters with the correct ones, and only then do the conversion.
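A minimal sketch of the idea, assuming a Python implementation. The names `LOOKALIKES`, `normalize`, and `convert` are hypothetical, and the map contains only the one pair mentioned in this issue; the real map would have to be collected.

```python
from typing import Callable

# Hypothetical map: wrong look-alike character -> correct Karakalpak letter.
# Only this pair comes from the issue; more pairs would need to be added.
LOOKALIKES = {
    "ӊ": "ң",
}

# str.maketrans builds a translation table from single-character keys.
_TABLE = str.maketrans(LOOKALIKES)


def normalize(text: str) -> str:
    """Replace every known look-alike character with the correct letter."""
    return text.translate(_TABLE)


def convert(text: str, converter: Callable[[str], str]) -> str:
    """Normalize first, then run the given cyr-lat or lat-cyr converter."""
    return converter(normalize(text))
```

Here `converter` stands in for whatever function already does the actual conversion, so normalization stays a separate step that runs before it.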
