CFR and Procyon displayed Cyrillic #1023

Open: b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0 wants to merge 4 commits into Col-E:master from b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0:master

@b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0
Contributor

Problem: In Recaf 4, CFR and Procyon rendered Cyrillic text in decompiled output as Unicode escapes (\u0414...), while the class file and search still showed the correct text.

Solution: Post-process decompiler output via UnicodeUnescapeOutputTextFilter in the DecompilerManager pipeline, and change the defaults to hideutf=false (CFR) and isUnicodeOutputEnabled=true (Procyon).
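
For illustration, here is a minimal sketch of what post-processing decompiler output through a filter chain could look like. It is not the code from this PR: `TextFilter`, `UNESCAPE`, and `applyFilters` are hypothetical stand-ins, and the real UnicodeUnescapeOutputTextFilter and DecompilerManager APIs in Recaf may differ. The sketch decodes every four-digit escape unconditionally.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterPipelineSketch {
    /** Hypothetical stand-in for a text filter applied to decompiler output. */
    public interface TextFilter {
        String filter(String code);
    }

    // A literal backslash, the letter 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Decodes every four-digit unicode escape into the UTF-16 code unit it names. */
    public static final TextFilter UNESCAPE = code -> ESCAPE.matcher(code).replaceAll(
            // quoteReplacement keeps a decoded '$' or backslash from being
            // interpreted as replacement syntax.
            match -> Matcher.quoteReplacement(
                    String.valueOf((char) Integer.parseInt(match.group(1), 16))));

    /** Runs the decompiled text through each filter in order. */
    public static String applyFilters(String decompiled, List<TextFilter> filters) {
        String result = decompiled;
        for (TextFilter filter : filters)
            result = filter.filter(result);
        return result;
    }

    public static void main(String[] args) {
        // The source literal below holds the escapes as plain text, as a decompiler would emit them.
        String raw = "String greeting = \"\\u0414\\u0430\";";
        System.out.println(applyFilters(raw, List.of(UNESCAPE)));
    }
}
```

Running `main` prints `String greeting = "Да";`. Keeping the filter as a separate step means the decompiler output itself is untouched and the step can be switched off independently.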

Tests: EscapeUtilTest.unescapeCyrillicUnicodeEscapes, updated FoldingDeobfuscationTest.

@1050TIt0p

1050TIt0p commented May 21, 2026

This was definitely missing

@Col-E
Owner

Col-E commented May 21, 2026

I'd have to dig through my samples folder but I'd like to make sure there aren't any negative effects with certain naming schemes (like whitespace-only bs, right-to-left spam, etc) - Any idea if these changes would affect those? IIRC it was stuff like that which led me to make the defaults false.

- Decode display-safe \uXXXX only; keep obfuscation escapes
- Respect CFR hideutf and Procyon unicode settings
- Revert decompiler default changes
- Use getServiceConfig() instead of ReflectUtil
- Stronger asserts for ZWSP, bidi, and readable Cyrillic
- CFR/Procyon integration via DecompilerManager
@b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0
Contributor Author

> I'd have to dig through my samples folder but I'd like to make sure there aren't any negative effects with certain naming schemes (like whitespace-only bs, right-to-left spam, etc) - Any idea if these changes would affect those? IIRC it was stuff like that which led me to make the defaults false.

Yes, you're right to be concerned.

I tested exactly those cases (whitespace-only names, ZWSP, RTL/bidi overrides, exotic whitespace etc.).

With the selective filter:

  • Trick characters such as \u200B (zero-width space), \u202E (right-to-left override) and similar stay escaped, so they remain clearly visible as \uXXXX, same as before.
  • Normal readable text (Cyrillic, etc.) gets properly unescaped.
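
The selective behavior described in the two points above can be sketched roughly as follows. This is an assumption-laden illustration, not the filter from this PR: the class name, `isDisplaySafe`, and the choice of Unicode categories treated as unsafe are all hypothetical. Escapes that decode to control, format (which covers zero-width and bidi characters), separator, surrogate, private-use, or unassigned characters are left as they are; everything else is decoded.

```java
public class SelectiveUnescapeSketch {
    /** Assumed policy: decode only characters that render as ordinary visible glyphs. */
    public static boolean isDisplaySafe(char c) {
        switch (Character.getType(c)) {
            case Character.CONTROL:
            case Character.FORMAT:              // zero-width space, bidi overrides, ...
            case Character.SPACE_SEPARATOR:     // includes no-break and other exotic spaces
            case Character.LINE_SEPARATOR:
            case Character.PARAGRAPH_SEPARATOR:
            case Character.SURROGATE:           // halves of supplementary characters stay escaped
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return false;
            default:
                return true;
        }
    }

    /** Decodes four-digit unicode escapes whose target passes isDisplaySafe; keeps the rest. */
    public static String unescapeDisplaySafe(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length() && text.charAt(i + 1) == '\\') {
                // An escaped backslash: copy both so a following 'u' is not misread.
                out.append("\\\\");
                i += 2;
                continue;
            }
            if (c == '\\' && i + 5 < text.length() && text.charAt(i + 1) == 'u') {
                int value = parseHex4(text, i + 2);
                if (value >= 0 && isDisplaySafe((char) value)) {
                    out.append((char) value);
                    i += 6;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** Parses four ASCII hex digits starting at the given index, or returns -1. */
    private static int parseHex4(String text, int start) {
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            char h = text.charAt(k);
            int digit = (h >= '0' && h <= '9') ? h - '0'
                    : (h >= 'a' && h <= 'f') ? h - 'a' + 10
                    : (h >= 'A' && h <= 'F') ? h - 'A' + 10 : -1;
            if (digit < 0) return -1;
            value = (value << 4) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unescapeDisplaySafe("name = \"\\u0414\\u0430\";"));
        System.out.println(unescapeDisplaySafe("void a\\u200Bb() {}"));
    }
}
```

One limitation of this sketch: combining marks are treated as display-safe, so names built from stacked diacritics would still be decoded. A stricter policy could add the mark categories to the list.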

I also made sure the filter is completely skipped when hideutf=true (CFR) or unicode output is enabled (Procyon), so your original defaults are fully respected.

I saw no negative effects on the obfuscation samples I tested.

@KwilzOne

That really was a problem. Searching is inconvenient with output like that, where the Cyrillic characters are not displayed naturally.
