feat: sometimes interpret char[] mutations as single bytes #1013

florianGla · 2025-11-07T13:03:14Z

When mutating char[], randomly interpret the bytes from libFuzzer as individual (single-byte) chars. This helps make use of libFuzzer's table of recent compare entries (encoded as CESU-8) if the char[] is used as a String inside the fuzz test.
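
To illustrate the motivation, here is a minimal sketch of the two interpretations of the same bytes. The class and method names are hypothetical, and the big-endian byte order in the two-bytes-per-char path is an assumption made for illustration, not necessarily what Jazzer does.

```java
import java.nio.charset.Charset;

final class CharArrayInterpretations {
  // CESU-8 is looked up once and reused.
  static final Charset CESU_8 = Charset.forName("CESU-8");

  // Interpretation A: every two bytes form one char.
  // Big-endian order is an assumption for this sketch.
  static char[] twoBytesPerChar(byte[] bytes) {
    char[] chars = new char[bytes.length / 2];
    for (int i = 0; i < chars.length; i++) {
      chars[i] = (char) (((bytes[2 * i] & 0xFF) << 8) | (bytes[2 * i + 1] & 0xFF));
    }
    return chars;
  }

  // Interpretation B: decode the bytes as CESU-8, so ASCII bytes map to one char each.
  static char[] decodeCesu8(byte[] bytes) {
    return new String(bytes, CESU_8).toCharArray();
  }

  public static void main(String[] args) {
    // Bytes as they might arrive from a compare-table entry for the string "key!".
    byte[] entry = {'k', 'e', 'y', '!'};
    System.out.println(new String(twoBytesPerChar(entry)).equals("key!")); // false
    System.out.println(new String(decodeCesu8(entry)).equals("key!"));     // true
  }
}
```

Only the CESU-8 path reproduces the compared string, which is why picking it some of the time lets such entries reach a String built from the char[].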

tests/src/test/java/com/example/CharArrayFuzzer.java

simonresch

LGTM. Thanks!

oetr · 2025-11-18T09:09:19Z

...in/java/com/code_intelligence/jazzer/mutation/mutator/lang/PrimitiveArrayMutatorFactory.java

+      if (prng.choice()) {
+        return (char[]) toPrimitive.apply(bytes);
+      } else {
+        char[] chars = new String(bytes, Charset.forName("CESU-8")).toCharArray();


This doesn't seem to respect the length provided by the @WithLength annotation.

Oh, it does, never mind!

Out of curiosity: doesn't the current code here (specifically convertWithLength) assume that 2 bytes = 1 char?
Wouldn't this assumption be violated here because:

1 CESU-8 byte can be converted to 1 char (for ASCII chars) → array becomes too large

3 CESU-8 bytes can be converted to 1 char (for chars >= U+0800) → array becomes too short

Maybe I am overlooking something though; sorry for the trouble in that case.

(Also side note: Would it make sense to store the Charset.forName("CESU-8") in a static final field?)
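
The variable ratio described above can be checked with a small sketch. The class and method names are hypothetical; the only library call is the JDK's CESU-8 charset. Both inputs are six bytes long, so a fixed 2-bytes-per-char assumption would predict three chars for each.

```java
import java.nio.charset.Charset;

final class Cesu8DecodeLengths {
  static final Charset CESU_8 = Charset.forName("CESU-8");

  // Number of chars produced when the bytes are decoded as CESU-8.
  static int decodedLength(byte[] bytes) {
    return new String(bytes, CESU_8).length();
  }

  public static void main(String[] args) {
    byte[] ascii = {'a', 'b', 'c', 'd', 'e', 'f'};
    // U+0800 is encoded as E0 A0 80; two of them fill six bytes.
    byte[] threeByte = {
      (byte) 0xE0, (byte) 0xA0, (byte) 0x80,
      (byte) 0xE0, (byte) 0xA0, (byte) 0x80
    };
    System.out.println(decodedLength(ascii));     // 6 chars instead of 3
    System.out.println(decodedLength(threeByte)); // 2 chars instead of 3
  }
}
```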

I also couldn't see how the length gets respected here, and did some fuzzing with various @WithLength(min=..., max=...) values, and couldn't get the array length out of bounds. So it seems to be respected, for some reason!

Would it make sense to store the Charset.forName("CESU-8") in a static final field?

It would make sense, I think.

@oetr The length annotation is enforced by the innerMutator, which is used by toPrimitive and toPrimitiveAfterMutate: see https://github.com/CodeIntelligenceTesting/jazzer/blob/CIF-1863-string-compares-on-char-arrays/src/main/java/com/code_intelligence/jazzer/mutation/mutator/lang/PrimitiveArrayMutatorFactory.java#L95

@Marcono1234 You are right that convertWithLength assumes two bytes per char, which could lead to enforcing incorrect length constraints when interpreting the byte array as CESU-8-encoded. I adjusted the convertWithLength method to assume a maximum of 6 bytes per char (the maximum number of bytes used with CESU-8). Then, when we do the actual conversion to a char array, we make sure to enforce the length constraints. @oetr Could you have a look at the last commit?

Nice catch @Marcono1234 , thanks!

Isn't this * 6 for a complete Unicode code point, specifically a code point >= 0x10000?
However, a Java char is a UTF-16 code unit, so those code points >= 0x10000 would actually require 2 chars.

So wouldn't it suffice to do * 3 here?

Or at worst * 4, in case a malformed 4-byte CESU-8 encoding can lead to a single replacement char (U+FFFD).
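
The bytes-per-char argument can be checked for well-formed input by encoding a few sample strings. This is a sketch with hypothetical class and method names; it does not cover the malformed-input case raised above.

```java
import java.nio.charset.Charset;

final class Cesu8BytesPerChar {
  static final Charset CESU_8 = Charset.forName("CESU-8");

  // Number of bytes the string occupies when encoded as CESU-8.
  static int encodedLength(String s) {
    return s.getBytes(CESU_8).length;
  }

  public static void main(String[] args) {
    // ASCII, U+00E9, U+0800, and U+1F600 (a surrogate pair, i.e. two Java chars).
    String[] samples = {"a", "\u00E9", "\u0800", "\uD83D\uDE00"};
    for (String s : samples) {
      System.out.printf("chars=%d bytes=%d%n", s.length(), encodedLength(s));
    }
  }
}
```

The six-byte form belongs to a code point that occupies two chars, so none of these samples exceeds three bytes per char.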

oetr

Looks good, just one issue from my side!

oetr · 2025-11-20T15:25:09Z

...in/java/com/code_intelligence/jazzer/mutation/mutator/lang/PrimitiveArrayMutatorFactory.java

      Optional<WithLength> withLength = Optional.ofNullable(type.getAnnotation(WithLength.class));
-      int minLength = withLength.map(WithLength::min).orElse(DEFAULT_MIN_LENGTH);
-      int maxLength = withLength.map(WithLength::max).orElse(DEFAULT_MAX_LENGTH);
+      withLength.ifPresent(System.err::println);


can be removed

oetr · 2025-11-20T15:45:29Z

...in/java/com/code_intelligence/jazzer/mutation/mutator/lang/PrimitiveArrayMutatorFactory.java

        }
+
+        if (chars.length < minLength) {
+          return Arrays.copyOf(chars, minLength);


Here, the newly padded chars might not be in range.

Maybe we can do a single-pass copy and forceInRange here? Or is Java smart enough to optimize it at runtime?
WDYT about the following:

int targetLength = Math.min(Math.max(chars.length, minLength), maxLength);
if (chars.length == targetLength) {
  for (int i = 0; i < chars.length; i++) {
    chars[i] = (char) forceInRange(chars[i], minRange, maxRange);
  }
  return chars;
} else {
  char[] result = new char[targetLength];
  for (int i = 0; i < targetLength; i++) {
    result[i] = i < chars.length
        ? (char) forceInRange(chars[i], minRange, maxRange)
        : (char) forceInRange(0, minRange, maxRange);
  }
  return result;
}
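
For experimenting with this suggestion outside the mutator, here is a self-contained sketch. The class name, the method name fitToConstraints, and the local forceInRange are hypothetical stand-ins; in particular, forceInRange here is a plain clamp, which may differ from how Jazzer's own helper maps values into the range.

```java
import java.util.Arrays;

final class CharArrayLengthFix {
  // Hypothetical stand-in for the mutator's forceInRange: a simple clamp.
  static long forceInRange(long value, long min, long max) {
    return Math.max(min, Math.min(max, value));
  }

  // Single pass: pad or truncate to [minLength, maxLength] and force every
  // element, including padding, into [minRange, maxRange].
  static char[] fitToConstraints(
      char[] chars, int minLength, int maxLength, char minRange, char maxRange) {
    int targetLength = Math.min(Math.max(chars.length, minLength), maxLength);
    // Reuse the input array when its length already fits.
    char[] result = chars.length == targetLength ? chars : new char[targetLength];
    for (int i = 0; i < targetLength; i++) {
      char source = i < chars.length ? chars[i] : 0; // padding starts as 0
      result[i] = (char) forceInRange(source, minRange, maxRange);
    }
    return result;
  }

  public static void main(String[] args) {
    // Too short: padded to minLength, padding forced into range.
    System.out.println(Arrays.toString(fitToConstraints("ab".toCharArray(), 4, 8, 'a', 'z')));
    // Too long and out of range: truncated to maxLength and clamped.
    System.out.println(Arrays.toString(fitToConstraints("HELLO".toCharArray(), 1, 3, 'a', 'z')));
  }
}
```

Merging the two branches into one loop keeps the range handling in a single place, at the cost of one extra comparison per element.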

florianGla force-pushed the CIF-1863-string-compares-on-char-arrays branch from ec0033e to 5c73f71 Compare November 7, 2025 13:06

fmeum reviewed Nov 7, 2025

tests/src/test/java/com/example/CharArrayFuzzer.java (outdated, resolved)

kyakdan force-pushed the CIF-1863-string-compares-on-char-arrays branch 3 times, most recently from 46f44d5 to 6762580 Compare November 12, 2025 10:35

florianGla requested review from oetr and simonresch November 18, 2025 08:56

simonresch approved these changes Nov 18, 2025

simonresch force-pushed the CIF-1863-string-compares-on-char-arrays branch from 6762580 to 046a4e9 Compare November 18, 2025 08:59

oetr reviewed Nov 18, 2025

kyakdan force-pushed the CIF-1863-string-compares-on-char-arrays branch from 046a4e9 to f240d7a Compare November 19, 2025 12:15

simonresch and others added 3 commits November 19, 2025 13:16

feat: decode bytes as CESU-8 when converting to char[]

96e19d6

feat: ensure correct length constraints for char[] mutation

ef7ea96

kyakdan force-pushed the CIF-1863-string-compares-on-char-arrays branch from f240d7a to ef7ea96 Compare November 19, 2025 12:16

oetr reviewed Nov 20, 2025

florianGla commented Nov 7, 2025

simonresch left a comment

oetr Nov 18, 2025 (edited)

Marcono1234 Nov 18, 2025

oetr Nov 18, 2025

kyakdan Nov 19, 2025 (edited)

kyakdan Nov 19, 2025

oetr Nov 20, 2025

Marcono1234 Nov 21, 2025 (edited)

oetr left a comment

oetr Nov 20, 2025

oetr Nov 20, 2025

7 participants
