Skip to content

Conversation

@florianGla
Copy link
Contributor

When mutating char[] randomly interpret the bytes from libFuzzer as individual (single byte) chars. This helps to make use of libFuzzers table of recent compare entries (encoded as CESU8) if the char[] is used as a String inside the fuzz test.

@florianGla florianGla force-pushed the CIF-1863-string-compares-on-char-arrays branch from ec0033e to 5c73f71 Compare November 7, 2025 13:06
@kyakdan kyakdan force-pushed the CIF-1863-string-compares-on-char-arrays branch 3 times, most recently from 46f44d5 to 6762580 Compare November 12, 2025 10:35
Copy link
Contributor

@simonresch simonresch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@simonresch simonresch force-pushed the CIF-1863-string-compares-on-char-arrays branch from 6762580 to 046a4e9 Compare November 18, 2025 08:59
if (prng.choice()) {
return (char[]) toPrimitive.apply(bytes);
} else {
char[] chars = new String(bytes, Charset.forName("CESU-8")).toCharArray();
Copy link
Contributor

@oetr oetr Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to respect tha length provided by @WithLength annotation.

Oh, it does, never mind!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: Doesn't the current code here (specifically convertWithLength) assume that 2 byte = 1 char?
Wouldn't this assumption be violated here because:

  • 1 CESU-8 byte can be converted to 1 char (for ASCII chars) → array becomes too large
  • 3 CESU-8 byte can be converted to 1 char (for chars >= U+0800) → array becomes too short

Maybe I am overlooking something though; sorry for the trouble in that case.

(Also side note: Would it make sense to store the Charset.forName("CESU-8") in a static final field?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also couldn't see how the length gets respected here, and did some fuzzing with various @WithLength(min=..., max=...) values, and couldn't get the array length out of bounds. So it seems to be respected, for some reason!

Would it make sense to store the Charset.forName("CESU-8") in a static final field?

It would make sense, I think.

Copy link
Member

@kyakdan kyakdan Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marcono1234 You are right regarding the convertWithLength assuming two bytes per char, which could lead to enforcing incorrect length constraints when interepreting the byte array as "CESU-8"-encoded. I adjusted the convertWithLength method to assume a maximum of 6 bytes for a char (the maximum number of bytes used with CESU-8). Then, when we do the actual conversion to a char array, we ensure to enforce the length constraints. @oetr Could you have a look at the last commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @Marcono1234 , thanks!

Copy link
Contributor

@Marcono1234 Marcono1234 Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this * 6 for a complete Unicode code point, specifically a code point >= 0x10000?
However a Java char is UTF-16, so those code points >= 0x10000 would actually require 2 char.

So wouldn't it suffice to do * 3 here?

Or at worst * 4 in case a malformed 4-byte CESU-8 encoding can lead to a single replacement ? char.

@kyakdan kyakdan force-pushed the CIF-1863-string-compares-on-char-arrays branch from 046a4e9 to f240d7a Compare November 19, 2025 12:15
simonresch and others added 3 commits November 19, 2025 13:16
When mutating char[] randomly interpret the bytes from libFuzzer as
individual (single byte) chars. This helps to make use of libFuzzers
table of recent compare entries (encoded as CESU8) if the char[] is used
as a String inside the fuzz test.
@kyakdan kyakdan force-pushed the CIF-1863-string-compares-on-char-arrays branch from f240d7a to ef7ea96 Compare November 19, 2025 12:16
Copy link
Contributor

@oetr oetr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one issue from my side!

Optional<WithLength> withLength = Optional.ofNullable(type.getAnnotation(WithLength.class));
int minLength = withLength.map(WithLength::min).orElse(DEFAULT_MIN_LENGTH);
int maxLength = withLength.map(WithLength::max).orElse(DEFAULT_MAX_LENGTH);
withLength.ifPresent(System.err::println);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

}

if (chars.length < minLength) {
return Arrays.copyOf(chars, minLength);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the newly padded chars might not be in range.

Maybe we can do a single-pass copy and forceInRange here? Or is Java smart enough to optimize it at runtime?
WDYT about the following:

       int targetLength = Math.min(Math.max(chars.length, minLength), maxLength);

        if (chars.length == targetLength) {
          for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) forceInRange(chars[i], minRange, maxRange);
          }
          return chars;
        } else {
          char[] result = new char[targetLength];
          for (int i = 0; i < targetLength; i++) {
            result[i] = i < chars.length
                  ? (char) forceInRange(chars[i], minRange, maxRange)
                  : (char) forceInRange(0, minRange, maxRange);
          }
          return result;
        }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants