Support octal escape sequences. #100
Hi @kaos, thank you very much for your contribution.

Hi, certainly! I thought I had run all the tests before submitting, but I obviously failed to do so 🤦🏽 Looking at the null.d test, there are escape sequences of the form
So the previous escape should be parsed into two characters:

It would seem that this is a breaking change, but one that follows the ISO C specification. Not sure how you would prefer to resolve this?

(I'll take this opportunity to note that it seems PackCC enforces the use of two hex digits for

This diff to the tests makes them pass again (by adding a

--- a/tests/null.d/input.peg
+++ b/tests/null.d/input.peg
@@ -6,8 +6,8 @@ CHAR_CLASS_0 <- "char_class_0_a:" [abc\0-!123]+ { printf("CHAR_CLASS_0_A\n"); }
CHAR_CLASS_1 <- "char_class_1_a:" [abc\0]+ { printf("CHAR_CLASS_1_A\n"); } / "char_class_1_b:" [abc\x00]+ { printf("CHAR_CLASS_1_B\n"); }
CHAR_CLASS_2 <- "char_class_2_a:" [\0-!]+ { printf("CHAR_CLASS_2_A\n"); } / "char_class_2_b:" [\x00-!]+ { printf("CHAR_CLASS_2_B\n"); }
CHAR_CLASS_3 <- "char_class_3_a:" [\0]+ { printf("CHAR_CLASS_3_A\n"); } / "char_class_3_b:" [\x00]+ { printf("CHAR_CLASS_3_B\n"); }
-STRING_0 <- "string_0_a:" "abc\0123" { printf("STRING_0_A\n"); } / "string_0_b:" "abc\x00123" { printf("STRING_0_B\n"); }
+STRING_0 <- "string_0_a:" "abc\x00123" { printf("STRING_0_A\n"); } / "string_0_b:" "abc\x00123" { printf("STRING_0_B\n"); }
STRING_1 <- "string_1_a:" "abc\0" { printf("STRING_1_A\n"); } / "string_1_b:" "abc\x00" { printf("STRING_1_B\n"); }
-STRING_2 <- "string_2_a:" "\0123" { printf("STRING_2_A\n"); } / "string_2_b:" "\x00123" { printf("STRING_2_B\n"); }
+STRING_2 <- "string_2_a:" "\x00123" { printf("STRING_2_A\n"); } / "string_2_b:" "\x00123" { printf("STRING_2_B\n"); }
STRING_3 <- "string_3_a:" "\0" { printf("STRING_3_A\n"); } / "string_3_b:" "\x00" { printf("STRING_3_B\n"); }
-CAPTURED <- "captured_a:" < CHAR_CLASS_0 "xyz\0123" > "|" $1 { printf("CAPTURED_A\n"); } / "captured_b:" < CHAR_CLASS_0 "xyz\x00123" > "|" $2 { printf("CAPTURED_B\n"); }
+CAPTURED <- "captured_a:" < CHAR_CLASS_0 "xyz\x00123" > "|" $1 { printf("CAPTURED_A\n"); } / "captured_b:" < CHAR_CLASS_0 "xyz\x00123" > "|" $2 { printf("CAPTURED_B\n"); }
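
To illustrate why the `\0123` literals in the test had to change, here is a minimal sketch of ISO C style escape decoding. It is not PackCC's actual code: `unescape`, `is_octal`, and `hex_value` are hypothetical names. The sketch assumes an octal escape takes one to three octal digits (as in ISO C) and a hex escape takes exactly two hex digits (as observed above for PackCC). Masking out-of-range octal values to one byte is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not taken from PackCC. */
static int is_octal(char c) { return c >= '0' && c <= '7'; }

static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1; /* not a hex digit */
}

/* Decodes the escapes in src into dst (at most cap bytes).
   Returns the number of bytes written. */
size_t unescape(const char *src, unsigned char *dst, size_t cap) {
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0' && n < cap) {
        if (src[0] == '\\' && is_octal(src[1])) {
            /* Octal escape: greedily consume one to three octal digits. */
            unsigned v = 0;
            int digits = 0;
            src++; /* skip the backslash */
            while (digits < 3 && is_octal(*src)) {
                v = v * 8 + (unsigned)(*src - '0');
                src++;
                digits++;
            }
            dst[n++] = (unsigned char)(v & 0xff); /* assumption: mask to a byte */
        } else if (src[0] == '\\' && src[1] == 'x' &&
                   hex_value(src[2]) >= 0 && hex_value(src[3]) >= 0) {
            /* Hex escape: exactly two hex digits (assumed PackCC behaviour). */
            dst[n++] = (unsigned char)(hex_value(src[2]) * 16 + hex_value(src[3]));
            src += 4;
        } else {
            /* Any other character is copied through unchanged. */
            dst[n++] = (unsigned char)*src++;
        }
    }
    return n;
}
```

Under these rules the five source characters `\0123` decode to two bytes (`\012`, then the literal `3`), whereas `\x00123` decodes to a NUL byte followed by the literal `123`, which is what the updated test lines rely on.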
I've also verified this change by looking at the generated parser rule code.
Diff for parser.c when using the octal character class before/after this change:

 PCC_DEBUG(ctx->auxil, PCC_DBG_EVALUATE, "UPPER_LETTER", ctx->level, chunk->pos, ctx->buffer.p + chunk->pos, ctx->buffer.n - chunk->pos);
 ctx->level++;
 {
     int u;
     const size_t n = pcc_get_char_as_utf32(ctx, &u);
     if (n == 0) goto L0000;
     if (!(
-        u == 0x000031 ||
-        u == 0x000030 ||
-        (u >= 0x000031 && u <= 0x000031) ||
-        u == 0x000033 ||
-        u == 0x000032
+        (u >= 0x000041 && u <= 0x00005a)
     )) goto L0000;
     ctx->cur += n;
 }
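
The constants in the two versions can be read as follows. This is a sketch, assuming the `UPPER_LETTER` rule used a class like `[\101-\132]`; that class is inferred from the constants and is not quoted from the PR. The functions `matches_before`, `matches_after`, and `octal_literals_match_hex` are hypothetical names that restate the generated conditions outside the parser.

```c
#include <assert.h>

/* Before the change: the digits of each escape were taken as literal
   characters '1', '0', the range '1'-'1', '3' and '2'. */
int matches_before(int u) {
    return u == 0x31 || u == 0x30 ||
           (u >= 0x31 && u <= 0x31) ||
           u == 0x33 || u == 0x32;
}

/* After the change: the octal escapes \101 and \132 are the range
   bounds, i.e. 0x41..0x5a, which is 'A'..'Z' in ASCII. */
int matches_after(int u) {
    return u >= 0x41 && u <= 0x5a;
}

/* The C compiler's own octal escapes give the same bounds. */
int octal_literals_match_hex(void) {
    return '\101' == 0x41 && '\132' == 0x5a;
}
```

With these definitions, `matches_before('A')` is false while `matches_after('A')` is true, which is the behavioural difference the diff shows.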