test(dns): extend idn-hostname coverage for Bidi, A-label and ContextJ rules #2489
vtushar06 wants to merge 1 commit into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds additional negative test coverage for internationalized domain name (IDN) hostname validation, focusing on tricky RFC edge-cases.
Changes:
- Added new IDN hostname rejection tests for Bidi rules and whole-name constraints (RFC 5893).
- Added new A-label / Punycode validation tests (RFC 5890/5891/5892).
- Added ContextJ per-occurrence ZWNJ rule coverage (RFC 5892).
Comments suppressed due to low confidence (1)
test/dns/idn_hostname_test.cc:1
- The three inputs above are ASCII-only strings, but they’re written as hex byte escapes, which makes the test vectors harder to review and increases the chance of subtle literal-escaping mistakes. Prefer plain string literals (e.g., "xn--7a", "xn--example-", "xn---9uc") for ASCII to improve readability and reduce maintenance overhead.
#include <gtest/gtest.h>
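The equivalence behind this suggestion can be checked at compile time. The sketch below assumes C++17 and uses a hypothetical helper, `same_bytes`, that is not part of the PR or of the library. It compares two of the escaped vectors that appear in this PR against their plain ASCII spellings.

```cpp
#include <string_view>

// Hypothetical helper (not part of the PR): true when both literals
// spell exactly the same bytes.
constexpr bool same_bytes(std::string_view lhs, std::string_view rhs) {
  return lhs == rhs;
}

// Every byte is escaped separately, so each escape ends at the next backslash.
static_assert(same_bytes("\x78\x6e\x2d\x2d\x37\x61", "xn--7a"));
static_assert(same_bytes("\x78\x6e\x2d\x2d\x2d\x39\x75\x63", "xn---9uc"));
```

One concrete escaping hazard: a C++ hex escape consumes every hex digit that follows it, so mixing escapes with plain characters (for example writing `\x37` directly followed by a literal `a`) would be read as one oversized escape rather than two characters.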
TEST(DNS_idn_hostname, invalid_noncanonical_punycode) {
  EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x2d\x39\x75\x63"));
}

TEST(DNS_idn_hostname, invalid_bidi_digit_first_in_bidi_domain) {
  EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x30\x61\x2e\xd7\x90"));
}

// RFC 5893 sec 2 cond 1: label must start with L, R or AL
TEST(DNS_idn_hostname, invalid_bidi_single_label_digit_then_rtl) {
  EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x30\xd8\xa7"));
}

TEST(DNS_idn_hostname, invalid_zero_width_space) {
  EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x61\xe2\x80\x8b\x62"));
}

// --- additional coverage: subtle cases (whole-name Bidi, A-label re-validation,
// non-canonical Punycode, per-occurrence ContextJ) ---
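The non-ASCII vectors above are UTF-8 byte sequences, so it can help to see which code points they encode. The following is a minimal sketch, not the library's decoder: `decode_utf8` and `check_review_vectors` are hypothetical names, and the decoder trusts its input instead of rejecting overlong or malformed sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into code points.
// Sketch only: no rejection of overlong, truncated or stray bytes.
std::vector<std::uint32_t> decode_utf8(std::string_view bytes) {
  std::vector<std::uint32_t> out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const auto lead = static_cast<unsigned char>(bytes[i]);
    std::size_t extra = 0;   // number of continuation bytes
    std::uint32_t cp = lead; // ASCII bytes map to themselves
    if (lead >= 0xF0) {
      extra = 3;
      cp = lead & 0x07u;
    } else if (lead >= 0xE0) {
      extra = 2;
      cp = lead & 0x0Fu;
    } else if (lead >= 0xC0) {
      extra = 1;
      cp = lead & 0x1Fu;
    }
    // Each continuation byte contributes its low six bits.
    for (std::size_t k = 1; k <= extra && i + k < bytes.size(); ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3Fu);
    }
    out.push_back(cp);
    i += extra + 1;
  }
  return out;
}

// Decodes the three multi-byte vectors quoted in the snippets above.
void check_review_vectors() {
  using V = std::vector<std::uint32_t>;
  // "0a." followed by U+05D0 (a Hebrew letter)
  assert((decode_utf8("\x30\x61\x2e\xd7\x90") == V{0x30, 0x61, 0x2E, 0x05D0}));
  // "0" followed by U+0627 (an Arabic letter)
  assert((decode_utf8("\x30\xd8\xa7") == V{0x30, 0x0627}));
  // "a", U+200B (zero width space), "b"
  assert((decode_utf8("\x61\xe2\x80\x8b\x62") == V{0x61, 0x200B, 0x62}));
}
```

Calling `check_review_vectors()` from a test or `main` exercises all three decodings.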
🤖 Augment PR Summary
Summary: Adds targeted regression tests for subtle IDN hostname validation edge cases that are easy to get wrong across implementations.
// RFC 5890 sec 2.3.2.1 + 5892 sec 2.6: decodes to U+00A1 (DISALLOWED)
TEST(DNS_idn_hostname, invalid_a_label_decodes_to_disallowed) {
  EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x37\x61"));
These A-label cases are pure ASCII; since is_hostname() also re-validates xn-- labels via idna_is_valid_a_label, consider asserting EXPECT_FALSE(sourcemeta::core::is_hostname(...)) here as well to pin behavior for the ASCII-only API.
Severity: low
Other Locations
- test/dns/idn_hostname_test.cc:622
- test/dns/idn_hostname_test.cc:627
- test/dns/idn_hostname_test.cc:632
1 issue found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="test/dns/idn_hostname_test.cc">
<violation number="1" location="test/dns/idn_hostname_test.cc:617">
P3: This A-label is pure ASCII and `is_hostname()` also validates `xn--` labels. Add a corresponding `EXPECT_FALSE(sourcemeta::core::is_hostname(...))` assertion here (and for the other A-label tests at lines 622, 627, 632) to pin behavior for the ASCII-only validation path consistently with the pattern used elsewhere in this file.</violation>
</file>
P3: This A-label is pure ASCII and is_hostname() also validates xn-- labels. Add a corresponding EXPECT_FALSE(sourcemeta::core::is_hostname(...)) assertion here (and for the other A-label tests at lines 622, 627, 632) to pin behavior for the ASCII-only validation path consistently with the pattern used elsewhere in this file.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/dns/idn_hostname_test.cc, line 617:
<comment>This A-label is pure ASCII and `is_hostname()` also validates `xn--` labels. Add a corresponding `EXPECT_FALSE(sourcemeta::core::is_hostname(...))` assertion here (and for the other A-label tests at lines 622, 627, 632) to pin behavior for the ASCII-only validation path consistently with the pattern used elsewhere in this file.</comment>
<file context>
@@ -593,3 +593,56 @@ TEST(DNS_idn_hostname, label_with_out_of_order_combining_marks_rejected) {
+
+// RFC 5890 sec 2.3.2.1 + 5892 sec 2.6: decodes to U+00A1 (DISALLOWED)
+TEST(DNS_idn_hostname, invalid_a_label_decodes_to_disallowed) {
+ EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x37\x61"));
+}
+
</file context>
…J edge cases
Signed-off-by: Tushar Verma <tusharmyself06@gmail.com>
bad613f to 07db034
`is_idn_hostname` already handles these cases correctly. This adds 10 tests that pin that behaviour explicitly so regressions are caught early. Each case targets a rule that at least one other production library misses, confirmed by running the same inputs across 9 validators.

Changes
- `invalid_bidi_digit_first_in_bidi_domain` - whole-name Bidi check: `0a.א` is invalid because the LTR label `0a` violates RFC 5893 §2 condition 1 when paired with an RTL label (python-jsonschema misses this)
- `invalid_bidi_single_label_digit_then_rtl` - RTL label starting with a digit is invalid (RFC 5893 §2 cond 1)
- `invalid_bidi_ltr_label_with_rtl_letter` - LTR label containing an RTL letter violates RFC 5893 §2 cond 5
- `invalid_a_label_decodes_to_disallowed` - `xn--7a` decodes to U+00A1, which is DISALLOWED under RFC 5892 §2.6; an A-label is only valid if it decodes to a valid U-label (RFC 5890 §2.3.2.1)
- `invalid_a_label_decodes_to_ascii_only` - `xn--example-` decodes to `example` (all ASCII); a U-label must contain at least one non-ASCII character (RFC 5890 §2.3.2.1)
- `invalid_a_label_decodes_to_bidi_violation` - A-label whose decoded U-label itself violates the Bidi rule
- `invalid_noncanonical_punycode` - `xn---9uc` decodes to a codepoint whose canonical A-label is `xn--9uc`; round-trip must match (RFC 5891 §5.4)
- `invalid_fullwidth_digits` - U+FF11..U+FF13 are DISALLOWED under RFC 5892 §2.6; UTS46 processors silently map them to ASCII and accept
- `invalid_zero_width_space` - U+200B is DISALLOWED under RFC 5892 §2.6; several UTS46 libraries strip it silently
- `invalid_zwnj_failing_at_one_occurrence` - ZWNJ rule (RFC 5892 appendix A.1, erratum 3312) must pass at every occurrence; Node and PHP accept a label where one ZWNJ is valid and a second is not

Ecosystem Impact
- `idna` 3.10 FAILS (per-label check misses cross-label condition); Go `x/net/idna` PASSES
- `x/net/idna` and Node ACCEPT; python idna REJECTS - same class as CVE-2024-12224 (Rust idna), CVE-2026-39821 (Go x/net), CVE-2026-46644 (PHP polyfill)

RFC References
Cross-implementation evidence: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-hostname.md
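The two A-label decoding claims in the Changes list (`xn--7a` to U+00A1, `xn--example-` to `example`) can be reproduced with a small Punycode decoder. What follows is a minimal sketch of the RFC 3492 decoding procedure, not the implementation under test: the names `punycode_decode`, `adapt`, `digit_value` and `check_pr_examples` are hypothetical, overflow checks are omitted, and no IDNA validation or round-trip re-encoding is attempted. The input is the label with its `xn--` prefix already removed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

namespace {
// Bootstring parameters for Punycode (RFC 3492 section 5).
constexpr std::uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38,
                        kDamp = 700, kInitialBias = 72, kInitialN = 128;

// Bias adaptation (RFC 3492 section 6.1).
std::uint32_t adapt(std::uint32_t delta, std::uint32_t points, bool first) {
  delta = first ? delta / kDamp : delta / 2;
  delta += delta / points;
  std::uint32_t k = 0;
  while (delta > ((kBase - kTMin) * kTMax) / 2) {
    delta /= kBase - kTMin;
    k += kBase;
  }
  return k + (kBase - kTMin + 1) * delta / (delta + kSkew);
}

// Maps a-z (either case) to 0..25 and 0-9 to 26..35.
std::optional<std::uint32_t> digit_value(char c) {
  if (c >= 'a' && c <= 'z') return static_cast<std::uint32_t>(c - 'a');
  if (c >= 'A' && c <= 'Z') return static_cast<std::uint32_t>(c - 'A');
  if (c >= '0' && c <= '9') return static_cast<std::uint32_t>(c - '0') + 26;
  return std::nullopt;
}
} // namespace

// Decodes a Punycode payload into code points, or nullopt on bad input.
std::optional<std::vector<std::uint32_t>>
punycode_decode(std::string_view input) {
  std::vector<std::uint32_t> out;
  std::size_t pos = 0;
  // Everything before the last '-' is copied as basic code points.
  const auto delim = input.rfind('-');
  if (delim != std::string_view::npos && delim > 0) {
    for (std::size_t j = 0; j < delim; ++j) {
      const auto byte = static_cast<unsigned char>(input[j]);
      if (byte >= 0x80) return std::nullopt;
      out.push_back(byte);
    }
    pos = delim + 1;
  }
  std::uint32_t n = kInitialN, i = 0, bias = kInitialBias;
  while (pos < input.size()) {
    const std::uint32_t old_i = i;
    std::uint32_t w = 1;
    // Decode one generalized variable-length integer into i.
    for (std::uint32_t k = kBase;; k += kBase) {
      if (pos >= input.size()) return std::nullopt;
      const auto digit = digit_value(input[pos++]);
      if (!digit) return std::nullopt;
      i += *digit * w; // overflow checks omitted in this sketch
      const std::uint32_t t =
          k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
      if (*digit < t) break;
      w *= kBase - t;
    }
    const auto count = static_cast<std::uint32_t>(out.size() + 1);
    bias = adapt(i - old_i, count, old_i == 0);
    n += i / count;
    i %= count;
    out.insert(out.begin() + i, n); // insert code point n at position i
    ++i;
  }
  return out;
}

// Reproduces the two decoding claims from the Changes list.
void check_pr_examples() {
  const auto a = punycode_decode("7a");
  assert(a && a->size() == 1 && (*a)[0] == 0x00A1);
  const auto b = punycode_decode("example-");
  assert(b && b->size() == 7 && (*b)[0] == 'e'); // all ASCII, nothing inserted
}
```

Decoding is only the first step: a validator still has to check the resulting code points against the RFC 5892 categories and the Bidi rule, which this sketch does not do.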