Skip to content

test(dns): extend idn-hostname coverage for Bidi, A-label and ContextJ rules#2489

Open
vtushar06 wants to merge 1 commit into
sourcemeta:mainfrom
vtushar06:idn-hostname-test-coverage
Open

test(dns): extend idn-hostname coverage for Bidi, A-label and ContextJ rules#2489
vtushar06 wants to merge 1 commit into
sourcemeta:mainfrom
vtushar06:idn-hostname-test-coverage

Conversation

@vtushar06

@vtushar06 vtushar06 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

is_idn_hostname already handles these cases correctly. This adds 10 tests that pin that behaviour explicitly so regressions are caught early. Each case targets a rule that at least one other production library misses, confirmed by running the same inputs across 9 validators.

Changes

  • invalid_bidi_digit_first_in_bidi_domain - whole-name Bidi check: 0a.א is invalid because the LTR label 0a violates RFC 5893 §2 condition 1 when paired with an RTL label (python-jsonschema misses this)
  • invalid_bidi_single_label_digit_then_rtl - RTL label starting with a digit is invalid (RFC 5893 §2 cond 1)
  • invalid_bidi_ltr_label_with_rtl_letter - LTR label containing an RTL letter violates RFC 5893 §2 cond 5
  • invalid_a_label_decodes_to_disallowed - xn--7a decodes to U+00A1, which is DISALLOWED under RFC 5892 §2.6; an A-label is only valid if it decodes to a valid U-label (RFC 5890 §2.3.2.1)
  • invalid_a_label_decodes_to_ascii_only - xn--example- decodes to example (all ASCII); a U-label must contain at least one non-ASCII character (RFC 5890 §2.3.2.1)
  • invalid_a_label_decodes_to_bidi_violation - A-label whose decoded U-label itself violates the Bidi rule
  • invalid_noncanonical_punycode - xn---9uc decodes to a codepoint whose canonical A-label is xn--9uc; round-trip must match (RFC 5891 §5.4)
  • invalid_fullwidth_digits - U+FF11..U+FF13 are DISALLOWED under RFC 5892 §2.6; UTS46 processors silently map them to ASCII and accept
  • invalid_zero_width_space - U+200B is DISALLOWED under RFC 5892 §2.6; several UTS46 libraries strip it silently
  • invalid_zwnj_failing_at_one_occurrence - ZWNJ rule (RFC 5892 appendix A.1, erratum 3312) must pass at every occurrence; Node and PHP accept a label where one ZWNJ is valid and a second is not

Ecosystem Impact

  1. Bidi whole-name rule: python idna 3.10 FAILS (per-label check misses cross-label condition); Go x/net/idna PASSES
  2. A-label decodes to DISALLOWED: Go, Node, PHP, Rust, Java, ICU4J, Guava all ACCEPT; python idna REJECTS (7/9 wrong)
  3. A-label decodes to ASCII only: Go x/net/idna and Node ACCEPT; python idna REJECTS - same class as CVE-2024-12224 (Rust idna), CVE-2026-39821 (Go x/net), CVE-2026-46644 (PHP polyfill)
  4. Non-canonical Punycode: python idna and Node ACCEPT; Go, PHP, Rust, ICU4J REJECT
  5. Fullwidth digits / zero-width space: 8/9 validators ACCEPT (all but python idna); GNU libidn2 GitLab issue Exercise installation on local builds #136
  6. ZWNJ per-occurrence: Node and PHP ACCEPT the mixed case; Go and python idna REJECT

RFC References

Cross-implementation evidence: https://github.com/vtushar06/JSON-Schema-format-test-Evidence/blob/main/idn-hostname.md

Copilot AI review requested due to automatic review settings June 11, 2026 18:47

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds additional negative test coverage for internationalized domain name (IDN) hostname validation, focusing on tricky RFC edge-cases.

Changes:

  • Added new IDN hostname rejection tests for Bidi rules and whole-name constraints (RFC 5893).
  • Added new A-label / Punycode validation tests (RFC 5890/5891/5892).
  • Added ContextJ per-occurrence ZWNJ rule coverage (RFC 5892).
Comments suppressed due to low confidence (1)

test/dns/idn_hostname_test.cc:1

  • The three inputs above are ASCII-only strings, but they’re written as hex byte escapes, which makes the test vectors harder to review and increases the chance of subtle literal-escaping mistakes. Prefer plain string literals (e.g., "xn--7a", "xn--example-", "xn---9uc") for ASCII to improve readability and reduce maintenance overhead.
#include <gtest/gtest.h>

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +631 to +633
TEST(DNS_idn_hostname, invalid_noncanonical_punycode) {
EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x2d\x39\x75\x63"));
}
Comment on lines +601 to +608
TEST(DNS_idn_hostname, invalid_bidi_digit_first_in_bidi_domain) {
EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x30\x61\x2e\xd7\x90"));
}

// RFC 5893 sec 2 cond 1: label must start with L, R or AL
TEST(DNS_idn_hostname, invalid_bidi_single_label_digit_then_rtl) {
EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x30\xd8\xa7"));
}
Comment on lines +641 to +643
TEST(DNS_idn_hostname, invalid_zero_width_space) {
EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x61\xe2\x80\x8b\x62"));
}
Comment thread test/dns/idn_hostname_test.cc Outdated
Comment on lines +597 to +598
// --- additional coverage: subtle cases (whole-name Bidi, A-label re-validation,
// non-canonical Punycode, per-occurrence ContextJ) ---
@augmentcode

augmentcode Bot commented Jun 11, 2026

Copy link
Copy Markdown
🤖 Augment PR Summary

Summary: Adds targeted regression tests for subtle IDN hostname validation edge cases that are easy to get wrong across implementations.

Changes:

  • Introduces 10 new negative tests for whole-domain Bidi enforcement (RFC 5893) across labels
  • Adds explicit A-label re-validation cases where Punycode decoding yields DISALLOWED code points, ASCII-only U-labels, or Bidi violations (RFC 5890/5892/5893)
  • Covers non-canonical Punycode round-trip rejection (RFC 5891 §5.4)
  • Extends coverage for DISALLOWED code points like fullwidth digits and zero-width space (RFC 5892)
  • Adds a per-occurrence ContextJ/ZWNJ failure case to ensure contextual rules are enforced for every ZWNJ occurrence

🤖 Was this summary useful? React with 👍 or 👎

@augmentcode augmentcode Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.


// RFC 5890 sec 2.3.2.1 + 5892 sec 2.6: decodes to U+00A1 (DISALLOWED)
TEST(DNS_idn_hostname, invalid_a_label_decodes_to_disallowed) {
EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x37\x61"));

@augmentcode augmentcode Bot Jun 11, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These A-label cases are pure ASCII; since is_hostname() also re-validates xn-- labels via idna_is_valid_a_label, consider asserting EXPECT_FALSE(sourcemeta::core::is_hostname(...)) here as well to pin behavior for the ASCII-only API.

Severity: low

Other Locations
  • test/dns/idn_hostname_test.cc:622
  • test/dns/idn_hostname_test.cc:627
  • test/dns/idn_hostname_test.cc:632

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="test/dns/idn_hostname_test.cc">

<violation number="1" location="test/dns/idn_hostname_test.cc:617">
P3: This A-label is pure ASCII and `is_hostname()` also validates `xn--` labels. Add a corresponding `EXPECT_FALSE(sourcemeta::core::is_hostname(...))` assertion here (and for the other A-label tests at lines 622, 627, 632) to pin behavior for the ASCII-only validation path consistently with the pattern used elsewhere in this file.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic


// RFC 5890 sec 2.3.2.1 + 5892 sec 2.6: decodes to U+00A1 (DISALLOWED)
TEST(DNS_idn_hostname, invalid_a_label_decodes_to_disallowed) {
EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x37\x61"));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: This A-label is pure ASCII and is_hostname() also validates xn-- labels. Add a corresponding EXPECT_FALSE(sourcemeta::core::is_hostname(...)) assertion here (and for the other A-label tests at lines 622, 627, 632) to pin behavior for the ASCII-only validation path consistently with the pattern used elsewhere in this file.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/dns/idn_hostname_test.cc, line 617:

<comment>This A-label is pure ASCII and `is_hostname()` also validates `xn--` labels. Add a corresponding `EXPECT_FALSE(sourcemeta::core::is_hostname(...))` assertion here (and for the other A-label tests at lines 622, 627, 632) to pin behavior for the ASCII-only validation path consistently with the pattern used elsewhere in this file.</comment>

<file context>
@@ -593,3 +593,56 @@ TEST(DNS_idn_hostname, label_with_out_of_order_combining_marks_rejected) {
+
+// RFC 5890 sec 2.3.2.1 + 5892 sec 2.6: decodes to U+00A1 (DISALLOWED)
+TEST(DNS_idn_hostname, invalid_a_label_decodes_to_disallowed) {
+  EXPECT_FALSE(sourcemeta::core::is_idn_hostname("\x78\x6e\x2d\x2d\x37\x61"));
+}
+
</file context>

…J edge cases

Signed-off-by: Tushar Verma <tusharmyself06@gmail.com>
@vtushar06 vtushar06 force-pushed the idn-hostname-test-coverage branch from bad613f to 07db034 Compare June 11, 2026 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants