fix: exclude additional section records from finalCacheOnly results#11

Merged
nic-6443 merged 4 commits into master from
fix/finalcacheonly-exclude-additional-section
Apr 1, 2026

@jarvis9443 commented Apr 1, 2026

Problem

When finalCacheOnly is enabled (used by api7-ee-3-gateway) and additional_section is true (default in resolve()), the parseAnswer() function incorrectly returns DNS Additional section (section=3) glue records as answers for the queried domain.

Root Cause

The finalCacheOnly block in parseAnswer() (L658-678) filters records by type only, not by name or section:

  1. additional_section=true brings Section 3 glue records (A records for nameservers) into the flat answers list
  2. finalCacheOnly keeps all records matching qtype (e.g., TYPE_A) — including Section 3 A records that belong to different domains
  3. Line 674 then overwrites the names of all kept records with check_qname, making Section 3 records appear to belong to the queried domain
  4. The subsequent name-based filter at line 687 no longer catches them since the name was already overwritten
  5. Caller receives records with the correct queried name but wrong IP address (from nameserver glue records)
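The chain above can be reproduced with a small standalone model. This is only a sketch of the pre-fix behaviour, not the actual parseAnswer() code: the helper name filter_by_type_only and the record values are made up for illustration.

```lua
-- Simplified model of the pre-fix behaviour; NOT the real parseAnswer() code.
local TYPE_A = 1  -- DNS type code for A records

-- Flat answers list as seen with additional_section = true (values are illustrative).
local answers = {
  { name = "myservice.test", type = TYPE_A, address = "10.0.0.1",  ttl = 30, section = 1 },
  { name = "ns1.test",       type = TYPE_A, address = "10.0.0.11", ttl = 5,  section = 3 },
}

-- Hypothetical helper: keep every record whose type matches qtype,
-- then rename it to the queried name. Section and original name are ignored.
local function filter_by_type_only(records, qtype, check_qname)
  local kept = {}
  for _, rec in ipairs(records) do
    if rec.type == qtype then
      kept[#kept + 1] = {
        name = check_qname, type = rec.type, address = rec.address,
        ttl = rec.ttl, section = rec.section,
      }
    end
  end
  return kept
end

local buggy = filter_by_type_only(answers, TYPE_A, "myservice.test")
for _, rec in ipairs(buggy) do
  -- The glue record now carries the queried name but still reports section 3.
  print(rec.name, rec.address, rec.section)
end
```

Because the rename happens before any name-based check, a later filter on name cannot tell the two records apart.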

Symptoms

  • DNS resolution returns IP addresses from DNS Additional section (nameserver glue records) instead of the correct Answer section
  • The returned record has section: 3 (proof of wrong origin)
  • TTL is set to min_ttl across all sections, producing unexpected TTL values
  • The issue is intermittent because math.random() selects from all kept records, sometimes picking the correct one

Fix

Added a section == 1 (Answer section) check in the finalCacheOnly block to exclude non-Answer section records from both the result set and the min_ttl calculation. These are glue records for nameservers and should never be treated as answers for the queried domain.
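A minimal sketch of the section-aware filter, under stated assumptions: the helper name final_answers and the fixture are hypothetical, and taking min_ttl over every Answer-section record (CNAMEs included) is an assumption about the real code, not something this description confirms.

```lua
-- Illustrative model of the fixed filter; NOT the real parseAnswer() code.
local TYPE_A, TYPE_CNAME = 1, 5   -- DNS type codes
local SECTION_AN = 1              -- Answer section

-- Hypothetical helper: return Answer-section records of type qtype, renamed to
-- check_qname, each carrying the minimum TTL seen in the Answer section.
local function final_answers(records, qtype, check_qname)
  local kept, min_ttl = {}, nil
  for _, rec in ipairs(records) do
    if rec.section == SECTION_AN then
      -- Assumption: TTLs of all Answer-section records count toward min_ttl.
      if not min_ttl or rec.ttl < min_ttl then min_ttl = rec.ttl end
      if rec.type == qtype then
        kept[#kept + 1] = {
          name = check_qname, type = rec.type,
          address = rec.address, section = rec.section,
        }
      end
    end
  end
  for _, rec in ipairs(kept) do rec.ttl = min_ttl end
  return kept
end

local fixed = final_answers({
  { name = "myservice.test", type = TYPE_CNAME, cname = "real.test", ttl = 10, section = 1 },
  { name = "real.test",      type = TYPE_A, address = "10.0.0.1",  ttl = 60, section = 1 },
  { name = "ns1.test",       type = TYPE_A, address = "10.0.0.11", ttl = 1,  section = 3 },
}, TYPE_A, "myservice.test")

print(#fixed, fixed[1].address, fixed[1].ttl)  -- 1  10.0.0.1  10
```

The glue record's TTL of 1 no longer influences the result, and its address never reaches the caller.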

CI Fix

Fixed the pre-existing t/02-timer-usage.t failure: the test relied on external DNS resolution (8.8.8.8 for httpbin.org and mockbin.org), which fails in CI environments. Switched to a local CoreDNS server (127.0.0.1:15353) with test domains svc1.test and svc2.test. Also fixed a nil dereference in the test, where a resolve failure did not return early before the result was accessed.
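The early-return pattern for the nil dereference looks roughly like the sketch below. fake_resolve is a hypothetical stand-in for client.resolve() that follows the usual Lua convention of returning nil plus an error string; the address and error text are invented for the example.

```lua
-- Hypothetical stand-in for client.resolve(): nil plus an error string on failure.
local function fake_resolve(name)
  if name == "svc1.test" then
    return { { name = name, address = "127.0.0.1" } }
  end
  return nil, "name error"
end

local function first_address(name)
  local result, err = fake_resolve(name)
  if not result then
    -- Return early: indexing result[1] here would raise
    -- "attempt to index a nil value".
    return nil, "failed to resolve " .. name .. ": " .. tostring(err)
  end
  return result[1].address
end

print(first_address("svc1.test"))
print(first_address("missing.test"))
```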

Test Coverage

Added 3 test cases for finalCacheOnly with Additional section records:

  1. Basic case: Answer section A record + Additional section A glue records — only Answer section record returned
  2. CNAME chain case: CNAME + A in Answer + glue records in Additional — correct resolution with proper min_ttl from section 1 only
  3. Edge case: Only CNAME in Answer + A in Additional — Additional section A records not returned as answers

When finalCacheOnly is enabled and additional_section is true (default),
the parseAnswer() function incorrectly keeps A/AAAA records from the DNS
Additional section (section=3) alongside Answer section records. It then
overwrites their names to the queried domain name, causing wrong IPs
(belonging to nameserver glue records) to be returned for the queried domain.

The fix adds a section check in the finalCacheOnly block to exclude
Additional section (section=3) records from both the result set and the
min_ttl calculation. These records are glue records for nameservers and
should never be treated as answers for the queried domain.

Root cause chain:
1. additional_section=true brings Section 3 glue records into the flat
   answers list
2. finalCacheOnly filters by type only (not name or section)
3. Line 674 overwrites all kept records' names to check_qname, making
   Section 3 records appear to belong to the queried domain
4. The subsequent name-based filter at line 687 no longer catches them
5. Caller receives records with correct name but wrong IP address

The section=3 field preserved in output serves as evidence of the bug.

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6d27218-e4c6-4a75-826c-8a215dfc79cb

📥 Commits

Reviewing files that changed from the base of the PR and between 0439da4 and 35c9da2.

📒 Files selected for processing (1)
  • src/resty/dns/client.lua
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/resty/dns/client.lua

📝 Walkthrough

When finalCacheOnly is enabled, DNS parsing was changed so records from the Additional section (section 3) are excluded from result sets and from minimum-TTL computation when the final Answer matches the requested qtype; tests were added to verify this behavior across A and CNAME scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Coverage: finalCacheOnly (spec/client_cache_spec.lua) | Added a new describe("finalCacheOnly") test suite (+151 lines) that verifies Answer vs Additional section behavior: Answer A records preferred over Additional glue A records; CNAME chains in Answer resolve to Answer A records with TTL from Answer section; Additional-only A records are not returned when Answer lacks A. |
| DNS parsing logic (src/resty/dns/client.lua) | Adjusted parseAnswer (+12/-4 lines) to, when finalCacheOnly is true and the last answer matches the requested qtype, ignore Additional-section (section == 3) records for both min-TTL calculation and the set of returned records. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hop through answers, tidy and merry,
I leave glue behind — no extra carry,
When final’s true I keep the right clue,
TTLs from answers, neat and true,
A joyous nibble — DNS made airy! 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and accurately summarizes the main change: excluding additional section records from finalCacheOnly results, which is the core fix implemented in the DNS client. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@jarvis9443
Author

The CI failure in t/02-timer-usage.t is a pre-existing infrastructure issue — the test tries to resolve external domains (httpbin.org, mockbin.org) via 8.8.8.8 in the CI environment. This is unrelated to the changes in this PR.

All other busted spec tests, including the 3 new finalCacheOnly tests, pass. The only spec failures are pre-existing ones: the "fetching multiple A records" and "weight change for unresolved record" tests.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
spec/client_cache_spec.lua (1)

670-701: Consider strengthening the edge-case test assertions.

The conditional assertion at lines 694-700 is defensive but doesn't clearly verify the expected behavior. It allows result to be nil without checking err, and if result exists, it only checks that no section=3 records are present.

Consider making the test more explicit about the expected outcome:

♻️ Suggested improvement for clearer assertions
       -- Should not return the section 3 A record as an answer
       local result, err = client.resolve("myservice", { qtype = client.TYPE_A })
-      if result then
-        -- If any result is returned, it must not be the section 3 record
-        for _, r in ipairs(result) do
-          assert.not_equal(3, r.section)
-          assert.not_equal("10.0.0.11", r.address)
-        end
-      end
+      -- With only CNAME in Answer section and no A record to dereference,
+      -- resolution should fail or return empty/CNAME only
+      if result and #result > 0 then
+        -- If any result is returned, it must not be the section 3 record
+        for _, r in ipairs(result) do
+          assert.not_equal(3, r.section)
+          assert.not_equal("10.0.0.11", r.address)
+        end
+      else
+        -- Expected: either nil result or empty result since there's no
+        -- resolvable A record in the Answer section
+        assert.truthy(result == nil or #result == 0 or result[1].type == client.TYPE_CNAME)
+      end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/client_cache_spec.lua` around lines 670 - 701, Make the test explicit
about expected outcomes by asserting error absence and tightening result checks:
call client.resolve("myservice", { qtype = client.TYPE_A }), then immediately
assert.is_nil(err); if result is nil assert.is_nil(result) (or
assert.is_falsy(result)) to confirm no unexpected data, otherwise iterate result
and assert no record has section == 3 and no record.address == "10.0.0.11";
optionally also assert that any returned answer is the CNAME (type ==
client.TYPE_CNAME) to verify the answer section behavior. Use the existing local
variables result and err and reference client.resolve and mock_records in the
updated assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b1b7f84c-47bc-4ff7-8f49-07d9050878b7

📥 Commits

Reviewing files that changed from the base of the PR and between 4165396 and 5757082.

📒 Files selected for processing (2)
  • spec/client_cache_spec.lua
  • src/resty/dns/client.lua

@nic-6443 self-assigned this Apr 1, 2026
Per review feedback, using == 1 (Answer section) is more precise than
!= 3 (not Additional section). We only want Answer section records in
the finalCacheOnly result set.
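
The difference between the two checks shows up for Authority section (section 2) records. The sketch below uses the section numbering 1 = Answer, 2 = Authority, 3 = Additional; the predicate names and the sample record are hypothetical.

```lua
-- Section numbering: 1 = Answer, 2 = Authority, 3 = Additional.
local SECTION_AN, SECTION_NS, SECTION_AR = 1, 2, 3

-- Hypothetical predicates for the two candidate checks.
local function not_additional(rec) return rec.section ~= SECTION_AR end
local function is_answer(rec) return rec.section == SECTION_AN end

local authority_rec = { name = "test", section = SECTION_NS }

print(not_additional(authority_rec))  -- true: the looser check lets it through
print(is_answer(authority_rec))       -- false: the stricter check rejects it
```
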
t/02-timer-usage.t relied on external DNS resolution (8.8.8.8 for
httpbin.org and mockbin.org) which fails in CI environments without
external network access. Changes:

- Switch to local CoreDNS server (127.0.0.1:15353) with test domains
  svc1.test and svc2.test
- Add CoreDNS template entries for the test domains
- Use consistent client.init() config in both init_worker and access
  phases (the access_by_lua block previously called client.init()
  without nameserver config, falling back to system resolv.conf)
- Add early return after resolve failure to prevent nil dereference
@nic-6443 merged commit b9e4d9a into master Apr 1, 2026
1 check passed
@nic-6443 deleted the fix/finalcacheonly-exclude-additional-section branch April 1, 2026 08:26