Add AuTest for connect_attempts rr_retries and max_retries #12932

Merged
masaori335 merged 4 commits into apache:master from masaori335:asf-master-hostdb-autest
Mar 6, 2026

@masaori335
Contributor

Covers the combination of proxy.config.http.connect_attempts_rr_retries and proxy.config.http.connect_attempts_max_retries.

The connect_attempts_max_retries_down_server setting will be covered by #12922.

@masaori335 masaori335 added this to the 11.0.0 milestone Mar 1, 2026
@masaori335 masaori335 self-assigned this Mar 1, 2026
@masaori335 masaori335 force-pushed the asf-master-hostdb-autest branch from ab39cfa to 3b069bc Compare March 2, 2026 01:41
@masaori335 masaori335 requested a review from Copilot March 2, 2026 01:41
@masaori335 masaori335 changed the title Add AuTest for connect_attempts Add AuTest for connect_attempts rr_retries and max_retries Mar 2, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds new AuTest coverage for origin connect-attempt retry behaviors (round-robin retries vs max retries) and extends the ATS replay test harness to support DNS record injection and error.log validation via replay YAML.

Changes:

  • Added three new Proxy Verifier replay scenarios covering proxy.config.http.connect_attempts_rr_retries and proxy.config.http.connect_attempts_max_retries.
  • Added new dns/connect_attempts.test.py to run the new replay tests and gold files for error.log expectations.
  • Extended tests/gold_tests/autest-site/ats_replay.test.ext to support DNS records injection and error_log validation (including gold-file based validation).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/gold_tests/dns/replay/connect_attempts_rr_retries.replay.yaml New replay scenario for round-robin retry behavior and down-server cache behavior.
tests/gold_tests/dns/replay/connect_attempts_rr_no_retry.replay.yaml New replay scenario covering baseline behavior with no retries.
tests/gold_tests/dns/replay/connect_attempts_rr_max_retries.replay.yaml New replay scenario covering connect_attempts_max_retries behavior.
tests/gold_tests/dns/gold/connect_attempts_rr_retries_error_log.gold Expected error.log output for rr-retries scenario.
tests/gold_tests/dns/gold/connect_attempts_rr_no_error_log.gold Expected error.log output for no-retry scenario.
tests/gold_tests/dns/gold/connect_attempts_rr_max_retries_error_log.gold Expected error.log output for max-retries scenario.
tests/gold_tests/dns/connect_attempts.test.py Test driver that runs the three new replay tests.
tests/gold_tests/autest-site/ats_replay.test.ext Adds error_log validation support and DNS records wiring for replay-based tests.

@bryancall bryancall added the DNS label Mar 2, 2026
@bryancall bryancall self-requested a review March 2, 2026 22:39
@masaori335
Contributor Author

Somehow, in this environment the connect_attempts_rr_max_retries test keeps trying 0.0.0.1 even after it is marked "down", instead of round-robining to 0.0.0.2.

20260302.01h50m26s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=0 url='http://backend.example.com:62299/path/'
20260302.01h50m28s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=1 url='http://backend.example.com:62299/path/'
20260302.01h50m30s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=2 url='http://backend.example.com:62299/path/'
20260302.01h50m30s CONNECT : ETIMEDOUT [110] connecting to 0.0.0.1:62299 for host='example.com' url='http://backend.example.com:62299/path/' fail_count='1' marking down
20260302.01h50m31s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=0 url='http://backend.example.com:62299/path/'
20260302.01h50m33s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=1 url='http://backend.example.com:62299/path/'
20260302.01h50m34s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=2 url='http://backend.example.com:62299/path/'
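
A rough way to check an excerpt like the one above for this pattern is to count failed attempts per target and flag any target that is tried again after its "marking down" line. The sketch below is illustrative only: the regular expressions are inferred from the quoted lines, not from a definition of the ATS error.log format, and the function names are hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the quoted log lines (an assumption, not an ATS spec).
ATTEMPT_RE = re.compile(
    r"CONNECT: attempt fail \[(?P<kind>\w+)\] to (?P<ip>[\d.]+):(?P<port>\d+) "
    r".*?retry_attempts=(?P<retry>\d+)"
)
MARK_DOWN_RE = re.compile(r"connecting to (?P<ip>[\d.]+):\d+ .*marking down")


def attempts_per_target(log_text: str) -> Counter:
    """Count failed connect attempts per origin IP."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def retried_after_mark_down(log_text: str) -> set:
    """Return IPs that were attempted again after being marked down."""
    down, retried = set(), set()
    for line in log_text.splitlines():
        marked = MARK_DOWN_RE.search(line)
        if marked:
            down.add(marked.group("ip"))
            continue
        attempt = ATTEMPT_RE.search(line)
        if attempt and attempt.group("ip") in down:
            retried.add(attempt.group("ip"))
    return retried


# Three lines taken from the excerpt above.
SAMPLE = """\
20260302.01h50m26s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=0 url='http://backend.example.com:62299/path/'
20260302.01h50m30s CONNECT : ETIMEDOUT [110] connecting to 0.0.0.1:62299 for host='example.com' url='http://backend.example.com:62299/path/' fail_count='1' marking down
20260302.01h50m31s CONNECT: attempt fail [CONNECTION_ERROR] to 0.0.0.1:62299 for host='example.com' connection_result=ETIMEDOUT [110] error=ETIMEDOUT [110] retry_attempts=0 url='http://backend.example.com:62299/path/'
"""

counts = attempts_per_target(SAMPLE)
retried = retried_after_mark_down(SAMPLE)
print(counts, retried)
```

For the sample, every attempt goes to 0.0.0.1 and that address appears in the "retried after mark down" set, which matches the symptom described in this comment.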

@masaori335 masaori335 marked this pull request as draft March 3, 2026 00:31
@masaori335
Contributor Author

I found a difference between my Mac and the Docker environment. In Docker we're hitting ETIMEDOUT [110], which takes time, so down_server.cache_time: 5 was too short. I extended it to 10 seconds.
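
To put numbers on this, the timestamps from the log excerpt quoted earlier in this thread can be plugged into a very simplified model. The sketch assumes the down mark simply expires cache_time seconds after the "marking down" line; it does not model HostDB internals, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(stamp: str) -> datetime:
    """Parse timestamps shaped like '20260302.01h50m30s' (format inferred from the excerpt)."""
    return datetime.strptime(stamp, "%Y%m%d.%Hh%Mm%Ss")


def remaining_down_window(marked_down_at: datetime, observed_at: datetime, cache_time_s: int) -> float:
    """Seconds of the down window left at observed_at, under the simple expiry assumption."""
    expiry = marked_down_at + timedelta(seconds=cache_time_s)
    return (expiry - observed_at).total_seconds()


marked = parse_ts("20260302.01h50m30s")        # "marking down" line
last_attempt = parse_ts("20260302.01h50m34s")  # last attempt of the following transaction

margins = {ct: remaining_down_window(marked, last_attempt, ct) for ct in (5, 10)}
print(margins)
```

Under this model, with cache_time set to 5 only about one second of the down window remains once the next transaction has worked through its timeouts, whereas 10 leaves roughly six seconds.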

One idea for speeding up this test is to add a feature to proxy-verifier that refuses connections immediately.

@masaori335
Contributor Author

[approve ci autest 0of4]

@bryancall
Contributor

[approve ci autest 0]

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.


else:
dns = tr.MakeDNServer(name, default='127.0.0.1')
if 'records' in dns_config:
dns.addRecords(dns_config['records'])

Copilot AI Mar 5, 2026

For consistency with the rest of the test suite, consider calling dns.addRecords using the named argument (records=...). Every other caller in tests/gold_tests uses the keyword form, which makes the call site clearer and consistent.

Suggested change
- dns.addRecords(dns_config['records'])
+ dns.addRecords(records=dns_config['records'])

Comment on lines +41 to +44
proxy.config.diags.debug.tags: 'http|hostdb|dns'
proxy.config.http.connect_attempts_rr_retries: 2
proxy.config.http.connect_attempts_max_retries: 0
proxy.config.http.connect_attempts_max_retries_down_server: 0

Copilot AI Mar 5, 2026

The PR description says this covers the combination of connect_attempts_rr_retries and connect_attempts_max_retries, but in this replay config connect_attempts_max_retries is set to 0 so the retry path that uses connect_attempts_rr_retries to rotate RR targets is never exercised. Consider adding (or adjusting) a scenario where both settings are non-zero so the combined behavior is actually tested.
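
As a sketch of what such a scenario's overrides might look like, the dictionary below sets both retry settings to non-zero values. The specific values are hypothetical and chosen only for illustration; they are not taken from the PR, and the helper function is not part of the test harness.

```python
RR_KEY = "proxy.config.http.connect_attempts_rr_retries"
MAX_KEY = "proxy.config.http.connect_attempts_max_retries"

# Overrides quoted in the review comment above.
QUOTED_RECORDS = {
    RR_KEY: 2,
    MAX_KEY: 0,
    "proxy.config.http.connect_attempts_max_retries_down_server": 0,
}

# Hypothetical overrides for an additional scenario (illustrative values only).
COMBINED_RECORDS = {
    RR_KEY: 1,
    MAX_KEY: 2,
    "proxy.config.http.connect_attempts_max_retries_down_server": 0,
}


def both_retry_settings_enabled(records: dict) -> bool:
    """True when both rr_retries and max_retries are set to a non-zero value."""
    return all(int(records.get(key, 0)) > 0 for key in (RR_KEY, MAX_KEY))


print(both_retry_settings_enabled(QUOTED_RECORDS))
print(both_retry_settings_enabled(COMBINED_RECORDS))
```

A check like this only confirms that a scenario configures both settings; whether the combined retry path is actually exercised still has to be verified against the resulting error.log.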

Comment on lines +118 to +122
fields:
- [Host, example.com]
- [uuid, 20]
delay: 10s


Copilot AI Mar 5, 2026

This request uses a 10s delay to wait for proxy.config.http.down_server.cache_time (10s) to expire. Using an equal delay makes the test timing-sensitive, and the fixed 10s sleep significantly slows the gold test suite. Consider reducing the cache_time for the test and using a delay slightly greater than it (or keep cache_time=10 but make the delay >10s).
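
The relationship between the replay delay and the cache time can be expressed as a small check. This is a minimal sketch: the helper names and the one-second default margin are assumptions, and the set of unit suffixes handled here (us, ms, s) is assumed rather than taken from the replay file format documentation.

```python
import re

_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_delay_seconds(delay: str) -> float:
    """Convert a delay string such as '10s' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(us|ms|s)", delay.strip())
    if match is None:
        raise ValueError(f"unrecognized delay: {delay!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def delay_outlasts_cache(delay: str, cache_time_s: int, margin_s: float = 1.0) -> bool:
    """True when the delay exceeds the down-server cache time by at least margin_s."""
    return parse_delay_seconds(delay) >= cache_time_s + margin_s


print(delay_outlasts_cache("10s", 10))  # delay equal to cache_time
print(delay_outlasts_cache("11s", 10))  # delay slightly above cache_time
```

With these definitions, a delay equal to the cache time fails the check, while a delay at least one second longer passes it.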

fields:
- [Host, example.com]
- [uuid, 20]
delay: 10s

Copilot AI Mar 5, 2026

This request uses a 10s delay to wait for proxy.config.http.down_server.cache_time (5s) to expire. That makes the test unnecessarily slow; consider shrinking the delay to just over the cache_time (or lowering cache_time + delay together) to keep the suite fast while still ensuring the cache has expired.

Suggested change
- delay: 10s
+ delay: 6s

fields:
- [Host, example.com]
- [uuid, 20]
delay: 10s

Copilot AI Mar 5, 2026

This request uses a 10s delay to wait for proxy.config.http.down_server.cache_time (10s) to expire. Using an equal delay can be timing-sensitive, and the fixed 10s sleep slows the suite. Consider reducing cache_time for the test and using a delay slightly greater than it (or keep cache_time=10 but make delay >10s).

Suggested change
- delay: 10s
+ delay: 11s

Contributor

@bryancall bryancall left a comment

Nice work on this, @masaori335. The test design is solid — using sm_id in the gold files to verify round-robin behavior across retries is clever, and the DNS record injection + error_log validation extensions to the autest harness are well done. Built and ran the test on an ASAN build and it passes cleanly. 👍

@masaori335
Contributor Author

I'll merge this without applying Copilot's suggestions. I'll revisit it if the test turns out to be flaky.

@masaori335 masaori335 merged commit b5557e6 into apache:master Mar 6, 2026
19 checks passed
@github-project-automation github-project-automation bot moved this to For v10.2.0 in ATS v10.2.x Mar 6, 2026

3 participants