Conversation
Pull request overview
Adds a new `proxy.config.http.connect.down.policy` value (`3`) intended to treat origin server inactivity timeouts (after a connection is established) as failures for the purposes of marking an origin down.
Changes:
- Extend `HttpSM::track_connect_fail()` to recognize `connect_down_policy == 3`, including an additional timeout-based condition.
- Update admin documentation to describe the new policy value 3.
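The policy-3 behavior described above can be sketched as a small standalone model. This is illustrative only, not actual Traffic Server code: the function name, parameters, and `ServerState` enum are hypothetical stand-ins for the HttpSM state checked in `track_connect_fail()`, and only policies 2 and 3 are modeled.

```cpp
// Hypothetical, simplified model of the down-tracking decision described in
// this PR. Names are illustrative, not real HttpSM members.
enum class ServerState { CONNECTION_ALIVE, CONNECTION_ERROR, INACTIVE_TIMEOUT };

bool should_mark_down(int connect_down_policy, bool had_connect_fail, ServerState state)
{
  // Policy 2 (and 3): any connection error through the TLS handshake.
  if ((connect_down_policy == 2 || connect_down_policy == 3) && had_connect_fail) {
    return true;
  }
  // Policy 3 additionally treats a post-connect inactivity timeout as a
  // failure; this check sits outside the had_connect_fail() guard because
  // the connect result is cleared once the connection is established.
  if (connect_down_policy == 3 && state == ServerState::INACTIVE_TIMEOUT) {
    return true;
  }
  return false;
}
```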
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/proxy/http/HttpSM.cc` | Updates server-down tracking logic to treat policy 3 as "policy 2 + inactivity timeout" behavior. |
| `doc/admin-guide/files/records.yaml.en.rst` | Documents the new `connect.down.policy = 3` option and its intended effect. |
tests/gold_tests/connect_down_policy/connect_down_policy_3.test.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```diff
   // What does our policy say?
-  if (t_state.txn_conf->connect_down_policy == 2) { // Any connection error through TLS handshake
+  if (t_state.txn_conf->connect_down_policy == 2 ||
+      t_state.txn_conf->connect_down_policy == 3) { // Any connection error through TLS handshake
```
I'd prefer to make this config a bitmask; that would let us choose combinations of policies once we have more of them.
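For context, a minimal sketch of what the suggested bitmask form might look like. The flag names and values are hypothetical; no such config encoding exists in Traffic Server today.

```cpp
// Hypothetical bitmask encoding of connect.down.policy, as suggested in the
// review comment above. Each failure mode becomes an independent flag.
enum DownPolicyFlag : unsigned {
  DOWN_ON_TCP_FAIL = 1u << 0, // TCP connect failure
  DOWN_ON_TLS_FAIL = 1u << 1, // TLS handshake failure
  DOWN_ON_INACTIVE = 1u << 2, // post-connect inactivity timeout
};

// True if the configured mask enables the given failure mode.
bool policy_has(unsigned policy_mask, DownPolicyFlag flag)
{
  return (policy_mask & flag) != 0;
}
```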
bryancall left a comment
LGTM. Clean change: policy 3 correctly extends policy 2 by adding the `INACTIVE_TIMEOUT` check outside the `had_connect_fail()` guard, since `connect_result` is cleared when the connection transitions to `CONNECTION_ALIVE`. The test is well done with both positive (policy=3 marks down) and negative (policy=2 does not) cases. Docs are clear.
Re: Copilot comment about `t_state.current.server->state` vs `t_state.current.state`: both fields get set to `INACTIVE_TIMEOUT` via different handlers (tunnel handlers set `server->state`, while `handle_server_setup_error` and `tunnel_handler_post_server` set `current.state`). The check is correct as written.
Re: bitmask suggestion: the failure modes here are hierarchical, not orthogonal (TCP is required for TLS, and a successful connection is required for an inactivity timeout), so an enum/level approach is the right model. A bitmask would allow expressing meaningless combinations like TLS only without TCP.
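The hierarchy argument can be made concrete with a tiny sketch (illustrative names only, not ATS code): when policy values are ordered levels, each level strictly includes the failure modes of the levels below it, so a single comparison decides the question and no invalid combination is expressible.

```cpp
// Illustrative model of hierarchical down-policy levels. Each higher level
// includes all failure modes of the lower ones.
enum class DownPolicy : int {
  TcpFail  = 1, // TCP connect failures
  TlsFail  = 2, // plus TLS handshake failures
  Inactive = 3, // plus post-connect inactivity timeouts
};

// A failure at `failure_level` marks the origin down iff the configured
// policy is at least that level.
bool marks_down(DownPolicy configured, DownPolicy failure_level)
{
  return static_cast<int>(configured) >= static_cast<int>(failure_level);
}
```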
Add option 3 to connect down policy to count inactive connections as failures