
fix: honor server-directed FDv1 Fallback Directive in initializer phase #423

Draft

keelerm84 wants to merge 3 commits into main from mk/sdk-2290/fdv1-fallback

Conversation

@keelerm84
Member

Summary

  • For FDv2 initializer responses carrying X-LD-FD-Fallback: true, the SDK now applies any accompanying payload first, then the data system swaps the synchronizer list to the FDv1 Fallback Synchronizer (or transitions to OFF when none is configured). Synchronizer-phase behavior is unchanged; the directive is one-way.
  • Adds Basis.fallback_to_fdv1 and renames Update.revert_to_fdv1 to Update.fallback_to_fdv1 so initializer/synchronizer results carry the signal explicitly. Polling and streaming data sources detect the header on both success and error paths via a shared helper (see the sketch after this list).
  • Contract test service declares the fdv1-fallback capability and accepts a top-level dataSystem.fdv1Fallback config object, wiring it directly to datasystem.fdv1_compatible_synchronizer(...) instead of inferring it from the last polling synchronizer. Bumps the contract-tests pin from v3.0.0-alpha.4 to v3.0.0-alpha.6.
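
A minimal sketch of that shared header check, assuming a dict-like headers object; the helper name and call shape are assumptions, only the header itself comes from this change:

```python
# Hypothetical helper: both polling and streaming sources would run this on
# every response, whether the request succeeded or failed.
def _fallback_directive_present(headers) -> bool:
    value = (headers or {}).get('X-LD-FD-Fallback')
    return value is not None and value.strip().lower() == 'true'
```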

Mirrors the Go reference change in go-server-sdk#365 and contract-test wiring in go-server-sdk#368. Spec rationale lives in sdk-specs#155; the new harness suite is in sdk-test-harness#336.

Test plan

  • pytest (LD_SKIP_DATABASE_TESTS=1): 975 passed, 194 skipped (database-only).
  • mypy ldclient: clean.
  • isort --check, pycodestyle: clean.
  • New deterministic tests cover initializer fallback on success, on error, with and without FDv1 configured, plus streaming Start-then-payload and synchronizer fallback after a Valid update. No real wall clocks or sleeps introduced.
  • Contract-test workflow against v3.0.0-alpha.6 runs the new "FDv1 Fallback Directive" suite cleanly in CI.

Previously the X-LD-FD-Fallback response header was honored only in
the synchronizer phase, and even there a payload arriving alongside
the directive was discarded because the streaming processor halted
before applying it. Per the updated FDv2 Data System spec, the
directive must be honored in both the initializer and synchronizer
phases, must take precedence over the section 1.2 failover algorithm,
and must let any accompanying payload be applied before the SDK
switches terminally to the FDv1 Fallback Synchronizer.
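
A sketch of that ordering, with every name below invented for
illustration (the real data system logic is more involved):

```python
from typing import Any, Callable, Optional

def apply_basis_then_maybe_fallback(
    payload: Optional[Any],
    fallback_to_fdv1: bool,
    apply_payload: Callable[[Any], None],
    start_fdv1_synchronizer: Optional[Callable[[], None]],
    transition_off: Callable[[], None],
) -> bool:
    """Return True if the directive engaged and the FDv2 chain should stop."""
    if payload is not None:
        apply_payload(payload)       # the accompanying payload is never discarded
    if not fallback_to_fdv1:
        return False                 # continue the normal section 1.2 failover algorithm
    if start_fdv1_synchronizer is not None:
        start_fdv1_synchronizer()    # terminal, one-way swap to the FDv1 Fallback Synchronizer
    else:
        transition_off()             # no FDv1 fallback configured: transition to OFF
    return True
```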

Carry the signal explicitly on the public initializer/synchronizer
result types -- Basis.fallback_to_fdv1 and Update.fallback_to_fdv1 --
so callers cannot silently drop it. Update.revert_to_fdv1 is renamed
to Update.fallback_to_fdv1 to match the spec terminology.
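
The result types then carry the flag along roughly these lines; this
is a simplified stand-in, not the SDK's actual definitions -- only the
fallback_to_fdv1 field name is from this change:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Basis:
    payload: Optional[Any]            # placeholder for whatever the real Basis carries
    fallback_to_fdv1: bool = False    # new field

@dataclass(frozen=True)
class Update:
    payload: Optional[Any]
    fallback_to_fdv1: bool = False    # renamed from revert_to_fdv1
```
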
In the contract test service, declare the fdv1-fallback capability
and accept the new top-level dataSystem.fdv1Fallback config object,
wiring it directly to the SDK's FDv1 Fallback Synchronizer. Drop the
heuristic that previously inferred the FDv1 fallback from the last
polling synchronizer entry -- the FDv1 Fallback Synchronizer is a
distinct configuration concern from the FDv2 Primary/Fallback chain.
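
A sketch of that wiring in the test service; apart from the
dataSystem.fdv1Fallback key and the fdv1_compatible_synchronizer(...)
call named above, every name here is an assumption, and the
synchronizer's real arguments stay elided:

```python
from typing import Any, Dict, Optional

def _maybe_fdv1_fallback(data_system_params: Dict[str, Any], datasystem_module) -> Optional[Any]:
    # Hypothetical helper: only configure the FDv1 Fallback Synchronizer when
    # the harness explicitly asks for it, instead of inferring it from the
    # last polling synchronizer entry.
    fdv1_params = data_system_params.get('fdv1Fallback')
    if fdv1_params is None:
        return None
    # The real call's arguments are elided in the description above;
    # fdv1_params is passed here purely for illustration.
    return datasystem_module.fdv1_compatible_synchronizer(fdv1_params)
```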

Also bump the test harness pin to v3.0.0-alpha.6 so the new
fdv1-fallback test suite runs in CI.

Per the FDv1 Fallback Directive (Data System spec 1.6.3), the Primary
Synchronizer must be terminated when the directive engages. The
streaming source called ``SSEClient.close()`` to do this, but on
Python 3.10 that wasn't enough: ``SSEClient.close()`` ultimately
invokes ``urllib3.HTTPResponse.shutdown()`` (which only performs a
half-close on the local read side) plus ``release_conn()`` (which
returns the live connection to the pool). The TCP socket stayed open
until the response was garbage-collected, which the contract test
harness flagged on the linux/3.10 matrix entry as
"SDK did not close the FDv2 streaming connection after the directive
— the Primary Synchronizer must be stopped when Directed Fallback
engages." Newer Python versions GC the response promptly enough to
hide the leak.

Track the urllib3 PoolManager alongside the SSEClient and call
``HTTPConnectionPool.close()`` on each pool when the synchronizer
stops. ``HTTPConnectionPool.close()`` drains the connection queue and
closes each socket, sending the FIN the server is waiting on.
``PoolManager.clear()`` alone won't do this -- it drops the dict of
pools without closing the connections inside them. The fix is
deterministic (no sleeps) and runs both at the natural end of
``sync()`` and in ``stop()`` so it's robust to whichever thread
arrives first.
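
A sketch of that shutdown path. The actual change tracks the
PoolManager itself; to avoid leaning on PoolManager internals, this
simplified version keeps the pools in a plain list, and only
HTTPConnectionPool.close() is the real urllib3 call doing the work:

```python
from typing import List

import urllib3


class _StreamShutdownSketch:
    """Illustrative only: track connection pools next to the SSE client and
    close them whenever the synchronizer stops."""

    def __init__(self) -> None:
        self._sse_client = None                             # the real code holds an SSEClient here
        self._pools: List[urllib3.HTTPConnectionPool] = []  # pools created for the stream

    def stop(self) -> None:
        if self._sse_client is not None:
            self._sse_client.close()   # on Python 3.10 this alone leaves the socket half-open
        for pool in self._pools:
            pool.close()               # drains the pool and closes each socket, sending the FIN
        self._pools.clear()
```

In the real change the same cleanup runs both at the natural end of
``sync()`` and in ``stop()``, as described above.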