capabilities: retain don2don request IDs longer #21677

Open
prashantkumar1982 wants to merge 1 commit into develop from codex/don2don-requestid-cache-retention
prashantkumar1982 commented Mar 24, 2026

Summary

We've seen in production that some lagging nodes can send the same Don2Don request ID well after the earlier copies of that request have already been cleaned up on the capability DON.

When that happens, the later message is treated like a fresh request instead of being recognized as a duplicate of an old one. That creates confusing follow-on errors in the stack because we lose the original dedup context too early.

This change increases the eviction window for executable Don2Don request IDs so we retain old request state for longer and get a better error surface when delayed duplicates arrive.

What this change does

  • keeps executable server request entries until they are both expired and have been retained for at least DefaultExecutableRequestTimeout
  • effectively changes the retention window to max(requestTimeout, DefaultExecutableRequestTimeout) instead of only requestTimeout
  • preserves the current request-timeout behavior for execution and cancellation while delaying eviction of dedup state
  • adds tests in the existing executable server and server-request test files for the new retention behavior

@github-actions

👋 prashantkumar1982, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions bot commented Mar 24, 2026

✅ No conflicts with other open PRs targeting develop

@github-actions

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

prashantkumar1982 force-pushed the codex/don2don-requestid-cache-retention branch from cc02553 to b48b33b on March 24, 2026 18:09

@cedric-cordenier cedric-cordenier requested a review from ettec March 25, 2026 10:26

  for requestID, executeReq := range r.requestIDToRequest {
-     if executeReq.request.Expired() {
+     if executeReq.request.Evictable(commoncap.DefaultExecutableRequestTimeout) {
What does commoncap.DefaultExecutableRequestTimeout correspond to in practice?

IMO we should keep these requests around for minutes, roughly 5 to 10; I don't think this will be long enough.
