Skip to content

fix(webhooks): run inactive deployment-version cleanup inline on deploy#5250

Merged
waleedlatif1 merged 1 commit into
stagingfrom
worktree-webhook-version-stranding
Jun 28, 2026
Merged

fix(webhooks): run inactive deployment-version cleanup inline on deploy#5250
waleedlatif1 merged 1 commit into
stagingfrom
worktree-webhook-version-stranding

Conversation

@waleedlatif1

@waleedlatif1 waleedlatif1 commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • When a deploy activates a new version, the superseded versions' webhooks are removed by a separate, best-effort CLEANUP_INACTIVE outbox event. When that event is lost / dead-letters silently, old-version webhooks linger as is_active=true orphans that fetchActiveWebhooks skips (deployment-version mismatch) — so they silently stop polling. In prod this stranded ~515 webhooks across ~130 workflows.
  • Fix: run the existing cleanupInactiveDeploymentVersions synchronously in the SYNC_ACTIVE handler, right after the active version's webhooks/schedules are registered — falling back to the deferred outbox event only if the inline pass throws.
  • This reuses the existing guarded cleanup rather than introducing new deletion logic. That cleanup already re-checks each version is still inactive (isDeploymentVersionActive) before tearing anything down, and re-enqueues side-effects if a version became active — so it can never touch the active version.

Why this layer (addressing the first review round)

An earlier revision folded stale-webhook deletion into saveTriggerWebhooksForDeploy. Review correctly flagged two problems with that approach, both avoided here:

  • Strict cleanup blocking registration (P1): the outbox path uses strictExternalCleanup: true, so tearing down a stale/missing external subscription before registration could abort the deploy and leave the active version unregistered. Here cleanup runs strictly after registration, and a failure is caught and deferred — it can never block registration.
  • Active-version race (P2): deleting by snapshot could tear down a version that concurrently became active. The reused cleanupInactiveDeploymentVersions re-checks active-ness before each delete (the existing shouldDeleteWebhook guard), so it's race-safe.

Scope

Closes the leak going forward. Backfilling the ~515 already-stranded rows remains a separate ops pass (the "fully dead" subset needs external OAuth/polling subscriptions recreated via deploy side-effects).

Type of Change

  • Bug fix

Testing

  • tsc + biome clean. Full webhooks suite green.
  • No new unit test: the change reuses the already-exercised cleanupInactiveDeploymentVersions unchanged; the only new code is an inline-with-deferred-fallback wrapper. An isolated test would require brittle deep-mocking of the cleanup's internal db-query ordering and would assert little, so it was deliberately omitted rather than added as a checkbox.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 28, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 28, 2026 1:02am

Request Review

@cursor

cursor Bot commented Jun 28, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches deployment outbox ordering and webhook lifecycle on every deploy; behavior reuses guarded cleanup but runs it on the critical deploy path with swallow-and-defer on errors.

Overview
Deploy sync now tears down superseded deployment versions immediately instead of relying solely on a deferred CLEANUP_INACTIVE outbox event.

At the end of the SYNC_ACTIVE side-effects handler, the flow calls new syncInactiveDeploymentCleanup, which runs the existing cleanupInactiveDeploymentVersions synchronously (after webhooks, schedules, MCP, etc. for the active version are registered). If that inline pass throws, it logs a warning and falls back to enqueueing the same inactive cleanup outbox event as before—so registration success is not blocked by cleanup failures.

The deferred cleanupInactiveSideEffects handler is unchanged and still serves as the retry path when inline cleanup fails or when the event is processed later.

Reviewed by Cursor Bugbot for commit 4508552. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR moves inactive deployment cleanup into the active deployment sync path. The main changes are:

  • Runs inactive-version cleanup inline after active webhooks and schedules are registered.
  • Falls back to the existing cleanup outbox event when inline cleanup fails.
  • Reuses the existing inactive-version guards before deleting webhooks or schedules.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

Filename Overview
apps/sim/lib/workflows/deployment-outbox.ts Moves inactive deployment cleanup into SYNC_ACTIVE after active side effects are registered, with deferred cleanup kept as a fallback.

Reviews (2): Last reviewed commit: "fix(webhooks): run inactive deployment-v..." | Re-trigger Greptile

Comment thread apps/sim/lib/webhooks/deploy.ts
Comment thread apps/sim/lib/webhooks/deploy.ts
@waleedlatif1 waleedlatif1 force-pushed the worktree-webhook-version-stranding branch from 6998516 to f75e0e0 Compare June 28, 2026 00:54
When a deploy activates a new version, superseded versions' webhooks are
removed by a separate, best-effort CLEANUP_INACTIVE outbox event. When that
event is lost/dead-letters, old-version webhooks linger as is_active orphans
that fetchActiveWebhooks skips (version mismatch), so they silently stop
polling (~515 webhooks across ~130 workflows in prod).

Run the existing cleanupInactiveDeploymentVersions synchronously in the
SYNC_ACTIVE handler, right after the active version's webhooks/schedules are
registered, falling back to the deferred outbox event only if the inline pass
throws. This reuses the existing guarded cleanup, which re-checks each version
is still inactive before tearing anything down (so it never touches the active
version) and runs strictly after registration (so a teardown failure can't
block it).
@waleedlatif1 waleedlatif1 force-pushed the worktree-webhook-version-stranding branch from f75e0e0 to 4508552 Compare June 28, 2026 01:02
@waleedlatif1 waleedlatif1 changed the title fix(webhooks): clean up stale deployment-version webhooks during deploy fix(webhooks): run inactive deployment-version cleanup inline on deploy Jun 28, 2026
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@greptile review

@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 4508552. Configure here.

@waleedlatif1 waleedlatif1 merged commit 3766582 into staging Jun 28, 2026
17 checks passed
@waleedlatif1 waleedlatif1 deleted the worktree-webhook-version-stranding branch June 28, 2026 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant