
fix(api): tier upgrade promotes team default TTL + auto_24h deploys (P1)#212

Merged
mastermanas805 merged 4 commits into master from fix/api-tier-upgrade-promotes-deployment-ttl
May 31, 2026

Conversation

@mastermanas805

Member

Summary

P1 user-visible bug. A Pro-tier user got an "expires in 6 hours" email the day after upgrading free→pro. Root cause: the Razorpay subscription.charged webhook called UpgradeTeamAllTiers but never touched teams.default_deployment_ttl_policy — so every NEXT POST /deploy/new still inherited auto_24h and re-fired the 24h-expiry reminder cycle.

  • New models.PromoteDeploymentTTLsForTeam(ctx, db, teamID) (PromoteDeploymentTTLsResult, error): a single tx that (a) flips teams.default_deployment_ttl_policy auto_24h → permanent (user-explicit non-auto defaults are LEFT UNTOUCHED), and (b) promotes every non-terminal ttl_policy='auto_24h' deploy to permanent + clears expires_at + resets the reminders ledger. Custom + already-permanent rows are LEFT UNTOUCHED.
  • Wired into handleSubscriptionCharged for tiers >= hobby (anonymous / free skip the promote path). Fail-open: a promote error never 500s the webhook — the upgrade tx has already committed and the operator runs the idempotent cmd/backfill-tier-ttl to repair the residual.
  • Audit row team.ttl_policies_promoted (operator-only, no customer email — the subscription.upgraded email already covers customer comms) + metric instant_tier_upgrade_ttl_promote_total{outcome=success|noop|error}.
  • Companion infra PR adds the Prom rule + NR alert + dashboard tile + METRICS-CATALOG row (rule 25): https://github.com/InstaNode-dev/infra/pull/new/fix/observability-tier-upgrade-ttl-promote
  • Companion content PR documents the new auto-promote behavior in llms.txt: https://github.com/InstaNode-dev/content/pull/new/docs/llms-tier-upgrade-ttl-promote
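
The promote rules in the first bullet can be sketched as a pure in-memory model. This is only an illustration under stated assumptions: the `team`, `deploy`, and `promoteResult` types and the `promote` function are hypothetical stand-ins, and the real `models.PromoteDeploymentTTLsForTeam` runs these steps as SQL inside one transaction.

```go
package main

import "fmt"

// Policy names mirror the strings used in this PR. The struct layout below
// is an illustrative assumption, not the real schema.
const (
	policyAuto24h   = "auto_24h"
	policyPermanent = "permanent"
)

type deploy struct {
	ID        string
	TTLPolicy string
	Terminal  bool   // true once the deploy has reached a terminal state
	ExpiresAt string // "" means no expiry is set
	Reminders int    // stand-in for the reminders ledger
}

type team struct {
	DefaultTTLPolicy string
	Deploys          []deploy
}

// promoteResult is a hypothetical stand-in for PromoteDeploymentTTLsResult.
type promoteResult struct {
	DefaultFlipped  bool
	DeploysPromoted int
}

// promote flips an auto_24h team default to permanent and promotes every
// non-terminal auto_24h deploy. Custom, already-permanent, and terminal
// rows are left untouched, so a second call changes nothing.
func promote(t *team) promoteResult {
	var res promoteResult
	if t.DefaultTTLPolicy == policyAuto24h {
		t.DefaultTTLPolicy = policyPermanent
		res.DefaultFlipped = true
	}
	for i := range t.Deploys {
		d := &t.Deploys[i]
		if d.Terminal || d.TTLPolicy != policyAuto24h {
			continue
		}
		d.TTLPolicy = policyPermanent
		d.ExpiresAt = ""
		d.Reminders = 0
		res.DeploysPromoted++
	}
	return res
}

func main() {
	tm := &team{
		DefaultTTLPolicy: policyAuto24h,
		Deploys: []deploy{
			{ID: "live", TTLPolicy: policyAuto24h, ExpiresAt: "tomorrow", Reminders: 2},
			{ID: "custom", TTLPolicy: "custom", ExpiresAt: "next-week"},
			{ID: "done", TTLPolicy: policyAuto24h, Terminal: true},
		},
	}
	fmt.Printf("first run:  %+v\n", promote(tm))
	fmt.Printf("second run: %+v\n", promote(tm))
}
```

Running the model twice shows the idempotency that makes the backfill safe to repeat: the second call finds nothing left to promote.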

Coverage block (CLAUDE.md rule 17)

Symptom:        team.default stays 'auto_24h' + auto_24h deploys keep
                firing "expires in N hours" emails after paid upgrade
Enumeration:    rg -F 'UpdatePlanTier' / 'UpgradeTeamAllTiers' /
                'default_deployment_ttl_policy'
Sites found:    4 paid-tier mutation paths (Razorpay webhook,
                /internal/set-tier dev-only, admin tier-change,
                PATCH /api/v1/team/settings)
Sites touched:  1 — the Razorpay webhook is the ONLY paid-tier
                promotion path. set-tier is dev-only (skipped per spec
                anti-goal). Admin demote keeps existing state by design.
                Team-settings is the user's own override and must not
                trigger promote.
Coverage test:  e2e/reliability_contract_test.go's audit-kinds registry
                iterator (rule 18) flags any new AuditKind* constant
                without a downstream-consumer entry, and
                TestPlansRegistryUpgradeTargets_AllInvokePromoteGuard
                iterates plans.Registry so a new tier landing between
                free and hobby loudly fails the guard.
Live verified:  awaiting user verification of the next paid upgrade —
                the new NR alert (tier-upgrade-ttl-promote-failed) pages
                on any outcome=error tick within 10m.
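
The registry-iterating guard named in the coverage test can be sketched roughly as follows. Everything here is an assumption for illustration: the ordered `registry` slice, the `wantPromote` table, and the `starter` tier are hypothetical, and the real `plans.Registry` and `TestPlansRegistryUpgradeTargets_AllInvokePromoteGuard` may be shaped differently.

```go
package main

import "fmt"

// registry lists tiers from lowest to highest. Only the four tier names
// mentioned in this PR are used; the real plans.Registry may differ.
var registry = []string{"anonymous", "free", "hobby", "pro"}

// wantPromote records an explicit promote decision per tier (hypothetical).
var wantPromote = map[string]bool{
	"anonymous": false,
	"free":      false,
	"hobby":     true,
	"pro":       true,
}

// indexOf returns the position of name in reg, or -1 if it is absent.
func indexOf(reg []string, name string) int {
	for i, n := range reg {
		if n == name {
			return i
		}
	}
	return -1
}

// shouldPromote reports whether tier sits at or above hobby in reg.
func shouldPromote(reg []string, tier string) bool {
	t, h := indexOf(reg, tier), indexOf(reg, "hobby")
	return t >= 0 && h >= 0 && t >= h
}

// guardViolations iterates the whole registry, so a tier that lands without
// an explicit decision, or whose decision drifts, is reported loudly.
func guardViolations(reg []string, want map[string]bool) []string {
	var out []string
	for _, tier := range reg {
		w, ok := want[tier]
		if !ok {
			out = append(out, tier+": no explicit promote decision")
			continue
		}
		if shouldPromote(reg, tier) != w {
			out = append(out, tier+": promote decision drifted")
		}
	}
	return out
}

func main() {
	fmt.Println(guardViolations(registry, wantPromote))

	// "starter" is a made-up tier inserted between free and hobby.
	withNew := []string{"anonymous", "free", "starter", "hobby", "pro"}
	fmt.Println(guardViolations(withNew, wantPromote))
}
```

Iterating the registry rather than hard-coding tier names in the test is what makes the guard fail when a new tier appears.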

Backfill (operator action — one-off after deploy)

DATABASE_URL=$(kubectl get secret -n instant instant-secrets -o jsonpath='{.data.DATABASE_URL}' | base64 -d) \
  go run ./cmd/backfill-tier-ttl -apply

Default mode is dry-run. The function is idempotent — safe to re-run for any residual after a partial failure.
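
A minimal sketch of the dry-run-by-default behaviour, assuming a single boolean `-apply` flag and an injectable promote function. The `parseApply`, `backfill`, and `tally` names, the placeholder team IDs, and the exit codes are hypothetical; the real cmd/backfill-tier-ttl reads its teams from the database.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// tally summarises one backfill pass (hypothetical shape).
type tally struct {
	WouldPromote int // teams a dry-run would have touched
	Promoted     int
	Failed       int
}

// parseApply returns true only when -apply is passed, so the default
// invocation never writes.
func parseApply(args []string) (bool, error) {
	fs := flag.NewFlagSet("backfill-tier-ttl", flag.ContinueOnError)
	apply := fs.Bool("apply", false, "write changes (default is dry-run)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *apply, nil
}

// backfill calls promoteFn once per team in apply mode and never in
// dry-run mode. A per-team error is tallied and the loop continues, which
// is safe to re-run when promoteFn is idempotent.
func backfill(apply bool, teams []string, promoteFn func(teamID string) error) tally {
	var res tally
	for _, id := range teams {
		if !apply {
			res.WouldPromote++
			continue
		}
		if err := promoteFn(id); err != nil {
			res.Failed++
			continue
		}
		res.Promoted++
	}
	return res
}

func main() {
	apply, err := parseApply(os.Args[1:])
	if err != nil {
		os.Exit(2) // assumed usage-error exit code
	}
	teams := []string{"team-a", "team-b"} // placeholder IDs for the sketch
	res := backfill(apply, teams, func(string) error { return nil })
	fmt.Printf("%+v\n", res)
	if res.Failed > 0 {
		os.Exit(1)
	}
}
```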

Test plan

  • make gate — new tests pass; remaining failures are pre-existing (customer-DB proxy / NATS service not present on local laptop — known per Makefile docstring). Confirmed by stash-test on master baseline.
  • go test -run 'TestPromoteDeploymentTTLs|TestPlansRegistryUpgradeTargets|TestBillingWebhook_ChargedPromotesTeamDefaultAndDeploys|TestBillingWebhook_ChargedDoesNotPromoteOnSameTierRenewal|TestReliability_AuditKinds_EveryConstantHasConsumerSpec' ./internal/models ./internal/handlers ./e2e: ok.
  • Post-deploy: build-SHA gate via curl https://api.instanode.dev/healthz | jq .commit_id.
  • Operator runs cmd/backfill-tier-ttl -apply once against prod (DRY-RUN first).
  • Verify on the affected user: re-check their team default + deploys are now permanent.

🤖 Generated with Claude Code

mastermanas805 and others added 4 commits May 31, 2026 10:08
A Pro-tier user (mastermanas805) just got an "expires in 6 hours" email
the day after upgrading free→pro. Root cause: subscription.charged only
called UpgradeTeamAllTiers, which lifts per-deploy ttl_policy as a
side-effect but never touches teams.default_deployment_ttl_policy — so
every NEXT POST /deploy/new still inherited 'auto_24h' and re-fired the
24h-expiry reminder cycle.

Wires a new models.PromoteDeploymentTTLsForTeam (single tx) into
handleSubscriptionCharged for tiers >= hobby:
  - flips teams.default_deployment_ttl_policy auto_24h → permanent
    (user-explicit non-auto_24h defaults are LEFT UNTOUCHED)
  - promotes every non-terminal ttl_policy='auto_24h' deploy to
    permanent + clears expires_at + resets the reminders ledger
  - emits team.ttl_policies_promoted audit row + counter
    instant_tier_upgrade_ttl_promote_total{outcome=success|noop|error}

Fail-open: the upgrade tx has already committed by promote-time, so a
promote error never 500s the webhook (operator runs cmd/backfill-tier-ttl
to repair the residual; the function is idempotent).

Coverage block (CLAUDE.md rule 17):
  Symptom:        team.default stays 'auto_24h' + auto_24h deploys keep
                  firing "expires in N hours" emails after paid upgrade
  Enumeration:    rg -F 'UpdatePlanTier' / rg -F 'UpgradeTeamAllTiers' /
                  rg -F 'default_deployment_ttl_policy'
  Sites found:    4 (billing webhook, /internal/set-tier, admin tier change,
                  PATCH /api/v1/team/settings)
  Sites touched:  1 — the Razorpay webhook is the ONLY paid-tier
                  promotion path (set-tier is dev-only, admin demote
                  keeps existing state by design, team-settings is the
                  user's own override and must not trigger promote)
  Coverage test:  e2e/reliability_contract_test.go's audit-kinds registry
                  iterator (rule 18) flags any new AuditKind* constant
                  that ships without a downstream-consumer entry, and
                  TestPlansRegistryUpgradeTargets_AllInvokePromoteGuard
                  iterates plans.Registry tiers so a new tier added
                  between free and hobby would loudly fail the guard.
  Live verified:  awaiting user verification of the next paid upgrade —
                  the new NR alert (tier-upgrade-ttl-promote-failed)
                  pages on any outcome=error tick within 10m.

Backfill: operator runs `DATABASE_URL=… go run ./cmd/backfill-tier-ttl
-apply` once to repair every paid team with stale auto_24h state. The
script is idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the missing branch coverage for the Pro-upgrade auto-promote TTL
fix landed in 82de682, taking the three changed surfaces to 100%:

- cmd/backfill-tier-ttl: refactored main() → exitFn-wrapped run() with
  injectable openDB + promoteFn seams (mirrors cmd/openapi-snapshot).
  Ten tests cover usage errors, DB open/ping/query/scan/rows failures,
  dry-run vs apply modes, mixed ok/error per-team tallying, env-var
  fallback, and the main() exit-code dispatch.

- models.PromoteDeploymentTTLsForTeam: six sqlmock-driven tests for the
  begin/exec/rows-affected/commit error wrappers that a real Postgres
  test DB can't drive on demand.

- handlers.handleSubscriptionCharged + emitTTLPoliciesPromotedAudit:
  added the promoteDeploymentTTLsForTeamFn seam (same pattern as
  billingPortalFactory) so the fail-open promote-error branch is
  reachable. Two tests: nil-db audit early-return + webhook still 200s
  on a simulated promote tx failure.

No production behaviour change: the cmd refactor keeps the same exit
codes and the handler seam is a package-level var pointing at the real
models call, swapped only by tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
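
The package-level seam and the fail-open branch described in the commit above might look roughly like this. It is a sketch under assumptions: `promoteFn`, `handleCharged`, `outcomes`, and `errSimulated` are hypothetical names, the in-memory map stands in for the instant_tier_upgrade_ttl_promote_total counter, and treating "nothing promoted" as the noop outcome is a guess at the metric's semantics.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// promoteFn is a package-level seam. In production it would point at the
// real models call; tests swap it to force the error branch.
var promoteFn = func(teamID string) (promoted int, err error) { return 1, nil }

// errSimulated is the failure a test injects through the seam.
var errSimulated = errors.New("simulated promote tx failure")

// outcomes stands in for the outcome-labelled counter.
var outcomes = map[string]int{}

// handleCharged models the tail of the webhook handler. The upgrade tx is
// assumed to have committed already, so a promote error is recorded but the
// handler still answers 200 (fail-open).
func handleCharged(teamID, tier string) int {
	if tier == "anonymous" || tier == "free" {
		return http.StatusOK // below hobby: skip the promote path
	}
	n, err := promoteFn(teamID)
	switch {
	case err != nil:
		outcomes["error"]++
	case n == 0:
		outcomes["noop"]++
	default:
		outcomes["success"]++
	}
	return http.StatusOK
}

func main() {
	fmt.Println(handleCharged("team-a", "pro"), outcomes)

	// Swap the seam the way a test would, then restore it.
	orig := promoteFn
	promoteFn = func(string) (int, error) { return 0, errSimulated }
	fmt.Println(handleCharged("team-a", "pro"), outcomes)
	promoteFn = orig
}
```

Keeping the seam as a plain variable means production behaviour is unchanged while the error branch stays reachable from a test.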
@mastermanas805 mastermanas805 merged commit 914bb37 into master May 31, 2026
18 checks passed
@mastermanas805 mastermanas805 deleted the fix/api-tier-upgrade-promotes-deployment-ttl branch May 31, 2026 05:38