fix(issue-89): close AFTER_TTL cancel policy loopholes#136
Conversation
| data: { | ||
| cancelPolicy: policy, | ||
| cardTtlMinutes: policy === CardCancelPolicy.AFTER_TTL ? undefined : null, | ||
| }, |
There was a problem hiding this comment.
Bug 1 fix — cardTtlMinutes cleared on policy switch.
Using undefined for AFTER_TTL (so we don't overwrite an existing TTL) and null for every other policy is intentional — it mirrors the invariant the PATCH /preferences Zod schema already enforces. This closes the gap where the Telegram path could persist AFTER_TTL → IMMEDIATE with a stale TTL still in the DB.
| } else if (cancelPolicy === 'MANUAL') { | ||
| await getPaymentProvider() | ||
| .freezeCard(intentId) | ||
| .catch((err) => { |
There was a problem hiding this comment.
Bug 2 fix — no more silent no-op.
The old guard && cardTtlMinutes dropped the job whenever cardTtlMinutes was null, 0, or negative. Checkout has already succeeded by the time this runs, so refusing to cancel the card at this point only extends the window in which it is live. I chose a loud log.error + fallback to IMMEDIATE so the card is always neutralised; a DB CHECK constraint as a belt-and-braces defense would be a reasonable follow-up.
| }); | ||
| if (telegramChatId) { | ||
| await notifyManualCardPending(telegramChatId, intentId, intent.subject ?? intent.query); | ||
| await sendManualCardPendingNotice(telegramChatId, intentId, intent.subject ?? intent.query); |
There was a problem hiding this comment.
Bug 4 fix — module boundary restored.
notifyManualCardPending moved to src/telegram/notificationService.ts as sendManualCardPendingNotice. The orchestrator no longer imports getTelegramBot or InlineKeyboard from grammy. If a second notification channel is added later (email / SMS), the orchestrator won't need to change.
| throw new UnrecoverableError(`Intent ${intentId} not found during AFTER_TTL cancel`); | ||
| } | ||
|
|
||
| // All other failures (Stripe 5xx, DB unreachable, …) are retried using |
There was a problem hiding this comment.
Bug 3 fix — fail-fast on unrecoverable errors.
IntentNotFoundError will never succeed on retry, so wrapping it in BullMQ's UnrecoverableError stops the queue from burning all 5 attempts on a dead intent. Duck-typing on .name covers the case where a different Error subclass with the same name leaks from another module.
| exhausted, | ||
| err, | ||
| }, | ||
| exhausted |
There was a problem hiding this comment.
Observability — loud failure when retries are exhausted.
A permanently-live virtual card is a real money leak. The exhausted branch logs "manual intervention required" at error level so alerting can page on it. The error listener on the next line catches worker-level failures (e.g. Redis disconnect) that never make it to a job.
Extracting handleCancelCardJob as a top-level export makes this unit-testable without Redis (see tests/unit/worker/cancelCardProcessor.test.ts).
| ); | ||
|
|
||
| it('IMMEDIATE policy cancels card synchronously', async () => { | ||
| mockIntent('IMMEDIATE', null); |
There was a problem hiding this comment.
The parameterized null | 0 | negative case is the direct regression test for the original report in #89. Before the fix these inputs silently skipped the cancel job; now they must all fall back to IMMEDIATE cancel.
JonasBaeumer
left a comment
There was a problem hiding this comment.
Solid PR — all four bugs from #89 are addressed and the test coverage is thorough. A few questions and small observations before merge.
Question on the duck-typed error check
src/worker/processors/cancelCardProcessor.ts:30
if (err instanceof IntentNotFoundError || (err as Error)?.name === 'IntentNotFoundError') {IntentNotFoundError is a plain class extending Error from src/contracts/errors.ts. When does instanceof actually fail here? If this is guarding against a real case (module duplication, a serialization boundary, an error rebuilt from BullMQ's wire format), please add a one-line comment explaining the case so we don't lose the reasoning. If it's purely defensive, I'd drop the duck-type branch and the corresponding treats duck-typed IntentNotFoundError test — duck-typing on .name widens the surface that swallows retries, and a future renamed-but-similarly-named error could accidentally trip it.
Bug 1 follow-up: switching TO AFTER_TTL
src/telegram/menuHandler.ts:258-267
cardTtlMinutes: policy === CardCancelPolicy.AFTER_TTL ? undefined : null,The undefined (Prisma no-op) is correct, but it means the sequence AFTER_TTL(60) → IMMEDIATE → AFTER_TTL leaves the user on AFTER_TTL with cardTtlMinutes === null (because the second hop cleared it). The Bug 2 fallback handles that safely, so this is not a regression — but it's a UX gap worth a follow-up issue for forcing TTL re-selection when switching back to AFTER_TTL. Not blocking.
Test flush pattern
tests/unit/orchestrator/cancelPolicy.test.ts:79-83
for (let i = 0; i < 10; i++) {
await Promise.resolve();
}This relies on the microtask depth never exceeding 10. If the orchestrator ever gains an additional await in the cancel-policy chain, the assertions will silently run against stale mocks and the test will pass for the wrong reason. Prefer one of: await new Promise(setImmediate), a flushPromises helper that loops until the microtask queue drains, or refactoring applyPostCheckoutCancelPolicy to return a promise so the test can await it directly. Minor, but worth tightening.
Suggested follow-up
The PR description already calls this out — please open a follow-up issue for the DB-level CHECK(cancelPolicy != 'AFTER_TTL' OR cardTtlMinutes IS NOT NULL) so the invariant is enforced at write time and we have belt-and-braces protection if a future code path ever updates User directly.
Things I liked
- Module boundary cleanup (Bug 4) is exactly the right move — orchestrator no longer pulls in grammy.
- Loud
failedlistener withmanual intervention requiredwording — appropriate for a real money leak. UnrecoverableErrorforIntentNotFoundErrorto stop burning retries on dead intents — good queue hygiene.- Parameterized
.eachfor null/0/negative TTL — exactly the regression class the original bug allowed.
Happy to approve once the duck-type question is answered (either with a code comment or by trimming the branch).
|
Opened #143 to track the DB-level |
Four related bugs could leave virtual cards permanently active in Stripe after checkout on the AFTER_TTL cancel policy. This change plugs all of them and adds regression tests. 1. src/telegram/menuHandler.ts `savePrefPolicy` now clears `cardTtlMinutes` when switching to any policy other than AFTER_TTL, restoring the invariant enforced by the PATCH /preferences Zod schema. 2. src/orchestrator/intentService.ts `applyPostCheckoutCancelPolicy` no longer silently no-ops when `cancelPolicy === AFTER_TTL` but `cardTtlMinutes` is null or non-positive. It now logs at error level and falls back to IMMEDIATE cancellation so no card outlives checkout. 3. src/worker/processors/cancelCardProcessor.ts gains try/catch with structured logging, wraps IntentNotFoundError in BullMQ's UnrecoverableError to stop pointless retries, and attaches `failed` and `error` listeners that escalate loudly when retries are exhausted (potential money leak). The handler is also exported independently so it can be unit tested without Redis. Uses getProviderForIntent(intentId) so the correct payment provider is used. 4. `notifyManualCardPending` moves from the orchestrator to src/telegram/notificationService.ts as `sendManualCardPendingNotice`. The orchestrator now imports it via the telegram module interface and no longer pulls in grammy or telegramClient directly, respecting the module boundary rule in CLAUDE.md. Tests: - tests/unit/telegram/menuHandler.test.ts asserts cardTtlMinutes is cleared on every non-AFTER_TTL policy switch. - tests/unit/orchestrator/cancelPolicy.test.ts covers all four cancel policies including the null / zero / negative TTL fallback. - tests/unit/worker/cancelCardProcessor.test.ts covers happy path, transient retry, and unrecoverable error behaviour. - tests/unit/orchestrator/postCheckoutCancelPolicy.test.ts updated for null cardTtlMinutes under AFTER_TTL (immediate cancel). Rebased onto main (per-user paymentProvider + getProviderForIntent). Review follow-up: drop duck-typed IntentNotFoundError name check; use setImmediate flush in cancelPolicy tests. 405/405 unit tests pass locally (tests/unit). Closes #89
30bc04c to
4377bbc
Compare
|
Rebased onto
Ready for another look when you have time. |
📝 WalkthroughWalkthroughThis PR fixes four interconnected bugs in the AFTER_TTL card cancel policy (issue ChangesCard Cancel Policy and Job Processing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
georgyia
left a comment
There was a problem hiding this comment.
Verification (local)
npx prisma generate && npx tsc --noEmit— cleannpm test -- --testPathPattern=tests/unit— 405/405 passedtests/integration/telegram/menuHandler— not run here (Docker unavailable); rely on CI for that job.
Review
applyPostCheckoutCancelPolicy: Integrates cleanly with current main (paymentProvider + getPaymentProvider). AFTER_TTL only enqueues when cardTtlMinutes != null && cardTtlMinutes > 0; otherwise logs at error and immediate provider.cancelCard — removes the dangerous silent no-op without blocking checkout.
handleCancelCardJob: getProviderForIntent(intentId) is consistent with the rest of the app; IntentNotFoundError → UnrecoverableError only via instanceof (no duck-typing). Exhausted-retry logging is appropriate for a potential live-card leak.
Telegram savePrefPolicy: Clearing TTL when switching off AFTER_TTL matches the API invariant.
sendManualCardPendingNotice: Good module-boundary fix (orchestrator no longer imports grammy).
Tests: setImmediate flush and updated postCheckoutCancelPolicy null case look correct.
Cannot self-approve on GitHub; posting as comment. Jonas: please approve/merge when satisfied.
Optional follow-up (non-blocking): UX when switching back to AFTER_TTL after an intermediate policy cleared TTL (user may need to re-select TTL) — separate issue if you want it tracked.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/telegram/notificationService.ts`:
- Around line 8-24: The Number() conversion of telegramChatId in
sendManualCardPendingNotice can produce NaN for invalid strings; validate and
guard the chat ID before calling bot.api.sendMessage by parsing/validating
telegramChatId (e.g., parseInt or regex to ensure a signed integer) and if
invalid, log a clear error including intentId and the raw telegramChatId and
return early; keep the existing try/catch around the send but move the
validation before creating the InlineKeyboard and calling bot.api.sendMessage so
the log distinguishes formatting errors from API/network failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: af42de7b-c4a1-4c85-9207-2e80e937f8bb
📒 Files selected for processing (8)
src/orchestrator/intentService.tssrc/telegram/menuHandler.tssrc/telegram/notificationService.tssrc/worker/processors/cancelCardProcessor.tstests/unit/orchestrator/cancelPolicy.test.tstests/unit/orchestrator/postCheckoutCancelPolicy.test.tstests/unit/telegram/menuHandler.test.tstests/unit/worker/cancelCardProcessor.test.ts
| export async function sendManualCardPendingNotice( | ||
| telegramChatId: string, | ||
| intentId: string, | ||
| label: string, | ||
| ): Promise<void> { | ||
| try { | ||
| const bot = getTelegramBot(); | ||
| const keyboard = new InlineKeyboard().text('Cancel Card Now', `menu_card_cancel:${intentId}`); | ||
| await bot.api.sendMessage( | ||
| Number(telegramChatId), | ||
| `Checkout complete for "${label}".\n\nYour virtual card is frozen. Tap below when you no longer need it.`, | ||
| { reply_markup: keyboard }, | ||
| ); | ||
| } catch (err) { | ||
| log.error({ intentId, err }, 'Failed to send MANUAL card notification'); | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consider guarding the Number() conversion.
If telegramChatId contains an invalid numeric string, Number(telegramChatId) will produce NaN, which the Telegram API will reject. Since the error is caught and logged, this won't crash the worker, but the log entry won't make it obvious that the root cause was an invalid chat ID format rather than a network/API issue.
🛡️ Proposed guard for invalid chat ID
export async function sendManualCardPendingNotice(
telegramChatId: string,
intentId: string,
label: string,
): Promise<void> {
try {
+ const chatIdNum = Number(telegramChatId);
+ if (!Number.isFinite(chatIdNum)) {
+ log.error({ intentId, telegramChatId }, 'Invalid telegramChatId format');
+ return;
+ }
const bot = getTelegramBot();
const keyboard = new InlineKeyboard().text('Cancel Card Now', `menu_card_cancel:${intentId}`);
await bot.api.sendMessage(
- Number(telegramChatId),
+ chatIdNum,
`Checkout complete for "${label}".\n\nYour virtual card is frozen. Tap below when you no longer need it.`,
{ reply_markup: keyboard },
);
} catch (err) {
log.error({ intentId, err }, 'Failed to send MANUAL card notification');
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/telegram/notificationService.ts` around lines 8 - 24, The Number()
conversion of telegramChatId in sendManualCardPendingNotice can produce NaN for
invalid strings; validate and guard the chat ID before calling
bot.api.sendMessage by parsing/validating telegramChatId (e.g., parseInt or
regex to ensure a signed integer) and if invalid, log a clear error including
intentId and the raw telegramChatId and return early; keep the existing
try/catch around the send but move the validation before creating the
InlineKeyboard and calling bot.api.sendMessage so the log distinguishes
formatting errors from API/network failures.
|
Self-review pass complete — ready for maintainer review. What I checked:
Suggested review focus: the fallback behavior for invalid/missing |
…utes invariant (#147) * fix(issue-143): add CHECK constraint enforcing AFTER_TTL ↔ cardTtlMinutes Adds a Postgres CHECK constraint so the database itself rejects any User row where cancelPolicy = AFTER_TTL but cardTtlMinutes is NULL or non-positive. This is defense-in-depth on top of the application-level handling added in PR #136 — it closes the gap that any future code path writing User directly (scripts, ad-hoc updates, new endpoints, typos in another module) could re-introduce the broken state. The migration backfills any existing AFTER_TTL rows missing a TTL with the 60-minute default already used by the Telegram menu, so the migration is safe to run on existing data. Closes #143 * 📝 CodeRabbit Chat: Implement requested code changes * test(issue-143): fix accepts-valid-states test wrapper Wrap the AFTER_TTL positive TTL case in an it() block so the integration test is syntactically valid and actually executes. * test(issue-143): add atomic switch-to-AFTER_TTL acceptance case Symmetric to the existing 'switch away from AFTER_TTL' test, this asserts the CHECK constraint accepts an atomic update that flips cancelPolicy to AFTER_TTL while setting a positive cardTtlMinutes in the same statement. Pins the intended behaviour against future migrations or trigger-based rewrites that might evaluate the constraint mid-update. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Jonas <71970468+JonasBaeumer@users.noreply.github.com>
Closes #89.
Summary
Four related bugs in the
AFTER_TTLcard-cancel policy could leave a virtual card permanently active in Stripe after checkout. This PR plugs all four and adds regression tests.savePrefPolicyleft a stalecardTtlMinuteswhen switching away fromAFTER_TTL(src/telegram/menuHandler.ts:254)cardTtlMinutestonullon every non-AFTER_TTLswitch, mirroring the API-level Zod invariant.AFTER_TTL+ null / non-positivecardTtlMinutessilently skipped cancellation (src/orchestrator/intentService.ts:131)errorlevel and fall back toIMMEDIATEcancellation so no card outlives checkout.cancelCardProcessorhad notry/catch, no structured logs, and nofailedlistener (src/worker/processors/cancelCardProcessor.ts)UnrecoverableErrorforIntentNotFoundErrorso BullMQ stops retrying a dead intent, and escalate loudly when retries are exhausted.getTelegramBot+InlineKeyboarddirectly, violating the module boundarynotifyManualCardPendingintosrc/telegram/notificationService.tsassendManualCardPendingNotice; orchestrator now only calls the telegram module interface.Testing
tests/unit/telegram/menuHandler.test.ts— assertscardTtlMinutesis cleared on every non-AFTER_TTLswitch (parameterized overON_TRANSACTION,IMMEDIATE,MANUAL).tests/unit/orchestrator/cancelPolicy.test.ts(new) — covers all four cancel policies, including thenull/0/ negative TTL fallback and theMANUALnotification being dispatched through the telegram module.tests/unit/worker/cancelCardProcessor.test.ts(new) — happy path, transient-error retry, andIntentNotFoundError→UnrecoverableErrormapping.Local results:
jest --testPathPattern=unit)tsc --noEmitcleaneslint0 errors (pre-existing warnings only)prettier --checkcleantests/integration/telegram/menuHandler.test.tspasses end-to-end against real Postgres + RedisReview / feedback notes
A few design choices worth calling out:
IMMEDIATEcancel, or refuse the checkout with a DBCHECKconstraint. I chose the fallback because checkout has already succeeded by the time we hit this branch — refusing the card cancel at that point does not undo the purchase, it only increases the window in which the card is live. A DBCHECK(cancelPolicy != 'AFTER_TTL' OR cardTtlMinutes IS NOT NULL)would be a good additional belt-and-braces defense in a follow-up migration.UnrecoverableErrorfor missing intents. Without this, the queue's 5 configured attempts with exponential backoff will burn all retries on an intent that will never exist again. Fail fast, keep the logs clean, free up worker capacity.failedlistener. A permanently-live test card is a real money leak in production. Theexhaustedbranch intentionally logs aterrorwith"manual intervention required"so alerting can page on it.src/telegram/. If a second notification channel is added later (e.g. email, SMS) the orchestrator will not need to change.handleCancelCardJobas a plain async function so unit tests can drive it without spinning up a real BullMQ Worker and Redis connection. The exportedcreateCancelCardWorkeris unchanged from the caller's perspective.Out of scope
CHECKconstraint enforcing theAFTER_TTL↔cardTtlMinutesinvariant (see point 1 above).IPaymentProvider-style interface for telegram notifications insrc/contracts/services.ts. The current direct import respects the boundary rule and keeps the diff surgical; a formal contract can be extracted when a second channel appears.Summary by CodeRabbit
Bug Fixes
New Features