improvement(logs): move per-block progress markers to Redis to cut write amplification #5248
…ite amplification
Per-block lastStartedBlock/lastCompletedBlock markers were persisted via a
jsonb_set UPDATE on workflow_execution_logs on every block start and complete
(~2N UPDATEs per run) — the heaviest write query in the DB. These are live
progress breadcrumbs with no DB-polling consumer (live progress comes from the
executor over WebSocket); their only durable value is a breadcrumb folded into
the final record.
Behind the redis-progress-markers flag, markers now live in Redis during the run
and are folded into the single terminal UPDATE at completion, dropping per-run
row UPDATEs from ~2N+1 to 1.
- New progress-markers module: HASH execution:progress:{id}, atomic Lua
monotonic-guard writes preserving the existing <= ordering, reservation-aligned
TTL backstop, graceful no-op when Redis is unavailable
- Deterministic GC: cleared at every terminal/pause boundary; TTL covers crashes
- Flag resolved once per logging session so a run never mixes write paths
- Fold markers into the completion record (Redis wins, falls back to row markers)
- Merge live markers for in-flight detail reads
- Extract shared getExecutionReservationTtlMs so marker and admission-slot TTLs
share one source of truth
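The monotonic-guard write described above can be modeled in memory as a minimal sketch. The marker shape (`blockId`, `timestampMs`), the function names, and the `Map` standing in for Redis are assumptions made for illustration; only the key layout `execution:progress:{id}` and the `<=` ordering come from the description.

```typescript
// Hypothetical marker shape; the real fields are not shown in this PR.
interface ProgressMarker {
  blockId: string
  timestampMs: number
}

type MarkerField = 'lastStartedBlock' | 'lastCompletedBlock'
type MarkerHash = Partial<Record<MarkerField, ProgressMarker>>

// Key layout taken from the description: execution:progress:{id}
function progressKey(executionId: string): string {
  return `execution:progress:${executionId}`
}

// In-memory stand-in for Redis: one hash per execution key.
const store = new Map<string, MarkerHash>()

// Applies the write only when the stored marker is not newer than the
// incoming one (existing <= incoming). Returns true when the write landed.
function writeMarker(
  executionId: string,
  field: MarkerField,
  incoming: ProgressMarker
): boolean {
  const key = progressKey(executionId)
  const hash = store.get(key) ?? {}
  const existing = hash[field]
  if (existing && existing.timestampMs > incoming.timestampMs) {
    return false // stale write: an out-of-order block event must not win
  }
  hash[field] = incoming
  store.set(key, hash)
  return true
}
```

In real Redis the read-compare-write has to be atomic, which is why the description puts the guard in a Lua script rather than in application code.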
PR Summary (Cursor Bugbot): Medium Risk. Reviewed by Cursor Bugbot for commit 6669170.
Greptile Summary: This PR moves live execution progress markers from per-block database updates into Redis.
Confidence Score: 5/5. This looks safe to merge.
Reviews (6): Last reviewed commit: "perf(logs): skip completion Redis read/c..."
…n force-fail, validate marker shape
Addresses review feedback on the redis-progress-markers PR:
- persistLast* now falls back to the jsonb_set UPDATE when Redis is unavailable or the write fails (setLast* returns whether it persisted), so a marker is never dropped when the flag is on without a healthy Redis.
- markExecutionAsFailed folds live Redis markers into execution_data before clearing, so the last-started/last-completed breadcrumb survives the force-fail path.
- getProgressMarkers validates marker shape (rebuilds from typed fields), so a stale or wrong-shaped Redis value can never reach API consumers.
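A minimal sketch of the fallback behavior, assuming the Redis write reports whether it persisted. `MarkerSinks`, `persistMarker`, and the marker shape are hypothetical names for illustration and are not the PR's actual signatures.

```typescript
type Marker = { blockId: string; timestampMs: number }

interface MarkerSinks {
  // Resolves true only when the marker was actually persisted to Redis.
  writeToRedis: (executionId: string, marker: Marker) => Promise<boolean>
  // Stand-in for the existing jsonb_set UPDATE path.
  writeToSql: (executionId: string, marker: Marker) => Promise<void>
}

// Tries Redis first; falls back to SQL when Redis reports "not persisted"
// or throws, so the marker is never silently dropped.
async function persistMarker(
  sinks: MarkerSinks,
  executionId: string,
  marker: Marker
): Promise<'redis' | 'sql'> {
  let persisted = false
  try {
    persisted = await sinks.writeToRedis(executionId, marker)
  } catch {
    persisted = false
  }
  if (persisted) return 'redis'
  await sinks.writeToSql(executionId, marker)
  return 'sql'
}
```

Returning the path taken is a choice made here for testability; the point is only that a failed Redis write is followed by exactly one SQL write.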
@greptile review
@cursor review
getProgressMarkers now returns null on a Redis read error (vs {} for genuinely empty). completeWorkflowExecution and markExecutionAsFailed skip clearProgressMarkers when the read returns null, so a transient read error at completion no longer wipes markers that are still durably in Redis — the TTL reclaims them instead.
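The distinction between a failed read and an empty hash can be sketched as a small decision function. The `planCompletion` name and the types are assumptions for illustration; only the `null` versus `{}` contract comes from the commit message.

```typescript
type Marker = { blockId: string; timestampMs: number }
type Markers = { lastStartedBlock?: Marker; lastCompletedBlock?: Marker }

// null = the Redis read failed; {} = the read succeeded and the hash is empty.
type ReadResult = Markers | null

interface CompletionPlan {
  markersToFold: Markers
  clearRedisKey: boolean
}

// Decides what the completion path should do with the read result.
function planCompletion(read: ReadResult): CompletionPlan {
  if (read === null) {
    // Read error: fold nothing from Redis and leave the key alone,
    // so the TTL reclaims it instead of an explicit clear.
    return { markersToFold: {}, clearRedisKey: false }
  }
  // Successful read (possibly empty): fold what we got, then clear.
  return { markersToFold: read, clearRedisKey: true }
}
```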
@greptile review
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 08f54ef. Configure here.
@greptile review
@cursor review
…+ drain on force-fail
- When a Redis marker write falls back to SQL, Redis and the row can each hold a marker for a different block; reads/folds previously preferred Redis unconditionally and could pick a stale value. Now the completion fold, the in-flight detail read, and the force-fail fold all pick the marker with the later timestamp (pickLatestStartedMarker/pickLatestCompletedMarker; markExecutionAsFailed uses a monotonic SQL guard).
- markAsFailed now drains pending per-block marker writes (not just the completion promise) before folding, so a force-fail racing onBlockStart/onBlockComplete still captures the latest breadcrumb.
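A minimal sketch of the later-timestamp selection, assuming markers carry a numeric `timestampMs`. `pickLatest` is a hypothetical stand-in for pickLatestStartedMarker/pickLatestCompletedMarker, whose real signatures are not shown here, and the tie-break toward Redis is also an assumption.

```typescript
type Marker = { blockId: string; timestampMs: number }

// Returns whichever marker is newer. When only one source has a marker,
// that one wins; on an exact tie this sketch keeps the Redis value.
function pickLatest(redisMarker?: Marker, rowMarker?: Marker): Marker | undefined {
  if (!redisMarker) return rowMarker
  if (!rowMarker) return redisMarker
  return rowMarker.timestampMs > redisMarker.timestampMs ? rowMarker : redisMarker
}
```

For example, if block A was written to Redis at t=100 and a later write for block B fell back to SQL at t=150, the fold picks B from the row instead of the stale Redis value.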
@greptile review
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 6c4c289. Configure here.
Guard the monotonic-check index with type(decoded) == 'table' so a corrupted Redis field that decodes to a non-table (e.g. a number) can't error the eval; our write path only ever stores JSON objects, so this is defense-in-depth.
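A sketch of what such a guarded script could look like, embedded as a TypeScript string, with a TypeScript mirror of the same check so the logic can be exercised without Redis. The field name `timestampMs`, the argument order, and both identifiers are assumptions; this is not the PR's actual script.

```typescript
// KEYS[1] = hash key; ARGV = [field, json, incomingTimestampMs, ttlMs] (assumed order).
const GUARDED_WRITE_LUA = `
local existing = redis.call('HGET', KEYS[1], ARGV[1])
if existing then
  local ok, decoded = pcall(cjson.decode, existing)
  -- Only index decoded when it is a table; a number or string would raise.
  if ok and type(decoded) == 'table'
     and type(decoded.timestampMs) == 'number'
     and decoded.timestampMs > tonumber(ARGV[3]) then
    return 0
  end
end
redis.call('HSET', KEYS[1], ARGV[1], ARGV[2])
redis.call('PEXPIRE', KEYS[1], ARGV[4])
return 1
`

// TypeScript mirror of the guard: true means the existing value is a
// well-formed, strictly newer marker and the incoming write must be skipped.
function guardBlocksWrite(rawExisting: string | null, incomingTimestampMs: number): boolean {
  if (rawExisting === null) return false
  let decoded: unknown
  try {
    decoded = JSON.parse(rawExisting)
  } catch {
    return false // undecodable value: let the new write replace it
  }
  if (typeof decoded !== 'object' || decoded === null) return false
  const ts = (decoded as { timestampMs?: unknown }).timestampMs
  return typeof ts === 'number' && ts > incomingTimestampMs
}
```

In this sketch a corrupted field is treated as absent, so the next valid write overwrites it rather than the eval failing.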
completeWorkflowExecution now takes readProgressMarkers (the session's resolved marker mode); when the flag is off it skips the per-completion HGETALL+DEL entirely instead of probing a key that was never written. Sticky to the session so it stays flip-safe (an execution that wrote to Redis always folds+clears Redis). Non-session callers default to true (safe read-and-fold). Also hardened the Lua guard with type(decoded)=='table'.
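The sticky, resolve-once behavior can be sketched as follows. `LoggingSession`, `markerMode`, and `shouldReadProgressMarkers` are hypothetical names used for illustration; the fallback to SQL when flag resolution fails follows the testing notes in the PR description.

```typescript
type MarkerMode = 'redis' | 'sql'

class LoggingSession {
  private mode: MarkerMode | undefined
  private readonly isFlagEnabled: () => boolean

  constructor(isFlagEnabled: () => boolean) {
    this.isFlagEnabled = isFlagEnabled
  }

  // Resolves the flag on first use and caches it, so a mid-run flag flip
  // cannot make one execution mix the Redis and SQL write paths.
  markerMode(): MarkerMode {
    if (this.mode === undefined) {
      let enabled = false
      try {
        enabled = this.isFlagEnabled()
      } catch {
        enabled = false // flag-resolution failure falls back to SQL
      }
      this.mode = enabled ? 'redis' : 'sql'
    }
    return this.mode
  }

  // Value a session would pass as readProgressMarkers at completion:
  // skip the Redis read and clear when this run never wrote to Redis.
  shouldReadProgressMarkers(): boolean {
    return this.markerMode() === 'redis'
  }
}
```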
@greptile review
@cursor review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 6669170. Configure here.
Summary
- Per-block lastStartedBlock/lastCompletedBlock progress markers were persisted with a jsonb_set UPDATE on workflow_execution_logs on every block start and complete (~2N row UPDATEs per run), the heaviest write query on the table by volume.
- Behind the redis-progress-markers flag, markers now live in Redis during the run and are folded into the single terminal UPDATE at completion, dropping per-run row UPDATEs from ~2N+1 to 1.
How it works
- progress-markers.ts: one Redis HASH per execution (execution:progress:{id}), atomic Lua monotonic-guard writes that preserve the existing <= ordering, a reservation-aligned TTL backstop, and graceful no-op when Redis is unavailable.
- … completeWorkflowExecution, plus markExecutionAsFailed); the TTL only covers crashed runs. Paused executions hold zero Redis keys.
- Extracts a shared getExecutionReservationTtlMs() so the marker TTL and the admission-slot TTL share one source of truth.
Rollout
- … jsonb_set path unchanged). Enable to cut the write amplification; flip off for instant rollback.
Type of Change
Testing
- … jsonb_set, flag-resolution failure falls back to SQL, and markExecutionAsFailed clears markers.
- tsc clean, biome clean.
Checklist