feat(code-review): Add CodeReviewRun model to track check run lifecycle#108445
feat(code-review): Add CodeReviewRun model to track check run lifecycle#108445
Conversation
Adds a CodeReviewRun database model to persist the state of each code review run from task enqueued through to Seer response, enabling visibility into stuck or failed checks. - New CodeReviewRun model with status lifecycle tracking (task_enqueued -> seer_request_sent -> seer_request_succeeded/failed) - Migration 1031 to create sentry_codereviewrun table - schedule_task() creates CodeReviewRun records and passes run_id to the Celery task for status updates - Celery task updates status at each stage; retryable errors only mark as failed on the final attempt - Also consolidates sentry_sdk tags + log_seer_request into a single _set_tags_and_log() function using Seer-consistent tag names - Scheduled cleanup task deletes records older than 90 days every 6h using batched bulk_delete_objects to avoid table locks Co-authored-by: Cursor <cursoragent@cursor.com>
…n conflict Co-authored-by: Cursor <cursoragent@cursor.com>
|
This PR has a migration; here is the generated SQL for for --
-- Create model CodeReviewRun
--
CREATE TABLE "sentry_codereviewrun" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "organization_id" bigint NOT NULL, "repository_id" bigint NOT NULL, "pull_request_number" integer NOT NULL, "commit_sha" varchar(64) NOT NULL, "github_delivery_id" varchar(64) NOT NULL, "status" varchar(32) NOT NULL, "seer_response_status" integer NULL, "error_message" text NULL, "date_added" timestamp with time zone DEFAULT (STATEMENT_TIMESTAMP()) NOT NULL, "date_updated" timestamp with time zone DEFAULT (STATEMENT_TIMESTAMP()) NOT NULL);
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_organization_id_3abec94a" ON "sentry_codereviewrun" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_repository_id_b785049c" ON "sentry_codereviewrun" ("repository_id");
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_github_delivery_id_e1ef8e8d" ON "sentry_codereviewrun" ("github_delivery_id");
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_github_delivery_id_e1ef8e8d_like" ON "sentry_codereviewrun" ("github_delivery_id" varchar_pattern_ops);
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_date_added_925ef389" ON "sentry_codereviewrun" ("date_added"); |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| "schedule": task_crontab("0", "*", "*", "*", "*"), | ||
| }, | ||
| "cleanup-old-code-review-runs": { | ||
| "task": "seer:sentry.seer.code_review.tasks.cleanup.cleanup_old_code_review_runs", |
There was a problem hiding this comment.
Cleanup task schedule has wrong namespace and name
High Severity
The schedule config references "seer:sentry.seer.code_review.tasks.cleanup.cleanup_old_code_review_runs" but the task is registered with namespace "seer.code_review" (not "seer") and name "sentry.seer.code_review.tasks.cleanup_old_code_review_runs" (without the extra .cleanup. segment). The scheduler splits on : and calls taskregistry.get_task(namespace, taskname), which will fail to resolve the task since both the namespace and task name are wrong. The cleanup job will never run, causing the sentry_codereviewrun table to grow without bound.
Additional Locations (1)
| update_fields["error_message"] = error_message | ||
| if seer_response_status is not None: | ||
| update_fields["seer_response_status"] = seer_response_status | ||
| CodeReviewRun.objects.filter(id=code_review_run_id).update(**update_fields) |
There was a problem hiding this comment.
date_updated never updated by QuerySet update
Medium Severity
_update_code_review_run uses CodeReviewRun.objects.filter(...).update(...) which is Django's QuerySet.update(). This bypasses the auto_now=True behavior on the date_updated field, so it will retain its creation-time value forever. The codebase provides an instance-level update() method (in sentry.db.models.query) that correctly handles auto_now fields, but it isn't used here. Since this model is designed for monitoring stuck runs, an inaccurate date_updated undermines that visibility.
| "task": "seer:sentry.seer.code_review.tasks.cleanup.cleanup_old_code_review_runs", | ||
| # Runs every 6 hours (at 00:00, 06:00, 12:00, 18:00 UTC) | ||
| "schedule": task_crontab("0", "*/6", "*", "*", "*"), | ||
| }, |
There was a problem hiding this comment.
Cleanup task module missing from TASKWORKER_IMPORTS
High Severity
The cleanup_old_code_review_runs task module (sentry.seer.code_review.tasks.cleanup) is not listed in TASKWORKER_IMPORTS. Taskworkers discover tasks by importing modules from that list at startup, so the @instrumented_task decorator on cleanup_old_code_review_runs never executes and the task is never registered. This is independent of the previously reported schedule namespace/name mismatch — both issues must be fixed for the cleanup job to run. Without it, the sentry_codereviewrun table grows unboundedly.
Additional Locations (1)
|
There's no mention in the PR about a follow up deleting the stuff we currently have for this (we can track the status creation and completion via Just confirming that removing that stuff is being tracked too? |


Summary
Adds a
CodeReviewRundatabase model to persist the state of each code review run, enabling visibility into stuck or failed GitHub check runs.Model (
sentry_codereviewruntable):organization_id,repository_id,pull_request_number,commit_sha,github_delivery_idtask_enqueued→seer_request_sent→seer_request_succeeded/seer_request_failedseer_response_status(HTTP code) anderror_messageon failureorganization_id,repository_id,github_delivery_id,date_addedfor efficient queryingPipeline wiring:
schedule_task()acceptspull_request_numberandgithub_delivery_id, creates aCodeReviewRunwithTASK_ENQUEUEDstatus, and passescode_review_run_idto the Celery taskpull_request.pyandissue_comment.pypass the new paramssentry_sdk.set_tags+log_seer_requestinto a single_set_tags_and_log()function using Seer-consistent tag names (scm_provider,scm_owner,pr_id, etc.)Retention cleanup:
cleanup_old_code_review_runsruns every 6 hoursdate_addedolder than 90 days usingbulk_delete_objects(10k-row batches) to avoid table locksTest plan
CreateCodeReviewRunTest- creates records, handles missing fields, survives DB errors gracefullyUpdateCodeReviewRunTest- updates status/error_message, no-op on None ID, survives DB errorsProcessGitHubWebhookEventRunTrackingTest- lifecycle: success, client error, retryable final, retryable mid-retry, no run_idCleanupOldCodeReviewRunsTest- deletes old rows, retains recent rowsMade with Cursor