Skip to content

feat(code-review): Add CodeReviewRun model to track check run lifecycle#108445

Closed
armenzg wants to merge 2 commits intomasterfrom
code-review-run-model
Closed

feat(code-review): Add CodeReviewRun model to track check run lifecycle#108445
armenzg wants to merge 2 commits intomasterfrom
code-review-run-model

Conversation

@armenzg
Copy link
Copy Markdown
Member

@armenzg armenzg commented Feb 18, 2026

Summary

Adds a CodeReviewRun database model to persist the state of each code review run, enabling visibility into stuck or failed GitHub check runs.

Model (sentry_codereviewrun table):

  • Tracks organization_id, repository_id, pull_request_number, commit_sha, github_delivery_id
  • Status lifecycle: task_enqueuedseer_request_sentseer_request_succeeded / seer_request_failed
  • Records seer_response_status (HTTP code) and error_message on failure
  • Indexed on organization_id, repository_id, github_delivery_id, date_added for efficient querying

Pipeline wiring:

  • schedule_task() accepts pull_request_number and github_delivery_id, creates a CodeReviewRun with TASK_ENQUEUED status, and passes code_review_run_id to the Celery task
  • pull_request.py and issue_comment.py pass the new params
  • Celery task updates status at each stage; retryable errors only mark as failed on the final attempt (not mid-retry)
  • Also consolidates sentry_sdk.set_tags + log_seer_request into a single _set_tags_and_log() function using Seer-consistent tag names (scm_provider, scm_owner, pr_id, etc.)

Retention cleanup:

  • New periodic task cleanup_old_code_review_runs runs every 6 hours
  • Deletes rows with date_added older than 90 days using bulk_delete_objects (10k-row batches) to avoid table locks

Test plan

  • CreateCodeReviewRunTest - creates records, handles missing fields, survives DB errors gracefully
  • UpdateCodeReviewRunTest - updates status/error_message, no-op on None ID, survives DB errors
  • ProcessGitHubWebhookEventRunTrackingTest - lifecycle: success, client error, retryable final, retryable mid-retry, no run_id
  • CleanupOldCodeReviewRunsTest - deletes old rows, retains recent rows
  • All 159 existing code review tests pass

Made with Cursor

armenzg and others added 2 commits February 18, 2026 12:16
Adds a CodeReviewRun database model to persist the state of each code
review run from task enqueued through to Seer response, enabling
visibility into stuck or failed checks.

- New CodeReviewRun model with status lifecycle tracking
  (task_enqueued -> seer_request_sent -> seer_request_succeeded/failed)
- Migration 1031 to create sentry_codereviewrun table
- schedule_task() creates CodeReviewRun records and passes run_id to
  the Celery task for status updates
- Celery task updates status at each stage; retryable errors only mark
  as failed on the final attempt
- Also consolidates sentry_sdk tags + log_seer_request into a single
  _set_tags_and_log() function using Seer-consistent tag names
- Scheduled cleanup task deletes records older than 90 days every 6h
  using batched bulk_delete_objects to avoid table locks

Co-authored-by: Cursor <cursoragent@cursor.com>
…n conflict

Co-authored-by: Cursor <cursoragent@cursor.com>
@armenzg armenzg requested review from a team as code owners February 18, 2026 17:19
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 18, 2026
@github-actions
Copy link
Copy Markdown
Contributor

This PR has a migration; here is the generated SQL for src/sentry/migrations/1031_add_codereviewrun_table.py

for 1031_add_codereviewrun_table in sentry

--
-- Create model CodeReviewRun
--
CREATE TABLE "sentry_codereviewrun" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "organization_id" bigint NOT NULL, "repository_id" bigint NOT NULL, "pull_request_number" integer NOT NULL, "commit_sha" varchar(64) NOT NULL, "github_delivery_id" varchar(64) NOT NULL, "status" varchar(32) NOT NULL, "seer_response_status" integer NULL, "error_message" text NULL, "date_added" timestamp with time zone DEFAULT (STATEMENT_TIMESTAMP()) NOT NULL, "date_updated" timestamp with time zone DEFAULT (STATEMENT_TIMESTAMP()) NOT NULL);
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_organization_id_3abec94a" ON "sentry_codereviewrun" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_repository_id_b785049c" ON "sentry_codereviewrun" ("repository_id");
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_github_delivery_id_e1ef8e8d" ON "sentry_codereviewrun" ("github_delivery_id");
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_github_delivery_id_e1ef8e8d_like" ON "sentry_codereviewrun" ("github_delivery_id" varchar_pattern_ops);
CREATE INDEX CONCURRENTLY "sentry_codereviewrun_date_added_925ef389" ON "sentry_codereviewrun" ("date_added");

Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment thread src/sentry/conf/server.py
"schedule": task_crontab("0", "*", "*", "*", "*"),
},
"cleanup-old-code-review-runs": {
"task": "seer:sentry.seer.code_review.tasks.cleanup.cleanup_old_code_review_runs",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup task schedule has wrong namespace and name

High Severity

The schedule config references "seer:sentry.seer.code_review.tasks.cleanup.cleanup_old_code_review_runs" but the task is registered with namespace "seer.code_review" (not "seer") and name "sentry.seer.code_review.tasks.cleanup_old_code_review_runs" (without the extra .cleanup. segment). The scheduler splits on : and calls taskregistry.get_task(namespace, taskname), which will fail to resolve the task since both the namespace and task name are wrong. The cleanup job will never run, causing the sentry_codereviewrun table to grow without bound.

Additional Locations (1)

Fix in Cursor Fix in Web

update_fields["error_message"] = error_message
if seer_response_status is not None:
update_fields["seer_response_status"] = seer_response_status
CodeReviewRun.objects.filter(id=code_review_run_id).update(**update_fields)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

date_updated never updated by QuerySet update

Medium Severity

_update_code_review_run uses CodeReviewRun.objects.filter(...).update(...) which is Django's QuerySet.update(). This bypasses the auto_now=True behavior on the date_updated field, so it will retain its creation-time value forever. The codebase provides an instance-level update() method (in sentry.db.models.query) that correctly handles auto_now fields, but it isn't used here. Since this model is designed for monitoring stuck runs, an inaccurate date_updated undermines that visibility.

Fix in Cursor Fix in Web

Comment thread src/sentry/conf/server.py
"task": "seer:sentry.seer.code_review.tasks.cleanup.cleanup_old_code_review_runs",
# Runs every 6 hours (at 00:00, 06:00, 12:00, 18:00 UTC)
"schedule": task_crontab("0", "*/6", "*", "*", "*"),
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup task module missing from TASKWORKER_IMPORTS

High Severity

The cleanup_old_code_review_runs task module (sentry.seer.code_review.tasks.cleanup) is not listed in TASKWORKER_IMPORTS. Taskworkers discover tasks by importing modules from that list at startup, so the @instrumented_task decorator on cleanup_old_code_review_runs never executes and the task is never registered. This is independent of the previously reported schedule namespace/name mismatch — both issues must be fixed for the cleanup job to run. Without it, the sentry_codereviewrun table grows unboundedly.

Additional Locations (1)

Fix in Cursor Fix in Web

@giovanni-guidini
Copy link
Copy Markdown
Contributor

There's no mention in the PR about a follow up deleting the stuff we currently have for this (we can track the status creation and completion via run_state.value and we do have a scheduled task that marks runs as "timedout" and has a side effect of also marking the status timed out.

Just confirming that removing that stuff is being tracked too?

@armenzg armenzg closed this Feb 19, 2026
@armenzg
Copy link
Copy Markdown
Member Author

armenzg commented Feb 19, 2026

I see @vaind building something similar here: #108531

@armenzg armenzg deleted the code-review-run-model branch February 19, 2026 17:14
@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants