Skip to content

Conversation

@brendan-kellam
Copy link
Contributor

@brendan-kellam brendan-kellam commented Nov 11, 2025

This PR resolves a couple of issues:

  • During termination, we call worker.close() to terminate any in progress jobs. This takes a gracefulTimeout param (defaults 30 seconds). If there are any jobs still in progress after that duration, there is a bug in groupmq where the job's group lock will not be released (see [bug] Group lock is not released after graceful shutdown, resulting in stalling jobs Openpanel-dev/groupmq#8). This resulted in jobs (especially long running connection jobs) being left in PENDING or IN_PROGRESS indefinitely, and new jobs not being queued until the timeout expires. The current workaround is to explicitly free the locks when calling dispose.
  • Improved the graceful shutdown handling in index.ts

Summary by CodeRabbit

Release Notes

Bug Fixes

  • Implemented timeout handling for background jobs; timed-out jobs are now properly marked as failed with status updates and metrics.
  • Added job state validation to prevent execution of jobs in invalid states.

Improvements

  • Enhanced graceful shutdown process with comprehensive error handling and error tracking integration.
  • Improved lock management during shutdown to prevent deadlocks.
  • Extended process signal handling for more reliable termination.

@coderabbitai
Copy link

coderabbitai bot commented Nov 11, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces graceful shutdown handling for job queues, implements job state validation before execution, enriches job payloads with metadata, adds a reusable signal listener for shutdown orchestration, and updates method signatures to pass AbortSignal instead of AbortController instances.

Changes

Cohort / File(s) Summary
Graceful timeout and job validation
packages/backend/src/connectionManager.ts, packages/backend/src/repoIndexManager.ts
Added graceful-timeout event handlers that mark timed-out jobs as FAILED with status updates and metric emissions. Implemented job state validation in runJob to fetch and verify job status is PENDING or IN_PROGRESS before execution. Added manual Redis lock cleanup during shutdown to release group locks for in-progress jobs. Extended constructors to store Redis instance as private field for lock management.
Shutdown signal handling
packages/backend/src/index.ts
Refactored cleanup flow into a new listenToShutdownSignals function that sequentially disposes managers, disconnects services, and handles errors with Sentry. Replaced hard-coded cleanup with signal handler registration loop over SHUTDOWN_SIGNALS and added uncaught exception/rejection handlers.
Shutdown constants
packages/backend/src/constants.ts
Added GROUPMQ_WORKER_STOP_GRACEFUL_TIMEOUT_MS (5 seconds) for worker shutdown timeout and SHUTDOWN_SIGNALS array listing Node.js termination signals.
Parameter type refinement
packages/backend/src/repoCompileUtils.ts
Changed compileGithubConfig signature to accept signal: AbortSignal instead of abortController: AbortController, forwarding the signal to getGitHubReposFromConfig.

Sequence Diagram(s)

sequenceDiagram
    participant Worker
    participant JobQueue
    participant DB as Prisma/Redis
    participant Handler

    rect rgb(200, 220, 255)
    note over Worker: Job Lifecycle with Graceful Timeout
    end

    Worker->>Worker: runJob called
    Worker->>DB: Fetch current job status
    DB-->>Worker: Job status
    
    alt Status is PENDING or IN_PROGRESS
        Worker->>Worker: Execute job
        alt Normal Completion
            Worker->>Handler: Job success
        else Graceful Timeout
            Handler->>Worker: onJobGracefulTimeout
            Worker->>DB: Mark job as FAILED
            Worker->>Handler: Update metrics & log
        end
    else Status is other
        Worker->>Handler: Throw error (skip invalid state)
    end
Loading
sequenceDiagram
    participant System
    participant Index as index.ts
    participant Managers as ConnectionManager<br/>RepoIndexManager
    participant Services as Prisma<br/>Redis<br/>API<br/>Posthog
    participant Signals

    rect rgb(255, 220, 200)
    note over System: Graceful Shutdown Flow
    end

    Signals->>Index: SIGTERM/SIGINT/etc.
    Index->>Index: listenToShutdownSignals
    
    Index->>Managers: dispose() each manager
    
    par Manager Cleanup
        Managers->>Managers: Worker.gracefulStop(timeout)
        Managers->>Services: Delete Redis lock keys
        Managers->>Managers: Log lock release
    end
    
    Index->>Services: Disconnect each service
    Services-->>Index: Disconnected
    
    Index->>Index: Exit gracefully
    Index->>Signals: Remove handlers
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • Lock cleanup logic in connectionManager.ts and repoIndexManager.ts: Manual Redis key deletion during disposal—verify error handling is robust and doesn't prevent shutdown
  • Job state validation logic: Ensure the early-exit pattern for invalid job states doesn't suppress legitimate errors or cause jobs to be lost
  • Signal handler guard mechanism: Verify the guard against repeated signal processing works correctly and doesn't prevent graceful shutdown if triggered multiple times
  • Payload enrichment impact: Confirm job payload expansion (jobId, connectionId, connectionName, orgId) doesn't break existing job consumers or queue serialization
  • Parameter type change in repoCompileUtils.ts: Ensure all call sites pass AbortSignal correctly and the change is backward-compatible with consumers

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: implementing graceful shutdown fixes for workers, which is the core objective of the pull request.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

This comment has been minimized.

@brendan-kellam
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18fad64 and d43f35b.

📒 Files selected for processing (5)
  • packages/backend/src/connectionManager.ts (9 hunks)
  • packages/backend/src/constants.ts (1 hunks)
  • packages/backend/src/index.ts (3 hunks)
  • packages/backend/src/repoCompileUtils.ts (1 hunks)
  • packages/backend/src/repoIndexManager.ts (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*

📄 CodeRabbit inference engine (.cursor/rules/style.mdc)

Filenames should always be camelCase. Exception: if there are filenames in the same directory with a format other than camelCase, use that format to keep things consistent.

Files:

  • packages/backend/src/constants.ts
  • packages/backend/src/repoCompileUtils.ts
  • packages/backend/src/index.ts
  • packages/backend/src/repoIndexManager.ts
  • packages/backend/src/connectionManager.ts

@brendan-kellam brendan-kellam merged commit 903d15a into main Nov 12, 2025
9 checks passed
@brendan-kellam brendan-kellam deleted the bkellam/groupmq_workaround branch November 12, 2025 04:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants