feat: add health check retry logic and alert cooldown to reduce false…#468

Open
inderjeet20 wants to merge 1 commit into CCExtractor:main from inderjeet20:feat/health-check-retry-cooldown

@inderjeet20

🚀 Summary

Reduce false-positive backend health alerts by introducing a retry threshold and an alert cooldown.

🐛 Problem

Health alerts are currently triggered even when the backend is healthy. This appears to be caused by transient failures in the health check.

Additionally, alerting logic is not version-controlled and likely depends on an external VPS script, making it harder to maintain and debug.

✅ Solution

  • Added a versioned monitoring script: deployment/health-check.sh
  • Implemented retry threshold (e.g., alert only after 3 consecutive failures)
  • Added cooldown mechanism (e.g., suppress alerts for a fixed duration after triggering)
  • Introduced state persistence to track failures and alert timing
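To make the retry threshold, cooldown, and state persistence concrete, here is a minimal sketch of how the three could fit together. It is not the contents of deployment/health-check.sh: the function names, the state file path and format, and the default values (3 failures, 1800 seconds) are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only. Names, defaults, and the state file format are
# assumptions, not the actual contents of deployment/health-check.sh.

FAIL_THRESHOLD="${FAIL_THRESHOLD:-3}"          # consecutive failures before alerting
COOLDOWN_SECONDS="${COOLDOWN_SECONDS:-1800}"   # minimum gap between two alerts
STATE_FILE="${STATE_FILE:-/tmp/health-check.state}"  # hypothetical path

# Load "<consecutive_failures> <last_alert_epoch>" from the state file.
# Missing or empty state falls back to zero failures and no previous alert.
load_state() {
  failures=0
  last_alert=0
  if [ -r "$STATE_FILE" ]; then
    read -r failures last_alert < "$STATE_FILE"
  fi
  failures="${failures:-0}"
  last_alert="${last_alert:-0}"
}

# Persist the counters so the next run (e.g. the next cron tick) sees them.
save_state() {
  printf '%s %s\n' "$failures" "$last_alert" > "$STATE_FILE"
}

# record_result <ok|fail> <now_epoch>
# Prints "alert" when an alert should be sent, "quiet" otherwise.
record_result() {
  local result="$1" now="$2" decision="quiet"
  load_state
  if [ "$result" = "ok" ]; then
    failures=0   # any success resets the consecutive-failure counter
  else
    failures=$((failures + 1))
    if [ "$failures" -ge "$FAIL_THRESHOLD" ]; then
      # Alert only if no alert was sent yet or the cooldown has elapsed.
      if [ "$last_alert" -eq 0 ] || [ $((now - last_alert)) -ge "$COOLDOWN_SECONDS" ]; then
        decision="alert"
        last_alert="$now"
      fi
    fi
  fi
  save_state
  echo "$decision"
}

# Possible cron-driven usage (HEALTH_URL is a placeholder):
#   if curl -fsS --max-time 10 "$HEALTH_URL" >/dev/null; then r=ok; else r=fail; fi
#   [ "$(record_result "$r" "$(date +%s)")" = "alert" ] && echo "backend unhealthy"
```

Keeping the decision in a function that takes the check result and the current time as arguments means the threshold and cooldown behavior can be exercised without a live backend, which may help when aligning the script with the existing monitoring setup.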

🎯 Benefits

  • Reduces false-positive alerts
  • Prevents alert spam (alerts were observed repeating at roughly 35-minute intervals)
  • Makes monitoring logic version-controlled and maintainable
  • Improves the reliability of the alerting system

📝 Notes

  • This is a proposed improvement and may need alignment with the current monitoring setup
  • Open to feedback on approach or integration

🧪 Testing

  • Logic reviewed for retry and cooldown behavior
  • Health endpoint verified to return healthy during normal operation
  • Designed to handle intermittent failures without triggering unnecessary alerts

@github-actions

Thank you for opening this PR!

Before a maintainer takes a look, it would be really helpful if you could walk through your changes using GitHub's review tools.

Please take a moment to:

  • Check the "Files changed" tab
  • Leave comments on any lines for functions, comments, etc. that are important, non-obvious, or may need attention
  • Clarify decisions you made or areas you might be unsure about and/or any future updates being considered.
  • Finally, submit all the comments!

More information on how to conduct a self review:
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/reviewing-proposed-changes-in-a-pull-request

This helps make the review process smoother and gives us a clearer understanding of your thought process.

Once you've added your self-review, we'll continue from our side. Thank you!
