-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[postgres] Skip buffercache metrics on Aurora PostgreSQL 17+ to prevent crash #22206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[postgres] Skip buffercache metrics on Aurora PostgreSQL 17+ to prevent crash #22206
Conversation
…nt crash Aurora PostgreSQL 17+ crashes with Bus Error (signal 7) when querying pg_buffercache. This adds a guard clause to skip collect_buffercache_metrics on Aurora PostgreSQL 17+, with a warning log for visibility. Fixes: DataDog#21633
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # Simulate Aurora PostgreSQL 17+ | ||
| check.is_aurora = True | ||
| check.version = V17 | ||
|
|
||
| check.run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constrain Aurora 17 skip test to matching server versions
test_buffercache_metrics_skipped_on_aurora_17 sets check.is_aurora = True and check.version = V17 before check.run(), but PostgreSql.load_version (postgres/datadog_checks/postgres/postgres.py:561) overwrites self.version with the real server version during check initialization. On any CI run targeting Postgres ≤16 (as configured in postgres/hatch.toml), the guard self.version >= V17 never triggers, buffercache metrics are collected, and the subsequent assert_metric(..., count=0) expectations fail. The test should be limited to Postgres 17+ or stub out version loading so it does not depend on the runtime version.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files🚀 New features to boost your workflow:
|
The test was failing on Postgres 10-16 because load_version() overwrites check.version during check.run(). Changed @requires_over_10 to @requires_over_17 so the test only runs where the actual version satisfies the skip condition.
8030b76 to
6af41fd
Compare
|
@DataDog/database-monitoring-agent Hi team! 👋 This PR is ready for review. Here's a quick summary: Issue: Fix:
Changes:
All CI checks are now passing ✅ Thanks for your time! |
Aurora PostgreSQL 17+ crashes with Bus Error (signal 7) when querying pg_buffercache. This adds a guard clause to skip collect_buffercache_metrics on Aurora PostgreSQL 17+, with a warning log for visibility.
Fixes: #21633
What does this PR do?
Adds a guard clause to skip
collect_buffercache_metricson Aurora PostgreSQL 17+ to prevent database instance crash.Changes in
postgres/datadog_checks/postgres/postgres.py:self.is_aurora and self.version >= V17) before addingBUFFERCACHE_METRICSquery>= V17to cover potential future versions with the same issueTests added in
postgres/tests/test_pg_integration.py:test_buffercache_metrics_skipped_on_aurora_17: Verifies buffercache metrics are NOT collected on Aurora PostgreSQL 17+test_buffercache_metrics_collected_on_non_aurora_17: Verifies buffercache metrics ARE still collected on non-Aurora PostgreSQLMotivation
Aurora PostgreSQL 17 crashes with a Bus Error (signal 7) when the Datadog Agent queries
pg_buffercache. This appears to be an Aurora-specific issue with the buffer cache extension in version 17.Error from AWS logs (reported in #21633):
This is a workaround until AWS resolves the underlying Aurora issue.
Review checklist (to be filled by reviewers)
qa/skip-qalabel if the PR doesn't need to be tested during QA.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged