Refactor ping check for clarity and error handling #2836
Conversation
@jstanton617 Could you please review this PR when you have a chance?
masafm left a comment
Have you reviewed or updated the test cases to align with this change?
ping/datadog_checks/ping/ping.py
Outdated
# Real execution error (e.g. missing binary, permission, DNS failure)
# The service check is marked CRITICAL, and the check itself raises an error.
self.log.info("%s check error (%s)", host, e)
self.service_check(self.SERVICE_CHECK_NAME, AgentCheck.CRITICAL, custom_tags, message=str(e))
This isn’t exactly a correction, but wouldn’t AgentCheck.UNKNOWN be more appropriate here than AgentCheck.CRITICAL?
This pull request has not been updated for more than 21 days. If there are no updates to this PR within 7 days, it will be closed. If you'd like to re-open this PR after it's been closed, you can start from the latest master branch or pull the latest changes into your branch and create a new pull request.
This pull request was not updated after an additional 7 days of no activity. If you would like to continue work on this PR, please re-open this PR or create a fresh branch off of the latest master branch.
What does this PR do?
Previously, when a ping command returned a non-zero exit code, the check raised a CheckException, causing the entire check run to fail. This behavior made the Agent mark the check as ERROR, even for normal network conditions such as a host being temporarily unreachable.
This PR improves the robustness and cross-platform behavior of the PingCheck integration by refining how ping failures are interpreted.
Specifically, it adds logic to distinguish between network-level unreachable errors and name resolution / invalid address errors on platforms where ping returns the same exit code (especially Windows).
Key changes include:
• Adds detection of name resolution and invalid address errors even when ping returns exit code 1.
• Raises a CheckException for these execution-level errors to correctly surface DNS and malformed address issues.
• Preserves exit-code-1 handling for genuine network unreachable cases by returning a structured "unreachable" status.
• Ensures consistent behavior across Linux, macOS, and Windows ping implementations.
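The classification described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `classify_ping_result`, the exception `PingExecutionError` (a stand-in for `CheckException`), and the list of error strings are all assumptions, and the strings are merely typical messages printed by common ping implementations.

```python
import re

# Hypothetical stand-in for the check's CheckException.
class PingExecutionError(Exception):
    pass

# Illustrative (assumed) patterns for name resolution / invalid address errors.
# The strings actually matched by the PR are not shown in this description.
_RESOLUTION_ERROR_PATTERNS = [
    r"could not find host",                   # typical Windows wording
    r"name or service not known",             # typical Linux wording
    r"cannot resolve",                        # typical macOS wording
    r"unknown host",
    r"temporary failure in name resolution",
]

def classify_ping_result(returncode, output):
    """Map a ping exit code and its combined output to a status string.

    Returns "reachable" or "unreachable"; raises PingExecutionError for
    execution-level problems such as DNS failures or malformed addresses.
    """
    if returncode == 0:
        return "reachable"

    lowered = output.lower()
    # Check the output first, because some platforms report resolution
    # failures with the same exit code as an unreachable host.
    if any(re.search(pattern, lowered) for pattern in _RESOLUTION_ERROR_PATTERNS):
        raise PingExecutionError("ping could not resolve or parse the address: " + output.strip())

    if returncode == 1:
        # The host simply did not answer; report it without failing the check run.
        return "unreachable"

    # Any other exit code is treated as an execution error in this sketch.
    raise PingExecutionError("ping exited with code %d: %s" % (returncode, output.strip()))
```

In a sketch like this, the caller would translate "unreachable" into a service check status and let the exception propagate only for genuine execution errors, so that a host being down no longer fails the whole check run.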