Useless alerting #667

@jchristgit

Our alerting is, once again, useless.

  • NGFHighUpstreamTTFB: cries constantly. It also says it's "not an issue with nginx". What should we do about this? Is anyone investigating it? Is it an actual issue we need to investigate?

  • django/errors: We had an actual issue here, but this alarm is way too trigger-happy: it keeps firing and then self-resolving. We should possibly increase the time window.

  • NGFHighRequestLatency

    What is even going on here?

    P99 request latency is 63.33ms on www.pythondiscord.com. Threshold is 3s. (Alert suppressed on hosts with fewer than 10 req/5m.)

    OK - so it's fine? What is this telling me?

These are the main offenders right now. As currently configured, they are equivalent to a hospital alerting 50 nurses every time someone's pulse goes over 70. Please give me actionable alerts.
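
To make "actionable" a bit more concrete, here is a minimal sketch of the kind of condition I have in mind, assuming the alerts are evaluated periodically against a threshold. The 3s threshold and the 10-request minimum come from the NGFHighRequestLatency message above; the `Sample` type, the function names and the `hold_for=3` value are hypothetical and only there for illustration. The idea is the same as a `for:` clause on a Prometheus alerting rule: the condition has to hold across several consecutive evaluations before anyone gets paged.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One evaluation of the alert condition (hypothetical shape)."""
    p99_seconds: float  # P99 request latency observed in this window
    requests: int       # number of requests seen in the same window


def breaches(sample, threshold_seconds=3.0, min_requests=10):
    """True only when there is enough traffic AND latency is over the threshold."""
    return sample.requests >= min_requests and sample.p99_seconds > threshold_seconds


def should_fire(samples, hold_for=3, **kwargs):
    """Fire only if the last `hold_for` consecutive samples all breach.

    `hold_for` is an assumed value; a larger one trades detection speed
    for fewer alerts that resolve themselves a minute later.
    """
    if len(samples) < hold_for:
        return False
    return all(breaches(s, **kwargs) for s in samples[-hold_for:])
```

With this, the 63.33ms reading from the alert text (`Sample(0.06333, 500)`) does not breach at all, and a series that flaps between bad and good readings never fires, while three bad readings in a row do.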

Labels: component: monitoring (an issue relating to a monitoring component, e.g. Prometheus, Grafana)

Status: Up next
