Description
dnstap is really comprehensive as a DNS server monitoring solution.
Thanks to dnstap, it is really simple to obtain, for instance, response time data for queries and responses.
Because dnstap includes the query timestamp in response messages, the response time can be computed without keeping track of individual queries and responses; dnstap relies on the context information stored by the DNS server instead.
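To illustrate how little work that takes, here is a minimal sketch of the computation. The `ResponseMessage` class is a hypothetical stand-in for an already decoded dnstap response message; only its four timestamp fields mirror the names used in the dnstap schema (`query_time_sec`, `query_time_nsec`, `response_time_sec`, `response_time_nsec`). The function name is my own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseMessage:
    """Hypothetical stand-in for a decoded dnstap response message."""
    query_time_sec: Optional[int]   # None when the server had no query timestamp
    query_time_nsec: Optional[int]
    response_time_sec: int
    response_time_nsec: int


def response_time_ms(msg: ResponseMessage) -> Optional[float]:
    """Return the response time in milliseconds, or None if it cannot be computed."""
    if msg.query_time_sec is None:
        return None
    # Work in integer nanoseconds to avoid floating point rounding on large epochs.
    query_ns = msg.query_time_sec * 1_000_000_000 + (msg.query_time_nsec or 0)
    response_ns = msg.response_time_sec * 1_000_000_000 + msg.response_time_nsec
    return (response_ns - query_ns) / 1_000_000
```

No per-query state is needed on the consumer side: each response message carries everything required.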
However, there is a situation in which dnstap (in my opinion) falls short: timeouts due to packet loss or non-responsive servers are not reported through dnstap.
This means that, in order to obtain this data, the options are:
- Lame server logging, which varies a lot among implementations. Even within BIND 9, branches 9.16 and 9.18 behave very differently when logging timed-out or unreachable lame servers. I don't know about other DNS implementations; I think that Unbound doesn't log that kind of error, but I haven't checked seriously.
- Dnstap in its present state. A separate program could keep track of RQ and RR dnstap messages and generate a new type of event (timeout) for "missing" RR messages. Apart from the added complexity, some information would be lost. For example, it wouldn't be possible to know what actually happened: a timeout, a network unreachable error or something else.
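To show the kind of state the second option requires, here is a rough sketch of such a tracker. Everything in it is an assumption for illustration: the `TimeoutTracker` class, the choice of key (server address, port, DNS message ID and query name) and the 5 second default are hypothetical, and decoding the dnstap stream is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical correlation key: (server address, server port, DNS message ID, query name)
Key = Tuple[str, int, int, str]


class TimeoutTracker:
    """Correlates RQ and RR messages and reports queries that never got an answer."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s          # assumed value, not taken from any server
        self.pending: Dict[Key, float] = {}

    def on_query(self, key: Key, now: float) -> None:
        # Remember when the RQ message was seen.
        self.pending[key] = now

    def on_response(self, key: Key) -> None:
        # A matching RR message arrived, so the query is no longer pending.
        self.pending.pop(key, None)

    def expire(self, now: float) -> List[Key]:
        # Everything still pending after timeout_s is reported as a "timeout".
        expired = [k for k, seen in self.pending.items() if now - seen >= self.timeout_s]
        for k in expired:
            del self.pending[k]
        return expired
```

Note that `expire` can only say that no RR message was seen. It cannot tell a real timeout from a network unreachable error, and its timeout value is a guess that may not match the one the DNS server actually used.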
The second option doesn't look so good. Moreover, dnstap seems to be designed from the ground up to avoid a situation like that: response messages benefit from the DNS server software being aware of the query state, and they include the query timestamp when available.
Out-of-band messages of this kind would break one aspect of dnstap, namely that it tries to behave as closely as possible to a packet capture on steroids, but they would (in my opinion) greatly improve it.
At least in the situation I am describing, detecting certain errors when trying to query another DNS server, I guess the performance impact would be negligible, and all of the state information needed is already in place.
What do you think?