First, this is what I saw in my Grafana instance the other day, which, as you can imagine, caught my attention:
To give you a bit of context: what you're seeing is a setup where I run
ping_exporter, have Prometheus scrape it, and have a Grafana dashboard plotting the average ping and ping loss metrics. This all runs in a Kubernetes homelab environment. The purpose is to get some insight into the uptime of our company internet link, as we had some connectivity issues (but, as you can imagine, also to just play around with this tech stack and learn from it).
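The exact panel queries aren't the point here, but roughly they're the standard ping_exporter metrics (a sketch, assuming the default metric names from czerwonk/ping_exporter; these may differ per version):

```
# mean round-trip time per target, in milliseconds
ping_rtt_ms{type="mean"}

# packet loss ratio (0 to 1) per target
ping_loss_ratio
```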
ping_exporter is configured to ping a set of 3 Tailscale nodes, referenced by DNS, and I have MagicDNS enabled in my tailnet. What you see in the graph is that the metrics suddenly went berserk for a couple of hours, and afterwards went completely back to the regular pattern. If you zoom into the time series in Prometheus, you can see that the machines in my tailnet suddenly started resolving to completely different IPs (I will post that in a follow-up post).
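For reference, the ping_exporter config looks roughly like this (a sketch based on czerwonk/ping_exporter's config format; the hostnames are placeholders, not my actual nodes):

```yaml
# Sketch of the ping_exporter configuration (czerwonk/ping_exporter).
# Hostnames are placeholders; MagicDNS resolves them inside the tailnet.
targets:
  - node-a.example.ts.net
  - node-b.example.ts.net
  - node-c.example.ts.net

dns:
  # How often target hostnames are re-resolved. Relevant here, since a
  # resolver hiccup would swap out the IPs actually being pinged.
  refresh: 2m15s

ping:
  interval: 5s
  timeout: 4s
```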
My questions related to this:
- Was this Tailscale acting up, or the host machine doing something weird with DNS?
- How is it possible that different hosts resolved to the same IPs?