Tailscale high CPU usage

Hi,

I'm not sure exactly when it started (Tailscale v1.20.x, maybe?), but I've noticed excessive CPU usage by tailscale, if I'm reading the CPU usage right.
I took a couple of flamegraphs of the entire system, and they're pretty much the same in terms of which code path takes most of the time.
Let me know if this looks like a valid issue; I can provide logs, etc.

Perf flamegraphs:

Captured with just: perf record -F 99 -ag -- sleep 10
tailscale2.svg
tailscale.svg

System

Distro: NixOS 21.11.20220307.9b1c7ba

➤ uname -a
Linux nixos 5.10.103 #1-NixOS SMP Wed Mar 2 10:42:57 UTC 2022 x86_64 GNU/Linux
➤ tailscale version
1.22.0
  go version: go1.17.7

Alright, so I ran a small experiment: I disabled DNS (--accept-dns=false) for ~5 minutes and then enabled it again. There's a clear dip in CPU/network usage (for the "macbook" labels) between 18:38 and 18:45; that's when --accept-dns was set to false, before I returned it to the default value.

I don't know if that's helpful, but perhaps some recent changes in the DNS handling might explain this.
Note that during those 5 minutes, I kept doing the same things in the background (mostly using Chrome).

Oh, I also just noticed lots of log entries about the DNS queue being full:

➤ journalctl -u tailscaled -S "2 hour ago" | grep "dns: enqueue: request queue full" | wc -l
890
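Those log lines suggest the DNS forwarder uses a bounded queue and drops requests once it fills up. A minimal sketch of that behavior in Go, just to illustrate the mechanism; the capacity, type names, and log wording here are illustrative, not Tailscale's actual implementation:

```go
package main

import "fmt"

// query represents a pending DNS request awaiting forwarding.
type query struct{ name string }

// forwarder holds a fixed-capacity queue; when it is full, new
// requests are dropped and a "request queue full" message is logged,
// mirroring the journalctl entries above. Capacity 3 is arbitrary,
// chosen only to make the drop easy to demonstrate.
type forwarder struct {
	queue   chan query
	dropped int
}

func newForwarder(capacity int) *forwarder {
	return &forwarder{queue: make(chan query, capacity)}
}

// enqueue attempts a non-blocking send; on a full queue it drops the
// request instead of blocking the caller.
func (f *forwarder) enqueue(q query) bool {
	select {
	case f.queue <- q:
		return true
	default:
		f.dropped++
		fmt.Printf("dns: enqueue: request queue full (dropping %s)\n", q.name)
		return false
	}
}

func main() {
	f := newForwarder(3)
	for i := 0; i < 5; i++ {
		f.enqueue(query{name: fmt.Sprintf("host%d.example.com", i)})
	}
	fmt.Println("dropped:", f.dropped)
}
```

If upstream resolution is slow (e.g., each query waiting on a fresh DoH/TLS handshake), a queue like this backs up and the drop counter climbs, which would match the 890 log entries in two hours.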

OK, I looked at this again, mostly because the CPU fan was annoyingly loud :stuck_out_tongue:
Judging from the profiles and the relation to DNS, this seems related to DoH. I set 1.1.1.1 and 8.8.8.8 as nameservers in my config, and it looks like Tailscale automatically upgrades to DoH in that case: tailscale/forwarder.go at main · tailscale/tailscale · GitHub.
I've switched to DHCP for my network to get whatever dumb nameservers my ISP provides, and things seem quieter. I'll run it for a day or so and see if it really makes a difference. That said, I'd had 8.8.8.8 and 1.1.1.1 set for as long as I've been using Tailscale + MagicDNS (at least a year, I think), so this is definitely something new; I'm just not sure yet what the root cause is.
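A rough sketch of what that automatic DoH upgrade amounts to: a lookup table from well-known public resolver IPs to DoH endpoints. The function name and table here are hypothetical (Tailscale's real logic lives in the forwarder.go linked above), though the Google and Cloudflare DoH URLs are those providers' published endpoints:

```go
package main

import "fmt"

// knownDoH maps well-known public resolver IPs to their DoH base
// URLs. Illustrative only; Tailscale's actual table may differ.
var knownDoH = map[string]string{
	"8.8.8.8": "https://dns.google/dns-query",
	"8.8.4.4": "https://dns.google/dns-query",
	"1.1.1.1": "https://cloudflare-dns.com/dns-query",
	"1.0.0.1": "https://cloudflare-dns.com/dns-query",
}

// upgradeToDoH returns the DoH endpoint for a resolver IP, if one is
// known; otherwise the resolver would be used over plain UDP port 53.
func upgradeToDoH(ip string) (string, bool) {
	url, ok := knownDoH[ip]
	return url, ok
}

func main() {
	for _, ip := range []string{"8.8.8.8", "192.168.1.1"} {
		if url, ok := upgradeToDoH(ip); ok {
			fmt.Printf("%s -> DoH via %s\n", ip, url)
		} else {
			fmt.Printf("%s -> plain DNS (no DoH upgrade)\n", ip)
		}
	}
}
```

This would explain why switching to the ISP's nameservers quiets things down: an ISP resolver isn't in the table, so queries stay plain UDP and skip the TLS work.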

Two things to try:

  1. rebuild with Go 1.18, which has crypto changes that might help you performance-wise. We've had these in our Go toolchain fork, but you didn't use our binaries or build with our toolchain.

  2. run tailscale debug --cpu-profile=cpu.prof to grab a 15-second profile (or longer if you crank up the other flag), then run go tool pprof cpu.prof and top to see what's consuming the CPU. Or email that to us.

@bradfitz thanks for the suggestions.

run tailscale debug --cpu-profile=cpu.prof to grab a 15-second profile (or longer if you crank up the other flag), then run go tool pprof cpu.prof and top to see what's consuming the CPU. Or email that to us.

amine@nixos:~
➤ sudo tailscale debug --cpu-profile=cpu.prof
2022/03/21 14:13:43 Capturing CPU profile for 15 seconds ...
2022/03/21 14:13:58 CPU profile written to cpu.prof
amine@nixos:~
➤ sudo go tool pprof cpu.prof
File: .tailscaled-wrapped
Type: cpu
Time: Mar 21, 2022 at 2:13pm (EDT)
Duration: 15s, Total samples = 13.07s (87.13%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 5580ms, 42.69% of 13070ms total
Dropped 366 nodes (cum <= 65.35ms)
Showing top 10 nodes out of 174
      flat  flat%   sum%        cum   cum%
     980ms  7.50%  7.50%     2350ms 17.98%  math/big.nat.divBasic
     870ms  6.66% 14.15%     2320ms 17.75%  runtime.scanobject
     780ms  5.97% 20.12%     1000ms  7.65%  runtime.findObject
     660ms  5.05% 25.17%      670ms  5.13%  runtime.pageIndexOf (inline)
     460ms  3.52% 28.69%      460ms  3.52%  math/big.addMulVVW
     420ms  3.21% 31.91%      420ms  3.21%  math/big.subVV
     390ms  2.98% 34.89%      390ms  2.98%  math/big.mulAddVWW
     350ms  2.68% 37.57%     2330ms 17.83%  runtime.gentraceback
     340ms  2.60% 40.17%      490ms  3.75%  runtime.step
     330ms  2.52% 42.69%     4100ms 31.37%  runtime.mallocgc
(pprof)
  1. rebuild with Go 1.18, which has crypto changes that might help you performance-wise. We've had these in our Go toolchain fork, but you didn't use our binaries or build with our toolchain.

I just use the vanilla package from Nixpkgs. Since it's co-maintained by Tailscale employees, I assumed it did the right thing. But perhaps there should be an alternative package that pulls the fully static binaries from https://pkgs.tailscale.com/? Also, this isn't just NixOS not using the Tailscale Go toolchain; for instance, Alpine's aports also builds from source: APKBUILD « tailscale « community - aports - Alpine packages build scripts.