Slow throughput on ARM

While I’m tracking #426, I’d like to share some data on the abysmal throughput on ARM CPUs, in case it’s helpful.

Both machines are arm64, run the latest Tailscale release as of writing (1.22.2), and have a direct peer-to-peer connection, verified with tailscale status.

Via Tailscale:

Reverse mode, remote host xxxxx is sending
[  5] local 100.120.109.59 port 43240 connected to 100.83.6.75 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  5.55 MBytes  46.6 Mbits/sec                  
[  5]   1.00-2.00   sec  2.43 MBytes  20.4 Mbits/sec                  
[  5]   2.00-3.00   sec  1.08 MBytes  9.03 Mbits/sec                  
[  5]   3.00-4.00   sec   723 KBytes  5.92 Mbits/sec                  
[  5]   4.00-5.00   sec   859 KBytes  7.03 Mbits/sec                  
[  5]   5.00-6.00   sec   890 KBytes  7.29 Mbits/sec                  
[  5]   6.00-7.00   sec   807 KBytes  6.61 Mbits/sec                  
[  5]   7.00-8.00   sec   911 KBytes  7.47 Mbits/sec                  
[  5]   8.00-9.00   sec  1.05 MBytes  8.81 Mbits/sec                  
[  5]   9.00-10.00  sec   868 KBytes  7.11 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec  17.4 MBytes  14.6 Mbits/sec  125             sender
[  5]   0.00-10.00  sec  15.0 MBytes  12.6 Mbits/sec                  receiver

Without a VPN:

Reverse mode, remote host xxxxx is sending
[  5] local 10.0.0.11 port 56740 connected to xxxxx port 23060
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  13.4 MBytes   112 Mbits/sec
[  5]   1.00-2.00   sec  13.1 MBytes   110 Mbits/sec
[  5]   2.00-3.00   sec  13.7 MBytes   115 Mbits/sec
[  5]   3.00-4.00   sec  14.0 MBytes   118 Mbits/sec
[  5]   4.00-5.00   sec  14.7 MBytes   123 Mbits/sec
[  5]   5.00-6.00   sec  14.6 MBytes   122 Mbits/sec
[  5]   6.00-7.00   sec  13.9 MBytes   116 Mbits/sec
[  5]   7.00-8.00   sec  15.3 MBytes   129 Mbits/sec
[  5]   8.00-9.00   sec  16.1 MBytes   135 Mbits/sec
[  5]   9.00-10.00  sec  16.4 MBytes   138 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   148 MBytes   124 Mbits/sec  928             sender
[  5]   0.00-10.00  sec   145 MBytes   122 Mbits/sec                  receiver
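To put a number on the gap between the two runs, here is a minimal sketch that pulls the bitrate out of the two receiver summary lines above and computes the ratio. The helper name bitrate_mbps and the regex are my own, not part of iperf3, and the two lines are simply copied from the output in this post.

```python
import re

# Matches the bitrate column of an iperf3 line, e.g. "12.6 Mbits/sec".
# The lowercase "bits" keeps it from matching the "MBytes" transfer column.
BITRATE = re.compile(r"([\d.]+)\s+([KMG])bits/sec")
SCALE_TO_MBPS = {"K": 1e-3, "M": 1.0, "G": 1e3}


def bitrate_mbps(line):
    """Return the bitrate reported on one iperf3 output line, in Mbit/s."""
    match = BITRATE.search(line)
    if match is None:
        raise ValueError("no bitrate found in line: " + line)
    return float(match.group(1)) * SCALE_TO_MBPS[match.group(2)]


# Receiver summary lines copied from the two runs above.
tailscale = bitrate_mbps("[  5]   0.00-10.00  sec  15.0 MBytes  12.6 Mbits/sec                  receiver")
direct = bitrate_mbps("[  5]   0.00-10.00  sec   145 MBytes   122 Mbits/sec                  receiver")

ratio = tailscale / direct
print(f"Tailscale reaches {ratio:.1%} of the direct throughput")
```

With these two lines it works out to roughly 10% of the non-VPN throughput.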

Just out of curiosity, what does your CPU usage look like while running the above test WITH Tailscale?

I ran a similar test using two Raspberry Pi 4s on the same 1 Gbit/s local switch. Both run the latest 64-bit Raspberry Pi OS.
Combined CPU usage for Tailscale was about 235% while running iperf against the Tailscale IP.

------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 192.168.2.254 port 5001 connected with 192.168.2.184 port 53806
[ ID] Interval       Transfer     Bandwidth
[  4] 0.0000-10.0309 sec  1.10 GBytes   938 Mbits/sec
[  5] local 100.124.58.25 port 5001 connected with 100.71.37.59 port 52044
[ ID] Interval       Transfer     Bandwidth
[  5] 0.0000-10.0199 sec   222 MBytes   186 Mbits/sec

Great question.

Per top, it’s about 120% for the tailscaled process. This is on the Raspberry Pi.

On the Arm VM, it hovers between 50% and 60%.