Fast and slow ping times (at the same time)

Hello,

I’m running into the following situation, which seems strange to me.

We have a couple of servers in the AWS Frankfurt region.

I’m on a DSL line in Germany behind (unfortunately) a very restrictive firewall that blocks all outgoing connections except those to ports 80 and 443, which makes DERP necessary.

Running “tailscale ping” from my machine to one of the servers uses the DERP relay in Frankfurt (as expected) and is quite fast:

$ /Applications/Tailscale.app/Contents/MacOS/Tailscale ping 100.105.175.45
pong from {server} (100.105.175.45) via DERP(fra) in 19ms
pong from {server} (100.105.175.45) via DERP(fra) in 22ms
pong from {server} (100.105.175.45) via DERP(fra) in 16ms

The same is true in the opposite direction:

$ tailscale ping 100.116.241.31
pong from device-of-shared-to-user (100.116.241.31) via DERP(fra) in 22ms
pong from device-of-shared-to-user (100.116.241.31) via DERP(fra) in 23ms
pong from device-of-shared-to-user (100.116.241.31) via DERP(fra) in 18ms

So far, everything as expected.

When doing a “normal” ping from the command line, however, the round-trip times are much higher (around 170 to 200 ms), which makes SSH, for example, quite laggy.

$ ping 100.105.175.45
PING 100.105.175.45 (100.105.175.45): 56 data bytes
64 bytes from 100.105.175.45: icmp_seq=0 ttl=64 time=197.219 ms
64 bytes from 100.105.175.45: icmp_seq=1 ttl=64 time=170.337 ms
64 bytes from 100.105.175.45: icmp_seq=2 ttl=64 time=169.350 ms
64 bytes from 100.105.175.45: icmp_seq=3 ttl=64 time=184.146 ms

I’d much appreciate any ideas on where the difference between “real-world use” and the “Tailscale-internal” ping could come from, and how to fix it.

Thanks,

Stephan

Thanks, Stephan. Someone from our team will share details on this.

Thank you, Laura. Another data point:

Using another server, also located in Frankfurt, Germany (and in the same Tailscale account as my laptop, not shared between accounts), I’m seeing the following behaviour:

The server consistently reports the Frankfurt relay as “Nearest DERP”, with a very, very low latency:

$ tailscale netcheck
Report:
    [...]
    * Nearest DERP: Frankfurt
    * DERP latency:
        - fra: 800µs   (Frankfurt)

My laptop on the DSL line also still consistently shows Frankfurt as the “Nearest DERP”:

$ /Applications/Tailscale.app/Contents/MacOS/Tailscale netcheck
Report:
    [...]
    * Nearest DERP: Frankfurt
    * DERP latency:
        - fra: 22ms    (Frankfurt)

Pinging my laptop from the server (100.111.143.35), however, uses the San Francisco relay instead of Frankfurt:

$ tailscale ping 100.116.241.31
pong from laptop (100.116.241.31) via DERP(sfo) in 310ms
pong from laptop (100.116.241.31) via DERP(sfo) in 317ms
pong from laptop (100.116.241.31) via DERP(sfo) in 316ms

I can’t think of a reason why Tailscale would use the SFO relay when all involved nodes report FRA as the nearest DERP.

To choose a DERP region, the periodic netcheck run by the Tailscale client primarily uses the latency of STUN requests sent to each DERP region. If STUN fails (which can happen, for example, when outgoing UDP traffic is blocked), it falls back to the latency of establishing a TLS connection, but that is a very noisy signal and it will often guess incorrectly.
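The selection described above (prefer STUN latencies, fall back to TLS connection latencies only when no STUN probe succeeded, then take the fastest region) can be pictured roughly like this. It is a minimal Python sketch under stated assumptions, not Tailscale’s actual netcheck code, which is written in Go and, as noted, only “primarily” relies on these measurements. The function name `choose_home_region`, the per-region latency dictionaries and all latency values in the usage lines are hypothetical.

```python
from typing import Dict, Optional


def choose_home_region(
    stun_latency_ms: Dict[str, float],
    tls_latency_ms: Dict[str, float],
) -> Optional[str]:
    """Return the DERP region with the lowest measured latency, or None.

    Hypothetical simplification of the netcheck behaviour described above:
    STUN (UDP) measurements are used whenever any STUN probe succeeded;
    TLS connection latencies are only a fallback.
    """
    # Prefer STUN results; an empty dict means every STUN probe failed,
    # e.g. because outgoing UDP is blocked by a firewall.
    samples = stun_latency_ms if stun_latency_ms else tls_latency_ms
    if not samples:
        return None  # no region could be measured at all
    # Pick the region whose latency sample is smallest.
    return min(samples, key=lambda region: samples[region])


if __name__ == "__main__":
    # STUN works: the lowest STUN latency wins (values are hypothetical).
    print(choose_home_region({"fra": 0.8, "sfo": 150.0}, {}))  # fra
    # UDP blocked: only TLS timings are available (hypothetical values),
    # so one slow Frankfurt sample is enough to pick another region.
    print(choose_home_region({}, {"fra": 180.0, "sfo": 175.0}))  # sfo
```

Because the fallback relies on whatever timings a handful of TLS connections happened to produce, a single delayed handshake to the nearby region can flip the decision, which matches the “noisy signal” caveat.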

On top of that, the throughput limits on our DERP servers can add latency. If the nearby DERP server is busy, it may be responding to incoming connections slowly. We plan to improve that over time, but migrating between network providers is a slow process.

@stephan - Do you have ICMP (ping) connectivity on your DSL line, out of curiosity?

Apologies for replying a bit late. I was away for a couple of days.

@adrian Many thanks for the explanation. Outgoing UDP traffic is indeed blocked, so I guess the TLS latency check is what’s used to choose the DERP region. I also can’t rule out that our office’s great firewall :roll_eyes: does something to that traffic, so maybe it really is just a very noisy signal in the end.

@andrew Unfortunately not, as outgoing ICMP is blocked as well. Using hping to send TCP SYN probes to port 443 instead looks like this (as expected, I’d say):

# SFO
➜ hping -S -p 443 derp2d.tailscale.com
HPING derp2d.tailscale.com (en7 192.73.252.65): S set, 40 headers + 0 data bytes
len=46 ip=192.73.252.65 ttl=46 DF id=0 sport=443 flags=SA seq=0 win=64240 rtt=169.1 ms
len=46 ip=192.73.252.65 ttl=46 DF id=0 sport=443 flags=SA seq=1 win=64240 rtt=170.8 ms
len=46 ip=192.73.252.65 ttl=46 DF id=0 sport=443 flags=SA seq=3 win=64240 rtt=177.4 ms
len=46 ip=192.73.252.65 ttl=46 DF id=0 sport=443 flags=SA seq=4 win=64240 rtt=178.5 ms

# FRA
➜ hping -S -p 443 derp4c.tailscale.com
HPING derp4c.tailscale.com (en7 134.122.77.138): S set, 40 headers + 0 data bytes
len=46 ip=134.122.77.138 ttl=51 DF id=0 sport=443 flags=SA seq=0 win=64240 rtt=15.7 ms
len=46 ip=134.122.77.138 ttl=51 DF id=0 sport=443 flags=SA seq=1 win=64240 rtt=53.6 ms
len=46 ip=134.122.77.138 ttl=51 DF id=0 sport=443 flags=SA seq=2 win=64240 rtt=23.2 ms
len=46 ip=134.122.77.138 ttl=51 DF id=0 sport=443 flags=SA seq=3 win=64240 rtt=27.6 ms
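To put a number on how noisy these probes are, the RTTs from the two hping runs above can be summarised directly. This is a small Python sketch; the only data are the rtt values copied from the output, and the names `SFO_RTT_MS`, `FRA_RTT_MS` and `summarize` are made up for illustration.

```python
import statistics

# RTTs in ms copied from the hping runs above (TCP SYN to port 443).
SFO_RTT_MS = [169.1, 170.8, 177.4, 178.5]  # no reply was shown for seq=2
FRA_RTT_MS = [15.7, 53.6, 23.2, 27.6]


def summarize(name, rtts):
    """Return min / mean / max / population standard deviation of the RTTs."""
    return {
        "region": name,
        "min": min(rtts),
        "mean": statistics.mean(rtts),
        "max": max(rtts),
        "stdev": statistics.pstdev(rtts),
    }


if __name__ == "__main__":
    for name, rtts in (("sfo", SFO_RTT_MS), ("fra", FRA_RTT_MS)):
        s = summarize(name, rtts)
        print(f"{s['region']}: min={s['min']:.1f} mean={s['mean']:.1f} "
              f"max={s['max']:.1f} stdev={s['stdev']:.1f} ms")
```

On these samples the Frankfurt RTT varies by more than a factor of three (15.7 to 53.6 ms), while even its worst sample stays far below the best San Francisco one. A latency estimate based on full TLS connections, which take several round trips each, would likely be noisier still.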

For the time being, we’ve “solved” the problem by disabling all DERP regions except Frankfurt. It seems to work so far.

Thanks for the help and thanks for Tailscale :grinning: