Break all existing TCP connections on tailscale up/down or exit node change

It seems that if I toggle Tailscale up and down (i.e. push traffic to an exit node and then not to an exit node) on Ubuntu 22.04 LTS, there is up to a minute where Firefox can’t reach sites that are active or have recently been active. My current theory is that these TCP connections are idle in a keep-alive state: if I kill Firefox and restart it, it’s immediately fine, as is a browser that wasn’t previously active, or a site I haven’t made a connection to before.

Looking for steps to troubleshoot this, or a way to kill active TCP connections when I toggle traffic through the exit node; a CLI approach would be good.

Here’s my script that switches on my VPN mode (i.e. sends traffic via my Linode):

➜  ~ more bin/tssgp

sudo tailscale down
sudo resolvectl flush-caches
echo "ts down" 
sudo tailscale up --exit-node-allow-lan-access=false --exit-node 100.x.x.x --shields-up=false
echo "ts up - traffic routed through linode vpn" 
sudo resolvectl flush-caches

Here’s tsdown:


sudo resolvectl flush-caches
sudo tailscale down
sudo resolvectl flush-caches

I also open about:networking#dns and clear DNS when I switch sending traffic through the exit node on and off; between that and the over-the-top flushing via resolvectl, I don’t think it’s DNS.

Steps to replicate:
Tailscale 1.30 on exit node
Tailscale 1.30 on client
Open Firefox 105
Switch on the exit node
Refresh and voila, stalls for up to a minute

Expected: switch the exit node on and off, refresh as often as I like, and things immediately work, as existing TCP connections are killed and DNS entries cleared (the DNS part shouldn’t really matter, but networking isn’t perfect, especially with anycast IPs).

Even weirder, here’s a request to a non-Cloudflare domain with a single A record on Linode:

➜ ~ netstat -t | grep www.static-ip-naked-non-cloudflare
tcp 0 0 hplaptop0.acme:55730 www.static-ip-naked:https ESTABLISHED

then switch on the exit node (tailscale up):

tcp 0 1000 100.80.x.x:55730 www.static-ip-naked:https ESTABLISHED

It seems it reuses existing TCP connections, judging by the client port remaining the same, which would then be rejected because the source is different? Should it be doing this?
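One way to watch this happen is to snapshot the established HTTPS connections before and after toggling the exit node. A sketch only: `snapshot` and the `/tmp` paths are illustrative names, and `ss -tn` needs no root:

```shell
# Snapshot established HTTPS connections as "local-addr:port peer-addr:port".
# With a state filter, ss omits the State column, so the local and peer
# addresses are columns 3 and 4.
snapshot() {
  ss -tn state established '( dport = :443 )' | awk 'NR>1 {print $3, $4}' | sort
}

snapshot > /tmp/conns-before
# ... toggle the exit node here (sudo tailscale up --exit-node ...) ...
snapshot > /tmp/conns-after

# diff exits 1 when the files differ, so don't let that abort the script.
diff /tmp/conns-before /tmp/conns-after || true
```

If a line changes only in its local address (LAN address before, 100.x Tailscale address after) while the local port stays the same, the kernel has kept the socket but its source has been rewritten, and the remote peer will no longer recognise the segments.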

I understand, of course, that tailscale up with an exit node should route all traffic via the exit node, but I suspect existing TCP connections should either be closed or left alone?

Okay, getting closer, I think I’ve solved most of my issue. I added a snippet to my ts-up and ts-down scripts to kill existing TCP connections, so that their source isn’t rewritten and thus broken (i.e. data doesn’t flow):

sudo ss -K dport = 443

I run this when I tailscale down and when I tailscale up --exit-node, and voila, no more stalls.
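Here’s roughly how the kill folds into the two scripts. This is only a sketch: `run`, `kill_established_tcp`, `ts_up` and `ts_down` are illustrative names, a `DRY_RUN` switch is added so the commands can be previewed without root, and 100.x.x.x is the placeholder exit-node address from above:

```shell
#!/bin/sh
# With DRY_RUN=1 each command is printed instead of executed; otherwise
# it runs under sudo, as in the original scripts.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    sudo "$@"
  fi
}

# ss -K destroys matching sockets kernel-side (needs a kernel built with
# CONFIG_INET_DIAG_DESTROY, which Ubuntu's kernels have enabled).
kill_established_tcp() {
  run ss -K dport = 443
}

ts_up() {
  run tailscale down
  run resolvectl flush-caches
  run tailscale up --exit-node-allow-lan-access=false --exit-node 100.x.x.x --shields-up=false
  kill_established_tcp
  run resolvectl flush-caches
}

ts_down() {
  run resolvectl flush-caches
  run tailscale down
  kill_established_tcp
  run resolvectl flush-caches
}
```

Whether the kill goes before or after the toggle shouldn’t matter much, since either way the stale-source connections are gone before the browser retries.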

Can’t help but think tailscale up/down should do this itself when routing via an exit node, though.

I think this kind of makes sense now, as a TCP/IP connection is identified by the quadruple of client IP, client port, server IP and server port. Is it a deliberate action that Tailscale maintains existing TCP connections, or is it more that it just doesn’t kill them?

It requires taking extra action to kill the existing connections, and coming up with the correct list of connections to break can be difficult. This may have been overlooked on Linux. The Windows firewall has a mechanism to do this sort of thing and I believe we do trigger that under at least some circumstances; it’s more tricky on Linux.

If you want to submit an issue on GitHub, that would be helpful. I think it should be possible to do better than the current behavior.

Thanks Adrian GH #5760