Lost Connectivity after months of no problems

Hi. I recently lost access to my cloud VM via Tailscale. I re-added TCP port 22 and was able to log in just fine through the public IP.

The problem has not resolved itself. The status looks fine from the CLI, yet as you can see below, SSH over Tailscale fails.

From xeep:

cole@xeep ~ 45m 3s
❯ tailscale status
100.72.11.62    xeep                 cole.mickens@ linux   -
100.103.122.117 azdev                cole.mickens@ linux   idle, tx 3281144 rx 5683896
100.101.102.103 ("hello-ipn-dev")    services@    linux   -
100.103.91.27   jeffhyper            cole.mickens@ linux   -
100.89.55.100   pinebook             cole.mickens@ linux   -
100.89.237.128  pixel-3-1            cole.mickens@ android -
100.68.13.41    porty                cole.mickens@ linux   -
100.96.145.20   redsly               cole.mickens@ windows idle, tx 80056 rx 71416
100.111.5.113   rpifour1             cole.mickens@ linux   -

cole@xeep ~
❯ ping azdev.ts.r10e.tech
PING azdev.ts.r10e.tech (100.103.122.117) 56(84) bytes of data.

cole@xeep ~
❯ ssh cole@azdev.ts.r10e.tech
ssh: connect to host azdev.ts.r10e.tech port 22: No route to host

From azdev:

cole@azdev ~ 1h 17m 11s
❯ tailscale status
100.103.122.117 azdev                cole.mickens@ linux   -
100.101.102.103 ("hello-ipn-dev")    services@    linux   -
100.103.91.27   jeffhyper            cole.mickens@ linux   -
100.89.55.100   pinebook             cole.mickens@ linux   -
100.89.237.128  pixel-3-1            cole.mickens@ android -
100.68.13.41    porty                cole.mickens@ linux   -
100.96.145.20   redsly               cole.mickens@ windows -
100.111.5.113   rpifour1             cole.mickens@ linux   -
100.72.11.62    xeep                 cole.mickens@ linux   active; relay "sea", tx 1184 rx 0

cole@azdev ~
❯ ssh cole@100.72.11.62 # aka 'xeep'
# actually just sort of hangs... maybe it will timeout

Background:

  • xeep is my laptop
  • azdev is my cloud VM
  • redsly is my desktop machine, from which I’m connecting to these machines

I can’t SSH to azdev.ts.r10e.tech from xeep (though I can again from redsly; that was also broken last night). And as you can see, xeep still lists azdev as an idle peer with nonzero tx/rx counters.

Seems related:

cole@xeep ~
❯ sudo tailscale down
2021/04/29 15:02:42 was in state "Running"
2021/04/29 15:02:42 now in state "Stopped"

cole@xeep ~
❯ sudo tailscale up

cole@xeep ~
❯ tailscale status
100.72.11.62    xeep.cole-mickens.gmail.com.beta.tailscale.net userid:c126b52d00467c linux   -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -

cole@xeep ~
❯ tailscale status
100.72.11.62    xeep.cole-mickens.gmail.com.beta.tailscale.net userid:c126b52d00467c linux   -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -
                ("")                 -                    -

EDIT: Checking the usual suspects: the system date/time is correct.

A random snippet of tailscaled logs:

Apr 29 15:07:16 xeep tailscaled[2588]: magicsock: [0xc00034e000] derp.Recv(derp-10): derphttp.Client.Recv connect to region 10 (sea): dial tcp6 [2001:19f0:8001:2d9:5400:2ff:feef:bbb1]:443: connect: network is unreachable
Apr 29 15:07:16 xeep tailscaled[2588]: derp-10: backoff: 6644 msec
Apr 29 15:07:22 xeep tailscaled[2588]: derphttp.Client.Recv: connecting to derp-10 (sea)
Apr 29 15:07:22 xeep tailscaled[2588]: magicsock: [0xc00034e000] derp.Recv(derp-10): derphttp.Client.Recv connect to region 10 (sea): dial tcp6 [2001:19f0:8001:2d9:5400:2ff:feef:bbb1]:443: connect: network is unreachable
Apr 29 15:07:22 xeep tailscaled[2588]: derp-10: backoff: 6884 msec
Apr 29 15:07:24 xeep tailscaled[2588]: logtail: dial "log.tailscale.io:443" failed: dial tcp [2600:1f14:436:d603:342:4c0d:2df9:191b]:443: connect: network is unreachable (in 1ms)
Apr 29 15:07:24 xeep tailscaled[2588]: logtail: upload: log upload of 309 bytes compressed failed: Post "https://log.tailscale.io/c/tailnode.log.tailscale.io/80feebb9472c8f609d11ffe0df461e4e7c5d1282541ced0d4a1fc0c4bb03a85d": dial tcp [2600:1f14:436:d603:342:4c0d:2df9:191b]:443: connect: network is unreachable
Apr 29 15:07:24 xeep tailscaled[2588]: logtail: backoff: 35776 msec
Apr 29 15:07:29 xeep tailscaled[2588]: derphttp.Client.Recv: connecting to derp-10 (sea)
Apr 29 15:07:29 xeep tailscaled[2588]: magicsock: [0xc00034e000] derp.Recv(derp-10): derphttp.Client.Recv connect to region 10 (sea): dial tcp6 [2001:19f0:8001:2d9:5400:2ff:feef:bbb1]:443: connect: network is unreachable

(BTW, I’m a free user with an easy workaround – I’m rapidly posting just to provide info, not out of any sense of urgency or expectation. Thank you for Tailscale!)

Is there a chance your key has expired after 6 months? Nowadays you can renew it via the admin panel.
https://tailscale.com/kb/1028/key-expiry/

I don’t think so. (The screenshot shows the key still has 3 months until expiry.) I also figure the CLI would produce a more relevant message in that case.

Oof, I wonder if IPv6 isn’t working for this machine, and that’s why the Tailscale client can’t reach the Tailscale servers. (I can’t imagine why that would’ve happened suddenly last night, though…)

Yeah, it’s going to be hard to say - I had to reboot this machine for other reasons and of course it’s happily reconnected now.

If anyone has notes on what data to collect next time, I’d welcome them; otherwise we can just let this be. Thanks again for Tailscale!
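For collecting data the next time this happens, before a reboot clears the state, something along these lines might help. It is only a sketch: the helper names run_cmd and collect are made up for illustration, and the command list is an assumption about what is useful on a Linux machine with the tailscale and ip tools installed.

```python
import subprocess

# Assumed-useful commands for a Linux node; adjust to taste.
COMMANDS = [
    ["tailscale", "status"],
    ["tailscale", "netcheck"],
    ["ip", "-4", "route", "show"],
    ["ip", "-6", "route", "show"],
    ["ip", "-6", "addr", "show"],
]


def run_cmd(argv, timeout=30):
    """Run one command and return its combined stdout/stderr as text.

    Failures to launch or a timeout are reported in the returned string
    instead of raising, so one missing tool does not abort the whole run.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except OSError as exc:
        return f"(could not run {argv[0]}: {exc})"
    except subprocess.TimeoutExpired:
        return f"(timed out after {timeout}s)"
    output = (proc.stdout + proc.stderr).strip()
    return output or f"(no output, exit code {proc.returncode})"


def collect(commands=COMMANDS):
    """Return one text report with each command line followed by its output."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv) + "\n" + run_cmd(argv))
    return "\n\n".join(sections)
```

Calling print(collect()) while the problem is still happening and saving the output would give a snapshot of the peer list, DERP reachability, and the IPv4/IPv6 routes to attach to a report.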

Hmm, if tailscale up doesn’t ask for reauthentication then key expiry isn’t it.

In your logs, the “network is unreachable” errors when dialing logtail and DERP are a bad sign. Is it possible your firewall has suddenly started blocking outgoing HTTPS? We need this in order to negotiate connections.

There’s also quite a bit of IPv6 noise in there. I wonder if tailscale has accidentally latched itself onto using IPv6 for everything, and then only the IPv6 part of your link has gone down.
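To see how much of the failure is IPv6-specific, one option is to tally the dial errors in the tailscaled log by address family. This is a minimal sketch, assuming the log lines keep the "dial tcp[6] [addr]:port: connect: reason" shape seen in the snippet earlier in this thread; the function name tally_dial_failures is hypothetical.

```python
import re
from collections import Counter

# Matches Go-style dial errors as they appear in the log snippet, e.g.
#   dial tcp6 [2001:db8::1]:443: connect: network is unreachable
#   dial tcp 192.0.2.1:443: connect: connection refused
# Assumption: IPv6 literals are bracketed and IPv4 literals are not.
DIAL_ERR = re.compile(
    r"dial (?P<net>tcp[46]?) "
    r"(?P<addr>\[[0-9A-Fa-f:.]+\]|[0-9.]+):(?P<port>\d+): "
    r"connect: (?P<reason>[^\n(]+)"
)


def tally_dial_failures(lines):
    """Count dial failures per (address family, reason).

    Only the first dial error on each line is counted; lines without a
    recognizable dial error are skipped.
    """
    counts = Counter()
    for line in lines:
        match = DIAL_ERR.search(line)
        if match is None:
            continue
        family = "ipv6" if match.group("addr").startswith("[") else "ipv4"
        counts[(family, match.group("reason").strip())] += 1
    return counts
```

If every counted failure lands under ("ipv6", "network is unreachable") and nothing shows up under "ipv4", that would fit the theory that only the IPv6 side of the link went down.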

“Is it possible your firewall has suddenly started blocking outgoing HTTPS? We need this in order to negotiate connections.”

Zero chance of that. It is possible, though, that a NixOS change I pushed altered a different part of my config: it removed an ethernet bridge that shouldn’t have been related at all to the device Tailscale would have been using to reach the Internet. I do sort of feel like that may have dislodged something, somewhere, and caused IPv6 to fail for Tailscale.

Or, maybe tailscale had latched on to using the bridge, and then I’d torn it down? Not sure that even makes sense though? (I even think I’d restarted tailscaled.)

I really wish I’d checked whether IPv6 was working in general for other programs. Now everything seems fine, of course.
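A quick check of that kind could look like the following sketch, which tries a TCP connection to the same host over IPv4 only and then over IPv6 only. The function names can_connect and compare_families are made up for illustration, and port 443 is assumed because that is what the failing dials in the logs were using.

```python
import socket


def can_connect(host, port, family, timeout=3.0):
    """Try a TCP connect to host:port using only the given address family.

    Returns (ok, detail), where detail describes the address reached or
    the last error seen.
    """
    try:
        infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"resolve failed: {exc}"
    last_error = "no addresses returned"
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True, f"connected to {sockaddr[0]}"
        except OSError as exc:
            last_error = f"{sockaddr[0]}: {exc}"
    return False, last_error


def compare_families(host, port=443, timeout=3.0):
    """Report IPv4 and IPv6 reachability of the same host side by side."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET, timeout),
        "ipv6": can_connect(host, port, socket.AF_INET6, timeout),
    }
```

For example, compare_families("log.tailscale.io") returning success for "ipv4" but a "network is unreachable" error for "ipv6" would suggest that IPv6 is broken for the whole machine and not only for tailscaled.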

I have a similar problem from time to time that requires restarting sshd after a system reboot. Any chance you have tried that?

Until I do that, I can SSH via the local LAN IP, but not via the Tailscale IP. It might have something to do with the order in which the daemons start.

It’s looking like this is tailscale/tailscale issue #1726 on GitHub: “Tailscale loses control plane + DERP connectivity when a node loses IPv6 internet connectivity”.
