Tsnet not working on Kubernetes with Linkerd service mesh

Hello,

I am experimenting with tsnet to expose a Go service to my tailnet. The service runs in a DigitalOcean Kubernetes cluster, and everything works correctly when the pod is not meshed with Linkerd.

However, when the pod is meshed with Linkerd, tsnet cannot connect to Tailscale: its requests time out (logs below).

My guess is that Linkerd is intercepting the outbound TCP traffic and mishandling it. Linkerd can be told to skip specific outbound ports, but if I understand tsnet correctly, the port I’d need to skip is picked dynamically.

Skipping all outbound ports with config.linkerd.io/skip-outbound-ports: 1-90000 seems to work, but this could be a red herring.

Any help is much appreciated!
Thanks,
Nick.

Logs from the app:

[v1] using fake (no-op) tun device
[v1] using fake (no-op) OS network configurator
[v1] using fake (no-op) DNS configurator
dns: using dns.noopManager
link state: interfaces.State{defaultRoute=eth0 ifs={eth0:[10.244.0.226/32]} v4=true v6=false}
magicsock: failed to force-set UDP read buffer size to 7340032: operation not permitted
magicsock: failed to force-set UDP write buffer size to 7340032: operation not permitted
magicsock: failed to force-set UDP read buffer size to 7340032: operation not permitted
magicsock: failed to force-set UDP write buffer size to 7340032: operation not permitted
[v1] couldn't create raw v4 disco listener, using regular listener instead: raw disco listening disabled, SO_MARK unavailable
[v1] couldn't create raw v6 disco listener, using regular listener instead: raw disco listening disabled, SO_MARK unavailable
magicsock: disco key = d:4b160652b6c2f995
Creating WireGuard device...
Bringing WireGuard device up...
wg: [v2] UDP bind has been updated
wg: [v2] Interface state was Down, requested Up, now Up
Bringing router up...
[v1] warning: fakeRouter.Up: not implemented.
Clearing router settings...
[v1] warning: fakeRouter.Set: not implemented.
Starting network monitor...
Engine created.
tsnet running state path /root/.config/tsnet-service/tailscaled.state
wg: [v2] Routine: receive incoming mkReceiveFunc - started
wg: [v2] Routine: receive incoming receiveDERP - started
wg: [v2] Routine: receive incoming mkReceiveFunc - started
pm: migrating \"_daemon\" profile to new format
[v\u0000JSON]1{\"Hostinfo\":{\"IPNVersion\":\"1.40.1-ERR-BuildInfo\",\"OS\":\"linux\",\"OSVersion\":\"5.10.0-0.deb10.17-amd64\",\"Container\":false,\"Env\":\"k8s\",\"Distro\":\"alpine\",\"DistroVersion\":\"3.18.0\",\"Desktop\":false,\"Package\":\"tsnet\",\"Hostname\":\"v2-b5845c978-drwrp\",\"Machine\":\"x86_64\",\"GoArch\":\"amd64\",\"GoArchVar\":\"v1\",\"GoVersion\":\"go1.20.4\"}}
logpolicy: using system state directory \"/var/lib/tailscale\"
[v1] netmap packet filter: (not ready yet)
tsnet starting with hostname \"podname\", varRoot \"/root/.config/tsnet-service\"
Start
generating new machine key
machine key written to store
[v1] netmap packet filter: (not ready yet)
[v1] got initial portlist info in 0s
control: [v1] HostInfo: {\"IPNVersion\":\"1.40.1-ERR-BuildInfo\",\"BackendLogID\":\"694d0ad667625f3a28f48fff73193688bab3853b247a5d27ce5b2162a9e13c1b\",\"OS\":\"linux\",\"OSVersion\":\"5.10.0-0.deb10.17-amd64\",\"Container\":false,\"Env\":\"k8s\",\"Distro\":\"alpine\",\"DistroVersion\":\"3.18.0\",\"Desktop\":false,\"Package\":\"tsnet\",\"Hostname\":\"podname\",\"Machine\":\"x86_64\",\"GoArch\":\"amd64\",\"GoArchVar\":\"v1\",\"GoVersion\":\"go1.20.4\",\"Services\":[{\"Proto\":\"tcp\",\"Port\":4143},{\"Proto\":\"tcp\",\"Port\":4191},{\"Proto\":\"tcp\",\"Port\":8000,\"Description\":\"service\"}],\"Userspace\":true,\"UserspaceRouter\":true}
Backend: logs: be:694d0ad667625f3a28f48fff73193688bab3853b247a5d27ce5b2162a9e13c1b fe:
control: [v1] authRoutine: state:new; goal=nil paused=false
control: [v1] mapRoutine: state:new
Switching ipn state NoState -> NeedsLogin (WantRunning=true, nm=false)
blockEngineUpdates(true)
health(\"overall\"): error: not in map poll
wgengine: Reconfig: configuring userspace WireGuard config (with 0/0 peers)
wgengine: Reconfig: configuring router
[v1] warning: fakeRouter.Set: not implemented.
wgengine: Reconfig: configuring DNS
dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:0}
dns: Resolvercfg: {Routes:{} Hosts:0 LocalDomains:[]}
dns: OScfg: {Nameservers:[] SearchDomains:[] MatchDomains:[] Hosts:[]}
[v1] wgengine: Reconfig done
LocalBackend state is NeedsLogin; running StartLoginInteractive...
StartLoginInteractive: url=false
control: client.Login(false, 6)
control: [v1] authRoutine: context done.
control: [v1] authRoutine: state:new; wantLoggedIn=true
control: [v1] direct.TryLogin(token=false, flags=6)
control: LoginInteractive -> regen=true
control: doLogin(regen=true, hasUrl=false)
2023/05/19 16:38:38 logtail: dial "log.tailscale.io:443" failed: dial tcp 34.229.201.48:443: i/o timeout (in 30s), trying bootstrap...
2023/05/19 16:39:23 logtail: dial "log.tailscale.io:443" failed: dial tcp 34.229.201.48:443: i/o timeout (in 30.001s), trying bootstrap...
2023/05/19 16:40:08 logtail: dial "log.tailscale.io:443" failed: dial tcp 34.229.201.48:443: i/o timeout (in 30.001s), trying bootstrap...
control: trying bootstrapDNS(\"derp12c.tailscale.com\", \"149.28.119.105\") for \"controlplane.tailscale.com\" ...
control: bootstrapDNS(\"derp12c.tailscale.com\", \"149.28.119.105\") for \"controlplane.tailscale.com\" error: Get \"https://derp12c.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": context deadline exceeded
control: trying bootstrapDNS(\"derp7.tailscale.com\", \"2401:c080:1000:467f:5400:2ff:feee:22aa\") for \"controlplane.tailscale.com\" ...
control: bootstrapDNS(\"derp7.tailscale.com\", \"2401:c080:1000:467f:5400:2ff:feee:22aa\") for \"controlplane.tailscale.com\" error: Get \"https://derp7.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": dial tcp [2401:c080:1000:467f:5400:2ff:feee:22aa]:443: connect: network is unreachable
control: trying bootstrapDNS(\"derp2d.tailscale.com\", \"192.73.252.65\") for \"controlplane.tailscale.com\" ...
control: bootstrapDNS(\"derp2d.tailscale.com\", \"192.73.252.65\") for \"controlplane.tailscale.com\" error: Get \"https://derp2d.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": context deadline exceeded
control: trying bootstrapDNS(\"derp5.tailscale.com\", \"2001:19f0:5801:10b7:5400:2ff:feaa:284c\") for \"controlplane.tailscale.com\" ...
control: bootstrapDNS(\"derp5.tailscale.com\", \"2001:19f0:5801:10b7:5400:2ff:feaa:284c\") for \"controlplane.tailscale.com\" error: Get \"https://derp5.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": dial tcp [2001:19f0:5801:10b7:5400:2ff:feaa:284c]:443: connect: network is unreachable
control: trying bootstrapDNS(\"derp6.tailscale.com\", \"68.183.90.120\") for \"controlplane.tailscale.com\" ...
control: bootstrapDNS(\"derp6.tailscale.com\", \"68.183.90.120\") for \"controlplane.tailscale.com\" error: Get \"https://derp6.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": context deadline exceeded
control: trying bootstrapDNS(\"derp12c.tailscale.com\", \"2001:19f0:5c01:2cb:5400:3ff:fe8d:cb60\") for \"controlplane.tailscale.com\" ...
control: bootstrapDNS(\"derp12c.tailscale.com\", \"2001:19f0:5c01:2cb:5400:3ff:fe8d:cb60\") for \"controlplane.tailscale.com\" error: Get \"https://derp12c.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": dial tcp [2001:19f0:5c01:2cb:5400:3ff:fe8d:cb60]:443: connect: network is unreachable
control: [v1] TryLogin: fetch control key: Get \"https://controlplane.tailscale.com/key?v=61\": dial tcp [2a05:d014:386:203:4535:c15c:9ab:8258]:443: connect: network is unreachable
control: [v1] sendStatus: authRoutine-report: state:authenticating
Received error: fetch control key: Get \"https://controlplane.tailscale.com/key?v=61\": dial tcp [2a05:d014:386:203:4535:c15c:9ab:8258]:443: connect: network is unreachable
control: authRoutine: [v1] backoff: 6 msec
control: [v1] authRoutine: state:authenticating; wantLoggedIn=true
control: [v1] direct.TryLogin(token=false, flags=6)
control: LoginInteractive -> regen=true
control: doLogin(regen=true, hasUrl=false)
2023/05/19 16:40:53 logtail: dial "log.tailscale.io:443" failed: dial tcp 34.229.201.48:443: i/o timeout (in 30s), trying bootstrap...
2023/05/19 16:41:39 logtail: dial "log.tailscale.io:443" failed: dial tcp 34.229.201.48:443: i/o timeout (in 30.001s), trying bootstrap...

OK, I’ve done a bit more digging. With config.linkerd.io/skip-outbound-ports: 443 it connects correctly, so my earlier suggestion won’t help.

The userspace-sidecar.yaml example in the Tailscale Kubernetes docs works fine when meshed, without needing skip-outbound-ports. However, if you add containerPort: 80 to the nginx container, it stops working.

Not really sure what to make of this right now.

I’ve tracked this down to tailscale.com/net/netns, which controlclient.Direct uses for its HTTP client’s dialer. I think the Linkerd proxy does not cope with the socket options that netns sets, though I don’t know why.

Calling netns.SetEnabled(false) fixes the issue.