Tailscale proxy in k8s with Cilium works with pod but not with svc?

Hi, we use k8s v1.21.10 with Cilium v1.11.1 and Tailscale v1.22 as a proxy.
I think the real problem is on the Cilium side, but I am interested whether someone has an idea that helps to understand it.

The Tailscale proxy should route to a k8s svc (nginx-ingress) backed by several pods.

If I monitor the traffic on the eth0 interface of the Tailscale pod, I can see SYN packets but no answer:
10.251.99.81 is the nginx-ingress service.

/ # tcpdump -n -i eth0 host 10.251.99.81
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
08:07:30.777474 IP 10.251.132.39.50760 > 10.251.99.81.443: Flags [S], seq 3804727180, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 2734636986 ecr 0,sackOK,eol], length 0
08:07:32.589384 IP 10.251.132.39.50755 > 10.251.99.81.443: Flags [S], seq 323254430, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 3192462830 ecr 0,sackOK,eol], length 0
08:07:35.362561 IP 10.251.132.39.50770 > 10.251.99.81.443: Flags [SEW], seq 1067036323, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 3201352034 ecr 0,sackOK,eol], length 0

I do not see any drops or rejects in the Cilium monitoring tools (Hubble, or tcpdump on the vxlan interface) on the nginx pods.

What works:

  • Connecting to the nginx-ingress svc from the Tailscale pod (curl https://10.251.99.81:443)
  • Using Tailscale as a proxy service if it is pointed at one of the nginx-ingress pods directly (which makes no sense, because pods can be restarted/destroyed/removed without further notice, and if they are, they come back with a different IP address)

This is not a problem with an older Cilium (v1.8.x) or a different CNI (Calico).
So this is clearly a Cilium problem, but I am curious: what might be the technical difference between a forwarded TCP/IP packet and a newly created one?

Did you find any more evidence about this issue? We had exactly the same issue and wondered why it did not work.

We were able to fix it by changing the kube-proxy replacement mode in Cilium from strict to probe.

See Kubernetes Without kube-proxy — Cilium 1.11.5 documentation
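In case it helps, this is roughly what that change looks like when Cilium is managed via Helm (just a sketch; the release name, chart repo and version here are assumptions, adjust to your install):

helm upgrade cilium cilium/cilium --version 1.11.5 \
  --namespace kube-system \
  --reuse-values \
  --set kubeProxyReplacement=probe
# restart the agents so the new mode takes effect
kubectl -n kube-system rollout restart daemonset/cilium

In probe mode the agent checks the kernel for the required eBPF features and falls back to iptables for anything it cannot handle, instead of taking over all service handling unconditionally as strict mode does.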


Hmm, that’s very interesting. We’ve also used strict mode for kube-proxy replacement, so I will give it a try.

By the way, did you figure out the root cause of that difference? I spent a lot of time on this, so I'm very curious about the background.

I've just had this fight with Calico in eBPF mode, and I wonder if it's because Cilium can't parse the tailscale0 packets, since it's an L3 interface and the packets lack an Ethernet header.

Per the docs:

As per k8s Service, Cilium’s eBPF kube-proxy replacement by default disallows access to a ClusterIP service from outside the cluster. This can be allowed by setting bpf.lbExternalClusterIP=true.

Enabling that flag allows talking to the svcs over the tailnet.
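If you want to try this yourself, the flag can be set through the Helm chart, for example (again a sketch, assuming a Helm-managed install with the usual release/chart names):

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.lbExternalClusterIP=true
# restart the agents so they pick up the new setting
kubectl -n kube-system rollout restart daemonset/cilium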

hey @farcaller, how did you solve this in Calico? I'm having a similar issue currently (I think). Connecting to Tailscale directly on the host works, but into pods it does not, probably because Calico is messing with it.

I actually moved to cilium – it was a better fit for my infra.

Ah, too bad. I'm using a managed Kubernetes cluster that uses Calico for everything, so moving to Cilium is sadly not an option.

banging my head against it right now

hey, so I enabled lbExternalClusterIP and restarted the Cilium daemonset, and I can see that the correct value has been propagated (a way to check this is sketched at the end of this post). However, I still cannot access services; the traffic is simply routed to the default gateway, which doesn't know what to do with it. Any tips on what could be wrong?

On my router I can see the logs about traffic coming from the Tailscale pod to the router (10.43.159.168 is my service’s cluster IP)
~# tcpdump -i any host 10.43.159.168
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
13:58:42.841462 IP 10.0.15.115.53058 > 10.43.159.168.3000: Flags [SEW], seq 3643393857, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 853167005 ecr 0,sackOK,eol], length 0
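
(For anyone wanting to verify the same thing: one way to confirm the flag actually reached the agents is to grep the cilium-config ConfigMap. This assumes the default install in kube-system; the exact key name may vary between chart versions.)

kubectl -n kube-system get configmap cilium-config -o yaml | grep -i external-clusterip
# on the 1.11 chart this should print something like: bpf-lb-external-clusterip: "true"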