Tailscale proxy in k8s with Cilium works with pod but not with svc?

Hi, we use k8s v1.21.10 with Cilium v1.11.1 and Tailscale v1.22 as a proxy.
I think the real problem is on the Cilium side, but I am interested whether someone has an idea that helps to understand it.

The Tailscale proxy should route to a k8s svc (nginx-ingress) backed by several pods.

If I monitor the traffic on the eth0 interface of the Tailscale pod, I can see SYN packets but no answer:
10.251.99.81 is the nginx-ingress service.

/ # tcpdump -n -i eth0 host 10.251.99.81
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
08:07:30.777474 IP 10.251.132.39.50760 > 10.251.99.81.443: Flags [S], seq 3804727180, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 2734636986 ecr 0,sackOK,eol], length 0
08:07:32.589384 IP 10.251.132.39.50755 > 10.251.99.81.443: Flags [S], seq 323254430, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 3192462830 ecr 0,sackOK,eol], length 0
08:07:35.362561 IP 10.251.132.39.50770 > 10.251.99.81.443: Flags [SEW], seq 1067036323, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 3201352034 ecr 0,sackOK,eol], length 0

I do not see any drops or rejects in the Cilium monitoring tools (Hubble, or tcpdump on the vxlan interface) on the nginx pods.

What works:

  • Connecting to the nginx-ingress svc from the Tailscale pod (curl https://10.251.99.81:443)
  • Using Tailscale as a proxy service if it is pointed at one of the nginx-ingress pods directly (which makes no sense, because pods can be restarted/destroyed/removed without further notice, and if they are, they come back with a different IP address)

This is not a problem with an older Cilium (v1.8.x) or a different CNI (Calico).
So this is clearly a Cilium problem, but I am curious: what might be the technical difference between a forwarded TCP/IP packet and a newly created one?

Did you find any more evidence about this issue? We had exactly the same issue and wondered why it did not work.

We were able to fix it by changing the kube-proxy replacement mode in Cilium from strict to probe.

See Kubernetes Without kube-proxy — Cilium 1.11.5 documentation
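In case it helps, this is roughly what that change looks like when Cilium is managed via Helm (just a sketch; the release name, chart repo and version here are assumptions, adjust to your install):

helm upgrade cilium cilium/cilium --version 1.11.5 \
  --namespace kube-system \
  --reuse-values \
  --set kubeProxyReplacement=probe
# restart the agents so the new mode takes effect
kubectl -n kube-system rollout restart daemonset/cilium

In probe mode the agent checks the kernel for the required eBPF features and falls back to iptables for anything it cannot handle, instead of taking over all service handling unconditionally as strict mode does.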


Hmm, that’s very interesting. We’ve also used strict mode for kube-proxy replacement, so I will give it a try.

By the way, did you figure out the root cause of that difference? I spent a lot of time on this, so I'm very curious about the background.

I've just had this fight with Calico in eBPF mode, and I wonder if it's because Cilium can't parse the tailscale0 packets, since it's an L3 interface and the packets lack an Ethernet header.

Per the docs:

As per k8s Service, Cilium’s eBPF kube-proxy replacement by default disallows access to a ClusterIP service from outside the cluster. This can be allowed by setting bpf.lbExternalClusterIP=true.

Enabling that flag allows talking to the svcs over the tailnet.
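If you want to try this yourself, the flag can be set through the Helm chart, for example (again a sketch, assuming a Helm-managed install with the usual release/chart names):

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.lbExternalClusterIP=true
# restart the agents so they pick up the new setting
kubectl -n kube-system rollout restart daemonset/cilium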

hey @farcaller, how did you solve this in Calico? I'm having a similar issue currently (I think). Connecting to Tailscale directly on the host works, but into pods it does not, probably because Calico is messing with it.

I actually moved to cilium – it was a better fit for my infra.

Ah, too bad. I'm using a managed Kubernetes cluster that uses Calico for everything, so moving to Cilium is sadly not an option.

banging my head against it right now

hey, so I enabled lbExternalClusterIP and restarted the Cilium daemonset, and I can see that the correct value has been propagated (a way to check this is sketched at the end of this post). However, I still cannot access services; the traffic is simply routed to the default gateway, which doesn't know what to do with it. Any tips on what could be wrong?

On my router I can see the logs about traffic coming from the Tailscale pod to the router (10.43.159.168 is my service’s cluster IP)
~# tcpdump -i any host 10.43.159.168
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
13:58:42.841462 IP 10.0.15.115.53058 > 10.43.159.168.3000: Flags [SEW], seq 3643393857, win 65535, options [mss 1240,nop,wscale 6,nop,nop,TS val 853167005 ecr 0,sackOK,eol], length 0
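
(For anyone wanting to verify the same thing: one way to confirm the flag actually reached the agents is to grep the cilium-config ConfigMap. This assumes the default install in kube-system; the exact key name may vary between chart versions.)

kubectl -n kube-system get configmap cilium-config -o yaml | grep -i external-clusterip
# on the 1.11 chart this should print something like: bpf-lb-external-clusterip: "true"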