Two Non-Tailscaled machines talking via subnet-router (relay)

I have a Tailscale node (let’s call it Hub Subnet Advertiser) advertising 172.31.0.0/16 in one Amazon VPC, let’s call it VPC Hub (CIDR 172.31.0.0/16). I brought it up with sudo tailscale up --authkey=REDACTED --advertise-routes=172.31.0.0/16

In another VPC, let’s call it VPC Spoke (CIDR 172.20.11.0/24), I have started up a node, let’s call it Spoke Relay, using sudo tailscale up --accept-routes.

From that node, I can access any machine in 172.31.0.0/16. So far so good.

Now I want another machine in VPC Spoke (let’s call it Spoke UnTailscaled), which does not have Tailscale installed, to be able to relay through Spoke Relay. I have added a route to the VPC Spoke route table that sends all traffic for 172.31.0.0/16 to Spoke Relay. On Spoke Relay I have enabled IP forwarding, opened up security groups internally, turned off the source/destination check, etc.
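For reference, the forwarding and source/destination steps described above usually correspond to commands along these lines. This is a sketch rather than a record of what was actually run: the instance ID is a hypothetical placeholder, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command by default.
# Set APPLY=1 to really run them (needs root and AWS credentials).
INSTANCE_ID="i-0123456789abcdef0"   # hypothetical placeholder for Spoke Relay's instance ID

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Let the kernel forward IPv4 packets between interfaces on Spoke Relay.
run sysctl -w net.ipv4.ip_forward=1

# Disable the EC2 source/destination check so the instance is allowed to
# carry traffic that is not addressed to or from itself.
run aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" --no-source-dest-check
```

The sysctl setting made this way does not survive a reboot unless it is also written to a sysctl configuration file.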

From Spoke UnTailscaled I cannot access any machine in 172.31.0.0/16.

Running tshark on Spoke Relay, I can see that when I query a machine on 172.31.0.0/16 from Spoke Relay itself, the traffic goes out and a reply comes back:

Capturing on 'tailscale0'
  1 0.000000000 100.107.32.30 -> 172.31.41.11 TCP 60 50974 > distinct [SYN] Seq=0 Win=64480 Len=0 MSS=1240 SACK_PERM=1 TSval=3468169467 TSecr=0 WS=128
  2 0.014745624 172.31.41.11 -> 100.107.32.30 TCP 40 distinct > 50974 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
  3 6.334066939 100.107.32.30 -> 172.31.41.11 TCP 60 50976 > distinct [SYN] Seq=0 Win=64480 Len=0 MSS=1240 SACK_PERM=1 TSval=3468175801 TSecr=0 WS=128
… etc

The source IP address here is 100.107.32.30, and the query (netcat, fwiw) works.

When I query from Spoke UnTailscaled instead and look at the traffic, I see:

 12 23.498941114 172.20.11.100 -> 172.31.41.11 TCP 60 47968 > distinct [SYN] Seq=0 Win=26883 Len=0 MSS=8961 SACK_PERM=1 TSval=1002026423 TSecr=0 WS=128
 13 24.528713417 172.20.11.100 -> 172.31.41.11 TCP 60 [TCP Retransmission] 47968 > distinct [SYN] Seq=0 Win=26883 Len=0 MSS=8961 SACK_PERM=1 TSval=1002027452 TSecr=0 WS=128
 14 26.544661868 172.20.11.100 -> 172.31.41.11 TCP 60 [TCP Retransmission] 47968 > distinct [SYN] Seq=0 Win=26883 Len=0 MSS=8961 SACK_PERM=1 TSval=1002029468 TSecr=0 WS=128
 15 30.800670416 172.20.11.100 -> 172.31.41.11 TCP 60 [TCP Retransmission] 47968 > distinct [SYN] Seq=0 Win=26883 Len=0 MSS=8961 SACK_PERM=1 TSval=1002033724 TSecr=0 WS=128

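which means that my Amazon route table is getting the traffic to Spoke Relay and onto tailscale0, but the SYNs keep their original source address (172.20.11.100) and never get a reply.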

As usual, I am semi-literate in networking (enough knowledge to be dangerous) and am interested in learning more here. So, a bunch of questions:

  1. Is it the case that this has something to do with the fact that the Spoke Relay node starts up a software TUN interface and handles routing well only for its own traffic?
  2. If that is the case, is there a special set of iptables masquerading rules or a simple ip route that could take the traffic coming from Spoke UnTailscaled (172.20.11.100) and forward it through this network interface? Is this something to do with table 52? Again, I am semi-literate in networking, so go easy on me :)
  3. Am I barking up the wrong tree here? I have two Amazon “serverless” systems (Fargate load balancers and SageMaker) in two different VPCs that might need this. Even though I know the recommendation is to invest in ephemeral Lambda nodes to encourage a true mesh, I have to believe that this kind of ‘concentrator’ relay node is a common pattern.
  4. If so, are there any good tutorials for this kind of access pattern?
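To make question 2 concrete, here is a sketch of the kind of inspection and masquerade rule I have in mind. It is untested in this setup and rests on an assumption: that the far side only accepts packets whose source is the relay's own Tailscale address, so forwarded traffic would need its source rewritten. The interface name and CIDR come from the capture and VPC description above; the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command by default.
# Set APPLY=1 to really run them as root on Spoke Relay.
SPOKE_CIDR="172.20.11.0/24"   # VPC Spoke CIDR
TS_IF="tailscale0"            # interface name seen in the tshark capture

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Read-only: list the policy-routing rules and the routes held in table 52.
run ip rule show
run ip route show table 52

# Assumption: rewrite the source of forwarded Spoke traffic to the relay's
# own address as it leaves through the Tailscale interface.
run iptables -t nat -A POSTROUTING -s "$SPOKE_CIDR" -o "$TS_IF" -j MASQUERADE
```

If masquerading is the wrong approach, the other option I can think of is having Spoke Relay itself advertise 172.20.11.0/24 with --advertise-routes, but I do not know which of the two is preferred.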

thanks