FortiGate firewall odd behaviour when connecting direct

I have a machine called “macmini” that has subnet routing & exit node enabled. macmini sits in the office behind a FortiGate router.

I have a home machine called t14, and I have --accept-routes enabled.
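
For reference, the two machines are brought up roughly like this (the office subnet below is just a placeholder, not my real one):

    sudo tailscale up --advertise-exit-node --advertise-routes=192.168.1.0/24   # on macmini
    sudo tailscale up --accept-routes                                           # on t14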

When I first set it up, it works perfectly and seems to go through the lhr relay. But after a while it switches to a direct connection and then the connection doesn’t work. Running tailscale ping from t14 shows:

pong from macmini (100.65.126.31) via DERP(lhr) in 27ms
pong from macmini (100.65.126.31) via DERP(lhr) in 126ms
pong from macmini (100.65.126.31) via DERP(lhr) in 12ms
pong from macmini (100.65.126.31) via DERP(lhr) in 276ms
pong from macmini (100.65.126.31) via 1.2.3.4:41641 in 20ms

After it’s connected via our office public IP 1.2.3.4, tailscale ping returns OK, but SSH or any other connection just won’t work.

Our office uses a FortiGate router. The default policy is to allow all outgoing traffic (the first line in my config).

I thought this should be enough after reading “How NAT traversal works”: How NAT traversal works · Tailscale

I tried adding a rule allowing incoming UDP 41641 on the router, then restarted tailscaled, and the direct connection started to work (see the second line in the image above).

Question: is this what I’m supposed to do? Surely the FortiGate is a stateful firewall, so allowing the outgoing traffic should be enough for it to remember the flow and accept the return traffic?

And how did Tailscale decide to go for the direct connection instead of the lhr relay? Obviously the relay works but direct doesn’t.

Actually, I noticed that after a while (not sure how long) Tailscale seems to have realised the direct connection didn’t work and reverted back to the relay. I couldn’t reliably reproduce this.

We need to allow


in order for Tailscale to work over DERP or a direct connection.

Well, the first line in my router config allows any outgoing traffic.

The connection is negotiated based on the latency from source to destination. Unless the direct connection has some issue or interruption, the connection should stay direct; otherwise it will be swapped over to the DERP connection.

Another thing to try: does normal ping work? If so, try ping -s 1252. If that fails, can you find a lower number that does work?
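
For example, from t14 (Linux ping syntax; the sizes below are just examples to step through):

    ping -c 3 100.65.126.31              # normal ping over the tunnel
    ping -c 3 -s 1252 100.65.126.31      # 1252 bytes of payload fills the 1280-byte Tailscale MTU
    ping -c 3 -s 1000 100.65.126.31      # if 1252 fails, keep lowering the size until replies come back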

Does macmini have any firewall rules that may be preventing incoming or outgoing traffic on tailscale0? iptables -L -n -v should show what the current firewall rules are. This does not seem as likely to be the problem, since you report that it works fine until the connection switches to direct.
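
For example (the grep filter and the tcpdump check are just optional extras for narrowing things down):

    sudo iptables -L -n -v                      # list all chains with packet/byte counters
    sudo iptables -L -n -v | grep tailscale0    # only the rules that mention tailscale0
    sudo tcpdump -ni tailscale0 port 22         # watch whether the SSH traffic actually arrives on the tunnel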


To answer your questions in the original post:

Tailscale decides to go for the direct connection instead of the relay when it sees successful communication over a direct connection. It will periodically send probes to every IP address of the remote device, and request that the remote do the same in the other direction. Until the probes succeed, it will use the relay. If the direct connection stops working, it will eventually give up and return to using the relay. It is unusual for tailscale ping to succeed over a direct connection but for other traffic not to work correctly.
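
You can watch which path is currently in use from either end, for example (the exact wording of the status output varies a little between versions):

    tailscale status          # each peer line shows the active path, e.g. direct 1.2.3.4:41641 or a relay name
    tailscale ping macmini    # pings via DERP first and stops as soon as a direct path is established (or after 10 tries)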

Your FortiGate router appears to vary port numbers to different destinations (“Hard NAT” in the NAT traversal document), which makes direct connections difficult.

  • Adding a port forward can help but is not guaranteed to work.
  • If enabling a NAT traversal protocol on the router, such as NAT-PMP, is acceptable, that will typically resolve issues. If NAT-PMP is not labelled as a separate option, it is sometimes enabled along with UPnP (we do not currently use UPnP itself).
  • Sometimes there is an option to avoid changing port numbers quite so much: e.g. pfSense has a “disable firewall scrub” option that reportedly helps.
  • I took a look through the FortiGate documentation and it mentions “When using the IP pool for source NAT, you can define a fixed port to guarantee the source port number is unchanged”, which (if I read it correctly) might help: if you can force macmini:41641/udp to always map to the same external port, then other devices should be able to establish direct connections. The external port doesn’t have to be 41641 (useful if you have multiple devices you wish to expose); the actual port number will be discovered automatically using STUN. If the external port used in SNAT is fixed and you can also add a port forward, that will probably be very effective; a rough sketch of both follows below.
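
As a very rough sketch of those last two points in FortiOS CLI terms (the interface name, internal address, and policy ID below are assumptions, not a tested configuration; check the options against your FortiOS version):

    config firewall vip
        edit "tailscale-macmini-udp"
            set extip 1.2.3.4
            set extintf "wan1"
            set portforward enable
            set protocol udp
            set extport 41641
            set mappedip "192.168.10.20"
            set mappedport 41641
        next
    end

    config firewall policy
        edit 1
            set nat enable
            set fixedport enable
        next
    end

The VIP still needs a WAN-to-internal policy that uses it as the destination, and fixedport corresponds to the “fixed port” wording in the FortiGate documentation quoted above (it generally pairs with an IP pool on the outbound policy).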

Thanks Adrian for the detailed reply. I have reverted my configuration to the original, which essentially has only one policy: allow everything outgoing and deny everything incoming.

There is no firewall on macmini. sudo ufw status shows Status: inactive

At the moment, running tailscale status on home machine t14 shows direct connection to macmini in the office. And ping or ssh works fine.

I have full control of the router & network config. I think I’ll see when it breaks next time and try to go through the suggestions you gave.

Another observation: I have multiple machines in the office, e.g. there is another called l5501. However, at any point only one machine in the office seems to be able to connect directly to the home machine t14:

❯ tailscale ping macmini
pong from macmini (100.65.126.31) via 1.2.3.4:41641 in 18ms
❯ tailscale ping l5501
pong from l5501 (100.106.45.127) via DERP(lhr) in 19ms
pong from l5501 (100.106.45.127) via DERP(lhr) in 12ms
...

Apologies for creating another thread, but I am facing the exact same issue as @jackielii; only this time my setup uses an internal pfSense for the network, NAT’ed out through an OPNsense.

[172.16.255.0/24] <--> pfSense (DHCP) <--> OPNsense (outbound NAT) <--> ISP's gear.

The macOS firewall has been disabled locally and we are using the App Store-based Tailscale app.

What doesn’t work for me?
Connecting to the internet via the AWS VPC EC2 instance (using an MTR to 1.1.1.1 as an example).

What works for me?
Connecting to the AWS VPC subnet via the EC2 instance (using SSH/MTR to 10.2.0.4 as an example).

When connected to Tailscale but not using the exit node:

Report:
        * UDP: true
        * IPv4: yes, w.x.y.z:12914
        * IPv6: no
        * MappingVariesByDestIP: true
        * HairPinning: false
        * PortMapping:
        * Nearest DERP: Sydney
        * DERP latency:
                - syd: 12.7ms  (Sydney)
                - sin: 105ms   (Singapore)
                - tok: 147ms   (Tokyo)
                - sfo: 164.4ms (San Francisco)
                - sea: 178.1ms (Seattle)
                - dfw: 198.2ms (Dallas)
                - nyc: 230ms   (New York City)
                - fra: 248ms   (Frankfurt)
                - lhr: 263.3ms (London)
                - blr: 273.2ms (Bangalore)
                - sao: 333.6ms (São Paulo)

When using exit node:

Report:
        * UDP: false
        * IPv4: (no addr found)
        * IPv6: no
        * MappingVariesByDestIP:
        * HairPinning:
        * PortMapping:
        * Nearest DERP: unknown (no response to latency probes)

Let me know what I can check from my end. We want this setup for our rollout, because without it folks out in untrusted environments (cafe, library) wouldn’t be able to have secure access.

Thanks.

It seems the issue is due to NAT on my end. Double-checked with a 4G network and it works like a charm.

Now, I need to figure out how to fix the hard NAT :sob:

Tailscale 1.12.3, the current version, has a NAT-PMP + UPnP implementation, and the next version, 1.14, adds PCP support, which works well with OPNsense. You can enable the support in OPNsense by installing the os-upnp plugin and enabling UPnP and NAT-PMP.
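
Once that’s enabled, a quick way to check whether tailscaled actually found a mapping (the exact wording of the line may differ a little by version):

    tailscale netcheck | grep -i portmapping
    # with a working mapping the PortMapping line should list what was found, e.g. NAT-PMP (and PCP once 1.14 is out)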

Tailscale 1.14.0, to be released soon, is able to make direct connections through OPNsense using this support.

However what you’ve described is a double-NAT where both layers are hard NAT. I don’t see a way of making direct connections work through that.

Yes. I think that’s the issue. I’ll try to see if I can muscle up the os-upnp route. Thank you!

I enabled UPnP and NAT-PMP at both hard NATs: on OPNsense using os-upnp, and also toggled them in the pfSense that handles the LAN network. I think the pfSense UPnP + NAT-PMP wasn’t required, but I tried both. Cleaned the pf states several times as well.

There is a VLAN between the two hard NATs which complicates it a bit. So far, not much joy. I’ll keep checking for replies here. Thanks thus far.

The Tailscale version where we’ve tested direct connectivity through OPNsense is not out yet. We’re expecting it next week, version 1.14.0. It adds PCP (Port Control Protocol) support, which OPNsense’s NAT-PMP implementation uses.

HOWEVER: NAT-PMP/PCP will only reach the first level of firewall, pfSense in your description. It might get forwarding through pfSense, but won’t be able to reach the second layer with OPNsense.

pfSense and OPNsense are both considered “hard” NAT. I do not think we currently have a way to get through two layers of hard NAT.


That’s great.

As for my woes, after many tcpdump sessions, I figured out exactly where the problem was. It turns out that with the double hard NAT out of the way (performing only a single hard NAT), the internal pfSense needed an allowlist of sorts. I cringe at putting hosts in an allowlist, but that was the only way I could get it working successfully to exit via our EC2 node. At least we have an outer hard NAT on OPNsense, so that doesn’t bother me too much for this particular use case.

Many thanks @DGentry ! :+1:

The NAT-PMP + PCP support in 1.14.0 may work with pfSense as well; we just haven’t tested it (we had test configurations for OPNsense). Enabling UPnP and NAT-PMP on pfSense is worth trying.


Did NAT-PMP+PCP ever get enabled? I’m seeing plenty of entries for Parsec and other services in our NAT-PMP/UPnP log for our pfSense, but none for Tailscale.

Yes, current Tailscale versions send out regular queries. tailscale netcheck can show what it found. We’d expect to see a line saying:
PortMapping: NAT-PMP, PCP

    * UDP: true
    * IPv4: yes, 
    * IPv6: yes, 
    * MappingVariesByDestIP: true
    * HairPinning: false
    * PortMapping:
    * Nearest DERP: Seattle
    * DERP latency:
            - sea: 9.4ms   (Seattle)
            - sfo: 12.3ms  (San Francisco)
            - dfw: 42.5ms  (Dallas)
            - ord: 49.4ms  (Chicago)
            - nyc: 72.5ms  (New York City)
            - tok: 114.5ms (Tokyo)
            - lhr: 135.6ms (London)
            - fra: 143.5ms (Frankfurt)
            - sin: 174.1ms (Singapore)
            - syd: 178.9ms (Sydney)
            - sao: 185.4ms (São Paulo)
            - blr: 241.2ms (Bangalore)

nada.