Fwiw, I was curious in my daily wish-I-didn't-bother-unblocking glance at HN whether this could affect Tailscale, as my understanding is that the WireGuard client is implemented in user space in Go?
WireGuard is a UDP protocol and thus not affected by TCP_NODELAY.
It gets a little fuzzy with things like DERP because the WireGuard UDP packets are encapsulated in a TCP connection, but the underlying implementation should be making reasonably sized writes, so I don't expect that to be a problem.
Overall, that article is revealing an application problem. Nagle's algorithm is a heuristic in the kernel that deals with the tendency for applications where performance isn't critical to be written in a way that is simple but inefficient: making a bunch of small writes that the kernel can batch together if it's allowed to add some extra latency. TCP_NODELAY is the socket option that disables that heuristic, and Go sets it by default. Go chose the lower-latency mode, but in turn that adds the expectation that applications will be more careful about how often they hand data to the kernel. Making fewer, larger writes through the kernel is significantly more efficient than making a bunch of small ones (each write is a system call, and each system call takes a meaningful amount of time and system resources), so that is the better approach anyway. I'd imagine that most Go applications use the built-in HTTP client and/or server for most of their network needs, and those already handle the buffering in a reasonable way.
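To make that concrete, here's a rough Go sketch of the application-side fix: buffer the small pieces in user space and hand the kernel one large write. The function name, address, and payload are made up for illustration; this is just the general shape, not anything Tailscale-specific.

    package example

    import (
        "bufio"
        "net"
    )

    // sendBatched is a hypothetical helper. Writing each small piece directly
    // to conn would mean one system call per piece, and with Go's default of
    // TCP_NODELAY enabled each piece can also go out as its own packet.
    // Buffering with bufio.Writer turns that into one large write.
    func sendBatched(addr string, pieces [][]byte) error {
        conn, err := net.Dial("tcp", addr)
        if err != nil {
            return err
        }
        defer conn.Close()

        w := bufio.NewWriter(conn)
        for _, p := range pieces {
            if _, err := w.Write(p); err != nil {
                return err
            }
        }
        if err := w.Flush(); err != nil {
            return err
        }

        // If you want the kernel to do the batching instead, you can re-enable
        // Nagle's algorithm on this one connection:
        //     conn.(*net.TCPConn).SetNoDelay(false)
        return nil
    }

bufio.Writer's default 4 KB buffer is usually plenty; the main thing is to Flush at message boundaries so the peer isn't left waiting on buffered data.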
In any case, if Tailscale does have any instances of this class of problem in an area that is performance sensitive, it will quickly become apparent to the people working on performance optimization, and will be fixed. If it is occurring in an area that is not performance sensitive, it is unlikely to be much of a problem in practice, so it may not be discovered for a while. That seems OK.