Where the fuck is my traffic going?
Or how I rediscovered a quirk of Linux networking for the millionth time. But let me start at the beginning.
The Setup
As some of you may know, I am hosting some stuff (like this blog) at home. For a while I was content with letting everybody know my home IP address and connect to it. But I always had an uneasy feeling about that.
Some days ago I decided to finally change that. I already had several scenarios in mind for how to get y'all into my network without telling you where it is. But the simplest solution (ha, simple... we'll see about that...) I came up with was using a VPN tunnel between a VPS and me and using DNAT to squeeze some traffic through this tunnel.
I hadn't done this before, because I am a cheapskate and did not want to add any more recurring cost to my ever-increasing list of IT-related expenses. But I found an offer I couldn't refuse... (I won't tell you where, but 1 Euro per month is neat).
Configuring the VPS and tunneling a tunnel
It's kind of fascinating to see how many random SSH connection attempts you get on a host with an open SSH port. So the first order of business was uploading an SSH key and only allowing key-based authentication. Kind of crazy that that's not the standard at this VPS provider.
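For reference, key-only login on an OpenSSH server usually comes down to a handful of sshd_config options. This is a minimal sketch, not a copy of the VPS's actual config, and it assumes a reasonably recent OpenSSH (older releases call the last option ChallengeResponseAuthentication):

```shell
#!/bin/sh
# Print a minimal key-only sshd configuration.
# Sketch only: not a verified copy of any real server's config.
sshd_key_only_conf() {
    cat <<'EOF'
# Allow public-key logins only
PubkeyAuthentication yes
PasswordAuthentication no
KbdInteractiveAuthentication no
EOF
}

sshd_key_only_conf
```

The output would go into sshd_config or a drop-in file (whether the distribution reads drop-ins is an assumption). Running sshd -t before reloading the daemon checks the syntax, and it is worth keeping the current session open until a fresh key-based login has worked.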
After that I created a WireGuard tunnel between the VPS and my main router (OpenWrt-based) at home. At first I had only the other side in AllowedIPs. But that would change at a later point.
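To make "only the other side in AllowedIPs" concrete, here is a sketch of what the VPS end could look like in wg-quick format. Every value is a placeholder: the 10.99.0.0/24 tunnel network, the port and the key names are assumptions for illustration, not my real settings.

```shell
#!/bin/sh
# Print a wg-quick style config for the VPS end of the tunnel.
# All addresses, the port and the key placeholders are made up.
wg_vps_conf() {
    cat <<'EOF'
[Interface]
Address = 10.99.0.1/24
ListenPort = 51820
PrivateKey = <vps-private-key>

[Peer]
# Home router: at this stage only its tunnel address is allowed
PublicKey = <router-public-key>
AllowedIPs = 10.99.0.2/32
EOF
}

wg_vps_conf
```

AllowedIPs works in both directions: it decides which destinations are sent to this peer and which source addresses are accepted from it. That is why the list has to grow later on.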
DNAT config
Since I already had DNAT configured for traffic arriving on ports 80 and 443 on the WAN interface of the router, I copied that configuration for traffic arriving through the WireGuard tunnel.
On the VPS I added DNAT rules according to this guide.
Everything seemed to work and most people would be happy with this configuration.
But this solution has one drawback: masquerading.
All traffic arriving at my reverse proxy at home has the tunnel IP of my VPS as source address. This sucks for a lot of reasons. Most notably: I can't ban IP addresses from accessing my stuff. Well, at least not at home. I would have to do this at the VPS and I don't want to do that.
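The VPS side described above, DNAT plus masquerading, can be sketched as an nftables ruleset. This is not taken from the guide. The interface names eth0 and wg0 and the router tunnel address 10.99.0.2 are assumptions, and the guide may well use iptables instead.

```shell
#!/bin/sh
# Print an nftables ruleset for the VPS: DNAT web traffic into the tunnel
# and masquerade it. Interface names and 10.99.0.2 are hypothetical.
vps_nat_ruleset() {
    cat <<'EOF'
table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        # Public HTTP/HTTPS is redirected to the router's tunnel address
        iifname "eth0" tcp dport { 80, 443 } dnat to 10.99.0.2
    }
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # This is the rule that hides the real client address
        oifname "wg0" masquerade
    }
}
EOF
}

vps_nat_ruleset
```

The VPS also needs IPv4 forwarding enabled (net.ipv4.ip_forward=1) for any of this to pass traffic. The masquerade line is the one that makes every request look like it comes from the VPS.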
Who's hiding behind that mask?
Isn't it possible to do this without masquerading? It should be.
Before attempting to leave the mask off I wanted to simplify the route of traffic. There is no real reason for double DNAT.
So I edited the WireGuard config and some firewall rules. The IP of my reverse proxy is now in AllowedIPs in the peer config on the VPS, and the DNAT on the VPS points all traffic there instead of to the router IP.
Without masquerading, traffic from any IP comes through the tunnel and has to go back through it, so I added 0.0.0.0/0 to AllowedIPs in the peer config on my OpenWrt router (and made sure that no routes would be written to the main routing table).
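On OpenWrt, the "allow everything but don't touch the main routing table" part maps to two options in the peer section of /etc/config/network. This is a sketch with placeholder values: the interface name wg0, the endpoint and the key are assumptions.

```shell
#!/bin/sh
# Print the WireGuard peer section for the OpenWrt side.
# The interface name wg0, the endpoint and the key are placeholders.
openwrt_peer_conf() {
    cat <<'EOF'
config wireguard_wg0
    option public_key '<vps-public-key>'
    option endpoint_host 'vps.example.net'
    option endpoint_port '51820'
    # Accept any source address from the tunnel and allow any destination
    list allowed_ips '0.0.0.0/0'
    # Do not install routes for allowed_ips in the main routing table
    option route_allowed_ips '0'
EOF
}

openwrt_peer_conf
```

On the VPS, the matching change would be an AllowedIPs line along the lines of 10.99.0.2/32, 192.168.50.10/32, where 192.168.50.10 stands in for the reverse proxy's address.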
Okay, no more dilly-dallying. Let's try this without the mask.
I removed the masquerade rule and... it doesn't work.
Locating the problem
Okay. The first step when something does not work as expected or hoped is to locate the problem.
So I ran tcpdump on the VPS, on the router and on my reverse proxy to see where the traffic stops.
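The capture itself is nothing fancy. Roughly the same command runs on every hop, just with a different interface. The interface names below are hypothetical and the filter assumes only web traffic is of interest.

```shell
#!/bin/sh
# Print the capture command for one interface. Run the printed line as root
# on the host that owns that interface. Interface names are hypothetical.
capture_cmd() {
    printf "tcpdump -ni %s 'tcp port 80 or tcp port 443'\n" "$1"
}

capture_cmd eth0     # VPS: public side
capture_cmd wg0      # VPS and router: tunnel
capture_cmd br-dmz   # router: DMZ side
capture_cmd eth1     # reverse proxy: DMZ interface
```

The -n flag turns off name resolution, so the real source and destination addresses stay visible. Following one request from hop to hop shows where the responses disappear.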
So I could see the requests on the VPS but no responses. Hmm... On the router I could see the same on the VPN interface. Is the reverse proxy not answering? Yeah looks like it. Wait...
The reverse proxy does answer. But the answers are going out through the wrong interface 🤦.
I have 2 network interfaces on my reverse proxy (well actually three, but that doesn't matter here): LAN and DMZ. The LAN interface is for management access and stuff. For some reason it has the default route. The DMZ interface is for incoming connections through the DMZ.
The requests are coming through the DMZ interface but the answers are going out through the LAN interface. Why wasn't this a problem before? I haven't verified it, but I think the OpenWrt router simply did not care that the answers were coming through a different interface.
Okay there is a simple solution to this: change the default route to the DMZ interface.
So now it should work, shouldn't it? Nope. It doesn't.
What is going on?
Eagle-eyed readers might already know where the problem is. At that moment I didn't. It was late, I was tired and because I was sick I should have been lying in bed instead of reconfiguring my network.
I stared at all my tcpdump sessions and could see the requests coming to my reverse proxy and it was answering like it should, but for some reason the responses were not making it back through the tunnel.
Same problem different host
On my router I could see the requests and responses going through the DMZ interface. But only the requests were coming through the VPN tunnel. Where the hell were the answers going?
I ran tcpdump on a few other interfaces on the router (I have way too many for a 1-person household). And then I saw it: the answers were going out the WAN interface. Out into the internet. Without being masqueraded 😲.
Well ok. They weren't going to the internet because the next hop discarded them. But still.
But why the hell would they go out through the WAN interface?? It took me way too long to realize that I simply had the same situation I already had on the reverse proxy: my router was simply sending them out over the default route.
I was in shock. Because of all of my (overcomplicated) firewall rules and firewall zones and stuff it shouldn't be possible for traffic from the DMZ interface to go to the WAN interface. But my router was routing them there. Why was my router routing traffic the wrong way? Why was the traffic allowed there?
Well because of connection tracking the router knew that these were responses and so they were allowed to go through. But it didn't know that I wanted them to go through the VPN.
In my opinion the networking stack should always route answers through the same interface the request was coming from. I don't know why the Linux networking stack doesn't do this. I only know that this has bitten me in the ass way too many times and I seem to always forget about this after I have built a solution for it.
Solution
So how do I tell my router to route all the answers through the VPN?
The answer is policy-based routing. I could simply set up a rule to route all traffic from my reverse proxy through the VPN. But I don't want that. I want traffic originating from it to go the normal route.
So we need to somehow mark the traffic and route it accordingly. And that is what I did.
After searching a lot on the internet (because I didn't really know what I was searching for) I found several stackexchange/serverfault/superuser threads [1] about this topic.
So I set up a new routing table with the VPN as default route, added a routing rule to use that table for all traffic with mark 0xfe and set up firewall rules to mark the traffic that arrives through the VPN and to restore marks for traffic coming from the reverse proxy (see the threads mentioned above).
Annoyingly, all these threads mentioned how to do this with iptables but my router uses nftables. So I had to translate the rules. Luckily there are tools like iptables-translate.
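The pieces described above can be sketched like this. It is not a dump of my router: the interface names wg0 and br-dmz, the table number 100 and the nftables table name are assumptions, and only the mark 0xfe is the real value. The script only prints the ip commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch of reply routing via connection marks. Dry run by default: the ip
# commands are only printed. Run as root with APPLY=1 to execute them.
# wg0, br-dmz and table 100 are hypothetical; 0xfe is the mark from the text.
VPN_IF=wg0
DMZ_IF=br-dmz
TABLE=100
MARK=0xfe

run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Extra routing table whose only route is a default route into the tunnel,
# plus a rule that sends marked packets to that table.
setup_policy_routing() {
    run ip route add default dev "$VPN_IF" table "$TABLE"
    run ip rule add fwmark "$MARK" lookup "$TABLE"
}

# nftables rules: tag connections that come in through the VPN, then copy
# the tag onto the reply packets arriving from the DMZ before routing.
mark_ruleset() {
    cat <<EOF
table inet vpnreply {
    chain prerouting {
        type filter hook prerouting priority mangle; policy accept;
        iifname "$VPN_IF" ct state new ct mark set $MARK
        iifname "$DMZ_IF" ct mark $MARK meta mark set ct mark
    }
}
EOF
}

setup_policy_routing
mark_ruleset
```

The ruleset can be loaded by piping mark_ruleset into nft -f -, although on OpenWrt the rules would more likely be integrated into the existing firewall config. Connections the reverse proxy opens itself never get the connection mark, so they keep using the main routing table. To check the result, ip route get 203.0.113.7 from 192.168.50.10 iif br-dmz mark 0xfe (both addresses made up) should report the tunnel interface rather than WAN.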
Conclusions
- It works.
- I hate networking.