Someday I am going to put together an article (or series) on deploying NSX (6.4.1 now), vSphere 6.7, and all of the other fun goodies I’ve been messing with, I promise. For now, though, I am going to post my findings on running OSPF in my NSX lab alongside pfSense (2.4.x).
Let me start with a quick diagram of what I’ve built out:
As you can see above, I have a pretty standard-issue NSX deployment behind a pfSense firewall. The only mildly confusing part might be that pfSense is a virtual machine whose “LAN” uplink trunks all VLANs in my environment from a distributed port group within vSphere 6.7. The lab centers on a VXLAN transport network (VLAN 250), an Edge Services Gateway (ESG1, with an uplink into the VXLAN distributed switch), some logical switches, and a Distributed Logical Router (DLR1) routing between three networks (transit, web, and database, though they’re labeled generically in the diagram).
The goal is that I will be setting up a web server in 10.252.252.0/24, a database server in 10.251.251.0/24, and creating rules between the two (with distributed firewall and ESG rules as needed) to keep only required ports traversing back and forth in a least-privileged fashion. Both networks should be able to get out to the internet for things like updates, etc., and reach 192.168.50.0/24 for DNS. Easy enough, right?
It is! In fact, www.jonkensy.com had been running on this setup for quite some time. I recently moved my blog to AWS because I was physically relocating my server rack from a second-story bedroom closet to the basement after a long basement renovation (it’s done; more on that in another post), so now I am putting NSX 6.4.1 back together atop vSphere 6.7. Fun stuff! That said, I wanted to add some more flair; when stuff works predictably and reliably you get bored, trust me! I had previously created static routes on pfSense so that my other connected networks (primarily 192.168.50.0/24) would know how to reach 10.251.251.0/24, etc. This was easy enough and looked like this:
You can see that I told pfSense about the networks behind the NSX DLR by pointing them at next hop 192.168.250.254. This had been working great. Static routing is a very common strategy because it’s predictable and reliable, but one drawback is that it does not scale well: every new network needs another hand-maintained route. This is my lab, of course, and though I don’t anticipate defining thousands of networks in NSX VXLANs, I always want to learn how to do it best should the need ever arise. Again, with this static routing setup everything works great: DFW rules work, ESG rules work, all routing works properly, and my VMs can get out to the internet through 192.168.250.1 (the gateway on the VXLAN transport network). Life was good.
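To make the scaling drawback concrete, here is a minimal Python sketch of the routing decision pfSense makes with static routes like these. The route list is an assumption reconstructed from the subnets in the diagram, and `next_hop` and the `"WAN"` default-route placeholder are hypothetical names for illustration only. Every new NSX network means one more entry in `STATIC_ROUTES`.

```python
import ipaddress

# ASSUMPTION: routes reconstructed from the diagram's subnets; all of them
# point at the NSX next hop 192.168.250.254.
NSX_NEXT_HOP = "192.168.250.254"
STATIC_ROUTES = [
    (ipaddress.ip_network("10.250.250.0/24"), NSX_NEXT_HOP),  # transit
    (ipaddress.ip_network("10.251.251.0/24"), NSX_NEXT_HOP),  # database
    (ipaddress.ip_network("10.252.252.0/24"), NSX_NEXT_HOP),  # web
]
DEFAULT_ROUTE = "WAN"  # hypothetical placeholder for pfSense's upstream gateway


def next_hop(destination: str) -> str:
    """Pick a next hop for destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in STATIC_ROUTES if addr in net]
    if not matches:
        return DEFAULT_ROUTE  # nothing more specific: use the default route
    # The most specific (longest) prefix wins, as in any IP routing table.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


if __name__ == "__main__":
    print(next_hop("10.251.251.10"))  # 192.168.250.254
    print(next_hop("192.0.2.1"))      # WAN
```

Dynamic routing replaces that hand-maintained list with routes learned from the ESG, which is exactly what the rest of this post goes after.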
But then I started down the path of dynamic routing. Boo hiss! There are a number of protocols to choose from when configuring dynamic routing, but since this was just for internal routing I decided to use OSPF (Open Shortest Path First). It’s a pretty basic interior gateway protocol, and I won’t get into all of the different configurations (stub areas, not-so-stubby areas, etc.); just know that I have two areas, as shown in the diagram at the top of this post. The two areas are basically “how the ESG gets to the stuff behind the DLR” and “how pfSense gets to the stuff behind the ESG”. You really want all of your areas to connect back to the backbone area, 0.0.0.0 (aka “Area 0”), which is why my pfSense-to-ESG area is Area 0. I won’t bore you with the details, but basically I just installed the FRR package in pfSense, configured Area 0 between the pfSense firewall and the ESG, and boom, everything was good.
However, I couldn’t get to the internet from behind the DLR!
TestTraceRoute:~$ traceroute 22.214.171.124
traceroute to 126.96.36.199 (188.8.131.52), 30 hops max, 60 byte packets
 1  10.252.252.1 (10.251.251.1)  0.237 ms  0.142 ms  1002.258 ms
 2  10.250.250.1 (10.250.250.1)  0.166 ms  0.191 ms  0.185 ms
 3  192.168.250.1 (192.168.250.1)  0.616 ms  0.618 ms  0.598 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
…
You can see in the above traceroute output that I get no response from 184.108.40.206; in fact, I get no response past 192.168.250.1! I turned all firewalling off so everything was just acting as routers, and I even created an “allow all” rule in pfSense on the 192.168.250.1/24 interface for good measure. Nada.
So, annoyed, I went back to static routes to see if maybe I had changed anything that would have prevented this from working. Nope, it still worked:
I started to think this through. I won’t lie, it took me a day or two. While working on work projects, I am constantly running through scenarios for things I haven’t figured out yet, and then, bam: aha! I remember saying to a co-worker, “I wonder if pfSense creates any other access rules when setting up static routes that it doesn’t create when learning routes through OSPF…” His eyes glazed over in complete disinterest, and off I went, tearing through the menus and status pages of pfSense until I found it…
OK, so the default pfSense outbound NAT (source NAT) mode is automatic, but what does that mean? It wasn’t very apparent until I clicked on a small blue “information” icon at the bottom… and even then it wasn’t super obvious:
The key words above are “…a mapping is automatically generated for each interface’s subnet…”. OSPF-learned routes aren’t interface subnets, so no automatic mapping is generated for them. Doh! (Statically routed networks do get automatic mappings, which is why everything worked with static routes.) Further, there doesn’t appear to be any way to have automatic outbound NAT cover OSPF-learned routes. Bummer!
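Here is a simplified Python model of that behavior; it is not pfSense’s actual code. In this model, automatic mode builds its NAT source list from interface subnets and statically routed networks, so anything learned via OSPF never enters that list. The interface subnets (192.168.50.0/24 and 192.168.250.0/24) and the helper names `automatic_nat_sources` and `is_natted` are assumptions for illustration.

```python
import ipaddress
from typing import Iterable, List

# ASSUMPTION: pfSense's directly connected subnets in this lab.
INTERFACE_SUBNETS = ["192.168.50.0/24", "192.168.250.0/24"]
# Networks behind the DLR, taken from the diagram.
NSX_NETWORKS = ["10.250.250.0/24", "10.251.251.0/24", "10.252.252.0/24"]


def automatic_nat_sources(interface_subnets: Iterable[str],
                          static_routes: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Simplified model of automatic outbound NAT: only interface subnets and
    statically routed networks become NAT sources. There is deliberately no
    parameter for OSPF-learned routes, because they are never consulted."""
    return [ipaddress.IPv4Network(n) for n in [*interface_subnets, *static_routes]]


def is_natted(source_ip: str, nat_sources: Iterable[ipaddress.IPv4Network]) -> bool:
    """True if traffic from source_ip would be source-NATed on the way out."""
    addr = ipaddress.IPv4Address(source_ip)
    return any(addr in net for net in nat_sources)


if __name__ == "__main__":
    with_static = automatic_nat_sources(INTERFACE_SUBNETS, NSX_NETWORKS)
    with_ospf = automatic_nat_sources(INTERFACE_SUBNETS, [])  # NSX routes learned via OSPF
    print(is_natted("10.251.251.10", with_static))  # True
    print(is_natted("10.251.251.10", with_ospf))    # False
```

Without a source NAT mapping, packets from 10.251.251.0/24 leave the WAN with a private source address, so replies never come back, which matches the traceroute dying right after 192.168.250.1.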
The only real options I have, as far as I can tell, are to switch back to static routing and leverage automatic outbound NAT, or to switch to hybrid outbound NAT mode and create manual outbound NAT rules for the NSX networks. Neither option is truly dynamic or scalable. That said, I could create a summary static route and only deploy NSX networks within it. For instance, I could create a static route for 10.250.0.0/16 and only create 10.250.x.x subnets within NSX; all of them would fall under the static route, which would in turn generate the automatic outbound NAT rule. Or, I could create a manual outbound NAT rule for 10.250.0.0/16 in the same fashion, so that any network later learned via OSPF within that range would already have a matching outbound NAT rule.
Actually, while typing that out, I realized I could probably create an outbound NAT rule for the entire RFC1918 address space (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16)! Why not? If I defined an interface, it’d create an automatic outbound NAT rule anyway! Hmm… maybe go full manual and just outbound NAT all of RFC1918…
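A quick way to compare the two manual-rule ideas is to check which NSX subnets each candidate rule source fully contains. This Python sketch uses the subnets from the diagram plus a hypothetical future segment, 10.250.10.0/24; the helper name `uncovered` is illustrative. As the lab is numbered today, the 10.250.0.0/16 summary only covers the transit network, so the database and web segments would need renumbering, while the RFC1918 sources cover everything.

```python
import ipaddress

# Candidate sources for a manual outbound NAT rule.
SUMMARY_SOURCES = [ipaddress.ip_network("10.250.0.0/16")]
RFC1918_SOURCES = [ipaddress.ip_network(n)
                   for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Subnets from the diagram, plus a HYPOTHETICAL future NSX segment.
NSX_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.250.250.0/24",  # transit
    "10.251.251.0/24",  # database
    "10.252.252.0/24",  # web
    "10.250.10.0/24",   # hypothetical new segment
)]


def uncovered(networks, rule_sources):
    """Return the networks that no manual NAT source fully contains."""
    return [net for net in networks
            if not any(net.subnet_of(src) for src in rule_sources)]


if __name__ == "__main__":
    # [IPv4Network('10.251.251.0/24'), IPv4Network('10.252.252.0/24')]
    print(uncovered(NSX_NETWORKS, SUMMARY_SOURCES))
    print(uncovered(NSX_NETWORKS, RFC1918_SOURCES))  # []
```

`subnet_of` requires Python 3.7 or later.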
If you’re reading this and have an opinion on one method over the other, please let me know! I am mostly content that I was able to get OSPF working all the way up to pfSense, so I guess a victory is a victory. That said, I wish there were a way to have the outbound (source) NAT rules created automatically for OSPF-learned routes. Oh well, can’t win them all.
Thanks for reading and I hope you find this useful!