Like many of you out there, we were suddenly in a position where we needed to ramp up our remote connectivity to cope with the demand driven by Covid-19. After some research, we decided the easiest path was to build some more RAS servers and load-balance them with a pair of Kemp LoadMasters.
We followed the guide provided by Kemp and used their templates to configure the virtual services as laid out here: https://support.kemptechnologies.com/hc/en-us/articles/360026123592
Our setup was fairly straightforward: a NAT rule on the firewall for UDP 500/4500 to the VIP on the load balancer, with the RAS servers' default gateway pointed at the same VIP. (See https://support.kemptechnologies.com/hc/en-us/articles/360002996552-Routing-Feature-Description and https://support.kemptechnologies.com/hc/en-us/articles/203126369-Transparency for a good explanation covering routing and default gateways.) The RAS servers and Kemps were on the same subnet.
All appeared well, but we could never get more than ~60 connections on a server, and many users were failing to connect with the dreaded 809 error: https://directaccess.richardhicks.com/2019/02/14/troubleshooting-always-on-vpn-error-code-809/ (By the way, Richard's site has a TON of really useful information on the subject. If you haven’t checked it out yet, I would urge you to do so!)
We had ruled out blocked ports and IKEv2 packet fragmentation (we were running Windows Server 2019 with the registry setting to enable support for IKEv2 packet fragmentation):
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\RemoteAccess\Parameters\Ikev2\" -Name EnableServerFragmentation -PropertyType DWORD -Value 1 -Force
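If you want to confirm that the setting actually landed on each RAS server, a minimal sketch like the following reads the value back. It only uses the key and value name from the command above; the commented-out service restart is an assumption on our part about when the change takes effect, so check it before relying on it.

```powershell
# Read back the value written by the New-ItemProperty command above.
$path  = 'HKLM:\SYSTEM\CurrentControlSet\Services\RemoteAccess\Parameters\Ikev2'
$value = Get-ItemProperty -Path $path -Name EnableServerFragmentation -ErrorAction SilentlyContinue

if ($null -ne $value -and $value.EnableServerFragmentation -eq 1) {
    Write-Output 'IKEv2 server fragmentation is enabled.'
} else {
    Write-Output 'IKEv2 server fragmentation is NOT enabled.'
}

# ASSUMPTION: a service restart may be needed for a changed value to apply.
# This disconnects current VPN users, so it is left commented out.
# Restart-Service RemoteAccess
```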
But we were still having issues. At this point, we called in Microsoft Premier Support, and they pointed out that the logs indicated “Max number of established MM SAs to peer exceeded”.
This was because the Kemp was passing the load-balanced VIP IP through to the RAS servers as the source address, and they, in turn, hit a maximum-connections-per-IP limit and refused further connections. We had followed Kemp's recommendation (“It is best practice to enable the Subnet Originating Requests option globally.”), which meant that the RAS servers only ever saw the IP of the VIP, not the clients.
This could be fixed by upping the limit in the registry on the RAS servers (no upper limit):
New DWORD: IkeNumEstablishedForInitialQuery = 0x0000c350 (decimal 50000)
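For anyone who would rather script that change than edit the registry by hand, here is a rough sketch. The value name and number come from above, but the registry key is not stated in this post: the IKEEXT parameters path used below is an assumption, as is the need to restart the IKEEXT service, so confirm both with Microsoft support before using it.

```powershell
# ASSUMPTION: this key is not given in the post; verify it before use.
$ikeParams = 'HKLM:\SYSTEM\CurrentControlSet\Services\IKEEXT\Parameters'

# 0x0000c350 is 50000 in decimal, the value quoted above.
New-ItemProperty -Path $ikeParams -Name IkeNumEstablishedForInitialQuery -PropertyType DWORD -Value 50000 -Force

# ASSUMPTION: the IKEEXT service must be restarted to pick up the change.
# This drops existing tunnels, so it is left commented out.
# Restart-Service IKEEXT -Force
```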
However, they pointed out that “you need to not use NAT for IKEv2 on the LB”.
So a cleaner way of doing this is to adjust the settings on the virtual services on the Kemp to enable transparency. (Caveat: this works for us because we have the load balancers and the RAS servers on the same subnet.) Do this on both of the virtual services, and uncheck Subnet Originating Requests (we unticked this in the Network Options pane in the WUI).
This fixed the connection limit, and clients were now being displayed with their public IPs in the RAS console.
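The same check can be done from PowerShell instead of the RAS console. The sketch below groups active connections by the client's external address: if nearly everything is grouped under the VIP's address, the RAS server is still seeing the load balancer rather than the clients. It assumes the RemoteAccess module is installed and that the connection objects expose a ClientExternalAddress property; pipe the output to Get-Member to confirm on your build.

```powershell
# Run on a RAS server. Requires the RemoteAccess PowerShell module.
# ASSUMPTION: the property is named ClientExternalAddress on your build.
Get-RemoteAccessConnectionStatistics |
    Group-Object -Property ClientExternalAddress |
    Sort-Object -Property Count -Descending |
    Select-Object -First 10 -Property Count, Name
```

With transparency enabled, you would expect many small groups (one per client public IP) rather than one large group.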
The next issue we had was that if one of the RAS servers failed, clients would never reconnect until the failed server had been brought back online. This was fixed by enabling “Enable reset on close” under System Configuration / Miscellaneous Options / Network Options.
We also verified that our L7 configuration was correct: Always Check Persist is set to “Yes – accept changes”, and both “Drop Connections on RS Failure” and “Drop at drain time” are checked.
Clients will now reconnect to another host should one go down, as expected.
We are now running in a stable environment with these tweaks to the original Kemp settings. A huge thanks to Richard Hicks for his help. Hopefully, this will help some of you!