Document about supporting IPSec protocols with the Linux Virtual Server (Development HowTo)

CONTENTS:

1. Introduction
2. Goals
3. Current practice
4. Implementation
4.1 Direct Routing/Tunnel Methods
4.2 NAT Method
4.3 Denial of Service prevention

1. Introduction

Before going into the details, the reader needs to understand the terms and internals of the following subjects: the Linux Virtual Server (LVS) [1] and its forwarding methods, the Internet Protocol Security (IPSec) terminology and usage, and one of the Linux IPSec implementations, FreeSWAN [2]. The URLs and references are listed at the end of the document.

2. Goals

The main goals of adding IPSec support to LVS are:

- build a cluster of VPN servers for IPv4
- use more CPU power for encryption and decryption, fill your bandwidth with many real servers, high availability, etc
- no more, isn't it enough? Read below

But how? In the near future there will be methods where both Secure Gateways will be able to switch automatically to secure mode without any prearrangement, or at least at a low cost. We started developing this support with the goal of being ready for that time and of providing all LVS features for all widely used protocols.

The FreeSWAN team has already proposed a way to establish secure connections "automatically" by using authentication with RSA keys and with the help of Secure DNS: the "Opportunistic" encryption [3]. This scheme allows secure communication to take place if possible. If not, the talks can continue in the normal unencrypted way. We consider this an Initiator's problem. If the requirement is to build only encrypted channels, this can be arranged with the current methods, but it is not an LVS issue. We are going to cover the case where the encryption takes place; the unencrypted case is already implemented in LVS.

Note that the Director does not need any IPSec support in the IP stack.
It is needed only if the LocalNode method is used to deliver the IPSec traffic locally.

3. Current practice

* Terminology in our context

- Initiator: the VPN client
- Responder: the director (in the client's eyes) or the real server (in the director's eyes)

* VPN Masquerade (in Linux 2.0 and 2.2)

The VPN Masquerade [5] has the most difficult goal: to share the ISAKMP channel (UDP, port 500) among all its masqueraded initiators. The difficulty comes from the way the ISAKMP connection is used: it is a host-to-host channel, from port 500 to port 500, over which both ends negotiate the other talks. The ISAKMP masquerading takes care of the fact that two initiators can establish ISAKMP sessions to the same remote secure gateway, which can break the protocol.

* IPSec and NAT

There are some difficulties in forwarding the IPSec protocols (ISAKMP, ESP and AH) through a NAT router. In some modes the protocols provide integrity checks that involve the addresses included in the first IP header. The following rules apply when using NAT over IPSec traffic:

- NAT works for ESP tunnel mode because the addresses in the outer IP header are not protected by the protocol
- AH does not tolerate address translation because the addresses are protected
- NAT for ESP transport mode can work safely only for UDP without checksums
- The TCP/UDP checksum checks at the end host can be disabled to allow NAT for ESP transport mode. This is needed because both TCP and UDP checksums cover a pseudo header that includes the IP addresses, so they break on address change.
- IKE negotiations with preshared key depend on source addresses, so FQDN or ASN1 Identity types can be used for NAT
- We don't care about the self identification or the ID types, but the IPv4 ID type may not be suitable for NAT purposes. The IDs can be FQDN or DER_ASN1_DN.
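
The ESP and AH rules listed above can be condensed into a small decision function. The following is only a sketch of those rules, not part of LVS: the names Proto, Mode and nat_is_safe, as well as the two boolean parameters, are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Proto(Enum):
    ESP = "esp"
    AH = "ah"


class Mode(Enum):
    TUNNEL = "tunnel"
    TRANSPORT = "transport"


def nat_is_safe(proto, mode,
                inner_is_udp_without_checksum=False,
                receiver_ignores_checksums=False):
    """Return True if the rules in the text allow NAT for this traffic.

    All names here are illustrative assumptions, not LVS identifiers.
    """
    if proto is Proto.AH:
        # AH protects the addresses in the IP header, so any address
        # translation breaks its integrity check.
        return False
    if mode is Mode.TUNNEL:
        # ESP does not protect the outer IP header in tunnel mode.
        return True
    # ESP transport mode: the encrypted TCP/UDP checksum covers a pseudo
    # header with the original addresses. NAT is safe only if the inner
    # traffic is UDP without checksums or the end host skips the check.
    return inner_is_udp_without_checksum or receiver_ignores_checksums
```

For example, nat_is_safe(Proto.ESP, Mode.TRANSPORT) is False unless one of the two flags is set, which matches the advice that the administrator has to decide per kind of traffic.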

* ISAKMP and NAT

- We can't alter the source port 500 in the ISAKMP packets on NAT
- From the ISAKMP traffic we can't understand what both ends negotiate, but by analyzing the traffic we can know whether the initiator is valid: we can see (in NAT mode only) the right responder's cookie in the next packets sent from the initiator. This helps to prevent spoofing of the ISAKMP requests.
- The timeouts for the ISAKMP traffic should be set according to the rekey time in use
- Because the cookies can change on rekey, we don't need to compare them each time; we need the match only for the first packets.
- We can expect that keep-alive packets will be used to keep the IKE/IPSec session alive. The bad thing is that if these keep-alive packets are sent with a low IP TTL, with the intention of generating little traffic and reaching only the local NAT router, the LVS-DR method can't detect them (it works in one direction only, i.e. for out->in only).
- Note that we can't learn anything useful from the ISAKMP traffic because it is encrypted

* NAT Traversal

With the scheme of two states for ISAKMP (described in section 4) it seems we can support the NAT Traversal as defined in [4]. The ESP traffic is carried in an ISAKMP packet after the two cookies with all zeroes. We can stop any traffic with an initiator's cookie equal to 0x00 if the entry is still in the initial state. One question remains: can we encapsulate from ESP to UDP on DNAT and from UDP to ESP on SNAT, and in this way transparently provide NAT traversal for the real servers, invisible to the external clients (suitable only for the NAT method)?

4. Implementation

Depending on the forwarding method and the kind of IPSec traffic, not all cases can be supported in LVS. In any case, for all forwarding methods, we can use the following rules:

- We don't need to maintain connection entries for the ESP and the AH protocols.
  It is enough to create one entry according to the virtual service type in use:

      UDP/CIP:CPORT->VIP:500

  with an optional connection template for persistent virtual services:

      IP/CIP->FWMARK

  If this disturbs the load balancing, the problem should be handled differently, but in any case maintaining one ESP and one AH entry per CIP is not a big problem. We don't even need to create an additional entry for each client's SPI, because all traffic from one client is sent to one real server. Of course, if we maintained one connection entry for each client's SPI we would know the exact number of SAs, but with the help of dynamic weights for the real servers we can solve the imbalance in the cases where only one ISAKMP entry is maintained per CIP. We don't even need to restart expiration timers for the ESP and AH traffic; we only need to follow the ISAKMP traffic and to treat the ESP/AH traffic as related and slave.

- The ISAKMP entry needs 2 states with different timeouts. The first state needs a short timeout to prevent attackers from creating many ISAKMP entries without replying with correct responder's cookies. It covers the cases where the initiator sends packets with rcookie=0. The second state should be treated as established, i.e. in this state we can forward ESP and AH traffic to the selected real server. The timeout for this state should be tuned by the administrator according to the desired lifespan and the rekey interval. If keep-alives are used, they can help to maintain the ISAKMP entry in the LVS table.

  In short: when an ESP/AH packet comes we should look up the ISAKMP entry. If such an entry already exists and is in ESTABLISHED state (the responder's cookie is already set), we can forward the ESP/AH packet to the ISAKMP entry's real server. When an ISAKMP entry is not found we should return an ICMP_PROT_UNREACH error. When ISAKMP traffic is received we can restart the expiration timer for the current entry state.
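
The two-state handling described above can be sketched as a small user-space model. This is not LVS code: the class and method names, the default timeouts and the pick_server callback are assumptions made for the illustration. It models the NAT method, where the director also sees the responder's reply, and it relies only on the fact that an ISAKMP header starts with the 8-byte initiator cookie followed by the 8-byte responder cookie. Dropping ESP/AH packets while the entry is still in the first state is also an assumption; the text only defines the "established" and "not found" cases.

```python
ZERO_COOKIE = bytes(8)
INIT = "INIT"
ESTABLISHED = "ESTABLISHED"


class Entry:
    def __init__(self, server, expires):
        self.server = server
        self.state = INIT
        self.rcookie = None   # responder's cookie learned from the reply
        self.expires = expires


class IsakmpTracker:
    """Hypothetical model: one ISAKMP entry per client IP (CIP)."""

    def __init__(self, pick_server, init_timeout=10.0,
                 established_timeout=3600.0):
        self.pick_server = pick_server   # scheduler callback (assumed)
        self.timeouts = {INIT: init_timeout,
                         ESTABLISHED: established_timeout}
        self.entries = {}

    def _lookup(self, cip, now):
        entry = self.entries.get(cip)
        if entry is not None and entry.expires <= now:
            del self.entries[cip]        # idle timeout reached
            return None
        return entry

    def isakmp_from_client(self, cip, payload, now):
        """UDP/500 payload from the initiator; returns the real server."""
        if len(payload) < 16:
            return None                  # too short to hold both cookies
        rcookie = payload[8:16]
        entry = self._lookup(cip, now)
        if entry is None:
            entry = Entry(self.pick_server(cip), now)
            self.entries[cip] = entry
        # The cookie is compared only while in the first state, because
        # cookies can change on rekey.
        if (entry.state == INIT and entry.rcookie is not None
                and rcookie == entry.rcookie):
            entry.state = ESTABLISHED
        # ISAKMP traffic restarts the timer of the current state.
        entry.expires = now + self.timeouts[entry.state]
        return entry.server

    def isakmp_from_server(self, cip, payload):
        """NAT method only: remember the responder's cookie."""
        entry = self.entries.get(cip)
        if entry is not None and entry.state == INIT and len(payload) >= 16:
            if payload[8:16] != ZERO_COOKIE:
                entry.rcookie = payload[8:16]

    def esp_or_ah_from_client(self, cip, now):
        """ESP/AH is slave traffic: it never restarts the timer."""
        entry = self._lookup(cip, now)
        if entry is None:
            return ("ICMP_PROT_UNREACH", None)
        if entry.state != ESTABLISHED:
            return ("DROP", None)        # assumption, see the lead-in
        return ("FORWARD", entry.server)
```

A caller would feed each packet to the matching method together with the current time, for example time.monotonic(), and act on the returned verdict.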

- We can even define a per-service number of entries that should be handled softly by the expire mechanism, i.e. we can allow up to a specified number of entries to live even after their normal idle timeout period. This can help in the cases where the rekey interval is not reflected correctly in the timeouts of the ISAKMP connection entries.

- Note that we don't want to mark the virtual service as persistent; the persistence is implicitly required by the way the ISAKMP protocol is used. In such cases the load balancing can be maintained with dynamic weights for the real servers, only if needed. Apply persistence if you extend the virtual service with other kinds of traffic and the client must be redirected to the same real server for both the IPSec and the other traffic.

4.1 Direct Routing/Tunnel Methods

Nothing to add here: these methods can forward any kind of IPSec traffic because we don't change the addresses in the outer IP header, and the IPSec terminators will treat the packet as correctly routed. As for other kinds of traffic, in the DR and TUN methods it is difficult to deduce from the input traffic alone whether the client is valid or whether we are being flooded by an attacker generating pseudo requests.

We know that there are many different ways to replace a NAT setup with a Direct Routing setup without breaking any security rules. Experience shows that in most cases it is a matter of taste whether to select a NAT or a DR setup. It is even believed that the NAT method is more secure, but the real benefit of the NAT method is that it gives the director a way to analyze the in->out traffic as well. The DR method sees the traffic in one direction only. In the LVS world we already know different ways to route the real server's replies back through the director, even for the DR method, without hitting the anti-spoofing checks at the routing layer.
The different ways can include:

- using Layer 2 switching (Linux Bridging) to forward the real server's replies through the director without checking them at the routing layer
- using other solutions such as modifying the reverse path protection (forward_shared device flag) in the director to allow spoofed traffic from the network interfaces used to receive the traffic from the real servers

There is even a way to keep the real servers unreachable from the public: we don't need any public IP addresses on the real servers to use the DR method, and we can provide the needed firewall rules according to the desired firewall policy. As a result, there are many ways to use the Direct Routing method for the IPSec traffic instead of NAT and in this way to support any of its modes and protocols: ISAKMP, ESP and even AH.

4.2 NAT Method

From the above list of exceptions we can say that it is the administrator's job to decide whether it is safe to use the NAT method for the specific ESP traffic (AH cannot be used in NAT mode). The ESP traffic in tunnel mode can be safely NAT-ed, but for transport mode the admins should take care of the broken TCP/UDP checksums of the encrypted traffic, caused by the changes in the addresses of the IP headers on NAT. The TCP/UDP receivers should ignore the broken pseudo header checksums. Note that this is not a recommendation but a solution. We should mention that the ISAKMP traffic can be safely handled in this mode because we can match the responder's cookie in the initiator's packet and so know whether the initiator is a valid client or an attacker generating fake ISAKMP requests.

4.3 Denial of Service prevention

There are many ways to prevent attackers from flooding the director with fake ISAKMP or ESP/AH traffic. We know of some LVS and non-LVS ways:

- QoS ingress

  We can rate limit the IPSec request packets by setting different rates for the ISAKMP, ESP and AH traffic.
  We hope that this barrier prevents wasting memory for LVS entries and also protects the real servers.

- QoS egress

  This kind of handling prevents flooding the real servers with an unexpected number of requests but does not protect the LVS connection table from growing excessively.

- LVS defense strategies

  LVS already has different strategies to defend against attacks. These strategies prevent overloading the director's memory with fake entries but do not protect the real servers from the actual traffic. For this, the LVS defense can be combined with the above solutions or with any other kind of denial of service prevention. We should mention that the LVS dropentry strategy is a universal way to keep the memory allocated for LVS connection entries at a reasonable size.

[1] http://www.linuxvirtualserver.org/ - The Linux Virtual Server
[2] http://www.freeswan.org/ - The Linux IPSec implementation
[3] draft-richardson-ipsec-opportunistic.txt - Opportunistic encryption
[4] draft-ietf-ipsec-udp-encaps-01.txt - IPSec NAT Traversal
[5] http://www.impsec.org/linux/masquerade/ip_masq_vpn.html - VPN Masquerading for Linux 2.0 and 2.2

Authors: Started by Julian Anastasov, February 17, 2002