Document about supporting IPSec protocols with the
			Linux Virtual Server
			(Development HowTo)


	CONTENTS:

	1. Introduction
	2. Goals
	3. Current practice
	4. Implementation
	4.1 Direct Routing/Tunnel Methods
	4.2 NAT Method
	4.3 Denial of Service prevention


	1. Introduction

	Before going into the details, the reader needs to understand
the terms and internals of the following subjects: the Linux
Virtual Server (LVS) [1] and its forwarding methods, the Internet
Protocol Security (IPSec) terminology and usage, and one of the
Linux IPSec implementations, FreeSWAN [2]. See the end of the
document for URLs and references.


	2. Goals

	The main goals of adding IPSec support to LVS are:

- build a cluster of VPN servers for IPv4

- use more CPU power for encryption and decryption, fill
your bandwidth with many real servers, high availability,
etc

- no more, isn't it enough? Read below

	But how? In the near future there will be methods
that allow both Secure Gateways to switch to secure mode
automatically, without any prearrangement or at least at a low
cost. We are starting to develop this support in order to be
ready for that time and to be able to provide all LVS features
for all widely used protocols.

	The FreeSWAN team has already proposed a way to
"automatically" establish secure connections by using
authentication with RSA keys and with help from Secure DNS:
the "Opportunistic" encryption [3]. This scheme allows secure
communication to take place when possible. If it is not possible,
the communication can continue in the normal unencrypted way. We
consider this an Initiator's problem. If the requirement is to
build only encrypted channels, this can be arranged with the
current methods, but it is not an LVS issue. Here we cover the
case where the encryption takes place. The unencrypted case is
already implemented in LVS.

	Note that the Director does not need any IPSec support
in its IP stack. It is needed only if the LocalNode method is
used to deliver the IPSec traffic locally.

	3. Current practice

* Terminology in our context

	- Initiator: the VPN Client

	- Responder: the director (in the client's eyes) or
	the real server (in the director's eyes)

* VPN Masquerade (in Linux 2.0 and 2.2)

	The VPN Masquerade [5] has the most difficult goal: to
share the ISAKMP channel (UDP, port 500) among all its masqueraded
initiators. The difficulty comes from the way the ISAKMP connection
is used: it is a host-to-host channel, from port 500 to port 500,
over which both ends negotiate the parameters for the other
traffic. The ISAKMP masquerading has to handle the case where two
initiators establish ISAKMP sessions to the same remote secure
gateway, which can break the protocol.

* IPSec and NAT

	There are some difficulties in forwarding the IPSec
protocols (ISAKMP, ESP and AH) through a NAT router. In some
modes the protocols provide integrity checks that cover the
addresses in the first IP header. The following rules apply
when NAT is used on IPSec traffic:

- NAT works for ESP tunnel mode because the addresses in the
outer IP header are not protected by the protocol

- AH does not like address translations because the addresses are
protected

- NAT for ESP transport mode can work safely only for UDP
without checksums

- The TCP/UDP checksum checks at the end host can be disabled to
allow NAT for ESP transport mode. This is needed because both the
TCP and the UDP checksum cover a pseudo header that includes the
IP addresses, so they break when an address changes.

- IKE negotiations with preshared keys depend on the source
addresses, so the FQDN or ASN1 Identity types can be used for NAT

- LVS does not care how the peers identify themselves or which
ID types they use, but the IPv4 address type is probably not
suitable for NAT purposes. The IDs can be FQDN or DER_ASN1_DN.
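
	To see why the pseudo header matters, here is a minimal
Python sketch (not LVS code) that computes a UDP checksum once with
the destination address the sender used and once with the address
the receiver sees after NAT. The addresses, ports and payload are
made-up example values, and the function names are our own.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the folded 16-bit one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum over the IPv4 pseudo header plus UDP header and data.

    The checksum field inside udp_segment must be zero on input.
    """
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                    # zero byte
        socket.IPPROTO_UDP,   # protocol number 17
        len(udp_segment),     # UDP length
    )
    csum = ~ones_complement_sum(pseudo + udp_segment) & 0xFFFF
    return csum or 0xFFFF     # 0 is reserved for "no checksum"


# Example datagram: source port 1024, destination port 53, 4 data bytes.
payload = b"test"
segment = struct.pack("!HHHH", 1024, 53, 8 + len(payload), 0) + payload

# The client computes the checksum with the virtual address (VIP) ...
sent = udp_checksum("198.51.100.7", "192.0.2.1", segment)
# ... but after NAT the real server verifies it against its own address.
seen = udp_checksum("198.51.100.7", "10.0.0.2", segment)
checksum_survives_nat = (sent == seen)
```

	In ESP transport mode the NAT router cannot repair the
checksum because the TCP/UDP header is encrypted, which is why
the receiver has to skip the check or UDP has to be sent without
a checksum.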


* ISAKMP and NAT

- We can't alter the source port 500 in the ISAKMP packets when
doing NAT

- From the ISAKMP traffic we can't tell what the two ends are
negotiating, but by analyzing the traffic we can tell whether
the initiator is valid: we can check (in NAT mode only) that the
next packets sent from the initiator carry the right responder's
cookie. This helps to prevent spoofing of ISAKMP requests.

- The timeouts for the ISAKMP traffic should be set
according to the used rekey time

- Because the cookies can change on rekey, we don't need to
compare them each time; we need the match only for the first
packets.

- We can expect keep-alive packets to be used to keep the
IKE/IPSec session alive. The bad thing is that if these keep-alive
packets are sent with a low IP TTL, with the intention of limiting
the traffic and reaching only the local NAT router, the LVS-DR
method can't detect them (it sees one direction only, i.e.
out->in).

- Note that we can't learn anything useful from the ISAKMP
traffic because it is encrypted
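
	The cookie check can be sketched as follows. The layout is
the fixed 28-byte ISAKMP header from RFC 2408, which starts with
the two 8-byte cookies. The class and function names are our own
and are not LVS code. The sketch assumes NAT mode, where the
director has seen the responder's reply and remembered its cookie.

```python
import struct
from typing import NamedTuple

# icookie, rcookie, next payload, version, exchange type, flags,
# message ID, length
ISAKMP_HEADER = struct.Struct("!8s8sBBBBII")
ZERO_COOKIE = bytes(8)


class IsakmpHeader(NamedTuple):
    icookie: bytes
    rcookie: bytes
    next_payload: int
    version: int
    exchange_type: int
    flags: int
    message_id: int
    length: int


def parse_isakmp_header(udp_payload: bytes) -> IsakmpHeader:
    """Parse the fixed header found at the start of a port 500 payload."""
    if len(udp_payload) < ISAKMP_HEADER.size:
        raise ValueError("short ISAKMP packet")
    return IsakmpHeader(*ISAKMP_HEADER.unpack_from(udp_payload))


def initiator_is_confirmed(expected_rcookie: bytes, hdr: IsakmpHeader) -> bool:
    """True if the initiator echoes the cookie chosen by the responder."""
    return hdr.rcookie != ZERO_COOKIE and hdr.rcookie == expected_rcookie


# Made-up exchange: the first packet has rcookie=0, later ones echo it.
first = parse_isakmp_header(b"\x11" * 8 + bytes(8) + bytes(12))
later = parse_isakmp_header(b"\x11" * 8 + b"\x22" * 8 + bytes(12))
```

	Only the cookies are used here; the rest of the header is
parsed just to show where they sit in the packet.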

* NAT Traversal

	With the scheme of two states for ISAKMP (described in
section 4) it seems we can support NAT Traversal as defined in
[4]. The ESP traffic is carried in an ISAKMP packet after the two
cookies, which are all zeroes. We can stop any traffic with an
initiator's cookie equal to 0x00 if the entry is still in the
initial state.
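
	A minimal sketch of this filter, assuming the zero marker
sits where the initiator's cookie normally is (the first 8 bytes
of the UDP payload). The state names and the function are
hypothetical and not taken from LVS.

```python
from enum import Enum

ZERO_COOKIE = bytes(8)


class EntryState(Enum):
    INITIAL = 1       # initiator has not yet sent a responder's cookie
    ESTABLISHED = 2   # responder's cookie is already set


def accept_port500_packet(state: EntryState, udp_payload: bytes) -> bool:
    """Decide whether a UDP port 500 packet may be forwarded."""
    if len(udp_payload) < 8:
        return False                      # too short to carry a cookie
    if udp_payload[:8] == ZERO_COOKIE:
        # Looks like UDP-encapsulated ESP: allow it only once the
        # ISAKMP entry has left the initial state.
        return state is EntryState.ESTABLISHED
    return True                           # ordinary ISAKMP packet
```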

	One question remains: can we convert ESP to UDP on DNAT
and UDP to ESP on SNAT, and in this way transparently provide
NAT traversal for the real servers, invisible to the external
clients (suitable only for the NAT method)?

	4. Implementation

	We can say that according to the forwarding methods
and the kind of the IPSec traffic, not all cases can be
supported in LVS.

	In any case, for all forwarding methods, we can use the
following rules:

- We don't need to maintain connection entries for the ESP and
the AH protocols. It is enough to create one entry according to
the used virtual service type:

	UDP/CIP:CPORT->VIP:500

	with an optional connection template for persistent
	virtual services:

	IP/CIP->FWMARK

	If this disturbs the load balancing the problem should
	be handled differently, but in any case maintaining
	one ESP and one AH entry per CIP is not a big problem.
	We don't even need to create an additional entry for
	each client's SPI because all traffic from one client
	is sent to one real server. Of course, if we maintained
	one connection entry for each client's SPI we would know
	exactly the number of SAs, but with the help of
	dynamic weights for the real servers we can solve the
	imbalance that appears when only one ISAKMP entry is
	maintained per CIP. We don't even need to restart
	expiration timers on ESP and AH traffic; we need only
	to follow the ISAKMP traffic and to treat the ESP/AH
	traffic as related to it and as its slave.

- The ISAKMP entry needs 2 states with different timeouts.
The first state needs a short timeout to prevent attackers
from creating many ISAKMP entries without replying with correct
responder's cookies. The first state covers the cases where the
initiator sends packets with rcookie=0. The second state should
be treated as established, i.e. in this state we can forward ESP
and AH traffic to the selected real server. The timeout for this
state should be tuned by the administrator according to the desired
lifespan and the rekey interval. If keep-alives are used, they
help to keep the ISAKMP entry in the LVS table.

	In short:

	When an ESP/AH packet comes we should look up the ISAKMP
	entry. If such an entry already exists and is in ESTABLISHED
	state (responder's cookie is already set), we can forward
	the ESP/AH packet to the ISAKMP entry's real server.

	When an ISAKMP entry is not found we should return
	an ICMP_PROT_UNREACH error.

	When ISAKMP traffic is received we can restart the
	expiration timer for the current entry state.
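
	The rules above can be sketched as a small table keyed by
the client address. This is only an illustration in Python, not
LVS code: the class names, the timeout values and the "drop"
verdict for entries still in the initial state are assumptions,
and the scheduler is any function that picks a real server. For
brevity the key is the CIP alone instead of the full
UDP/CIP:CPORT->VIP:500 entry.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple

ZERO_COOKIE = bytes(8)


class State(Enum):
    INITIAL = "initial"          # initiator still sends rcookie=0
    ESTABLISHED = "established"  # responder's cookie is set


# Hypothetical timeouts in seconds: short for the first state, long
# (tuned to lifespan and rekey interval) for the second one.
TIMEOUTS = {State.INITIAL: 10.0, State.ESTABLISHED: 3600.0}


@dataclass
class IsakmpEntry:
    real_server: str
    state: State = State.INITIAL
    expires: float = 0.0


class IpsecTable:
    def __init__(self, scheduler: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.entries: Dict[str, IsakmpEntry] = {}
        self.scheduler = scheduler
        self.clock = clock

    def _lookup(self, cip: str) -> Optional[IsakmpEntry]:
        entry = self.entries.get(cip)
        if entry is not None and entry.expires <= self.clock():
            del self.entries[cip]    # idle timeout reached
            return None
        return entry

    def on_isakmp(self, cip: str, rcookie: bytes) -> str:
        """Handle an ISAKMP packet from the client, return the real server."""
        entry = self._lookup(cip)
        if entry is None:
            entry = IsakmpEntry(real_server=self.scheduler(cip))
            self.entries[cip] = entry
        if rcookie != ZERO_COOKIE:
            entry.state = State.ESTABLISHED
        # Only ISAKMP traffic restarts the timer of the current state.
        entry.expires = self.clock() + TIMEOUTS[entry.state]
        return entry.real_server

    def on_esp_ah(self, cip: str) -> Tuple[str, Optional[str]]:
        """Handle an ESP or AH packet; the timer is not restarted."""
        entry = self._lookup(cip)
        if entry is None:
            return ("icmp_prot_unreach", None)
        if entry.state is not State.ESTABLISHED:
            return ("drop", None)    # assumption: text does not say
        return ("forward", entry.real_server)
```

	In NAT mode the nonzero rcookie could also be compared
with the cookie seen in the responder's reply before the entry
is moved to the established state; that check is left out here.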

- We can even define a per-service number of entries that
are handled softly by the expire mechanism, i.e. we can allow
up to the specified number of entries to live even after their
normal idle timeout period. This can cover the cases where
the rekey interval is not reflected correctly in the timeouts of
the ISAKMP connection entries.
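
	One possible reading of this soft handling, as a Python
sketch: when the expire mechanism runs, it removes timed-out
entries only beyond the per-service limit. The function name and
the choice to keep the most recently expired entries are our
assumptions.

```python
from typing import Dict, List


def expire_entries(expires: Dict[str, float], now: float,
                   soft_limit: int) -> List[str]:
    """Remove timed-out entries, letting up to soft_limit of them live.

    expires maps an entry key (e.g. the CIP) to its expiration time.
    Returns the keys that were removed.
    """
    # Timed-out entries, oldest expiration first.
    expired = sorted((key for key, t in expires.items() if t <= now),
                     key=lambda key: expires[key])
    # Keep the soft_limit most recently expired ones, drop the rest.
    n_remove = max(0, len(expired) - soft_limit)
    removed = expired[:n_remove]
    for key in removed:
        del expires[key]
    return removed
```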

- Note that we don't want to mark the virtual service as persistent
because persistence is already implied by the way the ISAKMP
protocol is used. In such cases the load balancing can be
maintained with dynamic weights for the real servers, only if
needed. Apply persistence if you extend the virtual service with
other kinds of traffic and the client must be redirected to the
same real server both for the IPSec and for the other traffic.

	4.1 Direct Routing/Tunnel Methods

	Nothing to add here, these methods can forward any
kind of IPSec traffic because we don't change the addresses
in the outer IP header and the IPSec terminators will treat
the packet as correctly routed.

	As for other kinds of traffic, in the DR and TUN methods it
is difficult to deduce from the input traffic alone whether
the client is valid or whether we are being flooded by an attacker
generating fake requests.

	We know that there are many different ways to replace
a NAT setup with a Direct Routing setup without breaking any
security rules. Experience shows that in most cases it is a
matter of taste whether to select a NAT or a DR setup. It is
even believed that the NAT method is more secure, but the real
benefit of the NAT method is that it can also analyze the
in->out traffic. The DR method sees the traffic in one direction
only.

	In the LVS world we already know different ways to
route the real server's replies back through the director,
even for the DR method, without triggering the anti-spoofing
checks at the routing layer. These ways include:

- using Layer 2 switching (Linux Bridging) to forward the
real server's replies through the director without checking
them at the routing layer

- using other solutions such as modifying the reverse path
protection (forward_shared device flag) in the director to
allow spoofed traffic from the network interfaces used to
receive the traffic from the real servers

	It is even possible to keep the real servers unreachable
from the public network: we don't need any public IP addresses
on the real servers to use the DR method, and we can provide the
needed firewall rules according to the desired firewall policy.

	As a result, there are many ways to use the Direct Routing
method for the IPSec traffic instead of NAT and in this way
to support all of its modes and protocols: ISAKMP, ESP
and even AH.

	4.2 NAT Method

	From the above list of exceptions we can say that it
is the administrator's job to decide whether it is safe to use
the NAT method for the specific ESP traffic (AH cannot be used in
NAT mode). ESP traffic in tunnel mode can be safely NAT-ed,
but for transport mode the admins should take care of the
TCP/UDP checksums of the encrypted traffic, which are broken by
the address changes in the IP headers on NAT. The TCP/UDP
receivers should ignore the broken pseudo header checksums. Note
that this is not a recommendation but a solution.

	We should mention that the ISAKMP traffic can be safely
handled in this mode because we can match the responder's cookie
in the initiator's packets and so tell whether the initiator
is a valid client or an attacker generating fake ISAKMP requests.


	4.3 Denial of Service prevention

	There are many ways to prevent attackers from
flooding the director with fake ISAKMP or ESP/AH traffic.
We know of some LVS and non-LVS ways:

- QoS ingress

	We can rate limit the IPSec request packets by adding
different rates for the ISAKMP, ESP and the AH traffic. We
hope that this barrier prevents wasting memory for LVS entries
and also protects the real servers.
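
	The idea of separate rates per protocol can be illustrated
with a token bucket per traffic class. This is only a Python
sketch of the concept: on a real director the limiting would be
done by the kernel's traffic control, and the rates and bursts
below are made-up values.

```python
class TokenBucket:
    """Allow up to `rate` packets per second with bursts of `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.stamp = now

    def allow(self, now: float) -> bool:
        # Refill according to the elapsed time, capped at the burst size.
        elapsed = now - self.stamp
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.stamp = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits: different rates for ISAKMP, ESP and AH.
limits = {
    "isakmp": TokenBucket(rate=50.0, burst=100.0),
    "esp": TokenBucket(rate=5000.0, burst=10000.0),
    "ah": TokenBucket(rate=5000.0, burst=10000.0),
}


def allow_packet(kind: str, now: float) -> bool:
    """Rate limit IPSec packets; other traffic is not limited here."""
    bucket = limits.get(kind)
    return True if bucket is None else bucket.allow(now)
```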

- QoS egress

	This kind of handling prevents flooding the real servers
with an unexpected number of requests but does not protect the
LVS connection table from growing in a bad way.

- LVS defense strategies

	We know that LVS already has different strategies to
defend against attacks. These strategies prevent overloading
the director's memory with fake entries but do not protect the
real servers from the actual traffic. For this, the LVS defense
can be combined with the above solutions or even with any other
kind of denial of service prevention. We should mention that
the LVS dropentry strategy is a universal way to keep the memory
allocated for LVS connection entries at a reasonable size.


[1] http://www.linuxvirtualserver.org/ - The Linux Virtual Server
[2] http://www.freeswan.org/ - The Linux IPSec implementation
[3] draft-richardson-ipsec-opportunistic.txt - Opportunistic encryption
[4] draft-ietf-ipsec-udp-encaps-01.txt - IPSec NAT Traversal
[5] http://www.impsec.org/linux/masquerade/ip_masq_vpn.html - VPN
Masquerading for Linux 2.0 and 2.2

Authors:

Started by Julian Anastasov <ja@ssi.bg>, February 17, 2002