Netfilter connection tracking support for IPVS

Julian Anastasov - Sep 6, 2003

CONTENTS:

1. Install on Linux 2.4.23 and later 2.4/2.6 kernels
2. Install on Linux 2.4.22 with IPVS 1.0.10
3. Install on Linux 2.6
4. Purpose
5. Usage
6. Examples
7. Implementation details


1. Install on Linux 2.4.23 and later 2.4/2.6 kernels

IPVS is included in 2.4.23; use only ipvs-nfct-2.4.23-1.diff.
No patches are needed for 2.6.36 and later kernels: NFCT support is
included in 2.6.36, and the DR/TUN parts in 2.6.37.

    cd linux
    patch -p1 < ../ipvs-nfct-2.4.23-1.diff

- enable the CONFIG_IP_VS_NFCT option: 'Netfilter connection tracking'


2. Install on Linux 2.4.22 with IPVS 1.0.10

- patch your 2.4.22 kernel with ipvs-1.0.10-nfct-2.4.22-1.diff, which
  exports some useful functions
- apply all needed patches for the kernel, as specified in the IPVS
  README file
- build your kernel

For modules:

- apply ipvs-1.0.10-nfct-1.diff to IPVS 1.0.10
- enable the CONFIG_IP_VS_NFCT support in ipvs/Makefile; look for
  "Enable NFCT support here:"

For in-kernel build:

- install IPVS into your kernel source tree as usual
- enable the CONFIG_IP_VS_NFCT option: 'Netfilter connection tracking'

Example steps to build IPVS in kernel 2.4.22:

    # patch IPVS:
    tar xfz ipvs-1.0.10.tar.gz
    cd ipvs-1.0.10/
    cat ../ipvs-1.0.10-nfct-1.diff | patch -p1
    # in Makefile, set KERNELSOURCE to the right linux source tree

    # patch the kernel
    cd ../linux/
    cat ../ipvs-1.0.10-nfct-2.4.22-1.diff | patch -p1

    # install IPVS in the kernel tree
    cd ../ipvs-1.0.10/
    make patchkernel
    make installsource

    # reconfigure and build the kernel
    cd ../linux
    make menuconfig
    # Now you may want to select the IPVS 'Netfilter connection tracking'
    # option after enabling Netfilter's
    # 'Connection tracking (required for masq/NAT)' support


3. Install on Linux 2.6

    cd linux/
    cat ../ipvs-nfct-2.6.19-1.diff | patch -p1
    # Configure and build the kernel as usual
    make menuconfig
    # etc.


4. Purpose

The conntrack support is useful for LVS-NAT setups and for non-NAT
methods (if the forward_shared flag is used) to allow real servers to
use the director as their default gateway. This way, the conntrack
state is properly updated in the reply direction.

The NFCT patch is useful in the following cases:

- matching IPVS packets with iptables rules using
  '-m state --state RELATED,ESTABLISHED'. This reduces the number of
  iptables rules on the director for firewall setups.

- a director with two or more default gateways (for example, using
  multiple ISPs) can set snat_reroute to properly route the traffic
  from the director to clients. This flag causes a second routing
  lookup for IPVS packets after their source address is SNAT-ed
  (changed) from the real server address (RIP) to the virtual address
  (VIP). This way the packets use the right uplink gateway (ISP) for
  the VIP, but only if the routing rules and routes are set up to route
  the VIP to the right gateway (ip rules matching the source address).
  As a result, the director can use many VIPs from different subnets
  (ISPs) and the packets will use the right uplinks. If this flag is 0
  (the default in older versions), the packets use the gateway resolved
  from the RIP, not from the VIP, i.e. IPVS packets cannot select the
  right ISP in all cases. This flag works for LVS-NAT only, no matter
  whether /proc/sys/net/ipv4/vs/conntrack is set to 1 or not, i.e. this
  feature is independent of the NFCT conntrack state support.


5. Usage

- for FTP, make sure you have ip_conntrack_ftp (nf_conntrack_ftp after
  2.6.22) loaded:
    - LVS-NAT:
        - ip_vs_ftp, nf_conntrack_ftp, nf_nat_ftp
        - iptable_nat: to properly adjust the TCP sequence numbers
    - LVS-DR with forward_shared flag:
        - requires nf_conntrack_ftp:
            - to properly create expectations
            - to create RELATED connections that are properly matched
              in LOCAL_IN
        - requires a persistent service, as usual
    - LVS-TUN with forward_shared flag:
        - requires nf_conntrack_ftp:
            - to properly create expectations
            - to create RELATED connections that are properly matched
              in LOCAL_IN
        - requires a persistent service, as usual
        - Limitations:
            - Passive FTP works:
                - because nf_conntrack_ftp can see the FTP command
                  responses in POST_ROUTING and can create an
                  expectation for the related data connection from the
                  client
            - Active FTP fails; an ACCEPT rule for VIP:20 is needed in
              FORWARD:
                - because nf_conntrack_ftp cannot see the FTP command
                  requests in POST_ROUTING (the IPIP header is
                  prepended) and cannot create an expectation for the
                  related data connection from the real server (port
                  20). A rule in FORWARD (-s VIP --sport 20) may help
                  to accept this connection as NEW instead.

- load your modules as usual

- enable the NFCT support at run time:

      echo 1 > /proc/sys/net/ipv4/vs/conntrack

  Starting from 2.6.36, NFCT is enabled for ip_vs_ftp (LVS-NAT)
  connections even if /proc/sys/net/ipv4/vs/conntrack is 0.

- enable SNAT rerouting for IPVS traffic (optional, useful for
  source-based routing). This flag has no effect on iptables -j SNAT,
  only on IPVS packets! To properly route iptables -j SNAT packets, use
  other patches, e.g. routes-*.diff.

      echo 1 > /proc/sys/net/ipv4/vs/snat_reroute

  Currently the default value is 1, so the flag can be disabled for
  performance reasons if multipath routes are not used.

- sometimes IPVS changes the TCP payload (e.g. FTP commands changed by
  ip_vs_ftp), resizing the packets.
  When /proc/sys/net/ipv4/vs/conntrack is enabled, Netfilter's TCP
  connection tracking treats the changed sequence numbers as errors and
  drops the packets. So we need to avoid these checks:

      echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal

  or, after 2.6.22:

      echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal

  This is not needed when vs/conntrack=0 because the netfilter
  conntracks don't remain after IPVS packets are forwarded. Not sure
  about SMP.

- configure your IPVS services as usual


6. Examples

For users of ipvs-nfct I would recommend rules like these (the example
is for FTP to VIP 192.168.1.100; access to all other ports on the VIP
is denied):

    echo 1 > /proc/sys/net/ipv4/vs/conntrack

Before 2.6.22:

    modprobe ip_conntrack_ftp
    modprobe iptable_filter
    modprobe ip_nat_ftp

Starting from 2.6.22 we use nf_conntrack:

    modprobe nf_conntrack_ftp
    modprobe iptable_filter
    modprobe nf_nat_ftp
    # iptable_nat is needed from 2.6.36 (with NFCT in kernel):
    modprobe iptable_nat

    # Restrict LOCAL_IN access
    iptables -A INPUT -p tcp -d 192.168.1.100 -m state --state RELATED,ESTABLISHED -j ACCEPT
    iptables -A INPUT -p tcp -d 192.168.1.100 --dport 21 -j ACCEPT
    #{ Only for Local Process (LVS-NAT) to accept NEW connections from RIP:
    iptables -A INPUT -p tcp -s RIP -j ACCEPT
    #}
    iptables -A INPUT -p tcp -d 192.168.1.100 -j DROP

    # Restrict FORWARD access
    iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
    #{ Required for LVS-TUN only: accept FTP DATA from real server (-s VIP):
    iptables -A FORWARD -p tcp -s 192.168.1.100 --sport 20 -j ACCEPT
    #}
    iptables -A FORWARD -j DROP

Note that IPVS matches requests in INPUT and responses in FORWARD.
Starting from 2.6.37, requests and replies can also be matched at
LOCAL_OUT.

We still need to avoid marking IPVS packets as invalid in TCP conntrack:

    echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal

or, after 2.6.22:

    echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal


7. Implementation details

- State matching by forwarding method:

    LVS-NAT:
        - packets are seen in both directions, so the packets have a
          conntrack in a valid state (NEW, ESTABLISHED or RELATED)

    LVS-DR, LVS-TUN - when the director is not the default gateway for
    the real servers:
        - packets are seen in one direction only. In this case TCP
          conntrack treats every packet after the SYN as invalid and
          the packets are forwarded untracked (skb->nfct = NULL), but
          the conntrack entry remains in SYN_SENT state (60 secs). This
          is a problem for state matching. We cannot prevent it even if
          we set IPS_SEEN_REPLY_BIT (to force ESTABLISHED state) in the
          conntrack while forwarding the first packet, because the
          match modules do not see any conntrack attached to the
          packet. Such packets have conntrack state INVALID:
          net/netfilter/xt_state.c:state_mt() uses XT_STATE_INVALID for
          them (-m state --state INVALID)

    LVS-DR, LVS-TUN - when the director is used as the default gateway
    for the real servers, for example by using the forward_shared
    interface flag:
        - same as LVS-NAT: packets are seen in both directions, so the
          packets have a conntrack in a valid state (NEW, ESTABLISHED
          or RELATED)

- altering the reply (the reply tuple of the conntrack):
    - only for LVS-NAT
    - only for packets in the original direction (NEW, RELATED)
    - usually performed in LOCAL_IN for packets from clients, sometimes
      in FORWARD for RELATED connections created from a real server
      (Active FTP DATA) or in LOCAL_OUT for local clients (2.6.37+)
    - not for Local Process (client on the director, LVS-NAT) before
      2.6.37 because:
        - the Local Process feature is not implemented to schedule
          connections at the LOCAL_OUT hook.
          When IPVS detects the first packet (state NEW), its conntrack
          is already confirmed at POST_ROUTING and cannot be altered:

              LOCAL_OUT -> POST_ROUTING (confirm) -> LOCAL_IN (IPVS schedule)

    - Local Process should work in 2.6.37 and above

- conntrack confirmation by forwarding method:

    LVS-NAT, LVS-DR:
        - the conntrack is modified (LVS-NAT only) to contain valid
          addresses and ports, and the packet is not otherwise modified
          in a way that would prevent it from traversing the
          POST_ROUTING chain, including for NAT purposes

    LVS-TUN:
        - the conntrack expects a reply from the virtual IP, so it is
          not a good idea to walk POST_ROUTING with the IPIP header
          prepended. This is the only method that needs
          skb->ipvs_property = 1 to exit the POST_ROUTING chain early.

- conntrack confirmation:
    - can be done at the normal place, at the end of the hook, because
      __nf_conntrack_confirm() does not inspect the packet content
      (which could be a problem when confirming LVS-TUN packets)

- When conntrack is not needed:
    - set ipvs_property to 1 to avoid the confirmation. In this case we
      should exit POST_ROUTING early. Not possible with the LOCALNODE
      method. The conntrack is removed when the packet is freed, and a
      new conntrack is created for the next packet.

      Starting from 2.6.37, ipvs_property=1 means IPVS has marked this
      packet as a request or reply, to prevent other IPVS hooks from
      processing it. If netfilter conntrack support is disabled,
      skb->nfct is replaced with the notrack structure to keep other
      netfilter hooks from processing IPVS packets.

- skb->dev:
    - incoming device for received packets
    - outgoing device for sent packets, but not defined for LOCAL_OUT
    - must be set before icmpv6_send()

- Local routes:
    - the IPv4 stack requires a received packet in LOCAL_IN to have a
      route that matches the IP header, because some replies are built
      from the route

- Local Client: when connecting to a VIP served by a DR or TUN real
  server, the client must bind its socket to another local IP because
  the kernel auto-selects the same VIP as the source address.
  To accept packets from a real server with saddr=VIP (a local IP on
  the director), the director must use:

      echo 1 > /proc/sys/net/ipv4/conf/<RS_DEVICE>/accept_local

  where RS_DEVICE is the device on which traffic from the real server
  arrives at the director. This option is dangerous if the firewall
  does not protect against spoofing.
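Because accept_local is a per-device setting, it may be convenient to wrap it in a small helper. The sketch below is only an illustration and is not part of the IPVS patches: the function name enable_accept_local and the PROC_ROOT override are hypothetical, added so the helper can be tried against a scratch directory. On a real director it writes under /proc and has to run as root.

```sh
#!/bin/sh
# Hypothetical helper: set accept_local=1 on RS_DEVICE, the interface on
# which traffic from the real servers arrives at the director.
# PROC_ROOT is a hypothetical override (default /proc) that lets the
# function be exercised against a scratch directory instead of the kernel.
# The body is a subshell, so "exit" only leaves the function and the
# variables below do not leak into the caller.
enable_accept_local() (
    rs_device="$1"
    if [ -z "$rs_device" ]; then
        echo "usage: enable_accept_local RS_DEVICE" >&2
        exit 2
    fi
    knob="${PROC_ROOT:-/proc}/sys/net/ipv4/conf/$rs_device/accept_local"
    # Do not create the file: a missing knob means a wrong device name
    # or a kernel that does not provide accept_local.
    if [ ! -e "$knob" ]; then
        echo "no such sysctl: $knob" >&2
        exit 1
    fi
    echo 1 > "$knob"
)
```

For example, run `enable_accept_local eth1` as root, where eth1 is only a placeholder for your RS_DEVICE. The helper just sets the flag; protection against spoofed saddr=VIP packets still has to come from the firewall.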