NAT
======================================================================

The NAT subsystem of netfilter/iptables is split up in three parts:

ip_nat_rule.c - the ru

initialization

ip_nat_standalone.c:init_or_cleanup() initializes the whole NAT
subsystem. This global initialization function does nothing more than
call the initialization functions of all NAT subsystem parts:

- ip_nat_rule.c:ip_nat_rule_init(), which registers the 'nat' table
  with the generic IP tables layer by calling
  ip_tables.c:ipt_register_table().
- ip_tables.c:ipt_register_target(), to register the two built-in NAT
  targets (SNAT and DNAT).
- ip_nat_core.c:ip_nat_init(), for initialization of the real NAT core.

ip_nat_core.c:ip_nat_init():

- Registers the built-in NAT protocol modules for TCP, UDP and ICMP.
- Initializes the linked lists of the NAT hashtables.
- Registers ip_nat_cleanup_conntrack() as a callback function with the
  connection tracking subsystem.

Now we are ready to register NAT with the core netfilter hooks by
calling net/core/netfilter.c:nf_register_hook() for NF_IP_PRE_ROUTING,
NF_IP_POST_ROUTING and NF_IP_LOCAL_OUT.

packet traversal through nat code

incoming packets at the NF_IP_PRE_ROUTING hook

As we've registered ip_nat_standalone.c:ip_nat_fn() with the netfilter
prerouting hook, this function is called as soon as a packet traverses
this hook in the network stack.

According to the priority we've registered at the hook, the following
things may have happened at this hook before entering the NAT code:

- connection tracking
- traversal of the mangle table

ip_nat_standalone.c:ip_nat_fn()

We will discover later that this function is also called from places
other than the NF_IP_PRE_ROUTING hook directly.

The function determines the type of manipulation.
As we are discussing the PRE_ROUTING case, the main focus is
manipulation of the destination address/port (DNAT is the built-in
target for this case).

After invalidating the hardware checksum of the skb, the NAT code tries
to resolve the appropriate struct ip_conntrack for this packet by
calling ip_conntrack_core.c:ip_conntrack_get().

If the connection tracking code was not able to determine state
information about this packet, the NAT code returns NF_DROP to the
netfilter hook, thereby causing the packet to be dropped.

If we have valid state information about this packet, further
processing depends on which state was assigned to this packet.

- RELATED
  If the packet is an ICMP packet, we have to do special ICMP NAT
  handling, which is done in ip_nat_core.c:icmp_reply_translation().

- NEW (and RELATED non-ICMP)
  This part is essential for understanding the whole NAT subsystem.
  Only if the packet has the state NEW (i.e. it would establish a new
  connection if we accepted it) is the NAT table traversed, by calling
  ip_nat_rule.c:ip_nat_rule_find(), which in turn calls
  ip_tables.c:ipt_do_table() for the actual IP table traversal.
  The traversal ends either in ACCEPTing the packet as it is, or in one
  of the NAT targets (SNAT, DNAT and, if loaded, REDIRECT and
  MASQUERADE). Please see chapter FIXME for further description of
  those targets.

- ESTABLISHED
  This packet belongs to an already established connection. We don't
  need to traverse the NAT table again, as the necessary information
  (struct ip_nat_info) was already gained when the first packet of this
  connection entered the NEW case above. We just use what's in the
  struct ip_conntrack.nat.info.

For all packets, either in the NEW or RELATED case, we now call
ip_nat_core.c:do_bindings().

ip_nat_core.c:do_bindings()

The job of this function is to apply the address translation
information as stated in a struct ip_nat_info to a particular
packet / skb.
A struct ip_nat_info can hold information about multiple manipulations
(e.g. if we do SNAT and DNAT to one connection). So the first job is to
find out whether one of the ip_nat_info_manip's applies to this hook
and this flow direction. If so, we call ip_nat_core.c:manip_packet() to
do the real work.

ip_nat_core.c:manip_packet() has the job of calling the manip_pkt()
function of the NAT protocol helper applying to the packet's layer 4
protocol. See Chapter FIXME for more information about the
protocol-specific NAT manipulation.

Afterwards the source or destination IP address is changed according
to the information in the struct ip_conntrack_manip passed as an
argument to this function. The IP header's checksum is updated by
calling ip_nat_core.c:ip_nat_cheat_check().

After manip_packet() has done the work, we look into ip_nat_info again
to see whether there is an application helper module registered for
this packet. If there is one registered, we call it.

If there was no helper, or the helper returned successfully, we return
NF_ACCEPT to the calling function (most likely
ip_nat_standalone.c:ip_nat_fn()).

outgoing packets at the NF_IP_POST_ROUTING hook

ip_nat_standalone.c:init_or_cleanup() has registered the function
ip_nat_standalone.c:ip_nat_out() for packets traversing this hook.

ip_nat_standalone.c:ip_nat_out():

This function doesn't do very much... its main task is to provide
special handling for fragments by calling
ip_conntrack_core.c:ip_ct_gather_frags().

But wait... why do we encounter fragments at this hook? As we know,
connection tracking already did the defragmentation for us at the
NF_IP_PRE_ROUTING hook. But imagine what happens if the outgoing
interface has a smaller MTU than the packet's size: the routing code
(re-)fragments the packet.

After fragment reassembly, we call ip_nat_standalone.c:ip_nat_fn().
Please see Chapter FIXME, as the further processing is the same as in
the prerouting case.

outgoing packets at the NF_IP_LOCAL_OUT hook

We apply destination NAT for locally-generated packets at this hook,
right before the routing decision is made and the NF_IP_POST_ROUTING
hook is entered.

The initialization code has registered the function
ip_nat_standalone.c:ip_nat_local_fn() with this hook.

There's only a small difference from the NF_IP_PRE_ROUTING case: if the
applied manipulation has changed anything, we have to re-route the
packet by calling ip_nat_standalone.c:route_me_harder().

deciding which manipulation to apply

The decision about which manipulation applies to a particular
connection is made at the time the first packet of this connection
(i.e. the one assigned the state NEW by conntrack) traverses a NAT
hook.

As described in Chapter FIXME, ip_nat_rule.c:ip_nat_rule_find() is
called for the state == NEW case.

ip_nat_rule.c:ip_nat_rule_find()

FIXME: expectation

Afterwards, ip_tables.c:ipt_do_table() is called for the nat table.
The packet traverses the respective chain in the 'nat' table. As soon
as a rule matches, the respective target function is called. The two
built-in targets for the 'nat' table are SNAT and DNAT.

the SNAT target

ip_nat_rule.c:ipt_snat_target() is the function registered as the SNAT
target. It resolves the struct ip_conntrack for the connection this
packet belongs to and calls ip_nat_core.c:ip_nat_setup_info() with the
struct ip_nat_multi_range from the rule-specific private data (which
contains the --to-source address(es)).

ip_nat_core.c:ip_nat_setup_info():

FIXME

ip_nat_core.c:get_unique_tuple():

This function is called with a bunch of arguments: mainly pointers to
two struct ip_conntrack_tuple's (a new, manipulated one and an old,
unmodified one) as well as a struct ip_nat_multi_range which specifies
the range of addresses that are valid for the manipulation.

FIXME: find_appropriate_src case

Next we have to choose which new IP/port we want to map this packet,
and therefore this whole connection, to.
The whole process of making this decision seems fairly complex but can
be boiled down to:

- find_best_ips_proto_fast covers the easy case, where we have a
  'range' consisting of only one IP address.
- find_best_ips_proto does the more complex handling if this rule has
  a real range of more than one IP address. The result is always the
  least used tuple consisting of (src_ip,dst_ip,l4_prot).

After having determined the new IP address, we check if the
protocol-specific part of the new tuple-to-be is within the specified
range for this rule. If it is, and the tuple-to-be isn't already used,
we're done: we now have the exact mapping necessary for the NAT we want
to do.

If the range of this rule includes a specific port range, we have to
call the protocol's own 'unique_tuple' routine as well as check whether
this tuple is already in use.

If any of the conditions fails, we start the whole loop again with a
call to find_best_ips_proto_fast().

As soon as we have finally succeeded in getting a unique new tuple,
which is not already used and is within our rule's range, the function
returns, and the program continues in
ip_nat_core.c:ip_nat_setup_info().

Back in ip_nat_setup_info() we invert the newly acquired tuple and
alter our connection's struct ip_conntrack by calling
ip_conntrack_alter_reply().

If either the source or destination address has changed (which is most
likely to be true, as we're talking about address translation *g*), the
information in ip_conntrack.nat.info is updated according to exactly
what manipulation we've applied.

In addition the ip_conntrack.nat.info.helper field is updated, as our
changes might affect the application helper module.

Finally NF_ACCEPT is returned and we're done.

Please note that we just figured out which (if any) manipulation
applies to this packet. The actual modification of the real packet does
not happen within this target, but in manip_packet() as described in
chapter FIXME.
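
The selection loop described for get_unique_tuple() can be illustrated
with a small user-space sketch. This is not the kernel code: the
sketch_* structures, tuple_in_use(), ip_usage() and
sketch_get_unique_tuple() are hypothetical names invented for this
example, addresses are plain integers, only the source side is
rewritten, and the range is limited to 64 addresses. Excluding an
address that has already been tried is an assumption made so that the
retry terminates. The sketch only shows the general idea: pick the
least used address, keep the original port if it is in range and free,
otherwise search the port range, and retry with another address if
that fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration only. These are
 * NOT the kernel's struct ip_conntrack_tuple / ip_nat_multi_range. */
struct sketch_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  l4proto;
};

struct sketch_range {
    uint32_t min_ip, max_ip;     /* inclusive, at most 64 addresses */
    uint16_t min_port, max_port; /* inclusive */
};

struct sketch_table {
    const struct sketch_tuple *used; /* tuples already taken */
    size_t n_used;
};

/* Returns 1 if exactly this tuple is already taken. */
static int tuple_in_use(const struct sketch_table *tbl,
                        const struct sketch_tuple *t)
{
    for (size_t i = 0; i < tbl->n_used; i++) {
        const struct sketch_tuple *u = &tbl->used[i];
        if (u->src_ip == t->src_ip && u->dst_ip == t->dst_ip &&
            u->src_port == t->src_port && u->dst_port == t->dst_port &&
            u->l4proto == t->l4proto)
            return 1;
    }
    return 0;
}

/* Counts the taken tuples that share (src_ip, dst_ip, l4proto). */
static size_t ip_usage(const struct sketch_table *tbl, uint32_t src_ip,
                       uint32_t dst_ip, uint8_t l4proto)
{
    size_t n = 0;
    for (size_t i = 0; i < tbl->n_used; i++) {
        const struct sketch_tuple *u = &tbl->used[i];
        if (u->src_ip == src_ip && u->dst_ip == dst_ip &&
            u->l4proto == l4proto)
            n++;
    }
    return n;
}

/* Fills *out with an unused source mapping for *orig and returns 1,
 * or returns 0 if every address/port in the range is taken. */
int sketch_get_unique_tuple(const struct sketch_table *tbl,
                            const struct sketch_range *r,
                            const struct sketch_tuple *orig,
                            struct sketch_tuple *out)
{
    uint32_t n_ips = r->max_ip - r->min_ip + 1;
    uint64_t tried = 0; /* bit i set: address min_ip + i was tried */

    assert(r->min_ip <= r->max_ip && n_ips >= 1 && n_ips <= 64);
    assert(r->min_port <= r->max_port);

    for (uint32_t round = 0; round < n_ips; round++) {
        /* Pick the least used address not tried so far. */
        uint32_t best = 0;
        size_t best_use = (size_t)-1;
        for (uint32_t i = 0; i < n_ips; i++) {
            if (tried & (UINT64_C(1) << i))
                continue;
            size_t use = ip_usage(tbl, r->min_ip + i, orig->dst_ip,
                                  orig->l4proto);
            if (use < best_use) {
                best_use = use;
                best = i;
            }
        }
        tried |= UINT64_C(1) << best;

        *out = *orig;
        out->src_ip = r->min_ip + best;

        /* Easy case: the original port is in range and still free. */
        if (out->src_port >= r->min_port &&
            out->src_port <= r->max_port && !tuple_in_use(tbl, out))
            return 1;

        /* Otherwise look for any free port within the range. */
        for (unsigned int p = r->min_port; p <= r->max_port; p++) {
            out->src_port = (uint16_t)p;
            if (!tuple_in_use(tbl, out))
                return 1;
        }
        /* Nothing free on this address: start over with another one. */
    }
    return 0;
}
```

A caller would fill a struct sketch_table with the tuples currently in
use, describe the rule's range in a struct sketch_range and pass the
unmodified tuple as orig. On success, out holds the mapping that would
then be recorded for the connection; the packet itself is not touched
at this point, matching the split between choosing a manipulation and
applying it in manip_packet().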