Date: Wed, 8 Oct 2003 11:48:24 -0500
Subject: April 19th 2000 IPFW document
From: Paul Borman
To: BSDI List
Message-Id: <3BAC7B84-F9AF-11D7-8BF2-000A9599FC6A@kryslix.com>

As promised, here is my internal description of the IPFW language.

    -Paul

Content-Disposition: attachment; filename=ipfw.txt

Wed Apr 19 2000
Tue Dec 14 1999
Thu Sep  2 1999
Tue Jun 29 1999
Wed Dec 16 1998

IPFW BPF Command Syntax

The syntax below describes the features available in BSD/OS's BPF IP
Filters as of the above date.  Not all of the statements may be
available in the released version.  BSDI will release an updated
version of IPFW.

With the exception of the filter and bind statements (described below),
BPF IP Filters are executed serially.  The underlying BPF virtual
machine only allows forward jumps and does not provide for subroutine
calls or returns from subroutines.

The order in which statements are presented is important.  In general,
the filter should attempt to resolve the disposition of the packet as
quickly as possible.  When examining addresses it is generally faster
to start with the largest netmask and follow on to smaller netmasks.
For instance, the condition

        srcaddr(192.162.42.17, 192.162.44.0/24)

requires fewer BPF instructions than:

        srcaddr(192.162.42.0/24, 192.162.42.17)

However, if a majority of the packets are in the 192.162.42 network
then the second statement, while longer, may reduce the average
execution time of the filter.
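The ordering trade-off above can be illustrated with a small model.  The
following Python sketch is not how ipfw compiles or runs a filter; it
only simulates a serial, first-match scan over a list of ranges and
counts how many ranges are examined per packet.  The addresses, the
traffic mix, and the function names are all made up for illustration,
and the sketch counts range comparisons, not BPF instructions.

```python
import ipaddress

def checks_until_match(src, ranges):
    """Scan ranges in order; return (matched, number of ranges examined)."""
    addr = ipaddress.ip_address(src)
    for n, r in enumerate(ranges, start=1):
        if addr in ipaddress.ip_network(r):
            return True, n
    return False, len(ranges)

def average_checks(sources, ranges):
    """Average number of ranges examined over a list of source addresses."""
    total = sum(checks_until_match(s, ranges)[1] for s in sources)
    return total / len(sources)

# Hypothetical traffic: nine packets from 192.162.42.0/24, one from a
# single host elsewhere.
traffic = ["192.162.42.%d" % i for i in range(1, 10)] + ["192.162.44.5"]

host_first = ["192.162.44.5/32", "192.162.42.0/24"]
net_first = ["192.162.42.0/24", "192.162.44.5/32"]

if __name__ == "__main__":
    print("host first:", average_checks(traffic, host_first))
    print("net first: ", average_checks(traffic, net_first))
```

With this traffic mix, placing the busy network first lowers the average
number of ranges examined, even though the individual check against a
network may cost more than a check against a single host.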
The serial nature can also make certain conditions impossible, for
instance:

        srcaddr(192.162.42.0/24) {
            command-list
            reject;
        }
        srcaddr(192.162.42.17) {
            command-list
        }

The command-list of the second condition will never be executed since
all packets that could possibly match it were resolved by the first
condition.  However, the check will still be made for each packet that
does not match the first condition.

Due to the historical development of IPFW and its relationship to other
systems, "permit" is a synonym for "accept" and "deny" is a synonym for
"reject".  This document uses the terms "accept" and "reject", though
"permit" and "deny" may be used instead.

reject ;
reject [ nbytes ];
reject [ nbytes : userdefined ];

    Terminate the filter and reject the packet.  If nbytes is specified
    then report the first nbytes of the packet to the IPFW logging
    socket (if nbytes is 0 then the entire packet will be reported).
    If userdefined is specified then the 8 bit userdefined field in the
    return value is set to the specified value.  The userdefined field
    takes the form of one or more user defined values bitwise ORed
    together.  Each user defined value is described in one of the
    following ways:

        user0 - user255
        user0x0 - user0xff
            An integer value between 0 and 255

        userbit0 - userbit7
            An integer with only the specified bit, 0 through 7, set
            (1, 2, 4, 8, 16, 32, 64, or 128)

    For example, it may be desirable to reject SYN packets for FTP
    connections and forward the packet to user level.  Further, suppose
    we have decided that the high order bit of the user defined field
    being set indicates that the packet is being logged for examination
    by a user level program, and that the lower 7 bits determine which
    proxy should examine the request.
    In this sample, we use the value of 21:

        tcp && dstport(ftp/tcp) && tcpflags(syn) {
            reject [0 : userbit7 user21];
        }

    Of course, a proper filter would embed this such that there is only
    a single check for tcp packets in any path, and such that the
    dstport is also checked only once.  For example:

        switch ipprotocol {
        case tcp:
            established {
                accept;
            }
            switch dstport {
            case ftp/tcp:
                tcpflags(syn) {
                    reject [0 : userbit7 user21];
                }
                reject;
                break;
            case telnet/tcp:
                ...
                break;
            }
            break;
        }

reject icmp ( icmptype, icmpcode ) ;
reject icmp ( icmptype, icmpcode ) [ nbytes ];
reject icmp ( icmptype, icmpcode ) [ nbytes : userdefined ];

    Operates as the reject statement above, except that an ICMP packet
    is returned with the specified ICMP type and code.  The ICMP type
    and code can each be either an integer or one of the following
    (codes are indented under the type they belong to):

        echo
        echoreply
        ireq
        ireqreply
        maskreply
        paramprob
            optabsent
        redirect
            host
            net
            toshost
            tosnet
        routeradvert
        routersolicit
        sourcequench
        timxceed
            intrans
            reass
        tstamp
        tstampreply
        unreach
            host
            host_prohib
            host_unknown
            isolated
            needfrag
            net
            net_prohib
            net_unknown
            port
            protocol
            srcfail
            toshost
            tosnet

accept ;
accept [ nbytes ];
accept [ nbytes : userdefined ];

    Terminate the filter and accept the packet.  See the reject
    statement for a description of the nbytes and userdefined
    parameters.

return [ number ];

    Returns a numeric value from the filter.  This is used by the
    classification filter for rate filtering to return which rate
    filter the packet belongs to.

next ;

    Terminate this filter and proceed on to the next filter in the
    chain.  If this is the last filter in the chain then the packet is
    rejected.

call ( "filter-tag" ) ;
call ( "filter-tag" : filter-specific-data ) ;
call ( filter-index ) ;
call ( filter-index : filter-specific-data ) ;

    See the discussion below in the condition section.  A call() can be
    both a statement and a condition.

chain ( "filter-tag" ) ;

    Calls the filter on the call chain that has the matching tag.
    The current filter is terminated with the same value as the chained
    filter, except that no bytes will be logged (the chained filter
    will have already logged the bytes).

chain ( filter-index ) ;

    Same as above, but the actual index into the call chain is used.
    You should almost always use the above version.

implicit reject ;
implicit reject [ nbytes ];
implicit reject [ nbytes : userdefined ];
implicit accept ;
implicit accept [ nbytes ];
implicit accept [ nbytes : userdefined ];
implicit next ;

    If the end of the filter is reached, act as if the specified
    statement was issued.  It is generally considered much better to
    explicitly place an ending statement rather than using the implicit
    keyword.

local ;

    This experimental statement sets the mbuf flag M_LOCAL.  This flag,
    if supported by the system and when set by a pre-input filter,
    forces the packet to be viewed as a local packet.  This is used for
    experimental transparent proxy filters.

ipv4 { command-list }

    If the IP packet is version 4 then execute the command-list.  Use
    IP version 4 addresses and offsets in the command-list.  This is
    the default mode.

ipv6 { command-list }

    If the IP packet is version 6 then execute the command-list.  Use
    IP version 6 addresses and offsets in the command-list.

ipv6 ;

    Assume IP version 6 for all further statements.  This command does
    not generate any BPF assembly instructions.  It simply adjusts the
    internal state of the compiler.  This would be used for a filter
    that can only have IPv6 packets passed to it.  The
    "ipv6 { command-list }" statement should be used for filters that
    may see both version 4 and version 6 packets.

ipv4inipv4 { command-list }

    If the IP packet is the IP-in-IP protocol then execute the
    command-list.  The outside packet, or wrapper, will be examined
    while executing the command-list, while any commands following this
    statement will examine the encapsulated packet.
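The distinction between the wrapper and the encapsulated packet can be
pictured with a short Python sketch.  It is only an illustration of
where the two headers sit in the packet, not a description of what the
ipfw compiler generates.  It assumes an IPv4 wrapper that carries IP
protocol number 4 (IP-in-IP); the function name split_ipip is
hypothetical.

```python
IPPROTO_IPIP = 4  # IP protocol number for IP-in-IP encapsulation

def split_ipip(packet):
    """Return (outer_header, inner_packet) for an IPv4 IP-in-IP packet.

    Returns None if the packet is not IPv4, is too short, or does not
    carry the IP-in-IP protocol.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    hlen = (packet[0] & 0x0F) * 4       # outer header length in bytes
    if hlen < 20 or len(packet) < hlen:
        return None
    if packet[9] != IPPROTO_IPIP:       # byte 9 is the protocol field
        return None
    return packet[:hlen], packet[hlen:]
```

In these terms, the command-list of ipv4inipv4 looks at outer_header,
and the statements that follow it look at inner_packet.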
condition { command-list }
condition { command-list } else { command-list }

    Most statements are condition statements in one of the two forms
    above.  Conditions can be combined to form a single condition using
    the following syntax:

        condition || condition    True if either condition is true
        condition && condition    True if both conditions are true
        ( condition )             Simple grouping
        ! condition               True if condition is false

    If the condition evaluates as true then the first command-list is
    executed.  If the condition is not true then the first command-list
    is skipped.  If a second command-list, following an "else", is
    specified then it will be executed if the condition does not
    evaluate as true.

switch item {
case range:
    command-list
    break ;
case range:
    command-list
    break ;
default:
    command-list
    break ;
}

    This statement examines a single item and determines the list of
    commands to execute based on its value.  Comparisons are done
    sequentially until a case matches, the default case is reached, or
    we drop out the bottom of the switch.  Cases may overlap and the
    first match is used.  Since the default case matches all values, it
    must be the last case.

    Each case must terminate its command-list with the break statement,
    which terminates the command-list for that case.  The only
    exception is that a case with an empty command-list may fall
    through to the next case:

        switch srcaddr {
        case 192.168.42.17:
        case 192.168.42.23:
            command-list
            break;
        ...
        }

    It is possible to write cases which are impossible to execute due
    to this serial nature.  For example:

        switch srcaddr {
        case 192.168.42.0/24:
            command-list
            break;
        case 192.168.42.17:
            command-list
            break;
        ...
        }

    The second case will never be reached as the first case always
    matches every packet that could possibly match the second case.

    In general, the switch statement is more efficient than multiple
    conditional statements.
    The items that may be switched on, and described in more detail
    below, are:

        "string"      accumulate    data          dstaddr
        dstport       icmp          icmpcode      icmptype
        ipdata        ipdatalen     iphlen        iplen
        ipoffset      ipv           packetlength  protocol
        srcaddr       srcport

block {
case cond :
    command-list
    break ;
case cond :
    command-list
    break ;
...
default:
    command-list
    break ;
}

    A block statement is the equivalent of, but less unwieldy than:

        cond {
            command-list;
        } else {
            cond {
                command-list;
            } else {
                ...
                else {
                    command-list;
                }
            }
        }

    A block statement requires at least one case and may have an
    optional default case.  This syntax predates the switch syntax and
    is generally no longer used.

Conditions defined are:

call ( "filter-tag" )
call ( "filter-tag" : filter-specific-data )

    Calls the filter on the call chain with the specified tag.  If the
    IPFW_ACCEPT bit (0x80000000) is set in the return value of the
    filter (i.e., accept rather than reject was used to terminate the
    filter) then the condition evaluates as true.  If not, it evaluates
    as false.

    The second variation, with filter-specific-data, passes a numeric
    value, 0 - 15, to the filter.  This 4 bit number can be used by the
    filter to do filter specific actions.  For example, the circuit
    cache uses the low order bit to determine if this packet should be
    entered into the cache if not found.  This could be used something
    like:

        tcp {
            call("tcp-filter") {
                accept;
            }
            established {
                reject;
            }
            dstaddr (www.company.dom) && dstport(http/tcp) {
                call("tcp-filter" : 1);
                accept;
            }
        }

    You will notice that unlike most conditions, a call() can be made
    as a statement as well.  This is used in conjunction with stateful
    filters, such as a circuit cache or the throttling filter.  You can
    limit calls to the filter to only those packets that should be
    checked.

call ( filter-index )
call ( filter-index : filter-specific-data )

    Same as above, but the actual index into the call chain is used.
    You should almost always use the above version.
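To make the call() semantics above concrete, here is a small Python
model.  It is a sketch under stated assumptions, not the BSD/OS circuit
cache: the class ToyCircuitCache, its run() method, and the choice to
report a cache miss as a plain reject (return value 0) are all
hypothetical.  Only the IPFW_ACCEPT bit value and the 4 bit range of
filter-specific-data come from the text.

```python
IPFW_ACCEPT = 0x80000000  # accept bit in a filter's return value

class ToyCircuitCache:
    """Hypothetical stand-in for a stateful filter on the call chain."""

    def __init__(self):
        self.circuits = set()

    def run(self, circuit, data=0):
        if not 0 <= data <= 15:
            raise ValueError("filter-specific-data is a 4 bit number")
        if circuit in self.circuits:
            return IPFW_ACCEPT      # known circuit: accept
        if data & 1:                # low order bit: enter into the cache
            self.circuits.add(circuit)
        return 0                    # assumption: a miss is reported as reject

def call(filt, circuit, data=0):
    """call() used as a condition: true if the accept bit is set."""
    return bool(filt.run(circuit, data) & IPFW_ACCEPT)

if __name__ == "__main__":
    cache = ToyCircuitCache()
    circuit = ("10.0.0.1", 1025, "10.0.0.2", 80)
    print(call(cache, circuit))      # miss, not entered
    call(cache, circuit, 1)          # used as a statement: enter the circuit
    print(call(cache, circuit))      # now found in the cache
```

This mirrors the filter fragment above: the first call() is a pure
lookup, and the call with filter-specific-data 1 is made only for
packets that the filter has already decided to allow.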
decapsulated

    Returns true if this packet was decapsulated from a tunneled
    packet, such as an IP-in-IP or IPSec tunnel.  This is the same as

        ! preheader (0)

established

    This condition should only be called for TCP packets.  It checks to
    see if either the ACK or RST bit is set in the TCP flags for this
    packet.  It evaluates as true if either bit is set, false if
    neither bit is set.  This is often used at the top of a filter to
    simply allow established TCP sessions to continue as quickly as
    possible.  One method is to place:

        tcp && established {
            accept;
        }

    at the top of the filter.  A second is to place it later in the
    filter, after it is determined that the packet is a TCP packet:

        switch ipprotocol {
        case tcp:
            established {
                accept;
            }
            command list...
            break;
        ...
        }

forwarding

    This condition examines the mbuf flags to see if the M_FORW flag is
    set and returns true if it is set and false if not.  This will
    always evaluate as true for the forward filter.  It can be used in
    the pre-output filter to distinguish between locally generated
    packets and forwarded packets.  For example, a pre-output filter
    might choose to only work on locally generated packets:

        forwarding {
            accept;
        }
        commands....

broadcast

    This condition examines the mbuf flags to see if the M_BCAST flag
    is set and returns true if it is set and false if not.  For
    example, to filter out all broadcast packets a filter could say:

        broadcast {
            reject;
        }

toobig

    This condition adds the length of this packet to the offset of this
    packet.  If the sum is greater than 65535 then this condition
    evaluates as true, false if not.  This can be used to filter out
    the forwarding of "ping of death" packets:

        toobig {
            reject;
        }

ipfrag

    This condition evaluates as true if either the "more fragments" bit
    is set or the offset for this packet is not 0.  Some attackers will
    try to fragment a packet such that a firewall is unable to gain
    enough information about the packet to determine whether it should
    be permitted.
    Note that the input filter is called only after the packet has been
    reassembled, so this statement has no meaning in an input filter.
    All other filters, including the pre-input and forward filters, can
    see fragmented packets.  Since TCP should never see fragments, the
    following might be appropriate:

        tcp && ipfrag {
            reject;
        }

ipmorefrag

    This condition is true if the "more fragments" bit is set in the IP
    packet.

ipfirstfrag

    This condition is true if the "more fragments" bit is set in the IP
    packet and the offset is 0.  An ambitious filtering system could
    key on this information to allow future parts of the fragment
    through (thus requiring the first fragment to always arrive first).
    Such a filter is beyond the scope of the technology provided in
    BSD/OS but is not prevented by the framework provided.

ipdontfrag

    This condition is true if the "don't fragment" bit is set in the IP
    packet.  ipdontfrag and ipfrag should never both be true.  This bit
    should always be set for TCP packets.  The following might then be
    appropriate:

        tcp && ! ipdontfrag {
            reject;
        }

    An aggressive filter might combine several of the above (ipdatalen
    is described below):

        toobig {
            reject;
        }
        tcp {
            ipfirstfrag && ipdatalen(<40) {
                reject;
            }
            ipfrag || !ipdontfrag {
                reject;
            }
            established {
                accept;
            }
        }

ip, icmp, igmp, ggp, ipip, tcp, egp, pup, udp, idp, tp, eon, encap, ospf

    These conditions are true if the IP packet's protocol field matches
    the specified protocol.  Numeric protocol values, or ranges of
    protocol values, can be examined by the protocol condition
    described below.  While the protocol can be used directly, as in:

        tcp {
            ...
        }

    it is often better to use the switch statement on ipprotocol:

        switch ipprotocol {
        case tcp:
            ...
        case udp:
            ...
        ...
        }

input interface ( interfaces )
output interface ( interfaces )
return interface ( interfaces )

    These conditions evaluate as true if the requested interface
    matches one of the specified interfaces.  The input interface is
    the interface the packet arrived on.
    The output interface is the interface the packet is believed to be
    going out on.  The return interface is the interface we would use
    to return a packet to the source address.  Typically the return
    interface should be the same as the input interface.

    For example, a system which uses exp0 for its internal network and
    exp1 for the external network might say in a pre-input filter or
    forward filter:

        //
        // Don't let packets coming in from the external network
        // spoof addresses on the internal interfaces
        //
        input interface (exp1) {
            ! return interface (exp1) {
                reject;
            }
        }

    An output or pre-output filter might have:

        //
        // Don't let packets going out to the external network
        // spoof external addresses
        //
        output interface (exp1) {
            return interface (exp1) {
                reject;
            }
        }

    A forward filter might have:

        //
        // Don't allow external packets to be routed through us
        //
        input interface (exp1) && output interface (exp1) {
            reject;
        }

The following conditions each take a series of ranges.  Ranges can be
specified as type specific data (which are described with the type
below).  In general, this is a comma delimited set of ranges.  Each
range can be a single "number", or a range of values.  The following
syntax is used to specify a range:

    < number
    <= number
    > number
    >= number
        Any value less than, less than or equal to, greater than, or
        greater than or equal to number, respectively

    number - number
        Any value between the two numbers, inclusive

    number / number
        This is not division.  The second number must be a value
        between 1 and 32 and defines a network mask.  The first number
        is then compared with the value, with the generated netmask
        applied to both.  The most common use is with srcaddr and
        dstaddr:

            srcaddr (1.2.3.4/29)

    number & number
        Compare the first number ANDed with the second number to the
        value ANDed with the second number.

"variable" ( ranges )

    Evaluates as true if the number associated with "variable" is
    within the specified ranges.
    This syntax was originally added to allow hand optimization of the
    BPF code.  However, the need for this was basically eliminated when
    the optimization phase was added to the assembler (ipfwasm).  For
    example, this was used as:

        "DSTADDR" = dstaddr;
        ...
        "DSTADDR" (1.2.3.4) {
            ...
        }

    The former advantage was the ability to load the value out of the
    mbuf only once and store it in a scratch memory location.
    Extraction of data directly from the mbuf is orders of magnitude
    more expensive than using scratch memory.  The optimizer now
    automatically caches loads directly from the mbuf into scratch
    memory, but only if the result is used more than a single time.

accumulator ( ranges )

    Tests the value of the A register internally used in the BPF
    engine.  This is only of use when embedded BPF assembly is included
    in the filter.  For instance:

        LDX 4 * ([0] & 0xf)
        LD [X + 0]
        RSH # 4
        accumulator (4) {
            ...
        }

    This loads the IP header length into the X register, then loads the
    first byte of the data following the IP header and shifts the upper
    nibble of this value down.  This would be the IP version of an
    encapsulated packet.

data [ index : size ] ( ranges )
data [ index ] ( ranges )

    Extract size bytes (1, 2, or 4, defaulting to 1) starting index
    bytes into the packet and check the resulting value against the
    specified ranges.  The data is assumed to be in network byte order.

dstaddr ( ranges )
srcaddr ( ranges )

    Compare the destination or source IP address of the packet to the
    values within the specified ranges.  In addition to integers,
    values may be dotted quads (i.e., 1.2.3.4) or host names.  Host
    names are looked up at the time the filter is compiled and only the
    first address returned is used.  A range of IP addresses may also
    be expressed in the shorthand (1.2.3.4-.9), which is the same as
    (1.2.3.4-1.2.3.9).

dstport ( ranges )
srcport ( ranges )

    Compare the destination or source port (TCP and UDP packets) to the
    values within the specified ranges.
    In addition to integers, values may be the name of a service
    specified in the /etc/services file.  For instance:
    dstport(smtp/tcp).  Note that the trailing /tcp or /udp is
    required.

icmp ( ranges )

    This should only be used on packets known to be of the ICMP
    protocol.  Compares the two byte ICMP type and code value to the
    values within the ranges specified.  ICMP types and codes can be
    specified using the special form of:

        [ icmptype, icmpcode ]

    The types and their associated codes (indented under each type)
    are:

        echo
        echoreply
        ireq
        ireqreply
        maskreply
        paramprob
            optabsent
        redirect
            host
            net
            toshost
            tosnet
        routeradvert
        routersolicit
        sourcequench
        timxceed
            intrans
            reass
        tstamp
        tstampreply
        unreach
            host
            host_prohib
            host_unknown
            isolated
            needfrag
            net
            net_prohib
            net_unknown
            port
            protocol
            srcfail
            toshost
            tosnet

icmptype ( ranges )

    This should only be used on packets known to be of the ICMP
    protocol.  Compares the ICMP type to the specified ranges.  In
    addition to numeric values, the following values may also be used:

        echo            echoreply       ireq            ireqreply
        maskreply       maskreq         paramprob       redirect
        routeradvert    routersolicit   sourcequench    timxceed
        tstamp          tstampreply     unreach

icmpcode ( ranges )

    This should only be used on packets known to be of the ICMP
    protocol.  Compares the ICMP code to the specified ranges.  In
    addition to numeric values, the following values may also be used.
    Note that unlike the icmp condition, the icmpcode condition has no
    knowledge of the ICMP type.

        paramprob_optabsent
        redirect_host
        redirect_net
        redirect_toshost
        redirect_tosnet
        timxceed_intrans
        timxceed_reass
        unreach_host
        unreach_host_prohib
        unreach_host_unknown
        unreach_isolated
        unreach_needfrag
        unreach_net
        unreach_net_prohib
        unreach_net_unknown
        unreach_port
        unreach_protocol
        unreach_srcfail
        unreach_toshost
        unreach_tosnet

ipdata [ index : size ] ( ranges )
ipdata [ index ] ( ranges )

    Extract size bytes (1, 2, or 4, defaulting to 1) starting index
    bytes into the data of the IP packet and check the resulting value
    against the specified ranges.
    The data is assumed to be in network byte order.  For instance, to
    ban packets that do not have a UDP checksum:

        udp {
            ipdata[6:2](0) {
                reject;
            }
        }

ipdatalen ( ranges )

    Compares the length of the data in the IP packet to the specified
    ranges.  This is computed by taking the total length of the IP
    packet and subtracting the length of the IP header.

iphlen ( ranges )

    Compares the length of the IP header to the specified ranges.

iplen ( ranges )

    Compares the total length of the IP packet to the specified ranges.

ipoffset ( ranges )

    Compares the fragment offset of this IP packet to the specified
    ranges.  For non-fragments or the first fragment of a fragmented
    packet this value will be zero.  This can be used to prevent
    overwriting of port data after the first fragment has been passed
    through:

        //
        // Reject any packet that has an offset between 1 and 4.
        // The port numbers are located in the first 4 bytes for
        // both TCP and UDP packets.
        //
        ipoffset (1-4) {
            reject;
        }

ipv ( ranges )

    Compares the version field of the IP packet with the specified
    ranges.  Normally the ipv4 or the ipv6 statement would be used if
    this information is of interest.

packetlength ( ranges )

    Compares the total packet length, as reported in the mbuf header,
    to the specified ranges.  This may be different from iplen, which
    is taken from the IP header.

preheader ( ranges )

    Compares the number of bytes that preceded the header to the
    specified ranges.  If any bytes precede the header, they are an
    encapsulation header.  Normally the "decapsulated" condition would
    be used.  Preheader could be used to filter on a particular
    encapsulation header length, though we have no known use for that
    at this time.

protocol ( ranges )

    Compares the IP protocol number of the packet with the specified
    ranges.  In addition to numeric values, the following specific IP
    protocols may be used:

        ip      icmp    igmp    ggp     ipip    tcp     egp
        pup     udp     idp     tp      eon     encap   ospf

tcpflags ( ranges )

    This should only be used on TCP packets.
    It compares the flags in the TCP header with the values in the
    specified ranges.  Unlike other conditions, the only types of
    ranges that can be used are individual values or masks.  Individual
    values are only tested to see if the specified bit is set.  They do
    not do an actual comparison.  For instance, a packet with both the
    SYN and FIN bits set will match all three conditions:

        tcpflags(fin)
        tcpflags(syn)
        tcpflags(fin, syn)

    A packet which has only the FIN bit set will match both:

        tcpflags(fin)
        tcpflags(fin, syn)

    but would not match

        tcpflags(syn)

    The flags that may be tested are:

        fin     syn     rst     push    ack     urg

filter "name" { command-list }

    Defines a filter named "name" that can be used later with the bind
    command.  The command-list must terminate the filter.  There is no
    concept of "falling through" the bottom of the filter and
    continuing on with the commands following the bind.

bind "name" ;

    Calls the previously defined filter named "name".  No statements
    following the bind are executed, as the called filter must
    terminate the filter.

    The filter and bind statements are used to perform common
    functions.  For instance, you may use these to insert a ban list of
    IP addresses that only takes effect after all other rules have been
    evaluated:

        filter "ban-evil" {
            srcaddr(205.199.212.0/24, 205.199.2.0/24) {
                reject[120];
            }
            accept;
        }

        #define ACCEPT  bind "ban-evil"

change srcaddr from addr to addr ;
change dstaddr from addr to addr ;

    These experimental statements adjust either the source or
    destination IP address from the first addr to the second.  The
    first addr is only specified to allow precomputation of the delta
    to the checksum.  The filter is responsible for only executing this
    statement when the first addr does match the current
    source/destination address.  These statements assume the packet is
    of the TCP protocol, as they adjust both the IP checksum and the
    TCP checksum.  These are only intended for use in a pre-input or
    pre-output filter.
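The idea of precomputing a checksum delta from the old and new address
can be sketched in Python.  This is an illustration of the standard
one's complement incremental update (the form HC' = ~(~HC + ~m + m')
from RFC 1624), not the code BSD/OS generates for the change statement.
All function names and the sample addresses are hypothetical.  The same
delta applies to the TCP checksum, since the addresses are part of the
TCP pseudo-header.

```python
import socket
import struct

def ones_sum(words):
    """One's complement sum of 16 bit words, with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum(data):
    """Internet checksum of data (length must be even)."""
    words = struct.unpack("!%dH" % (len(data) // 2), data)
    return ~ones_sum(words) & 0xFFFF

def addr_words(addr):
    """Split a dotted quad into its two 16 bit words."""
    return struct.unpack("!2H", socket.inet_aton(addr))

def checksum_delta(old, new):
    """Precompute ~old + new; depends only on the two addresses."""
    removed = [~w & 0xFFFF for w in addr_words(old)]
    return ones_sum(removed + list(addr_words(new)))

def apply_delta(cksum, delta):
    """Update an existing checksum without re-reading the packet."""
    return ~ones_sum([~cksum & 0xFFFF, delta]) & 0xFFFF

def ipv4_header(src, dst):
    """Minimal 20 byte IPv4 header with a zero checksum field (for demo)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))

if __name__ == "__main__":
    old, new, dst = "10.0.0.1", "192.168.42.17", "10.0.0.2"
    delta = checksum_delta(old, new)           # computed once, up front
    before = checksum(ipv4_header(old, dst))
    print(apply_delta(before, delta) == checksum(ipv4_header(new, dst)))
```

Because the delta depends only on the two addresses, it can be computed
once when the filter is built, which is why the statement needs the
first addr even though the packet already carries it.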
set srcaddr ( addr ) ;
set dstaddr ( addr ) ;

    These experimental statements set either the source or destination
    IP address to the specified addr.  The checksum is not modified.
    In a pre-output filter, however, the IP checksum has not yet been
    computed (though the TCP or UDP checksum has already been computed
    and will still need updating).

fastfwd ;

    This experimental statement sets the mbuf flag M_CANFASTFWD.  This
    flag is not part of any released or beta BSD/OS system.  This
    feature only exists for certain research systems.