IPTables which allows inbound connection but disallows inbound traffic - tcp

I am trying to set up iptables so that a client can connect to the server and listen to a stream of messages over TCP. However, we want to block the client from sending any messages once connected (it is OK if the client is DROP'ed in this case).
Is there a way to allow a client to connect and enforce a 1 way communication from the server to the client only?
This needs to work purely within iptables (no software proxy-like solution).

The primary problem to be solved here is that we can't just shut down all the traffic from the client after the connection has been established, because TCP is a positive acknowledgement protocol. If the server does not receive acks from the client, it will retransmit, and eventually time out. In what follows, I will assume that we are using IPv4.
So what we want to do is to allow the connection to be established, and then only allow acknowledgements from the client, that is, packets that contain no TCP payload.
Unfortunately, the length of the TCP payload is not explicitly represented in the TCP header. We can try to use the total length in the IP header, but this is complicated by the fact that both the IP header and the TCP header include variable-length options fields, so there are many possible total lengths for an ack with no payload.
Since IP options are rarely used and commonly filtered, let's simplify things by first dropping all packets that contain options in the IP header (if your firewall is not already doing so). The implications of doing so are discussed at length here.
To do this, we will drop all traffic to our server (here taken to be 10.2.3.4:1234) where the IP header length (bits 4-7 of byte 0 in the IP header) is not 5:
iptables -A INPUT -p tcp -d 10.2.3.4 --dport 1234 \
-m u32 --u32 "0>>24&0xF=6:0xF" -j DROP
This uses the iptables u32 module to grab 4 bytes from the packet starting at byte 0, right shift the value 24 bits, mask off everything but the lower nibble, and then drop the packet if the result is in the range 6-15. Note that 5 words (20 bytes) is the minimum size of an IP header.
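To make the u32 arithmetic concrete, here is a minimal Python sketch that mimics what the expression computes on a raw IPv4 header. The helper names (u32_at, ip_header_words) and the sample header bytes are made up for illustration; this is not how the kernel module is implemented.

```python
import struct


def u32_at(packet: bytes, offset: int) -> int:
    """Return the 4 bytes starting at `offset` as a big-endian unsigned integer,
    which is what a bare offset in a u32 expression reads."""
    return struct.unpack_from("!I", packet, offset)[0]


def ip_header_words(packet: bytes) -> int:
    """Mimic the u32 expression 0>>24&0xF: the header-length nibble of byte 0."""
    return (u32_at(packet, 0) >> 24) & 0xF


# Hypothetical 20-byte IPv4 header: version 4, header length 5 words,
# total length 40, protocol TCP, destination 10.2.3.4.
plain_header = bytes.fromhex("450000280000400040060000c0a801010a020304")

# Same header, but claiming 6 words, i.e. 4 bytes of IP options follow.
header_with_options = bytes.fromhex("46") + plain_header[1:] + b"\x00" * 4

for name, header in (("plain", plain_header), ("options", header_with_options)):
    words = ip_header_words(header)
    # The DROP rule matches when the nibble falls in the range 6..0xF.
    print(name, words, "DROP" if 6 <= words <= 0xF else "pass")
```

Only the first rule's test is reproduced here; the later rules read a different offset (32) and shift by 28, but follow the same read, shift, mask pattern.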
The situation with TCP options is a bit more complicated. While the connection is being established, many different options may be used, e.g., to negotiate window scaling. However, once the connection is established, the only options we need to worry about are TCP timestamps and selective acks. So let's let the connection be established:
iptables -A INPUT -p tcp -d 10.2.3.4 --dport 1234 \
--tcp-flags SYN SYN -j ACCEPT
Note that it is possible to send a payload in a SYN packet, so here we are not completely meeting your requirement. Most ordinary TCP implementations will not do so, although TCP fast open does. If you want to mitigate this, you could drop SYN packets that are fragments (that could reassemble to something very large) and limit the overall length of the non-fragmented SYN packets to something reasonable that would allow for the usual options to be present in the TCP three way handshake. Note that the above rule was added to the INPUT chain, which is processed after IP fragment reassembly.
OK, so we can get the TCP connection established, and the IP headers are restricted to 5 words (20 bytes).
However, the TCP headers may contain selective acks, TCP timestamps, both, or neither. Let's start with TCP headers with no options. An ack with no options and no payload will consist of a 5 word IP header followed by a 5 word TCP header followed by no data, so the total length in the IP header would be 40. If the packet were a fragment, it could conceal a payload in a subsequent fragment, but since we are working with the INPUT chain, which is processed after IP fragment reassembly, we don't have to worry about this.
iptables -A INPUT -p tcp -d 10.2.3.4 --dport 1234 \
-m u32 --u32 "32>>28=5 && 0&0xFFFF=40" -j ACCEPT
The IP header is 20 bytes and the data offset nibble is in byte 12, so taking the 4 bytes starting at byte 32=20+12, we shift the nibble down and compare it to five, and then make sure that the total length in bytes 2 and 3 of word 0 of the IP header is 40.
If there are TCP timestamps in the TCP header, then there will be an additional 12 bytes (3 words) in the TCP header. We can accept that in a similar manner:
iptables -A INPUT -p tcp -d 10.2.3.4 --dport 1234 \
-m u32 --u32 "32>>28=8 && 0&0xFFFF=52" -j ACCEPT
I'll leave it as an exercise for the reader to work out the other combinations. (Note that dealing with selective ack is several cases as there can be 1-4 selective ack blocks, or 1-3 selective ack blocks with timestamps.)
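As a starting point for that exercise, the following Python sketch enumerates the header shapes and prints one rule per shape. It is untested against a real firewall and rests on a stated assumption: each option is padded to a 4-byte boundary on its own (timestamps occupy 12 bytes, a selective ack with n blocks occupies 4 + 8n bytes), which is how Linux lays them out; a stack that packs options differently would produce other lengths. The function names are made up for illustration.

```python
IP_HEADER_BYTES = 20    # option-free IPv4 header, enforced by the first rule
TCP_BASE_BYTES = 20     # TCP header without options
TIMESTAMP_BYTES = 10    # kind + length + two 4-byte timestamps
SACK_BASE_BYTES = 2     # kind + length
SACK_BLOCK_BYTES = 8    # left edge + right edge
MAX_OPTION_BYTES = 40   # data offset is a 4-bit word count: 15 * 4 - 20


def pad4(length: int) -> int:
    """Round up to a multiple of 4 (assumed per-option padding)."""
    return (length + 3) // 4 * 4


def empty_ack_shapes():
    """Return sorted (data_offset_words, ip_total_length) pairs for
    payload-free acks with every timestamp / selective ack combination."""
    shapes = set()
    for timestamps in (False, True):
        for sack_blocks in range(0, 5):
            option_bytes = 0
            if timestamps:
                option_bytes += pad4(TIMESTAMP_BYTES)
            if sack_blocks:
                option_bytes += pad4(SACK_BASE_BYTES + SACK_BLOCK_BYTES * sack_blocks)
            if option_bytes > MAX_OPTION_BYTES:
                continue  # does not fit in a TCP header
            tcp_header = TCP_BASE_BYTES + option_bytes
            shapes.add((tcp_header // 4, IP_HEADER_BYTES + tcp_header))
    return sorted(shapes)


def u32_match(data_offset: int, total_length: int) -> str:
    """Build the u32 match in the same form as the rules above."""
    return f'-m u32 --u32 "32>>28={data_offset} && 0&0xFFFF={total_length}"'


for data_offset, total_length in empty_ack_shapes():
    print("iptables -A INPUT -p tcp -d 10.2.3.4 --dport 1234 \\")
    print("  " + u32_match(data_offset, total_length) + " -j ACCEPT")
```

Some combinations collapse into the same shape (timestamps alone and a single selective ack block both give a data offset of 8 and a total length of 52), so fewer rules are printed than there are combinations. The sketch only prints ACCEPT rules; anything else sent to the port still has to be dropped by a later rule or by the chain policy.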
DISCLAIMER: I did not actually try this out, so my apologies if there is a typo or if I overlooked something. I believe the strategy to be sound, and if there is any error or omission, let me know and I will correct.

Related

How can I look for uneven port numbers only using BPF?

How can I get a tcpdump that contains only uneven port numbers, using BPF?
You can first retrieve the source TCP port with tcp[0:2], i.e., the first 2 bytes of the TCP header. Then, checking if that value is odd, is a simple matter of checking if the last bit is 1:
tcp[0:2] & 1 == 1
To extend this to UDP ports, only the protocol keyword needs to change (udp[0:2] instead of tcp[0:2]), because the source and destination ports are at the same offsets in the UDP header as in the TCP header.
I'll let you extend to the destination ports :-)
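For anyone who wants to check the bit test outside of a capture filter, here is a minimal Python sketch of the same logic applied to raw header bytes. The function names and the sample bytes are made up for illustration; the only facts relied on are that both TCP and UDP headers start with a 16-bit source port followed by a 16-bit destination port.

```python
import struct


def ports(transport_header: bytes):
    """Return (source_port, destination_port) from the first 4 bytes of a
    TCP or UDP header; both are big-endian 16-bit fields."""
    return struct.unpack_from("!HH", transport_header, 0)


def is_odd(port: int) -> bool:
    """Same test as the filter: the lowest bit of an odd number is 1."""
    return port & 1 == 1


# Hypothetical header prefix: source port 53523 (0xd113), destination port 80.
src, dst = ports(bytes.fromhex("d1130050"))
print(src, is_odd(src))
print(dst, is_odd(dst))
```

In the filter language, the destination port is the 2 bytes at offset 2, which corresponds to the second value returned by ports() here.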

How to stop tcpdump from truncating the HTTP payload?

I'm trying to do a capture and am seeing half the payload with '[!http]' half way through. Is there a way to make it show the whole payload?
I'm performing a REST call and want to see what the server is receiving:
<value>0x100000</value>
</attribute>
</greater-than-or-equals>
<less-than-or-equals>
<attribute id="0x129fa">
[!http]
16:34:06.549662 IP (tos 0x0, ttl 255, id 49564, offset 0, flags [DF], proto TCP (6), length 1301)
I'm using:
tcpdump -vvv -i ens192 tcp port 8080 and src 192.168.1.1
Any help would be appreciated.
Many Thanks
Frank
It is quite likely that the problem has less to do with what you are capturing and more to do with the payload being larger than a single packet.
When you run tcpdump these days, the default snapshot length is large enough to capture whole packets (older versions truncated each packet to a much smaller default). If you are unsure, you can override this by specifying a capture length of zero, which selects the maximum:
tcpdump -s 0 -w captureFile.cap
Again, this is likely not the problem here. It is more likely that the rest of the data is in the next TCP segment. Unfortunately, tcpdump is not the ideal tool for extracting session data. I would suggest that you look at Wireshark (or tshark) which will allow you to easily select a packet and then reassemble the data stream with all of the IP and TCP headers removed.
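To illustrate what reassembling the data stream involves, here is a minimal Python sketch that stitches the payloads of one direction of one connection back together by sequence number. It is only a rough picture of the idea, not how Wireshark implements it: it assumes the sequence numbers do not wrap around, that the segments have already been extracted as (sequence number, payload) pairs, and the function name is made up.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) pairs belonging
    to one direction of a single TCP connection.

    Assumptions: no sequence number wraparound, and no missing segments
    (a gap raises ValueError instead of being papered over).
    """
    stream = bytearray()
    next_seq = None
    for seq, payload in sorted(segments, key=lambda segment: segment[0]):
        if not payload:
            continue                      # pure acks carry no data
        if next_seq is None:
            next_seq = seq                # first data byte seen
        if seq + len(payload) <= next_seq:
            continue                      # retransmission of data we already have
        if seq > next_seq:
            raise ValueError(f"missing bytes {next_seq}..{seq - 1}")
        stream += payload[next_seq - seq:]  # skip any overlapping prefix
        next_seq = seq + len(payload)
    return bytes(stream)


# Hypothetical capture: segments arrive out of order, one is retransmitted.
captured = [
    (111, b"0000</value>"),
    (100, b"<value>0x10"),
    (100, b"<value>0x10"),
]
print(reassemble(captured))
```

A payload that looks cut off in a single packet, as in the question, becomes complete once the following segment is appended in sequence order.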

TCP checksum error for fragmented packets

I'm working on a server/client socket application that is using Linux TUN interface.
The server gets packets directly from the TUN interface and passes them to clients, and the clients put received packets directly into their TUN interface.
<Server_TUN---><---Server---><---Clients---><---Client_TUN--->
Sometimes the packets from Server_TUN need to be fragmented in IP layer before transmitting to a client.
So at the server I read a packet from TUN, start fragmenting it in the IP layer and send them via socket to clients.
When the fragmentation logic was implemented, the solution did not work well.
After starting Wireshark on Client_TUN I noticed for all incoming fragmented packets I get TCP Checksum error.
In the given screenshot, frame number 154 is reported as reassembled in frame 155.
But TCP checksum is claimed to be incorrect!
On the server side, I keep the TCP data intact. For the given example, I've split a packet into fragments of 1452 bytes and 30 bytes (each including the IP header), although Wireshark shows them in the reverse order.
I've also checked the TCP checksum value at the server and it is exactly 0x935e. Although I did not think that checksum offloading matters for incoming packets, I checked offloading at the client and it was off.
$ sudo ethtool -k tun0 | grep ": on"
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
generic-segmentation-offload: on
generic-receive-offload: on
tx-vlan-offload: on
tx-vlan-stag-hw-insert: on
Despite that, because the solution is not working at all, I don't think it's caused by offloading.
Do you have any idea why TCP checksum could be incorrect for fragmented packets?
Fortunately I found the issue, and it was my mistake. Some TCP data went missing when I was copying buffers. I had been tracing the indexes and lengths, but because the data itself had changed, the checksum calculated on the client side came out different.
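The effect is easy to reproduce. The Python sketch below implements the standard 16-bit ones' complement Internet checksum and shows that splitting data into pieces and joining them again leaves the checksum unchanged, while losing bytes during a copy changes it. It is a simplified illustration: a real TCP checksum also covers a pseudo-header with the addresses, protocol, and length, which is omitted here, and the sample bytes are arbitrary.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum used by IP, TCP and UDP.
    Covers only the bytes passed in (no TCP pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"                        # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


# Arbitrary sample data standing in for a TCP segment.
segment = bytes.fromhex("0001f203f4f5f6f7")

# Correct fragmentation: every byte ends up in exactly one fragment.
fragments = [segment[:4], segment[4:]]
rejoined = b"".join(fragments)

# Buggy copy: the last two bytes are lost while filling the second buffer.
lossy = segment[:4] + segment[4:6]

print(hex(internet_checksum(segment)))
print(hex(internet_checksum(rejoined)))
print(hex(internet_checksum(lossy)))
```

Because the receiver recomputes the checksum over the reassembled data, a checksum error that appears only on fragmented packets points at the bytes being altered or dropped somewhere between splitting and rejoining.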

Identify single communication

I have a problem identifying a communication established over TCP.
I have to identify the first completed communication, for example the first complete HTTP communication.
I have a .pcap dump file with the capture. I know that a communication should start with a three-way handshake (SYN, SYN-ACK, ACK) and end with FIN flags from both sides.
But there are a lot of communications in that dump file.
So here is the question: which things do I need to keep track of to match exactly one communication?
I thought about source IP, destination IP, protocol, and maybe port, but I am not sure.
Thank you for any advice.
And sorry for my English.
You stated that you need:
To identify a particular conversation
To identify the first completed conversation
You can identify a particular TCP or UDP conversation by filtering for
the 5-tuple of the connection:
Source IP
Source Port
Destination IP
Destination Port
Transport (TCP or UDP)
As Shane mentioned, this is protocol dependent e.g. ICMP does not have the concept of
ports like TCP and UDP do.
A libpcap filter like the following would work for TCP (replace tcp with udp for a UDP conversation):
tcp and host 1.1.1.1 and port 53523 and host 1.1.1.2 and port 80
Apply it with tcpdump:
$ tcpdump -nnr myfile.pcap 'tcp and host 1.1.1.1 and port 53523 and host 1.1.1.2 and port 80'
To identify the first completed connection you will have to follow the timestamps.
Using a tool like Bro (since renamed Zeek) to read a PCAP would yield the answer, as it lists each connection
attempt seen (complete or incomplete):
$ bro -r myfile.pcap
$ bro-cut -d < conn.log | head -1
2014-03-14T10:00:09-0500 CPnl844qkZabYchIL7 1.1.1.1 57596 1.1.1.2 80 tcp http 0.271392 248 7775 SF F ShADadfF 14 1240 20 16606 (empty) US US
Use the flag data for TCP to judge whether there was a successful handshake and tear down.
For other protocols you can make judgements based on byte counts, sent and received.
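The same bookkeeping can be sketched by hand. The Python example below groups packets by a direction-independent 5-tuple and reports the first TCP conversation in which both sides sent a SYN and both sides sent a FIN. It is a simplified sketch under stated assumptions: packets are already decoded into records (the Packet type and its fields are made up for illustration), flags are given as a string of letters such as "SA", resets and the final ack are ignored, and "first completed" is taken to mean the conversation that finishes earliest.

```python
from collections import namedtuple

# Hypothetical decoded packet record; a real tool would fill this from the pcap.
Packet = namedtuple("Packet", "time src sport dst dport proto flags")


def conversation_key(p):
    """5-tuple that is identical for both directions of a conversation."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)


def first_completed(packets):
    """Return (key, completion_time) of the first TCP conversation that saw
    a SYN from both endpoints and a FIN from both endpoints, else None."""
    state = {}
    for p in sorted(packets, key=lambda pkt: pkt.time):
        if p.proto != "tcp":
            continue
        seen = state.setdefault(conversation_key(p), {"S": set(), "F": set()})
        if "S" not in p.flags and not seen["S"]:
            continue                      # started before the capture; skip
        sender = (p.src, p.sport)
        for flag in "SF":
            if flag in p.flags:
                seen[flag].add(sender)
        if len(seen["S"]) == 2 and len(seen["F"]) == 2:
            return conversation_key(p), p.time
    return None


sample_packets = [
    Packet(1.0, "1.1.1.1", 53523, "1.1.1.2", 80, "tcp", "S"),
    Packet(1.1, "1.1.1.2", 80, "1.1.1.1", 53523, "tcp", "SA"),
    Packet(1.2, "1.1.1.1", 53523, "1.1.1.2", 80, "tcp", "A"),
    Packet(1.3, "1.1.1.3", 40000, "1.1.1.2", 80, "tcp", "S"),   # never completes
    Packet(2.0, "1.1.1.1", 53523, "1.1.1.2", 80, "tcp", "FA"),
    Packet(2.1, "1.1.1.2", 80, "1.1.1.1", 53523, "tcp", "FA"),
    Packet(2.2, "1.1.1.1", 53523, "1.1.1.2", 80, "tcp", "A"),
]
result = first_completed(sample_packets)
print(result)
```

Sorting the endpoints inside the key is what lets requests and replies land in the same bucket; a filter on a single direction would see only half of the handshake and teardown.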
Identifying the first completed communication is highly protocol specific. You are on the right track with your filters. If your protocol is a commonly used one there are plug ins called protocol analyzers and filters that can locate "conversations" for you from a pcap data stream. If you know approximate start time and end time that would help narrow it down too.

Disable ICMP Host unreachable

I'm using a single raw socket to read UDP packets from a local test network with 1024 ports. Each UDP source and destination port is unique, and I need access to the IP and UDP header fields. I can stream and process data (in and out) at 100 Mbps on a linux-rt kernel with very low jitter (< 250 usec, 10 usec nominal).
I'd like to prevent the kernel from issuing ICMP port unreachable errors back to the sending host. However, I don't want to create 1024 vanilla UDP sockets and bind to each one because of resource constraints. Currently, I'm using iptables to drop the outbound port unreachable messages. Does anyone know of a way (programmatically, from C code) to prevent the ICMP unreachable traffic? Perhaps an ioctl or socket option? I also tried changing /proc/sys/net/ipv4/icmp_ratelimit but that seemed to have no effect. By default the ratemask is set for dest unreachables, and a variety of ratelimit values did not change any behavior that I could see.
