I've been learning about TCP and UDP lately, and I know that ping uses ICMP so I'm trying to understand that too. My understanding is that when the command ping google.com is run, your computer sends an echo request ICMP packet over IP to google, and then google responds with an echo reply message.
My question is, when a server responds with that echo reply message, what is actually taking care of that? Is it the operating system? Is it a particular application? Or is it something else entirely?
Its the Kernel module which responds the ICMP requests. The ICMPv4 module is net/ipv4/icmp.c.
The ICMP module defines a table of array on how to handle various ICMP requests with object being icmp_objects, named icmp_pointers which is indexed by ICMP message type.
ICMP control structure:
struct icmp_control {
void (*handler)(struct sk_buff *skb);
short error; /* This ICMP is classed as an error message */
};
static const struct icmp_control icmp_pointers[NR_ICMP_TYPES + 1] = {
...
[ICMP_ECHO] = {
.handler = icmp_echo,
},
...
};
From above struct, when you send a echo request to google.com server the message type will be icmp_echo boiling down to subroutine call icmp_echo() which handles echo (ping) requests (ICMP_ECHO) by sending echo replies (ICMP_ECHOREPLY) with
icmp_reply().
In terms of the TCP/IP reference model it is the Network layer of the protocol stack, which is normally in the kernel.
Related
I have asked one question already related what is the purpose of payload in Packet Too Big ICMPv6 message. why_PTB_payload
As per the latest IPv6 logo certification I have came across this test case where a sender send ICMPV6 echo request message to destination having some routers in between and the receiver (destination) received the echo request.
then,
Receiver send ICMPv6 echo reply but got ICMPV6 packet too big message from router in between with wrong/forged echo reply header appending to the (PTB) packet too big message as a payload. (Deliberately sending wrong payload).
Again the ICMPv6 echo request sent by the sender, now the receiver start fragmenting the echo reply because of the above step1 (i.e without validating the PTB payload the Receiver changed it's MTU value).
According to the test case we should not change the MTU on receiving the wrong or forged payload on PTB message (payload will be the original packet that cannot be forwarded due to less Path MTU)
But this look unnecessary work to validate the PTB for echo reply and seems no proper use-case, this test case look invalid to me as we never store the state of echo reply sent by the kernel to validate the PTB in case PTB carry the wrong ICMPv6 reply header on it's payload.
If this is the valid case then I want to know the logic how we can implement this or if not then why this test case is even there at the IPV6 logo certification.
Link to the document containing the test case IPv6LogoCertificationTestCases (Test case number v6LC.4.1.12)
I'm injecting ICMP "Fragmentation needed, DF bit set" into the server and ideally server should start sending packets with the size mentioned in the field 'next-hop MTU' in ICMP. But this is not working.
Here is the server code:
#!/usr/bin/env python
import socket # Import socket module
import time
import os
range= [1,2,3,4,5,6,7,8,9]
s = socket.socket() # Create a socket object
host = '192.168.0.17' # Get local machine name
port = 12349 # Reserve a port for your service.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host, port)) # Bind to the port
rand_string = os.urandom(1600)
s.listen(5) # Now wait for client connection.
while True:
c, addr = s.accept() # Establish connection with client.
print 'Got connection from', addr
for i in range:
c.sendall(rand_string)
time.sleep(5)
c.close()
Here is the client code:
#!/usr/bin/python # This is client.py file
import socket # Import socket module
s = socket.socket() # Create a socket object
host = '192.168.0.17' # Get local machine name
port = 12348 # Reserve a port for your service.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.connect((host, port))
while 1:
print s.recv(1024)
s.close()
Scapy to inject ICMP:
###[ IP ]###
version= 4
ihl= None
tos= 0x0
len= None
id= 1
flags= DF
frag= 0
ttl= 64
proto= ip
chksum= None
src= 192.168.0.45
dst= 192.168.0.17
\options\
###[ ICMP ]###
type= dest-unreach
code= fragmentation-needed
chksum= None
unused= 1300
Send(ip/icmp)
Unused field shows as next-hop MTU in wireshark. Is server smart enough to check that DF Bit was not set when it was communicating with client and it is still receiving ICMP "Fragmentation needed, DF bit set" message? If it is not then why is server not reducing its packet size from 1500 to 1300?
First of all, let's answer your first question (is ICMP sent over TCP?).
ICMP runs directly over IP, as specified in RFC 792:
ICMP messages are sent using the basic IP header.
This can be a bit confusing as ICMP is classified as a network layer protocol rather than a transport layer protocol but it makes sense when taking into account that it's merely an addition to IP to carry error, routing and control messages and data. Thus, it can't rely on the TCP layer to transfer itself since the TCP layer depends on the IP layer which ICMP helps to manage and troubleshoot.
Now, let's deal with your second question (How does TCP come to know about the MTU if ICMP isn't sent over TCP?). I've tried to answer this question to the best of my understanding, with reliance on official specifications, but perhaps the best approach would be to analyze some open source network stack implementation in order to see what's really going on...
The TCP layer may come to know of the path's MTU value even though the ICMP message is not layered upon TCP. It's up to the implementation of OS the network stack to notify the TCP layer of the MTU so it can then use this value to update its MSS value.
RFC 1122 requires that the ICMP message includes the IP header as well as the first 8 bytes of the problematic datagram that triggered that ICMP message:
Every ICMP error message includes the Internet header and at least the first 8 data octets of the datagram that triggered the error; more than 8 octets MAY be sent; this header and data MUST be unchanged from the received datagram.
In those cases where the Internet layer is required to pass an ICMP error message to the transport layer, the IP protocol number MUST be extracted from the original header and used to select the appropriate transport protocol entity to handle the error.
This illustrates how the OS can pinpoint the TCP connection whose MSS should be updated, as these 8 bytes include the source and destination ports.
RFC 1122 also states that there MUST be a mechanism by which the transport layer can learn the maximum transport-layer message size that may be sent for a given {source, destination, TOS} triplet. Therefore, I assume that once an ICMP Fragmentation needed and DF set error message is received, the MTU value is somehow made available to the TCP layer that can use it to update its MSS value.
Furthermore, I think that the application layer that instantiated the TCP connection and taking use of it may handle such messages as well and fragment the packets at a higher level. The application may open a socket that expects ICMP messages and act accordingly when such are received. However, fragmenting packets at the application layer is totally transparent to the TCP & IP layers. Note that most applications would allow the TCP & IP layers to handle this situation by themselves.
However, once an ICMP Fragmentation needed and DF set error message is received by a host, its behavior as dictated by the lower layers is not conclusive.
RFC 5927, section 2.2 refers to RFC 1122, section 4.2.3.9 which states that TCP should abort the connection when an ICMP Fragmentation needed and DF set error message is passed up from the IP layer, since it signifies a hard error condition. The RFC states that the host should implement this behavior, but it is not a must (section 4.2.5). This RFC also states in section 3.2.2.1 that a Destination Unreachable message that is received MUST be reported to the TCP layer. Implementing both of these would result in the destruction of a TCP connection when an ICMP Fragmentation needed and DF set error message is received on that connection, which doesn't make any sense, and is clearly not the desired behavior.
On the other hand, RFC 1191 states this in regard to the required behavior:
RFC 1191 does not outline a specific behavior that is expected from the sending
host, because different applications may have different requirements, and
different implementation architectures may favor different strategies [This
leaves a room for this method-OA].
The only required behavior is that a host must attempt to avoid sending more
messages with the same PMTU value in the near future. A host can either
cease setting the Don't Fragment bit in the IP header (and allow
fragmentation by the routers in the way) or reduce the datagram size. The
better strategy would be to lower the message size because fragmentation
will cause more traffic and consume more Internet resources.
For conclusion, I think that the specification is not definitive in regard to the required behavior from a host upon receipt of an ICMP Fragmentation needed and DF set error message. My guess is that both layers (IP & TCP) are notified of the message in order to update their MTU & MSS values, respectively and that one of them takes upon the responsibility of retransmitting the problematic packet in smaller chunks.
Lastly, regarding your implementation, I think that for full compliance with RFC 1122, you should update the ICMP message to include the IP header of the problematic packet, as well as its next 8 bytes (though you may include more than just the first 8 bytes). Moreover, you should verify that the ICMP message is received before the corresponding ACK for the packet to which that ICMP message refers. In fact, just in order to be on the safe side, I would abolish that ACK altogether.
Here is a sample implementation of how the ICMP message should be built. If sending the ICMP message as a response to one of the TCP packets fails, I suggest you try sending the ICMP message before even receiving the TCP packet to which it relates at first, in order to assure it is received before the ACK. Only if that fails as well, try abolishing the ACK altogether.
The way i understand it, the host receives a "ICMP Fragmentation needed and DF set" but the message can come from a intermediate device(router) in the path, thus the host cant directly matched the icmp response with a current session, the icmp only contains the destination ip and mtu limit.
The host then adds a entry to the routing table for the destination ip that records the route and mtu with a expiry of 10min.
This can be observed on linux by asking for the specific route with ip route get x.x.x.x after doing a tracepath or ping that triggers the icmp response.
$ ip route get 10.x.y.z
10.z.y.z via 10.a.b.1 dev eth0 src 10.a.b.100
cache expires 598sec mtu 1300
I am writing a small application in QT that sends a UDP packet broadcast over the local network and waits for a UDP responce packet from one or more devices over the network.
Creating the sockets and sending the broadcast packet.
udpSocketSend = new QUdpSocket(this);
udpSocketGet = new QUdpSocket(this);
bcast = new QHostAddress("192.168.1.255");
udpSocketSend->connectToHost(*bcast,65001,QIODevice::ReadWrite);
udpSocketGet->bind(udpSocketSend->localPort());
connect(udpSocketGet,SIGNAL(readyRead()),this,SLOT(readPendingDatagrams()));
QByteArray *datagram = makeNewDatagram(); // data from external function
udpSocketSend->write(*datagram);
The application sends the packet properly and the response packet arrives but the readPendingDatagrams() function is never called. I have verified the packets are sent and received using Wireshark and that the application is listening on the port indicated in wireshark using Process Explorer.
I solved the problem. Here is the solution.
udpSocketSend = new QUdpSocket(this);
udpSocketGet = new QUdpSocket(this);
host = new QHostAddress("192.168.1.101");
bcast = new QHostAddress("192.168.1.255");
udpSocketSend->connectToHost(*bcast,65001);
udpSocketGet->bind(*host, udpSocketSend->localPort());
connect(udpSocketGet,SIGNAL(readyRead()),this,SLOT(readPendingDatagrams()));
QByteArray *datagram = makeNewDatagram(); // data from external function
udpSocketSend->write(*datagram);
The device on the network listens on port 65001 and responds to packets on the source port of the received packet. It is necessary to use connectToHost(...) in order to know what port to bind for the response packet.
It is also necessary to bind to the correct address and port to receive the packets. This was the problem.
You're binding your udpSocketSend in QIODevice::ReadWrite mode. So that's the object that's going to be receiving the datagrams.
Try one of:
binding the send socket in write only mode, and the receive one in receive only mode
using the same socket for both purposes (remove udpSocketGet entirely).
depending on your constraints.
For me, changing the bind from
udpSocket->bind(QHostAddress::LocalHost, 45454);
to simple
udpSocket->bind(45454);
does the trick!
so i'm trying to send udp packet to a listening port on a computer which is not connected to the same LAN but has internet access via gen_udp in erlang.
I start my first node by opening the port
({ok, Socket} = gen_udp:open(8887).) and the open the port on the other node the same way, When i send a packet from one node to the other via gen_udp:send i don't receive anything (trying flush() on the receiving node), So i'm wondering if there is something i'm doing wrong ? , i checked the firewalls and erlang and epmd is permitted.
did you try setting the controlling process of the Socket as the current process via :
gen_udp:controlling_process(Socket,Pid) ?
You should then setup a receive loop and messages will be sent to you. The format of the messages should be : {udp, Socket, IP, InPortNo, Packet}
You could also try setting the socket to passive mode by using inet:setopts(Socket, [{active, false}]) after you have opened it. After which you can use 'gen_udp:recv/3` to read from the socket.
If a socket is bound to IN6ADDR_ANY or INADDR_ANY and you use a call such as recvfrom() to receive messages on the socket. Is there a way to find out which interface the message came from?
In the case of IPv6 link-scope messages, I was hoping that the from argument of recvfrom() would have the scope_id field initialized to the interface Id. Unfortunately it is set to 0 in my test program.
Anybody know of a way to find out this information?
dwc is right, IPV6_PKTINFO will work for IPv6 on Linux.
Moreover, IP_PKTINFO will work for IPv4 — you can see details in manpage ip(7)
I've constructed an example that extracts the source, destination and interface addresses. For brevity, no error checking is provided. See this duplicate: Get destination address of a received UDP packet.
// sock is bound AF_INET socket, usually SOCK_DGRAM
// include struct in_pktinfo in the message "ancilliary" control data
setsockopt(sock, IPPROTO_IP, IP_PKTINFO, &opt, sizeof(opt));
// the control data is dumped here
char cmbuf[0x100];
// the remote/source sockaddr is put here
struct sockaddr_in peeraddr;
// if you want access to the data you need to init the msg_iovec fields
struct msghdr mh = {
.msg_name = &peeraddr,
.msg_namelen = sizeof(peeraddr),
.msg_control = cmbuf,
.msg_controllen = sizeof(cmbuf),
};
recvmsg(sock, &mh, 0);
for ( // iterate through all the control headers
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);
cmsg != NULL;
cmsg = CMSG_NXTHDR(&mh, cmsg))
{
// ignore the control headers that don't match what we want
if (cmsg->cmsg_level != IPPROTO_IP ||
cmsg->cmsg_type != IP_PKTINFO)
{
continue;
}
struct in_pktinfo *pi = CMSG_DATA(cmsg);
// at this point, peeraddr is the source sockaddr
// pi->ipi_spec_dst is the destination in_addr
// pi->ipi_addr is the receiving interface in_addr
}
Apart from binding to each interface, I'm not aware of a way with IPv4, per se.
IPv6 has added the IPV6_PKTINFO socket option to address this shortcoming. With that option in effect, a struct in6_pktinfo will be returned as ancillary data.
Its been a while since I've been doing C/C++ TCP/IP coding but as far as I remember on every message (or derived socket) you can get into the IP headers information. These headers should include the receiving address which will be the IP of the interface you are asking about.
Outside of opening a separate socket on each interface as Glomek suggested, the only way I know to do this definitively on Windows is to use a raw socket, e.g.,
SOCKET s = socket(AF_INET, SOCK_RAW, IPPROTO_IP);
Each receive from this socket will be an IP packet, which contains both the source and destination addresses. The program I work on requires me to put the socket in promiscuous mode using the SIO_RCVALL option. Doing this means I get every IP packet the interface "sees" on the network. To extract packets expressly for my application requires me to filter the data using the addresses and ports in the IP and TCP/UDP headers. Obviously, that's probably more overhead than you're interested in. I only mention it to say this - I've never used a raw socket without putting it in promiscuous mode. So I'm not sure if you can bind it to INADDR_ANY and just use it as a regular socket from that point forward or not. It would seem to me that you can; I've just never tried it.
EDIT: Read this article for limitations regarding raw sockets on Windows. This biggest hurdle I faced on my project was that one has to be a member of the Administrators group to open a raw socket on Windows 2000 and later.