Why do I need a source port in UDP (compared with TCP)?

When I use TCP, I need a destination port (to be able to "talk" to the other process on the other host) and a source port (because TCP is connection-oriented, so data such as ACKs and sequence numbers flows back to the source).
On the other hand, UDP, which is connectionless, also needs a source port.
Why is that? (I don't need to send any data back.)

Probably, two reasons.
First, receivers often need to reply, and it is useful to provide a standard mechanism for that.
Secondly, you may have multiple interfaces (network cards), and the source address determines which of them is used to emit the packet.

You don't need to, but it keeps the possibility of sending a response back (which is actually very useful). However, as stated in RFC 768:
Source Port is an optional field, when meaningful, it indicates the port
of the sending process, and may be assumed to be the port to which a
reply should be addressed in the absence of any other information. If
not used, a value of zero is inserted.
https://www.rfc-editor.org/rfc/rfc768
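To make "the port to which a reply should be addressed" concrete, here is a minimal Python sketch (standard library only, loopback address, OS-chosen ports; the function name is just for illustration). The receiver learns the sender's address and source port from recvfrom() and sends its reply to exactly that port.

```python
import socket


def udp_reply_demo() -> bytes:
    """Send one datagram over loopback and return the reply the client receives."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2)
    client.settimeout(2)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        # The client never calls bind(); the OS gives it an ephemeral source
        # port when it sends its first datagram.
        client.sendto(b"ping", server.getsockname())

        # recvfrom() reports the sender's (IP, source port) from the packet headers.
        data, sender = server.recvfrom(1024)
        assert sender[1] == client.getsockname()[1]

        # The reply is addressed to the port the client sent from.
        server.sendto(data.upper(), sender)
        reply, _ = client.recvfrom(1024)
        return reply
    finally:
        server.close()
        client.close()


print(udp_reply_demo())  # b'PING'
```

If the client had sent from source port 0 ("not used"), the server would have no port to reply to.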

I would like to add to the answers here. Apart from simply knowing what to reply to, the source port can belong to the list of well-known port numbers. These ports specify what kind of data is encapsulated in the UDP (or TCP!) packet.
For example, the source port 530 indicates that the packet contains a Remote Procedure Call, and 520 indicates a Routing Information Protocol packet.

Related

Why is it possible to use the same port on TCP and UDP at the same time?

While searching, I've seen that two different programs on the same computer can communicate over the network using the same port and the same network interface, provided one uses UDP and the other TCP. However, I didn't find a good explanation: how does it actually work, and why is this possible?
Is it also possible for multiple programs to use the same UDP port, since UDP does not establish a real connection between the peers but just sends packets to an address? I understand it's not possible with TCP, as it creates a synchronized connection between the server and the client, but what about UDP?
Please explain in detail if possible, or link a good article on the topic.
The other answers are correct but somewhat incomplete.
An IP (aka "INET") socket "connection" (i.e. communication between two processes, possibly on different machines) is defined by a 5-tuple: protocol, source address, source port, destination address, destination port. You can see that this is not limited to a stateful connection such as TCP.
This means that you can bind different processes to any unique instance of that 5-tuple. Because the "protocol" (e.g. TCP and UDP) is part of the differentiating factor, each can have a different process.
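A quick way to see the protocol acting as part of the key is a minimal Python sketch on the loopback address. A TCP socket and a UDP socket are bound to the same address and port number at the same time, and neither bind() fails. It assumes that no other process already holds the UDP port with the number the OS picked for TCP. The function name is illustrative.

```python
import socket


def bind_tcp_and_udp_same_port() -> tuple[int, int]:
    """Bind a TCP and a UDP socket to one loopback port number; return both ports."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        tcp.bind(("127.0.0.1", 0))  # OS picks a free TCP port
        port = tcp.getsockname()[1]
        tcp.listen()
        # Same address and same port number, different protocol: no conflict,
        # because TCP and UDP keep separate port tables.
        udp.bind(("127.0.0.1", port))
        return tcp.getsockname()[1], udp.getsockname()[1]
    finally:
        tcp.close()
        udp.close()


print(bind_tcp_and_udp_same_port())
```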
Theoretically, you could bind different services to the same TCP port if they bind to different interfaces (network cards, loopback, etc.) though I've never tried it.
It is standard practice, however, to always use the same service on the same port number. If both UDP and TCP are supported, they're just different ways of communicating with that same service. DNS, for example, uses UDP on port 53 for lookups, because these are small requests and UDP is faster than setting up a TCP connection, but DNS also uses TCP on port 53 for zone "transfers", which are infrequent and can involve large amounts of data.
Lastly, to be completely accurate, it isn't necessarily a 5-tuple. IP uses the "protocol" field to pass the packet to the next layer, such as TCP or UDP, though there are others. TCP and UDP each separately differentiate connections based on the remaining 4 items. It's possible to create other protocols on top of IP that use completely different (perhaps port-less) differentiation mechanisms.
And then there are different socket "domains", such as the "unix" socket domain, which is completely distinct from "inet" and uses the filesystem for addressing.
The destination isn't identified by IP address:port alone. There is another thing: the IP header has a field called Protocol, which differentiates TCP and UDP endpoints. As such, two processes can bind to the same IP:port as long as the transport protocol is different.
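Here is a small Python sketch that makes the Protocol field concrete. It builds a bare 20-byte IPv4 header by hand and reads byte 9, the Protocol field (6 for TCP, 17 for UDP). The helper names are illustrative. The checksum is deliberately left at zero, so these headers are for parsing practice only.

```python
import socket
import struct

# IANA-assigned values for the IPv4 "Protocol" field.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def make_ipv4_header(proto: int, src: str, dst: str) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum left at 0 for brevity)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,  # version 4, IHL = 5 32-bit words (20 bytes)
        0,             # DSCP/ECN
        20,            # total length: header only in this toy example
        0,             # identification
        0,             # flags / fragment offset
        64,            # TTL
        proto,         # Protocol: tells the receiver which transport gets the payload
        0,             # header checksum (not computed here)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def ipv4_protocol(header: bytes) -> str:
    """Name the transport protocol in an IPv4 header (byte 9 = Protocol field)."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    proto = header[9]
    return PROTOCOL_NAMES.get(proto, f"other ({proto})")


tcp_hdr = make_ipv4_header(6, "10.0.0.1", "10.0.0.2")
udp_hdr = make_ipv4_header(17, "10.0.0.1", "10.0.0.2")
print(ipv4_protocol(tcp_hdr), ipv4_protocol(udp_hdr))  # TCP UDP
```

The two headers are identical except for byte 9. That single byte is what lets the receiving stack hand the payload to its TCP port table or to its UDP port table.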
The endpoint of a connection is for UDP and TCP defined by IP, protocol (TCP or UDP) and port. This means as long as you use a different protocol the endpoint of the communication is different too.
Because the IP address and port are not the only components of the address. It's the same reason you can have two houses with the same number on different streets, or why you know John Whorfin is not the same Red Lectroid as John Bigbooté.
Each IP packet contains a field that says which transport-layer protocol is to be used, and within the domain of that protocol is a set of ports that can be the same as in any other protocol because they are actually a completely separate set.
As for the second question, there are answers elsewhere.

Confused between ports and sockets

OK, so when I researched IP addresses, ports, and sockets, this is what I got out of it:
IP Addresses are used to map to different devices over a network.
Port numbers are used to get to the specific application on the hosts.
Sockets are a combination of the two.
What I don't understand is this: if ports connect you to a specific application, you should only have 1 port number per application, right? But, for example, port 80 is used for HTTP, so if an application is using that port, it's listening for HTTP requests, right? So what happens if more than one person tries to access it? Sockets and ports have me confused a lot.
A socket is an abstraction used in software to make it easier for programmers to send and receive data through networks. They are an interface, which you use in application-level code, to access the underlying network protocol implementations provided by your OS and language runtime.
The TCP protocol, IP protocol, and other popular network protocols do not, in and of themselves, have any concept of "sockets". "Sockets" are a concept which implementers of TCP/IP came up with.
So what is the concept of a "socket"? Basically, an object which you can write data to, and read data from. "Opening" a socket means creating one of those objects in your program's memory. You can also "close" a socket, which means freeing any system resources which that object uses behind the scenes.
Some kinds of sockets can be "bound" to local and remote addresses, which you can think of as setting some data fields, or properties, on the socket object. The values of those fields affect what happens when you read from or write to the socket.
In Unix, there are various kinds of sockets. If you "open" a TCP socket, "bind" it to local and remote addresses (and ports), and write some data into it, your libraries/OS will package that data up into a TCP segment and send it out through whichever network interface matches the local address which you "bound" the socket to. If you "open" an IP socket, and write some data to it, that data will be packaged up into an IP packet (without any added TCP headers) and sent out. If you open a "raw", link-level socket, and write to it, the data will be sent out as the payload of a link-level frame, minus IP and TCP headers. There are also "Unix domain sockets". If you open one of those and write to it, the data will be passed directly through system memory to another process on the same machine.
So although they are often used in non-OO languages like C, sockets are a perfect example of what OO languages call "polymorphism". If you ever have trouble explaining what "polymorphism" is to someone, just teach them about network sockets.
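A minimal Python sketch of that polymorphism follows. The helper names `roundtrip` and `tcp_pair` are illustrative. The same function sends and receives through a Unix domain socket pair and through a TCP connection over loopback, without knowing which kind of socket it was given. It assumes a Unix-like OS, where socket.socketpair() returns AF_UNIX sockets. Elsewhere the pair falls back to AF_INET.

```python
import socket


def roundtrip(a: socket.socket, b: socket.socket, payload: bytes) -> bytes:
    """Send payload from a to b and read it back; works for any stream socket."""
    a.sendall(payload)
    received = b""
    while len(received) < len(payload):
        chunk = b.recv(len(payload) - len(received))
        if not chunk:
            break  # peer closed before sending everything
        received += chunk
    return received


def tcp_pair() -> tuple[socket.socket, socket.socket]:
    """Return a connected (client, server) pair of TCP sockets over loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
    return client, server


# Same calls, two very different transports underneath.
for a, b in (socket.socketpair(), tcp_pair()):
    with a, b:
        print(a.family.name, roundtrip(a, b, b"hello"))
```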
"Ports" are a completely different concept. The idea of "ports" is built into TCP and other transport protocols.
Others may give more high-falutin', and perhaps more technically accurate, definitions of a "port". Here is one which is totally down to earth:
A "port" is a number which appears in the TCP headers on a TCP segment. (Or the UDP headers on a UDP segment.)
Just a number. Nothing more, nothing less.
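The same down-to-earth view fits in a short Python sketch. A port is a 16-bit unsigned integer packed into the first four bytes of the header, source first and destination second. This layout is the same for TCP and UDP. The helpers are illustrative. The UDP checksum is left at 0, which over IPv4 means "no checksum computed".

```python
import struct


def udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Build an 8-byte UDP header; the checksum is left at 0 ("not computed")."""
    length = 8 + len(payload)  # header plus data, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, 0)


def ports_of(segment: bytes) -> tuple[int, int]:
    """Read the first two 16-bit fields: source port and destination port.

    TCP and UDP headers both start with these two fields.
    """
    return struct.unpack("!HH", segment[:4])


hdr = udp_header(53000, 53, b"query")
print(hdr.hex(), ports_of(hdr))  # cf080035000d0000 (53000, 53)
```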
If you are using a "socket"-based interface to do network programming, the significance of that number is that each of your TCP or UDP sockets has a "local port" property and a "remote port" property. As I said before, setting those properties is called "binding" (strictly, in the BSD sockets API, bind() sets the local address and port, while connect() sets the remote ones).
If your socket's "local port" property is "bound" to 80, then all the TCP segments you send out will have "80" in the "source port" header field. Then, when others respond to your messages, they will put "80" in their "destination port" header fields.
More than that, if your socket is "bound" to local port 80, then when data arrives from elsewhere, addressed to your port 80, the OS will pass it to your application process and not any other. Then, when you try to read from the socket, that data will be returned.
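These "local port" and "remote port" properties can be read back from real sockets. The Python sketch below opens one loopback TCP connection and asks each side for its local address (getsockname()) and remote address (getpeername()). One side's local (address, port) turns out to be the other side's remote (address, port). The function name is illustrative.

```python
import socket


def connection_endpoints() -> dict[str, tuple[str, int]]:
    """Open one loopback TCP connection and report each side's local/remote address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # local port chosen by the OS
        listener.listen()
        # connect() sets the client's remote address; the OS picks its local port.
        with socket.create_connection(listener.getsockname()) as client:
            server, _ = listener.accept()
            with server:
                return {
                    "client_local": client.getsockname(),
                    "client_remote": client.getpeername(),
                    "server_local": server.getsockname(),
                    "server_remote": server.getpeername(),
                }


ends = connection_endpoints()
# One side's local (address, port) is the other side's remote (address, port).
assert ends["client_local"] == ends["server_remote"]
assert ends["client_remote"] == ends["server_local"]
print(ends)
```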
Obviously, the OS needs to know what port each of your sockets is bound to. So when "binding", system calls must be made. If your program is not running with sufficient privileges, the OS may refuse to let you bind to a certain port. Then, depending on the language you are using, your networking library will throw an exception, or return an error code.
Sometimes the OS may refuse to let you bind to a certain port, not because you don't have the right privileges, but because another process has already bound to it. However, and this is what some of the other answers get wrong, if certain flags are set when opening a socket, your OS may allow more than one socket to be bound to the same local address and port.
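One example of such flags is SO_REUSEPORT. The Python sketch below assumes Linux or a BSD-derived OS such as macOS, where socket.SO_REUSEPORT exists. The option is set with setsockopt() on every socket that shares the port, before bind(). With the option set, two UDP sockets bind to the same address and port. Without it, the second bind() would fail with "address already in use". Which socket receives a given datagram is OS-specific. Linux, for instance, spreads incoming datagrams across the sockets. The function name is illustrative.

```python
import socket


def two_udp_sockets_same_port() -> tuple[int, int]:
    """Bind two UDP sockets to one port using SO_REUSEPORT (Linux/BSD/macOS only)."""
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The flag must be set on every socket sharing the port, before bind().
        for s in (first, second):
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        second.bind(("127.0.0.1", port))  # would raise EADDRINUSE without the flag
        return first.getsockname()[1], second.getsockname()[1]
    finally:
        first.close()
        second.close()


print(two_udp_sockets_same_port())
```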
You still don't know what "listening" and "connected" sockets are. But once you understand the above, that will just be a small jump.
The above explains the difference between what we today call a "socket" and what we call a "port". What may still not be clear is: why do we need to make that distinction?
You have really got me thinking here (thank you)! Could we call the software abstraction which is called a "socket" a "port" instead, so that instead of calling socket_recv you would call port_recv?
If you are only interested in TCP and UDP, maybe that would work. Remember, the "socket" abstraction is not only for TCP and UDP. It is also for other network protocols, as well as for inter-process communication on the same machine.
Then again, a TCP socket does not only map to a port. A "connected" TCP socket maps to a local IP address, local port, remote address, and remote port. It also has other associated data, including various flags, send and receive buffers, sequence numbers for the incoming/outgoing data streams, and various other variables used for congestion control (rate limiting), etc. That data does not belong just to a local port.
There can be thousands of TCP connections going simultaneously through the same "port". Each of those connections has its own associated data, and the software object which encapsulates that per-connection data is a "TCP socket".
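The next Python sketch (loopback only, function name illustrative) shows many connections going through one port. Several connections are accepted from a single listening socket. Every accepted socket reports the same local port. Each one is told apart by the remote (client) port, which the OS assigns to each client.

```python
import socket


def accept_many(n: int) -> list[tuple[tuple[str, int], tuple[str, int]]]:
    """Open n loopback connections to one listening port.

    Returns a (local address, remote address) pair for each accepted socket.
    """
    clients, accepted = [], []
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind(("127.0.0.1", 0))
            listener.listen(n)
            for _ in range(n):
                # Each client gets its own ephemeral port from the OS.
                clients.append(socket.create_connection(listener.getsockname()))
                conn, _ = listener.accept()
                accepted.append(conn)
            return [(c.getsockname(), c.getpeername()) for c in accepted]
    finally:
        for s in clients + accepted:
            s.close()


pairs = accept_many(3)
print({local[1] for local, _ in pairs})    # one local port shared by all
print({remote[1] for _, remote in pairs})  # three different client ports
```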
Even if you only use TCP/UDP, and even if you only have a single process using any given local port at one time, and even if you only have a single connection going through each local port at one time, I think the "socket" abstraction still makes sense. If we just called sockets "ports", there would be more meanings conflated in that one word. Reusing the same word for too many meanings hinders communication.
"Ports" are transport-protocol level identifiers for an application process. "Sockets" are the objects used in software to send/receive messages which are addressed from/to those identifiers.
Differentiating between "my address" and "the thing which sends letters addressed as coming from me" is a useful distinction to make. "My address" is just a label. A label is not something active, which does things like sending data. It is logical to give "the thing which is used to send data" its own name, different from the name which denotes "the sender address which the data is labelled with".
When an application (say, a web server like Apache or Nginx) is listening on, say, port 80, it creates a so-called listening socket.
When a client connects, this listening socket becomes readable (which can be detected via the select or poll API), and our application calls accept() to create a communication socket. This socket is uniquely identified by the tuple (src_addr, src_port, dst_addr, dst_port); it is very common for many clients to share exactly the same (dst_addr, dst_port) combination.
Then our web server can talk over that communication socket to deliver, say, a web page, and eventually close the socket. When many clients connect in parallel, the web server can either create a thread or process per client (Apache model), or service all sockets in turn from a single event loop (Nginx model).
Note that, by default, only one listening socket per port can exist in this situation: multiple applications cannot bind to the same port, such as 80. But it is perfectly fine to have many communication sockets (some people report successfully serving more than a million simultaneous requests).
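The single-event-loop idea can be sketched with Python's standard `selectors` module. One thread and one selector watch both the listening socket and every communication socket. This is a minimal echo server under simplifying assumptions: replies are small enough to fit in the socket send buffer, and there is no error handling. It is not how Nginx itself is implemented. The names `serve`, `recv_exact`, and `demo` are illustrative.

```python
import selectors
import socket
import threading


def serve(listener: socket.socket, stop: threading.Event) -> None:
    """Echo server: one thread, one selector, many sockets."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listening")
    while not stop.is_set():
        for key, _ in sel.select(timeout=0.1):
            sock = key.fileobj
            if key.data == "listening":
                # Readable listening socket: a client is waiting to be accepted.
                conn, _ = sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="communication")
            else:
                # Readable communication socket: echo the bytes, or close on EOF.
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)  # assumes small replies fit the send buffer
                else:
                    sel.unregister(sock)
                    sock.close()
    for key in list(sel.get_map().values()):
        if key.data == "communication":
            key.fileobj.close()
    sel.close()


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a blocking stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def demo(n_clients: int = 3) -> list[bytes]:
    """Run the server in a background thread and echo one message per client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    stop = threading.Event()
    worker = threading.Thread(target=serve, args=(listener, stop))
    worker.start()
    clients = [socket.create_connection(listener.getsockname()) for _ in range(n_clients)]
    try:
        messages = [f"client {i}".encode() for i in range(n_clients)]
        for c, m in zip(clients, messages):
            c.sendall(m)
        return [recv_exact(c, len(m)) for c, m in zip(clients, messages)]
    finally:
        for c in clients:
            c.close()
        stop.set()
        worker.join()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Only one listening socket is involved, yet three communication sockets are served concurrently by a single thread.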
Every time you accept a connection on a socket in listening state (e.g. on port 80), you will get a new socket in established state that represents a connection.
On the client side, each time a new connection (a new socket being connected) is made to that address and port, the operating system assigns a random (ephemeral) port on your side.
For example if you connect two times:
your-host:22482 <---> remote-host:80
your-host:23366 <---> remote-host:80

Wireshark physical packet

How does Wireshark interpret physical packets?
As far as I know, all packets look the same, so how does it decode them to pass them to the next higher protocol?
When it's used to capture live traffic, it knows the type of the interface and therefore the L2 encapsulation of packets, and when it reads a pcap file, the file has a field in its header indicating the network (link-layer) type.
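For the file case, here is a Python sketch that reads the `network` field of a classic libpcap global header. That header is 24 bytes: magic number, version, time-zone offset, sigfigs, snaplen, and link type. Only a few link types are listed. pcapng files and nanosecond-resolution pcap files use a different magic number and are not handled. The function name is illustrative.

```python
import struct

# A few LINKTYPE_* values from the tcpdump.org link-layer header type list.
LINKTYPES = {0: "NULL (BSD loopback)", 1: "ETHERNET", 101: "RAW (bare IP)", 105: "IEEE802_11"}


def pcap_linktype(global_header: bytes) -> str:
    """Name the link-layer type stored in a classic pcap file's 24-byte header."""
    magic = global_header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        byte_order = "<"  # file was written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        byte_order = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic (microsecond-resolution) pcap file")
    # Fields: magic, major, minor, thiszone, sigfigs, snaplen, network (link type)
    *_, network = struct.unpack(byte_order + "IHHiIII", global_header[:24])
    return LINKTYPES.get(network, f"unknown ({network})")


# A hand-made header for a little-endian Ethernet capture, version 2.4.
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
print(pcap_linktype(header))  # ETHERNET
```

From there, each layer names the next one: the Ethernet EtherType (0x0800 for IPv4), then the IPv4 Protocol field, and then, usually, the port numbers.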
There are probably a number of different mechanisms. You can download the dissectors and study the source to find out the various methods.
I wrote a dissector for a network sniffer and ported it to Ethereal and then Wireshark (or maybe someone else ported it; I don't remember). But the basic logic is that the dissector gets added to the list of possible dissectors. Wireshark calls a dissector and it decodes the packet if it can. If not, it calls the next one in the chain.
In the code I wrote, I simply analyzed the packet (UDP in my situation) to determine if it fit the profile of the desired packet using checksums and known data in the packet. If it decided it was the packet I was interested in I just extracted the various pieces of interesting data from the packet. The function tvb_get_ptr returns a pointer to the start of the data.
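The "try each dissector until one accepts the packet" logic can be modelled in a few lines of Python. This is only a sketch of the control flow. It is not Wireshark's C API. Both protocols (a "HELO" magic-number format and an 8-byte counter) are hypothetical, made up for illustration.

```python
from typing import Callable, Optional

# A heuristic dissector returns a decoded dict if it recognises the payload,
# or None so that the next dissector in the chain gets a try.
Dissector = Callable[[bytes], Optional[dict]]


def dissect_hypothetical_hello(payload: bytes) -> Optional[dict]:
    """Hypothetical protocol: 4-byte magic b"HELO" followed by a 1-byte version."""
    if len(payload) >= 5 and payload[:4] == b"HELO":
        return {"protocol": "HELO", "version": payload[4]}
    return None


def dissect_hypothetical_counter(payload: bytes) -> Optional[dict]:
    """Hypothetical protocol: exactly 8 bytes holding a big-endian counter."""
    if len(payload) == 8:
        return {"protocol": "COUNTER", "value": int.from_bytes(payload, "big")}
    return None


def dissect(payload: bytes, chain: list[Dissector]) -> dict:
    """Offer the payload to each dissector in order; fall back to raw data."""
    for dissector in chain:
        result = dissector(payload)
        if result is not None:
            return result  # the first dissector that accepts the packet wins
    return {"protocol": "data", "length": len(payload)}


CHAIN = [dissect_hypothetical_hello, dissect_hypothetical_counter]
print(dissect(b"HELO\x02", CHAIN))               # {'protocol': 'HELO', 'version': 2}
print(dissect((42).to_bytes(8, "big"), CHAIN))   # {'protocol': 'COUNTER', 'value': 42}
print(dissect(b"???", CHAIN))                    # {'protocol': 'data', 'length': 3}
```

Order matters. A loose heuristic early in the chain can claim packets that a stricter dissector later in the chain would have decoded correctly.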

Why do we say the IP protocol in TCP/IP suite is connectionless?

Why is IP called a connectionless protocol? And if it is, which protocol is the connection-oriented one?
Thanks.
Update - 1 - 20:21 2010/12/26
I think, to better answer my question, it would be better to explain what "connection" actually means, both physically and logically.
Update - 2 - 9:59 AM 2/1/2013
Based on all the answers below, I have come to feel that the 'connection' mentioned here should be considered a set of actions/arrangements/disciplines. Thus it's more an abstract concept than a concrete object.
Update - 3 - 11:35 AM 6/18/2015
Here's a more physical explanation:
IP is connectionless in that all packets in an IP network are routed independently, so they do not necessarily take the same route. In a virtual-circuit network, which is connection-oriented, all packets take the same route. This single route is what 'virtual circuit' means.
With a connection, because there's only 1 route, all data packets will arrive in the same order as they were sent.
Without a connection, there is no guarantee that all data packets will arrive in the same order as they were sent.
Update - 4 - 9:55 AM 2016/1/20/Wed
One of the characteristics of a connection-oriented protocol is that packet order is preserved. TCP uses sequence numbers to achieve that, but IP has no such facility. Thus TCP is connection-oriented while IP is connectionless.
The basic idea is pretty simple: with IP (on its own -- no TCP, UDP, etc.) you're just sending a packet of data. You simply send some data onto the net with a destination address, but that's it. By itself, IP gives:
no assurance that it'll be delivered
no way to find out if it was
nothing to let the destination know to expect a packet
much of anything else
All it does is specify a minimal packet format so you can get some data from one point to another (e.g., routers know the packet format, so they can look at the destination and send the packet on its next hop).
TCP is connection oriented. Establishing a connection means that at the beginning of a TCP conversation, it does a "three way handshake" so (in particular) the destination knows that a connection with the source has been established. It keeps track of that address internally, so it can/will/does expect more packets from it, and be able to send replies to (for example) acknowledge each packet it receives. The source and destination also cooperate to serial number all the packets for the acknowledgment scheme, so each end knows whether packets it sent were received at the other end. This doesn't involve much physically, but logically it involves allocating some memory on both ends. That includes memory for metadata like the next packet serial number to use, as well as payload data for possible re-transmission until the other side acknowledges receipt of that packet.
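The receive side of that numbering scheme can be illustrated with a toy Python model. It is not real TCP code. Each sequence number counts bytes. Segments may arrive out of order or be duplicated by retransmission. Only the contiguous prefix can be handed to the application, and a gap holds back everything after it. Simplifying assumptions: segments never overlap partially, sequence numbers never wrap around, and `initial_seq` is the number of the first data byte. In real TCP, the SYN itself also consumes one sequence number.

```python
def reassemble(segments: list[tuple[int, bytes]], initial_seq: int) -> bytes:
    """Rebuild a byte stream from (sequence number, data) segments.

    Returns the contiguous prefix that can be delivered to the application.
    """
    pending: dict[int, bytes] = {}
    for seq, data in segments:
        pending.setdefault(seq, data)  # ignore duplicate retransmissions
    expected = initial_seq
    stream = bytearray()
    while expected in pending:
        data = pending.pop(expected)
        stream += data
        expected += len(data)  # next byte we are waiting for (the "ACK number")
    return bytes(stream)


# Out of order, with a duplicate, and with bytes 1010..1014 never arriving.
arrivals = [(1005, b"world"), (1000, b"hello"), (1005, b"world"), (1015, b"!")]
print(reassemble(arrivals, 1000))  # b'helloworld'
```

Here the receiver would keep acknowledging 1010 and hold back b"!" until the missing bytes are retransmitted.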
TCP/IP means "TCP over IP".
TCP
--
IP
TCP provides the "connection-oriented" logic, ordering and control
IP provides getting packets from A to B however it can: "connectionless"
Notes:
UDP is connectionless but at the same level as TCP
Other protocols such as ICMP (used by ping) can run over IP but have nothing to do with TCP
Edit:
"connection-oriented" means an established end-to-end connection. For example, you pick up the telephone and call someone = you have a connection.
"connectionless" means "send it, see what happens". For example, sending a letter via snail mail.
So IP gets your packets from A to B, maybe, in any order, and not always. TCP sorts them out, acknowledges them, requests resends, and provides the "connection".
Connectionless means that no effort is made to set up a dedicated end-to-end connection, while connection-oriented means that when devices communicate, they perform handshaking to set up an end-to-end connection.
IP is an example of a connectionless protocol. In this kind of protocol, you usually send information in one direction, from source to destination, without checking whether the destination is still there or is prepared to receive the information. Connectionless protocols (like IP and UDP) are used, for example, for video conferencing, where you don't care if some packets are lost, while you have to use a connection-oriented protocol (like TCP) when you send a file, because you want to ensure that all the packets arrive successfully (in practice we use FTP, which runs over TCP, to transfer files).
Edit:
In telecommunication and computing in general, a connection is the successful completion of necessary arrangements so that two or more parties (for example, people or programs) can communicate at a long distance. In this usage, the term has a strong physical (hardware) connotation although logical (software) elements are usually involved as well.
The physical connection is layer 1 of the OSI model, and is the medium through which the data is transferred, i.e., cables.
The logical connection is layer 3 of the OSI model, and is the network portion. Using the Internet Protocol (IP), each host is assigned a 32-bit IP address, e.g. 192.168.1.1.
TCP is the connection part of TCP/IP. IP's the addressing.
Or, as an analogy, IP is the address written on the envelope, TCP is the postal system which uses the address as part of the work of getting the envelope from point A to point B.
When two hosts want to communicate using a connection-oriented protocol, one of them must first initiate a connection and the other must accept it. Logically, a connection is made between a port on one host and a port on the other host. Software on one host must perform a connect socket operation, and the other must perform an accept socket operation. Physically, the initiating host sends a SYN packet, which contains all four connection-identifying numbers (source IP, source port, destination IP, destination port). The other host receives it and sends a SYN-ACK, the initiator sends an ACK, and then the connection is established. Once the connection is established, data can be transferred in both directions.
On the other hand, a connectionless protocol means that we don't need to establish a connection to send data. The very first packet sent from one host to another can contain a data payload. Of course, for upper-layer protocols such as UDP, the recipient must be ready first; e.g., it must have bound a UDP socket to the destination port.
The connectionless IP became the foundation for TCP in the layer above.
In TCP, at minimum 2 round-trip times are required to send just one packet of data. That is: a->b for SYN, b->a for SYN-ACK, a->b for ACK with DATA, b->a for ACK. Flow control is done through the window advertised by the receiver (Nagle's algorithm, often mentioned here, coalesces small segments rather than controlling the rate).
In UDP, only 0.5 round-trip times are required: a->b with DATA. But be prepared for some packets to be silently lost, and no flow control is done: packets can be sent at a rate higher than the receiving system can handle.
To my knowledge, every layer makes a fool of the one above it. TCP gets an HTTP message from the application layer and breaks it into packets; let's call them data packets. IP gets these packets one by one from TCP and throws them towards the destination; it also collects incoming packets and delivers them to TCP. Now, TCP, after sending a packet, waits for an acknowledgement packet from the other side. If it comes, it tells the layer above: hey, I have established a connection and now we can communicate! The whole communication process goes on between the TCP layers on both sides, which send and receive different types of packets (data packets, acknowledgement packets, synchronization packets, blah-blah packets). TCP uses other tricks (all of them just sending packets) to ensure that the actual data is delivered in the same order in which it was broken up, and then reassembles it. After reassembling, it passes the data to the application layer above. That fool thinks it has received an HTTP message over an established connection, but in reality, just packets are being transferred.
I just came across this question today. It was bouncing around in my head all day and didn't make any sense. IP doesn't handle transport. Why would anyone even think of IP as connectionless or connection-oriented? It is technically connectionless because it offers no reliability, no guaranteed delivery. But so is my toaster. My toaster offers no guaranteed delivery, so why not call a toaster connectionless too?
In the end, I found out it's just some stupid title that someone somewhere attached to IP and it stuck, and now everyone calls IP connectionless and has no good reason for it.
Calling IP connectionless implies there is another layer 3 protocol that is connection oriented, but as far as I know, there isn't and it is just plain stupid to specify that IP is connectionless. MAC is connectionless. LLC is connectionless. But that is useless, technically correct info.

sending multiple tcp packets in an ip packet

Is it possible to send multiple TCP or UDP packets in a single IP packet? Are there any specifications in the protocol that forbid this?
If it is allowed by the protocol but generally not done by TCP/UDP implementations, could you point me to the relevant portion of the Linux source code that proves this?
Are there any implementations of TCP/UDP on some OS that do send multiple packets in a single IP packet (if it is allowed)?
It is not possible.
The TCP segment header does not describe its length. The length of the TCP payload is derived from the length of the IP packet(s) minus the lengths of the IP and TCP headers. So there is only one TCP segment per IP packet.
Conversely, however, a single TCP segment can be fragmented over several IP packets by IP fragmentation.
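The arithmetic can be shown as a short worked example in Python. The IPv4 packet is built by hand with zero checksums, and the helper name is illustrative. The payload length is IPv4 Total Length minus the IPv4 header length (IHL × 4) minus the TCP header length (Data Offset × 4). Nothing in the TCP header marks where one segment's data would end and another's begin.

```python
import struct


def tcp_payload_length(packet: bytes) -> int:
    """Compute the TCP payload length of an (unfragmented) IPv4 packet."""
    ihl_words = packet[0] & 0x0F                         # IPv4 header length in 32-bit words
    total_length = struct.unpack("!H", packet[2:4])[0]  # IPv4 Total Length field
    ip_header_len = ihl_words * 4
    if packet[9] != 6:
        raise ValueError("not a TCP packet")
    data_offset_words = packet[ip_header_len + 12] >> 4  # TCP Data Offset field
    tcp_header_len = data_offset_words * 4
    return total_length - ip_header_len - tcp_header_len


payload = b"hello"
# src port, dst port, seq, ack, data offset (5 words), flags (PSH|ACK), window, checksum, urgent
tcp_header = struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp_header) + len(payload),
                        0, 0, 64, 6, 0, bytes(4), bytes(4))
print(tcp_payload_length(ip_header + tcp_header + payload))  # 5
```

Here 45 - 20 - 20 = 5. With IP fragmentation, the same calculation applies to the reassembled packet.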
TCP doesn't send packets: it is a continuous stream. You send messages.
UDP, being packet-based, will only send one packet at a time.
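The contrast with TCP's byte stream shows up in a small Python sketch (loopback, OS-chosen port, function name illustrative). UDP preserves message boundaries: three sendto() calls become three datagrams, and each recv() returns exactly one of them, even though the 4096-byte buffer could hold all three. On loopback they typically arrive in the order sent, but UDP itself does not guarantee ordering.

```python
import socket


def udp_boundaries_demo() -> list[bytes]:
    """Send three datagrams over loopback; each recv() returns exactly one of them."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(2)
        for msg in (b"one", b"two", b"three"):
            tx.sendto(msg, rx.getsockname())
        # Even with a buffer big enough for all three, recv() never merges datagrams.
        return [rx.recv(4096) for _ in range(3)]
    finally:
        rx.close()
        tx.close()


print(udp_boundaries_demo())
```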
The protocol itself does not allow it. It won't break, it just won't happen.
The suggestion to use tunneling is valid, but so is the warning.
You might want to try tunneling TCP over TCP, although it's generally considered a bad idea. Depending on your needs, your mileage may vary.
You may want to take a look at the Stream Control Transmission Protocol (SCTP), which allows multiple data streams within a single SCTP association.
EDIT - I wasn't aware that the TCP header has no length field of its own, so there would be no way of doing this without writing a custom TCP equivalent that contains this information. SCTP may still be of use, though, so I'll leave that link.
TCP is a public specification, why not just read it?
RFC4614 is the roadmap document, RFC793 is TCP itself, and RFC1122 contains some errata and shows how it fits together with the rest of the (IPv4) universe.
But in short, because the TCP header (RFC793 section 3.1) does not have a length field, TCP data extends from the end of the header padding to the end of the IP packet. There is nowhere to put another data segment in the packet.
You cannot pack several TCP packets into one IP packet; that is a restriction of the specification, as mentioned above. TCP is the closest API that is application-oriented. Or do you want to program the sending of raw IP messages? Just tell us what problem you want to solve. Think about how you organize the delivery of messages from one application to another, or mention that you want to hook into the TCP/IP stack. What I can suggest:
Consider packing whatever you like into a UDP packet. I am not sure how easy it is to initiate routing of the "unpacked" TCP packets on the remote side.
Consider using PPTP or similar tunnelling protocol.

Resources