What defines the max integer value for a URI port? - uri

RFC 3986 defines the port of a URI as below -- zero or more digits. http/https URLs are defined to dereference over TCP, so ports would be limited to [1,65535], but in the general case of URIs of various schemes I'm having trouble finding a clear maximum.
For context, I'm writing a library that parses and handles URIs, and I want to make sure the library is sufficiently general.
3.2.3. Port
The port subcomponent of authority is designated by an optional
port number in decimal following the host and delimited from it by
a single colon (":") character.
port = *DIGIT
A scheme may define a default port. For example, the "http" scheme
defines a default port of "80", corresponding to its reserved TCP
port number. The type of port designated by the port number (e.g.,
TCP, UDP, SCTP) is defined by the URI scheme. URI producers and
normalizers should omit the port component and its ":" delimiter if
port is empty or if its value would be the same as that of the
scheme's default.

For TCP, UDP, and SCTP, the port range is the one you gave in your question: [1, 65535]. You can verify that against the corresponding RFCs (TCP, UDP, SCTP).
If you want to support any other transport protocol, you need to look into its specification to see the range. However, it will quite likely be the same.
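For a parsing library, the check above could be sketched roughly as follows. This is only an illustration: `parse_port` and its `max_port` parameter are hypothetical names, and the 65535 default assumes a TCP, UDP, or SCTP based scheme.

```python
from typing import Optional


def parse_port(text: str, max_port: int = 65535) -> Optional[int]:
    """Parse the RFC 3986 port subcomponent (port = *DIGIT).

    max_port is an assumption: 65535 fits TCP, UDP, and SCTP, but a scheme
    built on another transport may need a different ceiling.
    """
    if text == "":
        # An empty port is legal in the grammar; the scheme default applies.
        return None
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"port must consist of decimal digits: {text!r}")
    value = int(text)
    if value > max_port:
        raise ValueError(f"port {value} exceeds the maximum of {max_port}")
    return value
```

Keeping the ceiling as a parameter leaves the generic URI grammar untouched and lets each scheme handler decide what range it accepts.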

Related

Does the Internet Protocol do Routing?

Many sources on the web have different opinions on this. It is obvious that the Internet Protocol specified in RFC 791 is responsible for addressing host interfaces and encapsulating data into datagrams (including fragmentation and reassembly). But what about routing? Is this a function of IP, or is it realized by the protocols RIP, OSPF, and BGP?
The word "routing" has two closely related meanings:
1) As RFC 791 section 1.4 says "The selection of a path for transmission is called routing." When a layer 3 packet (IP datagram) arrives on an incoming interface, the router does a longest-prefix-match lookup in the routing table to decide on which outgoing interface and next-hop the packet should be forwarded.
2) The act of filling the routing tables, by running some routing protocol such as RIP, OSPF, or BGP is also called "routing".
The former is often done in hardware, and the latter is done in software.
When the difference matters, the former is often called forwarding (hence "FIB" for Forwarding Information Base) and the latter is called routing (hence "RIB" for Routing Information Base).
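The forwarding step (meaning 1) can be illustrated with a small longest-prefix-match lookup. This is a minimal sketch, not how a real router does it: the table contents and next-hop names are made up, and a linear scan stands in for the tries or hardware tables used in practice.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, next hop). In a real router these
# entries would be installed by a routing protocol such as RIP, OSPF, or BGP.
FIB = [
    ("0.0.0.0/0", "gw-default"),
    ("10.0.0.0/8", "gw-a"),
    ("10.1.0.0/16", "gw-b"),
]


def longest_prefix_match(table, destination):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best[1] if best else None
```

Here `10.1.2.3` matches all three entries, and the /16 wins because it is the longest prefix.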

Why do I need source port on UDP

When I use TCP, I need a destination port (to be able to "talk" to another process on the other host) and a source port (because TCP is connection-oriented, so data such as ACKs and sequence numbers is sent back to the source).
On the other hand, UDP, which is connectionless, also has a source port.
Why is that? (I don't need to send data back.)
Probably, two reasons.
First, receivers often need to reply and it is useful to provision a standard tool for that.
Secondly, you may have multiple interfaces (network cards) and using source address, you decide which of them must be used to emit the packet.
You don't need to, but there is still the possibility of sending a response back (which is actually very useful). However, as stated in RFC 768:
Source Port is an optional field, when meaningful, it indicates the port
of the sending process, and may be assumed to be the port to which a
reply should be addressed in the absence of any other information. If
not used, a value of zero is inserted.
https://www.rfc-editor.org/rfc/rfc768
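To make the quoted field concrete, here is a small sketch that unpacks the 8-byte UDP header and treats a source port of zero as "not used". The function name and the returned dictionary keys are my own choices, not anything defined by the RFC.

```python
import struct


def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the four 16-bit UDP header fields (network byte order)."""
    if len(datagram) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    source, destination, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {
        # RFC 768: a source port of zero means the field is not used.
        "source_port": source or None,
        "destination_port": destination,
        "length": length,
        "checksum": checksum,
    }
```

A receiver that wants to reply would send to `source_port` when it is present and needs some other agreement with the sender when it is `None`.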
I would like to add to the answers here. Apart from simply knowing what to reply to, the source port can belong to the list of well-known port numbers. These ports specify what kind of data is encapsulated in the UDP (or TCP!) packet.
For example, the source port 530 indicates that the packet contains a Remote Procedure Call, and 520 indicates a Routing Information Protocol packet.

What is the largest TCP/IP network port number allowable for IPv4?

What is the highest port number one can use?
The port number is an unsigned 16-bit integer, so 65535.
The largest port number is an unsigned short 2^16-1: 65535
A registered port is one assigned by the Internet Corporation for
Assigned Names and Numbers (ICANN) to a certain use. Each registered
port is in the range 1024–49151.
Since 21 March 2001 the registry agency is ICANN; before that time it
was IANA.
Ports with numbers lower than those of the registered ports are called
well known ports; port with numbers greater than those of the
registered ports are called dynamic and/or private ports.
Wikipedia: Registered Ports
As I understand it, you should only use up to 49151, as from 49152 up to 65535 are reserved for Ephemeral ports
Just a follow-up to smashery's answer. The ephemeral port range (on Linux at least, and I suspect other Unices as well) is not fixed. It can be controlled by writing to
/proc/sys/net/ipv4/ip_local_port_range
The only restriction (as far as IANA is concerned) is that ports below 1024 are designated to be well-known ports. Ports above that are free for use.
Often you'll find that ports below 1024 are restricted to superuser access, I believe for this very reason.
According to RFC 793, the port is a 16 bit unsigned int.
This means the range is 0 - 65535.
However, within that range, ports 0 - 1023 are generally reserved for specific purposes. I say generally because, apart from port 0, there is usually no enforcement of the 0-1023 reservation. TCP/UDP implementations usually don't enforce reservations apart from 0. You can, if you want to, run a web server's TLS port on port 80, or 25, or 65535 instead of the standard 443. Likewise, even though it is standard for SMTP servers to listen on port 25, you can run one on 80, 443, or others.
Most implementations reserve 0 for a specific purpose - random port assignment. So in most implementations, saying "listen on port 0" actually means "I don't care what port I use, just give me some random unassigned port to listen on".
So any limitation on using a port in the 0-65535 range, including 0, ephemeral reservation range etc, is implementation (i.e. OS/driver) specific, however all, including 0, are valid ports in the RFC 793.
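The "listen on port 0" behaviour is easy to observe with a socket. This is a minimal sketch assuming a typical OS socket implementation; `pick_free_port` is a hypothetical helper name, and the port it returns will differ from run to run.

```python
import socket


def pick_free_port() -> int:
    """Bind to port 0 on loopback and report the port the OS actually chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # 0 means "any unused port"
        return sock.getsockname()[1]
```

Note that the socket is closed when the function returns, so another process could claim the same port afterwards.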
Valid numbers for ports are: 0 to 2^16-1 = 0 to 65535
That is because a port number is 16 bits long.
However ports are divided into:
Well-known ports: 0 to 1023 (used for system services e.g. HTTP, FTP, SSH, DHCP ...)
Registered/user ports: 1024 to 49151 (you can use these for your server, but be careful: some widely used applications, such as Microsoft SQL Server (MSSQL) or Apache Derby Network Server, already take ports from this range. For example, assigning MSSQL's port to your server is not recommended, because if MSSQL is running, your server will most probably fail to start due to a port conflict.)
Dynamic/private ports: 49152 to 65535 (used by clients rather than servers, e.g. in NAT services)
In programming you can use any numbers 0 to 65535 for your server, however you should stick to the ranges mentioned above, otherwise some system services or some applications will not run because of port conflict.
Check the list of most ports here: https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
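The three ranges above can be written down as a small helper. This is only a sketch of the classification listed in this answer; `classify_port` and its return strings are illustrative names.

```python
def classify_port(port: int) -> str:
    """Classify a port number into the three ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid 16-bit port number: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"
```

For example, `classify_port(443)` gives `"well-known"` and `classify_port(49152)` gives `"dynamic"`.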
It depends on which range you're talking about, but the dynamic range goes up to 65535 or 2^16-1 (16 bits).
http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
It should be 65535.

How do ports work with IPv6?

Conventional IPv4 dotted quad notation separates the address from the port with a colon, as in this example of a webserver on the loopback interface:
127.0.0.1:80
but with IPv6 notation the address itself can contain colons. For example, this is the short form of the loopback address:
::1
How are ports (or their functional equivalent) expressed in a textual representation of an IPv6 address/port endpoint?
They work almost the same as today. However, be sure you include [] around your IP.
For example : http://[1fff:0:a88:85a3::ac1f]:8001/index.html
Wikipedia has a pretty good article about IPv6: http://en.wikipedia.org/wiki/IPv6#Addressing
The protocols used in IPv6 are the same as the protocols in IPv4. The only thing that changed between the two versions is the addressing scheme, DHCP [DHCPv6] and ICMP [ICMPv6]. So basically, anything TCP/UDP related, including the port range (0-65535) remains unchanged.
Edit: Port 0 is a reserved port in TCP but it does exist. See RFC793
Wikipedia points out that the syntax of an IPv6 address includes colons and has a short form preventing fixed-length parsing, and therefore you have to delimit the address portion with []. This completely avoids the odd parsing errors.
(Taken from an edit Peter Wone made to the original question.)
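As an illustration of the bracket rule, here is a rough sketch that formats and splits endpoints using Python's standard `urllib.parse`. The helper names `format_endpoint` and `split_endpoint` are made up for this example; the "//" prefix is just a way to get `urlsplit` to treat the text as an authority component.

```python
from urllib.parse import urlsplit


def format_endpoint(host: str, port: int) -> str:
    """Join host and port, bracketing the host if it is an IPv6 literal."""
    if ":" in host:
        return f"[{host}]:{port}"
    return f"{host}:{port}"


def split_endpoint(endpoint: str):
    """Split 'host:port' or '[v6-address]:port' into (host, port)."""
    parts = urlsplit("//" + endpoint)
    return parts.hostname, parts.port
```

With these, `format_endpoint("::1", 80)` produces `[::1]:80`, and `split_endpoint` recovers the address without the brackets.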
They're the same, aren't they? Now I'm losing confidence in myself but I really thought IPv6 was just an addressing change. TCP and UDP are still addressed as they are under IPv4.
I'm pretty certain that ports only have a part in tcp and udp. So it's exactly the same even if you use a new IP protocol
I would say the best reference is Format for Literal IPv6 Addresses in URL's where usage of [] is defined.
Also, if this is for programming, specifically in Java, I would suggest reading the class documentation for Inet6Address and java/net/URL, where the use of IPv4 addresses in IPv6 notation and other cases are presented in detail. In my case, the IPv4-mapped address form, ::ffff:w.x.y.z, which represents an IPv4 address as an IPv6 address, also solved my problem. It allows the native program to use the same address data structure and the same socket when communicating with both IPv4 and IPv6 nodes. This is the default setup on Amazon cloud Linux boxes.
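The answer refers to Java, but the IPv4-mapped form can be illustrated just as well with Python's standard `ipaddress` module. This is a small sketch; `unmap` is a hypothetical helper name and the addresses are documentation examples.

```python
import ipaddress


def unmap(text: str):
    """Return the embedded IPv4 address for ::ffff:w.x.y.z, else the address itself."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr
```

Calling `unmap("::ffff:192.0.2.10")` yields the IPv4 address `192.0.2.10`, while an ordinary IPv6 address is returned unchanged.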

What is the Significance of Pseudo Header used in UDP/TCP

Why is the pseudo header prepended to the UDP datagram for the computation of the UDP checksum? What's the rationale behind this?
The nearest you will get to an answer "straight from the horse's mouth", is from David P. Reed at the following link.
http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html
The short version of the answer is, "the pseudo header exists for historical reasons".
Originally, TCP/IP was a single monolithic protocol (called just TCP). When they decided to split it up into TCP and IP (and others), they didn't separate the two all that cleanly: the IP addresses were still thought of as part of TCP, but they were just "inherited" from the IP layer rather than repeated in the TCP header. The reason why the TCP checksum operates over parts of the IP header (including the IP addresses) is because they intended to use cryptography to encrypt and authenticate the TCP payload, and they wanted the IP addresses and other TCP parameters in the pseudo header to be protected by the authentication code. That would make it infeasible for a man in the middle to tamper with the IP source and destination addresses: intermediate routers wouldn't notice the tampering, but the TCP end-point would when it attempted to verify the signature.
For various reasons, none of that grand cryptographic plan came to pass, but the TCP checksum which took its place still operates over the pseudo header as though it were a useful thing to do. Yes, it gives you a teensy bit of extra protection against random errors, but that's not why it exists. Frankly, we'd be better off without it: the coupling between TCP and IP means that you have to redefine TCP when you change IP. Thus, the definition of IPv6 includes a new definition for the TCP and UDP pseudo header (see RFC 2460, s8.1). Why the IPv6 designers chose to perpetuate this coupling rather than take the chance to abolish it is beyond me.
From the TCP or UDP point of view, the packet does not contain IP addresses. (IP being the layer beneath them.)
Thus, to do a proper checksum, a "pseudo header" is included. It's "pseudo" because it is not actually part of the UDP datagram. It contains the most important parts of the IP header, that is, source and destination address, protocol number and data length.
This is to ensure that the UDP checksum takes into account these fields.
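As a rough sketch of how that works for UDP over IPv4: the pseudo header (source address, destination address, a zero byte, protocol number 17, UDP length) is placed in front of the UDP header and payload only for the purpose of the sum. The function names below are my own, and the addresses in the usage note are documentation examples.

```python
import ipaddress
import struct

UDP_PROTOCOL = 17  # IP protocol number for UDP


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo header: addresses, zero byte, protocol, UDP length."""
    return struct.pack(
        "!4s4sBBH",
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        0,
        UDP_PROTOCOL,
        udp_length,
    )


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Checksum over pseudo header + UDP header (checksum field zero) + payload."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # a computed zero is transmitted as all ones
```

Because the addresses enter the sum, the same datagram sent to `192.0.2.2` and to `192.0.2.3` gets different checksums, which is what lets the receiver notice a misdelivered packet.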
When these protocols were being designed, a serious concern of theirs was a host receiving a packet thinking it was theirs when it was not. If a few bits were flipped in the IP header during transit and a packet changed course (but the IP checksum was still correct), the TCP/UDP stack of the redirected receiver can still know to reject the packet.
Though the pseudo-header broke the separation of layers idiom, it was deemed acceptable for the increased reliability.
"The purpose of using a pseudo-header is to verify that the UDP
datagram has reached its correct destination. The key to
understanding the pseudo-header lies in realizing that the correct
destination consists of a specific machine and a specific protocol
port within that machine. The UDP header itself specifies only the
protocol port number. Thus, to verify the destination, UDP on the
sending machine computes a checksum that covers the destination IP
address as well as the UDP datagram. The pseudo-header is not
transmitted with the UDP datagram, nor is it included in the length."
Douglas E. Comer - Internetworking with TCP/IP, 4th edition.
Pseudo IP header contains the source IP, destination IP, protocol and Total length fields. Now, by including these fields in TCP checksum, we are verifying the checksum for these fields both at Network layer and Transport layer, thus doing a double check to ensure that the data is delivered to the correct host.

Resources