Best TCP port number range for internal applications [closed] - networking

I work in a place where each of our internal applications runs on an individual Tomcat instance and uses a specific TCP port. What would be the best IANA port range to use for these apps in order to avoid port number collisions with any other process on the server?
Based on http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xml, these are the options as I currently see them:
System Ports (0-1023): I don't want to use any of these ports because the server may be running services on standard ports in this range
User Ports (1024-49151): Given that the applications are internal I don't intend to request IANA to reserve a number for any of our applications. However, I'd like to reduce the likelihood of the same port being used by another process, e.g., Oracle Net Listener on 1521.
Dynamic and/or Private Ports (49152-65535): This range is ideal for custom port numbers. My only concern is the following scenario (sketched in code just after this list):
a. I configure one of my applications to use port X
b. The application is down for a few minutes or hours (depending on the nature of the app), leaving the port unused for a little while,
c. The operating system allocates port number X to another process, for instance, when that process acts as a client requiring a TCP connection to another server. This succeeds given that it falls within the dynamic range and X is currently unused as far as the operating system is concerned, and
d. The app fails to start because port X is already in use
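For what it's worth, the collision in steps c-d is easy to reproduce. Here's a minimal Python sketch (my own illustration, not part of the original question):

import socket

# Step c: ask the OS for an ephemeral port, as any outbound client connection does.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.bind(("", 0))  # port 0 means "pick any free dynamic port for me"
x = client.getsockname()[1]
print("OS handed the client port", x)

# Step d: if port X is the one our app wants, the app's own bind now fails.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    server.bind(("", x))
except OSError as e:
    print("bind failed:", e)  # EADDRINUSE, "Address already in use"

Note that on Linux the ephemeral range actually defaults to roughly 32768-60999 (see /proc/sys/net/ipv4/ip_local_port_range), overlapping the User Ports range rather than matching IANA's 49152-65535.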

I decided to download the assigned port numbers from IANA, filter out the used ports, and sort each "Unassigned" range by the number of ports available, descending. This did not work at first, since the CSV file has ranges marked as "Unassigned" that overlap other port number reservations. So I manually expanded the ranges of assigned port numbers, leaving me with a list of all assigned ports; I then sorted that list and generated my own list of unassigned ranges.
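For anyone who wants to reproduce that, here's roughly how the expansion can be scripted in Python. I'm assuming the CSV's column headers are "Port Number" and "Description" and that port entries are either a single number or a "low-high" range; check the actual file:

import csv

assigned = set()
with open("service-names-port-numbers.csv", newline="") as f:
    for row in csv.DictReader(f):
        span = (row.get("Port Number") or "").strip()
        if not span or row.get("Description", "").strip() == "Unassigned":
            continue  # skip blank entries and explicit "Unassigned" rows
        lo, _, hi = span.partition("-")  # entries are "n" or "lo-hi"
        assigned.update(range(int(lo), int(hi or lo) + 1))

# Collect unassigned runs of at least 500 ports, largest first.
runs, start = [], None
for p in range(0, 65537):  # 65536 acts as a sentinel closing the last run
    free = p <= 65535 and p not in assigned
    if free and start is None:
        start = p
    elif not free and start is not None:
        if p - start >= 500:
            runs.append((p - start, start, p - 1))
        start = None

for total, lo, hi in sorted(runs, reverse=True):
    print(total, lo, hi)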
Since this stackoverflow.com page ranked very high in my search about the topic, I figured I'd post the largest ranges here for anyone else who is interested. These are for both TCP and UDP where the number of ports in the range is at least 500.
Total Start End
829 29170 29998
815 38866 39680
710 41798 42507
681 43442 44122
661 46337 46997
643 35358 36000
609 36866 37474
596 38204 38799
592 33657 34248
571 30261 30831
563 41231 41793
542 21011 21552
528 28590 29117
521 14415 14935
510 26490 26999
Source (via the CSV download button):
http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml

I can't see why you would care. Other than the "don't use ports below 1024" privilege rule, you should be able to use any port because your clients should be configurable to talk to any IP address and port!
If they're not, then they haven't been done very well. Go back and do them properly :-)
In other words, run the server at IP address X and port Y then configure clients with that information. Then, if you find you must run a different server on X that conflicts with your Y, just re-configure your server and clients to use a new port. This is true whether your clients are code, or people typing URLs into a browser.
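In code, that configurability can be as small as this Python sketch (APP_HOST and APP_PORT are names I've made up for illustration):

import os
import socket

# Read the server location from the environment instead of hardcoding it.
host = os.environ.get("APP_HOST", "localhost")
port = int(os.environ.get("APP_PORT", "8080"))

conn = socket.create_connection((host, port), timeout=5)

Moving the server then becomes a deployment-configuration change, not a code change.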
I, like you, wouldn't try to get numbers assigned by IANA since that's supposed to be for services so common that many, many environments will use them (think SSH or FTP or TELNET).
Your network is your network, and if you want your servers on port 1234 (or even the TELNET or FTP ports, for that matter), that's your business. Case in point: in our mainframe development area, port 23 is used for the 3270 terminal server, which is a vastly different beast from telnet. If you want to telnet to the UNIX side of the mainframe, you use port 1023. That's sometimes annoying if you run a telnet client without specifying port 1023, since it hooks you up to a server that knows nothing of the telnet protocol; we have to break out of the telnet client and do it properly:
telnet big_honking_mainframe_box.com 1023
If you really can't make the client side configurable, pick one in the second range, like 48042, and just use it, declaring that any other software on those boxes (including any added in the future) has to keep out of your way.

Short answer: use an unassigned user port
Overachiever's answer: select and deploy a resource-discovery solution. Have the server select a private port dynamically, and have the clients use resource discovery to find it.
The risk that a server will fail because the port it wants to listen on is not available is real; at least, it's happened to me. Another service or a client might get there first.
You can almost totally eliminate the risk from clients by avoiding the private ports, which are the ones dynamically handed out to clients.
The risk from another service is minimal if you use a User Port. With an unassigned port, the only risk is that another service happens to be configured to use (or dynamically grabs) that port; but at least that's probably under your control.
The huge doc with all the port assignments, including User Ports, is here: http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt; look for the token "Unassigned".

Related

Process ID instead of port Address [closed]

I have been confused about one thing: when I request a certain web page from my computer, let us say www.example.com, my computer makes a request to a DNS server to find the IP address of the destination, and it binds the source IP, destination IP, and source port so it can tell which application requested the web page. My question is: if I make the request from Chrome (let us say), Chrome already has a certain process ID associated with it. Why don't we use that process ID instead of the port number? I know a socket ID should be associated as well, since it identifies the different applications running inside Chrome in this example.
Sorry if my concept is wrong.
A port is a logical id for a service. You can see these port ids on a Unix system by opening up /etc/services. Here's an example couple of lines that define the ports for SMTP:
smtp 25/udp # Simple Mail Transfer
smtp 25/tcp # Simple Mail Transfer
These services are well known; they don't vary. Conversely, the process ID of a daemon that handles a service (like SMTP) will vary, most likely across reboots, and quite possibly within a single boot of the server (e.g., an admin stopping and restarting the daemon will cause it to get a new process ID).
The service ID/port has to be unique so that clients can specify which application on a server their packets should be delivered to. It's just a short way of saying "send my packets to this application/service". The ID could have been a string such as "webserver", "emailserver", or "ftpserver", but a short integer ID is more efficient to send; either way, the concept is the same: whether a string or a number, it identifies a service.
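That mapping is queryable programmatically, too; for example, Python exposes the services database directly:

import socket

print(socket.getservbyname("smtp", "tcp"))  # 25, from the services database
print(socket.getservbyport(25, "tcp"))      # 'smtp', the reverse lookup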
Well, you might ask, what if you had a service (kind of like DNS) that you could call to say "give me the process ID of the SMTP server"? First off, why do that when you can just say "send this to the SMTP server" to begin with? You do that by embedding the port number in each packet sent (it's a field in each TCP/UDP packet sent by the client). Secondly, that process ID would have to stay valid for the duration of the client session, but as I mentioned, processes can be stopped and restarted, meaning the process ID for the service could change. Hence, static, agreed-upon port numbers are used; process IDs are at best inefficient (because of the lookup you'd need to do to find them) and, worse, simply not reliable (because process IDs change).
IP addresses tell you which machine/host to route the packets to (the ones destined for SMTP, say). The IP address of a machine can change, and that is one reason why you need something like DNS to translate a name (e.g., www.google.com) into an IP address. In contrast, the services text file with the IANA port number assignments is small enough to keep locally (in /etc/services, for example) on each host, so there is no need for a lookup service (not to mention that port numbers are usually only something that the programmer of the service, or of the client, needs to be aware of).
By analogy, everyone's home is at a different street address/city. This is what IP addresses provide, a way to find a server/host on the Internet. DNS is like asking a service to answer the question "what is the address of Joe Smith?".
Now that we can find the home, we need to agree on terminology for rooms in the home. By convention, everyone calls their bedroom "bedroom". This is analogous to identifying a service within the server as a port number.
So much for the port of a well-known service on the server side, and how clients can address it. If you are asking how ports are assigned for client connections, or for connections made by children or threads of the server that are bound to a specific client (typically, a server will spawn a thread or child process and carry out the transaction over a different connection than the one listening for clients), then ephemeral ports (in a specific assigned range) are used. In this case, the client and server don't particularly care about the port number, because they've already found each other, so to speak; these port numbers will vary and are effectively random (at least to the client and server using them). They cannot be truly random, however, because they might then conflict with the assignments used by well-known services, so they come from a range that the IANA has set aside for exactly this purpose. Just using the PID of the process would not ensure that the number avoids conflicting with another port or falls inside the range set aside for this purpose.

Discovering free ports

I wrote a server application in Erlang and a client in C#. They communicate through 3 TCP ports, and the port numbers are currently hardcoded. Now I'd like to do this dynamically. This is my first time doing network programming, so please pardon my inability to use proper terminology :-D
What I would like to do is make a supervisor which would accept a TCP connection from a client on a previously known port (say, 10000, or whatever), then find 3 free ports, start a server application on those 3 ports and tell the client those port numbers so client can connect to the server.
My particular problem is: how do I find 3 ports which are not in use? (clarification: which module:fun() to use to find a free port?)
My general problem is: I'm sure this kind of stuff (one server allocating ports and redirecting clients) is quite common network programming problem and there should be a bunch of (erlang-specific or general) resources about this, but I just don't have the terminology to google it out.
According to the Erlang documentation here, if the Port argument to the gen_tcp:listen/2 function is 0, the OS will assign any available port to the socket. That port can then be retrieved using inet:port/1.
You can therefore do something like this :
{ok, Listen} = gen_tcp:listen(0, [Options]),   % port 0: let the OS pick a free port
{ok, Port} = inet:port(Listen).                % inet:port/1 returns {ok, Port}
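The same OS facility exists outside Erlang, too. For comparison, here's a Python sketch of the "find 3 free ports" step (illustrative only); the key detail is keeping each socket open until handoff, because closing it before the real server binds the port reintroduces a race:

import socket

def grab_free_port():
    # Bind port 0 and let the OS choose a free port for us.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", 0))
    return s, s.getsockname()[1]

# Keep the sockets open until the real servers take over; closing them
# first would let another process grab the ports in the meantime.
reservations = [grab_free_port() for _ in range(3)]
print([port for _, port in reservations])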
Just in case you didn't know: you don't have to allocate new ports for each client; it's perfectly fine to have all clients connect to the same port.
UPDATE:
If there is a reason to allocate new ports for incoming clients, it's far beyond a first "introduction to network programming" program.
Separate ports could mean you want to completely isolate the environments of different groups of clients; it's comparable to handing out completely different IP addresses to connect to. If you want to write a simple ping-pong program, you don't need it, and I honestly believe you will almost never need such a solution in your whole life; that's how incredibly rare it is.
Regarding CPU/port overhead: allocating ports and starting a server that listens on each of them is already far more overhead than accepting all clients on the same port.
You need to avoid the commonly known ports (FTP, HTTP, SMTP, etc.), but I don't think there is any master list of ports used by other software that you should avoid. I think your best bet is to come up with a range of ports you want to use, check at runtime whether anybody else answers on (i.e., is using) the numbers you choose dynamically, and if not, issue the port to the client.

How to retain one million simultaneous TCP connections?

I am to design a server that needs to serve millions of clients that are simultaneously connected with the server via TCP.
The data traffic between the server and the clients will be sparse, so bandwidth issues can be ignored.
One important requirement is that whenever the server needs to send data to any client it should use the existing TCP connection instead of opening a new connection toward the client (because the client may be behind a firewall).
Does anybody know how to do this, and what hardware/software is needed (at the least cost)?
What operating systems are you considering for this?
If using a Windows OS and using something later than Vista then you shouldn't have a problem with many thousands of connections on a single machine. I've run tests (here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) with a low spec Windows Server 2003 machine and easily achieved more than 70,000 active TCP connections. Some of the resource limits that affect the number of connections possible have been lifted considerably on Vista (see here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) and so you could probably achieve your goal with a small cluster of machines. I don't know what you'd need in front of those to route the connections.
Windows provides a facility called I/O Completion Ports (see: http://msdn.microsoft.com/en-us/magazine/cc302334.aspx) which allow you to service many thousands of concurrent connections with very few threads (I was running tests yesterday with 5000 connections saturating a link to a server with 2 threads to process the I/O...). Thus the basic architecture is very scalable.
If you want to run some tests then I have some freely available tools on my blog that allow you to thrash a simple echo server using many thousands of connections (1) and (2) and some free code which you could use to get you started (3)
The second part of your question, from your comments, is more tricky. If the client's IP address keeps changing and there's nothing between you and them that is providing NAT to give you a consistent IP address then their connections will, no doubt, be terminated and need to be re-established. If the clients detect this connection tear down when their IP address changes then they can reconnect to the server, if they can't then I would suggest that the clients need to poll the server every so often so that they can detect the connection loss and reconnect. There's nothing the server can do here as it can't predict the new IP address and it will discover that the old connection has failed when it tries to send data.
And remember, your problems are only just beginning once you get your system to scale to this level...
This problem is related to the so-called C10K problem. The C10K page lists a large number of good resources for addressing the problems you will encounter when you try to allow thousands of clients to connect to the same server.
I've come across the APE Project a while back. It seems like a dream come true: it can support up to 100k concurrent clients on a single node. Spread them across 10 or 20 nodes, and you can serve millions. Perfect for RESTful applications. You might want to look deeper if you need any shared namespace across nodes. One drawback is that it is a standalone server, supplementary to a web server. The server is of course open source, so any cost is hardware/ISP related.
You cannot use UDP. If the client sends a request and you don't reply immediately, a router is going to forget the reverse route in 30 seconds or less, so your server will never be able to reply to the client.
TCP is the only option, and it, too, will give you headaches. Most routers are going to forget the route and/or drop the connection after a few minutes, so your client/server code is going to have to send "keep alives" fairly often.
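If you control the client code, the per-socket keep-alive knobs look like this in Python (the TCP_KEEP* names below are Linux-specific; the OS default of roughly two hours between probes is far too slow to keep NAT entries alive):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Linux-only tuning: first probe after 60s idle, then every 30s, give up after 4 misses.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 4)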
I recommend setting up a "sniffer", to see how the phone companies are staying in touch with your smartphone for their "push" technology. Copy whatever they're doing, because that stuff works!
As Greg mentioned, the problem you are describing is C10K (or rather "C1M" in your case).
I recently made a simple TCP echo server on Linux that scales very well with the number of sessions (tested only up to 200,000, though) by using the epoll queue. On BSD, you have something similar called kqueue.
You can check out the code if you want to. Hope this helps and good luck!
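For readers who don't speak Erlang, the same readiness-based design looks roughly like this in Python (the selectors module sits on top of epoll on Linux and kqueue on BSD; this is a sketch that ignores partial writes):

import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("", 9000))
srv.listen(1024)
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():
        if key.fileobj is srv:
            conn, _addr = srv.accept()  # new session
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.send(data)  # echo; a real server would buffer unsent bytes
            else:
                sel.unregister(conn)  # peer closed
                conn.close()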
EDIT: As noted in the comments below, my original assertion that there is a 64K limit based on the number of ports is incorrect, however there is a 32K limit on the number of socket handles, so my suggested design is valid.
With a typical TCP/IP server design, you're limited in the number of simultaneous open connections you can have. The server has one listening port, and when a client connects to it the server makes an accept call, which creates a new socket handle for the rest of that connection.
To handle more than 64K simultaneous connections I think you need to use UDP instead. You only need one port for the server to listen on, and you need to manage the connections using a 32-bit client ID in the packet data instead of having a separate port for each client. The 32-bit client ID could be the client's IP address, and the client can listen on a known UDP port for messages coming back from the server. That port would be the only one that needs to be open on the firewall.
With this approach, your only limitation is how quickly you can handle and respond to UDP messages. With millions of clients, even sparse traffic could give you large spikes, and if you don't read the packets fast enough your input queue will fill up and you'll start dropping packets. The C10K page Greg points to will give you strategies for that.
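A sketch of the client-ID framing described above; the 4-byte big-endian header is my own assumption about the wire format, not a standard:

import socket
import struct

def send_to_client(sock, addr, client_id, payload):
    # Prefix each datagram with the 32-bit client ID.
    sock.sendto(struct.pack("!I", client_id) + payload, addr)

def parse_datagram(data):
    (client_id,) = struct.unpack("!I", data[:4])
    return client_id, data[4:]

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("", 5000))  # the single port that needs to be open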

Can two or more SNMP agents be run on the same port (on the same machine)?

Just a technical question -
Can two or more SNMP agents be run on the same port (on the same machine)?
My first instinct would be no since host:port identifies an instance of an application but I'm not sure.
Thank you!
Technically, if the OS supports it, the SO_REUSEADDR and SO_REUSEPORT options may be set on a socket to allow other processes to bind to the same address/port, and thus allow multiple processes to receive messages on the same address/port. But all the processes involved would have to set the option, and I doubt any agent implementations do, because it would make little sense: it would just cause headaches, with both agents potentially responding to a single request. Managers won't be equipped to handle it.
However, you can instead run an SNMP proxy in the primary address/port, configured to forward requests to one of multiple agents based on query, security, or (with SNMPv3) context/engine ID parameters, and forward responses back.
Also, using AgentX, you have an SNMP master agent running on the primary address/port, and one or more SNMP sub-agents connected to the master agent. The master agent dispatches requests to the sub-agents as appropriate, merging the results into a single response, so that to the outside world it appears as a single agent. Each sub-agent typically handles a different branch of OID space (one sub-agent implementing certain module(s), another sub-agent implementing other module(s)).
But taking two agents intended to own the address/port exclusively, and forcing them to share through the REUSE options, while it may be possible, would not be wise.
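For completeness, here's what that socket-level sharing looks like in Python, on a platform that supports SO_REUSEPORT (Linux 3.9+, the BSDs); I'm using an unprivileged port, since binding 161/udp requires root:

import socket

def shared_udp_socket(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Every process sharing the port must set this before bind().
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("", port))
    return s

a = shared_udp_socket(10161)
b = shared_udp_socket(10161)  # succeeds only because both set SO_REUSEPORT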
You can run multiple agents on the same host and the same port if they have different IP addresses (you can use a netsh script for that).
Personally, I use the nsoftware dll SecureSNMP V8 edition .NET to do this.
You can look at this post: Multiple SNMP Agents with nsoftware dll
No, two agents cannot both run on the same port as separate applications, for the reasons you assumed (except with a brittle packet-sniffing hack, which we'll not go into).
However, 2 agents can be accessed through the same port if there is some mechanism that handles the actual port and distributes requests based on MIB. For example the Windows SNMP service does this, allowing any number of SNMP agents to be added as "extensions" through the registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SNMP\Parameters\ExtensionAgents) by writing them as DLLs and using the snmp.h headers in the platform SDK.
You are correct: ports can't be shared.
If both the agents were designed by you, then the answer can be different.
Consider the HTTP and FTP cases: we can use host names to distinguish multiple sites on the same port, so why can't we do the same for SNMP?
We can create a dispatcher that monitors port 161 for incoming traffic, and then use multiple real agents behind it to handle that traffic. We are free to design how to distinguish them; personally, I prefer the FTP virtual-host-name manner and would use | to distinguish agents.
Maybe I can create a demo for #SNMP Suite in the future.
But if you need to work with existing agents on the same server, then such flexibility is lost.
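A toy sketch of that dispatcher idea (request path only; the backend ports and the routing rule are made up, and replies would need plumbing back through the front socket):

import socket

BACKENDS = [("127.0.0.1", 16101), ("127.0.0.1", 16102)]  # the real agents

front = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
front.bind(("", 10161))  # 161 in production, which needs privileges

while True:
    data, client = front.recvfrom(65535)
    backend = BACKENDS[hash(client) % len(BACKENDS)]  # placeholder routing rule
    front.sendto(data, backend)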

How many socket connections can a web server handle?

Say I was to get shared, virtual, or dedicated hosting. I read somewhere that a server/machine can only handle 64,000 TCP connections at one time; is this true? How many connections could any type of hosting handle, regardless of bandwidth? I'm assuming HTTP works over TCP.
Would this mean only 64,000 users could connect to the website, and if I wanted to serve more I'd have to move to a web farm?
In short:
You should be able to achieve in the order of millions of simultaneous active TCP connections and by extension HTTP request(s). This tells you the maximum performance you can expect with the right platform with the right configuration.
Today, I was worried whether IIS with ASP.NET would support on the order of 100 concurrent connections (see my update; expect ~10k responses per second on older ASP.NET Mono versions). When I saw this question and its answers, I couldn't resist answering myself; many of the answers here are completely incorrect.
Best Case
The answer to this question must only concern itself with the simplest server configuration to decouple from the countless variables and configurations possible downstream.
So consider the following scenario for my answer:
No traffic on the TCP sessions, except for keep-alive packets (otherwise you would obviously need a corresponding amount of network bandwidth and other computer resources)
Software designed to use asynchronous sockets and programming, rather than a hardware thread per request from a pool (i.e., IIS, Node.js, Nginx... a web server [but not Apache] paired with async-designed application software)
Good performance-per-dollar CPU/RAM. Today, arbitrarily, let's say an i7 (4 cores) with 8GB of RAM.
A good firewall/router to match.
No virtual limit/governor - ie. Linux somaxconn, IIS web.config...
No dependency on other slower hardware - no reading from harddisk, because it would be the lowest common denominator and bottleneck, not network IO.
Detailed Answer
Synchronous thread-bound designs tend to be the worst performing relative to Asynchronous IO implementations.
WhatsApp can handle a million connections, WITH traffic, on a single Unix-flavoured OS machine - https://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/.
And finally, this one, http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html, goes into a lot of detail, exploring how even 10 million could be achieved. Servers often have hardware TCP offload engines, ASICs designed for this specific role more efficiently than a general purpose CPU.
Good software design choices
Asynchronous IO design will differ across Operating Systems and Programming platforms. Node.js was designed with asynchronous in mind. You should use Promises at least, and when ECMAScript 7 comes along, async/await. C#/.Net already has full asynchronous support like node.js. Whatever the OS and platform, asynchronous should be expected to perform very well. And whatever language you choose, look for the keyword "asynchronous", most modern languages will have some support, even if it's an add-on of some sort.
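Whatever the platform, the resulting code has the same shape. For instance, here's what the asynchronous style looks like in Python's asyncio (a sketch, not a benchmark contender):

import asyncio

async def handle(reader, writer):
    # One coroutine per connection: thousands can be parked on I/O at once
    # while only a handful of OS threads exist.
    data = await reader.read(4096)
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()

asyncio.run(main())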
To WebFarm?
Whatever the limit is for your particular situation, yes a web-farm is one good solution to scaling. There are many architectures for achieving this. One is using a load balancer (hosting providers can offer these, but even these have a limit, along with bandwidth ceiling), but I don't favour this option. For Single Page Applications with long-running connections, I prefer to instead have an open list of servers which the client application will choose from randomly at startup and reuse over the lifetime of the application. This removes the single point of failure (load balancer) and enables scaling through multiple data centres and therefore much more bandwidth.
Busting a myth - 64K ports
To address the question component regarding "64,000", this is a misconception. A server can connect to many more than 65535 clients. See https://networkengineering.stackexchange.com/questions/48283/is-a-tcp-server-limited-to-65535-clients/48284
By the way, Http.sys on Windows permits multiple applications to share the same server port under the HTTP URL schema. They each register a separate domain binding, but there is ultimately a single server application proxying the requests to the correct applications.
Update 2019-05-30
Here is an up to date comparison of the fastest HTTP libraries - https://www.techempower.com/benchmarks/#section=data-r16&hw=ph&test=plaintext
Test date: 2018-06-06
Hardware used: Dell R440 Xeon Gold + 10 GbE
The leader serves ~7M plaintext responses per second (responses, not connections)
The second one Fasthttp for golang advertises 1.5M concurrent connections - see https://github.com/valyala/fasthttp
The leading languages are Rust, Go, C++, Java, C, and even C# ranks at 11 (6.9M per second). Scala and Clojure rank further down. Python ranks at 29th at 2.7M per second.
At the bottom of the list, I note Laravel and CakePHP, Rails, aspnet-mono-ngx, Symfony, and Zend, all below 10k per second. Note that most of these frameworks are built for dynamic pages and are quite old; there may be newer variants that feature higher up in the list.
Remember, this is plaintext HTTP, not the WebSocket specialty; many people coming here will likely be interested in concurrent connections for WebSocket.
This question is a fairly difficult one. There is no real software limitation on the number of active connections a machine can have, though some OS's are more limited than others. The problem becomes one of resources. For example, let's say a single machine wants to support 64,000 simultaneous connections. If the server uses 1MB of RAM per connection, it would need 64GB of RAM. If each client needs to read a file, the disk or storage array access load becomes much larger than those devices can handle. If a server needs to fork one process per connection then the OS will spend the majority of its time context switching or starving processes for CPU time.
The C10K problem page has a very good discussion of this issue.
To add my two cents to the conversation: on Linux-type systems, /proc/sys/net/core/somaxconn sets the maximum backlog of pending connections on a listening socket, i.e., how many connection attempts can queue up before being accepted:
cat /proc/sys/net/core/somaxconn
This number can be modified on the fly (only by the root user, of course):
echo 1024 > /proc/sys/net/core/somaxconn
But the real number of sockets that can be connected before the system falls over depends entirely on the server process, the hardware of the machine, and the network.
It looks like the answer is at least 12 million if you have a beefy server, your server software is optimized for it, and you have enough clients. If you test from one client to one server, the number of port numbers on the client will be one of the obvious resource limits (each TCP connection is defined by the unique combination of IP and port number at the source and destination).
(You need to run multiple clients as otherwise you hit the 64K limit on port numbers first)
When it comes down to it, this is a classic example of the witticism that "the difference between theory and practice is much larger in practice than in theory". In practice, achieving the higher numbers seems to be a cycle of: (a) propose specific configuration/architecture/code changes, (b) test until you hit a limit, (c) ask whether you are finished; if not, (d) work out the limiting factor, and (e) go back to step (a). Rinse and repeat.
Here is an example with 2 million TCP connections on a beefy box (128GB RAM and 40 cores) running Phoenix: http://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections - they ended up needing 50 or so reasonably significant servers just to generate the client load (their initial, smaller clients maxed out too early, e.g. "maxed our 4core/15gb box # 450k clients").
Here is another reference for go this time at 10 million: http://goroutines.com/10m.
This appears to be java based and 12 million connections: https://mrotaru.wordpress.com/2013/06/20/12-million-concurrent-connections-with-migratorydata-websocket-server/
Note that HTTP doesn't typically keep TCP connections open for any longer than it takes to transmit the page to the client; and it usually takes much more time for the user to read a web page than it takes to download the page... while the user is viewing the page, he adds no load to the server at all.
So the number of people that can be simultaneously viewing your web site is much larger than the number of TCP connections that it can simultaneously serve.
In the case of the IPv4 protocol, a server with one IP address that listens on one port only can handle 2^32 client IP addresses x 2^16 client ports, so 2^48 unique sockets. If you speak about a server as a physical machine and you are able to utilize all 2^16 listening ports, then there could be a maximum of 2^48 x 2^16 = 2^64 unique TCP/IP socket pairs for one server IP address. Please note that some ports are reserved for the OS, so the real number will be lower. To sum up (sanity-checked in the snippet below the list):
1 IP and 1 port --> 2^48 sockets
1 IP and all ports --> 2^64 sockets
all unique IPv4 sockets in the universe --> 2^96 sockets
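A quick sanity check of that arithmetic:

# Each TCP connection is the 4-tuple (src IP, src port, dst IP, dst port).
client_ips, ports = 2**32, 2**16
assert client_ips * ports == 2**48                        # 1 server IP, 1 port
assert client_ips * ports * ports == 2**64                # 1 server IP, all ports
assert client_ips * ports * client_ips * ports == 2**96   # every IPv4 endpoint pair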
There are two different discussions here. One is how many people can connect to your server; that one has been answered adequately by others, so I won't go into it.
The other is how many ports your server can listen on. I believe this is where the 64K number came from: TCP uses a 16-bit identifier for a port, which translates to 65536 (a bit more than 64K). This means you can have that many different "listeners" on the server per IP address.
I think that the number of concurrent socket connections one web server can handle largely depends on the amount of resources each connection consumes and the amount of total resource available on the server barring any other web server resource limiting configuration.
To illustrate: if every socket connection consumed 1MB of server resources and the server had 16GB of RAM available, then (theoretically) it would only be able to handle 16GB / 1MB = 16,384 concurrent connections. I think it's as simple as that... REALLY!
So regardless of how the web server handles connections, every connection will ultimately consume some resource.
