Machine A needs to send a message to machine B. Machine A has a static IP but machine B does not.
One option I could think of to solve this problem is that machine B opens a TCP connection to machine A and then machine A sends the data/message to machine B. However, this solution has the following limitations:
a) It is not scalable if there are many machines like machine B to which the data has to be sent. It could be a serious drain on machine A's resources.
b) It is machine A that decides when to send the data; machine B does not know when there will be data for it. In the current design, machine B would have to repeatedly poll machine A over a TCP connection, asking whether there is any data for it. This can get expensive if there are many machine B's.
Is there a less expensive way to solve this problem? The Observer design pattern comes to mind. Machine B could subscribe to a notification from machine A to inform it when data becomes available. However, how does one implement the pattern in a distributed environment when machine B does not have a static IP?
Observer aside, is there a less expensive way, other than using raw sockets, for machine A to send that data to machine B?
What if machine B makes a call to machine A to register its IP address for updates? That would be a quick message exchange. Whenever machine A has data it could create a new connection to all of the IPs that have registered themselves and send them the data.
Look at IP Multicast, though you may be able to get by with simple UDP broadcast.
Not having a static IP means that it is accessible from the outside, but its address changes?
If so, then you can have machine B call A.detach(old_ip); A.attach(new_ip) every time its address changes.
I'd go with your original idea, except that there's no need to poll - B can just maintain an idle TCP connection to A at all times, then when A wants to send a message it just sends it out to all the clients it has connected at that time. Overhead won't be a problem - even a fairly old machine can handle thousands of simultaneous mostly-idle TCP connections.
(You'll also want to implement some kind of keep-alive echo / echo-reply messages if the gaps between real messages are longer than a few minutes, so that B can quickly detect a dead connection and reconnect, and so that connection-tracking state in firewalls or routers in the path doesn't time out.)
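For what it's worth, here is a minimal sketch of B's side of that design, assuming a line-oriented protocol where an empty line serves as the keep-alive ping; the host name, port, and timeouts below are placeholders:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.Socket;

    // Machine B: keep one mostly-idle TCP connection to A, reconnecting on failure.
    public class PersistentClient {
        public static void main(String[] args) throws InterruptedException {
            while (true) {                             // reconnect loop
                try (Socket s = new Socket("machine-a.example", 9000)) {
                    s.setKeepAlive(true);              // OS-level TCP keep-alive
                    s.setSoTimeout(120_000);           // a dead link shows up within ~2 min
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream()));
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (!line.isEmpty()) {         // empty line = application-level ping
                            System.out.println("message from A: " + line);
                        }
                    }
                } catch (IOException e) {
                    // connection dropped or timed out; fall through and reconnect
                }
                Thread.sleep(5_000);                   // back off before reconnecting
            }
        }
    }

On A's side, each accepted socket goes into a collection, and sending a message is just a write to every live socket in that collection.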
I wrote a Java application that communicates with a 3rd-party server that is hosted on AWS. According to what I've read, there are no connection limits on their REST API, yet regardless of how many threads I try to hit them with I am seeing a limit of 64 outgoing connections.
My client is written in Java, running under Windows 10. This number smells like an intentional limit, but I have been unable to find any documented limits specific to the aforementioned environment.
I tried pointing my application to https://test.com/ and got a limit of 128 outgoing connections, which leads me to believe the limit is server-side.
How can one determine (from the client's side) whether outgoing connections are being restricted by the server-side? (I tried netstat -an but I only saw ESTABLISHED connections)
Seeing as my application is not receiving any connection failures from the server, could they be intentionally withholding SYN-ACK responses until older connections are closed?
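For what it's worth, a rough way to observe this from the client side is to open connections in a loop until connect() stops succeeding; a connect timeout (as opposed to "connection refused") is at least consistent with the far side withholding SYN-ACKs. A sketch, with a placeholder hostname and an arbitrary cap:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.List;

    // Open and hold connections until one fails, then report how many were held.
    public class ConnectProbe {
        public static void main(String[] args) {
            List<Socket> held = new ArrayList<>();
            for (int i = 1; i <= 256; i++) {
                Socket s = new Socket();
                try {
                    s.connect(new InetSocketAddress("test.com", 443), 3_000);
                    held.add(s);              // hold it open so connections accumulate
                } catch (IOException e) {
                    // a SocketTimeoutException here suggests silently dropped SYNs
                    System.out.println("failed at connection #" + i + ": " + e);
                    break;
                }
            }
            System.out.println("held " + held.size() + " simultaneous connections");
        }
    }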
I'm pretty sure there's no reliable way to tell from a client's side what the outgoing connections are being restricted to (and that would seem like a security issue to me if you could).
As for what the limits are, I think that would also be difficult to determine even from the server itself (unless it was explicitly configured to some lower bound), since there are so many variables involved: the amount of memory, the number of ports, and the CPU on the machine you are connecting to, on the one you are connecting from, and on everything in between.
Also, 64 and 128 don't look like human-chosen limits to me, but rather limits on something internal, since both are powers of 2.
I have two applications running on the same system that need to communicate with each other.
I've been using the rather strange practice of opening a TCP channel between the two applications for communication.
Is this practice frowned upon in any way? Is there any alternative (apart from using stdio, which is not possible for other reasons)?
Is there a restriction on maximum transfer rate and/or any latency involved (compared to piped stdio)?
I'm using the loopback address (127.0.0.1) for both server and client. Is the connection guaranteed to stay within the local machine, or could it bounce off the nearest router before coming back? And does the network card influence the properties of the connection at all?
I worked on a system a while ago in Java and was looking into the same question. I don't have much experience with it, but I ended up using TCP connections for communication, for the following advantages:
1) The ability to put the applications on different servers in the future, if needed.
2) The applications are totally independent: one application can crash without affecting the other. If the surviving application then tries to connect, it gets an error that you can handle.
I've seen this used in many other kinds of applications, so I went with it and it is working fine. But you have to be careful to handle network and I/O errors and to close all open sockets when you're finished with a connection. I was only closing the socket from the client end, so I ended up with many connections stuck in CLOSE_WAIT on the server.
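On that last point, the CLOSE_WAIT pile-up happens when the peer closes but you never call close() on your end. A minimal loopback-only sketch (the port is arbitrary) where try-with-resources guarantees every accepted socket gets closed:

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Loopback-only echo server; every accepted socket is closed deterministically.
    public class LoopbackServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(9001, 50,
                    InetAddress.getLoopbackAddress())) {   // bind to 127.0.0.1 only
                while (true) {
                    try (Socket client = server.accept()) {
                        client.getInputStream()
                              .transferTo(client.getOutputStream()); // echo until peer closes
                    } catch (IOException e) {
                        // per-connection failure; the socket is closed regardless
                    }
                }
            }
        }
    }

Binding explicitly to the loopback address also keeps other hosts from connecting to the port at all.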
It's pretty common to use TCP for inter-application communication.
Performance should not be an issue.
Sockets On Same Machine For Windows and Linux
You should consider security. What will happen if another user on the machine connects to the port, how will the application authenticate etc.
I wrote a server application in Erlang and a client in C#. They communicate through 3 TCP ports, whose numbers are hardcoded. Now I'd like to do this dynamically. This is my first time doing network programming, so please pardon my inability to use proper terminology. :-D
What I would like to do is make a supervisor which would accept a TCP connection from a client on a previously known port (say, 10000, or whatever), then find 3 free ports, start a server application on those 3 ports and tell the client those port numbers so client can connect to the server.
My particular problem is: how do I find 3 ports which are not in use? (clarification: which module:fun() to use to find a free port?)
My general problem is: I'm sure this kind of thing (one server allocating ports and redirecting clients) is a common network programming problem, and there should be plenty of (Erlang-specific or general) resources about it, but I just don't have the terminology to google it.
According to the Erlang documentation here, if the Port argument to the gen_tcp:listen/2 function is 0, the OS will assign any available port to the socket. The port can then be retrieved using inet:port/1.
You can therefore do something like this:

    %% Listen on an OS-assigned ephemeral port (Options is your usual listen-option list)
    {ok, Listen} = gen_tcp:listen(0, Options),
    %% Note: inet:port/1 returns {ok, Port}, not the bare port number
    {ok, Port} = inet:port(Listen).
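The same trick exists in most socket APIs; for comparison, a minimal Java equivalent:

    import java.io.IOException;
    import java.net.ServerSocket;

    // Bind port 0 and ask the OS which free port it picked.
    public class EphemeralPort {
        public static void main(String[] args) throws IOException {
            ServerSocket s = new ServerSocket(0);    // OS assigns a free port
            System.out.println("listening on port " + s.getLocalPort());
        }
    }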
Just in case you didn't know: you don't have to allocate new ports for each client; it's perfectly fine to have all clients connect to the same port.
UPDATE:
If there is a reason to allocate new ports for incoming clients, it is far beyond a first "introduction to network programming" program.
Separate ports could mean you want to completely isolate the environments of different groups of clients; it's comparable to providing completely different IP addresses to connect to. If you want to write a simple ping-pong program, you don't need it, and I honestly believe you may never need such a solution in your whole life; that's how rare it is.
Regarding CPU/port overhead: allocating ports and starting a server that listens on each one is already far more overhead than accepting all clients on the same port.
You need to avoid the commonly known ports (FTP, HTTP, SMTP, etc.), but I don't think there is any master list of ports used by other software that you should avoid. I think your best bet is to come up with a range of ports you want to use, check at runtime whether anybody else answers (i.e. is already using a port) on the numbers you choose, and if not, issue the port to the client.
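A sketch of that runtime check in Java (isPortFree is a name I made up): trying to bind the port tells you whether anything is already using it.

    import java.io.IOException;
    import java.net.ServerSocket;

    // Probe a candidate port by attempting to bind it.
    public class PortProbe {
        static boolean isPortFree(int port) {
            try (ServerSocket s = new ServerSocket(port)) {
                return true;       // bind succeeded, so nothing was listening
            } catch (IOException e) {
                return false;      // already in use (or binding not permitted)
            }
        }
    }

Note the race, though: a port that is free when probed can be taken before your server binds it, so binding port 0 and keeping the resulting socket (as in the accepted answer above) is more robust.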
I am to design a server that needs to serve millions of clients that are simultaneously connected with the server via TCP.
The data traffic between the server and the clients will be sparse, so bandwidth issues can be ignored.
One important requirement is that whenever the server needs to send data to any client it should use the existing TCP connection instead of opening a new connection toward the client (because the client may be behind a firewall).
Does anybody know how to do this, and what hardware/software is needed (at the least cost)?
What operating systems are you considering for this?
If using a Windows OS and using something later than Vista then you shouldn't have a problem with many thousands of connections on a single machine. I've run tests (here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) with a low spec Windows Server 2003 machine and easily achieved more than 70,000 active TCP connections. Some of the resource limits that affect the number of connections possible have been lifted considerably on Vista (see here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) and so you could probably achieve your goal with a small cluster of machines. I don't know what you'd need in front of those to route the connections.
Windows provides a facility called I/O Completion Ports (see: http://msdn.microsoft.com/en-us/magazine/cc302334.aspx) which allow you to service many thousands of concurrent connections with very few threads (I was running tests yesterday with 5000 connections saturating a link to a server with 2 threads to process the I/O...). Thus the basic architecture is very scalable.
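The same few-threads-many-connections pattern is available portably; here is a minimal sketch using Java NIO's Selector (not IOCP itself; the port and buffer size are arbitrary):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    // One thread multiplexing many mostly-idle connections via a Selector.
    public class SelectorServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buf = ByteBuffer.allocate(4096);

            while (true) {
                selector.select();                    // block until something is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel c = server.accept();
                        c.configureBlocking(false);
                        c.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel c = (SocketChannel) key.channel();
                        buf.clear();
                        if (c.read(buf) == -1) {      // peer closed the connection
                            key.cancel();
                            c.close();
                        }
                        // else: hand buf's contents to your protocol handler
                    }
                }
            }
        }
    }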
If you want to run some tests, I have some freely available tools on my blog that let you thrash a simple echo server with many thousands of connections (1) and (2), and some free code you could use to get started (3).
The second part of your question, from your comments, is more tricky. If the client's IP address keeps changing and there's nothing between you and them that is providing NAT to give you a consistent IP address then their connections will, no doubt, be terminated and need to be re-established. If the clients detect this connection tear down when their IP address changes then they can reconnect to the server, if they can't then I would suggest that the clients need to poll the server every so often so that they can detect the connection loss and reconnect. There's nothing the server can do here as it can't predict the new IP address and it will discover that the old connection has failed when it tries to send data.
And remember, your problems are only just beginning once you get your system to scale to this level...
This problem is related to the so-called C10K problem. The C10K page lists a large number of good resources for addressing the problems you will encounter when you try to allow thousands of clients to connect to the same server.
I came across the APE Project a while back. It seems like a dream come true: they can support up to 100k concurrent clients on a single node. Spread them across 10 or 20 nodes and you can serve millions. It's perfect for RESTful applications, though you might want to look deeper if you need any shared namespace. One drawback is that it is a standalone server, supplementary to a web server. The server is of course open source, so any cost is hardware/ISP related.
You cannot use UDP. If the client sends a request and you don't reply immediately, a router is going to forget the reverse route in 30 seconds or less, so your server will never be able to reply to the client.
TCP is the only option, and it, too, will give you headaches. Most routers are going to forget the route and/or drop the connection after a few minutes, so your client/server code is going to have to send "keep alives" fairly often.
I recommend setting up a "sniffer", to see how the phone companies are staying in touch with your smartphone for their "push" technology. Copy whatever they're doing, because that stuff works!
As Greg mentioned, the problem you are describing is C10K (or rather "C1M" in your case).
I recently made a simple TCP echo server on Linux that scales very well with the number of sessions (only tested up to 200,000, though) by using the epoll queue. On BSD, you have something similar called kqueue.
You can check out the code if you want to. Hope this helps and good luck!
EDIT: As noted in the comments below, my original assertion that there is a 64K limit based on the number of ports is incorrect, however there is a 32K limit on the number of socket handles, so my suggested design is valid.
With a typical TCP/IP server design, you're limited in the number of simultaneous open connections you can have. The server has one listening port, and when a client connects to it the server makes an accept call, and that creates a new socket on a random port for the rest of the connection.
To handle more than 64K simultaneous connections I think you need to use UDP instead. You only need one port for the server to listen on, and you need to manage the connections using a 32-bit client ID in the packet data instead of having a separate port for each client. The 32-bit client ID could be the client's IP address, and the client can listen on a known UDP port for messages coming back from the server. That port would be the only one that needs to be open on the firewall.
With this approach, your only limitation is how quickly you can handle and respond to UDP messages. With millions of clients, even sparse traffic could give you large spikes, and if you don't read the packets fast enough your input queue will fill up and you'll start dropping packets. The C10K page Greg points to will give you strategies for that.
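A minimal sketch of the receive side of that scheme, assuming the first 4 bytes of every packet carry the client ID (the port is arbitrary):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.nio.ByteBuffer;

    // One UDP socket for all clients; demultiplex on a 32-bit ID in the payload.
    public class UdpHub {
        public static void main(String[] args) throws Exception {
            DatagramSocket sock = new DatagramSocket(9000);
            byte[] raw = new byte[1500];
            while (true) {
                DatagramPacket pkt = new DatagramPacket(raw, raw.length);
                sock.receive(pkt);                    // drain fast to avoid queue overflow
                if (pkt.getLength() < 4) continue;    // malformed: no room for an ID
                ByteBuffer b = ByteBuffer.wrap(pkt.getData(), 0, pkt.getLength());
                int clientId = b.getInt();            // 32-bit client ID
                // ... dispatch on clientId; reply to pkt.getSocketAddress() ...
            }
        }
    }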
I am making an EventLog which will record a transaction log for my website. The details of each log entry will include the public IP from which the transaction originated and also the local IP address (behind the public IP).
I have found ways to obtain the public IP address, but I am unable to find the local (machine) IP from which the transaction is made.
A large number of entries will be made by people using the same connection, i.e. 5 or 10 computers sharing one public IP.
I need to find the machine IP (192.168.0.1 for one system, 192.168.0.2 for the next) of the machines making the transactions, and also the computer name.
Is this possible?
To clarify, you want the private IP address of a client when the client is connecting through a router? Then no, there isn't a way to do this.
Are you doing this purely to distinguish between different users?
Can you use another method like cookies?
If your client connects from behind a NAT or firewall you cannot reliably get his address or computer name. If you need such information then your protocol should request them as part of the request and the client machine should voluntarily provide them. There is no way to validate the information provided (short of deploying a trusted cryptographic infrastructure, ie. you establish a strong trust in the client machines themselves).
Sadly, the answer is no. No modern browser will present that private address in the HTTP transaction. The client's router which performs the NAT (Network Address Translation) offers only the public client IP address when making the IP connection.
Not likely. See a short discussion in http://javascript.about.com/library/blip.htm
Well, yes, we are doing this just to differentiate between the computers, to know who is making the entries.
Since you say that tracing the IP is not very reliable, are there any other methods I can use to do the same thing?
I just need to know from which computer each entry is made.
Any suggestions would be welcome.
You are making the false assumption that there is a way to know from which computer each entry is entered. Nobody has the job of ensuring that this information exists. Often, it will not exist.
The only way to make sure each computer is uniquely identified is for you to identify it. You can do this through client certificates, for instance. In general, if you want each computer to have a unique identifier, then you need to create a unique identifier, then put it on that computer. You then need the computer to send that identifier back.
There is no other unique identifier for computers.
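As an illustration of "create an identifier and put it on the computer", here is a sketch using a long-lived cookie in a Java servlet environment; the cookie name is made up, and strictly speaking this identifies a browser profile, not the machine:

    import java.util.UUID;
    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Return the identifier we issued earlier, or mint and set a new one.
    public class MachineIdHelper {
        public static String machineId(HttpServletRequest req, HttpServletResponse resp) {
            if (req.getCookies() != null) {
                for (Cookie c : req.getCookies()) {
                    if (c.getName().equals("machine-id")) {
                        return c.getValue();          // previously issued ID
                    }
                }
            }
            String id = UUID.randomUUID().toString(); // first visit: mint a new ID
            Cookie cookie = new Cookie("machine-id", id);
            cookie.setMaxAge(10 * 365 * 24 * 3600);   // keep it for ~10 years
            resp.addCookie(cookie);
            return id;
        }
    }

Cookies can be cleared or copied, of course; client certificates are the stronger version of the same idea.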
This is probably way beyond what you're looking for but it makes for an interesting read: Remote physical device fingerprinting
This allows you to uniquely identify a remote physical device without its cooperation, across NAT or whatever else you can imagine.