comet server and network latency time - networking

For a social site we use a node.js-based comet server as the instant messenger. Everything is working great; our only problem is how to reduce the latency to Australia and New Zealand, where we see an RTT between 310 ms and 440 ms.
One idea is to have local servers, but in this case they must connect to the main server so that a user in Australia is able to communicate with one in the UK. This comet-to-comet connection will have a higher latency too, but local users can chat fast, which will mostly be the case.
Does anyone have a better idea than using local comet servers?

If your latency is due to the geographical distance, there is no way to shorten it. The only thing you can do is try to find upstream network providers who have more "straight" cables. But you can never achieve a latency shorter than the time a signal needs to cover the direct air distance between those 2 countries/servers.
If you have users in Australia communicating with each other, then yes, it will make a difference for them if they connect to a local server. But for communication between one user in the UK and one in AU, it won't matter whether you have a local server.
But anyway, for an instant messenger the latency is not so important IMHO. The recipient does not know the moment when the sender finished his message and hit the send button, so he can't measure the latency. And a human is not able to send multiple messages per second, so I think it won't be possible to see the difference between 400 ms and 10 ms latency. If it were over 1 second, it could be visible...
So to summarize, I would bother with setting up local servers only if there were enough local users communicating among themselves.
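To put a number on that lower bound, here is a minimal sketch that computes the great-circle distance between two points and the round-trip time a signal would need to cover it. The London and Sydney coordinates and the 2/3 c figure for light in optical fiber are assumptions chosen for illustration, not values from the question.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius
C_VACUUM_KM_S = 299_792.458    # speed of light in vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in fiber travels at roughly 2/3 c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km, speed_factor=FIBER_FACTOR):
    """Lower bound on round-trip time in ms for a perfectly straight path."""
    return 2 * distance_km / (C_VACUUM_KM_S * speed_factor) * 1000


# Hypothetical endpoints: London and Sydney city centres.
distance = great_circle_km(51.5074, -0.1278, -33.8688, 151.2093)
print(f"{distance:.0f} km, at least {min_rtt_ms(distance):.0f} ms RTT in fiber")
```

For these two cities the distance is roughly 17,000 km and the bound comes out on the order of 170 ms, so a measured 310 ms to 440 ms leaves some room for better routing but can never get anywhere near local-server latencies.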
(Please let me know if some of my assumptions about your setup were incorrect.)


Detecting VPNs and Proxies via latency

Consider a user who is using a service (say an app backend) and routing their connection through an intermediary proxy and/or VPN. Specifically, let’s assume the user is in Shanghai, China, the proxy is in Dallas, Texas, and the backend is on AWS. In theory, compared to a user who actually lives in Dallas (on the same network), the Shanghai user will have additional latency in sending/receiving events due to the Asia <-> USA trip.
Questions:
Are there known/published methodologies for detecting this additional latency and thereby identifying impostors from far away? The simplest I can think of is grouping by ISP and then looking for outliers in latency.
Are there additional ways to honeypot such users? I’m not a network expert, but I think various sorts of media (e.g. video streaming) get different treatment on these networks, so I’m wondering if it is possible to send additional event data to honeypot more precise latency anomalies.
Assumptions:
We can assume that we have plenty of user data, from each network provider. We also have streams of event data that includes client and server side timestamps for sending and receiving data.
I’m strictly interested in identifying users who are very far away from the IP source. I am NOT interested in methodologies that strictly try to classify an IP as a VPN (eg what Maxmind does in the above link).
Since you have stated you are not interested in the VPN or Relay identification aspect of it, but only latency detection for "far away" users, I will offer some ideas:
Run your own HTTP(S) measurement server that all clients must run an HTTP ping against. HTTP round-trip time in milliseconds will be acceptable for a broad-strokes classification of a user's "distance" from your server, assuming that the initial TCP handshake has already been completed by the client, all intermediaries, and your measurement server (a "pre-warmed" connection).
Use an IP geolocation API. These will give you the country code (almost always accurate), and approximate latitude, longitude (broadly accurate) that you may use to calculate the distance from your server. This of course assumes that the public IP of the client is visible to you, and not completely obfuscated by the intermediaries.
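The "group by ISP and look for latency outliers" idea from the question can be sketched as follows. This is only an illustration under stated assumptions: the `(user_id, isp, rtt_ms)` sample layout, the function name, and the threshold of 5 median absolute deviations are all hypothetical, and the median/MAD score is just one reasonable outlier measure, not a published methodology.

```python
from collections import defaultdict
from statistics import median


def flag_latency_outliers(samples, threshold=5.0):
    """Return the user ids whose RTT is far above the median RTT of their ISP.

    samples: iterable of (user_id, isp, rtt_ms) tuples (hypothetical layout).
    threshold: how many median absolute deviations above the ISP median
               a user must be to get flagged (assumed value, tune on real data).
    """
    by_isp = defaultdict(list)
    for user_id, isp, rtt_ms in samples:
        by_isp[isp].append((user_id, rtt_ms))

    flagged = set()
    for rows in by_isp.values():
        rtts = [rtt for _, rtt in rows]
        med = median(rtts)
        mad = median([abs(rtt - med) for rtt in rtts])
        if mad == 0:
            continue  # no spread within this ISP, so no meaningful score
        for user_id, rtt in rows:
            # Only unusually slow users are interesting, not unusually fast ones.
            if (rtt - med) / mad > threshold:
                flagged.add(user_id)
    return flagged


samples = [
    ("u1", "isp-a", 30), ("u2", "isp-a", 32), ("u3", "isp-a", 31),
    ("u4", "isp-a", 29), ("u5", "isp-a", 33), ("u6", "isp-a", 250),
]
print(flag_latency_outliers(samples))
```

In practice one RTT per user is noisy, so it would make sense to feed in a per-user minimum or low percentile over many events rather than single measurements.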

Improving EC2 ping times from home

I've been trying to run a gaming machine in EC2 following the excellent blog post by Larry Land here. The problem I have is latency from my home to my nearest AWS region. I get a ping of around 35 ms, and I'm looking to improve on that. Is there anything I can do? I'm using Steam In-Home Streaming over a Hamachi VPN, on Windows Server 2012.
My internet connection is roughly 120Mbps down and 35Mbps up, and there's nothing I can do to improve on that sadly.
In some cases the nearest region geographically isn't the one with the lowest latency. This is due to routing agreements that sometimes result in non-optimal routes.
A common example is Eastern Australia and Singapore. Routes often go to the US and/or Japan before finally going back to Singapore.
Besides this, you should not be using Wi-Fi on your local network. Depending on how noisy the environment is, it can result in dropped packets that need to be retransmitted, which increases the overall latency.
Routers can have an effect on this too, but unless yours is heavily loaded, it's probably not adding much latency.
You may want to do some research with traceroute to see how each data center performs and where the slow spots are.
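As a complement to traceroute, a rough way to compare regions is to time the TCP handshake to an endpoint in each one. The sketch below is an assumption-laden illustration: the function names are made up, and which hostnames to probe is up to you. Taking the minimum over a few samples reduces the influence of DNS lookups and momentary congestion.

```python
import socket
import time


def tcp_connect_ms(host, port=443, timeout=3.0):
    """Time one TCP connect (includes DNS resolution) in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.perf_counter() - start) * 1000


def rank_endpoints(hosts, port=443, samples=5):
    """Return [(best_ms, host), ...] sorted from fastest to slowest.

    Hosts that cannot be reached are skipped rather than reported.
    """
    results = []
    for host in hosts:
        try:
            best = min(tcp_connect_ms(host, port) for _ in range(samples))
        except OSError:
            continue
        results.append((best, host))
    return sorted(results)
```

For example, calling `rank_endpoints` with a list of regional hostnames such as `ec2.eu-west-1.amazonaws.com` and `ec2.eu-central-1.amazonaws.com` (example names, substitute the regions you are considering) shows which region is actually closest in network terms from your home connection.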

Persistent TCP connections in terms of Mobile networks?

I have a question regarding WebSocket communication over mobile connections.
I was wondering how long-lived TCP connections are kept alive in mobile networks when the user migrates between different networks. What happens to already established TCP connections when a handover (hand-off) occurs?
Do different technologies (3G, 4G, etc.) behave differently in this case?
I would appreciate it if you could also point me to some online sources or articles where I can read more on this.
Thank you in advance :)
The hand-off is always transparent to the user — all TCP and voice connections are always kept active when transitioning between the towers on a commercial mobile network like LTE, UMTS etc. You might experience some periods of time where the data stops flowing, but that's about it.
I've had several opportunities to verify this myself through an interesting experiment on T-Mobile USA's nationwide HSPA+ network. Take a 12-hour-plus drive from one major city to another, without turning your phone off. Take a look at the area where the external IPv4 address terminates (by using traceroute). You may well notice that it's still in the same area where you started your trip. Now reboot the phone, and see where the external IPv4 address is routed to now. You'll notice that it's now likely terminated in a major metro area closer to where you are. I.e., your connection within the core network of the operator follows you along not just within a given city, metro or state, but also between states and time zones.
The reason for this is that the carrier has a Core Network, and all external connections are handled by the Packet Gateway of the Core Network, which keeps track of all the connections. More on this is documented in Chapter 7 of the book called High Performance Browser Networking (HPBN.co).
This is not really an SO question but more of a Programmers question, and I don't see what you have researched yourself, but you certainly can't rely on a connection staying alive, mobile or not.
In fact, mobile operators kill long-lived connections by resetting them after a certain amount of time or data. So you should be ready to reconnect upon a socket exception anyway.
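Being "ready to reconnect" usually means retrying with a growing delay so that a flaky network is not hammered with connection attempts. A minimal sketch, assuming a plain TCP client; the function names, the doubling schedule, and the 60-second cap are illustrative choices, not something prescribed above.

```python
import socket
import time


def backoff_delays(base=1.0, cap=60.0):
    """Yield an endless sequence of delays: base, 2*base, 4*base, ... up to cap."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)


def connect_with_retry(host, port, max_attempts=5, sleep=time.sleep):
    """Try to open a TCP connection, waiting longer after each failure.

    Re-raises the last OSError once max_attempts connections have failed.
    The sleep function is injectable so the logic can be tested without waiting.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return socket.create_connection((host, port), timeout=10)
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(next(delays))
```

The same function can be called again whenever a send or receive on the established socket raises an exception. Adding some random jitter to the delays is a common refinement when many clients may be disconnected at the same moment.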

Trying to trace the time it takes for my website hosted in New Zealand to be accessed from the US

I live in New Zealand and I just started a website which I also have hosted in New Zealand as novazeal.com and novazeal.co.nz. I am hoping to target clients overseas as well as in New Zealand, though, so I am trying to decide whether to start a second website hosted in the US and point the .com domain to that website instead.
I have heard from friends in the States that a site I had hosted here in New Zealand was slow to access, so what I really need to do is measure the time it takes for a traceroute to hop through a location in the US. A normal tracert from my computer here will hop through servers in NZ only, so I can't get the measure I am looking for that way. Does anyone know of an alternative I could use, such as an application that forces hops through a distant ISP, or a proxy service that gives the time it takes to retrieve a page from a distant location?
Of course if anyone in the States is willing to run the trace for me and send me the hop time stats I would be most grateful. I could ask the friends I mentioned, but they are not particularly technical, so it would probably be a confusing thing to try to explain to them by email.
There are web-based route tracing utilities, some of which are hosted in the US, that will show you the route and latency between that service and your site (at a point in time).
However, traceroute doesn't give you a full picture of the network latency effects to which you're subject: these days routers do all sorts of sophisticated traffic shaping and traceroute probes just won't be treated the same as your HTTP traffic even if you specify that probes should use TCP and port 80.
Not to mention that network latency itself is just a tiny piece of the puzzle. ISPs perform all sorts of caching using (sometimes transparent) HTTP proxies, from which you won't benefit unless/until your site is visited by their customers.

How to retain one million simultaneous TCP connections?

I am to design a server that needs to serve millions of clients that are simultaneously connected with the server via TCP.
The data traffic between the server and the clients will be sparse, so bandwidth issues can be ignored.
One important requirement is that whenever the server needs to send data to any client it should use the existing TCP connection instead of opening a new connection toward the client (because the client may be behind a firewall).
Does anybody know how to do this, and what hardware/software is needed (at the least cost)?
What operating systems are you considering for this?
If using a Windows OS and using something later than Vista then you shouldn't have a problem with many thousands of connections on a single machine. I've run tests (here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) with a low spec Windows Server 2003 machine and easily achieved more than 70,000 active TCP connections. Some of the resource limits that affect the number of connections possible have been lifted considerably on Vista (see here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) and so you could probably achieve your goal with a small cluster of machines. I don't know what you'd need in front of those to route the connections.
Windows provides a facility called I/O Completion Ports (see: http://msdn.microsoft.com/en-us/magazine/cc302334.aspx) which allow you to service many thousands of concurrent connections with very few threads (I was running tests yesterday with 5000 connections saturating a link to a server with 2 threads to process the I/O...). Thus the basic architecture is very scalable.
If you want to run some tests then I have some freely available tools on my blog that allow you to thrash a simple echo server using many thousands of connections (1) and (2) and some free code which you could use to get you started (3)
The second part of your question, from your comments, is more tricky. If the client's IP address keeps changing and there's nothing between you and them that is providing NAT to give you a consistent IP address then their connections will, no doubt, be terminated and need to be re-established. If the clients detect this connection tear down when their IP address changes then they can reconnect to the server, if they can't then I would suggest that the clients need to poll the server every so often so that they can detect the connection loss and reconnect. There's nothing the server can do here as it can't predict the new IP address and it will discover that the old connection has failed when it tries to send data.
And remember, your problems are only just beginning once you get your system to scale to this level...
This problem is related to the so-called C10K problem. The C10K page lists a large number of good resources for addressing the problems you will encounter when you try to allow thousands of clients to connect to the same server.
I came across the APE Project a while back. It seems like a dream come true. They can support up to 100k concurrent clients on a single node. Spread them across 10 or 20 nodes, and you can serve millions. Perfect for RESTful applications. Might want to look deeper for any shared namespace. One drawback is that this is a standalone server, as in supplementary to a web server. This server is of course Open Source, so any cost is hardware/ISP related.
You cannot use UDP. If the client sends a request and you don't reply immediately, a router is going to forget the reverse route in 30 seconds or less, so your server will never be able to reply to the client.
TCP is the only option, and it, too, will give you headaches. Most routers are going to forget the route and/or drop the connection after a few minutes, so your client/server code is going to have to send "keep alives" fairly often.
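One way to get periodic traffic on an otherwise idle connection is kernel-level TCP keepalive. The sketch below shows how it can be switched on for a socket; the timing values are assumptions, and the three tuning options are only set where the platform exposes them (they exist on Linux). An application-level heartbeat message is an alternative and is what many push protocols use instead.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=15, probes=4):
    """Turn on TCP keepalive probes for an existing socket.

    idle_s:     seconds of silence before the first probe (assumed value)
    interval_s: seconds between unanswered probes (assumed value)
    probes:     unanswered probes before the connection is dropped (assumed value)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are platform-specific, so set them only if present.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

The idle time has to be shorter than the timeout of the most impatient router or NAT device on the path, which is why the default of two hours found on many systems is of little use here.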
I recommend setting up a "sniffer", to see how the phone companies are staying in touch with your smartphone for their "push" technology. Copy whatever they're doing, because that stuff works!
As Greg mentioned, the problem you are describing is C10K (or rather "C1M" in your case).
I recently made a simple TCP echo server on Linux that scales very well with the number of sessions (only tested up to 200,000, though) by using the epoll queue. On BSD, you have something similar called kqueue.
You can check out the code if you want to. Hope this helps and good luck!
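For readers without access to that code, here is an independent minimal sketch of the same idea (it is not the code referred to above). It uses Python's `selectors` module, which picks epoll on Linux and kqueue on BSD automatically; the function names and buffer size are illustrative.

```python
import selectors
import socket


def make_echo_server(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket registered with a selector."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1024)
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    return sel, listener


def serve_once(sel, listener, timeout=1.0):
    """Handle one batch of ready sockets: accept new clients, echo incoming data."""
    for key, _events in sel.select(timeout):
        if key.fileobj is listener:
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
            continue
        conn = key.fileobj
        try:
            data = conn.recv(4096)
        except BlockingIOError:
            continue  # spurious wakeup, nothing to read yet
        except ConnectionError:
            data = b""  # treat a reset like an orderly close
        if data:
            # Simplification: sendall on a non-blocking socket can raise
            # BlockingIOError if the send buffer is full; a real server
            # would queue the data and wait for EVENT_WRITE instead.
            conn.sendall(data)
        else:
            sel.unregister(conn)
            conn.close()
```

A server loop would simply call `serve_once(sel, listener)` forever. One thread handles every connection, so the per-connection cost is a file descriptor plus kernel buffers, which is what lets this design scale to very large session counts once the file descriptor limits are raised.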
EDIT: As noted in the comments below, my original assertion that there is a 64K limit based on the number of ports is incorrect, however there is a 32K limit on the number of socket handles, so my suggested design is valid.
With a typical TCP/IP server design, you're limited in the number of simultaneous open connections you can have. The server has one listening port, and when a client connects to it the server makes an accept call, and that creates a new socket on a random port for the rest of the connection.
To handle more than 64K simultaneous connections I think you need to use UDP instead. You only need one port for the server to listen on, and you need to manage the connections using a 32-bit client ID in the packet data instead of having a separate port for each client. The 32-bit client ID could be the client's IP address, and the client can listen on a known UDP port for messages coming back from the server. That port would be the only one that needs to be open on the firewall.
With this approach, your only limitation is how quickly you can handle and respond to UDP messages. With millions of clients, even sparse traffic could give you large spikes, and if you don't read the packets fast enough your input queue will fill up and you'll start dropping packets. The C10K page Greg points to will give you strategies for that.
