It appears that cheap consumer routers are fairly easy to crash: hanging around in various backup/sync software forums, I see this mentioned from time to time. Developers seem to be putting a fair amount of effort into making sure they don't crash the routers.
What are the "do"s and "don't"s for my network-heavy application to ensure that it doesn't cause issues with badly designed routers? Especially one that intends to connect to a number of peers?
IMO, trying to work around bad hardware is a road to nowhere, because every router fails in its own remarkable way :).
What you can do in a network-heavy application is assume that the network is not a stable medium (routers can crash, etc.) and design the application's network operations accordingly.
For instance, provide reconnect logic and connection timeouts, and add some sort of state caching so users can keep working with the app even when network connectivity is gone.
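To make that concrete, here is a minimal reconnect loop with exponential backoff; the host, port, timeout and delay values are all made-up placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReconnectingClient {

    // Hypothetical endpoint; substitute your own server.
    private static final String HOST = "sync.example.com";
    private static final int PORT = 9000;

    /** Keeps retrying until a connection succeeds, doubling the delay each time. */
    public static Socket connectWithBackoff() throws InterruptedException {
        long delayMs = 500;              // initial retry delay
        final long maxDelayMs = 30_000;  // never wait longer than 30 s
        while (true) {
            Socket socket = new Socket();
            try {
                // A connect timeout keeps the app responsive when the
                // router (or anything else on the path) has silently died.
                socket.connect(new InetSocketAddress(HOST, PORT), 5_000);
                return socket;
            } catch (IOException e) {
                try { socket.close(); } catch (IOException ignored) { }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }
}
```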
As for faulty routers: they usually crash under a large number of simultaneous connections (e.g. downloading via BitTorrent or another p2p protocol), so keeping the number of open connections to a minimum can help.
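One way to enforce such a cap is a counting semaphore around socket creation. A sketch follows; the limit of 8 is an arbitrary assumption, tune it to your protocol:

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.Semaphore;

public class ConnectionLimiter {
    // Arbitrary cap; cheap routers often struggle with large NAT tables,
    // so keep this well below the hundreds a p2p swarm might open.
    private final Semaphore permits = new Semaphore(8);

    public Socket open(String host, int port) throws IOException, InterruptedException {
        permits.acquire(); // block until a connection slot is free
        try {
            return new Socket(host, port);
        } catch (IOException e) {
            permits.release(); // give the slot back if the connect fails
            throw e;
        }
    }

    public void close(Socket socket) throws IOException {
        try {
            socket.close();
        } finally {
            permits.release();
        }
    }
}
```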
I have a question regarding WebSocket communications over mobile connections.
I was wondering how long-lived TCP connections are handled in mobile networks when the user migrates between different networks. What happens to already established TCP connections when a handover (hand-off) occurs?
Do different technologies (3G, 4G, etc.) behave differently in this case?
I would appreciate it if you could point me to some online sources or articles where I can read more about this.
Thank you in advance :)
The hand-off is always transparent to the user: all TCP and voice connections are kept active when transitioning between towers on a commercial mobile network such as LTE or UMTS. You might experience some periods where data stops flowing, but that's about it.
I've had several opportunities to verify this myself through an interesting experiment on T-Mobile USA's nationwide HSPA+ network. Take a 12-hour-plus drive from one major city to another without turning your phone off, then look at the area where your external IPv4 address terminates (using traceroute). You may well notice that it's still in the area where you started your trip. Now reboot the phone and check where the external IPv4 address is routed: it will now likely terminate in a major metro area closer to where you are. In other words, your connection within the operator's core network follows you along, not just within a given city, metro or state, but also between states and time zones.
The reason for this is that the carrier has a Core Network, and all external connections are handled by the Core Network's Packet Gateway, which keeps track of all the connections. More on this is documented in Chapter 7 of the book High Performance Browser Networking (HPBN.co).
This is not really an SO question but more of a Programmers one, and I don't see what you have researched for yourself, but you certainly can't rely on a connection staying alive, mobile or not.
In fact, mobile operators kill long-lived connections by resetting them after a certain amount of time or data. So you should be ready to reconnect upon a socket exception anyway.
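To notice those silent resets quickly, a common trick is an application-level heartbeat: ping periodically, and treat a missing pong as a dead connection. A rough sketch, where the intervals and the sendPing/reconnect hooks are placeholders for your transport layer:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Heartbeat {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile long lastPongMs = System.currentTimeMillis();

    // sendPing and reconnect stand in for your real transport calls.
    public void start(Runnable sendPing, Runnable reconnect) {
        scheduler.scheduleAtFixedRate(() -> {
            if (System.currentTimeMillis() - lastPongMs > 30_000) {
                reconnect.run();             // no pong for 30 s: assume the connection is dead
                lastPongMs = System.currentTimeMillis();
            } else {
                sendPing.run();              // otherwise ping every 10 s
            }
        }, 10, 10, TimeUnit.SECONDS);
    }

    /** Call this whenever the server's pong arrives. */
    public void onPongReceived() {
        lastPongMs = System.currentTimeMillis();
    }
}
```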
I am looking for networking designs and tricks specific to games. I know about a few problems and I have some partial solutions to some of them but there can be problems I can't see yet. I think there is no definite answer to this but I will accept an answer I really like. I can think of 4 categories of problems.
Bad network
The messages sent by the clients take some time to reach the server. The server can't just process them FCFS, because that is unfair to players with higher latency. A partial solution would be timestamps on the messages, but you need two things for that:
The ability to trust the client's clock. (I think this is impossible.)
Constant latencies you can measure. What can you do about variable latency? (See the sketch below for one way to measure offset and latency together.)
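On the latency point: the usual compromise is an NTP-style exchange, where the client timestamps a ping, the server replies with its own clock, and the client assumes the path is roughly symmetric. A minimal sketch; the symmetry assumption is exactly what variable latency breaks, so in practice you take many samples and keep the lowest-RTT one (or average):

```java
public final class ClockSync {
    /**
     * NTP-style offset estimate from a single ping/pong exchange.
     * All times are in milliseconds.
     *
     * @param clientSend client clock when the ping was sent
     * @param serverTime server clock when it answered
     * @param clientRecv client clock when the pong arrived
     * @return estimated (server - client) clock offset
     */
    public static long estimateOffset(long clientSend, long serverTime, long clientRecv) {
        long rtt = clientRecv - clientSend;
        // Assume a symmetric path: one-way delay ~ rtt / 2.
        return serverTime - (clientSend + rtt / 2);
    }
}
```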
A lot of games use UDP, which means messages can be lost. In that case they try to estimate the game state based on the information they already have. How do you know if the estimated state is correct or not after the connection is working again?
In MMO games the server handles a large number of clients. What is the best way to distribute the load? Based on location in game? Binding groups of clients to particular servers? Can you avoid sending everything through the server?
Players leaving
I have seen two different behaviours when this happens. In most FPS games, if the player who hosted the game (I guess he is the server) leaves, the others can't play. In most RTS games, if any player leaves, the others can continue playing without him. How is that possible without a dedicated server? Does everyone know the full state? Are they transferring the role of the server somehow?
Access to information
The next problem can be solved by a dedicated server, but I am curious whether it can be done without one. In a lot of games the players should not know the full state of the game; fog-of-war in RTS and walls in FPS are good examples. However, they need to know if an action is valid or not. (E.g., can you shoot me from there, or are you on the other side of the map?) In this case clients need to validate changes against an unknown state. This sounds like something that could be solved with clever use of cryptographic primitives. Any ideas?
Cheating
Some of the above problems are easy in a trusted-client environment, but that cannot be assumed. Are there solutions which work in, say, an 80% normal user / 20% cheater environment? Can you really make anti-cheat software that works (and does not require ridiculous things like kernel modules)?
I did read this question and some of the answers: https://stackoverflow.com/questions/901592/best-game-network-programming-articles-and-books, but other answers link to unavailable/restricted content. This is a platform/OS-independent question, but solutions for specific platforms/OSs are welcome as well.
Thinking cryptography will solve this kind of problem is a very common and very bad mistake: the client itself, of course, has to be able to decrypt it, so it is completely pointless. You are not adding security, you're just adding obscurity (and that will be cracked).
Cheating is too game-specific. There are some kinds of games where it can't be totally eliminated (aimbots in FPS), and some where, if you didn't screw up, it will not be possible at all (server-based turn games).
In general, network problems like those are deeply related to prediction, which is a very complicated subject at best and is very well explained in the famous Valve article about it.
The server can't just process them FCFS, because that is unfair to players with higher latency.
Yes it can. Trying to guess exactly how much latency someone has is no fairer, since latency varies.
In that case they try to estimate the game state based on the information they already have. How do you know if the estimated state is correct or not after the connection is working again?
The server doesn't have to guess at all - it knows the state. The client only has to guess while the connection is down - when it's back up, it will be sent the new state.
In MMO games the server handles a large number of clients. What is the best way to distribute the load? Based on location in game?
There's no "best way". Geographical partitioning works fairly well, however.
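As an illustration of how crude that partitioning can be and still work, here is a sketch that maps world positions onto a fixed shard list, one shard per contiguous quadrant (the world size and hostnames are invented):

```java
public class ZoneRouter {
    // Hypothetical shard endpoints, one per contiguous map region.
    private final String[][] shards = {
        { "nw.example.com", "ne.example.com" },
        { "sw.example.com", "se.example.com" }
    };
    private static final double WORLD_SIZE = 4000.0; // world units per axis

    /** Map a world position to the shard owning that contiguous region. */
    public String shardFor(double x, double y) {
        int col = (x < WORLD_SIZE / 2) ? 0 : 1;
        int row = (y < WORLD_SIZE / 2) ? 0 : 1;
        return shards[row][col];
    }
}
```

The painful part in practice is the hand-off when a player crosses a region boundary, which is why real MMOs add overlap zones or migrate entities between shards rather than using a hard cut like this.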
Can you avoid sending everything through the server?
Only for untrusted communications, which are generally so low-bandwidth that there's no point.
In most RTS games, if any player leaves, the others can continue playing without him. How is that possible without a dedicated server? Does everyone know the full state?
Many RTS games maintain the full state simultaneously across all machines.
Some of the above problems are easy in a trusted-client environment, but that cannot be assumed.
Most games open to the public need to assume a 100% cheater environment.
Bad network
Players with high latency should buy a new modem. I don't think it's a good idea to add even more latency just because one person in the game has a bad connection. Or if you mean minor latency differences, who cares? You will only make things slower and more complicated if you refuse to process FCFS.
Cheating: aimbots and similar
Can you really make anti-cheat software that works? No, you cannot. You can't know whether they are running your program or another program that acts like yours.
Cheating: access to information
If you have a secure connection to a dedicated server you can trust, then cheating like seeing more state than allowed should be impossible.
There are a few games where cryptography can prevent cheating: card games like poker, where every player gets a chance to 'shuffle the deck'. Details on Wikipedia: Mental Poker.
With an RTS or FPS you could, in theory, encrypt your part of the game state, then send it to everyone and only hand out decryption keys for the parts they are allowed to see, or when they are allowed to see them. However, I doubt that in 2010 we can do this in real time.
For example, suppose I want to verify that you could indeed be at location B. Then I need to know where you came from and when you were there. But if you've told me that before, I knew something I was not allowed to know. If you tell me afterwards, you can tell me anything you want me to believe. You could have told me beforehand, encrypted, and given me the decryption key when I need to verify it. That would mean you'd have to encrypt every move you make with a different encryption key. Ouch.
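The lighter-weight primitive for this idea is a commitment scheme rather than per-move encryption: hash the move together with a fresh random nonce, publish the hash up front, and reveal the move and nonce only when someone needs to verify it. A minimal sketch using SHA-256:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;

public class MoveCommitment {
    private static final SecureRandom RNG = new SecureRandom();

    /** Publish this hash before the move takes effect. */
    public static byte[] commit(String move, byte[] nonce) throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        sha.update(nonce); // the nonce stops brute-forcing small move spaces
        sha.update(move.getBytes(StandardCharsets.UTF_8));
        return sha.digest();
    }

    /** Later, reveal (move, nonce) and let any peer re-check the commitment. */
    public static boolean verify(String move, byte[] nonce, byte[] commitment)
            throws NoSuchAlgorithmException {
        return Arrays.equals(commit(move, nonce), commitment);
    }

    public static byte[] newNonce() {
        byte[] nonce = new byte[16];
        RNG.nextBytes(nonce);
        return nonce;
    }
}
```

The nonce matters because a move space as small as a game board could otherwise be brute-forced from the bare hash; the scheme still leaves the revelation-timing problem described above unsolved.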
If you're not implementing a poker site, cheating won't be your biggest problem anyway.
With a lot of people accessing games on mobile devices, a "bad network" can occur when a player is in an area of poor reception or connected to a slow Wi-Fi network. So it's not just a problem for people connecting from sparsely populated areas. With mobile clients, "bad networks" occur very often, and they are usually EXTREMELY hard to diagnose.
UDP risks packet loss, but even games that use TCP- or HTTP-based transports can see client/server communication slow to a crawl while packets are retransmitted and acknowledged. With UDP, compensation for packet loss usually depends on what the packets contain. For motion data, if packets aren't received, the server typically extrapolates the previous trajectory and then corrects the position when fresh data arrives; exactly how this is handled is usually custom to the game, which is why people often avoid UDP unless their game type requires it. Often, to handle high-latency conditions, games will automatically degrade the set of features available so users can still interact without getting kicked or hitting too many broken features.
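That extrapolate-then-correct technique is usually called dead reckoning. A bare-bones sketch of the idea, with illustrative field names and units:

```java
public class DeadReckoning {
    private double x, y;       // last confirmed position
    private double vx, vy;     // last confirmed velocity (units per second)
    private long lastUpdateMs; // when that confirmed state arrived

    /** Called whenever an authoritative position packet makes it through. */
    public void onServerUpdate(double x, double y, double vx, double vy, long nowMs) {
        this.x = x; this.y = y;
        this.vx = vx; this.vy = vy;
        this.lastUpdateMs = nowMs;
    }

    /** Best guess at the current position while packets are missing. */
    public double[] estimate(long nowMs) {
        double dt = (nowMs - lastUpdateMs) / 1000.0;
        return new double[] { x + vx * dt, y + vy * dt };
    }
}
```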
Optimally, you want a logging tool like Loggly that can help you find errors related to bad connections and latency, and show you the conditions on the client and server at the time they happened. That visibility lets you diagnose the problems users commonly hit and develop strategies to address them.
Players leaving
Most games these days have dedicated servers, so this issue is mostly moot. However, yes, the server role can sometimes be handed to another client.
Cheating
It's extremely hard to anticipate how players will cheat and to create a cheat-proof system no one can hack. These days, a lot of cheat-detection strategies are based on heuristic analysis of logging and behavioral-analytics data to spot abnormalities as they happen and flag them for review. You definitely should cheat-proof as much as is reasonable, but you also really need an early-detection system that can spot new flaws people are exploiting.
I have an old-school FoxPro web app that I am trying to help limp along while I rewrite the system. Every day, multiple times, I get the following error message: The specified network name is no longer available.
Does anyone have any suggestions for how to troubleshoot this? Perhaps a way to prove to my IT guys that there really is a network issue? I have theories, but I have no idea how to prove anything; it always comes back to "FoxPro sucks, rewrite it now".
I'll take any help or tools, and I will answer any questions that may clarify this for you.
thanks
We have a very large multi-user VFP application on hundreds of sites. Occasionally you get this sort of problem. It is almost always down to environmental issues.
Had one just recently where a client had two machines continually crashing out of the VFP application. The network IT guys were swearing up and down that it wasn't their problem. But what's this in the System Log of both machines? Why, it's the Broadcom NIC reporting a network link loss detected at the same times the application crashed.
Check if the client and server NICs in your situation can report this.
You could consider writing a small program that pings the network resource periodically. It might just look for a file; if the network is failing and the program cannot find the file, it emails the folks in charge of the network, and yourself. This would be an independent app, best not written in FoxPro, so you can independently prove that the problem is not the application or the language/tool it was written in.
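A sketch of such a watchdog; the UNC path and check interval are placeholders, and in a real deployment the println would be replaced by the email/alerting call:

```java
import java.io.File;
import java.time.LocalDateTime;

public class ShareWatchdog {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder path: point this at a sentinel file on the same
        // share the FoxPro app uses.
        File sentinel = new File("\\\\server\\share\\sentinel.txt");
        while (true) {
            if (!sentinel.exists()) {
                // exists() returns false when the share is unreachable,
                // which is exactly the event we want a timestamped record of.
                System.out.println(LocalDateTime.now()
                        + "  share unreachable: " + sentinel.getPath());
            }
            Thread.sleep(30_000); // check every 30 seconds
        }
    }
}
```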
I have seen this when networks have bad wiring, a bad port on the switch/hub, a failing NIC in the mix, and sometimes when the network is just flooded with requests from workstations.
You also did not mention whether this is a wireless connection. I am hoping not, but I have seen wireless access points (especially slower ones) fail under network overload, with slow and unreliable performance, especially compared to a wired network.
Rick Schummer
In addition to the comments about the IP address: is the network controller set to be energy efficient, and thus to turn itself off when not actively in use?
This paper (When the CRC and TCP checksum disagree) suggests that, since the TCP checksum algorithm is rather weak, an undetected error would occur once in every 16 million to 10 billion packets sent over TCP.
Are there any application developers out there who protect the data against this kind of error by adding checksums at the application level?
Are there any patterns available to protect against such errors when doing EJB remote method invocation (Java EE 5)? Or does Java already checksum serialized objects automatically (in addition to the underlying network protocol's checksums)?
Enterprise software has long been running on computers that do not only memory ECC but also error checking within the CPU, at the registers and elsewhere (SPARC and others). Bit errors in storage systems (hard drives, cables, ...) can be caught by using Solaris ZFS.
I was never afraid of network bit errors because of TCP - until I saw that article.
It might not be that much work to implement application-level checksumming for a very few client/server remote interfaces. But what about distributed enterprise software that runs on many machines in a single datacenter? There can be a really huge number of remote interfaces.
Are enterprise software vendors like SAP, Oracle and the others just ignoring this kind of problem? What about banks? What about stock exchange software?
Follow-up: Thank you very much for all your answers! So it seems that it is pretty uncommon to check for undetected network data corruption, even though such errors do seem to exist.
Couldn't I solve this problem simply by configuring the Java EE application servers (or EJB deployment descriptors) to use RMI over TLS, with TLS configured to use MD5 or SHA-1, and by configuring the Java SE clients to do the same? Would this be a way to get reliable, transparent checksumming (albeit overkill), so that I would not have to implement it at the application level? Or am I completely confused, network-stack-wise?
I am convinced that every application that cares about data integrity should use a secure hash. Most, however, do not. People simply ignore the problem.
Although I have frequently seen data corruption over the years, even corruption that gets past checksums, the most memorable case in fact involved a stock trading system. A bad router was corrupting data in a way that usually got past the TCP checksum: it was flipping the same bit off and on. And of course, no one is alerted for the packets that did fail the TCP checksum. The application had no additional checks for data integrity.
The messages were things like stock orders and trades. The consequences of corrupting the data are as serious as it sounds.
Luckily, the corruption caused the messages to be invalid enough to result in the trading system completely crashing. The consequences of some lost business were nowhere near as severe as the potential consequences of executing bogus transactions.
We identified the problem with luck - someone's SSH session between two of the servers involved failed with a strange error message. Obviously SSH must ensure data integrity.
After this incident, the company did nothing to mitigate the risk of data corruption while in flight or in storage. The same code remains in production, and in fact additional code has gone into production that assumes the environment around it will never corrupt data.
This actually is the correct decision for all the individuals involved. A developer who prevents a problem that was caused by some other part of the system (e.g. bad memory, bad hard drive controller, bad router) is not likely to gain anything. The extra code creates the risk of adding a bug, or being blamed for a bug that isn't actually related. If a problem does occur later, it will be someone else's fault.
For management, it's like spending time on security. The odds of an incident are low, but the "wasted" effort is visible. For example, notice how end-to-end data integrity checking has been compared to premature optimization already here.
As for what has changed since that paper was written: we have greater data rates, more complexity in systems, and faster CPUs that make a cryptographic hash less costly. More chances for corruption, and a lower cost to preventing it.
The real issue is whether it is better in your environment to detect/prevent problems or to ignore them. Remember that by detecting a problem, it may become your responsibility. And if you spend time preventing problems that management does not recognize is a problem, it can make you look like you are wasting time.
I've worked on trading systems for IBs, and I can assure you there is no extra checksumming going on; most apps use naked sockets. Given the current problems in the financial sector, I think bad TCP/IP checksums should be the least of your worries.
Well, that paper is from 2000, so it's from a LONG time ago (man, am I old), and on a pretty limited set of traces. So take their figures with a huge grain of salt. That said, it would be interesting to see if this is still the case. However, I suspect things have changed, though some classes of errors may still well exist, such as hardware faults.
If you really need the extra application-level assurance, a SHA-N hash of the data (or MD5, etc.) would be more useful than a plain checksum.
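As a sketch of what that can look like at the application level, here is a sender/receiver pair that frames a payload with its SHA-256 digest (the framing convention is invented; how the bytes travel is up to you):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class PayloadIntegrity {
    /** Sender side: the payload followed by its 32-byte SHA-256 digest. */
    public static byte[] seal(byte[] payload) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(payload);
        byte[] framed = Arrays.copyOf(payload, payload.length + digest.length);
        System.arraycopy(digest, 0, framed, payload.length, digest.length);
        return framed;
    }

    /** Receiver side: returns the payload, or throws if a bit flipped in flight. */
    public static byte[] open(byte[] framed) throws NoSuchAlgorithmException {
        int cut = framed.length - 32;
        byte[] payload = Arrays.copyOfRange(framed, 0, cut);
        byte[] received = Arrays.copyOfRange(framed, cut, framed.length);
        byte[] expected = MessageDigest.getInstance("SHA-256").digest(payload);
        if (!MessageDigest.isEqual(expected, received)) {
            throw new IllegalStateException("payload corrupted in transit");
        }
        return payload;
    }
}
```

Note that a bare hash only detects accidental corruption; against a deliberate attacker you would want an HMAC or TLS, since anyone who can modify the payload can recompute the hash.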
I'm developing a multi-player game and I know nothing about how to connect one client to another via a server. Where do I start? Are there any whizzy open-source projects which provide the communication framework into which I can drop my message data, or do I have to write a load of complicated multi-threaded sockety code? Does the picture change at all if the clients are running on phones?
I am language-agnostic, although ideally I would have a Flash or Qt front end and a Java server, but that may be a bit greedy.
I have spent a few hours googling, but the whole topic is new to me and I'm a bit lost. I'd appreciate help of any kind - including how to tag this question.
If latency isn't a huge issue, you could just implement a few web services to do message passing. This would not be as slow as you might think, and it is easy to implement across languages. The downside is that the client has to poll the server to get updates, so you could be looking at a few hundred ms to get from one client to another.
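A rough sketch of such a polling client (the endpoint URL, query parameter and poll interval are invented):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MessagePoller {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint that returns any messages queued for this player.
        URL url = new URL("http://game.example.com/messages?player=42");
        while (true) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(5_000);
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("got: " + line); // hand off to your game logic
                }
            }
            conn.disconnect();
            Thread.sleep(250); // poll a few times a second
        }
    }
}
```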
You can also use the built-in Flex messaging interface. There are provisions there to allow client-to-client interactions.
Typically game engines send UDP packets because of latency: TCP is just not fast enough, and reliability is less of a concern than speed.
Web services would compound the latency issues inherent in TCP due to additional overhead. Further, they would eat up memory, depending on the number of expected players. Finally, they carry a large amount of payload overhead that you just don't need (XML, anyone?).
There are several ways to go about this. One way is centralized messaging (client/server). This means you would have a Java server listening for UDP packets from the clients and rebroadcasting them to the relevant users (a bare-bones sketch of this appears below).
A second way is decentralized (peer-to-peer). A client registers with the server to state what game/world it's in, and gets back a list of the other clients in that world. The server maintains that list and notifies the other clients when people join or drop out.
From that point forward, clients send UDP packets directly to the other users.
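Here is the promised sketch of the centralized variant: a server that receives UDP datagrams and rebroadcasts each one to every other sender it has seen. Game/world filtering and client expiry are deliberately left out, and the port is arbitrary:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketAddress;
import java.util.LinkedHashSet;
import java.util.Set;

public class UdpRelayServer {
    public static void main(String[] args) throws Exception {
        DatagramSocket socket = new DatagramSocket(7777); // arbitrary port
        Set<SocketAddress> clients = new LinkedHashSet<>();
        byte[] buffer = new byte[1500]; // roughly one MTU per datagram

        while (true) {
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);
            clients.add(packet.getSocketAddress()); // remember every sender

            // Rebroadcast to everyone except the original sender.
            for (SocketAddress client : clients) {
                if (!client.equals(packet.getSocketAddress())) {
                    socket.send(new DatagramPacket(
                            packet.getData(), packet.getLength(), client));
                }
            }
        }
    }
}
```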
If you're looking for a high-performance communication framework, take a look at the ACE C++ framework (it has Java bindings).
The official web site is: http://www.cs.wustl.edu/~schmidt/ACE-overview.html
You could also look into Flash Media Interactive Server, or if you want a Java implementation, Wowza or Red5. Those use AMF and provide native functionality for SharedObjects, including syncing of the SharedObjects among connected clients.
Those aren't peer-to-peer though (yet; it's coming soon, I hear). They use centralized messaging managed by the server.
Good luck