So the two general problem states that there is no deterministic way of knowing if the other party - to whom we communicate via a unreliable channel - has received our messages. This is quite analogous to the TCP handshake where we send a syn syn ack ack and establish a connection. Isn't this opposing the two general problem claim?
The two general problem is indeed the asynchronous model for TCP, which is why (as the theoretical result shows) the two endpoints cannot simultaneously have common knowledge about the state of the connection.
The way every distributed agreement protocol deals with this issue is to always promise safety (nothing bad will happen), but cannot guarantee liveness (that progress will eventually be made). Liveness is not in your hands. In good times, one can try to do ones best and hope to make progress.
In TCP it means that an endpoint can make an assumption (such as "connection established") without definitely knowing the other's state. However, it is not an unsafe assumption to make; at worst, it is a benign misunderstanding. After a timeout, it will change its opinion. It is no different from being on one end of a long-distance telephone and continuing to talk thinking the connection is still on; after a while, you may have to ask "hello, you still there?", and time out. Real world protocols must always have timeouts (unlike asynchronous formal models) because somewhere up the stack they serve some human function, and human patience is limited. In practice, there are sufficiently good stretches of time that progress can be made, so we just have to pick appropriate timeouts that don't time out too early either.
That said, even benign misunderstandings can have undesirable consequences. For example, after a server responds to the syn, it allocates resources for the connection in the hope that the client will finish the protocol. This is a classic denial of service attack because a rogue client can simply start off the handshake sequence but never finish it, leaving an unprepared server with millions of state machines allocated. Care is required.
Related
I have an account with a trading exchange and they have a websockets API which supports ws://... and wss://...
For the non-authenticated channels such as the current state of the orderbook, is it an easy decision to just use ws, mostly for the (however minimal) time savings? Obviously I would like to have my data as recent as possible.
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
I see absolutely no reason not to use wss for all your connections, particular with a webSocket. The normal webSocket use is to make a connection, then keep that connection for a long time and use it. While there is a bit of overhead on every transmission because of the encryption, the main wss overhead is when you first make the connection and that only happens once per connection.
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
No, there is not some other factor. In fact, the opposite. There are more and more and more reasons these days to use TLS whenever possible to protect your privacy.
For the non-authenticated channels such as the current state of the orderbook, is it an easy decision to just use ws, mostly for the (however minimal) time savings?
Why? If wss is available, I'd be using it for everything. If you actually run into a CPU problem down the road, you could revisit whether using wss has anything to do with it, but that is unlikely to happen and, in my opinion, you have nothing to lose by starting with wss. While one wants to design code intelligently, you don't want to try to micro-optimize performance-related things before you even have a documented, measured performance issue to worry about.
General reasons to use TLS:
Privacy (nobody in the middle can snoop on what you're doing)
Security of data (nobody can read your data, not even proxies)
Security of endpoint (endpoint you're connecting to can't be hijacked without you knowing about it)
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
Actually, there is.
This is also stated in the RFC:
At the time of writing of this specification, it should be noted that connections on ports 80 and 443 have significantly different success rates, with connections on port 443 being significantly more likely to succeed, though this may change with time.
Some network intermediaries (especially with some mobile providers) will fail on ws connections but work properly when using wss.
The reason seems to be that these intermediaries (proxies / routers) will attempt to read the WebSocket message as if it were HTTP and "fix" HTTP errors or resolve caching (which actually corrupts WebSocket data).
The encrypted wss protocol will trigger a pass-through mode, since these intermediaries won't be able to read the data or "fix" any HTTP errors.
The Websocket protocol uses client-side frame masking for the same purpose, but sometimes with limited results. Using wss increases connectivity on some networks.
There's a product (that will stay unnamed) that suffers from a peculiar feature/issue.
It will drop new SYN packets if it is overloaded. This may not seem unreasonable to some. It may seem unthinkable to others. Be that as it may.
How many times will an upstream TCP client retry sending the SYN before giving up? Is this number based on a RFC or standard or is it just industry norm?
How about SCTP and INIT?
The default TCP connect timeout, which is the underlying topic of your question, is platform-dependent at the client, somewhere around a minute. This is coded into the connect() system call as 3 retries using timeouts of say 8,16,32 seconds, depending on the implementation. The TCP stack may do its own retries as well before failing back to the connect() function, so there could be quite a few tries before the eventual ECONNTIMEOUT.
There is nothing wrong with a platform dropping SYN packets under overload. Unix has been doing it for thirty years, so it can't suddenly have become a problem now. It is a platform action, not a product action, so the secret unnamed product you mention is not the culprit, unless it is an operating system. I don't see why it can't be named here.
When I'm learning about various technologies, I often try to think of how applications I use regularly implement such things. I've played a few MMOs along with some FPSs. I did some looking around and happened upon this thread:
http://www.gamedev.net/topic/319003-mmorpg-and-the-ol-udp-vs-tcp
I have been seeing that UDP shines when some loss of packets is permissible. There's less overhead involved and updates are done more quickly. After looking around for a bit and reading various articles and threads, I've come to see that character positioning will often be done with UDP. Games like FPSs will often be done with UDP because of all the rapid changes that are occuring.
I've seen multiple times now where someone pointed out issues that can occur when using UDP and TCP simultaneously. What might some of these problems be? Are these issues that would mostly be encountered by novice programmers? It seems to me that it would be ideal to use a combination of UDP and TCP, gaining the advantages of each. However, if using the two together adds a significant amount of complexity to the code to deal with problems caused, it may not be worth it in certain situations.
Many games use UDP and TCP together. Since it is mandatory for every game to deliver the actions of a player to everyone, it has to be done in one way or the other. It now depends on what kind of game you want to make. In a RTS, surely TCP would be much wiser, since you cannot lose information about your opponents movement. In an RPG, it is not that incredibly important to keep track of everything every second of the game.
Bottom line is, if data has to arrive at the client, in any case, (position updates, upgrades aso.), you have to send it via TCP, or, you implement your own reliable protocol based on UDP. I have constructed quite a few network stacks for games, and I have to say, what you use depends on the usecase and what you are trying to accomplish. I mostly did a heartbeat over UDP, so the remote server/client knows that I am still there. The problem with UDP is, that packets get lost and not resent. If a packet drops, it is lost for ever. You have to take that into account. If you send some information over UDP, it has to be information that is allowed to be lost. Everything else goes via TCP.
And so the madness began. If you want to get most out of both, you will have to adapt them. TCP can be slow sometimes, and you have to wait, if packets get fragmented or do not arrive in order, until the OS has reassembled them. In some cases it could be advisable to build your own reliable protocol on top of UDP. That would allow you complete control over your traffic. Most firewalls do not drop UDP or anything else, but as with TCP, any traffic that is not declared to be safe (Opening Ports, packet redirects, aso.), gets dropped. I would suggest you read up the TCP and UDP and UDP-Lite article up at wikipedia and then decide which ones you want to use for what. AFAIK Battle.net uses a combination of the two.
Many services use udp and tcp together, but doing so without adding congestion control to your udp implementation can cause major problems for tcp. Long story short, udp can and often will clog the routers at each endpoint making tcp's congestion control to go haywire and significantly limit the throughout of the tcp connection. The udp based congestion can also cause significant increase in packet loss for tcp, limiting tcp's throughput even more as it will need to have these packets retransmitted. Using them together isn't a bad idea and is even becoming somewhat common, but you'll want to keep this in mind.
The first possible issue I can think of is that, because UDP doesn't have the overhead inherent in the "transmission control" that TCP does, UDP has higher data bandwidth and lower latency. So, it is possible for a UDP datagram that was sent after a TCP message to be available on the remote computer's input buffers before the TCP message is received in full.
This may cause problems if you use TCP to control or monitor UDP transmission; for instance, you might tell the server via TCP that you'll be sending some datagrams via UDP on port X. If the remote computer isn't already listening on port X, it may not receive some of these datagrams, because they arrived before it was told to listen; if it is listening, but not expecting traffic from you, it may discard them because they showed up before it was told to expect them. This may have an adverse effect on your program's flow or your user's experience.
I think that if you like to transfer game data TCP will be the only solution. Imagine that you send a command (e.g gained x item :P) at the server and this packet never reach its destination. (udp has no guaranties).
Also imagine the scenario that two or more UDP packets reach their destination in wrong order.
But if with your game you integrate any VoIP or Video Call capabilities you can use at these UDP.
Some games today use a network system that transmits messages over UDP, and ensures that the messages are reliable and ordered.
For example, RakNet is a popular game network engine. It uses only UDP for its connections, and has a whole system to ensure that packets can be reliable and ordered if you so choose.
My basic question is, what's up with that? Isn't TCP the same thing as ordered, reliable UDP? What makes it so much slower that people have to basically reinvent the wheel?
General/Specialisation
TCP is a general purpose reliable system
UDP +whatever is a special purpose reliable system.
Specialized things are usually better than general purpose things for the thing they are specialized.
Stream / Message
TCP is stream-based
UDP is message-based
Sending discrete gaming information maps usually better to a message-based paradigm. Sending it through a stream is possible but horribly ineffective. If you want to reliably send a huge amount of data (File transfer), TCP is quite effective. That's why Bit-torrent use UDP for control messages and TCP for data sending.
We switched from reliable to unreliable in "league of legends" about a year ago because of several advantages which have since proven to be true:
1) Old information becomes irrelevant. If I send a health packet and it doesn't arrive... I don't want to have to wait for that same health packet to resend when I know its changed.
2) Order is sometimes not necessary. If I'm sending different messages to different systems it may not be necessary to get those messages in order. I don't force the client to wait for in-order messages.
3) Unreliable doesn't get backed up with messages... ie waiting for acknowledgements which means you can resolve loss spikes much more quickly.
4) You can control resends when necessarily more efficiently. Such as repacking something that didn't send into another packet. (TCP does repack but you can do it more efficiently with knowledge about how your program works.)
5) Flow control of message such as throwing away messages that are less relevant when the network suddenly spikes. The network system can choose not to resend less relevant messages when you have a loss spike. With TCP you'd still have a queue of messages that are trying to resend which may be lower priority.
6) Smaller header packet... don't really need to say much about that.
There's much more of a difference between UDP and TCP than just reliability and sequencing:
At the heart of the matter is the fact that UDP is connectionless while TCP is connected. This simple difference leads to a host of other differences that I'm not going to be able to reasonbly summarize here. You can read the analysis below for much more detail.
TCP - UDP Comparative Analysis
The answer in in my opinion the two words: "Congestion control".
TCP goes to great lengths to manage the bandwidth of the path - to use the most of it, but to ensure that there is space for other applications. This is a very hard task, and inherently it is not possible to use 100% of the bandwidth 100% of the time.
With UDP, on the other hand, one can make their own protocol to send the packets onto the wire as fast as they want - this makes the protocol very unfriendly to other applications, but can gain more "performance" short-term. On the other hand, it is with high probability that if the conditions are appropriate, this kind of protocols might contribute to congestion collapse.
TCP is a stream-oriented protocol, whereas UDP is a message-oriented protocol. Hence TCP does more than just reliability and ordering. See this post for more details. Basically, the RakNet developers added the reliability and ordering while still keeping it as a message-oriented protocol, and so the result was more lightweight than TCP (which has to do more).
This little article is old, but it's still pretty true when it comes to games. It explains the two protocols, and the havoc these folks went trying to develop a multiplayer internet game. "X-Wing vs Tie Fighter"
Lessons Learned (The Internet Sucks)
There is one caveat to this though, I run/develop a multiplayer game, and I've used both. UDP was much better for my app, but alot of people couldn't play with UDP. Routers and such blocked the connections. So I changed to the "reliable" TCP. Well... Reliable? I don't think so. You send a packet, no errors, you send another and it crashes (exception) in the middle of the packet. Now which packets made it? So you end up writing a reliable protocol ON TOP OF tcp, to simulate UDP - but continuously establish a new connection when it crashes. Take about inefficient.
UDP + Stop and wait ARW = good
UDP + Sliding Window Protocol = better
TCP + Sliding Window Protocol with reconnection? = Worthless bulkware. (IMHO)
The other side effect is multi-threaded applications. TCP works well for a chat room type thing, since each room can be it's own thread. A room can hold 60-100 people and it runs fine, as the Room thread contains the Sockets for each participant.
UDP on the other hand is best served (IMO) by one thread, but when you get the packet, you have to parse it to figure out who it came from (via info sent or RemoteEndPoint), then pass that data to the chatroom thread in a threadsafe manner.
Actually, you have to do the same with TCP, but only on connect.
Last point. Remember that TCP will just error out and kill the connection at anytime, but you can reconnect in about .5 seconds and send the same information. Most bizzare thing I've ever worked with.
UDP have a lower reliability give it more reliability by making it send a message and wait for respond if no respond came it resend the message.
How long can I expect a client/server TCP connection to last in the wild?
I want it to stay permanently connected, but things happen, so the client will have to reconnect. At what point do I say that there's a problem in the code rather than there's a problem with some external equipment?
I agree with Zan Lynx. There's no guarantee, but you can keep a connection alive almost indefinitely by sending data over it, assuming there are no connectivity or bandwidth issues.
Generally I've gone for the application level keep-alive approach, although this has usually because it's been in the client spec so I've had to do it. But just send some short piece of data every minute or two, to which you expect some sort of acknowledgement.
Whether you count one failure to acknowledge as the connection having failed is up to you. Generally this is what I have done in the past, although there was a case I had wait for three failed responses in a row to drop the connection because the app at the other end of the connection was extremely flaky about responding to "are you there?" requests.
If the connection fails, which at some point it probably will, even with machines on the same network, then just try to reestablish it. If that fails a set number of times then you have a problem. If your connection persistently fails after it's been connected for a while then again, you have a problem. Most likely in both cases it's probably some network issue, rather than your code, or maybe a problem with the TCP/IP stack on your machine (has been known: I encountered issues with this on an old version of QNX--it'd just randomly fall over). Having said that you might have a software problem, and the only way to know for sure is often to attach a debugger, or to get some logging in there. E.g. if you can always connect successfully, but after a time you stop getting ACKs, even after reconnect, then maybe your server is deadlocking, or getting stuck in a loop or something.
What's really useful is to set up a series of long-running tests under a variety of load conditions, from just sending the keep alive are you there?/ack requests and responses, to absolutely battering the server. This will generally give you more confidence about your software components, and can be really useful in shaking out some really weird problems which won't necessarily cause a problem with your connection, although they might result in problems with the transactions taking place. For example, I was once writing a telecoms application server that provided services such as number translation, and we'd just leave it running for days at a time. The thing was that when Saturday came round, for the whole day, it would reject every call request that came in, which amounted to millions of calls, and we had no idea why. It turned out to be because of a single typo in some date conversion code that only caused a problem on Saturdays.
Hope that helps.
I think the most important idea here is theory vs. practice.
The original theory was that the connections had no lifetimes. If you had a connection, it stayed open forever, even if there was no traffic, until an event caused it to close.
The new theory is that most OS releases have turned on the keep-alive timer. This means that connections will last forever, as long as the system on the other end responds to an occasional TCP-level exchange.
In reality, many connections will be terminated after time, with a variety of criteria and situations.
Two really good examples are: The remote client is using DHCP, the lease expires, and the IP address changes.
Another example is firewalls, which seem to be increasingly intelligent, and can identify keep-alive traffic vs. real data, and close connections based on any high level criteria, especially idle time.
How you want to implement reconnect logic depends a lot on your architecture, the working environment, and your performance goals.
It shouldn't really matter, you should design your code to automatically reconnect if that is the desired behavior.
There really is no way to tell. There is nothing inherent to TCP that would cause the connection to just drop after a certain amount of time. Someone on a reliable connection could have years of uptime, while someone on a different connection could have to reconnect every 5 minutes. There is no way to tell or even guess.
You will need some data going over the connection periodically to keep it alive - many OS's or firewalls will drop an inactive connection.
Pick a value. One drop every hour is probably fine. Ten unexpected connection drops in 5 minutes probably indicates a problem.
TCP connections will generally last about two hours without any traffic. Either end can send keep-alive packets, which are, I think, just an ACK on the last received packet. This can usually be set per socket or by default on every TCP connection.
An application level keep-alive is also possible. For a telnet style protocol like FTP, SMTP, POP or IMAP something like sending return, newline and getting back a command prompt.