Avoiding double inbound/outbound connection between bittorrent peers - tcp

I am implementing a bittorrent client in Elixir as a learning exercise.
Reading multiple docs, it appears that a proper client should not only be able to create outbound socket connections but also listen on a port and accept inbound connections.
But I cannot find docs about Alice and Bob trying to connect to each other simultaneously.
That would end up with two connections with the same peer (for the same info hash / torrent). Now if I want to track my connections the reasonable thing seems to associate them with the torrent id + peer id, which means boths connections would have the same identifier.
Is there something I missed that prevents that case? Or do clients generally just close one of the two connections randomly? Or maybe the IP (and even port) can be part of the ID so keep both connections is actually OK (though useless I guess).
What would be the proper solution?
Thank you.

Or do clients generally just close one of the two connections randomly?
More or less that, although with some biases such as preferring the longer-lived connection to stay alive or with some randomized delay to prevent simultaneous close attempts.
The situation that two clients initiate a connection to each other at the same time is not that common anyway, most of the time things will be different enough between peers that one will be clearly arrive first and then the other side will know about it and avoid the reverse attempt.
Sidenote: One shouldn't trust the peer ID too much. It's useful, but an adversarial client could intentionally pick the ID of another one or randomize it on every connection.

Related

Filtering UDP packets accidentally sent to my port

I am designing a simple protocol on top of UDP and now I've realized that someone else can send a packet to the port I'm listening on. Such packet will obviusly be incorrect for my application (I'm not concerned about security now)
Is there the way of filtering these invalid messages? I was thinking about adding some fixed magic number at the beginning of each packet, but how large should it be? Is 16 bits enough?
I believe the typical solution is to require a handshake (that could include a "long enough" magic number) in the beginning of the session. Of course the "session" is something your protocol needs to keep track of, UDP does not have the concept. Keeping a list ip, port and last packet receive time for all current sessions should do it: then you can drop all packets from a peers that have not done the handshake beforehand. This not only prevents random unknown app traffic from breaking your application but will also prevent multiple legitimate peers from screwing up each others traffic.
Additionally you could add either a session id or an increasing packet sequence number (with allowances for missing packets) to packets if you need to ensure that the peer agrees with you on which session this is.
Probably. Java uses 32 for .class files (0xcafebabe). But you only have 534 bytes in practical UDP so you might need to save a couple.

TCP as connection protocol questions

I'm not sure if this is the correct place to ask, so forgive me if it isn't.
I'm writing computer monitoring software that needs to connect to a server. The server may send out relatively urgent messages, such as sound or cancel an alarm, and the client may send out data about the computer, such as screenshots. The data that the client sends isn't too critical on timing, but shouldn't be more than a two minutes late.
It is essential to the software that portforwarding need not be set up, and it is assumed that the internet connection will be done through a wireless router that has NAT almost all the time.
My idea is to have a TCP connection initiated from the client, and use that to transfer data. Ideally, I would have no data being sent when it is not needed, but I believe this to be impossible. Would sending the equivalent of a ping every now and again keep the connection alive, and what sort of bandwidth would it use if this program was running all the time on the computer? In addition, would it be possible to reduce the header size for these keep-alives?
Before I start designing the communication and programming, is this plan for connection flawed? Are there better alternatives?
Thanks!
1) You do not need to send 'ping' data to keep the connection alive, the TCP stack does this automatically; one reason for sending 'ping' data would be to detect a connection close on the client side - typically you only find out something has gone wrong when you try and read/write from the socket. There may be a way to change various time-outs so you can detect this condition faster.
2) In general while TCP provides a stream-oriented error free channel, it makes no guarantees about timeliness, if you are using it on the internet it is even more unpredictable.
3) For applications such as this (I hope you are making it for ethical purposes) - I would tend to use TCP, since you don't want a situation where the client receives a packet to raise an alarm but misses that one that turns it off again.

Requirements for Repeated TCP connects

I am using Winsock, and I have a need to issue a TCP connect repeatedly to a third-party server. These applications will stay up potentially for days at a time. I am the only client connecting to the server. The time between connects is on the order of seconds, and the connection stays up only long enough to send a single message of a few bytes. I am currently seeing that the connects start to fail (WSAECONNREFUSED) after a few hours. Is there anything I must do (e.g. socket options, etc.) to ensure these frequent repeated connects will succeed for an indefinite amount of time? Thanks!
When doing a lot of transaction based connections and having issues with TCP's TIME_WAIT state duration (which last 2MSL = 120 seconds) leading to no more connections available for a client host toward a special server host, you should consider UDP and managing yourself the re-sending of lost requests.
I know that sounds odd. But standard services like DNS are required to use UDP to handle a ton of transactions (request then a single answer in one UDP segment) in order to avoid issues you are experimenting yourself. Web browsers send a request using UDP to the DNS. Re-request is done using UDP after a short time, no longer than a few milliseconds I guess. Sometimes the resolved name is too long and does not fit in the UDP paquet. As a consequence the DNS server send a UDP reply with a dedicated flag raised, in order to ask the client to use TCP this time.
Moreover you may consider also the T/TCP extension (Transactional TCP) of TCP, if available on your Windows platform. It provides TCP reliability with shorter TIME_WAIT state, as nearly no costs in the modifications of your client code. As far as I know it may work even though the server does not handle that extension. As a side note it is currently not used on the internet as it is know to have some flaw...

HTTP push over 100,000 connections

I want to use a client-server protocol to push data to clients which will always remain connected, 24/7.
HTTP is a good general-purpose client-server protocol. I don't think the semantics possibly could be very different for any other protocol, and many good HTTP servers exist.
The critical factor is the number of connections: the application will gradually scale up to a very large number of clients, say 100,000. They cannot be servers because they have dynamic IP addresses and may be behind firewalls. So, a socket link must be established and preserved, which leads us to HTTP push. Only rarely will data actually be pushed to a given client, so we want to minimize the connection overhead too.
The server should handle this by accepting the connection, inserting the remote IP and port into a table, and leaving it idle. We don't want 100,000 threads running, just so many table entries and file descriptors.
Is there any way to achieve this using an off-the-shelf HTTP server, without writing at the socket layer?
Use Push Framework : http://www.pushframework.com.
It was designed for that goal of managing a large number of long-lived asynchronous full-duplex connections.
LightStreamer (http://www.lightstreamer.com/) is the tool that is made specifically for PUSH operations of HTTP.
It should solve this problem.
You could also look at Jetty + Continuations.

TCP Connection Life

How long can I expect a client/server TCP connection to last in the wild?
I want it to stay permanently connected, but things happen, so the client will have to reconnect. At what point do I say that there's a problem in the code rather than there's a problem with some external equipment?
I agree with Zan Lynx. There's no guarantee, but you can keep a connection alive almost indefinitely by sending data over it, assuming there are no connectivity or bandwidth issues.
Generally I've gone for the application level keep-alive approach, although this has usually because it's been in the client spec so I've had to do it. But just send some short piece of data every minute or two, to which you expect some sort of acknowledgement.
Whether you count one failure to acknowledge as the connection having failed is up to you. Generally this is what I have done in the past, although there was a case I had wait for three failed responses in a row to drop the connection because the app at the other end of the connection was extremely flaky about responding to "are you there?" requests.
If the connection fails, which at some point it probably will, even with machines on the same network, then just try to reestablish it. If that fails a set number of times then you have a problem. If your connection persistently fails after it's been connected for a while then again, you have a problem. Most likely in both cases it's probably some network issue, rather than your code, or maybe a problem with the TCP/IP stack on your machine (has been known: I encountered issues with this on an old version of QNX--it'd just randomly fall over). Having said that you might have a software problem, and the only way to know for sure is often to attach a debugger, or to get some logging in there. E.g. if you can always connect successfully, but after a time you stop getting ACKs, even after reconnect, then maybe your server is deadlocking, or getting stuck in a loop or something.
What's really useful is to set up a series of long-running tests under a variety of load conditions, from just sending the keep alive are you there?/ack requests and responses, to absolutely battering the server. This will generally give you more confidence about your software components, and can be really useful in shaking out some really weird problems which won't necessarily cause a problem with your connection, although they might result in problems with the transactions taking place. For example, I was once writing a telecoms application server that provided services such as number translation, and we'd just leave it running for days at a time. The thing was that when Saturday came round, for the whole day, it would reject every call request that came in, which amounted to millions of calls, and we had no idea why. It turned out to be because of a single typo in some date conversion code that only caused a problem on Saturdays.
Hope that helps.
I think the most important idea here is theory vs. practice.
The original theory was that the connections had no lifetimes. If you had a connection, it stayed open forever, even if there was no traffic, until an event caused it to close.
The new theory is that most OS releases have turned on the keep-alive timer. This means that connections will last forever, as long as the system on the other end responds to an occasional TCP-level exchange.
In reality, many connections will be terminated after time, with a variety of criteria and situations.
Two really good examples are: The remote client is using DHCP, the lease expires, and the IP address changes.
Another example is firewalls, which seem to be increasingly intelligent, and can identify keep-alive traffic vs. real data, and close connections based on any high level criteria, especially idle time.
How you want to implement reconnect logic depends a lot on your architecture, the working environment, and your performance goals.
It shouldn't really matter, you should design your code to automatically reconnect if that is the desired behavior.
There really is no way to tell. There is nothing inherent to TCP that would cause the connection to just drop after a certain amount of time. Someone on a reliable connection could have years of uptime, while someone on a different connection could have to reconnect every 5 minutes. There is no way to tell or even guess.
You will need some data going over the connection periodically to keep it alive - many OS's or firewalls will drop an inactive connection.
Pick a value. One drop every hour is probably fine. Ten unexpected connection drops in 5 minutes probably indicates a problem.
TCP connections will generally last about two hours without any traffic. Either end can send keep-alive packets, which are, I think, just an ACK on the last received packet. This can usually be set per socket or by default on every TCP connection.
An application level keep-alive is also possible. For a telnet style protocol like FTP, SMTP, POP or IMAP something like sending return, newline and getting back a command prompt.

Resources