Is SCTP good for peer-to-peer apps? - networking

I am considering using SCTP instead of TCP for a p2p app written in C. Should I do it? Also how does the speed of SCTP compare to the speed of TCP?
EDIT:
I found that SCTP can be tunneled over UDP with the only problem being tunneled SCTP is not interoperable with untunneled SCTP.

Have you considered whether your target systems will all have SCTP pre-installed on them or whether your application will need to include SCTP itself? In my experience I would not expect all systems to have SCTP installed on them, and I would expect them not to if it were Windows.
If you include SCTP in the application itself then that will more than double the number of messages being passed into an out of the Kernel which will impact performance when compared with using the pre installed TCP.
Have you considered what benefits you want from SCTP? You mentioned fault tolerance but for this to work with SCTP it requires the application to have multiple ethernet ports and and IP addresses. Is this likely on your app?
As much as I love SCTP (!) I would seriously consider sticking with TCP unless you are sure SCTP is needed or unless you control the hosts your app is deployed on.
Regards

If it's for a local area network, sure go for it.
Note however that if you plan to use it on the open internet many consumer grade firewalls aren't flexible enough to permit unrecognised IP protocols through them.

How does it help you?
You're P2P, so every peer must have at least one socket open to every other peer.
If you've got a socket open, then you can do everything you need to do over that. If you've taken the approach of one socket per file and you have multiple files being tranferred concurrently between two given peers, then SCTP will save you one socket per file. However, on a normal P2P network of any size, you will almost never have multiple files being transferred concurrently between two peers.
Just have one socket and have your own little protocol; send a packet with a header, the header indicates content type, e.g. a command, or part a file - and if so, which file, and which byte range.
Of course, you get a little overhead for that, whereas if you have one socket for commands and one per file, you're more efficient. Is saving one socket per peer (assuming one download at a time) worth the time/hassle/complexity of using SCTP?

Related

What strategies I can use to overcome networking limitations?

I maintain a service that basically pings sites to check whether they're online or not. The service per se is really simple, it relies only on the HTTP status code returned by the requested URL. For instance, I ignore the response body completely.
The service works fine for a small list of domains. However, networking becomes an issue as the number of sites to ping grows. I tried a couple of different languages and libraries. My latest implementation uses NodeJS and node-fetch. But I already had versions of it wrote in Python, PHP, Java, Golang. From that experience, I now know the language is not what determines the request/response speed. There are differences between languages and lib, for sure, but the bottleneck is not there.
Today, I think the only way to make the service scales is with multiple clusters in different networks (e.g. VPC if we're talking AWS). I can't think of a way to deal with networking restrictions in a single or just a few instances.
So, I'm asking this really broad question: what strategies I can use to overcome networking limitations? I'm looking for both dev and ops answers, but mostly focusing on keep the structure as light as possible.
One robust way to ping a website (or any TCP service in general) is to send TCP SYN packet to port 443 (or 80 for insecure HTTP) and measure the time till SYN+ACK response. Tools like hping3 and MTR utilize this method.
This method is one of the best because ICMP may be blocked, take a different path, be prioritized differently on routers in the path, or be responded to by a totally different host. Whereas TCP SYN is the actual scenario the users of the website exercise. The network load is minimal as no data is sent in SYN/SYN+ACK packets, only protocol headers (TCP, IP, and lower level protocol headers).
The answer of #Maxim Egorushkin is great, TCP SYN scanning is the most efficient way I can think of. There are other tools like Masscan, use pcap to send SYN packet in userspace, reduce TCP connection management overhead in kernel. This approach may do the job with a single instance.
If you wanna use HTTP protocol to make sure application layer works fine, use HTTP HEAD request. It responses with a header and status code as GET, but without the body.
Another potential optimization is DNS, you can host a DNS server locally and manage to update domains beforehand, or use a script to update host file before pinging those sites. This can save several milliseconds and bandwith
during pinging sites.
At development level, you could impletement a library just parse status code in HTTP response, so saving some CPU time on parsing headers.
It is helpful to address the actual bottleneck first, it that bandwith limit? memory limit? file descriptor limit? etc.

JGroups UDP for membership but TCP for messaging?

We're prototyping a jgroups-based cluster node messaging system that will replace one that was JDBC-based. There are a lot of folks in my organization who are concerned about adding more multicast traffic to an already busy network, so I'm getting some pushback on a UDP/multicast solution.
I know JGroups can be configured to be TCP only, but I do not want to have to force a configuration step into the application where each node has to be identified ahead of time in a config file.
What I'd like then is to see if we can get a hybrid working here where multicast is used ONLY for group membership operations (discovery, heartbeats, failure detection), but messaging is all TCP-based.
I'm not finding examples of that in my searches, however, and am therefore questioning whether JGroups can be configured this way.
Can it, and any example configs showing how?
Thanks!
As for discovery, using MPING you can do that - it uses IP multicast to discover new nodes, although these later respond via the main transport (TCP in your case).
As for FD/FD_ALL, I don't think it's possible, the protocols are designed to use the main transport. You'd have to write your own FD protocol, shouldn't be that complicated.
However, if you can use UDP, you probably should. It's up to you whether you send a message to one node, multiple ones or all of them - for one destination there will be unicast UDP, for few destinations (if you set the anycast option) will be multiple unicasts used and only for all nodes the UDP will reduce the network load by multicasting. It's really up to the application, UDP just allows multicasts.

multicast packet order on localhost

since multicast packets are udp based in general it's unreliable
on localhost i would assume that a packet is just copied from one process's buffer to another thus it would be kinda queued in the order as the sender transmitted, right?
what i'm not sure about is:
can i assume correct package order in general on localhost for multicasting (or udp)? if it does not, why?
what are the concrete differences in differnt operating systems handling (win,mac,linux)?
thank you
Can I assume correct package order in general on localhost for multicasting (or udp)? If it does not, why?
No, because the packets aren't cataloged. Plus, as you may already know, there are no guarantees that a loop-back connection will keep UDP in order.
What are the concrete differences in differnt operating systems handling (win, mac, linux)?
There are no differences in protocol (see the RFC) but the nitty-gritty details are most likely platform (and version) dependent and I'm sure no one knows them off-hand (some are close-sourced anyway). This is another reason why in-orderness most likely can't be guaranteed. Even if you do test this and the packets come in order, it's a very bad idea (tm) to rely on something like the order of UDP packets in loop-back connections.
Also: the argument that UDP is "unreliable," while true, can be misleading. A lot of security-conscious software works via UDP, and, in general, only a small percentage of packets are dropped. With appropriate contingencies in place, a piece of software should not blow up if using UDP (for the sake of performance, lets say) and a packet drops. But if you're that worried about it, why not use TCP?

How to retain one million simultaneous TCP connections?

I am to design a server that needs to serve millions of clients that are simultaneously connected with the server via TCP.
The data traffic between the server and the clients will be sparse, so bandwidth issues can be ignored.
One important requirement is that whenever the server needs to send data to any client it should use the existing TCP connection instead of opening a new connection toward the client (because the client may be behind a firewall).
Does anybody know how to do this, and what hardware/software is needed (at the least cost)?
What operating systems are you considering for this?
If using a Windows OS and using something later than Vista then you shouldn't have a problem with many thousands of connections on a single machine. I've run tests (here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) with a low spec Windows Server 2003 machine and easily achieved more than 70,000 active TCP connections. Some of the resource limits that affect the number of connections possible have been lifted considerably on Vista (see here: http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html) and so you could probably achieve your goal with a small cluster of machines. I don't know what you'd need in front of those to route the connections.
Windows provides a facility called I/O Completion Ports (see: http://msdn.microsoft.com/en-us/magazine/cc302334.aspx) which allow you to service many thousands of concurrent connections with very few threads (I was running tests yesterday with 5000 connections saturating a link to a server with 2 threads to process the I/O...). Thus the basic architecture is very scalable.
If you want to run some tests then I have some freely available tools on my blog that allow you to thrash a simple echo server using many thousands of connections (1) and (2) and some free code which you could use to get you started (3)
The second part of your question, from your comments, is more tricky. If the client's IP address keeps changing and there's nothing between you and them that is providing NAT to give you a consistent IP address then their connections will, no doubt, be terminated and need to be re-established. If the clients detect this connection tear down when their IP address changes then they can reconnect to the server, if they can't then I would suggest that the clients need to poll the server every so often so that they can detect the connection loss and reconnect. There's nothing the server can do here as it can't predict the new IP address and it will discover that the old connection has failed when it tries to send data.
And remember, your problems are only just beginning once you get your system to scale to this level...
This problem is related to the so-called C10K problem. The C10K page lists a large number of good resources for addressing the problems you will encounter when you try to allow thousands of clients to connect to the same server.
I've come across the APE Project
a while back. It seems like a dream come true. They can support up to 100k concurrent clients on a single node. Spread them across 10 or 20 nodes, and you can serve millions. Perfect for RESTful applications. Might want to look deeper for any shared namespace. One drawback is that this is a standalone server, as in supplementary to a web server. This server is of course Open Source, so any cost is hardware/ISP related.
You cannot use UDP. If the client sends a request and you don't reply immediately, a router is going to forget the reverse route in 30 seconds or less, so your server will never be able to reply to the client.
TCP is the only option, and it, too, will give you headaches. Most routers are going to forget the route and/or drop the connection after a few minutes, so your client/server code is going to have to send "keep alives" fairly often.
I recommend setting up a "sniffer", to see how the phone companies are staying in touch with your smartphone for their "push" technology. Copy whatever they're doing, because that stuff works!
As Greg mentioned, the problem you are describing is C10K (or rather "C1M" in your case )
I recently made a simple TCP echo server on linux that scales very well with the number of sessions (only tested up to 200.000 though), by using the epoll queue. On BSD, you have something similar called kqueue.
You can check out the code if you want to. Hope this helps and good luck!
EDIT: As noted in the comments below, my original assertion that there is a 64K limit based on the number of ports is incorrect, however there is a 32K limit on the number of socket handles, so my suggested design is valid.
With a typical TCP/IP server design, you're limited in the number of simultaneous open connections you can have. The server has one listening port, and when a client connects to it the server makes an accept call, and that creates a new socket on a random port for the rest of the connection.
To handle more than 64K simultaneous connections I think you need to use UDP instead. You only need one port for the server to listen on, and you need to manage the connections using a 32-bit client ID in the packet data instead of having a separate port for each client. The 32-bit client ID could be the client's IP address, and the client can listen on a known UDP port for messages coming back from the server. That port would be the only one that needs to be open on the firewall.
With this approach, your only limitation is how quickly you can handle and respond to UDP messages. With millions of clients, even sparse traffic could give you large spikes, and if you don't read the packets fast enough your input queue will fill up and you'll start dropping packets. The C10K page Greg points to will give you strategies for that.

Why is SCTP not much used/known

I recently checked out the book "UNIX Network Programming, Vol. 1" by Richards Stevens and I found that there is a third transport layer standard besides TCP and UDP: SCTP.
Summary: SCTP is a transport-level protocol that is message-driven like UDP, but reliable like TCP. Here is a short introduction from IBM DeveloperWorks.
Honestly, I have never heard of SCTP before. I can't remember reading about it in any networking books or hearing about it in classes I had taken. Reading other stackoverflow questions that mentions SCTP suggests that I'm not alone with this lack of knowledge.
Why is SCTP so unknown? Why is it not much used?
Indeed, SCTP is used mostly in the telecom area. Traditionally, telecom switches use SS7 (Signaling System No. 7) to interconnect different entities in the telecom network. For example - the telecom provider's subscriber data base(HLR), with a switch (MSC), the subscriber is connected too (MSC).
The telecom area is moving to higher speeds and more reachable environment. One of these changes is to replace SS7 protocol by some more elegant, fast and flexible IP-based protocol.
The telecom area is very conservative. The SS7 network has been used here for decades. It is very a reliable and closed network. This means a regular user has no access to it.
The IP network, in contrast, is open and not reliable, and telecoms will not convert to it if it won't handle at least the load that SS7 handles. This is why SCTP was developed. It tries:
to mimic all advantages of the SS7 network accumulated over the decades.
to create a connection-oriented protocol better than TCP in speed, security, and redundancy
The latest releases of Linux already have SCTP support.
We have been deploying SCTP in several applications now, and encountered significant problem with SCTP support in various home routers. They simply don't handle SCTP correctly. I believe this is primarily a performance issue (the SCTP protocol specification require checksums for the whole packets to be recalculated and not just for headers).
Like many other promising protocols SCTP is sadly dead in the water until D-link and Netgear fixes their broken NAT boxes.
SCTP is not very much known and not used/deployed a lot because:
Widespread: Not widely integrated in TCP/IP stacks (in 2013: still missing natively in latest Mac OSX and Windows. 2020 update: still not in Windows nor Mac OS X)
Libraries: Few high level bindings in easy to use languages (Disclaimer: i'm maintainer of pysctp, SCTP easy stack support for Python)
NAT: Doesn't cross NAT very well/at all (less than 1% internet home & enterprise routers do NAT on SCTP).
Popularity: No general public app use it
Programming paradigm: it changed a bit: it's still a socket, but you can connect many hosts to many hosts (multihoming), datagram is ordered and reliable, erc...
Complexity: SCTP stack is complex to implement (due to above)
Competition: Multipath TCP is coming and should address multihoming needs / capabilities so people refrain from implementing SCTP if possible, waiting for MTCP
Niche: Needs SCTP fills are very peculiar (ordered reliable datagrams, multistream) and not needed by much applications
Security: SCTP evades security controls (some firewalls, most IDSes, all DLPs, does not appear on netstat except CentOS/Redhat/Fedora...)
Audit-ability: Something like 3 companies in the world routinely do audits of SCTP security (Disclaimer: I work in one of them)
Learning curve: Not much toolchain to play with SCTP (check the excellent withsctp that combines nicely with netcat or use socat, 2020 edit: nmap supports it for a few years now )
Under the hood: Used mostly in telecom and everytime you send SMS, start surfing the net on your mobile or make phone calls, you're often triggering messages that flow over SCTP (SIGTRAN/SS7 with GSM/UMTS, Diameter with LTE/IMS/RCS, S1AP/X2AP with LTE), so you actually use it a lot but you never know about it ;-) 2020 edit: it's being removed from the core 5G network (no more Diameter, HTTP/2 instead) and will be only used in the 5G radio access network between antennas and core.
SCTP requires more design within the application to get the best use of it. There are more options than TCP, the Sockets-like API came later, and it is young. However I think most people that take the time to understand it (and who know the shortcomings of TCP) appreciate it -- it is a well designed protocol that builds on our ~30 years of knowledge of TCP and UDP.
One of the aspects that requires some thought is that of streams. Streams provide (usually, I think you can turn it off) an order guarantee within them (much like a TCP connection) but there can be multiple streams per SCTP connection. If your application's data can be sent over multiple streams then you avoid head-of-line blocking where the receiver starves due to one mislaid packet. Effectively different conversations can be had over the same connection without impacting each other.
Another useful addition is that of multi-homing support -- one connection can be across multiple interfaces on both ends and it copes with failures. You can emulate this in TCP, but at the application layer.
Proper link heartbeating, which is the first thing any application using TCP for non-transient connections implements, is there for free.
My personal summary of SCTP is that it doesn't do anything you couldn't do another way (in TCP or UDP) with substantial application support. The thing it provides is the ability to not have to implement that code (badly) yourself.
FYI, SCTP is mandated as supported for Diameter (cf RADIUS next gen). see RFC 3588
Diameter clients MUST support either TCP or SCTP, while agents and
servers MUST support both. Future versions of this specification MAY
mandate that clients support SCTP.
p1. SCTP mapped directly over IPv4 requires support in NAT gateways, which has never been widely deployed anywhere, and without it the typical NAT gateway will only permit one private host per public address to be using SCTP at a time.
p2. SCTP mapped over UDP/IPv4 allows more private hosts per public address, but UDP mappings in IPv4/NAT gateways are notoriously tricky to establish and keep maintained, due to the fact that UDP is a connectionless transport without any explicit state for a NAT to track.
p3. SCTP mapped directly over IPv6 requires... well... IPv6. Have you tried to deploy IPv6? If so, have you tried to buy an IPv6 firewall? Does it support SCTP? How about a load balancer? A SSL accelerator?
p4. Finally, a lot of the Internet is pretty much constrained to what can fit through TCP port 80 and port 443, so SCTP of any flavor tends to lose there. Hence, you see efforts like the MPTCP working group in IETF.
Many of us will be using SCTP soon, since it's used by WebRTC datachannels to create a TCP-like reliable layer on top of UDP -- SCTP over DTLS over UDP: https://datatracker.ietf.org/doc/html/draft-ietf-rtcweb-data-channel-13#section-6
Reading the SCTP Wikipedia page I'd say that the main reason is that SCTP is a very young protocol (proposed in 2000) that is currently unsupported by the mainstream OSs (Windows, OS X, Linux).
If "very young" seems inappropriate to you, think about IPV6: "in December 2008, despite marking its 10th anniversary as a Standards Track protocol, IPv6 was only in its infancy in terms of general worldwide deployment."
SCTP is used extensively in the 4G LTE network where Diameter is used for AAA.
It might not be well known, but it's not unused. Quite recently there was a draft published at the IETF about Using SCTP as a Transport Layer Protocol for HTTP.
In reference to all of the comments about commercial routers being broken or lacking SCTP support, the issue is that SCTP with NAT is still in draft form with the IETF. So there is no RFC specification for them to implement it.
https://datatracker.ietf.org/doc/html/draft-ietf-behave-sctpnat-09
Sctp is born too late, and for many situation TCP is enough.
Also, as I know most of its usage is on telecommunication area.

Resources