I usually pass data between my web servers (in different locations) using HTTP requests (sometimes using SSL if it's sensitive). I was wondering if there were any lighter protocols that I might be able to swap HTTP(S) for that would also support public/private keys like SSH or something.
I've used PHP sockets to build an SMTP client before, so I wouldn't mind doing that if required.
There are lots and lots and lots of protocols. Lots. Start here for a list.
http://en.wikipedia.org/wiki/Internet_Protocol_Suite
SFTP is fun for passing data around. It works well. You'll find that it's not much better than HTTP, however, because HTTP is pretty simple.
http://en.wikipedia.org/wiki/SSH_file_transfer_protocol
SMTP would work. http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol
SNMP can be made to work. http://en.wikipedia.org/wiki/Simple_Network_Management_Protocol You have to really push the envelope.
Most of these, however, run over TCP sockets (SNMP normally uses UDP), which involve a fair amount of overhead because of the connection handshake and the acknowledgement of packets.
If you want real fun with very low overhead, use UDP.
http://en.wikipedia.org/wiki/User_Datagram_Protocol
You might want to use Reliable UDP if you're worried about messages getting dropped.
http://en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol
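To make the "low overhead" point concrete, here is a minimal sketch of a UDP exchange on the loopback interface. The function name `udp_roundtrip` is made up for illustration. Note there is no connection setup at all, and also no delivery guarantee, which is exactly the part a reliable layer would have to add.

```python
import socket


def udp_roundtrip(payload: bytes, timeout: float = 1.0) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    # Receiver: bind to an ephemeral port on loopback.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(timeout)
    address = receiver.getsockname()

    # Sender: no connect() and no handshake; sendto() just emits the datagram.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, address)
        # UDP does not guarantee delivery or ordering. Across a real network
        # this call can time out, and the application has to handle that.
        data, _peer = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()
```

Between two servers you would split this into a sender and a receiver process, and add your own acknowledgements or sequence numbers if dropped messages matter.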
I'd like to mention XMPP in addition to protocols already listed in other answers.
It's lightweight, and it is used in some "realtime" communication systems (for example, in GTalk).
WebSocket is a good option if you are interested in keeping a connection open to pass multiple messages back and forth. It's useful for issuing updates from the server to clients in real time, for example.
Why don't you simply use FTPS:
http://en.wikipedia.org/wiki/FTPS
or SFTP
http://en.wikipedia.org/wiki/SSH_file_transfer_protocol
I maintain a service that basically pings sites to check whether they're online or not. The service per se is really simple, it relies only on the HTTP status code returned by the requested URL. For instance, I ignore the response body completely.
The service works fine for a small list of domains. However, networking becomes an issue as the number of sites to ping grows. I tried a couple of different languages and libraries. My latest implementation uses NodeJS and node-fetch, but I have already written versions of it in Python, PHP, Java, and Golang. From that experience, I now know the language is not what determines the request/response speed. There are differences between languages and libraries, for sure, but the bottleneck is not there.
Today, I think the only way to make the service scale is with multiple clusters in different networks (e.g. VPCs if we're talking AWS). I can't think of a way to deal with networking restrictions on a single instance or just a few instances.
So, I'm asking this really broad question: what strategies can I use to overcome networking limitations? I'm looking for both dev and ops answers, but mostly focusing on keeping the structure as light as possible.
One robust way to ping a website (or any TCP service in general) is to send a TCP SYN packet to port 443 (or 80 for insecure HTTP) and measure the time until the SYN+ACK response arrives. Tools like hping3 and MTR use this method.
This method is one of the best because ICMP may be blocked, take a different path, be prioritized differently on routers in the path, or be responded to by a totally different host. A TCP SYN, by contrast, exercises the same path the website's users do. The network load is minimal as no data is sent in SYN/SYN+ACK packets, only protocol headers (TCP, IP, and lower level protocol headers).
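Sending a bare SYN requires raw sockets and elevated privileges, so as a rough approximation you can time an ordinary `connect()` instead. The sketch below does that; `tcp_connect_time` is a hypothetical helper name. Unlike hping3, it completes the three-way handshake and then closes the connection, and if you pass a host name rather than an IP address the measured time also includes DNS resolution.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int = 443,
                     timeout: float = 3.0) -> Optional[float]:
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        # connect() returns once the kernel has finished the handshake,
        # so the elapsed time is roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None
```

No application data is exchanged, so this only tells you that something is accepting connections on the port, not that the site is answering HTTP correctly.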
@Maxim Egorushkin's answer is great; TCP SYN scanning is the most efficient way I can think of. Other tools such as Masscan use pcap to send SYN packets from userspace, which reduces the TCP connection management overhead in the kernel. This approach may do the job with a single instance.
If you want to use HTTP to make sure the application layer works, use an HTTP HEAD request. It returns the same status code and headers as a GET, but without the body.
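A HEAD check can be sketched with nothing but the standard library. `head_status` is an illustrative name, not part of any library; for HTTPS sites you would swap in `http.client.HTTPSConnection`.

```python
import http.client


def head_status(host: str, port: int = 80, path: str = "/",
                timeout: float = 5.0) -> int:
    """Issue a HEAD request and return only the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        # The server sends a status line and headers but no body.
        response = conn.getresponse()
        return response.status
    finally:
        conn.close()
```

Some servers handle HEAD differently from GET (or reject it), so it is worth spot-checking that the status codes match for the sites you monitor.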
Another potential optimization is DNS: you can host a DNS server locally and refresh the domains beforehand, or use a script to update the hosts file before pinging the sites. This can save several milliseconds and some bandwidth on each check.
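A lighter variant of the same idea is to resolve each host once inside the checker and reuse the address. The sketch below is an assumption about how that might look (`DnsCache` is a hypothetical class); it ignores DNS TTLs, so a real service would need to refresh entries periodically.

```python
import socket


class DnsCache:
    """Resolve each (host, port) once and reuse the first address found."""

    def __init__(self):
        self._cache = {}

    def resolve(self, host: str, port: int = 443) -> str:
        key = (host, port)
        if key not in self._cache:
            # getaddrinfo returns tuples whose last element is the socket
            # address; its first field is the IP address as a string.
            infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
            self._cache[key] = infos[0][4][0]
        return self._cache[key]
```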
At the development level, you could implement a library that parses only the status code of the HTTP response, saving some CPU time on parsing headers.
It is helpful to identify the actual bottleneck first: is it a bandwidth limit? A memory limit? A file descriptor limit?
After reviewing the differences between raw TCP and websocket, I am thinking of using websocket, even though it will be a client/server system with no web browser in the picture. My motivation stems from:
websocket is message-oriented, so I do not have to write a protocol on top of the TCP layer to delimit messages myself.
The initial handshake of websocket is quite fitting for my use case as I can authenticate/authorize the user in this initial request-response exchange.
Performance does matter a lot here though, I am wondering if, excluding the websocket handshake, there would be a loss of performance between the websocket messages vs writing a custom protocol on raw tcp? If not, then websocket is the most convenient choice to me, even if I don't use the benefits related to the "web" part.
Also would using wss change the answer to the above question?
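For comparison, the message delimiting that would otherwise have to be written on raw TCP can be quite small. This is a minimal sketch of one common approach, a 4-byte length prefix; the function names are made up for illustration and this is not how WebSocket itself frames messages.

```python
import socket
import struct

# Each message is preceded by its length as a 4-byte big-endian integer.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() alone may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return _recv_exactly(sock, length)
```

WebSocket does the equivalent with a small per-frame header (plus masking on client-to-server frames), so after the handshake the per-message overhead of the two approaches is of the same order.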
You are basically asking if using an already implemented library which perfectly fits your requirements and which even has the option for secure connections (wss) is better than designing and implementing your own message-based protocol on TCP, assuming that performance and overhead are not relevant for your use case.
If you rephrase your question this way the answer should be obvious: using an existing implementation which fits your purpose saves you a lot of time and hassle for design, implementation and testing. It is also easier to train developers to use this protocol. It is easier to debug problems since common tools like Wireshark understand the protocol already.
Apart from this, websockets have an established mechanism for using proxies and run over a common protocol, so they pass through firewalls more easily. You will therefore likely run into fewer problems when rolling out your application.
In other words: I can see no reason why you should not use websockets if they fit your purpose.
I don't have much experience with network programming, but an interesting problem came up that requires it. The server will be transmitting multiple streams of different types of data to other machines. Each machine should be able to choose which of the streams (one or more) it would like to receive. The whole setup is confined to the local network only. Initially, there will be only two clients, but I would like to design a scalable approach, if possible.
The existing server code, which streams only a single stream, uses a TCP stream socket to do so. However, from some reading on the subject, I am not sure if this approach will scale well to multiple streams and multiple clients. The reason is: wouldn't two clients that want to receive the same stream but connect via different TCP sockets waste bandwidth? Especially compared to UDP, which allows multicast.
Due to my inexperience, I am relying on better informed people out there to advise me: considering that I do want the stream to be reliable, would it be worth starting from scratch with UDP and implementing reliability on top of it, rather than continuing to use TCP? Or would this be better solved by designing an appropriate network structure? I'd be happy to provide more details if needed. Thanks.
UPDATE: I am looking at PGM and emcaster for reliable multicasting at the moment. I must have a C# implementation on the server side and a Python implementation on the client side.
Since you want a scalable program, UDP would be a better choice: it does not go to the extra length of verifying that the data has been received, which makes sending data faster.
Quick question: do most chat applications (e.g. AIM, Skype, Oovoo) use peer-to-peer UDP exchange for talking to other users, or an echoing TCP connection with a server? Or some combination in between?
Traditionally, most applications used a TURN-like solution (i.e., communication via a server) to overcome NAT traversal issues. Since chat does not consume much bandwidth, servers could support thousands of communications.
But now that P2P has evolved and the NAT traversal issues are now well understood, some use direct UDP communication provided that the users' NAT allows this (i.e., STUN-like communication). They still need a central server to punch the hole though. Direct communication is also helpful when lots of data needs to be transmitted.
I believe it is fair to say that most modern frameworks use a combination of both.
When you only need to send small fragments of data, such as text messages, there's no need to use P2P. Data can be transmitted from client 1 to the server, and from the server on to client 2.
When you need to transfer data quickly between clients, in cases such as VoIP (voice over IP), or file transfer, you will use P2P.
A pretty standard IM protocol is XMPP. I know it's used by Google Talk, as well as a few other big names in chat.
I am writing an application where the client side will be uploading data to the server through a wireless link.
The connection should be very reliable. The link is expected to break many times and there will be many clients connected to the server.
I am confused whether to use TCP or reliable UDP.
Please share your thoughts.
Thanks.
RUDP is not, of course, a formal standard, and there's no telling if you will find existing implementations you can use. Given a choice between rolling this from scratch and just re-making TCP connections, I'd choose TCP.
To be safe, I would go with TCP just because it's a reliable, standard protocol. RUDP has the disadvantage of not being an established standard (although it's been mentioned in several IETF discussions).
Good luck with your project!
It's likely that both your TCP and RUDP links would be broken by your environment, so the fact that you're using RUDP is unlikely to help there; there will likely be times when no datagrams can get through...
What you actually need to make sure of is that a) you can handle the number of connected clients, b) your application protocol can detect reasonably quickly when you've lost connectivity with a client (or server) and c) you can handle the required reconnection and maintenance of cross connection session state for clients.
As long as you deal with b) and c) it doesn't really matter if the connection keeps being broken. Make sure you design your application protocol so that you can get things done in short batches; so if you're uploading files, make sure that you're sending small blocks and that the application protocol can resume a transfer that was broken halfway through; you don't want to get 99% of the way through a 2 GB transfer, lose the connection, and have to start again.
For this to work your server needs some kind of client session state cache where you can keep the logical state of a client's connection beyond the life of the connection itself. Design from the start to expect a given session to include multiple separate connections. The session state should possibly have some kind of timeout so that if the client goes away for a long time it doesn't continue to consume resources on the server but, to be honest, it may simply be a case of saving the state off to disk after a while.
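As a rough illustration of such a session cache, here is a minimal in-memory sketch. The names `SessionStore`, `resume_offset`, and `append_block` are hypothetical, and a real server would persist blocks to disk rather than hold them in memory. The key idea is that state is keyed by a session ID rather than by a connection, so a reconnecting client can ask where to resume.

```python
import time
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    received: bytearray = field(default_factory=bytearray)
    last_seen: float = field(default_factory=time.monotonic)


class SessionStore:
    """Upload state that outlives any single connection."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        self._sessions = {}  # session_id -> UploadSession

    def _touch(self, session_id: str) -> UploadSession:
        session = self._sessions.setdefault(session_id, UploadSession())
        session.last_seen = time.monotonic()
        return session

    def resume_offset(self, session_id: str) -> int:
        """Called after (re)connecting: how many bytes are already stored?"""
        return len(self._touch(session_id).received)

    def append_block(self, session_id: str, offset: int, block: bytes) -> int:
        """Store a block if it starts where the stored data ends."""
        session = self._touch(session_id)
        if offset == len(session.received):
            session.received.extend(block)
        # Duplicate or out-of-order blocks are ignored; the returned offset
        # tells the client what to send next.
        return len(session.received)

    def expire(self) -> None:
        """Drop sessions that have been idle longer than the TTL."""
        cutoff = time.monotonic() - self.ttl_seconds
        stale = [sid for sid, s in self._sessions.items()
                 if s.last_seen < cutoff]
        for sid in stale:
            del self._sessions[sid]
```

A client that loses its connection after sending some blocks reconnects, calls `resume_offset` with the same session ID, and continues from the returned position instead of starting over.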
In summary, I don't think the choice of transport matters and I'd go with TCP at least to start with. What will really matter is being able to manage your clients' session state on the server and deal with the fact that clients will connect and disconnect regularly.
If you aren't sure, odds are that you should use TCP. For one thing, it's certain to be part of the network stack for anything supporting IP. "Reliable UDP" is rarely supported out of the box, so you'll have some extra support work for your clients.