Deciding whether to use ws or wss?

I have an account with a trading exchange and they have a websockets API which supports ws://... and wss://...
For the non-authenticated channels such as the current state of the orderbook, is it an easy decision to just use ws, mostly for the (however minimal) time savings? Obviously I would like to have my data as recent as possible.
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.

I see absolutely no reason not to use wss for all your connections, particularly with a WebSocket. The normal WebSocket usage is to make a connection and then keep that connection open for a long time and use it. While there is a bit of per-message overhead because of the encryption, the main wss overhead is when you first make the connection, and that only happens once per connection.
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
No, there is not some other factor. In fact, quite the opposite: these days there are more and more reasons to use TLS whenever possible to protect your privacy.
For the non-authenticated channels such as the current state of the orderbook, is it an easy decision to just use ws, mostly for the (however minimal) time savings?
Why? If wss is available, I'd be using it for everything. If you actually run into a CPU problem down the road, you could revisit whether using wss has anything to do with it, but that is unlikely to happen and, in my opinion, you have nothing to lose by starting with wss. While one wants to design code intelligently, you don't want to try to micro-optimize performance-related things before you even have a documented, measured performance issue to worry about.
General reasons to use TLS:
Privacy (nobody in the middle can snoop on what you're doing)
Security of data (nobody can read your data, not even proxies)
Security of endpoint (endpoint you're connecting to can't be hijacked without you knowing about it)
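
For illustration, here is a minimal Node.js/TypeScript sketch using the popular ws package; the URL and the subscription message are placeholders, not anything from the question, since every exchange defines its own:

```typescript
import WebSocket from "ws";

// wss:// gives you TLS; the only extra cost is the one-time handshake
// when the connection is first established.
const socket = new WebSocket("wss://exchange.example.com/stream");

socket.on("open", () => {
  // Hypothetical subscription message; real exchanges define their own format.
  socket.send(JSON.stringify({ op: "subscribe", channel: "orderbook" }));
});

socket.on("message", (data) => {
  console.log("update:", data.toString());
});

socket.on("error", (err) => {
  console.error("connection error:", err.message);
});
```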

I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
Actually, there is.
This is also stated in the RFC:
At the time of writing of this specification, it should be noted that connections on ports 80 and 443 have significantly different success rates, with connections on port 443 being significantly more likely to succeed, though this may change with time.
Some network intermediaries (especially with some mobile providers) will fail on ws connections but work properly when using wss.
The reason seems to be that these intermediaries (proxies / routers) will attempt to read the WebSocket message as if it were HTTP and "fix" HTTP errors or resolve caching (which actually corrupts WebSocket data).
The encrypted wss protocol will trigger a pass-through mode, since these intermediaries won't be able to read the data or "fix" any HTTP errors.
The WebSocket protocol uses client-side frame masking for the same purpose, but sometimes with limited results. Using wss increases connectivity on some networks.

Related

Isn't the HTTP keep-alive feature against three rules of thumb: asynchrony, reactive programming and scalability?

I know that in HTTP/1.1, keep-alive is the default behavior, unless the client explicitly asks the server to close the connection by including a Connection: close header in its request, or the server decides to include a Connection: close header in its response. I am wondering if this isn't an obstacle to scalability when growing servers horizontally.
My scenario: we are developing all new services following microservices patterns, in either Java or Python. It is desirable that we can design and implement them in such a way that we can grow horizontally. For instance, I can use Docker in order to easily scale up, or use Spring Boot Cloud Config. Whatever the physical host implementation, the basic idea is to favour scalability.
My understanding: I must keep server and client as agnostic as possible, and when I set up HTTP keep-alive I understand there will be an advantage in reusing the same HTTP connection (and saving some CPU), but I guess I am forcing the client (e.g. another service) to keep using the same connection, which may reduce the advantage of running several Docker instances of the same service, since I will encourage the client to keep consuming the same initial connection.
Assuming my understanding is correct, I assume it is not a good idea, since we develop services whose responses can be reused by different consumers with different approaches: some consumers may consume asynchronously or follow reactive design paradigms, which makes me wonder about keeping the same connection alive. In practical terms: the connection used should be freed as soon as possible, in order to really balance the demand over all providers.
***edited after first comment
Let's assume I have multiple different consumer services (CS1, CS2 ... CSn) connecting to a single load balancer instance (LB), which forwards each request to one of multiple Docker instances of the same provider service (D1, D2 ... Dn). Since keep-alive is the default behaviour in HTTP/1.1+, we have keep-alive on every connection (whether between CSx and LB or between LB and Dx). As far as I know, the only advantage of keep-alive is saving the CPU cost of opening/closing connections. If I send Connection: close after each request, there is no advantage at all to using keep-alive. If instead I leave the connection alive, it means I encourage LB to stay connected to a specific Dx, using exactly the same connection for a while, right? (I choose the word "encourage" here because I guess "force" might not be the appropriate one, since keep-alive has a timeout, after which LB might route to another Dx anyway.) So at some moment I have CS1 -> LB -> D1 kept alive and persisted for a while, right?

Coming back to my original question: isn't that against the idea of the asynchronous/parallel/reactive paradigm? For instance, I have a scenario where a single consumer service calls another service a few times before returning a single answer to a page. Today we do this sequentially, but if we decide to call in parallel, then depending on the first answer there will either already be an answer for the page, or we will compose an answer for the page without caring about the order. The calling service will wait for all the answers before returning to a controller, and the order doesn't matter. Isn't it strange that I have keep-alive = true?
I am forcing the client (e.g. another service) to keep using the same connection
You are not forcing. The client can easily avoid persistent connections by sending HTTP/1.0 and/or Connection: close. There are many practical HTTP applications that work just like that.
keep using the same connection, which may reduce the advantage of running several Docker instances of the same service, since I will encourage the client to keep consuming the same initial connection
Assuming your load balancer works well, it will usually distribute connections evenly across your instances. Of course, this may not work when you only have a few connections altogether, but a few connections can hardly pose a scalability or performance problem.
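
To make that point concrete, here is a small Node.js sketch (the host and path are made up) showing that the client controls whether connections persist:

```typescript
import http from "http";

// Reuse connections: an Agent with keepAlive shared across requests
// keeps sockets open between requests.
const persistent = new http.Agent({ keepAlive: true });

http.get(
  { host: "service.example.com", path: "/health", agent: persistent },
  (res) => res.resume()
);

// Opt out per request: an explicit "Connection: close" header tells the
// server to tear the connection down after the response.
http.get(
  {
    host: "service.example.com",
    path: "/health",
    headers: { Connection: "close" },
  },
  (res) => res.resume()
);
```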

How can HTTP pipelining make performance worse?

It's a popular claim that HTTP pipelining can degrade the performance of downloading sites due to the head-of-line (HoL) blocking phenomenon. Is this performance compared to a single non-pipelined persistent HTTP connection, or to multiple TCP connections opened simultaneously in order to download the site's resources in parallel?
In the first case I can't really see how a large response blocking the sending of subsequent smaller ones can result in a performance loss. Yes, this blocking will occur. But in the case of a single non-pipelined persistent HTTP connection, HoL blocking occurs every time the client sends a request and every time the server sends a response. The only reasons I was able to think of for performance potentially being worse here are that:
1) the time needed to properly queue/buffer requests/responses may be longer than the time saved by the fact that the server can start processing the n-th request without waiting for the processing of the (n-1)-th request to complete. But this basically comes down to numbering the requests/responses correctly, so it seems to be more of a concern when many small requests have to be dealt with (it's unlikely that queueing/buffering-related computation will take more time than processing a large response, and people indicating that HoL blocking can be a problem refer to large responses, not small ones), and it is not directly related to HoL blocking;
2) if many clients had pipelining enabled, then it is possible that many large responses would have to be buffered, effectively leading to memory exhaustion on the server side. But this is a rather special situation, and clearly it is not what people have in mind when speaking about enabling pipelining in a browser making performance worse.
On the other hand, when comparing pipelining to multiple simultaneous TCP connections, it is readily seen that the necessity to send a large response before the subsequent smaller ones will slow things down.
However, if the comparison is made to a single non-pipelined HTTP connection and pipelining can indeed result in performance loss - can you demonstrate some basic (perhaps simplified) calculations showing that?
I tried to search for the answer to my question on the Internet but was unable to find it.
Some of the resources that I tried:
https://devcentral.f5.com/articles/http-pipelining-a-security-risk-without-real-performance-benefits
What are the disadvantage(s) of using HTTP pipelining?
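
Since the question asks for simplified calculations, here is a toy model with invented numbers (fixed service times, ignoring TCP slow start and request transmission time) comparing all three strategies when one large response sits at the head of the queue:

```typescript
// Toy model, all times in ms: one large response followed by two small ones.
const rtt = 50;
const large = 400;
const small = 20;

// Single persistent connection, no pipelining: every request pays a full
// round trip before the next one can be sent.
const serial = (rtt + large) + (rtt + small) + (rtt + small); // 590

// Same connection, pipelined: one round trip, but responses must come back
// in request order, so the small responses queue behind the large one (HoL).
const pipelined = rtt + large + small + small; // 490

// Two parallel connections: the large response no longer delays the small
// ones, at the cost of roughly one extra round trip for the second setup.
const parallel = Math.max(rtt + large, 2 * rtt + small + small); // 450

console.log({ serial, pipelined, parallel });
```

On these made-up figures pipelining still beats the single non-pipelined connection; the loss people describe shows up mainly against parallel connections, or when server-side buffering of out-of-order responses (the questioner's point 1 and 2) comes into play.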

What is the recommended HTTP POST content length?

I have several clients that constantly post data to a REST service. REST service is put behind a network load balancer. Each client sends 100 - 500 MB a day and I need to support 500+ clients.
I can POST very large packets; this will reduce the overhead of TCP/IP session setup and HTTP headers, but will firmly tie one client to a particular server and limit my scalability options. Alternatively, I can send small HTTP packets, which I can load balance well, but I will incur more overhead for TCP/IP session setup and HTTP headers.
What is the recommended packet size for HTTP POST? Or how can I calculate one for my environment?
There is no recommended size.
While HTTP POST size is not constrained by the RFCs, HTTP is a commodity protocol implementing request/response messaging, and most of the infrastructure is configured around the idea that TCP connections are not particularly long-lived and do not carry significant amounts of data. In other words, there will be factors outside your control which may impact the service - although HTTP supports range requests for responses, there is no corollary for requests.
You can get around a lot of these (although not all) by using HTTPS. However you still need to think about how you detect/manage outages - are you happy to wait for a TCP timeout?
With 500+ clients presumably using the system quite heavily, the congestion avoidance limits shouldn't be a problem - whether TCP window scaling is likely to be an issue depends on how the system is used. HTTP handshakes should not be an issue unless you restrict the request size to something silly.
If the service is highly dependent on clients pushing lots of data to your server, then I'd encourage you to look at parsing the data on the client (given the volume, presumably it's coming from files - implying a signed Java applet or JavaScript with the UniversalBrowserRead privilege), then sending it over a bi-directional communication channel (e.g. a websocket).
Leaving that aside for now, the only way you can find out what the route between your clients and your server will support is to measure it - and monitor it. I would expect that a 2 MB upload size would work pretty much anywhere, while a 10 MB size would work most of the time within the US or Europe - and that you could probably increase this to 50 MB as long as there are no mobile clients.
But if you want to maintain the effectiveness of the service you'll need to monitor bandwidth, packet loss and lost connections.
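
One practical middle ground implied by the answer is to cap the size of any single POST and split larger payloads into multiple requests, so the load balancer can spread them out. A rough sketch using Node 18+'s built-in fetch; the endpoint, the reassembly header and the 2 MB cap are assumptions, not anything prescribed above:

```typescript
const CHUNK_BYTES = 2 * 1024 * 1024; // 2 MB per POST, per the rough guidance above

async function uploadInChunks(payload: Buffer, url: string): Promise<void> {
  for (let offset = 0; offset < payload.length; offset += CHUNK_BYTES) {
    const chunk = payload.subarray(offset, offset + CHUNK_BYTES);
    const res = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/octet-stream",
        // Hypothetical header so the server can reassemble the chunks.
        "X-Chunk-Offset": String(offset),
      },
      body: chunk,
    });
    if (!res.ok) throw new Error(`chunk at ${offset} failed: ${res.status}`);
  }
}
```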

Discussion: Chat server via node.js: HTTP or TCP?

I was considering building a chat server using node.js/socket.io. Should I make it a TCP server or an HTTP server? I'd imagine a TCP server would be more efficient, but can you send other stuff to it, like file attachments, etc.? If TCP is more efficient, how much more so? Also, I'm wondering how many concurrent connections one node.js server can handle. Is it more work to do TCP or HTTP?
You are talking about 2 totally different approaches here - TCP is a transport layer protocol and HTTP is an application layer protocol. HTTP (usually) operates over TCP, so whichever option you choose, it will still be operating over TCP.
The efficiency question is sort of a moot point, because you are talking about different OSI layers. If you went for raw TCP sockets, your solution would probably be more efficient - in bandwidth at least - since HTTP contains a whole bunch of extra data (the headers) that would likely be irrelevant to your purposes (depending on the scale of the chat program). What you are talking about developing there is your own application layer protocol.
You can send anything you like over TCP - after all HTTP can send attachments, and that operates over TCP. FTP also operates over TCP, and that is designed purely for transferring "attachments". In order to do this, you would need to write your protocol so that it was able to tell the remote party that the following data was a file, then send the file data, then tell the remote party that the transfer is complete. Implementations of this are many and varied (the HTTP approach is completely different from the FTP approach) and your options are pretty much infinite.
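
To give the "write your own protocol" idea some shape, here is one common approach: length-prefixed frames with a one-byte type tag. This particular framing is an illustration, not a standard, and the host and port are made up:

```typescript
import net from "net";

// Frame layout: 1 byte type (0 = chat text, 1 = file data), 4 bytes
// big-endian payload length, then the payload itself.
function frame(type: number, payload: Buffer): Buffer {
  const header = Buffer.alloc(5);
  header.writeUInt8(type, 0);
  header.writeUInt32BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

const client = net.connect(9000, "chat.example.com", () => {
  client.write(frame(0, Buffer.from("hello")));         // a chat message
  client.write(frame(1, Buffer.from("...file bytes"))); // a file chunk
});
```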
I don't know for sure about the node.js connection limit, but I can say with a fair amount of confidence that it is limited by the operating system. This might help you get to grips with the answer to that question.
It is debatable whether it is more work to do it with TCP or HTTP - it's a lot of work to do it in both. I would probably lean more toward the TCP option being your best bet. While TCP would require you to design a protocol rather than/as well as an application, HTTP is not particularly suited to live, 2-way applications like chat servers. There are many implementations of chat over HTTP that use AJAX, but I can tell you from painful experience that they are a complete pain in the rear-end.
I would say that you should only be looking at HTTP if you are intending the endpoint (i.e. the client) to be a browser. If you are going to write a desktop app for the endpoint, a direct TCP link would definitely be the way to go. The main reason for this is that HTTP works in a request-response manner, where the client sends a request to the server, and the server responds. Over TCP you can open a single TCP stream, that can be used for bi-directional communication. This means that the server can push an event to the client instantly, while over HTTP you have to wait for the client to send a request, so you can respond with an event. If you were intending to use a browser as the client, it will make the whole file transfer thing much more tricky (the sending at least).
There are ways to implement this over HTTP using long-polling and server push (read this) but it can be a real pain to implement.
If you are going to implement this on a LAN (or possibly even over the internet) it is worth considering UDP instead of TCP - in a chat application it is not usually absolutely mission-critical that messages arrive in the right order, and even if it were, users would probably not be able to type faster than the variations in network latency (probably <100ms). Then for file transfers you could either negotiate a separate TCP socket for the data exchange (like FTP), or implement some kind of UDP ACK system (like TFTP).
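
A minimal sketch of the UDP idea using Node's dgram module (the port and peer address are made up); as noted above, you would need your own ACK scheme if delivery actually matters:

```typescript
import dgram from "dgram";

const sock = dgram.createSocket("udp4");

sock.on("message", (msg, rinfo) => {
  // Datagrams may arrive out of order or not at all -- acceptable for chat
  // text, but file transfer would need a TCP socket or an ACK protocol.
  console.log(`${rinfo.address}:${rinfo.port} says: ${msg.toString()}`);
});

sock.bind(9001, () => {
  sock.send(Buffer.from("hi there"), 9001, "peer.example.com");
});
```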
I feel there is a lot more to say on this subject but right now I can't put it into words - I may extend this answer at some point.
Chat servers are the Hello World program in node. Use http.
As far as the question of how many concurrent connections can it handle, that all depends on your system. Set up a simple chat server and then try benchmarking it.
Also, check out http://search.npmjs.org/ and search for chat for a few pointers.
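
For reference, the "Hello World" version really is short. A minimal socket.io broadcast server (current socket.io API; the event name is arbitrary):

```typescript
import { createServer } from "http";
import { Server } from "socket.io";

const httpServer = createServer();
const io = new Server(httpServer);

io.on("connection", (socket) => {
  // Relay every chat message to all connected clients, sender included.
  socket.on("chat", (msg: string) => {
    io.emit("chat", msg);
  });
});

httpServer.listen(3000);
```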

Are socket connections faster than http on Blackberry?

I'm writing an app for Blackberry that was originally implemented in standard J2ME. The network connection was done using Connector.open("socket://...:80/...") instead of http://
Now, I've implemented the connection using both methods, and it seems like sometimes the socket method is more responsive, and sometimes it doesn't work at all. Is there a significant difference between the two? Mostly what I'm trying to achieve is responsiveness from the connection to get a smooth progress bar.
BlackBerry's implementation of http and https provides more options for connecting to the target server than socket does, and, of course, implements all the HTTP protocol handling for you. I've not benchmarked them, but it makes a certain amount of sense that direct TCP via socket would be quicker in some cases, especially if whatever is listening on port 80 isn't an HTTP server (no protocol overhead).
I've had difficulty in the past with different network providers, some requiring deviceside=true others deviceside=false, and no real way to know until the first support call for that network came in.
Mostly what I'm trying to achieve is responsiveness from the connection to get a smooth progress bar.
Pardon my saying so, but a "smooth progress bar" is "gilding the lily" - nice to have and look at, but not critical to the application's function, reliability or robustness. Go with what is more robust and reduces code size - likely http in this case.
Since both operate over a network I don't think you can guarantee a smooth progress bar. You might have more chance of that if you remind the person to stay in one place so you have a chance of a consistent connection ;)
There is less overhead with a socket connection than an HTTP one. In fact, HTTP connections run over the socket connection. You can take advantage of the reduced overhead of the socket connection to appear more responsive, but you will likely have more work to do than you would with HTTP. The API is more low-level so coding is more complex.
One difference between a socket and an HTTP connection on the BlackBerry is that HTTP connections may be transparently routed via an HTTP proxy in the case of BES and BIS connections.
In theory sockets will be faster, but then you're responsible for managing the overhead of rolling your own protocol (depending on complexity). Though sockets are more lightweight, I've found that HTTP and all that comes with it greatly reduces the headache.
