Sending multiple 5 MB binary files over WS vs HTTP

Does sending large files over a websocket "block" the websocket for other messages while the large files are being sent?
Does sending the files via independent HTTP requests, while the other messages continue to be sent over WS, have any distinct advantage "in keeping the WS unblocked"?
Assume 1 network card.

In the case of WebSocket over HTTP/1.1, yes, the upload of a large file (in the form of a large WebSocket message) blocks the WebSocket connection.
In the case of WebSocket over HTTP/2 (if supported by both the client and server), one HTTP/2 stream will upload the large file, and another HTTP/2 stream is used to carry WebSocket messages. In this case, the problem becomes the HTTP/2 flow control window, which may be exhausted by the large upload stream, leaving the WebSocket message stream stalled (so that messages are queued and delayed). Unfortunately, the details of this queueing/delay depend on the client and server implementations, so you have to try.
Implementations typically do a good job of interleaving streams, so the possible stalls are rarely a problem.
For WebSocket over HTTP/1.1, if you open multiple WebSocket connections, you may be able to send files and messages in parallel, using N WebSocket connections for the files, and 1 WebSocket connection for the messages, for example.
Some non-browser clients allow you to open multiple HTTP/2 connections to the same domain, so again you will have the chance to send files and messages in parallel. However, to my knowledge, browsers do not allow more than 1 HTTP/2 connection to the same domain, so the parallelism is there, but constrained by the HTTP/2 flow control window.
Not sure what you mean by "keeping the WS unblocked", but HTTP/1.1 works the same way as WebSocket as far as its use of connections is concerned.
If you are in a browser environment, browsers allow 6-8 HTTP connections to the same domain, and typically unlimited (or at least many more) WebSocket connections.
So if you want to send, say, 10 large files, 6-8 of them will be uploaded via HTTP at once, but the remaining ones will be queued, waiting for one of the HTTP connections to finish its previous upload.
Meanwhile, you can use the WebSocket connection to send messages.
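The queueing behind a per-domain connection limit can be illustrated with a small simulation. This is only a sketch: the limit of 6, the function name simulate_uploads and the sleep standing in for a transfer are assumptions for illustration, not browser internals.

```python
import asyncio

MAX_CONNECTIONS = 6  # assumed per-domain limit; real browsers differ


async def simulate_uploads(n_files: int, max_connections: int = MAX_CONNECTIONS) -> int:
    """Run n_files fake uploads and return the peak number running at once."""
    slots = asyncio.Semaphore(max_connections)  # one slot per allowed connection
    in_flight = 0
    peak = 0

    async def upload() -> None:
        nonlocal in_flight, peak
        async with slots:  # uploads beyond the limit wait here
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the actual transfer
            in_flight -= 1

    await asyncio.gather(*(upload() for _ in range(n_files)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(simulate_uploads(10)))
```

With 10 files and 6 slots, 4 uploads wait until a slot frees up, while a separate WebSocket connection would not compete for those slots.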
In case of HTTP/2, browsers only open 1 connection, so you may use HTTP/2 streams for the uploads and a WebSocket over HTTP/2 stream for the messages, but they will all share the same HTTP/2 flow control window, potentially stalling each other.
All in all, WebSocket has not been designed for large uploads.
I would not be surprised if you hit WebSocket message size limits, as servers cannot allow clients to upload messages of arbitrary size (that would blow up the server memory). The same is true for clients; browsers typically have small limits on the size of WebSocket messages that they receive, independently of whether HTTP/1.1 or HTTP/2 is used.
If you really need to upload large files, I think the optimal solution is to upload via HTTP (which allows larger sizes, for example when using multipart/form-data) and keep small messaging on WebSocket.
The use of HTTP/2 may hit the HTTP/2 flow control window limit, but you have a limit of 6-8 connections in HTTP/1.1 too, so again you have to try and see if you hit any limit, and if you do, which limit it is in which case.
Using HTTP for uploads makes it less likely that you hit WebSocket message size limits that are not known in advance and possibly different from client to client (browser to browser), and you don't want to implement your own splitting and merging of large uploads via WebSocket to respect those limits.
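To give an idea of what "your own splitting and merging" would involve, here is a minimal sketch. The framing (a 12-byte header with transfer id, chunk index and chunk count), the 64 KiB chunk size and all names are made up for illustration; they are not part of the WebSocket protocol or of any library.

```python
import struct
from typing import Dict, Iterator, Optional

# Hypothetical header: transfer id, chunk index, chunk count (network byte order).
HEADER = struct.Struct("!III")
CHUNK_SIZE = 64 * 1024  # assumed safe message size; real limits vary


def split_into_chunks(transfer_id: int, payload: bytes,
                      chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield frames small enough to send as individual binary messages."""
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division
    for index in range(total):
        body = payload[index * chunk_size:(index + 1) * chunk_size]
        yield HEADER.pack(transfer_id, index, total) + body


class Reassembler:
    """Collects frames per transfer and returns the payload once complete."""

    def __init__(self) -> None:
        self._parts: Dict[int, Dict[int, bytes]] = {}

    def feed(self, frame: bytes) -> Optional[bytes]:
        transfer_id, index, total = HEADER.unpack_from(frame)
        parts = self._parts.setdefault(transfer_id, {})
        parts[index] = frame[HEADER.size:]
        if len(parts) < total:
            return None  # still waiting for more chunks
        del self._parts[transfer_id]
        return b"".join(parts[i] for i in range(total))
```

Even this toy version has to deal with framing, ordering and buffering, and it still ignores timeouts, retries and memory caps, which is why handing uploads to HTTP is attractive.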

Related

Can I have multiple open SSE channels when using HTTP/2?

So far I only used HTTP/1.1, but recently I switched to HTTP/2. On 1.1 I ran into request number limit issues, but HTTP/2 uses one connection with multiplexing, does that mean that I can keep multiple SSE channels open with no problems, or should I still use only one with some internal message routing solution?
If you want to be safe: Use just one channel or only a few of them and multiplex internally.
Longer answer: The reason that more channels caused problems with HTTP/1.1 is that each channel required a dedicated TCP connection, and browsers limited the number of concurrent TCP connections for each tab (I think to something around 10). With HTTP/2, making concurrent HTTP requests is possible on a single connection, so opening multiple concurrent SSE streams is more likely to be possible. However, browsers (and also web servers) may still limit the number of concurrent HTTP/2 streams they support over a TCP connection. HTTP/2 even supports that by allowing each peer to communicate, in an HTTP/2 setting, the maximum number of concurrent streams it supports (SETTINGS_MAX_CONCURRENT_STREAMS). To be safe, you would need to figure out what limit your target browsers and your web server support and use a lower number of SSE streams. Unfortunately, I don't know whether any HTML or browser specification requires them all to support at least a well-specified number of concurrent requests over HTTP/2. If you keep the number of requests low, you avoid running into problems.
One other advantage of using only a few channels is that you can still support HTTP/1.1 clients well. And not only those which might be directly connected to your server, but also those which might connect through a proxy server (which means the connection browser<->proxy uses HTTP/1.1 and proxy<->webserver uses HTTP/2).
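Multiplexing internally over one SSE stream could look roughly like the following sketch, which uses the SSE "event:" field as the logical channel name. The Router class, the channel names and the JSON payloads are assumptions for illustration; in a browser, EventSource listeners registered per event name play the role of the router.

```python
import json
from typing import Callable, Dict, List


def format_sse(channel: str, payload: dict) -> str:
    """Server side: encode one message, using `event:` as the channel name."""
    return f"event: {channel}\ndata: {json.dumps(payload)}\n\n"


class Router:
    """Client side: dispatch events from a single stream to per-channel handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(channel, []).append(handler)

    def dispatch(self, raw_event: str) -> None:
        channel = "message"  # default event type when no `event:` line is present
        data_lines: List[str] = []
        for line in raw_event.splitlines():
            if line.startswith("event:"):
                channel = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if not data_lines:
            return
        message = json.loads("\n".join(data_lines))
        for handler in self._handlers.get(channel, []):
            handler(message)
```

All logical channels then share one HTTP stream, so the design works the same on HTTP/1.1 and HTTP/2.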

Do open websockets use bandwidth or other resources?

If I have a websocket connection open between a server and a desktop client, is it true that there is no data or bandwidth being used or exchanged between the two except for when I explicitly send some? And if that is true, does that mean I could essentially have thousands of open connections on a server at a time so long as data was only being transferred very infrequently?
Technically yes. However, the WebSocket protocol has ping/pong frames, and either end can send pings periodically and expect pongs in reply, killing the connection if they do not arrive.
It would be a very bad idea not to implement a "keep alive" mechanism: without one, you won't be able to tell which connections are actually connected and which were improperly closed.
http://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
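The bookkeeping side of such a keep-alive mechanism can be sketched as follows. The HeartbeatMonitor class, the connection ids and the timeout are hypothetical; sending the actual ping frames and closing the sockets is left to whatever WebSocket library is in use.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks when each connection last answered a ping."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_seen: Dict[str, float] = {}

    def register(self, conn_id: str) -> None:
        self._last_seen[conn_id] = self._clock()

    def on_pong(self, conn_id: str) -> None:
        if conn_id in self._last_seen:
            self._last_seen[conn_id] = self._clock()

    def reap(self) -> List[str]:
        """Return and forget connections that have been silent for too long."""
        now = self._clock()
        dead = [c for c, seen in self._last_seen.items()
                if now - seen > self._timeout]
        for conn_id in dead:
            del self._last_seen[conn_id]
        return dead
```

The per-connection state is a single timestamp, so the cost of thousands of mostly idle connections is dominated by the sockets themselves plus one small ping per interval.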

What is the recommended HTTP POST content length?

I have several clients that constantly post data to a REST service. REST service is put behind a network load balancer. Each client sends 100 - 500 MB a day and I need to support 500+ clients.
I can either POST very large packets, which will reduce the overhead of TCP/IP session setup and HTTP headers. This will, however, firmly tie one client to a particular server and limit my scalability options. Alternatively, I can send small HTTP packets, which I can load balance well, but I will get more overhead from TCP/IP session setup and HTTP headers.
What is the recommended packet size for HTTP POST? Or how can I calculate one for my environment?
There is no recommended size.
While HTTP POST size is not constrained by the RFCs, HTTP is a commodity protocol implementing request/response messaging, and most of the infrastructure is configured around the idea that TCP connections are not particularly long-lasting and do not carry significant amounts of data. That is, there will be factors outside your control which may impact the service. Although HTTP supports range requests for responses, there is no equivalent for requests.
You can get around a lot of these (although not all) by using HTTPS. However you still need to think about how you detect/manage outages - are you happy to wait for a TCP timeout?
With 500+ clients presumably using the system quite heavily, the congestion avoidance limits shouldn't be a problem - whether TCP window scaling is likely to be an issue depends on how the system is used. HTTP handshakes should not be an issue unless you restrict the request size to something silly.
If the service is highly dependent on clients pushing lots of data to your server, then I'd encourage you to look at parsing the data on the client (given the volume, presumably it's coming from files, implying a signed Java applet or JavaScript with UniversalBrowserRead privilege) and then sending it over a bi-directional communication channel (e.g. websocket).
Leaving that aside for now, the only way you can find out what the route between your clients and your server will support is to measure it, and monitor it. I would expect that a 2 MB upload size would work pretty much anywhere, while a 10 MB size would work most of the time within the US or Europe, and that you could probably increase this to 50 MB as long as there are no mobile clients.
But if you want to maintain the effectiveness of the service you'll need to monitor bandwidth, packet loss and lost connections.
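For a first estimate before measuring, the trade-off between request size and overhead is simple arithmetic. In the sketch below, the 800 bytes of per-request header overhead is an assumed figure, the function name post_plan is made up, and TCP/TLS setup costs are ignored (they mostly disappear with persistent connections).

```python
def post_plan(daily_bytes: int, chunk_bytes: int, clients: int,
              overhead_bytes: int = 800) -> dict:
    """Estimate request counts and header overhead for a given POST size."""
    requests_per_client = -(-daily_bytes // chunk_bytes)  # ceiling division
    total_requests = requests_per_client * clients
    return {
        "requests_per_client": requests_per_client,
        # Average rate over a full day (86,400 seconds), across all clients.
        "requests_per_second": total_requests / 86_400,
        # Share of the bytes on the wire that is headers rather than payload.
        "overhead_fraction": overhead_bytes / (chunk_bytes + overhead_bytes),
    }


if __name__ == "__main__":
    # Worst case from the question: 500 MB per client per day, 500 clients.
    print(post_plan(500_000_000, 2_000_000, 500))
```

Under these assumptions, 2 MB requests mean 250 requests per client per day, fewer than two requests per second in total on average, and header overhead of roughly 0.04%, so header cost alone is unlikely to be the deciding factor.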

Would you see a significant speedup using a single websocket connection for all requests on a website?

Imagine I'm building an ordinary old website. Not a game, not a chat program, an ordinary website. Let's say it's a stack overflow clone.
The client side would simply make service calls to the server side. The server is essentially a dumb data store and never sends down HTML. The client handles all templating via javascript.
If I established a single websocket connection and did all requests through that, would I see a significant speedup over doing ajax requests?
The obvious advantage to using a single connection is that it only has to be established once. But how much time does that actually save? I know establishing a TCP connection can be costly, but in the grand scheme of things, does it matter?
I would not recommend websockets for webpages. HTTP 1.1 can reuse a TCP-connection for multiple requests, it's only HTTP 1.0 that had to use a new TCP connection for each request.
SPDY is probably a protocol that does what you are looking for. See SPDY: An experimental protocol for a faster web, but it's only supported by Chrome.
If you use websockets, the requests will not be cached.
One HTTP connection can only be used for one HTTP request at a time. Say that a page requested a 100 KB document: nothing else will be sent from the client to the server until that 100 KB document has been transferred. This is called head-of-line blocking. The client can establish an additional connection with the server, but there is also a limit on the number of concurrent connections with the same server.
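The effect of head-of-line blocking can be estimated with a toy model that compares sending transfers one after another with interleaving them in fixed-size frames. The round-robin scheduling, the 16 KiB frame size and the function names are assumptions for illustration; real multiplexing protocols schedule frames in their own ways.

```python
from typing import List


def sequential_finish_times(sizes: List[int], bandwidth: float) -> List[float]:
    """Each transfer must complete before the next one starts."""
    elapsed = 0.0
    finished = []
    for size in sizes:
        elapsed += size / bandwidth
        finished.append(elapsed)
    return finished


def interleaved_finish_times(sizes: List[int], bandwidth: float,
                             frame: int = 16_384) -> List[float]:
    """Transfers take turns sending one frame each (round robin)."""
    remaining = list(sizes)
    finished = [0.0] * len(sizes)
    elapsed = 0.0
    while any(r > 0 for r in remaining):
        for i, left in enumerate(remaining):
            if left > 0:
                sent = min(frame, left)
                elapsed += sent / bandwidth
                remaining[i] -= sent
                if remaining[i] == 0:
                    finished[i] = elapsed
    return finished
```

For a 100,000-byte document followed by a 1,000-byte request on a 100,000 bytes/s link, the small request finishes after about 1.01 s in the sequential case and after about 0.17 s when interleaved; the total time is the same in both cases.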
One of the primary reasons for developing SPDY and later HTTP/2 was solving this exact problem. However, support for SPDY and HTTP/2 is not yet as widespread as for WebSocket. WebSocket can get you there earlier because it gives you a full-duplex connection over which several logical streams can be multiplexed.
Once HTTP/2 is better supported it will be the preferred solution for this problem, but WebSocket will still be better for real-time web applications, where the server needs to push data to the client.
Have a look at the N2O framework; it was created to address the problems I described above. In N2O, WebSocket is used to send all assets associated with a page.
How much speed you could gain from using WebSocket instead of standard HTTP requests depends largely on your specific website: how often it requests data from the server, how big a typical response is, etc.

What's the behavioral difference between HTTP Keep-Alive and Websockets?

I've been working with websockets lately in detail. Created my own server and there's a public demo. I don't have such detailed experience or knowledge re: http. (Although since websocket requests are upgraded http requests, I have some.)
On my end, the server reports details of each hit. Among them are a bunch of http keep-alive requests. My server doesn't handle them because they're not websocket requests. But it got my curiosity up.
The whole big thing about websockets is that the connection stays alive. Then you can pass messages in both directions (simultaneously even). I've read that the Keep-Alive HTTP connection is a relatively new development (I don't know how many years in people time, just that it's only included in the latest standard - 1.1 - is that actually old now?)
I guess I can assume that there's a behavioral difference between the two or there would have been no reason for a websocket standard? What's the difference?
The Keep-Alive HTTP header has existed since HTTP 1.0 and is used to indicate that an HTTP client would like to maintain a persistent connection with the HTTP server. The main objective is to eliminate the need to open a TCP connection for each HTTP request. However, while there is a persistent connection open, the communication between client and server still follows the basic HTTP request/response pattern. In other words, the server side can't push data to the client.
WebSocket is a completely different mechanism, which is used to set up a persistent, full-duplex connection. With this full-duplex connection, the server side can push data to the client, and the client should expect to process data from the server side at any time.
Quoting corresponding entries on Wikipedia for reference:
1) http://en.wikipedia.org/wiki/HTTP_persistent_connection
2) http://en.wikipedia.org/wiki/WebSocket
You should read up on COMET, a design pattern which shows the limits of HTTP Keep-Alive. Keep-Alive is over 12 years old now, so it's not a new feature of HTTP. The problem is that it's not sufficient; the client and server cannot communicate in a truly asynchronous manner. The client must always use a "hanging" request in order to get a message back from the server; the server may not just send a message to the client at any time it wants.
HTTP vs Websockets
REST (HTTP)
Resources benefit from caching when the representation of a resource changes rarely or multiple clients are expected to retrieve the resource.
HTTP methods have well-known idempotency and safety properties. A request is “idempotent” if issuing it multiple times has the same effect as issuing it once.
The HTTP design allows for responses to describe errors with the request, with the resource, or to provide nuanced status information to differentiate between success scenarios.
Have request and response functionality.
HTTP v1.1 may allow multiple requests to reuse a single connection, but there will generally be small timeout periods intended to control resource consumption.
You might be using HTTP incorrectly if…
Your design relies on a client polling the service often, without the user taking action.
Your design requires frequent service calls to send small messages.
The client needs to quickly react to a change to a resource, and it cannot predict when the change will occur.
The resulting design is cost-prohibitive. Ask yourself: Is a WebSocket solution substantially less effort to design, implement, test, and operate?
WebSockets
WebSocket design does not allow explicit or transparent proxies to cache messages, which can degrade client performance.
The WebSocket protocol offers support only for error scenarios affecting the establishment of the connection. Once the connection is established and messages are exchanged, any additional error scenarios must be addressed in the messaging layer design. This makes the protocol well suited to “fire and forget” messaging scenarios and poorly suited to transactional requirements.
WebSockets allow for higher efficiency than REST because they do not require the HTTP request/response overhead for each message sent and received.
When a client needs to react quickly to a change (especially one it cannot predict), a WebSocket may be best.
WebSockets were designed specifically for long-lived connection scenarios; they avoid the overhead of establishing connections and sending HTTP request/response headers, resulting in a significant performance boost.
You might be using WebSockets incorrectly if…
The connection is used only for a very small number of events, or a very small amount of time, and the client does not need to quickly react to the events.
Your feature requires multiple WebSockets to be open to the same service at once.
Your feature opens a WebSocket, sends messages, then closes it—then repeats the process later.
You’re re-implementing a request/response pattern within the messaging layer.
The resulting design is cost-prohibitive. Ask yourself: Is an HTTP solution substantially less effort to design, implement, test, and operate?
Ref: https://blogs.windows.com/buildingapps/2016/03/14/when-to-use-a-http-call-instead-of-a-websocket-or-http-2-0/

Resources