Use Jetty or Netty? - asynchronous

We're in the process of writing a high-performance server for processing messages. We've been using Jetty for several years and like it, but Netty looks like it has some cool features. In particular, it has support for asynchronous processing so a thread doesn't have to be tied up waiting for the system to process a given message. It's designed to solve the C10k problem.
I know that Jetty has some support for NIO internally. Does it also have an asynchronous model?
The messages are likely to be in http format. Does Netty have any performance advantages over Jetty when doing plain old http?
I'd like to have all the convenient features of a real servlet container, but not at the cost of reduced performance.

Jetty has had support for asynchronous request processing since version 6 (see here), using a proprietary API. More recent versions support the asynchronous API as part of the Servlet 3.0 API, like any other compliant implementation.
Using Netty would seem like a lot of work for little gain, unless you have highly specific requirements. Otherwise, Jetty would do the job for you with minimal effort.

Related

Using RabbitMQ over HTTP

I have to connect an old but critical software to RabbitMQ. The software doesn't support AMQP, but it can do HTTP Requests.
Does RabbitMQ support plain HTTP? Or should I use a "proxy" or "app" that actively transforms the HTTP Requests to AMQP 1.0 and pushes it to the RabbitMQ server?
https://www.rabbitmq.com/management.html
The management plugin supports a simple HTTP API to send and receive messages. This is primarily intended for diagnostic purposes but can be used for low volume messaging without reliable delivery.
As mentioned, it's designed for very low loads, but it may be usable. If you need higher loads, then by all means cast around for a library that does the job and create a proxy. Most languages will have something. I've personally created a lightweight API using Lumen and https://github.com/bschmitt/laravel-amqp to tie a few disparate services together in the past, and it seems to work very well.
It is possible not but really recommended depending on load. You have three options really, two of which are web socket based and one that seems like what you're looking for. I'd suggest starting with the rabbitmq docs.

Websockets for background processing

Is it a good idea to use Websockets (comet, server push, ...) to overcome a problem with long running HTTP requests? Imagine you have an app, build on full-stack web app framework, like Django, or Rails. You want to do some background processing in the name of performance. That's easy to do from programmer perspective, but the problem arises in the UI.
Users demand immediate response. So my idea was to use Socket.IO + node.js + AMQP messaging, to push notifications back to browsers, once the background task completed. I like the idea, but it still feels like lots of engineering, just because we don't want to long running requests in our main app. Competing idea could be to use another, more robust, web server, that can handle many long running HTTP requests.
Which one you think is better?
Is it a good idea to use Websockets to
overcome a problem with long running HTTP requests?
Yes it is. You can save singificant amount of data when compared to other techniques, such as continuous or long polling. Try to look at this article, namely the Step 3 part.
I like the idea, but it still feels like lots of engineering, just
because we don't want to long running requests in our main app.
Competing idea could be to use another, more robust, web server, that
can handle many long running HTTP requests.
Socket.io abstracts transport layer and fallback solutions (in case of websockets absence) for you. If you want to use socket.io/node.js/AMPQ stack only for messaging and notifications then it shouldn't be a complex or time consuming development process, however it may depend on various stuff around.
By delegating messaging/notifications to node.js you may disburden your main app to great extent thanks to its non-blocking architecture although you will introduce dependency on another technology.
On the other hand choosing more performant web server may solve your performance concerns for some time, but you may eventually end up with scaling your system (either up or out).
WebSockets in themselves provide little here over e.g. XHR or jsonp long polling. From the user's perspective, messaging over either transport would feel the same. From the server's perspective, an open WebSocket connection or an open long poll isn't violently different.
What you're really doing, and should be doing regardless of the underlying technology, is build your application to be asynchronous - event driven.

What so different about Node.js's event-driven? Can't we do that in ASP.Net's HttpAsyncHandler?

I'm not very experienced in web programming,
and I haven't actually coded anything in Node.js yet, just curious about the event-driven approach. It does seems good.
The article explains some bad things that could happen when we use a thread-based approach to handle requests, and should opt for a event-driven approach instead.
In thread-based, the cashier/thread is stuck with us until our food/resource is ready. While in event-driven, the cashier send us somewhere out of the request queue so we don't block other requests while waiting for our food.
To scale the blocking thread-based, you need to increase the number of threads.
To me this seems like a bad excuse for not using threads/threadpools properly.
Couldn't that be properly handled using IHttpAsyncHandler?
ASP.Net receives a request, uses the ThreadPool and runs the handler (BeginProcessRequest), and then inside it we load the file/database with a callback. That Thread should then be free to handle other requests. Once the file-reading is done, the ThreadPool is called into action again and executes the remaining response.
Not so different for me, so why is that not as scalable?
One of the disadvantages of the thread-based that I do know is, using threads needs more memory. But only with these, you can enjoy the benefits of multiple cores. I doubt Node.js is not using any threads/cores at all.
So, based on just the event-driven vs thread-based (don't bring the "because it's Javascript and every browser..." argument), can someone point me out what is the actual benefit of using Node.js instead of the existing technology?
That was a long question. Thanks :)
First of all, Node.js is not multi-threaded. This is important. You have to be a very talented programmer to design programs that work perfectly in a threaded environment. Threads are just hard.
You have to be a god to maintain a threaded project where it wasn't designed properly. There are just so many problems that can be hard to avoid in very large projects.
Secondly, the whole platform was designed to be run asynchronously. Have you see any ASP.NET project where every single IO interaction was asynchronous? simply put, ASP.NET was not designed to be event-driven.
Then, there's the memory footprint due to the fact that we have one thread per open-connection and the whole scaling issue. Correct me if I'm wrong but I don't know how you would avoid creating a new thread for each connection in ASP.NET.
Another issue is that a Node.js request is idle when it's not being used or when it's waiting for IO. On the other hand, a C# thread sleeps. Now, there is a limit to the number of these threads that can sleep. In Node.js, you can easily handle 10k clients at the same time in parallel on one development machine. You try handling 10k threads in parallel on one development machine.
JavaScript itself as a language makes asynchronous coding easier. If you're still in C# 2.0, then the asynchronous syntax is a real pain. A lot of developers will simply get confused if you're defining Action<> and Function<> all over the place and using callbacks. An ASP.NET project written in an evented way is just not maintainable by an average ASP.NET developer.
As for threads and cores. Node.js is single-threaded and scales by creating multiple-node processes. If you have a 16 core then you run 16 instances of your node.js server and have a single Node.js load balancer in front of it. (Maybe a nginx load balancer if you want).
This was all written into the platform at a very low-level right from the beginning. This was not some functionality bolted on later down the line.
Other advantages
Node.js has a lot more to it then above. Above is only why Node.js' way of handling the event loop is better than doing it with asynchronous capabilities in ASP.NET.
Performance. It's fast. Real fast.
One big advantage of Node.js is its low-level API. You have a lot of control.
You have the entire HTTP server integrated directly into your code then outsourced to IIS.
You have the entire nginx vs Apache comparison.
The entire C10K challenge is handled well by node but not by IIS
AJAX and JSON communication feels natural and easy.
Real-time communication is one of the great things about Node.js. It was made for it.
Plays nicely with document-based nosql databases.
Can run a TCP server as well. Can do file-writing access, can run any unix console command on the server.
You query your database in javascript using, for example, CouchDB and map/reduce. You write your client in JavaScript. There are no context switches whilst developing on your web stack.
Rich set of community-driven open-source modules. Everything in node.js is open source.
Small footprint and almost no dependencies. You can build the node.js source yourself.
Disadvantages of Node.js
It's hard. It's young. As a skilled JavaScript developer, I face difficulty writing a website with Node.js just because of its low-level nature and the level of control I have. It feels just like C. A lot of flexibility and power either to be used for me or to hang me.
The API is not frozen. It's changing rapidly. I can imagine having to rewrite a large website completely in 5 years because of the amount Node.js will be changed by then. It is do-able, you just have to be aware that maintenance on node.js websites is not cheap.
further reading
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
http://blip.tv/file/2899135
http://nodeguide.com/
There are a lot of misconceptions regarding node.js vs. ASP.Net and asynchronous programming. You can do non blocking IO in ASP.NET. Most people don't know that the .Net framework uses Windows iocompletion ports underneath when you do web service calls or other I/O bound operations using the begin/end pattern in .Net 2.0 and above. IO completion ports is the way the Windows operating system supports non-blocking IO so that the app thread is freed why the IO operation completes. Interestingly, node.js uses a less optimal non blocking IO implementation in Windows through Cygwin. A new Windows version is on the road map, which with Microsoft's guidance will be using IO completions ports. At that point there is underneath no difference.
It is also possible to do non-blocking database calls in ADO.NET but be aware of ORM tools such as NHibernate and Entity Framework. They are still very much synchronous.
Synchronous IO (blocking) makes the control flow much clearer and it has for this reason become popular. The reason why computer environments are multithreaded has only superficially to do with this. It is more generally related to time sharing and utilization of multiple CPUs.
Having only a single thread can cause starvation during lengthy operations, which can be related to both IO and complex computations. So, even though the rule of thumb is one thread pr. core when utilizing non-blocking IO, one should still consider a sufficient thread pool size so that simple requests don't get starved by more complex operations if such exist. Multiple threads also allows complex operations to be split easily among multiple CPUs. A single threaded environment like node.js can only utilize multicore processors through more processes and message passing to coordinate action.
I have personally not yet seen any compelling argument to introduce an additional technology such a node.js. However, there may be good reasons but they have in my opinion little to do with servicing a large number of connections through non-blocking IO since this can also be done with ASP.NET.
BTW tamejs can help make your nodejs code more readable similar to the new upcoming .Net Async CTP.
It is easy to understate the cultural difference between the Node.js and ASP.NET communities. Sure, IHttpAsyncHandler exists and it's been around since .NET 1.0 so it might even be good, but all of the code and discussion around Node.js is about async I/O which is decidedly not the case when it comes to .NET. Want to use LINQ To SQL? You kind of can, kind of. Want to log stuff? Maybe "CSharp DotNet Logger" will work, maybe.
So yes, IHttpAsyncHandler is there and if you're really careful you might be able to write an event driven web-service without tripping over some blocking I/O somewhere, but I don't really get the impression a lot of people are using it (and it certainly isn't the prominent way for writing ASP.NET apps). In contrast, Node.js is all about evented I/O, all the code examples, all the libraries and it's the only way people are using it. So if you were going to bet on which one's evented I/O model actually worked all the way through, Node.js would probably be the one to pick.
As per current age technology improvements and reading below links, I can say, it is matter of expertise and choosing perfect mix as per the particular scenario that matters. NodeJS is getting mature and ASP.NET side we have ASP.NET MVC, WebAPI, and SignalR etc. to make things better.
Node.js vs .Net performance
http://www.salmanq.com/blog/net-and-node-js-performance-comparison/2013/03/
and
http://www.hanselman.com/blog/InstallingAndRunningNodejsApplicationsWithinIISOnWindowsAreYouMad.aspx
Thanks.

Jetty or Tomcat with Non-blocking IO (servlet 3.0)

I need a point to start from. I read from Yakov Fain about a performance break through with jetty and blazeds.
I realized that we already have some trouble with about 1200 concurrent users, some consumers dont get messages and cpu is under heavy fire.
Did somebody already tried this Nio with BlazeDS?
Did this work with Tomcat too?
Where to start and what do I need to improve messaging performance?
Thank you so much!!!
I would suggest before you go down the road of customizing BlazeDS to support NIO that you profile your application and verify were the hotspots are. Have you verified that it is the BlazeDS networking stack that is causing lost messages? Have you profiled your code to see if there are optimizations that can be done to better optimize message handling?
Some actually contend the Java NIO doesn't actually improve through-put - http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html
I say this because BlazeDS does not support NIO only the commercial version of the server does - LCDS. What LCDS does actually set-up it's own NIO sockets and manages requests through these connections, bypassing the standard servlet stack. To get NIO support Yakov said "To support thousands concurrent users you also need to customize networking layer of BlazeDS" I would be willing to guess this customized networking layer is not production ready and is more of a prototype because it is extremely difficult to reliably customize the networking layer of any server.

TCP Vs. Http Benchmark

I am having a Web application sitting on IIS, and talking with [remote]Service-Machine.
I am not sure whether to choose TCP or Http, as the main protocol.
more details:
i will have more than one service\endpoint
some of them will be one-way
the other will be two-ways
the web pages will work infront of the services
we are talking about hi-scale web-site
I know the difference pretty well, but I am looking for a good benchmark, that shows how much faster is the TCP?
HTTP is a layer built ontop of the TCP layer to some what standardize data transmission. So naturally using TCP sockets will be less heavy than using HTTP. If performance is the only thing you care about then plain TCP is the best solution for you.
You may want to consider HTTP because of its ease of use and simplicity which ultimately reduces development time. If you are doing something that might be directly consumed by a browser (through an AJAX call) then you should use HTTP. For a non-modern browser to directly consume TCP connections without HTTP you would have to use Flash or Silverlight and this normally happens for rich content such as video and/or audio. However, many modern browsers now (as of 2013) support API's to access network, audio, and video resources directly via JavaScript. The only thing to consider is the usage rate of modern web browsers among your users; see caniuse.com for the latest info regarding browser compatibility.
As for benchmarks, this is the only thing I found. See page 5, it has the performance graph. Note that it doesn't really compare apples to apples since it compares the TCP/Binary data option with the HTTP/XML data option. Which begs the question: what kind of data are your services outputting? binary (video, audio, files) or text (JSON, XML, HTML)?
In general performance oriented system like those in the military or financial sectors will probably use plain TCP connections. Where as general web focused companies will opt to use HTTP and use IIS or Apache to host their services.
The question you really need an answer for is "will TCP or HTTP be faster for my application". The answer is that it depends on the nature of your application, and on the way that you use TCP and/or HTTP in your application. A generic HTTP vs TCP benchmark won't answer your question, because the chances are that the benchmark won't match your application behaviour.
In theory, an optimally designed / implemented solution using TCP will be faster than one that uses HTTP. But it may also be considerably more work to implement ... depending on the details of your application.
There are other issues that might affect your choice. For example, you are less likely to run into firewall issues if you use HTTP than if you use TCP on some random port. Another is that HTTP would make it easier to implement a load balancer between the IIS server and the backend systems.
Finally, at the end of the day it is probably more important that your system is secure, reliable, maintainable and (maybe) scalable than it is fast. A sensible strategy is to implement the simple version first, but have plans in your head for how to make it faster ... if the simple solution is too slow.
You could always benchmark it.
In general, if what you want to accomplish can be easily done over HTTP (i.e. the only reason you would otherwise think about using raw TCP is for a possible performance boost) you should probably just use HTTP. Sure, you can do socket programming, but why bother? Lots of people have spent a lot of time and effort building HTTP client libraries and servers, and they have spent waaaaaay more time optimizing and testing that code than you will ever be able to possibly spend on your TCP sockets. There are simply so many possible errors that you would have to handle, edge cases, and optimizations that can be done, that it is usually easier and safer to use a library function for HTTP.
Plus, the HTTP specs define all kinds of features (and clients/servers implement, which you get to use "for free", i.e. no extra implementation work) which makes any third-party interoperability that much easier. "Here is my URL, here are the rules for what you send, here are the rules for what I return..."
I have a Self Hosted Windows native C++ server application that I use the Casablanca C++ REST SDK code in. I can use any client C#, JavaScript, C++, cURL, basically anything that can send a POST, GET, PUT, DEL message can be used to send request messages to this self hosted windows app. Also I can use a plain browser address bar to do GET related requests using various parameters. Currently I only run this system on a private intranet so it is very fast - I haven't benchmark it against just doing raw TCP, but on a private intranet I doubt there would be even a few microseconds difference? For the convenience and ease of development and ability to expand to full blown internet app it's a dream come true. It is a dedicated system with a private protocol using small JSON packets so not certain if that fits your application needs or not? Another nice thing is this Windows application native C++ code could be ported fairly easily to run on Linux/MacOS as the Casablanca REST SDK is portable to those OSes.

Resources