I am currently developping a massively multi-threaded application that heavily relies on gRPC (only one service)
As I am using a single Channel object shared between threads, the number of stubs/clients I should use is not clear to me.
How many stubs should I instantiate in this case (1 or n)?
Thanks for your help
It doesn't really matter. Channels are the expensive object, whereas stubs/clients are lighter weight. Each stub/client will be an allocation, but otherwise don't really have much overhead.
In Java, you are free to share stubs as they are thread-safe.
Related
I am currently developping a massively multi-threaded application that heavily relies on gRPC (only one service)
As I am using a single Channel object shared between threads, the number of stubs/clients I should use is not clear to me.
How many stubs should I instantiate in this case (1 or n)?
Thanks for your help
It doesn't really matter. Channels are the expensive object, whereas stubs/clients are lighter weight. Each stub/client will be an allocation, but otherwise don't really have much overhead.
In Java, you are free to share stubs as they are thread-safe.
We've looking at moving splitting up our architecture (and adding new components) using a Service Oriented Architecture (SOA). There will be a number of external API's that will be used by third parties, which we will make using a REST HTTP interface, however I was wondering what would be best to use internally as all components are with in our control and will be on the same network, however potentially different technologies (mainly .net and ruby on rails).
Would there be big performance/functionality gains in using a messaging system (redis, rabbitmq, EMS, other notable exceptions I've not heard of...) instead of HTTP (REST, SOAP, etc).
I've struggled to find good information on this topic and (as you can probably tell) I'm fairly new to this side area, so any advice or good resources would be appreciated!
Thnaks
Messaging tends to give you a more loosely coupled architecture. It can potentially be more robust as well, since individual components can fail without killing the entire infrastructure.
The downside is complexity, the paradigm shift to an asynchronous model, and possibly performance (especially if you're persisting messages every where).
You also need to ensure that your messaging system is particularly robust. A single aspect of your logic can go down and restart without affecting everything, but if you lose your core message base, then ALL of your logic is down waiting for the messaging to be back up.
Fortunately, the message bus can be long running without humans fiddling and touching it, the largest source of errors and instability in any system.
In addition to what #Will Hartung mentioned, I would also say that it depends on what you are going to do with your system. If you have mostly client-server type applications, where you have few servers/services and they tend to be completely independent, then it will probably be easier to implement service contracts via REST over HTTP.
If, on the other hand, your entire system is doing bi-directional communication, or if there are many inter-process calls (and particularly if every participant in the system is going to be both a client and a server at some point), then messaging is your best bet. Of the messaging options, I find that AMQP/RabbitMQ is the most feature-rich and easy to use of all of these. It offers you a true asynchronous platform to code against.
They key benefit to using messaging is that you can have queues for each type of message, so as your system expands and changes, the queues/messages can be the same, but the service that handles them can change underneath. It promotes separation of layers.
Finally, and this is a huge thing in my opinion, the proper use of messaging promotes small, independent pieces of code. These are both more testable and more maintainable, and in general it simplifies your enterprise architecture. If you attempt to handle too many services from HTTP endpoints, you will eventually (over the course of a year or two) end up with either (1) way too many endpoints to keep track of or (2) an unmaintainable mess of spaghetti code.
My company started out with using a message-based framework, and it has worked very well for us. The RabbitMQ server has easily been the most reliable component. Feel free to ask if you have any more questions about messaging or SOA.
I'm not very experienced in web programming,
and I haven't actually coded anything in Node.js yet, just curious about the event-driven approach. It does seems good.
The article explains some bad things that could happen when we use a thread-based approach to handle requests, and should opt for a event-driven approach instead.
In thread-based, the cashier/thread is stuck with us until our food/resource is ready. While in event-driven, the cashier send us somewhere out of the request queue so we don't block other requests while waiting for our food.
To scale the blocking thread-based, you need to increase the number of threads.
To me this seems like a bad excuse for not using threads/threadpools properly.
Couldn't that be properly handled using IHttpAsyncHandler?
ASP.Net receives a request, uses the ThreadPool and runs the handler (BeginProcessRequest), and then inside it we load the file/database with a callback. That Thread should then be free to handle other requests. Once the file-reading is done, the ThreadPool is called into action again and executes the remaining response.
Not so different for me, so why is that not as scalable?
One of the disadvantages of the thread-based that I do know is, using threads needs more memory. But only with these, you can enjoy the benefits of multiple cores. I doubt Node.js is not using any threads/cores at all.
So, based on just the event-driven vs thread-based (don't bring the "because it's Javascript and every browser..." argument), can someone point me out what is the actual benefit of using Node.js instead of the existing technology?
That was a long question. Thanks :)
First of all, Node.js is not multi-threaded. This is important. You have to be a very talented programmer to design programs that work perfectly in a threaded environment. Threads are just hard.
You have to be a god to maintain a threaded project where it wasn't designed properly. There are just so many problems that can be hard to avoid in very large projects.
Secondly, the whole platform was designed to be run asynchronously. Have you see any ASP.NET project where every single IO interaction was asynchronous? simply put, ASP.NET was not designed to be event-driven.
Then, there's the memory footprint due to the fact that we have one thread per open-connection and the whole scaling issue. Correct me if I'm wrong but I don't know how you would avoid creating a new thread for each connection in ASP.NET.
Another issue is that a Node.js request is idle when it's not being used or when it's waiting for IO. On the other hand, a C# thread sleeps. Now, there is a limit to the number of these threads that can sleep. In Node.js, you can easily handle 10k clients at the same time in parallel on one development machine. You try handling 10k threads in parallel on one development machine.
JavaScript itself as a language makes asynchronous coding easier. If you're still in C# 2.0, then the asynchronous syntax is a real pain. A lot of developers will simply get confused if you're defining Action<> and Function<> all over the place and using callbacks. An ASP.NET project written in an evented way is just not maintainable by an average ASP.NET developer.
As for threads and cores. Node.js is single-threaded and scales by creating multiple-node processes. If you have a 16 core then you run 16 instances of your node.js server and have a single Node.js load balancer in front of it. (Maybe a nginx load balancer if you want).
This was all written into the platform at a very low-level right from the beginning. This was not some functionality bolted on later down the line.
Other advantages
Node.js has a lot more to it then above. Above is only why Node.js' way of handling the event loop is better than doing it with asynchronous capabilities in ASP.NET.
Performance. It's fast. Real fast.
One big advantage of Node.js is its low-level API. You have a lot of control.
You have the entire HTTP server integrated directly into your code then outsourced to IIS.
You have the entire nginx vs Apache comparison.
The entire C10K challenge is handled well by node but not by IIS
AJAX and JSON communication feels natural and easy.
Real-time communication is one of the great things about Node.js. It was made for it.
Plays nicely with document-based nosql databases.
Can run a TCP server as well. Can do file-writing access, can run any unix console command on the server.
You query your database in javascript using, for example, CouchDB and map/reduce. You write your client in JavaScript. There are no context switches whilst developing on your web stack.
Rich set of community-driven open-source modules. Everything in node.js is open source.
Small footprint and almost no dependencies. You can build the node.js source yourself.
Disadvantages of Node.js
It's hard. It's young. As a skilled JavaScript developer, I face difficulty writing a website with Node.js just because of its low-level nature and the level of control I have. It feels just like C. A lot of flexibility and power either to be used for me or to hang me.
The API is not frozen. It's changing rapidly. I can imagine having to rewrite a large website completely in 5 years because of the amount Node.js will be changed by then. It is do-able, you just have to be aware that maintenance on node.js websites is not cheap.
further reading
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
http://blip.tv/file/2899135
http://nodeguide.com/
There are a lot of misconceptions regarding node.js vs. ASP.Net and asynchronous programming. You can do non blocking IO in ASP.NET. Most people don't know that the .Net framework uses Windows iocompletion ports underneath when you do web service calls or other I/O bound operations using the begin/end pattern in .Net 2.0 and above. IO completion ports is the way the Windows operating system supports non-blocking IO so that the app thread is freed why the IO operation completes. Interestingly, node.js uses a less optimal non blocking IO implementation in Windows through Cygwin. A new Windows version is on the road map, which with Microsoft's guidance will be using IO completions ports. At that point there is underneath no difference.
It is also possible to do non-blocking database calls in ADO.NET but be aware of ORM tools such as NHibernate and Entity Framework. They are still very much synchronous.
Synchronous IO (blocking) makes the control flow much clearer and it has for this reason become popular. The reason why computer environments are multithreaded has only superficially to do with this. It is more generally related to time sharing and utilization of multiple CPUs.
Having only a single thread can cause starvation during lengthy operations, which can be related to both IO and complex computations. So, even though the rule of thumb is one thread pr. core when utilizing non-blocking IO, one should still consider a sufficient thread pool size so that simple requests don't get starved by more complex operations if such exist. Multiple threads also allows complex operations to be split easily among multiple CPUs. A single threaded environment like node.js can only utilize multicore processors through more processes and message passing to coordinate action.
I have personally not yet seen any compelling argument to introduce an additional technology such a node.js. However, there may be good reasons but they have in my opinion little to do with servicing a large number of connections through non-blocking IO since this can also be done with ASP.NET.
BTW tamejs can help make your nodejs code more readable similar to the new upcoming .Net Async CTP.
It is easy to understate the cultural difference between the Node.js and ASP.NET communities. Sure, IHttpAsyncHandler exists and it's been around since .NET 1.0 so it might even be good, but all of the code and discussion around Node.js is about async I/O which is decidedly not the case when it comes to .NET. Want to use LINQ To SQL? You kind of can, kind of. Want to log stuff? Maybe "CSharp DotNet Logger" will work, maybe.
So yes, IHttpAsyncHandler is there and if you're really careful you might be able to write an event driven web-service without tripping over some blocking I/O somewhere, but I don't really get the impression a lot of people are using it (and it certainly isn't the prominent way for writing ASP.NET apps). In contrast, Node.js is all about evented I/O, all the code examples, all the libraries and it's the only way people are using it. So if you were going to bet on which one's evented I/O model actually worked all the way through, Node.js would probably be the one to pick.
As per current age technology improvements and reading below links, I can say, it is matter of expertise and choosing perfect mix as per the particular scenario that matters. NodeJS is getting mature and ASP.NET side we have ASP.NET MVC, WebAPI, and SignalR etc. to make things better.
Node.js vs .Net performance
http://www.salmanq.com/blog/net-and-node-js-performance-comparison/2013/03/
and
http://www.hanselman.com/blog/InstallingAndRunningNodejsApplicationsWithinIISOnWindowsAreYouMad.aspx
Thanks.
I was wondering if OCaml will perform well in terms of performance and ease of implementation while dealing with typical client/server interactions over TCP in a multi threaded environment.. I mean something really typical like having a thread per client that receives data, operated changes on game states and send them back to clients.
This because I need to write a server for a game and I always did these things in C but since now I know OCaml I was curious to know if it would be ok or I'll just find myself trying to solve a typical problem in a language that doesn't fit well that.
Performance: probably not. OCaml's threads do not provide parallel execution, they are only a way to structure your program. The OCaml runtime itself is not thread-safe, so the only code that could possibly execute in parallel of a single OCaml thread would be interfaced C code (without callbacks to OCaml!).
Implementation-wise, there is a mutex on the run-time, which is released when calling blocking C primitives, and could also be released when calling C functions that do significant work.
Ease of implementation: it wouldn't be world-changing. You would have the comfort of OCaml and a pthread-like library on the side. If you are looking for new things to discover while leveraging what you have learnt of OCaml, I recommend Jocaml. It goes in and out of sync with OCaml, but there was a (re-)re-implementation quite recently, and even when it is slightly out of sync, it is a lot of fun, and a completely new perspective of concurrent programs.
Jocaml is implemented on top of OCaml. What with the run-time not being concurrent and all, I am almost sure it uses separate processes and message-passing. But for the application that you mentioned it should be able to do fine.
OCaml is quite suitable for writing network servers, although as Pascal observes, there are limitations on threading.
Fortunately, however, threading isn't the only way to organize such a program. The Lwt library (for Light Weight Threads) provides an abstraction of asynchronous I/O that is quite easy to use (particularly when combined with a bit of syntax support). Everything actually runs in one thread, but it's all driven by an asynchronous I/O loop (built on the Unix select call), and the programming style lets you write code that looks like direct code (avoiding much of the normal code overhead of doing asynchronous I/O in many other languages). For example:
lwt my_message = read_message socket in
let repsonse = compute_response my_message in
send_response socket response
Both the read and the write happen back in the main event loop, but you avoid the normal "read, calling this function when you're done" manual overhead.
I'm so sorry this question has been sitting here for eight years with what I consider to be several quite bad answers because they all ignore the elephant in the room.
You say "really typical like having a thread per client" but having an OS thread per client is an extremely bad design. Threads are heavyweight, taking a long time to create and destroy and consuming ~1MB just for the thread stack. If you have one thread per connection then 1,000 simultaneous client connections (which is entirely feasible) will burn 1GB of RAM just for their stacks and the performance of your program (in any language) will be cripppled by the amount of context switching required to get any work done. You don't want to use that design in any language including both C and OCaml. Note that this problem is especially bad in the context of tracing garbage collected languages because the GC also traverses all of those thread stack in order to collate global roots before every GC cycle. I am the first to admit that this anti-pattern is ubiquitous in the real world but please don't copy it! I have seen "low latency" servers in the finance industry written in C++ using one thread per connection and they suffered latency stalls of up to six seconds just from the (Windows) OS servicing those threads.
See: http://people.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html
Let's consider an efficient design instead, like an epoll or kqueue interface to the OS kernel giving the server's code information about incoming and outgoing data buffers. Single threaded servers can attain excellent performance with this design. However, a typical server has serialization work to do per client and some core work that is often performed in serial across all client connections. Therefore, serialization and deserialization can be parallelized but the core server operation cannot. In this context, OCaml is great for everything except the serialization layer because it has poor support for parallelism.
I have personally implemented many servers for various industries with hugely varying performance requirements. In my experience, OCaml is one of the best tools for this because it offers excellent libraries (easy to use and reliable) and excellent serial performance. The only issue I have is around parallelizing the serialization layer but, in practice, I have found that OCaml runs circles around alternatives like Java and .NET even though they can parallelize this. I found typical latencies were ~100us for .NET and 10us for OCaml.
See also: http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-racket/
OCaml will work great for networking applications as long as you can live with a relatively small number of threads active at one time—say no more than 100. You could consider MLdonkey as an example, although in the client space, not in the server space.
Haskell would be a better choice if you want to use many preemptive threads. GHC can support huge numbers of threads and they run in parallel on multicore systems. OCaml prefers cooperative multithreading and multiple processes.
I have been researching asynchronous messaging, and I like the way it elegantly deals with some problems within certain domains and how it makes domain concepts more explicit. But is it a viable pattern for general domain-driven development (at least in the service/application/controller layer), or is the design overhead such that it should be restricted to SOA-based scenarios, like remote services and distributed processing?
Great question :). The main problem with asynchronous messaging is that when folks use procedural or object oriented languages, working in an asynchronous or event based manner is often quite tricky and complex and hard for the programmer to read & understand. Business logic is often way simpler if its built in a kinda synchronous manner - invoking methods and getting results immediately etc :).
My rule of thumb is generally to try use simpler synchronous programming models at the micro level for business logic; then use asynchrony and SEDA at the macro level.
For example submitting a purchase order might just write a message to a message queue; but the processing of the purchase order might require 10 different steps all being asynchronous and parallel in a high performance distributed system with many concurrent processes & threads processing individual steps in parallel. So the macro level wiring is based on a SEDA kind of approach - but at the micro level the code for the individual 10 steps could be written mostly in a synchronous programming style.
Like so many architecture and design questions, the answer is "it depends".
In my experience, the strength of asynchronous messaging has been in the loose coupling it brings to a design. The coupling can be in:
Time - Requests can be handled asynchronously, helping overall scalability.
Space - As you point out, allowing for distributed processing in a more robust way than many synchronous designs.
Technology - Messages and queues are one way to bridge technology differences.
Remember that messages and queues are an abstraction that can have a variety of implementations. You don't necessarily need to use a JMS-compliant, transactional, high-performance messaging framework. Implemented correctly, a table in a relational database can act as a queue with the rows as messages. I've seen both approaches used to great effect.
I agree with #BradS too BTW
BTW here's a way of hiding the middleware from your business logic while still getting the benefits of loose coupling & SEDA - while being able to easily switch between a variety of different middleware technology - from in memory SEDA to JMS to AMQP to JavaSpaces to database, files or FTP etc