I have read some lib as libev, all of them use the non-blocking io to handle the network communication. However, in which case the blocking io is used in networking?
For simple programs (e.g. test utility or dedicated client) or when dedicated threads are used.
In the first case there is no point in the extra logic involved with non-blocking I/O, while in the second case the logic is replaced with the logic involving multiple threads where the thread using blocking I/O essentially simulated a dedicated client (or server), this is done at the cost of additional resources for threads and synchronisation, but is often justified, especially when multiple threads are necessary anyway or where threads are cheap in terms of resources.
Non blocking I/O is often used in libraries and other cases where using dedicated threads cannot be justified, for resource, testability or portability concerns. It often boils down to individual taste. The difference often being a matter of style.
Related
The latest version of MPI includes these types of point to point (p-p) communication:
blocking
ordinary non-blocking
persistent non-blocking
partitioned non-blocking
As far as I know, historically blocking p-p communication was the first type that existed. Then different types of non-blocking p-p communication were introduced one after the other to increase performance. For example, they allow overlap of computation and communication. But are there cases where blocking p-p communication is actually faster than the non-blocking alternatives? If no, what does justify their existence? Simply backward compatibility and their simplicity of use?
It is a misconception that non-blocking communication was motivated by performance: it was mostly to be able to be able to express deadlock/serialization-free communication patterns. (Indeed, actually getting overlap of communication/computation was only possible through the "Iprobe trick". Only recently with "progress threads" has it become a more systematic possibility.)
Partitioned communication is intended for multi-threaded contexts. You may call that a performance argument, or a completely new use case.
Persistent sends have indeed the potential for performance improvement, since various setups and buffer allocations can be amortized. However, I don't see much evidence that MPI implementations actually do this. https://arxiv.org/abs/1809.10778
Finally, you're missing buffered, synchronized, and ready sends. These indeed have the potential to improve performance, though again I don't see much evidence that they do.
cool project.
one Question. Does it make sense to use it within php-fpm instance that takes request over nginx's fastcgi.
best.
That depends highly on what you're doing. If you can leverage it for concurrent I/O within the same request, you probably will be able to see improvements with using Amp or any other event loop + concurrent non-blocking I/O.
In case you're not aware, there's also a standalone application server called Aerys. If every I/O operation you do is non-blocking, you might benefit even more. Aerys runs by default with the number of workers corresponding to the number of CPU cores available. You won't have many worker threads with rather expensive context switches and no overhead from multiple worker processes.
I understood that .net know to use multiple threads for multiple requests.
So, if probably our service wont get more request than the number of threads our server can produce (it look like huge number), the only reason I can see to use async is on single request that do multiple blocking operations which can done in parallel.
Am I right?
Another advantage may be that serve multiple requests with same thread is cheaper than use multiple threads. How significant is this difference?
(note: no UI exists in our service (I saw that there is single thread for this, but it isn't relevant))
thanks!
Am I right?
No, doing multiple independent blocking operations, is the job of Concurrent APIs anyway (though sometimes they need Synchronization (like lock, mutex) to maintain the object state and avoid Race condition), but the usage of Async-Await is to schedule the IO Operations, like File Read / Write, call a remote service or Database Read / Write, which doesn't need a thread, as they are queued on a queue in hardware called IO Completion ports.
Benefits of Async-Await:
Doesn't start a IO operation on a separate Thread, since Thread is a costly resource, in terms memory and resource allocation and would do little precious than wait for IO call to come back. Separate thread shall be used for the compute bound operations, no IO bound.
Free up the UI / caller thread to make it completely responsive to carry out other tasks / operations
This is the evolution of Asynchronous programming model (BeginXX, EndXX), which was fairly complex to understand and implement
Another advantage may be that serve multiple requests with same thread is cheaper than use multiple threads. How significant is this difference?
Its a good strategy depending on the kind of request from caller, if they are compute bound better invoke a Parallel API and finish them fast, IO bound there's Async-Await, only issue with multiple threads is Resource allocation and Context switching, which needs to be factored in, but on other end it efficiently utilize the processor cores, which are fairly under utilized in the current day systems, as you would see most of the time processor is lying idle
I was wondering if OCaml will perform well in terms of performance and ease of implementation while dealing with typical client/server interactions over TCP in a multi threaded environment.. I mean something really typical like having a thread per client that receives data, operated changes on game states and send them back to clients.
This because I need to write a server for a game and I always did these things in C but since now I know OCaml I was curious to know if it would be ok or I'll just find myself trying to solve a typical problem in a language that doesn't fit well that.
Performance: probably not. OCaml's threads do not provide parallel execution, they are only a way to structure your program. The OCaml runtime itself is not thread-safe, so the only code that could possibly execute in parallel of a single OCaml thread would be interfaced C code (without callbacks to OCaml!).
Implementation-wise, there is a mutex on the run-time, which is released when calling blocking C primitives, and could also be released when calling C functions that do significant work.
Ease of implementation: it wouldn't be world-changing. You would have the comfort of OCaml and a pthread-like library on the side. If you are looking for new things to discover while leveraging what you have learnt of OCaml, I recommend Jocaml. It goes in and out of sync with OCaml, but there was a (re-)re-implementation quite recently, and even when it is slightly out of sync, it is a lot of fun, and a completely new perspective of concurrent programs.
Jocaml is implemented on top of OCaml. What with the run-time not being concurrent and all, I am almost sure it uses separate processes and message-passing. But for the application that you mentioned it should be able to do fine.
OCaml is quite suitable for writing network servers, although as Pascal observes, there are limitations on threading.
Fortunately, however, threading isn't the only way to organize such a program. The Lwt library (for Light Weight Threads) provides an abstraction of asynchronous I/O that is quite easy to use (particularly when combined with a bit of syntax support). Everything actually runs in one thread, but it's all driven by an asynchronous I/O loop (built on the Unix select call), and the programming style lets you write code that looks like direct code (avoiding much of the normal code overhead of doing asynchronous I/O in many other languages). For example:
lwt my_message = read_message socket in
let repsonse = compute_response my_message in
send_response socket response
Both the read and the write happen back in the main event loop, but you avoid the normal "read, calling this function when you're done" manual overhead.
I'm so sorry this question has been sitting here for eight years with what I consider to be several quite bad answers because they all ignore the elephant in the room.
You say "really typical like having a thread per client" but having an OS thread per client is an extremely bad design. Threads are heavyweight, taking a long time to create and destroy and consuming ~1MB just for the thread stack. If you have one thread per connection then 1,000 simultaneous client connections (which is entirely feasible) will burn 1GB of RAM just for their stacks and the performance of your program (in any language) will be cripppled by the amount of context switching required to get any work done. You don't want to use that design in any language including both C and OCaml. Note that this problem is especially bad in the context of tracing garbage collected languages because the GC also traverses all of those thread stack in order to collate global roots before every GC cycle. I am the first to admit that this anti-pattern is ubiquitous in the real world but please don't copy it! I have seen "low latency" servers in the finance industry written in C++ using one thread per connection and they suffered latency stalls of up to six seconds just from the (Windows) OS servicing those threads.
See: http://people.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html
Let's consider an efficient design instead, like an epoll or kqueue interface to the OS kernel giving the server's code information about incoming and outgoing data buffers. Single threaded servers can attain excellent performance with this design. However, a typical server has serialization work to do per client and some core work that is often performed in serial across all client connections. Therefore, serialization and deserialization can be parallelized but the core server operation cannot. In this context, OCaml is great for everything except the serialization layer because it has poor support for parallelism.
I have personally implemented many servers for various industries with hugely varying performance requirements. In my experience, OCaml is one of the best tools for this because it offers excellent libraries (easy to use and reliable) and excellent serial performance. The only issue I have is around parallelizing the serialization layer but, in practice, I have found that OCaml runs circles around alternatives like Java and .NET even though they can parallelize this. I found typical latencies were ~100us for .NET and 10us for OCaml.
See also: http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-racket/
OCaml will work great for networking applications as long as you can live with a relatively small number of threads active at one time—say no more than 100. You could consider MLdonkey as an example, although in the client space, not in the server space.
Haskell would be a better choice if you want to use many preemptive threads. GHC can support huge numbers of threads and they run in parallel on multicore systems. OCaml prefers cooperative multithreading and multiple processes.
I have a quick and dirty proof of concept app that I wrote in C# that reads high data rate multicast UDP packets from the network. For various reasons the full implementation will be written in C++ and I am considering using boost asio. The C# version used a thread to receive the data using blocking reads. I had some problems with dropped packets if the computer was heavily loaded (generally with processing those packets in another thread).
What I would like to know is if the async read operations in boost (which use overlapped io in windows) will help ensure that I receive the packets and/or reduce the cpu time needed to receive the packets. The single thread doing blocking reads is pretty straightforward, using the async reads seems like a step up in complexity, but I think it would be worth it if it provided higher performance or dropped fewer packets on a heavily loaded system. Currently the data rate should be no higher than 60Mb/s.
I've written some multicast handling code using boost::asio also. I would say that overall, in my experience there is a lot of added complexity to doing things in asio that may not make it easy for other people you work with to understand the code you end up writing.
That said, presumably the argument in favour of moving to asio instead of using lots of different threads to do the work is that you would have to do less context switching. This would clearly be true on a single-core box, but what about when you go multi-core? Are you planning to offload the work you receive to threads or just have a single thread doing the processing work? If you go for a single threaded approach you are going to end up in a situation where you could drop packets waiting for that thread to process the work.
In the end it's swings and roundabouts. I'd say you want to get some fairly solid figures backing up your arguments for going down this route if you are going to do so, just because of all the added complexity it entails (a whole new paradigm for some people I'm sure).