asp.net mvc 2 and .net 4 parallelism - asp.net

How can MVC website benefit from the new parallelism features in .net 4? don't websites support parallelism by default since multiple users access them at the same time? Can someone clarify this?

Executing tasks in parallel is especially useful for long running tasks. What constitutes a long running task might differ, but it should be longer than the overhead of spinning up and synchronizing the threads.
So, there is no particular benefit for MVC, but there is a general benefit for each request which require more things to be run in parallel.
There's an article from 2007 in MSDN Magazine which outlines some performance aspects of the parallel library.
Example 1: a user hits a page which displays two different graphs. Each graph is calculated from a dataset. Executing the calculations in parallel will benefit the overall time to render the page. (Executing parallel individual tasks)
Example 2: You need to execute some function on a list of data, And use Parallel.For to enumerate over the data and execute some code on it in parallel.
You should analyze your application and figure out which parts can be run in parallel, and then test with the new language features if it helps your application or not.

Parallelism is by computer processor. Parallelism doesn't mean multiple users access the application. Parallelism is that for one request, the app can split tasks across multiple processors to do the work. So multiple users accessing the app is a factor in how the server dishes out the work, but not the core idea of parallelism.

Every request is handled in its own thread, so you get some degree of parallelism by default with ASP.Net.
When considering implementing any kind of parallelism it is important to test your code with and without parallelism. You have to make sure that the overhead of wrapping, distributing and executing each task is not greater than running the tasks in serial. It often is for most of your day-to-day loops. Parallelism is best used in computationally intensive tasks.

Related

What is missing in JuliaDB to use it as production database in a website backend?

I have some difficulties to understand the pros and the cons of using JuliaDB as a main backend database for a production website.
https://juliadb.org/
My use case is a collaborative data sciences platform. The client request 1 million unique visitors and 100 000 writes per day. Well... I wish so.
Implementing a SQL database mean that I need to "translate" the data science dataframes used for the calculus into SQL and backwards.
In the other hands JuliaDB is an end to end solution.
Regarding the different criterions for a website production database:
Julia natively hase concurrency:
Julia supports three main categories of features for concurrent and
parallel programming:
Asynchronous "tasks", or coroutines Multi-threading Distributed
computing Julia Tasks allow suspending and resuming computations for
I/O, event handling, producer-consumer processes, and similar
patterns. Tasks can synchronize through operations like wait and
fetch, and communicate via Channels.
Multi-threading functionality builds on tasks by allowing them to run
simultaneously on more than one thread or CPU core, sharing memory.
Finally, distributed computing runs multiple processes with separate
memory spaces, potentially on different machines. This functionality
is provided by the Distributed standard library as well as external
packages like MPI.jl and DistributedArrays.jl.
In the other hand, JuliaDB doc tells that they support parallel computing but does not give much details.
Can JuliaDB handle parallel connections and asynchronous operation, making it performant for lots of users using it in parrallel?
From your question it looks that what you need is a massively parallel data ingestion mechanism. You a software architecture that allows to concurrently collect data for huge amount of users.
Perhaps you should look at one of noSQL databases that provide horizontal scaling capability good example could be MongoDB (or perhaps a cloud equivalent such as DynamoDB).
If your data volume and parallelism is even higher you should consider streaming solution such as Apache Kafka.
On the other hand JuliaDB is totally on the other site of processing workflow. Once your massive data gets collected it ends up in an analytic process. In the recent years the most popular tool was the Hadoop stack with Apache Spark used for processing.
JuliaDB brings a new paradigm on the analytics step of data workflow. With this tool you can massively parallelize processing of huge data and hence you should consider JuliaDB as a nice alternative to Spark.

Will using Parallel.ForEach in ASP.NET prevent other threads from processing requests?

I'm looking at using some PLINQ in an ASP.NET application to increase throughput on some I/O bound and computationally bound processing we have (e.g., iterating a collection to use for web service calls and calculating some geo-coordinate distances).
My question is: should I be concerned that running operations like Parallel.ForEach will take threads away from processing other requests, or does it use a separate pool of threads (I've heard about something called I/O completion ports, but not sure how that plays into the discussion)?
Parallel.ForEach will, at most, use one thread for as many cores as you have. The thread pool default maximum size is 250 times the number of cores you have. So you'll have to be really trying to run out of available threads.

.NET 4 Parallel namespace in a server environment - pros and cons?

I have been playing with the Parallel namespace of .NET4 on client apps a little. Seems to do a pretty decent job and really speeds up loops easily without worrying about threads.
However, what is the recommendation in a server environment, on a website powered by ASP.NET? If I imagine a server with four CPU cores and I use some code based on Parallel namespace's methods, I can loop whatever list I have pretty fast.
At the same time I will probably block all four CPU cores. If instead I use one thread only to do the looping, other users may access the same page of my website and get a response - just slower.
So how do Parallel namespace, ASP.NET and IIS play together? Is it somehow managed?
Personally, my biggest concern was sessions (in ORM tools) not being thread safe thus any thread would require its own session. So it is one major issue to take into account while using ORM tools (if any, i.e. entity framework) and auto-parallelism.
Also straight from MSDN: The .NET Framework Parallel Extensions allow you to nest parallel loops. In this situation, the run-time environment coordinates the use of processor resources. You'll find that with nested parallel loops, each loop uses fewer threads than in a non-nested loop.
A related problem is the handling of parallel loops in server applications. The Parallel class attempts to deal with multiple AppDomains in server applications in exactly the same manner that nested loops are handled. If the server application is already using all the available thread pool threads to process other ASP.NET requests, a parallel loop will only run on the thread that invoked it. If the workload decreases and other threads become available and there's no other ASP.NET work to be processed, the loop will start using additional threads.

Is OCaml suitable to write networking servers?

I was wondering if OCaml will perform well in terms of performance and ease of implementation while dealing with typical client/server interactions over TCP in a multi threaded environment.. I mean something really typical like having a thread per client that receives data, operated changes on game states and send them back to clients.
This because I need to write a server for a game and I always did these things in C but since now I know OCaml I was curious to know if it would be ok or I'll just find myself trying to solve a typical problem in a language that doesn't fit well that.
Performance: probably not. OCaml's threads do not provide parallel execution, they are only a way to structure your program. The OCaml runtime itself is not thread-safe, so the only code that could possibly execute in parallel of a single OCaml thread would be interfaced C code (without callbacks to OCaml!).
Implementation-wise, there is a mutex on the run-time, which is released when calling blocking C primitives, and could also be released when calling C functions that do significant work.
Ease of implementation: it wouldn't be world-changing. You would have the comfort of OCaml and a pthread-like library on the side. If you are looking for new things to discover while leveraging what you have learnt of OCaml, I recommend Jocaml. It goes in and out of sync with OCaml, but there was a (re-)re-implementation quite recently, and even when it is slightly out of sync, it is a lot of fun, and a completely new perspective of concurrent programs.
Jocaml is implemented on top of OCaml. What with the run-time not being concurrent and all, I am almost sure it uses separate processes and message-passing. But for the application that you mentioned it should be able to do fine.
OCaml is quite suitable for writing network servers, although as Pascal observes, there are limitations on threading.
Fortunately, however, threading isn't the only way to organize such a program. The Lwt library (for Light Weight Threads) provides an abstraction of asynchronous I/O that is quite easy to use (particularly when combined with a bit of syntax support). Everything actually runs in one thread, but it's all driven by an asynchronous I/O loop (built on the Unix select call), and the programming style lets you write code that looks like direct code (avoiding much of the normal code overhead of doing asynchronous I/O in many other languages). For example:
lwt my_message = read_message socket in
let repsonse = compute_response my_message in
send_response socket response
Both the read and the write happen back in the main event loop, but you avoid the normal "read, calling this function when you're done" manual overhead.
I'm so sorry this question has been sitting here for eight years with what I consider to be several quite bad answers because they all ignore the elephant in the room.
You say "really typical like having a thread per client" but having an OS thread per client is an extremely bad design. Threads are heavyweight, taking a long time to create and destroy and consuming ~1MB just for the thread stack. If you have one thread per connection then 1,000 simultaneous client connections (which is entirely feasible) will burn 1GB of RAM just for their stacks and the performance of your program (in any language) will be cripppled by the amount of context switching required to get any work done. You don't want to use that design in any language including both C and OCaml. Note that this problem is especially bad in the context of tracing garbage collected languages because the GC also traverses all of those thread stack in order to collate global roots before every GC cycle. I am the first to admit that this anti-pattern is ubiquitous in the real world but please don't copy it! I have seen "low latency" servers in the finance industry written in C++ using one thread per connection and they suffered latency stalls of up to six seconds just from the (Windows) OS servicing those threads.
See: http://people.eecs.berkeley.edu/~sangjin/2012/12/21/epoll-vs-kqueue.html
Let's consider an efficient design instead, like an epoll or kqueue interface to the OS kernel giving the server's code information about incoming and outgoing data buffers. Single threaded servers can attain excellent performance with this design. However, a typical server has serialization work to do per client and some core work that is often performed in serial across all client connections. Therefore, serialization and deserialization can be parallelized but the core server operation cannot. In this context, OCaml is great for everything except the serialization layer because it has poor support for parallelism.
I have personally implemented many servers for various industries with hugely varying performance requirements. In my experience, OCaml is one of the best tools for this because it offers excellent libraries (easy to use and reliable) and excellent serial performance. The only issue I have is around parallelizing the serialization layer but, in practice, I have found that OCaml runs circles around alternatives like Java and .NET even though they can parallelize this. I found typical latencies were ~100us for .NET and 10us for OCaml.
See also: http://prl.ccs.neu.edu/blog/2016/05/24/measuring-gc-latencies-in-haskell-ocaml-racket/
OCaml will work great for networking applications as long as you can live with a relatively small number of threads active at one time—say no more than 100. You could consider MLdonkey as an example, although in the client space, not in the server space.
Haskell would be a better choice if you want to use many preemptive threads. GHC can support huge numbers of threads and they run in parallel on multicore systems. OCaml prefers cooperative multithreading and multiple processes.

Using ThreadPool.QueueUserWorkItem in ASP.NET in a high traffic scenario

I've always been under the impression that using the ThreadPool for (let's say non-critical) short-lived background tasks was considered best practice, even in ASP.NET, but then I came across this article that seems to suggest otherwise - the argument being that you should leave the ThreadPool to deal with ASP.NET related requests.
So here's how I've been doing small asynchronous tasks so far:
ThreadPool.QueueUserWorkItem(s => PostLog(logEvent))
And the article is suggesting instead to create a thread explicitly, similar to:
new Thread(() => PostLog(logEvent)){ IsBackground = true }.Start()
The first method has the advantage of being managed and bounded, but there's the potential (if the article is correct) that the background tasks are then vying for threads with ASP.NET request-handlers. The second method frees up the ThreadPool, but at the cost of being unbounded and thus potentially using up too many resources.
So my question is, is the advice in the article correct?
If your site was getting so much traffic that your ThreadPool was getting full, then is it better to go out-of-band, or would a full ThreadPool imply that you're getting to the limit of your resources anyway, in which case you shouldn't be trying to start your own threads?
Clarification: I'm just asking in the scope of small non-critical asynchronous tasks (eg, remote logging), not expensive work items that would require a separate process (in these cases I agree you'll need a more robust solution).
Other answers here seem to be leaving out the most important point:
Unless you are trying to parallelize a CPU-intensive operation in order to get it done faster on a low-load site, there is no point in using a worker thread at all.
That goes for both free threads, created by new Thread(...), and worker threads in the ThreadPool that respond to QueueUserWorkItem requests.
Yes, it's true, you can starve the ThreadPool in an ASP.NET process by queuing too many work items. It will prevent ASP.NET from processing further requests. The information in the article is accurate in that respect; the same thread pool used for QueueUserWorkItem is also used to serve requests.
But if you are actually queuing enough work items to cause this starvation, then you should be starving the thread pool! If you are running literally hundreds of CPU-intensive operations at the same time, what good would it do to have another worker thread to serve an ASP.NET request, when the machine is already overloaded? If you're running into this situation, you need to redesign completely!
Most of the time I see or hear about multi-threaded code being inappropriately used in ASP.NET, it's not for queuing CPU-intensive work. It's for queuing I/O-bound work. And if you want to do I/O work, then you should be using an I/O thread (I/O Completion Port).
Specifically, you should be using the async callbacks supported by whatever library class you're using. These methods are always very clearly labeled; they start with the words Begin and End. As in Stream.BeginRead, Socket.BeginConnect, WebRequest.BeginGetResponse, and so on.
These methods do use the ThreadPool, but they use IOCPs, which do not interfere with ASP.NET requests. They are a special kind of lightweight thread that can be "woken up" by an interrupt signal from the I/O system. And in an ASP.NET application, you normally have one I/O thread for each worker thread, so every single request can have one async operation queued up. That's literally hundreds of async operations without any significant performance degradation (assuming the I/O subsystem can keep up). It's way more than you'll ever need.
Just keep in mind that async delegates do not work this way - they'll end up using a worker thread, just like ThreadPool.QueueUserWorkItem. It's only the built-in async methods of the .NET Framework library classes that are capable of doing this. You can do it yourself, but it's complicated and a little bit dangerous and probably beyond the scope of this discussion.
The best answer to this question, in my opinion, is don't use the ThreadPool or a background Thread instance in ASP.NET. It's not at all like spinning up a thread in a Windows Forms application, where you do it to keep the UI responsive and don't care about how efficient it is. In ASP.NET, your concern is throughput, and all that context switching on all those worker threads is absolutely going to kill your throughput whether you use the ThreadPool or not.
Please, if you find yourself writing threading code in ASP.NET - consider whether or not it could be rewritten to use pre-existing asynchronous methods, and if it can't, then please consider whether or not you really, truly need the code to run in a background thread at all. In the majority of cases, you will probably be adding complexity for no net benefit.
Per Thomas Marquadt of the ASP.NET team at Microsoft, it is safe to use the ASP.NET ThreadPool (QueueUserWorkItem).
From the article:
Q) If my ASP.NET Application uses CLR ThreadPool threads, won’t I starve ASP.NET, which also uses the CLR ThreadPool to execute requests?
..
A) To summarize, don’t worry about
starving ASP.NET of threads, and if
you think there’s a problem here let
me know and we’ll take care of it.
Q) Should I create my own threads
(new Thread)? Won’t this be better
for ASP.NET, since it uses the CLR
ThreadPool.
A) Please don’t. Or to put it a
different way, no!!! If you’re really
smart—much smarter than me—then you
can create your own threads;
otherwise, don’t even think about it.
Here are some reasons why you should
not frequently create new threads:
It is very expensive, compared to
QueueUserWorkItem...By the way, if you can write a better ThreadPool than the CLR’s, I encourage you to apply for a job at Microsoft, because we’re definitely looking for people like you!.
Websites shouldn't go around spawning threads.
You typically move this functionality out into a Windows Service that you then communicate with (I use MSMQ to talk to them).
-- Edit
I described an implementation here: Queue-Based Background Processing in ASP.NET MVC Web Application
-- Edit
To expand why this is even better than just threads:
Using MSMQ, you can communicate to another server. You can write to a queue across machines, so if you determine, for some reason, that your background task is using up the resources of the main server too much, you can just shift it quite trivially.
It also allows you to batch-process whatever task you were trying to do (send emails/whatever).
I definitely think that general practice for quick, low-priority asynchronous work in ASP.NET would be to use the .NET thread pool, particularly for high-traffic scenarios as you want your resources to be bounded.
Also, the implementation of threading is hidden - if you start spawning your own threads, you have to manage them properly as well. Not saying you couldn't do it, but why reinvent that wheel?
If performance becomes an issue, and you can establish that the thread pool is the limiting factor (and not database connections, outgoing network connections, memory, page timeouts etc) then you tweak the thread pool configuration to allow more worker threads, higher queued requests, etc.
If you don't have a performance problem then choosing to spawn new threads to reduce contention with the ASP.NET request queue is classic premature optimization.
Ideally you wouldn't need to use a separate thread to do a logging operation though - just enable the original thread to complete the operation as quickly as possible, which is where MSMQ and a separate consumer thread / process come in to the picture. I agree that this is heavier and more work to implement, but you really need the durability here - the volatility of a shared, in-memory queue will quickly wear out its welcome.
You should use QueueUserWorkItem, and avoid creating new threads like you would avoid the plague. For a visual that explains why you won't starve ASP.NET, since it uses the same ThreadPool, imagine a very skilled juggler using two hands to keep a half dozen bowling pins, swords, or whatever in flight. For a visual of why creating your own threads is bad, imagine what happens in Seattle at rush hour when heavily used entrance ramps to the highway allow vehicles to enter traffic immediately instead of using a light and limiting the number of entrances to one every few seconds. Finally, for a detailed explanation, please see this link:
http://blogs.msdn.com/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx
Thanks,
Thomas
That article is not correct. ASP.NET has it's own pool of threads, managed worker threads, for serving ASP.NET requests. This pool is usually a few hundred threads and is separate from the ThreadPool pool, which is some smaller multiple of processors.
Using ThreadPool in ASP.NET will not interfere with ASP.NET worker threads. Using ThreadPool is fine.
It would also be acceptable to setup a single thread which is just for logging messages and using producer/consumer pattern to pass logs messages to that thread. In that case, since the thread is long-lived, you should create a single new thread to run the logging.
Using a new thread for every message is definitely overkill.
Another alternative, if you're only talking about logging, is to use a library like log4net. It handles logging in a separate thread and takes care of all the context issues that could come up in that scenario.
I'd say the article is wrong. If you're running a large .NET shop you can safely use the pool across multiple apps and multiple websites (using seperate app pools), simply based on one statement in the ThreadPool documentation:
There is one thread pool per process.
The thread pool has a default size of
250 worker threads per available
processor, and 1000 I/O completion
threads. The number of threads in the
thread pool can be changed by using
the SetMaxThreads method. Each thread
uses the default stack size and runs
at the default priority.
I was asked a similar question at work last week and I'll give you the same answer. Why are you multi threading web applications per request? A web server is a fantastic system optimized heavily to provide many requests in a timely fashion (i.e. multi threading). Think of what happens when you request almost any page on the web.
A request is made for some page
Html is served back
The Html tells the client to make further requets (js, css, images, etc..)
Further information is served back
You give the example of remote logging, but that should be a concern of your logger. An asynchronous process should be in place to receive messages in a timely fashion. Sam even points out that your logger (log4net) should already support this.
Sam is also correct in that using the Thread Pool on the CLR will not cause issues with the thread pool in IIS. The thing to be concerned with here though, is that you are not spawning threads from a process, you are spawning new threads off of IIS threadpool threads. There is a difference and the distinction is important.
Threads vs Process
Both threads and processes are methods
of parallelizing an application.
However, processes are independent
execution units that contain their own
state information, use their own
address spaces, and only interact with
each other via interprocess
communication mechanisms (generally
managed by the operating system).
Applications are typically divided
into processes during the design
phase, and a master process explicitly
spawns sub-processes when it makes
sense to logically separate
significant application functionality.
Processes, in other words, are an
architectural construct.
By contrast, a thread is a coding
construct that doesn't affect the
architecture of an application. A
single process might contains multiple
threads; all threads within a process
share the same state and same memory
space, and can communicate with each
other directly, because they share the
same variables.
Source
You can use Parallel.For or Parallel.ForEach and define the limit of possible threads you want to allocate to run smoothly and prevent pool starvation.
However, being run in background you will need to use pure TPL style below in ASP.Net web application.
var ts = new CancellationTokenSource();
CancellationToken ct = ts.Token;
ParallelOptions po = new ParallelOptions();
po.CancellationToken = ts.Token;
po.MaxDegreeOfParallelism = 6; //limit here
Task.Factory.StartNew(()=>
{
Parallel.ForEach(collectionList, po, (collectionItem) =>
{
//Code Here PostLog(logEvent);
}
});
I do not agree with the referenced article(C#feeds.com). It is easy to create a new thread but dangerous. The optimal number of active threads to run on a single core is actually surprisingly low - less than 10. It is way too easy to cause the machine to waste time switching threads if threads are created for minor tasks. Threads are a resource that REQUIRE management. The WorkItem abstraction is there to handle this.
There is a trade off here between reducing the number of threads available for requests and creating too many threads to allow any of them to process efficiently. This is a very dynamic situation but I think one that should be actively managed (in this case by the thread pool) rather than leaving it to the processer to stay ahead of the creation of threads.
Finally the article makes some pretty sweeping statements about the dangers of using the ThreadPool but it really needs something concrete to back them up.
Whether or not IIS uses the same ThreadPool to handle incoming requests seems hard to get a definitive answer to, and also seems to have changed over versions. So it would seem like a good idea not to use ThreadPool threads excessively, so that IIS has a lot of them available. On the other hand, spawning your own thread for every little task seems like a bad idea. Presumably, you have some sort of locking in your logging, so only one thread could progress at a time, and the rest would just take turns getting scheduled and unscheduled (not to mention the overhead of spawning a new thread). Essentially, you run into the exact problems the ThreadPool was designed to avoid.
It seems that a reasonable compromise would be for your app to allocate a single logging thread that you could pass messages to. You would want to be careful that sending messages is as fast as possible so that you don't slow down your app.

Resources