I consider myself aware of the concepts of threading and why certain code is or isn’t “thread-safe,” but as someone who primarily works with ASP.NET, threading and thread safety are things I rarely think about. However, I seem to run across numerous comments and answers (not necessarily about ASP.NET) on Stack Overflow to the effect of “warning – that’s not thread-safe!,” and it tends to make me second-guess whether I’ve written similar code that actually could cause a problem in my applications. [shock, horror, etc.] So I’m compelled to ask:
Do ASP.NET developers really need to be concerned with thread safety?
My Take: While a web application is inherently multi-threaded, each particular request comes in on a single thread, and all the non-static objects you create, modify, or destroy are exclusive to that single thread/request. If the request creates an instance of a DAL object which creates an instance of a business object, and I want to lazy-initialize a collection within this object, it doesn’t matter if it’s not thread-safe, because it will never be touched by another thread. ...Right? (Let’s assume I’m not starting a new thread to kick off a long-running asynchronous process during the request. I’m well aware that changes everything.)
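To make the scenario concrete, here is a minimal sketch of the kind of per-request lazy initialization described above; the OrderLine type and the LoadLines call are hypothetical. The check-then-create is not thread-safe, but since only the one request thread ever touches this instance, that does not matter:

private List<OrderLine> _lines;

public List<OrderLine> Lines
{
    get
    {
        if (_lines == null)        // not thread-safe, but only this request's thread sees this instance
        {
            _lines = LoadLines();  // hypothetical data-access call
        }
        return _lines;
    }
}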
Of course, static classes, methods, and variables are just the opposite. They are shared by every request, and the developer must be very careful not to write “unsafe” code that, when executed by one user, can have an unintended effect on all others.
But that’s about it, and thus thread safety in ASP.NET mostly boils down to this: Be careful how you design and use statics. Other than that, you don’t need to worry about it much at all.
Am I wrong about any of this? Do you disagree? Enlighten me!
There are certain objects in addition to static items that are shared across all requests to an application. Be careful about putting items in the application cache that are not thread-safe, for example. Also, nothing prevents you from spawning your own threads for background processing while handling a request.
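As a rough illustration of that first point, here is a minimal sketch of guarding a non-thread-safe collection stored in the application cache; the key name, the list type, and the lock field are all hypothetical:

// using System.Web; using System.Collections.Generic;
private static readonly object _visitorsLock = new object();

public static void RecordVisitor(string name)
{
    // List<string> is not thread-safe and the cached instance is shared by every request,
    // so both the get-or-create and the Add happen under the lock.
    lock (_visitorsLock)
    {
        var visitors = (List<string>)HttpRuntime.Cache["RecentVisitors"];
        if (visitors == null)
        {
            visitors = new List<string>();
            HttpRuntime.Cache["RecentVisitors"] = visitors;
        }
        visitors.Add(name);
    }
}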
There are different levels of ASP.NET Developers. You could make a perfectly fine career as an ASP.NET Developer without knowing anything about threads, mutexes, locks, semaphores, or even design patterns, because a high percentage of ASP.NET applications are basically CRUD applications with little to no additional business logic.
However, most great ASP.NET Developers I have come across aren't just ASP.NET Developers; their skills run the gamut, so they know all about threading and other good stuff because they don't limit themselves to ASP.NET.
So no, for the most part ASP.NET Developers do not need to know about thread safety. But what fun is there in only knowing the bare minimum?
Only if you create, within the processing stream for a single HTTP request, multiple threads of your own... For example, if the web page will display stock quotes for a set of stocks, and you make separate calls to a stock quote service, on independent threads, to retrieve the quotes before generating the page to send back to the client... then you would have to make sure that the code you are running in your threads is thread-safe.
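A minimal sketch of that scenario, assuming a hypothetical GetQuote call that is safe to invoke concurrently; the shared dictionary is the part that needs to be safe for multiple threads:

// using System.Threading.Tasks; using System.Collections.Concurrent;
var symbols = new[] { "MSFT", "AAPL", "GOOG" };
var quotes = new ConcurrentDictionary<string, decimal>();

Parallel.ForEach(symbols, symbol =>
{
    // GetQuote is a hypothetical, thread-safe call to the quote service.
    quotes[symbol] = GetQuote(symbol);
});

// After all the calls complete, render the page from the collected quotes.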
I believe you covered it all very well, and I agree with you. When you focus on ASP.NET only, it rarely (if ever) comes down to multi-threading issues.
The situation changes, however, when it comes to optimizations. Whenever you start a long-running query, you may often want to let it run on a separate thread so that the page load is not held up until the server reports a connection timeout. You may wish to have the page periodically check for completion status to notify the user. This is where multi-threading issues come in.
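A minimal sketch of that pattern, assuming a hypothetical RunLongQuery method and a status lookup the page can poll; the static dictionary is shared across requests, which is exactly why a concurrent collection is used:

// using System; using System.Threading; using System.Collections.Concurrent;
private static readonly ConcurrentDictionary<Guid, string> _jobs =
    new ConcurrentDictionary<Guid, string>();

public static Guid StartJob()
{
    var id = Guid.NewGuid();
    _jobs[id] = "Running";
    ThreadPool.QueueUserWorkItem(_ =>
    {
        RunLongQuery();        // hypothetical long-running database call
        _jobs[id] = "Done";    // a later polling request reads this value
    });
    return id;
}

public static string GetStatus(Guid id)
{
    string status;
    return _jobs.TryGetValue(id, out status) ? status : "Unknown";
}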
Preface:
I've been doing a lot of research on Entity Framework recently and something that keeps bothering me is its performance when the queries are not warmed up, the so-called cold queries.
I went through the performance considerations article for Entity Framework 5.0. The authors introduced the concept of warm and cold queries and how they differ, which I had also noticed myself without knowing of their existence. Here it's probably worth mentioning that I only have six months of experience under my belt.
Now I know what additional topics I can research if I want to understand the framework better in terms of performance. Unfortunately most of the information on the Internet is outdated or bloated with subjectivity, hence my inability to find any additional information on the warm vs. cold queries topic.
Basically what I've noticed so far is that whenever I have to recompile or the recycling hits, my initial queries are getting very slow. Any subsequent data read is fast (subjective), as expected.
We'll be migrating to Windows Server 2012, IIS8 and SQL Server 2012 and as a Junior I actually won myself the opportunity to test them before the rest. I'm very happy they introduced a warming-up module that will get my application ready for that first request. However, I'm not sure how to proceed with warming up my Entity Framework.
What I already know is worth doing:
Generate my Views in advance as suggested.
Eventually move my models into a separate assembly.
What I consider doing, by going with common sense, probably wrong approach:
Doing dummy data reads at Application Start in order to warm things up, generate and validate the models.
Questions:
What would be the best approach to have high availability on my Entity Framework at anytime?
In what cases does the Entity Framework get "cold" again? (Recompilation, recycling, IIS restart, etc.)
No, the answer to my second question is not the winter.
What would be the best approach to have high availability on my Entity Framework at anytime?
You can go for a mix of pregenerated views and static compiled queries.
Static CompiledQuery instances are good because they're quick and easy to write and help increase performance. However, with EF5 it isn't necessary to compile all your queries since EF will auto-compile queries itself. The only problem is that these queries can get lost when the cache is swept. So you still want to hold references to your own compiled queries for those that occur only very rarely, but that are expensive. If you put those queries into static classes they will be compiled when they're first required. This may be too late for some queries, so you may want to force compilation of these queries during application startup.
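As a rough sketch of holding a reference to such a query (this assumes an ObjectContext-based model, since CompiledQuery.Compile does not work against DbContext; MyContext, Customers and the field name are hypothetical):

// using System.Data.Objects; (the ObjectContext/CompiledQuery API)
private static readonly Func<MyContext, int, IQueryable<Customer>> CustomersByMinimumId =
    CompiledQuery.Compile((MyContext ctx, int minId) =>
        ctx.Customers.Where(c => c.Id >= minId));

// The actual compilation happens on first use, so invoking the delegate once
// during application startup warms it up:
// var warmup = CustomersByMinimumId(context, 0).ToList();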
Pregenerating views is the other possibility, as you mention. Especially for those queries that take very long to compile and that don't change. That way you move the performance overhead from runtime to compile time. Also, this won't introduce any lag. But of course this change goes through to the database, so it's not so easy to deal with. Code is more flexible.
Do not use a lot of TPT inheritance (that's a general performance issue in EF). Don't build your inheritance hierarchies too deep or too wide. Only 2-3 properties specific to some class may not be enough to warrant a type of its own; they could instead be handled as optional (nullable) properties on an existing type.
Don't hold on to a single context for a long time. Each context instance has its own first level cache which slows down the performance as it grows larger. Context creation is cheap, but the state management inside the cached entities of the context may become expensive. The other caches (query plan and metadata) are shared between contexts and will die together with the AppDomain.
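For instance, a minimal sketch of the short-lived context usage described above (MyContext, Orders and CreatedOn are hypothetical):

using (var ctx = new MyContext())
{
    var cutoff = DateTime.UtcNow.AddDays(-7);
    // Materialize the results before the context is disposed.
    var recentOrders = ctx.Orders
        .Where(o => o.CreatedOn >= cutoff)
        .ToList();
}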
All in all, you should make sure that you allocate contexts frequently and use them only for a short time, that your application starts quickly, that you compile queries that are rarely used, and that you provide pregenerated views for queries that are performance-critical and often used.
In what cases does the Entity Framework get "cold" again? (Recompilation, recycling, IIS restart, etc.)
Basically, every time you lose your AppDomain. IIS performs application pool recycles every 29 hours by default, so you can never guarantee that you'll have your instances around. Also, after some time without activity the AppDomain is shut down. You should attempt to come up quickly again. Maybe you can do some of the initialization asynchronously (but beware of multi-threading issues). You can use scheduled tasks that call dummy pages in your application during times when there are no requests to prevent the AppDomain from dying, but eventually it will.
I also assume when you change your config file or change the assemblies there's going to be a restart.
If you are looking for maximum performance across all calls you should consider your architecture carefully. For instance, it might make sense to pre-cache often used look-ups in server RAM when the application loads up instead of using database calls on every request. This technique will ensure minimum application response times for commonly used data. However, you must be sure to have a well behaved expiration policy or always clear your cache whenever changes are made which affect the cached data to avoid issues with concurrency.
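A minimal sketch of that kind of startup pre-caching using the ASP.NET cache with an absolute expiration; LoadCountriesFromDatabase and the key name are hypothetical:

// using System.Web; using System.Web.Caching;
// Typically called from Application_Start in Global.asax.
var countries = LoadCountriesFromDatabase();   // hypothetical data-access call
HttpRuntime.Cache.Insert(
    "Countries",
    countries,
    null,                                      // no cache dependency
    DateTime.UtcNow.AddHours(1),               // absolute expiration
    Cache.NoSlidingExpiration);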
In general, you should strive to design distributed architectures to only require IO-based data requests when the locally cached information becomes stale or needs to be transactional. Any "over the wire" data request will normally take 10-1000 times longer to retrieve than a local, in-memory cache retrieval. This one fact alone often makes discussions about "cold vs. warm data" inconsequential in comparison to the "local vs. remote" data issue.
General tips.
Perform rigorous logging, including what is accessed and the request time (a sketch of one approach follows this list).
Perform dummy requests when initializing your application to warm up the very slow requests that you pick up from the previous step.
Don't bother optimizing unless it's a real problem; communicate with the consumers of the application and ask. Get comfortable having a continuous feedback loop, if only to figure out what needs optimization.
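A minimal sketch of request-time logging via an IHttpModule (the module still needs to be registered in web.config, and the trace output target is just an example):

// using System.Web; using System.Diagnostics;
public class RequestTimingModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (s, e) =>
            ((HttpApplication)s).Context.Items["RequestStopwatch"] = Stopwatch.StartNew();

        app.EndRequest += (s, e) =>
        {
            var context = ((HttpApplication)s).Context;
            var stopwatch = (Stopwatch)context.Items["RequestStopwatch"];
            Trace.WriteLine(context.Request.RawUrl + " took " + stopwatch.ElapsedMilliseconds + " ms");
        };
    }

    public void Dispose() { }
}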
Now to explain why dummy requests are not the wrong approach.
Less Complexity - You are warming up the application in a manner that will work regardless of changes in the framework, and you don't need to figure out possibly funky APIs/framework internals to do it the right way.
Greater Coverage - You are warming up all layers of caching at once related to the slow request.
To explain when a cache gets "Cold".
This happens at any layer in your framework that applies a cache; there is a good description at the top of the performance page.
Whenever a cache has to be invalidated after a potential change that makes the cache stale; this could be a timeout or something more intelligent (e.g. a change in the cached item).
When a cache item is evicted. The algorithm for doing this is described in the section "Cache eviction algorithm" in the performance article you linked, but in short:
LFRU (least frequently/recently used) eviction based on hit count and age, with a limit of 800 items.
The other things you mentioned, specifically recompilation and restarting of IIS, clear either parts or all of the in-memory caches.
As you have stated, use "pre-generated views"; that's really all you need to do.
Extracted from your link:
"When views are generated, they are also validated. From a performance standpoint, the vast majority of the cost of view generation is actually the validation of the views"
This means the performance knock will take place when you build your model assembly. Your context object will then skip the "cold query" and stay responsive for the duration of the context object life cycle as well as subsequent new object contexts.
Executing irrelevant queries will serve no other purpose than to consume system resources.
The shortcut ...
Skip all that extra work of pre-generated views
Create your object context
Fire off that sweet irrelevant query
Then just keep a reference to your object context for the duration of your process (not recommended).
I have no experience with this framework. But in other contexts, e.g. Solr, completely dummy reads will not be of much use unless you can cache the whole DB (or index).
A better approach would be to log the queries, extract the most common ones out of the logs and use them to warm up. Just be sure not to log the warm up queries or remove them from the logs before proceeding.
A few months ago I was interviewing for a job inside the company I am currently in. I don't have a strong web development background, but one of the questions the interviewer posed to me was how I could improve this block of code.
I don't remember the code block perfectly, but to sum it up, it was a web hit counter, and he used lock on the hit counter.
lock(HitCounter)
{
// Bla...
}
However, after some discussion he said: lock is good, but never use it in web applications!
What is the basis behind his statement? Why shouldn't I use lock in web applications?
There is no special reason why locks should not be used in web applications. However, they should be used carefully as they are a mechanism to serialize multi-threaded access which can cause blocking if lock blocks are contended. This is not just a concern for web applications though.
What is always worth remembering is that on modern hardware an uncontended lock takes 20 nanoseconds to flip. With this in mind, the usual practice of trying to make code inside of lock blocks as minimal as possible should be followed. If you have minimal code within a block, the overhead is quite small and potential for contention low.
To say that locks should never be used is a bit of a blanket statement really. It really depends on what your requirements are e.g. a thread-safe in-memory cache to be shared between requests will potentially result in less request blocking than on-demand fetching from a database.
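As a rough sketch of that kind of lock-protected, shared in-memory cache (the field names and the value factory are hypothetical):

// using System; using System.Collections.Generic;
private static readonly object _sync = new object();
private static readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

public static string GetOrAdd(string key, Func<string> valueFactory)
{
    lock (_sync)
    {
        string value;
        if (!_cache.TryGetValue(key, out value))
        {
            value = valueFactory();   // if this is expensive, consider computing it outside the lock
            _cache[key] = value;
        }
        return value;
    }
}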
Finally, BCL and ASP.Net Framework types certainly use locks internally, so you're indirectly using them anyway.
The application domain might be recycled.
This might result in the old appdomain still finishing serving some requests and the new appdomain also serving new requests.
Static variables are not shared between them, so locking on a static global would not grant exclusivity in this case.
First of all, you never want to lock an object that you actually use in any application. You want to create a lock object and lock that:
// Static, because the hit counter it guards is shared by all requests.
private static readonly object _hitCounterLock = new object();
lock(_hitCounterLock)
{
//blah
}
As for the web portion of the question, when you lock you block every thread that attempts to access the object (which for the web could be hundreds or thousands of users). They will all be waiting until each thread ahead of them unlocks.
Late :), but for future readers of this, an additional point:
If the application is run on a web farm, the ASP.NET applications running on multiple machines will not share the lock object
So this can only work if:
1. No web farm has to be supported, AND
2. ASP.NET is configured (non-default) NOT to use parallel appdomain instances during recycling until old requests are served (as mentioned by Andras above).
This code will create a bottleneck for your application, since all incoming requests will have to wait at this point until the previous one has left the lock.
lock is only intended to be used in multithreaded applications where multiple threads require access to the same shared variable; a lock is acquired exclusively by the requesting thread, and all pending threads will block and wait until the lock is released.
In web applications, user requests are isolated, so there is no need for locking by default.
A couple of reasons...
If you're trying to lock a database read/write operation, there's a really high risk of a race condition happening anyway because the database isn't owned by the process doing the lock, so it could be read from/written to by another process -- perhaps even a hypothetical future version of IIS that runs multiple processes per application.
Locks are typically used in client applications for non-UI threads, i.e. background/worker threads. Web applications don't have as much of a use for multithreaded processing unless you're trying to take advantage of multiple cores (in which case locks on request-associated objects would be acceptable), because each request can be assumed to run on its own thread, and the server can't respond until it's processed the entire output (or at least a sequential chunk) anyway.
In order to improve the speed of a chat application, I am remembering the last message id in a static variable (actually, a Dictionary).
However, it seems that every thread has its own copy, because users do not get updates in production (single-server environment).
private static Dictionary<long, MemoryChatRoom> _chatRooms = new Dictionary<long, MemoryChatRoom>();
No ThreadStaticAttribute is used...
What is a fast way to share a few ints across all application processes?
Update:
I know that the web must be stateless. However, for every rule there is an exception. Currently all data is stored in MS SQL, and in this particular case a piece of shared memory would increase performance dramatically and avoid SQL requests for nothing.
I have not used statics for years, so I even missed the moment when the same application started running as multiple instances.
So, the question is: what is the simplest way to share in-memory objects between processes? For now, my workaround is remoting, but there is a lot of extra code and I am not 100% sure about the stability of this approach.
I'm assuming you're new to web programming. One of the key differences in a web application to a regular console or Windows forms application is that it is stateless. This means that every page request is basically initialised from scratch. You're using the database to maintain state, but as you're discovering this is fairly slow. Fortunately you have other options.
If you want to remember something frequently accessed on a per-user basis (say, their username) then you could use session. I recommend reading up on session state here. Be careful, however, not to abuse the session object -- since each user has his or her own copy of session, it can easily use a lot of RAM and cause you more performance problems than your database ever was.
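For example, a minimal sketch of per-user session state (the key name and value are arbitrary):

// Stored for the current user only; other users have their own copies.
Session["Username"] = "alice";

// On a later request from the same user:
var username = (string)Session["Username"];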
If you want to cache information that's relevant across all users of your apps, ASP.NET provides a framework for data caching. The simplest way to use this is like a dictionary, eg:
Cache["item"] = "Some cached data";
I recommend reading in detail about the various options for caching in ASP.NET here.
Overall, though, I recommend you do NOT bother with caching until you are more comfortable with web programming. As with any type of globally shared data, it can cause unpredictable issues which are difficult to diagnose if misused.
So far, there is no easy way to communicate between processes. (And maybe this is good, for isolation and scaling reasons.) For example, this is mentioned explicitly here: ASP.Net static objects
When you really need a web application/service to remember some state in memory, and NOT in a database, you have the following options:
You can set Max Processes count = 1. This requires moving this piece of code into a separate web application. If you make it a separate subdomain, you will have cross-site scripting issues when accessing it from JS.
Remoting/WCF - You can host critical data in a remoting application and access it from the web application.
Store the data in every process and synchronize changes via memcached. Memcached doesn't hold the actual data, because it would take too long to transfer it; it only holds the last-changed date for each collection.
With #3 I am able to achieve more than 100 pages per second from a single server. (A rough sketch of that synchronization pattern follows below.)
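A rough sketch of option #3, using a hypothetical memcached client interface (ICacheClient, its Get/Set methods, the key name and LoadRoomsFromSql are all illustrative, not a specific library's API; in-process locking is omitted for brevity):

private static Dictionary<long, MemoryChatRoom> _localRooms = new Dictionary<long, MemoryChatRoom>();
private static DateTime _localVersion = DateTime.MinValue;

public static Dictionary<long, MemoryChatRoom> GetRooms(ICacheClient memcached)
{
    // memcached holds only the last-changed timestamp, not the data itself.
    var remoteVersion = memcached.Get<DateTime>("chatrooms:version");
    if (remoteVersion > _localVersion)
    {
        _localRooms = LoadRoomsFromSql();   // hypothetical database call
        _localVersion = remoteVersion;
    }
    return _localRooms;
}

public static void TouchRooms(ICacheClient memcached)
{
    // Any process that changes the data bumps the shared timestamp.
    memcached.Set("chatrooms:version", DateTime.UtcNow);
}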
Joe Duffy's article about ReaderWriterLockSlim does not fill me with confidence!
Introducing the new ReaderWriterLockSlim in Orcas
The lock is not robust to asynchronous exceptions such as thread aborts and out of memory conditions. If one of these occurs while in the middle of one of the lock’s methods, the lock state can be corrupt, causing subsequent deadlocks, unhandled exceptions, and (sadly) due to the use of spin locks internally, a pegged 100% CPU.
How can I safely use ReaderWriterLockSlim in ASP.NET?
Is your ASP.NET application regularly encountering thread aborts (from other threads) or trying to survive OutOfMemoryExceptions? If not, I can't see that the post is too worrying... and if it is, I'd argue you've got bigger problems.
In particular, note this bit:
There are some downsides to the new lock, however, that may cause programmers writing hosted or low-level reliability-sensitive code to wait to adopt it.
Don’t get me wrong, most people really don’t need to worry about these topics, so I apologize if my words of warning have scared you off: but those that do really need to be told about the state of affairs.
Now yes, ASP.NET is "hosted", but it's not quite as severe as the SQL Server CLR hosting. I don't think you need to worry.
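For reference, a minimal sketch of the usual try/finally pattern around ReaderWriterLockSlim (the dictionary and field names are arbitrary):

// using System.Threading; using System.Collections.Generic;
private static readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
private static readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

public static string Read(string key)
{
    _lock.EnterReadLock();
    try
    {
        string value;
        return _cache.TryGetValue(key, out value) ? value : null;
    }
    finally
    {
        _lock.ExitReadLock();
    }
}

public static void Write(string key, string value)
{
    _lock.EnterWriteLock();
    try
    {
        _cache[key] = value;
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}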
We are starting to write more and more code for an ASP.Net web application that uses a new thread to complete long-running tasks. I can find no solid documentation that gives any useful guidance on the limitations or restrictions of using threads within IIS (6). Any advice to this end would be appreciated - specifically the following:
What (if any) is the max number of threads
Is there a recommended max number
Are there any pitfalls of using threads within an ASP.Net IIS web application?
Thanks for any advice
I assume you have already looked into Asynchronous ASP.NET page processing?
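A minimal sketch of what that looks like with RegisterAsyncTask in a Web Forms code-behind (the page directive needs Async="true"; the URL and the label control are placeholders):

// using System; using System.IO; using System.Net; using System.Web.UI;
private WebRequest _request;

protected void Page_Load(object sender, EventArgs e)
{
    _request = WebRequest.Create("http://example.com/data");   // placeholder URL
    RegisterAsyncTask(new PageAsyncTask(BeginFetch, EndFetch, null, null));
}

private IAsyncResult BeginFetch(object sender, EventArgs e, AsyncCallback cb, object state)
{
    // The request thread is returned to the pool while the I/O is in flight.
    return _request.BeginGetResponse(cb, state);
}

private void EndFetch(IAsyncResult ar)
{
    using (var response = _request.EndGetResponse(ar))
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        resultLabel.Text = reader.ReadToEnd();   // resultLabel is a placeholder Label control
    }
}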
Improving .NET Application Performance and Scalability
http://msdn.microsoft.com/en-us/library/ms998530.aspx
10 Tips for Writing High-Performance Web Applications
http://msdn.microsoft.com/en-us/magazine/cc163854.aspx
I can find no solid documentation that gives any useful guidance on the limitations or restrictions of using threads within IIS (6).
Mainly because this is a bad idea. Long-running processes should be converted into Windows services which either run continuously and occasionally check the database (or whatever else) for work to do, or which can be woken up by your asp.net app.
I myself have frequently done the same thing. What I found was that there is a maximum based on an "n number of threads per CPU"; these can be adjusted and fine-tuned in the web.config and machine.config files. This post has a reasonable explanation of this.
The recommended maximum would be the default setting, at least according to the documentation I have read from Microsoft on this topic sometime ago.
The biggest pitfall you will find you need to cross is how to report progress or the results back to the user. I typically use a polling mechanism from the client to call back to a page which checks the session state for progress. The session state is of course being updated from the main thread. If you want to see this approach working in real life see the House of Travel website and do a search for flights.
Was going to make this a comment, but I realized it was more relevant than I thought. Eric Lippert has heard this set of questions before, and states that it is unanswerable.
So, in short, don't even go there.
Come up with a design that uses a small number of threads and tune that.
Make sure that you only use threads when you're going to benefit. If your long-running code is CPU-intensive, then you won't actually benefit from making the call asynchronous (in fact, performance will decrease as there is an overhead).
Use threads for I/O operations or calling Web Services.
Each application is different. Simply setting the ThreadPool to max isn't the answer, or it would already be set at this level!
The higher you set the ThreadPool, the more you'll saturate the CPU, so IF you have CPU-intensive code then this will just compound the problem even more.
Of course, you could off-load these CPU-intensive calls onto another machine.
http://msdn.microsoft.com/en-gb/magazine/cc163327.aspx
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx