Making my controller DB calls Async or not - asp.net

I've inherited an MVC web application that uses Dapper on top of ADO.NET to make DB calls in the controller action methods. It's all pretty standard stuff: not many of the controllers are async, and all the database calls go through a repository which is fully synchronous.
I'm moving this to Azure, and I'm using SQL Azure for the database back end. I'm expecting load to be fairly standard - say 500 - 1000 hits per minute.
So, I'm wondering: should I be ploughing through this code to make all my DB calls async, so that I can await them in the controllers? Doing this is going to free up my threads to serve other requests, but I'm wondering whether, in real terms, I'm going to notice any improvement.
I know that previously, it's been noted that if you have a single DB server (as I do), then you won't really see much improvement, because the bottleneck is all on the DB. However, SQL Azure is a slightly different beast, and Azure state that:
"Good practice demands that you use only asynchronous techniques to access Azure services such as SQL Database" (source)
So - is this worth the effort?

Frankly, it's impossible to give any sort of definitive answer here. As Azure states, best practice is to use async with I/O-bound operations, including things like querying a remote database. If you were starting this application from scratch, today, I'd definitely tell you to use async for your database calls.
However, this is not a new application, and it sounds like using async will require quite a bit of surgery at this point. Depending on the load you get, you may not end up seeing any gains for the work, but you might also see great gains. My recommendation is to start small. I would pick out some of the longer-running queries you make, or actions that rely on the database heavily, and start with those. That way you can introduce a bit of async and judge for yourself whether it's worth pursuing further. And, since these are likely to be the bottlenecks of your application anyway, you gain the benefits of async where it will potentially matter the most.
Any new functionality you add should be async from the start, and then, simply when you have the time and inclination, work slowly on converting the whole application.
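To make the "start small and judge for yourself" advice concrete, here is a minimal, language-neutral sketch of what blocking versus awaited calls do to a single worker. Python's asyncio stands in for C# async/await; the 50 ms latency and all function names are assumptions for illustration, not anything taken from the application in the question.

```python
import asyncio
import time

QUERY_LATENCY = 0.05  # assumed round-trip time of one database call, in seconds


def query_sync(request_id):
    # Blocking call: the worker is stuck here for the whole round trip.
    time.sleep(QUERY_LATENCY)
    return f"rows for request {request_id}"


async def query_async(request_id):
    # Awaiting hands the worker back to the event loop while the "database" responds.
    await asyncio.sleep(QUERY_LATENCY)
    return f"rows for request {request_id}"


def serve_sync(n):
    # One worker handles n requests one after another.
    start = time.perf_counter()
    results = [query_sync(i) for i in range(n)]
    return results, time.perf_counter() - start


async def _serve_all(n):
    # All n requests wait on their queries at the same time.
    return await asyncio.gather(*(query_async(i) for i in range(n)))


def serve_async(n):
    start = time.perf_counter()
    results = list(asyncio.run(_serve_all(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, sync_elapsed = serve_sync(10)
    _, async_elapsed = serve_async(10)
    print(f"sync: {sync_elapsed:.2f}s, async: {async_elapsed:.2f}s")
```

Note that the individual query is no faster in the async version; what changes is how many requests one worker can have in flight. That is the trade-off to measure on the handful of actions you convert first.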

What I've learned (and verified through testing) is that you will not see a great improvement on your relatively long SQL calls. But you will see improvement on concurrent short SQL and non-SQL related responses. That is because there is a significant cost to initializing a thread. So reusing the dormant threads that are waiting for SQL does increase performance.
Using async also protects you from going over the "Threads Per Processor Limit" setting in IIS. When that happens, your requests get queued. We experimented with increasing the default value of 25. This did improve performance under high load, but we saw better improvements by changing all our controllers to async.
So I guess the answer to your question is, it depends. If you have a significant number of concurrent requests other than your SQL calls, you should see a noticeable improvement on the response time of those concurrent requests. But you won't see much of an improvement on the relatively long SQL calls.
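A rough way to judge how close you are to that thread limit is Little's law (average threads held = arrival rate x time each request holds a thread). The sketch below applies it to the 500 - 1000 hits per minute from the question; the 200 ms blocked time per request is purely an assumed figure, so substitute your own measurements.

```python
def busy_threads(requests_per_minute, seconds_blocked_per_request):
    """Average number of threads held at once, by Little's law."""
    arrival_rate_per_second = requests_per_minute / 60.0
    return arrival_rate_per_second * seconds_blocked_per_request


if __name__ == "__main__":
    ASSUMED_BLOCKED_SECONDS = 0.2  # assumption: each request blocks ~200 ms on SQL
    for rpm in (500, 1000):
        print(rpm, round(busy_threads(rpm, ASSUMED_BLOCKED_SECONDS), 2))
    # prints:
    # 500 1.67
    # 1000 3.33
```

Under those assumptions the average is far below 25 threads per processor, which fits the "it depends" conclusion: the limit only starts to bite when bursts or slow queries push the number of simultaneously blocked requests much higher than the average.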

Related

WebAPI Lifecycle/Request Queue

I have an AngularJS app that calls WebAPI. If I log the time I initiate a request (in my Angular controller) and log the time OnActionExecuting runs (in an action filter in my WebAPI controller), I notice at times a ~2 second gap. I'm assuming nothing else is running before this filter and this is due to requests being blocked/queued. The reason I assume this is because if I remove all my other data calls, I do not see this gap.
What is the number of parallel requests that WebAPI can handle at once? I tried looking at the ASP.NET performance monitors but couldn't find where I could see this data. Can someone shed some insight into this?
There's no straight answer for this, but the shortest one is ...
There is no limit to this in WebApi itself; the limits come from what your server can handle and how efficient the code you have it run is.
...
But since you asked, let's consider some basic things that we can assume about our server and our application ...
1. Concurrent connections
A typical server is known for issues like "c10k" ... https://en.wikipedia.org/wiki/C10k_problem ... which puts a practical limit on the number of concurrent connections.
Assuming each WebApi call is made from, say, some AJAX call on a web page, that gives us a limit of around 10k connections before things get evil.
2. Dependency-related overheads
If we then consider the complexity of the code in question, you may have a bottleneck in doing things like SQL queries. I have often written WebApi controllers with business logic that runs 10+ DB queries; the overhead here may be your problem.
3. Feed-in overhead
What about network bandwidth to the server?
Let's assume we are streaming 1 MB of data for each call; it won't take long to choke a 1 Gb/s Ethernet line with messages that size.
4. Processing overhead
Assuming you wrote an API that does complex calculations (e.g. mesh generation for complex 3D data), you could easily choke your CPU for some time on each request.
5. Timeouts
Assuming the server could accept your request and the request was made asynchronously, the biggest issue then is: how long are you prepared to wait for your response? If that window is quite short, it limits how much work the server has time to do before each request needs its response.
...
So as you can see, this is by no means an exhaustive list, but it outlines the complexity of the question you asked. That said, I would argue that WebApi (the framework) has no limits of its own; what is possible is determined by the limitations of the infrastructure around it.
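The bandwidth point in the list above can be put into numbers. This is a back-of-the-envelope sketch only: it assumes decimal units (1 MB = 1,000,000 bytes, 1 Gb/s = 1,000,000,000 bits per second) and ignores protocol overhead, so real throughput would be somewhat lower.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # 1 Gb/s Ethernet, decimal units
PAYLOAD_BYTES = 1_000_000             # 1 MB per response, decimal units


def max_responses_per_second(link_bits_per_second, payload_bytes):
    # Each byte costs 8 bits on the wire; headers and retransmits are ignored.
    return link_bits_per_second / (payload_bytes * 8)


if __name__ == "__main__":
    print(max_responses_per_second(LINK_BITS_PER_SECOND, PAYLOAD_BYTES))  # 125.0
```

So roughly 125 such responses per second would saturate the link, long before a 10k connection ceiling comes into play.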

Is there any reason *not* to implement asynchronous ASP.NET web pages in every application?

With regard to the asynchronous ASP.NET web pages article on MSDN:
The advantages are obvious with long-running pages or high server load. So, given projects where you think demand may be high somewhere down the track, when usage grows, is there any reason NOT to implement async ASP.NET in every web application as a standard? Are there any disadvantages to the approach?
Secondary question: are there any real-world studies/examples of where the advantages start to appear, in different web app situations? Or is it just a matter of suck it and see?
From your own link:
Only I/O-bound operations are good candidates for becoming async action methods on an asynchronous controller class. An I/O-bound operation is an operation that doesn’t depend on the local CPU for completion. When an I/O-bound operation is active, the CPU just waits for data to be processed (that is, downloaded) from external storage (a database or a remote service). I/O-bound operations are in contrast to CPU-bound operations, where the completion of a task depends on the activity of the CPU.
Async pages are not free; they come at a price. They are generally good when your page is making an external call to a service or performing some long-running, non-CPU-bound operation. Otherwise, you are likely to thrash the CPU, leaving you with a worse situation than you had before going async.
The idea is to use async when you would be eating up a thread from your application's thread pool doing non-CPU intensive work (waiting for a response from a long-running service). That way, your application can continue processing requests and doesn't start queuing new ones, slowly draining the responsiveness from your app.
Here is another link with information on when/when not to use async pages.
Edit
As for what is considered "long running," you're faced with the crummy answer of "It depends." To figure this out, you would need to profile your application, see how many of your "long running" requests cause subsequent requests to be queued, instead of processed, by IIS. The decision comes down to being in a situation in which paying the costly toll of context switching is less than the return you're going to get for doing so. If your bottleneck is a certain page or service that causes incoming requests to be held off, it is probably a good idea to start thinking about async work. But, you might also be doing too much work in the request and it could be a "code smell" that you need to refactor your code.
In the end, it depends.
Here is an excerpt from MSDN.
In general, use asynchronous pipelines when the following conditions are true:
The operations are network-bound or I/O-bound instead of CPU-bound.
Testing shows that the blocking operations are a bottleneck in site performance and that IIS can service more requests by using asynchronous action methods for these blocking calls.
Parallelism is more important than simplicity of code.
You want to provide a mechanism that lets users cancel a long-running request.
While the link is about MVC, the idea holds true for other flavors of ASP.NET, too.
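The I/O-bound versus CPU-bound distinction is easy to see in a small experiment. The sketch below uses Python's asyncio purely as a stand-in for any async pipeline (the worker names and workload sizes are made up): two I/O-bound workers overlap, while two CPU-bound workers run strictly one after the other because they never give the thread back.

```python
import asyncio


async def io_bound(name, log):
    log.append(f"{name} start")
    await asyncio.sleep(0.01)  # stands in for waiting on a database or remote service
    log.append(f"{name} end")


async def cpu_bound(name, log):
    log.append(f"{name} start")
    sum(i * i for i in range(200_000))  # pure computation: never yields to the event loop
    log.append(f"{name} end")


def run_pair(worker):
    """Run two copies of the worker 'concurrently' and return the order of events."""
    log = []

    async def main():
        await asyncio.gather(worker("A", log), worker("B", log))

    asyncio.run(main())
    return log


if __name__ == "__main__":
    print(run_pair(io_bound))   # both workers start before either one ends
    print(run_pair(cpu_bound))  # ['A start', 'A end', 'B start', 'B end']
```

Marking the CPU-bound worker as async bought nothing: the work is still serialized, and you pay the extra machinery on top.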

How to "warm-up" Entity Framework? When does it get "cold"?

No, the answer to my second question is not the winter.
Preface:
I've been doing a lot of research on Entity Framework recently and something that keeps bothering me is its performance when the queries are not warmed-up, so called cold queries.
I went through the performance considerations article for Entity Framework 5.0. The authors introduced the concept of Warm and Cold queries and how they differ, which I had also noticed myself without knowing of their existence. Here it's probably worth mentioning that I only have six months of experience behind me.
Now I know what topics I can research into additionally if I want to understand the framework better in terms of performance. Unfortunately most of the information on the Internet is outdated or bloated with subjectivity, hence my inability to find any additional information on the Warm vs Cold queries topic.
Basically what I've noticed so far is that whenever I have to recompile or the recycling hits, my initial queries are getting very slow. Any subsequent data read is fast (subjective), as expected.
We'll be migrating to Windows Server 2012, IIS8 and SQL Server 2012 and as a Junior I actually won myself the opportunity to test them before the rest. I'm very happy they introduced a warming-up module that will get my application ready for that first request. However, I'm not sure how to proceed with warming up my Entity Framework.
What I already know is worth doing:
Generate my Views in advance as suggested.
Eventually move my models into a separate assembly.
What I consider doing, by going with common sense, probably wrong approach:
Doing dummy data reads at Application Start in order to warm things up, generate and validate the models.
Questions:
What would be the best approach to have high availability on my Entity Framework at anytime?
In what cases does the Entity Framework get "cold" again? (Recompilation, Recycling, IIS Restart etc.)
What would be the best approach to have high availability on my Entity Framework at anytime?
You can go for a mix of pregenerated views and static compiled queries.
Static CompiledQuery instances are good because they're quick and easy to write and help increase performance. However, with EF5 it isn't necessary to compile all your queries, since EF will auto-compile queries itself. The only problem is that these queries can get lost when the cache is swept. So you still want to hold references to your own compiled queries for those that occur only rarely but are expensive. If you put those queries into static classes, they will be compiled when they're first required. This may be too late for some queries, so you may want to force compilation of these queries during application startup.
Pregenerating views is the other possibility as you mention. Especially, for those queries that take very long to compile and that don't change. That way you move the performance overhead from runtime to compile time. Also this won't introduce any lag. But of course this change goes through to the database, so it's not so easy to deal with. Code is more flexible.
Do not use a lot of TPT inheritance (that's a general performance issue in EF). Build your inheritance hierarchies neither too deep nor too wide. Only 2-3 properties specific to some class may not be enough to justify a type of its own; they could be handled as optional (nullable) properties on an existing type.
Don't hold on to a single context for a long time. Each context instance has its own first level cache which slows down the performance as it grows larger. Context creation is cheap, but the state management inside the cached entities of the context may become expensive. The other caches (query plan and metadata) are shared between contexts and will die together with the AppDomain.
All in all you should make sure to allocate contexts frequently and use them only for a short time, that you can start your application quickly, that you compile queries that are rarely used and provide pregenerated views for queries that are performance critical and often used.
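The "hold references to compiled queries and force compilation at startup" idea can be sketched independently of EF. What follows is an analogy only: `QueryRegistry`, `compile_fn`, and the query names are hypothetical and are not Entity Framework APIs. The point is the shape: compiled artifacts live in one long-lived place, and `warm_up()` pays the compile cost before the first real request.

```python
class QueryRegistry:
    """Long-lived holder for expensive-to-build query objects (hypothetical sketch)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # the expensive step, e.g. translating a query
        self._sources = {}          # name -> query definition
        self._compiled = {}         # name -> compiled artifact, kept for the process lifetime
        self.compile_count = 0      # exposed only so the effect is observable

    def register(self, name, source):
        self._sources[name] = source

    def get(self, name):
        # Lazy path: compile on first use, then reuse the stored result.
        if name not in self._compiled:
            self.compile_count += 1
            self._compiled[name] = self._compile(self._sources[name])
        return self._compiled[name]

    def warm_up(self):
        # Eager path: call once at application start so no request pays the cost.
        for name in self._sources:
            self.get(name)


if __name__ == "__main__":
    registry = QueryRegistry(compile_fn=str.upper)  # str.upper is a cheap stand-in
    registry.register("orders_by_customer", "select ... where customer = ?")
    registry.warm_up()
    registry.get("orders_by_customer")
    print(registry.compile_count)  # 1
```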
In what cases does the Entity Framework get "cold" again? (Recompilation, Recycling, IIS Restart etc.)
Basically, every time you lose your AppDomain. By default, IIS recycles the application pool every 29 hours, so you can never guarantee that you'll have your instances around. The AppDomain is also shut down after some time without activity. You should attempt to come up quickly again. Maybe you can do some of the initialization asynchronously (but beware of multi-threading issues). You can use scheduled tasks that call dummy pages in your application during times when there are no requests to prevent the AppDomain from dying, but it will die eventually.
I also assume when you change your config file or change the assemblies there's going to be a restart.
If you are looking for maximum performance across all calls you should consider your architecture carefully. For instance, it might make sense to pre-cache often used look-ups in server RAM when the application loads up instead of using database calls on every request. This technique will ensure minimum application response times for commonly used data. However, you must be sure to have a well behaved expiration policy or always clear your cache whenever changes are made which affect the cached data to avoid issues with concurrency.
In general, you should strive to design distributed architectures to only require IO-based data requests when the locally cached information becomes stale, or needs to be transactional. Any "over the wire" data request will normally take 10-1000 times longer than a local, in-memory cache retrieval. This one fact alone often makes discussions about "cold vs. warm data" inconsequential in comparison to the "local vs. remote" data issue.
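A minimal sketch of such a local look-up cache with both an expiration policy and explicit invalidation might look like this. The class and parameter names are illustrative assumptions; the clock is injectable only so the behaviour can be checked without waiting, and a production version would also need locking if it is shared between request threads.

```python
import time


class LookupCache:
    """In-memory cache that reloads an entry once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader    # function that performs the remote/database read
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (value, time_loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]              # fresh: served from RAM
        value = self._loader(key)        # missing or stale: go over the wire
        self._entries[key] = (value, now)
        return value

    def invalidate(self, key):
        # Call this whenever the underlying data is changed.
        self._entries.pop(key, None)
```

A typical use would be `countries = LookupCache(load_countries_from_db, ttl_seconds=300)`, where `load_countries_from_db` is whatever function currently performs the query.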
General tips.
Perform rigorous logging including what is accessed and request time.
Perform dummy requests when initializing your application to warm boot very slow requests that you pick up from the previous step.
Don't bother optimizing unless it's a real problem; communicate with the consumer of the application and ask. Get comfortable having a continuous feedback loop, if only to figure out what needs optimization.
Now to explain why dummy requests are not the wrong approach.
Less Complexity - You are warming up the application in a manner that will work regardless of changes in the framework, and you don't need to figure out possibly funky APIs/framework internals to do it the right way.
Greater Coverage - You are warming up all layers of caching at once related to the slow request.
To explain when a cache gets "Cold".
This happens at any layer in your framework that applies a cache, there is a good description at the top of the performance page.
Whenever a cache has to be validated after a potential change that makes the cache stale; this could be a timeout or something more intelligent (i.e. a change in the cached item).
When a cache item is evicted. The algorithm for doing this is described in the section "Cache eviction algorithm" in the performance article you linked, but in short:
LFRU (Least frequently - recently used) cache on hit count and age with a limit of 800 items.
The other things you mentioned, specifically recompilation and restarting of IIS clear either parts or all of the in memory caches.
As you have stated, use "pre-generated views". That's really all you need to do.
Extracted from your link:
"When views are generated, they are also validated. From a performance standpoint, the vast majority of the cost of view generation is actually the validation of the views"
This means the performance knock will take place when you build your model assembly. Your context object will then skip the "cold query" and stay responsive for the duration of the context object life cycle as well as subsequent new object contexts.
Executing irrelevant queries will serve no other purpose than to consume system resources.
The shortcut ...
Skip all that extra work of pre-generated views
Create your object context
Fire off that sweet irrelevant query
Then just keep a reference to your object context for the duration of your process
(not recommended).
I have no experience in this framework. But in other contexts, e.g. Solr, completely dummy reads will not be of much use unless you can cache the whole DB (or index).
A better approach would be to log the queries, extract the most common ones out of the logs and use them to warm up. Just be sure not to log the warm up queries or remove them from the logs before proceeding.
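The log-driven approach can be sketched in a few lines. This assumes one query per log line and a hypothetical `/* warmup */` marker that you would add to warm-up queries yourself so they can be filtered out; neither is a feature of any particular framework.

```python
from collections import Counter

WARMUP_TAG = "/* warmup */"  # hypothetical marker appended to warm-up queries


def top_queries(log_lines, n):
    """Return the n most frequent real queries, ignoring blanks and warm-up traffic."""
    counts = Counter(
        line.strip()
        for line in log_lines
        if line.strip() and WARMUP_TAG not in line
    )
    return [query for query, _ in counts.most_common(n)]


if __name__ == "__main__":
    log = [
        "SELECT * FROM Orders WHERE CustomerId = @id",
        "SELECT * FROM Products",
        "SELECT * FROM Orders WHERE CustomerId = @id",
        "SELECT * FROM Products /* warmup */",
        "SELECT * FROM Orders WHERE CustomerId = @id",
        "SELECT * FROM Products",
        "SELECT * FROM Customers",
    ]
    print(top_queries(log, 2))
    # ['SELECT * FROM Orders WHERE CustomerId = @id', 'SELECT * FROM Products']
```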

Adding more hardware v/s refactoring code under a time crunch

Background:
Enterprise application - very well written for its time in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows users to go through various "wizards", for lack of a better term. All of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL Server database very frequently because we allow users to pause/resume their application. Often in these wizards, the XML that makes up the wizard state grows very large (I'm talking 5-8 MB of data), and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database. A lot of the wizard state consists of collections of "things", and sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For argument's sake, because of the complexity of the application, let's say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front-end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong and it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the XML bloat that we persist frequently to the database. In the long run, reducing the size of the XML that we pass back and forth is probably the right idea, but will adding additional web servers truly have no effect? In terms of simultaneous users, I just think it should help.
Any responses or thoughts are appreciated; proof that more web servers would help would be pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
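As a rough illustration of the compression idea, the sketch below compresses a repetitive XML document before it would be written and restores it on read. The element names and the 5000-item sample are invented; real wizard state will compress differently, so measure with your own payloads.

```python
import zlib


def pack_state(xml_text):
    # Compress on the web server so fewer bytes travel to and sit in the database.
    return zlib.compress(xml_text.encode("utf-8"))


def unpack_state(blob):
    return zlib.decompress(blob).decode("utf-8")


# Invented sample: a large collection of near-identical "things".
sample = (
    "<wizState>"
    + "".join(
        f"<thing><id>{i}</id><status>pending</status></thing>" for i in range(5000)
    )
    + "</wizState>"
)

if __name__ == "__main__":
    blob = pack_state(sample)
    print(len(sample.encode("utf-8")), "bytes raw,", len(blob), "bytes compressed")
```

The trade is CPU time on the web tier for I/O on the database tier, which only pays off if the database really is the constrained resource.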
If the bottleneck is the database, then more web services will not help you a lot.
The problem may be not only the size of the data, but also the number of concurrent requests to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries, you may try to break the XML write out of that transaction to reduce the locking time on the XML table.
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
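One simple way to avoid writing unchanged data is to remember a digest of what was last persisted and skip the write when it matches. The sketch below is an assumption-laden illustration: `StateWriter`, `write_fn`, and the session ids are hypothetical, and the digests are kept in process memory, so with several web servers each one would only know about its own writes.

```python
import hashlib


class StateWriter:
    """Skips the database write when the serialized state has not changed."""

    def __init__(self, write_fn):
        self._write = write_fn    # function that actually persists the state
        self._last_digest = {}    # session id -> digest of the last persisted state

    def save(self, session_id, xml_text):
        digest = hashlib.sha256(xml_text.encode("utf-8")).hexdigest()
        if self._last_digest.get(session_id) == digest:
            return False          # nothing changed: no round trip to SQL Server
        self._write(session_id, xml_text)
        self._last_digest[session_id] = digest
        return True
```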
No one seems to have suggested this: what about replacing the XML serialization of your wizard state with JSON serialization?
Not only should this give you a minor boost in performance in the serialization itself, since both the DataContractSerializer (faster) and Newtonsoft Json.NET (fastest) outperform the XML serializers in .NET, it should also easily reduce the size of your object graph by upwards of 50% (depending on the number of properties vs. large strings in the XML).
This should dramatically lower the IO that is inflicted upon Sql server. This should also limit the amount of scope required to alter your application significantly (assuming it's well designed and works through common calls for serialization/deserialization).
If you choose to go this route also invest time comparing BSON vs JSON as I think it would be likely that the binary encoded one will offer even more space savings (and further IO reduction) due to the size of your object graphs.
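To get a feel for the size difference before committing to the change, you can serialize the same collection both ways and compare. The sketch below uses an invented collection of small objects with element-per-property XML; the actual saving depends entirely on the shape of your state, so treat this as a measuring approach rather than a prediction.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(items):
    # Element-per-property layout: every value pays for an opening and closing tag.
    root = ET.Element("things")
    for item in items:
        node = ET.SubElement(root, "thing")
        for key, value in item.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_json(items):
    # Compact separators avoid the default spaces after ',' and ':'.
    return json.dumps(items, separators=(",", ":"))


# Invented sample data standing in for a wizard-state collection.
items = [{"id": i, "name": f"thing{i}", "selected": i % 2 == 0} for i in range(1000)]

if __name__ == "__main__":
    print(len(to_xml(items)), "characters as XML")
    print(len(to_json(items)), "characters as JSON")
```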
I'm not a .NET expert but maybe using a binary serialization would increase throughput. Making sure that the XML isn't stored as text (fairly obvious but thought I'd mention it). Also relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on: without understanding the resource constraint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom and are never imposed by SQL Server. They're an artifact of your driver configuration, or of a firewall or similar device between app and DB imposing them (unless you're talking about timeouts for new connections, in which case you have a host in serious distress under load).
Given your symptom is database timeouts, you need to start there. If they're indicative of long-running statements that result in a socket timeout, use SQL Server Profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be related to statement tuning; it probably boils down to resource limitations: CPU, memory or disk IO capacity.
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that, given a database performance issue, more application servers are likely to worsen the problem, as you increase the amount of concurrency that might otherwise be kept in check by connection pool, request processing or other limits.

Do ASP.NET developers really need to be concerned with thread safety?

I consider myself aware of the concepts of threading and why certain code is or isn’t “thread-safe,” but as someone who primarily works with ASP.NET, threading and thread safety is something I rarely think about. However, I seem to run across numerous comments and answers (not necessarily for ASP.NET) on Stack Overflow to the effect of “warning – that’s not thread-safe!,” and it tends to make me second guess whether I’ve written similar code that actually could cause a problem in my applications. [shock, horror, etc.] So I’m compelled to ask:
Do ASP.NET developers really need to be concerned with thread safety?
My Take: While a web application is inherently multi-threaded, each particular request comes in on a single thread, and all non-static types you create, modify, or destroy are exclusive to that single thread/request. If the request creates an instance of a DAL object which creates an instance of a business object and I want to lazy-initialize a collection within this object, it doesn’t matter if it’s not thread-safe, because it will never be touched by another thread. ...Right? (Let’s assume I’m not starting a new thread to kick off a long-running asynchronous process during the request. I’m well aware that changes everything.)
Of course, static classes, methods and variables are just the opposite. They are shared by every request, and the developer must be very careful not to have “unsafe” code that when executed by one user, can have an unintended effect on all others.
But that’s about it, and thus thread safety in ASP.NET mostly boils down to this: Be careful how you design and use statics. Other than that, you don’t need to worry about it much at all.
Am I wrong about any of this? Do you disagree? Enlighten me!
There are certain objects in addition to static items that are shared across all requests to an application. Be careful about putting items in the application cache that are not thread-safe, for example. Also, nothing prevents you from spawning your own threads for background processing while handling a request.
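The shared-object case can be illustrated with a small sketch. Python threads stand in for concurrent requests here, and `HitCounter` is a made-up example of an object that lives in application-wide state: because every request thread touches the same instance, its read-modify-write needs a lock.

```python
import threading


class HitCounter:
    """Application-wide object shared by all request threads (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = {}

    def record(self, page):
        # Read-modify-write on shared state: must not interleave between threads.
        with self._lock:
            self._hits[page] = self._hits.get(page, 0) + 1

    def total(self, page):
        with self._lock:
            return self._hits.get(page, 0)


def simulate(counter, threads=8, hits_per_thread=1000):
    """Pretend each thread is a stream of requests hitting the same page."""

    def worker():
        for _ in range(hits_per_thread):
            counter.record("/home")

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


if __name__ == "__main__":
    counter = HitCounter()
    simulate(counter)
    print(counter.total("/home"))  # 8000
```

Objects created inside a single request and never published to shared state need none of this, which is the asker's point.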
There are different levels of ASP.NET Developers. You could make a perfectly fine career as an ASP.NET Developer without knowing anything about threads, mutexes, locks, semaphores and even design patterns, because a high percentage of ASP.NET applications are basically CRUD applications with little to no additional business logic.
However, most great ASP.NET Developers which I have come across aren't just ASP.NET Developers, their skills run the gamut so they know all about threading and other good stuff because they don't limit themselves to ASP.NET.
So no, for the most part ASP.NET Developers do not need to know about thread safety. But what fun is there in only knowing the bare minimum?
Only if you create, within the processing stream for a single HTTPRequest, multiple threads of your own... For example, if the web page will display stock quotes for a set of stocks, and you make separate calls to a stock quote service, on independent threads, to retrieve the quotes before generating the page to send back to the client... Then you would have to make sure that the code you are running in your threads is thread-safe.
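A sketch of that stock-quote scenario, written so that the per-thread code stays thread-safe by construction: each worker returns its own result and nothing mutable is shared between threads. `fetch_quote` and its prices are hypothetical stand-ins for the remote quote service.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_quote(symbol):
    # Hypothetical stand-in for a call to a remote stock quote service.
    fake_prices = {"AAA": 10.0, "BBB": 20.5, "CCC": 7.25}
    return symbol, fake_prices[symbol]


def fetch_all(symbols):
    # Each call runs on a pool thread and only returns a value; the results are
    # combined on the calling thread, so no locking is needed.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(fetch_quote, symbols))


if __name__ == "__main__":
    print(fetch_all(["AAA", "BBB", "CCC"]))
    # {'AAA': 10.0, 'BBB': 20.5, 'CCC': 7.25}
```

If the workers instead appended to a shared list or updated a shared dictionary, that shared object is where the thread-safety concern would move to.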
I believe you covered it all very well. I agree with you. Being focused on ASP.NET only it rarely (if at all) comes to multi-threading issues.
The situation changes, however, when it comes to optimizations. Whenever you start a long-lasting query, you may often want to let it run in a separate thread so that the page load does not stall until the server reports a connection timeout. You may wish to have this page periodically check for completion status to notify the user. This is where multi-threading issues come in.
