How to "warm-up" Entity Framework? When does it get "cold"? - asp.net

No, the answer to my second question is not the winter.
Preface:
I've been doing a lot of research on Entity Framework recently, and something that keeps bothering me is its performance when the queries are not warmed up, the so-called cold queries.
I went through the performance considerations article for Entity Framework 5.0. The authors introduced the concept of Warm and Cold queries and how they differ, which I also noticed myself without knowing of their existence. Here it's probably worth mentioning that I only have six months of experience under my belt.
Now I know what topics I can research into additionally if I want to understand the framework better in terms of performance. Unfortunately most of the information on the Internet is outdated or bloated with subjectivity, hence my inability to find any additional information on the Warm vs Cold queries topic.
Basically what I've noticed so far is that whenever I have to recompile or the recycling hits, my initial queries are getting very slow. Any subsequent data read is fast (subjective), as expected.
We'll be migrating to Windows Server 2012, IIS8 and SQL Server 2012 and as a Junior I actually won myself the opportunity to test them before the rest. I'm very happy they introduced a warming-up module that will get my application ready for that first request. However, I'm not sure how to proceed with warming up my Entity Framework.
What I already know is worth doing:
Generate my Views in advance as suggested.
Eventually move my models into a separate assembly.
What I'm considering doing, going by common sense (probably the wrong approach):
Doing dummy data reads at Application Start in order to warm things up, generate and validate the models.
Questions:
What would be the best approach to have high availability on my Entity Framework at anytime?
In what cases does Entity Framework get "cold" again? (Recompilation, Recycling, IIS Restart, etc.)

What would be the best approach to have high availability on my Entity Framework at anytime?
You can go for a mix of pregenerated views and static compiled queries.
Static CompiledQuery instances are good because they're quick and easy to write and help increase performance. However, with EF5 it isn't necessary to compile all your queries, since EF will auto-compile queries itself. The only problem is that those auto-compiled queries can get lost when the cache is swept. So you still want to hold references to your own compiled queries for those that occur only rarely but are expensive. If you put those queries into static classes, they will be compiled when they're first required. This may be too late for some queries, so you may want to force compilation during application startup.
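A minimal sketch of holding such a query in a static field and warming it at startup; this assumes an ObjectContext-derived MyContext with an Orders set (CompiledQuery works against ObjectContext, so with DbContext you would have to drop down to the underlying ObjectContext):

    using System;
    using System.Data.Objects;   // EF5 / ObjectContext; the type moved to System.Data.Entity.Core.Objects in EF6
    using System.Linq;

    public static class Queries
    {
        // Keeping the delegate in a static readonly field means the reference
        // cannot be lost when EF's internal query cache is swept.
        public static readonly Func<MyContext, int, IQueryable<Order>> OrdersForCustomer =
            CompiledQuery.Compile((MyContext ctx, int customerId) =>
                ctx.Orders.Where(o => o.CustomerId == customerId));
    }

    // To force the work at startup rather than on the first real request,
    // execute the query once in Application_Start:
    //
    //   using (var ctx = new MyContext())
    //   {
    //       Queries.OrdersForCustomer(ctx, -1).ToList();   // dummy execution warms the translation
    //   }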
Pregenerating views is the other possibility as you mention. Especially, for those queries that take very long to compile and that don't change. That way you move the performance overhead from runtime to compile time. Also this won't introduce any lag. But of course this change goes through to the database, so it's not so easy to deal with. Code is more flexible.
Do not use a lot of TPT inheritance (that's a general performance issue in EF). Build your inheritance hierarchies neither too deep nor too wide. Only 2-3 properties specific to some class may not be enough to justify a type of their own; they could be handled as optional (nullable) properties on an existing type.
Don't hold on to a single context for a long time. Each context instance has its own first level cache which slows down the performance as it grows larger. Context creation is cheap, but the state management inside the cached entities of the context may become expensive. The other caches (query plan and metadata) are shared between contexts and will die together with the AppDomain.
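For illustration, the short-lived-context pattern is just this (sketch; a DbContext-based MyContext with a Customers set is assumed):

    // Create, use, dispose: the first-level cache dies with the context,
    // while the shared query-plan and metadata caches survive in the AppDomain.
    public Customer LoadCustomer(int id)
    {
        using (var ctx = new MyContext())
        {
            return ctx.Customers.Find(id);
        }
    }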
All in all, you should make sure to allocate contexts frequently and use them only for a short time, start your application quickly, compile queries that are rarely used, and provide pregenerated views for queries that are performance critical and often used.
In what cases does Entity Framework get "cold" again? (Recompilation, Recycling, IIS Restart, etc.)
Basically, every time you lose your AppDomain. IIS performs restarts every 29 hours, so you can never guarantee that you'll have your instances around. Also, after some time without activity the AppDomain is shut down. You should attempt to come up quickly again. Maybe you can do some of the initialization asynchronously (but beware of multi-threading issues). You can use scheduled tasks that call dummy pages in your application during times when there are no requests to prevent the AppDomain from dying, but it will die eventually.
I also assume when you change your config file or change the assemblies there's going to be a restart.

If you are looking for maximum performance across all calls you should consider your architecture carefully. For instance, it might make sense to pre-cache often used look-ups in server RAM when the application loads up instead of using database calls on every request. This technique will ensure minimum application response times for commonly used data. However, you must be sure to have a well behaved expiration policy or always clear your cache whenever changes are made which affect the cached data to avoid issues with concurrency.
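As a rough sketch of that kind of in-memory look-up cache with an expiration policy, using System.Runtime.Caching (the cache key and loader are made up):

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    public static class LookupCache
    {
        // One shared in-memory copy per AppDomain.
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static IList<string> GetCountryCodes()
        {
            var codes = Cache.Get("CountryCodes") as IList<string>;
            if (codes == null)
            {
                codes = LoadCountryCodesFromDatabase();
                Cache.Set("CountryCodes", codes, new CacheItemPolicy
                {
                    // Expiration keeps the cached copy from going permanently stale.
                    AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(30)
                });
            }
            return codes;
        }

        private static IList<string> LoadCountryCodesFromDatabase()
        {
            // Placeholder for the real database query.
            return new List<string> { "US", "DE", "JP" };
        }
    }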
In general, you should strive to design distributed architectures to only require IO based data requests when the locally cached information becomes stale, or needs to be transactional. Any "over the wire" data request will normally take 10-1000 times longer to retrieve than a local, in-memory cache retrieval. This one fact alone often makes discussions about "cold vs. warm data" inconsequential in comparison to the "local vs. remote" data issue.

General tips.
Perform rigorous logging including what is accessed and request time.
Perform dummy requests when initializing your application to warm boot very slow requests that you pick up from the previous step.
Don't bother optimizing unless it's a real problem, communicate with the consumer of the application and ask. Get comfortable having a continuous feedback loop if only to figure out what needs optimization.
Now to explain why dummy requests are not the wrong approach.
Less Complexity - You are warming up the application in a manner that will work regardless of changes in the framework, and you don't need to figure out possibly funky APIs/framework internals to do it the right way.
Greater Coverage - You are warming up all layers of caching at once related to the slow request.
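As a sketch, the dummy-request warm-up can be as simple as running a handful of the slow queries you identified through logging once at startup (MyContext and the query shapes are placeholders):

    // Global.asax.cs (sketch)
    using System;
    using System.Linq;
    using System.Web;

    public class Global : HttpApplication
    {
        protected void Application_Start(object sender, EventArgs e)
        {
            // Fire the known-slow queries once so view generation, query
            // compilation and the other caches are primed before the first
            // real request hits the site.
            using (var ctx = new MyContext())
            {
                ctx.Customers.Any();
                ctx.Orders.Where(o => o.CreatedOn > DateTime.UtcNow.AddDays(-1)).Any();
            }
        }
    }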
To explain when a cache gets "Cold".
This happens at any layer in your framework that applies a cache; there is a good description at the top of the performance page.
Whenever a cache has to be invalidated after a potential change that makes it stale; this could be triggered by a timeout or by something more intelligent (i.e. a change in the cached item).
When a cache item is evicted; the algorithm for this is described in the "Cache eviction algorithm" section of the performance article you linked, but in short:
LFRU (Least frequently - recently used) cache on hit count and age with a limit of 800 items.
The other things you mentioned, specifically recompilation and restarting of IIS, clear either parts or all of the in-memory caches.

As you have stated, use "pre-generated views"; that's really all you need to do.
Extracted from your link:
"When views are generated, they are also validated. From a performance standpoint, the vast majority of the cost of view generation is actually the validation of the views"
This means the performance knock will take place when you build your model assembly. Your context object will then skip the "cold query" and stay responsive for the duration of the context object life cycle as well as subsequent new object contexts.
Executing irrelevant queries will serve no other purpose than to consume system resources.
The shortcut ...
Skip all that extra work of pre-generated views
Create your object context
Fire off that sweet irrelevant query
Then just keep a reference to your object context for the duration of your process
(not recommended).

I have no experience in this framework. But in other contexts, e.g. Solr, completely dummy reads will not be of much use unless you can cache the whole DB (or index).
A better approach would be to log the queries, extract the most common ones out of the logs and use them to warm up. Just be sure not to log the warm up queries or remove them from the logs before proceeding.

Related

Adding more hardware v/s refactoring code under a time crunch

Background:
Enterprise application - very well written for its time in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows user to go through various wizards for lack of a better term, all of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL server database very frequently because we allow users to pause/resume their application. Often in these wizards, the XML that comprises the wizard state grows very large, I'm talking 5-8 MB of data, and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database, because a lot of what the wizard state is comprised of, is keeping track of collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For argument's sake, because of the complexity of the application, let's say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front-end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong: it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the XML bloat that we persist frequently to the database. In the long run, reducing the size of the XML that we pass back and forth is probably the right idea, but will adding additional web servers truly have no effect? In terms of simultaneous users, I would think it should help.
Any responses/thoughts are appreciated; proof that more web servers would help would be pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
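For example, a rough sketch of compressing the serialized wizard state with GZip before it goes over the wire (the resulting byte[] would go into the existing image/varbinary column; names are made up):

    using System.IO;
    using System.IO.Compression;

    public static class WizardStateCompressor
    {
        public static byte[] Compress(byte[] serializedState)
        {
            using (var output = new MemoryStream())
            {
                using (var gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(serializedState, 0, serializedState.Length);
                }
                // Disposing the GZipStream flushes the compressed data into 'output'.
                return output.ToArray();
            }
        }

        public static byte[] Decompress(byte[] storedState)
        {
            using (var input = new MemoryStream(storedState))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                gzip.CopyTo(output);
                return output.ToArray();
            }
        }
    }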
If the bottleneck is the database, then more web services will not help you a lot.
The problem may not only be the size of the data, but also the number of concurrent requests to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries, you may try to break the XML write out of that transaction to reduce locking time on the XML table.
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
No one seems to have suggested this: what about replacing the XML serialization of your wizard state with JSON serialization?
This should give you a minor boost in performance in the serialization itself, since both the DataContractJsonSerializer (faster) and Newtonsoft Json.NET (fastest) outperform the XML serializers in .NET, and it should easily reduce the size of your object graph by upwards of 50% or more (depending on the ratio of properties to large strings in the XML).
This should dramatically lower the IO inflicted on SQL Server. It should also limit how much of your application you need to change (assuming it's well designed and funnels serialization/deserialization through common calls).
If you choose to go this route also invest time comparing BSON vs JSON as I think it would be likely that the binary encoded one will offer even more space savings (and further IO reduction) due to the size of your object graphs.
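A rough sketch of the swap, assuming the Newtonsoft Json.NET package and a WizardState class behind the current XML blob (both are placeholders for whatever your wizard actually uses):

    using Newtonsoft.Json;

    public static class WizardStateSerializer
    {
        public static string Serialize(WizardState state)
        {
            // Typically noticeably smaller than the equivalent XML, since element
            // and attribute names dominate the size of property-heavy graphs.
            return JsonConvert.SerializeObject(state);
        }

        public static WizardState Deserialize(string json)
        {
            return JsonConvert.DeserializeObject<WizardState>(json);
        }
    }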
I'm not a .NET expert but maybe using a binary serialization would increase throughput. Making sure that the XML isn't stored as text (fairly obvious but thought I'd mention it). Also relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on; without understanding the resource constraint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom and are never imposed by SQL Server; they're an artifact of your driver configuration or a firewall or similar device between app and db imposing them (unless you're talking about timeouts for new connections, in which case you have a host in serious distress under load).
Given your symptom is database timeouts, you need to start there. If they're indicative of long-running statements that result in a socket timeout, use SQL Server Profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement-tuning related; it probably boils down to resource limitations: CPU, memory, or disk IO capacity.
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that, given a database performance issue, more application servers are likely to worsen the problem as you increase the amount of concurrency that might otherwise be kept in check by connection pool, request processing, or other limits.

Combining Session and Cache

To make my extranet web application even faster/more scalable I think of using some of the caching mechanisms. For some of the pages we'll use HTML caching, please ignore that technique for this question.
E.g.: at some point in time 2500 managers will simultaneously login on our application
(most of them with the same Account/Project)
I think of storing an Account-cachekey and Project-cachekey into the user's Session and use that to get the item from the Cache.
I could have simply stored the Account in the session, but that would result in 2500 copies of the same Account in memory.
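Roughly what I have in mind (a sketch; the Account class and key naming are simplified):

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class AccountCache
    {
        // Store only the key in Session; keep the single shared Account in Cache.
        public static void StoreInSession(HttpContext ctx, Account account)
        {
            string accountKey = "Account_" + account.Id;
            ctx.Session["AccountKey"] = accountKey;

            if (ctx.Cache[accountKey] == null)
            {
                // One copy shared by every manager on the same account.
                ctx.Cache.Insert(accountKey, account, null,
                    DateTime.UtcNow.AddHours(1), Cache.NoSlidingExpiration);
            }
        }

        public static Account GetCurrent(HttpContext ctx)
        {
            var key = (string)ctx.Session["AccountKey"];
            return key == null ? null : (Account)ctx.Cache[key];
        }
    }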
Is there a better solution to this or does it make sense :)?
Generally, adding items to session is seen as having a negative impact on scalability. Depending on your technology, you may have a problem scaling out to more than one server when using session variables (e.g. in classic ASP).
Having said that, if performance is your top priority you could cache data in both session and application variables. I have always thought that it's not worth the hassle for a dataset of this size, because SQL Server will almost certainly have this data cached in memory and all you are saving is a network round trip.
Lastly, look at code and hardware optimisation first for performance enhancements. Migrating to managed/compiled code, reducing the size of your html, optimising your images, minifying JavaScript, and of course the html caching you mentioned previously - these are all things I would consider first.

Do ASP.NET developers really need to be concerned with thread safety?

I consider myself aware of the concepts of threading and why certain code is or isn’t “thread-safe,” but as someone who primarily works with ASP.NET, threading and thread safety is something I rarely think about. However, I seem to run across numerous comments and answers (not necessarily for ASP.NET) on Stack Overflow to the effect of “warning – that’s not thread-safe!,” and it tends to make me second guess whether I’ve written similar code that actually could cause a problem in my applications. [shock, horror, etc.] So I’m compelled to ask:
Do ASP.NET developers really need to be concerned with thread safety?
My Take: While a web application is inherently multi-threaded, each particular request comes in on a single thread, and all non-static types you create, modify, or destroy are exclusive to that single thread/request. If the request creates an instance of a DAL object which creates an instance of a business object and I want to lazy-initialize a collection within this object, it doesn’t matter if it’s not thread-safe, because it will never be touched by another thread. ...Right? (Let’s assume I’m not starting a new thread to kick off a long-running asynchronous process during the request. I’m well aware that changes everything.)
Of course, static classes, methods and variables are just the opposite. They are shared by every request, and the developer must be very careful not to have “unsafe” code that when executed by one user, can have an unintended effect on all others.
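For instance, a lazily initialized static shared by all requests has to be built safely; a sketch (the lookup data and names are made up):

    using System;
    using System.Collections.Generic;

    public static class SharedLookups
    {
        // Lazy<T> with its default thread-safety mode guarantees the factory
        // runs exactly once, even if many requests hit it at the same moment.
        private static readonly Lazy<Dictionary<string, decimal>> Rates =
            new Lazy<Dictionary<string, decimal>>(LoadRates);

        public static decimal GetRate(string code)
        {
            // Safe for concurrent readers once built, as long as nothing mutates it.
            return Rates.Value[code];
        }

        private static Dictionary<string, decimal> LoadRates()
        {
            // Placeholder for the real load; mutating this dictionary later from
            // request threads would NOT be thread-safe.
            return new Dictionary<string, decimal> { { "DEFAULT", 0.2m } };
        }
    }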
But that’s about it, and thus thread safety in ASP.NET mostly boils down to this: Be careful how you design and use statics. Other than that, you don’t need to worry about it much at all.
Am I wrong about any of this? Do you disagree? Enlighten me!
There are certain objects in addition to static items that are shared across all requests to an application. Be careful about putting items in the application cache that are not thread-safe, for example. Also, nothing prevents you from spawning your own threads for background processing while handling a request.
There are different levels of ASP.NET developers. You could make a perfectly fine career as an ASP.NET developer without knowing anything about threads, mutexes, locks, semaphores, or even design patterns, because a high percentage of ASP.NET applications are basically CRUD applications with little to no additional business logic.
However, most great ASP.NET developers I have come across aren't just ASP.NET developers; their skills run the gamut, so they know all about threading and other good stuff because they don't limit themselves to ASP.NET.
So no, for the most part ASP.NET Developers do not need to know about thread safety. But what fun is there in only knowing the bare minimum?
Only if you create, within the processing stream for a single HTTP request, multiple threads of your own... For example, if the web page will display stock quotes for a set of stocks, and you make separate calls to a stock quote service, on independent threads, to retrieve the quotes before generating the page to send back to the client, then you would have to make sure that the code you are running in your threads is thread-safe.
I believe you covered it all very well, and I agree with you. When you're focused on ASP.NET only, multi-threading issues rarely (if ever) come up.
The situation changes, however, when it comes to optimizations. Whenever you start a long-lasting query, you may often want to let it run on a separate thread so that the page load does not stall until the server reports a connection timeout. You may wish to have the page periodically check for completion status to notify the user. This is where multi-threading issues come in.

Using static data in ASP.NET vs. database calls?

We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
Best may be to keep with the DB info a timestamp or datetime recording when it was last updated. This way, the check for "is my cache still fresh" is typically very lightweight: just get that "latest update" info and check it against the latest update on which you rebuilt the local cache. Kind of like an HTTP "if modified since" caching strategy, except you'd be implementing most of it DB-client-side;-).
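A rough sketch of that client-side "if modified since" check (the table, column, and loader methods are hypothetical):

    using System;
    using System.Collections.Generic;

    public class TaxRateCache
    {
        private readonly object _sync = new object();
        private DateTime _lastUpdateSeen = DateTime.MinValue;
        private Dictionary<string, decimal> _cached;

        public Dictionary<string, decimal> Get()
        {
            lock (_sync)
            {
                // Cheap call, e.g. SELECT MAX(LastUpdated) FROM TaxRates (hypothetical column).
                DateTime latest = QueryLatestUpdateTimestamp();

                if (_cached == null || latest > _lastUpdateSeen)
                {
                    _cached = LoadAllRatesFromDatabase();   // expensive, but now rare
                    _lastUpdateSeen = latest;
                }
                return _cached;
            }
        }

        private DateTime QueryLatestUpdateTimestamp()
        {
            // Placeholder for the real lightweight query.
            return DateTime.MinValue;
        }

        private Dictionary<string, decimal> LoadAllRatesFromDatabase()
        {
            // Placeholder for the real full-table load.
            return new Dictionary<string, decimal>();
        }
    }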
If you decide to cache the data (vs. making a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides functionality for expiry, handles multiple concurrent requests, and can even invalidate the cache automatically using the query notification features of SQL Server 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
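For illustration, a sketch of that approach; it assumes SqlDependency.Start has been called at application startup, query notifications are available, and the connection string and TaxRates table are placeholders:

    using System.Data;
    using System.Data.SqlClient;
    using System.Web;
    using System.Web.Caching;

    public static class TaxRateCache
    {
        // Assumed to come from configuration.
        private static readonly string ConnectionString = "...";

        public static DataTable GetTaxRates()
        {
            var rates = (DataTable)HttpRuntime.Cache["TaxRates"];
            if (rates == null)
            {
                using (var conn = new SqlConnection(ConnectionString))
                using (var cmd = new SqlCommand("SELECT Code, Rate FROM dbo.TaxRates", conn))
                {
                    // The cache entry is invalidated automatically when the table data changes.
                    var dependency = new SqlCacheDependency(cmd);
                    var adapter = new SqlDataAdapter(cmd);
                    rates = new DataTable();
                    adapter.Fill(rates);
                    HttpRuntime.Cache.Insert("TaxRates", rates, dependency);
                }
            }
            return rates;
        }
    }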
With caching, a dbms is plenty efficient with static data anyway, especially only 5M of it.
True, but the point here is to avoid the database round trip altogether.
ASP.NET Cache is the right tool for this job.
You didn't state how you will find the matching data for a user. If it is as simple as finding a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging, or worse, searching, then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and LINQ makes things easy too, but it is still not SQL.
(try to group by 2 columns)
Sometimes it is a good approach to have the db return only the keys of a result set and use the Cache to fill in the complete set.
Think: Premature Optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be leaving an "unusual design pattern".
With even default caching, a dbms is plenty efficient with static data anyway, especially only 5MB of it. And the dbms partitioning you're describing is often described as an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.

Custom caching in ASP.NET

I want to cache custom data in an ASP.NET application. I am putting lots of data into it, such as List<objects>, and other objects.
Is there a best practice for this? Since if I use static data and the w3wp.exe worker process dies or gets recycled, the cache will need to be filled again.
The database is also getting updated by other applications, so a thread would be needed to make sure it is on the latest data.
Update 1:
Just found this, which probably helps me
http://www.codeproject.com/KB/web-cache/cachemanagementinaspnet.aspx?fid=229034&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=2818135#xx2818135xx
Update 2:
I am using DotNetNuke as the application ( :( ). I have enabled persistent caching and now the whole application feels sluggish.
For example, a MultiView takes about 3 seconds to swap views...
Update 3:
Strategies for Caching on the Web?
Linked to this, I am using the DotNetNuke caching method, which in turn uses the ASP.NET Cache object; it also has file-based caching.
I have a helper:
    CachingProvider.Instance().Add( _
        (label & "|") & key, _
        newObject, _
        Nothing, _
        Cache.NoAbsoluteExpiration, _
        Cache.NoSlidingExpiration, _
        CacheItemPriority.NotRemovable, _
        Nothing)
That's what I run to add objects to the cache; is this correct? I want to keep them cached as long as possible. I have a thread which runs every x minutes and updates the cache. But I have noticed the cache is getting emptied; I check for an object "CacheFilled" in the cache.
As a test I've told the worker process not to recycle, etc., but still it seems to clear out the cache. I have also changed the DotNetNuke settings from "heavy" to "light" but think that is for module caching.
You are looking for either out of process caching or a distributed caching system of some sort, based upon your requirements. I recommend distributed caching, because it is very scalable and is dedicated to caching. Someone else had recommended Velocity, which we have been evaluating and thoroughly enjoying. We have written several caching providers that we can interchange while we are evaluating different distributed caching systems without having to rebuild. This will come in handy when we are load testing the various systems as part of the final evaluation.
In the past, our legacy application's cache has been a random assortment of items. There have been DataTables, DataViews, Hashtables, Arrays, etc., and there was no logic to what was used at any given time. We have started to move to just caching collections of our domain objects (which are POCOs). Using generic collections is nice, because we know that everything is stored the same way. It is very simple to run LINQ operations on them, and if we need a specialized "view" to be stored, the system is efficient enough that we can store a specific collection of objects.
We also have put an abstraction layer in place that pretty much brokers calls between either the DAL or the caching model. Calls through this layer will check for a cache miss or cache hit. If there is a hit, it will return from the cache. If there is a miss, and the call should be cached, it will attempt to cache the data after retrieving it. The immediate benefit of this system is that in the event of a hardware or software failure on the machines dedicated to caching, we are still able to retrieve data from the database without having a true outage. Of course, the site will perform slower in this case.
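A condensed sketch of that broker layer (ICache and IRepository stand in for the real caching provider and DAL):

    using System;

    public class CachedRepository<T> where T : class
    {
        private readonly ICache _cache;             // distributed cache behind an interface
        private readonly IRepository<T> _repository;

        public CachedRepository(ICache cache, IRepository<T> repository)
        {
            _cache = cache;
            _repository = repository;
        }

        public T Get(string key)
        {
            T item = null;
            try
            {
                item = _cache.Get<T>(key);          // cache hit: return immediately
            }
            catch (Exception)
            {
                // Cache servers unreachable: degrade to the database instead of failing.
            }

            if (item == null)
            {
                item = _repository.Load(key);       // cache miss: hit the database
                try { _cache.Put(key, item); } catch (Exception) { /* best effort */ }
            }
            return item;
        }
    }

    public interface ICache
    {
        T Get<T>(string key) where T : class;
        void Put(string key, object value);
    }

    public interface IRepository<T>
    {
        T Load(string key);
    }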
Another thing to consider, in regards to distributed caching systems, is that since they are out of process, you can have multiple applications use the same cache. There are some interesting possibilities there, involving sharing data between applications, real-time manipulation of data, etc.
Also have a look at the MS Enterprise Library Caching Application Block, which allows you to write custom expiration policies, custom stores, etc.
http://msdn.microsoft.com/en-us/library/cc309502.aspx
You can also check "Velocity" which is available at
http://code.msdn.microsoft.com/velocity
This will be useful if you wish to scale your application across servers...
There are lots of articles about the Cache object in ASP.NET and how to make it use SqlDependencies and other types of cache expirations. No need to write your own. And using the Cache is recommended over session or any of the other collections people used to cram lots of data into.
Cache and Session can lead to sluggish behaviour, but sometimes they're the right solutions: the rule of right tool for right job applies.
Personally, I've often created collections in pseudo-static singletons for the kind of role you describe (typically to avoid I/O overheads, like storing a compiled XSLT transform), but it's very important to keep in mind that that kind of cache is fragile. Design for it to a) file-watch or otherwise monitor what it's supposed to cache where appropriate, and b) recreate/populate itself with use - it should expect to get flushed frequently.
Essentially I recommend it as a performance crutch, but don't rely on it for anything requiring real persistence.
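As an illustration of that pattern, a sketch that caches a compiled XSLT transform and lets a file watcher flush it when the file changes (the paths are placeholders):

    using System.IO;
    using System.Xml.Xsl;

    public static class TransformCache
    {
        private static readonly object Sync = new object();
        private static XslCompiledTransform _transform;
        private static readonly FileSystemWatcher Watcher = CreateWatcher();

        public static XslCompiledTransform Get()
        {
            lock (Sync)
            {
                if (_transform == null)
                {
                    // Recreate on demand; the cache must always be able to rebuild itself.
                    var t = new XslCompiledTransform();
                    t.Load(@"C:\app\templates\report.xslt");   // placeholder path
                    _transform = t;
                }
                return _transform;
            }
        }

        private static FileSystemWatcher CreateWatcher()
        {
            var w = new FileSystemWatcher(@"C:\app\templates", "report.xslt");
            w.Changed += (s, e) => { lock (Sync) { _transform = null; } };   // flush on change
            w.EnableRaisingEvents = true;
            return w;
        }
    }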
