Cache Persistence - ASP.NET

I am using the ASP.NET caching mechanism for a web app whose data changes very frequently.
The cache holds chat participants and their messages, and it needs to keep track of
participants' presence.
Data changes constantly: participants come and go, and messages are sent and received.
The cache provides me with solutions for:
- performance
- reducing the number of DML (insert/update) operations in the database (SQL Server) - we had a problem with the transaction log getting full.
I want to continue working this way, but I cannot rely on the cache (I can lose all data when the Cache recycles, or some of the data when the memory gets full).
The option I see right now is to save the data to the database every time the cache changes, otherwise I will lose data.
But this means many SQL update/insert statements.
Someone advised me to persist to the database every N messages/changes, but that's still not a reliable solution; I would still lose data.
Anyone have an idea?
Thanks
Yaron

Fix your database capacity issues. If you need to be able to reliably save n changes per second, then your database needs to be able to handle n operations per second.
Anything else (including saving every few operations) will lead to some possibility of data loss. If you can accept that data loss risk, then that would work.
A distributed cache (Project Velocity or otherwise) could also help (data is at least saved to multiple machines). But that needs extra hardware, and you could spend that money on database capacity instead.
Finally, rather than trying to cache changes, look for other opportunities to cache database reads; taking that load off the database might allow the writes to go through, at least until you get more usage.
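As a rough illustration of that last point, here is a minimal read-through helper over the ASP.NET Cache; the key name, the expiry, and the GetParticipantsFromDb loader are hypothetical, not taken from the original app:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class ReadCache
    {
        // Returns cached data when present; otherwise loads from the database
        // and caches it briefly so repeated reads don't hit SQL Server.
        public static T GetOrLoad<T>(string key, Func<T> loadFromDb, int seconds) where T : class
        {
            var cached = HttpRuntime.Cache[key] as T;
            if (cached != null)
                return cached;

            T fresh = loadFromDb();
            HttpRuntime.Cache.Insert(
                key, fresh, null,
                DateTime.UtcNow.AddSeconds(seconds),  // absolute expiry
                Cache.NoSlidingExpiration);
            return fresh;
        }
    }

    // Hypothetical usage: cache the participant list for five seconds.
    // var participants = ReadCache.GetOrLoad("participants", GetParticipantsFromDb, 5);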

Related

Many small writes to SQLite

I have an application, which runs all the time and receives some messages (rate of them varies from several per second to none per hour). Every message should be put into a SQLite database. What's the best way to do this?
Opening and closing the database on each message doesn't sound good: if there are tens of them per second, it will be extremely slow.
On the other hand, opening the database once and just writing to it can lead to loss of data if the process unexpectedly terminates.
It sounds like whatever you do, you'll have to make a trade-off.
If safety is your top-most concern, then update the database on each message and take the speed hit.
If you want a compromise, then update the database every so many messages. For instance, maintain a buffer and every 100th message, issue an update, wrapped in a transaction.
The transaction wrapping is important for two reasons. First, it maximizes speed. Second, it can help you recover from errors if you employ logging.
If you do the batch update above, you can add an additional level of safety by logging each message to a file as it comes in. You will reset this log every time a database update is successfully issued. That way, if an update fails, you know it failed for the entire block (since you are using transactions), and your log will have the information that did not update. This will allow you to re-issue the update, or even see if there was a problem with the data that caused the failure. This of course assumes that keeping a log is cheaper than updating the database, which can be the case depending on how you are connecting.
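A minimal sketch of that buffered approach in C#, assuming the Microsoft.Data.Sqlite provider and an invented messages table; the 100-message threshold mirrors the example above:

    using System.Collections.Generic;
    using Microsoft.Data.Sqlite;

    // Buffers incoming messages and flushes them in one transaction per batch.
    class BatchWriter
    {
        private readonly List<string> _buffer = new List<string>();
        private readonly string _connectionString;

        public BatchWriter(string connectionString)
        {
            _connectionString = connectionString;
        }

        public void Add(string message)
        {
            _buffer.Add(message);
            if (_buffer.Count >= 100)        // every 100th message, as above
                Flush();
        }

        public void Flush()
        {
            using (var conn = new SqliteConnection(_connectionString))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())   // one transaction per batch
                {
                    var cmd = conn.CreateCommand();
                    cmd.Transaction = tx;
                    cmd.CommandText = "INSERT INTO messages (body) VALUES ($body)";
                    var p = cmd.CreateParameter();
                    p.ParameterName = "$body";
                    cmd.Parameters.Add(p);
                    foreach (var msg in _buffer)
                    {
                        p.Value = msg;
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit();
                }
            }
            _buffer.Clear();   // in a real app, reset the side log file here too
        }
    }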
If your top rate is "several per second" then I don't see a real problem with opening and closing the db. This is especially true if it's critical that the data be recorded right away in case of server failure.
We use SQLite in a reporting product, and the best performance we have been able to eke out is recording rows in blocks of several thousand at a time. Our default setting is around 50k. That means our app waits until 50k rows of data are collected, then commits them as one transaction.
There is an easy algorithm to adjust your application's behaviour to the message rate:
When you have just written a message, check if there is any new message.
If yes, write that message too, and repeat.
Only when you have run out of immediately available messages, commit the transaction and close the database.
In that manner, every message will be saved immediately, unless the message rate becomes too high for that.
Note: closing the database will not increase data durability (that's what transaction commit is for), it will just free up a little bit of memory.
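A sketch of that adaptive algorithm, assuming messages arrive on a BlockingCollection (the queue, table, and connection here are placeholders):

    using System.Collections.Concurrent;
    using Microsoft.Data.Sqlite;

    static class MessageWriter
    {
        // Drains whatever is immediately available into one transaction, so a
        // lone message commits at once while bursts get batched automatically.
        public static void DrainAndCommit(BlockingCollection<string> incoming, SqliteConnection conn)
        {
            string msg = incoming.Take();            // block until a message arrives
            using (var tx = conn.BeginTransaction())
            {
                var cmd = conn.CreateCommand();
                cmd.Transaction = tx;
                cmd.CommandText = "INSERT INTO messages (body) VALUES ($body)";
                var p = cmd.CreateParameter();
                p.ParameterName = "$body";
                cmd.Parameters.Add(p);
                do
                {
                    p.Value = msg;
                    cmd.ExecuteNonQuery();
                } while (incoming.TryTake(out msg)); // keep going while messages wait
                tx.Commit();                         // commit once the burst drains
            }
        }
    }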

How to implement real time updates in ASP.NET

I have seen several websites that show you a real time update of what's going on in the database. An example could be
A stock ticker website that shows stock prices in real time
Showing data like "What other users are searching for currently.."
I'd assume this would involve some kind of polling mechanism that queries the database every few seconds and renders it on a web page. But the thought scares me when I think about it from the performance standpoint.
In an application I am working on, I need to display the real time status of an operation that a user has submitted. Users wait for the process to be completed. As and when an operation is completed, the status is updated by another process (could be a windows service). Should I query the database every second to get the updated status?
It's not necessarily done in the db. As you suggested, that's expensive. The db might be a backing store, but a more efficient mechanism likely accompanies the polling operation, such as keeping the real-time status in memory in addition to eventually writing it to the db. You can poll memory much more efficiently than running SELECT status FROM Table every second.
Also, as I mentioned in a comment, in some circumstances you can get a lot of mileage out of forging the appearance of a status update through animations and such, employing estimation, and checking the data source less often.
Edit
(an optimization to use less db resources for real time)
Instead of polling the database per user to check job status every X seconds, slightly alter the behaviour of the situation. Each time a job is added to the database, read the database once to put metadata about all jobs in the cache. So, for example, the memory cache will reflect [user49 ... user3, user2, user1, userCurrent]: 50 users' jobs if 1 job each. (Maybe I should have written it as [job49 ... job2, job1, job_current], but same idea.)
Then individual users' web pages will poll that cache, which is always kept current. In this example the db was read just 50 times into the cache (once per job submission). If those 50 users wait an average of 1 minute for job processing and poll for status every second, then the user base polls the cache a total of 50 users x 60 secs = 3000 times.
That's 50 database reads instead of 3000 over a period of 50 minutes (averaging one job per minute). The cache is always fresh and yet handles the load. It's much less scary than hitting the db every second for each user. You can store other stats and info in the cache to help with estimation and such. As long as a fresh cache provides more efficiency, it's a viable alternative to massive db hits.
Note: By cache I mean a global store like Application or 3rd party solution, not the ASP.NET page cache which goes out of scope quickly. Caching using ASP.NET's mechanisms might not suit your situation.
Another Note: The db will know when another job record is appended no matter from where, so a trigger can init the cache update.
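A minimal sketch of such a shared status cache, assuming a process-wide ConcurrentDictionary as the global store (the key scheme and status strings are invented):

    using System.Collections.Concurrent;

    // A process-wide store of job statuses that pages poll instead of the db.
    public static class JobStatusCache
    {
        private static readonly ConcurrentDictionary<int, string> _statuses =
            new ConcurrentDictionary<int, string>();

        // Called by the background process (or a trigger-driven notifier)
        // whenever a job's status changes.
        public static void Update(int jobId, string status)
        {
            _statuses[jobId] = status;
        }

        // Called by each user's polling request; never touches the database.
        public static string Get(int jobId)
        {
            string status;
            return _statuses.TryGetValue(jobId, out status) ? status : "unknown";
        }
    }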
Even with a good database solution, so many users polling frequently is likely to create problems with web server connections, and you might need a different solution at that level, depending on traffic.
Maybe have a cache and work with it so you don't hit the database each time the data is modified, and update the database every few seconds or minutes or whatever you like.
The problem touches many layers of a web application.
On the client, you either use an iframe whose content auto-refreshes every n seconds using the meta refresh tag (HTML), or a JavaScript function which is triggered by a timer and updates a named div (AJAX).
On the server, you have at least two places to cache your data:
One is in the Application object, where you keep a timestamp of the last update, and refresh the cached data as your refresh interval elapses.
If you want to present data from a database, keep aggregated values or cache relevant data for faster retrieval.
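As a rough sketch of that Application-object approach (the 10-second interval and the loadAggregates delegate are assumptions for illustration):

    using System;
    using System.Web;

    public static class AppDataCache
    {
        private static readonly object _sync = new object();

        // Returns the cached data, refreshing it once the interval elapses.
        public static object Get(HttpApplicationState app, Func<object> loadAggregates)
        {
            lock (_sync)
            {
                var last = app["lastUpdate"] as DateTime?;
                if (last == null || (DateTime.UtcNow - last.Value).TotalSeconds > 10)
                {
                    app["data"] = loadAggregates();       // re-query the database
                    app["lastUpdate"] = DateTime.UtcNow;  // remember when we did
                }
                return app["data"];
            }
        }
    }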

Running a query in Page Load a bad idea?

I'm running an ASP.NET app in which I have added an insert/update query to the [global] Page_Load. So, each time the user hits any page on the site, it updates the database with their activity (session ID, time, page they hit). I haven't implemented it yet, but this was the only suggestion given to me as to how to keep track of how many people are currently on my site.
Is this going to kill my database and/or IIS in the long run? We figure that the site averages between 30,000 and 50,000 users at one time. I can't have my site constantly locking up over a database hit with every single page hit for every single user. I'm concerned that's what will happen, however this is the first time I have attempted a solution like this so I may just be overly paranoid.
Do it async.
Create a DLL that handles the update, and in the page load do a fire-and-forget with parameters.
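A minimal fire-and-forget sketch along those lines, using ThreadPool.QueueUserWorkItem so Page_Load returns immediately (the PageHits table and connection string are placeholders):

    using System.Data.SqlClient;
    using System.Threading;

    public static class ActivityLogger
    {
        private const string ConnStr = "...";   // placeholder connection string

        // Queue the write and return immediately; the page never waits on the db.
        public static void LogHit(string sessionId, string page)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    using (var conn = new SqlConnection(ConnStr))
                    using (var cmd = new SqlCommand(
                        "INSERT INTO PageHits (SessionId, Page, HitTime) " +
                        "VALUES (@s, @p, GETDATE())", conn))
                    {
                        cmd.Parameters.AddWithValue("@s", sessionId);
                        cmd.Parameters.AddWithValue("@p", page);
                        conn.Open();
                        cmd.ExecuteNonQuery();
                    }
                }
                catch { /* a lost log row shouldn't break the page */ }
            });
        }
    }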
Insert-Based designs have less locking than Update-Based designs.
So if a user logged in and then logged out, in an Insert-Based design you would have multiple rows, each with a SessionID, one per activity, whereas in an Update-Based design you would have SessionId, LoginTime, and LogoutTime columns and you would update the LogoutTime based on the SessionId.
I have seen many more locking and contention problems caused by Update activity than by Insert activity.
Activities such as counting and linking logins to logouts etc take more complex queries and a little more resources.
It goes without saying that your queries, especially the ones that run on every page, should be as fast as possible so that the site doesn't appear slow to users.
To keep track of how many users are currently on your site you could use performance counters. What you describe, though, sounds more like full-fledged logging of every page hit.
Let's say you really have 50k users connected at any one time.
As long as you don't have contention between the updates (trying to lock the same record), a database can sustain a very high number of inserts and updates. You need to do some capacity planning to ensure the load can be carried. 50k users visiting a page every minute will give you 50k inserts and 50k updates per minute, roughly 850 inserts and 850 updates per second, which have to commit (flush the log). Does your DB I/O subsystem support such write pressure, in addition to responding to all the requests (reads)?
Also, 50k users doing 1 page hit per minute adds up to 72 million hits per day, i.e. 72 million logging inserts. At such a rate you need to carefully plan the storage capacity of the database and consider what kind of analysis you'll do on the collected data, since ad-hoc querying of 2 billion rows (one month of data) will get you nowhere fast (actually... quite slow).
Doing it async can give you some relief over very short spikes, but not in the long run. If your DB system cannot handle the load, then doing async calls will just create a backlog queue in the application process (in the ASP app pool), and this will grow until out of memory, at which moment the ever-vigilant IIS will 'recycle' the app pool, thus losing all pending async updates.
I think updating the database in the session begin and session end events will do the job. That will reduce the count of statements dramatically.
I think it makes no difference whether you track hits or begin/end of session; with hits you'll also need additional logic to subtract inactive users.
EDIT: Session_End is not always fired. I would suggest calling an update statement/stored procedure in another session's begin event (in addition to the other insert statement) to fix invalid sessions.
I don't think calling this "fix routine" is necessary in every page load event, because I think you can't exactly count the "current no. of visitors" anyway.
I would keep this in Application state instead - if possible. On Application_Start, create some data structure saved to App state that you can update from anywhere in your application - session start, page load, wherever. Keep it out of the database. It sounds like you are just using it to track "currently online" info anyway.
If you have multiple instances of your app, or if there is a requirement to maintain historical info beyond the IIS logs, this won't work obviously. Go with chris' fire-and-forget solution in that case.
What's wrong with IIS Logs?
2009-05-01 12:30:31 207.219.27.35 GET /assocadmin/ibb-reg.asp - usernameremoved 544.566.570.575 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.0;+SLCC1;+.NET+CLR+2.0.50727;+Media+Center+PC+5.0;+.NET+CLR+3.5.30729;+.NET+CLR+3.0.30618) 200 0 0 40058
EDIT: I'd like to close this answer, but I want the comments to stay. Consider this answer withdrawn.
How about adding a small object to the session?
Something like LoggedInUserFlag:IDisposable
In the constructor, increment your counter however you decide to implement it.
Then in the Dispose method, decrement the counter.
This way, regardless of how the session is ended, the counter will always be (eventually) decremented.
See http://weblogs.asp.net/cnagel/archive/2005/01/23/359037.aspx for info on using IDisposable.
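For illustration, a hedged sketch of such a flag; the Interlocked counter and the finalizer-as-safety-net are one assumption about "however you decide to implement it":

    using System;
    using System.Threading;

    // Stored in the user's Session; when the session is torn down and the
    // object is disposed or finalized, the online-user count drops.
    public class LoggedInUserFlag : IDisposable
    {
        private static int _onlineCount;
        private bool _done;

        public LoggedInUserFlag()
        {
            Interlocked.Increment(ref _onlineCount);     // user came online
        }

        public static int OnlineCount
        {
            get { return _onlineCount; }
        }

        public void Dispose()
        {
            if (!_done)
            {
                _done = true;
                Interlocked.Decrement(ref _onlineCount); // user went away
            }
            GC.SuppressFinalize(this);
        }

        // Safety net: the finalizer is the "eventually" if Dispose is never called.
        ~LoggedInUserFlag()
        {
            Dispose();
        }
    }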
I am not an ASP guy at all, but rather than logging all that other info, what about just inserting their IP address?
If an IP address is already in there, update a last_seen timestamp, and on each refresh delete any row that hasn't been seen in the last 10 minutes.
This is how I would take a shot at it. It is much more space-efficient, but I am not sure about doing that much checking and deleting on such a high-profile site.
As a direct answer to your question, yes, running a database query in-line with every request is a bad idea:
- Synchronous requests will tie up a thread, which will reduce your scalability (fewer simultaneous activities)
- DB inserts (or updates) are writes to the DB, which will put a load on your log volume
- DB accesses shouldn't be required in a single server / single AppPool scenario
I answered your question about how to count users in the other thread:
Best way to keep track of current online users
If you are operating in a multi-server / load-balanced environment, then DB accesses may in fact be required. In that case:
- Queue them to a background thread so the foreground request thread doesn't have to wait (sketched below)
- Use Resource Governor in SQL 2008 to reduce contention with other DB accesses
- Collect several updates / inserts together into a single batch, in a single transaction, to minimize log disk I/O pressure
- Return the current count with each DB access, to minimize round-trips
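A compact sketch combining the first and third bullets: a background thread drains a queue and flushes each batch in a single transaction (the table name, batch size, and schema are invented):

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class HitQueue
    {
        private static readonly BlockingCollection<string> _hits =
            new BlockingCollection<string>();

        // Called from the page; returns immediately.
        public static void Enqueue(string page)
        {
            _hits.Add(page);
        }

        // Run once on a background thread at application start, e.g.:
        // new Thread(() => HitQueue.WriterLoop(connStr)) { IsBackground = true }.Start();
        public static void WriterLoop(string connStr)
        {
            while (true)
            {
                var batch = new List<string> { _hits.Take() };  // wait for one item
                string next;
                while (batch.Count < 100 && _hits.TryTake(out next))
                    batch.Add(next);                            // top up the batch

                using (var conn = new SqlConnection(connStr))
                {
                    conn.Open();
                    using (var tx = conn.BeginTransaction())    // one log flush per batch
                    {
                        foreach (var page in batch)
                        {
                            using (var cmd = new SqlCommand(
                                "INSERT INTO PageHits (Page, HitTime) VALUES (@p, GETDATE())",
                                conn, tx))
                            {
                                cmd.Parameters.AddWithValue("@p", page);
                                cmd.ExecuteNonQuery();
                            }
                        }
                        tx.Commit();
                    }
                }
            }
        }
    }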
In case it's of any interest, I cover sync/async threading issues and the techniques above in detail in my book, along with code examples: Ultra-Fast ASP.NET.

Using static data in ASP.NET vs. database calls?

We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
It may be best to keep, alongside the DB info, a timestamp or datetime remembering when it was last updated. This way, the check for "is my cache still fresh" is typically very lightweight: just fetch that "latest update" value and compare it with the latest update on which you rebuilt the local cache. Kind of like an HTTP "if modified since" caching strategy, except you'd be implementing most of it DB-client-side ;-).
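A rough sketch of that last-updated check (the TaxRatesMeta table and LastUpdated column are invented for illustration, and the sketch assumes the meta row always exists):

    using System;
    using System.Data.SqlClient;

    public static class TaxRateCache
    {
        private static object _rates;          // the cached static data
        private static DateTime _loadedStamp;  // db LastUpdated at load time

        public static object Get(string connStr, Func<object> loadRates)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "SELECT MAX(LastUpdated) FROM TaxRatesMeta", conn))
            {
                conn.Open();
                // Cheap freshness probe: one scalar read instead of a full reload.
                var stamp = (DateTime)cmd.ExecuteScalar();
                if (_rates == null || stamp > _loadedStamp)
                {
                    _rates = loadRates();      // rebuild the local cache
                    _loadedStamp = stamp;
                }
            }
            return _rates;
        }
    }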
If you decide to cache the data (vs. make a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides functionality for expiry, handles multiple concurrent requests, it can even invalidate the cache automatically using the query notification features of SQL 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
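For illustration, a hedged sketch of command-based SqlCacheDependency; it assumes SQL Server 2005+ with query notifications enabled, that SqlDependency.Start was called at application startup, and an invented dbo.TaxRates table:

    using System.Data;
    using System.Data.SqlClient;
    using System.Web;
    using System.Web.Caching;

    public static class TaxRates
    {
        // Assumes SqlDependency.Start(connStr) was called once at app startup.
        public static DataTable Get(string connStr)
        {
            var cached = HttpRuntime.Cache["taxRates"] as DataTable;
            if (cached != null)
                return cached;

            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "SELECT RateId, Rate FROM dbo.TaxRates", conn)) // notification rules:
            {                                                   // explicit columns,
                var dependency = new SqlCacheDependency(cmd);   // two-part table name
                conn.Open();
                var table = new DataTable();
                table.Load(cmd.ExecuteReader());
                // The entry is evicted automatically when the query's results change.
                HttpRuntime.Cache.Insert("taxRates", table, dependency);
                return table;
            }
        }
    }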
With caching, a dbms is plenty efficient with static data anyway, especially only 5M of it.
True, but the point here is to avoid the database roundtrip at all.
ASP.NET Cache is the right tool for this job.
You didn't state how you will find the matching data for a user. If it is as simple as finding a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging, or worse, searching, then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and LINQ makes things easy too, but it is still not SQL.
(Try to group by 2 columns.)
Sometimes it is a good way to have the db return the keys of a resultset only and use the Cache to fill the complete set.
Think: Premature Optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be left with an "unusual design pattern".
Even with default caching, a dbms is plenty efficient with static data anyway, especially only 5MB of it. And the dbms partitioning you're describing is often described as an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.

Caching database data in session - getting the balance right

If you cache data from your database in ASP.NET session then you will speed up subsequent requests for that data (note: depending on the serialization/deserialization cost of the data concerned), at the expense of memory load in IIS.
(OK, this is probably a simplification of the reality of the situation - feel free to correct or refine my statement above)
Do you use any rules (simple rules of thumb or otherwise) to decide when it is appropriate to cache in Session?
Update
Can use of Session for storing read-only data without very well thought out rationale simply be considered another case of Premature Optimization? (and therefore, bad)
Keep in mind that when caching in session, if you use InProc mode, you are limiting your scalability unless your code goes back to the DB when the session is empty. You will be forced to use a single server or pin your users to a particular server.
If you're using the Session State Server this doesn't apply, and if you're using SQL to store session state, then using session for caching is pointless.
Edit
Any advice around this topic is subject to your specific environment. As you stated, a very complex sql query might benefit from being cached even when using SQL Session State. I think the most important thing is first make your application perform the functions and achieve the business requirements. Then go back and test and optimize the application to handle the load you expect.
Edit 2
Based on your update, no, I don't think this is true. In one of my ASP.NET applications I am using session state to store a complex object model that the user is then manipulating and modifying. We are using AJAX, so we have a lot of short communication with the server as the user updates the object. Keeping the object in session was done as a convenience. The object performs a lot of customized calculations to generate different data points, so doing this on the server was ideal, as opposed to trying to replicate the code both on the server and in JavaScript. Also, keeping it in session lets us have an undo function very easily.
And yes, I know we sacrificed scalability, but I welcome the day when I sit down with my boss and explain we have a problem because we have too many users (and I know he would too).
So I think the question is: why are you storing data in session? Is it for convenience and to provide access to transient data while the user is logged in? That is different than storing it there for caching. One thing to remember with caching is: how are you going to flush the cache? How do you invalidate it? I don't think session state is built to handle this.
Edit 3
Back at it: for read-only data that is user-specific and expensive to load from the DB, go ahead and cache it in session. You can write your code so that if it's not in session, you hit the DB. Have a nice little helper class that does this, hiding from your web app that you're even using session, and you should be good; a sketch of such a helper follows. The nice thing about hiding where you store it is that if you run into issues, you only have one place to change it.
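A sketch of such a helper (the method and key names are invented):

    using System;
    using System.Web;

    // Hides the storage location from the rest of the app; swap the body for
    // the Cache or a distributed store later without touching any callers.
    public static class SessionCache
    {
        public static T GetOrLoad<T>(string key, Func<T> loadFromDb) where T : class
        {
            var session = HttpContext.Current.Session;
            var value = session[key] as T;
            if (value == null)
            {
                value = loadFromDb();   // not in session: fall back to the DB
                session[key] = value;
            }
            return value;
        }
    }

    // Hypothetical usage:
    // var report = SessionCache.GetOrLoad("userReport", LoadReportFromDb);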
Given that it is very hard to guarantee that data in a session gets cleared up in a reasonable time frame, I would be very wary of caching anything more than the odd integer there. When you get up to a large scale you will either hit memory issues or have all your users get session timeout errors. Remember that, unlike the Cache object, sessions don't get killed off when memory is low.
Also, as Josh says, if you go to multiple servers and need to use either a database or the Session State Server, your session object will need to be serialisable. In this case the cost of serialising and deserialising is likely to be worse than the cost of a well-optimized query.
