I have seen several websites that show real-time updates of what's going on in the database. Examples include:
A stock ticker website that shows stock prices in real time
Showing data like "What other users are searching for currently.."
I'd assume this would involve some kind of polling mechanism that queries the database every few seconds and renders it on a web page. But the thought scares me when I think about it from the performance standpoint.
In an application I am working on, I need to display the real time status of an operation that a user has submitted. Users wait for the process to be completed. As and when an operation is completed, the status is updated by another process (could be a windows service). Should I query the database every second to get the updated status?
It's not necessarily done in the db. As you suggested, that's expensive. Although the db might be the backing store, a more efficient mechanism likely accompanies the polling, such as keeping the real-time status in memory in addition to eventually writing it to the db. You can poll memory much more efficiently than running SELECT status FROM Table every second.
Also, as I mentioned in a comment, in some circumstances you can get a lot of mileage out of faking the appearance of status updates through animations and the like, using estimation, and checking the data source less often.
Edit
(an optimization to use less db resources for real time)
Instead of polling the database per user to check job status every X seconds, slightly alter the behaviour. Each time a job is added to the database, read the database once to put metadata about all jobs in the cache. So, for example, the memory cache will reflect [user49 ... user3, user2, user1, userCurrent]: 50 users' jobs if each has one job. (Maybe I should have written it as [job49 ... job2, job1, job_current], but it's the same idea.)
Then individual users' web pages will poll that cache, which is always kept current. In this example the db was read into the cache just 50 times (once per job submission). If those 50 users wait an average of 1 minute for job processing and poll for status every second, then the user base polls the cache a total of 50 users x 60 secs = 3000 times.
That's 50 database reads instead of 3000 over a period of 50 min. (avg. one per min.) The cache is always fresh and yet handles the load. It's much less scary than hitting the db every second for each user. You can store other stats and info in the cache to help out with estimation and such. As long as a fresh cache is more efficient, it's a viable alternative to massive db hits.
Note: By cache I mean a global store like Application or 3rd party solution, not the ASP.NET page cache which goes out of scope quickly. Caching using ASP.NET's mechanisms might not suit your situation.
Another Note: The db will know when another job record is appended no matter from where, so a trigger can init the cache update.
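The arithmetic above can be checked with a minimal sketch. This is not ASP.NET code: FakeJobTable and JobStatusCache are hypothetical names standing in for the database and the global cache, and the sketch only refreshes the cache on job submission, as described above. A real version would also have to refresh (or write through) when a job's status changes.

```python
import threading


class FakeJobTable:
    """Hypothetical stand-in for the database; counts how often it is read."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def insert(self, job_id, status):
        self.rows[job_id] = status

    def read_all(self):
        self.reads += 1
        return dict(self.rows)


class JobStatusCache:
    """Global in-memory snapshot of job statuses that web pages poll."""

    def __init__(self, table):
        self._table = table
        self._lock = threading.Lock()
        self._snapshot = {}
        self.polls = 0

    def refresh(self):
        # One database read, triggered by a job submission.
        snapshot = self._table.read_all()
        with self._lock:
            self._snapshot = snapshot

    def get_status(self, job_id):
        # Served from memory; the database is not touched.
        with self._lock:
            self.polls += 1
            return self._snapshot.get(job_id)


table = FakeJobTable()
cache = JobStatusCache(table)

for n in range(50):                # 50 users submit one job each
    job_id = f"job{n}"
    table.insert(job_id, "pending")
    cache.refresh()                # one database read per submission
    for _ in range(60):            # each user polls once a second for a minute
        cache.get_status(job_id)

print(table.reads, cache.polls)    # 50 3000
```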
Despite a good database solution, so many users polling frequently is likely to create problems with web server connections and you might need a different solution at that level, depending on traffic.
Maybe keep a cache and work against it, so you don't hit the database each time the data is modified, and write the changes to the database every few seconds or minutes, or at whatever interval you like.
The problem touches many layers of a web application.
On the client, you either use an iframe whose content auto-refreshes every n seconds using the meta refresh tag (HTML), or a JavaScript function that is triggered by a timer and updates a named div (AJAX).
On the server, you have at least two places to cache your data:
One is in the Application object, where you keep a timestamp of the last update, and refresh the cached data as your refresh interval elapses.
If you want to present data from a database, keep aggregated values or cache relevant data for faster retrieval.
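The timestamp-plus-refresh-interval idea for the Application object can be sketched as follows. This is a Python illustration rather than ASP.NET code; TimedCache, load_from_database and the injected clock are hypothetical names, and the 5-second interval is an arbitrary assumption.

```python
import threading
import time


class TimedCache:
    """Holds one cached value plus the time it was loaded (hypothetical helper)."""

    def __init__(self, loader, refresh_interval, clock=time.monotonic):
        self._loader = loader
        self._interval = refresh_interval
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._loaded_at = None

    def get(self):
        with self._lock:
            now = self._clock()
            expired = (self._loaded_at is None
                       or now - self._loaded_at >= self._interval)
            if expired:
                # Only here does the expensive source get queried.
                self._value = self._loader()
                self._loaded_at = now
            return self._value


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
load_count = [0]


def load_from_database():
    load_count[0] += 1
    return f"snapshot {load_count[0]}"


status_cache = TimedCache(load_from_database, refresh_interval=5.0,
                          clock=lambda: fake_now[0])
first = status_cache.get()      # loads from the source
second = status_cache.get()     # served from memory
fake_now[0] = 5.0               # the refresh interval has elapsed
third = status_cache.get()      # loads again
```

Holding the lock while loading keeps concurrent requests from all reloading at once, at the cost of making them wait for the one load in progress.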
Related
I have a webpage, which takes a while to load because it has to pull information from lots of local databases. For example, if a user searches for person 1 then it will query 20 databases. It can sometimes take 5 minutes to pull all the information needed and apply the business logic. The best solution is to design a data warehouse, which is a long term aim.
If I use data caching it reduces the page load time (of the big records) from five minutes to four seconds. Is it bad practice to store information in the cache for a long period of time i.e. 24 hours? The cache will be refreshed every 24 hours. Alternatively I could store the cached information in a database table.
Every example I find online caches information for seconds e.g. 20 seconds.
Pros:
Faster load times
Less bandwidth usage
Less stress on the server
Cons:
May require high technical expertise to configure it just right
Will not work for content that is constantly being updated
For many system administrators, especially those with the skills to implement a caching system, the pros greatly outweigh the cons. Caching can make your websites run more smoothly for visitors and lessen the burden on your dedicated server.
Cache is used for global resources; if the data in your application is per user, use Session, which is like a per-user cache.
You can also cache results and tie them to database tables, so that when a table is updated, so is the cache. This is called Cache Dependency.
The main concern is what happens if the cached information is not up to date; in that case, use Cache Dependency.
Don't worry about memory issues on your server: the server is already optimized and knows to clean the cache when memory runs low.
I hope this helps you.
I have an application that performs complex queries against what amounts to data organized in a "star schema". The gold-owner keeps adding new "axes" to perform searches on, with the result that performance becomes worse over time. Currently, the execution of a search operation, using a stored procedure to do all the work on the SQL server, takes about 2 seconds, which doesn't fit the gold-owner's desire to have the code be interactive (<0.1 sec response time). Looking at the SQL Server query analyzer, the search is IO-bound on 9 table scans of 100,000 records, and then doing brutal joins. Due to the nature of the queries I need to perform and the limitations of SQL, this cannot be improved.
In desperation, I've rewritten the query processor so that it loads the 100,000 records into a cache at application start, then performs the complex queries against the cached memory. Loading all the records from the database takes about 12 seconds. This expensive initial load is mitigated by my rewritten query processor. It now only needs to do a single scan through the records, and gives a response time of 0.02 seconds.
This good news is tainted by the gold-owner's discovery that the 12-second hit for populating the cache is being experienced every hour or so. I'm currently storing the data in the ASP.NET application state, as Application["FactTable"]. It seems the application state is being reset after the ASP.NET application is idle for longer than a dozen minutes or so.
If I move the 100,000 records into the ASP.NET application cache, will I be experiencing these evictions just as often, or can I rely on the data remaining in memory for the fast retrievals for longer periods of time? If the ASP.NET cache is also victim to application resets, what other mechanism should I use? A separate app domain hosting an instance of my database cache comes to mind, but I don't want to go down that route unless my other options are closed off.
I realise that you have a lot of data and processing, and you must have tried a few things to speed this scenario up, but Application State, which is managed by IIS, will be volatile...
Have you thought of running your calculations in another process, i.e., creating a Windows service that periodically runs the queries to organise your data and saves that "flat" data to a database cache? When the user requests the data, they will just get the last DB-cached results. You could then speed this up further by holding those results in Application state, which can refresh itself if it gets destroyed.
Let's say I have a site where a reporting page will contain user-specific daily reports.
By default, the user lands on today's report but then if he clicks on a calendar control and selects a new date, a new report will load for that day. What I want to do is use sessions as a cache for reports. The first time he loads a report for a day, it's loaded from the database into the session and then from the session to the page. Each time he loads a new report for a date, the logic first checks to see if this particular report is in the session and only if it's not is it loaded from the DB. A report has about 15 columns and is made up of about 300-500 rows per day. What will be stored in the session is a dictionary of lists of objects, with the date as the key and the list as the value. I'm using InProc session. I'm also considering storing several other dictionaries as well for other lists of objects.
Is this an efficient way to make the most of the .net framework? If it doesn't actually improve performance over making calls to the data source, will it make it slower? I'm looking to build something that'll scale to about 500-1000 simultaneous users or so.
Thanks.
It sounds like you're pre-optimizing to me. Have you run any performance tests to verify that you get any benefit at all from putting it in Session (or Cache)?
Have your users expressed concern over the performance?
For me, I wouldn't consider optimizing with any kind of cache until I knew I had a performance issue, and then I would look at why I had the performance problem, and optimize that.
There are striking differences between cache and session state:
Cache is server-based but session is user-based (can be across multiple servers)
Cache is usually URL-based so if the URL is the same for all users, this cannot be implemented via cache.
Cache will be recycled and memory reclaimed when server is close to out of memory (so you will not get an OutOfMemory exception using cache) but session just grows and you will get an OutOfMemory if you are using in-process sessions and not.
Session is usually used for user level state that has to be stored but cache is meant for performance boosting.
With all above, and considering your requirement, you had better use cache unless the URL is the same for all users. If you are using session state, you need to be very careful as just a little performance nicety can kill your website.
I'm running an ASP.NET app in which I have added an insert/update query to the [global] Page_Load. So, each time the user hits any page on the site, it updates the database with their activity (session ID, time, page they hit). I haven't implemented it yet, but this was the only suggestion given to me as to how to keep track of how many people are currently on my site.
Is this going to kill my database and/or IIS in the long run? We figure that the site averages between 30,000 and 50,000 users at one time. I can't have my site constantly locking up over a database hit with every single page hit for every single user. I'm concerned that's what will happen, however this is the first time I have attempted a solution like this so I may just be overly paranoid.
Do it Async.
Create a dll that handles the update, and in the page load do a fire and forget with parameters.
Insert-Based designs have less locking than Update-Based designs.
So if a user logged-in and then logged-out, in an Insert-Based design you would have multiple rows with a SessionID in each, one for each activity whereas in an Update-Based design, you would have a SessionId, LoginTime and a LogoutTime column and you would update the LogoutTime based on the SessionId.
I have seen many more locking and contention problems caused by Update activity than by Insert activity.
Activities such as counting and linking logins to logouts etc take more complex queries and a little more resources.
It goes without saying that your queries, especially the ones that run on every page, should be as fast as possible so that the site doesn't appear slow to users.
To keep track of how many users are currently on your site you could use performance counters. What you describe though sounds more like a full fledged logging of every page hit.
Let's say you really have 50k users connected at any one time.
As long as you don't have contention between the updates (trying to lock the same record), a database can track a very high number of inserts and updates. You need to do some capacity planning to ensure the load can be carried. 50k users visiting a page every minute will give you 50k inserts and 50k updates per minute, roughly 850 inserts and 850 updates per second, which have to commit (flush the log). Does your DB I/O subsystem support such a write load, in addition to responding to all the requests (reads)?
Also, 50k users doing 1 page hit per minute adds up to 72 million hits per day, which means 72 million logging inserts. At such a rate you need to carefully plan the size capacity of the database and consider what kind of analysis you'll do on the collected data, since ad-hoc querying of 2 billion rows (one month of data) will get you nowhere fast (actually... quite slow).
Doing it async can give you some relief over very short spikes, but not in the long run. If your DB system cannot handle the load, then doing async calls will just create a backlog queue in the application process (in the ASP app pool), and this will grow until it runs out of memory, at which point the ever-vigilant IIS will 'recycle' the app pool, thus losing all pending async updates.
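The capacity figures above follow from simple arithmetic, reproduced here as a small sketch. The 30-day month is an assumption, and the variable names are only illustrative.

```python
users = 50_000
hits_per_user_per_minute = 1

writes_per_minute = users * hits_per_user_per_minute    # 50,000 inserts (and as many updates)
writes_per_second = writes_per_minute / 60              # about 833, i.e. "roughly 850"
hits_per_day = writes_per_minute * 60 * 24              # 72,000,000
hits_per_month = hits_per_day * 30                      # 2,160,000,000, i.e. about 2 billion rows

print(round(writes_per_second), hits_per_day, hits_per_month)
```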
I think updating the database at session begin and session end will do the job. That will reduce the number of statements dramatically.
I think it makes no difference whether you track hits or session begin/end. With hits you'll also need additional logic to subtract inactive users.
EDIT: Session end is not always fired. I would suggest calling an update statement/stored procedure in another session begin event (in addition to the other insert statement) that will fix invalid sessions.
I don't think that calling this "fix routine" is necessary in every page load event, because I don't think you can count the "current no. of visitors" exactly.
I would keep this in Application state instead, if possible. On ApplicationStart, create some data structure saved to App state that you can update from anywhere in your application: session start, page load, wherever. Keep it out of the database. It sounds like you are just using it to track "currently online" info anyway.
If you have multiple instances of your app, or if there is a requirement to maintain historical info beyond the IIS logs, this obviously won't work. Go with chris' fire-and-forget solution in that case.
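A minimal sketch of such an in-memory structure, written in Python rather than against ASP.NET's Application object. OnlineTracker, touch and count_online are hypothetical names, and the 10-minute inactivity timeout is an assumption, not something this answer specifies.

```python
import threading
import time


class OnlineTracker:
    """Process-wide map of session ID -> last time that session was seen."""

    def __init__(self, timeout_seconds=600, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}
        self._lock = threading.Lock()

    def touch(self, session_id):
        # Call from session start or page load; no database involved.
        with self._lock:
            self._last_seen[session_id] = self._clock()

    def count_online(self):
        cutoff = self._clock() - self._timeout
        with self._lock:
            # Drop sessions that have been silent for longer than the timeout.
            stale = [sid for sid, seen in self._last_seen.items() if seen < cutoff]
            for sid in stale:
                del self._last_seen[sid]
            return len(self._last_seen)


# Demo with a fake clock (in seconds) so the result is deterministic.
fake_now = [0.0]
tracker = OnlineTracker(timeout_seconds=600, clock=lambda: fake_now[0])
tracker.touch("session-a")
fake_now[0] = 300.0
tracker.touch("session-b")
online_at_300 = tracker.count_online()   # both sessions are still fresh
fake_now[0] = 700.0
online_at_700 = tracker.count_online()   # session-a has been idle for 700 s
```

Like Application state itself, this lives in one process, so it shares the limitation noted above for multiple app instances.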
What's wrong with IIS Logs?
2009-05-01 12:30:31 207.219.27.35 GET /assocadmin/ibb-reg.asp - usernameremoved 544.566.570.575 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.0;+SLCC1;+.NET+CLR+2.0.50727;+Media+Center+PC+5.0;+.NET+CLR+3.5.30729;+.NET+CLR+3.0.30618) 200 0 0 40058
EDIT: I'd like to close this answer, but I want the comments to stay. Consider this answer withdrawn.
How about adding a small object to the session?
Something like LoggedInUserFlag:IDisposable
In the constructor, increment your counter however you decide to implement it.
Then in the Dispose method, decrement the counter.
This way, regardless of how the session is ended, the counter will always be (eventually) decremented.
see:
http://weblogs.asp.net/cnagel/archive/2005/01/23/359037.aspx
for info on using IDisposable.
I am not an ASP guy at all, but rather than logging all that other info, what about just inserting their IP address?
If their IP address is already in there, update a last_seen timestamp, and on each refresh just delete any row whose last_seen is more than 10 minutes old?
This is how I would take a shot at it. It is much more space efficient, but I am not sure about the checking and deleting so much on such a high profile site.
As a direct answer to your question, yes, running a database query in-line with every request is a bad idea:
Synchronous requests will tie up a thread, which will reduce your scalability (fewer simultaneous activities)
DB inserts (or updates) are writes to the DB, which will put a load on your log volume
DB accesses shouldn't be required in a single server / single AppPool scenario
I answered your question about how to count users in the other thread:
Best way to keep track of current online users
If you are operating in a multi-server / load-balanced environment, then DB accesses may in fact be required. In that case:
Queue them to a background thread so the foreground request thread doesn't have to wait
Use Resource Governor in SQL 2008 to reduce contention with other DB accesses
Collect several updates / inserts together into a single batch, in a single transaction, to minimize log disk I/O pressure
Return the current count with each DB access, to minimize round-trips
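The first and third points (queue to a background thread, write in batches) can be sketched like this. It is a Python illustration, not ASP.NET code: BatchingWriter and write_batch are hypothetical names, write_batch stands in for "insert these rows in one transaction", and the bounded queue is an added assumption so that a slow database cannot grow the backlog without limit.

```python
import queue
import threading


class BatchingWriter:
    """Collects records on a background thread and writes them in batches."""

    def __init__(self, write_batch, batch_size=100, max_pending=10_000):
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record):
        """Called from the request thread; never blocks.

        Returns False (record dropped) if the backlog is full.
        """
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False

    def close(self):
        # A None sentinel tells the worker to flush and stop.
        self._queue.put(None)
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is None:
                break
            batch.append(item)
            # Flush when the batch is full or nothing else is waiting.
            if len(batch) >= self._batch_size or self._queue.empty():
                self._write_batch(batch)
                batch = []
        if batch:
            self._write_batch(batch)


# Demo: the "database" is just a list of the batches it received.
written_batches = []
writer = BatchingWriter(written_batches.append, batch_size=10)
for hit in range(25):
    writer.submit({"page_hit": hit})
writer.close()
total_written = sum(len(batch) for batch in written_batches)
```

Dropping records when the queue is full is one possible policy; whether losing some hits is acceptable depends on what the data is used for.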
In case it's of any interest, I cover sync/async threading issues and the techniques above in detail in my book, along with code examples: Ultra-Fast ASP.NET.
I have a form with a list that shows information from a database. I want the list to update at run time (or almost in real time) every time something changes in the database. These are the three ways I can think of to accomplish this:
Set up a timer on the client to check every few seconds: I know how to do this now, but it would involve making and closing a new connection to the database hundreds of times an hour, regardless of whether there was any change
Build something sort of like a TCP/IP chat server, and every time a program updates the database it would also send a message to the TCP/IP server, which in turn would send a message to the client's form: I have no idea how to do this right now
Create a web service that returns the date and time the table was last changed, and the client would compare that time to the last time the client updated: I could figure out how to build a web service, but I don't know how to do this without making a connection to the database anyway
The second option doesn't seem like it would be very reliable, and the first seems like it would consume more resources than necessary. Is there some way to tell the client every time there is a change in the database without making a connection every few seconds, or is it not that big of a deal to make that many connections to a database?
I would imagine connection pooling would make this a non-issue. Depending on your database, it probably won't even notice it.
Are you making the update to the database? Or is the update happening from an external source?
Generally, hundreds of updates per hour won't even bother the DB. Even Access, which is pretty slow, won't cause a performance issue.
Here's a rough idea if you really want to optimize it and you're doing the data updates. Store an application variable on the server side called, say, LastUpdateTime. When you make updates to the database, you can update the LastUpdateTime variable with the current time. Since LastUpdateTime is a very lightweight object in server memory, your clients can technically request the last update time hundreds if not thousands of times per second without any round trip to the database. Based on the last time the client retrieved new information vs. the last update time on the server, you can then go fetch the updated info.
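A minimal sketch of that idea, in Python rather than ASP.NET. Server, Client and their methods are hypothetical names, the list of rows stands in for the database, and a counter replaces the wall clock so the demo is deterministic.

```python
import itertools
import time


class Server:
    """Keeps LastUpdateTime in memory next to a stand-in database."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._rows = []
        self.db_reads = 0
        self.last_update_time = clock()

    def write(self, row):
        self._rows.append(row)
        self.last_update_time = self._clock()   # stamp every update

    def get_last_update_time(self):
        return self.last_update_time            # memory only, no database trip

    def fetch_rows(self):
        self.db_reads += 1                      # the only path that hits the "database"
        return list(self._rows)


class Client:
    def __init__(self, server):
        self._server = server
        self._seen_stamp = None
        self.rows = []

    def poll(self):
        """Returns True if new data was fetched."""
        stamp = self._server.get_last_update_time()
        if stamp == self._seen_stamp:
            return False
        self.rows = self._server.fetch_rows()
        self._seen_stamp = stamp
        return True


ticks = itertools.count()
server = Server(clock=lambda: next(ticks))
client = Client(server)

client.poll()                       # first poll always fetches
for _ in range(1000):
    client.poll()                   # nothing changed, so no database reads
server.write("row1")
changed = client.poll()             # sees the new stamp and fetches once

print(server.db_reads, changed)     # 2 True
```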
We have a similar question: Polling database for updates from C# application. Another idea (maybe not a proper solution) would be to use the Microsoft Sync Framework. You can use a timer to sync the DB.