.NET vs SQL Server - best practice for repeated searching - asp.net

My client is asking for a "suggestion" based lookup to be added to a particular form field.
In other words, as you start typing into a field there should be a "Google style" popup which suggests possible results to select from. There will be on the order of "tens of thousands" of possible suggestions - this is the best estimate I currently have of the quantity.
Using AJAX to send/retrieve the result, my question is whether it is better to store ALL the suggestions within .NET cache and process there, or whether it's better to run a stored-procedure based query on SQL Server for each request?
This would be using .NET 2.0 and SQL Server 2005

There is one trick I use every time I'm faced with such a task: do not run the search on every keystroke. Put the launching of the search on a sliding timeout.
The intent is to launch the search only once the user has paused in their typing. I usually set the timeout at 0.1 to 0.2 seconds. Psychologically it still feels instantaneous, but it considerably reduces the load on whatever you will use to search.

Your bottleneck will be transporting the data from the server to the browser. You can easily produce a large result from the database in almost no time at all, but it takes forever to return it to the browser.
You will have to find a way to limit the result that the server returns, so that you only fetch what the user has to see right now.
When you have the data traffic down to a reasonable level, you can start looking at optimising the server part. With some caching and logic that should be easy, considering that as the user types, the result is often a subset of the previous result; e.g. the matches for "hat" are a subset of the matches for "ha".
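To make "only fetch what the user has to see" concrete, here is a minimal server-side sketch (not from the answers above; the class name, field names and MaxResults limit are made up for illustration) that filters an in-memory list of suggestions and caps the number of rows returned:

    using System;
    using System.Collections.Generic;

    public static class SuggestionCache
    {
        // Loaded once at application start, e.g. from a stored procedure.
        private static readonly List<string> _all = LoadSuggestions();
        private const int MaxResults = 10;

        public static List<string> Lookup(string prefix)
        {
            List<string> matches = new List<string>(MaxResults);
            foreach (string s in _all)
            {
                if (s.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                {
                    matches.Add(s);
                    if (matches.Count == MaxResults)
                        break;   // only send what the user can see right now
                }
            }
            return matches;
        }

        private static List<string> LoadSuggestions()
        {
            // Placeholder: load the tens of thousands of suggestion strings once.
            // Keeping them sorted would let you binary-search the prefix instead
            // of scanning the whole list.
            return new List<string>();
        }
    }

An AJAX endpoint would call Lookup with the typed prefix and serialize the small result list back to the browser.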

When I've seen "suggest as you type" searches done in SQL Server environments, the best performance has come from some sort of caching mechanism, typically a distributed one such as memcached. Even if your code is well optimized, your database is well tuned, and the call, processing and return take <= 10 ms, that is still 10 ms per keystroke as they type.

It depends on the number of items. If you can cache the items in a .NET process without running out of memory, this will definitely be faster.
But if that can't be done you are stuck with accessing the database on each request. A stored procedure would be a nice way to do that.
But there are some design patterns which can guide you. I've used the following two while developing something similar to what you are doing.
Submission throttling
Browser side cache

Related

ASP Website runs slow when number of users Increases

I need some information from you. I have set Session.Timeout = 540 in the application. Does that affect my application's performance over time? When the number of users increases it gets very slow; the response time is nearly more than 2 minutes, even for a button click. The site is hosted on a server in an application pool, but I don't know much about application pools. If the session timeout is the problem I will remove it. Please suggest how I can support more users.
Job numbers, CustomerID and tasks come from one database. When the user clicks the Start button, the data is saved to another database. I need this to be faster for more users.
I think that you have one or more pages doing work that takes a long time, or that for some reason (or because of a bug) stay open longer than usual.
Such a page keeps the session locked and blocks the other pages from responding, because session access is serialized across all of a user's pages.
Combined with the increased timeout, that page ends up locking everything, and that is where your response time of nearly 2 minutes comes from.
The solution is to locate the page with the long-running work and fix it, or make it faster by optimizing the process; if that page really must run for a long time, disable session state for that page.
Related:
Web app blocked while processing another web app on sharing same session
What perfmon counters are useful for identifying ASP.NET bottlenecks?
Replacing ASP.Net's session entirely
Trying to make Web Method Asynchronous
Does ASP.NET Web Forms prevent a double click submission?
About the server
On the other hand, if your server suffers from hardware limitations or a bad setup, then here is another answer with points you need to check to make it faster.
Find out where the time is spent
Add a Stopwatch in the method behind the button click that you said takes "more than 2 minutes", so you can find which statement spends the most time (a sketch follows below).
If it is a DB query that costs the time, check your SQL statement.
are you using "SELECT Count(*)" instead of "SELECT Count(Id)"? the * is always slower. also, don't try "SELECT * FROM...."
Use cache.
There are many ways to cache, both in ASPX pages and in your business layer.
The OutputCache directive is the easiest way.
You can also cache data (for example a blog post) the first time a user visits it; a sketch follows below.
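A minimal sketch of that load-once-then-cache pattern using the ASP.NET cache; the class name, cache key and 10-minute expiry are illustrative assumptions:

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class PostCache
    {
        // Cache a blog post (or any rarely-changing object) for 10 minutes
        // the first time it is requested. LoadPostFromDatabase is a placeholder
        // for your real data access call.
        public static string GetPost(int postId)
        {
            string key = "post-" + postId;
            string post = (string)HttpRuntime.Cache[key];
            if (post == null)
            {
                post = LoadPostFromDatabase(postId);
                HttpRuntime.Cache.Insert(
                    key, post, null,
                    DateTime.Now.AddMinutes(10),        // absolute expiration
                    Cache.NoSlidingExpiration);
            }
            return post;
        }

        private static string LoadPostFromDatabase(int postId)
        {
            return "...";   // real DB call goes here
        }
    }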
Are you paging in memory?
Be careful when paging a GridView or other list. If you just set DataSource = xxx and call DataBind(), even with PagedDataSource, you are probably paging in memory: the entire result set is fetched and most of it is thrown away on the web server, which costs a lot of performance. Prefer paging in the database, e.g. with a stored procedure (see the sketch below).
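A minimal sketch of calling a database-side paging procedure from C#; the procedure name usp_GetJobsPage and its @PageIndex/@PageSize parameters are hypothetical, and on SQL Server 2005 such a proc would typically be written with ROW_NUMBER():

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public static class JobPaging
    {
        // Ask the database for one page of rows instead of binding the whole
        // table and letting the GridView discard most of it.
        public static DataTable GetJobsPage(string connectionString, int pageIndex, int pageSize)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand("usp_GetJobsPage", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@PageIndex", pageIndex);
                cmd.Parameters.AddWithValue("@PageSize", pageSize);

                DataTable page = new DataTable();
                using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
                {
                    adapter.Fill(page);
                }
                return page;     // bind this to the GridView's DataSource
            }
        }
    }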
Check your server environment
Where did you deploy the website? Many hosting providers limit bandwidth, IIS connection count, and the CPU time allotted to your account.
If you have Remote Desktop access to your server, you can watch CPU and memory usage to see whether they are high when many users come to your site. If the site is slow and neither CPU nor memory usage is high, it may be a network bandwidth problem.
Here are some simple steps to narrow down the issue -
1) Get HttpWatch (there's a free Basic edition) and check what's really taking time from the end user's perspective. Look at the number of requests, the number of resources downloaded, and the payload size. If there is nothing to worry about, move on to the next step.
2) If it's not the client, then it's usually processing time on the server. Jump to the database first, since it is easier to eliminate quickly. Look at how many DB calls are made (run Profiler in staging or dev) and check for long-running queries, missing indexes or statistics, and note the IO. If all is well, move on.
3) Check your application code. You could use the profiler built into Visual Studio or professional tools such as ANTS. If the code is fine, then it's your network or the external calls you make; check your network bandwidth. If you still cannot narrow it down, check your environment/hardware.
The best way to get to it is to apply load. You could use simple tools such as ab.exe (which comes with the Apache web server) to produce concurrent hits on your server while running the application and database profilers in the background to pinpoint the issue.
Hope this helps!

Adding more hardware v/s refactoring code under a time crunch

Background:
Enterprise application - very well written for its time in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows users to go through various wizards, for lack of a better term; all of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL Server database very frequently because we allow users to pause/resume their application. Often in these wizards the XML that comprises the wizard state grows very large, on the order of 5-8 MB of data, and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database, because much of the wizard state consists of keeping track of collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For argument's sake, because of the complexity of the application, let's say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front-end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong and it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the XML bloat that we persist frequently to the database. In the long run, reducing the size of the XML that we pass back and forth is probably the right idea, but will adding additional web servers truly have no effect? In terms of simultaneous users, I would think it should help.
Any responses/thoughts are appreciated; proof that more web servers would help would be a pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
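A minimal sketch of that idea, assuming the wizard state is already available as a byte array before it is written to the image column; GZipStream ships with .NET 2.0:

    using System.IO;
    using System.IO.Compression;

    public static class WizardStateCompression
    {
        // Gzip the serialized wizard state on the web server before writing it
        // to the image column, and unzip it after reading. This trades a little
        // web-server CPU for (potentially much) less data sent to SQL Server.
        public static byte[] Compress(byte[] serializedState)
        {
            using (MemoryStream output = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    gzip.Write(serializedState, 0, serializedState.Length);
                }
                return output.ToArray();
            }
        }

        public static byte[] Decompress(byte[] storedState)
        {
            using (MemoryStream input = new MemoryStream(storedState))
            using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
            using (MemoryStream output = new MemoryStream())
            {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                {
                    output.Write(buffer, 0, read);
                }
                return output.ToArray();
            }
        }
    }

XML of this size and shape usually compresses very well, so the write to the database can shrink dramatically.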
If the bottleneck is the database, then more web servers will not help you a lot.
The problem may be not only the size of the data, but also the number of concurrent requests to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries, you may try to break the XML write out of that transaction to reduce the locking time on the XML table.
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
No one seems to have suggested this: what about replacing the XML serialization of your wizard state with JSON serialization?
Not only should this give you a minor boost in serialization performance, since both the DataContractSerializer (faster) and Newtonsoft Json.NET (fastest) outperform the XML serializers in .NET; it should also easily reduce the size of your serialized object graph by 50% or more (depending on the ratio of properties to large strings in the XML).
This should dramatically lower the IO inflicted upon SQL Server, and it should also limit how much of your application you have to change (assuming it's well designed and goes through common calls for serialization/deserialization).
If you choose to go this route, also invest time comparing BSON with JSON, as the binary encoding will likely offer even more space savings (and further IO reduction) given the size of your object graphs.
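A minimal sketch of the Json.NET route; the WizardState class here is a hypothetical stand-in for the real wizard-state object graph:

    using Newtonsoft.Json;

    public static class WizardStateSerializer
    {
        // The JSON string is what you would then persist (or compress and
        // persist) instead of the XML.
        public static string ToJson(WizardState state)
        {
            return JsonConvert.SerializeObject(state);
        }

        public static WizardState FromJson(string json)
        {
            return JsonConvert.DeserializeObject<WizardState>(json);
        }
    }

    public class WizardState
    {
        // Stand-in for the real wizard state object graph.
        public string CurrentStep;
        public System.Collections.Generic.List<string> Answers =
            new System.Collections.Generic.List<string>();
    }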
I'm not a .NET expert but maybe using a binary serialization would increase throughput. Making sure that the XML isn't stored as text (fairly obvious but thought I'd mention it). Also relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on: without understanding the resource constraint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom and are never imposed by SQL Server; they're an artifact of your driver configuration, or of a firewall or similar device between app and DB imposing them (unless you're talking about timeouts for new connections, in which case you have a host in serious distress under load).
Given that your symptom is database timeouts, you need to start there. If they're indicative of long-running statements that result in a socket timeout, use SQL Server Profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement tuning related; it probably boils down to resource limitations: CPU, memory, or disk IO capacity.
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that, given a database performance issue, more application servers are likely to worsen the problem, as you increase the amount of concurrency that might otherwise be kept in check by connection pool, request processing or other limits.

Queuing using the Database or MSMQ?

A part of the application I'm working on is an SWF that shows a test with some 80 questions. Each question is saved to SQL Server through WebORB and ASP.NET.
If a candidate finishes the test, the session needs to be validated. The problem is that sometimes 350 candidates finish their test at the same moment, and the CPU on the web server and SQL Server explodes (350 validations concurrently).
Now, how should I implement queuing here? In the database, there's a table that has a record for each session. One column holds the status. 1 is finished, 2 is validated.
I could implement queuing in two ways (as I see it; maybe you have other suggestions):
A process that checks the table for records with status 1. If it finds one, it validates the session. So, sessions are validated one after another.
If a candidate finishes their session, a message is sent to an MSMQ queue. Another process listens to the queue and validates sessions one after another.
Now:
What would be the best approach?
Where do you start the process that will validate sessions? In your Global.asax (Application_Start)? As a Windows service? As an exe in the root of the website, started in Application_Start?
To me, using the table and looking for records with status 1 seems the easiest way.
The MSMQ approach decouples your web-facing application from the validation logic service and the database.
This brings many advantages, a few of which:
It would be easier to handle situations where the validation logic can handle 5 sessions per second but receives 300 all at once. Otherwise you would have to handle complicated timeouts, re-attempts, etc.
It would be easier to do maintenance on the validation service without having to interrupt the rest of the application. When the validation service is brought down, messages queue up in MSMQ and get processed again as soon as it is brought back up.
The same as above applies to database maintenance.
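For reference, a minimal sketch of the MSMQ option; the queue path and label are illustrative, and a reference to System.Messaging is required:

    using System;
    using System.Messaging;

    public static class ValidationQueue
    {
        private const string QueuePath = @".\Private$\SessionValidation";

        // Called by the web-facing code when a candidate finishes the test.
        public static void EnqueueFinishedSession(int sessionId)
        {
            if (!MessageQueue.Exists(QueuePath))
                MessageQueue.Create(QueuePath);

            using (MessageQueue queue = new MessageQueue(QueuePath))
            {
                queue.Send(sessionId, "Finished session " + sessionId);
            }
        }

        // Called by the separate validation process, one message at a time.
        public static int DequeueNextSession()
        {
            using (MessageQueue queue = new MessageQueue(QueuePath))
            {
                queue.Formatter = new XmlMessageFormatter(new Type[] { typeof(int) });
                Message message = queue.Receive();          // blocks until a message arrives
                return (int)message.Body;
            }
        }
    }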
If you don't have experience using MSMQ and no infrastructure set up, I would advise against it. Sure, it might be the "proper" way of doing queueing on the Microsoft platform, but it is not very straightforward and has quite a learning curve.
The same goes for creating a Windows Service; don't do it unless you are familiar with it. For simple cases such as this I would argue that the pain is greater than the rewards.
The simplest solution would probably be to use the table and run the process on a background thread that you start up in global.asax. You probably also want to create an admin page that can report some status information about the process (number of pending jobs etc) and maybe a button to restart the process if it for some reason fails.
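A minimal sketch of that table-plus-background-thread approach; ValidateOnePendingSession and the sleep intervals are placeholders for your real logic:

    using System;
    using System.Threading;
    using System.Web;

    public class Global : HttpApplication
    {
        private static Thread _validationWorker;

        protected void Application_Start(object sender, EventArgs e)
        {
            _validationWorker = new Thread(ValidationLoop);
            _validationWorker.IsBackground = true;   // don't keep the app domain alive
            _validationWorker.Start();
        }

        private static void ValidationLoop()
        {
            while (true)
            {
                try
                {
                    bool didWork = ValidateOnePendingSession();
                    if (!didWork)
                        Thread.Sleep(TimeSpan.FromSeconds(5));   // nothing pending, back off
                }
                catch (Exception)
                {
                    Thread.Sleep(TimeSpan.FromSeconds(30));      // log and retry later
                }
            }
        }

        private static bool ValidateOnePendingSession()
        {
            // Placeholder: SELECT a session with status 1, run the validation,
            // UPDATE it to status 2 and return true; return false if none found.
            return false;
        }
    }

Because only this loop validates, at most one validation runs at a time no matter how many candidates finish simultaneously.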
What is validating? Before working on your queuing strategy, I would try to make validating as fast as possible, including making it set based if it isn't already so.
I have recently been investigating this myself, so I wanted to mention my findings. The location of the database relative to your application is a big factor in deciding which option is faster.
I tested the time it took to insert 100 database entries versus logging the exact same data as local MSMQ messages, and averaged the results over several runs of the test.
What I found was that when the database is on the local network, inserting a row was up to 4 times faster than logging to an MSMQ.
When the database was being accessed over a decent internet connection, inserting a row into the database was up to 6 times slower than logging to an MSMQ.
So:
Local database - the DB is faster; otherwise MSMQ is.

SQL Server query runs slower from ADO.NET than in SSMS

I have a query from a web site that takes 15-30 seconds, while the same query runs in 0.5 seconds from SQL Server Management Studio. I cannot see any locking issues using SQL Profiler, nor can I reproduce the delay manually from SSMS. A week ago, I detached and reattached the database, which seemed to miraculously fix the problem. Today, when the problem reared its ugly head again, I tried merely rebuilding the indexes. This also fixed the problem. However, I don't think it's necessarily an index problem, since to my knowledge the indexes wouldn't be automatically rebuilt on a simple detach/attach.
Any idea what could be causing the delay? My first thought was that perhaps some parameter sniffing on the stored procedure being called (said stored proc runs a CTE, if that matters) was causing a bad query plan, which would explain the intermittent nature of the problem. Since both detaching / reattaching and an index rebuild should theoretically invalidate the cached query plan, this makes sense, but I'm unsure how to verify this. Additionally, why wouldn't the same query (copied directly from SQL Profiler with the exact same parameters) exhibit the same delay when run manually through SSMS?
Any thoughts?
I know I am weighing in on this topic very late, but I wanted to post a solution that I found when having a similar issue. In brief, adding the SET ARITHABORT ON command at the outset of my procedures brought website query performance in line with the performance seen from SQL Server tools. This option is typically set on the connection when you run a query from QA or SSMS (you can change that option, but it is the default).
In my case, I had about 15 different stored procs doing mathematical aggregates (SUMs, COUNTs, AVGs, STDEVs) across a fairly sizeable set of data (10s to 100s of thousands of rows) - adding the SET ARITHABORT ON option moved them all from running in 3-5 seconds each to 20-30ms.
So, hopefully that helps someone else out there.
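If editing every procedure is not practical, a hedged alternative is to issue the same option on the ADO.NET connection before calling the proc. A minimal sketch (the procedure and parameter names are illustrative):

    using System.Data;
    using System.Data.SqlClient;

    public static class ReportQueries
    {
        // Set ARITHABORT ON for this connection before running the slow
        // procedure, mimicking the default SSMS connection setting that the
        // answer above bakes into each proc.
        public static DataTable RunAggregates(string connectionString, int customerId)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();

                using (SqlCommand setOption = new SqlCommand("SET ARITHABORT ON;", conn))
                {
                    setOption.ExecuteNonQuery();
                }

                using (SqlCommand cmd = new SqlCommand("usp_GetAggregates", conn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@CustomerId", customerId);

                    DataTable result = new DataTable();
                    using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
                    {
                        adapter.Fill(result);
                    }
                    return result;
                }
            }
        }
    }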
If a bad plan is cached then the same bad plan should be used from SSMS too, if you run the very same query with identical arguments.
There cannot be a better solution than finding the root cause. Trying to peek and poke at various settings in the hope that it fixes the problem will never give you the confidence that it is actually fixed. Besides, next time the system may have a different problem, and you'll believe this same problem has re-surfaced and apply a bad solution.
The best thing to try is to capture the bad execution plan. The Showplan XML event class in Profiler is your friend; with it you can get the plan of the ADO.NET call. It is a very heavy event, so you should attach Profiler and capture it only when the problem manifests itself, in a short session.
Query IO statistics can also help. The RPC:Completed and SQL:BatchCompleted events both include Reads and Writes, so you can compare the amount of logical IO performed by the ADO.NET invocation vs. the SSMS one; a large difference (for exactly the same query and params) indicates different plans. sys.dm_exec_query_stats is another avenue of investigation: you can find your query plan(s) in there and inspect the execution stats.
All these should help establish with certitude if the problem is a bad plan or something else, to start with.
I have been having the same problem.
The only way I can fix this is by setting ARITHABORT ON.
But unfortunately, when it occurs again I have to set ARITHABORT OFF.
I have no clue what ARITHABORT has to do with this, but it works. I have been having this problem for over 2 years now with still no solution. The databases I am working with are over 300 GB, so maybe it is a size issue...
The closest I got to resolving this problem was from an earlier post:
Google Groups post
Let me know if you have managed to completely solve this problem as it is very frustrating!
Is it possible that your ADO.NET query is running after the system has been busy doing other things, so that the data it needs is no longer in RAM? And when you test on SSMS, it is?
You can check for that by running the following two commands from SSMS before you run the query:
CHECKPOINT
DBCC DROPCLEANBUFFERS
If that causes the SSMS query to run slowly, then there are some tricks you can play on the ADO.NET side to help it run faster.
Simon Sabin has a great session on "when a query plan goes wrong" ( http://sqlbits.com/Sessions/Event5/When_a_query_plan_goes_wrong ) that discusses how to address this issue within procs by using various "optimize for" hints and such to help a proc generate a consistent plan and not use the default parameter sniffing.
However, I've got an issue with an ad-hoc query (not in a proc) where the SSMS plan and the ASP plan are exactly the same - clustered index / table scan - and yet the ASP query takes 3+ minutes instead of 1 second. (In this case a table scan happens to be a decent answer for fetching the results.)
Anyone care to explain that one?

Using static data in ASP.NET vs. database calls?

We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
It may be best to keep, alongside the DB info, a timestamp or datetime recording when it was last updated. This way the check for "is my cache still fresh" is typically very lightweight: just fetch that "latest update" value and compare it with the latest update on which you rebuilt the local cache. Kind of like an HTTP "if modified since" caching strategy, except you'd be implementing most of it on the DB client side ;-).
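A minimal sketch of that "last updated" check, assuming a hypothetical dbo.TaxRates table with a LastUpdated column that is cheap to query:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    public static class TaxRateCache
    {
        private static DataTable _rates;
        private static DateTime _loadedAsOf = DateTime.MinValue;
        private static readonly object _lock = new object();

        public static DataTable GetRates(string connectionString)
        {
            lock (_lock)
            {
                DateTime latest = GetLastUpdated(connectionString);   // very cheap query
                if (_rates == null || latest > _loadedAsOf)
                {
                    _rates = LoadRates(connectionString);             // full (rare) reload
                    _loadedAsOf = latest;
                }
                return _rates;
            }
        }

        private static DateTime GetLastUpdated(string connectionString)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand(
                "SELECT MAX(LastUpdated) FROM dbo.TaxRates", conn))
            {
                conn.Open();
                object result = cmd.ExecuteScalar();
                return result is DateTime ? (DateTime)result : DateTime.MinValue;
            }
        }

        private static DataTable LoadRates(string connectionString)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlDataAdapter adapter = new SqlDataAdapter(
                "SELECT RateId, Rate, EffectiveDate FROM dbo.TaxRates", conn))
            {
                DataTable table = new DataTable();
                adapter.Fill(table);
                return table;
            }
        }
    }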
If you decide to cache the data (vs. making a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides expiry functionality, handles multiple concurrent requests, and can even invalidate the cache automatically using the query notification features of SQL 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
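A minimal sketch of the SqlCacheDependency approach with SQL Server 2005 query notifications; it assumes SqlDependency.Start(connectionString) has been called once at application start-up, and the table and column names are illustrative (query-notification commands need an explicit column list and two-part table names, as shown):

    using System.Data;
    using System.Data.SqlClient;
    using System.Web;
    using System.Web.Caching;

    public static class RatesCacheWithDependency
    {
        public static DataTable GetRates(string connectionString)
        {
            DataTable rates = (DataTable)HttpRuntime.Cache["TaxRates"];
            if (rates == null)
            {
                using (SqlConnection conn = new SqlConnection(connectionString))
                using (SqlCommand cmd = new SqlCommand(
                    "SELECT RateId, Rate, EffectiveDate FROM dbo.TaxRates", conn))
                {
                    // The dependency must be created before the command executes.
                    SqlCacheDependency dependency = new SqlCacheDependency(cmd);

                    rates = new DataTable();
                    conn.Open();
                    rates.Load(cmd.ExecuteReader());

                    // The entry is evicted automatically when the underlying data changes.
                    HttpRuntime.Cache.Insert("TaxRates", rates, dependency);
                }
            }
            return rates;
        }
    }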
With caching, a dbms is plenty efficient with static data anyway, especially only 5M of it.
True, but the point here is to avoid the database roundtrip at all.
ASP.NET Cache is the right tool for this job.
You didn't state how you will find the matching data for a user. If it is as simple as looking up a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging, or worse, searching, then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and LINQ makes things easy too, but it is still not SQL.
(try to group by 2 columns)
Sometimes a good approach is to have the DB return only the keys of a result set and use the cache to fill in the complete rows.
Think: premature optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be leaving behind an "unusual design pattern".
With even default caching, a DBMS is plenty efficient with static data anyway, especially only 5 MB of it. And the DBMS partitioning you're describing is often described as an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.
