How would one go about profiling a few queries that are being run from an ASP.NET application? There is some software where I work that runs extremely slowly because of the database (I think). The tables have indexes, but it still drags because it's working with so much data. How can I profile to see where I can make a few minor improvements that will hopefully lead to larger speed improvements?
Edit: I'd like to add that the web server likes to time out during these long queries.
SQL Server has some excellent tools to help you with this situation. These tools are built into Management Studio (which used to be called Enterprise Manager + Query Analyzer).
Use SQL Profiler to show you the actual queries coming from the web application.
Copy each of the problem queries out (the ones that eat up lots of CPU time or IO). Run the queries with "Display Actual Execution Plan". Hopefully you will see some obvious index that is missing.
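If the plan shows a scan on a column the query filters on, the fix is often a covering nonclustered index along these lines (just a sketch; the table and column names are made up):
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, Total);   -- covers the columns the query actually selects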
You can also run the tuning wizard (the button is right next to "Display Actual Execution Plan"). It will run the query and make suggestions.
Usually, if you already have indexes and queries are still running slowly, you will need to rewrite the queries in a different way.
Keeping all of your queries in stored procedures makes this job much easier.
To profile SQL Server, use the SQL Profiler.
And you can use ANTS Profiler from Red Gate to profile your code.
Another .NET profiler which plays nicely with ASP.NET is dotTrace. I have personally used it and found lots of bottlenecks in my code.
I believe you have the answer you need to profile the queries. However, this is the easiest part of performance tuning. Once you know it is the queries and not the network or the app, how do you find and fix the problem?
Performance tuning is a complex thing. But there are some places to look first. You say you are returning lots of data? Are you returning more data than you need? Are you really returning only the columns and records you need? Returning 100 columns by using select * can be much slower than returning the 5 columns you are actually using.
Are your indexes and statistics up to date? Look up how to update statistics and re-index in BOL if you haven't done this in a while. Do you have indexes on all the join fields? How about the fields in the where clause?
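For reference, a minimal sketch of that kind of maintenance (dbo.Leads is a made-up table name):
EXEC sp_updatestats;                        -- refresh statistics database-wide
UPDATE STATISTICS dbo.Leads WITH FULLSCAN;  -- or target one table with a full scan
ALTER INDEX ALL ON dbo.Leads REBUILD;       -- rebuild that table's indexes (SQL Server 2005+ syntax)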
Have you used a cursor? Have you used subqueries? How about UNION? If you are using it, can it be changed to UNION ALL?
Are your queries sargable? (Google the term if it's unfamiliar.)
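To illustrate sargability with a made-up table: the first form below wraps the column in a function and forces a scan, while the second leaves the column alone so an index on OrderDate can be used.
-- non-sargable: the function on the column defeats the index
SELECT * FROM dbo.Orders WHERE YEAR(OrderDate) = 2008;
-- sargable: the column is untouched, so an index seek is possible
SELECT * FROM dbo.Orders WHERE OrderDate >= '20080101' AND OrderDate < '20090101';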
Are you using distinct when you could use group by?
Are you getting locks?
There are many other things to look at; these are just a starting place.
If there is a particular query or stored procedure I want to tune, I have found turning on statistics before the query to be very useful:
SET STATISTICS TIME ON
SET STATISTICS IO ON
When you turn on statistics in Query Analyzer, the statistics are shown in the Messages tab of the Results pane.
IO statistics have been particularly useful for me, because it lets me know if I might need an index. If I see a high read count from the IO statistics, I might try adding different indexes to the affected tables. As I try an index, I run the query again to see if the read count has gone down. After a few iterations, I can usually find the best index(es) for the tables involved.
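For example, wrapping the statement you're tuning (the query here is only a placeholder):
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT CustomerId, COUNT(*) AS OrderCount   -- the query being tuned
FROM dbo.Orders
GROUP BY CustomerId;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;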
Here are links to MSDN for these statistics commands:
SET STATISTICS TIME
SET STATISTICS IO
Related
We've developed a system with a search screen that looks a little something like this: [screenshot of the lead search screen; source: nsourceservices.com]
As you can see, there is some fairly serious search functionality. You can use any combination of statuses, channels, languages, campaign types, and then narrow it down by name and so on as well.
Then, once you've searched and the leads pop up at the bottom, you can sort the headers.
The query uses ROWNUM to do a paging scheme, so we only return something like 70 rows at a time.
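Roughly speaking, the paging looks something like this (a simplified sketch with made-up column names, assuming SQL Server's ROW_NUMBER(); the @ variables are set by the search code):
SELECT *
FROM (
    SELECT l.*, ROW_NUMBER() OVER (ORDER BY l.LeadName) AS rn
    FROM dbo.Leads AS l
    WHERE l.StatusId = @StatusId            -- plus whichever other filters are active
) AS paged
WHERE paged.rn BETWEEN @StartRow AND @StartRow + 69;   -- roughly 70 rows per page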
The Problem
Even though we're only returning 70 rows, an awful lot of IO and sorting is going on. This makes sense of course.
This has always caused some minor spikes to the Disk Queue. It started slowing down more when we hit 3 million leads, and now that we're getting closer to 5, the Disk Queue pegs for up to a second or two straight sometimes.
That would actually still be workable, but this system has another area with a time-sensitive process, let's say for simplicity that it's a web service, that needs to serve up responses very quickly or it will cause a timeout on the other end. The Disk Queue spikes are causing that part to bog down, which is causing timeouts downstream. The end result is actually dropped phone calls in our automated VoiceXML-based IVR, and that's very bad for us.
What We've Tried
We've tried:
Maintenance tasks that reduce the number of leads in the system to the bare minimum.
Added the obvious indexes to help.
Ran the index tuning wizard in profiler and applied most of its suggestions. One of them was going to more or less reproduce the entire table inside an index so I tweaked it by hand to do a bit less than that.
Added more RAM to the server. It was a little low but now it always has something like 8 gigs idle, and the SQL server is configured to use no more than 8 gigs, however it never uses more than 2 or 3. I found that odd. Why isn't it just putting the whole table in RAM? It's only 5 million leads and there's plenty of room.
Pored over query execution plans. I can see that at this point the indexes seem to be mostly doing their job -- about 90% of the work is happening during the sorting stage.
Considered partitioning the Leads table out to a different physical drive, but we don't have the resources for that, and it seems like it shouldn't be necessary.
In Closing...
Part of me feels like the server should be able to handle this. Five million records is not that many given the power of that server, which is a decent quad core with 16 gigs of RAM. However, I can see how the sorting part is causing millions of rows to be touched just to return a handful.
So what have you done in situations like this? My instinct is that we should maybe slash some functionality, but if there's a way to keep this intact that will save me a war with the business unit.
Thanks in advance!
Database bottlenecks can frequently be relieved by improving your SQL queries. Without knowing what those look like, consider creating an operational data store or a data warehouse that you populate on a scheduled basis.
Sometimes flattening out your complex relational databases is the way to go. It can make queries run significantly faster, and make it a lot easier to optimize your queries, since the model is very flat. That may also make it easier to determine if you need to scale your database server up or out. A capacity and growth analysis may help to make that call.
Transactional/highly normalized databases are not usually as scalable as an ODS or data warehouse.
Edit: Your ORM may also support optimizations of its own that are worth looking into, rather than just looking at how to optimize the queries it sends to your database. Bypassing your ORM altogether for the reports could be one way to get full control over your queries and gain better performance.
Consider how your ORM is creating the queries.
If you're having poor search performance, perhaps you could try using stored procedures to return your results and, if necessary, multiple stored procedures specifically tailored to which search criteria are in use.
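A rough sketch of what I mean, using made-up column names; OPTION (RECOMPILE) lets each combination of criteria get its own plan:
CREATE PROCEDURE dbo.SearchLeads
    @StatusId  int          = NULL,
    @ChannelId int          = NULL,
    @Name      nvarchar(50) = NULL
AS
SELECT TOP (70) LeadId, LeadName, StatusId, ChannelId
FROM dbo.Leads
WHERE (@StatusId  IS NULL OR StatusId  = @StatusId)   -- each criterion is optional
  AND (@ChannelId IS NULL OR ChannelId = @ChannelId)
  AND (@Name      IS NULL OR LeadName LIKE @Name + '%')
ORDER BY LeadName
OPTION (RECOMPILE);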
Determine which ad hoc queries will most likely be run, or limit the search criteria with stored procedures. Can you summarize data? Treat this app like a data warehouse.
Create indexes on each column involved in the search to avoid table scans.
Create fragments on expressions.
Periodically reorganize the data and update statistics as more leads are loaded.
Put the temporary files created by queries (result sets) on a RAM disk.
Consider migrating to a high-performance RDBMS engine like Informix OnLine.
Initiate another thread to start displaying N rows from the result set while the query continues to execute.
I've nearly finished the development of a project and would like to test its performance, especially the database query calls. I'm using Linq to SQL to search via usernames, but I've only got around 10 'users' in my database, so I can't really get a decent speed reading. How can I simulate thousands/millions of users in the database without actually creating new records? I've read about Selenium, but it seems that is good for repeat actions (simulating concurrent users?). Are there any other tools I should look into, or are there any options in VS 2008 (Professional Edition)?
Thanks
You can "trick" SQL Server into thinking there are more records than there actually are in a table using the approach outlined in this article. See the section on False SQL Server Statistics
e.g.
UPDATE STATISTICS TableName WITH ROWCOUNT=100000
will create statistics for the table as if it has 100,000 rows in it. You can then see what effect this has on the execution plan. But note that this is undocumented functionality, and so it may give quirky behaviour.
You could just populate your table with sample data. There are various tools available to help out with that, like Red Gate's SQL Data Generator. I prefer actually having large data volumes, as I think that will be more accurate.
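If you do go the sample-data route, you can also generate large volumes with plain T-SQL; a rough sketch (dbo.Users and UserName are assumed names):
-- generate roughly a million throwaway users by cross-joining a system view with itself
INSERT INTO dbo.Users (UserName)
SELECT TOP (1000000)
       'user' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varchar(10))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;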
I have a general inquiry related to processing rows from a query. In general, I always try to format/process my rows in SQL itself, using numerous CASE WHEN statements to pre-format my db result, limiting rows and filling columns based on other columns.
However, you can also opt to just select all your rows and do the post-processing in code (asp.NET in my case). What do you guys think is the best approach in terms of performance?
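To illustrate the kind of in-query formatting I mean (the table and columns are just an example):
SELECT OrderId,
       CASE WHEN StatusId = 1 THEN 'Open'
            WHEN StatusId = 2 THEN 'Shipped'
            ELSE 'Unknown'
       END AS StatusText,
       CASE WHEN DiscountPct > 0 THEN Price * (1 - DiscountPct) ELSE Price END AS FinalPrice
FROM dbo.Orders;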
Thanks in advance,
Stijn
I would recommend doing the processing in the code, unless you have network bandwidth considerations. The simple reason for this is that it is generally easier to make code changes than database changes. Furthermore, performance is more often related to the actual database query and disk access than to the amount of data returned.
However, I'm assuming that you are referring to "minor" formatting changes to the result. Standard where clauses should naturally be done in the database.
Context
My current project is a large-ish public site (2 million pageviews per day) running a mixture of classic ASP and ASP.NET with a SQL Server 2005 back-end. We're heavy on reads, with occasional writes and virtually no updates/deletes. Our pages typically concern a single 'master' object with a stack of dependent (detail) objects.
I like the idea of returning all the data required for a page in a single proc (and absolutely no unnecessary data). True, this requires a dedicated proc for such pages, but some pages receive double-digit percentages of our overall site traffic, so it's worth the time/maintenance hit. We typically consume multiple recordsets only from our .NET code, using System.Data.SqlClient.SqlDataReader and its NextResult method. Oh, and I'm not doing any updates/inserts in these procs either (except to table variables).
The question
SQL Server (2005) procs which return multiple recordsets are working well (in prod) for us so far, but I am a little worried that multi-recordset procs are my new favourite hammer that I'm hitting every problem (nail) with. Are there any multi-recordset SQL Server proc gotchas I should know about? Anything that's going to make me wish I hadn't used them? Specifically, anything about them affecting connection pooling, memory utilization, etc.?
Here's a few gotchas for multiple-recordset stored procs:
They make it more difficult to reuse code. If you're doing several queries, odds are you'd be able to reuse one of those queries on another page.
They make it more difficult to unit test. Every time you make a change to one of the queries, you have to test all of the results. If something changed, you have to dig through to see which query failed the unit test.
They make it more difficult to tune performance later. If another DBA comes in behind you to help improve performance, they have to do more slicing and dicing to figure out where the problems are coming from. Then combine this with the code reuse problem - if they optimize one query, that query might be used in several different stored procs, and then they have to go fix all of them - which makes for more unit testing again.
They make error handling much more difficult. Four of the queries in the stored proc might succeed, and the fifth fails. You have to plan for that (see the sketch after this list).
They can increase locking problems and incur load in TempDB. If your stored procs are designed in a way that needs repeatable reads, then the more queries you stuff into a stored proc, the longer it's going to take to run, and the longer it's going to take to return those results back to your app server. That increased time means higher contention for locks, and the more SQL Server has to store in TempDB for row versioning. You mentioned that you're heavy on reads, so this particular issue shouldn't be too bad for you, but you want to be aware of it before you reuse this hammer on a write-intensive app.
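On the error-handling point, a minimal sketch of guarding a multi-recordset proc with TRY/CATCH (the proc name and queries are placeholders):
CREATE PROCEDURE dbo.GetPageData
AS
BEGIN TRY
    SELECT 1 AS FirstResult;     -- first result set
    SELECT 2 AS SecondResult;    -- second result set
    -- ...the remaining result sets...
END TRY
BEGIN CATCH
    DECLARE @msg nvarchar(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);      -- surface the failure so the app knows the batch died partway through
END CATCH;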
I think multi-recordset stored procedures are great in some cases, and it sounds like yours may be one of them.
The bigger (more traffic) your site gets, the more that 'extra' bit of performance is going to matter. If you can combine 2-3-4 calls (and possibly new connections) to the database into one, you could be cutting your database hits by 4-6-8 million per day, which is substantial.
I use them sparingly, but when I have, I have never had a problem.
I would recommend having one stored procedure invoke several inner stored procedures that return one result set each.
create proc foo
as
execute foobar --returns one result
execute barfoo --returns one result
execute bar --returns one result
That way, when requirements change and you only need the 3rd and 5th result sets, you have an easy way to invoke them without adding new stored procedures and regenerating your data access layer. My current app returns all reference tables (e.g. the US states table) whether I want them or not. Worst is when you need to get a reference table and the only access is via a stored procedure that also runs an expensive query as one of its six result sets.
Is it quicker to make one trip to the database and bring back 3000-plus rows, then manipulate them in .NET & LINQ, or quicker to make 6 calls bringing back a couple of hundred rows at a time?
It will entirely depend on the speed of the database, the network bandwidth and latency, the speed of the .NET machine, the actual queries etc.
In other words, we can't give you a truthful general answer. I know which sounds easier to code :)
Unfortunately this is the kind of thing which you can't easily test usefully without having an exact replica of the production environment - most test environments are somewhat different to the production environment, which could seriously change the results.
Is this for one user, or will many users be querying the data? The single database call will scale better under load.
Speed is only one consideration among many.
How flexible is your code? How easy is it to revise and extend when the requirements change? How easy is it for another person to read and maintain your code? How portable is your code? What if you change to a different DBMS, or a different programming language? Are any of these considerations important in your case?
Having said that, go for the single round trip if all other things are equal or unimportant.
You mentioned that the single round trip might result in reading data you don't need. If all the data you need can be described in a single result table, then it should be possible to devise a query that will get that result. That result table might deliver some result data in more than one row, if the query denormalizes the data. In that case, you might gain some speed by obtaining the data in several result tables, and composing the result yourself.
You haven't given enough information to know how much programming effort it will be to compose a single query or to compose the data returned by 6 queries.
As others have said, it depends.
If you know which 6 SQL statements you're going to execute beforehand, you can bundle them into one call to the database, and return multiple result sets using ADO or ADO.NET.
http://support.microsoft.com/kb/311274
The problem I have here is that I need it all, I just need it displayed separately...
The answer to your question is that 1 query for 3000 rows is better than 6 queries for 500 rows (given that you are bringing all 3000 rows back regardless).
However, there's no way you're going (to want) to display 3000 rows at a time, is there? In all likelihood, irrespective of using Linq, you're going to want to run aggregating queries and get the database to do the work for you. You should hopefully be able to construct the SQL (or Linq query) to perform all required logic in one shot.
Without knowing what you're doing, it's hard to be more specific.
If you absolutely, positively need to bring back all the rows, then investigate the ToLookup() method for your LINQ IQueryable<T>. It's very handy for grouping results in non-standard ways.
Oh, and I highly recommend LINQPad (free) for trying out queries with Linq. It has loads of examples, and it also shows you the sql and lambda forms so you can familiarize yourself with Linq<->lambda form<->Sql.
Well, the answer is always "it depends". Do you want to optimize on the database load or on the application load?
My general answer in this case would be to use queries that are as specific as possible at the database level, therefore making 6 calls.
Thx
I was kind of thinking "ball park", but it sounds as though it's a choice thing... the difference is likely small.
I was thinking that getting all the data and manipulating it in .NET would be the best - I have nothing concrete to base this on (hence the question), I just tend to feel that calls to the DB are expensive, and if I know I need all the data... get it in one hit?!?
Part of the problem is that you have not provided sufficient information to give you a precise answer. Obviously, available resources need to be considered.
If you pull 3000 rows infrequently, it might work for you in the short term. However, if there are, say, 10,000 people who execute the same query (ignoring cache effects), this could become a problem for both the app and the db.
Now in the case of something like pagination, it makes sense to pull in just what you need. But that would be a general rule to try to only pull what is necessary. It's much more elegant to use a scalpel instead of a broadsword. =)
If you are talking about a query that has already been run by SQL (so optimized by SQL Server), working with LINQ or a SqlDataReader might actually have the same performance.
The only difference will be "how hard will it be to maintain your code?"
LINQ doesn't send anything to the database until you ask for the result with ".ToList()" or ".ToArray()" or even ".Count()". LINQ builds your query dynamically, so it is exactly the same as having a SqlDataReader, but with runtime verification.
Rather than speculating, why don't you try both and measure the results?
It depends
1) If your connector implementation precaches a lot of objects AND you have big rows (for example blobs, country polygons, etc.), you have a problem: you have to download a LOT of data. I once optimized some code that had this problem - it was downloading megabytes of garbage all the time via localhost - and my software now runs 10 times faster because I turned the precaching off with an option.
2) If your rows are small and there is a good chance that you need to read through all 3000, you're better off with one big result set.
3) If you don't use prepared statements, every query has to be parsed! A big result set might be better.
Hope it helped
I always stick to the rule of "bring in what I need" and nothing more...the problem I have here is that I need it all, I just need it displayed separately.
So say...
I have a table with userid and typeid. I want to display all records with a given userid, and display them on the page in grids, say, separated by typeid.
At the moment I call a sproc that does "select field1, field2 from tab where userid=1",
then on the page set the datasource of a grid to from t in tab where typeid=2 select t;
Rather than calling a different sproc "select field1, field2 from tab where userid=1 and typeid=2" 6 times.
??
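For what it's worth, a middle-ground sketch using the hypothetical tab/userid/typeid names from the question: one proc that returns a result set per typeid, so there is still a single round trip and each grid binds to its own result set.
create proc dbo.GetUserRowsByType
    @userid int
as
-- one result set per typeid; still a single round trip
select field1, field2 from tab where userid = @userid and typeid = 1;
select field1, field2 from tab where userid = @userid and typeid = 2;
-- ...repeat for the remaining typeid values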