Microsoft Indexing Service and OLEDB With Paging - asp.net

I have an ASP.NET 2.0 intranet site that uses the Indexing Service on a folder and its contents.
OLE DB is used to query the files in this folder, using the same technique as discussed here.
This was written by another developer, but I am starting to understand his way of working.
Now the clients are complaining about the long load time of the page, because all files in the folder are queried at once. They are right that it is slow, so I considered paging (like LINQ's Skip().Take()). I know that in SQL this translates to:
SELECT col1, col2
FROM
(
    SELECT col1, col2, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum
    FROM MyTable
) AS MyDerivedTable
WHERE MyDerivedTable.RowNum BETWEEN @startRow AND @endRow
But for some reason this does not work when used with OLE DB.
Which version of SQL does this use, or does anyone have a suggestion on how to implement the paging?
EDIT:
Because the above method is only available on SQL Server 2005 or higher, I am going to try a method that predates 2005. I think OLE DB doesn't support ROW_NUMBER() or OVER.
Going to try:
SELECT ... FROM Table WHERE PK IN
    (SELECT TOP (@PageSize) PK FROM Table WHERE PK NOT IN
        (SELECT TOP (@StartRow) PK FROM Table ORDER BY SortColumn)
     ORDER BY SortColumn)
ORDER BY SortColumn

It seems MSIDXS doesn't support many SQL functions.
Only the basics like SELECT, WHERE, and ORDER BY work. Other constructs such as TOP, ROWCOUNT, and OVER don't, and it even fails on COUNT(*).
I implemented paging by using the DataAdapter.Fill() overload that takes two integers, startRecord and maxRecords. This is not ideal, but it is the best solution in this case.
All records are still collected, but only the ones I need are stored in the DataSet, which is then converted to a collection of my own class.
This is fast for the first pages, because only the first rows are looped over and returned. But with 20 pages, the last page takes longer, because all the records before it are looped over as well.
I tested this with a page size of 20 and 400 results.
The first page took 200 ms, while the last page took around 1.6 seconds.
A noticeable lag, but now it only occurs on the last pages and not on the first ten.
There is a search and sorting mechanism, so the last pages won't be visited that much.
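For reference, a minimal sketch of this Fill-based approach, assuming an Indexing Service catalog named MyCatalog (the catalog name, column list, and sort column are all placeholders):

using System.Data;
using System.Data.OleDb;

public static class IndexPager
{
    // Returns one page of files from the Indexing Service catalog.
    // Fill(dataSet, startRecord, maxRecords, srcTable) still runs the full
    // query and skips rows client-side, which is why later pages get slower.
    public static DataSet GetPage(int pageIndex, int pageSize)
    {
        const string connStr = "Provider=MSIDXS;Data Source=MyCatalog"; // catalog name assumed
        const string query = "SELECT FileName, Path, Size FROM SCOPE() ORDER BY FileName";

        using (OleDbConnection conn = new OleDbConnection(connStr))
        using (OleDbDataAdapter adapter = new OleDbDataAdapter(query, conn))
        {
            DataSet ds = new DataSet();
            adapter.Fill(ds, pageIndex * pageSize, pageSize, "Files");
            return ds;
        }
    }
}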

Related

Poor SP performance from ASP.NET

I have a stored procedure that handles sorting, filtering and paging (using ROW_NUMBER) and some funky trickery :) The SP runs against a table with ~140k rows.
The whole thing works great, and for at least the first few dozen pages it is super quick.
However, if I try to navigate to higher pages (e.g. head to the last page of 10k), the whole thing comes to a grinding halt and results in a SQL timeout error.
If I run the same query with the same parameters inside a Management Studio query window, the response is instant irrespective of the page number I pass in.
At the moment it's test code that simply binds to an asp:DataGrid in .NET 3.5.
The SP looks like this:
BEGIN
WITH Keys AS (
    SELECT TOP (@PageNumber * @PageSize)
        ROW_NUMBER() OVER (ORDER BY JobNumber DESC) AS rn,
        P1.JobNumber,
        P1.CustID,
        P1.DateIn,
        P1.DateDue,
        P1.DateOut
    FROM vw_Jobs_List P1
    WHERE (@CustomerID = 0 OR CustID = @CustomerID)
      AND (JobNumber LIKE '%' + @FilterExpression + '%'
        OR OrderNumber LIKE '%' + @FilterExpression + '%'
        OR [Description] LIKE '%' + @FilterExpression + '%'
        OR Client LIKE '%' + @FilterExpression + '%')
    ORDER BY P1.JobNumber DESC
),
SelectedKeys AS (
    SELECT TOP (@PageSize)
        SK.rn,
        SK.JobNumber,
        SK.CustID,
        SK.DateIn,
        SK.DateDue,
        SK.DateOut
    FROM Keys SK
    WHERE SK.rn > ((@PageNumber - 1) * @PageSize)
    ORDER BY SK.JobNumber DESC
)
SELECT
    SK.rn,
    J.JobNumber,
    J.Description,
    J.Client,
    SK.CustID,
    OrderNumber,
    CAST(DATEADD(d, -2, CAST(ISNULL(SK.DateIn, 0) AS DateTime)) AS nvarchar) AS DateIn,
    CAST(DATEADD(d, -2, CAST(ISNULL(SK.DateDue, 0) AS DateTime)) AS nvarchar) AS DateDue,
    CAST(DATEADD(d, -2, CAST(ISNULL(SK.DateOut, 0) AS DateTime)) AS nvarchar) AS DateOut,
    Del_Method,
    Ticket#,
    InvoiceEmailed,
    InvoicePrinted,
    InvoiceExported,
    InvoiceComplete,
    JobStatus
FROM SelectedKeys SK
JOIN vw_Jobs_List J ON J.JobNumber = SK.JobNumber
ORDER BY SK.JobNumber DESC
END
And it's called via
sp_Jobs (@PageNumber, @PageSize, @FilterExpression, @OrderBy, @CustomerID)
e.g.
sp_Jobs '13702', '10', '', 'JobNumberDESC', '0'
Can anyone shed any light on what might be the cause of the dramatic difference in performance between a SQL query window and an ASP.NET page executing the same query into a dataset?
Check out the "WITH RECOMPILE" option
http://www.techrepublic.com/article/understanding-sql-servers-with-recompile-option/5662581
I have run into similar problems where the execution plan for a stored procedure works great for a while, but then gets a new plan because the options changed. So it will be "optimized" for one case and then perform "table scans" for another option. Here is what I have tried in the past:
Re-execute the stored procedure to calculate a new execution plan, and then keep an eye on it.
Break up the stored procedure into separate stored procedures, one per option, so that each can be optimized individually; the overall stored procedure then simply calls each "optimized" one.
Bring the records into an object and then perform all of the "funky trickery" in code; this also gives you the option to "cache" the results.
Obviously options #2 and #3 are better than option #1. I honestly find that option #3 is becoming the best bet in most cases.
There is also an option #4: instead of performing your "inner selects" in one query, you could put the results of the inner selects into temporary tables and then JOIN on those results. I would still push for option #3 if possible, but I understand that sometimes you just need to keep working on the stored procedure until it "works".
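As a rough sketch of the recompile angle from the application side (the procedure name is taken from the question; sp_recompile just marks the object so the next execution compiles a fresh plan):

using System.Data;
using System.Data.SqlClient;

public static class PlanMaintenance
{
    // Flags the stored procedure for recompilation on its next execution,
    // discarding whatever plan was cached from a previously sniffed parameter.
    public static void ForceRecompile(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("sp_recompile", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@objname", "dbo.sp_Jobs"); // name assumed from the question
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}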
Good luck.

ASP.NET and SQL Server with huge data sets

I am developing a web application in ASP.NET and on one page I am using a ListView with paging. As a test I populated the table it draws from with 6 million rows.
The table and a schema-bound view based off it have all the necessary indexes and executing the query in SQL Server Management Studio with SELECT TOP 5 returned in < 1 second as expected.
But on the ASP.NET page, with the same query, it seems to select all 6 million rows without any limit. Shouldn't the paging control limit the query to return only N rows rather than the entire data set? How can I use these ASP.NET controls to handle huge data sets with millions of records? Does SELECT [columns] FROM [tablename] quite literally mean that for the ListView? In other words, does it never inject a TOP <n>, doing all the pagination at the application level rather than the database level?
When you enable paging, the paging control, datasource, grid, etc. will limit the number of rows displayed by the control. However, they definitely will not limit the number of rows returned by the SELECT statement.
You should be using an ObjectDataSource as the control's data source and have it call a class method that executes a SELECT statement returning only the necessary rows. Otherwise, your performance will be horrible.
Unfortunately, SQL Server doesn't support any kind of RANGE clause, and the required SQL isn't very pretty. But it is absolutely necessary.
http://www.asp.net/data-access/tutorials/efficiently-paging-through-large-amounts-of-data-cs
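A hedged sketch of that pattern: with an ObjectDataSource and EnablePaging="true", the control passes startRowIndex and maximumRows to your select method, which should fetch only that window (table, columns, and connection string below are illustrative):

using System.Collections.Generic;
using System.Data.SqlClient;

public static class ProductRepository
{
    private const string ConnStr = "Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI"; // assumed

    // Select method for an ObjectDataSource with EnablePaging="true".
    // Only the requested window of rows crosses the wire.
    public static List<string> GetProductsPage(int startRowIndex, int maximumRows)
    {
        const string sql = @"
            SELECT Name FROM
            (SELECT Name, ROW_NUMBER() OVER (ORDER BY ProductID) AS RowNum
             FROM Products) AS Paged
            WHERE RowNum BETWEEN @start + 1 AND @start + @max;";

        List<string> page = new List<string>();
        using (SqlConnection conn = new SqlConnection(ConnStr))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@start", startRowIndex);
            cmd.Parameters.AddWithValue("@max", maximumRows);
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
                while (reader.Read())
                    page.Add(reader.GetString(0));
        }
        return page;
    }
}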

Why does this query timeout? V2

This question is a follow-up to This Question.
The solution, clearing the execution plan cache, seemed to work at the time, but I've been running into the same problem over and over again, and clearing the cache no longer seems to help. There must be a deeper problem here.
I've discovered that if I remove the .Distinct() from the query, it returns rows (with duplicates) in about 2 seconds. However, with the .Distinct() it takes upwards of 4 minutes to complete. There are a lot of rows in the tables, and some of the where clause fields do not have indexes. However, the number of records returned is fairly small (a few dozen at most).
The confusing part about it is that if I get the SQL generated by the Linq query, via Linqpad, then execute that code as SQL or in SQL Management Studio (including the DISTINCT) it executes in about 3 seconds.
What is the difference between the Linq query and the executed SQL?
I have a short-term workaround: return the set without .Distinct() as a List, then apply .Distinct() to the list; this takes about 2 seconds. However, I don't like doing SQL Server's work on the web server.
I want to understand WHY Distinct is two orders of magnitude slower in LINQ but not in SQL.
UPDATE:
When executing the code via LINQ, SQL Profiler shows this, which is basically an identical query:
sp_executesql N'SELECT DISTINCT [t5].[AccountGroupID], [t5].[AccountGroup] AS [AccountGroup1]
FROM [dbo].[TransmittalDetail] AS [t0]
INNER JOIN [dbo].[TransmittalHeader] AS [t1] ON [t1].[TransmittalHeaderID] = [t0].[TransmittalHeaderID]
INNER JOIN [dbo].[LineItem] AS [t2] ON [t2].[LineItemID] = [t0].[LineItemID]
LEFT OUTER JOIN [dbo].[AccountType] AS [t3] ON [t3].[AccountTypeID] = [t2].[AccountTypeID]
LEFT OUTER JOIN [dbo].[AccountCategory] AS [t4] ON [t4].[AccountCategoryID] = [t3].[AccountCategoryID]
LEFT OUTER JOIN [dbo].[AccountGroup] AS [t5] ON [t5].[AccountGroupID] = [t4].[AccountGroupID]
LEFT OUTER JOIN [dbo].[AccountSummary] AS [t6] ON [t6].[AccountSummaryID] = [t5].[AccountSummaryID]
WHERE ([t1].[TransmittalEntityID] = @p0) AND ([t1].[DateRangeBeginTimeID] = @p1) AND
([t1].[ScenarioID] = @p2) AND ([t6].[AccountSummaryID] = @p3)',
N'@p0 int,@p1 int,@p2 int,@p3 int',@p0=196,@p1=20100101,@p2=2,@p3=0
UPDATE:
The only difference between the queries is that LINQ executes via sp_executesql and SSMS does not; otherwise the query is identical.
UPDATE:
I have tried various transaction isolation levels to no avail. I've also set ARITHABORT to try to force a recompile when it executes, with no difference.
The bad plan is most likely the result of parameter sniffing: http://blogs.msdn.com/b/queryoptteam/archive/2006/03/31/565991.aspx
Unfortunately there is not really any good universal way (that I know of) to avoid that with L2S. context.ExecuteCommand("sp_recompile ...") would be an ugly but possible workaround if the query is not executed very frequently.
Changing the query around slightly to force a recompile might be another one.
Moving parts (or all) of the query into a view*, function*, or stored procedure* DB-side would be yet another workaround.
 * = where you can use local params (func/proc) or optimizer hints (all three) to force a 'good' plan
By the way, have you tried updating statistics for the tables involved? SQL Server's auto-update statistics doesn't always do the job, so unless you have a scheduled job for it, it might be worth scripting and scheduling UPDATE STATISTICS. Tweaking the sample size up and down as needed can also help.
There may be ways to solve the issue by adding* (or dropping*) the right indexes on the tables involved, but without knowing the underlying db schema, table size, data distribution etc that is a bit difficult to give any more specific advice on...
 * = Missing and/or overlapping/redundant indexes can both lead to bad execution plans.
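For example, a sketch of that statistics/recompile housekeeping through the LINQ to SQL DataContext mentioned above (the context type is assumed; the table name is taken from the posted query):

// MyDataContext is an assumed DataContext subclass for this database.
using (var db = new MyDataContext())
{
    // Refresh statistics so the optimizer sees current data distribution,
    // then mark plans depending on the table for recompilation.
    db.ExecuteCommand("UPDATE STATISTICS dbo.TransmittalDetail WITH FULLSCAN");
    db.ExecuteCommand("EXEC sp_recompile 'dbo.TransmittalDetail'");
}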
The SQL that Linqpad gives you may not be exactly what is being sent to the DB.
Here's what I would suggest:
Run SQL Profiler against the DB while you execute the query, and find the statement that corresponds to your query.
Paste the whole statement into SSMS, and enable the "Show Actual Execution Plan" option.
Post the resulting plan here for people to dissect.
Key things to look for:
Table Scans, which usually imply that an index is missing
Wide arrows in the graphical plan, indicating lots of intermediary rows being processed.
If you're using SQL 2008, viewing the plan will often tell you if there are any indexes missing which should be added to speed up the query.
Also, are you executing against a DB which is under load from other users?
At first glance there are a lot of joins, but I can only see one thing to reduce the number right away without having the schema in front of me: it doesn't look like you need AccountSummary.
[t6].[AccountSummaryID] = @p3
could be
[t5].[AccountSummaryID] = @p3
The returned values come from the [t5] table. [t6] is only used to filter on that one parameter, which looks like the foreign key from t5 to t6, so it is present in [t5]. Therefore, you can remove the join to [t6] altogether. Or am I missing something?
Are you sure you want to use LEFT OUTER JOIN here? This query looks like it should probably be using INNER JOINs, especially because you are taking columns that are potentially NULL and then doing a DISTINCT on them.
Check that you have the same Transaction Isolation level between your SSMS session and your application. That's the biggest culprit I've seen for large performance discrepancies between identical queries.
Also, different connection properties are in use when you work through SSMS than when executing the query from your application or from LinqPad. Check the connection properties of your SSMS session against those of your application's connection and you should see the differences. All other things being equal, that could be the cause. Keep in mind that you are executing the query through two different applications that can have two different configurations and could even be using two different database drivers. If the queries are the same, those would be the only differences I can see.
On a side note if you are hand-crafting the SQL, you may try moving the conditions from the WHERE clause into the appropriate JOIN clauses. This actually changes how SQL Server executes the query and can produce a more efficient execution plan. I've seen cases where moving the filters from the WHERE clause into the JOINs caused SQL Server to filter the table earlier in the execution plan and significantly changed the execution time.
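To illustrate that last suggestion with two tables from the posted query (a sketch; the filter column and parameter name are assumptions, and for INNER joins the two forms return the same rows):

public static class JoinPlacementExample
{
    // Same logical filter, expressed two ways. With INNER joins the results
    // are identical, but putting the condition in the ON clause can let
    // SQL Server apply the filter earlier in the execution plan.
    public const string FilterInWhere = @"
        SELECT t0.LineItemID
        FROM dbo.TransmittalDetail AS t0
        INNER JOIN dbo.LineItem AS t2 ON t2.LineItemID = t0.LineItemID
        WHERE t2.AccountTypeID = @accountTypeId;";

    public const string FilterInJoin = @"
        SELECT t0.LineItemID
        FROM dbo.TransmittalDetail AS t0
        INNER JOIN dbo.LineItem AS t2
            ON t2.LineItemID = t0.LineItemID
           AND t2.AccountTypeID = @accountTypeId;";
}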

ASP.NET, SQL 2005 "paging"

This is a follow-up to the question:
ASP.NET next/previous buttons to display single row in a form
As described there, the page has previous/next buttons that retrieve a single row at a time.
In total there are ~500,000 rows.
When I "page" through each subscription number, the form gets filled with the subscriber's details. What approach should I use on the SQL Server side?
Using the ROW_NUMBER() function seems a bit overkill, as it has to number all ~500,000 rows (I guess?), so what other possible solutions are there?
Thanks in advance!
ROW_NUMBER() is probably your best choice.
From this MSDN article: http://msdn.microsoft.com/en-us/library/ms186734.aspx
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
And just substitute 50 and 60 with parameters for the row range you want.
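A small sketch of that parameterization (the variable names are illustrative; one row per page matches the question):

// Compute the ROW_NUMBER() bounds for the query above.
int pageSize = 1;                               // one subscriber per page
int pageNumber = 42;                            // 1-based page index
int firstRow = (pageNumber - 1) * pageSize + 1; // replaces the literal 50
int lastRow = pageNumber * pageSize;            // replaces the literal 60
// pass firstRow and lastRow as SqlCommand parameters in place of 50 and 60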
Tommy, if your user has time to page through 500,000 rows at one page per row, then he or she is unique.
I guess what I am saying is that you may be able to provide a better UX. Too many pages? Build a search feature.
There are two potential workarounds (for this purpose, using a start of 201, pages of 100):
SQL
SELECT TOP 100 * FROM MyTable WHERE ID > 200 ORDER BY ID
LINQ to SQL
var MyRows = (from t in db.Table
              orderby t.ID ascending
              select t).Skip(200).Take(100);
If your ID field has a clustered index, use the former. If not, both of these will take the same amount of time (LINQ returns 500,000 rows, then skips, then takes).
If you're sorting by something that's NOT ID and you have it indexed, use ROW_NUMBER().
Edit: Because the OP isn't sorting by ID, the only solution is ROW_NUMBER(), which is the clause that I put at the end there.
In this case, the table isn't indexed, so please see here for ideas on how to index to improve query performance.

Without stored procedures, how do you page result sets in ASP.NET?

Without stored procedures, how do you page result sets retrieved from SQL Server in ASP.NET?
You could use LINQ, for instance:
var customerPage = dataContext.Customers.Skip(50).Take(25);
and then display those 25 customers.
See Scott Guthrie's excellent Using LINQ-to-SQL - section 6 - retrieve products with server side paging.
Another option (on SQL Server 2005 and up) is to use ordered CTEs (Common Table Expressions) - something like this:
WITH CustomerCTE AS
(
SELECT CustomerID,
ROW_NUMBER() OVER (ORDER BY CustomerID DESC) AS 'RowNum'
FROM Customers
)
SELECT * FROM CustomerCTE
WHERE rownum BETWEEN 150 AND 200
You basically define a CTE over your sort criteria using the ROW_NUMBER function, and then you can pick any range of rows at will (here: those between 150 and 200). This is very efficient and very useful for server-side paging. Join this CTE with your actual data tables and you can retrieve anything you need!
Marc
PS: okay, so the OP only has SQL Server 2000 at hand, so the CTE won't work :-(
If you cannot update to either SQL Server 2005, or .NET 3.5, I'm afraid your only viable option really is stored procedures. You could do something like this - see this blog post Efficient and DYNAMIC Server-Side paging with SQL Server 2000, or Paging with SQL Server Stored Procedures
The best is to use an ORM which will generate dynamic paging code for you - LINQ To SQL, NHibernate, Entity Framework, SubSonic, etc.
If you have a small result set, you can page on the server using either DataPager, PagedDataSource, or manually using LINQ Skip and Take commands.
(new answer since you're using SQL Server 2000, .NET 2.0, and don't want to use an ORM)
There are two ways to handle paging in SQL Server 2000:
If you have an ID column that's sequential with no holes, you can execute a SQL string like SELECT Name, Title FROM Customers WHERE CustomerID BETWEEN @low AND @high, with @low and @high being parameters calculated from the page size and the page you're on. More info on that here.
If you don't have a sequential ID, you end up using a minimum ID and SET ROWCOUNT to select a range. For instance: SET ROWCOUNT 20; SELECT Name, Title FROM Customers WHERE CustomerID > @low, either calculating @low from the page size and page, or from the last displayed CustomerID. There's some info on that approach here.
If you have a small dataset, you can page through it in .NET code, but it's less efficient. I'd recommend the PagedDataSource, but if you want to write it yourself you can just read your records from a SqlDataReader into an array and then copy out the range you need (e.g. with Array.Copy) to page through it.
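A minimal sketch of the PagedDataSource route for small result sets (the grid, table, and page size are placeholders):

using System.Data;
using System.Web.UI.WebControls;

public static class InMemoryPaging
{
    // Binds one page of an already-loaded DataTable to a DataGrid.
    // The full result set sits in memory, so this only suits small sets.
    public static void BindPage(DataGrid grid, DataTable results, int pageIndex)
    {
        PagedDataSource pds = new PagedDataSource();
        pds.DataSource = results.DefaultView;
        pds.AllowPaging = true;
        pds.PageSize = 20;               // illustrative page size
        pds.CurrentPageIndex = pageIndex;
        grid.DataSource = pds;
        grid.DataBind();
    }
}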
This is how I handled all of my paging and sorting with AJAX in my ASP.NET 2.0 application.
http://programming.top54u.com/post/AJAX-GridView-Paging-and-Sorting-using-C-sharp-in-ASP-Net.aspx
Well, my general approach is usually to create two tables for the results to be paged. The first is an info table with a search ID identity column and the min and max row numbers. The second table contains the actual results and has an identity column for the row number. I insert into the second table, get the min and max rows, and store them in the first table. Then I can page through by selecting just the rows I want. I usually expire the results after 24 hours using code right before the insert, and I usually use a stored procedure to do the inserts, but you could do it without one.
This has the advantage of only performing the more complex SQL search once, and the dataset won't change between page displays; it is a snapshot of the data. It also eases server-side sorting: I just select those rows in order and reinsert them into the second table.
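A rough sketch of that snapshot scheme (all table and column names are hypothetical; the row number comes from an IDENTITY column, so it also works on SQL Server 2000):

public static class SnapshotPagingSql
{
    // Run the expensive search once and number the rows on insert...
    public const string StoreResults = @"
        INSERT INTO SearchResult (SearchID, CustomerID, Name) -- RowNum is IDENTITY
        SELECT @searchId, CustomerID, Name
        FROM Customers
        WHERE Name LIKE @filter
        ORDER BY Name;";

    // ...then page by the row-number range recorded in the info table.
    public const string ReadPage = @"
        SELECT CustomerID, Name
        FROM SearchResult
        WHERE SearchID = @searchId AND RowNum BETWEEN @first AND @last
        ORDER BY RowNum;";
}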
