ASP.NET, SQL 2005 "paging"

This is a followup on the question:
ASP.NET next/previous buttons to display single row in a form
As it says on the page above, there's a previous/next button on the page that retrieves a single row at a time.
In total there are ~500,000 rows.
When I "page" through each subscription number, the form gets filled with subscriber details. What approach should I use on the SQL Server side?
Using the ROW_NUMBER() function seems a bit overkill, as it has to number all ~500,000 rows (I guess?), so what other possible solutions are there?
Thanks in advance!

ROW_NUMBER() is probably your best choice.
From this MSDN article: http://msdn.microsoft.com/en-us/library/ms186734.aspx
WITH OrderedOrders AS
(
    SELECT SalesOrderID, OrderDate,
           ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
    FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
And just substitute 50 and 60 with a parameter for the row number you want.
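For the single-row case in the question, the window collapses to a single row per click. A minimal sketch, assuming a hypothetical Subscribers table and a @RowNumber parameter that the next/previous buttons increment or decrement:

DECLARE @RowNumber int;
SET @RowNumber = 1; -- current position, driven by the next/previous buttons

WITH OrderedSubscribers AS
(
    SELECT SubscriberID, SubscriberName,
           ROW_NUMBER() OVER (ORDER BY SubscriberID) AS RowNumber
    FROM Subscribers
)
SELECT SubscriberID, SubscriberName
FROM OrderedSubscribers
WHERE RowNumber = @RowNumber;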

Tommy, if your user has time to page through 500,000 rows at one page per row, then he/she is unique.
I guess what I am saying here is that you may be able to provide a better UX. Too many pages? Build a search feature.

There are two potential workarounds (for this purpose, using a start of 201, pages of 100):
SQL
SELECT TOP 100 * FROM MyTable WHERE ID > 200 ORDER BY ID
LINQ to SQL
var MyRows = (from t in db.Table
              orderby t.ID ascending
              select t).Skip(200).Take(100);
If your ID field has a clustered index, use the former. If not, both of these will take the same amount of time (LINQ returns 500,000 rows, then skips, then takes).
If you're sorting by something that's NOT ID and you have it indexed, use ROW_NUMBER().
Edit: Because the OP isn't sorting by ID, the only solution is ROW_NUMBER(), which is the option I covered at the end there.
In this case, the table isn't indexed, so please see here for ideas on how to index to improve query performance.
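For example, DDL along these lines (hypothetical names; index the columns you actually seek and sort on):

CREATE INDEX IX_MyTable_ID ON MyTable (ID);
CREATE INDEX IX_MyTable_SortColumn ON MyTable (SortColumn);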


Sessions by hits.page.pagePath in GA bigquery tables

I am new to BigQuery, so sorry if this is a noob question! I am interested in breaking out sessions by page path or title. I understand one session can contain multiple paths/titles, so the sum would be greater than the total sessions. Essentially, I want to create a 'session ID' and do a count distinct of session IDs where the path is like a or b.
It might actually be helpful to start at the very beginning and manually calculate total sessions. I tried to concatenate visitId and fullVisitorId to create a unique visit ID, but apparently that is quite different from sessions. Can someone help enlighten me? Thanks!
I am working with our GA site data. The schema is the standard one in GA exports.
DATA SAMPLE
Let's use an example out of the sample BigQuery (London Helmet) data:
There are 63 sessions on this day:
SELECT count(*) FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
How many of those sessions are where hits.page.pagePath like /vests% or /helmets%? How many were vests only vs helmets only? Thanks!
Here is an example of how to calculate whether there were only helmets, only vests, both helmets and vests, or neither:
SELECT
visitId,
has_helmets AND has_vests AS both_helmets_and_vests,
has_helmets AND NOT has_vests AS helmets_only,
NOT has_helmets AND has_vests AS vests_only,
NOT has_helmets AND NOT has_vests AS neither_helmets_nor_vests
FROM (
SELECT
visitId,
SOME(hits.page.pagePath like '/helmets%') WITHIN RECORD AS has_helmets,
SOME(hits.page.pagePath like '/vests%') WITHIN RECORD AS has_vests
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
)
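If you want counts rather than one row per session, you can wrap that query in an aggregate. A sketch in the same legacy BigQuery SQL dialect:

SELECT
  SUM(IF(has_helmets AND has_vests, 1, 0)) AS both_helmets_and_vests,
  SUM(IF(has_helmets AND NOT has_vests, 1, 0)) AS helmets_only,
  SUM(IF(NOT has_helmets AND has_vests, 1, 0)) AS vests_only,
  SUM(IF(NOT has_helmets AND NOT has_vests, 1, 0)) AS neither_helmets_nor_vests
FROM (
  SELECT
    visitId,
    SOME(hits.page.pagePath LIKE '/helmets%') WITHIN RECORD AS has_helmets,
    SOME(hits.page.pagePath LIKE '/vests%') WITHIN RECORD AS has_vests
  FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
)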
Way 1, easier but you need to repeat on each field
Obviously you can do something like this:
SELECT count(*) FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] WHERE hits.page.pagePath like '/helmets%'
And then have multiple queries for your own substrings (one with '/vests%', one with '/helmets%', etc.).
Way 2, works fine, but not with repeated fields
If you want ONE query that'll just group by the first part of the string, you can do something like this:
Select a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] ) group by a
When I do this, it returns the 63 sessions, with a total count of 63 :).
Way 3, using a FLATTEN on the table to get each hit individually
Since the "hits" field is repeatable, you would need a FLATTEN in your query :
Select a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a FROM FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] , hits)) group by a
The reason why you need to FLATTEN here is that the "hits" field is repeatable. If you don't flatten, it won't look into ALL the "hits" in your response. Adding "FLATTEN" will make you work off a sub-table where each hit is in its own row, so you can query on all of them.
If you want it by session instead of by hit (it'll group by both), do something like this:
Select b, a, Count(*) FROM (SELECT FIRST(SPLIT(hits.page.pagePath, '/')) as a, visitId as b FROM FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] , hits)) group by b, a

Microsoft Indexing Service and OLEDB With Paging

I have an ASP.NET 2.0 intranet site that uses the Indexing Service on a folder and its contents.
OLEDB is used to query the files in this folder by using the same technique as discussed here.
This was written by another developer, but I am starting to understand his way of working.
But now the clients are complaining about the long load time of the page, because all files in the folder are queried at once. They are right about the fact that it's slow, so I considered using paging (like LINQ's Skip().Take()). I know that in SQL this translates to:
SELECT col1, col2
FROM
(
SELECT col1, col2, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum
FROM MyTable
)
AS MyDerivedTable
WHERE MyDerivedTable.RowNum BETWEEN #startRow AND #endRow
But for some reason this does not work when used with OLEDB.
Which version of SQL does this use, or do any of you have a suggestion on how to implement the paging?
EDIT:
Because the above method is only available when using SQL Server 2005 or higher, I am going to try a method from before 2005. I think OLEDB doesn't support ROW_NUMBER() or OVER.
Going to try:
SELECT ... FROM Table WHERE PK IN
(SELECT TOP #PageSize PK FROM Table WHERE PK NOT IN
(SELECT TOP #StartRow PK FROM Table ORDER BY SortColumn)
ORDER BY SortColumn)
ORDER BY SortColumn
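To make the placeholders concrete, a hypothetical page size of 20 with 40 rows skipped (i.e. the third page) instantiates the pattern like this:

SELECT * FROM Table WHERE PK IN
(SELECT TOP 20 PK FROM Table WHERE PK NOT IN
(SELECT TOP 40 PK FROM Table ORDER BY SortColumn)
ORDER BY SortColumn)
ORDER BY SortColumn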
Seems like MSIDXS doesn't support many SQL functions.
Only the basics like "Select", "Where", and "Order by" work. Other functions like "Top", "Rowcount", and "Over" don't work. It even fails on "Count(*)".
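For reference, the kind of basic query that MSIDXS does accept looks something like this (a sketch; the scope folder and search term are hypothetical):

SELECT FileName, Path, Size
FROM SCOPE('"C:\MyFolder"')
WHERE CONTAINS(Contents, '"report"') > 0
ORDER BY FileName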
I implemented paging by using the DataAdapter.Fill() method overload that takes two integers: startRecord and maxRecords. This is not ideal, but the best solution in this case.
All records will still be collected, but only the ones I need are stored in the DataSet, which is then converted to a collection of my own class.
This works fast for the first pages because only the first rows are looped over and returned. But when you have 20 pages, the last page takes longer because all the records before it are looped over.
I tested this with a page size of 20 and 400 results.
The first page took 200 ms while the last page took around 1.6 seconds.
A noticeable lag, but now it only occurs on the last pages and not on the first 10.
There is a search and sorting mechanism, so the last pages won't be visited that much.

SQL sorting, paging, filtering best practices in ASP.NET

I am wondering how Google does it. I have a lot of slow queries when it comes to page count and total number of results. Google returns a count of around 250,000,000 results in a fraction of a second.
I am dealing with grid views. I have built a custom pager for a GridView that requires an SQL query to return a page count based on the filters set by the user. There are at least five filters, including a keyword, a category and subcategory, a date range, and a sort expression for sorting. The query contains about 10 massive table left joins.
This query is executed every time a search is performed, and a query execution lasts an average of 30 seconds, be it a count or a select. I believe what's making it slow is my string of inclusive and exclusive date range filters. I have replaced (<=, >=) with BETWEEN ... AND, but I still experience the same problem.
See the query here:
http://friendpaste.com/4G2uZexRfhd3sSVROqjZEc
I have problems with a long date range parameter.
Check my table that contains the dates:
http://friendpaste.com/1HrC0L62hFR4DghE6ypIRp
UPDATE [9/17/2010]: I minimized my date query and removed the time.
I tried reducing the joins for my count query (I am actually having a problem with my filter count, which takes too long to return a result of 60k rows).
SELECT COUNT(DISTINCT esched.course_id)
FROM courses c
LEFT JOIN events_schedule esched
ON c.course_id = esched.course_id
LEFT JOIN course_categories cc
ON cc.course_id = c.course_id
LEFT JOIN categories cat
ON cat.category_id = cc.category_id
WHERE 1 = 1
AND c.course_type = 1
AND active = 1
AND c.country_id = 52
AND c.course_title LIKE '%cook%'
AND cat.main_category_id = 40
AND cat.category_id = 360
AND (
('2010-09-01' <= esched.date_start OR '2010-09-01' <= esched.date_end)
AND
('2010-09-25' >= esched.date_start OR '2010-09-25' >= esched.date_end)
)
I just noticed that my query is quite fast when I have a filter on my main or sub category fields. However, when I only have a date filter and the range is a month or a week, it needs to count a lot of rows and takes 30 seconds on average.
These are the static fields:
AND c.course_type = 1
AND active = 1
AND c.country_id = 52
UPDATE [9/17/2010]: If I create a hash of these three fields and store it in one field, will it change the speed?
These are my dynamic fields:
AND c.course_title LIKE '%cook%'
AND cat.main_category_id = 40
AND cat.category_id = 360
// ?DateStart and ?DateEnd
UPDATE [9/17/2010]: Now my problem is the leading % in the LIKE query.
I will post an updated EXPLAIN.
Search engines like Google use very complex behind-the-scenes algorithms to index searches. Essentially, they have already determined which words occur on each page, as well as the relative importance of those words and the relative importance of the pages (relative to other pages). These indexes are very quick because they are based on bitwise indexing.
Consider the following Google searches:
custom: 542 million Google hits
pager: 10.8 million
custom pager: 1.26 million
Essentially what they have done is created a record for the word 'custom', and in that record they have placed a 1 for every page that contains it and a 0 for every page that doesn't. Then they zip it up, because there are a lot more 0s than 1s. They do the same for 'pager'.
When the search 'custom pager' comes in, they unzip both records and perform a bitwise AND on them. This results in an array of bits whose length is the total number of pages they have indexed, and the number of 1s is the hit count for the search. The position of each bit corresponds to a particular result, which is known in advance, and they only have to look up the full details of the first 10 to display on the first page.
This is oversimplified, but that is the general principle.
Oh yes, they also have huge banks of servers performing the indexing and huge banks of servers responding to search requests. HUGE banks of servers!
This makes them a lot quicker than anything that could be done in a relational database.
Now, to your question: Could you paste some sample SQL for us to look at?
One thing you could try is changing the order in which the tables and joins appear in your SQL statement. I know it seems that it shouldn't make a difference, but it certainly can. If you put the most restrictive joins earlier in the statement, then you could well end up with fewer overall joins performed within the database.
A real world example. Say you wanted to find all of the entries in the phonebook under the name 'Johnson', with the number beginning with '7'. One way would be to look for all the numbers beginning with 7 and then join that with the numbers belonging to people called 'Johnson'. In fact it would be far quicker to perform the filtering the other way around even if you had indexing on both names and numbers. This is because the name 'Johnson' is more restrictive than the number 7.
So order does count, and database software is not always good at determining in advance which joins to perform first. I'm not sure about MySQL, as my experience is mostly with SQL Server, which uses index statistics to calculate the order in which to perform joins. These stats get out of date after a number of inserts, updates and deletes, so they have to be re-computed periodically. If MySQL has something similar, you could try this.
UPDATE
I have looked at the query that you posted. Ten left joins is not unusual and should perform fine as long as you have the right indexes in place. Yours is not a complicated query.
What you need to do is break this query down to its fundamentals. Comment out the lookup joins such as those to currency, course_stats, countries, states and cities along with the corresponding fields in the select statement. Does it still run as slowly? Probably not. But it is probably still not ideal.
So comment out all of the rest until you just have the courses with the group by course id and the order by course id. Then experiment with adding in the left joins to see which one has the greatest impact. Then, focusing on the ones with the greatest impact on performance, change the order of the queries. This is the trial-and-error approach. It would be a lot better for you to take a look at the indexes on the columns that you are joining on.
For example, the line cm.method_id = c.method_id would require a primary key on course_methodologies.method_id and a foreign key index on courses.method_id and so on. Also, all of the fields in the where, group by and order by clauses need indexes.
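For example, DDL along these lines (hypothetical index names, MySQL syntax, covering the join and filter columns mentioned above):

ALTER TABLE courses ADD INDEX idx_courses_method_id (method_id);
ALTER TABLE events_schedule ADD INDEX idx_esched_course_id (course_id);
ALTER TABLE events_schedule ADD INDEX idx_esched_dates (date_start, date_end);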
Good luck
UPDATE 2
You seriously need to look at the date filtering on this query. What are you trying to do?
AND ((('2010-09-01 00:00:00' <= esched.date_start
AND esched.date_start <= '2010-09-25 00:00:00')
OR ('2010-09-01 00:00:00' <= esched.date_end
AND esched.date_end <= '2010-09-25 00:00:00'))
OR ((esched.date_start <= '2010-09-01 00:00:00'
AND '2010-09-01 00:00:00' <= esched.date_end)
OR (esched.date_start <= '2010-09-25 00:00:00'
AND '2010-09-25 00:00:00' <= esched.date_end)))
Can be re-written as:
AND (
    -- date_start is within the range - fine
    (esched.date_start BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')
    -- date_end is within the range - fine
    OR (esched.date_end BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')
    OR (esched.date_start <= '2010-09-01 00:00:00' AND esched.date_end >= '2010-09-01 00:00:00')
    OR (esched.date_start <= '2010-09-25 00:00:00' AND esched.date_end >= '2010-09-25 00:00:00')
)
In your update you mention that you suspect the problem is in the date filters.
All those date checks can be summed up in a single overlap check:
esched.date_end >= '2010-09-01 00:00:00' AND esched.date_start <= '2010-09-25 00:00:00'
If it behaves the same with the above, check whether the following returns quickly / is picking up your indexes:
SELECT COUNT(DISTINCT esched.course_id)
FROM events_schedule esched
WHERE esched.date_end >= '2010-09-01 00:00:00' AND esched.date_start <= '2010-09-25 00:00:00'
PS: I think that when using the join, you can do SELECT COUNT(c.course_id) to count the main course records in the query directly, i.e. you might not need the DISTINCT that way.
Re the update that most of the time now goes to the wildcard search after the change:
Use a MySQL full-text search. Make sure to check the full-text restrictions; an important one is that it's only supported on MyISAM tables. I must say that I haven't really used the MySQL full-text search, and I'm not sure how it impacts the use of other indexes in the query.
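A sketch of what that could look like (assuming the table is MyISAM; the index name is hypothetical, and MATCH ... AGAINST defaults to natural language mode):

ALTER TABLE courses ADD FULLTEXT INDEX ft_course_title (course_title);

SELECT COUNT(DISTINCT esched.course_id)
FROM courses c
LEFT JOIN events_schedule esched ON c.course_id = esched.course_id
WHERE MATCH(c.course_title) AGAINST('cook');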
If you can't use a full-text search, IMHO you are out of luck with your current approach, since a regular index can't be used to check whether a word is contained somewhere in the middle of the text.
If that's the case, you might want to switch that specific part of the approach and introduce a tag/keyword-based approach. Unlike categories, you can assign multiple tags to each item, so it's flexible yet doesn't have the free-text issue.

Does a multi-column index work for single column selects too?

I've got (for example) an index:
CREATE INDEX someIndex ON orders (customer, date);
Does this index only accelerate queries where customer and date are used together, or does it also accelerate queries for a single column like this?
SELECT * FROM orders WHERE customer > 33;
I'm using SQLite.
If the answer is yes, why is it possible to create more than one index per table?
Yet another question: How much faster is a combined index compared with two separate indexes when you use both columns in a query?
marc_s has the correct answer to your first question. The first key in a multi-key index can work just like a single-key index, but any subsequent keys will not.
As for how much faster the composite index is, that depends on your data and how you structure your index and query, but it is usually significant. The indexes essentially allow SQLite to do a binary search on the fields.
Using the example you gave if you ran the query:
SELECT * FROM orders WHERE customer > 33 AND date > 99
SQLite would first get all results using a binary search on the entire table where customer > 33. Then it would do a binary search on only those results looking for date > 99.
If you did the same query with two separate indexes on customer and date, SQLite would have to binary search the whole table twice, first for the customer and again for the date.
So how much of a speed increase you will see depends on how you structure your index with regard to your query. Ideally, the first field in your index and your query should be the one that eliminates the most possible matches as that will give the greatest speed increase by greatly reducing the amount of work the second search has to do.
For more information see this:
http://www.sqlite.org/optoverview.html
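You can also check what SQLite actually does for a given query with EXPLAIN QUERY PLAN. A sketch (the exact output format varies between SQLite versions):

CREATE INDEX someIndex ON orders (customer, date);
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer > 33 AND date > 99;
-- expect output naming someIndex rather than a full table scan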
I'm pretty sure this will work, yes - it does in MS SQL Server anyway.
However, this index doesn't help you if you need to select on just the date, e.g. a date range. In that case, you might need to create a second index on just the date to make those queries more efficient.
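For example (a one-line sketch using the table from the question):

CREATE INDEX dateIndex ON orders (date);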
Marc
I commonly use combined indexes to sort through data I wish to paginate or request "streamily".
Assuming a customer can make more than one order, customers 0 through 11 exist, and there are several orders per customer, all inserted in random order: I want to sort a query by customer number followed by the date. You should also sort on the id field last, to break ties where a customer has several identical dates (even if that may never happen).
sqlite> CREATE INDEX customer_asc_date_asc_index_asc ON orders
(customer ASC, date ASC, id ASC);
Get page 1 of a sorted query (limited to 10 items):
sqlite> SELECT id, customer, date FROM orders
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2653|1|1303828585
2520|1|1303828713
2583|1|1303829785
1828|1|1303830446
1756|1|1303830540
1761|1|1303831506
2442|1|1303831705
2523|1|1303833761
2160|1|1303835195
2645|1|1303837524
Get the next page:
sqlite> SELECT id, customer, date FROM orders WHERE
(customer = 1 AND date = 1303837524 AND id > 2645) OR
(customer = 1 AND date > 1303837524) OR
(customer > 1)
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2515|1|1303837914
2370|1|1303839573
1898|1|1303840317
1546|1|1303842312
1889|1|1303843243
2439|1|1303843699
2167|1|1303849376
1544|1|1303850494
2247|1|1303850869
2108|1|1303853285
And so on...
Having the indexes in place reduces server-side index scanning compared with paging via a query OFFSET coupled with a LIMIT: the higher the offset goes, the longer the query takes and the harder the drives seek. Using this method eliminates that.
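For comparison, this is the OFFSET form that the WHERE clause above replaces; it returns the same second page, but the engine still has to walk past every skipped row:

SELECT id, customer, date FROM orders
ORDER BY customer ASC, date ASC, id ASC LIMIT 10 OFFSET 10;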
Using this method is advised if you plan on joining data later but only need a limited set of data per request. Join against a SUBSELECT as described above to reduce memory overhead for large tables.
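A sketch of that subselect join, assuming a hypothetical customers table keyed by id:

SELECT o.id, o.customer, o.date, c.name
FROM (SELECT id, customer, date FROM orders
      ORDER BY customer ASC, date ASC, id ASC LIMIT 10) AS o
JOIN customers c ON c.id = o.customer;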

Split Data into Pages

How do I split data into pages on ASP.Net?
I'm looking for something like what Google does when you have too many search results and it splits them into x number of pages.
It would depend entirely on the content. If it's a simple data grid, you can use the built-in DataGrid paging. If the data is coming from SQL, though, I'd advise building a generic "paging control" and using the paging functionality of SQL to only pull back the data you want to see.
If it's SQL 2005 (or above), paging is nice and easy:
SELECT Description, Date
FROM (SELECT ROW_NUMBER() OVER (ORDER BY MyCol DESC) AS Row, Description, Date FROM MyTable)
AS MyTableWithRowNumbers
WHERE Row >= 1 AND Row <= 10
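To page generically, the row window can be parameterized; a sketch with hypothetical @PageNumber and @PageSize variables:

DECLARE @PageSize int, @PageNumber int;
SET @PageSize = 10;
SET @PageNumber = 1;

SELECT Description, Date
FROM (SELECT ROW_NUMBER() OVER (ORDER BY MyCol DESC) AS Row, Description, Date FROM MyTable)
AS MyTableWithRowNumbers
WHERE Row BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize;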
A paginating datagrid or a repeater would be your best options.
Use a GridView and the LinqDataSource.
It will do it all for you.
See:
http://msdn.microsoft.com/en-us/library/bb470363.aspx
and:
http://msdn.microsoft.com/en-us/library/bb547113.aspx
