SQL MAX() function is very slow, is there a better method? - asp.net

I have a SQL database and an ASP.NET web application, and most of my queries involve the SQL MAX() function.
For example, the query below takes approx. 36 seconds to execute (when using the profiler), both from the ASP.NET website and in SSMS.
SELECT MAX(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) -
MIN(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) AS ACTUALHOURSRUN
FROM REPORTINGSYSTEM.DBO.HL_LOGS
WHERE ID_LOCATION = @ID_LOCATION AND
CONVERT(VARCHAR,TIME_STAMP,102)
BETWEEN @STARTDATE AND @ENDDATE
The table in question has approx. 5,000,000 records and 45 columns.
What is the best/fastest/most efficient way of executing the query to reduce the execution time?
Thanks in advance...

An index on ID_LOCATION and TIME_STAMP would be a good candidate.
However, as you are converting the TIME_STAMP field to a VARCHAR, this will prevent it from being able to use the index. Instead, rework the query to remove the need to do the CONVERT and instead work with DATETIME values.
e.g.
SELECT MAX(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) - MIN(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) AS ACTUALHOURSRUN
FROM REPORTINGSYSTEM.DBO.HL_LOGS
WHERE ID_LOCATION = @ID_LOCATION AND TIME_STAMP >= @STARTDATE AND TIME_STAMP < @ENDDATE
Ensure @STARTDATE and @ENDDATE are DATETIME parameters too (I'm assuming that's the datatype of the TIME_STAMP column).
I'd also question what datatypes the Runhrs/Runho columns are. If they aren't already stored as FLOAT/DECIMAL, could they be? That would appear to be the most suitable datatype.
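For reference, a minimal sketch of the suggested index (the index name is illustrative):
CREATE INDEX IX_HL_LOGS_Location_TimeStamp
ON REPORTINGSYSTEM.DBO.HL_LOGS (ID_LOCATION, TIME_STAMP)
With the CONVERT removed from the WHERE clause as above, the optimizer can seek on this index instead of scanning the table.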

There are several things that you need to do:
Make sure that the columns you search on are indexed - you need an index on ID_LOCATION and TIME_STAMP (if you have other queries that filter by date across locations or by location across dates, you may consider defining separate indexes; otherwise, a single combined index will work).
Stop converting the timestamp to a string - use the native data type for @STARTDATE and @ENDDATE in your query, or replace the condition with TIME_STAMP BETWEEN CONVERT(DATETIME, @STARTDATE) AND CONVERT(DATETIME, @ENDDATE).
These two changes should make your query faster, especially the second one: currently, CONVERT(VARCHAR,TIME_STAMP,102) forces the query optimizer into a full scan on everything that matches your location, or even a full table scan if there is no index on ID_LOCATION. An indexed search should bring the number of records down to acceptable levels.
The final item to take on is CONVERT(FLOAT,ISNULL(Runhrs, Runho)): if the query speed after the first two modifications remains insufficient, change MAX(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) to CONVERT(FLOAT, MAX(ISNULL(Runhrs,Runho))), and do the same change for the MIN. This may or may not work, depending on the types of Runhrs and Runho.

First, store Runhrs as a numeric type. Then you don't need to do the conversion.
Second, you'll make this faster by creating an index on hl_logs(id_location, time_stamp). You could also throw in Runhrs and Runho as well. With an index on all four columns (in that order), the query will not even need to go back to the original data.
To use the index, you need to change the where statement to something like:
time_stamp between @startTimeStamp and @EndTimeStamp
The SQL engine won't use the index if the indexed column is the argument to a function.
The resulting query should look more like:
select max(coalesce(runhrs, runho)) - min(coalesce(runhrs, runho)) as Actual
from REPORTINGSYSTEM.DBO.HL_LOGS
WHERE ID_LOCATION = @ID_LOCATION AND
TIME_STAMP BETWEEN cast(@STARTDATE as datetime) AND cast(@ENDDATE as datetime)
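A hedged sketch of the four-column index described above (the index name is illustrative; the trailing two columns could alternatively go in an INCLUDE clause to keep the index keys narrow):
CREATE INDEX IX_hl_logs_location_timestamp
ON hl_logs (id_location, time_stamp, Runhrs, Runho)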

SELECT CONVERT(FLOAT, MAX(Runhrs)), CONVERT(FLOAT, MAX(Runho)),
CONVERT(FLOAT, MIN(Runhrs)), CONVERT(FLOAT, MIN(Runho))
FROM REPORTINGSYSTEM.DBO.HL_LOGS
WHERE ID_LOCATION = @ID_LOCATION
AND TIME_STAMP BETWEEN @STARTDATE AND @ENDDATE
Do the subtraction yourself in the code which calls the SQL. (That is, the .NET code.)
You should have an index on ID_LOCATION, TIME_STAMP, Runhrs, Runho. One index with all four fields. Maybe two indexes.
CREATE INDEX IX_HL_LOGS_Runhrs ON REPORTINGSYSTEM.DBO.HL_LOGS (ID_LOCATION, TIME_STAMP, Runhrs)
CREATE INDEX IX_HL_LOGS_Runho ON REPORTINGSYSTEM.DBO.HL_LOGS (ID_LOCATION, TIME_STAMP, Runho)

AND CONVERT(VARCHAR,TIME_STAMP,102)
This is NOT the way to do it - it will cause slowness. Do NOT apply unnecessary functions to the DATA - it robs the optimizer of any chance to use indexes on this field. Do it the other way around: convert @STARTDATE and @ENDDATE into the same data type as the field. (Or, think of it this way: you are doing ~5,000,000 conversions so that you can compare to 2 variables.)
[time_stamp] BETWEEN @STARTDATE AND @ENDDATE
Please understand this actually means:
[time_stamp] >= @STARTDATE AND [time_stamp] <= @ENDDATE
i.e. BOTH "boundary" datetimes are INCLUDED. It's usually easier and more accurate to avoid BETWEEN's inclusive nature and spell it out like this:
[time_stamp] >= @STARTDATE AND [time_stamp] < @ENDDATE (with @ENDDATE being "the next day at 00:00:00.000")
MAX(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) -
MIN(CONVERT(FLOAT,ISNULL(Runhrs,Runho))) AS ACTUALHOURSRUN
Wow, this is a lot of function calls! It's not pure MAX()/MIN() that's at issue here.
I suggest you simply look for MAX(Runhrs) etc. as previously mentioned, THEN do the subtraction.
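Putting these suggestions together, a hedged sketch of the reworked query might look like the following (it assumes TIME_STAMP is DATETIME and that @ENDDATE is exclusive, i.e. the day after the last day you want; the subtraction could equally be done in the .NET code as suggested earlier):
SELECT CONVERT(FLOAT, MAX(ISNULL(Runhrs, Runho))) -
       CONVERT(FLOAT, MIN(ISNULL(Runhrs, Runho))) AS ACTUALHOURSRUN
FROM REPORTINGSYSTEM.DBO.HL_LOGS
WHERE ID_LOCATION = @ID_LOCATION
  AND TIME_STAMP >= @STARTDATE
  AND TIME_STAMP < @ENDDATE
-- assumes Runhrs/Runho compare sensibly without the inner CONVERT; if they are stored as text, keep the CONVERT inside the aggregates as in the original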

Related

Calculating the percentage of dates (SQL Server)

I'm trying to add an auto-calculated field in SQL Server 2012 Express that stores the % of project completion, calculated from the date difference, using:
ALTER TABLE dbo.projects
ADD PercentageCompleted AS (select COUNT(*) FROM projects WHERE project_finish > project_start) * 100 / COUNT(*)
But I am getting this error:
Msg 1046, Level 15, State 1, Line 2
Subqueries are not allowed in this context. Only scalar expressions are allowed.
What am I doing wrong?
Even if it were possible (it isn't), it is anyway not something you would want to have as a calculated column:
it will be the same value in each row
the entire table would need to be updated after every insert/update
You should consider doing this in a stored procedure or a user-defined function instead. Or, even better, in the business logic of your application.
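For illustration, a hedged sketch of computing the value on demand instead of storing it (using the columns from the question; NULLIF guards against an empty table):
SELECT CAST(SUM(CASE WHEN project_finish > project_start THEN 1 ELSE 0 END) AS FLOAT)
       * 100 / NULLIF(COUNT(*), 0) AS PercentageCompleted
FROM dbo.projects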
I don't think you can do that. You could write a trigger to figure it out or do it as part of an update statement.
Are you storing "percentageCompleted" as a duplicated column value in the same table as your project data?
If this is the case, I would not recommend this, because it would duplicate the data.
If you don't care about duplicate data, try something separating the steps out like this:
ALTER TABLE dbo.projects
ADD PercentageCompleted decimal(2,2) --You could also store it as a varchar or char
declare @percentageVariable decimal(2,2)
select @percentageVariable = (select count(*) from projects where Project_finish > project_start) * 1.0 / (select count(*) from projects) -- ratio of completed/total; the * 1.0 avoids integer division
update projects
set PercentageCompleted = @percentageVariable
This will give you a decimal value in that table; you can then format it on SELECT if you want it as a percentage (PercentageCompleted * 100).

SQLite Query For Dates Equals Today

How do I make a query that retrieves from the database only the records whose date equals 'today'? I store my dates as long values.
E.g:
DateColumn (The name of my column) and the name of my table is MyTable
1360054701278 (Tuesday, February 5, 2013 8:58:21 AM GMT)
1359795295000 (Saturday, February 2, 2013 8:54:55 AM GMT)
So how should I make the query for this example in order to retrieve the first record (because its date is equal to today)?
Any suggestion would be appreciated.
Thanks
Sorry for not seeing that - your problem was the additional milliseconds saved in your column.
The solution is to remove them by division ;-)
SELECT * FROM MyTable WHERE date(datetime(DateColumn / 1000 , 'unixepoch')) = date('now')
Use a range for the fastest query. You want to avoid converting the column in order to compare it.
SELECT *
FROM MyTable
WHERE DateColumn BETWEEN JulianDay('now') AND JulianDay('now','+1 day','-0.001 second')
Note: I just realized your dates are not stored as Julian Dates which SQLite supports natively. The concept is still the same, but you'll need to use your own conversion functions for whatever format you're storing your dates as.
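For the millisecond Unix timestamps shown in the question, a hedged sketch of the same range idea (the boundaries are computed once, so the stored column itself is never converted):
SELECT *
FROM MyTable
WHERE DateColumn >= strftime('%s', 'now', 'start of day') * 1000
  AND DateColumn <  strftime('%s', 'now', 'start of day', '+1 day') * 1000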

Insert multiple rows into SQL Server based on start and end dates

I need to insert several rows into a SQL Server database based on Start Date and End Date textboxes.
E.G. tbStartDate.Text = "25/12/2012" and tbEndDate.Text = "29/12/2012" therefore I need to insert individual rows for the following dates:
25/12/2012
26/12/2012
27/12/2012
28/12/2012
29/12/2012
Please can you help me with the necessary T-SQL to achieve this?
As always there are a few ways. Here are some of them:
You can write code in your app that loops through the days and inserts a single record per day. (generally the worst design)
You can call some SQL script to do it all in the database.
You can wrap up your SQL script in a stored procedure and pass in the start and end date and get the stored procedure to do it for you.
You can cross join to a pre-existing tally table and use it to generate your records (a set-based sketch follows this list).
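As a hedged illustration of that set-based idea, here a recursive CTE stands in for a tally table (YourTable/YourDateField are placeholders, as in the stored procedure further down; requires SQL Server 2008 or later for the inline DECLARE):
DECLARE @StartDate date = '20121225', @EndDate date = '20121229';
;WITH Dates AS (
    SELECT @StartDate AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM Dates WHERE d < @EndDate
)
INSERT INTO YourTable (YourDateField)
SELECT d FROM Dates
OPTION (MAXRECURSION 0);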
If you can provide
-the version of SQL Server that you're using
-what the table looks like
-whether you're using C# or VB
then we can help further as it can be difficult to pass dates into databases. It can be particularly difficult if you do not validate them.
Anyway here is option 3 for you.
CREATE PROC dbo.t_test
@StartDate DATETIME,
@EndDate DATETIME
AS
WHILE @StartDate <= @EndDate
BEGIN
INSERT INTO YourTable(YourDateField) VALUES (@StartDate)
SET @StartDate = DATEADD(d,1,@StartDate)
END
Then you need to call this stored procedure (called dbo.t_test) from ASP.NET and pass in your two date parameters as dates.
Declare @Startdate datetime
Select @Startdate = '20121025'
While @Startdate <= '20121029'
begin
if not exists(select * from dummy where DATE = @Startdate)
insert into dummy (date) values (@Startdate)
set @Startdate = @Startdate + 1
end;

SQL sorting , paging, filtering best practices in ASP.NET

I am wondering how Google does it. I have a lot of slow queries when it comes to page count and total number of results. Google returns a count of 250,000,000 results in a fraction of a second.
I am dealing with grid views. I have built a custom pager for a gridview that requires an SQL query to return a page count based on the filters set by the user. There are at least 5 filters, which include a keyword, a category and subcategory, a date-range filter, and a sort expression for sorting. The query contains about 10 massive table LEFT JOINs.
This query is executed every time a search is performed, and a query execution lasts 30 seconds on average - be it a count or a select. I believe what's making it slow is my query string of inclusive and exclusive date-range filters. I have replaced (<=, >=) with BETWEEN and AND but I still experience the same problem.
See the query here:
http://friendpaste.com/4G2uZexRfhd3sSVROqjZEc
I have problems with a long date range parameter.
Check my table that contains the dates:
http://friendpaste.com/1HrC0L62hFR4DghE6ypIRp
UPDATE [9/17/2010]: I minimized my date query and removed the time.
I tried reducing the joins for my count query (I am actually having a problem with my filter count, which takes too long to return a result of 60k rows).
SELECT COUNT(DISTINCT esched.course_id)
FROM courses c
LEFT JOIN events_schedule esched
ON c.course_id = esched.course_id
LEFT JOIN course_categories cc
ON cc.course_id = c.course_id
LEFT JOIN categories cat
ON cat.category_id = cc.category_id
WHERE 1 = 1
AND c.course_type = 1
AND active = 1
AND c.country_id = 52
AND c.course_title LIKE '%cook%'
AND cat.main_category_id = 40
AND cat.category_id = 360
AND (
('2010-09-01' <= esched.date_start OR '2010-09-01' <= esched.date_end)
AND
('2010-09-25' >= esched.date_start OR '2010-09-25' >= esched.date_end)
)
I just noticed that my query is quite fast when I have a filter on my main or sub category fields. However, when I only have a date filter and the range is a month or a week, it needs to count a lot of rows and takes 30 seconds on average.
These are the static fields:
AND c.course_type = 1
AND active = 1
AND c.country_id = 52
UPDATE [9/17/2010]: If I create a hash of these three fields and store it in one field, will it make a difference in speed?
These are my dynamic fields:
AND c.course_title LIKE '%cook%'
AND cat.main_category_id = 40
AND cat.category_id = 360
// ?DateStart and ?DateEnd
UPDATE [9/17/2010]: Now my problem is the leading % in the LIKE query.
I will post an updated EXPLAIN.
Search engines like Google use very complex behind-the-scenes algorithms to index searches. Essentially, they have already determined which words occur on each page, as well as the relative importance of those words and the relative importance of the pages (relative to other pages). These indexes are very quick because they are based on bitwise indexing.
Consider the following google searches:
custom : 542 million google hits
pager : 10.8 m
custom pager : 1.26 m
Essentially what they have done is created a record for the word custom and in that record they have placed a 1 for every page that contains it and a 0 for every page that doesn't contain it. Then they zip it up because there are a lot more 0s than 1s. They do the same for pager.
When the search custom pager comes in, they unzip both records and perform a bitwise AND on them. This results in an array of bits whose length is the total number of pages they have indexed and whose number of 1s is the hit count for the search. The position of each bit corresponds to a particular result which is known in advance, and they only have to look up the full details of the first 10 to display on the first page.
This is oversimplified, but that is the general principle.
Oh yes, they also have huge banks of servers performing the indexing and huge banks of servers responding to search requests. HUGE banks of servers!
This makes them a lot quicker than anything that could be done in a relational database.
Now, to your question: Could you paste some sample SQL for us to look at?
One thing you could try is changing the order in which the tables and joins appear in your SQL statement. I know it seems that it shouldn't make a difference, but it certainly can. If you put the most restrictive joins earlier in the statement, you could well end up with fewer overall joins performed within the database.
A real world example. Say you wanted to find all of the entries in the phonebook under the name 'Johnson', with the number beginning with '7'. One way would be to look for all the numbers beginning with 7 and then join that with the numbers belonging to people called 'Johnson'. In fact it would be far quicker to perform the filtering the other way around even if you had indexing on both names and numbers. This is because the name 'Johnson' is more restrictive than the number 7.
So order does count, and database software is not always good at determining in advance which joins to perform first. I'm not sure about MySQL, as my experience is mostly with SQL Server, which uses index statistics to calculate the order in which to perform joins. These stats get out of date after a number of inserts, updates and deletes, so they have to be re-computed periodically. If MySQL has something similar, you could try this.
UPDATE
I have looked at the query that you posted. Ten left joins is not unusual and should perform fine as long as you have the right indexes in place. Yours is not a complicated query.
What you need to do is break this query down to its fundamentals. Comment out the lookup joins such as those to currency, course_stats, countries, states and cities along with the corresponding fields in the select statement. Does it still run as slowly? Probably not. But it is probably still not ideal.
So comment out all of the rest until you just have the courses and the GROUP BY course id and ORDER BY course id. Then experiment with adding in the left joins to see which one has the greatest impact. Then, focusing on the ones with the greatest impact on performance, change the order of the joins. This is the trial-and-error approach. It would be a lot better for you to take a look at the indexes on the columns that you are joining on.
For example, the line cm.method_id = c.method_id would require a primary key on course_methodologies.method_id and a foreign key index on courses.method_id and so on. Also, all of the fields in the where, group by and order by clauses need indexes.
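As a hedged sketch of the kind of supporting indexes described above (the index names are made up, and the exact column choices should be adjusted to the real schema and query):
ALTER TABLE courses ADD INDEX idx_courses_method (method_id);
ALTER TABLE events_schedule ADD INDEX idx_esched_course_dates (course_id, date_start, date_end);
ALTER TABLE course_categories ADD INDEX idx_cc_course_category (course_id, category_id);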
Good luck
UPDATE 2
You seriously need to look at the date filtering on this query. What are you trying to do?
AND ((('2010-09-01 00:00:00' <= esched.date_start
AND esched.date_start <= '2010-09-25 00:00:00')
OR ('2010-09-01 00:00:00' <= esched.date_end
AND esched.date_end <= '2010-09-25 00:00:00'))
OR ((esched.date_start <= '2010-09-01 00:00:00'
AND '2010-09-01 00:00:00' <= esched.date_end)
OR (esched.date_start <= '2010-09-25 00:00:00'
AND '2010-09-25 00:00:00' <= esched.date_end)))
Can be re-written as:
AND (
-- date_start is between the range - fine
(esched.date_start BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')
-- date_end is between the range - fine
OR (esched.date_end BETWEEN '2010-09-01 00:00:00' AND '2010-09-25 00:00:00')
OR (esched.date_start <= '2010-09-01 00:00:00' AND esched.date_end >= '2010-09-01 00:00:00')
OR (esched.date_start <= '2010-09-25 00:00:00' AND esched.date_end >= '2010-09-25 00:00:00')
)
On your update: you mention you suspect the problem to be in the date filters.
All those date checks can be summed up in a single overlap check:
esched.date_end >= '2010-09-01 00:00:00' AND esched.date_start <= '2010-09-25 00:00:00'
If with the above it behaves the same, check if the following returns quickly / is picking your indexes:
SELECT COUNT(DISTINCT esched.course_id)
FROM events_schedule esched
WHERE esched.date_end >= '2010-09-01 00:00:00' AND esched.date_start <= '2010-09-25 00:00:00'
PS: I think that when using the join you can do SELECT COUNT(c.course_id) to count the main course records in the query directly, i.e. you might not need the DISTINCT that way.
Re: the update - now most of the time goes to the wildcard search after the change:
Use a MySQL full-text search. Make sure to check the full-text restrictions; an important one is that it's only supported on MyISAM tables. I must say that I haven't really used MySQL full-text search, and I'm not sure how it impacts the use of other indexes in the query.
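For illustration only, a hedged sketch of what the full-text variant could look like (the index name is made up, and this assumes the MyISAM restriction above is acceptable):
ALTER TABLE courses ADD FULLTEXT INDEX ft_course_title (course_title);
SELECT COUNT(DISTINCT esched.course_id)
FROM courses c
LEFT JOIN events_schedule esched ON c.course_id = esched.course_id
WHERE MATCH(c.course_title) AGAINST('cook' IN BOOLEAN MODE)
  AND esched.date_end >= '2010-09-01' AND esched.date_start <= '2010-09-25';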
If you can't use a full-text search, IMHO you are out of luck with your current approach, since a regular index can't be used to check whether a word is contained somewhere in the middle of the text.
If that's the case, you might want to switch that specific part of the approach and introduce a tag/keyword-based approach. Unlike categories, you can assign multiple tags to each item, so it's flexible yet doesn't have the free-text issue.

Does a multi-column index work for single column selects too?

I've got (for example) an index:
CREATE INDEX someIndex ON orders (customer, date);
Does this index only accelerate queries where both customer and date are used, or does it also accelerate queries on a single column like this?
SELECT * FROM orders WHERE customer > 33;
I'm using SQLite.
If the answer is yes, why is it possible to create more than one index per table?
Yet another question: how much faster is a combined index compared with two separate indexes when you use both columns in a query?
marc_s has the correct answer to your first question. The first key in a multi-key index can work just like a single-key index, but any subsequent keys will not.
As for how much faster the composite index is: that depends on your data and how you structure your index and query, but it is usually significant. The indexes essentially allow SQLite to do a binary search on the fields.
Using the example you gave if you ran the query:
SELECT * from orders where customer > 33 AND date > 99
Sqlite would first get all results using a binary search on the entire table where customer > 33. Then it would do a binary search on only those results looking for date > 99.
If you did the same query with two separate indexes on customer and date, Sqlite would have to binary search the whole table twice, first for the customer and again for the date.
So how much of a speed increase you will see depends on how you structure your index with regard to your query. Ideally, the first field in your index and your query should be the one that eliminates the most possible matches as that will give the greatest speed increase by greatly reducing the amount of work the second search has to do.
For more information see this:
http://www.sqlite.org/optoverview.html
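If you want to verify which index SQLite actually picks for a given query, a quick sketch (using the question's table and columns):
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer > 33 AND date > 99;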
I'm pretty sure this will work, yes - it does in MS SQL Server anyway.
However, this index doesn't help you if you need to select on just the date, e.g. a date range. In that case, you might need to create a second index on just the date to make those queries more efficient.
Marc
I commonly use combined indexes to sort through data I wish to paginate or request "streamily".
Assuming a customer can make more than one order, that customers 0 through 11 exist, and that there are several orders per customer, all inserted in random order: I want to sort a query based on customer number followed by the date. You should also sort on the id field last, to split sets where a customer has several identical dates (even if that may never happen).
sqlite> CREATE INDEX customer_asc_date_asc_index_asc ON orders
(customer ASC, date ASC, id ASC);
Get page 1 of a sorted query (limited to 10 items):
sqlite> SELECT id, customer, date FROM orders
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2653|1|1303828585
2520|1|1303828713
2583|1|1303829785
1828|1|1303830446
1756|1|1303830540
1761|1|1303831506
2442|1|1303831705
2523|1|1303833761
2160|1|1303835195
2645|1|1303837524
Get the next page:
sqlite> SELECT id, customer, date FROM orders WHERE
(customer = 1 AND date = 1303837524 and id > 2645) OR
(customer = 1 AND date > 1303837524) OR
(customer > 1)
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2515|1|1303837914
2370|1|1303839573
1898|1|1303840317
1546|1|1303842312
1889|1|1303843243
2439|1|1303843699
2167|1|1303849376
1544|1|1303850494
2247|1|1303850869
2108|1|1303853285
And so on...
Having the indexes in place reduces server-side index scanning compared with using a query OFFSET coupled with a LIMIT: with OFFSET, the query time gets longer and the drives seek harder the higher the offset goes. Using this method eliminates that.
Using this method is advised if you plan on joining data later but only need a limited set of data per request. Join against a SUBSELECT as described above to reduce memory overhead for large tables.
