Why does this query timeout? V2 - asp.net

This question is a followup to This Question
The solution, clearing the execution plan cache seemed to work at the time, but i've been running into the same problem over and over again, and clearing the cache no longer seems to help. There must be a deeper problem here.
I've discovered that if I remove the .Distinct() from the query, it returns rows (with duplicates) in about 2 seconds. However, with the .Distinct() it takes upwards of 4 minutes to complete. There are a lot of rows in the tables, and some of the where clause fields do not have indexes. However, the number of records returned is fairly small (a few dozen at most).
The confusing part about it is that if I get the SQL generated by the Linq query, via Linqpad, then execute that code as SQL or in SQL Management Studio (including the DISTINCT) it executes in about 3 seconds.
What is the difference between the Linq query and the executed SQL?
I have a short term workaround, and that's to return the set without .Distinct() as a List, then using .Distinct on the list, this takes about 2 seconds. However, I don't like doing SQL Server work on the web server.
I want to understand WHY the Distinct is 2 orders of magnitude slower in Linq, but not SQL.
UPDATE:
When executing the code via Linq, the sql profiler shows this code, which is basically identical query.
sp_executesql N'SELECT DISTINCT [t5].[AccountGroupID], [t5].[AccountGroup]
AS [AccountGroup1]
FROM [dbo].[TransmittalDetail] AS [t0]
INNER JOIN [dbo].[TransmittalHeader] AS [t1] ON [t1].[TransmittalHeaderID] =
[t0].[TransmittalHeaderID]
INNER JOIN [dbo].[LineItem] AS [t2] ON [t2].[LineItemID] = [t0].[LineItemID]
LEFT OUTER JOIN [dbo].[AccountType] AS [t3] ON [t3].[AccountTypeID] =
[t2].[AccountTypeID]
LEFT OUTER JOIN [dbo].[AccountCategory] AS [t4] ON [t4].[AccountCategoryID] =
[t3].[AccountCategoryID]
LEFT OUTER JOIN [dbo].[AccountGroup] AS [t5] ON [t5].[AccountGroupID] =
[t4].[AccountGroupID]
LEFT OUTER JOIN [dbo].[AccountSummary] AS [t6] ON [t6].[AccountSummaryID] =
[t5].[AccountSummaryID]
WHERE ([t1].[TransmittalEntityID] = #p0) AND ([t1].[DateRangeBeginTimeID] = #p1) AND
([t1].[ScenarioID] = #p2) AND ([t6].[AccountSummaryID] = #p3)',N'#p0 int,#p1 int,
#p2 int,#p3 int',#p0=196,#p1=20100101,#p2=2,#p3=0
UPDATE:
The only difference between the queries is that Linq executes it with sp_executesql and SSMS does not, otherwise the query is identical.
UPDATE:
I have tried various Transaction Isolation levels to no avail. I've also set ARITHABORT to try to force a recompile when it executes, and no difference.

The bad plan is most likely the result of parameter sniffing: http://blogs.msdn.com/b/queryoptteam/archive/2006/03/31/565991.aspx
Unfortunately there is not really any good universal way (that I know of) to avoid that with L2S. context.ExecuteCommand("sp_recompile ...") would be an ugly but possible workaround if the query is not executed very frequently.
Changing the query around slightly to force a recompile might be another one.
Moving parts (or all) of the query into a view*, function*, or stored procedure* DB-side would be yet another workaround.
 * = where you can use local params (func/proc) or optimizer hints (all three) to force a 'good' plan
Btw, have you tried to update statistics for the tables involved? SQL Server's auto update statistics doesn't always do the job, so unless you have a scheduled job to do that it might be worth considering scripting and scheduling update statistics... ...tweaking up and down the sample size as needed can also help.
There may be ways to solve the issue by adding* (or dropping*) the right indexes on the tables involved, but without knowing the underlying db schema, table size, data distribution etc that is a bit difficult to give any more specific advice on...
 * = Missing and/or overlapping/redundant indexes can both lead to bad execution plans.

The SQL that Linqpad gives you may not be exactly what is being sent to the DB.
Here's what I would suggest:
Run SQL Profiler against the DB while you execute the query. Find the statement which corresponds to your query
Paste the whole statment into SSMS, and enable the "Show Actual Execution Plan" option.
Post the resulting plan here for people to dissect.
Key things to look for:
Table Scans, which usually imply that an index is missing
Wide arrows in the graphical plan, indicating lots of intermediary rows being processed.
If you're using SQL 2008, viewing the plan will often tell you if there are any indexes missing which should be added to speed up the query.
Also, are you executing against a DB which is under load from other users?

At first glance there's a lot of joins, but I can only see one thing to reduce the number right away w/out having the schema in front of me...it doesn't look like you need AccountSummary.
[t6].[AccountSummaryID] = #p3
could be
[t5].[AccountSummaryID] = #p3
Return values are from the [t5] table. [t6] is only used filter on that one parameter which looks like it is the Foreign Key from t5 to t6, so it is present in [t5]. Therefore, you can remove the join to [t6] altogether. Or am I missing something?

Are you sure you want to use LEFT OUTER JOIN here? This query looks like it should probably be using INNER JOINs, especially because you are taking the columns that are potentially NULL and then doing a distinct on it.

Check that you have the same Transaction Isolation level between your SSMS session and your application. That's the biggest culprit I've seen for large performance discrepancies between identical queries.
Also, there are different connection properties in use when you work through SSMS than when executing the query from your application or from LinqPad. Do some checks into the Connection properties of your SSMS connection and the connection from your application and you should see the differences. All other things being equal, that could be the difference. Keep in mind that you are executing the query through two different applications that can have two different configurations and could even be using two different database drivers. If the queries are the same then that would be only differences I can see.
On a side note if you are hand-crafting the SQL, you may try moving the conditions from the WHERE clause into the appropriate JOIN clauses. This actually changes how SQL Server executes the query and can produce a more efficient execution plan. I've seen cases where moving the filters from the WHERE clause into the JOINs caused SQL Server to filter the table earlier in the execution plan and significantly changed the execution time.

Related

Error with SQLite query, What am I missing?

I've been attempting to increase my knowledge and trying out some challenges. I've been going at this for a solid two weeks now finished most of the challenge but this one part remains. The error is shown below, what am i not understanding?
Error in sqlite query: update users set last_browser= 'mozilla' + select sql from sqlite_master'', last_time= '13-04-2019' where id = '14'
edited for clarity:
I'm trying a CTF challenge and I'm completely new to this kind of thing so I'm learning as I go. There is a login page with test credentials we can use for obtaining many of the flags. I have obtained most of the flags and this is the last one that remains.
After I login on the webapp with the provided test credentials, the following messages appear: this link
The question for the flag is "What value is hidden in the database table secret?"
So from the previous image, I have attempted to use sql injection to obtain value. This is done by using burp suite and attempting to inject through the user-agent.
I have gone through trying to use many variants of the injection attempt shown above. Im struggling to find out where I am going wrong, especially since the second single-quote is added automatically in the query. I've gone through the sqlite documentation and examples of sql injection, but I cannot sem to understand what I am doing wrong or how to get that to work.
A subquery such as select sql from sqlite_master should be enclosed in brackets.
So you'd want
update user set last_browser= 'mozilla' + (select sql from sqlite_master''), last_time= '13-04-2019' where id = '14';
Although I don't think that will achieve what you want, which isn't clear. A simple test results in :-
You may want a concatenation of the strings, so instead of + use ||. e.g.
update user set last_browser= 'mozilla' || (select sql from sqlite_master''), last_time= '13-04-2019' where id = '14';
In which case you'd get something like :-
Thanks for everyone's input, I've worked this out.
The sql query was set up like this:
update users set last_browser= '$user-agent', last_time= '$current_date' where id = '$id_of_user'
edited user-agent with burp suite to be:
Mozilla', last_browser=(select sql from sqlite_master where type='table' limit 0,1), last_time='13-04-2019
Iterated with that found all tables and columns and flags. Rather time consuming but could not find a way to optimise.

Simple.Data Default Generated Queries and Performance

I am thinking of using Simple.Data Micro-ORM for my ASP.NET 4.5 website. However, there is something that I need to know before deciding whether to use it or not.
Let's take the following Join query for example:
var albums = db.Albums.FindAllByGenreId(1)
.Select(
db.Albums.Title,
db.Albums.Genre.Name);
This query will be translated to:
select
[dbo].[Albums].[Title],
[dbo].[Genres].[Name]
from [dbo].[Albums]
LEFT JOIN [dbo].[Genres] ON ([dbo].[Genres].[GenreId] = [dbo].[Albums].[GenreId])
WHERE [dbo].[Albums].[GenreId] = #p1
#p1 (Int32) = 1
Let's assume that the 'Genres' table is a a table with thousands or even millions of rows. I think that it might be very inefficient to filter the data after the JOIN has taken place, which is what this query translated for in Simple.Date.
Would it be better to filter the data firs in the Generes table, which means create make a SELECT statement first and make the JOIN with that filtered table?
Wouldn't it be better to filter the data ahead of time?
Furthermore, is there an option to make that type of complex (JOIN on a filtered table) query using Simple.Data.
Need your answer to know if to proceed with Simple.Data, or damp it in favor of another micro-ORM.
You are confused about how SQL is interpreted and executed by the database engine. Modern databases are incredibly smart about the best way to execute queries, and the order in which instructions appear in SQL statements has nothing to do with the order in which they are executed.
Try running some queries through SQL Management Studio and looking at the Execution Plan to see how they are actually optimised and executed. Or just try the SQL you think would work better and see how it actually performs compared to what is generated by Simple.Data.
The sql that Simple.Data is generating is idomatic T-SQL, too be honest its what I would be writing if I was drafting the sql myself.
This sql allows Sql Server to optimise the execution plan which should mean the most efficient retrieval of data.
The beauty of Simple.Data is that if you have any doubts or issues with the sql it generates you can just call a stored proc:
db.ProcedureWithParameters(1, 2);

Poor SP performance from ASP.NET

I have a stored procedure that handles sorting, filtering and paging (using Row_Number) and some funky trickery :) The SP is running against a table with ~140k rows.
The whole thing works great and for at least the first few dozen pages is super quick.
However, if I try to navigate to higher pages (e.g. head to the last page of 10k) the whole thing comes to a grinding halt and results in a SQL timeout error.
If I run the same query, using the same parms inside studio manager query window, the response is instant irrespective of the page number I pass in.
At the moment it's test code that is simply binding to a ASP:Datagrid in .NET 3.5
The SP looks like this:
BEGIN
WITH Keys
AS (
SELECT
TOP (#PageNumber * #PageSize) ROW_NUMBER() OVER (ORDER BY JobNumber DESC) as rn
,P1.jobNumber
,P1.CustID
,P1.DateIn
,P1.DateDue
,P1.DateOut
FROM vw_Jobs_List P1
WHERE
(#CustomerID = 0 OR CustID = #CustomerID) AND
(JobNumber LIKE '%'+#FilterExpression+'%'
OR OrderNumber LIKE '%'+#FilterExpression+'%'
OR [Description] LIKE '%'+#FilterExpression+'%'
OR Client LIKE '%'+#FilterExpression+'%')
ORDER BY P1.JobNumber DESC ),SelectedKeys
AS (
SELECT
TOP (#PageSize)SK.rn
,SK.JobNumber
,SK.CustID
,SK.DateIn
,SK.DateDue
,SK.DateOut
FROM Keys SK
WHERE SK.rn > ((#PageNumber-1) * #PageSize)
ORDER BY SK.JobNumber DESC)
SELECT
SK.rn
,J.JobNumber
,J.Description
,J.Client
,SK.CustID
,OrderNumber
,CAST(DateAdd(d, -2, CAST(isnull(SK.DateIn,0) AS DateTime)) AS nvarchar) AS DateIn
,CAST(DateAdd(d, -2, CAST(isnull(SK.DateDue,0) AS DateTime)) AS nvarchar) AS DateDue
,CAST(DateAdd(d, -2, CAST(isnull(SK.DateOut,0) AS DateTime)) AS nvarchar) AS DateOut
,Del_Method
,Ticket#
,InvoiceEmailed
,InvoicePrinted
,InvoiceExported
,InvoiceComplete
,JobStatus
FROM SelectedKeys SK
JOIN vw_Jobs_List J ON j.JobNumber=SK.JobNumber
ORDER BY SK.JobNumber DESC
END
And it's called via
sp_jobs (PageNumber,PageSize,FilterExpression,OrderBy,CustomerID)
e.g.
sp_Jobs '13702','10','','JobNumberDESC','0'
Can anyone shed any light on what might be the cause of the dramatic difference in performance between SQL query window and an asp.net page executing a dataset?
Check out the "WITH RECOMPILE" option
http://www.techrepublic.com/article/understanding-sql-servers-with-recompile-option/5662581
I have run into similar problems where the execution plan on stored procedures will work great for a while, but then get a new plan because the options changed. So, it will be "optimized" for one case and then perform "table scans" for another option. Here is what I have tried in the past:
Re-execute the stored procedure to calculate a new execution plan and then keep an eye on it.
Break up the stored procedure into separate stored procedures of each option such that it can be optimized and then the overall stored procedure simply calls each "optimized" stored procedure.
Bring in the records into an object and then perform all of the "funky trickery" in code and then it gives you the option to "cache" the results.
Obviously option #2 and #3 is better than option #1. I am honestly finding option #3 is becoming the best bet in most cases.
I just had another option 4. You could instead of performing your "inner selects" in one query, you could put the results of your inner selects into temporary tables and then JOIN on those results. I would still push for option #3 if possible, but I understand that sometimes you just need to keep working the stored procedure until it "works".
Good luck.

The question about the basics of LINQ to SQL

I just started learning LINQ to SQL, and so far I'm impressed with the easy of use and good performance.
I used to think that when doing LINQ queries like
from Customer in DB.Customers where Customer.Age > 30 select Customer
LINQ gets all customers from the database ("SELECT * FROM Customers"), moves them to the Customers array and then makes a search in that Array using .NET methods. This is very inefficient, what if there are hundreds of thousands of customers in the database? Making such big SELECT queries would kill the web application.
Now after experiencing how actually fast LINQ to SQL is, I start to suspect that when doing that query I just wrote, LINQ somehow converts it to a SQL Query string
SELECT * FROM Customers WHERE Age > 30
And only when necessary it will run the query.
So my question is: am I right? And when is the query actually run?
The reason why I'm asking is not only because I want to understand how it works in order to build good optimized applications, but because I came across the following problem.
I have 2 tables, one of them is Books, the other has information on how many books were sold on certain days. My goal is to select books that had at least 50 sales/day in past 10 days. It's done with this simple query:
from Book in DB.Books where (from Sale in DB.Sales where Sale.SalesAmount >= 50 && Sale.DateOfSale >= DateTime.Now.AddDays(-10) select Sale.BookID).Contains(Book.ID) select Book
The point is, I have to use the checking part in several queries and I decided to create an array with IDs of all popular books:
var popularBooksIDs = from Sale in DB.Sales where Sale.SalesAmount >= 50 && Sale.DateOfSale >= DateTime.Now.AddDays(-10) select Sale.BookID;
BUT when I try to do the query now:
from Book in DB.Books where popularBooksIDs.Contains(Book.ID) select Book
It doesn't work! That's why I think that we can't use thins kinds of shortcuts in LINQ to SQL queries, like we can't use them in real SQL. We have to create straightforward queries, am I right?
You are correct. LINQ to SQL does create the actual SQL to retrieve your results.
As for your shortcuts, there are ways to work around the limitations:
var popularBooksIds = DB.Sales
.Where(s => s.SalesAmount >= 50
&& s.DateOfSale >= DateTime.Now.AddDays(-10))
.Select(s => s.Id)
.ToList();
// Actually should work.
// Forces the table into memory and then uses LINQ to Objects for the query
var popularBooksSelect = DB.Books
.ToList()
.Where(b => popularBooksIds.Contains(b.Id));
Yes, query gets translated to a SQL string, and the underlying SQL can be different depending on what you are trying to do... so you have to be careful in that regard. Checkout a tool called linqpad, you can try your query in it and see the executing SQL.
Also, it runs when iterating through the collection or calling a method on it like ToList().
Entity framework or linq queries can be tricky sometimes. Sometimes you are surprised at the efficiency of the sql query generated and sometimes the query is so complicated and inefficient that you would smack your forehead.
Best idea is that if you have any suspicions about a query, run an sql profiler at the backend that would monitor all the queries coming in. That way you know exactly what is being passed on to the sql server and correct any inefficiencies if need be.
http://damieng.com/blog/2008/07/30/linq-to-sql-log-to-debug-window-file-memory-or-multiple-writers
This will help you to see what and when queries are being run. Also, Damiens blog is full of other linq to sql goodness.
You can generate an EXISTS clause by using the .Any method. I have had more success that way than trying to generate IN clauses, because it likes to retrieve all the data and pass it all back in as parameters to a query
In linq to sql, IQueryable expression fragments can be combined to create a single query, it will try to keep everything as an IQueryable for as long as it can, before you do something that cannot be expressed in SQL. When you call ToList you are directly asking it to resolve that query into an IEnumerable stored in memory.
In most cases you are better off not selecting the book ids in advance. Keep the fragment for popular books in a single place in the code and use it when necessary, to build on another query. An IQueryable is just an expression tree, which is resolved into SQL at some other point.
If you think your application will perform better by storing the popular books elsewhere (memcache or whatever), then you may consider pulling them out before hand, and checking against that later. This will mean each book id will be passed in as a sproc parameter and used in an IN clause.

Fun with Database Triggers and Recursion in RDB

I had a problem this week (which thankfully I've solved in a much better way);
I needed to keep a couple of fields in a database constant.
So, I knocked up a script to place a Trigger on the table, that would set the value back to a preset number when either an insert, or update took place.
The database is RDB running on VMS (but i'd be interested to know the similarities for SQLServer).
Here are the triggers:
drop trigger my_ins_trig;
drop trigger my_upd_trig;
!
!++ Create triggers on MY_TABLE
CREATE TRIGGER my_ins_trig AFTER INSERT ON my_table
WHEN somefield = 2
(UPDATE my_table table1
SET table1.field1 = 0.1,
table1.field2 = 1.2
WHERE my_table.dbkey = table1.dbkey)
FOR EACH ROW;
CREATE TRIGGER my_upd_trig AFTER UPDATE ON my_table
WHEN somefield = 2
(UPDATE my_table table1
SET table1.field1 = 0.1,
table1.field2 = 1.2
WHERE my_table.dbkey = table1.dbkey)
FOR EACH ROW;
Question Time
I'd would expect this to form an infinite recursion - but it doesnt seem to?
Can anyone explain to me how RDB deals with this one way or another...or how other databases deal with it.
[NOTE: I know this is an awful approach but various problems and complexities meant that even though this is simple in the code - it couldn't be done the best/easiest way. Thankfully I haven't implemented it in this way but I wanted to ask the SO community for its thoughts on this. ]
Thanks in advance
edit: It seems Oracle RDB just plain doesnt execute nested triggers that result in recursion. From the paper: 'A trigger can nest other triggers as long as recursion does not take place.' I'll leave the rest of the answer here for anyone else wondering about recursive triggers in other DBs.
Well firstly to answer your question - it depends on the database. Its entirely possible that trigger recursion is turned off on the instance you are working on. As you can imagine, trigger recursion could cause all kinds of chaos if handled incorrectly so SQL Server allows you to disable it altogether.
Secondly, I would suggest that perhaps there is a better way to get this functionality without triggers. You can get view based row level security with SQL Server. The same outcome can be achieved with Oracle VPDs.
Alternatively, if its configuration values you are trying to protect, I would group them all into a single table and apply permissions on that (simpler than row based security).

Resources