Why does adding this line to a Select slow down SQL drastically? - oracle11g

I have a fairly complicated query that uses a bunch of tables. It usually takes ~17 seconds to return all records (though it is usually filtered). One of the fields it returns is CAT.OLD_CODE. When I replace that field with NVL(CAT.OLD_CODE, (CASE WHEN ISS.WHDEF_CODE = 'SCRAP' THEN 'SCRAP' ELSE 'QH/PH/CUST' END)), the query takes forever (I haven't let SQL Developer run it long enough to return even the first 50 rows). ISS.WHDEF_CODE is already in the select list, and both CAT and ISS are outer joined to the main body of the query. I'm using the command in a Crystal Reports report and currently doing the filtering there, but I'm curious why the CASE expression takes so long (using NVL alone has no appreciable impact on performance). I've noticed this before when using string operations in select lists, but this seems to be just a simple comparison.
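For readability, here is the substitution in question as it appears in the select list (CAT and ISS are the outer-joined tables mentioned above; the rest of the query is not shown in the post):

-- original select-list item:
CAT.OLD_CODE
-- replacement that causes the slowdown:
NVL(CAT.OLD_CODE,
    CASE WHEN ISS.WHDEF_CODE = 'SCRAP' THEN 'SCRAP'
         ELSE 'QH/PH/CUST'
    END)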

Related

U-SQL: use a rowset variable for decision making

I want to use a rowset variable as a scalar variable:
@cnt = SELECT COUNT(*) FROM @tab1;
IF (@cnt > 0) THEN
    @cnt1 = SELECT * FROM @tab2;
END;
Is it possible?
UPDATE:
I want to gate the complex U-SQL code on some condition, let's say based on some control table. In my original code I wrote 10-15 U-SQL statements, and I want to wrap them in the IF statement. I don't want to use a cross join because it again starts trying to join the tables; when I use a cross join, there is no significant saving in execution time. The point of the IF statement is that, if the condition is not met, the complete piece of code should not execute at all. Is it possible?
To add to wBob's and Alex's answers:
U-SQL does not provide data-driven control flow within a script. The current IF statement requires the expression to be evaluated at compile time.
Consider a U-SQL script as just a single declarative query. So you have the following options:
1. Express your problem with relational expressions. This means that you will have to write a (cross) join to guard the execution. If you feel that the query optimizer does a bad job at optimizing such guards (e.g., it evaluates the expensive side of the join before the cheap guard), please report an issue and we will take a look.
2. Split your script into several scripts and look at the result of each script before doing your next step. This is a form of orchestration that you can do with ADF, or by writing your own orchestration with PowerShell or any of the SDKs. The caveat here is that you will have to write intermediate results into files and download those files into your orchestration layer.
Having said this, it is theoretically possible to extend the language algebra with a "don't execute the remaining part of this operator tree if a condition is not satisfied" operator. However, that is a major work item and can lead to very large query plans during compilation that may go beyond the current limits. If you feel that neither 1 nor 2 above is sufficient for your scenario, please add your vote to https://feedback.azure.com/forums/327234-data-lake/suggestions/17635906-please-add-dynamic-if-evaluation-to-u-sql.
@cnt1 =
SELECT @tab2.*
FROM @tab2
CROSS JOIN (SELECT COUNT(*) AS cnt FROM @tab1) AS c
WHERE c.cnt > 0;
(Adding an explanation:) CROSS JOIN returns the Cartesian product of all rows from @tab2 with the single row generated by the COUNT query. The WHERE condition then ensures that the result of the query is all rows from @tab2 if COUNT(*) > 0, and no rows otherwise.

NHibernate Query slows other queries

I'm writing a program in which I use two database queries via NHibernate. The first query is a large one: a select with two joins (the big SELECT query) whose result is about 50000 records. The query takes about 30 secs. The next step in the program is iterating through these 50000 records and invoking a query on each of them. This second query is a pretty small COUNT query.
There are two interesting things, though:
If I run the small COUNT query before the big SELECT, the COUNT query takes about 10 ms, but if I run it after the big SELECT query it takes 8-9 seconds. Furthermore, if I reduce the complexity of the big SELECT query, I also reduce the execution time of the COUNT query that runs afterwards.
If I run the big SELECT query in SQL Server Management Studio it takes 1 sec, but from the ASP.NET application it takes 30 secs.
So there are two main questions: why is the query taking so long to execute from code when it's so fast in SSMS, and why is the big SELECT query affecting the small COUNT queries afterwards?
I know there are many possible answers to this problem, but I have googled a lot and this is what I have tried:
Setting the SET options of the ASP.NET application and SSMS so they are the same, to avoid different query plans
Clearing the SSMS cache, so the good SSMS result is not caused by caching - same 1-second result after the cache clear
The big SELECT query:
var subjects = Query
.FetchMany(x => x.Registrations)
.FetchMany(x => x.Aliases)
.Where(x => x.InvalidationDate == null)
.ToList();
The small COUNT query:
Query.Count(x => debtorIRNs.Contains(x.DebtorIRN.CodIRN) && x.CurrentAmount > 0 && !x.ArchivationDate.HasValue && x.InvalidationDate == null);
As it turned out, the above-mentioned FetchMany's were unavoidable for the program, so I couldn't just skip them. The first significant improvement I achieved was turning off the application's logging (as I mentioned, the above code is just a fragment); without logs the program ran roughly twice as fast, but it still took a considerable amount of time. So I decided to avoid NHibernate for this query and wrote a plain SQL query read through a data reader, which I then parsed into my objects. I was able to reduce the execution time from 2.5 days (50000 * 4 sec -> number of small queries * former execution time of one small query) to 8 minutes.
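For context, a minimal sketch of the kind of hand-written COUNT that replaced each NHibernate call; the table name Debt and the flattened column names are assumptions inferred from the LINQ expression above, not names from the original code:

-- Debt and DebtorIRN_CodIRN are hypothetical names
SELECT COUNT(*)
FROM Debt
WHERE DebtorIRN_CodIRN IN ('IRN1', 'IRN2')  -- the debtorIRNs list
  AND CurrentAmount > 0
  AND ArchivationDate IS NULL
  AND InvalidationDate IS NULL;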

Why are records added so slowly?

I'm reading records from SQL Server 2005 and writing the returned recordset to SQLite with the following piece of code.
My compiler is Lazarus 1.0.12; Qt1 is a SQLQuery and qryStkMas is a ZTable from Zeos DBO.
The operation is quite slow. The test timings are:
start time: 15:47:11
finish time: 16:19:04
record count: 19500
With a SQL Server / SQL Server CE pair, the same operation takes less than 2-3 minutes in a Delphi project.
How can I speed up this process?
Code:
Label2.Caption:=TimeToStr(Time);
if Dm.Qt1.Active then Dm.Qt1.Close;
Dm.Qt1.SQL.Clear;
Dm.Qt1.SQL.Add(' select ');
Dm.Qt1.SQL.Add(' st.sto_kod, st.sto_isim,st.sto_birim1_ad, ');
Dm.Qt1.SQL.Add(' st.sto_toptan_vergi,st.sto_perakende_vergi,');
Dm.Qt1.SQL.Add(' st.sto_max_stok,st.sto_min_stok, ');
Dm.Qt1.SQL.Add(' sba.bar_kodu, ');
Dm.Qt1.SQL.Add(' stf.sfiyat_fiyati ');
Dm.Qt1.SQL.Add(' from MikroDB_V14_DEKOR2011.dbo.STOKLAR st ');
Dm.Qt1.SQL.Add(' left JOIN MikroDB_V14_DEKOR2011.dbo.BARKOD_TANIMLARI sba on sba.bar_stokkodu=st.sto_kod ');
Dm.Qt1.SQL.Add(' left JOIN MikroDB_V14_DEKOR2011.dbo.STOK_SATIS_FIYAT_LISTELERI stf on stf.sfiyat_stokkod=st.sto_kod ');
Dm.Qt1.SQL.Add(' where LEFT(st.sto_kod,1)=''5'' --and stf.sfiyat_listesirano=1 ');
Dm.Qt1.Open;
Dm.qryStkMas.Open;
Dm.qrystkmas.First;
While not Dm.Qt1.EOF do
begin
Dm.qryStkMas.Append;
Dm.qryStkMas.FieldByName('StkKod').AsString :=Dm.Qt1.FieldByName('sto_kod').AsString;
Dm.qryStkMas.FieldByName('StkAd').AsString :=Dm.Qt1.FieldByName('sto_isim').AsString;
Dm.qryStkMas.FieldByName('StkBrm').AsString :=Dm.Qt1.FieldByName('sto_birim1_ad').AsString;
Dm.qryStkMas.FieldByName('StkBar').AsString :=Dm.Qt1.FieldByName('bar_kodu').AsString;
Dm.qryStkMas.FieldByName('StkKdv1').AsFloat :=Dm.Qt1.FieldByName('sto_toptan_vergi').AsFloat;
Dm.qryStkMas.FieldByName('StkKdv2').AsFloat :=Dm.Qt1.FieldByName('sto_perakende_vergi').AsFloat;
Dm.qryStkMas.FieldByName('StkGir').AsFloat :=0;
Dm.qryStkMas.FieldByName('StkCik').AsFloat :=0;
Dm.qryStkMas.FieldByName('YeniStk').AsBoolean :=False;
Dm.qryStkMas.FieldByName('MinStk').AsFloat :=Dm.Qt1.FieldByName('sto_min_stok').AsFloat;
Dm.qryStkMas.FieldByName('MaxStk').AsFloat :=Dm.Qt1.FieldByName('sto_max_stok').AsFloat;
Dm.qryStkMas.FieldByName('StkGrp1').AsString:='';
Dm.qryStkMas.FieldByName('StkGrp2').AsString:='';
Dm.qryStkMas.FieldByName('StkGrp3').AsString:='';
Dm.qryStkMas.FieldByName('StkGrp4').AsString:='';
Dm.qryStkMas.FieldByName('StkFytno').AsInteger:=1;
Label1.Caption:=Dm.Qt1.FieldByName('sto_isim').AsString;
Dm.qryStkMas.Post;
Dm.Qt1.Next;
end;
Dm.qryStkMas.Close;
label3.Caption:=timetostr(time);
The first step in speeding things up is diagnosis.
MEASURING
You can measure by splitting the select and the insert up, obviously, but you can also get some diagnostics out of SQL itself.
If you prefix your query with the keyword EXPLAIN in SQLite, it will tell you which indexes are used and how the statement is handled internally; see here: http://www.sqlite.org/eqp.html
This is invaluable info for optimizing.
In MS SQL Server you go into the GUI, put in the query and click the estimated query plan button; see: what is the equivalent of EXPLAIN from SQLite in SQL Server?
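A minimal sketch of the SQLite side (the table name StkMas is an assumption based on the qryStkMas component; the real name is not shown in the post):

-- SQLite: prefix the statement to see index usage instead of running it
EXPLAIN QUERY PLAN
SELECT StkKod, StkAd FROM StkMas WHERE StkKod LIKE '5%';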
What's taking the most time? Is the select slow, or is the insert?
SELECT
Selects are usually sped up by putting indexes on the fields that are evaluated.
In your case, that means the fields involved in the join criteria.
The field in the where clause is wrapped in a function, and you cannot put an index on a function in MSSQL (you can in PostgreSQL and Oracle).
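A sketch of the indexes this suggests on the SQL Server side, using the join columns from the posted query (check whether the Mikro database already has them first):

CREATE INDEX IX_bar_stokkodu
    ON MikroDB_V14_DEKOR2011.dbo.BARKOD_TANIMLARI (bar_stokkodu);
CREATE INDEX IX_sfiyat_stokkod
    ON MikroDB_V14_DEKOR2011.dbo.STOK_SATIS_FIYAT_LISTELERI (sfiyat_stokkod);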
INSERT
Inserts are sped up by disabling indexes.
One common trick is to disable all indexing prior to the insert batch and re-enable it after the insert batch is done.
This is usually more efficient because it's faster (per item) to sort the whole set in one go than to keep re-sorting after each individual insert.
You can also disable transaction safeguards.
This will corrupt your data in case of a power/disk/etc. failure, so consider yourself warned; see here: Improve large data import performance into SQLite with C#
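On the SQLite side, a sketch of what that looks like (these PRAGMAs trade crash safety for speed, per the warning above; the index name is hypothetical):

PRAGMA synchronous = OFF;     -- no fsync per commit: fast, unsafe on power loss
PRAGMA journal_mode = MEMORY; -- keep the rollback journal in RAM
DROP INDEX IF EXISTS IX_StkMas_StkKod;            -- hypothetical index name
-- ... run the bulk insert here ...
CREATE INDEX IX_StkMas_StkKod ON StkMas (StkKod); -- rebuild once at the end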
Comments on the code
You select data using a SQL select statement, but you insert using the dataset's Append and FieldByName() methods. FieldByName is notoriously slow because it does a name lookup every time.
FieldByName should never be used in a loop.
Construct an insert SQL statement instead.
Remember that you can use parameters, or even hard-code the values in there.
Do some experimentation to see which is faster.
About.com has a nice article on how to speed up database access by eliminating FieldByName: http://delphi.about.com/od/database/ss/faster-fieldbyname-delphi-database.htm
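As a sketch, assuming the SQLite target table behind qryStkMas is named StkMas (the real name is not shown in the post), the loop body becomes one prepared statement with parameters bound per row instead of sixteen FieldByName lookups:

INSERT INTO StkMas
    (StkKod, StkAd, StkBrm, StkBar, StkKdv1, StkKdv2,
     StkGir, StkCik, YeniStk, MinStk, MaxStk,
     StkGrp1, StkGrp2, StkGrp3, StkGrp4, StkFytno)
VALUES
    (:StkKod, :StkAd, :StkBrm, :StkBar, :StkKdv1, :StkKdv2,
     0, 0, 0, :MinStk, :MaxStk, '', '', '', '', 1);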
Did you try wrapping your insertions in a transaction? You would need to BEGIN a transaction before your While... and COMMIT it after the ...End. Try it, it might help.
Edit: If you get an improvement, that would be because your database connection to SQLite is set up in "autocommit" mode, where every operation (such as your .Append) is done independently of all the others, and SQLite is smart enough to ensure the ACID properties of the database. This means that for every write operation you make, the database issues one or more writes to your hard drive, which is slow. By explicitly creating a transaction (which turns off autocommit), you group write operations into a transaction, and the database can issue a much smaller number of writes to the hard drive when you explicitly commit the transaction.
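In SQL terms, the suggestion is simply:

BEGIN TRANSACTION;
-- ... all the INSERTs from the While loop ...
COMMIT;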

How to optimize an SQL query to make it faster

I have a very simple, small database; two of the tables are:
Node (Node_ID, Node_name, Node_Date): Node_ID is the primary key
Citation (Origin_Id, Target_Id): PRIMARY KEY (Origin_Id, Target_Id), each a foreign key into Node
Now I write a query that first finds all citations whose Origin_Id has a specific date, and then I want to know the target dates of those records.
I'm using SQLite in Python; the Node table has 3000 records and Citation has 9000 records,
and my query looks like this, in a function:
def cited_years_list(self, date):
    c = self.cur
    cited_years = []  # return an empty list if the query fails
    try:
        # bind the year as a parameter (?) instead of str.format:
        # this lets SQLite reuse the prepared statement across calls
        c.execute("""SELECT n.Node_Date, COUNT(*)
                     FROM Node n
                     INNER JOIN (SELECT c.Origin_Id AS Origin_Id,
                                        c.Target_Id AS Target_Id,
                                        n.Node_Date AS Date
                                 FROM Citation c
                                 INNER JOIN Node n ON c.Origin_Id = n.Node_Id
                                 WHERE CAST(n.Node_Date AS INT) = ?) VW
                         ON VW.Target_Id = n.Node_Id
                     GROUP BY n.Node_Date;""", (date,))
        cited_years = c.fetchall()
        self.conn.commit()
        print('Cited years are:\n', str(cited_years))
    except Exception as e:
        print('Cited years retrieval failed:', e)
    return cited_years
Then I call this function for some specific years, but it's crazy slow (around 1 minute for a specific year).
Although the query works fine, it is slow. Would you please give me a suggestion to make it faster? I'd appreciate any idea about optimizing this query :)
I should also mention that I have indices on Origin_Id and Target_Id, so the inner join should be pretty fast - but it's not!
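For reference, the nested select flattens into two plain joins against Node, which makes it easier to see what the query does (an equivalent sketch; note that CAST(Node_Date AS INT) must be evaluated per row, so an index on Node_Date cannot be used for that filter):

SELECT tgt.Node_Date, COUNT(*)
FROM Citation c
INNER JOIN Node org ON c.Origin_Id = org.Node_Id
INNER JOIN Node tgt ON c.Target_Id = tgt.Node_Id
WHERE CAST(org.Node_Date AS INT) = ?
GROUP BY tgt.Node_Date;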
If this script runs over a period of time, you may consider loading the database into memory. Since you seem to be coding in Python, there is a connection method, connection.backup, that can back up an entire database into memory. Since memory is much faster than disk, this should increase speed. Of course, this doesn't do anything to optimize the statement itself, since I don't have enough of the code to evaluate what it is you are doing with it.
Instead of COUNT(*), use MAX(n.Node_Date).
SQLite doesn't keep a counter of the number of rows per table the way MySQL does; instead it scans all your rows every time you call COUNT, which is extremely slow - but you can use MAX() to work around that problem.

Why does this query timeout? V2

This question is a follow-up to This Question
The solution, clearing the execution plan cache, seemed to work at the time, but I've been running into the same problem over and over again, and clearing the cache no longer seems to help. There must be a deeper problem here.
I've discovered that if I remove the .Distinct() from the query, it returns rows (with duplicates) in about 2 seconds. However, with the .Distinct() it takes upwards of 4 minutes to complete. There are a lot of rows in the tables, and some of the where-clause fields do not have indexes. However, the number of records returned is fairly small (a few dozen at most).
The confusing part is that if I take the SQL generated by the LINQ query, via LINQPad, and execute that code as SQL in SQL Server Management Studio (including the DISTINCT), it executes in about 3 seconds.
What is the difference between the LINQ query and the executed SQL?
I have a short-term workaround, and that's to return the set without .Distinct() as a List, then use .Distinct() on the list; this takes about 2 seconds. However, I don't like doing SQL Server's work on the web server.
I want to understand WHY the Distinct is 2 orders of magnitude slower in LINQ but not in SQL.
UPDATE:
When executing the code via LINQ, SQL Profiler shows this code, which is basically the identical query.
sp_executesql N'SELECT DISTINCT [t5].[AccountGroupID], [t5].[AccountGroup]
AS [AccountGroup1]
FROM [dbo].[TransmittalDetail] AS [t0]
INNER JOIN [dbo].[TransmittalHeader] AS [t1] ON [t1].[TransmittalHeaderID] =
[t0].[TransmittalHeaderID]
INNER JOIN [dbo].[LineItem] AS [t2] ON [t2].[LineItemID] = [t0].[LineItemID]
LEFT OUTER JOIN [dbo].[AccountType] AS [t3] ON [t3].[AccountTypeID] =
[t2].[AccountTypeID]
LEFT OUTER JOIN [dbo].[AccountCategory] AS [t4] ON [t4].[AccountCategoryID] =
[t3].[AccountCategoryID]
LEFT OUTER JOIN [dbo].[AccountGroup] AS [t5] ON [t5].[AccountGroupID] =
[t4].[AccountGroupID]
LEFT OUTER JOIN [dbo].[AccountSummary] AS [t6] ON [t6].[AccountSummaryID] =
[t5].[AccountSummaryID]
WHERE ([t1].[TransmittalEntityID] = @p0) AND ([t1].[DateRangeBeginTimeID] = @p1) AND
([t1].[ScenarioID] = @p2) AND ([t6].[AccountSummaryID] = @p3)',N'@p0 int,@p1 int,
@p2 int,@p3 int',@p0=196,@p1=20100101,@p2=2,@p3=0
UPDATE:
The only difference between the queries is that LINQ executes it with sp_executesql and SSMS does not; otherwise the query is identical.
UPDATE:
I have tried various transaction isolation levels to no avail. I've also set ARITHABORT to try to force a recompile when it executes, and it made no difference.
The bad plan is most likely the result of parameter sniffing: http://blogs.msdn.com/b/queryoptteam/archive/2006/03/31/565991.aspx
Unfortunately there is not really any good universal way (that I know of) to avoid that with L2S. context.ExecuteCommand("sp_recompile ...") would be an ugly but possible workaround if the query is not executed very frequently.
Changing the query around slightly to force a recompile might be another one.
Moving parts (or all) of the query into a view*, function*, or stored procedure* DB-side would be yet another workaround.
 * = where you can use local params (func/proc) or optimizer hints (all three) to force a 'good' plan
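For example, the DB-side command behind the ExecuteCommand workaround above (the table name is taken from the posted query):

-- marks cached plans that reference this table for recompilation on next use
EXEC sp_recompile N'dbo.TransmittalDetail';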
Btw, have you tried updating statistics for the tables involved? SQL Server's auto-update of statistics doesn't always do the job, so unless you have a scheduled job to do that, it might be worth scripting and scheduling UPDATE STATISTICS... tweaking the sample size up and down as needed can also help.
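A sketch of that, using tables from the posted query (FULLSCAN is the most thorough sample size, at the cost of a longer scan):

UPDATE STATISTICS dbo.TransmittalHeader WITH FULLSCAN;
UPDATE STATISTICS dbo.TransmittalDetail WITH FULLSCAN;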
There may be ways to solve the issue by adding* (or dropping*) the right indexes on the tables involved, but without knowing the underlying db schema, table sizes, data distribution, etc., it is a bit difficult to give more specific advice...
 * = Missing and/or overlapping/redundant indexes can both lead to bad execution plans.
The SQL that LINQPad gives you may not be exactly what is being sent to the DB.
Here's what I would suggest:
Run SQL Profiler against the DB while you execute the query, and find the statement which corresponds to your query.
Paste the whole statement into SSMS, and enable the "Show Actual Execution Plan" option.
Post the resulting plan here for people to dissect.
Key things to look for:
Table Scans, which usually imply that an index is missing
Wide arrows in the graphical plan, indicating lots of intermediate rows being processed.
If you're using SQL 2008, viewing the plan will often tell you if there are any indexes missing which should be added to speed up the query.
Also, are you executing against a DB which is under load from other users?
At first glance there are a lot of joins, but I can only see one thing that would reduce the number right away without having the schema in front of me: it doesn't look like you need AccountSummary.
[t6].[AccountSummaryID] = @p3
could be
[t5].[AccountSummaryID] = @p3
The return values come from the [t5] table. [t6] is only used to filter on that one parameter, which looks like the foreign key from [t5] to [t6] and is therefore already present in [t5]. So you can remove the join to [t6] altogether. Or am I missing something?
Are you sure you want to use LEFT OUTER JOIN here? This query looks like it should probably be using INNER JOINs, especially because you are taking columns that are potentially NULL and then doing a DISTINCT on them.
Check that you have the same Transaction Isolation level between your SSMS session and your application. That's the biggest culprit I've seen for large performance discrepancies between identical queries.
Also, there are different connection properties in use when you work through SSMS than when executing the query from your application or from LINQPad. Check the connection properties of your SSMS connection and of the connection from your application, and you should see the differences. All other things being equal, that could be the difference. Keep in mind that you are executing the query through two different applications that can have two different configurations and could even be using two different database drivers. If the queries are the same, those would be the only differences I can see.
On a side note, if you are hand-crafting the SQL, you may try moving the conditions from the WHERE clause into the appropriate JOIN clauses. This actually changes how SQL Server executes the query and can produce a more efficient execution plan. I've seen cases where moving the filters from the WHERE clause into the JOINs caused SQL Server to filter the table earlier in the execution plan and significantly changed the execution time.
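A sketch of that rewrite on one join from the posted query (for an INNER JOIN the results are unchanged, though the plan may differ; for OUTER JOINs, moving a predicate changes the semantics, so test carefully):

-- before: ... WHERE ([t1].[TransmittalEntityID] = @p0) AND ...
-- after: the filter is folded into the join condition
INNER JOIN [dbo].[TransmittalHeader] AS [t1]
    ON [t1].[TransmittalHeaderID] = [t0].[TransmittalHeaderID]
   AND [t1].[TransmittalEntityID] = @p0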
