I am thinking of using Simple.Data Micro-ORM for my ASP.NET 4.5 website. However, there is something that I need to know before deciding whether to use it or not.
Let's take the following Join query for example:
var albums = db.Albums.FindAllByGenreId(1)
    .Select(
        db.Albums.Title,
        db.Albums.Genre.Name);
This query will be translated to:
select
[dbo].[Albums].[Title],
[dbo].[Genres].[Name]
from [dbo].[Albums]
LEFT JOIN [dbo].[Genres] ON ([dbo].[Genres].[GenreId] = [dbo].[Albums].[GenreId])
WHERE [dbo].[Albums].[GenreId] = @p1
@p1 (Int32) = 1
Let's assume that the 'Genres' table is a table with thousands or even millions of rows. I think it might be very inefficient to filter the data after the JOIN has taken place, which is what the query generated by Simple.Data appears to do.
Would it be better to filter the data first in the Genres table, that is, run a SELECT on it first and make the JOIN against that filtered result? In other words, wouldn't it be better to filter the data ahead of time?
Furthermore, is there an option to make that type of complex query (a JOIN on a filtered table) using Simple.Data?
I need your answer to decide whether to proceed with Simple.Data or dump it in favor of another micro-ORM.
You are confused about how SQL is interpreted and executed by the database engine. Modern databases are incredibly smart about the best way to execute queries, and the order in which instructions appear in SQL statements has nothing to do with the order in which they are executed.
Try running some queries through SQL Management Studio and looking at the Execution Plan to see how they are actually optimised and executed. Or just try the SQL you think would work better and see how it actually performs compared to what is generated by Simple.Data.
The SQL that Simple.Data is generating is idiomatic T-SQL; to be honest, it's what I would be writing if I were drafting the SQL myself.
This SQL allows SQL Server to optimise the execution plan, which should mean the most efficient retrieval of data.
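For what it's worth, here is the pre-filtered variant the question asks about, written out as a query string you could test side by side with the generated version (a hand-written sketch; the aliases are mine, and the table and column names come from the generated SQL above):
// Hand-written "filter Genres first" variant of the generated query above.
// Run both in SSMS and compare the actual execution plans; they should come
// out the same, because the optimiser pushes the GenreId predicate down anyway.
const string preFiltered = @"
    SELECT a.[Title], g.[Name]
    FROM [dbo].[Albums] AS a
    LEFT JOIN (SELECT [GenreId], [Name]
               FROM [dbo].[Genres]
               WHERE [GenreId] = @p1) AS g
        ON g.[GenreId] = a.[GenreId]
    WHERE a.[GenreId] = @p1;";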
The beauty of Simple.Data is that if you have any doubts or issues with the sql it generates you can just call a stored proc:
db.ProcedureWithParameters(1, 2);
In AX programming best practice, which is the best way:
using a Query created from the AOT,
using a select statement with X++ code,
using a query created with X++ code and the Query classes ...
And when to use each one of them?
First off, AX always uses queries internally; X++ selects are translated to query construction calls, which are executed at run time. The query is translated to SQL at runtime on the first queryRun.next() or datasource.executeQuery(). There is thus no performance difference between using one or the other.
Forms also use queries; most often the query is constructed automatically for you, because the AutoQuery property defaults to Yes. You can use an X++ select in the executeQuery method, but I would consider that bad practice, as the user will have no filter or sorting options available. Always use queries in forms, and prefer the auto queries. Add ranges or sorting in the init method using this.queryBuildDatasource() if needed. The exception is list pages, which always use an AOT query.
In RunBase classes, prefer to use queries, as the user will have the option to change the query. You may of course use a simple X++ select in the inner loop, but consider including it in the prebuilt query, if possible.
Otherwise, your primary goal as a programmer (besides solving the problem) is to minimize the number of code lines.
Queries defined in the AOT start out with zero code lines, which counts in their favor. Thus, if there are several statically defined ranges, links, or complex joins, use AOT queries. You cannot beat:
QueryRun qr = new QueryRun(queryStr(MyQuery));
qr.query().dataSourceTable(tableNum(MyTable)).findRange(fieldNum(MyTable,MyField)).value('myValue');
With:
Query q = new Query();
QueryRun qr = new QueryRun(q);
QueryBuildDataSource ds = q.addDataSource(tableNum(MyTable));
QueryBuildRange qbr = ds.addRange(fieldNum(MyTable,MyField));
qbr.value('myValue');
qbr.locked(true);
Thus in the static case prefer to use AOT queries, then change the query at runtime if needed. On the flip side, if your table is only known at runtime, you cannot use AOT queries, nor X++ selects, and you will need to build your query at runtime. The table browser is a good example of that.
What is left for X++?
Simple selects with small where clauses and with simple or no joins.
Cases where you cannot use queries (yet): delete_from, update_recordset, and insert_recordset come to mind.
Avoiding external dependencies (like AOT queries) may sometimes be more important.
Code readability of X++ queries is better than query construction.
The main techniques for selecting records in the database are as follows:
The select statement
A query
The techniques are essentially the same. They both deliver a set of records from the database in a table variable that can be accessed.
Use the select statement when:
The selection criteria are complex.
You are selecting a set of records from X++. The user is not going to change the selection criteria.
Use a query when:
The user can choose which records are selected.
The user can change the range of records to be selected.
The selection criteria are not more complex than the query can accommodate.
When you use queries, develop them in the Application Object Tree (AOT) or build them from scratch in your code. In both situations, the query can be modified in the code. A query built from scratch in code can be saved so that it occurs in the AOT. However, this should usually be avoided.
Build a query in the AOT when:
A specific query definition is being used in many places. (The query in the AOT can be reused.)
The query must contain code.
The query definition is more complex. The AOT provides a visual representation of the query.
This question is a follow-up to this question.
The solution, clearing the execution plan cache, seemed to work at the time, but I've been running into the same problem over and over again, and clearing the cache no longer seems to help. There must be a deeper problem here.
I've discovered that if I remove the .Distinct() from the query, it returns rows (with duplicates) in about 2 seconds. However, with the .Distinct() it takes upwards of 4 minutes to complete. There are a lot of rows in the tables, and some of the where clause fields do not have indexes. However, the number of records returned is fairly small (a few dozen at most).
The confusing part about it is that if I take the SQL generated by the LINQ query via LINQPad and execute that code as SQL or in SQL Management Studio (including the DISTINCT), it executes in about 3 seconds.
What is the difference between the Linq query and the executed SQL?
I have a short-term workaround: return the set without .Distinct() as a List, then use .Distinct() on the list; this takes about 2 seconds. However, I don't like doing SQL Server's work on the web server.
I want to understand WHY the Distinct is two orders of magnitude slower in LINQ, but not in SQL.
UPDATE:
When executing the code via LINQ, SQL Profiler shows this, which is basically the identical query:
sp_executesql N'SELECT DISTINCT [t5].[AccountGroupID], [t5].[AccountGroup] AS [AccountGroup1]
FROM [dbo].[TransmittalDetail] AS [t0]
INNER JOIN [dbo].[TransmittalHeader] AS [t1] ON [t1].[TransmittalHeaderID] = [t0].[TransmittalHeaderID]
INNER JOIN [dbo].[LineItem] AS [t2] ON [t2].[LineItemID] = [t0].[LineItemID]
LEFT OUTER JOIN [dbo].[AccountType] AS [t3] ON [t3].[AccountTypeID] = [t2].[AccountTypeID]
LEFT OUTER JOIN [dbo].[AccountCategory] AS [t4] ON [t4].[AccountCategoryID] = [t3].[AccountCategoryID]
LEFT OUTER JOIN [dbo].[AccountGroup] AS [t5] ON [t5].[AccountGroupID] = [t4].[AccountGroupID]
LEFT OUTER JOIN [dbo].[AccountSummary] AS [t6] ON [t6].[AccountSummaryID] = [t5].[AccountSummaryID]
WHERE ([t1].[TransmittalEntityID] = @p0) AND ([t1].[DateRangeBeginTimeID] = @p1) AND ([t1].[ScenarioID] = @p2) AND ([t6].[AccountSummaryID] = @p3)',
N'@p0 int,@p1 int,@p2 int,@p3 int',@p0=196,@p1=20100101,@p2=2,@p3=0
UPDATE:
The only difference between the queries is that LINQ executes it with sp_executesql and SSMS does not; otherwise the query is identical.
UPDATE:
I have tried various transaction isolation levels to no avail. I've also set ARITHABORT to try to force a recompile when it executes, and it made no difference.
The bad plan is most likely the result of parameter sniffing: http://blogs.msdn.com/b/queryoptteam/archive/2006/03/31/565991.aspx
Unfortunately there is not really any good universal way (that I know of) to avoid that with L2S. context.ExecuteCommand("sp_recompile ...") would be an ugly but possible workaround if the query is not executed very frequently.
Changing the query around slightly to force a recompile might be another one.
Moving parts (or all) of the query into a view*, function*, or stored procedure* DB-side would be yet another workaround.
* = where you can use local params (func/proc) or optimizer hints (all three) to force a 'good' plan
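As a concrete sketch of the sp_recompile workaround mentioned above, assuming a LINQ to SQL DataContext (MyDataContext is a placeholder name; the table name is taken from the profiled query in the question):
using (var context = new MyDataContext())
{
    // Marks all cached plans that reference the table for recompilation
    // the next time they are executed.
    context.ExecuteCommand("EXEC sp_recompile 'dbo.TransmittalHeader'");

    // ...then run the L2S query that was getting the bad plan...
}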
By the way, have you tried updating statistics for the tables involved? SQL Server's auto-update of statistics doesn't always do the job, so unless you have a scheduled job for it, it might be worth scripting and scheduling UPDATE STATISTICS. Tweaking the sample size up and down as needed can also help.
There may be ways to solve the issue by adding* (or dropping*) the right indexes on the tables involved, but without knowing the underlying DB schema, table sizes, data distribution, etc., it is difficult to give more specific advice...
* = Missing and/or overlapping/redundant indexes can both lead to bad execution plans.
The SQL that LINQPad gives you may not be exactly what is being sent to the DB.
Here's what I would suggest:
Run SQL Profiler against the DB while you execute the query, and find the statement which corresponds to your query.
Paste the whole statement into SSMS, and enable the "Show Actual Execution Plan" option.
Post the resulting plan here for people to dissect.
Key things to look for:
Table Scans, which usually imply that an index is missing
Wide arrows in the graphical plan, indicating lots of intermediary rows being processed.
If you're using SQL 2008, viewing the plan will often tell you if there are any indexes missing which should be added to speed up the query.
Also, are you executing against a DB which is under load from other users?
At first glance there are a lot of joins, but I can only see one thing to reduce the number right away without having the schema in front of me: it doesn't look like you need AccountSummary.
[t6].[AccountSummaryID] = @p3
could be
[t5].[AccountSummaryID] = @p3
The return values come from the [t5] table. [t6] is only used to filter on that one parameter, which looks like the foreign key from [t5] to [t6], so it is present in [t5]. Therefore, you can remove the join to [t6] altogether. Or am I missing something?
Are you sure you want to use LEFT OUTER JOIN here? This query looks like it should probably be using INNER JOINs, especially because you are taking columns that are potentially NULL and then doing a DISTINCT on them.
Check that you have the same Transaction Isolation level between your SSMS session and your application. That's the biggest culprit I've seen for large performance discrepancies between identical queries.
Also, there are different connection properties in use when you work through SSMS than when executing the query from your application or from LINQPad. Compare the connection properties of your SSMS session with those of your application's connection and you should see the differences. All other things being equal, that could be the cause. Keep in mind that you are executing the query through two different applications that can have two different configurations and could even be using two different database drivers. If the queries are the same, those would be the only differences I can see.
On a side note if you are hand-crafting the SQL, you may try moving the conditions from the WHERE clause into the appropriate JOIN clauses. This actually changes how SQL Server executes the query and can produce a more efficient execution plan. I've seen cases where moving the filters from the WHERE clause into the JOINs caused SQL Server to filter the table earlier in the execution plan and significantly changed the execution time.
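For illustration, here are the two shapes side by side as query strings (the table and column names are invented for the sketch). Note that for INNER JOINs the two forms return the same rows, while for OUTER JOINs moving a predicate into the ON clause changes the results as well as the plan, so test carefully:
// Filter applied in the WHERE clause.
const string filterInWhere = @"
    SELECT h.HeaderID
    FROM dbo.Header AS h
    INNER JOIN dbo.Detail AS d ON d.HeaderID = h.HeaderID
    WHERE d.ScenarioID = @p0;";

// The same filter moved into the JOIN condition; SQL Server may now filter
// dbo.Detail earlier in the execution plan.
const string filterInJoin = @"
    SELECT h.HeaderID
    FROM dbo.Header AS h
    INNER JOIN dbo.Detail AS d
        ON d.HeaderID = h.HeaderID
       AND d.ScenarioID = @p0;";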
I just started learning LINQ to SQL, and so far I'm impressed with the ease of use and good performance.
I used to think that when doing LINQ queries like
from Customer in DB.Customers where Customer.Age > 30 select Customer
LINQ gets all customers from the database ("SELECT * FROM Customers"), moves them into a Customers array, and then searches that array using .NET methods. That would be very inefficient: what if there are hundreds of thousands of customers in the database? Making such big SELECT queries would kill the web application.
Now, after experiencing how fast LINQ to SQL actually is, I have started to suspect that when executing the query I just wrote, LINQ somehow converts it to a SQL query string:
SELECT * FROM Customers WHERE Age > 30
and only runs the query when necessary.
So my question is: am I right? And when is the query actually run?
The reason why I'm asking is not only because I want to understand how it works in order to build good optimized applications, but because I came across the following problem.
I have 2 tables: one of them is Books, and the other has information on how many books were sold on certain days. My goal is to select books that had at least 50 sales/day in the past 10 days. It's done with this simple query:
from Book in DB.Books where (from Sale in DB.Sales where Sale.SalesAmount >= 50 && Sale.DateOfSale >= DateTime.Now.AddDays(-10) select Sale.BookID).Contains(Book.ID) select Book
The point is, I have to use the checking part in several queries, so I decided to create an array with the IDs of all popular books:
var popularBooksIDs = from Sale in DB.Sales where Sale.SalesAmount >= 50 && Sale.DateOfSale >= DateTime.Now.AddDays(-10) select Sale.BookID;
BUT when I try to do the query now:
from Book in DB.Books where popularBooksIDs.Contains(Book.ID) select Book
It doesn't work! That's why I think we can't use these kinds of shortcuts in LINQ to SQL queries, just as we can't use them in real SQL. We have to create straightforward queries, am I right?
You are correct. LINQ to SQL does create the actual SQL to retrieve your results.
As for your shortcuts, there are ways to work around the limitations:
var popularBooksIds = DB.Sales
    .Where(s => s.SalesAmount >= 50
                && s.DateOfSale >= DateTime.Now.AddDays(-10))
    .Select(s => s.BookID)
    .ToList();

// Actually should work:
// forces the Books table into memory and then uses LINQ to Objects for the query.
var popularBooksSelect = DB.Books
    .ToList()
    .Where(b => popularBooksIds.Contains(b.ID));
Yes, the query gets translated to a SQL string, and the underlying SQL can be different depending on what you are trying to do, so you have to be careful in that regard. Check out a tool called LINQPad; you can try your query in it and see the executing SQL.
Also, the query runs when you iterate through the collection or call a method on it like ToList().
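A small sketch of that deferred execution, reusing the Customers example from the question (the Name property is assumed for illustration):
var over30 = from c in DB.Customers
             where c.Age > 30
             select c;           // no SQL sent yet; this is just a query definition

var list = over30.ToList();      // the SQL is generated and executed here

foreach (var c in over30)        // enumerating the query also triggers execution
{
    Console.WriteLine(c.Name);   // Name is an assumed column for the example
}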
Entity Framework and LINQ queries can be tricky sometimes. Sometimes you are surprised at the efficiency of the generated SQL query, and sometimes the query is so complicated and inefficient that you would smack your forehead.
The best approach is, if you have any suspicions about a query, to run SQL Profiler at the back end to monitor all the queries coming in. That way you know exactly what is being passed on to the SQL server and can correct any inefficiencies if need be.
http://damieng.com/blog/2008/07/30/linq-to-sql-log-to-debug-window-file-memory-or-multiple-writers
This will help you to see what queries are being run and when. Also, Damien's blog is full of other LINQ to SQL goodness.
You can generate an EXISTS clause by using the .Any method. I have had more success that way than trying to generate IN clauses, because the IN approach likes to retrieve all the data and pass it all back in as parameters to a query.
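A sketch of the .Any approach against the question's tables (property names taken from the question):
var cutoff = DateTime.Now.AddDays(-10);

// Translates to a correlated EXISTS subquery instead of an IN list.
var popularBooks = DB.Books
    .Where(b => DB.Sales.Any(s => s.BookID == b.ID
                                  && s.SalesAmount >= 50
                                  && s.DateOfSale >= cutoff));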
In LINQ to SQL, IQueryable expression fragments can be combined to create a single query. It will try to keep everything as an IQueryable for as long as it can, until you do something that cannot be expressed in SQL. When you call ToList you are directly asking it to resolve that query into an IEnumerable stored in memory.
In most cases you are better off not selecting the book IDs in advance. Keep the fragment for popular books in a single place in the code and use it when necessary to build on another query; see the sketch below. An IQueryable is just an expression tree, which is resolved into SQL at some later point.
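For example, something like this keeps the fragment reusable while still producing a single SQL statement (a sketch; the join is one way to keep the whole thing translatable):
// A reusable fragment: still an expression tree, nothing executed yet.
IQueryable<Sale> popularSales = DB.Sales
    .Where(s => s.SalesAmount >= 50
                && s.DateOfSale >= DateTime.Now.AddDays(-10));

// Composes with Books into one SQL statement containing a join.
var popularBooks = (from b in DB.Books
                    join s in popularSales on b.ID equals s.BookID
                    select b).Distinct();  // a book may have many qualifying sales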
If you think your application will perform better by storing the popular books elsewhere (memcache or whatever), then you may consider pulling them out beforehand and checking against that later. This will mean each book ID is passed in as a sproc parameter and used in an IN clause.
Here's the situation. Due to the design of the database I have to work with, I need to write a stored procedure in such a way that I can pass in the name of the table to be queried against, if at all possible. The program in question does its processing by jobs, and each job gets its own table created in the database, i.e. table-jobid1, table-jobid2, table-jobid3, etc. Unfortunately, there's nothing I can do about this design; I'm stuck with it.
However, now, I need to do data mining against these individualized tables. I'd like to avoid doing the SQL in the code files at all costs if possible. Ideally, I'd like to have a stored procedure similar to:
SELECT *
FROM @TableName AS tbl
WHERE @Filter
Is this even possible in SQL Server 2005? Any help or suggestions would be greatly appreciated. Alternate ways to keep the SQL out of the code behind would be welcome too, if this isn't possible.
Thanks for your time.
The best solution I can think of is to build your SQL in the stored proc, such as:
DECLARE @query nvarchar(max);
SET @query = 'SELECT * FROM ' + @TableName + ' AS tbl WHERE ' + @Filter;
EXEC(@query);
not an ideal solution probably, but it works.
The best answer I can think of is to build a view that unions all the tables together, with an id column in the view telling you where the data in the view came from. Then you can simply pass that id into a stored proc which will go against the view. This is assuming that the tables you are looking at all have identical schema.
example:
create view test1 as
select *, 'tbl1' as src
from [job-1]
union all
select *, 'tbl2' as src
from [job-2]
union all
select *, 'tbl3' as src
from [job-3]
Now you can select * from test1 where src = 'tbl3' and you will only get records from the job-3 table.
This would be a meaningless stored proc. Select from some table using some parameters? You are basically defining the entire query again in whatever you are using to call this proc, so you may as well generate the SQL yourself.
The only reason I would write a proc that builds dynamic SQL is if you want to do something that you can change without redeploying your codebase.
But, in this case, you are just SELECT *'ing. You can't define the columns, where clause, or order by differently since you are trying to use it for multiple tables, so there is no meaningful change you could make to it.
In short: it's not even worth doing. Just slap down your table-specific sprocs, or write your SQL in strings in your code (but make sure it's parameterized).
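A sketch of the "SQL in strings, but parameterized" option (the connection string, the tableName and status variables, the whitelist, and the JobStatus column are all invented for the example; a table name cannot be a query parameter, so it is checked against a known list before being concatenated):
// A table name can't be a SqlParameter, so validate it against a whitelist.
var allowedTables = new HashSet<string> { "table-jobid1", "table-jobid2" };
if (!allowedTables.Contains(tableName))
    throw new ArgumentException("Unknown table: " + tableName);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT * FROM [" + tableName + "] WHERE JobStatus = @status", conn))
{
    // The filter value goes in as a real parameter, not string concatenation.
    cmd.Parameters.AddWithValue("@status", status);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // process each row
        }
    }
}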
I will explain the problem with an example:
There are two tables in my database, named entry and tags. There is a column named ID_ENTRY in both tables. When I add a record to the entry table, I have to take the ID_ENTRY of the last added record and add it to the tags table. How can I do that?
The only way to do this is with multiple statements. Using dynamic SQL, you can do this by separating each statement in your query string with a semicolon:
"DECLARE #ID int;INSERT INTO [Entry] (...) VALUES ...; SELECT #ID = scope_identity();INSERT INTO [TAGS] (ID_ENTRY) VALUES (#ID);"
Make sure you put this in a transaction to protect against concurrency problems and keep it all atomic. You could also break that up into two separate queries to return the new ID value in the middle if you want; just make sure both queries are in the same transaction.
Also: you are using parameterized queries with your dynamic sql, right? If you're not, I'll personally come over there and smack you 10,000 times with a wet noodle until you repent of your insecure ways.
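A sketch of the whole thing from ADO.NET, wrapped in a transaction as described (the Title and Tag columns and the title/tag variables are invented for the example):
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    using (var cmd = new SqlCommand(
        "INSERT INTO [Entry] (Title) VALUES (@title); " +
        "INSERT INTO [Tags] (ID_ENTRY, Tag) VALUES (SCOPE_IDENTITY(), @tag);",
        conn, tx))
    {
        cmd.Parameters.AddWithValue("@title", title);
        cmd.Parameters.AddWithValue("@tag", tag);
        cmd.ExecuteNonQuery();
        tx.Commit();  // both inserts succeed together or not at all
    }
}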
Immediately after executing the insert statement on the first table, you should query @@IDENTITY by doing "SELECT @@IDENTITY". That will retrieve the last autogenerated ID... and then just insert it into the second table.
If you are using triggers or something else that inserts rows, this may not work. Use SCOPE_IDENTITY() instead of @@IDENTITY.
I would probably do this with an INSERT trigger on the entry table, if you have all of the data you need to push to the tags table available. If not, then you might want to consider using a stored procedure that performs both inserts inside a transaction.
If you want to do it in code, you'll need to be more specific about how you are managing your data. Are you using DataAdapter, DataTables, LINQ, NHibernate, ...? Essentially, you need to wrap both inserts inside a transaction of some sort so that either both inserts get executed or neither does, but the means of doing that depend on what technology you are using to interact with the database.
If you are using dynamic SQL anyway, why not use LINQ to Entity Framework? EF is now the recommended data access technology from Microsoft (see the post Clarifying the message on L2S Futures on the ADO.NET team blog), and if you do an insert with EF the last identity ID will be available to you automatically. I use it all the time; it's easy.
Hope this helps!
Ray.