Efficient joining the most recent record from another table in Entity Framework Core - asp.net

I am comming to ASP .NET Core from PHP w/ MySQL.
The problem:
For the illustration, suppose the following two tables:
T: {ID, Description, FK} and States: {ID, ID_T, Time, State}. There is 1:n relationship between them (ID_T references T.ID).
I need all the records from T with some specific value of FK (lets say 1) along with the related newest record in States (if any).
In terms of SQL it can be written as:
SELECT T.ID, T.Description, COALESCE(s.State, 0) AS 'State' FROM T
LEFT JOIN (
SELECT ID_T, MAX(Time) AS 'Time'
FROM States
GROUP BY ID_T
) AS sub ON T.ID = sub.ID_T
LEFT JOIN States AS s ON T.ID = s.ID_T AND sub.Time = s.Time
WHERE FK = 1
I am struggling to write an efficient equivalent query in LINQ (or the fluent API). The best working solution I've got so far is:
from t in _context.T
where t.FK == 1
join s in _context.States on t.ID equals o.ID_T into _s
from s in _s.DefaultIfEmpty()
let x = new
{
id = t.ID,
time = s == null ? null : (DateTime?)s.Time,
state = s == null ? false : s.State
}
group x by x.id into x
select x.OrderByDescending(g => g.time).First();
When I check the resulting SQL query in the output window when executed it is just like:
SELECT [t].[ID], [t].[Description], [t].[FK], [s].[ID], [s].[ID_T], [s].[Time], [s].[State]
FROM [T] AS [t]
LEFT JOIN [States] AS [s] ON [T].[ID] = [s].[ID_T]
WHERE [t].[FK] = 1
ORDER BY [t].[ID]
Not only it selects more columns than I need (in the real scheme there are more of them). There is no grouping in the query so I suppose it selects everything from the DB (and States is going to be huge) and the grouping/filtering is happening outside the DB.
The questions:
What would you do?
Is there an efficient query in LINQ / Fluent API?
If not, what workarounds can be used?
Raw SQL ruins the concept of abstracting from a specific DB technology and its use is very clunky in current Entity Framework Core (but maybe its the best solution).
To me, this looks like a good example for using a database view - again, not really supported by Entity Framework Core (but maybe its the best solution).

What happens if you try to do a more straight forward translation to LINQ?
var latestState = from s in _context.States
group s by s.ID_T into sg
select new { ID_T = sg.Key, Time = sg.Time.Max() };
var ans = from t in _context.T
where t.FK == 1
join sub in latestState on t.ID equals sub.ID_T into subj
from sub in subj.DefaultIfEmpty()
join s in _context.States on new { t.ID, sub.Time } equals new { s.ID, s.Time } into sj
from s in sj.DefaultIfEmpty()
select new { t.ID, t.Description, State = (s == null ? 0 : s.State) };
Apparently the ?? operator will translate to COALESCE and may handle an empty table properly, so you could replace the select with:
select new { t.ID, t.Description, State = s.State ?? 0 };

OK. Reading this article (almost a year old now), Smit's comment to the original question and other sources, it seems that EF Core is not really production ready yet. It is not able to translate grouping to SQL and therefore it is performed on the client side, which may be (and in my case would be) a serious problem. It corresponds to the observed behavior (the generated SQL query does no grouping and selects everything in all groups). Trying the LINQ queries out in Linqpad it always translates to a single SQL query.
I have downgraded to EF6 following this article. It required some changes in my model's code and some queries. After changing .First() to .FirstOrDefault() in my original LINQ query it works fine and translates to a single SQL query selecting only the needed columns. The generated query is much more complex than it is needed, though.
Using a query from NetMage's answer (after small fixes), it results in a SQL query almost identical to my own original SQL query (there's only a more complex construct than COALESCE).
var latestState = from s in _context.States
group s by s.ID_T into sg
select new { ID = sg.Key, Time = sg.Time.Max() };
var ans = from t in _context.T
where t.FK == 1
join sub in latestState on t.ID equals sub.ID into subj
from sub in subj.DefaultIfEmpty()
join s in _context.States
on new { ID_T = t.ID, sub.Time } equals new { s.ID_T, s.Time }
into sj
from s in sj.DefaultIfEmpty()
select new { t.ID, t.Description, State = (s == null ? false : s.State) };
In LINQ it's not as elegant as my original SQL query but semantically it's the same and it does more or less the same thing on the DB side.
In EF6 it is also much more convenient to use arbitrary raw SQL queries and AFAIK also the database views.
The biggest downside of this approach is that full .NET framework has to be targeted, EF6 is not compatible with .NET Core.

Related

LINQ Entities Where clause not in correct place

Apparently I'm missing something with how LINQ to entities works. Hopefully one of you all can educate me.
Please try the below locally and let me know if you are seeing the same results. Something is really strange here...
Lets look at a very simple LINQ expression using navigation properties.
This was generated in LinqPad in a C# statement.
var result = (from ge in group_execution
where ge.automation_sequences.project.client_id == 1 && ge.parent_group_exec_id != null
select new
{
ge.id,
ge.parent_group_exec_id,
ge.automation_sequences.project.client_id
});
result.Dump();
OR, we can use joins...which will lead to the same bad results, but lets continue...
var result = (from ge in group_execution
join aseq in automation_sequences on ge.automation_sequence_id equals aseq.id
join p in project on aseq.project_id equals p.id
where p.client_id == 1 && ge.parent_group_exec_id != null
select new
{
ge.id,
ge.parent_group_exec_id,
p.client_id
});
result.Dump();
These very simple LINQ expressions generate the following SQL:
SELECT
[Filter1].[id1] AS [id],
[Filter1].[parent_group_exec_id] AS [parent_group_exec_id],
[Extent5].[client_id] AS [client_id]
FROM (SELECT [Extent1].[id] AS [id1], [Extent1].[automation_sequence_id] AS [automation_sequence_id], [Extent1].[parent_group_exec_id] AS [parent_group_exec_id]
FROM [dbo].[group_execution] AS [Extent1]
INNER JOIN [dbo].[automation_sequences] AS [Extent2] ON [Extent1].[automation_sequence_id] = [Extent2].[id]
INNER JOIN [dbo].[project] AS [Extent3] ON [Extent2].[project_id] = [Extent3].[id]
WHERE ([Extent1].[parent_group_exec_id] IS NOT NULL) AND (1 = [Extent3].[client_id]) ) AS [Filter1]
LEFT OUTER JOIN [dbo].[automation_sequences] AS [Extent4] ON [Filter1].[automation_sequence_id] = [Extent4].[id]
LEFT OUTER JOIN [dbo].[project] AS [Extent5] ON [Extent4].[project_id] = [Extent5].[id]
This baffles me. For the life of me I can't understand why LINQ is doing this. It's horrible, just look at the execution plan:
Now lets manually clean this up in SSMS and view the correct SQL and execution plan:
Much better, but how do we get LINQ to act this way?
Is anyone else seeing this? Has anyone else ever saw this and corrected it and if so how?
Thanks for looking into this.
UPDATE, attempting Chris Schaller fix:
var result = (from ge in group_execution
select new
{
ge.id,
ge.parent_group_exec_id,
ge.automation_sequences.project.client_id
}).Where(x=>x.client_id == 1 && x.parent_group_exec_id != null);
result.Dump();
Just so you all know I'm monitoring the SQL through SQL Server Profiler. If anyone knows of any issues doing it this way let me know.
UPDATE, a fix for JOINS, but not nav properties, and a cause, but why?
Here's your solution:
var result = (from ge in group_execution.Where(x=>x.parent_group_exec_id != null)
join aseq in automation_sequences on ge.automation_sequence_id equals aseq.id
join p in project on aseq.project_id equals p.id
where p.client_id == 1// && ge.parent_group_exec_id != null
select new
{
ge.id,
ge.parent_group_exec_id,
p.client_id
});
result.Dump();
Null checks shouldn't cause the framework to mess up like this. Why should I have to write it this way? This just seems like a defect to me in the framework. It will make my dynamic expressions a little bit more difficult to write, but maybe I can find a way.
Navigation Properties still mess up...so I'm still really sad. Picture below:
var result = (from ge in group_execution.Where(x=>x.parent_group_exec_id != null)
where ge.automation_sequences.project.client_id == 1// && ge.parent_group_exec_id != null
select new
{
ge.id,
ge.parent_group_exec_id,
ge.automation_sequences.project.client_id
});
result.Dump();
move your where clause to after you have defined the structure of the select statement
var result = (from ge in group_execution
select new
{
ge.id,
ge.parent_group_exec_id,
ge.automation_sequences.project.client_id
}).Where(x => x.client_id == 1 && x.parent_group_exec_id != null)
result.Dump();
Remember that Linq-to-entities flattens the results of queries to execute as SQL and then hydrates the object graph from those results.
When your query uses navigation properties or joins the query parser has to allow for zero results from those sub queries (Extents) to make sure that all columns that are required in the output and any interim processing are represented. By explicitly specifying a filter on a table for != null early in the query the parser knows that there is no further possibility that the field and any relationships linked by that field will be null, until then the parser prepares the query as if the joins will return null results
It is worth checking, but i wonder if UseDatabaseNullSemantics has anything to do with this?
Try:
dbContext.Configuration.UseDatabaseNullSemantics = false
In Linq, we can specify Where clauses as often as we like, improve the resulting SQL we should filter early in the query and granularly.
The parser engine is optimized to follow and implement your query sequentially, and generate good SQL at the end of it. Don't try to write linq-to-entities the same way that you structure your SQL, I know it's counter intuitive because the syntax is similar
A good technique is to assume that before each clause all the records from the previous statements have been loaded into memory, and that the next operation will affect all of those records. So you want to reduce the records before each additional operation by specifying a filter before moving on to the next clause
In general, if you have a filter condition based on the root table, apply this to the query before define all other joins and filters and even selects, you will get much cleaner sql.

nature of SELECT query in MVC and LINQ TO SQL

i am bit confused by the nature and working of query , I tried to access database which contains each name more than once having same EMPid so when i accessed it in my DROP DOWN LIST then same repetition was in there too so i tried to remove repetition by putting DISTINCT in query but that didn't work but later i modified it another way and that worked but WHY THAT WORKED, I DON'T UNDERSTAND ?
QUERY THAT DIDN'T WORK
var names = (from n in DataContext.EmployeeAtds select n).Distinct();
QUERY THAT WORKED of which i don't know how ?
var names = (from n in DataContext.EmployeeAtds select new {n.EmplID, n.EmplName}).Distinct();
why 2nd worked exactly like i wanted (picking each name 1 time)
i'm using mvc 3 and linq to sql and i am newbie.
Both queries are different. I am explaining you both query in SQL that will help you in understanding both queries.
Your first query is:
var names = (from n in DataContext.EmployeeAtds select n).Distinct();
SQL:-
SELECT DISTINCT [t0].[EmplID], [t0].[EmplName], [t0].[Dept]
FROM [EmployeeAtd] AS [t0]
Your second query is:
(from n in EmployeeAtds select new {n.EmplID, n.EmplName}).Distinct()
SQL:-
SELECT DISTINCT [t0].[EmplID], [t0].[EmplName] FROM [EmployeeAtd] AS
[t0]
Now you can see SQL query for both queries. First query is showing that you are implementing Distinct on all columns of table but in second query you are implementing distinct only on required columns so it is giving you desired result.
As per Scott Allen's Explanation
var names = (from n in DataContext.EmployeeAtds select n).Distinct();
The docs for Distinct are clear – the method uses the default equality comparer to test for equality, and the default comparer sees 4 distinct object references. One way to get around this would be to use the overloaded version of Distinct that accepts a custom IEqualityComparer.
var names = (from n in DataContext.EmployeeAtds select new {n.EmplID, n.EmplName}).Distinct();
Turns out the C# compiler overrides Equals and GetHashCode for anonymous types. The implementation of the two overridden methods uses all the public properties on the type to compute an object's hash code and test for equality. If two objects of the same anonymous type have all the same values for their properties – the objects are equal. This is a safe strategy since anonymously typed objects are essentially immutable (all the properties are read-only).
Try this:
var names = DataContext.EmployeeAtds.Select(x => x.EmplName).Distinct().ToList();
Update:
var names = DataContext.EmployeeAtds
.GroupBy(x => x.EmplID)
.Select(g => new { EmplID = g.Key, EmplName = g.FirstOrDefault().EmplName })
.ToList();

Simple.Data join without where clause including any primary table columns

I am using Simple.Data and have been trying to find an example that will let me do a join with the only condition in the WHERE clause be from the joined table. All of the examples I have seen always have at least one column in the primary table included in the WHERE. Take for example the following data:
private void TestSetup()
{
var adapter = new InMemoryAdapter();
adapter.SetKeyColumn("Events", "Id");
adapter.SetAutoIncrementColumn("Events", "Id");
adapter.SetKeyColumn("Doors", "Id");
adapter.SetAutoIncrementColumn("Doors", "Id");
adapter.Join.Master("Events", "Id").Detail("Doors", "EventId");
Database.UseMockAdapter(adapter);
db.Events.Insert(Id: 1, Code: "CodeMash2013", Name: "CodeMash 2013");
db.Events.Insert(Id: 2, Code: "SomewhereElse", Name: "Some Other Conf");
db.Doors.Insert(Id: 1, Code: "F7E08AC9-5E75-417D-A7AA-60E88B5B99AD", EventID: 1);
db.Doors.Insert(Id: 2, Code: "0631C802-2748-4C63-A6D9-CE8C803002EB", EventID: 1);
db.Doors.Insert(Id: 3, Code: "281ED88F-677D-49B9-84FA-4FAE022BBC73", EventID: 1);
db.Doors.Insert(Id: 4, Code: "9DF7E964-1ECE-42E3-8211-1F2BF7054A0D", EventID: 2);
db.Doors.Insert(Id: 5, Code: "9418123D-312A-4E8C-8807-59F0A63F43B9", EventID: 2);
}
I am trying to figure out the syntax I need to use in Simple.Data to get something similar to this T-SQL:
SELECT d.Code FROM Doors AS d INNER JOIN Events AS e ON d.EventID = e.Id WHERE e.Code = #EventCode
The final result should be only the three Door rows for EventId 1 when I pass in an event code of "CodeMash2013". Thanks!
First, a general point: since you've got criteria against the joined Events table, the LEFT OUTER is redundant; only rows with matching Event Codes will be returned, which implies only those rows where the join from Doors to Events was successful.
If you've got referential integrity set up in your database, with a foreign key relationship from Doors to Events, then Simple.Data can handle joins automatically. With that in mind, this code will work, both with the InMemoryAdapter and SQL Server:
List<dynamic> actual = db.Doors.FindAll(db.Doors.Events.Code == "CodeMash2013")
.Select(db.Doors.Id, db.Events.Name)
.ToList();
Assert.AreEqual(3, actual.Count);
If you don't have referential integrity set up then you should, but if you can't for some reason, then the following will work with SQL Server, but will trigger a bug in the InMemoryAdapter that I've just fixed but haven't done a release for yet:
dynamic eventAlias;
List<dynamic> actual = db.Doors.All()
.Join(db.Events, out eventAlias)
.On(db.Doors.EventID == eventAlias.Id)
.Select(db.Doors.Id, db.Events.Name)
.Where(eventAlias.Code == eventCode)
.ToList();
Assert.AreEqual(3, actual.Count);
UPDATE : This answer applies when using the Simple.Data SQL Server Provider, not the InMemoryAdapter
You could probably use the following for this:
db.Doors.All()
.Select(
db.Doors.Code)
.LeftJoin(db.Events).On(db.Doors.EventID == db.Events.Id)
.Where(db.Events.Code == eventCode);
You might need to experiment between using LeftJoin and OuterJoin depending on your provider. If you're using the ADO provider for example, the two functions both generate LEFT JOIN statements as LEFT JOIN and LEFT OUTER JOIN are synonymous in t-sql.
If you need to use aliases for some reason, the syntax is slightly different.
dynamic EventAlias;
db.Doors.All()
.LeftJoin(db.Events.As"e", out EventAlias).On(db.Doors.EventID == db.EventAlias.Id)
.Select(
db.Doors.Code)
.Where(db.EventAlias.Code == eventCode);
There's no reason why a where clause must contain a field from the primary key table. You can find more examples of Joins here on the Simple.Data doc site. Click Explicit Joins when you get there.

LINQ - listview - 2 levels of data in one table

I have been trying to solve this problem and I can't seem to figure it out. I'm not sure if it's because of my db design and LINQ, but I'm hoping for some direction here.
My db table:
Id         Name         ParentId
1          Data1        null
2          Data2        null
3          Data3        null
4          Data4        1
5          Data5        1
6          Data6        2
7          Data7        2
Basically Data1 and Data2 are the top levels that I want to use for headings and their children will be related based on their ParentID.
I am trying to use a listview to present the data like the following:
Data1
-----
Data4
Data5
Data2
-----
Data6
Data7
I am trying to use a combination of LINQ and listview to accomplish this.
The following is the code for the linq query:
var query = from data in mydb.datatable
where data.ParentId == null
select data;
But this only gives the heading level... and unfortunately listview only takes in 1 datasource.
While it's possible with some databases (like SQL Server post 2005) to write recursive queries, I don't believe those get generated by LINQ. On the other hand, if the number of records is sufficiently small, you could materialize the data (to a list) and write a LINQ query that uses a recursive function to generate your list.
This is from memory, but it would look something like this:
Func<int?,IEnumerable<data>> f = null;
f = parentId => {
IEnumerable<data> result = from data in mydb.datatable
where data.ParentId = parentId
select data;
return result.ToList().SelectMany(d=>f(d.Id));
};
That should get you the hierarchy.
If your hierarchy has only two levels you can use a group join and anonymous objects:
var query = from data in mydb.datatable.Where(x => x.ParentId == null)
join child in mydb.datatable.Where(x => x.ParentId != null)
on data.Id equals child.ParentId into children
select new { data, children };
Edit: You will have to convert the data to a collection that can be bound to a ListView. One hack would be to have a list that is only one level deep with spacing in front of the subitems:
var listViewItems = (from item in query.AsEnumerable()
let dataName = item.data.Name
let childNames = item.children.Select(c => " " + c.Name)
from name in dataName.Concat(childNames)
select new ListViewItem(name)).ToArray();
You could also try to find a control that fits better, like a TreeView. You might want to ask a separate question about this issue.
I just wrote up a blog post describing a solution to build a graph from a self-referencing table with a single LINQ query to the database which might be of use. See http://www.thinqlinq.com/Post.aspx/Title/Hierarchical-Trees-from-Flat-Tables-using-LINQ.

Linq and SQL query comparison

I have this SQL query:
SELECT Sum(ABS([Minimum Installment])) AS SumOfMonthlyPayments FROM tblAccount
INNER JOIN tblAccountOwner ON tblAccount.[Creditor Registry ID] = tblAccountOwner.
[Creditor Registry ID] AND tblAccount.[Account No] = tblAccountOwner.[Account No]
WHERE (tblAccountOwner.[Account Owner Registry ID] = 731752693037116688)
AND (tblAccount.[Account Type] NOT IN
('CA00', 'CA01', 'CA03', 'CA04', 'CA02', 'PA00', 'PA01', 'PA02', 'PA03', 'PA04'))
AND (DATEDIFF(mm, tblAccount.[State Change Date], GETDATE()) <=
4 OR tblAccount.[State Change Date] IS NULL)
AND ((tblAccount.[Account Type] IN ('CL10','CL11','PL10','PL11')) OR
CONTAINS(tblAccount.[Account Type], 'Mortgage')) AND (tblAccount.[Account Status ID] <> 999)
I have created a Linq query:
var ownerRegistryId = 731752693037116688;
var excludeTypes = new[]
{
"CA00", "CA01", "CA03", "CA04", "CA02",
"PA00", "PA01", "PA02", "PA03", "PA04"
};
var maxStateChangeMonth = 4;
var excludeStatusId = 999;
var includeMortgage = new[] { "CL10", "CL11", "PL10", "PL11" };
var sum = (
from account in context.Accounts
from owner in account.AccountOwners
where owner.AccountOwnerRegistryId == ownerRegistryId
where !excludeTypes.Contains(account.AccountType)
where account.StateChangeDate == null ||
(account.StateChangeDate.Month - DateTime.Now.Month)
<= maxStateChangeMonth
where includeMortgage.Contains(account.AccountType) ||
account.AccountType.Contains("Mortgage")
where account.AccountStatusId != excludeStatusId
select account.MinimumInstallment).ToList()
.Sum(minimumInstallment =>
Math.Abs((decimal)(minimumInstallment)));
return sum;
Are they equal/same ? I dont have records in db so I cant confirm if they are equal. In SQL there are brackets() but in Linq I didnt use them so is it ok?
Please suggest.
It is not possible for us to say anything about this, because you didn't show us the DBML. The actual definition of the mapping between the model and the database is important to be able to see how this executes.
But before you add the DBML to your question: we are not here to do your work, so here are two tips to find out whether they are equal or not:
Insert data in your database and run the queries.
Use a SQL profiler and see what query is executed by your LINQ provider under the covers.
If you have anything more specific to ask, we will be very willing to help.
The brackets will be generated by LINQ provider, if necessary.
The simplest way to check if the LINQ query is equal to the initial SQL query is to log it like #Atanas Korchev suggested.
If you are using Entity Framework, however, there is no Log property, but you can try to convert your query to an ObjectQuery, and call the ToTraceString method then:
string sqlQuery = (sum as ObjectQuery).ToTraceString();
UPD. The ToTraceString method needs an ObjectQuery instance for tracing, and the ToList() call already performs materialization, so there is nothing to trace. Here is the updated code:
var sum = (
from account in context.Accounts
from owner in account.AccountOwners
where owner.AccountOwnerRegistryId == ownerRegistryId
where !excludeTypes.Contains(account.AccountType)
where account.StateChangeDate == null ||
(account.StateChangeDate.Month - DateTime.Now.Month)
<= maxStateChangeMonth
where includeMortgage.Contains(account.AccountType) ||
account.AccountType.Contains("Mortgage")
where account.AccountStatusId != excludeStatusId
select account.MinimumInstallment);
string sqlQuery = (sum as ObjectQuery).ToTraceString();
Please note that this code will not perform the actual query, it is usable for testing purposes only.
Check out this article if you are interested in ready-for-production logging implementation.
There can be a performance difference:
The SQL query returns a single number (SELECT Sum...) directly from the database server to the client which executes the query.
In your LINQ query you have a greedy operator (.ToList()) in between:
var sum = (...
...
select account.MinimumInstallment).ToList()
.Sum(minimumInstallment =>
Math.Abs((decimal)(minimumInstallment)));
That means that the query on the SQL server does not contain the .Sum operation. The query returns a (potentially long?) list of MinimumInstallments. Then the .Sum operation is performed in memory on the client.
So effectively you switch from LINQ to Entities to LINQ to Objects after .ToList().
BTW: Can you check the last proposal in your previous question here which would avoid .ToList() on this query (if the proposal should work) and would therefore be closer to the SQL statement.

Resources