Randomize entity framework query results - asp.net

Good afternoon,
I have a ListView filled using a LinqDataSource plus an Entity Framework IQueryable query.
The query uses Take (TOP in T-SQL) like this:
context.Categories().OrderBy(c=>c.Name).Take(20);
So it brings me the 20 records I want, ordered by name.
Now I want to show those 20 records in a random order. What's the best approach to accomplish this?

I believe the answer in this post is what you need:
Linq to Entities, random order
EDIT:
Get your top 20 records first. Then, with the top 20 items you've already fetched, randomize them all in C#, not involving the database at all:
var yourRecords = context.Categories().OrderBy(c => c.Name).Take(20).ToList(); // ToList() triggers the actual database call; Take() alone is deferred
var shuffled = yourRecords.OrderBy(a => Guid.NewGuid()).ToList(); // then randomize the items now that they are in C# memory
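If you would rather not rely on sorting by Guid.NewGuid(), a Fisher-Yates shuffle is a common alternative for randomizing a list that is already in memory. Below is a minimal sketch; ShuffleSketch and Shuffle are hypothetical names (not part of Entity Framework or LINQ), and the Random instance is supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ShuffleSketch
{
    // Returns a new list with the items in random order (Fisher-Yates).
    // The source sequence is enumerated once and is not modified.
    public static List<T> Shuffle<T>(IEnumerable<T> source, Random rng)
    {
        var items = source.ToList(); // materializes the sequence (runs the query, if any)
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // random index with 0 <= j <= i
            (items[i], items[j]) = (items[j], items[i]);
        }
        return items;
    }
}
```

It could be called as ShuffleSketch.Shuffle(context.Categories().OrderBy(c => c.Name).Take(20), new Random()), which fetches the top 20 by name and reorders them on the client.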

This turned out to be very simple using extension methods: ordering by name first, then calling Take (TOP in T-SQL) and randomizing later.
context.Categories().OrderByName().Take(20).OrderByRandom();
public static IQueryable<Category> OrderByName(this IQueryable<Category> query)
{
    return from c in query
           orderby c.Name
           select c;
}
public static IQueryable<T> OrderByRandom<T>(this IQueryable<T> query)
{
    return (from q in query
            orderby Guid.NewGuid()
            select q);
}

Related

Extending SELECT projection

I want to extend the documents that I receive from a SELECT clause.
Let's assume I have a collection that stores documents in the following shape
{"foo": "yeah I am a foo", "bar": "And I am a bar"}
so that the query
SELECT * FROM f
would return the above document(s)
Now I want the projection of the SELECT statement to add a property that is NOT part of the stored documents.
Basically, I'd like something like JavaScript's spread operator (which is not possible in Cosmos DB)
SELECT {...*, "newprop": "oh! I am new here!"} FROM f
and which should then return document(s) like this
{"foo": "yeah I am a foo", "bar": "And I am a bar", "newprop": "oh! I am new here!"}
The one thing I DON'T WANT TO DO is repeat all the top-level properties of my documents. So a solution in the form of
SELECT {"foo": f.foo, "bar":f.bar, "newprop": "oh! I am new here!"} FROM f
is not desired.
I also tried to get that done via a function, but I was not able to, as I can't find out how to get a handle on the top-level object / document within the SELECT clause.
I tried the following
SELECT udf.ExtendDocument(*) FROM f
SELECT udf.ExtendDocument($1) FROM f
SELECT udf.ExtendDocument(f) FROM f
SELECT udf.ExtendDocument(value) FROM f
most of which produced a syntax error.
It's not possible to use SELECT *, then append columns to the projection.
One option you could explore is to add a static property and value to the class that you deserialize your data into.
For instance, you could create a class like this simple one for a person, with an extra property that has a default value. When you deserialize your query results into it, the extra property is filled with that default.
class Person
{
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }
    [JsonProperty(PropertyName = "pk")]
    public string Pk { get; set; }
    [JsonProperty(PropertyName = "firstName")]
    public string FirstName { get; set; }
    [JsonProperty(PropertyName = "lastName")]
    public string LastName { get; set; }
    // Not stored in the container; keeps its default after deserialization
    public string MyStaticColumn { get; set; } = "Default Value";
}
Then the code to run the query...
public static async Task<List<Person>> QueryPerson(Container container)
{
    QueryDefinition query = new QueryDefinition("select * from c");
    FeedIterator<Person> resultSet = container.GetItemQueryIterator<Person>(
        query, requestOptions: new QueryRequestOptions()
        {
            MaxConcurrency = -1
        });
    List<Person> results = new List<Person>();
    // Read the results page by page
    while (resultSet.HasMoreResults)
    {
        FeedResponse<Person> response = await resultSet.ReadNextAsync();
        foreach (var p in response)
        {
            results.Add(p);
        }
    }
    return results;
}
So I found a solution.
A) Build a user defined function that does the "Extension"
function extendProjection(x) {
    var result = {
        // usually one wants to extend the returned doc
        // with some new calculated properties and not
        // with a static value
        newprop: calculateNewPropFromObject(x)
    };
    return Object.assign(result, x);
}
B) Use the user defined function in your SELECT
SELECT VALUE udf.extendProjection(c) FROM c
//it is important to use the keyword "VALUE", otherwise
//the resulting doc will look like {$1: { //the extended projection }}
Having described that, I would recommend against this approach:
- Your RUs will easily triple. The reason seems to be the use of JS itself and not so much what the JS engine does.
- It's not possible to reuse other registered UDFs within your JS code, so one has to copy code snippets.
- "Extended properties" are not usable in your WHERE clause.
- Runtime error messages returned from Cosmos DB are horrible to decipher.
- The lack of any decent development environment is basically a no-go.
Like @mark-brown already answered, it does not seem to be possible.
I would just like to add that likely you shouldn't do that anyway and offer a workaround arguably better than the UDF (which is costly, hard-to-maintain, does not support multiple concurrent logic versions, etc).
If you want to add extra calculations to the query output based on the same entire document, then it would make more sense to do it in the business layer (after querying), not the data layer (Cosmos DB queries). It would also be faster (fewer calculations, less data to move) and cheaper (fewer RUs).
If you want to add static data (e.g. a fixed string or other constants), then the same argument applies: passing it back and forth to Cosmos DB just makes things slower and costlier. That's not the responsibility of storage.
The workaround
If the goal is to query an entire CHILD object and add only a few selected properties from other areas of the document, then it's best not to try to flatten the object. Just keep your storage model objects and extras side by side, e.g.:
select c.childWithAllTheFutureChildren,
c.other.location.single.value as newProp
from c
If you really-really want to add some calculation/statics to query output then you could also still use the same pattern for entire document:
SELECT c as TheRealStoredThing,
'oh! I am new here!' as theNewProp
FROM c
Yes, it does require you to have a separate model on client side for this query, but that's a good clean practice anyway. And it's much simpler than using/maintaining UDFs.
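As a rough illustration of that separate client-side model, the side-by-side result of the last query could be deserialized like this. This is only a sketch: StoredDoc, ExtendedDoc and ProjectionSketch are hypothetical names, and it uses System.Text.Json purely to stay self-contained (the Cosmos DB SDK's own serializer settings may differ).

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model of the document as it is stored.
public class StoredDoc
{
    [JsonPropertyName("foo")] public string Foo { get; set; }
    [JsonPropertyName("bar")] public string Bar { get; set; }
}

// Hypothetical model of one query result row:
// the stored document and the extra value sit side by side.
public class ExtendedDoc
{
    [JsonPropertyName("TheRealStoredThing")] public StoredDoc Stored { get; set; }
    [JsonPropertyName("theNewProp")] public string NewProp { get; set; }
}

public static class ProjectionSketch
{
    // Parses a single result row of the "SELECT c as TheRealStoredThing, ..." query.
    public static ExtendedDoc Parse(string json)
    {
        return JsonSerializer.Deserialize<ExtendedDoc>(json);
    }
}
```

The stored shape stays untouched, and anything computed or constant lives only on the wrapper.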

EF Core Update with List

To make updates to a record of SQL Server using Entity Framework Core, I query the record I need to update, make changes to the object and then call .SaveChanges(). This works nice and clean.
For example:
var emp = _context.Employee.FirstOrDefault(item => item.IdEmployee == Data.IdEmployee);
emp.IdPosition = Data.IdPosition;
await _context.SaveChangesAsync();
But is there a standard method if I want to update multiple records?
My first approach was using a list passing it to the controller, but then I would need to go through that list and save changes every time, never really finished this option as I regarded it as not optimal.
For now what I do is instead of passing a list to the controller, I pass each object to the controller using a for. (kind of the same...)
for(int i = 0; i < ObjectList.Count; i ++)
{
/* Some code */
var httpResponseObject = await MyRepositories.Post<Object>(url+"/Controller", Object);
}
And then do the same thing on the controller as before, when updating only one record, for each of the records...
I don't feel this is the best possible approach, but I haven't found another way, yet.
What would be the optimal way of doing this?
Your question has nothing to do with Blazor... However, I'm not sure I understand what is the issue. When you call the SaveChangesAsync method, all changes in your context are committed to the database. You don't have to pass one object at a time...You can pass a list of objects
Hope this helps...
Updating records in bulk using Entity Framework or other Object Relational Mapping (ORM) libraries is a common challenge because they will run an UPDATE command for every record. You could try using Entity Framework Plus, which is an extension to do bulk updates.
If updating multiple records with a single call is critical for you, I would recommend just writing a stored procedure and calling it from your service. Entity Framework can also run direct queries and stored procedures.
It looks like the user makes some changes and then a save action needs to persist multiple records at the same time. You could trigger multiple AJAX calls—or, if you need, just one.
What I would do is create an endpoint—with an API controller and an action—that's specific to your needs. For example, to update the position of records in a table:
Controller:
/DataOrder
Action:
[HttpPut]
public async Task<IActionResult> Update([FromBody] DataChanges changes)
{
    foreach (var change in changes.Items)
    {
        var dbRecord = await _context.Employees.FindAsync(change.RecordId);
        if (dbRecord == null)
            continue; // skip IDs that no longer exist
        dbRecord.IdPosition = change.Position;
    }
    // A single call persists every tracked change
    await _context.SaveChangesAsync();
    return NoContent();
}
public class DataChanges
{
    public List<DataChange> Items { get; set; }
    public DataChanges()
    {
        Items = new List<DataChange>();
    }
}
public class DataChange
{
    public int RecordId { get; set; }
    public int Position { get; set; }
}
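The loop looks up each record individually, and SaveChanges still sends one UPDATE statement per modified record (EF Core batches those statements into fewer round trips). If you want a single statement, however, you can write a SQL query or have a stored procedure in the database and pass the data as a table-valued parameter (for example, a DataTable) instead.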

Returning multiple Model objects from LINQ Joins

I am using ASP.NET MVC framework and accessing DB records with Entities.
I am doing some joins like this:
public IQueryable<...> GetThem()
{
var ords = from o in db.Orders
join c in db.Categories on o.CategoryID equals c.ID
select new {Order=o, Category=c};
return ords;
}
I need to use/pass 'ords' from one function to other in a strongly-typed manner.
(I will be doing this kind of joins in multiple places.)
What is the best way to do this?
Do I need create a new class containing both returned vals for every join I do?
E.g.: public class OrderAndCategory { public Order Order; public Category Category; } in this case.
Is there any simpler way?
Thanks!
The class representing the data in ords is strongly typed, it is generated by the compiler. You could run into problems though if the compiler generates different classes for different instances of the query, but with the same types. You'd have to check that. If this is the case, you'll have to create classes for each different query, or use the class Tuple<Targs...> instead.
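For reference, the "one small class per join" option from the question could look roughly like the sketch below. Order and Category here are simplified stand-ins for the real entities, JoinSketch is a hypothetical name, and the example runs against in-memory queryables rather than a real data context.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the real entity classes.
public class Order { public int ID { get; set; } public int CategoryID { get; set; } }
public class Category { public int ID { get; set; } public string Name { get; set; } }

// Named result type, so the join result can be used as a method return type.
public class OrderAndCategory
{
    public Order Order { get; set; }
    public Category Category { get; set; }
}

public static class JoinSketch
{
    public static IQueryable<OrderAndCategory> GetThem(
        IQueryable<Order> orders, IQueryable<Category> categories)
    {
        // Same join as in the question, projected into the named class
        // instead of an anonymous type.
        return from o in orders
               join c in categories on o.CategoryID equals c.ID
               select new OrderAndCategory { Order = o, Category = c };
    }
}
```

Callers can then write foreach (var row in JoinSketch.GetThem(...)) and access row.Order and row.Category with full type checking.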

Strongly typed access to result-set from Linq Join - to display in MVC view

I am new to the MVC/Linq party (coming from Java).
I know that I can use something like the following to get strongly typed access, but what about if I want to use a join in my linq query?
public IQueryable<Product> GetItems(int CategoryID)
{
    ...linq query...
}
In your example, if you have used the designer to create your Linq to SQL data context, and there is a foreign key relationship between the Product and Category tables in your database, then the Category will be a property in your Product class, so there is no need to join them.
Something along the lines of the following should help (guessing property names; db is your data context):
public IQueryable<Product> GetItems(int categoryID)
{
return db.Products.Where(p => p.Category.CategoryID == categoryID).AsQueryable();
}
Or if you are not comfortable with the lambda syntax, then the following is synonymous:
public IQueryable<Product> GetItems(int categoryID)
{
    var result = from p in db.Products
                 where p.Category.CategoryID == categoryID
                 select p;
    return result.AsQueryable();
}
The Product class has a Category property, providing strongly typed access to all of the properties in category. You would use something like product.Category.Name to get the category name for example.
Also, there are a plethora of good Linq to SQL tutorials out there. Try the following:
http://it-box.blogturk.net/2007/10/19/linq-to-sql-tutorial-series-by-scott-guthrie-pdf-book-format/
http://www.hookedonlinq.com/LINQtoSQL5MinuteOverview.ashx

What are good design practices when working with Entity Framework

This applies mostly to an ASP.NET application where the data is not accessed via SOA, meaning that you get access to the objects loaded from the framework, not Transfer Objects, although some recommendations still apply.
This is a community post, so please add to it as you see fit.
Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1.
Why pick EF in the first place?
Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF and life is not bad.
EF and inheritance
The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table per hierarchy. The modeling is easy and there are no programming issues with that part.
(The following applies to table per class model as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you are trying to run queries that include one or many objects that are part of an inheritance tree: the generated sql is incredibly awful, takes a long time to get parsed by the EF and takes a long time to execute as well. This is a real show stopper. Enough that EF should probably not be used with inheritance or as little as possible.
Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then when you are trying to return some Associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once.
OK, this is bad. The concept behind EF is to let you create your object structure with little or no consideration for the actual database implementation of your tables. It completely fails at this.
So, recommendations? Avoid inheritance if you can; the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified SQL-generation tool for querying, but there are still advantages to using it, and there are ways to implement mechanisms that are similar to inheritance.
Bypassing inheritance with Interfaces
First thing to know with trying to get some kind of inheritance going with EF is that you cannot assign a non-EF-modeled class a base class. Don't even try it, it will get overwritten by the modeler. So what to do?
You can use interfaces to enforce that classes implement some functionality. For example, here is an IEntity interface that allows you to define associations between EF entities where you don't know at design time what the type of the entity will be.
public enum EntityTypes { Unknown = -1, Dog = 0, Cat }
public interface IEntity
{
    int EntityID { get; }
    string Name { get; }
    EntityTypes EntityType { get; }
}
public partial class Dog : IEntity
{
    // implement EntityID and Name which could actually be fields
    // from your EF model
    public EntityTypes EntityType { get { return EntityTypes.Dog; } }
}
Using this IEntity, you can then work with undefined associations in other classes
// lets take a class that you defined in your model.
// that class has a mapping to the columns: PetID, PetType
public partial class Person
{
public IEntity GetPet()
{
return IEntityController.Get(PetID,PetType);
}
}
which makes use of a static helper class:
public class IEntityController
{
static public IEntity Get(int id, EntityTypes type)
{
switch (type)
{
case EntityTypes.Dog: return Dog.Get(id);
case EntityTypes.Cat: return Cat.Get(id);
default: throw new Exception("Invalid EntityType");
}
}
}
Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back.
It also cannot model one-to-many or many-to-many relationships, but with creative uses of 'Union' it could be made to work. Finally, it creates the side effect of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regard.
Compiled Queries
Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql.
What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries.
There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached).
An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained [here][1].
OK, so here is how to do it using ObjectQuery<>
string query = "select value dog " +
               "from Entities.DogSet as dog " +
               "where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance);
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option.
The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate:
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
ctx.DogSet.FirstOrDefault(it => it.ID == id));
or using linq
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
(from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault());
to call the query:
query_GetDog.Invoke( YourContext, id );
The advantage of CompiledQuery is that the syntax of your query is checked at compile time, whereas EntitySQL is not. However, there are other considerations...
Includes
Let's say you want the data for the dog's owner to be returned by the query, to avoid making 2 calls to the database. Easy to do, right?
EntitySQL
string query = "select value dog " +
               "from Entities.DogSet as dog " +
               "where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance).Include("Owner");
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
CompiledQuery
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
(from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault());
Now, what if you want to have the Include parametrized? What I mean is that you want to have a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavoriteToy and so on. Basically, you want to tell the query which associations to load.
It is easy to do with EntitySQL
public Dog Get(int id, string include)
{
    string query = "select value dog " +
                   "from Entities.DogSet as dog " +
                   "where dog.ID = @ID";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)
        .IncludeMany(include);
    oQuery.Parameters.Add(new ObjectParameter("ID", id));
    oQuery.EnablePlanCaching = true;
    return oQuery.FirstOrDefault();
}
The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (that accepts only a single path) with an IncludeMany(string) that will let you pass a string of comma-separated associations to load. Look further in the extension section for this function.
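The IncludeMany extension itself is not shown in this part of the post. A rough sketch of the idea (split the comma-separated string, then apply one include per path) might look like the following. It is an assumption, not the author's implementation: IncludeManySketch is a hypothetical name, and the actual Include call is passed in as a delegate so that the example runs without Entity Framework.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class IncludeManySketch
{
    // Splits "Owner, FavoriteFood" into individual paths and applies `include`
    // once per path. `include` stands in for a call such as q.Include(path).
    public static TQuery IncludeMany<TQuery>(
        TQuery query, string paths, Func<TQuery, string, TQuery> include)
    {
        if (string.IsNullOrWhiteSpace(paths))
            return query; // nothing to include

        return paths
            .Split(',')
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)     // ignore empty entries such as a trailing comma
            .Aggregate(query, include);   // query = include(query, path) for each path
    }
}
```

With an ObjectQuery<Dog>, the call would presumably be IncludeManySketch.IncludeMany(oQuery, "Owner,FavoriteFood", (q, path) => q.Include(path)).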
If we try to do it with CompiledQuery however, we run into numerous problems:
The obvious
static readonly Func<Entities, int, string, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
(from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault());
will choke when called with:
query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" );
Because, as mentioned above, Include() only wants to see a single path in the string and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!).
Then, let's use IncludeMany(), which is an extension function
static readonly Func<Entities, int, string, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
(from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault());
Wrong again. This time it is because EF cannot parse IncludeMany: it is not one of the functions that it recognizes, it is an extension.
OK, so you want to pass an arbitrary number of paths to your function and Include() only takes a single one. What to do? You could decide that you will never ever need more than, say, 20 Includes, and pass each string separately in a struct to CompiledQuery. But now the query looks like this:
from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3)
.Include(include4).Include(include5).Include(include6)
.[...].Include(include19).Include(include20) where dog.ID == id select dog
which is awful as well. OK, then, but wait a minute. Can't we return an ObjectQuery<> with CompiledQuery, then set the includes on that? Well, that is what I would have thought as well:
static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog =
    CompiledQuery.Compile<Entities, int, ObjectQuery<Dog>>((ctx, id) =>
        (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog));
public Dog GetDog( int id, string include )
{
    ObjectQuery<Dog> oQuery = query_GetDog(YourContext, id);
    oQuery = oQuery.IncludeMany(include);
    return oQuery.FirstOrDefault();
}
That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query because it is an entirely new one now! So, the expression tree needs to be reparsed and you get that performance hit again.
So what is the solution? You simply cannot use CompiledQueries with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQueries. They are great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used because the syntax is checked at compile time, but due to these limitations, that's not possible.
An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required.
Passing more than 3 parameters to a CompiledQuery
Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. So that leaves you with 3 parameters. A pittance, but it can be improved on very easily.
public struct MyParams
{
public string param1;
public int param2;
public DateTime param3;
}
static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
    CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
        from dog in ctx.DogSet
        where dog.Age == myParams.param2 && dog.Name == myParams.param1 && dog.BirthDate > myParams.param3
        select dog);
public List<Dog> GetSomeDogs( int age, string name, DateTime birthDate )
{
    MyParams myParams = new MyParams();
    myParams.param1 = name;
    myParams.param2 = age;
    myParams.param3 = birthDate;
    return query_GetDog(YourContext,myParams).ToList();
}
Return Types (this does not apply to EntitySQL queries as they aren't compiled at the same time during execution as the CompiledQuery method)
Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way:
static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);
public IEnumerable<Dog> GetSomeDogs( int age, string name )
{
return query_GetDog(YourContext,age,name);
}
public void DataBindStuff()
{
IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud");
// but I want the dogs ordered by BirthDate
gridView.DataSource = dogs.OrderBy( it => it.BirthDate );
}
What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the Linq statement, which implements IEnumerable), it will invalidate the compiled query and be forced to re-parse. So, the rule of thumb is to return a List<> of objects instead.
static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);
public List<Dog> GetSomeDogs( int age, string name )
{
return query_GetDog(YourContext,age,name).ToList(); //<== change here
}
public void DataBindStuff()
{
List<Dog> dogs = GetSomeDogs(4,"Bud");
// but I want the dogs ordered by BirthDate
gridView.DataSource = dogs.OrderBy( it => it.BirthDate );
}
When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan.
Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it.
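To make the "deferred versus materialized" distinction concrete, here is a small sketch using plain LINQ to Objects. It does not touch EF or compiled queries at all; it only shows that a deferred query re-runs each time it is enumerated, while a ToList() snapshot does not. DeferredSketch and CountFilterCalls are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, LINQ to Objects only.
public static class DeferredSketch
{
    // Enumerates the same filtered sequence twice and returns
    // how many times the filter lambda was invoked in total.
    public static int CountFilterCalls(bool materialize)
    {
        int calls = 0;
        var ages = new List<int> { 2, 4, 7 };

        // Nothing runs yet: Where() only describes the query.
        IEnumerable<int> query = ages.Where(a => { calls++; return a > 3; });

        if (materialize)
            query = query.ToList(); // runs the filter once per element, right now

        _ = query.Count(); // first enumeration
        _ = query.Count(); // second enumeration
        return calls;
    }
}
```

Without ToList() the filter runs on every enumeration; with it, the work happens once and later operations act on the in-memory list.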
Performance
What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read: inheritance), I have seen upwards of 10 seconds.
So, the first time a pre-compiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same non-pre-compiled query. Practically the same as Linq2Sql.
When you load a page with pre-compiled queries the first time you will get a hit. It will load in maybe 5-15 seconds (obviously more than one pre-compiled queries will end up being called), while subsequent loads will take less than 300ms. Dramatic difference, and it is up to you to decide if it is ok for your first user to take a hit or you want a script to call your pages to force a compilation of the queries.
Can this query be cached?
{
Dog dog = (from d in YourContext.DogSet where d.ID == id select d).FirstOrDefault();
}
No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it.
Parametrized Queries
Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lambda expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which ones you want to use:
public struct MyParams
{
public string name;
public bool checkName;
public int age;
public bool checkAge;
}
static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
    CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
        from dog in ctx.DogSet
        // a criterion only restricts the result when its flag is set
        where (!myParams.checkAge || dog.Age == myParams.age)
           && (!myParams.checkName || dog.Name == myParams.name)
        select dog);
protected List<Dog> GetSomeDogs()
{
MyParams myParams = new MyParams();
myParams.name = "Bud";
myParams.checkName = true;
myParams.age = 0;
myParams.checkAge = false;
return query_GetDog(YourContext,myParams).ToList();
}
The advantage here is that you get all the benefits of a pre-compiled query. The disadvantages are that you will most likely end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query, and that each query you run is not as efficient as it could be (particularly with joins thrown in).
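The flag-per-criterion idea can be tried out without EF. The sketch below applies it to an in-memory list; Dog, DogFilter and FilterSketch are simplified, hypothetical stand-ins. The point is that a criterion whose flag is false must not restrict the result, so each clause takes the form (!flag || condition).

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Dog entity.
public class Dog { public string Name { get; set; } public int Age { get; set; } }

// One value plus one "use it?" flag per search criterion.
public struct DogFilter
{
    public string Name; public bool CheckName;
    public int Age; public bool CheckAge;
}

public static class FilterSketch
{
    public static List<Dog> Apply(IEnumerable<Dog> dogs, DogFilter f)
    {
        return dogs
            .Where(d => (!f.CheckAge || d.Age == f.Age)      // skipped when CheckAge is false
                     && (!f.CheckName || d.Name == f.Name))  // skipped when CheckName is false
            .ToList();
    }
}
```

With both flags false every dog is returned; with only CheckName set, the age value is ignored entirely.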
Another way is to build an EntitySQL query piece by piece, like we all did with SQL.
protected List<Dog> GetSomeDogs( string name, int age)
{
    string query = "select value dog from Entities.DogSet as dog where 1 = 1 ";
    if( !String.IsNullOrEmpty(name) )
        query = query + " and dog.Name == @Name ";
    if( age > 0 )
        query = query + " and dog.Age == @Age ";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext );
    if( !String.IsNullOrEmpty(name) )
        oQuery.Parameters.Add( new ObjectParameter( "Name", name ) );
    if( age > 0 )
        oQuery.Parameters.Add( new ObjectParameter( "Age", age ) );
    return oQuery.ToList();
}
Here the problems are:
- there is no syntax checking during compilation
- each different combination of parameters generates a different query, which will need to be pre-compiled when it is first run. In this case, there are only 4 different possible queries (no params, age-only, name-only and both params), but you can see that there can be way more with a real-world search.
- No one likes to concatenate strings!
Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query and then refine the results
protected List<Dog> GetSomeDogs( string name, int age, string city)
{
    string query = "select value dog from Entities.DogSet as dog where dog.Owner.Address.City == @City ";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext );
    oQuery.Parameters.Add( new ObjectParameter( "City", city ) );
    // Run the query once, then refine the results in memory
    IEnumerable<Dog> dogs = oQuery.ToList();
    if( !String.IsNullOrEmpty(name) )
        dogs = dogs.Where( it => it.Name == name );
    if( age > 0 )
        dogs = dogs.Where( it => it.Age == age );
    return dogs.ToList();
}
It is particularly useful when you start displaying all the data then allow for filtering.
Problems:
- Could lead to serious data transfer if you are not careful about your subset.
- You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on the Dog.Owner.Name
So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem:
- Use lambda-based query building when you don't care about pre-compiling your queries.
- Use fully-defined pre-compiled Linq query when your object structure is not too complex.
- Use EntitySQL/string concatenation when the structure could be complex and when the number of different resulting queries is small (which means fewer pre-compilation hits).
- Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db).
Singleton access
The best way to deal with your context and entities across all your pages is to use the singleton pattern:
public sealed class YourContext
{
private const string instanceKey = "On3GoModelKey";
YourContext(){}
public static YourEntities Instance
{
get
{
HttpContext context = HttpContext.Current;
if( context == null )
return Nested.instance;
if (context.Items[instanceKey] == null)
{
YourEntities entity = new YourEntities();
context.Items[instanceKey] = entity;
}
return (YourEntities)context.Items[instanceKey];
}
}
class Nested
{
// Explicit static constructor to tell C# compiler
// not to mark type as beforefieldinit
static Nested()
{
}
internal static readonly YourEntities instance = new YourEntities();
}
}
NoTracking, is it worth it?
When executing a query, you can tell the framework to track the objects it will return or not. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? Created? Deleted?) and will also link objects together, when further queries are made from the database, which is what is of interest here.
For example, let's assume that the Dog with ID == 2 has an owner whose ID == 10.
Dog dog = (from d in YourContext.DogSet where d.ID == 2 select d).FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
Person owner = (from o in YourContext.PersonSet where o.ID == 10 select o).FirstOrDefault();
//dog.OwnerReference.IsLoaded == true;
If we were to do the same with no tracking, the result would be different.
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
(from d in YourContext.DogSet where d.ID == 2 select d);
oDogQuery.MergeOption = MergeOption.NoTracking;
Dog dog = oDogQuery.FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)
(from o in YourContext.PersonSet where o.ID == 10 select o);
oPersonQuery.MergeOption = MergeOption.NoTracking;
Person owner = oPersonQuery.FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
Tracking is very useful, and in a perfect world without performance issues it would always be on. But in this world it has a price in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for.
Is there any chance that the data you query with NoTracking will be used to make updates/inserts/deletes in the database? If so, don't use NoTracking, because associations are not tracked and exceptions will be thrown.
In a page where there are absolutely no updates to the database, you can use NoTracking.
Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix them, you risk having the framework try to Attach() a NoTracking object to a context where another copy of the same object already exists with tracking on. Basically, what I am saying is that
Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault();
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
(from dog in YourContext.DogSet where dog.ID == 2 select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
Dog dog2 = oDogQuery.FirstOrDefault();
dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that will say "Wait a minute, I do already have an object here with the same database key. Fail". And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful.
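The key clash can be pictured with a toy model. This is a minimal sketch and not EF code: `ToyContext` is a hypothetical stand-in for the context's table of tracked objects, and it only mimics the rule that two different instances with the same key cannot both be attached.

```csharp
using System;
using System.Collections.Generic;

public class Dog
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for a context's tracked-object table; not the EF implementation.
public class ToyContext
{
    private readonly Dictionary<int, Dog> tracked = new Dictionary<int, Dog>();

    public void Attach(Dog dog)
    {
        Dog existing;
        // A second, different instance with the same key is rejected.
        if (tracked.TryGetValue(dog.ID, out existing) && !ReferenceEquals(existing, dog))
            throw new InvalidOperationException(
                "An object with key " + dog.ID + " is already tracked.");
        tracked[dog.ID] = dog;
    }
}

public static class AttachDemo
{
    public static bool SecondAttachFails()
    {
        var context = new ToyContext();
        var dog1 = new Dog { ID = 2, Name = "Rex" }; // plays the tracked copy
        var dog2 = new Dog { ID = 2, Name = "Rex" }; // plays the NoTracking copy
        context.Attach(dog1);
        try
        {
            context.Attach(dog2);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SecondAttachFails()); // prints True
    }
}
```

The two `Dog` instances carry the same data, but the toy context treats them as distinct objects competing for one key, which is the situation described above.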
How much faster is it with NoTracking?
It depends on the queries. Some are much more susceptible to tracking than others. I don't have a hard-and-fast rule for it, but it helps.
So I should use NoTracking everywhere then?
Not exactly. There are some advantages to tracking objects. The first one is that the object is cached, so subsequent calls for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntities object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanisms could extend that.)
What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible of course, but it does happen.
Also remember the piece above about having the associations connected automatically for you? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have a link between them:
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
List<Dog> dogs = oDogQuery.ToList();
ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o);
oPersonQuery.MergeOption = MergeOption.NoTracking;
List<Person> owners = oPersonQuery.ToList();
In this case, no dog will have its .Owner property set.
Some things to keep in mind when you are trying to optimize the performance.
No lazy loading, what am I to do?
This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF.
Of course, you can call
if( !ObjectReference.IsLoaded ) ObjectReference.Load();
if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense.
Let's say you have your Dog object
public class Dog
{
public Dog Get(int id)
{
return YourContext.DogSet.FirstOrDefault(it => it.ID == id );
}
}
This is the type of function you work with all the time. It gets called from all over the place, and once you have that Dog object you will do very different things to it in different functions. First, it should be pre-compiled, because you will call it very often. Second, each page will want access to a different subset of the Dog data. Some will want the Owner, some the FavoriteToy, etc.
Of course, you could call Load() for each reference you need anytime you need one. But that will generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first requests the Dog object:
static public Dog Get(int id) { return Get(id, ""); }
static public Dog Get(int id, string includePath)
{
string query = "select value o " +
" from YourEntities.DogSet as o " +
Please do not use all of the above info such as "Singleton access". You absolutely 100% should not be storing this context to be reused as it is not thread safe.
While informative, I think it would be more helpful to show how all this fits into a complete solution architecture. For example, a solution that uses both EF inheritance and your alternative would show their performance difference.

Resources