Extending SELECT projection - azure-cosmosdb

I want to extend the documents that I receive from a SELECT clause.
Let's assume I have a collection that stores documents in the following shape
{"foo": "yeah I am a foo", "bar": "And I am a bar"}
so that the query
SELECT * FROM f
would return the above document(s)
Now I want to add, as part of the projection of the SELECT statement, an additional property that is NOT part of the stored documents.
Basically I'd like to do something like using JavaScript's spread operator (which is not possible in Cosmos DB)
SELECT {...*, "newprop": "oh! I am new here!"} FROM f
and which should then return document(s) like this
{"foo": "yeah I am a foo", "bar": "And I am a bar", "newprop": "oh! I am new here!"}
The one thing I DON'T WANT TO DO is repeat all the top-level properties of my documents. So a solution in the form of
SELECT {"foo": f.foo, "bar":f.bar, "newprop": "oh! I am new here!"} FROM f
is not desired.
I also tried to get this done via a function, but I was not able to, as I can't find out how to get a handle to the top-level object / document within the SELECT clause.
I tried the following
SELECT udf.ExtendDocument(*) FROM f
SELECT udf.ExtendDocument($1) FROM f
SELECT udf.ExtendDocument(f) FROM f
SELECT udf.ExtendDocument(value) FROM f
most of which produced a syntax error

It's not possible to use SELECT *, then append columns to the projection.
One option you could explore is to add a static property and value to the class that you deserialize your data into.
For instance, you could create a class like this simple one for a person, with an extra property that has a default value. When you deserialize your query results into it, the extra property is filled in with that default.
class Person
{
[JsonProperty(PropertyName = "id")]
public string Id { get; set; }
[JsonProperty(PropertyName = "pk")]
public string Pk { get; set; }
[JsonProperty(PropertyName = "firstName")]
public string FirstName { get; set; }
[JsonProperty(PropertyName = "lastName")]
public string LastName { get; set; }
public string MyStaticColumn { get; set; } = "Default Value";
}
Then the code to run the query...
public static async Task QueryPerson(Container container)
{
QueryDefinition query = new QueryDefinition("select * from c");
FeedIterator<Person> resultSet = container.GetItemQueryIterator<Person>(
query, requestOptions: new QueryRequestOptions()
{
MaxConcurrency = -1
});
List<Person> results = new List<Person>();
while (resultSet.HasMoreResults)
{
FeedResponse<Person> response = await resultSet.ReadNextAsync();
foreach(var p in response)
{
results.Add(p);
}
}
}

So I found a solution.
A) Build a user defined function that does the "Extension"
function extendProjection(x) {
var result = {
//usually one wants to extend the returned doc
//with some new calculated properties and not
//with a static value
newprop: calculateNewPropFromObject(x)
}
return Object.assign(result, x)
}
B) Use the user defined function in your SELECT
SELECT VALUE udf.extendProjection(c) FROM c
//it is important to use the keyword "VALUE", otherwise
//the resulting doc will look like {$1: { //the extended projection }}
Having described that, I would recommend against this approach:
Your RUs will easily triple. The reason seems to be the use of JS itself and not so much what the JS engine does.
It's not possible to reuse other registered UDFs within your JS code, so one has to copy code snippets.
"Extended properties" are not usable in your WHERE clause.
Runtime error messages returned from Cosmos DB are horrible to decipher.
The lack of any decent development environment is basically a no-go.
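The UDF from step A is plain JavaScript, so its logic can at least be tried out locally (for example with Node.js) before registering it. The sketch below is only an illustration: `calculateNewPropFromObject` is not defined in the answer, so a hypothetical stand-in is used, and the sample document is the one from the question. It also shows a consequence of `Object.assign(result, x)`: a stored property with the same name overwrites the computed one.

```javascript
// Hypothetical helper: the answer does not define calculateNewPropFromObject,
// so this stand-in just derives a string from two stored properties.
function calculateNewPropFromObject(doc) {
  return doc.foo + " / " + doc.bar;
}

// Same shape as the UDF in step A: computed properties first,
// then the stored document is copied on top of them.
function extendProjection(x) {
  var result = {
    newprop: calculateNewPropFromObject(x)
  };
  return Object.assign(result, x);
}

// Sample document taken from the question.
var stored = { foo: "yeah I am a foo", bar: "And I am a bar" };
var extended = extendProjection(stored);
console.log(JSON.stringify(extended));

// A stored property named "newprop" wins over the computed value,
// because the stored document is assigned last.
var clash = extendProjection({ foo: "a", bar: "b", newprop: "stored value" });
console.log(clash.newprop);
```

If the computed value should take precedence instead, the argument order could be swapped to `Object.assign({}, x, result)`.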

As @mark-brown already answered, it does not seem to be possible.
I would just like to add that likely you shouldn't do that anyway and offer a workaround arguably better than the UDF (which is costly, hard-to-maintain, does not support multiple concurrent logic versions, etc).
If you want to add extra calculations to the query output based on the same entire document, then it would make more sense to do it in the business layer (after querying), not the data layer (Cosmos DB queries). It would also be faster (fewer calculations, less data to move) and cheaper (fewer RUs).
If you want to add static data (e.g. a fixed string or other constants), then the same argument applies: passing it back and forth to Cosmos DB just makes things slower and costlier. That's not the responsibility of storage.
The workaround
If the goal is to query an entire CHILD object and add only a few selected properties from other areas of the documents, then it's best not to try to flatten the object. Just keep your storage model objects and extras side by side, e.g.:
select c.childWithAllTheFutureChildren,
c.other.location.single.value as newProp
from c
If you really-really want to add some calculation/statics to query output then you could also still use the same pattern for entire document:
SELECT c as TheRealStoredThing,
'oh! I am new here!' as theNewProp
FROM c
Yes, it does require you to have a separate model on client side for this query, but that's a good clean practice anyway. And it's much simpler than using/maintaining UDFs.
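If a flat shape is still needed somewhere downstream, the side-by-side result can be merged on the client after the query. A minimal sketch in JavaScript, assuming rows shaped like the output of the last query above; the `rows` array is made-up sample data, not real query output, and `flattenRow` is a hypothetical helper name.

```javascript
// Made-up sample of what the side-by-side query might return.
var rows = [
  {
    TheRealStoredThing: { foo: "yeah I am a foo", bar: "And I am a bar" },
    theNewProp: "oh! I am new here!"
  }
];

// Hypothetical helper: merge the stored document and the extra property
// into one flat object, i.e. the shape the question originally asked for.
function flattenRow(row) {
  return Object.assign({}, row.TheRealStoredThing, { newprop: row.theNewProp });
}

var flattened = rows.map(flattenRow);
console.log(JSON.stringify(flattened[0]));
```

This keeps the merge in client code, where it costs no RUs and can be unit tested like any other function.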

Related

C# database access, Dapper, SQL and POCOs - programming design

Let's say we have a table in SQL represented in C# like this:
public class Product
{
public int ID { get; set; }
public string Name { get; set; }
public string Picture { get; set; } // filename of the picture, e.g. apple.jpg
public int CategoryID { get; set; }
}
Now we would query the database and retrieve the object, let's say with values like this:
ID = 1
Name = Yellow apple
Picture = apple.jpg
CategoryID = 25
All perfectly normal. The thing I'm pondering at the moment is this: if I want to show a product, I need some additional info that wasn't queried from the database, like the exact file path to the image. All we have is apple.jpg, but we need something like ~/images/apple.jpg
So, I was thinking of 3 possibilities:
1.) add a new property to the class Product
public string PictureUrl
{
get
{
return "~/images/apple.jpg";
}
}
2.) specify the full url during performing of the presentation logic, let's say:
public void ShowProductDetails()
{
Product p = ProductRepo.GetProduct(id);
txtName.Text = p.Name;
imgPicture.ImageUrl = "~/images/" + p.Picture;
}
3.) use Decorator pattern
First approach seems wrong to me (even though I have been using it for quite a long time), because I'm trying to have a layered web application. I'm not sure hard-coding this is a good way to go.
Second approach is better, but worse in the sense it can't be easily reused. If I have multiple places where I'm doing the same thing and something changes, ... Maybe it would work if I specify some static constants holding the paths...
Third possibility seems quite complicated in terms of maintainability. The number of my classes would probably have to double. If I have 30 classes now, it would suddenly become 60 :/
What is the best/recommended way of doing things like this? If I add properties to my POCOs that aren't included in the db schema, I'm unable to use Dapper.Contrib or Rainbow and similar libraries, because even though "selects" work fine, I can't "insert" nor "delete". I have to hard-code the sql strings for every command which becomes really tedious after some time, when you're doing all the time the same stuff.
EDIT:
The solution from Govind KamalaPrakash Malviya is great, but can't be used every time. I need a way to solve this for any type of properties, even those more complex ones - for instance the number of photos of some album. It's a good idea to query the count of photos along with albums, but assign it to what? Create a decorated class using a Decorator pattern?
How do YOU solve this kind of architecture problems?
I think you should handle it in the presentation layer, because the image path matters only to the presentation layer. So use the third one, but make it easy with a utility method
public class PathUtility
{
public static string ImageUrl(string imageName)
{
if(string.IsNullOrEmpty(imageName))
{
throw new Exception("Image name not valid!!");
}
else
{
return "YourImageDirectoryUrl" + imageName;
}
}
}
and use it easily
PathUtility.ImageUrl("apple.jpg");
I normally solve this by leaving the entity object as it is and creating an extra data container, which will either hold a reference to the corresponding entity or implement the corresponding properties from the entity object itself. In the latter case I use a mapping library (AutoMapper) to copy data from an entity to the enhanced container.
The logic for filling the extra properties normally lies in a factory (or factory method). It's up to you, where you want to place this in your architecture. In a current project we are including them in our data access facade on client side, because we don't want to clutter the data access layer with too many DTO's. This of course means, that the data access layer still needs to support retrieving the extra properties. In your case an operation like int GetNumberOfPhotosForAlbum(Album album).
We found that the benefits outweigh the risk of an ever-growing contract of the data access layer, which of course might need to support many different calls like the example above instead of just EnhancedAlbum GetEnhancedAlbumWithAllKindsOfExtraProperties(long albumId). This might also become a performance problem in some scenarios, because of the overhead of an increased frequency of service calls. In the end you need to decide, what's best for your project.
I like this approach, because my entities (Album) stay untouched and I retain a clear separation of concerns between persistence, client logic and mapping.
Example:
class Album
{
string Name { get; set; }
}
class EnhancedAlbum
{
Album Album { get; set; }
int NumberOfPhotos { get; set; }
}
class EnhancedAlbumFactory
{
private MyDataService _dataService;
//include some means of constructing or (better) injecting the data service
EnhancedAlbum GetEnhancedAlbum(Album album)
{
return new EnhancedAlbum
{
Album = album,
NumberOfPhotos = _dataService.GetNumberOfPhotosForAlbum(album)
};
}
}

asp.net webservice object manipulation

Possibly not specific to webservices, but...
I have a webmethod that returns:
List<Tadpole> myList = getList();
return new { data = myList , count = 5 };
It returns this as JSON.
My code checks myList[x].fishsticks, which isn't actually part of the Tadpole class (so it errors). I am wondering, can I add a fishsticks attribute to myList somehow to avoid the error, so it gets included when I return the data?
Is there perhaps another elegant solution for doing this?
In your example, you'll have to add a fishsticks property to Tadpole.
public class Tadpole
{
//....
public int Fishsticks { get; set; }
}
Also, why are you adding a .Count property to your JSON type? Wouldn't it make more sense to just use .data.Count, or just return the list and skip the wrapper entirely?
I haven't checked what properties of List<> get serialized lately, so it's possible that it's not included, but even if that's the case it would make more sense to do this:
List<Tadpole> myList = getList();
return new { data = myList , count = myList.Count };
Or, create a descendant class that overrides .Count and adds a serialization attribute.
Edit
If I remember correctly, anonymous/dynamic types are internally implemented as dictionaries, while classes are, well, not. (BTW, anonymous types and dynamic objects bring a host of performance and maintenance issues along with them.)
If you don't want to modify Tadpole for some reason, you could always create a descendant class:
public class HungryTadpole : Tadpole
{
public int FishSticks { get; set; }
}
Strong typing is your friend and will save you many headaches down the road.

ASP.NET which type of collection should I use?

I am writing a class to save searches on my site. I want to have in the class an "Array" of all the parameters that were specified. I tried a NameValueCollection but the problem I ran into is when I have a multi-select (e.g. states) it only stores one of the entries because the key gets taken. I need a collection type that will let me have something like the following:
Name => Bob
State => Alaska
State => Oregon
State => Washington
Company => Acme
What type of collection should I use?
EDIT: ==============================
I'm not sure the comments so far will help. Let me explain a little further. This search class will be used to save the parameters for any search on my site. Different searches may or may not have the same parameters. When this class's save method is called, the search will be dumped into a database. One record will be created in the Searches table, and as many records as there are items in the collection will be created in the SearchesParameters table. The SearchesParameters table has these columns (ID, searches_ID, key, value).
The database couldn't care less if there are two parameters with a key of "State". In order to keep my class generic enough to use on all searches without having to be updated, I want to have a collection/array that will let me have key/value pairs and also let me have multiple instances of the same key. Really I just want to be able to call searchObj.addParameter(KEY,VALUE); How the class handles that on the back end is mostly irrelevant, so long as I can reliably get the correct keys paired up with the correct values.
Is a collection the way to go with this or should I be considering something like two arrays one storing the keys and one storing the values?
A Dictionary that maps a string to a List<string>. Something like Dictionary<string, List<string>>.
If an element isn't there in the Dictionary, create a new List for the Key and add to it. Otherwise, simply add the new Value to the existing List.
Create a class, and store that class in a collection.
class Search
{
public string Name { get; set; }
public List<string> State { get; set; }
public string Company { get; set; }
}
Then you can have multiple states per search. Add instances of this to a List and away you go.
what about a generic list (System.Collections.Generic)?
e.g.,
string name;
List<string> states;
string company;
You can read up about generic lists here
You should use a List<KeyValuePair<string, string>>
I would use Dictionary<string, HashSet<string>>.

ASP.NET - Storing SQL Queries in Global Resource File?

Is it a good idea to store my SQL queries in a global resource file instead of having it in my codebehind? I know stored procedures would be a better solution but I don't have that luxury on this project.
I don't want queries all over my pages and thought a central repository would be a better idea.
Resource files are usually used for localization. But a string is just a string is just a string, and do you really want to be sending any old string in a resource file to your database?
I completely agree with others that you should be using linq or typed datasets, etc. Personally I've only had to resort to text queries a handful of times over the years, and when I do it's usually something like the following:
You set up a small framework and then all you need to do is maintain an XML file. A single specific XML file is a lot easier to manage and deploy than a resource DLL. You also have a well-known place (repository) that stores SQL queries and some metadata about them, versus just some naming convention.
Never underestimate the utility of a (simple) class over a string literal. Once you've started using the class you can then add things down the road that you can't (easily) do with just a simple string.
Notepad compiler, so apologies if this isn't 100%. It's just a sketch of how everything interacts.
public static class SqlResource
{
private static Dictionary<string,SqlQuery> dictionary;
public static void Initialize(string file)
{
List<SqlQuery> list;
// deserialize the xml file
using (StreamReader streamReader = new StreamReader(file))
{
XmlSerializer deserializer = new XmlSerializer(typeof(List<SqlQuery>));
list = (List<SqlQuery>)deserializer.Deserialize(streamReader);
}
dictionary = new Dictionary<string,SqlQuery>();
foreach(var item in list )
{
dictionary.Add(item.Name,item);
}
}
public static SqlQuery GetQueryByName(string name)
{
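SqlQuery query;
// The indexer would throw KeyNotFoundException for an unknown name,
// so use TryGetValue to reach the intended ArgumentException.
if( !dictionary.TryGetValue(name, out query) )
throw new ArgumentException("The query '" + name + "' is not valid.");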
if( query.IsObsolete )
{
// TODO - log this.
}
return query;
}
}
public sealed class SqlQuery
{
[XmlAttributeAttribute("name")]
public string Name { get; set; }
[XmlElement("Sql")]
public string Sql { get; set; }
[XmlAttributeAttribute("obsolete")]
public bool IsObsolete { get; set; }
[XmlIgnore]
public TimeSpan Timeout { get; set;}
/// <summary>
/// Serialization only - XmlSerializer can't serialize normally
/// </summary>
[XmlAttribute("timeout")]
public string Timeout_String
{
get { return Timeout.ToString(); }
set { Timeout = TimeSpan.Parse(value); }
}
}
your xml file might look like
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfSqlQuery xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SqlQuery name="EmployeeByEmployeeID" timeout="00:00:30" >
<Sql>
SELECT * From Employee WHERE EmployeeID = @T0
</Sql>
</SqlQuery>
<SqlQuery name="EmployeesForManager" timeout="00:05:00" obsolete="true" >
<Sql>
SELECT * From Employee WHERE ManagerID = @T0
</Sql>
</SqlQuery>
</ArrayOfSqlQuery>
Ok, I'll try to answer again, now that I have more information.
I would make a query class that holds all query strings as shared properties or functions, named well enough to be easy to use.
I would look into strongly typed datasets with table adapters and let the table adapters handle all queries. Once you are used to it, you'll never go back.
Just add a dataset to your solution, add a connection and a table adapter for a table, then start building all queries (update, select, delete, search and so on) and handle them easily in the code-behind.
I am in the same situation, with some developers preferring to write the queries in the resource file. We are using SubSonic and I would prefer to use stored procedures rather than direct queries.
One option, even though it is bad, is to place those queries in a config file and read them when needed. This is a very bad option, but we may use it if everyone cannot agree on using stored procedures.
You could use the XML config file to associate names with stored procedures too. I'm doing that for a current C# project. The "query" would define what procedure to call.
Since some database engines don't support stored queries, that's not always an option.
Sometimes for small projects, it's OK to use parameterized SQL queries (don't concatenate strings). This is especially true for select statements.
Views can also be used for selects instead of stored procedures.
Rob

What are good design practices when working with Entity Framework

This will apply mostly to an ASP.NET application where the data is not accessed via SOA, meaning that you get access to the objects loaded from the framework, not transfer objects, although some recommendations still apply.
This is a community post, so please add to it as you see fit.
Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1.
Why pick EF in the first place?
Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF and life is not bad.
EF and inheritance
The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table per hierarchy. The modeling is easy and there are no programming issues with that part.
(The following applies to table per class model as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you are trying to run queries that include one or many objects that are part of an inheritance tree: the generated sql is incredibly awful, takes a long time to get parsed by the EF and takes a long time to execute as well. This is a real show stopper. Enough that EF should probably not be used with inheritance or as little as possible.
Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then when you are trying to return some Associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once.
Ok, this is bad. The EF concept is to allow you to create your object structure without (or with as little as possible) consideration for the actual database implementation of your tables. It completely fails at this.
So, recommendations? Avoid inheritance if you can; the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified SQL-generation tool for querying, but there are still advantages to using it, and ways to implement mechanisms that are similar to inheritance.
Bypassing inheritance with Interfaces
The first thing to know when trying to get some kind of inheritance going with EF is that you cannot assign a non-EF-modeled class as a base class. Don't even try it; it will get overwritten by the modeler. So what to do?
You can use interfaces to enforce that classes implement some functionality. For example, here is an IEntity interface that allows you to define associations between EF entities where you don't know at design time what the type of the entity will be.
public enum EntityTypes{ Unknown = -1, Dog = 0, Cat }
public interface IEntity
{
int EntityID { get; }
string Name { get; }
EntityTypes EntityType { get; }
}
public partial class Dog : IEntity
{
// implement EntityID and Name which could actually be fields
// from your EF model
public EntityTypes EntityType { get { return EntityTypes.Dog; } }
}
Using this IEntity, you can then work with undefined associations in other classes
// lets take a class that you defined in your model.
// that class has a mapping to the columns: PetID, PetType
public partial class Person
{
public IEntity GetPet()
{
return IEntityController.Get(PetID,PetType);
}
}
which makes use of some extension functions:
public class IEntityController
{
static public IEntity Get(int id, EntityTypes type)
{
switch (type)
{
case EntityTypes.Dog: return Dog.Get(id);
case EntityTypes.Cat: return Cat.Get(id);
default: throw new Exception("Invalid EntityType");
}
}
}
Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back.
It also cannot model one-to-many or many-to-many relationships, but with creative uses of 'Union' it could be made to work. Finally, it creates the side effect of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regard.
Compiled Queries
Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql.
What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries.
There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using the CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached.)
An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained [here][1].
Ok, so here is how to do it using ObjectQuery<>
string query = "select value dog " +
"from Entities.DogSet as dog " +
"where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance);
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option.
The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate:
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
ctx.DogSet.FirstOrDefault(it => it.ID == id));
or using linq
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
(from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault());
to call the query:
query_GetDog.Invoke( YourContext, id );
The advantage of CompiledQuery is that the syntax of your query is checked at compile time, whereas EntitySQL is not. However, there are other considerations...
Includes
Lets say you want to have the data for the dog owner to be returned by the query to avoid making 2 calls to the database. Easy to do, right?
EntitySQL
string query = "select value dog " +
"from Entities.DogSet as dog " +
"where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance).Include("Owner");
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
CompiledQuery
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
(from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault());
Now, what if you want to have the Include parametrized? What I mean is that you want to have a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavoriteToy and so on. Basically, you want to tell the query which associations to load.
It is easy to do with EntitySQL
public Dog Get(int id, string include)
{
string query = "select value dog " +
"from Entities.DogSet as dog " +
"where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)
.IncludeMany(include);
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
}
The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (that accepts only a single path) with an IncludeMany(string) that will let you pass a string of comma-separated associations to load. Look further in the extension section for this function.
If we try to do it with CompiledQuery however, we run into numerous problems:
The obvious
static readonly Func<Entities, int, string, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
(from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault());
will choke when called with:
query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" );
Because, as mentioned above, Include() only wants to see a single path in the string and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!).
Then, let's use IncludeMany(), which is an extension function
static readonly Func<Entities, int, string, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
(from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault());
Wrong again. This time it is because the EF cannot parse IncludeMany, as it is not part of the functions that it recognizes: it is an extension.
Ok, so you want to pass an arbitrary number of paths to your function and Include() only takes a single one. What to do? You could decide that you will never ever need more than, say, 20 Includes, and pass each separate string in a struct to CompiledQuery. But now the query looks like this:
from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3)
.Include(include4).Include(include5).Include(include6)
.[...].Include(include19).Include(include20) where dog.ID == id select dog
which is awful as well. Ok, then, but wait a minute. Can't we return an ObjectQuery<> with CompiledQuery? Then set the includes on that? Well, that's what I would have thought as well:
static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, ObjectQuery<Dog>>((ctx, id) =>
(ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog));
public Dog GetDog( int id, string include )
{
ObjectQuery<Dog> oQuery = query_GetDog(YourContext, id);
oQuery = oQuery.IncludeMany(include);
return oQuery.FirstOrDefault();
}
That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query because it is an entirely new one now! So, the expression tree needs to be reparsed and you get that performance hit again.
So what is the solution? You simply cannot use CompiledQueries with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQueries. They are great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used because the syntax is checked at compile time, but due to its limitations, that's not possible.
An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required.
Passing more than 3 parameters to a CompiledQuery
Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. So that leaves you with 3 parameters. A pittance, but it can be improved on very easily.
public struct MyParams
{
public string param1;
public int param2;
public DateTime param3;
}
static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
from dog in ctx.DogSet where dog.Age == myParams.param2 && dog.Name == myParams.param1 && dog.BirthDate > myParams.param3 select dog);
public List<Dog> GetSomeDogs( int age, string name, DateTime birthDate )
{
MyParams myParams = new MyParams();
myParams.param1 = name;
myParams.param2 = age;
myParams.param3 = birthDate;
return query_GetDog(YourContext,myParams).ToList();
}
Return Types (this does not apply to EntitySQL queries as they aren't compiled at the same time during execution as the CompiledQuery method)
Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way:
static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);
public IEnumerable<Dog> GetSomeDogs( int age, string name )
{
return query_GetDog(YourContext,age,name);
}
public void DataBindStuff()
{
IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud");
// but I want the dogs ordered by BirthDate
gridView.DataSource = dogs.OrderBy( it => it.BirthDate );
}
What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the Linq statement, which implements IEnumerable), you invalidate the compiled query and force it to be re-parsed. So, the rule of thumb is to return a List<> of objects instead.
static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);
public List<Dog> GetSomeDogs( int age, string name )
{
return query_GetDog(YourContext,age,name).ToList(); //<== change here
}
public void DataBindStuff()
{
List<Dog> dogs = GetSomeDogs(4,"Bud");
// but I want the dogs ordered by BirthDate
gridView.DataSource = dogs.OrderBy( it => it.BirthDate );
}
When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan.
Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it.
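The difference between handing back a deferred sequence and handing back a List<> can be sketched without Entity Framework at all. The snippet below is only an illustration with plain LINQ to Objects: the Dog class, the in-memory list and the PredicateCalls counter are assumptions made for the example, and the counter merely stands in for "work the query has to redo". It does not reproduce the compiled-query invalidation itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Dog entity used in this answer.
public class Dog
{
    public string Name { get; set; }
    public int Age { get; set; }
    public DateTime BirthDate { get; set; }
}

public static class DeferredExecutionSketch
{
    // Counts how often the filter runs, as a stand-in for query work.
    public static int PredicateCalls;

    // In-memory data replacing the database for this illustration.
    private static readonly List<Dog> Dogs = new List<Dog>
    {
        new Dog { Name = "Bud", Age = 4, BirthDate = new DateTime(2005, 6, 1) },
        new Dog { Name = "Bud", Age = 4, BirthDate = new DateTime(2005, 1, 15) },
        new Dog { Name = "Rex", Age = 7, BirthDate = new DateTime(2002, 3, 9) },
    };

    // Deferred: nothing runs until the caller enumerates the result,
    // and the filter runs again on every enumeration.
    public static IEnumerable<Dog> GetSomeDogsDeferred(int age, string name)
    {
        return Dogs.Where(d =>
        {
            PredicateCalls++;
            return d.Age == age && d.Name == name;
        });
    }

    // Materialized: the filter runs exactly once, inside this method.
    public static List<Dog> GetSomeDogs(int age, string name)
    {
        return GetSomeDogsDeferred(age, name).ToList();
    }
}
```

Calling GetSomeDogsDeferred(4, "Bud") does no work by itself; every later OrderBy(...).ToList() or Count() on the result runs the filter again. GetSomeDogs(4, "Bud") runs it once, and any further ordering happens on the objects already in memory.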
Performance
What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read: inheritance), I have seen upwards of 10 seconds.
So, the first time a pre-compiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same non-pre-compiled query. Practically the same as Linq2Sql.
When you load a page with pre-compiled queries the first time, you will get a hit. It will load in maybe 5-15 seconds (obviously more than one pre-compiled query will end up being called), while subsequent loads will take less than 300ms. It is a dramatic difference, and it is up to you to decide whether it is OK for your first user to take the hit or whether you want a script to call your pages to force a compilation of the queries.
Can this query be cached?
{
Dog dog = (from d in YourContext.DogSet where d.ID == id select d).FirstOrDefault();
}
No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it.
Parametrized Queries
Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lambda expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which ones you want to use:
public struct MyParams
{
public string name;
public bool checkName;
public int age;
public bool checkAge;
}
static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
from dog in ctx.DogSet
where (!myParams.checkAge || dog.Age == myParams.age)
&& (!myParams.checkName || dog.Name == myParams.name)
select dog);
protected List<Dog> GetSomeDogs()
{
MyParams myParams = new MyParams();
myParams.name = "Bud";
myParams.checkName = true;
myParams.age = 0;
myParams.checkAge = false;
return query_GetDog(YourContext,myParams).ToList();
}
The advantage here is that you get all the benefits of a pre-compiled query. The disadvantages are that you will most likely end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query, and that each query you run is not as efficient as it could be (particularly with joins thrown in).
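For an optional criterion to be skipped when its flag is off, each condition has to read "either this filter is disabled, or the value matches". A minimal sketch of that predicate shape, using plain LINQ to Objects rather than a CompiledQuery (the Dog class, the DogSearch struct and the Search helper are hypothetical names made up for the example):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Dog entity used in this answer.
public class Dog
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical criteria holder: one value plus one "use it" flag per filter.
public struct DogSearch
{
    public string Name;
    public bool CheckName;
    public int Age;
    public bool CheckAge;
}

public static class OptionalFilterSketch
{
    // A criterion whose flag is false is skipped: (!flag || match).
    public static List<Dog> Search(IEnumerable<Dog> dogs, DogSearch s)
    {
        return dogs
            .Where(d => (!s.CheckAge || d.Age == s.Age)
                     && (!s.CheckName || d.Name == s.Name))
            .ToList();
    }
}
```

With CheckName set and CheckAge cleared, only the name is compared; with both flags cleared, every dog is returned. The same single predicate serves all combinations, which is what allows it to be compiled once.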
Another way is to build an EntitySQL query piece by piece, like we all did with SQL.
protected List<Dog> GetSomeDogs( string name, int age)
{
string query = "select value dog from Entities.DogSet as dog where 1 = 1 ";
if( !String.IsNullOrEmpty(name) )
query = query + " and dog.Name == @Name ";
if( age > 0 )
query = query + " and dog.Age == @Age ";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext );
if( !String.IsNullOrEmpty(name) )
oQuery.Parameters.Add( new ObjectParameter( "Name", name ) );
if( age > 0 )
oQuery.Parameters.Add( new ObjectParameter( "Age", age ) );
return oQuery.ToList();
}
Here the problems are:
- there is no syntax checking during compilation
- each different combination of parameters generates a different query, which will need to be pre-compiled when it is first run. In this case, there are only 4 different possible queries (no params, age-only, name-only and both params), but you can see that there can be way more with a real-world search.
- No one likes to concatenate strings!
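One way to make the concatenation a little less error-prone is to add each clause and its parameter in the same branch, so the two cannot drift apart. The sketch below is only an illustration: BuiltQuery and DogQueryBuilder are hypothetical helpers, not part of Entity Framework, and they only produce the query text and a name/value map. In real code each entry would become a new ObjectParameter(key, value) on the ObjectQuery.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical container for the generated text and its parameters.
public sealed class BuiltQuery
{
    public string Text;
    public Dictionary<string, object> Parameters = new Dictionary<string, object>();
}

public static class DogQueryBuilder
{
    // Builds the Entity SQL text piece by piece; each optional clause
    // registers its parameter in the same branch that appends it.
    public static BuiltQuery Build(string name, int age)
    {
        var text = new StringBuilder(
            "select value dog from Entities.DogSet as dog where 1 = 1");
        var built = new BuiltQuery();

        if (!string.IsNullOrEmpty(name))
        {
            text.Append(" and dog.Name = @Name");
            built.Parameters["Name"] = name;
        }
        if (age > 0)
        {
            text.Append(" and dog.Age = @Age");
            built.Parameters["Age"] = age;
        }

        built.Text = text.ToString();
        return built;
    }
}
```

Calling Build with the four combinations (no criteria, name only, age only, both) yields four distinct query texts, which is exactly why each combination pays its own first-run cost.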
Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot, but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query, and then refine the results:
protected List<Dog> GetSomeDogs( string name, int age, string city)
{
string query = "select value dog from Entities.DogSet as dog where dog.Owner.Address.City == @City ";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext );
oQuery.Parameters.Add( new ObjectParameter( "City", city ) );
// Execute the query once, then refine the results in memory
IEnumerable<Dog> dogs = oQuery.ToList();
if( !String.IsNullOrEmpty(name) )
dogs = dogs.Where( it => it.Name == name );
if( age > 0 )
dogs = dogs.Where( it => it.Age == age );
return dogs.ToList();
}
It is particularly useful when you start displaying all the data then allow for filtering.
Problems:
- Could lead to serious data transfer if you are not careful about your subset.
- You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on Dog.Owner.Name.
So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem:
- Use lambda-based query building when you don't care about pre-compiling your queries.
- Use fully-defined pre-compiled Linq query when your object structure is not too complex.
- Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries is small (which means fewer pre-compilation hits).
- Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db).
Singleton access
The best way to deal with your context and entities across all your pages is to use the singleton pattern:
public sealed class YourContext
{
private const string instanceKey = "On3GoModelKey";
YourContext(){}
public static YourEntities Instance
{
get
{
HttpContext context = HttpContext.Current;
if( context == null )
return Nested.instance;
if (context.Items[instanceKey] == null)
{
YourEntities entity = new YourEntities();
context.Items[instanceKey] = entity;
}
return (YourEntities)context.Items[instanceKey];
}
}
class Nested
{
// Explicit static constructor to tell C# compiler
// not to mark type as beforefieldinit
static Nested()
{
}
internal static readonly YourEntities instance = new YourEntities();
}
}
NoTracking, is it worth it?
When executing a query, you can tell the framework to track the objects it will return or not. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? Created? Deleted?) and will also link objects together, when further queries are made from the database, which is what is of interest here.
For example, let's assume that the Dog with ID == 2 has an owner whose ID == 10.
Dog dog = (from d in YourContext.DogSet where d.ID == 2 select d).FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
Person owner = (from o in YourContext.PersonSet where o.ID == 10 select o).FirstOrDefault();
//dog.OwnerReference.IsLoaded == true;
If we were to do the same with no tracking, the result would be different.
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
(from dog in YourContext.DogSet where dog.ID == 2 select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
Dog dog = oDogQuery.FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)
(from o in YourContext.PersonSet where o.ID == 10 select o);
oPersonQuery.MergeOption = MergeOption.NoTracking;
Person owner = oPersonQuery.FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
Tracking is very useful and, in a perfect world without performance issues, it would always be on. But in this world, there is a price for it in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for.
Is there any chance that the data you query with NoTracking will be used to make updates/inserts/deletes in the database? If so, don't use NoTracking, because associations are not tracked and that will cause exceptions to be thrown.
In a page where there are absolutely no updates to the database, you can use NoTracking.
Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix them, you risk having the framework try to Attach() a NoTracking object to the context where another copy of the same object exists with tracking on. Basically, what I am saying is that in
Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault();
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
(from dog in YourContext.DogSet where dog.ID == 2 select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
Dog dog2 = oDogQuery.FirstOrDefault();
dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that will say "Wait a minute, I do already have an object here with the same database key. Fail". And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful.
How much faster is it with NoTracking?
It depends on the queries. Some are much more susceptible to tracking than others. I don't have a fast and easy rule for it, but it helps.
So I should use NoTracking everywhere then?
Not exactly. There are some advantages to tracking objects. The first one is that the object is cached, so subsequent calls for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntities object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanisms could extend that.)
What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible of course, but it does happen.
Also remember the piece above about having the associations connected automatically for you? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have a link between them:
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
List<Dog> dogs = oDogQuery.ToList();
ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o);
oPersonQuery.MergeOption = MergeOption.NoTracking;
List<Person> owners = oPersonQuery.ToList();
In this case, no dog will have its .Owner property set.
Some things to keep in mind when you are trying to optimize the performance.
No lazy loading, what am I to do?
This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF.
Of course, you can call
if( !ObjectReference.IsLoaded ) ObjectReference.Load();
if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense.
Let's say you have your Dog object
public class Dog
{
public Dog Get(int id)
{
return YourContext.DogSet.FirstOrDefault(it => it.ID == id );
}
}
This is the type of function you work with all the time. It gets called from all over the place, and once you have that Dog object, you will do very different things to it in different functions. First, it should be pre-compiled, because you will call it very often. Second, each page will want to have access to a different subset of the Dog data. Some will want the Owner, some the FavoriteToy, etc.
Of course, you could call Load() for each reference you need anytime you need one. But that will generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first requests the Dog object:
static public Dog Get(int id) { return Get(id, ""); }
static public Dog Get(int id, string includePath)
{
string query = "select value o " +
" from YourEntities.DogSet as o " +
Please do not use all of the above info such as "Singleton access". You absolutely 100% should not be storing this context to be reused as it is not thread safe.
While this is informative, I think it would be more helpful to share how all of it fits into a complete solution architecture. For example, a solution that uses both EF inheritance and your alternative, so that it shows their performance difference.

Resources