I read Ayende post about Futures in NHibernate and started using it. At first only for isolated scenario - queries that return paged results (to both query and count items in single trip to database). But than I thought why not to use futures for every single query. In most of my action methods I need to make multiple queries, on average 5 per page (to get user profile, footer, categories, group information etc).
Obvious benefit is that overall this queries summed up should take shorter to run. But are there any drawbacks for such design of web application? Areas that make me curious are:
Caching in NHibernate - will this design prevent in any way of using caching features of NHibernate?
Maybe such single pack of queries will lock database tables and make other parallel actions wait? I use Read Committed isolation level.
Any other? What are your experiences with futures? Would you recommend using it in such a way, to every possible item?
Your concerns won't apply. Caching won't be broken and its pretty rare to lock tables just batching up read statements.
Futures rock and I wish every ORM I use had them.
Related
Does GSI Overloading provide any performance benefits, e.g. by allowing cached partition keys to be more efficiently routed? Or is it mostly about preventing you from running out of GSIs? Or maybe opening up other query patterns that might not be so immediately obvious.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
e.g. I you have a base table and you want to partition it so you can query a specific attribute (which becomes the PK of the GSI) over two dimensions, does it make any difference if you create 1 overloaded GSI, or 2 non-overloaded GSIs.
For an example of what I'm referring to see the attached image:
https://drive.google.com/file/d/1fsI50oUOFIx-CFp7zcYMij7KQc5hJGIa/view?usp=sharing
The base table has documents which can be in a published or draft state. Each document is owned by a single user. I want to be able to query by user to find:
Published documents by date
Draft documents by date
I'm asking in relation to the more recent DynamoDB best practice that implies that all applications only require one table. Some of the techniques being shown in this documentation show how a reasonably complex relational model can be squashed into 1 DynamoDB table and 2 GSIs and yet still support 10-15 query patterns.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-relational-modeling.html
I'm trying to understand why someone would go down this route as it seems incredibly complicated.
The idea – in a nutshell – is to not have the overhead of doing joins on the database layer or having to go back to the database to effectively try to do the join on the application layer. By having the data sliced already in the format that your application requires, all you really need to do is basically do one select * from table where x = y call which returns multiple entities in one call (in your example that could be Users and Documents). This means that it will be extremely efficient and scalable on the db level. But also means that you'll be less flexible as you need to know the access patterns in advance and model your data accordingly.
See Rick Houlihan's excellent talk on this https://www.youtube.com/watch?v=HaEPXoXVf2k for why you'd want to do this.
I don't think it has any performance benefits, at least none that's not called out – which makes sense since it's the same query and storage engine.
That being said, I think there are some practical reasons for why you'd want to go with a single table as it allows you to keep your infrastructure somewhat simple: you don't have to keep track of metrics and/or provisioning settings for separate tables.
My opinion would be cost of storage and provisioned throughput.
Apart from that not sure with new limit of 20
I have a legacy website that needs a little optimization because of poor performance. It is an asp.net shopping website with linq to sql as data layer and MVP pattern as UI pattern.
The most costly entities in the db are product and category tables that have a one to many relationship. These two entities might not change regularly unless a user of admin group decides to add a product or category… etc. i was wondering how resource costly would it be to create and fetch everything from these two entities for each request! so if i could have had a way to keep my data alive…
first I thought well let’s use AJAX for data retrievals so I will create only those entities that I need to query or bind to, but wait, how can I do that without creating a new DataContext instance?!!
At the other side, using cache for whole DataContext is considered a bad decision because of memory cost. So what would be the best option here? How can I improve things?
UPDATE
1) doing what #HatSoft suggested.
Cons: those approaches will not help your code, only the database. beside this, there might be memory issues since we're putting data in memory instead of rendered html, however this might be the best option regarding de-coupling.
2) using output caching we have this code in an http handler with *.aspx wildcard:
string pagePath = Context.Request.Url.AbsolutePath;
object cacheKey = application[pagePath];
if(cacheKey == null)
return; //application restarted/first run so cache the stuff
else
Context.Response.RemoveOutputCacheItem(pagePath);
Cons: now we should link the pagePath to each database entity that the page uses, but if i do so then i'm coupling things instead of de-coupling them. this approach also will run into a little hard coding.
3) another solution would be output caching in post-cache mode instead of control cache mode. using Subsituation element and setting the OutPutCache Duration to 86400 so the page will be re-created every 24 hours.
Cons: hard coding user controls to produce the html output for Subsituation element dynamically.
so what do you suggest?
I would suggest you look in to SqlDependency class please read this article http://www.asp.net/web-forms/tutorials/data-access/caching-data/using-sql-cache-dependencies-cs
Also I would suggest you look in to loading data in the cache at application startup if it suits your application. Please see a good example here http://www.asp.net/web-forms/tutorials/data-access/caching-data/caching-data-at-application-startup-cs
With Linq2SQL you can use LinqToCache which offers a SqlDependency powered cache for your LINQ queries. It transforms the IQueryable<Products> into IEnumerable<Products> and enumerates form memmory after first access (first iteration of the underlying IQueryable). Based on SqlDependency data change notifications it invalidates the list and subsequent access will query again from DB, and cache the result.
My recommendation would be to cache the Products list and Categories in memory, since they change seldom and I expect them to be of a fairly constrained size.
I'm creating an ASP.NET MVC app that uses EF to perform all DB tasks.
There's a couple of related tables in the database that never change and I was thinking on creating a static collection that retrieves the data from those two tables (it's a few hundred records) the first time it is requested and just stores it in an object to prevent hitting the database every time.
Since I've read several people saying that you should avoid static objects in ASP.NET I was wondering if this was a bad practice or if it is acceptable for scenarios like this (read-only and small amount of data which should prevent concurrency problems).
Also I would like to know if there are other better alternatives to do this.
Thanks.
I have done exactly what you are planning on doing for exactly the same reason. It has been working well for several years already.
Just make sure that you get the initialization of the data right and you should be fine. When initializing, keep in mind:
Don't use locking if at all possible (or your app will deadlock 2
minutes before you're going on vacation)
You MUST NOT under any circumstance let a static constructor fail
Make sure no consumer of your cache has the ability to modify it
If the data isn't really static and you would actually need to re-read it fairly often then this might not be the best solution.
Just in case you're wondering, I've used this approach to cache for instance country data, currency data (base data, not rates), sales unit data (pcs, m, kg etc). These are all stored in a database but almost never change.
It is not a very good approach to use static objects. I would use something like RavenDB which can be used to store your settings or DB data in-memory. It has a very small footprint and is very fast. It has full LINQ support.
Instead create only stored procedures and call them from from the code?
There is a place for dynamic SQL and/or ad hoc SQL, but it needs to be justified based on the particular usage needs.
Stored procedures are by far a best practice for almost all situations and should be strongly considered first.
This issue is a little bigger than just procs or ad hoc, because the database has a wide variety of tools to define its interface, including tables, views, functions and procedures.
People here have mentioned the execution plans and parameterization but, by far, the most important thing in my mind is that any technique which relies on exposed base tables to users means that you lose any ability for the database to change its underlying implementation or control security vertically or horizontally. At the very least, I would expose only views to a typical application/user/role.
In a scenario where the application or user's account only has access to EXEC SPs, then there is no possibility of that account being able to even have a hope of using a SQL injection of the form: "; SELECT name, password from USERS;" or "; DELETE FROM USERS;" or "; DROP TABLE USERS;" because the account doesn't have anything but EXEC (and certainly no DDL). You can control column visibility at the SP level and not have to deny select on an employee salary column, for example.
In other words, unless you are comfortable granting db_datareader to public (because that's effectively what you are doing when you LINQ-to-tables), then you need some sort of realistic security in your application, and SPs are the only way to go, with LINQ-to-views possibly being acceptable.
Depends entirely on what you're doing.
As a general rule a stored proc will have it's query plan cached better than a dynamically generated SQL statement. It will also be slightly easier to maintain indexes for.
However, dynamically generated SQL statements can have their query plans cached, so the difference is marginal.
Dynamically generated SQL statement can also introduce security risk - always parameterise them.
That said sprocs are a pain to maintain and update, they separate DB-logic and .Net code in a way that makes it harder for developers to piece together what a data access method is doing.
Also, to fix or update a SQL string you just change code. To fix or update a sproc you have to change the database - often a much messier option.
So I wouldn't recommend that as a 'one size fits all' best practice.
There is no right or wrong answer here. There are benefits to both which can be easily obtained through a google search. Different projects with different requirements may lead you to different solutions. It's not as black or white as you might want it to be. You might as well throw ORMs into the mix. If you prefer sql queries in your data layer as opposed to stored procs, make sure you use parametrized queries.
sql in sp- easy to maintain, sql in app- pain in the butt ot maintain.
it's so much faster and easier to hop onto a sql instance, modify an sp, test it, then deploy the sp, instead of having to modify the code in the app, test it, then deploy the app.
It depends on the data distribution in your table. Prepared query plans and stored procedures get cached, and the plan itself depends on the table statistics.
Suppose you've building a blog and that your posts table has a user_id. And that you're frequently doing stuff like:
select posts.* from posts where user_id = ? order by published desc limit 20;
Suppose indexes on posts (user_id) and posts (published desc).
Additionally suppose that you've two authors, author1 which wrote 3 posts a long time ago, and author2 who has written 10k posts since.
In this case, the query plan of the ad hoc query will be very different depending on whether you're fetching the author1 posts or the author2 posts:
For author1, the database will decide to use the index on user_id and sort the results.
For author2, the database will read the first 20 rows using the index on published.
If you prepare the statement, the planner will pick either of the two. Suppose the second (which I think is likely): applied to author1, this means going through the whole table by way of the index -- which is much slower than the optimal plan.
If simplicity is your goal, then an ORM would be a good practice for your simple database operations
ORMs like Entity Framework, nHibernate, LINQ to SQL, etc. will manage the code creation of the data access and repository layers and provide you with strongly typed objects representing your tables. This can lead to a cleaner, more maintainable architecture.
Save the stored procedures for your more complex queries. This is where you can take advantage of advanced SQL and cached query plans.
Dynamic SQL - Bad
Stored Procedures - Better
Linq-To-SQL or Linq-to-EF (or ORM tools) - Best
You do not want dynamic SQL inside your application since you do not have compile-time checking. Stored procedures will at least be checked, but it is still not part of a cohesive usnit and removes business logic to the database layer. Linq-To-EF will allow business logic to stay inside your application and allow you to have compile-time checking of syntax.
Say I need to populate 4 or 5 dropdowns w/ items from a database. Each drop down will have < 15 items in it. These items almost never change.
Now I could query the DB each time the page is accessed or I could grab the values from a custom class that would check to see if they already exist in ASP.Net's cache and only if they don't query the DB to update the cache.
It's trivial for me to write but I'm unsure if the performace would be better or not. I think it would be (although not likely anything huge).
What do you think?
When dealing with performance issues you should always:
Do things the simplest way first (avoid premature optimisation)
Performance test your code with set performance goals (e.g. 200ms response time under load of N concurent users)
Then, IF your code doesn't perform then profile your code to determine what is slow, and profile your proposed performance fixes to accurately measure what the real-world performance change will be.
Having said that then yes, what you are suggesting seems sensible (you would usually expect an in-memory cache to be quicker than a database), however it also depends on what data is being returned, what the memory load of your application is, how expensive the query is, what the query parameters are etc...
You should performance test your changes before and after to determine the actual effect of your changes (including things like memory load), and you should only really be doing things like this once you have identified that these dropdowns are the cause of an unacceptable performance problem.
That's what System.Web.Helpers.WebCache class exists for.
IO is usually more expensive than memory operations (by orders of magnitude). Especially if your database is in another machine, then you would even be using network resources, and it will definitely be faster to just use the cache.
But indeed, optimize in the end when you have really identified it as a performance bottleneck by measuring.
Quick answer to your question:
Use the built in .Net cache.
Additional points to ponder over..
Preferably, retrieve all master data in a single database retrieval (think stored procedure and dataset): though, I do not advocate the used of stored procs in all scenarios.
As you rightly said, ensure that your data access layer checks the cache before making a round trip to the database
Also, as your drop down values do not change very often; do remember to keep a long expiry duration
Finally, based on your page design you could also look at Fragment Caching (partial page caching: user controls) which could give you bigger benefits since now you neither access the data cache nor the database.
Performance:
Again, the performance depends more on the application's load as compared to your direct round trips for fetching the master data. Put simply, As Thomas suggested use the cache class!