Caching result of SELECT statement for reuse in multiple queries - asp.net

I have a reasonably complex query to extract the Id field of the results I am interested in based on parameters entered by the user.
After extracting the relevant Ids I am using the resulting set of Ids several times, in separate queries, to extract the actual output record sets I want (by joining to other tables, using aggregate functions, etc).
I would like to avoid running the initial query separately for every set of results I want to return. I imagine my situation is a common pattern so I am interested in what the best approach is.
The database is in MS SQL Server and I am using .NET 3.5.

It would definitely help if the question contained some measurements of the unoptimized solution (data sizes, timings). There is a variety of techniques that could be considered here, some listed in the other answers. I will assume that the reason why you do not want to run the same query repeatedly is performance.
If all the uses of the set of cached IDs consist of joins of the whole set to additional tables, the solution should definitely not involve caching the set of IDs outside of the database. Data should not travel there and back again if you can avoid it.
In some cases (when cursors or extremely complex SQL are not involved) it may be best (even if counterintuitive) to perform no caching and simply join the repetitive SQL to all desired queries. After all, each query needs to be traversed based on one of the joined tables and then the performance depends to a large degree on availability of indexes necessary to join and evaluate all the remaining information quickly.
The most intuitive approach to "caching" the set of IDs within the database is a temporary table (if named #something, it is private to the connection and therefore usable by parallel independent clients; or it can be named ##something and be global). If the table is going to have many records, indexes are necessary. For best performance, either make the index a clustered index (only one is allowed per table), or create the index only after the table has been populated, when index creation is slightly faster.
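The temporary-table idea can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables (orders, order_lines, selected_ids) and their contents are made up for the example. In T-SQL the temporary table would be named #selected_ids and the syntax would differ slightly.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT);
    CREATE TABLE order_lines (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 'ann', 'open'), (2, 'bob', 'closed'), (3, 'ann', 'open');
    INSERT INTO order_lines VALUES (1, 10.0), (1, 5.0), (2, 7.0), (3, 2.5);
""")

# Step 1: run the "complex" filter once and keep only the matching IDs.
# The PRIMARY KEY gives the temporary table an index for the later joins.
conn.execute("CREATE TEMP TABLE selected_ids (id INTEGER PRIMARY KEY)")
conn.execute(
    "INSERT INTO selected_ids SELECT id FROM orders WHERE customer = ? AND status = ?",
    ("ann", "open"),
)

# Step 2: reuse the cached IDs in several independent queries.
detail_rows = conn.execute(
    "SELECT o.id, o.customer FROM orders o "
    "JOIN selected_ids s ON s.id = o.id ORDER BY o.id"
).fetchall()
total_amount = conn.execute(
    "SELECT SUM(l.amount) FROM order_lines l "
    "JOIN selected_ids s ON s.id = l.order_id"
).fetchone()[0]

print(detail_rows)
print(total_amount)
```

The IDs never leave the database: the filter runs once, and every later query joins against the small indexed table.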
Indexed views are clearly preferable to temporary tables, except when the underlying data is read-only during the whole process, or when you can and want to ignore updates to the underlying data so that the whole set of reports stays consistent with one set of IDs. However, the ability of indexed views to always accurately reflect the underlying data comes at the cost of slowing down those updates.
One other answer to this question mentions stored procedures. This is largely a way of organizing your code. However, if you go this way, it is preferable to avoid using temporary tables, because references to a temporary table prevent pre-compilation of the stored procedure; go for views or indexed views if you can.
Regardless of the approach you choose, do not guess at the performance characteristics and query optimizer behavior. Learn to display query execution plans (within SQL Server Management Studio) and make sure that you see index accesses as opposed to nested loops combining multiple large sets of data; only add indexes that demonstrably and drastically change the performance of your queries. A well chosen index can often change the performance of a query by a factor of 1000, so this is somewhat complex to learn but crucial for success.
And last but not least, make sure you use UPDATE STATISTICS when repopulating the database (and nightly in production), or your query optimizer will not be able to put the indexes you have created to their best uses.

If you are planning to cache the result set in your application code, ASP.NET has a built-in cache; in a WinForms application, you can simply keep the data in an object held by the form and reuse it.
If you are planning to do the same in SQL Server, you might consider using indexed views to find the IDs. The view will be materialized, so you can get the results faster. You might also consider using a staging table to hold the IDs temporarily.

With SQL Server 2008 you can pass table variables as params to SQL. Just cache the IDs and then pass them as a table variable to the queries that fetch the data. The only caveat of this approach is that you have to predefine the table type as UDT.
http://msdn.microsoft.com/en-us/library/bb510489.aspx

For SQL Server, Microsoft generally recommends using stored procedures whenever practical.
Here are a few of the advantages:
http://blog.sqlauthority.com/2007/04/13/sql-server-stored-procedures-advantages-and-best-advantage/
* Execution plan retention and reuse
* Query auto-parameterization
* Encapsulation of business rules and policies
* Application modularization
* Sharing of application logic between applications
* Access to database objects that is both secure and uniform
* Consistent, safe data modification
* Network bandwidth conservation
* Support for automatic execution at system start-up
* Enhanced hardware and software capabilities
* Improved security
* Reduced development cost and increased reliability
* Centralized security, administration, and maintenance for common routines
It's also worth noting that, unlike other RDBMS vendors (like Oracle, for example), MSSQL automatically caches all execution plans:
http://msdn.microsoft.com/en-us/library/ms973918.aspx
"However, for the last couple of versions of SQL Server, execution plans are cached for all T-SQL batches, regardless of whether or not they are in a stored procedure."

The best approach depends on how often the Id changes, or how often you want to look it up again.
One technique is to simply store the result in the ASP.NET object cache, using the Cache object (also accessible from HttpRuntime.Cache). For example (from a page):
this.Cache["key"] = "value";
There are many possible variations on this theme.
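One such variation is a "get or add" cache with an expiry time, so the expensive ID query runs at most once per time window. The sketch below is written in Python purely to illustrate the pattern; IdCache, get_or_add and run_expensive_id_query are hypothetical names, not part of the ASP.NET Cache API.

```python
import time


class IdCache:
    """Tiny in-process cache with absolute expiry (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_add(self, key, factory):
        """Return the cached value for key, calling factory() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = factory()
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def run_expensive_id_query():
    # Stand-in for the real database query (assumption).
    calls.append(1)
    return [3, 7, 11]


cache = IdCache(ttl_seconds=60)
# The key should include every user-entered parameter that affects the result.
first = cache.get_or_add(("customer", "ann"), run_expensive_id_query)
second = cache.get_or_add(("customer", "ann"), run_expensive_id_query)
print(first, len(calls))
```

The second lookup is served from memory, so the query function is called only once within the expiry window.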

You can use Memcached to cache values in the memory.
From what I can see, there are some .NET ports.

How frequently does the data you'll be querying change? To me, this sounds like a perfect scenario for data warehousing, where you flatten the data for quicker retrieval and create the tables exactly as your 'DTO' wants to see the data. This method differs from an indexed view in that it's simply a table that will have quick seek operations, and it can be improved further if you set up the indexes properly on the columns that you plan to query.

You can create a global temporary table. Create the table on the fly and insert the records for your request. Then access this table in the joins of your subsequent requests, so the records are reused.

Related

Does DynamoDB GSI overloading give performance benefits or just flexibility

Does GSI Overloading provide any performance benefits, e.g. by allowing cached partition keys to be more efficiently routed? Or is it mostly about preventing you from running out of GSIs? Or maybe opening up other query patterns that might not be so immediately obvious.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
E.g. if you have a base table and you want to partition it so you can query a specific attribute (which becomes the PK of the GSI) over two dimensions, does it make any difference whether you create 1 overloaded GSI or 2 non-overloaded GSIs?
For an example of what I'm referring to see the attached image:
https://drive.google.com/file/d/1fsI50oUOFIx-CFp7zcYMij7KQc5hJGIa/view?usp=sharing
The base table has documents which can be in a published or draft state. Each document is owned by a single user. I want to be able to query by user to find:
Published documents by date
Draft documents by date
I'm asking in relation to the more recent DynamoDB best practice that implies that all applications only require one table. Some of the techniques being shown in this documentation show how a reasonably complex relational model can be squashed into 1 DynamoDB table and 2 GSIs and yet still support 10-15 query patterns.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-relational-modeling.html
I'm trying to understand why someone would go down this route as it seems incredibly complicated.
The idea – in a nutshell – is to not have the overhead of doing joins on the database layer or having to go back to the database to effectively try to do the join on the application layer. By having the data sliced already in the format that your application requires, all you really need to do is basically do one select * from table where x = y call which returns multiple entities in one call (in your example that could be Users and Documents). This means that it will be extremely efficient and scalable on the db level. But also means that you'll be less flexible as you need to know the access patterns in advance and model your data accordingly.
See Rick Houlihan's excellent talk on this https://www.youtube.com/watch?v=HaEPXoXVf2k for why you'd want to do this.
I don't think it has any performance benefits, at least none that's not called out – which makes sense since it's the same query and storage engine.
That being said, I think there are some practical reasons for why you'd want to go with a single table as it allows you to keep your infrastructure somewhat simple: you don't have to keep track of metrics and/or provisioning settings for separate tables.
In my opinion, the benefits would be in the cost of storage and provisioned throughput.
Apart from that, I am not sure, given the new limit of 20.

How can I improve the performance of the SQLite database?

Background: I am using an SQLite database in my Flex application. The database is 4 MB in size and has 5 tables:
table 1 has 2500 records
table 2 has 8700 records
table 3 has 3000 records
table 4 has 5000 records
table 5 has 2000 records.
Problem: Whenever I run a select query on any table, it takes approximately 50 seconds to fetch the data from the database tables. This makes the application quite slow and unresponsive while it fetches the data.
How can I improve the performance of the SQLite database so that the time taken to fetch the data from the tables is reduced?
Thanks
As I said in a comment, without knowing what structures your database consists of and what queries you run against the data, there is nothing from which we can infer why your queries take so long.
However here is an interesting reading about indexes : Use the index, Luke!. It tells you what an index is, how you should design your indexes and what benefits you can harvest.
Also, if you can post the queries and the table schemas and cardinalities (not the contents) maybe it could help.
Are you using asynchronous or synchronous execution modes? The difference between them is that asynchronous execution runs in the background while your application continues to run. Your application will then have to listen for a dispatched event and then carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time SQLStatement.execute() is called on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instances is better than creating new instances again and again. If you need to change your queries, then consider using parameterized statements.
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database along with the table names prevents the runtime from checking each database to find a matching table if you have multiple databases. It also helps prevent the runtime from choosing the wrong database. Do SELECT email FROM main.users; instead of SELECT email FROM users; even if you only have one single database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), then consider wrapping them in a transaction. The changes will be made in memory by the runtime and then written to disk together. If you don't use a transaction, each statement will result in multiple disk writes to the database file, which can be slow.
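The effect of batching writes in one transaction can be illustrated outside of Flex. The sketch below uses Python's built-in sqlite3 module rather than the AIR SQLConnection API, and the users table is a made-up example; the point is only that all inserts are committed together and a failing batch is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

rows = [(i, f"user{i}@example.com") for i in range(1, 1001)]

# "with conn" runs the statements in one transaction:
# commit on success, rollback if an exception escapes the block.
with conn:
    conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# A batch that fails part-way leaves no partial changes behind.
try:
    with conn:
        conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (2000, "new@example.com"))
        # Duplicate primary key: raises IntegrityError and triggers the rollback.
        conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (1, "duplicate@example.com"))
except sqlite3.IntegrityError:
    pass

count_after_failure = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count, count_after_failure)
```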
Try to avoid any schema changes. The table definition data is kept at the start of the database file. The runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the database file. If you make changes such as adding columns or tables, the new table definitions will be mixed in with table data in the database file. The effect of this is that the runtime will have to read the table definition data from different parts of the file rather than from the beginning. The SQLConnection.compact() method restructures the table definition data so that it is at the beginning of the file, but its downside is that it can take a long time, especially if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving the SQL queries and table structure that you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance. My guess is that you're using synchronous execution. If you switch to asynchronous mode, you'll see better performance, but the improvements don't have to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices working with local SQL databases.
You could try indexing some of the columns used in the WHERE clause of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.
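To check whether an index on a WHERE column is actually being used, SQLite's EXPLAIN QUERY PLAN can be run before and after creating it. The following is a minimal sketch using Python's built-in sqlite3 module; the users table, the city column and the index name idx_users_city are assumptions made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", f"city{i % 50}") for i in range(5000)],
)

query = "SELECT email FROM users WHERE city = ?"


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, ("city7",))  # full table scan expected
conn.execute("CREATE INDEX idx_users_city ON users (city)")
plan_after = plan(query, ("city7",))   # index search expected

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search using idx_users_city.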

Passing whole dataset to stored procedure in MSSQL 2005

How do I pass a dataset object to a stored procedure? The dataset comprises multiple tables and I'll need to be able to access them from within the SQL.
You can use Table valued parameter for passing single table in SQL 2008 http://msdn.microsoft.com/en-us/library/bb675163.aspx
or
refer to this article and use SQL CLR procedure to pass dataset http://blogs.msdn.com/b/jpapiez/archive/2005/09/26/474059.aspx
It looks like you can do this with SQL Server 2008 or newer (at least with a DataTable). Here are the links:
http://www.eggheadcafe.com/community/aspnet/10/10138579/passing-dataset-to-stored-procedure.aspx
http://www.sqlteam.com/article/sql-server-2008-table-valued-parameters
As the article from MusiGenesis' answer states
"In SQL Server 2005 and earlier, it is not possible to pass a table variable as a parameter to a stored procedure. When multiple rows of data needed to be sent to SQL Server, developers either had to send one row at a time or come up with other workarounds to meet requirements. While a VB.Net developer recently informed me that there is a SQLBulkCopy object available in .Net to send multiple rows of data to SQL Server at once, the data still can not be passed to a stored proc."
At the risk of stating the obvious, here are two more approaches:
Parametrize your processing procedure
You might re-evaluate whether you really need to pass a general table variable. While this sometimes cannot be avoided, part of the reason it is a later addition to the feature set of MS SQL Server is that you can usually get around it by restructuring your stored procedures and the flow of your data processing.
If you are able to 'parametrize' your process, then you should be able to let stored procedures retrieve the full dataset based on a limited number of parameters.
This will make the process less flexible, but it will also make it more controlled, which is not a bad thing. Just as a database that interfaces with applications only at the level of stored procedures is more robust, this approach reduces the number of possible cases by limiting flexibility, and consequently the number of possibly unhandled cases (read: security holes and general bugs).
Temp tables
Besides the above, there's always the approach with temp tables, which can be more or less complicated depending on the scope of sharing that you need for the data (sharing can be between db users, app users, connections, processes, etc.).
A nice side effect is that such an approach allows persistence of the process (which brings you closer to having undo, redo and the ability to continue interrupted work).

Alternatives of Datatable

In my web application, I have a dynamic query that returns a huge amount of data to a DataTable, and this query is often re-run with different parameters. As a result, the database is overloaded.
I want to load all records, without parameters, into an object and perform queries (maybe with LINQ) on this object, so that the database is not overloaded.
Which objects can be used instead of datatable?
This is one of my pet peeves - people who return all the data from the database.
There is absolutely no need for this unless you are doing reporting.
If you are doing reporting, then you need to increase your hardware capability so that the database can cope. This may also include tuning your database, rearranging tables, reindexing, regular rebuilding of indexes, updating statistics, archiving out old data, etc.
If you are NOT doing reporting, then start limiting how much data can be queried at any one time. Users DO NOT need to see massive quantities of data all at once. They need to see discrete amounts of data presented in a manageable and coherent way.
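One common way to limit how much is queried at once is to page through the results in the database. The sketch below shows keyset paging with Python's built-in sqlite3 module; the documents table and the fetch_page helper are hypothetical, and the LIMIT clause is SQLite syntax (other engines spell it differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(i, f"doc {i}") for i in range(1, 101)],
)


def fetch_page(after_id, page_size):
    """Return up to page_size rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, title FROM documents WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


page1 = fetch_page(0, 10)
# The last id of the previous page is the starting point for the next one.
page2 = fetch_page(page1[-1][0], 10)
print(page1[0], page2[0])
```

Only one page of rows crosses the wire per request, and the database does the filtering and ordering.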
Another rule of thumb I like to observe is: let your database server do the work. It is made to manipulate lots of data, that is what it is good at, and it should have the power to do it. Pulling back loads of data to the client and then trying to manipulate that data on the client is a foolish thing to do. If your client machines are more powerful than the database server, then you have issues.
Never ever do this (except for caching)!
You are trying to reimplement DB mechanisms such as:
* persistent storage
* index search and query strategy
* replication
* and so on
Spend your time on DB optimization instead (optimal schema, indexes, queries, partitioning).

Any SQL Server multiple-recordset stored procedure gotchas?

Context
My current project is a large-ish public site (2 million pageviews per day) running a mixture of ASP classic and ASP.NET with a SQL Server 2005 back-end. We're heavy on reads, with occasional writes and virtually no updates/deletes. Our pages typically concern a single 'master' object with a stack of dependent (detail) objects.
I like the idea of returning all the data required for a page in a single proc (and absolutely no unnecessary data). True, this requires a dedicated proc for such pages, but some pages receive double-digit percentages of our overall site traffic, so it's worth the time/maintenance hit. We typically only consume multiple recordsets from our .NET code, using System.Data.SqlClient.SqlDataReader and its NextResult method. Oh, yeah, I'm not doing any updates/inserts in these procs either (except to table variables).
The question
SQL Server (2005) procs which return multiple recordsets are working well (in prod) for us so far, but I am a little worried that multi-recordset procs are my new favourite hammer that I'm hitting every problem (nail) with. Are there any multi-recordset SQL Server proc gotchas I should know about? Anything that's going to make me wish I hadn't used them? Specifically, anything about them affecting connection pooling, memory utilization, etc.
Here's a few gotchas for multiple-recordset stored procs:
They make it more difficult to reuse code. If you're doing several queries, odds are you'd be able to reuse one of those queries on another page.
They make it more difficult to unit test. Every time you make a change to one of the queries, you have to test all of the results. If something changed, you have to dig through to see which query failed the unit test.
They make it more difficult to tune performance later. If another DBA comes in behind you to help performance improve, they have to do more slicing and dicing to figure out where the problems are coming from. Then, combine this with the code reuse problem - if they optimize one query, that query might be used in several different stored procs, and then they have to go fix all of them - which makes for more unit testing again.
They make error handling much more difficult. Four of the queries in the stored proc might succeed, and the fifth fails. You have to plan for that.
They can increase locking problems and incur load in TempDB. If your stored procs are designed in a way that need repeatable reads, then the more queries you stuff into a stored proc, the longer it's going to take to run, and the longer it's going to take to return those results back to your app server. That increased time means higher contention for locks, and the more SQL Server has to store in TempDB for row versioning. You mentioned that you're heavy on reads, so this particular issue shouldn't be too bad for you, but you want to be aware of it before you reuse this hammer on a write-intensive app.
I think multi-recordset stored procedures are great in some cases, and it sounds like yours may be one of them.
The bigger your site gets (the more traffic it has), the more that 'extra' bit of performance is going to matter. If you can combine 2-3-4 calls to the database (and possibly new connections) into one, you could be cutting down your database hits by 4-6-8 million per day, which is substantial.
I use them sparingly, but when I have, I have never had a problem.
I would recommend having one stored procedure invoke several inner stored procedures that each return one result set.
-- Wrapper procedure: each inner procedure returns exactly one result set
create proc foo
as
begin
    execute foobar -- returns one result set
    execute barfoo -- returns one result set
    execute bar    -- returns one result set
end
That way, when requirements change and you only need the 3rd and 5th result set, you have an easy way to invoke them without adding new stored procedures and regenerating your data access layer. My current app returns all reference tables (e.g. the US states table) whether I want them or not. Worst is when you need to get a reference table and the only access is via a stored procedure that also runs an expensive query as one of its six result sets.
