Returning parent-child data from SQL query - asp.net

I have an asp.net page which needs to return an object with a one-to-many relationship. i.e. there is a single header row followed by an unspecified number of data rows. Typically there will be between 1 and 10 rows, so I'm not dealing with a huge amount of data here - just a page that is called frequently.
I know that the OLE DB provider supports the SHAPE command, which allows hierarchical data to be returned, but I'm using SqlDataReader (ADO.NET), which doesn't support it. The question is, what is best practice/best performance here? The options I can see are:
1. INNER JOIN in the query and return a single table containing the header and data rows, meaning the header is repeated in every row after the first. Does this generate unnecessary data traffic, or is there some hidden optimisation behind the scenes?
2. Make two separate calls to SqlCommand.ExecuteReader() - one to return the header row and one for the variable number of data rows. But this incurs the overhead of issuing 2 separate queries instead of 1.
3. Use an API that supports SHAPE instead.
4. Make a single call to SqlCommand.ExecuteReader(), but with the query containing 2 SELECT statements that return exactly 2 result sets, then call SqlDataReader.NextResult() after processing the header row. This seems like a good fit for my original problem, since there is only one header row. There are other situations where that is not the case, however, and then #4 wouldn't be an option. Furthermore, the data access is sequential and my application needs to use the header both before and after it uses the data rows, so I'd have to write the header row to an in-memory DataTable before calling NextResult().

I don't have experience with SHAPE so I don't know if it performs well. Of the other three options, Option #4 is absolutely the most efficient because it minimizes both trips to the database and duplication of data.
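For what it's worth, here is a minimal C# sketch of option #4, assuming hypothetical OrderHeader/OrderDetail tables and a placeholder connection string: one round trip, two result sets, with the header values buffered so they are available both before and after the detail rows are processed.

    // Sketch only: table, column and parameter names are made up.
    using System;
    using System.Data.SqlClient;

    class HeaderDetailExample
    {
        static void Main()
        {
            const string sql =
                "SELECT OrderId, CustomerName FROM OrderHeader WHERE OrderId = @id; " +
                "SELECT ItemId, Quantity FROM OrderDetail WHERE OrderId = @id;";

            using (var conn = new SqlConnection("...your connection string..."))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@id", 42);
                conn.Open();

                using (var reader = cmd.ExecuteReader())
                {
                    // First result set: the single header row, buffered in local variables.
                    int orderId = 0;
                    string customer = null;
                    if (reader.Read())
                    {
                        orderId = reader.GetInt32(0);
                        customer = reader.GetString(1);
                    }

                    // Second result set: the variable number of detail rows.
                    reader.NextResult();
                    while (reader.Read())
                    {
                        Console.WriteLine("Item {0}, qty {1}",
                            reader.GetInt32(0), reader.GetInt32(1));
                    }

                    // The header values are still available here, after the detail rows.
                    Console.WriteLine("Order {0} for {1}", orderId, customer);
                }
            }
        }
    }

If you'd rather keep the header in a DataTable, DataTable.Load(reader) will buffer the first result set and also advance the reader to the next one.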

Related

DynamoDB Scan Vs Query on same data

I have a use case where I have to return all elements of a table in DynamoDB.
Suppose my table has a partition key (Column X) with the same value in every row, say "monitor", and a sort key (Column Y) with distinct values.
Will there be any difference in execution time between the approaches below, or is it the same?
1. Scanning the whole table.
2. Querying based on the partition key value "monitor".
You should use the parallel scan feature. Basically, you run multiple scans at once on different segments of the table. Watch out for higher RCU usage, though.
Avoid using Scan as far as possible.
Scan fetches all the rows from a table, and you will also have to use pagination to iterate over all of them. It is more or less the equivalent of a SELECT * FROM table; operation in SQL.
Use Query if you want to fetch all the rows for a given partition key. If you know which partition key you want the results for, you should use Query, because it uses the partition key (much like an index) to fetch only the rows with that specific key.
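For reference, here is roughly what the two calls look like with the AWS SDK for .NET (AWSSDK.DynamoDBv2). The table name "MyTable" and attribute name "ColumnX" are placeholders standing in for the question's Column X.

    // Sketch only: names are placeholders, error handling omitted.
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Amazon.DynamoDBv2;
    using Amazon.DynamoDBv2.Model;

    class QueryVsScan
    {
        static async Task Main()
        {
            var client = new AmazonDynamoDBClient();

            // Query: reads only the items stored under one partition key value.
            var queryResponse = await client.QueryAsync(new QueryRequest
            {
                TableName = "MyTable",
                KeyConditionExpression = "ColumnX = :pk",
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    [":pk"] = new AttributeValue { S = "monitor" }
                }
            });
            Console.WriteLine("Query returned {0} items", queryResponse.Items.Count);

            // Scan: reads every item in the table, regardless of key.
            // Note that both Query and Scan return at most about 1 MB per call,
            // so either may need paging on larger tables.
            var scanResponse = await client.ScanAsync(new ScanRequest { TableName = "MyTable" });
            Console.WriteLine("Scan returned {0} items", scanResponse.Items.Count);
        }
    }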
Direct answer
To the best of my knowledge, in the specific case you are describing, Scan will be marginally slower (especially in time to first response). This assumes you do not do any filtering (i.e., FilterExpression is empty).
Further thoughts
DynamoDB can potentially store huge amounts of data. By "huge" I mean "more than can fit in any machine's RAM". If you need to 'return all elements of a table', you should ask yourself: what happens if that table grows so large that all its elements no longer fit in memory? You do not have to handle this right now (I believe that as of now the table is rather small), but you do need to keep in mind that you may have to come back to this code and fix it so that it addresses this concern.
Questions I would ask myself if I were in your position:
(1) Can I somehow set a limit on the number of items I need to read (say, read only the first 1000 items)?
(2) How is this information (the list of items) used? Is it sent back to a JS application running inside a browser, which displays it to a user? If the answer is yes, then what will the user do with a huge list of items?
(3) Can you work on the items one at a time (or 10 or 100 at a time)? If the answer is yes, then you only need to store one (or 10 or 100) items in memory, but not the entire list of items.
In general, in DDB, scan operations are used as described in (3): read one item (or several items) at a time, do some processing, and then move on to the next item.
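As a rough sketch of that pattern with the AWS SDK for .NET (again, "MyTable" and "ColumnY" are placeholders for the question's names):

    // Sketch: process the table one page at a time instead of holding the
    // whole list in memory.
    using System;
    using System.Threading.Tasks;
    using Amazon.DynamoDBv2;
    using Amazon.DynamoDBv2.Model;

    class PagedScan
    {
        static async Task Main()
        {
            var client = new AmazonDynamoDBClient();
            var request = new ScanRequest { TableName = "MyTable", Limit = 100 };

            do
            {
                ScanResponse page = await client.ScanAsync(request);

                foreach (var item in page.Items)
                {
                    // Work on one item at a time; only the current page is in memory.
                    Console.WriteLine(item["ColumnY"].S);
                }

                // LastEvaluatedKey is empty once the scan reaches the end of the table.
                request.ExclusiveStartKey = page.LastEvaluatedKey;
            } while (request.ExclusiveStartKey != null && request.ExclusiveStartKey.Count > 0);
        }
    }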

SQL server inserting lots of data from ASP.NET?

I have an application where there is a parent-child table structure and customers can order products. The whole structure is quite complex to post here, but suffice it to say there is one Order table and one OrderDetails table for storing the orders. Currently what we do is INSERT one record into the Order table, and then, for each item the customer added, insert that item in a loop into the OrderDetails table. This solution is not scalable for obvious reasons. It works fine for 100 or so items, but if the user goes over 1000 items, or a quantity of 1000 for an item, you start to notice the unresponsiveness of the application.
There are a couple of solutions that come to mind, but I am not sure which one would scale well. One is to use BulkInsert from my ASP.NET application to insert into the OrderDetails table. The second is to generate XML and pass it to a SQL proc that extracts and inserts the data into the OrderDetails table, but that has the associated overhead of the memory consumed by the generated XML. I know I could benchmark and see for myself what suits my application best, but I would like to know which is the most common strategy and which scales better compared to the other. Also, if there is another technique I could use instead of these two that performs better (I know performance is a subjective word, but let me narrow it down to speed), I could use that. Which is generally used the most? What do you use in your application?
You could consider using a table-valued parameter (TVP) in the database. You will have to create a table type whose structure mimics that of the OrderDetails table. The stored proc for inserting the data will accept an input parameter of this type (such parameters are always READONLY).
In your server-side code, you can construct a DataTable object containing all the order details data, which will be mapped to the input parameter of the stored proc. Ensure that the order of columns in the DataTable exactly matches the order in the table type. Upon executing the query, all the data is inserted in one shot. This saves you from looping over each row of data, and it also avoids the overhead of XML parsing. This approach does, however, involve passing an entire object over the network.
You can read more about it here: MSDN Table-Valued Parameters
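A rough sketch of what that can look like; the type, proc and column names below are made up and would need to match the real OrderDetails schema.

    // One-time T-SQL setup (run in the database):
    //   CREATE TYPE dbo.OrderDetailsType AS TABLE
    //       (ProductId INT, Quantity INT, UnitPrice MONEY);
    //   CREATE PROCEDURE dbo.InsertOrderDetails
    //       @OrderId INT, @Details dbo.OrderDetailsType READONLY
    //   AS
    //       INSERT INTO OrderDetails (OrderId, ProductId, Quantity, UnitPrice)
    //       SELECT @OrderId, ProductId, Quantity, UnitPrice FROM @Details;
    using System.Data;
    using System.Data.SqlClient;

    class OrderDetailsTvp
    {
        static void SaveDetails(int orderId, DataTable details, string connectionString)
        {
            // The column order in 'details' must match dbo.OrderDetailsType exactly.
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.InsertOrderDetails", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@OrderId", orderId);

                var tvp = cmd.Parameters.AddWithValue("@Details", details);
                tvp.SqlDbType = SqlDbType.Structured;   // mark it as a table-valued parameter
                tvp.TypeName = "dbo.OrderDetailsType";

                conn.Open();
                cmd.ExecuteNonQuery();                  // all detail rows inserted in one round trip
            }
        }
    }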
1000 items for an order does seem quite excessive!
Would it be feasible to introduce a limit of 100 items per order into the business logic of the application?

When to include an index (automated heuristic)

I have a piece of software which takes in a database and uses it to produce graphs based on what the user wants (primarily queries of the form SELECT AVG(<input1>) AS x, AVG(<input2>) AS y FROM <input3> WHERE <key> IN (<vals..>) AND ...). This works nicely.
I have a simple script that is passed an (often large) number of files, each describing a row:
name=foo
x=12
y=23.4
....... etc.......
The script goes through each file, saving the variable names and an INSERT query for each. It then loads the variable names, sort | uniq's them, and makes a CREATE TABLE statement out of them (SQLite, amusingly enough, is OK with declaring every column NUMERIC, even if some actually end up containing text data). Once this is done, it executes the INSERTs (in a single transaction, otherwise it would take ages).
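The "single transaction" part is the key performance detail here. A rough .NET rendering of the load step (the original script presumably isn't C#; the table name "results" and the name/x/y columns are taken from the example files above, via Microsoft.Data.Sqlite):

    // Sketch: parameterised INSERTs wrapped in one transaction. Without the
    // transaction, each INSERT forces its own journal flush, which is why the
    // load would otherwise take ages.
    using System.Collections.Generic;
    using Microsoft.Data.Sqlite;

    class BulkLoad
    {
        static void InsertRows(string dbPath, IEnumerable<IDictionary<string, object>> rows)
        {
            using (var conn = new SqliteConnection("Data Source=" + dbPath))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    foreach (var row in rows)
                    {
                        var cmd = conn.CreateCommand();
                        cmd.Transaction = tx;
                        cmd.CommandText = "INSERT INTO results (name, x, y) VALUES ($name, $x, $y)";
                        cmd.Parameters.AddWithValue("$name", row["name"]);
                        cmd.Parameters.AddWithValue("$x", row["x"]);
                        cmd.Parameters.AddWithValue("$y", row["y"]);
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit();   // one commit for the whole batch
                }
            }
        }
    }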
To improve performance, I added a basic index on each column. However, this increases the database size quite significantly and only provides a moderate improvement.
Data comes in three basic types:
single value, indicating things like program version, etc.
a few values (<10), indicating things like input parameters used
many values (>1000), primarily output data.
The first type obviously shouldn't need an index, since it will never be sorted upon.
The second type should have an index, because it will commonly be filtered by.
The third type probably shouldn't need an index, because it will be used in output.
It would be annoying to determine which type a particular value is before it is put in the database, but it is possible.
My question is twofold:
Is there some hidden cost to extraneous indexes, beyond the size increase that I have seen?
Is there a better way to index for filter queries of the form WHERE foo IN (5) AND bar IN (12,14,15)? Note that I don't know which columns the user will pick, beyond the fact that they will be type 2 columns.
Read the relevant documentation:
Query Planning;
Query Optimizer Overview;
EXPLAIN QUERY PLAN.
The most important thing for optimizing queries is avoiding I/O, so tables with fewer than ten rows should not be indexed: all the data fits into a single page anyway, and having an index would just force SQLite to read another page for the index.
Indexes are important when you are looking up records in a big table.
Extraneous indexes make table updates slower, because each index needs to be updated as well.
SQLite can use at most one index per table in a query.
This particular query could be optimized best by having a single index on the two columns foo and bar.
However, creating such indexes for all possible combinations of lookup columns is most likely not worth the effort.
If the queries are generated dynamically, the best idea probably is to create one index for each column that has good selectivity, and rely on SQLite to pick the best one.
And don't forget to run ANALYZE.
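As a rough illustration (foo and bar are the column names from the question; the table name "results" and the use of Microsoft.Data.Sqlite are just stand-ins for whatever the real script uses), the suggested indexes plus ANALYZE and an EXPLAIN QUERY PLAN check could look like this:

    // Sketch: one index per selective (type 2) column, then ANALYZE, then a
    // quick check of which index the planner actually chose.
    using System;
    using Microsoft.Data.Sqlite;

    class IndexSketch
    {
        static void Main()
        {
            using (var conn = new SqliteConnection("Data Source=results.db"))
            {
                conn.Open();
                var cmd = conn.CreateCommand();

                // One single-column index per type 2 column; SQLite picks the most
                // selective one for a given query. (A composite index on (foo, bar)
                // would serve exactly this query even better, but covering every
                // combination of columns is rarely worth it.)
                cmd.CommandText = "CREATE INDEX IF NOT EXISTS idx_results_foo ON results(foo);";
                cmd.ExecuteNonQuery();
                cmd.CommandText = "CREATE INDEX IF NOT EXISTS idx_results_bar ON results(bar);";
                cmd.ExecuteNonQuery();

                // Refresh the statistics the query planner uses to choose indexes.
                cmd.CommandText = "ANALYZE;";
                cmd.ExecuteNonQuery();

                // See which index the planner picked for the filter query.
                cmd.CommandText =
                    "EXPLAIN QUERY PLAN SELECT AVG(x) AS x, AVG(y) AS y FROM results " +
                    "WHERE foo IN (5) AND bar IN (12, 14, 15);";
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        Console.WriteLine(reader.GetString(reader.FieldCount - 1)); // the 'detail' column
            }
        }
    }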

ASP.Net MVC - Data Design - Single Wide Record versus Many small record retrievals

I am designing a Web application that we estimate may have about 1500 unique users per hour. (We have no stats for concurrent users.). I am using ASP.NET MVC3 with an Oracle 11g backend and all retrieval will be through packaged stored procedures, not inline SQL. The application is read-only.
Table A has about 4 million records in it.
Table B has about 4.5 million records.
Table C has less than 200,000 records.
There are two other tiny lookup tables that are also linked to table A.
Tables B and C both have a 1-to-1 relationship to Table A - Tables A and B are required, C is not. Tables B and C contain many string columns (some up to 256 characters).
A search will always return 0, 1, or 2 records from Table A, each with its mate in Table B and any related data in C and the lookup tables.
My data access process would create a connection and command, execute the query, return a reader, load the appropriate object from that reader, close the connection, and dispose.
My question is this....
Is it better (performance-wise) to return a single, wide record set all at once (using only one connection), or is it better to query one table right after the other (using one connection for each query), returning narrower records and joining them in the code?
EDIT:
Clarification - I will always need all the data I would bring over in either option. Both options will eventually result in the same amount of data displayed on the screen as was brought from the DB. But one would use a single connection to get everything at once (wider rows, so maybe slower?), and the other would use multiple connections, one right after the other, getting smaller amounts at a time. I don't know whether the number of connections should influence the decision here.
Also - I have the freedom to denormalize the table design, if I decide it's appropriate.
You only ever want to pull as much data as you need. Whichever way moves less data from the database over to your code is the way you want to go. I would pick your second suggestion.
-Edit-
Since you need to pull all of the records regardless, you will only want to establish a connection once. Since you're getting the same amount of data either way, you should try to save as much memory as possible by keeping the number of connections down.
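For illustration, a single-connection, single-wide-result call might look roughly like this with the ODP.NET managed driver; the package/procedure name and parameters are hypothetical, standing in for one packaged proc that returns A, B, C and the lookups already joined.

    // Sketch only: assumes a proc that returns one ref cursor with the wide row(s).
    using System.Data;
    using Oracle.ManagedDataAccess.Client;

    class WideRecordSearch
    {
        static DataTable Search(string searchKey, string connectionString)
        {
            using (var conn = new OracleConnection(connectionString))
            using (var cmd = new OracleCommand("pkg_search.get_record", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("p_key", OracleDbType.Varchar2).Value = searchKey;
                cmd.Parameters.Add("p_result", OracleDbType.RefCursor, ParameterDirection.Output);

                conn.Open();

                // One connection, one round trip, one wide result set:
                // 0, 1 or 2 rows with all the columns the page needs side by side.
                var result = new DataTable();
                using (var reader = cmd.ExecuteReader())
                    result.Load(reader);

                return result;
            }
        }
    }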

Fastest alternative to Datatable.Select() to narrow cached data?

Stack Overflowers:
I have a search function on my company's website (based on .NET 2.0) that allows you to narrow the product catalog using up to 9 different fields. Right now, after you make your selections on the frontend, I build a dynamic query and hit the database (SQL Server) to get the resulting list of item numbers.
I would like to move away from hitting the database every time and do all of this in memory for faster results. Basically a 3500-4500 row "table" with 10 columns: the item number (which could be a primary key) and the 9 attribute fields (whose values repeat across many, many rows). There can be any number of different searches across the 9 columns to get the items you want:
Column A = 'foo' AND Column D = 'bar'
Column B = 'foo' AND Column C = 'bar' AND Column I = 'me'
Column H = 'foo'
etc...
Based on my research, the .Select() function seems like the slowest way to perform the search, but it stands out to me as the quickest and easiest way to write the narrowing searches that return the list of item numbers:
MyDataSet.Select("Column B = 'foo' AND Column E = 'bar' AND Column I = 'me'")
In my specific case, what method would you suggest as an alternative that offers the same narrowing functionality with better performance, instead of settling for the DataTable.Select() method?
Your best alternative is to let your database do what it's best at: querying and filtering data.
Caching DataTables (especially ones with 3500-4500 rows) is a bad idea for web applications. Calling Select() on a DataTable doesn't reduce the number of rows in the DataTable - it returns a new array of references to the matching rows, which means you'll still have the original 4000 rows sitting in the cache. Better to have nothing at all in the cache, and just get the rows you need when the user requests them.
DataTables (and DataSets) are best used with fat clients (usually Windows applications) that need to work with in-memory copies of database data while in a disconnected state.
DataTables are not optimally built for being queried; I wouldn't recommend going down this route unless you really have a documented performance problem that you're certain would be improved by doing so.
If your dynamic queries are slow, it's probably because you haven't indexed your table properly in your database. Databases are designed to be able to optimally query your data, so my hunch would be that a little work on the database side of things should get you where you need to go.
If you really need to query ADO.NET DataTables, make sure you read Scaling ADO.NET DataTables thoroughly. It talks about things you can do to speed them up, and gives you some benchmarks so you can see the difference.
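That said, if you do decide to keep an in-memory copy despite the advice above, a strongly typed list filtered with a predicate is usually leaner than DataTable.Select() - no filter-expression parsing and no DataRow overhead. A rough, .NET 2.0-compatible sketch; the Item class and field names are made up to stand in for the real item number and 9 attribute columns.

    // Sketch: cache a List<Item> instead of a DataTable and filter it with a
    // predicate (anonymous delegates work on .NET 2.0).
    using System.Collections.Generic;

    class Item
    {
        public string ItemNumber;
        public string ColumnB;
        public string ColumnE;
        public string ColumnI;
        // ...the other attribute columns...
    }

    class CatalogSearch
    {
        // Equivalent of: Column B = 'foo' AND Column E = 'bar' AND Column I = 'me'
        static List<Item> Narrow(List<Item> catalog)
        {
            return catalog.FindAll(delegate(Item i)
            {
                return i.ColumnB == "foo" && i.ColumnE == "bar" && i.ColumnI == "me";
            });
        }
    }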
