SQL Server: inserting lots of data from ASP.NET?

I have an application where there is a parent-child table relationship and customers can order products. The whole structure is too complex to post here, but suffice it to say there is one Order table and one OrderDetails table for storing the orders. Currently what we do is INSERT one record into the Order table, and then, for each item the customer added, insert that item into the OrderDetails table in a loop. This is not scalable for obvious reasons. It works fine for 100 or so items, but if the user goes over 1000 items, or a quantity of 1000 for an item, you start to notice the unresponsiveness of the application.
A couple of solutions come to mind, but I am not sure which one would scale well. One is to use BulkInsert from my ASP.NET application to insert into the OrderDetails table. The second is to generate XML and pass it to a SQL proc that extracts / inserts the data into the OrderDetails table, but that has the associated overhead of the memory consumed by the generated XML. I know I could benchmark and see for myself what suits my application best, but I would like to know which is the most common strategy and which scales better compared to the other. Also, if there is another technique I could use instead of these two that would be better performance-wise (I know performance is a subjective word, but let me narrow it down to speed), I could use that. Which is generally used the most? What do you use in your application?

You could consider exploring the option of using a table valued parameter in the database. You will have to create a table type object, whose structure will mimic that of the OrderDetails table. The stored proc for inserting the data will accept an input parameter of this type (such parameters are always READONLY).
In your server side code, you can construct a DataTable object containing all the Order Details data, which will be mapped to the input parameter of the stored proc. Ensure that the order of columns in the DataTable object exactly matches the order in the table valued parameter. Upon executing the query, all the data will be inserted in one shot. This will save you from looping for each row of data that is there, and will also prevent the overhead of XML parsing. This approach though will involve passing an entire object over the network.
You can read more about it here: MSDN Table Valued Parameters
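To make this concrete, here is a minimal sketch of the server-side code, assuming a table type called dbo.OrderDetailsType and a stored proc called dbo.InsertOrderDetails have been created (those names, and the column list, are illustrative, not from the question):

// Assumed one-time T-SQL setup (names are illustrative):
//   CREATE TYPE dbo.OrderDetailsType AS TABLE (ProductId INT, Quantity INT, UnitPrice DECIMAL(18,2));
//   CREATE PROCEDURE dbo.InsertOrderDetails @OrderId INT, @Details dbo.OrderDetailsType READONLY
//   AS INSERT INTO OrderDetails (OrderId, ProductId, Quantity, UnitPrice)
//      SELECT @OrderId, ProductId, Quantity, UnitPrice FROM @Details;
using System.Data;
using System.Data.SqlClient;

public static class OrderRepository
{
    public static void SaveOrderDetails(string connectionString, int orderId, DataTable details)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.InsertOrderDetails", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@OrderId", orderId);

            // Map the DataTable to the table valued parameter; the column order
            // must match the order of columns in the table type.
            var p = cmd.Parameters.AddWithValue("@Details", details);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.OrderDetailsType";

            conn.Open();
            cmd.ExecuteNonQuery();   // all detail rows go to the server in a single round trip
        }
    }
}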

1000 items for an order does seem quite excessive!
Would it be feasible to introduce a limit of 100 items per order into the business logic of the application?

Related

Pagination with Filtering using Query Operation in DynamoDB Template

I would like to be able to filter a paginated result using the Query operation before the limit is taken into consideration. Is there any suggestion for getting correct pagination on filtered results?
I would like to implement a DynamoDB Scan OR Query with the following logic:
Scanning -> Filtering (boolean true or false) -> Limiting (for pagination)
However, I have only been able to implement a Scan OR Query with this logic:
Scanning -> Limiting (for pagination) -> Filtering (boolean true or false)
Note: I have already tried a Global Secondary Index, but it didn't work in my case because I have 5 different attributes to filter and limit on.
Unfortunately, DynamoDB is not capable of doing this. Once you run a Query on one of your indexes, it will read every single item that satisfies your partition and sort key.
Let's check your example: you have a boolean and you have an index over that field. Let's say 50% of the items are false and 50% are true. Once you search by that index, you will read through 50% of all items in the table (so it's almost like a SCAN). If you set a limit, it will read only that number of items and then stop. You cannot use a combination of limit and skip/page/offset like in other databases.
There is some level of pagination https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html but it does not allow you to jump to, say, page 10; it only allows you to go through all the pages one by one. Also, I am not sure how it is priced: maybe internally AWS will go through all the items before preparing the results for you, so you would pay for reading 50% of the whole table even if you stop iterating before you reach the end.
There is also the limitation that an index can have a maximum of 2 fields (partition key and sort key).
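As a rough sketch of what this looks like in code (using the AWS SDK for .NET; the table, index and attribute names are made up for the example): Limit caps the items read before FilterExpression is applied, and paging is done by feeding LastEvaluatedKey back in as ExclusiveStartKey.

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class UserQueries
{
    public static async Task<QueryResponse> QueryPage(
        IAmazonDynamoDB client, Dictionary<string, AttributeValue> startKey)
    {
        var request = new QueryRequest
        {
            TableName = "Users",                       // assumed table
            IndexName = "flag-index",                  // assumed GSI
            KeyConditionExpression = "flag = :f",
            FilterExpression = "active = :a",          // applied AFTER the items are read
            ExpressionAttributeValues = new Dictionary<string, AttributeValue>
            {
                { ":f", new AttributeValue { S = "true" } },
                { ":a", new AttributeValue { BOOL = true } }
            },
            Limit = 100,                               // caps items READ, not items left after the filter
            ExclusiveStartKey = startKey               // null on the first page
        };

        var response = await client.QueryAsync(request);
        // response.LastEvaluatedKey must be passed into the next call to continue paging;
        // there is no way to jump straight to "page 10".
        return response;
    }
}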
EXAMPLE
You wrote that you have 5 parameters you want to query on. The workaround used to address these limitations is to create and manage extra fields that contain combinations of the parameters you want to query. Let's say you have a table of users with gender, age, name, surname and position. Let's say it's a huge database, so you have to think about the amount of data you can load. Then, if you want to use DynamoDB, you have to think about all the queries you will want to do.
You most likely want to search by name and surname, so you create an index with surname as the partition key and name as the sort key (that way you can search by surname alone or by both surname and name). That can work for a lot of names, but then you find out that some name combinations are too common and you need to filter by position as well. In that case, you create a new field (column) called, for example, name-surname, and whenever you create or update an item, your app has to keep this field in sync so that it contains both values, e.g. will-smith. Then you can make another index that has name-surname as the partition key and position as the sort key, and use it for such searches.
However, you then find out that for some name-surname-position combinations you get too many results, and you don't want to handle that at the application level: you want to limit the results by age as well. So you create an index with name-surname-position as the partition key and age as the sort key. At this point you may also figure out that your old name-surname field and its index can be removed, as they no longer serve a purpose (name and surname searches are handled by the first index, and name-surname-position searches can use this new index).
Do you want to query by gender as well sometimes? It's probably better to handle that at the application level (or as an extra filter in the db query) rather than creating a new index that must be maintained and paid for. There are only two types of gender (ok, let's say more exist, but 99% of people will have just male or female), so it's probably cheaper to hide a few records at the application level when someone wants to see only one gender, and just load all of them. For an extra index you would have to pay on every single insert, while this filter would only be used from time to time. Also, when someone already searches by name, surname and position, you don't expect that many results anyway, so whether you get 20 results (all genders) or just 10 (male only) does not make much difference.
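A minimal sketch of the write path for such a derived field, again with the AWS SDK for .NET and made-up table/attribute names: the important part is that the application composes name-surname-position itself on every create/update.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class UserWriter
{
    // The GSI is assumed to use "name-surname-position" as partition key and "age" as sort key.
    public static Task PutUser(IAmazonDynamoDB client,
        string name, string surname, string position, int age)
    {
        var item = new Dictionary<string, AttributeValue>
        {
            { "userId",   new AttributeValue { S = Guid.NewGuid().ToString() } },
            { "name",     new AttributeValue { S = name } },
            { "surname",  new AttributeValue { S = surname } },
            { "position", new AttributeValue { S = position } },
            { "age",      new AttributeValue { N = age.ToString() } },
            // the derived field the application must keep in sync on every create/update
            { "name-surname-position",
              new AttributeValue { S = (name + "-" + surname + "-" + position).ToLowerInvariant() } }
        };
        return client.PutItemAsync(new PutItemRequest { TableName = "Users", Item = item });
    }
}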
The above was just an example of how you can think about and work with DynamoDB. How exactly you use it depends on your business logic.
A very important note: DynamoDB is a very simple database that can only do very simple queries. It has a little more functionality than Redis, but a lot less functionality than traditional databases. A valid outcome of thinking about your business model / use cases is that maybe you should NOT use DynamoDB at all, because it simply cannot satisfy your needs and queries.
Some basic thinking can look like this:
Is a key-value persistent storage enough? Use DynamoDB.
Is a key-value persistent storage enough, where one item can have multiple keys and I can search and filter by a maximum of 2 fields? Use DynamoDB.
Is a persistent storage enough, where I want to search a single table/collection by many different keys with lots of options? Use MongoDB.
Do I need to search through multiple tables, do complex joins, or need transactions? Use a traditional SQL database.

How to develop a "Check All" feature on a large, paginated grid/table

I am looking for advice on best practices regarding the following scenario.
I have a grid that supports server-side pagination.
Let's say that there are 100,000 records in the grid, and each page size is 100 records.
Let's say they "Select All".
Considering the records are paginated, do you store a list of "selected" records client-side or server-side?
My first pass was to have the service call that populates the grid also return a comma-delimited list of all the records' primary keys for future reference, alongside the normal data set, but this seems clunky, or even dangerous, with large data sets.
My second pass was to store all the records in a special table for future reference, but this adds a lot of overhead in the form of table management to the process.
Or something else? Thanks!

ASP.NET MySQL Data Access Layer

I have an ASP.NET application that accesses a MySQL database. For that I made a class with all the queries I need to retrieve data from the database.
In order to bring back from the database just the info I need, I have a lot of queries. For example:
One query that gets the NAME and DATE from the NEWS table.
Another query that gets the NAME, DATE and TEXT from the NEWS table.
I do this because on some pages I just need the name and date, and on others I also need the text.
What do you think would be better for performance: having just one query and getting all the information even if I don't use some of the fields on some pages, or having a query for each case?
This has been a very simple example; in some cases I have many more fields...
Thanks.
It really depends on how often you create a connection with the database. For example, if your page loads and some parts of the page use the first query while others use the second, there is a benefit in executing the second query only once for both and distributing the data as needed. You save on unnecessary connections, and that does result in a performance gain. However, if you have different pages calling different methods and you cannot reduce the number of calls, you can keep both methods and call the one that selects only what you need.
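For example (a sketch only; the class name and connection details are made up), the wider query can be run once per page and its result handed to both the "name/date" view and the "full text" view:

using System.Data;
using MySql.Data.MySqlClient;

public class NewsRepository
{
    private readonly string _connectionString;
    private DataTable _newsCache;   // filled at most once per page lifetime

    public NewsRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public DataTable GetNews()
    {
        if (_newsCache == null)
        {
            using (var conn = new MySqlConnection(_connectionString))
            using (var adapter = new MySqlDataAdapter("SELECT NAME, DATE, TEXT FROM NEWS", conn))
            {
                _newsCache = new DataTable();
                adapter.Fill(_newsCache);   // one round trip; both views read from this table
            }
        }
        return _newsCache;
    }
}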

How to setup data model for customizable application

I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I'm in need of suggestions for handling the data model for this. I have two thoughts on how to handle it. The first would be to create a new table for each new module for each client. This is pretty clean, but I don't particularly like it. My other thought is to have one table with columns for each custom data point for each client. This table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback you have will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary)
Benefits:
Infinitely flexible
Easy to implement
Easy to index
Non-existing values take no space
Disadvantage:
Showing a list of all records with their complete field list requires a very dirty query
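One way to soften that disadvantage is to read the key-value rows as-is and pivot them in the ASP.NET code rather than in SQL. A rough sketch (the column names follow the CustomFields definition above; everything else is assumed):

using System.Collections.Generic;
using System.Data.SqlClient;

public static class CustomFieldLoader
{
    // Returns clientId -> (fieldName -> fieldValue); fields a client never set are simply absent.
    public static Dictionary<int, Dictionary<string, object>> LoadAll(string connectionString)
    {
        var result = new Dictionary<int, Dictionary<string, object>>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT clientId, fieldName, fieldValue FROM CustomFields", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    int clientId = reader.GetInt32(0);
                    Dictionary<string, object> fields;
                    if (!result.TryGetValue(clientId, out fields))
                    {
                        fields = new Dictionary<string, object>();
                        result[clientId] = fields;
                    }
                    fields[reader.GetString(1)] = reader.GetValue(2);
                }
            }
        }
        return result;
    }
}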
The Microsoft way
The Microsoft way of handling this kind of problem is "sparse columns" (introduced in SQL Server 2008)
Benefits:
Blessed by the people who design SQL Server
Records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantage:
Many technical restrictions
A new field requires DDL (a schema change)
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
Unlimited flexibility
Can be indexed
Storage efficient (when it fits in a page)
With some xpath gymnastics the fields can be included in a flat recordset (see the sketch after this list)
Schema can be enforced with XML schema collections
Disadvantages:
Not clearly visible what's in the field
XQuery support in SQL Server has gaps, which sometimes makes getting your data out a real nightmare
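To give an idea of the "xpath gymnastics" mentioned in the benefits list, this is roughly what flattening two xml-stored fields into a recordset looks like from ASP.NET; the ExtraFields column, the Modules table and the element names are assumptions:

using System;
using System.Data.SqlClient;

public static class ExtraFieldReader
{
    private const string Sql = @"
        SELECT Id,
               ExtraFields.value('(/fields/color)[1]', 'nvarchar(50)') AS Color,
               ExtraFields.value('(/fields/size)[1]',  'int')          AS Size
        FROM   Modules";   // assumed table with an xml column called ExtraFields

    public static void Dump(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(Sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // the column comes back NULL when the element is missing from the xml
                    Console.WriteLine("{0}: {1} / {2}", reader.GetInt32(0), reader[1], reader[2]);
                }
            }
        }
    }
}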
There are maybe more solutions, but to me these are the main contenders. Which one to choose:
Key-value seems appropriate when the number of extra fields is limited (say, no more than 10-20 or so).
Sparse columns are more suitable for data with many properties that are filled out infrequently; they sound more appropriate when you can have many extra fields.
An xml column is very flexible but a pain to query. It is appropriate for solutions that write rarely and query rarely, i.e. don't run aggregates etc. on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't over think it. The second option you outlined would be a bad idea in my opinion.
If there are fields common to all the modules you're adding to the system you should consider keeping those in a single table then have other tables with the fields specific to a particular module related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.

.NET Object Design

I have a series of objects I have created:
Item
Order
Song
etc.
Each object has a reasonable number of properties, and I use a DataReader where I pass it "SELECT * FROM <objectname>", fill a collection of objects, and return the collection. This works as: GetOrdersCollection(), GetSongsCollection(), etc.
I understand SELECT * to be a performance problem, and additionally, sometimes I prefer to include additional columns in the select statement which do not exist in the object, and have those all returned as well.
So my question is, what is the best way to approach this problem?
Should I create a new object for every query type?
I tried performing a check to see if column is in datareader before storing it, but this presents perf. issues. Is there a negligible perf. way to avoid IndexOutOfRange?
Should I just use Datatable and read right from the table?
I understand SELECT * to be a performance problem,
It's not a performance problem if there are only a few columns, or you need all of the columns anyway.
1. Should I create a new object for every query type?
You should create a new object for each table, and a new method for each query type.
2. I tried performing a check to see if column is in datareader before storing it, but this presents perf. issues. Is there a negligible perf. way to avoid IndexOutOfRange?
If you are referring to your fields by name rather than by index, there shouldn't be any IndexOutOfRange problems. If you are referring to your fields by index, you can loop through them while your index is less than the reader's FieldCount, and there shouldn't be any IndexOutOfRange problems.
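For example (a sketch; the query and column names are made up), an optional column can be located once per result set rather than probed on every row, so no IndexOutOfRange or missing-column exceptions are ever thrown inside the loop:

using System;
using System.Data.SqlClient;

public static class OrderReader
{
    public static void ReadOrders(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT OrderId, CustomerName, Notes FROM Orders", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // Resolve the optional column's ordinal once, before the row loop.
                int notesOrdinal = -1;
                for (int i = 0; i < reader.FieldCount; i++)
                    if (string.Equals(reader.GetName(i), "Notes", StringComparison.OrdinalIgnoreCase))
                        notesOrdinal = i;

                while (reader.Read())
                {
                    int orderId = (int)reader["OrderId"];            // access by name
                    string customer = (string)reader["CustomerName"];
                    string notes = (notesOrdinal >= 0 && !reader.IsDBNull(notesOrdinal))
                        ? reader.GetString(notesOrdinal)
                        : null;
                    Console.WriteLine("{0} {1} {2}", orderId, customer, notes);
                }
            }
        }
    }
}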
3. Should I just use a DataTable and read right from the table?
That's a perfectly good approach to start out with. Consider spending some time to learn a simple ORM as others have suggested. Subsonic is a good "first" ORM.
Performance-wise, reading from a forward-only data structure like a DataReader is going to net you the best performance and resource conservation.
On the other hand, the cost of populating objects (like an OR/M does) can be negligible so long as you are not returning more than a handful of objects.
Your first step should be to profile your database and ensure that you have proper indexes. Write some tests to see where your largest time expense is in the process and optimize the target areas that cost you the most.
Are there any reasons you can't use a simple ORM generator like SubSonic? This will allow you to very easily access these types of collections, and they'll be strongly typed. You also won't have to worry about the SQL since the queries will be built by SubSonic.
