Dealing with huge amount of select statements

Dealing with huge amount of select statements - asp.net

i'm designing an ASP.NET Application wich builds an Overview of all the sales per partner in a period of time.
How it works so far:
Select all partnerNo(SQL-Server) and add to List(ASP.NET)
Select sales of partnerNo1 over period of time(SQL-Server), summarize them(ASP.NET) and add them to a DataTable(ASP.NET)
Select sales of partnerNo2 over period of time, summarize them and add them to a datatable
Select sales of partnerNo3 over period of time, summarize them and add them to a datatable
and so on
Now here is the Problem: if i select only the TOP 100 partnerNo, if takes a while, but i get a result. If i change the TOP to 1000, the SQL-Server processes the SQL-Statements
(can see him working in activitymonitor), and the iis-server is feeding him the new SQL-Selects... but after a while, the iis is terminating the page-request from the browser, so no result is shown
i really hope, i could explain it enough for someone to help me.
With regards
Dirk Th.

That's the RBAR anti-pattern. It should be possible to create one SQL query that returns summarized information from all partners.
That's typically much faster: less data has to go over the line, and less often. A roundtrip to a database can cost 50ms. If you do 600 of those, you're at the 30 second timeout for web pages.

If you have Framework 4.5, AND getting the summary data for each partnerN is mutually exclusive, you can try parallel tasks.
http://msdn.microsoft.com/en-us/library/dd460720.aspx
Now, that's not a simple subject. But it would allow you to take advantage of multiple processors.
Number one rule. You CANNOT RELY ON SEQUENCE.
........
Option 2, a more "traditional" approach is to hit the database for everything you need.
I would abandon DataTables, and start using DTO or POCO objects.
Then you can author mini "read only properties" that replace your calculated/derived data-table columns.
Go to the database, do not use cursors or looping, and hit the database for all the info you need. After you get it back, stuff it into DTOs/POCO's rely on read-only properties where you can (for derived values)..........and then if you have to run some business logic to figure-out some derived values, then do that.
If you're "stuck" with a DataSet/DataTable for the presentation layer, you can do loop over your DTOs/POCOs and stuff them into a DataSet/DataTable.

Related

Need to run multiple copies of ASP.net website with datagridview sharing single url C#

Have drop-down menu which fills 4 datagridviews based on the branch selected or when the start button is pressed loops through 80 branches.
4 sql server procs, 1 per datagridview, unique sql table, read access, only.
Need to access multiple copies, single url.
Database retrieval time = # of copies run (single asp.net websites over single url called multiple times) * database runtime.
So if it takes 30 seconds for data retrieval, running 3 copies takes 90 seconds and seems to fragment the data or timeout..
I'm using nolocks so there isn't deadlock.
But I need to optimize this.
Should I create one web service and will this solve the problem of hitting the database only one time instead of 1x per single url hit.
Thank you.
David

Thank you, the timer was taking over and performing differently on the server than on my local. Also the UI, timer, and Database were out of synch. So adding a thread.sleep helped. Adding a longer interval on the timer, helped. Also putting all the database calls together, instead of 1 connection per database call helped. Now it runs all the same time.
The main takeaway I think is that the timer and the Thread.Sleep was the main thing.
I also had a UI button - which I added some code so that once it's pushed, if you keep pushing it, it doesn't do anything.
Thank you to everyone that posted answer..

Well, this will come down to not really the numbers of records pulled, but that if you are executing multiple SQL statements over and over.
I mean, to fill 4 gv's with 4 queries? That's going to be quite much instant assuming the record set size for each grid is say in the 100 row range. Such a button click and filling the grids should be very low time.
And even if you using a row databound event - once again, it will run fast. But ONLY if you not executing a whole bunch of additional SQL queries. So the number of "hits" or so called SQL statements sent to the database is what for the most part determine the speed of this setup.
So say you have one grid - pulls 100 rows. But then the next grid say needs data based on 100 rows of "new" SQL queries. In that case, what you can often do is fill a reocrdset with the child data - and filter against that recordset, and thus say not have to execute 100 SQL queries, but only 1 query.
So, this will really come down to how many separate SQL queries you execute in total.
To fill 4 grids with 4 queries? I don't see that as being a problem, and thus we are no doubt missing some big "whopper" of a detail you not shared with us.
Expand in your question how many SQL statements are generated in total - that's the bottle neck here. Reduce that, and your performance should be just fine.
And if the 4 simple stored procedures have "cursors" that loop and again generate many SQL commands - get rid of that.
4 basic SQL queries pulls is nothing - something else is at work that you not sharing. Why would each single stored procedure take so very long? That's the detail we are missing here.

MS SQL product list with filtering

I'm building an application in ASP.NET(VB) with a MS SQL database. It is a search tool for cars that has a list of every car and all of their attributes (colors, # of doors, gas milage, mfg. year, etc). This tool outputs the results in a gridview and the users has the ability to perform advanced searches and filtering. The filtering needs to be very fine-grained (range of gas milage, color(s), mfg year range, etc.) and I cannot seem to find the best way to do this filtering without a large SQL where statement that is going to greatly impact SQL performance and page load. I feel like I'm missing something very obvious here, thank you for any help. I'm not sure what other details would be helpful.

This is not an OLTP database you're building--it's really an analytics database. There really isn't a way around the problem of having to filter. The question is whether the organization of the data will allow seeks most of the time, or will it require scans; and also whether the resulting JOINs can be done efficiently or not.
My recommendation is to go ahead and create the data normalized and all, as you are doing. Then, build a process that spins it into a data warehouse, denormalizing like crazy as needed, so that you can do filtering by WHERE clauses that have to do a lot less work.
For every single possible search result, you have a row in a table that doesn't require joining to other tables (or only a few fact tables).
You can reduce complexity a bit for some values such as gas mileage, by striping the mileage into bands of, say, 5 mpg. (10-19, 20-24, 25-29, etc.)
As you need to add to the data and change it, your data-warehouse-loading process (that runs once a day perhaps) will keep the data warehouse up to date. If you want more frequent loading that doesn't keep clients offline, you can build the data warehouse to an alternate node, then swap them out. Let's say it takes 2 hours to build. You build for 2 hours to a new database, then swap to the new database, and all your data is only 2 hours old. Then you wipe out the old database and use the space to do it again.

Strategy for handling variable time queries?

I have a typical scenario that I'm struggling with from a performance standpoint. The user selects a value from a dropdown and clicks a button. A stored procedure takes that value as an input parameter, executes, and returns the results to a grid. For just one of the values ('All'), the query runs for roughly 2.5 minutes. For the rest of the values the query runs less than 1ms.
Obviously, having the user wait for 2.5 minutes just isn't going to fly. So, what are some typical strategies to handle this?
Some of my own thoughts:
New table that stores the information for the 'All' value and is generated nightly
Cache the data on the caching server
Any help is appreciated.
Thanks!
Update
A little bit more info:
sp returns two result sets. The first is a group by rollup summary and the second is the first result set, disaggregated (roughly 80,000 rows).

I would first look at if your have the proper indexes in place. Using the Query Analyzer and the Database Tuning Assistant is a simple and often effective way of seeing what indexes might help.
If you still have performance problems after creating the appropriate indexes you might then look at adding tables/views to speed things up. If your query does a lot of joins you might consider creating an indexed view that allows you to do a select with no joins on the denormalized data. Since indexed views are persisted you can see big gains from their use.
You can read up on indexed views here:
http://msdn.microsoft.com/en-us/library/dd171921%28v=sql.100%29.aspx
and read about the database tuning adviser here:
http://msdn.microsoft.com/en-us/library/ms166575.aspx
Also, how many records does "All" return? I have seen people get hung up on the "All" scenario before, but if it returns 1 million records or something then the data is not usable to a person anyways...

Caching data is a good thing, but.... if the SP is inherently flawed, then you might want to actually fix it instead of trying to bandage it with caching.
You might also want to (since you didn't mention here) look at the number of rows "All" returns compared to the other selections and think about your indexes.
Also in your SP does the "All" cause it to run a different sets of tsql as in maybe a case or an if... or is it running the same code just with a different "WHERE"?
It might simply be that "ALL" just returns A LOT of records. You may want to implement paging and partial dataset return using ajax... (kinda like return the first 1000 records early so that it can be displayed and also show a throbber on the screen while the rest of the dataset is returned)
These are all options... if the number of records really isnt that different between ALL and the others... then it probably has something to do with the query/index/program flow.

Autocomplete optimization for large data sets

I am working on a large project where I have to present efficient way for a user to enter data into a form.
Three of the fields of that form require a value from a subset of a common data source (SQL Table). I used JQuery and JQuery UI to build an autocomplete, which posts to a generic HttpHandler.
Internally the handler uses Linq-to-sql to grab the data required from that specific table. The table has about 10 different columns, and the linq expression uses the SqlMethods.Like() to match the single search term on each of those 10 fields.
The problem is that that table contains some 20K rows. The autocomplete works flawlessly, accept the sheer volume of data introduces deleays, in the vicinity of 6 seconds or so (when debugging on my local machine) before it shows up.
The JqueryUI autocomplete has 0 delay, queries on the 3 key, and the result of the post is made in a Facebook style multi-row selectable options. (I almost had to rewrite the autocomplete plugin...).
So the problem is data vs. speed. Any thoughts on how to speed this up? The only two thoughts I had were to cache the data (How/Where?); or use straight up sql data reader for data access?
Any ideas would be greatly appreciated!
Thanks,
<bleepzter/>

I would look at only returning the first X number of rows using the .Take(10) linq method. That should translate into a sensbile sql call, which will put much less load on your database. As the user types they will find less and less matches, so they will only see that data they require.
I'm normally reckon 10 items is enough for the user to understand what is going on and still get to the data they need quickly (see the amazon.com search bar for an example).
Obviously if you can sort the data in a meaningful fashion then the 10 results will be much more likely to give the user what they are after quickly.

Returning the top N results is a good idea for sure. We found (querying a potential list of 270K) that returning the top 30 is a better bet for the user finding what they're looking for, but that COMPLETELY depends on the data you are querying.
Also, you REALLY should drop the delay to something sensible like 100-300 ms. When you set delay to ZERO, once you hit the 3-character trigger, effectively EVERY. SINGLE. KEY. STROKE. is sent as a new query to your server. This could easily have the unintended and unwelcome effect of slowing down the response even MORE.

Which is fastest? Data retrieval

Is it quicker to make one trip to the database and bring back 3000+ plus rows, then manipulate them in .net & LINQ or quicker to make 6 calls bringing back a couple of 100 rows at a time?

It will entirely depend on the speed of the database, the network bandwidth and latency, the speed of the .NET machine, the actual queries etc.
In other words, we can't give you a truthful general answer. I know which sounds easier to code :)
Unfortunately this is the kind of thing which you can't easily test usefully without having an exact replica of the production environment - most test environments are somewhat different to the production environment, which could seriously change the results.

Is this for one user, or will many users be querying the data? The single database call will scale better under load.

Speed is only one consideration among many.
How flexible is your code? How easy is it to revise and extend when the requirements change? How easy is it for another person to read and maintain your code? How portable is your code? what if you change to a diferent DBMS, or a different progamming language? Are any of these considerations important in your case?
Having said that, go for the single round trip if all other things are equal or unimportant.
You mentioned that the single round trip might result in reading data you don't need. If all the data you need can be described in a single result table, then it should be possible to devise a query that will get that result. That result table might deliver some result data in more than one row, if the query denormalizes the data. In that case, you might gain some speed by obtaining the data in several result tables, and composing the result yourself.
You haven't given enough information to know how much programming effort it will be to compose a single query or to compose the data returned by 6 queries.
As others have said, it depends.

If you know which 6 SQL statements you're going to execute beforehand, you can bundle them into one call to the database, and return multiple result sets using ADO or ADO.NET.
http://support.microsoft.com/kb/311274

the problem I have here is that I need it all, i just need it displayed separately...
The answer to your question is 1 query for 3000 rows is better than 6 queries for 500 rows. (given that you are bringing all 3000 rows back regardless)
However, there's no way you're going (to want) to display 3000 rows at a time, is there? In all likelihood, irrespective of using Linq, you're going to want to run aggregating queries and get the database to do the work for you. You should hopefully be able to construct the SQL (or Linq query) to perform all required logic in one shot.
Without knowing what you're doing, it's hard to be more specific.
* If you absolutely, positively need to bring back all the rows, then investigate the ToLookup() method for your linq IQueryable< T >. It's very handy for grouping results in non-standard ways.
Oh, and I highly recommend LINQPad (free) for trying out queries with Linq. It has loads of examples, and it also shows you the sql and lambda forms so you can familiarize yourself with Linq<->lambda form<->Sql.

Well, the answer is always "it depends". Do you want to optimize on the database load or on the application load?
My general answer in this case would be to use as specific queries as possible at the database level, therefore using 6 calls.

Thx
I was kind of thinking "ball park", but it sounds as though its a choice thing...the difference is likely small.
I was thinking that getting all the data and manipulating in .net would be the best - I have nothing concrete to base this on (hence the question), I just tend to feel that calls to the DB are expensive and if I know i need all the data...get it in one hit?!?

Part of the problem is that you have not provided sufficient information to give you a precise answer. Obviously, available resources need to be considered.
If you pull 3000 rows infrequently, it might work for you in the short term. However, if there are say 10,000 people that execute the same query (ignoring cache effects), this could become a problem for both the app and db.
Now in the case of something like pagination, it makes sense to pull in just what you need. But that would be a general rule to try to only pull what is necessary. It's much more elegant to use a scalpel instead of a broadsword. =)

If you are talking about a query that has already been run by SQL (so optimized by SQL Server), working with LINQ or a SqlDataReader might actually have the same performance.
The only difference will be "how hard will it be to maintain your code?"
LINQ doesn't query anything to the database until you ask for the result with ".ToList()" or ".ToArray()" or even ".Count()". LINQ is dynamically building your query so it is exactly the same as having a SqlDataReader but with runtime verification.

Rather than speculating, why don't you try both and measure the results?

It depends
1) if your connector implementation precaches a lot of objects AND you have big rows (for example blobs, contry polygons etc.) you have a problem, you have to download a LOT of data. I've optimalized once a code that had this problem and it was just downloading some megs of garbage all the time via localhost, and my software runs now 10 times faster because i removed the precaching by an option
2) If your rows are small and you have a good chance that you need to read through all the 3000, you're better going on a big resultset
3) If you don't use prepared statements, all queries have to be parsed! Big resultset might be better.
Hope it helped

I always stick to the rule of "bring in what I need" and nothing more...the problem I have here is that I need it all, I just need it displayed separately.
So say...
I have a table with userid and typeid. I want to display all records with a userid, and display on the page in grids say separated by typeid.
At the moment I call sproc that does "select field1, field2 from tab where userid=1",
then on the page set the datasource of a grid to from t in tab where typeid=2 select t;
Rather than calling a different sproc "select field1, field2 from tab where userid=1 and typeid=2" 6 times.
??

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex