How much data is too much for an array? - apache-flex

I have a Flex webapp that retrieves some names & addresses from a database. The project works fine but I'd like to make it faster. Instead of making a call to the database for each name request, I could pre-load all names into an array & filter the array when the user makes a request. Before I go down this route, though, I wanted to check whether it is even feasible to have an application with 50,000 or 1 million elements in an array. What is the limit before it slows down the app? (I anticipate that it will have a lot to do with what else is going on in my app, but for the sake of this question let's just assume the app ONLY consists of this huge array.)

Searching through a large array can be slower than necessary, particularly if you're talking about 1 million records.
Can you split it into a few still-large-but-smaller arrays? If you're always searching by account number, then divide them up based on the first digit or two digits.
To directly answer your question though, pure AS3 processing of a 50,000 element array should be fine. Once you get over 250,000 I'd think you need to break it up.
Displaying that many UI elements is different though. If you try to bind a chart to a dataProvider with 10,000 elements, it's too much. Same for a list or datagrid.
But for pure model data, not ui bound, I'd recommend up to 250,000 in my experience.

If you're loading large amounts of data (not sure if you're using Lists though), you could check out James Ward's post about using AsyncListView with paging to grab the data in chunks as it's needed. Gonna try and implement something like this soon. His runnable example uses 100,000 rows with paging of 100 (works with HttpService/AMF type calls):
http://www.jamesward.com/2010/10/11/data-paging-in-flex-4/
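For anyone who just wants the idea behind paging on demand, here is a rough, language-neutral sketch in Python. It is not James Ward's code and not the AsyncListView API; `PagedList`, `fetch_page` and `fake_fetch` are made-up names, and the fake fetch function stands in for whatever HttpService/AMF call you would really make.

```python
class PagedList:
    """List-like view that fetches one page at a time, only when it is touched."""

    def __init__(self, total, page_size, fetch_page):
        self.total = total
        self.page_size = page_size
        self.fetch_page = fetch_page  # hypothetical callback: (offset, limit) -> list of rows
        self.pages = {}               # page number -> rows already downloaded

    def __getitem__(self, index):
        if not 0 <= index < self.total:
            raise IndexError(index)
        page_no = index // self.page_size
        if page_no not in self.pages:
            # First access to this page: fetch it once and keep it.
            self.pages[page_no] = self.fetch_page(page_no * self.page_size, self.page_size)
        return self.pages[page_no][index % self.page_size]


calls = []  # records which offsets were actually requested


def fake_fetch(offset, limit):
    """Stand-in for a remote call; returns synthetic rows."""
    calls.append(offset)
    return [f"row {i}" for i in range(offset, min(offset + limit, 100_000))]


rows = PagedList(total=100_000, page_size=100, fetch_page=fake_fetch)
```

Reading `rows[250]` and then `rows[299]` triggers a single fetch for offset 200; the other 99,900 rows are never downloaded unless the user scrolls to them.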

Yes, you could probably stuff a few million items in an array if you wanted to, and the Flash player wouldn't yell at you. But do you really want to?
Is the application going to take longer to start if it has to download the entire database locally before being able to work? If the additional time needed to download that much data isn't significant, are a few database lookups really worth optimizing?
If you have a good use case to do this, you're going to have to pay attention to the way you use those data structures. Looping over the array to find an item is going to be a bit slow, so you'll want to create indexes locally, most likely by using a few hash structures. The more flexible you allow the search queries to be, the more interesting the indexing issues will be.
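To make the local-index idea concrete, here is a minimal sketch in Python (in AS3 you would probably use an Object or Dictionary for the same purpose). The record fields and the `build_index` / `lookup` helpers are invented for illustration, and it only covers exact, case-insensitive matches; prefix or substring search needs a different structure.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each lower-cased field value to the positions of the records holding it."""
    index = defaultdict(list)
    for position, rec in enumerate(records):
        index[rec[field].lower()].append(position)
    return index


def lookup(index, records, value):
    """Return matching records without looping over the whole array."""
    return [records[i] for i in index.get(value.lower(), [])]


people = [
    {"name": "Smith", "city": "Austin"},
    {"name": "Jones", "city": "Boston"},
    {"name": "smith", "city": "Denver"},
]

# One index per searchable field; each one costs extra memory.
by_name = build_index(people, "name")
by_city = build_index(people, "city")
```

Building each index is a single pass over the array, after which a lookup costs roughly the same whether the array holds 50,000 or 1 million records. The price is memory: every field you make searchable adds another structure to keep around.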

Related

J2ME - Read directly from RecordStores or store records in a Vector as they're added and retrieve them from that vector later

How do you usually work with the data contained in a RecordStore?
1) Do you always "query" the RecordStore directly when you have to perform some operations over its records (searching, sorting, etc.), or
2) do you "cache" those records in a vector or array so that you query that vector or array later, instead of the RecordStore?
Personally, I was following the second approach until yesterday when I got a nasty exception, reminding me that memory is a luxury we should be really careful about when developing j2me apps :S
Taking memory into consideration, now I'm not really sure that keeping arrays would be such a good idea.
In any case, I would like to hear your opinions, guys. After all, you've got more experience.
Thanks for your time.
That depends on the number of records and the size of each record.
If you have already had an OOME (OutOfMemoryError) with the Vector approach, then try to work with only a single record at a time.
If you structure your records well, you can do some fast searches on them. String searches will probably be slower.
Keep in mind that, although RMS has no fixed max size, it is advisable to call RecordStore.getSizeAvailable to give you an idea of how much info you can store in a given device.
Here is a good tutorial on RMS:
http://www.ibm.com/developerworks/library/j-j2me3/

ASP.net 3.5 Webservice returns large dataset

I know there are lots of similar questions out there like this, but all of the solutions are either ones I cannot use or ones that do not work. The basic issue is that I have to make a web service call that returns a typed dataset. This dataset can have 30,000 rows or more in some cases. So my issue is how do I get the page to be more responsive and perhaps load everything while the web service is still downloading the dataset?
Please note that normally I would never return this amount of data and would instead do paging on the server side, but the requirements for this really lock down what I can do. I can make the web service return JSON if need be, but my problem at that point is how to get the JSON data back into a format that the GridView could use to bind the data. I know there is an external library out there, but that is out as well.
Sad to say that the restrictions I have here are pretty obscene, but they are what they are and I cannot really change them.
TIA
-Stanley
A common approach to this kind of scenario is to page (in chunks) your data as it comes back. Do this asynchronously (separate thread). You might even be able to do this in only two chunks: first 1000 rows, then the rest. It will seem very responsive to your users. If there is any way to require the users to filter the result-set, to reduce the result-set, that would be ideal.
@Lostdreamer is right. Use jQuery to do two AJAX calls. The first call gets the first 1000 rows, then kicks off the second call (etc). Honestly, this is simply simulating what HTTP typically does (limiting packet sizes and loading multiple chunks).
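The "first 1000 rows, then the rest" split suggested above looks roughly like this on the data side. This is only a Python sketch of the chunking logic, not ASP.NET or jQuery code; `split_into_chunks` is a made-up helper, and in the real page each chunk would be the payload of one AJAX call.

```python
def split_into_chunks(rows, first_chunk=1000):
    """Yield a small first chunk for a fast first paint, then everything else."""
    yield rows[:first_chunk]
    rest = rows[first_chunk:]
    if rest:
        # Only produce a second chunk when there is something left to send.
        yield rest


all_rows = list(range(30_000))  # stand-in for the 30,000-row dataset
chunks = list(split_into_chunks(all_rows))
```

The first chunk can be bound to the grid right away while the second one is still downloading, which is what makes the page feel responsive even though the total amount of data has not changed.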

ArrayCollection fast filtering

I have the following situation: have a screen with 4 charts, each being populated with 59 array collections. Each array collection has 2000+ data points.
This construct needs to be filtered (by time), but the problem I'm encountering is that it takes a long time for the filtering to finish (which is to be expected given the number of data points that need to be filtered).
Filtering one chart at a time is not an option, so I wanted to ask what you think would be the best approach. (Should I use Vector instead?) To generalize this question, what would be the best way to filter large collections in Flex/AS3?
Thanks.
You'll have to squeeze out every bit of performance improvement that's possible and that's suited:
Use Vector if possible, and as much as you can. It has (contrary to what www.flextras.com posits) a filter method which accepts a filtering function.
ArrayCollections are slow. (In general, all Flex native classes are unnecessarily slow.) So if you really HAVE to use ArrayCollections, only use them to present the resulting Vectors.
If the problem is that the application "freezes", you can look into green threading so you can present the user with a progress bar; that way they at least have a sense of progress.
http://blog.generalrelativity.org/actionscript-30/green-threads/
I would suggest to filter large collections on the server. This has several benefits:
You may be able to minimize network traffic, because only the filtered data has to be transferred.
Server-side computing can be parallelized and is typically faster, because of more performant hardware and the server's runtime language (e.g. Java).
Service requests are done asynchronously, so your client application is not blocked.
Use Vectors wherever possible, and use green threading if you still can't manage. Internally we use a lot of Dictionaries to cache computed queries for later lookup. Dictionaries in AS3 are one of the fastest objects around. So we pre-filter in the background and store the various filtered collections in a Dictionary. Not sure if it works for your case.
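Here is a rough sketch of those two ideas together, written in Python rather than AS3: filtering in small slices so the UI can repaint between steps (the green-thread part), and caching each filtered result in a dictionary keyed by the time range. The function names, the slice size and the `(time, value)` tuple layout are all assumptions for illustration, and the cache assumes a single data set.

```python
def filter_in_slices(points, predicate, slice_size=500):
    """Filter `slice_size` points per step, yielding (progress, matches so far).

    A UI would run one step per frame or timer tick instead of one long loop.
    """
    result = []
    for start in range(0, len(points), slice_size):
        result.extend(p for p in points[start:start + slice_size] if predicate(p))
        done = min(start + slice_size, len(points))
        yield done / len(points), result


_cache = {}  # (t_min, t_max) -> filtered list; assumes one underlying data set


def filtered(points, t_min, t_max):
    """Return the points whose time lies in [t_min, t_max], reusing earlier results."""
    key = (t_min, t_max)
    if key not in _cache:
        result = []
        for _progress, result in filter_in_slices(points, lambda p: t_min <= p[0] <= t_max):
            pass  # a real UI would update a progress bar here
        _cache[key] = result
    return _cache[key]


points = [(t, t * 2) for t in range(2000)]  # synthetic (time, value) series
```

The second request for the same time range is a plain dictionary lookup, so switching back and forth between ranges the user has already seen costs almost nothing.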
Ok, so I googled around more for green threading and stumbled upon a project by gskinner (PerformanceTestv2). Testing the generation of data vs rendering time got me the following results:
[MethodTest name='Set gain series test: ' time=1056.0 min=1056 max=1056 deviation=0.000 memory=688] //filtering data source
[MethodTest name='Set gain series test: ' time=24810.0 min=24810 max=24810 deviation=0.000 memory=16488] //filtering + rendering.
Next I looked at how to improve the rendering time of the chart, but there was not much improvement. However, I did find a project based on Degrafa: Axiis. It has been ported to Flex 4, even 4.5 (Axiis 4.5). I've integrated charts based on this framework and the results are really great so far.

Autocomplete optimization for large data sets

I am working on a large project where I have to present an efficient way for a user to enter data into a form.
Three of the fields of that form require a value from a subset of a common data source (SQL table). I used jQuery and jQuery UI to build an autocomplete, which posts to a generic HttpHandler.
Internally the handler uses Linq-to-sql to grab the data required from that specific table. The table has about 10 different columns, and the linq expression uses the SqlMethods.Like() to match the single search term on each of those 10 fields.
The problem is that the table contains some 20K rows. The autocomplete works flawlessly, except that the sheer volume of data introduces delays, in the vicinity of 6 seconds or so (when debugging on my local machine), before the results show up.
The jQuery UI autocomplete has 0 delay, queries on the third keystroke, and renders the result of the post as Facebook-style multi-row selectable options. (I almost had to rewrite the autocomplete plugin...)
So the problem is data vs. speed. Any thoughts on how to speed this up? The only two thoughts I had were to cache the data (How/Where?); or use straight up sql data reader for data access?
Any ideas would be greatly appreciated!
Thanks,
<bleepzter/>
I would look at only returning the first X rows using the .Take(10) LINQ method. That should translate into a sensible SQL call, which will put much less load on your database. As the user types they will find fewer and fewer matches, so they will only see the data they require.
I normally reckon 10 items is enough for the user to understand what is going on and still get to the data they need quickly (see the amazon.com search bar for an example).
Obviously if you can sort the data in a meaningful fashion then the 10 results will be much more likely to give the user what they are after quickly.
Returning the top N results is a good idea for sure. We found (querying a potential list of 270K) that returning the top 30 is a better bet for the user finding what they're looking for, but that COMPLETELY depends on the data you are querying.
Also, you REALLY should drop the delay to something sensible like 100-300 ms. When you set delay to ZERO, once you hit the 3-character trigger, effectively EVERY. SINGLE. KEY. STROKE. is sent as a new query to your server. This could easily have the unintended and unwelcome effect of slowing down the response even MORE.
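As a rough illustration of the "top N only" and "3-character trigger" advice, here is an in-memory Python stand-in for what .Take(10) does. It does not show the LINQ-to-SQL translation, and `suggest`, `limit` and `min_chars` are invented names; the point is only that matching stops as soon as enough results have been found.

```python
from itertools import islice


def suggest(rows, term, limit=10, min_chars=3):
    """Return at most `limit` rows containing `term`, or nothing for short terms."""
    term = term.strip().lower()
    if len(term) < min_chars:
        return []  # same idea as the 3-character trigger: don't search yet
    matches = (row for row in rows if term in row.lower())
    # islice stops pulling from the generator once `limit` matches are found.
    return list(islice(matches, limit))


rows = [f"customer {i:05d}" for i in range(20_000)]  # synthetic 20K-row table
```

The debounce delay itself lives on the client side (the delay option of the autocomplete widget), so it is not shown here; this sketch only covers the server-side half of the advice.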

Which is fastest? Data retrieval

Is it quicker to make one trip to the database and bring back 3000+ rows, then manipulate them in .NET & LINQ, or quicker to make 6 calls bringing back a few hundred rows at a time?
It will entirely depend on the speed of the database, the network bandwidth and latency, the speed of the .NET machine, the actual queries etc.
In other words, we can't give you a truthful general answer. I know which sounds easier to code :)
Unfortunately this is the kind of thing which you can't easily test usefully without having an exact replica of the production environment - most test environments are somewhat different to the production environment, which could seriously change the results.
Is this for one user, or will many users be querying the data? The single database call will scale better under load.
Speed is only one consideration among many.
How flexible is your code? How easy is it to revise and extend when the requirements change? How easy is it for another person to read and maintain your code? How portable is your code? What if you change to a different DBMS, or a different programming language? Are any of these considerations important in your case?
Having said that, go for the single round trip if all other things are equal or unimportant.
You mentioned that the single round trip might result in reading data you don't need. If all the data you need can be described in a single result table, then it should be possible to devise a query that will get that result. That result table might deliver some result data in more than one row, if the query denormalizes the data. In that case, you might gain some speed by obtaining the data in several result tables, and composing the result yourself.
You haven't given enough information to know how much programming effort it will be to compose a single query or to compose the data returned by 6 queries.
As others have said, it depends.
If you know which 6 SQL statements you're going to execute beforehand, you can bundle them into one call to the database, and return multiple result sets using ADO or ADO.NET.
http://support.microsoft.com/kb/311274
"The problem I have here is that I need it all, I just need it displayed separately..."
The answer to your question is 1 query for 3000 rows is better than 6 queries for 500 rows. (given that you are bringing all 3000 rows back regardless)
However, there's no way you're going (to want) to display 3000 rows at a time, is there? In all likelihood, irrespective of using Linq, you're going to want to run aggregating queries and get the database to do the work for you. You should hopefully be able to construct the SQL (or Linq query) to perform all required logic in one shot.
Without knowing what you're doing, it's hard to be more specific.
* If you absolutely, positively need to bring back all the rows, then investigate the ToLookup() method for your LINQ IQueryable<T>. It's very handy for grouping results in non-standard ways.
Oh, and I highly recommend LINQPad (free) for trying out queries with Linq. It has loads of examples, and it also shows you the sql and lambda forms so you can familiarize yourself with Linq<->lambda form<->Sql.
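For readers who have not used ToLookup(), the idea is simply "fetch once, group in memory". Below is a rough Python equivalent, not C#; `to_lookup` is a hypothetical helper and the rows mimic the userid/typeid table described in the question.

```python
from collections import defaultdict


def to_lookup(rows, key):
    """Group rows by key(row); each key maps to the list of rows that share it."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[key(row)].append(row)
    return lookup


# Stand-in for the single "where userid=1" result set.
rows = [
    {"userid": 1, "typeid": 1, "field1": "a"},
    {"userid": 1, "typeid": 2, "field1": "b"},
    {"userid": 1, "typeid": 2, "field1": "c"},
]

by_type = to_lookup(rows, key=lambda r: r["typeid"])
```

Each grid can then take `by_type.get(typeid, [])` as its data source, so the database is hit once and the per-type split is one pass over the rows already in memory.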
Well, the answer is always "it depends". Do you want to optimize on the database load or on the application load?
My general answer in this case would be to use as specific queries as possible at the database level, therefore using 6 calls.
Thx
I was kind of thinking "ball park", but it sounds as though it's a choice thing...the difference is likely small.
I was thinking that getting all the data and manipulating it in .NET would be the best - I have nothing concrete to base this on (hence the question), I just tend to feel that calls to the DB are expensive and if I know I need all the data...get it in one hit?!?
Part of the problem is that you have not provided sufficient information to give you a precise answer. Obviously, available resources need to be considered.
If you pull 3000 rows infrequently, it might work for you in the short term. However, if there are say 10,000 people that execute the same query (ignoring cache effects), this could become a problem for both the app and db.
Now in the case of something like pagination, it makes sense to pull in just what you need. But that would be a general rule to try to only pull what is necessary. It's much more elegant to use a scalpel instead of a broadsword. =)
If you are talking about a query that has already been run by SQL (so optimized by SQL Server), working with LINQ or a SqlDataReader might actually have the same performance.
The only difference will be "how hard will it be to maintain your code?"
LINQ doesn't send anything to the database until you ask for the result with ".ToList()", ".ToArray()" or even ".Count()". LINQ builds your query dynamically, so it is essentially the same as using a SqlDataReader, but with runtime verification.
Rather than speculating, why don't you try both and measure the results?
It depends
1) If your connector implementation precaches a lot of objects AND you have big rows (for example blobs, country polygons etc.), you have a problem: you have to download a LOT of data. I once optimized code that had this problem; it was downloading some megs of garbage all the time via localhost, and my software now runs 10 times faster because I turned the precaching off with an option.
2) If your rows are small and there is a good chance that you need to read through all 3000, you're better off with one big result set.
3) If you don't use prepared statements, all queries have to be parsed! One big result set might be better.
Hope it helped
I always stick to the rule of "bring in what I need" and nothing more...the problem I have here is that I need it all, I just need it displayed separately.
So say...
I have a table with userid and typeid. I want to display all records with a userid, and display on the page in grids say separated by typeid.
At the moment I call a sproc that does "select field1, field2 from tab where userid=1",
then on the page I set the datasource of a grid to "from t in tab where typeid=2 select t;"
rather than calling a different sproc, "select field1, field2 from tab where userid=1 and typeid=2", 6 times.
??
