Is it possible to hook into GDI+ and save all strings sent to DrawString? - gdi+

I need to get a large table (300K+ rows) from an application and there is no export function.
After a lot of unsuccessful attempts I'm left with a Copy Paste macro that goes one row at a time. If there is a way to get the strings as they are drawn I could get a page(40 rows) at once.

If you aren't doing this for commercial purposes, you can use Detours to hook into drawstring really easily. There are some examples using detours on CodingTheWheel's blog series on a pokerbot. Even if the Detours option is unavailable, there is tons of information on windows api hooking on the web.

Related

How to scrape data on a web site that is changing contents in active manner?

I would like to grab satellite positions from the page(s) below, but I'm not sure if scraping is appropriate because the page appears to be updating itself every second using some internal code (it keeps updating after I disconnect from the internet). Background information can be found in my question at Space Stackexchange: A nicer way to download the positions of the Orbcomm-2 satellites.
I need a "snapshot" of four items simultaneously:
UTC time
latitude
longitude
altitude
Right now I use screen shots and manual typing. Since these values are being updated by the page - is conventional web-scraping going to work here? I found a "screen-scraping" tag, should I try to learn about that instead?
I'm looking for the simplest solution to get those four values, I wonder if I can just use urllib or urllib2 and avoid installing something new?
example page: http://www.satview.org/?sat_id=41186U I need to do 41179U through 41189U (the eleven Orbcomm-2 satellites that SpaceX just put in orbit)
Those values are calculated with a little math using javascript. The calculations are detailed here: https://www.satview.org/track.js
So, I guess one option is to write that script (plus any dependencies) in the language of your choice and use it to return your desired values.
There is one major function track() which takes arg $modo which can be one of two values - tic or plot.
There may be other source files (dependencies) referenced:
The easier way would probably be to use something which allows javascript to run e.g. automating a browser, and extracting the calculated values as they are generated.

Should I use Wordpress Transient API in this case?

I'm writing a simple Wordpress plugin for work and am wondering if using the Transients API is practical in this case, or if I should seek out another way.
The plugin's purpose is simple. I'm making a call to USZip Web Service (http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP) to retrieve data. Our sales team is using a Lead Intake sheet that the plugin will run on.
I wanted to reduce the number of API calls, so I thought of setting a transient for each zip code as the key and store the incoming data (city and zip). If the corresponding data for a given zip code already exists, then no need to make an API call.
Here are my concerns:
1. After a quick search, I realized that the transient data is stored in the wp_options table and storing the data would balloon that table in no time. Would this cause a significance performance issue if the db becomes huge?
2. Is this horrible practice to create this many transient keys? It could easily becomes thousands in a few months time.
If using Transient is not the best way, could you please help point me in the right direction? Thanks!
P.S. I opted for the Transients API vs the Options API. I know zip codes don't change often, but they sometimes so. I set expiration time of 3 months.
A less-inflated solution would be:
Store a single option called uszip with a serialized array inside the option
Grab the entire array each time and simply check if the zip code exists
If it doesn't exist, grab the data and save the whole transient again
You should make sure you don't hit the upper bounds of a serialized array in this table (9,000 elements) considering 43,000 zip codes exist in the US. However, you will most likely have a very localized subset of zip codes.

ASP.net 3.5 Webservice returns large dataset

I know there are lots of similar questions out there like this, but all of the solutions are eithers ones I cannot use or do not work. The basics of the issue is that I have to make a web service call that returns a typed dataset. This dataset can have 30,000 rows or more in some cases. So my issue is how do I get the page to be more responsive and perhaps load everything while the web service is still downloading the dataset?
Please note that normally I would never return this amount of data and would instead do paging on the server side, but the requirements for this really lock down what I can do. I can make the web service return JSon if need be, but my problem at that point is how to get the JSON data back into a format that the gridview could use to bind the data. I know there is an external library out there, but that is out as well.
Sad to say that the restrictions I have here are pretty obscene, but they are what they are and I cannot really change them.
TIA
-Stanley
A common approach to this kind of scenario is to page (in chunks) your data as it comes back. Do this asynchronously (separate thread). You might even be able to do this in only two chunks: first 1000 rows, then the rest. It will seem very responsive to your users. If there is any way to require the users to filter the result-set, to reduce the result-set, that would be ideal.
#Lostdreamer is right. Use JQuery to do two AJAX calls. The first call gets the first 1000 rows then kicks off the second call (etc). Honestly, this is simply simulating what HTTP typically does (limiting packet sizes and loading multiple chunks).

How much data is too much for an array?

I have a flex webapp that retrieves some names & addresses from a database. Project works fine but I'd like to make it faster. Instead of making a call to the database for each name request, I could pre-load all names into an array & filter the array when the user makes a request. Before I go down this route though I wanted to check if it is even feasible to have an application w/ 50,000 or 1 million elements in an array? What is the limit b/f it slows down the app? (I anticipate that it will have a lot to do w/ what else is going on in my app but for this sake lets just assume the app ONLY consists of this huge array).
Searching through a large array can be slower than necessary, particularly if you're talking about 1 million records.
Can you split it into a few still-large-but-smaller arrays? If you're always searching by account number, then divide them up based on the first digit or two digits.
To directly answer your question though, pure AS3 processing of a 50,000 element array should be fine. Once you get over 250,000 I'd think you need to break it up.
Displaying that many UI elements is different though. If you try to bind a chart to a dataProvider with 10,000 elements, it's too much. Same for a list or datagrid.
But for pure model data, not ui bound, I'd recommend up to 250,000 in my experience.
If your loading large amounts of data (not sure if your using Lists though), you could check out James Wards post about using AsyncListView with paging to grab the data in chuncks as its needed. Gonna try and implement something like this soon. His runnable example uses 100,000 rows with paging of 100 (works with HttpService/AMF type calls):
http://www.jamesward.com/2010/10/11/data-paging-in-flex-4/
Yes, you could probably stuff a few million items in an array if you wanted to, and the Flash player wouldn't yell at you. But do you really want to?
Is the application going to take longer to start if it has to download the entire database locally before being able to work? If the additional time needed to download that much data isn't significant, are a few database lookups really worth optimizing?
If you have a good use case to do this, you're going to have to pay attention to the way you use those data structures. Looping over the array to find an item is going to be a bit slow, so you'll want to create indexes locally, most likely by using a few hash structures. The more flexible you allow the search queries to be, the more interesting the indexing issues will be.

Autocomplete optimization for large data sets

I am working on a large project where I have to present efficient way for a user to enter data into a form.
Three of the fields of that form require a value from a subset of a common data source (SQL Table). I used JQuery and JQuery UI to build an autocomplete, which posts to a generic HttpHandler.
Internally the handler uses Linq-to-sql to grab the data required from that specific table. The table has about 10 different columns, and the linq expression uses the SqlMethods.Like() to match the single search term on each of those 10 fields.
The problem is that that table contains some 20K rows. The autocomplete works flawlessly, accept the sheer volume of data introduces deleays, in the vicinity of 6 seconds or so (when debugging on my local machine) before it shows up.
The JqueryUI autocomplete has 0 delay, queries on the 3 key, and the result of the post is made in a Facebook style multi-row selectable options. (I almost had to rewrite the autocomplete plugin...).
So the problem is data vs. speed. Any thoughts on how to speed this up? The only two thoughts I had were to cache the data (How/Where?); or use straight up sql data reader for data access?
Any ideas would be greatly appreciated!
Thanks,
<bleepzter/>
I would look at only returning the first X number of rows using the .Take(10) linq method. That should translate into a sensbile sql call, which will put much less load on your database. As the user types they will find less and less matches, so they will only see that data they require.
I'm normally reckon 10 items is enough for the user to understand what is going on and still get to the data they need quickly (see the amazon.com search bar for an example).
Obviously if you can sort the data in a meaningful fashion then the 10 results will be much more likely to give the user what they are after quickly.
Returning the top N results is a good idea for sure. We found (querying a potential list of 270K) that returning the top 30 is a better bet for the user finding what they're looking for, but that COMPLETELY depends on the data you are querying.
Also, you REALLY should drop the delay to something sensible like 100-300 ms. When you set delay to ZERO, once you hit the 3-character trigger, effectively EVERY. SINGLE. KEY. STROKE. is sent as a new query to your server. This could easily have the unintended and unwelcome effect of slowing down the response even MORE.

Resources