Getting Google Spreadsheet in the Background - asp.net

We have a Google Spreadsheet from which we wish to load data into our webpage.
I started by using the Google Spreadsheet API via C# and the Google API .NET libraries to read the spreadsheet and load it into an HTML unordered list.
The spreadsheet has about 200 rows, but could have more, as it will be updated frequently. The problem is that users have to wait until the spreadsheet data is retrieved and parsed before they can see anything on the webpage (the page is white whilst loading).
How can I load this data in the background whilst the page loads?
I've already written my code in C# and don't much want to spend the time swapping to JavaScript, but I will if I have to.
Could I use the AJAX Control Toolkit to do this? I know it will load HTML, but can I use it to fetch Google data?
What can I do here that would be fast and easy?
[Edit]
The account that hosts the Google spreadsheet is inside a Google domain, so its documents can't be shared with the public as a whole, only with individuals. The C# libraries allow me to use the account's username and password to log into the account to get the spreadsheet data, so the spreadsheet doesn't need to be shared at all. Even if I went with a JavaScript/AJAX solution, I would still need this functionality.

Well, this probably isn't the BEST answer, but it IS a solution. I'd like to see if y'all have a better one.
Anyway, I found this, which is an example of how to use an asp:Timer to delay the calling of a function for a certain amount of time - in my case, long enough for the page itself to load. At least this way, the user gets to see the page, and can watch the nice loading-gif until the actual content arrives.
It is an AJAXy approach that allows me to keep my C# code without having to add any JavaScript.

Related

Data being hidden and class regenerated when scraping web page using Beautiful Soup

I am trying to pull pricing data from a website, but each time the page is loaded, the class is regenerated to a different sequence of letters, and the price shows as blank instead of a number. Is there a technique I can use to get around this? Thanks! Here is the HTML as it appears when I inspect the element:
<div class="zlgJQq">$</div>
<div class="qFwqmC hkVukg2 njGalW"> </div>
Your help would be much appreciated!
Perhaps that website is actively discouraging you from scraping their data. That would explain the apparently random class names. You might want to read their terms of use to be sure that it's OK to scrape their site.
However, if the raw HTML does not contain the price data but it is visible when the page is rendered, then it's likely that JavaScript is being used to insert the prices after the page has loaded. You could try enabling the developer tools in your browser and monitoring the network activity while the page is loading. That might reveal that the site is using dynamic Ajax queries to populate the price data, and you could then write code to interact with the Ajax resource directly.
It's also possible that the price data is embedded somewhere in the HTML, possibly obfuscated, and then loaded dynamically by JavaScript.
Those are just a couple of suggestions. You will need to analyse the site to see whether automated scraping is feasible. If you can let us know what website you're dealing with then someone might be able to suggest something more specific.
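If the price does turn out to be embedded in the page as JSON, pulling it out can look roughly like the sketch below. It is written in JavaScript purely for illustration; the same steps apply in Python with Beautiful Soup and the json module. The script id "page-data" and the "price" field are assumptions made up for the example, not something taken from the site in question.

```javascript
// Extract a JSON blob from an inline <script id="..."> element in raw HTML.
// Returns the parsed object, or null if the element is missing or not valid JSON.
function extractEmbeddedJson(html, scriptId) {
  // Escape regex metacharacters so the id is matched literally.
  const escapedId = scriptId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `<script[^>]*\\bid=["']${escapedId}["'][^>]*>([\\s\\S]*?)</script>`,
    "i"
  );
  const match = pattern.exec(html);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // content was present but not valid JSON
  }
}

// Hypothetical page: the visible price div is blank, the data sits in a script tag.
const sampleHtml =
  '<div class="zlgJQq">$</div><div class="qFwqmC"> </div>' +
  '<script id="page-data" type="application/json">{"price": 19.99}</script>';

const embedded = extractEmbeddedJson(sampleHtml, "page-data");
console.log(embedded ? embedded.price : "no embedded data found");
```

A regular expression is enough for a quick check like this; for anything sturdier, a real HTML parser is the safer choice.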

Extracting content data from webpages

I'm looking to get structured article data from webpage urls. So far I've found these two services http://www.diffbot.com/ and http://embed.ly/extract/demos/nlp. Are there better alternatives or is it worthwhile to write the code to do this myself?
If you'd like to skip the code and are looking for simple software for web scraping / ETL applications, I'd suggest Foxtrot. It's easy enough to use and doesn't require coding. I use it to scrape data from certain gov't websites and dump it into an Excel spreadsheet for reporting purposes.
I have done web scraping / content extraction for quite some time now.
For me the best approach is to write a Chrome content extension and automate the browser with their API. This requires that you know JavaScript and HTML. In one of my recent projects I use a background page with a couple of editable divs to configure the scraping session. I have some buttons on the background page to start the process. The background page loads a JS script which listens to click events of the buttons.
When one of the buttons is clicked I add a new tab for the scraping session with chrome.tabs.create. The background JS also registers a chrome.tabs.onUpdated.addListener callback to inject content scripts when the tab URL contains a specific page/domain name.
The content script then does the scraping job, for example selecting some elements with jQuery, regular expressions etc., and finally sends a message with an object back to the background JS using chrome.runtime.sendMessage. The background JS script listens to messages with chrome.runtime.onMessage.addListener and acts based on the content being extracted.
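To make the message flow concrete, here is a minimal sketch of the two sides. It assumes a Manifest V3 extension with the "scripting" permission and host permissions for the target site (older extensions would inject with chrome.tabs.executeScript instead). The host example.com, the file name content.js and the message shape { type: "scraped", items } are placeholders of mine, not details from the project described above.

```javascript
// Hypothetical host to scrape; replace with the real page/domain name.
const TARGET_HOST = "example.com";

// Decide whether a tab URL belongs to the site we want to scrape.
function shouldInject(url, targetHost = TARGET_HOST) {
  try {
    const host = new URL(url).hostname;
    return host === targetHost || host.endsWith("." + targetHost);
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Content-script side: collect link data from a document-like object.
// In a real extension this function would live in content.js.
function collectLinks(doc) {
  return Array.from(doc.querySelectorAll("a[href]")).map((a) => ({
    href: a.href,
    text: a.textContent.trim(),
  }));
}

// Background side: only wired up where the extension APIs actually exist.
if (typeof chrome !== "undefined" && chrome.tabs && chrome.scripting) {
  // Inject the content script once a matching tab has finished loading.
  chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
    if (changeInfo.status === "complete" && tab.url && shouldInject(tab.url)) {
      chrome.scripting.executeScript({ target: { tabId }, files: ["content.js"] });
    }
  });

  // Receive the scraped object sent back by the content script.
  chrome.runtime.onMessage.addListener((message, sender) => {
    if (message && message.type === "scraped") {
      const from = sender.tab ? sender.tab.url : "unknown tab";
      console.log(`Got ${message.items.length} items from ${from}`);
    }
  });
}

// In content.js (assumed file name) the result would be sent back with:
// chrome.runtime.sendMessage({ type: "scraped", items: collectLinks(document) });
```

Keeping the URL check and the extraction as plain functions means they can be tried outside the browser before being wired into the extension.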
The extension also automates web databases by clicking, for example, the next page links.
I have added a timing setting to control the number of links clicked / tabs opened per minute, so that access is slowed down on purpose and too much crawling is avoided.
Finally the results are uploaded to a database with an AJAX call and inserted into MySQL by a PHP page.
The next time the extension runs, it fetches the keys/links that already exist in the database with another AJAX call and ensures that only new information is extracted.
I have also built extensions like the above with Firefox, but the best and easiest solution for me is a Chrome/Chromium content extension.
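The rate limiting and the "only new links" check described above could be sketched like this. The function names and the plain-array inputs are assumptions for illustration; in the real extension the known links would come from the database via the AJAX call, and visit would open a tab or click a link.

```javascript
// Convert a "max actions per minute" setting into a delay between actions.
function delayBetweenActionsMs(maxPerMinute) {
  if (!(maxPerMinute > 0)) {
    throw new RangeError("maxPerMinute must be a positive number");
  }
  return Math.ceil(60000 / maxPerMinute);
}

// Keep only links that are neither already stored nor repeated in this run.
function filterNewLinks(candidates, knownLinks) {
  const known = new Set(knownLinks);
  const seen = new Set();
  return candidates.filter((link) => {
    if (known.has(link) || seen.has(link)) return false;
    seen.add(link);
    return true;
  });
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit links one at a time, pausing between them to slow the crawl down.
// `visit` is a caller-supplied function, e.g. one that opens a tab.
async function visitThrottled(links, maxPerMinute, visit, sleep = defaultSleep) {
  const delayMs = delayBetweenActionsMs(maxPerMinute);
  for (let i = 0; i < links.length; i++) {
    if (i > 0) await sleep(delayMs);
    await visit(links[i]);
  }
}

// Example: 30 actions per minute means one action every 2000 ms.
console.log(delayBetweenActionsMs(30));
console.log(filterNewLinks(["/a", "/b", "/a", "/c"], ["/b"]));
```

Passing sleep in as a parameter is only there so the throttle can be exercised without real waiting.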

Reading a PDF back from an iFrame?

I have a PDF document that is getting generated on the fly, and rendered on the fly to an iFrame within a radwindow. Basically the document is already largely prepopulated, however the user will still have a chunk of information that they are required to enter. I've found a good amount of information about sending a pdf TO an iframe, but not much information about going the other way. I have a button within the radwindow that can access the iframe object, however I'm somewhat lost as to where to go from there.
EDIT: The PDF is an editable form. I'm trying to pull back the entire PDF document as is, after the client side makes their entries to the form.
I think you'll need to send the file to the user so they can edit it locally and instruct them to upload it.
The content-disposition header with value attachment can help with the first task and you can use RadAsyncUpload to upload it: http://demos.telerik.com/aspnet-ajax/asyncupload/examples/overview/defaultcs.aspx.
I am not aware of ways to tap into the PDF viewer plugin the browsers use to show the PDF. Perhaps there is an API from Adobe or some other third-party plugin, but that would rely on them and is out of your control.
Perhaps the JS PDF viewer from Firefox has something: https://mozillalabs.com/en-US/pdfjs/ but I don't know how stable and usable it is.
As per what was described in the comments, I ended up using postbacks through the PDFs themselves along with 1 pixel fields to store the data required to identify the documents. It's a little hacky, but functional. I'm leaving this as an actual answer as it is as close to a real solution to my original problem as I got. This has been up and running for close to 4 years in this manner, and thus far hasn't caused any issues.

Show list of pages that are opened at the time being

I have a task to list all pages that are open at the moment and show how many people are on each page.
I am looking for a way to make that happen without keeping any DB records or saving information to a text file or something like that. (Not necessarily, then. Of course I am going to save that info to a DB; I just wanted the logic of catching opened page addresses.)
I can of course keep track of every page that has been opened up to that time, but I want a page address to appear on the list when someone opens that address and disappear when the user is no longer browsing it.
Can you give me some ideas how to make that happen using ASP.NET?
Note: I am using web forms with asp.net 4.5
Thanks!
"I just wanted to the logic of catching opened page addresses"
Use JavaScript in a timed loop (on load, and then perhaps every 30 seconds) on every page to asynchronously post to a page on your server. It should send information identifying the page. This will give you a good idea of how many people are currently on each page.
Store this information in a DB in your code-behind, and use it to report as you wish.
Of course, if a user leaves their browser open on one of these pages or opens another tab, it will still be reported as 'open'.
To get the current URL path in JavaScript you can use:
var pathname = window.location.pathname;
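Put together, the timed loop might look something like the sketch below. The endpoint /PageHeartbeat.ashx, the random client id and the 60 second expiry are assumptions of mine. The countOpenPages function only illustrates the counting logic; in practice that part would run in your code-behind against the DB.

```javascript
const HEARTBEAT_MS = 30000; // "every 30 seconds", as suggested above
const HEARTBEAT_URL = "/PageHeartbeat.ashx"; // hypothetical server endpoint

// Body of one heartbeat: which page, and which browser tab sent it.
function buildHeartbeat(pathname, clientId) {
  return JSON.stringify({ page: pathname, clientId });
}

// Send one heartbeat immediately, then one per interval.
function startHeartbeat(win, clientId) {
  const send = () =>
    win
      .fetch(HEARTBEAT_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: buildHeartbeat(win.location.pathname, clientId),
      })
      .catch(() => {}); // a missed heartbeat is not fatal
  send();
  return win.setInterval(send, HEARTBEAT_MS);
}

// Counting logic: a page is "open" for a client if that client was heard
// from within timeoutMs. Stale entries drop out, so pages disappear on their own.
function countOpenPages(heartbeats, nowMs, timeoutMs = 2 * HEARTBEAT_MS) {
  const clientsPerPage = new Map();
  for (const { page, clientId, seenAtMs } of heartbeats) {
    if (nowMs - seenAtMs > timeoutMs) continue;
    if (!clientsPerPage.has(page)) clientsPerPage.set(page, new Set());
    clientsPerPage.get(page).add(clientId);
  }
  const counts = {};
  for (const [page, clients] of clientsPerPage) counts[page] = clients.size;
  return counts;
}

// Only start the loop when running in a browser.
if (typeof window !== "undefined" && typeof window.fetch === "function") {
  const clientId = Math.random().toString(36).slice(2); // throwaway id per page load
  startHeartbeat(window, clientId);
}
```

The expiry window is what makes a page vanish from the list: once a visitor closes the tab, the heartbeats stop and the entry ages out.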
In Google Analytics you can see which pages are being used in near-real time.
Why not use that to solve this issue? It's easy to set up.

VB.Net application - display a message to the user whilst the application is starting up

I have recently created an application where a lot of data is loaded into objects when the application starts up, and other data as it is required. For example if the user requests the catalogue page then it will load all the top level category data into objects of type Category. This will then stay there to be used by other users (who will therefore not have to load this data into objects) and can be altered by admin if they happen to login during the same application instance. I know this is not the most efficient solution, as pointed out below, but it works and the page load, at the moment, is not too long. It is very quick if most of the required data is already loaded into objects. It is also tailored to the business' needs - unlike other techniques such as Linq-to-SQL.
The problem I am facing is when a page is requested which requires lots of data to be displayed about different types of object. For example when a catalogue page is requested which displays information on a product which can be bought, it then loads all the products and categories (as the products make reference to the category object, not just the category name).
I would like to display a loading symbol with a message whilst all this data is being loaded into objects, so the user knows it's not just stuck in a loop or anything. Is there any way to do this? I am open to using JS / jQuery if I need to.
Thanks in advance.
Regards,
Richard
PS I am working on ways to make it more efficient - such as using HashTables or HashMaps. However this is taking time as there are so many different types of item (News, Events, Catalogue Item - Range, Collection, Design, RangeCollection, CollectionDesign, RangeCollectionDesign and RangeDesign - Users, PageViews and the list goes on).
Please correct me if I'm wrong, but I believe JavaScript is required in order to display a "loading" image. Using server-side scripting alone would typically require an entire page load after all the content loads, unless you want to start messing with iframes.
This is a job for AJAX. A common solution to your problem is to have a small page that displays a loading icon. The page has some JavaScript that makes additional HTTP requests to the server to download the rest of the page. jQuery has a "$.ajax" method that is designed to simplify this process.
I would suggest reading the jQuery documentation for the $.ajax method. Unfortunately, getting all the scripting code right is a rather delicate process and it takes a while to learn.
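As a starting point, the loading-page idea can be sketched roughly as follows. The URL /CatalogueContent.aspx and the element id "catalogue" are made-up names; the server page is assumed to return an HTML fragment once the objects are loaded.

```javascript
const CONTENT_URL = "/CatalogueContent.aspx"; // hypothetical page returning an HTML fragment

// Show the message straight away so the user knows work is in progress.
function showLoading(container) {
  container.innerHTML = '<p class="loading">Loading catalogue, please wait...</p>';
}

// Show the loading message, run the request, then swap in the result.
// `request` is any function returning HTML or a promise/thenable of HTML.
function loadContent(container, request) {
  showLoading(container);
  return Promise.resolve()
    .then(request)
    .then((html) => {
      container.innerHTML = html;
    })
    .catch(() => {
      container.innerHTML = '<p class="error">Could not load the catalogue.</p>';
    });
}

// Browser wiring: once the DOM is ready, fetch the slow content with $.ajax.
if (typeof window !== "undefined" && window.jQuery) {
  window.jQuery(function ($) {
    const container = document.getElementById("catalogue"); // hypothetical element id
    if (container) {
      loadContent(container, () => $.ajax({ url: CONTENT_URL, dataType: "html" }));
    }
  });
}
```

The small page itself renders immediately; only the fragment request waits on the data being loaded into objects.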
