Using IMPORTHTML and Google Sheets for web scraping

I'm trying to scrape stock quotes from web pages using Google Sheets and IMPORTHTML (and its variants, IMPORTXML and IMPORTDATA). It works on some web pages but not others. An example of a web page I'm unable to import data from is https://www.barchart.com/stocks/performance/price-change/advances.
I used the following formula:
=IMPORTHTML("https://www.barchart.com/stocks/performance/price-change/advances","table",0)
Is there a way to download or scrape this data?

You have other options. You can write a simple web-scraping script in a language such as Python or JavaScript, or try the Barchart API, which is free:
https://www.barchart.com/ondemand/free-market-data-api
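If you go the script route, the key point is that this page builds its table with JavaScript, so you need a real browser engine rather than a plain HTTP fetch. A minimal Python sketch, assuming Selenium with Chrome and pandas are available (the wait condition and table index are guesses to verify against the live page):

```python
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = "https://www.barchart.com/stocks/performance/price-change/advances"

driver = webdriver.Chrome()  # assumes a Chrome driver is installed
driver.get(URL)

# Wait until the page's JavaScript has rendered at least one <table>.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.TAG_NAME, "table"))
)

# pandas parses every <table> in the rendered HTML into a DataFrame.
tables = pd.read_html(StringIO(driver.page_source))
driver.quit()

print(tables[0].head())  # assumes the quotes table is the first one
```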

TL;DR: IMPORTHTML, IMPORTXML and IMPORTDATA can't import data from the referenced web page because it requires JavaScript to be enabled in the web browser.
IMPORTHTML, IMPORTXML and IMPORTDATA can only get data from the source code of a file hosted on the web.
The first two also require that the HTML be well formed, and none of them can get data from dynamically rendered pages.
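A quick way to see this for yourself, sketched below in Python: fetch the raw source without executing any JavaScript, which is roughly what these Sheets functions receive, and check whether a table is present.

```python
import requests

URL = "https://www.barchart.com/stocks/performance/price-change/advances"

# Fetch the static source, with no JavaScript execution; this is
# roughly what IMPORTHTML/IMPORTXML/IMPORTDATA have to work with.
html = requests.get(URL, timeout=30).text

# On a JavaScript-rendered page the quotes table is absent from the
# static source, so this prints False.
print("<table" in html.lower())
```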
Reference
How to know if Google Sheets IMPORTDATA, IMPORTFEED, IMPORTHTML or IMPORTXML functions are able to get data from a resource hosted on a website?

Related

Upload an image from disk to Google Sheets through the API (preferably the googlesheets4 package)

I want to upload an image to an existing Google spreadsheet through Google's API. However, a) I haven't been able to figure out how, and b) I don't know whether there's even a more or less "easy" way of doing it.
Among others, I looked at Insert image into Google Sheets cell using Google Sheets API and R script to insert images into Google Sheets.
Also, looking at ?googlesheets4::gs4_formula indicates that adding images is generally possible. However, it looks like this can only be done for web images.
Is there any way to upload images from a) disk or b) plotted objects within R to an existing Google Sheet through R?
More specifically:
Is there a dedicated API endpoint, or would the existing ones do the trick?
Is it possible to give a file path instead of a web URL in the IMAGE function of Google Sheets?
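For what it's worth, the IMAGE() function only accepts a public web URL, so the usual workaround is hosting the file somewhere first and then writing the formula into a cell. A sketch of that formula route against the same Sheets API that googlesheets4 wraps, shown here with Python's gspread (the spreadsheet name and image URL are placeholders):

```python
import gspread

# Authenticate with a service account (credentials set up per the
# gspread documentation).
gc = gspread.service_account()
ws = gc.open("my-spreadsheet").sheet1  # placeholder spreadsheet name

# IMAGE() only takes a web URL, so a file on disk must first be
# uploaded somewhere public (e.g. Drive with link sharing enabled).
ws.update(
    values=[['=IMAGE("https://example.com/plot.png")']],  # placeholder URL
    range_name="A1",
    value_input_option="USER_ENTERED",  # interpret the string as a formula
)
```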

How to scrape data from a list

I'm trying to scrape data from https://www.idealista.com/maps/madrid-madrid/
I'm not getting the whole content of the page. I used the BeautifulSoup Python library. What I need is the list of the streets shown on the web page.
I'm a beginner at web scraping. Can anyone advise on how to proceed, or which libraries to use, to get this done?
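As a starting point, the basic requests + BeautifulSoup pattern looks like the sketch below; the selector is a placeholder to replace after inspecting the page, and if the street list turns out to be rendered by JavaScript (likely on a map page), a browser-automation tool such as Selenium is needed instead:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.idealista.com/maps/madrid-madrid/"

# Fetch the static HTML; anything the page builds with JavaScript
# will not appear in this response.
resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
soup = BeautifulSoup(resp.text, "html.parser")

# Placeholder selector: use the browser's developer tools to find the
# element that actually wraps each street name, then adjust this.
for item in soup.select("a"):
    print(item.get_text(strip=True))
```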

Looking for a web scraper tool to extract entire tables from web pages and put them in separate sheets in Excel

The first column of this table contains all the links I have to work with: https://www.metabolomicsworkbench.org/data/DRCCStudySummary.php?Mode=StudySummary&SortBy=Analysis&AscDesc=asc&ResultsPerPage=2000
From each of those links I have to download an entire table like this one: https://www.metabolomicsworkbench.org/data/show_metabolites_by_study.php?STUDY_ID=ST000886&SORTFIELD=moverz_quant
and put the table from each link into a separate sheet in Excel.
I'd greatly appreciate it if anyone could tell me how to automate the entire process.
P.S.: I can't code...
ParseHub is a free and powerful web scraper tool for extracting data from tables.
I have used it in the past by following its step-by-step instructions.
NO CODING NEEDED.
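For readers who do code, both pages serve their tables as static HTML, so the whole job fits in a short pandas sketch like the one below (the "Study" column name and the openpyxl dependency are assumptions to verify):

```python
import pandas as pd

INDEX_URL = (
    "https://www.metabolomicsworkbench.org/data/DRCCStudySummary.php"
    "?Mode=StudySummary&SortBy=Analysis&AscDesc=asc&ResultsPerPage=2000"
)
STUDY_URL = (
    "https://www.metabolomicsworkbench.org/data/"
    "show_metabolites_by_study.php?STUDY_ID={}&SORTFIELD=moverz_quant"
)

# read_html fetches the page and returns every <table> as a DataFrame.
summary = pd.read_html(INDEX_URL)[0]

# Writing .xlsx requires an Excel engine such as openpyxl.
with pd.ExcelWriter("metabolites.xlsx") as writer:
    # "Study" is an assumed column name for IDs like ST000886.
    for study_id in summary["Study"].head(5):  # small limit while testing
        table = pd.read_html(STUDY_URL.format(study_id))[0]
        table.to_excel(writer, sheet_name=str(study_id), index=False)
```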

Facebook search scrape

I'd like help with saving Facebook data from search results.
I have 1000 query URLs like:
https://www.facebook.com/search/people/?q=name
https://www.facebook.com/search/people/?q=mobile
How can I quickly scrape data from the resulting web pages?
I have tried some scraper programs but could not get them to work. Does anyone have a faster way?
Use the Python requests library. It is a lightweight, fast library. Note that scraping speed depends not only on your code but also on the website you are scraping.
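A sketch of what that suggestion looks like over the list of query URLs; note that Facebook search results require a logged-in session, so an unauthenticated request like this typically gets a login page back rather than results:

```python
import requests

queries = ["name", "mobile"]  # the 1000 search terms would go here

session = requests.Session()  # reuses the connection across requests
session.headers["User-Agent"] = "Mozilla/5.0"

for q in queries:
    url = f"https://www.facebook.com/search/people/?q={q}"
    resp = session.get(url, timeout=30)
    # Without authentication this is usually a login redirect, so check
    # what came back before trying to parse it.
    print(q, resp.status_code, len(resp.text))
```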

Web scraping Oracle (ATG) Commerce

I am new to web scraping, and I use the following tools and method to scrape:
I use R (with packages RCurl, XML, etc.) to read web pages (from a URL), and the htmlTreeParse function to parse the HTML page.
Then, to get the data I want, I first use the developer tools in Chrome to inspect the code.
Once I know which node the data are in, I use xpathApply to get them.
Usually this works well, but I ran into an issue with this site: http://www.sephora.fr/Parfum/Parfum-Femme/C309/2
When you click the link, the page loads, but it is in fact page 1 (of the products).
You have to load the URL again (by entering it a second time) in order to get page 2.
When I use the usual process to read the data, the htmlTreeParse function always gives me page 1.
I tried to understand the site better:
It seems to be built with Oracle Commerce (ATG Commerce).
The "real" URL is hidden, and when you click on a filter (for instance, selecting a brand), you get a URL with a request ID: http://www.sephora.fr/Parfum/Parfum-Femme/C309?_requestid=285099
This doesn't reveal which selection was made.
Could you please help:
How can I access more products?
Thank you
I found the solution: Selenium! I think it is the ultimate tool for web scraping. I have posted several questions about web scraping; now, with RSelenium, almost everything is possible.
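A sketch of that approach with Selenium's Python bindings (RSelenium drives the same WebDriver protocol): keep one browser session alive so the server-side ATG state survives, and click the pagination control instead of re-requesting the URL. The link-text locator is a guess, not verified against the live site:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# Load page 1; the session state the ATG server tracks now lives in
# this browser, instead of being lost between plain HTTP requests.
driver.get("http://www.sephora.fr/Parfum/Parfum-Femme/C309")

# Hypothetical locator: click the "2" pagination link as a user would,
# rather than entering the page-2 URL in a fresh request.
driver.find_element(By.LINK_TEXT, "2").click()

html = driver.page_source  # rendered HTML of page 2, ready to parse
driver.quit()
```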
