I am trying to automate the prices of items in a game (World of Warcraft) to display on a spreadsheet. I am using The Undermine Journal to get prices.
Link: https://theunderminejournal.com/#us/garona/item/23445
What I want to get is the Current Price to display in my spreadsheet. The site is updated hourly.
=importxml("https://theunderminejournal.com/#us/garona/item/23445","/html/body/div[2]/div[2]/div[11]/div[1]/table/tr[3]/td/span")
=importxml("https://theunderminejournal.com/#us/garona/item/23445","//*[#id="item-page"]/div[1]/table/tr[3]/td/span")
I have tried these but to no avail; any help would be appreciated. I have gotten this to work with other sites, such as retail sites, just not this one.
Thanks!
Google Sheets does not support scraping of JavaScript-controlled elements. You can easily check this by disabling JS for a given site: only what remains visible can be scraped. In your case, unfortunately, that is nothing.
JavaScript-generated content can't be fetched by the import functions of Sheets.
Moreover, I tried fetching the current price with an Apps Script using UrlFetchApp, but I couldn't make it work either.
I tried to find an alternative site but was not able to. If you can provide a similar/alternative site that shows data like the one from TUJ, I might be able to give you a working script, but the import functions of Sheets will definitely not work.
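For reference, here is a minimal Apps Script sketch (the function name and the logged marker are only illustrative) that fetches the page the same way the Sheets import functions do, i.e. without executing any JavaScript. If the element your XPath targets is missing from this response, IMPORTXML cannot return it either:
function checkStaticHtml() {
  // Fetch the raw HTML exactly as the Sheets import functions see it:
  // no JavaScript runs, so client-side rendered content is absent.
  var html = UrlFetchApp.fetch('https://theunderminejournal.com/#us/garona/item/23445')
                        .getContentText();
  // If the container targeted by the XPath is not in the static markup,
  // IMPORTXML has nothing to extract. -1 means it is absent.
  Logger.log(html.indexOf('item-page'));
}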
I am looking to pull the price data for each of the SKUs I have listed in a Google Sheet.
My initial try was to use a formula inside of the sheet.
=importxml("SKU Search URL", "//p/span[#data-qa='grid-cell-price']/text()")
This does not work, since the content is dynamic.
What would be the easiest method of obtaining this data?
Google Sheets does not support scraping of JavaScript-controlled elements. You can easily check this by disabling JS for a given site: only what remains visible can be scraped. In your case, unfortunately, that is nothing.
I'm trying to get some data from a web page using IMPORTXML, but it tells me "N/A Imported content is empty".
I've tried with a different query, but it is not working.
=IMPORTXML("https://www.shein.com/Floral-Lace-Halter-Teddy-Bodysuit-p-699186-cat-1862.html","//div[#class='opt-size j-sa-select-size j-opt-size']")
I want to be able to parse the different sizes of the clothing, that would be: XS, S, M, L, etc.
Google Sheets does not support scraping of JavaScript-controlled elements. You can easily check this by disabling JS for a given site: only what remains visible can be scraped. In your case, unfortunately, that is nothing.
I have a Google sheet that builds URLs to JIRA with a project ID and some parameters, so that specific searches are available from the "hub" sheet for each project listed. What I'd like to do is have the text in the hyperlink cell display the number of issues in the search from the link.
I'd just like to know the best way to do this: I'm not a programmer at all, so I'd rather spend time learning something that will end up working instead of just trying things on my own.
Could a kind soul maybe let me know what they think the best tool/flow for this would be?
PS: The reason I'm bothering with a sheet and not a JIRA Dashboard is that the order and list of the projects I need to keep track of changes every one or two days :[
If you are looking to scrape the generated URL, you will need to use whichever of the import formulae fits your need:
IMPORTHTML
IMPORTXML
IMPORTDATA
etc.
then all you need to do is combine it like:
=HYPERLINK(CONCATENATE("URL link to search"), IMPORT...())
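For example, a sketch of what that could look like (the Atlassian base URL, the JQL fragment assumed to be in cell A2, and the XPath for the result count are all hypothetical placeholders):
=HYPERLINK(CONCATENATE("https://yourcompany.atlassian.net/issues/?jql=", A2), IMPORTXML(CONCATENATE("https://yourcompany.atlassian.net/issues/?jql=", A2), "//span[@class='results-count-total']"))
Note that if JIRA fills in the count client-side with JavaScript, the IMPORTXML part will come back empty, and you would need to go through the JIRA REST API with Apps Script instead.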
I am trying to automatically pull a few pieces of data from Keepa on Amazon products to a google sheet using IMPORTXML. I have very little programming experience so honestly I don't exactly know what I'm doing. I need to pull stuff like sales rank, Amazon price, and New price.
From the research I've done, I think part of the problem I'm running into is that all the information is filled in by JavaScript and therefore isn't in the HTML, so it can't be pulled using IMPORTXML. I'm copying the XPath to try to get it to look in the right place.
=IMPORTXML("https://keepa.com/#!product/1-B00066LORG","//*[#id='borderLayout_eGridPanel']/div[1]/div/div/div[3]/div[3]/div/div/div[1]/div[2]/span/span")
I want it to pull that value into the sheet and simply put it into a cell. Most of the time I am getting a #N/A Error that the imported content is empty. Any help will be greatly appreciated, thank you!
I use Kimonolabs right now for scraping data from websites that have the same goal. To make it easy, let's say these websites are online shops selling stuff online (actually they are job websites with online application possibilities, but technically it looks a lot like a webshop).
This works great. For each website, a scraper API is created that goes through the available advanced search page to crawl all product URLs. Let's call this API the 'URL list'. Then a 'product API' is created for the product detail page that scrapes all necessary elements, e.g. the title, product text and specs like the brand, category, etc. The product API is set to crawl daily using all the URLs gathered in the 'URL list'.
The gathered information for all products is then fetched from the Kimonolabs JSON endpoint by our own service.
However, Kimonolabs will shut down its service at the end of February 2016 :-(. So, I'm looking for an easy alternative. I've been looking at import.io, but I'm wondering:
Does it support automatic updates (letting the API scrape hourly/daily/etc)?
Does it support fetching all product URLs from a paginated advanced search page?
I'm tinkering around with the service. Basically, it seems to extract data via the same easy process as Kimonolabs. It's just unclear to me whether paginating the URLs needed for the product API, and automatically keeping it up to date, are supported.
Are there any import.io users here who can advise whether import.io is a useful alternative for this? Maybe even give some pointers in the right direction?
Look into Portia. It's an open source visual scraping tool that works like Kimono.
Portia is also available as a service and it fulfills the requirements you have for import.io:
automatic updates, by scheduling periodic jobs to crawl the pages you want, keeping your data up-to-date.
navigation through pagination links, based on URL patterns that you can define.
Full disclosure: I work at Scrapinghub, the lead maintainer of Portia.
Maybe you want to give Extracty a try. It's a free web scraping tool that allows you to create endpoints that extract any information and return it in JSON. It can easily handle paginated searches.
If you know a bit of JS, you can write CasperJS endpoints and integrate any logic that you need to extract your data. It has a similar goal to Kimonolabs and can solve the same problems (if not more, since it's programmable).
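For illustration, a minimal CasperJS sketch (the URL and the .price selector are hypothetical) of the kind of extraction logic such an endpoint could run against a JavaScript-rendered page:
var casper = require('casper').create();

casper.start('https://example.com/product/123');

// Wait until client-side rendering has produced the element we need,
// then read its text content.
casper.waitForSelector('.price', function () {
    this.echo(this.fetchText('.price'));
});

casper.run();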
If Extracty does not solve your needs, you can check out these other market players that aim for similar goals:
Import.io (as you already mentioned)
Mozenda
Cloudscrape
TrooclickAPI
FiveFilters
Disclaimer: I am a co-founder of the company behind Extracty.
I'm not that fond of Import.io, but it seems to me it allows pagination through bulk input URLs.
So far there has not been much progress in getting a whole website through the API:
"Chain more than one API/Dataset: It is currently not possible to fully automate the extraction of a whole website with Chain API."
For example, if I want data that is found within category pages or paginated lists, I first have to create a list of URLs, run Bulk Extract, save the result as an import dataset, and then chain it to another Extractor. Once set up, I would like to be able to do this more automatically, in one click.
P.S. If you are somehow familiar with JS you might find this useful.
Regarding automatic updates:
This is a beta feature right now. I'm testing it for myself after migrating from Kimonolabs... You can enable this for your own APIs by appending &bulkSchedule=1 to your API URL. Then you will see a "Schedule" tab. In the "Configure" tab, select "Bulk Extract" and add your URLs; after this, the scheduler will run daily or weekly.