Guidance on webscraping using Google Sheets [duplicate] - web-scraping

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
I'm trying to get some data from a web page using IMPORTXML, but it tells me "N/A Imported content is empty".
I've tried a different query, but that doesn't work either.
=IMPORTXML("https://www.shein.com/Floral-Lace-Halter-Teddy-Bodysuit-p-699186-cat-1862.html","//div[@class='opt-size j-sa-select-size j-opt-size']")
I want to be able to parse the different sizes of the clothing, that would be: XS, S, M, L, etc.

Google Sheets does not support web scraping of JavaScript-controlled elements. You can easily check this by disabling JavaScript for a given site: only what's left visible can be scraped. In your case, that's nothing, unfortunately.
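The same check can be done without a browser. Below is a minimal Python sketch, assuming the page's HTML has already been downloaded into a string (for instance with urllib.request.urlopen). The function name has_class and the sample markup are made up for illustration. It reports whether any element in the raw, unrendered HTML carries a given class; if it returns False for what the server sends, IMPORTXML has nothing to match either.

```python
from html.parser import HTMLParser


class _ClassFinder(HTMLParser):
    """Records whether any start tag carries the wanted CSS class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.found = True


def has_class(raw_html, wanted):
    """Return True if the unrendered HTML contains an element with this class."""
    finder = _ClassFinder(wanted)
    finder.feed(raw_html)
    return finder.found


# Hypothetical markup: the first page ships the sizes in its HTML,
# the second only ships a script that would build them in the browser.
static_page = '<div class="opt-size j-opt-size"><span>XS</span></div>'
js_page = '<div id="app"></div><script>render("opt-size")</script>'

print(has_class(static_page, "opt-size"))  # True
print(has_class(js_page, "opt-size"))      # False
```

The second sample mentions the class name only inside a script, which is roughly what a JavaScript-rendered page looks like to an import function.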

Related

importxml to gather prices from undermine journal [duplicate]

This question already has an answer here:
Web scrape return empty values
(1 answer)
Closed 1 year ago.
I am trying to automatically display the prices of items from a game (World of Warcraft) in a spreadsheet. I am using The Undermine Journal to get prices.
Link: https://theunderminejournal.com/#us/garona/item/23445
What I want is to display the Current Price in my spreadsheet. The site is updated hourly.
=importxml("https://theunderminejournal.com/#us/garona/item/23445","/html/body/div[2]/div[2]/div[11]/div[1]/table/tr[3]/td/span")
=importxml("https://theunderminejournal.com/#us/garona/item/23445","//*[@id='item-page']/div[1]/table/tr[3]/td/span")
I have tried these but to no avail. Any help would be appreciated. I have got this to work with other sites like retail sites and such, just not this one.
Thanks!
Google Sheets does not support web scraping of JavaScript-controlled elements. You can easily check this by disabling JavaScript for a given site: only what's left visible can be scraped. In your case, that's nothing, unfortunately.
JavaScript-generated sites can't be fetched by the import functions of Sheets.
Moreover, I tried fetching the current price with a script using UrlFetchApp, but I couldn't make it work.
I looked for an alternative site but was not able to find one. If you can provide a similar site that shows data like that from TUJ, I might be able to give you a working script, but the import functions in Sheets will definitely not work.
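One more detail about this particular URL, offered as a side note rather than as part of the answers above: everything after the # is a fragment, which HTTP clients keep to themselves and do not send to the server. The short sketch below uses Python's standard urllib.parse to show which part of the address would actually be requested; the rest is only interpreted by the page's own JavaScript.

```python
from urllib.parse import urlsplit, urldefrag

url = "https://theunderminejournal.com/#us/garona/item/23445"

parts = urlsplit(url)
print(parts.path)      # /
print(parts.fragment)  # us/garona/item/23445

# urldefrag returns the URL without its fragment, i.e. what is requested.
requested, fragment = urldefrag(url)
print(requested)       # https://theunderminejournal.com/
```

So a fetcher asking for this address receives the same base page whichever item is named after the #.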

Use a list of SKUs in a google sheet to pull their data from GOAT marketplace [duplicate]

This question already has an answer here:
Guidance on webscraping using Google Sheets [duplicate]
(1 answer)
Closed 1 year ago.
I am looking to pull the price data for each of the SKUs I have listed in a Google Sheet.
My initial try was to use a formula inside the sheet.
=importxml("SKU Search URL", "//p/span[@data-qa='grid-cell-price']/text()")
This does not work, since the content is dynamic.
What would be the easiest method of obtaining this data?
Google Sheets does not support web scraping of JavaScript-controlled elements. You can easily check this by disabling JavaScript for a given site: only what's left visible can be scraped. In your case, that's nothing, unfortunately.

Using Google Sheets formula IMPORTXML to extract hyperlinks from a table on a web page, and flag in a separate column when an image is present

I'm doing some analysis which requires me to save table data and (hyperlinked) links to lots of PDFs from a webpage (https://www.asx.com.au/asx/v2/statistics/prevBusDayAnns.do).
I've been playing around with the =IMPORTHTML and =IMPORTXML formulas in Google Sheets and have managed to extract the table data using =IMPORTHTML(A1,"table",1), but I'm struggling to extract the "Price sens." column which contains images or the hyperlinks attached to the "Headline" items. I'm having no luck with IMPORTXML so far, and can't seem to find any solutions online.
The formula for IMPORTXML you're looking for is:
=IMPORTXML("https://www.asx.com.au/asx/v2/statistics/prevBusDayAnns.do","//*[@id='content']/div/announcement_data/table/tbody/tr")
You need to provide an XPath, which you can get by clicking on an element in the browser dev tools and selecting Copy > XPath.
Unfortunately, while this does produce output, it's just the same as for IMPORTHTML. The price sensitivity column is always empty, too.
The reason is that the content of the price sensitivity column is not text but an image, as you can see in your screenshots.
So it looks like you need more powerful HTML parsing tools here than Google Sheets provides. It would be easy to look for img tags if you parsed the website using Python and Beautiful Soup, for instance, so you may want to go down that route.
Using IMPORTXML, I got the same result as you.
The problem is that the price sensitivity column contains an img element, not text.
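The Python route mentioned above could look roughly like the following. This is only a sketch: it swaps Beautiful Soup for the standard library's html.parser to stay dependency-free, the names RowScanner and scan_rows are invented here, and the sample markup is hypothetical rather than copied from the real page. For each table row it collects the link targets and notes whether the row contains an img tag.

```python
from html.parser import HTMLParser


class RowScanner(HTMLParser):
    """Collects, for each table row, its link targets and whether it holds an image."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows: {"links": [...], "has_img": bool}
        self._current = None  # row being read, or None when outside a <tr>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._current = {"links": [], "has_img": False}
        elif self._current is not None:
            if tag == "a" and attrs.get("href"):
                self._current["links"].append(attrs["href"])
            elif tag == "img":
                self._current["has_img"] = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def scan_rows(html):
    """Parse an HTML string and return one dict per table row."""
    scanner = RowScanner()
    scanner.feed(html)
    return scanner.rows


# Hypothetical markup, loosely modelled on an announcements table.
sample = """
<table>
  <tr><td>ABC</td><td><img src="flag.png" alt="yes"></td>
      <td><a href="/docs/1.pdf">Quarterly report</a></td></tr>
  <tr><td>XYZ</td><td></td>
      <td><a href="/docs/2.pdf">Change of address</a></td></tr>
</table>
"""

for row in scan_rows(sample):
    print(row["has_img"], row["links"])
# True ['/docs/1.pdf']
# False ['/docs/2.pdf']
```

The resulting rows could then be written out as CSV and imported into the sheet, with has_img serving as the price-sensitivity flag.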

Using API to extract tagged answers in SurveyMonkey

On the Analyze page of SurveyMonkey you can add tags to open-ended responses. Is there a way to extract these tags through SurveyMonkey's API, specifically from R?

How would I show, in a Google Sheets cell, the number of issues found in a JIRA search

I have a Google sheet that creates URLs to JIRA with project ID and some parameters to have specific searches available from the "hub" sheet for each project listed. What I'd like to do is have the text in the hyperlink cell display the number of issues in the search from the link.
Now I'd just like to know what's the best way to do this, as I'm not a programmer at all so I'd rather spend time learning something that will end up working instead of just trying things on my own .-.
Could a kind soul maybe let me know what they think the best tool/flow for this would be?
PS: The reason I'm bothering with a sheet and not a JIRA Dashboard is that the order and list of the projects I need to keep track of changes every one or two days :[
If you want to scrape the generated URL, you will need to use whichever of the import formulas fits your need:
IMPORTHTML
IMPORTXML
IMPORTDATA
etc.
Then all you need to do is combine it like:
=HYPERLINK(CONCATENATE("URL link to search"), IMPORT...())
