IMPORTHTML / Table Pull Issues [duplicate] - web-scraping

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
Trying to import weather forecasts for multiple sales markets, but the site I was using blocks bot crawlers, so my IMPORTHTML function can't fetch the URL.
I found another site, but the table is formatted in calendar view instead of list view.
Can I still pull this information into Google Sheets (GS) somehow? I've gotten it to pull information, but it just comes up as [TABLE] in GS.
This is the formula I was using to build the changing-date portion of the URL:
=CONCATENATE("https://www.wunderground.com/calendar/us/ca/eureka/KACV/date/",$B$3,"-",$C$3,"?cm_ven=localwx_calendar")
And the formula to pull the table from the completed URL into GS:
=IMPORTHTML(A2, "Table", 1)
I want the first formula to pull today's year and month from B3 and C3 and concatenate them into the URL, and the second formula to pull the desired table from that URL, but all I get is a bunch of cells containing [TABLE].

There is an API that returns JSON. Look through the documentation to see whether there is an endpoint that meets your needs. For example, the network tab shows the following request for the 15-day forecast:
https://api.weather.com/v3/wx/forecast/daily/15day?language=en-US&apiKey=6532d6454b8aa370768e63d6ba5a832e&geocode=40.95%2C-124.11&units=e&format=json
You would probably need to write your own script to handle this response, though, or use a tool like ImportJSON. With a little research it is highly likely you will find something suitable.
Explore the 15-day forecast JSON here.
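
As a rough illustration, a small Apps Script custom function could fetch that endpoint and spill the daily values into the sheet. This is only a sketch: the field names (validTimeLocal, temperatureMax, temperatureMin) are assumptions and should be verified against the actual JSON shown in the network tab.

// Hypothetical custom function: =WEATHER15DAY(A2), where A2 holds the API URL.
function WEATHER15DAY(url) {
  var response = UrlFetchApp.fetch(url);            // fetch the forecast endpoint
  var data = JSON.parse(response.getContentText()); // parse the JSON body
  var rows = [["Date", "Max", "Min"]];
  // Assumed shape: parallel arrays, one entry per forecast day.
  for (var i = 0; i < data.validTimeLocal.length; i++) {
    rows.push([data.validTimeLocal[i], data.temperatureMax[i], data.temperatureMin[i]]);
  }
  return rows; // a 2D array spills into the sheet as a table
}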

Related

How to create a dynamic report in Ignition's Perspective

I'm trying to create an end-of-day report for one product that has a range of 40-100 pieces per day. Here's my issue:
How can I pass a start date/time and an end date/time parameter to a report that has one graph and one table? The graph shows data coming in from a sensor per piece of product, and the table displays the details for that run. There could be 40-100 pieces run, and I need to query a MS SQL Server to report on all of these pieces in one report. Is there a way to do this? I know of repeater components, but I don't know if that's possible in a report. I think the easiest way would be to use scripting to query and create the tables and graphs on the fly and add them to the report, but I haven't found an example of how to implement that. Any help is appreciated.
I've tried nested queries and nested tables to display a graph in the header with details for every piece of product, but the problem is that the data for the graph isn't the same as the data for the details.

Issue scraping financial data via xpath + tables

I'm trying to build a stock analysis spreadsheet in Google Sheets by using the IMPORTXML function in conjunction with (absolute) XPath, and the IMPORTHTML function with tables, to scrape financial data from the www.morningstar.co.uk key ratios pages for the companies I'd like to keep an eye on.
Example: https://tools.morningstar.co.uk/uk/stockreport/default.aspx?tab=10&vw=kr&SecurityToken=0P00007O1V%5D3%5D0%5DE0WWE%24%24ALL&Id=0P00007O1V&ClientFund=0&CurrencyId=BAS
=importxml(N9,"/html/body/div[2]/div[2]/form/div[4]/div/div[1]/div/div[3]/div[2]/div[2]/div/div[2]/table/tbody/tr/td[3]")
=INDEX(IMPORTHTML(N9,"table",12),3,2)
N9 is the cell containing the URL of the data source.
I'm mainly using Morningstar as my data source because of the overwhelming amount of free information, but the links keep breaking: either the URL has changed slightly or the XPath hierarchy has altered.
From what I've read so far, I'm guessing that busy websites such as these are dynamic and change often, which is why my static references keep breaking.
Is anyone able to suggest a solution, or confirm whether CSS selectors would be a more stable/reliable method of retrieving the data?
Many thanks in advance
I've tried short and long XPath expressions (copied from the dev tools in Chrome) and have repeatedly changed the URL to repair the link to the data source, but it keeps breaking shortly afterwards and I'm unable to retrieve any information.

How can I visualize a customer journey with Tag Manager and Google Data Studio?

I have a website where the customer can take a quiz to find the right products. I want to track the customer's journey on the quiz, to see what they choose and where people fall off.
The quiz starts at example.com/start/, and for every step they take the URL "expands", e.g. to example.com/start/first_step/, then example.com/start/first_step/second_step, etc.
I think I can do it with Tag Manager by creating events for each step/URL. But my first issue is that the events get too long. The other issue is that I can't figure out how to visualize the journey in either Google Analytics 4 or Google Data Studio.
Do any of you have a good idea for how I can do this?
What exactly do you mean by "the events get too long"? Do you mean the request payload to the Google Analytics server?
Generally speaking, I'd recommend the following:
Create a generic event for all quiz steps
Add two parameters for the progress: one containing the step's name and one containing its index (e.g. "quiz_step_str" & "quiz_step_index")
Send the events to GA4
Visualize them using a bar chart. (https://analyticsdemystified.com/google-analytics/step-step-guide-creating-funnels-googles-data-studio/)
Additional information regarding step 2:
The parameter containing the step name can be generated using JS by taking the value from the URL. Just scrape the corresponding part from it (e.g. "example.com/start/first_step/" -> "first_step"), as in the sketch below.
For the parameter containing the step's index, I recommend creating a lookup table.
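
A minimal sketch of such a Custom JavaScript variable in Tag Manager, assuming the example.com/start/<step>/ path layout from the question (the variable name is hypothetical):

// GTM Custom JavaScript variable, e.g. "JS - Quiz Step Name" (name is hypothetical)
function() {
  // "/start/first_step/second_step" -> ["start", "first_step", "second_step"]
  var parts = window.location.pathname.split('/').filter(Boolean);
  // The last segment is the current step; return undefined outside the quiz.
  return parts[0] === 'start' ? parts[parts.length - 1] : undefined;
}

The index parameter can then come from a GTM lookup table variable keyed on this step name.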

Scraping sector information from Yahoo Finance into Google Sheets using IMPORTXML [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
I am very new to web-scraping and was introduced to it just today after trying to figure out a formula on a spreadsheet.
I would like to retrieve the Sector information from Yahoo Finance into Google Sheets. I would also like the data to update when there is a change to cell B7. Link: https://finance.yahoo.com/quote/MIDD/profile?p=MIDD
I came up with the following, but get a #N/A error: =importxml("https://finance.yahoo.com/quote/",B7,"/profile?p=",B7, "//*[@class='Fw(600) [@data-reactid='21']")
Please let me know what I might be doing wrong. Thank you in advance.
Solution
This is the right syntax for the IMPORTXML formula:
=IMPORTXML("URL", "XPATH_QUERY")
In your case this will translate to:
=importxml("https://finance.yahoo.com/quote/"&B7&"/profile?p="&B7,"//*[@class='Fw(600)'][@data-reactid='21']")
This will nonetheless return an empty result, because the page builds that content with JavaScript (see the duplicate linked above).
Considerations
Keep in mind that many sites go to great lengths to actively prevent scraping: letting you scrape their data wholesale would undermine their business model, since they may make their profit from ads, for example.
Check in the page you want to scrape whether the tags you are watching for actually correspond to the data you wanted to get in the first place. I believe in this case it's just a matter of changing the tag values to the proper ones.
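
If IMPORTXML stays empty because of the client-side rendering, one workaround is an Apps Script custom function that reads Yahoo's unofficial JSON endpoint instead. This is only a sketch: the quoteSummary endpoint, its modules parameter, and the response path below are assumptions based on commonly seen usage, and the endpoint is unofficial, so it may change or start requiring extra headers at any time.

// Hypothetical custom function: =SECTOR(B7). The endpoint is unofficial and may break.
function SECTOR(ticker) {
  var url = 'https://query1.finance.yahoo.com/v10/finance/quoteSummary/'
      + encodeURIComponent(ticker) + '?modules=assetProfile'; // assumed endpoint
  var json = JSON.parse(UrlFetchApp.fetch(url).getContentText());
  // Assumed response path: quoteSummary.result[0].assetProfile.sector
  return json.quoteSummary.result[0].assetProfile.sector;
}

Because custom functions recalculate when their arguments change, pointing the function at B7 also gives the update-on-change behaviour asked for.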

Attempting to design a flexible reporting system. Getting stuck

I’m having some trouble coming up with a future-proof-ish design for reports for a company. Essentially the requirements are:
Be able to pull whatever data from the database
Generate formatted report from that data by populating a template (HTML, docx)
Export to Word and/or PDF
So initially I made an API endpoint per report (this is a web app), and had PDFs generated and formatted correctly.
But now I need to get the data into .docx/Word format, and I'm trying to figure out how to design something as DRY as possible, so that I don't have to put in a ton of work every time the company decides they need another report (they've done this two or three times now, which is how I became aware that I had coded myself into a corner).
Every report I've done thus far has been built via a "brute-force" method: code the queries needed for the report, format the data, and then render to PDF (HTML to PDF via PhantomJS).
The complexity arrived when the company came back and said: "Hey, we need all of those reports in Word format; also, we have 3 other new reports that we need, and a report that is a slight variation on an old one, +/- 2 fields."
I'm just having trouble coming up with a solid design/abstraction here, one that doesn't send me down a week-long hacking spree every time a requirement changes.
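
One common shape for this, sketched purely as an illustration (every name here is hypothetical), is to turn each report into a data description (queries plus a template) and keep a single generic pipeline that runs the queries and hands the merged data to whichever renderer the caller asks for:

// Hypothetical report registry: each report is data, not a hand-coded endpoint.
const reports = {
  dailyRun: {
    template: 'daily-run.html', // or a .docx template for Word output
    queries: {
      pieces: 'SELECT * FROM pieces WHERE ran_at BETWEEN @from AND @to',
    },
  },
};

// One generic pipeline instead of one endpoint per report.
async function renderReport(name, params, db, renderers, format) {
  const def = reports[name];
  const data = {};
  for (const [key, sql] of Object.entries(def.queries)) {
    data[key] = await db.query(sql, params); // db.query is an assumed helper
  }
  // renderers = { pdf: ..., docx: ... }: adding Word output means adding one
  // renderer, and a report variant means adding one entry to the registry.
  return renderers[format](def.template, data);
}

With this shape, the "+/- 2 fields" variation becomes a new registry entry (or a tweaked template) rather than a new endpoint.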
