I am using Omnibug to analyze the data my company is tracking on our site. I'd like to go through each page and harvest the Omnibug data in order to create a table for our data tagging documentation. We have around 600 pages, so I want to automate this process. Does anybody know how I can go about scraping the data Omnibug produces? It is a Developer Tools plugin for Chrome, so any advice on how to scrape data from the Developer Tools interface would also be helpful.
Thanks in advance!
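As far as I know, Omnibug doesn't expose its parsed output to other scripts, but it works by decoding the tracking requests each page sends, and you can capture those same requests yourself with browser automation. Here is a minimal Python sketch using Selenium and Chrome's performance log; the page list and the tracking hosts to match against are placeholder assumptions you would replace with your own.

import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Hypothetical inputs: the pages to audit and the tracking hosts to look for
PAGES = ["https://www.example.com/", "https://www.example.com/contact"]
TRACKING_HOSTS = ("google-analytics.com", "doubleclick.net", "facebook.com/tr")

options = Options()
# Turn on Chrome's performance log so network events are recorded
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)

rows = []
for page in PAGES:
    driver.get(page)
    for entry in driver.get_log("performance"):
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.requestWillBeSent":
            continue
        url = message["params"]["request"]["url"]
        # Keep only requests sent to known tracking endpoints
        if any(host in url for host in TRACKING_HOSTS):
            rows.append((page, url))
driver.quit()

for page, url in rows:
    print(page, url)

From there you could split each tracking URL's query string with urllib.parse.parse_qs to get the same parameter/value pairs Omnibug shows, and write the rows out to CSV for your documentation table.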
Related
I'm trying to learn how to use WEBSCRAPER.IO so I can build a database from a webpage's data (for a project). But when I try to do as the video shows, I cannot get the scrape to go through the pages, because my URL is structured differently from the one in the video.
Video Example
www.webpage/products/laptops?page=[1-20]
The webpage I want to scan
www.webpage/products/laptops/page/2/
So how would I create the Start URL for Web Scraper to go through the 20 pages?
When I try to use the example from the video, it only scans 1 page of my chosen webpage.
I have tried variations like
www.webpage/products/laptops/page/page=[1-20]/
www.webpage/products/laptops/page=[1-20]/
www.webpage/products/laptops?page=[1-20]/
but none of them seem to work. I'm stuck.
Could anybody provide me with any advice?
Thank you.
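In case it is useful: Web Scraper's Start URL accepts a range token in any part of the URL, not only after a query parameter, so assuming your version supports range templates, a path-based pattern with the range in place of the page number should work, i.e.
www.webpage/products/laptops/page/[1-20]/
with no page= in front of the range and no duplicated /page/ segment.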
I'm new to XPath and scraping pages. I need to extract a link to the developer website from a Google Play app page (Developer -> Visit website) by using the importxml function in Google Sheets. I tried several approaches, but they didn't work:
Started with //main
importxml(link; "//main/c-wiz[3]/div[1]/div[2]/div//div[9]/div/span/div/span/div/@href")
Full XPath from the Developer Console
importxml(link; "//div[4]/c-wiz/div/div[2]/div/div/main/c-wiz[3]/div[1]/div[2]/div/div[9]/span/div/span/div[1]/a/@href")
Before scraping the Google Play page, I had a similar task for the App Store and came up with the following formula, which didn't work on Google Play: importxml(link; "//section[contains(@class,'section--link-list')]/ul/li[1]/a/@href")
For me, the main issue now is that the path to the website link is correct in the first two cases, but I cannot get any link at all. Can you please advise me on how to scrape it correctly?
Thank you in advance!
try:
=REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(A1)),
"where Col1 starts with 'url:'
and Col1 ends with '}'", 0), """(.*)""")
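In case it is unclear why the XPath attempts fail: the visible "Visit website" link on a Play page appears to be rendered by script, so it is not in the static HTML that IMPORTXML fetches. The formula above works around that by pulling the raw page source with IMPORTDATA(A1) (split across many cells), FLATTEN-ing it into a single column, using QUERY to keep only the fragment that starts with url: and ends with }, and then using REGEXEXTRACT to return whatever sits between the double quotes, i.e. the developer's site URL.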
Previously, this website:
https://fund.fipiran.ir/mf/compare/1
was easily scraped by =importhtml("https://fund.fipiran.ir/mf/compare/1","table",0)
But now the structure of the website has changed and this formula no longer works. What can I do?
As mentioned in the comments, sites whose content is generated by script can't be scraped this way:
The content on that site is generated by script now, not posted as HTML. So there is no way Google Sheets can access the data.
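One thing that may still be worth trying: open the page with the browser's DevTools Network tab and look for the request the script makes to load the fund data. If that request returns plain JSON or HTML from a stable URL, you may be able to point IMPORTDATA (or an Apps Script UrlFetchApp call) at that URL instead of the page itself; if the data only arrives through client-side script, a spreadsheet formula won't reach it.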
I am scraping data from a site, and each item has a related document URL. I want to scrape data from that document, which is available in HTML format after clicking the link. Right now, I've been using IMPORTFEED in Google Sheets to get the basic columns filled.
Is there a next step I could take to go into each respective URL, grab elements from the document, and populate the Google Sheet with them? The reason I'm using the RSS feed (instead of Python and BeautifulSoup) is that the site actually offers one.
I've looked, and haven't found a question that matches mine specifically.
I haven't personally tried this yet, but I've come across web scraping samples using Apps Script that rely on UrlFetchApp.fetch. You can also check the XmlService sample, which is also related to scraping.
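In rough terms, that approach would mean looping over the item URLs your feed gives you, calling UrlFetchApp.fetch(url) on each one, pulling the pieces you need out of the response (XmlService.parse works when the document is well-formed XML/XHTML; otherwise a regular expression over getContentText() is the usual fallback), and writing the values back to the sheet with SpreadsheetApp. Keep in mind that UrlFetchApp calls count against daily quotas and scripts have execution-time limits, so you may need to process the URLs in batches.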
I have this kind of URL on my site: /preinscripcion/231. The number is the code of a course, and it is generated automatically by my CRM. The problem comes when I want to see which course the person is trying to register for. I have an Excel file with these codes, but looking codes up by hand is too demanding and cumbersome, as I already have hundreds of courses. I've tried using advanced filters to replace the code with the name of the course, but I can only create up to 100 filters.
Is there a way of doing it?
It does not have to be in Analytics itself; my site is built on WordPress: http://formarte.edu.co/
Thanks in advance for all of your help.
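Not an Analytics-side fix, but if you can export the page report and you already have the Excel file of codes, a short script (or a VLOOKUP in a spreadsheet) can join the two for you. Below is a minimal Python/pandas sketch; the file names and column names are assumptions you would replace with your own.

import pandas as pd

# Hypothetical file and column names -- replace with your own export and Excel sheet
pages = pd.read_csv("analytics_pages_export.csv")   # e.g. columns: page, pageviews
courses = pd.read_excel("course_codes.xlsx")        # e.g. columns: code, course_name

# Pull the numeric code out of paths like /preinscripcion/231
pages["code"] = pages["page"].str.extract(r"/preinscripcion/(\d+)", expand=False)
courses["code"] = courses["code"].astype(str)

# Attach the course name to every page row and save the result
merged = pages.merge(courses, on="code", how="left")
merged.to_csv("pages_with_course_names.csv", index=False)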