I'm new to XPath and page scraping. I need to extract the link to the developer's website from a Google Play app page (Developer -> Visit Website) using the IMPORTXML function in Google Sheets. I tried several approaches, none of which worked:
Started with //main
importxml(link; "//main/c-wiz[3]/div[1]/div[2]/div//div[9]/div/span/div/span/div/@href")
Full xpath from Developer Console
importxml(link; "//div[4]/c-wiz/div/div[2]/div/div/main/c-wiz[3]/div[1]/div[2]/div/div[9]/span/div/span/div[1]/a/@href")
Before scraping the Google Play page, I had a similar task for the App Store and came up with the following formula, which did not work on Google Play: importxml(link; "//section[contains(@class,'section--link-list')]/ul/li[1]/a/@href")
The main issue for me now is that the path to the website link appears correct in the first two cases, yet I cannot get any link at all. Can you please advise me on how to scrape it correctly?
Thank you in advance!
try:
=REGEXEXTRACT(QUERY(FLATTEN(IMPORTDATA(A1)),
"where Col1 starts with 'url:'
and Col1 ends with '}'", 0), """(.*)""")
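To make the logic of the formula easier to follow, here is a minimal Python sketch of the same idea: keep the cell that starts with `url:` and ends with `}`, then pull out whatever sits between the double quotes. The function name `extract_quoted_url` and the sample cells are hypothetical and only stand in for what IMPORTDATA might return; the real page content may look different.

```python
import re


def extract_quoted_url(cells):
    """Return the quoted value of the first cell that starts with 'url:' and ends with '}'."""
    for cell in cells:
        # Same filter as the QUERY clause: starts with 'url:' and ends with '}'.
        if cell.startswith("url:") and cell.endswith("}"):
            # Same pattern as REGEXEXTRACT: capture everything between double quotes.
            match = re.search(r'"(.*)"', cell)
            if match:
                return match.group(1)
    return None


# Hypothetical sample data, not taken from a real Google Play page.
sample_cells = ["name: 'Example'", 'url:"https://developer.example.com/"}', "other"]
print(extract_quoted_url(sample_cells))
```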
I'm trying to learn how to use WEBSCRAPER.IO so I can build a database from a webpage's data (for a project). But when I follow the video, I cannot get the scraper to go through the pages, because the URL format is different from the one in the video.
Video Example
www.webpage/products/laptops?page=[1-20]
The webpage i want to scan
www.webpage/products/laptops/page/2/
So how would I create the Start URL for Web Scraper to go through the 20 pages?
When I try to use the example from the video, it only scans 1 page of my chosen webpage.
I have tried variations like
www.webpage/products/laptops/page/page=[1-20]/
www.webpage/products/laptops/page=[1-20]/
www.webpage/products/laptops?page=[1-20]/
but none of them seem to work. I'm stuck.
Could anybody provide me with any advice?
Thank you.
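One thing to note about the variations above is that they all keep `page=` from the video's query-string style, while the target site puts the page number in its own path segment (`/page/2/`). The sketch below is only an illustration in Python, assuming the `[1-20]` range placeholder is substituted in place by each number; whether Web Scraper accepts the placeholder in a path segment is an assumption to verify in the tool. The helper name `expand_range_url` is hypothetical.

```python
import re


def expand_range_url(pattern):
    """Expand the first '[start-end]' placeholder in a URL pattern into explicit URLs."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        # No placeholder found: the pattern is already a single URL.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


# Placeholder sits where the page number appears in the target site's URLs.
urls = expand_range_url("www.webpage/products/laptops/page/[1-20]/")
print(len(urls))
print(urls[1])
```

If the range placeholder turns out not to work in that position, the explicit list produced this way could be entered as separate start URLs instead.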
I had been using the Picasa slideshow for a couple of years on multiple webpages, and now it has stopped working and shows a popup window:
Error: This API is no longer available.
example of webpage where the gadget is used:
http://www.b-mont.sk/
Link to the original Google gadget:
https://ab56117cb9163641ca621ac5d4df8b73601b2f7a.googledrive.com/host/0B4yfJJJSNrfubzJEUkxIYm1PcHM/picasa-slideshow-simple-nb.xml
I think Google deprecated some of the features the gadget is using.
Please help me fix this gadget.
I believe it is working again at the moment, but that slideshow uses the deprecated Google Feed API: https://developers.google.com/feed/terms?hl=en#deprecation-policy. So it might go away at any time again. Here are some alternatives: Loading RSS feed with AJAX: alternatives to Google Feed API?
I am unable to activate Google rich snippets for my website, www.hensdry.net. Google's Structured Data tool shows that everything is fine, but the Google search results do not show the snippets they are supposed to.
My website is www.hensdry.net
Thanks
Jim
Are you talking about a G+ image next to SERPs and Google's validator at http://www.google.com/webmasters/tools/richsnippets ?
Having correct structured data is only the first step in getting your G+ image next to SERPs. The final decision is up to Google, and they have recently reduced the number of sites that get the benefit of structured data. See http://www.google.com/search?q=google+reduces+authorship
Google takes some time to crawl your pages and detect the structured data added to your page's code. Adding structured data doesn't guarantee rich snippets. It all depends on the user's search query. If the search query matches the data in your structured markup, Google would show it as a rich snippet. Otherwise, the regular snippet that matches the search query will be shown.
Reference: rich snippet guide by learnly.info
I recently created a Google Custom Search Engine for my website and added it to my site. It's working fine when I select
Search only selected site
under the advanced options of the Basic tab in the Setup menu of Google CSE. The problem arises when I select
Search the entire web but emphasize included site
As I understand it, the last option should show results from the entire web but give preference to the included website. However, I could not find any results from my website.
I tried by typing in search
site:xxx.com
and it shows all results for that specific website, but if I enter
site:mywebsite.com
It shows no results.
Can anybody shed some light on this?
Refreshing the page several times fixed it for me, though it's not a perfect solution. I think this is a problem that Google has to fix.
I want to scrape the top 10 search links from a Google results page after searching for a keyword.
I am using Web-Harvest. I am planning to scrape the href links and filter out the top 10 using some
attribute pattern. Is that the right way? It's not working at the moment. Is there any other simple way to do it? :(
How about just using the google search REST API as described here.
It's easier to use Google Sheets (you can even monitor changes), but you probably have your reasons for choosing an external tool.
In general you need 3 functions to get results:
extract Title "//h3[@class='r']"
extract URL "//h3/a/@href"
clean URL "\/url\?q=(.+)&sa" - (All external URLs in Google Search results have tracking enabled and we’ll use Regular Expression to extract clean URLs)
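The third step can be sketched in Python as follows. This is only an illustration of applying the regular expression above to an href taken from the results page; the function name `clean_url` and the sample href are hypothetical, and the markup Google actually serves may differ.

```python
import re

# Same pattern as the "clean URL" step: capture what sits between '/url?q=' and '&sa'.
TRACKING_PATTERN = re.compile(r"/url\?q=(.+)&sa")


def clean_url(href):
    """Strip the tracking wrapper from a result href; return it unchanged if there is none."""
    match = TRACKING_PATTERN.search(href)
    return match.group(1) if match else href


# Hypothetical href in the wrapped form described above.
sample_href = "/url?q=https://example.com/page&sa=U&ved=abc"
print(clean_url(sample_href))
```

Note that `(.+)` is greedy, so if `&sa` appeared more than once in an href, the capture would run up to the last occurrence.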