I'm trying to scrape the new inventory from this truck sales site, but I can't get it to work.
=IMPORTHTML("https://usedtrucks.ryder.com/en/search-used-trucks#/facet-search?g=tractor&r=united-states&group=tandem-axle-sleeper&price=22563&price=30741&f=new-inventory&trant=auto&trant=auto-shift&sleeper=condo&sort=11&view=list","table",1)
I have tried table numbers up to 35. :(
Google Sheets does not support scraping of JavaScript-rendered elements. You can easily check this by disabling JS for a given site: only what's left visible can be scraped. In your case, that's nothing, unfortunately.
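If you do need the JS-rendered listings, one option is to drive a real browser from outside Sheets. Here is a minimal sketch in Node.js, assuming Puppeteer is installed (npm install puppeteer); the '.result-row' selector is hypothetical, so inspect the rendered page for the real one:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Let the JS-driven search requests settle before reading the DOM
  await page.goto('https://usedtrucks.ryder.com/en/search-used-trucks', { waitUntil: 'networkidle0' });
  // '.result-row' is a placeholder for whatever selector the rendered listings use
  const rows = await page.$$eval('.result-row', els => els.map(el => el.innerText));
  console.log(rows);
  await browser.close();
})();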
I'm trying to do a Google search and get the first 5 results (title/URL) into an Excel document.
I tried using 'Data Scraping', but depending on the search term, Google will display a different page: sometimes it will have videos, images, or related search terms. So most of the time I was not able to get all the results from the page, as UiPath would not recognize them, probably because of the different divs. My thought was to get them by HTML tag, as every title uses an H3, but I can't find a way to do that.
I also tried Find Children > Get Attributes, but with no success. I feel that might be the best way, though; I'm just not experienced enough with it to make it work. I tried for hours.
Anyone had a similar problem and found a solution?
When I did this before, I had to do multiple scrapes to get the data. The first scrape gets the initial page results, and then you can do a second to get the data from page 2 forward. I have had instances where I had to do multiple scrapes on the first page to get all the information, but after page 1 the data is consistent and easy to scrape. Hope this helps.
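Alternatively, since every organic result title is an H3 inside a link, a headless browser can grab them by tag, sidestepping the selector trouble. A rough sketch in Node.js with Puppeteer; note that Google's markup changes often and may serve consent pages, so treat the selectors as assumptions:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.google.com/search?q=example', { waitUntil: 'domcontentloaded' });
  // Organic titles are H3s nested in anchors; walk up to the anchor for the URL
  const results = await page.$$eval('a h3', els =>
    els.slice(0, 5).map(h3 => ({ title: h3.innerText, url: h3.closest('a').href }))
  );
  console.log(results); // write these to Excel with a library of your choice
  await browser.close();
})();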
Our site has a physician directory search that has worked cross-platform for years. Over the last few months, we have been getting reports that the functionality is broken. My debugging of this issue has led me to find that GTM is actually stripping the URL fragments, breaking the functionality in all browsers but IE.
We use Ajax calls to retrieve the directory page by page, 10 items at a time. Some searches can yield up to 15 pages, but users are no longer able to get past page 2 of the result set. After page 2, it produces the search page again.
This was rewritten a number of years ago to use the URL hash instead of the original cookie-based system, which simply didn't work. The problem can easily be reproduced in Chrome:
Visit https://www.montefiore.org/doctors
Click Search By Specialty
Click Family Practice
Navigate to any secondary page; you will see that the hash fragments have been stripped.
When you try to navigate to any tertiary page, you are simply presented with the specialty list again.
I have gone through various debugging sessions and have even outsourced the issue to our outside developers, but it remains unresolved. Any idea what could be causing GTM to strip out the fragments?
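For what it's worth, one way to narrow this down is to log every code path that can rewrite the URL. A debugging sketch to paste into the DevTools console before reproducing the issue; since GTM's history-change trigger is known to wrap pushState/replaceState, the logged stack traces should point at whatever drops the fragment:

window.addEventListener('hashchange', function (e) {
  console.log('hashchange:', e.oldURL, '->', e.newURL);
});
['pushState', 'replaceState'].forEach(function (fn) {
  var original = history[fn];
  history[fn] = function (state, title, url) {
    // Log the URL being written and where the call came from
    console.log('history.' + fn + ' ->', url, new Error().stack);
    return original.apply(this, arguments);
  };
});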
I know this might be odd, but I need your help with a Google Analytics setup.
Task: I need to set up brochure downloads as a goal for international students on the page https://www.cqu.edu.au/international-students/international-brochures.
In a perfect world, I would set up an individual goal for each type of brochure download (postgraduate, undergraduate, English courses), but I decided to start with "all brochures" to conserve the number of goals I have for the view. Unfortunately, I don't have the option to set up "events", so I have to work with goals only.
Final goal destination: any page whose URL contains "pdf_file".
Pathway: arrive at the International section, move to brochures, then go to a brochure page (URL containing "pdf_file", e.g. https://www.cqu.edu.au/__data/assets/pdf_file/0005/158540/2017-Undergraduate-International-Guide.pdf).
The problem: I tried regular expressions such as ^/__data/assets/pdf_file/.* or ^/pdf_file/(.*), but I can't see conversions in the real-time test.
Nothing else has helped either, and the goals (even the page visit) still aren't tracking correctly. What am I doing wrong? And, if possible, how can I split the goals across the different brochure types?
Many thanks,
Kirill
You are on the right track, and you only need one goal. The problem is that after clicking a PDF document you are redirected to a PDF viewer iframe. This PDF view "page" has no Google Analytics tracking code whatsoever. If you are using destination goals, the only way this will work is to have the Google Analytics (GA) tracking code installed at the final destination page.
One way to track PDF "views" is to create a short URL for each one; then you will be able to track how many of them have been viewed.
Another way is to create an onclick event within each link, but this is only possible if you can set up events in GA. This kind of event tracking lets you set a label with each PDF's name, so you can identify and track each one of them.
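For illustration, a minimal sketch of that second option, assuming the Universal Analytics (analytics.js) snippet is already on the page; the category and action names are made up:

document.addEventListener('click', function (e) {
  // Match any brochure link whose URL contains 'pdf_file'
  var link = e.target.closest('a[href*="pdf_file"]');
  if (!link) return;
  // Label each event with the PDF's path so each brochure is distinguishable
  ga('send', 'event', 'Brochures', 'download', link.pathname);
});

Each of these events can then back an Event goal, and the label lets you split by brochure type.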
I am trying to scrape an eCommerce website and have looked at all the major kinds of possible solutions. The best I found is a web scraping extension for Google Chrome. I actually want to pull out all the data available on the website.
For example, I am trying to scrape data from the eCommerce site www.bigbasket.com. While trying to create a sitemap, I am stuck at the part where I have to choose an element from a page. The same page for, say, category A contains various products as you scroll down, and one category page is further split into page 1, page 2, and for a few categories page 3 and so on.
If I select multiple elements on the same page, say page 1, it's totally fine. But when I try to select an element from page 2 or page 3, the scraper prompts that selecting a different type of element is disabled and asks me to enable it by ticking a checkbox; after that I am able to select different elements. But when I run the sitemap and start scraping, the scraper returns null values and no data is pulled. I don't know how to overcome this problem so that I can build a generalized sitemap and pull the data in one go.
To deter web scraping, various websites now render their content with JavaScript. The website you're targeting (bigbasket.com) also uses JS to render info into various elements. To scrape websites like these you will need to use a browser-driving tool such as Selenium instead of traditional methods (like BeautifulSoup in Python).
You will also have to check the various legal aspects of this and whether the website wants you crawling its data.
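A minimal Selenium sketch in Node.js (npm install selenium-webdriver), assuming Chrome and chromedriver are available; the '.product-card' selector is hypothetical, so inspect the rendered page for the real one:

const { Builder, By, until } = require('selenium-webdriver');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://www.bigbasket.com/');
    // Wait for the JS-rendered product elements to appear before reading them
    await driver.wait(until.elementsLocated(By.css('.product-card')), 10000);
    const cards = await driver.findElements(By.css('.product-card'));
    for (const card of cards) {
      console.log(await card.getText());
    }
  } finally {
    await driver.quit();
  }
})();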
I've recently learned about the new Google Analytics Content Experiments, which look interesting. (http://analytics.blogspot.nl/2012/06/helping-to-create-better-websites.html)
The standard use case seems to be that for a certain page, say a product detail page, you supply variations (different URLs) and select a percentage of users to include in the test. Such a user will be presented with a variation of the product detail page (and will continue to be shown the same variation for continuity/UX reasons, presumably based on cookies).
All fine and good.
However, say I have 100 products on my site. Just testing a variation on one of those products has, IMHO, the following disadvantages:
Slowly progressing tests because of the lower number of visitors.
The test isn't isolated. Since the other product detail pages aren't included in the test, displaying a variation page for one product detail page while all the others show the original can (and will) lead to a confusing experience, and thus skewed conversion statistics, for users who browse multiple products, which most of them do.
To me it seems far better to be able to dynamically include all products of a certain type (e.g. all TVs) in the same test, for example by allowing a regular expression or other filter on the URLs to include in the test.
Is such a thing possible currently, scheduled, useful, or completely missing the point?
EDIT
Part of the solution seems to be "relative URLs":
https://support.google.com/analytics/bin/answer.py?hl=en&answer=2664470
Taking the previous example one step further, we can see how the use of relative URLs lets you easily run an experiment on a set of different original pages, and test visual alternatives across that group of pages (e.g., the product pages in an e-commerce site).
Remaining question: how do I dynamically tag which pages belong to the experiment (e.g. based on a regex)?
Thanks.
The solution is to use a relative URL for the variation page.
E.g. you have a number of product pages:
www.mysite.com/products/eggs.html
www.mysite.com/products/cheese.html
www.mysite.com/products/bread.html
etc.
For each page you have a matching variation page:
www.mysite.com/products/eggs.html?var=bigpicture
www.mysite.com/products/cheese.html?var=bigpicture
www.mysite.com/products/bread.html?var=bigpicture
etc.
You want to use all the product pages in one experiment.
Go to Google Analytics Content Experiments:
For the original page, choose ONE of the many product pages (e.g. www.mysite.com/products/eggs.html). (This is just to get the experiment code and give GA an example page.)
For the variation page, choose relative URL and enter ?var=bigpicture.
Then place the JavaScript required for the experiment on ALL the original product pages you want in the experiment.
For more information see: http://support.google.com/analytics/bin/answer.py?hl=en&answer=2664470&topic=1745208&ctx=topic
Use the JavaScript API as described here:
https://developers.google.com/analytics/devguides/collection/gajs/experiments#pro-server
You can set the experiment ID programmatically in your code, on every page. Of course, you first need to create the experiment in GA; when doing so, provide GA with fake URLs for each variation, discard the code GA generates, and ignore the validation errors.
Then just use the experiment ID as described in the link above.
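As a rough sketch of the client side under that approach (from memory of the Experiments JS API, so check the signatures against the linked docs); the experiment ID and the chosen variation index are placeholders supplied by your own server-side logic:

<script src="//www.google-analytics.com/cx/api.js"></script>
<script>
  // 0 = original, 1+ = variations; chosenVariation comes from your server
  var chosenVariation = 1;
  cxApi.setChosenVariation(chosenVariation, 'YOUR_EXPERIMENT_ID');
  // The regular GA pageview that follows then reports the variation
</script>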
OK, so a solution to this is:
Create experiment.
Choose a placeholder URL for your original URL, something like www.example.com/products/eggs. Set the variations as relative URLs, e.g. ?var=large_heading, ?var=small_price.
Have some mechanism on the server side that determines whether the current user is part of the experiment. A simple cookie is good enough. If the cookie is present, show a variation of the page.
If the user visits a product page but isn't in an experiment, then show the JavaScript given to you when you created the experiment.
Add something to your product page that checks the query string for var=[something]. When it is detected, show the appropriate variation and set the cookie that marks the user as being in an experiment (see the sketch after this list).
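A minimal server-side sketch of those last three steps, using Express for illustration (the cookie name, routes, and templates are made up):

const express = require('express');
const cookieParser = require('cookie-parser');
const app = express();
app.use(cookieParser());

app.get('/products/:name', (req, res) => {
  // The redirect from the experiment JS carries ?var=...; otherwise fall back to the cookie
  const variation = req.query.var || req.cookies.ab_variation;
  if (variation) {
    // Remember the assignment so later product pages stay consistent
    res.cookie('ab_variation', variation);
    res.render('product', { product: req.params.name, variation: variation });
  } else {
    // Not in the experiment yet: serve the original page, which carries
    // the experiment JavaScript generated by GA
    res.render('product', { product: req.params.name, variation: null });
  }
});

app.listen(3000);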
You can hack around the JavaScript that Google gives you to make this a bit easier. Something like:
var variation = utmx('variation_code', 'A/B');
if (variation) { set_a_cookie(variation); }
utmx('url', 'A/B');
This is largely cribbed from the GWO Techie Guide: http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en//websiteoptimizer/techieguide.pdf
There is also a way to do A/B testing with GA without the Experiments API, if you really want to keep things simple. The idea is to create your own split parameter and then pass it to GA as a custom variable. That way you can use your own development tools to differentiate the content between the groups, and you don't have to use a redirect. Here is a simple tutorial on how to do this: link.
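A minimal sketch of that idea with classic ga.js (the cookie and variable names are made up; replace UA-XXXXX-Y with your property ID):

// Pick a bucket once, persist it, and report it as a visitor-scoped custom variable
var match = document.cookie.match(/ab_bucket=(\w+)/);
var bucket = match ? match[1] : (Math.random() < 0.5 ? 'original' : 'variant');
document.cookie = 'ab_bucket=' + bucket + '; path=/';

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-Y']);
_gaq.push(['_setCustomVar', 1, 'ab_bucket', bucket, 1]); // slot 1, visitor scope
_gaq.push(['_trackPageview']);

You can then compare the two buckets in GA with a custom segment on that variable.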
I recently implemented a GA experiment to test different text on a nav bar across many pages. This is what worked for me:
Set up the experiment in GA for a single page, e.g. index.html and index.html?var=menu2.
Implement the solution across multiple pages. Specifically, insert the GA experiment code in all the pages on which you want to run the test. Then ensure that your page(s) can render the page variation based on the parameter passed. My PHP code went something like this: if var=menu2, display the page with menu2; otherwise, display the original menu.
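A rough sketch of that branching in Node rather than PHP (the markup is a stand-in for the real menus):

const http = require('http');

http.createServer((req, res) => {
  const url = new URL(req.url, 'http://localhost');
  // ?var=menu2 selects the variation; anything else gets the original menu
  const menu = url.searchParams.get('var') === 'menu2'
    ? '<nav>menu2 labels</nav>'
    : '<nav>original labels</nav>';
  res.end('<html><body>' + menu + '</body></html>');
}).listen(8080);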