Web scraping an element filled via JavaScript

Here is the link in question: https://bittrex.com/Market/Index?MarketName=BTC-DASH
I would like to be able to scrape the data (or find the underlying REST API method) contained in the "Timeline" element.
I've used Chrome to "Inspect element" and look at network sources, but I can't seem to find it.
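Bittrex did expose a public REST API at the time, so it's worth probing that before scraping the page. Whether the Timeline element is actually fed by the market-history endpoint is an assumption on my part, but a hedged sketch of the kind of call to look for:

library(jsonlite)

# Assumption: the Timeline may be backed by the public v1.1 market-history endpoint
history <- fromJSON("https://bittrex.com/api/v1.1/public/getmarkethistory?market=BTC-DASH")
str(history)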

Related

Web scraping using rvest works only partially

I'm new to web scraping with rvest in R and I'm trying to access the match names in the left column of this betting site using XPath. I know the names are under span tags, but I can't access them with the following code:
html="https://www.supermatch.com.uy/live#/5009370062"
a=read_html(html)
a %>% html_nodes(xpath="//span") %>% html_text()
But I can only access some of the text. I've read that this may be because the website dynamically pulls data from databases using JavaScript and jQuery. Do you know how I can access these match names? Thanks in advance.
Some generic notes about basic scraping strategies
The following refers to Google Chrome and Chrome DevTools, but the same concepts apply to other browsers and their built-in developer tools too. One thing to remember about rvest is that it can only handle the response delivered for that specific request, i.e. content that is not fetched / transformed / generated by JavaScript running on the client side.
Loading the page and inspecting elements to extract an XPath or CSS selector for rvest seems to be the most common approach, but the static content behind that URL and the rendered page with its elements in the inspector can be quite different. To take some guesswork out of the process, it's better to start by checking the actual content rvest might receive: open the page source and skim through it, or just search for a term you are interested in. At the time of writing Viettel is playing, but they are not listed anywhere in the source.
Meaning there's no reason to expect that rvest would be able to extract that data.
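You can run the same check from R; a minimal sketch (the search term is just the example from above):

library(rvest)

# fetch the static document and search the raw markup for the term
src <- read_html("https://www.supermatch.com.uy/live") %>% as.character()
grepl("Viettel", src, fixed = TRUE)  # FALSE means it is not in the static HTML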
You could also disable JavaScript for that particular site in your browser and check if that particular piece of information is still there. If not, it's not there for rvest either.
If you want to go a step further and/or suspect that rvest receives something different from your browser session (the target site checking request headers and delivering some anti-scraping notice when it doesn't like the user agent, for example), you can always check the actual content rvest was able to retrieve: read_html(some_url) %>% as.character() to dump the whole response, read_html(some_url) %>% xml2::html_structure() to get a formatted structure of the page, or read_html(some_url) %>% xml2::write_html("temp.html") to save the page content and inspect it in an editor or browser.
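Put together as a runnable sketch, assuming the rvest and xml2 packages are installed and with some_url standing in for whatever page you are probing:

library(rvest)
library(xml2)

some_url <- "https://www.supermatch.com.uy/live"
page <- read_html(some_url)

as.character(page)             # dump the whole response as a string
html_structure(page)           # print an indented outline of the document
write_html(page, "temp.html")  # save it for inspection in an editor or browser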
Coming back to Supermatch & DevTools: that data in the left pane must be coming from somewhere. What usually works is a search on the Network pane: open it, clear the current content, refresh the page and make sure it is fully loaded, then run a search (for "Viettel", for example).
And you'll have the URL from there. There are some IDs in that request (https://www.supermatch.com.uy/live_recargar_menu/32512079?_=1656070333214) and it's wise to assume those values might be tied to the current session or just short-lived, so it's sometimes worth trying what happens if we clean it up a bit, i.e. remove 32512079?_=1656070333214. In this case it happens to work.
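A minimal sketch of that request, assuming the trimmed endpoint keeps responding with an HTML fragment (the span selector is a guess and would need adjusting):

library(rvest)

# ID-free endpoint discovered through the Network pane; may change or stop working
menu <- read_html("https://www.supermatch.com.uy/live_recargar_menu/")
menu %>% html_nodes("span") %>% html_text()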
While here it's just a fragment of HTML and it makes sense to parse it with rvest, in most cases you'll end up landing on JSON and the process turns into working with APIs. When that happens it's time to switch from rvest to something more appropriate for JSON, jsonlite + httr for example.
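A hedged sketch of that JSON workflow (the endpoint here is hypothetical; substitute whatever URL the Network pane turns up):

library(httr)
library(jsonlite)

# hypothetical JSON endpoint found through the Network pane
resp <- GET("https://example.com/api/matches")
matches <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
str(matches)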
Sometimes plain rvest is not enough and you either want or need to work with the page as it would have been rendered in your JavaScript-enabled browser. For this there's RSelenium.
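A minimal sketch of that route, assuming a local browser driver is available (browser choice and port are arbitrary here):

library(RSelenium)
library(rvest)

# start a Selenium session against a local driver
driver <- rsDriver(browser = "firefox", port = 4545L)
remote <- driver$client

remote$navigate("https://www.supermatch.com.uy/live")

# hand the fully rendered page over to rvest once the JavaScript has run
rendered <- read_html(remote$getPageSource()[[1]])
rendered %>% html_nodes(xpath = "//span") %>% html_text()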

Web scraping site using polymerjs / webcomponent

I'm using colly to web scrape YouTube Charts. The site uses Polymer and as a result I'm having issues capturing the DOM elements. A simple test I did was document.querySelector("#search-native") in the console, and it returns null.
I saw an element called ytmc-app and I could get this element, but it's not possible to continue querying after that.
Does anyone have an idea how to proceed?

Populate Drop Down List On Google Form Via Google Script

I'm trying to populate a form's drop-down list on a Google Form via a Google Sheet. I've been looking at the following website and it's been walking me through the process. The issue is that I can't figure out the ID of the drop-down list via the "inspect element" feature. I'm using Safari on a Mac.
Any idea what the ID would be?
I figured it out. Just in case anyone runs into the same issue, the value you're looking for is called data-observe-id.

How to find the source of this javascript file?

I am trying to find which file calls this Google Analytics JS file.
The file is not in the HTML source code, so it is either called by a tag management system like GTM or by another JS file.
I used Chrome Developer Tools to track it down but had no real luck finding it. So far I am thinking it comes from cdn.optimizely.com.
Is there something I am missing or a tool that I haven't used yet?
This looks like it's being introduced by the GTM container, which can be seen in your screenshot, with the container ID of GTM-TDDX2G. If you have access to that container, then you should also be able to see the base/pageview tag which uses the particular tracking ID on that page.

Edit what displays in an iframe

I'm trying to display content from another site on mine using an iframe. I'm just wondering if it's possible to display only certain parts (divs) of the external page in the iframe.
Thanks!
You could try using some jQuery on your site to dynamically alter the styles of the framed site. I did something similar with SSRS, where we had an iframe containing SSRS reports that we wanted to style. We used jQuery in the master page to find the matching elements inside the frame target and alter them as required. Note that this only works when the framed page is served from the same origin (or you otherwise have script access); browsers block cross-origin frame manipulation.
As long as the external site is well marked up (plenty of IDs, good semantic structure) you may be able to hide/rearrange elements as you require. You may also need to delay the jQuery execution, as the frame contents may not be completely loaded by the time your JavaScript executes.
You can find a VERY simple example here.
BUT, be careful of the legalities involved with showing partial content from someone else's site. If you're presenting their site as your own or without identifying information, you could be infringing on their copyright.
