Scraping dynamic information in R


I'm trying to use an XPath expression to scrape a figure I need from this website.
I need these two numbers.
So far I'm having no luck. Any help is appreciated.

Does it need to be XPath? You can get it with a CSS selector instead:
library(rvest)
page <- read_html("http://www.myfxbook.com/community/outlook/EURUSD")
page %>% html_nodes("#leftColumn td:nth-child(4)") %>% html_text()
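
If XPath is still preferred, the same selection can be written either way. The following is a minimal sketch on a hypothetical inline table (the markup and values are made up for illustration and are not the real page's structure), showing the CSS selector from the answer next to one possible XPath counterpart:

```r
library(rvest)

# Hypothetical markup: a container with id "leftColumn" holding a small table.
doc <- read_html('<div id="leftColumn"><table>
  <tr><td>EURUSD</td><td>Short</td><td>61%</td><td>1200.5</td></tr>
  <tr><td></td><td>Long</td><td>39%</td><td>800.25</td></tr>
</table></div>')

# CSS: every <td> that is the fourth child of its row.
css_values <- doc %>%
  html_nodes("#leftColumn td:nth-child(4)") %>%
  html_text()

# XPath: the fourth <td> in each row below the same container.
# The two selections coincide here only because each row contains <td> cells and nothing else.
xpath_values <- doc %>%
  html_nodes(xpath = "//*[@id='leftColumn']//td[4]") %>%
  html_text()

identical(css_values, xpath_values)
```

Both calls return the fourth cell of each row as character strings, so a conversion such as as.numeric() would still be needed before doing arithmetic.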

Related

Rvest and xpath returns misleading information

I am struggling with some scraping issues, using rvest and xpath.
The objective is to scrape the following page
https://www.barchart.com/futures/quotes/BT*0/futures-prices
and to extract the names of the futures
BTF21
BTG21
BTH21
etc for the full list of names.
The XPath for those variables seems to be xpath='//a'.
The following code returns nothing relevant, hence my question:
library(rvest)
url <- 'https://www.barchart.com/futures/quotes/BT*0'
valuation_col <- url %>%
read_html() %>%
html_nodes(xpath='//a')
value <- valuation_col %>% html_text()
Any hint to proceed further to get the information would be much needed. Thanks in advance!
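
One thing worth noting about the expression above: '//a' matches every link in the document, not only the contract names. The sketch below uses hypothetical inline HTML (the hrefs are assumptions, and the real page may well build its table with JavaScript, in which case the links would not be in the downloaded HTML at all) to show how a predicate narrows the match:

```r
library(rvest)

# Hypothetical static markup; the link targets are invented for illustration.
doc <- read_html('<html><body>
  <a href="/">Home</a>
  <a href="/futures/quotes/BTF21">BTF21</a>
  <a href="/futures/quotes/BTG21">BTG21</a>
  <a href="/login">Log in</a>
</body></html>')

# "//a" returns the text of every anchor, including navigation links.
all_links <- doc %>%
  html_nodes(xpath = "//a") %>%
  html_text()

# A predicate on @href keeps only the anchors that point at quote pages.
symbols <- doc %>%
  html_nodes(xpath = "//a[contains(@href, '/futures/quotes/')]") %>%
  html_text()
```

If the narrowed expression returns nothing against the live page, that suggests the names are loaded after the initial HTML is delivered.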

Web scraping price with the use of xml

I am trying to scrape the following: 13.486 Kč from: https://www.aofis.cz/informace-pro-klienty/elba-opf/
For some reason, the following code does not seem to find the number. I am rather new to this, so perhaps the string in xml_find_all is wrong. Can anyone please take a look and tell me why?
library(xml)
library(xml2)
page <- "https://www.aofis.cz/informace-pro-klienty/elba-opf/"
read_page <- read_html(page)
Price <- read_page %>%
rvest::html_nodes('page-content') %>%
xml2::xml_find_all("//strong[contains(@class 'sg_selected')]") %>%
rvest::html_text()
Price
Thank you!!
Michael

The html code you see in your browser developer panel (or selector gadget) is not the same as the content that is being delivered to your R session. It is actually a javascript file which then builds the web page. This is why your rvest call isn't finding the correct html node: there are no html nodes in the string you are processing!
There are a few different ways to get the information you want, but perhaps the best is to just get the monetary values from the javascript code using regex:
page <- "https://www.aofis.cz/informace-pro-klienty/elba-opf/"
read_page <- httr::content(httr::GET(page), "text")
stringr::str_extract_all(read_page, "\\d+\\.\\d+ K")[[1]][1]
#> [1] "13.486 K"
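
To make the difference concrete, here is a small offline sketch. The response body is hypothetical (an empty placeholder div plus a script that carries the price as a string) and only stands in for what the server might send; it shows a node search coming back empty while the same regex still finds the value in the raw text:

```r
library(xml2)
library(stringr)

# Hypothetical response body: the price exists only inside a script string.
body <- '<html><body><div id="app"></div>
<script>render({"price": "13.486 K\u010d"});</script></body></html>'

# Parsing works, but there is no <strong> node to select.
strong_nodes <- xml_find_all(read_html(body), "//strong")
length(strong_nodes)

# The regex from the answer runs on the raw text, so it does not need any node.
price <- str_extract_all(body, "\\d+\\.\\d+ K")[[1]][1]
price
```

The pattern relies on the number format staying the same, so it is worth checking the match against the page from time to time.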

Scraping a HTML table in R, after changing a Javascript dropdown option

I am looking to scrape the main table from this website. I have managed to get the table into R and working, but the one problem is the website defaults to PS4, while I wanted the data for Xbox (this is changed in the top-right dropdown menu).
Ideally there would be a way to pass a parameter in the URL that will define the platform in this way, but I haven't been able to find anything about that.
Looking around it seems that PhantomJS would be the best way to go but I have no experience using Javascript and I'm not sure how you would implement performing an action on the page, then scraping the resulting table.
This is currently all I have in terms of my main code scraping the data:
library(rvest)
url1 <- "https://www.futbin.com/19/players?page="
pge <- 1
tbl <- paste0(url1, pge) %>%
read_html() %>%
html_nodes(xpath='//*[@id="repTb"]') %>%
html_table()
Thanks in advance for any help.

Scrape title attribute from CSS with rvest

I use rvest to scrape web data.
I have the following CSS code from a website:
<abbr class="intabbr" title="2.856.890">2,9M</abbr>
I scrape this data with
library(rvest)
library(dplyr)
n <- read_html("https://www.last.fm/de/music/Fang+Island")
n %>%
html_node("abbr") %>%
html_text()
This gives me "2M", but what I would like to get is the "2.856.890".
I am not very knowledgeable in CSS: is it possible to get the information I want by changing the expression in html_node()?
This post suggests that it is not possible, however this one suggests that it might be possible since it pops up as a tooltip on the page?

Use html_attr to get a tag's attribute:
n %>%
html_node("abbr") %>%
html_attr("title")
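
As a self-contained illustration, the snippet from the question can be parsed directly, without fetching the page. The numeric conversion at the end assumes the dots in the title are thousands separators, which is a guess based on the German-language URL:

```r
library(rvest)

# The <abbr> element quoted in the question, wrapped in a paragraph.
doc <- read_html('<p><abbr class="intabbr" title="2.856.890">2,9M</abbr></p>')
node <- html_node(doc, "abbr.intabbr")

# html_text() returns the visible, abbreviated label.
label <- html_text(node)

# html_attr() returns the value of the named attribute.
full <- html_attr(node, "title")

# Assumption: "." is a thousands separator, so strip it before converting.
listeners <- as.numeric(gsub(".", "", full, fixed = TRUE))
```

Here label is "2,9M", full is "2.856.890", and listeners is the number 2856890.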

Web scraping with R using rvest for financial website

I am trying to scrape a data table from the following website using R, but it is not returning any values. I am using SelectorGadget to get the node details.
library(rvest)
url = "http://www.bursamalaysia.com/market/derivatives/prices/"
text <- read_html(url) %>%
html_nodes("td") %>%
html_text()
output:
text
character(0)
I would appreciate any kind of help. Thank you!
