scraping tables with rvest in R

I'm attempting to scrape the table featuring trading data from this website: https://emma.msrb.org/IssuerHomePage/Issuer?id=F5FDC93EE0375953E043151E0A0AA7D0&type=M
This should be a fairly simple process, but when I run this code:
library(rvest)
# The URL must be a single unbroken string
url <- "https://emma.msrb.org/IssuerHomePage/Issuer?id=F5FDC93EE0375953E043151E0A0AA7D0&type=M"
deals <- url %>%
  read_html() %>%
  # XPath attribute tests use @id, not #id
  html_nodes(xpath = '//*[@id="lvTrades"]') %>%
  html_table()
deals <- deals[[1]]
I get the following error:
Error in deals[[1]] : subscript out of bounds
On top of this, the scrape doesn't seem to return any text. Any ideas about what I'm doing wrong? Sorry if this seems a little elementary; I'm relatively new to scraping.
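
The "subscript out of bounds" error means html_table() returned an empty list, which happens whenever html_nodes() matches nothing. One possible cause, stated here as an assumption and not verified against the site, is that the table is filled in by JavaScript after the page loads, so it is absent from the HTML that read_html() receives. A minimal sketch of guarding against an empty match, using two made-up inline HTML snippets in place of the real page (extract_trades is a hypothetical helper name):

```r
library(rvest)

# Made-up stand-in for a page whose table is present in the static HTML
static_page <- read_html(
  '<html><body><table id="lvTrades"><tr><th>Price</th></tr><tr><td>101.5</td></tr></table></body></html>'
)
# Made-up stand-in for a page whose table would only appear after JavaScript runs
dynamic_page <- read_html('<html><body><div id="tradeContainer"></div></body></html>')

# Hypothetical helper: return the first matching table, or NULL when nothing matches
extract_trades <- function(page) {
  nodes <- html_nodes(page, xpath = '//*[@id="lvTrades"]')
  if (length(nodes) == 0) {
    return(NULL)
  }
  html_table(nodes)[[1]]
}

found <- extract_trades(static_page)     # a one-row table
missing <- extract_trades(dynamic_page)  # NULL instead of an error
```

If the helper returns NULL for the real page, the data probably has to come from wherever the page's scripts load it, which the browser's network tab can reveal.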

Related

Rvest and xpath returns misleading information

I am struggling with some scraping issues, using rvest and xpath.
The objective is to scrape the following page
https://www.barchart.com/futures/quotes/BT*0/futures-prices
and to extract the names of the futures
BTF21
BTG21
BTH21
etc for the full list of names.
The xpath for those names seems to be xpath='//a'.
The following code returns nothing relevant, hence my question:
library(rvest)
url <- 'https://www.barchart.com/futures/quotes/BT*0'
valuation_col <- url %>%
  read_html() %>%
  html_nodes(xpath = '//a')
value <- valuation_col %>% html_text()
Any hint on how to proceed would be much appreciated. Thanks in advance!
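
The xpath '//a' matches every link on a page, so the result mixes navigation links with whatever contract names happen to be present. A minimal sketch of narrowing the link text down afterwards, assuming the names follow the pattern of the examples above (BT, one letter, two digits) and using made-up inline HTML in place of the real page:

```r
library(rvest)

# Made-up stand-in for the page; the real markup is not known here
page <- read_html(
  '<html><body>
     <a href="/">Home</a>
     <a href="/quotes/BTF21">BTF21</a>
     <a href="/quotes/BTG21">BTG21</a>
     <a href="/login">Log in</a>
   </body></html>'
)

# Collect the text of every link, trimming surrounding whitespace
link_text <- page %>%
  html_nodes(xpath = '//a') %>%
  html_text(trim = TRUE)

# Keep only entries shaped like BTF21: "BT", one capital letter, two digits
futures_names <- grep("^BT[A-Z][0-9]{2}$", link_text, value = TRUE)
```

This only helps when the names are in the HTML that read_html() downloads. If link_text contains no contract names at all, filtering will not recover them.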

Problem with scraping news headlines in R

I am trying to scrape news headlines in R. Here is the sample code I have written; however, it returns an empty set. Can someone tell me where I am going wrong?
library(tidyverse)
library(stringr)
library(rvest)
news_url1 <- "https://www.washingtonpost.com/newssearch/?query=economy&sort=Relevance&datefilter=All%20Since%202005&startat=0#top"
news_html1 <- read_html(as.character(news_url1))
news_html1 %>% html_nodes(".pb-feed-headline")%>% html_text()
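
An empty result here means no element with the class pb-feed-headline exists in the downloaded HTML. A quick way to tell a wrong selector from missing content is to search the raw source for the class name. A minimal sketch, using made-up inline HTML that imitates a search page whose results arrive later via a script (that the real page behaves this way is an assumption):

```r
library(rvest)

# Made-up stand-in: the results container is empty in the static HTML
page <- read_html(
  '<html><body><div id="search-results"></div><script>/* results loaded later */</script></body></html>'
)

# Same selector as in the question; returns character(0) when nothing matches
headlines <- page %>%
  html_nodes(".pb-feed-headline") %>%
  html_text()

# Does the class name occur anywhere in the downloaded source?
class_in_source <- grepl("pb-feed-headline", as.character(page), fixed = TRUE)
```

If class_in_source is FALSE, no selector can find the headlines in this HTML. If it is TRUE, the selector itself is the thing to adjust.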

How to scrape NBA data?

I want to compare rookies across leagues with stats like Points per game (PPG) and such. ESPN and NBA have great tables to scrape from (as does Basketball-reference), but I just found out that they're not stored in html, so I can't use rvest. For context, I'm trying to scrape tables like this one (from NBA):
https://i.stack.imgur.com/SdKjE.png
I'm trying to learn how to use HTTR and JSON for this, but I'm running into some issues. I followed the answer in this post, but it's not working out for me.
This is what I've tried:
library(httr)
library(jsonlite)
library(magrittr)  # provides %>%
coby.white <- GET('https://www.nba.com/players/coby/white/1629632')
out <- content(coby.white, as = "text") %>%
  fromJSON(flatten = FALSE)
However, I get an error:
Error: lexical error: invalid char in json text.
<!DOCTYPE html><html class="" l
(right here) ------^
Is there an easier way to scrape a table from ESPN or NBA, or is there a solution to this issue?

PPG and other stats come from
https://data.nba.net/prod/v1/2019/players/1629632_profile.json
and player info (e.g. weight, height) comes from
https://www.nba.com/players/active_players.json
So you could use jsonlite to parse them, e.g.:
library(jsonlite)
data <- jsonlite::read_json('https://data.nba.net/prod/v1/2019/players/1629632_profile.json')
You can find these URLs in the browser's network tab when refreshing the page. It looks like you can change the player ID in the URL to get other players' stats for the season.
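
The lexical error in the question arises because the player page returns HTML, which fromJSON() cannot parse. A minimal sketch of checking a response body before parsing it, using jsonlite::validate() on two made-up strings (parse_if_json is a hypothetical helper name, and the JSON fields are invented for illustration, not the real API's structure):

```r
library(jsonlite)

# Hypothetical helper: parse the text only if it is valid JSON, else return NULL
parse_if_json <- function(txt) {
  if (!isTRUE(as.logical(validate(txt)))) {
    return(NULL)
  }
  fromJSON(txt)
}

# Made-up bodies: one shaped like an HTML page, one shaped like a JSON payload
html_body <- '<!DOCTYPE html><html class=""><body>Player page</body></html>'
json_body <- '{"stat": "ppg", "value": 1}'

from_html <- parse_if_json(html_body)  # NULL, no lexical error raised
from_json <- parse_if_json(json_body)  # a named list
```

With httr, the same check could be applied to the text returned by content(response, as = "text") before handing it to fromJSON().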

You actually can scrape this with rvest. Here's an example of scraping White's totals table from Basketball Reference. On Sports Reference's sites, every table other than the first one on the page is wrapped in an HTML comment, so we must extract the comment nodes first and then extract the desired table from them.
library(rvest)
library(dplyr)
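cobywhite <- 'https://www.basketball-reference.com/players/w/whiteco01.html'
totalsdf <- cobywhite %>%
  read_html() %>%
  html_nodes(xpath = '//comment()') %>%  # select every comment node
  html_text() %>%                        # get the markup hidden inside the comments
  paste(collapse = '') %>%               # join it into one string
  read_html() %>%                        # parse that string as HTML
  html_node("#totals") %>%               # pick the totals table
  html_table()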
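
To see why the detour through the comment nodes is needed, here is a small offline sketch of the same technique. The inline HTML and its values are made up for illustration and only imitate a table wrapped in a comment:

```r
library(rvest)

# Made-up stand-in: the table sits inside an HTML comment
page <- read_html(
  '<html><body><div id="all_totals"><!--
     <table id="totals">
       <tr><th>Season</th><th>G</th></tr>
       <tr><td>2000-01</td><td>10</td></tr>
     </table>
   --></div></body></html>'
)

# A direct lookup finds nothing, because commented-out markup is not part of the element tree
direct <- html_nodes(page, "#totals")

# Extract the comment text, re-parse it as HTML, then select the table
hidden <- page %>%
  html_nodes(xpath = '//comment()') %>%
  html_text() %>%
  paste(collapse = '') %>%
  read_html() %>%
  html_node("#totals") %>%
  html_table()
```

Here direct is an empty node set, while hidden is a one-row table with the columns Season and G.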

Web scraping html table in R, but the output remains empty

I am trying to scrape the data in the table at: https://www.flashscore.com/football/france/coupe-de-la-ligue-2005-2006/results/
I wrote the following code, but the result remains empty.
library(rvest)
url <- "https://www.soccer24.com/france/coupe-de-la-ligue-2005-2006/results/"
results <- read_html(url) %>%
  html_nodes(xpath = '/html/body/div[6]/div[1]/div/div[1]/div[2]/div[7]/div[3]/table') %>%
  html_table()
Does anyone know why results is empty and how to scrape this table?
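
An absolute xpath copied from the browser depends on the exact nesting of every parent element, so it matches nothing as soon as the downloaded HTML is nested differently from the rendered page. A minimal sketch comparing an absolute path with a relative one, using made-up inline HTML (whether the real page ships the table in its static HTML at all is not known here):

```r
library(rvest)

# Made-up stand-in: a table nested two divs deep
page <- read_html(
  '<html><body><div><div>
     <table id="results">
       <tr><th>Home</th><th>Away</th></tr>
       <tr><td>Team A</td><td>Team B</td></tr>
     </table>
   </div></div></body></html>'
)

# Absolute path: fails because this document has no sixth div under body
absolute <- html_nodes(page, xpath = '/html/body/div[6]/div[1]/table')

# Relative path: matches any table regardless of how deeply it is nested
relative <- html_nodes(page, xpath = '//table')
results <- html_table(relative)[[1]]
```

If even '//table' returns nothing for the real page, the table is not in the downloaded HTML, and no xpath will find it.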

Web scraping with R using rvest for financial website

I am trying to scrape a data table from the following website using R, but it is not returning any values. I am using SelectorGadget to find the node details.
library(rvest)
url = "http://www.bursamalaysia.com/market/derivatives/prices/"
text <- read_html(url) %>%
  html_nodes("td") %>%
  html_text()
output:
text
character(0)
I would appreciate any kind of help. Thank you!
