Web scraping html table in R, but the output remains empty

I am trying to scrape the data in the table at: https://www.flashscore.com/football/france/coupe-de-la-ligue-2005-2006/results/
I wrote the following code, but the results remain empty.
library(rvest)
url <- "https://www.soccer24.com/france/coupe-de-la-ligue-2005-2006/results/"
results <- read_html(url) %>%
  html_nodes(xpath = '/html/body/div[6]/div[1]/div/div[1]/div[2]/div[7]/div[3]/table') %>%
  html_table()
Does anyone know why results is empty and how to scrape this table?
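A likely explanation (my assumption, not stated in the question): the results table on these sites is rendered by JavaScript in the browser, so it never appears in the static HTML that read_html() downloads. A minimal diagnostic sketch:

library(rvest)

url <- "https://www.soccer24.com/france/coupe-de-la-ligue-2005-2006/results/"
page <- read_html(url)

# Count the <table> nodes actually present in the raw HTML.
# If this prints 0, the table is built client-side and rvest alone
# cannot reach it; a browser-driven tool such as RSelenium, or the
# site's underlying data request, would be needed instead.
length(html_nodes(page, "table"))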

Related

How to scrape data from a website with similar “#” urls in menu tabs using R?

I want to scrape stock data from the other tabs of the following website: http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35178706978554988 but they all share the same URL. When I try to use rvest functions such as read_html(), html_nodes() and html_text(), I can only scrape data from the main tab; switching between tabs gives the same results. I tried the following code, but still couldn't get appropriate results.
Previously I could extract some info, such as "InsCode" and "ZTitad", stored in the page's <script> section using rvest. But because the other tabs' data are not present in the HTML source at all, I don't know how to proceed.
# Scraping libraries
library(rvest)
library(jsonlite)
# Target website
my_url <- "http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=35178706978554988"
pagesource <- read_html(my_url)
content <- pagesource %>% html_node("script") %>% html_text()
data <- fromJSON(content)
Ultimately I want to export the data from the "حقیقی-حقوقی" (individual vs. institutional investors) tab into a data frame to continue my other analysis.
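Because the tab's numbers arrive through background requests rather than in the page HTML, one common approach is to open the browser's developer tools, watch the Network tab while the حقیقی-حقوقی tab loads, and request that endpoint directly. A rough sketch of the idea; the endpoint below is a placeholder, not the site's real API path:

library(jsonlite)

# Hypothetical endpoint: replace with the request URL observed in the
# browser's Network tab when the tab is opened.
endpoint <- "http://www.tsetmc.com/<observed-request-path>"

# If the endpoint returns JSON, fromJSON() can read it straight into R.
tab_data <- fromJSON(endpoint)
str(tab_data)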

Web scraping with R using rvest for financial website

I am trying to scrape a data table from the following website using R, but it is not returning any values. I am using SelectorGadget to get the node details.
library(rvest)
url = "http://www.bursamalaysia.com/market/derivatives/prices/"
text <- read_html(url) %>%
html_nodes("td") %>%
html_text()
output:
text
character(0)
I would appreciate any kind of help. Thank you!
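One possibility (an assumption on my part): the prices table on this page is filled in by JavaScript after the page loads, so the static HTML that read_html() downloads contains no <td> cells at all. A sketch of a browser-driven workaround with RSelenium; the wait time and selectors are illustrative, not tested against the live site:

library(RSelenium)
library(rvest)

# Start a browser that actually executes the page's JavaScript.
rD <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- rD$client

remDr$navigate("http://www.bursamalaysia.com/market/derivatives/prices/")
Sys.sleep(5)  # crude wait for the table to finish rendering

# Hand the rendered HTML back to rvest.
rendered <- read_html(remDr$getPageSource()[[1]])
prices <- rendered %>% html_nodes("table") %>% html_table(fill = TRUE)

remDr$close()
rD$server$stop()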

scraping tables with rvest in R

I'm attempting to scrape the table featuring trading data from this website: https://emma.msrb.org/IssuerHomePage/Issuer?id=F5FDC93EE0375953E043151E0A0AA7D0&type=M
This should be a rather simple process, but I run this code:
library(rvest)
url <- "https://emma.msrb.org/IssuerHomePage/Issuer?
id=F5FDC93EE0375953E043151E0A0AA7D0&type=M"
deals <- url %>%
read_html() %>%
html_nodes(xpath='//*[#id="lvTrades"]') %>%
html_table()
deals <- deals[[1]]
and I get the following error:
Error in deals[[1]] : subscript out of bounds
On top of this, it seems the scrape isn't returning any text. Any ideas on what I'm doing wrong? Sorry if this seems a little elementary, I'm relatively new to this scraping stuff.
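The error itself just means html_table() returned an empty list, so deals[[1]] has nothing to index. A small sketch that makes the failure explicit; my assumption is that the trades table is loaded by JavaScript and therefore absent from the static page:

library(rvest)

url <- "https://emma.msrb.org/IssuerHomePage/Issuer?id=F5FDC93EE0375953E043151E0A0AA7D0&type=M"
deals <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="lvTrades"]') %>%
  html_table()

if (length(deals) == 0) {
  message("No table in the static HTML - the data is probably rendered client-side.")
} else {
  deals <- deals[[1]]
}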

Web Scraping NBA Fantasy Projections - R

There are a number of NBA fantasy projections that I would like to scrape in a more streamlined approach. Currently I use a combination of the IMPORTHTML function in Google Sheets and simple, archaic cut-and-paste.
I use R regularly to scrape other data from the internet; however, I can't manage to get these tables to scrape. The tables I am having trouble with are located at three separate addresses (one table per page):
1) http://www.sportsline.com/nba/player-projections/player-stats/all-players/
2) https://swishanalytics.com/optimus/nba/daily-fantasy-projections
3) http://www.sportingcharts.com/nba/dfs-projections/
For all my other scraping activities I use the rvest and XML packages. Following the same process, I've tried both methods listed below, which produce the outputs shown. I'm sure this has something to do with how the tables are built on these websites, but I haven't been able to find anything that helps.
Method 1
library(XML)
projections1 <- readHTMLTable("http://www.sportsline.com/nba/player-projections/player-stats/all-players/")
projections2 <- readHTMLTable("https://swishanalytics.com/optimus/nba/daily-fantasy-projections")
projections3 <- readHTMLTable("http://www.sportingcharts.com/nba/dfs-projections/")
Output
projections1
named list()
projections2
named list()
Warning message:
XML content does not seem to be XML: 'https://swishanalytics.com/optimus/nba/daily-fantasy-projections'
projections3 - I get the headers of the table but not the content of the table.
Method 2
library(rvest)
URL <- "http://www.sportsline.com/nba/player-projections/player-stats/all-players/"
projections1 <- URL %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table(trim = TRUE, fill = TRUE)
URL <- "https://swishanalytics.com/optimus/nba/daily-fantasy-projections"
projections2 <- URL %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table(trim = TRUE, fill = TRUE)
URL <- "http://www.sportingcharts.com/nba/dfs-projections/"
projections3 <- URL %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table(trim = TRUE, fill = TRUE)
Output
projections1
list()
projections2 - I get the headers of the table but not the content of the table.
projections3 - I get the headers of the table but not the content of the table.
If anybody could point me in the right direction it would be greatly appreciated.
The content of the table is generated by JavaScript, so readHTMLTable and read_html find nothing. You can find the underlying data as shown below.
projections1:
import requests
url = 'http://www.sportsline.com/sportsline-web/service/v1/playerProjections?league=nba&position=all-players&sourceType=FD&game=&page=PS&offset=0&max=25&orderField=&optimal=false&release2Ver=true&auth=3'
r = requests.get(url)
print(r.json())
projections2: view-source:https://swishanalytics.com/optimus/nba/daily-fantasy-projections Line 1181
import requests
url = 'https://swishanalytics.com/optimus/nba/daily-fantasy-projections'
r = requests.get(url)
text = r.text
print(eval(text.split('this.players = ')[1].split(';')[0]))
projections3: view-source Line 918
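For completeness, here is the projections1 idea translated back into R (the language the question uses); this assumes the endpoint quoted above still responds with JSON:

library(jsonlite)

url <- "http://www.sportsline.com/sportsline-web/service/v1/playerProjections?league=nba&position=all-players&sourceType=FD&game=&page=PS&offset=0&max=25&orderField=&optimal=false&release2Ver=true&auth=3"
projections1 <- fromJSON(url)
str(projections1)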

How can I scrape data from this website (multiple webpages) using R?

I am a beginner at scraping data from websites, and I find it difficult to interpret the HTML structure using XML or other packages.
Can anyone help me to download the data from this website?
http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp
It is about investment from China, and the character set is Chinese.
What I've tried so far:
library("rvest")
url <- "http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp"
firm <- url %>%
html() %>%
html_nodes(xpath='//*[#id="Grid1MainLayer"]/table[1]') %>%
html_table()
firm <- firm[[1]] head(firm)
You can try the readHTMLTable function from the XML package, which downloads all the tables on the page and formats each one as a data.frame.
library(XML)
all_tables = readHTMLTable("http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp")
Then, since there is only one table on the page you linked, it should be enough to take the first element:
target_table = all_tables[[1]]
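If the Chinese text comes back garbled, it may help to set the page encoding explicitly; rvest's read_html() accepts an encoding argument. A sketch along those lines, with GBK as a guess at the page's encoding rather than a verified fact:

library(rvest)

url <- "http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp"
firm_tables <- read_html(url, encoding = "GBK") %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)

target_table <- firm_tables[[1]]
head(target_table)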
