Scraping tabular (equity historical) data from the NSE website - r

I'm trying to web-scrape equity historical data from the NSE website:
https://www.nseindia.com/products/content/equities/equities/eq_security.htm
I tried to scrape the data for a company (symbol) named RELIANCE over a date range of the past two weeks and write the contents to a CSV file:
library(rvest)

# query URL for RELIANCE price/volume/deliverable data over the last 15 days
url <- "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol=RELIANCE&segmentLink=3&symbolCount=2&series=ALL&dateRange=15days&fromDate=&toDate=&dataType=PRICEVOLUMEDELIVERABLE"
page_html <- read_html(url)

# pull the <p> nodes and extract their text
data <- html_nodes(page_html, "p")
data <- html_text(data)
write.csv(data, "scrapedData.csv", row.names = FALSE)
It just returns character(0) (empty), so nothing useful is written.
I know the website has an option to download the CSV file, but I want an automated R script for getting the data.
I know other packages such as quantmod exist for getting historical stock data, but I need it from this website because it has useful information such as TTQ, turnover, etc.

Why reinvent the wheel?
You can use the nsepy Python module:
https://github.com/swapniljariwala/nsepy
Similar alternatives exist as well.

You just need something like this (get_history returns a pandas DataFrame, so writing the CSV the question asks for is one more line):
from nsepy import get_history
from datetime import date

# daily history for a symbol over a date range
data = get_history(symbol="SBIN", start=date(2015, 1, 1), end=date(2015, 1, 31))
data.to_csv("scrapedData.csv")

Related

How to use jsonlite to import CMS dataset

I am trying to import a dataset from CMS using an API. My code, however, only returns 1,000 of the 155,262 observations. I don't know what I am doing wrong. Another user posted a similar problem, but regrettably, I still cannot figure it out.
library(jsonlite)
# url for CMS dataset
url <- 'https://data.cms.gov/data-api/v1/dataset/3cc6ad89-5cc0-4071-91e1-2a91aff79975/data?'
# read url and convert to data.frame
document <- fromJSON(url)
This is the link to the dataset's page on CMS: https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-hospital-non-hospital-facilities. I am interested in accessing the POS file for Q4 2021. Thanks for your help.
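The 1,000-row cap looks like the API's default page size, so the full dataset has to be fetched page by page. A minimal sketch, assuming the endpoint accepts size and offset query parameters and a 5,000-row page size (both assumptions; check the CMS data-API documentation):
library(jsonlite)

base_url <- 'https://data.cms.gov/data-api/v1/dataset/3cc6ad89-5cc0-4071-91e1-2a91aff79975/data'
page_size <- 5000   # assumed per-request maximum; adjust if the API rejects it
offset <- 0
pages <- list()

repeat {
  page <- fromJSON(paste0(base_url, '?size=', page_size, '&offset=', offset))
  if (length(page) == 0) break        # an empty page means no rows are left
  pages[[length(pages) + 1]] <- page
  if (nrow(page) < page_size) break   # a short page means this was the last one
  offset <- offset + page_size
}

document <- do.call(rbind, pages)     # one data.frame with all observations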

Scraping data in Excel format from a URL into R

I am trying to download data from the URL
https://migration.iom.int/datasets/europe-%E2%80%94-mixed-migration-flows-europe-quarterly-overview-april-june-2021
On this page the dataset is available as an Excel file, and the download link is https://migration.iom.int/system/tdf/datasets/Q2%202021%20Mixed%20Migration%20Flows%20to%20Europe%20%28April%20-%20June%202021%29.xlsx?file=1&type=node&id=12261
I want to download all this data in Excel format directly into R.
library(rvest)
URL <- "https://migration.iom.int/system/tdf/datasets/Q2%202021%20Mixed%20Migration%20Flows%20to%20Europe%20%28April%20-%20June%202021%29.xlsx?file=1&type=node&id=12261"
pg <- read_html(URL)
html_attr(html_nodes(pg, "download"), "href")
But I made some mistake and the download does not happen. Can anybody help me download this data into R?
I personally would go about it in the following way:
download the data to a specified destination, then read the Excel file from that location. An idea would be:
download.file(url, destinationFile, mode = "wb")   # "wb" keeps the binary .xlsx intact
fileInR <- readxl::read_excel(destinationFile)     # read.table cannot parse .xlsx files
However, a simple Google search for both steps (downloading a file and reading an Excel file in R) should provide you with plenty more options.
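Applied to the URL from the question, that might look like the following (the local file name is arbitrary):
library(readxl)

url <- "https://migration.iom.int/system/tdf/datasets/Q2%202021%20Mixed%20Migration%20Flows%20to%20Europe%20%28April%20-%20June%202021%29.xlsx?file=1&type=node&id=12261"
destinationFile <- "mixed_migration_q2_2021.xlsx"  # arbitrary local path

download.file(url, destinationFile, mode = "wb")
fileInR <- read_excel(destinationFile)
head(fileInR)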

Download US census files from the web using R

I am trying to download all 1980 US Census files from the URL https://www2.census.gov/census_1980/ and store them on my computer using R.
I have already tried download.file and the downloader package, but the usual commands download only one file, and without the right format.
Is there an easy way to download all the files (including subfolders, etc.) at once in R?
You can check whether the data you are interested in is available in FRED (U.S. Census Bureau source): https://fred.stlouisfed.org/source?soid=19
If you are interested in something specific, it is easy to get the data with:
# install.packages("quantmod")
library(quantmod)

retail_sales_total <- getSymbols('MRTSSM44X72USS', src = 'FRED', auto.assign = FALSE)
But if you want to get all the files, it is possible with the xml2 and rvest packages:
library(rvest)

# read the HTML of the directory listing
URL <- "https://www2.census.gov/census_1980/"
page <- read_html(URL)

# extract the href attribute of every <a> tag to get all the download links
links <- html_attr(html_nodes(page, "a"), "href")
and then download them all in a loop, as sketched below.
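A minimal sketch of that loop, assuming the listing exposes plain relative links (an Apache-style index); entries ending in / are subfolders and would need a recursive version of the same idea:
# drop navigation entries such as the parent-directory link
files <- links[!links %in% c("../", "/")]

for (f in files) {
  if (grepl("/$", f)) next   # skip subfolders; recurse here if you need their contents
  download.file(paste0(URL, f), destfile = basename(f), mode = "wb")
}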

Downloading/scraping a table from htmlTable

I am trying to download a csv from
https://oceanwatch.pifsc.noaa.gov/erddap/griddap/goes-poes-1d-ghrsst-RAN.html
or I am trying to scrape the data from the HTML table output from the website, found here
https://oceanwatch.pifsc.noaa.gov/erddap/griddap/goes-poes-1d-ghrsst-RAN.htmlTable?analysed_sst[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)],analysis_error[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)],mask[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)],sea_ice_fraction[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)]
I have tried to scrape the data using
library(rvest)

url <- read_html("https://oceanwatch.pifsc.noaa.gov/erddap/griddap/goes-poes-1d-ghrsst-RAN.htmlTable?analysed_sst[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-7):1:(42)][(179):1:(238)],analysis_error[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-7):1:(42)][(179):1:(238)],mask[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-7):1:(42)][(179):1:(238)],sea_ice_fraction[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-7):1:(42)][(179):1:(238)]")

test <- url %>%
  html_nodes(xpath = 'table.erd.commonBGColor.nowrap') %>%
  html_text()
And I have tried to download a CSV with
download.file(url, destfile = "~/Documents/test.csv", mode = 'wb')
But neither worked. The download.file call downloaded a CSV containing only the node description, and the rvest approach gave me a huge character string on my MacBook and a NULL data frame on my Windows machine. I have also tried to use SelectorGadget (a Chrome extension) to target only the data I need, but SelectorGadget does not seem to work on the htmlTable output.
I managed to find a workaround using the htmltab package; not sure if it's optimal, though, as it's a big data frame for a web page and took a while to load. //table[2] selects the actual data table, as there are two HTML tables in the link you've given.
library(htmltab)

url1 <- "https://oceanwatch.pifsc.noaa.gov/erddap/griddap/goes-poes-1d-ghrsst-RAN.htmlTable?analysed_sst[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)],analysis_error[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)],mask[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)],sea_ice_fraction[(2019-02-09T12:00:00Z):1:(2019-02-09T12:00:00Z)][(-6.975):1:(42.025)][(179.025):1:(238.025)]"
tbls <- htmltab(url1, which = "//table[2]")
rdf <- as.data.frame(tbls)
Let me know if it helps.
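As an aside, ERDDAP servers typically serve the same query as plain CSV if you swap the .htmlTable extension for .csv, which avoids HTML parsing entirely. A sketch, assuming this server supports the .csv response type:
# same query, .csv response instead of .htmlTable
csv_url <- sub(".htmlTable?", ".csv?", url1, fixed = TRUE)

raw <- readLines(csv_url)
# ERDDAP CSV output puts a units row directly under the header; drop it
rdf2 <- read.csv(text = raw[-2], stringsAsFactors = FALSE)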

How can I scrape data from this website (multiple webpages) using R?

I am a beginner at scraping data from websites. I find it difficult to interpret the structure of the HTML using XML or other packages.
Can anyone help me download the data from this website?
http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp
It is about investment from China. The text is in Chinese.
What I've tried so far:
library("rvest")
url <- "http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp"
firm <- url %>%
html() %>%
html_nodes(xpath='//*[#id="Grid1MainLayer"]/table[1]') %>%
html_table()
firm <- firm[[1]] head(firm)
You can try the readHTMLTable function from the XML package, which downloads all the tables on the page and formats each one as a data.frame.
library(XML)

all_tables = readHTMLTable("http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp")
Then, since there is only one table on the page you linked, it should be enough to take the first element:
target_table = all_tables[[1]]
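If the Chinese characters come back garbled, you may need an explicit encoding; readHTMLTable forwards it to the underlying HTML parser. UTF-8 is an assumption here, so check the page's declared charset:
all_tables = readHTMLTable(
  "http://wszw.hzs.mofcom.gov.cn/fecp/fem/corp/fem_cert_stat_view_list.jsp",
  encoding = "UTF-8"   # assumption: match the page's <meta charset>
)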
