Customizing URLs in R

I am a day trader based in India. I am using R to do my research. I want to download the End of Day (EOD) stock prices for different stocks. I was using Quandl and quantmod but was not satisfied with them (they are OK for historical data but not for EOD quotes). After much research I found out that the EOD prices for the NSE (National Stock Exchange of India) can be found in the so-called "bhav copy" that can be downloaded daily from its website. The URL for 30th April 2018 is:
https://www.nseindia.com/content/historical/EQUITIES/2018/APR/cm30APR2018bhav.csv.zip
I have two questions:
1) If I type this into the address bar of Google Chrome and hit enter, it throws up a pop-up window asking where to store the csv file. How do I automate this in R? If I just pass the URL as an argument to read.csv, will that suffice?
2) The bhav copy is updated daily, so I want to write an R function that automates the download. But the URL changes daily (the above URL is only for 30th April 2018). The function will take the current date as an argument. How can I create a one-to-one mapping between a date and the URL for that particular date? In other words, the URL for date dt is:
https://www.nseindia.com/content/historical/EQUITIES/2018/APR/cmdtAPR2018bhav.csv.zip
The R function f(dt) should create the URL for that particular date and download the csv file.
Very many thanks for your time and effort....

download.file(url, destfile) should be what you need to download the data from the URL in R. Then you can use read.csv, though judging by the URL you provided you may need unzip() before processing it.
If you feel like it, you can use fread from the data.table package to read a URL directly, but since this is a zip file the first option is probably better for you.
As for building the URL from the date, the lubridate package (or base format()) will be handy for formatting dates.
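A minimal sketch of such a function (hypothetical name get_bhav), assuming the URL pattern above holds for every trading day, that the session locale gives English month abbreviations, and that the zip contains a single csv; NSE's site may also insist on browser-like request headers:
get_bhav <- function(dt) {
  # build e.g. "30APR2018" and "APR" from the date
  stamp <- toupper(format(dt, "%d%b%Y"))
  url <- sprintf("https://www.nseindia.com/content/historical/EQUITIES/%s/%s/cm%sbhav.csv.zip",
                 format(dt, "%Y"), toupper(format(dt, "%b")), stamp)
  tmp <- tempfile(fileext = ".zip")
  download.file(url, destfile = tmp, mode = "wb")   # mode "wb" for a binary zip
  read.csv(unzip(tmp, exdir = tempdir())[1])        # unzip, then read the csv
}
bhav <- get_bhav(as.Date("2018-04-30"))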

Package nser solves your problem.
To download and read today's bhavcopy, use bhavtoday:
library(nser)
bhavtoday
To download and read a historical bhavcopy of the Equity segment:
bhav("30042018")
And the bhavcopy of the F&O segment:
fobhav("30042018")
You can also use RSelenium to download the bhavcopy zip file via the function bhavs.
Package link: https://cloud.r-project.org/web/packages/nser/index.html

Related

Downloading MCX data using R

I'm trying to download data on margin requirements from the MCX website using R. However, I am unable to work out the appropriate URL to use in order to download this data.
The link is here.
Files for different dates have seemingly different URLs, for instance:
DailyMargin_20170919223427.csv
DailyMargin_20170919223104.csv
DailyMargin_20170919223039.csv
They seem to be of the form
DailyMargin_2017091922****.csv
(20170919 is the date on which I'm trying to download the data)
My code has the line:
myURL = paste("https://www.mcxindia.com/market-operations/clearing-settlement/daily-margin", "DailyMargin_2017091922","****", ".csv", sep = "")
The **** part seems to be random.
From what I can tell, the remaining **** appears to be a timestamp from when the data are created by the web page using JavaScript. You will probably not be able to download the data directly, as the file will not exist until it is created. That said, you might be able to use a package like rvest to do the scraping for you.
https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/
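A rough sketch with rvest, assuming the margin-file links are present in the page's static HTML (if they really are injected by JavaScript, as suspected above, you would need something like RSelenium instead):
library(rvest)
page  <- read_html("https://www.mcxindia.com/market-operations/clearing-settlement/daily-margin")
links <- html_attr(html_nodes(page, "a"), "href")
grep("DailyMargin_20170919", links, value = TRUE)  # candidate files for that date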

Downloading NetCDF files with R

I am a beginner in R.
I am trying to download NetCDF4 files from the NASA subset wizard (https://disc.gsfc.nasa.gov/SSW/#keywords=TRMM_3B42RT_Daily); in my case, I am looking for TRMM 3B42 precipitation data for South Africa. I will need to download thousands of datasets and work with them in raster format in R, and therefore want to use the URLs provided by the subset wizard, e.g.:
http://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_RT/TRMM_3B42RT_Daily.7/2016/10/3B42RT_Daily.20161001.7.nc4.nc4?precipitation[777:867][99:173],precipitation_cnt[777:867][99:173],uncal_precipitation_cnt[777:867][99:173],lat[99:173],lon[777:867]
I have tried
url1 <- "http://.."
dest <- "C:\\Users\\User\\Documents\\3_UP\\2016_Masters\\Dissertation\\Satellite Data\\TRMM Precipitation\\TRMM 3B42 Daily RT\\Try.nc4.nc4"
download.file(url = url1, destfile = dest, mode = "wb")
And here I receive an error message "cannot open URL" and additional warning messages ending with "HTTP status was '401 Unauthorized'", which led me to suspect that the browser needed some login details. In Chrome the URL works as is; in Internet Explorer (R's default) I typed in my username and password once and after that the URL also works as is.
However, I have had no success with R functions; the errors remain. I have tried other R packages and functions, but this is my first time trying something like this and I am not seeing the light yet (e.g. most RCurl functions report illegal characters in the URL).
Does anyone have more experience in working with these specific files ?
best regards
Marion

Download .txt from URL string in R

I am working with the edgar package in R to download Apple's 2005 annual report. This is my code to get that far:
library(edgar)
getMasterIndex(2005)
aapl<-getFilings(2005,320193, '10-K')
This was my output when I did that:
> aapl<-getFilings(2005,320193, '10-K')
Total number of filings to be downloaded=1. Do you want to download (yes/no)? yes
> aapl
Link Status
1 https://www.sec.gov/Archives/edgar/data/320193/0001104659-05-058421.txt
Download success
To me this looks like I just retrieved the URL to this particular document; I did not actually download the text file.
My next step, I imagine, would be to download the file based on the URL. I thought doing a download.file using aapl as my URL argument would work, but I must be missing something.
Thoughts on how to download the full document based on the URL? Thank you.
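A sketch of that next step, pulling the URL out of the Link column shown above and fetching it with download.file (assuming the SEC serves the .txt directly; the destination filename is arbitrary):
link <- as.character(aapl$Link[1])
download.file(link, destfile = "aapl_10K_2005.txt")
doc  <- readLines("aapl_10K_2005.txt")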

r download url file with partial name

I am programming in R. I need to download a set of files from an http address. The naming format of the files refers to a date/time period but also contains additional numbering that is not recognizable. For example, for the file below the first set of numbers refers to the date 2014/10/24 at 05:10 am, but the second batch of numbers is not recognizable. All files on the webpage follow this standard format.
http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/PUBLIC_MCCDISPATCH_201410240510_0000000258279329.zip
My question is: How do I download the file with only partial name information?
For example, if I wanted to download the file relating to the 6:30 time period, I know that the URL prefix is as below, but would not know the numbers that follow after: http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/PUBLIC_MCCDISPATCH_201410240630_??????????????.zip
You're actually in luck, because the server exposes a directory listing. Essentially, you have to download the list of links and then grep them. Here's how you would go about doing that.
library(XML)
url <- "http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/"
parsed <- htmlParse(url)
links <- xpathSApply(parsed, "//a/@href")
Now you have a list of URLs that you can search through and choose the one that's appropriate.
Hint: grep("pattern", links)
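A possible continuation, assuming the hrefs in the listing are site-absolute paths ("/REPORTS/..."); adjust the paste0() if they turn out to be relative or fully qualified:
target <- grep("PUBLIC_MCCDISPATCH_201410240630_", links, value = TRUE)
download.file(paste0("http://www.nemweb.com.au", target),
              destfile = basename(target), mode = "wb")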

Obtaining twitter API timestamp from string ID

I recently downloaded a small (230k) dataset of tweets using the streamR package in R. I saved the workspace, quit R, and today began trying to use the information, but the timestamp in ALL of the tweets (the created_at column of the data frame that streamR creates) shows the time when I restarted R and loaded the workspace... How can this be? Is the timestamp dynamic, or dependent on when the file was saved?
That being the case, is there any way to look up a specific string_id and get back its timestamp using streamR? I could create a loop and fix the issue that way, since this information is VERY time sensitive.
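For what it's worth, a sketch of recovering creation times directly from the IDs themselves, assuming they are standard Twitter "Snowflake" IDs (these encode milliseconds since the Twitter epoch in the bits above the low 22; IDs from before late 2010 predate the scheme):
id_to_time <- function(id_str) {
  # doubles lose the low bits of an 18-digit ID, but the division discards them anyway
  ms <- floor(as.numeric(id_str) / 2^22) + 1288834974657  # Twitter epoch in ms
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}
id_to_time("504992188115046400")  # hypothetical ID, roughly Aug 2014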
