available.packages by publication date - r

Is it possible to get the publication date of CRAN packages from within R? I would like to get a list of the k most recently published CRAN packages, or alternatively all packages published after date dd-mm-yy, similar to the information on the available_packages_by_date.html page.
The available.packages() command has a "fields" argument, but this only extracts fields from the DESCRIPTION file, and the Date field in the package DESCRIPTION is not always up to date.
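For reference, this is what the "fields" argument gives (the Date column comes straight from each package's DESCRIPTION, so it can be missing or stale):
# pull the self-reported Date field from each DESCRIPTION (often NA or outdated)
ap <- available.packages(fields = "Date")
head(ap[, c("Package", "Version", "Date")])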
I could get it with a smart regex from the HTML page, but I am not sure how reliable and up-to-date this HTML file is... At some point Kurt might decide to give the layout a makeover, which would break the script. An alternative is to use timestamps from the CRAN FTP, but I am also not sure how good that solution is. Is there a formally structured file somewhere with publication dates? I assume the HTML page is automatically generated from some DB.

Turns out there is an undocumented file "packages.rds" which contains the publication dates (not times) of all packages. I suppose these data are used to regenerate the HTML file every day.
Below is a simple function that extracts publication dates from this file:
recent.packages.rds <- function(){
  mytemp <- tempfile()
  download.file("http://cran.r-project.org/web/packages/packages.rds", mytemp)
  mydata <- as.data.frame(readRDS(mytemp), row.names = NA)
  mydata$Published <- as.Date(mydata[["Published"]])
  # sort by publication date and keep the fields you like:
  mydata <- mydata[order(mydata$Published), c("Package", "Version", "Published")]
  mydata
}
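For example, to list the ten most recently published packages (assuming packages.rds is still served at that URL):
pkgs <- recent.packages.rds()
tail(pkgs, 10)   # the ten most recent publication dates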

The best approach is to take advantage of the fact that the package DESCRIPTION is published on the CRAN mirror, and since the DESCRIPTION comes from the built package, it contains information about exactly when it was packaged:
pkgs <- unname(available.packages()[, 1])[1:20]
desc_urls <- paste("http://cran.r-project.org/web/packages/", pkgs, "/DESCRIPTION", sep = "")
desc <- lapply(desc_urls, function(x) read.dcf(url(x)))
sapply(desc, function(x) x[, "Packaged"])
sapply(desc, function(x) x[, "Date/Publication"])
(I'm restricting it to the first 20 packages here to illustrate the basic idea)
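To turn those fields into something sortable, a minimal sketch that collects the publication dates into a data frame (using only the pkgs and desc objects defined above):
# combine package names with their publication dates and sort, newest first
pub <- data.frame(
  package   = pkgs,
  published = as.Date(sapply(desc, function(x) x[, "Date/Publication"]))
)
pub[order(pub$published, decreasing = TRUE), ]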

Here is a function that uses the HTML and regular expressions. I would still rather get the information from a more formal place, though, in case the HTML layout ever changes.
recent.packages <- function(number = 10){
  # the HTML is malformed, so only read as many lines as we need
  maxlines <- number * 2 + 11
  mytemp <- tempfile()
  repo <- getOption("repos")["CRAN"]
  if(is.na(repo) || repo == "@CRAN@"){
    repo <- "http://cran.r-project.org"
  }
  newurl <- paste(repo, "/web/packages/available_packages_by_date.html", sep = "")
  download.file(newurl, mytemp)
  datastring <- readLines(mytemp, n = maxlines)[12:maxlines]
  # we only find packages from after 2010-01-01
  myexpr1 <- '201[0-9]-[0-9]{2}-[0-9]{2} </td> <td> <a href="../../web/packages/[a-zA-Z0-9\\.]{2,}/'
  myexpr2 <- '^201[0-9]-[0-9]{2}-[0-9]{2}'
  myexpr3 <- '[a-zA-Z0-9\\.]{2,}/$'
  newpackages <- unlist(regmatches(datastring, gregexpr(myexpr1, datastring)))
  newdates <- unlist(regmatches(newpackages, gregexpr(myexpr2, newpackages)))
  newnames <- unlist(regmatches(newpackages, gregexpr(myexpr3, newpackages)))
  newdates <- as.Date(newdates)
  newnames <- substring(newnames, 1, nchar(newnames) - 1)
  returndata <- data.frame(name = newnames, date = newdates)
  head(returndata, number)
}

So here is a solution that uses the directory listing from the CRAN FTP. It is a little tricky because the FTP gives the date in the Linux listing format, with either a timestamp or a year. Other than that it does its job. I'm still not convinced this is reliable though: if packages are copied over to another server, all timestamps might be reset.
recent.packages.ftp <- function(){
  setwd(tempdir())
  download.file("ftp://cran.r-project.org/pub/R/src/contrib/", destfile = tempfile(),
                method = "wget", extra = "--no-htmlify")
  # because of --no-htmlify the destfile argument does not work; wget writes .listing
  datastring <- readLines(".listing")
  unlink(".listing")
  myexpr1 <- "(?<date>[A-Z][a-z]{2} [0-9]{2} [0-9]{2}:[0-9]{2}) (?<name>[a-zA-Z0-9\\.]{2,})_(?<version>[0-9\\.-]*).tar.gz$"
  matches <- gregexpr(myexpr1, datastring, perl = TRUE)
  packagelines <- as.logical(sapply(regmatches(datastring, matches), length))
  # subset the lines that describe package tarballs
  matches <- matches[packagelines]
  datastring <- datastring[packagelines]
  N <- length(matches)
  # from the ?regexpr manual
  parse.one <- function(res, result) {
    m <- do.call(rbind, lapply(seq_along(res), function(i) {
      if(result[i] == -1) return("")
      st <- attr(result, "capture.start")[i, ]
      substring(res[i], st, st + attr(result, "capture.length")[i, ] - 1)
    }))
    colnames(m) <- attr(result, "capture.names")
    m
  }
  # parse all records
  mydf <- data.frame(date = rep(NA, N), name = rep(NA, N), version = rep(NA, N))
  for(i in 1:N){
    mydf[i, ] <- parse.one(datastring[i], matches[[i]])
  }
  row.names(mydf) <- NULL
  # convert dates
  mydf$date <- strptime(mydf$date, format = "%b %d %H:%M")
  # The listing only shows a time (instead of the year) for files less than six months old,
  # and strptime assumes the current year for those entries. Therefore, for dates that end
  # up in the future, subtract a year. Some margin is left for timezones.
  infuture <- (mydf$date > Sys.time() + 31*24*60*60)
  mydf$date[infuture] <- mydf$date[infuture] - 365*24*60*60
  # sort and return
  mydf <- mydf[order(mydf$date), ]
  row.names(mydf) <- NULL
  mydf
}

You could process the page http://cran.r-project.org/src/contrib/ and split the fields by whitespace to obtain the fully specified package source filename, which includes the version number and the .tar.gz suffix.
There are a few other items in the list that are not package files, such as the .rds files, various subdirectories, and so on.
Barring changes in how the directory structure is presented or the locations of the files, I can't think of anything more authoritative than this.
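A rough sketch of that approach (just an illustration; it assumes the index page can be read line by line and parsed with a regular expression):
# read the directory index and keep only the source tarball names
listing  <- readLines("http://cran.r-project.org/src/contrib/")
hits     <- regmatches(listing, regexpr("[A-Za-z0-9\\.]+_[0-9\\.-]+\\.tar\\.gz", listing))
tarballs <- unique(hits)
# split "pkg_1.0-2.tar.gz" into package name and version
parts <- strsplit(sub("\\.tar\\.gz$", "", tarballs), "_")
pkgs  <- data.frame(name = sapply(parts, `[`, 1), version = sapply(parts, `[`, 2))
head(pkgs)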

Related

Downloading Yahoo stock prices in R - date setting

I am downloading data on stocks from Yahoo Finance with the tseries package. The issue is that I am not getting the most recent date: the last price is always from two days ago.
Below is my code; can you please advise what I should correct to get all the available prices?
Thank you!
`dir <- "D:/Yahoo stock prices" #location
setwd(dir)
# Packages needed
require(tseries)
require(zoo)
YH <- read.csv2(file="SBI.csv",header=T, sep=";", dec=".")
date <- "2012-09-20"
penny_stocks <- c("SMDS.L", "MNDI.L", "SKG.L")
prices <- NULL
for(i in 1:length(YH[,1])){
prices <- try(get.hist.quote(as.character(YH[i,1]),
start=date,
quote='Open'
)
,silent=TRUE
)
if(!is.character(prices)){
if(as.character(YH[i,1]) %in% penny_stocks) prices <- prices / 100
prices <- as.data.frame(prices)
prices <- cbind(rownames(prices),prices)
colnames(prices) <- c("date",as.character(YH[i,1]))
if(length(prices) > 1){
if(i == 1){
allprices <- prices
names <- c("date",as.character(YH[i,1]))
} else {
names <- append(colnames(allprices),as.character(YH[i,1]))
allprices <- merge(allprices,prices,by ="date", all.x = TRUE)
colnames(allprices) <- names
}
}
}
}
write.csv2(allprices,"Prices 200511.csv")
warnings()
`
At the moment of writing, the data available on the Yahoo site runs up to 2020-05-12. You need to specify the end date, because by default tseries defines the end as Sys.Date() - 1. So using tseries::get.hist.quote("SMDS.L", end = Sys.Date(), quote = "Open") will return the data up to 2020-05-12. You would expect the default to be good enough, but there are a lot of issues with the Yahoo data and getting the correct last records if the data is not located in the US. There is probably a process in place that loads the data a day after closing.
Note that the default settings of tseries::get.hist.quote are slightly different from the defaults of the underlying call to quantmod::getSymbols: tseries uses Sys.Date() - 1 as the end date, quantmod uses Sys.Date(). The start dates also differ: tseries uses "1991-01-02", quantmod uses "2007-01-01".
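A small sketch that makes both dates explicit, so you are not relying on either package's defaults:
library(tseries)
# request prices up to today explicitly; the last row still depends on when Yahoo loads the data
p <- get.hist.quote("SMDS.L", start = Sys.Date() - 30, end = Sys.Date(), quote = "Open")
tail(p)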

List organization by file name in R

I'm trying to create a list of data separated by month and year (40 years worth). The data currently has the name structure (Year)-(Numeric Month)-(Var).nc. I'd like to get all the data into its appropriate list created below. Not exactly sure how to proceed from here. Any guidance is appreciated.
files_nc <- list.files(pattern = ".nc")
year <- vector("list", length = 40)
month <- vector("list", length = 12)
names(year) <- c(1978:2017)
names(month) <- c("Jan","Feb","Mar","Apr","May","Jun","Jul",
                  "Aug","Sep","Oct","Nov","Dec")
for (i in 1:40) {
  year[[i]] <- month
}
It's not entirely clear what you're asking for, but I believe this should work. I'm assuming you're loading in a list of files, and each file is associated with a year and month.
file_names <- as.list(files_nc)
file_names_split <- lapply(file_names, function(x) strsplit(x, "-"))
for(i in 1:length(file_names_split)) {
  y <- which(names(year) == file_names_split[[i]][[1]][1])
  m <- as.numeric(file_names_split[[i]][[1]][2])
  year[[y]][m] <- files_nc[[i]]
}
In general, this method should work. If it works I'd take the time to rewrite the for loop as an apply statement.
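A possible sketch of that apply-based rewrite (my own, assuming the (Year)-(Numeric Month)-(Var).nc naming and the year/month lists defined in the question):
# split each file name once, then slot each year's files into the right months
parts <- strsplit(files_nc, "-")                # e.g. "1978" "01" "Var.nc"
yrs   <- sapply(parts, `[`, 1)
mons  <- as.numeric(sapply(parts, `[`, 2))
year <- lapply(setNames(names(year), names(year)), function(y) {
  out <- month
  idx <- which(yrs == y)
  out[mons[idx]] <- files_nc[idx]
  out
})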

Web scraping of key stats in Yahoo! Finance with R

Is anyone experienced in scraping data from the Yahoo! Finance key statistics page with R? I am familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, this web page (MSFT key stats) is a bit complicated, and I am not sure whether all the stats are kept in XHR, JS, or Doc. I am guessing the data is stored in JSON. If anyone knows a good way to extract and parse the data for this web page with R, kindly answer my question. Great thanks in advance!
Or if there is a more convenient way to extract these metrics via quantmod or Quandl, kindly let me know; that would be an extremely good solution!
I know this is an older thread, but I used it to scrape Yahoo Analyst tables so I figure I would share.
# Yahoo webscrape: Analysts tables
library(XML)
symbol <- "HD"
url <- paste0('https://finance.yahoo.com/quote/', symbol, '/analysts?p=', symbol)
webpage <- readLines(url)
html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
tableNodes <- getNodeSet(html, "//table")
earningEstimates <- readHTMLTable(tableNodes[[1]])
revenueEstimates <- readHTMLTable(tableNodes[[2]])
earningHistory <- readHTMLTable(tableNodes[[3]])
epsTrend <- readHTMLTable(tableNodes[[4]])
epsRevisions <- readHTMLTable(tableNodes[[5]])
growthEst <- readHTMLTable(tableNodes[[6]])
Cheers,
Sody
I gave up on Excel a long time ago. R is definitely the way to go for things like this.
library(XML)
stocks <- c("AXP", "BA", "CAT", "CSCO")
for (s in stocks) {
  url <- paste0("http://finviz.com/quote.ashx?t=", s)
  webpage <- readLines(url)
  html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
  tableNodes <- getNodeSet(html, "//table")
  # ASSIGN TO STOCK NAMED DFS
  assign(s, readHTMLTable(tableNodes[[9]],
                          header = c("data1", "data2", "data3", "data4", "data5", "data6",
                                     "data7", "data8", "data9", "data10", "data11", "data12")))
  # ADD COLUMN TO IDENTIFY STOCK
  df <- get(s)
  df['stock'] <- s
  assign(s, df)
}
# COMBINE ALL STOCK DATA
stockdatalist <- cbind(mget(stocks))
stockdata <- do.call(rbind, stockdatalist)
# MOVE STOCK ID TO FIRST COLUMN
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]
# SAVE TO CSV
write.table(stockdata, "C:/Users/your_path_here/Desktop/MyData.csv", sep = ",",
            row.names = FALSE, col.names = FALSE)
# REMOVE TEMP OBJECTS
rm(df, stockdatalist)
When I use the methods shown here with the XML library, I get a warning:
Warning in readLines(page) : incomplete final line found on
'https://finance.yahoo.com/quote/DIS/key-statistics?p=DIS'
We can use rvest and xml2 for a cleaner approach. This example demonstrates how to pull a key statistic from the key-statistics Yahoo! Finance page. Here I want to obtain the float of an equity. I don't believe float is available from quantmod, but some of the key stats values are. You'll have to reference the list.
library(xml2)
library(rvest)
getFloat <- function(stock){
  url <- paste0("https://finance.yahoo.com/quote/", stock, "/key-statistics?p=", stock)
  tables <- read_html(url) %>%
    html_nodes("table") %>%
    html_table()
  float <- as.vector(tables[[3]][4, 2])
  # the last character encodes the unit (k, M or B)
  last <- substr(float, nchar(float), nchar(float))
  float <- gsub("[a-zA-Z]", "", float)
  float <- as.numeric(as.character(float))
  if(last == "k"){
    float <- float * 1000
  } else if (last == "M") {
    float <- float * 1000000
  } else if (last == "B") {
    float <- float * 1000000000
  }
  return(float)
}
getFloat("DIS")
[1] 1.81e+09
That's a lot of shares of Disney available.
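For the stats that quantmod does expose, a minimal sketch using getQuote() with yahooQF() (the field names below are my guesses from ?yahooQF; whether Yahoo still serves a given field can vary):
library(quantmod)
# see ?yahooQF for the full list of available quote fields
getQuote("MSFT", what = yahooQF(c("Market Capitalization", "Earnings/Share", "P/E Ratio")))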

Importing data into R from google spreadsheet

There seems to be a change in the Google spreadsheet publishing options. It is no longer possible to publish to the web as a CSV or tab file (see this recent post). Thus the usual way of using RCurl to import data into R from a Google spreadsheet does not work anymore:
require(RCurl)
u <- "https://docs.google.com/spreadsheet/pub?hl=en_GB&hl=en_GB&key=0AmFzIcfgCzGFdHQ0eEU0MWZWV200RjgtTXVMY1NoQVE&single=true&gid=4&output=csv"
tc <- getURL(u, ssl.verifypeer=FALSE)
net <- read.csv(textConnection(tc))
Does anyone have a work-around?
I just wrote a simple package to solve exactly this problem: downloading a Google sheet using just the URL.
install.packages('gsheet')
library(gsheet)
gsheet2tbl('docs.google.com/spreadsheets/d/1I9mJsS5QnXF2TNNntTy-HrcdHmIF9wJ8ONYvEJTXSNo')
More detail is here: https://github.com/maxconway/gsheet
Use the googlesheets4 package, a Google Sheets R API by Jenny Bryan. It is the best way to analyze and edit Google Sheets data in R. Not only can it pull data from Google Sheets, but you can edit the data in Google Sheets, create new sheets, etc.
The package can be installed with install.packages("googlesheets4").
There's a vignette for getting started; see her GitHub repository for more. And you also can install the latest development version of the package from that GitHub page, if desired.
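A minimal sketch for reading a public sheet (gs4_deauth() skips authentication for sheets shared with "anyone with the link"; the URL below is a placeholder):
library(googlesheets4)
gs4_deauth()                      # read-only access to a public sheet, no login needed
dat <- read_sheet("https://docs.google.com/spreadsheets/d/<sheet-id>")
head(dat)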
I am working on a solution for this. Here is a function that works on your data as well as a few of my own Google Spreadsheets.
First, we need a function to read from Google sheets. readGoogleSheet() will return a list of data frames, one for each table found on the Google sheet:
readGoogleSheet <- function(url, na.string = "", header = TRUE){
  stopifnot(require(XML))
  # Suppress warnings because Google docs seems to have an incomplete final line
  suppressWarnings({
    doc <- paste(readLines(url), collapse = " ")
  })
  if(nchar(doc) == 0) stop("No content found")
  htmlTable <- gsub("^.*?(<table.*</table).*$", "\\1>", doc)
  ret <- readHTMLTable(htmlTable, header = header, stringsAsFactors = FALSE, as.data.frame = TRUE)
  lapply(ret, function(x){ x[x == na.string] <- NA; x })
}
Next, we need a function to clean the individual tables. cleanGoogleTable() removes empty lines inserted by Google, removes the row names (if they exist) and allows you to skip empty lines before the table starts:
cleanGoogleTable <- function(dat, table = 1, skip = 0, ncols = NA, nrows = -1, header = TRUE, dropFirstCol = NA){
  if(!is.data.frame(dat)){
    dat <- dat[[table]]
  }
  if(is.na(dropFirstCol)) {
    firstCol <- na.omit(dat[[1]])
    if(all(firstCol == ".") || all(firstCol == as.character(seq_along(firstCol)))) {
      dat <- dat[, -1]
    }
  } else if(dropFirstCol) {
    dat <- dat[, -1]
  }
  if(skip > 0){
    dat <- dat[-seq_len(skip), ]
  }
  if(nrow(dat) == 1) return(dat)
  if(nrow(dat) >= 2){
    if(all(is.na(dat[2, ]))) dat <- dat[-2, ]
  }
  if(header && nrow(dat) > 1){
    header <- as.character(dat[1, ])
    names(dat) <- header
    dat <- dat[-1, ]
  }
  # Keep only desired columns
  if(!is.na(ncols)){
    ncols <- min(ncols, ncol(dat))
    dat <- dat[, seq_len(ncols)]
  }
  # Keep only desired rows
  if(nrows > 0){
    nrows <- min(nrows, nrow(dat))
    dat <- dat[seq_len(nrows), ]
  }
  # Rename rows
  rownames(dat) <- seq_len(nrow(dat))
  dat
}
Now we are ready to read your Google sheet:
> u <- "https://docs.google.com/spreadsheets/d/0AmFzIcfgCzGFdHQ0eEU0MWZWV200RjgtTXVMY1NoQVE/pubhtml"
> g <- readGoogleSheet(u)
> cleanGoogleTable(g, table=1)
2012-Jan Mobile internet Tanzania
1 Airtel Zantel Vodacom Tigo TTCL Combined
> cleanGoogleTable(g, table=2, skip=1)
BUNDLE FEE VALIDITY MB Cost Sh/MB
1 Daily Bundle (20MB) 500/= 1 day 20 500 25.0
2 1 Day bundle (300MB) 3,000/= 1 day 300 3,000 10.0
3 Weekly bundle (3GB) 15,000/= 7 days 3,000 15,000 5.0
4 Monthly bundle (8GB) 70,000/= 30 days 8,000 70,000 8.8
5 Quarterly Bundle (24GB) 200,000/= 90 days 24,000 200,000 8.3
6 Yearly Bundle (96GB) 750,000/= 365 days 96,000 750,000 7.8
7 Handset Browsing Bundle(400 MB) 2,500/= 30 days 400 2,500 6.3
8 STANDARD <NA> <NA> 1 <NA> <NA>
Not sure if other use cases have a higher complexity or if something changed in the meantime. After publishing the spreadsheet in CSV format, this simple one-liner worked for me:
myCSV<-read.csv("http://docs.google.com/spreadsheets/d/1XKeAajiH47jAP0bPkCtS4OdOGTSsjleOXImDrFzxxZQ/pub?output=csv")
R version 3.3.2 (2016-10-31)
There is an easy way to fetch Google sheets even if you're behind a proxy:
require(RCurl)
fileUrl <- "https://docs.google.com/spreadsheets/d/[ID]/export?format=csv"
fileCSV <- getURL(fileUrl,.opts=list(ssl.verifypeer=FALSE))
fileCSVDF <- read.csv(textConnection(fileCSV))
A simpler way.
Be sure to match your URL carefully to the format of the example here. You can get everything but the /export?format=csv piece from the Google Spreadsheets edit page. Then just manually add this piece to the URL and use it as shown here.
library(RCurl)
library(mosaic)
mydat2 <- fetchGoogle(paste0("https://docs.google.com/spreadsheets/d/",
                             "1mAxpSTrjdFv1UrpxwDTpieVJP16R9vkSQrpHV8lVTA8/export?format=csv"))
mydat2
Scrape the html table using httr and XML packages.
library(XML)
library(httr)
url <- "https://docs.google.com/spreadsheets/d/12MK9EFmPww4Vw9P6BShmhOolH1C45Irz0jdzE0QR3hs/pubhtml"
readSpreadsheet <- function(url, sheet = 1){
  library(httr)
  r <- GET(url)
  html <- content(r)
  sheets <- readHTMLTable(html, header = FALSE, stringsAsFactors = FALSE)
  df <- sheets[[sheet]]
  dfClean <- function(df){
    nms <- t(df[1, ])
    names(df) <- nms
    df <- df[-1, -1]
    row.names(df) <- seq(1, nrow(df))
    df
  }
  dfClean(df)
}
df <- readSpreadsheet(url)
df
Publish as CSV doesn't seem to be supported (or at least isn't currently supported) in the new Google Sheets, which is the default for any new sheet you create. You can, though, create a sheet in the old Google Sheets format, which does support publish as CSV, through this link: https://g.co/oldsheets.
More details on the new vs. old Sheets are here: https://support.google.com/drive/answer/3541068?p=help_new_sheets&rd=1
Thanks for this solution! It works as well as the old one. I used another fix to get rid of the blank first line: if you just exclude it, you might accidentally delete a valid observation once the line is unfrozen. The extra instruction in the function deletes any rows that have no timestamp.
readSpreadsheet <- function(url, sheet = 1){
  library(httr)
  r <- GET(url)
  html <- content(r)
  sheets <- readHTMLTable(html, header = FALSE, stringsAsFactors = FALSE)
  df <- sheets[[sheet]]
  dfClean <- function(df){
    nms <- t(df[1, ])
    names(df) <- nms
    df <- df[-1, -1]
    df <- df[df[, 1] != "", ]   ## only keep rows with time stamps
    row.names(df) <- seq(1, nrow(df))
    df
  }
  dfClean(df)
}
It is still (as of May 2015) possible to get a CSV file out of Google Spreadsheets, using the hidden URL <sheeturl>/export?format=csv trick.
However, after solving this problem, one encounters another problem - numbers are formatted according to the locale of the sheet, e.g. you may get 1,234.15 in a "US" sheet or 1.234,15 in a "German" sheet. To decide on a sheet locale, go to File > Spreadsheet Settings in Google Docs.
Now you need to remove the decimal mark from the numeric columns so that R can parse them; depending on how large your numbers are, this may need to be done several times for each column. A simple function I wrote to accomplish this:
# helper function to load a Google sheet and adjust for the thousands separator (,)
getGoogleDataset <- function(id) {
  download.file(paste0('https://docs.google.com/spreadsheets/d/', id, '/export?format=csv'),
                'google-ds.csv', 'curl')
  lines <- scan('google-ds.csv', character(0), sep = "\n")
  pattern <- "\"([0-9]+),([0-9]+)"
  for (i in seq_along(lines)) {
    # strip the thousands separator repeatedly until none is left on the line
    while (length(grep(pattern, lines[i])) > 0) {
      lines[i] <- gsub(pattern, "\"\\1\\2", lines[i])
    }
  }
  read.csv(textConnection(lines))
}
You will need to require(utils) and have curl installed, but no other extra packages.

Downloading Yahoo stock prices in R

This is a newbie question in R. I am downloading Yahoo Finance monthly stock price data using R, where the ticker names are read from a text file. I am using a loop to read the ticker names, download the data, and put them in a list. My problem is that some ticker names may not be correct, so my code stops when it encounters such a case. I want the following:
1. Skip the ticker name if it is not correct.
2. Each element in the list is a data frame. I want the ticker names to be appended to the variable names in the element data frames.
3. I need an efficient way to create a data frame that has the closing prices as variables.
Here is the sample code for the simplified version of my problem.
library(tseries)
tckk <- c("MSFT", "C", "VIA/B", "MMM")   # ticker names defined
numtk <- length(tckk)
ustart <- "2000-12-30"
uend <- "2007-12-30"                     # start and end date
all_dat <- list()                        # empty list to fill in the data
for(i in 1:numtk){
  all_dat[[i]] <- get.hist.quote(instrument = tckk[i], start = ustart, end = uend,
                                 quote = c("Open", "High", "Low", "Close"),
                                 provider = "yahoo", compression = "m")
}
The code stops at the third entry, but I want to skip this ticker and move on to "MMM". I have heard about the tryCatch() function but do not know how to use it.
As per question 2, I want the variable names for the first element of the list to be "MSFTopen", "MSFThigh", "MSFTlow", and "MSFTclose". Is there a better way to do it apart from using a combination of a loop and the paste() function?
Finally, for question 3, I need a dataframe with three columns corresponding to closing prices. Again, I am trying to avoid a loop here.
Thank you.
Your best bet is to use quantmod and store the results as a time series (in this case, it will be xts):
library(quantmod)
library(plyr)
symbols <- c("MSFT", "C", "VIA/B", "MMM")
#1: download each symbol, skipping any that fail
l_ply(symbols, function(sym) try(getSymbols(sym)))
symbols <- symbols[symbols %in% ls()]
#2: collect the downloaded xts objects in a list
sym.list <- llply(symbols, get)
#3: merge the closing prices into a single xts object
data <- xts()
for(i in seq_along(symbols)) {
  symbol <- symbols[i]
  data <- merge(data, get(symbol)[, paste(symbol, "Close", sep = ".")])
}
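If you would rather stay with tseries::get.hist.quote, here is a minimal sketch of the tryCatch() approach mentioned in the question (failing tickers are skipped instead of stopping the loop):
library(tseries)
all_dat <- list()
for(tk in c("MSFT", "C", "VIA/B", "MMM")){
  res <- tryCatch(
    get.hist.quote(instrument = tk, start = "2000-12-30", end = "2007-12-30",
                   quote = c("Open", "High", "Low", "Close"),
                   provider = "yahoo", compression = "m"),
    error = function(e) NULL)   # on failure return NULL instead of stopping
  if(!is.null(res)) all_dat[[tk]] <- res
}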
This is also a little late... If you want to grab data with just R's base functions without dealing with any add-on packages, just use the function read.csv(URL), where the URL is a string pointing to the right place at Yahoo. The data will be pulled in as a data frame, and you will need to convert the 'Date' column from a string to a Date type in order for any plots to look nice. A simple code snippet is below.
URL <- "http://ichart.finance.yahoo.com/table.csv?s=SPY"
dat <- read.csv(URL)
dat$Date <- as.Date(dat$Date, "%Y-%m-%d")
Using R's base functions may give you more control over the data manipulation.
I'm a little late to the party, but I think this will be very helpful to other latecomers.
The stockSymbols function in TTR fetches instrument symbols from nasdaq.com and adjusts the symbols to be compatible with Yahoo! Finance. It currently returns ~6,500 symbols for AMEX, NYSE, and NASDAQ. You could also take a look at the code in stockSymbols that adjusts tickers for Yahoo! Finance compatibility, to possibly adjust some of the tickers in your file.
NOTE: stockSymbols in the version of TTR on CRAN is broken due to a change on nasdaq.com, but it is fixed in the R-forge version of TTR.
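A quick sketch of using it to pre-filter the tickers from your file (assuming they are in the vector tckk from the question, and that the Symbol column is as returned by stockSymbols()):
library(TTR)
syms  <- stockSymbols()                 # data frame of AMEX/NYSE/NASDAQ symbols
valid <- tckk[tckk %in% syms$Symbol]    # keep only tickers Yahoo! should recognize
valid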
I do it like this, because I need to have the historic price list and a daily update file in order to run other packages:
library(fImport)
fecha1<-"03/01/2009"
fecha2<-"02/02/2010"
Sys.time()
y <- format(Sys.time(), "%y")
m <- format(Sys.time(), "%m")
d <- format(Sys.time(), "%d")
fecha3 <- paste(c(m,"/",d,"/","20",y), collapse="")
write.table(yahooSeries("GCI", from=fecha1, to=fecha2), file = "GCI.txt", sep="\t", quote = FALSE, eol="\r\n", row.names = TRUE)
write.table(yahooSeries("GCI", from=fecha2, to=fecha3), file = "GCIupdate.txt", sep="\t", quote = FALSE, eol="\r\n", row.names = TRUE)
GCI <- read.table("GCI.txt")
GCI1 <- read.table("GCIupdate.txt")
GCI <- rbind(GCI1, GCI)
GCI <- unique(GCI)
write.table(GCI, file = "GCI.txt", sep="\t", quote = FALSE, eol="\r\n", row.names = TRUE)
If your ultimate goal is to get the data.frame of three columns of closing prices, then the new package tidyquant may be better suited for this.
library(tidyquant)
symbols <- c("MSFT", "C", "VIA/B", "MMM")
# Download data in tidy format.
# Will remove VIA/B and warn you.
data <- tq_get(symbols)
# Ticker symbols as column names for closing prices
data %>%
  select(.symbol, date, close) %>%
  spread(key = .symbol, value = close)
This will scale to any number of stocks, so the file of 1000 tickers should work just fine!
Slightly modified from the above solutions... (thanks Shane and Stotastic)
symbols <- c("MSFT", "C", "MMM")
# 1. retrieve data
for(i in seq_along(symbols)) {
URL <- paste0("http://ichart.finance.yahoo.com/table.csv?s=", symbols[i])
dat <- read.csv(URL)
dat$Date <- as.Date(dat$Date, "%Y-%m-%d")
assign(paste0(symbols[i]," _data"), dat)
dat <- NULL
}
Unfortunately, the URL "ichart.finance.yahoo.com" is dead and no longer working. As far as I know, Yahoo closed it and it seems it will not be reopened.
Several days ago I found a nice alternative (https://eodhistoricaldata.com/) with an API very similar to Yahoo Finance.
Basically, for R-script described above you just need to change this part:
URL <- paste0("ichart.finance.yahoo.com/table.csv?s=", symbols[i])
to this:
URL <- paste0("eodhistoricaldata.com/api/table.csv?s=", symbols[i])
Then add an API key and it will work in the same way as before. It has saved me a lot of time in my R scripts.
Maybe give the BatchGetSymbols package a try. What I like about it over quantmod is that you can specify a time period for your data.
library(BatchGetSymbols)
# set dates
first.date <- Sys.Date() - 60
last.date <- Sys.Date()
freq.data <- 'daily'
# set tickers
tickers <- c('FB', 'MMM', 'PETR4.SA', 'abcdef')
l.out <- BatchGetSymbols(tickers = tickers,
                         first.date = first.date,
                         last.date = last.date,
                         freq.data = freq.data,
                         cache.folder = file.path(tempdir(), 'BGS_Cache'))  # cache in tempdir()
