The problem I am trying to solve is looping over a vector of ticker symbols in R and downloading each one from the Yahoo! Finance API. This would produce a bunch of data frames; converting them to xts would be awesome, but the xts part is not as important.
library(quantmod)
DB <- quantmod:::DDB_Yahoo()
dat <- list()
for (i in seq_along(DB$db)){
  symbols <- DB$db[i] # symbols are c('AAIT', 'AAL', 'AAME', ... thousands, essentially
  URL <- "http://ichart.finance.yahoo.com/table.csv?s=symbols"
  dat[[i]] <- read.csv(URL)
  dat[[i]]$Date <- as.Date(dat[[i]]$Date, "%Y-%m-%d")
}
I know that we can't have symbols inside the quotation marks like that; it is just there to show the logic.
P.S. For this instance, I am not using quantmod functions on purpose.
x <- c('AAIT', 'AAL', 'AAME')
kk <- lapply(x, function(i)
  download.file(paste0("http://ichart.finance.yahoo.com/table.csv?s=", i),
                paste0(i, ".csv")))
If you want to read the file directly:
jj <- lapply(x, function(i)
  read.csv(paste0("http://ichart.finance.yahoo.com/table.csv?s=", i)))
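Since the question also asks about xts: a minimal sketch of converting one of those downloaded data frames, assuming it has the Date, Open, High, Low, Close, Volume, Adj.Close columns this endpoint returns:
library(xts)
# index the first downloaded data frame by its Date column
df <- jj[[1]]
df$Date <- as.Date(df$Date, "%Y-%m-%d")
x1 <- xts(df[, -1], order.by = df$Date)
head(x1)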
Hi, I am trying to download data for multiple stocks from Yahoo Finance using the quantmod package in R. My method is to create a function around getSymbols and use lapply to apply it to a vector of ticker symbols. Below is my code:
library(quantmod)

get_data <- function(x){getSymbols(x,
from = "2017-01-01",
to = "2021-03-15",
auto.assign = FALSE)}
tickers <- c("AAPL","GOOG","FB","TSLA")
mydata <- lapply(tickers,get_data)
Running the last step does not give me the data, just a list with all the tickers stored as characters.
Would appreciate it if anyone could tell me where my code is going wrong. Cheers
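For what it's worth, lapply over getSymbols with auto.assign = FALSE should return a list of xts objects; one common tweak (a sketch, not a diagnosis of the error above) is to name the list elements after the tickers so they are easy to index:
# reuse get_data from above and label each result with its ticker
mydata <- setNames(lapply(tickers, get_data), tickers)
head(mydata[["AAPL"]])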
I need to modify this example code to use it with intraday data, which I should get from here and from here. As I understand it, the code in that example works well with any historical data (or does it?), so my problem boils down to loading the initial data in the necessary format (I mean daily or intraday).
As I also understand from the answers to this question, it is impossible to load intraday data with getSymbols(). I tried to download the data to my hard drive and then read it with read.csv(), but this approach didn't work either. Finally, I found a few solutions to this problem in various articles (e.g. here), but all of them seem very complicated and "artificial".
So, my question is: how do I load the given intraday data into the given code elegantly and correctly, from a programmer's point of view, without reinventing the wheel?
P.S. I am very new to analysis of time series in R and quantstrat, so if my question seems obscure, let me know what you need in order to answer it.
I don't know how to do this without "reinventing the wheel" because I'm not aware of any existing solutions. It's pretty easy to do with a custom function though.
intradataYahoo <- function(symbol, ...) {
# ensure xts is available
stopifnot(require(xts))
# construct URL
URL <- paste0("http://chartapi.finance.yahoo.com/instrument/1.0/",
symbol, "/chartdata;type=quote;range=1d/csv")
# read the metadata from the top of the file and put it into a usable list
metadata <- readLines(URL, 17)[-1L]
# split into name/value pairs, set the names as the first element of the
# result and the values as the remaining elements
metadata <- strsplit(metadata, ":")
names(metadata) <- sub("-","_",sapply(metadata, `[`, 1))
metadata <- lapply(metadata, function(x) strsplit(x[-1L], ",")[[1]])
# convert GMT offset to numeric
metadata$gmtoffset <- as.numeric(metadata$gmtoffset)
# read data into an xts object; timestamps are in GMT, so we don't set it
# explicitly. I would set it explicitly, but timezones are provided in
# an ambiguous format (e.g. "CST", "EST", etc).
Data <- as.xts(read.zoo(URL, sep=",", header=FALSE,
skip=17, FUN=function(i) .POSIXct(as.numeric(i))))
# set column names and metadata (as xts attributes)
colnames(Data) <- metadata$values[-1L]
xtsAttributes(Data) <- metadata[c("ticker","Company_Name",
"Exchange_Name","unit","timezone","gmtoffset")]
Data
}
I'd consider adding something like this to quantmod, but it would need to be tested. I wrote this in under 15 minutes, so I'm sure there will be some issues.
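A quick usage sketch (assuming the chartapi endpoint responds; the ticker is just an example):
spy <- intradataYahoo("SPY")
head(spy)
xtsAttributes(spy)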
I'm trying to quickly read into R an ASCII fixed-column-width dataset, based on a SAS import file (the file that declares the column widths, etc.).
I know I can use the SAScii R package to translate the SAS import file (parse.SAScii) and actually import the data (read.SAScii). It works, but it is too slow, because read.SAScii uses read.fwf to do the import, which is slow. I would like to swap that for a fast import method, laf_open_fwf from the LaF package.
I'm almost there, using parse.SAScii() and laf_open_fwf(), but I'm not able to correctly connect the output of parse.SAScii() to the arguments of laf_open_fwf().
Here is the code; the data is from PNAD, the Brazilian national household survey, 2013:
# Set working dir.
setwd("C:/User/Desktop/folder")
# installing packages:
install.packages("SAScii")
install.packages("LaF")
library(SAScii)
library(LaF)
# Download and unzip the data and documentation files
# Data
file_url <- "ftp://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_anual/microdados/2013/Dados.zip"
download.file(file_url,"Dados.zip", mode="wb")
unzip("Dados.zip")
# Documentation files
file_url <- "ftp://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_anual/microdados/2013/Dicionarios_e_input_20150814.zip"
download.file(file_url,"Dicionarios_e_input.zip", mode="wb")
unzip("Dicionarios_e_input.zip")
# importing with read.SAScii(), based on read.fwf(): Works fine
dom.pnad2013.teste1 <- read.SAScii("Dados/DOM2013.txt","Dicionarios_e_input/input DOM2013.txt")
# importing with parse.SAScii() and laf_open_fwf() : stuck here
dic_dom2013 <- parse.SAScii("Dicionarios_e_input/input DOM2013.txt")
head(dic_dom2013)
data <- laf_open_fwf("Dados/DOM2013.txt",
                     column_types = ?????,
                     column_widths = dic_dom2013[, "width"],
                     column_names = dic_dom2013[, "Varname"])
I'm stuck on this last command, passing the import arguments to laf_open_fwf().
UPDATE: here are two solutions, using packages LaF and readr.
Solution using readr (8 seconds)
readr is based on LaF but surprisingly faster. More info on readr here.
# Load Packages
library(SAScii) # for parse.SAScii()
library(readr)
library(data.table)
# Parse SAS file
dic_pes2013 <- parse.SAScii("./Dicionários e input/input PES2013.sas")
setDT(dic_pes2013) # convert to data.table
# read to data frame
pesdata2 <- read_fwf("Dados/PES2013.txt",
                     fwf_widths(dic_pes2013[, width],
                                col_names = dic_pes2013[, varname]),
                     progress = interactive())
Takeaway: readr seems to be the best option: it's faster, you don't need to worry about column types, the code is shorter, and it shows a progress bar :)
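If you ever do want explicit control over column types (for example, to keep identifier codes as character and preserve leading zeros), read_fwf accepts a col_types argument; a minimal sketch, reusing the objects above:
# force every column to character; individual columns can be
# overridden with col_integer(), col_double(), etc.
pesdata3 <- read_fwf("Dados/PES2013.txt",
                     fwf_widths(dic_pes2013[, width],
                                col_names = dic_pes2013[, varname]),
                     col_types = cols(.default = col_character()))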
Solution using LaF (20 seconds)
LaF is one of the (maybe THE) fastest ways to read fixed-width files in R, according to this benchmark. It took me 20 seconds to read the person-level file (PES) into a data frame.
Here is the code:
# Parse SAS file
dic_pes2013 <- parse.SAScii("./Dicionários e input/input PES2013.sas")
# Read .txt file using LaF. This is virtually instantaneous
pesdata <- laf_open_fwf("./Dados/PES2013.txt",
                        column_types = rep("character", length(dic_pes2013[, "width"])),
                        column_widths = dic_pes2013[, "width"],
                        column_names = dic_pes2013[, "varname"])
# convert to data frame. This took me 20 seconds.
system.time( pesdata <- pesdata[,] )
Note that I've used character in column_types. I'm not quite sure why the command returns an error if I try integer or numeric. This shouldn't be a problem, since you can convert all columns to numeric like this:
# convert the V* columns to numeric
varposition <- grep("V", colnames(pesdata))
pesdata[varposition] <- lapply(pesdata[varposition], as.numeric)
sapply(pesdata, class)
You can try read.SAScii.sqlite, also by Anthony Damico. It's 4x faster and leads to no RAM issues (as the author himself describes). But it imports the data into a self-contained SQLite database file (no SQL server needed) -- not into a data.frame. You can then open it in R using a DBI connection. Here is the GitHub address for the code:
https://github.com/ajdamico/usgsd/blob/master/SQLite/read.SAScii.sqlite.R
In the R console, you can just run:
source("https://raw.githubusercontent.com/ajdamico/usgsd/master/SQLite/read.SAScii.sqlite.R")
Its arguments are almost the same as those of the regular read.SAScii.
I know you are asking for a tip on how to use LaF. But I thought this could also be useful to you.
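For completeness, a minimal sketch of pulling the imported table back into R afterwards; the database file and table names (pnad2013.db, dom2013) are hypothetical placeholders for whatever you chose when calling read.SAScii.sqlite:
library(DBI)
library(RSQLite)
# open the self-contained database file created by read.SAScii.sqlite
con <- dbConnect(SQLite(), "pnad2013.db")
dom2013 <- dbReadTable(con, "dom2013") # read the table into a data frame
dbDisconnect(con)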
I think that the best choice is to use fwf2csv() from the descr package (C++ under the hood). I will illustrate the procedure with PNAD 2013. Be aware that I'm assuming you already have the dictionary with 3 variables -- beginning of the field, size of the field, and variable name -- and the data at Dados/.
library(bit64)
library(data.table)
library(descr)
library(reshape)
library(survey)
library(xlsx)
end_dom <- dicdom$beginning + dicdom$size - 1
fwf2csv(fwffile = 'Dados/DOM2013.txt', csvfile = 'dadosdom.csv',
        names = dicdom$variable, begin = dicdom$beginning, end = end_dom)
dadosdom <- fread(input = 'dadosdom.csv', sep = 'auto', sep2 = 'auto', integer64 = 'double')
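If you don't have that dictionary yet, it can be sketched from parse.SAScii() output -- assuming its usual varname/width columns, where negative widths mark gaps to skip; the column names below match those used above:
library(SAScii)
sas_dic <- parse.SAScii("Dicionarios_e_input/input DOM2013.txt")
# start positions follow from the cumulative (absolute) widths
dicdom <- data.frame(
  variable  = sas_dic$varname,
  size      = sas_dic$width,
  beginning = cumsum(c(1, head(abs(sas_dic$width), -1)))
)
dicdom <- dicdom[!is.na(dicdom$variable), ] # drop filler/gap rows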
I am new to R and quantmod. I am trying to get daily data for a user-defined ticker symbol, like this:
check_symbol<-"GOOG"
check_symbol2<-paste0(check_symbol,".Adjusted")
getSymbols(check_symbol)
temp<-as.vector(GOOG[,check_symbol2])
How do I keep GOOG as a variable in the as.vector(GOOG[,check_symbol2]) part of the above code?
Also, any more elegant way of doing this is much appreciated!
It seems like you'd benefit from using auto.assign=FALSE in the call to getSymbols:
check_symbol <- "GOOG"
check_symbol_data <- getSymbols(check_symbol, auto.assign=FALSE)
temp <- as.vector(Ad(check_symbol_data))
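Alternatively, if you stick with the auto-assigning call, get() retrieves the created object by name, and Ad() replaces the paste0(check_symbol, ".Adjusted") lookup:
getSymbols(check_symbol)                 # auto-assigns an object named "GOOG"
temp <- as.vector(Ad(get(check_symbol))) # fetch it by name, take the adjusted close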
I have an object that I have created using the as.ts function in R, and now I would like a simple way to transform one of the variables and add it to the same ts object. So, for example
tsMloa <- ts(read.dta("http://www.stata-press.com/data/r12/mloa.dta"), frequency=12, start=1959)
tsMloa[, "meanLog"] <- tsMloa[,"log"] - mean(tsMloa[,"log"])
gives me a subscript out of bounds error. How can I get around this?
Firstly, you ought to consider adding require(foreign) to your example code, as it's necessary to run your code.
I don't know anything about *.dta files or their formatting, but I can tell you that if you'd like to work with time series in R, you'd do well to look into the zoo and xts families of functions.
With that in mind, try the following:
require(xts)
require(foreign)
tsMloa <- ts(read.dta("http://www.stata-press.com/data/r12/mloa.dta"), frequency=12, start=1959)
tt <- seq(as.Date("1959-01-01"), as.Date("1990-12-01"), by='mon')
tsMloa_x <- xts(unclass(tsMloa)[,1:3], order.by=tt)
tsMloa_x$meanLog <- tsMloa_x$log - mean(tsMloa_x$log)
That should do what you are looking for -- and it gives you a reason to look into these two very good packages.
Doing it with zoo -- plus I've created a function to turn your month integers into dates.
require(foreign)
require(zoo)
Mloa <- read.dta("http://www.stata-press.com/data/r12/mloa.dta")
intToMonth <- function(intMonth, origin = "1960-01-01"){
dd <- as.POSIXlt(origin)
ddVec <- rep(dd, length(intMonth))
ddVec$mon <- ddVec$mon + intMonth%%12
ddVec$year <- ddVec$year + intMonth%/%12
ddRet <- as.Date(ddVec)
return(ddRet)
}
dateString <- intToMonth(Mloa[, 'tm'])
zMloa <- zoo(Mloa[, -2], dateString)
zMloa$meanLog <- zMloa$log - mean(zMloa$log)
As I see it, your problem is converting the timestamps in the source file into something R understands and can work with. I found this part of adapting to R especially tricky.
The above function will take your month integers and turn them into a Date object. The resulting output will work with both zoo and xts as the order.by argument.
If you need to change the origin date, just supply the second argument to the function -- i.e. otherDateString <- intToMonth(timeInts, "2011-01-01").
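A quick sanity check of the helper (months are counted from the 1960-01-01 origin, so negative integers land before it):
intToMonth(c(-12, 0, 1, 13))
# expect: "1959-01-01" "1960-01-01" "1960-02-01" "1961-02-01"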