Quantmod in R - How to work on multiple symbols efficiently?

I'm using quantmod to work on multiple symbols in R. My instinct is to combine the symbols into a list of xts objects, then use lapply to do what I need to do. However, some of the things that make quantmod convenient seem (to this neophyte) not to play nicely with lists. An example:
> symbols <- c("SPY","GLD")
> getSymbols(symbols)
> prices.list <- mget(symbols)
> names(prices.list) <- symbols
> returns.list <- lapply(prices.list, monthlyReturn, leading = FALSE)
This works. But it's unclear to me which column of prices it is using. If I try to specify adjusted close, it throws an error:
> returns.list <- lapply(Ad(prices.list), monthlyReturn, leading = FALSE)
Error in Ad(prices.list) :
subscript out of bounds: no column name containing "Adjusted"
The help for Ad() confirms that it works on "a suitable OHLC object," not on a list of OHLC objects. In this particular case, how can I specify that lapply should apply the monthlyReturn function to the Adjusted column?
More generally, what is the best practice for working with multiple symbols in quantmod? Is it to use lists, or is another approach better suited?

Answer monthlyReturn:
All the *Return functions are based on periodReturn. By default, periodReturn checks that its input is an xts object. If open and close prices are available, it takes the open price as the start value and the close price as the end value and calculates the return from those. If they are not available, it calculates the return from the first and last values of the time series, taking into account the requested interval (day, month, year, etc.).
Answer for lapply:
You want to do two operations on each element of the list, so use an anonymous function inside lapply:
lapply(prices.list, function(x) monthlyReturn(Ad(x), leading = FALSE))
This will get what you want.
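To try the pattern without downloading anything, here is a minimal sketch on synthetic data. The helper make_ohlc and its made-up prices are assumptions for illustration only; real data would come from getSymbols.

```r
library(quantmod)

# Hypothetical helper: builds a fake OHLC xts with quantmod-style column names.
make_ohlc <- function(symbol, n = 90) {
  dates <- seq(as.Date("2020-01-01"), by = "day", length.out = n)
  close <- 100 + cumsum(rep(c(0.5, -0.2, 0.3), length.out = n))
  x <- xts(cbind(close - 0.1, close + 0.5, close - 0.5, close, 1e6, close * 0.98),
           order.by = dates)
  colnames(x) <- paste(symbol,
                       c("Open", "High", "Low", "Close", "Volume", "Adjusted"),
                       sep = ".")
  x
}

symbols <- c("SPY", "GLD")
prices.list <- setNames(lapply(symbols, make_ohlc), symbols)

# Ad() is applied to each xts in turn, then monthlyReturn() to its result.
returns.list <- lapply(prices.list,
                       function(x) monthlyReturn(Ad(x), leading = FALSE))
```

Because Ad() is called on one xts at a time, it always sees an object with an "Adjusted" column, which is what the failing Ad(prices.list) call was missing.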
Answer for multiple symbols:
What you are doing is fine. Some alternatives:
Run an lapply when getting the symbols:
stock_prices <- lapply(symbols, getSymbols, auto.assign = FALSE)
Or use the tidyquant or BatchGetSymbols packages to get all the data in one big tibble.
I have probably forgotten a few options. There are multiple SO answers about this.
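One detail with the lapply approach is that the resulting list has no names. The sketch below shows one way to add them and then merge the adjusted columns into a single xts. The function fake_fetch is a hypothetical stand-in for getSymbols(sym, auto.assign = FALSE) so that the example runs offline, and its numbers are invented.

```r
library(xts)

symbols <- c("SPY", "GLD")

# Hypothetical stand-in for getSymbols(sym, auto.assign = FALSE).
fake_fetch <- function(sym) {
  dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 5)
  x <- xts(cbind(1:5, (1:5) * 0.98), order.by = dates)
  colnames(x) <- paste(sym, c("Close", "Adjusted"), sep = ".")
  x
}

# lapply over a character vector returns an unnamed list, so add the names back.
stock_prices <- setNames(lapply(symbols, fake_fetch), symbols)

# Pull the adjusted column out of each element.
adj_list <- lapply(stock_prices, function(x) x[, grep("Adjusted", colnames(x))])

# unname() so the elements are passed to merge() positionally.
adjusted <- do.call(merge, unname(adj_list))
```

With the names in place, individual series can be reached as stock_prices$SPY or stock_prices[["GLD"]].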

Related

How do I access coredata from each object in a matrix of xts zoo objects

I used the getFX function from the Quantmod package in R to generate a vector of rates from Oanda, each in xts zoo format.
currency_pairs <- c("GBP/USD", "USD/SGD")
rates <- getFX(currency_pairs, from="2019/01/01", to="2019/01/01")
This returns a vector of xts zoo objects in the form:
(GBPUSD, USDSGD,...)
However I would like to have just the rates, since I only require the rates for one date and therefore know the timestamp.
I have tried looping over the vector like so:
for (i in 1:length(rates)) {
rates[i] <- coredata(rates[i])
}
but this just returns the currency pair name.
What you could do in this case, if you only retrieve data for one date, is to use sapply like this:
library(quantmod)
currency_pairs <- c("GBP/USD", "USD/SGD")
# for 1 date this will return a named vector otherwise use lapply
rates <- sapply(currency_pairs, getFX, from="2019/01/01", to="2019/01/01", auto.assign = FALSE)
rates
GBP/USD USD/SGD
1.275455 1.362920
Normally I would suggest using lapply to retrieve all the currencies in a big list and then access the list with lapply / mapply / Map / purrr::map etc.
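As a rough sketch of that list-based approach, assume the rates have already been collected into a named list of xts objects. The list below is built by hand with illustrative values rather than fetched with getFX.

```r
library(xts)

# Hand-built stand-in for a list of single-date xts objects, one per currency pair.
rates_list <- list(
  "GBP/USD" = xts(1.2755, order.by = as.Date("2019-01-01")),
  "USD/SGD" = xts(1.3629, order.by = as.Date("2019-01-01"))
)

# coredata() drops the time index; as.numeric() drops the matrix dimensions.
rates <- sapply(rates_list, function(x) as.numeric(coredata(x)))
```

Here sapply returns a named numeric vector because each element holds a single value. With several dates per pair, lapply would keep one vector per pair instead.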

Stock splits performance R

I need to evaluate post-split stock performance with the quantmod package in R for NYSE, AMEX and NASDAQ. My problem is that I'm only able to look up specific symbols (getSymbols()), but I need to separate my data into non-splitting firms and splitting firms, to compare them. Does anyone have an idea how I can do this for the last 25 years?
Thanks
Since you're using the quantmod package, you can use the getSplits() function to determine which splits a stock has had within a certain timeframe. Since you're dealing with a large number of stocks, you can use a custom function to get what you want.
getSymbolSplit <- function(symbol, xts, date) {
  # look up the splits for this symbol from the given date onwards
  splitCheck <- getSplits(symbol, from = date)
  # this check relies on getSplits() returning NA when it finds no splits
  if (anyNA(splitCheck, recursive = FALSE)) {
    xts$Split <- 0
  } else {
    xts$Split <- 1
  }
  return(xts)
}
Once you have that function, you can quickly add a split to the existing stocks data. Example:
getSymbols('GOOG')
GOOG <- getSymbolSplit('GOOG',GOOG,'1993-01-01')
The getSymbols() function creates the xts named GOOG, so our function checks if any splits have happened since 1993-01-01 (yes) and adds a column Split with the value of 1.
getSymbols('REGN')
REGN <- getSymbolSplit('REGN',REGN,'1993-01-01')
Same deal, but REGN has had no splits since 1993 and the column has a value of 0.
Now you have a clear binary variable for grouping between firms that have had a split in your given timeframe.
As a warning, I encountered a problem with BRK-A. R does not normally permit names that include a '-', and the function breaks when trying to pass an xts named BRK-A. If you have stocks that use a - in their symbol, I recommend you rename them before using them. This function is not the only place where the dash could cause problems.
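One possible way to do that renaming is base R's make.names(), which turns a string into a syntactically valid name. The sketch below stores placeholder strings in a separate environment; in real use the stored values would be the xts objects, and the environment name store is an assumption.

```r
symbols <- c("GOOG", "BRK-A")

# make.names() replaces characters that are not allowed in R names with ".".
safe_names <- make.names(symbols)

# Keep the objects in their own environment, keyed by the safe names.
store <- new.env()
for (i in seq_along(symbols)) {
  # Placeholder value; a real workflow would assign the downloaded xts here.
  assign(safe_names[i], paste("data for", symbols[i]), envir = store)
}

ls(store)
```

Keeping a vector of the original symbols next to safe_names makes it easy to map back to the ticker that the data source expects.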

Can I create new xts columns from a list of names?

My objective: read data files from yahoo then perform calculations on each xts using lists to create the names of xts and the names of columns to assign results to.
Why? I want to perform the same calculations for a large number of xts datasets without having to retype separate lines to perform the same calculations on each dataset.
First, get the datasets for 2 ETFs:
library(quantmod)
# get ETF data sets for example
startDate = as.Date("2013-12-15") #Specify period of time we are interested in
endDate = as.Date("2013-12-31")
etfList <- c("IEF","SPY")
getSymbols(etfList, src = "yahoo", from = startDate, to = endDate)
To simplify coding, remove the ETF name prefix that yahoo adds to the column names
colnames(SPY) <- gsub("SPY.", "", colnames(SPY), fixed = TRUE)
colnames(IEF) <- gsub("IEF.", "", colnames(IEF), fixed = TRUE)
head(IEF,2)
# Open High Low Close Volume Adjusted
#2013-12-16 100.86 100.87 100.52 100.61 572400 98.36
#2013-12-17 100.60 100.93 100.60 100.93 694800 98.67
Creating new columns using the functions in quantmod is straightforward, e.g.,
SPY$logRtn <- periodReturn(Ad(SPY),period='daily',subset=NULL,type='log')
IEF$logRtn <- periodReturn(Ad(IEF),period='daily',subset=NULL,type='log')
head(IEF,2)
# Open High Low Close Volume Adjusted logRtn
#2013-12-16 100.86 100.87 100.52 100.61 572400 98.36 0.0000000
#2013-12-17 100.60 100.93 100.60 100.93 694800 98.67 0.0031467
but rather than writing a new statement to perform the calculation for each ETF, I want to use a list instead. Here's the general idea:
etfList
#[1] "IEF" "SPY"
etfColName = "logRtn"
for (etfName in etfList) {
  newCol <- paste(etfName, etfColName, sep = "$")
  newcol <- periodReturn(Ad(etfName),period='daily',subset=NULL,type='log')
}
Of course, using strings (obviously) doesn't work, because
typeof(newCol) # is [1] "character"
typeof(logRtn) # is [1] "double"
I've tried everything I can think of (at least twice) to coerce the character string etfName$etfColName into an object that I can assign calculations to.
I've looked at many variations that work with data.frames, e.g., mutate() from dplyr, but don't work on xts data files. I could convert datasets back/forth from xts to data.frames, but that's pretty kludgy (to say the least).
So, can anyone suggest an elegant and straightforward solution to this problem (i.e., in somewhat less than 25 lines of code)?
I shall be so grateful that, when I make enough to buy my own NFL team, you will always have a place of honor in the owner's box.
This type of task is a lot easier if you store your data in a new environment. Then you can use eapply to loop over all the objects in the environment and apply a function to them.
library(quantmod)
etfList <- c("IEF","SPY")
# new environment to store data
etfEnv <- new.env()
# use env arg to make getSymbols load the data to the new environment
getSymbols(etfList, from="2013-12-15", to="2013-12-31", env=etfEnv)
# function containing stuff you want to do to each instrument
etfTransform <- function(x, ...) {
  # remove instrument name prefix from colnames
  colnames(x) <- gsub(".*\\.", "", colnames(x))
  # add return column
  x$logRtn <- periodReturn(Ad(x), ...)
  x
}
# use eapply to apply your function to each instrument
etfData <- eapply(etfEnv, etfTransform, period='daily', type='log')
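The value returned by eapply is a named list with one transformed object per symbol. The following offline sketch shows that shape on synthetic data. It uses only xts, and it swaps periodReturn for diff(log(...)) purely to avoid a download; the prices and the function name strip_and_add are invented for illustration.

```r
library(xts)

etfEnv <- new.env()
dates <- seq(as.Date("2013-12-16"), by = "day", length.out = 5)

# Fill the environment with fake two-column series named like yahoo data.
for (sym in c("IEF", "SPY")) {
  x <- xts(cbind(100 + 1:5, 98 + 1:5), order.by = dates)
  colnames(x) <- paste(sym, c("Close", "Adjusted"), sep = ".")
  assign(sym, x, envir = etfEnv)
}

strip_and_add <- function(x) {
  # remove the instrument name prefix from the column names
  colnames(x) <- gsub(".*\\.", "", colnames(x))
  # daily log return of the adjusted column (the first value is NA)
  x$logRtn <- diff(log(x$Adjusted))
  x
}

etfData <- eapply(etfEnv, strip_and_add)
names(etfData)
colnames(etfData$SPY)
```

The order of the list elements follows the environment and is not guaranteed, so it is safer to access results by name, for example etfData$IEF.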
(I didn't realize that you had posted a reproducible example.)
See if this is helpful:
etfColName = "logRtn"
for ( etfName in etfList ) {
newCol <- get(etfName)[ , etfColName]
assign(etfName, cbind( get(etfName),
periodReturn( Ad(get(etfName)),
period='daily',
subset=NULL,type='log')))
}
> names(SPY)
[1] "SPY.Open" "SPY.High" "SPY.Low" "SPY.Close"
[5] "SPY.Volume" "SPY.Adjusted" "logRtn" "daily.returns"
I'm not a quantmod user, and it's only from the behavior I see that I believe the Ad function returns a named vector. (So I did not need to do any naming.)
R is not a macro language, which means you cannot just string together character values and expect them to get executed as though you had typed them at the command line. The get and assign functions allow you to 'pull' and 'push' items from the data object environment on the basis of character values, but you should not use the $-function in conjunction with them.
I still do not see a connection between the creation of newCol and the actual new column that your code was attempting to create. They have different spellings so would have been different columns ... if I could have figured out what you were attempting.
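A stripped-down sketch of the pull, modify, push pattern with get and assign follows. It works on small synthetic xts objects whose values are made up, and it computes the log return with diff(log(...)) rather than periodReturn so that only xts is needed.

```r
library(xts)

dates <- seq(as.Date("2013-12-16"), by = "day", length.out = 4)
IEF <- xts(cbind(Adjusted = c(98.4, 98.7, 98.5, 98.9)), order.by = dates)
SPY <- xts(cbind(Adjusted = c(170.1, 169.8, 172.0, 171.5)), order.by = dates)

etfList <- c("IEF", "SPY")

for (etfName in etfList) {
  obj <- get(etfName)                            # pull the object by its name
  obj$logRtn <- diff(log(obj[, "Adjusted"]))     # add the new column to the copy
  assign(etfName, obj)                           # push the modified copy back
}

colnames(SPY)
```

The $ is applied to the object returned by get, never to the character string, which is the step the original loop was missing.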

how to combine raster layers from a stack into a data frame R

I have a stack made of rasters
s<-stack(list of ASCI files)
I am trying to perform this operation
df<-as.data.frame(c(s[[1]],s[[2]],s[[2]],s[["bathymetry"]]))
but I get this error
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "structure("RasterLayer", package = "raster")" to a data.frame
When I perform this operation on a single raster such as
df<-as.data.frame(s[[1]])
everything works fine. But I have to extract many rasters and combine them in one data frame. The only solution I see now is to extract them individually and then combine them. Is there a better solution? I am working with hundreds of rasters at a time.
EDIT: I should also add that this function goes inside a loop and I am only extracting a subset of the raster on each loop.
Or use...
data.frame( rasterToPoints( s ) )
Drop the columns you don't want afterwards.
To apply a function to each element of a list, the apply family commands are helpful:
lapply( s, as.data.frame )
That returns a list of data.frames.
To restrict it to only the elements that you want, just subset to a smaller list beforehand.
s_small <- s[c(1,2,3,5)]
lapply( s_small, as.data.frame )

get xts objects from within an environment

I have stored xts objects inside an environment. Can I subset these objects while they are stored in an environment, i.e. act upon them "in-place"? Can I extract these objects by referring to their colname?
Below an example of what I'm getting at.
# environment in which to store data
data <- new.env()
# Set data tickers of interest
tickers <- c("FEDFUNDS", "GDPPOT", "DGS10")
# import data from FRED database
library("quantmod")
dta <- getSymbols( tickers
, src = "FRED"
, env = data
, adjust = TRUE
)
This, however, downloads the entire dataset. Now, I want to discard some data, save it, use it (e.g. plot it). I want to keep the data within this date range:
# set dates of interest
date.start <- "2012-01-01"
date.end <- "2012-12-31"
I have two distinct objectives.
1. To subset all of the data inside the environment (either acting in-place or creating a new environment and overwriting the old environment with it).
2. To take only some tickers of my choosing and subset those, say FEDFUNDS and DGS10, and afterwards save them in a new environment.
I also want to preserve the xts-ness of these objects, so I can conveniently plot them together or separately.
Here are some things I did manage to do:
# extract and subset a single xts object
dtx1 <- data$FEDFUNDS
dtx1 <- dtx1[paste(date.start,date.end,sep="/")]
The drawback of this approach is that I need to type FEDFUNDS explicitly after data$. But I'd like to work from a prespecified list of tickers, e.g.
tickers2 <- c("FEDFUNDS", "DGS10")
I have got one step closer to being systematic by combining the function get with the function lapply
# extract xts objects as a list
dtxl <- lapply(tickers, get, envir = data)
But this returns a list. And I'm not sure how to conveniently work with this list to subset the data, plot it, etc. How do I refer to, say, DGS10 or the pair of tickers in tickers2?
I very much wanted to write something like data$tickers[1] or data$tickers[[1]] but that didn't work. I also tried paste0('data','$',tickers[1]) and variations of it with or without quotes. At any rate, I believe that the order of the data inside an environment is not systematic, so I'd really prefer to use the ticker's name rather than its index, something like data$tickers[colnames = FEDFUNDS]. None of the attempts in this paragraph have worked.
If my question is unclear, I apologize, but please do request clarification. And thanks for your attention!
EDIT: Subsetting
I've received some fantastic suggestions. GSee's answer has several very useful tricks. Here's how to subset the xts objects to within a date interval of interest:
dates <- paste(date.start, date.end, sep="/")
as.environment(eapply(data, "[", dates))
This will subset every object in an environment, and return an environment with the subsetted data:
data2 <- as.environment(eapply(data, "[", paste(date.start, date.end, sep="/")))
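To see what this does without hitting FRED, here is a small sketch that fills an environment with synthetic monthly series. The values are just counters, and the ticker names are reused from the question only as labels.

```r
library(xts)

data <- new.env()
dates <- seq(as.Date("2011-01-01"), as.Date("2013-12-01"), by = "month")

# Synthetic stand-ins for the downloaded series.
for (ticker in c("FEDFUNDS", "DGS10")) {
  assign(ticker, xts(seq_along(dates), order.by = dates), envir = data)
}

date.start <- "2012-01-01"
date.end <- "2012-12-31"

# "[" is called on every object with the date range as its first index.
data2 <- as.environment(eapply(data, "[", paste(date.start, date.end, sep = "/")))

range(index(data2$FEDFUNDS))
```

The original environment is left untouched, so assigning the result back to data is what makes the operation look in-place.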
You can do basically the same thing for your second question. Just name the components of the list that lapply returns by wrapping it with setNames, then coerce to an environment:
data3 <- as.environment(setNames(lapply(tickers, get, envir = data), tickers))
Or, better yet, use mget so that you don't have to use lapply or setNames
data3 <- as.environment(mget(tickers, envir = data))
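A short sketch of the mget route follows, using plain numeric vectors as placeholders for the xts objects so that it runs with base R alone. The stored values are invented.

```r
data <- new.env()

# Placeholder values; in the real case these would be xts objects from getSymbols.
assign("FEDFUNDS", c(0.08, 0.10, 0.13), envir = data)
assign("GDPPOT",   c(15000, 15100, 15200), envir = data)
assign("DGS10",    c(1.97, 1.94, 1.88), envir = data)

tickers2 <- c("FEDFUNDS", "DGS10")

# mget returns a list named by ticker, which as.environment can consume directly.
data3 <- as.environment(mget(tickers2, envir = data))

ls(data3)
```

Since everything is looked up by name, the unordered nature of environments does not matter here.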
Alternatively, I have a couple of convenience functions in qmao designed specifically for this: gaa stands for "get, apply, assign" and gsa stands for "get, subset, assign".
To get data for some tickers, subset the data, and then assign it into an environment:
gsa(tickers, subset=paste(date.start, date.end, sep="/"), env=data,
store.to=globalenv())
gaa lets you apply any function to each object before saving in the same or different environment.
If I'm reading the question correctly, you want something like this:
dtxl = do.call(cbind, lapply(tickers2,
function(ticker) get(ticker, envir=data)[paste(date.start,date.end,sep="/")])
)
