Variable lagging in R error - r

I am trying to lag some prices I downloaded from yahoo market but I don't want the lag to be fixed. I would like to have it where depending on another DF or Values the lag period changes.
This extracts and formats the data:
library("quantmod")
library("PerformanceAnalytics")
library(dplyr)
nameOfStrategy <- "GSPC MACD"
#Specify dates for downloading data, training models and running simulation
trainingStartDate = as.Date("2000-01-01")
trainingEndDate = as.Date("2010-01-01")
outofSampleStartDate = as.Date("2010-01-02")
#Download the data
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols("^GSPC", env = symbolData, src = "yahoo", from = trainingStartDate)
trainingData <- window(symbolData$GSPC, start = trainingStartDate, end = trainingEndDate)
testData <- window(symbolData$GSPC, start = outofSampleStartDate)
indexReturns <- Delt(Cl(window(symbolData$GSPC, start = outofSampleStartDate)))
colnames(indexReturns) <- "GSPC Buy&Hold"
And this is the code I'm using to find market signals and then organize data:
signalB <- ifelse(MACD12$macd > MACD12$signal & lag.xts(MACD12$macd) < lag.xts(MACD12$signal),1,NA)
#If fastMA > slowMA on change go long
signalS <- ifelse(MACD12$macd < MACD12$signal & lag.xts(MACD12$macd) > lag.xts(MACD12$signal),-1,NA)
#Combines Buy and sell signals
Tsignal <- merge(signalB,signalS)
#Gets number of days---dont know when period starts but doesnt matter since we just want difference in days
Tsignal$dates =indexTZ(Tsignal)
Tsignal$dates =index(Tsignal)
#Combines Buy and Sell signal into overall signal
Tsignal$Signal <- ifelse(is.na(Tsignal$Buy),ifelse(is.na(Tsignal$Sell),NA,-1),1)
Tsignal$Tdate <- 0
#Gets 'Date' only if signal either buy or sell
Tsignal$Tdate <- ifelse(!is.na(Tsignal$Signal),Tsignal$dates,NA)
#Finds difference between the Sell sig and the last signal **** In this case it will work but future buying twice before
# selling will result in not getting returns of one of the purchases
Tsignal$lag <- ifelse(Tsignal$Signal == -1,diff(na.locf(Tsignal$Tdate)),NA)
Tsignal$lag <- ifelse(is.na(Tsignal$lag),0,Tsignal$lag)
Here is the problem:
lag(Cl(trainingData),Tsignal$lag))
#Warning message:
#In if (n == 0) return(x) :
#the condition has length > 1 and only the first element will be used
It returns all of the prices but lagged back at a period of 0 (ie just returns the prices). While the Tsignal$lag has a lot of zero's in it there are values greater than 1 spread throughout.
I need it to return the same price during the dates where Tsignal$lag = 0 and return the price lagged back the number of periods Tsignal$lag specifies.
Currently I am using the dplyr lag function but I have tried other packages with the lag function and I get the same error. Writing this I am thinking I might have to do a 'for' function but I'm not sure, I am fairly new to R.
Thanks for your help ahead of time!

Both the dplyr and stats lag functions require integers for the number of lag periods (?dplyr::lag), but your Tsignal$lag is a vector. To remain in the xts domain, one suggestion is to use the lag values as direct index offsets to the close prices, as in
Cl(trainingData)[(1:nrow(trainingData))-coredata(Tsignal$lag),]

Related

R forecastML package keeps renaming outcome columns

I am trying to use the forecast ML r package to run some tests but the moment I hit this step, it renames the columns
data <- read.csv("C:\\Users\\User\\Desktop\\DG ST Forecast\\LassoTemporalForecast.csv", header=TRUE)
date_frequency <- "1 week"
dates <- seq(as.Date("2012-10-05"), as.Date("2020-10-05"), by = date_frequency)
data_train <- data[1:357,]
data_test <- data[358:429,]
outcome_col <- 1 # The column index of our DriversKilled outcome.
horizons <- c(1,2,3,4,5,6,7,8,9,10,11,12) # 4 models that forecast 1, 1:3, 1:6, and 1:12 time steps ahead.
# A lookback across select time steps in the past. Feature lags 1 through 9, for instance, will be
# silently dropped from the 12-step-ahead model.
lookback <- c(1)
# A non-lagged feature that changes through time whose value we either know (e.g., month) or whose
# value we would like to forecast.
dynamic_features <- colnames(data_train)
data_list <- forecastML::create_lagged_df(data_train,
type = "train",
outcome_col = 1,
horizons = horizons,
lookback = lookback,
date = dates[1:nrow(data_train)],
frequency = date_frequency,
dynamic_features = colnames(data_train)
)
After the data_list, here is a snapshot of what happens in the console:
Next, when I try to create windows following the name change,
windows <- forecastML::create_windows(lagged_df = data_list, window_length = 36,
window_start = NULL, window_stop = NULL,
include_partial_window = TRUE)
plot(windows, data_list, show_labels = TRUE)
this error: Can't subset columns that don't exist. x Column cases doesn't exist.
I've checked through many times based on my input data and the code previously and still can't understand why the name change occurs, if anyone is familiar with this package please assist thank you!
I'm the package author. It's difficult to tell without a reproducible example, but here's what I think is going on: Dynamic features are essentially features with a lag of 0. Dynamic features also retain their original names, as opposed to lagged features which have "_lag_n" appended to the feature name. So by setting dynamic_features to all column names you are getting duplicate columns specifically for the outcome column. My guess is that "cases" is the outcome here. Fix this by removing dynamic_features = colnames(data_train) and setting it to only those features that you really want to have a lag of 0.

R Quant Trading

I could use some help getting my code to work properly. I am trying to create a simple position signal based on the closing price being higher than the MACD, Bollinger Bands, and the Slow Stochastics. I am getting errors on line 17 onwards. I am not sure if this is because "Stock" is an xts object or not. I would like to graph the output in the end as well. Thanks!
#install.packages("quantmod")
library("quantmod")
#install.packages("FinancialInstrument")
library("FinancialInstrument")
#install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
#install.packages("TTR")
library("TTR")
#######################################
Stock <- get(getSymbols('CAT'))["2014::"]
# add the indicators
Stock$BBands <- BBands(HLC(Stock))
Stock$MACD <- MACD(HLC(Stock))
Stock$stochOSC <- stoch(Stock[,c("High","Low","Close")])
Stock$position <- ifelse(Cl(Stock) > Stock$BBands > Stock$MACD > Stock $stockOSC , 1 , -1)
Gains <- lag(Stock$position) * dailyReturn(Stock)
charts.PerformanceSummary(cbind(dailyReturn(Stock),Gains))
As Pascal mentioned in his above comment, MACD uses a univariate object. This object should be the closing price (unless you want something else) which is the third column in the HLC(Stock) named CAT.Close. The Stock$stochOSC didn't work because column names specified wrongly (CAT. should be added before High, Low and Close). Finally, & should separate multiple conditions of ifelse (note the typo in Stock$stochOSC in the question (ck instead of ch)).
Here is the code:
#install.packages("quantmod")
library("quantmod")
#install.packages("FinancialInstrument")
library("FinancialInstrument")
#install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
#install.packages("TTR")
library("TTR")
#######################################
Stock <- get(getSymbols('CAT'))["2014::"]
# add the indicators
Stock$BBands <- BBands(HLC(Stock))
Stock$MACD <- MACD(HLC(Stock)[,3])
Stock$stochOSC <- stoch(Stock[,c("CAT.High","CAT.Low","CAT.Close")])
Stock$position <- ifelse(Cl(Stock)>Stock$BBands & Stock$BBands >Stock$MACD & Stock$MACD > Stock$stochOSC , 1 , -1)
Gains <- lag(Stock$position) * dailyReturn(Stock)
charts.PerformanceSummary(cbind(dailyReturn(Stock),Gains))
You should get the following plot:

Retrieve monthly Adjusted stock quotes using the quantmod package in R

I'm learning R this semester and this is my first assignment. I want to retrieve monthly Adjusted stock quotes within a set date range using a for loop. And once I am able to do that I want to merge all the data into a data frame.
My code so far retrieves daily stock quotes for 5 stock symbols within a set date range, it assigns the object to the environment specified, and places only the .Adjusted column in the list.
Could someone point me in a better direction in obtaining the monthly quotes and am I on the right track with my code.
Thanks.
#Packages
library(quantmod)
#Data structure that contains stock quote objects
ETF_Data <- new.env()
#Assign dates to set range for stock quotes
sDate <- as.Date("2007-08-31")
eDate <- as.Date("2014-09-04")
#Assign a vector of ticker symbols.
ticker_symbol <- c("IVW","JKE","QQQ","SPYG","VUG")
#Assign number of ticker symbols.
total_ticker_symbols <- length(ticker_symbol)
#Assign empty list to for each object contained in my environment.
Temp_ETF_Data <- list()
#Assign integer value to counter.
counter <- 1L
#Loop and retrieve each ticker symbols quotes from Yahoo's API
for(i in ticker_symbol)
{
getSymbols(
i,
env = ETF_Data,
reload.Symbols = FALSE,
from = sDate,
to = eDate,
verbose = FALSE,
warnings = TRUE,
src = "yahoo",
symbol.lookup = TRUE)
#Add only Adjusted Closing Prices for each stock or object into list.
Temp_ETF_Data[[i]] <- Ad(ETF_Data[[i]])
if (counter == length(ticker_symbol))
{
#Merge all the objects of the list into one object.
ETF_Adj_Daily_Quotes <- do.call(merge, Temp_ETF_Data)
ETF_Adj_Monthly_Quotes <- ETF_Adj_Daily_Quotes[endpoints(ETF_Adj_Daily_Quotes,'months')]
}
else
{
counter <- counter + 1
}
}
There's no need for the for loop. You can loop over all the objects in the environment with eapply:
getSymbols(ticker_symbol, env=ETF_Data, from=sDate, to=eDate)
# Extract the Adjusted column from all objects,
# then merge all columns into one object
ETF_Adj_Data <- do.call(merge, eapply(ETF_Data, Ad))
# then extract the monthly endpoints
Monthly_ETF_Adj_Data <- ETF_Adj_Data[endpoints(ETF_Adj_Data,'months')]
I know that this is an old question but this answer might help potential future users seeking a better answer.
quantmod has now introduced an additional parameter to the getSymbols function called periodicity which can take the values of daily, weekly, monthly.
I tested out the following and it seems to work as desired:
getSymbols("EURGBP=X", from = starting, src = 'yahoo', periodicity = 'monthly')
just use
to.monthly(your_ticker)

Mapping multiple IDs using R

The idea is as follows. Every patient has a unique patient id, which we call hidenic_id. However this patient may be admitted to the hospital multiple times. On the other hand every entry has unique emtek_id.
Patient 110380 was admitted to the hospital 4/14/2001 11:08 and then transferred through the hospital and discharged on 4/24/2001 18:16. Now this patient again admitted on 5/11/2001 23:24 because he has different emtek_id now. He is discharged from the hospital on 5/25/2001 16:26. So you need to assign correct emtek_ids by checking the dates. If the date in the combined file is within the admission and discharge time period (or very close like 24 hours) we can assign that emtek_id.
How can I assign different emtek_IDs to entries with hidenic_id and admit time?
I had a couple ideas worth sharing.
First, make emtek_id from hidenic_id and date. Second, make the emtek_id logical for parsing, e.g., emtek_id#dataTime. Third, make the database a global vector. Depending on memory limits, there has to be a faster way than this, but it might give you a few ideas.
The main problems are handling NA values and incorrect hidenic_id, validating hidenic_id(s), and padding the IDs if you don't characters leading (which would be a quick fix). Lastly, how do you want to handle input that's incorrect but not NA/null? For instance, say you input "ID" instead of "ID12345", do you want to treat that as a call to assign a new value or prompt for a correct input XOR NA value? I will assume you only feed it correct ID inputs or NA values, but this is my trivializing assumption.
Here's some pseudo-code to start the idea. You choose how to store the data (eg. csv file then use data.table::fread()):
#this file's name is "make.hidenic_id.R"
library(data.table)
library(stringr)
set.seed(101)
#one might one a backup written, perhaps conditionally updating it every hour or so.
database.hidenic_id <<-data.table::fread("database.filename.hidenic_id.csv")
database.emtek_id <<-data.table::fread("database.filename.emtek_id.csv")
make.hidenic_Id = function(in.hidenic_id){
if(is.na(in.hidenic_id) | !(in.hidenic_id %in% database.hidenic_id)){
new.hidenic_id=NA
#conditionally make new hidenic_id
while( new.hidenic_id %in% database.hidenic_id){
new.hidenic_id = paste0("ID",str_pad(sample.int(99999, 1),5,pad=0))
}
#make new emtek_id
new.emtek_id <- paste0(new.hidenic_id, sep="#", str_sub(Sys.time(),1,16))
#update databases; e.g., c(database.emtek_id, new.emtek_id)
database.hidenic_id <<- c(database.hidenic_id, new.hidenic_id)
database.emtek_id <<- c(database.emtek_id, new.emtek_id)
}else{
new.emtek_id <- paste0(in.hidenic_id, sep="#", str_sub(Sys.time(),1,16))
# update database.emtek_id
database.emtek_id <<- c(database.emtek_id, new.emtek_id)
}
return(new.emtek_id)
}
temp = readline(prompt="Enter hidenic_id OR type \"NA\": ")
data.table::fwrite(database.emtek_id, "database.filename.emtek_id.csv")
data.table::fwrite(database.hidenic_id,"database.filename.hidenic_id.csv")
and call the file with
source("make.hidenic_id.R")
There are a lot of "good-practice" things I don't do to manage poor input data or optimizing searching, but this is a strong start. Some other good-practice would be to have longer integers or a different leading string, but you never said we could use input value to make the IDs.
You could say this was inspired by the census since everything is just one massive string per geographic ID variable.
I was intrested in your problem so I created some mock data and tried to solve the problem but I ran into some confusion myself and then posted my question, which I think is what you are asking but more general. You can see the response here: How can I tell if a time point exists between a set of before and after times
My post generates what I believe is what you are starting with and the checked answer is what I believe you are looking for. The full code is below. You will need to install zoo and IRanges.
Also, I did this in version 2.15.3. IRanges did not install properly in 3.0.0.
## package installation
source("http://bioconductor.org/biocLite.R")
biocLite("IRanges")
install.packages("zoo")
## generate the emtek and hidenic file data
library(zoo)
date_string <- paste("2001", sample(12, 10, 3), sample(28,10), sep = "-")
time_string <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
"23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")
entry_emtek <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
entry_emtek <- entry_emtek[order(entry_emtek)]
exit_emtek <- entry_emtek + 3600 * 24
emtek_file <- data.frame(emtek_id = 1:10, entry_emtek, exit_emtek)
hidenic_id <- 110380:110479
date_string <- paste("2001", sample(12, 100, replace = TRUE), sample(28,100, replace = T), sep = "-")
time_string <- rep(c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
"23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26"),10)
hidenic_time <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
hidenic_time <- hidenic_time[order(hidenic_time)]
hidenic_file <- data.frame(hidenic_id, hidenic_time)
## Find the intersection of emtek and hidenic times. This part was done by user: agstudy
library(IRanges)
## create a time intervals
subject <- IRanges(as.numeric(emtek_file$entry_emtek),
as.numeric(emtek_file$exit_emtek))
## create a time intervals (start=end here)
query <- IRanges(as.numeric(hidenic_file$hidenic_time),
as.numeric(hidenic_file$hidenic_time))
## find overlaps and extract rows (both time point and intervals)
emt.ids <- subjectHits(findOverlaps(query,subject))
hid.ids <- queryHits(findOverlaps(query,subject))
cbind(hidenic_file[hid.ids,],emtek_file[emt.ids,])

Calculate Option Prices from a Multivariate XTS of Volatilities and Spot Prices

Here is my code for downloading spot prices and calculating realized volatilities for a bunch of indices.
library(quantmod)
library(PerformanceAnalytics)
library(RQuantLib)
tickers.index = c("^RUT","^STOXX50E","^HSI")
myEnv <- new.env()
getSymbols(tickers.index, src='yahoo', from = "2004-03-26", to = "2012-10-10", env = myEnv, adjust=TRUE)
index <- do.call(merge, c(eapply(myEnv, Ad), all=TRUE))
index <-na.locf(index)
#Calculate daily returns for all indices and convert to arithmetic returns
index.ret <- exp(CalculateReturns(index,method="compound")) - 1
index.ret[1,] <- 0
#Calculate realized vol for all the indices
index.realized <- xts(apply(index.ret,2,runSD,n=20), index(index.ret))*sqrt(252)
index.realized[1:19,] <- 1
What I would like to do now is to calculate a series of Put prices with the function EuropeanOption for every index, every day with the following parameters:
Underlying Price - Today's close from the index XTS
Strike Price - Yesterday's close from the index XTS
Implied Vol - Yesterday's realized vol from the index.realized XTS
All other parameters will just be constants
I have tried to implement this with various attempts using apply and etc but couldn't get it to work. I don't have to use the RQuantLib - if other functions to calculate the price of an European option can make this easier, I am fine with it. Would greatly appreciate any help.
Thank you.
OK I got it working
puts.unwind <- mapply(EuropeanOption,"put",index,na.locf(lag(index,1),fromLast=TRUE),0,0,29/365‌​,index.realized)
puts.unwind <- xts(matrix(as.numeric(puts.unwind[1,]),nrow(index),ncol(index)),index(index))
First line calculates the puts and the second line extracts only the prices and reformats into an XTS.

Resources