Calculating portfolio level returns - R

EDIT: I've found a great post by the authors of PerformanceAnalytics. This post sums up the ins and outs of cumulative portfolio returns, and as the author shows it's pretty tricky (he got it wrong at first too)! Here it is for reference: https://tradeblotter.wordpress.com/2014/09/25/aggregate-portfolio-contributions-through-time/
So I've run into a bit of a stumbling block where my two numbers should be adding up but they are not. Here's an example dataframe with stock choices and their weightings in a portfolio for context:
stock.choices stock_weights
1 GOOG 0.150
2 AMZN 0.200
3 BA 0.250
4 FB 0.225
5 AAPL 0.175
Then I'm going to use the Return.portfolio function with wealth.index = TRUE to show the return of my portfolio.
Portfolio <- merge.xts(GOOG,AMZN,BA,FB,AAPL)
dollar_growth <- Return.portfolio(Portfolio, weights = stock_weights, wealth.index = TRUE)
I use dygraph to visualise the dollar growth.
dygraph(dollar_growth, main = "Portfolio Growth Base.$1") %>%
  dyAxis("y", label = "$") %>%
  dyAnnotation("2017-05-01", text = May1, tooltip = "Initial Investment", width = 70, height = 18, tickHeight = -75) %>%
  dyAnnotation(LastDay, text = Today, tooltip = "Percentage Increase", width = 70, attachAtBottom = TRUE) %>%
  dyAxis("y", label = "Dollars USD")
For this example I'm going to use May 1st as the initial point of investment. On this portfolio I'm getting an 11.33% return from May 1st, calculated by taking the current value ($1.37) and dividing it by the May 1st value ($1.23057).
However, when I use a different method I get a different answer, which is strange because I would have thought this second method was the accurate way of calculating the return of a portfolio.
Firstly I create a dataframe which has the stock values at May 1st and their current values. Then I multiply both by their respective weightings in the portfolio. Here's the output:
May1 Current Stock.Weights May1C CurrentC
GOOG 912.57 926.50 0.150 136.8855 138.97500
AMZN 948.23 965.90 0.200 189.6460 193.18000
BA 182.39 238.78 0.250 45.5975 59.69500
FB 152.46 170.95 0.225 34.3035 38.46375
AAPL 146.58 158.63 0.175 25.6515 27.76025
May1C = May1 * Stock.Weights | CurrentC = Current * Stock.Weights
Now when I sum both May1C and CurrentC I get:
> sum(df$May1C)
[1] 432.084
> sum(df$CurrentC)
[1] 458.074
Which I would have thought would be the current value of the portfolio, as it is the stock choices multiplied by their respective weights. This yields an increase of only 6.015%.
My question is: how is the Return.portfolio function returning an 11.3% increase, whereas the second method returns only 6.015%?
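(A quick arithmetic check of the second method, as a sketch: multiplying the prices by the weights amounts to holding a number of shares proportional to each weight, not investing that fraction of a dollar. Using the May 1st and current prices from the table above:)
w    <- c(GOOG = .150, AMZN = .200, BA = .250, FB = .225, AAPL = .175)
may1 <- c(912.57, 948.23, 182.39, 152.46, 146.58)
curr <- c(926.50, 965.90, 238.78, 170.95, 158.63)
sum(w * curr) / sum(w * may1) - 1  # price-weighted: ~6.015%, the second method above
sum(w * curr / may1) - 1           # invest each weight of $1 on May 1st: ~12.5%
Neither of these matches Return.portfolio exactly, because its weights are applied at the start of the one-year window and drift from there, which is what the edit below shows.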
Edit in reply to the comments: I have found that calling Return.portfolio with verbose = TRUE returns the stock weights changing over time. The output shows the weights drifting between the beginning of period (BOP) and end of period (EOP).
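A minimal sketch of inspecting that output (verbose = TRUE returns a list whose BOP.Weight and EOP.Weight components show the drift):
pf <- Return.portfolio(Portfolio, weights = stock_weights, verbose = TRUE)
head(pf$BOP.Weight)  # weights at the beginning of each period
head(pf$EOP.Weight)  # weights after that period's returns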
For reference, here's the complete code to run the dygraph output:
library(PerformanceAnalytics)
library(quantmod)
library(dygraphs)
library(scales)
daily_stock_returns = function(ticker) {
  # NB: the 'google' source has since been discontinued in quantmod;
  # src = 'yahoo' should work as a drop-in replacement
  symbol <- getSymbols(ticker, src = 'google', auto.assign = FALSE, warnings = FALSE)
  symbol <- xts::last(symbol, "1 year")
  data <- periodReturn(symbol, period = 'daily', type = 'log')
  colnames(data) <- as.character(ticker)
  assign(ticker, data, .GlobalEnv)
}
daily_stock_returns("GOOG")
daily_stock_returns("AMZN")
daily_stock_returns("BA")
daily_stock_returns("FB")
daily_stock_returns("AAPL")
Portfolio <- merge.xts(GOOG,AMZN,BA,FB,AAPL)
test <- periodReturn(Portfolio, period = 'monthly', type = 'log')
stock_weights <- c(.15, .20, .25, .225, .175)
dollar_growth <- Return.portfolio(Portfolio, weights = stock_weights, wealth.index = TRUE)
May1 <- as.numeric(dollar_growth["2017-05-01"])
format(round(May1, 3), nsmall = 2)
Today <- as.numeric(xts::last(dollar_growth, "1 day"))
Today <- ((Today/May1)-1) %>% percent()
format(round(May1, 3), nsmall = 2)
LastDay <- xts::last(dollar_growth, "1 day")
dygraph(dollar_growth, main = "Portfolio Growth Base.$1")

If you want to see the dollar value of the portfolio components and the total portfolio, you can do the following. Assuming what you want is investing in a portfolio on DayStart (2017-01-01) with allocation alloc (.15, .20, .25, .225, .175) and then, WITHOUT rebalancing, letting it run its course until DayEnd (2017-05-01):
initial allocation (e.g. of 1000 USD) for GOOG, AMZN, BA, FB, AAPL: 150, 200, 250, 225, 175
Taking your portfolio returns "Portfolio" (I took 'discrete' returns, not 'log'):
startCapital <- c(150, 200, 250, 225, 175)
# scale each column by its own starting capital; a plain `* startCapital`
# would recycle the vector down the rows and scramble the allocations
growth <- cumprod(1 + Portfolio["::2017-05-01"])
portDollar <- growth * matrix(startCapital, nrow(growth), ncol(growth), byrow = TRUE)
portDollar <- cbind(portDollar, portf = rowSums(portDollar))
You can now plot the portfolio value in Dollars or convert it back to returns.
both(portDollar)  # both() is a small helper along the lines of function(x) rbind(head(x, 3), tail(x, 3))
GOOG AMZN BA FB AAPL portf
2017-01-03 151.4052 248.5942 175.7486 201.4256 225.6790 1002.853
2017-01-04 202.0686 224.7743 152.2168 255.6943 175.3316 1010.086
2017-01-05 254.8609 180.1164 203.0709 233.9321 151.0465 1023.027
GOOG AMZN BA FB AAPL portf
2017-04-27 195.9950 241.4572 262.7753 190.4188 309.3954 1200.042
2017-04-28 173.9812 303.9860 206.1689 258.2377 278.1846 1220.558
2017-05-01 233.6613 280.3763 174.3678 327.5105 220.7346 1236.650
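To convert the dollar series back to returns, a minimal sketch (ROC comes with TTR, which loads alongside quantmod):
plot(portDollar$portf, main = "Portfolio value, USD")
portRet <- ROC(portDollar$portf, type = "discrete")  # simple returns from the dollar series
head(portRet)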

Related

R: How to construct an index which states for each country pair the correlation between the yes votes?

I have a dataset which states for each UN resolution the country and the vote:
ResolutionID: 1,2,3,...
Country: US, CA, MX, ...
vote: yes, no, abstain
Dataset
I want to create a variable calculating, for each country pair (e.g. US-CA, US-MX, MX-CA, ...), the correlation of their voting records, thus providing an index of each country pair's friendship or strategic alliance.
What R code do I have to use?
Citation of the dataset: Erik Voeten, "Data and Analyses of Voting in the UN General Assembly", Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013).
Using the algorithm for calculating the 'index of agreement' proposed by Lijphart (1963), the below might be what you're after.
## set up sample data
set.seed(32446)
test = data.frame(rcid = rep(seq(3), each=10), country = rep(letters[seq(10)], 3), vote = sample(c("yes","no"),30, replace = T))
test$vote[sample(seq(30),3)] = "abstain" # add in a few abstentions
test = test[-sample(seq(30), 2), ] # remove some as missing
test
## set up the comparison df
allCountries = unique(test$country)
compdf = outer(allCountries, allCountries, paste)
compdf = data.frame(compdf[which(lower.tri(compdf))])
names(compdf) = "comp"
## cycle through the resolutions - scoring as per Lijphart (1963): average a score of 1 where the two countries agree, 0 where they vote opposite ways, and 0.5 where only one of them abstains
for(r in unique(test$rcid)){
  tempVotes = data.frame(countries = allCountries,
                         test$vote[which(test$rcid==r)][match(allCountries, test$country[which(test$rcid==r)])])
  tempVotes = outer(tempVotes[,2], tempVotes[,2],
                    FUN=function(x,y){
                      ifelse(is.na(x) | is.na(y), NA,                        # NA if one country didn't vote
                             ifelse(x==y, 1,                                 # 1 if they agree
                                    ifelse(x=="abstain" | y=="abstain", 0.5, # 0.5 if one side abstains
                                           0)))                             # zero otherwise
                    })
  compdf = cbind(compdf, tempVotes[which(lower.tri(tempVotes))])
  names(compdf)[ncol(compdf)] = paste0("resolution_", r)
}
## calculate the mean score across resolutions
result = data.frame(comp = compdf$comp,
                    result = rowMeans(compdf[, seq(2, ncol(compdf))], na.rm = T))
result$result = 1-result$result # make it into a distance score, rather than agreement score
## create distance matrix and plot Dendrogram
library(ggplot2)
library(ggdendro)
distance = matrix(NA, length(allCountries), length(allCountries))
distance[which(lower.tri(distance))] = result$result
rownames(distance) = allCountries; colnames(distance) = allCountries
distance
cluster = hclust(as.dist(distance))
ggdendrogram(cluster, rotate = FALSE, size = 2)
Lijphart, A. (1963). The Analysis of Bloc Voting in the General Assembly: A Critique and a Proposal. The American Political Science Review, Vol. 57, No. 4, pp. 902-917. https://www.jstor.org/stable/1952608

Best function for modelling diminishing returns

I am visiting a bird sanctuary that has many different species of birds. Some species are more numerous, while other species are less numerous. I came back to the sanctuary 9 times, and after every visit I calculated the total number of species I had observed. Unsurprisingly, there are diminishing returns across my visits, since I observe the most numerous species on every visit, which does not increase the count of observed species. What is the best function in R to predict how many species I will have observed by my 20th visit?
Here is the data.frame
d <- structure(list(visit = 1:9,
                    totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14,
                                                533.233, 576.595, 616.536, 654)),
               class = "data.frame", row.names = c(NA, 9L))
I expect to see a model that fits the data well and behaves in a "log-like" fashion, predicting diminishing returns.
In order to best ask a question, Stack Overflow has some good links: https://stackoverflow.com/help/how-to-ask
If you're trying to model this, I might take the approach of regressing on the square root of the independent variable, based on the data. It's kind of strange to think about it as a function of visits, though... maybe if they were evenly spaced time periods it would make more sense.
d <- structure(list(visit = 1:9,
                    totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14,
                                                533.233, 576.595, 616.536, 654)),
               class = "data.frame", row.names = c(NA, 9L))
mod <- lm(totalNumSpeciesObserved ~ I(sqrt(visit)), d)
new.df <- data.frame(visit=1:13)
out <- predict(mod, newdata = new.df)
plot(d, type = 'o',pch = 16, xlim = c(1,13), ylim = c(200,800), lwd = 2, cex = 2)
points(out, type= 'o', pch = 21, col = "blue", cex = 2)
The I() wrapper allows you to transform the independent variable on the fly, hence the use of sqrt() without needing to save a new variable.
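If you want a curve with a true asymptote rather than one that keeps growing like sqrt(), one hedged alternative is a self-starting saturating model, e.g. the Michaelis-Menten form that ships with base R (a sketch, not a definitive model choice):
mod2 <- nls(totalNumSpeciesObserved ~ SSmicmen(visit, Vm, K), data = d)
predict(mod2, newdata = data.frame(visit = 20))  # predicted species count at visit 20
Here Vm is the asymptotic species count and K is the visit at which you reach half of it.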
I also don't know if this helps, but you could build a simulator to test for asymptotic behaviour. For example, you could build a population:
population <- sample(size = 1e6, LETTERS[1:20],
replace = TRUE, prob = 1/(2:21)^2)
This would say there are 20 species and decreasing probability in your population (expand as you wish).
Then you could simulate visits and information about each visit. For example, how large is the sample of your visit? During a visit you only see 1% of the rainforest, etc.
sim_visits <- function(visits, percent_obs, population){
  species_viewed <- vector()
  unique_views <- vector()
  for(i in 1:visits){
    my_samp <- sample(x = population, size = round(percent_obs*length(population), 0),
                      replace = FALSE)
    species_viewed <- c(species_viewed, my_samp)
    unique_views[i] <- length(unique(species_viewed))
  }
  new_observed <- unique_views - dplyr::lag(unique_views, 1, 0)
  df <- data.frame(unique_views = unique_views, new_observed)
  df$cumulative <- cumsum(unique_views)
  df
}
And then you could draw from the simulation many times and see what distribution of values you get.
sim_visits(9, percent_obs = .001, population = population)
unique_views new_observed cumulative
1 13 13 13
2 15 2 28
3 15 0 43
4 17 2 60
5 17 0 77
6 17 0 94
7 17 0 111
8 17 0 128
9 17 0 145
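To actually draw from the simulation many times, a minimal sketch (the run count of 500 is arbitrary):
runs <- replicate(500, sim_visits(9, percent_obs = .001, population = population)$unique_views[9])
hist(runs, main = "Unique species seen after 9 visits", xlab = "Species count")
quantile(runs, c(.025, .5, .975))  # a rough interval for the 9th-visit count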
And I don't know if this is helpful, but I find simulation a good way to conceptualise problems like these.

Generate multiple plots in base R with loop function then concatenate by matching group variables

I have a data frame (below; my apologies for the verbose code, this is my first attempt at generating reproducible random data) that I'd like to loop through to generate individual plots in base R (specifically, ethograms) for each subject's day and video clip (e.g. subj-1/day1/clipB). After generating the n graphs, I'd like to concatenate a PDF for each subject that includes all days + clips, with each row corresponding to a single day. I haven't been able to get past generating the individual graphs, however, so any help would be greatly appreciated!
Data frame
n <- 20000
library(stringi)
test <- as.data.frame(sprintf("%s", stri_rand_strings(n, 2, '[A-Z]')))
colnames(test)<-c("Subj")
test$Day <- sample(1:3, size=length(test$Subj), replace=TRUE)
test$Time <- sample(0:600, size=length(test$Subj), replace=TRUE)
test$Behavior <- as.factor(sample(c("peck", "eat", "drink", "fly", "sleep"), size = length(test$Time), replace=TRUE))
test$Vid_Clip <- sample(c("Clip_A", "Clip_B", "Clip_C"), size = length(test$Time), replace=TRUE)
Sample data from data frame:
> head(test)
Subj Day Time Behavior Vid_Clip
1 BX 1 257 drink Clip_B
2 NP 2 206 sleep Clip_B
3 ZF 1 278 peck Clip_B
4 MF 2 391 sleep Clip_A
5 VE 1 253 fly Clip_C
6 ID 2 359 eat Clip_C
After adapting this code, I am able to successfully generate a single plot (one at a time):
Subset single subj/day/clip:
single_subj_day_clip <- test[test$Vid_Clip == "Clip_B" & test$Subj == "AA" & test$Day == 1,]
After which, I can generate the graph I'm after by running the following lines:
beh_numb <- nlevels(single_subj_day_clip$Behavior)
mar.default <- c(5,4,4,2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
plot(single_subj_day_clip$Time,
     xlim=c(0, max(single_subj_day_clip$Time)), ylim=c(0, beh_numb), type="n",
     ann=F, yaxt="n", frame.plot=F)
for (i in 1:length(single_subj_day_clip$Behavior)) {
  ytop <- as.numeric(single_subj_day_clip$Behavior[i])
  ybottom <- ytop - 0.5
  rect(xleft=single_subj_day_clip$Subj[i], xright=single_subj_day_clip$Time[i+1],
       ybottom=ybottom, ytop=ytop, col = ybottom)
}
axis(side=2, at = (1:beh_numb - 0.25), labels=levels(single_subj_day_clip$Behavior), las = 1)
mtext(text="Time (sec)", side=1, line=3, las=1)
Example graph from randomly generated data (sorry for the link; as a new SO user I can't embed an image directly until I'm at 10 reputation points)
Example graph from actual data
Ideal per subject graph
Thank you all in advance for your input.
Cheers,
Dan
New and hopefully correct answer
The code is too long to post here, so here is a link to the Dropbox folder with the data and code. You can check this html document or run this .Rmd file on your machine. Please check that all required packages are installed. The output of the script is there as well.
There is an additional problem in the analysis: some events are registered only once, at a single time point between other events, so such bars have no "width". I assigned a width of 1000 ms to these events, so some of them (around 100 per 20000 observations) run off the scale if they occur at the very beginning or end of the experiment (where a zero width would keep them inside it). You can play with the code to change this behavior.
Another problem is that the same factors get different colors on different plots. I need some fresh air to fix that as well.
Looking at the graphs, you may notice that some observations with a very short time seem to overlap with other observations. But if you zoom the PDF to the maximum, you will see that they do not, and that there are 'holes' in the underlying intervals where they are supposed to be.
Lines connecting the intervals for the different kinds of behavior help to follow the timecourse of the experiment. You can uncomment the corresponding parts of the code if you wish.
Please let me know if it works.
Old answer
I am not sure it is the best way to do it, but you can probably use split() and then lapply through your tables:
Split your data.frame by Subj, Day, and Vid_Clip:
testl <- split(test, test[, c(1, 2, 5)], drop = T)
testl[[1123]]
# Subj Day Time Behavior Vid_Clip
#8220 ST 2 303 fly Clip_A
#9466 ST 2 463 fly Clip_A
#9604 ST 2 32 peck Clip_A
#10659 ST 2 136 peck Clip_A
#13126 ST 2 47 fly Clip_A
#14458 ST 2 544 peck Clip_A
Loop through the list with your data and plot to .pdf:
mar.default <- c(5,4,4,2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
nbeh = nlevels(test$Behavior)
pdf("plots.pdf")
invisible(
  lapply(testl, function(l){
    plot(x = l$Time, xlim = c(0, max(l$Time)), ylim = c(0, nbeh),
         type = "n", ann = F, yaxt = "n", frame.plot = F)
    lapply(1:nbeh, function(i){
      ytop <- as.numeric(l$Behavior[i]); ybot <- ytop - .5
      rect(l$Subj[i], ybot, l$Time[i + 1], ytop, col = ybot)
    })
    axis(side = 2, at = 1:nbeh - .25, labels = levels(l$Behavior), las = 1)
    mtext(text = "Time (sec)", side = 1, line = 3, las = 1)
  })
)
dev.off()
You should probably check the output here before you run the code on your PC. I didn't edit your plot code much, so please check it twice.
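For the per-subject PDFs with one row per day, a rough sketch under stated assumptions (3 days and 3 clips per subject; plot_one() is a hypothetical stand-in for the plotting function used in the lapply above):
for (s in unique(test$Subj)) {
  pdf(paste0("plots_", s, ".pdf"), width = 9, height = 9)
  par(mfrow = c(3, 3))  # one row per day, one column per clip
  lapply(testl[grep(paste0("^", s, "\\."), names(testl))], plot_one)  # plot_one() = your plotting code
  dev.off()
}
The names of the split list look like "AA.1.Clip_A", so the grep() picks out all day/clip combinations for a single subject.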

Renko Chart in R

I am trying to construct a Renko chart using data obtained from Yahoo Finance and was wondering if there is any package to do so. I had a look at most of the financial packages but was only able to find candlestick charts.
For more information on Renko charts use the link given here
Really cool question! Apparently, there is really nothing of that sort available for R. There were some attempts to do similar things (e.g., waterfall charts) on various sites, but they all don't quite hit the spot. Soooo... I made a little weekend project out of it with data.table and ggplot.
rrenko
There are still bugs, instabilities, and visual things that I would love to optimize (and the code is full of commented out debug notes), but the main idea should be there. Open for feedback and points for improvement.
Caveats: There are still cases where the data transformation screws up, especially if the size is very small or very large. This should be fixable in the near future. Also, the renko() function at the moment expects a dataframe with two columns: date (x-axis) and close (y-axis).
Installation
devtools::install_github("RomanAbashin/rrenko")
library(rrenko)
Code
renko(df, size = 5, style = "modern") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
renko(df, size = 5, style = "classic") +
scale_y_continuous(breaks = seq(0, 150, 10)) +
labs(x = "", y = "")
Data
set.seed(1702)
df <- data.frame(date = seq.Date(as.Date("2014-05-02"), as.Date("2018-05-04"), by = "week"),
close = abs(100 + cumsum(sample(seq(-4.9, 4.9, 0.1), 210, replace = TRUE))))
> head(df)
date close
1: 2014-05-02 104.0
2: 2014-05-09 108.7
3: 2014-05-16 111.5
4: 2014-05-23 110.3
5: 2014-05-30 108.9
6: 2014-06-06 106.5
I'm an R investment developer; I used some parts of Roman's code to optimize some lines of my Renko code. Roman's ggplot skills are awesome. The plot function was only possible because of Roman's code.
If someone is interested:
https://github.com/Kinzel/k_rrenko
It will need the packages xts, ggplot2 and data.table.
"Ativo" needs to be an xts object with one of its columns named "close" to work.
EDIT:
Following TeeKea's request, here's how to use it. It's simple:
"Ativo" is a EURUSD 15-min xts from 2015-01-01 to 2015-06-01. If a "close" column is not found, the last column will be used.
> head(Ativo)
Open High Low Close
2015-01-01 20:00:00 1.20965 1.21022 1.20959 1.21006
2015-01-01 20:15:00 1.21004 1.21004 1.20979 1.21003
2015-01-01 20:30:00 1.21033 1.21041 1.20982 1.21007
2015-01-01 20:45:00 1.21006 1.21007 1.20978 1.21002
2015-01-01 21:00:00 1.21000 1.21002 1.20983 1.21002
2015-01-02 00:00:00 1.21037 1.21063 1.21024 1.21037
How to use krenko_plot:
krenko_plot(Ativo, 0.01,withDates = F)
Link to image krenko_plot
Compared to plot.xts
plot.xts(Ativo, type='candles')
Link to image plot.xts
There are two main variables: size and threshold.
"size" is the size of the bricks and is required to run.
"threshold" is the threshold for a new brick. The default is 1.
The first brick is removed to ensure reliability.
Here's a quick and dirty solution, adapted from a Python script here.
# Get some test data
library(rvest)
url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20170602&end=20181126")
df <- url %>% html_table() %>% as.data.frame()
# Make sure to have your time sequence the right way up
data <- apply(df[nrow(df):1, 3:4], 1, mean)
# Build the renko function
renko <- function(data, delta){
  pre <- data[1]
  xpos <- NULL
  ypos <- NULL
  xneg <- NULL
  yneg <- NULL
  for(i in 1:length(data)){
    increment <- data[i] - pre
    incrementPerc <- increment / pre
    pre <- data[i]
    if(incrementPerc > delta){
      xpos <- c(xpos, i)
      ypos <- c(ypos, data[i])
    }
    if(incrementPerc < -delta){
      xneg <- c(xneg, i)
      yneg <- c(yneg, data[i])
    }
  }
  signal <- list(xpos = xpos,
                 ypos = unname(ypos),
                 xneg = xneg,
                 yneg = unname(yneg))
  return(signal)
}
# Apply the renko function and plot the outcome
signals <- renko(data = data, delta = 0.05)
plot(1:length(data), data, type = "l")
points(signals$xneg, signals$yneg, col = "red", pch = 19)
points(signals$xpos, signals$ypos, col = "yellowgreen", pch = 19)
NOTE: This is not a Renko chart (thanks to @Roman). Only buy and sell signals are displayed. See the reference mentioned above...

Trying to program trading signals in R

I am new to R and am trying to program a pairs trading strategy in R.
I have already written the code for downloading the data, and have created additional columns and prepared the data. Now I need to calculate the trading signals.
My signal rules are as follows.
- If the Z-score is greater than 2.25, sell the pair; buy back when the Z-score is less than 0.25.
- If the Z-score is less than -2.25, buy the pair; sell (exit) when the Z-score is above -0.25.
- Close any open position if there is a change in signal.
When we sell a pair, we sell the first stock and buy the second stock. In this case, we sell ACC and Buy Ambujacem.
When we buy a pair, we buy the first stock and sell the second stock. In this case, we buy ACC and Sell Ambujacem.
Could anyone help me with the coding of the trading signals?
Enclosing the code.
Regards,
Subash
# Trading Code
library(quantmod)
getSymbols("ACC.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
getSymbols("AMBUJACEM.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
acc=ACC.NS[,6]
amb=AMBUJACEM.NS[,6]
t.zoo <- merge(acc, amb, all=TRUE)
t.zoo=as.data.frame(t.zoo)
typeof(t.zoo)
t.zoo=na.omit(t.zoo)
#adding columns
t.zoo$spread <- 0
t.zoo$adfTest <- 0
t.zoo$mean <- 0
t.zoo$stdev <- 0
t.zoo$zScore <- 0
t.zoo$signal <- 0
t.zoo$BuyPrice <- 0
t.zoo$SellPrice <- 0
t.zoo$LongReturn <- 0
t.zoo$ShortReturn <- 0
t.zoo$Slippage <- 0
t.zoo$TotalReturn <- 0
#preparing the data
#Calculating the pair ratio
t.zoo$pairRatio <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculate the log prices of the two time series
t.zoo$LogA <- log10(t.zoo$ACC.NS.Adjusted)
t.zoo$LogB <- log10(t.zoo$AMBUJACEM.NS.Adjusted)
#Calculating the spread
t.zoo$spread <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculating the mean
# Computes the mean using the SMA function
# choose the number of days for calculating the mean
SMAdays = 20
t.zoo$mean <- SMA(t.zoo$spread,SMAdays)
#Calculating the Std Deviation
t.zoo$stdev <- rollapply(t.zoo$spread,20,sd, fill=NA, align='right')
#Calculating the Z Score
# (nb: this divides by the spread rather than the rolling stdev computed above,
#  which is likely why the values are so small; the answer below scales them by 100)
t.zoo$zScore <- (t.zoo$pairRatio - t.zoo$mean)/t.zoo$spread
View(t.zoo)
#Calculation of trading signals and trading prices
#Trigger sell or buy signal if Z Score moves above 2.25 or below -2.25.
# Close position if Z Score reaches 0.2 or -0.2.
# close any open position if there is a change in signal.
I think the main issue was to come up with trading signals for a strategy that depends not only on the current level of indicator but also on the direction from which the indicator is crossed.
There were a number of problems with the code posted in the comments, including the use of a single = for comparisons, so I've worked it afresh.
Here's my attempt at solving this. It seems to be fine. I've added some plotting code to eyeball the results; I suggest you check the result over different periods.
This code comes after the one in the original question. The only difference is that I have kept t.zoo as an xts/zoo object and not converted it to a data.frame. Also, I've multiplied the zScores by 100.
It generates trigger dates and also a column depicting the state of the strategy. Calculating returns would be easy from there.
colnames(t.zoo)
#t.zoo must be an xts object
#working on a separate xts object
sigs<- t.zoo[, c("ACC.NS.Adjusted", "AMBUJACEM.NS.Adjusted" , "zScore")]
# creating my own triggers as there are not enough good values
# buyTrig<- mean(t.zoo$zScore ,na.rm = T) - 1*sd(t.zoo$zScore ,na.rm = T)
# sellTrig<- (-1) * buyTrig
# sqOffTrig<- mean(t.zoo$zScore ,na.rm = T) - 0.5*sd(t.zoo$zScore ,na.rm = T)
# Another approach: scaling tz.zoo to fit your criterion
sigs$zScore<- sigs$zScore*100
buyTrig<- (-2.25)
sellTrig<- (-1) * buyTrig
sqOffTrig<- 0.25
cat ( buyTrig, sellTrig , sqOffTrig)
hist(sigs$zScore, breaks = 40)
abline(v=c(buyTrig,sellTrig), col="red")
abline(v=c(-sqOffTrig, sqOffTrig), col="green")
sum(sigs$zScore >= -sqOffTrig & sigs$zScore<= sqOffTrig , na.rm = T) # 139
sigs$action<- 0
sigs$mode <- NA
sigs$zLag<- lag.xts(sigs$zScore,1)
sigs[19:22,]
#these are not the real trigger dates, but they will serve our purpose
# along with na.locf
buyTrigDays<- time(sigs[sigs$zScore<= buyTrig & sigs$zLag > buyTrig, ])
sellTrigDays<- time(sigs[sigs$zScore>= sellTrig & sigs$zLag < sellTrig, ])
#square offs
buySqOffDays<- time( sigs[sigs$zScore>= (-1*sqOffTrig) & sigs$zLag < (-1*sqOffTrig), ] )
buySqOffDays
sellSqOffDays<- time( sigs[sigs$zScore<= (sqOffTrig) & sigs$zLag > (sqOffTrig), ] )
sellSqOffDays
sigs$mode[buyTrigDays]=1 ; sigs$mode[sellTrigDays]= -1;
sigs$mode[buySqOffDays]=0 ; sigs$mode[sellSqOffDays]= 0;
sigs$mode
# use local fill to repeat these triggered position into future
# till you meet another non NA value
sigs$mode<- na.locf(sigs$mode, fromLast = F)
plot((sigs$zScore["2015"] ))
points(sigs$zScore[sigs$mode==1], col="red", on=1, pch = 19)
points(sigs$zScore[sigs$mode==-1], col="green", on=1 , pch = 19)
points(sigs$zScore[sigs$mode==0], col="blue", on=1)
sum(is.na(sigs$mode))
#now to get the real dates when square off is triggered
trigdays<- time( sigs[diff(sigs$mode,1) != 0, ] ) #when the value changes
squareOffTrigger_real<- time(sigs[sigs$mode==0][trigdays])
buyTrigger_real<- time(sigs[sigs$mode==1] [trigdays])
sellTrigger_real<- time(sigs[sigs$mode==-1][trigdays])
#check
length(sellTrigger_real) + length(buyTrigger_real) == length(squareOffTrigger_real)
plot(sigs$zScore["2015"])
points(sigs$zScore[buyTrigger_real] , col="blue", pch = 19, on=1)
points(sigs$zScore[sellTrigger_real] , col="red", pch = 19, on=1)
points(sigs$zScore[squareOffTrigger_real] , col="green", pch = 19, on=1)
abline(h=c(-sqOffTrig, sqOffTrig) , col= "green" )
# further calculations can be easily made using either the mode
# column or the trigger dates computed at the end
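For instance, a minimal sketch of turning the mode column into strategy returns (my assumption: mode 1 means long ACC / short AMBUJACEM with equal notional, -1 the reverse; costs and slippage ignored):
accRet <- ROC(sigs$ACC.NS.Adjusted, type = "discrete")       # ROC ships with TTR/quantmod
ambRet <- ROC(sigs$AMBUJACEM.NS.Adjusted, type = "discrete")
stratRet <- lag.xts(sigs$mode, 1) * (accRet - ambRet)         # position held from previous close
cumRet <- cumprod(1 + na.omit(stratRet)) - 1
plot(cumRet)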