date format in aggregate command - r

I have a data frame named cut like this
date Open AAPL.High ABT.High ACN.High
0007-01-04 -0.10089875 -0.730226369 1.524297464 0.619697524
0007-01-05 0.122753233 0.244001748 -0.478851673 -0.646728204
0007-01-08 -0.64223987 0.405351183 0.23971246 1.048819618
0007-01-09 0.253708795 7.179369202 1.071651455 0.187090794
I used this formula to get monthly maximum values
cut$date <- as.Date(cut$date,format="%Y-%m-%d")
RMax <- aggregate(cut[,-1],
                  by=list(Month=format(cut$date,"%y-%m")),
                  FUN=max)
but now I want to find bi-monthly, tri-monthly, or even 45-day maximum return values. How can I edit this formula?
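One hedged sketch of an approach (not taken from a posted answer): the Date method of base R's cut() accepts breaks such as "2 months", "3 months" or "45 days", so the same aggregate() call can group on such windows instead of a formatted month string.
# sketch; the data frame from the question is renamed to `prices`
# so that it does not clash with base::cut
prices <- cut
prices$date <- as.Date(prices$date, format="%Y-%m-%d")
# "2 months" = bi-monthly, "3 months" = tri-monthly, "45 days" = 45-day windows
RMax45 <- aggregate(prices[,-1],
                    by = list(Window = cut(prices$date, breaks = "45 days")),
                    FUN = max)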

Backtesting in R for time series

I am new to the backtesting methodology, i.e. using an algorithm to assess whether something works based on historical data. Since I am new to this, I am trying to keep things simple in order to understand it. So far I have understood the following: let's say I have a time series data set:
date = seq(as.Date("2000/1/1"),as.Date("2001/1/31"), by = "day")
n = length(date);n
class(date)
y = rnorm(n)
data = data.frame(date,y)
I will keep the first 365 days as the in-sample period, do something with them, and then update them with one observation at a time for the next month. Am I correct here?
So, if I am correct, I define the in-sample and out-of-sample periods.
T = dim(data)[1];T
outofsampleperiod = 31
initialsample = T-outofsampleperiod
I want, for example, to find the alpha = 0.01 quantile of the empirical data.
pre = data[1:initialsample,]
ypre = pre$y
quantile(ypre,0.01)
1%
-2.50478
Now the difficult part for me is updating this in a for loop in R.
I want to add one observation at a time and find the empirical alpha = 0.01 quantile again, print them all, and check whether each one is greater than the in-sample quantile obtained previously.
for (i in 1:outofsampleperiod){
qnew = quantile(data$y[1:(initialsample+i-1)],0.01)
print(qnew)
}
You can create a little function that gets the quantile of column y over rows 1 to i of a data frame df, like this:
func <- function(i,df) quantile(df[1:i,"y"],.01)
Then apply this function to each row of data
data$qnew = lapply(1:nrow(data),func,df=data)
Output (last six rows)
> tail(data)
date y qnew
392 2001-01-26 1.3505147 -2.253655
393 2001-01-27 -0.5096840 -2.253337
394 2001-01-28 -0.6865489 -2.253019
395 2001-01-29 1.0881961 -2.252701
396 2001-01-30 0.1754646 -2.252383
397 2001-01-31 0.5929567 -2.252065
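A small follow-up note (not part of the original answer): lapply() stores qnew as a list column; if a plain numeric column is preferred, sapply() works the same way:
data$qnew <- sapply(1:nrow(data), func, df = data)   # numeric vector instead of a list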

XTS:: Help me on the usage & differences between period.apply() & to.period()

I am learning time series analysis with R and came across these two functions. I understand that the output of both is periodic data defined by the frequency of the period, and the only difference I can see is the OHLC output option in to.period().
Other than the OHLC output, when should each of these functions be used?
to.period and all the to.minutes, to.weekly, to.quarterly are indeed meant for OHLC data.
If you take the function to.period it will take the open from the first day of the period, the close of the last day of the period and the highest high / lowest low of the specified period. These functions work very well together with the quantmod / tidyquant / quantstrat packages. See code example 1.
If you give to.period non-OHLC data, i.e. a time series with a single data column, you still get a sort of OHLC back. See code example 2.
Now period.apply is more interesting. Here you can supply your own functions to be applied to the data. Especially in combination with endpoints this can be a powerful function for time series data if you want to aggregate with your own function over different time periods. The index is usually specified with endpoints, since with endpoints you can create the index you need to get to higher time levels (from days to weeks, etc.). See code examples 3 and 4.
Remember to use matrix functions with period.apply if you have more than one column of data, since an xts object is basically a matrix plus an index. See code example 5.
More info in this DataCamp course.
library(xts)
data(sample_matrix)
zoo.data <- zoo(rnorm(231)+10,as.Date(13514:13744,origin="1970-01-01"))
# code example 1
to.quarterly(sample_matrix)
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# same as to.quarterly
to.period(sample_matrix, period = "quarters")
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# code example 2
to.period(zoo.data, period = "quarters")
zoo.data.Open zoo.data.High zoo.data.Low zoo.data.Close
2007-03-31 9.039875 11.31391 7.451139 10.35057
2007-06-30 10.834614 11.31391 7.451139 11.28427
2007-08-19 11.004465 11.31391 7.451139 11.30360
# code example 3 using base standard deviation in the chosen period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), sd)
2007-03-31 2007-06-30 2007-08-19
1.026825 1.052786 1.071758
# code example 4: a self-defined function summing x + x over the period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), function(x) sum(x + x) )
2007-03-31 2007-06-30 2007-08-19
1798.7240 1812.4736 993.5729
# code example 5
period.apply(sample_matrix, endpoints(sample_matrix, on = "quarters"), colMeans)
Open High Low Close
2007-03-31 50.15493 50.24838 50.05231 50.14677
2007-06-30 48.47278 48.56691 48.36606 48.45318
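As an extra illustration of the endpoints() idea above (my addition, not part of the original answer), the same daily series can be aggregated at a different level, e.g. weekly, just by changing the on argument:
# weekly standard deviation of the same daily series
period.apply(zoo.data, endpoints(zoo.data, on = "weeks"), sd)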

RSI outputs in Technical Trading Rules (TTR) package

I'm learning to use R's capabilities in technical trading with the Technical Trading Rules (TTR) package. Assume a crypto portfolio with BTC as its reference currency. Historical hourly data (60 periods) is collected using the cryptocompare.com API and converted to a zoo object. The aim is to create a 14-period RSI for each crypto (and possibly visualize them all on one canvas). For each crypto, I expect the RSI output to be 14 NAs followed by 46 calculated values, but I'm getting 360 outputs. What am I missing here?
require(jsonlite)
require(dplyr)
require(TTR)
portfolio <- c("ETH", "XMR", "IOT")
for(i in 1:length(portfolio)) {
hour_data <- fromJSON(paste0("https://min-api.cryptocompare.com/data/histohour?fsym=", portfolio[i], "&tsym=BTC&limit=60", collapse = ""))
read.zoo(hour_data$Data) %>%
RSI(n = 14) %>%
print()
}
Also, my time series data is in the following form (first column timestamp):
close high low open volumefrom volumeto
1506031200 261.20 264.97 259.78 262.74 4427.84 1162501.8
1506034800 258.80 261.20 255.68 261.20 2841.67 735725.4
Does TTR expect the more conventional OHLC (open, high, low, close) column order?
The RSI() function expects a univariate price series. You passed it an object with 6 columns, so it converted that to a univariate vector. You need to subset the output of read.zoo() so that only the "close" column is passed to RSI().
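A minimal sketch of that fix (assuming, as in the sample shown above, that the close prices sit in a column named "close"):
for(i in 1:length(portfolio)) {
  hour_data <- fromJSON(paste0("https://min-api.cryptocompare.com/data/histohour?fsym=", portfolio[i], "&tsym=BTC&limit=60", collapse = ""))
  read.zoo(hour_data$Data)[, "close"] %>%   # pass only the close column to RSI()
    RSI(n = 14) %>%
    print()
}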

Return dispersion calculation in R

I would like to calculate the return dispersion over a long period of time. The formula I use is the formula for the equal-weighted standard deviation (see here). I tried to use the sd() and apply() functions, but it did not work.
Formula:
r(i,t) are the i = 1,...,4 stock returns (so n = 4) at time t
R(SMI,t) is the index return at time t
NOVARTIS R NESTLE R ROCHE UBS GROUP
2005-07-18 1.11200510 -0.14716706 -0.4210533 -0.28876340
2005-07-19 0.23668650 -0.22115748 -0.3623192 0.67176884
2005-07-20 0.07877117 -0.44378771 4.0313698 -0.47844392
2005-07-21 -0.55270571 -0.37133351 -0.8754068 0.28604262
2005-07-22 -0.23781224 -0.07443246 0.2926546 0.00000000
2005-07-25 0.23781224 0.74184316 0.4082829 -0.09525666
This is my index
SMI
2005-07-18 -0.01077012
2005-07-19 0.53767147
2005-07-20 -0.02208674
2005-07-21 -0.10192245
2005-07-22 0.01653908
2005-07-25 0.03050783
Now I want to calculate the RD for every time t, so that I get a time series of all the RDs.
What functions, loops or other techniques should I look at? I do not want to do it by hand because the formula may be applied to bigger data sets.
I made up my own sample data because it was easier but I think this is what you're after. It uses data.table and reshape2 for the heavy lifting.
library(data.table)
library(reshape2)
#make fake data
set.seed(100)
rit<-data.table(dATE=as.POSIXct('2005-07-18')+(60*60*24*0:5),
stock1=runif(6,-1,1),
stock2=runif(6,-1,1),
stock3=runif(6,-1,1),
stock4=runif(6,-1,1))
smi<-data.table(dATE=as.POSIXct('2005-07-18')+(60*60*24*0:5),smi=runif(6,-1,1))
#to convert from a matrix like object
#(I can't quickly figure out how to pull POSIXct out of ts object
#so it's hard coded dates but will still work)
rit <- data.table(your_rit_object)
rit[, dATE := seq(from=as.POSIXct('2005-07-18'), to=as.POSIXct('2005-07-25'), by='days')]
smi <- data.table(your_smi_object)
smi[, dATE := seq(from=as.POSIXct('2005-07-18'), to=as.POSIXct('2005-07-25'), by='days')]
#melt table from wide to long
ritmelt<-melt(rit,id.vars="dATE")
#combine with smi table
ritmeltsmi<-merge(ritmelt,smi,by='dATE')
#implement formula
ritmeltsmi[,sqrt(sum((value-smi)^2))/.N,by=dATE]
#if you want to name the new column you could do this instead
#ritmeltsmi[,list(RD=sqrt(sum((value-smi)^2))/.N),by=dATE]
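For comparison, a hedged base R sketch of the same row-wise calculation (assuming ret_mat is the four-column matrix of stock returns and smi_vec the aligned vector of index returns; both names are placeholders, not objects from the question):
# subtract the index return from every stock return in the same row,
# then apply the same formula as the data.table line above
dev <- sweep(as.matrix(ret_mat), 1, smi_vec, "-")
RD <- sqrt(rowSums(dev^2)) / ncol(ret_mat)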

Sum of subvectors

My vector contains the frequency per day of a certain event in a certain month.
I want to see which run of 16 days contains the highest frequency, and I would like to extract the dates which start and end it.
vector=table(date[year(date)==2001&month(date)==05])
I know how to do this, but my method is (obviously) too primitive.
max(c(sum(vector[1:16]),sum(vector[2:17]),sum(vector[3:18]),sum(vector[4:19]),sum(vector[5:20]),sum(vector[6:21]))/sum(vector))
Edit: For reproducibility the data in vector is provided in .csv form below:
"","Var1","Freq"
"1","2001-05-06",1
"2","2001-05-08",1
"3","2001-05-09",7
"4","2001-05-10",2
"5","2001-05-11",10
"6","2001-05-12",10
"7","2001-05-13",7
"8","2001-05-14",20
"9","2001-05-15",24
"10","2001-05-16",15
"11","2001-05-17",27
"12","2001-05-18",17
"13","2001-05-19",13
"14","2001-05-20",15
"15","2001-05-21",13
"16","2001-05-22",26
"17","2001-05-23",17
"18","2001-05-24",19
"19","2001-05-25",7
"20","2001-05-26",5
"21","2001-05-27",6
"22","2001-05-28",2
"23","2001-05-29",1
"24","2001-05-31",1
Assuming the data in vector is the data frame shown in your data example, something like
library(zoo)   # for rollmean()
max_start <- which.max(rollmean(vector$Freq, 16, align="left"))
date_max_start <- vector$Var1[max_start]
date_max_end <- vector$Var1[max_start + 15]   # a 16-row run starting at max_start ends 15 rows later
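rollmean() comes from the zoo package; a hedged variant using rollsum() from the same package gives the 16-row window totals directly, and picks out the same starting row because every window has the same length:
window_sums <- rollsum(vector$Freq, 16, align="left")   # total frequency over each run of 16 rows
max_start <- which.max(window_sums)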
