I'm learning to use R for technical trading with the Technical Trading Rules (TTR) package. Assume a crypto portfolio with BTC as its reference currency. Historical hourly data (60 periods) is collected from the cryptocompare.com API and converted to a zoo object. The aim is to create a 14-period RSI for each crypto (and possibly visualize them all on one canvas). For each crypto, I expect the RSI output to be 14 NAs followed by 46 calculated values, but I'm getting 360 outputs. What am I missing here?
require(jsonlite)
require(dplyr)
require(TTR)

portfolio <- c("ETH", "XMR", "IOT")

for (i in 1:length(portfolio)) {
  hour_data <- fromJSON(paste0("https://min-api.cryptocompare.com/data/histohour?fsym=",
                               portfolio[i], "&tsym=BTC&limit=60"))
  read.zoo(hour_data$Data) %>%
    RSI(n = 14) %>%
    print()
}
Also, my time series data is in the following form (the first column is a timestamp):
close high low open volumefrom volumeto
1506031200 261.20 264.97 259.78 262.74 4427.84 1162501.8
1506034800 258.80 261.20 255.68 261.20 2841.67 735725.4
Does TTR use more conventional OHLC (open, high, low, close) order?
The RSI() function expects a univariate price series. You passed it an object with six columns, so it coerced that object into one long vector, which is why you get 60 × 6 = 360 values. You need to subset the output of read.zoo() so that only the "close" column is passed to RSI().
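A minimal sketch of the fix, keeping the loop body above (the "close" column name comes from your printed data):

prices <- read.zoo(hour_data$Data)
# pass only the close column, so RSI() sees a univariate series:
# 14 leading NAs followed by the calculated values
prices$close %>%
  RSI(n = 14) %>%
  print()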
Related
I have the following dataset, which I want to turn into a time series object for auto.arima forecasting:
head(df)
total_score return
1539 121.77
1074 422.18
901 -229.79
843 96.30
1101 -55.25
961 -48.28
This data set consists of 13104 rows, each row representing the sentiment score of tweets and the BTC return on an hourly basis; i.e., the first row is 2021-01-01 00:00, the second row is 2021-01-01 01:00, and so on up until 2022-06-30 23:00. I have looked up how many hours fit in this range and that is 13103. How can I set up my ts object so that I can use it for forecasting with auto.arima in R?
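A minimal sketch of one way to do this, assuming hourly data with a daily seasonal cycle (frequency = 24) and that the return column is the series to forecast (both are assumptions, not stated in the question):

library(forecast)
# hourly observations; frequency = 24 treats one day as a seasonal cycle
y <- ts(df$return, frequency = 24)
fit <- auto.arima(y)
forecast(fit, h = 24)  # forecast the next 24 hours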
Moreover, I understand that auto.arima assumes homoscedastic errors, whereas I need it to work with heteroscedastic errors. I also read that for this I might use a GARCH model. However, if my auto.arima call results in an order of (2,0,0), does this mean that my GARCH model should be a (0,0,2)?
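For what it's worth, the ARIMA order describes the mean equation, not the variance equation, so an ARIMA(2,0,0) mean does not translate into a GARCH(0,0,2); the GARCH order is chosen separately. A sketch using the rugarch package, where the GARCH(1,1) variance is an assumed common starting point rather than anything implied by (2,0,0):

library(rugarch)
spec <- ugarchspec(
  mean.model     = list(armaOrder = c(2, 0)),                    # AR(2) mean, as per auto.arima
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1))  # assumed starting point
)
fit <- ugarchfit(spec, data = df$return)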
PS: I am still confused about why my data seems to be stationary; I was under the impression that cryptocurrencies are most likely NOT stationary, that is, the returns as well. But that is something for another time.
I am learning time series analysis with R and came across these two functions. I understand that the output of both is periodic data defined by the frequency of the period, and the only difference I can see is the OHLC output option in to.period().
Other than the OHLC output, when should each of these functions be used?
to.period and all the to.minutes, to.weekly, to.quarterly variants are indeed meant for OHLC data.
If you take the function to.period, it will take the open from the first day of the period, the close from the last day of the period, and the highest high / lowest low of the specified period. These functions work very well together with the quantmod / tidyquant / quantstrat packages. See code example 1.
If you give to.period non-OHLC data, i.e. a time series with one data column, you still get a sort of OHLC back. See code example 2.
Now period.apply is more interesting. Here you can supply your own function to be applied to the data. Especially in combination with endpoints, this can be a powerful tool for time series data if you want to aggregate with your own function over different time periods. The index is mostly specified with endpoints, since with endpoints you can create the index you need to get to higher time levels (from day to week, etc.). See code examples 3 and 4.
Remember to use matrix functions with period.apply if you have more than one column of data, since an xts object is basically a matrix plus an index. See code example 5.
More info in this DataCamp course.
library(xts)
data(sample_matrix)
# 231 daily observations from 2007-01-01 to 2007-08-19
zoo.data <- zoo(rnorm(231) + 10, as.Date(13514:13744, origin = "1970-01-01"))
# code example 1
to.quarterly(sample_matrix)
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# same as to.quarterly
to.period(sample_matrix, period = "quarters")
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# code example 2
to.period(zoo.data, period = "quarters")
zoo.data.Open zoo.data.High zoo.data.Low zoo.data.Close
2007-03-31 9.039875 11.31391 7.451139 10.35057
2007-06-30 10.834614 11.31391 7.451139 11.28427
2007-08-19 11.004465 11.31391 7.451139 11.30360
# code example 3 using base standard deviation in the chosen period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), sd)
2007-03-31 2007-06-30 2007-08-19
1.026825 1.052786 1.071758
# code example 4: self-defined function summing x + x over the period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), function(x) sum(x + x) )
2007-03-31 2007-06-30 2007-08-19
1798.7240 1812.4736 993.5729
# code example 5
period.apply(sample_matrix, endpoints(sample_matrix, on = "quarters"), colMeans)
Open High Low Close
2007-03-31 50.15493 50.24838 50.05231 50.14677
2007-06-30 48.47278 48.56691 48.36606 48.45318
I have a data frame named cut, like this:
date Open AAPL.High ABT.High ACN.High
0007-01-04 -0.10089875 -0.730226369 1.524297464 0.619697524
0007-01-05 0.122753233 0.244001748 -0.478851673 -0.646728204
0007-01-08 -0.64223987 0.405351183 0.23971246 1.048819618
0007-01-09 0.253708795 7.179369202 1.071651455 0.187090794
I used this formula to get monthly maximum values:
cut$date <- as.Date(cut$date, format = "%Y-%m-%d")
RMax <- aggregate(cut[, -1],
                  by = list(Month = format(cut$date, "%y-%m")),
                  FUN = max)
but now I want to find the bi-monthly, tri-monthly, or even 45-day maximum return value. How can I edit this formula?
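A minimal sketch of one way to generalize the grouping key, assuming the same cut data frame as above: base R's cut() accepts arbitrary period widths as breaks, such as "2 months", "3 months", or "45 days" (R still finds the cut function even though the data frame shares its name):

# group by 2-month bins instead of calendar months; swap in "3 months" or "45 days" as needed
RMax2 <- aggregate(cut[, -1],
                   by = list(Period = cut(cut$date, breaks = "2 months")),
                   FUN = max)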
Given a dataset of months, how do I calculate the "average" month, taking into account that months are circular?
months = c(1,1,1,2,3,5,7,9,11,12,12,12)
mean(months)
## [1] 6.333333
In this dummy example, the mean should be in January or December. I see that there are packages for circular statistics, but I'm not sure whether they suit my needs here.
I think
months <- c(1,1,1,2,3,5,7,9,11,12,12,12)
library("CircStats")
conv <- 2*pi/12 ## months -> radians
Now convert from months to radians, compute the circular mean, and convert back to months. I'm subtracting 1 here assuming that January is at "0 radians"/12 o'clock ...
(res1 <- circ.mean(conv*(months-1))/conv)
The result is -0.3457. You might want:
(res1 + 12) %% 12
which gives 11.65, i.e. partway through December (since we are still on the 0 = January, 11 = December scale).
I think this is right but haven't checked it too carefully.
For what it's worth, the CircStats::circ.mean function is very simple -- it might not be worth the overhead of loading the package if this is all you need:
function (x)
{
    sinr <- sum(sin(x))
    cosr <- sum(cos(x))
    circmean <- atan2(sinr, cosr)
    circmean
}
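If you would rather inline it, a minimal sketch on the same conv scale as above (circ_mean is just the two-line body of CircStats::circ.mean):

circ_mean <- function(x) atan2(sum(sin(x)), sum(cos(x)))
(circ_mean(conv * (months - 1)) / conv + 12) %% 12  # ~11.65, i.e. December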
Incorporating @A.Webb's clever alternative from the comments:
m <- mean(exp(conv*(months-1)*1i))
(12 + Arg(m)/conv) %% 12 ## 'direction', i.e. average month
Mod(m) ## 'intensity'
I would like to calculate the return dispersion over a long period of time. The formula I use is the formula for the equal-weighted standard deviation (see here). I tried to use the sd() and apply() functions, but it did not work.
Formula:
r(i,t) are the i = 1, ..., 4 stock returns (so n = 4) at time t
R(SMI,t) is the index at time t
i.e., for each t, take the equal-weighted standard deviation of the n stock returns around the index return.
NOVARTIS R NESTLE R ROCHE UBS GROUP
2005-07-18 1.11200510 -0.14716706 -0.4210533 -0.28876340
2005-07-19 0.23668650 -0.22115748 -0.3623192 0.67176884
2005-07-20 0.07877117 -0.44378771 4.0313698 -0.47844392
2005-07-21 -0.55270571 -0.37133351 -0.8754068 0.28604262
2005-07-22 -0.23781224 -0.07443246 0.2926546 0.00000000
2005-07-25 0.23781224 0.74184316 0.4082829 -0.09525666
This is my index
SMI
2005-07-18 -0.01077012
2005-07-19 0.53767147
2005-07-20 -0.02208674
2005-07-21 -0.10192245
2005-07-22 0.01653908
2005-07-25 0.03050783
Now I want to calculate the RD for every time t, so that I get a time series of all RDs.
What functions, loops, or other techniques should I look at? I do not want to do it by hand because the formula may be applied to bigger datasets.
I made up my own sample data because it was easier, but I think this is what you're after. It uses data.table and reshape2 for the heavy lifting.
library(data.table)
library(reshape2)
# make fake data
set.seed(100)
rit <- data.table(dATE = as.POSIXct('2005-07-18') + (60*60*24*0:5),
                  stock1 = runif(6, -1, 1),
                  stock2 = runif(6, -1, 1),
                  stock3 = runif(6, -1, 1),
                  stock4 = runif(6, -1, 1))
smi <- data.table(dATE = as.POSIXct('2005-07-18') + (60*60*24*0:5),
                  smi = runif(6, -1, 1))
# to convert from a matrix-like object
# (I can't quickly figure out how to pull POSIXct out of a ts object,
#  so the dates are hard coded but it will still work)
rit <- data.table(your_rit_object)
rit[, dATE := seq(from = as.POSIXct('2005-07-18'), to = as.POSIXct('2005-07-25'), by = 'days')]
smi <- data.table(your_smi_object)
smi[, dATE := seq(from = as.POSIXct('2005-07-18'), to = as.POSIXct('2005-07-25'), by = 'days')]
# melt table from wide to long
ritmelt <- melt(rit, id.vars = "dATE")
# combine with smi table
ritmeltsmi <- merge(ritmelt, smi, by = 'dATE')
# implement formula
ritmeltsmi[, sqrt(sum((value - smi)^2))/.N, by = dATE]
# if you want to name the new column you could do this instead
# ritmeltsmi[, list(RD = sqrt(sum((value - smi)^2))/.N), by = dATE]
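For comparison, a minimal base-R sketch of the same per-date computation, assuming ret is a numeric matrix of the four return columns and smi_vec is the matching index vector (both names are assumptions):

# ret - smi_vec recycles smi_vec down each column, i.e. subtracts the index per date
RD <- sqrt(rowSums((ret - smi_vec)^2)) / ncol(ret)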