Calculate 12-month cumulative returns in R

I have a time series dataset, and I'd like to get rolling 12-month cumulative returns. Below is what my code and data look like:
df <- data.frame(
  A = c(-0.0195, 0.0079, 0.0034, 0.0394, -0.0065, 0.0034, 0.0136, 0.0683, -0.0063, -0.0537, -0.0216, -0.0036, 0.0659, -0.0377, -0.0568, 0.0039, -0.0191, 0.0028),
  B = c(-0.0211, 0.0021, 0.0014, 0.0358, 0.0009, 0.0153, 0.0071, 0.0658, 0.0033, -0.0542, -0.0261, 0.0064, 0.0665, -0.0304, -0.0507, 0.0089, NA, NA),
  C = c(-0.0176, 0.0144, 0.0057, 0.0442, -0.0152, -0.0105, 0.0213, 0.0712, -0.0176, -0.0531, -0.0163, -0.0154, 0.0652, NA, NA, NA, NA, NA)
)
row.names(df) <- c("2016-10-31", "2016-09-30", "2016-08-31", "2016-07-31", "2016-06-30", "2016-05-31", "2016-04-30", "2016-03-31", "2016-02-29", "2016-01-31", "2015-12-31", "2015-11-30", "2015-10-31", "2015-09-30", "2015-08-31", "2015-07-31", "2015-06-30", "2015-05-31")
> df
A B C
2016-10-31 -0.0195 -0.0211 -0.0176
2016-09-30 0.0079 0.0021 0.0144
2016-08-31 0.0034 0.0014 0.0057
2016-07-31 0.0394 0.0358 0.0442
2016-06-30 -0.0065 0.0009 -0.0152
2016-05-31 0.0034 0.0153 -0.0105
2016-04-30 0.0136 0.0071 0.0213
2016-03-31 0.0683 0.0658 0.0712
2016-02-29 -0.0063 0.0033 -0.0176
2016-01-31 -0.0537 -0.0542 -0.0531
2015-12-31 -0.0216 -0.0261 -0.0163
2015-11-30 -0.0036 0.0064 -0.0154
2015-10-31 0.0659 0.0665 0.0652
2015-09-30 -0.0377 -0.0304 NA
2015-08-31 -0.0568 -0.0507 NA
2015-07-31 0.0039 0.0089 NA
2015-06-30 -0.0191 NA NA
2015-05-31 0.0028 NA NA
I want to get 12-month cumulative returns for each month using the formula prod(1+R)-1 (the product of 1 plus each of the 12 individual monthly returns, minus 1). The results should be:
A(1-Y) B(1-Y) C(1-Y)
2016-10-31 0.0198 0.0322 0.0052
2016-09-30 0.1086 0.1246 0.0898
2016-08-31 0.0585 0.0881
2016-07-31 -0.0050 0.0316
2016-06-30 -0.0390 0.0048
2016-05-31 -0.0512
2016-04-30 -0.0517
B(1-Y) only has 5 cumulative returns because B has no data prior to 2015-07-31, so the window ending 2016-05-31 does not satisfy the 12-month condition (there are only 11 monthly returns between 2015-07-31 and 2016-05-31).
I have tried Return.cumulative(df), but that function gives cumulative returns since inception, which is not what I am looking for. Any suggestions would be appreciated!

As A.Webb mentions, rollapply gives the desired result (a sketch follows the loop below). You can also use a for loop:
a <- b <- c <- 0
for (i in (nrow(df) - 11):1) {
  a[i] <- prod(1 + df$A[i:(i + 11)]) - 1
  b[i] <- prod(1 + df$B[i:(i + 11)]) - 1
  c[i] <- prod(1 + df$C[i:(i + 11)]) - 1
}
cum.returns <- data.frame(a,b,c)
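For completeness, here is a sketch of the rollapply route (it assumes the zoo package; because df is ordered newest-first, the rows are reversed before rolling and restored afterwards):
library(zoo)
cum12 <- function(r) prod(1 + r) - 1            # 12-month compounded return
df_old_first <- df[nrow(df):1, ]                # oldest month first
roll <- as.data.frame(lapply(df_old_first, rollapply,
                             width = 12, FUN = cum12, fill = NA, align = "right"))
roll <- roll[nrow(roll):1, , drop = FALSE]      # back to newest-first order
row.names(roll) <- row.names(df)
head(roll, 7)
Windows that contain an NA (the shorter histories of B and C) simply come out as NA, matching the expected output above.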

Related

shift observation of one group into next group by id in R

Suppose I have a dataframe like so:
contracts
Dates Last.Price Last.Price.1 id carry
1 1998-11-30 94.50 98.50 QS -0.040609137
2 1998-11-30 31.32 32.13 HO -0.025210084
3 1998-12-31 95.50 98.00 QS -0.025510204
4 1998-12-31 34.00 34.28 HO -0.008168028
5 1999-01-29 100.00 100.50 QS -0.004975124
6 1999-01-29 33.16 33.42 HO -0.007779773
7 1999-02-26 100.25 100.25 QS 0.000000000
8 1999-02-26 32.29 32.37 HO -0.002471424
9 1999-02-26 10.88 11.00 CO -0.010909091
10 1999-03-31 131.50 130.75 QS 0.005736138
11 1999-03-31 44.68 44.00 HO 0.015454545
12 1999-03-31 15.24 15.16 CO 0.005277045
I want to calculate the weights of each id in each month. I have a function that does this. I use dplyr to achieve this:
library(dplyr)
library(lubridate)
contracts <- contracts %>%
mutate(Dates = ymd(Dates)) %>%
group_by(Dates) %>%
mutate(weights = weight(carry))
which gives:
contracts
Dates Last.Price Last.Price.1 id carry weights
1 1998-11-30 94.50 98.50 QS -0.040609137 0.616979910
2 1998-11-30 31.32 32.13 HO -0.025210084 0.383020090
3 1998-12-31 95.50 98.00 QS -0.025510204 0.757468623
4 1998-12-31 34.00 34.28 HO -0.008168028 0.242531377
5 1999-01-29 100.00 100.50 QS -0.004975124 0.390056023
6 1999-01-29 33.16 33.42 HO -0.007779773 0.609943977
7 1999-02-26 100.25 100.25 QS 0.000000000 NA
8 1999-02-26 32.29 32.37 HO -0.002471424 0.184703218
9 1999-02-26 10.88 11.00 CO -0.010909091 0.815296782
10 1999-03-31 131.50 130.75 QS 0.057361377 0.057361377
11 1999-03-31 44.68 44.00 HO 0.015454545 0.015454545
12 1999-03-31 15.24 15.16 CO 0.005277045 0.005277045
Now I want to lag the weights, such that the weights calculated in November are applied in December. So I essentially want to shift the weights column by group, the group being the dates, so that the values in November end up being the values in December, and so on.
Now I also want the shift to match by id, such that if a new id is included, the group where the id first appears will have an NA in the lagged column.
The desired output is given below:
desired
Dates Last.Price Last.Price.1 id carry weights w
1 1998-11-30 94.50 98.50 QS -0.040609137 0.616979910 NA
2 1998-11-30 31.32 32.13 HO -0.025210084 0.383020090 NA
3 1998-12-31 95.50 98.00 QS -0.025510204 0.757468623 0.61697991
4 1998-12-31 34.00 34.28 HO -0.008168028 0.242531377 0.38302009
5 1999-01-29 100.00 100.50 QS -0.004975124 0.390056023 0.75746862
6 1999-01-29 33.16 33.42 HO -0.007779773 0.609943977 0.24253138
7 1999-02-26 100.25 100.25 QS 0.000000000 NA 0.39005602
8 1999-02-26 32.29 32.37 HO -0.002471424 0.184703218 0.60994398
9 1999-02-26 10.88 11.00 CO -0.010909091 0.815296782 NA
10 1999-03-31 131.50 130.75 QS 0.057361377 0.057361377 NA
11 1999-03-31 44.68 44.00 HO 0.015454545 0.015454545 0.18470322
12 1999-03-31 15.24 15.16 CO 0.005277045 0.005277045 0.81529678
Take note of February 1999. CO has an NA because it first appears in February.
Now look at March 1999: CO has the value from February, and QS has an NA only because the February value was NA (due to division by zero).
Can this be done?
Data:
contracts <- read.table(text = "Dates, Last.Price, Last.Price.1, id,carry
1998-11-30, 94.500, 98.500, QS, -0.0406091371
1998-11-30, 31.320, 32.130, HO, -0.0252100840
1998-12-31, 95.500, 98.000, QS, -0.0255102041
1998-12-31, 34.000, 34.280, HO, -0.0081680280
1999-01-29, 100.000, 100.500, QS, -0.0049751244
1999-01-29, 33.160, 33.420, HO, -0.0077797726
1999-02-26, 100.250, 100.250, QS, 0.0000000000
1999-02-26, 32.290, 32.370, HO, -0.0024714242
1999-02-26, 10.880, 11.000, CO, -0.0109090909
1999-03-31, 131.500, 130.750, QS, 0.0057361377
1999-03-31, 44.680, 44.000, HO, 0.0154545455
1999-03-31, 15.240, 15.160, CO, 0.0052770449", sep = ",", header = T)
desired <- read.table(text = "Dates,Last.Price,Last.Price.1,id,carry,weights,w
1998-11-30,94.5,98.5, QS,-0.0406091371,0.616979909839741,NA
1998-11-30,31.32,32.13, HO,-0.025210084,0.383020090160259,NA
1998-12-31,95.5,98, QS,-0.0255102041,0.757468623182272,0.616979909839741
1998-12-31,34,34.28, HO,-0.008168028,0.242531376817728,0.383020090160259
1999-01-29,100,100.5, QS,-0.0049751244,0.390056023188584,0.757468623182272
1999-01-29,33.16,33.42, HO,-0.0077797726,0.609943976811416,0.242531376817728
1999-02-26,100.25,100.25, QS,0,NA,0.390056023188584
1999-02-26,32.29,32.37, HO,-0.0024714242,0.184703218189261,0.609943976811416
1999-02-26,10.88,11, CO,-0.0109090909,0.815296781810739,NA
1999-03-31,131.5,130.75, QS,0.057361377,0.057361377,NA
1999-03-31,44.68,44, HO,0.0154545455,0.0154545455,0.184703218189261
1999-03-31,15.24,15.16, CO,0.0052770449,0.0052770449,0.815296782", sep = ",", header = TRUE)
weights function:
weight <- function(vec) {
  # negative carries are weighted by their share of the total negative carry,
  # non-negative carries by their share of the total non-negative carry
  neg <- which(vec < 0)
  w <- vec
  w[neg] <- vec[vec < 0] / sum(vec[vec < 0])
  w[-neg] <- vec[vec >= 0] / sum(vec[vec >= 0])
  w
}
contracts %>%
  group_by(Dates) %>%
  mutate(weights = weight(carry)) %>%
  arrange(Dates) %>%
  group_by(id) %>%
  mutate(w = dplyr::lag(weights)) %>%
  ungroup()
# # A tibble: 12 x 7
# Dates Last.Price Last.Price.1 id carry weights w
# <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
# 1 1998-11-30 94.5 98.5 " QS" -0.0406 0.617 NA
# 2 1998-11-30 31.3 32.1 " HO" -0.0252 0.383 NA
# 3 1998-12-31 95.5 98 " QS" -0.0255 0.757 0.617
# 4 1998-12-31 34 34.3 " HO" -0.00817 0.243 0.383
# 5 1999-01-29 100 100. " QS" -0.00498 0.390 0.757
# 6 1999-01-29 33.2 33.4 " HO" -0.00778 0.610 0.243
# 7 1999-02-26 100. 100. " QS" 0 NaN 0.390
# 8 1999-02-26 32.3 32.4 " HO" -0.00247 0.185 0.610
# 9 1999-02-26 10.9 11 " CO" -0.0109 0.815 NA
# 10 1999-03-31 132. 131. " QS" 0.00574 0.00574 NaN
# 11 1999-03-31 44.7 44 " HO" 0.0155 0.0155 0.185
# 12 1999-03-31 15.2 15.2 " CO" 0.00528 0.00528 0.815
Notes:
I used dplyr::lag instead of just lag because of the possibility of confusion with stats::lag, which behaves very differently from dplyr::lag. Most of the time an unqualified lag will work just fine, but when it doesn't, it usually fails without warning you :-)
This is lagging by Dates regardless of month. I'll assume that you are certain the Dates are always perfectly regular. If you think a gap is possible (where lagging by row is not correct), then you'll need to break the year/month out into a new field and join the table on itself instead of doing a lag (see the sketch below).
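A sketch of that self-join approach, assuming monthly data, the column names used above, and the lubridate package for the month arithmetic (an outline, not a drop-in replacement):
library(dplyr)
library(lubridate)
wtab <- contracts %>%
  mutate(Dates = ymd(Dates)) %>%
  group_by(Dates) %>%
  mutate(weights = weight(carry)) %>%
  ungroup() %>%
  mutate(month = floor_date(Dates, "month"))
# look up each row's weight from the previous calendar month, matched by id
lagged <- wtab %>%
  mutate(prev_month = month %m-% months(1)) %>%
  left_join(select(wtab, id, month, w = weights),
            by = c("id", "prev_month" = "month"))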

How to deal with the error "wrong embedding dimension" in the cajolst R function?

When I try to use the cajolst function from the urca package I get a strange error.
Could you please guide me on how to deal with the problem?
result <- urca::cajolst(data, trend = FALSE, K = 2, season = NULL)
Error in embed(diff(x), K) : wrong embedding dimension.
dates A G
2016-11-30 0 0
2016-12-01 -3.53 3.198
2016-12-02 -2.832 8.703
2016-12-04 -2.666 7.799
2016-12-05 -0.54 7.701
2016-12-06 -1.296 4.685
2016-12-07 -1.785 -4.587
2016-12-08 -6.834 -3.696
2016-12-09 -9.624 -5.461
2016-12-11 -11.374 -0.423
2016-12-12 -6.037 -1.614
2016-12-13 -5.934 -3.231
2016-12-14 -7.279 1.072
2016-12-15 -7.859 -4.823
2016-12-16 -15.132 10.838
2016-12-19 -15.345 11.5
2016-12-20 -15.673 6.639
2016-12-21 -15.391 11.162
2016-12-22 -14.357 7.032
2016-12-23 -14.99 12.355
2016-12-26 -15.626 10.944
2016-12-27 -12.297 10.215
2016-12-28 -13.967 5.957
2016-12-29 -12.946 3.446
2016-12-30 -19.681 10.274
2017-01-02 -18.24 8.781
2017-01-03 -16.83 1.116
2017-01-04 -18.189 -0.036
2017-01-05 -15.897 -1.441
2017-01-06 -20.196 -8.534
2017-01-09 -14.57 -28.768
2017-01-10 -13.27 -29.821
2017-01-11 -8.85 -38.881
2017-01-12 -6.375 -50.885
2017-01-13 -8.056 -51.321
2017-01-16 -5.217 -63.619
2017-01-17 -4.75 -39.163
2017-01-18 3.505 -46.309
2017-01-19 10.939 -45.825
2017-01-20 9.248 -42.973
2017-01-23 9.532 -33.396
2017-01-24 4.235 -31.38
2017-01-25 -1.885 -19.21
2017-01-26 -5.027 -15.74
2017-01-27 0.015 -23.029
2017-01-30 -0.685 -30.773
2017-01-31 -2.692 -25.544
2017-02-01 -2.654 -17.912
2017-02-02 4.002 -43.309
2017-02-03 4.813 -52.627
2017-02-06 7.049 -49.965
2017-02-07 10.003 -40.568
2017-02-08 8.996 -39.828
2017-02-09 7.047 -41.19
2017-02-10 7.656 -50.853
2017-02-13 4.986 -41.318
2017-02-14 8.493 -51.946
2017-02-15 12.547 -59.538
2017-02-16 10.327 -54.496
2017-02-17 7.09 -57.571
2017-02-20 11.633 -54.91
2017-02-21 12.664 -51.597
2017-02-22 16.103 -57.819
2017-02-23 14.25 -51.336
2017-02-24 7.794 -54.898
2017-02-27 15.27 -55.754
2017-02-28 19.984 -58.37
2017-03-01 23.899 -70.73
2017-03-02 16.63 -56.29
2017-03-03 16.443 -55.858
2017-03-06 17.901 -59.377
2017-03-07 19.067 -64.383
2017-03-08 17.219 -57.829
2017-03-09 15.694 -55.022
2017-03-10 17.351 -60.431
2017-03-13 18.945 -59.79
2017-03-14 20.001 -64.848
2017-03-15 23.852 -73.806
2017-03-16 22.697 -64.191
2017-03-17 26.892 -65.328
2017-03-20 29.221 -72.764
2017-03-21 25.165 -53.427
2017-03-22 22.998 -51.676
2017-03-23 20.072 -40.57
2017-03-24 20.758 -43.654
2017-03-27 20.062 -33.672
2017-03-28 22.066 -47.184
2017-03-29 22.363 -54.57
2017-03-30 20.684 -48.199
2017-03-31 17.056 -40.887
2017-04-03 19.12 -39.618
2017-04-04 16.359 -37.1
2017-04-05 18.643 -32.734
2017-04-06 14.708 -30.455
2017-04-07 8.403 -33.553
2017-04-10 6.072 -29.048
2017-04-11 5.186 -20.696
2017-04-12 4.248 -20.924
2017-04-13 12.803 -31.075
2017-04-14 12.566 -29.768
2017-04-17 14.065 -28.906
2017-04-18 14.5 4.121
2017-04-19 13.865 8.835
2017-04-20 16.126 6.191
2017-04-21 17.591 3.77
2017-04-24 22.3 -2.497
2017-04-25 22.731 7.408
2017-04-26 19.146 18.45
2017-04-27 19.052 25.541
2017-04-28 21.889 26.878
2017-05-01 27.323 14.362
2017-05-02 29.93 17.525
2017-05-03 19.835 29.856
2017-05-04 19.683 36.72
2017-05-05 13.545 41.055
2017-05-08 14.165 43.544
2017-05-09 11.325 49.978
2017-05-10 10.143 47.072
2017-05-11 13.718 38.901
2017-05-12 14.216 36.017
2017-05-15 13.701 33.797
2017-05-16 13.505 33.867
2017-05-17 13.456 38.004
2017-05-18 12.613 37.758
2017-05-19 11.166 40.367
2017-05-22 12.221 34.022
2017-05-23 13.682 29.793
2017-05-24 10.05 26.701
2017-05-25 10.122 31.394
2017-05-26 7.592 20.073
2017-05-29 6.796 23.809
2017-05-30 9.638 16.1
2017-05-31 7.983 29.043
2017-06-01 3.594 39.557
2017-06-02 8.763 27.863
2017-06-05 12.157 22.397
2017-06-06 13.383 19.053
2017-06-07 20.52 17.449
2017-06-08 19.534 -1.615
2017-06-09 16.011 -1.989
2017-06-12 9.153 -9.294
2017-06-13 4.295 -0.897
2017-06-14 9.743 -9.818
2017-06-15 10.386 -8.255
2017-06-16 11.983 -12.522
2017-06-19 9.513 -12.931
2017-06-20 10.298 -21.024
2017-06-21 11.087 -11.801
2017-06-22 4.472 -9.048
2017-06-23 9.416 -9.592
2017-06-26 9.686 -12.006
2017-06-27 6.424 -2.632
2017-06-28 3.062 -1.016
2017-06-29 5.593 -0.825
2017-06-30 3.531 0.914
2017-07-03 3.208 -2.596
2017-07-04 -6.373 4.289
2017-07-05 -5.149 5.917
2017-07-06 -6.104 12.75
2017-07-07 -9.565 1.615
2017-07-10 -8.961 -0.053
2017-07-11 -4.065 -8.541
2017-07-12 -10.133 -11.286
2017-07-13 -6.223 -15.181
2017-07-14 -1.524 -14.396
2017-07-17 -1.613 -14.61
2017-07-18 5.781 -35.473
2017-07-19 8.243 -44.186
2017-07-20 7.665 -49.857
2017-07-21 0.485 -41.286
2017-07-24 -0.638 -39.127
2017-07-25 0.767 -40.952
2017-07-26 3.566 -44.388
2017-07-27 6.834 -42.543
2017-07-28 1.306 -37.657
2017-07-31 5.839 -34.048
2017-08-01 5.838 -28.939
2017-08-02 7.298 -26.566
2017-08-03 6.804 -32.876
2017-08-04 8.989 -38.618
2017-08-07 8.862 -36.676
2017-08-08 8.234 -40.893
2017-08-09 7.39 -35.16
2017-08-10 8.593 -35.555
2017-08-11 7.253 -35.175
2017-08-14 5.593 -33.644
2017-08-15 4.528 -37.82
2017-08-16 6.752 -53.217
2017-08-17 6.284 -49.252
2017-08-18 4.765 -55.602
2017-08-21 3.905 -54.32
2017-08-22 1.76 -57.853
2017-08-23 0.406 -58.925
2017-08-24 -2.438 -58.098
2017-08-25 -0.791 -56.682
2017-08-28 2.173 -51.278
2017-08-29 2.523 -54.353
2017-08-30 4.482 -46.325
2017-08-31 0.246 -52.567
2017-09-01 -4.214 -53.636
2017-09-04 -4.548 -52.735
2017-09-05 -1.781 -50.421
2017-09-06 -10.463 -51.122
2017-09-07 -13.119 -52.433
2017-09-08 -11.716 -43.493
2017-09-11 -16.15 -43.142
2017-09-12 -12.478 -29.335
2017-09-13 -16.457 -31.697
2017-09-14 -14.615 -15.13
2017-09-15 -13.911 3.023
One of the issues is that the 'dates' column is also included; secondly, season = NULL is not needed here - it can be FALSE, or you can specify an integer value.
library(urca)
out <- cajolst(data[-1], trend = FALSE, K = 2, season = FALSE)
If there is a seasonal effect and it is quarterly, the value would be 4:
out1 <- cajolst(data[-1], trend = FALSE, K = 2, season = 4)
out1
#####################################################
# Johansen-Procedure Unit Root / Cointegration Test #
#####################################################
#The value of the test statistic is: 3.6212 13.2233
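Beyond the printed trace statistics, the returned ca.jo object can be inspected further; a small usage sketch:
summary(out1)     # full Johansen output with critical values
out1@teststat     # trace statistics
out1@cval         # critical values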
Data:
data <- structure(list(dates = c("2016-11-30", "2016-12-01", "2016-12-02",
"2016-12-04", "2016-12-05", "2016-12-06", "2016-12-07", "2016-12-08",
"2016-12-09", "2016-12-11", "2016-12-12", "2016-12-13", "2016-12-14",
"2016-12-15", "2016-12-16", "2016-12-19", "2016-12-20", "2016-12-21",
"2016-12-22", "2016-12-23", "2016-12-26", "2016-12-27", "2016-12-28",
"2016-12-29", "2016-12-30", "2017-01-02", "2017-01-03", "2017-01-04",
"2017-01-05", "2017-01-06", "2017-01-09", "2017-01-10", "2017-01-11",
"2017-01-12", "2017-01-13", "2017-01-16", "2017-01-17", "2017-01-18",
"2017-01-19", "2017-01-20", "2017-01-23", "2017-01-24", "2017-01-25",
"2017-01-26", "2017-01-27", "2017-01-30", "2017-01-31", "2017-02-01",
"2017-02-02", "2017-02-03", "2017-02-06", "2017-02-07", "2017-02-08",
"2017-02-09", "2017-02-10", "2017-02-13", "2017-02-14", "2017-02-15",
"2017-02-16", "2017-02-17", "2017-02-20", "2017-02-21", "2017-02-22",
"2017-02-23", "2017-02-24", "2017-02-27", "2017-02-28", "2017-03-01",
"2017-03-02", "2017-03-03", "2017-03-06", "2017-03-07", "2017-03-08",
"2017-03-09", "2017-03-10", "2017-03-13", "2017-03-14", "2017-03-15",
"2017-03-16", "2017-03-17", "2017-03-20", "2017-03-21", "2017-03-22",
"2017-03-23", "2017-03-24", "2017-03-27", "2017-03-28", "2017-03-29",
"2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04", "2017-04-05",
"2017-04-06", "2017-04-07", "2017-04-10", "2017-04-11", "2017-04-12",
"2017-04-13", "2017-04-14", "2017-04-17", "2017-04-18", "2017-04-19",
"2017-04-20", "2017-04-21", "2017-04-24", "2017-04-25", "2017-04-26",
"2017-04-27", "2017-04-28", "2017-05-01", "2017-05-02", "2017-05-03",
"2017-05-04", "2017-05-05", "2017-05-08", "2017-05-09", "2017-05-10",
"2017-05-11", "2017-05-12", "2017-05-15", "2017-05-16", "2017-05-17",
"2017-05-18", "2017-05-19", "2017-05-22", "2017-05-23", "2017-05-24",
"2017-05-25", "2017-05-26", "2017-05-29", "2017-05-30", "2017-05-31",
"2017-06-01", "2017-06-02", "2017-06-05", "2017-06-06", "2017-06-07",
"2017-06-08", "2017-06-09", "2017-06-12", "2017-06-13", "2017-06-14",
"2017-06-15", "2017-06-16", "2017-06-19", "2017-06-20", "2017-06-21",
"2017-06-22", "2017-06-23", "2017-06-26", "2017-06-27", "2017-06-28",
"2017-06-29", "2017-06-30", "2017-07-03", "2017-07-04", "2017-07-05",
"2017-07-06", "2017-07-07", "2017-07-10", "2017-07-11", "2017-07-12",
"2017-07-13", "2017-07-14", "2017-07-17", "2017-07-18", "2017-07-19",
"2017-07-20", "2017-07-21", "2017-07-24", "2017-07-25", "2017-07-26",
"2017-07-27", "2017-07-28", "2017-07-31", "2017-08-01", "2017-08-02",
"2017-08-03", "2017-08-04", "2017-08-07", "2017-08-08", "2017-08-09",
"2017-08-10", "2017-08-11", "2017-08-14", "2017-08-15", "2017-08-16",
"2017-08-17", "2017-08-18", "2017-08-21", "2017-08-22", "2017-08-23",
"2017-08-24", "2017-08-25", "2017-08-28", "2017-08-29", "2017-08-30",
"2017-08-31", "2017-09-01", "2017-09-04", "2017-09-05", "2017-09-06",
"2017-09-07", "2017-09-08", "2017-09-11", "2017-09-12", "2017-09-13",
"2017-09-14", "2017-09-15"), A = c(0, -3.53, -2.832, -2.666,
-0.54, -1.296, -1.785, -6.834, -9.624, -11.374, -6.037, -5.934,
-7.279, -7.859, -15.132, -15.345, -15.673, -15.391, -14.357,
-14.99, -15.626, -12.297, -13.967, -12.946, -19.681, -18.24,
-16.83, -18.189, -15.897, -20.196, -14.57, -13.27, -8.85, -6.375,
-8.056, -5.217, -4.75, 3.505, 10.939, 9.248, 9.532, 4.235, -1.885,
-5.027, 0.015, -0.685, -2.692, -2.654, 4.002, 4.813, 7.049, 10.003,
8.996, 7.047, 7.656, 4.986, 8.493, 12.547, 10.327, 7.09, 11.633,
12.664, 16.103, 14.25, 7.794, 15.27, 19.984, 23.899, 16.63, 16.443,
17.901, 19.067, 17.219, 15.694, 17.351, 18.945, 20.001, 23.852,
22.697, 26.892, 29.221, 25.165, 22.998, 20.072, 20.758, 20.062,
22.066, 22.363, 20.684, 17.056, 19.12, 16.359, 18.643, 14.708,
8.403, 6.072, 5.186, 4.248, 12.803, 12.566, 14.065, 14.5, 13.865,
16.126, 17.591, 22.3, 22.731, 19.146, 19.052, 21.889, 27.323,
29.93, 19.835, 19.683, 13.545, 14.165, 11.325, 10.143, 13.718,
14.216, 13.701, 13.505, 13.456, 12.613, 11.166, 12.221, 13.682,
10.05, 10.122, 7.592, 6.796, 9.638, 7.983, 3.594, 8.763, 12.157,
13.383, 20.52, 19.534, 16.011, 9.153, 4.295, 9.743, 10.386, 11.983,
9.513, 10.298, 11.087, 4.472, 9.416, 9.686, 6.424, 3.062, 5.593,
3.531, 3.208, -6.373, -5.149, -6.104, -9.565, -8.961, -4.065,
-10.133, -6.223, -1.524, -1.613, 5.781, 8.243, 7.665, 0.485,
-0.638, 0.767, 3.566, 6.834, 1.306, 5.839, 5.838, 7.298, 6.804,
8.989, 8.862, 8.234, 7.39, 8.593, 7.253, 5.593, 4.528, 6.752,
6.284, 4.765, 3.905, 1.76, 0.406, -2.438, -0.791, 2.173, 2.523,
4.482, 0.246, -4.214, -4.548, -1.781, -10.463, -13.119, -11.716,
-16.15, -12.478, -16.457, -14.615, -13.911), G = c(0, 3.198,
8.703, 7.799, 7.701, 4.685, -4.587, -3.696, -5.461, -0.423, -1.614,
-3.231, 1.072, -4.823, 10.838, 11.5, 6.639, 11.162, 7.032, 12.355,
10.944, 10.215, 5.957, 3.446, 10.274, 8.781, 1.116, -0.036, -1.441,
-8.534, -28.768, -29.821, -38.881, -50.885, -51.321, -63.619,
-39.163, -46.309, -45.825, -42.973, -33.396, -31.38, -19.21,
-15.74, -23.029, -30.773, -25.544, -17.912, -43.309, -52.627,
-49.965, -40.568, -39.828, -41.19, -50.853, -41.318, -51.946,
-59.538, -54.496, -57.571, -54.91, -51.597, -57.819, -51.336,
-54.898, -55.754, -58.37, -70.73, -56.29, -55.858, -59.377, -64.383,
-57.829, -55.022, -60.431, -59.79, -64.848, -73.806, -64.191,
-65.328, -72.764, -53.427, -51.676, -40.57, -43.654, -33.672,
-47.184, -54.57, -48.199, -40.887, -39.618, -37.1, -32.734, -30.455,
-33.553, -29.048, -20.696, -20.924, -31.075, -29.768, -28.906,
4.121, 8.835, 6.191, 3.77, -2.497, 7.408, 18.45, 25.541, 26.878,
14.362, 17.525, 29.856, 36.72, 41.055, 43.544, 49.978, 47.072,
38.901, 36.017, 33.797, 33.867, 38.004, 37.758, 40.367, 34.022,
29.793, 26.701, 31.394, 20.073, 23.809, 16.1, 29.043, 39.557,
27.863, 22.397, 19.053, 17.449, -1.615, -1.989, -9.294, -0.897,
-9.818, -8.255, -12.522, -12.931, -21.024, -11.801, -9.048, -9.592,
-12.006, -2.632, -1.016, -0.825, 0.914, -2.596, 4.289, 5.917,
12.75, 1.615, -0.053, -8.541, -11.286, -15.181, -14.396, -14.61,
-35.473, -44.186, -49.857, -41.286, -39.127, -40.952, -44.388,
-42.543, -37.657, -34.048, -28.939, -26.566, -32.876, -38.618,
-36.676, -40.893, -35.16, -35.555, -35.175, -33.644, -37.82,
-53.217, -49.252, -55.602, -54.32, -57.853, -58.925, -58.098,
-56.682, -51.278, -54.353, -46.325, -52.567, -53.636, -52.735,
-50.421, -51.122, -52.433, -43.493, -43.142, -29.335, -31.697,
-15.13, 3.023)), class = "data.frame", row.names = c(NA, -210L
))

Piecewise interpolation for an entire data.frame in R

I have a dataset from a source that uses a special compression algorithm. Simply put, new measurements are recorded only when the change in slope (rate of change) exceeds a certain percentage (say 5%).
However, for the analysis I'm currently carrying out, I need values at regular intervals. I am able to carry out piecewise interpolation using approx, approxfun or spline for individual variables vs. time (tme in the data below), but I'd like to do it for all variables (columns of the data.table) in a single shot.
library(data.table)
q = setDT(
structure(list(tme = structure(c(1463172120, 1463173320, 1463175720,
1463180520, 1463182920, 1463187720, 1463188920, 1463190120, 1463191320,
1463192520, 1463202180, 1463203380, 1463204580, 1463205780, 1463206980,
1463208180, 1463218980, 1463233440, 1463244240, 1463245440, 1463246640,
1463247840, 1463249040, 1463250240, 1463251440, 1463252640, 1463253840,
1463255040, 1463256240, 1463316360, 1463317560, 1463318760, 1463319960,
1463321160, 1463322360, 1463323560, 1463324760, 1463325960, 1463327160,
1463328360, 1463329560, 1463330760, 1463331960), class = c("POSIXct",
"POSIXt"), tzone = "America/Montreal"), rh = c(50.36, 47.31,
46.39, 46.99, 47.89, 50.37, 51.29, 51.92, 54.97, 67.64, 69.38,
68.96, 69.89, 56.66, 51.23, 55.38, 64.36, 50.72, 31.33, 31.38,
32.65, 33.15, 33.05, 31.87, 32.58, 32.65, 31.06, 29.82, 28.72,
67.95, 66.68, 64.66, 62.12, 59.86, 58.11, 57.41, 56.5, 56.16,
55.69, 54.57, 53.89, 53.81, 52.01), degc = c(30.0055555555556,
30.3611111111111, 30.6611111111111, 30.5833333333333, 30.2666666666667,
28.6888888888889, 28.2555555555556, 28.0722222222222, 27.4944444444444,
25.0722222222222, 24.8111111111111, 24.7166666666667, 24.1666666666667,
25.4111111111111, 25.5222222222222, 24.3555555555556, 22.7722222222222,
25.5222222222222, 27.8111111111111, 27.9888888888889, 28.0277777777778,
28.1333333333333, 28.5333333333333, 28.7, 28.85, 29.1555555555556,
28.8388888888889, 29.5111111111111, 29.6722222222222, 22.3888888888889,
22.5722222222222, 22.9444444444444, 23.3722222222222, 23.6777777777778,
23.8777777777778, 24.2055555555556, 24.6888888888889, 24.9777777777778,
25.3888888888889, 25.8, 26.1, 26.1555555555556, 26.7388888888889
)), .Names = c("tme", "rh", "degc"), row.names = c(NA, -43L), class = c("data.table",
"data.frame")))
q is my queried dataset. Here's what works for individual variables (degc in this example):
interpolate_degc <- approxfun(x = q$tme, y = q$degc, method = "linear")
# To get the uniform samples:
width <- "10 mins"
new_times <- seq.POSIXt(from = q$tme[1], to = q$tme[nrow(q)], by = width)
new_degc <- interpolate_degc(new_times)
I'd like to do this for all variables in a single shot, preferably using data.table.
This seems to work:
cols = c("rh", "degc")
DT = q[.(seq(min(tme), max(tme), by="10 mins")), on=.(tme)]
DT[, (cols) := lapply(cols, function(z) with(q,
approxfun(x = tme, y = get(z), method = "linear")
)(tme))]
tme rh degc
1: 2016-05-13 16:42:00 50.360 30.00556
2: 2016-05-13 16:52:00 48.835 30.18333
3: 2016-05-13 17:02:00 47.310 30.36111
4: 2016-05-13 17:12:00 47.080 30.43611
5: 2016-05-13 17:22:00 46.850 30.51111
---
263: 2016-05-15 12:22:00 54.026 26.04000
264: 2016-05-15 12:32:00 53.866 26.11667
265: 2016-05-15 12:42:00 53.826 26.14444
266: 2016-05-15 12:52:00 53.270 26.33056
267: 2016-05-15 13:02:00 52.370 26.62222
Generally when you want to iterate over columns, lapply or Map will work.
How it works: Inside the with(q, ...), tme and get(z) refer to columns of q, but outside of it, we're looking at columns of DT (in this case just tme).
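As an illustration of that point, here is a small sketch that builds a fresh 10-minute grid and interpolates every column of q onto it with plain approx() inside lapply (data.table and cols as defined above):
grid <- data.table(tme = seq(min(q$tme), max(q$tme), by = "10 mins"))
grid[, (cols) := lapply(cols, function(z)
  approx(x = q$tme, y = q[[z]], xout = tme, method = "linear")$y)]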
Another way of doing the same thing:
q[, {
  tt = seq(min(tme), max(tme), by = "10 mins")
  c(
    .(tme = tt),
    lapply(.SD, function(z) approxfun(x = tme, y = z, method = "linear")(tt))
  )
}, .SDcols = cols]
For time series I like to use specialized packages like xts and zoo:
library(xts)
ts <- merge(xts(x = q[, -1], order.by = q$tme), new_times)
head(ts)
#> rh degc
#> 2016-05-13 16:42:00 50.36 30.00556
#> 2016-05-13 16:52:00 NA NA
#> 2016-05-13 17:02:00 47.31 30.36111
#> 2016-05-13 17:12:00 NA NA
#> 2016-05-13 17:22:00 NA NA
#> 2016-05-13 17:32:00 NA NA
head(na.approx(ts))
#> rh degc
#> 2016-05-13 16:42:00 50.360 30.00556
#> 2016-05-13 16:52:00 48.835 30.18333
#> 2016-05-13 17:02:00 47.310 30.36111
#> 2016-05-13 17:12:00 47.080 30.43611
#> 2016-05-13 17:22:00 46.850 30.51111
#> 2016-05-13 17:32:00 46.620 30.58611
head(na.spline(ts))
#> rh degc
#> 2016-05-13 16:42:00 50.36000 30.00556
#> 2016-05-13 16:52:00 48.52407 30.20524
#> 2016-05-13 17:02:00 47.31000 30.36111
#> 2016-05-13 17:12:00 46.62601 30.47791
#> 2016-05-13 17:22:00 46.33972 30.56219
#> 2016-05-13 17:32:00 46.30857 30.62093
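If you only want the regular 10-minute timestamps back (dropping the original irregular ones), you can subset the interpolated series by the new_times vector built in the question; a sketch:
regular <- na.approx(ts)[new_times]   # keep only the seq.POSIXt() grid points
head(regular)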

Reshaping data for panel regression from Datastream

I have downloaded data from Datastream in the form of one variable per sheet.
Current data view (one variable: Price) [screenshot]
What I want to do is convert each sheet (each variable) into panel format so that I can use plm() or export the data to Stata (I am kind of new to R), so that it looks like this:
Expected panel layout [screenshot]
One conundrum is that I have >500 companies, and manually writing the names (or codes) into the R code is very burdensome.
I would really appreciate it if you could sketch basic code and not just refer me to the reshape function in R.
P.S. Sorry for posting this question if it was already answered.
Your current data set is in wide format and you need it in long format; the melt function from the reshape2 package will do this very well.
The primary key for melt is the date, since it is the same for all companies.
I have assumed a test dataset for the demo below:
#Save Price, volume, market value, shares, etc into individual CSV files
#Rename first column as "date" and Remove rows 2 and 3 since you do not need them
#Demo for price data
price_data = read.csv("path_to_price_csv_file",header=TRUE,stringsAsFactors=FALSE,na.strings="NA")
test_DF = price_data
require(reshape2)
require(PerformanceAnalytics)
data(managers)
test_DF = data.frame(date=as.Date(index(managers),format="%Y-%m-%d"),managers,row.names=NULL,stringsAsFactors=FALSE)
#This data is similar in format as your price data
head(test_DF)
# date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC.LS.EQ SP500.TR US.10Y.TR US.3m.TR
# 1 1996-01-31 0.0074 NA 0.0349 0.0222 NA NA NA 0.0340 0.00380 0.00456
# 2 1996-02-29 0.0193 NA 0.0351 0.0195 NA NA NA 0.0093 -0.03532 0.00398
# 3 1996-03-31 0.0155 NA 0.0258 -0.0098 NA NA NA 0.0096 -0.01057 0.00371
# 4 1996-04-30 -0.0091 NA 0.0449 0.0236 NA NA NA 0.0147 -0.01739 0.00428
# 5 1996-05-31 0.0076 NA 0.0353 0.0028 NA NA NA 0.0258 -0.00543 0.00443
# 6 1996-06-30 -0.0039 NA -0.0303 -0.0019 NA NA NA 0.0038 0.01507 0.00412
#test_data = test_DF #replace price, volume , shares dataset here
#dateColumnName = "date" #name of your date column
#columnOfInterest1 = "manager" #for you this will be "Name"
#columnOfInterest2 = "return" #this will vary according to your input data, price, volume, shares etc.
Custom_Melt_DataFrame = function(test_data = test_DF, dateColumnName = "date", columnOfInterest1 = "manager", columnOfInterest2 = "return") {
  molten_DF = melt(test_data, dateColumnName, stringsAsFactors = FALSE)
  colnames(molten_DF) = c(dateColumnName, columnOfInterest1, columnOfInterest2)
  # format as character
  molten_DF[, columnOfInterest1] = as.character(molten_DF[, columnOfInterest1])
  # assign index
  molten_DF$index = rep(1:(ncol(test_data) - 1), each = nrow(test_data))
  # reorder columns
  molten_DF = molten_DF[, c("index", columnOfInterest1, dateColumnName, columnOfInterest2)]
  return(molten_DF)
}
custom_data = Custom_Melt_DataFrame (test_data = test_DF ,dateColumnName = "date", columnOfInterest1 = "manager",columnOfInterest2 = "return")
head(custom_data,10)
# index manager date return
# 1 1 HAM1 1996-01-31 0.0074
# 2 1 HAM1 1996-02-29 0.0193
# 3 1 HAM1 1996-03-31 0.0155
# 4 1 HAM1 1996-04-30 -0.0091
# 5 1 HAM1 1996-05-31 0.0076
# 6 1 HAM1 1996-06-30 -0.0039
# 7 1 HAM1 1996-07-31 -0.0231
# 8 1 HAM1 1996-08-31 0.0395
# 9 1 HAM1 1996-09-30 0.0147
# 10 1 HAM1 1996-10-31 0.0288
tail(custom_data,10)
# index manager date return
# 1311 10 US.3m.TR 2006-03-31 0.00385
# 1312 10 US.3m.TR 2006-04-30 0.00366
# 1313 10 US.3m.TR 2006-05-31 0.00404
# 1314 10 US.3m.TR 2006-06-30 0.00384
# 1315 10 US.3m.TR 2006-07-31 0.00423
# 1316 10 US.3m.TR 2006-08-31 0.00441
# 1317 10 US.3m.TR 2006-09-30 0.00456
# 1318 10 US.3m.TR 2006-10-31 0.00381
# 1319 10 US.3m.TR 2006-11-30 0.00430
# 1320 10 US.3m.TR 2006-12-31 0.00441
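Once the data are in this long format, getting them into plm is straightforward; a sketch assuming the plm package and the demo custom_data built above (the regressors in the commented call are only placeholders):
library(plm)
panel <- pdata.frame(custom_data, index = c("manager", "date"))
# plm(return ~ x1 + x2, data = panel, model = "within")   # placeholder formula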

How to fetch a 3-year historical price series from Oanda with R?

I would like to process Bitcoin prices in R, but I'm unable to download the time series from Yahoo or Google.
From Yahoo the BTCUSD historical time series is missing, and Google doesn't recognize the URL formatted by getSymbols when the symbol is "CURRENCY:EURUSD". I know R expects the ":" to denote a list, so I applied a workaround I found on Stack Overflow to turn CURRENCY:EURUSD into CURRENCY.EURUSD, but Google still cannot process the request.
Downloading from Oanda works like a charm, but a request cannot exceed 500 days. I tried the workaround below to bypass the limitation, but it fails to populate the prices object (which also holds my other symbols) correctly:
for some reason BTCUSD prices are missing for 2012 and part of 2013
also some symbols from the symbols list get NA with the workaround
tail(prices) (with the loop below)
UUP FXB FXE FXF FXY SLV GLD BTC
2014-08-31 NA NA NA NA NA NA NA 506.809
2014-09-30 22.87 159.33 124.48 102.26 88.80 16.35 116.21 375.386
2014-10-31 23.09 157.20 123.49 101.45 86.65 15.50 112.66 341.852
2014-11-30 NA NA NA NA NA NA NA 378.690
2014-12-31 23.97 153.06 119.14 98.16 81.21 15.06 113.58 312.642
2015-01-24 NA NA NA NA NA NA NA 229.813
Extract of print(prices) (with the loop below)
2013-06-28 22.56 150.17 128.93 103.92 98.63 18.97 119.11 NA
2013-07-31 22.09 150.12 131.74 105.99 99.93 19.14 127.96 NA
2013-08-30 22.19 152.93 130.84 105.45 99.63 22.60 134.62 NA
2013-09-30 21.63 159.70 133.85 108.44 99.47 20.90 128.18 133.794
2013-10-31 21.63 158.10 134.29 108.03 99.38 21.10 127.74 203.849
2013-11-30 NA NA NA NA NA NA NA 1084.800
2013-12-31 21.52 163.30 135.99 109.82 92.76 18.71 116.12 758.526
2014-01-31 21.83 161.95 133.29 108.00 95.58 18.45 120.09 812.097
tail(prices) (without the loop below)
UUP FXB FXE FXF FXY SLV GLD
2014-08-29 22.02 163.23 129.54 106.42 93.61 18.71 123.86
2014-09-30 22.87 159.33 124.48 102.26 88.80 16.35 116.21
2014-10-31 23.09 157.20 123.49 101.45 86.65 15.50 112.66
2014-11-28 23.47 153.46 122.46 101.00 82.01 14.83 112.11
2014-12-31 23.97 153.06 119.14 98.16 81.21 15.06 113.58
2015-01-23 25.21 147.23 110.33 110.95 82.57 17.51 124.23
What is wrong with this code? Thanks!
require(quantmod)
require(PerformanceAnalytics)
symbols <- c(
"UUP",
"FXB",
"FXE",
"FXF",
"FXY",
"SLV",
"GLD"
)
getSymbols(symbols, from="2004-01-01")
prices <- list()
for(i in 1:length(symbols)) {
prices[[i]] <- Cl(get(symbols[i]))
}
BTC <- list()
for(i in 1:2) {
BTC[[1]] <- getFX("BTC/USD",
from = Sys.Date() -499 * (i + 1),
to = Sys.Date() - 499 * i,
env = parent.frame(),
auto.assign = FALSE)
}
BTC[[1]] <- getFX("BTC/USD",
from = Sys.Date() -499,
to = Sys.Date(),
env = parent.frame(),
auto.assign = FALSE)
prices[[length(symbols)+1]] <- BTC[[1]]
prices <- do.call(cbind, prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))
ep <- endpoints(prices, "months")
prices <- prices[ep,]
prices <- prices["1997-03::"]
Your for loop always assigns to BTC[[1]] instead of BTC[[i]], and after the loop you overwrite that single element anyway, so the list only ever has length 1.
Try this:
btc <- do.call(rbind, lapply(0:2, function(i) {
  getFX("BTC/USD",
        from = Sys.Date() - 499 * (i + 1),
        to = Sys.Date() - 499 * i,
        env = NULL)
}))
prices <- do.call(cbind, c(prices, list(btc)))
Edit: Here's a more complete example
library(quantmod)
# Use tryCatch() in case we try to get data too far in the past that
# Oanda doesn't provide. Return NULL if there is an error, and Filter
# to only include data that has at least 1 row.
btc <- do.call(rbind, Filter(NROW, lapply(0:5, function(i) {
  tryCatch(getFX("BTC/USD",
                 from = Sys.Date() - 499 * (i + 1),
                 to = Sys.Date() - 499 * i,
                 env = NULL),
           error = function(e) NULL)
})))
symbols <- c(
"UUP",
"FXB",
"FXE",
"FXF",
"FXY",
"SLV",
"GLD"
)
e <- new.env()
getSymbols(symbols, from=start(btc), env=e)
prices <- do.call(cbind, c(eapply(e, Cl)[symbols], list(btc)))
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))
head(na.locf(prices)[endpoints(prices, "months")])
# UUP FXB FXE FXF FXY SLV GLD BTC
#2010-07-31 23.74 156.15 129.88 95.38 114.60 17.58 115.49 0.06386
#2010-08-31 24.12 152.60 126.25 97.80 117.83 18.93 122.08 0.06441
#2010-09-30 22.84 156.33 135.81 101.00 118.57 21.31 127.91 0.06194
#2010-10-31 22.37 159.45 138.69 100.81 122.93 24.17 132.62 0.18530
#2010-11-30 23.50 154.72 129.30 98.87 118.16 27.44 135.42 0.27380
#2010-12-31 22.71 155.77 133.09 106.25 121.75 30.18 138.72 0.29190
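As a follow-on (not part of the original answer), the month-end series built above can be turned into discrete monthly returns, assuming the PerformanceAnalytics package loaded in the question:
library(PerformanceAnalytics)
monthly <- na.locf(prices)[endpoints(prices, "months")]
rets <- Return.calculate(monthly, method = "discrete")
head(rets)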
