How to extract data from a complex function in R

The acf function in the stats package returns a complex object. For example,
x = rnorm(1000, mean=100, sd=10)
acf(x)
returns a plot. If I do
acf_x = acf(x)
acf_x
it returns
Autocorrelations of series ‘x’, by lag
0 1 2 3 4 5 6 7 8 9 10 11
1.000 0.000 -0.031 -0.002 -0.052 0.017 -0.014 0.030 0.011 0.002 -0.044 0.000
12 13 14 15 16 17 18 19 20 21 22 23
0.055 -0.007 0.049 0.025 -0.027 -0.048 0.033 0.027 0.043 -0.007 -0.010 0.025
24 25 26 27 28 29 30
-0.083 0.045 -0.074 0.016 0.041 -0.046 0.010
If I look at class(acf_x) it returns 'acf'.
How do I extract the autocorrelation versus lag into a data frame?
More generally, when presented with a function that returns a complex object, how do I extract the data from it, i.e. is there a general pattern for this type of function?

If you look at the help page for acf via ?acf, you'll see under "Value" what the output will look like.
In this case, the acf object is a list with several elements.
If you e.g. want the lags, you can simply access this via:
my_lags <- acf_x$lag

Deschen's answer to the original question gives the general approach for discovering the elements in a complex model object: str(). One can also use the names() function for S3 objects; the result lists the names one can use to extract elements from the underlying list with the $ or [[ forms of the extract operator.
set.seed(95014)
x = rnorm(1000, mean=100, sd=10)
acf_x <- acf(x)
> names(acf_x)
[1] "acf"    "type"   "n.used" "lag"    "series" "snames"
Since the acf and lag elements are stored as arrays, we'll need to extract just the first dimension to obtain a simple vector. We can accomplish this by chaining the [ form of the extract operator onto the object that is generated by the [[ extract on the model object.
> head(acf_x[["acf"]][,1,1]) # the second extract returns a simple vector
[1] 1.000000000 -0.034863150 0.037745441 -0.020464290 -0.004974406
[6] 0.016770363
In this case R performs the extraction left to right - first acf_x[["acf"]] is evaluated, and then [,1,1] is applied to the result.
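To make that order of operations concrete, the chained extraction is equivalent to this two-step version:
acf_array <- acf_x[["acf"]]  # a 31 x 1 x 1 array (lags 0 through 30)
head(acf_array[, 1, 1])      # the same simple vector as the chained form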
As for the concrete part of the question, "how do I create a data frame with this data?", one can build a data frame from the output of acf() as follows.
set.seed(95014)
x = rnorm(1000, mean=100, sd=10)
acf_x <- acf(x)
results <- data.frame(acf_value = acf_x$acf[,1,1],
                      acf_lag = acf_x$lag[,1,1])
head(results)
...and the output:
> head(results)
acf_value acf_lag
1 1.000000000 0
2 -0.034863150 1
3 0.037745441 2
4 -0.020464290 3
5 -0.004974406 4
6 0.016770363 5
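If you need to do this for several series, the same two extractions can be wrapped in a small helper; acf_to_df below is a hypothetical convenience function, not part of stats:
acf_to_df <- function(a) {
  # assumes a single-series acf object, where a$acf and a$lag are n x 1 x 1 arrays
  data.frame(acf_value = a$acf[, 1, 1], acf_lag = a$lag[, 1, 1])
}
head(acf_to_df(acf_x))  # same result as the data.frame() call above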

Try
str(acf_x)
or
print.default(acf_x)
This will give you an idea of how the object looks internally and how to access the elements in it.
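Along the same lines, unclass() strips the S3 class, so the default print method then shows the raw list:
unclass(acf_x)  # equivalent view to print.default(acf_x)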

Why am I getting the error "%dopar% not found" even after loading the foreach package?

I want to generate clusters from a multivariate dataset of locations in the (x, y) plane, in two steps. In the first step (already executed) I used a denser set of data separated into prior classes. In the second clustering stage I want to use this prior information to refine and create the final clusters. They are soil and plant data from an agricultural area. I am not able to execute the second step, and I tried to use the foreach and doParallel R packages without success: R reports that the %dopar% function was not found. First, I ran code with several nested for loops; the process was extremely slow, so I interrupted it. I saw that it was possible to use parallel computing, but I do not have this knowledge. I installed the foreach and doParallel packages. With foreach alone I can execute small parts of the code and verify that it is actually working, but when I add %dopar% after the foreach() call, R reports the error that this function cannot be found.
#The last column of the data set (p+1) indicates the a priori classes.
require(foreach)
require(doParallel)
require(geoR)
#Product of the differences between variables
Z<-function(dados,var,ponto1,ponto2){
return(as.numeric(dados$data[ponto1,var]-dados$data[ponto2,var]))
}
# Kernel estimator with a priori information
Klambda<-function(dados,ponto1,ponto2){
dmatriz<-as.matrix(dist(dados$coords))
if(dmatriz[ponto1,ponto2]>=2.23){return(0)}
if(dados$data[ponto1,p+1]!=dados$data[ponto2,p+1]){return(0.2*(0.75*(2.23^2-dmatriz[ponto1,ponto2]^2)))}
if(dados$data[ponto1,p+1]==dados$data[ponto2,p+1]){return(0.8*(0.75*(2.23^2-dmatriz[ponto1,ponto2]^2)))}
}
# Numerator
Numerador<-function(dados,ponto1,ponto2,var1,var2){
result=0
foreach(k=1:n)%dopar%{
kl = Klambda(dados,ponto1,k)
foreach(l=1:n)%dopar%{
result=result+(kl *Klambda(dados,ponto2,l)*Z(dados,var1,k,l)*Z(dados,var2,k,l))
}
}
return(result)
}
# Denominator
Denominador<-function(dados,ponto1,ponto2){
n=nrow(dados$data)
result=0
foreach(k=1:n)%dopar%{
foreach(l=1:n)%dopar%{
result=result+(Klambda(dados,ponto1,k)*Klambda(dados,ponto2,l))
}
}
return(2*result)
}
#Gamma: direct and cross semivariance
GammaHat<-function(dados,ponto1,ponto2,var1,var2){return(Numerador(dados,ponto1,ponto2,var1,var2)/Denominador(dados,ponto1,ponto2))
}
# Dissimilarity
Dlambda<-function(dados,ponto1,ponto2){
result=0
foreach(i=1:p)%dopar%{
foreach(j=1:p)%dopar%{
result=result+GammaHat(dados,ponto1,ponto2,i,j)
}
}
return(result)
}
# Matrix of dissimilarity
MatrizD<-array(dim=c(n,n))
system.time(
foreach(k=1:n)%dopar%{
foreach(l=1:n)%dopar%{
if(l<=k){
MatrizD[k,l]<- Dlambda(d,k,l)
}
MatrizD[l,k]<-MatrizD[k,l]
}
}
)
For the test file (12 locations, 3 variables plus the a priori class), the code for the dissimilarity matrix takes around 100 seconds without foreach. However, I have not yet been able to run it on the actual data set (102 points, 23 variables plus the a priori class).
x y var1 var2 var3 Class
1 1 0.245 0.514 0.048 1
1 2 0.825 0.427 0.100 1
1 3 0.873 0.803 0.452 1
2 1 0.452 0.801 0.510 1
2 2 0.243 0.855 0.303 1
2 3 0.640 0.108 0.954 1
3 1 0.834 0.185 0.418 1
3 2 0.998 0.160 0.787 2
3 3 0.596 0.030 0.840 2
4 1 0.506 0.264 0.503 2
4 2 0.975 0.441 0.011 2
4 3 0.466 0.138 0.482 2
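For reference: the error "could not find function \"%dopar%\"" means foreach is not attached in the R session where %dopar% is evaluated, and even with foreach loaded, doParallel needs a registered backend before %dopar% will actually run in parallel. Below is a minimal sketch of the usual setup; the .combine reduction is an assumed rewrite of the accumulation pattern, since assignments like result <- result + ... inside %dopar% do not propagate back to the master session.
library(foreach)
library(doParallel)

cl <- makeCluster(2)    # or parallel::detectCores() - 1
registerDoParallel(cl)  # without this, %dopar% falls back to sequential with a warning

# foreach() returns a value; accumulate with .combine instead of
# assigning into a shared variable inside the loop body
n <- 12
result <- foreach(k = 1:n, .combine = `+`) %dopar% {
  sum(sqrt(k * (1:n)))  # placeholder arithmetic standing in for the Klambda/Z sums
}

stopCluster(cl)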

How can I use acf() for multiple subsets in R?

I am trying to write code that will do autocorrelation for multiple subsets. For example, I have health data for multiple countries over time, and I want each country's autocorrelation for each variable. Any help would be great!
Here are some things I have tried, unsuccessfully:
require(plyr)
POP_ACF=acf(PhD_data_list_view$POP, lag.max=NULL, type=c("correlation"),
plot=TRUE, na.action=na.pass, demean=TRUE)
dlply(PhD_data_list_view, .(Country), function(x) POP_ACF %+% x)
POP_ACF=function(PhD_data_list_view$POP) c(acf(PhD_data_list_view$POP, plot=TRUE)$acf)
acf is a function that takes a vector and returns a list. That makes it a natural fit for the purrr package, which maps functions over lists, but it can also be done using base R.
I'll use the beaver1 dataset from the datasets package since you didn't provide yours. I'll use different days of observations as the analogue to your different countries, and temperature for your POP variable.
Base R:
split turns the vector beaver1$temp into a list of vectors along the second argument, beaver1$day.
Then mapply runs the function acf on each element of that list.
Since we're using mapply instead of lapply, we can also provide another list of arguments, here the titles for each plot, main = unique(beaver1$day).
The last argument, SIMPLIFY = F, tells it to return the default output, not attempt to coerce the list into anything else.
par(mfrow = c(1,2))
mapply(acf,
       split(beaver1$temp, beaver1$day),
       main = unique(beaver1$day),
       SIMPLIFY = F)
# $`346`
#
# Autocorrelations of series ‘dots[[1L]][[1L]]’, by lag
#
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# 1.000 0.838 0.698 0.593 0.468 0.355 0.265 0.167 0.113 0.069 0.028 0.037 0.087 0.108 0.145 0.177 0.151 0.125 0.123 0.106
# $`347`
#
# Autocorrelations of series ‘dots[[1L]][[2L]]’, by lag
#
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13
# 1.000 0.546 0.335 0.130 0.080 0.024 -0.025 -0.103 -0.090 -0.032 0.168 0.036 -0.089 -0.306
purrr and the tidy way:
This way is a bit more flexible depending on what you want to do with the output. We can use purrr::map in place of mapply:
library(purrr)
beaver1 %>%
  split(.$day) %>%
  map(~acf(.$temp, main = unique(.$day)))
Which returns the exact same output. But we can also go fully tidy and return the data from acf as a dataframe so that we can explore it further with ggplot2.
The first map is returning a list of outputs, each of which is a list containing, among other things, variables lag, acf, and n.used.
The map_dfr is running the function data.frame, assigning each of those variables to a new column.
We also make a column to calculate the CIs. Refer to: How is the confidence interval calculated for the ACF function?
Then we can use ggplot to make any kind of plot we want, and we still have the data for any other analysis you want to do.
library(ggplot2)
beaver_acf <-
  beaver1 %>%
  split(.$day) %>%
  map(~acf(.$temp, plot = F)) %>%
  map_dfr(
    ~data.frame(lag = .$lag,
                acf = .$acf,
                ci = qnorm(0.975)/sqrt(.$n.used)),
    .id = "day")
head(beaver_acf)
# day lag acf ci
# 1 346 0 1.0000000 0.2054601
# 2 346 1 0.8378889 0.2054601
# 3 346 2 0.6983476 0.2054601
# 4 346 3 0.5928198 0.2054601
# 5 346 4 0.4680912 0.2054601
# 6 346 5 0.3554939 0.2054601
ggplot(beaver_acf, aes(lag, acf)) +
  geom_segment(aes(xend = lag, yend = 0)) +
  geom_hline(aes(yintercept = ci), linetype = "dashed", color = "blue") +
  geom_hline(aes(yintercept = -ci), linetype = "dashed", color = "blue") +
  facet_wrap(~day)  # the grouping column created by .id above is "day"

R Interclass distance matrix

This question is sort of a follow-up to how to extract intragroup and intergroup distances from a distance matrix? in R. In that question, they first computed the distance matrix for all points, and then simply extracted the inter-class distance matrix. I have a situation where I'd like to bypass the initial computation and skip right to extraction, i.e. I want to directly compute the inter-class distance matrix. Drawing from the linked example, with tweaks, let's say I have some data in a dataframe called df:
values<-c(0.002,0.3,0.4,0.005,0.6,0.2,0.001,0.002,0.3,0.01)
class<-c("A","A","A","B","B","B","B","A","B","A")
df<-data.frame(values, class)
What I'd like is a distance matrix:
1 2 3 8 10
4 .003 .295 .395 .003 .005
5 .598 .300 .200 .598 .590
6 .198 .100 .200 .198 .190
7 .001 .299 .399 .001 .009
9 .298 .000 .100 .298 .290
Does there already exist in R an elegant and fast way to do this?
EDIT After receiving a good solution for the 1D case above, I thought of a bonus question: what about a higher-dimensional case, say if instead df looks like this:
values1<-c(0.002,0.3,0.4,0.005,0.6,0.2,0.001,0.002,0.3,0.01)
values2<-c(0.001,0.1,0.1,0.001,0.1,0.1,0.001,0.001,0.1,0.01)
class<-c("A","A","A","B","B","B","B","A","B","A")
df<-data.frame(values1, values2, class)
And I'm interested in again getting a matrix of the Euclidean distance between points in class B with points in class A.
For general n-dimensional Euclidean distance, we can exploit the equation (not R, but algebra):
square_dist(b,a) = sum_i(b[i]*b[i]) + sum_i(a[i]*a[i]) - 2*inner_prod(b,a)
where the sums are over the dimensions of vectors a and b for i=[1,n]. Here, a and b are one pair from A and B. The key here is that this equation can be written as a matrix equation for all pairs in A and B.
In code:
## First split the data with respect to the class
n <- 2 ## the number of dimensions, for this example is 2
tmp <- split(df[,1:n], df$class)
d <- sqrt(matrix(rowSums(expand.grid(rowSums(tmp$B*tmp$B), rowSums(tmp$A*tmp$A))),
                 nrow=nrow(tmp$B)) -
          2. * as.matrix(tmp$B) %*% t(as.matrix(tmp$A)))
Notes:
The inner rowSums compute sum_i(b[i]*b[i]) and sum_i(a[i]*a[i]) for each b in B and a in A, respectively.
expand.grid then generates all pairs between B and A.
The outer rowSums computes the sum_i(b[i]*b[i]) + sum_i(a[i]*a[i]) for all these pairs.
This result is then reshaped into a matrix. Note that the number of rows of this matrix is the number of points of class B as you requested.
Then subtract two times the inner product of all pairs. This inner product can be written as a matrix multiply tmp$B %*% t(tmp$A) where I left out the coercion to matrix for clarity.
Finally, take the square root.
Using this code with your data:
print(d)
## 1 2 3 8 10
##4 0.0030000 0.3111688 0.4072174 0.0030000 0.01029563
##5 0.6061394 0.3000000 0.2000000 0.6061394 0.59682493
##6 0.2213707 0.1000000 0.2000000 0.2213707 0.21023796
##7 0.0010000 0.3149635 0.4110985 0.0010000 0.01272792
##9 0.3140143 0.0000000 0.1000000 0.3140143 0.30364453
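One way to sanity-check this result is against dist() on the full data, subset to the class B rows and class A columns (a quick verification sketch):
full <- as.matrix(dist(df[, 1:n]))  # distances among all 10 points
all.equal(unname(d), unname(full[df$class == "B", df$class == "A"]))  # should be TRUE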
Note that this code will work for any n > 1. We can recover your previous 1-d result by setting n to 1 and not perform the inner rowSums (because there is now only one column in tmp$A and tmp$B):
n <- 1 ## the number of dimensions, set this now to 1
tmp <- split(df[,1:n], df$class)
d <- sqrt(matrix(rowSums(expand.grid(tmp$B*tmp$B, tmp$A*tmp$A)),
                 nrow=length(tmp$B)) -
          2. * as.matrix(tmp$B) %*% t(as.matrix(tmp$A)))
print(d)
## [,1] [,2] [,3] [,4] [,5]
##[1,] 0.003 0.295 0.395 0.003 0.005
##[2,] 0.598 0.300 0.200 0.598 0.590
##[3,] 0.198 0.100 0.200 0.198 0.190
##[4,] 0.001 0.299 0.399 0.001 0.009
##[5,] 0.298 0.000 0.100 0.298 0.290
Here's an attempt via generating each combination and then simply taking the difference from each value:
abs(matrix(Reduce(`-`, expand.grid(split(df$values, df$class))), nrow=5, byrow=TRUE))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.003 0.295 0.395 0.003 0.005
#[2,] 0.598 0.300 0.200 0.598 0.590
#[3,] 0.198 0.100 0.200 0.198 0.190
#[4,] 0.001 0.299 0.399 0.001 0.009
#[5,] 0.298 0.000 0.100 0.298 0.290
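For the 1-D case specifically, base R's outer() expresses the same idea even more directly (a sketch assuming the same df as above):
# all pairwise absolute differences between class B and class A values
abs(outer(df$values[df$class == "B"], df$values[df$class == "A"], "-"))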

R - How to select specific values

I'm working in healthcare and I need help on how to use R.
I explain: I have a set of data like that:
S1 S2 S3 S4 S5
0.498 1.48 1.43 0.536 0.548
2.03 1.7 3.74 2.13 2.02
0.272 0.242 0.989 0.534 0.787
0.986 2.03 2.53 1.65 2.31
0.307 0.934 0.633 0.36 0.281
0.78 0.76 0.706 0.81 1.11
0.829 2.03 0.667 1.48 1.42
0.497 1.27 0.952 1.23 1.73
0.553 0.286 0.513 0.422 0.573
Here are my objectives:
Do correlation between every column
Calculate p-values
Calculate R-squared
Only show when R2>0.5 and p-values <0.05
Here is my code so far (it's not the most efficient but it work):
> e <- read.table("Workbook8nm.csv", header=TRUE, sep=",", dec=".", na.strings="NA")
> f <- data.frame(e)
> M <- cor(f, use="complete") # Do the correlation like I want
> library('psych')
> N <- corr.test(f) # Give me p-values
So, so far I have my correlation in M and my p-values in N.
I need help on how to show R2.
And second, how do I make R show me only cases where R2 > 0.5 and p-value < 0.05, for example? I used this line:
P <- M[which(M > 0.9)]
to show only the Pearson coefficients greater than 0.9, as a training exercise. But it just gives me a list of every value greater than 0.9, so I don't know which pair of columns each coefficient comes from. The best would be a table with the column names, so I can easily identify the significant values.
The reason I want to do this is that my table is 570 by 570, so I can't look at every p-value to keep only the significant ones.
I hope I was clear! It's my first post here, so tell me if I made any mistakes!
Thanks for your help!
I'm sure there is a function somewhere in the R ecosystem to do this more quickly, but I wrote a quick function to expand a matrix into a data.frame with the row and column names as columns, and the value as a third column.
matrixToFrame <- function(m, name) {
  e <- expand.grid(row=rownames(m), col=colnames(m))
  e[name] <- as.vector(m)
  e
}
We can transform the correlation matrix into a data frame like so:
> matrixToFrame(cor(f), "cor")
row col cor
1 S1 S1 1.0000000
2 S2 S1 0.5322052
3 S3 S1 0.8573687
4 S4 S1 0.8542438
5 S5 S1 0.6820144
6 S1 S2 0.5322052
....
And we can merge the result of corr.test and cor because the columns match up
> b <- merge(matrixToFrame(corr.test(f)$p, "p"), matrixToFrame(cor(f), "cor"))
> head(b)
row col p cor
1 S1 S1 0.0000000000 1.0000000
2 S1 S2 0.2743683745 0.5322052
3 S1 S3 0.0281656707 0.8573687
4 S1 S4 0.0281656707 0.8542438
5 S1 S5 0.2134783039 0.6820144
6 S2 S1 0.1402243214 0.5322052
Then we can just filter for the elements that we want
> b[b$cor > .5 & b$p > .2,]
row col p cor
2 S1 S2 0.2743684 0.5322052
5 S1 S5 0.2134783 0.6820144
8 S2 S3 0.2743684 0.5356585
10 S2 S5 0.2134783 0.6724486
15 S3 S5 0.2134783 0.6827349
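To match the thresholds asked for in the question, note that for a pairwise correlation R2 is simply the square of the coefficient, so the requested filter can be written as:
b$r2 <- b$cor^2
b[b$r2 > 0.5 & b$p < 0.05, ]  # R2 > 0.5 and significant at the 5% level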
EDIT: I found R matrix to rownames colnames values, which provides a couple of attempts at matrixToFrame; nothing particularly more elegant than what I have here, though.
EDIT2: Make sure to read the docs carefully for corr.test -- it looks like different information is encoded in the upper and lower triangles (adjusted vs. unadjusted probabilities), so the results here may be deceptive. You may want to do some filtering with lower.tri or upper.tri before the final filtering step.
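For instance, one simple way to keep each unordered pair once and drop the diagonal (assuming the row and column labels sort consistently, as S1..S5 do here; check ?corr.test to see whether this retains the adjusted or the unadjusted probabilities):
b_unique <- b[as.character(b$row) < as.character(b$col), ]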

R: aggregating time series groups of irregular length

I think this is a split-apply-combine problem, but with a time series twist. My data consists of irregular counts, and I need to perform some summary statistics on each group of counts. Here is the data for your console (the original post also included a screenshot):
library(xts)
date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
"2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
"2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
"2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
"2011-03-26", "2011-03-27"))
returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
-0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)
xtsData <- xts(cbind(returns,count,maxCount,sumCount),date)
I have no idea how to construct the max and cumSum columns, especially since each count series is of an irregular length. Since I won't always know the start and end points of a count series, I'm lost at trying to figure out the index of these groups. Thanks for your help!
UPDATE: here is my for loop attempting to calculate cumSum. It's not the cumulative sum yet, just the returns involved; I'm still unsure how to apply functions to these ranges!
xtsData <- cbind(xtsData,mySumCount=NA)
# find groups of returns
for(i in 1:nrow(xtsData)){
  if(is.na(xtsData[i,"count"]) == FALSE){
    xtsData[i,"mySumCount"] <- xtsData[i,"returns"]
  }
  else{
    xtsData[i,"mySumCount"] <- NA
  }
}
UPDATE 2: thank you commenters!
# report returns when not NA count
x1 <- xtsData[!is.na(xtsData$count),"returns"]
# cum sum is close, but still need to exclude the first element
# -0.009 in the first series of counts and .027 in the second series of counts
x2 <- cumsum(xtsData[!is.na(xtsData$count),"returns"])
# this output is not accurate because .03 is being displayed down the entire column, not just during periods when counts != NA. Is this just a rounding error?
x3 <- max(xtsData[!is.na(xtsData$count),"returns"])
SOLUTION:
# function to pad a vector with a 0
lagpad <- function(x, k) {
c(rep(0, k), x)[1 : length(x)]
}
# group the counts
x1 <- na.omit(transform(xtsData, g = cumsum(c(0, diff(!is.na(count)) == 1))))
# cumulative sum of the count series
z1 <- transform(x1, cumsumRet = ave(returns, g, FUN =function(x) cumsum(replace(x, 1, 0))))
# max of the count series
z2 <- transform(x1, maxRet = ave(returns, g, FUN =function(x) max(lagpad(x,1))))
merge(xtsData,z1$cumsumRet,z2$maxRet)
The code shown is not consistent with the output in the image, and no explanation is provided, so it's not clear what manipulations were wanted; however, the question did mention that the main problem is distinguishing the groups, so we will address that.
To do that we compute a new column g whose rows contain 1 for the first group, 2 for the second and so on. We also remove the NA rows since the g column is sufficient to distinguish groups.
The following code computes a vector the same length as count by first setting each NA position to FALSE and each non-NA position to TRUE. It then differences each position of that vector with the prior position. To do that it implicitly converts FALSE to 0 and TRUE to 1 and then performs the differencing. Next we convert this last result to a logical vector which is TRUE for each 1 component and FALSE otherwise. Since the first component of the vector that is differenced has no prior position we prepend 0 for that. The prepending operation implicitly converts the TRUE and FALSE values just generated to 1 and 0 respectively. Taking the cumsum fills in the first group with 1, the second with 2 and so on. Finally omit the NA rows:
x <- na.omit(transform(x, g = cumsum(c(0, diff(!is.na(count)) == 1))))
giving:
> x
returns count maxCount sumCount g
2010-11-26 -0.009 1 0.030 0.000 1
2010-12-03 0.030 1 0.030 0.030 1
2010-12-10 0.013 2 0.030 0.042 1
2010-12-17 0.003 2 0.030 0.045 1
2010-12-24 0.010 3 0.030 0.056 1
2010-12-31 0.001 4 0.030 0.056 1
2011-01-07 0.011 5 0.030 0.067 1
2011-01-14 0.017 6 0.030 0.084 1
2011-01-21 -0.008 7 0.030 0.077 1
2011-01-28 -0.005 7 0.030 0.071 1
2011-02-04 0.027 7 0.030 0.098 1
2011-02-11 0.014 7 0.030 0.112 1
2011-02-18 0.010 7 0.030 0.123 1
2011-03-18 0.027 1 0.027 0.000 2
2011-03-25 -0.019 2 0.027 -0.019 2
attr(,"na.action")
2010-11-18 2010-11-19 2011-02-25 2011-03-04 2011-03-11 2011-03-26 2011-03-27
1 2 16 17 18 21 22
attr(,"class")
[1] "omit"
You can now use ave to perform any calculations you like. For example to take cumulative sums of returns by group:
transform(x, cumsumRet = ave(returns, g, FUN = cumsum))
Replace cumsum with any other function that is suitable for use with ave.
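For example, per-group maxima or running maxima:
transform(x, maxRet = ave(returns, g, FUN = max))        # each group's maximum
transform(x, cummaxRet = ave(returns, g, FUN = cummax))  # running maximum within group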
Ah, so "count" are the groups and you want the cumsum per group and the max per group. I think in data.table, so here is how I would do it.
library(xts)
library(data.table)
date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
"2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
"2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
"2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
"2011-03-26", "2011-03-27"))
returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
-0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)
DT <- data.table(date, returns, count)
DT[!is.na(count),max:=max(returns),by=count]
DT[!is.na(count),cumSum:= cumsum(returns),by=count]
# if you need an xts object at the end:
xtsData <- xts(cbind(DT$returns,DT$count, DT$max,DT$cumSum),DT$date)
