So, I'm working with a data frame of daily data covering 444 days. I have several variables that I want to lag, by 1 through 7 days each, for use in a regression model (lm). I'm currently generating the lags like this:
email_data$email_reach1 <- lag(ts(email_data$email_reach, start = 1, end = 444), 1)
email_data$email_reach2 <- lag(ts(email_data$email_reach, start = 1, end = 444), 2)
email_data$email_reach3 <- lag(ts(email_data$email_reach, start = 1, end = 444), 3)
email_data$email_reach4 <- lag(ts(email_data$email_reach, start = 1, end = 444), 4)
email_data$email_reach5 <- lag(ts(email_data$email_reach, start = 1, end = 444), 5)
email_data$email_reach6 <- lag(ts(email_data$email_reach, start = 1, end = 444), 6)
email_data$email_reach7 <- lag(ts(email_data$email_reach, start = 1, end = 444), 7)
Then, I repeat this for every single variable I want to lag.
This seems like a terrible way of accomplishing this. Is there something better?
I've thought about lagging the entire data frame, which works, but I don't know how you'd assign variable names to the result and merge it back to the original data frame.
You can also use data.table (HT to @akrun):
set.seed(1)
email_data <- data.frame(dates=1:10, email_reach=rbinom(10, 10, 0.5))
library(data.table)
setDT(email_data)[, paste0('email_reach', 1:3) := shift(email_reach, 1:3)][]
# dates email_reach email_reach1 email_reach2 email_reach3
# 1: 1 4 NA NA NA
# 2: 2 4 4 NA NA
# 3: 3 5 4 4 NA
# 4: 4 7 5 4 4
# 5: 5 4 7 5 4
# 6: 6 7 4 7 5
# 7: 7 7 7 4 7
# 8: 8 6 7 7 4
# 9: 9 6 6 7 7
#10: 10 3 6 6 7
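To lag several variables at once instead of repeating this per column, the same idiom can be looped over the column names. A minimal sketch, assuming a hypothetical second column other_var and the 7 lags from the question:
vars <- c("email_reach", "other_var")  # other_var is hypothetical
for (v in vars) {
  # shift() returns a list of lagged vectors; := assigns them to the new names
  email_data[, paste0(v, 1:7) := shift(get(v), 1:7)]
}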
Another approach is to use the xts package. A little example follows; we start out with:
x <- ts(matrix(rnorm(100),ncol=2), start=c(2009, 1), frequency=12)
head(x)
Series 1 Series 2
[1,] -1.82934747 -0.1234372
[2,] 1.08371836 1.3365919
[3,] 0.95786815 0.0885484
[4,] 0.59301446 -0.6984993
[5,] -0.01094955 -0.3729762
[6,] -0.19256525 0.3137705
Convert it to xts and call lag(), here with lags 0, 1, and 2 to keep the output small:
library(xts)
head(lag(as.xts(x),0:2))
Series.1 Series.2 Series.1.1 Series.2.1 Series.1.2 Series.2.2
Jan 2009 -1.82934747 -0.1234372 NA NA NA NA
Feb 2009 1.08371836 1.3365919 -1.82934747 -0.1234372 NA NA
Mar 2009 0.95786815 0.0885484 1.08371836 1.3365919 -1.8293475 -0.1234372
Apr 2009 0.59301446 -0.6984993 0.95786815 0.0885484 1.0837184 1.3365919
May 2009 -0.01094955 -0.3729762 0.59301446 -0.6984993 0.9578682 0.0885484
Jun 2009 -0.19256525 0.3137705 -0.01094955 -0.3729762 0.5930145 -0.6984993
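If you need the result back in a plain data frame (e.g. to feed lm()), the xts object can be unpacked with index() and coredata(); a small sketch continuing from the lag() call above:
lagged <- lag(as.xts(x), 0:2)
out <- data.frame(date = index(lagged), coredata(lagged))
head(out)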
I think this does the same as your code above, for any given n.
n <- 7
for (i in 1:n) {
  email_data[[paste0("email_reach", i)]] <- lag(ts(email_data$email_reach, start = 1, end = 444), i)
}
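Note that stats::lag() applied to a ts only shifts the series' time base, not its values (see the warning elaborated in the next answer), so the columns created this way end up identical once stored in the data frame. A variant of the same loop that actually shifts the values could use dplyr::lag(), which pads with leading NAs; a sketch assuming dplyr is available:
library(dplyr)
n <- 7
for (i in 1:n) {
  email_data[[paste0("email_reach", i)]] <- dplyr::lag(email_data$email_reach, i)
}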
This isn't really an answer; I'm just using the answer format to elaborate on my warning above. Start with:
email_data <- data.frame(email_reach = ts(email_data$email_reach, start = 1, end = 444))
Then run your code from the question, and this is what you get:
> head(email_data, 10)
email_reach email_reach1 email_reach2 email_reach3 email_reach4
1 4 4 4 4 4
2 4 4 4 4 4
3 5 5 5 5 5
4 7 7 7 7 7
5 4 4 4 4 4
6 7 7 7 7 7
7 7 7 7 7 7
8 6 6 6 6 6
9 6 6 6 6 6
10 3 3 3 3 3
email_reach5 email_reach6 email_reach7
1 4 4 4
2 4 4 4
3 5 5 5
4 7 7 7
5 4 4 4
6 7 7 7
7 7 7 7
8 6 6 6
9 6 6 6
10 3 3 3
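To see what is going on, here is a small demonstration of what stats::lag() actually does to a ts: the values stay the same and only the time attribute (tsp) moves, and a data frame column keeps just the values.
x <- ts(1:5)
lag(x, 1)        # same values 1 2 3 4 5, but the series now starts at time 0
tsp(x)           # 1 5 1
tsp(lag(x, 1))   # 0 4 1
That is why every "lagged" column above is identical to the original; the head()/NA-padding approach in the next answer shifts the values themselves.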
Based on the answer by Molx, but generalized for any list of variables, and patched up a bit... Thanks Molx!
do_lag <- function(the_data, variables, num_periods) {
  num_vars <- length(variables)
  num_rows <- nrow(the_data)
  for (j in 1:num_vars) {
    for (i in 1:num_periods) {
      the_data[[paste0(variables[j], i)]] <- c(rep(NA, i), head(the_data[[variables[j]]], num_rows - i))
    }
  }
  return(the_data)
}
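A quick usage sketch on the small example data from the data.table answer above (the column name and lag count are just for illustration):
set.seed(1)
email_data <- data.frame(dates = 1:10, email_reach = rbinom(10, 10, 0.5))
lagged <- do_lag(email_data, c("email_reach"), 3)
head(lagged)  # adds email_reach1 ... email_reach3, padded with leading NAs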
collapse::flag provides a general and fast (C++ based) solution to this problem:
library(collapse)
# Time-series (also supports xts and others)
head(flag(AirPassengers, -1:2))
## F1 -- L1 L2
## Jan 1949 118 112 NA NA
## Feb 1949 132 118 112 NA
## Mar 1949 129 132 118 112
## Apr 1949 121 129 132 118
## May 1949 135 121 129 132
## Jun 1949 148 135 121 129
# Time-series matrix
head(flag(EuStockMarkets, -1:2))
## Time Series:
## Start = c(1991, 130)
## End = c(1998, 169)
## Frequency = 260
## F1.DAX DAX L1.DAX L2.DAX F1.SMI SMI L1.SMI L2.SMI F1.CAC CAC L1.CAC L2.CAC F1.FTSE FTSE L1.FTSE L2.FTSE
## 1991.496 1613.63 1628.75 NA NA 1688.5 1678.1 NA NA 1750.5 1772.8 NA NA 2460.2 2443.6 NA NA
## 1991.500 1606.51 1613.63 1628.75 NA 1678.6 1688.5 1678.1 NA 1718.0 1750.5 1772.8 NA 2448.2 2460.2 2443.6 NA
## 1991.504 1621.04 1606.51 1613.63 1628.75 1684.1 1678.6 1688.5 1678.1 1708.1 1718.0 1750.5 1772.8 2470.4 2448.2 2460.2 2443.6
## 1991.508 1618.16 1621.04 1606.51 1613.63 1686.6 1684.1 1678.6 1688.5 1723.1 1708.1 1718.0 1750.5 2484.7 2470.4 2448.2 2460.2
## 1991.512 1610.61 1618.16 1621.04 1606.51 1671.6 1686.6 1684.1 1678.6 1714.3 1723.1 1708.1 1718.0 2466.8 2484.7 2470.4 2448.2
## 1991.515 1630.75 1610.61 1618.16 1621.04 1682.9 1671.6 1686.6 1684.1 1734.5 1714.3 1723.1 1708.1 2487.9 2466.8 2484.7 2470.4
# Data frame
head(flag(airquality[1:3], -1:2))
## F1.Ozone Ozone L1.Ozone L2.Ozone F1.Solar.R Solar.R L1.Solar.R L2.Solar.R F1.Wind Wind L1.Wind L2.Wind
## 1 36 41 NA NA 118 190 NA NA 8.0 7.4 NA NA
## 2 12 36 41 NA 149 118 190 NA 12.6 8.0 7.4 NA
## 3 18 12 36 41 313 149 118 190 11.5 12.6 8.0 7.4
## 4 NA 18 12 36 NA 313 149 118 14.3 11.5 12.6 8.0
## 5 28 NA 18 12 NA NA 313 149 14.9 14.3 11.5 12.6
## 6 23 28 NA 18 299 NA NA 313 8.6 14.9 14.3 11.5
# Panel lag
head(flag(iris[1:2], -1:2, iris$Species))
## Panel-lag computed without timevar: Assuming ordered data
## F1.Sepal.Length Sepal.Length L1.Sepal.Length L2.Sepal.Length F1.Sepal.Width Sepal.Width L1.Sepal.Width L2.Sepal.Width
## 1 4.9 5.1 NA NA 3.0 3.5 NA NA
## 2 4.7 4.9 5.1 NA 3.2 3.0 3.5 NA
## 3 4.6 4.7 4.9 5.1 3.1 3.2 3.0 3.5
## 4 5.0 4.6 4.7 4.9 3.6 3.1 3.2 3.0
## 5 5.4 5.0 4.6 4.7 3.9 3.6 3.1 3.2
## 6 4.6 5.4 5.0 4.6 3.4 3.9 3.6 3.1
Similarly, collapse::fdiff and collapse::fgrowth support suitably lagged/leaded and iterated (quasi-, log-) differences and growth rates on (multivariate) time series and panels.
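A brief sketch of those two on the same AirPassengers series as above:
library(collapse)
head(fdiff(AirPassengers))    # first differences: NA 6 14 -3 -8 14
head(fgrowth(AirPassengers))  # period-on-period growth rates (in percent)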
While replicating a 2011 example script, the aggregate() function in base R produces NAs instead of the summarised values. Do I need a more recent version of aggregate, or a different function? Please advise.
Example s1s2.df can be found here: https://www.dropbox.com/s/dsqina3vuy0774u/df.csv?dl=0
Code that produces NA instead of the summarised values:
s1.no.present <- aggregate(s1s2.df$no.present[s1s2.df$sabap==-1], by=list(s1s2.df$month.n[s1s2.df$sabap==-1]),sum)[,2]
s1.no.cards <- aggregate(s1s2.df$no.cards[s1s2.df$sabap==-1], by=list(s1s2.df$month.n[s1s2.df$sabap==-1]),sum)[,2]
s2.no.present <- aggregate(s1s2.df$no.present[s1s2.df$sabap==1], by=list(s1s2.df$month.n[s1s2.df$sabap==1]),sum)[,2]
s2.no.cards <- aggregate(s1s2.df$no.cards[s1s2.df$sabap==1], by=list(s1s2.df$month.n[s1s2.df$sabap==1]),sum)[,2]
Incorrect output:
> tibble(s1.no.present)
# A tibble: 12 × 1
s1.no.present
<int>
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
11 NA
12 NA
The NAs appear because sum() returns NA whenever the input contains missing values. Use a custom sum function that removes NAs:
#data
s1s2.df <- read.csv("tmp.csv")
mySum <- function(x){ sum(x, na.rm = TRUE) }
aggregate(s1s2.df$no.present[s1s2.df$sabap == -1],
          by = list(s1s2.df$month.n[s1s2.df$sabap == -1]),
          FUN = mySum)
# Group.1 x
# 1 1 218
# 2 2 369
# 3 3 590
# 4 4 1471
# 5 5 1880
# 6 6 2241
# 7 7 2306
# 8 8 1827
# 9 9 1377
# 10 10 774
# 11 11 281
# 12 12 280
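Alternatively, since aggregate() forwards extra arguments to FUN, the wrapper is not strictly needed; passing na.rm = TRUE directly should give the same result:
aggregate(s1s2.df$no.present[s1s2.df$sabap == -1],
          by = list(s1s2.df$month.n[s1s2.df$sabap == -1]),
          FUN = sum, na.rm = TRUE)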
Or use the formula interface, which drops rows with missing values by default (na.action = na.omit):
aggregate(formula = no.present ~ month.n,
          data = s1s2.df[s1s2.df$sabap == -1, ],
          FUN = sum)
# month.n no.present
# 1 1 218
# 2 2 369
# 3 3 590
# 4 4 1471
# 5 5 1880
# 6 6 2241
# 7 7 2306
# 8 8 1827
# 9 9 1377
# 10 10 774
# 11 11 281
# 12 12 280
I have a data frame df and a list L of row indices, one element per column, at which I should put 0 instead of the current values of df.
Example:
DF:
# A tibble: 11 x 3
A B C
<dbl> <dbl> <dbl>
1724 4 2013
1758 4 2013
1612 3 2013
1692 3 2013
1260 33 2014
1157 22 2014
1359 63 2014
1414 27 2014
387 3 2016
374 3 2016
L:
[[1]]
[1] 3 4
[[2]]
[1] 1 2 3 4 5
[[3]]
[1] 1
So in this example, I have to put zeros in rows 3, 4 of column A, in rows 1:5 in column B and row 1 in column C.
Is there a way to do it as a one-liner in R? A dplyr or base-R solution would be great! Also, I would like to avoid apply or loops, since I have to do this very efficiently.
A loop looks very fast to me. I haven't done a complexity comparison, but if you have your replacements in list form and want to replace them with a value val, simply:
df
a b c
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
val <- 0
for (i in 1:length(L)) {
  df[L[[i]], i] <- val
}
df
a b c
1 1 0 0
2 2 0 2
3 0 0 3
4 0 0 4
5 5 0 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
I tested it on x, a 10,000-row by 10,000-column df:
> b<-Sys.time()
> for(i in 1:length(L)){
+ x[L[[i]],i]<-0
+ }
> Sys.time()-b
Time difference of 0.490464 secs
Looks pretty quick :) I know it's obvious but hope it helps!
******** EDIT 1 ********
If we look at the method by @mt1022 using unlist and cbind:
> b<-Sys.time()
> Lcol <- rep(seq_along(L), lengths(L))
> x[cbind(unlist(L), Lcol)] <- 0
> Sys.time()-b
Time difference of 7.467723 secs
Clearly much slower (because when we unlist, we essentially loop through each and every element in L instead of each vector in L). ;)
Another way using matrix of indices:
# DF <- read.table(textConnection('A B C
# 1724 4 2013
# 1758 4 2013
# 1612 3 2013
# 1692 3 2013
# 1260 33 2014
# 1157 22 2014
# 1359 63 2014
# 1414 27 2014
# 387 3 2016
# 374 3 2016'), header = T)
#
# L <- list(c(3, 4), c(1, 2, 3, 4, 5), c(1))
Lcol <- rep(seq_along(L), lengths(L))
DF[cbind(unlist(L), Lcol)] <- 0
# > DF
# A B C
# 1 1724 0 0
# 2 1758 0 2013
# 3 0 0 2013
# 4 0 0 2013
# 5 1260 0 2014
# 6 1157 22 2014
# 7 1359 63 2014
# 8 1414 27 2014
# 9 387 3 2016
# 10 374 3 2016
Another option is to use mapply in combination with do.call.
do.call(cbind, mapply(function(x, y) {
  df[x, y] <- 0
  df[y]
}, mylist, seq_along(mylist)))
# A B C
# [1,] 1724 0 0
# [2,] 1758 0 2013
# [3,] 0 0 2013
# [4,] 0 0 2013
# [5,] 1260 0 2014
# [6,] 1157 22 2014
# [7,] 1359 63 2014
# [8,] 1414 27 2014
# [9,] 387 3 2016
# [10,] 374 3 2016
Data:
df <- read.table(text =
"A B C
1724 4 2013
1758 4 2013
1612 3 2013
1692 3 2013
1260 33 2014
1157 22 2014
1359 63 2014
1414 27 2014
387 3 2016
374 3 2016", header = TRUE)
mylist <- list(c(3, 4), c(1, 2, 3, 4, 5), c(1))
I have data with a primary key and ratio values like the following:
2.243164164
1.429242413
2.119270714
3.013427143
1.208634972
1.208634972
1.23657632
2.212136028
2.168583297
2.151961216
1.159886063
1.234106444
1.694206176
1.401425329
5.210125578
1.215267806
1.089189869
I want to add a rank column which groups these ratios into, say, 3 bins; functionality similar to this SAS code:
PROC RANK DATA = TAB1 GROUPS = &NUM_BINS
I did the following:
Convert your vector to a data frame, then create the rank variable:
test2$rank<-rank(test2$test)
> test2
test rank
1 2.243164 15.0
2 1.429242 9.0
3 2.119271 11.0
4 3.013427 16.0
5 1.208635 3.5
6 1.208635 3.5
7 1.236576 7.0
8 2.212136 14.0
9 2.168583 13.0
10 2.151961 12.0
11 1.159886 2.0
12 1.234106 6.0
13 1.694206 10.0
14 1.401425 8.0
15 5.210126 17.0
16 1.215268 5.0
17 1.089190 1.0
Define a function to convert ranks to percentile ranks, and then define pr as that percentile:
percent.rank<-function(x) trunc(rank(x)/length(x)*100)
test3<-within(test2,pr<-percent.rank(rank))
Then I created the bins, based on the fact that you wanted 3 of them:
test3$bins <- cut(test3$pr, breaks=c(0,33,66,100), labels=c("0-33","34-66","66-100"))
test x rank pr bins
1 2.243164 15.0 15.0 88 66-100
2 1.429242 9.0 9.0 52 34-66
3 2.119271 11.0 11.0 64 34-66
4 3.013427 16.0 16.0 94 66-100
5 1.208635 3.5 3.5 20 0-33
6 1.208635 3.5 3.5 20 0-33
7 1.236576 7.0 7.0 41 34-66
8 2.212136 14.0 14.0 82 66-100
9 2.168583 13.0 13.0 76 66-100
10 2.151961 12.0 12.0 70 66-100
11 1.159886 2.0 2.0 11 0-33
12 1.234106 6.0 6.0 35 34-66
13 1.694206 10.0 10.0 58 34-66
14 1.401425 8.0 8.0 47 34-66
15 5.210126 17.0 17.0 100 66-100
16 1.215268 5.0 5.0 29 0-33
17 1.089190 1.0 1.0 5 0-33
Does that work for you?
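For reference, the rank/percentile/cut steps can also be collapsed into a single call using quantile() breaks; a sketch on the same test2$test column (with heavily tied data the breaks could coincide, so this is not a drop-in replacement for every case):
test2$bins <- cut(test2$test,
                  breaks = quantile(test2$test, probs = seq(0, 1, length.out = 4)),
                  include.lowest = TRUE,
                  labels = c("0-33", "34-66", "66-100"))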
A bit late, but given your data, we can use ntile from the dplyr package to get equal-sized groups:
df <- data.frame(values = c(2.243164164,
1.429242413,
2.119270714,
3.013427143,
1.208634972,
1.208634972,
1.23657632,
2.212136028,
2.168583297,
2.151961216,
1.159886063,
1.234106444,
1.694206176,
1.401425329,
5.210125578,
1.215267806,
1.089189869))
library(dplyr)
df <- df %>%
  arrange(values) %>%
  mutate(rank = ntile(values, 3))
values rank
1 1.089190 1
2 1.159886 1
3 1.208635 1
4 1.208635 1
5 1.215268 1
6 1.234106 1
7 1.236576 2
8 1.401425 2
9 1.429242 2
10 1.694206 2
11 2.119271 2
12 2.151961 2
13 2.168583 3
14 2.212136 3
15 2.243164 3
16 3.013427 3
17 5.210126 3
Or see cut_number from the ggplot2 package:
library(ggplot2)
df$rank2 <- cut_number(df$values, 3, labels = c(1:3))
values rank rank2
1 1.089190 1 1
2 1.159886 1 1
3 1.208635 1 1
4 1.208635 1 1
5 1.215268 1 1
6 1.234106 1 1
7 1.236576 2 2
8 1.401425 2 2
9 1.429242 2 2
10 1.694206 2 2
11 2.119271 2 2
12 2.151961 2 3
13 2.168583 3 3
14 2.212136 3 3
15 2.243164 3 3
16 3.013427 3 3
17 5.210126 3 3
Because your sample consists of 17 numbers, one bin gets 5 numbers while the other two get 6. The two methods differ at row 12: ntile assigns 6 numbers to the first and second groups, whereas cut_number assigns 6 to the first and third groups.
> table(df$rank)
1 2 3
6 6 5
> table(df$rank2)
1 2 3
6 5 6
See also here: Splitting a continuous variable into equal sized groups
I am trying to shorten a chunk of code to make it faster and easier to modify. This is a short example of my data.
order obs year var1 var2 var3
1 3 1 1 32 588 NA
2 4 1 2 33 689 2385
3 5 1 3 NA 678 2369
4 33 3 1 10 214 1274
5 34 3 2 10 237 1345
6 35 3 3 10 242 1393
7 78 6 1 5 62 NA
8 79 6 2 5 75 296
9 80 6 3 5 76 500
10 93 7 1 NA NA NA
11 94 7 2 4 86 247
12 95 7 3 3 54 207
Basically, what I want is for R to find every unique combination of two values (observations) in column "obs", within the same year, and create a new matrix or data frame whose observations are the aggregation of the original pairs. Order is not important, so 1+6 = 6+1. For instance, with 150 observations I would expect 11,175 feasible combinations each year.
I sort of got what I want with basic coding but, as you will see, it is way too long (I have built 66 different new data sets this way, so it does not really make sense), and I am wondering how to shorten it. I made some attempts (plyr, ...) with no real success. Here is what I did:
# For the 1st year, groups of 2 obs
newmatrix <- data.frame(t(combn(unique(data$obs[data$year==1]), 2)))
colnames(newmatrix) <- c("obs1", "obs2")
newmatrix$name <- do.call(paste, c(newmatrix[c("obs1", "obs2")], sep = "_"))
# and the aggregation of var. using indexes, which I will skip here to save your time :)
To illustrate, here is the result I would get for the 1st year, considering the sample above. NA appears because I only computed the pairs where both values were valid, and only for variables 1 and 3. Also, I used the sum, but it could be any other function:
order obs1 obs2 year var1 var3
1 1 1 3 1_3 42 NA
2 2 1 6 1_6 37 NA
3 3 1 7 1_7 NA NA
4 4 3 6 3_6 15 NA
5 5 3 7 3_7 NA NA
6 6 6 7 6_7 NA NA
And for the first 2 lines of the 3rd year, the same type of matrix:
order obs1 obs2 year var1 var3
1 1 1 3 1_3 NA 3762
2 2 1 6 1_6 NA 2868
.......... etc ............
I hope I explained myself. Thank you in advance for your hints on how to do this more efficiently.
I would use split-apply-combine to split by year, find all the combinations, and then combine back together:
do.call(rbind, lapply(split(data, data$year), function(x) {
  p <- combn(nrow(x), 2)
  data.frame(order = paste(x$order[p[1,]], x$order[p[2,]], sep = "_"),
             obs1 = x$obs[p[1,]],
             obs2 = x$obs[p[2,]],
             year = x$year[1],
             var1 = x$var1[p[1,]] + x$var1[p[2,]],
             var2 = x$var2[p[1,]] + x$var2[p[2,]],
             var3 = x$var3[p[1,]] + x$var3[p[2,]])
}))
# order obs1 obs2 year var1 var2 var3
# 1.1 3_33 1 3 1 42 802 NA
# 1.2 3_78 1 6 1 37 650 NA
# 1.3 3_93 1 7 1 NA NA NA
# 1.4 33_78 3 6 1 15 276 NA
# 1.5 33_93 3 7 1 NA NA NA
# 1.6 78_93 6 7 1 NA NA NA
# 2.1 4_34 1 3 2 43 926 3730
# 2.2 4_79 1 6 2 38 764 2681
# 2.3 4_94 1 7 2 37 775 2632
# 2.4 34_79 3 6 2 15 312 1641
# 2.5 34_94 3 7 2 14 323 1592
# 2.6 79_94 6 7 2 9 161 543
# 3.1 5_35 1 3 3 NA 920 3762
# 3.2 5_80 1 6 3 NA 754 2869
# 3.3 5_95 1 7 3 NA 732 2576
# 3.4 35_80 3 6 3 15 318 1893
# 3.5 35_95 3 7 3 13 296 1600
# 3.6 80_95 6 7 3 8 130 707
This enables you to be very flexible in how you combine pairs of observations within a year: x[p[1,],] represents the year-specific data for the first element in each pair and x[p[2,],] the year-specific data for the second element. You can return a year-specific data frame with any combination of data for the pairs, and the year-specific data frames are combined into a single final data frame with do.call and rbind.
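For example, to average the pairs instead of summing them, or to count how many non-missing values each pair contributes, only the column definitions inside the function need to change; a sketch of an alternative body (same x and p as above; the column names var1_mean and var1_nvalid are illustrative):
data.frame(order = paste(x$order[p[1, ]], x$order[p[2, ]], sep = "_"),
           year = x$year[1],
           var1_mean = (x$var1[p[1, ]] + x$var1[p[2, ]]) / 2,
           var1_nvalid = (!is.na(x$var1[p[1, ]])) + (!is.na(x$var1[p[2, ]])))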
I have the following dummy data set:
ID TIME DDAY DV
1 0 50 6.6
1 12 50 6.1
1 24 50 5.6
1 48 50 7.6
2 0 10 6.6
2 12 10 6.6
2 24 10 6.6
2 48 10 6.6
3 0 50 3.6
3 12 50 6.8
3 24 50 9.6
3 48 50 7.1
4 0 10 8.6
4 12 10 6.4
4 24 10 4.6
4 48 10 5.6
I want to create a summary table with the mean and standard deviation of DV, as shown below:
N TIME DDAY MEAN-DV SD-DV
2 0 50 6.5 1.1
2 12 50 6.1 0.8
2 24 50 4.5 2.0
2 48 50 7.5 1.0
2 0 10 6.9 1.5
2 12 10 8.5 1.3
2 24 10 6.1 0.9
2 48 10 4.5 1.8
How do I do this in R?
You can use:
1) dplyr:
library(dplyr)
dat %.%
  group_by(TIME, DDAY) %.%
  summarise(MEAN_DV = mean(DV), SD_DV = sd(DV), N = length(DV))
# TIME DDAY MEAN_DV SD_DV N
# 1 48 10 6.10 0.7071068 2
# 2 24 10 5.60 1.4142136 2
# 3 12 10 6.50 0.1414214 2
# 4 0 10 7.60 1.4142136 2
# 5 48 50 7.35 0.3535534 2
# 6 24 50 7.60 2.8284271 2
# 7 12 50 6.45 0.4949747 2
# 8 0 50 5.10 2.1213203 2
where dat is the name of your data frame.
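Note that %.% comes from an early dplyr release and no longer works in current versions; the modern equivalent uses the %>% pipe and n() for the group size:
library(dplyr)
dat %>%
  group_by(TIME, DDAY) %>%
  summarise(MEAN_DV = mean(DV), SD_DV = sd(DV), N = n())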
2) data.table:
library(data.table)
DT <- as.data.table(dat)
DT[ , list(MEAN_DV = mean(DV), SD_DV = sd(DV), N = .N), by = c("TIME", "DDAY")]
# TIME DDAY MEAN_DV SD_DV N
# 1: 0 50 5.10 2.1213203 2
# 2: 12 50 6.45 0.4949747 2
# 3: 24 50 7.60 2.8284271 2
# 4: 48 50 7.35 0.3535534 2
# 5: 0 10 7.60 1.4142136 2
# 6: 12 10 6.50 0.1414214 2
# 7: 24 10 5.60 1.4142136 2
# 8: 48 10 6.10 0.7071068 2
require(plyr)
# THIS COLLAPSES ON TIME
ddply(df, .(TIME), summarize, MEAN_DV=mean(DV), SD_DV=sd(DV), N=length(DV))
# THIS COLLAPSES ON TIME AND DDAY
ddply(df, .(TIME, DDAY), summarize, MEAN_DV=mean(DV), SD_DV=sd(DV), N=length(DV))