I am using glmmTMB to analyze a negative binomial generalized linear mixed model (GLMM) where the dependent variable is count data (CT), which is over-dispersed.
There are 115 samples (rows) in the relevant data frame. There are two fixed effects (F1, F2) and a random intercept (R), within which is nested a further random effect (NR). There is also an offset, consisting of the natural logarithm of the total counts in each sample (LOG_TOT).
An example of a data frame, df, is:
CT F1 F2 R NR LOG_TOT
77 0 0 1 1 12.9
167 0 0 2 6 13.7
289 0 0 3 11 13.9
253 0 0 4 16 13.9
125 0 0 5 21 13.7
109 0 0 6 26 13.6
96 1 0 1 2 13.1
169 1 0 2 7 13.7
190 1 0 3 12 13.8
258 1 0 4 17 13.9
101 1 0 5 22 13.5
94 1 0 6 27 13.5
89 1 25 1 4 13.0
166 1 25 2 9 13.6
175 1 25 3 14 13.7
221 1 25 4 19 13.8
131 1 25 5 24 13.5
118 1 25 6 29 13.6
58 1 75 1 5 12.9
123 1 75 2 10 13.4
197 1 75 3 15 13.7
208 1 75 4 20 13.8
113 1 8 1 3 13.2
125 1 8 2 8 13.7
182 1 8 3 13 13.7
224 1 8 4 18 13.9
104 1 8 5 23 13.5
116 1 8 6 28 13.7
122 2 0 1 2 13.1
115 2 0 2 7 13.6
149 2 0 3 12 13.7
270 2 0 4 17 14.1
116 2 0 5 22 13.5
94 2 0 6 27 13.7
73 2 25 1 4 12.8
61 2 25 2 9 13.0
185 2 25 3 14 13.8
159 2 25 4 19 13.7
125 2 25 5 24 13.6
75 2 25 6 29 13.5
121 2 8 1 3 13.0
143 2 8 2 8 13.8
219 2 8 3 13 13.9
191 2 8 4 18 13.7
98 2 8 5 23 13.5
115 2 8 6 28 13.6
110 3 0 1 2 12.8
123 3 0 2 7 13.6
210 3 0 3 12 13.9
354 3 0 4 17 14.4
160 3 0 5 22 13.7
101 3 0 6 27 13.6
69 3 25 1 4 12.6
112 3 25 2 9 13.5
258 3 25 3 14 13.8
174 3 25 4 19 13.5
171 3 25 5 24 13.9
117 3 25 6 29 13.7
38 3 75 1 5 12.1
222 3 75 2 10 14.1
204 3 75 3 15 13.5
235 3 75 4 20 13.7
241 3 75 5 25 13.8
141 3 75 6 30 13.9
113 3 8 1 3 12.9
90 3 8 2 8 13.5
276 3 8 3 13 14.1
199 3 8 4 18 13.8
111 3 8 5 23 13.6
109 3 8 6 28 13.7
135 4 0 1 2 13.1
144 4 0 2 7 13.6
289 4 0 3 12 14.2
395 4 0 4 17 14.6
154 4 0 5 22 13.7
148 4 0 6 27 13.8
58 4 25 1 4 12.8
136 4 25 2 9 13.8
288 4 25 3 14 14.0
113 4 25 4 19 13.5
162 4 25 5 24 13.7
172 4 25 6 29 14.1
2 4 75 1 5 12.3
246 4 75 3 15 13.7
247 4 75 4 20 13.9
114 4 8 1 3 13.1
107 4 8 2 8 13.6
209 4 8 3 13 14.0
190 4 8 4 18 13.9
127 4 8 5 23 13.5
101 4 8 6 28 13.7
167 6 0 1 2 13.4
131 6 0 2 7 13.5
369 6 0 3 12 14.5
434 6 0 4 17 14.9
172 6 0 5 22 13.8
126 6 0 6 27 13.8
90 6 25 1 4 13.1
172 6 25 2 9 13.7
330 6 25 3 14 14.2
131 6 25 4 19 13.7
151 6 25 5 24 13.9
141 6 25 6 29 14.2
7 6 75 1 5 12.2
194 6 75 2 10 14.2
280 6 75 3 15 13.7
253 6 75 4 20 13.8
45 6 75 5 25 13.4
155 6 75 6 30 13.9
208 6 8 1 3 13.5
97 6 8 2 8 13.5
325 6 8 3 13 14.3
235 6 8 4 18 14.1
112 6 8 5 23 13.6
188 6 8 6 28 14.1
The random and nested random effects are treated as factors. The fixed effect F1 takes the values 0, 1, 2, 3, 4 and 6. The fixed effect F2 takes the values 0, 8, 25 and 75. I am treating the fixed effects as continuous, rather than ordinal, because I would like to identify monotonic unidirectional changes in the dependent variable CT rather than up-and-down changes.
I previously used the lme4 package to analyze the data as a mixed model:
library(lme4)
m1 <- lmer(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE)
This was followed by glht in the multcomp package for post hoc analysis, employing the formula approach:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
"F1 == 0",
"F1 + 8*F1:F2 == 0",
"F1 + 25*F1:F2 == 0",
"F1 + 75*F1:F2 == 0",
"F1 + (27)*F1:F2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
"F2 + 1*F1:F2 == 0",
"F2 + 2*F1:F2 == 0",
"F2 + 3*F1:F2 == 0",
"F2 + 4*F1:F2 == 0",
"F2 + 6*F1:F2 == 0",
"F2 + (3.2)*F1:F2 == 0"))
glht_omni <- glht(m1)
Here is the corresponding negative binomial glmmTMB model, which I now prefer:
library(glmmTMB)
m2 <- glmmTMB(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE, family="nbinom2")
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (or its more recent successor, emmeans).
I followed Ben's suggestion, running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
and I can then use emmeans on the glmmTMB object. For example,
as.glht(emmeans(m2,~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. I can also change F1 and F2 to factors and then try this:
as.glht(emmeans(m2, ~(F1 + 27*F1:F2)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
0, 0 == 0 -6.721
1, 0 == 0 -6.621
2, 0 == 0 -6.342
3, 0 == 0 -6.740
4, 0 == 0 -6.474
6, 0 == 0 -6.967
0, 8 == 0 -6.694
1, 8 == 0 -6.651
2, 8 == 0 -6.227
3, 8 == 0 -6.812
4, 8 == 0 -6.371
6, 8 == 0 -6.920
0, 25 == 0 -6.653
1, 25 == 0 -6.648
2, 25 == 0 -6.282
3, 25 == 0 -6.766
4, 25 == 0 -6.338
6, 25 == 0 -6.702
0, 75 == 0 -6.470
1, 75 == 0 -6.642
2, 75 == 0 -6.091
3, 75 == 0 -6.531
4, 75 == 0 -5.762
6, 75 == 0 -6.612
But, again, I am not sure how to bend this output to my will. If some kind person could tell me how to correctly carry over the use of formulae in glht and linfct to the emmeans scenario with glmmTMB, I would be very grateful. I have read all the manuals and vignettes until I am blue in the face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?), I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
The glht software and post hoc testing carry over directly to the glmmADMB package, but glmmADMB is 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each involving 300,000 fits of the negative binomial mixed model, so speed is essential.
Many thanks for your suggestions and help!
The second argument (specs) to emmeans() is not the same as the linfct argument in glht(), so you can't use it in the same way. You have to call emmeans() the way it was intended. The as.glht() function converts the result to a glht object, but that is not really necessary, as the emmeans summary yields similar results.
I think the results you were trying to get are obtainable via
emmeans(m2, ~ F2, at = list(F2 = c(0, 8, 25, 75)))
(using the original model with the predictors as quantitative variables). This will compute the adjusted means holding F1 at its average, and at each of the specified values of F2.
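For example (a minimal sketch, continuing with the m2 model above), the resulting adjusted means can then be compared pairwise with contrast():
emm <- emmeans(m2, ~ F2, at = list(F2 = c(0, 8, 25, 75)))
contrast(emm, method = "pairwise")  # all pairwise comparisons, Tukey-adjusted by default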
Please look at the documentation for emmeans(). In addition, there are lots of vignettes that provide explanations and examples -- starting with https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html.
Following the advice of my excellent statistical consultant, I think the solution below provides what I had previously obtained using glht and linfct.
The slopes for F1 are calculated at the various levels of F2 by using contrast and emmeans to compute the difference in the dependent variable between two values of F1 separated by one unit (i.e. c(0,1)). (Since the regression is linear, the two values of F1 are arbitrary, provided they are separated by one unit, e.g. c(3,4).) Vice versa for the slopes of F2. Note that the model used throughout is the glmmTMB fit m2 defined above.
Thus, slopes of F1 at F2 = 0, 8, 25, 75 and 27 (27 is the average of F2):
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=0)), list(c(-1,1)))
(above equivalent to: summary(m2)$coefficients$cond["F1",])
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=8)), list(c(-1,1)))
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=25)), list(c(-1,1)))
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=75)), list(c(-1,1)))
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=27)), list(c(-1,1)))
and slopes of F2 at F1 = 0, 1, 2, 3, 4, 6 and 3.2 (3.2 is the average of F1, excluding the zero value):
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=0)), list(c(-1,1)))
(above equivalent to: summary(m2)$coefficients$cond["F2",])
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=1)), list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=2)), list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=3)), list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=4)), list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=6)), list(c(-1,1)))
contrast(emmeans(m2, specs="F2", at=list(F2=c(0,1), F1=3.2)), list(c(-1,1)))
Interaction of F1 and F2 slopes (the second difference) at F1 = 0 and F2 = 0:
contrast(emmeans(m2, specs=c("F1","F2"), at=list(F1=c(0,1), F2=c(0,1))), list(c(1,-1,-1,1)))
(above equivalent to: summary(m2)$coefficients$cond["F1:F2",])
From the resulting emmGrid objects returned by contrast(), one can pick out as desired: the estimate of the slope (estimate), the standard error of the estimated slope (SE), the Z score for the difference of the estimated slope from a null-hypothesized slope of zero (z.ratio, calculated by emmGrid as estimate divided by SE), and the corresponding P value (p.value, calculated by emmGrid as 2*pnorm(-abs(z.ratio))).
For example:
contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=0)), list(c(-1,1)))
yields:
NOTE: Results may be misleading due to involvement in interactions
contrast estimate SE df z.ratio p.value
c(-1, 1) 0.001971714 0.002616634 NA 0.754 0.4511
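To pull these numbers out programmatically rather than reading them off the printout, the contrast result can be summarized and indexed like a data frame (a minimal sketch using the same m2 model):
s <- summary(contrast(emmeans(m2, specs="F1", at=list(F1=c(0,1), F2=0)), list(c(-1,1))))
s$estimate  # the slope estimate
s$SE        # its standard error
s$z.ratio   # estimate / SE
s$p.value   # 2 * pnorm(-abs(z.ratio))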
Postscript added 1.25 years later:
The above gives the correct solutions but, as Russell Lenth pointed out, the answers are more easily obtained using emtrends. However, I have selected this answer as correct, since there may be some didactic value in showing how to calculate slopes with emmeans by finding the change in the predicted dependent variable when the independent variable changes by 1.
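For completeness, a sketch of the emtrends() route (same m2 model, with the F1 and F2 values used above):
library(emmeans)
# slope of F1 at selected values of F2 (27 is the mean of F2)
emtrends(m2, ~ F2, var = "F1", at = list(F2 = c(0, 8, 25, 75, 27)))
# slope of F2 at selected values of F1 (3.2 is the mean of the nonzero F1 values)
emtrends(m2, ~ F1, var = "F2", at = list(F1 = c(0, 1, 2, 3, 4, 6, 3.2)))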
Related
airquality
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9
Hi there,
How do I replace the values in Ozone with binary values: 0 if NA, 1 otherwise?
Thanks
H
Assuming your data frame is called airquality, either of these one-liners works:
airquality$Ozone <- ifelse(is.na(airquality$Ozone), 0, 1)
airquality$Ozone <- as.integer(!is.na(airquality$Ozone))
Alternatively, assign in two steps; note that the order matters (recode the non-NA values first, otherwise every element would end up as 1):
airquality$Ozone[!is.na(airquality$Ozone)] <- 1L
airquality$Ozone[is.na(airquality$Ozone)] <- 0L
I have a nested list which contains a set of data.frame objects; I now want to flatten it. I used the most common approach, the unlist method, but it did not properly flatten my list and the output was not well represented. How can I make this happen more efficiently? Does anyone know a trick for doing this operation? Thanks.
example:
mylist <- list(pass=list(Alpha.df1_yes=airquality[2:4,], Alpha.df2_yes=airquality[3:6,],Alpha.df3_yes=airquality[2:5,],Alpha.df4_yes=airquality[7:9,]),
fail=list(Alpha.df1_no=airquality[5:7,], Alpha.df2_no=airquality[8:10,], Alpha.df3_no=airquality[13:16,],Alpha.df4_no=airquality[11:13,]))
I tried this; it works, but the output is not properly arranged.
res <- lapply(mylist, unlist)
After flattening, I would like to merge them without duplication:
out <- lapply(res, rbind.data.frame)
my desired output:
mylist[[1]]$pass:
Ozone Solar.R Wind Temp Month Day
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
How can I make this sort of flattened output better represented? Can anyone propose a possible way of doing this in R? Thanks a lot.
Using lapply and duplicated:
res <- lapply(mylist, function(i){
  x <- do.call(rbind, i)     # stack this group's data frames
  x <- x[ !duplicated(x), ]  # drop duplicated rows (note the assignment back to x)
  rownames(x) <- NULL
  x
})
res$pass
# Ozone Solar.R Wind Temp Month Day
# 1 36 118 8.0 72 5 2
# 2 12 149 12.6 74 5 3
# 3 18 313 11.5 62 5 4
# 4 NA NA 14.3 56 5 5
# 5 28 NA 14.9 66 5 6
# 6 23 299 8.6 65 5 7
# 7 19 99 13.8 59 5 8
# 8 8 19 20.1 61 5 9
The above still returns a list. If we want to keep everything in one data frame, with no lists, then:
res <- do.call(rbind, unlist(mylist, recursive = FALSE))
res <- res[!duplicated(res), ]
res
# Ozone Solar.R Wind Temp Month Day
# pass.Alpha.df1_yes.2 36 118 8.0 72 5 2
# pass.Alpha.df1_yes.3 12 149 12.6 74 5 3
# pass.Alpha.df1_yes.4 18 313 11.5 62 5 4
# pass.Alpha.df2_yes.5 NA NA 14.3 56 5 5
# pass.Alpha.df2_yes.6 28 NA 14.9 66 5 6
# pass.Alpha.df4_yes.7 23 299 8.6 65 5 7
# pass.Alpha.df4_yes.8 19 99 13.8 59 5 8
# pass.Alpha.df4_yes.9 8 19 20.1 61 5 9
# fail.Alpha.df2_no.10 NA 194 8.6 69 5 10
# fail.Alpha.df3_no.13 11 290 9.2 66 5 13
# fail.Alpha.df3_no.14 14 274 10.9 68 5 14
# fail.Alpha.df3_no.15 18 65 13.2 58 5 15
# fail.Alpha.df3_no.16 14 334 11.5 64 5 16
# fail.Alpha.df4_no.11 7 NA 6.9 74 5 11
# fail.Alpha.df4_no.12 16 256 9.7 69 5 12
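If a tidyverse dependency is acceptable, the same result can be sketched more compactly with dplyr (assumed installed): bind_rows() stacks each group's data frames and distinct() drops the duplicated rows.
library(dplyr)
res <- lapply(mylist, function(i) distinct(bind_rows(i)))
res$pass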
So I have a dataset that includes the lung capacity of certain individuals. I am trying to analyze the data distributions and relations. The only problem is that the data is somewhat incomplete: some of the rows contain "N/A" as the lung capacity. This is causing an issue because the mean and sd of the different subsets always come out as NA. How can I form a subset that only includes the data that isn't N/A?
I've tried this:
fData1 = read.table("lung.txt",header=TRUE)
fData2= fData1[fData1$fev!="N/A"]
but this gives me an "undefined columns selected" error.
How can I make it so that I have a data set that excludes the rows with "N/A"?
Here is the beginning of my data set:
id age fev height male smoke
1 72 1.2840 66.5 1 1
2 81 2.5530 67.0 0 0
3 90 2.3830 67.0 1 0
4 72 2.6990 71.5 1 0
5 70 2.0310 62.5 0 0
6 72 2.4100 67.5 1 0
7 75 3.5860 69.0 1 0
8 75 2.9580 67.0 1 0
9 67 1.9160 62.5 0 0
10 70 NA 66.0 0 1
One option is to perform the operations excluding the NA values. Since your file codes missing values as "N/A", tell read.table to treat that string as NA too:
dat <- read.table("lung.txt", header = TRUE, na.strings = c("NA", "N/A"))
mean(dat$fev, na.rm=T) # mean of fev col
sd(dat$fev, na.rm=T)
If you simply want to get rid of the NAs:
fData1 <- na.omit(fData1)
fData1 <- na.exclude(fData1) # same result
If you'd like to pull out just the rows with NAs instead, here are two options:
fData2 <- fData1[is.na(fData1$fev), ]
fData2 <- subset(fData1, is.na(fData1$fev))
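Conversely, to drop only the rows where fev is missing (na.omit() above removes rows with an NA in any column), subset on that one column:
fData2 <- fData1[!is.na(fData1$fev), ]  # keep rows with a non-missing fev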
If you just want to filter out rows with NA values, you can use complete.cases():
> df
id age fev height male smoke
1 1 72 1.284 66.5 1 1
2 2 81 2.553 67.0 0 0
3 3 90 2.383 67.0 1 0
4 4 72 2.699 71.5 1 0
5 5 70 2.031 62.5 0 0
6 6 72 2.410 67.5 1 0
7 7 75 3.586 69.0 1 0
8 8 75 2.958 67.0 1 0
9 9 67 1.916 62.5 0 0
10 10 70 NA 66.0 0 1
> df[complete.cases(df), ]
id age fev height male smoke
1 1 72 1.284 66.5 1 1
2 2 81 2.553 67.0 0 0
3 3 90 2.383 67.0 1 0
4 4 72 2.699 71.5 1 0
5 5 70 2.031 62.5 0 0
6 6 72 2.410 67.5 1 0
7 7 75 3.586 69.0 1 0
8 8 75 2.958 67.0 1 0
9 9 67 1.916 62.5 0 0
I read a text file into R that looks like the sample below, with 1354896 rows and 5 columns.
I tried read.table() and read.delim() to load the file, but the format changes after loading: everything is transformed into a single column.
OffsetY=0
GridCornerUL=258 182
GridCornerUR=8450 210
GridCornerLR=8419 8443
GridCornerLL=228 8414
Axis-invertX=0
AxisInvertY=0
swapXY=0
DatHeader=[19..65528] PA-D 102 Full:CLS=8652 RWS=8652 XIN=1 YIN=1 VE=30 2.0 11/04/03 12:49:30 50205710 M10 HG-U133_Plus_2.1sq 6
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000
[INTENSITY]
NumberCells=1354896
CellHeader=X Y MEAN STDV NPIXELS
0 0 147.0 23.5 25
1 0 10015.0 1276.7 25
2 0 160.0 24.7 25
3 0 9710.0 1159.8 25
4 0 85.0 14.0 25
5 0 171.0 21.0 25
6 0 11648.0 1678.4 25
7 0 163.0 30.7 25
8 0 12044.0 1430.1 25
9 0 169.0 25.7 25
10 0 11646.0 1925.6 25
11 0 176.0 30.7 25
After reading, the format is changed.
I want to retain the format of rows and columns.
I want to remove all the content before [INTENSITY] (OffsetY, GridCornerUL, and so on) shown in the first file.
You could try the following (read.table handles the whitespace-separated fields; read.csv would leave everything in one column):
txt <- readLines("file.txt")
df <- read.table(text = txt[-(1:grep("NumberCells=\\d+", txt))], header = TRUE, check.names = FALSE)
write.csv(df, tf <- tempfile(fileext = ".csv"), row.names = FALSE)
read.csv(tf, check.names = FALSE) # just to verify...
# CellHeader=X Y MEAN STDV NPIXELS
# 1 0 0 147.0 23.5 25
# 2 1 0 10015.0 1276.7 25
# 3 2 0 160.0 24.7 25
# 4 3 0 9710.0 1159.8 25
# 5 4 0 85.0 14.0 25
# 6 5 0 171.0 21.0 25
# 7 6 0 11648.0 1678.4 25
# 8 7 0 163.0 30.7 25
# 9 8 0 12044.0 1430.1 25
# 10 9 0 169.0 25.7 25
# 11 10 0 11646.0 1925.6 25
# 12 11 0 176.0 30.7 25
This omits everything before and including NumberCells=1354896.
Since you are using Linux, another option would be to pipe the awk output into read.table or fread:
read.table(pipe("awk 'NR==1, /NumberCells/ {next}{print}' Hashim.txt"),
header=TRUE, check.names=FALSE)
# CellHeader=X Y MEAN STDV NPIXELS
#1 0 0 147 23.5 25
#2 1 0 10015 1276.7 25
#3 2 0 160 24.7 25
#4 3 0 9710 1159.8 25
#5 4 0 85 14.0 25
#6 5 0 171 21.0 25
#7 6 0 11648 1678.4 25
#8 7 0 163 30.7 25
#9 8 0 12044 1430.1 25
#10 9 0 169 25.7 25
#11 10 0 11646 1925.6 25
#12 11 0 176 30.7 25
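A sketch of the fread variant mentioned above (assumes the data.table package, whose recent versions take the shell command via the cmd argument):
library(data.table)
DT <- fread(cmd = "awk 'NR==1, /NumberCells/ {next}{print}' Hashim.txt")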
If NumberCells= always appears immediately before the header row, then you can exploit this to tell you the number of lines to skip:
dat<-readLines("file.txt")
read.table(textConnection(dat), header=TRUE, skip=grep("NumberCells", dat))
# CellHeader.X Y MEAN STDV NPIXELS
#1 0 0 147 23.5 25
#2 1 0 10015 1276.7 25
#3 2 0 160 24.7 25
#4 3 0 9710 1159.8 25
#5 4 0 85 14.0 25
#6 5 0 171 21.0 25
#7 6 0 11648 1678.4 25
#8 7 0 163 30.7 25
#9 8 0 12044 1430.1 25
#10 9 0 169 25.7 25
#11 10 0 11646 1925.6 25
#12 11 0 176 30.7 25
Edit
Because your files have a lot of rows, you may want to limit the number of lines that readLines reads in. To do this, you need to know the maximum number of lines before your header row. For instance, if you know your header row will always come within the first 200 lines of the file, you can do:
dat<-readLines("file.txt", n=200)
read.table("file.txt", header=TRUE, skip=grep("NumberCells", dat))
I have data collected for a few subjects, every 15 seconds over an hour, split up by periods. Here's what the data frame looks like: the time is "Temps", subjects are "Sujet" and the periods are determined by "Palier".
'data.frame': 2853 obs. of 22 variables:
$ Temps : Factor w/ 217 levels "00:15","00:30",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Sujet : int 1 1 1 1 1 1 1 1 1 1 ...
$ Test : Factor w/ 3 levels "VO2max","Tlim",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Palier : int 1 1 1 1 1 1 1 1 1 1 ...
$ RPE : int 8 8 8 8 8 8 8 8 8 8 ...
$ Rmec : num 39.1 27.5 23.3 21.5 20.3 21.7 20.5 20.7 20.2 20.1 ...
Here a glimpse of the data.frame:
Temps Sujet Test Palier RPE Rmec Pmec Pchim Fr Vt VE FeO2 FeCO2 VO2 VCO2 RER HR VO2rel VE.VO2 VE.VCO2
1 00:15 1 VO2max 1 8 39.1 185 473.6 19 1854 34.60 16.24 4.48 1353 1268 0.94 121 17.6 0.02557280 0.02728707
2 00:30 1 VO2max 1 8 27.5 185 672.4 17 2602 44.30 15.77 4.78 1921 1731 0.90 124 25.0 0.02306091 0.02559214
3 00:45 1 VO2max 1 8 23.3 185 794.5 18 2793 50.83 15.63 4.85 2270 2015 0.89 131 29.6 0.02239207 0.02522581
4 01:00 1 VO2max 1 8 21.5 185 860.3 20 2756 55.76 15.68 4.88 2458 2224 0.90 137 32.0 0.02268511 0.02507194
5 01:15 1 VO2max 1 8 20.3 185 909.3 23 2709 61.26 15.84 4.88 2598 2446 0.94 139 33.8 0.02357968 0.02504497
6 01:30 1 VO2max 1 8 21.7 185 853.7 21 2899 59.85 16.00 4.89 2439 2395 0.98 140 31.8 0.02453875 0.02498956
Each "Palier" lasts about 5 min and there are from 5 to 10 "Palier". For each subject and "Palier", I need to compute the mean for the last 2 min for all the variables. I haven't figured it out yet with dcast() or ddply(), but I am a newbie!
Any help would be much appreciated!
If you turned it into a data.table (which you'd have to install), you could do this with
library(data.table)
dt = as.data.table(d) # assuming your existing data frame was called d
last.two.min = dt[, mean(tail(Rmec, 9)), by = .(Sujet, Palier)]
This groups by both subject and "Palier", as the question asks, and assumes that your original data frame was called d and that you want the last 9 readings (since they come every 15 seconds; you might want the last 8 if you want 58:15 to 60:00).
I assumed that Rmec was the variable you wanted the mean of. If there are several variables for which you want the mean, you can do something like:
last.two.min = dt[, list(mean.Rmec = mean(tail(Rmec, 9)),
                         mean.RPE  = mean(tail(RPE, 9))), by = .(Sujet, Palier)]
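Since the question asks for the mean of all the variables, here is a sketch that generalizes the idea via .SD (it assumes every column other than the identifiers is numeric, and uses the last 8 readings for exactly two minutes):
measure.cols = setdiff(names(dt), c("Temps", "Sujet", "Test", "Palier"))
last.two.min = dt[, lapply(.SD, function(v) mean(tail(v, 8), na.rm = TRUE)),
                  by = .(Sujet, Palier), .SDcols = measure.cols]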