Repeated measures bootstrap stats, grouped by multiple factors in R

I have a data frame that looks like this, but obviously with many more rows etc:
df <- data.frame(id=c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
                 cond=c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'),
                 comm=c('X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'),
                 measure=c(0.8, 1.1, 0.7, 1.2, 0.9, 2.3, 0.6, 1.1, 0.7, 1.3, 0.6, 1.5, 1.0, 2.1, 0.7, 1.2))
So we have 2 factors (each with 2 levels, thus 4 combinations) and one continuous measure. We also have a repeated-measures design, in that we have multiple measures within each cell that correspond to the same id.
I've attempted to first solve the groupby issue, then the bootstrap issue, then combine the two, but am pretty much stuck...
Stats, grouped by the 2 factors
I can get multiple summary stats for each of the 4 cells by:
summary_stats <- aggregate(df$measure,
                           by = list(df$cond, df$comm),
                           function(x) c(mean = mean(x), median = median(x), sd = sd(x)))
print(summary_stats)
resulting in
Group.1 Group.2 x.mean x.median x.sd
1 A X 0.85000000 0.85000000 0.12909944
2 B X 0.65000000 0.65000000 0.05773503
3 A Y 1.70000000 1.70000000 0.58878406
4 B Y 1.25000000 1.20000000 0.17320508
This is great as we are getting multiple stats for each of the 4 cells.
But what I'd really like are the 95% bootstrap CIs, for each stat, for each of the 4 cells. I don't mind if I have to run a final solution once per statistic (e.g. mean, median, etc.), but bonus points for doing it all in one go.
Bootstrap for repeated measures
Can't quite make this work, but what I want are 95% bootstrap CIs, done in a way that is appropriate for this repeated-measures design. Unless I'm mistaken, I want to select bootstrap samples on the basis of id (not on the basis of rows of the data frame), then calculate a summary measure (e.g. mean) for each of the 4 cells.
library(boot)
myfunc <- function(data, indices) {
  # select bootstrap sample to index into `id`
  d <- data[data$id==indicies,]
  return(c(mean=mean(d), median=median(d), sd = sd(d)))
}
bresults <- boot(data = CO2$uptake, statistic = myfunc, R = 1000)
Q1: I'm getting errors in selecting the bootstrap sample by id, i.e. the line d <- data[ data$id==indicies, ]
Combining bootstrap and the groupby 2 factors
Q2: I have no intuition of how to gel the two approaches together to achieve the final desired result. My only idea is to put the aggregate call in myfunc, to repeatedly calculate cell stats under each bootstrap replicate, but I'm out of my comfort zone with R here.

With your two questions, you have two issues:
How to bootstrap (resample) your data in such a way that you resample based on id, rather than rows
How to perform separate bootstraps for the four groups in your 2x2 design
One easy way to do this would be by using the following packages (the first two are part of the tidyverse):
dplyr for manipulating your data (in particular, summarising the data you have for each id), and for the neat %>% forward-pipe operator, which supplies the result of one expression as the first argument to the next so you can chain commands
broom for doing an operation for each group in your dataframe
boot (which you already use) for the bootstrapping
Load the packages:
library(dplyr)
library(broom)
library(boot)
First of all, so that resampling either includes or excludes each subject as a whole, I would save the various values each subject has as a list:
df <- df %>%
  group_by(id, cond, comm) %>%
  summarise(measure=list(measure)) %>%
  ungroup()
Now the dataframe has fewer rows (4 per ID), and the variable measure is not numeric anymore (instead, it's a list). This means we can just use the indices that boot provides (solving issue 1), but also that we'll have to "unlist" it when we actually want to do calculations with it, so your function now becomes:
myfunc <- function(data, indices) {
  data <- data[indices,]
  return(c(mean=mean(unlist(data$measure)),
           median=median(unlist(data$measure)),
           sd = sd(unlist(data$measure))))
}
Now that we can simply use boot to resample each row, we can think about how to do this neatly per group. This is where the broom package comes in: you can ask it to perform an operation for each group in your data frame and collect the results in a tidy data frame, with a row for each statistic in each of your groups. So we simply group the data frame again, and then call do(tidy(...)), with a . in place of the name of our data frame. This hopefully solves issue 2 for you!
bootresults <- df %>%
  group_by(cond, comm) %>%
  do(tidy(boot(data = ., statistic = myfunc, R = 1000)))
This produces:
# Groups: cond, comm [4]
cond comm term statistic bias std.error
<fctr> <fctr> <chr> <dbl> <dbl> <dbl>
1 A X mean 0.85000000 0.000000000 5.280581e-17
2 A X median 0.85000000 0.000000000 5.652979e-17
3 A X sd 0.12909944 -0.004704999 4.042676e-02
4 A Y mean 1.70000000 0.000000000 1.067735e-16
5 A Y median 1.70000000 0.000000000 1.072347e-16
6 A Y sd 0.58878406 -0.005074338 7.888294e-02
7 B X mean 0.65000000 0.000000000 0.000000e+00
8 B X median 0.65000000 0.000000000 0.000000e+00
9 B X sd 0.05773503 0.000000000 0.000000e+00
10 B Y mean 1.25000000 0.001000000 7.283065e-02
11 B Y median 1.20000000 0.027500000 7.729634e-02
12 B Y sd 0.17320508 -0.030022214 5.067446e-02
Hopefully this is what you'd like to see!
If you want to then use the values from this dataframe a bit more, you can use other dplyr functions to select which rows in this table you look at. For example, to look at the bootstrapped standard error of the standard deviation of your measure for condition A / X, you can do the following:
bootresults %>% filter(cond=='A', comm=='X', term=='sd') %>% pull(std.error)
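Since you asked for 95% CIs specifically: the table above only reports the bootstrap standard error. A minimal sketch of two options, assuming the bootresults, df and myfunc objects defined above (whether tidy() accepts conf.int for boot objects depends on your broom version):
# rough normal-approximation intervals from the standard errors
bootresults %>%
  mutate(conf.low  = statistic - 1.96 * std.error,
         conf.high = statistic + 1.96 * std.error)
# or, with recent broom versions, ask tidy() for percentile intervals directly
df %>%
  group_by(cond, comm) %>%
  do(tidy(boot(data = ., statistic = myfunc, R = 1000),
          conf.int = TRUE, conf.method = "perc"))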
I hope that helps!

For a bootstrap with a cluster variable, here's a solution that uses no additional packages; I didn't use the boot package.
Part 1: Bootstrap
This function draws a random sample from a set of clustered observations.
.clusterSample <- function(x, id){
  boot.id <- sample(unique(id), replace=T)
  out <- lapply(boot.id, function(i) x[id %in% i, ])
  return( do.call("rbind", out) )
}
Part 2: Bootstrap estimates and CIs
The next function draws multiple samples and applies the same aggregate statement to each of them. The bootstrap estimates and CIs are then obtained by mean and quantile.
clusterBoot <- function(data, formula, cluster, R=1000, alpha=.05, FUN){
  # cluster variable
  cls <- model.matrix(cluster, data)[, 2]
  template <- aggregate(formula, .clusterSample(data, cls), FUN)
  var <- which( names(template) == all.vars(formula)[1] )
  grp <- template[, -var, drop=F]
  val <- template[, var]
  x <- vapply( 1:R, FUN=function(r) aggregate(formula, .clusterSample(data, cls), FUN)[, var],
               FUN.VALUE=val )
  if(is.vector(x)) dim(x) <- c(1, 1, length(x))
  if(is.matrix(x)) dim(x) <- c(nrow(x), 1, ncol(x))
  # bootstrap estimates
  est <- apply( x, 1:2, mean )
  lo  <- apply( x, 1:2, function(i) quantile(i, alpha/2) )
  up  <- apply( x, 1:2, function(i) quantile(i, 1-alpha/2) )
  colnames(lo) <- paste0(colnames(lo), ".lo")
  colnames(up) <- paste0(colnames(up), ".up")
  return( cbind(grp, est, lo, up) )
}
Note the use of vapply. I use it because I prefer working with arrays over lists. Note also that I used the formula interface to aggregate, which I also like better.
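For comparison, the formula interface produces the same grouping as the list interface used in the question; a small sketch, using the original df from the question:
# list interface, as in the question
aggregate(df$measure, by = list(df$cond, df$comm), mean)
# equivalent formula interface
aggregate(measure ~ cond + comm, data = df, mean)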
Part 3: Examples
It can be used with basically any kind of statistic, even without grouping variables. Some examples:
myStats <- function(x) c(mean = mean(x), median = median(x), sd = sd(x))
clusterBoot(data=df, formula=measure~cond+comm, cluster=~id, R=10, FUN=myStats)
# cond comm mean median sd mean.lo median.lo sd.lo mean.up median.up sd.up
# 1 A X 0.85 0.850 0.11651125 0.85 0.85 0.05773503 0.85 0.85 0.17320508
# 2 B X 0.65 0.650 0.05773503 0.65 0.65 0.05773503 0.65 0.65 0.05773503
# 3 A Y 1.70 1.700 0.59461417 1.70 1.70 0.46188022 1.70 1.70 0.69282032
# 4 B Y 1.24 1.215 0.13856406 1.15 1.15 0.05773503 1.35 1.35 0.17320508
clusterBoot(data=df, formula=measure~cond+comm, cluster=~id, R=10, FUN=mean)
# cond comm est .lo .up
# 1 A X 0.85 0.85 0.85
# 2 B X 0.65 0.65 0.65
# 3 A Y 1.70 1.70 1.70
# 4 B Y 1.25 1.15 1.35
clusterBoot(data=df, formula=measure~1, cluster=~id, R=10, FUN=mean)
# est .lo .up
# 1 1.1125 1.0875 1.1375

Related

Create many ROC curves in R?

I have 150 columns of scores against 1 column of label (1/0).
My goal is to create 150 AUC scores.
Here is a manual example:
auc(roc(df$label, df$col1)),
auc(roc(df$label, df$col2)),
...
I can use Map/sapply/lapply here, but is there any other method or function?
This is a bit of an XY question. What you actually want to achieve is to speed up your calculation. gfgm's answer addresses that with parallelization, but that's only one way to go.
If, as I assume, you are using library(pROC)'s roc/auc functions, you can gain even more speed by selecting the appropriate algorithm for your dataset.
pROC comes with essentially two algorithms that scale very differently depending on the characteristics of your data set. You can benchmark which one is the fastest by passing algorithm=0 to roc:
# generate some toy data
label <- rbinom(600000, 1, 0.5)
score <- rpois(600000, 10)
library(pROC)
roc(label, score, algorithm=0)
Starting benchmark of algorithms 2 and 3, 10 iterations...
expr min lq mean median uq max neval
2 2 4805.58762 5827.75410 5910.40251 6036.52975 6085.8416 6620.733 10
3 3 98.46237 99.05378 99.52434 99.12077 100.0773 101.363 10
Selecting algorithm 3.
Here we select algorithm 3, which shines when the number of thresholds remains low. But if 600,000 data points take 5 minutes to compute, I strongly suspect that your data is very continuous (no measurements with identical values) and that you have about as many thresholds as data points (600,000). In this case you can skip directly to algorithm 2, which scales much better as the number of thresholds in the ROC curve increases.
You can then run:
auc(roc(df$label, df$col1, algorithm=2)),
auc(roc(df$label, df$col2, algorithm=2)),
On my machine each call to roc now takes about 5 seconds, pretty independently of the number of thresholds. This way you should be done in under 15 minutes total. Unless you have 50 cores or more this is going to be faster than just parallelizing. But of course you can do both...
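For instance, doing both might look something like this (a sketch, assuming a data frame df that holds the label column plus the 150 score columns; mc.cores = 4 is just an example value):
library(pROC)
library(parallel)
# one AUC per score column, computed in parallel, forcing algorithm 2
auc_res <- mclapply(df[setdiff(names(df), "label")],
                    function(col) auc(roc(df$label, col, algorithm = 2)),
                    mc.cores = 4)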
If you want to parallelize the computations you could do it like this:
# generate some toy data
label <- rbinom(1000, 1, .5)
scores <- matrix(runif(1000*150), ncol = 150)
df <- data.frame(label, scores)
library(pROC)
library(parallel)
auc(roc(df$label, df$X1))
#> Area under the curve: 0.5103
auc_res <- mclapply(df[,2:ncol(df)], function(row){auc(roc(df$label, row))})
head(auc_res)
#> $X1
#> Area under the curve: 0.5103
#>
#> $X2
#> Area under the curve: 0.5235
#>
#> $X3
#> Area under the curve: 0.5181
#>
#> $X4
#> Area under the curve: 0.5119
#>
#> $X5
#> Area under the curve: 0.5083
#>
#> $X6
#> Area under the curve: 0.5159
Since most of the computational time seems to be the call to auc(roc(...)) this should speed things up if you have a multi-core machine.
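If you then want the AUCs as a plain numeric vector rather than a list of auc objects, something like this should work (sketch; auc objects are numeric underneath):
auc_values <- sapply(auc_res, as.numeric)
head(auc_values)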
There's a function for doing that in the cutpointr package. It also calculates cutpoints and other metrics, but you can discard those. By default it will try all columns except the response column as predictors. Additionally, the direction of the ROC curve (whether larger values imply the positive class or the other way around) is determined automatically if you leave out direction, or you can set it manually.
dat <- iris[1:100, ]
library(tidyverse)
library(cutpointr)
mc <- multi_cutpointr(data = dat, class = "Species", pos_class = "versicolor",
                      silent = FALSE)
mc %>% select(variable, direction, AUC)
# A tibble: 4 x 3
variable direction AUC
<chr> <chr> <dbl>
1 Sepal.Length >= 0.933
2 Sepal.Width <= 0.925
3 Petal.Length >= 1.00
4 Petal.Width >= 1.00
By the way, the runtime shouldn't be a problem here because calculating the ROC-curve (even including a cutpoint) takes less than a second for one variable and one million observations using cutpointr or ROCR, so your task runs in about one or two minutes.
If memory is the limiting factor, parallelization will probably make that problem worse. If the above solution takes up too much memory, because it returns ROC-curves for all variables before dropping those columns, you can try selecting the columns of interest right away in a call to map:
# 600,000 observations for 150 variables and a binary outcome
predictors <- matrix(data = rnorm(150 * 6e5), ncol = 150)
dat <- as.data.frame(cbind(y = sample(0:1, size = 6e5, replace = T), predictors))
library(cutpointr)
library(tidyverse)
vars <- colnames(dat)[colnames(dat) != "y"]
result <- map_df(vars, function(coln) {
  cutpointr_(dat, x = coln, class = "y", silent = TRUE, pos_class = 1) %>%
    select(direction, AUC) %>%
    mutate(variable = coln)
})
result
# A tibble: 150 x 3
direction AUC variable
<chr> <dbl> <chr>
1 >= 0.500 V2
2 <= 0.501 V3
3 >= 0.501 V4
4 >= 0.501 V5
5 <= 0.501 V6
6 <= 0.500 V7
7 <= 0.500 V8
8 >= 0.502 V9
9 >= 0.501 V10
10 <= 0.500 V11
# ... with 140 more rows

Sampling using conditional probability table

I am trying to simulate a certain discrete variable depicting the "true state of the world" (say, "red", "green" or "blue") and an indicator that describes it somewhat imperfectly.
r_names <- c("R", "G", "B")
Let's say I have some prior belief about the distribution of the "reality" variable, which I will use to sample it.
r_probs <- c(0.3, 0.5, 0.2)
set.seed(100)
reality <- sample(seq_along(r_names), 10000, prob=r_probs, replace = TRUE)
Now, let's say I have a conditional probability table that stipulates the value of the indicator given each of the "realities":
ri_matrix <- matrix(c(0.7, 0.3, 0,
                      0.2, 0.6, 0.2,
                      0.05, 0.15, 0.8), byrow=TRUE, nrow = 3)
dimnames(ri_matrix) <- list(paste("real", r_names, sep="_"),
                            paste("ind", r_names, sep="_"))
ri_matrix
># ind_R ind_G ind_B
># real_R 0.70 0.30 0.0
># real_G 0.20 0.60 0.2
># real_B 0.05 0.15 0.8
Since base::sample() is not vectorized over the prob argument, I have to:
sample_cond <- function(r, rim){
  unlist(lapply(r, function(x)
    sample(seq_len(ncol(rim)), 1, prob = rim[x, ], replace = TRUE)))
}
Now I can sample my "indicator" variable using the conditional probability matrix
set.seed(200)
indicator <- sample_cond(reality, ri_matrix)
Just to make sure the distributions turned out as expected:
prop.table(table(reality, indicator), margin = 1)
#> indicator
#> reality 1 2 3
#> 1 0.70043610 0.29956390 0.00000000
#> 2 0.19976124 0.59331476 0.20692400
#> 3 0.04365278 0.14400401 0.81234320
Is there a better (i.e. more idiomatic and/or efficient) way to sample a discrete variable conditioned on another discrete random variable?
UPDATE:
As suggested by @Mr.Flick, this is at least 50x faster, because it reuses probability vectors instead of repeatedly subsetting the conditional probability matrix.
sample_cond_group <- function(r, rim){
  il <- mapply(function(x, y){ sample(seq(ncol(rim)), length(x), prob = y, replace = TRUE) },
               x = split(r, r),
               y = split(rim, seq(nrow(rim))))
  unsplit(il, r)
}
You can be a bit more efficient by drawing all the random samples per group with a split/combine type strategy. That might look something like this
simFun <- function(N, r_probs, ri_matrix) {
  stopifnot(length(r_probs) == nrow(ri_matrix))
  ind <- sample.int(length(r_probs), N, prob = r_probs, replace = TRUE)
  grp <- split(data.frame(ind), ind)
  unsplit(Map(function(data, r) {
    draw <- sample.int(ncol(ri_matrix), nrow(data), replace = TRUE, prob = ri_matrix[r, ])
    data.frame(data, draw)
  }, grp, as.numeric(names(grp))), ind)
}
Then you can call it with:
simFun(10000, r_probs, ri_matrix)
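To check that the conditional distribution comes out as intended, you can repeat the same check as in the question on the returned data frame (a sketch; simFun returns the columns ind and draw):
res <- simFun(10000, r_probs, ri_matrix)
prop.table(table(res$ind, res$draw), margin = 1)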

Mean and standard deviation of triplicated vector data

I have an experiment where I measured a bit less than 200 variables in triplicate. In other words, I have three vectors of ~ 200 values.
I want a quick way to determine if I should use mean or median for my calculations. I can do the mean easily ((v1 + v2 + v3) / 3), but how do I calculate the SD to have it in a vector of ~ 200 SDs? And what about the median?
After having these values, I need to do growth curves (measurements were taken over a certain period of time).
Here is a dplyr solution:
require(dplyr)
d <- data.frame(
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10)
)
d %>%
  rowwise() %>%
  mutate(
    mean = mean(c(x1, x2, x3)),
    median = median(c(x1, x2, x3)),
    sd = sd(c(x1, x2, x3))
  )
It sounds like you also have a substantive question about longitudinal data. If so, crossvalidated would be a good platform for this question.
apply is what you want. Have your vectors in a matrix, e.g.:
mydat <- matrix(rnorm(600), ncol = 3)
means <- apply(mydat, MARGIN = 1, mean) # MARGIN = 1 is rows, MARGIN = 2 would be columns...
sds <- apply(mydat, MARGIN = 1, sd)
medians <- apply(mydat, MARGIN = 1, median)
Though I have to say, with 3 values each, using median sounds pretty questionable.
A traditional for loop can also be used, though it is not preferred:
for(i in 1:nrow(d)) d[i,4]=mean(unlist(d[i,1:3]))
for(i in 1:nrow(d)) d[i,5]=sd(unlist(d[i,1:3]))
for(i in 1:nrow(d)) d[i,6]=median(unlist(d[i,1:3]))
names(d)[4:6]=c('meanval', 'sdval', 'medianval')
d
x1 x2 x3 meanval sdval medianval
1 -1.3230176 0.6956100 -0.7210798 -0.44949580 1.0363556 -0.7210798
2 -1.8931166 0.9047873 -1.0378874 -0.67540558 1.4337404 -1.0378874
3 -0.2137543 0.1846733 0.6410478 0.20398893 0.4277283 0.1846733
4 0.1371915 -1.0345325 -0.2260038 -0.37444827 0.5998009 -0.2260038
5 -0.8662465 -0.8229465 -0.2230030 -0.63739866 0.3595296 -0.8229465
6 -0.2918697 -1.3543493 1.3025262 -0.11456426 1.3372826 -0.2918697
7 -0.4931936 1.7186173 1.3757156 0.86704643 1.1904138 1.3757156
8 0.3982403 -0.3394208 1.9316059 0.66347514 1.1585131 0.3982403
9 -1.0332427 -0.3045905 1.1513260 -0.06216908 1.1122775 -0.3045905
10 -1.5603811 -0.1709146 -0.5409815 -0.75742575 0.7195765 -0.5409815
Using d from @DMC's answer.

split on factor, sapply, and lm [duplicate]

I want to apply lm() to observations grouped by subject, but cannot work out the sapply syntax. In the end, I want a data frame with 1 row for each subject, containing the intercept and slope (i.e., rows of: subj, lm$coefficients[1], lm$coefficients[2]).
set.seed(1)
subj <- rep(c("a","b","c"), 4) # 4 observations each on 3 experimental subjects
ind <- rnorm(12) #12 random numbers, the independent variable, the x axis
dep <- rnorm(12) + .5 #12 random numbers, the dependent variable, the y axis
df <- data.frame(subj=subj, ind=ind, dep=dep)
s <- (split(df,subj)) # create a list of observations by subject
I can pull a single set of observations from s, make a dataframe, and get what I want:
df2 <- as.data.frame(s[1])
df2
lm1 <- lm(df2$a.dep ~ df2$a.ind)
lm1$coefficients[1]
lm1$coefficients[2]
I am having trouble looping over all the elements of s and getting the data into the final form I want:
lm.list <- sapply(s, FUN = function(x)
  lm(x[, "dep"] ~ x[, "ind"]))
a <- as.data.frame(lm.list)
I feel like I need some kind of transpose of the structure below; the columns (a,b,c) are what I want my rows to be, but t(a) does not work.
head(a)
a
coefficients 0.1233519, 0.4610505
residuals 0.4471916, -0.3060402, 0.4460895, -0.5872409
effects -0.6325478, 0.6332422, 0.5343949, -0.7429069
rank 2
fitted.values 0.74977179, 0.09854505, -0.05843569, 0.47521446
assign 0, 1
b
coefficients 1.1220840, 0.2024222
residuals -0.04461432, 0.02124541, 0.27103003, -0.24766112
effects -2.0717363, 0.2228309, 0.2902311, -0.2302195
rank 2
fitted.values 1.1012775, 0.8433366, 1.1100777, 1.0887808
assign 0, 1
c
coefficients 0.2982019, 0.1900459
residuals -0.5606330, 1.0491990, 0.3908486, -0.8794147
effects -0.6742600, 0.2271767, 1.1273566, -1.0345665
rank 2
fitted.values 0.3718773, 0.2193339, 0.5072572, 0.2500516
assign 0, 1
By the sounds of it, this might be what you're trying to do:
sapply(s, FUN = function(x)
  lm(x[, "dep"] ~ x[, "ind"])$coefficients[c(1, 2)])
# a b c
# (Intercept) 0.71379430 -0.6817331 0.5717372
# x[, "ind"] 0.07125591 1.1452096 -1.0303726
Other alternatives, if this is what you're looking for
I've seen it noted that in general, if you're splitting and then using sapply/lapply, you can usually just jump straight to by and skip the split step:
do.call(rbind,
        by(data = df, INDICES = df$subj, FUN = function(x)
          lm(x[, "dep"] ~ x[, "ind"])$coefficients[c(1, 2)]))
# (Intercept) x[, "ind"]
# a 0.7137943 0.07125591
# b -0.6817331 1.14520962
# c 0.5717372 -1.03037257
Or, you can use one of the packages that lets you do such sorts of calculations more conveniently, like "data.table":
library(data.table)
DT <- data.table(df)
DT[, list(Int = lm(dep ~ ind)$coefficients[1],
          Slo = lm(dep ~ ind)$coefficients[2]), by = subj]
# subj Int Slo
# 1: a 0.7137943 0.07125591
# 2: b -0.6817331 1.14520962
# 3: c 0.5717372 -1.03037257
How about nlme::lmList?
library(nlme)
coef(lmList(dep~ind|subj,df))
## (Intercept) ind
## a 0.7137943 0.07125591
## b -0.6817331 1.14520962
## c 0.5717372 -1.03037257
You can transpose this if you want.
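For example, to get one column per subject instead of one row per subject:
t(coef(lmList(dep ~ ind | subj, df)))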

Per Second statistics in R

I have a file which contains Timestamps like this:
0.000100
0.003890
0.567980
0.999000
0.999990
1.000010
1.236800
1.456098
1.989001
2.098710
2.309879
2.890879
I want to find per-second statistics, like in the file above: 1st second: 5 values, 2nd second: 4, 3rd second: 3, using R. I also want to find the average per second, the maximum value across all seconds, and the minimum value across all seconds. How can these be extracted using R? I am a newbie to R and still learning. I know how to plot these in histograms, but don't know how to extract the values.
Data:
x <- c(0.0001, 0.00389, 0.56798, 0.999, 0.99999, 1.00001, 1.2368, 1.456098,
1.989001, 2.09871, 2.309879, 2.890879)
You can also use the cut function to create a factor (time range) and then use it with aggregate, in a similar fashion to what Justin proposes:
y <- data.frame(val=x, time=cut(x, 0:round(max(x))))
aggregate(val~time, y, length)
aggregate(val~time, y, mean)
Or create your own function and do it in one fell swoop:
funner <- function(x){
  c(mean = mean(x), n = length(x), min = min(x), max = max(x), sd = sd(x))
}
aggregate(val~time, y, funner)
yielding:
> aggregate(val~time, y, funner)
time val.mean val.n val.min val.max val.sd
1 (0,1] 0.5141920 5.0000000 0.0001000 0.9999900 0.4996575
2 (1,2] 1.4204773 4.0000000 1.0000100 1.9890010 0.4223025
3 (2,3] 2.4331560 3.0000000 2.0987100 2.8908790 0.4102205
You can do this using integer math:
x <- c(1e-04, 0.00389, 0.56798, 0.999, 0.99999, 1.00001, 1.2368, 1.456098,
1.989001, 2.09871, 2.309879, 2.890879)
> aggregate(x, list(x %/% 1), mean)
Group.1 x
1 0 0.514192
2 1 1.420477
3 2 2.433156
>
I would also suggest you look at the data.table and plyr packages for this sort of aggregation.
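For example, a data.table version of the same per-second summary might look like this (a sketch, using x as defined above):
library(data.table)
dt <- data.table(val = x, sec = x %/% 1)
dt[, .(n = .N, mean = mean(val), min = min(val), max = max(val)), by = sec]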
The max and min for each group follow fairly easily. If you just want the max or min of the series you can use those functions directly
> max(x)
[1] 2.890879
>
