I am trying to calculate the mean vector for each reagent across the variables RBC, WBC, and hemoglobin. I am fairly new to R, so my question is: can you show me an easier way to do the following calculations in R? The data is from Table 6.19 of Rencher, and I am practicing the computations in R as I follow the examples in Rencher.
reagent.dat <- read.table("https://dl.dropboxusercontent.com/u/28713619/reagent.dat")
colnames(reagent.dat) <- c("reagent", "subject", "RBC", "WBC", "hemoglobin")
reagent.dat$reagent <- factor(reagent.dat$reagent)
reagent.dat$subject <- factor(reagent.dat$subject)
library(plyr)
library(dplyr)
library(reshape2)
# Calculate the means per variable, across reagents
reagent.datm <- melt(reagent.dat)
group.means <- ddply(reagent.datm, c("variable","reagent"), summarise,mean=mean(value))
group.means <- tbl_df(group.means)
newdata <- group.means %>% select(reagent, mean)
# Store the group means into a matrix
y_bar <- matrix(c(rep(NA, times=12)), ncol=4)
for (i in 1:4)
y_bar[,i] <- as.matrix(filter(newdata, reagent == i)$mean, ncol=1)
y_bar
The dplyr package can simplify your code considerably and is well worth learning because of how powerful it is. For example:
reagent.dat <- read.table("https://dl.dropboxusercontent.com/u/28713619/reagent.dat")
colnames(reagent.dat) <- c("reagent", "subject", "RBC", "WBC", "hemoglobin")
#Using dplyr
library(dplyr)
reagentmeans <- reagent.dat %>% select(reagent, RBC, WBC, hemoglobin) %>%
group_by(reagent) %>%
summarize(mean_RBC = mean(RBC), mean_WBC = mean(WBC),
mean_hemoglobin = mean(hemoglobin))
> reagentmeans
Source: local data frame [4 x 4]
reagent mean_RBC mean_WBC mean_hemoglobin
(fctr) (dbl) (dbl) (dbl)
1 1 7.290 4.9535 15.310
2 2 7.210 4.8985 15.725
3 3 7.055 4.8810 15.595
4 4 7.025 4.8915 15.765
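If you still want the result laid out like your y_bar matrix (one row per variable, one column per reagent), a possible follow-up using the reagentmeans tibble computed above; the mean_ column names are the ones created in the summarize step:
# Transpose the three mean columns so variables are rows and reagents are columns
y_bar <- t(as.matrix(reagentmeans[, c("mean_RBC", "mean_WBC", "mean_hemoglobin")]))
colnames(y_bar) <- reagentmeans$reagent
y_bar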
You can also use data.table:
library(data.table)
setDT(reagent.dat)[, lapply(.SD, mean), by = reagent, .SDcols = c('RBC', 'WBC', 'hemoglobin')]
# reagent RBC WBC hemoglobin
#1: 1 7.290 4.9535 15.310
#2: 2 7.210 4.8985 15.725
#3: 3 7.055 4.8810 15.595
#4: 4 7.025 4.8915 15.765
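For completeness, base R's aggregate can produce the same table through its formula interface, in case you prefer not to load extra packages:
# One row per reagent, with the mean of each blood variable
aggregate(cbind(RBC, WBC, hemoglobin) ~ reagent, data = reagent.dat, FUN = mean)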
I am a beginner in R and need to translate some EViews code into R. In EViews, a loop adds 10 or more columns/variables to the data by applying a formula.
Here is example EViews code that estimates deflators:
for %x exp con gov inv cap ex im
frml def_{%x} = gdp_{%x}/gdp_{%x}_r*100
next
I used the dplyr package with the mutate function, but it is very tedious to add many variables this way.
library(dplyr)
nominal_gdp<-rnorm(4)
nominal_inv<-rnorm(4)
nominal_gov<-rnorm(4)
nominal_exp<-rnorm(4)
real_gdp<-rnorm(4)
real_inv<-rnorm(4)
real_gov<-rnorm(4)
real_exp<-rnorm(4)
df<-data.frame(nominal_gdp,nominal_inv,
nominal_gov,nominal_exp,real_gdp,real_inv,real_gov,real_exp)
df<-df %>% mutate(deflator_gdp=nominal_gdp/real_gdp*100,
deflator_inv=nominal_inv/real_inv,
deflator_gov=nominal_gov/real_gov,
deflator_exp=nominal_exp/real_exp)
print(df)
Please help me do this in R with a loop.
The answer is that your data is not as "tidy" as it could be.
This is what you have (with an added observation ID for clarity):
library(dplyr)
df <- data.frame(nominal_gdp = rnorm(4),
nominal_inv = rnorm(4),
nominal_gov = rnorm(4),
real_gdp = rnorm(4),
real_inv = rnorm(4),
real_gov = rnorm(4))
df <- df %>%
mutate(obs_id = 1:n()) %>%
select(obs_id, everything())
which gives:
obs_id nominal_gdp nominal_inv nominal_gov real_gdp real_inv real_gov
1 1 -0.9692060 -1.5223055 -0.26966202 0.49057546 2.3253066 0.8761837
2 2 1.2696927 1.2591910 0.04238958 -1.51398652 -0.7209661 0.3021453
3 3 0.8415725 -0.1728212 0.98846942 -0.58743294 -0.7256786 0.5649908
4 4 -0.8235101 1.0500614 -0.49308092 0.04820723 -2.0697008 1.2478635
Consider if you had instead, in df2:
obs_id variable real nominal
1 1 gdp 0.49057546 -0.96920602
2 2 gdp -1.51398652 1.26969267
3 3 gdp -0.58743294 0.84157254
4 4 gdp 0.04820723 -0.82351006
5 1 inv 2.32530662 -1.52230550
6 2 inv -0.72096614 1.25919100
7 3 inv -0.72567857 -0.17282123
8 4 inv -2.06970078 1.05006136
9 1 gov 0.87618366 -0.26966202
10 2 gov 0.30214534 0.04238958
11 3 gov 0.56499079 0.98846942
12 4 gov 1.24786355 -0.49308092
Then what you want to do is trivial:
df2 %>% mutate(deflator = real / nominal)
obs_id variable real nominal deflator
1 1 gdp 0.49057546 -0.96920602 -0.50616221
2 2 gdp -1.51398652 1.26969267 -1.19240392
3 3 gdp -0.58743294 0.84157254 -0.69801819
4 4 gdp 0.04820723 -0.82351006 -0.05853872
5 1 inv 2.32530662 -1.52230550 -1.52749012
6 2 inv -0.72096614 1.25919100 -0.57256297
7 3 inv -0.72567857 -0.17282123 4.19901294
8 4 inv -2.06970078 1.05006136 -1.97102841
9 1 gov 0.87618366 -0.26966202 -3.24919196
10 2 gov 0.30214534 0.04238958 7.12782060
11 3 gov 0.56499079 0.98846942 0.57158146
12 4 gov 1.24786355 -0.49308092 -2.53074800
So the question becomes: how do we get to that nice, dplyr-friendly data frame?
You need to gather your data using tidyr::gather. However, because you have 2 sets of variables to gather (the real and nominal values), it is not straightforward. I have done it in two steps; there may be a better way, though.
real_vals <- df %>%
select(obs_id, starts_with("real")) %>%
# the line below is where the magic happens
tidyr::gather(variable, real, starts_with("real")) %>%
# extracting the variable name (by erasing up to the underscore)
mutate(variable = gsub(variable, pattern = ".*_", replacement = ""))
# Same thing for nominal values
nominal_vals <- df %>%
select(obs_id, starts_with("nominal")) %>%
tidyr::gather(variable, nominal, starts_with("nominal")) %>%
mutate(variable = gsub(variable, pattern = ".*_", replacement = ""))
# Merging them... Now we have something we can work with!
df2 <-
full_join(real_vals, nominal_vals, by = c("obs_id", "variable"))
Note the importance of the observation id when merging.
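For reference, tidyr 1.0.0 and later can do both gathers in one step with pivot_longer(); a sketch, assuming the columns follow the nominal_/real_ naming pattern above:
library(tidyr)
df2 <- df %>%
  pivot_longer(-obs_id,
               names_to = c(".value", "variable"),  # ".value" splits real/nominal into their own columns
               names_sep = "_")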
We can grep the matching names, and sort:
x <- colnames(df)
df[ sort(x[ (grepl("^nominal", x)) ]) ] /
df[ sort(x[ (grepl("^real", x)) ]) ] * 100
Similarly, if the columns were sorted, then we could just:
df[ 1:4 ] / df[ 5:8 ] * 100
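If you then want the ratios attached back to df with deflator_ names (mirroring the EViews output), one possible follow-up to the grep approach, assuming the same nominal_/real_ prefixes:
x <- colnames(df)
nom <- sort(x[grepl("^nominal", x)])
rea <- sort(x[grepl("^real", x)])
deflators <- df[nom] / df[rea] * 100
# rename e.g. nominal_gdp to deflator_gdp before binding the columns back on
names(deflators) <- sub("^nominal", "deflator", nom)
df <- cbind(df, deflators)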
We can loop over the column-name suffixes using purrr::map_dfc and then apply a custom function to the selected columns (i.e. the columns that match the current name from nms).
library(dplyr)
library(purrr)
#Replace anything before _ with empty string
nms <- unique(sub('.*_','',names(df)))
#Use map if you need the output as a list, not a data frame
map_dfc(nms, ~deflator_fun(df, .x))
Custom function
deflator_fun <- function(df, x){
#browser()
nx <- paste0('nominal_',x)
rx <- paste0('real_',x)
select(df, matches(x)) %>%
mutate(!!paste0('deflator_',quo_name(x)) := !!ensym(nx) / !!ensym(rx)*100)
}
#Test
deflator_fun(df, 'gdp')
nominal_gdp real_gdp deflator_gdp
1 -0.3332074 0.181303480 -183.78433
2 -1.0185754 -0.138891362 733.36121
3 -1.0717912 0.005764186 -18593.97398
4 0.3035286 0.385280401 78.78123
Note: quo_name, !!, and ensym are tools for programming with dplyr; see the "Programming with dplyr" vignette to learn more about them.
I am having trouble using combinations of ddply and merge to aggregate some variables. The data frame that I am using is really large, so I am putting an example below:
data_sample <- cbind.data.frame(c(123,123,123,321,321,134,145,000),
c('j', 'f','j','f','f','o','j','f'),
c(seq(110,180, by = 10)))
colnames(data_sample) <- c('Person','Expense_Type','Expense_Value')
I want to calculate, for each person, the percentage of the value of expense of type j on the person's total expense.
library(plyr)
data_sample2 <- ddply(data_sample, c('Person'), transform, total = sum(Expense_Value))
data_sample2 <- ddply(data_sample2, c('Person','Expense_Type'), transform, empresa = sum(Expense_Value))
This is what I've done to get total expenses by type, but the problem is that not all individuals have expenses of type j, so their percentage should be 0, and I don't know how to keep only one line per person with the percentage of total expenses of type j.
I may not have made myself clear.
Thank you!
We can use the by function:
by(data_sample, data_sample$Person, FUN = function(dat){
sum(dat[dat$Expense_Type == 'j',]$Expense_Value) / sum(dat$Expense_Value)
})
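by() returns a list-like object; if you would rather have a plain data frame with one row per person, a possible conversion (the Percent_J name is just for illustration):
pct <- by(data_sample, data_sample$Person, FUN = function(dat){
  sum(dat[dat$Expense_Type == 'j', ]$Expense_Value) / sum(dat$Expense_Value)
})
# collapse the by() result into a two-column data frame
data.frame(Person = dimnames(pct)[[1]], Percent_J = as.numeric(pct))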
We could also make use of the dplyr package:
library(dplyr)
data_sample %>%
group_by(Person) %>%
summarise(Percent_J = sum(ifelse(Expense_Type == 'j', Expense_Value, 0)) / sum(Expense_Value))
# A tibble: 5 × 2
Person Percent_J
<dbl> <dbl>
1 0 0.0000000
2 123 0.6666667
3 134 0.0000000
4 145 1.0000000
5 321 0.0000000
I know that there are many answers in this forum on how to get summary statistics (e.g. mean, se, N) for multiple groups using options like aggregate, ddply, or data.table. I'm not sure, however, how to apply these functions over multiple columns at once.
More specifically, I would like to know how to extend the following ddply command over multiple columns (dv1, dv2, dv3) without re-typing the code with different variable name each time.
library(reshape2)
library(plyr)
group1 <- c(rep(LETTERS[1:4], c(4,6,6,8)))
group2 <- c(rep(LETTERS[5:8], c(6,4,8,6)))
group3 <- c(rep(LETTERS[9:10], c(12,12)))
my.dat <- data.frame(group1, group2, group3, dv1=rnorm(24),dv2=rnorm(24),dv3=rnorm(24))
my.dat
data1 <- ddply(my.dat, c("group1", "group2","group3"), summarise,
N = length(dv1),
mean = mean(dv1,na.rm=T),
sd = sd(dv1,na.rm=T),
se = sd / sqrt(N)
)
data1
How can I apply this ddply function over multiple columns such that the outcome will be data1, data2, data3... for each outcome variable? I thought this could be the solution:
dfm <- melt(my.dat, id.vars = c("group1", "group2","group3"))
lapply(list(.(group1, variable), .(group2, variable),.(group3, variable)),
ddply, .data = dfm, .fun = summarize,
mean = mean(value),
sd = sd(value),
N=length(value),
se=sd/sqrt(N))
It looks like it's in the right direction, but it's not exactly what I need: this solution provides the statistics for each grouping variable separately. What I need is an outcome like data1 (e.g. the first aggregated group is people who are in A, E, and I; the second is those in B, E, and I; etc.).
Here's an illustration of reshaping your data first. I've written a custom function to improve readability:
mysummary <- function(x,na.rm=F){
res <- list(mean=mean(x, na.rm=na.rm),
sd=sd(x,na.rm=na.rm),
N=length(x))
res$se <- res$sd/sqrt(res$N)
res
}
library(data.table)
res <- melt(setDT(my.dat),id.vars=c("group1","group2","group3"))[,mysummary(value),
by=.(group1,group2,group3,variable)]
> head(res)
group1 group2 group3 variable mean sd N se
1: A E I dv1 9.75 6.994045 4 3.497023
2: B E I dv1 9.50 7.778175 2 5.500000
3: B F I dv1 16.00 4.082483 4 2.041241
4: C G I dv1 14.50 10.606602 2 7.500000
5: C G J dv1 10.75 10.372239 4 5.186119
6: D G J dv1 13.00 4.242641 2 3.000000
Or, without the custom function (thanks to @Jaap):
melt(setDT(my.dat),
id=c("group1","group2","group3"))[, .(mean = mean(value),
sd = sd(value),
n = .N,
se = sd(value)/sqrt(.N)),
.(group1, group2, group3, variable)]
If you don't want to melt into long format, you can also do:
library(data.table)
setDT(my.dat)[, as.list(unlist(lapply(.SD, function(x) list(mean = mean(x),
sd = sd(x),
n = .N,
se = sd(x)/sqrt(.N))))),
by = .(group1, group2, group3), .SDcols=c("dv1","dv2","dv3")]
which gives:
group1 group2 group3 dv1.mean dv1.sd dv1.n dv1.se dv2.mean dv2.sd dv2.n dv2.se dv3.mean dv3.sd dv3.n dv3.se
1: A E I 0.09959774 0.4704498 4 0.23522491 0.05020096 0.8098882 4 0.40494412 -0.134137210 0.7832841 4 0.3916420
2: B E I 0.72726477 0.3651544 2 0.25820315 0.73743314 1.4260172 2 1.00834641 -0.120188202 0.5532434 2 0.3912022
3: B F I -0.68661572 0.7212631 4 0.36063157 0.06670216 0.7678781 4 0.38393905 0.096275469 0.8993015 4 0.4496508
4: C G I -0.54577363 0.0798962 2 0.05649515 0.18293371 0.1022325 2 0.07228926 -0.947603264 2.3118016 2 1.6346906
5: C G J 0.17434075 0.8503874 4 0.42519369 -0.11485558 1.4184031 4 0.70920154 -0.005140781 0.6871591 4 0.3435796
6: D G J 0.17943465 0.4943486 2 0.34955725 -0.22223273 0.3679613 2 0.26018796 -0.373289114 1.0737512 2 0.7592568
7: D H J 0.38090937 0.7904832 6 0.32271340 0.02107597 1.0094695 6 0.41211422 0.118277330 0.9024006 6 0.3684035
Here is a solution using dplyr. This gives the result in a "wide" format (i.e. the stats for dv1, dv2, dv3 are on the same line).
se <- function(x) { sd(x)/sqrt(length(x)) }
my.dat %>%
group_by(group1, group2, group3) %>%
summarise_each(funs(mean, sd, length, se), dv1, dv2, dv3) %>%
ungroup
If having the stats for dv1, dv2, and dv3 on separate lines is desired, this can be modified using melt or gather (from tidyr).
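For example, a sketch of that long-format variant with gather, reusing the same my.dat and computing the standard error inline:
library(dplyr)
library(tidyr)
my.dat %>%
  gather(variable, value, dv1:dv3) %>%          # one row per dv per observation
  group_by(group1, group2, group3, variable) %>%
  summarise(mean = mean(value),
            sd = sd(value),
            N = n(),
            se = sd / sqrt(N)) %>%
  ungroup()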
I have a dataset with 500,000 entries. Each entry has a userId and a productId. I want to get all userIds corresponding to each distinct productId. But the dataset is so huge that neither of the following methods works for me; they are very slow. Is there a faster solution?
Using lapply (problem: it traverses the whole rpid vector for each element of uniqPids):
orderedIndx <- lapply(uniqPids, function(x){
which(rpid %in% x)
})
names(orderedIndx) <- uniqPids
#Looking for indices with each unique productIds
Using a for loop:
orderedIndx <- list()
for (j in seq_along(rpid)) {
  # append this row index to the entry for its product id
  existing <- length(orderedIndx[[rpid[j]]])
  orderedIndx[[rpid[j]]][existing + 1] <- j
}
Sample Data:
ruid[1:10]
# [1] "a3sgxh7auhu8gw" "a1d87f6zcve5nk" "abxlmwjixxain" "a395borc6fgvxv" "a1uqrsclf8gw1t" "adt0srk1mgoeu"
# [7] "a1sp2kvkfxxru1" "a3jrgqveqn31iq" "a1mzyo9tzk0bbi" "a21bt40vzccyt4"
rpid[1:10]
# [1] "b001e4kfg0" "b001e4kfg0" "b000lqoch0" "b000ua0qiq" "b006k2zz7k" "b006k2zz7k" "b006k2zz7k" "b006k2zz7k"
# [9] "b000e7l2r4" "b00171apva"
Output should be like:
b001e4kfg0 -> a3sgxh7auhu8gw, a1d87f6zcve5nk
b000lqoch0 -> abxlmwjixxain
b000ua0qiq -> a395borc6fgvxv
b006k2zz7k -> a1uqrsclf8gw1t, adt0srk1mgoeu, a1sp2kvkfxxru1, a3jrgqveqn31iq
b000e7l2r4 -> a1mzyo9tzk0bbi
b00171apva -> a21bt40vzccyt4
It seems perhaps you're just looking for split?
split(seq_along(rpid), rpid)
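And if you want the user IDs themselves rather than row indices, the same idea applies directly:
# a named list: one character vector of user IDs per product ID
split(ruid, rpid)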
Not exactly sure what type of output you want, or how many rows you have in your dataset, but I'd suggest 3 versions and you can choose the one you like. The first version uses dplyr and character values for your variables; I expect this to be slow if you have millions of rows. The second version uses dplyr but factor variables; I expect this to be faster than the previous one. The third version uses data.table; I expect this to be equally fast as, or faster than, the second version.
library(dplyr)
ruid =
c("a3sgxh7auhu8gw", "a1d87f6zcve5nk", "abxlmwjixxain", "a395borc6fgvxv",
"a1uqrsclf8gw1t", "adt0srk1mgoeu", "a1sp2kvkfxxru1", "a3jrgqveqn31iq",
"a1mzyo9tzk0bbi", "a21bt40vzccyt4")
rpid =
c("b001e4kfg0", "b001e4kfg0", "b000lqoch0", "b000ua0qiq", "b006k2zz7k",
"b006k2zz7k", "b006k2zz7k", "b006k2zz7k", "b000e7l2r4", "b00171apva")
### using dplyr and character values
dt = data.frame(rpid, ruid, stringsAsFactors = F)
dt %>%
group_by(rpid) %>%
do(data.frame(list_ruids = paste(c(.$ruid), collapse=", "))) %>%
ungroup
# rpid list_ruids
# (chr) (chr)
# 1 b000e7l2r4 a1mzyo9tzk0bbi
# 2 b000lqoch0 abxlmwjixxain
# 3 b000ua0qiq a395borc6fgvxv
# 4 b00171apva a21bt40vzccyt4
# 5 b001e4kfg0 a3sgxh7auhu8gw, a1d87f6zcve5nk
# 6 b006k2zz7k a1uqrsclf8gw1t, adt0srk1mgoeu, a1sp2kvkfxxru1, a3jrgqveqn31iq
# ----------------------------------
### using dplyr and factor values
dt = data.frame(rpid, ruid, stringsAsFactors = T)
dt %>%
group_by(rpid) %>%
do(data.frame(list_ruids = paste(c(levels(dt$ruid)[.$ruid]), collapse=", "))) %>%
ungroup
# rpid list_ruids
# (fctr) (chr)
# 1 b000e7l2r4 a1mzyo9tzk0bbi
# 2 b000lqoch0 abxlmwjixxain
# 3 b000ua0qiq a395borc6fgvxv
# 4 b00171apva a21bt40vzccyt4
# 5 b001e4kfg0 a3sgxh7auhu8gw, a1d87f6zcve5nk
# 6 b006k2zz7k a1uqrsclf8gw1t, adt0srk1mgoeu, a1sp2kvkfxxru1, a3jrgqveqn31iq
# -------------------------------------
library(data.table)
### using data.table
dt = data.table(rpid, ruid)
dt[, list(list_ruids = paste(c(ruid), collapse=", ")), by = rpid]
# rpid list_ruids
# 1: b001e4kfg0 a3sgxh7auhu8gw, a1d87f6zcve5nk
# 2: b000lqoch0 abxlmwjixxain
# 3: b000ua0qiq a395borc6fgvxv
# 4: b006k2zz7k a1uqrsclf8gw1t, adt0srk1mgoeu, a1sp2kvkfxxru1, a3jrgqveqn31iq
# 5: b000e7l2r4 a1mzyo9tzk0bbi
# 6: b00171apva a21bt40vzccyt4
Do you have tidy data in a data frame? Then you can do this:
library(dplyr)
df %>%
select(productId, userId) %>%
distinct
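If the goal is the productId -> userIds listing shown in the question, one way to build on that (a sketch; df, productId, and userId follow the names described in the question rather than the rpid/ruid vectors above):
df %>%
  distinct(productId, userId) %>%
  group_by(productId) %>%
  summarise(userIds = paste(userId, collapse = ", "))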
Here's my problem:
I am using a function that returns a named vector. Here's a toy example:
toy_fn <- function(x) {
y <- c(mean(x), sum(x), median(x), sd(x))
names(y) <- c("Right", "Wrong", "Unanswered", "Invalid")
y
}
I am using group_by in dplyr to apply this function for each group (typical split-apply-combine). So, here's my toy data.frame:
set.seed(1234567)
toy_df <- data.frame(id = 1:1000,
group = sample(letters, 1000, replace = TRUE),
value = runif(1000))
And here's the result I am aiming for:
toy_summary <-
toy_df %>%
group_by(group) %>%
summarize(Right = toy_fn(value)["Right"],
Wrong = toy_fn(value)["Wrong"],
Unanswered = toy_fn(value)["Unanswered"],
Invalid = toy_fn(value)["Invalid"])
> toy_summary
Source: local data frame [26 x 5]
group Right Wrong Unanswered Invalid
1 a 0.5038394 20.15358 0.5905526 0.2846468
2 b 0.5048040 15.64892 0.5163702 0.2994544
3 c 0.5029442 21.62660 0.5072733 0.2465612
4 d 0.5124601 14.86134 0.5382463 0.2681955
5 e 0.4649483 17.66804 0.4426197 0.3075080
6 f 0.5622644 12.36982 0.6330269 0.2850609
7 g 0.4675324 14.96104 0.4692404 0.2746589
It works! But it is just not cool to call the same function four times. I would rather have dplyr take the named vector and create a new variable for each element in the vector. Something like this:
toy_summary <-
toy_df %>%
group_by(group) %>%
summarize(toy_fn(value))
This, unfortunately, does not work because "Error: expecting a single value".
I thought, OK, let's just convert the vector to a data.frame using data.frame(as.list(x)). But this does not work either. I tried many things, but I couldn't trick dplyr into thinking it's actually receiving one single value (observation) for 4 different variables. Is there any way to help dplyr realize that?
One possible solution is to use dplyr's standard evaluation (SE) capabilities. For example, set up your summary expressions as follows:
dots <- setNames(list( ~ mean(value),
~ sum(value),
~ median(value),
~ sd(value)),
c("Right", "Wrong", "Unanswered", "Invalid"))
Then you can use summarize_ (with a _) as follows:
toy_df %>%
group_by(group) %>%
summarize_(.dots = dots)
# Source: local data table [26 x 5]
#
# group Right Wrong Unanswered Invalid
# 1 o 0.4490776 17.51403 0.4012057 0.2749956
# 2 s 0.5079569 15.23871 0.4663852 0.2555774
# 3 x 0.4620649 14.78608 0.4475117 0.2894502
# 4 a 0.5038394 20.15358 0.5905526 0.2846468
# 5 t 0.5041168 24.19761 0.5330790 0.3171022
# 6 m 0.4806628 21.14917 0.4805273 0.2825026
# 7 c 0.5029442 21.62660 0.5072733 0.2465612
# 8 w 0.4932484 17.75694 0.4891746 0.3309680
# 9 q 0.5350707 22.47297 0.5608505 0.2749941
# 10 g 0.4675324 14.96104 0.4692404 0.2746589
# .. ... ... ... ... ...
Though it looks nice, there is a big catch: you have to know a priori which column you are going to operate on (value here) when setting up dots, so it won't work on another column name unless you set up dots accordingly.
As a bonus, here's a simple data.table solution that uses your original function:
library(data.table)
setDT(toy_df)[, as.list(toy_fn(value)), by = group]
# group Right Wrong Unanswered Invalid
# 1: o 0.4490776 17.51403 0.4012057 0.2749956
# 2: s 0.5079569 15.23871 0.4663852 0.2555774
# 3: x 0.4620649 14.78608 0.4475117 0.2894502
# 4: a 0.5038394 20.15358 0.5905526 0.2846468
# 5: t 0.5041168 24.19761 0.5330790 0.3171022
# 6: m 0.4806628 21.14917 0.4805273 0.2825026
# 7: c 0.5029442 21.62660 0.5072733 0.2465612
# 8: w 0.4932484 17.75694 0.4891746 0.3309680
# 9: q 0.5350707 22.47297 0.5608505 0.2749941
# 10: g 0.4675324 14.96104 0.4692404 0.2746589
#...
You can also try this with do():
toy_df %>%
group_by(group) %>%
do(res = toy_fn(.$value))
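The res column this creates is a list column; if you would rather have the four statistics as ordinary columns, a small variation on the same do() idea:
toy_df %>%
  group_by(group) %>%
  # convert the named vector into a one-row data frame so do() spreads it into columns
  do(as.data.frame(as.list(toy_fn(.$value))))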
This is not a dplyr solution, but if you like pipes:
library(magrittr)
toy_summary <-
toy_df %>%
split(.$group) %>%
lapply( function(x) toy_fn(x$value) ) %>%
do.call(rbind, .)
# > head(toy_summary)
# Right Wrong Unanswered Invalid
# a 0.5038394 20.15358 0.5905526 0.2846468
# b 0.5048040 15.64892 0.5163702 0.2994544
# c 0.5029442 21.62660 0.5072733 0.2465612
# d 0.5124601 14.86134 0.5382463 0.2681955
# e 0.4649483 17.66804 0.4426197 0.3075080
# f 0.5622644 12.36982 0.6330269 0.2850609
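If you prefer the group as a column rather than as rownames, a possible follow-up to the matrix above:
toy_summary_df <- data.frame(group = rownames(toy_summary), toy_summary, row.names = NULL)
head(toy_summary_df)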
Apparently there's a problem when using median (not sure what's going on there), but apart from that you can normally use an approach like the following with summarise_each to apply multiple functions. Note that you can specify the names of the resulting columns by passing a named vector to funs_():
x <- c(Right = "mean", Wrong = "sd", Unanswered = "sum")
toy_df %>%
group_by(group) %>%
summarise_each(funs_(x), value)
#Source: local data frame [26 x 4]
#
# group Right Wrong Unanswered
#1 a 0.5038394 0.2846468 20.15358
#2 b 0.5048040 0.2994544 15.64892
#3 c 0.5029442 0.2465612 21.62660
#4 d 0.5124601 0.2681955 14.86134
#5 e 0.4649483 0.3075080 17.66804
#6 f 0.5622644 0.2850609 12.36982
#7 g 0.4675324 0.2746589 14.96104
#8 h 0.4921506 0.2879830 21.16248
#9 i 0.5443600 0.2945428 22.31876
#10 j 0.5276048 0.3236814 20.57659
#.. ... ... ... ...
Using list(as_tibble(as.list(...))) inside summarize, followed by unnest() from tidyr, does the trick:
library(tidyr)
toy_summary2 <- toy_df %>% group_by(group) %>%
  summarize(Col = list(as_tibble(as.list(toy_fn(value))))) %>% unnest()