How can I perform following operation in R? [duplicate]

I want to calculate the mean (or any other summary statistic of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").
The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should have a value corresponding to the current group value - the data set should not be collapsed to one row per group. For example, consider group mean:
Before
id group value
 1 a        10
 2 a        20
 3 b       100
 4 b       200
After
id group value grp.mean.values
 1 a        10              15
 2 a        20              15
 3 b       100             150
 4 b       200             150

You may do this in dplyr using mutate:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value))
...or use data.table to assign the new column by reference (:=):
library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
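One difference between the two approaches above is easy to miss: the data.table call modifies df in place, whereas the dplyr pipeline returns a new object that has to be assigned. The following is a minimal sketch of the dplyr variant, assuming dplyr is installed; the data frame is rebuilt from the question, and the name res and the closing ungroup() are additions made here for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# mutate() keeps every row; assign the result, because df itself is not changed
res <- df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value)) %>%
  ungroup()  # drop the grouping so later steps are not computed per group

res$grp.mean.values
```

Without the assignment the result is only printed, and df still has three columns.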

Have a look at the ave function. Something like
df$grp.mean.values <- ave(df$value, df$group)
If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min:
df$grp.min <- ave(df$value, df$group, FUN = min)
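As a self-contained sketch of the ave approach, the snippet below rebuilds the question's data and adds several group-wise columns. The column names grp.n and grp.sum are made up for this example. Note that FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.

```r
# Example data from the question
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# Default FUN is mean
df$grp.mean.values <- ave(df$value, df$group)

# Any function returning a single value per group is recycled over that group
df$grp.n   <- ave(df$value, df$group, FUN = length)
df$grp.sum <- ave(df$value, df$group, FUN = sum)

df
```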

One option is to use plyr. ddply expects a data.frame (the first d) and returns a data.frame (the second d). Other XXply functions work in a similar way; e.g. ldply expects a list and returns a data.frame, dlply does the opposite, and so on. The second argument is the grouping variable(s). The third argument is the function applied to each group's subset; here it is transform, and grp.mean.values = mean(value) is passed on to it.
library(plyr)
ddply(dat, "group", transform, grp.mean.values = mean(value))
  id group value grp.mean.values
1  1     a    10              15
2  2     a    20              15
3  3     b   100             150
4  4     b   200             150

Here is another option using base functions aggregate and merge:
merge(x, aggregate(value ~ group, data = x, mean),
      by = "group")
group id value.x value.y
1 a 1 10 15
2 a 2 20 15
3 b 3 100 150
4 b 4 200 150
You can get "better" column names with suffixes:
merge(x, aggregate(value ~ group, data = x, mean),
      by = "group", suffixes = c("", ".mean"))
group id value value.mean
1 a 1 10 15
2 a 2 20 15
3 b 3 100 150
4 b 4 200 150
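The answer above does not define x, so here is a minimal runnable sketch of the same aggregate-plus-merge idea. The rows are deliberately shuffled (a hypothetical variation on the question's data) to show that merge sorts on the key by default, so the original row order may need to be restored afterwards. The names grp_means and res are illustrative.

```r
# Question's values, with rows shuffled to make the reordering visible
x <- data.frame(
  id    = 1:4,
  group = c("b", "a", "b", "a"),
  value = c(100, 10, 200, 20)
)

# One row per group: the collapsed summary
grp_means <- aggregate(value ~ group, data = x, FUN = mean)

# Join the summary back onto every row; the key column comes first in the result
res <- merge(x, grp_means, by = "group", suffixes = c("", ".mean"))

# merge() sorted the rows by group, so restore the original order by id
res <- res[order(res$id), ]
res
```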

Related

How to create a new variable conditioning on another variable in R? [duplicate]

Aggregate multiple rows in R based on common values in given columns by column indices

Here is a very similar question:
Aggregate multiple rows of the same data.frame in R based on common values in given columns
In my situation, the selection of columns changes between simulated samples, and I have the selected column indices for each simulation. How can I use the function aggregate with column indices instead of variable names? Namely, in the answer to that question, how can I use code like this:
c=c(1,2,3)
aggregate(value ~ df[,c], FUN = mean, data=df) # comparing to aggregate(value ~ item + size + weight, FUN = mean, data=df)
(Please note that the above line won't run in R.)
Thank you for any help!

Without the formula method, pass the 'value' column as the data to summarise, the grouping columns (selected by index) as the by argument, and specify the function
aggregate(df["value"], df[,c], FUN = mean)
#  item size weight value
#1    B    1      2     3
#2    C    3      2     1
#3    A    2      3     5
With the formula method, subset the data to the grouping columns plus the column we want the mean of, and use . on the right-hand side to mean all the other columns in that subset
aggregate(value ~ ., data = df[, c('value', names(df)[c])], mean)
# item size weight value
#1 B 1 2 3
#2 C 3 2 1
#3 A 2 3 5
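The two base R calls above can be sketched end to end as follows. The linked question's data is not reproduced here, so the data frame below is hypothetical, chosen only to give the same group means (A = 5, B = 3, C = 1). The index vector is named idx rather than c, an assumption made here so that the base function c() is not masked, and df[idx] is used instead of df[, idx] so that the by argument stays a list even when only one index is selected.

```r
# Hypothetical data with the same column layout as in the linked question
df <- data.frame(
  item   = c("A", "A", "B", "B", "C"),
  size   = c(2, 2, 1, 1, 3),
  weight = c(3, 3, 2, 2, 2),
  value  = c(4, 6, 2, 4, 1)
)

idx <- c(1, 2, 3)  # positions of the grouping columns

# Non-formula interface: data to summarise, then a list of grouping columns
res1 <- aggregate(df["value"], by = df[idx], FUN = mean)

# Formula interface: keep only the needed columns and group by "everything else"
res2 <- aggregate(value ~ ., data = df[, c("value", names(df)[idx])], FUN = mean)

res1
res2
```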
If we want to use dplyr, use group_by_at and pass the index vector c to it (group_by_at is superseded as of dplyr 1.0.0 but still works)
library(dplyr)
df %>%
  group_by_at(c) %>%
  # or extract column names, convert to symbols, and splice them in (!!!)
  # group_by(!!! rlang::syms(names(.)[c])) %>%
  summarise(value = mean(value))
# A tibble: 3 x 4
# Groups: item, size [?]
# item size weight value
# <fct> <int> <int> <dbl>
#1 A 2 3 5
#2 B 1 2 3
#3 C 3 2 1
NOTE: The input dataset is taken from the link in the OP's post
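For newer dplyr versions, the same grouping by position can be sketched with across() and all_of(), which replace the scoped group_by_at verb. This assumes dplyr 1.0.0 or later; the data frame is hypothetical (the linked data is not reproduced here), and idx is an illustrative name for the index vector.

```r
library(dplyr)

# Hypothetical data with the same column layout as in the linked question
df <- data.frame(
  item   = c("A", "A", "B", "B", "C"),
  size   = c(2, 2, 1, 1, 3),
  weight = c(3, 3, 2, 2, 2),
  value  = c(4, 6, 2, 4, 1)
)

idx <- c(1, 2, 3)  # positions of the grouping columns

res <- df %>%
  # translate positions to names, then group by exactly those columns
  group_by(across(all_of(names(df)[idx]))) %>%
  summarise(value = mean(value), .groups = "drop")

res
```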

R: Is there a function to create a new columns WITHIN a dataframe by calculate groupwise sums? [duplicate]

Use dplyr::percent_rank() to compute percentile ranks within group

Suppose I have the following data:
id grpvar1 grpvar2 value
1 1 3 7.6
2 1 2 4
...
3 1 5 2
For each id, I want to compute the percent_rank() of its value within the group defined by the combination of grpvar1 and grpvar2.
Using data.table, I would do the following (assuming my data is in a data.frame called dataf):
library(data.table)
# Make dataset into a data.table.
dt <- data.table(dataf)
# Calculate the percentiles.
dt[, percrank := rank(value)/length(value), by = c("grpvar1", "grpvar2")]
What is the equivalent in dplyr?

Try:
library(dplyr)
dataf %>%
  group_by(grpvar1, grpvar2) %>%
  mutate(percrank = rank(value) / length(value))
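The answer above translates the data.table expression literally. Since the question title mentions dplyr::percent_rank(), it may help to see both side by side: percent_rank() rescales the minimum rank to the interval from 0 to 1, that is (rank - 1) / (n - 1), so it does not give the same numbers as rank(value) / length(value). A minimal sketch, assuming dplyr is installed and using made-up data in the question's layout (the column name pr is illustrative):

```r
library(dplyr)

# Hypothetical data: two groups of three rows each
dataf <- data.frame(
  id      = 1:6,
  grpvar1 = c(1, 1, 1, 2, 2, 2),
  grpvar2 = c(3, 3, 3, 5, 5, 5),
  value   = c(7.6, 4, 2, 1, 3, 2)
)

res <- dataf %>%
  group_by(grpvar1, grpvar2) %>%
  mutate(
    percrank = rank(value) / length(value),  # literal translation: rank / n
    pr       = percent_rank(value)           # dplyr helper: (rank - 1) / (n - 1)
  ) %>%
  ungroup()

res
```

Which of the two is wanted depends on whether the lowest value in a group should map to 0 or to 1/n.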

Calculate group mean, sum, or other summary stats. and assign column to original data
