R join same row and calculate mean value [duplicate]

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 7 years ago.
I have a data frame that looks like this:
data<-data.frame(y=c(1,1,2,2,3,4,5,5),x=c(5,5,10,30,5,10,4,8))
y x
1 1 5
2 1 5
3 2 10
4 2 30
5 3 5
6 4 10
7 5 4
8 5 8
How can I merge the rows that have the same value in the y column and set the x column to the mean of their x values?
I would like something like this:
y x
1 1 5
2 2 20
3 3 5
4 4 10
7 5 6
I'm trying:
unique(data)
But it only drops duplicated rows instead of averaging x over the rows that share the same y.

It is easy with dplyr. Like here:
library("dplyr")
data %>%
group_by(y) %>%
summarise(x=mean(x))

We can use aggregate
aggregate(x~y, data, mean)
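To make the aggregate() call above concrete, here is a minimal sketch. It assumes the data frame holds the x values shown in the question's printed table (5, 5, 10, 30, 5, 10, 4, 8); the names res and means are only illustrative.

```r
# Data as printed in the question (assumption: the printed table is the intended input)
data <- data.frame(y = c(1, 1, 2, 2, 3, 4, 5, 5),
                   x = c(5, 5, 10, 30, 5, 10, 4, 8))

# One row per distinct y, with x replaced by the mean of its group
res <- aggregate(x ~ y, data = data, FUN = mean)
print(res)

# tapply() computes the same group means, returned as a named vector
means <- tapply(data$x, data$y, mean)
print(means)
```

aggregate() is the better fit when a data frame is wanted back; tapply() is handy when only the per-group values are needed.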

Use plyr.
# Create dummy data.
nel = 30
df <- data.frame(x = round(5*runif(nel)), y= round(10*runif(nel)))
# Summarise means
require(plyr)
df$x <- as.factor(df$x)
res <- ddply(df, .(x), summarise, mu=mean(y))

Related

Sum of data in a column based on categorical condition from another column [duplicate]

This question already has answers here:
Idiomatic R code for partitioning a vector by an index and performing an operation on that partition
(3 answers)
Closed 9 years ago.
Suppose I have a data frame like this:
set.seed(123)
df <- as.data.frame(cbind(y<-sample(c("A","B","C"),10,T), X<-sample(c(1,2,3),10,T)))
df <- df[order(df$V1),]
Is there a simple function to sum (or apply any FUN to) V2 by V1 and add the result to df as a new column, such that:
df$sum <- c(6,6,8,8,8,8,6,6,6,6)
df
I could write a function for that, but I have to do this frequently, so it would be better to know the simplest way to do it.
I agree with @mnel at least on his first point. I didn't see ave demonstrated in the answers he cited and I think it's the "simplest" base-R method. Using that data.frame(cbind( ...)) construction should be outlawed and teachers who demonstrate it should be stripped of their credentials.
set.seed(123)
df <- data.frame(y = sample(c("A","B","C"), 10, TRUE),
                 X = sample(c(1,2,3), 10, TRUE))
df <- df[order(df$y), ] # this step is not necessary for ave() to work
df
df$sum <- ave(df$X, df$y, FUN=sum)
df
y X sum
1 A 3 6
6 A 3 6
3 B 3 8
7 B 1 8
9 B 1 8
10 B 3 8
2 C 2 6
4 C 2 6
5 C 1 6
8 C 1 6
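As a rough illustration of why ave() fits this task, the sketch below contrasts it with tapply(). The small data frame is hypothetical (fixed values rather than sample(), so the result does not depend on the R version or seed), and totals is just an illustrative name.

```r
# Hypothetical fixed data, so the result is reproducible without a seed
df <- data.frame(y = c("A", "B", "A", "C", "B"),
                 X = c(3, 1, 3, 2, 4))

# ave() returns a vector as long as its input, with each element replaced by
# the summary of its group, so it can be assigned directly as a new column
df$sum <- ave(df$X, df$y, FUN = sum)
print(df)

# tapply() returns one value per group instead, which would need to be
# expanded back to the rows before it could be stored as a column
totals <- tapply(df$X, df$y, sum)
print(totals)
```

Because ave() keeps the original row order, sorting the data frame first is optional.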

Condensing data frame with same names and different values [duplicate]

This question already has answers here:
How to use Aggregate function in R
(3 answers)
How to sum a variable by group
(18 answers)
Closed 5 years ago.
I have a data frame that I am trying to condense. There are multiple values of X with the same name but with different Y values associated with them:
X Y
1 a 1
2 b 3
3 a 2
4 c 4
5 b 7
I want to condense the data frame so there are no duplicate names in X, like below:
X Y
1 a 3
2 b 10
3 c 4
Using tidyverse:
library(tidyverse)
# Column names are case-sensitive: the data frame uses X and Y
df <- df %>%
group_by(X) %>%
summarise(Y = sum(Y))
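For comparison, a base R sketch of the same condensing step, assuming the data frame is built from the values printed in the question (the question does not show how df was created, so the constructor below is an assumption):

```r
# Data frame reconstructed from the printed table in the question
df <- data.frame(X = c("a", "b", "a", "c", "b"),
                 Y = c(1, 3, 2, 4, 7))

# Sum Y within each distinct value of X; one output row per name
res <- aggregate(Y ~ X, data = df, FUN = sum)
print(res)
```

This needs no extra packages, which can matter when the script has to run on a bare R installation.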

Apply a maximum value to whole group [duplicate]

This question already has answers here:
Aggregate a dataframe on a given column and display another column
(8 answers)
Closed 6 years ago.
I have a df like this:
Id count
1 0
1 5
1 7
2 5
2 10
3 2
3 5
3 4
and I want to get the maximum count and apply that to the whole "group" based on ID, like this:
Id count max_count
1 0 7
1 5 7
1 7 7
2 5 10
2 10 10
3 2 5
3 5 5
3 4 5
I've tried pmax, slice etc. I'm generally having trouble working with data that is in interval-specific form; if you could direct me to tools well-suited to that type of data, would really appreciate it!
Figured it out with help from Gavin Simpson here: Aggregate a dataframe on a given column and display another column
maxcount <- aggregate(count ~ Id, data = df, FUN = max)
names(maxcount)[2] <- "max_count" # rename so that merge() joins on Id only
new_df<-merge(df, maxcount, by = "Id")
Better way:
df$max_count <- with(df, ave(count, Id, FUN = max))
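One thing to watch with the aggregate() plus merge() route: merge() joins on every column name the two data frames share. The sketch below, built from the values printed in the question, shows what happens when the aggregated column is still called count, and how it compares with ave(). The names naive and merged are illustrative only.

```r
# Data frame reconstructed from the printed table in the question
df <- data.frame(Id = c(1, 1, 1, 2, 2, 3, 3, 3),
                 count = c(0, 5, 7, 5, 10, 2, 5, 4))

maxcount <- aggregate(count ~ Id, data = df, FUN = max)

# Both data frames share Id and count, so a plain merge joins on both columns
# and keeps only the rows that already hold their group's maximum
naive <- merge(df, maxcount)
print(nrow(naive))

# Renaming the aggregated column leaves Id as the only shared key
names(maxcount)[2] <- "max_count"
merged <- merge(df, maxcount, by = "Id")
print(merged)

# ave() gets the same column in one step, without a join
df$max_count <- with(df, ave(count, Id, FUN = max))
print(df)
```

The ave() version also leaves the row order untouched, whereas merge() sorts the result by the key.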

How to apply a function to grouped rows in R [duplicate]

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 6 years ago.
I have a data frame generated by
points_A = sample(1:6,6)
points_B = sample(1:6,6)
points_C = sample(1:6,6)
df <- data.frame( name = gl(3,2,labels=c("Luca","Mario","Paolo") ) , cbind(points_A,points_B,points_C) )
which displays as
name points_A points_B points_C
1 Luca 5 2 3
2 Luca 3 3 1
3 Mario 1 5 2
4 Mario 6 6 4
5 Paolo 4 4 5
6 Paolo 2 1 6
I would like to apply a function (e.g. sum() ) to the rows grouped by the column name (1st column).
The output should be something like:
name points_A points_B points_C
1 Luca 8 5 4
2 Mario 7 11 6
3 Paolo 6 5 11
Any suggestions?
I like to do these things with data.table
library(data.table)
dt <- data.table(df)
# General pattern: dt[, FUN(column), by = group], for example:
dt[, sum(points_A), by = name]
As "column" you can also set .SD to get multiple columns. "group" would be "name" in your example.
A (pretty raw) solution with data.table
require(data.table)
setDT(df)
df[, lapply(.SD, sum), by = name, .SDcols = 2:4]
name points_A points_B points_C
1: Luca 9 6 6
2: Mario 5 10 11
3: Paolo 7 5 4
EDIT:
A raw solution in base R:
t(sapply(split(df, df$name), function(x) colSums(x[, c("points_A", "points_B", "points_C")])))
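Two shorter base R alternatives, sketched on the values printed in the question (fixed here instead of drawn with sample(), so the totals match the asker's expected output). The names res and rs are illustrative.

```r
# Data frame reconstructed from the printed table in the question
df <- data.frame(name = gl(3, 2, labels = c("Luca", "Mario", "Paolo")),
                 points_A = c(5, 3, 1, 6, 4, 2),
                 points_B = c(2, 3, 5, 6, 4, 1),
                 points_C = c(3, 1, 2, 4, 5, 6))

# The dot on the left-hand side means "every column except the grouping one"
res <- aggregate(. ~ name, data = df, FUN = sum)
print(res)

# rowsum() is specific to sums; group labels end up as row names
rs <- rowsum(df[, -1], group = df$name)
print(rs)
```

aggregate() accepts any summary function through FUN, while rowsum() only sums, so the first form generalizes to mean(), max() and so on.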
