dplyr summarize multiple column [duplicate] - r

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have a simple data frame with the following column names:
Subject # Type # Value0 # Value1 # Value2 # .... Value100
I want to use the dplyr summarize operation to get the mean of each Value column.
I think there is a useful alternative to
ddply(dataframe, c("Subject", "Type"), summarize, m1 = mean(Value1), m2 = mean(Value2), ....)
If I gather all the Value column names in a vector
names = c("Value0", "Value1", ...., "Value100")
how can I use this vector in ddply?

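We can use summarise_each (note that summarise_each and funs are deprecated in current dplyr, where across is the recommended replacement):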
library(dplyr)
df1 %>%
group_by(Subject, Type) %>%
summarise_each(funs(mean= mean(., na.rm=TRUE)))
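With dplyr 1.0.0 or later, the same idea can be written with across(), which also answers the question of how to use a vector of column names. This is only a minimal sketch: the toy data frame and the name value_cols are made up for illustration, and only two Value columns are shown.

```r
library(dplyr)

# Toy data standing in for the asker's data frame (hypothetical values)
df1 <- data.frame(
  Subject = c("s1", "s1", "s2", "s2"),
  Type    = c("a", "a", "b", "b"),
  Value0  = c(1, 3, 5, NA),
  Value1  = c(2, 4, 6, 8)
)

# Column names collected in a character vector (hypothetical name)
value_cols <- c("Value0", "Value1")

# all_of() selects exactly the columns named in the vector;
# the lambda passes na.rm = TRUE on to mean()
res <- df1 %>%
  group_by(Subject, Type) %>%
  summarise(across(all_of(value_cols), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(res)
```

If every non-grouping column should be summarised, across(everything(), ...) or across(starts_with("Value"), ...) avoids building the vector at all.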

Related

Grouping same values from a single column while retaining the data in [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
This is my current code for the data in this image: data[1:20, c("Job.Family", "Salaries", "Retirement")]. The goal is to group all identical jobs in the Job.Family column together without losing any of the data associated with them. For example, I would like to find the sum of "Salaries" and "Retirement" for everyone in the "Information System" Job.Family. Hopefully this makes sense.
You are probably looking for some very basic subsetting and summarising operations here.
I strongly recommend you study the dplyr package.
Your example (note the == for comparison, and sum because you asked for totals):
library(dplyr)
df %>% filter(Job.Family == "Information Systems") %>%
summarise(across(c(Salaries, Retirement), sum))
You may want to calculate this for all groups, as in:
df %>% group_by(Job.Family) %>%
summarise(across(c(Salaries, Retirement), sum))

Dplyr Version of ColSum or Dynamic Group_By in R [duplicate]

This question already has answers here:
Group by multiple columns and sum other multiple columns
(7 answers)
Closed 2 years ago.
I'm having a hard time working through something simple. I have a data frame where the first column is "Cat" and contains 3 different values, which I would like to group_by and summarize. Columns 2-5 are months, so 1 is the first month, 2 is the second month, etc. What I'm trying to do is group_by Cat and sum up the individual columns. I've tried working with colSums and aggregate. Any help would be greatly appreciated! Thanks
dff<-data.frame(Cat=c('A','B','C','A','A','A','B','C'),
'1'=c(10,20,30,80,10,15,20,15),
'2'=c(15,10,20,30,60,45,50,65),
'3'=c(10,20,30,80,20,25,27,85),
'4'=c(90,70,50,30,10,15,20,15),
'5'=c(1,120,3,8,7,10,25,30))
Using aggregate in base R
aggregate(. ~ Cat, dff, sum)
Or with dplyr
library(dplyr)
dff %>%
group_by(Cat) %>%
summarise(across(everything(), sum))

What is the right way to reference part of a dataframe after piping? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
What is the correct way to do something like this? I am trying to get the colSums of each group for specific columns. The . syntax seems incorrect with this type of subsetting.
csv<-data.frame(id_num=c(1,1,1,2,2),c(1,2,3,4,5),c(1,2,3,3,3))
temp<-csv%>%group_by(id_num)%>%colSums(.[,2:3],na.rm=T)
This can be done with summarise_each. More recent versions of dplyr also introduced summarise_at and summarise_if for convenience.
csv %>%
group_by(id_num) %>%
summarise_each(funs(sum))
csv %>%
group_by(id_num) %>%
summarise_at(2:3, sum)
If we are using column names, pass them to summarise_at as a character vector
csv %>%
group_by(id_num) %>%
summarise_at(names(csv)[-1], sum)
NOTE: In the OP's dataset, the column names for the 2nd and 3rd columns were not specified resulting in something like c.1..2..3..4..5.
Use vars to apply the function to bare (unquoted) column names
csv %>%
group_by(id_num) %>%
summarise_at(vars(c.1..2..3..4..5.), sum)
# # A tibble: 2 × 2
#   id_num c.1..2..3..4..5.
#    <dbl>            <dbl>
# 1      1                6
# 2      2                9
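In dplyr 1.0.0 and later, the summarise_at and summarise_if calls above can also be expressed with across(). A rough sketch, assuming the two unnamed columns are given the hypothetical names a and b so that the auto-generated names are not needed:

```r
library(dplyr)

# Same values as the OP's data, but with explicit (hypothetical) column names
csv <- data.frame(id_num = c(1, 1, 1, 2, 2),
                  a = c(1, 2, 3, 4, 5),
                  b = c(1, 2, 3, 3, 3))

# Counterpart of summarise_at(): name the columns to summarise
by_name <- csv %>%
  group_by(id_num) %>%
  summarise(across(c(a, b), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

# Counterpart of summarise_if(): pick columns by a predicate;
# the grouping column id_num is left out of across() automatically
by_type <- csv %>%
  group_by(id_num) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

print(by_name)
```

Piping a grouped data frame straight into colSums(), as in the question, does not work per group because colSums() is not group-aware; summarise() is what applies the function within each group.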

How to apply summarise_each to all columns except one? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am analyzing a set of data with many columns (almost 30 columns). I want to group data based on two columns and apply sum and mean functions to all the columns except timestamp.
How would I use summarise_each on all columns except timestamp?
This is the draft code I have, but it is obviously not correct. It also generates an error because it cannot apply sum to the POSIXt data type (Error: 'sum' not defined for "POSIXt" objects)
features <- dataset %>%
group_by(X, Y) %>%
summarise_each(funs(mean,sum)) %>%
arrange(TIMESTAMP)
Try summarise_each(funs(mean,sum), -TIMESTAMP) to exclude TIMESTAMP from the summarisation.
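For readers on dplyr 1.0.0 or later, the same exclusion can be sketched with across(). The data below is invented purely for illustration (V1 and V2 are hypothetical stand-ins for the roughly 30 measurement columns):

```r
library(dplyr)

# Hypothetical stand-in for the asker's dataset
dataset <- data.frame(
  X = c("a", "a", "b"),
  Y = c(1, 1, 2),
  TIMESTAMP = as.POSIXct(c("2020-01-01 00:00:00",
                           "2020-01-01 01:00:00",
                           "2020-01-02 00:00:00"), tz = "UTC"),
  V1 = c(1, 3, 10),
  V2 = c(2, 4, 20)
)

# -TIMESTAMP keeps every non-grouping column except TIMESTAMP;
# a named list of functions yields columns such as V1_mean and V1_sum
features <- dataset %>%
  group_by(X, Y) %>%
  summarise(across(-TIMESTAMP, list(mean = mean, sum = sum)),
            .groups = "drop")

print(features)
```

Because TIMESTAMP is excluded from the summary, it is no longer present in the result, so the arrange(TIMESTAMP) step from the draft code would have nothing to sort by and needs to be dropped or replaced.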

how to obtain summary of statistics for distinct values of a column in dataframe in R? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
Suppose we have a data.frame named IND with a column called dept. There are 100 rows in total and 20 distinct values in dept.
I would like to obtain summary statistics for each of these 20 subsets of the data.frame (5 rows each), starting from the main data.frame.
summary(IND) gives the summary statistics for the whole dataset, but what should I do in my case?
Something like this
mtcars %>% group_by(cyl) %>% summarise_each(funs(sum, mean))
can be used for your case as
IND %>% group_by(dept) %>% summarise_each(funs(sum, mean))
