Dplyr Version of ColSum or Dynamic Group_By in R [duplicate] - r

This question already has answers here:
Group by multiple columns and sum other multiple columns
(7 answers)
Closed 2 years ago.
I'm having a hard time working through something simple. I have a data frame where the first column, "Cat", contains 3 different values that I would like to group_by and summarize. Columns 2-6 are months, so 1 is the first month, 2 is the second month, etc. What I'm trying to do is group_by Cat and sum up each of the month columns. I've tried working with colSums and aggregate. Any help would be greatly appreciated! Thanks
dff <- data.frame(Cat = c('A','B','C','A','A','A','B','C'),
                  '1' = c(10,20,30,80,10,15,20,15),
                  '2' = c(15,10,20,30,60,45,50,65),
                  '3' = c(10,20,30,80,20,25,27,85),
                  '4' = c(90,70,50,30,10,15,20,15),
                  '5' = c(1,120,3,8,7,10,25,30))

Using aggregate in base R
aggregate(. ~ Cat, dff, sum)
Or with dplyr
library(dplyr)
dff %>%
  group_by(Cat) %>%
  summarise(across(everything(), sum))
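As a quick check of both approaches on the question's data, here is a minimal sketch. One detail worth knowing: with the default check.names = TRUE, data.frame() turns the names '1' to '5' into X1 to X5, so those are the column names that appear in the output. The object names by_base and by_dplyr are only illustrative.

```r
library(dplyr)

# The data frame from the question; the month columns end up named X1..X5
dff <- data.frame(Cat = c('A','B','C','A','A','A','B','C'),
                  '1' = c(10,20,30,80,10,15,20,15),
                  '2' = c(15,10,20,30,60,45,50,65),
                  '3' = c(10,20,30,80,20,25,27,85),
                  '4' = c(90,70,50,30,10,15,20,15),
                  '5' = c(1,120,3,8,7,10,25,30))

names(dff)
# "Cat" "X1" "X2" "X3" "X4" "X5"

# Base R: sum every other column within each level of Cat
by_base <- aggregate(. ~ Cat, dff, sum)

# dplyr: everything() skips the grouping column automatically
by_dplyr <- dff %>%
  group_by(Cat) %>%
  summarise(across(everything(), sum))

by_base
#   Cat  X1  X2  X3  X4  X5
# 1   A 115 150 135 145  26
# 2   B  40  60  47  90 145
# 3   C  45  85 115  65  33
```

Both calls should give the same totals; the dplyr version returns a tibble rather than a plain data frame.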

Related

Grouping same values from a single column while retaining the data in [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
This is my current code for viewing the data: data[1:20,c("Job.Family", "Salaries", "Retirement")]. The goal here is to group all the same jobs in the Job.Family column together without losing any data associated with them. So, for example, I would like to find out the sum of "Salaries" and "Retirement" for all those in the "Information Systems" Job.Family. Hopefully this makes sense.
You are probably looking for some very basic subsetting and summarising operations here.
I strongly recommend you study the dplyr package.
Your example:
library(dplyr)
df %>% filter(Job.Family == "Information Systems") %>%
  summarise(across(c(Salaries, Retirement), sum))
You may want to calculate this for all groups, as in:
df %>% group_by(Job.Family) %>%
  summarise(across(c(Salaries, Retirement), sum))

Conditional sum of a column according to the value of another column when grouping [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 1 year ago.
I'm trying to sum PONDERA when ESTADO==1 and then group by AGLOMERADO
new <- recorte %>% group_by(AGLOMERADO) %>%
summarise(TOTocupied=sum(recorte[recorte$ESTADO==1,"PONDERA"]))
The sum itself is computed correctly, but I can't get the result to be grouped by AGLOMERADO; it gives me back the same result for each AGLOMERADO:
AGLOMERADO TOTocupied
1 100
2 100
3 100
What am I doing wrong?
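Don't use $ inside a dplyr pipe. recorte$ESTADO and recorte[...] refer to the whole data frame and ignore the grouping, which is why every AGLOMERADO gets the same total. There is also no need to refer to the data frame again, since the pipe already passes it along; use the bare column names instead.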
You can try -
library(dplyr)
new <- recorte %>%
  group_by(AGLOMERADO) %>%
  summarise(TOTocupied = sum(PONDERA[ESTADO == 1], na.rm = TRUE))
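To see the difference between the two versions, here is a small sketch on made-up data. The values in recorte below are hypothetical and only stand in for the real survey data; wrong and right are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data
recorte <- data.frame(
  AGLOMERADO = c(1, 1, 2, 2, 3),
  ESTADO     = c(1, 2, 1, 1, 3),
  PONDERA    = c(10, 20, 30, 40, 50)
)

# Version from the question: recorte[...] looks at the full data frame,
# so the same overall total is repeated for every group
wrong <- recorte %>%
  group_by(AGLOMERADO) %>%
  summarise(TOTocupied = sum(recorte[recorte$ESTADO == 1, "PONDERA"]))

# Version from the answer: bare column names are evaluated within each group
right <- recorte %>%
  group_by(AGLOMERADO) %>%
  summarise(TOTocupied = sum(PONDERA[ESTADO == 1], na.rm = TRUE))

wrong$TOTocupied
# 80 80 80
right$TOTocupied
# 10 70 0
```

A group with no rows where ESTADO is 1 gets a total of 0, because the sum of an empty vector is 0.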

How to apply summarise_each to all columns except one? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am analyzing a set of data with many columns (almost 30 columns). I want to group data based on two columns and apply sum and mean functions to all the columns except timestamp.
How would I use summarise_each on all columns except timestamp?
This is the draft code I have, but it is obviously not correct. Plus, it generates an error because it cannot apply sum to the POSIXt data type (Error: 'sum' not defined for "POSIXt" objects)
features <- dataset %>%
group_by(X, Y) %>%
summarise_each(funs(mean,sum)) %>%
arrange(TIMESTAMP)
Try summarise_each(funs(mean,sum), -TIMESTAMP) to exclude TIMESTAMP from the summarisation.
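Note that summarise_each() and funs() have since been deprecated in dplyr; from dplyr 1.0.0 onwards the same idea is usually written with across(). The following is only a sketch of that equivalent: the data frame and its speed and load columns are hypothetical stand-ins for the asker's dataset.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`
dataset <- data.frame(
  X = c("a", "a", "b"),
  Y = c("u", "u", "v"),
  TIMESTAMP = as.POSIXct(c("2020-01-01 00:00:00",
                           "2020-01-01 01:00:00",
                           "2020-01-01 02:00:00"), tz = "UTC"),
  speed = c(1, 3, 5),
  load  = c(10, 20, 30)
)

# -TIMESTAMP selects every non-grouping column except TIMESTAMP;
# the named list produces columns such as speed_mean and speed_sum
features <- dataset %>%
  group_by(X, Y) %>%
  summarise(across(-TIMESTAMP, list(mean = mean, sum = sum)),
            .groups = "drop")

names(features)
# "X" "Y" "speed_mean" "speed_sum" "load_mean" "load_sum"
```

Because TIMESTAMP is excluded from the summary, it no longer exists in the result, so a trailing arrange(TIMESTAMP) would have to be dropped or replaced by sorting on another column.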

dplyr summarize multiple column [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have a simple dataframe with the following column names
Subject # Type # Value0 # value1# value2# ....value100
I want to use the dplyr summarize operation in order to get the mean of each of the value columns.
I think there must be a better alternative to
ddply(dataframe, c("Subject", "Type"), summarize, m1 = mean(value1), m2 = mean(value2), ....)
If I gather all the Value column names in a vector
names = c("Value0", "Value1", ...., "Value100")
How can I use this list in ddply?
We can use summarise_each
library(dplyr)
df1 %>%
  group_by(Subject, Type) %>%
  summarise_each(funs(mean = mean(., na.rm = TRUE)))
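In current dplyr (1.0.0 and later), where summarise_each() and funs() are deprecated, the same summary can be sketched with across(). This also covers the part of the question about reusing a vector of column names: all_of() accepts such a vector directly. The data and the name value_cols below are hypothetical, with only two value columns for brevity.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data, with two value columns
df1 <- data.frame(
  Subject = c("s1", "s1", "s2", "s2"),
  Type    = c("a", "a", "b", "b"),
  Value0  = c(1, 3, NA, 4),
  Value1  = c(10, 20, 30, 50)
)

# Column names held in a character vector, as in the question
value_cols <- paste0("Value", 0:1)

# all_of() selects exactly the named columns; the lambda passes na.rm = TRUE
res <- df1 %>%
  group_by(Subject, Type) %>%
  summarise(across(all_of(value_cols), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

res$Value0
# 2 4
res$Value1
# 15 40
```

If the columns share a prefix, starts_with("Value") would be an alternative to building the name vector by hand.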

how to obtain summary of statistics for distinct values of a column in dataframe in R? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
Consider we have a data.frame named IND, in which we have a column called dept. There are in total 100 rows and there are 20 distinct values in dept.
Now I would like to obtain the summary statistics for these 20 subsets of data.frame containing 5 rows each using the main data.frame!
summary(IND) gives the summary statistics for whole dataset but what should I do in my case?
Something like this
mtcars %>% group_by(cyl) %>% summarise_each(funs(sum, mean))
can be used for your case as
IND %>% group_by(dept) %>% summarise_each(funs(sum, mean))
