This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am analyzing a set of data with many columns (almost 30 columns). I want to group data based on two columns and apply sum and mean functions to all the columns except timestamp.
How would I use summarise_each on all columns except timestamp?
This is the draft code I have, but it is obviously not correct. It also generates an error because it cannot apply sum to the POSIXt data type (Error: 'sum' not defined for "POSIXt" objects)
features <- dataset %>%
  group_by(X, Y) %>%
  summarise_each(funs(mean,sum)) %>%
  arrange(TIMESTAMP)
Try summarise_each(funs(mean,sum), -TIMESTAMP) to exclude TIMESTAMP from the summarisation.
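summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later, might look like the following. The small dataset is made up for illustration: only X, Y and TIMESTAMP come from the question, while the value columns A and B are hypothetical.

```r
library(dplyr)

# Hypothetical data: two grouping columns, a timestamp, two numeric columns
dataset <- tibble(
  X = c("a", "a", "b", "b"),
  Y = c("u", "u", "v", "v"),
  TIMESTAMP = as.POSIXct("2020-01-01", tz = "UTC") + 0:3,
  A = c(1, 2, 3, 4),
  B = c(10, 20, 30, 40)
)

features <- dataset %>%
  group_by(X, Y) %>%
  # across() never selects the grouping columns; -TIMESTAMP drops the timestamp
  summarise(across(-TIMESTAMP, list(mean = mean, sum = sum)), .groups = "drop")

# Result columns are named <column>_<function>, e.g. A_mean and A_sum
names(features)
```

Because TIMESTAMP is no longer present after the summary, the trailing arrange(TIMESTAMP) from the question would fail; sort on a column that survives the summary instead.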
This question already has answers here:
filtering within the summarise function of dplyr
(3 answers)
Opposite of %in%: exclude rows with values specified in a vector
(13 answers)
Closed 3 months ago.
EDIT: I want to specify which values NOT to include in my calculation by providing a list of values for records to skip. I do NOT want to provide a list of values to include in my calculation because my dataset is too large.
I want to group records based on a certain value, and then I want to do some other calculations for certain variables; however, I want to exclude certain values from one of those calculations. Here is an example of what the data transformation would look like without any exclusions:
library(dplyr)
grouped <- starwars %>%
  group_by(species) %>% # group my data by a particular value
  summarise(Total_Mass = sum(mass), # make a calculation
            Average_Height = mean(height)) # make another calculation
and here's what I am attempting to do:
exclude <- c("R2-D2","Luke","Darth") #make a list of the names of records I would like to exclude
grouped2 <- starwars %>%
group_by(species) %>%
summarise(Total_Mass = sum(mass) where name !%in% exclude, #sum mass for all records except those where name is in the exclude list
Average_Height = mean(height)) # make another calculation without any exclusions
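One way to express the "where" clause above is to subset the vector inside the aggregate itself, using !name %in% exclude as a logical index. The sketch below uses a small made-up table (chars) in place of starwars so the numbers can be checked by hand; the values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for starwars, small enough to verify by hand
chars <- tibble(
  name    = c("R2-D2", "C-3PO", "Luke", "Leia", "Darth"),
  species = c("Droid", "Droid", "Human", "Human", "Human"),
  mass    = c(32, 75, 77, 49, 136),
  height  = c(96, 167, 172, 150, 202)
)

exclude <- c("R2-D2", "Luke", "Darth")

grouped2 <- chars %>%
  group_by(species) %>%
  summarise(
    # sum only the masses whose name is NOT in the exclude vector
    Total_Mass = sum(mass[!name %in% exclude]),
    # no exclusions here: every row of the group is used
    Average_Height = mean(height)
  )
```

Two caveats when applying this to the real starwars data: %in% matches whole strings, so "Luke" would not match "Luke Skywalker", and mass and height contain missing values, so na.rm = TRUE is probably needed in both calls.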
This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
This is my current code for this image: data[1:20,c("Job.Family", "Salaries", "Retirement")]. The goal here is to group all the same jobs in the Job.Family column together without losing any data associated with them. So, for example, I would like to find out the sum of "Salaries" and "Retirement" for all those in the "Information System" Job.Family. Hopefully this makes sense.
You are probably looking for some very basic subsetting and summarising operations here.
I strongly recommend you study the dplyr package.
Your example:
library(dplyr)
# filter() needs the comparison operator ==, not the assignment-style =
df %>% filter(Job.Family == "Information Systems") %>%
  summarise(across(c(Salaries, Retirement), sum))
You may want to calculate this for all groups, as in:
df %>% group_by(Job.Family) %>%
  summarise(across(c(Salaries, Retirement), sum))
This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 7 years ago.
I am trying to calculate an overall mean across multiple classes. Currently the database is in long format. I tried selecting first the ID number (grouping variable 1), then a dummy variable (stem=1) marking the classes that I am interested in (grouping variable 2), and then calculating one GPA mean (i.e., the stem GPA mean) for the grades received in the classes of interest (stem=1).
I have attached an example of the database below. Overall, I am trying to figure out how to calculate the stem GPA for each student.
See example here
I have tried using library(psych), describeBy(data, dataset$id, dataset$stem), but to no avail. Any suggestions?
I prefer the dplyr package for these operations. Try e.g.
df %>% group_by(group_var) %>% summarise(mean_value = mean(value_var))
For instance, using the mtcars dataset:
library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(mean_disp = mean(disp))
will give you all the means of disp based on the grouping variable cyl.
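Applied to the question, the same pattern might look like the sketch below: filter to the STEM classes first, then group by student. The columns id and stem are taken from the question's describeBy() call; the grade column name and all values are assumptions, since the actual data is only shown as an image.

```r
library(dplyr)

# Hypothetical long-format data; `grade` is an assumed column name
dataset <- tibble(
  id    = c(1, 1, 1, 2, 2, 2),
  stem  = c(1, 1, 0, 1, 0, 0),
  grade = c(4.0, 3.0, 2.0, 3.0, 4.0, 1.0)
)

gpa_by_student <- dataset %>%
  filter(stem == 1) %>%   # keep only the STEM classes
  group_by(id) %>%        # one group per student
  summarise(stem_gpa = mean(grade, na.rm = TRUE))
```

Note that a student with no STEM classes at all would not appear in the result, because their rows are removed by filter() before grouping.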
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have a simple dataframe with the following column names:
Subject # Type # Value0 # value1# value2# ....value100
I want to use the dplyr summarize operation to get the mean of each value column.
I think there is a useful alternative to
ddply(dataframe, c("Subject,Type"), summarize, m1= mean(value1), m2=mean(value2)....)
If I gather all Value column name in a list
names =c("Value0,Value1,....Value100)
How can I use this list in ddply?
We can use summarise_each
library(dplyr)
df1 %>%
  group_by(Subject, Type) %>%
  summarise_each(funs(mean= mean(., na.rm=TRUE)))
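If the column names are already held in a character vector, as the question proposes, newer dplyr versions (1.0.0 or later) can use them directly through across() and all_of(). This is a sketch on made-up data with only two value columns; value_cols is a hypothetical name for the vector.

```r
library(dplyr)

# Hypothetical data with the column layout described in the question
df1 <- tibble(
  Subject = c("s1", "s1", "s2", "s2"),
  Type    = c("t1", "t1", "t1", "t1"),
  Value0  = c(1, 3, 5, NA),
  Value1  = c(2, 4, 6, 8)
)

# Column names kept in a character vector
value_cols <- c("Value0", "Value1")

means <- df1 %>%
  group_by(Subject, Type) %>%
  summarise(
    # all_of() selects exactly the columns named in the vector
    across(all_of(value_cols), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )
```

When every value column shares a prefix, a selection helper such as starts_with("Value") in place of all_of(value_cols) avoids building the vector at all.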
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have SKU-week data with different in-store activities (tactics). I want to summarize each variable by tactic. Here is the code in SAS:
proc sql;
create table lp.lp_sku_report1
as select distinct(tactic), sum(Sales_Stat_Case_10_Lt) as Sales_Stat_Case_10_Lt, sum(Sales_Units) as Sales_Units, sum(Sales_Dollars) as Sales_Dollars, sum(Baseline_Stat_Case_10_Lt) as Baseline_Stat_Case_10_Lt, sum(Baseline_Units) as Baseline_Units, sum(Baseline_Dollars) as Baseline_Dollars
from lp.lp_sku_data
group by tactic; quit;
With the dplyr package it's very easy to perform this action. First group by the variable tactic and then summarise the rest of the variables using the aggregating function sum.
In this specific case, since the aggregating function is the same for all the variables, you can use summarise_each to apply the same function to all of them.
Below is the code:
library(dplyr)
df <- df %>%
  group_by(tactic) %>%
  summarise_each(funs(sum))
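Summing every remaining column fails as soon as the data contains a non-numeric column. A hedged sketch of a more defensive variant, assuming dplyr 1.0.0 or later, restricts the sum to numeric columns with where(is.numeric). The rows below and the sku column are made up; only tactic, Sales_Units and Sales_Dollars appear in the question.

```r
library(dplyr)

# Hypothetical SKU-week rows; `sku` is a character column that sum() cannot handle
df <- tibble(
  tactic        = c("display", "display", "feature"),
  sku           = c("A1", "B2", "A1"),
  Sales_Units   = c(10, 5, 7),
  Sales_Dollars = c(100, 40, 70)
)

report <- df %>%
  group_by(tactic) %>%
  # sum only the numeric columns; non-numeric columns are left out of the result
  summarise(across(where(is.numeric), sum), .groups = "drop")
```

The result has one row per tactic, which corresponds to the GROUP BY in the SAS query.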