Calculate mean of multiple rows using grouping variables [duplicate] - r

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 7 years ago.
I am trying to calculate an overall mean across multiple classes. The data is currently in long format. My plan was to group first by ID number (grouping variable 1), then by a dummy variable marking the classes I am interested in (stem = 1, grouping variable 2), and then to calculate a single GPA mean (i.e., a STEM GPA) from the grades received in those classes.
I have attached an example of the data below. Overall, I am trying to figure out how to calculate the STEM GPA for each student.
See example here
I have tried using library(psych), describeBy(data, dataset$id, dataset$stem), but to no avail. Any suggestions?

I prefer the dplyr package for these operations. Try e.g. (with grade standing in for whatever your grade column is called)
df %>% group_by(id) %>% summarise(mean_grade = mean(grade))
For instance, using the mtcars dataset:
library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(mean_disp = mean(disp))
will give you all the means of disp based on the grouping variable cyl.
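Applied to the question itself, a minimal sketch could look like the following. The data frame and the column names id, stem and grade are assumptions, since the original data was only posted as an image; the idea is to keep the STEM rows first and then average per student.

```r
library(dplyr)

# Hypothetical long-format data: one row per student per class.
# The column names id, stem and grade are assumptions.
grades <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  stem  = c(1, 1, 0, 1, 0, 1),
  grade = c(4.0, 3.0, 2.0, 3.5, 4.0, 2.5)
)

stem_gpa <- grades %>%
  filter(stem == 1) %>%               # keep only the STEM classes
  group_by(id) %>%                    # one group per student
  summarise(stem_gpa = mean(grade))   # one STEM GPA per student

stem_gpa
```

Filtering before grouping means students without any STEM class simply do not appear in the result.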

Related

How to sum values in a group_by pipe except certain values? [duplicate]

This question already has answers here:
filtering within the summarise function of dplyr
(3 answers)
Opposite of %in%: exclude rows with values specified in a vector
(13 answers)
Closed 3 months ago.
EDIT: I want to specify which values NOT to include in my calculation by providing a list of values for records to skip. I do NOT want to provide a list of values to include in my calculation because my dataset is too large.
I want to group records based on a certain value, and then I want to do some other calculations for certain variables; however, I want to exclude certain values from one of those calculations. Here is an example of what the data transformation would look like without any exclusions:
library(dplyr)
grouped <- starwars %>%
  group_by(species) %>% # group my data by a particular value
  summarise(Total_Mass = sum(mass), # make a calculation
            Average_Height = mean(height)) # make another calculation
and here's what I am attempting to do:
exclude <- c("R2-D2","Luke","Darth") #make a list of the names of records I would like to exclude
grouped2 <- starwars %>%
group_by(species) %>%
summarise(Total_Mass = sum(mass) where name !%in% exclude, #sum mass for all records except those where name is in the exclude list
Average_Height = mean(height)) # make another calculation without any exclusions
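One possible way to express this in valid R, offered as a sketch rather than the definitive answer, is to subset the mass vector inside sum() with a negated %in%. The full character names in exclude and the na.rm = TRUE arguments are assumptions added here: starwars stores names such as "Luke Skywalker", and it contains missing values that would otherwise turn the results into NA.

```r
library(dplyr)

# Names to skip; full names are assumed so they match starwars$name exactly.
exclude <- c("R2-D2", "Luke Skywalker", "Darth Vader")

grouped2 <- starwars %>%
  group_by(species) %>%
  summarise(
    # sum mass over every record whose name is NOT in the exclude list
    Total_Mass = sum(mass[!(name %in% exclude)], na.rm = TRUE),
    # no exclusions here; na.rm = TRUE is an added assumption
    Average_Height = mean(height, na.rm = TRUE)
  )

grouped2
```

Because the subsetting happens inside the one summary expression, the exclusion affects only Total_Mass and the other calculations still see every row.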

Grouping same values from a single column while retaining the data in [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
This is my current code for this image: data[1:20, c("Job.Family", "Salaries", "Retirement")]. The goal here is to group all the same jobs in the Job.Family column together without losing any data associated with them. So, for example, I would like to find the sum of "Salaries" and "Retirement" for everyone in the "Information System" Job.Family. Hopefully this makes sense.
You are probably looking for some basic subsetting and summarising operations here.
I strongly recommend you study the dplyr package.
Your example (using sum, since you asked for totals):
library(dplyr)
df %>% filter(Job.Family == "Information Systems") %>%
  summarise(across(c(Salaries, Retirement), sum))
You may want to calculate this for all groups, as in:
df %>% group_by(Job.Family) %>%
  summarise(across(c(Salaries, Retirement), sum))
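To make the grouped version concrete, here is a small self-contained sketch. The jobs data frame and its values are made up for illustration; only the column names come from the question.

```r
library(dplyr)

# Toy data; the values are invented for illustration.
jobs <- data.frame(
  Job.Family = c("Information Systems", "Information Systems", "Nursing"),
  Salaries   = c(100, 200, 150),
  Retirement = c(10, 20, 15)
)

totals <- jobs %>%
  group_by(Job.Family) %>%                            # one row per job family
  summarise(across(c(Salaries, Retirement), sum))     # total of both columns

totals
```

Each job family collapses to a single row, with both numeric columns kept side by side.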

Sorting Column in R [duplicate]

This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 3 years ago.
I have data that includes a treatment group, which is indicated by a 1, and a control group, which is indicated by a 0. This is all contained under the variable treat_invite. How can I separate these and take the mean of pct_missing for the 1's and 0's? I've attached an image for clarification.
Assuming your data frame is called df and the column is named pct_missing as in your question (R is case-sensitive):
library(dplyr)
df <- df %>% group_by(treat_invite) %>% mutate(MeanPCTMissing = mean(pct_missing))
Or, if you just want the summary table (rather than the original table with an additional column):
df <- df %>% group_by(treat_invite) %>% summarise(MeanPCTMissing = mean(pct_missing))
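The difference between the two variants is easiest to see on a tiny example. The trial data frame below is hypothetical, and the results are stored under new names so the original table is not overwritten.

```r
library(dplyr)

# Hypothetical data: 1 = treatment, 0 = control.
trial <- data.frame(
  treat_invite = c(1, 1, 0, 0),
  pct_missing  = c(0.10, 0.30, 0.20, 0.40)
)

# mutate() keeps every row and repeats the group mean on each of them
with_mean <- trial %>%
  group_by(treat_invite) %>%
  mutate(MeanPCTMissing = mean(pct_missing)) %>%
  ungroup()

# summarise() collapses each group to a single row
by_group <- trial %>%
  group_by(treat_invite) %>%
  summarise(MeanPCTMissing = mean(pct_missing))

nrow(with_mean)  # same number of rows as trial
nrow(by_group)   # one row per value of treat_invite
```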

How to apply summarise_each to all columns except one? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am analyzing a set of data with many columns (almost 30 columns). I want to group data based on two columns and apply sum and mean functions to all the columns except timestamp.
How would I use summarise_each on all columns except timestamp?
This is the draft code I have, but it is obviously not correct. It also generates an error because it cannot apply sum to the POSIXt data type (Error: 'sum' not defined for "POSIXt" objects).
features <- dataset %>%
  group_by(X, Y) %>%
  summarise_each(funs(mean, sum)) %>%
  arrange(TIMESTAMP)
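Try summarise_each(funs(mean, sum), -TIMESTAMP) to exclude TIMESTAMP from the summarisation. Note that the result then no longer contains a TIMESTAMP column, so the trailing arrange(TIMESTAMP) has to be dropped as well.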
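summarise_each() and funs() are deprecated in current dplyr releases. A rough equivalent with across() might look like the sketch below; the dataset contents are invented for illustration, and only the column names X, Y and TIMESTAMP come from the question.

```r
library(dplyr)

# Toy data; "value" is a hypothetical stand-in for the ~30 numeric columns.
dataset <- data.frame(
  X = c("a", "a", "b"),
  Y = c("u", "u", "v"),
  TIMESTAMP = as.POSIXct(
    c("2020-01-01 00:00:00", "2020-01-01 01:00:00", "2020-01-01 02:00:00"),
    tz = "UTC"
  ),
  value = c(1, 3, 5)
)

features <- dataset %>%
  group_by(X, Y) %>%
  # every non-grouping column except TIMESTAMP gets a _mean and a _sum column
  summarise(across(-TIMESTAMP, list(mean = mean, sum = sum)), .groups = "drop")

features
```

The grouping columns are left out of across() automatically, so only TIMESTAMP needs to be excluded by hand.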

how to obtain summary of statistics for distinct values of a column in dataframe in R? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
Suppose we have a data.frame named IND with a column called dept. There are 100 rows in total and 20 distinct values in dept.
I would like to obtain summary statistics for each of these 20 subsets of the data.frame (5 rows each), working from the main data.frame.
summary(IND) gives the summary statistics for whole dataset but what should I do in my case?
Something like this
mtcars %>% group_by(cyl) %>% summarise_each(funs(sum, mean))
can be used for your case as
IND %>% group_by(dept) %>% summarise_each(funs(sum, mean))
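If what you want is literally the output of summary() for each subset, a base R sketch without dplyr is to split the data frame and apply summary() to every piece. mtcars and cyl are used here as stand-ins for IND and dept.

```r
# Split the data frame by the grouping column, then summarise each piece.
# mtcars/cyl are stand-ins for IND/dept.
by_cyl <- lapply(split(mtcars, mtcars$cyl), summary)

names(by_cyl)   # one list element per distinct value of cyl
by_cyl[["4"]]   # the usual summary() table for the cyl == 4 rows
```

For your data this would read lapply(split(IND, IND$dept), summary), giving a list with one summary per department.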
