Summarize by group like proc sql [duplicate] - r

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have SKU-week data that records different in-store activities (Tactics). I want to summarize each variable by tactic. Here is the equivalent code in SAS:
proc sql;
create table lp.lp_sku_report1
as select distinct(tactic), sum(Sales_Stat_Case_10_Lt) as Sales_Stat_Case_10_Lt, sum(Sales_Units) as Sales_Units, sum(Sales_Dollars) as Sales_Dollars, sum(Baseline_Stat_Case_10_Lt) as Baseline_Stat_Case_10_Lt, sum(Baseline_Units) as Baseline_Units, sum(Baseline_Dollars) as Baseline_Dollars
from lp.lp_sku_data
group by tactic; quit;

With the dplyr package this is straightforward. First group by the variable tactic, then summarise the remaining variables with the aggregating function sum.
In this specific case, since the aggregating function is the same for all the variables, you can use summarise_each to apply it to every column at once.
The code is below:
library(dplyr)
df = df %>%
group_by(tactic) %>%
summarise_each(funs(sum))
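In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favour of across(). A minimal sketch of the same grouped sum follows, assuming a small made-up data frame: only the column names are taken from the question, and the values are purely illustrative.

```r
library(dplyr)

# Toy stand-in for lp.lp_sku_data; the values are made up for illustration.
df <- data.frame(
  tactic        = c("Display", "Display", "Feature", "Feature"),
  Sales_Units   = c(10, 20, 5, 15),
  Sales_Dollars = c(100, 250, 40, 160)
)

# Sum every non-grouping column within each tactic.
report <- df %>%
  group_by(tactic) %>%
  summarise(across(everything(), sum), .groups = "drop")

print(report)
```

Grouping columns are left out of across() automatically, so tactic does not need to be excluded by hand.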

Related

How to View several columns conditioned on the value in one of them? [duplicate]

This question already has answers here:
Filtering a data frame by values in a column [duplicate]
(3 answers)
Closed 1 year ago.
In the code below, I view three variables of a dataset. However, I would like to view those three variables only for the rows where the year column is equal to 72. Is there a way to do this with the View function?
library(plm)
data("Cigar")
View(Cigar[, c("year","price", "sales")])
You can do this in several ways. One way is to use subset() with select. You don't need to quote column names.
For example:
View(subset(Cigar, select = c(year, price, sales), year == 72))
In R version 4.1.0 or newer you can also use the |> pipe:
Cigar |>
  subset(year == 72, select = c(year, price, sales)) |>
  View()
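Another of the "several ways" is plain bracket indexing with a logical row condition. The sketch below uses a small made-up data frame (cigar_demo is a hypothetical stand-in, so the example runs without the plm package); the same indexing should carry over to Cigar.

```r
# Toy data frame with the same three columns as the Cigar example;
# the values are made up for illustration.
cigar_demo <- data.frame(
  state = c(1, 1, 2, 2),
  year  = c(71, 72, 71, 72),
  price = c(30.1, 31.5, 29.8, 32.0),
  sales = c(120.5, 118.2, 99.4, 97.1)
)

# Rows where year is 72, restricted to the columns of interest.
year72 <- cigar_demo[cigar_demo$year == 72, c("year", "price", "sales")]

print(year72)
# In an interactive session, View(year72) opens the same result in the viewer.
```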

Grouping same values from a single column while retaining the data in [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 1 year ago.
This is my current code for this image: data[1:20, c("Job.Family", "Salaries", "Retirement")]. The goal here is to group all the same jobs in the Job.Family column together without losing any of the data associated with them. For example, I would like to find the sum of "Salaries" and "Retirement" for everyone in the "Information System" Job.Family. Hopefully this makes sense.
You are probably looking for some very basic subsetting and summarising operations here.
I strongly recommend you study the dplyr package.
Your example:
library(dplyr)
df %>% filter(Job.Family == "Information Systems") %>%
  summarise(across(c(Salaries, Retirement), sum))
You may want to calculate this for all groups, as in:
df %>% group_by(Job.Family) %>%
  summarise(across(c(Salaries, Retirement), sum))
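Since the question asks for per-group sums, the same grouping can also be sketched in base R with aggregate(). The data frame below is made up for illustration; only the column names come from the question.

```r
# Toy data with the question's column names; the values are made up.
jobs <- data.frame(
  Job.Family = c("Information Systems", "Information Systems", "Legal"),
  Salaries   = c(100, 200, 50),
  Retirement = c(10, 20, 5)
)

# Sum Salaries and Retirement within each Job.Family.
totals <- aggregate(cbind(Salaries, Retirement) ~ Job.Family,
                    data = jobs, FUN = sum)

print(totals)
```

Note that the formula interface of aggregate() drops rows containing NA by default, which may or may not be what you want on real data.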

How to apply summarise_each to all columns except one? [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am analyzing a set of data with many columns (almost 30 columns). I want to group data based on two columns and apply sum and mean functions to all the columns except timestamp.
How would I use summarise_each on all columns except timestamp?
This is the draft code I have, but it is obviously not correct. It also generates an error because it cannot apply sum to the POSIXt data type (Error: 'sum' not defined for "POSIXt" objects)
features <- dataset %>%
group_by(X, Y) %>%
summarise_each(funs(mean,sum)) %>%
arrange(TIMESTAMP)
Try summarise_each(funs(mean,sum), -TIMESTAMP) to exclude TIMESTAMP from the summarisation.
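With dplyr 1.0.0 or newer, the same exclusion can be written with across(). This is a sketch on a made-up data frame: X, Y and TIMESTAMP come from the question, while value is a hypothetical measurement column.

```r
library(dplyr)

# Toy data; X, Y and TIMESTAMP come from the question, value is hypothetical.
dataset <- data.frame(
  X = c("a", "a", "b"),
  Y = c("u", "u", "v"),
  TIMESTAMP = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-02 00:00:00",
                           "2020-01-03 00:00:00"), tz = "UTC"),
  value = c(1, 3, 10)
)

features <- dataset %>%
  group_by(X, Y) %>%
  # -TIMESTAMP drops the POSIXct column; grouping columns are skipped automatically.
  summarise(across(-TIMESTAMP, list(mean = mean, sum = sum)), .groups = "drop")

print(features)
```

Because TIMESTAMP is no longer present after the summary, a later arrange(TIMESTAMP) would fail; sort by the grouping columns instead if an order is needed.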

Calculate mean of multiple rows using grouping variables [duplicate]

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 7 years ago.
I am trying to calculate an overall mean of multiple classes. Currently the database is in long format. I tried selecting first the ID number (grouping variable 1), then a dummy variable (stem=1) marking the classes that I am interested in (grouping variable 2), and then calculating one GPA mean (i.e., stem GPA mean) for the grades received in those classes (stem=1).
I have attached an example of the database below. Overall, I am trying to figure out how to calculate stem GPA for each student.
See example here
I have tried using library(psych), describeBy(data, dataset$id, dataset$stem), but to no avail. Any suggestions?
I prefer the dplyr package for these operations. Try e.g.
df %>% group_by(group_var) %>% summarise(mean_value = mean(value_var))
where group_var and value_var are placeholders for your own grouping and value columns.
For instance, using the mtcars dataset:
library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(mean_disp = mean(disp))
will give you all the means of disp based on the grouping variable cyl.
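Applied to the question itself, the pattern might look like the sketch below. The column names id and stem come from the question; grade is a hypothetical name for the column holding the grades, and the data are made up.

```r
library(dplyr)

# Toy long-format data; id and stem come from the question,
# grade is a hypothetical name for the column holding the grades.
dataset <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  stem  = c(1, 1, 0, 1, 0),
  grade = c(4.0, 3.0, 2.0, 3.5, 4.0)
)

stem_gpa <- dataset %>%
  filter(stem == 1) %>%   # keep only the STEM classes
  group_by(id) %>%        # one group per student
  summarise(stem_gpa = mean(grade, na.rm = TRUE))

print(stem_gpa)
```

Filtering before grouping keeps the non-STEM rows out of the mean, so each student ends up with a single STEM GPA.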

dplyr summarize multiple column [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I have a simple dataframe with the following column names:
Subject # Type # Value0 # value1# value2# ....value100
I want to use the dplyr summarize operation to get the mean of each value column.
I think there must be a more useful alternative to
ddply(dataframe, c("Subject", "Type"), summarize, m1 = mean(value1), m2 = mean(value2)....)
If I gather all the Value column names in a list
names = c("Value0", "Value1", ...., "Value100")
How can I use this list in ddply?
We can use summarise_each:
library(dplyr)
df1 %>%
group_by(Subject, Type) %>%
summarise_each(funs(mean= mean(., na.rm=TRUE)))
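In dplyr 1.0.0 or newer, the same result can be sketched with across(), and all_of() lets you pass the column names as a character vector, which is what the question asks about. The data frame and the value_cols vector below are made up for illustration and show only three value columns.

```r
library(dplyr)

# Toy data; only three value columns are shown, and the values are made up.
df1 <- data.frame(
  Subject = c("s1", "s1", "s2"),
  Type    = c("A", "A", "B"),
  Value0  = c(1, 3, 5),
  Value1  = c(2, NA, 6),
  Value2  = c(10, 20, 30)
)

# Column names held in a character vector, as in the question.
value_cols <- c("Value0", "Value1", "Value2")

means <- df1 %>%
  group_by(Subject, Type) %>%
  summarise(across(all_of(value_cols), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(means)
```

If every non-grouping column should be averaged, everything() can replace all_of(value_cols).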
