I'm trying to group the data in my data set based on whether the value in one of the columns is 1, 2 or 3. The columns of my data are CLASS and PERF, and I want to group by the CLASS column. The code I have used is:
visible2 <- visible %>%
  group_by(CLASS) %>%
  summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
The output I get is a single value for the mean and the standard deviation of the performance across all groups, rather than 3 rows, one for each group.
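This could be because the plyr package is also loaded and plyr::summarise has masked dplyr::summarise. plyr::summarise ignores the grouping created by group_by, so it returns a single row. Either call dplyr::summarise explicitly, or rerun the code in a fresh R session with only dplyr loaded: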
library(dplyr)
visible %>%
  group_by(CLASS) %>%
  dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
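To make the masking explanation concrete, here is a minimal sketch. The `visible` data frame below is a made-up stand-in for the asker's data, and `summarise_owner` is a hypothetical helper variable. It checks which package an unqualified `summarise` resolves to, then uses the namespaced call so that load order no longer matters.

```r
library(dplyr)

# Hypothetical stand-in for the asker's `visible` data frame
visible <- data.frame(
  CLASS = c(1, 1, 2, 2, 3, 3),
  PERF  = c(10, 20, 30, 50, 5, 15)
)

# Which package does an unqualified summarise() resolve to right now?
# "dplyr" is expected here; "plyr" would mean the function is masked.
summarise_owner <- environmentName(environment(summarise))

# Namespacing the call makes the result independent of load order
visible2 <- visible %>%
  group_by(CLASS) %>%
  dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
```

If both packages are really needed, loading plyr before dplyr is the usual way to keep the dplyr versions on top.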
I want to group the tidyr dataset relig_income by religion, show the mean number of believers (N_People) per religion, and order the result by that mean in descending order. I tried the first piece of code below, but according to my online course the correct answer is the second. What does the dot in the call to arrange mean?
I am getting two different results.
My Code:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(mean_believers, desc(mean_believers))
Correct Answer:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(., desc(mean_believers))
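The . is the placeholder for the data passed through %>%. When . appears as an argument, the pipe puts the data there instead of inserting it as the first argument, so arrange(., desc(mean_believers)) is the same as arrange(desc(mean_believers)). Your version sorts by mean_believers in ascending order first, and desc(mean_believers) only breaks ties, which is why the two results differ.
You can also reference a specific column of the piped data with .$your_column.
Take a look at the documentation for the pipe.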
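A small sketch of the placeholder behaviour, using an invented `tidy_df` (the real relig_income reshaping is not shown in the question, so the values here are assumptions for illustration only):

```r
library(dplyr)

# Hypothetical stand-in for the reshaped relig_income data
tidy_df <- data.frame(
  religion = c("A", "A", "B", "B", "C", "C"),
  N_People = c(10, 30, 5, 15, 40, 60)
)

means <- tidy_df %>%
  group_by(religion) %>%
  summarise(mean_believers = mean(N_People))

# Explicit placeholder: the piped data goes where the dot is
with_dot <- means %>% arrange(., desc(mean_believers))

# Implicit first argument: the pipe inserts the data for us
without_dot <- means %>% arrange(desc(mean_believers))

# The asker's version: ascending sort, desc() only breaks ties
asker <- means %>% arrange(mean_believers, desc(mean_believers))
```

Here `with_dot` and `without_dot` hold the same rows in the same order, while `asker` comes out in the opposite order.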
I'm trying to compute the average and correlation for some variables, grouped by gender. For some reason, my group_by call doesn't seem to be working.
data(PSID1982, package = "AER")
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(PSID1982$education),
            avgexper = mean(PSID1982$experience),
            avgwage = mean(PSID1982$wage),
            cor_wagvseduc = cor(x = PSID1982$wage, y = PSID1982$education))
The result is just the summary statistics of the entire group, not broken up into different genders.
The pipeline itself is fine, but inside dplyr verbs you should not refer to columns as PSID1982$Column_Name. That expression pulls the whole column from the original data frame and ignores the grouping, so every group gets the overall statistic. Use the bare column names instead:
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(education),
            avgexper = mean(experience),
            avgwage = mean(wage),
            cor_wagvseduc = cor(x = wage, y = education))
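The difference between the two spellings can be seen side by side in a minimal sketch. The `toy` data frame is a hypothetical stand-in, so the example does not depend on the AER package being installed.

```r
library(dplyr)

# Hypothetical toy data standing in for PSID1982
toy <- data.frame(
  gender    = c("male", "male", "female", "female"),
  education = c(10, 12, 14, 16)
)

result <- toy %>%
  group_by(gender) %>%
  summarise(
    avg_whole_column = mean(toy$education),  # ignores groups: overall mean
    avg_per_group    = mean(education)       # respects groups
  )
```

`avg_whole_column` is identical in every row, whereas `avg_per_group` differs between the two genders.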
I have daily data (df$date is the daily field), which I want to group by week (df$wbm = "week beginning Monday") in a new data frame (df2). When I run the statement below, the data frame that is returned is the same as the original:
df2 <- df %>%
  group_by(wbm)
The function runs without throwing an error, but it just produces the same data frame.
How can I drop date and ensure that my variables are grouped by wbm?
The group_by step only adds a grouping attribute; nothing is aggregated until we say how each group should be summarised. To get the sum of every column whose name is 'var' followed by digits, grouped by 'wbm', use summarise_at:
library(dplyr)
df %>%
  group_by(wbm) %>%
  summarise_at(vars(matches('^var\\d+$')), sum)
If only a single column needs to be summarised, plain summarise is enough:
df %>%
  group_by(wbm) %>%
  summarise(var1 = sum(var1))
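In dplyr 1.0 and later, across() is the suggested replacement for summarise_at. Below is a hedged sketch of the same idea on invented data: the column names var1 and var2 and the way wbm is derived from date are assumptions, since the original data frame is not shown.

```r
library(dplyr)

# Hypothetical daily data covering two full weeks
df <- data.frame(
  date = as.Date("2021-01-04") + 0:13,
  var1 = 1:14,
  var2 = rep(2, 14)
)

# Assumed definition of wbm: step back to the Monday of each week
# (%u gives the ISO weekday, Monday = 1)
df$wbm <- df$date - (as.integer(format(df$date, "%u")) - 1)

# One row per week; date is dropped because it is not summarised
df2 <- df %>%
  group_by(wbm) %>%
  summarise(across(matches("^var\\d+$"), sum), .groups = "drop")
```

Because only the columns selected inside across() and the grouping column survive, the date column disappears from df2 without an explicit select.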
I want to set the column for grouping a data frame into a variable and then group and summarise the data frame based on it, i.e.
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by(var) %>% dplyr::summarise_each(funs(mean))
such that I can simply change var and use the second line without changing anything. Unfortunately my solution does not work as group_by asks the column name and not a variable.
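Use group_by_, which takes arguments as character strings (note that group_by_ and summarise_each have since been deprecated in dplyr, so this applies to older versions):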
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by_(var) %>% dplyr::summarise_each(funs(mean))
(Maybe resources on standard vs non-standard evaluation would be of interest: http://adv-r.had.co.nz/Computing-on-the-language.html)
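For current dplyr versions (1.0 and later), where group_by_ is deprecated, the same effect can be sketched with across() and all_of(). This is one possible approach rather than the answerer's original method.

```r
library(dplyr)

var <- colnames(mtcars)[10]  # "gear"

# all_of() looks up the column named by the string stored in `var`
summaries <- mtcars %>%
  group_by(across(all_of(var))) %>%
  summarise(across(everything(), mean), .groups = "drop")
```

Changing var to another column name is enough to regroup; the pipeline itself stays untouched.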
I am looking to find the sum of certain columns in my dataset. Currently it looks something like this.
I want to find the column sum of everyone in X, Y and Z for each possible grid and month combination.
Currently I have
xx<-data[data$Month=="November"&data$grid=="A3",]
fun<-by(xx[, 1:3],xx$grid, colSums,na.rm=T)
fun<-as.character(fun)
as.data.frame(fun,
stringsAsFactors = default.stringsAsFactors())
But this requires me to change the grid and month references each time. Is there a simpler way to do it, without manually specifying which grids and months I want?
We can use summarise_each from dplyr after grouping by 'month', 'grid'
library(dplyr)
data %>%
  group_by(month, grid) %>%
  summarise_each(funs(sum))
Or with aggregate from base R
aggregate(.~month + grid, data, FUN = sum)
Or using the OP's method
by(data[1:3], data[4:5], FUN = colSums)
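Since summarise_each and funs are deprecated in recent dplyr releases, here is a hedged sketch of the grouped sum using across(). The data frame is invented (the original table is not shown), with lowercase month and grid columns assumed to match the answer above.

```r
library(dplyr)

# Hypothetical data; column names follow the answer's month/grid spelling
data <- data.frame(
  X = c(1, 2, 3, 4),
  Y = c(10, 20, 30, 40),
  Z = c(0, NA, 1, 1),
  month = c("November", "November", "November", "December"),
  grid  = c("A3", "A3", "B1", "A3")
)

# Sum every non-grouping column within each month/grid combination,
# skipping missing values as the asker's colSums(na.rm = T) did
totals <- data %>%
  group_by(month, grid) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")
```

Every month/grid combination present in the data gets its own row, so nothing has to be filtered by hand.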