This question already has answers here:
dplyr: How to use group_by inside a function?
(4 answers)
Closed 6 years ago.
I want to set the column for grouping a data frame into a variable and then group and summarise the data frame based on it, i.e.
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by(var) %>% dplyr::summarise_each(funs(mean))
so that I can simply change var and reuse the second line unchanged. Unfortunately this does not work, because group_by expects a bare column name and not a variable holding the name as a string.
Use group_by_, which takes its arguments as character strings (note that the underscore-suffixed verbs have been deprecated since dplyr 0.7.0):
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by_(var) %>% dplyr::summarise_each(funs(mean))
(Maybe resources on standard vs non-standard evaluation would be of interest: http://adv-r.had.co.nz/Computing-on-the-language.html)
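Because the underscore verbs are deprecated in current dplyr, here is a minimal sketch of the same idea without them. It assumes dplyr 1.0.0 or later, and uses across() with all_of() to group by whichever column is named in var:

```r
library(dplyr)

# Name of the grouping column, stored as a string ("gear" in mtcars)
var <- colnames(mtcars)[10]

summaries <- mtcars %>%
  group_by(across(all_of(var))) %>%      # group by the column named in `var`
  summarise(across(everything(), mean))  # mean of every non-grouping column
```

Changing var to another column name should be the only edit needed to regroup the summary.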
This question already has answers here:
How to select non-numeric columns using dplyr::select_if
(3 answers)
Closed 1 year ago.
In my project, I want to extract all the non-numeric columns from my R data frame. Following the linked question, I used the same method and simply negated the is.numeric() function, but it is not working.
This gives all the numeric data:
x<-iris %>% dplyr::select(where(is.numeric))
But this does not work as expected,
x<-iris %>% dplyr::select(where(!is.numeric))
Note: the final output data frame should contain only the Species column of the iris dataset.
The purrr package from the tidyverse does exactly what you want with purrr::keep and purrr::discard:
library(purrr)
x <- iris %>% keep(is.numeric)
With this code, you pass a logical test to keep, and only the columns that pass the test are retained.
To reverse that operation and get what you want, you can use discard, also from purrr:
x <- iris %>% discard(is.numeric)
You can think of discard as keep with !is.numeric.
Alternatively, with dplyr:
x <- iris %>% select_if(~!is.numeric(.))
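The original where(!is.numeric) fails because ! is applied to the function object itself and not to its result. As a sketch, assuming dplyr 1.0.0 or later (where where() is available), the negation can be moved either onto the selection or inside a lambda:

```r
library(dplyr)

# Negate the selection rather than the function
x <- iris %>% select(!where(is.numeric))

# Equivalent: wrap the predicate in a lambda and negate its result
y <- iris %>% select(where(~ !is.numeric(.x)))
```

Both forms should leave only the Species column for iris.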
This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I'm trying to group the data in my data set based on whether the value in one of the columns is 1, 2 or 3. The columns of my data are CLASS and PERF, and I want to group by the CLASS column. The code I have used is
visible2<-visible %>%
group_by(CLASS) %>%
summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
The output I get is just one value for the mean and standard deviation of the performance across all groups, rather than three rows, one for each group.
It could be because the plyr package is also loaded and plyr::summarise has masked dplyr::summarise. We can call dplyr::summarise explicitly, or rerun this in a fresh R session with only dplyr loaded:
library(dplyr)
visible %>%
group_by(CLASS) %>%
dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
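To check for this kind of masking, one option is to ask R which package supplies the summarise it currently finds. The sketch below uses a small made-up visible data frame (hypothetical values, not the asker's data) and stores the package name in a hypothetical summarise_source variable:

```r
library(dplyr)

# Hypothetical stand-in for the asker's `visible` data
visible <- data.frame(
  CLASS = c(1, 1, 2, 2, 3, 3),
  PERF  = c(10, 12, 20, 24, 30, 36)
)

# Name of the package whose `summarise` is found first on the search path
summarise_source <- environmentName(environment(summarise))

# Calling dplyr::summarise explicitly sidesteps any masking
visible2 <- visible %>%
  group_by(CLASS) %>%
  dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
```

If summarise_source is anything other than "dplyr", an unqualified summarise call is being dispatched to another package.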
This question already has answers here:
dplyr groups not working with dollar sign data$column syntax
(1 answer)
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed last year.
I'm trying to compute the average and correlation for some variables, grouped by gender. For some reason, my group_by call does not seem to be working.
data(PSID1982, package ="AER" )
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(PSID1982$education), avgexper = mean(PSID1982$experience), avgwage= mean(PSID1982$wage),cor_wagvseduc = cor( x=PSID1982$wage, y= PSID1982$education))
The result is just the summary statistics of the entire group, not broken up into different genders.
The pipeline itself is fine, but inside dplyr verbs you should not refer to columns as PSID1982$Column_Name: that expression pulls the whole column from the original data frame and ignores the grouping. Use the bare column names instead:
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(education),
avgexper = mean(experience),
avgwage= mean(wage),
cor_wagvseduc = cor( x=wage, y= education))
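The difference can be seen side by side on a built-in data set. This is only an illustrative sketch using mtcars in place of PSID1982, with hypothetical column names overall_mean and group_mean:

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    overall_mean = mean(mtcars$mpg),  # whole column, same value in every row
    group_mean   = mean(mpg)          # only the rows of the current group
  )
```

The overall_mean column repeats one number for every group, while group_mean varies by cyl.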
This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 4 years ago.
I am trying to get better at using pipes (%>%) with the dplyr package. I understand that the whole point of a pipe is that it passes its left-hand side as the first argument of the function on its right-hand side. That is, in this example:
area = rep(c(3:7), 5) + rnorm(5)
the piped form
area %>%
mean
is equivalent to the ordinary function call
mean(area)
My problem arises when I work with a data frame. I would like to split the data frame into a list of data frames, and then calculate the mean of the area column for each one. But I can't figure out how to refer to the column instead of the whole data frame.
I know that I can get means by year simply by aggregate(area~ year, df, mean) but I would like to practice pipes instead.
Thank you!
# Dummy data
set.seed(13)
df<-data.frame(year = rep(c(1:5), each = 5),
area = rep(c(3:7), each = 5) + rnorm(1))
# Calculate means.
# None of `mean(df$area)`, `mean("area")` or `mean[area]` works. How to call the column correctly?
df %>%
split(df$year) %>%
mean
This?
df %>%
group_by(year) %>%
summarise(Mean=mean(area))
We need to extract the column from each data.frame in the list returned by split. One option is to loop through the list with map_df from purrr and summarise the 'area' column:
df %>%
split(.$year) %>%
map_df(~ .x %>%
summarise(area = mean(area)))
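If a named numeric vector is enough, the split-then-map idea can also be sketched with purrr::map_dbl, which keeps the year labels as names. The data frame is rebuilt here from the question's dummy data so that the block is self-contained; means is a hypothetical variable name:

```r
library(dplyr)
library(purrr)

# Dummy data, as in the question
set.seed(13)
df <- data.frame(year = rep(1:5, each = 5),
                 area = rep(3:7, each = 5) + rnorm(1))

means <- df %>%
  split(.$year) %>%           # list of data frames, one per year
  map_dbl(~ mean(.x$area))    # mean of `area` in each piece, named by year
```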
This question already has answers here:
R dplyr: Non-Standard Evaluation difficulty. Would like to use dynamic variable names in filter and mutate
(2 answers)
Closed 4 years ago.
I am trying to filter the mtcars table in R, referencing a column name with a character variable. So, I write:
var <- "cyl"
mtcars %>%
filter(!!var > 6)
But, for some reason the table isn't being filtered. I think this code is the equivalent of this:
mtcars %>%
filter("cyl" > 6)
What I really need is to convert that string to a name. Does anybody know how to handle this problem?
This works for me: convert the string to a symbol with sym() before unquoting it with !!:
library(dplyr)
var <- sym("cyl")
mtcars %>%
filter(!!var > 6)
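Another way to refer to a column by a string, sketched here on the assumption that a reasonably recent dplyr is installed, is the .data pronoun, which avoids the explicit conversion to a symbol:

```r
library(dplyr)

var <- "cyl"  # column name held as a string

filtered <- mtcars %>%
  filter(.data[[var]] > 6)  # look up the column named in `var`
```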