R - how to use group by function properly [duplicate]


This question already has answers here:
dplyr groups not working with dollar sign data$column syntax
(1 answer)
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed last year.
I'm trying to compute the mean and correlation of some variables, grouped by gender. For some reason, my group_by call doesn't seem to be working.
data(PSID1982, package = "AER")
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(PSID1982$education),
            avgexper = mean(PSID1982$experience),
            avgwage = mean(PSID1982$wage),
            cor_wagvseduc = cor(x = PSID1982$wage, y = PSID1982$education))
The result is just the summary statistics for the entire dataset, not broken down by gender.

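The pipeline itself is fine; the problem is the PSID1982$Column_Name references. Inside dplyr verbs, PSID1982$education pulls the entire column from the original data frame and ignores the grouping, so every group gets the overall statistic. Refer to the columns by their bare names instead: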
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(education),
            avgexper = mean(experience),
            avgwage = mean(wage),
            cor_wagvseduc = cor(x = wage, y = education))
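To make the difference visible, here is a minimal sketch on a tiny invented data frame (toy and its values are hypothetical stand-ins, not the PSID1982 data). It computes the same mean twice, once with the bare column name and once through the $ syntax:

```r
library(dplyr)

# Hypothetical toy data standing in for PSID1982 (values invented)
toy <- data.frame(
  gender = c("male", "male", "female", "female"),
  wage   = c(10, 20, 30, 50)
)

by_gender <- toy %>%
  group_by(gender) %>%
  summarise(
    avg_bare   = mean(wage),     # evaluated separately within each group
    avg_dollar = mean(toy$wage)  # always the full column, grouping ignored
  )

print(by_gender)
```

The avg_bare column differs between the two groups, while avg_dollar repeats the overall mean in every row.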

Related

What is the dot (".") notation in R? [duplicate]

This question already has answers here:
What does the dplyr period character "." reference?
(2 answers)
Closed 2 years ago.
I want to group the tidyr dataset relig_income by religion, show the mean number of believers (N_People) per religion, and sort the result in descending order of that mean. I tried the first snippet below, but according to my online course the correct answer is the second. What does the dot in the call to arrange mean?
I am getting two different results.
My Code:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(mean_believers, desc(mean_believers))
Correct Answer:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(., desc(mean_believers))

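The . stands for the data passed in through %>%, that is, the result of the left-hand side.
For example, you can reference a specific column of that data with .$your_column
In the "correct answer", arrange(., desc(mean_believers)) is therefore the same as arrange(desc(mean_believers)). Your version sorts by mean_believers in ascending order first, and the second key, desc(mean_believers), only breaks ties, which is why the two results differ.
Take a look at the magrittr documentation for the pipe.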
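A short sketch of this, assuming a small invented summary table (means and its values are hypothetical, not the relig_income data):

```r
library(dplyr)

# Hypothetical summary table (values invented for illustration)
means <- data.frame(
  religion       = c("A", "B", "C"),
  mean_believers = c(20, 50, 35)
)

# `.` is the piped-in data frame, so these two calls do the same thing
explicit_dot <- means %>% arrange(., desc(mean_believers))
implicit     <- means %>% arrange(desc(mean_believers))

# Sorting ascending first leaves nothing for desc() to reorder
asker_version <- means %>% arrange(mean_believers, desc(mean_believers))

identical(explicit_dot, implicit)
explicit_dot$religion
asker_version$religion
```

The first two results are identical and sorted from largest to smallest mean, while the third is sorted from smallest to largest.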

How do I group data based on a variable? [duplicate]

This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I'm trying to group the data in my data set based on whether the value in one of the columns is 1, 2 or 3. The columns of my data are CLASS and PERF, and I want to group by the CLASS column. The code I have used is
visible2 <- visible %>%
  group_by(CLASS) %>%
  summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
The output I get is just one value each for the mean and standard deviation of the performance across all groups, rather than 3 rows, one for each group.

It could be because the plyr package is also loaded and plyr::summarise has masked dplyr::summarise. We can call dplyr::summarise explicitly, or rerun this in a fresh R session with only dplyr loaded.
library(dplyr)
visible %>%
  group_by(CLASS) %>%
  dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
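One way to check for masking is to ask R which package a bare summarise currently resolves to. The sketch below assumes only dplyr is attached and uses an invented visible data frame with the question's column names:

```r
library(dplyr)

# Hypothetical data using the question's column names (values invented)
visible <- data.frame(
  CLASS = c(1, 1, 2, 2, 3, 3),
  PERF  = c(10, 12, 20, 24, 30, 36)
)

# Which package does a bare `summarise` resolve to right now?
# With only dplyr attached this should be "dplyr"; "plyr" would mean masking.
environmentName(environment(summarise))

per_class <- visible %>%
  group_by(CLASS) %>%
  dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))

nrow(per_class)  # one row per distinct CLASS value
```

Writing the dplyr:: prefix makes the call independent of the order in which packages were loaded.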

R: Calculate mean by column in a list of dataframes using pipes %>% in dplyr [duplicate]

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 4 years ago.
I am trying to get better at using pipes (%>%) in the dplyr package. I understand that the whole point of a pipe is that it passes the object on its left as the first argument of the function on its right. That is, in this example:
area = rep(c(3:7), 5) + rnorm(5)
the piped call
area %>%
  mean
is equivalent to the ordinary function call
`mean(area)`.
My problem starts when it comes to a data frame. I would like to split the data frame into a list of data frames, and then calculate the mean of the area column in each one. But I can't figure out how to refer to the column instead of the whole data frame.
I know that I can get the means by year simply with aggregate(area ~ year, df, mean), but I would like to practice pipes instead.
Thank you!
# Dummy data
set.seed(13)
df <- data.frame(year = rep(c(1:5), each = 5),
                 area = rep(c(3:7), each = 5) + rnorm(1))
# Calculate means.
# None of `mean(df$area)`, `mean("area")` or `mean[area]` works. How do I refer to the column correctly?
df %>%
  split(df$year) %>%
  mean

Do you mean this? With group_by there is no need to split the data frame first:
df %>%
  group_by(year) %>%
  summarise(Mean = mean(area))

We need to extract the column from each data.frame in the list returned by split. One option is to loop through the list with map_df from purrr and summarise 'area' in each element.
library(dplyr)
library(purrr)
df %>%
  split(.$year) %>%
  map_df(~ .x %>%
           summarise(area = mean(area)))
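The result above has one row per year but no year column. A possible variant, sketched here with the question's dummy data, uses the .id argument of purrr's map_dfr to keep the list names as a column, and compares the numbers with the aggregate() call from the question:

```r
library(dplyr)
library(purrr)

# Dummy data from the question
set.seed(13)
df <- data.frame(year = rep(c(1:5), each = 5),
                 area = rep(c(3:7), each = 5) + rnorm(1))

# split() names the list elements by year; .id stores those names in a column
means_by_year <- df %>%
  split(.$year) %>%
  map_dfr(~ summarise(.x, area = mean(area)), .id = "year")

# Cross-check against the base R approach mentioned in the question
base_means <- aggregate(area ~ year, df, mean)
all.equal(means_by_year$area, base_means$area)
```

Note that the year column produced by .id is character, because it comes from the list names.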

dplyr use select() helpers inside mutate() [duplicate]

This question already has answers here:
dplyr mutate rowSums calculations or custom functions
(7 answers)
Closed 4 years ago.
I'd like to make a new variable which represents the sum (or some other function) of many other variables that all start with "prefix_". Is there a way to do this neatly using the select() helpers (e.g. starts_with())?
I don't think mutate_at() works for this since I'm just trying to create a single new variable based on many existing variables.
My attempt:
df %<>%
mutate(newvar = sum(vars(starts_with("prefix_"))))
This of course doesn't work. Many thanks!
A reproducible example:
mtcars %<>%
rename("prefix_mpg" = mpg) %>%
rename("prefix_cyl" = cyl) %>%
mutate(newvar = sum(var(starts_with("prefix_"))))
Intended output would be mtcars$newvar which is the sum of prefix_mpg and prefix_cyl. Of course I could just explicitly name mpg and cyl but in my actual case it's a long list of variables, too long to name conveniently.

We can use starts_with inside a select call and pass the result to rowSums. Here . refers to the data frame coming out of the previous pipe step, so select() returns just the prefix_ columns and rowSums adds them up row by row (sum would collapse everything into a single number).
library(dplyr)
mtcars %>%
  rename(prefix_mpg = mpg, prefix_cyl = cyl) %>%
  mutate(newvar = rowSums(select(., starts_with("prefix_"))))
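As an alternative sketch, assuming dplyr 1.0.0 or later is installed, across() can select the columns inside mutate without the . placeholder:

```r
library(dplyr)

out <- mtcars %>%
  rename(prefix_mpg = mpg, prefix_cyl = cyl) %>%
  # across() returns the selected columns as a data frame for rowSums()
  mutate(newvar = rowSums(across(starts_with("prefix_"))))

head(out[, c("prefix_mpg", "prefix_cyl", "newvar")], 3)
```

Here newvar should equal prefix_mpg + prefix_cyl in every row, matching the intended output described in the question.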

R - group data frame from a variable [duplicate]

This question already has answers here:
dplyr: How to use group_by inside a function?
(4 answers)
Closed 6 years ago.
I want to store the name of the grouping column in a variable, and then group and summarise the data frame based on it, i.e.
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by(var) %>% dplyr::summarise_each(funs(mean))
so that I can simply change var and reuse the second line without changing anything else. Unfortunately my solution does not work, because group_by expects a column name rather than a variable holding one.

Use group_by_, which takes arguments as character strings (note that group_by_, summarise_each and funs are all deprecated in current dplyr releases):
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by_(var) %>% dplyr::summarise_each(funs(mean))
(Maybe resources on standard vs non-standard evaluation would be of interest: http://adv-r.had.co.nz/Computing-on-the-language.html)
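For recent dplyr versions (1.0.0 or later is assumed here), the same idea can be sketched without the deprecated functions by using across() together with the tidyselect helper all_of():

```r
library(dplyr)

var <- colnames(mtcars)[10]  # "gear"

summaries <- mtcars %>%
  # all_of() looks up the column whose name is stored in `var`
  group_by(across(all_of(var))) %>%
  # everything() covers all non-grouping columns
  summarise(across(everything(), mean))

summaries
```

Changing var to another column name is then enough to regroup, which is what the question asked for.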
