Summarizing Data Frame Using Group By [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I have daily data (df$date is the daily field), which I want to group by week (df$wbm = "week beginning Monday") into a new data frame (df2). When I run the statement below, the returned data frame is the same as the original:
df2 <- df %>%
group_by(wbm)
The function runs without throwing an error, but it just produces the same data frame.
How can I drop date and ensure that my variables are grouped by wbm?

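The group_by step only adds a grouping attribute. It does not change the rows, and we haven't said how each group should be summarised. To get the sum of every column whose name is 'var' followed by digits, grouped by 'wbm', use summarise_at. Because summarise keeps only the grouping column and the summarised columns, date is dropped automatically: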
library(dplyr)
df %>%
group_by(wbm) %>%
summarise_at(vars(matches('^var\\d+$')), sum)
If only a single column needs to be summarised, plain summarise is enough:
df %>%
group_by(wbm) %>%
summarise(var1 = sum(var1))
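A small self-contained sketch of the point above, using made-up data with the question's column names (the values, the start date and the weekday arithmetic used to build wbm are assumptions, since the original data was not shown). It also uses across(), which supersedes summarise_at in dplyr 1.0 and later:

```r
library(dplyr)

# Hypothetical daily data: 14 days starting Monday 2024-01-01
df <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 14),
  var1 = 1:14,
  var2 = rep(c(10, 20), 7)
)
# Week beginning Monday: $wday is 0 for Sunday, so (wday + 6) %% 7 is days since Monday
df$wbm <- df$date - (as.POSIXlt(df$date)$wday + 6) %% 7

# group_by() only records the grouping; the rows themselves are unchanged
grouped <- df %>% group_by(wbm)
nrow(grouped)        # 14
group_vars(grouped)  # "wbm"

# summarise() collapses each week to one row; `date` is dropped because it is not summarised
df2 <- df %>%
  group_by(wbm) %>%
  summarise(across(matches("^var\\d+$"), sum))
df2
```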

Related

How do I group data based on a variable? [duplicate]

This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I'm trying to group the data in my data set by the value of one column, which is 1, 2 or 3. The columns of my data are CLASS and PERF, and I want to group by CLASS. The code I have used is:
visible2 <- visible %>%
group_by(CLASS) %>%
summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
The output I get is a single mean and standard deviation of performance across all groups, rather than 3 rows, one for each group.
This is likely because the plyr package is also loaded, and plyr::summarise masks dplyr::summarise (this happens when plyr is attached after dplyr). We can call dplyr::summarise explicitly, or rerun the code in a fresh R session with only dplyr loaded:
library(dplyr)
visible %>%
group_by(CLASS) %>%
dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
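To check whether masking is really the cause, base R's find() lists every attached package that exports a given name, in search-path order; the first entry is what an unqualified summarise() resolves to. A sketch with a hypothetical stand-in for visible (the values are made up):

```r
library(dplyr)

# Hypothetical stand-in for the question's data: CLASS is 1, 2 or 3
visible <- data.frame(
  CLASS = rep(1:3, each = 4),
  PERF  = c(5, 6, 7, 8, 10, 12, 14, 16, 1, 1, 1, 1)
)

# Every attached package exporting summarise(), first one wins.
# If "package:plyr" is listed before "package:dplyr", plyr is masking dplyr.
find("summarise")

# Qualifying the call with dplyr:: sidesteps any masking
visible2 <- visible %>%
  group_by(CLASS) %>%
  dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
visible2
```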

R - how to use group by function properly [duplicate]

This question already has answers here:
dplyr groups not working with dollar sign data$column syntax
(1 answer)
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed last year.
I'm trying to compute the averages of some variables, and a correlation, broken down by gender. For some reason, my group_by call doesn't seem to work:
library(dplyr)
data(PSID1982, package = "AER")
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(PSID1982$education),
avgexper = mean(PSID1982$experience),
avgwage = mean(PSID1982$wage),
cor_wagvseduc = cor(x = PSID1982$wage, y = PSID1982$education))
The result is just the summary statistics for the whole data set, not broken up by gender.
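The pipeline itself is fine, but inside dplyr verbs you should refer to columns by their bare names rather than as PSID1982$column_name. PSID1982$education is the full, ungrouped column of the original data frame, so every group gets the same whole-data statistic. Just use the column names: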
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(education),
avgexper = mean(experience),
avgwage= mean(wage),
cor_wagvseduc = cor( x=wage, y= education))
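A minimal sketch of the difference, using the built-in mtcars data in place of PSID1982 (chosen so the example does not need the AER package), with am standing in for gender:

```r
library(dplyr)

# $-style reference: mtcars$mpg is the whole column, so the grouping is ignored
wrong <- mtcars %>%
  group_by(am) %>%
  summarise(avg_mpg = mean(mtcars$mpg))

# Bare column name: dplyr supplies only the current group's rows
right <- mtcars %>%
  group_by(am) %>%
  summarise(avg_mpg = mean(mpg))

wrong  # two rows, identical averages
right  # two rows, one average per value of am
```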

Calculate averages for each country, from a data frame [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I'm working on a data frame called 'df'. Two of its columns are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most efficient way to iterate through the data frame and calculate the averages by country?
I tried a for loop, but it does not work, and I don't think a loop is the most efficient approach anyway.
The tidyverse provides the easiest way, IMO:
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
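If you really want to "iterate" through the data, as the question suggests (not recommended):
countries <- unique(df$country)
avg <- numeric(length(countries)) # preallocate one slot per country
for (i in seq_along(countries)) {
avg[i] <- mean(df$dissent[df$country == countries[i]], na.rm = TRUE) # a plain vector even if df is a tibble
}
names(avg) <- countries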
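For comparison, a sketch with made-up data (the country and dissent values are hypothetical) showing that the dplyr pipeline and two base R alternatives, tapply() and aggregate(), give the same per-country averages without an explicit loop. Note that aggregate()'s formula interface drops rows containing NA by default (na.action = na.omit), which here matches na.rm = TRUE:

```r
library(dplyr)

# Hypothetical data using the question's column names
df <- data.frame(
  country = c("A", "A", "B", "B", "B", "C"),
  dissent = c(1, 3, 2, NA, 4, 5)
)

# dplyr: one row per country
by_dplyr <- df %>%
  group_by(country) %>%
  summarize(avg = mean(dissent, na.rm = TRUE))

# Base R: named vector of averages, one per country
by_tapply <- tapply(df$dissent, df$country, mean, na.rm = TRUE)

# Base R: data frame; NA rows removed by the default na.action
by_aggregate <- aggregate(dissent ~ country, data = df, FUN = mean)
```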

dplyr filter tens of columns [duplicate]

This question already has answers here:
filter for complete cases in data.frame using dplyr (case-wise deletion)
(7 answers)
Closed 5 years ago.
Suppose I have a data frame with 27 columns. The first column is an ID, and the remaining columns (A to Z) hold data. I want to remove all rows in which every column from A to Z is NA. How should I do it?
The straightforward way is just
data %>%
filter(!(is.na(A) & is.na(B) .... & is.na(Z)))
Is there a more efficient or easier way to do it?
This question is different from this one because I want to exclude rows whose values are ALL NA, while keeping rows that are only partially NA.
Using tidyverse:
library(tidyverse)
Create some example data:
ID <- 1:8
Col1 <- c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667)
Col2 <- c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909)
Col3 <- c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
mydf <- tibble(ID, Col1, Col2, Col3) # data_frame() is deprecated in favour of tibble()
mydf %>%
slice(which(complete.cases(.)))
That keeps only the rows with no NAs at all. To remove only the rows in which every data column is NA (keeping partially NA rows), while preserving the selected columns, you may run:
mydf %>%
mutate(full_incomplete_cases=rowSums(is.na(.[-1]))) %>%
filter(full_incomplete_cases<length(mydf[,-1])) %>%
select(ID:Col3)
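With dplyr 1.0.4 or later, if_any() expresses "at least one data column is not NA" directly, which avoids the helper column. A sketch on the same example data; the base R line is an equivalent alternative, not part of the original answer:

```r
library(dplyr)

# Same example data as above
mydf <- tibble(
  ID   = 1:8,
  Col1 = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  Col2 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909),
  Col3 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
)

# Keep a row if at least one column other than ID is non-NA
kept <- mydf %>% filter(if_any(-ID, ~ !is.na(.x)))

# Base R equivalent: count the non-NA data cells in each row
kept_base <- mydf[rowSums(!is.na(mydf[-1])) > 0, ]
```

Row 4 (all NA) is dropped, while row 7 (only Col1 present) is kept.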

R - group data frame from a variable [duplicate]

This question already has answers here:
dplyr: How to use group_by inside a function?
(4 answers)
Closed 6 years ago.
I want to store the name of the grouping column in a variable and then group and summarise the data frame by it, i.e.
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by(var) %>% dplyr::summarise_each(funs(mean))
so that I can simply change var and reuse the second line unchanged. Unfortunately, this does not work, because group_by expects a column name, not a variable holding one.
Use group_by_, which takes arguments as character strings (note that group_by_, summarise_each and funs have since been deprecated in dplyr):
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by_(var) %>% dplyr::summarise_each(funs(mean))
(Maybe resources on standard vs non-standard evaluation would be of interest: http://adv-r.had.co.nz/Computing-on-the-language.html)
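Since group_by_ and its relatives are deprecated, here is a sketch of the same idea for dplyr 1.0 or later: the .data pronoun looks a column up from a string, and across() replaces summarise_each(funs(mean)). Only the grouping and summarising calls differ from the question:

```r
library(dplyr)

var <- colnames(mtcars)[10]  # "gear"

# .data[[var]] refers to the column whose name is stored in `var`;
# across(everything(), mean) averages every non-grouping column
summaries <- mtcars %>%
  group_by(.data[[var]]) %>%
  summarise(across(everything(), mean))
summaries
```

Changing var to another column name, e.g. "cyl", regroups without touching the pipeline.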
