R Mean of multiple groups by quartiles [duplicate]

I have a data frame with different variables, e.g. x1, x2 and so on.
I created quartiles based on one variable (BE) with the following code:
Quantile_Var <- Var %>% mutate(Quartile = ntile(BE, 5))
Now I want to see the mean of each variable (x1, x2, ...) by quartile. I tried the following code, but it gives me too much information, since I only need the means. How can I edit the code so that R returns only the means?
Quantile_Var %>% split(.$Quartile) %>% map(summary)
It's probably completely easy, but unfortunately I am struggling with it.

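You can use the output of ntile as the grouping variable and compute the mean of all the x variables within each group.
library(dplyr)
# Note: ntile(BE, 5) creates five groups (quintiles); use ntile(BE, 4) for quartiles.
Quantile_Var <- Var %>%
  group_by(Quartile = ntile(BE, 5)) %>%
  summarise(across(starts_with("x"), ~ mean(.x, na.rm = TRUE)))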
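To see what the grouped summary returns, here is a minimal sketch on made-up data. The data frame below is hypothetical (only the names Var, BE, x1 and x2 come from the question), and it assumes four buckets, i.e. true quartiles.

```r
library(dplyr)

# Hypothetical toy data: BE is the ranking variable, x1 and x2 are the measures
Var <- data.frame(
  BE = 1:8,
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(10, NA, 30, 40, 50, 60, 70, 80)
)

quartile_means <- Var %>%
  group_by(Quartile = ntile(BE, 4)) %>%  # 4 buckets of 2 rows each
  summarise(across(starts_with("x"), ~ mean(.x, na.rm = TRUE)))

quartile_means
```

The result has one row per quartile and one mean column per x variable; the NA in x2 is dropped before averaging because of na.rm = TRUE.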

Related

Aggregating data by grouping variable [duplicate]

I am attempting to create a new county-level column in a data frame: for each day on which municipalities reported data, the mean of the viral-fragment values (SARS.mean) detected in the municipalities within the county. I have been able to calculate this mean in two different ways with the following code:
dataframe[, mean(SARS.mean), by = date]
aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN = mean)
But when I do something like
dataframe$countymeanforeachday <- dataframe[, mean(SARS.mean), by = date]
dataframe$countymeanforeachday <- aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN = mean)
it does not work. Please advise.
If you are open to a tidyverse approach:
library(tidyverse)
dataframe <- dataframe %>%
  group_by(Date.Collected) %>%
  mutate(countymeanforeachday = mean(SARS.mean, na.rm = TRUE)) %>%
  ungroup()
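The assignments in the question most likely fail because aggregate() returns one row per date, while a new column needs one value per original row. As a base R sketch of the same idea, ave() returns a vector as long as its input. The toy data below is made up; only the column names come from the question.

```r
# Hypothetical toy data using the question's column names
dataframe <- data.frame(
  Date.Collected = c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"),
  SARS.mean = c(10, 20, 5, NA)
)

# ave() applies FUN within each date and repeats the result for every row
# of that date, so the output lines up with the original rows.
dataframe$countymeanforeachday <- ave(
  dataframe$SARS.mean,
  dataframe$Date.Collected,
  FUN = function(v) mean(v, na.rm = TRUE)
)

dataframe
```

Every row keeps its own SARS.mean and gains the mean of its date alongside it.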

R - how to use group by function properly [duplicate]

I'm trying to compute the average and correlation for some variables, grouped by gender. I don't think my group_by function is working, for some reason.
data(PSID1982, package ="AER" )
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(PSID1982$education),
            avgexper = mean(PSID1982$experience),
            avgwage = mean(PSID1982$wage),
            cor_wagvseduc = cor(x = PSID1982$wage, y = PSID1982$education))
The result is just the summary statistics of the entire group, not broken up into different genders.
The code runs, but inside dplyr verbs you should not refer to columns as PSID1982$Column_Name. That form always pulls the whole column from the original data frame and so ignores the grouping. Use the bare column name instead:
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(education),
            avgexper = mean(experience),
            avgwage = mean(wage),
            cor_wagvseduc = cor(x = wage, y = education))
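The difference between the two forms is easy to see on a small example. This is only an illustrative sketch: the toy data frame is made up and stands in for PSID1982.

```r
library(dplyr)

# Hypothetical toy data standing in for PSID1982
toy <- data.frame(
  gender = c("female", "female", "male", "male"),
  wage = c(10, 20, 30, 50)
)

res <- toy %>%
  group_by(gender) %>%
  summarise(
    with_dollar = mean(toy$wage),  # whole column, so the same value in every group
    bare_name   = mean(wage)       # only the rows of the current group
  )

res
```

The with_dollar column repeats the overall mean (27.5) for both genders, while bare_name gives the per-group means (15 and 40).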

Calculate averages for each country, from a data frame [duplicate]

I'm working on a dataframe called 'df'. Two of the columns in 'df' are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most effective way to iterate through the data frame and calculate the averages by country?
I tried for loops, but they do not work, and I also don't think they are the most effective way.
The tidyverse provides the easiest way, IMO:
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
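If you really are intent upon "iterating" through, as the question suggests (though this is not recommended):
countries <- unique(df$country)
avg <- numeric(length(countries))
for (i in seq_along(countries)) {
  # mean of dissent over the rows belonging to the i-th country
  avg[i] <- mean(df$dissent[df$country == countries[i]], na.rm = TRUE)
}
names(avg) <- countries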
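For comparison, base R can produce the same per-country averages without an explicit loop. The sketch below uses tapply() on made-up data; only the names df, country and dissent come from the question.

```r
# Hypothetical toy data using the question's column names
df <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  dissent = c(1, 3, 2, NA, 6)
)

# tapply() splits dissent by country and applies mean() to each piece;
# na.rm = TRUE is passed through to mean()
avg_by_country <- tapply(df$dissent, df$country, mean, na.rm = TRUE)

avg_by_country
```

The result is a named vector with one entry per country, which can be indexed by name, e.g. avg_by_country[["A"]].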

R: Calculate mean by column in a list of dataframes using pipes %>% in dplyr [duplicate]

I am trying to get better at using pipes (%>%) in the dplyr package. I understand that the whole point of a pipe is that it passes the object on its left-hand side as the first argument of the function on its right-hand side. That is, in this example:
area = rep(c(3:7), 5) + rnorm(5)
the piped form
area %>%
  mean
is equivalent to the ordinary function call
`mean(area)`.
My problem arises when it comes to a data frame. I would like to split a data frame into a list of data frames and then calculate the mean of the area column in each. But I can't figure out how to refer to the column instead of the data frame.
I know that I can get means by year simply with aggregate(area ~ year, df, mean), but I would like to practice pipes instead.
Thank you!
# Dummy data
set.seed(13)
df <- data.frame(year = rep(c(1:5), each = 5),
                 area = rep(c(3:7), each = 5) + rnorm(1))
# Calculate means.
# None of `mean(df$area)`, `mean("area")` or `mean[area]` works. How do I refer to the column correctly?
df %>%
  split(df$year) %>%
  mean
This?
df %>%
  group_by(year) %>%
  summarise(Mean = mean(area))
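We need to extract the column from each data.frame in the list returned by split. One option is to loop through the list with map_df from purrr and summarise 'area' in each piece; .id = "year" keeps the list names as a column.
library(dplyr)
library(purrr)
df %>%
  split(.$year) %>%
  map_df(~ .x %>%
           summarise(area = mean(area)),
         .id = "year")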
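The same split-then-summarise idea can be sketched in base R, which makes it explicit that the function is applied to each piece of the list rather than to the list itself. This uses the dummy data from the question; the names pieces and means are introduced here for illustration only.

```r
# Dummy data as in the question
set.seed(13)
df <- data.frame(year = rep(1:5, each = 5),
                 area = rep(3:7, each = 5) + rnorm(1))

# split() returns a named list with one data frame per year
pieces <- split(df, df$year)

# Apply mean() to the area column of each piece, not to the list as a whole
means <- sapply(pieces, function(d) mean(d$area))

means
```

Calling mean() directly on the list fails because a list of data frames is not numeric; the anonymous function is what reaches into each data frame and picks out the column.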

I am looking to find the sum of certain columns in my data set [duplicate]

I am looking to find the sum of certain columns in my dataset. Currently it looks something like this.
I want to find the column sums of X, Y and Z for each possible grid and month combination.
Currently I have
xx <- data[data$Month == "November" & data$grid == "A3", ]
fun <- by(xx[, 1:3], xx$grid, colSums, na.rm = TRUE)
fun <- as.character(fun)
as.data.frame(fun,
              stringsAsFactors = default.stringsAsFactors())
But this requires me to change the grid and month references each time. Is there a simpler way to do it, without manually specifying which grids and months I want?
We can use summarise with across from dplyr after grouping by 'Month' and 'grid' (older code used summarise_each(funs(sum)), which is now deprecated).
library(dplyr)
data %>%
  group_by(Month, grid) %>%
  summarise(across(everything(), sum))
Or with aggregate from base R
aggregate(. ~ Month + grid, data, FUN = sum)
Or using the OP's method
by(data[1:3], data[4:5], FUN = colSums)
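As a quick check of what the grouped sums look like, here is a small base R sketch. The data is made up, and the column names (X, Y, Z, Month, grid) are assumed from the question's code, since the original table is not shown.

```r
# Hypothetical toy data; column names assumed from the question's code
data <- data.frame(
  X = c(1, 2, 3, 4),
  Y = c(10, 20, 30, 40),
  Z = c(0, 1, 0, 1),
  Month = c("November", "November", "November", "December"),
  grid = c("A3", "A3", "B1", "A3")
)

# One row per Month/grid combination, with the sums of X, Y and Z
sums <- aggregate(cbind(X, Y, Z) ~ Month + grid, data, FUN = sum)

sums
```

Note that the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), which may or may not be what you want for your data.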
