This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I'm working on a dataframe called 'df'. Two of the columns in 'df' are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most effective way to iterate through the data frame and calculate the averages by country?
I tried for loops, but they did not work, and I don't think they are the most effective way anyway.
The tidyverse provides the easiest way, IMO:
library(dplyr)
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
If you really are intent upon "iterating" through, as the question suggests (though not recommended):
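countries <- unique(df$country)
avg <- numeric(length(countries))
for (i in seq_along(countries)) {
  # mean dissent over the rows belonging to the i-th country
  avg[i] <- mean(df$dissent[df$country == countries[i]], na.rm = TRUE)
}
names(avg) <- countries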
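For completeness, base R can also do this without an explicit loop. The following is only a sketch on a made-up data frame (the values are invented, only the column names 'country' and 'dissent' come from the question):

```r
# Toy data; the values are invented purely for illustration.
df <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  dissent = c(1, 3, 2, NA, 4)
)

# tapply() returns a named vector with one mean per country.
avg_vec <- tapply(df$dissent, df$country, mean, na.rm = TRUE)

# aggregate() returns a data frame with one row per country.
# The formula interface drops rows with NA before computing the mean.
avg_df <- aggregate(dissent ~ country, data = df, FUN = mean)
```

Which one to prefer mostly depends on whether a vector or a data frame is more convenient for the next step.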
This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 1 year ago.
I am attempting to create a new column in a data frame at the county level: the mean of the viral fragments detected in the municipalities within the county, for each day on which municipalities reported data. I have been able to calculate this mean in two different ways, with the following code:
dataframe[, mean(SARS.mean), by = date]
aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
But when I do something like
dataframe$countymeanforeachday <- dataframe[, mean(SARS.mean), by = date]
dataframe$countymeanforeachday <- aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
it does not work. Please advise, I beg.
If you are open to a tidyverse approach:
library(tidyverse)
dataframe <- dataframe %>%
  group_by(Date.Collected) %>%
  mutate(countymeanforeachday = mean(SARS.mean, na.rm = TRUE)) %>%
  ungroup()
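As for why the direct assignment fails: aggregate() returns one row per date, which generally has fewer rows than the original data frame, so it cannot be stored as a column. A base R sketch that returns one value per original row uses ave(). The data below are invented, only the column names come from the question:

```r
# Toy data; column names follow the question, values are invented.
dataframe <- data.frame(
  Date.Collected = c("2021-01-01", "2021-01-01", "2021-01-02"),
  SARS.mean = c(10, 20, 30)
)

# aggregate() collapses to one row per date (2 rows here, not 3).
daily <- aggregate(SARS.mean ~ Date.Collected, data = dataframe, FUN = mean)

# ave() returns a vector as long as the input, so it fits as a new column.
dataframe$countymeanforeachday <- ave(
  dataframe$SARS.mean,
  dataframe$Date.Collected,
  FUN = function(x) mean(x, na.rm = TRUE)
)
```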
This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 1 year ago.
I have a dataframe with different variables, e.g.: x1, x2 and so on.
I created quartiles based on one variable (BE) with the following code:
Quantile_Var <- Var%>% mutate(Quartile = ntile(BE, 5))
Now I want to see the means of each variable (x1, x2, ...) by quartile. I tried the following code, but it gives me too much information, since I only need the means. How can I edit the code so that R only gives me the means?
Quantile_Testvar %>% split(.$quartile) %>% map(summary)
It's probably completely easy, but unfortunately I'm struggling to do so.
You can use output from ntile as a group and get the average value for all the x variables.
library(dplyr)
Quantile_Var <- Var %>%
  # ntile(BE, 5) makes five groups (quintiles); use ntile(BE, 4) for true quartiles
  group_by(Quartile = ntile(BE, 5)) %>%
  summarise(across(starts_with('x'), ~ mean(.x, na.rm = TRUE)))
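To make the shape of the result concrete, here is a small sketch on invented data. It assumes four groups (true quartiles) rather than the five used above, and the values of BE, x1 and x2 are made up:

```r
library(dplyr)

# Toy data; only the column names BE, x1, x2 follow the question.
Var <- data.frame(
  BE = 1:8,
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# One row per quartile of BE, one mean column per x variable.
means_by_quartile <- Var %>%
  group_by(Quartile = ntile(BE, 4)) %>%
  summarise(across(starts_with("x"), ~ mean(.x, na.rm = TRUE)))
```

The result has one row per group and only the means, with none of the extra output that summary() prints.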
This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame which groups by the first three columns, and provides a count of the instances of "Yes" in the fourth column
So
becomes
How do I do this in R?
Thanks for your help.
It would be best if I had a set of your actual data to verify this works and returns the output you desire, but the following should work.
library(dplyr)
df %>%
  # group by the first three columns only; the fourth is the one being counted
  group_by(across(1:3)) %>%
  summarize(Count = sum(`Passed Test` == "Y"))
An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))
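Since the example tables from the question are not available, here is a sketch on hypothetical data. The column names Site, Year and Group and all values are invented, only `Passed Test` and the "Y" coding are taken from the answers above. It groups on the first three columns, as the question asks:

```r
library(dplyr)

# Hypothetical data; check.names = FALSE keeps the space in `Passed Test`.
df <- data.frame(
  Site = c("North", "North", "North", "South"),
  Year = c(2020, 2020, 2020, 2021),
  Group = c("a", "a", "a", "b"),
  `Passed Test` = c("Y", "N", "Y", "N"),
  check.names = FALSE
)

# Count the "Y" values within each combination of the first three columns.
counts <- df %>%
  group_by(across(1:3)) %>%
  summarize(Count = sum(`Passed Test` == "Y"), .groups = "drop")
```

Groups with no "Y" at all are kept with a count of 0, which may or may not be what you want.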
This question already has answers here:
Count number of rows by group using dplyr
(5 answers)
Closed 2 years ago.
I have a dataset that contains information about multiple countries. As I am trying to construct population weights, I want to calculate:
country population (variable included in the dataset) / sample size for each country (different for each country)
For one specific country, I would first create a subset (e.g. italydata = subset(data, data$country == "Italy")) and then divide country_population by nrow(italydata).
I am looking for a way to do this calculation for each country in the dataset. I have tried it with dplyr package, but I am uncertain what to write instead of nrow("x").
weight_by_economy <- data %>%
group_by(country) %>%
summarize(weight = country_population/nrow(x))
Thanks for your help!
Try
weight_by_economy <- data %>%
  group_by(country) %>%
  # country_population is assumed constant within a country, so take its first value
  summarize(weight = first(country_population) / n())
If this doesn't work, please clarify the question by providing a representative data object.
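In the absence of real data, here is a sketch on an invented data set. It assumes country_population has the same value on every row of a given country. The second pipeline is a variation, in case the weight is needed on every row rather than once per country:

```r
library(dplyr)

# Hypothetical data; the population figures are invented for illustration.
data <- data.frame(
  country = c("Italy", "Italy", "Spain"),
  country_population = c(60, 60, 47)
)

# One row per country: population divided by that country's sample size.
weight_by_economy <- data %>%
  group_by(country) %>%
  summarize(weight = first(country_population) / n())

# Same weight attached to every original row instead.
data_weighted <- data %>%
  group_by(country) %>%
  mutate(weight = country_population / n()) %>%
  ungroup()
```

Inside group_by(), n() plays the role that nrow(italydata) played in the manual subset approach.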
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am looking to find the sum of certain columns in my dataset. Currently it looks something like this.
I want to find the column sum of everyone in X, Y and Z for each possible grid and month combination.
Currently I have
xx<-data[data$Month=="November"&data$grid=="A3",]
fun<-by(xx[, 1:3],xx$grid, colSums,na.rm=T)
fun<-as.character(fun)
as.data.frame(fun,
stringsAsFactors = default.stringsAsFactors())
But this requires me to change the grid ref and month ref each time. Is there a simpler way to do it, without manually specifying which grids and months I want?
We can use summarise with across from dplyr after grouping by 'month', 'grid' (summarise_each and funs are deprecated in current dplyr)
library(dplyr)
data %>%
  group_by(month, grid) %>%
  summarise(across(everything(), sum))
Or with aggregate from base R
aggregate(.~month + grid, data, FUN = sum)
Or using the OP's method
by(data[1:3], data[4:5], FUN = colSums)
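One caveat with the aggregate formula interface: by default it drops every row that contains an NA before summing, which is not the same as sum(..., na.rm = TRUE) per column. A sketch on invented data (column names follow the answers above, values are made up):

```r
# Toy data with one missing value in X.
data <- data.frame(
  X = c(1, NA, 3, 4),
  Y = c(10, 20, 30, 40),
  Z = c(0, 1, 0, 1),
  month = c("November", "November", "November", "December"),
  grid = c("A3", "A3", "B1", "A3")
)

# Default na.action = na.omit: the whole second row is dropped,
# so its Y and Z values are lost as well.
sums_default <- aggregate(. ~ month + grid, data = data, FUN = sum)

# Keep all rows and let sum() skip the NA within each column.
sums_all <- aggregate(. ~ month + grid, data = data, FUN = sum,
                      na.rm = TRUE, na.action = na.pass)
```

If the columns have no missing values, both calls give the same result.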