Aggregating data by grouping variable [duplicate] - r

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 1 year ago.
I am attempting to create a new variable/column in a data frame at the county level: the mean of the viral fragments detected in the municipalities within this specific county, for each day on which municipalities reported data. I have been able to calculate this mean in two different ways with the following code:
library(data.table)  # the first call requires dataframe to be a data.table
dataframe[, mean(SARS.mean), by = Date.Collected]
aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN = mean)
but when I try to store either result in a new column, like this:
dataframe$countymeanforeachday <- dataframe[, mean(SARS.mean), by = Date.Collected]
dataframe$countymeanforeachday <- aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN = mean)
it does not work. Please advise.

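Both of your calls return a summary with one row per date rather than one value per row of the data frame, so the result cannot be stored as a column. If you are open to a tidyverse approach, group_by() followed by mutate() keeps every row and attaches each day's mean to all rows for that day: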
library(tidyverse)
dataframe <- dataframe %>%
  group_by(Date.Collected) %>%
  mutate(countymeanforeachday = mean(SARS.mean, na.rm = TRUE)) %>%
  ungroup()
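For comparison, here is a base R sketch of the same idea. ave() returns one value per row (the group mean repeated within each group), which is why its result can be assigned as a column, unlike the one-row-per-date table from aggregate(). The column names follow the question; the values are made up for illustration.

```r
# Toy data with the question's column names (values are made up)
dataframe <- data.frame(
  Date.Collected = as.Date(c("2021-01-01", "2021-01-01",
                             "2021-01-02", "2021-01-02", "2021-01-02")),
  SARS.mean = c(10, 20, 5, 15, 40)
)

# ave() applies FUN within each date and returns a vector as long as the input,
# so the result lines up row by row with the data frame
dataframe$countymeanforeachday <- ave(dataframe$SARS.mean,
                                      dataframe$Date.Collected,
                                      FUN = function(x) mean(x, na.rm = TRUE))

# On a data.table, the equivalent adds the column by reference with :=
# dataframe[, countymeanforeachday := mean(SARS.mean), by = Date.Collected]

print(dataframe)
```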

Related

R Mean of multiple groups by quartiles [duplicate]

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 1 year ago.
I have a data frame with several variables, e.g. x1, x2, and so on.
I created quartiles based on one variable (BE) with the following code:
library(dplyr)
Quantile_Var <- Var %>% mutate(Quartile = ntile(BE, 4))  # 4 bins = quartiles; ntile(BE, 5) would give quintiles
Now I want to see the mean of each variable (x1, x2, ...) by quartile. I tried the following code, but it gives me too much information, since I only need the means. How can I edit the code so that R gives me only the means?
library(purrr)
Quantile_Var %>% split(.$Quartile) %>% map(summary)
It's probably quite easy, but unfortunately I am struggling with it.
You can use the output of ntile() as the grouping variable and compute the mean of every x variable in a single summarise() call:
library(dplyr)
Quantile_Var <- Var %>%
  group_by(Quartile = ntile(BE, 4)) %>%  # 4 bins for quartiles
  summarise(across(starts_with('x'), ~ mean(.x, na.rm = TRUE)))
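If you would rather stay in base R, here is a sketch along the same lines. It bins BE into four groups by rank, similar to ntile() (ties are broken by order of appearance, and bucket sizes may be assigned slightly differently when the row count is not a multiple of 4), then averages every column whose name starts with x. The data frame Var and its values are made up for illustration.

```r
# Made-up data: BE is the binning variable, x1 and x2 are the variables to average
Var <- data.frame(
  BE = c(8, 3, 5, 1, 7, 2, 6, 4),
  x1 = c(80, 30, 50, 10, 70, 20, 60, 40),
  x2 = 1:8
)

# Rank-based quartile: ranks 1..n mapped to groups 1..4 of (nearly) equal size
r <- rank(Var$BE, ties.method = "first")
Var$Quartile <- floor(4 * (r - 1) / nrow(Var)) + 1

# Mean of every x column within each quartile (one row per quartile);
# na.rm = TRUE is passed on to mean()
x_cols <- startsWith(names(Var), "x")
quartile_means <- aggregate(Var[x_cols],
                            by = list(Quartile = Var$Quartile),
                            FUN = mean, na.rm = TRUE)
print(quartile_means)
```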

Group by and count based on multiple conditions in R [duplicate]

This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame that groups by the first three columns and gives a count of the instances of "Yes" in the fourth column.
How do I do this in R?
Thanks for your help.
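It would be best to have a sample of your actual data to verify that this works and returns the output you want, but the following should work. It groups by the first three columns and counts the rows in which Passed Test equals "Y" (change this to "Yes" if that is how your data are coded):
library(dplyr)
df %>%
  group_by(pick(1:3)) %>%  # first three columns only; on dplyr < 1.1.0 use across(1:3)
  summarize(Count = sum(`Passed Test` == "Y"), .groups = "drop")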
An option with base R:
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))  # '.' = all other columns, i.e. the three grouping columns
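Here is a small self-contained run of the base R approach with made-up data. The column names Site, Year, Group, and Passed are hypothetical; Passed stands in for the asker's Passed Test column, since a syntactic name avoids the backticks. Because sum() over a logical vector counts the TRUE values, groups with no "Y" still appear, with a count of 0.

```r
# Hypothetical data: three grouping columns and a Y/N result column
df <- data.frame(
  Site   = c("A", "A", "A", "B", "B"),
  Year   = c(2019, 2019, 2019, 2019, 2020),
  Group  = c("g1", "g1", "g1", "g2", "g2"),
  Passed = c("Y", "N", "Y", "N", "N")
)

# '.' expands to every column except Passed, i.e. the three grouping columns;
# x == "Y" is logical, so sum() counts the "Y" rows in each group
counts <- aggregate(Passed ~ ., data = df,
                    FUN = function(x) sum(x == "Y"))
print(counts)
```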

R Dplyr group_by [duplicate]

This question already has answers here:
Count number of rows by group using dplyr
(5 answers)
Closed 2 years ago.
I have a dataset that contains information about multiple countries. As I am trying to construct population weights, I want to calculate:
country population (variable included in the dataset) / sample size for each country (different for each country)
For one specific country, I would first create a subset (e.g. italydata = subset(data, data$country == "Italy")) and then divide country_population by nrow(italydata).
I am looking for a way to do this calculation for each country in the dataset. I have tried it with the dplyr package, but I am unsure what to write instead of nrow(x):
weight_by_economy <- data %>%
  group_by(country) %>%
  summarize(weight = country_population/nrow(x))
Thanks for your help!
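Try n(), which returns the number of rows in the current group. Since country_population is repeated on every row of a country, take its first value so that summarize() returns one row per country:
weight_by_economy <- data %>%
  group_by(country) %>%
  summarize(weight = first(country_population) / n())
If this doesn't work, please clarify the question by providing a representative data object.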
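A base R sketch of the same weight computation, with made-up numbers. It assumes, as in the question, that country_population holds the same value on every row of a country; ave() with length() gives each row its country's sample size, and aggregate() gives one weight per country.

```r
# Made-up data: one row per respondent, population repeated within each country
data <- data.frame(
  country = c("Italy", "Italy", "Italy", "Spain", "Spain"),
  country_population = c(60e6, 60e6, 60e6, 47e6, 47e6)
)

# Sample size per country, repeated on every row of that country
data$sample_size <- ave(seq_len(nrow(data)), data$country, FUN = length)
data$weight <- data$country_population / data$sample_size

# One row per country: first population value divided by the number of rows
weight_by_economy <- aggregate(country_population ~ country, data = data,
                               FUN = function(p) p[1] / length(p))
names(weight_by_economy)[2] <- "weight"
print(weight_by_economy)
```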

How to split a time series data by range of date in R? [duplicate]

This question already has answers here:
Case Statement Equivalent in R
(16 answers)
Closed 2 years ago.
I am a beginner in R and am working with monthly time-series data.
I have selected the following columns from my data frame:
library(dplyr)
df <- select(data, Date, Location, Production, Value)
There are 13 different locations, multiple production sectors, and their values. The dates range from 2016-01 to 2020-09.
I want to split the data into two columns: before covid (2016-01 to 2020-01) and after covid (2020-02 to 2020-09).
How do I do this in R? Is there a simple method?
I think you're looking for something like this. Here's a tidy solution; there are several similar ways to do it, but I think this one is straightforward to follow. It assumes Date is already of class Date:
library(dplyr)
df %>%
  mutate(PreCovid = ifelse(Date < as.Date("2020-02-01"), Value, NA),
         PostCovid = ifelse(Date >= as.Date("2020-02-01"), Value, NA))
Alternatively, we can convert Date to the Date class first (assuming it is stored as "YYYY-MM" text) and then specify the range of each period:
library(dplyr)
df <- df %>%
  mutate(Date = as.Date(paste0(Date, '-01')),  # "2016-01" -> "2016-01-01"
         period = case_when(
           between(Date, as.Date('2016-01-01'), as.Date('2020-01-01')) ~ 'Before covid',
           between(Date, as.Date('2020-02-01'), as.Date('2020-09-01')) ~ 'After covid'))
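A base R sketch of the same split, with made-up values, assuming Date is stored as "YYYY-MM" text as in the question. ifelse() labels each month, and split() then returns the two periods as separate data frames, in case that is the shape you are after.

```r
# Made-up monthly data with "YYYY-MM" dates
df <- data.frame(
  Date  = c("2019-12", "2020-01", "2020-02", "2020-09"),
  Value = c(1, 2, 3, 4)
)

# Append a day so as.Date() can parse it: "2020-02" -> 2020-02-01
df$Date <- as.Date(paste0(df$Date, "-01"))

# Everything before February 2020 is "Before covid", the rest "After covid"
df$period <- ifelse(df$Date < as.Date("2020-02-01"), "Before covid", "After covid")

# Optional: one data frame per period (list names are the period labels)
by_period <- split(df, df$period)
print(by_period)
```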

Calculate averages for each country, from a data frame [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I'm working on a data frame called 'df'. Two of its columns are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most efficient way to iterate through the data frame and calculate the averages by country?
I tried a for loop, but it did not work, and I don't think it is the most efficient approach anyway.
The tidyverse provides the easiest way, IMO:
library(dplyr)
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
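If you really are intent on "iterating" through the data, as the question suggests (though this is not recommended):
countries <- unique(df$country)
avg <- setNames(numeric(length(countries)), countries)  # one slot per country
for (i in seq_along(countries)) {
  # df$dissent[...] gives a plain vector, even if df is a tibble
  avg[i] <- mean(df$dissent[df$country == countries[i]], na.rm = TRUE)
}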
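For completeness, base R also has a vectorized one-liner that avoids the explicit loop: tapply() splits dissent by country and applies mean() to each piece. The data below are made up.

```r
# Made-up data, including a missing value
df <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  dissent = c(1, 3, 2, NA, 4)
)

# One mean per country; na.rm = TRUE is passed on to mean()
avg <- tapply(df$dissent, df$country, mean, na.rm = TRUE)
print(avg)
```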
