R Dplyr group_by [duplicate] - r

This question already has answers here:
Count number of rows by group using dplyr
(5 answers)
Closed 2 years ago.
I have a dataset that contains information about multiple countries. As I am trying to construct population weights, I want to calculate:
country population (variable included in the dataset) / sample size for each country (different for each country)
For one specific country, I would first create a subset (e.g. italydata = subset(data, data$country == "Italy")) and then divide country_population by nrow(italydata).
I am looking for a way to do this calculation for each country in the dataset. I have tried it with the dplyr package, but I am uncertain what to write instead of nrow(x).
weight_by_economy <- data %>%
  group_by(country) %>%
  summarize(weight = country_population/nrow(x))
Thanks for your help!

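Try
library(dplyr)
weight_by_economy <- data %>%
  group_by(country) %>%
  # n() is the number of rows in each group; first() takes the single population value per country
  summarize(weight = first(country_population) / n())
If this doesn't work, please clarify the question by providing a representative data object.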
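As a minimal sketch of how n() behaves inside a grouped summarize, the toy data frame below uses made-up countries and population figures (not the asker's data). It assumes country_population is constant within each country, so first() picks one value per group.

```r
library(dplyr)

# Hypothetical toy data: the population is repeated on every row of a country
toy <- data.frame(
  country = c("Italy", "Italy", "Spain", "Spain", "Spain"),
  country_population = c(60, 60, 45, 45, 45)
)

weight_by_economy <- toy %>%
  group_by(country) %>%
  # n() returns the number of rows in the current group
  summarize(weight = first(country_population) / n())
```

With this toy input, each country ends up with one row whose weight is its population divided by its number of sampled rows.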

Related

Aggregating data by grouping variable [duplicate]

This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 1 year ago.
I am trying to add a new county-level column to a data frame: for each day on which municipalities reported data, the mean of the viral-fragment variable across the municipalities in the county. I have been able to calculate this mean in two different ways, with the following code:
dataframe[, mean(SARS.mean), by = date]
aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
But when I do something like
dataframe$countymeanforeachday <- dataframe[, mean(SARS.mean), by = date]
dataframe$countymeanforeachday <- aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
it does not work. Please advise.
If you are open to a tidyverse approach:
library(tidyverse)
dataframe <- dataframe %>%
  group_by(Date.Collected) %>%
  mutate(countymeanforeachday = mean(SARS.mean, na.rm = TRUE)) %>%
  ungroup()
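The assignments in the question most likely fail because aggregate() and the data.table call return one row per date, while a new column needs one value per original row. A base R sketch of the same idea uses ave(), which returns a vector as long as its input. The toy data frame here is hypothetical and only reuses the column names from the question.

```r
# Hypothetical toy data standing in for the asker's data frame
toy <- data.frame(
  Date.Collected = c("2021-01-01", "2021-01-01", "2021-01-02"),
  SARS.mean = c(10, 20, 5)
)

# ave() returns one value per row, so the result fits as a new column
toy$countymeanforeachday <- ave(
  toy$SARS.mean, toy$Date.Collected,
  FUN = function(x) mean(x, na.rm = TRUE)
)
```

Rows that share a date receive the same daily mean, which is also what the group_by() plus mutate() approach produces.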

Group by and count based on multiple conditions in R [duplicate]

This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame that groups by the first three columns and provides a count of the instances of "Yes" in the fourth column.
So [input table not preserved] becomes [output table not preserved].
How do I do this in R?
Thanks for your help
It would be best to have a sample of your actual data to verify that this works and returns the output you want, but the following should work.
library(dplyr)
df %>%
  # group by the first three columns only; the fourth is the one being counted
  group_by(across(1:3)) %>%
  summarize(Count = sum(`Passed Test` == "Y"))
An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))
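Since the original tables are not available, here is a small sketch on invented data. The column names Site, Year, and Group are hypothetical; only `Passed Test` and the "Y" value come from the answers above. It groups by the first three columns and counts the "Y" entries in the fourth.

```r
library(dplyr)

# Hypothetical toy data; check.names = FALSE keeps the space in `Passed Test`
toy <- data.frame(
  Site = c("A", "A", "A", "B"),
  Year = c(2020, 2020, 2020, 2021),
  Group = c("x", "x", "x", "y"),
  `Passed Test` = c("Y", "N", "Y", "N"),
  check.names = FALSE
)

counts <- toy %>%
  group_by(across(1:3)) %>%
  # comparing to "Y" gives TRUE/FALSE, and sum() counts the TRUE values
  summarize(Count = sum(`Passed Test` == "Y"), .groups = "drop")
```

Groups with no "Y" at all still appear in the result, with a count of zero.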

Calculate averages for each country, from a data frame [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I'm working on a dataframe called 'df'. Two of the columns in 'df' are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most effective way to iterate through the data frame and calculate the averages by country?
I tried for loops, but they did not work, and I don't think they are the most effective way anyway.
tidyverse provides the easiest way IMO
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
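If you really are intent upon "iterating" through, as the question suggests (though not recommended):
countries <- unique(df$country)
avg <- numeric(length(countries))
for (i in seq_along(countries)) {
  # mean of dissent over the rows belonging to the i-th country
  avg[i] <- mean(df$dissent[df$country == countries[i]], na.rm = TRUE)
}
names(avg) <- countries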
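For comparison, base R can compute the same per-country averages without an explicit loop. This is a minimal sketch using tapply() on a hypothetical toy data frame that only borrows the column names 'country' and 'dissent' from the question.

```r
# Hypothetical toy data with one missing value
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  dissent = c(1, 3, 4, NA)
)

# tapply() splits dissent by country and applies mean() to each piece;
# na.rm = TRUE is passed through to mean()
avg <- tapply(toy$dissent, toy$country, mean, na.rm = TRUE)
```

The result is a named vector with one entry per country.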

Fill In One Data Frame With Another [duplicate]

This question already has answers here:
Lookup value from another column that matches with variable
(3 answers)
Replace values in a dataframe based on lookup table
(8 answers)
Closed 3 years ago.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
I have a complete data frame "data", but I do not have values for everybody; the values I do have are in my second data frame "data1". For administrative reasons, however, I must use the full data. Basically, "WANT" maintains the structure of "data" but fills in the values where they are available.
Here is a simple solution.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
  select(-score) %>%
  # join on id explicitly rather than relying on the automatic key guess
  left_join(data1, by = "id")
I may be reaching, but maybe you need this instead:
set.seed(1)
data=data.frame("id"=1:10,
"score"=sample(50:100,10))
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
  mutate(score1 = score) %>%
  select(-score) %>%
  left_join(data1, by = "id") %>%
  # fall back to the original score where data1 has no value
  mutate(score = if_else(is.na(score),
                         score1,
                         score)) %>%
  select(-score1)
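A base R alternative to the join is a lookup with match(). In this sketch the scores in data1 are hard-coded from the non-missing values in WANT instead of being drawn with sample(), which is an assumption made so that the result does not depend on the R version's random number generator.

```r
data <- data.frame(id = 1:10, score = NA)

# Scores copied from WANT rather than generated with sample()
data1 <- data.frame(
  id = c(1:3, 5, 7, 9, 10),
  score = c(83, 81, 53, 59, 58, 99, 67)
)

# match() gives the row of data1 for each id in data, or NA when the id is absent;
# indexing with NA yields NA, so ids missing from data1 stay missing
data$score <- data1$score[match(data$id, data1$id)]
```

This keeps the row order and length of "data" and fills in scores only where "data1" has a matching id.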

Get mean values if a key column value is duplicated with dplyr (R) [duplicate]

This question already has answers here:
Means multiple columns by multiple groups [duplicate]
(4 answers)
Closed 4 years ago.
This is my data. If the gene column has a duplicated value (e.g. CASZ1), I would like to get the mean of each Sample column across the duplicated rows.
Input data
Output data
I googled it and tried, but I could not get an answer. I am sorry for asking a question that looks exactly like homework.
My code
data %>% group_by(gene) %>% summarise(avg = mean(colnames(data)) --- error...
You can use summarise_at along with a regular expression so that any column not matching your pattern is excluded:
data %>% group_by(gene) %>% summarise_at(vars(matches("Sample")), mean)
Is that what you're looking for?
You can use summarise_all:
library(dplyr)
data %>% group_by(gene) %>% summarise_all(mean)
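In dplyr 1.0.0 and later, the scoped verbs summarise_at() and summarise_all() are superseded by across(). The sketch below shows the equivalent call on a hypothetical toy data frame; the gene name TP53 and the column names Sample1 and Sample2 are invented for illustration, since the original input table is not available.

```r
library(dplyr)

# Hypothetical toy data with one duplicated gene
toy <- data.frame(
  gene = c("CASZ1", "CASZ1", "TP53"),
  Sample1 = c(1, 3, 5),
  Sample2 = c(2, 4, 6)
)

means <- toy %>%
  group_by(gene) %>%
  # apply mean() to every column whose name starts with "Sample"
  summarise(across(starts_with("Sample"), mean), .groups = "drop")
```

Genes that appear only once keep their original values, because the mean of a single number is that number.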
