This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 1 year ago.
I am attempting to create a new variable/column in a data frame at the county level from the mean variable of viral fragments detected in municipalities within this specific county for each day municipalities reported data. I have been able to calculate this mean with two different ways the following code:
dataframe[, mean(SARS.mean), by = date]
aggregate(datframeme$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
but when i do something like
dataframe$countymeanforeachday <- dataframe[, mean(SARS.mean), by = date]
dataframe$countymeanforeachday <- aggregate(datframeme$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
it does not work. Please advise I beg
If you are open to a tidyverse approach:
library(tidyverse)
dataframe <- dataframe %>%
group_by(Date.Collected) %>%
mutate(countymeanforeachday = mean(SARS.mean, na.rm = TRUE)) %>%
ungroup()
This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame which groups by the first three columns, and provides a count of the instances of "Yes" in the fourth column
So
becomes
How do I do this in R
Thanks for your help
It would be best if I had a set of your actual data to verify this works and returns the output you desire, but the following should work.
library(dplyr)
df %>%
group_by(across(1:4)) %>%
summarize(Count = sum(`Passed Test` == "Y"))
An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I'm working on a dataframe called 'df'. Two of the columns in 'df' are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most effective way to iterate through the data frame and calculate the averages by country?
I tried for loops but it does not work and also I don't think it is the most effective way.
tidyverse provides the easiest way IMO
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
If you really are intent upon "iterating" through, as the question suggests (though not recommended):
avg <- NULL
for (i in 1:length(unique(df$country))) {
avg[i] <- mean(df[df$country == unique(df$country)[i], "dissent"], na.rm=TRUE)
}
This question already has answers here:
Lookup value from another column that matches with variable
(3 answers)
Replace values in a dataframe based on lookup table
(8 answers)
Closed 3 years ago.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
I have complete data frame "data" but I do not have values for everybody which is in my second data frame "data1". However for administrative reasons I must use the full data. Basically "WANT" maintains the structure of "data" but fills in the values where they are available.
Here is a simple solution.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
select(-score) %>%
left_join(data1)
I may be reaching but maybe you need.
set.seed(1)
data=data.frame("id"=1:10,
"score"=sample(50:100,10))
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
mutate(score1 = score) %>%
select(-score) %>%
left_join(data1) %>%
mutate(score = if_else(is.na(score),
score1,
score)) %>%
select(-score1)
This question already has answers here:
Means multiple columns by multiple groups [duplicate]
(4 answers)
Closed 4 years ago.
This is my data. What I would like to do is, if the gene column has duplicated value (e.g. CASZ1), then I would like to get mean values for each Sample column.
Input data
Output data
I googled it and tried, but I am stuck to get an answer. I am sorry for asking such a question looks exactly like homework.
My code
data %>% group_by(gene) %>% summarise(avg = mean(colnames(data)) --- error...
You can use summarize_at along with some regular expression to ensure any column not starting by your pattern will not be included:
data %>% group_by(gene) %>% summarise_at(vars(matches("Sample")), mean)
Is that what you're looking for?
You can use summarise_all:
library(dplyr)
data %>% group_by(gene) %>% summarise_all(funs(mean))