This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame that groups by the first three columns and counts the instances of "Yes" in the fourth column.
So
becomes
How do I do this in R?
Thanks for your help
It would be best if I had a set of your actual data to verify this works and returns the output you desire, but the following should work.
library(dplyr)
df %>%
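group_by(across(1:3)) %>%  # group by the first three columns only, not the column being counted
summarize(Count = sum(`Passed Test` == "Y"), .groups = "drop")  # use "Yes" instead of "Y" if that is how the value is stored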
An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))
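For a quick check without the original data, here is a minimal base R sketch on made-up data. The column names (`Site`, `Year`, `Group`, `Passed`) and the values are assumptions for illustration only; the real column names and the exact spelling of "Yes" may differ.

```r
# Hypothetical data: three grouping columns and one Yes/No column.
df <- data.frame(
  Site   = c("A", "A", "B", "B", "B"),
  Year   = c(2020, 2020, 2020, 2021, 2021),
  Group  = c("x", "x", "y", "y", "y"),
  Passed = c("Yes", "No", "Yes", "Yes", "Yes")
)

# Sum a logical vector per group: TRUE counts as 1, FALSE as 0.
counts <- aggregate(
  list(Count = df$Passed == "Yes"),
  by  = df[1:3],   # the first three columns define the groups
  FUN = sum
)
counts
```

Because the logical vector is summed rather than filtered, a group with no "Yes" rows still appears in the result, with a count of 0.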
This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 1 year ago.
I am trying to add a county-level column to a data frame: for each day on which municipalities reported data, the mean of the viral-fragment variable across the municipalities in that county. I have been able to calculate this mean in two different ways, with the following code:
dataframe[, mean(SARS.mean), by = date]
aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
But when I do something like
dataframe$countymeanforeachday <- dataframe[, mean(SARS.mean), by = date]
dataframe$countymeanforeachday <- aggregate(dataframe$SARS.mean, list(dataframe$Date.Collected), FUN=mean)
it does not work. Please advise.
If you are open to a tidyverse approach:
library(tidyverse)
dataframe <- dataframe %>%
group_by(Date.Collected) %>%
mutate(countymeanforeachday = mean(SARS.mean, na.rm = TRUE)) %>%
ungroup()
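The two assignments in the question fail because each call returns one row per date rather than one value per row of the original data, so the result cannot be stored as a column. A base R alternative to the `mutate()` approach is `ave()`, which returns a vector as long as its input. The sketch below uses made-up values; only the column names `SARS.mean` and `Date.Collected` come from the question.

```r
# Hypothetical data: two municipalities reporting on two dates.
dataframe <- data.frame(
  Date.Collected = c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"),
  SARS.mean      = c(10, 20, 30, NA)
)

# ave() applies FUN within each date and repeats the result
# on every row that belongs to that date.
dataframe$countymeanforeachday <- ave(
  dataframe$SARS.mean,
  dataframe$Date.Collected,
  FUN = function(x) mean(x, na.rm = TRUE)
)
dataframe
```

Every row keeps its original values and gains the mean of its own date, which is the shape the question asks for.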
This question already has answers here:
Count number of rows by group using dplyr
(5 answers)
Closed 2 years ago.
I have a dataset that contains information about multiple countries. As I am trying to construct population weights, I want to calculate:
country population (variable included in the dataset) / sample size for each country (different for each country)
For one specific country, I would first create a subset (e.g. italydata = subset(data, data$country == "Italy")) and then divide country_population by nrow(italydata).
I am looking for a way to do this calculation for each country in the dataset. I have tried it with the dplyr package, but I am uncertain what to write instead of nrow("x").
weight_by_economy <- data %>%
group_by(country) %>%
summarize(weight = country_population/nrow(x))
Thanks for your help!
Try
weight_by_economy <- data %>%
group_by(country) %>%
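summarize(weight = first(country_population) / n())  # n() is the row count of the current group; first() assumes the population is constant within a country
If this doesn't work, please clarify the question by providing a representative data object.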
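To see what the per-country calculation does, here is a small base R sketch on invented numbers. It assumes `country_population` is constant within each country, so the first value in a group can stand in for the whole group; `length()` plays the role that `n()` plays in dplyr.

```r
# Hypothetical sample: three respondents from Italy, two from Spain.
data <- data.frame(
  country            = c("Italy", "Italy", "Italy", "Spain", "Spain"),
  country_population = c(60, 60, 60, 47, 47)
)

# For each country: population divided by the number of sampled rows.
weight_by_economy <- aggregate(
  country_population ~ country,
  data = data,
  FUN  = function(p) p[1] / length(p)
)
names(weight_by_economy)[2] <- "weight"
weight_by_economy
```

The result has one row per country, which matches what subsetting one country at a time and dividing by `nrow()` would give.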
This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 4 years ago.
I have another problem with R data frames.
#starting position
from <- c("A","B","A","C")
to <- c("D","F","D","F")
number <- c(3,4,6,7)
df <- data.frame(from,to,number)
How can I sum the numbers for rows that share the same "from-to" relation (for example, the two rows from A to D)?
The result should look like my "result" data frame.
#result
from <- c("A","B","C")
to <- c("D","F","F")
result <- c(9,4,7)
data.frame(from,to,result)
Thank you guys :)
You can use group_by to group "from" and "to" then use sum in summarise to get the total for each of the groups.
library(dplyr)
df %>% group_by(from,to) %>% summarise(result = sum(number))
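The same totals can be computed in base R with `aggregate()`. This sketch rebuilds the example data from the question under the name `df`, which is the name the dplyr answer assumes.

```r
df <- data.frame(
  from   = c("A", "B", "A", "C"),
  to     = c("D", "F", "D", "F"),
  number = c(3, 4, 6, 7)
)

# Sum `number` within each distinct (from, to) pair.
result <- aggregate(number ~ from + to, data = df, FUN = sum)
names(result)[3] <- "result"
result
```

The two A-to-D rows collapse into a single row with the total 9, while the B-to-F and C-to-F rows are unchanged.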
This question already has answers here:
filter for complete cases in data.frame using dplyr (case-wise deletion)
(7 answers)
Closed 5 years ago.
Suppose I have a data frame with 27 columns. The first column is the ID, and the remaining columns (A to Z) are just data. I want to remove every row in which all of the columns A to Z are NA. How should I do it?
The straightforward way is just
data %>%
filter(!(is.na(A) & is.na(B) .... & is.na(Z)))
Is there a more efficient or easier way to do it?
This question is different from this one because I want to exclude only the rows whose values are all NA, and keep the rows whose values are partially NA.
Using tidyverse:
library(tidyverse)
Load data:
ID <- c(1:8)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
mydf <- tibble(ID,Col1,Col2,Col3)  # tibble() replaces the deprecated data_frame()
mydf %>%
slice(which(complete.cases(.)))  # keeps only the rows with no NA in any column
If you instead want to keep all the columns and remove only the rows in which every data column is NA, you can run:
mydf %>%
mutate(full_incomplete_cases=rowSums(is.na(.[-1]))) %>%
filter(full_incomplete_cases<length(mydf[,-1])) %>%
select(ID:Col3)
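A shorter way to express "drop rows where every data column is NA" is to count the non-missing values per row. This base R sketch reuses the example values above in a plain `data.frame`; row 4 is the only one that is NA in all three data columns. The names `has_data` and `kept` are introduced here for illustration.

```r
mydf <- data.frame(
  ID   = 1:8,
  Col1 = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  Col2 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909),
  Col3 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
)

# TRUE for rows with at least one non-missing value outside the ID column.
has_data <- rowSums(!is.na(mydf[-1])) > 0
kept <- mydf[has_data, ]
kept
```

Rows that are only partially NA (IDs 2, 7 and 8 here) are kept. In dplyr 1.0.4 or later, `filter(if_any(-ID, ~ !is.na(.x)))` should express the same condition.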
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
From a data frame of many columns, I would like to aggregate (i.e. sum) hundreds of columns by a single column, without specifying each of the column names.
Some sample data:
names <- floor(runif(20, 1, 5))
sample <- cbind(names)
for(i in 1:20){
col <- rnorm(20,2,4)
sample <- cbind(sample, col)
}
What I have so far is the following code, but it fails with an error saying the arguments must be the same length.
aggregated <- aggregate.data.frame(sample[,c(2:20)], by = as.list(names), FUN = 'sum')
Original dataset is a lot bigger, so I can't specify the name of each of the columns to be aggregated and I can't use the list function.
You don't actually need to list them at all:
aggregate(. ~ names, sample, sum) # . represents all other columns
Of course base R is my favorite but in case someone wants dplyr:
library(dplyr)
data.frame(sample) %>%
group_by(names) %>%
summarise(across(everything(), sum))  # replaces the deprecated summarise_each(funs(sum))
Just alter your code slightly:
aggregated <- aggregate(sample[,c(2:20)], by = list(names), FUN = 'sum')
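Putting the pieces together, here is a self-contained sketch that builds the sample as a data frame rather than a matrix and sums every other column by `names`. The object name `sample_df`, the column names `col1` to `col20` and the seed are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

names <- floor(runif(20, 1, 5))           # grouping values between 1 and 4
cols  <- replicate(20, rnorm(20, 2, 4))   # 20 x 20 matrix of data columns
colnames(cols) <- paste0("col", 1:20)     # give every column a unique name

sample_df <- data.frame(names, cols)

# "." on the left-hand side stands for every column not named on the right.
aggregated <- aggregate(. ~ names, data = sample_df, FUN = sum)
dim(aggregated)
```

Giving each column a unique name avoids ambiguity when the formula expands `.`, and no column has to be listed by hand.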