Counting same Elements in a dataframe r [duplicate] - r

This question already has answers here:
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 4 years ago.
I have another problem with R data frames.
#starting position
from <- c("A","B","A","C")
to <- c("D","F","D","F")
number <- c(3,4,6,7)
df <- data.frame(from,to,number)
How can I sum the numbers for rows that share the same "from-to" relation (for example, the two rows from A to D)?
The result should look like my "result" dataframe.
#result
from <- c("A","B","C")
to <- c("D","F","F")
result <- c(9,4,7)
data.frame(from,to,result)
Thank you guys :)

You can use group_by to group by "from" and "to", then use sum inside summarise to get the total for each group.
library(dplyr)
df %>% group_by(from,to) %>% summarise(result = sum(number))
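For readers without dplyr, the same grouped sum can be sketched in base R with aggregate. This is only an illustration and not part of the original answer; it assumes the question's data is stored in a data frame named df.

```r
# Rebuild the question's data; the name df is an assumption taken from the answer
df <- data.frame(
  from   = c("A", "B", "A", "C"),
  to     = c("D", "F", "D", "F"),
  number = c(3, 4, 6, 7)
)

# Sum number within each (from, to) pair
totals <- aggregate(number ~ from + to, data = df, FUN = sum)

# Rename the summed column to match the desired "result" data frame
names(totals)[names(totals) == "number"] <- "result"
print(totals)
```

The formula interface keeps the grouping columns under their original names, so the output has the same three columns as the "result" data frame in the question.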

Related

Group by and count based on multiple conditions in R [duplicate]

This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Closed 2 years ago.
I have a data frame with 4 columns. I want to produce a new data frame which groups by the first three columns and provides a count of the instances of "Yes" in the fourth column.
So
becomes
How do I do this in R?
Thanks for your help.
It would be best if I had a set of your actual data to verify this works and returns the output you desire, but the following should work.
library(dplyr)
df %>%
  group_by(across(1:3)) %>%  # group by the first three columns only
  summarize(Count = sum(`Passed Test` == "Y"))
An option with base R
aggregate(`Passed Test` ~ ., df, FUN = function(x) sum(x == "Y"))
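Since the question's data is not shown, here is a small sketch of the base R option on made-up data. Every column name except `Passed Test`, and all of the values, are hypothetical; the sketch also assumes that passes are coded as "Y", as in the answers above.

```r
# Hypothetical data: Site, Year, Batch and all values are invented for illustration
df <- data.frame(
  Site  = c("N", "N", "N", "S", "S"),
  Year  = c(2020, 2020, 2020, 2021, 2021),
  Batch = c("a", "a", "a", "b", "b"),
  `Passed Test` = c("Y", "N", "Y", "N", "N"),
  check.names = FALSE  # keep the space in the column name "Passed Test"
)

# Group by every column except `Passed Test` and count the "Y" entries per group
counts <- aggregate(`Passed Test` ~ ., data = df, FUN = function(x) sum(x == "Y"))
print(counts)
```

Groups without any "Y" still appear in the output with a count of 0, because the count is computed on every group that exists in the data.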

adding values in a row if duplicate value in another row in r [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 3 years ago.
I have a data frame with two columns, word and frequency.
I want a new data frame in which the frequencies of a duplicated word are added up, as with the word "great".
This seems pretty straightforward, but I am not able to do it.
Any suggestions?
You can use dplyr for this one:
library(dplyr)
word <- c("great", "good", "nice", "great")
freq <- c(2,4,5,6)
df <- data.frame(word = word, freq = as.numeric(freq))
df %>% group_by(word) %>%
summarise(freq = sum(freq))
A base R alternative:
aggregate(df$freq, by=list(df$word), FUN=sum)

Calculate averages for each country, from a data frame [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 3 years ago.
I'm working on a dataframe called 'df'. Two of the columns in 'df' are 'country' and 'dissent'. I need to calculate the average dissent per country. What is the most effective way to iterate through the data frame and calculate the averages by country?
I tried for loops, but they did not work, and I don't think they are the most effective way anyway.
The tidyverse provides the easiest way, IMO:
library(dplyr)
df %>% group_by(country) %>% summarize(avg = mean(dissent, na.rm = TRUE))
If you really are intent upon "iterating" through, as the question suggests (though not recommended):
countries <- unique(df$country)
avg <- numeric(length(countries))  # one average per country
for (i in seq_along(countries)) {
  avg[i] <- mean(df$dissent[df$country == countries[i]], na.rm = TRUE)
}
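As a middle ground between the dplyr pipeline and the explicit loop, base R's tapply computes the per-group mean in one call. The sketch below uses made-up data; only the column names country and dissent come from the question.

```r
# Hypothetical data; only the column names country and dissent come from the question
df <- data.frame(
  country = c("FR", "FR", "DE", "DE", "DE"),
  dissent = c(2, 4, 1, NA, 3)
)

# Split dissent by country and average each piece, ignoring NA values
avg <- tapply(df$dissent, df$country, mean, na.rm = TRUE)
print(avg)
```

The result is a named vector, so an individual average can be looked up by country, for example avg[["FR"]].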

dplyr filter tens of columns [duplicate]

This question already has answers here:
filter for complete cases in data.frame using dplyr (case-wise deletion)
(7 answers)
Closed 5 years ago.
Suppose I have a 27-column data frame. The first column is the ID, and the rest of the columns (A to Z) are just data. I want to take out all the rows whose A to Z columns are all NA. How should I do it?
The straightforward way is just
data %>%
filter(!(is.na(A) & is.na(B) .... & is.na(Z)))
Is there a more efficient or easier way to do it?
This question is different from the linked one because I want to exclude only the rows whose values are all NA, and keep the rows whose values are partially NA.
Using tidyverse:
library(tidyverse)
Load data:
ID <- c(1:8)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
mydf <- tibble(ID,Col1,Col2,Col3)  # tibble() replaces the deprecated data_frame()
mydf %>%
slice(which(complete.cases(.)))
If you instead want to keep the selected columns and remove only the rows in which all of the data columns are NA, you can run:
mydf %>%
mutate(full_incomplete_cases=rowSums(is.na(.[-1]))) %>%
filter(full_incomplete_cases<length(mydf[,-1])) %>%
select(ID:Col3)
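The same all-NA filter can also be sketched in base R without a helper column. This is an illustrative alternative, not part of the original answer; it rebuilds the example data as a plain data frame and assumes the ID column comes first.

```r
# Same example data as in the answer, as a plain data frame
mydf <- data.frame(
  ID   = 1:8,
  Col1 = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  Col2 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909),
  Col3 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA)
)

# Count the non-missing values in each row, ignoring the ID column
n_present <- rowSums(!is.na(mydf[-1]))

# Keep rows that have at least one non-missing data value
kept <- mydf[n_present > 0, ]
print(kept)
```

Only the row with ID 4 is dropped, since it is the only row in which every data column is NA; partially missing rows such as ID 7 are kept.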

aggregate data frame with many columns according to one column [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
From a data frame of many columns, I would like to aggregate (i.e. sum) hundreds of columns by a single column, without specifying each of the column names.
Some sample data:
names <- floor(runif(20, 1, 5))
sample <- cbind(names)
for(i in 1:20){
col <- rnorm(20,2,4)
sample <- cbind(sample, col)
}
What I have so far is the following code, but it gives me an error saying that the arguments must have the same length.
aggregated <- aggregate.data.frame(sample[,c(2:20)], by = as.list(names), FUN = 'sum')
Original dataset is a lot bigger, so I can't specify the name of each of the columns to be aggregated and I can't use the list function.
You don't actually need to list them at all:
aggregate(. ~ names, sample, sum) # . represents all other columns
Of course base R is my favorite but in case someone wants dplyr:
library(dplyr)
data.frame(sample) %>%
group_by(names) %>%
summarise(across(everything(), sum))  # replaces the deprecated summarise_each(funs(sum))
Just alter your code slightly:
aggregated <- aggregate(sample[,c(2:20)], by = list(names), FUN = 'sum')

Resources