I have a data frame as follows:
df <- data.frame(Name = c("a","c","d","b","f","g","h"), group = c(2,1,2,3,1,3,1))
Name group
a 2
c 1
d 2
b 3
f 1
g 3
h 1
I would like to use the gather function from the tidyverse package to reshape my data frame into the following format:
group Name total
1 c,f,h 3
2 a,d 2
3 b,g 2
Do you know how I can do this?
Thanks,
We can group by 'group', paste the elements of 'Name' together with toString (which separates them with a comma and a space), and get the number of elements in each group with n():
library(dplyr)
df %>%
group_by(group) %>%
summarise(Name = toString(Name), total = n())
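If the output has to match the requested format exactly (no space after the comma), a minimal sketch is to swap toString for paste with collapse = ",". The variable name res is only illustrative, and stringsAsFactors = FALSE is added so the example behaves the same on older R versions.

```r
library(dplyr)

# Data from the question
df <- data.frame(Name = c("a", "c", "d", "b", "f", "g", "h"),
                 group = c(2, 1, 2, 3, 1, 3, 1),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(group) %>%
  # paste(..., collapse = ",") joins the names without a space
  summarise(Name = paste(Name, collapse = ","), total = n())

res
# group Name  total
#     1 c,f,h     3
#     2 a,d       2
#     3 b,g       2
```

Note that gather is not needed here: it reshapes columns into rows, whereas this task is a per-group aggregation.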
I have a data frame:
df <- data.frame(sample_names=c("foo","bar","foo, bar"), sample_values=c(1,5,3))
df
sample_names sample_values
1 foo 1
2 bar 5
3 foo, bar 3
and I want a resulting data.frame of the following shape:
sample_names sample_values
1 foo 4
2 bar 8
Is there an elegant way to achieve this? My workaround would be to grep for "," and somehow add the result to the existing rows, which is fiddly. Since I want to apply this to multiple data frames, I'd like to come up with an easier solution. Any ideas?
We can use separate_rows to split the column into one row per name, then group by name to get the sum:
library(dplyr)
library(tidyr)
df %>%
separate_rows(sample_names) %>%
group_by(sample_names) %>%
summarise(sample_values = sum(sample_values), .groups = 'drop')
-output
# A tibble: 2 x 2
# sample_names sample_values
# <chr> <dbl>
#1 bar 8
#2 foo 4
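Or with base R: split the column with strsplit into a list of vectors, then use tapply to do a grouped sum. The as.character call guards against the column being a factor (the default in R versions before 4.0), and ",\\s*" also matches a comma with no space after it.
lst1 <- strsplit(as.character(df$sample_names), ",\\s*")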
tapply(rep(df$sample_values, lengths(lst1)), unlist(lst1), FUN = sum)
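tapply returns a named vector rather than a data frame. If a data frame in the shape shown in the question is needed, a sketch using aggregate could look like the following; the names parts, long and res are illustrative only.

```r
# Data from the question
df <- data.frame(sample_names = c("foo", "bar", "foo, bar"),
                 sample_values = c(1, 5, 3),
                 stringsAsFactors = FALSE)

# One vector of names per original row
parts <- strsplit(df$sample_names, ",\\s*")

# Long format: one row per (name, value) pair
long <- data.frame(sample_names = unlist(parts),
                   sample_values = rep(df$sample_values, lengths(parts)),
                   stringsAsFactors = FALSE)

# Sum per name; aggregate sorts the groups alphabetically
res <- aggregate(sample_values ~ sample_names, data = long, FUN = sum)

# Restore the order of first appearance (foo, then bar)
res <- res[match(unique(long$sample_names), res$sample_names), ]
rownames(res) <- NULL
res
#   sample_names sample_values
# 1          foo             4
# 2          bar             8
```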
Here is a data frame:
df <- data.frame(letter = rep(c("a","b","c","d"), each = 4), number = c(2,1,5,3,9,4,2,4,3,11,1,2,1,1,5,6))
I know how to remove rows based on an observation:
rmv <- with(df, number > 8) # finds observations greater than 8
new.df<- df[!rmv, ] # removes observations
However, I want to remove all inputs for each letter group (i.e., all the 'b' and 'c' inputs) if there are any observations greater than 8. Ideal output would be:
letter number
1 a 2
2 a 1
3 a 5
4 a 3
13 d 1
14 d 1
15 d 5
16 d 6
How would I accomplish this?
After grouping by 'letter', we can use any and negate it (!) so that only groups with no value above 8 are kept:
library(dplyr)
df %>%
group_by(letter) %>%
filter(!any(number > 8))
Or do the reverse with all
df %>%
group_by(letter) %>%
filter(all(number <= 8))
In base R, this can be done with ave
df[with(df, ave(number <= 8, letter, FUN = all)),]
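Another base R way to express the same idea, shown here as a sketch, is to first collect the letters that have at least one value above 8 and then drop every row belonging to those letters. The names bad and res are illustrative.

```r
# Data from the question
df <- data.frame(letter = rep(c("a", "b", "c", "d"), each = 4),
                 number = c(2, 1, 5, 3, 9, 4, 2, 4, 3, 11, 1, 2, 1, 1, 5, 6),
                 stringsAsFactors = FALSE)

# Letters with at least one observation greater than 8
bad <- unique(df$letter[df$number > 8])

# Keep only the rows whose letter is not in that set
res <- df[!df$letter %in% bad, ]
res
#    letter number
# 1       a      2
# 2       a      1
# 3       a      5
# 4       a      3
# 13      d      1
# 14      d      1
# 15      d      5
# 16      d      6
```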
Why does this piece of code:
letter = c("A","B","C","C","A")
product = c("Beef","Chicken","Beef","Beef","Beef")
value = c(10,20,40,10,5)
df <- data.frame(letter,product,value)
df1 = df %>% group_by(letter,product) %>% summarise(value = sum(value))
Results in this:
> df1
value
1 85
Rather than this?
> df
letter product value
1 A Beef 15
2 B Chicken 20
3 C Beef 50
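A likely explanation, assuming plyr was attached after dplyr in that session (the question does not show which packages were loaded), is a name collision: plyr::summarise then masks dplyr::summarise, and the plyr version ignores the grouping and collapses everything into a single row. A minimal sketch of a workaround is to call the dplyr function with an explicit namespace prefix; the .groups = "drop" argument requires dplyr 1.0 or later.

```r
library(dplyr)

# Data from the question
letter  <- c("A", "B", "C", "C", "A")
product <- c("Beef", "Chicken", "Beef", "Beef", "Beef")
value   <- c(10, 20, 40, 10, 5)
df <- data.frame(letter, product, value, stringsAsFactors = FALSE)

df1 <- df %>%
  group_by(letter, product) %>%
  # dplyr:: makes sure the grouping-aware summarise is used,
  # even if another package masks the name
  dplyr::summarise(value = sum(value), .groups = "drop")

df1
# letter product value
# A      Beef       15
# B      Chicken    20
# C      Beef       50
```

Loading plyr before dplyr, or not loading plyr at all, avoids the masking in the first place.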
Let's say that I have a simple data frame in R, as follows:
#example data frame
a = c("red","red","green")
b = c("01/01/1900","01/02/1950","01/05/1990")
df = data.frame(a,b)
colnames(df)<-c("Color","Dates")
My goal is to count the number of dates (as a class - not individually) for each variable in the "Color" column. So, the result would look like this:
#output should look like this:
a = c("red","green")
b = c("2","1")
df = data.frame(a,b)
colnames(df)<-c("Color","Dates")
Red was associated with two dates -- the dates themselves are unimportant, I'd just like to count the aggregate number of dates per color in the data frame.
In base R:
sapply(split(df, df$Color), nrow)
# green red
# 1 2
We can use data.table
library(data.table)
setDT(df)[, .(Dates = uniqueN(Dates)) , Color]
# Color Dates
#1: red 2
#2: green 1
Using the dplyr package from the tidyverse:
library(dplyr)
df %>% group_by(Color) %>% summarise(n())
# # A tibble: 2 × 2
# Color `n()`
# <fctr> <int>
# 1 green 1
# 2 red 2
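The answers above differ in one detail: nrow and n() count rows per color, while uniqueN counts distinct dates per color. The two agree for the example data but would differ if a color had the same date twice. A base R sketch of the distinct-count version, which also returns a data frame with the column names from the question, might look like this (res is an illustrative name):

```r
# Data from the question
df <- data.frame(Color = c("red", "red", "green"),
                 Dates = c("01/01/1900", "01/02/1950", "01/05/1990"),
                 stringsAsFactors = FALSE)

# Count distinct dates per color; groups come back sorted alphabetically
res <- aggregate(Dates ~ Color, data = df,
                 FUN = function(x) length(unique(x)))
res
#   Color Dates
# 1 green     1
# 2   red     2
```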
Say I have a data table like this:
id days age
"jdkl" 8 23
"aowl" 1 09
"mnoap" 4 82
"jdkl" 3 14
"jdkl" 2 34
"mnoap" 27 56
I want to create a new data table that has one column with the ids and one column with the number of times each id appears. I know that data.table has something with .N, but I wasn't sure how to use it for only one column.
The final data table would look like this:
id count
"jdkl" 3
"mnoap" 2
"aowl" 1
You can just use table from base R:
as.data.frame(sort(table(df$id), decreasing = TRUE))
However, if you want to do it using data.table:
library(data.table)
setDT(df)[, .(Count = .N), by = id][order(-Count)]
Or there is the dplyr solution:
library(dplyr)
df %>% count(id) %>% arrange(desc(n))
We can use dplyr:
library(dplyr)
df %>%
group_by(id) %>%
summarise(Count = n()) %>%
arrange(desc(Count))
Or using aggregate from base R
r1 <- aggregate(cbind(Count = days) ~ id, df, length)
r1[order(-r1$Count),]
# id Count
#2 jdkl 3
#3 mnoap 2
#1 aowl 1
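The snippets above assume that the table from the question already exists as df. For a reproducible check, here is a sketch that builds that data and counts the ids with base R; the name counts and the use of setNames to label the columns are illustrative choices, not part of the original answers.

```r
# Data from the question (age "09" entered as the number 9)
df <- data.frame(id   = c("jdkl", "aowl", "mnoap", "jdkl", "jdkl", "mnoap"),
                 days = c(8, 1, 4, 3, 2, 27),
                 age  = c(23, 9, 82, 14, 34, 56),
                 stringsAsFactors = FALSE)

# Tabulate the ids, sort by frequency, and give the columns readable names
counts <- setNames(
  as.data.frame(sort(table(df$id), decreasing = TRUE)),
  c("id", "count")
)
counts
#      id count
# 1  jdkl     3
# 2 mnoap     2
# 3  aowl     1
```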