Aggregation of a character column: group by and paste [duplicate]

This question already has an answer here:
Concatenate unique strings after groupby in R
(1 answer)
Closed 1 year ago.
I have the following data frame:
col1<-c("A1","B1","A1","B1","C1","C1","A1")
col2<-c("a","b","c","d","b","f","a")
dat<-data.frame(col1,col2)
From this data frame I would like to get something like this:
A1 "ac"
B1 "bd"
C1 "bf"
That is, I need to aggregate col2 by pasting together its unique values within each code in col1.
I was trying something like this:
dat %>% group_by(col1) %>% summarise(pp = paste0(col2))
but it doesn't work.

Do this on the unique rows. Also, paste0() by itself does not join a vector into one string; it needs the additional argument collapse:
aggregate(col2~ col1, unique(dat), FUN = paste, collapse="")
Or, with dplyr and stringr:
library(dplyr)
library(stringr)
dat %>%
  distinct() %>%
  group_by(col1) %>%
  summarise(pp = str_c(col2, collapse = ""), .groups = 'drop')
Output:
# A tibble: 3 x 2
col1 pp
<chr> <chr>
1 A1 ac
2 B1 bd
3 C1 bf
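A base R sketch of the same idea (not from the original answers): tapply() applies a function to col2 within each level of col1, and calling unique() inside that function removes repeats per group, instead of deduplicating whole rows first.

```r
# Data from the question
col1 <- c("A1", "B1", "A1", "B1", "C1", "C1", "A1")
col2 <- c("a", "b", "c", "d", "b", "f", "a")
dat <- data.frame(col1, col2)

# For each col1 group, drop repeated col2 values, then collapse them into one string
res <- tapply(dat$col2, dat$col1, function(x) paste(unique(x), collapse = ""))
res  # A1 "ac", B1 "bd", C1 "bf"
```

The result is a named character vector (a 1-d array) rather than a data frame, which is often enough for a lookup.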


How can I find the unique combinations based on two columns? [duplicate]

This question already has answers here:
How can I remove all duplicates so that NONE are left in a data frame?
(3 answers)
Closed 1 year ago.
I need to find the unique entries in my data frame using the columns ID and Genus; the column Count should not be considered. My data frame is structured like this:
ID Genus Count
A Genus1 4
A Genus18 265
A Genus28 1
A Genus2 900
B Genus1 85
B Genus18 9
B Genus28 24
B Genus2 6
B Genus3000 152
The resulting data frame would contain only
ID Genus Count
B Genus3000 152
because this row is unique by ID and Genus.
I have the tidyverse loaded but have had trouble getting the result I need. I tried using distinct(), but I keep getting all of the input rows back as output.
I have tried the following:
uniquedata <- mydata %>% distinct(.keep_all = TRUE)
uniquedata <- mydata %>% group_by(ID, Genus) %>% distinct(.keep_all = TRUE)
uniquedata <- mydata %>% distinct(ID, Genus, .keep_all = TRUE)
uniquedata <- mydata %>% distinct()
What should I use to achieve my desired output?
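distinct() keeps one copy of each distinct row (or combination of the given columns), and every row here is already distinct, so nothing is dropped. Instead, count how often each Genus occurs and keep the rows with a count of 1. We could use add_count in combination with filter (the data frame is called df here rather than mydata):
library(dplyr)
df %>%
  add_count(Genus) %>%
  filter(n == 1) %>%
  select(ID, Genus, Count)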
Output:
ID Genus Count
<chr> <chr> <dbl>
1 B Genus3000 152
For the given data set, it is enough to check the column "Genus" for values appearing more than once and then to remove the corresponding rows from the data frame:
df %>% count(Genus) -> countGenus
filter(df, Genus %in% filter(countGenus,n==1)$Genus)
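For comparison, a base R sketch (not from the original answers) that keeps only the rows whose Genus occurs exactly once. duplicated() flags the second and later occurrences of a value; calling it again with fromLast = TRUE also flags the first occurrence, so negating the union leaves only values that appear a single time. The data frame is rebuilt from the table in the question and named df, as in the answers.

```r
df <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Genus = c("Genus1", "Genus18", "Genus28", "Genus2",
            "Genus1", "Genus18", "Genus28", "Genus2", "Genus3000"),
  Count = c(4, 265, 1, 900, 85, 9, 24, 6, 152)
)

# TRUE for every row whose Genus value appears more than once
repeated <- duplicated(df$Genus) | duplicated(df$Genus, fromLast = TRUE)

# Keep only the rows whose Genus is not repeated
unique_rows <- df[!repeated, ]
unique_rows
```

To test uniqueness of the ID and Genus pair instead of Genus alone, the same pattern can be applied to df[c("ID", "Genus")], since duplicated() also accepts a data frame.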

Unite the strings of columns into one string [duplicate]

This question already has answers here:
Collapse text by group in data frame [duplicate]
(2 answers)
Closed 1 year ago.
I want to unite the text of the columns into one string, but what I am trying is not working for me. My data:
df <- data.frame(A1 = c("class","type","class","type","class","class","class","class","class"),
B1 = c("b2","b3","b3","b1","b3","b3","b3","b2","b1"),
C1 = c(22,56,43,56,1,5,7,8,NA),
C1T=c(NA, "Part of other business", NA, NA, NA, NA, "temprorary", NA, NA))
The output should look like:
I am not sure whether you want to do this for all the columns or only the C columns; for all the columns you could do:
> sapply(df, function(x) {paste0(na.omit(x), collapse = ",")})
A1
"class,type,class,type,class,class,class,class,class"
B1
"b2,b3,b3,b1,b3,b3,b3,b2,b1"
C1
"22,56,43,56,1,5,7,8"
C1T
"Part of other business,temprorary"
You could try this code. It removes all rows that contain an NA, then, after grouping, uses summarise() with toString():
library(dplyr)
library(tidyr)
df %>%
  drop_na() %>%
  group_by(B1) %>%
  summarise(Texts = toString(C1T)) %>%
  select(-B1)
# A tibble: 1 x 1
Texts
<chr>
1 Part of other business, temprorary
If you want to do this dynamically, or inside a function, refer to the columns through the .data pronoun:
library(dplyr)
var1 <- "C1T"
group_byy <- "B1"
df %>%
  group_by(.data[[group_byy]]) %>%
  summarise(Texts = toString(na.omit(.data[[var1]]))) %>%
  filter(Texts != '')
# B1 Texts
# <chr> <chr>
#1 b3 Part of other business, temprorary
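A base R alternative, offered as a sketch (not from the original answers): the formula interface of aggregate() drops rows that have NA in the variables it uses (its default na.action is na.omit), so the NA entries of C1T never reach toString().

```r
df <- data.frame(A1 = c("class","type","class","type","class","class","class","class","class"),
                 B1 = c("b2","b3","b3","b1","b3","b3","b3","b2","b1"),
                 C1 = c(22,56,43,56,1,5,7,8,NA),
                 C1T = c(NA, "Part of other business", NA, NA, NA, NA, "temprorary", NA, NA))

# Only B1 and C1T enter the model frame; rows where C1T is NA are removed
# before grouping, so groups with no text at all (b1, b2) do not appear
res <- aggregate(C1T ~ B1, data = df, FUN = toString)
res
```

Because only the variables named in the formula are checked, the NA in C1 does not remove any rows here.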

In R: is there an elegant way to split data.frame row by "," and add to existing rows matching the splitted strings? [duplicate]

This question already has answers here:
How can I calculate the sum of comma separated in the 2nd column
(3 answers)
Closed 2 years ago.
I have a data frame:
df <- data.frame(sample_names=c("foo","bar","foo, bar"), sample_values=c(1,5,3))
df
sample_names sample_values
1 foo 1
2 bar 5
3 foo, bar 3
and I want a resulting data.frame of the following shape:
sample_names sample_values
1 foo 4
2 bar 8
Is there an elegant way to achieve this? My workaround would be to grep for "," and then fiddly add the result to the existing rows. Since I want to apply this to multiple data frames, I'd like to come up with an easier solution. Any ideas?
We can use separate_rows() to split the column, then do a group-by operation to get the sum:
library(dplyr)
library(tidyr)
df %>%
  separate_rows(sample_names) %>%
  group_by(sample_names) %>%
  summarise(sample_values = sum(sample_values), .groups = 'drop')
Output:
# A tibble: 2 x 2
# sample_names sample_values
# <chr> <dbl>
#1 bar 8
#2 foo 4
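Or, with base R, split the column with strsplit() into a list of vectors, then use tapply() to do a grouped sum:
# as.character() guards against sample_names being a factor (strsplit() needs character input)
lst1 <- strsplit(as.character(df$sample_names), ",\\s+")
tapply(rep(df$sample_values, lengths(lst1)), unlist(lst1), FUN = sum)
# bar foo
#   8   4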
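If the result should be a data frame of the requested shape rather than a named vector, one option (a sketch, not from the original answers) is to expand the split names into a long data frame and then sum with aggregate():

```r
df <- data.frame(sample_names = c("foo", "bar", "foo, bar"),
                 sample_values = c(1, 5, 3))

# Split each entry on a comma followed by optional whitespace
parts <- strsplit(as.character(df$sample_names), ",\\s*")

# One row per name, repeating each value once per name it was split into
long <- data.frame(sample_names  = unlist(parts),
                   sample_values = rep(df$sample_values, lengths(parts)))

# Grouped sum; aggregate() returns the groups in sorted order
res <- aggregate(sample_values ~ sample_names, data = long, FUN = sum)
res
```

The same three steps work unchanged on any data frame with these two columns, which fits the goal of applying it to several data frames.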

filter for rows that meet multiple conditions in a column [duplicate]

This question already has answers here:
How to extract all the rows if a level in one column contains all the levels of another column in R?
(3 answers)
Closed 3 years ago.
I am trying to subset or filter for rows where an ID is associated with two values in the same column (there is a row for each "ID" and its associated condition "DIR").
I was not able to figure this out with dplyr's filter() or with the subset() function.
x <- data.frame("ID"=c(1,2,2,3,3,3,4,4,4,4),
"DIR"=c("up","up","down","up","up","up","down","down","down","down"))
I have attempted variations on both:
subset(x, DIR=="up" & DIR=="down")
x %>% group_by(ID) %>% filter(DIR=="up" & DIR=="down")
What I would like is for only the two rows for ID 2 to remain, given that it is the only ID that has both "up" and "down" in the DIR column.
Both attempts return no results.
The condition DIR == "up" & DIR == "down" is evaluated row by row, and no single value can equal both, so it is never TRUE. Instead, after grouping by 'ID', filter by checking whether all the elements of the vector c("up", "down") are %in% the column 'DIR':
library(dplyr)
x %>%
  group_by(ID) %>%
  filter(all(c("up", "down") %in% DIR))
# A tibble: 2 x 2
# Groups: ID [1]
# ID DIR
# <dbl> <fct>
#1 2 up
#2 2 down
Or, using base R:
# ave() returns a vector of the same type as its input (character here), hence as.logical()
i1 <- with(x, as.logical(ave(as.character(DIR), ID,
  FUN = function(x) all(c("up", "down") %in% x))))
x[i1, ]
# ID DIR
#2 2 up
#3 2 down
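Another base R option, shown as a sketch (not from the original answers): collect the IDs that have an "up" row and the IDs that have a "down" row, intersect the two sets, and keep every row whose ID is in the intersection.

```r
x <- data.frame(ID  = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
                DIR = c("up", "up", "down", "up", "up", "up",
                        "down", "down", "down", "down"))

# IDs that occur with both values of DIR
both_ids <- intersect(x$ID[x$DIR == "up"], x$ID[x$DIR == "down"])

# Keep all rows belonging to those IDs
res <- x[x$ID %in% both_ids, ]
res
```

This avoids a per-group function call, at the cost of writing one comparison per required value.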

use dplyr to concatenate a column [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 5 years ago.
I have a data_frame in which I would like a new column, vector, to be the concatenation of the elements of A within each id. So
df <- data_frame(id = c(1, 1, 2, 2), A = c("a", "b", "b", "c"))
df
Source: local data frame [4 x 2]
id A
1 1 a
2 1 b
3 2 b
4 2 c
Should become
newdf
Source: local data frame [2 x 2]
id vector
1 1 "a b"
2 2 "b c"
My first inclination is to use paste() inside summarise(), but this doesn't work:
df %>% group_by(id) %>% summarise(paste(A))
Error: expecting a single value
Hadley and Romain talk about a similar issue in the GitHub issues, but I can't quite see how that applies directly. It seems like there should be a very simple solution, especially because paste() usually does return a single value.
You need to collapse the values with the collapse argument of paste():
df %>% group_by(id) %>% summarise(vector=paste(A, collapse=" "))
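To see why collapse is needed: called on a single vector, paste() returns one string per element, which is why summarise() complained that it was expecting a single value; collapse joins the elements into one string. The sketch below shows this in base R, using data.frame() in place of data_frame(), and ends with a base R grouped equivalent.

```r
df <- data.frame(id = c(1, 1, 2, 2), A = c("a", "b", "b", "c"))

# Without collapse: one output string per input element
length(paste(df$A))          # 4

# With collapse: all elements joined into a single string
paste(df$A, collapse = " ")  # "a b b c"

# Per-group version in base R: collapse the values of A within each id
res <- aggregate(A ~ id, data = df, FUN = paste, collapse = " ")
res
```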
My data frame looked like this:
col1 col2
1 one
1 one more
2 two
2 two
3 three
I needed to summarise it as follows:
col1 col3
1 one, one more
2 two
3 three
The following code did the trick:
library(dplyr)
df <- data.frame(col1 = c(1,1,2,2,3), col2 = c("one", "one more", "two", "two", "three"))
df %>%
  group_by(col1) %>%
  summarise(col3 = toString(unique(col2)))
