use dplyr to concatenate a column [duplicate] - r

I have a data_frame and would like a new column, vector, that holds the concatenation of the elements of A within each id. So
df <- data_frame(id = c(1, 1, 2, 2), A = c("a", "b", "b", "c"))
df
Source: local data frame [4 x 2]
id A
1 1 a
2 1 b
3 2 b
4 2 c
Should become
newdf
Source: local data frame [2 x 2]
id vector
1 1 "a b"
2 2 "b c"
My first inclination is to use paste() inside summarise but this doesn't work.
df %>% group_by(id) %>% summarise(paste(A))
Error: expecting a single value
Hadley and Romain talk about a similar issue in the GitHub issues, but I can't quite see how that applies directly. It seems like there should be a very simple solution, especially because paste() usually does return a single value.

You need to set the collapse argument of paste() so that each group is reduced to a single string:
df %>% group_by(id) %>% summarise(vector=paste(A, collapse=" "))
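To see why the original attempt fails, it may help to compare the two calls outside dplyr. This is a minimal base R sketch; the vector A below simply holds the values of the first group (id == 1) from the question.

```r
# Values of A for id == 1 in the question's data
A <- c("a", "b")

# Without collapse, paste() is vectorised: one output string per input element
no_collapse <- paste(A)
length(no_collapse)   # 2, so there is no single value per group

# With collapse, the elements are joined into one string
collapsed <- paste(A, collapse = " ")
length(collapsed)     # 1
collapsed             # "a b"
```

Because summarise() expects one value per group here, only the collapsed form fits.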

My data frame was as:
col1 col2
1 one
1 one more
2 two
2 two
3 three
I needed to summarise it as follows:
col1 col3
1 one, one more
2 two
3 three
The following code did the trick:
library(dplyr)
df <- data.frame(col1 = c(1, 1, 2, 2, 3), col2 = c("one", "one more", "two", "two", "three"))
df %>%
group_by(col1) %>%
summarise( col3 = toString(unique(col2)))
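As a small illustration of what toString(unique(...)) does to a single group, here is a base R sketch. The vectors are the col2 values for col1 == 1 and col1 == 2 from the example above; the variable names are only for illustration.

```r
# col2 values for col1 == 1 and col1 == 2
group1 <- c("one", "one more")
group2 <- c("two", "two")

# toString() joins the elements with ", "
joined1 <- toString(unique(group1))   # "one, one more"

# unique() drops the repeated "two" before joining
joined2 <- toString(unique(group2))   # "two"

# For a character vector, toString(x) gives the same result as
# paste(x, collapse = ", ")
same <- identical(toString(group1), paste(group1, collapse = ", "))
```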

Related

Aggregation of character column: group by and paste [duplicate]

I have the following data frame:
col1<-c("A1","B1","A1","B1","C1","C1","A1")
col2<-c("a","b","c","d","b","f","a")
dat<-data.frame(col1,col2)
From the previous data frame I would like to get something like this:
A1 "ac"
B1 "bd"
C1 "bf"
That is, I need to paste together the unique values of col2 within each group of col1.
I was trying something like this
dat%>%group_by(col1)%>%summarise(pp=paste0(col2))
but it doesn't work.
Do this on the unique rows. Also, paste0 by itself doesn't work; it needs the additional argument collapse.
aggregate(col2~ col1, unique(dat), FUN = paste, collapse="")
library(dplyr)
library(stringr)
dat %>%
distinct %>%
group_by(col1) %>%
summarise(pp = str_c(col2, collapse=""), .groups = 'drop')
Output:
# A tibble: 3 x 2
col1 pp
<chr> <chr>
1 A1 ac
2 B1 bd
3 C1 bf
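The answers above remove duplicate rows first and then collapse. An alternative sketch, in base R only, removes duplicates inside each group instead. It assumes the same dat as in the question; with just these two columns the result should match.

```r
col1 <- c("A1", "B1", "A1", "B1", "C1", "C1", "A1")
col2 <- c("a", "b", "c", "d", "b", "f", "a")
dat <- data.frame(col1, col2)

# For each level of col1, keep the unique col2 values and join them
res <- tapply(dat$col2, dat$col1,
              function(v) paste(unique(v), collapse = ""))

res[["A1"]]   # "ac"
```

tapply() returns a named array rather than a data frame, so this is mainly useful for a quick check of the grouping logic.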

How to efficiently transpose data frames with the tidyverse or data.table? [duplicate]

I have several files ending in .var and I want to combine them.
For this, I used the purrr package:
library(purrr)
library(readr)
filelist = list.files(pattern = "\\.var$") # pattern is a regular expression, so the dot is escaped
df = filelist %>%
set_names() %>%
map_dfr(
~ read_csv(.x, col_types = cols(), col_names = FALSE),
.id = "file_name"
)
That seems to give me the desired output
# A tibble: 6 x 3
file_name X1 X2
<chr> <chr> <chr>
1 CV.var Chrom_3_793 T
2 CV.var Chrom_3_4061 G
3 CV.var Chrom_3_4034 G
4 CV.var Chrom_3_4035 A
5 GK.var Chrom_3_4061 T
6 CV.var Chrom_3_4064 T
But now I would like to transform this table into a table of boolean values.
Basically, I want the values of the first column, file_name (there are 4 in total), to become columns.
The first 2 columns would then be X1 and X2,
so that I could tell whether, for example,
Chrom_3_4061 T is in 1, 2, 3, or 4 of my sets:
CV.var GK.var DP.var SK.var
Chrom_3_4061 G 1 0 1 1
This should be a matter of transposing plus some cutting and pasting. What is the most efficient way of doing it? I feel a bit lost with the different packages and approaches.
Thanks a lot.
You could use pivot_wider:
library(dplyr) # for mutate() and %>%
library(tidyr)
df %>%
mutate(value = TRUE) %>%
pivot_wider(names_from = file_name, values_fill = FALSE)
I filled it with booleans instead of 0 and 1.
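If the 0/1 layout from the question is preferred over booleans, the same idea can be sketched as follows. The toy df is copied from the printed tibble in the question; the distinct() step is an added assumption, a guard in case a file lists the same variant twice. Passing a scalar to values_fill assumes a reasonably recent tidyr.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question's printed tibble
df <- tibble(
  file_name = c("CV.var", "CV.var", "CV.var", "CV.var", "GK.var", "CV.var"),
  X1 = c("Chrom_3_793", "Chrom_3_4061", "Chrom_3_4034",
         "Chrom_3_4035", "Chrom_3_4061", "Chrom_3_4064"),
  X2 = c("T", "G", "G", "A", "T", "T")
)

wide <- df %>%
  distinct() %>%                       # guard against repeated rows within a file
  mutate(value = 1L) %>%               # 1 marks "present in this file"
  pivot_wider(names_from = file_name,
              values_from = value,
              values_fill = 0L)        # 0 marks "absent"

wide
```

Each distinct X1/X2 pair becomes one row, with one indicator column per file.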

Move any row to bottom of dataframe based on row index in pipe

How do you move any row to the bottom of a dataframe based on the row index using a dplyr pipe?
This can be done in two lines. First line duplicates the desired row to the end of the dataframe using bind_rows() and slice(). The second line removes the first instance of the now duplicated row using slice().
Here is an example moving the second row to the end:
library(dplyr)
df = tibble(x = letters[1:4], y = 1:4)
df %>%
bind_rows(slice(., 2)) %>%
slice(-2)
Which returns:
# A tibble: 4 x 2
x y
<chr> <int>
1 a 1
2 c 3
3 d 4
4 b 2
I am aware that the question specifically states "based on row index", however, I found this thread in the context of trying to figure out how to do this based on a character match and I found no guidance. Hence I figured out something myself (with the help of Oliver Oliver's post here - Thank you!), which I would like to share.
library(dplyr)
df = tibble(x = letters[1:4], y = 1:4)
df %>%
bind_rows(slice(., which(x != "b")), slice(., which(x == "b"))) %>%
tail(nrow(.)/2)
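A possibly simpler variant of the character-match approach is to build the row order once and pass it to a single slice() call. This is only a sketch on the same toy df; it relies on slice() evaluating its arguments with the data frame's columns in scope.

```r
library(dplyr)

df <- tibble(x = letters[1:4], y = 1:4)

# Non-matching rows first (in their original order), matching rows last
moved <- df %>%
  slice(c(which(x != "b"), which(x == "b")))

moved$x   # "a" "c" "d" "b"
```

This avoids duplicating the rows and trimming them afterwards.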

Using set_names vs. mutate(colnames) when changing data frame column names to lower case

A quick question that I was looking to understand better.
Data:
df1 <- data.frame(COLUMN_1 = letters[1:3], COLUMN_2 = 1:3)
> df1
COLUMN_1 COLUMN_2
1 a 1
2 b 2
3 c 3
Why does this work in setting data frame names to lower case:
df2 <- df1 %>%
set_names(., tolower(names(.)))
> df2
column_1 column_2
1 a 1
2 b 2
3 c 3
But this does not?
df2 <- df1 %>%
mutate( colnames(.) <- tolower(colnames(.)) )
Error: Column `colnames(.) <- tolower(colnames(.))` must be length 3 (the number of rows) or one, not 2
The solution, shown first in pipe form and then with the arguments written out explicitly, is:
df1 %>% rename_all(tolower)
# which is equivalent to:
rename_all(.tbl = df1, .funs = tolower)
mutate operates on the data itself, not on the column names, which is why rename is used here. rename_all saves you from typing out every column by hand, as in rename(column_1 = COLUMN_1, column_2 = COLUMN_2, ...).
What you suggested, df2 <- df1 %>% rename_all(tolower(.)), doesn't work because it feeds the whole of df1 into the tolower function, which is not what you want.
Another solution, in base R, is: names(df1) <- tolower(names(df1))
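In dplyr 1.0.0 and later, rename_all() is marked as superseded in favour of rename_with(). A minimal sketch of the same renaming, assuming such a dplyr version is installed:

```r
library(dplyr)

df1 <- data.frame(COLUMN_1 = letters[1:3], COLUMN_2 = 1:3)

# rename_with() applies a function to the column names, not to the data
df2 <- df1 %>%
  rename_with(tolower)

names(df2)   # "column_1" "column_2"
```

The function is passed unevaluated (tolower, not tolower(.)), so it is applied to the names only.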

Paste string values from different rows, if values from another column are the same [duplicate]

I would like to create a new data frame based on an existing one. As the title suggests, I would like to paste together all string values in one column for rows that share the same value in another column.
I may not be explaining this very clearly, so I've created an example.
Existing Dataframe
If I have something like this:
DF <- data.frame(
ID = c(1,2,2,3,3,3,4,4,4,4),
value = c("I","ate","cereals","for","breakfast","it","was","delicious","!!!",":)"))
New Dataframe
I would like to create something like this:
DF2 <- data.frame(
ID = c(1,2,3,4),
value = c(paste("I"), paste("ate","cereals"), paste("for","breakfast","it"), paste("was","delicious","!!!",":)")))
All strings from column value are consolidated using paste when they have the same value in column ID. I'm having trouble building a function that can do this. Could you please help me?
I am comfortable with either dplyr or data.table.
In dplyr you can use group_by with summarise
DF %>%
group_by(ID) %>%
summarise(value = paste(value, collapse = " "))
## A tibble: 4 x 2
# ID value
# <dbl> <chr>
#1 1. I
#2 2. ate cereals
#3 3. for breakfast it
#4 4. was delicious !!! :)
You can just group_by(ID) and summarise with a concatenation function. Here I use str_c with the collapse argument.
library(tidyverse)
DF <- data.frame(
ID = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
value = c("I", "ate", "cereals", "for", "breakfast", "it", "was", "delicious", "!!!", ":)")
)
DF %>%
group_by(ID) %>%
summarise(value = str_c(value, collapse = " "))
#> # A tibble: 4 x 2
#> ID value
#> <dbl> <chr>
#> 1 1 I
#> 2 2 ate cereals
#> 3 3 for breakfast it
#> 4 4 was delicious !!! :)
Created on 2018-08-26 by the reprex package (v0.2.0).
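Since the question also mentions data.table, the same grouping can be sketched there. This assumes the data.table package is installed; DT and res are illustrative names, not from the original answers.

```r
library(data.table)

DF <- data.frame(
  ID = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  value = c("I", "ate", "cereals", "for", "breakfast", "it",
            "was", "delicious", "!!!", ":)")
)

DT <- as.data.table(DF)

# Within each ID, collapse the strings into one space-separated value
res <- DT[, .(value = paste(value, collapse = " ")), by = ID]

res
```

The by = ID argument plays the role of group_by(ID), and the expression in j plays the role of summarise().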
