This question already has answers here:
Collapse text by group in data frame [duplicate]
(2 answers)
Closed 1 year ago.
I want to combine the text values of a column into one string. I am trying the following, but it is not working for me:
df <- data.frame(A1 = c("class","type","class","type","class","class","class","class","class"),
B1 = c("b2","b3","b3","b1","b3","b3","b3","b2","b1"),
C1 = c(22,56,43,56,1,5,7,8,NA),
C1T=c(NA, "Part of other business", NA, NA, NA, NA, "temprorary", NA, NA))
The output should look like this:
I am not sure whether you want to do this for all the columns or only for C1T. For all the columns you could do:
> sapply(df,function(x){paste0(na.omit(x),collapse=",")})
A1
"class,type,class,type,class,class,class,class,class"
B1
"b2,b3,b3,b1,b3,b3,b3,b2,b1"
C1T
"Part of other business,temprorary"
You could try this code. It drops every row that contains an NA, then groups by B1 and uses summarise() with toString():
library(dplyr)
library(tidyr)
df %>%
drop_na() %>%
group_by(B1 )%>%
summarise(Texts = toString(C1T)) %>%
select(-B1)
# A tibble: 1 x 1
Texts
<chr>
1 Part of other business, temprorary
If you want to do this dynamically or in a function:
library(dplyr)
var1 <- "C1T"
group_byy <- "B1"
df %>%
group_by(.data[[group_byy]]) %>%
summarise(Texts = toString(na.omit(.data[[var1]]))) %>%
filter(Texts != '')
# B1 Texts
# <chr> <chr>
#1 b3 Part of other business, temprorary
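To use this "in a function", the dynamic version above can be wrapped as follows. This is only a sketch: the helper name collapse_by_group and its argument names are made up for illustration and are not part of dplyr. The data frame from the question is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Hypothetical helper (name and arguments are assumptions, not a dplyr API):
# collapse the non-NA values of `text_col` within each level of `group_col`.
# Both columns are passed as strings.
collapse_by_group <- function(data, group_col, text_col) {
  data %>%
    group_by(.data[[group_col]]) %>%
    summarise(Texts = toString(na.omit(.data[[text_col]]))) %>%
    # Groups with only NA values collapse to "", so drop them
    filter(Texts != "")
}

df <- data.frame(
  A1  = c("class", "type", "class", "type", "class", "class", "class", "class", "class"),
  B1  = c("b2", "b3", "b3", "b1", "b3", "b3", "b3", "b2", "b1"),
  C1  = c(22, 56, 43, 56, 1, 5, 7, 8, NA),
  C1T = c(NA, "Part of other business", NA, NA, NA, NA, "temprorary", NA, NA)
)

res <- collapse_by_group(df, "B1", "C1T")
res
```

With the sample data this should leave a single row for b3, matching the output shown above.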
This question already has an answer here:
Concatenate unique strings after groupby in R
(1 answer)
Closed 1 year ago.
I have the following data frame:
col1<-c("A1","B1","A1","B1","C1","C1","A1")
col2<-c("a","b","c","d","b","f","a")
dat<-data.frame(col1,col2)
From the previous data frame I would like to get something like this:
A1 "ac"
B1 "bd"
C1 "bf"
That is, I need to aggregate by pasting together the unique values in col2, grouped by the codes in col1.
I was trying something like this:
dat%>%group_by(col1)%>%summarise(pp=paste0(col2))
but it doesn't work.
Do this on the unique rows. Also, paste0 by itself doesn't collapse anything; it needs the additional collapse argument.
aggregate(col2~ col1, unique(dat), FUN = paste, collapse="")
library(dplyr)
library(stringr)
dat %>%
distinct %>%
group_by(col1) %>%
summarise(pp = str_c(col2, collapse=""), .groups = 'drop')
-output
# A tibble: 3 x 2
col1 pp
<chr> <chr>
1 A1 ac
2 B1 bd
3 C1 bf
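As a variation on the answer above, the deduplication can also happen inside summarise() instead of with distinct() beforehand. A small sketch, assuming the dat from the question (rebuilt here so it runs on its own):

```r
library(dplyr)

dat <- data.frame(
  col1 = c("A1", "B1", "A1", "B1", "C1", "C1", "A1"),
  col2 = c("a", "b", "c", "d", "b", "f", "a")
)

res <- dat %>%
  group_by(col1) %>%
  # unique() drops repeated values within each group before collapsing
  summarise(pp = paste(unique(col2), collapse = ""), .groups = "drop")
res
```

The difference matters only when dat has other columns: distinct() compares whole rows, while unique(col2) looks at col2 alone within each group.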
I am having some issues trying to sum a bunch of columns in R. I am analyzing a huge dataset, so I am reproducing a sample of fake data.
Here's what the data looks like (I have 800 columns).
library(data.table)
dataset <- data.table(name = c("A", "B", "C", "D"), a1 = 1:4, a2 = c(1,2,NaN,5), a3 = 1:4, a4 = 1:4, a5 = c(1,2,NA,5), a6 = 1:4, a8 = 1:4)
dataset
What I want to do is sum the columns in buckets of 100 columns: for example, all the values in the first row between column 1 and column 100, all the values in the first row between column 1 and column 200, all the values in the second row between column 1 and column 100, etc.
Using the sample data I've come with this solution using rowSums.
dataset %>%
mutate_if(~!is.numeric(.x), as.numeric) %>%
mutate_all(funs(replace_na(., 0))) %>%
mutate(sum = rowSums(.[,paste("a", 1:3, sep="")])) %>%
mutate(sum1 = rowSums(.[,paste("a", 4:5, sep="")])) %>%
mutate(sum2 = rowSums(.[,paste("a", 6:8, sep="")]))
but I am getting the following error:
Error in `[.data.frame`(., , paste("a", 6:8, sep = "")) : undefined columns selected
as the data does not include column a7.
The original data is missing a bunch of columns between a1 and a800 so solving this would be key to make it work.
What would be the best way to approach and solve this error?
Also, I have a few more questions regarding the code I've written:
Is there a smarter way to select the columns a1 to a100 than .[,paste("a", 1:3, sep="")]? I want to select the columns by name, not by position, because a100 is not always the 100th column.
I am converting the NAs and NaNs to 0 so that I can sum the rows. I do it with mutate_all(funs(replace_na(., 0))), but that also wipes out my first column, which contains the names. What would be the best way to replace NA and NaN without turning the string values of the first column into 0?
The columns I am adding are integers, as I converted them beforehand with mutate_if(~!is.numeric(.x), as.numeric). Should I follow the same approach if I have dbl columns?
Thank you!
Here is one way to do this: transform the data to long format, then, for each name, create groups of n rows and take the sum of each group.
library(dplyr)
library(tidyr)
n <- 2 #No of columns to bucket. Change this to 100 for your case.
dataset %>%
pivot_longer(cols = -name, names_to = 'col') %>%
group_by(name) %>%
group_by(grp = rep(seq_len(n()), each = n, length.out = n()), .add = TRUE) %>% # .add replaces the deprecated add argument (dplyr >= 1.0.0)
summarise(value = sum(value, na.rm = TRUE)) %>%
#If needed in wider format again
pivot_wider(names_from = grp, values_from = value, names_prefix = 'col')
# name col1 col2 col3 col4
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 A 2 2 2 1
#2 B 4 4 4 2
#3 C 3 6 3 3
#4 D 9 8 9 4
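Note that the approach above buckets by column position. Since the question asks for buckets by column name (a7 is missing, and a100 need not be the 100th column), here is a rough base R sketch that derives the bucket from the numeric suffix instead. It assumes every value column is named "a" followed by a number, and it uses a plain data.frame rather than a data.table; the names width, suffix and bucket are just for illustration.

```r
dataset <- data.frame(
  name = c("A", "B", "C", "D"),
  a1 = 1:4, a2 = c(1, 2, NaN, 5), a3 = 1:4, a4 = 1:4,
  a5 = c(1, 2, NA, 5), a6 = 1:4, a8 = 1:4
)

width <- 2  # bucket width in terms of the numeric suffix; 100 in the question

value_cols <- setdiff(names(dataset), "name")
# Numeric suffix of each column name, e.g. "a8" -> 8
suffix <- as.integer(sub("^a", "", value_cols))
# Suffixes 1..width go to bucket 1, (width + 1)..(2 * width) to bucket 2, ...
bucket <- (suffix - 1) %/% width + 1

# For each bucket, sum its columns row-wise, ignoring NA and NaN
sums <- sapply(split(value_cols, bucket), function(cols) {
  rowSums(dataset[cols], na.rm = TRUE)
})
colnames(sums) <- paste0("bucket", colnames(sums))

result <- cbind(dataset["name"], sums)
result
```

Because only the non-name columns are touched and na.rm = TRUE is used, there is no need to convert NA or NaN to 0 first, and the name column stays intact.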
This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Concatenate strings by group with dplyr [duplicate]
(4 answers)
Closed 4 years ago.
I would like to create a new dataframe based on an existing one. As the title suggests, I would like to paste all string values in a certain column, if a value in another column is equivalent.
Due to my poor writing skills, I think I'm not being very clear what I mean by this. To clarify, I've created an example.
Existing Dataframe
If I have something like this:
DF <- data.frame(
ID = c(1,2,2,3,3,3,4,4,4,4),
value = c("I","ate","cereals","for","breakfast","it","was","delicious","!!!",":)"))
New Dataframe
I would like to create something like this:
DF2 <- data.frame(
ID = c(1,2,3,4),
value = c(paste("I"), paste("ate","cereals"), paste("for","breakfast","it"), paste("was","delicious","!!!",":)")))
All strings from column value are consolidated using paste when they have the same value in column ID. I'm having trouble building a function that can do this. Could you please help me?
I am comfortable with either dplyr or data.table.
In dplyr you can use group_by with summarise
library(dplyr)
DF %>%
group_by(ID) %>%
summarise(value = paste(value, collapse = " "))
## A tibble: 4 x 2
# ID value
# <dbl> <chr>
#1 1. I
#2 2. ate cereals
#3 3. for breakfast it
#4 4. was delicious !!! :)
You can just group_by(ID) and summarise with a concatenation function. Here I use str_c with the collapse argument.
library(tidyverse)
DF <- data.frame(
ID = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
value = c("I", "ate", "cereals", "for", "breakfast", "it", "was", "delicious", "!!!", ":)")
)
DF %>%
group_by(ID) %>%
summarise(value = str_c(value, collapse = " "))
#> # A tibble: 4 x 2
#> ID value
#> <dbl> <chr>
#> 1 1 I
#> 2 2 ate cereals
#> 3 3 for breakfast it
#> 4 4 was delicious !!! :)
Created on 2018-08-26 by the reprex package (v0.2.0).
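Since the question mentions data.table as well, a roughly equivalent sketch with data.table could look like this (DT and DT2 are just illustrative names):

```r
library(data.table)

DT <- data.table(
  ID = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  value = c("I", "ate", "cereals", "for", "breakfast", "it", "was", "delicious", "!!!", ":)")
)

# Within each ID, paste the strings together separated by a space
DT2 <- DT[, .(value = paste(value, collapse = " ")), by = ID]
DT2
```

The by = ID part plays the role of group_by(ID), and the expression inside .() plays the role of summarise().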
This question already has answers here:
Add margin row totals in dplyr chain
(10 answers)
Closed 4 years ago.
My data:
data <- data.frame(column1 = c("A","B","C","D"), column2 = c(4, NA, NA, 1))
My pipe:
library(dplyr)
data2 <- data %>%
filter(grepl("A|B|D", column1))
My question:
How can I (simply) continue my pipe to add a row containing the total of the column2 (total = 5)?
You can do:
data2 <- data %>%
filter(grepl("A|B|D", column1)) %>%
rbind(., data.frame(column1="Total", column2=sum(.$column2, na.rm=TRUE)))
column1 column2
1 A 4
2 B NA
3 D 1
4 Total 5
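A variation on the same idea that stays within dplyr verbs is to append a one-row summarise() result with bind_rows(). This is a sketch only; stringsAsFactors = FALSE is set explicitly so column1 is a character column on older R versions too.

```r
library(dplyr)

data <- data.frame(
  column1 = c("A", "B", "C", "D"),
  column2 = c(4, NA, NA, 1),
  stringsAsFactors = FALSE
)

data2 <- data %>%
  filter(grepl("A|B|D", column1)) %>%
  # `.` is the filtered data; summarise() builds the one-row total to append
  bind_rows(summarise(., column1 = "Total", column2 = sum(column2, na.rm = TRUE)))
data2
```

The last row should read Total with a column2 value of 5, as in the output above.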
This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 5 years ago.
I have a data_frame where I would like vector to be the concatenation of elements in A. So
df <- data_frame(id = c(1, 1, 2, 2), A = c("a", "b", "b", "c"))
df
Source: local data frame [4 x 2]
id A
1 1 a
2 1 b
3 2 b
4 2 c
Should become
newdf
Source: local data frame [2 x 2]
id vector
1 1 "a b"
2 2 "b c"
My first inclination is to use paste() inside summarise but this doesn't work.
df %>% group_by(id) %>% summarise(paste(A))
Error: expecting a single value
Hadley and Romain talk about a similar issue in the GitHub issues, but I can't quite see how that applies directly. It seems like there should be a very simple solution, especially because paste() usually does return a single value.
You need to collapse the values in paste
df %>% group_by(id) %>% summarise(vector=paste(A, collapse=" "))
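To see why the original attempt complains about a single value, it may help to compare the two calls on a plain vector. A minimal base R illustration (the variable names are arbitrary):

```r
A <- c("a", "b")

# Without collapse, paste() is vectorised: one output string per input element
no_collapse <- paste(A)
length(no_collapse)
# [1] 2

# With collapse, all elements are joined into a single string
collapsed <- paste(A, collapse = " ")
collapsed
# [1] "a b"
```

summarise() wanted exactly one value per group here, which is what the collapse argument provides.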
My data frame was as:
col1 col2
1 one
1 one more
2 two
2 two
3 three
I needed to summarise it as follows:
col1 col3
1 one, one more
2 two
3 three
The following code did the trick:
library(dplyr)
df <- data.frame(col1 = c(1,1,2,2,3), col2 = c("one", "one more", "two", "two", "three"))
df %>%
group_by(col1) %>%
summarise( col3 = toString(unique(col2)))