Create new columns looping an array inside mutate (dplyr) - r

I have the following dummy data frame called df:
A1 A2 A3 B1 B2 B3 C1 C2 C3
1 1 1 2 2 2 3 3 3
and I would like to sum columns that contain the same letter into a new column (naming it using the corresponding letter).
I would expect this result:
A1 A2 A3 B1 B2 B3 C1 C2 C3 A B C
1 1 1 2 2 2 3 3 3 3 6 9
I know I can achieve this result using mutate from dplyr:
mutate(df,
A = A1 + A2 + A3,
B = B1 + B2 + B3,
C = C1 + C2 + C3)
Is there any way to do it using a vector like letters <- c("A", "B", "C") and looping over that vector inside the mutate function? Something like:
mutate(df,
letters = paste0(letters,"1") + paste0(letters,"2") + paste0(letters,"3") )

One dplyr and purrr solution could be:
library(dplyr)
library(purrr)
bind_cols(df, map_dfc(.x = LETTERS[1:3],
~ df %>%
transmute(!!.x := rowSums(select(., starts_with(.x))))))
A1 A2 A3 B1 B2 B3 C1 C2 C3 A B C
1 1 1 1 2 2 2 3 3 3 3 6 9
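For comparison, the same sums can be sketched in base R with an ordinary loop over the prefixes. This is only a minimal illustration under the assumption that each prefix matches exactly the columns meant to be summed; the names `prefixes` and `original_cols` are introduced here and are not part of the question.

```r
# Dummy data from the question
df <- data.frame(A1 = 1, A2 = 1, A3 = 1,
                 B1 = 2, B2 = 2, B3 = 2,
                 C1 = 3, C2 = 3, C3 = 3)

prefixes <- c("A", "B", "C")          # hypothetical name for the letter vector
# Remember the original names so newly added columns are never summed
original_cols <- names(df)

for (p in prefixes) {
  # Row-wise sum of every original column whose name starts with the prefix
  df[[p]] <- rowSums(df[original_cols[startsWith(original_cols, p)]])
}

df
```

Fixing `original_cols` before the loop means the new columns A, B and C cannot leak into later sums, which would matter if one prefix were itself the start of another column name.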

Related

Is there a way to change data frame entries in R from numeric to a specific character?

If I have a data frame like so:
df <- data.frame(
a = c(1,1,1,2,2,2,3,3,3),
b = c(1,2,3,1,2,3,1,2,3)
)
which looks like this:
> df
a b
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Is there a quick way to change the columns a and b to match the example below, without explicitly having to type it all out?
> df
a b
a1 b1
a1 b2
a1 b3
a2 b1
a2 b2
a2 b3
a3 b1
a3 b2
a3 b3
In other words, I'm trying to take the name of the column and just place it in front of the value that was in that row originally.
We can use cur_column() inside across() to get the name of the column currently being processed, and paste it (with str_c) in front of each value of that column:
library(dplyr)
library(stringr)
df1 <- df %>%
mutate(across(everything(), ~ str_c(cur_column(), .)))
-output
df1
# a b
#1 a1 b1
#2 a1 b2
#3 a1 b3
#4 a2 b1
#5 a2 b2
#6 a2 b3
#7 a3 b1
#8 a3 b2
#9 a3 b3
Or using base R
df[] <- Map(paste0, names(df), df)
Or another option is
df[] <- paste0(names(df)[col(df)], unlist(df))
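To see why the `Map` one-liner works, the call can be split into two steps. This is a small sketch on the question's data; the intermediate name `pasted` is introduced here for illustration only.

```r
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  b = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Map() calls paste0(name, column) once per column and returns a list
# with one character vector per column
pasted <- Map(paste0, names(df), df)

# Assigning into df[] keeps the data frame shape and the column names
df[] <- pasted
head(df, 3)
```

Writing `df[] <- ...` rather than `df <- ...` is what keeps the result a data frame instead of replacing it with a plain list.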

How to write two vectors of different length into one data frame by writing same values into same row?

I want to write two vectors of different length with partly equal values into one data frame. The same values should be written in the same row.
ef1 <- c('A1', 'A2', 'B0', 'B1', 'C1', 'C2')
ef2 <- c('A1', 'A2', 'C1', 'C2', 'D1', 'D2')
If I write them in one data frame, it looks like this:
df <- data.frame (ef1, ef2)
> df
ef1 ef2
1 A1 A1
2 A2 A2
3 B0 C1
4 B1 C2
5 C1 D1
6 C2 D2
But what I want is this:
> df
ef1 ef2
1 A1 A1
2 A2 A2
3 B0 NA
4 B1 NA
5 C1 C1
6 C2 C2
7 NA D1
8 NA D2
I'm grateful for any help.
One option is match
(tmp <- unique(c(ef1, ef2)))
# [1] "A1" "A2" "B0" "B1" "C1" "C2" "D1" "D2"
out <- data.frame(ef1 = ef1[match(tmp, ef1)],
ef2 = ef2[match(tmp, ef2)])
Result
out
# ef1 ef2
#1 A1 A1
#2 A2 A2
#3 B0 <NA>
#4 B1 <NA>
#5 C1 C1
#6 C2 C2
#7 <NA> D1
#8 <NA> D2
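The gaps come from how `match` and indexing interact, which the following sketch makes visible for `ef2`. The names `all_values` and `idx2` are introduced here purely for illustration.

```r
ef1 <- c("A1", "A2", "B0", "B1", "C1", "C2")
ef2 <- c("A1", "A2", "C1", "C2", "D1", "D2")
all_values <- unique(c(ef1, ef2))

# Position of each value in ef2, or NA when the value is absent
idx2 <- match(all_values, ef2)
idx2
# [1]  1  2 NA NA  3  4  5  6

# Indexing with NA yields NA, which creates the gaps in the column
ef2[idx2]
# [1] "A1" "A2" NA   NA   "C1" "C2" "D1" "D2"
```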
Another solution uses dplyr's full_join. The idea is to artificially create a merging column and then do a full join.
library(dplyr)
# Note: this overwrites the vectors ef1 and ef2 with tibbles
ef1 <- tibble(a = ef1, ef1 = ef1)
ef2 <- tibble(a = ef2, ef2 = ef2)
ef1 %>%
full_join(ef2, by = "a") %>%
select(ef1, ef2)
# A tibble: 8 x 2
ef1 ef2
<chr> <chr>
1 A1 A1
2 A2 A2
3 B0 NA
4 B1 NA
5 C1 C1
6 C2 C2
7 NA D1
8 NA D2

group 2 variables and then delimit the strings

I am trying to group by two variables and split the comma-separated values without increasing the number of rows,
e.g.:
#my dataframe
> df
g1 g2 g3
1 a1 a2 77.7,81.7
2 a1 a2 77.7,81.7
3 b2 b3 3,1,5
4 b2 b3 3,1,5
5 b2 b3 3,1,5
Expected Output:
g1 g2 g3
1 a1 a2 77.7
2 a1 a2 81.7
3 b2 b3 3
4 b2 b3 1
5 b2 b3 5
I tried the code below, but it does not group the rows and the result is not in the expected format. Please help!
Code:
df <- data.frame(g1 = c("a1","a1","b2","b2","b2"), g2 = c("a2","a2","b3","b3","b3"), g3 = c("77.7,81.7","77.7,81.7","3,1,5","3,1,5","3,1,5"))
library(stringr)
s <- strsplit(df$g3, split = ",")
data.frame(V1 = rep(df$g1, sapply(s, length)), V2 = unlist(s))
Building on Chris Ruehlemann's answer: you can use the following and it will still work if values reappear.
df$g3_split <- unlist(lapply(split(df,df$g1), function(x) unique(unlist(strsplit(x$g3, ","))) ))
df
g1 g2 g3 g3_split
1 a1 a2 77.7,81.7 77.7
2 a1 a2 77.7,81.7 81.7
3 b2 b3 3,77.7,5 3
4 b2 b3 3,77.7,5 77.7
5 b2 b3 3,77.7,5 5
DATA:
df <- data.frame(g1 = c("a1","a1","b2","b2","b2"),
g2 = c("a2","a2","b3","b3","b3"),
g3 = c("77.7,81.7","77.7,81.7","3,1,5","3,1,5","3,1,5"), stringsAsFactors = FALSE)
SOLUTION:
df$g3_split <- unique(unlist(strsplit(df$g3, ",")))
RESULT:
df
g1 g2 g3 g3_split
1 a1 a2 77.7,81.7 77.7
2 a1 a2 77.7,81.7 81.7
3 b2 b3 3,1,5 3
4 b2 b3 3,1,5 1
5 b2 b3 3,1,5 5
If you want to replace g3 with the new values, just assign unique(unlist(strsplit(df$g3, ","))) to df$g3 instead of df$g3_split.
An option with separate_rows
library(dplyr)
library(tidyr)
df %>%
mutate( g3_split = g3) %>%
separate_rows(g3_split) %>%
distinct(g3_split, .keep_all = TRUE)
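If the same value can occur in more than one group, deduplicating on the split values alone may drop rows. A group-aware variant can be sketched in base R as below: it keeps one row per (g1, g2, g3) combination and then expands it. This is an illustration under the assumption that every row of a group carries the identical g3 string; `groups`, `parts` and `out` are names introduced here.

```r
df <- data.frame(g1 = c("a1", "a1", "b2", "b2", "b2"),
                 g2 = c("a2", "a2", "b3", "b3", "b3"),
                 g3 = c("77.7,81.7", "77.7,81.7", "3,1,5", "3,1,5", "3,1,5"),
                 stringsAsFactors = FALSE)

# One row per distinct group / string combination
groups <- unique(df[c("g1", "g2", "g3")])

# Split each group's string into its comma-separated pieces
parts <- strsplit(groups$g3, ",", fixed = TRUE)

# Repeat the group keys once per piece and line them up with the pieces
out <- data.frame(g1 = rep(groups$g1, lengths(parts)),
                  g2 = rep(groups$g2, lengths(parts)),
                  g3 = unlist(parts),
                  stringsAsFactors = FALSE)
out
```

The row count matches the input only when the number of pieces in each group equals the number of rows in that group, as it does in the question's data.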

access first row of group_by dataset

I have a data frame df1 with columns a, b, c. I want to assign c = 0 to the first row of each group returned by group_by(a, b). I tried something like
t <- df1 %>% group_by(a,b) %>% filter(row_number(a)==1) %>% mutate(c= 0)
But that reduced the number of rows. The expected output is
a b c
a1 b1 0
a1 b1 NA
a2 b2 0
a2 b2 NA
You can use seq_along to number elements in each group from 1 to the total number of elements within each group (2, in this case). Then use ifelse to set the first element of 'c' for each group to be 0 and leave the other element as is.
library(dplyr)
df %>%
group_by(a, b) %>%
mutate(c = ifelse(seq_along(c) == 1, 0, c))
# A tibble: 4 x 3
# Groups: a, b [2]
# a b c
# <fct> <fct> <dbl>
#1 a1 b1 0.
#2 a1 b1 NA
#3 a2 b2 0.
#4 a2 b2 NA
data
df <- data.frame(a = rep(c("a1", "a2"), each = 2),
b = rep(c("b1", "b2"), each = 2),
c = NA)
df
# a b c
#1 a1 b1 NA
#2 a1 b1 NA
#3 a2 b2 NA
#4 a2 b2 NA
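A base R alternative can be sketched with `duplicated`, which flags every repeat of an (a, b) pair and therefore leaves the first row of each group unflagged. This is a minimal illustration on the same data; `first_in_group` is a name introduced here.

```r
df <- data.frame(a = rep(c("a1", "a2"), each = 2),
                 b = rep(c("b1", "b2"), each = 2),
                 c = NA)

# duplicated() is FALSE for the first occurrence of each (a, b) pair
first_in_group <- !duplicated(df[c("a", "b")])

# Only the first row of each group is changed; all rows are kept
df$c[first_in_group] <- 0
df
```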

how to sort data frame by column names in R?

How can I sort the below data frame df to df1?
df
a1 a4 a3 a5 a2
sorted data frame
df1
a1 a2 a3 a4 a5
We can use mixedorder from library(gtools)
library(gtools)
df1 <- df[mixedorder(colnames(df))]
df1
# a1 a3 a9 a10
#1 1 3 1 2
#2 2 4 2 3
#3 3 5 3 4
#4 4 6 4 5
#5 5 7 5 6
data
df <- data.frame(a1 = 1:5, a10=2:6, a3 = 3:7, a9= 1:5)
In base R, assuming the numbers in the column names don't go into double digits (order() sorts the names as strings, so a10 would come before a2):
df
# a1 a4 a3 a5 a2
#1 1 4 3 5 2
df[, order(names(df))]
# a1 a2 a3 a4 a5
#1 1 2 3 4 5
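When the suffixes do reach double digits and gtools is not available, one possible base R sketch is to order by the numeric part of the names. It assumes every column name contains exactly one run of digits; `suffix` is a name introduced here.

```r
df <- data.frame(a1 = 1:5, a10 = 2:6, a3 = 3:7, a9 = 1:5)

# Drop everything that is not a digit, then convert what is left to a number
suffix <- as.integer(gsub("[^0-9]", "", names(df)))

# Reorder the columns by that number instead of by the raw string
df1 <- df[order(suffix)]
names(df1)
# [1] "a1"  "a3"  "a9"  "a10"
```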
Assuming there is no "hole" in the numbers suffixing the column names, you can also use dplyr:
library(dplyr)
# the range must cover every suffix present, here a1 to a5
df1 <- select(df, num_range("a", 1:5))
