R: Reshaping Multiple Columns from Long to Wide

Using the following data:
library(tidyverse)
sample_df <- data.frame(Letter = c("a", "a", "a", "b", "b"),
Number = c(1,2,1,3,4),
Fruit = c("Apple", "Plum", "Peach", "Pear", "Peach"))
Letter Number Fruit
a 1 Apple
a 2 Plum
a 1 Peach
b 3 Pear
b 4 Peach
I want to transform a set of values from a long to a wide format:
Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
a 1 2 Apple Plum Peach
b 3 4 Pear Peach
To do so, I tried, unsuccessfully, to create an index of each unique combination within the groups c("Letter", "Number") and c("Letter", "Fruit"). First, does this index need to be created, and if so, how should it be done?
# Gets Unique Values, but no Index of Unique Combinations
sample_df1 <- sample_df %>%
group_by(Letter) %>%
mutate(Id1 = n_distinct(Letter, Number),
Id2 = n_distinct(Letter, Fruit))
# Gets Following Error: Column `Id1` must be length 3 (the group size) or one, not 2
sample_df1 <- sample_df %>%
group_by(Letter) %>%
mutate(Id1 = 1:n_distinct(Letter, Number),
Id2 = 1:n_distinct(Letter, Fruit))
# NOTE: Manually Created the Index Columns to show next problem
sample_df1 <- sample_df %>%
group_by(Letter) %>%
add_column(Id1 = c(1,2,1,1,2),
Id2 = c(1,2,3,1,2))
Assuming it did need to be done, I manually appended the desired values, and partially solved the problem using the development version of tidyr.
# Requires the development version of tidyr
devtools::install_github("tidyverse/tidyr")
sample_df1 %>%
pivot_wider(names_from = c("Id1", "Id2"), values_from = c("Number", "Fruit")) %>%
set_names(~ str_replace_all(.,"(\\w+.*)(_\\d)(_\\d)", "\\1\\3"))
# Letter Number_1 Number_2 Number_3 Fruit_1 Fruit_2 Fruit_3
#<fct> <dbl> <dbl> <dbl> <fct> <fct> <fct>
# a 1 2 1 Apple Plum Peach
# b 3 4 NA Pear Peach NA
However, this approach still created an unwanted Number_3 column. Using tidyr, data.table, or any other package, is there a way to get the data into the desired format without duplicating columns?

One option is to replace the duplicated elements within each 'Letter' group with NA and then, in the reshaped data, remove the columns that are entirely NA
library(data.table)
out <- dcast(setDT(sample_df)[, lapply(.SD, function(x)
replace(x, duplicated(x), NA)), Letter], Letter ~ rowid(Letter),
value.var = c("Number", "Fruit"))
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
out[, (nm1) := NULL][]
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1: a 1 2 Apple Plum Peach
#2: b 3 4 Pear Peach <NA>
If we want a tidyverse approach, a similar option can be used. Note that pivot_wider was only available in the dev version of tidyr (tidyr_0.8.3.9000) when this was written; it has been part of released tidyr since version 1.0.0
library(tidyverse)
sample_df %>%
group_by(Letter) %>%
mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
mutate(rn = row_number()) %>%
pivot_wider(
names_from = rn,
values_from = c("Number", "Fruit")) %>%
select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
# <fct> <dbl> <dbl> <fct> <fct> <fct>
#1 a 1 2 Apple Plum Peach
#2 b 3 4 Pear Peach <NA>
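On the first part of the question (whether and how to build the index): the answers above sidestep it by blanking duplicates, but the index can also be computed per group by matching each value against that group's unique values. A minimal base R sketch, assuming the same sample_df with strings kept as characters; the helper name first_seen_index is made up for illustration:

```r
sample_df <- data.frame(
  Letter = c("a", "a", "a", "b", "b"),
  Number = c(1, 2, 1, 3, 4),
  Fruit  = c("Apple", "Plum", "Peach", "Pear", "Peach"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within each group, number the distinct values of
# `values` in order of first appearance (a repeated value reuses its number).
first_seen_index <- function(values, group) {
  rows <- seq_along(values)
  ave(rows, group, FUN = function(i) match(values[i], unique(values[i])))
}

sample_df$Id1 <- first_seen_index(sample_df$Number, sample_df$Letter)
sample_df$Id2 <- first_seen_index(sample_df$Fruit, sample_df$Letter)
```

This reproduces the Id1 and Id2 columns that were entered by hand in the question. Inside a grouped dplyr pipeline the same idea would be match(Number, unique(Number)).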

Related

Add more rows based on a grouping variable R

I'd like to add more rows to my dataset based on a grouping variable. Right now, my data has 2 rows but I would like 3 rows and the var app to be repeated for the third row.
This is what my data currently looks like:
my_data <- data.frame(app = c('a','b'), type = c('blue','red'), code = c(1:2), type_2 = c(NA, 'blue'), code_2 = c(NA, 3))
app type code type_2 code_2
a blue 1 NA NA
b red 2 blue 3
I would like the data to look like this:
app type code
a blue 1
b red 2
b blue 3
library(data.table)
setDT(my_data)
res <-
melt(
my_data,
id.vars = "app",
measure.vars = patterns(c("^type", "^code")),
value.name = c("type", "code")
)[!is.na(type), .(app, type, code)]
Using tidyverse
library(dplyr)
library(stringr)
library(tidyr)
my_data %>%
rename_at(vars(c(type, code)), ~ str_c(., "_1")) %>%
pivot_longer(cols = -app, names_to = c(".value", "grp"), names_sep = "_",
values_drop_na = TRUE) %>% select(-grp)
# A tibble: 3 x 3
# app type code
# <chr> <chr> <dbl>
#1 a blue 1
#2 b red 2
#3 b blue 3
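For readers who want to see what the melt and pivot_longer calls are doing, the same result can be sketched in base R by stacking the two column sets by hand. This assumes exactly two sets of columns as in the question; the intermediate names first_set and second_set are illustrative only:

```r
my_data <- data.frame(
  app    = c("a", "b"),
  type   = c("blue", "red"),
  code   = 1:2,
  type_2 = c(NA, "blue"),
  code_2 = c(NA, 3),
  stringsAsFactors = FALSE
)

# First set of columns as-is; second set renamed to the same column names.
first_set  <- my_data[, c("app", "type", "code")]
second_set <- setNames(my_data[, c("app", "type_2", "code_2")],
                       c("app", "type", "code"))

# Stack the two sets, drop the rows that were entirely missing,
# and keep rows of the same app together.
res <- rbind(first_set, second_set)
res <- res[!is.na(res$type), ]
res <- res[order(res$app), ]
rownames(res) <- NULL
```

With more than two column sets this gets repetitive, which is where melt with patterns or pivot_longer with ".value" earns its keep.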

paste column elements with condition in r

I have a data frame and I want to paste elements in name1, name2 and name3 which do not contain NA.
c <- data.frame(name1 = letters[1:3],
name2 = c('A', NA, 'C'),
name3 = c('pig', 'cow', NA)
)
The result should look like this:
c %>% mutate(new_name = c('a&A&pig', 'b&cow', 'c&C'))
When I use paste0() it binds all the elements including NA. I do not want this.
c %>% mutate(new_name = paste0(name1,'&', name2, '&', name3))
Then I tried two other methods. One splits the data frame into a list with group_split(); the other nests the data frame by index. In both cases I then used map() and select() to pick the columns that do not contain NA, but both attempts failed.
c %>%
mutate(index = row_number()) %>%
group_split(index) %>%
map(select(~where(~!any(is.na(.)))))
c %>%
mutate(index = row_number()) %>%
nest(data = name1:name3) %>%
mutate(without_NA_data = map(data, select(~where(~!any(is.na(.))))))
Is there any way I can get what I want?
Any help will be highly appreciated!
We can use rowwise with c_across, which requires loading only the dplyr package
library(dplyr)
c %>%
rowwise %>%
mutate(new_name = paste(na.omit(c_across(everything())), collapse="&")) %>%
ungroup
# A tibble: 3 x 4
# name1 name2 name3 new_name
# <chr> <chr> <chr> <chr>
#1 a A pig a&A&pig
#2 b <NA> cow b&cow
#3 c C <NA> c&C
Or with pmap
library(purrr)
c %>%
mutate(new_name = pmap_chr(., ~ paste(na.omit(c(...)), collapse="&")))
# name1 name2 name3 new_name
#1 a A pig a&A&pig
#2 b <NA> cow b&cow
#3 c C <NA> c&C
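Or using base R with paste and replace. Note that trimws only strips leading and trailing separators, so a missing value in the middle of a row leaves a doubled "&" (see "b&&cow" in the output)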
trimws(do.call(paste, c(replace(c, is.na(c), ''), sep="&")), whitespace = "&")
#[1] "a&A&pig" "b&&cow" "c&C"
Or using apply
apply(c, 1, function(x) paste(na.omit(x), collapse="&"))
#[1] "a&A&pig" "b&cow" "c&C"
Or paste first and remove the NA substring
gsub("&NA|NA&|NA$", "", do.call(paste, c(c, sep="&")))
#[1] "a&A&pig" "b&cow" "c&C"
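One caveat with the paste-then-remove approach is that the regular expression cannot tell a missing value from a real string that happens to contain "NA". A small sketch with made-up data (the value "NASA" is an assumption chosen to trigger the problem), compared against the apply version:

```r
# Made-up data: "NASA" is a genuine value, the NA in row 2 is a real missing value.
x <- data.frame(
  name1 = c("a", "b"),
  name2 = c("NASA", NA),
  name3 = c("pig", "cow"),
  stringsAsFactors = FALSE
)

pasted <- do.call(paste, c(x, sep = "&"))

# The regex strips "&NA" from "&NASA" as well as the real missing value.
regex_result <- gsub("&NA|NA&|NA$", "", pasted)

# Dropping NA before pasting leaves genuine values untouched.
safe_result <- unname(apply(x, 1, function(r) paste(na.omit(r), collapse = "&")))

print(regex_result)
print(safe_result)
```

Here regex_result contains "aSA&pig" for the first row, while safe_result keeps "a&NASA&pig".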
We can use unite from tidyr by using na.rm = TRUE to remove NA values
tidyr::unite(c, new_name, starts_with('name'),
sep = '&', na.rm = TRUE, remove = FALSE)
# new_name name1 name2 name3
#1 a&A&pig a A pig
#2 b&cow b <NA> cow
#3 c&C c C <NA>

How to merge rows in a dataframe and combine factor-values in cells

I have a dataframe in R in which I want to merge certain rows and combine the values of certain cells in these rows. Imagine the following data frame:
Col.1<-c("a","b","b","a","c","c","c","d")
Col.2<-c("mouse", "cat", "dog", "bird", "giraffe", "elephant", "zebra", "worm")
df<-data.frame(Col.1, Col.2)
df
Col.1 Col.2
a mouse
b cat
b dog
a bird
c giraffe
c elephant
c zebra
d worm
I would like to merge all adjacent rows in which the values in Col.1 are the same and combine the values in Col.2 accordingly.
The final result should look like this:
Col.1 Col.2
a mouse
b cat dog
a bird
c giraffe elephant zebra
d worm
I have tried a plyr solution (like: ddply(df, .(Col.1), summarize, Col.2 = sum(Col.2))), but the sum command doesn't work for factor values.
We can do a group-by paste. To group adjacent identical elements, rleid from data.table can be used; then summarise the values of 'Col.2' by pasting them together
library(dplyr)
library(data.table)
library(stringr)
df %>%
group_by(Col.1, grp = rleid(Col.1)) %>%
summarise(Col.2 = str_c(Col.2, collapse=' ')) %>%
ungroup %>%
select(-grp)
# A tibble: 5 x 2
# Col.1 Col.2
# <fct> <chr>
#1 a mouse
#2 a bird
#3 b cat dog
#4 c giraffe elephant zebra
#5 d worm
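NOTE: This contains the same rows as the expected output in the OP's post, but group_by sorts them by Col.1, so the two "a" rows end up next to each other; adding arrange(grp) before select(-grp) restores the original order.
EDIT: I missed the "adjacent" requirement. See the solution below, which uses the base function rle.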
Col.1 <- c("a","b","b","a","c","c","c","d")
Col.2 <- c("mouse", "cat", "dog", "bird", "giraffe", "elephant", "zebra", "worm")
df <- tibble(Col.1, Col.2)
# Run lengths of consecutive identical values in Col.1
rlel <- rle(df$Col.1)$lengths
df %>%
# One id per run: 1 repeated rlel[1] times, 2 repeated rlel[2] times, ...
mutate(adj = rep(seq_along(rlel), rlel)) %>%
group_by(Col.1, adj) %>%
summarize(New.Col.2 = paste(Col.2, collapse = " ")) %>%
ungroup %>% arrange(adj) %>% select(-adj)
# A tibble: 5 x 2
# Col.1 New.Col.2
# <chr> <chr>
#1 a mouse
#2 b cat dog
#3 a bird
#4 c giraffe elephant zebra
#5 d worm
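The run-based grouping can also be done without any packages. A minimal base R sketch, assuming Col.1 is a character vector (rle does not accept factors); the names runs, run_id and merged are illustrative:

```r
df <- data.frame(
  Col.1 = c("a", "b", "b", "a", "c", "c", "c", "d"),
  Col.2 = c("mouse", "cat", "dog", "bird", "giraffe", "elephant", "zebra", "worm"),
  stringsAsFactors = FALSE
)

# rle() finds runs of adjacent identical values in Col.1.
runs <- rle(df$Col.1)

# Label every row with the number of the run it belongs to.
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Paste the Col.2 values within each run; one output row per run,
# in the order the runs appear.
merged <- data.frame(
  Col.1 = runs$values,
  Col.2 = unname(vapply(split(df$Col.2, run_id), paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
```

Because the rows come out in run order, the two separate "a" runs stay apart and no final re-sorting is needed.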

Put the combinations matrix of many rows in a column of a dataframe, then split it

I have a dataframe that looks like this (I simplify):
df <- data.frame(rbind(c(1, "dog", "cat", "rabbit"), c(2, "apple", "peach", "cucumber")))
colnames(df) <- c("ID", "V1", "V2", "V3")
## ID V1 V2 V3
## 1 1 dog cat rabbit
## 2 2 apple peach cucumber
I would like to create a column containing all possible combinations of variables V1:V3 two by two (order doesn't matter), but keeping a link with the original ID. So something like this.
## ID bigrams
## 1 1 dog cat
## 2 1 cat rabbit
## 3 1 dog rabbit
## 4 2 apple peach
## 5 2 apple cucumber
## 6 2 peach cucumber
My idea: use combn(), mutate() and separate_rows().
library(tidyr)
library(dplyr)
df %>%
mutate(bigrams=paste(unlist(t(combn(df[,2:4],2))), collapse="-")) %>%
separate_rows(bigrams, sep="-") %>%
select(ID,bigrams)
The result is not what I expected... I guess that concatenating a matrix (the result of combn()) is not as easy as that.
I have two questions about this: 1) How do I debug this code? 2) Is this a good way to do this kind of thing? I'm new to R but I have an OpenRefine background, so concatenating and splitting multivalued cells makes a lot of sense to me. But is this also the right method in R?
Thanks in advance for any help.
We can do this with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), melt it to 'long' format, grouped by 'ID', get the combn of 'value' and paste it together
library(data.table)
dM <- melt(setDT(df), id.var = "ID")[, combn(value, 2, FUN = paste, collapse=' '), ID]
setnames(dM, 2, 'bigrams')[]
# ID bigrams
#1: 1 dog cat
#2: 1 dog rabbit
#3: 1 cat rabbit
#4: 2 apple peach
#5: 2 apple cucumber
#6: 2 peach cucumber
I recommend @akrun's "melt first" approach, but just for fun, here are more ways to do it:
library(tidyverse)
df %>%
mutate_all(as.character) %>%
transmute(ID = ID, bigrams = pmap(
list(V1, V2, V3),
function(a, b, c) combn(c(a, b, c), 2, paste, collapse = " ")
))
# ID bigrams
# 1 1 dog cat, dog rabbit, cat rabbit
# 2 2 apple peach, apple cucumber, peach cucumber
(mutate_all(as.character) just because you gave us factors, and factor to character conversion can be surprising).
df %>%
mutate_all(as.character) %>%
nest(-ID) %>%
mutate(bigrams = map(data, combn, 2, paste, collapse = " ")) %>%
unnest(data) %>%
as.data.frame()
# ID bigrams V1 V2 V3
# 1 1 dog cat, dog rabbit, cat rabbit dog cat rabbit
# 2 2 apple peach, apple cucumber, peach cucumber apple peach cucumber
(as.data.frame() just for a prettier printing)
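Both tidyverse variants above leave the bigrams as one list element per ID rather than one row per pair. A base R sketch that goes straight to the one-row-per-pair layout the question asked for, assuming the value columns are V1 to V3 and are stored as characters; pairs_per_row and bigrams_df are illustrative names:

```r
df <- data.frame(
  rbind(c(1, "dog", "cat", "rabbit"), c(2, "apple", "peach", "cucumber")),
  stringsAsFactors = FALSE
)
colnames(df) <- c("ID", "V1", "V2", "V3")

# For each row, take its three values and paste every 2-element combination.
pairs_per_row <- lapply(seq_len(nrow(df)), function(i) {
  values <- unlist(df[i, c("V1", "V2", "V3")])
  combn(values, 2, FUN = paste, collapse = " ")
})

# Repeat each ID once per pair so the link to the original row is kept.
bigrams_df <- data.frame(
  ID      = rep(df$ID, lengths(pairs_per_row)),
  bigrams = unlist(pairs_per_row),
  stringsAsFactors = FALSE
)
```

The key difference from the attempt in the question is that combn is applied to the values of one row at a time, not to the whole block of columns at once.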

Matching columns in R and adding frequencies against them

I need to match the values in col1 with those in col2 and col3 and, where they match, add up their frequencies. The output should show, for each unique value, the total count across freq1, freq2 and freq3.
col1 freq1 col2 freq2 col3 freq3
apple 3 grapes 4 apple 1
grapes 5 apple 2 orange 2
orange 4 banana 5 grapes 2
guava 3 orange 6 banana 7
I need my output like this
apple 6
grapes 11
orange 12
guava 3
banana 12
I'm a beginner. How do I code this in R?
We can use melt from data.table with patterns specified in the measure argument to convert the 'wide' format to 'long' format, then grouped by 'col', we get the sum of 'freq' column
library(data.table)
melt(setDT(df1), measure = patterns("^col", "^freq"),
value.name = c("col", "freq"))[,.(freq = sum(freq)) , by = col]
# col freq
#1: apple 6
#2: grapes 11
#3: orange 12
#4: guava 3
#5: banana 12
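If the columns alternate 'col', 'freq', we can unlist the 'col' columns and the 'freq' columns separately to create a data.frame (using c(TRUE, FALSE), which is recycled, to subset the columns), and then use aggregate from base R to get the sum grouped by 'col'. This assumes df1 is a plain data.frame: setDT above converts it by reference, and a data.table would treat df1[c(TRUE, FALSE)] as a row subset.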
aggregate(freq~col, data.frame(col = unlist(df1[c(TRUE, FALSE)]),
freq = unlist(df1[c(FALSE, TRUE)])), sum)
# col freq
#1 apple 6
#2 banana 12
#3 grapes 11
#4 guava 3
#5 orange 12
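The question does not include code to build the data, so here is a self-contained sketch of the aggregate approach. The data frame is typed in from the table in the question, and selecting columns by name prefix (instead of by alternating position) is a choice made for this sketch:

```r
# Data typed in from the table in the question.
df1 <- data.frame(
  col1  = c("apple", "grapes", "orange", "guava"),  freq1 = c(3, 5, 4, 3),
  col2  = c("grapes", "apple", "banana", "orange"), freq2 = c(4, 2, 5, 6),
  col3  = c("apple", "orange", "grapes", "banana"), freq3 = c(1, 2, 2, 7),
  stringsAsFactors = FALSE
)

# Pick the name columns by prefix; everything else is a frequency column.
is_name <- grepl("^col", names(df1))

# Stack all names into one column and all frequencies into another.
# Both are unlisted column by column, so each name stays paired with its count.
long <- data.frame(
  col  = unlist(df1[is_name], use.names = FALSE),
  freq = unlist(df1[!is_name], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sum the frequencies per name; aggregate returns the groups sorted by name.
totals <- aggregate(freq ~ col, data = long, FUN = sum)
```

Selecting by prefix keeps working if the columns are not strictly alternating, as long as the col and freq columns appear in the same relative order.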
I think the easiest approach for a newcomer to understand would be to create 3 separate data frames (I assume here that your data frame is named df):
df1 <- data.frame(df$col1, df$freq1)
colnames(df1) <- c("fruit", "freq")
df2 <- data.frame(df$col2, df$freq2)
colnames(df2) <- c("fruit", "freq")
df3 <- data.frame(df$col3, df$freq3)
colnames(df3) <- c("fruit", "freq")
Then bind all dataframes by rows:
df <- rbind(df1, df2, df3)
Finally, group by fruit and sum the frequencies using the dplyr library.
library(dplyr)
df <- df %>%
group_by(fruit) %>%
summarise(freq = sum(freq))
