Coalescing multiple chunks of columns with the same suffix in names (R)

I have a dataset with various "chunks" of columns with different prefixes, but the same suffix:
ID A034 B034 C034 D034 A099 B099 A123 B123 ...
1 NA 1 NA NA NA 3 1 NA ...
2 2 NA NA NA 2 NA NA 2 ...
3 NA NA 2 NA NA 2 1 NA ...
The number of columns within each "chunk" also varies. Is there any way (other than manually, which is what I have been painstakingly doing with coalesce(!!! select(., contains("XXX")))) to automatically coalesce by chunk based on the shared suffix? That is, the result should resemble
ID 034 099 123 ...
1 1 3 1 ...
2 2 2 2 ...
3 2 2 1 ...
I'm not sure how to begin doing something like this, so any suggestions would be very helpful.

We can reshape the data into 'long' format with pivot_longer, then group by 'ID' and, across the remaining columns, apply na.omit to drop the NA elements. This assumes there is exactly one non-NA value per column within each group.
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = -ID, names_to = ".value",
names_pattern = "[A-Z](\\d+)") %>%
group_by(ID) %>%
summarise(across(everything(), na.omit), .groups = 'drop')
-output
# A tibble: 3 x 4
ID `034` `099` `123`
<int> <int> <int> <int>
1 1 1 3 1
2 2 2 2 2
3 3 2 2 1
Or, to be safe, use complete.cases to build a logical vector marking the non-NA elements and extract the first one. This assumes we need only a single non-NA value; if the number of non-NA values differs between columns, we may need to return a list instead.
df1 %>%
pivot_longer(cols = -ID, names_to = ".value",
names_pattern = "[A-Z](\\d+)") %>%
group_by(ID) %>%
summarise(across(everything(), ~ .[complete.cases(.)][1]))
data
df1 <- structure(list(ID = 1:3, A034 = c(NA, 2L, NA), B034 = c(1L, NA,
NA), C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA), A099 = c(NA,
2L, NA), B099 = c(3L, NA, 2L), A123 = c(1L, NA, 1L), B123 = c(NA,
2L, NA)), class = "data.frame", row.names = c(NA, -3L))
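For comparison, the same chunk-wise coalescing can be sketched in base R without reshaping. This is only an illustration: `coalesce_by_suffix` is a hypothetical helper name, and it assumes every non-ID column name is a non-digit prefix followed by a numeric suffix, as in the example data.

```r
# Example data from the question (D034 is entirely NA)
df1 <- data.frame(
  ID = 1:3,
  A034 = c(NA, 2L, NA), B034 = c(1L, NA, NA),
  C034 = c(NA, NA, 2L), D034 = rep(NA_integer_, 3),
  A099 = c(NA, 2L, NA), B099 = c(3L, NA, 2L),
  A123 = c(1L, NA, 1L), B123 = c(NA, 2L, NA)
)

# Hypothetical helper: coalesce all columns that share a numeric suffix
coalesce_by_suffix <- function(d, id = "ID") {
  cols <- setdiff(names(d), id)
  # Strip the non-digit prefix, e.g. "A034" becomes "034"
  suffix <- sub("^\\D+", "", cols)
  # One data frame of columns per suffix
  groups <- split.default(d[cols], suffix)
  # Within each group, keep the first non-NA value from left to right
  merged <- lapply(groups, function(g) {
    Reduce(function(x, y) ifelse(is.na(x), y, x), g)
  })
  data.frame(d[id], merged, check.names = FALSE)
}

res <- coalesce_by_suffix(df1)
res
```

Unlike the na.omit version, this does not require exactly one non-NA value per chunk; when several are present, the leftmost one is kept.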

One more approach
library(tidyverse)
split(names(df1)[-1], gsub('^\\D*(\\d+)$', '\\1', names(df1)[-1])) %>% map(~df1[c('ID', .x)]) %>%
imap(~ .x %>% group_by(ID) %>% rowwise %>% transmute(!!.y := first(na.omit(c_across(everything())))) %>% ungroup) %>%
reduce(left_join, by = 'ID')
#> # A tibble: 3 x 4
#> ID `034` `099` `123`
#> <int> <int> <int> <int>
#> 1 1 1 3 1
#> 2 2 2 2 2
#> 3 3 2 2 1
Created on 2021-06-20 by the reprex package (v2.0.0)

Related

Coalescing multiple columns from both the left and right side

Given the following data
df1 <- structure(list(ID = 1:3, alpha_1 = c(2L, 2L, 3L),
alpha_2 = c(1L, 2L,
3L), alpha_3 = c(4L, 4L, 2L), alpha_4 = c(3L, NA, NA), beta_1 = c(NA,
2L, NA), beta_2 = c(3L, NA, 2L), charlie_1 = c(1L, NA, 1L), charlie_2 = c(NA,
2L, NA)), class = "data.frame", row.names = c(NA, -3L))
I'm trying to coalesce all columns sharing the same initial prefix name (i.e. coalesce alpha_1, alpha_2, alpha_3, alpha_4, and coalesce beta_1 beta_2, etc.), but from both the left and right sides. That is, I want to generate two new variables, say 'alpha_left' and 'alpha_right', whose columns would be, in this example, (2, 2, 3) and (3, 4, 2) respectively (first non-missing elements from the left and right side of the dataframe).
User @akrun offered a great solution for the coalescing part here, but I'm unsure how to create two new variables from both the left and right coalesces.
Here is an option in tidyverse
1. Reshape to 'long' format with pivot_longer
2. Group by 'ID'
3. summarise across the columns 'alpha' to 'charlie'
4. Get the current column name with cur_column()
5. Create a tibble with the first non-NA element from the left and from the right
6. Rename its columns by prepending the column name ('nm1') as a prefix
7. Finally, unnest the list columns created in summarise
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
pivot_longer(cols = contains("_"),
names_to = c( ".value", "grp"), names_sep = "_") %>%
group_by(ID) %>%
summarise(across(alpha:charlie, ~ {
nm1 <- cur_column()
tbl1 <- tibble(left= .[complete.cases(.)][1],
right = rev(.)[complete.cases(rev(.))][1]);
names(tbl1) <- str_c(nm1, "_", names(tbl1))
list(tbl1)})) %>%
unnest(c(alpha, beta, charlie))
-output
# A tibble: 3 x 7
ID alpha_left alpha_right beta_left beta_right charlie_left charlie_right
<int> <int> <int> <int> <int> <int> <int>
1 1 2 3 3 3 1 1
2 2 2 4 2 2 2 2
3 3 3 2 2 2 1 1
Or using base R
lst1 <- lapply(split.default(df1[-1], sub("_\\d+$", "", names(df1)[-1])),
function(x) {
x1 <- apply(x, 1, function(y) {
y1 <- na.omit(y)
if(length(y1) > 1 ) y1[c(1, length(y1))] else y1[1]
})
if(is.vector(x1)) as.data.frame(matrix(x1)) else as.data.frame(t(x1))
})
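As a rough base R alternative that returns a single data frame with named left and right columns, the idea can be sketched as follows. The helper name `first_non_na` is hypothetical, and the sketch assumes the column names follow the `prefix_number` pattern from the question.

```r
df1 <- data.frame(
  ID = 1:3,
  alpha_1 = c(2L, 2L, 3L), alpha_2 = c(1L, 2L, 3L),
  alpha_3 = c(4L, 4L, 2L), alpha_4 = c(3L, NA, NA),
  beta_1 = c(NA, 2L, NA), beta_2 = c(3L, NA, 2L),
  charlie_1 = c(1L, NA, 1L), charlie_2 = c(NA, 2L, NA)
)

# Hypothetical helper: row-wise first non-NA across a list of columns
first_non_na <- function(cols) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), cols)
}

# Group the non-ID columns by their prefix
prefix <- sub("_\\d+$", "", names(df1)[-1])
groups <- split.default(df1[-1], prefix)

# For each prefix, coalesce from the left and from the right
res <- Map(function(g, nm) {
  out <- data.frame(left = first_non_na(g),
                    right = first_non_na(rev(g)))
  names(out) <- paste(nm, names(out), sep = "_")
  out
}, groups, names(groups))

out <- cbind(df1["ID"], do.call(cbind, unname(res)))
out
```

Reversing the column order with rev() turns the same left-to-right coalesce into a right-to-left one, so only one helper is needed.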
You could also do:
library(tidyverse)
df1[-1] %>%
split.default(sub("_\\d+", "", names(.))) %>%
imap_dfc(~data.frame(left = coalesce(!!!.x),
right = coalesce(!!!rev(.x))) %>%
set_names(paste(.y, names(.), sep="_")))
alpha_left alpha_right beta_left beta_right charlie_left charlie_right
1 2 3 3 3 1 1
2 2 4 2 2 2 2
3 3 2 2 2 1 1
One more approach, not as elegant as @Onyambu's
library(tidyverse)
df1[-1] %>%
split.default(sub("_\\d+", "", names(.))) %>%
imap_dfc(~ .x %>% rowwise() %>%
mutate(!!paste0(.y, '_left') := head(na.omit(c_across(everything())),1),
!!paste0(.y, '_right') := tail(na.omit(c_across(!last_col())),1),
.keep = 'none' )
)
#> # A tibble: 3 x 6
#> # Rowwise:
#> alpha_left alpha_right beta_left beta_right charlie_left charlie_right
#> <int> <int> <int> <int> <int> <int>
#> 1 2 3 3 3 1 1
#> 2 2 4 2 2 2 2
#> 3 3 2 2 2 1 1
Created on 2021-06-19 by the reprex package (v2.0.0)
Another option
library(tidyverse)
df1 <- structure(list(ID = 1:3, alpha_1 = c(2L, 2L, 3L),
alpha_2 = c(1L, 2L,
3L), alpha_3 = c(4L, 4L, 2L), alpha_4 = c(3L, NA, NA), beta_1 = c(NA,
2L, NA), beta_2 = c(3L, NA, 2L), charlie_1 = c(1L, NA, 1L), charlie_2 = c(NA,
2L, NA)), class = "data.frame", row.names = c(NA, -3L))
df1 %>%
pivot_longer(cols = -ID, names_sep = "_", names_to = c(".value", "set")) %>%
group_by(ID) %>%
fill(alpha:charlie, .direction = "updown") %>%
filter(set %in% range(set)) %>%
mutate(set = c("left", "right")) %>%
pivot_wider(id_cols = ID, names_from = set, values_from = alpha:charlie)
#> # A tibble: 3 x 7
#> # Groups: ID [3]
#> ID alpha_left alpha_right beta_left beta_right charlie_left charlie_right
#> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 2 3 3 3 1 1
#> 2 2 2 4 2 2 2 2
#> 3 3 3 2 2 2 1 1
Created on 2021-06-20 by the reprex package (v2.0.0)

Using tidyverse to clean up rank-choice survey

I have survey data in R that looks like this, where I've presented people with two groups of actions - High and Low - and asked them to rank each action. Each group contains unique actions, marked by the letter (6 actions in total).
id A_High B_High C_High D_Low E_Low F_Low
001 5 2 1 6 4 3
002 6 4 3 5 2 1
003 3 1 6 2 4 5
004 6 5 2 1 3 4
I need a new df that looks like the one below, where each High action is assigned a new numeric rank (between 0 and 3) corresponding to the number of Low action items that were ranked below that High action.
For example, a person with id 001 ranked A_High at number 5, B_High at 2, and C_High at 1. A_High's new rank would be 1 (since only 1 Low action, D_Low is ranked below A_High), B_High's new rank would be 3 (since all 3 Low actions were ranked below B_High), and C_High's new rank would be 3 (since all 3 Low actions were ranked below C_High).
id A_High_rank B_High_rank C_High_rank
001 1 3 3
002 0 1 1
003 2 3 0
004 0 0 2
I have a sense that this can be done with if/else statements but suspect that there should be a far more efficient way of achieving this with tidyverse. In the real dataset, I have 1000+ rows and 12 actions (6 High and 6 Low). I would appreciate any help on this.
Thanks!
Data:
"id A_High B_High C_High D_Low E_Low F_Low
001 5 2 1 6 4 3
002 6 4 3 5 2 1
003 3 1 6 2 4 5
004 6 5 2 1 3 4"
A base R option is to loop over the 'High' columns, take the rowSums of the logical matrix obtained by checking whether each 'High' column is less than the 'Low' columns, and rename the output by appending _rank as a suffix
out <- cbind(df1[1], sapply(df1[2:4],
function(x) rowSums(x < df1[endsWith(names(df1), 'Low')])))
names(out)[-1] <- paste0(names(out)[-1], "_rank")
-output
out
# id A_High_rank B_High_rank C_High_rank
#1 1 1 3 3
#2 2 0 1 1
#3 3 2 3 0
#4 4 0 0 2
Or using dplyr
library(dplyr)
df1 %>%
transmute(id, across(ends_with('High'),
~ rowSums(. < select(df1, ends_with('Low'))), .names = '{.col}_rank'))
# id A_High_rank B_High_rank C_High_rank
#1 1 1 3 3
#2 2 0 1 1
#3 3 2 3 0
#4 4 0 0 2
data
df1 <- structure(list(id = 1:4, A_High = c(5L, 6L, 3L, 6L), B_High = c(2L,
4L, 1L, 5L), C_High = c(1L, 3L, 6L, 2L), D_Low = c(6L, 5L, 2L,
1L), E_Low = c(4L, 2L, 4L, 3L), F_Low = c(3L, 1L, 5L, 4L)),
class = "data.frame", row.names = c(NA,
-4L))
After much suffering, this is the tidyverse solution I came up with. This was fun!
library(tidyverse)
data %>%
pivot_longer(cols = ends_with("_High"), names_to = "High Variables", values_to = "High") %>%
pivot_longer(cols = ends_with("_Low"), names_to = "Low Variables", values_to = "Low") %>%
filter(High-Low < 0) %>%
group_by(`High Variables`, `id`) %>%
summarise(Count = n()) %>%
pivot_wider(names_from = `High Variables`, values_from = Count) %>%
arrange(id)
Translation:
The first two lines create two pairs of columns and leave id untouched. Each pair has two columns, one with the original column names and the other with the values. Each pair of columns represents either High or Low.
Then I filtered the rows, keeping only those where Low was greater than High. Then I counted how many were left for each id and pivoted back to the wide format.
Now I just have to figure out how to turn those NAs into 0s.
Here's the output:
> data %>%
+ pivot_longer(cols = ends_with("_High"), names_to = "High Variables", values_to = "High") %>%
+ pivot_longer(cols = ends_with("_Low"), names_to = "Low Variables", values_to = "Low") %>%
+ filter(High < Low) %>%
+ group_by(`High Variables`, `id`) %>%
+ summarise(Count = n()) %>%
+ pivot_wider(names_from = `High Variables`, values_from = Count) %>%
+ arrange(id)
`summarise()` regrouping output by 'High Variables' (override with `.groups` argument)
# A tibble: 4 x 4
id A_High B_High C_High
<int> <int> <int> <int>
1 1 1 3 3
2 2 NA 1 1
3 3 2 3 NA
4 4 NA NA 2
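The NAs in the output above could probably be avoided by passing `values_fill` to pivot_wider. The following is a minimal sketch, assuming a reasonably recent tidyr in which `values_fill` accepts a single value; the column names `high_var` and `low_var` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  id = 1:4,
  A_High = c(5L, 6L, 3L, 6L), B_High = c(2L, 4L, 1L, 5L),
  C_High = c(1L, 3L, 6L, 2L), D_Low = c(6L, 5L, 2L, 1L),
  E_Low = c(4L, 2L, 4L, 3L), F_Low = c(3L, 1L, 5L, 4L)
)

ranks <- df1 %>%
  pivot_longer(ends_with("_High"), names_to = "high_var", values_to = "high") %>%
  pivot_longer(ends_with("_Low"), names_to = "low_var", values_to = "low") %>%
  # Keep the High/Low pairs where the Low action is ranked below the High action
  filter(high < low) %>%
  count(id, high_var) %>%
  # Missing id/high_var combinations become 0 instead of NA
  pivot_wider(names_from = high_var, values_from = n, values_fill = 0L) %>%
  arrange(id)
ranks
```

One caveat: an id whose High actions are all ranked below every Low action would have no rows left after filter() and would drop out of the result entirely, so it would need to be joined back in separately.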

Pivot Wide with Custom Names, Original Values in the cell

I have data that is set up like the following - the CODE variable is character and needs to remain as it is because the numbers have meaning.
ID CODE
1 1.0
1 0.00
1 9.99
2 40.56
3 33.54
3 0.00
How would I use pivot wider to rearrange it so it is like the following, where I can have 4 CODE columns and if there isn't a fourth code per ID, it is just left blank
ID CODE_1 CODE_2 CODE_3 CODE_4
1 1.0 0.00 9.99 "."
2 40.56 "." "." "."
3 33.54 0.00 "." "."
Thank you!
This approach comes close to what you want. You can use the tidyr function complete() to add the levels that are not present in your original values. Here is the code:
library(tidyverse)
#Code
df <- df %>% group_by(ID) %>% mutate(Var=factor(paste0('CODE_',row_number()),
levels = paste0('CODE_',1:4),
labels = paste0('CODE_',1:4),ordered = T,
exclude = F)) %>%
complete(Var = Var) %>%
pivot_wider(names_from = Var,values_from=CODE)
Output:
# A tibble: 3 x 5
# Groups: ID [3]
ID CODE_1 CODE_2 CODE_3 CODE_4
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 9.99 NA
2 2 40.6 NA NA NA
3 3 33.5 0 NA NA
Some data used:
#Data
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), CODE = c(1, 0,
9.99, 40.56, 33.54, 0)), class = "data.frame", row.names = c(NA,
-6L))
If you really want dots for missing values, you have to convert the variables to character and then replace the NAs like this:
#Code 2
df <- df %>% group_by(ID) %>% mutate(Var=factor(paste0('CODE_',row_number()),
levels = paste0('CODE_',1:4),
labels = paste0('CODE_',1:4),ordered = T,
exclude = F)) %>%
complete(Var = Var) %>%
pivot_wider(names_from = Var,values_from=CODE) %>%
mutate(across(CODE_1:CODE_4,~as.character(.))) %>%
replace(is.na(.),'.')
Output:
# A tibble: 3 x 5
# Groups: ID [3]
ID CODE_1 CODE_2 CODE_3 CODE_4
<int> <chr> <chr> <chr> <chr>
1 1 1 0 9.99 .
2 2 40.56 . . .
3 3 33.54 0 . .
We can use dcast from data.table
library(data.table)
dcast(setDT(df), ID ~ paste0("CODE_", rowid(ID)), value.var = 'CODE')
# ID CODE_1 CODE_2 CODE_3
#1: 1 1.00 0 9.99
#2: 2 40.56 NA NA
#3: 3 33.54 0 NA
data
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), CODE = c(1, 0,
9.99, 40.56, 33.54, 0)), class = "data.frame", row.names = c(NA,
-6L))
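Since the question states that CODE is a character column and that four CODE columns are wanted, a base R sketch that keeps the strings untouched and pads with "." might look like this. It assumes no ID has more than four codes; `n_codes` is an illustrative name.

```r
# CODE kept as character, as described in the question
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 3L, 3L),
  CODE = c("1.0", "0.00", "9.99", "40.56", "33.54", "0.00"),
  stringsAsFactors = FALSE
)

n_codes <- 4L  # assumed number of CODE columns wanted

# Collect the codes of each ID and pad them with "." up to n_codes entries
codes_by_id <- split(df$CODE, df$ID)
padded <- lapply(codes_by_id, function(x) {
  c(x, rep(".", max(0L, n_codes - length(x))))
})

# Stack the padded vectors into a character matrix, one row per ID
wide <- data.frame(ID = as.integer(names(padded)),
                   do.call(rbind, padded),
                   stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("CODE_", seq_len(n_codes))
rownames(wide) <- NULL
wide
```

Because the values never pass through a numeric type, strings such as "1.0" and "0.00" keep their original formatting.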

R function to paste information from different rows with a common column? [duplicate]

This question already has an answer here:
dplyr::first() to choose first non NA value
(1 answer)
Closed 2 years ago.
I understand we can use the dplyr function coalesce() to unite different columns, but is there such function to unite rows?
I am struggling with a confusing incomplete/doubled dataframe with duplicate rows for the same id, but with different columns filled. E.g.
id sex age source
12 M NA 1
12 NA 3 1
13 NA 2 2
13 NA NA NA
13 F 2 NA
and I am trying to achieve:
id sex age source
12 M 3 1
13 F 2 2
You can try:
library(dplyr)
library(tidyr) # fill() comes from tidyr
#Data
df <- structure(list(id = c(12L, 12L, 13L, 13L, 13L), sex = structure(c(2L,
NA, NA, NA, 1L), .Label = c("F", "M"), class = "factor"), age = c(NA,
3L, 2L, NA, 2L), source = c(1L, 1L, 2L, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
df %>%
group_by(id) %>%
fill(everything(), .direction = "down") %>%
fill(everything(), .direction = "up") %>%
slice(1)
# A tibble: 2 x 4
# Groups: id [2]
id sex age source
<int> <fct> <int> <int>
1 12 M 3 1
2 13 F 2 2
As mentioned by @A5C1D2H2I1M1N2O1R2T1, you can select the first non-NA value in each group. This can be done using dplyr:
library(dplyr)
df %>% group_by(id) %>% summarise(across(.fns = ~na.omit(.)[1]))
# A tibble: 2 x 4
# id sex age source
# <int> <fct> <int> <int>
#1 12 M 3 1
#2 13 F 2 2
Base R :
aggregate(.~id, df, function(x) na.omit(x)[1], na.action = 'na.pass')
Or data.table :
library(data.table)
setDT(df)[, lapply(.SD, function(x) na.omit(x)[1]), id]
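The "first non-NA value per group" idea shared by the snippets above can also be spelled out step by step in base R. This is a sketch only: `first_non_na` is a hypothetical helper name, and sex is stored as character here rather than as a factor.

```r
df <- data.frame(
  id = c(12L, 12L, 13L, 13L, 13L),
  sex = c("M", NA, NA, NA, "F"),
  age = c(NA, 3L, 2L, NA, 2L),
  source = c(1L, 1L, 2L, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA element, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

# Split by id, collapse every column of each piece to one value, then stack
pieces <- lapply(split(df, df$id), function(g) {
  as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
})
collapsed <- do.call(rbind, pieces)
rownames(collapsed) <- NULL
collapsed
```

If a column is entirely NA within a group, the helper returns NA for that cell rather than failing.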

Can you use pivot_wider to create multiple groups of alternating new columns?

My data currently looks like this, with the column "Number_Code" based on each different Side_Effect:
Session_ID Side_Effect Number_Code
1 anxious 1
1 dizzy 2
1 relaxed 3
3 dizzy 2
7 nauseous 4
7 anxious 1
I know I can do:
mutate(rn = str_c('side_effect_', row_number())) %>%
pivot_wider(names_from = rn, values_from = Side_Effect)
In order to create new column names and put each side effect into a new column like this:
session Number_Code side_effect1 side effect_2 side_effect_3
1 1 anxious NA NA
1 2 NA dizzy NA
1 3 NA NA relaxed
3 2 dizzy NA NA
7 4 nauseous NA NA
7 1 NA anxious NA
But I need to widen the data based on both "Side_Effect" and "Number_Code", and have them in alternating columns like this:
session side_effect1 number_code1 side effect_2 number_code2 side_effect_3 number_code3
1 anxious 1 dizzy 2 relaxed 3
3 dizzy 2 NA NA NA NA
7 nauseous 4 anxious 1 NA NA
I saw another post where they widened the data based on two variables, but all of the columns for the second one were after all of the columns of the first one. Is there a way to get them to alternate like this? Thank you!!
pivot_wider can take multiple values_from columns, so after creating a sequence by group, call pivot_wider with values_from specifying the columns of interest
library(dplyr)
library(tidyr)
df1 %>%
group_by(Session_ID) %>%
mutate(rn = row_number()) %>%
ungroup %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code))
# A tibble: 3 x 7
# Session_ID Side_Effect_1 Side_Effect_2 Side_Effect_3 Number_Code_1 Number_Code_2 Number_Code_3
# <int> <chr> <chr> <chr> <int> <int> <int>
#1 1 anxious dizzy relaxed 1 2 3
#2 3 dizzy <NA> <NA> 2 NA NA
#3 7 nauseous anxious <NA> 4 1 NA
If we need to reorder the columns, we can select them ordered by the numeric part of their names
df1 %>%
group_by(Session_ID) %>%
mutate(rn = row_number()) %>%
ungroup %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code)) %>%
select(Session_ID, names(.)[-1][order(readr::parse_number(names(.)[-1]))] )
# A tibble: 3 x 7
# Session_ID Side_Effect_1 Number_Code_1 Side_Effect_2 Number_Code_2 Side_Effect_3 Number_Code_3
# <int> <chr> <int> <chr> <int> <chr> <int>
#1 1 anxious 1 dizzy 2 relaxed 3
#2 3 dizzy 2 <NA> NA <NA> NA
#3 7 nauseous 4 anxious 1 <NA> NA
data
df1 <- structure(list(Session_ID = c(1L, 1L, 1L, 3L, 7L, 7L),
Side_Effect = c("anxious",
"dizzy", "relaxed", "dizzy", "nauseous", "anxious"), Number_Code = c(1L,
2L, 3L, 2L, 4L, 1L)), class = "data.frame", row.names = c(NA,
-6L))
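The reordering step relies on order() leaving ties in their original order, so names with the same numeric suffix stay in the Side_Effect, Number_Code sequence. A small base R sketch of just that step, using the column names from the output above and sub() in place of readr::parse_number:

```r
nms <- c("Side_Effect_1", "Side_Effect_2", "Side_Effect_3",
         "Number_Code_1", "Number_Code_2", "Number_Code_3")

# Extract the trailing number from each column name
suffix <- as.integer(sub(".*_", "", nms))

# Order by suffix; ties keep their original relative order
reordered <- nms[order(suffix)]
reordered
```

The result alternates Side_Effect and Number_Code for each suffix, which is the column order requested in the question.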
I think this is best achieved via the pivot_*_spec() interface which allows the building of a specification data frame. This data frame determines both the names and the variable order of the pivoted data.
library(tidyr)
library(dplyr)
df <- df %>%
group_by(Session_ID) %>%
mutate(row_id = factor(row_number(), labels = c("first", "next", "last")[1:max(row_number())])) %>%
ungroup()
spec <- df %>%
build_wider_spec(names_from = row_id, values_from = c(Side_Effect, Number_Code))
spec
# A tibble: 6 x 3
.name .value row_id
<chr> <chr> <fct>
1 Side_Effect_first Side_Effect first
2 Side_Effect_next Side_Effect next
3 Side_Effect_last Side_Effect last
4 Number_Code_first Number_Code first
5 Number_Code_next Number_Code next
6 Number_Code_last Number_Code last
Because the column order of the pivot is determined by the specification data row order, arrange() can be used to flexibly control the final order of the pivot (where factors can be used, as in the data above, to fine tune the order of text variable names). Some examples:
# Alternating by row id
spec %>%
arrange(row_id) %>%
pivot_wider_spec(df, .)
# A tibble: 3 x 7
Session_ID Side_Effect_first Number_Code_first Side_Effect_next Number_Code_next Side_Effect_last Number_Code_last
<int> <chr> <int> <chr> <int> <chr> <int>
1 1 anxious 1 dizzy 2 relaxed 3
2 3 dizzy 2 NA NA NA NA
3 7 nauseous 4 anxious 1 NA NA
# Alternate by row_id and .value in ascending order
spec %>%
arrange(row_id, .value) %>%
pivot_wider_spec(df, .)
# A tibble: 3 x 7
Session_ID Number_Code_first Side_Effect_first Number_Code_next Side_Effect_next Number_Code_last Side_Effect_last
<int> <int> <chr> <int> <chr> <int> <chr>
1 1 1 anxious 2 dizzy 3 relaxed
2 3 2 dizzy NA NA NA NA
3 7 4 nauseous 1 anxious NA NA
# .value ascending row_id descending
spec %>%
arrange(.value, desc(row_id)) %>%
pivot_wider_spec(df, .)
# A tibble: 3 x 7
Session_ID Number_Code_last Number_Code_next Number_Code_first Side_Effect_last Side_Effect_next Side_Effect_first
<int> <int> <int> <int> <chr> <chr> <chr>
1 1 3 2 1 relaxed dizzy anxious
2 3 NA NA 2 NA NA dizzy
3 7 NA 1 4 NA anxious nauseous
Why not do it just based on position?
library(tidyverse)
# Data
d <- structure(list(Session_ID = c(1, 1, 1, 3, 7, 7), Side_Effect = c("anxious",
"dizzy", "relaxed", "dizzy", "nauseous", "anxious"), Number_Code = c(1,
2, 3, 2, 4, 1)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
Session_ID = structure(list(), class = c("collector_double",
"collector")), Side_Effect = structure(list(), class = c("collector_character",
"collector")), Number_Code = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 2L), class = "col_spec"))
# Solution
d %>%
group_by(Session_ID) %>%
mutate(rn = LETTERS[row_number()]) %>%
ungroup() %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code)) %>%
select(1, as.vector(t(matrix(2:length(.), ncol = 2))))
#> # A tibble: 3 × 7
#> Session_ID Side_Effect_A Number_Code_A Side_Effect_B Number_Code_B
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 anxious 1 dizzy 2
#> 2 3 dizzy 2 <NA> NA
#> 3 7 nauseous 4 anxious 1
#> # … with 2 more variables: Side_Effect_C <chr>, Number_Code_C <dbl>
This needs adjusting, though, if there is more than one ID column (2 in the following example):
select(1:2, as.vector(t(matrix(3:length(.), ncol = 2))))
Also, if there are more values_from variables, then the ncol-argument needs to be adjusted (3 in the following example):
select(1, as.vector(t(matrix(2:length(.), ncol = 3))))
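To see why the matrix transposition interleaves the columns, here is the index arithmetic on its own in base R. The variable names are illustrative; the sketch assumes one ID column, two values_from variables, and three positions per ID, as in the example.

```r
n_values <- 2L  # Side_Effect and Number_Code
n_slots <- 3L   # positions A, B, C

# Column positions after the ID column: 2..7
idx <- 2:(1L + n_values * n_slots)

# Filled column-wise: first column 2,3,4 (Side_Effect), second 5,6,7 (Number_Code)
m <- matrix(idx, ncol = n_values)

# Transposing and flattening reads one position at a time across both variables
interleaved <- as.vector(t(m))
interleaved

nms <- c("Session_ID",
         "Side_Effect_A", "Side_Effect_B", "Side_Effect_C",
         "Number_Code_A", "Number_Code_B", "Number_Code_C")
reordered <- nms[c(1L, interleaved)]
reordered
```

Each row of the matrix holds the positions of one A/B/C slot across the value variables, so reading row by row yields the alternating order.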
I think the following base R code with reshape can help
reshape(
transform(
df,
rid = ave(Session_ID, Session_ID, FUN = seq_along)
),
direction = "wide",
idvar = "Session_ID",
timevar = "rid"
)
which gives
Session_ID Side_Effect.1 Number_Code.1 Side_Effect.2 Number_Code.2
1 1 anxious 1 dizzy 2
4 3 dizzy 2 <NA> NA
5 7 nauseous 4 anxious 1
Side_Effect.3 Number_Code.3
1 relaxed 3
4 <NA> NA
5 <NA> NA
Yes, it's possible, but row_number() is numeric by definition, so I think @akrun's answer is the best approach. Having said that, here is a potential solution for names that are 'both text':
library(tidyverse)
df1 <- data.frame(
stringsAsFactors = FALSE,
Session_ID = c(1L, 1L, 1L, 3L, 7L, 7L),
Side_Effect = c("anxious","dizzy","relaxed",
"dizzy","nauseous","anxious"),
Number_Code = c(1L, 2L, 3L, 2L, 4L, 1L)
)
df2 <- df1 %>%
group_by(Session_ID) %>%
mutate(rn = LETTERS[row_number()]) %>%
ungroup() %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code)) %>%
select(Session_ID, names(.)[-c(1)][as.integer(rbind(seq_along(names(.)[-c(1)]), seq_along(names(.)[-c(1)]) + ceiling(length(names(.)[-c(1)])/2)))[seq_along(names(.)[-c(1)])]])
df2
#> # A tibble: 3 × 7
#> Session_ID Side_Effect_A Number_Code_A Side_Effect_B Number_Code_B
#> <int> <chr> <int> <chr> <int>
#> 1 1 anxious 1 dizzy 2
#> 2 3 dizzy 2 <NA> NA
#> 3 7 nauseous 4 anxious 1
#> # … with 2 more variables: Side_Effect_C <chr>, Number_Code_C <int>
# This can be simplified with a function, e.g.
ordering_func <- function(indices){
as.integer(rbind(indices, indices + ceiling(length(indices)/2))[indices])
}
df1 %>%
group_by(Session_ID) %>%
mutate(rn = LETTERS[row_number()]) %>%
ungroup() %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code)) %>%
select(Session_ID, names(.)[-c(1)][ordering_func(seq_along(names(.)[-c(1)]))])
#> # A tibble: 3 × 7
#> Session_ID Side_Effect_A Number_Code_A Side_Effect_B Number_Code_B Side_Effect_C Number_Code_C
#> <int> <chr> <int> <chr> <int> <chr> <int>
#> 1 1 anxious 1 dizzy 2 relaxed 3
#> 2 3 dizzy 2 NA NA NA NA
#> 3 7 nauseous 4 anxious 1 NA NA
Created on 2021-09-02 by the reprex package (v2.0.1)
Edit
You can simplify it further using:
ordering_func <- function(indices){
as.integer(rbind(indices, indices + ceiling(length(indices)/2))[indices])
}
df1 %>%
group_by(Session_ID) %>%
mutate(rn = LETTERS[row_number()]) %>%
ungroup() %>%
pivot_wider(names_from = rn, values_from = c(Side_Effect, Number_Code)) %>%
select(names(.)[ordering_func(seq_along(names(.)))])
#> # A tibble: 3 × 7
#> Session_ID Number_Code_A Side_Effect_A Number_Code_B Side_Effect_B Number_Code_C Side_Effect_C
#> <int> <int> <chr> <int> <chr> <int> <chr>
#> 1 1 1 anxious 2 dizzy 3 relaxed
#> 2 3 2 dizzy NA NA NA NA
#> 3 7 4 nauseous 1 anxious NA NA
(N.B. this approach puts Number_Code_A before Side_Effect_A: this isn't the right order in the original question, but it may not matter depending on the use-case)
