How to concatenate data.frame inside lists by using names?


I have to import over 1,000 Excel files, and each file contains multiple sheets (some sheet names are shared across files and some are not).
Here is a small example:
games <- data.frame(index = c(1,2,3), player = c('John', 'Sam', 'Mary'))
weather <- data.frame(index = c(1,2,3), temperature = c('hot', 'cold', 'rainy'))
cars <- data.frame(index = c(1,2,3), car = c('honda', 'toyota','bmw'))
list1 <- list(games, weather, cars)
names(list1) <- c('games', 'weather', 'cars')
games <- data.frame(index = c(1,2,3), player = c('AA', 'BB', 'CC'))
weather <- data.frame(index = c(1,2,3), temperature = c('cold', 'rainy', 'hot'))
sport <- data.frame(index = c(1,2,3), interest = c('swim', 'soccer', 'rugby'))
list2 <- list(games, weather, sport)
names(list2) <- c('games', 'weather', 'sport')
list3 <- list(games, weather)
names(list3) <- c('games', 'weather')
rm(games, sport, weather, cars) # clean envir from unneeded stuff
I am looking for a way to combine the lists by element name, so that data frames with the same name are row-bound together. I have tried merge() and mapply(), but they did not return what I wanted.
The result I want is as follows:
$`games`
# A tibble: 6 x 2
index player
<dbl> <chr>
1 1 John
2 2 Sam
3 3 Mary
4 1 AA
5 2 BB
6 3 CC
$weather
# A tibble: 6 x 2
index temperature
<dbl> <chr>
1 1 hot
2 2 cold
3 3 rainy
4 1 cold
5 2 rainy
6 3 hot
$cars
# A tibble: 3 x 2
index car
<dbl> <chr>
1 1 honda
2 2 toyota
3 3 bmw
$sport
index interest
1 1 swim
2 2 soccer
3 3 rugby
EDIT: I have also run into the case where a data.frame (sport) exists in list2 but not in list1.

You can use purrr to help manipulate the list. I set stringsAsFactors = FALSE only so that the data frames can be bound without factor-level conflicts; if you already use tibbles (or R 4.0 or later, where this is the default), you won't have the issue.
I create a list of the lists.
transpose() regroups the elements by name. Basically, x[[1]][[2]] is equivalent to transpose(x)[[2]][[1]].
I use map() to iterate through the transposed list and dplyr::bind_rows() to get the combined data frame for each name.
options(stringsAsFactors = FALSE)
games <- data.frame(index = c(1,2,3), player = c('John', 'Sam', 'Mary'))
weather <- data.frame(index = c(1,2,3), temperature = c('hot', 'cold', 'rainy'))
cars <- data.frame(index = c(1,2,3), car = c('honda', 'toyota','bmw'))
list1 <- list(games, weather, cars)
names(list1) <- c('games', 'weather', 'cars')
games <- data.frame(index = c(1,2,3), player = c('AA', 'BB', 'CC'))
weather <- data.frame(index = c(1,2,3), temperature = c('cold', 'rainy', 'hot'))
list2 <- list(games, weather)
names(list2) <- c('games', 'weather')
library(purrr)
list(list1, list2) %>%
# regroup named element together
transpose() %>%
# bind the df together
map(dplyr::bind_rows)
#> $games
#> index player
#> 1 1 John
#> 2 2 Sam
#> 3 3 Mary
#> 4 1 AA
#> 5 2 BB
#> 6 3 CC
#>
#> $weather
#> index temperature
#> 1 1 hot
#> 2 2 cold
#> 3 3 rainy
#> 4 1 cold
#> 5 2 rainy
#> 6 3 hot
#>
#> $cars
#> index car
#> 1 1 honda
#> 2 2 toyota
#> 3 3 bmw
Created on 2018-11-04 by the reprex package (v0.2.1)
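To make the transpose() equivalence mentioned above concrete, here is a minimal sketch on a toy list. The object names x and tx are made up for illustration, and it assumes a purrr version that still provides transpose().

```r
library(purrr)

# Toy list of two lists that share the element names "a" and "b"
x <- list(
  list(a = 1, b = "one"),
  list(a = 2, b = "two")
)

# After transposing, the outer level is indexed by name and the inner level by position
tx <- transpose(x)

identical(x[[1]][[2]], tx[[2]][[1]])  # same element, reached through swapped indices
names(tx)                             # element names taken from the first inner list
```

Because the inner names become the outer names, map() can then visit one name at a time and bind everything stored under it.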
If the first list does not contain all the element names you want, you need to provide the .names argument to transpose(), because by default it uses the names of the first element as the template. See help("transpose", package = "purrr").
Here is an example of that case.
options(stringsAsFactors = FALSE)
games <- data.frame(index = c(1,2,3), player = c('John', 'Sam', 'Mary'))
weather <- data.frame(index = c(1,2,3), temperature = c('hot', 'cold', 'rainy'))
list1 <- list(games = games, weather = weather)
games <- data.frame(index = c(1,2,3), player = c('AA', 'BB', 'CC'))
weather <- data.frame(index = c(1,2,3), temperature = c('cold', 'rainy', 'hot'))
cars <- data.frame(index = c(1,2,3), car = c('honda', 'toyota','bmw'))
list2 <- list(games = games, weather = weather, cars = cars)
library(purrr)
all_list <- list(list1, list2)
all_names <- all_list %>% map(names) %>% reduce(union)
all_list %>%
# regroup named element together
transpose(.names = all_names) %>%
# bind the df together
map(dplyr::bind_rows)
#> $games
#> index player
#> 1 1 John
#> 2 2 Sam
#> 3 3 Mary
#> 4 1 AA
#> 5 2 BB
#> 6 3 CC
#>
#> $weather
#> index temperature
#> 1 1 hot
#> 2 2 cold
#> 3 3 rainy
#> 4 1 cold
#> 5 2 rainy
#> 6 3 hot
#>
#> $cars
#> index car
#> 1 1 honda
#> 2 2 toyota
#> 3 3 bmw
Created on 2018-11-04 by the reprex package (v0.2.1)

There's an easy way with lapply() when you know the lists in advance (here list1, list2 and list3). unique() drops the rows that list3 duplicates from list2.
nms <- unique(unlist(lapply(mget(ls(pattern = "^list")), names))) # all element names
lapply(nms, function(x) unique(rbind(list1[[x]], list2[[x]], list3[[x]])))
Use setNames() and dplyr::as_tibble to get list names and tibbles.
Like so:
setNames(lapply(lapply(nms, function(x) unique(rbind(list1[[x]], list2[[x]], list3[[x]]))),
dplyr::as_tibble), nms)
Yields
$`games`
# A tibble: 6 x 2
index player
* <dbl> <fct>
1 1 John
2 2 Sam
3 3 Mary
4 1 AA
5 2 BB
6 3 CC
$weather
# A tibble: 6 x 2
index temperature
* <dbl> <fct>
1 1 hot
2 2 cold
3 3 rainy
4 1 cold
5 2 rainy
6 3 hot
$cars
# A tibble: 3 x 2
index car
* <dbl> <fct>
1 1 honda
2 2 toyota
3 3 bmw
$sport
# A tibble: 3 x 2
index interest
* <dbl> <fct>
1 1 swim
2 2 soccer
3 3 rugby
However, if the number of lists is unknown, and supposing all your lists live in the global environment with names starting with "list", you could take the following approach.
Lol <- mget(ls(pattern = "^list")) # list of lists
mergeFun <- function(z) {
l1 <- lapply(z,
function(y) lapply(1:length(y), # new column w/ sublist names
function(x) cbind(y[[x]], list=names(y)[x])))
l2 <- unlist(l1, recursive=FALSE) # unnest lists
l3 <- Reduce(function(...) merge(..., all=TRUE), l2) # merge list
l4 <- split(l3, l3$list) # new list of lists by sublist names
l5 <- lapply(l4, function(w)
Filter(function(v) !all(is.na(v)), w[, -2])) # delete NA cols
return(lapply(l5, function(u) `rownames<-`(u, NULL))) # reset row names
}
Do lapply(mergeFun(Lol), dplyr::as_tibble) to obtain tibbles if desired, otherwise just mergeFun(Lol).
Yields
> lapply(mergeFun(Lol), dplyr::as_tibble)
$`games`
# A tibble: 6 x 2
index player
<dbl> <fct>
1 1 John
2 1 AA
3 2 Sam
4 2 BB
5 3 Mary
6 3 CC
$weather
# A tibble: 6 x 2
index temperature
<dbl> <fct>
1 1 cold
2 1 hot
3 2 cold
4 2 rainy
5 3 hot
6 3 rainy
$cars
# A tibble: 3 x 2
index car
<dbl> <fct>
1 1 honda
2 2 toyota
3 3 bmw
$sport
# A tibble: 3 x 2
index interest
<dbl> <fct>
1 1 swim
2 2 soccer
3 3 rugby
Data
list1 <- list(games = structure(list(index = c(1, 2, 3), player = structure(c(1L,
3L, 2L), .Label = c("John", "Mary", "Sam"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)), weather = structure(list(index = c(1, 2, 3), temperature = structure(c(2L,
1L, 3L), .Label = c("cold", "hot", "rainy"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)), cars = structure(list(index = c(1, 2, 3), car = structure(c(2L,
3L, 1L), .Label = c("bmw", "honda", "toyota"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)))
list2 <- list(games = structure(list(index = c(1, 2, 3), player = structure(1:3, .Label = c("AA",
"BB", "CC"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)), weather = structure(list(index = c(1, 2, 3), temperature = structure(c(1L,
3L, 2L), .Label = c("cold", "hot", "rainy"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)), sport = structure(list(index = c(1, 2, 3), interest = structure(3:1, .Label = c("rugby",
"soccer", "swim"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)))
list3 <- list(games = structure(list(index = c(1, 2, 3), player = structure(1:3, .Label = c("AA",
"BB", "CC"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)), weather = structure(list(index = c(1, 2, 3), temperature = structure(c(1L,
3L, 2L), .Label = c("cold", "hot", "rainy"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L)))
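For an unknown number of lists, a simpler alternative to the merge-based function is to collect the union of names and row-bind whatever each list holds under that name. This is only a sketch: combine_by_name is a hypothetical helper, not part of either answer, and it assumes that data frames sharing a name also share their columns.

```r
# Hypothetical helper: row-bind data frames that share a name across
# any number of named lists.
combine_by_name <- function(lists) {
  nms <- unique(unlist(lapply(lists, names)))
  out <- lapply(nms, function(nm) {
    pieces <- lapply(lists, function(l) l[[nm]])  # NULL where a list lacks nm
    pieces <- Filter(Negate(is.null), pieces)     # keep only the lists that have it
    do.call(rbind, pieces)
  })
  setNames(out, nms)
}

# Small illustrative input (character columns, so no factor-level conflicts)
list1 <- list(
  games = data.frame(index = 1:3, player = c("John", "Sam", "Mary"),
                     stringsAsFactors = FALSE),
  cars = data.frame(index = 1:3, car = c("honda", "toyota", "bmw"),
                    stringsAsFactors = FALSE)
)
list2 <- list(
  games = data.frame(index = 1:3, player = c("AA", "BB", "CC"),
                     stringsAsFactors = FALSE),
  sport = data.frame(index = 1:3, interest = c("swim", "soccer", "rugby"),
                     stringsAsFactors = FALSE)
)

combined <- combine_by_name(list(list1, list2))
names(combined)       # "games" "cars" "sport"
nrow(combined$games)  # 6
```

The helper takes a list of lists rather than looking objects up with ls(), so it does not depend on how the lists are named in the global environment. Wrap the rbind result in unique() if duplicated rows should be dropped.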

Related

How to combine two rows of a dataframe into one row

I have a dataframe which looks like this.
Name info.1 info.2
ab a 1
123 a 1
de c 4
456 c 4
fg d 5
789 d 5
The two rows that need to be combined are identical aside from the name column and are together in the dataframe. I want the new dataframe to look like this:
Name ID info.1 info.2
ab 123 a 1
de 456 c 4
fg 789 d 5
I have no clue how to do this and google search hasn't been helpful so far

In base R you could do:
# odd rows hold the name, even rows hold the matching ID
data.frame(Name = df[seq(nrow(df)) %% 2 == 1, 1],
ID = df[seq(nrow(df)) %% 2 == 0, 1],
df[seq(nrow(df)) %% 2 == 0, 2:3])
#> Name ID info.1 info.2
#> 2 ab 123 a 1
#> 4 de 456 c 4
#> 6 fg 789 d 5
Created on 2022-07-20 by the reprex package (v2.0.1)
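The same odd/even idea can be written with explicit index vectors, which some readers find easier to check. This is a self-contained sketch; the names odd, even and res are made up, and it assumes the rows strictly alternate name, ID, name, ID.

```r
df <- data.frame(
  Name = c("ab", "123", "de", "456", "fg", "789"),
  info.1 = c("a", "a", "c", "c", "d", "d"),
  info.2 = c(1, 1, 4, 4, 5, 5),
  stringsAsFactors = FALSE
)

odd <- seq(1, nrow(df), by = 2)  # rows holding the name
even <- odd + 1                  # rows holding the ID

res <- data.frame(
  Name = df$Name[odd],
  ID = df$Name[even],
  df[odd, c("info.1", "info.2")]
)
rownames(res) <- NULL  # drop the inherited row names 1, 3, 5
res
```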

A possible solution:
library(tidyverse)
df %>%
group_by(info.1) %>%
summarise(Name = str_c(Name, collapse = "_"), info.2 = first(info.2)) %>%
separate(Name, into = c("Name", "ID"), convert = TRUE) %>%
relocate(info.1, .before = info.2)
#> # A tibble: 3 × 4
#> Name ID info.1 info.2
#> <chr> <int> <chr> <int>
#> 1 ab 123 a 1
#> 2 de 456 c 4
#> 3 fg 789 d 5

Assuming the Name column is consistently ordered Name-ID-Name-ID then:
library(tidyverse)
data <- tibble(Name = c('ab', 123, 'de', 456, 'fg', 789),
info.1 = c('a', 'a', 'c', 'c', 'd', 'd'),
info.2 = c(1, 1, 4, 4, 5, 5))
# remove the troublesome column and make a tibble
# with the unique combos of info1 and 2
data_2 <- data %>% select(info.1, info.2) %>% distinct()
# add columns for name and ID by skipping every other row in the
# original tibble
data_2$Name <- data$Name[seq(from = 1, to = nrow(data), by = 2)]
data_2$ID <- data$Name[seq(from = 2, to = nrow(data), by = 2)]

We could also use summarise and extract first as name and last as id:
data |>
group_by(info.1, info.2) |>
summarise(name = first(Name), ID = last(Name)) |>
ungroup() #|>
#relocate(3:4,1:2)
Output:
# A tibble: 3 × 4
info.1 info.2 name ID
<chr> <dbl> <chr> <chr>
1 a 1 ab 123
2 c 4 de 456
3 d 5 fg 789

We could also use
library(dplyr)
library(stringr)
data %>%
group_by(across(starts_with('info'))) %>%
mutate(ID = str_subset(Name, "^\\d+$"), .before = 2) %>%
ungroup %>%
filter(str_detect(Name, '^\\d+$', negate = TRUE))
-output
# A tibble: 3 × 4
Name ID info.1 info.2
<chr> <chr> <chr> <dbl>
1 ab 123 a 1
2 de 456 c 4
3 fg 789 d 5
data
data <- structure(list(Name = c("ab", "123", "de", "456", "fg", "789"
), info.1 = c("a", "a", "c", "c", "d", "d"), info.2 = c(1, 1,
4, 4, 5, 5)), row.names = c(NA, -6L), class = "data.frame")

how to split a dataframe by specific rows in r

I have a data look like this:
data <- structure(list(A = c("1", "1", "1", "A", "10", "10", "B", "200"), B = c("2", "2", "2", "B", "20", "20", "C", "300"), C = c("3","3", "3", "C", "30", "30", "D", "400"), D = c("4", "4", "4", "D", "40", "40", NA, NA)), row.names = c(NA, -8L), class = c("tbl_df","tbl", "data.frame"))
data
> data
# A tibble: 8 x 4
A B C D
<chr> <chr> <chr> <chr>
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 A B C D
5 10 20 30 40
6 10 20 30 40
7 B C D NA
8 200 300 400 NA
It is the result of a faulty row bind, and I want to split the data into 3 sub data frames (d1, d2 and d3) like this:
NOTE: In my real situation, d1, d2 and d3 have different numbers of rows. I set nrow(d1) = 3, nrow(d2) = 2 and nrow(d3) = 1 just to simplify the question in this example.
d1 <- data.frame(A = rep(1,3), B = rep(2,3), C = rep(3,3), D = rep(4,3))
d2 <- data.frame(A = rep(10,2), B = rep(20,2), C = rep(30,2), D = rep(40,2))
d3 <- data.frame( B = 200, C = 300, D = 400)
> d1
A B C D
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
> d2
A B C D
1 10 20 30 40
2 10 20 30 40
> d3
B C D
1 200 300 400
And then I could bind them correctly using bind_rows from dplyr
bind_rows(d1, d2, d3) %>% as_tibble()
# A tibble: 6 x 4
A B C D
<dbl> <dbl> <dbl> <dbl>
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 10 20 30 40
5 10 20 30 40
6 NA 200 300 400
The problem is that I don't know how to get d1, d2 and d3 from data.
Any help will be highly appreciated!

Here is a tidyverse solution.
process_df takes a data frame and sets the column names and removes the first row.
process_df <- function(df, ...) {
df %>%
set_names(slice(., 1)) %>%
select(which(!is.na(names(.)))) %>%
slice(-1)
}
Add a header row that just contains the column names.
Use rowwise() and c_across() to get the values of all columns by row. Use this to identify which rows are header rows.
group_map will apply a function over each group and bind_rows will combine the results.
data %>%
add_row(!!!set_names(names(.)), .before = 1) %>%
rowwise() %>%
mutate(
group = all(is.na(c_across()) | c_across() %in% names(.))
) %>%
ungroup() %>%
mutate(group = cumsum(group)) %>%
group_by(group) %>%
group_map(process_df) %>%
bind_rows()
#> # A tibble: 6 x 4
#> A B C D
#> <chr> <chr> <chr> <chr>
#> 1 1 2 3 4
#> 2 1 2 3 4
#> 3 1 2 3 4
#> 4 10 20 30 40
#> 5 10 20 30 40
#> 6 NA 200 300 400
Explanation of the usage of !!! in add_row
set_names(names(.)) creates a named vector that represents the row we want to add. However, add_row doesn't accept a named vector; it wants the values to be specified as named arguments.
Here is a simplified example using the built-in cars dataset.
new_row <- c(speed = 1, dist = 2)
add_row doesn't accept a named vector, so this doesn't work.
cars %>% add_row(new_row, .before = 1)
# (Error)
!!! will unpack the vector as arguments to the function.
cars %>% add_row(!!!new_row, .before = 1)
# (Works)
!!! above essentially results in this:
cars %>% add_row(speed = 1, dist = 2, .before = 1)
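Going back to the header-detection step of this answer, the same logic can be sketched in base R: prepend the original column names as a row, flag every row whose non-missing values are all column names, and split on the running count of those flags. The names full, is_header, blocks and parts are made up for this sketch, and it assumes no genuine data row consists only of column names.

```r
data <- data.frame(
  A = c("1", "1", "1", "A", "10", "10", "B", "200"),
  B = c("2", "2", "2", "B", "20", "20", "C", "300"),
  C = c("3", "3", "3", "C", "30", "30", "D", "400"),
  D = c("4", "4", "4", "D", "40", "40", NA, NA),
  stringsAsFactors = FALSE
)

# Put the original header back as the first data row
full <- rbind(names(data), data)

# A header row contains only column names (or NA padding)
is_header <- apply(full, 1, function(r) all(is.na(r) | r %in% names(data)))

# Each header row starts a new block
blocks <- split(full, cumsum(is_header))

parts <- lapply(blocks, function(b) {
  hdr <- unlist(b[1, ], use.names = FALSE)
  keep <- !is.na(hdr)                # drop the padding columns
  out <- b[-1, keep, drop = FALSE]   # remove the header row itself
  names(out) <- hdr[keep]
  rownames(out) <- NULL
  type.convert(out, as.is = TRUE)    # turn "10" into 10 and so on
})
names(parts) <- paste0("d", seq_along(parts))
parts
```

The resulting list holds d1, d2 and d3, which can then be passed to dplyr::bind_rows() as in the question.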

Does this work:
data
# A tibble: 5 x 4
A B C D
<chr> <chr> <chr> <chr>
1 1 2 3 4
2 A B C D
3 10 20 30 40
4 B C D NA
5 200 300 400 NA
data <- rbind(LETTERS[1:4],data)
data
# A tibble: 6 x 4
A B C D
<chr> <chr> <chr> <chr>
1 A B C D
2 1 2 3 4
3 A B C D
4 10 20 30 40
5 B C D NA
6 200 300 400 NA
split(data, rep(1:ceiling(nrow(data)/2), each = 2))
$`1`
# A tibble: 2 x 4
A B C D
<chr> <chr> <chr> <chr>
1 A B C D
2 1 2 3 4
$`2`
# A tibble: 2 x 4
A B C D
<chr> <chr> <chr> <chr>
1 A B C D
2 10 20 30 40
$`3`
# A tibble: 2 x 4
A B C D
<chr> <chr> <chr> <chr>
1 B C D NA
2 200 300 400 NA

Base R solution:
Map(function(x){setNames(data.frame(t(x[,2, drop = FALSE])), x[,1])[,!is.na(x[,1])]},
split.default(cbind(X0 = names(df), data.frame(t(df))), c(0, seq_len(nrow(df)) %/% 2)))
Including pushing separate data.frames to Global Environment:
list2env(setNames(Map(function(x){setNames(data.frame(t(x[,2, drop = FALSE])), x[,1])[,!is.na(x[,1])]},
split.default(cbind(X0 = names(df), data.frame(t(df))), c(0, seq_len(nrow(df)) %/% 2))),
paste0('d', seq_len(ceiling(nrow(df) / 2)))), .GlobalEnv)
Tidyverse Solution:
library(tidyverse)
df %>%
rbind(names(df), .) %>%
split(cumsum(seq_len(nrow(.)) %% 2)) %>%
Map(function(x){setNames(x[2,], x[1,])[,complete.cases(t(x))]}, .) %>%
set_names(str_c('d', names(.))) %>%
list2env(., .GlobalEnv)
Note solution adjusted to reflect edit to the question:
rdf <- type.convert(data.frame(t(rbind(names(df), df))))
Map(function(x){
y <- setNames(t(x[,-1, drop = FALSE]), x[,1]); y[,!is.na(colSums(y))]
}, split.default(rdf, cumsum(!sapply(rdf, is.integer))))
New solution including push to Global Env:
rdf <- type.convert(data.frame(t(rbind(names(df), df))))
dflist <- Map(function(x) {
y <-
setNames(t(x[, -1, drop = FALSE]), x[, 1])
y[, !is.na(colSums(y))]
}, split.default(rdf, cumsum(!sapply(rdf, is.integer))))
list2env(setNames(dflist, paste0('d', names(dflist))), .GlobalEnv)
Adjusted Tidyverse solution:
df %>%
rbind(names(.), .) %>%
t() %>%
data.frame() %>%
type.convert() %>%
split.default(cumsum(!sapply(., is.integer))) %>%
Map(function(x){
y <- setNames(t(x[,-1, drop = FALSE]), x[,1])
data.frame(y[,!is.na(colSums(y)), drop = FALSE])}, .) %>%
set_names(str_c('d', names(.))) %>%
list2env(., .GlobalEnv)
Data:
df <- structure(list(A = c("1", "A", "10", "B", "200"), B = c("2", "B", "20", "C", "300"), C = c("3", "C", "30", "D", "400"), D = c("4","D", "40", NA, NA)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
Updated Data:
df <- structure(list(A = c("1", "1", "1", "A", "10", "10", "B", "200"), B = c("2", "2", "2", "B", "20", "20", "C", "300"), C = c("3","3", "3", "C", "30", "30", "D", "400"), D = c("4", "4", "4", "D", "40", "40", NA, NA)), row.names = c(NA, -8L), class = c("tbl_df","tbl", "data.frame"))

Merging data frame and filling missing values [duplicate]

This question already has answers here:
Merging a lot of data.frames [duplicate]
(1 answer)
How do I replace NA values with zeros in an R dataframe?
(29 answers)
Closed 2 years ago.
I want to merge the following 3 data frames and fill the missing values with -1. I think I should use the function merge(), but I don't know exactly how to do it.
> df1
Letter Values1
1 A 1
2 B 2
3 C 3
> df2
Letter Values2
1 A 0
2 C 5
3 D 9
> df3
Letter Values3
1 A -1
2 D 5
3 B -1
The desired output would be:
Letter Values1 Values2 Values3
1 A 1 0 -1
2 B 2 -1 -1 # fill missing values with -1
3 C 3 5 -1
4 D -1 9 5
code:
> dput(df1)
structure(list(Letter = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Values1 = c(1, 2, 3)), class = "data.frame", row.names = c(NA,
-3L))
> dput(df2)
structure(list(Letter = structure(1:3, .Label = c("A", "C", "D"
), class = "factor"), Values2 = c(0, 5, 9)), class = "data.frame", row.names = c(NA,
-3L))
> dput(df3)
structure(list(Letter = structure(c(1L, 3L, 2L), .Label = c("A",
"B", "D"), class = "factor"), Values3 = c(-1, 5, -1)), class = "data.frame", row.names = c(NA,
-3L))

You can get data frames in a list and use merge with Reduce. Missing values in the new dataframe can be replaced with -1.
new_df <- Reduce(function(x, y) merge(x, y, all = TRUE), list(df1, df2, df3))
new_df[is.na(new_df)] <- -1
new_df
# Letter Values1 Values2 Values3
#1 A 1 0 -1
#2 B 2 -1 -1
#3 C 3 5 -1
#4 D -1 9 5
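A tidyverse way with the same logic:
library(dplyr)
library(purrr)
library(tidyr) # for replace_na()
list(df1, df2, df3) %>%
reduce(full_join, by = "Letter") %>%
# fill only the numeric value columns; Letter is left untouched
mutate(across(where(is.numeric), ~ replace_na(.x, -1)))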

Here's a dplyr solution:
library(dplyr)
library(tidyr) # for replace_na()
df1 %>%
full_join(df2, by = "Letter") %>%
full_join(df3, by = "Letter") %>%
mutate_if(is.numeric, function(x) replace_na(x, -1))
output:
Letter Values1 Values2 Values3
<chr> <dbl> <dbl> <dbl>
1 A 1 0 -1
2 B 2 -1 -1
3 C 3 5 -1
4 D -1 9 5
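As a self-contained check of the Reduce() plus merge() approach, the sketch below rebuilds the three data frames with character keys and fills only the numeric columns, so the key column is never touched. The names merged and num_cols are made up for illustration.

```r
df1 <- data.frame(Letter = c("A", "B", "C"), Values1 = c(1, 2, 3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Letter = c("A", "C", "D"), Values2 = c(0, 5, 9),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Letter = c("A", "D", "B"), Values3 = c(-1, 5, -1),
                  stringsAsFactors = FALSE)

# Full outer join of all three on the shared key
merged <- Reduce(function(x, y) merge(x, y, by = "Letter", all = TRUE),
                 list(df1, df2, df3))

# Replace NA with -1 in the numeric columns only
num_cols <- vapply(merged, is.numeric, logical(1))
merged[num_cols] <- lapply(merged[num_cols],
                           function(col) replace(col, is.na(col), -1))
merged
```

Naming the key with by = "Letter" makes the join explicit instead of relying on merge() to pick up every shared column name.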

calculate quantile for each group in dataframe and assign NA?

I made up this example to explain my question:
df= structure(list(group = structure(c(1L, 1L, 2L, 2L, 10L, 10L
), .Label = c("Eve", "ba", "De", "De","Mi", "C", "O", "W",
"as", "ras", "Cro", "ics"), class = "factor"), ds = c(8, 8,
1, 4, 4, 6), em = c(1, 3, 8,2, 7, 3)), row.names = c(74567L,
74568L, 74570L, 74576L, 74577L, 74578L), class = "data.frame")
For each group, I need to set to NA all values of em and ds that fall outside the group's 10% and 90% quantiles:
value > 90% quantile -> NA
value < 10% quantile -> NA

Here's a way to do it for each group and each numeric variable using dplyr and ifelse.
Having only a couple of samples per group makes it difficult to interpret the whole concept of quantiles, so the result you get very much depends on how you define a quantile. The type parameter allows you to specify the definition you are using. R defaults to type = 7:
library(dplyr)
df %>%
group_by(group) %>%
mutate(ds = ifelse(ds > quantile(ds, .9) | ds < quantile(ds, .1), NA, ds),
em = ifelse(em > quantile(em, .9) | em < quantile(em, .1), NA, em))
#> # A tibble: 6 x 3
#> # Groups: group [3]
#> group ds em
#> <fct> <dbl> <lgl>
#> 1 Eve 8 NA
#> 2 Eve 8 NA
#> 3 ba NA NA
#> 4 ba NA NA
#> 5 ras NA NA
#> 6 ras NA NA
However, you can change this depending on your definition:
df %>%
group_by(group) %>%
mutate(ds = ifelse(ds > quantile(ds, .9, type = 1) |
ds < quantile(ds, .1, type = 1), NA, ds),
em = ifelse(em > quantile(em, .9, type = 1) |
em < quantile(em, .1, type = 1), NA, em))
#> # A tibble: 6 x 3
#> # Groups: group [3]
#> group ds em
#> <fct> <dbl> <dbl>
#> 1 Eve 8 1
#> 2 Eve 8 3
#> 3 ba 1 8
#> 4 ba 4 2
#> 5 ras 4 7
#> 6 ras 6 3
Created on 2020-05-17 by the reprex package (v0.3.0)
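The same trimming can be sketched in base R with ave(), which applies a function within each group and returns a vector in the original row order. trim_to_na is a hypothetical helper written for this sketch, and the data frame is a simplified copy of the example with a character group column.

```r
df <- data.frame(
  group = c("Eve", "Eve", "ba", "ba", "ras", "ras"),
  ds = c(8, 8, 1, 4, 4, 6),
  em = c(1, 3, 8, 2, 7, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: set values outside the [lower, upper] quantiles to NA
trim_to_na <- function(x, lower = 0.1, upper = 0.9, type = 7) {
  q <- quantile(x, c(lower, upper), type = type, na.rm = TRUE)
  x[x < q[1] | x > q[2]] <- NA
  x
}

# ave() calls the helper once per group
df$ds <- ave(df$ds, df$group, FUN = trim_to_na)
df$em <- ave(df$em, df$group, FUN = trim_to_na)
df
```

With the default type = 7 this reproduces the first result above. To use another quantile definition, pass a wrapper such as FUN = function(x) trim_to_na(x, type = 1).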

How to use column indices to collect values from columns in R

x y z column_indices
6 7 1 1,2
5 4 2 3
1 3 2 1,3
I have the column indices of the values I would like to collect in a separate column like so, what I want to create is something like this:
x y z column_indices values
6 7 1 1,2 6,7
5 4 2 3 2
1 3 2 1,3 1,2
What is the simplest way to do this in R?
Thanks!

In base R, we can use apply, split the column_indices on ',', convert them to integer and get the corresponding value from the row.
df$values <- apply(df, 1, function(x) {
inds <- as.integer(strsplit(x[4], ',')[[1]])
toString(x[inds])
})
df
# x y z column_indices values
#1 6 7 1 1,2 6, 7
#2 5 4 2 3 2
#3 1 3 2 1,3 1, 2
data
df <- structure(list(x = c(6L, 5L, 1L), y = c(7L, 4L, 3L), z = c(1L,
2L, 2L), column_indices = structure(c(1L, 3L, 2L), .Label = c("1,2",
"1,3", "3"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
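apply() first converts the whole data frame to a character matrix, which is fine here but can pad or reformat numbers in other data. A variant that parses the indices once and then indexes the original columns row by row might look like the sketch below. The name idx is made up, and the sketch assumes column_indices is stored as character.

```r
df <- data.frame(
  x = c(6, 5, 1),
  y = c(7, 4, 3),
  z = c(1, 2, 2),
  column_indices = c("1,2", "3", "1,3"),
  stringsAsFactors = FALSE
)

# Parse "1,2" into c(1L, 2L) for every row
idx <- lapply(strsplit(df$column_indices, ",", fixed = TRUE), as.integer)

# For each row, pull the requested columns and paste them together
df$values <- vapply(seq_len(nrow(df)), function(i) {
  paste(unlist(df[i, idx[[i]]]), collapse = ",")
}, character(1))
df
```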

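One solution involving dplyr and tidyr could be (it assumes every row has a distinct column_indices value, since that column is used for grouping):
library(dplyr)
library(tidyr)
df %>%
mutate(column_indices = as.character(column_indices)) %>% # strsplit() needs character, not factor
pivot_longer(-column_indices) %>%
group_by(column_indices) %>%
mutate(values = toString(value[1:n() %in% unlist(strsplit(column_indices, ","))])) %>%
pivot_wider(names_from = "name", values_from = "value")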
column_indices values x y z
<chr> <chr> <int> <int> <int>
1 1,2 6, 7 6 7 1
2 3 2 5 4 2
3 1,3 1, 2 1 3 2
