Creating strings from dataframe - r

My dataframe
x1 <- data.frame(C1 = letters[1:4], C2 = 1:4, C3 = letters[11:14])
I need a list where each list element holds two values from a row:
x2 <- list(c("a", "1"), c("b", "2"), c("c", "3"), c("d", "4"))
Basically, the two values from each row need to form one list element so that I can process them later on.
I tried
lapply(X = x2, MARGIN = 1, FUN = paste, collapse = "")
But that did not give me the desired output!

Is this what you want?
paste0(x1[,1], x1[,2])
# [1] "a1" "b2" "c3" "d4"
How about:
as.list(paste0(x1[,1], x1[,2]))
# [[1]]
# [1] "a1"
#
# [[2]]
# [1] "b2"
#
# [[3]]
# [1] "c3"
#
# [[4]]
# [1] "d4"
It doesn't matter how many rows you have. You just need to specify the columns you want pasted into a string.

Here is a method using lapply:
lapply(1:nrow(x1), function(i) c(x1[i,1], x1[i,2]))
The result is
[[1]]
[1] "a" "1"
[[2]]
[1] "b" "2"
[[3]]
[1] "c" "3"
[[4]]
[1] "d" "4"
data
x1 <- data.frame(C1 = letters[1:4], C2 = 1:4, C3 = letters[11:14],
stringsAsFactors = FALSE)
Note that I used the stringsAsFactors = FALSE argument to construct the data. Without it (in R versions before 4.0.0, where the default was TRUE), C1 and C3 would be factors, so I'd have to wrap x1[i, 1] in as.character.
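As a variation on the lapply approach, here is a minimal sketch using Map, which walks the two columns in parallel. The column name C2 is an assumption (the question's second column appears to be mislabelled), and as.character() is applied explicitly so the result does not depend on whether the columns are factors.

```r
# Example data; the name C2 for the integer column is an assumption.
x1 <- data.frame(C1 = letters[1:4], C2 = 1:4, C3 = letters[11:14],
                 stringsAsFactors = FALSE)

# Map() calls the function once per row position, pairing C1[i] with C2[i].
# unname() drops the names that Map() derives from a character first argument.
pairs <- unname(Map(function(a, b) c(as.character(a), as.character(b)),
                    x1$C1, x1$C2))
str(pairs)
```

Each element of pairs is a character vector of length two, which matches the x2 structure asked for in the question.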

If there are multiple columns, we can use do.call
as.list(do.call(paste0, x1[-3]))
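To make the do.call idea concrete, the sketch below selects columns by name and passes a separator through to paste. The column names and the "_" separator are illustrative assumptions, not part of the original answer.

```r
x1 <- data.frame(C1 = letters[1:4], C2 = 1:4, C3 = letters[11:14],
                 stringsAsFactors = FALSE)

cols <- c("C1", "C2")   # columns to combine, in order

# c() appends sep to the list of columns, so do.call() effectively runs
# paste(x1$C1, x1$C2, sep = "_") without spelling out each column.
joined <- do.call(paste, c(x1[cols], sep = "_"))
as.list(joined)
```

Changing cols is enough to include more columns; nothing else in the call depends on how many there are.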

Related

Names of nested list containing dots (e.g. "c.2")

How can I get the names of the leaves of a nested list (containing a data frame)
p <- list(a=1,b=list(b1=2,b2=3),c=list(c1=list(c11='a',c12='x'),c.2=data.frame("t"=1)))
into a vector format:
[[1]]
[1] "a"
[[2]]
[1] "b" "b1"
[[3]]
[1] "b" "b2"
[[4]]
[1] "c" "c1" "c11"
[[5]]
[1] "c" "c1" "c12"
[[6]]
[1] "c" "c.2"
The problem is that my list contains names with a dot (e.g. "c.2"). With unlist, one gets "c.c.2", and neither I nor strsplit can tell whether a dot is a delimiter added by unlist or part of a name. That is how it differs from this question.
It should ignore data.frames. My approach so far is adapted from here, but it struggles with the dots created by unlist:
listNames = function(l, maxDepth = 2) {
n = 0
listNames_rec = function(l, n) {
if(!is.list(l) | is.data.frame(l) | n>=maxDepth) TRUE
else {
n = n + 1
# print(n)
lapply(l, listNames_rec, n)
}
}
n = names(unlist(listNames_rec(l, n)))
return(n)
}
listNames(p, maxDepth = 3)
[1] "a" "b.b1" "b.b2" "c.c1.c11" "c.c1.c12" "c.c.2"
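The ambiguity can be reproduced in isolation. In this small sketch (both lists are made up for illustration), two differently nested lists yield the same flattened name, so the original nesting cannot be recovered by splitting on the dot.

```r
nested_a <- list(c = list(c.2 = 1))      # name "c.2" inside "c"
nested_b <- list(c.c = list(`2` = 1))    # name "2" inside "c.c"

names(unlist(nested_a))
names(unlist(nested_b))

# Both calls give the same name, so the dot alone is ambiguous.
identical(names(unlist(nested_a)), names(unlist(nested_b)))
```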
Like this?
subnames <- function(L, s) {
if (!is.list(L) || is.data.frame(L)) return(L)
names(L) <- gsub(".", s, names(L), fixed = TRUE)
lapply(L, subnames, s)
}
res <- listNames(subnames(p, ":"), maxDepth = 3)
gsub(":", ".",
gsub(".", "$", res, fixed = TRUE),
fixed = TRUE
)
#[1] "a" "b$b1" "b$b2" "c$c1$c11" "c$c1$c12" "c$c.2"
Not a full answer, but I imagine the rrapply package could help you here.
One option could be to extract all names:
library(rrapply)
library(dplyr)
rrapply(p, how = "melt") %>%
select(-value)
# L1 L2 L3
# 1 a <NA> <NA>
# 2 b b1 <NA>
# 3 b b2 <NA>
# 4 c c1 c11
# 5 c c1 c12
# 6 c c.2 t
The problem here is that the column names inside the data.frame (t above) are included too, so you could extract the data.frame names separately:
#extract data frame name
rrapply(p, classes = "data.frame", how = "melt") %>%
select(-value)
# L1 L2
# 1 c c.2
Then you could combine these two results and filter them, for example removing duplicates while keeping the data frame names:
rrapply(p, how = "melt") %>%
bind_rows(rrapply(p, classes = "data.frame", how = "melt"))
#then filter etc...
A way might be:
listNames = function(l, n, N) {
if(!is.list(l) | is.data.frame(l) | n<1) list(rev(N))
else unlist(Map(listNames, l, n=n-1, N=lapply(names(l), c, N)), FALSE, FALSE)
}
listNames(p, 3, NULL)
#[[1]]
#[1] "a"
#
#[[2]]
#[1] "b" "b1"
#
#[[3]]
#[1] "b" "b2"
#
#[[4]]
#[1] "c" "c1" "c11"
#
#[[5]]
#[1] "c" "c1" "c12"
#
#[[6]]
#[1] "c" "c.2"
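Once the names are available as path vectors, they can be used directly for extraction, because [[ with a character vector of length greater than one indexes recursively. A short sketch, with the paths written out by hand from the output above:

```r
p <- list(a = 1, b = list(b1 = 2, b2 = 3),
          c = list(c1 = list(c11 = "a", c12 = "x"), c.2 = data.frame(t = 1)))

paths <- list("a", c("b", "b1"), c("b", "b2"),
              c("c", "c1", "c11"), c("c", "c1", "c12"), c("c", "c.2"))

# p[[c("c", "c1", "c11")]] is equivalent to p[["c"]][["c1"]][["c11"]].
leaves <- lapply(paths, function(path) p[[path]])
str(leaves)
```

Because each path is a vector rather than a dot-joined string, the name "c.2" causes no ambiguity here.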

Combining lists, possibly with mapply

I have a list of lists - a simple example is given below:
my_list <- vector(mode = "list", length = 4)
my_list[[1]] <- c(1, 2, 3)
my_list[[2]] <- c(1, 2, 6)
my_list[[3]] <- c("A")
my_list[[4]] <- c("A", "B")
I would like to combine a subset of these lists based on their indices in a vector. For example, if
my_indices <- c(1,2,3), I would like to combine the first three lists and eliminate duplicates to get
c(1, 2, 3, 6, "A")
I can do this manually as follows:
c(my_list[[1]], my_list[[2]], my_list[[3]]) %>%
unique()
[1] "1" "2" "3" "6" "A"
but when I try to simplify / generalize this to
my_indices <- c(1, 2, 3)
c(my_list[[my_indices ]]) %>%
unique()
I get an error message:
error in my_list[[my_indices]] : recursive indexing failed at level 2
How can I combine lists in this setting? I do want a general solution, as my list of lists is large, and I want to be able to extract any subset of it. I have seen posts that use mapply in a related setting, but have not managed to get it to work.
Many thanks in advance for your help
Thomas Philips
c(1, 2, 3, 6, "A") is not what you think: it will be coerced to c("1", "2", "3", "6", "A"). If you want mixed classes, you cannot unlist; the result must stay a list.
Some thoughts:
my_list[my_indices]
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 1 2 6
# [[3]]
# [1] "A"
unlist(my_list[my_indices])
# [1] "1" "2" "3" "1" "2" "6" "A"
unique(unlist(my_list[my_indices]))
# [1] "1" "2" "3" "6" "A"
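The error in the question arises because [[ with a vector of indices recurses into the list rather than selecting several elements, whereas single brackets select a sub-list. A minimal sketch that wraps the selection in a helper (the name combine_unique is hypothetical):

```r
my_list <- list(c(1, 2, 3), c(1, 2, 6), "A", c("A", "B"))

# [[ with a vector recurses: element 1, then element 2 within it.
my_list[[c(1, 2)]]

# Hypothetical helper: select with [ ], flatten, and drop duplicates.
# Mixed numeric and character input is coerced to character by unlist().
combine_unique <- function(lst, idx) unique(unlist(lst[idx]))

combine_unique(my_list, c(1, 2, 3))
combine_unique(my_list, c(3, 4))
```

Any index vector can be passed as idx, which covers the "any subset" requirement without mapply.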
To preserve class and ensure uniqueness, you can do
func <- function(a, b) {
a_chrs <- as.character(a)
b_chrs <- as.character(b)
b[ match(setdiff(b_chrs, a_chrs), b_chrs) ]
}
Reduce(func, my_list[my_indices], accumulate = TRUE)
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 6
# [[3]]
# [1] "A"
The _chrs fancy footwork is because setdiff by itself will not reduce correctly:
out <- Reduce(setdiff, my_list[my_indices], accumulate = TRUE)
out
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 3
# [[3]]
# [1] 3
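If you need that with individually-indexable values, apply the following to the Reduce(func, ...) result (not to out, which holds the setdiff result):
res <- Reduce(func, my_list[my_indices], accumulate = TRUE)
unlist(lapply(res, as.list), recursive = FALSE)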
# [[1]]
# [1] 1
# [[2]]
# [1] 2
# [[3]]
# [1] 3
# [[4]]
# [1] 6
# [[5]]
# [1] "A"
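If the goal is a de-duplicated list whose elements keep their original types, one alternative sketch is to flatten to single values first and then drop duplicates by their character representation. This is a different technique from the Reduce approach above, and it assumes that values such as 1 and "1" should count as the same.

```r
my_list <- list(c(1, 2, 3), c(1, 2, 6), "A", c("A", "B"))
my_indices <- c(1, 2, 3)

# One list element per value, so numbers stay numeric and strings stay character.
flat <- unlist(lapply(my_list[my_indices], as.list), recursive = FALSE)

# Compare values by their character form to find duplicates across types.
keys <- vapply(flat, as.character, character(1))
uniq <- flat[!duplicated(keys)]
str(uniq)
```

Unlike a pairwise Reduce, duplicated() checks each value against everything seen so far, so a value repeated in a later, non-adjacent vector is dropped as well.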
Here's a tidyverse solution using reduce.
library(tidyverse)
my_list <- vector(mode = "list", length = 4)
my_list[[1]] <- c(1, 2, 3)
my_list[[2]] <- c(1, 2, 6)
my_list[[3]] <- c("A")
my_list[[4]] <- c("A", "B")
to_merge <- c(1,2,3)
unique(reduce(my_list[to_merge], c))
#> [1] "1" "2" "3" "6" "A"
Created on 2021-01-08 by the reprex package (v0.3.0)

Indexing named list with vector in R

How would you index the second element of a vector which is stored as a value in a named list?
I start with this:
hi <- list("1" = c("a","b"),
"2" = c("dog","cat"),
"3" = c("sister","brother")
)
and would like to end up with a named list with the key plus the 2nd element of the vector, i.e.:
list("1" = "b",
"2" = "cat",
"3" = "brother"
)
You can do:
lapply(hi, `[`, 2)
$`1`
[1] "b"
$`2`
[1] "cat"
$`3`
[1] "brother"
We can use map
library(purrr)
map(hi, pluck, 2)
#$`1`
#[1] "b"
#$`2`
#[1] "cat"
#$`3`
#[1] "brother"
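When every vector is known to be character, vapply is a type-checked alternative that returns a named character vector instead of a list. A small sketch; wrap the result in as.list() if a list is required:

```r
hi <- list("1" = c("a", "b"),
           "2" = c("dog", "cat"),
           "3" = c("sister", "brother"))

# character(1) makes vapply fail loudly if any result is not a single string.
# A vector with fewer than two elements would yield NA here.
second <- vapply(hi, function(v) v[2], character(1))
second
as.list(second)
```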

Creating several new vectors from an original vector with separators

I'm trying to create several vectors from an original vector.
I read some posts but couldn't find something to solve my problem.
My original vector is looking like this:
> orig_vec
[1] "A" "B" "C" "D;" "1" "2;" "a1" "a2" "a3"
I want vectors that look like this:
> vector1
[1] "A" "B" "C" "D"
> vector2
[1] "1" "2"
> vector3
[1] "a1" "a2" "a3"
So what I need is code that recognizes the semicolons as separators and creates new vectors depending on the number of separated groups in "orig_vec".
I also have the problem that the "orig_vec" can change.
When it looks like this:
> orig_vec
[1] "A" "B" "C" "D" "E;" "1" "2;" "a1" "a2" "a3;" "b1"
I need to automatically get these vectors:
> vector1
[1] "A" "B" "C" "D" "E"
> vector2
[1] "1" "2"
> vector3
[1] "a1" "a2" "a3"
> vector4
[1] "b1"
I'm sorry that I can't provide more code or any idea of a solution.
This should work:
x <- c("A", "B", "C", "D;", "1", "2;", "a1", "a2", "a3")
sapply(split(x, c(0, cumsum(grepl(";", x))[-length(x)])), function(x) gsub(";", "", x))
$`0`
[1] "A" "B" "C" "D"
$`1`
[1] "1" "2"
$`2`
[1] "a1" "a2" "a3"
We use the cumsum() of condition grepl(";", x) to create a vector for subsetting with split(), then remove the semicolons by sapply()ing gsub().
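To see why this works, it helps to print the intermediate grouping vector. The sketch below breaks the one-liner into steps; it uses lapply instead of sapply so the result is always a list, even when all groups happen to have the same length.

```r
x <- c("A", "B", "C", "D;", "1", "2;", "a1", "a2", "a3")

ends <- grepl(";", x, fixed = TRUE)    # TRUE where a group ends

# cumsum(ends) increases at each terminator; shifting it right by one
# keeps the terminator itself in the group it closes.
grp <- c(0, cumsum(ends)[-length(x)])
grp

parts <- lapply(split(x, grp), function(v) sub(";", "", v, fixed = TRUE))
parts
```

The same code handles the longer orig_vec from the question, since the number of groups is derived from the number of semicolons.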
I like @LAP's as well, here's another option:
vec <- c("A", "B", "C", "D;", "1", "2;", "a1", "a2", "a3;", "b1")
ix <- grep(";", vec)
mapply(function(x, ix1, ix2) x[ix1:ix2],
x = list(sub(";", "", vec)),
ix1 = c(1, ix + 1),
ix2 = c(ix, length(vec)))
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "1" "2"
[[3]]
[1] "a1" "a2" "a3"
[[4]]
[1] "b1"
You'll notice most people are giving you answers that result in a list of vectors, rather than a handful of vectors assigned to variable names. It's generally much cleaner and easier to work with lists of objects rather than objects scattered around in your namespace. Just an added $.02.
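If names like vector1 are still wanted, they can be attached to the list elements rather than created as separate variables. A minimal sketch, assuming the split result is already available as a list (written out by hand here):

```r
parts <- list(c("A", "B", "C", "D"), c("1", "2"), c("a1", "a2", "a3"))

# Name the elements vector1, vector2, ... regardless of how many there are.
names(parts) <- paste0("vector", seq_along(parts))

parts$vector2
lengths(parts)
```

This keeps the number of groups flexible while still allowing access by name.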
Here is one way, based on the idea of first joining on a space then successively splitting, first on ; and then on a space:
s <- c("A", "B", "C", "D;", "1" , "2;" ,"a1", "a2", "a3")
s <- paste0(s,collapse = ' ')
s <- unlist(strsplit(s, ';'))
vectors <- lapply(s,function(x) unlist(strsplit(trimws(x),' ')))
> vectors
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "1" "2"
[[3]]
[1] "a1" "a2" "a3"
Just throwing in a tidyverse approach that works in a single pipe.
Similar to other answers, collapse the vector into a single string, then split that string on each ;. I'm using a space as the collapse so I can use str_trim easily later on.
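library(tidyverse)
# x is the second example vector from the question
x <- c("A", "B", "C", "D", "E;", "1", "2;", "a1", "a2", "a3;", "b1")
x %>%
paste(collapse = " ") %>%
strsplit(split = ";", fixed = TRUE)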
#> [[1]]
#> [1] "A B C D E" " 1 2" " a1 a2 a3" " b1"
Since strsplit gives you a list and, at least in this scenario, you're only interested in the first list entry, pull it out with [[ and trim the leading and trailing spaces of those strings. The map gives you a list of vectors of one string each.
x %>%
paste(collapse = " ") %>%
strsplit(split = ";", fixed = TRUE) %>%
`[[`(1) %>%
map(str_trim)
#> [[1]]
#> [1] "A B C D E"
#>
#> [[2]]
#> [1] "1 2"
#>
#> [[3]]
#> [1] "a1 a2 a3"
#>
#> [[4]]
#> [1] "b1"
Then split each vector by the spaces, and flatten into one list of vectors.
All in one pipe:
x %>%
paste(collapse = " ") %>%
strsplit(split = ";", fixed = TRUE) %>%
`[[`(1) %>%
map(str_trim) %>%
map(str_split, " ") %>%
flatten()
#> [[1]]
#> [1] "A" "B" "C" "D" "E"
#>
#> [[2]]
#> [1] "1" "2"
#>
#> [[3]]
#> [1] "a1" "a2" "a3"
#>
#> [[4]]
#> [1] "b1"
Created on 2019-02-13 by the reprex package (v0.2.1)

Create a list containing a variable number of lists

I need to create a list from rows of a dataframe in the following format:
df <- data.frame(y1 = c("a", "d"), y2 = c("b", "e"), y3 = c("c", "f"))
df$y1 <- as.character(df$y1)
df$y2 <- as.character(df$y2)
df$y3 <- as.character(df$y3)
x <- list(
list(y1 = df$y1[1],
y2 = df$y2[1],
y3 = df$y3[1]),
list(y1 = df$y1[2],
y2 = df$y2[2],
y3 = df$y3[2])
)
> x
[[1]]
[[1]]$`y1`
[1] "a"
[[1]]$y2
[1] "b"
[[1]]$y3
[1] "c"
[[2]]
[[2]]$`y1`
[1] "d"
[[2]]$y2
[1] "e"
[[2]]$y3
[1] "f"
This is an example when there are two rows in the dataframe. How can I achieve this when the number of rows in the dataframe is variable? So for every row in the dataframe, there should be a list.
We may also use apply by going over the rows and applying as.list to each:
apply(df, 1, as.list)
[[1]]
[[1]]$y1
[1] "a"
[[1]]$y2
[1] "b"
[[1]]$y3
[1] "c"
[[2]]
[[2]]$y1
[1] "d"
[[2]]$y2
[1] "e"
[[2]]$y3
[1] "f"
We first split every row of the dataframe and then for every row we convert each element into separate list element using as.list
lapply(split(df, 1:nrow(df)), as.list)
#$`1`
#$`1`$y1
#[1] "a"
#$`1`$y2
#[1] "b"
#$`1`$y3
#[1] "c"
#$`2`
#$`2`$y1
#[1] "d"
#$`2`$y2
#[1] "e"
#$`2`$y3
#[1] "f"
We can use transpose from purrr
library(purrr)
transpose(df)
#[[1]]
#[[1]]$y1
#[1] "a"
#[[1]]$y2
#[1] "b"
#[[1]]$y3
#[1] "c"
#[[2]]
#[[2]]$y1
#[1] "d"
#[[2]]$y2
#[1] "e"
#[[2]]$y3
#[1] "f"
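One caveat worth illustrating: apply converts the data frame to a matrix first, so with mixed column types every value becomes a character string. The sketch below uses a hypothetical data frame df2 with an integer column to show the difference, alongside a row-wise lapply that keeps each column's type.

```r
# Hypothetical data with mixed column types.
df2 <- data.frame(id = 1:2, name = c("a", "d"), stringsAsFactors = FALSE)

via_apply <- apply(df2, 1, as.list)
class(via_apply[[1]]$id)    # character, because of the matrix conversion

# Subset one row at a time; as.list() on a one-row data frame keeps types.
via_lapply <- lapply(seq_len(nrow(df2)),
                     function(i) as.list(df2[i, , drop = FALSE]))
class(via_lapply[[1]]$id)   # integer
```

For the all-character df in the question the two approaches give the same values, so this only matters once numeric or other column types are involved.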
