R: relisting a flat list

This question has a nice solution of flattening lists while preserving their data types (which unlist does not):
flatten = function(x, unlist.vectors=F) {
while(any(vapply(x, is.list, logical(1)))) {
if (! unlist.vectors)
x = lapply(x, function(x) if(is.list(x)) x else list(x))
x = unlist(x, recursive=F)
}
x
}
If I give it the following list, it behaves as expected:
> a = list(c(1,2,3), list(52, 561), "a")
> flatten(a)
[[1]]
[1] 1 2 3
[[2]]
[1] 52
[[3]]
[1] 561
[[4]]
[1] "a"
Now I'd like to restructure the flat list like a. relist fails miserably:
> relist(flatten(a), skeleton=a)
[[1]]
[[1]][[1]]
[1] 1 2 3
[[1]][[2]]
[1] 52
[[1]][[3]]
[1] 561
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[1] "a"
[[2]][[2]]
[[2]][[2]][[1]]
NULL
[[3]]
[[3]][[1]]
NULL
Now, I could of course do relist(unlist(flatten(a)), a), but that loses the data types again. What is a good way to restructure a flat list?
Bonus points if it correctly handles an argument analogous to unlist.vectors.

One way to do it is:
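relist2 = function(x, like, relist.vectors=F) {
# Replace every leaf of the skeleton with a single NA, so that each leaf
# claims exactly one element of the flat list
if (! relist.vectors)
like = rapply(like, function(f) NA, how='replace')
# relist() leaves each leaf wrapped in a list; strip that extra level again
lapply(relist(x, skeleton=like), function(e) unlist(e, recursive=F))
}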
This retains the classes and distinguishes between lists and vectors:
> relist2(flatten(a), like=a)
[[1]]
[1] 1 2 3
[[2]]
[[2]][[1]]
[1] 52
[[2]][[2]]
[1] 561
[[3]]
[1] "a"
> relist2(flatten(a, unlist.vectors=T), like=a, relist.vectors=T)
[[1]]
[1] 1 2 3
[[2]]
[[2]][[1]]
[1] 52
[[2]][[2]]
[1] 561
[[3]]
[1] "a"
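A quick way to check that the restructuring really undoes the flattening is to compare the result with the original list. The sketch below is self-contained and repeats the two functions from above, assuming that rapply() is applied to the like argument rather than to the global a. The name roundtrip is only used for illustration, and the check covers skeletons nested one level deep, like the example.

```r
flatten <- function(x, unlist.vectors = FALSE) {
  while (any(vapply(x, is.list, logical(1)))) {
    if (!unlist.vectors)
      x <- lapply(x, function(x) if (is.list(x)) x else list(x))
    x <- unlist(x, recursive = FALSE)
  }
  x
}

relist2 <- function(x, like, relist.vectors = FALSE) {
  if (!relist.vectors)
    like <- rapply(like, function(f) NA, how = "replace")
  lapply(relist(x, skeleton = like), function(e) unlist(e, recursive = FALSE))
}

a <- list(c(1, 2, 3), list(52, 561), "a")

# Flatten, restructure, and compare with the original
roundtrip <- relist2(flatten(a), like = a)
identical(roundtrip, a)
str(roundtrip)
```

Because relist2 strips exactly one wrapping level per top-level element, deeper skeletons would need a recursive variant.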

Related

How to conditionally convert dates, numbers and factors from a list in R?

I get the following vector in R from a Shiny DT dataframe as input$table_search_columns:
vec = c("2022-08-19 ... 2022-09-09","","","[\"CT101\",\"CT102\"]","","","8.59 ... 76.00","")
I'd like to apply these filter conditions to my dataframe, which contains date, numeric and factor columns.
To do that, I'd like to end up with the following list:
[[1]]
2022-08-19 2022-09-09
[[2]]
[1] NA
[[3]]
[1] NA
[[4]]
[1] "CT101" "CT102"
[[5]]
[1] NA
[[6]]
[1] NA
[[7]]
8.59 76.0
[[8]]
[1] NA
I tried to use the following code:
filter_conditions <- lapply(myff, function(column) {
if (str_detect(column, "\\.\\.\\.")) {
vals <- strsplit(column, " ")
for (i in seq_along(vals)) {
current_vals <- vals[[i]][1]
is.convertible.to.number <- function(x) !is.na(as.numeric(x))
if (is.convertible.to.number(current_vals)) {
vals[[i]][1] = as.numeric(vals[[i]][1])
vals[[i]][3] = as.numeric(vals[[i]][3])
c(vals[[i]][1],vals[[i]][3])
} else {
vals[[i]][1] = as.Date(vals[[i]][1])
vals[[i]][3] = as.Date(vals[[i]][3])
c(vals[[i]][1],vals[[i]][3])
}
}
} else {
if (column == "") {
NA
} else {
vals <- strsplit(column, "\"")
index <- seq(from = 2, to = length(vals[[1]]), by = 2)
as.character(vals[[1]][index])
}
}
})
but I end up with NULL for the numeric and date filters:
[[1]]
NULL
[[2]]
[1] NA
[[3]]
[1] NA
[[4]]
[1] "CT102" "CT104"
[[5]]
[1] NA
[[6]]
[1] NA
[[7]]
NULL
[[8]]
[1] NA
I'd be very grateful if anyone can give me some assistance.
One possible way to solve your problem:
lapply(stringi::stri_extract_all_regex(vec, "[0-9]+\\.[0-9]+|[-A-Z0-9]+"),
\(x) if(length(na.omit(z <- as.Date(x, "%Y-%m-%d")))) z else type.convert(x, as.is=TRUE))
[[1]]
[1] "2022-08-19" "2022-09-09"
[[2]]
[1] NA
[[3]]
[1] NA
[[4]]
[1] "CT101" "CT102"
[[5]]
[1] NA
[[6]]
[1] NA
[[7]]
[1] 8.59 76.00
[[8]]
[1] NA
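As for why the original attempt produced NULL: a for loop in R evaluates to NULL, so the branch whose last expression is the loop hands NULL back to lapply, whatever was computed inside it. The sketch below expresses the same idea as the one-liner above in base R, split into steps. The helper name parse_filter is hypothetical, the token pattern is taken from the answer above, and unlike the one-liner it only returns dates when every token parses as a date. It is a sketch for inputs shaped like vec, not a general parser.

```r
vec <- c("2022-08-19 ... 2022-09-09", "", "", "[\"CT101\",\"CT102\"]",
         "", "", "8.59 ... 76.00", "")

# Hypothetical helper: turn one search string into a typed vector
parse_filter <- function(s) {
  # Pull out decimal numbers, or runs of capitals, digits and hyphens
  pattern <- "[0-9]+\\.[0-9]+|[-A-Z0-9]+"
  tokens <- regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
  if (length(tokens) == 0) return(NA)            # empty filter
  dates <- as.Date(tokens, format = "%Y-%m-%d")  # NA where a token is not a date
  if (!anyNA(dates)) dates else type.convert(tokens, as.is = TRUE)
}

filter_conditions <- lapply(vec, parse_filter)
str(filter_conditions)
```

Returning the value as the last expression of the function, rather than from inside a loop, is what makes lapply collect it.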

Why does `lapply` return the result of an assignment automatically?

q <- lapply(1:3, function(x) x ** 2)
## returns nothing, because it is an assignment
# however, how you explain this?:
> lapply(list(1:3, 4:6, 7:9, 10:11), function(v) q <- lapply(v, function(x) x ** 2))
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 9
[[2]]
[[2]][[1]]
[1] 16
[[2]][[2]]
[1] 25
[[2]][[3]]
[1] 36
[[3]]
[[3]][[1]]
[1] 49
[[3]][[2]]
[1] 64
[[3]][[3]]
[1] 81
[[4]]
[[4]][[1]]
[1] 100
[[4]][[2]]
[1] 121
# while this gives the same but is logical (q is stated as return value).
> lapply(list(1:3, 4:6, 7:9, 10:11), function(v) {q <- lapply(v, function(x) x ** 2);q})
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 9
[[2]]
[[2]][[1]]
[1] 16
[[2]][[2]]
[1] 25
[[2]][[3]]
[1] 36
[[3]]
[[3]][[1]]
[1] 49
[[3]][[2]]
[1] 64
[[3]][[3]]
[1] 81
[[4]]
[[4]][[1]]
[1] 100
[[4]][[2]]
[1] 121
Why, in the second expression, is the value of the assignment returned to the outer lapply and thus collected, even though the inner lapply is only assigned to q and q is not evaluated at the end of the function?
Does anybody have an explanation for this behaviour?
It also works with =
lapply(list(1:3, 4:6, 7:9, 10:11), function(v) q = lapply(c(v), function(x) x ** 2))
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 9
[[2]]
[[2]][[1]]
[1] 16
[[2]][[2]]
[1] 25
[[2]][[3]]
[1] 36
[[3]]
[[3]][[1]]
[1] 49
[[3]][[2]]
[1] 64
[[3]][[3]]
[1] 81
[[4]]
[[4]][[1]]
[1] 100
[[4]][[2]]
[1] 121
The answer lies in the return value of an assignment operation. The assignment operator <- not only writes a value to a variable in the calling environment, it actually invisibly returns the assigned value itself to the caller.
Remember all operations in R are actually functions. When you do
x <- 3
You are actually doing
`<-`(x, 3)
Which not only creates the symbol "x" in the calling environment and assigns the value 3 to that symbol, but invisibly returns the value 3 to the caller. To see this, consider:
y <- 2
y
#> [1] 2
y <- `<-`(x, 3)
y
#> [1] 3
Or equivalently,
y <- (x <- 4)
y
#> [1] 4
And in fact, because <- is right-associative, so that y <- x <- 5 is evaluated as y <- (x <- 5), we can even do:
y <- x <- 5
y
#> [1] 5
Which is a neat way of setting multiple variables to the same value on the same line.
Now consider the lambda function you use inside your lapply:
function(v) q <- lapply(v, function(x) x ** 2)
Look what happens when we consider this function as a stand-alone:
func <- function(v) q <- lapply(v, function(x) x ** 2)
func(1:3)
As predicted, nothing is printed. But what happens when we do:
a <- func(1:3)
If func(1:3) doesn't return anything, then presumably a should be empty now.
But it isn't...
a
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 9
Because the assigned value was returned to the caller invisibly, we were able to assign it to a variable in the calling scope. Therefore, doing
lapply(list(1:3, 4:6, 7:9, 10:11), function(v) q <- lapply(v, function(x) x ** 2))
collects the value that your inner function invisibly returns for each list element into a new list. That list is the return value of the outer lapply itself, so it is returned visibly and printed as normal.
So this is expected behaviour.
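The visibility flag can be inspected directly with withVisible(), which returns both the value and whether it would have been printed. A minimal sketch, reusing func from above (the names res and out are only for illustration):

```r
func <- function(v) q <- lapply(v, function(x) x ** 2)

# The value is there, but flagged as invisible, so the console does not print it
res <- withVisible(func(1:3))
res$visible
#> [1] FALSE
str(res$value)

# Wrapping the call in parentheses makes the value visible again
(func(1:3))

# lapply() only collects the returned values; visibility plays no role there
out <- lapply(list(1:3, 4:6), func)
length(out)
#> [1] 2
```

Visibility only affects printing at the top level; the value itself is returned either way.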

How to unlist nested lists while keeping vectors

I'd like to unlist a nested list in which some items are vectors. The problem is that unlist also splits up these vectors. How can I keep them as single items?
a) one level up (unlist parameter: recursive = F)
b) all levels (unlist parameter: recursive = T)
Here's the example:
list0 <- list(c(1,2),
list(3,
c(4,5)
)
)
> list0
[[1]]
[1] 1 2
[[2]]
[[2]][[1]]
[1] 3
[[2]][[2]]
[1] 4 5
If we unlist one level:
list1 <- unlist(list0, recursive = F)
we get:
> list1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4 5
but, as I'd like to keep vectors as they are, I'd like to get:
[[1]]
[1] 1 2
[[2]]
[1] 3
[[3]]
[1] 4 5
Maybe one way is with a for loop, but I guess that would be slow if the number of lists is high.
Could anyone give me some hints, please?
Thanks in advance
For your example, the code below gives the expected result.
f <- function(x){
if(is.atomic(x)){
list(x)
}else{
x
}
}
unlist(lapply(list0, f), recursive=FALSE)
But perhaps you need something which works with more nested levels, like:
f <- function(x){
if(is.atomic(x)){
list(x)
}else{
x
}
}
g <- function(L){
out <- unlist(lapply(L, f), recursive=FALSE)
while(any(sapply(out, is.list))){
out <- g(out)
}
out
}
list1 <- list(c(1,2),
list(3, c(4,5)),
list(6, list(c(7,8)))
)
list1_flattened <- g(list1)
which gives:
> list1
[[1]]
[1] 1 2
[[2]]
[[2]][[1]]
[1] 3
[[2]][[2]]
[1] 4 5
[[3]]
[[3]][[1]]
[1] 6
[[3]][[2]]
[[3]][[2]][[1]]
[1] 7 8
> list1_flattened
[[1]]
[1] 1 2
[[2]]
[1] 3
[[3]]
[1] 4 5
[[4]]
[1] 6
[[5]]
[1] 7 8
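The same result can also be reached with plain recursion instead of the while loop. This is an alternative sketch, not the approach of the answer above. The name flatten_keep is hypothetical, and an empty input list yields NULL rather than list().

```r
flatten_keep <- function(x) {
  # Anything that is not a list is a leaf: wrap it so c() keeps it as one element
  if (!is.list(x)) return(list(x))
  # Flatten each child, then concatenate the resulting flat lists
  do.call(c, lapply(x, flatten_keep))
}

list1 <- list(c(1, 2),
              list(3, c(4, 5)),
              list(6, list(c(7, 8))))

str(flatten_keep(list1))
```

Each vector is wrapped in a list before concatenation, which is what stops c() from splitting it into single elements.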

remove blanks from strsplit in R

> dc1
V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box
> strsplit(as.character(dc1[,2]),"^\\|")
[[1]]
[1] "" "Box"
[[2]]
[1] "" "Office" "Ball"
[[3]]
[1] "" "Office"
[[4]]
[1] "" "Office"
[[5]]
[1] "" "Box"
[[6]]
[1] "" "Box"
How do I remove the blank ("") from the strsplit results? The result should look like:
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
You can use lapply on your list. I changed the pattern in your strsplit call to match your intended output.
dc1 <- read.table(text = 'V1 V2
1 20140211-0100 |Box
2 20140211-1782 |Office|Ball
3 20140211-1783 |Office
4 20140211-1784 |Office
5 20140221-0756 |Box
6 20140203-0418 |Box', header = TRUE)
out <- strsplit(as.character(dc1[,2]),"\\|")
> lapply(out, function(x){x[!x ==""]})
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
You could use:
library(stringr)
str_extract_all(dc1[,2], "[[:alpha:]]+")
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
I do not have a global solution, but for your example you could try :
strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|")
It removes the first | (this is what the regex "^\\|" says), which is the reason for the "", before performing the split.
In this case, you can just remove the first element of each vector by calling "[" in sapply
> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1)
# [[1]]
# [1] "Box"
# [[2]]
# [1] "Office" "Ball"
# [[3]]
# [1] "Office"
# [[4]]
# [1] "Office"
# [[5]]
# [1] "Box"
# [[6]]
# [1] "Box"
Another method uses nzchar() after unlisting the result of strsplit():
out <- unlist(strsplit(as.character(dc1[,2]),"\\|"))
out[nzchar(x=out)] # removes the extraneous "" marks
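Note that unlist() merges all rows into a single vector, so the grouping by row is lost. If the per-row structure matters, the same nzchar() filter can be applied to each element instead. A minimal sketch, assuming the V2 column is available as a character vector (here a hypothetical v2 holding the first three rows):

```r
v2 <- c("|Box", "|Office|Ball", "|Office")

# fixed = TRUE treats "|" as a literal character rather than a regex
parts <- strsplit(v2, "|", fixed = TRUE)

# Keep only the non-empty pieces of each row
cleaned <- lapply(parts, function(p) p[nzchar(p)])
str(cleaned)
```

Unlike dropping the first element by position, this also works for rows that do not start with the separator.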
library("stringr")
lapply(str_split(dc1$V2, "\\|"), function(x) x[-1])
[[1]]
[1] "Box"
[[2]]
[1] "Office" "Ball"
[[3]]
[1] "Office"
[[4]]
[1] "Office"
[[5]]
[1] "Box"
[[6]]
[1] "Box"
This post is old, but in case this helps someone:
library(magrittr) # provides the %>% pipe
strsplit(as.character(dc1[,2]),"^\\|") %>%
lapply(function(x){paste0(x, collapse="")})

adding a field to each element of a list

I have a list
> (mylist <- list(list(a=1),list(a=2),list(a=3)))
[[1]]
[[1]]$a
[1] 1
[[2]]
[[2]]$a
[1] 2
[[3]]
[[3]]$a
[1] 3
and I want to add a field b, taking its values from 11:13, to each sublist to get something like
> (mylist <- list(list(a=1,b=11),list(a=2,b=12),list(a=3,b=13)))
[[1]]
[[1]]$a
[1] 1
[[1]]$b
[1] 11
[[2]]
[[2]]$a
[1] 2
[[2]]$b
[1] 12
[[3]]
[[3]]$a
[1] 3
[[3]]$b
[1] 13
How do I do this?
(note that I have a large number of such relatively small lists, so this will be called in apply and has to be reasonably fast).
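You can loop over the indices with lapply and return each modified sublist:
mylist <- list(list(a=1),list(a=2),list(a=3))
b.vals <- 11:13
mylist <- lapply(
seq_along(mylist), # safer than 1:length(mylist) when the list is empty
function(x) {
mylist[[x]]$b <- b.vals[[x]] # modifies a local copy inside the function
mylist[[x]] # return the updated sublist
} )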
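An equivalent formulation walks over the list and the new values in parallel with Map(), which avoids indexing into mylist by position. This is a sketch of the same idea, not a benchmarked alternative, and the argument names el and b are only for illustration:

```r
mylist <- list(list(a = 1), list(a = 2), list(a = 3))
b.vals <- 11:13

# Map() calls the function with the i-th sublist and the i-th value of b.vals
mylist <- Map(function(el, b) { el$b <- b; el }, mylist, b.vals)
str(mylist)
```

Whether this is faster than the index-based lapply depends on the data, so it is worth timing both on a realistic sample.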
