Find the indices of an element in a nested list? - r

I have a list like:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
is there an (loop-free) way to identify the positions of the elements, e.g. if I want to replace a values of "C" with 5, and it does not matter where the element "C" is found, can I do something like:
Aindex <- find_index("A", mylist)
mylist[Aindex] <- 5
I have tried grepl, and in the current example, the following will work:
mylist[grepl("C", mylist)][[1]][["C"]]
but this requires an assumption of the nesting level.
The reason that I ask is that I have a deep list of parameter values, and a named vector of replacement values, and I want to do something like
replacements <- c(a = 1, C = 5)
for(i in names(replacements)){
indx <- find_index(i, mylist)
mylist[indx] <- replacements[i]
}
this is an adaptation to my previous question, update a node (of unknown depth) using xpath in R?, using R lists instead of XML

One method is to use unlist and relist.
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
tmp <- as.relistable(mylist)
tmp <- unlist(tmp)
tmp[grep("(^|.)C$",names(tmp))] <- 5
tmp <- relist(tmp)
Because list names from unlist are concatenated with a ., you'll need to be careful with grep and how your parameters are named. If there is not a . in any of your list names, this should be fine. Otherwise, names like list(.C = 1) will fall into the pattern and be replaced.

Based on this question, you could try it recursively like this:
find_and_replace <- function(x, find, replace){
if(is.list(x)){
n <- names(x) == find
x[n] <- replace
lapply(x, find_and_replace, find=find, replace=replace)
}else{
x
}
}
Testing in a deeper mylist:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C=10, D=55)))
find_and_replace(mylist, "C", 5)
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C ### it worked
[1] 5
$c$D
[1] 3
$c$d
$c$d$C ### it worked
[1] 5
$c$d$D
[1] 55

This can now also be done using rrapply in the rrapply-package (an extended version of base rapply). To return the position of an element in the nested list based on its name, we can use the special arguments .xpos and .xname. For instance, to look up the position of the element with name "C":
library(rrapply)
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
## get position C-node
(Cindex <- rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x, .xpos) .xpos, how = "unlist"))
#> c.C1 c.C2
#> 3 1
We could then update its value in the nested list with:
## update value C-node
mylist[[Cindex]] <- 5
The two steps can also be combined directly in the call to rrapply:
rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x) 5, how = "replace")
#> $a
#> [1] 1
#>
#> $b
#> $b$A
#> [1] 1
#>
#> $b$B
#> [1] 2
#>
#>
#> $c
#> $c$C
#> [1] 5
#>
#> $c$D
#> [1] 3

Related

enumerate in R over dataframe rows

I'm trying to modify a function so that if I put in a dataframe, I get the rownumber and row output.
These functions taken from Zip or enumerate in R? are a good starting point for me:
zip <- function(...) {
mapply(list, ..., SIMPLIFY = FALSE)
}
enumerate <- function(...) {
zip(k=seq_along(..1), ...)
}
I modified enumerate to work as I want when the input is a dataframe:
enumerate2 <- function(...){
mod <- ..1
if(is.data.frame(mod)){
mod = split(mod, seq(nrow(mod)))
}
zip(k = seq_along(mod), ...)
}
So for example:
g = data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
enumerate2(v = g)
This will enumerate the rows of a dataframe, so I can do:
for(i in enumerate2(v = g)){
"rowNumber = %s, rowValues = %s" %>% sprintf(i$k, list(i$v)) %>% print
}
The problem is I get a warning:
Warning message:
In mapply(list, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
Also, I'd rather the dataframe still be a dataframe so that I can do things like i$v$b to return the value of row i$k column b from the dataframe.
How can I get rid of the warning, and how can I keep the dataframe structure after split?
edit:
example 1 - data frame input
output:
enumerate2(v = data.frame(A = c(1, 2), B = c(3, 4)))
[[1]]
[[1]]$k
[1] 1
[[1]]$v
A B
1 1 3
[[2]]
[[2]]$k
[1] 2
[[2]]$v
A B
1 2 4
example 2 - list input
output:
enumerate2(v = LETTERS[1:2])
[[1]]
[[1]]$k
[1] 1
[[1]]$v
[1] "A"
[[2]]
[[2]]$k
[1] 2
[[2]]$v
[1] "B"

Using list elements in its' definition

I am having a relatively simple problem with R, which I hope we could find a solution to.
My aim is to define a following list, in which the c element should be the sum of a and b elements defined previously:
ex.list = list(
a = 1,
b = 2,
c = a+b
)
Code throws an error (Error: object 'a' not found), indicating that we cannot use the a and b elements defined just above.
Of course we can simply count the sum out of list definition
ex.list = list(
a = 1,
b = 2
)
ex.list$c = ex.list$a + ex.list$b
Or use another elements in creating the list
a.ex = 1
b.ex = 2
ex.list = list(
a = a.ex,
b = b.ex,
c = a.ex+b.ex
)
Unfortunately, I am not interested in the above solutions. Is there any way to do the sum in the list definition?
You can write your own list function that does lazy evaluation:
lazyList <- function(...) {
tmp <- match.call(expand.dots = FALSE)$`...`
lapply(tmp, eval, envir = tmp)
}
lazyList(
a = 1,
b = 2,
c = a+b
)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#[1] 3
However, obviously, the following is not possible with lazy evaluation:
lazyList(
a = 1,
b = 2,
d = c * a,
c = a+b
)
No, you can't do that. But you can do mad things like this:
> (function(a,b,c=a+b){list(a=a,b=b,c=c)})(11,22)
$a
[1] 11
$b
[1] 22
$c
[1] 33
But really, if you have a list you wish to construct in a particular way, write a function to do it. Its not difficult.

Cannot unlist a list coerced from a call object in R?

Here is the experiment:
myfunc = function(a, b, c) {
callobj = match.call()
save(callobj, file="/tmp/callobj.rda")
}
myfunc(11, 22, 33)
load("/tmp/callobj.rda")
x = unlist(as.list(callobj))
print(x)
x = unlist(list(a = 1, b = 2, c = 3))
print(x)
Results:
> myfunc(11, 22, 33)
> load("/tmp/callobj.rda")
> x = unlist(as.list(callobj))
> print(x)
[[1]]
myfunc
$a
[1] 11
$b
[1] 22
$c
[1] 33
> x = unlist(list(a = 1, b = 2, c = 3))
> print(x)
a b c
1 2 3
The question is, why on the earth do the two lists behave differently? One of them can be unlisted, the other, apparently, cannot.
As suggested in one of the comments, I did this:
> dput(as.list(callobj))
structure(list(myfunc, a = 11, b = 22, c = 33), .Names = c("",
"a", "b", "c"))
> dput(list(a = 1, b = 2))
structure(list(a = 1, b = 2), .Names = c("a", "b"))
Doesn't explain why the two lists are different in behavior.
You can use as.character on call objects.
This would be equivalent to the unlisted, unnamed call to as.list(callobj).
> as.character(callobj)
# [1] "myfunc" "11" "22" "33"
To unlist callobj, we need all the elements to be of the same type. We can't have characters, numerics, and calls all in the same vector. R does not allow mixed object types in the same vector.
But we can use deparse and get it done:
> sapply(as.list(callobj), deparse)
# a b c
# "myfunc" "11" "22" "33"
This also makes apparent one of the main differences between sapply and lapply. We don't need unlist for this if we use sapply.
Looking at dput(unlist(callobj)):
structure(list(myfunc, a = 11, b = 22, c = 33), .Names = c("",
"a", "b", "c"))
myfunc is still in there. You can't unlist a list if there's no simpler represenation to coerce it to:
> unlist(list(function(x) x + 1, 3))
[[1]]
function (x)
x + 1
[[2]]
[1] 3

Get element names in rapply

The little-used rapply function should be a perfect solution to many problems (such as this one ). However, it leaves the user-written function f with no ability to know where it is in the tree.
Is there any way to pass the name of the element in the nested list to f in an rapply call? Unfortunately, rapply calls .Internal pretty quickly.
I struggled with nested lists with arbitrary depth recently. Eventually, I came up to more or less acceptable decision in my case. It is not the direct answer to your question (no rapply usage), but it seems to be solving the same kind of problems. I hope it can be of some help.
Instead of trying to access names of list elements inside rapply I generated vector of names and queried it for elements.
# Sample list with depth of 3
mylist <- list(a=-1, b=list(A=1,B=2), c=list(C=3,D=4, E=list(F=5,G=6)))
Generating of names vector is a tricky in my case. Specifically, names of list elements should be safe, i.e. without . symbol.
list.names <- strsplit(names(unlist(mylist)), split=".", fixed=TRUE)
node.names <- sapply(list.names, function(x) paste(x, collapse="$"))
node.names <- paste("mylist", node.names, sep="$")
node.names
[1] "mylist$a" "mylist$b$A" "mylist$b$B" "mylist$c$C" "mylist$c$D" "mylist$c$E$F"
[7] "mylist$c$E$G"
Next step is accessing list element by string name. I found nothing better than using temporary file.
f <- function(x){
fname <- tempfile()
cat(x, file=fname)
source(fname)$value
}
Here f just returns value of x, where x is a string with full name of list element.
Finally, we can query list in pseudo-recursive way.
sapply(node.names, f)
Referring to the question Find the indices of an element in a nested list?, you can write:
rappply <- function(x, f) {
setNames(lapply(seq_along(x), function(i) {
if (!is.list(x[[i]])) f(x[[i]], .name = names(x)[i])
else rappply(x[[i]], f)
}), names(x))
}
then,
> mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
>
> rappply(mylist, function(x, ..., .name) {
+ switch(.name, "a" = 1, "C" = 5, x)
+ })
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C
[1] 5
$c$D
[1] 3
Update June 2020: the rrapply-function in the rrapply-package (an extended version of base rapply) allows to do this by defining the argument .xname in the f function. Inside f, the .xname variable will evaluate to the name of the element in the nested list:
library(rrapply)
L <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
rrapply(L, f = function(x, .xname) paste(.xname, x, sep = " = "))
#> $a
#> [1] "a = 1"
#>
#> $b
#> $b$A
#> [1] "A = 1"
#>
#> $b$B
#> [1] "B = 2"
#>
#>
#> $c
#> $c$C
#> [1] "C = 1"
#>
#> $c$D
#> [1] "D = 3"

access plyr id variables within functions

I regulary have the problem that I need to access the actual id variable when using d*ply or l*ply. A simple (yet nonsense) example would be:
df1 <- data.frame( p = c("a", "a", "b", "b"), q = 1:4 )
df2 <- data.frame( m = c("a", "b" ), n = 1:2 )
d_ply( df1, "p", function(x){
actualId <- unique( x$p )
print( mean(x$q)^df2[ df2$m == actualId, "n" ] )
})
So in case of d*ply functions I can help myself with unique( x$p ). But when it comes to l*ply, I have no idea how to access the name of the according list element.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print( <missing code> )
})
# desired output
[1] "a"
[1] "b"
[1] "c"
Any suggestions? Anything I am ignoring?
One way I've gotten around this is to loop over the index (names) and do the subsetting within the function.
l <- list(a = 1, b = 2, c = 3)
l_ply(names(l), function(x){
print(x)
myl <- l[[x]]
print(myl)
})
myl will then be the same as
l_ply(l, function(myl) {
print(myl)
})
Here's one idea.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(eval(substitute(names(.data)[i], parent.frame())))
})
# [1] "a"
# [1] "b"
# [1] "c"
(Have a look at the final code block of l_ply to see where I got the names .data and i.)
I'm not sure there's a way to do that, because the only argument to your anonymous function is the list element value, without its name :
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(class(x))
})
[1] "numeric"
[1] "numeric"
[1] "numeric"
But if you get back the results of your command as a list or a data frame, the names are preserved for you to use later :
llply( list(a = 1, b = 2, c = 3), function(x){
x
})
$a
[1] 1
$b
[1] 2
$c
[1] 3
Aside from Josh solution, you can also pass both names and values of your list elements to a function with mapply or m*ply :
d <- list(a = 1, b = 2, c = 3)
myfunc <- function(value, name) {
print(as.character(name))
print(value)
}
mapply(myfunc, d, names(d))
m_ply(data.frame(value=unlist(d), name=names(d)), myfunc)

Resources