access plyr id variables within functions - r

I regularly have the problem that I need to access the current id variable when using d*ply or l*ply. A simple (if nonsensical) example would be:
df1 <- data.frame( p = c("a", "a", "b", "b"), q = 1:4 )
df2 <- data.frame( m = c("a", "b" ), n = 1:2 )
d_ply( df1, "p", function(x){
actualId <- unique( x$p )
print( mean(x$q)^df2[ df2$m == actualId, "n" ] )
})
So in the case of d*ply functions I can help myself with unique( x$p ). But when it comes to l*ply, I have no idea how to access the name of the corresponding list element.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print( <missing code> )
})
# desired output
[1] "a"
[1] "b"
[1] "c"
Any suggestions? Anything I am ignoring?

One way I've gotten around this is to loop over the index (names) and do the subsetting within the function.
l <- list(a = 1, b = 2, c = 3)
l_ply(names(l), function(x){
print(x)
myl <- l[[x]]
print(myl)
})
myl will then be the same as
l_ply(l, function(myl) {
print(myl)
})

Here's one idea.
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(eval(substitute(names(.data)[i], parent.frame())))
})
# [1] "a"
# [1] "b"
# [1] "c"
(Have a look at the final code block of l_ply to see where I got the names .data and i.)

I'm not sure there's a way to do that, because the only argument to your anonymous function is the list element value, without its name:
l_ply( list(a = 1, b = 2, c = 3), function(x){
print(class(x))
})
[1] "numeric"
[1] "numeric"
[1] "numeric"
But if you get back the results of your command as a list or a data frame, the names are preserved for you to use later:
llply( list(a = 1, b = 2, c = 3), function(x){
x
})
$a
[1] 1
$b
[1] 2
$c
[1] 3
Aside from Josh's solution, you can also pass both the names and the values of your list elements to a function with mapply or m*ply:
d <- list(a = 1, b = 2, c = 3)
myfunc <- function(value, name) {
print(as.character(name))
print(value)
}
mapply(myfunc, d, names(d))
m_ply(data.frame(value=unlist(d), name=names(d)), myfunc)
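For comparison, here is a minimal base-R sketch of the same idea without plyr. Map() walks over the values and the names in parallel, so the function sees both. The helper name describe() is made up for illustration and is not part of plyr or of the answers above.

```r
d <- list(a = 1, b = 2, c = 3)

# describe() is a hypothetical helper: it receives one value and its name
describe <- function(value, name) {
  paste0(name, " = ", value)
}

# Map() pairs d[[i]] with names(d)[i] and returns a list named after d
labels <- Map(describe, d, names(d))
print(unlist(labels))
```

Because the first argument passed to Map() is the named list, the result keeps the names a, b and c, which can be handy if the output is processed further.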

Related

enumerate in R over dataframe rows

I'm trying to modify a function so that if I put in a dataframe, I get the rownumber and row output.
These functions, taken from "Zip or enumerate in R?", are a good starting point for me:
zip <- function(...) {
mapply(list, ..., SIMPLIFY = FALSE)
}
enumerate <- function(...) {
zip(k=seq_along(..1), ...)
}
I modified enumerate to work as I want when the input is a dataframe:
enumerate2 <- function(...){
mod <- ..1
if(is.data.frame(mod)){
mod = split(mod, seq(nrow(mod)))
}
zip(k = seq_along(mod), ...)
}
So for example:
g = data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
enumerate2(v = g)
This will enumerate the rows of a dataframe, so I can do:
for(i in enumerate2(v = g)){
"rowNumber = %s, rowValues = %s" %>% sprintf(i$k, list(i$v)) %>% print
}
The problem is I get a warning:
Warning message:
In mapply(list, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
Also, I'd rather the dataframe still be a dataframe so that I can do things like i$v$b to return the value of row i$k column b from the dataframe.
How can I get rid of the warning, and how can I keep the dataframe structure after split?
edit:
example 1 - data frame input
output:
enumerate2(v = data.frame(A = c(1, 2), B = c(3, 4)))
[[1]]
[[1]]$k
[1] 1
[[1]]$v
A B
1 1 3
[[2]]
[[2]]$k
[1] 2
[[2]]$v
A B
1 2 4
example 2 - list input
output:
enumerate2(v = LETTERS[1:2])
[[1]]
[[1]]$k
[1] 1
[[1]]$v
[1] "A"
[[2]]
[[2]]$k
[1] 2
[[2]]$v
[1] "B"
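One likely cause of the warning is that enumerate2 splits the data frame into mod but then passes the original ... (the unsplit data frame, whose length is its number of columns) on to zip. A minimal sketch of a possible fix, assuming a single input argument, is to pass the split object instead. The name enumerate_rows is hypothetical.

```r
zip <- function(...) {
  mapply(list, ..., SIMPLIFY = FALSE)
}

# enumerate_rows() is a hypothetical variant of enumerate2 that takes one input
enumerate_rows <- function(v) {
  if (is.data.frame(v)) {
    # one single-row data frame per row, so each piece is still a data frame
    v <- split(v, seq_len(nrow(v)))
  }
  # pass the split object, not the original data frame, to zip()
  zip(k = seq_along(v), v = v)
}

g <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
res <- enumerate_rows(g)
for (i in res) {
  print(sprintf("rowNumber = %s, b = %s", i$k, i$v$b))
}
```

Since every element of the split is a one-row data frame, i$v$b returns the value of column b for row i$k, and both arguments to mapply now have the same length.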

For loop: paste index into string

This may strike you as odd, but this is exactly what I want to achieve: I want the index of each list element pasted into a string that contains a textual reference to a subset of that list.
For illustration:
l1 <- list(a = 1, b = 2)
l2 <- list(a = 3, b = 4)
l <- list(l1,l2)
X_l <- vector("list", length = length(l))
for (i in 1:length(l)) {
X_l[[i]] = "l[[ #insert index number as character# ]]$l_1*a"
}
In the end, I want something like this:
X_l_wanted <- list("l[[1]]$l_1*a","l[[2]]$l_1*a")
You can use sprintf/paste0 directly :
sprintf('l[[%d]]$l_1*a', seq_along(l))
#[1] "l[[1]]$l_1*a" "l[[2]]$l_1*a"
If you want final output as list :
as.list(sprintf('l[[%d]]$l_1*a', seq_along(l)))
#[[1]]
#[1] "l[[1]]$l_1*a"
#[[2]]
#[1] "l[[2]]$l_1*a"
Using paste0 :
paste0('l[[', seq_along(l), ']]$l_1*a')
Try paste0() inside your loop. That is the way to concatenate strings. Here is the solution with slight changes to your code:
#Data
l1 <- list(a = 1, b = 2)
l2 <- list(a = 3, b = 4)
l <- list(l1,l2)
#List
X_l <- vector("list", length = length(l))
#Loop
for (i in 1:length(l)) {
#Code
X_l[[i]] = paste0('l[[',i,']]$l_1*a')
}
Output:
X_l
[[1]]
[1] "l[[1]]$l_1*a"
[[2]]
[1] "l[[2]]$l_1*a"
Or you could do it with lapply()
library(glue)
X_l <- lapply(1:length(l), function(i)glue("l[[{i}]]$l_1*a"))
X_l
# [[1]]
# l[[1]]$l_1*a
# [[2]]
# l[[2]]$l_1*a
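As a small check on the answers above, the sketch below confirms that the sprintf and paste0 versions produce the same strings. It also shows why seq_along(l) can be a safer index than 1:length(l) when the list might be empty. The variable names are chosen for illustration only.

```r
l <- list(list(a = 1, b = 2), list(a = 3, b = 4))

# both calls are vectorised over the index, so no explicit loop is needed
via_sprintf <- sprintf("l[[%d]]$l_1*a", seq_along(l))
via_paste0  <- paste0("l[[", seq_along(l), "]]$l_1*a")
print(identical(via_sprintf, via_paste0))

# with an empty list, 1:length(x) counts down from 1 to 0,
# while seq_along(x) yields no indices at all
empty <- list()
print(1:length(empty))
print(seq_along(empty))
```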

reassign values in a list without looping

test <- list(a = list("first"= 1, "second" = 2),
b = list("first" = 3, "second" = 4))
In the list above, I would like to reassign the "first" elements to equal, let's say, five. This for loop works:
for(temp in c("a", "b")) {
test[[temp]]$first <- 5
}
Is there a way to do the same using a vectorized operation (lapply, etc)? The following extracts the values, but I can't get them reassigned:
lapply(test, "[[", "first")
Here is a vectorised one-liner using unlist and relist:
relist((function(x) ifelse(grepl("first",names(x)),5,x))(unlist(test)),test)
$a
$a$first
[1] 5
$a$second
[1] 2
$b
$b$first
[1] 5
$b$second
[1] 4
You can do it like this:
test <- lapply(test, function(x) {x$first <- 5; x})
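Another option, sketched here under the assumption that every sub-list should receive the same new value, is to combine lapply with utils::modifyList, which replaces the named elements of a list by those given in its val argument.

```r
test <- list(a = list(first = 1, second = 2),
             b = list(first = 3, second = 4))

# modifyList(x, val) overwrites the elements of x that are named in val
updated <- lapply(test, modifyList, val = list(first = 5))
str(updated)
```

This keeps the other elements (here "second") untouched, and the same call can set several elements at once by adding more entries to val.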

Cannot unlist a list coerced from a call object in R?

Here is the experiment:
myfunc = function(a, b, c) {
callobj = match.call()
save(callobj, file="/tmp/callobj.rda")
}
myfunc(11, 22, 33)
load("/tmp/callobj.rda")
x = unlist(as.list(callobj))
print(x)
x = unlist(list(a = 1, b = 2, c = 3))
print(x)
Results:
> myfunc(11, 22, 33)
> load("/tmp/callobj.rda")
> x = unlist(as.list(callobj))
> print(x)
[[1]]
myfunc
$a
[1] 11
$b
[1] 22
$c
[1] 33
> x = unlist(list(a = 1, b = 2, c = 3))
> print(x)
a b c
1 2 3
The question is, why on earth do the two lists behave differently? One of them can be unlisted, the other, apparently, cannot.
As suggested in one of the comments, I did this:
> dput(as.list(callobj))
structure(list(myfunc, a = 11, b = 22, c = 33), .Names = c("",
"a", "b", "c"))
> dput(list(a = 1, b = 2))
structure(list(a = 1, b = 2), .Names = c("a", "b"))
This doesn't explain why the two lists behave differently.
You can use as.character on call objects.
This would be equivalent to the unlisted, unnamed call to as.list(callobj).
> as.character(callobj)
# [1] "myfunc" "11" "22" "33"
To unlist callobj into a single atomic vector, all of its elements would have to be coercible to one common type. Here the first element is the symbol myfunc, a language object, and unlist does not coerce language objects, so a list that contains one stays a list.
But we can use deparse and get it done:
> sapply(as.list(callobj), deparse)
# a b c
# "myfunc" "11" "22" "33"
This also makes apparent one of the main differences between sapply and lapply. We don't need unlist for this if we use sapply.
Looking at dput(unlist(as.list(callobj))):
structure(list(myfunc, a = 11, b = 22, c = 33), .Names = c("",
"a", "b", "c"))
The symbol myfunc is still in there. You can't unlist a list if there's no simpler representation to coerce it to:
> unlist(list(function(x) x + 1, 3))
[[1]]
function (x)
x + 1
[[2]]
[1] 3
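To make this concrete, the following sketch captures the call without saving it to disk and then drops the first element (the function name, stored as a symbol) before unlisting. It assumes the remaining arguments are plain numerics, as in the experiment above.

```r
myfunc <- function(a, b, c) match.call()
callobj <- myfunc(11, 22, 33)

parts <- as.list(callobj)
# the first element is the function name, held as a symbol
print(is.symbol(parts[[1]]))

# without the symbol, the rest collapses to a named numeric vector
args_only <- unlist(parts[-1])
print(args_only)
```

If the arguments were expressions rather than constants, they would themselves be calls or symbols, and unlist would again leave the list as it is.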

Find the indices of an element in a nested list?

I have a list like:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
Is there a (loop-free) way to identify the positions of the elements? For example, if I want to replace the value of "C" with 5, and it does not matter where the element "C" is found, can I do something like:
Aindex <- find_index("A", mylist)
mylist[Aindex] <- 5
I have tried grepl, and in the current example, the following will work:
mylist[grepl("C", mylist)][[1]][["C"]]
but this requires an assumption of the nesting level.
The reason that I ask is that I have a deep list of parameter values, and a named vector of replacement values, and I want to do something like
replacements <- c(a = 1, C = 5)
for(i in names(replacements)){
indx <- find_index(i, mylist)
mylist[indx] <- replacements[i]
}
This is an adaptation of my previous question, "update a node (of unknown depth) using xpath in R?", using R lists instead of XML.
One method is to use unlist and relist.
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
tmp <- as.relistable(mylist)
tmp <- unlist(tmp)
tmp[grep("(^|\\.)C$",names(tmp))] <- 5  # escape the dot so it matches a literal "."
tmp <- relist(tmp)
Because list names from unlist are concatenated with a ., you'll need to be careful with grep and how your parameters are named. If there is not a . in any of your list names, this should be fine. Otherwise, names like list(.C = 1) will fall into the pattern and be replaced.
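The unlist/relist idea can be extended to the named vector of replacements from the question. The sketch below is one possible way to do it, assuming that no list name contains a dot and that every leaf is a length-one value (otherwise unlist appends numbers to the names). The variable names flat, leaf and hit are made up for illustration.

```r
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
replacements <- c(a = 10, C = 5)

# flatten while remembering the original shape
flat <- unlist(as.relistable(mylist))

# the leaf name is whatever follows the last dot in the flattened name
leaf <- sub("^.*\\.", "", names(flat))
hit <- leaf %in% names(replacements)

# look up each matching leaf in the replacement vector
flat[hit] <- replacements[leaf[hit]]

updated <- relist(flat)
str(unclass(updated))
```

Matching on the extracted leaf name rather than on a regular expression avoids partial matches such as "ABC" being treated as "C".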
Based on this question, you could try it recursively like this:
find_and_replace <- function(x, find, replace){
if(is.list(x)){
n <- names(x) == find
x[n] <- replace
lapply(x, find_and_replace, find=find, replace=replace)
}else{
x
}
}
Testing in a deeper mylist:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C=10, D=55)))
find_and_replace(mylist, "C", 5)
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C ### it worked
[1] 5
$c$D
[1] 3
$c$d
$c$d$C ### it worked
[1] 5
$c$d$D
[1] 55
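To apply a whole named vector of replacements with this recursive function, one option is to fold over the names with Reduce. This is a sketch rather than part of the original answer. It repeats the function definition so that it runs on its own, and the replacement values are chosen for illustration.

```r
find_and_replace <- function(x, find, replace) {
  if (is.list(x)) {
    n <- names(x) == find
    x[n] <- replace
    lapply(x, find_and_replace, find = find, replace = replace)
  } else {
    x
  }
}

mylist <- list(a = 1, b = list(A = 1, B = 2),
               c = list(C = 1, D = 3, d = list(C = 10, D = 55)))
replacements <- c(a = 10, C = 5)

# apply one replacement per name, feeding each result into the next step
updated <- Reduce(
  function(acc, nm) find_and_replace(acc, nm, replacements[[nm]]),
  names(replacements),
  init = mylist
)
str(updated)
```

Each pass walks the whole list once, so for many replacements on a very large list the unlist/relist approach may be cheaper.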
This can now also be done using rrapply in the rrapply-package (an extended version of base rapply). To return the position of an element in the nested list based on its name, we can use the special arguments .xpos and .xname. For instance, to look up the position of the element with name "C":
library(rrapply)
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
## get position C-node
(Cindex <- rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x, .xpos) .xpos, how = "unlist"))
#> c.C1 c.C2
#> 3 1
We could then update its value in the nested list with:
## update value C-node
mylist[[Cindex]] <- 5
The two steps can also be combined directly in the call to rrapply:
rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x) 5, how = "replace")
#> $a
#> [1] 1
#>
#> $b
#> $b$A
#> [1] 1
#>
#> $b$B
#> [1] 2
#>
#>
#> $c
#> $c$C
#> [1] 5
#>
#> $c$D
#> [1] 3

Resources