Difference between plyr::mutate and dplyr::mutate

dplyr::mutate() works the same way as plyr::mutate() and similarly to base::transform(). The key difference between mutate() and transform() is that mutate allows you to refer to columns that you just created. - Introduction to dplyr
There are some differences between the mutate functions in dplyr and plyr. The main ones are that plyr::mutate can be applied to lists, whereas dplyr::mutate is faster.
Moreover, plyr cannot reassign (or remove) a column that was created earlier in the same call, but dplyr can.
# creating a temporary variable and removing it later
plyr::mutate(data.frame(a = 2), tmp = a, c = a*tmp, tmp = NULL)
## a tmp c
## 1 2 2 4
dplyr::mutate(data.frame(a = 2), tmp = a, c = a*tmp, tmp = NULL)
## a c
## 1 2 4
# creating a temporary variable and changing it later
plyr::mutate(data.frame(a = 2), b = a, c = a*b, b = 1)
## a b c
## 1 2 2 4
dplyr::mutate(data.frame(a = 2), b = a, c = a*b, b = 1)
## a b c
## 1 2 1 4
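The difference comes from how each function walks through its arguments. A minimal sketch, assuming that plyr::mutate loops over the argument names and fetches each expression with cols[[name]]: when a name is repeated, lookup by name always returns the first expression with that name, while lookup by position returns each expression in turn.

```r
# The expressions from mutate(..., b = a, c = a*b, b = 1), captured unevaluated
cols <- alist(b = a, c = a * b, b = 1)

# Lookup by name: cols[["b"]] always returns the FIRST expression named "b"
by_name <- lapply(names(cols), function(nm) cols[[nm]])

# Lookup by position: each expression is returned in turn
by_pos <- lapply(seq_along(cols), function(i) cols[[i]])

by_name[[3]]  # the symbol `a`: the first `b` expression again
by_pos[[3]]   # the second `b` expression, the constant 1
```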
Now I am looking for the functionality of dplyr's mutate for list objects, i.e. a function that mutates a list and can reassign just-created variables.
plyr::mutate(list(a = 2), b = a, c = a*b, b = 1)
## $a
## [1] 2
##
## $b
## [1] 2
##
## $c
## [1] 4
dplyr::mutate(list(a = 2), b = a, c = a*b, b = 1)
## Error in UseMethod("mutate_") :
## no applicable method for 'mutate_' applied to an object of class "list"
desired_mutate(list(a = 2), b = a, c = a*b, b = 1)
## $a
## [1] 2
##
## $b
## [1] 1
##
## $c
## [1] 4
I realize that in this simple case, I can just use
plyr::mutate(list(a = 2), c = {b = a; a*b})
But in my actual use case, I assign random numbers to a temporary variable and would like to remove it afterwards. Something like the following:
desired_mutate(list(a = c(1, 2, 5, 2)),
tmp = runif(length(a)),
b = tmp * a,
c = tmp + a,
tmp = NULL)

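Corrected version of the for loop in plyr's mutate function (iterating over the positions of cols instead of their names, because looking up an expression by a repeated name always returns the first one):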
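desired_mutate <- function(.data, ...) {
  stopifnot(is.data.frame(.data) || is.list(.data) || is.environment(.data))
  # capture the unevaluated expressions passed through ...
  cols <- as.list(substitute(list(...))[-1])
  # silently drop unnamed arguments
  cols <- cols[names(cols) != ""]
  col_names <- names(cols)
  # loop over positions, so a repeated name is evaluated once per occurrence
  for (i in seq_along(col_names)) {
    if (!is.null(cols[[i]])) {
      # evaluate in the data first, then in the caller's environment
      .data[[col_names[i]]] <- eval(cols[[i]], .data, parent.frame())
    } else {
      # `name = NULL` removes the element
      .data[[col_names[i]]] <- NULL
    }
  }
  .data
}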
Test:
> str( desired_mutate(list(a = c(1, 2, 5, 2)),
+ tmp = runif(length(a)),
+ b = tmp * a,
+ c = tmp + a,
+ tmp = NULL) )
List of 3
$ a: num [1:4] 1 2 5 2
$ b: num [1:4] 0.351 1.399 3.096 1.4
$ c: num [1:4] 1.35 2.7 5.62 2.7
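For comparison, base R's within() also has a method for lists and evaluates its block sequentially, so a temporary element can be created, used and removed in one call. A minimal sketch of that alternative; note that the order in which new elements are added to the list is not something to rely on.

```r
set.seed(1)  # only to make the random draw reproducible
res <- within(list(a = c(1, 2, 5, 2)), {
  tmp <- runif(length(a))  # temporary random numbers
  b <- tmp * a
  c <- tmp + a
  rm(tmp)                  # drop the temporary element
})
str(res)
```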

Related

Get arguments supplied to a purrr::pmap call

I'm running a function with many arguments and I'm exploring how changes in some arguments affect the output of the function. I'm doing that through purrr::pmap. I'd like to keep track of the arguments used for each function call: I'd like the function to return its output, as well as a named list of the values of all the arguments used.
Here's a MWE:
library(purrr)
f <- function (a, b, c) a + b + c
a_values <- 1:5
effect_of_a <- pmap(list(a = a_values), f, b = 0, c = 0)
I'd like effect_of_a to be a list of lists, where each sublist contains not only the result f(a, b, c), but also the values of a, b and c used. I could code that list manually, but I have many arguments and they may change. So is there a way to capture the list of arguments and their values in a function call initiated by purrr::pmap?
You could use with():
pmap(list(a = a_values), ~with(list(...), list(f = f(...),args = list(...))),
b = 0,
c = 0)
[[1]]
[[1]]$f
[1] 1
[[1]]$args
[[1]]$args$a
[1] 1
[[1]]$args$b
[1] 0
[[1]]$args$c
[1] 0
...
Here is a general solution where the arguments are captured in an extra function f_2, which is called with pmap instead of f:
library(purrr)
f <- function (a, b, c) a + b + c
f_2 <- function(a, b, c) {
list(result = f(a, b, c),
args = c(as.list(environment())))
}
a_values <- 1:3
pmap(list(a = a_values), f_2, b = 0, c = 0)
[[1]]
[[1]]$result
[1] 1
[[1]]$args
[[1]]$args$a
[1] 1
[[1]]$args$b
[1] 0
[[1]]$args$c
[1] 0
[[2]]
[[2]]$result
[1] 2
[[2]]$args
[[2]]$args$a
[1] 2
[[2]]$args$b
[1] 0
[[2]]$args$c
[1] 0
[[3]]
[[3]]$result
[1] 3
[[3]]$args
[[3]]$args$a
[1] 3
[[3]]$args$b
[1] 0
[[3]]$args$c
[1] 0
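To avoid writing a wrapper such as f_2 by hand for every function, the wrapper can be generated. A minimal sketch; with_args() is a hypothetical helper, and it only records the arguments that are actually supplied (defaults that are left out are not captured). It is shown with base Map(); the same wrapper should also work as the .f of pmap(list(a = a_values), with_args(f), b = 0, c = 0).

```r
f <- function(a, b, c) a + b + c

# Hypothetical helper: wraps any function so that it returns its result
# together with the named arguments it was called with
with_args <- function(fun) {
  function(...) {
    args <- list(...)
    list(result = do.call(fun, args), args = args)
  }
}

a_values <- 1:3
# Map() passes a = a_values[[i]], b = 0, c = 0 as named arguments
effect_of_a <- Map(with_args(f), a = a_values, b = 0, c = 0)
str(effect_of_a[[2]])
```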
For a list of lists, you could also try something like:
effect_of_a <- map(a_values %>% as.list, function(a, b = 0, c = 0) {
list(f_eval = f(a, b, c),
a = a,
b = b,
c = c)
})
If the values for b & c are changing too, you could try:
b_values <- 2:6
c_values <- 3:7
effect_of_a <- list(a_values, b_values, c_values) %>%
map(~.x %>% as.list) %>%
pmap(~list(f_eval = f(..1, ..2, ..3),
a = ..1,
b = ..2,
c = ..3))

Assigning a value to a list item using `assign()`

A little bit of context first...
I've written an infix function that essentially replaces the idiom
x[[length(x) + 1]] <- y
or simply x <- append(x, y) for vectors.
Here it is:
`%+=%` <- function(x, y) {
  xcall <- substitute(x)
  xobjname <- setdiff(all.names(xcall), c("[[", "[", ":", "$"))
  # if the object doesn't exist, create it
  if (!exists(xobjname, parent.frame(), mode = "list") &&
      !exists(xobjname, parent.frame(), mode = "numeric") &&
      !exists(xobjname, parent.frame(), mode = "character")) {
    xobj <- subset(y, FALSE)
  } else {
    xobj <- eval(xcall, envir = parent.frame())
  }
  if (is.atomic(xobj)) {
    if (!is.atomic(y)) {
      stop('Cannot append object of mode ', dQuote(mode(y)),
           ' to atomic structure ', xobjname)
    }
    assign(xobjname, append(xobj, y), envir = parent.frame())
    return(invisible())
  }
  if (is.list(xobj)) {
    if (is.atomic(y)) {
      xobj[[length(xobj) + 1]] <- y
    } else {
      for (i in seq_along(y)) {
        xobj[[length(xobj) + 1]] <- y[[i]]
        names(xobj)[length(xobj)] <- names(y[i])
      }
    }
    assign(xobjname, xobj, envir = parent.frame())
    return(invisible())
  }
  stop("Can't append to an object of mode ",
       mode(eval(xcall, envir = parent.frame())))
}
It works as intended with vectors or lists, but in its present form it can't append a value to an item inside a list, e.g.:
a <- list(a = 1, b = 2)
a$b %+=% 3
So far I haven't found a way to do it. I've tried something like the following, but it has no effect:
assign("b", append(a$b, 3), envir = as.environment(a))
Any ideas?
I suggest not using assign and instead doing this:
`%+=%`<- function(x, value) eval.parent(substitute(x <- append(x, value)))
x <- 3
x %+=% 5
x
## [1] 3 5
L <- list(a = 1, b = 2)
L %+=% 3
str(L)
## List of 3
## $ a: num 1
## $ b: num 2
## $ : num 3
L <- list(a = 1, b = 2)
L$a %+=% 4
str(L)
## List of 2
## $ a: num [1:2] 1 4
## $ b: num 2
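This works for nested items such as L$a because substitute() rebuilds the caller's own expression, so eval.parent() ends up running an ordinary assignment like L$a <- append(L$a, 4) in the caller's frame. A minimal sketch that returns the constructed call instead of evaluating it (show_append_call is a hypothetical name used only for illustration):

```r
# Same body as %+=%, minus the eval.parent(): just return the built call
show_append_call <- function(x, value) substitute(x <- append(x, value))

# L does not need to exist, because nothing is evaluated
show_append_call(L$a, 4)
```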
Or try the +<- syntax, which avoids the eval:
`+<-` <- append
# test
x <- 3
+x <- 1
x
## [1] 3 1
# test
L<- list(a = 1, b = 2)
+L <- 10
str(L)
## List of 3
## $ a: num 1
## $ b: num 2
## $ : num 10
# test
L <- list(a = 1, b = 2)
+L$a <- 10
str(L)
## List of 2
## $ a: num [1:2] 1 10
## $ b: num 2
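Under the hood, R evaluates a replacement call such as +x <- 1 roughly as x <- `+<-`(x, value = 1). That is why plain append() can serve as the replacement function: the argument name value partially matches append()'s values argument. A minimal sketch of that equivalence:

```r
`+<-` <- append

x <- 3
+x <- 1                    # replacement syntax

y <- 3
y <- `+<-`(y, value = 1)   # the equivalent call, written out explicitly
```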
Or try this replacement function syntax, which is similar to +<-:
`append<-` <- append
x <- 3
append(x) <- 7
x
## [1] 3 7
... etc ...

Using list elements in its definition

I have a relatively simple problem in R that I hope we can find a solution to.
My aim is to define the following list, in which element c should be the sum of the elements a and b defined just before it:
ex.list = list(
a = 1,
b = 2,
c = a+b
)
The code throws an error (Error: object 'a' not found), indicating that we cannot use the a and b elements defined just above.
Of course, we can simply compute the sum outside the list definition:
ex.list = list(
a = 1,
b = 2
)
ex.list$c = ex.list$a + ex.list$b
Or use separate variables when creating the list:
a.ex = 1
b.ex = 2
ex.list = list(
a = a.ex,
b = b.ex,
c = a.ex+b.ex
)
Unfortunately, I am not interested in the above solutions. Is there any way to do the sum in the list definition?
You can write your own list function that does lazy evaluation:
lazyList <- function(...) {
  tmp <- match.call(expand.dots = FALSE)$`...`
  lapply(tmp, eval, envir = tmp)
}
lazyList(
a = 1,
b = 2,
c = a+b
)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#[1] 3
However, the following does not work with this approach, because c is still the unevaluated expression a+b (not its value) when d = c * a is evaluated:
lazyList(
a = 1,
b = 2,
d = c * a,
c = a+b
)
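Because lazyList() stores the unevaluated expressions, a referenced element that is itself an expression reaches later elements as a call rather than a value (for example, a = 1 + 1 would also break c = a + b). A minimal sketch of an alternative that evaluates the elements one at a time, in order, and makes each result visible to the following ones; seqList is a hypothetical name, all arguments are assumed to be named, and forward references (using c before it is defined) still cannot work.

```r
seqList <- function(...) {
  # capture the unevaluated arguments, e.g. list(a = 1, b = 2, c = a + b)
  exprs <- as.list(substitute(list(...)))[-1]
  env <- parent.frame()  # where functions and outside variables are looked up
  out <- list()
  for (i in seq_along(exprs)) {
    # evaluate with the elements built so far visible by name
    out[[names(exprs)[i]]] <- eval(exprs[[i]], out, env)
  }
  out
}

r <- seqList(a = 1 + 1, b = 2, c = a + b, d = c * a)
str(r)
```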
No, you can't do that. But you can do mad things like this:
> (function(a,b,c=a+b){list(a=a,b=b,c=c)})(11,22)
$a
[1] 11
$b
[1] 22
$c
[1] 33
But really, if you have a list you wish to construct in a particular way, write a function to do it. It's not difficult.
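Following that advice, a minimal sketch of such a constructor for the list in the question; make_ex_list is a hypothetical name, and its defaults mirror the values a = 1 and b = 2 from the question.

```r
# The derived element c is computed from the arguments when the list is built
make_ex_list <- function(a = 1, b = 2) {
  list(a = a, b = b, c = a + b)
}

ex.list <- make_ex_list()
str(ex.list)
```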

Keep all names from list to data.frame

When converting a list into a data.frame, R names the variables automatically by concatenating the names of all the sublists. However, it appears to keep only the last name when a sublist is of length 1. Is there a way to enforce the full path as the variable name?
MWE:
> l <- list(a = list(b = 1), c = 2)
> l
$a
$a$b
[1] 1
$c
[1] 2
> data.frame(l)
b c
1 1 2
> ll <- list(a = list(b = 1, bb = 1), c = 2)
> data.frame(ll)
a.b a.bb c
1 1 1 2
Here I would like to have a.b as the name of the variable of data.frame(l) like it does for data.frame(ll).
A possible solution is to create a function that converts the list into a data frame with as.data.frame() and then sets the names to the desired values in a second step:
list_df <- function(list) {
df <- as.data.frame(list)
names(df) <- list_names(list)
return (df)
}
Obviously, defining list_names() is the hard part. One possibility is to recurse through the nested lists:
list_names <- function(list) {
  recursor <- function(list, names) {
    if (is.list(list)) {
      new_names <- paste(names, names(list), sep = ".")
      out <- unlist(mapply(list, new_names, FUN = recursor))
    } else {
      out <- names
    }
    return(out)
  }
  new_names <- unlist(mapply(list, names(list), FUN = recursor))
  return(new_names)
}
This works for your two examples:
l <- list(a = list(b = 1), c = 2)
ll <- list(a = list(b = 1, bb = 1), c = 2)
list_df(l)
## a.b c
## 1 1 2
list_df(ll)
## a.b a.bb c
## 1 1 1 2
It also works for a list that is not nested, as well as for a list with deeper nesting:
ls <- list(a = 1, b = 3)
lc <- list(a = list(b = 1, bb = 1), c = 2, d = list(e = list(f = 1, ff = 2), ee = list(fff = 5)))
list_df(ls)
## a b
## 1 1 3
list_df(lc)
## a.b a.bb c d.e.f d.e.ff d.ee.fff
## 1 1 1 2 1 2 5
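When every leaf of the nested list is a single value, as in these examples, names(unlist(x)) already produces the same dot-separated path names, which gives a shorter but less general list_names(). A minimal sketch under that assumption; list_names_unlist is a hypothetical name. For leaves longer than one, unlist() appends numeric suffixes (e.g. a.b1, a.b2), and this shortcut no longer lines up with the data frame's columns.

```r
# Assumption: every leaf is a single value, so unlist() adds no numeric suffixes
list_names_unlist <- function(x) names(unlist(x))

l  <- list(a = list(b = 1), c = 2)
lc <- list(a = list(b = 1, bb = 1), c = 2,
           d = list(e = list(f = 1, ff = 2), ee = list(fff = 5)))

df <- as.data.frame(l)
names(df) <- list_names_unlist(l)
names(df)
list_names_unlist(lc)
```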

Find the indices of an element in a nested list?

I have a list like:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
Is there a (loop-free) way to identify the positions of the elements? For example, if I want to replace the value of "C" with 5, and it does not matter where the element "C" is found, can I do something like:
Cindex <- find_index("C", mylist)
mylist[Cindex] <- 5
I have tried grepl, and in the current example, the following will work:
mylist[grepl("C", mylist)][[1]][["C"]]
but this requires an assumption of the nesting level.
The reason that I ask is that I have a deep list of parameter values, and a named vector of replacement values, and I want to do something like
replacements <- c(a = 1, C = 5)
for(i in names(replacements)){
indx <- find_index(i, mylist)
mylist[indx] <- replacements[i]
}
This is an adaptation of my previous question, "update a node (of unknown depth) using xpath in R?", using R lists instead of XML.
One method is to use unlist and relist.
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
tmp <- as.relistable(mylist)
tmp <- unlist(tmp)
tmp[grep("(^|\\.)C$", names(tmp))] <- 5  # escaped dot: "C" at top level or right after a "."
tmp <- relist(tmp)
Because list names from unlist are concatenated with a ., you'll need to be careful with grep and how your parameters are named. If there is not a . in any of your list names, this should be fine. Otherwise, names like list(.C = 1) will fall into the pattern and be replaced.
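In a regular expression an unescaped . matches any character, so the dot in such a pattern has to be escaped for it to mean "list separator". A quick check on some made-up names:

```r
nm <- c("a", "b.A", "c.C", "c.AC", "C")

# Unescaped dot: matches any name that simply ends in "C"
grepl("(^|.)C$", nm)

# Escaped dot: matches "C" only at the top level or right after a separator
grepl("(^|\\.)C$", nm)
```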
Based on this question, you could try it recursively like this:
find_and_replace <- function(x, find, replace) {
  if (is.list(x)) {
    n <- names(x) == find
    x[n] <- replace
    lapply(x, find_and_replace, find = find, replace = replace)
  } else {
    x
  }
}
Testing in a deeper mylist:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C=10, D=55)))
find_and_replace(mylist, "C", 5)
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C ### it worked
[1] 5
$c$D
[1] 3
$c$d
$c$d$C ### it worked
[1] 5
$c$d$D
[1] 55
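To apply the whole named vector of replacements from the question, the recursive function can be folded over the names. A minimal sketch using base Reduce(), with find_and_replace() repeated so that the block runs on its own:

```r
find_and_replace <- function(x, find, replace) {
  if (is.list(x)) {
    n <- names(x) == find
    x[n] <- replace
    lapply(x, find_and_replace, find = find, replace = replace)
  } else {
    x
  }
}

mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
replacements <- c(a = 1, C = 5)

# Apply each replacement in turn, feeding the updated list into the next step
mylist <- Reduce(function(acc, nm) find_and_replace(acc, nm, replacements[[nm]]),
                 names(replacements), mylist)
str(mylist)
```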
This can now also be done using rrapply() from the rrapply package (an extended version of base rapply). To return the position of an element in the nested list based on its name, we can use the special arguments .xpos and .xname. For instance, to look up the position of the element with name "C":
library(rrapply)
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
## get position C-node
(Cindex <- rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x, .xpos) .xpos, how = "unlist"))
#> c.C1 c.C2
#> 3 1
We could then update its value in the nested list with:
## update value C-node
mylist[[Cindex]] <- 5
The two steps can also be combined directly in the call to rrapply:
rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x) 5, how = "replace")
#> $a
#> [1] 1
#>
#> $b
#> $b$A
#> [1] 1
#>
#> $b$B
#> [1] 2
#>
#>
#> $c
#> $c$C
#> [1] 5
#>
#> $c$D
#> [1] 3
