When converting a list into a data.frame, R names the variables automatically by concatenating the names of all the sublists. However, it appears to keep only the last name when a sublist has length 1. Is there a way to enforce a full path name for the variable name?
MWE:
> l <- list(a = list(b = 1), c = 2)
> l
$a
$a$b
[1] 1
$c
[1] 2
> data.frame(l)
b c
1 1 2
> ll <- list(a = list(b = 1, bb = 1), c = 2)
> data.frame(ll)
a.b a.bb c
1 1 1 2
Here I would like to have a.b as the name of the variable of data.frame(l) like it does for data.frame(ll).
A possible solution is to create a function that converts the list into a data frame with as.data.frame() and then sets the names to the desired values in a second step:
list_df <- function(list) {
df <- as.data.frame(list)
names(df) <- list_names(list)
return (df)
}
Obviously, defining list_names() is the hard part. One possibility is to recurse through the nested lists:
list_names <- function(list) {
recursor <- function(list, names) {
if (is.list(list)) {
new_names <- paste(names, names(list), sep = ".")
out <- unlist(mapply(list, new_names, FUN = recursor))
} else {
out <- names
}
return(out)
}
new_names <- unlist(mapply(list, names(list), FUN = recursor))
return(new_names)
}
This works for your two examples:
l <- list(a = list(b = 1), c = 2)
ll <- list(a = list(b = 1, bb = 1), c = 2)
list_df(l)
## a.b c
## 1 1 2
list_df(ll)
## a.b a.bb c
## 1 1 1 2
It also works for a list that is not nested, as well as for a list with deeper nesting:
ls <- list(a = 1, b = 3)
lc <- list(a = list(b = 1, bb = 1), c = 2, d = list(e = list(f = 1, ff = 2), ee = list(fff = 5)))
list_df(ls)
## a b
## 1 1 3
list_df(lc)
## a.b a.bb c d.e.f d.e.ff d.ee.fff
## 1 1 1 2 1 2 5
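As a shorter alternative to the recursive helper, here is a minimal sketch that relies on unlist(), which already builds full path names such as a.b. The name flat_df is made up for illustration, and the sketch assumes every leaf is a length-1 value of the same type (otherwise unlist() would coerce or split the values).

```r
# Sketch: unlist() flattens the nested list into a named vector whose names
# are the full paths ("a.b"), and as.list() + as.data.frame() turn that
# vector into a one-row data frame.
# Assumption: every leaf is a single value of the same type.
flat_df <- function(x) as.data.frame(as.list(unlist(x)))

l <- list(a = list(b = 1), c = 2)
names(flat_df(l))
## [1] "a.b" "c"
```

For lists with longer or mixed-type leaves, the recursive list_names() approach above is the safer choice.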
I'm working on a project where I have to apply the same transformation to multiple variables. For example
a <- a + 1
b <- b + 1
d <- d + 1
e <- e + 1
I can obviously perform the operations in sequence using
for (i in c(a, b, d, e)) i <- i + 1
However, I can't actually assign the result to each variable this way, since i is a copy of each variable, not a reference.
Is there a way to do this? Obviously, it'd be easier if the variables were merged in a data.frame or something, but that's not possible.
Usually, if you find yourself doing the same thing to multiple objects, they should be stored and thought of as a single object with sub-components. You say that storing these as a data.frame is not possible, so you can use a list instead. This allows you to use lapply/sapply to apply a function to each element of the list in one step.
a <- c(1, 2, 3)
b <- c(1, 4)
c <- 5
d <- rnorm(10)
e <- runif(5)
lstt <- list(a = a, b = b, c = c, d = d, e = e)
lstt$a
# [1] 1 2 3
lstt <- lapply(lstt, '+', 1)
lstt$a
# [1] 2 3 4
The question states that the variables to increment cannot be in a larger structure, but the comments say that this is not so after all, so we will assume they are in a list L.
L <- list(a = 1, b = 2, d = 3, e = 4) # test data
for(nm in names(L)) L[[nm]] <- L[[nm]] + 1
# or
L <- lapply(L, `+`, 1)
# or
L <- lapply(L, function(x) x + 1)
Scalars
If they are all scalars then they can be put in an ordinary vector:
v <- c(a = 1, b = 2, d = 3, e = 4)
v <- v + 1
Vectors
If they are all vectors of the same length, they can be put in a data frame. If they are also of the same type, they can be put in a matrix instead, in which case we can also add 1 to it directly.
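A minimal sketch of both cases, using made-up test data (the names df and m are only for illustration):

```r
# Equal-length numeric vectors stored as columns of a data frame
df <- data.frame(a = 1:3, b = 4:6)
df <- df + 1          # arithmetic is applied to every column

# Same data as a matrix (all columns must share one type)
m <- cbind(a = 1:3, b = 4:6)
m <- m + 1            # arithmetic is applied to every cell

df$a
## [1] 2 3 4
m[, "b"]
## [1] 5 6 7
```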
Environment
If the variables do have to be free in an environment, and nms is a vector of the variable names, then we can iterate over the names and use them to subscript the environment env. If the names follow some pattern, we may be able to use nms <- ls(pattern = "...", envir = env); if they are the only variables in that environment, we can use nms <- ls(env).
a <- b <- d <- e <- 1 # test data
env <- .GlobalEnv # can change this if not being done in global envir
nms <- c("a", "b", "d", "e")
for(nm in nms) env[[nm]] <- env[[nm]] + 1
a;b;d;e # check
## [1] 2
## [1] 2
## [1] 2
## [1] 2
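The pattern-based variant mentioned above could look like the following sketch. The environment, the variable names x1, x2, other and the pattern are all invented for the example.

```r
# Sketch: select variables by name pattern inside a dedicated environment
env <- new.env()
env$x1 <- 1
env$x2 <- 2
env$other <- 10                      # does not match the pattern, left alone

nms <- ls(pattern = "^x[0-9]+$", envir = env)
for (nm in nms) env[[nm]] <- env[[nm]] + 1

env$x1; env$x2; env$other
## [1] 2
## [1] 3
## [1] 10
```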
I am trying to remove a named component from a list, using within and rm. This works for a single component, but not for two or more. I am completely befuddled.
For example - this works
aa = list(a = 1:3, b = 2:5, cc = 1:5)
within(aa, {rm(a)})
The output from within will have just the non-removed components.
However, this does not:
aa = list(a = 1:3, b = 2:5, cc = 1:5)
within(aa, {rm(a); rm(b)})
Neither does this:
within(aa, {rm(a, b)})
The output from within will have all the components, with the ones I am trying to remove set to NULL. Why?
First, note the following behavior:
> aa = list(a = 1:3, b = 2:5, cc = 1:5)
>
> aa[c('a', 'b')] <- NULL
>
> aa
# $cc
# [1] 1 2 3 4 5
> aa = list(a = 1:3, b = 2:5, cc = 1:5)
>
> aa[c('a', 'b')] <- list(NULL, NULL)
>
> aa
# $a
# NULL
#
# $b
# NULL
#
# $cc
# [1] 1 2 3 4 5
Now let's look at the code for within.list:
within.list <- function (data, expr, ...)
{
parent <- parent.frame()
e <- evalq(environment(), data, parent)
eval(substitute(expr), e)
l <- as.list(e)
l <- l[!sapply(l, is.null)]
nD <- length(del <- setdiff(names(data), (nl <- names(l))))
data[nl] <- l
if (nD)
data[del] <- if (nD == 1) NULL else vector("list", nD)
data
}
Look in particular at the second to last line of the function. If the number of deleted items in the list is greater than one, the function is essentially calling aa[c('a', 'b')] <- list(NULL, NULL), because vector("list", 2) creates a two item list where each item is NULL. We can create our own version of within where we remove the else statement from the second to last line of the function:
mywithin <- function (data, expr, ...)
{
parent <- parent.frame()
e <- evalq(environment(), data, parent)
eval(substitute(expr), e)
l <- as.list(e)
l <- l[!sapply(l, is.null)]
nD <- length(del <- setdiff(names(data), (nl <- names(l))))
data[nl] <- l
if (nD) data[del] <- NULL
data
}
Now let's test it:
> aa = list(a = 1:3, b = 2:5, cc = 1:5)
>
> mywithin(aa, rm(a, b))
# $cc
# [1] 1 2 3 4 5
Now it works as expected!
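For reference, a small sketch that checks the claim about vector("list", n) and shows one way to drop several components without within() at all (the name kept is just for illustration):

```r
aa <- list(a = 1:3, b = 2:5, cc = 1:5)

# vector("list", 2) is a list of two NULLs, which is why the
# components are kept and set to NULL rather than removed
identical(vector("list", 2), list(NULL, NULL))
## [1] TRUE

# Keep every component except a and b
kept <- aa[setdiff(names(aa), c("a", "b"))]
names(kept)
## [1] "cc"
```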
dplyr::mutate() works the same way as plyr::mutate() and similarly to base::transform(). The key difference between mutate() and transform() is that mutate allows you to refer to columns that you just created. - Introduction to dplyr
There are some differences between the mutate functions in dplyr and plyr. The main differences are, of course, that plyr::mutate can be applied to lists and that dplyr::mutate is faster.
Moreover, once a column has been created, plyr cannot reassign it within the same call, whereas dplyr can.
# creating a temporary variable and removing it later
plyr::mutate(data.frame(a = 2), tmp = a, c = a*tmp, tmp = NULL)
## a tmp c
## 1 2 2 4
dplyr::mutate(data.frame(a = 2), tmp = a, c = a*tmp, tmp = NULL)
## a c
## 1 2 4
# creating a temporary variable and changing it later
plyr::mutate(data.frame(a = 2), b = a, c = a*b, b = 1)
## a b c
## 1 2 2 4
dplyr::mutate(data.frame(a = 2), b = a, c = a*b, b = 1)
## a b c
## 1 2 1 4
Now I am looking for the functionality of the dplyr mutate function for list objects. So I am looking for a function that mutates a list and can reassign just created variables.
plyr::mutate(list(a = 2), b = a, c = a*b, b = 1)
## $a
## [1] 2
##
## $b
## [1] 2
##
## $c
## [1] 4
dplyr::mutate(list(a = 2), b = a, c = a*b, b = 1)
## Error in UseMethod("mutate_") :
## no applicable method for 'mutate_' applied to an object of class "list"
desired_mutate(list(a = 2), b = a, c = a*b, b = 1)
## $a
## [1] 2
##
## $b
## [1] 1
##
## $c
## [1] 4
I realize that in this simple case, I can just use
plyr::mutate(list(a = 2), c = {b = a; a*b})
But in my actual use case, I assign random numbers to a temporary variable and would like to remove it afterwards. Something like the following:
desired_mutate(list(a = c(1, 2, 5, 2)),
tmp = runif(length(a)),
b = tmp * a,
c = tmp + a,
tmp = NULL)
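Here is the original mutate function with its for loop corrected to iterate over the positions in cols instead of their names, so that a name that appears more than once (such as tmp) is handled at each occurrence: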
desired_mutate <- function (.data, ...)
{
stopifnot(is.data.frame(.data) || is.list(.data) || is.environment(.data))
cols <- as.list(substitute(list(...))[-1])
cols <- cols[names(cols) != ""]
col_names <- names(cols)
for (i in seq_along(col_names) ) {
if(!is.null(cols[[i]])) {
.data[[col_names[i]]] <- eval(cols[[i]], .data, parent.frame())
} else {
.data[[col_names[i]]] <- NULL
}
}
.data
}
Test:
> str( desired_mutate(list(a = c(1, 2, 5, 2)),
+ tmp = runif(length(a)),
+ b = tmp * a,
+ c = tmp + a,
+ tmp = NULL) )
List of 3
$ a: num [1:4] 1 2 5 2
$ b: num [1:4] 0.351 1.399 3.096 1.4
$ c: num [1:4] 1.35 2.7 5.62 2.7
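If only the temporary variable matters, a plain base R alternative is to compute it inside local() so that it never becomes part of the list. This is a sketch of a different technique, not a replacement for desired_mutate(); the name lst is made up.

```r
lst <- list(a = c(1, 2, 5, 2))

lst <- local({
  tmp <- runif(length(lst$a))   # temporary, exists only inside local()
  c(lst, list(b = tmp * lst$a, c = tmp + lst$a))
})

names(lst)
## [1] "a" "b" "c"
```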
I'm trying to loop through a bunch of datasets and change columns in R.
I have a bunch of datasets, say a,b,c,etc, and all of them have three columns, say X, Y, Z.
I would like to change their names to be a_X, a_Y, a_Z for dataset a, and b_X, b_Y, b_Z for dataset b, and so on.
Here's my code:
name.list = c("a","b","c")
for(i in name.list){
names(i) = c(paste(i,"_X",sep = ""),paste(i,"_Y",sep = ""),paste(i,"_Z",sep = ""));
}
However, the code above doesn't work, since i is just a character string rather than the dataset itself.
I've considered the assign function, but it doesn't seem to fit either.
I would appreciate any ideas.
Something like this :
# Map passes each data frame together with its name (nn)
list2env(Map(function(dat, nn){
  colnames(dat) <- paste(nn,colnames(dat),sep='_')
  dat
}, mget(name.list), name.list),.GlobalEnv)
for ( i in name.list) {
assign(i, setNames( get(i), paste(i, names(get(i)), sep="_")))
}
> a
a_X a_Y a_Z
1 1 3 A
2 2 4 B
> b
b_X b_Y b_Z
1 1 3 A
2 2 4 B
> c
c_X c_Y c_Z
1 1 3 A
2 2 4 B
Here's some free data:
a <- data.frame(X = 1, Y = 2, Z = 3)
b <- data.frame(X = 4, Y = 5, Z = 6)
c <- data.frame(X = 7, Y = 8, Z = 9)
And here's a method that uses mget and a custom function foo
name.list <- c("a", "b", "c")
foo <- function(x, i) setNames(x, paste(name.list[i], names(x), sep = "_"))
list2env(Map(foo, mget(name.list), seq_along(name.list)), .GlobalEnv)
a
# a_X a_Y a_Z
# 1 1 2 3
b
# b_X b_Y b_Z
# 1 4 5 6
c
# c_X c_Y c_Z
# 1 7 8 9
You could also avoid get or mget by putting a, b, and c into their own environment (or even a list). You also wouldn't need the name.list vector if you go this route, because it's the same as ls(e)
e <- new.env()
e$a <- a; e$b <- b; e$c <- c
bar <- function(x, y) setNames(x, paste(y, names(x), sep = "_"))
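# sorted = TRUE makes the list order match ls(e), which is sorted by name
list2env(Map(bar, as.list(e, sorted = TRUE), ls(e)), .GlobalEnv)
Another perk of doing it this way is that you still have the untouched data frames in the environment e. The copies stored there were not overwritten (check a versus e$a).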
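The list variant mentioned above could be sketched as follows. The list name dfs and the helper add_prefix are invented for the example; the data are the same as the test data above.

```r
# Sketch: keep the data frames in a named list instead of free variables
dfs <- list(
  a = data.frame(X = 1, Y = 2, Z = 3),
  b = data.frame(X = 4, Y = 5, Z = 6),
  c = data.frame(X = 7, Y = 8, Z = 9)
)

# Prefix each data frame's column names with the name of its list element
add_prefix <- function(x, nm) setNames(x, paste(nm, names(x), sep = "_"))
dfs <- Map(add_prefix, dfs, names(dfs))

names(dfs$b)
## [1] "b_X" "b_Y" "b_Z"
```

Because the names travel with the list, no get, mget or assign is needed.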
I have a list like:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
Is there a (loop-free) way to identify the positions of the elements? For example, if I want to replace the value of "C" with 5, and it does not matter where the element "C" is found, can I do something like:
Aindex <- find_index("A", mylist)
mylist[Aindex] <- 5
I have tried grepl, and in the current example, the following will work:
mylist[grepl("C", mylist)][[1]][["C"]]
but this requires an assumption of the nesting level.
The reason that I ask is that I have a deep list of parameter values, and a named vector of replacement values, and I want to do something like
replacements <- c(a = 1, C = 5)
for(i in names(replacements)){
indx <- find_index(i, mylist)
mylist[indx] <- replacements[i]
}
This is an adaptation of my previous question, "update a node (of unknown depth) using xpath in R?", using R lists instead of XML.
One method is to use unlist and relist.
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
tmp <- as.relistable(mylist)
tmp <- unlist(tmp)
tmp[grep("(^|\\.)C$",names(tmp))] <- 5
tmp <- relist(tmp)
Because list names from unlist are concatenated with a ., you'll need to be careful with grep and how your parameters are named. The . is escaped in the pattern because an unescaped . matches any character, which would also catch names that merely end in C. If there is not a . in any of your list names, this should be fine. Otherwise, names like list(.C = 1) will fall into the pattern and be replaced.
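A quick way to see how such a pattern behaves is to try it on a few names. The names below are made up purely for illustration; the sketch compares a pattern with an escaped dot against one with an unescaped dot.

```r
nms <- c("c.C", "C", "b.ABC", ".C")

# Escaped dot: C must be the whole name or follow a literal "."
grepl("(^|\\.)C$", nms)
## [1]  TRUE  TRUE FALSE  TRUE

# Unescaped dot: "." matches any character, so "b.ABC" is caught too
grepl("(^|.)C$", nms)
## [1] TRUE TRUE TRUE TRUE
```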
Based on this question, you could try it recursively like this:
find_and_replace <- function(x, find, replace){
if(is.list(x)){
n <- names(x) == find
x[n] <- replace
lapply(x, find_and_replace, find=find, replace=replace)
}else{
x
}
}
Testing in a deeper mylist:
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3, d = list(C=10, D=55)))
find_and_replace(mylist, "C", 5)
$a
[1] 1
$b
$b$A
[1] 1
$b$B
[1] 2
$c
$c$C ### it worked
[1] 5
$c$D
[1] 3
$c$d
$c$d$C ### it worked
[1] 5
$c$d$D
[1] 55
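To apply a whole named vector of replacements, as in the question, the function can be called once per name. The following is a sketch that repeats the function from above so the block runs on its own:

```r
# Same recursive helper as above
find_and_replace <- function(x, find, replace) {
  if (is.list(x)) {
    n <- names(x) == find
    x[n] <- replace
    lapply(x, find_and_replace, find = find, replace = replace)
  } else {
    x
  }
}

mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
replacements <- c(a = 1, C = 5)

# One pass over the list per replacement name
for (nm in names(replacements)) {
  mylist <- find_and_replace(mylist, nm, replacements[[nm]])
}

mylist$c$C
## [1] 5
```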
This can now also be done using rrapply in the rrapply-package (an extended version of base rapply). To return the position of an element in the nested list based on its name, we can use the special arguments .xpos and .xname. For instance, to look up the position of the element with name "C":
library(rrapply)
mylist <- list(a = 1, b = list(A = 1, B = 2), c = list(C = 1, D = 3))
## get position C-node
(Cindex <- rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x, .xpos) .xpos, how = "unlist"))
#> c.C1 c.C2
#> 3 1
We could then update its value in the nested list with:
## update value C-node
mylist[[Cindex]] <- 5
The two steps can also be combined directly in the call to rrapply:
rrapply(mylist, condition = function(x, .xname) .xname == "C", f = function(x) 5, how = "replace")
#> $a
#> [1] 1
#>
#> $b
#> $b$A
#> [1] 1
#>
#> $b$B
#> [1] 2
#>
#>
#> $c
#> $c$C
#> [1] 5
#>
#> $c$D
#> [1] 3