How to subset an environment by its variable names in R

I would like to subset an environment by its variable names.
e <- new.env(parent=emptyenv())
e$a <- 1
e$b <- 2
e$d <- 3
e[ls(e) %in% c("a","b", "c")]
### if e was a list, this would return the subset list(a=1, b=2)
I could not figure out how to subset elements of an environment by their names. Using lapply or eapply does not work either. What is the proper or easy way to subset an environment by its variable names?
Thank you.

Okay, after thinking this through a bit more, may I suggest:
mget(c("a","b"), envir=e)
#$a
#[1] 1
#
#$b
#[1] 2
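Note that mget() stops with an error if any requested name is not bound in the environment, which matters for the "c" in the question. A minimal sketch, assuming missing names should simply be dropped (the variable names wanted and present are illustrative):

```r
e <- new.env(parent = emptyenv())
e$a <- 1
e$b <- 2
e$d <- 3

wanted <- c("a", "b", "c")            # "c" does not exist in e
present <- intersect(wanted, ls(e))   # keep only names that are bound in e
subset_list <- mget(present, envir = e)
subset_list
```

This returns a named list, the same shape that list subsetting would give.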

My original solution was to use get() / mget() (maybe the OP saw my deleted comment earlier). Then I noticed that the OP had tried eapply(), so I thought about possible solutions with that. Here it is (with help from @thelatemail).
# try some different data type
e <- new.env(parent=emptyenv())
e$a <- 1:3
e$b <- matrix(1:4, 2)
e$c <- data.frame(x=letters[1:2],y=LETTERS[1:2])
You can use either of the following to collect objects in environment e into a list:
elst <- eapply(e, "[") ## my idea
elst <- eapply(e, identity) ## thanks to @thelatemail
elst <- as.list.environment(e) ## thanks to @thelatemail
#$a
#[1] 1 2 3
#$b
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#$c
# x y
#1 a A
#2 b B
The as.list.environment() can be seen as the inverse operation of list2env(). It is mentioned in the "See Also" part of ?list2env.
The result elst is just an ordinary list. There are various ways to subset this list. For example:
elst[names(elst) %in% c("a","b")] ## no need to use "ls(e)" now
#$a
#[1] 1 2 3
#$b
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4

mget(ls(e)[ls(e) %in% c('a','b','d')], e)

The [ operator usually returns the same type of object as the original, so I guess you're expecting an environment, rather than a list. The same environment but with a different set of elements, or a new environment with the specified elements? Either way I think you'll end up iterating, e.g.,
f = new.env(parent=emptyenv())
for (elt in c("a", "b"))
    f[[elt]] = e[[elt]]
Working with environments is not very idiomatic R code, which might explain why there is not a more elegant solution.
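If a new environment holding only the selected bindings is the goal, the copying step can also be wrapped in a small helper. A minimal sketch, assuming that names missing from the source environment should be skipped silently; subset_env() is a hypothetical name, not a base R function:

```r
# Hypothetical helper: copy the selected bindings of `env` into a new environment.
subset_env <- function(env, keep) {
  present <- intersect(keep, ls(env, all.names = TRUE))
  # mget() collects the values into a named list; list2env() writes them
  # into a fresh environment that shares the parent of the original.
  list2env(mget(present, envir = env),
           envir = new.env(parent = parent.env(env)))
}

e <- new.env(parent = emptyenv())
e$a <- 1
e$b <- 2
e$d <- 3

f <- subset_env(e, c("a", "b", "c"))
sort(ls(f))
```

The original environment e is left untouched; f is a separate environment, so later assignments into f do not affect e.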

You can use rlang::env_get_list() to get a list of the bindings:
rlang::env_get_list(env=e, c("a","b"))
#$a
#[1] 1
#
#$b
#[1] 2
If you're trying to get an environment, rather than a list, I'm not sure how you would do that, other than just creating a new environment using the output of rlang::env_get_list().
If you want to include elements in your list that might not exist in the environment (like "c"), you have to specify a default value - otherwise you'll get an error:
env_get_list(env = e, c("a","b","c"))
#Error in env_get_list(env = e, c("a", "b", "c")) : argument "default" is missing, with no default
env_get_list(env = e, c("a","b","c"),default=NULL)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#NULL
I assume you don't want c at all, so I'd do something like:
temp <- c("a","b","c")[c("a","b","c") %in% env_names(e)]
temp
#[1] "a" "b"
env_get_list(env=e,temp)
#$a
#[1] 1
#
#$b
#[1] 2

Related

How to unwrap list with access variables? [duplicate]

I have a vector like below
tmp <- c(a=1, b=2, c=3)
a b c
1 2 3
I want to flatten this vector to get only 1, 2, 3.
I tried unlist(tmp) but it still gives me the same result.
How to achieve that efficiently?
You just want to remove the names attribute from tmp. There are a number of ways to do that.
You can unname it.
unname(tmp)
# [1] 1 2 3
Or use a very common method for removing names, by setting them to NULL.
names(tmp) <- NULL
Or strip the attributes with as.vector.
as.vector(tmp)
# [1] 1 2 3
Or re-concatenate it without the names.
c(tmp, use.names=FALSE)
# [1] 1 2 3
Or use setNames.
setNames(tmp, NULL)
# [1] 1 2 3
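Note that building the vector in two steps produces exactly the same named vector:
tmp <- c(1,2,3)
names(tmp) <- c("a","b","c")
so unname(tmp) alone is still enough here. Combining it with unlist() only matters when tmp is a list rather than an atomic vector:
unlist(unname(tmp))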
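The difference between a named atomic vector and a named list can be seen in a short sketch (the object names vec and lst are illustrative):

```r
vec <- c(a = 1, b = 2, c = 3)      # named atomic vector
lst <- list(a = 1, b = 2, c = 3)   # named list

plain_from_vec <- unname(vec)                       # dropping names is enough
plain_from_lst <- unlist(lst, use.names = FALSE)    # flatten and drop names in one step

plain_from_vec
plain_from_lst
```

Both results are plain numeric vectors without a names attribute.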

Storing unique values of each column (of a df) in list

It is straightforward to obtain unique values of a column using unique. However, I am looking to do the same but for multiple columns in a dataframe and store them in a list, all using base R. Importantly, it is not combinations I need but simply unique values for each individual column. I currently have the below:
# dummy data
df = data.frame(a = LETTERS[1:4]
,b = 1:4)
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols)
{
x = unique(i)
unique_values_by_col[[i]] = x
}
The problem comes when displaying unique_values_by_col as it shows as empty. I believe the problem is i is being passed to the loop as a text not a variable.
Any help would be greatly appreciated. Thank you.
Why not avoid the for loop altogether using lapply:
lapply(df, unique)
Resulting in:
#$a
#[1] A B C D
#Levels: A B C D
#$b
#[1] 1 2 3 4
Alternatively, apply() is designed to run a function over the rows or columns of a matrix or data frame:
apply(df,2,unique)
Result:
#     a   b
#[1,] "A" "1"
#[2,] "B" "2"
#[3,] "C" "3"
#[4,] "D" "4"
Note that apply() first converts df to a matrix, so every value becomes a character string. If you want a list, lapply() returns one directly, so it may be the better choice.
Your for loop is almost right, just needs one fix to work:
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols) {
x = unique(df[[i]])
unique_values_by_col[[i]] = x
}
unique_values_by_col
# $a
# [1] A B C D
# Levels: A B C D
#
# $b
# [1] 1 2 3 4
i is just a character string, the name of a column within df, so unique(i) only returns that name instead of the column's values.
Anyhow, the most standard way for this task is lapply() as shown by demirev.
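A list is also the natural container here because columns can have different numbers of unique values. A small sketch with made-up data (df2 and u are illustrative names, not from the question):

```r
# Two columns with different numbers of distinct values
df2 <- data.frame(a = c("A", "B", "B"),
                  b = c(1, 1, 1),
                  stringsAsFactors = FALSE)

u <- lapply(df2, unique)   # one list element per column
u
lengths(u)                 # number of unique values in each column
```

Because each element keeps its own length and type, nothing is padded or coerced.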
Could this be what you're trying to do?
Map(unique,df)
Result:
$a
[1] A B C D
Levels: A B C D
$b
[1] 1 2 3 4

Using R to compare sub-elements within a string ... and summarize it so that there are no duplicate sub-elements

I have a string like this:
data <- c("A:B:C", "A:B", "E:F:G", "H:I:J", "B:C:D")
I want to convert this to a string of:
c("A:B:C:D", "E:F:G", "H:I:J")
The idea is that each element inside the string is another string of sub-elements (e.g. A, B, C) that have been pasted together (with sep=":"). Each element within the string is compared with all other elements to look for common sub-elements, and elements with common sub-elements are combined.
I don't care about the order of the string (or order of the sub-elements) FWIW.
Thanks for any help offered!
--
Answers so far...
I liked d.b's suggestion - not the least because it stayed in base R. However, with a more complicated larger set, it wasn't working perfectly until everything was run again. With an even more complicated dataset, re-running everything more than twice might be needed.
I had more difficulty with thelatemail's suggestion. I had to upgrade R to use lengths, and I then had to figure out how to get to the end point because the answer was incomplete. In any case, this was how I got to the end (I suspect there is a better way). This worked with a larger set without a hitch.
library(igraph)
spl <- strsplit(data,":")
combspl <- data.frame(
    grp = rep(seq_along(spl),lengths(spl)),
    val = unlist(spl)
)
cl <- clusters(graph.data.frame(combspl))$membership[-(1:length(spl))]
dat <- data.frame(cl) # after getting nowhere working with the list as formatted
dat[,2] <- row.names(dat)
a <- character(0)
for (i in 1:max(cl)) {
    a[i] <- paste(paste0(dat[(dat[,1] == i),][,2]), collapse=":")
}
a
#[1] "A:B:C:D" "E:F:G" "H:I:J"
I'm going to leave this for now as is.
A possible application for the igraph library, if you think of your values as an edgelist of paired groups:
library(igraph)
spl <- strsplit(data,":")
combspl <- data.frame(
    grp = rep(seq_along(spl),lengths(spl)),
    val = unlist(spl)
)
cl <- clusters(graph.data.frame(combspl))$membership[-(1:length(spl))]
#A B C E F G H I J D
#1 1 1 2 2 2 3 3 3 1
split(names(cl),cl)
#$`1`
#[1] "A" "B" "C" "D"
#
#$`2`
#[1] "E" "F" "G"
#
#$`3`
#[1] "H" "I" "J"
Or as collapsed text:
sapply(split(names(cl),cl), paste, collapse=";")
# 1 2 3
#"A;B;C;D" "E;F;G" "H;I;J"
a = character(0)
for (i in 1:length(data)){
    # elements of data that share at least one sub-element with data[i]
    shares = sapply(1:length(data), function(j)
        any(unlist(strsplit(data[i],":")) %in% unlist(strsplit(data[j],":"))))
    a[i] = paste(unique(unlist(strsplit(data[shares],":"))), collapse = ":")
}
unique(a)
#[1] "A:B:C:D" "E:F:G" "H:I:J"
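As noted in the question, a single pass like this can miss chains where two elements are only connected through a third. A base R sketch that keeps merging until nothing changes; merge_overlapping() is a hypothetical helper name, and the approach is a plain repeated pairwise merge, not an optimized algorithm:

```r
# Hypothetical helper: merge elements that share sub-elements, repeating
# until no two groups overlap any more.
merge_overlapping <- function(x, sep = ":") {
  groups <- lapply(strsplit(x, sep, fixed = TRUE), unique)
  repeat {
    merged <- FALSE
    i <- 1
    while (i < length(groups)) {
      j <- i + 1
      while (j <= length(groups)) {
        if (length(intersect(groups[[i]], groups[[j]])) > 0) {
          # absorb group j into group i and drop group j
          groups[[i]] <- union(groups[[i]], groups[[j]])
          groups[[j]] <- NULL
          merged <- TRUE
        } else {
          j <- j + 1
        }
      }
      i <- i + 1
    }
    if (!merged) break   # a full pass without merges means we are done
  }
  vapply(groups, paste, character(1), collapse = sep)
}

data <- c("A:B:C", "A:B", "E:F:G", "H:I:J", "B:C:D")
merge_overlapping(data)
```

For large inputs the igraph approach above is likely to scale better, since this sketch compares groups pairwise.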

Dynamically creating named list in R

I need to create named lists dynamically in R as follows.
Suppose there is an array of names.
name_arr<-c("a","b")
And that there is an array of values.
value_arr<-c(1,2,3,4,5,6)
What I want to do is something like this:
list(name_arr[1]=value_arr[1:3])
But R throws an error when I try to do this. Any suggestions as to how to get around this problem?
You can use [[...]] to assign values to keys given by strings:
my.list <- list()
my.list[[name_arr[1]]] <- value_arr[1:3]
You could use setNames. Examples:
setNames(list(value_arr[1:3]), name_arr[1])
#$a
#[1] 1 2 3
setNames(list(value_arr[1:3], value_arr[4:6]), name_arr)
#$a
#[1] 1 2 3
#
#$b
#[1] 4 5 6
Or without setNames:
mylist <- list(value_arr[1:3])
names(mylist) <- name_arr[1]
mylist
#$a
#[1] 1 2 3
mylist <- list(value_arr[1:3], value_arr[4:6])
names(mylist) <- name_arr
mylist
#$a
#[1] 1 2 3
#
#$b
#[1] 4 5 6
Your code throws an error because in list(A = B), A must be a literal name rather than an expression to be evaluated.
You can work around this by building the call as a string and evaluating it with eval(parse()). Here is an example:
eval(parse(text = sprintf('list(%s = value_arr[1:3])',name_arr[1])))
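When there are many names, writing each slice by hand gets tedious. A minimal sketch, assuming the values should be divided into equal consecutive chunks, one per name (that layout is an assumption, it is not stated in the question; chunk_size and named_list are illustrative names):

```r
name_arr <- c("a", "b")
value_arr <- c(1, 2, 3, 4, 5, 6)

chunk_size <- length(value_arr) / length(name_arr)   # 3 values per name

# Label each value with the name of its chunk; fixing the factor levels
# keeps the list in the order of name_arr rather than alphabetical order.
labels <- factor(rep(name_arr, each = chunk_size), levels = name_arr)
named_list <- split(value_arr, labels)
named_list
```

This avoids eval(parse()), which is generally harder to debug than working with names as ordinary strings.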

How to automate making a list of lists in R

I can make this list by hand:
list( list(n=1) , list(n=2), list(n=3) )
But how do I automate this, for instance if I want n to go up to 10? I tried as.list(1:10), which firstly is a different type of data structure, and secondly I couldn't work out how to specify n.
I'm hoping the answer can be expanded to multiple element lists, e.g. all combinations of 1:3 and c('A','B'):
list( list(n=1,z='A') , list(n=2,z='A'), list(n=3,z='A'),
list(n=1,z='B') , list(n=2,z='B'), list(n=3,z='B') )
Background: I'll be using it along the lines of: lapply( outer_list, function(params) do.call(FUN,params) )
UPDATE:
It was difficult to choose which answer to give the tick to. I went with the expand.grid approach as it can scale to more than two parameters more easily; the use of mapply as shown in the comment makes the two examples above look reasonably compact and readable:
outer_list=with( expand.grid(n=1:10,stringsAsFactors=FALSE),
    mapply(list, n=n, SIMPLIFY=FALSE)
    )
outer_list=with( expand.grid(n=1:3,z=c('A','B'), stringsAsFactors=FALSE),
    mapply(list, n=n, z=z, SIMPLIFY=FALSE)
    )
They violate the DRY principle, by repeating the parameter names in the mapply() call, which bothers me a little. So, when it bothers me enough I will use the alply call as shown in Sebastian's answer.
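One way to avoid repeating the parameter names in base R is to pass the columns of the grid to mapply() through do.call(). A sketch, assuming the parameter grid fits comfortably in memory (grid is an illustrative name):

```r
grid <- expand.grid(n = 1:3, z = c("A", "B"),
                    KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)

# Each column of grid becomes a named argument to mapply(), so the
# parameter names are written only once, in expand.grid().
outer_list <- do.call(mapply, c(list(FUN = list, SIMPLIFY = FALSE), grid))

length(outer_list)
outer_list[[4]]
```

Adding a third parameter then only requires another argument to expand.grid().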
You don't need to expand using expand.grid.
L <- mapply(function(x, y) list("n"=x,"z"=y),
rep(1:10, each=10), LETTERS[1:10],
SIMPLIFY=FALSE)
EDIT (see comment below)
L <- mapply(function(x, y) list("n"=x,"z"=y),
rep(1:10, each=length(LETTERS[1:10])), LETTERS[1:10],
SIMPLIFY=FALSE)
vals <- expand.grid(n=1:3, z=c("A", "B"),
KEEP.OUT.ATTRS=FALSE, stringsAsFactors=FALSE)
library(plyr)
alply(vals, 1, as.list)
$`1`
$`1`$n
[1] 1
$`1`$z
[1] "A"
$`2`
$`2`$n
[1] 2
$`2`$z
[1] "A"
$`3`
$`3`$n
[1] 3
$`3`$z
[1] "A"
$`4`
$`4`$n
[1] 1
$`4`$z
[1] "B"
$`5`
$`5`$n
[1] 2
$`5`$z
[1] "B"
$`6`
$`6`$n
[1] 3
$`6`$z
[1] "B"
attr(,"split_type")
[1] "array"
attr(,"split_labels")
n z
1 1 A
2 2 A
3 3 A
4 1 B
5 2 B
6 3 B

Resources