Assign to list index in specific environment using `get` - r

If I make an environment with a list in it, and want to assign values to that list, why does the following fail when using get and assign?
res <- new.env()
res$calls <- vector("list", 100)
res$counter <- 1
## works fine
res$calls[[1]] <- 1
## Fails, why?
get("calls", envir=res)[[get("counter", envir=res)]] <- 2
## doesnt make the assignment
val <- get("calls", envir=res)[[get("counter", envir=res)]]
assign("val", 2, envir=res)

I think the following will address your issue:
get("calls", envir=res)[[get("counter", envir=res)]] <- 2 fails because get is not a replacement function. On the other hand res$calls[[1]] <- 1 is actually a replacement function which you can see if you type help('[[<-'). This is the function used when you make an assignment. I think the reason why get has no replacement counterpart i.e. (get<-) is that there is a specific function to do this, which is called assign (as per #TheTime 's comment).
For the second case val <- get("calls", envir=res)[[get("counter", envir=res)]] is created in the global environment. When you use assign("val", 2, envir=res) a res$val variable is created inside the res environment which you can see below:
> res$val
[1] 2
However, val remains the same on the global environment as 1:
> val
[1] 1
So, You probably won't be able to do the assignment with either get or assign. get won't allow it because it is not a replacement function and ?assign mentions:
assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc.
So, you can just use the normal [[<- assignment method. #Frank in the comments provides a nice way like:
res[[ "calls" ]][[ res[["counter"]] ]] <- 2

Related

Global assignment of list elements within a function, using a variable name for the list

The title is a bit loose. I would like to create a function that takes a list name as a variable, and assigns elements to that list. So far, I can access the list's elements via its name in the my_fun function. I can then modify the list within the function (using the function parameter), and print to verify that the assignment has worked. Of course, I'm not saving any of this to the global environment, so when I call the list outside of the function, it hasn't changed.
my_list <- list('original_element')
my_fun <- function(some_list) {
# check that I can access the list using the parameter name
print(some_list[[1]])
# modify the list
some_list[[1]] <- 'element_assigned_in_function'
# check that the list was modified
print(some_list[[1]])
}
my_fun(my_list)
# [1] "original_element"
# [1] "element_assigned_in_function"
# of course, my_list hasn't been modified in the global environment
my_list
# [[1]]
# [1] "original_element"
My question is, how could you assign new elements to this list within the function and store them in the global environment? If I simply try to make the assignment global with <<-, the following error is thrown:
Error in some_list[[1]] <<- "element_assigned_in_function" :
object 'some_list' not found
I've tried to use assign and set envir = .GlobalEnv, but that doesn't work and I don't think assign can be used with list elements. I suspect that there may be some quoting/unquoting/use of expressions to get this to work, but I haven't been able to figure it out. Thank you.
First of all, I'll be clear that I do not encourage assigning values to variables from inside the function in global environment. Ideally, you should always return the value from the function which you want to change. However, just for demonstration purpose here is a way in which you can change the contents of the list from inside the function
my_fun <- function(some_list) {
list_name <- deparse(substitute(some_list))
some_list[[1]] <- 'element_assigned_in_function'
assign(list_name, some_list, .GlobalEnv)
}
my_list <- list('original_element')
my_fun(my_list)
my_list
#[[1]]
#[1] "element_assigned_in_function"

Assign a variable in R using another variable

I have to run 10's of different permutations with same structure but different base names for the output. to avoid having to keep replacing the whole character names within each formula, I was hoping to great a variable then use paste function to assign the variable to the name of the output..
Example:
var<-"Patient1"
(paste0("cells_", var, sep="") <- WhichCells(object=test, expression = test > 0, idents=c("patient1","patient2"))
The expected output would be a variable called "cells_Patient1"
Then for subsequent runs, I would just copy and paste these 2 lines and change var <-"Patient1" to var <-"Patient2"
[please note that I am oversimplifying the above step of WhichCells as it entails ~10 steps and would rather not have to replace "Patient1" by "Patient2" using Search and Replaced
Unfortunately, I am unable to crate the variable "cells_Patient1" using the above command. I am getting the following error:
Error in variable(paste0("cells_", var, sep = "")) <-
WhichCells(object = test, : target of assignment expands to
non-language object
Browsing stackoverflow, I couldn't find a solution. My understanding of the error is that R can't assign an object to a variable that is not a constant. Is there a way to bypass this?
1) Use assign like this:
var <- "Patient1"
assign(paste0("cells_", var), 3)
cells_Patient1
## [1] 3
2) environment This also works.
e <- .GlobalEnv
e[[ paste0("cells_", var) ]] <- 3
cells_Patient1
3) list or it might be better to make these variables into a list:
cells <- list()
cells[[ var ]] <- 3
cells[[ "Patient1" ]]
## [1] 3
Then we could easily iterate over all such variables. Replace sqrt with any suitable function.
lapply(cells, sqrt)
## $Patient1
## [1] 1.732051

Use of the <<- operator in R [duplicate]

I just finished reading about scoping in the R intro, and am very curious about the <<- assignment.
The manual showed one (very interesting) example for <<-, which I feel I understood. What I am still missing is the context of when this can be useful.
So what I would love to read from you are examples (or links to examples) on when the use of <<- can be interesting/useful. What might be the dangers of using it (it looks easy to loose track of), and any tips you might feel like sharing.
<<- is most useful in conjunction with closures to maintain state. Here's a section from a recent paper of mine:
A closure is a function written by another function. Closures are
so-called because they enclose the environment of the parent
function, and can access all variables and parameters in that
function. This is useful because it allows us to have two levels of
parameters. One level of parameters (the parent) controls how the
function works. The other level (the child) does the work. The
following example shows how can use this idea to generate a family of
power functions. The parent function (power) creates child functions
(square and cube) that actually do the hard work.
power <- function(exponent) {
function(x) x ^ exponent
}
square <- power(2)
square(2) # -> [1] 4
square(4) # -> [1] 16
cube <- power(3)
cube(2) # -> [1] 8
cube(4) # -> [1] 64
The ability to manage variables at two levels also makes it possible to maintain the state across function invocations by allowing a function to modify variables in the environment of its parent. The key to managing variables at different levels is the double arrow assignment operator <<-. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
This makes it possible to maintain a counter that records how many times a function has been called, as the following example shows. Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
new_counter <- function() {
i <- 0
function() {
# do something useful, then ...
i <<- i + 1
i
}
}
The new function is a closure, and its environment is the enclosing environment. When the closures counter_one and counter_two are run, each one modifies the counter in its enclosing environment and then returns the current count.
counter_one <- new_counter()
counter_two <- new_counter()
counter_one() # -> [1] 1
counter_one() # -> [1] 2
counter_two() # -> [1] 1
It helps to think of <<- as equivalent to assign (if you set the inherits parameter in that function to TRUE). The benefit of assign is that it allows you to specify more parameters (e.g. the environment), so I prefer to use assign over <<- in most cases.
Using <<- and assign(x, value, inherits=TRUE) means that "enclosing environments of the supplied environment are searched until the variable 'x' is encountered." In other words, it will keep going through the environments in order until it finds a variable with that name, and it will assign it to that. This can be within the scope of a function, or in the global environment.
In order to understand what these functions do, you need to also understand R environments (e.g. using search).
I regularly use these functions when I'm running a large simulation and I want to save intermediate results. This allows you to create the object outside the scope of the given function or apply loop. That's very helpful, especially if you have any concern about a large loop ending unexpectedly (e.g. a database disconnection), in which case you could lose everything in the process. This would be equivalent to writing your results out to a database or file during a long running process, except that it's storing the results within the R environment instead.
My primary warning with this: be careful because you're now working with global variables, especially when using <<-. That means that you can end up with situations where a function is using an object value from the environment, when you expected it to be using one that was supplied as a parameter. This is one of the main things that functional programming tries to avoid (see side effects). I avoid this problem by assigning my values to a unique variable names (using paste with a set or unique parameters) that are never used within the function, but just used for caching and in case I need to recover later on (or do some meta-analysis on the intermediate results).
One place where I used <<- was in simple GUIs using tcl/tk. Some of the initial examples have it -- as you need to make a distinction between local and global variables for statefullness. See for example
library(tcltk)
demo(tkdensity)
which uses <<-. Otherwise I concur with Marek :) -- a Google search can help.
On this subject I'd like to point out that the <<- operator will behave strangely when applied (incorrectly) within a for loop (there may be other cases too). Given the following code:
fortest <- function() {
mySum <- 0
for (i in c(1, 2, 3)) {
mySum <<- mySum + i
}
mySum
}
you might expect that the function would return the expected sum, 6, but instead it returns 0, with a global variable mySum being created and assigned the value 3. I can't fully explain what is going on here but certainly the body of a for loop is not a new scope 'level'. Instead, it seems that R looks outside of the fortest function, can't find a mySum variable to assign to, so creates one and assigns the value 1, the first time through the loop. On subsequent iterations, the RHS in the assignment must be referring to the (unchanged) inner mySum variable whereas the LHS refers to the global variable. Therefore each iteration overwrites the value of the global variable to that iteration's value of i, hence it has the value 3 on exit from the function.
Hope this helps someone - this stumped me for a couple of hours today! (BTW, just replace <<- with <- and the function works as expected).
f <- function(n, x0) {x <- x0; replicate(n, (function(){x <<- x+rnorm(1)})())}
plot(f(1000,0),typ="l")
The <<- operator can also be useful for Reference Classes when writing Reference Methods. For example:
myRFclass <- setRefClass(Class = "RF",
fields = list(A = "numeric",
B = "numeric",
C = function() A + B))
myRFclass$methods(show = function() cat("A =", A, "B =", B, "C =",C))
myRFclass$methods(changeA = function() A <<- A*B) # note the <<-
obj1 <- myRFclass(A = 2, B = 3)
obj1
# A = 2 B = 3 C = 5
obj1$changeA()
obj1
# A = 6 B = 3 C = 9
I use it in order to change inside map() an object in the global environment.
a = c(1,0,0,1,0,0,0,0)
Say I want to obtain a vector which is c(1,2,3,1,2,3,4,5), that is if there is a 1, let it 1, otherwise add 1 until the next 1.
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] <<- ifelse(a[x]==1, a[x], a[x-1]+1)
})
a
[1] 1 2 3 1 2 3 4 5

R: passing by parameter to function and using apply instead of nested loop and recursive indexing failed

I have two lists of lists. humanSplit and ratSplit. humanSplit has element of the form::
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
humanGene humanReplicate alignment RNAtype
66 DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt 6 reg
68 ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt 5 reg
If you type humanSplit[[1]], it gives the data without name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt
RatSplit is also essentially similar to humanSplit with difference in column order. I want to apply fisher's test to every possible pairing of replicates from humanSplit and ratSplit. Now I defined the following empty vector which I will use to store the informations of my fisher's test
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For fisher's test between two replicates of humanSplit and ratSplit, I define the following function. In the function I use `geneList' which is a data.frame made by reading a file and has form:
> head(geneList)
human rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, where I use a function getGenetype which I already defined in other part of the code. Also x and y are integers :
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After doing all this I do the make matrix eg to use in apply. Here I am basically trying to do something similar to double for loop and then using fisher
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is, when I try to run, it gives the following error when it tries to use function fishertest in apply
Error in humanSplit[[y]] : recursive indexing failed at level 3
Rstudio points out problem in following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
In defining fishertest function, how should I pass ratSplit and humanSplit and already defined function getGenetype?
And how I should use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. The reason it got as far as it did is because the arguments you provided actually resolved. Try to do eg$i by itself, and you'll see that it is returning a vector: the corresponding column in the eg data.frame. You are passing this vector as an index in the i argument. The primary reason your function erred out is because double-bracket indexing ([[) only works with singles, not vectors of length greater than 1. This is a great example of where production/deployed functions would need type-checking to ensure that each argument is a numeric of length 1; often not required for quick code but would have caught this mistake. Had it not been for the [[ limit, your function may have returned incorrect results. (I've been bitten by that many times!)
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and the aggregate it outside of the function, your life will simplify. (pvalue <- c(pvalue, ...) will find pvalue assigned outside the function but will not update it as you want. You are defeating one purpose of writing this into a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply without having to overwrite variables in the parent environment should you try to answer the question "how do I apply this function to all pairs and aggregate it?" Try very hard to have your function not change anything outside of its own environment.
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply in this sense, it it parsing the third argument and, in this example, evaluates it. Since it is simply a call to fishertest(eg$i, eg$j) which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now that you see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, your error above is because
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here is mean()(c(1, 11)) will fail for a couple of reasons: (1) because mean requires an argument and is not getting, and (2) regardless, it does not return a function itself (in a "functional programming" paradigm, easy in R but uncommon for most programmers).
In the example here, mean will accept a single argument which is typically a vector of numerics. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work. You have two options here:
Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Bothing of the following options do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in #1 above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, figure that) and pass that intermediary to apply, or just pass it directly to apply, ala:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
As a side note, there are some gotchas with using apply and family that, if you don't understand, will be confusing. Not the least of which is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregrate.
This is one area where using ddply may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in depth discussion on the bigger picture in which ddply plays a part, read Hadley's Split, Apply, Combine Strategy for Data Analysis paper from JSS.

R anonymous function: capture variables by value

I've defined a list of anonymous functions which use a variable defined in an outer scope.
funclist <- list()
for(i in 1:5)
{
funclist[[i]] <- function(x) print(i)
}
funclist[[1]]('foo')
The output is:
[1] 5
It seems that i is captured by reference. I'd like it to be captured by value, i.e. the output should be
[1] 1
Is there a way to tell R to capture i by value rather than by reference?
When you run a for loop, this creates a variable in the environment the loop is run in, and functions created in the loop are also run from this environment. So whenever you run the functions created in this way that use the index value from the loop, they only have access to the final value, and only as long as that varaible remains (try rm(i) and attempting to fun one of the functions in the list).
What you need to do is bind the index value to the function in their own environment. lapply will do this for you automatically. However, there is a gotcha with lazy evaluation. What you have to do is also force the evaluation of i before creating the anonymous function:
funclist <- lapply(1:5, function(i) {force(i); function(x) print(i)})
funclist[[1]]('foo')
[1] 1
funclist[[5]]('foo')
[1] 5
My read on what you want is to store a value inside the function's environment when the function is defined, and then hold onto that value for internal computations.
For that, you want a closure:
i <- 3
test <- local({
i <- i
function(x) x[i]
})
test(letters[1:5]) # returns 'c'
i <- 5
test(letters[1:5]) # still returns 'c' (i.e. i is local to the test closure)
Is that what you wanted?

Resources