Setting Global variables inside reference class in R - r

I'm a bit confused on global variable assignments after reading quite a lot of stack overflow questions. I have gone through
Global variables in R and other similar questions
I have the following situation. I have 2 global variables current_idx and previous_idx. These 2 global variables are being set by a method in a reference class.
Essentially, using <<- assignment operator should work right ? But, I get this warning
Non-local assignment to non-field names (possibly misspelled?)
Where am I going wrong ?
EDIT
Using assign(current_idx, index, envir = .GlobalEnv) works i.e. I do not get the warning. Can some one shed some light on this.

You are confusing "global variables" and Reference Classes which are a type of environment. Executing <<- will assign to a variable with that name in the parent.frame of the function. If you are only one level down from the .GlobalEnv, it will do the same thing as your assign statement.
If you have a Reference Class item you can assign items inside it by name with:
ref_item$varname <- value
Easier said than done, though. First you need to set up the ReferenceClass properly:
http://www.inside-r.org/r-doc/methods/ReferenceClasses

This is happening because the default method for modifying fields of a reference class from within a reference class method is to use <<-. For example, in:
setRefClass(
"myClass",
fields=list(a="integer"),
methods=list(setA=function(x) a <<- x)
)
You can modify the a field of your reference class via the setA method. Because this is the canonical way of setting fields via methods in reference classes, R assumes that any other use of <<- within a reference method is a mistake. So if you try to assign to a variable that exists in an environment other than the reference class, R "helpfully" warns you that maybe you have a typo since it thinks the only probably use of <<- in a reference method is to modify a reference field.
You can still just assign to global objects with <<-. The warning is just a warning that maybe you are doing something you didn't intend to do. If you intended to write to an object in the global environment, then the warning doesn't apply.
By using assign you are bypassing the check that reference methods carry out to make sure you are not accidentally typoing a field name in an assignment within the reference method, so you don't get the warning. Also, note that assign actually targets the environment you supply, whereas <<- will just find the first object of that name in the lexical search path.
All this said, there are really rare instances where you actually want a reference method do be writing directly to the global environment. You may need to rethink what you are doing. You should ask yourself why those two variables are not just fields in the reference class instead of global variables.

Related

Set an argument of a function as the name of a global variable defined within a function in R

I would like to set a functions argument as the variable name of a global variable defined within a function.
The reason is to create a general function which, connects to a database downloads data into a global variable with the argument of the function as its name.
This would allow to connect to a database once, download data, store them in a variable with a defined name and use it outside of the function. (Alternatively I am also open for different approaches)
test=function(name_argument){
substitute(name_argument)<<-2
}
test("name")
name->2 | The global variable should be callable with the variable name
I have tried using assign(), substitute(), eval() in various forms, without any success.
Does anyone a) know a solution to this and b) can describe the logic behind it. (For example why does substitute seemingly not work with global variables)

My defined R function does not 'save' the changes made to a matrix [duplicate]

I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.

R language: changes to the value of an attribute of an object inside a function is lost after function exits [duplicate]

I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.

Auto assignment

Recently I have been working with a set of R scripts that I inherited from a colleague. It is for me a trusted source but more than once I found in his code auto-assignments like
x <<- x
Is there any scope where such an operation could make sense?
This is a mechanism for copying a value defined within a function into the global environment (or at least, somewhere within the stack of parent of environments): from ?"<<-"
The operators ‘<<-’ and ‘->>’ are normally only used in functions,
and cause a search to be made through parent environments for an
existing definition of the variable being assigned. If such a
variable is found (and its binding is not locked) then its value
is redefined, otherwise assignment takes place in the global
environment.
I don't think it's particularly good practice (R is a mostly-functional language, and it's generally better to avoid function side effects), but it does do something. (#Roland points out in comments and #BrianO'Donnell in his answer [quoting Thomas Lumley] that using <<- is good practice if you're using it to modify a function closure, as in demo(scoping). In my experience it is more often misused to construct global variables than to work cleanly with function closures.)
Consider this example, starting in an empty/clean environment:
f <- function() {
x <- 1 ## assignment
x <<- x ## global assignment
}
Before we call f():
x
## Error: object 'x' not found
Now call f() and try again:
f()
x
## [1] 1
<<-
is a global assignment operator and I would imagine there should hardly ever be a reason to use it because it effectively causes side effects.
The scope to use it would be in any case when one wants to define a global variable or a variable one level up from current environment.
Alan gives a good answer: Use the superassignment operator <<- to write upstairs.
Hadley also gives a good answer: How do you use "<<-" (scoping assignment) in R?.
For details on the 'superassignment' operator see Scope.
Here is some critical information on the operator from the section on Assignment Operators in the R manual:
"The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment."
Thomas Lumley sums it up nicely: "The good use of superassignment is in conjuction with lexical scope, where an environment stores state for a function or set of functions that modify the state by using superassignment."
For example:
x <- NA
test <- function(x) {
x <<- x
}
> test(5)
> x
#[1] 5
That's a simple use here, <<- will do a parent environment search (case of nested functions declarations) and if not found assign in the global environment.
Usually this is a really bad ideaTM as you have no real control on where the variable will be assigned and you have chances it will overwrite a variable used for another purpose somewhere.

How do I manipulate the global environment inside a function in R?

I'd like to remove all the objects from my current environment except two of them, something like this
rm(list=setdiff(ls(),c("current_object_a","current_object_b")))
but I'd like to call it within a function. If I do it now, nothing happens because I'm deleting the environment variables inside the function, not the global environment.
You have to specify the environment to both ls and rm.
rm(list = setdiff(ls(globalenv()),
c("current_object_a", "current_object_b")),
pos = globalenv())
But, really, why do you want to do this? Deleting things out of the global environment from within a function seems like a Bad Thing.
You can specify the environment with either the pos or envir argument
rm(list=setdiff(ls(pos=globalenv()),
c("current_object_a","current_object_b")),
pos=globalenv())
From ?rm
The ‘pos’ argument can specify the environment from which to
remove the objects in any of several ways: as an integer (the
position in the ‘search’ list); as the character string name of an
element in the search list; or as an ‘environment’ (including
using ‘sys.frame’ to access the currently active function calls).
The ‘envir’ argument is an alternative way to specify an
environment, but is primarily there for back compatibility.

Resources