Why does "<<-" messes with the scope of a function in Shiny - r

I ran into a very interesting problem. I wrote a function and wanted to check the output of some variables within the function, as well as the returned result.
observe({
  result <- myFunction()
})
myFunction <- function() {
  # some calculations
  # ...
  # create a data frame from the previously calculated variables.
  # I was interested in the result of problematicVariable;
  # that's why I wanted to make it global for checking after
  # closing down the Shiny app
  problematicVariable <<- data.frame(...)
  if (someCondition) {
    # ...
  } else {
    # some calculations
    # ...
    # now I used problematicVariable for the first time
    foo <- data.frame(problematicVariable$bar, problematicVariable$foo)
  }
}
That gave me
data.frame: arguments imply differing number of rows: ...
However, since I had made problematicVariable global, I ran the line where the app crashed manually (foo <- data.frame(problematicVariable$bar, problematicVariable$foo)). There was absolutely no problem. So I thought that this was strange...
I got rid of the <<- and changed it to problematicVariable <- ..., and now it works.
So, using <<- to assign problematicVariable somehow made problematicVariable unavailable in the if...else.
Why does <<- cause behaviour like this? Why does it mess with the scope?!

<<- doesn't always create variables in the global environment. It will, however, assign in the parent (enclosing) scope. Sometimes the parent scope is the same as the global environment.
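For illustration, a minimal sketch of that behaviour (make_counter is a made-up example, not code from the question):

make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1 # updates count in the enclosing environment, not the global one
    count
  }
}
counter <- make_counter()
counter() # 1
counter() # 2
exists("count") # FALSE -- nothing was created in the global environment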
?assign is what you want. But I don't see any reason to create global variables from inside a function. Just return the variable - code is easier to debug that way and you'll get fewer unexpected results.
EDIT: I suspected that this was a dupe. Good discussion about this can be found here.

Related

Unit testing functions with global variables in R

Preamble: package structure
I have an R package that contains an R/globals.R file with the following content (simplified):
utils::globalVariables("COUNTS")
Then I have a function that simply uses this variable. For example, R/addx.R contains a function that adds a number to COUNTS
addx <- function(x) {
  COUNTS + x
}
This is all fine when running devtools::check() on my package; there's no complaint about COUNTS being out of the scope of addx().
Problem: writing a unit test
However, say I also have a tests/testthat/test-addx.R file with the following content:
test_that("addition works", expect_gte(fun(1), 1))
The content of the test doesn't really matter here, because when running devtools::test() I get an "object 'COUNTS' not found" error.
What am I missing? How can I correctly write this test (or set up my package)?
What I've tried to solve the problem
Adding utils::globalVariables("COUNTS") to R/addx.R, either before, inside or after the function definition.
Adding utils::globalVariables("COUNTS") to tests/testthtat/test-addx.R in all places I could think of.
Manually initializing COUNTS (e.g., with COUNTS <- 0 or <<- 0) in all places of tests/testthtat/test-addx.R I could think of.
Reading some examples from other packages on GitHub that use a similar syntax (source).
I think you misunderstand what utils::globalVariables("COUNTS") does. It just declares that COUNTS is a global variable, so when the code analysis sees
addx <- function(x) {
  COUNTS + x
}
it won't complain about the use of an undefined variable. However, it is up to you to actually create the variable, for example by an explicit
COUNTS <- 0
somewhere in your source. I think if you do that, you won't even need the utils::globalVariables("COUNTS") call, because the code analysis will see the global definition.
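A minimal sketch of that (assuming a starting value of 0 makes sense for COUNTS):

# In any file under R/, at the top level (not inside a function):
COUNTS <- 0

# addx() then finds COUNTS in the package namespace
addx <- function(x) {
  COUNTS + x
}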
Where you would need it is when you're doing some nonstandard evaluation, so that it's not obvious where a variable comes from. Then you declare it as a global, and the code analysis won't worry about it. For example, you might get a warning about
subset(df, Col1 < 0)
because it appears to use a global variable named Col1, but of course that's fine, because the subset() function evaluates its condition in a non-standard way, letting you use column names without writing df$Col1.
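In that situation a declaration along these lines (a sketch, with Col1 standing in for whatever names your code uses) keeps the check quiet without defining anything:

# R/globals.R -- tell the code analysis these names come from non-standard
# evaluation, not from missing global variables
utils::globalVariables(c("Col1"))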
@user2554330's answer is great for many things.
If I understand correctly, you have a COUNTS that needs to be updateable, so putting it in the package environment might be an issue.
One technique you can use is local environments.
Two alternatives:
If it will always be referenced in one function, it might be easiest to change the function from
myfunc <- function(...) {
  # do something
  COUNTS <- COUNTS + 1
}
to
myfunc <- local({
  COUNTS <- NA
  function(...) {
    # do something
    COUNTS <<- COUNTS + 1
  }
})
What this does is create a local environment "around" myfunc, so when it looks for COUNTS, it will be found immediately. Note that it reassigns using <<- instead of <-, since the latter would not update the version of the variable in that enclosing environment.
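For example, you can seed and inspect the closed-over value from outside (a sketch; the example above initializes COUNTS to NA, so seed it before incrementing):

assign("COUNTS", 0, envir = environment(myfunc)) # replace the initial NA
myfunc()
get("COUNTS", envir = environment(myfunc)) # 1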
You can actually access this COUNTS from another function in the package:
otherfunc <- function(...) {
  COUNTScopy <- get("COUNTS", envir = environment(myfunc))
  COUNTScopy <- COUNTScopy + 1
  assign("COUNTS", COUNTScopy, envir = environment(myfunc))
}
(Feel free to name it COUNTS here as well, I used a different name to highlight that it doesn't matter.)
While the use of get and assign is a little inconvenient, it should only be required twice per function that needs to do this.
Note that the user can get to this if needed, but they'll need to use similar mechanisms. Perhaps that's a problem; in my packages where I need some form of persistence like this, I have used convenience getter/setter functions.
You can place an environment within your package, and then use it like a named list within your package functions:
E <- new.env(parent = emptyenv())
E$COUNTS <- 0 # initialize, so the first increment has something to add to

myfunc <- function(...) {
  # do something
  E$COUNTS <- E$COUNTS + 1
}
otherfunc <- function(...) {
  E$COUNTS <- E$COUNTS + 1
}
We do not need the get/assign pair of functions, since E (a horrible name, chosen for its brevity) should be visible to all functions in your package. If you don't need the user to have access, then keep it unexported. If you want users to be able to access it, then exporting it via the normal package mechanisms should work.
Note that with both of these, if the user unloads and reloads the package, the COUNTS value will be lost/reset.
I'll provide a third option, in case the user wants/needs direct access, or you don't want to do this type of value management within your package.
Make the user provide it at all times. For this, add an argument to every function that needs it, and have the user pass an environment. I recommend an environment because most R objects are passed by value, whereas environments allow reference semantics (pass by reference).
For instance, in your package:
myfunc <- function(..., countenv) {
  stopifnot(is.environment(countenv))
  # do something
  countenv$COUNT <- countenv$COUNT + 1
}

otherfunc <- function(..., countenv) {
  countenv$COUNT <- countenv$COUNT + 1
}

new_countenv <- function(init = 0) {
  E <- new.env(parent = emptyenv())
  E$COUNT <- init
  E
}
where new_countenv is really just a convenience function.
The user would then use your package as:
mycount <- new_countenv()
myfunc(..., countenv = mycount)
otherfunc(..., countenv = mycount)

R function - Error argument is missing, with no default

I am testing a simple function in R that should transform a time series object into a data frame.
However, the code works fine outside the function, but within the function it gives me an error.
> fx <- function(AMts) {
    x <- as.data.frame(AMts)
    return(x)
  }
> fx()
I expected to have the data frame x in my environment, but I got
Error in as.data.frame(AMts) : argument "AMts" is missing, with no default
If it's inside a function, you need "<<-" as the assignment operator instead of the traditional "<-". <<- assigns the object outside the function (here, in the global environment), so it still exists after the function is done running.
> fx <- function(AMts) {
    x <<- as.data.frame(AMts) # "<<-" is what saves "x" in your environment
    return(x) # remove this line if you like; it prints "x" to the console, but it isn't what saves it
  }
> fx(AMts)
EDIT: As the commenters have already pointed out, you aren't including any parameters in your function. Above I made it fx(AMts) to make it clear you need to pass in AMts to the function too.
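A more conventional alternative (a sketch) is to skip <<- entirely, return the value, and assign it where the function is called:

fx <- function(AMts) {
  as.data.frame(AMts) # the last expression is returned
}
x <- fx(AMts) # the result is saved by assigning it in the calling environment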

R matrix/dataframe doesn't seem to have an environment

I seem to be having the same issue as the one seen here, so I started checking the environments of my frames/matrices. I have a character matrix and a table that was imported as a list. I have written a user-defined function, which I have debugged, and I can confirm that it runs through step by step, assigning values from the character matrix to the entries that need changing in the list.
{
  i <- 1
  j <- NROW(v)
  while (i < j) {
    if (v[i] %in% Convert[, 1]) {
      n <- match(v[i], Convert[, 1])
      v[i] <- Convert[n, 2]
    }
    i <- i + 1
  }
}
That is the code in case you need to see what I am doing.
The problem is that whenever I check the environment of either the list or the matrix, I get NULL (using environment()). I tried using assign() to create a new matrix. It seems, based on the link above, that this is an environment issue, but if the lists/matrices used have no environment, what is one to do?
Post note: I have tried converting these to different formats (using as.character or as.list), but I don't know if this is even relevant while the environment issue above is unresolved.
environment() only works for functions, not for variables.
In fact, environment() returns the enclosing environment, which makes sense only for a function; it is not the binding environment you are interested in for a variable.
If you want the binding environment, use the pryr package:
library(pryr)
where("V")
Here is an example:
e <- new.env()
e$test <- function(x) { x }
environment(e$test)
You can see the environment here is the global environment, because you defined the function there; but obviously the binding environment (that is, the environment where you find the name of the variable) is e.
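To see the two side by side (a sketch building on the example above):

library(pryr)
environment(e$test) # R_GlobalEnv -- the enclosing environment of the function
where("test", env = e) # returns e -- the binding environment, where the name lives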
See http://adv-r.had.co.nz/Environments.html to understand the problem better.

Modifying many objects in enclosing environment of a function

Usually in R, a function first creates a new environment and does its work inside it. I would like to have a function that defines/reinitializes a whole lot of things accessible to the parent environment of the function.
I know I can use the <<- operator for specific variables, but here I have a lot of functions, variables, even environments being defined, and I would like the function to have a parameter that chooses whether to use the parent environment or not.
Currently, I'm calling the function and then attaching its environment if needed, as follows:
init <- function() {
  things <- 0
  ICI <<- environment()
  success <- TRUE
  return(success)
}
init(); attach(ICI)
It works fine, but is there a way to change the current environment of the function to be the parent environment, so that I can define a parameter of the function that switches this behaviour on or off?
Actually, attach() can be called within the function, and the attachment is not destroyed when returning to the parent environment, so the following allows one to set everything back in the parent environment:
init <- function(transparent = FALSE) {
  # compute values
  things <- 0
  success <- TRUE
  # follow the "set back variables" argument
  ICI <- environment()
  if (transparent) {
    attach(ICI) # everything is transmitted to the parent environment
  } else {
    ICI <<- ICI # only transmit a handle for the environment
  }
  return(success)
}
init(); # attach(ICI)
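For instance, a sketch of how the two modes look from the caller's side:

init(transparent = TRUE)
things # 0 -- found through the attachment "ICI" on the search path
detach("ICI") # remove the attachment when done

init(transparent = FALSE)
ICI$things # 0 -- reached through the exported handle instead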

Why can't I pass a dataset to a function?

I'm using the package glmulti to fit models to several datasets. Everything works if I fit one dataset at a time.
So for example:
output <- glmulti(y~x1+x2,data=dat,fitfunction=lm)
works just fine.
However, if I create a wrapper function like so:
analyze <- function(dat)
{
  out <- glmulti(y~x1+x2, data=dat, fitfunction=lm)
  return(out)
}
it simply doesn't work. The error I get is
error in evaluating the argument 'data' in selecting a method for function 'glmulti'
Unless there is a data frame named dat in the workspace, it doesn't work. If I use results <- lapply(list_of_datasets, analyze), it doesn't work either.
So what gives? Without this wrapper, I can't lapply a list of datasets through this function. If anyone has thoughts or ideas on why this is happening or how I can get around it, that would be great.
Example 2:
dat <- list_of_data[[1]]
analyze(dat)
works fine. So in a sense it is ignoring the argument and just literally looking for a data frame named dat. It behaves the same no matter what I call it.
I guess this is -yet another- problem due to the definition of environments in the parse tree of S4 methods (one of the reasons why I am not a big fan of S4...).
It can be shown by adding quotes around dat:
> analyze <- function(dat)
+ {
+   out <- glmulti(y~x1+x2, data="dat", fitfunction=lm)
+   return(out)
+ }
> analyze(test)
Initialization...
Error in eval(predvars, data, env) : invalid 'envir' argument
You should in the first place send this information to the maintainers of the package, as they know how they deal with the environments internally. They'll have to adapt the functions.
A -very dirty- workaround for yourself is to put dat in the global environment and delete it afterwards:
analyze <- function(dat)
{
  assign("dat", dat, envir=.GlobalEnv) # put dat in the global env
  out <- glmulti(y~x1+x2, data=dat, fitfunction=lm)
  remove(dat, envir=.GlobalEnv) # delete dat again from the global env
  return(out)
}
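A slightly safer variant of the same dirty trick (a sketch; it still clobbers any existing dat in the global environment) uses on.exit() so the cleanup runs even if glmulti() throws an error:

analyze <- function(dat)
{
  assign("dat", dat, envir = .GlobalEnv)
  on.exit(remove("dat", envir = .GlobalEnv)) # clean up even on error
  out <- glmulti(y~x1+x2, data=dat, fitfunction=lm)
  return(out)
}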
EDIT:
Just for clarity, this is really about the worst solution possible, but I couldn't manage to find anything better. If somebody else gives you a solution where you don't have to touch your global environment, by all means use that one.
