Can I use the output of a function in another R file?

I built a function that retrieves data from an Azure table via a REST API. I sourced the function, so I can reuse it in other R scripts.
The function is as below:
Connect_To_Azure_Table(Account, Container, Key)
and it returns as an output a table called Azure_table. The very last line of the code in the function is
head(Azure_table)
In my next script, I'm going to call that function and execute some data transformation.
However, while the function executes (and my Azure_table is previewed), I don't seem to be able to use it in the code to start performing my data transformation. For example, this is the beginning of my ETL function:
library(dplyr)
library(vroom)
library(tidyverse)
library(stringr)
#Connects to datasource
if (exists("Connect_To_Azure_Table", mode = "function")) {
  source("ConnectToAzureTable.R")
}
Account <- "storageaccount"
Container <- "Usage"
Key <- "key"
Connect_To_Azure_Table(Account, Container, Key)
# Performs ETL process
colnames(Azure_table) <- gsub("value.", "", colnames(Azure_table)) # Removes prefix from column headers
Both the function and the table get a warning. But while the function executes anyway, the Azure_table line throws an error:
> # Performs ETL process
>
> colnames(Azure_table) <- gsub("value.", "", colnames(Azure_table)) # Removes prefix from column headers
Error in is.data.frame(x) : object 'Azure_table' not found
What should I be doing to use Azure_table in my script?
Thanks in advance!
~Alienvolm

You can ignore the RStudio warnings; they are based on a heuristic, and in this case it's very imprecise and misleading.
However, there are some errors in your code.
Firstly, you're only sourcing the code if the function was already defined. Surely you want it the other way round: source the code if the function does not yet exist. But furthermore, that check is completely redundant: if you haven't sourced the code which defines the function yet, the function won't be defined. The existence check is unnecessary and misleading. Remove it:
source("ConnectToAzureTable.R")
Secondly, when you’re calling the function you’re not assigning its return value to any name. You probably meant to write the following:
Azure_table <- Connect_To_Azure_Table(Account, Container, Key)
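Putting both fixes together, the top of the ETL script might look like this (a sketch; it also assumes the last line of Connect_To_Azure_Table is Azure_table rather than head(Azure_table), so that the full table, and not just its first six rows, is returned):

library(tidyverse)

# Load the function definition
source("ConnectToAzureTable.R")

Account <- "storageaccount"
Container <- "Usage"
Key <- "key"

# Assign the return value so the rest of the script can use it
Azure_table <- Connect_To_Azure_Table(Account, Container, Key)

# Performs ETL process; fixed = TRUE makes gsub() treat "value." literally
# (as a regex, "." would match any character)
colnames(Azure_table) <- gsub("value.", "", colnames(Azure_table), fixed = TRUE)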

Related

RStudio: Statement to clear memory [duplicate]

I was hoping to make a global function that would clear my workspace and dump my memory. I called my function "cleaner" and want it to execute the following code:
remove(list = ls())
gc()
I tried making the function in the global environment but when I run it, the console just prints the text of the function. In my function file to be sourced:
cleaner <- function() {
  remove(list = ls())
  gc()
  # I tried adding return(invisible()) to see if that helped, but no luck
}
cleaner()
Even when I make the function in the script I want it to run (cutting out any potential errors with sourcing), the storage dump seems to work, but it still doesn't clear the workspace.
Two thoughts about this: your code does not delete all objects; to also remove the hidden ones, use
rm(list = ls(all.names = TRUE))
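(Hidden objects are those whose names start with a dot; a quick illustration, assuming an otherwise empty workspace:)

.x <- 1
ls()                  # character(0): plain ls() skips dot-names
ls(all.names = TRUE)  # ".x"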
There is also the command gctorture(), which provokes garbage collection on (nearly) every memory allocation (as the man page says). It is intended for R developers to ferret out memory protection bugs:
cleaner <- function() {
  # Turn it on
  gctorture(TRUE)
  # Clear workspace
  rm(list = ls(all.names = TRUE, envir = sys.frame(-1)),
     envir = sys.frame(-1))
  # Turn it off (important, or it gets very slow)
  gctorture(FALSE)
}
If this procedure is used within a function, there is the following problem: since the function has its own stack frame, only the objects within this stack frame are deleted; they still exist outside. Therefore, it must be specified separately with sys.frame(-1) that only the higher-level stack frame should be considered. The variables are then only deleted within the function that calls cleaner(), and in cleaner itself once the function is exited.
But this also means that the function may only be called from the top level in order to work correctly. (You can use sys.frames(), which lists all higher-level stack frames, to build something that also avoids this problem if really necessary.)
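For example, called from the top level (a quick check, using the cleaner() defined above):

x <- 1
.hidden <- 2
cleaner()               # removes x, .hidden, and cleaner itself from .GlobalEnv
ls(all.names = TRUE)
# character(0)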

A note on graphics::curve() in R CMD check

I use the following code in my own package.
graphics::curve(foo(x))
When I run R CMD check, it gives the following NOTE. How do I get rid of the NOTE?
> checking R code for possible problems ... NOTE
foo: no visible binding for global variable 'x'
Undefined global functions or variables:
x
Edit for the answers:
I tried the answer as follows:
function(...) {
  utils::globalVariables("x")
  graphics::curve(sin(x))
}
But it did not work. So now I use the following code instead:
function(...) {
  x <- 1 # This is not used; I define an object "x" only to avoid the NOTE.
  graphics::curve(sin(x))
}
The last code removes the NOTE.
Hmm, I guess the answer is correct, but it does not work for me.
Two things:
Add
utils::globalVariables("x")
This can be added in a file of its own (e.g., globals.R), or (my technique) within the file that contains that code.
It is not an error to include the same named variables in multiple files, so the same-file technique will preclude you from accidentally removing it when you remove one (but not another) reference. From the help docs: "Repeated calls in the same package accumulate the names of the global variables".
This must go outside of any function declarations, on its own at the top level. It is included in the package source (it needs to be, in order to have an effect on the check process), but otherwise it has no impact on the package.
Add
importFrom(utils,globalVariables)
to your package NAMESPACE file, since you are now using that function (unless you want another CHECK warning about objects not found in the global environment :-).
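Put together, the fix might look like this (a sketch; the file names and plot_sine() are made up for illustration):

# R/globals.R -- top-level call, outside any function
utils::globalVariables("x")

# R/plot.R -- the code that previously triggered the NOTE
plot_sine <- function(...) {
  graphics::curve(sin(x))
}

and in NAMESPACE:

importFrom(utils, globalVariables)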

R parLapply: How to (or Can we) access an object within the parallel code

I am trying to use parLapply to run a custom function. Since my actual code and data is not very reader friendly, I am creating a pseudo code for reference. I do the following:
a) First, I create a custom function. This function takes an argument, say "Argument1". Argument1 is a list object, which is what I use to run the parLapply on later.
b) Inside the function, based on Argument1, I create a subset called subset_data (subsetting on the full dataset which is supplied while calling parLapply).
c) After getting subset_data, I obtain a list of unique items for Variable2 and then further subset it depending on the number of unique items in Variable2.
d) Finally I run a function (SomeOtherFunction) which takes subset_data2 as the argument.
SomeCustomFunction <- function(Argument1) {
  subset_data <- OriginalData[which(OriginalData$Variable1 == Argument1), ]
  some_other_variable <- unique(subset_data$Variable2)
  for (object in some_other_variable) {
    subset_data2 <- subset_data[which(subset_data$Variable2 == object), ]
    FinalOutput <- SomeOtherFunction(subset_data2)
  }
  return(FinalOutput)
}
SomeOtherFunction <- function(subset_data2) {
  # Do some computation here
}
Next I can create clusters in this way:
library(doParallel) # provides registerDoParallel()
cl <- parallel::makeCluster(2, type = "PSOCK")
registerDoParallel(cl)
And supply the objects Argument1, OriginalData by calling clusterExport and then finally run parLapply by supplying SomeCustomFunction and a list for Argument1 (suppose Argument1_list).
clusterExport(cl = cl, list("Argument1", "OriginalData"), envir = environment())
zz <- parLapply(cl = cl, fun = SomeCustomFunction, Argument1 = Argument1_list)
However, in this case, when I run parLapply, I get an error saying
Error in get(name, envir = envir) : object 'subset_data2' not found
In this case, I was assuming that since subset_data2 is being created within the first function, the object subset_data2 will get supplied automatically. Clearly this is not happening.
Is there a way for me supply this 2nd subset (subset_data2) within the function SomeCustomFunction without passing it to the cluster when calling ClusterExport?
If the question is not clear, please let me know and I can modify it accordingly. Thanks in advance.
P.S. I read this question: using parallel's parLapply: unable to access variables within parallel code, but in my case I do not call parLapply inside my function.
In the related question you mention, the top answer passes clusterExport a character vector of variable names, whereas you pass a list. Also, help(clusterExport) reveals: "varlist: character vector of names of objects to export".
Also, you're missing a " after Argument1 here: list("Argument1,"OriginalData, but I'm guessing that's only the sample code you posted, not in your real code.
PS: It's a step in the right direction that you put some code, but your question will get more responses if you put sample data and code that can be directly pasted and run to reproduce the error.
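Based on both points, the corrected calls might look like this (a sketch, assuming OriginalData, SomeOtherFunction, and Argument1_list already exist; note that SomeOtherFunction has to be exported as well, because the workers call it from inside SomeCustomFunction):

library(parallel)

cl <- makeCluster(2, type = "PSOCK")
# varlist must be a character vector of object names, not a list
clusterExport(cl, varlist = c("OriginalData", "SomeOtherFunction"),
              envir = environment())
zz <- parLapply(cl, Argument1_list, SomeCustomFunction)
stopCluster(cl)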

Externally set default argument of a function from a package

I am building a package with functions that have default arguments. I would like to find a clean way to set the values of the default arguments once the function has been imported.
My first attempt was to set the default values to unknown objects (within the package). Then when I load the package, I would have an external script that would assign a value to the unknown objects.
But it does not seem very clean since I am compiling a function with an unknown object.
My issue is that I will need other people to use the functions, and since they have many arguments I want to keep the code as concise as possible. And many arguments can be set by a configuration script before running the program.
So, for instance, I define in my package:
function_try <- function(x = VAL) {
  return(x)
}
I compile the package and load it, and then I have an external script that does the following (or reads it from a config file):
VAL <- "hello"
And then a user of the function can just run
function_try()
I would use options for that. So your function looks like:
function_try <- function(x = getOption("mypackage.default.value")) x
In your external script you make sure that the option value is set:
options(mypackage.default.value = "hello")
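A quick check of how the pieces interact (assuming both snippets above have been run):

function_try()          # "hello" -- picked up from the option
function_try(x = "bye") # "bye" -- an explicit argument still wins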
IMHO that is a clean solution, because anybody reading your function will see at first sight that a certain options value is used as a default and has also a clear understanding of how to overwrite this value if needed.
I would also define a fallback value in your package's .onLoad to make sure that the value is defined in the first place. You can then even react to this fallback value in your functions and issue a meaningful warning if a function realizes that the external script did, for whatever reason, not provide a new value.
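A minimal sketch of such a fallback (assuming the package is called mypackage):

.onLoad <- function(libname, pkgname) {
  # Only set the fallback if the user has not already set the option
  if (is.null(getOption("mypackage.default.value"))) {
    options(mypackage.default.value = "fallback")
  }
  invisible()
}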

No visible binding for '<<-' Assignment

I have solved variants of the "no visible binding" notes that one gets when checking their package. However, I am unable to solve the situation when applied to the '<<-' assignment.
Specifically, I had defined and used a local variable in several functions, such as:
fName <- function(df) {
  eval({vName <<- 0}, envir = environment(fName))
}
However when I run check() from devtools, I get the error:
fName: no visible binding for '<<-' assignment to 'vName'
So, I tried using a different syntax, as follows:
fName <- function(df) {
  assign("vName", 0, envir = environment(fName))
}
But got the check() error:
cannot add bindings to a locked environment
When I tried:
fName <- function(df) {
  assign("vName", 0, envir = environment(-1))
}
I got the error:
use of NULL environment is defunct
So, my question is how I can accomplish what the <<- assignment does without getting a NOTE from devtools' check().
Thank you.
The easy answer is: don't use <<- in your package. One alternative way you can make assignments to an environment (but not a meaningful one) is to create a locked binding.
e <- new.env()
e$vName <- 0L
lockBinding("vName", e)
vName
# Error: object 'vName' not found
with(e, vName)
# [1] 0
e$vName <- 5
# Error in e$vName <- 5 : cannot change value of locked binding for 'vName'
You can also lock an environment, but not a meaningful one.
lockEnvironment(e)
rm(vName, envir = e)
# Error in rm(vName, envir = e) :
# cannot remove bindings from a locked environment
Have a look at help(bindenv); it's a good read.
Updated: since you mentioned you might be wanting to make assignments later on rather than at load time, have a read of help(globalVariables). From the help page:
For globalVariables, the names supplied are of functions or other objects that should be regarded as defined globally when the check tool is applied to this package. The call to globalVariables will be included in the package's source. Repeated calls in the same package accumulate the names of the global variables.
I don't know if it helps after all these years. I had the same problem, and what I did to solve it is:
utils::globalVariables(c("global_var"))
Put this R code somewhere inside the R directory (save it as an R file). Whenever you assign a global variable, also assign it locally, like so:
global_var <<- 1
global_var <- 1
That worked for me.
