I am trying to determine all the objects in a script. ( specifically to get all the dataframes but I'll settle for all the assigned objects ie vectors lists etc.)
Is there a way of doing this. Should I make the script run in its own session and then somehow get the objects from that session rather than rely on the global environment.
Use the second argument to source() when you execute the script. For example, here's a script:
x <- y + 1
z <- 2
which I can put in script.R. Then I will execute it in its own environment using the following code:
x <- 1 # This value will *not* change
y <- 2 # This value will be visible to the script
env <- new.env()
source("script.R", local = env)
Now I can print the values, and see that the comments are correct
x # the original one
# [1] 1
ls(env) # what was created?
# [1] "x" "z"
env$x # this is the one from the script
# [1] 3
I had a similar question and found an answer. I am copying the answer from my other post here.
I wrote the following function, get.objects(), that returns all the objects created in a script:
get.objects <- function(path2file = NULL, exception = NULL, source = FALSE, message = TRUE) {
library("utils")
library("tools")
# Step 0-1: Possibility to leave path2file = NULL if using RStudio.
# We are using rstudioapi to get the path to the current file
if(is.null(path2file)) path2file <- rstudioapi::getSourceEditorContext()$path
# Check that file exists
if (!file.exists(path2file)) {
stop("couldn't find file ", path2file)
}
# Step 0-2: If .Rmd file, need to extract the code in R chunks first
# Use code in https://felixfan.github.io/extract-r-code/
if(file_ext(path2file)=="Rmd") {
require("knitr")
tmp <- purl(path2file)
path2file <- paste(getwd(),tmp,sep="/")
source = TRUE # Must be changed to TRUE here
}
# Step 0-3: Start by running the script if you are calling an external script.
if(source) source(path2file)
# Step 1: screen the script
summ_script <- getParseData(parse(path2file, keep.source = TRUE))
# Step 2: extract the objects
list_objects <- summ_script$text[which(summ_script$token == "SYMBOL")]
# List unique
list_objects <- unique(list_objects)
# Step 3: find where the objects are.
src <- paste(as.vector(sapply(list_objects, find)))
src <- tapply(list_objects, factor(src), c)
# List of the objects in the Global Environment
# They can be in both the Global Environment and some packages.
src_names <- names(src)
list_objects = NULL
for (i in grep("GlobalEnv", src_names)) {
list_objects <- c(list_objects, src[[i]])
}
# Step 3bis: if any exception, remove from the list
if(!is.null(exception)) {
list_objects <- list_objects[!list_objects %in% exception]
}
# Step 4: done!
# If message, print message:
if(message) {
cat(paste0(" ",length(list_objects)," objects were created in the script \n ", path2file,"\n"))
}
return(list_objects)
}
To run it, you need a saved script. Here is an example of a script:
# This must be saved as a script, e.g, "test.R".
# Create a bunch of objects
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()
# List the objects. If you want to list all the objects except some, you can use the argument exception. Here, I listed as exception "p1.
get.objects()
get.objects(exception = "p1", message = FALSE)
Note that the function also works for external script and R markdown.
If you run an external script, you will have to run the script before. To do so, change the argument source to TRUE.
Related
When you save a variable in an R data file using save, it is saved under whatever name it had in the session that saved it. When I later go to load it from another session, it is loaded with the same name, which the loading script cannot possibly know. This name could overwrite an existing variable of the same name in the loading session. Is there a way to safely load an object from a data file into a specified variable name without risk of clobbering existing variables?
Example:
Saving session:
x = 5
save(x, file="x.Rda")
Loading session:
x = 7
load("x.Rda")
print(x) # This will print 5. Oops.
How I want it to work:
x = 7
y = load_object_from_file("x.Rda")
print(x) # should print 7
print(y) # should print 5
If you're just saving a single object, don't use an .Rdata file, use an .RDS file:
x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
I use the following:
loadRData <- function(fileName){
#loads an RData file, and returns it
load(fileName)
get(ls()[ls() != "fileName"])
}
d <- loadRData("~/blah/ricardo.RData")
You can create a new environment, load the .rda file into that environment, and retrieve the object from there. However, this does impose some restrictions: either you know what the original name for your object is, or there is only one object saved in the file.
This function returns an object loaded from a supplied .rda file. If there is more than one object in the file, an arbitrary one is returned.
load_obj <- function(f)
{
env <- new.env()
nm <- load(f, env)[1]
env[[nm]]
}
You could also try something like:
# Load the data, and store the name of the loaded object in x
x = load('data.Rsave')
# Get the object by its name
y = get(x)
# Remove the old object since you've stored it in y
rm(x)
Similar to the other solutions above, I load variables into an environment variable. This way if I load multiple variables from the .Rda, those will not clutter my environment.
load("x.Rda", dt <- new.env())
Demo:
x <- 2
y <- 1
save(x, y, file = "mydata.Rda")
rm(x, y)
x <- 123
# Load 'x' and 'y' into a new environment called 'dt'
load("mydata.Rda", dt <- new.env())
dt$x
#> [1] 2
x
#> [1] 123
Rdata file with one object
assign('newname', get(load('~/oldname.Rdata')))
In case anyone is looking to do this with a plain source file, rather than a saved Rdata/RDS/Rda file, the solution is very similar to the one provided by #Hong Ooi
load_obj <- function(fileName) {
local_env = new.env()
source(file = fileName, local = local_env)
return(local_env[[names(local_env)[1]]])
}
my_loaded_obj = load_obj(fileName = "TestSourceFile.R")
my_loaded_obj(7)
Prints:
[1] "Value of arg is 7"
And in the separate source file TestSourceFile.R
myTestFunction = function(arg) {
print(paste0("Value of arg is ", arg))
}
Again, this solution only works if there is exactly one file, if there are more, then it will just return one of them (probably the first, but that is not guaranteed).
I'm extending the answer from #ricardo to allow selection of specific variable if the .Rdata file contains multiple variables (as my credits are low to edit an answer). It adds some lines to read user input after listing the variables contained in the .Rdata file.
loadRData <- function(fileName) {
#loads an RData file, and returns it
load(fileName)
print(ls())
n <- readline(prompt="Which variable to load? \n")
get(ls()[as.integer(n)])
}
select_var <- loadRData('Multiple_variables.Rdata')
Following from #ricardo, another example of using (effectively) a separate environment
load_rdata <- function(file_path) {
res <- local({
load(file_path)
return(get(ls()))
})
return(res)
}
Similar caveats with only expects one object to be returned
Setup:
Say I have two R functions, x() and y().
# Defining function x
# Simple, but what it does is not really important.
x <- function(input)
{output <- input * 10
return(output)}
x() is contained within an .R file and stored in the same directory as y(), but within a different file.
# Defining function y;
# What's important is that Function y's output depends on function x
y <- function(variable){
source('x.R')
output <- x(input = variable)/0.5
return(output)
}
When y() is defined in R, the environment populates with y() only, like such:
However, after we actually run y()...
# Demonstrating that it works
> y(100)
[1] 2000
the environment populates with x as well, like such:
Question:
Can I add code within y to prevent x from populating the R environment after it has ran? I've built a function that's dependent upon several source files which I don't want to keep in the environment after the function has run. I'd like to avoid unnecessarily crowding the R environment when people use the primary function, but adding a simple rm(SubFunctionName) has not worked and I haven't found any other threads on the topic. Any ideas? Thanks for your time!
1) Replace the source line with the following to cause it to be sourced into the local environment.
source('x.R', local = TRUE)
2) Another possibility is to write y like this so that x.R is only read when y.R is sourced rather than each time y is called.
y <- local({
source('x.R', local = TRUE)
function(variable) x(input = variable) / 0.5
})
3) If you don't mind having x defined in y.R then y.R could be written as follows. Note that this eliminates having any source statements in the code separating the file processing and code.
y <- function(variable) {
x <- function(input) input * 10
x(input = variable) / 0.5
}
4) Yet another possibility for separating the file processing and code is to remove the source statement from y and read x.R and y.R into the same local environment so that outside of e they can only be accessed via e. In that case they can both be removed by removing e.
e <- local({
source("x.R", local = TRUE)
source("y.R", local = TRUE)
environment()
})
# test
ls(e)
## [1] "x" "y"
e$y(3)
## [1] 60
4a) A variation of this having similar advantages but being even shorter is:
e <- new.env()
source("x.R", local = e)
source("y.R", local = e)
# test
ls(e)
## [1] "x" "y"
e$y(3)
## [1] 60
5) Yet another approach is to use the CRAN modules package or the klmr/modules package referenced in its README.
I've just read about delayedAssign(), but the way you have to do it is by passing the name of the delayed variable as the first parameter. Is there a way to do it via direct assignment?
e.g.:
x <- delayed_variable("Hello World")
rather than
delayedAssign("x","Hello World")
I want to create a variable that will throw an error if accessed (use-case is obviously more complex), so for example:
f <- function(x){
y <- delayed_variable(stop("don't use y"))
x
}
f(10)
> 10
f <- function(x){
y <- delayed_variable(stop("don't use y"))
y
}
f(10)
> Error in f(10) : don't use y
No, you can't do it that way. Your example would be fine with the current setup, though:
f <- function(x){
delayedAssign("y", stop("don't use y"))
y
}
f(10)
which gives exactly the error you want. The reason for this limitation is that delayed_variable(stop("don't use y")) would create a value which would trigger the error when evaluated, and assigning it to y would evaluate it.
Another version of the same thing would be
f <- function(x, y = stop("don't use y")) {
...
}
Internally it's very similar to the delayedAssign version.
I reached a solution using makeActiveBinding() which works provided it is being called from within a function (so it doesn't work if called directly and will throw an error if it is). The main purpose of my use-case is a smaller part of this, but I generalised the code a bit for others to use.
Importantly for my use-case, this function can allow other functions to use delayed assignment within functions and can also pass R CMD Check with no Notes.
Here is the function and it gives the desired outputs from my question.
delayed_variable <- function(call){
#Get the current call
prev.call <- sys.call()
attribs <- attributes(prev.call)
# If srcref isn't there, then we're not coming from a function
if(is.null(attribs) || !"srcref" %in% names(attribs)){
stop("delayed_variable() can only be used as an assignment within a function.")
}
# Extract the call including the assignment operator
this_call <- parse(text=as.character(attribs$srcref))[[1]]
# Check if this is an assignment `<-` or `=`
if(!(identical(this_call[[1]],quote(`<-`)) ||
identical(this_call[[1]],quote(`=`)))){
stop("delayed_variable() can only be used as an assignment within a function.")
}
# Get the variable being assigned to as a symbol and a string
var_sym <- this_call[[2]]
var_str <- deparse(var_sym)
#Get the parent frame that we will be assigining into
p_frame <- parent.frame()
var_env <- new.env(parent = p_frame)
#Create a random string to be an identifier
var_rand <- paste0(sample(c(letters,LETTERS),50,replace=TRUE),collapse="")
#Put the variables into the environment
var_env[["p_frame"]] <- p_frame
var_env[["var_str"]] <- var_str
var_env[["var_rand"]] <- var_rand
# Create the function that will be bound to the variable.
# Since this is an Active Binding (AB), we have three situations
# i) It is run without input, and thus the AB is
# being called on it's own (missing(input)),
# and thus it should evaluate and return the output of `call`
# ii) It is being run as the lhs of an assignment
# as part of the initial assignment phase, in which case
# we do nothing (i.e. input is the output of this function)
# iii) It is being run as the lhs of a regular assignment,
# in which case, we want to overwrite the AB
fun <- function(input){
if(missing(input)){
# No assignment: variable is being called on its own
# So, we activate the delayed assignment call:
res <- eval(call,p_frame)
rm(list=var_str,envir=p_frame)
assign(var_str,res,p_frame)
res
} else if(!inherits(input,"assign_delay") &&
input != var_rand){
# Attempting to assign to the variable
# and it is not the initial definition
# So we overwrite the active binding
res <- eval(substitute(input),p_frame)
rm(list=var_str,envir=p_frame)
assign(var_str,res,p_frame)
invisible(res)
}
# Else: We are assigning and the assignee is the output
# of this function, in which case, we do nothing!
}
#Fix the call in the above eval to be the exact call
# rather than a variable (useful for debugging)
# This is in the line res <- eval(call,p_frame)
body(fun)[[c(2,3,2,3,2)]] <- substitute(call)
#Put the function inside the environment with all
# all of the variables above
environment(fun) <- var_env
# Check if the variable already exists in the calling
# environment and if so, remove it
if(exists(var_str,envir=p_frame)){
rm(list=var_str,envir=p_frame)
}
# Create the AB
makeActiveBinding(var_sym,fun,p_frame)
# Return a specific object to check for
structure(var_rand,call="assign_delay")
}
When you save a variable in an R data file using save, it is saved under whatever name it had in the session that saved it. When I later go to load it from another session, it is loaded with the same name, which the loading script cannot possibly know. This name could overwrite an existing variable of the same name in the loading session. Is there a way to safely load an object from a data file into a specified variable name without risk of clobbering existing variables?
Example:
Saving session:
x = 5
save(x, file="x.Rda")
Loading session:
x = 7
load("x.Rda")
print(x) # This will print 5. Oops.
How I want it to work:
x = 7
y = load_object_from_file("x.Rda")
print(x) # should print 7
print(y) # should print 5
If you're just saving a single object, don't use an .Rdata file, use an .RDS file:
x <- 5
saveRDS(x, "x.rds")
y <- readRDS("x.rds")
all.equal(x, y)
I use the following:
loadRData <- function(fileName){
#loads an RData file, and returns it
load(fileName)
get(ls()[ls() != "fileName"])
}
d <- loadRData("~/blah/ricardo.RData")
You can create a new environment, load the .rda file into that environment, and retrieve the object from there. However, this does impose some restrictions: either you know what the original name for your object is, or there is only one object saved in the file.
This function returns an object loaded from a supplied .rda file. If there is more than one object in the file, an arbitrary one is returned.
load_obj <- function(f)
{
env <- new.env()
nm <- load(f, env)[1]
env[[nm]]
}
You could also try something like:
# Load the data, and store the name of the loaded object in x
x = load('data.Rsave')
# Get the object by its name
y = get(x)
# Remove the old object since you've stored it in y
rm(x)
Similar to the other solutions above, I load variables into an environment variable. This way if I load multiple variables from the .Rda, those will not clutter my environment.
load("x.Rda", dt <- new.env())
Demo:
x <- 2
y <- 1
save(x, y, file = "mydata.Rda")
rm(x, y)
x <- 123
# Load 'x' and 'y' into a new environment called 'dt'
load("mydata.Rda", dt <- new.env())
dt$x
#> [1] 2
x
#> [1] 123
Rdata file with one object
assign('newname', get(load('~/oldname.Rdata')))
In case anyone is looking to do this with a plain source file, rather than a saved Rdata/RDS/Rda file, the solution is very similar to the one provided by #Hong Ooi
load_obj <- function(fileName) {
local_env = new.env()
source(file = fileName, local = local_env)
return(local_env[[names(local_env)[1]]])
}
my_loaded_obj = load_obj(fileName = "TestSourceFile.R")
my_loaded_obj(7)
Prints:
[1] "Value of arg is 7"
And in the separate source file TestSourceFile.R
myTestFunction = function(arg) {
print(paste0("Value of arg is ", arg))
}
Again, this solution only works if there is exactly one file, if there are more, then it will just return one of them (probably the first, but that is not guaranteed).
I'm extending the answer from #ricardo to allow selection of specific variable if the .Rdata file contains multiple variables (as my credits are low to edit an answer). It adds some lines to read user input after listing the variables contained in the .Rdata file.
loadRData <- function(fileName) {
#loads an RData file, and returns it
load(fileName)
print(ls())
n <- readline(prompt="Which variable to load? \n")
get(ls()[as.integer(n)])
}
select_var <- loadRData('Multiple_variables.Rdata')
Following from #ricardo, another example of using (effectively) a separate environment
load_rdata <- function(file_path) {
res <- local({
load(file_path)
return(get(ls()))
})
return(res)
}
Similar caveats with only expects one object to be returned
I have the following function:
example_Foo <- function( ...,FigureFolder){
# check what variables are passed through the function
v_names <- as.list(match.call())
variable_list <- v_names[2:(length(v_names)-2)]
# create file to store figures
subDir <- c(paste(FigureFolder,"SavedData",sep = "\\"))
}
Obviously this is just the start of the function, but I have already run into some problems. Here, I am trying to define the directory where I eventually want my results to be saved. An example of using the function is:
weight <- c(102,20,30,04,022,01,220,10)
height <- c(102,20,30,04,022,01,220,10)
catg <- c(102,20,30,04,022,01,220,10)
catg <- matrix(height,nrow = 2)
FigureFolder <- "C:\\exampleDat"
# this is the function
example_Foo(catg,FigureFolder)
This generates the following error:
Error in paste(FigureFolder, "SavedData", sep = "\\") :
argument "FigureFolder" is missing, with no default
which i'm guessing is due to the function not knowing what 'FigureFolder' is, my question is how do I pass this string through the function?
Because you do not use a named argument, the FigureFolder argument is put into .... Just use:
example_Foo(catg, FigureFolder = FigureFolder)
In addition:
example_Foo <- function( ...,FigureFolder){
# check what variables are passed through the function
v_names <- as.list(match.call())
variable_list <- v_names[2:(length(v_names)-2)]
# create file to store figures
subDir <- c(paste(FigureFolder,"SavedData",sep = "\\"))
}
could also be replaced by:
example_Foo <- function( ...,FigureFolder){
# check what variables are passed through the function
variable_list = list(...)
# create file to store figures
subDir <- c(paste(FigureFolder,"SavedData",sep = "\\"))
}
Or even simpler:
example_Foo <- function(variable_list, FigureFolder){
# create file to store figures
subDir <- c(paste(FigureFolder,"SavedData",sep = "\\"))
}
Keeping your code simple makes it easier to read (also for yourself) and easier to use and maintain.
You need to providea value for FigureFolder, eg
example_Foo(catg,FigureFolder="FigureFolder")