Garbage collection of a COM object in R

I want to be able to open an Excel session from R, write to it, and then close the Excel session from R. While I can do all of this from within a single function, I am trying to generalize the code that cleans up Excel. However, when I call gc() from a function, passing in the Excel object, it does not garbage collect. Below is the code:
opentest <- function() {
  excel <- comCreateObject("Excel.Application")
  comSetProperty(excel, "Visible", TRUE)
  comSetProperty(excel, "DisplayAlerts", FALSE)
  comSetProperty(excel, "SheetsInNewWorkbook", 1)
  wb <- comGetProperty(excel, "Workbooks")
  wb <- comInvoke(wb, "Add")
  excel
}

cleanupexcel <- function(excelobj) {
  comInvoke(excelobj, "Quit")
  rm(excelobj, envir=globalenv())
  eapply(env=globalenv(), gc)
}
The two functions are then called like this:
excelobj<- opentest()
cleanupexcel(excelobj)
When I call the two functions above, I can still see the Excel session running in my task manager. However, if I call gc() after returning from cleanupexcel(), it kills the Excel session successfully.
Any ideas on how I can garbage-collect successfully from a generic function, or is there some other issue here?

Here's a small change to your code that should work (I'm on Linux now, so I can't test it).
The main fix is to wrap the excel instance in an environment and return that instead.
The cleanup function can then access the instance and remove it (ensuring no reference to it remains) before calling gc():
opentest <- function() {
  excel <- comCreateObject("Excel.Application")
  comSetProperty(excel, "Visible", TRUE)
  comSetProperty(excel, "DisplayAlerts", FALSE)
  comSetProperty(excel, "SheetsInNewWorkbook", 1)
  wb <- comGetProperty(excel, "Workbooks")
  wb <- comInvoke(wb, "Add")

  # wrap excel in an environment
  env <- new.env(parent=emptyenv())
  env$instance <- excel
  env
}

cleanupexcel <- function(excel) {
  comInvoke(excel$instance, "Quit")
  rm("instance", envir=excel)  # drop the last remaining reference
  gc()
}
myexcel <- opentest()
cleanupexcel(myexcel)
Note that your old code requires the variable to be named "excelobj", since you remove it from within the cleanupexcel function. That's not great.
OK, there are very subtle issues at play, so here's a reproducible example without excel:
opentest <- function() {
  excel <- new.env()
  reg.finalizer(excel, function(x) { cat("FINALIZING EXCEL!\n") }, FALSE)

  # wrap excel in an environment
  env <- new.env(parent=emptyenv())
  env$instance <- excel
  env
}

cleanupexcel <- function(excel) {
  print(excel$instance)  # cat() can't handle an environment, so use print()
  rm("instance", envir=excel)
  gc()
}
myexcel <- opentest()
cleanupexcel(myexcel)
# Prints "FINALIZING EXCEL!"
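Building on reg.finalizer(), the cleanup could itself be registered as a finalizer on the wrapper environment, so Excel is quit even if the caller never calls cleanupexcel(). A minimal sketch, assuming the same comCreateObject/comInvoke API as above:
opentest <- function() {
  excel <- comCreateObject("Excel.Application")
  env <- new.env(parent=emptyenv())
  env$instance <- excel
  # quit Excel when the wrapper is collected, or at the end of the R session
  reg.finalizer(env, function(e) {
    if (!is.null(e$instance)) comInvoke(e$instance, "Quit")
  }, onexit=TRUE)
  env
}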

Related

knitr: Create an environment to share variables across chunks that is deleted when finished

I'm developing an R package (https://github.com/rcst/rmaxima) that provides an interface to the computer algebra system Maxima. I want to include a knitr engine so that it can be used directly with knitr. The package has functions to start/spawn a child process and then send commands to and fetch results from Maxima. This way, it should be possible to carry results over between chunks.
The interface works by creating a new object of an Rcpp class. Creating the object spawns a child process; deleting the object stops that process.
I want the engine to start a new child process each time the document is knit()ted, so that the result is reproducible. I'm thinking that I could create an extra environment that binds the interface object. The engine checks whether this object exists in that environment; if it doesn't, it is created, otherwise the engine can proceed directly to sending code to the interface. When knit() exits, it leaves the scope of its environment and all variables within that environment are deleted automatically. This way, I need not stop the child process explicitly, because the interface object gets deleted and the process is stopped automatically.
But I have no clue how to go about it. Any hints very much appreciated.
yihui provides an answer here.
In essence, (a) one keeps a flag in an environment enclosing the engine function (via local()) to check whether the engine is running, and (b) inspects the list of chunk labels to determine whether the current chunk is the last one, triggering tear-down after it has been processed:
last_label <- function(label = knitr::opts_current$get('label')) {
  if (knitr:::child_mode()) return(FALSE)
  labels <- names(knitr::knit_code$get())
  tail(labels, 1) == label
}

knitr::knit_engines$set(maxima = local({
  mx <- FALSE  # is a Maxima process running?
  function(options) {
    if (!mx) {
      maxima$startMaxima()
      mx <<- TRUE
    }
    ... # run maxima code
    if (last_label(options$label)) {
      maxima$stopMaxima()
      mx <<- FALSE
    }
  }
}))
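Once the engine is registered, chunks in the document use it by name. A minimal chunk might look like this (the Maxima input is only an example; the maxima object with its startMaxima()/stopMaxima() methods is assumed to come from the package):
```{maxima}
integrate(sin(x), x);
```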
For completeness, I also came up with a solution that works, but is less robust.
Objects that go out of scope are automatically removed in R; however, actual deletion happens during R's garbage collection (gc()), which cannot be controlled directly. So to get an object removed when knit() finishes, that object needs to be created in the environment of knit(), which is some levels above the engine call.
In principle, one can register a function that does the actual clean-up via on.exit() of knit(), whose environment can be retrieved with parent.frame(n=...). Note that all objects of that scope are still present at the moment the expressions registered with on.exit() are called.
maxima.engine <- function(options) {
  e <- parent.frame(n = sys.parent() - 2)  # the environment of knit()
  if (!exists("mx", envir = e)) {
    message("starting maxima on first chunk")
    assign(x = "mx", value = new(rmaxima:::RMaxima), envir = e)
    assign(x = "execute", value = get("mx", envir = e)$execute, envir = e)
    assign(x = "stopMaxima", value = get("mx", envir = e)$stopMaxima, envir = e)
    # quit maxima "on.exit"ing of knit()
    # eval() doesn't work on "special" primitive functions,
    # do.call() does ... this may break in the future
    # see https://stat.ethz.ch/pipermail/r-devel/2013-November/067874.html
    do.call("on.exit", list(quote(stopMaxima()), add = TRUE), envir = e)
  }
  code <- options$code
  out <- character(0)
  for (i in seq_along(code))
    out <- c(out, eval(call("execute", code[i]), envir = e))
  knitr::engine_output(options, code, out)
}
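The do.call("on.exit", ...) trick can be seen in isolation with a toy sketch (no Maxima involved): a helper registers a cleanup expression in its caller's frame, where a plain on.exit() call would only affect the helper itself:
register_cleanup <- function() {
  caller <- parent.frame()
  # on.exit() is a special primitive; invoking it through do.call() with
  # envir= makes it take effect in the caller's frame instead of this one
  do.call("on.exit", list(quote(cat("cleaning up\n")), add = TRUE), envir = caller)
}

f <- function() {
  register_cleanup()
  cat("doing work\n")
}

f()
# doing work
# cleaning up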

Loading an .rds data file within a function to the environment

I'm new to functions and I would like to load my data with a function.
The function appears to be correct, but the file is not saved as a data frame in the environment, while this does happen when the same code is run outside a function.
This is my script:
library(readr)  # for read_rds()
library(here)   # for here()

read_testdata <- function(file) {
  Dataset_test <- read_rds(here("foldername", file))
}
read_testdata("filename")
Can someone spot my error?
After some thinking I spotted my problem: the assignment happened inside the function, so nothing was kept in the calling environment. The correct code should be this:
read_testdata <- function(file) {
  read_rds(here("foldername", file))  # the function's value is the data frame
}
Dataset_test <- read_testdata("filename.rds")
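The underlying rule: assignments inside a function are local to that call's environment and disappear when the function returns, so the value has to be returned and assigned by the caller. A two-line illustration:
f <- function() {
  x <- 1  # local to f(); gone when f() returns
}
f()
exists("x")
# [1] FALSE (assuming no other x in the workspace)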

How To Create R Data Frame From List In Loop

I'm having trouble returning data frames from a loop in R. I have a set of functions that reads in files and turns them into data frames for the larger project to use/visualize.
I have a list of file names to pass:
# list of files to read
frameList <- c("apples", "bananas", "pears")
This function iterates over the list and runs the functions to create the data frames if they are not already present.
populateFrames <- function(){
for (frame in frameList){
if (exists(frame) && is.data.frame(get(frame))){
# do nothing
}
else {
frame <- clean_data(gather_data(frame))
}
}
}
When executed, the function runs with no errors, but does not save any data frame to the environment.
I can manually run the same thing and that saves a data frame:
# manually create "apples" data frame
apples <- clean_data(gather_data(frameList[1]))
From reading similar questions here, I see that assign() is used for this kind of thing. But just as before, the code works when run manually, yet when put inside the loop no data frame is saved to the environment.
# returns a data frame, "apples" to the environment
assign(x = frame[1], value = clean_data(gather_data(frame[1])))
Solutions, following the principle of "change as little about the OP's implementation as possible".
You have two problems here.
1. Your function is not returning anything, so any changes that happen are stuck in the environment of the function.
2. I think you're expecting the re-assignment of frame in the else statement to re-assign that element of frameList. It doesn't.
This is the NOT RECOMMENDED way of doing it, where you assign a variable in the function's parent environment. In this case you populate the frames as a side effect, mutating frameList in the parent environment. Mutating the input is generally something you want to avoid if you want to practice defensive programming.
populateFrames <- function(){
for (i in seq_along(frameList)){
if (exists(frameList[[i]]) && is.data.frame(get(frameList[[i]]))){
# do nothing
}
else {
frameList[[i]] <<- clean_data(gather_data(frameList[[i]]))
}
}
}
This is the RECOMMENDED version, where you return the new frameList (which means you have to assign the result to a variable):
populateFrames <- function(){
for (i in seq_along(frameList)){
if (exists(frameList[[i]]) && is.data.frame(get(frameList[[i]]))){
# do nothing
}
else {
frameList[[i]] <- clean_data(gather_data(frameList[[i]]))
}
}
frameList
}
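Usage then looks like this, overwriting the vector of names with the populated list:
frameList <- populateFrames()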
To avoid global variable assignments (which are typically a no-no) altogether, try lapply:
lapply(
  frameList,
  function(frame) {
    if (exists(frame) && is.data.frame(get(frame))) {
      get(frame)  # return the existing data frame, not just its name
    } else {
      clean_data(gather_data(frame))
    }
  }
)
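As with the previous version, the result must be captured; naming the list by frameList makes each frame addressable by name afterwards:
frames <- lapply(frameList, function(frame) {
  if (exists(frame) && is.data.frame(get(frame))) get(frame)
  else clean_data(gather_data(frame))
})
names(frames) <- frameList
frames$apples  # the "apples" data frame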

How to read all the files in a folder using R and create objects with the same file names?

I need to create a function in R that reads all the files in a folder (let's assume that all files are tables in tab-delimited format) and creates objects with the same names in the global environment. I did something similar to this (see code below); I was able to write a function that reads all the files in the folder, makes some changes in the first column of each file, and writes it back to the folder. But I couldn't find out how to assign the read files to objects that will stay in the global environment.
changeCol1 <- function() {
  filesInfolder <- list.files()
  for (i in seq_along(filesInfolder)) {
    wrkngFile <- read.table(filesInfolder[i])
    wrkngFile[,1] <- gsub(0, 1, wrkngFile[,1])
    write.table(wrkngFile, file = filesInfolder[i], quote = F, sep = "\t")
  }
}
You are much better off assigning them all to elements of a named list (and it's pretty easy to do, too):
changeCol1 <- function() {
  filesInfolder <- list.files()
  data <- lapply(filesInfolder, function(fname) {
    wrkngFile <- read.table(fname)
    wrkngFile[,1] <- gsub(0, 1, wrkngFile[,1])
    write.table(wrkngFile, file=fname, quote=FALSE, sep="\t")
    wrkngFile
  })
  names(data) <- filesInfolder
  data
}
a_list_full_of_data <- changeCol1()
Also, F will come back to haunt you some day: unlike TRUE and FALSE, T and F are not reserved words and can be reassigned.
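A quick demonstration of the hazard:
F <- 1       # perfectly legal: F is an ordinary variable bound to FALSE by default
isFALSE(F)   # FALSE -- any code that relied on F now misbehaves
FALSE <- 1   # error: FALSE is a reserved word and cannot be reassigned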
Add this to your loop after making the changes:
assign(filesInfolder[i], wrkngFile, envir=globalenv())
If you want to put them into a list instead, one way would be to declare a list outside your loop:
mylist <- list()
Then, within your loop, do like so:
mylist[[filesInfolder[i]]] <- wrkngFile
And then you can access each object by looking at:
mylist[[filename]]
from the global environment.
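Putting the pieces together, under the same assumptions as the question's code:
changeCol1 <- function() {
  filesInfolder <- list.files()
  mylist <- list()
  for (i in seq_along(filesInfolder)) {
    wrkngFile <- read.table(filesInfolder[i])
    wrkngFile[,1] <- gsub(0, 1, wrkngFile[,1])
    write.table(wrkngFile, file = filesInfolder[i], quote = FALSE, sep = "\t")
    mylist[[filesInfolder[i]]] <- wrkngFile  # store under the file's name
  }
  mylist
}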

Loading R objects and creating graphs with Rscript

I am loading all the .RData files in a directory, one at a time. Depending on the objects within the .RData files, I want to create graphs. But the print statement returns NULL, so my assignment to x is not working. What am I doing wrong?
for (i in dir("c:\\data", pattern = "^data")) {
  tmpenv <- new.env()
  load(i, envir=tmpenv)
  for (y in ls(envir=tmpenv)) {
    x <- tmpenv$y
    print(head(x))
  }
}
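The likely culprit: $ does not evaluate its argument, so tmpenv$y looks for an object literally named "y" (which doesn't exist, hence NULL). Indexing with [[ evaluates y and looks up the name it contains:
for (i in dir("c:\\data", pattern = "^data")) {
  tmpenv <- new.env()
  load(i, envir=tmpenv)
  for (y in ls(envir=tmpenv)) {
    x <- tmpenv[[y]]  # [[ uses the name stored in y; $ would not
    print(head(x))
  }
}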
