Creating and serializing / saving a global variable from within a NAMESPACE in R

I would like to create a function within a package with a NAMESPACE that will save some variables. The problem is that when load() is called on the resulting .Rdata file, R tries to load the namespace of the package that contained the function that created the file, even though that package need not be loaded.
This example function lives in a package with a namespace:
create.global.function <- function(x, FUN, ...) {
    environment(FUN) <- .GlobalEnv
    assign(".GLOBAL.FUN", function(x) { FUN(x, ...) }, envir = .GlobalEnv)
    environment(.GLOBAL.FUN) <- .GlobalEnv
    save(list = ls(envir = .GlobalEnv, all.names = TRUE),
         file = "/tmp/.Rdata",
         envir = .GlobalEnv)
}
The environment(.GLOBAL.FUN) <- .GlobalEnv calls are not sufficient: attaching gdb to the R process confirms that R is serializing a NAMESPACESXP here, carrying the name of the package namespace, and the load fails because the receiving R instance is unable to load that namespace.
Is it possible to fully strip the namespace out of .GLOBAL.FUN before I save it, so that it can be loaded into other R instances without trying to load the namespace?

@JorisMeys: snowfall and the others do not offer exactly this functionality.
snowfall uses sfExport (from clusterFunctions.R in snowfall) to export local and global objects to the slave nodes, and this in turn uses sfClusterCall, which is a wrapper around the clusterCall function from snow:
res <- sfClusterCall(assign, name, val, env = globalenv(),
                     stopOnError = FALSE)
The snow library is loaded on the clients, which gets around any namespace issues; as I mentioned in the last sentence of my question, I would like to avoid loading the namespace there.
Furthermore, snowfall seems to make simplifying assumptions, for example that the nodes share an NFS mount point for shared data (e.g. the sfSource function in clusterFunctions.R).
I am more interested in a case where a node saves an .Rdata file and then scp's it to another node that need not have the package namespace loaded.
It seems I can, for now, solve my original problem by using eval.parent and substitute, so that the closure is constructed in the calling frame rather than enclosed in the package namespace:
assign(".GLOBAL.FUN",
eval.parent(substitute(function(y) { FUN(y, ...) })),
env=.GlobalEnv)
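To verify, one can load the file in a fresh R session where the package is neither loaded nor installed (an illustrative check):
load("/tmp/.Rdata")
environment(.GLOBAL.FUN)
# should print <environment: R_GlobalEnv>, without any package namespace being loaded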

R best practices: do I need to "unset" a RefClass?

BLUF: What are the risks of creating an environment whose parent is .GlobalEnv, and calling setRefClass within this environment?
I have a package (repository on Github) that loads the contents of an R file provided by HDFql. This wrapper contains a call to setRefClass. After trying a lot of different things (most of which failed) I settled on sourcing the wrapper into an environment that is a child of .GlobalEnv. The environment itself lives in an environment contained in the package. This nesting was required to get around binding errors, because the call to setRefClass fails if it is executed inside an environment whose ancestor is the package namespace. The global environment seemed to be the only environment suitable for the setRefClass evaluation.
However, I'm a bit worried about creating and using an environment in a package whose parent is .GlobalEnv, and making calls to setRefClass inside this environment. What are potential pitfalls when doing this? Are there best practices for removing or "unsetting" the RefClass when finished? Is there a better solution I am not thinking of?
I have included some sample code below, although it is not reproducible; if you want a reproducible example, you can clone the package repository and/or install it with devtools. The code in question lives in the function hql_load() in file connect.r.
# "hql" is an empty environment exported by the package
constants = new.env(parent = .GlobalEnv)
source(constants.file, local = constants)
assign("constants", constants, envir = hql)
where constants.file contains the code
hdfql_cursor_ <- setRefClass("hdfql_cursor_",
    fields = list(address = "numeric"),
    methods = list(
        finalize = function() {
            .Call("_hdfql_cursor_destroy", .self$address, PACKAGE = "HDFqlR")
        }
    )
)

Function imported from dependency not found, requires library(dependency)

I am trying to create an R package that uses functions from another package (gamlss.tr).
The function I need from the dependency is gamlss.dist::TF (gamlss.dist is loaded alongside gamlss.tr), but it is referenced in my code as simply TF within a call to gamlss.tr::gen.trun.
When I load gamlss.tr manually with library(), this works. However, when I rely on the functions of the dependency being automatically imported by my package through @import, I get an "object not found" error as soon as TF is accessed.
My attempt to be more explicit and reference the function I need as gamlss.dist::TF resulted in a different error ("unexpected '::'").
Any tips on how to use this function in my package would be much appreciated!
The code below reproduces the problem if incorporated into a clean R package (as done in this .zip), built and loaded with document("/path/to/package"):
#' @import gamlss gamlss.tr gamlss.dist
NULL

#' Use GAMLSS
#'
#' Generate a truncated distribution and use it.
#' @export
use_gamlss <- function() {
    print("gen.trun():")
    gamlss.tr::gen.trun(par = 0, family = TF)
    # Error in inherits(object, "gamlss.family") : object 'TF' not found
    # gamlss.tr::gen.trun(par = 0, family = gamlss.dist::TF)
    # Error in parse(text = fname) : <text>:1:1: unexpected '::'
    y <- rTFtr(1000, mu = 10, sigma = 5, nu = 5)
    print("trun():")
    truncated_dist <- gamlss.tr::trun(par = 0, family = TF, local = TRUE)
    model <- gamlss(y ~ 1, family = truncated_dist)
    print(model)
}
use_gamlss() will only start working once a user calls library(gamlss.tr).
This is due to bad design of gamlss.tr, in particular the trun.x functions: they take character vectors instead of family objects, and they evaluate everything in the function environment instead of the calling environment.
To work around this, you have to make sure that gamlss.dist is in the search path of the execution environment of the gamlss.tr functions. (This is why @import-ing it in your package does not help: it would need to be @import-ed in gamlss.tr.)
This can be achieved by adding it to Depends: of your package.
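For example, a minimal DESCRIPTION entry (the R version constraint is illustrative):
Depends:
    R (>= 3.5.0),
    gamlss.dist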
If you want to avoid that attaching your package also attaches gamlss.dist, you could also add the following at the top of use_gamlss:
nsname <-"gamlss.dist"
attname <- paste0("package:", nsname)
if (!(attname %in% search())) {
attachNamespace(nsname)
on.exit(detach(attname, character.only = TRUE))
}
This would temporarily attach gamlss.dist if it is not attached already.
You can read more on namespaces in R in Hadley Wickham's "Advanced R".

Employ environments to handle package-data in package-functions

I recently wrote an R extension. The functions use data contained in the package and must therefore load it, and subroutines also need to access the data.
This is the approach I took:
main <- function(...) {
    data(data)
    sub <- function(..., data = data) {...}
    ...
}
I'm unhappy with the fact that the data resides in .GlobalEnv, so it still hangs around after the function has terminated (which also undermines the idea of passing it down via arguments).
Please put me on the right track! How do you employ environments, when you have to handle package-data in package-functions?
It looks like you are looking for the LazyData directive in your DESCRIPTION file:
LazyData: yes
Otherwise, data() has an envir argument you can use to control the environment into which your data is loaded; for example, if you wanted the data to be loaded inside main, you could use:
main <- function(...) {
    data(data, envir = environment())
    sub <- function(..., data = data) {...}
    ...
}
If the data is needed for your functions, not for the user of the package, it should be saved in a file called sysdata.rda located in the R directory.
From Writing R Extensions:
Two exceptions are allowed: if the R subdirectory contains a file sysdata.rda (a saved image of R objects: please use suitable compression as suggested by tools::resaveRdaFiles) this will be lazy-loaded into the namespace/package environment – this is intended for system datasets that are not intended to be user-accessible via data.
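Creating such a file is a one-liner; a minimal sketch (the object name lookup_table is purely illustrative):
lookup_table <- data.frame(key = letters[1:3], value = 1:3)
save(lookup_table, file = "R/sysdata.rda", compress = "xz")
# or equivalently: usethis::use_data(lookup_table, internal = TRUE)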

extracting source code from an R package

I am trying to install the R package sowas, but unfortunately it is too old to work with current versions of R.
According to the author, you can access the code by loading it with the source() function, but I have not been able to figure out how to do that.
Any help is appreciated.
Here is a link to the package I described as it is not a CRAN package: http://tocsy.pik-potsdam.de/wavelets/
The .zip file is a Windows binary and as such it won't be too interesting. What you'll want to look at is the contents of the .tar.gz archive. You can extract those contents and then look at the code in the R subdirectory.
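For example, from within R (a sketch; the archive file name is illustrative):
untar("sowas_0.93.tar.gz", exdir = "sowas-src")
list.files("sowas-src/sowas/R")  # the package's R source files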
You could also update the package to work with new versions of R so that you can actually build and install the package. To do so you could unpack the .tar.gz as before but now you'll need to add a NAMESPACE file. This is just a plaintext file at the top of the package directory that has a form like:
export(createar)
export(createwgn)
export(criticalvaluesWCO)
export(criticalvaluesWSP)
export(cwt.ts)
export(plot.wt)
export(plotwt)
export(readmatrix)
export(readts)
export(rk)
export(wco)
export(wcs)
export(writematrix)
export(wsp)
where you have an export statement for any function in the package that you actually want to be able to use. If a function isn't exported, the other functions in the package still have access to it, but the user can't use it (as easily). Once you do that you should be able to build and install the package.
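For example, once the NAMESPACE file is in place (a sketch; paths are illustrative):
devtools::install("sowas")  # run from the parent directory of the unpacked source
# or, without devtools: system("R CMD INSTALL sowas")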
I took the liberty of doing some of this already. I haven't actually taken the time to figure out which functions are useful and should be exported and just assumed that if a help page was written for the function that it should be exported and if there wasn't a help page then I didn't export it. I used Rd2roxygen to convert the help pages to roxygen code (because that's how I roll) and had to do a little bit of cleanup after that but it seems to install just fine.
So if you have the devtools package installed you should actually be able to install the version I modified directly by using the following commands:
library(devtools)
install_github("SOWAS", "Dasonk")
Personally I would recommend that you go the route of adding the NAMESPACE file and what not directly as then you'll have more control over the code and be more able to fix any problems that might occur when using the package. Or if you use git you could fork my repo and continue fixing things from there. Good luck.
If you want to see the source code of a particular function, just type the name of the function without parentheses and press Enter. You will see the code.
For example, type var at the command prompt to see its code:
> var
function (x, y = NULL, na.rm = FALSE, use)
{
    if (missing(use))
        use <- if (na.rm)
            "na.or.complete"
        else "everything"
    na.method <- pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs",
        "everything", "na.or.complete"))
    if (is.na(na.method))
        stop("invalid 'use' argument")
    if (is.data.frame(x))
        x <- as.matrix(x)
    else stopifnot(is.atomic(x))
    if (is.data.frame(y))
        y <- as.matrix(y)
    else stopifnot(is.atomic(y))
    .Call(C_cov, x, y, na.method, FALSE)
}
<bytecode: 0x0000000008c97980>
<environment: namespace:stats>

Examining contents of .rdata file by attaching into a new environment - possible?

I am interested in listing objects in an RDATA file and loading only selected objects, rather than the whole set (in case some may be big or may already exist in the environment). I'm not quite clear on how to do this when there are conflicts in names, as attach() doesn't work as nicely.
1: For examining the contents of an R data file without loading it: this question is similar to, but different from, the one asked at "listing contents of an R data file without loading".
In that case, the solution offered was:
attach(filename)
ls(pos = 2)
detach()
If there are naming conflicts between objects in the file and those in the global environment, this warning appears:
The following object(s) are masked _by_ '.GlobalEnv':
I tried creating a new environment, but I cannot seem to attach into that.
For instance, this produces the same error:
lsfile <- function(filename) {
    tmpEnv <- new.env()
    evalq(attach(filename), envir = tmpEnv)
    tmpls <- ls(pos = 2)
    detach()
    return(tmpls)
}
lsfile(filename)
Maybe I've made a mess of things with evalq (or eval). Is there some other way to avoid the naming conflict?
2: If I want to access an object - if there are no naming conflicts, I can just work with the one from the .rdat file, or copy it to a new one. If there are conflicts, how does one access the object in the file's namespace?
For instance, if my file is "sample.rdat", and the object is surveyData, and a surveyData object already exists in the global environment, then how can I access the one from the file:sample.rdat namespace?
I currently solve this problem by loading everything into a temporary environment, and then copy out what's needed, but this is inefficient.
Since this question has just been referenced, let's clarify two things:
attach() simply calls load(), so there is really no point in using it instead of load().
If you want selective access to prevent masking, it's much easier to simply load the file into a new environment:
e = local({load("foo.RData"); environment()})
You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.
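For instance, if the file contains an object x (an illustrative sketch):
e <- local({load("foo.RData"); environment()})
ls(e)       # list the file's objects without touching the global environment
x2 <- e$x   # copy out only what you need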
FWIW, .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading. If you want convenient access, convert the file to the lazy-load format instead, which simply adds an index so each object can be loaded separately (see "Get specific object from Rdata file").
I just use an env= argument to load():
> x <- 1; y <- 2; z <- "foo"
> save(x, y, z, file="/tmp/foo.RData")
> ne <- new.env()
> load(file="/tmp/foo.RData", env=ne)
> ls(env=ne)
[1] "x" "y" "z"
> ne$z
[1] "foo"
>
The cost of this approach is that you do read the whole RData file, but on the other hand that seems to be unavoidable anyway, as no other method seems to offer a list of the 'contents' of such a file.
You can suppress the warning by setting warn.conflicts=FALSE on the call to attach. If an object is masked by one in the global environment, you can use get to retrieve it from your attached data.
x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1] 1 2 3 4 5 6 7 8 9 10
Thanks to @Dirk and @Joshua.
I had an epiphany: the foreach package (with the SMP or MC backends) seems to produce environments that inherit from, but do not seem to conflict with, the global environment.
lsfile <- function(list_files) {
    aggregate_ls <- foreach(ix = 1:length(list_files)) %dopar% {
        attach(list_files[ix])
        tmpls <- ls(pos = 2)
        return(tmpls)
    }
    return(aggregate_ls)
}
lsfile("f1.rdat")
lsfile(dir(pattern = "*rdat"))
This is useful to me because I can now parallelize it. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without ignoring the warnings.
So, question #1 can be resolved either by ignoring the warnings (as @Joshua suggested) or by using whatever magic foreach summons.
For part 2, loading an object, I think @Joshua has the right idea: get will do.
The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.
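A sketch of that approach (illustrative; it assumes a parallel backend has been registered, e.g. via doParallel, and sidesteps conflicts entirely by loading each file into a private environment):
library(foreach)
aggregate_ls <- foreach(f = list_files, .noexport = ls(.GlobalEnv)) %dopar% {
    e <- new.env()
    load(f, envir = e)   # nothing touches the worker's global environment
    ls(e)
}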
@Joshua's answer is far simpler than my foreach detour.
