I want to save a function in a JLD2 file. Is this even possible and if so how?
using JLD2
a = [1,2,3]
function foo(x, y)
x .> y
end
foo(x) = foo(x, a)
save_object("stored.jld2", foo)
So, I guess my point here is actually two things
I want to save the function.
I want it to have the method foo(x) available, i.e. if the jld2 file is opened somewhere else it has the method foo(x) with a = [1,2,3].
When someone builds a machine learning model and saves it, it must work something like this, right?
Really hope it's understandable what I mean here. If not please let me know.
OK, after some more research and some thinking, I come to the conclusion (happy to be proven wrong...) that this is not possible and/or not sensible. The comments in the question suggest using machine learning packages to save models and I think this approach explains the underlying point...
You can store data in jld2 files and but you can not store functions in jld2 files. What I tried here is not going to work because
foo(x) = foo(x, a)
accesses object a in the global environment - it does not magically store the data of a within the function. Furthermore, functions cannot be saved with JLD2 - who know's, the latter one is maybe gonna change in the future.
Great, but what's the solution now??
What you can instead do is store the data...
using JLD2
a = [1,2,3]
save("bar.jld2", Dict("a" => a))
...and make the function available through a module or a script you include().
include("module_with_foo_function.jl")
using module_with_foo_function
using JLD2
my_dict = load("bar.jld2")
a = my_dict["a"]
foo(x) = foo(x, a)
And if you think about it, this is exactly what packages like MLJ are doing as well. There you store an object that contains the information, but the functionality comes from the package, not from your stored object.
Not sure anyone ever has a similar problem, but if so I hope these thoughts help.
Related
I have a bunch of functions and I'm trying to keep my workspace clean by defining them in an environment and attaching the environment. Some of the functions are S3 generics, and they don't seem to play well with this approach.
A minimum example of what I'm experiencing requires 4 files:
testfun.R
ttt.xxx <- function(object) print("x")
ttt <- function(object) UseMethod("ttt")
ttt2 <- function() {
yyy <- structure(1, class="xxx")
ttt(yyy)
}
In testfun.R I define an S3 generic ttt and a method ttt.xxx, I also define a function ttt2 calling the generic.
testenv.R
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
In testenv.R I source testfun.R to an environment, which I attach.
test1.R
source("testfun.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test1.R sources testfun.R to the global environment. Both ttt2 and a direct function call work.
test2.R
source("testenv.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test2.R uses the "attach" approach. ttt2 still works (and prints "x" to the console), but the direct function call fails:
Error in UseMethod("ttt") :
no applicable method for 'ttt' applied to an object of class "xxx"
however, calling ttt and ttt.xxx without arguments show that they are known, ls(pos=2) shows they are on the search path, and sloop::s3_dispatch(ttt(xxx)) tells me it should work.
This questions is related to Confusion about UseMethod search mechanism and the link therein https://blog.thatbuthow.com/how-r-searches-and-finds-stuff/, but I cannot get my head around what is going on: why is it not working and how can I get this to work.
I've tried both R Studio and R in the shell.
UPDATE:
Based on the answers below I changed my testenv.R to:
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
if (is.null(.__S3MethodsTable__.))
.__S3MethodsTable__. <- new.env(parent = baseenv())
for (func in grep(".", ls(envir = test_env), fixed = TRUE, value = TRUE))
.__S3MethodsTable__.[[func]] <- test_env[[func]]
rm(test_env, func)
... and this works (I am only using "." as an S3 dispatching separator).
It’s a little-known fact that you must use .S3method() to define methods for S3 generics inside custom environments (outside of packages).1 The reason almost nobody knows this is because it is not necessary in the global environment; but it is necessary everywhere else since R version 3.6.
There’s virtually no documentation of this change, just a technical blog post by Kurt Hornik about some of the background. Note that the blog post says the change was made in R 3.5.0; however, the actual effect you are observing — that S3 methods are no longer searched in attached environments — only started happening with R 3.6.0; before that, it was somehow not active yet.
… except just using .S3method will not fix your code, since your calling environment is the global environment. I do not understand the precise reason why this doesn’t work, and I suspect it’s due to a subtle bug in R’s S3 method lookup. In fact, using getS3method('ttt', 'xxx') does work, even though that should have the same behaviour as actual S3 method lookup.
I have found that the only way to make this work is to add the following to testenv.R:
if (is.null(.__S3MethodsTable__.)) {
.__S3MethodsTable__. <- new.env(parent = baseenv())
}
.__S3MethodsTable__.$ttt.xxx <- ttt.xxx
… in other words: supply .GlobalEnv manually with an S3 methods lookup table. Unfortunately this relies on an undocumented S3 implementation detail that might theoretically change in the future.
Alternatively, it “just works” if you use ‘box’ modules instead of source. That is, you can replace the entirety of your testenv.R by the following:
box::use(./testfun[...])
This code treats testfun.R as a local module and loads it, attaching all exported names (via the attach declaration [...]).
1 (and inside packages you need to use the equivalent S3method namespace declaration, though if you’re using ‘roxygen2’ then that’s taken care of for you)
First of all, my advice would be: don't try to reinvent R packages. They solve all the problems you say you are trying to solve, and others as well.
Secondly, I'll try to explain what went wrong in test2.R. It calls ttt on an xxx object, and ttt.xxx is on the search list, but is not found.
The problem is how the search for ttt.xxx happens. The search doesn't look for ttt.xxx in the search list, it looks for it in the environment from which ttt was called, then in an object called .__S3MethodsTable__.. I think there are two reasons for this:
First, it's a lot faster. It only needs to look in one or two places, and the table can be updated whenever a package is attached or detached, a relatively rare operation.
Second, it's more reliable. Each package has its own methods table, because two packages can use the same name for generics that have nothing to do with each other, or can use the same class names that are unrelated. So package code needs to be able to count on finding its own definitions first.
Since your call to ttt() happens at the top level, that's where R looks first for ttt.xxx(), but it's not there. Then it looks in the global .__S3MethodsTable__. (which is actually in the base environment), and it's not there either. So it fails.
There is a workaround that will make your code work. If you run
.__S3MethodsTable__. <- list2env(list(ttt.xxx = ttt.xxx))
as the last line of testenv.R, then you'll create a methods table in the global environment. (Normally there isn't one there, because that's user space, and R doesn't like putting things there unless the user asks for it.)
R will find that methods table, and will find the ttt.xxx method that it defines. I wouldn't be surprised if this breaks some other aspect of S3 dispatch, so I don't recommend doing it, but give it a try if you insist on reinventing the package system.
I'm almost certain I've read somewhere how to do this. Instead of having to save the current option (say working directory) to a variable, change the w.d, do an operation, and then revert back to what it was, doing this inside a function akin to "with" relative to attach/detach. A solution just for working directory is what I need now, but there might be a more generic function that does that sort of things? Or ain't it?
So to illustrate... The way it is now:
curdir <- getwd()
setwd("../some/place")
# some operation
setwd(curdir)
The way it is in my wildest dreams:
with.dir("../some/place", # some operation)
I know I could write a function for this, I just have the impression there's something more readily available and generalizable to other parameters too.
Thanks
There is an idiom for this in some of R's base plotting functions
op <- par(no.readonly = TRUE)
# par(blah = stuff)
# plot(stuff)
par(op)
that is so unbelievably crude as to be fully portable to options() and setwd().
Fortunately it's also easy to implement a crude wrapper:
with_dir <- function(dir, expr) {
old_wd <- getwd()
setwd(dir)
result <- evalq(expr)
setwd(old_wd)
result
}
I'm no wizard with nonstandard evaluation so evalq could be unstable somehow. More on NSE in an old write-up by Lumley and also in Wickham's Advanced R, but it's dense stuff and I haven't wrapped my head around it all yet.
edit: as per Ben Bolker's comment, it's probably better to use on.exit for this:
with_dir <- function(dir, expr) {
old_wd <- getwd()
on.exit(setwd(old_wd))
setwd(dir)
evalq(expr)
}
From the R docs:
on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.
What you're describing depends upon two things: detecting when you enter and leave a particular lexical scope, and defining a behavior to do on entrance and on exit. Python has these, called "Context Managers". This was a big deal when it was released, and many parts of Python's standard library now behave like context managers, and have to define the "enter" and "exit" behavior in explicitly, or by leveraging some clever inheritance scheme.
with.default
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
<bytecode: 0x07d02ccc>
<environment: namespace:base>
R's with function works sort of like a context manager, because it can pass scopes around easily. That said, this doesn't give you the "enter" and "exit" operations for free. Especially consider that the current working directory isn't an entry in the current scope, but a state of the R interpreter, which can only be queried or changed by function calls behind the .Internal shield.
You can easily define your own object types to have methods that are context manager-like for the with generic function, as well as writing and registering methods for other types you commonly use, but it is not part of the base R language.
I am moving from R, and I use the head() function a lot. I couldn't find a similar method in Julia, so I wrote one for Julia Arrays. There are couple other R functions that I'm porting to Julia as well.
I need these methods to be available for use in every Julia instance that launches, whether through IJulia or through command line. Is there a "startup script" of sorts for Julia? How can I achieve this?
PS: In case someone else is interested, this is what I wrote. A lot needs to be done for general-purpose use, but it does what I need it to for now.
function head(obj::Array; nrows=5, ncols=size(obj)[2])
if (size(obj)[1] < nrows)
println("WARNING: nrows is greater than actual number of rows in the obj Array.")
nrows = size(obj)[1]
end
obj[[1:nrows], [1:ncols]]
end
You can make a ~/.juliarc.jl file, see the Getting Started section of the manual.
As for you head function, here is how I'd do it:
function head(obj::Array; nrows=5, ncols=size(obj,2))
if size(obj,1) < nrows
warn("nrows is greater than actual number of rows in the obj Array.")
nrows = size(obj,1)
end
obj[1:nrows, 1:ncols]
end
I know something similar has been asked before here on SO, but the solution given there doesn't seem to apply in my case.
I'm trying to follow convention in creating a package by referring to functions exported from other namespaces and avoiding use of require() within a function.
I'm basically trying to prevent a function taking too long to run. For example,
fun <- function(i){
require(R.utils)
setTimeLimit(elapsed=10, transient=TRUE) # prevent taking more than 10secs
return(i^i)
}
>fun(10)
Works fine, but if I try:
require(R.utils)
fun <- function(i){
R.utils:::setTimeLimit(elapsed=10, transient=TRUE) # prevent taking more than 10secs
return(i^i)
}
>fun(10)
I get:
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'setTimeLimit' not found
Changing ::: to :: doesn't change this behavior.
I'm open to any simpler methods to achieving the same objective.
Also is it really so bad to have require() calls inside a function?
Many thanks!
EDIT:
If import works then great, thanks. Still in development so wanted to make sure it would be OK.
EDIT:
Apologies, it's there in base. Not sure how I missed this; I was originally using R.utils::evalWithTimeout and must have assumed both were in the same package. *looks sheepish*
I'm just posting this to prevent the question from showing up as unanswered, but will be glad to accept another...
isTRUE("setTimeLimit" %in% ls(getNamespace("base"), all.names=TRUE))
Is it possible in R to protect function names (or variables in general) so that they cannot be masked.
I recently spotted that this can be a problem when creating a data frame with the name "new", which masked a function used by lmer and thus stopped it working. (Recovery is easy once you know what the problem is, here "rm(new)" did it.)
There is an easy workaround for your problem, without worrying about protecting variable names (though playing with lockBinding does look fun). If a function becomes masked, as in your example, it is still possible to call the masked version, with the help of the :: operator.
In general, the syntax is packagename::variablename.
(If the function you want has not been exported from the package, then you need three colons instead, :::. This shouldn't apply in this case however.)
Maybe use environments! This is a great way to separate namespaces. For example:
> a <- new.env()
> assign('printer', function(x) print(x), envir=a)
> get('printer', envir=a)('test!')
[1] "test!"
#hdallazuanna recommends (via Twitter)
new <- 1
lockBinding('new', globalenv())
this makes sense when the variable is user created but does not, of course, prevent overwriting a function from a package.
I had the reverse problem from the OP, and I wanted to prevent my custom functions in .Rprofile from being overridden when I defined a variable with the same name as a function, but I ended up putting my functions to ~/.R.R and I added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))