Modifying a function from a package in Julia - julia

First I'm sorry if this question seems vague, because I'm new to Julia and I'm mostly just wondering in Julia if there's an easy way to modify a native function of a package while simply using all its dependencies in general, or there has to be a package-specific solution.
For example, foo is a function from the package LibraryA, and it can be simply called and used as follows without problem:
using LibraryA
foo(args)
However, if I want a slightly modified version of foo(), say bar(), whose source code (mostly copy of the foo() function) is in a separate file bar.jl and uses a lot of dependencies from LibraryA, the lines
using LibraryA
include("bar.jl")
bar(args)
will not work as lots of loaderrors occur:
LoadError: UndefVarError: vars not defined
where vars are the variables defined in the package LibraryA

As long as the variables are in the global scope in LibraryA, you can import them explicitly to make them available in the current scope:
using LibraryA: vars, bigvar, littlevar, baz
include("bar.jl")
bar(args)
As a matter of convention, however, make sure that these variables are not intended to be "private" i.e. intended for internal use in LibraryA (there's no explicit private marker in Julia, it's generally inferred based on documentation).
If a variable vars itself appears to be private, but has an accessor getvars that's public, you can instead create a new variable vars in the current scope using that accessor.
using LibraryA: bigvar, littlevar, baz, getvars
vars = getvars()
include("bar.jl")
bar(args)
(Make sure to make it a const var if it was const originally, and similarly type annotate it var::WhateverTheTypeOfVarIs if it is annotated in the original module as well.)

Related

How to include a closure in an R-package?

I would like to include a closure with the functions of an R package we are writing. The function (and its siblings) will have data in its environment, perform a comparison of input with the data, and return the result. To illustrate, think of a function with an inbuilt telephone directory: you query with a number and the function returns a name.
This function will be called as a helper by several other functions in our R package, so it has to exist once the package is loaded. And we want the function to be available in the package environment, just like any other function.
Should I create it via its factory function in .onLoad() and assign() it to the package environment? Could I ship it as an .RDS? Or RData, or does this violate CRAN policy on "binary executable code"? Or is there a different, canonical way? And where would the code and the data (or the RDS/RData) go in the package directory structure?
(I see that the question of how to document a closure has been discussed here).
For the benefit of anyone stumbling on this question. The solution I finally worked out involved a few steps but is "clean" as far as I can tell.
Put the factory function in a file R/aaa.R to ensure it gets loaded before the closure.
Put the data that the closure uses into the standard inst/extdata/ folder.
Put a file with the closure's name and proper docstring into R/: define the closure as a normal function that just returns nothing. This is necessary so the function is properly exported and known in the package namespace. Immediately call the factory function to create the closure and overwrite the original definition. Note: it's not enough to just bring the data into the factory function as an argument, it actually needs to be accessed before defining the closure. Why? That's because lazy loading won't actually have loaded the data into the environment you need it in unless you access it.
That's all. Summary: create a stub for your closure, then overwrite that with the return value of the factory function.
If the factory function is called later by the package user
but we still want the returned closure to be inside the package (for example if we don't want it to be changed by anything other than the factory, reliably accessible from within the package, documented etc..):
# exported function (visible to user)
# everything this function does is 'outsourced'
# to a non-exported function that we can overwrite with the factory:
visible_function(...){
hidden_function(...)
}
# not exported function (invisible to the user)
# called by the visible function
# fails unless factory is called first
hidden_function(x){
stop("call factory_fun() before you can use visible_function()")
}
# exported function, visible to the user.
# changes the hidden function called by the visible function
factory_function(x){
produced_function<-function(){
print(paste(x, "is an object forever stored in my namespace!"))
}
assignInNamespace("hidden_function",
produced_function,
ns="myPackageName")
}
Note that R CMD check throws a NOTE on assignInNamespace so CRAN won't easily accept this solution

Is it unwise to modify the class of functions in other packages?

There's a bit of a preamble before I get to my question, so hang with me!
For an R package I'm working on I'd like to make it as easy as possible for users to partially apply functions inline. I had the idea of using the [] operators to call my partial application function, which I've name "partialApplication." What I aim to achieve is this:
dnorm[mean = 3](1:10)
# Which would be exactly equivalent to:
dnorm(1:10, mean = 3)
To achieve this I tried defining a new [] method for objects of class function, i.e.
`[.function` <- function(...) partialApplication(...)
However, R gives a warning that the [] method for function objects is "locked." (Is there any way to override this?)
My idea seemed to be thwarted, but I thought of one simple solution: I can invent a new S3 class "partialAppliable" and create a [] method for it, i.e.
`[.partialAppliable` = function(...) partialApplication(...)
Then, I can take any function I want and append 'partialAppliable' to its class, and now my method will work.
class(dnorm) = append(class(dnorm), 'partialAppliable')
dnorm[mean = 3](1:10)
# It works!
Now here's my question/problem: I'd like users to be able to use any function they want, so I thought, what if I loop through all the objects in the active environment (using ls) and append 'partialAppliable' to the class of all functions? For instance:
allobjs = unlist(lapply(search(), ls))
#This lists all objects defined in all active packages
for(i in allobjs) {
if(is.function(get(i))) {
curfunc = get(i)
class(curfunc) = append(class(curfunc), 'partialAppliable')
assign(i, curfunc)
}
}
Voilà! It works. (I know, I should probably assign the modified functions back into their original package environments, but you get the picture).
Now, I'm not a professional programmer, but I've picked up that doing this sort of thing (globally modifying all variables in all packages) is generally considered unwise/risky. However, I can't think of any specific problems that will arise. So here's my question: what problems might arise from doing this? Can anyone think of specific functions/packages that will be broken by doing this?
Thanks!
This is similar to what the Defaults package did. The package is archived because the author decided that modifying other package's code is a "very bad thing". I think most people would agree. Just because you can do it, does not mean it's a good idea.
And, no, you most certainly should not assign the modified functions back to their original package environments. CRAN does not like when packages modify the users search path unnecessarily, so I would be surprised if they allowed a package to modify other package's function arguments.
You could work around that by putting all the modified functions in an environment on the search path. But then you have to ensure that environment is always searched first, which means modifying the search path every time another package is loaded.
Changing arguments for functions in other packages also has the potential to make it very difficult for others to reproduce your results because they must have all your argument settings. Unless you always call functions with all their arguments specified, which defeats the purpose of what you're trying to do.

Difference between environment and namespace

I know what a namespace is from other languages but in R I just cannot find a difference between the environment and namespace. Could anyone explain this since in the tutorials I have read (as The Art of R Programming and others) I just cannot find a distinction?
A namespace is something specific to a package. It is defined as a list of directive that allow you to import functions from other packages to be used locally or to export your functions and classes to be used in R.
So if you have created in your package a function called foo you will add to your namespace something like export(foo) to make your function usable.
If you want to import function from a specific package to use them in yours, you will add import(thePackage)
The environment is simply the space where you associate names to values. You can see it as a context in which you can evaluate functions and expressions.

R package can see variables not passed to it

I am writing a new R package and find that variables that I have not explicitly passed to a function in the package (as input argument) are visible within it, e.g.:
myFunc <- function(a,b,c) {
print(d)
}
where d is in the caller .R script, but has not been passed to myFunc, is visible.
Any help would be great, thanks; I'm using R 3.2.4 and have been using roxygen2 (via devtools::document()) to create the NAMESPACE if that helps.
Isn't this just a consequence of the scoping rules in R?
Your function defines a new myFunc environment. When you try to reference d in print(d), the interpreter first checks the myFunc environment for an object called d. Because no such object exists, the interpreter next checks the calling environment for an object called d. It finds the variable defined in your .R script and then prints it.
Here's a link with more info and a pile of examples.
Very useful link, thanks. It looks like forcing limited scoping within a function (i.e. getting a function to not access the global scope) is not a default property of R.
I found a similar question here: R force local scope
Using the checkStrict function posted by the main responder to that question seems to have worked; it found an unintended use of a global variable.
> require(myCustomPackage)
> checkStrict(showDendro)
Warning message:
In checkStrict(showDendro) : global variables used: palName
where showDendro is a function inside my custom package.
So it seems the solution to my problem is:
1) while you can stop R from moving up to the global environment by enclosing all your functions in the local() function , that seems like a tedious solution.
2) when moving code from the general environment into its own function, run something like checkStrict to remove unintended use of global variables.

Dynamically Generate Reference Classes

I'm attempting to generate reference classes within an R package on the fly, and it's proving to be fairly difficult. Here are the approaches I've taken and problems I've run into:
I'm creating a package in which I hope to be able to dynamically read in a schema and automatically generate an associated reference class (think SOAP). Of course, this means I won't be able to define my reference classes before-hand in the package sources.
I initially attempted to create a new class using a simple:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"))
which, of course, works fine when executed interactively, but when included in the package sources, I get a locked binding error. From my reading, it looks like this occurs because when running interactively, the class information is being stored in the global environment, which is not locked, while my package's base environment is locked.
I then found a thread that suggested using something to the effect of:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"), where=globalenv())
This actually crashed R/Studio when I tried to build the package, so I don't have a log of the error it generated, unfortunately, but it certainly didn't work.
Next I tried creating a new environment within my package which I could use to store these reference classes. So I added a .classEnv <- new.env() line in my package sources (not inside of any function) and then attempted to use this class when creating a new reference class:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"), where=.classEnv)
This actually seemed to work OK, but generates the following warning:
> myClass <- setRefClass("NewClassName", where=.classEnv)
Warning message:
In getPackageName(where) :
Created a package name, ‘2013-04-23 10:19:14’, when none found
So, for some reason, methods::getPackageName() isn't able to pick up which package my new environment is in?
Is there a way to create my new environment differently so that getPackageName() can properly recognize the package? Can I add some feature which allows me to help getPackageName() detect the package? Will this even work if I can deal with the warning, or am I misusing reference classes by trying to create them dynamically?
To get the conversation going, I found that getpackageName stores the package name in a hidden .packageName variable in the specified environment.
So you can actually get around the warning with
assign(".packageName", "MyPkg", envir=.classEnv)
myClass <- setRefClass("NewClassName", fields=classFields, where=.classEnv)
which resolves the warning, but the documentation says not to trust the .packageName variable indefinitely, and I still feel like I'm hacking this in and may be misunderstanding something important about reference classes and their relationship to environments.
Full details from documentation:
Package names are normally installed during loading of the package, by the INSTALL script or by the library function. (Currently, the name is stored as the object .packageName but don't trust this for the future.)
Edit:
After reading a little further, the setPackageName method may be a more reliable way to set the package name for the environment. Per the docs:
setPackageName can be used to establish a package name in an environment that would otherwise not have one. This allows you to create classes and/or methods in an arbitrary environment, but it is usually preferable to create packages by the standard R programming tools (package.skeleton, etc.)
So it looks like one valid solution would be the following:
setPackageName("MyPkg", .classEnv)
myClass <- setRefClass("NewClassName", fields=classFields, where=.classEnv)
That eliminates the warning message and doesn't rely on anything that's documented as unstable. I'm still not clear why it's necessary, but...

Resources