R / MATLAB: restricting which libraries can be loaded

For an ML course I'm TAing next semester, we're using an autograding system. We're asking students to do their own implementations of some standard algorithms, so we'd like to restrict students from loading certain libraries (with either a blacklist or a whitelist, not sure.)
Are there any reasonable ways to do this with R or matlab? Or is inspecting the source code (i.e. regex/grep) the best way to go?

Use the trace function to change the behaviour of library. When the library function is called, the following code retrieves the name of the package that is passed to library, and then throws an error if it is on the banned list.
trace(
  base::library,
  function()
  {
    package_name <- if(parent.frame()$character.only)
    {
      parent.frame()$package
    } else
    {
      deparse(substitute(package, parent.frame()))
    }
    if(package_name %in% c("ggplot2", "lattice")) # or whichever packages are banned
    {
      stop("The ", sQuote(package_name), " package is not allowed")
    }
  }
)
library(ggplot2)                          # now throws an error
library("ggplot2", character.only = TRUE) # also caught
library(plyr)                             # not banned, loads as usual
You'll also need to trace the require function.
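A sketch of one way to do that without duplicating the logic: pull the check out into its own function and trace both (check_banned is an illustrative name, not from the answer; require takes the same package and character.only arguments as library):
check_banned <- function()
{
  package_name <- if(parent.frame()$character.only)
  {
    parent.frame()$package
  } else
  {
    deparse(substitute(package, parent.frame()))
  }
  if(package_name %in% c("ggplot2", "lattice"))
  {
    stop("The ", sQuote(package_name), " package is not allowed")
  }
}
trace(base::library, check_banned)
trace(base::require, check_banned)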
Beware sneaky students: if they know that this is how you are preventing package loading, they can turn tracing off in their script (and maybe re-enable it afterwards). You should perhaps check for calls to trace/untrace/traceOn/traceOff in their scripts too. How much effort you put into this depends on how much manual review of their code you are going to do and how honest your students are. Weird evasive code like that should stand out if you read it.
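Even a simple text scan can catch that; a sketch (flag_tracing is an illustrative helper, not an existing function):
# Return the line numbers in a script that mention the tracing machinery
flag_tracing <- function(path)
{
  code <- readLines(path, warn = FALSE)
  which(grepl("\\b(trace|untrace|tracingState|traceOn|traceOff)\\b", code))
}
flag_tracing("submission.R") # line numbers of suspicious calls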

Related

R: Packaging a function with trace

We have some internal R packages with a very large number of functions. As part of an effort to eliminate unused code I looked into covr and codetools::checkUsage, and both are insufficient - so we opted to hook all functions with trace calls that would record activity somewhere. Toy example with no technical details:
> f <- function() { print("Doing very very important work") }
> trace(f, tracer=substitute(print("recording call")))
[1] "f"
> f()
Tracing f() on entry
[1] "recording call"
[1] "Doing very very important work"
The tracer operation does not significantly delay the work, but tracing all package functions (~35K) takes ~3 minutes - and I'm looking for ways to shorten it.
Is there some way to package the functions with the trace, so it won't have to be added in a separate post-load stage? Is there another direction I didn't think of?
You can put the trace() calls into the source for your package. Just make sure the trace() call happens after the function definition, either by putting it later in the same source file, or by putting it in a separate file that collates after all your function definitions.
For example, if your package has a file R/fun.R containing this source,
fun <- function(x) {
  print('this is fun!')
}
then simply add another line to R/fun.R so it looks like this instead:
fun <- function(x) {
  print('this is fun!')
}
trace(fun, tracer=substitute(print("recording call")))
This works because of the way R installs and traces things:
- trace modifies functions to insert the tracing.
- installing executes all of the source files in the R directory, and saves the results.
So putting a trace call in your source will modify the function before it is saved, and it will stay modified for any user of that package.
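If that holds, then after reinstalling the package the trace fires on every call, with no separate post-load stage; a sketched session with the toy example above ('mypackage' is a placeholder name):
> library(mypackage)
> fun(1)
Tracing fun(1) on entry
[1] "recording call"
[1] "this is fun!"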

Where and how to define a generic function, if multiple packages are used

I know there are related posts, but their answers are insufficient, so please answer this question seriously.
There are two packages ("keithley" and "xantrex") which control two different hardware devices and are therefore independent of each other. Each of them must be initialised separately. So I wrote two methods
init.keithley(inst,...) # in keithley package
and
init.xantrex(inst,...) # in xantrex package
for the generic S3 function init(inst,...). I tried declaring the generic function in both the keithley package and the xantrex package, but then it is masked once the latter is loaded, and the methods are no longer found.
What I tried is the .onAttach()-hook
.onAttach <- function(libname, pkgname)
{
  if(!exists("init"))
    eval(expression(init <- function(inst,...) UseMethod("init")), envir = .GlobalEnv)
}
But with this it is NOT possible to evaluate the init() function within the package namespace. This can be demonstrated with the option envir = environment(), which will not work. I also tried setGenericS3() and setGeneric(), always with the same result.
The "dirty" solution could be to define a third package and import it, but there must be a clean way to do this.
Where and how should I define the generic function?
Here is the solution:
As I understand it, an attached package has three environments (e.g. "package:Xantrex", "namespace:Xantrex" and "imports:Xantrex"). The different meanings of these are explained in detail here: Advanced R.
Now, we have to test whether the generic function init() is already there and if not we have to initialize it in the right environment. The following code will do that for us.
.onAttach <- function(libname, pkgname)
{
  if(!exists("init", mode = "function"))
    eval(expression(init <- function(inst,...) UseMethod("init")), envir = as.environment("package:Xantrex"))
}
The .onAttach hook is necessary to guarantee that the different namespaces are initialized; the .onLoad hook, in contrast, would be too early. Note that the expression is evaluated in the package:Xantrex environment, so the generic becomes visible on the search path.
Beyond that, take care that your NAMESPACE file contains export(init.xantrex) and NOT S3method(init,xantrex). The latter will result in an error, because the generic for the method init.xantrex() is not present while building the package.
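For reference, the relevant NAMESPACE entry would then look like this (a sketch; the generic itself is created in .onAttach, so only the method is exported):
export(init.xantrex)
# and NOT: S3method(init, xantrex)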

Changing defaults in a function inside a locked package [duplicate]

This question already has answers here: Setting Function Defaults R on a Project Specific Basis (2 answers). Closed 9 years ago.
I am developing my first package and it is aimed at users who are new to R, so I am trying to minimize the amount of R skills required to use the package. As a result I want a function that changes the defaults of other functions within my package. But I get the following error, "cannot add bindings to a locked environment", which means the environment of the package is locked and I am not allowed to change the default values of its functions.
Here is an example that throws a similar error:
library(ggplot2)
assign(formals(geom_point)$position, "somethingelse", pos="package:ggplot2")
When I try assignInNamespace I get:
assignInNamespace(formals(geom_point)$position, "somethingelse", pos = "package:ggplot2")
Error in bindingIsLocked(x, ns) : no binding for "identity"
Here is an example of what I hope to achieve.
default <- function(x=c("A", "B", "C")){
  x
}
default()
change.default <- function(x){
  formals(default)$x <<- x # Notice the global assign
}
change.default(1:3)
default()
I am aware that this is far from the recommended approach, but I am willing to cut corners to improve the learning curve of the package. Is there a way to achieve this?
This question has been marked as a duplicate of Setting Function Defaults R on a Project Specific Basis. This is a different situation, as this question concerns how to allow the user in an interactive session to change the defaults of a function - not how to actually do it. The old question could not have been solved with the options() function, and it is therefore a different question.
I think the conventional way to achieve what you want is via options, and packages do in fact do so, e.g., lattice (although it uses special options) or ascii.
Furthermore, this is also done in base R, e.g., the famous and notorious default for stringsAsFactors.
If you look at ?read.table or ?data.frame you get: stringsAsFactors = default.stringsAsFactors(). Inspecting this reveals:
> default.stringsAsFactors
function ()
{
val <- getOption("stringsAsFactors")
if (is.null(val))
val <- TRUE
if (!is.logical(val) || is.na(val) || length(val) != 1L)
stop("options(\"stringsAsFactors\") not set to TRUE or FALSE")
val
}
<bytecode: 0x000000000b068478>
<environment: namespace:base>
The relevant part here is getOption("stringsAsFactors") which produces:
> getOption("stringsAsFactors")
[1] TRUE
Changing it is achieved like this:
> options(stringsAsFactors = FALSE)
> getOption("stringsAsFactors")
[1] FALSE
To do what you want, your package would need to set an option and the function would take its values from that option. Another function could then change the option:
options(foo=c("A", "B", "C"))
default <- function(x=getOption("foo")){
  x
}
default()
change.default <- function(x){
  options(foo=x)
}
change.default(1:3)
default()
If you want your package to set the options when loaded, you need to create a .onAttach or .onLoad function in zzz.R. My afex package, for example, does this and changes the default contrasts. In your case it could look like the following:
.onAttach <- function(libname, pkgname) {
  options(foo=c("A", "B", "C"))
}
ascii does it via .onLoad (I don't remember the exact difference, but Writing R Extensions will help).
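For completeness, a sketch of the .onLoad variant (.onLoad runs as soon as the namespace is loaded, e.g. via ::, while .onAttach only runs when the package is attached; the getOption guard is an extra precaution so a value the user has already set is not clobbered):
.onLoad <- function(libname, pkgname) {
  if (is.null(getOption("foo"))) {
    options(foo = c("A", "B", "C"))
  }
}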
Preferably, a function has the following things:
Input arguments
A function body which does something with those arguments
Output arguments
So in your situation, where you want to change something about the behavior of a function, changing the input arguments is the best way to go. See for example my answer to another post.
You could also use an option to save some global settings (e.g. which font to use, or the PATH where the packages you use are stored), see the answer of @James in the question I linked above. But use these things sparingly, as they make the code hard to read. I would primarily use them read-only, i.e. set them once (either by the package or the user) and not allow functions to change them.
The unreadability stems from the fact that the behavior of the function is not solely determined locally (i.e. by the code directly working with it), but also by settings far away. This makes it hard to determine what a function does purely by looking at the code calling it; you have to dig through much more code to fully understand what is going on. In addition, if other functions change those options, it becomes even harder to predict what a given function will do, as it then depends on the history of function calls. And here my earlier recommendation for read-only options comes back into play: if they are read-only, some of these readability problems are lessened.

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example, say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R starts a new session, what would be the best way to do this? Currently I can only think of storing a file in the user's home directory, but I'm not sure I like that approach.
This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores that I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/Data")
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
scores <- 10
save(scores, file=paste0(loc, "/scores.rda"))
Then when I unloaded the library and reloaded it again, the data set now says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow, but I am not sure how to do this without messing with the .Last function.
EDIT:
It appears this option is not viable: when you compile as a package and use lazy load, the data sets are saved as Rdata.rdb and Rdata.rdx, not as .rda files. That means the approach I use above is kinda worthless, in that we want the data to be recognized automatically.
EDIT2
This approach works and I tried it on a package I made. You can't lazy load the data, and you have to use data(scores) explicitly, either directly or inside the function you're calling. I also assigned scores to .scores in the global env the first time it was created, and used exists inside the function to see if it exists. If .scores existed, I assigned it to scores within the function. Once you unload the library and load it again, you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function
I guess there is no way to store settings without saving them to disk or a database, some way or another. It can be done silently though by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in other ways than using options you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function(){
  my.options <- options()
  save(my.options, file="~/.Roptions.Rdata")
}
.First <- function(){
  tryCatch({
    load("~/.Roptions.Rdata")
    do.call(options, my.options)
    rm(my.options)
  }, error=function(...){})
}
To my surprise, try(..., silent=TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.
The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
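For instance, base R (>= 4.0) provides tools::R_user_dir(), which returns a platform-appropriate per-user directory for a package; a minimal sketch ("yourpackage" is a placeholder name):
# Persist scores in the package's per-user data directory
path <- tools::R_user_dir("yourpackage", which = "data")
dir.create(path, recursive = TRUE, showWarnings = FALSE)
saveRDS(c(500, 200, 1100), file.path(path, "scores.rds"))
# In a later session
scores <- readRDS(file.path(tools::R_user_dir("yourpackage", "data"), "scores.rds"))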
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()
scores <- data.frame(
  user = c("one", "two", "three"),
  score = c(500, 200, 1100)
)
save(scores, file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()
# new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS - you can see a working example in the rnoaa package on GitHub at "ropensci/rnoaa". Check their R/onload.r file! I can expand if needed.

Search all existing functions for package dependencies?

I have a package that I wrote while learning R and its dependency list is quite long. I'm trying to trim it down, for two cases:
I switched to other approaches, and packages listed in Suggests simply aren't used at all.
Only one function out of my whole package relies on a given dependency, and I'd like to switch to an approach where it is loaded only when needed.
Is there an automated way to track down these two cases? I can think of two crude approaches (download the list of functions in all the dependent packages and automate a text search for them through my package's code, or load the package functions without loading the required packages and execute until there's an error), but neither seems particularly elegant or foolproof....
One way to check dependencies in all functions is to use the byte compiler, because it will check for functions being available in the global workspace and issue a note if it does not find a given function.
So if you, for example, use the na.locf function from the zoo package in any of your functions and then byte compile your function, you will get a message like this:
Note: no visible global function definition for 'na.locf'
To correctly address it for byte compiling you would have to write it as zoo::na.locf
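A minimal sketch of both cases (this assumes the compiler package; whether the note prints can depend on your compiler settings, see ?compiler::setCompilerOptions):
library(compiler)
f <- function(x) na.locf(x) # zoo is not attached, so na.locf is undefined
f_byte <- cmpfun(f)
# Note: no visible global function definition for 'na.locf'
g <- function(x) zoo::na.locf(x) # fully qualified, so no note
g_byte <- cmpfun(g)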
So to quickly test all R functions in a library/package you could do something like this (assuming you didn't write the calls to other functions with the namespace):
Assuming your R files with the functions are in C:\SomeLibrary\ or subfolders thereof, you then define a sourcing file as C:\SomeLibrary.r or similar, containing:
if (!(as.numeric(R.Version()$major) >= 2 && as.numeric(R.Version()$minor) >= 14.0)) {
  stop("SomeLibrary needs version 2.14.0 or greater.")
}
if ("SomeLibrary" %in% search()) {
  detach("SomeLibrary")
}
currentlyInWorkspace <- ls()
SomeLibrary <- new.env(parent=globalenv())
require("compiler", quietly=TRUE)
pathToLoad <- "C:/SomeLibraryFiles"
filesToSource <- file.path(pathToLoad, dir(pathToLoad, recursive=TRUE)[grepl(".*[\\.R|\\.r].*", dir(pathToLoad, recursive=TRUE))])
for (filename in filesToSource) {
  tryCatch({
    suppressWarnings(sys.source(filename, envir=SomeLibrary))
  }, error=function(ex) {
    cat("Failed to source: ", filename, "\n")
    print(ex)
  })
}
for (SomeLibraryFunction in ls(SomeLibrary)) {
  if (class(get(SomeLibraryFunction, envir=SomeLibrary)) == "function") {
    outText <- capture.output(with(SomeLibrary, assign(SomeLibraryFunction, cmpfun(get(SomeLibraryFunction)))))
    if (length(outText) > 0) {
      cat("The function ", SomeLibraryFunction, " produced the following compile note(s):\n")
      cat(outText, sep="\n")
      cat("\n")
    }
  }
}
attach(SomeLibrary)
rm(list=ls()[!ls() %in% currentlyInWorkspace])
invisible(gc(verbose=FALSE, reset=TRUE))
Then start up R with no preloaded packages and source in C:\SomeLibrary.r
And then you should get notes from cmpfun for any call to a function in a package that's not part of the base packages and doesn't have a fully qualified namespace defined.
