How can I source specific functions in an R script?

I have a script with my most commonly used functions which I source at the top of most scripts. Sometimes I only want to get one of the functions in that script, but I don't know how to indicate that I only want one specific function. I'm looking for a function that is similar to the :: used to get a function inside a package. A reproducible example:
# file a.R
foo <- function() cat("Hello!\n")
bar <- function() cat("Goodbye!\n")
# End of file a.R
# file b.R
# Can't just delete all functions
fun <- function(x) print(x)
fun("It's so late!")
source("a.R")
foo()
fun("See you next time")
# End of file
I read the "source" help and it was unhelpful to me. The solution I currently have is to assign a variable at the start of the script with the functions loaded before, then set the difference with what was there after:
list_before <- lsf.str()
# content of file b.R
new_funcs <- setdiff(lsf.str(),list_before)
Then I can use rm(list=new_funcs[-1]) to keep only the function I wanted. This is, however, a very convoluted way of doing this, and I was hoping to find an easier solution.

A good way would be to write a package, but that requires more knowledge (I'm not there myself).
A good alternative I found is the box package, which allows you to import functions from an R script as a module.
You can import all functions or specific functions.
To set up a function as part of a module, you would use the roxygen2 documentation syntax as such:
#' This is a function to calculate a sum
#' @export
my_sum <- function(x, y){
  x + y
}
#' This is a function to calculate a difference
#' @export
my_diff <- function(x, y){
  x - y
}
Save the file as an R script named "my_module.R".
The @export tag in the documentation tells box that the function that follows is exported from the module. Then you can call box to reach a specific function in the module named "my_module".
Let's say your project directory has a script folder that contains your scripts and modules, you would import functions as such:
box::use(script/my_module)
my_module$my_sum(x, y)
box::use() creates an environment that contains all the functions found inside the module.
You can also import single functions, as follows. Let's assume your directory is a bit more complex as well, with modules inside a box folder inside script.
box::use(./script/box/my_module[my_sum])
my_sum(x, y)
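For readers who cannot install box, the idea that a module is an environment holding the file's functions can be approximated in base R. This is only a sketch and not how box is implemented; the file contents and the names module_file, load_module and my_module are made up for the illustration.

```r
# Write a small "module" file to a temporary location (illustrative content).
module_file <- tempfile(fileext = ".R")
writeLines(c(
  "my_sum  <- function(x, y) x + y",
  "my_diff <- function(x, y) x - y"
), module_file)

# Hypothetical helper: evaluate a file in a fresh environment and return it.
load_module <- function(path) {
  env <- new.env(parent = globalenv())
  sys.source(path, envir = env)
  env
}

my_module <- load_module(module_file)

# Qualified access, similar in spirit to my_module$my_sum() with box.
my_module$my_sum(1, 2)

# Pull out a single function without copying the others.
my_sum <- my_module$my_sum
my_sum(3, 4)

# my_diff was never copied into the calling environment.
exists("my_diff", inherits = FALSE)
```

Unlike box, this sketch has no notion of exported versus private names: everything defined in the file is reachable through the environment.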
You can use box to fetch functions from packages as well. In a sense, it is better than calling library(), which attaches all of the package's exported functions.
Using box, you can organize scripts by objective or by whatever organization you have in place.
I have a script for string handling, from which I fetch functions that work with strings.
I have a script for plot functions that I use in my projects, etc.

insertSource() (from the methods package) would help.
In your example, let's presume we need to import foo() from a.R:
# file b.R
foo <- function(){}
insertSource("a.R", functions = "foo", force = TRUE)
foo <- foo@.Data

Related

What's the most simple approach to name-spacing R files with `file::function`

Criteria for answer to this question
Given the following function (within its own script)
# something.R
hello <- function(x){
paste0("hello ", x)
}
What is the most minimal amount of setup which will enable the following
library(something)
x <- something::hello('Sue')
# x now has value: "hello Sue"
Context
In python it's very simple to have a directory containing some code, and utilise it as
# here foo is a directory
from foo import bar
bar( ... )
I'm not sure how to do something similar in R though.
I'm aware there's source(file.R), but this puts everything into the global namespace. I'm also aware that there's library(package) which provides package::function. What I'm not sure about is whether there's a simple approach to using this namespacing within R. The packaging tutorials that I've searched for seem to be quite involved (in comparison to Python).
I don't know if there is a real benefit in creating a namespace just for one quick function. It is just not the way it is supposed to be (I think).
But anyway here is a rather minimalistic solution:
First install once: install.packages("namespace")
The function you wanted to call in the namespace:
hello <- function(x){
paste0("hello ", x)
}
Creating your namespace, assigning the function and exporting
ns <- namespace::makeNamespace("newspace")
assign("hello", hello, envir = ns)
base::namespaceExport(ns, ls(ns))
Now you can call your function with your new namespace
newspace::hello("you")
Here's the quickest workflow I know to produce a package, using RStudio. The default package already contains a hello function, that I overwrote with your code.
Notice there was also a box "create package based on source files", which I didn't use but you might.
A package done this way will contain exported undocumented untested functions.
If you want to learn how to document, export or not, write tests and run checks, include objects other than functions, include compiled code, or share on GitHub or CRAN, this book describes the workflow used by thousands of users, and is designed so you can usually read sections independently.
If you don't want to do it from the GUI, you can use utils::package.skeleton() to build a package folder and remotes::install_local() to install it:
Reproducible setup
# create a file containing function definition
# where your current function is located
function_path <- tempfile(fileext = ".R")
cat('
hello <- function(x){
paste0("hello ", x)
}
', file = function_path)
# where you store your package code
package_path <- tempdir()
Solution :
# create package directory at given location
package.skeleton("something", code_files = function_path, path = package_path)
# remove sample doc to make remotes::install_local happy
unlink(file.path(package_path, "something", "man/"), TRUE)
# install package
remotes::install_local(file.path(package_path, "something"))
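Before running the install step it can be useful to check what package.skeleton() actually wrote. The sketch below repeats the setup with a fresh temporary directory and lists the generated files; it does not install anything. The names package_path and pkg_dir are only illustrative, and the exact file list may differ between R versions.

```r
# Recreate a one-function source file in a temporary location.
function_path <- tempfile(fileext = ".R")
writeLines('hello <- function(x) paste0("hello ", x)', function_path)

# Use a fresh, empty directory so package.skeleton() does not
# collide with an existing "something" folder.
package_path <- tempfile("pkgs_")
dir.create(package_path)

utils::package.skeleton("something", code_files = function_path,
                        path = package_path)

# Inspect the generated layout before installing anything.
pkg_dir <- file.path(package_path, "something")
list.files(pkg_dir, recursive = TRUE)
readLines(file.path(pkg_dir, "NAMESPACE"))
```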

How to import a file in R and give it a name

I have an R class which looks like below
source("data_validation.R")
data_validation.mymodel <- function(mymodel,newdata=list()) {
data_validation(newdata)
}
I have a function called data_validation in which I need to call another function from another file which I'm sourcing with the same name. But it's giving me an error and can't find the function in the other file because it has the same name. How can I have an alias for my functions in the data_validation.R file so I can easily distinguish these 2 functions (i.e. something similar to python where I can say import data_validation as dv)
As long as you:
Must source the files, rather than making at least one of them a formal package; and
Cannot change the name of either function defined in those sourced files,
you will have to work around things to be able to use both functions.
Here's a hack that might work for you.
I'm starting with a simple file, foo.R, that merely defines one function:
myfunc <- function(x) x+1
In the global environment, I'll load and test it:
source("~/Downloads/foo.R")
myfunc(2)
# [1] 3
But let's say I have another file (foo2.R) with the same function:
myfunc <- function(x) x+1000
Let's write a wrapper around source that evaluates the file in its own environment and returns that environment:
wrap_source <- function(...) { source(..., local = TRUE); environment(); }
ls(envir=wrap_source("~/Downloads/foo2.R"))
# [1] "myfunc"
other <- wrap_source("~/Downloads/foo2.R")
myfunc(1)
# [1] 2
other$myfunc(1)
# [1] 1001
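Applied to the question, the same trick gives something close to Python's "import data_validation as dv". The following is a sketch only: the contents of the validation file, the class name "mymodel" and the generic are stand-ins, since the original files are not shown.

```r
# Stand-in for the asker's data_validation.R (contents are made up).
validation_file <- tempfile(fileext = ".R")
writeLines(
  "data_validation <- function(newdata) length(newdata) > 0",
  validation_file
)

# Source the file into its own environment, which then acts as the alias dv.
dv <- new.env()
source(validation_file, local = dv)

# Generic plus a method for a hypothetical class "mymodel".
data_validation <- function(mymodel, ...) UseMethod("data_validation")
data_validation.mymodel <- function(mymodel, newdata = list(), ...) {
  # Explicitly call the sourced function, not the generic of the same name.
  dv$data_validation(newdata)
}

model <- structure(list(), class = "mymodel")
data_validation(model, newdata = list(a = 1))
```

Because the sourced function is only reachable as dv$data_validation, it no longer collides with the generic defined in the calling script.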

Function hiding in R

Consider the following file.r:
foo = function(){}
bar = function(){}
useful = function() {foo(); bar()}
foo and bar are meant only for internal use by useful - they are not reusable at all, because they require very specific data layout, have embedded constants, do something obscure that no one is going to need etc.
I don't want to define them inside useful{}, because then it will become too long (>10 LOC).
A client could do the following to import only useful into their namespace, though I am not sure whether it will still work when foo and bar are not visible outside:
# Source a single function from a source file.
# Example use
# max.a.posteriori <- source1( "file.r","useful" )
source1 <- function( path, fun )
{
source( path, local=TRUE )
get( fun )
}
How can I properly do this on the file.r side i.e. export only specific functions?
Furthermore, there is the problem of ordering of functions, which I feel is related to the above. Let us have
douglas = function() { adams() }
adams = function() { douglas() }
How do I handle circular dependencies?
You can achieve this by setting the enclosing environment of your useful function, as in the code listed below. This is similar to what packages do, and if your project gets bigger, I would really recommend creating a package using the great devtools package.
If the functions foo and bar are not used by other functions I would just define them inside useful. Since the functions are quite independent pieces of code it does not make the code more complicated to understand, even if the line count of useful increases. (Except of course if you are forced by some guideline to keep the line count short.)
For more on environments see: http://adv-r.had.co.nz/Environments.html
# define new environment
myenv <- new.env()
# define functions in this environment
myenv$foo <- function(){}
myenv$bar <- function(){}
# define useful in global environment
useful <- function(){
foo()
bar()
}
# useful does not find the called functions so far (this call errors)
try(useful())
# neither can they be found in the globalenv (this call errors too)
try(foo())
# but of course in myenv
myenv$foo()
# set the enclosing environment of useful to myenv
environment(useful) <- myenv
# useful works now
useful()
# foo is still hidden from the globalenv (this call still errors)
try(foo())
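On the file.r side, a compact variant of the same idea is to build useful inside local(), so the helpers never get a name in the environment that sources the file. This is a sketch under the assumption that foo and bar are only needed by useful; the return values are placeholders.

```r
# Everything inside local() lives in a private environment; only the value of
# the last expression (the function) is returned and assigned to useful.
useful <- local({
  foo <- function() "foo ran"
  bar <- function() "bar ran"
  function() c(foo(), bar())
})

useful()

# The helpers are not defined in the calling environment.
exists("foo", inherits = FALSE)
exists("bar", inherits = FALSE)
```

The helpers are still reachable through environment(useful), so this is hiding by convention rather than real access control.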
My recommendation is to use packages. They were created for such situations. Still, you cannot hide the functions themselves in pure R.
In order to encapsulate foo and bar you need to implement a class. The easiest way, in my opinion, to do that in R is through R6 classes: https://cran.r-project.org/web/packages/R6/vignettes/Introduction.html#private-members. There you have an example of how to hide the length function.
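On the ordering question: a function body looks up the functions it calls only when it runs, so two functions can refer to each other regardless of which is defined first, as long as both exist by the time either one is called. A small sketch with made-up functions (they have a base case, unlike the douglas/adams pair in the question, which would recurse forever):

```r
# is_even refers to is_odd before is_odd exists; that is fine, because the
# name is only resolved when is_even is actually called.
is_even <- function(n) if (n == 0) TRUE else is_odd(n - 1)
is_odd  <- function(n) if (n == 0) FALSE else is_even(n - 1)

is_even(10)
is_odd(7)
```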

R Package: how "import" works when my exported function does not call explicitly a function from other packages, but a subroutine does

I am developing my first R package and there is something that it is not clear to me about Imports in the DESCRIPTION file. I went through quite some guides that explain package structure but I do not find an answer to my question, so here is my situation.
I define a function f that I will export, so its definition will have the proper @export roxygen tag on top.
Now, my function f calls a subroutine hidden, which I do not want to export. Function hidden uses other packages too, say package X.
Because the call to X is inside function hidden, there is no tag @import X in my function f. Thus, I added package X to the Imports in my DESCRIPTION file, hoping to specify the relevant dependency there.
When I use devtools::document(), however, the generated NAMESPACE does not contain an entry for X. I can see why that happens: the parser just does not find the flag in the roxygen comment for f, and at runtime a call to f crashes because X is missing.
Now, I can probably fix everything by specifying X in the import of f. But why is the mechanism this tricky? Or, similarly, why do my imports in DESCRIPTION not match the ones in NAMESPACE?
My understanding is that there are three "correct" ways to do the import. By "correct," I mean that they will pass CRAN checks and function properly. Which option you choose is a matter of balancing various advantages and is largely subjective.
I'll review these options below using the terminology:
primary_function: the function in your package that you wish to export
hidden: the unexported function in your package used by primary_function
thirdpartypkg::blackbox: blackbox is an exported function from the thirdpartypkg package.
Option 1 (no direct import / explicit function call)
I think this is the most common approach. thirdpartypkg is declared in the DESCRIPTION file, but nothing is imported from thirdpartypkg in the NAMESPACE file. In this option, it is necessary to use the thirdpartypkg::blackbox construct to get the desired behavior.
# DESCRIPTION
Imports: thirdpartypkg
# NAMESPACE
export(primary_function)
#' @name primary_function
#' @export
primary_function <- function(x, y, z){
  # do something here
  hidden(a = y, b = x, c = z)
}
# Unexported function
#' @name hidden
hidden <- function(a, b, c){
  # do something here
  thirdpartypkg::blackbox(a, c)
}
Option 2 (direct import / no explicit function call)
In this option, you directly import the blackbox function. Having done so, it is no longer necessary to use thirdpartypkg::blackbox; you may simply call blackbox as if it were a part of your package. (Technically it is, you imported it to the namespace, so there's no need to reach to another namespace to get it)
# DESCRIPTION
Imports: thirdpartypkg
# NAMESPACE
export(primary_function)
importFrom(thirdpartypkg, blackbox)
#' @name primary_function
#' @export
primary_function <- function(x, y, z){
  # do something here
  hidden(a = y, b = x, c = z)
}
# Unexported function
#' @name hidden
#' @importFrom thirdpartypkg blackbox
hidden <- function(a, b, c){
  # do something here
  # I CAN USE blackbox HERE AS IF IT WERE PART OF MY PACKAGE
  blackbox(a, c)
}
Option 3 (direct import / explicit function call)
Your last option combines the previous two options and imports blackbox into your namespace, but then uses the thirdpartypkg::blackbox construct to utilize it. This is "correct" in the sense that it works. But it can be argued to be wasteful and redundant.
The reason I say it is wasteful and redundant is that, having imported blackbox to your namespace, you're never using it. Instead, you're using the blackbox in the thirdpartypkg namespace. Essentially, blackbox now exists in two namespaces, but only one of them is ever being used. Which begs the question of why make the copy at all.
# DESCRIPTION
Imports: thirdpartypkg
# NAMESPACE
export(primary_function)
importFrom(thirdpartypkg, blackbox)
#' @name primary_function
#' @export
primary_function <- function(x, y, z){
  # do something here
  hidden(a = y, b = x, c = z)
}
# Unexported function
#' @name hidden
#' @importFrom thirdpartypkg blackbox
hidden <- function(a, b, c){
  # do something here
  # I CAN USE blackbox HERE AS IF IT WERE PART OF MY PACKAGE
  # EVEN THOUGH I DIDN'T. CONSEQUENTLY, THE blackbox I IMPORTED
  # ISN'T BEING USED.
  thirdpartypkg::blackbox(a, c)
}
Considerations
So which is the best approach to use? There isn't really an easy answer to that. I will say that Option 3 is probably not the approach to take. I can tell you that Wickham advises against Option 3 (I had been developing under that framework and he advised me against it).
If we make the choice between Option 1 and Option 2, the considerations we have to make are 1) efficiency of writing code, 2) efficiency of reading code, and 3) efficiency of executing code.
When it comes to the efficiency of writing code, it's generally easier to @importFrom thirdpartypkg blackbox and avoid having to use the :: operator. It just saves a few keystrokes. This adversely affects readability of code, however, because now it isn't immediately apparent where blackbox comes from.
When it comes to efficiency of reading code, it's superior to omit @importFrom and use thirdpartypkg::blackbox. This makes it obvious where blackbox comes from.
When it comes to efficiency of executing code, it's better to @importFrom. Calling thirdpartypkg::blackbox is about 0.1 milliseconds slower than using @importFrom and calling blackbox. That isn't a lot of time, so probably isn't much of a consideration. But if your package uses hundreds of :: constructs and then gets thrown into looping or resampling processes, those milliseconds can start to add up.
Ultimately, I think the best guidance I've read (and I don't know where) is that if you are going to call blackbox more than a handful of times, it's worth using @importFrom. If you will only call it three or four times in a package, go ahead and use the :: construct.
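The size of the :: overhead depends on the machine and the R version, so it may be worth measuring rather than relying on a quoted figure. The sketch below is a rough approximation outside of a package: binding the function to a local name once stands in for what importFrom provides, and base::identity stands in for thirdpartypkg::blackbox. Both substitutions are assumptions made so the example runs anywhere.

```r
n <- 100000

# A plain binding, looked up by name without calling `::` on every use.
imported_identity <- base::identity

# Time n calls through `::` and n calls through the plain binding.
time_qualified <- system.time(for (i in seq_len(n)) base::identity(i))
time_imported  <- system.time(for (i in seq_len(n)) imported_identity(i))

# Elapsed seconds for n calls each; the difference is machine dependent.
c(qualified = time_qualified[["elapsed"]],
  imported  = time_imported[["elapsed"]])
```

Dividing the difference by n gives a per-call estimate that can be weighed against the readability argument above.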

retrieve original version of package function even if over-assigned

Suppose I replace a function of a package, for example knitr:::sub_ext.
(Note: I'm particularly interested where it is an internal function, i.e. only accessible by ::: as opposed to ::, but the same answer may work for both).
library(knitr)
my.sub_ext <- function (x, ext) {
return("I'm in your package stealing your functions D:")
}
# replace knitr:::sub_ext with my.sub_ext
knitr <- asNamespace('knitr')
unlockBinding('sub_ext', knitr)
assign('sub_ext', my.sub_ext, knitr)
lockBinding('sub_ext', knitr)
Question: is there any way to retrieve the original knitr:::sub_ext after I've done this? Preferably without reloading the package?
(I know some people want to know why I would want to do this so here it is. Not required reading for the question). I've been patching some functions in packages like so (not actually the sub_ext function...):
original.sub_ext <- knitr:::sub_ext
new.sub_ext <- function (x, ext) {
# some extra code that does something first, e.g.
x <- do.something.with(x)
# now call the original knitr:::sub_ext
original.sub_ext(x, ext)
}
# now set knitr:::sub_ext to new.sub_ext like before.
I agree this is not in general a good idea (in most cases these are quick fixes until changes make their way into CRAN, or they are "feature requests" that would never be approved because they are somewhat case-specific).
The problem with the above is if I accidentally execute it twice (e.g. it's at the top of a script that I run twice without restarting R in between), on the second time original.sub_ext is actually the previous new.sub_ext as opposed to the real knitr:::sub_ext, so I get infinite recursion.
Since sub_ext is an internal function (I wouldn't call it directly, but functions from knitr like knit all call it internally), I can't hope to modify all the functions that call sub_ext to call new.sub_ext manually, hence the approach of replacing the definition in the package namespace.
When you do assign('sub_ext', my.sub_ext, knitr), you are irrevocably overwriting the value previously associated with sub_ext with the value of my.sub_ext. If you first stash the original value, though, it's not hard to reset it when you're done:
library(knitr)
knitr <- asNamespace("knitr")
## Store the original value of sub_ext
.sub_ext <- get("sub_ext", envir = knitr)
## Overwrite it with your own function
my.sub_ext <- function (x, ext) "I'm in your package stealing your functions D:"
assignInNamespace('sub_ext', my.sub_ext, knitr)
knitr:::sub_ext("eg.csv", "pdf")
# [1] "I'm in your package stealing your functions D:"
## Reset when you're done
assignInNamespace('sub_ext', .sub_ext, knitr)
knitr:::sub_ext("eg.csv", "pdf")
# [1] "eg.pdf"
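The stash idea can also be made safe against running the patching script twice in one session, which is the infinite-recursion case described in the question. The sketch below uses a plain environment (fake_ns) as a stand-in for a package namespace so it runs without knitr; .patch_stash, patch_once and wrapper are hypothetical names. For a real namespace, the assign() into ns would be replaced by assignInNamespace() as shown above.

```r
# Stand-in for a package namespace, so the sketch runs without knitr.
fake_ns <- new.env()
fake_ns$sub_ext <- function(x, ext) sub("\\.[^.]*$", paste0(".", ext), x)

# Create the stash only if it does not exist yet, so that re-running the
# script in the same session keeps the originals stored the first time.
if (!exists(".patch_stash", inherits = FALSE)) .patch_stash <- new.env()

patch_once <- function(name, make_wrapper, ns, stash = .patch_stash) {
  # Remember the function only the first time, so an already patched
  # version is never stored as "the original".
  if (!exists(name, envir = stash, inherits = FALSE)) {
    assign(name, get(name, envir = ns), envir = stash)
  }
  original <- get(name, envir = stash)
  assign(name, make_wrapper(original), envir = ns)
}

# Wrapper factory: do something first, then call the true original.
wrapper <- function(original) {
  function(x, ext) original(paste0("x_", x), ext)
}

patch_once("sub_ext", wrapper, fake_ns)
patch_once("sub_ext", wrapper, fake_ns)  # a second run does not wrap twice
fake_ns$sub_ext("eg.csv", "pdf")

# Restore the original when done.
assign("sub_ext", get("sub_ext", envir = .patch_stash), envir = fake_ns)
```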
Alternatively, as long as you are just adding lines of code to what's already there, you could add that code using trace(). What's nice about trace() is that, when you are done, you can use untrace() to revert the function's body to its original form:
trace(what = "mean.default",
tracer = quote({
a <- 1
b <- 2
x <- x*(a+b)
}),
at = 1)
mean(1:2)
# Tracing mean.default(1:2) step 1
# [1] 4.5
untrace("mean.default")
# Untracing function "mean.default" in package "base"
mean(1:2)
# [1] 1.5
Note that if the function you are tracing is in a namespace, you'll want to use trace()'s where argument, passing it the name of some other (exported) function that shares the to-be-traced function's namespace. So, to trace an unexported function in knitr's namespace, you could set where = knit.

Resources