Setting Function Defaults in R on a Project-Specific Basis

Commonly, I use the same function settings. I'm wondering if there is a method, other than having a new object on the search path that is essentially a wrapper for the function, to set default arguments. For example:
paste() has its sep argument set to a space (" "), and I'm tired of writing sep = "" over and over. So is there a way to "temporarily" replace the function with my chosen defaults?
paste(..., sep = "")
Can I accomplish this through packaging? I've noticed that some packages mask equally named functions in the global environment.
Ideally, I'd like something that can be set on a project-by-project basis (in load.r or some other such workflow start point).

I'd personally be very hesitant to change the default behavior of any commonly used functions, especially base R functions. For one thing, it will immediately decrease the portability of any scripts or code snippets in which you use the redefined functions. Worse, other R users reading your scripts will likely be either (a) unaware of your private meanings for well-known language elements or (b) frustrated at having to rewire their own expectations for the functions. For me, it would also feel like an added mental burden to attach different meanings to the same symbol in different settings.
I think a much better solution is to create similarly named functions implementing your preferred defaults. A slightly modified name effectively flags that this isn't the familiar base function, without burdening you with much (or any) extra typing. A good example is the pair of paste0() and cat0() functions included in the gsubfn package. (Clearly you and I aren't the only two to find ourselves frequently annoyed by paste()'s default sep setting!):
library(gsubfn)
paste0
# function (..., sep = "")
# paste(..., sep = sep)
# <environment: namespace:gsubfn>
cat0
# function (..., sep = "")
# cat(..., sep = sep)
# <environment: namespace:gsubfn>
You can then either collect a number of these functions in a text file, sourcing them early in your script, or (better) package them up and load them via a call to library().
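For instance, a minimal sketch of that approach (the file name and function name here are just examples):
# my_defaults.R: personal wrappers with preferred defaults
p0 <- function(..., sep = "") paste(..., sep = sep)
# then near the top of each project script:
source("my_defaults.R")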

The Defaults package used to do this; it was retired in 2014.

Related

R Functions require package declaration when they are included from another file?

I am writing some data manipulation scripts in R, and I finally decided to create an external .r file and call my functions from there. But this started giving me some problems when I try to call some functions. A simple example:
This one works with no problem:
change_column_names <- function(file, new_columns, seperation) {
    new_data <- read.table(file, header = TRUE, sep = seperation)
    colnames(new_data) <- new_columns
    write.table(new_data, file = file, sep = seperation, quote = FALSE, row.names = FALSE)
}
change_column_names("myfile.txt", c("Column1", "Column2", "Cost"), "|")
When I create a file "data_manipulation.r", put the above change_column_names function in there, and do this:
sys.source("data_manipulation.r")
change_column_names("myfile.txt",c("Column1", "Column2", "Cost"),"|")
it does not work. It gives me a could not find function "read.table" error. I fixed it by changing the function calls to utils:::read.table and utils:::write.table.
But this is getting kind of frustrating. Now I have the same issue with the aggregate function, and I do not even know what package it belongs to.
My questions: Which package does aggregate belong to? How can I easily find out what packages functions come from? Is there any cleaner way to handle this issue?
sys.source() by default evaluates code in the base environment, whose search chain ends in the empty environment, so functions from packages such as utils and stats are never found; the global environment (where you usually evaluate code) doesn't have this problem. You probably should just be using source() instead.
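For example, either of these evaluates the file somewhere the package functions are visible:
source("data_manipulation.r")
# or keep sys.source() but point it at the global environment:
sys.source("data_manipulation.r", envir = globalenv())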
You can also see where functions come from by looking at their environment.
environment(aggregate)
# <environment: namespace:stats>
For the first part of your question: If you want to find what package a function belongs to, and that function is working properly, you can do one of two (probably more) things:
1.) Access the help files
?aggregate and you will see the package it belongs to at the top of the help file.
2.) Another way is to simply type aggregate, without any arguments, into the R console:
> aggregate
function (x, ...)
UseMethod("aggregate")
<bytecode: 0x7fa7a2328b40>
<environment: namespace:stats>
The namespace is the package it belongs to.
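If the package is attached, find() (from the utils package) will also report it directly:
find("aggregate")
# [1] "package:stats"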
As for the rest of your question: both of the functions you are having trouble with ship with R's standard packages and should always be available. I was not able to recreate the issue. Try using source instead of sys.source and let me know if that alleviates your error.

In R, do an operation temporarily using a setting such as working directory

I'm almost certain I've read somewhere how to do this. Instead of having to save the current setting (say, the working directory) to a variable, change the setting, do an operation, and then revert to what it was, I'd like to do all of this inside a single function, akin to what with() does relative to attach()/detach(). A solution just for the working directory is what I need now, but there might be a more generic function that does that sort of thing? Or isn't there?
So to illustrate... The way it is now:
curdir <- getwd()
setwd("../some/place")
# some operation
setwd(curdir)
The way it is in my wildest dreams:
with.dir("../some/place", # some operation)
I know I could write a function for this, I just have the impression there's something more readily available and generalizable to other parameters too.
Thanks
There is an idiom for this in some of R's base plotting functions:
op <- par(no.readonly = TRUE)
# par(blah = stuff)
# plot(stuff)
par(op)
It is crude, but that very crudeness makes it fully portable to options() and setwd().
Fortunately it's also easy to implement a crude wrapper:
with_dir <- function(dir, expr) {
    old_wd <- getwd()
    setwd(dir)
    result <- evalq(expr)  # force the promise; expr was captured in the caller's environment
    setwd(old_wd)
    result
}
I'm no wizard with nonstandard evaluation, so evalq could be unstable somehow. There is more on NSE in an old write-up by Lumley and in Wickham's Advanced R, but it's dense stuff and I haven't wrapped my head around it all yet.
edit: as per Ben Bolker's comment, it's probably better to use on.exit for this:
with_dir <- function(dir, expr) {
    old_wd <- getwd()
    on.exit(setwd(old_wd))  # restores the directory even if expr throws an error
    setwd(dir)
    evalq(expr)
}
From the R docs:
on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.
What you're describing depends upon two things: detecting when you enter and leave a particular lexical scope, and defining behavior to run on entrance and on exit. Python has these; they are called "context managers". They were a big deal when introduced, and many parts of Python's standard library now behave like context managers, defining their "enter" and "exit" behavior explicitly or by leveraging some clever inheritance scheme.
with.default
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
<bytecode: 0x07d02ccc>
<environment: namespace:base>
R's with function works sort of like a context manager, because it can pass scopes around easily. That said, it doesn't give you the "enter" and "exit" operations for free. In particular, the current working directory isn't an entry in the current scope but a state of the R interpreter, which can only be queried or changed by function calls behind the .Internal shield.
You can easily define your own object types with context-manager-like methods for the with generic function, as well as write and register methods for other types you commonly use, but this is not part of the base R language.
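For instance, here is a minimal sketch along those lines; the dir_context class and constructor are hypothetical, invented for illustration:
# a with() method for a hypothetical class that changes directory on entry
# and restores it on exit
dir_context <- function(path) structure(list(path = path), class = "dir_context")
with.dir_context <- function(data, expr, ...) {
    old <- setwd(data$path)  # setwd() invisibly returns the previous directory
    on.exit(setwd(old))
    eval(substitute(expr), envir = parent.frame())
}
# usage: with(dir_context("../some/place"), list.files())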

Which R functions are not suitable for programmatic use?

Some functions like browser only make sense when used interactively.
It is widely regarded that the subset function should only be used interactively.
Similarly, sapply isn't good for programmatic use, since it doesn't simplify the result for zero-length inputs.
I'm trying to make an exhaustive list of functions that are not suitable for programmatic use.
The plan is to make a tool for package checking to see if any of these functions are called and give a warning.
There are other functions like file.choose and readline that require interactivity, but these are OK to include in packages, since the end use will be interactive. I don't care too much about these for this use case but feel free to add them to the list.
Which functions have I missed?
(Feel free to edit.)
The following functions should be handled with care (which does not necessarily mean they are not suitable for programming):
Functions whose output class is not consistent across inputs: sapply, mapply (by default); see the sketch after this list
Functions whose internal behavior is different depending on the input length: sample, seq
Functions that evaluate some of their arguments within environments: $, subset, with, within, transform.
Functions that go against normal environment usage: attach, detach, assign, <<-
Functions that allow partial matching: $
Functions that only make sense in interactive usage: browser, recover, debug, debugonce, edit, fix, menu, select.list
Functions that can be a security threat (code injection) if used with user inputs: source, eval(parse(text = ...)), system.
Also, to some extent, every function that generates warnings rather than errors. I recommend using options(warn = 2) to turn all warnings into errors in a programming application. Specific cases can then be allowed via suppressWarnings or try.
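To illustrate the first point, a quick sketch of sapply()'s inconsistency and the usual remedy, vapply():
sapply(1:3, function(x) x + 1)                     # numeric vector
sapply(integer(0), function(x) x + 1)              # empty list, not numeric(0)
vapply(integer(0), function(x) x + 1, numeric(1))  # numeric(0), as declared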
This is in answer to the poster's comment after the question. The function below takes a function as input and returns the bad functions found, with their line numbers. It can generate false positives, but they are only warnings anyway, so that does not seem too bad. Modify bad to suit.
badLines <- function(func) {
    bad <- c("sapply", "subset", "attach")
    regex <- paste0("\\b", bad, "\\b")
    result <- sort(unlist(sapply(regex, FUN = grep, body(func), simplify = FALSE)))
    setNames(result, gsub("\\b", "", names(result), fixed = TRUE))
}
badLines(badLines)
##  sapply1   subset   attach  sapply2
##        2        2        2        4

hiding personal functions in R

I have a few convenience functions in my .Rprofile, such as a handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting, and I do this with rm(list = ls()), which deletes all my user-created objects AND my custom functions. I'd really like to not blow away my custom functions.
One way around this seems to be creating a package with my custom functions so that they end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
Combine attach() and sys.source() to source into an environment and attach that environment. Here I have two functions in the file my_fun.R:
foo <- function(x) {
    mean(x)
}

bar <- function(x) {
    sd(x)
}
Before I load these functions, they are obviously not found:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
Create an environment and source the file into it:
> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)
They are still not visible, as we haven't attached anything:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
and when we do so, they are visible; because we have attached a copy of the environment to the search path, the functions survive being rm()-ed:
> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5
I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember the copy on the search path is just that, a copy. If the functions are fairly stable and you're not editing them then the above might be useful but it is probably more hassle than it is worth if you are developing the functions and modifying them.
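If you do edit them, the refresh cycle looks something like this (assuming the file and environment from above):
sys.source("my_fun.R", envir = myEnv)  # reload your edits into myEnv
detach("myEnv")                        # drop the stale copy from the search path
attach(myEnv)                          # attach a fresh copy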
A second option is to just name them all .foo rather than foo, as ls() will not return objects named like that unless the argument all = TRUE is set:
> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo" ".Random.seed"
Here are two ways:
1) Have each of your function names start with a dot, e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE); therefore they won't be passed to your rm command.
or,
2) Put this in your .Rprofile
attach(list(
f = function(x) x,
g = function(x) x*x
), name = "MyFunctions")
The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace, and they will be accessible almost the same way as if they were in your workspace. search() will display your search list, and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace, the rm command you normally use won't remove them. If you do wish to remove them, use detach("MyFunctions").
Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:
R> q("no")
followed by
M-x R
to create a new session (here in Emacs with ESS), which re-reads the .Rprofile. Easy, fast, and cheap.
Other than that, private packages are the way in my book.
Another alternative: keep the functions in a separate file which is sourced from within .Rprofile. You can re-source the contents directly from within R at your leisure.
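A sketch of that arrangement (the file paths are just examples):
# in ~/.Rprofile:
source("~/R/my_functions.R")
# later, after rm(list = ls()) has wiped them, just re-source:
source("~/R/my_functions.R")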
I find that my R environment often gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.
The simple function below was my solution. It does two things:
1) deletes all non-function objects whose names do not begin with a capital letter, and then
2) saves the environment as an RData file.
(It requires the R.oo package.)
cleanup <- function(filename = "C:/mymainR.RData") {
    library(R.oo)
    # create a data frame listing all personal objects
    everything <- ll(envir = 1)
    # get the objects that are not functions
    nonfunction <- as.vector(everything[everything$data.class != "function", 1])
    # delete non-function objects whose names do not begin with a capital letter
    trash <- nonfunction[grep('^[[:lower:]]', nonfunction)]
    remove(list = trash, pos = 1)
    # save the R environment
    save.image(filename)
    print(paste("New, CLEAN R environment saved in", filename))
}
In order to use this function, three rules must always be followed:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects you want to keep permanently available.
3) Remove obsolete functions manually with rm.
Obviously this isn't a general solution for everyone... and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can back it up as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions), and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one... I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.
Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.
Patrick Mohr
An nth, quick-and-dirty option would be to use lsf.str() when calling rm(), to list all the functions in the current workspace and exclude them from removal... and it lets you name your functions however you wish.
pattern <- paste0('^', lsf.str(), '$', collapse = "|")
rm(list = ls()[-grep(pattern, ls())])
I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
Similar to Gavin's answer, the following loads a file of functions but without leaving an extra environment object around:
if ('my_namespace' %in% search()) detach('my_namespace')
source('my_functions.R', attach(NULL, name = 'my_namespace'))
This removes the old version of the namespace if it was attached (useful during development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version, you will build up multiple attached environments of the same name.
Should you wish to see which functions have been loaded, look at the output for
ls('my_namespace')
To unload, use
detach('my_namespace')
These attached functions, like a package, will not be deleted by rm(list=ls()).

Protecting function names in R

Is it possible in R to protect function names (or variables in general) so that they cannot be masked?
I recently spotted that this can be a problem when creating a data frame with the name "new", which masked a function used by lmer and thus stopped it working. (Recovery is easy once you know what the problem is; here rm(new) did it.)
There is an easy workaround for your problem, without worrying about protecting variable names (though playing with lockBinding does look fun). If a function becomes masked, as in your example, it is still possible to call the masked version, with the help of the :: operator.
In general, the syntax is packagename::variablename.
(If the function you want has not been exported from the package, then you need three colons instead, :::. This shouldn't apply in this case however.)
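For instance, assuming the masked function was methods::new (lmer builds S4 objects internally), the pattern would be:
new <- data.frame(x = 1:3)  # masks the function new()
methods::new("numeric")     # the package version is still reachable via ::
rm(new)                     # or simply remove the masking object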
Maybe use environments! This is a great way to separate namespaces. For example:
> a <- new.env()
> assign('printer', function(x) print(x), envir=a)
> get('printer', envir=a)('test!')
[1] "test!"
@hdallazuanna recommends (via Twitter):
new <- 1
lockBinding('new', globalenv())
This makes sense when the variable is user-created, but it does not, of course, prevent a package function from being masked in the first place.
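For completeness, the locked binding then refuses reassignment until it is unlocked:
new <- 1
lockBinding('new', globalenv())
new <- 2
# Error: cannot change value of locked binding for 'new'
unlockBinding('new', globalenv())  # undo when needed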
I had the reverse problem from the OP: I wanted to prevent my custom functions in .Rprofile from being overridden when I defined a variable with the same name as a function. I ended up putting my functions in ~/.R.R and adding these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))
