emacs: expand autocomplete for R function to include namespace - r

Am using EMACS/ESS as editor for R.
I find it helpful to refer to a function defined outside of base with it's relevant namespace; as well as being good practice in general, it seems to be necessary when running R CMD check on a package.
I really like autocomplete in EMACS and am wondering if there's a way to extend the functionality to include namespace when autocomplete-ing the name of a function.
For example (in R):
library(stats)
Then in ESS when I start typing dn the autocomplete dnorm appears (greyed out) and I can complete it by pressing TAB.
What would be better is to complete as stats::dnorm or even stats:::dnorm so that I don't need to manually check whether the function I'm using is in base. (For a relatively new user, memorizing the names of all functions in base may be a lot to ask).
Details:
EMACS: 2012-06-10 on MARVINGNU Emacs 24.1.1 (i386-mingw-nt6.1.7601)
ESS version 12.04-4
Icicles (default install c. Oct 2012). Not sure how to find version info. for this.
If this doesn't already exist, any pointers would be welcome. Note this is closely related but if the answer is already there then I'm not quite getting it...

Related

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependancies your package will need, so that every one knows what to install for your project to work. You can also use packrat if you want to mimic python's virtualenv
R also provide a long format documentation system, which are called vignettes and are similar to a printed notebook : you can display code, text, code results, etc. This is were you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed they are automatically included and available for all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point if your project consists in a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here : a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance) ; b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this :
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.

How can I add a breakpoint to debug my R package without recompiling

When I develop a R package I endup doing the following thing:
load the latest build
use it
realize there is a bug in say function 'f_bug'
try to debug
I would be easy if I could just 're-source' f_bug and that the newly sourced version would be chosen (I'd rebuild the package clean latter).
But I cannot do that, it looks like the package::f_bug is always "chosen" by default when called within another package function.
Can I do such a thing ?
You can't use RStudio's convenient graphical breakpoints, but you can do the same thing using trace:
trace(package::f_bug, browser, at = insertion_point)
Here insertion_point refers not to line numbers but to a vector of substeps. From ?trace:
look at ‘as.list(body(f))’ to get the numbers
associated with the steps in function ‘f’.)
Another option might be to use utils::setBreakpoint which takes a file name and line number as arguments. See the help file for details.

R: what's the proper way to overwrite a function from a package?

I am using a R package, in which there are 2 functions f1 and f2 (with f2 calling f1)
I wish to overwrite function f1.
Since R 2.15 and the mandatory usage of namespace in packages, if I just source the new function, it is indeed available in the global environement (ie. just calling f1(x) in the console returns the new result). However, calling f2 will still use the packaged function f1. (Because the namespace modifies the search path, and seals it as explained here in the Writing R Extensions tutorial)
What is the proper way to completely replace f1 with the new one? (apart from building again the package!) This can be useful in several situations. For instance if there is a bug in a package that you have not developed. Or if you don't want to re-build your packages everyday while they are still under development.
I know about function
assignInNamespace("f1",f1,ns="mypackage")
However, the help page ?assignInNamespace is a bit enignmatic and seems to discourage people from using it without giving more information, and I couldn't find any best practice recommendations on the official CRAN tutorial. and after calling this function:
# Any of these 2 calls return the new function
mypackage::f1
getFromNamespace(x = "f1", envir = as.environment("package:mypackage"))
# while this one still returns the old packaged version
getFunction(name = "f1", where = as.environment("package:mypackage"))
This is very disturbing. How is the search path affected?
For now I am doing some ugly things such as modifying the lockEnvironment function so that library doesn't lock the package namespace, and I can lock it at a later stage once I have replaced f1 (which seems really not a good practice)
So basically I have 2 questions:
what does exactly do assignInNamespace in the case of a package namespace (which is supposed to be locked)
What are the good practices?
many thanks for sharing your experience there.
EDIT: people interested in this question might find this blog post extremely interesting.
There are lots of different cases here.
If it's a bug in someone else's package
Then the best practice is to contact the package maintainer and persuade them to fix it. That way everyone gets the fix, not just you.
If it's a bug while developing your own package
Then you need to find a workflow where rebuilding packages is easy. Like using the devtools package and typing build(mypackage), or clicking a button ("Build & Reload" in RStudio; "R CMD build" in Architect).
If you just want different behaviour to an existing package
If it isn't a bug as such, or the package maintainer won't make the fix that you want, then you'll have to maintain you own copy of f1. Using assignInNamespace to override it in the existing package is OK for exploring, but it's a bit hacky so it isn't really suitable for a permanent solution.
Your best bet is to create your own package containing copies of f1 and f2. This is less effort than it sounds, since you can just define f2 <- existingpackage::f2.
In response to the comment:
Second and third cases makes sense if you are alone but they require to build and install the packages which is tricky in the case of my organisation as the packages are deployed on dozens of computer and I need root access to update the packages.
So take a copy of the existing package source, apply your patch, and host it on your company network or github or Bitbucket. Then the updated package can be installed programmatically via
install.packages("//some/network/path/mypackage_0.0-1.tar.gz", repos = NULL)
or
library(devtools)
install_github("mypackage", "mygithubusername")
Since the installation is just a line of code, you can easily push it to as many machines as you like. You don't need root access either - just install the package to a library folder that doesn't require root access to write to. (Read the Startup and .libPaths help pages for how to define a new library.) You'll need network access to those machines, but I can't help you with that. Speak to your network administrator or your boss or whoever can get you permission.
In case the function has no explicit binding within the package:
rlang::env_unlock(env = asNamespace('mypackage'))
rlang::env_binding_unlock(env = asNamespace('mypackage'))
assign('f1', f1, envir = asNamespace('mypackage'))
rlang::env_binding_lock(env = asNamespace('mypackage'))
rlang::env_lock(asNamespace('mypackage'))

How can I remove a lock from a linked environment in R?

I tried to run a Bioconductor package (truncateCDF) that modify an environment(hgu133plus2cdf), to remove unwanted probesets, from an affymetrix chip.
Everything went fine, until I got the following message (translated from french):
> assign(cdfname, cdf.env, env=CDF.env)
Error in assign(cdfname, cdf.env, env = CDF.env) :
impossible to change the value of a locked link for 'hgu133plus2cdf'
The assign function is the ultimate function of the code, that save the changes made to the environment dataset CDF.env to the original environment (hgu133plus2cdf), before using it in analyses of affymetrix chip results; so, it is essential.
My question: what is this locked link to the hgu133plus2cdf environment, and how could I bypass it.
The author of this package successfully run its package around 2005; so I suppose it is a feature introduced since then in R (probably not related to Bioconductor, since assign is a basic R function, reason why I ask this question on this forum instead of Biostar).
I tried to read the docs, but I am overwhelmed, when it comes to environments.
Thanks in advance for any help.
I don't think truncateCDF is from a Bioconductor package; it is a at least not current. This sounds like this post and the next two from the same thread from the Bioconductor mailing list. It is a result of a change in R -- packages now have not-easily-modified name spaces, and these are implemented by locking the environment in which name space symbols are defined. Removing probes is not an essential part of a typical microarray work flow. Please ask on the Bioconductor mailing list (no subscription required) if you'd like more help.

How to see man page of a function of a user installed R library/package?

I have installed a library that contains lot of functions. I want to see calculations done in those functions. How to reach man page of these function in R prompt ?
I only know library(help="name of library") command to seek help.
The shortcut is:
?command
The longer version, for commands such as if for which the shortcut won't work, is:
help("command")
If you need to find a command in installed packages:
help.search("command")
If you need to find a command in all possible packages, the sos library is the ticket.
If you were using the ggplot2 package, go to google, type in something like :
'r ggplot2'
In the results look for the sites that look like this:
cran.r-project.org/package=ggplot2
and start with 'cran.r-project.org/....'
Click on that link and you will see a page for the package. You want to click on either the Reference Manual or for some packages there are Vignettes with more involved explanations and examples. However, the Reference Manual is usually pretty good for examples of how to use the package functions.

Resources