Purging previous versions of functions when developing packages in R

I am developing a package in R. Let's call it mypkg.
Because some functions behave differently when they are run from a package (I am not sure why--but this is not the question), I am editing functions in the package, then rebuilding the package from the command line. For some reason a given R instance retains older versions of the functions even though the source has been changed and the package rebuilt and re-installed. I need to start a new instance to see the changes.
Here is the typical workflow.
Make changes to myfunction() in mypkg.R
In R: detach(package:mypkg); remove.packages("mypkg")
Command line: R CMD INSTALL --build c:\mypkg
Notifies me that it has been installed to the default library
In R: library(mypkg)
In R: myfunction() runs the previous version before changes.
[The next three steps I want to avoid]
Start a new R instance
In R: library(mypkg)
myfunction() works as expected
Running under R 2.14.1.
I am looking for suggestions of how to improve this workflow to avoid starting a new R instance.

You need to try to unload the package as well as detach it. ?detach has:
If a package has a namespace, detaching it does not by default
unload the namespace (and may not even with ‘unload=TRUE’), and
detaching will not in general unload any dynamically loaded
compiled code (DLLs). Further, registered S3 methods from the
namespace will not be removed. If you use ‘library’ on a package
whose namespace is loaded, it attaches the exports of the already
loaded namespace. So detaching and re-attaching a package may not
refresh some or all components of the package, and is inadvisable.
Note the last sentence in particular.
Try:
detach(package:mypkg, unload = TRUE)
Note that "If a package has a namespace" now covers all packages, since all packages gained namespaces in R 2.14.0 (IIRC the version number).
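In practice, the full reload cycle might look like this (a sketch; it assumes nothing else in the session still imports mypkg):
detach("package:mypkg", unload = TRUE)   # detach from the search path and try to unload the namespace
if ("mypkg" %in% loadedNamespaces())     # if something kept the namespace loaded,
  unloadNamespace("mypkg")               # unload it explicitly
# rebuild and re-install from the command line, then:
library(mypkg)                           # attaches the freshly installed version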
If the code you are changing is R code, consider using assignInNamespace() to assign an object/function in your global workspace (i.e. the newer version of function in mypkg) to the namespace of mypkg. For example we have function foo() in mypkg and locally we have newfoo() which is the newer version of foo():
assignInNamespace("foo", newfoo, "mypkg")
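A slightly fuller sketch (newfoo() here stands in for whatever updated definition you have in your workspace):
newfoo <- function(x) x + 1                      # the revised implementation, defined interactively
assignInNamespace("foo", newfoo, ns = "mypkg")   # overwrite foo inside mypkg's namespace
mypkg::foo(1)                                    # now runs the new code without re-installing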
If the changes relate to C code, or the above doesn't work, then perhaps you should follow the advice of R Core and spawn a new R instance rather than trying to detach and reload your package.
See also Hadley Wickham's devtools package; and if you speak Emacs, Emacs+ESS can make the development process (spawning new R instances, etc.) easier.
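For completeness, the devtools route mentioned above usually boils down to something like this (a sketch, using the path from your example; forward slashes work on Windows):
devtools::load_all("c:/mypkg")   # simulate install-and-attach of the current source
# ... test myfunction() interactively ...
devtools::install("c:/mypkg")    # do a real install once you are happy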

Related

Prevent R from executing code in R profile when installing packages

I have some code in my R profile that I've found interferes with package installation. For example, I like to automatically load devtools when I'm working on packages, so I have this
if (file.exists("DESCRIPTION")) {
  try({suppressMessages(library(devtools))})
}
However, I've found that this interferes with package installation: I get errors like ERROR: lazy loading failed for package ‘rlang’. If I comment out the loading of devtools in the R profile, the package installs without error.
Is there any way to check if .Rprofile is being executed during package installation so that I could put that condition in the if and stop devtools from loading at inappropriate times?
I’d generally recommend against putting such code into your ~/.Rprofile! The ~/.Rprofile should contain only general configuration that is always valid. For further customisation, use a project-local .Rprofile instead.
However, I would make one exception to the above guideline, because one useful distinction you can make is to only execute certain code in interactive sessions:
if (interactive()) {
  if (file.exists("DESCRIPTION")) {
    try({suppressMessages(library(devtools))})
  }
}
.Rprofile files are sourced at R startup. They aren't re-run later by other functions, so you can't put checking logic in them that reacts to install.packages() being called.
For this reason loading packages using RProfile is not always advisable as it can make your code non-reproducible. However, as a helper package, devtools is probably a good exception (ignoring your particular problem here).
If you use base R only you can tell R not to load Rprofile during startup by using the --vanilla or --no-init-file arguments.
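For example, when installing from a shell rather than from your interactive session, something like the following skips the init file entirely (illustrative; substitute the package you need):
Rscript --vanilla -e 'install.packages("rlang", repos = "https://cloud.r-project.org")'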
If you are using RStudio, one workaround would be to use project-level .Rprofile files. RStudio starts a new R session for each project and loads the .Rprofile from the project root; only if it can't find one does it fall back to the user-level .Rprofile.
Create a new project and, within the project options, tick the option to disable .Rprofile loading.
Whenever you want to install a package switch to this project.
One caveat: make sure this is the only project session open; having multiple RStudio sessions open when installing new packages can lead to problems even without the .Rprofile in the mix.
RStudio article on startup files: https://support.rstudio.com/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf
Base R documentation on R initialization: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html
A diagram of the R startup flow and some commentary on the use of .Renviron and .Rprofile: https://rstats.wtf/r-startup.html

Developing R package, testing with `foreach`, while running simulations at same time with different package version

I write almost all my R code in packages at work (and use git). I make heavy use of devtools, in particular shortcuts for load_all, etc., as I update functions used in a package. I have a rough understanding of devtools, in that load_all makes a temporary copy of the package, and I really like this workflow for testing function updates in packages.
Is there a nice easy way/workflow for running simulations depending on the package, while developing it at the same time, without "breaking" those simulations?
I suspect there is an easy solution that I've overlooked.
Right now what I do is:
1. Get the package "mypackage" up to a point where it is ready for running simulations. Copy the whole folder containing the project. Run the simulations in the copied folder using a new package name ("mypackage2"), so the simulation scripts include library(mypackage2) but NOT library(mypackage). This annoyingly means I need to update library(mypackage) calls to library(mypackage2) calls. If instead I run the simulations with library(mypackage) and avoid mypackage2 altogether, then I need to make sure the currently built version of mypackage is the 'old' one that doesn't reflect the updates in 2. below (but 2. below requires rebuilding the package too!). Handling all this gets messy.
2. While the simulations are running in the copied folder, I update the functions in "mypackage", either using load_all or by rebuilding the package. I often need to rebuild the package (i.e. using load_all without rebuilding isn't a workable solution) because I want to test functions that run small parallel simulations with doParallel and foreach, etc. (on Windows), and any functions I modify and want to test need the latest built "mypackage" in the child processes, which are new R sessions calling library(mypackage). I understand that when a package is built in R it gets stored in ..\R\R-3.6.1\library, and that when future R sessions call library(mypackage) they will use that version of the package.
What I'd ideally like to be able to do is, in the same original folder, run simulations with a version of mypackage, and then update the code in the package while simulations are stopped/started, confident my development changes won't break the simulations which are running a specific version of the package.
Is there a simple way for doing the above, without having to recopy folders (and make things like "mypackage2")?
thanks
The issue described here is sort of similar to what I am facing: Specify package location in foreach.
The problem is that if I run a simulation that takes several days using "mypackage", with many calls to foreach, and update and rebuild "mypackage" when testing changes, future foreach calls from the simulation may pick up the new updated version of the package, which would be a disaster.
I think the answers in the other question do apply,
but you need to do some extra steps.
Let's say you have a version of the package you want to test.
You'd still create a specific folder for that version, but you leave it empty.
Here I'll use /tmp/mypkg2 as an example.
While having your project open in RStudio, you execute:
withr::with_libpaths(c("/tmp/mypkg2", .libPaths()), devtools::install())
That will install that version of the package to the provided folder.
You could then have a wrapper script,
say wrapper.R,
with something like:
pkg_path <- commandArgs(trailingOnly = TRUE)[1L]
cat("Using package at", pkg_path, "\n")
.libPaths(c(pkg_path, .libPaths()))
library(doParallel)
workers <- makeCluster(detectCores())
registerDoParallel(workers)
# We need to modify the lib path in each worker too
parallel::clusterExport(workers, "pkg_path")
parallel::clusterEvalQ(workers, .libPaths(c(pkg_path, .libPaths())))
# ... Your code calling your package and doing stuff
parallel::stopCluster(workers)
Afterwards, from the command line (outside of R/RStudio),
you could type (assuming Rscript is in your path):
Rscript path/to/wrapper.R /tmp/mypkg2
This way, the actual testing code can stay the same
(including calls to library)
and R will automatically search first in pkg_path,
loading your specific package version,
and then searching in the standard locations for any dependencies you may have.
I don't fully understand your use-case (as to why you want to do this), but what I normally do when testing two versions of a package is to push the most recent version to my dev branch on GitHub and then use devtools::load_all() to test what I'm currently working on. Then, by using remotes::install_github() and specifying the dev branch, you can run the GitHub version with mypackage::func and the devtools version with func.
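A sketch of the two commands involved ("user/mypackage" and the local path are placeholders for your repository and source directory):
remotes::install_github("user/mypackage@dev")  # install the pushed dev-branch version from GitHub
devtools::load_all("path/to/mypackage")        # load the local working copy for interactive testing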

R check for a package: optimize the way DLLs are built and checked

I want to optimize my process for building a package. I have in pckgname/src some fortran code (f90):
pckgname/src/FortranFile1.f90
pckgname/src/FortranFile2.f90
I am under RStudio. When I build the package, it creates the src-i386 and src-x64 folders, inside which executable files in .o are produced
pckgname/src-i386/FortranFile1.o
pckgname/src-i386/FortranFile2.o
pckgname/src-x64/FortranFile1.o
pckgname/src-x64/FortranFile2.o
then dll files are produced into each of these folders from the .o files:
pckgname/src-i386/dllname.dll
pckgname/src-x64/dllname.dll
Thereafter, if I want to check the package successfully, I need to manually copy and paste the dll into these two folders (in a previous version of this question I wrote "code" instead of "dll", which might have led to misunderstandings):
pckgname/inst/libs/x64/dllname.dll
pckgname/libs/X64/dllname.dll
My question is: is it normal that I have to do this, or is there a shorter way that avoids copying dllname.dll by hand into these two folders? It is indeed a potential source of error.
NB: If I don't copy the dlls into the said folders, I get the following error messages (translated from French):
Error in inDL(x, as.logical(local), as.logical(now), ...) :
impossible to load shared object 'C:/Users/username/Documents/pckgname/inst/libs/x64/dllname.dll':
LoadLibrary failure: The specified module can't be found
Error in inDL(x, as.logical(local), as.logical(now), ...) :
impossible to load shared object 'C:/Users/username/Documents/pckgname/libs/x64/dllname.dll':
LoadLibrary failure: The specified module can't be found.
[...]
The short answer
Is it normal that I have to do this?
No. If path/to/package is the directory you are developing your package in, and you have everything set up for your package to call your Fortran subroutines correctly (see "The long answer"), you can run
R CMD build path/to/package
at the command prompt, and a tarball will be constructed for you with everything in the right place (note you will need Rtools for this). Then you should be able to run
R CMD check packagename_versionnumber.tar.gz
from the command prompt to check your package, without any problems (stemming from the .dll files being in the wrong place -- you may have other problems, in which case I would suggest asking a new question with the ERROR, WARNING, or NOTE listed in the question).
If you prefer to work just from R, you can even
devtools::check("path/to/package")
without having to run devtools::build() or R CMD build ("devtools::check()... [b]undles the package before checking it" -- Hadley's chapter on checking; see also Karl Broman's chapter on checking).
The long answer
I think your question has to do with three issues potentially:
The difference between directory structure of packages before and after they're installed. (You may want to read the "What is a package?" section of Hadley's Package structure chapter -- luckily R CMD build at the command prompt or devtools::build() in R will take care of that for you)
Using INSTALL vs. BUILD (from the comments to the original version of this answer)
The proper way to set up a package to call Fortran subroutines.
You may need quite a bit of advice on the process of developing R packages itself. Some good guides include (in increasing order of detail):
Karl Broman's R package primer
Hadley Wickham's R packages
The Writing R Extensions manual
In particular, there are some details about having compiled code in an R package that you may want to be aware of. You may want to first read Hadley's chapter on compiled code (Broman doesn't have one), but then you honestly need to read most of the Writing R Extensions manual, in particular sections 1.1, 1.2, 1.5.4, and 1.6, and all of chapters 5 and 6.
In the mean time, I've setup a GitHub repository here that demonstrates a toy example R package FortranExample that shows how to correctly setup a package with Fortran code. The steps I took were:
Create the basic package structure using devtools::create("FortranExample").
Eliminate the "Depends" line in the DESCRIPTION, as it set a dependence on R >= 3.5.1, which will throw a warning in check (I have now also revised the "License" field to eliminate a warning about not specifying a proper license).
Make a src/ directory and add toy Fortran code there (it just doubles a double value).
Use tools::package_native_routine_registration_skeleton("FortranExample") to generate the symbol registration code that I placed in src/init.c (See Writing R Extensions, section 5.4).
Create a nice R wrapper for using .Fortran() to call the Fortran code (placed in R/example_function.R).
In that same file, use the #' @useDynLib FortranExample Roxygen tag to add useDynLib(FortranExample) to the NAMESPACE file; if you don't use Roxygen, you can put it there manually (see Writing R Extensions 1.5.4 and 5.2). A sketch of the wrapper and tag follows this list.
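For orientation, a minimal sketch of what the R wrapper in R/example_function.R might look like (the subroutine name double_it is a hypothetical stand-in; the real names are in the linked repository):
#' @useDynLib FortranExample
#' @export
example_function <- function(x) {
  # .Fortran() passes x to the compiled subroutine and returns a list of the
  # (possibly modified) arguments; we pull out the updated value of x
  .Fortran("double_it", x = as.double(x), PACKAGE = "FortranExample")$x
}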
Now we have a package that's properly set up to deal with the Fortran code. I have tested on a Windows machine (running Windows 8.1 and R 3.5.1) both the paths of running
R CMD build FortranExample
R CMD check FortranExample_0.0.0.9000.tar.gz
from the command prompt, and of running
devtools::check("FortranExample")
from R. There were no errors, and the only warning was the "License" issue mentioned above.
After cleaning up the after-effects of running devtools::check("FortranExample") (for some reason the cleanup option is now deprecated; see below for an R function to handle this for you inspired by devtools::clean_dll()), I used
devtools::install("FortranExample")
to successfully install the package and tested its function, getting:
FortranExample::example_function(2.0)
# [1] 4
The cleanup function I mentioned is
clean_source_dirs <- function(path) {
  paths <- file.path(path, paste0("src", c("", "-i386", "-x64")))
  file_pattern <- "\\.(o|so|dll|a|sl|dyl)$"  # match compiled artifacts only, anchored at the extension
  unlink(list.files(path = paths, pattern = file_pattern, full.names = TRUE))
}
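Usage is then simply (assuming the package source sits under the current working directory):
clean_source_dirs("FortranExample")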
No, it is not normal, and there is a solution to this problem: make use of Makevars.win. The reason for your problem is that the .dlls look for their dependencies in places defined by the PATH environment variable and in relative paths set up during linking. Linking happens when you run R CMD INSTALL, following the MinGW settings plus any custom parameters defined in the (Windows-specific) file Makevars.win. As soon as the resulting library is copied, the binding to the locations of the dependent .dlls may be broken, so put the dlls in a place where dependent libraries typically reside, for instance $(R_HOME)/bin/$(ARCH)/:
cp -f <your library relative path>.dll $(R_HOME)/bin/$(ARCH)/<your library>.dll
During the check, R will then look for your dependencies there too, so they will not be missed. A very crude solution, but it worked in my case.

What does a typical Rcpp edit-compile-test cycle look like?

I can only find information on how to install a ready-made R extension package, but it is nowhere mentioned which commands a developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've created a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
    author="My Name", email="my.email@example.com")
This created 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot on a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where thanks to ccache recompilation of unchanged code is immediate) with command-line use via r of the littler package -- this allows me to write compact expressions loading the new package and evaluating: r -lnewpackage -esomeFunc(somearg) to test newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both those answers are for packages, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle, as it is a one-off. And for that, too, we have different approaches, many of which have been discussed here.
Edit: The other main misunderstanding in your question is that once you have a package you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including tests placed in the tests folder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TDD approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
If you use RStudio for package development, it offers a lot of automation via the devtools package. I still find it useful to know what has to go on under the hood and it is by no means required.
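As a concrete illustration, a devtools-based iteration from within R might look like this (a sketch; it assumes the working directory is the package root and that testthat-style tests exist):
Rcpp::compileAttributes()  # regenerate RcppExports.R / RcppExports.cpp after editing C++ code
devtools::document()       # optional: refresh NAMESPACE and man pages via roxygen2
devtools::load_all()       # recompile what changed and load the package into the session
devtools::test()           # run the unit tests without a full install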

Profiling an installed R package with source line numbers?

I'd like to profile functions in an installed R package (data.table) using Rprof() with line.profiling=TRUE. Normally, installed package are byte compiled, and line numbers are not available for byte compiled packages. The usual instructions for line profiling with Rprof() require using source() or eval(parse()) so that srcref attributes are present.
How can I load data.table so that line numbers are active? My naive attempts to first load the package with library(data.table) and then source('data.table.R') fail because some of the compiled C functions are not found when I attempt to use the package, presumably because library() is using a different namespace. Maybe there is some way to source() into the correct namespace?
Alternatively, perhaps I can build a modified version of data.table that is not byte compiled, and then load that in a way that keeps line numbers? What alterations would I have to make, and how would I then load it? I started by setting ByteCompile: FALSE and then trying R CMD INSTALL -l ~/R/lib --build data.table, but this still seems to be byte compiled.
I'm eager to make this work and will pursue any suggestions. I'm running R 3.2.1 on Linux, have full control over the machine, and can install anything else that is required.
Edit:
A more complete description of the problem I was trying to solve (and the solution for it) is here: https://github.com/Rdatatable/data.table/issues/1249
I ended up doing essentially what Joshua suggested: recompile the package with "KeepSource: TRUE" in the DESCRIPTION. For my purposes, I also found "ByteCompile: FALSE" to be helpful, although this might not apply generally. I also changed the version number so I could see that I was using my modified version.
Then I installed to a different location with "R CMD INSTALL data.table -l ~/R/lib", and loaded with "library(data.table, lib='~/R/lib')". When used with the patches given in the link, I got the line numbers of the allocations as I desired. But if anyone knows a solution that doesn't require recompilation, I'm sure that others would appreciate if you shared.
You should be able to get line numbers even if the package is byte-compiled. But, as it says in ?Rprof (emphasis added):
Individual statements will be recorded in the profile log if
line.profiling is TRUE, and if the code being executed was
parsed with source references. See parse for a discussion of
source references. By default the statement locations are not
shown in summaryRprof, but see that help page for options to
enable the display.
That means you need to set KeepSource: TRUE either in the DESCRIPTION file or via the --with-keep.source argument to R CMD INSTALL.
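Putting the pieces together, a sketch of the workflow once the package is installed with source references kept (file names and the profiled expression are illustrative):
# shell: R CMD INSTALL --with-keep.source data.table
library(data.table)
Rprof("profile.out", line.profiling = TRUE)  # start the sampling profiler with line info
# ... run the data.table code you want to profile ...
Rprof(NULL)                                  # stop profiling
summaryRprof("profile.out", lines = "show")  # report with source line numbers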
