I wrote an R package which uses awk to do some initial filtering on data. But the awk the package uses needs to be at or above a certain version.
Where do you recommend specifying this dependency?
I can list the dependency in the SystemRequirements field of the DESCRIPTION file:
http://r.789695.n4.nabble.com/how-to-list-external-dependencies-i-e-non-R-packages-td4693947.html
But that field doesn't actually perform any check. Anyway, it may be good enough.
If you really wanted to go the long way, we now provide a template:
package x13binary installs the X-13ARIMA-SEATS binary from the US Census Bureau
it takes a binary from a matching GitHub repo, x13prebuilt, which we set up just to provide these binaries
packages needing X-13 deseasonalization, such as seasonal, simply depend on x13binary
they then use wrapper scripts that find the binary shipped by x13binary and use it (a sketch of the idea follows below)
X-13ARIMA-SEATS is open source under a weird US Government license, meaning that it is available as source, but with terms that vary a little between the US and the rest of the world -- reflecting its origin in a world where licenses were less well understood.
This scheme might be overkill for you. On the other hand, you simply have no way to ensure that you will get the correct / minimally required awk version on Windows, OS X, or an arbitrary Linux.
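Here is a minimal sketch of that wrapper idea (the helper name and installation path are illustrative, not the actual x13binary API): locate a binary shipped inside an installed package and hand its path to system2().
find_x13 <- function() {
  # pick the platform-appropriate executable name (assumed here)
  exe <- if (.Platform$OS.type == "windows") "x13ashtml.exe" else "x13ashtml"
  # system.file() returns "" if the file or package is missing
  path <- system.file("bin", exe, package = "x13binary")
  if (!nzchar(path)) stop("x13binary does not seem to be installed")
  path
}
# callers then run the binary via system2(find_x13(), args = ...)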
List it in the SystemRequirements field of the DESCRIPTION file. For an example, see Ryacas.
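Note that SystemRequirements is free text and is not checked automatically, so any version check has to live in your own code, e.g. called from .onLoad(). A minimal sketch, assuming a GNU-awk-style --version banner (check_awk is a made-up helper name):
check_awk <- function(min_version = "4.0") {
  # ask awk for its version string; fall back to character(0) if the call errors
  out <- tryCatch(system2("awk", "--version", stdout = TRUE),
                  error = function(e) character(0))
  if (length(out) == 0) stop("awk not found on PATH")
  # pull the first dotted version number out of the banner line
  ver <- regmatches(out[1], regexpr("[0-9]+(\\.[0-9]+)+", out[1]))
  if (length(ver) == 0 || package_version(ver) < min_version)
    stop("awk >= ", min_version, " is required")
  invisible(ver)
}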
Related
I'm developing a package (golem) in R, and R CMD check returns a NOTE about an excess of packages in Imports (DESCRIPTION):
checking package dependencies … NOTE
Imports includes 34 non-default packages.
Importing from so many packages makes the package vulnerable to any of
them becoming unavailable. Move as many as possible to Suggests and
use conditionally.
I have placed some packages in Suggests (DESCRIPTION), like this:
usethis::use_package(package = "ggplot2", type = "Suggests")
usethis::use_package(package = "MASS", type = "Suggests")
I would like to know:
What is the difference between Imports (run-time) and Suggests (develop-time), and does the latter have anything to do with the term "compile time" from other programming languages?
How do I know that a package is needed by the user at runtime? Is there any universal rule for this (like a phrase to help you decide)? And for Suggests?
In R, packages listed in the Imports clause of the DESCRIPTION file must be available or your package won't load. Normally they will all be loaded when your package is loaded, though it's possible to delay that by not importing anything, just using :: notation to access them.
Packages listed in the Suggests clause don't need to be available, and won't be automatically loaded. To access their functions, you normally call requireNamespace() to find out if the package is available, and if so use :: for access. If it is not available, your package should fail gracefully in whatever the user was trying to do, letting them know that they need to install the missing package if they want the task to succeed.
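A minimal sketch of that pattern, for a hypothetical function that needs ggplot2 from Suggests:
plot_results <- function(df) {
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is required for plot_results(). ",
         "Please install it with install.packages(\"ggplot2\").")
  }
  # only reached when ggplot2 is available; note the :: access
  ggplot2::ggplot(df, ggplot2::aes(x = x, y = y)) + ggplot2::geom_point()
}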
These aren't really "run-time" versus "develop-time" differences. It's all run-time.
There are two things in R that might be called "compile-time" in other languages. The best match is installing your package. That configures it to the particular R version and platform it is running on. R also has a "just-in-time" compiler that optimizes functions, but other than a bit of a speed increase that is pretty much invisible to the user.
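For what it's worth, the byte-code compiler behind that JIT lives in the base compiler package and can be invoked by hand:
library(compiler)
f <- function(x) { s <- 0; for (i in x) s <- s + i; s }
cf <- cmpfun(f)  # explicitly byte-compile the function
cf(1:10)         # same result as f(1:10), possibly a bit faster
enableJIT(3)     # level of automatic just-in-time compilation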
I think @r2evans answered your second question clearly in a comment: the user needs a package in order to use your functions that use that package. If the functions that use it are unlikely to be needed by most users, use Suggests, and add the availability test.
sessionInfo() includes very useful info that will improve the chances of someone being able to run your code on their machine, including
OS and version
R version
Attached packages
What other info can be provided with an R script to ensure someone else will be able to run it in their environment?
NB: please include how to get that info (i.e. what command to run or where to look for it)
While this is not a complete answer, I tend to include this function with scripts I send along, as it will download a package if the computer does not have it. This is more of a suggestion for scripts. For packages, you can explicitly state which versions of other packages your package depends on (see the DESCRIPTION example further below).
package_load <- function(packages = NULL, quiet = TRUE,
                         verbose = FALSE, warn.conflicts = FALSE) {
  # install any required packages that are not already present
  pkgsToDownload <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(pkgsToDownload) > 0)
    install.packages(pkgsToDownload, repos = "https://cran.us.r-project.org",
                     quiet = quiet, verbose = verbose)
  # then load them all
  for (pkg in packages)
    require(pkg, character.only = TRUE, quietly = quiet,
            warn.conflicts = warn.conflicts)
}
## Example of use
package_load(c('dplyr', 'rgdal'))
This is helpful for one-off scripts, as it gets over the hurdle of a different computer not having the appropriate packages. However, I generally suggest that folks make sure their version of R is up to date as well.
Is this the best solution? Probably not, but it does help with minor scripts you send along to others. For a larger code base, it would probably be better to put together a package or a Docker image.
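As mentioned above, a package (rather than a script) declares minimum versions of its dependencies directly in the DESCRIPTION file; the version numbers here are just examples:
Imports:
    dplyr (>= 1.0.0)
Depends:
    R (>= 3.5.0)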
I think the criteria you listed are the "basics" of script reusability. The next level would be the possible interactions of your script (e.g. an R Shiny script will use web features, so stating the web browser and the version used to produce the script is good practice). Another kind of useful information would be comments specifying the expected inputs and outputs.
NB: I would say "attached packages and their versions", just so we can be sure...
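One simple way to ship all of that alongside a script is to write it to a file next to the results:
# record the exact environment the script was run in
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")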
I get this warning
Non-standard file/directory found at top level:
‘data-raw’
when building my package, even though creating this folder is the recommended way to create package data: http://r-pkgs.had.co.nz/data.html#data-sysdata
Any comments on that, or do I need a specific setting to get rid of this message?
When used, data-raw should be added to .Rbuildignore. As explained in the Data section of Hadley's R-Packages book (also linked in the question)
Often, the data you include in data/ is a cleaned up version of raw data you’ve gathered from elsewhere. I highly recommend taking the time to include the code used to do this in the source version of your package. This will make it easy for you to update or reproduce your version of the data. I suggest that you put this code in data-raw/. You don’t need it in the bundled version of your package, so also add it to .Rbuildignore. Do all this in one step with:
usethis::use_data_raw()
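For reference, usethis::use_data_raw() creates the folder and adds this regular-expression entry to .Rbuildignore, which is what makes the message go away:
^data-raw$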
A user can work on many PCs. A good code runs no matter what PC it is running on. Assuming one does not want to rely on preference and option files, what is the best way to make sure a package is loaded (and installed if needed).
The library command is cool, but the require command is much better. Yet even require does not get the job done.
Triggering a re-install that is not needed (e.g., in RStudio) causes an interesting prompt to restart the R session -- and this is why unnecessary installs are best avoided.
One possible trick, A, is to do this (so as not to type the package name too often):
doInstall <- TRUE
toInstall <- c("downloader")
if (doInstall) install.packages(toInstall)
lapply(toInstall, library, character.only = TRUE)
or a worse trick, B, would be
if (!require(downloader)) { install.packages("downloader"); require(downloader) }
Is there a "2015 way" of doing it with one command - something like
justdoitall(c("downloader","dplyr"))
Here is an example of installing package zipcode using the pacman approach.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(zipcode)
Assuming one does not want to rely on preference and option files
That rules out putting anything in .Rprofile or using external packages, so we're stuck with base R to solve your problem. If that's the case, then the answer is that you can't do much better than what you have written in your question (I prefer B to A).
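If all you want is trick B wrapped up to take a vector of names, a minimal base-R version (using the function name from your question) would be:
justdoitall <- function(pkgs) {
  for (p in pkgs) {
    # require() returns FALSE instead of erroring when a package is missing
    if (!require(p, character.only = TRUE)) {
      install.packages(p)
      library(p, character.only = TRUE)
    }
  }
}
justdoitall(c("downloader", "dplyr"))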
If you're willing to bend a little bit and require the user to load a package first (which could be done on startup by using .Rprofile), there are a few options that do exactly what you want.
installr::require2 and pacman::p_load do what you ask. Disclosure: I am an author/maintainer of pacman. I agree with your sentiment that we shouldn't rely on options or external files, especially if we plan on sharing the code. I use pacman pretty much every day (it has many more uses than just installing/loading packages), but for the most part these types of functions should be treated as useful for interactive use. If you want portable, shareable code without worries about whether packages will be available, you will have to resort to something along the lines of what you have in your question.
I'd like to create a zip archive from within R, and need maximal cross-platform compatibility, so I would prefer not to use a system("zip") command.
Within utils there's zip.file.extract (aka unzip), which uses [a lot of] C code derived from zlib 1.1.3, in a file called dounzip.c. I couldn't find any similar capabilities for creating zip files.
It's also tricky to construct a specific Google query for "cran create zip" or the equivalent!
Also, a tar will not suffice; I need to create zips to use as input for another set of non-R tools.
I'd appreciate any pointers!
cheers,
mark
As usual the amazing Omega Project for Statistical Computing is a valuable resource! Take a look at the Rcompression package and try, for example, something like:
?gzip
txt <- paste(rep("This is a string", 40), collapse = "\n")
v <- gzip(txt)
writeBin(v, "test.txt.zip")
HTH
I think the function gzfile() may also do what you're looking for. Also note that in the upcoming version 2.10.0 there are some enhancements to the compression functions that may be relevant (see https://svn.r-project.org/R/trunk/NEWS -- the svn server may ask you to accept a certificate).
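For completeness, a tiny example of writing through a gzfile() connection (note this produces a gzip file, not a zip archive):
con <- gzfile("test.txt.gz", "w")
writeLines(paste(rep("This is a string", 40), collapse = "\n"), con)
close(con)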