Can I load a package's data set without installing the package?

In package ISLR, there is a data set called Default.
I want to use that data set, but the ISLR package is not installed on my machine.
data(Default)
# Warning message:
# In data(Default) : data set ‘Default’ not found
library(ISLR)
# Error in library(ISLR) : there is no package called ‘ISLR’
Since I'll probably never use it again, I don't want to install the package. I thought about reading the data from the web, but it's not on the web page linked from the package description.
In general, is there a way to load a data set from a package without installing the package?

You can do this from within R:
# download the source tarball from CRAN, then extract only the data file we need
download.file("http://cran.r-project.org/src/contrib/ISLR_1.0.tar.gz",
              destfile = "ISLR.tar.gz")
untar("ISLR.tar.gz", files = "ISLR/data/Default.rda")
L <- load("ISLR/data/Default.rda")   # load() returns the names of the loaded objects
summary(Default)
If you want to keep a copy of the data file:
file.copy("ISLR/data/Default.rda",".")
Clean up:
unlink(c("ISLR.tar.gz","ISLR"),recursive=TRUE)
I'm not sure you can get around having to download the tarball; in principle you might be able to run untar() directly on a network connection, but I don't think the underlying machinery can actually extract a single file without first downloading the whole tarball to somewhere on your machine.
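If you'd rather not clean up by hand at all, a minimal variation of the same idea (same CRAN URL, otherwise just base R) does everything inside a temporary directory that is discarded when the R session ends:
td <- tempdir()
tf <- file.path(td, "ISLR.tar.gz")
download.file("http://cran.r-project.org/src/contrib/ISLR_1.0.tar.gz", destfile = tf)
untar(tf, files = "ISLR/data/Default.rda", exdir = td)   # extract into the temp dir
load(file.path(td, "ISLR", "data", "Default.rda"))
summary(Default)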

You said, "Since I'll probably never use it again, I don't want to install the package." If the fact that you'll never use it again is your main concern, then perhaps this solution is not quite what you want, but it is probably the simplest solution:
Install the package with install.packages().
Extract and save the dataset that you want.
Uninstall the package with remove.packages().
The process does involve installing the package, which you hoped to avoid, but because you remove it again afterwards you end up without the unwanted package on your system, so the final result is what you want in three simple steps.
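In code, the whole round trip might look something like this (a minimal sketch, assuming your default library location is writable):
install.packages("ISLR")                # temporary install
data(Default, package = "ISLR")         # load the data set into the workspace
save(Default, file = "Default.rda")     # keep a local copy for later use
remove.packages("ISLR")                 # uninstall the package again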

Related

Where to put R files that generate package data

I am currently developing an R package and want it to be as clean as possible, so I try to resolve all WARNINGs and NOTEs displayed by devtools::check().
One of these notes is related to some code I use for generating sample data to go with the package:
checking top-level files ... NOTE
Non-standard file/directory found at top level:
'generate_sample_data.R'
It's an R script currently placed in the package root directory and not meant to be distributed with the package (because it doesn't really seem useful to include it).
So here's my question:
Where should I put such a file or how do I tell R to leave it be?
Is .Rbuildignore the right way to go?
Currently devtools::build() puts the R script in the final package, so I shouldn't just ignore the NOTE.
As suggested in http://r-pkgs.had.co.nz/data.html, it makes sense to use ./data-raw/ for scripts/functions that are necessary for creating/updating data but are not needed in the package itself. After adding ./data-raw/ to ./.Rbuildignore, the build will ignore anything within that directory. (And, as you commented, there is a helper function, devtools::use_data_raw().)
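For illustration, a hypothetical data-raw/generate_sample_data.R could look like the sketch below; devtools::use_data() saves the object into data/ as an .rda file, while the script itself stays out of the built package thanks to the .Rbuildignore entry:
# data-raw/generate_sample_data.R (hypothetical example)
sample_data <- data.frame(x = rnorm(100), y = runif(100))
devtools::use_data(sample_data, overwrite = TRUE)   # writes data/sample_data.rda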

Using another R package function without using the whole package as dependency

I'm working on an R package and ran into this question: I need an auxiliary function from another package, but I don't want to include the entire package as a dependency because I only need this one function. What is the correct procedure here? Is it OK, if both codebases are GPL-2, to just copy/paste the function into my package? Should I contact the author? Or is it best to include the whole package as a dependency?
If it's just a small function, I don't see a problem with copying the code into your own package (since everything is GPLed). You should acknowledge the source in your package though.
This has the benefit of insulating your code from any changes in the other package; it's not unusual for updates to packages to break other packages downstream. It has the downside that if those updates were useful (bug fixes or added functionality) then you don't benefit from them either.
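If you do copy the function over, a minimal sketch of the kind of attribution comment you might add (the package and function names here are made up):
# Copied, with minor modifications, from the 'otherpkg' package (GPL-2).
# See otherpkg's DESCRIPTION for its authors; the source is also acknowledged
# in this package's DESCRIPTION.
aux_helper <- function(x) {
  x[!is.na(x)]   # trivial placeholder body
}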

R package namespace issue using data() -- data set not found

I've hit an issue trying to import a package (namely, 'robfilter') inside one of my own packages. One of its methods that I am trying to use, adore.filter, is failing at this line:
data(critvals)
With error 'data set 'critvals' not found'.
The function works fine if I load the library via require(robfilter). However, this means that in order to use my custom package which calls adore.filter, I will have to load my own package, and then load robfilter. Not a huge problem but slightly annoying.
I'm not sure if the problem is that there is an extra step I need to do in order to make critvals visible within my package, or if perhaps there is something the package author needed to do (and hasn't done) to add critvals to its package namespace; there is no sign of 'critvals' in the robfilter NAMESPACE file. I haven't encountered this issue before and don't really understand how the use of data() inside a package is supposed to work.
There are two solutions as far as I know:
Either ask the robfilter maintainer to put the data needed by robfilter in the package's internal data file (R/sysdata.rda),
or make your package Depend on robfilter.
So it works if you put robfilter in the Depends field of your DESCRIPTION file. In my case (both packages are mine), I was trying to avoid the Depends solution because it attaches the imported package, and any other package will then also need to depend on its imported package... My question is quite a duplicate of yours, but not in the same context.
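For the first option, the maintainer would save the lookup data as internal package data, for example with a sketch like the one below run in the robfilter source tree; devtools::use_data(..., internal = TRUE) writes R/sysdata.rda, whose objects are available to the package's own functions without any data() call:
# run once in the package source directory, assuming 'critvals' exists in the session
devtools::use_data(critvals, internal = TRUE)   # creates R/sysdata.rda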

What methods exist for distributing a semi-live dataset with an R package?

I am building a package for internal use using devtools. I would like the package to load in data from a file/connection (which differs depending on the date the package is built). The data is large-ish, so having a one-time cost of parsing and loading the data during package building is preferable.
Currently, I have a data.R file under R/ that assigns the data to package-level variables, the values are assigned during package installation (or at least that's what appears to be happening). This less than ideal setup mostly works. In order to get all instances of the package to have the same data I have to distribute the data file with the package (currently it's being copied to inst/ by a helper script before building the package) instead of just having it all be packaged together. There must be a better way.
Such as:
Generate .rda files during package building (but this requires not running the same code during package install)
I can do this with a Makefile but that seems like overkill
Can I have R code that is only run during package building and not during install?
Run R code in data/
But the data is munged using code in the package in question. I can fix that with Collate (I think) but then I have to maintain the order of all of the .R files (but with that added complexity I might as well use a Makefile?)
Build two packages, one with all of the code I want, one with the data.
Obvious, clever things I've not thought of.
tl;dr: What are some methods for adding a snapshot of dynamically changing data to an R package frozen for deployment?
As #BenBolker points out in the comments above, splitting the dataset out into a different package has precedent in the community (most notably the core package datasets) and has additional benefits.
The separation of functions from data also makes working on historic versions of the data easier to do with the up to date functions.
I currently have a tools-to-munge package and a things-to-munge package. Using a helper script I can build the tools-to-munge package and set up a Suggests (or Depends) in the DESCRIPTION of both packages to point to the appropriate incrementing version of the packages. After the new tools-to-munge package has been built, I can build the things-to-munge package as necessary using the functions in the tools-to-munge package.
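A hypothetical helper script for that two-package workflow (the package directory names are placeholders) can stay very small:
# rebuild the code package first, then snapshot the data package that uses it
devtools::install("tools-to-munge")   # make the munging functions available
devtools::build("things-to-munge")    # build the data package against the new tools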

How do I load a package without installing it in R?

I have built an R package but I do not want my users to have to install it before using it.
Is there a way to load a package without having to install it?
For example, if I have a package mypackage.tar.gz, is there something like
library("mypackage.tar.gz")
?
I'll join in "the chorus" of suggesting you should really install the package.
That having been said, you can take a look at Hadley's devtools package, which will let you load a package's code into your session without installing it (and without dumping everything into your global workspace).
The package will have to be untar'd/unzipped and follow the standard R package structure.
In order for this to work, though, your users would have to have the devtools package installed, so ... I'm not sure that this is any type of win for you.
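A minimal sketch of that approach, assuming devtools is installed and the tarball sits in the working directory:
untar("mypackage.tar.gz")           # unpack the source tree to ./mypackage
devtools::load_all("mypackage")     # load its code into the current session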
If you only need the code to be loaded without it being installed, take the raw R script and source it:
source("myScript.R")
If you have different functions, you can create an R script that just loads all the necessary source files. What I sometimes do when developing is name all my functions F_some_function.R and my classes Class_some_function.R. This allows me to source a main file containing the following code:
funcdir  <- "C:/Some/Path"
files    <- dir(funcdir)
# source the class definitions first, then the functions
srcfiles <- c(grep("^Class_", files, value = TRUE),
              grep("^F_",     files, value = TRUE))
for (f in file.path(funcdir, srcfiles)) source(f)
If you present them with the tarred file, they can untar it themselves using untar() before sourcing the main file.
But honestly, please use a package. Sourcing loads everything into the global environment (or into a specified environment if you use local = TRUE), but you lose all the functionality of a package. Installing a package is no hassle, and neither is removing one.
If it's a matter of write permissions on the C: drive (the only reason not to use a package that I've encountered in my career), you can easily set another library location. R 2.12 actually does this by itself on Windows; see ?.libPaths().
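For example (the directory is just a placeholder):
dir.create("~/Rlibs", showWarnings = FALSE)   # a user-writable library location
.libPaths("~/Rlibs")                          # prepend it to the library search path
install.packages("mypackage.tar.gz", repos = NULL, type = "source")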
