Correct way to handle dataset dependencies in package development?

Correct way to handle dataset dependencies in package development? - r

I'm attempting to build a package that depends on some data from another package. Writing R Extensions says to avoid the use of require in package functions. I may not use all the tables in the Lahman package, and am currently importing them this way...
team.batting <- function(year, league, playoffs = FALSE)
{
...
Batting <- Lahman::Batting
Teams <- Lahman::Teams
## calculations, subsets, etc.
...
}
Is this correct? If not, what is the correct way to call an exported data set in a package function? And is the end user required to have the package installed for this to work?
Also, I'm not really clear on what a development version is, as compared to an installed version. If anyone could shed some light, I'd appreciate it.

After some research, I've determined the the correct way to do this is to include the directive
import(Lahman)
in the NAMESPACE file of my package (or possibly importFrom(Lahman, table name) depending on how many tables are used). After doing this, the :: calls can be removed.
team.batting <- function(year, league, playoffs = FALSE)
{
...
bat <- Batting
tms <- Teams
## calculations, subsets, etc.
...
}

Related

Determining which version of a function is active when many packages are loaded

If I have multiple packages loaded that define functions of the same name, is there an easy way to determine which version of the function is currently the active one? Like, lets say I have base R, the tidyverse, and a bunch of time series packages loaded. I'd like a function which_package("intersect") that would tell me the package name of the active version of the intersect function. I know you can go back and look at all the warning messages you recieved when installing packages, but I think that sort of manual search is not only tedious but also error-prone.
There is a function here that does sort of what I want, except it produces a table for all conflicts rather than the value for one function. I would actually be quite happy with that, and would also accept a similar function as an answer, but I have had problems with the implimentation of function given. As applied to my examples, it inserts vast amounts of white space and many duplicates of the package names (e.g. the %>% function shows up with 132 packages listed), making the output hard to read and hard to use. It seems like it should be easy to remove the white space and duplicates, and I have spent considerable time on various approaches that I expected to work but which had no impact on the outcome.
So, for an example of many conflicts:
install.packages(pkg = c("tidyverse", "fpp3", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink")
lapply(x = c("tidyverse", "fable", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"),
library, character.only = TRUE)

You can pull this information with your own function helper.
which_package <- function(fun) {
if(is.character(fun)) fun <- getFunction(fun)
stopifnot(is.function(fun))
x <- environmentName(environment(fun))
if (!is.null(x)) return(x)
}
This will return R_GlobalEnv for functions that you define in the global environment. There is also the packageName function if you really want to restrict it to packages only.
For example
library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"

How do I load data from another package from within my package

One of the functions in a package that I am developing uses a data set from the acs:: package (the fips.state object). I can load this data into my working environment via
data(fips.state, package = "acs"),
but I do not know the proper way to load this data for my function. I have tried
#importFrom acs fips.state,
but data sets are not exported. I do not want to copy the data and save it to my package because this seems like a poor development practice.
I have looked in http://r-pkgs.had.co.nz/namespace.html, http://kbroman.org/pkg_primer/pages/docs.html, and https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Data-in-packages, but they do not include any information on sharing data sets from one package to another.
Basically, how do I make a data set, that is required by functions in another package available to the functions in my package?

If you don't have control over the acs package, then acs::fips.state seems to be your best bet, as suggested by #paleolimbot.
If you are going to make frequent calls to fips.state, then I'd suggest making a local copy via fips.state <- acs::fips.state, as there is a small cost in looking up objects from other packages that you might do well to avoid incurring multiple times.
But if you are able to influence the acs package (even if you are not, I think this is a useful generalization), then mikefc suggests an alternative solution, which is to set the fips.state object up as internal to the package, and then to export it:
usethis::use_data(fips.state, other.data, internal = FALSE)
And then in NAMESPACE:
export(fips.state)
or if using roxygen2:
#' Fips state
#' #name fips.state
#' #export
"fips.state"
Then in your own package, you can simply #importFrom acs fips.state.

You can always use package::object_name (e.g., dplyr::starwars) anywhere in your package code, without using an import statement.
is_starwars_character <- function(character) {
character %in% dplyr::starwars$name
}
is_starwars_character("Luke Skywalker")
#> [1] TRUE
is_starwars_character("Indiana Jones")
#> [1] FALSE

Choose function to load from an R package

I like using function reshape from the matlab package, but I need then to specify base::sum(m) each time I want to sum the elements of my matrix or else matlab::sum is called, which only sums by columns..
I need loading package gtools to use the rdirichlet function, but then the function gtools::logit masks the function pracma::logit that I like better..
I gess there are no such things like:
library(loadOnly = "rdirichlet", from = "gtools")
or
library(loadEverythingFrom = "matlab", except = "sum")
.. because functions from the package matlab may internaly work on the matlab::sum function. So the latter must be loaded. But is there no way to get this behavior from the point of view of the user? Something that would feel like:
library(pracma)
library(matlab)
library(gtools)
sum <- base::sum
logit <- pracma::logit
.. but that would not spoil your ls() with all these small utilitary functions?
Maybe I need defining my own default namespace?

To avoid spoiling your ls, you can do something like this:
.ns <- new.env()
.ns$sum <- base::sum
.ns$logit <- pracma::logit
attach(.ns)

To my knowledge there is no easy answer to what you want to achieve. The only dirty hack I can think of is to download the source of the packages "matlab", "gtools", "pracma" and delete the offending functions from their NAMESPACE file prior to installation from source (with R CMD INSTALL package).
However, I would recommend using the explicit notation pracma::logit, because it improves readability of your code for other people and yourself in the future.
This site gives a good overview about package namespaces:
http://r-pkgs.had.co.nz/namespace.html

Uninstall (remove) R package with dependencies

I wanted to try some new package. I installed it, it required a lot of dependencies, so it installed plenty of other packages. I tried it and I am not impressed - now I would like to uninstall that package including all the dependencies!
Is there any way to remove given packages including all dependencies which are not needed by any other package in the system?
I looked at ?remove.packages but there is no option to do this.

Here is some code that will all you to remove a package and its unneeded dependencies. Note that its interpretation of "unneeded" dependent packages is the set of packages that this package depends on but that are not used in any other package. This means that it will also default to suggesting to uninstall packages that have no reverse dependencies. Thus I've implemented it as an interactive menu (like in update.packages) to give you control over what to uninstall.
library("tools")
removeDepends <- function(pkg, recursive = FALSE){
d <- package_dependencies(,installed.packages(), recursive = recursive)
depends <- if(!is.null(d[[pkg]])) d[[pkg]] else character()
needed <- unique(unlist(d[!names(d) %in% c(pkg,depends)]))
toRemove <- depends[!depends %in% needed]
if(length(toRemove)){
toRemove <- select.list(c(pkg,sort(toRemove)), multiple = TRUE,
title = "Select packages to remove")
remove.packages(toRemove)
return(toRemove)
} else {
invisible(character())
}
}
# Example
install.packages("YplantQMC") # installs an unneeded dependency "LeafAngle"
c("YplantQMC","LeafAngle") %in% installed.packages()[,1]
## [1] TRUE TRUE
removeDepends("YplantQMC")
c("YplantQMC","LeafAngle") %in% installed.packages()[,1]
## [1] FALSE FALSE
Note: The recursive option may be particularly useful. If package dependencies further depend on other unneeded packages, setting recursive = TRUE is vital. If dependencies are shallow (i.e., only one level down the dependency tree), this can be left as FALSE (the default).

There is in fact a function remove.packages() in base R, but it's in the package utils, which you need to load first:
library(utils)
remove.packages()
It's not entirely clear to me how much recursive cleanup this function does.

There are base R ways to handle this but I'm going to recommend a package (I know you're trying to get rid of these). I'm recommending this package for 2 reasons (1) it solves two problems you're having & (2) Dason K. and I are developing this package (full disclosure). This package's value stands in that the functions are easier to remember names that are consistent. It also does some combined operations. Note you could do all of this in base but this question is already pretty localized so thus I'm going to use a tool that makes answering easier.
This package will:
allow you to delete package and dependencies
allow you to install packages in a temporary directory rather than main library
The caveat is that you can't be 100% certain that the package dependency wasn't already there, installed by the user previously. Therefore I would take caution with every step of this solution that you're not deleting things that are of importance. This solution relies on 2 factors (1) pacman (2) file.info. We'll assume that dependencies that were modified within a certain (user defined) threshold of time are indeed unwanted packages. Note the word assume here.
I made this reproducible for the folks at home in that the answer will randomly install a package from CRAN with additional dependencies (this installs a package you do not already have locally with 3 or more dependencies; used random to not single out any package).
Making a reproducible example
library(pacman)
(available <- p_cran())
(randoms <- setdiff(available, p_lib()))
(mypackages <- p_lib())
ndeps <- 1
while(ndeps < 3) {
package <- sample(randoms, 1)
deps <- unlist(p_depends(package, character.only=TRUE), use.names=FALSE)
ndeps <- length(setdiff(deps, mypackages))
}
package
p_install(package, character.only = TRUE)
Uninstalling package
We will assign the package name from the first part to package or the OP can use the unwanted package they installed and assign that to package (my random package happened to be package <- "OrdinalLogisticBiplot"). This deletions process should, ideally, be done in a clean R session with no add-on packages (except pacman) loaded.
## function to grab file info date/time modified
infograb <- function(x) file.info(file.path(p_path(), x))[["mtime"]]
## determine the differences in times modified for "package"
## and all other packages in library
diffs <- as.numeric(infograb(package)) - sapply(p_lib(), infograb)
## user defined threshold
threshold <- 15
## determine packages just installed within the time frame of the unwanted package
(delete_deps <- diffs[diffs < threshold & diffs >= 0])
## recursively find all packages that could have been installed
potential_depends <- unlist(lapply(unlist(p_depends(package, character=TRUE)),
p_depends, character=TRUE, recursive=TRUE))
## delete packages that are both on the lists of (1) installed within time
## frame of unwanted package and a dependency of that package
p_delete(intersect(names(delete_deps), potential_depends), character.only = TRUE)
This approach makes some big assumptions.
A better approach from the get go
p_temp(package_to_try)
This allows you to try it out first and not have it muddy your local library.
If you're unimpressed with pacman you can use the method described above to delete it.

Here is a quick solution to have a look at the packages that are not required by any other locally installed packages.
library(pacman)
ip <- installed.packages()[,1]
deps <- lapply(1:length(ip), function(i) tryCatch(p_depends(ip[i], local = TRUE)$Imports, error = function(e) return(NULL)))
packages.on.which.things.depend <- sort(unique(unlist(deps)))
packages.on.which.nothing.depends <- setdiff(ip, packages.on.which.things.depend)
packages.on.which.nothing.depends
Then, have a quick look at it and remove the ones you are not using (just be careful and do not try to uninstall base).
After you have determined which ones you use and which you don’t use, you may proceed with something like
i.need <- c("AER", "car", "devtools", "glmnet", "gmm", "Hmisc", "pacman", "plm", "RcppArmadillo", "RcppEigen", "rmarkdown", "rugarch", "base", "datasets")
un <- setdiff(packages.on.which.nothing.depends, i.need)
un
remove.packages(un)
Rinse and repeat until there are no unneeded orphanes packages. R should not allow you to remove built-in system packages.

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R started a new session what would be the best way to do this? Currently I can only think of storing a file in the users home directory but I'm not sure if I like that approach.

This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores that I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/Data")
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
scores <- 10
save(scores, file=paste0(loc, "/scores.rda"))
Then when I unloaded the library and re loaded agin the data set now says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow but am not sure on how to do this without messing with .Last function.
EDIT:
It appears this option is not viable in that when you compile as a package and use lazy load it saves the data sets as:
RData.rbd, RData.rbx, not as .rda files. That means the approach I use above is kinda worthless in that we want it to automatically be recognized.
EDIT2
This approach works and I tried it on a package I made. You can't do lazy load of the data and you have to either explicitly use data(scores) or use data(scores) inside of the function you're calling. I also assigned scores to .scores int he global.env the first time it was created and used exists inside the function to see if it exists. If `.scores. existed I assigned that to scores within the function. Once you unload the library and laod again you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function

I guess there is no way to store settings without saving them to disk or a database, some way or another. It can be done silently though by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in other ways than using options you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function(){
my.options <- options()
save(my.options, file="~/.Roptions.Rdata")
}
.First <- function(){
tryCatch({
load("~/.Roptions.Rdata")
do.call(options, my.options)
rm(my.options)
}, error=function(...){})
}
To my suprise try(..., silent=TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.

The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()
scores<-data.frame(
user=c("one","two","three"),
score=c("500,200,1100")
)
save(scores,file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()
#new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS - you can see a working example in the rnoaa package found on at github "opensci/rnoaa". Check their R/onload.r file! I can expand if needed.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Correct way to handle dataset dependencies in package development? - r

Related

Determining which version of a function is active when many packages are loaded

How do I load data from another package from within my package

Choose function to load from an R package

Uninstall (remove) R package with dependencies

Save package settings between sessions

Categories

Resources