"Found the following calls to data() loading into the global environment" - r

In my R-package, I have:
f <- function()
{
data('MyDataSet') # Load a dataset in my own package
... # Use MyDataSet to return something
}
The package builder has a warning message:
Found the following calls to data() loading into the global environment
What is the easiest way to fix the problem? Can I just load the data set into a variable? I don't need to save it to the global environment.

CHECK
I tested this and could build the package without problems.
What produced your note (a NOTE, not a warning or an error) was actually running check() on the package.
PROBLEM
This matters first of all if you want to put the package on CRAN, since the package will very likely be rejected unless it has 0 notes, 0 warnings and 0 errors.
If you only want to use the package yourself, you could just leave it as it is. Since the check enforces coding guidelines and performs other useful tests, it may still make sense to fix this for a private package.
FIX
One solution could be to include this dataset in your package itself.
You have to create a folder called data in your package to do this. Add the dataset as an .rda file there. LazyData: true also has to be set in your package DESCRIPTION for the code below to work. It is not R's default, so set it explicitly.
Now you can write the following:
f <- function()
{
x <- MyPackageName::MyDataSet
... # Use x (the local copy of MyDataSet) to return something
}
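If you would rather keep the data() call, a common alternative is to pass envir = environment(), which loads the dataset into the function's own frame instead of the global environment. As far as I know, the check only flags data() calls that omit envir. The sketch below uses the built-in mtcars dataset as a stand-in for MyDataSet (an assumption made so the example runs anywhere); in your package you would use your own dataset and package names.

```r
f <- function() {
  # envir = environment() loads the dataset into this function's own frame
  data("mtcars", package = "datasets", envir = environment())

  # TRUE only if mtcars now exists locally (inherits = FALSE skips the search path)
  loaded_locally <- exists("mtcars", envir = environment(), inherits = FALSE)

  list(loaded_locally = loaded_locally, n_rows = nrow(mtcars))
}

result <- f()
print(result)
```

The local copy disappears when the function returns, so nothing is left behind in the user's workspace.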

Related

Loading libraries for use in a specific environment in R

I have written some functions to facilitate repeated tasks among my R projects. I am trying to use an environment to load them easily but also prevent them from appearing when I use ls() or delete them with rm(list=ls()).
As a dummy example I have an environment loader function in a file that I can just source from my current project and an additional file for each specialized environment I want to have.
currentProject.R
environments/env_loader.R
environments/colors_env.R
env_loader.R
.environmentLoader <- function(env_file, env_name='my_env') {
sys.source(env_file, envir=attach(NULL, name=env_name))
}
path <- dirname(sys.frame(1)$ofile) # this script's path
#
# Automatically load
.environmentLoader(paste(path, 'colors_env.R', sep='/'), env_name='my_colors')
colors_env.R
library(RColorBrewer) # this doesn't work
# Return a list of colors
dummyColors <- function(n) {
require(RColorBrewer) # This doesn't work
return(brewer.pal(n, 'Blues'))
}
CurrentProject.R
source('./environments/env_loader.R')
# Get a list of 5 colors
dummyColors(5)
This works great except when my functions require me to load a library. In my example, I need to load the RColorBrewer library to use the brewer.pal function in colors_env.R, but the way it is now I just get the error Error in brewer.pal(n, "Blues") : could not find function "brewer.pal".
I tried just using library(RColorBrewer) or using require inside my dummyColors function or adding stuff like evalq(library("RColorBrewer"), envir=parent.env(environment())) to the colors_env.R file but it doesn't work. Any suggestions?
If you are using similar functions across projects, I would recommend creating an R package. It's essentially what you're doing in many ways, but you don't have to reinvent a lot of the loading mechanisms, etc. Hadley Wickham's book R Packages is very good for this topic. It doesn't need to be a fully built-out, CRAN-ready sort of thing. You can just create a personal package with the miscellaneous functions you frequently use.
That being said, the solution for your specific question would be to explicitly use the namespace to call the function.
dummyColors <- function(n) {
# no library()/require() call is needed when the function is namespace-qualified
return(RColorBrewer::brewer.pal(n, 'Blues'))
}
Create a package and then run it. Use kitten to build the boilerplate, copy your file into it, optionally build it if you want a .tar.gz file (omit that step if you don't need one), and finally install it. Then test it out. We have assumed colors_env.R, shown in the question, is in the current directory.
(Note that require should always be within an if so that a failure to load is caught. If it is not within an if, use library instead, which guarantees an error message in that case.)
# create package
library(devtools)
library(pkgKitten)
kitten("colors")
file.copy("colors_env.R", "./colors/R")
build("colors") # optional = will create colors_1.0.tar.gz
install("colors")
# test
library(colors)
dummyColors(5)
## Loading required package: RColorBrewer
## [1] "#EFF3FF" "#BDD7E7" "#6BAED6" "#3182BD" "#08519C"
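The note about guarding the load can be sketched as follows. need_package() and median_via_namespace() are hypothetical helper names, and stats stands in for RColorBrewer so that the example runs without extra installs. It uses requireNamespace(), which checks availability without attaching the package, together with a namespace-qualified call.

```r
# Hypothetical helper: stop with a clear message if a package is missing
need_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required but not installed.", call. = FALSE)
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this check passes silently
need_package("stats")

# The namespace-qualified call works whether or not the package is attached
median_via_namespace <- function(x) {
  stats::median(x)
}

print(median_via_namespace(c(1, 3, 2)))
```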

Extended R package can't correctly communicate with its 'parent' R package

I am trying to build a package that extends another package. However at its most basic level I am doing something wrong. I build a simple example that presents the same issue:
I have two packages, packageA and packageB. packageA has a single R file in the R folder that reads:
local.env.A <- new.env()
setVal <- function()
{
local.env.A$test <- 1
}
getVal <- function()
{
if(!exists("test", envir = local.env.A)) stop("test does not exist")
return(local.env.A$test)
}
For packageB I have the following single R file in the R folder:
# refers to package A
setVal()
getValinA <- function()
{
return(getVal())
}
I want both packageA and packageB to be available for end users, therefore I set packageB to depend on packageA (in the description file). When packageB is loaded, e.g. by means of library(packageB) I expect it to run setVal() and thus set the test value. However, if I next try to get the value that was set by means of getValinA(), it throws me the stop:
> library(packageB)
Loading required package: PackageA
> getValinA()
Error in getVal() : test does not exist
I am pretty sure it is related to environments, but I am not sure how. Please help!
With thanks to @Roland. The answer was very simple. I was under the impression (assumptions, assumptions, assumptions!) that library(packageB) would run all the top-level code in the package, in my case the call to setVal(). This is not the case: top-level code in a package's R files is evaluated when the package is built and installed, not each time it is loaded. If you want a function to run at load time, you need to call it from .onLoad:
.onLoad <- function(libname, pkgname)
{
setVal()
}
By convention you place this .onLoad function in an R file called zzz.R. The reason is that, unless you specify a collation order for your R scripts, they are loaded alphabetically, and it makes sense to run your load-time actions only after all the functions in your package have been defined.
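A minimal sketch of the pattern, runnable as a plain script. It puts the packageA state and the .onLoad hook in one file for illustration, and the manual .onLoad() call at the end only simulates what R does when it loads a package namespace (that simulation is an assumption of this sketch, not something you would write in a package).

```r
# State kept in a private environment, as in packageA
local.env.A <- new.env()

setVal <- function() {
  local.env.A$test <- 1
}

getVal <- function() {
  if (!exists("test", envir = local.env.A)) stop("test does not exist")
  local.env.A$test
}

# In a package, R calls this hook each time the namespace is loaded
.onLoad <- function(libname, pkgname) {
  setVal()
}

# Outside a package nothing calls the hook, so simulate the load step here
.onLoad(NULL, "packageB")
print(getVal())
```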

Why does using "<<-" in a function in the global workspace work, but not in a package?

I'm creating a package using devtools and roxygen2 (in RStudio), however after I've built the package my function no longer works as intended. Yet, if I load the function's .R file and run the function from there in RStudio, it works perfectly. I've created another package using this method before and it worked fine (13 functions all working as intended from my other package), yet I can't seem to get this new one to work.
To start creating the package I start with:
library("devtools")
devtools::install_github("klutometis/roxygen")
library(roxygen2)
setwd("my parent directory")
create("triale")
All is working fine so far. So I put my .R file containing my function in the R folder under the triale folder. The .R file looks like this:
#' Trial Z Function
#'
#' This function counts the values in the columns
#' @param x is the number
#' @keywords x
#' @export
#' @examples
#' trialz()
trialz = function(x) {w_id= c(25,x,25,25,25,1,1,1,1,1);
wcenter= c(rep("BYSTAR-1",10));
df1 <<- data.frame(w_id, wcenter);
countit <<- data.table(df1);
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
Again if I were to just run the code from the .R file, and test the function it works fine. But to continue, next I enter:
setwd("./triale")
document()
The triale documentation is updated, triale is loaded, and the NAMESPACE and trialz.Rd are both written so that trialz.Rd is under the man folder, and NAMESPACE is under the triale folder as intended. Next I install triale:
setwd("..")
install("triale")
Which I know works because I get the following:
Installing triale
"C:/PROGRA~1/R/R-31~1.3/bin/x64/R" --vanilla CMD INSTALL \
"C:/Users/grice/Documents/R/triale" \
--library="C:/Users/grice/Documents/R/win-library/3.1" --install-tests
* installing *source* package 'triale' ...
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (triale)
Reloading installed triale
Package is now built, so I do the following:
library("triale")
library("data.table")
Note that whenever I load the package data.table I get the following startup message:
data.table 1.9.4 For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.
However, it doesn't seem to affect my function. So now it's time to test my function from my package:
trialz(25)
This goes through, and I of course get a populated df1, and countit, but for whatever reason view is always empty (as in 0 obs. of 0 variables).
So I test my work using the dummy code below:
>trialy = function(x) {wid= c(25,x,25,25,25,1,1,1,1,1);
wc= c(rep("BYSTAR-1",10));
df2 <<- data.frame(wid, wc);
countitt <<- data.table(df2);
viewer <<- countitt[, .N, by = list(wid, wc)];
View(viewer)}
>trialy(25)
Even though this is the same exact code with just the names changed around it works. Dumbfounded I open trialz.R and copy the function from there and run it as below, and that works:
> trialz = function(x) {w_id= c(25,x,25,25,25,1,1,1,1,1);
wcenter= c(rep("BYSTAR-1",10));
df1 <<- data.frame(w_id, wcenter);
countit <<- data.table(df1);
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
> trialz(25)
Since I've created a package before, I know my method is solid (that package had 13 different functions, all of which worked). I just don't understand how a function can work fine as written, yet stop working once I package it.
Again here is where it stops working as intended when using my package:
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
And my end result should look something like this, if my package worked:
  wid       wc N
1  25 BYSTAR-1 5
2   1 BYSTAR-1 5
Can anyone explain why view is never populated after I package my function? I've tested it as much as I know how, and my results should be reproducible for anyone who's willing to try it for themselves.
Thanks, I appreciate any feedback.
Your problem here is that "<<-" does not simply create variables in the global environment. It searches the enclosing (parent) environments of the function for an existing binding and assigns there, falling back to the global environment only if it finds none. (See help("<<-").)
The parent environment of a function is the environment in which it has been defined. In the case where you defined your function directly in your workspace, this parent environment actually is the same as your workspace environment (namely: .GlobalEnv), which is why your variables are assigned values as you expect them to. In the case where your function is packaged, however, the parent environment is the package namespace and not the .GlobalEnv! The search then passes through the namespace and its imports first, so the function can behave differently from the copy defined in your workspace.
Refer to the chapter on environments in Hadley's book and How R Searches and Finds Stuff for more details on environments in R.
Note that doing this would not be considered a proper debugging technique, to say the least. In general, you never want to use the "<<-" operator.
For options on debugging R code, see, e.g., this question. I, in particular, like the debugonce function very well. See ?debugonce.
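To make the "<<-" lookup rule concrete, here is a small sketch with hypothetical names (make_counter, set_orphan, orphan_value). The first part shows "<<-" updating an existing binding in the enclosing environment; the second shows it falling back to .GlobalEnv when no binding exists anywhere along the chain.

```r
# Case 1: a binding exists in the enclosing environment, so <<- updates it
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modifies 'count' in make_counter's frame
    count
  }
}

counter <- make_counter()
counter()
second <- counter()
print(second)

# Case 2: no binding exists anywhere, so <<- creates one in .GlobalEnv
set_orphan <- function() {
  orphan_value <<- "created in .GlobalEnv"
}
set_orphan()
print(exists("orphan_value", envir = globalenv(), inherits = FALSE))
```

Note that count never appears in the workspace in the first case, which is exactly the behaviour that surprises people who treat "<<-" as "assign globally".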
I forgot one important part when editing my DESCRIPTION file: I forgot to add
Imports: data.table
Also the NAMESPACE file needed to include the data.table package as an import as well, like so:
import(data.table)
export(Z)
export(AS) .... etc.
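Doing this ensures that whenever a function within your package uses a function from another package, that (second) package is loaded and its functions are visible to your code before it runs. For data.table in particular this matters because, as far as I understand it, its [ method falls back to plain data.frame behaviour when it is called from a package that does not import data.table, which would explain why view came back empty.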
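With roxygen2 the import can be declared next to the function rather than edited into NAMESPACE by hand. The following is only a sketch of how trialz might look under that approach: it assumes data.table is installed, and it returns the counts instead of pushing them out with "<<-", which is my own substitution rather than the original design.

```r
#' Count rows per (w_id, wcenter) pair
#'
#' @param x value inserted as the second w_id entry
#' @return a data.table with columns w_id, wcenter and N
#' @import data.table
#' @export
trialz <- function(x) {
  w_id <- c(25, x, 25, 25, 25, 1, 1, 1, 1, 1)
  wcenter <- rep("BYSTAR-1", 10)
  countit <- data.table::data.table(w_id, wcenter)
  # Return the counts to the caller instead of assigning them with <<-
  countit[, .N, by = list(w_id, wcenter)]
}
```

After document() regenerates NAMESPACE, calling counts <- trialz(25) followed by View(counts) gives the same view without leaving extra objects in the workspace.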

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R started a new session what would be the best way to do this? Currently I can only think of storing a file in the users home directory but I'm not sure if I like that approach.
This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores that I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/Data")
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
scores <- 10
save(scores, file=paste0(loc, "/scores.rda"))
Then when I unloaded the library and reloaded it again, the data set now says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow but am not sure on how to do this without messing with .Last function.
EDIT:
It appears this option is not viable: when you compile a package and use lazy loading, the data sets are saved as
Rdata.rdb and Rdata.rdx, not as .rda files. That means the approach I use above is kinda worthless, since we want the data to be recognized automatically.
EDIT2
This approach works and I tried it on a package I made. You can't lazy-load the data, and you have to call data(scores) either explicitly or inside the function you're calling. I also assigned scores to .scores in the global environment the first time it was created and used exists() inside the function to check for it. If .scores existed, I assigned it to scores within the function. Once you unload the library and load it again you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function
I guess there is no way to store settings without saving them to disk or a database, some way or another. It can be done silently though by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in other ways than using options you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function(){
my.options <- options()
save(my.options, file="~/.Roptions.Rdata")
}
.First <- function(){
tryCatch({
load("~/.Roptions.Rdata")
do.call(options, my.options)
rm(my.options)
}, error=function(...){})
}
To my surprise, try(..., silent=TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.
The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()
scores<-data.frame(
user=c("one","two","three"),
score=c(500, 200, 1100)
)
save(scores,file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()
#new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS - you can see a working example in the rnoaa package on GitHub at "ropensci/rnoaa". Check their R/onload.r file! I can expand if needed.
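As a base-R alternative to the hoardr example above, tools::R_user_dir() (available from R 4.0.0) returns a per-user, per-package directory. The sketch below is only one way to use it: scores_path, save_scores and load_scores are hypothetical helper names, "yourpackage" is a placeholder, and the demo writes to a temporary directory so nothing lands in the real user directory.

```r
# Hypothetical helpers; "yourpackage" is a placeholder package name.
# tools::R_user_dir() needs R >= 4.0.0.
scores_path <- function(dir = tools::R_user_dir("yourpackage", which = "data")) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  file.path(dir, "scores.rds")
}

save_scores <- function(scores, dir = tools::R_user_dir("yourpackage", which = "data")) {
  saveRDS(scores, scores_path(dir))
}

load_scores <- function(dir = tools::R_user_dir("yourpackage", which = "data")) {
  path <- scores_path(dir)
  if (file.exists(path)) readRDS(path) else NULL
}

# Demo in a temporary directory so the real user directory stays untouched
demo_dir <- file.path(tempdir(), "yourpackage-demo")
save_scores(
  data.frame(user = c("one", "two", "three"), score = c(500, 200, 1100)),
  dir = demo_dir
)
restored <- load_scores(dir = demo_dir)
print(restored)
```

In a real package you would drop the dir argument from the calls and let the default location apply; load_scores() returns NULL on first use, before any scores have been saved.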

Search all existing functions for package dependencies?

I have a package that I wrote while learning R and its dependency list is quite long. I'm trying to trim it down, for two cases:
I switched to other approaches, and packages listed in Suggests simply aren't used at all.
Only one function out of my whole package relies on a given dependency, and I'd like to switch to an approach where it is loaded only when needed.
Is there an automated way to track down these two cases? I can think of two crude approaches (download the list of functions in all the dependent packages and automate a text search for them through my package's code, or load the package functions without loading the required packages and execute until there's an error), but neither seems particularly elegant or foolproof....
One way to check dependencies in all functions is to use the byte compiler, because it checks whether each function called is available in the global workspace and issues a note if it does not find it.
So if you as an example use the na.locf function from the zoo package in any of your functions and then byte compile your function you will get a message like this:
Note: no visible global function definition for 'na.locf'
To correctly address it for byte compiling you would have to write it as zoo::na.locf
So a quick way to test all R functions in a library/package is something like the following (assuming you didn't write the calls to other functions with the namespace).
Assuming your R files with the functions are in C:\SomeLibrary\ or subfolders thereof, define a sourcing file as C:\SomeLibrary.r or similar containing:
# Compare full version numbers; testing major and minor separately rejects e.g. R 4.3
if (getRversion() < "2.14.0") {
stop("SomeLibrary needs version 2.14.0 or greater.")
}
if ("SomeLibrary" %in% search()) {
detach("SomeLibrary")
}
currentlyInWorkspace <- ls()
SomeLibrary <- new.env(parent=globalenv())
require("compiler",quietly=TRUE)
pathToLoad <- "C:/SomeLibraryFiles"
filesToSource <- list.files(pathToLoad, pattern = "\\.[Rr]$", recursive = TRUE, full.names = TRUE) # only files ending in .R or .r
for (filename in filesToSource) {
tryCatch({
suppressWarnings(sys.source(filename, envir=SomeLibrary))
},error=function(ex) {
cat("Failed to source: ",filename,"\n")
print(ex)
})
}
for(SomeLibraryFunction in ls(SomeLibrary)) {
if (is.function(get(SomeLibraryFunction,envir=SomeLibrary))) {
outText <- capture.output(with(SomeLibrary,assign(SomeLibraryFunction,cmpfun(get(SomeLibraryFunction)))))
if(length(outText)>0){
cat("The function ",SomeLibraryFunction," produced the following compile note(s):\n")
cat(outText,sep="\n")
cat("\n")
}
}
}
attach(SomeLibrary)
rm(list=ls()[!ls() %in% currentlyInWorkspace])
invisible(gc(verbose=FALSE,reset=TRUE))
Then start up R with no preloaded packages and source in C:\SomeLibrary.r
And then you should get notes from cmpfun for any call to a function in a package that's not part of the base packages and doesn't have a fully qualified namespace defined.
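A lighter-weight variant of the same idea, offered as a sketch rather than a replacement for the script above, is to ask codetools::findGlobals() which functions a given function calls and keep the ones base R alone cannot resolve. It assumes the codetools package is installed (it is a recommended package that normally ships with R), and fill_gaps is a hypothetical function standing in for your own code.

```r
# Hypothetical function that calls na.locf() without a namespace prefix
fill_gaps <- function(x) {
  na.locf(x)
}

# findGlobals() only inspects the code, so zoo does not need to be installed
used <- codetools::findGlobals(fill_gaps, merge = FALSE)$functions

# Keep the names that the base package alone cannot resolve
unqualified <- used[!vapply(used, exists, logical(1),
                            envir = baseenv(), mode = "function")]
print(unqualified)
```

Calls written as zoo::na.locf are not reported under the bare name, so whatever remains in unqualified points at dependencies that are used without a namespace prefix. Functions from stats or utils also show up because only the base package is consulted here.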

Resources