In R, can I save loaded packages with the workspace?

R normally only saves objects in .GlobalEnv:
$ R
> library(rjson)
> fromJSON
function (...) ...
> q(save='yes')
$ R
> fromJSON
Error: object 'fromJSON' not found
Is there a way to have this information saved as well?

You are now able to save R session information to a file and load it in another session.
First install the "session" package:
install.packages('session')
Load your libraries and your data, then save the session state to a file:
library(session)
library(ggplot2) # plotting
test <- 100
save.session(file='test.Rda')
Then you can load the session state in another session:
library(session)
restore.session(file='test.Rda')
# ggplot2 (and associated data) should have loaded with the session data
head(diamonds)
ggplot(data = diamonds, aes(x = carat)) +
  geom_histogram()
print(test)
Reference: https://www.rdocumentation.org/packages/session/versions/1.0.3/topics/save.session

To the best of my knowledge, no. The workspace is for objects like data and functions. Starting R with particular packages loaded is what your .Rprofile file is for, and you can have a different one in each directory.
You could, I suppose, save a function in the workspace that loads the packages you want, and then run that function when you first start R.
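For illustration, a minimal sketch of both ideas (the package name is just an example):
# In .Rprofile: attach packages automatically at startup
.First <- function() {
  library(rjson) # substitute the packages you actually want
}
# Or the workspace approach: save this function with q(save='yes')
# and call loadMyPackages() by hand after restarting
loadMyPackages <- function() {
  library(rjson)
}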

joran is right, but I want to mention a technique that, while cumbersome, might be helpful.
You can use a checkpointing program such as DMTCP to save the entire R process and restart it later.
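Assuming DMTCP is installed, the workflow looks roughly like this (check the DMTCP documentation for the exact flags in your version):
$ dmtcp_launch R               # start R under checkpoint control
$ dmtcp_command --checkpoint   # from a second shell: write a checkpoint image
$ dmtcp_restart ckpt_R_*.dmtcp # later: resume the saved R process, packages and all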

I'd recommend not saving anything between R sessions and instead recreating it all using code. This is much more likely to lead to reproducible results.
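As a hedged sketch of that style: keep one setup script per project and source it at the top of every analysis (the file names here are hypothetical):
# setup.R -- rebuilds the session from scratch
library(rjson)
dat <- fromJSON(file = 'data/input.json') # assumed input file
# then, at the top of each analysis script: source('setup.R')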

Related

Loading libraries for use in a specific environment in R

I have written some functions to facilitate repeated tasks among my R projects. I am trying to use an environment to load them easily, but also to prevent them from appearing when I use ls() or from being deleted by rm(list=ls()).
As a dummy example, I have an environment loader function in a file that I can source from my current project, plus an additional file for each specialized environment I want to have.
currentProject.R
environments/env_loader.R
environments/colors_env.R
env_loader.R
.environmentLoader <- function(env_file, env_name='my_env') {
  sys.source(env_file, envir=attach(NULL, name=env_name))
}
path <- dirname(sys.frame(1)$ofile) # this script's path
#
# Automatically load
.environmentLoader(paste(path, 'colors_env.R', sep='/'), env_name='my_colors')
colors_env.R
library(RColorBrewer) # this doesn't work
# Return a list of colors
dummyColors <- function(n) {
  require(RColorBrewer) # This doesn't work
  return(brewer.pal(n, 'Blues'))
}
currentProject.R
source('./environments/env_loader.R')
# Get a list of 5 colors
dummyColors(5)
This works great except when my functions require me to load a library. In my example, I need to load the RColorBrewer library to use the brewer.pal function in colors_env.R, but the way it is now, I just get the error Error in brewer.pal(n, "Blues") : could not find function "brewer.pal".
I tried just using library(RColorBrewer), using require inside my dummyColors function, and adding stuff like evalq(library("RColorBrewer"), envir=parent.env(environment())) to the colors_env.R file, but none of it works. Any suggestions?
If you are using similar functions across projects, I would recommend creating an R package. It's essentially what you're doing in many ways, but you don't have to reinvent a lot of the loading mechanisms, etc. Hadley Wickham's book R Packages is very good for this topic. It doesn't need to be a fully built out, CRAN-ready sort of thing. You can just create a personal package with misc. functions you frequently use.
That being said, the solution for your specific question would be to explicitly use the namespace to call the function.
dummyColors <- function(n) {
  # Calling through the namespace removes the need for require() here
  return(RColorBrewer::brewer.pal(n, 'Blues'))
}
Create a package and then run it. Use kitten to build the boilerplate, copy your file into it, optionally build it if you want a .tar.gz file (omit that step if you don't), and finally install it. Then test it out. We assume colors_env.R, shown in the question, is in the current directory.
(Note that require should always be used within an if so that a failure to load is caught. If not within an if, use library, which guarantees an error message in that case.)
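A minimal sketch of that require-within-if pattern:
if (require(RColorBrewer)) {
  brewer.pal(5, 'Blues') # safe: the package is attached
} else {
  stop('RColorBrewer could not be loaded')
}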
# create package
library(devtools)
library(pkgKitten)
kitten("colors")
file.copy("colors_env.R", "./colors/R")
build("colors") # optional = will create colors_1.0.tar.gz
install("colors")
# test
library(colors)
dummyColors(5)
## Loading required package: RColorBrewer
## [1] "#EFF3FF" "#BDD7E7" "#6BAED6" "#3182BD" "#08519C"

How do we set constant variables while building R packages?

We are building a package in R for our service (a robo-advisor here in Brazil) and we send requests all the time to our external API inside our functions.
As this is the first time we've built a package, we have some questions. :(
When we use our package to run scripts, we will need some information such as api_path, login, and password.
How do we place this information inside our package?
Here is a real example:
get_asset_daily <- function(asset_id) {
  api_path <- "https://api.verios.com.br"
  url <- paste0(api_path, "/assets/", asset_id, "/dailies?asc=d")
  data <- fromJSON(url)
  data
}
Sometimes we use a staging version of the API and we have to constantly switch paths. How should we call it inside our function?
Should we set a global environment variable, a package environment variable, just define api_path in our scripts or a package config file?
How do we do that?
Thanks for your help in advance.
Ana
One approach would be to use R's options interface. Create a file zzz.R in the R directory (this is the customary name for this file) with the following:
.onLoad <- function(libname, pkgname) {
  options(api_path='...', username='name', password='pwd')
}
This will set these options when the package is loaded into memory.
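Your functions can then read the options back with getOption(). For example, a sketch of the question's function rewritten this way (option names as set above):
get_asset_daily <- function(asset_id) {
  api_path <- getOption('api_path')
  url <- paste0(api_path, '/assets/', asset_id, '/dailies?asc=d')
  fromJSON(url)
}
Switching to the staging API then only requires resetting the option for the current session, e.g. options(api_path = '<staging URL>').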

Package that downloads data from the internet during installation

Is anyone aware of a package that downloads a dataset from the internet during the installation process and then prepares and saves it so that it is available when loading the package using library(packageName)? Are there any drawbacks in this approach (besides the obvious one that package installation will fail if the data source is unavailable or the data format has changed)?
EDIT: Some background. The data is three tab-separated files in a ZIP archive, owned by a federal statistics office and generally freely accessible. I have R code which downloads, extracts, and prepares the data; in the end, three data frames are created which could be saved in .RData format.
I am thinking about creating two packages: A "data" package that provides the data, and a "code" package that operates on it.
I did this mockup before you posted your edit. I presume it would work, but I haven't tested it. I've commented it so you can see what you would need to change. The idea is to check whether an expected object is available in the current working environment. If it is not, check whether the .RData file containing the object is in the current working directory. If that is not found, prompt the user to download the file, then proceed from there.
myFunction <- function(this, that, dataset = NULL) {
  # We're giving the user a chance to specify the dataset.
  # Maybe they have already downloaded it and saved it.
  if (is.null(dataset)) {
    # Check to see if the object is already in the workspace.
    # If it is not, check to see whether the .RData file that
    # contains the object is in the current working directory.
    if (!exists("OBJECTNAME", where = 1)) {
      if (isTRUE(list.files(
        pattern = "^DATAFILE.RData$") == "DATAFILE.RData")) {
        load("DATAFILE.RData")
      # If neither of those are successful, prompt the user
      # to download the dataset.
      } else {
        ans = readline(
          "DATAFILE.RData dataset not found in working directory.
OBJECTNAME object not found in workspace. \n
Download and load the dataset now? (y/n) ")
        if (ans != "y")
          return(invisible())
        # I usually use RCurl in case the URL is https
        require(RCurl)
        baseURL = c("http://some/base/url/")
        # Here, we actually download the data
        temp = getBinaryURL(paste0(baseURL, "DATAFILE.RData"))
        # Here we load the data
        load(rawConnection(temp), envir = .GlobalEnv)
        message("OBJECTNAME data downloaded from \n",
                paste0(baseURL, "DATAFILE.RData \n"),
                "and added to your workspace\n\n")
        rm(temp, baseURL)
      }
    }
    dataset <- OBJECTNAME
  }
  TEMP <- dataset
  ## Other fun stuff with TEMP, this, and that.
}
Two packages, hosted at GitHub
Here's another approach, building on the comments between @juba and me. The basic concept is to have, as you describe, one package for the code and one for the data. This function would be part of the package that contains your code. It will:
Check to see if the data package is installed
Check to see if the version of the data package you have installed matches the version at GitHub, which we are going to assume is the most up-to-date version.
When it fails any of the checks, it asks the user whether they want to update their installation of the package. In this case, for demonstration, I've linked to one of my packages in progress at GitHub. This should give you an idea of what you need to substitute to get it to work with your own package once you've hosted it there.
CheckVersionFirst <- function() {
  # Check to see if installed
  if (!"StataDCTutils" %in% installed.packages()[, 1]) {
    Checks <- "Failed"
  } else {
    # Compare version numbers
    require(RCurl)
    temp <- getURL("https://raw.github.com/mrdwab/StataDCTutils/master/DESCRIPTION")
    CurrentVersion <- gsub("^\\s|\\s$", "",
                           gsub(".*Version:(.*)\\nDate.*", "\\1", temp))
    if (packageVersion("StataDCTutils") == CurrentVersion) {
      Checks <- "Passed"
    }
    if (packageVersion("StataDCTutils") < CurrentVersion) {
      Checks <- "Failed"
    }
  }
  switch(
    Checks,
    Passed = { message("Everything looks OK! Proceeding!") },
    Failed = {
      ans = readline(
        "StataDCTutils is either outdated or not installed. Update now? (y/n) ")
      if (ans != "y")
        return(invisible())
      require(devtools)
      install_github("mrdwab/StataDCTutils")
    })
  # Some cool things you want to do after you are sure the data is there
}
Try it out with CheckVersionFirst().
Note: This would succeed only if you religiously remember to update the version number in your DESCRIPTION file every time you push a new version of the data to GitHub!
So, to clarify/recap/expand, the basic idea would be to:
Periodically push the updated version of your data package to Github, being sure to change the version number of the data package in its DESCRIPTION file when you do so.
Integrate this CheckVersionFirst() function as an .onLoad event in your code package, as sketched just after this list. (Obviously modify the function to match your account and package name.)
Change the commented line that reads # Some cool things you want to do after you are sure the data is there to reflect the cool things you actually want to do, which would probably start with library(YOURDATAPACKAGE) to load the data....
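A minimal sketch of that .onLoad wiring, assuming CheckVersionFirst() is defined in your code package:
.onLoad <- function(libname, pkgname) {
  CheckVersionFirst() # verify the data package before anything else runs
}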
This method may not be efficient, but it is a good workaround. If you are making a package that needs regularly updated data, first make a package which holds that data. It does not need any functions, but I like the concept of a setter (which you might not need in this case) and a getter.
Then when you make your package, have the 'data'-package as a dependency. This way, whenever someone installs your package, he/she will always have the latest data.
On your part, you'll just have to swap out the data in your 'data' package, and upload it to the repo you want.
If you don't know how to build a package, check ?package.skeleton, R CMD check, and R CMD build.
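As an illustration, a hypothetical DESCRIPTION excerpt for the code package, declaring the data package as a dependency (both package names are made up):
Package: mytools
Version: 0.1.0
Depends: mytoolsdata (>= 1.0.0)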

Employ environments to handle package-data in package-functions

I recently wrote an R extension. The functions use data contained in the package and must therefore load it. Subroutines also need to access the data.
This is the approach taken:
main <- function(...){
  data(data)
  sub <- function(..., data=data){...}
  ...
}
I'm unhappy with the fact that the data resides in .GlobalEnv, so it still hangs around after the function has terminated (which also undermines the concept of passing it down via arguments).
Please put me on the right track! How do you employ environments when you have to handle package data in package functions?
It looks like you are looking for the LazyData directive in your DESCRIPTION file:
LazyData: yes
Otherwise, data has an envir argument you can use to control the environment into which your data is loaded; for example, if you wanted the data to be loaded inside main, you could use:
main <- function(...){
  data(data, envir = environment())
  sub <- function(..., data=data){...}
  ...
}
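Another way to employ environments here, sketched under assumptions (the names .pkg_cache and get_data are hypothetical, and 'yourpackage' stands in for your package name): cache the dataset in a package-local environment so it never lands in .GlobalEnv.
.pkg_cache <- new.env(parent = emptyenv())
get_data <- function() {
  if (!exists('data', envir = .pkg_cache)) {
    data('data', package = 'yourpackage', envir = .pkg_cache)
  }
  get('data', envir = .pkg_cache)
}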
If the data is needed for your functions, not for the user of the package, it should be saved in a file called sysdata.rda located in the R directory.
From Writing R Extensions:
Two exceptions are allowed: if the R subdirectory contains a file
sysdata.rda (a saved image of R objects: please use suitable
compression as suggested by tools::resaveRdaFiles) this will be
lazy-loaded into the namespace/package environment – this is intended
for system datasets that are not intended to be user-accessible via
data.
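For example, a sketch of how such a file is typically created from the package source directory (the object name is made up):
internal_lookup <- data.frame(key = c('a', 'b'), value = c(1, 2))
save(internal_lookup, file = 'R/sysdata.rda', compress = 'xz')
# or equivalently: usethis::use_data(internal_lookup, internal = TRUE)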

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example, say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R starts a new session, what would be the best way to do this? Currently I can only think of storing a file in the user's home directory, but I'm not sure I like that approach.
This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores, which I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/data")
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
scores <- 10
save(scores, file=paste0(loc, "/scores.rda"))
Then when I unloaded the library and reloaded it again, the data set says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow, but I am not sure how to do this without messing with the .Last function.
EDIT:
It appears this option is not viable: when you compile as a package and use lazy loading, the data sets are saved as Rdata.rdb and Rdata.rdx files, not as .rda files. That means the approach I use above is rather worthless, in that we want the data to be recognized automatically.
EDIT2
This approach works, and I tried it on a package I made. You can't use lazy loading of the data, and you have to explicitly use data(scores), either directly or inside the function you're calling. I also assigned scores to .scores in the global environment the first time it was created and used exists inside the function to see whether it exists. If .scores existed, I assigned it to scores within the function. Once you unload the library and load it again, you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function
I guess there is no way to store settings without saving them to disk or a database, one way or another. It can be done silently, though, by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in ways other than using options, you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function(){
  my.options <- options()
  save(my.options, file="~/.Roptions.Rdata")
}
.First <- function(){
  tryCatch({
    load("~/.Roptions.Rdata")
    do.call(options, my.options)
    rm(my.options)
  }, error=function(...){})
}
To my surprise, try(..., silent=TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.
The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()
scores <- data.frame(
  user = c("one", "two", "three"),
  score = c(500, 200, 1100)
)
save(scores, file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()
#new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS - you can see a working example in the rnoaa package, found on GitHub at "ropensci/rnoaa". Check their R/onload.r file! I can expand if needed.
