R Package Build/Install Error: "object not found" even though I have it in R/sysdata.rda - r

Similar Question
accessing sysdata.rda within package functions
Why This Similar Question Does Not Apply To Me
They were able to actually build it, and apparently it was a Github error for them (not related)
R VERSION
3.4.2 (I tried using 3.4.3 as well but the same problem occurred)
EDIT: I am on Windows 10
Context
I have fully read the following tutorial on R packages and how to include .Rda files in them. I have LazyData in my DESCRIPTION file set as true as well. I have tried both the data/ folder implementation and the R/sysdata.rda implementation using the function devtools::use_data() with the respective options of internal = FALSE and internal = TRUE.
However, when I try to build the package, or use devtools::install (which builds as well I assume), it fails and gives me the following error message:
Error in predict(finalModel, newInput) : object 'finalModel' not found
Where finalModel is stored within my .rda file.
Does anyone know any possible reasons why this might occur?
I also asked a coworker to install the package on his machine, but unfortunately he got the exact same error.
I made another test package as a 'sanity-check' by creating a simple linear model using the lm() function on datasets::swiss, and then made a test package with this newly created model as a .rda file. When I referenced this test model in a function within this test package, it eerily worked, despite the fact that (to the best of my knowledge) I used the exact same steps to create this new R package.
Also, I unfortunately cannot share the code for the package I am creating, but I am willing to share the code for the test package that uses the swiss dataset.
Thank you in advance.
EDIT: My .rda file I am putting in the package was created last year, if that has anything to do with it.

I just solved a similar issue of having object 'objectName' not found that arose during package management. In my case, the issue related to losing the context of variables when working with parallelization.
When using parallel::clusterExport(cl, varlist=c("function-name")), clusterExport looks at .GlobalEnv for variable definitions. This wouldn't come up during debugging, as I always the variables defined in .GlobalEnv. The solution was to state the environment explicitly: parallel::clusterExport(cl, varlist=c("function-name"), envir=environment()). This ensures that the parallel processes have context of the variables within the data/ folder and R/sysdata.rda.
Source

If you have more than one internal file, you must save them together:
usethis::use_data(file_1,
file_2,
file_3,
internal = TRUE,
overwrite = TRUE)

Related

Automatic loading of data from sysdata.rda in package

I have spent a lot of time searching for an answer to what is probably a very basic question, but I just can't find the solution to my issue. The closest that I found was this exchange from a few years ago.
In that case, the issue was the location of the sysdata.rda file in the correct directory within the package. That is not my issue.
I have some variables that store things like color palettes that I amusing inside a package. These variables are only used inside my functions so I storing them in R/sysdata.rda. However, when I load the packages, the variables are not loading into the package environment. If I load the data manually from sysdata.rda then everything works fine.
My impression from reading everything that I could find on internal data in R packages was that the data in R/sysdata.rda would load automatically.
Here is the code that I am using to store my data.
devtools::use_data(tmpBrks, tmpColors, prcpBrks, prcpChgBrks,
prcpChgBrkLabels, prcpColors, prcpChgColors,
internal = TRUE, overwrite = TRUE)
That successfully creates the data file at R/sysdata.rda and the data is in the file when I load it manually.
What do I need to do to have the data load automatically so the functions in my package can use them?
As usual, this was a bad combination of user ignorance and poor R documentation. The data was being loaded and was available to the functions. Where I went wrong was in assuming that the data would be visible in the package environment. That is not the case.
As far as I can tell, internal data in the R\sysdata.rda file is available to the functions within the package, but not visible in any way. After I created the internal data file I was looking for the data in the package environment. When I didn't see it I assumed that it wasn't loaded. When I kept pushing forward with my package development I finally realized that the data was loading silently and accessible to the functions in the package.
As evidenced by the two up votes that my question got, I am not the only one who didn't understand the behavior of the R\sysdata.rda internal data. Hopefully this explanation will save someone else a bunch of time searching for an answer to this issue that doesn't really exist.

How to call R script from another R script, both in same package?

I'm building a package that uses two main functions. One of the functions model.R requires a special type of simulation sim.R and a way to set up the results in a table table.R
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory. ~R/ has premodel.R model.R sim.R and table.R. I need the model.R script to use both sim.R and table.R functions... located in the same directory in the package (e.g. ~R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e. folder) named TEST. Within TEST I will create another folder name R, in that folder I will include all R script(s) containing the different functions in the package.
At a minimum you need to also include a DESCRIPTION and NAMESPACE file. A man (for help files) and tests (for unit tests) are also nice to include.
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
As others have pointed out you don't have to source R files in a package. The package loading mechanism will take care of losing the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions however. If you have multiple files with R code situations can arise where the order in which these files are processed matters. Often it doesn't matter or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
If all your files are in R directory, any function will be in memory after you do a package build or Load_All.
You may have issues if you have code in files that is not in a function tho.
R loads files in alphabetical order.
Usually, this is not a problem, because functions are evaluated when they are called for execution, not at loading time (id. a function can refer another function not yet defined, even in the same file).
But, if you have code outside a function in model.R, this code will be executed immediately at time of file loading, and your package build will fail usually with a
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the sparse code of model.R into a function so you can call it later, when the package has fully loaded, external library too.
If this piece of code is there for initialize some value, consider to use_data() to have R take care of load data into the environment for you.
If this piece of code is just interactive code written to test and implement the package itself, you should consider to put it elsewhere or wrap it to a function anyway.
if you really need that code to be executed at loading time or really have dependency to solve, then you must add the collate line into DESCRIPTION file, as already stated by Peter Humburg, to force R to load files order.
Roxygen2 can help you, put before your code
#' #include sim.R table.R
call roxygenize(), and collate line will be generate for you into the DESCRIPTION file.
But even doing that, external library you may depend are not yet loaded by the package, leading to failure again at build time.
In conclusion, you'd better don't leave code outside functions in a .R file if it's located inside a package.
Since you're building a package, the reason why you're having trouble accessing the other functions in your /R directory is because you need to first:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.

R: What's actually loaded when library() is called?

Here is a snippet of R script doing beta regression on data "GasolineYield":
library("betareg")
data("GasolineYield", package = "betareg")
gy_logit <- betareg(yield ~ batch + temp, data = GasolineYield)
It works fine but if I run the code with the second line deleted, it errs with message:
Error in terms.formula(form, ...) : object 'GasolineYield' not found
But isn't the data.frame GasolineYield in the package betareg ? What's actually happening when I call library("betareg")? Aren't all the data inside the package automatically loaded into current environment? Could anybody help me understand the mechanism behind this?
For the most part, data is included in R packages for the purposes of providing examples and other stuff that is not mission critical. That is why datasets are not automatically loaded into the environment for most packages and you have to load them using the data() command. This is a good thing. It would be a waste of memory, time, and namespace for packages that primarily provide functions to load their data all the time when users don't use it very frequently.
When you load a package, only the stuff that is exported in the "NAMESPACE" file by the package designer is made available. And the "DESCRIPTION" file has a field called "LazyData" that determines the data behavior as well. By the way, packages often have functions in them that are for internal use as well and are not exported in the NAMESPACE file.
TL;DR, the package writer determines what stuff will be available when the package is loaded and they specify those items in the NAMESPACE and DESCRIPTION files.

Dynamically Generate Reference Classes

I'm attempting to generate reference classes within an R package on the fly, and it's proving to be fairly difficult. Here are the approaches I've taken and problems I've run into:
I'm creating a package in which I hope to be able to dynamically read in a schema and automatically generate an associated reference class (think SOAP). Of course, this means I won't be able to define my reference classes before-hand in the package sources.
I initially attempted to create a new class using a simple:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"))
which, of course, works fine when executed interactively, but when included in the package sources, I get a locked binding error. From my reading, it looks like this occurs because when running interactively, the class information is being stored in the global environment, which is not locked, while my package's base environment is locked.
I then found a thread that suggested using something to the effect of:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"), where=globalenv())
This actually crashed R/Studio when I tried to build the package, so I don't have a log of the error it generated, unfortunately, but it certainly didn't work.
Next I tried creating a new environment within my package which I could use to store these reference classes. So I added a .classEnv <- new.env() line in my package sources (not inside of any function) and then attempted to use this class when creating a new reference class:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"), where=.classEnv)
This actually seemed to work OK, but generates the following warning:
> myClass <- setRefClass("NewClassName", where=.classEnv)
Warning message:
In getPackageName(where) :
Created a package name, ‘2013-04-23 10:19:14’, when none found
So, for some reason, methods::getPackageName() isn't able to pick up which package my new environment is in?
Is there a way to create my new environment differently so that getPackageName() can properly recognize the package? Can I add some feature which allows me to help getPackageName() detect the package? Will this even work if I can deal with the warning, or am I misusing reference classes by trying to create them dynamically?
To get the conversation going, I found that getpackageName stores the package name in a hidden .packageName variable in the specified environment.
So you can actually get around the warning with
assign(".packageName", "MyPkg", envir=.classEnv)
myClass <- setRefClass("NewClassName", fields=classFields, where=.classEnv)
which resolves the warning, but the documentation says not to trust the .packageName variable indefinitely, and I still feel like I'm hacking this in and may be misunderstanding something important about reference classes and their relationship to environments.
Full details from documentation:
Package names are normally installed during loading of the package, by the INSTALL script or by the library function. (Currently, the name is stored as the object .packageName but don't trust this for the future.)
Edit:
After reading a little further, the setPackageName method may be a more reliable way to set the package name for the environment. Per the docs:
setPackageName can be used to establish a package name in an environment that would otherwise not have one. This allows you to create classes and/or methods in an arbitrary environment, but it is usually preferable to create packages by the standard R programming tools (package.skeleton, etc.)
So it looks like one valid solution would be the following:
setPackageName("MyPkg", .classEnv)
myClass <- setRefClass("NewClassName", fields=classFields, where=.classEnv)
That eliminates the warning message and doesn't rely on anything that's documented as unstable. I'm still not clear why it's necessary, but...

The cause of "bad magic number" error when loading a workspace and how to avoid it?

I tried to load my R workspace and received this error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘WORKSPACE_Wedding_Weekend_September’ has magic number '#gets'
Use of save versions prior to 2 is deprecated
I'm not particularly interested in the technical details, but mostly in how I caused it and how I can prevent it in the future. Here's some notes on the situation:
I'm running R 2.15.1 on a MacBook Pro running Windows XP on a bootcamp partition.
There is something obviously wrong this workspace file, since it weighs in at only ~80kb while all my others are usually >10,000
Over the weekend I was running an external modeling program in R and storing its output to different objects. I ran several iterations of the model over the course of several days, eg output_Saturday <- call_model()
There is nothing special to the model output, its just a list with slots for betas, VC-matrices, model specification, etc.
I got that error when I accidentally used load() instead of source() or readRDS().
Also worth noting the following from a document by the R Core Team summarizing changes in versions of R after v3.5.0 (here):
R has new serialization format (version 3) which supports custom serialization of
ALTREP framework objects... Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
I encountered this issue when I saved a workspace in v3.6.0, and then shared the file with a colleague that was using v3.4.2. I was able to resolve the issue by adding "version=2" to my save function.
Assuming your file is named "myfile.ext"
If the file you're trying to load is not an R-script, for which you would use
source("myfile.ext")
you might try the readRDSfunction and assign it to a variable-name:
my.data <- readRDS("myfile.ext")
The magic number comes from UNIX-type systems where the first few bytes of a file held a marker indicating the file type.
This error indicates you are trying to load a non-valid file type into R. For some reason, R no longer recognizes this file as an R workspace file.
Install the readr package, then use library(readr).
It also occurs when you try to load() an rds object instead of using
object <- readRDS("object.rds")
I got the error when saved with saveRDS() rather than save(). E.g. save(iris, file="data/iris.RData")
This fixed the issue for me. I found this info here
Also note that with save() / load() the object is loaded in with the same name it is initially saved with (i.e you can't rename it until it's already loaded into the R environment under the name it had when you initially saved it).
I had this problem when I saved the Rdata file in an older version of R and then I tried to open in a new one. I solved by updating my R version to the newest.
If you are working with devtools try to save the files with:
devtools::use_data(x, internal = TRUE)
Then, delete all files saved previously.
From doc:
internal If FALSE, saves each object in individual .rda files in the data directory. These are available whenever the package is loaded. If
TRUE, stores all objects in a single R/sysdata.rda file. These objects
are only available within the package.
This error occured when I updated my R and R Studio versions and loaded files I created under my prior version. So I reinstalled my prior R version and everything worked as it should.

Resources