I was wondering whether it is possible to define an R package within another R package, so that the inner package can only be accessed through the outer one. Say I am building an R package that corresponds to a db layer, and within that layer I have a lot of SQL statements. Is it possible to define a package db whose functions would be accessed as db::db1::function1(), where db1 is a package defined within db? Or would the approach be to use environments to bind the functionality together, since I want some logical coherence between the two? In that case I would instead have db::db1$function1(), where db1 is an environment defined in db.
Edited example:
main folder use case:
./folder1
./folder1a
./folder1b
./folder1c
...
./folder2
./folder3
...
The way I envisioned it, each folder in the hierarchy, which in turn contains dozens of scripts, would be a package. How would I do the mapping to packages? The alternative would be that folder1, folder2, etc. are the packages, and the functionality in folder1a, folder1b, etc. would be tied to an environment that lives inside the package corresponding to folder1. That would allow accessing a function defined in a file inside folder1a as folder1::folder1a$function_defined_in_script_in_1a.
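For concreteness, a minimal sketch of the environment variant inside an outer package db (all names here are hypothetical placeholders, not an existing API):

# R/db1.R inside the outer package `db`
db1 <- new.env(parent = emptyenv())
db1$function1 <- function(id) {
  sprintf("SELECT * FROM table1 WHERE id = %d", id)  # hypothetical SQL helper
}
# with export(db1) in NAMESPACE, users can call db::db1$function1(42)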
Related
In my work I develop R packages that export R data objects (.RData). The name of these .RData files is always the same (e.g. files.RData). These packages also define and export a function that uploads the data to my database, say upload_data(). Inside upload_data() I first load the data using data(files, package = "PACKAGE NAME") and then push it into my database.
Let's say I have two packages, package1 and package2, which live on my file system. Given a vector of the package names (c("package1", "package2")), how would I go about calling upload_data() programmatically? Specifically, inside a script, how would I construct and evaluate a call using "::" notation, like package1::upload_data()? I tried call() but couldn't get it right.
You could go the route of constructing the call using :: notation and evaluating that - but it's probably just easier to directly use get and specify the package you want to grab from.
get("upload_data", envir = asNamespace("package1"))
will return the function, the same as using package1::upload_data would, but is much easier to deal with programmatically.
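For example, a sketch looping over the vector of package names (assuming each package is installed and exports upload_data()):

pkgs <- c("package1", "package2")
for (pkg in pkgs) {
  f <- get("upload_data", envir = asNamespace(pkg))  # fetch from the namespace
  f()
}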
I'm building a package that contains a number of look-up tables, hidden from the user by storing them in R/sysdata.rda. This works fine, and I am able to reference them from internal package functions directly or via get.
Is there a way to get a vector of object names contained in sysdata.rda from within a function inside the package? What about as a user?
The behavior I am looking for would be similar to how ls lists the objects in an environment.
The way I use is to have an internal function that generates sysdata.rda. It can then also compute the vector of names within the limited scope of the function, and add that list of names to sysdata.rda itself.
Or, if it is more complicated, have the function save the tables into a new environment: you can then ls() the new environment to get the list, and save its contents into sysdata.rda.
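A hedged sketch of that second variant, with hypothetical table names:

# internal helper run once at development time, not part of package behavior
make_sysdata <- function() {
  e <- new.env()
  e$lookup_a <- data.frame(key = 1:3, value = letters[1:3])
  e$lookup_b <- data.frame(key = 4:6, value = letters[4:6])
  e$sysdata_names <- ls(e)                 # the vector of object names
  save(list = ls(e), envir = e, file = "R/sysdata.rda")
}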
I'm building a package that uses two main functions. One of the functions (in model.R) requires a special type of simulation (sim.R) and a way to set up the results in a table (table.R).
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R"), but those calls don't work from within the package. Any ideas?
Should I just copy and paste the code from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory; the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory: R/ has premodel.R, model.R, sim.R, and table.R. I need the model.R script to use functions from both sim.R and table.R, located in the same directory in the package (i.e. R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e. folder) named TEST. Within TEST I will create another folder named R; in that folder I will include all R scripts containing the different functions in the package.
At a minimum you also need to include a DESCRIPTION and a NAMESPACE file. A man directory (for help files) and a tests directory (for unit tests) are also nice to include.
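A minimal layout along those lines (file names are illustrative):

TEST/
  DESCRIPTION
  NAMESPACE
  R/
    model.R
    sim.R
    table.R
  man/
  tests/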
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
As others have pointed out, you don't have to source R files in a package. The package loading mechanism will take care of loading the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions, however. If you have multiple files with R code, situations can arise where the order in which these files are processed matters. Often it doesn't matter, or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly, you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
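For example, a DESCRIPTION field along these lines (file names borrowed from the question above) would force sim.R and table.R to be processed before model.R:

Collate:
    'sim.R'
    'table.R'
    'model.R'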
If all your files are in the R directory, every function will be in memory after you do a package build or devtools::load_all().
You may have issues, though, if you have code in those files that is not inside a function.
By default, R loads the files in alphabetical order.
Usually this is not a problem, because functions are evaluated when they are called for execution, not at loading time (i.e. a function can refer to another function that is not yet defined, even in the same file).
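For instance, this toy example is fine even though g is defined after f:

f <- function(x) g(x) + 1  # g is looked up only when f() is called
g <- function(x) x * 2
f(3)  # returns 7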
But if you have code outside a function in model.R, this code will be executed immediately when the file is loaded, and your package build will usually fail with
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the stray code of model.R into a function so you can call it later, once the package, and any external libraries it depends on, have fully loaded.
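A hedged sketch of that fix, with hypothetical names:

# Before: top-level code in model.R runs at load/build time and fails
# results <- run_model(default_params)

# After: wrapped so it runs only when called, after the package is loaded
run_model_defaults <- function() {
  run_model(default_params)  # run_model() and default_params are hypothetical
}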
If this piece of code is there to initialize some value, consider using use_data() to have R take care of loading the data into the environment for you.
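use_data() originally shipped with devtools and nowadays lives in usethis; a one-time development call might look like this, with a hypothetical lookup object:

lookup <- data.frame(key = 1:3, value = letters[1:3])
usethis::use_data(lookup, overwrite = TRUE)  # writes data/lookup.rda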
If this piece of code is just interactive code written to test and develop the package itself, you should consider putting it elsewhere, or wrap it in a function anyway.
If you really need that code to be executed at loading time, or really have a dependency to resolve, then you must add the Collate field to the DESCRIPTION file, as already stated by Peter Humburg, to force the order in which R loads the files.
Roxygen2 can help you: put this before your code
#' @include sim.R table.R
call roxygenize(), and the Collate field will be generated for you in the DESCRIPTION file.
But even after doing that, external libraries you depend on may not yet be loaded by the package, leading to failure again at build time.
In conclusion, you'd better not leave code outside of functions in a .R file located inside a package.
Since you're building a package, the reason you're having trouble accessing the other functions in your R/ directory is that you first need to run:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.
I am making an R package, and there is a need to keep track of files that were opened using the functions in the package.
What is the recommended procedure for creating R objects (in this case, a data.frame) upon loading the package in a way that is (sufficiently) hidden from the user? I do not want the user to manually edit the data.frame.
One idea I had was to create a data.frame in the options settings inside of an .onLoad call (similar to what Hadley does in his devtools package here), but the list of opened files is not really a configurable "option" in my package. Is there another way?
When you create an R package, unless you're exporting all objects, you have to list which objects are exported in the NAMESPACE file. If you need to maintain a data frame within your package but you don't want it made available to the user, you can choose not to export it by excluding it from the list.
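One pattern along those lines (a hedged sketch; the names are made up, not from the answer) is an unexported package-local environment holding the data frame, which internal functions mutate:

# R/state.R -- .pkg_state is never exported, so it stays hidden from users
.pkg_state <- new.env(parent = emptyenv())
.pkg_state$opened_files <- data.frame(path = character(),
                                      time = as.POSIXct(character()))

# internal helper called by the package's file-opening functions
record_open <- function(path) {
  .pkg_state$opened_files <- rbind(
    .pkg_state$opened_files,
    data.frame(path = path, time = Sys.time())
  )
}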
I am building a package for internal use using devtools. I would like to have the package load in data from a file/connection that differs depending on the date the package is built. The data is large-ish, so having a one-time cost of parsing and loading the data during package building is preferable.
Currently, I have a data.R file under R/ that assigns the data to package-level variables; the values are assigned during package installation (or at least that's what appears to be happening). This less-than-ideal setup mostly works. In order for all instances of the package to have the same data, I have to distribute the data file with the package (currently it's being copied to inst/ by a helper script before building the package) instead of having it all be packaged together. There must be a better way.
Such as:
Generate .rda files during package building (but this requires not running the same code during package install)
I can do this with a Makefile but that seems like overkill
Can I have R code that is only run during package building and not during install?
Run R code in data/
But the data is munged using code from the package in question. I can fix that with Collate (I think), but then I have to maintain the order of all of the .R files (and with that added complexity, I might as well use a Makefile?)
Build two packages, one with all of the code I want, one with the data.
Obvious, clever things I've not thought of.
tl;dr: What are some methods for adding a snapshot of dynamically changing data to an R package frozen for deployment?
As @BenBolker points out in the comments above, splitting the dataset out into a different package has precedent in the community (most notably the core package datasets) and has additional benefits.
The separation of functions from data also makes it easier to work on historic versions of the data with the up-to-date functions.
I currently have a tools-to-munge package and a things-to-munge package. Using a helper script I can build the tools-to-munge package and set up a Suggests (or Depends) in the DESCRIPTION of both packages to point to the appropriate incrementing version of the other package. After the new tools-to-munge package has been built, I can build the things-to-munge package as necessary, using the functions in the tools-to-munge package.
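A sketch of that DESCRIPTION linkage (package names and version numbers are placeholders; R package names use dots rather than hyphens):

Package: things.to.munge
Version: 2024.11.30
Suggests: tools.to.munge (>= 1.4.0)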