Where should I store private files in an R package?

I'm developing an R package with an XLSX file that serves as a template for a function that exports a report.
I currently store it under inst/, but I don't think this is ideal, since the user should have no direct access to that file.
I know I could store it privately as an R object, but I'd like to know whether it is possible to store it privately as a file that is loaded when the package user executes their code. That would make the file easier to update, and the current codebase would require less restructuring.
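For what it's worth, files shipped under inst/ are installed with the package and are conventionally read at run time via system.file(). A minimal sketch, assuming the template lives at inst/extdata/report_template.xlsx (a hypothetical path) in a package called mypkg:

# Locate the installed copy of the template at run time;
# mustWork = TRUE makes the call fail loudly if the file is missing.
template_path <- system.file("extdata", "report_template.xlsx",
                             package = "mypkg", mustWork = TRUE)
# The exporting function can then read the template from template_path,
# e.g. with openxlsx::loadWorkbook(template_path).

The file is still shipped with the package, but the user only ever interacts with the exporting function, never with the path itself.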

Related

Using R, is it possible to dyn.load() a shared object library (.so) file directly from a URL?

I'm trying to figure out if it's possible to use the R dyn.load() function to load a shared object library (.so) file directly from a URL, instead of first having to download it into your R working directory.
The R environment that I'm working with runs on an AWS lambda server, and as a consequence I can only read, not write to or create, files in the working directory.
Any help would be much appreciated!
(.so file URL: http://cloud.sowiso.nl/images/uploads/mvtnorm.so)
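For context, AWS Lambda typically allows writes to /tmp even when the working directory is read-only, so one sketch of a workaround (assuming tempdir() resolves to a writable location on the Lambda image) is to download the .so there first and dyn.load() it from that path:

so_url <- "http://cloud.sowiso.nl/images/uploads/mvtnorm.so"
so_path <- file.path(tempdir(), "mvtnorm.so")  # tempdir() usually sits under /tmp
download.file(so_url, destfile = so_path, mode = "wb")  # binary mode for a shared object
dyn.load(so_path)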

Source user defined functions in R

I have created a script with a few custom functions and saved the file in a separate folder.
However, if I want to access the functions using source("Functions.R"), I have to have the "Functions.R" file in my working directory. Is there any way of fetching the file without copying it into my current working directory? (I do not want to create a package for it.)
Thanks!
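Note that source() accepts any path, not just a bare file name, so pointing it at the script's real location avoids the copy. A minimal sketch, assuming the script lives at ~/R/utils/Functions.R (a hypothetical location):

# Source the script from wherever it lives; no copy into the working directory is needed.
source("~/R/utils/Functions.R")
# Or build the path programmatically:
source(file.path(Sys.getenv("HOME"), "R", "utils", "Functions.R"))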

R shiny concurrent file access

I am using the R shiny package to build a web interface for my executable program. The web interface provides user input and shows output.
On the server side, the R script formats the user inputs and saves them to a local input file. R then calls a system command to run the executable program.
My concern is that if multiple users run the web app at the same time, it is possible that the input file generated by the first user will be overwritten by the second user's input before it is read by the executable program.
One way to solve the conflict is to have R create a temporary folder per user and generate/run the input file there. But I'd like to know whether there is a better or automatic way to resolve this potential conflict with Shiny. For example, with Shiny's fileInput, uploaded files are automatically stored in a temporary folder.
Update
Thanks for the advice, @Symbolix and @Mike Wise.
I read the persistent data storage article before, but I don't think it is exactly what I wanted (maybe my understanding is not correct). I ended up creating a temporary folder and running my executable from there, as sketched below.
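For reference, a per-session scratch directory can be created inside the Shiny server function, since that function runs once per connecting user. A sketch (format_inputs() and the executable name are placeholders):

library(shiny)

server <- function(input, output, session) {
  # tempfile() returns a unique path per call, so concurrent sessions
  # cannot collide on the same input file.
  run_dir <- tempfile("run_")
  dir.create(run_dir)

  observeEvent(input$go, {
    input_file <- file.path(run_dir, "input.txt")
    writeLines(format_inputs(input), input_file)  # format_inputs() is hypothetical
    system2("my_executable", args = input_file)   # placeholder executable name
  })
}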

Set an .Rmd in a package to write files to the current project working directory

I have an .Rmd which I use to report on data quality in a number of different R projects. It splits the data to remove subsets with missing values, interpolates missing results where appropriate, and writes the cleansed data out via write.csv() to a file path of the form "./Cleansed_data/".
To make an example:
open RStudio
go to the RHS 'project' menu, and select and make a new project wherever you'd like
go to the LHS 'new script' drop-down and select 'new .Rmd'
change the output to .pdf and hit OK
in the last R chunk include write.csv(mtcars, file = "mtcars.csv")
hit the 'knit pdf' button, save the report as "writeFile.Rmd" to your project working directory, and let it run.
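For reference, the document those steps produce boils down to something like this minimal sketch of writeFile.Rmd:

---
title: "writeFile"
output: pdf_document
---

```{r}
# Written relative to whatever directory the document is rendered in
write.csv(mtcars, file = "mtcars.csv")
```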
Previously I moved this .Rmd from place to place; however, now I would like to build it into an internal package. I have included it (as the documentation indicates) in inst/rmd within the package directory.
In order to do this:
build or open any package you have access to
add the file to inst/rmd (create this folder if it doesn't exist)
rebuild the package
I then rebuild the package and open a new project. I load my new package and attempt to run the document via the render command, using system.file() to locate the .Rmd, like so:
rmarkdown::render(input = system.file("rmd/writeFile.Rmd", package = "MyPackage"),
                  output_file = "writeFile.pdf", output_dir = "./Cars/")
This renders the report from the installed package into the folder given by output_dir. However, there are a number of pitfalls here. First, if I omit the output_dir argument, the report renders into the package library, usually located under the R installation's library folder (on Windows, typically on the C: drive). This, however, is fixable.
What I can't get around is that when the .Rmd hits the write.csv() call, the .Rmd is (I believe) being rendered in the package environment, whose working directory is the package library folder, not the current project directory.
The Questions
How can I inform the template in the package of the current working directory of the RStudio project? I'm vaguely aware that there is an rstudioapi package, but I have nearly no understanding of what it does, or whether it would provide a solution.
If this is either outright impossible or just potentially a very bad idea, how can I modify the workflow to retrieve a number of R object outputs into the environment or the working directory on the call to the report, without having to modify the report for each different project? Further, why specifically is this approach such a bad plan?
In order to close this off:
I have selected to keep the .Rmd files included in the package. They need to move and be versioned with the package, as it holds the functions they use to run.
In order to meet my requirements I style the documents to grab the working directory via the RStudio API, in the form:
write.csv(mtcars, file = file.path(rstudioapi::getActiveProject(), "mtcars.csv"))
Having tested @CL's answer, this also runs and is not dependent on RStudio as an IDE; however, I know that these documents will
always be accessed via the RStudio IDE
always be accessed from within a specific project
I fear (though have not tested) that artificially booting the file into a different working directory could have other side effects, potentially affecting things like child documents I might want to include later, or other code that needs paths relative to the package installation rather than the project. In this way I think (if I interpreted Yihui correctly) the R doc is still the centre of its own universe. It just writes its data into another one :)
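For anyone who does need an IDE-independent route, rmarkdown::render() exposes a knit_root_dir argument that sets the directory the chunks are evaluated in. A sketch of the call (the paths are placeholders):

rmarkdown::render(
  input = system.file("rmd/writeFile.Rmd", package = "MyPackage"),
  output_file = "writeFile.pdf",
  output_dir = "./Cars",
  knit_root_dir = getwd()  # evaluate chunks in the current project directory
)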

Is there a persistent location that is always writable which can be used as data cache by a package?

Is there a predefined location where an R package could store cached data? The data should persist across sessions. I was thinking about creating a subdirectory of ${R_LIBS_USER}/package_name, but I'm not sure whether this is portable, or whether it is "allowed" if my package is installed systemwide.
The idea is the following: Create an R script mydata.R in the data subdirectory of the package which would be executed by calling data(mydata) (according to the documentation of data()). This script would load the data from the internet and cache it, if it hasn't been cached before. (If the data has been cached already, the cache will be used.) In addition, a function will be provided to invalidate the cache and/or to check if a newer version of the data is available online.
This is from the documentation of data():
Currently, four formats of data files are supported:
files ending ‘.R’ or ‘.r’ are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
...
Indeed, creating a file fortytwo.R in the data subdirectory of a package with the following contents:
fortytwo = data.frame(answer=42)
and then executing data(fortytwo) creates a data frame variable fortytwo. Now the question is: Where would fortytwo.R cache the data if it were difficult to compute?
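One possible answer to that last question, assuming R >= 4.0.0 is acceptable: tools::R_user_dir() returns a per-user directory that persists across sessions and projects. A sketch of what a caching fortytwo.R could look like (the package name is a placeholder):

# data/fortytwo.R -- source()d when the user calls data(fortytwo)
cache_dir <- tools::R_user_dir("mypackage", which = "cache")
cache_file <- file.path(cache_dir, "fortytwo.rds")

if (file.exists(cache_file)) {
  fortytwo <- readRDS(cache_file)      # reuse the persistent cache
} else {
  fortytwo <- data.frame(answer = 42)  # stand-in for an expensive download/computation
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(fortytwo, cache_file)
}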
EDIT: I am thinking about creating two packages: A "data" package that provides the data, and a "code" package that operates on it. The question concerns the "data" package: Where can it store files in a per-user storage so that it is persistent across R sessions and is accessible from different R projects?
Related: Package that downloads data from the internet during installation.
There is no absolutely defined location for package-specific persistent caching in R. However, the R.cache package provides an interface for creating and managing cached data. It looks like it could be useful for your scenario.
When users load R.cache (library(R.cache)), they get the following prompt:
The R.cache package needs to create a directory that will hold cache files.
It is convenient to use one in the user's home directory, because it remains
also after restarting R. Do you wish to create the '~/.Rcache/' directory? If
not, a temporary directory (/tmp/RtmpqdUcbP/.Rcache) that is specific to this
R session will be used. [Y/n]:
They can then choose to create the cache directory in their home directory, which is presumably persistent, or to create a session-specific directory. If you make your data package depend on R.cache, you could check for the existence of the cached object(s) in its .onLoad() hook function and download the data if it isn't there. Alternatively, you could do this in the way suggested in your own question.
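To make that concrete, R.cache keys cached objects by an arbitrary key list through saveCache() and loadCache(). A minimal sketch (the download function is hypothetical):

library(R.cache)

key <- list(dataset = "mydata", version = 1)  # cache key of your choosing
mydata <- loadCache(key)                      # returns NULL on a cache miss
if (is.null(mydata)) {
  mydata <- download_mydata()                 # hypothetical: fetch from the internet
  saveCache(mydata, key = key)                # persists under ~/.Rcache by default
}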
Have you looked at in-memory databases? H2 and Redis have R bindings via RH2 and rredis; both allow you to share data across R sessions, but only while the creating session is alive. To have the data persist across non-concurrent sessions, you need to write it to disk (assuming you can't re-create it on the fly, which would defeat the purpose of this question), and I believe a data package would be a good option for that. That way, you could add an update function that runs every time you load either package (i.e., if the code package has the right dependencies).
An example is the RWeka and RWekaJars packages. Look them up on CRAN; it should be fairly easy to see how they work.
