Bundling large .rda files with package - r

I am currently working on a package that I want to bundle some large .rda files with (hundreds of MB). If I use devtools::load_all(), my package takes forever to load since I included the files in the /data/ dir.
Is there a way to tell R to ignore the files in /data/ until I manually load them with data(), or am I better off just putting my data into a different directory?

How about you
create a directory inst/optionalData/ (or another suitable name) and
add functions to load these data sets on demand,
since you can rely on
system.file("optionalData", "nameOfFile.rds", package = "yourPackage")
to find them.
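For example, a minimal sketch of such an on-demand loader, assuming the objects were written with saveRDS() into inst/optionalData/ and the package is called yourPackage (both names are placeholders):

load_optional_data <- function(name) {
  # locate the file inside the installed package
  path <- system.file("optionalData", paste0(name, ".rds"), package = "yourPackage")
  if (path == "") stop("no such optional data set: ", name)
  readRDS(path)
}

# usage, only when the big object is actually needed:
# big <- load_optional_data("nameOfFile")

Keeping the data out of data/ this way keeps load_all() and library() fast, and each object is only read when you explicitly ask for it.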

Related

How to reach a file without knowledge about the user directory

I'm providing a .zip with a .R file and a .xlsx file to some people.
I need to write code that can read this .xlsx file from any directory on any PC.
But since the directories vary from computer to computer, I couldn't find a solution.
IMPORTANT: I'm not using RStudio to run this .R file, so I can only use base functions.
Using R - How do I search for a file/folder on all drives (hard drives as well as USB drives) - this question doesn't solve my problem.
Take a look at the here package. When you load the library (library("here")) it sets a "base" working directory, and you can then use the package to construct relative file paths from that location. For example, if inside your .zip file you have an R script (e.g., My Data Analysis.R) that analyzes data kept within a folder called data, you could read it in with, for example, read.csv(here("data", "my_csv_file.csv")), and it will construct the full appropriate file path no matter what computer it is on. Of course, the file structure of the project needs to stay the same across computers.
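For instance, a sketch of what the top of My Data Analysis.R could look like under that layout (the csv name is a placeholder):

library(here)  # establishes the project root when the package is loaded
# build the path relative to that root, regardless of the current working directory
my_data <- read.csv(here("data", "my_csv_file.csv"))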

R - how to load each file from folder

I have a folder that contains a lot of R files; those files are actually function definitions. What I need is code in another R project that will load each file in that folder and bring those functions into the environment.
I know the better option would be to create an R package from these functions, but that can't be done in my case for several reasons.
What is the simplest way to achieve my goal?
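One common approach (the folder path below is a placeholder) is to list the .R files and source() each one, which puts the functions in the global environment:

function_files <- list.files("path/to/functions", pattern = "\\.[rR]$", full.names = TRUE)
invisible(lapply(function_files, source))  # each file's functions end up in the workspace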

Setting scratch as temporary directory from R

I would like to set my temporary directory to the scratch space of the cluster. I have tried various methods, including this one: How to change directory for temporary files - problems with huge temporary raster files, but nothing works.
I have to read a large file (12 GB) in R and run some code using it.
I would like to read the file in this way:
library(data.table)
mydata <- fread("path/file")
But first I believe it is necessary to set the temporary directory to scratch/, otherwise the job gets killed.
Feel free to suggest any other approach.
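For what it's worth, R picks up its temporary directory from the TMPDIR environment variable at startup, so it has to be set before R launches, e.g. in ~/.Renviron or exported in the cluster job script (the scratch path below is a placeholder):

# in ~/.Renviron, or exported in the job script before R starts:
# TMPDIR=/scratch/yourusername

# then, inside R, confirm the temporary directory points at scratch:
Sys.getenv("TMPDIR")
tempdir()

library(data.table)
mydata <- fread("path/file")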

Incorporating data into R packages from an external file

I've been using http://r-pkgs.had.co.nz as a guide with much success, but I'm stuck on something that I haven't been able to resolve for many hours. Perhaps because I'm a bit thick...
Basically, I want to include a bunch of csv's as data.frames into an R package I'm working on. If I manually save the data objects as .rda or .Rdata and place them in the <package_dir>/data folder, I can build the package and the data is accessible upon loading the package. However, these csv's will receive updates every so often, and when this happens I'd like to just hit 'build' in R-Studio and have the data objects rebuilt from the csv's using some very simple processing scripts I have written.
So based on Hadley's guide, I've run devtools::use_data_raw() and put the csv's in the <package_dir>/data-raw folder. In that folder I've also placed R scripts that turn these data files into R objects and then save them in the correct location and format with devtools::use_data(<object_name>).
My original interpretation was that, when building the package, the scripts in <package_dir>/data-raw get run to produce the .rda files in the <package_dir>/data folder. I'm guessing this is incorrect? If this is wrong, is there a way to automatically source those scripts when building the package? And would doing so be bad practice?
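For reference, a typical data-raw script along those lines might look like the following sketch (file and object names are placeholders); scripts in data-raw are run by hand rather than automatically at build time:

# data-raw/make_mydata.R -- re-run manually whenever the csv is updated
mydata <- read.csv("data-raw/mydata.csv", stringsAsFactors = FALSE)
# ... any simple processing steps ...
devtools::use_data(mydata, overwrite = TRUE)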

Load Folder of Scripts in R at startup?

I'm new to R and frankly the amount of documentation is overwhelming, and I haven't been able to find the answer to this question.
I have created a number of .R script files, all stored in a folder that I can access on my server (let's say the folder is \\servername\Paige\myscripts, using the Windows backslash notation).
I know that in R you can call each script individually, for example (using the forward slash required in R)
source(file="//servername/Paige/myscripts/con_mdb.r")
and now this script, con_mdb, is available for use.
If I want to make all the scripts in this folder available at startup, how do I do this?
Briefly:
Use your ~/.Rprofile in the directory found via Sys.getenv("HOME") (or if that fails, in R's own Rprofile.site)
Loop over the contents of the directory via dir() or list.files().
Source each file.
e.g., via this one-liner:
sapply(dir("//servername/Paige/myscripts/", pattern = "\\.[rR]$", full.names = TRUE), source)
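In ~/.Rprofile that could look like the following sketch (wrapped in local() so the helper variable does not stick around in the workspace; the sourced functions still land in the global environment because source() evaluates in the user's workspace by default):

local({
  scripts <- list.files("//servername/Paige/myscripts", pattern = "\\.[rR]$", full.names = TRUE)
  invisible(lapply(scripts, source))
})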
But the real story is that you should not do this. Create a package instead, and load that. There are a bazillion other questions here on how to build a package. Research it -- it is worth it.
By far the best way is to create a package! But as a first step, you could also create one R script file (collection.r) in your script directory that sources all of the other scripts using relative paths.
In your separate project scripts you can then include only that one script with
source(file="//servername/Paige/myscripts/collection.r", chdir = TRUE)
which changes the working directory before sourcing. That way you only have to include one file for each project.
In the collection file you could use a loop over all files (except collection.r) or simply list them all.
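A minimal sketch of such a collection.r, assuming it sits in the same directory as the other scripts and is sourced with chdir = TRUE as above:

# collection.r -- sources every other .R script that sits next to it
scripts <- setdiff(list.files(".", pattern = "\\.[rR]$"), "collection.r")
invisible(lapply(scripts, source))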
