Best way to reference files outside directory in R

I use a lot of custom-built functions in R, which I save in the Documents folder on my PC. I would like to bring these functions into my R environment (I usually use source()). At the moment I use the entire file path, i.e. C:\Users\username\documents\R functions\my_function.r, and then create a quick access shortcut link in my project directory to these functions (for easy reference in case it's needed). However, I was wondering if there is a better way to reference these files. By better I basically mean shorter, or a way to source the files through the quick access shortcut. An alternative would be to create a secondary directory so I could just type source("&/my_function.r") (the "&" means secondary directory). This is just a minor inconvenience, but I think resolving it would make life easier. What do you think? Is this an unnecessary complication? Is there anyone in a similar situation who has any tips for easily sourcing functions?
Thanks a lot!

If these are functions you often use, you could wrap them in a minimalistic package. Then your call would just be library("myhelpers") and you have all of them available.
Creating this package is quite easy. Assuming you use RStudio, you just:
Create a package: File -> New Project -> New Directory -> R Package
Give it the name you want e.g. "myhelpers"
Specify the folder it should be in
Then RStudio directly creates the package structure for you.
Now you have the package structure in your folder. It will look like this:
- DESCRIPTION
- man
- NAMESPACE
- R
- myhelpers.Rproj
You just have to put your .R files with the functions in the R folder. It does not matter whether the functions are in one file or in multiple files.
Then in RStudio go to the "Build" tab and click "Install and Restart". That's it!
Now in your other projects or R files you can just type and use all the functions you put in the R folder:
library("myhelpers")
var <- myfunction1(x)
If you later want to edit your package functions or add new ones, you can just go to the package folder and click on myhelpers.Rproj, and RStudio will open your package project for you. After your changes, just click Build -> Install and Restart again to update the package.
Here is also a short explanation with pictures. This is all you need to use your functions for yourself. The nice thing is that you can go further from there if needed, e.g. add documentation to your functions (then you could also have a help() page for each function).
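For readers who do not use RStudio, the same skeleton can be laid out by hand. The following is only a sketch under assumptions: it writes into a temporary folder, the DESCRIPTION values are placeholders, and myfunction1 is the hypothetical function from the example above.

```r
# Sketch: create the minimal package layout by hand in a temporary folder.
pkg_dir <- file.path(tempdir(), "myhelpers")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION; all field values are placeholders.
writeLines(c(
  "Package: myhelpers",
  "Title: Personal Helper Functions",
  "Version: 0.0.1",
  "Description: Functions reused across projects.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# Export every function whose name starts with a letter.
writeLines('exportPattern("^[[:alpha:]]+")', file.path(pkg_dir, "NAMESPACE"))

# One hypothetical helper function, stored in the R folder.
writeLines(c(
  "myfunction1 <- function(x) {",
  "  x + 1",
  "}"
), file.path(pkg_dir, "R", "my_function.R"))

created <- list.files(pkg_dir, recursive = TRUE)
print(created)

# To install from this folder (not run here):
# install.packages(pkg_dir, repos = NULL, type = "source")
```

After installing, library("myhelpers") makes the helper available in any project, exactly as with the RStudio route.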

Related

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!

You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others:
Easy to share, download and install.
R has a very efficient documentation system for your functions and objects when you work within RStudio. Combined with roxygen2, it enables you to document every function precisely, and it makes the code clearer since you can avoid inline comments (but please add them anyway where needed).
You can specify quite easily which dependencies your package needs, so that everyone knows what to install for your project to work. You can also use packrat if you want to mimic Python's virtualenv.
R also provides a long-format documentation system called vignettes, which are similar to a printed notebook: you can display code, text, code results, etc. This is where you will write guidelines on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed, they are automatically included and available to all users.
The only downside is the following: since R is a functional programming language, a package consists mainly of functions and some other relevant objects (data, for instance), but not really scripts.
More details about the last point: if your project consists of a script that calls a set of functions to do something, it cannot directly appear within the package. There are two options here: a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance); b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this:
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole workflow from a terminal or on a remote machine with Rscript, as follows (using argparse to add arguments):
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally, if you write a bash file that calls Rscript, you can automate everything!
Feel free to read Hadley Wickham's book about R packages; it is super clear, full of best practices, and a great help in writing your packages.
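To illustrate how a script such as myautomatedtask.R could pick up its arguments, here is a minimal sketch that uses base R's commandArgs() instead of argparse, so that it runs without extra packages. The function name parse_cli_args is hypothetical, and the sketch assumes every flag has the form --name value.

```r
# Sketch: turn "--name value" pairs into a named list (base R only).
parse_cli_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  idx <- seq_along(args)
  flags <- args[idx %% 2 == 1]   # odd positions hold the flags
  values <- args[idx %% 2 == 0]  # even positions hold the values
  stopifnot(length(flags) == length(values), all(startsWith(flags, "--")))
  setNames(as.list(values), sub("^--", "", flags))
}

# Simulate: Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
opts <- parse_cli_args(c("--arg1", "anargument", "--arg2", "anotherargument"))
print(opts$arg1)
```

Inside the real script you would call parse_cli_args() without arguments, so that it reads whatever was passed after the script name on the Rscript command line.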

One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set the random seed so that the outputs are reproducible.
Documentation is important: you can use the Roxygen skeleton in RStudio (default shortcut Ctrl+Alt+Shift+R).
I usually separate the code into smaller, logically cohesive scripts and use a main.R script that calls the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.
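As a small illustration of the random seed point above, the sketch below shows that resetting the same seed reproduces the same draws. The seed value 42 is an arbitrary choice.

```r
# Sketch: the same seed gives the same random numbers on every run.
set.seed(42)
first_run <- runif(3)

set.seed(42)
second_run <- runif(3)

print(identical(first_run, second_run))
```

Placing a single set.seed() call near the top of main.R is usually enough for a whole workflow, as long as the scripts always run in the same order.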

Where to put R files that generate package data

I am currently developing an R package and want it to be as clean as possible, so I try to resolve all WARNINGs and NOTEs displayed by devtools::check().
One of these notes is related to some code I use for generating sample data to go with the package:
checking top-level files ... NOTE
Non-standard file/directory found at top level:
'generate_sample_data.R'
It's an R script currently placed in the package root directory and not meant to be distributed with the package (because it doesn't really seem useful to include)
So here's my question:
Where should I put such a file or how do I tell R to leave it be?
Is .Rbuildignore the right way to go?
Currently devtools::build() puts the R script in the final package, so I shouldn't just ignore the NOTE.

As suggested in http://r-pkgs.had.co.nz/data.html, it makes sense to use ./data-raw/ for scripts and functions that are necessary for creating or updating data but are not needed in the package itself. After you add ./data-raw/ to ./.Rbuildignore, the package build should ignore anything within that directory. (And, as you commented, there is a helper function devtools::use_data_raw(); in current releases it lives in the usethis package as usethis::use_data_raw().)
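Each line of .Rbuildignore is a regular expression matched against file paths relative to the package root. The sketch below adds an entry for data-raw by hand; the package folder mypkg is a hypothetical location in a temporary directory.

```r
# Sketch: add data-raw to .Rbuildignore without duplicating existing entries.
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_dir, "data-raw"), recursive = TRUE, showWarnings = FALSE)

ignore_file <- file.path(pkg_dir, ".Rbuildignore")
existing <- if (file.exists(ignore_file)) readLines(ignore_file) else character(0)
writeLines(union(existing, "^data-raw$"), ignore_file)

# The entry is a regular expression: it matches data-raw but not R or data.
ignored <- grepl("^data-raw$", c("data-raw", "R", "data"), perl = TRUE)
print(ignored)
```

With generate_sample_data.R moved into data-raw/, the script stays in the repository but is left out of the built package.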

C file does not work in my own R package?

I built my own package in R and created all my functions. Everything worked very well. Then I wanted to include a .c file in my package.
I followed the structure described in this link: compiled code. Once I had done that, my package stopped working and I could not use it anymore.
I tried to fix it more than once, but nothing helped. So I built another package and loaded my functions into it (I had saved a copy of my files).
Now I would like to start again, but I do not want to lose my functions again. Any ideas?

Try writing your files first and make sure that they work! Then build your package following the structure described here.
Follow it step by step and you will be fine. Your package will set up the src directory for you, along with all your other files.

How to call R script from another R script, both in same package?

I'm building a package that uses two main functions. One of the functions model.R requires a special type of simulation sim.R and a way to set up the results in a table table.R
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory. ~R/ has premodel.R model.R sim.R and table.R. I need the model.R script to use both sim.R and table.R functions... located in the same directory in the package (e.g. ~R/).

To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by creating a directory (i.e. folder) named TEST. Within TEST I will create another folder named R; in that folder I will put all the R scripts containing the different functions of the package.
At a minimum you also need to include a DESCRIPTION and a NAMESPACE file. A man directory (for help files) and a tests directory (for unit tests) are also nice to include.
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/

As others have pointed out, you don't have to source R files in a package. The package loading mechanism will take care of loading the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions, however. If you have multiple files with R code, situations can arise where the order in which these files are processed matters. Often it doesn't matter, or the default order used by R happens to be fine. If you find that some dependencies within your package aren't resolved properly, you may need a custom processing order for the R files. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
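As a rough illustration of what such a Collate field can look like, the sketch below writes a small DESCRIPTION to a temporary file and reads the field back with read.dcf(). The package name mypkg and the chosen file order are assumptions; the file names are the ones from the question.

```r
# Sketch: a DESCRIPTION with a Collate field, read back to show the order.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypkg",
  "Version: 0.0.1",
  "Collate: 'sim.R'",
  "    'table.R'",
  "    'premodel.R'",
  "    'model.R'"
), desc_file)

collate <- read.dcf(desc_file, fields = "Collate")[1, "Collate"]

# Pull the individual file names out of the field, in processing order.
collate_files <- regmatches(collate, gregexpr("[^'[:space:]]+\\.R", collate))[[1]]
print(collate_files)
```

Here sim.R and table.R are listed before model.R, so anything model.R needs at load time is already defined when it is processed.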

If all your files are in the R directory, every function will be in memory after you build the package or call load_all().
You may have issues, though, if some files contain code that is not inside a function.
R loads files in alphabetical order.
Usually this is not a problem, because functions are evaluated when they are called, not at loading time (i.e. a function can refer to another function that is not yet defined, even in the same file).
But if you have code outside a function in model.R, this code will be executed immediately when the file is loaded, and your package build will usually fail with:
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the loose code of model.R in a function so you can call it later, when the package has fully loaded, external libraries included.
If this piece of code is there to initialize some value, consider use_data() so that R takes care of loading the data into the environment for you.
If this piece of code is just interactive code written to test and develop the package itself, consider putting it elsewhere or wrapping it in a function anyway.
If you really need that code to be executed at loading time, or really have a dependency to resolve, then you must add the Collate field to the DESCRIPTION file, as already stated by Peter Humburg, to force the order in which R loads the files.
Roxygen2 can help you: put this before your code
#' @include sim.R table.R
call roxygenize(), and the Collate field will be generated for you in the DESCRIPTION file.
But even then, the external libraries you depend on are not yet loaded by the package, which can again lead to failure at build time.
In conclusion, you'd better not leave code outside functions in a .R file if it's located inside a package.
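The difference between defining a function and running top-level code can be shown in a few lines. This is only a sketch in a plain R session, not inside a package, and the function names run_model, simulate_once and not_yet_defined are hypothetical.

```r
# Top-level code runs immediately, so calling a function that does not
# exist yet fails at this point (the error is caught here for illustration).
top_level <- tryCatch(not_yet_defined(), error = function(e) "error at load time")
print(top_level)

# Defining a function that refers to a missing function is fine,
# because the body is only evaluated when the function is called.
run_model <- function(n) {
  vapply(seq_len(n), function(i) simulate_once(i), numeric(1))
}

# The missing function is defined later, as if it came from a later file.
simulate_once <- function(i) i^2

result <- run_model(3)
print(result)
```

This is why files that contain only function definitions rarely need a Collate field, while files with loose top-level code are sensitive to load order.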

Since you're building a package, the reason why you're having trouble accessing the other functions in your /R directory is because you need to first:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.

Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.

Including Script Files in an R Extension Package

I'm creating an R package and I need it to include a couple of non R script files which get called by one of my functions. I need these script files to be distributed with the package, naturally. So that leaves me with two questions:
1. a) In which directory of the package tree should I place these files? b) Is that location mandatory or just convention?
2. Do I need to change any other settings or configurations, or will they just get copied to the directory mentioned in #1 and then I can figure out the path using system.file()?
I've tried to find the answer in the Writing R Extensions document, but it didn't jump out at me. And, of course, I didn't read the whole thing. Am I being too honest here?

I think you want either exec/ at the top level (even though that is labeled 'still experimental') or a subdirectory of inst/, as everything in inst/ gets copied verbatim into the installed package.
A quick example from the packages I have expanded in source is gdata, which has inst/perl, inst/xls and inst/bin. You could then call these from R itself by computing the path inside the installed package using system.file().
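A short sketch of the system.file() lookup follows. Since gdata may not be installed, it uses the stats package that ships with R for the working call; the perl/script.pl path is a hypothetical file used only to show what happens when nothing is found.

```r
# Sketch: locate a file inside an installed package.
# The DESCRIPTION file exists in every installed package.
installed_file <- system.file("DESCRIPTION", package = "stats")
print(file.exists(installed_file))

# A file that is not there yields an empty string, not an error.
missing_file <- system.file("perl", "script.pl", package = "stats")
print(missing_file == "")

# Files kept under inst/perl/ in the package source end up directly under
# perl/ after installation, so the lookup omits the inst/ prefix, e.g.:
# system.file("perl", "script.pl", package = "mypkg", mustWork = TRUE)
```

Passing mustWork = TRUE turns a missing file into an error, which is often easier to debug than an empty path handed on to system().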