Organizing R's work in functions and subfunctions

My aim is to better organize the work done by my R code.
In particular, it could be useful to split the R code I have written into different R files, perhaps with each R file accomplishing a different task. I have in mind what we can do in Matlab with different M-files, where we can easily call functions written in different M-files directly from the main code.
Is it useful to write these R files in the form of functions?
How can we call these R files/functions from the main code?
Thanks

You can use source("filename.R") to include the file in your main script.
I am not sure whether there is a ready-made function to include an entire directory, but it is straightforward to write one using list.files() and then calling source() dynamically for each filename. You can also filter the files, for example to list only *.R files.
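For instance, a minimal sketch, assuming the helper scripts sit in a folder called R:
# source every .R file found in the R/ folder
for (f in list.files("R", pattern = "\\.R$", full.names = TRUE)) {
  source(f)
}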

Unless you intend to write an R package, you should rethink your organization. R is not Matlab, thank goodness! You can place as many functions as you wish into a single file, and make them all available in your environment with source("foo.r"). If you are writing a collection of generic functions and don't want to build a package, this really is the cleaner way to go.
As a side thought, consider making some of your tools more flexible by adding more input arguments. You may find that you don't really need so many separate functions/files. As a trivial example, if you have some function do_it_double, another do_it_integer, and yet another do_it_character, all of which do basically the same thing, just merge them into a single do_it_all(x, y, datatype = 'double') and override the default datatype as desired. (I know this can be done with internal input validation; I'm just giving an example.)
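A hypothetical sketch of that merge (the function bodies here are invented purely for illustration):
do_it_all <- function(x, y, datatype = "double") {
  # dispatch on the requested type instead of keeping three near-identical functions
  switch(datatype,
         double    = as.double(x) + as.double(y),
         integer   = as.integer(x) + as.integer(y),
         character = paste0(x, y),
         stop("unknown datatype: ", datatype))
}
do_it_all(1, 2)                             # uses the default datatype
do_it_all("a", "b", datatype = "character") # overrides it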

Your approach might work well. I would recommend wrapping the code in functions and using one R file per R function.
It might also be interesting to look at the packages devtools and ProjectTemplate, which aim to help with organizing R code.

Related

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code base for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow, even for people who have not worked on the project before, and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best practices on how to achieve this objective.
Thank you in advance for your help!
You can make an R package out of your project, because a package has everything you need for a standalone project that you want to share with others:
It is easy to share, download and install.
R has a very efficient documentation system for your functions and objects when you work within RStudio. Combined with roxygen2, it enables you to document every function precisely, and it makes the code clearer since you can avoid inline comments (but please do add them anyway where needed).
You can specify quite easily which dependencies your package needs, so that everyone knows what to install for your project to work. You can also use packrat if you want to mimic Python's virtualenv.
R also provides a long-format documentation system called vignettes, which are similar to a printed notebook: you can display code, text, code results, etc. This is where you write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, and so on. Once the package is installed, vignettes are automatically included and available to all users.
The only downside is the following: since R is a functional programming language, a package consists mainly of functions and some other relevant objects (data, for instance), but not really of scripts.
More details about that last point: if your project consists of a script that calls a set of functions to do something, the script cannot directly appear within the package. Two options here: (a) you write a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance); or (b) you put the whole script in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this:
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole workflow from a terminal or a remote machine with Rscript, with the following (using the argparse package to add arguments):
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally, if you write a bash file calling Rscript, you can automate everything!
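As a minimal sketch of what myautomatedtask.R could then contain (assuming the argparse package is installed; the argument names are just placeholders):
library(argparse)
library(mydatascienceproject)
parser <- ArgumentParser()
parser$add_argument("--arg1")
parser$add_argument("--arg2")
args <- parser$parse_args()
# hand the command-line values to the packaged functions
dothis(args$arg1)
dothat(args$arg2)
finishwork()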
Feel free to read Hadley Wickham's book about R packages; it is super clear, full of best practices, and of great help when writing your own packages.
One can easily get lost among the many files in a project's folder, so the folder should be structured properly: link
Naming conventions that I use: first, second.
Set the random seed, so that the outputs are reproducible.
Documentation is important: you can use the roxygen skeleton in RStudio (by default Ctrl+Alt+Shift+R).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script that sources the others (see the sketch below).
If you use a special set of libraries, consider using packrat. Once you set it up, you can manage the project-specific installed libraries.
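A minimal sketch of such a main.R (the script names are hypothetical):
set.seed(42)               # fixed seed, so reruns give reproducible outputs
source("01_load_data.R")   # each script covers one logically cohesive task
source("02_clean_data.R")
source("03_fit_model.R")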

"Hide" functions in RStudio

I am in the painful process of transitioning from MATLAB to R, and I am still coming to terms with not having a neatly arranged MathWorks website to consult.
When writing MATLAB functions, they are stored on a local drive and can be accessed from my source code (as long as the function is in the active directory).
When writing a function in R, I need to "run" it so that it is stored in the global environment before I can use it. Surely there is a nicer way of doing this, as I will need to refer to many, many functions. Can I somehow "hide" them so I don't have to see them, but always know they exist?
Thanks in advance
Use source('F:\\RWorkingDirectory\\my_functions.r'), or create your own R package, which is very easy to do with RStudio.
Thanks for the suggestions. I have decided to set up an environment instead.
E.g.,
Set up an R script, called MainFunctions.R, containing my desired function(s).
Add to .Rprofile:
e <- new.env()                        # a private environment for the helper functions
source("MainFunctions.R", local = e)  # load the functions into it
attach(e)                             # put it on the search path so the functions are callable
Now I simply need to edit this file, without worrying about loading the functions manually or creating a package.
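One caveat: the attached copy is a snapshot, so after editing MainFunctions.R mid-session you would refresh it along these lines:
detach(e)                             # drop the stale copy from the search path
e <- new.env()
source("MainFunctions.R", local = e)
attach(e)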

Sourcing so many R scripts

I have lots of .r scripts, all of which I want to source. I have written a function like the one below to source them.
sourcer <- function() {
  source("wil.r")
  source("k.r")
  source("l.r")
}
Please can anyone tell me how to get this code activated, and how to call each function any time I want to use it?
In addition to the answer by @user2885462, if the amount of R code you need to source becomes bigger, you might want to wrap the code into an R package. This provides a convenient way of loading the code, and allows you to add tests, documentation, etc. Reading the official package-writing tutorial is a good place to start for that.
For an individual project, I like to have all (or most) of my R functions in separate .r files, all in the same folder: e.g., AllFunctions.
Then at the beginning of my main code I run the following line, which sources every .r file (and files with the other R extensions, if they exist, which they usually don't) in the AllFunctions folder:
for (nm in list.files("AllFunctions", pattern = "\\.[RrSsQq]$")) source(file.path("AllFunctions", nm))

How to keep various-topic R scripts at hand?

I have written and collected R code on various topics, each piece solving a particular problem at hand. I store the R scripts/code in .txt files, and I now have hundreds of them.
How do you keep your R code at hand efficiently?
@Manetheran has the right idea: write a package. It's easy to do (especially with RStudio). Read "Writing R Extensions", and then on top of that learn about roxygen2 (which lets you document each function in-line and avoid writing .Rd files by hand).
Then you can use devtools to load your package locally, or, once it's stable, submit it to CRAN if you think other people could use the functions.
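A minimal sketch of that workflow (mytools is a hypothetical folder holding the package source):
library(devtools)
document("mytools")   # generate the .Rd help files from the roxygen2 comments
load_all("mytools")   # load all of the package's functions into the current session
install("mytools")    # or install the package properly for everyday use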
I prefer to keep it simple. I use Total Commander, and when I need an example that uses some R function, I just press Alt-F7 and search for *.R files containing the desired string.
I use RStudio and have created two or three basic scripts. I save my much-used functions in whichever basic script is most appropriate. Then, at the start of an RStudio script for a project, I source one or more of the basic scripts as appropriate.

Importing Functions into Current Namespace

Let's say I have an R source file comprising some functions; it doesn't matter what they are, e.g.,
fnx = function(x){(x - mean(x))/sd(x)}
I would like to be able to access them in my current R session (without typing them in, obviously). It would be nice if library("/path/to/file/my_fn_lib1.r") worked, as import does in Python, but it doesn't. One obvious solution is to create an R package, but I want to avoid that overhead just to import a few functions.
Use the source() command. In your case:
source("/path/to/file/my_fn_lib1.r")
Incidentally, creating a package is fairly easy with the package.skeleton() function (if you're planning to reuse this frequently).
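If you want something closer to Python's namespaced import, there is also a minimal base-R sketch (the environment name fn_lib is arbitrary):
fn_lib <- new.env()
sys.source("/path/to/file/my_fn_lib1.r", envir = fn_lib)
fn_lib$fnx(1:10)   # call the sourced function without cluttering the global environment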
