Is it good practice to include every library I need to execute a function within that function?
For example, my file global.r contains several functions I need for a Shiny app. Currently I load all the needed packages at the top of that file. When I switch projects or copy these functions, I have to remember to load the packages in (or add them to) the new code. If instead each function loaded its own packages, everything a function needs would be contained within it. Of course I would have to test every function in a fresh R session, but I think this could help in the long run.
When I tried loading a package twice, R didn't load it again; it just checked that it was already attached. My main question is whether restructuring my functions this way would slow them down.
I have only seen that practice once, library calls inside functions, so I'm not sure.
As one of the commenters suggests, you should avoid loading packages within a function, since:
1. The function now has a global effect - as a general rule, this is something to avoid.
2. There is a very small performance hit.
The first point is the big one. As with most optimisation, only worry about the second point if it's an issue.
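To make the contrast concrete, a small sketch (the function and package names are mine, not the answerer's; assume x is a data frame with a column named value):

f_bad <- function(x) {
  library(dplyr)               # attaches dplyr for the whole session: a global side effect
  filter(x, value > 0)
}

f_good <- function(x) {
  dplyr::filter(x, value > 0)  # fully qualified call; the search path stays untouched
}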
Now that we've established the principle, what are the possible solutions?
In small projects, I have a file called packages.R that includes all the library calls I need. This is sourced at the top of my analysis script (see the sketch below). BTW, all my functions are in a file called func.R. This workflow was stolen/adapted from a previous SO question.
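A minimal sketch of that layout (the file contents are illustrative, not the answerer's actual files):

# packages.R - every library call in one place
library(dplyr)
library(ggplot2)

# analysis.R - top of the analysis script
source("packages.R")
source("func.R")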
If you're only using one or two functions from a package, you can qualify each call with ::, e.g. package::funcA(...). That way you avoid attaching the package to the search path (its namespace is still loaded behind the scenes, but your session stays uncluttered).
For larger projects, I create an R package that handles all necessary imports. The benefit of creating a package is detailed in this answer on structuring large R projects.
Related
TL;DR
I have a complete NAMESPACE file for the package I am building. I want to execute all the importFrom(x, y) directives in that file so that every function the package relies on becomes available. Is there an automated way to do that?
Full question
I'm currently working on building a package that in turn depends on a bunch of other packages. Think: both Hmisc and dplyr.
The other contributors to the package don't really like using devtools::build() every 5 minutes when they debug, which I can understand. They want to be able to toggle a dev_mode boolean which, when set to true, loads all dependencies and sources all scripts in the R/ and data-raw/ folders, so that all functions and data objects are loaded in memory. I don't necessarily think that's a perfectly clean solution, but they're technically my clients and it's a big help to them so please don't offer a frame challenge unless it's the most user-friendly solution imaginable.
The problem with this solution is that when I load two libraries whose namespaces clash, functions in the package that would otherwise work perfectly start throwing errors. I thus need to improve my import system.
Thanks to thorough documentation (and the help of devtools::check()), my NAMESPACE is complete. I guess I could split it into pieces and eval() some well-chosen parts of it, but it feels like I'm probably not the first person to encounter this difficulty. Is there an automated way to parse NAMESPACE and import the proper functions from the proper packages?
Thanks!
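As a hedged sketch of one approach: extract the importFrom() directives from NAMESPACE with plain text tools and bind each imported function into the global environment. The helper name below is invented, and it assumes each directive sits on its own line (which is how roxygen2 writes them):

load_namespace_imports <- function(path = "NAMESPACE") {
  lines <- readLines(path)
  imports <- grep("^importFrom\\(", lines, value = TRUE)
  for (line in imports) {
    # e.g. 'importFrom(dplyr,mutate)' -> c("dplyr", "mutate")
    args <- strsplit(gsub("^importFrom\\(|\\)$", "", line), ",")[[1]]
    args <- gsub("[\"' ]", "", args)   # strip quotes and stray spaces
    pkg  <- args[1]
    for (fun in args[-1]) {
      assign(fun, getFromNamespace(fun, pkg), envir = globalenv())
    }
  }
}

Note that devtools::load_all() already parses NAMESPACE and wires up the imports when it simulates loading the package, so it may cover this dev_mode use case without any hand-rolled parsing.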
I finally arrived at a point in using R where my programs are no longer just command-line scripts, but real programs. At this point, I think it doesn't make sense to keep all the functions used by the main code in the same source file. Now, if I understand correctly, the way to use the function myfunction, stored in file hereliesfunction.r, from a script stored in file myscript.r, is to add the line
source("hereliesfunction.r")
in file myscript.r, before the part of the script code where myfunction is used.
1. Is this the right approach in R?
2. Do I need a different source command for each function used by my main code? I guess it works "recursively", i.e., I can put source commands in hereliesfunction.r to let myfunction use other functions.
3. What happens when I return from myfunction? Do these other functions remain in memory, ready to be accessed by the main code too, or are they destroyed just like any other object created by myfunction?
4. Finally, is there some guideline on whether to store all the functions used by a main code in the same directory as the main code, or not?
Once you source an R file, it runs all the commands in that file. If the file contains a function definition, the function is stored in the global environment and is at your disposal until you remove it or close the R session (so for question 3: yes, they remain).
Your entire post is screaming R package. As @docendodiscimus has pointed out, you should invest some time in developing a package. Not only does it keep your code in one place and make it easy to maintain, it also offers a great platform for documenting your code (probably the most important part of code development/analysis) through help files and vignettes, and it offers easy version control through local and remote repositories (git, svn...).
[about sourcing] Is this the right approach in R?
Yes, but in the mid term, consider building a package, as stated by @docendo discimus. devtools::create() and, if you use RStudio, Projects > New package are your friends. Learning to build packages is made simple by Hadley's R packages book and was, personally, the best investment I ever made in R. Plus, documenting, writing tutorials/vignettes and writing tests is always useful: it may seem time-consuming at first glance, but you will probably soon benefit hugely from it (better understanding of your code, realizing you can improve the package architecture, etc.)
Do I need a different source command for each function used by my main code?
All the functions, and more generally all the code, in the sourced file will be executed in R (so the functions will be defined and available; you can check with ls()).
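A minimal illustration, reusing the question's file names:

source("hereliesfunction.r")   # runs every top-level expression in the file
ls()                           # "myfunction" now appears in the global environment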
I guess it works "recursively", i.e., I can put source commands in hereliesfunction.r to let myfunction use other functions.
Yes
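For instance (a sketch; the helper file and function are invented):

# hereliesfunction.r
source("helpers.r")            # hypothetical file defining helper()
myfunction <- function(x) {
  helper(x) + 1
}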
What happens when I return from myfunction? Do these other functions remain in memory, ready to be accessed by the main code too, or are they destroyed just like any other object created by myfunction?
This relates to the previous points: functions defined via source() (with the default local = FALSE) are stored in the global environment, so they remain in memory and stay accessible to the main code after myfunction returns; ordinary objects created inside myfunction are destroyed as usual when it returns.
Finally, is there some guideline on whether to store all the functions used by a main code in the same directory as the main code, or not?
You can store them wherever you want, as long as the path passed to source is the right one. But it's generally better practice to store all your functions in the same directory (or a subfolder, e.g. /code), so that you change your working directory only once (and if you use RStudio projects, you don't even need to bother: you just open the project). As a side effect, as long as everyone works in the same directory, relative paths will keep working, so you can share the folder via Dropbox or similar, which eases collaboration.
Again, in the mid term or if many projects use the same source files, it's probably a good idea to write a package (for your own use, or to share on GitHub or CRAN or...)
I have written personal functions in R that are not specific to one (or a few) projects.
What are the best practices (in R) for storing those kinds of functions?
Is the best way to have one file that gets sourced at startup, or is there a better (recommended) way to deal with this situation?
Create a package named "utilities", put your utility functions in that package, aim for one function per file, and keep the package in a version control system (e.g., Git, SVN). It will save you time in the long run.
P.S. .Rprofile tends to get accidentally deleted.
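A minimal sketch of that setup (the path and function name are placeholders, and the usethis calls are one common way to do it, not the only one):

install.packages(c("devtools", "usethis"))
usethis::create_package("~/R/utilities")   # one-time package skeleton
# then, with the new package as the active project:
usethis::use_r("read_config")              # creates R/read_config.R: one function per file
usethis::use_git()                         # put the package under version control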
If you have many, it would be good to make them into a package that you load each time you start working.
It is probably not a good idea to have a monolithic script with a bunch of functions. Instead, break the file up into several files, each of which either contains only one function (my preference) or a group of logically similar functions. That makes it easier to find things when you need to make changes.
Most people use the .Rprofile file for this. Here are two links which talk about this file in some detail.
http://www.statmethods.net/interface/customizing.html
http://blog.revolutionanalytics.com/2013/10/sample-rprofile.html
At the top of my .Rprofile file I call library() for the various packages I normally use. I also have some personal handy functions which I've come to rely on. Because this file is sourced on startup, they are available to me in every session.
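A sketch of such a .Rprofile (the package names and the helper function are placeholders, not the answerer's actual file):

# ~/.Rprofile - sourced at the start of every session
library(data.table)   # whatever packages you routinely use
library(ggplot2)

# a hypothetical personal helper, available in every session
pct <- function(x, digits = 1) round(100 * x, digits)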
In my experience, a package is the best choice for personal functions. At first I put all new functions into one personal package, which I called My. When I find that some functions are similar and worth becoming an independent package, I create a new package and move them there.
While I am writing .R functions I constantly have to run source("funcname.r") by hand to get the changes reflected in the workspace. I am sure it must be possible to do this automatically. What I would like is to make a change in my function, save the file, and be able to use the new function in the R workspace without manually sourcing it. How can I do that?
UPDATE: I know about selecting the appropriate lines of code and pressing CTRL+R in the R Editor (RGui), or using Notepad++ to execute the lines in R. But this approach has the disadvantage of muddling my workspace console. I would like to stop that practice if at all possible.
You can use RStudio, which has a "Source on Save" option.
If you are prepared to organise your functions into a package, you may enjoy exploring Hadley's devtools package. This provides a suite of tools to write, test and document packages.
https://github.com/hadley/devtools
This approach offers many advantages, but mainly reloading the package with a minimum of retyping.
You will still have to type load_all("yourpackage") but I find this small amount of typing is small beer compared to the advantages of devtools.
For additional information, including how to set up devtools, have a look at https://github.com/hadley/devtools/wiki/development
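A sketch of the resulting edit/reload loop (the package path is a placeholder):

library(devtools)
load_all("~/path/to/yourpackage")   # reload every function after saving your edits
document("~/path/to/yourpackage")   # optional: rebuild help files from roxygen comments
test("~/path/to/yourpackage")       # optional: run the package's tests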
If you're using Eclipse + StatET, you can press CTRL+R+S, which saves your script and sources it. As close to automatic as I can get.
If you can get your text editor to run a system command after it saves the file, then you could use something like AutoIt (on Windows) or a batch script (on a UNIX derivative) to pass a source() call to all running copies of R. But that's a heck of a lot of work for not much gain.
Still, I think it's much more likely to work being event-driven on the text editor end vs. having R constantly scan for updates (or somehow interface with the OS's update-event-messaging-system).
This is likely not possible (automatically detecting disc changes without intervention or without running at least one line).
R needs to read functions into memory, so a change on disc won't be reflected in the workspace until you reload your functions.
If you are into developing R functions, some amount of messiness during your development process is likely inevitable, but perhaps I could suggest that you try writing an R package to house your functions?
This has the advantage of letting you document your functions robustly and of lazy loading, so that you have access to your functions/datasets immediately without sourcing them.
Don't be afraid of making a package: it's easy with package.skeleton(), and it doesn't have to go on CRAN; it can be for your own personal use without distribution! Just have fun!
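For example (the package name and function names are placeholders, and fun1() and fun2() are assumed to already exist in your workspace):

# creates a package skeleton in ./mytools from two functions in the workspace
package.skeleton(name = "mytools", list = c("fun1", "fun2"))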
Try to accept some messiness during development knowing you are working towards your goal and fighting the good fight of code organization and documentation!
We are only imperfect people, in an imperfect world, but we mean well!
I have several custom functions that I use frequently in R. Rather than source this file (or parts thereof) in each script, is there some way to add this to a base R file such that they are always available when I use R?
Yes, create a package. There are numerous tutorials as well as the Writing R Extensions manual that came with your copy of R.
It may seem like too much work at first, but you will probably be glad that you did this in the longer run.
PS And you can then load that package from ~/.Rprofile. For really short code, you can also define it there.
A package may be overkill for a few useful functions. I'd argue there's nothing wrong with explicitly source()ing them as you need them - at least it is explicit, so if you email someone your code, you won't forget to include those other scripts.
Another option is to use the .Rprofile file. You can read about the details in ?Startup. Basically, the idea is that:
...a file called ‘.Rprofile’ is searched for in the current directory or in the user's home directory (in that order). The user profile file is sourced into the workspace.
You can read here about how many people use this functionality.
The accepted answer is best long-term: Make a package.
Luckily, the learning curve for doing this has been dramatically reduced by the devtools package: it automates package creation (a nice assist in getting off on the right foot), encourages good practices (like documenting with roxygen2), and helps with using online version control (Bitbucket, GitHub or other) and with sharing your package with others. It's also very helpful in smoothing your way to CRAN submission.
Good docs at http://adv-r.had.co.nz and http://r-pkgs.had.co.nz.
To create your package, for instance, you can:
install.packages("devtools")
devtools::create("path/to/package/pkgname")
You could also look at the 'mvbutils' package: it lets you set up a hierarchical set of "tasks" (folders with ".RData" workspace files in them) such that you can always see what's in the ancestral tasks (i.e., the ancestors are on the search() path). So you can put your custom functions in the "starting task" where you always start R, and then change to whatever project-specific task you require. You avoid cluttered workspaces, yet you can still use (and edit) your custom functions, because the starting task is always ancestral. Objects (including functions) are stored in ".RData" files and are thus loaded/saved automatically, but there are separate text-backup facilities for functions.
There are lots of different ways of working in R, and no "one-size-fits-all" best solution. It's also not easy to find an overview! Speaking just for myself:
I'm not a fan of having to 'source' everything in every time; for one thing, it simply doesn't work with big data sets and/or results of model runs.
I think packages are hard to create and maintain; there is a really significant overhead. After the first 5 packages you write, it does get a bit easier provided you do it on at least a weekly basis so you don't forget how, but really...
In fact, 'mvbutils' also has a bunch of tools for facilitating the creation and (especially) maintenance of packages, designed to interface smoothly with the task-hierarchy system. I use & edit my own packages all the time (including editing mvbutils itself); but if it wasn't for the tools in 'mvbutils', I'd be grinding my teeth in frustration most days of the week.