While I am writing .R functions I constantly need to manually write source("funcname.r") to get the changes reflected in the workspace. I am sure it must be possible to do this automatically. So what I would like would be just to make changes in my function, save the function and be able to use the new function in R workspace without manually "sourcing" this function. How can I do that?
UPDATE: I know about selecting appropriate lines of code and pressing CTRL+R in R Editor (RGui) or using Notepad++ and executing the lines into R. But this approach has a disadvantage of making my workspace console "muddled". I would like to stop this practice if at all possible.
You can use R studio which has a source on save option.
If you are prepared to package your functions into a package, you may enjoy exploring Hadley's devtools package. This provides a suite of tools to write, test and document
packages.
https://github.com/hadley/devtools
This approach offer many advantages, but mainly reloading the package with a minimum of retyping.
You will still have to type load_all("yourpackage") but I find this small amount of typing is small beer compared to the advantages of devtools.
For additional information, including how to setup devtools, have a look at https://github.com/hadley/devtools/wiki/development
If you're using Eclipse + StatET, you can press CTRL+R+S, which saves your script and sources it. As close to automatic as I can get.
If you can get your text editor to run a system command after it saves the file, then you could use something like AutoIt (on Windows) or a batch script (on UNIX-derivative) to pass a call to source off to all running copies of R. But that's a heck of a lot of work for not much gain.
Still, I think it's much more likely to work being event-driven on the text editor end vs. having R constantly scan for updates (or somehow interface with the OS's update-event-messaging-system).
This is likely not possible (automatically detecting disc changes without intervention or running at least one line).
R needs to read into memory functions, so a change on the disc wouldn't be reflected in the workspace without reloading your functions.
If you are into developing R functions, some amount of messiness during your development process will be likely inevitable, but perhaps I could suggest that you try writing an R-package to house your functions?
This has the advantage of being able to robustly document your functions, using lazy loading so that you have access to your functions/datasets immediately without sourcing them.
Don't be afraid of making a package, it's easy with package.skeleton() and doesn't have to go on CRAN but could be for your own personal use without distribution! Just have fun!
Try to accept some messiness during development knowing you are working towards your goal and fighting the good fight of code organization and documentation!
We are only imperfect people, in an imperfect world, but we mean well!
Related
When closing R Studio at the end of a R session, I am asked via a dialog box: "Save workspace image to [working directory] ?"
What does that mean? If I choose to save the workspace image, where is it saved? I always choose not to save the workspace image, are there any disadvantages to save it?
I looked at stackoverflow but did not find posts explaining what does the question mean? I only find a question about how to disable the prompt (with no simple answers...): How to disable "Save workspace image?" prompt in R?
What does that mean?
It means that R saves a list of objects in your global environment (i.e. where your normal work happens) into a file. When R next loads, this list is by default restored (at least partially — there are cases where it won’t work).
A consequence is that restarting R does not give you a clean slate. Instead, your workspace is cluttered with existing stuff, which is generally not what you want. People then resort to all kinds of hacks to try to clean their workspace. But none of these hacks are reliable, and none are necessary if you simply don’t save/restore your workspace.
If I choose to save the workspace image, where is it saved?
R creates a (hidden) file called .RData in your current working directory.
I always choose not to save the workspace image, are there any disadvantages to save it?
The advantage is that, under some circumstances, you avoid recomputing results when you continue your work later. However, there are other, better ways of achieving this. On the flip side, starting R without a clean slate has many disadvantages: Any new analysis you now start won’t be in a clean room, and it won’t be reproducible when executed again.
So you are doing the right thing by not saving the workspace! It’s one of the rules of creating reproducible R code. For more information, I recommend Jenny Bryan’s article on using R with a Project-oriented workflow
But having to manually reject saving the workspace every time is annoying and error-prone. You can disable the dialog box in the RStudio options.
The workspace will include any of your saved objects e.g. dataframes, matrices, functions etc.
Saving it into your working directory will allow you to load this back in next time you open up RStudio so you can continue exactly where you left off. No real disadvantage if you can recreate everything from your script next time and if your script doesn't take a long time to run.
The only thing I have to add here is that you should consider seriously that some people may be working on ongoing projects, i.e. things that aren't accomplished in one day and thus must save their workspace image so as to not start from the beginning again.
I think, best practice is: its ok to save your workspace, but your code only really works if you can clear your entire workspace and then rerun it completely with no errors!
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I finally arrived at a point in using R where my programs are not anymore grown-up command line scripts, but real codes. At this point, I think it doesn't make sense to keep all the functions used by the main code in the same source file. Now, If I understand correctly, the way to use function myfunction, stored in file hereliesfunction.r, from a script stored in file myscript.r, is to add the line
source("hereliesfunction.r")
in file myscript.r, before the part of the script code where myfunction is used.
Is this the right approach in R?
Do I need a different source command for each function used by my main code? I guess it works "recursively",i.e., I can put source commands in
hereliesfunction.r to let myfunction use other functions.
What happens when I return from myfunction? Do these other
functions remain in memory, ready to be accessed by the main code too, or are they destroyed just like any other object created by myfunction?
Finally, is there some guideline on whether to store all the
functions used by a main code in the same directory as the main
code, or not?
Once you source a R file, it runs all the commands in that file. If it contains a function definition, it stores it into the global environment and is at your disposal until you remove it or close R session (so 3., yes).
Your entire post is screaming R package. As #docendodiscimus has pointed out, you should invest some time to develop a package. Not only does it hold your code in one place, is easy to maintain, it also offers a great platform to document your code (probably the most important part of code development/analysis) through help files and vignettes and offers easy version control through local and remote repositories (git, svn...).
[about sourcing] Is this the right approach in R?
Yes but in the mid-term, consider building a package as stated by #docendo discimus. devtools::create() and if you use RStudio Projects > New package are your friends. Learning to build packages is made simple by Hadley's R-pkg and was, personally, the best investment ever in R. Plus documenting and writing tutorials/vignettes and writing tests is always useful: it may be time consuming at the first glance, but you will probably soon hugely benefit from it (better understanding of your code, realizing you can improve the package architecture, etc.)
Do I need a different source command for each function used by my main code?
All functions, and in a larger extent code, located in the file sourced will be executed in R (so functions will be declared and available, you can check it with ls()
I guess it work "recursively",i.e., I can put source commands in hereliesfunction.r to let myfunction use other functions.
Yes
What happens when I return from myfunction? Do these other functions remain in memory, ready to be accessed by the main code too, or are they destroyed just like any other object created by myfunction?
Not sure to understand but may be related to previous points.
Finally, is there some guideline on whether to store all the functions used by a main code in the same directory as the main code, or not?
You can store them wherever you want, as long as the path for source is the right one. But it's generally a better practice to store all your functions in the same directory (or in a subfolder, eg /code, so that you just change your working directory once (or if you use RStudio's projects, you don't even need to bother, you just open the project), and as a side effect, as long as one is working in the same directory, the relative paths will still work. And thus you can share the folder with Dropbox or other, which ease collaboration.
Again, in the mid term or if many projects use the same source files, it's probably a good idea to write a package (for your own use, or to share on GitHub or CRAN or...)
I have written personal functions in R that are not specific to one (or a few) projects.
What are the best practices (in R) to put those kind of functions?
Is the best way to do it to have one file that gets sourced at startup? or is there a better (recommended) way to deal with this situation?
Create a package named "utilities" , put utility functions in that package, try to aim for one function per file, and store the package in a source control system (e.g., GIT, SVN ). It will save you time in the long run.
P.S. .Rprofile tends to get accidentally deleted.
If you have many, it would be good to make it into a package that you load each time you start working.
It is probably not a good idea to have a monolithic script with a bunch of functions. Instead break the file up into several files each of which either has only one function (my preference) or has a group of functions that are logically similar. That makes it easier to find things when you need to make changes.
Most people use the .Rprofile file for this. Here are two links which talk about this file in some detail.
http://www.statmethods.net/interface/customizing.html
http://blog.revolutionanalytics.com/2013/10/sample-rprofile.html
At the top of my .Rprofile file I call library() for the various libraries which I normally use. I also have some personal handy functions which I've come to rely on. Because this file is sourced on startup, they are available to me every session.
From my experience, a package will be the best choice for personal functions. Firstly I put all new functions into a personal package, which I called it My. When I find some functions was similar and are worth to become an independent package, I will create a new package and move them.
Rstudio has a great code completion feature. It provides a quick view of functions that start with a given string, as well as function and parameter definitions.
ESS is powerful enough, familiar to me, and integrated into Emacs, where I conduct most of my work - so I am hesitant to move, but this feature is making me consider such a move.
Is it possible to integrate this feature into Emacs ESS?
Is there anything similar to this for Emacs ESS?
Any hope that there will be (and if so, how could I support such an effort?)
You do get the completion thanks to the rcompgen package by Deepayan (now "promoted" into base R as part of the utils package). So when I type
lm(
and hit TAB a new buffer opens which gets me the left-hand side of your window above: the available options to the function at hand. I don't think you can show the help directly though.
There is / was also a way to get context-sensitive help in the mini-buffer when typing but I have forgottten how/where that gets turned on.
[EDIT: This is an old answer and auto-complete package dropped out of fashion since then. Please use company-mode instead. It should work by default. Wiki configuration entry is here.]
Recent versions of ESS (> v.12.02) integrate with auto-complete package out of the box (you need not configure anything, just install auto-complete). It provides help on arguments as well as function help. I added detailed instructions to the wiki
Ess-eldoc was also rewritten and from v.12.02 it's active by default, so you don't need to configure anything.
Or maybe we should all use search:
Emacs autocomplete-mode extension for ESS and R
I don't want to be grumpy, I found this few hours ago and I'm still shocked. It works like a charm. Though I still prefer the old-style pop-ups. =)
I have several custom functions that I use frequently in R. Rather than souce this file (or parts thereof) in each script, is there some way to add this to a base R file such that they are always available when I use R?
Yes, create a package. There are numerous tutorials as well as the Writing R Extensions manual that came with your copy of R.
It may seem like too much work at first, but you will probably be glad that you did this in the longer run.
PS And you can then load that package from ~/.Rprofile. For really short code, you can also define it there.
A package may be overkill for a for a few useful functions. I'd argue there's nothing wrong with explicitly source()ing them as you need them - at least it is explicit so that if you email someone your code, you won't forget to include those other scripts.
Another option is to use the .Rprofile file. You can read about the details in ?Startup. Basically, the idea is that:
...a file called ‘.Rprofile’ is searched for in the current directory or
in the user's home directory (in that order). The user profile file is
sourced into the workspace.
You can read here about how many people use this functionality.
The accepted answer is best long-term: Make a package.
Luckily, the learning curve for doing this has been dramatically reduced by the devtools package: It automates package creation (a nice assist in getting off on the right foot), encourages good practices (like documenting with roxygen2, and helps with using online version control (bitbucket, github or other), sharing your package with others. It's also very helpful for smoothing your way to CRAN submission.
Good docs at http://adv-r.had.co.nz and http://r-pkgs.had.co.nz .
to create your package, for instance you can:
install.packages("devtools")
devtools::create("path/to/package/pkgname")
You could also look at the 'mvbutils' package: it lets you set up a hierarchical set of "tasks" (folders with workspace ".RData" files in them) such that you can always see what's in the ancestral tasks (ie the ancestors are in the search() path). So you can put your custom functions in the "starting task" where you always start R; and then you change to vwhatever project-specific task you require, so you can avoid cluttered workspaces, but you'll still be able to use (and edit) your custom functions because the starting task is always ancestral. Objects (including functions) get stored in ".RData" files and are thus loaded/saved automatically, but there are separate text-backup facilities for functions.
There are lots of different ways of working in R, and no "one-size-fits-all" best solution. It's also not easy to find an overview! Speaking just for myself:
I'm not a fan of having to 'source' everything in every time; for one thing, it simply doesn't work with big data sets and/or results of model runs.
I think packages are hard to create and maintain; there is a really significant overhead. After the first 5 packages you write, it does get a bit easier provided you do it on at least a weekly basis so you don't forget how, but really...
In fact, 'mvbutils' also has a bunch of tools for facilitating the creation and (especially) maintenance of packages, designed to interface smoothly with the task-hierarchy system. I use & edit my own packages all the time (including editing mvbutils itself); but if it wasn't for the tools in 'mvbutils', I'd be grinding my teeth in frustration most days of the week.