How to keep modified/downloaded package - r

R noobie here.
I'm am trying to use a package that I download off of github using source_gist, but it appears that I need to re-download it every time I quit R (I'm using RStudio).
To clarify, the function that I'm using is part of plotrix package and is called barp. Someone made a modified version of it (called barp2) and put it up on github. That's what I want to use.
So my question is this: is there anyway to have this modified code saved inside the plotrix package, so I wouldn't have to download it every time?
I hope I'm explaining this correctly.

So, let's get some quick terminology straight: the function you're getting off of github isn't a package, it's just a single function. If it was a package, you could use devtools::install_github once and then load it with require() or library() like any other package.
A good solution isn't too different. Just go to the gist, copy the code, paste it into your R editor, and save it somewhere as a .R script file. Something like C:/path/to/barp2.R (adjusting, of course, based on where you actually want to keep it and based on your OS). Then you can read it locally using source("C:/path/to/barp2.R") instead of devtools::source_gist().
If you always want to load it, you could load plotrix and then source this file every time R starts with a couple lines in your R profile, see ?Startup as #BondedDust suggests for details on this.
Reading it off of github every time does have the advantage that, if the author fixes bugs or otherwise improves it, you'll always be using the up-to-date version. It has several disadvantages too: requiring an internet connection, losing access if the gist is deleted, or being unable to access old versions if the author changes it in a way you don't like. Keeping a copy of a version you like is a smart move.

Related

Is there an R function to make a copy of all the source code used to generate an analysis?

I have a file run_experiment.rmd which performs an analysis on data using a bunch of .r scripts in another folder.
Every analysis is saved into its own timestamped folder. I save the outputs of the analysis, the inputs used, and if possible I would also like to save the code used to generate the analysis (including the contents of both the .rmd file and the .r files).
The reason for this is because if I make changes to the way my analyses are run, then if I re-run the analysis using the new updated file, I will get different results. If possible, I would like to keep legacy versions of the code so that I can always, if need be, re-run the original analysis.
Have you considered using a git repository to commit your code and output each time you update/run it? I think this is the optimal solution for what you are describing. Each commit would have a timestamp associated with it for you to rollback to a previous version when needed.
The best way to do this is to put all of those scripts into an R package, and in your Rmd file, print sessionInfo() to record the package version used.
You should change the version number of the package each time you make a non-trivial change to it (or even better, with every change).
Then when you want to reproduce the analysis, you use the sessionInfo() listing to work out which version of R and of the packages to install, and you'll get the same environment.
There are packages to help with this (pak and renv, maybe others), but I haven't used them, so I can't give details or recommendations.

add and Rcpp file to an existing r Package?

I have already made a simple R package (pure R) to solve a problem with brute force then I tried to faster the code by writing the Rcpp script. I wrote a script to compare the running time with the "bench" library. now, how can I add this script to my package? I tried to add
#'#importFrom Rcpp cppFunction
on top of my R script and inserting the Rcpp file in the scr folder but didn't work. Is there a way to add it to my r package without creating the package from scratch? sorry if it has already been asked but I am new to all this and completely lost.
That conversion is actually (still) surprisingly difficult (in the sense of requiring more than just one file). It is easy to overlook details. Let me walk you through why.
Let us assume for a second that you started a working package using the R package package.skeleton(). That is the simplest and most general case. The package will work (yet have warning, see my pkgKitten package for a wrapper than cleans up, and a dozen other package helping functions and packages on CRAN). Note in particular that I have said nothing about roxygen2 which at this point is a just an added complication so let's focus on just .Rd files.
You can now contrast your simplest package with one built by and for Rcpp, namely by using Rcpp.package.skeleton(). You will see at least these differences in
DESCRIPTION for LinkingTo: and Imports
NAMESPACE for importFrom as well as the useDynLib line
a new src directory and a possible need for src/Makevars
All of which make it easier to (basically) start a new package via Rcpp.package.skeleton() and copy your existing package code into that package. We simply do not have a conversion helper. I still do the "manual conversion" you tried every now and then, and even I need a try or two and I have seen all the error messages a few times over...
So even if you don't want to "copy everything over" I think the simplest way is to
create two packages with and without Rcpp
do a recursive diff
ensure the difference is applied in your original package.
PS And remember that when you use roxygen2 and have documentation in the src/ directory to always first run Rcpp::compileAttributes() before running roxygen2::roxygenize(). RStudio and other helpers do that for you but it is still easy to forget...

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" an R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow even for people who have not worked on the project before and they should be able to redo the whole workflow quite quickly. Therefore I am looking for tips, suggestions, resources and best-practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because it has everything you need for a standalone project that you want to share with others :
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within R Studio. Combined with roxygen2, it enables you to document precisely every function, and makes the code clearer since you can avoid commenting with inline comments (but please do so anyway if needed)
You can specify quite easily which dependancies your package will need, so that every one knows what to install for your project to work. You can also use packrat if you want to mimic python's virtualenv
R also provide a long format documentation system, which are called vignettes and are similar to a printed notebook : you can display code, text, code results, etc. This is were you will write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, etc. Once the package is installed they are automatically included and available for all users.
The only downside is the following : since R is a functional programming language, a package consists of mainly functions, and some other relevant objects (data, for instance), but not really scripts.
More details about the last point if your project consists in a script that calls a set of functions to do something, it cannot directly appear within the package. Two options here : a) you make a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not really good for maintenance) ; b) you make the whole script appear in a vignette (see above). With this method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this :
library(mydatascienceproject)
library(...)
...
dothis()
dothat()
finishwork()
That enables you to execute the whole work from a terminal or a distant machine with Rscript, with the following (using argparse to add arguments)
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
And finally if you write a bash file calling Rscript, you can automate everything !
Feel free to read Hadley Wickham's book about R packages, it is super clear, full of best practices and of great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set up the random seed, so the outputs should be reproducible.
Documentation is important: you can use the Roxygen skeleton in rstudio (default ctrl+alt+shift+r).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script, that uses the others.
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.

How to keep various-topic R script at hand?

I have written and collected R code on various topics that solve particular problems at hand. I stored the R script/code in .txt files. I have now 100s of them.
How do you keep your R code at hand efficiently?
#Manetheran has the right idea: write a package. It's easy to do (especially with RStudio). Read "Writing R Extensions" and then on top of that learn about roxygen2 (which allows you to document each function in-line and avoid writing .Rd files).
Then you can use devtools to load your package locally, or once it's stable if you think other people can use the functions you can submit your package to CRAN.
I prefer to keep it simple. I use Total Commander and when I need an example which uses some R function, I just do Alt-F7 and search for *.R files which contain the desired string.
I use RStudio and have created two or three basic scripts. I save my much-used functions in the basic script that is most appropriate. Then, at the start of an RStudio script for a project, I source one or more of the basic scripts as is appropriate.

R2PPT crashes R; are there alternatives to R2PPT?

I am attempting to automate the insertion of JPEG images into Powerpoint. I have a macro done for that already, except using R would be infinitely better for my purposes.
The package R2PPT should do this, I understand. However, I cannot use it. For example, when I try to use PPT.Open, I understand I can do it two different ways by calling method = "rcom" or method = "RDCOMClient". Using the latter, R will always crash, sending an error report to windows. Using the former, it tells me I need to install statconnDCOM , before giving the error:
Error in PPT.Open(x) : attempt to apply non-function.
I cannot install statconnDCOM freely, as I wouldn't call this work non-commercial. So if there isn't a way to get around this issue, are there at least some free alternatives to R2PPT so that I can save several hours of manual work with a simple R code? If there is a way for me to use R2PPT, that would be ideal.
Thanks!
Edit:
I'm using R version 2.15 and downloaded the most recent version of R2PPT. Powerpoint is 2007.
Do you have administrative privileges on this machine?
There is an issue with package RDCOMClient. It needs permissions to write file rdcom.err in the root of drive C:. If you don't have privileges to write to c:, there is a rather cumbersome workaround:
Close R
Create "c:\temp" folder if it doesn't exist.
Locate on your hard drive file rdcomclient.dll. It usually placed in \R\library\RDCOMClient\libs\i386\ and in \R\library\RDCOMClient\libs\x64\ (you need to patch file which corresponds your Windows version - 32 bit or 64 bit). It's recommended to make backup copy of this files before patching.
Open rdcomclient.dll in text editor (Notepad++, for example -http://notepad-plus-plus.org/)
Find in file string c:\rdcom.err - it occurs only once.
Go into overwrite mode (usually by pressing "Ins" key). It is very important that new path will have the same number of characters as original one. Type C:\temp\e.rr instead of c:\rdcom.err
Save the file.
Now all should work fine.
Arguably not an answer, but have you looked at using Sweave/knitr to render your presentations in LaTeX using something like Beamer? (As discussed on slide 17 here.)
Wouldn't help any with getting JPGs into a PowerPoint, but would certainly make putting R-output (numerical or graphical) into a presentation much easier!
Edit: if you want to use knitr (which I recommend), here's another reference.

Resources