How to put an R script into a package

I'm writing my first R package and have made a successful build, with documentation using roxygen2, and added data sets.
However, I would also like to ship an example script showing how I use the functions in the package, but I don't know where to put it.
Let's say I have created MyPackage. I have put my function scripts in the R/ folder. Let's say I have:
foo1.R
foo2.R
foo3.R
Somewhere I'd also like to put a script with my workflow. Let's say I have a file, MyWorkflow.R:
library(MyPackage)
load(file = 'inData.R')  # loads the input variables A, B and C
X <- foo1(A)
Y <- foo2(X, B)
Z <- foo3(Y, C)
Can I do this? If so, where do I put it? Is it an OK procedure - or generally frowned upon?
Any help or thoughts are appreciated.
Thanks.
Carl
Edit:
I looked at the link about demo/ and exec/, but didn't understand the exec/ folder. I'd be grateful if you could clarify/exemplify/point to good uses of...
If I understand correctly, I'm not looking for an example or demo/, since the script won't necessarily be executable without tweaking by the user (e.g. to provide input data or paths). I "just" want to add an example script showing how I work with these functions.
I realise I should probably dive into the world of vignettes, but have difficulty in finding the time/oomph/energy to do so.
I also saw that there's the inst/ folder. Could you shed some light on the different uses of these options or hint at good examples of where they've been used (I often find examples more informative than reading an explanatory text that's above my level - I often get the feeling of being like a dog looking at a ceiling fan ;)
Will add info to the GitHub README. Thx for the good suggestion!

Created inst/Workflow_Example/workflow.R. Upon build & reload, a Workflow_Example folder was created in the installed package library, with the workflow.R script in it.
In combination with an explanatory remark in the README, this looks like what I was after. Problem solved, or am I missing something obvious? Am I e.g. violating conventions or good practice?
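For reference, anything placed under inst/ is copied to the top level of the installed package, so users can locate the shipped script without knowing the library path. A minimal sketch, using the folder and package names from above:

# returns the full path to the installed script
system.file("Workflow_Example", "workflow.R", package = "MyPackage")
# e.g. open it for reading/tweaking:
file.edit(system.file("Workflow_Example", "workflow.R", package = "MyPackage"))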

You could put it in either demo/ or exec/, depending on the format of the script. See here for more details. I would mention the workflow and where it lives in the README regardless, and if you host your code on GitHub, you could create a wiki to describe the workflow and place the script there. This would be similar to what nrussell mentioned in a comment above.
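If you do go the demo/ route, note that demo scripts need an entry in a demo/00Index file and are expected to run unattended; users then invoke them by name. A minimal sketch, reusing the workflow name from the question:

# demo/00Index contains the line:
#   workflow    Example analysis workflow
# demo/workflow.R is the script itself; users run it with:
demo("workflow", package = "MyPackage")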

Related

Are there any good resources/best-practices to "industrialize" code in R for a data science project?

I need to "industrialize" the R code for a data science project, because the project will be rerun several times in the future with fresh data. The new code should be really easy to follow, even for people who have not worked on the project before, and they should be able to redo the whole workflow quite quickly. I am therefore looking for tips, suggestions, resources and best practices on how to achieve this objective.
Thank you for your help in advance!
You can make an R package out of your project, because a package has everything you need for a standalone project that you want to share with others:
Easy to share, download and install
R has a very efficient documentation system for your functions and objects when you work within RStudio. Combined with roxygen2, it enables you to document every function precisely, and makes the code clearer since you can avoid inline comments (but please add them anyway where needed); see the roxygen2 sketch after this list.
You can specify quite easily which dependencies your package needs, so that everyone knows what to install for your project to work. You can also use packrat if you want to mimic Python's virtualenv.
R also provides a long-format documentation system, called vignettes, which are similar to a printed notebook: you can display code, text, code results, etc. This is where you write guidelines and methods on how to use the functions, provide detailed instructions for a certain method, and so on. Once the package is installed, vignettes are automatically included and available to all users.
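A minimal sketch of what roxygen2 documentation looks like (the function and its behaviour are hypothetical, just to show the format):

#' Clean raw survey data
#'
#' Replaces the sentinel code -9 with NA.
#' @param df A data.frame of raw responses.
#' @return The cleaned data.frame.
#' @export
clean_survey <- function(df) {
  df[df == -9] <- NA  # sentinel code for "missing"
  df
}

Running devtools::document() then turns these comments into the package's help pages and NAMESPACE entries.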
The only downside is the following: since R is a functional programming language, a package consists mainly of functions, plus some other relevant objects (data, for instance), but not really scripts.
More details on that last point: if your project consists of a script that calls a set of functions to do something, the script cannot directly appear within the package. Two options here: a) you write a dispatcher function that runs a set of functions to do the job, so that users just have to call one function to run the whole method (not great for maintenance); b) you make the whole script appear in a vignette (see above). With the second method, people just have to write a single R file (which can be copy-pasted from the vignette), which may look like this:
library(mydatascienceproject)  # your project package
library(...)                   # any other dependencies
...
dothis()        # the packaged functions, called in order
dothat()
finishwork()
That enables you to execute the whole workflow from a terminal or a remote machine with Rscript, using the following (with argparse to add arguments):
Rscript myautomatedtask.R --arg1 anargument --arg2 anotherargument
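A minimal sketch of the argument-parsing side of myautomatedtask.R, assuming the R argparse package (the argument names mirror the command above; dothis() etc. are the hypothetical functions from the earlier snippet):

library(argparse)
library(mydatascienceproject)

parser <- ArgumentParser(description = "Run the whole analysis")
parser$add_argument("--arg1", help = "first input")
parser$add_argument("--arg2", help = "second input")
args <- parser$parse_args()

dothis()
dothat()
finishwork()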
And finally, if you write a bash file calling Rscript, you can automate everything!
Feel free to read Hadley Wickham's book about R packages; it is super clear, full of best practices, and a great help in writing your packages.
One can get lost in the multiple files in the project's folder, so it should be structured properly: link
Naming conventions that I use: first, second.
Set the random seed, so that the outputs are reproducible.
Documentation is important: you can use the roxygen skeleton in RStudio (by default Ctrl+Alt+Shift+R).
I usually separate the code into smaller, logically cohesive scripts, and use a main.R script that sources the others (see the sketch after this list).
If you use a special set of libraries, you can consider using packrat. Once you set it up, you can manage the installed project-specific libraries.
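A minimal sketch of that main.R pattern (all file and function names here are hypothetical):

# main.R -- entry point that wires the smaller scripts together
set.seed(42)                # fixed seed so outputs are reproducible

source("R/load_data.R")     # defines load_data()
source("R/fit_model.R")     # defines fit_model()

raw <- load_data("data/input.csv")
fit <- fit_model(raw)
saveRDS(fit, "output/fit.rds")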

Atom editor: list and jump to definition(s) in project

As already mentioned, I'm using the Atom text editor.
I'm currently working on a project written in C++. Of course it is desirable to jump to the definition of a function (in another project file), or to other uses of this function (within the project). As far as I know, this can be achieved with the packages I'll mention below. I want the package to show me the definition along with the path to the corresponding file that holds the definition, and ideally the line where it occurs.
I'll welcome any comments and suggestions on how to solve the problems I mention below with (one of) the packages. I'm also thankful for pointers to possible solutions or posts concerning my problem(s), or to ways of achieving this with another package.
Here is what I found / tried / did so far.
goto
Currently I'm using this package. Although it is rather slow and does not show the arguments of the function as, e.g., atom-ctags does, it's the only package that displays the files I need to see.
It shows me where the function is defined as well as where it is used. However, it does not show me the path to the corresponding file it refers to.
atom-ctags
I also tried this package; building the tags is quite fast, and moreover it shows me the path to the file. But this package only lists the .cc files and not the .h files. It appears as if it only shows me the other uses but not the definition, which is obviously a problem.
I also tried generating the ctags myself and changing the command options in the package settings, unfortunately without any success.
Atom's built-in symbols-view
In order to get this to work, one needs to generate the symbols. This can be achieved, for example, with the symbol-gen package. It shows me some of the definitions, but again no .h files. Moreover, jumping to a definition results in a "Selected file does not exist." error, so it is not usable at all.
goto-definition
Just for completeness, there is also this package. It does not work for me, since C++ is not supported, but maybe others will find it useful.
symbols-plus
Again for completeness: this is meant as a replacement for the Atom built-in, but when I disable the built-in it does not give me any jump functionality, nor is a shortcut mentioned.
So, basically, nothing really works well. I have also tried Symbol Tree View, but it barely works.

Sourcing So many R scripts

I have lots of .r scripts that I want to source all at once, and I have written a function like the one below to do so.
sourcer <- function() {
  source("wil.r")
  source("k.r")
  source("l.r")
}
Can anyone tell me how to get this code activated, and how to call each function any time I want to use it?
In addition to the answer by @user2885462: if the amount of R code you need to source becomes bigger, you might want to wrap the code into an R package. This provides a convenient way of loading the code, and allows you to add tests, documentation, etc. Reading the official package writing tutorial is a good place to start for that.
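A minimal sketch of that route with usethis and devtools, reusing the file names from the question (the package name mytools is made up):

usethis::create_package("mytools")                  # create the package skeleton
file.copy(c("wil.r", "k.r", "l.r"), "mytools/R/")   # move your function files into R/

# then, from within the mytools directory:
devtools::install()   # install the package locally
library(mytools)      # all functions are now available in any session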
For an individual project, I like to have all (or most) of my R functions in separate .r files, all in the same folder, e.g. AllFunctions.
Then at the beginning of my main code I run the following line, which sources every .r file (and files with other common source extensions, if they exist, which they usually don't) in the AllFunctions folder:
for (nm in list.files("AllFunctions", pattern = "\\.[RrSsQq]$")) source(file.path("AllFunctions", nm))
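Note the escaped dot (\\.): without it, the dot matches any character, so file names that merely end in one of those letters would also be sourced. An equivalent one-liner using full.names = TRUE, so file.path() isn't needed:

invisible(lapply(list.files("AllFunctions", pattern = "\\.[RrSsQq]$", full.names = TRUE), source))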

Using glmm_funs.R?

I am fitting a GLMM and I have seen some examples that use the function overdisp_fun, defined in glmm_funs.R, but I don't know which package contains it or how I can call it from R. Can somebody help me?
Thanks,
If you google for glmm_funs.R, you'll find links to the script (e.g. here: http://glmm.wdfiles.com/local--files/trondheim/glmm_funs.R).
You can save the file on your local machine, then call it in your R session with source("path to file/glmm_funs.R").
You will then be able to use the functions contained in the script, including overdisp_fun().
You can think of it a little bit like loading a package, except that the functions are simply defined in a script.
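Since source() also accepts a URL, you can even skip the download and read the script straight from the link above:

source("http://glmm.wdfiles.com/local--files/trondheim/glmm_funs.R")
exists("overdisp_fun")  # should now be TRUE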

Using org-mode to structure an analysis

I am trying to make better use of org-mode for my projects. I think literate programming is especially applicable to the realm of data analysis and org-mode lets us do some pretty awesome literate programming.
I think most of you will agree with me that the workflow for writing an analysis is different than most other types of programming. I don't just write a program, I explore the data. And, while many of these explorations are dead-ends, I don't want to delete/ignore them completely. I just don't want to re-run them every time I execute the org file. I also tend to find or develop chunks of useful code that I would like to put into an analytic template, but some of these chunks won't be relevant for every project and I'd like to know how to make org-mode ignore these chunks when I am executing the entire buffer. Here's a simplified example.
* Import
- I want org-mode to ignore import-sql.
#+srcname: import-data
#+begin_src R :exports none :noweb yes
<<import-csv>>
#+end_src
#+srcname: import-csv
#+begin_src R :exports none
data <- read.csv("foo-clean.csv")
#+end_src
#+srcname: import-sql
#+begin_src R :exports none
library(RSQLite)
blah blah blah
#+end_src
* Clean
- This is run on foo.csv, producing foo-clean.csv
- Fixes the mess of -9 and -13 to NA for my sanity.
- This only needs to be run once, and after that, reference.
- How can I tell org-mode to skip this?
#+srcname: clean-csv
#+begin_src sh :exports none
sed .....
#+end_src
* Explore
** Explore by a factor (1)
- Dead end. Did not pan out. Ignore.
- Produces a couple of charts showing there is not interaction.
#+srcname: explore-by-a-factor-1
#+begin_src R :exports none :noweb yes
#+end_src
** Explore by a factor (2)
- A useful exploration that I will reference later in a report.
- Produces a couple of charts showing the interaction of my variables.
#+srcname: explore-by-a-factor-2
#+begin_src R :exports none :noweb yes
#+end_src
I would like to be able to use org-babel-execute-buffer and have org-mode somehow know to skip over the code blocks import-sql, clean-csv and explore-by-a-factor-1. I want them in the org file, because they are relevant to the project. After all, tomorrow someone might want to know why I was so sure explore-by-a-factor-1 was not useful. I want to keep that code around, so I can bang out the plot or the analysis or whatever and move on, but not have it run every time I rerun everything, because there's no reason to run it. Ditto with the clean-csv stuff: I want it around to document what I did to the data (and why), but I don't want to re-run it every time; I'll just import foo-clean.csv.
I Googled all over this and read a bunch of org-mode mailing list archives, and I was able to find a couple of ideas, but not what I want. EXPORT_SELECT_TAGS and EXPORT_EXCLUDE_TAGS are great when exporting the file, and the :tangle header works well when creating the actual source files. I don't want to do either of these; I just want to execute the buffer. I would like to be able to mark code blocks to be executed or ignored in a similar fashion. I guess I would like to find a way to have an org variable such as:
EXECUTE_SELECT_TAGS
This way I could simply tag my various code blocks and be done with it. It would be even nicer if I could then run the file using only the source blocks with specific tags. I can't find a way to do this, and I thought I would ask before asking/begging for a new feature in org-mode.
I figured it out. From the Org manual (since updated):
The :eval header argument can be used to limit the evaluation of specific code blocks. :eval accepts two arguments, “never” and “query”. :eval never will ensure that a code block is never evaluated; this can be useful for protecting against the evaluation of dangerous code blocks. :eval query will require a query for every execution of a code block, regardless of the value of the org-confirm-babel-evaluate variable.
So you just have to add :eval never to the header of the blocks that you don't want to execute, and voilà!
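Applied to the example above, the blocks to skip would simply carry that header; reusing import-sql from the question:

#+srcname: import-sql
#+begin_src R :exports none :eval never
library(RSQLite)
blah blah blah
#+end_src

With :eval never, org-babel-execute-buffer skips the block while it remains in the file for documentation (and can still be tangled or exported).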
While I never did get an answer to my question, the discussion was interesting, and apparently an org-mode based template for R strikes a few people as an interesting idea. I downloaded the source code to org-mode and looked at org-babel-execute-buffer. It is, as I feared, a naive function which does precisely what it says it does and nothing more. It is not (currently) possible to pass it any additional parameters to affect its behavior. (Unless I am badly misreading the Lisp, which is entirely possible.)
Eventually, I decided org-babel-execute-buffer is not necessary for a useful R template system. Babel's noweb functionality is really flexible, and I think it is possible to build a workable solution using noweb rather than trying to develop a complex tagging schema to define how/when to run things.
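A sketch of that noweb approach, reusing the block names from the question: define one master block that expands only the chunks you want to run, and execute that block instead of the whole buffer.

#+srcname: run-selected
#+begin_src R :exports none :noweb yes
<<import-csv>>
<<explore-by-a-factor-2>>
#+end_src

import-sql, clean-csv and explore-by-a-factor-1 stay in the file for documentation but are never referenced, so they are never run.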
For tangling/export it should still be possible to use tags to create usable/sane output.
For anyone who is interested: LiterateR
It's probably a little rude to use this thread to put this out there, but this is why I asked the question in the first place. TemplateR is my attempt to make R a little easier to use. Right now it is just a template with two simplistic functions; I consider it a proof of concept at this point. Eventually, I want to develop something that does more to help people develop R projects quickly. TemplateR will accomplish this by:
1. Providing a strong structure to develop around.
2. Providing built-in functions to support common tasks, especially in the realm of reproducible research.
3. Providing snippets of tested code that can be rapidly re-purposed for the current project.
Right now, all it provides is a basic structure/framework and two simple functions:
1. One identifies which R packages are missing (based on what is manually entered into a table), and
2. One creates the project directories (plots, data, reports).
More will come in future versions. The README.org and TODO.org go into further detail.
