Is it possible to get RStudio to show function arguments and descriptions for custom functions? - r

The code completion in RStudio is great, and I really like how a popover appears to describe the arguments for the function inputs. For example, if one types matrix( and then presses Tab, a list of arguments for the matrix() function appears, along with a description of each input. Say nrow= is selected; the adjoining window then describes the nrow input as "the desired number of rows".
Can I get RStudio to do this for my custom functions? Would I have to create a package to achieve this effect?
Say I have a file full of custom functions, myCustomFunctions.R, where I store all my miscellaneous helper functions. I want to be able to add metadata for my functions so that this helper window also describes my function inputs.

To add to Hadley's answer in the comments: RStudio is mining specific portions of the help files to generate the helper window. Specifically, tabbing before the parentheses brings up the "Usage" and "Description" sections, and tabbing inside the parentheses or after a comma brings up the "Arguments" section. Therefore, not only does a package need to be made, but the help files must be generated to take advantage of this feature.
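In practice, that means writing roxygen2 comments for your functions and building/installing the package so that the .Rd help files exist. A minimal sketch (the function and its documentation text are placeholders, not from the question):

#' Add two numeric vectors
#'
#' @param x A numeric vector.
#' @param y A numeric vector to add to x.
#' @return The elementwise sum of x and y.
#' @export
add_vectors <- function(x, y) {
    x + y
}

Running devtools::document() generates man/add_vectors.Rd from these comments, and once the package is installed and loaded, tabbing inside add_vectors( shows the @param descriptions just like it does for matrix().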

Following up Hadley: even if the functions are only for your own use, it's worth package-izing them. You will then get a lot of useful stuff for free, above and beyond the package documentation system: things like version control, unit testing, portability, shareability ... I could go on. There's a small potential barrier which you have to get over before you can get back to the fun part (i.e. hacking your own neat stuff), but it's worth the investment of time.
Hadley has public-spiritedly put his R Packages book online, with step-by-step descriptions of how to access all the goodies I've mentioned. Hopefully you'll decide it's worth paying for (I did).

Related

Automated update of data documentation when this data is updated

I am creating a package and including data of experiments. The experiments are ongoing and therefore the data is going to change in the future.
I am using roxygen2 for documentation and would like to include a description like the following:
#' Data_foo contains the experimental data of [x] participants
In the future, 'x' is going to change. Of course, I can always manually update this - not a very big problem - or I could simply decide not to document like this.
But I simply wondered whether this is generally possible, and if so, how (e.g., is there a markdown-like inline-code way?).
Yes, the Rd format (which Roxygen produces) supports that. Use syntax like \Sexpr{nrow(foo_package::Data_foo)} where you want [x] to appear.
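For example, the data documentation block could look like this (a small sketch assuming, as in the answer, a package called foo_package with a dataset Data_foo):

#' Experimental data
#'
#' Data_foo contains the experimental data of
#' \Sexpr{nrow(foo_package::Data_foo)} participants.
#'
#' @format A data frame with one row per participant.
"Data_foo"

The expression is evaluated when the help page is built or rendered rather than when you write it, so the participant count stays in sync with the data.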

Persisting link to a help page after using function wrapper

I started programming in R not so long ago, and I've faced some discomfort with the names of third-party functions. Yes, they have documentation, but sometimes a function's name is abbreviated to the point of unreadability.
I'd like to write a wrapper for every function whose name is prone to be completely unreadable. And I still want to be able to call up its help (i.e. by pressing F1 in RStudio).
Is there any way to do it?
I feel your pain but I think that writing wrappers is generally not a good idea because for anybody but you [1] this will make the code less readable, even if they’re not familiar with the original library. For one thing, your functions will be completely unsearchable (on Google etc.).
In addition, as commented, R doesn’t allow non-packaged functions to carry documentation. You’d need to generate your own wrapper package. Then, inside the package, you can define your aliases as follows:
#' @importFrom original_package orig_fct1
readable_function1 = orig_fct1
#' @importFrom original_package orig_fct2
readable_function2 = orig_fct2
# … etc
That is, you don’t need to (and should not) generate wrapper functions; it is sufficient and better to just define aliases. The @importFrom roxygen directive causes the documentation to be inherited from the original function.
[1] And “anybody but you” includes yourself in a few months’ time!

Model namespaces in code completion with R - or how to organize R code

This is more of a general code structuring question.
At the moment I try to write my code into "namespaces". So for example, I would have:
Mine.FancyPlot.Plot(...)
Mine.FancyPlot.Impl.PlotCanvas(...)
Mine.FancyPlot.Impl.PlotLegend(...)
Mine.BasicPlot.Plot(...)
Mine.BasicPlot.Impl.PlotCanvas(...)
Mine.BasicPlot.Impl.PlotLegend(...)
Mine.BasicPlot.Impl.PlotLines(...)
The idea is that I am trying to hide away "private" functions in an "Impl" (for implementation) namespace. So outside of Mine_FancyPlot.R I wouldn't call Mine.FancyPlot.Impl functions.
This approach works reasonably well, except code completion isn't as nice as it could be.
To begin with, when I type Mine.BasicPlot. and hit TAB, I get all functions, including the Impl functions, and because "I" comes before "P" they are even listed ahead of the "public" user functions.
So I started changing the structure to
MyPub.FancyPlot.Plot(...)
MyPriv.FancyPlot.PlotCanvas(...)
MyPriv.FancyPlot.PlotLegend(...)
MyPub.BasicPlot.Plot(...)
MyPriv.BasicPlot.PlotCanvas(...)
MyPriv.BasicPlot.PlotLegend(...)
MyPriv.BasicPlot.PlotLines(...)
This works better in that "private" functions are no longer suggested. However, I still have the issue that if I type MyPub. and hit TAB, I can't actually see all the different "namespaces" (as I would in Java, C++, ...), but rather a long list of functions, all starting with the first "namespace".
Ideally, I'd like code completion in R to cut off all predictions at the next dot and de-duplicate them, so that when I type MyPub. and hit TAB, I would only get a list of "sub-namespaces" and functions directly under MyPub.
Is this possible? Can the code prediction be altered to reflect this behaviour? Or is there a better way to achieve what I am aiming for?
You should consider putting your functions in a package to organise them. Functions that are not exported will only be accessible as package:::functionNotExported and will not be listed when you just type functionNotExpo[tab].
See for instance debugging a function in R that was not exported by a package
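A minimal sketch of that setup (the package and function names here are invented): with roxygen2, only functions tagged @export appear in completion after loading the package, while untagged helpers stay internal.

# R/fancy_plot.R in a hypothetical package "mine"

#' Draw a fancy plot (public API)
#' @export
fancy_plot <- function(x, y) {
    plot_canvas(x, y)
    plot_legend()
}

# No @export tag: internal helpers, reachable only as mine:::plot_canvas() etc.
plot_canvas <- function(x, y) plot(x, y, type = "n")
plot_legend <- function() legend("topright", legend = "data")

After library(mine), typing fancy_[tab] offers fancy_plot(), while the Impl-style helpers no longer clutter the completion list.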

Strategies for repeating large chunk of analysis

I find myself in the position of having completed a large chunk of analysis and now need to repeat the analysis with slightly different input assumptions.
The analysis, in this case, involves cluster analysis, plotting several graphs, and exporting cluster ids and other variables of interest. The key point is that it is an extensive analysis, and needs to be repeated and compared only twice.
I considered:
Creating a function. This isn't ideal, because then I have to modify my code to know whether I am evaluating in the function or parent environments. This additional effort seems excessive, makes it harder to debug and may introduce side-effects.
Wrap it in a for-loop. Again, not ideal, because then I have to create indexing variables, which can also introduce side-effects.
Creating some preamble code, wrapping the analysis in a separate file, and sourcing it. This works, but seems very ugly and sub-optimal.
The objective of the analysis is to finish with a set of objects (in a list, or in separate output files) that I can analyse further for differences.
What is a good strategy for dealing with this type of problem?
Making code reusable takes some time and effort, and brings a few extra challenges, as you mention yourself.
Whether to invest is probably the key question in informatics (if not in a lot of other fields): do I write a script to rename 50 files in a similar fashion, or do I go ahead and rename them manually?
The answer, I believe, is highly personal, and even then it differs case by case. If programming comes easily to you, you may be quicker to go the reuse route, as the effort will be relatively low (and even then, programmers typically like to learn new tricks, so that can be a hidden, often counterproductive motivation).
That said, in your particular case I'd go with the sourcing option: since you plan to reuse the code only two more times, a greater effort would probably go to waste (you indicate the analysis is rather extensive). So what if it's not an elegant solution? Nobody is ever going to see you do it, and everybody will be happy with the swift results. A sketch of that pattern follows.
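A minimal sketch of the sourcing pattern (the file and variable names here are invented for illustration): set the inputs that differ between runs, source the analysis script, and stash whatever it produces.

# run_setups.R -- hypothetical driver around an existing analysis.R
input_file <- "experiment_A.csv"   # inputs assumed to differ between runs
n_clusters <- 4
source("analysis.R")               # analysis.R reads input_file / n_clusters from the global env
results_A <- results               # and is assumed to leave its output in 'results'

input_file <- "experiment_B.csv"
n_clusters <- 5
source("analysis.R")
results_B <- results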
If it turns out in a year or so that the reuse is higher than expected, you can then still invest. And by that time, you will also have (at least) three cases for which you can compare the results from the rewritten and funky reusable version of your code with your current results.
If/when I do know up front that I'm going to reuse code, I try to keep that in mind while developing it. Either way I hardly ever write code that is not in a function (well, barring the two-liners for SO and other out-of-the-box analyses): I find this makes it easier for me to structure my thoughts.
If at all possible, set parameters that differ between sets/runs/experiments in an external parameter file. Then, you can source the code, call a function, even utilize a package, but the operations are determined by a small set of externally defined parameters.
For instance, JSON works very well for this, and the RJSONIO and rjson packages let you load such a file into a list. Suppose you keep the parameters in a file called parametersNN.json; an example is as follows:
{
    "Version": "20110701a",
    "Initialization": {
        "indices": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "step_size": 0.05
    },
    "Stopping": {
        "tolerance": 0.01,
        "iterations": 100
    }
}
Save that as "parameters01.json" and load as:
library(RJSONIO)
Params <- fromJSON("parameters01.json")
and you're off and running. (NB: I like to use unique version #s within my parameters files, just so that I can identify the set later, if I'm looking at the "parameters" list within R.) Just call your script and point to the parameters file, e.g.:
Rscript --vanilla MyScript.R parameters01.json
then, within the program, identify the parameters file from the commandArgs() function.
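Something like this at the top of MyScript.R would pick the file up (a sketch; the accessors assume the JSON layout shown above):

# Read the parameters file named on the command line
library(RJSONIO)
args   <- commandArgs(trailingOnly = TRUE)   # e.g. "parameters01.json"
Params <- fromJSON(args[1])
Params$Initialization$step_size              # 0.05 for the example file above
Params$Stopping$iterations                   # 100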
Later, you can break out code into functions and packages, but this is probably the easiest way to make a vanilla script generalizable in the short term, and it's good practice for the long term, as code should be separated from the specification of run-, dataset-, or experiment-dependent parameters.
Edit: to be more precise, I would even specify input and output directories or files (or naming patterns/prefixes) in the JSON. This makes it very clear how one set of parameters led to one particular output set. Everything in between is just code that runs with a given parametrization, but the code shouldn't really change much, should it?
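For example, the parameters file above could grow a block like this (the field names here are just an illustration, not a fixed schema):

{
    "Version": "20110701b",
    "Paths": {
        "input_dir": "data/run01",
        "output_dir": "results/run01",
        "output_prefix": "run01_"
    }
}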
Update:
Three months, and many thousands of runs, wiser than my previous answer, I'd say that the external storage of parameters in JSON is useful for 1-1000 different runs. When the parameters or configurations number in the thousands and up, it's better to switch to using a database for configuration management. Each configuration may originate in a JSON (or XML), but being able to grapple with different parameter layouts requires a larger scale solution, for which a database like SQLite (via RSQLite) is a fine solution.
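As a rough sketch of that approach, each configuration can become a row in an SQLite table via DBI/RSQLite (the table and column names below are invented for illustration):

# Keep one row per run configuration in an SQLite database
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "configs.sqlite")
cfg <- data.frame(run_id = "20110701a", step_size = 0.05, iterations = 100L)
if (!dbExistsTable(con, "runs")) {
    dbWriteTable(con, "runs", cfg)      # first configuration creates the table
} else {
    dbAppendTable(con, "runs", cfg)     # later configurations append a row
}
params <- dbGetQuery(con, "SELECT * FROM runs WHERE run_id = '20110701a'")
dbDisconnect(con)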
I realize this answer is overkill for the original question - how to repeat work only a couple of times, with a few parameter changes, but when scaling up to hundreds or thousands of parameter changes in ongoing research, more extensive tools are necessary. :)
I like to work with a combination of a little shell script, a PDF cropping program, and Sweave in those cases. That gives you back nice reports and encourages you to source. Typically I work with several files, almost like creating a package (at least I think it feels like that :) ). I have a separate file for the data juggling and separate files for different types of analysis, such as descriptiveStats.R and regressions.R.
By the way, here's my little shell script:
#!/bin/sh
# Weave the report, crop every PDF figure in ./pdfs, then build and open the final PDF
R CMD Sweave docSweave.Rnw
for file in `ls pdfs`; do
    pdfcrop pdfs/"$file" pdfs/"$file"
done
pdflatex docSweave.tex
open docSweave.pdf
The Sweave file typically sources the R files mentioned above when needed. I am not sure whether that's what you're looking for, but that's my strategy so far. At least I believe that creating transparent, reproducible reports is what helps you follow at least a strategy.
Your third option is not so bad. I do this in many cases. You can build a bit more structure by putting the results of your preamble code into environments and attaching the one you want to use for further analysis.
An example:
setup1 <- local({
    x <- rnorm(50, mean = 2.0)
    y <- rnorm(50, mean = 1.0)
    # ... more setup-specific objects
    environment()   # return this environment so it can be attached later
})

setup2 <- local({
    x <- rnorm(50, mean = 1.8)
    y <- rnorm(50, mean = 1.5)
    # ...
    environment()
})
Then attach(setup1) and run/source your analysis code, e.g.
plot(x, y)
t.test(x, y, paired = TRUE, var.equal = TRUE)
...
When finished, detach(setup1) and attach the second one.
Now, at least you can easily switch between setups. Helped me a few times.
I tend to push such results into a global list.
I use Common Lisp but then R isn't so different.
Too late for you here, but I use Sweave a lot, and most probably I'd have used a Sweave file from the beginning (e.g. if I know that the final product needs to be some kind of report).
For repeating parts of the analysis a second and third time, there are then two options:
If the results are rather "independent" (i.e. I should produce three reports, and comparison means the reports are inspected side by side) and the changed input comes in the form of new data files, then each input goes into its own directory together with a copy of the Sweave file, and I create separate reports (similar to source, but it feels more natural for Sweave than for plain source).
If I instead need to do exactly the same thing once or twice again inside one Sweave file, I'd consider reusing code chunks (see the sketch below). This is similar to the ugly for-loop.
The reason is that then of course the results are together for the comparison, which would then be the last part of the report.
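Chunk reuse in Sweave looks roughly like this; the chunk names and the toy kmeans analysis are placeholders, not from the answer:

<<cluster-analysis, eval=FALSE>>=
fit <- kmeans(dat, centers = k)
plot(dat, col = fit$cluster)
@

<<setup-A>>=
dat <- read.csv("setupA.csv")
k <- 3
<<cluster-analysis>>
@

<<setup-B>>=
dat <- read.csv("setupB.csv")
k <- 4
<<cluster-analysis>>
@

The shared chunk is written once and re-run with each setup's data and parameters, so both sets of results land in the same report for comparison.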
If it is clear from the beginning that there will be some parameter sets and a comparison, I write the code in such a way that, as soon as I'm happy with each part of the analysis, it is wrapped into a function (i.e. I'm actually writing the function in the editor window, but I evaluate the lines directly in the workspace while writing the function).
Given that you are in the described situation, I agree with Nick: nothing wrong with source, and everything else means much more effort now that you already have it as a script.
I can't add a comment on Iterator's answer, so I have to post this here. I really like his answer, so I made a short script for creating the parameters and exporting them to external JSON files. I hope someone finds it useful: https://github.com/kiribatu/Kiribatu-R-Toolkit/blob/master/docs/parameter_configuration.md

Finding What You Need in R: focused searching within R and all (3,500+) CRAN Packages

Often in R, there are a dozen functions scattered across as many packages--all of which have the same purpose but of course differ in accuracy, performance, documentation, theoretical rigor, and so on.
How do you locate these--from within R and even from among the CRAN Packages which you have not installed?
So for instance: the generic plot function. Setting secondary ticks is much easier using a function outside of the base package:
minor.tick(nx=n, ny=n, tick.ratio=n)
Of course plot is in R core, but minor.tick is not; it's actually in Hmisc.
Of course, that doesn't show up in the documentation for plot, nor should you expect it to.
Another example: data-input arguments to plot can be supplied by an object returned from the function hexbin; again, this function is from a package outside of R core.
What would obviously be great is a programmatic way to gather these function arguments from the various packages and put them in a single namespace.
Edit (trying to restate my example above more clearly): the arguments to plot supplied in R core for setting the axis tick frequency are xaxp/yaxp; however, one can also set the axis tick frequency via a function outside of the base package, as in the minor.tick function from the Hmisc package--but you wouldn't know that just from looking at the plot method signature. Is there a meta-function in R for this?
So far, as I come across them, I've been gathering them manually, each set collected in a single TextMate snippet (along with the attendant library imports). This isn't that difficult or time-consuming, but I can only update my snippet as I find out about these additional arguments/parameters. Is there a canonical R way to do this, or at least an easier way?
Just in case that wasn't clear, I am not talking about the case where multiple packages provide functions directed at the same statistic or view (e.g., 'boxplot' in the base package; 'boxplot.matrix' in gplots; and 'bplots' in Rlab). What I am talking about is the case in which the function name is the same across two or more packages.
The "sos" package is an excellent resource. Its primary interface is the findFn function, which accepts a string (your search term), scans the "function" entries in Jonathan Baron's R site search database, and returns the entries that contain the search term in a data frame (of class "findFn").
The columns of this data frame are: Count, MaxScore, TotalScore, Package, Function, Date, Score, Description, and Link. Clicking on "Link" in any entry's row will immediately pull up the help page.
An example: suppose you wanted to find all convolution filters across all 1800+ R packages.
library(sos)
cf = findFn("convolve")
This query will look for the term "convolve" anywhere in the entry; in other words, the match doesn't have to be the function name.
Keying in "cf" returns an HTML table of all matches found (23 in this case). This table is an HTML rendering of the data frame I mentioned just above. What is particularly convenient is that each column ("Count", "MaxScore", etc.) is sortable by clicking on the column header, so you can view the results by "Score", by "Package Name", etc.
(As an aside: when running that exact query, one of the results was the function "panel.tskernel" in a package called "latticeExtra". I was not aware this package had any time series filters in it, and I doubt I would have discovered it otherwise.)
Your question is not easy to answer. There is not one definitive function.
formals is the function that gives the named arguments of a function and their defaults as a named list, but a function can always take variable arguments through the ... parameter and hidden named arguments via embedded hasArg calls. To get a list of those, you would have to use getAnywhere and then scan the function's expression for hasArg. I can't think of an automatic way of doing it, at least if the function's hidden arguments are not documented.
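A rough sketch of that kind of inspection (the function f below is made up; this only spots hidden arguments that literally call hasArg in the body):

# Declared arguments and their defaults
names(formals(matrix))                     # "data" "nrow" "ncol" "byrow" "dimnames"

# A function that also reacts to an undeclared argument passed through ...
f <- function(x, ...) {
    if (methods::hasArg(colour)) message("colour was supplied via ...")
    x
}

names(formals(f))                          # "x" "..."
any(grepl("hasArg", deparse(body(f))))     # TRUE -- hints at hidden arguments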

Resources