How to view the contents of parsed R functions? - r

R is a functional programming language, and one of its primary benefits is its ability to create open and transparent functions.
As John Chambers says in his excellent book "Software for Data Analysis: Programming with R":
Computations are organized around functions, which can encapsulate specific, meaningful computational results, with implementations that can be examined for correctness.
Notions such as "reproducible research" and "trustworthy software" are at the heart of R development. In general, it is easy to examine a function just by typing its name without parentheses. For instance:
> which
function (x, arr.ind = FALSE)
{
if (!is.logical(x))
stop("argument to 'which' is not logical")
wh <- seq_along(x)[x & !is.na(x)]
dl <- dim(x)
...
My question is: how do you examine the contents of functions such as for() or if() without downloading the R source code?
Edit: Incidentally, I understand that this won't help with viewing compiled code (such as C, C++, or Java) that may be called from R. I'm really wondering if there is an R function which outputs R functions.

As the saying goes: Use the source, Luke!
R functions may be visible at the prompt, but they are still parsed, meaning that comments are stripped, code indentation is changed, etc. So I would always go to the source.
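A small sketch of what "parsed" means here. Whether the comments survive depends on whether source references were kept when the function was created, and installed packages are usually built without them. The function f below is a made-up example, not anything from R itself.

```r
# Hypothetical function, parsed with source references explicitly kept.
src <- "function(x) {\n  # add one to x\n  x + 1\n}"
f <- eval(parse(text = src, keep.source = TRUE))

# The original text, comment included, is stored in the "srcref" attribute.
original <- as.character(attr(f, "srcref"))

# deparse() rebuilds code from the parsed expression, so the comment is gone.
rebuilt <- deparse(f)

print(original)
print(rebuilt)
```

When no source reference is attached, printing a function falls back to the deparsed form, which is why going to the source files is the only reliable way to see comments.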

I also like going to the source for the same reasons mentioned by Dirk, but for quick and dirty work I also rely on subsets of:
getAnywhere(), getFromNamespace(), THEPACKAGE:::thefn, getGeneric()
to see functions which are not exported by the namespace of their package.
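A minimal sketch of these tools, using only packages that ship with R. plot.lm is picked as a stand-in for "a function hidden in a namespace", since the stats package registers it as an S3 method without exporting it. The sketch also shows what happens with if and for from the question: they are primitives, so there is no R-level body to print.

```r
# `if` and `for` are primitives: printing them shows no R code,
# because the work is done in C inside the interpreter.
print(`if`)
is.primitive(`for`)

# Three ways to reach the non-exported stats function plot.lm.
via_colons <- stats:::plot.lm
via_getter <- utils::getFromNamespace("plot.lm", "stats")
found <- utils::getAnywhere("plot.lm")

# Both accessors return the same function object.
identical(via_colons, via_getter)

# getAnywhere() also reports where it found the object.
found$where
```

getAnywhere() is the most convenient when you do not know which package defines the function; ::: and getFromNamespace() need the package name.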

Related

unrecognized function Nn in R

I am learning the R package SimInf to simulate data-driven stochastic epidemiological models. As I was reading the documentation I came across an unrecognized function Nn when defining a function for epicurves. Specifically, this line:
j <- sample(seq_len(Nn(model)), 1)
Values of model are integers. My guess is that Nn selects non-negative values, however my R does not recognize this function. From documentation it does not look like they pre-defined Nn either. Can someone please tell if they know what "Nn" is for? Thank you.
One way to go is to take the package name and follow it with a triple colon, which lets you find nearly all functions inside the package. You may be familiar with namespacing a function via packageName::functionFrompackageTocall. packageName::: shows (nearly) all functions defined in the package, not only the exported ones. If you do this in RStudio with SimInf:: and SimInf:::, you will see that the latter lists many more functions. But you will only find SimInf:::Nd and SimInf:::Nc, not an Nn function.
Hence you will have to go to the GitHub sources of the package, in this case https://github.com/stewid/SimInf. Then search the whole repository for Nn. You will see that it mostly appears as an int, which doesn't help, since you are looking for it as a function, not as a variable. Scrolling further down in the search results, you will find the NEWS.md file (https://github.com/stewid/SimInf/blob/fd7eb4a29b82a4a97f64b528bb0e78e5474aa8a5/NEWS.md), which says under SimInf 8.0.0 (2020-09-13): "The 'Nn' function to determine the number of nodes in a model has been replaced with the S4 method 'n_nodes'."
Hence, with a current version of SimInf installed, the function Nn should no longer be used. If you use it in your own code, replace it with n_nodes. If you find it in current package code, you can email the package maintainer to report the bug.
TLDR: Nn is an outdated version of n_nodes
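The lookup described above can also be done programmatically. The sketch below uses the stats package as a stand-in, since SimInf may not be installed; find_in_package is a hypothetical helper written for this example, not part of any package.

```r
# Hypothetical helper: report how a name is visible in a package namespace.
find_in_package <- function(name, pkg) {
  ns <- asNamespace(pkg)
  c(
    exported = name %in% getNamespaceExports(pkg),
    internal = exists(name, envir = ns, inherits = FALSE)
  )
}

find_in_package("median", "stats")   # exported and defined in the namespace
find_in_package("plot.lm", "stats")  # defined, but not exported
find_in_package("Nn", "stats")       # not there at all
```

With SimInf installed, find_in_package("Nn", "SimInf") would be the equivalent check; going by the NEWS.md entry quoted above, it should find nothing in versions from 8.0.0 onward.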

R Development: Use of `::` Operator for `base` Package

TLDR
Does rigorous best practice recommend that a proactive R developer explicitly disambiguate all base functions — even the ubiquitously common functions like c() or cat() — within the .R files of their package, using the package::function() convention?
Context
Though a novice developer, I need to create a (proprietary) package in R. The text R Packages (by the authoritative authors Hadley Wickham and Jenny Bryan) has proven extremely helpful (if occasionally deprecated).
I am keen on following best practices from the start, so as to save myself time and effort down the road. As described in the text, the use of the :: operator can prevent current and future conflicts, by disambiguating functions whose names are overloaded. To wit, the authors are careful to introduce each function with the package::function() convention, and they recommend its general use within the .R files of one's package.
However, their code examples often call functions that hail from the base package yet are unaccompanied by base::. Many base functions, like the ubiquitous c() or cat(), are used by R programmers in their sleep and (I imagine) are unlikely to ever be overloaded by a presumptuous developer. Nonetheless, it is confusing to see (for example) the juxtaposition of base::with() against (the base function) print(), all within a few lines of text.
...(These functions are inspired by how base::with() works.)
f <- function(x, sig_digits) {
  # imagine lots of code here
  withr::with_options(
    list(digits = sig_digits),
    print(x)
  )
  # ... and a lot more code here
}
I understand that the purpose of base::with() is to unambiguously introduce the with() function to the reader. However, the absence of base:: (within the code itself) seems to stick out like a sore thumb, when the package is explicitly named for any function called from any other package. Given my inexperience, I don't feel comfortable assuming the authors' intent.
Question
Are the names of base functions sufficiently unique that using this convention — of calling base::function() for every function() within the base package — would not be worth it? That the risk of overloading the functions (at some point in the future) is far outweighed by the inconvenience (and sheer ugliness) of
my_vector <- base::c(1, 2, 3)
throughout one's .R files? If not, is there an established convention that would balance unambiguity with elegance?
As always, I am grateful for any help, especially on this, my first post to Stack Overflow.
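To make the trade-off concrete, here is a small sketch of what masking a base function looks like in a script. It only illustrates the mechanics and does not settle the style question. One relevant fact: code inside a package looks names up in the package namespace, then its imports, then the base namespace, before it ever reaches the user's workspace, so a user-level definition like the one below does not affect a packaged function that calls c().

```r
# A user-level definition that masks base::c() in the current environment.
c <- function(...) "not the real c()"

masked <- c(1, 2, 3)          # picks up the definition above
explicit <- base::c(1, 2, 3)  # always the base version

# Removing the mask restores normal lookup.
rm(c)
restored <- c(1, 2, 3)
```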

Optimization R code - Rcpp

In addition to benchmarking functions, is there any tool in R so we can fetch the biggest bottlenecks in an R code?
I am often unsure how much computational gain I will get from rewriting R code in C++. For example, in a bootstrap where each iteration needs to do an optimization, I do not know if it is useful to use the GSL library to optimize a log-likelihood function, since R's optim function already uses compiled code from the stats.so file. I noticed this by running stats:::C_optim.
> stats:::C_optim
$name
[1] "optim"
$address
<pointer: 0x1cb34e0>
attr(,"class")
[1] "RegisteredNativeSymbol"
$dll
DLL name: stats
Filename: /usr/lib/R/library/stats/libs/stats.so
Dynamic lookup: FALSE
$numParameters
[1] 7
attr(,"class")
[1] "ExternalRoutine" "NativeSymbolInfo"
Looking at the body of the optim function (edit(optim)), I see that there is the import of efficient functions implemented in C. For example, there is:
.External2(C_optim, par, fn1, gr1, method, con, lower,
upper)
Doubt: To Rcpp users, in your projects, do you normally try to implement all your C++ functions or implement a set of small C++ functions to be used in an R function?
I know it's a pretty general question, but with every function where I use Rcpp, I end up trying to implement the C++ function from scratch. I feel that I'm programming more in C++ than in R. I sometimes think that I should program directly in C++.
R has many characteristics that make the language slow for various tasks. I always try to avoid loops and use the apply family of functions instead. However, I often find R very slow. Because I'm so undecided about what is worth optimizing, I end up implementing everything in C++.
If you (generally) code faster in R and feel like you are writing too much C++ code, I suggest the following approach:
Implement your solution in R.
Only if the R solution is not fast enough, try to optimize it.
The first step in optimization is measuring the performance, i.e. profile your code.
Once you have identified the bottlenecks, you can improve those using better R code or compiled code.
With experience you might be able to cut some corners, i.e. know from the beginning that some things in your problem will require compiled code. But that really depends on the kind of problems you are working on.
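The profiling step can be sketched with Rprof() and summaryRprof() from the utils package. The bootstrap below is a made-up workload modelled on the question (a normal log-likelihood maximized with optim() on each resample); neg_loglik and boot_once are hypothetical names, and the sample sizes are chosen only so that the run is long enough to collect samples.

```r
set.seed(1)
x <- rnorm(500, mean = 2, sd = 3)

# Negative log-likelihood of a normal model; sd is parameterized on the log scale.
neg_loglik <- function(par, data) {
  -sum(dnorm(data, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# One bootstrap iteration: resample the data and refit with optim().
boot_once <- function(data) {
  resample <- sample(data, replace = TRUE)
  optim(c(0, 0), neg_loglik, data = resample)$par
}

# Profile the whole bootstrap, sampling the call stack every 10 ms.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
estimates <- replicate(300, boot_once(x))
Rprof(NULL)

# Time spent per function, excluding callees (by.self) or including them (by.total).
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

If most of the self time sits in functions that already run compiled code, rewriting the surrounding loop in C++ is unlikely to gain much; if it sits in your own R-level function, that function is the candidate for optimization.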

Importing functions in R

In Python we can import a certain function from a library with a command like "from library import function as smth". Do we have something similar in R?
I know that we can call the function like "library::function()", my question mostly refers to the "as" part.
It is not common and not necessary to do this in R. The assignment operator <- can be used to give a new name to an existing function. For example, one could define a function that does exactly the same as lubridate's year() function with:
asYear <- lubridate::year
One could argue that, by doing so, the year() function has been "imported" from the lubridate package and that it is now called asYear(). In fact, the new function does just the same (which is no surprise, simply because it is the same):
asYear(Sys.Date())
#[1] 2016
So it is possible to construct an analogy to "from package import as", but it is not a good idea to do this. Here are a few reasons I can think of:
Debugging code where library functions have been renamed will be much more difficult.
The documentation is not available for the renamed function. In this example, ?asYear won't work, in contrast to ?lubridate::year or library(lubridate); help(year).
The function is not only renamed but it is copied, which clutters the environment and is inefficient in terms of memory usage.
The maintenance of the code becomes unnecessarily difficult. If another programmer (or the original programmer a few years later) looks at a code containing such a redefinition of a function, it will be harder for her or him to understand what this function is doing.
There are probably more reasons, but I hope that this is sufficient to discourage the use of such a construction. Different programming languages have different peculiarities, and as a programmer it is necessary to adapt to them. What is common in Python can be awkward in R, and vice versa.
A simple and commonly used way to handle such a standard situation in R is to use library() to load the entire namespace of the package containing the requested function:
library(lubridate)
year(Sys.Date())
However, one should be aware of possible namespace clashes, especially if many libraries are loaded simultaneously. Different functions could be defined with the same name in different packages. A well-known example is the pair of contrasting implementations of the lag() function in the dplyr and stats packages.
In such cases one can use the double colon operator :: to resolve the namespace that should be addressed. This would be similar to the use of "from" in the case of "import", but such a specification would be needed each time the function is called.
lubridate::year(Sys.Date())
#[1] 2016
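When clashes are a concern, it can help to check which package a function actually comes from. A minimal sketch using only packages that ship with R; where_from is a hypothetical helper written for this example, and it only works for ordinary (non-primitive) functions.

```r
# Hypothetical helper: name of the namespace a function was defined in.
where_from <- function(fn) environmentName(environment(fn))

where_from(stats::lag)
where_from(utils::head)

# base::conflicts() lists names that are defined in more than one attached
# package or environment, which is where masking can occur.
conflicts(detail = TRUE)
```

After library(dplyr), calling where_from(lag) on the unqualified name would show which implementation wins on the search path.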

Coding practice in R : what are the advantages and disadvantages of different styles?

The recent questions regarding the use of require versus :: raised the question about which programming styles are used when programming in R, and what their advantages/disadvantages are. Browsing through the source code or browsing on the net, you see a lot of different styles displayed.
The main trends in my code :
heavy vectorization I play a lot with the indices (and nested indices), which results in rather obscure code sometimes but is generally a lot faster than other solutions.
eg: x[x < 5] <- 0 instead of x <- ifelse(x < 5, 0, x)
I tend to nest functions to avoid overloading the memory with temporary objects that I need to clean up. Especially with functions manipulating large datasets this can be a real burden. eg : y <- cbind(x,as.numeric(factor(x))) instead of y <- as.numeric(factor(x)) ; z <- cbind(x,y)
I write a lot of custom functions, even if I use the code only once in e.g. an sapply. I believe it keeps the code more readable without creating objects that can remain lying around.
I avoid loops at all costs, as I consider vectorization to be a lot cleaner (and faster)
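The two forms in the first bullet can be compared directly on a small made-up vector. Note that the ifelse() call matching the subset assignment puts 0 in the "yes" position.

```r
x <- c(2, 7, 4, 9, 5)

# Subset assignment: overwrite only the elements below 5.
by_index <- x
by_index[by_index < 5] <- 0

# ifelse(): build a new vector, choosing 0 where the test is TRUE.
by_ifelse <- ifelse(x < 5, 0, x)

identical(by_index, by_ifelse)
```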
Yet, I've noticed that opinions on this differ, and some people tend to back away from what they would call my "Perl" way of programming (or even "Lisp", with all those brackets flying around in my code. I wouldn't go that far though).
What do you consider good coding practice in R?
What is your programming style, and how do you see its advantages and disadvantages?
What I do will depend on why I am writing the code. If I am writing a data analysis script for my research (day job), I want something that works but that is readable and understandable months or even years later. I don't care too much about compute times. Vectorizing with lapply et al. can lead to obfuscation, which I would like to avoid.
In such cases, I would use loops for a repetitive process if lapply made me jump through hoops to construct the appropriate anonymous function for example. I would use the ifelse() in your first bullet because, to my mind at least, the intention of that call is easier to comprehend than the subset+replacement version. With my data analysis I am more concerned with getting things correct than necessarily with compute time --- there are always the weekends and nights when I'm not in the office when I can run big jobs.
For your other bullets; I would tend not to inline/nest calls unless they were very trivial. If I spell out the steps explicitly, I find the code easier to read and therefore less likely to contain bugs.
I write custom functions all the time, especially if I am going to be calling the code equivalent of the function repeatedly in a loop or similar. That way I have encapsulated the code out of the main data analysis script into its own .R file, which helps keep the intention of the analysis separate from how the analysis is done. And if the function is useful I have it for use in other projects etc.
If I am writing code for a package, I might start with the same attitude as my data analysis (familiarity) to get something I know works, and only then go for the optimisation if I want to improve compute times.
The one thing I try to avoid doing, is being too clever when I code, whatever I am coding for. Ultimately I am never as clever as I think I am at times and if I keep things simple, I tend not to fall on my face as often as I might if I were trying to be clever.
I write functions (in standalone .R files) for various chunks of code that conceptually do one thing. This keeps things short and sweet. I found debugging somewhat easier, because traceback() gives you which function produced an error.
I too tend to avoid loops, except when it's absolutely necessary. I feel somewhat dirty if I use a for() loop. :) I try really hard to do everything vectorized or with the apply family. This is not always the best practice, especially if you need to explain the code to another person who is not as fluent in apply or vectorization.
Regarding the use of require vs ::, I tend to use both. If I only need one function from a certain package I use it via ::, but if I need several functions, I load the entire package. If there's a conflict in function names between packages, I try to remember and use ::.
I try to find a function for every task I'm trying to achieve. I believe someone before me has thought of it and made a function that works better than anything I can come up with. This sometimes works, sometimes not so much.
I try to write my code so that I can understand it. This means I comment a lot and construct chunks of code so that they somehow follow the idea of what I'm trying to achieve. I often overwrite objects as the function progresses. I think this keeps the transparency of the task, especially if you're referring to these objects later in the function. I think about speed when computing time exceeds my patience. If a function takes so long to finish that I start browsing SO, I see if I can improve it.
I found out that a good syntax editor with code folding and syntax coloring (I use Eclipse + StatET) has saved me a lot of headaches.
Based on VitoshKa's post, I am adding that I use capitalizedWords (sensu Java) for function names and fullstop.delimited for variables. I see that I could have another style for function arguments.
Naming conventions are extremely important for the readability of the code. Inspired by R's S4 internal style here is what I use:
camelCase for global functions and objects (like doSomething, getXyyy, upperLimit)
functions start with a verb
not exported and helper functions always start with "."
local variables and functions are all in small letters and in "_" syntax (do_something, get_xyyy). This makes it easy to distinguish local from global and therefore leads to cleaner code.
For data juggling I try to use as much SQL as possible, at least for the basic things like GROUP BY averages. I like R a lot, but sometimes it's not all fun to realize that your research strategy was not good enough to find yet another function hidden in yet another package. For my cases SQL dialects do not differ much and the code is really transparent. Most of the time the threshold (when to start using R syntax) is rather intuitive to discover, e.g.
library(RMySQL)
# con is assumed to be an open connection created earlier with dbConnect()
# selection of variables alongside conditions in SQL is really transparent
# even if conditional variables are not part of the selection
statement <- "SELECT id,v1,v2,v3,v4,v5 FROM mytable
              WHERE this=5
              AND that != 6"
mydf <- dbGetQuery(con, statement)
# some simple things get really tricky (at least in MySQL), but simple in R
# standard deviation of table rows (apply() over rows; dframe must be all numeric)
dframe$rowsd <- apply(dframe, 1, sd)
So I consider it good practice and really recommend to use a SQL database for your data for most use cases. I am also looking into TSdbi and saving time series in relational database, but cannot really judge that yet.
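For comparison, a minimal base R sketch of the two operations mentioned above: a GROUP BY average and a row-wise standard deviation. The data frame and its column names are made up for illustration, and no database connection is needed.

```r
# Hypothetical data standing in for a database table.
sales <- data.frame(
  grp = c("a", "a", "b", "b"),
  v1  = c(1, 3, 10, 20),
  v2  = c(2, 4, 30, 40)
)

# Equivalent of: SELECT grp, AVG(v1), AVG(v2) FROM sales GROUP BY grp
means <- aggregate(cbind(v1, v2) ~ grp, data = sales, FUN = mean)
print(means)

# Standard deviation across the numeric columns of each row.
sales$rowsd <- apply(sales[, c("v1", "v2")], 1, sd)
print(sales)
```

Which side of the SQL/R boundary each step belongs on is a judgment call; the sketch only shows that both are short in R once the data is in a data frame.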
