R's switch statement is not a special form, is it therefore slow? - r

In most languages with switch statements, switch is a special form designed such that the possibilities are evaluated lazily and the compiler knows how to optimise the selection of statements based on the given input. R, mostly already being lazy, does not need some of this. However, R's switch statement is still a function call, rather than any sort of special form. Does this mean that R's switch statement is slower than it would be if it were a special form? Or does R's interpreter know to optimise it as if it were a special form?

If you look at internal code of switch in file src/main/builtin.c, you can read in lines 1009-1025 :
This is a SPECIALSXP, so arguments need to be evaluated as needed.
SPECIALSXP means :
no SEXPTYPE Description
7 SPECIALSXP special functions
So switch is actually a special function which passes unevaluated arguments to the internal function.
Further reading the source code from line 1030 to line 1104 shows that as explained in ?switch, the function either handles character or number in a simple and not fully optimized way.
This probably explains why switch isn't particularly fast in situations which would for example require a binary search.

Related

Why is there a function undefined message whereas it's about variable

Let's say I'm invoking something like this
Enum.count(list)
And list is not defined in 'upper scope'. In most of languages you'll probably get something like Variable list is undefined, but in Elixir (it comes from Erlang, so I hope it's same behaviour) you'll be getting undefined function list/0 (there is no such import).
What's the difference in Elixir from other (let's say imperative) programming languages in sense of distinction between variable and function?
Also I've noticed you can make a function in module, and if it takes zero arguments, you can call it without parentheses, I was wondering what's special about that. (was answered below by #sabiwara)
Elixir used to consider parentheses optional for all function calls, including 0-arity functions like Kernel.node/0:
iex> node
:nonode#nohost
This behavior has since been deprecated and will emit a compile time warning:
warning: variable "node" does not exist and is being expanded to "node()", please use parentheses to remove the ambiguity or change the variable name
Parentheses for non-qualified calls are optional, except for zero-arity calls, which would then be ambiguous with variables.
But since it would be a breaking change to just change this behavior, it still works and gets interpreted, in your case, as list().
This might change in Elixir 2.0. A discussion on this topic here.

Julia type conversion best practices

I have a function which requires a DateTime argument. A possibility is that a user might provide a ZonedDateTime argument. As far as I can tell there are three possible ways to catch this without breaking:
Accept both arguments in a single method, and perform a type conversion if necessary via an if... statement
function ofdatetime(dt::AbstractDateTime)
if dt::ZonedDateTime
dt = DateTime(dt, UTC)
end
...
end
Define a second method which simply converts the type and calls the first method
function ofdatetime(dt::DateTime)
...
end
function ofdatetime(dt::ZonedDateTime)
dt = DateTime(dt, UTC)
return ofdatetime(dt)
end
Redefine the entire function body for the second method
function ofdatetime(dt::DateTime)
...
end
function ofdatetime(dt::ZonedDateTime)
dt = DateTime(dt, UTC)
...
end
Of course, this doesn't apply when a different argument type implies that the function actually do something different - the whole point of multiple dispatch - but this is a toy example. I'm wondering what is best practice in these cases? It needn't be exclusively to do with time zones, this is just the example I'm working with. Perhaps a relevant question is 'how does Julia do multiple dispatch under the hood?' i.e. are arguments dispatched to relevant methods by something like an if... else/switch... case block, or is it more clever than that?
The answer in the comments is correct that, ideally, you would write your ofdatetime function such that all operations on dt within your function body are general to any AbstractDateTime; in any case where the the difference between DateTime and ZonedDateTime would matter, you can use dispatch at that point within your function to take care of the details.
Failing that, either of 2 or 3 is generally preferable to 1 in your question, since for either of those, the branch can be elided in the case that the type of df is known at compile-time. Of the latter two, 2 is probably preferable to 3 as written in your example in terms of general code style ("DRY"), but if you were able to avoid the type conversion by writing entirely different function bodies, then 3 could actually have better performance than if the type conversion is at all expensive.
In general though, the best of all worlds is to keep most your code generic to either type, and only dispatch at the last possible moment.

Calling the agrep .Internal C function from Rcpp

In short: How can I call, from within Rccp C++ code, the agrep C internal function that gets called when users use the regular agrep function from base R?
In long: I have found multiple questions here about how to invoke, from within Rcpp, a C or C++ function created for another package (e.g. using C function from other package in Rcpp
and Rcpp: Call C function from a package within Rcpp).
The thing that I am trying to achieve, however, is at the same time simpler but also way less documented: it is to directly call, from within Rcpp, a .Internal C function that comes with base R rather than another package, without interfacing with R (that is, without doing what is said in Call R functions in Rcpp). How could I do that for the .Internal C function that lays underneath base R's agrep wrapper?
The specific function I am trying to call here is the agrep internal C function. And for context, what I am ultimately trying to achieve is to speed-up a call to agrep for when millions of patterns must be each checked against each of millions of x targets.
Great question. The long and short of it is "You cant" (in many cases) unless the function is visible in one of the header files in "src/include/". At least not that easily.
Not long ago I had a similar fun challenge, where I tried to get access to the do_docall function (called by do.call), and it is not a simple task. First of all, it is not directly possible to just #include <agrep.c> (or something similar). That file simply isn't available for inclusion, as it is not a part of the "src/include". It is compiled and the uncompiled file is removed (not to mention that one should never "include" a .c file).
If one is willing to go the mile, then the next step one could look at is "copying" and "altering" the source code. Basically find the function in "src/main/agrep.c", copy it into your package and then fix any errors you find.
Problems with this approach:
As documented in R-exts the internal structures of sexprec_info is not made public (this is the base structure for all objects in R). Many internal function use the fields within this structure, so one has to "copy" the structure into your source code, to make it public to your code specifically.
If you ever #include <Rcpp.h> prior to this file, you will need to go through each and every call to internal functions and likely add either R_ or Rf_.
The function may contain calls to other "internal" functions, that further needs to be copied and altered for it to work.
You will also need to get a clear understanding of what CDR, CAR and similar does. The internal functions have a documented structure, where the first argument contains the full call passed to the function, and function like those 2 are used to access parts of the call.
I did myself a solid and rewrote do_docall changing the input format, to avoid having to consider this. But this takes time. The alternative is to create a pairlist according to the documentation, set its type as a call-sexp (the exact name is lost to me at the moment) and pass the appropriate arguments for op, args and env.
And lastly, if you go through the steps, and find that it is necessary to copy the internal structures of sexprec_info (as described later), then you will need to be very careful about when you include Rinternals and Rcpp, as any one of these causes your code to crash and burn in the most beautiful and silent way if you include your header and these in the wrong order! Note that this even goes for [[Rcpp::export]], which may indeed turn out to include them in the wrong arbitrary order!
If you are willing to go this far down the drainage, I would suggest carefully reading adv-R "R's C interface" and Chapter 2, 5 and 6 of R-ext and maybe even the R internal manual, and finally once that is done take a look at do_docall from src/main/coerce.c and compare it to the implementation in my repository cmdline.arguments/src/utils/{cmd_coerce.h, cmd_coerce.c}. In this version I have
Added all the internal structures that are not public, so that I can access their unmodified form (unmodified by the current session).
This includes the table used to store the currently used SEXP's, that was used as a lookup. This caused a problem as I can't access the modified version, so my code is slightly altered with the old code blocked by the macro #if --- defined(CMDLINE_ARGUMENTS_MAYBE_IN_THE_FUTURE). Luckily the code causing a problem had a static answer, so I could work around this (but this might not always be the case).
I added quite a few Rf_s as their macro version is not available (since I #include <Rcpp.h> at some point)
The code has been split into smaller functions to make it more readable (for my own sake).
The function has one additional argument (name), that is not used in the internal function, with some added errors (for my specific need).
This implementation will be frozen "for all time to come" as I've moved on to another branch (and this one is frozen for my own future benefit, if I ever want to walk down this path again).
I spent a few days scouring the internet for information on this and found 2 different posts, talking about how this could be achieved, and my approach basically copies this. Whether this is actually allowed in a cran package, is an whole other question (and not one that I will be testing out).
This approach goes again if you want to use not-public code from other packages. While often here it is as simple as "copy-paste" their files into your repository.
As a final side note, you mention the intend is to "speed up" your code for when you have to perform millions upon millions of calls to agrep. It seems that this is a time where one should consider performing the task in parallel. Even after going through the steps outlined above, creating N parallel sessions to take care of K evaluations each (say 100.000), would be the first step to reduce computing time. Of course each session should be given a batch and not a single call to agrep.

Why use macros in Julia?

I was reading up on the documentation of macros and ran into the following under the `Hold up: why macros' section. The reasoning given to use macros is as follows:
Macros are necessary because they execute when code is parsed,
therefore, macros allow the programmer to generate and include
fragments of customized code before the full program is run
This leads me to wonder why someone would want to use "generate and include fragments of customized code before the full program is run". Can someone provide context as to why this would be beneficial and/or other good use cases for macros?
Let me give you my view on macros.
A macro basically is a code -> code function. It takes code (a Julia expression) as input and spits out code (a different Julia expression).
Why is this useful? It has multiple purposes:
compile time copy-and-paste: You don't have to write the same piece of code multiple times but instead can define a short macro that writes it for you wherever you put it. (example)
domain specific language (DSL): You can create special syntax that after the macros code -> code transform is replaced by pure Julia constructs. This is used in many packages to define special syntax, for example here and here.
code generation: Imagine you want to write a really long piece of code which, although being long, is very simple because it has some kind of pattern that repeats itself rather trivially. Writing that code by hand can be a pain (or even practically impossible). A macro can programmatically generate the code for you. One example is for-loop unrolling (see here and here). But even the #time macro isn't doing much more than just putting a bunch of Base.time_ns() function calls around the provided Julia expression.
special string parsing: If you type the literal 3.2 in Julia it will be parsed and interpreted as a Float64. Now, imagine you want to supply a number literally that goes beyond Float64 precision but would fit into a BigFloat. Typing big(3.123124812498124812498) won't work, because the literal number is first interpreted as a Float64 and then handed to the big function. Instead you need a way to tell Julia at parse time that this should become a BigFloat. This is handled by a #big_str 3.2 macro which (for convenience) can also be written as big"3.2". The latter is just syntactic sugar.
There might be many more applications of macros, but those are the most important to me.
Let me end by referencing Steven G. Johnson's great talk at JuliaCon 2019:
Most of the time, don't do metaprogramming :)

What does the jq notation <function>/<number> mean?

In various web pages, I see references to jq functions with a slash and a number following them. For example:
walk/1
I found the above notation used on a stackoverflow page.
I could not find in the jq Manual page a definition as to what this notation means. I'm guessing it might indicate that the walk function that takes 1 argument. If so, I wonder why a more meaningful notation isn't used such as is used with signatures in C++, Java, and other languages:
<function>(type1, type2, ..., typeN)
Can anyone confirm what the notation <function>/<number> means? Are other variants used?
The notation name/arity gives the name and arity of the function. "arity" is the number of arguments (i.e., parameters), so for example explode/0 means you'd just write explode without any arguments, and map/1 means you'd write something like map(f).
The fact that 0-arity functions are invoked by name, without any parentheses, makes the notation especially handy. The fact that a function name can have multiple definitions at any one time (each definition having a distinct arity) makes it easy to distinguish between them.
This notation is not used in jq programs, but it is used in the output of the (new) built-in filter, builtins/0.
By contrast, in some other programming languages, it (or some close variant, e.g. module:name/arity in Erlang) is also part of the language.
Why?
There are various difficulties which typically arise when attempting to graft a notation that's suitable for languages in which method-dispatch is based on types onto ones in which dispatch is based solely on arity.
The first, as already noted, has to do with 0-arity functions. This is especially problematic for jq as 0-arity functions are invoked in jq without parentheses.
The second is that, in general, jq functions do not require their arguments to be any one jq type. Having to write something like nth(string+number) rather than just nth/1 would be tedious at best.
This is why the manual strenuously avoids using "name(type)"-style notation. Thus we see, for example, startswith(str), rather than startswith(string). That is, the parameter names in the documentation are clearly just names, though of course they often give strong type hints.
If you're wondering why the 'name/arity' convention isn't documented in the manual, it's probably largely because the documentation was mostly written before jq supported multi-arity functions.
In summary -- any notational scheme can be made to work, but name/arity is (1) concise; (2) precise in the jq context; (3) easy-to-learn; and (4) widely in use for arity-oriented languages, at least on this planet.

Resources