It is possible to record the time it takes to run some code using system.time. Here is a little example:
system.time(
mean(rnorm(10^6))
)
But I am not only interested in the time; I am also interested in the number of arithmetic operations (that is, +, -, *, /) that the code used.
In the case above it would be easy to count the additions and the single division needed to get the mean, but the code I would like to apply this to is far more complex.
Therefore, my question is: is there a function in R that counts the number of arithmetic operations?
You can trace the R functions of interest:
counter <- 0
trace("+", at = 1, print = FALSE,
tracer = quote(.GlobalEnv$counter <- .GlobalEnv$counter + 1))
#Tracing function "+" in package "base"
#[1] "+"
Reduce("+", 1:10)
#[1] 55
counter
#[1] 9
untrace("+")
#Untracing function "+" in package "base"
I'm not sure how useful it would be to count R-level calls here. Many (most?) functions do their arithmetic in C or Fortran code, or even in the BLAS. And I don't have a solution for counting calls in compiled code; you'd need to set that up during compilation, if it is possible at all.
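For what it's worth, here is a sketch (not from the original answer) that extends the same trace() approach to all four operators at once. Per the caveat above, it still only sees R-level calls:

ops <- c("+", "-", "*", "/")
counts <- setNames(numeric(length(ops)), ops)

# each R-level call to one of these operators bumps its counter
for (op in ops) {
  trace(op, at = 1, print = FALSE,
        tracer = bquote(.GlobalEnv$counts[.(op)] <- .GlobalEnv$counts[.(op)] + 1))
}

counts[] <- 0                        # reset before running the code of interest
x <- c(1, 5, 7, 10, 11)
(x - min(x)) / (max(x) - min(x))     # the code to "measure"

res <- counts                        # copy before untracing
for (op in ops) untrace(op)
res
# should show: + 0, - 2, * 0, / 1 (min() and max() do their work in C)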
I have the following equation:
$\frac{\exp\left(\sum_{i=1}^{n}(a + b x_i)\right)}{\prod_{i=1}^{n}\left(1 + \exp(a + b x_i)\right)}$
and I'm trying to generate the analytic derivative.
I know you can use deriv() and D() for an expression, but I cannot seem to figure out how to actually express sum or product notation in an expression.
partial/incomplete answer
The Deriv package offers a more robust (and extensible) alternative to the base R D and deriv functions, and appears to know about sum() already. prod() will be difficult, though (see below).
A very simple example:
library(Deriv)
Deriv(~ sum(b*x), "b")
## sum(x)
A slightly more complex example that sort of works:
Deriv(~ sum(rep(a, length(x)) + b*x), c("a","b"))
## c(a = sum(rep(x = 1, length(x))), b = sum(x))
Note here that sum(a+b*x) doesn't work (returns 1) for the derivative with respect to a, for reasons described in ?Deriv (search for "rep(" in the page): the rep() is needed to help Deriv sort out scalar/vector definitions. It's too bad that it can't simplify sum(rep(x=1, length(x))) to length(x) but ...
Trying
Deriv( ~ exp(sum(a+b*x))/prod(1+exp(a+b*x)))
gives an error
Could not retrieve body of 'prod()'
You might be able to add a rule for products to the derivatives table, but it will be tricky since prod() takes a ... argument. Let's try defining our own function Prod() which takes a single argument x (I think this is the right generalization of the product rule but didn't think about it too carefully.)
Prod <- function(x) prod(x)
drule[["Prod"]] <- alist(for(i in 1:length(x)) { .dx[i]*Prod(x[-i]) })
Deriv(~ Prod(beta*x), "x")
Unsurprisingly (to me), this doesn't work: the result is 0 ... (the basic problem is that using .dx[i] to denote the derivative of x[i] doesn't work in the machinery).
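For reference (standard calculus, not anything specific to Deriv), the generalization of the product rule that the .dx[i]*Prod(x[-i]) pattern is trying to encode is

$\frac{d}{d\theta} \prod_{i=1}^{n} x_i(\theta) = \sum_{i=1}^{n} \frac{d x_i(\theta)}{d\theta} \prod_{j \neq i} x_j(\theta)$

i.e. differentiate one factor at a time and multiply by the product of all the others.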
I don't know of a way to solve this in R; if I had this problem (depending on more context, which I don't know), I might see if I could find a framework for automatic differentiation (rather than symbolic differentiation). Unfortunately, most of the existing tools for autodiff in R use backends in C++ or Julia (e.g. see here (C++ + Rcpp + CppAD), here (Julia), or the TMB package (C++/CppAD/user-friendly extensions)). There's an ancient pure-R github project radx, but it looks too incomplete to use ... (FWIW autodiffr requires a Julia installation but doesn't actually require you to write any Julia code, AFAICS ...)
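As a fallback that the answer above doesn't cover, numerical (rather than symbolic or automatic) differentiation handles the full expression directly once it is written as an ordinary R function; here is a sketch using the numDeriv package with made-up data:

library(numDeriv)

x <- c(1, 2, 3)   # made-up data, for illustration only

# the expression from above, written as a function of the parameter vector (a, b)
f <- function(p) {
  a <- p[1]; b <- p[2]
  exp(sum(a + b * x)) / prod(1 + exp(a + b * x))
}

grad(f, c(a = 0.5, b = 0.1))   # numerical gradient with respect to a and b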
I found this previous thread here:
Range standardization (0 to 1) in R
This leads me to my question: when building a function to perform a calculation across all values in a vector, my understanding was that this is exactly the scenario where a for-loop would be necessary (because the calculation is applied to every value in the vector). However, apparently that is not the case. What am I misunderstanding?
All of the basic arithmetic operations in R, and most of the basic numerical/mathematical functions, are natively vectorized; that is, they operate position-by-position on elements of vectors (or pairs of elements from matched vectors). So for example if you compute
c(1,3,5) + c(2,4,7)
you don't need an explicit for loop (although there is one in the underlying C code!) to get c(3, 7, 12) as the answer.
In addition, R has vector recycling rules; any time you call an operation with two vectors, the shorter automatically gets recycled to match the length of the longer one. R doesn't have scalar types, so a single number is stored as a numeric vector of length 1. So if we compute
(x-min(x))/(max(x)-min(x))
max(x) and min(x) are both of length 1, so the denominator is also of length 1. When min(x) is subtracted from x, it gets replicated/recycled to a vector the same length as x, and then the pairwise subtraction is done. Finally, the numerator (whose length is length(x)) is divided by the denominator (length 1) following similar rules.
In general, exploiting this built-in vectorization in R is much faster than writing your own for-loop, so it's definitely worth trying to get the hang of when operations can be vectorized.
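As a concrete illustration (made-up data, not from the original question), the vectorized one-liner above does the same thing as an explicit element-by-element loop:

x <- c(1, 5, 7, 10, 11)

# vectorized: recycling handles the length-1 min/max values
scaled_vec <- (x - min(x)) / (max(x) - min(x))

# explicit loop doing the same computation one element at a time
scaled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  scaled_loop[i] <- (x[i] - min(x)) / (max(x) - min(x))
}

all.equal(scaled_vec, scaled_loop)  # TRUE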
This kind of vectorization is implemented in many higher-level languages/libraries that are specialized for numerical computation (pandas and numpy in Python, Matlab, Julia, ...)
I was wondering if there is a way for R to detect the existence or absence of the sign * as used in the following objects?
In other words, can R understand that a has a * sign but b doesn't?
a = 3*4
b = 12
If you keep the expressions unevaluated, R can understand their internal complexity. Under normal circumstances, though, R evaluates expressions immediately, so there is no way to tell the difference between a <- 3*4 and b <- 12 once the assignments have been made. That means that the answer to your specific question is No.
Dealing with unevaluated expressions can get a bit complex, but quote() is one simple way to keep e.g. 3*4 from being evaluated:
> length(quote(3*4))
[1] 3
> length(quote(12))
[1] 1
If you're working inside a function, you can use substitute to retrieve the unevaluated form of the function arguments:
> f <- function(a) {
+ length(substitute(a))
+ }
> f(12)
[1] 1
> f(3*4)
[1] 3
In case you're pursuing this further, you should be aware that counting complexity might not be as easy as you think:
> f(sqrt(2*3+(7*19)^2))
[1] 2
What's going on is that R stores expressions as a tree; the top level here is made up of sqrt and <the rest of the expression>, which has length 2. If you want to measure complexity you'll need to do some kind of collapsing or counting down the branches of the tree ...
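For what it's worth, here is a minimal sketch of that kind of tree-walking (count_calls is a made-up helper, not part of base R or of the original answer): it recurses into every argument and counts each call it finds.

count_calls <- function(e) {
  if (is.call(e)) {
    # one for this call, plus whatever its arguments contain
    1 + sum(vapply(as.list(e)[-1], count_calls, numeric(1)))
  } else {
    0  # symbols and constants contribute nothing
  }
}

count_calls(quote(12))         # 0
count_calls(quote(3 * 4))      # 1
count_calls(quote(a + b * x))  # 2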
Furthermore, if you first assign a <- 3*4 and then call f(a) you get 1, not 3, because substitute() gives you back just the symbol a, which has length 1 ... the information about the difference between "12" and "3*4" gets lost as soon as the expression is evaluated, which happens when the value is assigned to the symbol a. The bottom line is that you have to be very careful in controlling when expressions get evaluated, and it's not easy.
Hadley Wickham's chapter on expressions might be a good place to read more.
I have the following mathematical formula that I want to program as efficiently as possible in R.
$\sum_{i=1}^{N}(x_i-\bar x)(y_i-\bar y)$
Let's say we have the following example data:
x = c(1,5,7,10,11)
y = c(2,4,8,9,12)
How can I easily get this sum with this data without making a separate function?
Isn't there a package or a function that can compute these mathematical sums?
Use the sum command and vectorized operations: sum((x-mean(x))*(y-mean(y)))
The key revelation here is that the sum function just takes the sum over its argument (a vector, a matrix, whatever). In this case it's sufficient to give it a vector. The expression is a little more complicated than a plain sum(z), but (x-mean(x))*(y-mean(y)) simply evaluates to such a vector z, so the fact that the command is slightly ornate doesn't really change how the function works. This is true in many places, not just the sum command.
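Applied to the example data from the question (and, as a side note not in the original answer, this sum is also (N - 1) times the sample covariance, so cov() gives the same number):

x <- c(1, 5, 7, 10, 11)
y <- c(2, 4, 8, 9, 12)

sum((x - mean(x)) * (y - mean(y)))
# [1] 62

cov(x, y) * (length(x) - 1)   # same quantity via the sample covariance
# [1] 62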
Now that I find myself spending so much time programming in R, I really want to get back to automated testing (which I learned to do by habit in Perl). Besides general user-friendliness, I would be particularly interested in being able to generate random inputs for tests, like Perl's Test::LectroTest or Haskell's QuickCheck. Is there anything similar for R?
See the R package quickcheck on GitHub.
Like Test::LectroTest, the R package quickcheck is a port of QuickCheck, which Koen Claessen and John Hughes wrote for Haskell.
In addition to QuickCheck features, quickcheck also gives a nod to Hadley Wickham's popular testthat R package by intentionally incorporating his "expectation" functions (which quickcheck calls "assertions"). In addition to numerical and string tests, there are tests for failures, warnings, etc.
Here is a simple example using it:
library(quickcheck)
my_square <- function(x){x^2} # the function to test
test( function(x = rinteger()) min(my_square(x)) >= 0 )
# Pass function (x = rinteger())
# min(my_square(x)) >= 0
# [1] TRUE
test( function(x = rdouble())
all.equal(
my_square(x),
x^2
)
)
# Pass function (x = rdouble())
# all.equal(my_square(x), x^2)
# [1] TRUE
The first test ensures that anything returned by my_square is non-negative. The second test actually replicates the functionality of my_square and checks every output to make sure it is correct.
Note that rinteger() produces a vector of any length consisting of integer values. Other randomly generated input data can be produced using functions like rcharacter, rdouble, and rmatrix.