Now that I find myself spending so much time programming in R, I really want to get back to automated testing (which I learned to do by habit in Perl). Besides being user-friendly, the framework should ideally be able to generate random inputs for tests, as Perl's Test::LectroTest and Haskell's QuickCheck do. Is there anything similar for R?
See the R package quickcheck on GitHub.
Like Test::LectroTest, the R package quickcheck is a port of QuickCheck, which Koen Claessen and John Hughes wrote for Haskell.
In addition to QuickCheck features, quickcheck also gives a nod to Hadley Wickham's popular testthat R package by intentionally incorporating his "expectation" functions (which quickcheck calls "assertions"). Besides numerical and string checks, there are assertions for failures, warnings, and so on.
Here is a simple example using it:
library(quickcheck)
my_square <- function(x){x^2} # the function to test
test( function(x = rinteger()) min(my_square(x)) >= 0 )
# Pass function (x = rinteger())
# min(my_square(x)) >= 0
# [1] TRUE
test( function(x = rdouble())
  all.equal(
    my_square(x),
    x^2
  )
)
# Pass function (x = rdouble())
# all.equal(my_square(x), x^2)
# [1] TRUE
The first test ensures that anything produced by my_square is non-negative. The second test replicates the functionality of my_square and checks every output to make sure it is correct.
Note that rinteger() produces a vector of any length consisting of integer values. Other randomly generated input data can be produced using functions like rcharacter, rdouble, and rmatrix.
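Properties need not be numerical identities, either. As one more minimal sketch in the same pattern (using only the test() and rdouble() calls shown above), you could check that my_square preserves the length of its input:
test( function(x = rdouble()) length(my_square(x)) == length(x) )  # should pass for any input vector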
I have the following equation:
exp( sum_i (a + b*x_i) ) / prod_i ( 1 + exp(a + b*x_i) )
and I'm trying to generate the analytic derivative with respect to the parameters a and b.
I know you can use deriv() and D() for an expression, but I cannot seem to figure out how to actually implement a sum or a product notation into an expression.
partial/incomplete answer
The Deriv package offers a more robust (and extensible) alternative to the base R D and deriv functions, and appears to know about sum() already. prod() will be difficult, though (see below).
A very simple example:
library(Deriv)
Deriv(~ sum(b*x), "b")
## sum(x)
A slightly more complex answer that sort-of works:
Deriv(~ sum(rep(a, length(x)) + b*x), c("a","b"))
## c(a = sum(rep(x = 1, length(x))), b = sum(x))
Note here that sum(a+b*x) doesn't work (returns 1) for the derivative with respect to a, for reasons described in ?Deriv (search for "rep(" in the page): the rep() is needed to help Deriv sort out scalar/vector definitions. It's too bad that it can't simplify sum(rep(x=1, length(x))) to length(x) but ...
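As a side note (a small sketch, relying on Deriv()'s documented ability to differentiate function objects; the exact simplification of the result may vary), Deriv can also return the derivative as a callable R function, which makes the symbolic result easy to evaluate:
library(Deriv)
f  <- function(a, b, x) sum(rep(a, length(x)) + b * x)
df <- Deriv(f, c("a", "b"))  # derivative function with respect to a and b
df(a = 0, b = 2, x = 1:5)    # should give c(a = 5, b = 15), i.e. length(x) and sum(x)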
Trying
Deriv( ~ exp(sum(a+b*x))/prod(1+exp(a+b*x)))
gives an error
Could not retrieve body of 'prod()'
You might be able to add a rule for products to the derivatives table, but it will be tricky since prod() takes a ... argument. Let's try defining our own function Prod() which takes a single argument x (I think this is the right generalization of the product rule but didn't think about it too carefully.)
Prod <- function(x) prod(x)
drule[["Prod"]] <- alist(for(i in 1:length(x)) { .dx[i]*Prod(x[-i]) })
Deriv(~Prod(beta*x), "x")
Unsurprisingly (to me), this doesn't work: the result is 0 ... (the basic problem is that using .dx[i] to denote the derivative of x[i] doesn't work in the machinery).
I don't know of a way to solve this in R; if I had this problem (depending on more context, which I don't know), I might see if I could find a framework for automatic differentiation (rather than symbolic differentiation). Unfortunately, most of the existing tools for autodiff in R use backends in C++ or Julia: e.g. see here (C++ + Rcpp + CppAD), here (Julia), or the TMB package (C++/CppAD plus user-friendly extensions). There's an ancient pure-R GitHub project radx, but it looks too incomplete to use ... (FWIW autodiffr requires a Julia installation but doesn't actually require you to write any Julia code, AFAICS ...)
It is known that base R uses BLAS to speed up calculations. In my code I want to use those functions from base R (and maybe its packages) that do use BLAS. How can I get a list of the R functions which actually use BLAS? Or how can I check whether a function I want to use in my code does use BLAS (ATLAS, LAPACK and so on)?
This is not a complete answer, as I am no expert in this. But maybe you or someone else can take some of these starting ideas and create a solution from them (it would be great if you could post that then!)
Inspect which R functions are defined in terms of C code
Only functions which call C code are candidates for using BLAS, so identifying those functions could be a first step.
capture.output(print( FUN )) gives you the definition of a function as a character vector (one element per line). So to list all functions which are defined in terms of .Internal, .Primitive, etc., do the following:
# Set this to the package you want to screen
envName <- 'base'
# Get the environment for the given name
env <- pos.to.env(which(search() == paste0('package:',envName)))
# Return TRUE if `string` contains `what`
contains <- function(string, what){
  length(grep(what, string, fixed = TRUE)) != 0
}
# Build up a matrix which contains true if an element is defined in terms
# of the following functions which indicate calls to C code
signalWords <- c( '.Primitive', '.Internal', '.External'
, '.Call' , '.C' , '.Fortran' )
envElements <- ls(envir = env)
funTraits <- matrix(FALSE, nrow = length(envElements), ncol = length(signalWords),
dimnames = list(envElements, signalWords))
# Fill up the values of the matrix by reading each elements' definition
for (elementName in envElements){
  element <- get(elementName, envir = env)
  if(!is.function(element)){
    next
  }
  fun.definition <- capture.output(print(element))
  for(s in signalWords){
    if(contains(fun.definition, s)){
      funTraits[elementName, s] <- TRUE
    }
  }
}
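With funTraits filled in, you can then query it; for example (a quick sketch building on the code above), list the functions flagged as calling compiled code at all, or those going through a particular interface:
# functions in the package that reference at least one of the signal words
head(rownames(funTraits)[rowSums(funTraits) > 0])
# functions that go through the .Fortran interface
rownames(funTraits)[funTraits[, ".Fortran"]]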
When a function calls an external C function (as opposed to being a .Primitive), it looks like this:
dnorm
## function (x, mean = 0, sd = 1, log = FALSE)
## .External(C_dnorm, x, mean, sd, log)
## <bytecode: 0x1b1eebfc>
## <environment: namespace:stats>
Then hunt down the object which is called by .External. It has the name of the corresponding C function. Use PACKAGE:::OBJECT$name to find it:
stats:::C_dnorm$name
## [1] "dnorm"
See further: How can I view the source code for a function? (which also has information about where to find the source code of compiled functions) and How to see the source code of R .Internal or .Primitive function?
Finally, you'll have to somehow screen the C code and all the functions it calls for BLAS routines...
LD_PRELOAD a library which logs whenever a BLAS function is called
You could develop a shared library which exports the BLAS symbol names but just logs each call (and where it was called from) before forwarding it to the real BLAS routines... The LD_PRELOAD UNIX environment variable can be used to load this library instead of BLAS. This only works if R was compiled to load BLAS as a dynamically linked library.
https://blog.netspi.com/function-hooking-part-i-hooking-shared-library-function-calls-in-linux/
See also: Why can R be linked to a shared BLAS later even if it was built with `--with-blas = lblas`?
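Before going down that route, it may also help to check whether (and which) shared BLAS/LAPACK your R is actually using; recent R versions report this directly (a quick sketch, assuming R >= 3.4):
sessionInfo()  # the header includes "BLAS:" and "LAPACK:" lines with the library paths in use
La_version()   # version of the LAPACK in use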
It is possible to record the time that was used to run some code using system.time. Here is a little example:
system.time(
mean(rnorm(10^6))
)
But I am not only interested in the time but also in the number of arithmetic operations (that is, +, -, *, /) that were used by the code.
In the above-mentioned case it would be easy to count the number of summations and the division in order to get the mean, but the code I would like to apply this to is far more complex.
Therefore, my question is: is there a function in R that counts the number of arithmetic operations?
You can trace the R functions of interest:
counter <- 0
trace("+", at = 1, print = FALSE,
tracer = quote(.GlobalEnv$counter <- .GlobalEnv$counter + 1))
#Tracing function "+" in package "base"
#[1] "+"
Reduce("+", 1:10)
#[1] 55
counter
#[1] 9
untrace("+")
#Untracing function "+" in package "base"
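The same idea extends to all four operators; a rough sketch (assuming, as above, that trace() intercepts these primitives in your R version):
ops <- c("+", "-", "*", "/")
counter <- setNames(numeric(length(ops)), ops)
for (op in ops) {
  trace(op, at = 1, print = FALSE,
        tracer = bquote(.GlobalEnv$counter[.(op)] <- .GlobalEnv$counter[.(op)] + 1))
}
counter[] <- 0                 # reset, in case setting up the traces itself used "+"
a <- 2; b <- 3
res <- (a + b) * a / b - a
counter                        # each operator should have been counted once
for (op in ops) untrace(op)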
I'm not sure how useful it would be to count R level calls here. Many (most?) functions do arithmetic in C or Fortran code or even the BLAS. And I don't have a solution for counting calls in compiled code. You'd need to set that up during compilation if it is possible at all.
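To illustrate that caveat with the counter from above: sum() performs its additions inside compiled code, so the traced "+" is never invoked.
counter <- 0
trace("+", at = 1, print = FALSE,
      tracer = quote(.GlobalEnv$counter <- .GlobalEnv$counter + 1))
sum(1:1000)   # 999 additions, all done in C
counter       # 0 -- nothing was counted at the R level
untrace("+")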
I would like to write a function fun1 that takes a DataArrays.DataArray y as its only argument. y can hold either integers or floats (in vector or in matrix form).
I have tried to follow the suggestions I found on Stack Overflow (Functions that take DataArrays and Arrays as arguments in Julia) and in the official documentation (http://docs.julialang.org/en/release-0.5/manual/methods/). However, I couldn't write code flexible enough to deal with the uncertainty about y's element type.
I would like to have something like (but capable of handling numerical DataArrays.DataArray):
function fun1(y::Number)
println(y);
end
Any suggestion?
One option is to define:
fun1{T<:Number}(yvec::DataArray{T}) = foreach(println,yvec)
Then,
using DataArrays
v = DataArray(rand(10))
w = DataArray(rand(1:10,10))
fun1(v)
#
# elements of v printed as Float64s
#
fun1(w)
#
# elements of w printed as Ints
#
A delicate but recurring point to note is the invariance of Julia's parametric types, which necessitates defining a parametric function. A look at the documentation regarding types should clarify this concept (http://docs.julialang.org/en/release-0.4/manual/types/#types).
Is there a way to test whether two objects are identical in the R language?
For clarity: I do not mean identical in the sense of the identical function,
which compares objects based on certain properties like numerical values or logical values etc.
I am really interested in object identity, which for example could be tested using the is operator in the Python language.
UPDATE: A more robust and faster implementation of address(x) (not using .Internal(inspect(x))) was added to data.table v1.8.9. From NEWS:
New function address() returns the address in RAM of its argument. Sometimes useful in determining whether a value has been copied or not by R, programatically.
There's probably a neater way but this seems to work.
address = function(x) substring(capture.output(.Internal(inspect(x)))[1],2,17)
x = 1
y = 1
z = x
identical(x,y)
# [1] TRUE
identical(x,z)
# [1] TRUE
address(x)==address(y)
# [1] FALSE
address(x)==address(z)
# [1] TRUE
You could modify it to work on 32-bit systems by changing 17 to 9.
You can use the pryr package.
For example, return the memory location of the mtcars object:
pryr::address(mtcars)
Then, for variables a and b, you can check:
address(a) == address(b)
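For instance (a small sketch; the actual addresses differ from run to run), comparing addresses distinguishes object identity from value equality:
library(pryr)
a <- c(1, 2, 3)
b <- a                     # b binds to the same object as a (no copy yet)
d <- c(1, 2, 3)            # a distinct object with equal contents
identical(a, d)            # TRUE  -- equal values
address(a) == address(b)   # TRUE  -- same object
address(a) == address(d)   # FALSE -- different objects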