Rcpp and int64 NA value - r

How can I pass an NA value from Rcpp to R in a 64 bit vector?
My first approach would be:
// [[Rcpp::export]]
Rcpp::NumericVector foo() {
Rcpp::NumericVector res(2);
int64_t val = 1234567890123456789;
std::memcpy(&(res[0]), &(val), sizeof(double));
res[1] = NA_REAL;
res.attr("class") = "integer64";
return res;
}
But it yields
#> foo()
integer64
[1] 1234567890123456789 9218868437227407266
I need to get
#> foo()
integer64
[1] 1234567890123456789 <NA>

It's really much, much simpler. We have the behaviour of an int64 in R offered by (several) add-on packages the best of which is bit64 giving us the integer64 S3 class and associated behavior.
And it defines the NA internally as follows:
#define NA_INTEGER64 LLONG_MIN
And that is all that there is. R and its packages are foremost C code, and LLONG_MIN exists there and goes (almost) back all the way to founding fathers.
There are two lessons here. The first is the extension of IEEE defining NaN and Inf for floating point values. R actually goes way beyond and adds NA for each of its types. In pretty much the way above: by reserving one particular bit pattern. (Which, in one case, is the birthday of one of the two original R creators.)
The other is to admire the metric ton of work Jens did with the bit64 package and all the required conversion and operator functions. Seamlessly converting all possibly values, including NA, NaN, Inf, ... is no small task.
And it is a neat topic that not too many people know. I am glad you asked the question because we now have a record here.

Alright, I think I found an answer... (not beautiful, but working).
Short Answer:
// [[Rcpp::export]]
Rcpp::NumericVector foo() {
Rcpp::NumericVector res(2);
int64_t val = 1234567890123456789;
std::memcpy(&(res[0]), &(val), sizeof(double));
# This is the magic:
int64_t v = 1ULL << 63;
std::memcpy(&(res[1]), &(v), sizeof(double));
res.attr("class") = "integer64";
return res;
}
which results in
#> foo()
integer64
[1] 1234567890123456789 <NA>
Longer Answer
Inspecting how bit64 stores an NA
# the last value is the max value of a 64 bit number
a <- bit64::as.integer64(c(1, 2, NA, 9223372036854775807))
a
#> integer64
#> [1] 1 2 <NA> <NA>
bit64::as.bitstring(a[3])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"
bit64::as.bitstring(a[4])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"
Created on 2020-04-23 by the reprex package (v0.3.0)
we see that it is a 10000.... This can be recreated in Rcpp with int64_t val = 1ULL << 63;. Using memcpy() instead of a simple assign with = ensures that no bits are changed!

Related

Is there any way in which to make an Infix function using sourceCpp()

I was wondering whether it is possible to make an infix function, e.g. A %o% B with Rcpp.
I know that this is possible using the inline package, but have yet been able to find a method for doing this when using sourceCpp().
I have made the following infix implementation of %o% / outer() when arguments are sure to be vectors using RcppEigen and inline:
`%op%` <- cxxfunction(signature(v1="NumericVector",
v2="NumericVector"),
plugin = "RcppEigen",
body = c("
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
"))
This can easily be implemented in to be imported using sourceCpp(), however not as an infix function.
My current attempt is as follows:
#include <Rcpp.h>
using namespace Rcpp;
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
NumericMatrix outerProd(NumericVector v1, NumericVector v2) {
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
}
So to summarize my question.. Is it possible to make an infix function available through sourceCpp?
Is it possible to make an infix function available through sourceCpp?
Yes.
As always, one should read the Rcpp vignettes!
In particular here, if you look in Section 1.6 of the Rcpp attributes vignette, you'd see you can modify the name of a function using the name parameter for Rcpp::export.
For example, we could do:
#include <Rcpp.h>
// [[Rcpp::export(name = `%+%`)]]
Rcpp::NumericVector add(Rcpp::NumericVector x, Rcpp::NumericVector y) {
return x + y;
}
/*** R
1:3 %+% 4:6
*/
Then we'd get:
Rcpp::sourceCpp("~/infix-test.cpp")
> 1:3 %+% 4:6
[1] 5 7 9
So, you still have to name C++ functions valid C++ names in the code, but you can export it to R through the name parameter of Rcpp::export without having to do anything further on the R side.
John Chambers states three principles on page four of the (highly recommended) "Extending R" book:
Everything that exists in R is an object.
Everything that happens in R is a function call.
Interfaces to other software are part of R.
So per point two, you can of course use sourceCpp() to create your a compiled function and hang that at any odd infix operator you like.
Code Example
library(Rcpp)
cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
`%+%` <- function(a,b) cc(a,b)
cc("Hello", "World")
"hello" %+% "world"
Output
R> library(Rcpp)
R> cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
R> `%+%` <- function(a,b) cc(a,b)
R>
R> cc("Hello", "World")
[1] "HelloWorld"
R>
R> "hello" %+% "world"
[1] "helloworld"
R>
Summary
Rcpp is really just one cog in the machinery.
Edit
It also works with your initial function, with some minor simplification. For
`%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
as.numeric(1:3) %op% as.numeric(3:1)
we get
R> `%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
R> as.numeric(1:3) %op% as.numeric(3:1)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 4 2
[3,] 9 6 3
R>
(modulo some line noise from the compiler).

Converting String Versions of "Infinity" to Numeric in Rcpp

I have some JSON response that encodes Inf/-Inf/NaN as strings, so the JSON array it returns will look like [1.0, "Infinity", 2.0]. I parse this using a JSON library and end up with a list that looks like list(1.0, "Infinity", 2.0) and I want to convert it to be list(1.0, Inf, 2.0), for performance reasons I need this to use Rcpp. Here is the code I tried doing but I can't seem to get Rcpp to not yell at me about
library(Rcpp)
cppFunction('
NumericVector convertThings(List data) {
const size_t num_rows = data.size();
NumericVector rv(num_rows);
for (size_t i = 0; i < num_rows; ++i) {
if (as<String>(data[i]) == "Infinity") {
rv[i] = R_PosInf;
} else {
rv[i] = as<double>(data[i]);
}
}
return rv;
}
')
convertThings(list('Infinity', 1.0))
# expected output c(Inf, 1.0)
The error I am seeing is Error: not compatible with requested type. Help is much appreciated!
That is a basic C++ problem: how to convert text to numbers reliably.
One possibly answer is provided by the Boost.Lexical_Cast library and illustrated in this Rcpp Gallery post. Just using the first example:
R> library(Rcpp)
R> sourceCpp("/tmp/boostLexicalCastExample.cpp") # from post
R> lexcicalCast(c("Inf", "inf", "Infinity", "NA", 42))
[1] Inf Inf Inf NA 42
R>
As you can see, it matches at least three different ways of spelling infinity in text.

Rcpp: extract subset of matrix using indexmatrix

I have a question about subsetting from a matrix to a vector. The user has the possibility to explicitly give the indexmatrix (which is a matrix of the same size as M, with 0 if the entry is not wanted, and 1 if the entry has to be extracted). If the indexmatrix is provided, then we just subset it, and if the indexmatrix is not provided (indexmatrix = NULL), then we build it using type1 (which takes true or false). Only two types of indexmatrices are possible.
I used the subsetting technique provided in
Subset of a Rcpp Matrix that matches a logical statement
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::colvec extractElementsRcpp(arma::mat M,
Rcpp::Nullable<Rcpp::NumericMatrix> indexmatrix = R_NilValue,
bool type1 = false) {
unsigned int D = M.n_rows; // dimension of the data
arma::mat indmatrix(D, D); // initialize indexmatrix
if (indexmatrix.isNotNull()) {
// copy indexmatrix to numericmatrix
Rcpp::NumericMatrix indexmatrixt(indexmatrix);
// make indexmatrix into arma matrix indmatrix
indmatrix = Rcpp::as<arma::mat>(indexmatrixt);
} //else {
// get indexmatrix
// Rcpp::NumericMatrix indexmatrixt = getindexmatrix(D, type1)["indexmatrix"];
// // make indexmatrix into arma matrix
// indmatrix = Rcpp::as<arma::mat>(indexmatrixt);
// }
arma::colvec unM = M.elem(find(indmatrix == 1)); // extract wanted elements
return(unM);
}
It works, great! However, the speed is not what I was hoping for. Whenever the indexmatrix is provided, the C++ code is slower than the normal R code, while I was aiming for a nice improvement in speed. I have the feeling I'm copying the matrices around too much. But I am new to C++ and did not find a way to avoid it yet.
The speed comparison is as follows:
test replications elapsed relative
2 extractElementsR(M, indexmatrix = ind) 100 0.084 1.00
1 extractElementsRcpp(M, indexmatrix = ind) 100 0.142 1.69
EDIT: The R function is defined as
extractElementsR <- function (M, indexmatrix, type1 = FALSE) {
D <- nrow(M)
# # get indexmatrix, if necessary
# if(is.null(indexmatrix)) indexmatrix <- getindexmatrix(D, type1 = type1)$indexmatrix
# extract wanted elements
return (M[which(indexmatrix > 0)])
}
One could for example take the matrices
M <- matrix(rnorm(1000^2), ncol = 1000)
indexmatrix <- matrix(1, 1000, 1000)
indexmatrix[lower.tri(indexmatrix)] <- 0
as M and indexmatrix.
EDIT2: I commented the else statement in the Rcpp function and omitted the default NULL value in the R function as it is not important for my question. I want to improve the speed of the Rcpp function when indexmatrix is provided. However, I want to keep the default NULL value (and create and indexmatrix when necessary).
Can you show
the function extractElementR() as well and
example data so that this become a reproducible example?
And at first blush, you are mixing Rcpp and RcppArmadillo types in order to subset with the latter. That will create lots of copies. We can now index with both Rcpp (and Kevin has some answers here) and RcppArmadillo (several older answers) so you could even try two different ways.

A simple RcppArmadillo index issue

I'm coding with RcppArmadillo and got stuck with a very basic question. Suppose I have a vector "v", and I want to take its first 10 elements, like in R: v[1:10]. Since 1:10 doesn't work in RcppArmadillo, I tried v.elem(seq_len(10)), but it didn't work. Any hint?
Assuming you're taking about an arma::vec, this should work:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(const arma::vec & v, int first, int last) {
arma::vec out = v.subvec(first, last);
return out;
}
/*** R
f(11:20, 3, 6)
*/
Note that this uses zero-based indexing (11 is the 0th element of the vector). Coerce to NumericVector as desired.
When source'ed into R, the code is compiled, linked, loaded and the embedded example is executed:
R> sourceCpp("/tmp/armaEx.cpp")
R> f(11:20, 3, 6)
[,1]
[1,] 14
[2,] 15
[3,] 16
[4,] 17
R>
So it all really is _just one call to subvec(). See the Armadillo documentation for more.

return NA via RCpp

Newbie RCpp question here: how can I make a NumericVector return NA to R? For example, suppose I have a RCpp code that assigns NA to the first element of a vector.
// [[RCpp::export]]
NumericVector myFunc(NumericVector x) {
NumericVector y=clone(x);
y[0]=NA; // <----- what's the right expression here?
return y;
}
The canonical way to get the correct NA for a given vector type (NumericVector, IntegerVector, ...) is to use the static get_na method. Something like:
y[0] = NumericVector::get_na() ;
FWIW, your code works as is with Rcpp11, which knows how to convert from the static Na_Proxy instance NA into the correct missing value for the target type.
Please at least try to grep through the large corpus of examples provided by our regression tests:
edd#max:~$ cd /usr/local/lib/R/site-library/Rcpp/unitTests/cpp/
edd#max:/usr/local/lib/R/site-library/Rcpp/unitTests/cpp$ grep NA *cpp | tail -5
support.cpp: Rf_pentagamma( NA_REAL) ,
support.cpp: expm1( NA_REAL ),
support.cpp: log1p( NA_REAL ),
support.cpp: Rcpp::internal::factorial( NA_REAL ),
support.cpp: Rcpp::internal::lfactorial( NA_REAL )
edd#max:/usr/local/lib/R/site-library/Rcpp/unitTests/cpp$
Moreover, this is actually a C question for R and answered in the Writing R Extensions manual.
You could also have found a good example in this Rcpp Gallery post as well as others; there is a search function right at the Rcpp Gallery.

Resources