Return NA via Rcpp - r

Newbie Rcpp question here: how can I make a NumericVector return NA to R? For example, suppose I have Rcpp code that assigns NA to the first element of a vector.
// [[Rcpp::export]]
NumericVector myFunc(NumericVector x) {
NumericVector y=clone(x);
y[0]=NA; // <----- what's the right expression here?
return y;
}

The canonical way to get the correct NA for a given vector type (NumericVector, IntegerVector, ...) is to use the static get_na method. Something like:
y[0] = NumericVector::get_na() ;
FWIW, your code works as is with Rcpp11, which knows how to convert from the static Na_Proxy instance NA into the correct missing value for the target type.

Please at least try to grep through the large corpus of examples provided by our regression tests:
edd@max:~$ cd /usr/local/lib/R/site-library/Rcpp/unitTests/cpp/
edd@max:/usr/local/lib/R/site-library/Rcpp/unitTests/cpp$ grep NA *cpp | tail -5
support.cpp: Rf_pentagamma( NA_REAL) ,
support.cpp: expm1( NA_REAL ),
support.cpp: log1p( NA_REAL ),
support.cpp: Rcpp::internal::factorial( NA_REAL ),
support.cpp: Rcpp::internal::lfactorial( NA_REAL )
edd@max:/usr/local/lib/R/site-library/Rcpp/unitTests/cpp$
Moreover, this is actually a C question for R and answered in the Writing R Extensions manual.
You could also have found a good example in this Rcpp Gallery post as well as others; there is a search function right at the Rcpp Gallery.

Related

Using R, can I create special functions like %in% just one input (LEFT) or (RIGHT)?

The common notation for factorial is the ! operator in mathematics.
I can create function:
"%!%" = function(n, r=NULL) { factorial(n); }
This works as expected if I pass a NULL or NA to the RHS, which I don't really want to do.
3 %!% NA
3 %!% NULL
3 %!% .
What I would like to do is just enter:
3 %!%
Any suggestions on HOW I can do that? In that setup, I want the LHS (left) to be the input and the RHS (right) to be ignored.
I can do the BOTH no problem:
nPr = function(n, r, replace=FALSE)
{
if(replace) { return( n^r ); }
factorial(n) / factorial(n-r);
}
"%npr%" = "%nPr%" = nPr;
nCr = function(n, r, replace=FALSE)
{
# same function (FALSE, with n+r-1)
if(replace) { return( nCr( (n+r-1), r, replace=FALSE ) ); }
factorial(n) / ( factorial(r) * factorial(n-r) );
}
"%ncr%" = "%nCr%" = nCr;
where
5 %nCr% 3
5 %nPr% 3
work as expected based on selection without replacement.
Question: How to use the special operator with just the LHS?
The follow-on question is the opposite. Let's say I want the LHS (left) to be ignored and focus on the RHS (right). I believe this is how the built-in ? function links to help and ?? links to help.search(). Let's say I wanted to create an %$$% operator that worked that way.
No, you can't do that. The %any% operators are defined by the parser to be binary operators.
You can see all of the operators in R in the ?Syntax help page. Some are binary, some are unary, and some can be either one, but the unary operators always precede the argument. You can attach different functions to most of them (e.g. change the meaning of ! in !x), but you can't change the parser to allow x! to be legal code.

Rcpp and int64 NA value

How can I pass an NA value from Rcpp to R in a 64 bit vector?
My first approach would be:
// [[Rcpp::export]]
Rcpp::NumericVector foo() {
Rcpp::NumericVector res(2);
int64_t val = 1234567890123456789;
std::memcpy(&(res[0]), &(val), sizeof(double));
res[1] = NA_REAL;
res.attr("class") = "integer64";
return res;
}
But it yields
#> foo()
integer64
[1] 1234567890123456789 9218868437227407266
I need to get
#> foo()
integer64
[1] 1234567890123456789 <NA>
It's really much, much simpler. The behaviour of an int64 in R is offered by (several) add-on packages, the best of which is bit64, which gives us the integer64 S3 class and associated behavior.
And it defines the NA internally as follows:
#define NA_INTEGER64 LLONG_MIN
And that is all that there is. R and its packages are foremost C code, and LLONG_MIN exists there and goes (almost) back all the way to founding fathers.
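As a rough sketch of that idea outside of R (plain C++, no Rcpp): the 64-bit value is copied byte-for-byte into a double slot, which is how an integer64 vector keeps its payload inside a numeric vector. Only the LLONG_MIN definition comes from bit64; the constant name kNaInteger64 and the helpers store_int64 and load_int64 are made up for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "this sketch assumes 64-bit doubles");

// Mirrors the bit64 definition quoted above; the constant name is hypothetical.
const std::int64_t kNaInteger64 = LLONG_MIN;

// Copy the raw bytes of a 64-bit integer into a double slot (no conversion).
void store_int64(double* slot, std::int64_t value) {
    assert(slot != nullptr);
    std::memcpy(slot, &value, sizeof(value));
}

// Read the same bytes back as a 64-bit integer.
std::int64_t load_int64(const double* slot) {
    assert(slot != nullptr);
    std::int64_t value;
    std::memcpy(&value, slot, sizeof(value));
    return value;
}
```

In the function from the question, a call along the lines of store_int64(&res[1], kNaInteger64) should take the place of res[1] = NA_REAL.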
There are two lessons here. The first is the extension of IEEE defining NaN and Inf for floating point values. R actually goes way beyond and adds NA for each of its types. In pretty much the way above: by reserving one particular bit pattern. (Which, in one case, is the birthday of one of the two original R creators.)
The other is to admire the metric ton of work Jens did with the bit64 package and all the required conversion and operator functions. Seamlessly converting all possible values, including NA, NaN, Inf, ... is no small task.
And it is a neat topic that not too many people know. I am glad you asked the question because we now have a record here.
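To make the "reserved bit pattern" point concrete for doubles, here is a small standalone C++ sketch (no R headers). It assumes the layout R uses for NA_real_: a NaN whose high 32-bit word is 0x7FF00000 and whose low word is 1954. The function names are invented for the example. Reading that pattern as a signed 64-bit integer gives 9218868437227407266, which is the value integer64 printed in the question after res[1] = NA_REAL.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of R's NA_real_: high word 0x7FF00000, low word 1954.
std::uint64_t na_real_bits() {
    const std::uint64_t high = 0x7FF00000u;
    const std::uint64_t low = 1954u;
    return (high << 32) | low;
}

// True if the pattern is an IEEE 754 NaN: exponent all ones, mantissa non-zero.
bool is_nan_pattern(std::uint64_t bits) {
    const std::uint64_t exponent = (bits >> 52) & 0x7FFu;
    const std::uint64_t mantissa = bits & ((std::uint64_t{1} << 52) - 1);
    return exponent == 0x7FFu && mantissa != 0;
}

// What integer64 sees when it reinterprets the same 8 bytes.
std::int64_t as_integer64(std::uint64_t bits) {
    // Keep the cast well-defined by staying in the non-negative range.
    assert(bits <= static_cast<std::uint64_t>(INT64_MAX));
    return static_cast<std::int64_t>(bits);
}
```

So the double NA is a perfectly ordinary (and large) number from the point of view of integer64, which is why it has to be replaced by the integer64-specific NA pattern.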
Alright, I think I found an answer... (not beautiful, but working).
Short Answer:
// [[Rcpp::export]]
Rcpp::NumericVector foo() {
Rcpp::NumericVector res(2);
int64_t val = 1234567890123456789;
std::memcpy(&(res[0]), &(val), sizeof(double));
// This is the magic:
int64_t v = 1ULL << 63;
std::memcpy(&(res[1]), &(v), sizeof(double));
res.attr("class") = "integer64";
return res;
}
which results in
#> foo()
integer64
[1] 1234567890123456789 <NA>
Longer Answer
Inspecting how bit64 stores an NA
# the last value is the max value of a 64 bit number
a <- bit64::as.integer64(c(1, 2, NA, 9223372036854775807))
a
#> integer64
#> [1] 1 2 <NA> <NA>
bit64::as.bitstring(a[3])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"
bit64::as.bitstring(a[4])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"
Created on 2020-04-23 by the reprex package (v0.3.0)
We see that NA is stored as a 1 followed by 63 zeros. This can be recreated in Rcpp with int64_t val = 1ULL << 63;. Using memcpy() instead of a simple assignment with = ensures that no bits are changed!
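The same inspection can be sketched in plain C++ (no bit64, no Rcpp). The helper names bitstring and sign_bit_only are hypothetical. The pattern is built in an unsigned integer and copied with memcpy, which avoids the out-of-range unsigned-to-signed conversion in int64_t v = 1ULL << 63.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Render the 64 bits of a value, most significant bit first.
std::string bitstring(std::int64_t value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    const std::string text = std::bitset<64>(bits).to_string();
    assert(text.size() == 64);
    return text;
}

// Build the "sign bit only" pattern in an unsigned integer, then copy the
// bytes into a signed one without any value conversion.
std::int64_t sign_bit_only() {
    const std::uint64_t bits = std::uint64_t{1} << 63;
    std::int64_t value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Since int64_t is a two's complement type, sign_bit_only() is the same value as INT64_MIN, so this agrees with the LLONG_MIN definition used by bit64.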

Converting String Versions of "Infinity" to Numeric in Rcpp

I have a JSON response that encodes Inf/-Inf/NaN as strings, so the JSON array it returns looks like [1.0, "Infinity", 2.0]. I parse this with a JSON library and end up with a list that looks like list(1.0, "Infinity", 2.0), which I want to convert to list(1.0, Inf, 2.0). For performance reasons I need to do this in Rcpp. Here is the code I tried, but I can't get Rcpp to stop raising an error:
library(Rcpp)
cppFunction('
NumericVector convertThings(List data) {
const size_t num_rows = data.size();
NumericVector rv(num_rows);
for (size_t i = 0; i < num_rows; ++i) {
if (as<String>(data[i]) == "Infinity") {
rv[i] = R_PosInf;
} else {
rv[i] = as<double>(data[i]);
}
}
return rv;
}
')
convertThings(list('Infinity', 1.0))
# expected output c(Inf, 1.0)
The error I am seeing is Error: not compatible with requested type. Help is much appreciated!
That is a basic C++ problem: how to convert text to numbers reliably.
One possible answer is provided by the Boost.Lexical_Cast library and illustrated in this Rcpp Gallery post. Just using the first example:
R> library(Rcpp)
R> sourceCpp("/tmp/boostLexicalCastExample.cpp") # from post
R> lexicalCast(c("Inf", "inf", "Infinity", "NA", 42))
[1] Inf Inf Inf NA 42
R>
As you can see, it matches at least three different ways of spelling infinity in text.
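If pulling in Boost is not an option, a similar effect can be sketched with the standard library alone. This is not the Gallery code: it swaps in std::strtod, which also accepts "inf" and "infinity" in any letter case, and the helper name parse_number is made up. Text that is not fully consumed as a number (such as "NA") is mapped to NaN here as a stand-in for R's NA; inside Rcpp one would presumably return NA_REAL instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: convert text to double, accepting the spellings that
// std::strtod understands ("Inf", "inf", "Infinity", ...). Anything that is
// not fully consumed as a number becomes NaN, standing in for R's NA.
double parse_number(const std::string& text) {
    const char* begin = text.c_str();
    char* end = nullptr;
    const double value = std::strtod(begin, &end);
    assert(end != nullptr);  // strtod always sets the end pointer
    const bool consumed_all = (end != begin) && (*end == '\0');
    return consumed_all ? value : std::numeric_limits<double>::quiet_NaN();
}
```

In the question's loop, each list element would first have to be checked for being a string or a number before calling something like this; calling as<double> on a string element is what triggers "not compatible with requested type".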

iferror equivalent in R

Here is the simplified version:
Suppose I am using 'which' function to find a position, say -
position_num=which(df$word=="ABC")
If the value exists, this is fine and it returns an integer. If nothing matches, it returns integer(0); in that case I want to assign a default value, position_num = 1.
Thanks for the help in advance
Something like the following if() statement would probably do it.
position_num = if(!length(w <- which(df$word == "ABC"))) 1 else w
Here we are just checking to see if the result from which() has a length, because
length(integer(0))
# [1] 0
And we return 1 if it doesn't have a length, and the which() result (w) otherwise.
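If the same lookup ever has to happen on the C++ side (for instance inside an Rcpp function), the logic carries over directly. The sketch below is plain C++ with a made-up helper name, which_or_default, and uses std::vector as a stand-in for an R character vector. It returns 1-based positions, like which(), and falls back to a default when nothing matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical analogue of which(words == target) with a fallback:
// returns 1-based positions of all matches, or {fallback} if there are none.
std::vector<std::size_t> which_or_default(const std::vector<std::string>& words,
                                          const std::string& target,
                                          std::size_t fallback = 1) {
    assert(fallback >= 1);  // positions are 1-based, as in R
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == target) positions.push_back(i + 1);
    }
    if (positions.empty()) positions.push_back(fallback);
    return positions;
}
```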
Would this work for you?
position_num=ifelse(length(which(df$word=="ABC")) != 0,which(df$word=="ABC"),1)

Rcpp: extract subset of matrix using indexmatrix

I have a question about subsetting from a matrix to a vector. The user has the possibility to explicitly give the indexmatrix (which is a matrix of the same size as M, with 0 if the entry is not wanted, and 1 if the entry has to be extracted). If the indexmatrix is provided, then we just subset it, and if the indexmatrix is not provided (indexmatrix = NULL), then we build it using type1 (which takes true or false). Only two types of indexmatrices are possible.
I used the subsetting technique provided in
Subset of a Rcpp Matrix that matches a logical statement
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::colvec extractElementsRcpp(arma::mat M,
Rcpp::Nullable<Rcpp::NumericMatrix> indexmatrix = R_NilValue,
bool type1 = false) {
unsigned int D = M.n_rows; // dimension of the data
arma::mat indmatrix(D, D); // initialize indexmatrix
if (indexmatrix.isNotNull()) {
// copy indexmatrix to numericmatrix
Rcpp::NumericMatrix indexmatrixt(indexmatrix);
// make indexmatrix into arma matrix indmatrix
indmatrix = Rcpp::as<arma::mat>(indexmatrixt);
} //else {
// get indexmatrix
// Rcpp::NumericMatrix indexmatrixt = getindexmatrix(D, type1)["indexmatrix"];
// // make indexmatrix into arma matrix
// indmatrix = Rcpp::as<arma::mat>(indexmatrixt);
// }
arma::colvec unM = M.elem(find(indmatrix == 1)); // extract wanted elements
return(unM);
}
It works, great! However, the speed is not what I was hoping for. Whenever the indexmatrix is provided, the C++ code is slower than the normal R code, while I was aiming for a nice improvement in speed. I have the feeling I'm copying the matrices around too much. But I am new to C++ and did not find a way to avoid it yet.
The speed comparison is as follows:
test replications elapsed relative
2 extractElementsR(M, indexmatrix = ind) 100 0.084 1.00
1 extractElementsRcpp(M, indexmatrix = ind) 100 0.142 1.69
EDIT: The R function is defined as
extractElementsR <- function (M, indexmatrix, type1 = FALSE) {
D <- nrow(M)
# # get indexmatrix, if necessary
# if(is.null(indexmatrix)) indexmatrix <- getindexmatrix(D, type1 = type1)$indexmatrix
# extract wanted elements
return (M[which(indexmatrix > 0)])
}
One could for example take the matrices
M <- matrix(rnorm(1000^2), ncol = 1000)
indexmatrix <- matrix(1, 1000, 1000)
indexmatrix[lower.tri(indexmatrix)] <- 0
as M and indexmatrix.
EDIT2: I commented out the else statement in the Rcpp function and omitted the default NULL value in the R function, as it is not important for my question. I want to improve the speed of the Rcpp function when indexmatrix is provided. However, I want to keep the default NULL value (and create an indexmatrix when necessary).
Can you show the function extractElementsR() as well, and example data, so that this becomes a reproducible example?
And at first blush, you are mixing Rcpp and RcppArmadillo types in order to subset with the latter. That will create lots of copies. We can now index with both Rcpp (and Kevin has some answers here) and RcppArmadillo (several older answers) so you could even try two different ways.
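To illustrate the "fewer copies" idea without committing to either library, here is a standalone C++ sketch. The function extract_by_mask is hypothetical, and std::vector stands in for the column-major storage of M and indexmatrix. Inputs are taken by const reference and the output is allocated once. Whether an Rcpp or RcppArmadillo version written along these lines actually beats the R code would need to be benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the extraction step: `values` and `mask` are the
// column-major contents of M and indexmatrix. Both are taken by const
// reference, so nothing is copied on the way in.
std::vector<double> extract_by_mask(const std::vector<double>& values,
                                    const std::vector<double>& mask) {
    assert(values.size() == mask.size());
    // First pass: count the wanted entries so the result is allocated once.
    std::size_t wanted = 0;
    for (double m : mask) {
        if (m == 1.0) ++wanted;
    }
    std::vector<double> out;
    out.reserve(wanted);
    // Second pass: copy only the selected elements, in column-major order.
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (mask[i] == 1.0) out.push_back(values[i]);
    }
    return out;
}
```

The selection test mirrors indmatrix == 1 from the question's C++ code; the R version uses indexmatrix > 0, which gives the same result for a 0/1 matrix.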
