Error handling in Rcpp - r

How can I check, if a value is numeric and finite? Let's say I generate random numbers with Rf_rgamma or with my own routine. Depending on the parameters, errors could be generated. How can I check that within C and break a loop and the entire function in that event?
And how can I check if a vector, let's say an arma::vec from RcppArmadillo, contains only numeric and finite values?
I know, these are general questions. However, my specific problem takes minutes to be reproduced and I haven't been able to create a minimal example. Most of the time my function works fine, just 1 in 100.000 times it causes R to crash.

Here is one way: check each element. A simple demo:
R> cppFunction('int checker(double x) { return ::R_finite(x);} ')
R> checker(2)
[1] 1
R> checker(0)
[1] 1
R> checker(NaN)
[1] 0
R> checker(Inf)
[1] 0
R>

Related

R: How to convert long number to string to save precision

I have a problem to convert a long number to a string in R. How to easily convert a number to string to preserve precision? A have a simple example below.
a = -8664354335142704128
toString(a)
[1] "-8664354335142704128"
b = -8664354335142703762
toString(b)
[1] "-8664354335142704128"
a == b
[1] TRUE
I expected toString(a) == toString(b), but I got different values. I suppose toString() converts the number to float or something like that before converting to string.
Thank you for your help.
Edit:
> -8664354335142704128 == -8664354335142703762
[1] TRUE
> along = bit64::as.integer64(-8664354335142704128)
> blong = bit64::as.integer64(-8664354335142703762)
> along == blong
[1] TRUE
> blong
integer64
[1] -8664354335142704128
I also tried:
> as.character(blong)
[1] "-8664354335142704128"
> sprintf("%f", -8664354335142703762)
[1] "-8664354335142704128.000000"
> sprintf("%f", blong)
[1] "-0.000000"
Edit 2:
My question first was, if I can convert a long number to string without loss. Then I realized, in R is impossible to get the real value of a long number passed into a function, because R automatically read the value with the loss.
For example, I have the function:
> my_function <- function(long_number){
+ string_number <- toString(long_number)
+ print(string_number)
+ }
If someone used it and passed a long number, I am not able to get the information, which number was passed exactly.
> my_function(-8664354335142703762)
[1] "-8664354335142704128"
For example, if I read some numbers from a file, it is easy. But it is not my case. I just need to use something that some user passed.
I am not R expert, so I just was curious why in another language it works and in R not. For example in Python:
>>> def my_function(long_number):
... string_number = str(long_number)
... print(string_number)
...
>>> my_function(-8664354335142703762)
-8664354335142703762
Now I know, the problem is how R reads and stores numbers. Every language can do it differently. I have to change the way how to pass numbers to R function, and it solves my problem.
So the correct answer to my question is:
""I suppose toString() converts the number to float", nope, you did it yourself (even if unintentionally)." - Nope, R did it itself, that is the way how R reads numbers.
So I marked r2evans answer as the best answer because this user helped me to find the right solution. Thank you!
Bottom line up front, you must (in this case) read in your large numbers as string before converting to 64-bit integers:
bit64::as.integer64("-8664354335142704128") == bit64::as.integer64("-8664354335142703762")
# [1] FALSE
Some points about what you've tried:
"I suppose toString() converts the number to float", nope, you did it yourself (even if unintentionally). In R, when creating a number, 5 is a float and 5L is an integer. Even if you had tried to create it as an integer, it would have complained and lost precision anyway:
class(5)
# [1] "numeric"
class(5L)
# [1] "integer"
class(-8664354335142703762)
# [1] "numeric"
class(-8664354335142703762L)
# Warning: non-integer value 8664354335142703762L qualified with L; using numeric value
# [1] "numeric"
more appropriately, when you type it in as a number and then try to convert it, R processes the inside of the parentheses first. That is, with
bit64::as.integer64(-8664354335142704128)
R first has to parse and "understand" everything inside the parentheses before it can be passed to the function. (This is typically a compiler/language-parsing thing, not just an R thing.) In this case, it sees that it appears to be a (large) negative float, so it creates a class numeric (float). Only then does it send this numeric to the function, but by this point the precision has already been lost. Ergo the otherwise-illogical
bit64::as.integer64(-8664354335142704128) == bit64::as.integer64(-8664354335142703762)
# [1] TRUE
In this case, it just *happens that the 64-bit version of that number is equal to what you intended.
bit64::as.integer64(-8664254335142704128) # ends in 4128
# integer64
# [1] -8664254335142704128 # ends in 4128, yay! (coincidence?)
If you subtract one, it results in the same effective integer64:
bit64::as.integer64(-8664354335142704127) # ends in 4127
# integer64
# [1] -8664354335142704128 # ends in 4128 ?
This continues for quite a while, until it finally shifts to the next rounding point
bit64::as.integer64(-8664254335142703617)
# integer64
# [1] -8664254335142704128
bit64::as.integer64(-8664254335142703616)
# integer64
# [1] -8664254335142703104
It is unlikely to be coincidence that the difference is 1024, or 2^10. I haven't fished yet, but I'm guessing there's something meaningful about this with respect to floating point precision in 32-bit land.
fortunately, bit64::as.integer64 has several S3 methods, useful for converting different formats/classes to a integer64
library(bit64)
methods(as.integer64)
# [1] as.integer64.character as.integer64.double as.integer64.factor
# [4] as.integer64.integer as.integer64.integer64 as.integer64.logical
# [7] as.integer64.NULL
So, bit64::as.integer64.character can be useful, since precision is not lost when you type it or read it in as a string:
bit64::as.integer64("-8664354335142704128")
# integer64
# [1] -8664354335142704128
bit64::as.integer64("-8664354335142704128") == bit64::as.integer64("-8664354335142703762")
# [1] FALSE
FYI, your number is already near the 64-bit boundary:
-.Machine$integer.max
# [1] -2147483647
-(2^31-1)
# [1] -2147483647
log(8664354335142704128, 2)
# [1] 62.9098
-2^63 # the approximate +/- range of 64-bit integers
# [1] -9.223372e+18
-8664354335142704128
# [1] -8.664354e+18

Loss of decimal places when calculating mean in R

I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list I get an arithmetic mean
.
Yet if I list the numbers individually and then use the mean() function, I get a different output
I know that this is caused by a rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a vector of values, not multiple values. It is also a generic function so it is tolerant of additional parameters it doesn't understand (but doesn't warn you about them). See
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) #only "1" is used here, 5 and 6 are ignored.
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing a list of arguments, and just the first one is considered.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and others limits the number os digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic

$value in unidimensional integrals in R [duplicate]

I have transitioned from STATA to R, and I was experimenting with different data types so that R's data structures are clear in my mind.
Here's how I set up my data structure:
b<-list(u=5,v=12)
c<-list(u=7)
j<-list(name="Joe",salary=55000,union=T)
bcj<-list(b,c,j)
Now, I was trying to figure out different ways to access u=5. I believe there are three ways:
Try1:
bcj[[1]][[1]]
I got 5. Correct!
Try2:
bcj[[1]][["u"]]
I got 5. Correct!
Try3:
bcj[[1]]$u
I got 5. Correct!
Try4
bcj[[1]][1][1]
Here's what I got:
bcj[[1]][1][1]
$u
[1] 5
class(bcj[[1]][1][1])
[1] "list"
Question 1: Why did this happen?
Also, I experimented with the following:
bcj[[1]][1][1][1][1][1]
$u
[1] 5
class(bcj[[1]][1][1][1][1][1])
[1] "list"
Question 2: I would have expected an error because I don't think so many lists exist in bcj, but R gave me a list. Why did this happen?
PS: I did look at this thread on SO, but it's talking about a different issue.
I think this is sufficient to answer your question. Consider a length-1 list:
x <- list(u = 5)
#$u
#[1] 5
length(x)
#[1] 1
x[1]
x[1][1]
x[1][1][1]
...
always gives you the same:
#$u
#[1] 5
In other words, x[1] will be identical to x, and you fall into infinite recursion. No matter how many [1] you write, you just get x itself.
If I create t1<-list(u=5,v=7), and then do t1[2][1][1][1]...this works as well. However, t1[[2]][2] gives NA
That is the difference between [[ and [ when indexing a list. Using [ will always end up with a list, while [[ will take out the content. Compare:
z1 <- t1[2]
## this is a length-1 list
#$v
#[1] 7
class(z1)
# "list"
z2 <- t1[[2]]
## this takes out the content; in this case, a vector
#[1] 7
class(z2)
#[1] "numeric"
When you do z1[1][1]..., as discussed above, you always end up with z1 itself. While if you do z2[2], you surely get an NA, because z2 has only one element, and you are asking for the 2nd element.
Perhaps this post and my answer there is useful for you: Extract nested list elements using bracketed numbers and names?

i not showing up as number in loop

so I have a loop that finds the position in the matrix where there is the largest difference in consecutive elements. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
thenumber = i
}
}
}
This looks like it should work but whenever I run it
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried this thing but with a random number in the brackets instead of i and it works. For some reason it only doesn't work when I use the i specified in the beginning of the for-loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task I'd appreciate it greatly if you could explain it to me
You are pretty close but when you call i in 1:nrow(thematrix) - 1 R evaluates this to make i = 0 which is what causes this issue. I would suggest either calling i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 to start your loop at i = 1. I think your approach is generally pretty intuitive but one suggestion would be to frequently use the print() function to evaluate how i changes over the course of your function.
The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
Where the error message is coming from trying to access the zero-th element of thematrix:
R> thematrix[0]
integer(0)
The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.

Using as.hexmode with R

I have some R code:
writePoint <- page * 2^12 + offset
localCount<-0
instructions <- 0
while(localCount < lengthI$length) {
cat("<instruction address=\"")
cat(as.hexmode(writePoint))
However writePoint is always written as a decimal number. What am I doing wrong?
That's kind of interesting. Here's a bit more compact demonstration and a start toward an explanation:
> cat(as.hexmode(10))
10
> cat(as.hexmode(20))
20
> as.hexmode(20)
[1] "14"
> str(as.hexmode(20))
Class 'hexmode' int 20
So a hexmode number has a print method (seen by typing methods(print) at the console) and it coerces it to a character when it is printed but it doesn't really change its internal representation as a number, so cat give you back a decimal number. Notice that the help page for cat says (but I will admit this behavior was not really implied by this text and I would have thought that it meant that cat would give 14 or 0x14):
> 0x14
[1] 20
cat converts numeric/complex elements in the same way as print (and not in the same way as as.character which is used by the S equivalent), so options "digits" and "scipen" are relevant.
Might want to use the as.character coercion to get what you want:
> as.character(as.hexmode(20))
[1] "14"

Resources