The R API allows us to manipulate a SEXP directly via a pointer, which simplifies any treatment that relies on casting away from the original data type. For example, we can use an unsigned int pointer to work with a SEXP of real or integer type. The problem is that R automatically casts from logical to integer SEXPs. The internal R headers define logical as the C int type, which causes - I think - an inconsistent state. For example, if I use this code:
// [[Rcpp::export]]
SEXP test(SEXP x) {
    int* arr = INTEGER(x);
    arr[0] = 77;
    return x;
}
and I run in R:
x <- NA            ## by default NA is a logical vector
is.logical(x)      ## returns TRUE
test(x)            ## returns TRUE
is.logical(x)      ## still returns TRUE
print(x + 0L)      ## one would normally expect NA, but it gives 77
max(x)             ## gives 77!
Most basic functions (sum, max, min, ...) now treat x as an integer.
The same problem appears with Rcpp, which blocks the in-place modification. For example:
// [[Rcpp::export]]
IntegerVector test1(IntegerVector x) {
    x[0] = 77;
    return x;
}
using R:
x <- NA
test1(x)            ## x is still NA
x <- as.integer(x)
test1(x)            ## x is now 77
Finally, is there a way to overcome this critical cast from logical to integer?
A logical in R has the same number of bytes per element as an integer (4 bytes). This is different from C, where a bool has 1 byte* and an int has 4 bytes. R probably does this so that up-casting logical to integer is instantaneous and vector multiplication between logical and integer vectors has no overhead.
What you're doing in both cases is accessing the pointer to the start of the vector and setting the first 4 bytes to the value that corresponds to 77.
On the R side, the variable named "x" still points to the same underlying data. But since you changed that underlying data, x now holds bytes that correspond to an int of 77.
An int of 77 doesn't mean anything as a logical, since it can't arise from normal operations. So what R does when you force an impossible value into a logical is essentially undefined.
A logical in R can only have three values: TRUE (corresponds to a value of 1), FALSE (corresponds to a value of 0) and NA (corresponds to a value of -2147483648).
*(Technically, implementation defined but I've only seen it as 1 byte)
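Coming back to the original question: a minimal sketch of one way to guard against the reinterpretation is to check TYPEOF() before writing through the pointer and coerce logical input to integer first. Note that test_safe is a made-up name and this is only an illustration, not an established API:
Rcpp::cppFunction('
SEXP test_safe(SEXP x) {
    if (TYPEOF(x) == LGLSXP)               // refuse to reinterpret a logical
        x = Rf_coerceVector(x, INTSXP);    // makes a fresh integer copy
    if (TYPEOF(x) != INTSXP)
        Rcpp::stop("expected an integer (or logical) vector");
    INTEGER(x)[0] = 77;
    return x;
}')
x <- NA
test_safe(x)    ## returns 77; a copy was modified, so...
is.logical(x)   ## ...x itself is still the untouched logical NA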
I'm trying to understand what the C++ sizeof operator does when operating on an Rcpp vector. As an example:
library(Rcpp)
cppFunction('int size_of(NumericVector a) {return(sizeof a);}')
size_of(1.0)
# [1] 16
This returns the value 16 for any numeric or integer vector passed to it, as does
cppFunction('int size_of(IntegerVector a) {return(sizeof a);}')
size_of(1)
# [1] 16
I thought that numerics in R were 8 bytes and integers 4 bytes. So what is going on here? The motivation is to use memcpy on Rcpp vectors, for which the size needs to be known.
Everything we pass from R to C(++) and return is a SEXP type -- a pointer to an S Expression.
So if we generalize your function and actually let a SEXP in, we can see some interesting things:
R> Rcpp::cppFunction('int size_of(SEXP a) {return(sizeof a);}')
R> size_of(1L) ## single integer -- still a pointer
[1] 8
R> size_of(1.0) ## single double -- still a pointer
[1] 8
R> size_of(seq(1:100)) ## a sequence ...
[1] 8
R> size_of(help) ## a function
[1] 8
R> size_of(globalenv) ## an environment
[1] 8
R>
In short, you got caught between a compile-time C++ type analysis operator (sizeof) and the run-time feature that everything is morphed into the SEXP type. For actual vectors, you probably want the size() or length() member functions, and so on.
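For example, a minimal sketch of getting the element count and the payload size in bytes (payload_bytes is a made-up name for illustration):
Rcpp::cppFunction('
double payload_bytes(Rcpp::NumericVector a) {
    // size() is the number of elements; each one is sizeof(double) bytes
    return a.size() * sizeof(double);
}')
payload_bytes(c(1.0, 2.0, 3.0))   ## 24 on platforms with 8-byte doubles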
You would have to get into how NumericVector and IntegerVector are implemented to discover why they statically take up a certain number of bytes.
Based on your observation of the size of a "numeric" or "integer" in this context, it is likely that the value 16 accounts for any/all of the following:
Pointer to [dynamically-allocated?] data
Current logical size of container (number of elements)
Any other metadata
Ideally, don't use memcpy to transfer the state of one object to another, unless you are absolutely certain that it is a trivial object with only members of built-in type. If I have correctly guessed the layout of a NumericVector, using memcpy on it will violate its ownership semantics and thus be incorrect. There are other ways to copy R vectors.
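One such way, sketched here under the assumption that a deep copy is the goal, is Rcpp::clone() (copy_vec is a made-up name):
Rcpp::cppFunction('
Rcpp::NumericVector copy_vec(Rcpp::NumericVector a) {
    return Rcpp::clone(a);   // allocates a new R vector and copies the elements
}')
y <- copy_vec(c(1, 2, 3))    ## y is an independent copy of the input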
I often see the symbol 1L (or 2L, 3L, etc.) appear in R code. What's the difference between 1L and 1? 1 == 1L evaluates to TRUE. Why is 1L used in R code?
So, @James and @Brian explained what 3L means. But why would you use it?
Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).
Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:
x <- 1:100
typeof(x) # integer
y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)
z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)
...but also note that working excessively with integers can be dangerous:
1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!
...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.
A caveat, though, is that this applies to the current R version (2.13 at the time of writing). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
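A quick check of those bounds at the prompt (the overflow produces NA with a warning, not an error):
.Machine$integer.max          ## 2147483647
-.Machine$integer.max         ## the negation serves as the minimum
.Machine$integer.max + 1L     ## NA, with an integer-overflow warning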
From the Constants Section of the R Language Definition:
We can use the ‘L’ suffix to qualify any number with the intent of making it an explicit integer.
So ‘0x10L’ creates the integer value 16 from the hexadecimal representation. The constant 1e3L
gives 1000 as an integer rather than a numeric value and is equivalent to 1000L. (Note that the
‘L’ is treated as qualifying the term 1e3 and not the 3.) If we qualify a value with ‘L’ that is
not an integer value, e.g. 1e-3L, we get a warning and the numeric value is created. A warning
is also created if there is an unnecessary decimal point in the number, e.g. 1.L.
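Trying those forms at the prompt reproduces the behaviour described:
str(0x10L)    ## int 16
str(1e3L)     ## int 1000
str(1e-3L)    ## num 0.001, after a warning about the non-integer value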
L specifies an integer type, rather than the double used by the standard numeric class.
> str(1)
num 1
> str(1L)
int 1
To explicitly create an integer value for a constant, you can call the function as.integer or, more simply, use the "L" suffix.
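Both routes produce the same value:
str(as.integer(1))             ## int 1
identical(as.integer(1), 1L)   ## TRUE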
In R, we all know it is convenient, for those times when we want to ensure we are dealing with an integer, to specify it using the "L" suffix, like this:
1L
# [1] 1
If we don't explicitly tell R we want an integer it will assume we meant to use a numeric data type...
str( 1 * 1 )
# num 1
str( 1L * 1L )
# int 1
Why is "L" the preferred suffix, why not "I" for instance? Is there a historical reason?
In addition, why does R allow me to do (with warnings):
str(1.0L)
# int 1
# Warning message:
# integer literal 1.0L contains unnecessary decimal point
But not..
str(1.1L)
# num 1.1
#Warning message:
#integer literal 1.1L contains decimal; using numeric value
I'd expect both either to behave the same way or to return an error.
Why is "L" used as a suffix?
I've never seen it written down, but in short I theorise that it's for two reasons:
Because R handles complex numbers, which may be specified using the suffix "i", and this would be too similar to "I".
Because R's integers are 32-bit long integers, and "L" therefore appears to be a sensible shorthand for referring to this data type.
The value a long integer can take depends on the word size. R does not natively support integers with a word length of 64 bits. Integers in R have a word length of 32 bits and are signed, and therefore have a range of −2,147,483,648 to 2,147,483,647. Larger values are stored as doubles.
This wiki page has more information on common data types, their conventional names and ranges.
And also from ?integer
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
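The contrast is easy to see at the prompt (a double represents integers exactly up to 2^53):
2^40                ## 1099511627776, held exactly as a double
as.integer(2^40)    ## NA, with a warning: out of the integer range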
Why do 1.0L and 1.1L return different types?
The reason that 1.0L and 1.1L return different data types is that returning an integer for 1.1 would result in loss of information, whilst for 1.0 it would not (but you might want to know you no longer have a floating-point numeric). Buried deep within the lexical analyser (/src/main/gram.c:4463-4485) is this code (part of the function NumericValue()), which actually creates an int data type from a double input that is suffixed by an ASCII "L":
/* Make certain that things are okay. */
if(c == 'L') {
    double a = R_atof(yytext);
    int b = (int) a;
    /* We are asked to create an integer via the L, so we check that the
       double and int values are the same. If not, this is a problem and we
       will not lose information and so use the numeric value.
    */
    if(a != (double) b) {
        if(GenerateCode) {
            if(seendot == 1 && seenexp == 0)
                warning(_("integer literal %s contains decimal; using numeric value"), yytext);
            else {
                /* hide the L for the warning message */
                *(yyp-2) = '\0';
                warning(_("non-integer value %s qualified with L; using numeric value"), yytext);
                *(yyp-2) = (char)c;
            }
        }
        asNumeric = 1;
        seenexp = 1;
    }
}
Probably because R is written in C, and L is used for a (long) integer in C.
My question is: suppose you have implemented an algorithm that gives the number of iterations, and you would like to print that number out. But the output always has many decimal places, like the following:
64.00000000
Is it possible to get an integer by doing a type cast in R? How would you do it?
There are some gotchas in coercing to integer mode. Presumably you have a variety of numbers in some structure. If you are working with a matrix, then the print routine will display all the numbers at the same precision. However, you can change that level. If you have calculated this result with an arithmetic process, it may actually be slightly less than 64 but display as that value:
> 64.00000000-.00000099999
[1] 64
> 64.00000000-.0000099999
[1] 63.99999
So assuming you want all the values in whatever structure this is part of, to be displayed as integers, the safest would be:
round(64.000000, 0)
... since this could happen, otherwise.
> as.integer(64.00000000-.00000000009)
[1] 63
The other gotcha is that the range of values for integers is considerably smaller than the range of floating-point numbers.
The function is.integer can be used to test for integer mode.
is.integer(3)
[1] FALSE
is.integer(3L)
[1] TRUE
Neither round nor trunc will return a vector in integer mode:
is.integer(trunc(3.4))
[1] FALSE
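To get both the rounding and the integer mode, combine the two:
as.integer(round(63.99999999))               ## 64
is.integer(as.integer(round(63.99999999)))   ## TRUE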
Instead of trying to convert the output into an integer, find out why it is not an integer in the first place, and fix it there.
Did you initialize it as an integer, e.g. num.iterations <- 0L or num.iterations <- integer(1) or did you make the mistake of setting it to 0 (a numeric)?
When you incremented it, did you add 1 (a numeric) or 1L (an integer)?
If you are not sure, go through your code and check your variable's type using the class function.
Fixing the problem at the root could save you a lot of trouble down the line. It can also make your code more efficient, as numerous operations are faster on integers than on numerics.
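A minimal sketch of keeping the counter an integer end to end (the loop condition is illustrative):
num.iterations <- 0L                        ## initialized as an integer
while (num.iterations < 10L) {
    num.iterations <- num.iterations + 1L   ## adding 1L keeps it an integer
}
class(num.iterations)                       ## "integer"
num.iterations                              ## prints 10, with no decimal places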
The function as.integer() truncates towards zero, so for positive numbers you must add 0.5 to get a properly rounded result:
dd <- 64.00000000
as.integer(dd + 0.5)   ## 64
If you have a numeric matrix you wish to coerce to an integer matrix (e.g., you are creating a set of dummy variables from a factor), as.integer(matrix_object) will coerce the matrix to a vector, which is not what you want. Instead, you can use storage.mode(matrix_object) <- "integer" to maintain the matrix form.
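For example:
m <- matrix(c(1, 0, 1, 1), nrow = 2)   ## a numeric matrix
storage.mode(m) <- "integer"           ## coerces the elements, keeps the dim
is.integer(m)                          ## TRUE
is.matrix(m)                           ## TRUE -- still a matrix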