Integers in R programming [duplicate] - r

In R, when we want to ensure we are dealing with an integer, it is convenient to specify it using the "L" suffix, like this:
1L
# [1] 1
If we don't explicitly tell R we want an integer it will assume we meant to use a numeric data type...
str( 1 * 1 )
# num 1
str( 1L * 1L )
# int 1
Why is "L" the preferred suffix, why not "I" for instance? Is there a historical reason?
In addition, why does R allow me to do (with warnings):
str(1.0L)
# int 1
# Warning message:
# integer literal 1.0L contains unnecessary decimal point
But not..
str(1.1L)
# num 1.1
# Warning message:
# integer literal 1.1L contains decimal; using numeric value
I'd expect both to return an error.

Why is "L" used as a suffix?
I've never seen it written down, but in short I theorise there are two reasons:
Because R handles complex numbers, which may be specified using the suffix "i", and this would be too similar to "I".
Because R's integers are 32-bit long integers, so "L" appears to be a sensible shorthand for referring to this data type.
The value a long integer can take depends on the word size. R does not natively support integers with a word length of 64 bits. Integers in R are signed with a word length of 32 bits and therefore have a range of −2,147,483,647 to 2,147,483,647 (the bit pattern for −2,147,483,648 is reserved for NA). Larger values are stored as double.
This wiki page has more information on common data types, their conventional names and ranges.
And also from ?integer
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.
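As a quick check of the range described above, here is a minimal sketch using only base R. The variable names `int_max` and `beyond` are illustrative, not taken from the original answer.

```r
# Largest value an integer vector element can hold (32-bit signed).
int_max <- .Machine$integer.max
is.integer(int_max)          # TRUE
int_max == 2^31 - 1          # TRUE

# Adding a plain 1 (a double) promotes the result to double, so nothing overflows.
beyond <- int_max + 1
typeof(beyond)               # "double"
beyond == 2^31               # TRUE

# Doubles hold every whole number exactly up to 2^53; past that, gaps appear.
(2^53 - 1) + 1 == 2^53       # TRUE
2^53 + 1 == 2^53             # TRUE, because 2^53 + 1 is not representable
```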
Why do 1.0L and 1.1L return different types?
1.0L and 1.1L return different data types because returning an integer for 1.1 would lose information, whilst for 1.0 it does not (but you might want to know that you no longer have a floating-point numeric). Buried deep within the lexical analyser (/src/main/gram.c:4463-4485) is this code (part of the function NumericValue()), which creates an int data type from a double input that is suffixed by an ASCII "L":
/* Make certain that things are okay. */
if(c == 'L') {
    double a = R_atof(yytext);
    int b = (int) a;
    /* We are asked to create an integer via the L, so we check that the
       double and int values are the same. If not, this is a problem and we
       will not lose information and so use the numeric value.
    */
    if(a != (double) b) {
        if(GenerateCode) {
            if(seendot == 1 && seenexp == 0)
                warning(_("integer literal %s contains decimal; using numeric value"), yytext);
            else {
                /* hide the L for the warning message */
                *(yyp-2) = '\0';
                warning(_("non-integer value %s qualified with L; using numeric value"), yytext);
                *(yyp-2) = (char)c;
            }
        }
        asNumeric = 1;
        seenexp = 1;
    }
}
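The effect of this check can be observed from R itself. The following is a rough sketch, assuming that the parser's warnings surface as ordinary warning conditions when the literal is parsed with parse(text = ...). The helper name `inspect_literal` is hypothetical and not part of R.

```r
# Hypothetical helper: parse a literal from text, record any parser warning,
# and report the type of the value the parser produced.
inspect_literal <- function(src) {
  msg <- NA_character_
  expr <- withCallingHandlers(
    parse(text = src),
    warning = function(w) {
      msg <<- conditionMessage(w)      # keep the warning text
      invokeRestart("muffleWarning")   # and carry on parsing
    }
  )
  list(type = typeof(eval(expr[[1]])), warning = msg)
}

inspect_literal("1L")$type     # "integer", no warning recorded
inspect_literal("1.0L")$type   # "integer", with a warning about the decimal point
inspect_literal("1.1L")$type   # "double", with a warning that the numeric value is used
```

The integer is kept only when converting the double to int and back gives the same value, which is the `a != (double) b` comparison in the C code.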

Probably because R is written in C, and L is the suffix used for a (long) integer literal in C.

Is there a better way to extract the digits of a real number using Pari/GP?

Here's my current code, but it's ugly and I'm worried about possible edge cases from very large or small numbers. Is there a better way to do this?
real_to_int(n)={
if(n==floor(n),return(floor(n))); \\ If "n" is a whole number we're done
my(v=Vec(strprintf("%g",n))); \\ Convert "n" to a zero-padded character vector
my(d=sum(i=1,#v,i*(v[i]=="."))); \\ Find the decimal point
my(t=eval(concat(v[^d]))); \\ Delete the decimal point and reconvert to a number
my(z=valuation(t,10)); \\ Count trailing zeroes
t/=10^z; \\ Get rid of trailing zeroes
return(t)
}
You can split your input real into its integer and fractional parts without looking for the decimal point.
real_to_int(n) = {
my(intpart=digits(floor(n)));
my(fracpartrev=fromdigits(eval(Vecrev(Str(n))[1..-(2+#intpart)])));
fromdigits(concat(intpart, Vecrev(digits(fracpartrev))))
};
real_to_int(123456789.123456789009876543210000)
> 12345678912345678900987654321
Note, the composition of digits and fromdigits eliminates all the leading zeros from the list of digits for you.
The problem is not well defined since the conversion from real number (stored internally in binary) to a decimal string may require rounding and how this is done depends on a number of factors such as the format default, or the current bitprecision.
What is possible is to obtain the internal binary representation of the t_REAL as m * 2^e, where m and e are both integers.
install(mantissa2nr, GL);
real_to_int(n) =
{
e = exponent(n) + 1 - bitprecision(n);
[mantissa2nr(n, 0), e];
}
? [m, e] = real_to_int(Pi)
%1 = [267257146016241686964920093290467695825, -126]
? m * 1. * 2^e
%2 = 3.1415926535897932384626433832795028842
With [m, e] we obtain the exact (rational) internal representation of the number and both are well defined, i.e., independent of all settings. m is the binary equivalent of what was requested in decimal.
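The same m * 2^e idea applies to ordinary IEEE double-precision numbers. As an analogy only (this is R, not PARI/GP, and it covers 53-bit doubles rather than arbitrary-precision t_REAL values), a minimal sketch follows. The helper `decompose_double` is hypothetical and assumes a finite, nonzero, normal double.

```r
# Hypothetical helper: split a finite, nonzero, normal double into m * 2^e,
# where m is a whole number (held in a double) and e is the binary exponent.
decompose_double <- function(x) {
  e <- floor(log2(abs(x))) - 52   # exponent that makes the 53-bit significand whole
  m <- x / 2^e                    # scaling by a power of two is exact
  list(m = m, e = e)
}

parts <- decompose_double(pi)
parts$m == floor(parts$m)         # TRUE: m has no fractional part
parts$m * 2^parts$e == pi         # TRUE: the pair reproduces the stored value exactly
```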

Problem with automatically cast Logical vector to integer

The R API lets you access the data of a SEXP directly through a pointer, which simplifies any processing that would otherwise require casting away from the original data type. For example, we can use an unsigned int pointer to process a SEXP of real or integer type. The problem is that R allows a logical SEXP to be treated as an integer SEXP automatically. R's internal headers define a logical as a C int, which I think leads to an inconsistent state. For example, if I use this code:
// [[Rcpp::export]]
SEXP test(SEXP x){
int* arr= INTEGER(x);
arr[0]=77;
return x;
}
and I run in R:
x <- NA          ## by default, NA is a logical vector
is.logical(x)    ## returns TRUE
test(x)          ## returns TRUE
is.logical(x)    ## returns TRUE
print(x + 0L)    ## expected 1 for a TRUE value, but it gives 77
max(x)           ## gives 77!
Most basic functions (sum, max, min, ...) treat x as an integer.
The same problem arises with Rcpp, which blocks in-place modification. For example:
// [[Rcpp::export]]
IntegerVector test1(IntegerVector x){
x[0]=77;
return x;
}
Using R:
x <- NA
test1(x)             ## x is still NA
x <- as.integer(x)
test1(x)             ## x is modified to 77
Finally, is there a way to prevent this problematic cast from logical to integer?
A logical in R has the same bytes per element as an integer (4 bytes). This is different than C, where a bool has 1 byte* and an int has 4 bytes. The reason R does this is probably because in this approach, up-casting logical to integer is instantaneous and vector multiplication between logical and integer has no overhead.
In the first case, you access the pointer to the start of the vector and set the first 4 bytes to the value that corresponds to 77. In the Rcpp case, constructing an IntegerVector from a logical vector coerces it to a new integer vector, so the write goes to a copy and x stays NA.
On the R side, the variable named "x" still points to the same underlying data. But since you changed the underlying data in the first case, x now holds bytes that correspond to an int of 77.
An int of 77 doesn't mean anything as a logical, since it can't occur in normal operation. So what R does when you force an impossible value is essentially unspecified.
A logical in R can only have three values: TRUE (corresponds to a value of 1), FALSE (corresponds to a value of 0) and NA (corresponds to a value of -2147483648).
*(Technically, implementation defined but I've only seen it as 1 byte)
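The storage claims above can be checked from R without touching any pointers. This is a small sketch using base R and utils::object.size; the variable names are illustrative.

```r
n <- 1e5

# Logical and integer vectors of the same length occupy the same space,
# while a double vector of that length is larger.
as.numeric(utils::object.size(logical(n))) ==
  as.numeric(utils::object.size(integer(n)))    # TRUE
as.numeric(utils::object.size(double(n))) >
  as.numeric(utils::object.size(integer(n)))    # TRUE

# The three legal logical values map onto integers as described.
as.integer(c(TRUE, FALSE, NA))                   # 1 0 NA

# Converting explicitly gives a separate integer vector to modify,
# so the original logical vector is left untouched.
x <- NA
y <- as.integer(x)
y[1] <- 77L
is.logical(x) && is.na(x)                        # TRUE
```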

When is it advantageous to use the L suffix to specify integer quantities in R? [duplicate]

I often see the symbol 1L (or 2L, 3L, etc.) appear in R code. What's the difference between 1L and 1? 1==1L evaluates to TRUE. Why is 1L used in R code?
So, @James and @Brian explained what 3L means. But why would you use it?
Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).
Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:
x <- 1:100
typeof(x) # integer
y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)
z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)
...but also note that working excessively with integers can be dangerous:
1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!
...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.
A caveat though is that this applies to the current R version (2.13). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
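To make the overflow case concrete, here is a minimal sketch that captures the warning instead of printing it. The names `caught`, `product` and `safe` are illustrative only.

```r
# Integer overflow does not wrap around in R: the result is NA plus a warning.
caught <- NULL
product <- withCallingHandlers(
  1e9L * 4L,
  warning = function(w) {
    caught <<- conditionMessage(w)   # remember that a warning was raised
    invokeRestart("muffleWarning")
  }
)
is.na(product)                       # TRUE
is.null(caught)                      # FALSE: a warning was recorded

# Converting to double first avoids the overflow, at the cost of 8 bytes per element.
safe <- as.double(1e9L) * 4
safe == 4e9                          # TRUE

# The limit itself is available without hard-coding it.
.Machine$integer.max
```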
From the Constants Section of the R Language Definition:
We can use the ‘L’ suffix to qualify any number with the intent of making it an explicit integer.
So ‘0x10L’ creates the integer value 16 from the hexadecimal representation. The constant 1e3L
gives 1000 as an integer rather than a numeric value and is equivalent to 1000L. (Note that the
‘L’ is treated as qualifying the term 1e3 and not the 3.) If we qualify a value with ‘L’ that is
not an integer value, e.g. 1e-3L, we get a warning and the numeric value is created. A warning
is also created if there is an unnecessary decimal point in the number, e.g. 1.L.
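The warning-free cases from this passage can be verified directly. A short sketch in base R:

```r
# Hexadecimal with the L suffix gives an integer.
identical(0x10L, 16L)      # TRUE

# The L qualifies the whole term 1e3, not just the exponent digit.
identical(1e3L, 1000L)     # TRUE
typeof(1e3)                # "double"
typeof(1e3L)               # "integer"
```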
L specifies an integer type, rather than the double that the standard numeric class uses.
> str(1)
num 1
> str(1L)
int 1
To explicitly create an integer value for a constant, you can call the function as.integer or, more simply, use the "L" suffix.
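A brief sketch of how the two spellings compare, and of why 1 == 1L is TRUE even though the types differ:

```r
identical(as.integer(1), 1L)   # TRUE: both are the integer 1
identical(1, 1L)               # FALSE: double versus integer
1 == 1L                        # TRUE: == compares values after promotion to double

is.integer(1)                  # FALSE
is.numeric(1L)                 # TRUE: integers still count as numeric

as.integer(1.9)                # 1: as.integer truncates toward zero
```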

Disable type "promotion" (auto type assertions)

In Julia, types are automatically "promoted", e.g.:
x = 8
y = 1.0
typeof(x)
typeof(y)
typeof(x + y)
Is it possible to disable this automatic type promotion? I am thinking of something like an implicit (x + y)::Int64.
There isn't any way to add an integer to a float without first converting them to a common type. Every language that allows you to add numeric values of mixed type will do some kind of promotion first. In this case, if you want an Int result, you can convert the result with the int function: int(8 + 1.0). Note that this converts floats to integers by rounding, not truncating as in many languages. (The lowercase int function was removed in later Julia versions; in Julia 1.x, use round(Int, x) to round, or Int(x) when x is already a whole number.) You could also convert 1.0 to an Int before adding, in which case you would be adding two integers and you'd get an integer.
