intToBin with large numbers - r

I'm using the intToBin() function from "R.utils" package and am having trouble using it to convert large decimal numbers to binary.
I get this error : NAs introduced by coercion.
Is there another function out there that can handle big numbers/ is there an algorithm/ code to implement such a function?
Thanks

If you read the help page for intToBin, it quite explicitly says it takes "integer" inputs. These are not mathematical "integers" but rather the computer-language-defined ints, which in R are limited to 32 bits (so nothing beyond .Machine$integer.max, 2147483647).
You'll need to find (or write :-() a function which converts floating-point numbers to binary floats, or if you're lucky, perhaps Rmpfr or gmp packages, which do arbitrary precision "big number" math, may have a float-to-binary tool.
By the time this gets posted, someone will have exposed my ignorance by posting an existing function, w/ my luck.
Edit -- like maybe the package pack
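For whole numbers that are too big for an R integer but still within the range a double represents exactly (up to 2^53), repeated division by 2 is enough. This is only a minimal sketch: doubleToBin is a hypothetical helper name, not a function from R.utils or any other package.

```r
# Hypothetical helper: binary string of a non-negative whole number stored as a double.
# Only exact for values up to 2^53, the limit of exact whole numbers in a double.
doubleToBin <- function(x) {
  stopifnot(length(x) == 1, is.finite(x), x >= 0, x == floor(x), x <= 2^53)
  if (x == 0) return("0")
  bits <- character(0)
  while (x > 0) {
    bits <- c(x %% 2, bits)   # prepend the lowest remaining bit
    x <- x %/% 2              # integer division, exact for doubles in this range
  }
  paste(bits, collapse = "")
}

doubleToBin(10)     # "1010"
doubleToBin(2^31)   # a 1 followed by 31 zeros; too large for intToBin's integer input
```

For anything beyond 2^53 an arbitrary-precision package would be needed instead.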

I needed a converter between doubles and hex numbers. So I wrote those, might be helpful to others
doubleToHex <- function(x) {
  if(x < 16)
    return(sprintf("%X", x))
  remainders <- c()
  while(x > 15) {
    remainders <- append(remainders, x%%16)
    x <- floor(x/16)
  }
  remainders <- paste(sprintf("%X", rev(remainders)), collapse="")
  # format the leading digit as hex too; otherwise 255 would come out as "15F"
  return(paste(sprintf("%X", x), remainders, sep=""))
}
hexToDouble <- function(x) {
  x <- strsplit(x,"")[[1]]
  output <- as.double(0)
  for(i in rev(seq_along(x))) {
    output <- output + (as.numeric(as.hexmode(x[i]) * (16**(length(x)-i))))
  }
  return(output)
}
doubleToHex(x = 8356723)
hexToDouble(x = "7F8373")
Hasn't been extensively tested yet, let me know if you detect a problem with it.

Related

How can I create a vector by only using for loop? (vector is specified in the body)

(1,2,2,3,3,3,4,4,4,4,...,n,...,n)
I want to build the above vector with a for loop, without using rep or other functions. This may not be a good question to ask on Stack Overflow, but since I am new to R, I dare to ask for help here.
(You can suppose the length of the vector is 10)
With a for loop, it can be done with
n <- 10
out <- c()
for(i in seq_len(n)){
  for(j in seq_len(i)) {
    out <- c(out, i)
  }
}
In R, otherwise, this can be done as
rep(seq_len(n), seq_len(n))
I have been beaten by @akrun by seconds; even so, I'd like to give you a few hints on how rep could be used, which may help you with R in general. (For a solution without rep, just look at @akrun's answer.)
Short answer using rep
rep(1:n, 1:n)
Long Answer using rep
Before posting a question you should try to develop your own solutions and share them.
Trying googling a bit and sharing what you already found is usually good as well. Please, have a look at "help/how-to-ask"
Let's try to do it together.
First of all, we should try to have a look at official sources:
R-project "getting help", here you can see the standard way to get a function's documentation is just typing ?func_name in your R console
R-project "official manuals" offer a good introduction to R. Try looking at the first topic, "An Introduction to R"
From the previous two (and other sources as well) you will find two interesting functions:
: operator: it can be used to generate a sequence of integers from a to b like a:b. Typing 1:3, for instance, gives you the 1, 2, 3 vector
rep(x, t) is a function which can be used to replicate the item(s) x t times.
You also need to know R is "vector-oriented", that is, it applies functions over vectors without you typing explicit loops.
For instance, if you call rep(1:3, each = 2), it's (almost) equivalent to running the following and concatenating the results:
for(i in 1:3)
  rep(i, 2)
By combining the previous two functions and the notion R is "vector-oriented", you get the rep(1:n, 1:n) solution.
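To see how the pieces fit together, here is a small sketch comparing rep with a fixed count per element, rep with a vector of counts, and an explicit loop. The variable names (by_each, by_times, loop_out) are just illustrative.

```r
n <- 4

by_each  <- rep(1:3, each = 2)      # every element repeated twice: 1 1 2 2 3 3
by_times <- rep(1:n, times = 1:n)   # element i repeated i times: 1 2 2 3 3 3 4 4 4 4

# the same vector built with an explicit loop, for comparison
loop_out <- c()
for (i in 1:n) loop_out <- c(loop_out, rep(i, i))

identical(by_times, loop_out)       # TRUE
```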
I am not sure why you don't want to use rep, but here is a method of not using it or any functions similar to rep within the loop.
for (i in 1:10){
  a<-NA
  a[1:i] <- i
  if (i==1){b<-a}
  else if (i >1){b <- c(b,a)}
  assign("OutputVector",b,envir = .GlobalEnv)
}
OutputVector
Going for an n of ten seemed subjective so I just did the loop for numbers 1 through 10 and you can take the first 10 numbers in the vector if you want. OutputVector[1:10]
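You can do this with a single loop, though it's a while rather than a for. (It relies on the floating-point sum of the reciprocals being exactly whole before moving on to the next i, which is not guaranteed for every n.)
n <- 10
x <- 1
i <- 2
while(i <= n)
{
  x <- c(x, 1/i)
  if(sum(x) %% 1 == 0) i <- i + 1
}
1/x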
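A variant of the single-loop idea that counts with integers instead of summing reciprocals, so it does not depend on floating-point arithmetic. This is a minimal sketch; out and k are illustrative names.

```r
n <- 10
out <- integer(0)
i <- 1L   # value currently being appended
k <- 0L   # how many copies of i have been appended so far

while (i <= n) {
  out <- c(out, i)
  k <- k + 1L
  if (k == i) {   # i has now appeared i times: move on to the next value
    i <- i + 1L
    k <- 0L
  }
}

out
```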

readr and write_csv: double precision numbers and grisu3

Sometimes, when I save a column of double precision numbers to a csv using write_csv from readr (part of the tidyverse), the following happens: a double like 285121.15 is written as 285121.14999999997. The original value has only two decimals and this is not an artifact of printing it on screen. Numerically, they are pretty much the same thing, but it is annoying to share files with so many unneeded decimals. The documentation of write_csv says that the grisu3 algorithm is used.
At the same time, I would like to avoid rounding up the values myself, since in general the number of decimals may vary.
According to what I found here
http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/
florian.loitsch.com/publications/dtoa-pldi2010.pdf?attredirects=0
it is a known shortcoming of grisu3.
Since I am not dealing with large data sets (hence writing to disk is not a big issue), I came up with the following
############ to avoid troubles when saving numbers
library(dplyr)  # provides %>% and mutate_if
library(readr)  # provides write_csv
num_to_char <- function(df){
  res <- df %>% mutate_if(is.numeric, as.character )
  return(res)
}
to_csv <- function(df, ...){
  df <- num_to_char(df)
  write_csv(df, ...)
}
i.e. I essentially convert the numbers to strings prior to saving the file.
I ran some tests, and it seems to me my problem has been solved, but are there any caveats I should be aware of?
Many thanks!
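One caveat worth checking: as.character() on a double keeps at most 15 significant digits, so a value that needs 16 or 17 digits to be identified uniquely will not read back bit-for-bit identical. A small illustration:

```r
x <- 0.1 + 0.2            # stored as 0.30000000000000004...
s <- as.character(x)
s                         # "0.3"

as.numeric(s) == x                     # FALSE: the round trip lost the last bits
isTRUE(all.equal(as.numeric(s), x))    # TRUE: still equal within the usual tolerance
```

Whether that matters depends on whether the file is meant for sharing readable values or for restoring the exact doubles.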
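My suggestion is to use the following code (with round() rather than floor(), since floor() truncates and can drop the last digit when the stored value sits just below the intended one):
#removing unwanted precision (double precision)
A <- round(A*100)
#then converting back to the real number
A <- A/100
A simple example in R:
A <-9.12234353423242
A<-A*100
A
#[1] 912.2344
A<- round(A)
A
#[1] 912
A <- A/100
A
#[1] 9.12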

Computing the nth derivative of a function

To compute a probability, I have to compute derivatives (and then evaluate) like $\frac{\partial^5 f}{\partial x_1^2 \partial x_2^3}$ where $f$ is a polynomial function. The problem is that the order of the derivative is likely to vary as well as the list of variables with respect to which the derivative is computed.
I already tried with rSymPy and Ryacas and it works... until the number of variables becomes too large. So I have to look for a different solution. I tried with the DD() function indicated in the documentation of deriv() and using this function iteratively seems to be fine (and unexpectedly more efficient than with rSymPy and Ryacas).
My problem is to create the DD(DD(DD(...my.expr...,"xi",ni),"xj",nj),"xk",nk) command. I tried the following code:
step1 <- function(k) paste0(",x", k, ",", r[k]-1, ")", collapse="")
step2 <- function(expr) {
  paste0(paste0(rep.int("DD(",u), collapse=""), expr,
         paste0(sapply(t,step1), collapse=""), collapse="")
}
step2(f)
where r is a vector indicating the order of derivation for each variable, t a subset of that vector, u <- length(t) and f is an expression object. This solution does not work because quotation marks are missing around the variable names. Indeed I get for instance (I dropped the function from the code):
DD(DD(DD(DD(DD(my.expr,x1,1),x7,1),x9,2),x10,1),x11,1)
instead of:
DD(DD(DD(DD(DD(my.expr,"x1",1),"x7",1),"x9",2),"x10",1),"x11",1)
I tried adding \" in my function step1, but then I have a problem with the computation of the derivative. Any suggestions to fix this problem?
PS: it would surely be easier with a loop, but I would like to avoid one if possible.
PS2: Sorry for LaTeX code.
I think this extension works. The trick is to not start going back and forth between expressions and strings ...
DD <- function(expr, names, order = 1, debug=FALSE) {
  if (any(order>=1)) {  ## do we need to do any more work?
    w <- which(order>=1)[1]  ## find a derivative to compute
    if (debug) {
      cat(names,order,w,"\n")
    }
    ## update order
    order[w] <- order[w]-1
    ## recurse ...
    return(DD(D(expr,names[w]), names, order, debug))
  }
  return(expr)
}
Some tests:
DD(expression(x^2*y^3+z),c("x","y"),c(1,1))
## 2 * x * (3 * y^2)
DD(expression(x^2*y^3+z),c("x","y"),c(2,1))
## 2*3*(y^2)
DD(expression(x^2*y^3+z),c("x","y"),c(2,2))
## 2*(3*(2*y))
DD(expression(x^2*y^3+z),c("x","y"),c(2,3))
## 2*(3*2)
DD(expression(x^2*y^3+z),c("x","y"),c(2,4))
## 0
I hadn't noticed previously that you were differentiating a polynomial -- in that special case there's a much simpler answer (hint, represent the polynomial as a sequence of vectors that give the coefficients of orders of different terms). But you may not need that efficient an answer ...
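Another way to stay with language objects rather than strings, and without an explicit loop, is to fold a single-variable derivative over the variable names with Reduce. This is only a sketch: dd_single is a hypothetical name for a helper in the spirit of the DD() example in ?deriv, and vars, ord and f are made-up inputs.

```r
# Hypothetical single-variable helper: differentiate expr 'order' times w.r.t. 'name'
dd_single <- function(expr, name, order = 1) {
  if (order < 1) stop("'order' must be >= 1")
  if (order == 1) D(expr, name) else dd_single(D(expr, name), name, order - 1)
}

vars <- c("x", "y")            # variable names stay character strings throughout
ord  <- c(2, 1)                # derivative order for each variable
f    <- quote(x^2 * y^3 + z)   # the function to differentiate, as a call

# fold over the variable indices, starting from f
res <- Reduce(function(e, k) dd_single(e, vars[k], ord[k]), seq_along(vars), f)

res                                    # an unevaluated call
eval(res, list(x = 1, y = 2, z = 0))   # 24
```

Variables with order 0 would need to be dropped from vars and ord first, since dd_single insists on an order of at least 1.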

Finding the GCD without looping - R

So I'm trying to learn R and using a number of resources including a book called "Discovering Statistics using R" and a bunch of other cool eBooks.
I understand that Euclid's algorithm is a classic method for finding the GCD.
Implementing it in a loop can be achieved like this:
gcd <- function(x, y) {   # assuming x is the larger value
  repeat {
    r <- x %% y
    x <- y
    y <- r
    if (r == 0) break
  }
  return(x)
}
After several searches on Google, SO and Youtube refreshing my memory of gcd algorithms, I wasn't able to find one that doesn't use a loop. Even recursive methods seem to use loops.
How can this be achieved in R without the use of loops or if statements?
Thanks in advance.
Using the statement "without loops or the if statement" literally, here is a recursive version that uses ifelse:
gcd <- function(x,y) {
  r <- x%%y
  return(ifelse(r, gcd(y, r), y))
}
One might not expect it, but this is actually vectorized:
gcd(c(1000, 10), c(15, 10))
[1] 5 10
A solution using if would not handle vectors of length greater than 1.
A GCD function for two integers, combined with Reduce, lets you compute the GCD of any sequence of integers (sorted or not):
gcd2 <- function(a, b) {
  if (b == 0) a else Recall(b, a %% b)
}
gcd <- function(...) Reduce(gcd2, c(...))
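A short usage sketch of the same Reduce idea. The names gcd_pair and gcd_all are illustrative, and the recursion calls the function by name instead of using Recall.

```r
# two-argument Euclid, recursive
gcd_pair <- function(a, b) if (b == 0) a else gcd_pair(b, a %% b)

# fold the two-argument version over all the values supplied
gcd_all <- function(...) Reduce(gcd_pair, c(...))

gcd_all(12, 18, 24)        # 6
gcd_all(c(1000, 15, 35))   # 5
```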
You can solve it recursively.
euclids <- function(x,y){
  theMax = max(x,y)
  theMin = min(x,y)
  if (theMax == theMin) return (theMax)
  else return (euclids(theMin, theMax-theMin))
}
It's easy to do with a couple modulo operations. Sadly, I left my personal gcd code on a different machine (in a galaxy far away) - but you can find the source in either the numbers or pracma packages.
BTW, here's a good way to find existing code: library(sos); ???gcd

NAs produced by integer overflow + R on linux

I'm running an R script on a UNIX-based system. The script multiplies large numbers, and the results were NAs produced by integer overflow, but when I run the same script on Windows this problem does not appear.
However, I need to keep the script running the whole night on the desktop (which is Unix).
Is there any solution for this problem?
Thanks
for(ol in seq(1,nrow(yi),by=25))
{
  for(oh in seq(1,nrow(yi),by=25))
  {
    A=(N*(ol^2)) + ((N*(N+1)*(2*N+1))/6) -(2*ol*((N*N+1)/2)) + (2*N*ol*(N-oh+1)) + ((N-oh+1)*N^2) + (2*N*(oh-N-1)*(oh+N))
  }
}
with:
N = 16569 = nrow(yi)
but even the first iteration is not calculated on Unix.
Can you cast your integers to floating-point numbers in order to use floating-point math for the computations?
For example:
> x=as.integer(1000000)
> x*x
[1] NA
Warning message:
In x * x : NAs produced by integer overflow
> x=as.numeric(1000000)
> x*x
[1] 1e+12
As an aside, it is not entirely clear why the warning would appear in one environment but not the other. I first thought that 32-bit and 64-bit builds of R might be using 32-bit and 64-bit integers respectively, but that doesn't appear to be the case. Are both your environments configured identically in terms of how warnings are displayed?
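Applied to the expression in the question, the cast looks like the sketch below. The values of ol and oh are just the first iteration of the loops, and N is taken from the question; everything is converted to double before any arithmetic happens.

```r
# coerce everything to double up front so no step is done in integer arithmetic
N  <- as.numeric(16569L)
ol <- as.numeric(1L)
oh <- as.numeric(1L)

A <- (N*(ol^2)) + ((N*(N+1)*(2*N+1))/6) - (2*ol*((N*N+1)/2)) +
     (2*N*ol*(N-oh+1)) + ((N-oh+1)*N^2) + (2*N*(oh-N-1)*(oh+N))

print(A, digits = 20)   # -3032615078557
```

All intermediate values here stay well below 2^53, so the double result is exact.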
As the other answers have pointed out, there is something a bit non-reproducible/strange about your results so far. Nevertheless, if you really must do exact calculations on large integers, you probably need an interface between R and some other system.
Some of your choices are:
the gmp package (see this page and scroll down to R)
an interface to the bc calculator on googlecode
there is a high precision arithmetic page on the R wiki which compares interfaces to Yacas, bc, and MPFR/GMP
there is a limited interface to the PARI/GP package in the elliptic package, but this is probably (much) less immediately useful than the preceding three choices
Most Unix or Cygwin systems should have bc installed already. GMP and Yacas are easy to install on modern Linux systems ...
Here's an extended example, with a function that can choose among numeric, integer, or bigz computation.
library(gmp)  ## only needed for type="bigz" (provides as.bigz)
f1 <- function(ol=1L,oh=1L,N=16569L,type=c("num","int","bigz")) {
  type <- match.arg(type)
  ## convert all values to appropriate type
  if (type=="int") {
    ol <- as.integer(ol)
    oh <- as.integer(oh)
    N <- as.integer(N)
    one <- 1L
    two <- 2L
    six <- 6L
    cc <- as.integer
  } else if (type=="bigz") {
    one <- as.bigz(1)
    two <- as.bigz(2)
    six <- as.bigz(6)
    N <- as.bigz(N)
    ol <- as.bigz(ol)
    oh <- as.bigz(oh)
    cc <- as.bigz
  } else {
    one <- 1
    two <- 2
    six <- 6
    N <- as.numeric(N)
    oh <- as.numeric(oh)
    ol <- as.numeric(ol)
    cc <- as.numeric
  }
  ## if using bigz mode, the ratio needs to be converted back to bigz;
  ## defining cc() as above seemed to be the most transparent way to do it
  N*ol^two + cc(N*(N+one)*(two*N+one)/six) -
    ol*(N*N+one) + two*N*ol*(N-oh+one) +
    (N-oh+one)*N^two + two*N*(oh-N-one)*(oh+N)
}
I removed a lot of unnecessary parentheses, which actually made it harder to see what was going on. It is indeed true that for the (1,1) case the final result is not bigger than .Machine$integer.max but some of the intermediate steps are ... (for the (1,1) case this actually reduces to $$-1/6*(N+2)*(4*N^2-5*N+3)$$ ...)
f1() ## -3.032615e+12
f1() > .Machine$integer.max ## FALSE
N <- 16569L
N*(N+1)*(2*N+1) > .Machine$integer.max ## TRUE
N*(N+1L)*(2L*N+1L) ## integer overflow (NA)
f1(type="int") ## integer overflow
f1(type="bigz") ## "-3032615078557"
print(f1(),digits=20) ## -3032615078557: no actual loss of precision in this case
PS: you have a (N*N+1) term in your equation. Should that really be N*(N+1), or did you really mean N^2+1?
Given your comments, I guess that you seriously misunderstand the "correctness" of numbers in R. You say the outcome you get on Windows is something like -30598395869593930593. Now, on both 32-bit and 64-bit, that precision is not even possible using a double, let alone using an integer:
> x <- -30598395869593930593
> format(x,scientific=F)
[1] "-30598395869593931776"
> all.equal(x,as.numeric(format(x,scientific=F)))
[1] TRUE
> as.integer(x)
[1] NA
You have 16 digits you can trust, all the rest is bollocks. Then again, an accuracy of 16 digits is already pretty strong. Most measurement tools don't even come close to that.
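The limit comes from the 53-bit significand of a double: beyond 2^53, consecutive whole numbers can no longer be told apart. A quick check:

```r
.Machine$double.digits      # 53: bits in the significand of a double
2^53 == 2^53 + 1            # TRUE: the +1 is lost
2^53 - 1 == 2^53            # FALSE: still exact just below the limit
.Machine$integer.max        # 2147483647: the much smaller limit for integers
```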
