How does Big Integer math work in the R gmp package?

As a self-taught programmer, I was unaware of libraries such as gmp and wrote several "Big Integer" functions myself to handle large integer arithmetic. The key idea driving my algorithms is storing each large integer in an array, where each index holds a single digit of the number (e.g. 123456789 would be stored as (1,2,3,4,5,6,7,8,9)). An example of big integer addition is below.
MyBigIntegerAddition <- function(x1, x2) {
MyNum1 <- as.integer(strsplit(as.character(x1), "")[[1]])
MyNum2 <- as.integer(strsplit(as.character(x2), "")[[1]])
if (length(MyNum1) < length(MyNum2)) {
while (length(MyNum1) < length(MyNum2)) {MyNum1 <- c(0L, MyNum1)}
} else if (length(MyNum2) < length(MyNum1)) {
while (length(MyNum2) < length(MyNum1)) {MyNum2 <- c(0L, MyNum2)}
}
MyNum1 <- MyNum1 + MyNum2
lenMyNum1 <- length(MyNum1)
if (lenMyNum1 >= 2L) {
for (j in lenMyNum1:2L) {
TempB1 <- MyNum1[j] %% 10L
TempB2 <- floor(MyNum1[j]/ 10L)
MyNum1[j] <- TempB1
MyNum1[j - 1L] <- MyNum1[j - 1L] + TempB2
}
}
while ((MyNum1[1L] / 10L) > 1L) {
TempB1 <- MyNum1[1L] %% 10L
TempB2 <- floor(MyNum1[1L]/ 10L)
MyNum1[1L] <- TempB1
MyNum1 <- c(TempB2, MyNum1)
}
paste(MyNum1, collapse = "")
}
Below is a random example comparing output
> MyBigIntegerAddition("103489710232857289034750289347590984710923874","298723475623746529875692873456927834569298347569237")
[1] "298723579113456762732981908207217182160283058493111"
> add.bigz("103489710232857289034750289347590984710923874","298723475623746529875692873456927834569298347569237")
Big Integer ('bigz') :
[1] 298723579113456762732981908207217182160283058493111
I have provided a function that verifies my results as well.
TestStringMath <- function(n,Lim1,Lim2) {
samp1 <- sample(Lim1:Lim2,n)
samp2 <- sample(Lim1:Lim2,n)
count <- 0L
for (i in 1:n) {
temp1 <- add.bigz(samp1[i], samp2[i])
temp2 <- as.bigz(MyBigIntegerAddition(samp1[i], samp2[i]))
if (!(temp1==temp2)) {
count <- count+1L
}
}
count
}
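For comparison, the digit-array idea described above can also be written with a single right-to-left carry pass. This is only a minimal sketch of the same schoolbook technique; the name add_digit_strings is made up for illustration, and it makes no claim about how gmp works internally.

```r
# Hypothetical helper: add two non-negative integers given as digit strings.
add_digit_strings <- function(a, b) {
  # Split into digits, least significant digit first
  x <- rev(as.integer(strsplit(a, "")[[1]]))
  y <- rev(as.integer(strsplit(b, "")[[1]]))
  n <- max(length(x), length(y))
  # Pad the shorter number with zeros on the most significant end
  x <- c(x, integer(n - length(x)))
  y <- c(y, integer(n - length(y)))
  out <- integer(n + 1L)
  carry <- 0L
  for (i in seq_len(n)) {
    s <- x[i] + y[i] + carry
    out[i] <- s %% 10L    # digit that stays in this position
    carry <- s %/% 10L    # carry moved to the next position
  }
  out[n + 1L] <- carry
  # Drop the extra leading position if there was no final carry
  if (carry == 0L) out <- out[seq_len(n)]
  paste(rev(out), collapse = "")
}

add_digit_strings("999", "1")
```

Working from the least significant digit means each position is visited once and no padding loop is needed.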
Question: How exactly do gmp's arithmetic functions work? Do they convert numbers to strings and use arrays? Do they simply allocate more memory?

Related

How to get 0-padded binary representation of an integer in R language

How can I write a function in R that translates an integer into a binary string of length K? For example, for integer n = 3 and length K = 7, the output should be 0000011.
I wrote this function, but I don't know how to get the 0-padded binary representation:
convert_to_b <- function(n)
{
if(n > 1) {
convert_to_b(as.integer(n/2))
}
cat(n %% 2)
}
The R.utils package has the function intToBin() that can be combined with sprintf() to left pad the result.
convert_to_b <- function(n, K)
{
n <- R.utils::intToBin(n)
if (K < nchar(n))
K <- nchar(n)
sprintf("%0*d", K, as.integer(n))
}
convert_to_b(3, 7)
[1] "0000011"
An option is to use intToBits:
convert_to_b <- function(n, K) {
x <- head(substr(intToBits(n), 2L, 2L), K)
paste(rev(x), collapse="")
}
convert_to_b(3, 7)
#[1] "0000011"
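A base R sketch that pads with strrep() instead is shown below. It avoids converting the binary string back with as.integer(), which cannot hold strings longer than about ten digits, and it is not limited to the 32 bits returned by intToBits(). The name to_binary_padded is made up for illustration, and the function assumes a single non-negative whole number.

```r
# Hypothetical helper: binary string of n, left-padded with zeros to width K.
to_binary_padded <- function(n, K) {
  bits <- character(0)
  repeat {
    bits <- c(n %% 2, bits)   # prepend the current lowest bit
    n <- n %/% 2
    if (n == 0) break
  }
  s <- paste(bits, collapse = "")
  # Pad on the left only when the string is shorter than K
  paste0(strrep("0", max(0, K - nchar(s))), s)
}

to_binary_padded(3, 7)
```

If the binary representation is already longer than K, it is returned unpadded rather than truncated.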

R numeric to char precision loss

I want to convert my many-digit numeric vector to character. I tried the solutions here, which work for one number but not for a vector. This is OK:
options(digits=20)
options(scipen=99999)
x<-129483.19999999999709;format(round(x, 12), nsmall = 12)
[1] "129483.199999999997"
But this is not. How can I keep numeric precision in characters for numeric vectors?
> y <- c(129483.19999999999709, 1.3546746874,687676846.2546746464)
Especially problematic is 687676846.2546746464. I also tried:
> specify_decimal(y, 12)
[1] "129483.199999999997" "1.354674687400" "687676846.254674673080"
> formatC(y, digits = 12, format = "f")
[1] "129483.199999999997" "1.354674687400" "687676846.254674673080"
> formattable(y, digits = 12, format = "f")
[1] 129483.199999999997 1.354674687400 687676846.254674673080
> sprintf(y, fmt='%#.12g')
[1] "129483.200000" "1.35467468740" "687676846.255"
> sprintf(y, fmt='%#.22g')
[1] "129483.1999999999970896" "1.354674687399999966075" "687676846.2546746730804"
Expected result:
[1] "129483.199999999997" "1.354674687400" "687676846.254674646400"
It seems that precision loss occurs once only, it is not repeated.
> require(dplyr)
> convert <- function(x) as.numeric(as.character(x))
> 687676846.2546746464 %>% convert
[1] 687676846.25467503
> 687676846.2546746464 %>% convert %>% convert %>% convert
[1] 687676846.25467503
Here I only have 5-digit precision, but more problematic is that I can't know beforehand what precision I am going to get.
In the end I could do what I wanted using these functions. addtrailingzeroes will add a number of zeroes after the decimal point to x.
nbdec <- function(x) {
x1 <- as.character(x)
xsplit <- strsplit(x1,"\\.")
xlength <- sapply(xsplit, function(d) nchar(d)[2])
xlength <- ifelse(is.na(xlength), 0, xlength)
return(xlength)
}
trailingzeroes <- function(x, dig) {
res <- rep(NA, length(x))
for( i in 1:length(x)) {
if(!is.na(x[i])) res[i] <- { paste0(rep(0,max(0,dig-nbdec(x[i]))), collapse="") }
else { res[i] <- ""}
}
return(res)
}
trailingcommas <- function(x) ifelse(is.na(x), NA, ifelse(nbdec(x)==0, ".",""))
addtrailingzeroes <- function(x, digits) {
return(ifelse(!is.na(x), paste0(x, trailingcommas(x), trailingzeroes(x, digits)),NA))
}
However to suppress inaccuracies and rounding mistakes, x has to be cropped first using roundnumerics.max:
roundnumerics.max <- function(df, startdig=12) {
for(icol in 1:ncol(df)) {
if( is.numeric(df[,icol])) {
dig <- startdig
while(any(!as.numeric(as.character(df[,icol])) %==% df[,icol])) {
dig <- dig-1
df[,icol] <- round(df[,icol], digits=dig)
if(dig==0) {
pprint("ERROR: zero numeric accuracy")
break
}
}
pprint("Numeric accuracy for column ",icol," ", colnames(df)[icol], " is ", dig)
}
}
return(data.frame(df, stringsAsFactors = F))
}
This is slow and far from elegant... I still find it hard to believe that R limits accuracy to about 16 significant digits and adds inaccurate noise that causes divergences when you try to increase the digits option, without letting you know.
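As a small illustration of where the limit comes from: R numerics are double-precision values, so digits beyond roughly the 15th to 17th significant digit are not stored at all. The sketch below assumes the original values are still available as text; the variable names are made up for illustration.

```r
# Keep the original text if the exact decimal digits matter
y_chr <- c("129483.19999999999709", "1.3546746874", "687676846.2546746464")
y_num <- as.numeric(y_chr)

# Printing with 17 significant digits shows the value that is actually stored;
# converting that text back should give the same numbers again
round_trip <- as.numeric(sprintf("%.17g", y_num))
all.equal(round_trip, y_num)

# About 15 significant digits is what a double reliably preserves
shown <- format(y_num, digits = 15)
```

Any digits printed beyond that point describe the stored binary value, not the decimal number that was typed in.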

Function with variables that are different dataframe sizes

So I'm trying to run the function below hoping to get 224 vectors in the output, but I only get one and I have no idea why.
ee <- 0.95
td <- 480
tt <- c(60,10,14,143,60)
tt <- as.data.frame(tt)
r <- vector()
m <- function(d)
{
n <- length(tt)
c <- nrow(d)
for (j in 1:c)
{
for (i in 1:n)
{
r[i] <- tt[i]/(td*ee/d[j,])
}
return(r)
}
#where d is a data frame of 224 obs. of 1 variable
and the output I'm getting is
[[1]]
[1] 1026.3158 171.0526 239.4737 2446.0526 1026.3158
The problem comes from the fact that your function returns only the last r vector that is computed, due to where return is placed within your loop.
One way to do this is to store the results in a list:
r <- vector()
m_bis <- function(d) {
res <- list() # store all the vectors here
n <- length(tt)
c <- nrow(d)
for (j in 1:c) {
for (i in 1:n) {
r[i] <- tt[i] / (td * ee / d[j,])
}
res[j] <- r
}
return(res)
}
That should yield something like this:
m_bis(as.data.frame(mtcars$mpg))
> [[1]]
[1] 2.7631579 0.4605263 0.6447368 6.5855263 2.7631579
...
[[32]]
[1] 2.8157895 0.4692982 0.6570175 6.7109649 2.8157895
Alternatively, use vectorization to accelerate the computation:
outer(as.vector(tt[,1]), as.vector(d[,1]), function(x,y){x*y/(td*ee)})
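A self-contained sketch of the outer() approach, using mtcars$mpg as stand-in data in the same way as the list-based answer (the data and the name res are only for illustration):

```r
ee <- 0.95
td <- 480
tt <- c(60, 10, 14, 143, 60)
d <- data.frame(x = mtcars$mpg)   # stand-in for the 224-row data frame

# One column per row of d, one row per element of tt
res <- outer(tt, d[, 1], function(x, y) x * y / (td * ee))
dim(res)
res[, 1]   # the vector for the first row of d
```

Each column of res corresponds to one element of the list returned by the loop version, so the whole result comes back as a single matrix.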

Optimize simple r code for Project Euler 12

The idea of Project Euler question 12 is to find the smallest triangular number with a specified number of divisors (https://projecteuler.net/problem=12). As an attempt to solve this problem, I wrote the following code:
# This function finds all divisors of a number and returns them.
FUN <- function(x) {
i = 1
lst = integer(0)
while(i<=x)
{
if(x %% i ==0)
{
lst = c(lst, i)
}
i = i +1
}
return(lst)
}
and
n = 1
i=1
while (length(FUN(n))<500)
{
i = i + 1
n = n + i
}
This code produces the correct answer for a few smaller test cases: length(FUN(n))<4 will produce 6, and length(FUN(n))<6 will produce 28.
However, this simple-looking code has been running for over 24 hours (and is still running) for length(FUN(n))<500. I understand that a number with 500 divisors is probably very big, but I am wondering why it is taking so long to run.
Your FUN is much too inefficient for this task. As the first triangular number with more than 500 divisors is above the 12,000th, with a value of about 75,000,000, and FUN runs through all these numbers, the number of iterations to perform is almost
12000 * 75000000 / 2 = 450 * 10^9
This is clearly more than R's relatively slow loops can do in a reasonable time frame.
Instead, you could apply the divisors function from the numbers package, which employs a prime factor decomposition. The following code needs about 5-6 seconds (on my machine) to find the triangular number.
library(numbers)
t <- 0
system.time(
for (i in 1:100000) {
t <- t + i
d <- length( divisors(t) )
if (d > 500) {
cat(i, t, d, '\n')
break
}
}
)
## 12375 76576500 576
## user system elapsed
## 5.660 0.000 5.658
Instead of calculating the i-th triangular number, here i is added to the last triangular number. The time saving is minimal.
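If you would rather stay in base R, a minimal sketch is to count divisors only up to the square root, since every divisor below the root pairs with one above it. The names count_divisors and first_triangular_with are made up for illustration, and this is plain trial division rather than the prime factorization used by the numbers package.

```r
# Hypothetical helper: number of divisors of a positive whole number x.
count_divisors <- function(x) {
  r <- floor(sqrt(x))
  k <- sum(x %% seq_len(r) == 0)   # divisors that are <= sqrt(x)
  # Each small divisor pairs with a large one; a perfect square's root pairs with itself
  2L * k - (r * r == x)
}

# Smallest triangular number with more than min_divisors divisors.
first_triangular_with <- function(min_divisors) {
  i <- 1
  n <- 1
  while (count_divisors(n) <= min_divisors) {
    i <- i + 1
    n <- n + i   # next triangular number
  }
  n
}

first_triangular_with(5)   # 28, as in the small test case from the question
```

Calling first_triangular_with(500) does the same search for the full problem, with each check costing about sqrt(n) operations instead of n.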
Here's my attempt:
library(gmp)
library(plyr)
get_all_factors <- function(n)
{
prime_factor_tables <- lapply(
setNames(n, n),
function(i)
{
if(i == 1) return(data.frame(x = 1L, freq = 1L))
plyr::count(as.integer(gmp::factorize(i)))
}
)
lapply(
prime_factor_tables,
function(pft)
{
powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq))
power_grid <- do.call(expand.grid, powers)
sort(unique(apply(power_grid, 1, prod)))
}
)
}
for (i in 99691200:100000) {
if (length(get_all_factors(i)[[1]])>500) print(paste(i, length(get_all_factors(i)[[1]])))
if (i %% 100000 == 0) print(paste("-",i,"-"))
}
Let it run as long as you can be bothered...

Encoding whole numbers in R to a base 62 character vector

What's a quick way to encode either integer values or numeric whole number values in R to a character vector in base 62 encoding, i.e. a string that only contains [a-zA-Z0-9]? Would translating the answer to this question be sufficient?
converting a number base 10 to base 62 (a-zA-Z0-9)
Edited
Here's my solution:
toBase <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
r <- num %% base
res <- bv[r+1]
q <- floor(num/base)
while (q > 0L) {
r <- q %% base
q <- floor(q/base)
res <- paste(bv[r+1],res,sep='')
}
res
}
to10 <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
vb <- list()
for (i in 1:length(bv)) vb[[bv[i]]] <- i
num <- strsplit(num,'')[[1]]
res <- vb[[num[1]]]-1
if (length(num) > 1)
for (i in 2:length(num)) res <- base * res + (vb[[num[i]]]-1)
res
}
Is that missing anything?
Here's a solution that does base 36 using [0-9A-Z] that could easily be adapted for base 62 using [a-zA-Z0-9]. And yes, it's basically just a translation of the solution to the other question you linked to.
https://github.com/graywh/r-gmisc/blob/master/R/baseConvert.R
Here's a variant of the above code that allows you to convert a vector of numbers to base 16. It's not particularly elegant, as it isn't vectorized, but it gets the job done.
toBase <- function(num, base=16) {
bv <- c(0:9,letters,LETTERS)
r <- list()
q <- list()
res <- list()
for(i in 1:length(num)){
r[i] <- num[i] %% base
res[i] <- bv[r[[i]]+1]
q[i] <- floor(num[i]/base)
while (q[[i]] > 0L) {
r[i] <- q[[i]] %% base
q[i] <- floor(q[[i]]/base)
res[i] <- paste(bv[r[[i]]+1],res[[i]],sep='')
}
}
return(do.call('c', res))
}
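A more compact sketch of the same repeated-division idea, with a matching decoder, is shown below. It uses the digit order 0-9, a-z, A-Z from the code above; the names digits62, to_base and from_base are chosen here for illustration, and inputs are assumed to be non-negative whole numbers small enough to be represented exactly as doubles.

```r
digits62 <- c(0:9, letters, LETTERS)   # same digit order as bv above

# Encode each element of num by repeated division by base
to_base <- function(num, base = 62) {
  vapply(num, function(n) {
    out <- character(0)
    repeat {
      out <- c(digits62[n %% base + 1], out)   # prepend the lowest digit
      n <- n %/% base
      if (n == 0) break
    }
    paste(out, collapse = "")
  }, character(1))
}

# Decode each string by summing digit values times powers of base
from_base <- function(s, base = 62) {
  vapply(strsplit(s, ""), function(ch) {
    vals <- match(ch, digits62) - 1
    sum(vals * base^(rev(seq_along(vals)) - 1))
  }, numeric(1))
}

to_base(c(0, 61, 62, 255))
from_base(c("0", "Z", "10", "47"))
```

vapply() still loops over the elements internally, but it keeps the per-number logic in one place and returns a plain character or numeric vector.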
To make this more standard, you should implement it in a similar way to the conversion to hexadecimal. (see here for naming.)
as.exindadeomode <- function(x)
{
#Give x a class of "exindadeomode"
#Contents as per as.hexmode
}
format.exindadeomode <- function (x, width = NULL, upper.case = FALSE, ...)
{
#Return appropriate characters
#Contents as per format.hexmode
}
To convert back to integer, just strip the class, using as.integer.
