R numeric to char precision loss - r

I want to convert my many-digit numeric vector to character. I tried the solutions here, which work for a single number but not for a vector. This is OK:
options(digits=20)
options(scipen=99999)
x<-129483.19999999999709;format(round(x, 12), nsmall = 12)
[1] "129483.199999999997"
But this is not. How can I keep numeric precision in characters for numeric vectors?
> y <- c(129483.19999999999709, 1.3546746874,687676846.2546746464)
Especially problematic is 687676846.2546746464. I also tried:
> specify_decimal(y, 12)
[1] "129483.199999999997" "1.354674687400" "687676846.254674673080"
> formatC(y, digits = 12, format = "f")
[1] "129483.199999999997" "1.354674687400" "687676846.254674673080"
> formattable(y, digits = 12, format = "f")
[1] 129483.199999999997 1.354674687400 687676846.254674673080
> sprintf(y, fmt='%#.12g')
[1] "129483.200000" "1.35467468740" "687676846.255"
> sprintf(y, fmt='%#.22g')
[1] "129483.1999999999970896" "1.354674687399999966075" "687676846.2546746730804"
Expected result:
[1] "129483.199999999997" "1.354674687400" "687676846.254674646400"
It seems that the precision loss occurs only once; it is not repeated on further conversions.
> require(dplyr)
> convert <- function(x) as.numeric(as.character(x))
> 687676846.2546746464 %>% convert
[1] 687676846.25467503
> 687676846.2546746464 %>% convert %>% convert %>% convert
[1] 687676846.25467503
Here I only have 5 decimal places of precision and, more problematically, I can't know beforehand what precision I am going to get.

In the end I could do what I wanted using these functions. addtrailingzeroes pads x with zeroes after the decimal point until it has the requested number of decimals.
nbdec <- function(x) {
x1 <- as.character(x)
xsplit <- strsplit(x1,"\\.")
xlength <- sapply(xsplit, function(d) nchar(d)[2])
xlength <- ifelse(is.na(xlength), 0, xlength)
return(xlength)
}
trailingzeroes <- function(x, dig) {
res <- rep(NA, length(x))
for( i in 1:length(x)) {
if(!is.na(x[i])) res[i] <- { paste0(rep(0,max(0,dig-nbdec(x[i]))), collapse="") }
else { res[i] <- ""}
}
return(res)
}
trailingcommas <- function(x) ifelse(is.na(x), NA, ifelse(nbdec(x)==0, ".",""))
addtrailingzeroes <- function(x, digits) {
return(ifelse(!is.na(x), paste0(x, trailingcommas(x), trailingzeroes(x, digits)),NA))
}
However, to suppress inaccuracies and rounding errors, x has to be cropped first using roundnumerics.max (%==% and pprint are helpers that are not shown here):
roundnumerics.max <- function(df, startdig=12) {
for(icol in 1:ncol(df)) {
if( is.numeric(df[,icol])) {
dig <- startdig
while(any(!as.numeric(as.character(df[,icol])) %==% df[,icol])) {
dig <- dig-1
df[,icol] <- round(df[,icol], digits=dig)
if(dig==0) {
# report before leaving the loop; code placed after break never runs
pprint("ERROR: zero numeric accuracy")
break
}
}
pprint("Numeric accuracy for column ",icol," ", colnames(df)[icol], " is ", dig)
}
}
return(data.frame(df, stringsAsFactors = F))
}
This is slow and far from elegant... I still find it hard to believe that R is limited to about 16 significant digits, and that raising the digits option silently adds inaccurate noise that makes the values diverge, without letting you know...
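For what it's worth, the limit comes from the IEEE 754 double format that R uses for numerics (a 53-bit significand), not from the formatting functions. A minimal sketch, reusing y from the question: 15 significant digits always survive a decimal-to-double-to-decimal trip, 17 are enough to identify the stored double, and anything printed beyond that describes the stored binary value rather than the literal that was typed. The y_chr vector is only an illustration of keeping the digits as text.

```r
y <- c(129483.19999999999709, 1.3546746874, 687676846.2546746464)

# A double carries a 53-bit significand, roughly 15 to 17 significant decimal digits.
.Machine$double.digits

# 15 significant digits: the most that is guaranteed to survive
# decimal -> double -> decimal unchanged.
s15 <- sprintf("%.15g", y)
s15

# 17 significant digits: enough to identify the stored double, so converting
# the strings back is expected to reproduce y.
s17 <- sprintf("%.17g", y)
as.numeric(s17) == y

# If every typed digit matters, keep the values as character from the start
# (illustrative vector, not part of the original question).
y_chr <- c("129483.19999999999709", "1.3546746874", "687676846.2546746464")
```

Note that the last literal in y has 19 significant digits, so no formatting choice can bring back the trailing "6464": those digits were already lost when the number was parsed.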

Related

Convert a base2 number (with fractional part) to a base10 number in R?

How to convert a base2 number (with fractional part) to a base10 number in R? The number can be negative as well.
Examples:
from2to10(10100101) # "165"
from2to10(0) # "0"
from2to10(10100101.01) # "165.25"
from2to10(-10100101) # "-165"
from2to10(-10100101.01) # "-165.25"
from2to10(111101111.010111) # "495.359375"
Edit: I realize I was assuming that the input would be character, perhaps a bad assumption on my part. I believe trusting R to preserve all of your 0s and 1s (with R FAQ 7.31 in mind) is a bit trusting, but I'll keep my answer as-is unless/until something better comes along.
This was interesting ... not certain if there's an R function that deals with floating-point in non-decimal, so here's one ...
#' Convert floating-point binary to decimal
#'
#' @param s 'character'
#' @return 'numeric'
#' @examples
#' tests <- c("10100101", "0", "10100101.01", "-10100101", "-10100101.01", "111101111.010111")
#' base2float(tests)
#' # [1] 165.0000 0.0000 165.2500 -165.0000 -165.2500 495.3594
base2float <- function(s, base = 2L) {
# ensure the strings seem logical:
# - start with "-", "+", or "[01]"
# - zero or more "[01]"
# - optional decimal "." (can easily change to "," for alternate reps)
# - zero or more "[01]"
stopifnot(all(grepl("^[-+]?[01]*\\.?[01]*$", s)))
splits <- strsplit(s, "\\.")
wholes <- sapply(splits, `[[`, 1L)
wholes[wholes %in% c("", "-", "+")] <- paste0(wholes[wholes %in% c("", "-", "+")], "0")
fracs <- sapply(splits, `[`, 2L)
fracs[is.na(fracs)] <- "0"
# because string-length is used in our calcs ...
fracs <- gsub("0+$", "0", fracs)
whole10 <- strtoi(wholes, base = base)
frac10 <- strtoi(fracs, base = base) / (base^nchar(fracs))
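# take the sign from the string itself: sign(whole10) is 0 when the whole part is 0,
# which would silently drop the fraction for inputs such as "0.1" or "-0.1"
sgn <- ifelse(grepl("^-", s), -1, 1)
whole10 + sgn * frac10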
}
library(cwhmisc) # int, frac
from2to10 <- function(n) {
SignOfNumber <- ""
if (n < 0) {
n <- abs(n)
SignOfNumber <- "-"}
nWhole <- int(n)
nWhole <- as.character(nWhole)
nFraction <- frac(n)
nFraction <- as.character(nFraction)
DecimalWhole <- sapply(strsplit(nWhole, split=""), function(x) sum(as.numeric(x) * 2^(rev(seq_along(x) - 1))))
if (nFraction == 0) {
DecimalFraction <- ""
paste0(SignOfNumber, DecimalWhole)
} else { # Find decimal fraction part
part3 <- function(x, y, z) { eval(parse(text=(paste(x, y, z,sep="")))) }
y <- as.numeric(strsplit(substr(part3("\"",n,"\""), which(strsplit(part3("\"",n,"\""), "")[[1]]==".") + 1, nchar(part3("\"",n,"\""))),"")[[1]])
DecimalFraction <- sum(y * (0.5^(1:length(y))))
paste0(SignOfNumber, DecimalWhole + DecimalFraction)
}
}
from2to10(10100101) # "165"
from2to10(0) # "0"
from2to10(10100101.01) # "165.25"
from2to10(-10100101) # "-165"
from2to10(-10100101.01) # "-165.25"
from2to10(111101111.010111) # "495.359375"; numeric to string; exact conversion
base2float("111101111.010111") # 495.3594; string to numeric; conversion with rounding. (r2evans)
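As a cross-check on the two answers, here is a minimal sketch that takes the binary number as a string and sums the digit weights directly, so it never relies on R preserving the 0s and 1s of a numeric literal. The name bin2dec is hypothetical, and the function handles a single string only; the sign is read from the string so values between -1 and 0 keep it.

```r
# Hypothetical helper: one binary string (optional sign, optional fraction) -> numeric.
bin2dec <- function(s) {
  stopifnot(is.character(s), length(s) == 1L,
            grepl("^[-+]?[01]+(\\.[01]+)?$", s))
  sgn <- if (startsWith(s, "-")) -1 else 1
  # drop the sign, then split into whole and (optional) fractional part
  parts <- strsplit(sub("^[-+]", "", s), ".", fixed = TRUE)[[1]]
  whole <- as.integer(strsplit(parts[1], "")[[1]])
  # whole digits have weights 2^(n-1), ..., 2^1, 2^0
  value <- sum(whole * 2^(rev(seq_along(whole)) - 1))
  if (length(parts) == 2L) {
    frac <- as.integer(strsplit(parts[2], "")[[1]])
    # fractional digits have weights 2^-1, 2^-2, ...
    value <- value + sum(frac * 2^(-seq_along(frac)))
  }
  sgn * value
}

bin2dec("10100101")          # 165
bin2dec("-10100101.01")      # -165.25
bin2dec("111101111.010111")  # 495.359375
bin2dec("-0.1")              # -0.5
```

The result is still a double, so the conversion is exact only while the input fits in 53 significant bits.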

combinations of numbers to reach a given sum - recursive implementation in R

All I want to do is implement the solution given here (the one in Python) in R.
I'm not very used to debugging in RStudio, and even after trying that I still can't figure out why my code does not work. Basically (with the example input provided) the function runs over all the numbers and then gets stuck in a sort of infinite loop (or infinite recursion). Can someone please point me in the right direction?
subset_sum <- function(numbers, target, partial = numeric(0)){
s <- sum(partial,na.rm = TRUE)
# check if the partial sum equals to target
if (s == target){
cat("sum(",partial,")","=",target)
}
else if (s >= target) {
return() # if we reach the number why bother to continue
}
else {
for(i in 1:length(numbers)){
n <- numbers[i]
remaining <- numbers[i+1:length(numbers)]
subset_sum(remaining, target, partial = append(partial,n))
}
}
}
subset_sum(c(3,9,8,4,5,7,10),15)
When not run in debug mode it gives me these errors:
Error: node stack overflow
Error during wrapup: node stack overflow
Here's a recursive implementation in R
subset_sum = function(numbers,target,partial=0){
if(any(is.na(partial))) return()
s = sum(partial)
if(s == target) print(sprintf("sum(%s)=%s",paste(partial[-1],collapse="+"),target))
if(s > target) return()
for( i in seq_along(numbers)){
n = numbers[i]
remaining = numbers[(i+1):length(numbers)]
subset_sum(remaining,target,c(partial,n))
}
}
Compared with the Python version, I had to add one extra check in R to handle the case where i+1 > length(numbers) and the indexing returns an NA.
> subset_sum(c(3,9,8,4,5,7,10),15)
[1] "sum(3+8+4)=15"
[1] "sum(3+5+7)=15"
[1] "sum(8+7)=15"
[1] "sum(5+10)=15"
I think (but I'm not sure) that your issue was the nested if/else if logic in a recursive function. Interestingly, when I put the if(i+1 > length(numbers)) return() inside the for loop, that broke the functionality, so I didn't get all the answers right - the returns need to be outside the recursion.
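A likely contributor to the stack overflow in the question is operator precedence: in R, `:` binds tighter than `+`, so numbers[i+1:length(numbers)] never shrinks and fills up with NA, which sum(..., na.rm = TRUE) then ignores. The sketch below shows the indexing behaviour and a variant that drops the first i elements instead. The name subset_sums is hypothetical, the function returns the matches as a list rather than printing them, and it assumes all numbers are positive.

```r
numbers <- c(3, 9, 8, 4)
i <- 2

# i + 1:length(numbers) is i + (1:4), i.e. indices 3 4 5 6
numbers[i + 1:length(numbers)]      # 8 4 NA NA

# with parentheses the last step still misbehaves, because 5:4 counts down
numbers[(4 + 1):length(numbers)]    # NA 4

# dropping the first i elements is safe at every step, including the last
numbers[-seq_len(i)]                # 8 4
numbers[-seq_len(4)]                # numeric(0)

# Hypothetical variant of the recursion that collects the matching subsets.
subset_sums <- function(numbers, target, partial = numeric(0)) {
  s <- sum(partial)
  if (s == target) return(list(partial))  # positive inputs: adding more overshoots
  if (s > target) return(list())
  found <- list()
  for (i in seq_along(numbers)) {
    found <- c(found,
               subset_sums(numbers[-seq_len(i)], target, c(partial, numbers[i])))
  }
  found
}

res <- subset_sums(c(3, 9, 8, 4, 5, 7, 10), 15)
length(res)   # 4 subsets: 3+8+4, 3+5+7, 8+7, 5+10
```

Because the remaining vector is genuinely empty on the last step, no NA check is needed in this variant.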
This is not a recursive function but it takes advantage of R's ability to handle matrix/array type data. Some output is shown after #
v <- c(3,9,8,4,5,7,10)
v <- sort(v)
# [1] 3 4 5 7 8 9 10
target <- 15
# we don't need to check more than at most 4 numbers since 3+4+5+7 (the smallest numbers) is greater than 15
mincombs <- min(which(cumsum(v) > target))
# [1] 4
Combs <- combn(v, mincombs) # make combinations of numbers
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
# [1] "3+4+8=15" "3+4+8=15" "3+5+7=15" "3+5+7=15" "3+5+7=15" "7+8=15"
In a function
myfun <- function(V, target) {
V <- sort(V)
mincombs <- min(which(cumsum(V) > target))
Combs <- combn(V, mincombs)
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
return(ans)
}
myfun(V = c(3,9,8,4,5,7,10), target = 15)
myfun(V = c(3,9,8,4,5,7,10,12,4,32),target = 20)

Converting a function to accept input directly in r

I was reading a book and I came across this function in R. It basically finds the longest pattern in the input string that is repeated at least 3 times in a row.
vec <- "da0abcab0abcaab0d0"
find_rep_path <- function(vec, reps) {
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
match <- regmatches(vec, regexpr(regexp, vec, perl = TRUE))
substr(match, 1, nchar(match) / reps)
}
vals <- unique(strsplit(vec, "")[[1]])
str <- NULL
for (i in seq.int(nchar(vec))) {
x <- vec
for (v in vals) {
substr(x, i, i) <- v
tmp <- find_rep_path(x, 3)
if (length(tmp) > 0)
str <- c(str, tmp)
}
}
nc <- nchar(str)
unique(str[which(nc == max(nc))])
Now, I wish to convert this code into a function that can be called like
function("da0abcab0abcaab0d0"). That way I can pass a string to the function directly instead of hardcoding it in the original code. How can I modify this?
I know this is a beginner question, but I am completely at sea right now as far as R is concerned. Please help!
I don't see how it's hardcoded. But you can just wrap your code into a function if that's what you mean?
# Function 1
find_rep_path <- function(vec, reps) {
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
match <- regmatches(vec, regexpr(regexp, vec, perl = TRUE))
substr(match, 1, nchar(match) / reps)
}
# Function 2
foo <- function(vec) {
vals <- unique(strsplit(vec, "")[[1]])
str <- NULL
for (i in seq.int(nchar(vec))) {
x <- vec
for (v in vals) {
substr(x, i, i) <- v
tmp <- find_rep_path(x, 3)
if (length(tmp) > 0)
str <- c(str, tmp)
}
}
nc <- nchar(str)
return(unique(str[which(nc == max(nc))]))
}
vec <- "da0abcab0abcaab0d0"
foo(vec)
#[1] "0ab" "abc"
Edit1
To get the positions of the matches you can use gregexpr:
a <- foo(vec)
gregexpr(a[1], vec)
#[[1]]
#[1] 3 9
#attr(,"match.length")
#[1] 3 3
#attr(,"useBytes")
#[1] TRUE
This tells you that a[1] ("0ab") was matched in vec at positions 3 and 9. Run ?gregexpr for more information.
Edit2
To add this information to each match, we can do something like
bar <- function(vec) {
m <- foo(vec)
ans <- sapply(m, gregexpr, vec, fixed = TRUE)
ans <- lapply(ans, function(x) {attributes(x) <- NULL; x})
return(ans)
}
bar(vec)
#$`0ab`
#[1] 3 9
#
#$abc
#[1] 4 10
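To see what find_rep_path is matching, the pattern it builds can be inspected on its own. A small sketch for reps = 3, using a made-up test string (not from the question): the capture group (.+) grabs a candidate substring and each \1 backreference demands one more immediate repetition of it.

```r
reps <- 3L
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
cat(regexp, "\n")               # (.+)\1\1

test <- "xxabcabcabcyy"         # illustrative input
m <- regmatches(test, regexpr(regexp, test, perl = TRUE))
m                               # "abcabcabc"

# the repeated unit is the first 1/reps of the match
substr(m, 1, nchar(m) / reps)   # "abc"
```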

Collapse runs of consecutive numbers to ranges

Consider the following comma-separated string of numbers:
s <- "1,2,3,4,8,9,14,15,16,19"
s
# [1] "1,2,3,4,8,9,14,15,16,19"
Is it possible to collapse runs of consecutive numbers into their corresponding ranges? For example, the run 1,2,3,4 above would be collapsed to the range 1-4. The desired result looks like the following string:
s
# [1] "1-4,8,9,14-16,19"
I took some heavy inspiration from the answers in this question.
findIntRuns <- function(run){
rundiff <- c(1, diff(run))
difflist <- split(run, cumsum(rundiff!=1))
unlist(lapply(difflist, function(x){
if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
}), use.names=FALSE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))
paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"
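The grouping step inside findIntRuns is easier to follow with the intermediate values printed. A short sketch on the same input (the variable names grp and runs are only for illustration):

```r
run <- c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19)

# gap to the previous element; the leading 1 keeps the first element in group 0
rundiff <- c(1, diff(run))
rundiff                      # 1 1 1 1 4 1 5 1 1 3

# every gap other than 1 opens a new group, so the running count labels the runs
grp <- cumsum(rundiff != 1)
grp                          # 0 0 0 0 1 1 2 2 2 3

# one vector per run of consecutive integers
runs <- split(run, grp)
unname(lengths(runs))        # 4 2 3 1
```

Runs of length 1 or 2 are then printed element by element, and longer runs as first-last.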
EDIT: Multiple solutions: benchmarking time!
Unit: microseconds
   expr     min      lq   median       uq      max neval
 spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
  seb() 294.611 313.025 321.1750 332.6450 1709.103  1000
 marc() 672.835 707.549 722.0375 744.5255 2154.942  1000
@speendo's solution is the fastest at the moment, but none of these have been optimised yet.
I was too slow... but here's another solution.
It uses fewer R-specific functions, so it could be ported to other languages (on the other hand, maybe it's less elegant).
s <- "1,2,3,4,8,9,14,15,16,19"
collapseConsecutive <- function(s){
x <- as.numeric(unlist(strsplit(s, ",")))
x_0 <- x[1]
out <- toString(x[1])
hasDash <- FALSE
for(i in 2:length(x)) {
x_1 <- x[i]
x_2 <- x[i+1]
if((x_0 + 1) == x_1 && !is.na(x_2) && (x_1 + 1) == x_2) {
if(!hasDash) {
out <- c(out, "-")
hasDash <- TRUE
}
} else {
if(hasDash) {
hasDash <- FALSE
} else {
out <- c(out, ",")
}
out <- c(out, x_1)
hasDash <- FALSE
}
x_0 <- x_1
}
outString <- paste(out, collapse="")
outString
}
collapseConsecutive(s)
# [1] "1-4,8,9,14-16,19"
Another fairly compact option
in.seq <- function(x) {
# returns TRUE for elements within ascending sequences
(c(diff(x, 1), NA) == 1 & c(NA, diff(x,2), NA) == 2)
}
contractSeqs <- function(x) {
# returns string formatted with contracted sequences
x[in.seq(x)] <- ""
gsub(",{2,}", "-", paste(x, collapse=","), perl=TRUE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s1 <- as.numeric(unlist(strsplit(s, ","))) # as earlier answers
# assumes: numeric vector, length > 2, positive integers, ascending sequences
contractSeqs(s1)
# [1] "1-4,8,9,14-16,19"
I also wrote a bells & whistles version that can handle both numeric and string input including named objects, descending sequences and alternative punctuation, as well as performing error checking and reporting. If anyone is interested, I can add this to my answer.
Here's a function that should do what you want:
conseq <- function(s){
s <- as.numeric(unlist(strsplit(s, ",")))
dif <- s[seq(length(s))][-1] - s[seq(length(s)-1)]
new <- !c(0, dif == 1)
cs <- cumsum(new)
res <- vector(mode="list", max(cs))
for(i in seq(res)){
s.i <- s[which(cs == i)]
if(length(s.i) > 2){
res[[i]] <- paste(min(s.i), max(s.i), sep="-")
} else {
res[[i]] <- as.character(s.i)
}
}
paste(unlist(res), collapse=",")
}
Example
> s <- "1,2,3,4,8,9,14,15,16,19"
> conseq(s)
[1] "1-4,8,9,14-16,19"

Encoding whole numbers in R to a base 62 character vector

What's a quick way to encode either integer values or numeric whole number values in R to a character vector in base 62 encoding, i.e. a string that only contains [a-zA-Z0-9]? Would translating the answer to this question be sufficient?
converting a number base 10 to base 62 (a-zA-Z0-9)
Edited
Here's my solution:
toBase <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
r <- num %% base
res <- bv[r+1]
q <- floor(num/base)
while (q > 0L) {
r <- q %% base
q <- floor(q/base)
res <- paste(bv[r+1],res,sep='')
}
res
}
to10 <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
vb <- list()
for (i in 1:length(bv)) vb[[bv[i]]] <- i
num <- strsplit(num,'')[[1]]
res <- vb[[num[1]]]-1
if (length(num) > 1)
for (i in 2:length(num)) res <- base * res + (vb[[num[i]]]-1)
res
}
Is that missing anything?
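One way to look for gaps is a round-trip check over a vector of inputs. The sketch below is self-contained and uses hypothetical vectorised wrappers, encode62 and decode62, built on the same digit order as bv above; it assumes non-negative whole numbers below 2^53, since larger doubles no longer represent every integer exactly.

```r
digits62 <- c(0:9, letters, LETTERS)   # "0"-"9", "a"-"z", "A"-"Z"

# Hypothetical wrapper: encode each whole number separately.
encode62 <- function(num) {
  vapply(num, function(n) {
    out <- character(0)
    repeat {
      out <- c(digits62[n %% 62 + 1], out)  # prepend the next digit
      n <- n %/% 62
      if (n == 0) break
    }
    paste(out, collapse = "")
  }, character(1))
}

# Hypothetical wrapper: decode each string by summing digit weights.
decode62 <- function(str) {
  vapply(strsplit(str, ""), function(ch) {
    vals <- match(ch, digits62) - 1
    sum(vals * 62^(rev(seq_along(vals)) - 1))
  }, numeric(1))
}

x <- c(0, 61, 62, 3843, 123456789)
enc <- encode62(x)
enc[1:4]                  # "0" "Z" "10" "ZZ"
all(decode62(enc) == x)   # TRUE
```

Negative numbers and non-whole values are not handled here and would need an explicit check.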
Here's a solution that does base 36 using [0-9A-Z] that could easily be adapted for base 62 using [a-zA-Z0-9]. And yes, it's basically just a translation of the solution to the other question you linked to.
https://github.com/graywh/r-gmisc/blob/master/R/baseConvert.R
Here's a variant of the above code that allows you to convert a vector of numbers to base 16. It's not particularly elegant, as it isn't vectorized, but it gets the job done.
toBase <- function(num, base=16) {
bv <- c(0:9,letters,LETTERS)
r <- list()
q <- list()
res <- list()
for(i in 1:length(num)){
r[i] <- num[i] %% base
res[i] <- bv[r[[i]]+1]
q[i] <- floor(num[i]/base)
while (q[[i]] > 0L) {
r[i] <- q[[i]] %% base
q[i] <- floor(q[[i]]/base)
res[i] <- paste(bv[r[[i]]+1],res[[i]],sep='')
}
}
return(do.call('c', res))
}
To make this more standard, you should implement it in a similar way to the conversion to hexadecimal. (see here for naming.)
as.exindadeomode <- function(x)
{
#Give x a class of "exindadeomode"
#Contents as per as.hexmode
}
format.exindadeomode <- function (x, width = NULL, upper.case = FALSE, ...)
{
#Return appropriate characters
#Contents as per format.hexmode
}
To convert back to integer, just strip the class, using as.integer.

Resources