I want to convert a numeric vector with many digits to character. I tried the solutions here, which work for a single number but not for a vector. This works:
options(digits=20)
options(scipen=99999)
x<-129483.19999999999709;format(round(x, 12), nsmall = 12)
[1] "129483.199999999997"
But this does not. How can I keep full numeric precision when converting a numeric vector to character?
> y <- c(129483.19999999999709, 1.3546746874,687676846.2546746464)
687676846.2546746464 is especially problematic. I also tried:
> specify_decimal(y, 12)
[1] "129483.199999999997" "1.354674687400" "687676846.254674673080"
> formatC(y, digits = 12, format = "f")
[1] "129483.199999999997" "1.354674687400" "687676846.254674673080"
> formattable(y, digits = 12, format = "f")
[1] 129483.199999999997 1.354674687400 687676846.254674673080
> sprintf(y, fmt='%#.12g')
[1] "129483.200000" "1.35467468740" "687676846.255"
> sprintf(y, fmt='%#.22g')
[1] "129483.1999999999970896" "1.354674687399999966075" "687676846.2546746730804"
Expected result:
[1] "129483.199999999997" "1.354674687400" "687676846.254674646400"
It seems that the precision loss occurs only once; it is not repeated:
> require(dplyr)
> convert <- function(x) as.numeric(as.character(x))
> 687676846.2546746464 %>% convert
[1] 687676846.25467503
> 687676846.2546746464 %>% convert %>% convert %>% convert
[1] 687676846.25467503
Here I only get 5 correct decimal places, but, more problematically, I can't know beforehand what precision I am going to get.
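The loss seems to happen when R parses the literal, before any formatting function runs. R stores numerics as IEEE 754 doubles with 53 significant bits, so near 6.9e8 neighbouring doubles are 2^-23 (about 1.2e-7) apart and a 19-digit literal cannot be stored as written; R keeps the nearest double. as.character() in turn keeps 15 significant digits, which would explain why repeated convert() calls lose nothing further after the first one. A minimal base-R check of that reasoning:

```r
x <- 687676846.2546746464
# spacing between adjacent doubles at x: 2^(exponent - 52)
ulp <- 2^(floor(log2(x)) - 52)
ulp                              # 2^-23, about 1.19e-07
# the value actually stored, printed with more decimals than a double really holds
sprintf("%.12f", x)              # "687676846.254674673080"
# a different literal within half an ulp parses to the very same double
x == 687676846.2546746731        # TRUE
# as.character() rounds to 15 significant digits
as.character(x)                  # "687676846.254675"
```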
In the end I managed to do what I wanted with the functions below. addtrailingzeroes(x, digits) pads x with zeroes after the decimal point (adding the point if needed) until it has digits decimal places.
# Number of decimal places in the character form of x (0 if there is no ".")
nbdec <- function(x) {
  x1 <- as.character(x)
  xsplit <- strsplit(x1, "\\.")
  xlength <- sapply(xsplit, function(d) nchar(d)[2])
  xlength <- ifelse(is.na(xlength), 0, xlength)
  return(xlength)
}
# String of zeroes needed to bring each element of x up to dig decimal places
trailingzeroes <- function(x, dig) {
  res <- rep(NA, length(x))
  for (i in seq_along(x)) {
    if (!is.na(x[i])) {
      res[i] <- paste0(rep(0, max(0, dig - nbdec(x[i]))), collapse = "")
    } else {
      res[i] <- ""
    }
  }
  return(res)
}
# Decimal point to append when x has no decimal part (despite the name, this adds ".", not ",")
trailingcommas <- function(x) ifelse(is.na(x), NA, ifelse(nbdec(x) == 0, ".", ""))
addtrailingzeroes <- function(x, digits) {
  return(ifelse(!is.na(x), paste0(x, trailingcommas(x), trailingzeroes(x, digits)), NA))
}
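However, to suppress inaccuracies and rounding errors, x first has to be rounded with roundnumerics.max, which lowers the number of digits of each numeric column until a round trip through character no longer changes its values: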
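# Note: %==% and pprint() are not base R; they are assumed to be defined elsewhere
# (presumably an element-wise equality test and a print wrapper).
roundnumerics.max <- function(df, startdig = 12) {
  for (icol in seq_len(ncol(df))) {
    if (is.numeric(df[, icol])) {
      dig <- startdig
      # keep rounding while a round trip through character still changes some value
      while (any(!as.numeric(as.character(df[, icol])) %==% df[, icol])) {
        dig <- dig - 1
        df[, icol] <- round(df[, icol], digits = dig)
        if (dig == 0) {
          pprint("ERROR: zero numeric accuracy")  # print before leaving the loop
          break
        }
      }
      pprint("Numeric accuracy for column ", icol, " ", colnames(df)[icol], " is ", dig)
    }
  }
  return(data.frame(df, stringsAsFactors = FALSE))
}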
This is slow and far from elegant. I still find it hard to believe that R is limited to about 16 significant digits, and that it silently adds inaccurate noise, which causes divergences when you increase the digits option.
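If the digits have to survive exactly, one option is to never let them become doubles: read the column as character (for example with colClasses = "character" in read.csv()) and pad the strings. A minimal sketch; pad_decimals() is a hypothetical helper that only pads and never rounds or truncates, so it assumes character input and leaves longer decimal parts alone:

```r
# Hypothetical helper: pad decimal strings with zeroes up to `digits` decimal places.
# Works on character input only, so no floating-point conversion takes place.
pad_decimals <- function(s, digits) {
  stopifnot(is.character(s))
  # make sure every string contains a decimal point
  s <- ifelse(grepl(".", s, fixed = TRUE), s, paste0(s, "."))
  # decimal places already present = characters after the first "."
  ndec <- nchar(s) - as.integer(regexpr(".", s, fixed = TRUE))
  paste0(s, strrep("0", pmax(0L, digits - ndec)))
}

pad_decimals(c("129483.19999999999709", "1.3546746874", "687676846.2546746464"), 12)
```

This gives "1.354674687400" and "687676846.254674646400" for the last two values, while the first keeps all 14 of its decimals because the sketch does not cut strings down to 12 places.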
How to convert a base2 number (with fractional part) to a base10 number in R? The number can be negative as well.
Examples:
from2to10(10100101) # "165"
from2to10(0) # "0"
from2to10(10100101.01) # "165.25"
from2to10(-10100101) # "-165"
from2to10(-10100101.01) # "-165.25"
from2to10(111101111.010111) # "495.359375"
Edit: I realize I was assuming that the input would be character, which is perhaps a bad assumption on my part. With R FAQ 7.31 in mind, trusting R to preserve all of your 0s and 1s in a numeric value is a bit optimistic, but I'll keep my answer as-is unless/until something better comes along.
This was interesting. I'm not certain there's an R function that handles non-decimal floating-point numbers, so here's one:
#' Convert floating-point binary to decimal
#'
#' @param s 'character'
#' @return 'numeric'
#' @examples
#' tests <- c("10100101", "0", "10100101.01", "-10100101", "-10100101.01", "111101111.010111")
#' base2float(tests)
#' # [1] 165.0000 0.0000 165.2500 -165.0000 -165.2500 495.3594
base2float <- function(s, base = 2L) {
# ensure the strings seem logical:
# - start with "-", "+", or "[01]"
# - zero or more "[01]"
# - optional decimal "." (can easily change to "," for alternate reps)
# - zero or more "[01]"
stopifnot(all(grepl("^[-+]?[01]*\\.?[01]*$", s)))
splits <- strsplit(s, "\\.")
wholes <- sapply(splits, `[[`, 1L)
wholes[wholes %in% c("", "-", "+")] <- paste0(wholes[wholes %in% c("", "-", "+")], "0")
fracs <- sapply(splits, `[`, 2L)
fracs[is.na(fracs)] <- "0"
# because string-length is used in our calcs ...
fracs <- gsub("0+$", "0", fracs)
whole10 <- strtoi(wholes, base = base)
frac10 <- strtoi(fracs, base = base) / (base^nchar(fracs))
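# take the sign from the string: sign(whole10) is 0 for inputs like "0.1" or "-0.1",
# which would silently drop the fractional part
sgn <- ifelse(startsWith(s, "-"), -1, 1)
sgn * (abs(whole10) + frac10)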
}
library(cwhmisc) # int, frac
from2to10 <- function(n) {
SignOfNumber <- ""
if (n < 0) {
n <- abs(n)
SignOfNumber <- "-"}
nWhole <- int(n)
nWhole <- format(nWhole, scientific = FALSE) # as.character() would turn e.g. 100000 into "1e+05"
nFraction <- frac(n)
nFraction <- as.character(nFraction)
DecimalWhole <- sapply(strsplit(nWhole, split=""), function(x) sum(as.numeric(x) * 2^(rev(seq_along(x) - 1))))
if (nFraction == 0) {
DecimalFraction <- ""
paste0(SignOfNumber, DecimalWhole)
} else { # Find decimal fraction part
# digits after the decimal point of as.character(n); as.character() keeps at most 15 significant digits
y <- as.numeric(strsplit(sub("^[^.]*\\.", "", as.character(n)), "")[[1]])
DecimalFraction <- sum(y * (0.5^(1:length(y))))
paste0(SignOfNumber, DecimalWhole + DecimalFraction)
}
}
from2to10(10100101) # "165"
from2to10(0) # "0"
from2to10(10100101.01) # "165.25"
from2to10(-10100101) # "-165"
from2to10(-10100101.01) # "-165.25"
from2to10(111101111.010111) # "495.359375"; numeric to string; exact conversion
base2float("111101111.010111") # 495.3594 as printed; string to numeric (r2evans). The stored value is exactly 495.359375; only the default 7-digit printing rounds it.
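For character input, here is an alternative sketch, bin_to_dec() (a hypothetical name), that sums digit weights directly instead of calling strtoi(). That avoids strtoi()'s 32-bit integer range (whole parts above 2^31 - 1 come back as NA), and the sign is taken from the string, so inputs like "-0.1" keep their fraction. It still returns a double, so values needing more than 53 significant bits are rounded.

```r
# Hypothetical alternative to base2float() for character input.
bin_to_dec <- function(s) {
  stopifnot(is.character(s), all(grepl("^[-+]?[01]*(\\.[01]*)?$", s)))
  vapply(s, function(x) {
    neg <- startsWith(x, "-")
    x <- sub("^[-+]", "", x)                       # drop the sign
    parts <- strsplit(x, ".", fixed = TRUE)[[1]]   # whole part, fractional part
    whole <- if (length(parts) >= 1 && nzchar(parts[1])) as.integer(strsplit(parts[1], "")[[1]]) else 0L
    frac <- if (length(parts) >= 2) as.integer(strsplit(parts[2], "")[[1]]) else integer(0)
    # whole digits weigh 2^(n-1), ..., 2^0; fractional digits weigh 2^-1, 2^-2, ...
    val <- sum(whole * 2^(rev(seq_along(whole)) - 1)) + sum(frac * 2^-seq_along(frac))
    if (neg) -val else val
  }, numeric(1), USE.NAMES = FALSE)
}

bin_to_dec(c("10100101", "0", "10100101.01", "-10100101", "-10100101.01", "111101111.010111", "-0.1"))
# values: 165 0 165.25 -165 -165.25 495.359375 -0.5
```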
All I want to do is implement the solution given here (the Python one) in R.
I'm not used to debugging in RStudio, and even after trying it I still can't figure out why my code does not work. Basically (with the example input provided), the function runs over all the numbers and then gets stuck in a sort of infinite loop (or infinite recursion). Can someone please point me in the right direction?
subset_sum <- function(numbers, target, partial = numeric(0)){
s <- sum(partial,na.rm = TRUE)
# check if the partial sum equals to target
if (s == target){
cat("sum(",partial,")","=",target)
}
else if (s >= target) {
return() # if we reach the number why bother to continue
}
else {
for(i in 1:length(numbers)){
n <- numbers[i]
remaining <- numbers[i+1:length(numbers)]
subset_sum(remaining, target, partial = append(partial,n))
}
}
}
subset_sum(c(3,9,8,4,5,7,10),15)
When not run in debug mode it gives me these errors:
Error: node stack overflow
Error during wrapup: node stack overflow
Here's a recursive implementation in R
subset_sum = function(numbers,target,partial=0){
if(any(is.na(partial))) return()
s = sum(partial)
if(s == target) print(sprintf("sum(%s)=%s",paste(partial[-1],collapse="+"),target))
if(s > target) return()
for( i in seq_along(numbers)){
n = numbers[i]
remaining = numbers[(i+1):length(numbers)]
subset_sum(remaining,target,c(partial,n))
}
}
Compared with the Python version, I had to add one extra check in R: when i+1 > length(numbers), (i+1):length(numbers) counts down instead of giving an empty range, so the indexing returns an NA.
> subset_sum(c(3,9,8,4,5,7,10),15)
[1] "sum(3+8+4)=15"
[1] "sum(3+5+7)=15"
[1] "sum(8+7)=15"
[1] "sum(5+10)=15"
I think (but I'm not sure) that your issue was the nested if/else if logic in a recursive function. Interestingly, when I put if(i+1 > length(numbers)) return() inside the for loop, it broke the function and I didn't get all the answers; the return() calls need to be outside the recursion.
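One more thing worth checking in the question's code: in R, `:` binds more tightly than `+`, so numbers[i+1:length(numbers)] means numbers[i + (1:length(numbers))]. That index vector always has length(numbers) entries, most of them past the end, so remaining is padded with NA instead of getting shorter; once NAs are picked, the partial sum (computed with na.rm = TRUE) stops growing and the recursion never ends. A sketch of the question's function with the indexing fixed and the matches returned instead of printed (subset_sums is a hypothetical name; like the question's code, it prunes once the sum exceeds the target, which assumes positive numbers):

```r
# `:` is evaluated before `+`
x <- c(3, 9, 8)
x[1 + 1:length(x)]         # 9 8 NA  (indices 2, 3, 4)
x[-seq_len(1)]             # 9 8     (everything after position 1)

# Hypothetical fixed version of the question's subset_sum(), collecting results
subset_sums <- function(numbers, target, partial = numeric(0)) {
  s <- sum(partial)
  if (s == target) return(paste(partial, collapse = "+"))
  if (s > target) return(character(0))   # overshoot: abandon this branch
  out <- character(0)
  for (i in seq_along(numbers)) {
    # numbers[-seq_len(i)] is everything after position i (empty when i is the last index)
    out <- c(out, subset_sums(numbers[-seq_len(i)], target, c(partial, numbers[i])))
  }
  out
}

subset_sums(c(3, 9, 8, 4, 5, 7, 10), 15)
# "3+8+4" "3+5+7" "8+7" "5+10"
```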
This is not a recursive function, but it takes advantage of R's ability to handle matrix/array data. Some output is shown in the comments after #.
v <- c(3,9,8,4,5,7,10)
v <- sort(v)
# [1] 3 4 5 7 8 9 10
target <- 15
# we never need more than 4 numbers, since 3+4+5+7 (the four smallest numbers) is already greater than 15
mincombs <- min(which(cumsum(v) > target))
# [1] 4
Combs <- combn(v, mincombs) # make combinations of numbers
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
# [1] "3+4+8=15" "3+4+8=15" "3+5+7=15" "3+5+7=15" "3+5+7=15" "7+8=15"
In a function
myfun <- function(V, target) {
V <- sort(V)
mincombs <- min(which(cumsum(V) > target))
Combs <- combn(V, mincombs)
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
return(ans)
}
myfun(V = c(3,9,8,4,5,7,10), target = 15)
myfun(V = c(3,9,8,4,5,7,10,12,4,32),target = 20)
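Because combn(v, mincombs) only produces combinations of a single size, the prefix trick above finds a smaller subset only when it can be extended by larger numbers. In the output above, 3+4+8 appears twice and 3+5+7 three times, while 5+10 is missing. A sketch that loops over every subset size up to the largest useful one instead (subset_sums_combn is a hypothetical name; it assumes positive numbers):

```r
subset_sums_combn <- function(v, target) {
  v <- sort(v)
  # largest k such that the k smallest numbers still fit within target
  max_k <- sum(cumsum(v) <= target)
  out <- character(0)
  for (k in seq_len(max_k)) {
    idx <- combn(length(v), k)                 # index combinations, one per column
    combs <- matrix(v[idx], nrow = k)          # the corresponding values
    hits <- combs[, colSums(combs) == target, drop = FALSE]
    out <- c(out, vapply(seq_len(ncol(hits)),
                         function(j) paste(hits[, j], collapse = "+"),
                         character(1)))
  }
  unique(out)   # duplicated inputs (e.g. two 4s) can give identical strings
}

subset_sums_combn(c(3, 9, 8, 4, 5, 7, 10), 15)
# "5+10" "7+8" "3+4+8" "3+5+7"
```

Combining indices rather than values also avoids a combn() quirk: given a single positive whole number, combn(x, m) treats it as seq_len(x).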
I was reading a book and came across this R code. It basically finds patterns in the input string, using a minimum repeat threshold of 3.
vec <- "da0abcab0abcaab0d0"
find_rep_path <- function(vec, reps) {
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
match <- regmatches(vec, regexpr(regexp, vec, perl = TRUE))
substr(match, 1, nchar(match) / reps)
}
vals <- unique(strsplit(vec, "")[[1]])
str <- NULL
for (i in seq.int(nchar(vec))) {
x <- vec
for (v in vals) {
substr(x, i, i) <- v
tmp <- find_rep_path(x, 3)
if (length(tmp) > 0)
str <- c(str, tmp)
}
}
nc <- nchar(str)
unique(str[which(nc == max(nc))])
Now I want to turn this code into a function that I can call like function("da0abcab0abcaab0d0"), so that I can pass a string to it directly instead of hardcoding it. How can I modify this?
I know this is a beginner question but I am completely at sea right now as far as R is concerned. Please help!
I don't see how it's hardcoded. But you can just wrap your code into a function if that's what you mean?
# Function 1
find_rep_path <- function(vec, reps) {
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
match <- regmatches(vec, regexpr(regexp, vec, perl = TRUE))
substr(match, 1, nchar(match) / reps)
}
# Function 2
foo <- function(vec) {
vals <- unique(strsplit(vec, "")[[1]])
str <- NULL
for (i in seq.int(nchar(vec))) {
x <- vec
for (v in vals) {
substr(x, i, i) <- v
tmp <- find_rep_path(x, 3)
if (length(tmp) > 0)
str <- c(str, tmp)
}
}
nc <- nchar(str)
return(unique(str[which(nc == max(nc))]))
}
vec <- "da0abcab0abcaab0d0"
foo(vec)
#[1] "0ab" "abc"
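The actual pattern search happens in find_rep_path(). With reps = 3 it builds the PCRE pattern (.+)\1\1, i.e. some substring followed immediately by two copies of itself, and keeps one copy of the first match. Spelled out step by step on a made-up string:

```r
reps <- 3
pattern <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
pattern                                   # "(.+)\\1\\1" as printed, i.e. (.+)\1\1
txt <- "xabcabcabcx"
m <- regmatches(txt, regexpr(pattern, txt, perl = TRUE))
m                                         # "abcabcabc": "abc" three times in a row
substr(m, 1, nchar(m) / reps)             # "abc": one copy of the repeated unit
```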
Edit1
To get the positions of the matches you can use gregexpr:
a <- foo(vec)
gregexpr(a[1], vec)
#[[1]]
#[1] 3 9
#attr(,"match.length")
#[1] 3 3
#attr(,"useBytes")
#[1] TRUE
This tells you that a[1] ("0ab") was matched in vec at positions 3 and 9. Run ?gregexpr for more information.
Edit2
To add this information to each match, we can do something like
bar <- function(vec) {
m <- foo(vec)
ans <- sapply(m, gregexpr, vec, fixed = TRUE)
ans <- lapply(ans, function(x) {attributes(x) <- NULL; x})
return(ans)
}
bar(vec)
#$`0ab`
#[1] 3 9
#
#$abc
#[1] 4 10
Consider the following comma-separated string of numbers:
s <- "1,2,3,4,8,9,14,15,16,19"
s
# [1] "1,2,3,4,8,9,14,15,16,19"
Is it possible to collapse runs of consecutive numbers into their corresponding ranges? For example, the run 1,2,3,4 above would be collapsed to the range 1-4. The desired result looks like this:
s
# [1] "1-4,8,9,14-16,19"
I took some heavy inspiration from the answers in this question.
findIntRuns <- function(run){
rundiff <- c(1, diff(run))
difflist <- split(run, cumsum(rundiff!=1))
unlist(lapply(difflist, function(x){
if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
}), use.names=FALSE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))
paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"
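The core of findIntRuns() is the cumsum(rundiff != 1) step: the running count of gaps gives every run its own id, which split() then uses to group the numbers. On the example data:

```r
run <- c(1, 2, 3, 4, 8, 9, 14, 15, 16, 19)
rundiff <- c(1, diff(run))      # 1 1 1 1 4 1 5 1 1 3
group <- cumsum(rundiff != 1)   # 0 0 0 0 1 1 2 2 2 3: a new id whenever a gap starts
split(run, group)               # the runs 1:4, 8:9, 14:16 and 19
```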
EDIT: Multiple solutions: benchmarking time!
Unit: microseconds
expr min lq median uq max neval
spee() 277.708 295.517 301.5540 311.5150 1612.207 1000
seb() 294.611 313.025 321.1750 332.6450 1709.103 1000
marc() 672.835 707.549 722.0375 744.5255 2154.942 1000
@speendo's solution is the fastest at the moment, but none of these have been optimised yet.
I was too slow, but here's another solution.
It uses fewer R-specific functions, so it could be ported to other languages (on the other hand, it may be less elegant).
s <- "1,2,3,4,8,9,14,15,16,19"
collapseConsecutive <- function(s){
x <- as.numeric(unlist(strsplit(s, ",")))
x_0 <- x[1]
out <- toString(x[1])
hasDash <- FALSE
for(i in seq_along(x)[-1]) { # unlike 2:length(x), this is empty for a single number
x_1 <- x[i]
x_2 <- x[i+1]
if((x_0 + 1) == x_1 && !is.na(x_2) && (x_1 + 1) == x_2) {
if(!hasDash) {
out <- c(out, "-")
hasDash <- TRUE
}
} else {
if(hasDash) {
hasDash <- FALSE
} else {
out <- c(out, ",")
}
out <- c(out, x_1)
hasDash <- FALSE
}
x_0 <- x_1
}
outString <- paste(out, collapse="")
outString
}
collapseConsecutive(s)
# [1] "1-4,8,9,14-16,19"
Another fairly compact option
in.seq <- function(x) {
# returns TRUE for elements strictly inside ascending runs (neither first nor last)
(c(diff(x, 1), NA) == 1 & c(NA, diff(x,2), NA) == 2)
}
contractSeqs <- function(x) {
# returns string formatted with contracted sequences
x[in.seq(x)] <- ""
gsub(",{2,}", "-", paste(x, collapse=","), perl=TRUE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s1 <- as.numeric(unlist(strsplit(s, ","))) # as earlier answers
# assumes: numeric vector, length > 2, positive integers, ascending sequences
contractSeqs(s1)
# [1] "1-4,8,9,14-16,19"
I also wrote a bells & whistles version that can handle both numeric and string input including named objects, descending sequences and alternative punctuation, as well as performing error checking and reporting. If anyone is interested, I can add this to my answer.
Here's a function that should do what you want:
conseq <- function(s){
s <- as.numeric(unlist(strsplit(s, ",")))
dif <- diff(s) # differences between consecutive numbers
new <- !c(0, dif == 1)
cs <- cumsum(new)
res <- vector(mode="list", max(cs))
for(i in seq(res)){
s.i <- s[which(cs == i)]
if(length(s.i) > 2){
res[[i]] <- paste(min(s.i), max(s.i), sep="-")
} else {
res[[i]] <- as.character(s.i)
}
}
paste(unlist(res), collapse=",")
}
Example
> s <- "1,2,3,4,8,9,14,15,16,19"
> conseq(s)
[1] "1-4,8,9,14-16,19"
What's a quick way to encode either integer values or numeric whole number values in R to a character vector in base 62 encoding, i.e. a string that only contains [a-zA-Z0-9]? Would translating the answer to this question be sufficient?
converting a number base 10 to base 62 (a-zA-Z0-9)
Edited
Here's my solution:
toBase <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
r <- num %% base
res <- bv[r+1]
q <- floor(num/base)
while (q > 0L) {
r <- q %% base
q <- floor(q/base)
res <- paste(bv[r+1],res,sep='')
}
res
}
to10 <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
vb <- list()
for (i in 1:length(bv)) vb[[bv[i]]] <- i
num <- strsplit(num,'')[[1]]
res <- vb[[num[1]]]-1
if (length(num) > 1)
for (i in 2:length(num)) res <- base * res + (vb[[num[i]]]-1)
res
}
Is that missing anything?
Here's a solution that does base 36 using [0-9A-Z] that could easily be adapted for base 62 using [a-zA-Z0-9]. And yes, it's basically just a translation of the solution to the other question you linked to.
https://github.com/graywh/r-gmisc/blob/master/R/baseConvert.R
Here's a variant of the above code that allows you to convert a vector of numbers to base 16. It's not particularly elegant, as it isn't vectorized, but it gets the job done.
toBase <- function(num, base=16) {
bv <- c(0:9,letters,LETTERS)
r <- list()
q <- list()
res <- list()
for(i in 1:length(num)){
r[i] <- num[i] %% base
res[i] <- bv[r[[i]]+1]
q[i] <- floor(num[i]/base)
while (q[[i]] > 0L) {
r[i] <- q[[i]] %% base
q[i] <- floor(q[[i]]/base)
res[i] <- paste(bv[r[[i]]+1],res[[i]],sep='')
}
}
return(do.call('c', res))
}
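The variant above vectorizes toBase(), but the question's to10() still takes a single string. A sketch of a vectorized decoder, assuming the same 0-9, a-z, A-Z digit order as toBase() (to10_vec is a hypothetical name). It uses match() instead of the lookup list and applies Horner's rule with Reduce(). Since the result is a double, it is exact only up to 2^53.

```r
# Hypothetical vectorized counterpart to the question's to10(); same alphabet as toBase().
to10_vec <- function(num, base = 62) {
  bv <- c(0:9, letters, LETTERS)[seq_len(base)]
  vapply(strsplit(num, ""), function(ch) {
    d <- match(ch, bv) - 1           # digit values 0..base-1; NA if not in the alphabet
    if (anyNA(d)) stop("character outside the base-", base, " alphabet")
    Reduce(function(acc, digit) acc * base + digit, d, 0)   # Horner's rule
  }, numeric(1))
}

to10_vec(c("0", "Z", "10", "ZZ"))    # 0 61 62 3843
to10_vec(c("ff", "100"), base = 16)  # 255 256
```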
To make this more standard, you should implement it in the same way as the conversion to hexadecimal (as.hexmode() and format.hexmode() in base R). (See here for naming.)
as.exindadeomode <- function(x)
{
#Give x a class of "exindadeomode"
#Contents as per as.hexmode
}
format.exindadeomode <- function (x, width = NULL, upper.case = FALSE, ...)
{
#Return appropriate characters
#Contents as per format.hexmode
}
To convert back to integer, just strip the class, using as.integer.
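Filling in that skeleton could look like the sketch below. It works under stated assumptions: non-negative integer input, digit order 0-9, a-z, A-Z, and zero padding to a common width by default, similar to how hexmode values are formatted. The upper.case argument is kept for signature compatibility but ignored, since base 62 already uses both letter cases. The print method is an extra so that values display in base 62.

```r
# Sketch of the skeleton above; digit order and padding rules are assumptions.
as.exindadeomode <- function(x) {
  x <- as.integer(x)
  stopifnot(!anyNA(x), all(x >= 0L))   # this sketch handles non-negative integers only
  structure(x, class = "exindadeomode")
}

format.exindadeomode <- function(x, width = NULL, upper.case = FALSE, ...) {
  # upper.case is accepted for compatibility with format.hexmode() but ignored
  alphabet <- c(0:9, letters, LETTERS)
  out <- vapply(unclass(x), function(v) {
    d <- character(0)
    repeat {
      d <- c(alphabet[v %% 62L + 1L], d)   # prepend the least significant digit
      v <- v %/% 62L
      if (v == 0L) break
    }
    paste(d, collapse = "")
  }, character(1))
  # pad with leading zeros to a common width, or to `width` if that is larger
  width <- max(0L, nchar(out), width)
  paste0(strrep("0", width - nchar(out)), out)
}

print.exindadeomode <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

x <- as.exindadeomode(c(0, 61, 62, 3843))
x              # [1] 00 0Z 10 ZZ
as.integer(x)  # the plain integers 0 61 62 3843, class stripped
```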