I want to write a function that counts the number of fixed (integer) digits and the number of decimal digits of each positive number. For whole numbers: for example, for the number 459 I would like to see
fixed=3 and decimal=0
as the output of the function.
For decimal numbers: for example, for number 12.657 I would like to see
fixed=2 and decimal=3
(because 12 has two digits and 657 has three digits). For numbers less than 1 I would like to see
0
as the fixed count; for example, for the number 0.4056 I would like to see
fixed=0 and decimal=4
or for number 0.13 I would like to see
fixed=0 and decimal=2
I have a general idea as follows:
digits <- function(x){
  length(strsplit(as.character(x), split = "")[[1]])
}
I want to extend this code into a new function that works as I explained above.
It seems you are kind of stuck hoping that this can be done. Okay, here is a crude way:
fun <- function(x){
  stopifnot(is.numeric(as.numeric(x)))
  ## drop any leading minus sign, split at the decimal point, count characters on each side
  s <- nchar(unlist(strsplit(sub("^-", "", as.character(x)), ".", fixed = TRUE)))
  if(length(s) == 1) s <- c(s, 0)               # whole numbers have no decimal part
  if(abs(as.numeric(x)) < 1) s[1] <- s[1] - 1   # the leading 0 is not a fixed digit
  setNames(as.list(s), c("fixed", "decimal"))
}
CORRECT:
fun(10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(-10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(0.2346)
$fixed
[1] 0
$decimal
[1] 4
fun(-0.2346)
$fixed
[1] 0
$decimal
[1] 4
INCORRECT: Note that this only works when fixed + decimal <= 15!
fun(-10000567.2346678901876)
$fixed
[1] 8
$decimal
[1] 7 ## This is incorrect
The correct value is:
fun("-10000567.2346678901876") # Note that the input x is a string
$fixed
[1] 8
$decimal
[1] 13
I do not think this can be done. We cannot assume that a simple numeric value is accurately represented in the computer. Most floating-point values certainly can't.
Enter 0.3 at the R console:
> 0.3
[1] 0.3
Looks alright, doesn't it? But now, let us do this:
> print(0.3, digits=17)
[1] 0.29999999999999999
In essence, when you convert a floating-point number to a string, you decide how much precision you want. The computer cannot give you absolute precision, because it stores every number in a finite number of bits. Even if you see the number as 0.3 and therefore think it has 0 fixed digits and 1 decimal digit, that is only because R chose to print it that way, not because that is the number actually stored in the computer's memory.
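A small illustration of that point (just a sketch; the exact trailing digits depend on the binary value actually stored):
x <- 0.3
as.character(x)       # "0.3": the string a digit-counting function would see
sprintf("%.20f", x)   # "0.29999999999999998890": much closer to what is actually stored
0.1 + 0.2 == 0.3      # FALSE, because neither side is stored exactly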
The other answers prove that a function can handle simple cases. I must admit that I learned that R does an incredible job interpreting numbers. But we have to use caution! As soon as we start transforming numbers, such a function cannot guarantee meaningful results.
EDITED (based on comments):
The function below will work for numbers with 15 or fewer significant figures:
digits <- function(x){
  ## make sure x is a number, if not stop
  tryCatch(
    {
      x/1
    },
    error = function(cond) {
      stop('x must be a number')
    }
  )
  ## use nchar() and strsplit() to determine number of digits before decimal point
  fixed <- nchar(strsplit(as.character(x), "\\.")[[1]][1])
  ## check if negative
  if(substr(strsplit(as.character(x), "\\.")[[1]][1], 1, 1) == "-"){fixed <- fixed - 1}
  ## check if -1 < x < 1
  if(as.numeric(strsplit(as.character(x), "\\.")[[1]][1]) == 0){fixed <- fixed - 1}
  ## use nchar() and strsplit() to determine number of digits after decimal point
  decimal <- nchar(strsplit(as.character(x), "\\.")[[1]][2])
  ## for integers, replace NA decimal result with 0
  if(is.na(nchar(strsplit(as.character(x), "\\.")[[1]][2]))){decimal <- 0}
  ## return results
  print(paste0("fixed: ", fixed, " and decimal: ", decimal))
}
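A few quick sanity checks (not part of the original answer) against the examples from the question:
digits(459)     # "fixed: 3 and decimal: 0"
digits(12.657)  # "fixed: 2 and decimal: 3"
digits(0.4056)  # "fixed: 0 and decimal: 4"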
If you want to count negative signs (-) as a digit, you'll need to remove:
## check if negative
if(substr(strsplit(as.character(x), "\\.")[[1]][1], 1, 1) == "-"){fixed <- fixed - 1}
Note: the 15-or-fewer significant figures restriction comes from the 8-byte (double precision) floating point representation.
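As a rough sketch of where that limit comes from (exact output may vary slightly by platform): a double's 53-bit mantissa carries roughly 15-16 decimal digits, and as.character() keeps at most 15 significant digits.
.Machine$double.digits                 # 53, bits in the mantissa
53 * log10(2)                          # about 15.95 decimal digits of precision
as.character(-10000567.2346678901876)  # "-10000567.2346679", only 15 significant digits survive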
The only way I've been able to overcome this is by using the below code.
digits <- function(x){
  ## make sure x can be interpreted as a number, if not stop
  tryCatch(
    {
      as.numeric(x)/1
    },
    error = function(cond) {
      stop('x must be a number')
    }
  )
  ## walk through the characters of x, counting digits before and after the decimal point
  j <- 0
  num <- x
  fixed <- 0
  decimal <- 0
  for(i in 1:nchar(x)){
    if(substr(num, i, i) == "."){
      j <- 1
    } else if(j == 0){
      fixed <- fixed + 1
    } else{
      decimal <- decimal + 1
    }
  }
  ## correct the fixed count for a leading minus sign and/or a leading zero
  if(substr(x, 1, 1) == "-" & substr(as.numeric(x), 2, 2) == 0){
    fixed <- fixed - 2
  } else if(substr(x, 1, 1) == "-"){
    fixed <- fixed - 1
  } else if(substr(as.numeric(x), 1, 1) == 0){
    fixed <- fixed - 1
  }
  print(paste0("fixed: ", fixed, " and decimal: ", decimal))
}
However, this code requires the number to be passed into the function as a character string in order to handle more than 15 significant figures, as shown below:
x<-"111111111111111111111111111111111111.22222222222222222222222222222222222222222222222"
digits(x)
Which returns:
[1] "fixed: 36 and decimal: 47"
This limits the function's applications: it cannot easily be used in a dplyr pipe, and you will not get accurate results for numbers that have already been stored (and therefore rounded) as doubles. For example, using the same number as above but storing it as a number instead of a character string yields the results below:
x<-111111111111111111111111111111111111.22222222222222222222222222222222222222222222222
digits(x)
[1] "fixed: 1 and decimal: 18"
I hope this at least somewhat helps!
Related
I am working on boolean rules for terminal node assignment in CART-like trees as part of my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html).
I have noticed problematic behavior when evaluating inequalities built as character strings using eval() and parse(text = ...). The issue has to do with how R evaluates the internal representation of a number.
Here's an example involving the number pi. I want to check if a vector (which I call x) is less than or equal to pi.
> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"
This rule checks whether the object x is less than or equal to pi, where pi is represented to 14 decimal digits. Now I will assign x the values 1, 2, 3 and pi:
> x = c(1,2,3,pi)
Here's what x is, printed to 15 digits:
> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979
Now let's evaluate this
> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE
Whooaaaaa, it looks like pi is not less than or equal to pi. Right?
But now if I hard-code x to pi to 14 digits, it works:
> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
Obviously, in the first case the internal representation of pi has more digits than the stringified value in the rule, so when R evaluates the expression, pi is greater than that parsed value and it returns FALSE. In the second case it compares two identical floats, so the result is TRUE.
However, how do I avoid this happening? I really need the first evaluation to come back TRUE, because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.
One solution I use is to add a small tolerance value.
> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
However, this seems like an ugly solution.
Any comments and suggestions are greatly appreciated!
You could just go via the pi name, or via a function instead, to prevent pi from getting stringified (which is your first problem here):
rule <- "x <= pi"
x <- c(1,2,3,pi)
eval(parse(text = rule)) ## All TRUE
## another way might be to throw stuff you need uneval'ed into a function or a block:
my_pi <- function() {
  pi
}
rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE
You still will suffer from the usual floating point issues, but imprecise stringification won't be your problem anymore.
Here's why your approach didn't work:
> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074
The stringified pi is less than R's pi by a good margin.
The paste() manual says it uses as.character() to convert numbers to strings, which in turn uses 15 significant digits, and that is what you are observing.
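If the rule really must embed a literal number, one possible workaround (my assumption, not something the original answer proposed) is to serialize it with enough digits to round-trip a double, for example with format(..., digits = 17):
rule <- paste0("x <= ", format(pi, digits = 17))
rule
# [1] "x <= 3.1415926535897931"
x <- c(1, 2, 3, pi)
eval(parse(text = rule))  # all TRUE, because 17 significant digits reproduce the double exactly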
> trunc(26015)
[1] 26015
> 260.15*100
[1] 26015
> trunc(260.15*100)
[1] 26014
> floor(260.15*100)
[1] 26014
> as.integer(260.15*100)
[1] 26014
For this code in R, is there an issue with the internal representation of the number?
When I do 260.15*100, the number being printed is still 26015, but when I use a function like trunc() or as.integer(), it becomes 26014.
Usually, my value containing the decimal comes from another variable. So how do I overcome this issue?
The printed form of a numeric is not the same as its internal representation. 260.15 * 100 is never actually 26015; it is just printed as such, because printing a numeric rounds to a limited number of digits. The underlying data is floating point. You can see this by changing your print options:
# set options(digits) to display up to 22 digits, the maximum allowed
> options(digits = 22)
> 260.15 * 100
[1] 26014.99999999999636202
> 26015
[1] 26015
In lieu of trunc() or as.integer(), does round() meet your needs?
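For example, as a small sketch of that suggestion: round() snaps the almost-26015 result back to the intended integer, while trunc() and as.integer() chop toward zero.
x <- 260.15 * 100
trunc(x)          # 26014
round(x)          # 26015
round(x) == 26015 # TRUE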
I have a string, and when comparing it with a number the code does not break; instead it says that the string is positive. Any hints why this happens?
x <- "The day is bad, I don't like anything! I feel bad and sad really sad"
if (x == 0) {
print("x is equal to 0")
}else if (x > 0) {
print("x is positive")
}else if (x < 0 ){
print("x is negative")
}
The result is:
"x is positive"
?'>'
...If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw...
So when you compare x, which is a character vector, to 0, which is numeric, the numeric 0 is converted to the character "0":
x == 0 evaluates to FALSE because "The day is bad..." != "0";
x < 0 evaluates to FALSE because, in that ordering, "0" is placed before "The day is bad...":
...Comparison of strings in character vectors is lexicographic within
the strings using the collating sequence of the locale in use...
sort(c(x, 0))
#[1] "0"
#[2] "The day is bad, I don't like anything! I feel bad and sad really sad"
Meaning that x is considered greater than "0" because of the lexicographic order.
Finally, x > 0 evaluates to TRUE because "0" precedes "The day is bad, I don't...", and your code returns [1] "x is positive".
And if, trying to prove our hypothesis, we ask ourselves whether Chuck Norris is able to beat Infinity, we find that it is not the case:
'Chuck Norris' > Inf
# [1] FALSE
In contrast, Keith Richards, as anybody would expect, has no problem with that:
'Keith Richards' > Inf
# [1] TRUE
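If you want such code to fail loudly instead of silently coercing, one possible guard (not part of the original answer) is an explicit type check before comparing:
x <- "The day is bad, I don't like anything! I feel bad and sad really sad"
if (!is.numeric(x)) {
  stop("x must be numeric, not ", class(x))   # fail fast instead of comparing strings
} else if (x == 0) {
  print("x is equal to 0")
} else if (x > 0) {
  print("x is positive")
} else {
  print("x is negative")
}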
I have run into the same problem as described at R which () function returns integer(0)
price = seq(4,7, by=0.0025)
allPrices = as.data.frame(price)
lookupPrice = 5.0600
which(allPrices$price == lookupPrice)
The which() statement outputs integer(0), indicating no match. It should output 425, the matching row number in that sequence.
I understand that this is a floating point issue. The link suggests using all.equal(x,y) in some manner.
How do I incorporate the all.equal() function into the which() statement, so that I get the row number in allPrices that matches lookupPrice (in this case, 5.06)?
Is there some other approach? I need the row number, because values in other columns at that price will be modified.
A manual approach to this involves specifying the tolerance for the comparison and doing:
# tol = 1e-7: comparison will be TRUE if numbers are equal up to
# 7 decimal places
tol = 1e-7
which(abs(allPrices$price - lookupPrice) < tol)
You can sapply over all the prices and apply the all.equal function to each one, to find the one that is TRUE
which(sapply(price, all.equal, lookupPrice) == TRUE)
# [1] 425
You could also try rounding the prices in your data frame to 4 decimal places:
which(round(allPrices$price, digits=4) == lookupPrice)
[1] 425
After rounding to 4 places, the precision of the lookupPrice and your data frame of prices should match.
There is a function near in dplyr:
near(x, y, tol = .Machine$double.eps^0.5)
For this case, you can try:
which(near(allPrices$price, lookupPrice))
#[1] 425
I just had the exact same problem.
I initially fixed it by converting both sets of data from numeric to characters with as.character() before calling which().
However, I wanted to figure out exactly why it wasn't working with the numeric data and did some further troubleshooting.
It appears that the problem is with the way R generates decimal sequences with seq(). Using the round() function works - as suggested by Tim Biegeleisen - but I think you only need to apply it to the numbers generated by seq(). You can check out my work below - the error is very sporadic, I just tried numbers until I found one that failed: 19.2.
> data <- 19.2
> x.seq <- seq(5, 45, 0.2)
> x.seq[72]
[1] 19.2
>
> data == 19.2
[1] TRUE
> x.seq[72] == 19.2
[1] FALSE
> data == x.seq[72]
[1] FALSE
> data == round(x.seq[72], digits = 1)
[1] TRUE
> round(data, digits = 1) == x.seq[72]
[1] FALSE
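Based on that, a sketch of the suggested fix: round only the seq()-generated values before matching, so both sides are compared at the same precision.
x.seq <- seq(5, 45, 0.2)
which(round(x.seq, digits = 1) == 19.2)
# [1] 72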
I would like to create a function that returns a vector of numbers whose precision is reflected by having only n significant figures, but without trailing zeros and not in scientific notation.
e.g., I would like
somenumbers <- c(0.000001234567, 1234567.89)
myformat(x = somenumbers, n = 3)
to return
[1] 0.00000123 1230000
I have been playing with format, formatC, and sprintf, but they don't seem to want to work on each number independently, and they return the numbers as character strings (in quotes).
This is the closest that I have gotten:
> format(signif(somenumbers,4), scientific=FALSE)
[1] " 0.000001235" "1235000.000000000"
You can use the signif function to round to a given number of significant digits. If you don't want extra trailing 0's then don't "print" the results but do something else with them.
> somenumbers <- c(0.000001234567, 1234567.89)
> options(scipen=5)
> cat(signif(somenumbers,3),'\n')
0.00000123 1230000
>
sprintf seems to do it:
sprintf(c("%1.8f", "%1.0f"), signif(somenumbers, 3))
[1] "0.00000123" "1230000"
How about:
myformat <- function(x, n) {
  noquote(sapply(x, function(v) format(signif(v, n), scientific = FALSE)))
}
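Called on the vector from the question (assuming the corrected version above, which uses x and n rather than the hard-coded values), it gives the desired output:
somenumbers <- c(0.000001234567, 1234567.89)
myformat(somenumbers, 3)
# [1] 0.00000123 1230000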