R How to use which() with floating point values?

I have run into the same problem as described in "R which() function returns integer(0)":
price = seq(4,7, by=0.0025)
allPrices = as.data.frame(price)
lookupPrice = 5.0600
which(allPrices$price == lookupPrice)
The which() statement outputs integer(0), indicating no match. It should output 425, the matching row number in that sequence.
I understand that this is a floating point issue. The link suggests using all.equal(x,y) in some manner.
How do I incorporate the all.equal() function into the which() statement, so that I get the row number in allPrices that matches lookupPrice (in this case, 5.06)?
Is there some other approach? I need the row number, because values in other columns at that price will be modified.
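A quick way to see the mismatch is to print both doubles with more significant digits than R shows by default. This is a minimal sketch; it uses sprintf("%.17g"), since 17 significant digits are enough to distinguish any two different doubles:

```r
price <- seq(4, 7, by = 0.0025)
lookupPrice <- 5.06

# With the default 7 significant digits both values print as 5.06 ...
print(c(price[425], lookupPrice))

# ... but with 17 significant digits the two doubles are visibly different
sprintf("%.17g", c(price[425], lookupPrice))

# The gap is tiny (on the order of 1e-15), yet `==` requires exact equality
price[425] - lookupPrice
```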

A manual approach to this involves specifying the tolerance for the comparison and doing:
# tol = 1e-7: comparison is TRUE when the numbers differ by less than
# 1e-7, i.e. they agree to roughly 7 decimal places
tol = 1e-7
which(abs(allPrices$price - lookupPrice) < tol)
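Since the goal is to modify other columns at the matching row, it can help to wrap the tolerance test in a small function that insists on exactly one match. The sketch below is hypothetical: `which_near` is not a base R or dplyr function, and the `volume` column is made up for illustration. The default tolerance mirrors dplyr's near() (`.Machine$double.eps^0.5`); for a 0.0025 price grid any absolute tolerance well below half the step (0.00125) would also work.

```r
# Hypothetical helper (not part of base R): index of the single element of
# `x` within `tol` of `value`; stops if there are zero or several matches.
which_near <- function(x, value, tol = sqrt(.Machine$double.eps)) {
  idx <- which(abs(x - value) < tol)
  if (length(idx) != 1L) {
    stop("expected exactly one match, found ", length(idx))
  }
  idx
}

price <- seq(4, 7, by = 0.0025)
allPrices <- data.frame(price = price, volume = 0)  # `volume` is a made-up column

hit <- which_near(allPrices$price, 5.06)
allPrices$volume[hit] <- 100  # modify another column at the matching price
```

Failing loudly on zero or multiple matches avoids silently writing to the wrong row (or to no row) when the lookup value is not on the grid.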

You can sapply over all the prices, apply the all.equal function to each one, and keep the index of the one that is TRUE:
which(sapply(price, all.equal, lookupPrice) == TRUE)
# [1] 425
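Because all.equal() returns either TRUE or a character string describing the difference, the usual idiom is to wrap it in isTRUE(). A sketch of the same lookup written that way, with vapply() so the result is guaranteed to be a logical vector:

```r
price <- seq(4, 7, by = 0.0025)
lookupPrice <- 5.06

# isTRUE() turns all.equal()'s "Mean relative difference: ..." string into FALSE;
# all.equal() uses a relative tolerance of about 1.5e-8 by default.
matches <- vapply(price, function(p) isTRUE(all.equal(p, lookupPrice)), logical(1))
which(matches)
```

The `tolerance` argument of all.equal() can be passed through the anonymous function if the default is too loose or too strict for your data.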

You could also try rounding the prices in your data frame to 4 decimal places:
which(round(allPrices$price, digits=4) == lookupPrice)
[1] 425
After rounding to 4 places, the precision of the lookupPrice and your data frame of prices should match.
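Rounding both sides also makes a vectorised lookup straightforward. This sketch (the second lookup value, 6.5, is just an example) uses match() to find several prices in one call:

```r
price <- seq(4, 7, by = 0.0025)
allPrices <- data.frame(price = price)

# Round both the lookup values and the grid to the tick size's 4 decimals,
# then match() returns the first matching row for each lookup value (or NA).
lookupPrices <- c(5.06, 6.5)
match(round(lookupPrices, 4), round(allPrices$price, 4))
```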

There is a function near in dplyr:
near(x, y, tol = .Machine$double.eps^0.5)
For this case, you can try:
which(near(allPrices$price, lookupPrice))
#[1] 425

I just had the exact same problem.
I initially fixed it by converting both sets of data from numeric to characters with as.character() before calling which().
However, I wanted to figure out exactly why it wasn't working with the numeric data and did some further troubleshooting.
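It appears that the problem is with the way R generates decimal sequences with seq(): each element is computed as from + k * by, and because the step (here 0.2) has no exact binary representation, the result can differ in the last bits from the double that the literal 19.2 produces. Using the round() function works, as suggested by Tim Biegeleisen, but I think you only need to apply it to the numbers generated by seq(). You can check out my work below. The error is sporadic; I just tried numbers until I found one that failed: 19.2.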
> data <- 19.2
> x.seq <- seq(5, 45, 0.2)
> x.seq[72]
[1] 19.2
>
> data == 19.2
[1] TRUE
> x.seq[72] == 19.2
[1] FALSE
> data == x.seq[72]
[1] FALSE
> data == round(x.seq[72], digits = 1)
[1] TRUE
> round(data, digits = 1) == x.seq[72]
[1] FALSE
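Another option, sketched here as an alternative rather than the poster's method, is to build the grid from integers and divide once. Each element is then the correctly rounded double for k / 5, which should be the same double the parser produces for the matching decimal literal (assuming, as holds for short literals like these, that R parses them to the nearest double):

```r
# seq() with a fractional step accumulates representation error ...
x.seq <- seq(5, 45, 0.2)
# ... whereas integers divided once give the nearest double to each value
x.int <- (25:225) / 5   # 5.0, 5.2, ..., 45.0

x.seq[72] == 19.2
x.int[72] == 19.2

# The same idea for the price grid in the question (step 0.0025 = 1/400)
price <- (1600:2800) / 400
which(price == 5.06)
```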

Related

How do you test if a matrix exists in a matrix list? (Wordle Project)

I am an infrequent R user, so my apologies if any of my terminology is incorrect. I am working on a project around the game Wordle to see if a given Wordle submission in my family group chat is unique or if it has already been submitted before. The inspiration for this came from the Twitter account "Scorigami", which tracks every NFL game and tweets whether or not that score has occurred before in the history of the league.
To load the Wordle entries into R, I've decided to turn each submission into a matrix where 0 = incorrect letter, 1 = right letter/wrong position, and 2 = right letter/correct position. In R this looks like this:
wordle_brendan <- rbind(c(1,0,0,0,0),c(2,2,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
wordle_jack <- rbind(c(2,0,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
I then combine them into a list that will be used to check against any future Wordle submissions to see if they have been previously submitted.
list <- list(wordle_brendan, wordle_jack)
I think I am on the right track, but I don't know how to create a new wordle matrix to test whether that submission has been given before. Say I recreated "wordle_brendan" with the same values but under a different name... How would I then get R to check if that matrix exists in my preexisting list of matrices? I've tried using the %in% function 1,000 different ways but can't get it to work. Any help would be much appreciated! Thanks! (And if you can think of a better way to do this, please let me know!)
There are multiple ways to do this, but this is pretty simple. We need some samples to check:
new1 <- list[[2]] # The same as your second matrix
new2 <- new1
new2[3, 5] <- 0 # Change one position from 2 to 0.
To compare:
any(sapply(list, identical, y=new1))
# [1] TRUE
any(sapply(list, identical, y=new2))
# [1] FALSE
So new1 matches an existing matrix, but new2 does not. To see which matrix:
which(sapply(list, identical, y=new1))
# [1] 2
which(sapply(list, identical, y=new2))
# integer(0)
So new1 matches the second matrix in list, but new2 does not match any matrix.
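Since the question mentions %in%: %in% compares element by element, so a matrix on its left-hand side is treated as a vector of individual cells rather than as one grid. One workaround, sketched here with a hypothetical helper `wordle_key`, is to collapse each grid into a single string and use %in% (or match()) on those strings. The list is renamed `submissions` so it does not mask the name of the list() function.

```r
wordle_brendan <- rbind(c(1,0,0,0,0), c(2,2,0,0,0), c(2,2,0,0,0), c(2,2,2,2,2))
wordle_jack    <- rbind(c(2,0,0,0,0), c(2,2,0,0,0), c(2,2,2,2,2))
submissions <- list(wordle_brendan, wordle_jack)

# Hypothetical helper: each row becomes a string such as "20000" and rows are
# joined with "/", so grids with different values or row counts get different keys.
wordle_key <- function(m) paste(apply(m, 1, paste, collapse = ""), collapse = "/")

seen <- vapply(submissions, wordle_key, character(1))

new1 <- wordle_jack
new2 <- new1
new2[3, 5] <- 0

wordle_key(new1) %in% seen
wordle_key(new2) %in% seen
match(wordle_key(new1), seen)   # position of the earlier submission, or NA
```

Storing the keys as a character vector also makes it cheap to keep a running history and append each new day's key.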
Here is a way with a matequal function. The base function identical compares objects, not values, so if the matrices have the same values but different attributes, such as dimnames, identical returns FALSE.
This is often too strict. A function that compares values only will return TRUE in these cases.
I will use dcarlson's new1 to illustrate this point.
matequal <- function(x, y) {
ok <- is.matrix(x) && is.matrix(y) && all(dim(x) == dim(y))
ok && all(x == y)
}
wordle_brendan <- rbind(c(1,0,0,0,0),c(2,2,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
wordle_jack <- rbind(c(2,0,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
list <- list(wordle_brendan, wordle_jack)
new1 <- list[[2]] # The same as your second matrix
wordle_john <- wordle_jack
dimnames(wordle_john) <- list(1:3, letters[1:5])
list2 <- list(wordle_brendan, wordle_jack, wordle_john)
sapply(list2, identical, y=new1)
#> [1] FALSE TRUE FALSE
sapply(list2, matequal, y=new1)
#> [1] FALSE TRUE TRUE
Created on 2022-09-27 with reprex v2.0.2
Edit
identical is not a function to compare two objects' values, it's a function to compare the objects themselves. In the following example identical returns FALSE though x and y have equal values, in the usual sense of equal.
matequal <- function(x, y) {
ok <- is.matrix(x) && is.matrix(y) && all(dim(x) == dim(y))
ok && all(x == y)
}
x <- matrix(1:5, ncol = 1)
y <- matrix(1 + 0:4, ncol = 1)
all(x == y)
#> [1] TRUE
identical(x, y)
#> [1] FALSE
matequal(x, y)
#> [1] TRUE
Created on 2022-09-28 with reprex v2.0.2
This is because x and y have different storage types: 1:5 creates an integer vector, while 1 + 0:4 is a double vector (adding the double 1 to an integer vector promotes the result to double). Both matrices have the same class attribute, c("matrix", "array"); only the storage type of their elements differs.
In a comment, it was asked:
Thank you and dcarlson for the response! Regarding your two sapply lines, can you explain what the use would be behind using matequal as opposed to identical? Is the only difference that matequal takes into account the column and row names?
So the answer to the question in comment is no, the attributes, in this case dimnames, are not the only reason why identical is some or many times not ideal to compare R objects.
typeof(x)
#> [1] "integer"
typeof(y)
#> [1] "double"
class(x[1])
#> [1] "integer"
class(y[2])
#> [1] "numeric"
class(x)
#> [1] "matrix" "array"
class(y)
#> [1] "matrix" "array"
Created on 2022-09-28 with reprex v2.0.2
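A variant of the same idea can be assembled from base R pieces: check the dimensions with identical(), then compare values with all.equal(..., check.attributes = FALSE), which ignores dimnames and the integer/double distinction and treats NA in matching cells as equal. The name `matequal2` is hypothetical, and note that all.equal() allows a small numeric tolerance (about 1.5e-8 by default; pass `tolerance = 0` for exact comparison).

```r
# Hypothetical variant of matequal: same shape, then value comparison that
# ignores attributes; isTRUE() maps all.equal()'s difference report to FALSE.
matequal2 <- function(x, y) {
  is.matrix(x) && is.matrix(y) && identical(dim(x), dim(y)) &&
    isTRUE(all.equal(x, y, check.attributes = FALSE))
}

x <- matrix(1:5, ncol = 1)
y <- matrix(1 + 0:4, ncol = 1)
z <- y
dimnames(z) <- list(letters[1:5], "v")
m <- matrix(c(1, NA, 2, 2), nrow = 2)

matequal2(x, y)                              # integer vs double storage
matequal2(x, z)                              # dimnames are ignored
matequal2(m, m)                              # NA in the same cells
matequal2(matrix(1:6, 2), matrix(1:6, 3))    # different shapes
```

Unlike all(x == y), this never returns NA, so it is safe inside which() or if().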

R - Compare vector of objects using Reduce

I am trying to compare a vector of objects using Reduce.
all.equal does not work.
== works for numeric values but will not be sufficient for objects.
I would prefer a solution that does not use existing packages, only base R functions.
Example (Simplified to use numeric vectors instead of objects):
test <- c(1,1,1,1,1)
Reduce("==",test)
[1] TRUE
I do not understand why == works while all.equal does not
Reduce(all.equal,test)
[1] "Modes: character, numeric"
[2] "Lengths: 3, 1"
[3] "target is character, current is numeric"
Final remark:
This is not a duplicate. I am interested in a solution that compares objects, not numeric values.
For comparing the elements of a vector of numeric values, see the existing Stack Overflow question:
Test for equality among all elements of a single numeric vector
You can try identical in sapply and compare each with the first element.
x <- list(list(1), list(1))
all(sapply(x[-1], identical, x[[1]]))
#[1] TRUE
x <- list(list(1), list(2))
all(sapply(x[-1], identical, x[[1]]))
#[1] FALSE
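On why == seemed to work with Reduce while all.equal did not: Reduce() passes the result of each call into the next one, so after the first step the accumulator is a logical (or, for all.equal(), TRUE or a character vector of messages) rather than one of the original objects. That is where messages like "Modes: character, numeric" come from. The sketch below shows the pitfall and a Reduce-based version (hypothetical name `all_identical`) that keeps the first element as a fixed reference:

```r
# Reduce("==", v) computes ((v1 == v2) == v3) == ..., comparing TRUE/FALSE
# with the next element; it only looks right for 1s because TRUE == 1.
Reduce("==", c(1, 1, 1))   # TRUE, by coincidence
Reduce("==", c(2, 2, 2))   # FALSE: (2 == 2) is TRUE, then TRUE == 2 is FALSE

# Carry a single logical along and compare every element with the first one.
all_identical <- function(x) {
  Reduce(function(acc, el) acc && identical(el, x[[1]]), x[-1], TRUE)
}

all_identical(list(list(1), list(1), list(1)))
all_identical(list(list(1), list(2)))
all_identical(list(data.frame(a = 1:3), data.frame(a = 1:3)))
```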
Here is a function returning TRUE if all elements inside a list are identical. Because identical() is transitive, comparing every element with the first one covers all pairs:
all_pairs_equal <- function(elements) {
  all(vapply(seq_along(elements), function(i) identical(elements[[1]], elements[[i]]), logical(1)))
}
all_pairs_equal(list(iris, iris, iris))
#> [1] TRUE
all_pairs_equal(list(1, 1, 1))
#> [1] TRUE
all_pairs_equal(list(iris, iris, 2))
#> [1] FALSE
Created on 2021-10-05 by the reprex package (v2.0.1)

R the number of significant digits leads to unexpected results of inequality using eval and parse text

I am working on boolean rules related to terminal node assignment for CART-like trees related to my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html)
I have noticed problematic behavior in evaluating inequalities of character strings using eval and parse of text. The issue has to do with how R evaluates the internal representation of a number.
Here's an example involving the number pi. I want to check if a vector (which I call x) is less than or equal to pi.
> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"
This rule checks whether x is less than or equal to pi, where pi is represented to 15 significant digits (14 decimal places). Now I will assign x to the values 1, 2, 3 and pi:
> x = c(1,2,3,pi)
Here's what x is up to 15 digits
> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979
Now let's evaluate this:
> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE
Whooaaaaa, it looks like pi is not less than or equal to pi. Right?
But now if I hard-code x to pi to 14 digits, it works:
> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
In the first case, x[4] holds the full double-precision value of pi, which is slightly greater than the 15-digit constant written into the rule string, so the comparison returns FALSE. In the second case both sides are that same 15-digit constant, so the result is TRUE.
However, how can I avoid this? I really need the first evaluation to return TRUE because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.
One solution I use is to add a small tolerance value.
> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
However, this seems like an ugly solution.
Any comments and suggestions are greatly appreciated!
You could just go via the pi name or via a function instead, to prevent pi from getting stringified (which is your first problem here)
rule <- "x <= pi"
x <- c(1,2,3,pi)
eval(parse(text = rule)) ## All TRUE
## another way might be to throw stuff you need uneval'ed into a function or a block:
my_pi <- function() {
pi
}
rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE
You still will suffer from the usual floating point issues, but imprecise stringification won't be your problem anymore.
Here's why your approach didn't work:
> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074
The stringified pi is less than R's pi by a good margin.
The paste documentation says it uses as.character to convert numbers to strings, and as.character in turn uses 15 significant digits, which is what you are observing.
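If the number has to be stringified (for example because rules are generated from fitted split points), a sketch of another option is to format it with 17 significant digits, which is enough to round-trip an IEEE double. This assumes R's string-to-number conversion returns the nearest double for such input, which holds in practice for values like this one:

```r
x <- c(1, 2, 3, pi)

# paste()/as.character() keep only 15 significant digits ...
rule15 <- paste0("x <= ", pi)
# ... while 17 significant digits preserve the exact double
rule17 <- paste0("x <= ", sprintf("%.17g", pi))

eval(parse(text = rule15))
eval(parse(text = rule17))

# Check that the string converts back to exactly the same double
identical(as.numeric(sprintf("%.17g", pi)), pi)
```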

Fixed and decimal places of a positive number

I want to write a function for counting the number of fixed digits and the number of decimal digits of each positive number. For whole numbers: for example, for number 459 I would like to see
fixed=3 and decimal=0
as the output of the function.
For decimal numbers: for example, for number 12.657 I would like to see
fixed=2 and decimal=3
(because 12 has two digits and 657 has three digits). For numbers less than 1 I would like to see
0
as the fixed, for example for number 0.4056 I would like to see
fixed=0 and decimal=4
or for number 0.13 I would like to see
fixed=0 and decimal=2
I have a general idea as follows:
digits<-function(x){
length(strsplit(as.character(x),split="")[[1]])
}
I want to extend my code to work as explained above.
It seems you are hoping this can be done. Okay, here is a crude way:
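fun <- function(x){
  stopifnot(!is.na(suppressWarnings(as.numeric(x))))  # must be a number or numeric string
  # drop a leading minus sign, then split at the decimal point
  parts <- strsplit(sub("^-", "", as.character(x)), ".", fixed = TRUE)[[1]]
  fixed <- if (as.numeric(parts[1]) == 0) 0L else nchar(parts[1])  # 0 for -1 < x < 1
  decimal <- if (length(parts) > 1) nchar(parts[2]) else 0L        # 0 for whole numbers
  list(fixed = fixed, decimal = decimal)
}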
CORRECT:
fun(10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(-10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(0.2346)
$fixed
[1] 0
$decimal
[1] 4
fun(-0.2346)
$fixed
[1] 0
$decimal
[1] 4
INCORRECT: for numeric input, fixed + decimal <= 15, because as.character keeps only 15 significant digits!
fun(-10000567.2346678901876)
$fixed
[1] 8
$decimal
[1] 7 ## This is incorrect
The correct value is:
fun("-10000567.2346678901876") # Note that the input x is a string
$fixed
[1] 8
$decimal
[1] 13
I do not think this can be done. We cannot assume that a simple numeric value is accurately represented in the computer. Most floating-point values certainly can't.
Enter 0.3 at the R console:
> 0.3
[1] 0.3
Looks alright, doesn't it? But now, let us do this:
> print(0.3, digits=22)
[1] 0.29999999999999999
In essence, when you convert a floating-point number to a string, you decide how precise the string is. The computer cannot give you arbitrary precision because it stores numbers in binary with a fixed number of bits, so most decimal fractions are only approximated. Even if you see the number as 0.3 and think it has 0 fixed digits and 1 decimal digit, that is because R chose to print it that way, not because that is the number represented in the computer's memory.
The other answers show that a function can handle simple cases. I must admit that I learned that R does an incredible job interpreting the numbers. But we have to be cautious: as soon as we start transforming numbers, such a function cannot guarantee meaningful results.
EDITED (based on comments):
The function below will work for numbers with 15 or fewer significant figures:
digits<-function(x){
## make sure x is a number, if not stop
tryCatch(
{
x/1
},
error=function(cond) {
stop('x must be a number')
}
)
## use nchar() and strsplit() to determine number of digits before decimal point
fixed<-nchar(strsplit(as.character(x),"\\.")[[1]][1])
## check if negative
if(substr(strsplit(as.character(x),"\\.")[[1]][1],1,1)=="-"){fixed<-fixed-1}
## check if -1<x<1
if(as.numeric(strsplit(as.character(x),"\\.")[[1]][1])==0){fixed<-fixed-1}
## use nchar() and strsplit() to determine number of digits after decimal point
decimal<-nchar(strsplit(as.character(x),"\\.")[[1]][2])
## for integers, replace NA decimal result with 0
if(is.na(nchar(strsplit(as.character(x),"\\.")[[1]][2]))){decimal<-0}
## return results
print(paste0("fixed: ",fixed," and decimal: ", decimal))
}
If you want to count negative signs (-) as a digit you'll need to remove:
## check if negative
if(substr(strsplit(as.character(x),"\\.")[[1]][1],1,1)=="-"){fixed<-fixed-1}
Note: The restriction to 15 or fewer significant figures comes from the 8-byte (double-precision) floating-point representation.
The only way I've been able to overcome this is by using the below code.
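digits<-function(x){
## make sure x can be read as a number, if not stop
## (as.numeric() returns NA with a warning rather than an error for bad input)
if(is.na(suppressWarnings(as.numeric(x)))){
stop('x must be a number')
}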
j <- 0
num <- x
fixed <- 0
decimal <- 0
for(i in 1:nchar(x)){
if(substr(num, i, i) == "."){
j <- 1
} else if(j==0){
fixed <- fixed + 1
} else{
decimal <- decimal + 1
}
}
if(substr(x,1,1)=="-" & substr(as.numeric(x),2,2)==0){
fixed<-fixed-2
}else if(substr(x,1,1)=="-"){
fixed<-fixed-1
}else if(substr(as.numeric(x),1,1)==0){
fixed<-fixed-1
}else{}
print(paste0("fixed: ",fixed," and decimal: ", decimal))
}
However, this code requires the number to be passed into the function as a character string in order to count past 15 significant figures, as shown below:
x<-"111111111111111111111111111111111111.22222222222222222222222222222222222222222222222"
digits(x)
Which returns:
[1] "fixed: 36 and decimal: 47"
This then limits this function's applications as it cannot be used in a dplyr pipe, and you will not receive accurate results for numbers that have been previously rounded. For example, using the same number as above, but storing it as a number instead of character string yields the below results:
x<-111111111111111111111111111111111111.22222222222222222222222222222222222222222222222
digits(x)
[1] "fixed: 1 and decimal: 18"
I hope this at least somewhat helps!
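The functions above work on one value at a time. As a sketch of a vectorised alternative (hypothetical name `count_digits`), the same counting can be done with regular expressions on character input, which also works inside a dplyr::mutate() call. It assumes plain decimal notation: strings in scientific notation such as "1e-05" are rejected, and numeric input is still limited to the 15 significant digits that as.character keeps.

```r
# Hypothetical helper: digits before ("fixed") and after ("decimal") the
# decimal point, for numbers given as character strings in plain notation.
count_digits <- function(x) {
  x <- sub("^[+-]", "", trimws(as.character(x)))        # drop the sign
  stopifnot(all(grepl("^[0-9]*\\.?[0-9]*$", x)))        # plain notation only
  int_part  <- sub("^0+", "", sub("\\..*$", "", x))     # leading "0" counts as 0 digits
  frac_part <- ifelse(grepl(".", x, fixed = TRUE), sub("^[^.]*\\.", "", x), "")
  data.frame(fixed = nchar(int_part), decimal = nchar(frac_part))
}

count_digits(c("459", "12.657", "0.4056", "0.13", "-10000567.2346678901876"))
```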

A problem on "identical()" function in R? How does "identical()" work for different types of objects?

(reproducible example added)
I cannot understand why the following is FALSE (I am aware they are double and integer, respectively):
identical(1, as.integer(1)) # FALSE
?identical reveals:
num.eq:
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison. The latter (non-default) differentiates between -0 and +0.
sprintf("%.8190f", as.integer(1)) and sprintf("%.8190f", 1) return exactly the same output, which I took to mean an equal bit pattern. So I think that at least one of the following must return TRUE, but I get FALSE for both:
identical(1, as.integer(1), num.eq=TRUE) # FALSE
identical(1, as.integer(1), num.eq=FALSE) # FALSE
My current interpretation: if sprintf shows notation rather than storage, then identical() must compare based on storage, i.e.
identical(bitpattern1, bitpattern1bitpattern2) returns FALSE. I could not find any other logical explanation for the FALSE/FALSE situation above.
I do know that in both 32-bit and 64-bit builds of R, integers are stored as 32 bits.
They are not identical precisely because they have different types. If you look at the documentation for identical you'll find the example identical(1, as.integer(1)) with the comment ## FALSE, stored as different types. That's one clue. The R language definition reminds us that:
Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1; there are no more basic types (emphasis mine).
So, basically everything is a vector with a type (that's also why [1] shows up every time R returns something). You can check this by explicitly creating a vector with length 1 by using vector, and then comparing it to 0:
x <- vector("double", 1)
identical(x, 0)
# [1] TRUE
That is to say, both vector("double", 1) and 0 output vectors of type "double" and length == 1.
typeof and storage.mode point to the same thing, so you're kind of right when you say "this means identical() compares based on storage". I don't think this necessarily means that "bit patterns" are being compared, although I suppose it's possible. See what happens when you change the storage mode using storage.mode:
## Assign integer to x. This is really a vector length == 1.
x <- 1L
typeof(x)
# [1] "integer"
identical(x, 1L)
# [1] TRUE
## Now change the storage mode and compare again.
storage.mode(x) <- "double"
typeof(x)
# [1] "double"
identical(x, 1L) # This is no longer TRUE.
# [1] FALSE
identical(x, 1.0) # But this is.
# [1] TRUE
One last note: The documentation for identical states that num.eq is a…
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison.
So, changing num.eq doesn't affect any comparison involving integers. Try the following:
# Comparing integers with integers.
identical(+0L, -0L, num.eq = TRUE) # TRUE
identical(+0L, -0L, num.eq = FALSE) # TRUE
# Comparing integers with doubles.
identical(+0, -0L, num.eq = TRUE) # FALSE
identical(+0, -0L, num.eq = FALSE) # FALSE
# Comparing doubles with doubles.
identical(+0.0, -0.0, num.eq = TRUE) # TRUE
identical(+0.0, -0.0, num.eq = FALSE) # FALSE
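If the goal is to compare the numbers rather than the objects, the integer/double distinction has to be removed first or ignored by the comparison. A short sketch of the usual base R options:

```r
# identical() also compares the storage type, so a double and an integer differ
identical(1, 1L)               # FALSE

# Value-based alternatives that ignore the integer/double distinction
1 == 1L                        # TRUE: `==` coerces both sides to double
isTRUE(all.equal(1, 1L))       # TRUE: numeric comparison with a small tolerance
identical(1, as.double(1L))    # TRUE: make the types match first
```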
