I have a string, and when I compare it with a number the code does not fail; instead it reports that the string is positive. Any hints why this happens?
x <- "The day is bad, I don't like anything! I feel bad and sad really sad"
if (x == 0) {
print("x is equal to 0")
}else if (x > 0) {
print("x is positive")
}else if (x < 0 ){
print("x is negative")
}
The result is:
"x is positive"
?'>'
...If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw...
So when you compare x, which is a character vector, to 0, which is numeric, the latter is converted to the character string "0":
x == 0 evaluates to FALSE because "The day is bad..." != "0";
x < 0 evaluates to FALSE because, when sorted, "0" is placed before "The day is bad...":
...Comparison of strings in character vectors is lexicographic within
the strings using the collating sequence of the locale in use...
sort(c(x, 0))
#[1] "0"
#[2] "The day is bad, I don't like anything! I feel bad and sad really sad"
Meaning that x is considered greater than '0' because of the lexicographic order.
Finally, x > 0 evaluates to TRUE because '0' precedes 'The day is bad, I don't...', so your code prints [1] "x is positive".
And if, to test our hypothesis, we ask whether Chuck Norris is able to beat Infinity, we find that he is not (Inf is coerced to the string "Inf", and "C" sorts before "I"):
'Chuck Norris' > Inf
# [1] FALSE
In contrast, Keith Richards, as anybody would expect, has no problem with that ("K" sorts after "I"):
'Keith Richards' > Inf
# [1] TRUE
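To see the coercion directly, here is a minimal sketch. The variable names are mine, not from the question. It only relies on the rule quoted from ?'>' above: the number is turned into a string before the comparison happens.

```r
x <- "The day is bad"

# R converts the number to a string before comparing, so these two
# comparisons are the same operation.
same_result <- identical(x > 0, x > as.character(0))

# Converting the string explicitly gives NA (with a warning, silenced here),
# which makes the problem visible instead of hiding it.
x_num <- suppressWarnings(as.numeric(x))

print(same_result)   # TRUE
print(is.na(x_num))  # TRUE
```

Checking is.na() after an explicit as.numeric() is one way to catch non-numeric input before any comparison runs.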
I know of the switch statement in R, but I'm curious whether there's a way to assign the same action/value to multiple patterns in the same arm, similar to how it's done in Rust:
let x = 1;
match x {
1 | 2 => println!("one or two"),
3 => println!("three"),
_ => println!("anything"),
}
I don't need to write two separate cases for 1 & 2, I can just combine them into one with '|'. It would be also helpful if I could define the default case ("_") if no pattern before was matched.
Preceding values with no assignment carry forward until an assigned value is found.
switch(
as.character(x),
"1"=,
"2"="one or two",
"3"="three",
"anything"
)
I use as.character(x) instead of just x because EXPR (the first argument) may be interpreted as positional instead of equality. From ?switch:
If the value of 'EXPR' is not a character string it is coerced to
integer. Note that this also happens for 'factor's, with a
warning, as typically the character level is meant. If the
integer is between 1 and 'nargs()-1' then the corresponding
element of '...' is evaluated and the result returned: thus if the
first argument is '3' then the fourth argument is evaluated and
returned.
So if x is an integer between 1 and the number of other arguments, then it is interpreted as a positional indicator, as in
switch(3, 'a','z','y','f')
# [1] "y"
which means that the named arguments are effectively ignored, as in this very confusing example
switch(3, '1'='a','3'='z','2'='y','4'='f')
# [1] "y"
Note that the help does not mention non-strings that are greater than nargs()-1; those integers return NULL (invisibly, hence the extra parentheses here):
(switch(9, '1'='a','3'='z','2'='y','4'='f'))
# NULL
Since it is the value of the integer you're looking to match, you need, confusingly, to convert it to a string:
switch(as.character(3), '1'='a','3'='z','2'='y','4'='f')
# [1] "z"
Alternatively,
dplyr::case_when(
x %in% 1:2 ~ "one or two",
x == 3 ~ "three",
TRUE ~ "anything"
)
# [1] "one or two"
or
data.table::fcase(
x %in% 1:2 , "one or two",
x == 3 , "three",
rep(TRUE, length(x)), "anything"
)
(The need for rep(TRUE,length(x)) is because fcase requires all arguments to be exactly the same length, i.e., it allows no recycling as many R functions allow. I personally would prefer that they allow 1 or N recycling instead of only N, but that isn't the way at the moment.)
Both alternatives have the advantage of being naturally vectorized.
switch is only length-1 friendly. A workaround for a vector x could be
sapply(as.character(x), switch, '1'='a', '3'='z', '2'='y', '4'='f')
(or, better yet, vapply enforcing the return class).
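As a sketch of the vapply variant (label_one is a hypothetical helper name, and the labels are the ones from the switch at the top of this answer), wrapping the conversion to character inside the helper keeps the positional-matching trap out of the call site:

```r
# Hypothetical helper: classify a single value the way the switch above does.
label_one <- function(v) {
  switch(
    as.character(v),   # match by value, not by position
    "1" = ,            # falls through to the next assigned value
    "2" = "one or two",
    "3" = "three",
    "anything"         # unnamed last argument is the default
  )
}

x <- c(1, 2, 3, 7)
# character(1) makes vapply fail loudly if any result is not a single string.
labels <- vapply(x, label_one, character(1))
print(labels)
# [1] "one or two" "one or two" "three"      "anything"
```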
I would like my code below to return TRUE, even though the first two characters of the strings differ.
Is there a way to accomplish this? I know == does not work, as it compares the two strings exactly.
if ("UKVICTORIA" == "USVICTORIA") {
print("TRUE")
} else {
print("FALSE")
}
Use agrepl
> agrepl("UKVICTORIA", "USVICTORIA", max.distance = 1)
[1] TRUE
Note, if there is an extra character (Z), it returns FALSE
> agrepl("UZKVICTORIA", "USVICTORIA", max.distance = 1)
[1] FALSE
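One caveat worth a sketch: agrepl looks for an approximate match of the pattern anywhere inside the target, so it is not a whole-string comparison. If whole-string tolerance is what you want, adist() from utils gives the edit distance between the complete strings. The helper name within_tolerance below is my own, not part of any package.

```r
a <- "UKVICTORIA"
b <- "USVICTORIA"

# adist() returns a matrix of edit distances between whole strings;
# drop() reduces the 1 x 1 matrix to a single number.
d_same_len <- drop(adist(a, b))              # one substitution
d_extra    <- drop(adist("UZKVICTORIA", b))  # one deletion plus one substitution

# Hypothetical helper: TRUE when the whole strings differ by at most `tol` edits.
within_tolerance <- function(s1, s2, tol = 1) drop(adist(s1, s2)) <= tol

print(within_tolerance(a, b))           # TRUE
print(within_tolerance("VICTORIA", b))  # FALSE, although agrepl() would match
```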
Remove first two characters and check the number of unique values.
length(unique(sub(".{2}", "", c("UKVICTORIA", "USVICTORIA")))) == 1
#[1] TRUE
I want to write a function for counting the number of fixed digits and the number of decimal digits of each positive number. For whole numbers: for example, for number 459 I would like to see
fixed=3 and decimal=0
as the output of the function.
For decimal numbers: for example, for number 12.657 I would like to see
fixed=2 and decimal=3
(because 12 has two digits and 657 has three digits). For numbers less than 1 I would like to see
0
as the fixed, for example for number 0.4056 I would like to see
fixed=0 and decimal=4
or for number 0.13 I would like to see
fixed=0 and decimal=2
I have a general idea as follows:
digits<-function(x){
length(strsplit(as.character(x),split="")[[1]])
}
I want to extend my code so that it works as explained above.
It seems you are somewhat stuck, hoping that this can be done. Okay, here is a crude way:
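fun <- function(x){
  # as.numeric() always returns a numeric, so test for NA to reject non-numbers
  stopifnot(!is.na(suppressWarnings(as.numeric(x))))
  # drop the sign first so that it is not counted as a digit
  s <- nchar(unlist(strsplit(sub("^-", "", as.character(x)), ".", fixed = TRUE)))
  if (length(s) == 1) s <- c(s, 0L)  # whole number: no decimal part
  if (abs(as.numeric(x)) < 1) s[1] <- s[1] - 1  # do not count the leading 0
  setNames(as.list(s), c("fixed", "decimal"))
}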
CORRECT:
fun(10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(-10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(0.2346)
$fixed
[1] 0
$decimal
[1] 4
fun(-0.2346)
$fixed
[1] 0
$decimal
[1] 4
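INCORRECT: Note that this only works while fixed + decimal <= 15, because a numeric is converted to a string with at most 15 significant digits!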
fun(-10000567.2346678901876)
$fixed
[1] 8
$decimal
[1] 7 ## This is incorrect
The correct value is:
fun("-10000567.2346678901876") # Note that the input x is a string
$fixed
[1] 8
$decimal
[1] 13
I do not think this can be done. We cannot assume that a simple numeric value is accurately represented in the computer. Most floating-point values certainly can't.
Enter 0.3 at the R console:
> 0.3
[1] 0.3
Looks alright, doesn't it? But now, let us do this:
> print(0.3, digits=22)
[1] 0.29999999999999999
In essence, if you convert a floating-point number to a string you define how precise you want it. The computer cannot give you that precision because it stores all numbers in bits and, thus, never gives you absolute precision. Even if you see the number as 0.3 and you think it has 0 fixed digits and 1 decimal because of that, this is because R chose to print it that way, not because that is the number represented in the computer's memory.
The other answers prove that a function can handle simple cases. I must admit that I learned that R does an incredible job interpreting the numbers. But we have to use caution! As soon as we start transforming numbers, such a function cannot guarantee meaningful results.
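A small sketch of that caution, using names I made up for illustration: the result of an ordinary sum prints like 0.3, and counting its digits from the default string would report one decimal, yet the stored value is not the same double as 0.3.

```r
a <- 0.1 + 0.2

equal_exact  <- a == 0.3                   # FALSE: the stored doubles differ
equal_approx <- isTRUE(all.equal(a, 0.3))  # TRUE: equal within a tolerance

short_form <- as.character(a)      # default formatting, up to 15 significant digits
long_form  <- sprintf("%.17g", a)  # 17 significant digits expose the difference

print(c(short_form, long_form))
print(c(equal_exact, equal_approx))
```

So any digit count taken from as.character() describes the formatted string, not the number held in memory.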
EDITED (based on comments):
The function below will work for numbers with 15 or fewer significant figures:
digits<-function(x){
## make sure x is a number, if not stop
tryCatch(
{
x/1
},
error=function(cond) {
stop('x must be a number')
}
)
## use nchar() and strsplit() to determine number of digits before decimal point
fixed<-nchar(strsplit(as.character(x),"\\.")[[1]][1])
## check if negative
if(substr(strsplit(as.character(x),"\\.")[[1]][1],1,1)=="-"){fixed<-fixed-1}
## check if -1<x<1
if(as.numeric(strsplit(as.character(x),"\\.")[[1]][1])==0){fixed<-fixed-1}
## use nchar() and strsplit() to determine number of digits after decimal point
decimal<-nchar(strsplit(as.character(x),"\\.")[[1]][2])
## for integers, replace NA decimal result with 0
if(is.na(nchar(strsplit(as.character(x),"\\.")[[1]][2]))){decimal<-0}
## return results
print(paste0("fixed: ",fixed," and decimal: ", decimal))
}
If you want to count negative signs (-) as a digit you'll need to remove:
## check if negative
if(substr(strsplit(as.character(x),"\\.")[[1]][1],1,1)=="-"){fixed<-fixed-1}
Note: The restriction to 15 or fewer significant figures comes from the 8-byte (double precision) floating-point representation.
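Where the 15 comes from can be checked from R itself. This is a rough sketch that assumes the usual IEEE 754 doubles, which is what essentially every platform running R uses:

```r
# Number of mantissa bits in a double (53 on IEEE 754 platforms).
bits <- .Machine$double.digits

# Decimal digits that are guaranteed to survive a round trip.
safe_digits <- floor(bits * log10(2))

# Above 2^53, consecutive integers can no longer be told apart.
collides <- (2^53 + 1) == 2^53

print(c(bits = bits, safe_digits = safe_digits))
print(collides)
```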
The only way I've been able to overcome this is by using the below code.
digits<-function(x){
tryCatch(
{
as.numeric(x)/1
},
error=function(cond) {
stop('x must be a number')
}
)
j <- 0
num <- x
fixed <- 0
decimal <- 0
for(i in 1:nchar(x)){
if(substr(num, i, i) == "."){
j <- 1
} else if(j==0){
fixed <- fixed + 1
} else{
decimal <- decimal + 1
}
}
if(substr(x,1,1)=="-" & substr(as.numeric(x),2,2)==0){
fixed<-fixed-2
}else if(substr(x,1,1)=="-"){
fixed<-fixed-1
}else if(substr(as.numeric(x),1,1)==0){
fixed<-fixed-1
}else{}
print(paste0("fixed: ",fixed," and decimal: ", decimal))
}
However, this code requires the number to be passed into the function as a character string in order to count past 15 significant figures, as shown below:
x<-"111111111111111111111111111111111111.22222222222222222222222222222222222222222222222"
digits(x)
Which returns:
[1] "fixed: 36 and decimal: 47"
This then limits this function's applications as it cannot be used in a dplyr pipe, and you will not receive accurate results for numbers that have been previously rounded. For example, using the same number as above, but storing it as a number instead of character string yields the below results:
x<-111111111111111111111111111111111111.22222222222222222222222222222222222222222222222
digits(x)
[1] "fixed: 1 and decimal: 18"
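If the values can be kept as character strings from the start, a vectorized variant avoids the loop and can be applied to a whole column at once. This is only a sketch of my own, not the function above: count_digits is a hypothetical name, and it assumes plain decimal notation (no exponents, no thousands separators).

```r
# Hypothetical vectorized counter for numbers stored as strings.
count_digits <- function(s) {
  s <- sub("^-", "", s)                   # the sign is not a digit
  int_part <- sub("\\..*$", "", s)        # everything before the decimal point
  has_point <- grepl(".", s, fixed = TRUE)
  dec_part <- ifelse(has_point, sub("^[^.]*\\.", "", s), "")
  int_part <- sub("^0+", "", int_part)    # a lone leading 0 counts as no digits
  data.frame(fixed = nchar(int_part), decimal = nchar(dec_part))
}

res <- count_digits(c("459", "12.657", "0.4056", "-0.13"))
print(res)
```

Because it never converts to numeric, the 15-digit limit does not apply, but the input has to arrive as text (for example, a column read as character).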
I hope this at least somewhat helps!
(reproducible example added)
I cannot quite grasp why the following is FALSE (I am aware that they are double and integer, respectively):
identical(1, as.integer(1)) # FALSE
?identical reveals:
num.eq:
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison. The latter (non-default)
differentiates between -0 and +0.
sprintf("%.8190f", as.integer(1)) and sprintf("%.8190f", 1) return exactly the same output, which I took to mean the same bit pattern. So, I think that at least one of the following must return TRUE. But I get FALSE in each of the following:
identical(1, as.integer(1), num.eq=TRUE) # FALSE
identical(1, as.integer(1), num.eq=FALSE) # FALSE
My current thinking is this: if sprintf shows the notation rather than the storage, then identical() must compare based on storage, i.e.
identical(bitpattern1, bitpattern1bitpattern2) returns FALSE. I could not find any other logical explanation to above FALSE/FALSE situation.
I do know that in both 32bit/64bit architecture of R, integers are stored as 32bit.
They are not identical precisely because they have different types. If you look at the documentation for identical you'll find the example identical(1, as.integer(1)) with the comment ## FALSE, stored as different types. That's one clue. The R language definition reminds us that:
Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1; there are no more basic types (emphasis mine).
So, basically everything is a vector with a type (that's also why [1] shows up every time R returns something). You can check this by explicitly creating a vector with length 1 by using vector, and then comparing it to 0:
x <- vector("double", 1)
identical(x, 0)
# [1] TRUE
That is to say, both vector("double", 1) and 0 output vectors of type "double" and length == 1.
typeof and storage.mode point to the same thing, so you're kind of right when you say "this means identical() compares based on storage". I don't think this necessarily means that "bit patterns" are being compared, although I suppose it's possible. See what happens when you change the storage mode using storage.mode:
## Assign integer to x. This is really a vector length == 1.
x <- 1L
typeof(x)
# [1] "integer"
identical(x, 1L)
# [1] TRUE
## Now change the storage mode and compare again.
storage.mode(x) <- "double"
typeof(x)
# [1] "double"
identical(x, 1L) # This is no longer TRUE.
# [1] FALSE
identical(x, 1.0) # But this is.
# [1] TRUE
One last note: The documentation for identical states that num.eq is a…
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison.
So, changing num.eq doesn't affect any comparison involving integers. Try the following:
# Comparing integers with integers.
identical(+0L, -0L, num.eq = TRUE) # TRUE
identical(+0L, -0L, num.eq = FALSE) # TRUE
# Comparing integers with doubles.
identical(+0, -0L, num.eq = TRUE) # FALSE
identical(+0, -0L, num.eq = FALSE) # FALSE
# Comparing doubles with doubles.
identical(+0.0, -0.0, num.eq = TRUE) # TRUE
identical(+0.0, -0.0, num.eq = FALSE) # FALSE
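If what you actually want is "same value, regardless of type", a short sketch of the usual alternatives follows (the variable names are mine). == promotes the integer to double before comparing, and all.equal() compares numerically with a tolerance, while identical() only succeeds once the types match.

```r
x_dbl <- 1
x_int <- as.integer(1)

same_value  <- x_dbl == x_int                       # TRUE: integer is promoted to double
same_object <- identical(x_dbl, x_int)              # FALSE: "double" vs "integer"
same_approx <- isTRUE(all.equal(x_dbl, x_int))      # TRUE: numeric comparison with tolerance
same_after  <- identical(x_dbl, as.numeric(x_int))  # TRUE once the types match

print(c(typeof(x_dbl), typeof(x_int)))
print(c(same_value, same_object, same_approx, same_after))
```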
check.num <- function(x){
if(x>0){
print("Greater than or equal to 1")
} else if(x==0){
print("Equals zero")
} else if(x<0){
print("Less than zero")
} else{
print("Confused")
}
}
check.num(1)
#Output: [1] "Greater than or equal to 1"
check.num(0)
#Output: [1] "Equals zero"
check.num(-1)
#Output: [1] "Less than zero"
Why do the commands below return these values? I was expecting '0' in the first case and "Confused" in the second.
check.num("")
#Output: [1] "Less than zero"
check.num("kj")
#Output: [1] "Greater than or equal to 1"
From ?Comparison (or ?">"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
The number 0 is cast to character to match the other argument ("" or "kj"), so we end up with these comparisons:
"">"0" ## FALSE
"kj">"0" ## TRUE
that is, it's a lexicographic comparison.
Arguably it would be less surprising if a character-vs-numeric comparison gave NA, but that's not how the language is defined.
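If you would rather have the function refuse strings than compare them lexicographically, one option is to validate the input first. The following is a sketch under my own naming (check_num_strict is hypothetical), not a change to how R's comparison operators behave:

```r
# Hypothetical variant of check.num that refuses non-numeric input.
check_num_strict <- function(x) {
  # Reject strings, vectors of length other than 1, and NA before comparing.
  if (!is.numeric(x) || length(x) != 1L || is.na(x)) {
    return("Not a single number")
  }
  if (x > 0) {
    "Greater than zero"
  } else if (x == 0) {
    "Equals zero"
  } else {
    "Less than zero"
  }
}

print(check_num_strict(""))    # "Not a single number"
print(check_num_strict("kj"))  # "Not a single number"
print(check_num_strict(-1))    # "Less than zero"
```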