Creating a new column - based on two logical columns [duplicate] - r

Possible duplicate: R: subset() logical-and operator for chaining conditions should be & not &&
What is the difference between short (&,|) and long (&&, ||) forms of AND, OR logical operators in R?
For example:
x==0 & y==1
x==0 && y==1
x==0 | y==1
x==0 || y==1
I always use the short forms in my code. Does that have any drawbacks?

& and | are element-wise and can be used in vector operations, whereas && and || always produce a single TRUE or FALSE.
Check the difference:
> x <- 1:5
> y <- 5:1
> (x > 2) & (y < 3)
[1] FALSE FALSE FALSE TRUE TRUE
> (x > 2) && (y < 3) # before R 4.3.0, && used only the first elements of the logical
# vectors (x > 2) and (y < 3); R 4.3.0 and later raise an error here instead
[1] FALSE
So && and || are commonly used in if (condition) state_1 else state_2 statements, which
deal with conditions of length 1.
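A minimal sketch of how the two forms are typically used, with made-up vectors: & builds a logical vector for subsetting, while && combines single conditions in an if statement and skips its right-hand side once the left side decides the result. The names both, msg and short are only for illustration.

```r
x <- 1:5
y <- 5:1

# Element-wise AND: one result per position, suitable for subsetting.
both <- (x > 2) & (y < 3)
x[both]   # elements of x where both conditions hold

# Scalar AND for control flow: reduce each vector to a single value first
# (here with length() and all()) so that && sees length-1 operands.
if (length(x) > 0 && all(x > 0)) {
  msg <- "all positive"
} else {
  msg <- "empty or non-positive"
}

# Short-circuiting: the left side is FALSE, so the right side is never
# evaluated and stop() is not reached.
short <- FALSE && stop("never evaluated")
```

Reducing with all() or any() before && keeps the condition valid in R versions where a longer operand is an error.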

Related

R- arithmetic does not respect logical NOT operator and order of operations?

It appears that the logical NOT operator ! has a non-intuitive order of operations in arithmetic:
set.seed(42)
a <- sample(100, 3)
b <- sample(100, 3)
c <- sample(100, 3)
l <- (1:3) <= 2
a * !l - b * !l + c
# 0 0 29
# same expression, but with parentheses for explicit grouping order of operations
(a * !l) - (b * !l) + c
# 74 14 43
There must be something I do not understand about the ! operator in relation to * or conversion from logical to numeric?
Note that in R, the negation operator ! applies to the entire expression to its right until it reaches the end or encounters an operator with lower precedence. It does not just negate the most immediate term. Recall also that 0 is treated as FALSE and any other number as TRUE. So observe
!0
# [1] TRUE
!5
# [1] FALSE
!5-5
# [1] TRUE
!5-3-2
# [1] TRUE
(!5)-3-2
# [1] -5
So you see in the case of !5-3-2 the negation isn't happening until after 5-3-2 is evaluated. Without the parentheses, the negation is the very last thing that happens.
So when you write
a * !l - b * !l + c
that's the same as
a * !(l - (b * !(l + c)))
Because all the operations have to happen to the right of the negation before the negation can occur.
If you want to negate just the l terms, you can do
a * (!l) - b * (!l) + c
This is a consequence of operator precedence in R (see the ?Syntax help page for details). ! is one of the last operators to be evaluated in a given expression.
Note that & and | have a lower precedence than ! so when you do
!a | !b & !c
that's the same as
(!a) | ((!b) & (!c))
so this roughly would be what you expect if you just stick to logical operators. It just gets a bit odd perhaps when you combine logical and arithmetic operators.
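The grouping described above can be checked directly. This is a small sketch with hypothetical values (l, a, b and k are made up here, with k standing in for the question's c) that compares the unparenthesized expression with its fully grouped equivalent and with the intended one.

```r
l <- c(TRUE, TRUE, FALSE)
a <- c(10, 20, 30)
b <- c(1, 2, 3)
k <- c(5, 5, 5)

# Without parentheses, each ! swallows everything to its right.
implicit <- a * !l - b * !l + k

# The same expression with the grouping R actually uses.
explicit <- a * !(l - (b * !(l + k)))

# What was probably intended: negate only l.
intended <- a * (!l) - b * (!l) + k

identical(implicit, explicit)   # TRUE
implicit                        # 0 0 30
intended                        # 5 5 32
```

Wrapping each negation in parentheses is the simplest way to keep ! local when it is mixed with arithmetic.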

Relational operator is not producing the expected answer in R [duplicate]

I'm trying to compare two numbers in R as part of an if-statement condition:
(a-b) >= 0.5
In this particular instance, a = 0.58 and b = 0.08... and yet (a-b) >= 0.5 is false. I'm aware of the dangers of using == for exact number comparisons, and this seems related:
(a - b) == 0.5 is false, while
all.equal((a - b), 0.5) is true.
The only solution I can think of is to have two conditions: (a-b) > 0.5 | all.equal((a-b), 0.5). This works, but is that really the only solution? Should I just swear off of the = family of comparison operators forever?
Edit for clarity: I know that this is a floating point problem. More fundamentally, what I'm asking is: what should I do about it? What's a sensible way to deal with greater-than-or-equal-to comparisons in R, since the >= can't really be trusted?
I've never been a fan of all.equal for such things. It seems to me the tolerance works in mysterious ways sometimes. Why not just check whether the difference is at least 0.5 minus a small tolerance?
tol = 1e-5
(a-b) >= (0.5-tol)
In general I find plain comparison logic, without rounding, clearer than all.equal.
If x == y then x-y == 0. Perhaps x-y is not exactly 0, so for such cases I use
abs(x-y) <= tol
all.equal relies on a tolerance anyway, and this is more compact and straightforward than all.equal.
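As a quick illustration of the abs(x-y) <= tol idea applied to the numbers from the question (the tolerance value is an assumption; pick one that suits the scale of your data):

```r
a <- 0.58
b <- 0.08
tol <- 1e-8   # assumed tolerance, chosen only for this example

# Exact comparison fails because a - b is stored slightly below 0.5.
exact <- (a - b) == 0.5

# Tolerance-based comparison treats the two values as equal.
approx <- abs((a - b) - 0.5) <= tol

exact    # FALSE
approx   # TRUE
```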
You could create this as a separate operator or overwrite the original >= function (probably not a good idea) if you want to use this approach frequently:
# using a tolerance
epsilon <- 1e-10 # set this as a global setting
`%>=%` <- function(x, y) (x + epsilon > y)
# as a new operator with the original approach
`%>=%` <- function(x, y) (isTRUE(all.equal(x, y)) | (x > y))
# overwriting R's version (not advised)
`>=` <- function(x, y) (isTRUE(all.equal(x, y)) | (x > y))
> (a-b) >= 0.5
[1] TRUE
> c(1,3,5) >= 2:4
[1] FALSE FALSE TRUE
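The all.equal-based definitions above compare the two arguments as whole objects, which is why the vector example reports FALSE for 3 >= 3. A possible element-wise variant is sketched below; the operator name %ge% and the default tolerance are assumptions made for illustration, not part of base R.

```r
# Hypothetical operator: element-wise "greater than or nearly equal".
`%ge%` <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  # TRUE where x exceeds y, or where the two differ by no more than tol.
  (x > y) | (abs(x - y) <= tol)
}

scalar_result <- (0.58 - 0.08) %ge% 0.5   # TRUE
vector_result <- c(1, 3, 5) %ge% 2:4      # FALSE TRUE TRUE
```

Because both branches are vectorized, the result has one element per pair, and base >= is left untouched.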
For completeness' sake, I'll point out that, in certain situations, you could simply round to a few decimal places (and this is kind of a lame solution by comparison to the better solution previously posted.)
round(0.58 - 0.08, 2) == 0.5
One more comment. all.equal is a generic. For numeric values, it uses all.equal.numeric. An inspection of this function shows that it uses .Machine$double.eps^0.5, where .Machine$double.eps is defined as
double.eps: the smallest positive floating-point number ‘x’ such that
‘1 + x != 1’. It equals ‘double.base ^ ulp.digits’ if either
‘double.base’ is 2 or ‘double.rounding’ is 0; otherwise, it
is ‘(double.base ^ double.ulp.digits) / 2’. Normally
‘2.220446e-16’.
(.Machine manual page).
In other words, that would be an acceptable choice for your tolerance:
myeq <- function(a, b, tol=.Machine$double.eps^0.5)
  abs(a - b) <= tol
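One design point worth keeping in mind: myeq uses an absolute tolerance, whereas all.equal by default compares the relative difference for values of ordinary magnitude. The sketch below, with made-up inputs, shows where the two can disagree.

```r
myeq <- function(a, b, tol = .Machine$double.eps^0.5) abs(a - b) <= tol

# Values near 1: the absolute tolerance behaves as expected.
small <- myeq(0.58 - 0.08, 0.5)

# Large values: an absolute gap of 0.01 exceeds tol, so myeq says "different"...
big_abs <- myeq(1e10, 1e10 + 0.01)

# ...while all.equal sees a relative gap of about 1e-12 and says "equal".
big_rel <- isTRUE(all.equal(1e10, 1e10 + 0.01))

small     # TRUE
big_abs   # FALSE
big_rel   # TRUE
```

Which behaviour is appropriate depends on the scale of the data, so the tolerance is best chosen with that scale in mind.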
Choose some tolerance level:
epsilon <- 1e-10
Then use
(a-b+epsilon) >= 0.5
But if you're using tolerances anyway, why do you care that a-b == .5 (in fact) doesn't get evaluated? If you are using tolerances anyway, you are saying "I don't care about the end points exactly."
Here is what is true
if( (a-b) >= .5)
if( (a-b) < .5)
One of those should always evaluate to true for every pair of doubles. Any code that uses one implicitly defines a no-op for the other, at least. If you're using tolerances to get an actual .5 included in the first, but your problem is defined on a continuous domain, you aren't accomplishing much. In most problems where the underlying values are continuous there is very little point to that, since values arbitrarily over .5 will always evaluate as they should. Values arbitrarily close to .5 will go to the "wrong" flow control, but in continuous problems where you are using appropriate precision that doesn't matter.
The only time that tolerances make sense is when you are dealing with problems of the type
if( (a-b) == c)
if( (a-b) != c)
Here no amount of "appropriate precision" can help you. The reason is that you have to be prepared that the second will always evaluate to true unless you set the bits of a-b at a very low level by hand, when in fact you probably want the first to sometimes be true.
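For the equality case, one common pattern is to wrap all.equal in isTRUE, since all.equal returns a character description rather than FALSE when the values differ. A small sketch using the numbers from the question (target and branch are names invented for this example):

```r
a <- 0.58
b <- 0.08
target <- 0.5

# isTRUE() turns all.equal's "TRUE or a message" result into a strict logical.
if (isTRUE(all.equal(a - b, target))) {
  branch <- "equal"
} else {
  branch <- "different"
}

branch   # "equal"
```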
The difficulty with <= and >= comparisons on floating-point numbers is not specific to R.
IsSmallerOrEqual <- function(a,b) { # To check a <= b
# Check whether "Mean relative difference..." exist in all.equal's result;
# If exists, it results in character, not logical
if ( class(all.equal(a, b)) == "logical" && (a<b | all.equal(a, b))) { return(TRUE)
} else if (a < b) { return(TRUE)
} else { return(FALSE) }
}
IsSmallerOrEqual(abs(-2-(-2.2)), 0.2) # TRUE; To check |-2-(-2.2)| <= 0.2
IsSmallerOrEqual(abs(-2-(-2.2)), 0.3) # TRUE
IsSmallerOrEqual(abs(-2-(-2.2)), 0.1) # FALSE
IsBiggerOrEqual <- function(a,b) { # To check a >= b
# Check whether "Mean relative difference..." exist in all.equal's result;
# If exists, it results in character, not logical
if ( class(all.equal(a, b)) == "logical" && (a>b | all.equal(a, b))) { return(TRUE)
} else if (a > b) { return(TRUE)
} else { return(FALSE) }
}
IsBiggerOrEqual(3,3) # TRUE
IsBiggerOrEqual(4,3) # TRUE
IsBiggerOrEqual(3,4) # FALSE
IsBiggerOrEqual(0.58 - 0.08,0.5) # TRUE
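If the result of all.equal is not first checked for being logical, the | can raise an error, because all.equal returns a character string when the values differ.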
The following is not necessary but useful:
abs(-2-(-2.2)) # 0.2
sprintf("%.54f",abs(-2-(-2.2))) # "0.200000000000000177635683940025046467781066894531250000"
sprintf("%.54f",0.2) # "0.200000000000000011102230246251565404236316680908203125"
all.equal(abs(-2-(-2.2)), 0.2) # TRUE; checks near equality of floating point numbers
identical(abs(-2-(-2.2)), 0.2) # FALSE; checks exact equality of floating point numbers

Resources