Lazy evaluation of `which` function arguments? - r

If there are multiple boolean expressions as arguments to the which function, are they evaluated lazily?
For example:
which(test1 & test2)
If test1 returns false, then test2 is not evaluated as the compound expression will be false anyway.

With `if` there can be efficiency gains as a result of that behavior. It is documented to work that way, and I don't think it is due to lazy evaluation. Even if you "force()-ed" that expression, it would still only evaluate a series of &&'s until it hit a single FALSE. See this help page:
?Logic
@XuWang probably deserves the credit for emphasizing the difference between "&" and "&&". The "&" operator works on vectors and returns vectors. The "&&" operator acts on scalars (actually vectors of length 1) and returns a vector of length 1. When offered a vector of length > 1 on either side, older versions of R used only the first value of each (later with a warning); since R 4.3.0 this is an error. It is only the "&&" version that does what is being called "lazy" evaluation. You can see that the "&" operator is not acting in a "lazy" fashion with a simple test:
fn1 <- function(x) print(x)
fn2 <- function(x) print(x)
x1 <- sample(c(TRUE, FALSE), 10, replace=TRUE)
fn1(x1) & fn2(x1) # the first two indicate evaluation of both sides regardless of first value
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
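The same contrast can be checked without reading printed output. This is only a minimal sketch: `record()` and the `eval_log` environment are hypothetical helpers, introduced here just to track which operands were actually evaluated.

```r
# Hypothetical helper: notes that an operand was evaluated, then returns it.
eval_log <- new.env()
record <- function(label, value) {
  eval_log$calls <- c(eval_log$calls, label)
  value
}

# `&&` short-circuits: the right-hand side is skipped after a FALSE.
eval_log$calls <- character(0)
res_scalar <- record("lhs", FALSE) && record("rhs", TRUE)
calls_scalar <- eval_log$calls

# `&` is vectorised: both sides are always evaluated in full.
eval_log$calls <- character(0)
res_vector <- record("lhs", c(FALSE, TRUE, TRUE)) &
  record("rhs", c(TRUE, TRUE, FALSE))
calls_vector <- eval_log$calls

# which() only ever sees the finished logical vector.
hits <- which(res_vector)
```

Here `calls_scalar` should contain only "lhs", while `calls_vector` contains both labels, so `which(test1 & test2)` always pays for both tests.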

Related

Why does rlang::quo_is_missing(quo(NA)) evaluate to FALSE?

If I understand, rlang::quo_is_missing evaluates a quosure and checks whether it contains a missing value. If it does, it should return TRUE, FALSE if not. Yet, I've tried the following combinations and it always returns FALSE:
rlang::quo_is_missing(quo(NA))
rlang::quo_is_missing(quo(NA_character_))
rlang::quo_is_missing(quo(NA_integer_))
If I try non-NA values, it also returns FALSE, as expected:
rlang::quo_is_missing(quo("hello"))
Why is it returning FALSE when the value is obviously missing?
"Missing" is a special term that refers to values that are not present at all. NA is not the same as "missing" -- NA is itself a value. In base R you can compare the functions is.na() and missing() each of which do different things. quo_is_missing is like the missing() function, not is.na and returns true only when there is no value at all:
rlang::quo_is_missing(quo())
If you want to check for NA, you could write a helper
quo_is_na <- function(x) {
  !rlang::quo_is_symbolic(x) &&
    !rlang::quo_is_missing(x) &&
    !rlang::quo_is_null(x) &&
    is.na(rlang::quo_get_expr(x))
}
quo_is_na(quo())
# [1] FALSE
quo_is_na(quo(x+y))
# [1] FALSE
quo_is_na(quo(NULL))
# [1] FALSE
quo_is_na(quo(42))
# [1] FALSE
quo_is_na(quo(NA))
# [1] TRUE
quo_is_na(quo(NA_character_))
# [1] TRUE
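The base R distinction between missing() and is.na() mentioned above can be seen in a small sketch. The function name `arg_is_missing` is a hypothetical helper used only for illustration.

```r
# Hypothetical helper: reports whether its argument was supplied at all.
arg_is_missing <- function(x) missing(x)

no_arg   <- arg_is_missing()    # nothing was passed, so the argument is missing
na_arg   <- arg_is_missing(NA)  # NA was passed, so the argument is not missing
na_check <- is.na(NA)           # the value itself is NA
```

Only the first call reports a missing argument; NA counts as a supplied value, which is the same reason quo(NA) is not a missing quosure.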

Regex optional character preceded by Negative Lookbehind in R

Suppose I have a set of strings:
test <- c('MTB', 'NOT MTB', 'TB', 'NOT TB')
I want to write a regular expression to match either 'TB' or 'MTB' (e.g., the expression "M?TB") only when it is NOT preceded by the phrase "NOT " (space included).
My intended result, therefore, is
TRUE FALSE TRUE FALSE
So far I have tried a couple of variations of
grepl("(?<!NOT )M?TB", test, perl = T)
TRUE TRUE TRUE FALSE
Unsuccessfully. As you can see, the phrase 'NOT MTB' meets the criteria for my regular expression.
It seems that including the optional character "M?" makes R think that the negative lookbehind is also optional. I have been looking into using parentheses to group the patterns, such as
grepl("(?<!NOT )(M?TB)", test, perl = TRUE)
TRUE TRUE TRUE FALSE
Which also fails to exclude the phrase 'NOT MTB'. Admittedly, I am unclear on how parentheses work in regex or even what "grouping" means in this context. I have had trouble finding a question related to how to group, require, and "optionalize" different parts of a regex so that I can match a phrase beginning with an optional character and preceded by a negative lookbehind. What is the proper way to write an expression like this?
We could use the start (^) and end ($) to match only those words
grepl("^M?TB$", test)
#[1] TRUE FALSE TRUE FALSE
If there are other strings, as @Wiktor Stribiżew mentioned in the comments, then one option would be
test1 <- c(test, "THIS MTB")
!grepl("\\bNOT M?TB\\b", test1) & grepl("\\bM?TB\\b", test1)
#[1] TRUE FALSE TRUE FALSE TRUE
test = c("MTB", "NOT MTB", "TB", "NOT TB", "THIS TB", "THIS NOT TB")
grepl("\\b(?<!NOT\\s)M?TB\\b",test,perl = TRUE)
# [1] TRUE FALSE TRUE FALSE TRUE FALSE
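To see why the original pattern accepts 'NOT MTB', it can help to look at where the match actually starts. The sketch below uses base R's regexpr() and regmatches(); the variable names are illustrative only.

```r
test <- c("MTB", "NOT MTB", "TB", "NOT TB")

# Where does the original pattern match inside "NOT MTB"?
hit <- regexpr("(?<!NOT )M?TB", "NOT MTB", perl = TRUE)
matched <- regmatches("NOT MTB", hit)  # the text that was matched
start <- as.integer(hit)               # 1-based start position of the match

# The word-boundary version from the answer above cannot start mid-word.
fixed <- grepl("\\b(?<!NOT\\s)M?TB\\b", test, perl = TRUE)
```

The match is just "TB", starting at the "T": the engine lets M? match nothing, and at that position the preceding text is "OT M", not "NOT ", so the lookbehind passes. The lookbehind is not optional; the match has simply moved one character to the right. Requiring \\b before M?TB prevents that shift.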
It is not entirely clear what the question intends, but here is some code to try depending on what is wanted.
Added: Poster clarified that #2 and #3 are along the lines looked for.
1) This can be done without regular expressions like this:
test %in% c("TB", "MTB")
## [1] TRUE FALSE TRUE FALSE
2) If the problem is not about exact matches then return matches to M?TB which do not also match NOT M?TB:
grepl("M?TB", test) & !grepl("NOT M?TB",test)
## [1] TRUE FALSE TRUE FALSE
3) Another alternative is to replace NOT M?TB with X and then grepl on M?TB:
grepl("M?TB", sub("NOT M?TB", "X", test))
## [1] TRUE FALSE TRUE FALSE

Using NOT (!) operator in R with numbers

I'm trying to understand the ! operator better in R, and I'm confused as to how it applies to numbers. What does the following code signify, and why are the two equality queries not the same?
> !5 == 7
[1] TRUE
> 5 == !7
[1] FALSE
> !5
[1] FALSE
Thanks!
First of all: the ! operator coerces non-logicals to logical, then negates them. Any nonzero (and non-missing) number is coerced to TRUE, which the ! operator then flips to FALSE.
The rest has to do with order of operations.
!5 == 7
Evaluates to
!(5==7)
Which is equivalent to
!(FALSE)
Which returns TRUE
Whereas
5 == !7
Evaluates to
5 == FALSE
Which returns FALSE
The counterpart to 5 == !7, with ! applied to the left operand only, would be (!5) == 7 (both return FALSE).
The ! coerces its argument to a logical, thus:
as.logical(-3L:3L)
# [1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE
as.logical(seq(-2,2, by = 0.5))
# [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
As you can see, 0 is FALSE, everything else is TRUE.
To get an even better sense of this, see that ! is - like everything in R - a function:
> `!`
function (x) .Primitive("!")
So, you're applying the ! function to numeric arguments, which are coerced to logical, as above.
When you compare a logical to a numeric value using ==, the logical value is coerced to numeric (FALSE becomes 0, TRUE becomes 1).
The result of your first example (!5 == 7) is due to precedence ordering; == has higher precedence than !.
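A quick way to check both the precedence and the coercion rules is to evaluate each form explicitly. In a comparison between a logical and a number, R's coercion order turns the logical into a number (TRUE is 1, FALSE is 0). The variable names below are illustrative only.

```r
negate_comparison <- !5 == 7    # parsed as !(5 == 7)
negate_left       <- (!5) == 7  # FALSE == 7, compared as 0 == 7
negate_right      <- 5 == !7    # 5 == FALSE, compared as 5 == 0

# If the number were coerced to logical, 5 == TRUE would be TRUE.
five_vs_true <- 5 == TRUE       # compared as 5 == 1
one_vs_true  <- 1 == TRUE       # compared as 1 == 1
```

Only `negate_comparison` and `one_vs_true` come out TRUE, which is consistent with == binding tighter than ! and with logicals being promoted to numbers in comparisons.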

Why do logical operators negate their argument when there is only one argument in R?

When passing only a single vector to the logical and/or operator, the operator negates the argument:
> x = c(F,T,T)
> `&`(x)
[1] TRUE FALSE FALSE
> `|`(x)
[1] TRUE FALSE FALSE
To make the logical operator return its argument unchanged (that is, act as the identity), one needs to pass a single-element vector as the second argument:
> `&`(x,T)
[1] FALSE TRUE TRUE
> `|`(x,F)
[1] FALSE TRUE TRUE
Why do the logical operators negate their argument when there is only one argument passed?
This was modified in R 3.2.1 as a result of a bug report. As you've pointed out, the previous behavior made little sense.

How does ada::predict.ada work?

I tried on Cross-validated but without a response and this is a technical, implementation-centric question.
I used ada::ada in R to create a boosted model which is based on decision trees.
It normally returns a matrix with stats on predicted results compared to expected outcome.
It's something like this:
      FALSE  TRUE
FALSE 11023  1023
TRUE    997  5673
That's cool, good accuracy.
Now it's time to predict on new data. So I went with:
predict(myadamodel, newdata=giveinputs())
But instead of a simple answer TRUE/FALSE I've got:
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[49] FALSE FALSE
Levels: FALSE TRUE
I presume that this ada object is an ensemble and I received an answer from each classifier.
But in the end I need a final straight answer: TRUE/FALSE. If that's all I can get, I need to know how the "ada" function computes the final answer that was used to build the statistics. I would check that, but the "ada" function is precompiled.
How do I get a final TRUE/FALSE answer that is consistent with the statistics ada returns from the learning phase?
I've attached an example that you can copy-paste:
library(ada)  # provides ada() and its predict method
mydata = data.frame(a=numeric(0), b=double(0), r=logical(0))
# build a grid of (a, b) points labelled by whether b > a
for(i in -10:10)
  for(j in 20:-4)
    mydata[length(mydata[,1])+1,] = c(a=i, b=j, r=(j > i))
myada = ada(mydata[,c("a","b")], mydata[,"r"])
print(myada)
predict(myada, data.frame(a=4, b=7))
Please note that the r column is for some reason expressed as "0" "1". I don't know why, or how to tell data.frame not to convert TRUE/FALSE to 0/1, but the idea stays the same.
OK. The reproducible example helped. It looks to be a quirk in the way predict works when you pass new data that has just one row. In this case, you're getting an estimate from each of the iterations (the default number of iterations is 50). Note that you only get two values returned when you do
predict(myada, data.frame(a=4:3,b=7:8))
This is basically because of a use of sapply within the predict function. We can make our own version which doesn't have this problem.
predict.ada <- ada:::predict.ada
body(predict.ada)[[12]] <- quote(tmp <- t(do.call(rbind,
  lapply(1:iter, function(i) f(f = object$model$trees[[i]],
                               dat = newdata)))))
and then we can run
predict.ada(myada, newdata=data.frame(a=4,b=7))
# [1] TRUE
# Levels: FALSE TRUE
so this new value is predicted to be TRUE. This was tested with ada_2.0-3 and may break in other versions.
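The sapply behaviour behind this quirk can be reproduced in base R without the ada package. The sketch below is not ada's actual code: `per_tree()` is a hypothetical stand-in for a function that returns one prediction per row of new data for a given iteration.

```r
# Hypothetical stand-in: one value per row of new data for iteration i.
per_tree <- function(i, n_rows) rep(i, n_rows)

# Two rows of new data: sapply() simplifies to a 2 x 3 matrix.
two_rows <- sapply(1:3, per_tree, n_rows = 2)

# One row of new data: sapply() simplifies to a plain vector of length 3,
# so the rows-by-iterations layout is lost.
one_row <- sapply(1:3, per_tree, n_rows = 1)

# lapply() + rbind keeps a matrix even for a single row.
fixed <- t(do.call(rbind, lapply(1:3, per_tree, n_rows = 1)))
```

With one row, `one_row` has no dimensions at all, while `fixed` is still a 1 x 3 matrix. Code written for the matrix case then treats each iteration's value as a separate result, which matches the 50 values seen in the question.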
Also, in your test data, when you use c() to combine elements they must all be the same data type, or they will be converted to the most general type that can hold all the values. If you're mixing types, it's better to use a list(). For example
mydata[length(mydata[,1])+1,] = list(a=i,b=j, r= (j > i))
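The difference between c() and list() here can be checked directly. This is a small sketch with illustrative variable names, using the same column layout as the question's data frame.

```r
# c() needs one common type, so the logical is turned into a number.
as_vector <- c(a = 4, b = 7, r = (7 > 4))
vector_type <- class(as_vector)

# list() keeps each element's own type.
as_list <- list(a = 4, b = 7, r = (7 > 4))
list_types <- vapply(as_list, class, character(1))

# Appending the list as a new row leaves the r column logical.
df <- data.frame(a = numeric(0), b = numeric(0), r = logical(0))
df[nrow(df) + 1, ] <- as_list
r_type <- class(df$r)
```

With c(), the whole row is numeric and TRUE is stored as 1; with list(), the r element and the r column both stay logical.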
