find where in boolean vector TRUE followed by FALSE - r

I have an boolean vector and need to indicate where it changes (from TRUE to FALSE).
input <- c(rep(TRUE,3), rep(FALSE,2), TRUE, FALSE)
input
[1] TRUE TRUE TRUE FALSE FALSE TRUE FALSE
The result should be c(4, 7). Does something for doing so already exist (in base)? thx, J

Related

regex a before b with and without whitespaces

I am trying to find all the string which include 'true' when there is no 'act' before it.
An example of possible vector:
vector = c("true","trueact","acttrue","act true","act really true")
What I have so far is this:
grepl(pattern="(?<!act)true", vector, perl=T, ignore.case = T)
[1] TRUE TRUE FALSE TRUE TRUE
what I'm hopping for is
[1] TRUE TRUE FALSE FALSE FALSE
May be this works - i.e. to SKIP the match when there is 'act' as preceding substring but match true otherwise
grepl("(act.*true)(*SKIP)(*FAIL)|\\btrue", vector,
perl = TRUE, ignore.case = TRUE)
[1] TRUE TRUE FALSE FALSE FALSE
Here is one way to do so:
grepl(pattern="^(.(?<!act))*?true", vector, perl=T, ignore.case = T)
[1] TRUE TRUE FALSE FALSE FALSE
^: start of the string
.: matches any character
(?<=): negative lookbehind
act: matches act
*?: matches .(?<!act) between 0 and unlimited times
true: matches true
see here for the regex demo

Flatten boolean vector in R

How could I get a single boolean value that is TRUE if all values in vector are TRUE and FALSE otherwise? For instance:
> grepl("ABC",c("ABC","ABC","123ABC"))
[1] TRUE TRUE TRUE
my desired result:
[1] TRUE
Another example:
> grepl("ABC",c("ABC","ABC","123ABA"))
[1] TRUE TRUE FALSE
my desired result:
[1] FALSE
I know that it could be possibly solved with FOR loop, but this would be a time consuming method. Perhaps there is another, ready and simple solution. Please advise.
Use all :
all(grepl("ABC",c("ABC","ABC","123ABC")))
#[1] TRUE
all(grepl("ABC",c("ABC","ABC","123ABA")))
#[1] FALSE

str_detect, validate phone number with specific pattern

I want to check if the numbers I have in the list matches specific formatting (nnn.nnn.nnnn). I am expecting the code to return a boolean (FALSE, TRUE, FALSE, TRUE, FALSE, FALSE) but the last element returns TRUE when I want it to be FALSE.
library(stringr)
numbers <- c('571-566-6666', '456.456.4566', 'apple', '222.222.2222', '222 333
4444', '2345.234.2345')
str_detect(numbers, "[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}")
If I use:
str_detect(numbers, "[:digit:]{4}\\.[:digit:]{3}\\.[:digit:]{4}")
I get (FALSE, FALSE, FALSE, FALSE, FALSE, TRUE), so I know the pattern for the exact matches work but I am not sure why the first block of code returns TRUE for the last element when there are 4 numbers and not 3 before the '.'
It is because that last value has `345.234.2345' at the end and you don't have a requirement that your pattern start and end with the matching values.
Try this pattern:
"^[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}$"
If you wanted to match with a string possibly inside or one that was separate at the end or beginning by a space it might be more general to use:
"(^|[ ])[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}([ ]|$)"
Testing:
numbers <- c('571-566-6666', '456.456.4566', 'apple', '222.222.2222', '222 333
4444', '2345.234.2345', "interior test 456.456.4566 other",
'456.456.4566 beginning test', "end test 456.456.4566")
str_detect(numbers, "(^|[ ])[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}([ ]|$)")
#[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
And as Wictor is pointing out you could also use the word boundary operator as long as you double escape it in R patterns.
grepl("\\b[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}\\b", numbers)
[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
Caveat: The stringr functions (which if I remember correctly are based on stringi functions) appear to be different than the "ordinary" R regex functions in that they allow using the special character classes without double bracketing.
grepl("(^|[ ])[:digit:]{3}\\.[:digit:]{3}\\.[:digit:]{4}([ ]|$)", numbers)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
grepl("(^|[ ])[[:digit:]]{3}\\.[[:digit:]]{3}\\.[[:digit:]]{4}([ ]|$)", numbers)
[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
Apparently this is via an implicit setting of "fixed" to TRUE.

How does ada::predict.ada work?

I tried on Cross-validated but without a response and this is a technical, implementation-centric question.
I used ada::ada in R to create a boosted model which is based on decision trees.
It normally returns a matrix with stats on predicted results compared to expected outcome.
It's something like that:
FALSE TRUE
FALSE 11023 1023
TRUE 997 5673
That's cool, good accuracy.
Now it's time to predict on new data. So I went with:
predict(myadamodel, newdata=giveinputs())
But instead of a simple answer TRUE/FALSE I've got:
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[49] FALSE FALSE
Levels: FALSE TRUE
I presume that this ada object is an ensamble and I received an answer from each classifier.
But in the end I need a final straight answer: TRUE/FALSE. If that's all I can get I need to know how does the "ada" function computes the final answer that was used to build the statistic. I would check that but the "ada" function is precompiled.
How do I get the final TRUE/FALS answer to comply with the statistic that ada return from the learning phase?
I've attached an example that you can copy-paste:
mydata = data.frame(a=numeric(0),b=double(0),r=logical(0))
for(i in -10:10)
for(j in 20:-4)
mydata[length(mydata[,1])+1,] = c(a=i,b=j, r= (j > i))
myada = ada(mydata[,c("a","b")], mydata[,"r"])
print(myada);
predict(myada, data.frame(a=4,b=7))
Please note that the r-column is for some reason expressed as "0" "1". I don't know why and how to tell data.frame not to convert TRUE FALSE to 0, 1 but the idea stays the same.
OK. The reproducible example helped. It looks to be a quirk in the way predict works when you pass new data that has just one row. In this case, you're getting an estimate from each of the iterations (the default number of iterations is 50). Note that you only get two values returned when you do
predict(myada, data.frame(a=4:3,b=7:8))
This is basically because of a use of sapply within the predict function. We can make our own which doesn't have this problem.
predict.ada <- ada:::predict.ada
body(predict.ada)[[12]] <- quote( tmp <- t(do.call(rbind,
lapply(1:iter, function(i) f(f = object$model$trees[[i]],
dat = newdata)))))
and then we can run
predict.ada(myada, newdata=data.frame(a=4,b=7))
# [1] TRUE
# Levels: FALSE TRUE
so this new values is predicted to be TRUE. This was tested in ada_2.0-3 and may break in other versions.
Also, in your test data, when you use c() to merge elements they must be all the same data type or they will be converted to the lowest common denominator data type that can hold all values. If you're mixing types, it's better to use a list(). For example
mydata[length(mydata[,1])+1,] = list(a=i,b=j, r= (j > i))

Lazy evaluation of `which` function arguments?

If there are multiple boolean expressions as arguments to the which function, are they evaluated lazily?
For example:
which(test1 & test2)
If test1 returns false, then test2 is not evaluated as the compound expression will be false anyway.
With if there can be efficiency gains as a result of that behavior. It is documented to work that way, and I don't think it is due to lazy evaluation. Even if you "force()-ed" that expression it would still only evaluate a series of &'s until it had a single FALSE. See this help page:
?Logic
#XuWang probably deserved the credit for emphasizing the difference between "&" and "&&". The "&" operator works on vectors and returns vectors. The "&&" operator acts on scalars (actually vectors of length==1) and returns a vector of length== 1. When offered a vector or length >1 as either side of the arguments, it will work on only the information in the first value of each and emit a warning. It is only the "&&" version that does what is being called "lazy" evaluation. You can see that hte "&" operator is not acting in a "lazy fashion with a simepl test:
fn1 <- function(x) print(x)
fn2 <- function(x) print(x)
x1 <- sample(c(TRUE, FALSE), 10, replace=TRUE)
fn1(x1) & fn2(x1) # the first two indicate evaluation of both sides regardless of first value
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
# [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE

Resources