Flatten boolean vector in R

How can I get a single boolean value that is TRUE if all values in a vector are TRUE, and FALSE otherwise? For instance:
> grepl("ABC",c("ABC","ABC","123ABC"))
[1] TRUE TRUE TRUE
my desired result:
[1] TRUE
Another example:
> grepl("ABC",c("ABC","ABC","123ABA"))
[1] TRUE TRUE FALSE
my desired result:
[1] FALSE
I know this could be solved with a for loop, but that would be a time-consuming approach. Perhaps there is a ready-made, simpler solution. Please advise.

Use all():
all(grepl("ABC",c("ABC","ABC","123ABC")))
#[1] TRUE
all(grepl("ABC",c("ABC","ABC","123ABA")))
#[1] FALSE
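
A short sketch of how all() behaves at the edges, since those cases tend to surprise: an empty vector yields TRUE, and a missing value can make the result NA unless na.rm = TRUE is passed. The input vectors here are made up for illustration.

```r
# Illustrative inputs, not taken from the question
hits <- grepl("ABC", c("ABC", "ABC", "123ABA"))

all(hits)                        # FALSE: the last string does not contain "ABC"
all(logical(0))                  # TRUE: nothing in an empty vector is FALSE
all(c(TRUE, NA))                 # NA: the missing value could be FALSE
all(c(FALSE, NA))                # FALSE: one FALSE already decides the result
all(c(TRUE, NA), na.rm = TRUE)   # TRUE: the NA is dropped first

# isTRUE() gives a strict TRUE/FALSE even when all() returns NA
isTRUE(all(c(TRUE, NA)))         # FALSE
```

If NA can occur in the data, wrapping the call in isTRUE() is one way to guarantee a single TRUE or FALSE inside an if() condition.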

Related

Full word match with grepl

I would like to get TRUE FALSE instead of the result below. Any suggestions?
testLines <- c("buried","medium-buried")
grepl('\\<buried\\>',testLines)
[1] TRUE TRUE
Perhaps this?
testLines <- c("buried","medium-buried")
grepl('^buried$',testLines)
#[1] TRUE FALSE
My understanding (and regex is not my forte) is that ^ denotes the start of the string and $ the end.
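
To illustrate why the two patterns differ: the hyphen is not a word character, so \\< and \\> find a word boundary right after it, whereas ^ and $ anchor to the whole string. The third pattern below is only a sketch that assumes "full word" means delimited by whitespace or by the ends of the string; the extra test string is made up.

```r
testLines <- c("buried", "medium-buried", "deeply buried pipe")

# Word boundaries: the hyphen counts as a boundary, so all three match
grepl("\\<buried\\>", testLines)

# Whole-string anchors: only the exact string "buried" matches
grepl("^buried$", testLines)

# Assumed definition of a word: surrounded by whitespace or string ends
grepl("(^|[[:space:]])buried([[:space:]]|$)", testLines)

# For an exact whole-string test, plain comparison avoids regex altogether
testLines == "buried"
```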

Is there a built-in function "none"?

I hope the question is not too foolish.
Is there a built-in R function that returns TRUE when all the cases are FALSE?
Something similar to any() or all(), but which, for a logical vector of length 2, returns FALSE for TRUE TRUE, FALSE for TRUE FALSE, and TRUE for FALSE FALSE.
I would call it none().
We can use ! with any:
!any(c(FALSE, FALSE))
Or Negate(any):
> none <- Negate(any)
> none(c(TRUE,TRUE))
[1] FALSE
> none(c(TRUE,FALSE))
[1] FALSE
> none(c(FALSE,FALSE))
[1] TRUE
Or all:
all(!vec)
Or using sum:
sum(vec)==0
where vec is your vector.
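
The variants above agree on clean input but it may help to see them side by side, including with a missing value. This is a minimal sketch; the na.rm argument on none() is an addition for illustration, not something base R provides under that name.

```r
# Hypothetical helper: TRUE when no element is TRUE
none <- function(x, na.rm = FALSE) !any(x, na.rm = na.rm)

none(c(TRUE, TRUE))                 # FALSE
none(c(TRUE, FALSE))                # FALSE
none(c(FALSE, FALSE))               # TRUE
none(logical(0))                    # TRUE: an empty vector has no TRUE values

# With a missing value the answer is unknown unless NA is dropped
none(c(FALSE, NA))                  # NA
none(c(FALSE, NA), na.rm = TRUE)    # TRUE

# The other formulations give the same result on NA-free input
vec <- c(FALSE, FALSE)
all(!vec)                           # TRUE
sum(vec) == 0                       # TRUE
```

Unlike sum(vec) == 0, any() can stop as soon as it finds a TRUE, which may matter for long vectors.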

Improving the speed of cleaning a list

I'm trying to improve the speed of one of my R scripts. One of the parts that takes a long time is cleaning a list under certain conditions.
It may not be necessary to understand exactly what I want to do; you can go straight to the code that I need to improve.
So here's the thing:
I have a list, and each element of the list is itself a list of 2 elements:
- The first element is a vector of integers with a length between 1 and 4
- The second element is a vector of booleans of length 6
Here is a piece of code to create such a list (of 1000 elements):
numberList <- 1000
l.gen <- lapply(1:numberList, function(i) {
  list(var = floor(runif(floor(runif(1, 1, 5)), 1, 7)),
       vec = as.logical(floor(runif(6, 0, 1.99))))
})
It looks something like this:
> l.gen
[[1]]
[[1]]$var
[1] 1 4 2
[[1]]$vec
[1] FALSE FALSE FALSE FALSE TRUE TRUE
[[2]]
[[2]]$var
[1] 3
[[2]]$vec
[1] FALSE FALSE FALSE TRUE TRUE FALSE
[[3]]
[[3]]$var
[1] 6
[[3]]$vec
[1] TRUE FALSE TRUE FALSE TRUE TRUE
[[4]]
[[4]]$var
[1] 6
[[4]]$vec
[1] TRUE TRUE TRUE FALSE FALSE FALSE
Now to the cleaning part.
I want to remove from this list every element "l" that meets two conditions.
First condition: the "$vec" of the element "l" has more than 3 TRUE values in common with another element of the list.
For example:
$vec
[1] TRUE TRUE FALSE TRUE TRUE FALSE
and
$vec
[1] TRUE FALSE FALSE TRUE TRUE FALSE
have 3 TRUE values in common (those in the 1st, 4th and 5th positions), so they do not meet the condition, which requires more than 3.
The second condition is tested only when the first one holds:
the $var of the two elements should have at least one value in common (regardless of position),
so
[[1]]$var
[1] 1 4 2
and
[[1]]$var
[1] 3 1
meet that condition (because "1" is in both vectors).
When two elements of the list meet both conditions, I delete the one with the shorter $var.
For example, in:
[[3]]
[[3]]$var
[1] 6
[[3]]$vec
[1] TRUE FALSE TRUE FALSE TRUE TRUE
[[3]]
[[3]]$var
[1] 6 3 5
[[3]]$vec
[1] TRUE TRUE TRUE FALSE TRUE TRUE
this element should be deleted:
[[3]]
[[3]]$var
[1] 6
[[3]]$vec
[1] TRUE FALSE TRUE FALSE TRUE TRUE
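So here is the code I've tried to meet these requirements:
limit <- 3  # threshold from the first condition: more than 3 TRUE in common
res <- lapply(l.gen, function(l) {
  # Compare l against every element of l.gen
  for (i in seq_along(l.gen)) {
    other <- l.gen[[i]]
    # Only an element with a longer $var can cause l to be deleted
    if (length(l$var) < length(other$var)) {
      in.common <- sum(l$vec & other$vec)
      var.in.common <- sum(l$var %in% other$var)
      if (in.common > limit && var.in.common > 0) {
        return(NULL)  # both conditions met: drop l
      }
    }
  }
  "OK"  # no conflicting element was found
})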
It works fine, but it is rather slow when the list is very big.
I've tried replacing the for loop with another lapply, but that takes more time on a big list: the return() in the for loop acts like a "break", which lapply() cannot do because it visits every single element of the list.
I'm open to any suggestion that might help.
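
One possible direction, offered only as a sketch: express both conditions as matrix operations so that all pairs are compared at once instead of in nested loops. It assumes, as in the generator above, that the values in $var lie in 1:6 and that every $vec has length 6; the names vec.mat, var.mat, conflict and cleaned are made up here. Memory grows with the square of the list length, so this suits lists of a few thousand elements rather than millions.

```r
set.seed(1)
numberList <- 200
l.gen <- lapply(seq_len(numberList), function(i) {
  list(var = floor(runif(floor(runif(1, 1, 5)), 1, 7)),
       vec = as.logical(floor(runif(6, 0, 1.99))))
})
limit <- 3

# One row per element: the 6 logical flags, stored as 0/1
vec.mat <- do.call(rbind, lapply(l.gen, `[[`, "vec")) * 1L
# One row per element: indicator of which values 1..6 occur in $var
var.mat <- t(vapply(l.gen, function(l) as.integer(1:6 %in% l$var), integer(6)))
var.len <- vapply(l.gen, function(l) length(l$var), integer(1))

shared.true <- tcrossprod(vec.mat)        # [i, j] = number of TRUE shared by i and j
shared.var  <- tcrossprod(var.mat) > 0    # [i, j] = i and j share a $var value
shorter     <- outer(var.len, var.len, "<")  # [i, j] = i has a shorter $var than j

# Element i is dropped if some j satisfies all three tests
conflict <- shorter & (shared.true > limit) & shared.var
drop <- rowSums(conflict) > 0
cleaned <- l.gen[!drop]
length(cleaned)
```

The result is a filtered list rather than a list of NULL and "OK" markers; which.max or which(drop) could recover the indices if those are needed.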

How does ada::predict.ada work?

I tried Cross Validated but got no response, and this is a technical, implementation-centric question.
I used ada::ada in R to create a boosted model based on decision trees.
It normally reports a matrix with statistics on predicted results compared to the expected outcome.
It looks something like this:
        FALSE   TRUE
FALSE   11023   1023
TRUE      997   5673
That's cool, good accuracy.
Now it's time to predict on new data. So I went with:
predict(myadamodel, newdata=giveinputs())
But instead of a single TRUE/FALSE answer I got:
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[49] FALSE FALSE
Levels: FALSE TRUE
I presume that this ada object is an ensemble and that I received an answer from each classifier.
But in the end I need one final answer: TRUE or FALSE. If this is all I can get, I need to know how the "ada" function computes the final answer that was used to build the statistics. I would check that myself, but the "ada" function is precompiled.
How do I get a final TRUE/FALSE answer that is consistent with the statistics ada returns from the learning phase?
I've attached an example that you can copy and paste:
library(ada)
mydata = data.frame(a=numeric(0),b=double(0),r=logical(0))
for(i in -10:10)
  for(j in 20:-4)
    mydata[length(mydata[,1])+1,] = c(a=i,b=j, r= (j > i))
myada = ada(mydata[,c("a","b")], mydata[,"r"])
print(myada)
predict(myada, data.frame(a=4,b=7))
Please note that the r column is for some reason expressed as "0" "1". I don't know why this happens or how to tell data.frame not to convert TRUE/FALSE to 0/1, but the idea stays the same.
OK. The reproducible example helped. It looks to be a quirk in the way predict works when you pass new data that has just one row. In this case, you're getting an estimate from each of the iterations (the default number of iterations is 50). Note that you only get two values returned when you do
predict(myada, data.frame(a=4:3,b=7:8))
This is basically because of a use of sapply within the predict function. We can make our own which doesn't have this problem.
predict.ada <- ada:::predict.ada
body(predict.ada)[[12]] <- quote( tmp <- t(do.call(rbind,
lapply(1:iter, function(i) f(f = object$model$trees[[i]],
dat = newdata)))))
and then we can run
predict.ada(myada, newdata=data.frame(a=4,b=7))
# [1] TRUE
# Levels: FALSE TRUE
so this new value is predicted to be TRUE. This was tested with ada_2.0-3 and may break in other versions.
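
The sapply behaviour behind this quirk can be seen in isolation, without the ada package. In this sketch, rep(i, n) stands in for "the predictions of tree i on n rows"; that stand-in and the variable names are assumptions for illustration only.

```r
iter <- 5

# Each "tree" returns one value per row of new data
one.row  <- sapply(1:iter, function(i) rep(i, 1))  # simplified to a plain vector
two.rows <- sapply(1:iter, function(i) rep(i, 2))  # simplified to a 2 x 5 matrix

dim(one.row)    # NULL: the row dimension was lost
dim(two.rows)   # 2 5

# rbind over lapply keeps a matrix shape even for a single row
fixed <- t(do.call(rbind, lapply(1:iter, function(i) rep(i, 1))))
dim(fixed)      # 1 5
```

With a single row, the downstream code receives a vector of length iter where it expects a 1 x iter matrix, which is consistent with getting one value per iteration back.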
Also, in your test data, when you use c() to merge elements they must be all the same data type or they will be converted to the lowest common denominator data type that can hold all values. If you're mixing types, it's better to use a list(). For example
mydata[length(mydata[,1])+1,] = list(a=i,b=j, r= (j > i))
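
A small sketch of the coercion described above, using made-up values: c() collapses everything to one type, while list() keeps each element's own type, so the logical column survives the row assignment.

```r
# c() coerces mixed types to a common one: the logical becomes numeric
x <- c(a = 1, b = 2, r = TRUE)
class(x)   # "numeric"

# Row assignment with c(): column r is no longer logical
df.c <- data.frame(a = numeric(0), b = double(0), r = logical(0))
df.c[nrow(df.c) + 1, ] <- c(a = 1, b = 2, r = TRUE)
class(df.c$r)

# Row assignment with list(): each column keeps its own type
df.l <- data.frame(a = numeric(0), b = double(0), r = logical(0))
df.l[nrow(df.l) + 1, ] <- list(a = 1, b = 2, r = TRUE)
class(df.l$r)
```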

Partial string matching with grep and regular expressions

I have a vector of three-character strings, and I'm trying to write a command that will find which members of the vector have a particular letter as the second character.
As an example, say I have this vector of 3-letter strings...
example = c("AWA","WOO","AZW","WWP")
I can use grepl and glob2rx to find strings with W as the first or last character.
> grepl(glob2rx("W*"),example)
[1] FALSE TRUE FALSE TRUE
> grepl(glob2rx("*W"),example)
[1] FALSE FALSE TRUE FALSE
However, I don't get the right result when I try using it with glob2rx("*W*"):
> grepl(glob2rx("*W*"),example)
[1] TRUE TRUE TRUE TRUE
I am sure my understanding of regular expressions is lacking, however this seems like a pretty straightforward problem and I can't seem to find the solution. I'd really love some assistance!
For future reference, I'd also really like to know if I could extend this to the case where I have longer strings. Say I have strings that are 5 characters long, could I use grepl in such a way to return strings where W is the third character?
I would have thought that this was the regex way:
> grepl("^.W",example)
[1] TRUE FALSE FALSE TRUE
If you want a particular, prespecified position:
> grepl("^.{1}W",example)
[1] TRUE FALSE FALSE TRUE
This would allow programmatic calculation:
pos= 2
n=pos-1
grepl(paste0("^.{",n,"}W"),example)
[1] TRUE FALSE FALSE TRUE
If you have 3-character strings and need to check the second character, you could just test the appropriate substring instead of using regular expressions:
example = c("AWA","WOO","AZW","WWP")
substr(example, 2, 2) == "W"
# [1] TRUE FALSE FALSE TRUE
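
To tie the answers back to the original attempt: in a glob, * matches any run of characters including none, so "*W*" matches any string containing W anywhere. The glob wildcard for exactly one character is ?, which gives the second-position test directly. The helper has_char_at and the five-character strings below are made up for illustration.

```r
example <- c("AWA", "WOO", "AZW", "WWP")

# "*W*" means: W anywhere in the string
grepl(glob2rx("*W*"), example)    # TRUE TRUE TRUE TRUE

# "?W*" means: exactly one character, then W, then anything
grepl(glob2rx("?W*"), example)    # TRUE FALSE FALSE TRUE

# Hypothetical helper for an arbitrary position, without regular expressions
has_char_at <- function(x, ch, pos) substr(x, pos, pos) == ch

five <- c("AAWAA", "WAAAA", "AWAAW")
has_char_at(five, "W", 3)         # TRUE FALSE FALSE
grepl("^.{2}W", five)             # TRUE FALSE FALSE
```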
