I am using the binary operator %in% to subset a dataframe (I got the idea from another stackoverflow thread), but when I double check the result by switching the arguments, I get different answers. I've read the R documentation on the match() function, and it seems like neither match() nor %in% should be directionally dependent. I really need to understand exactly what is happening to be confident in my results. Could anybody provide some insight?
> filtered_ordGeneNames_proteinIDs <- ordGeneNames_ProteinIDs[ordGeneNames_ProteinIDs$V4 %in% ordDEGs$X, ];
> filtered2_ordGeneNames_proteinIDs <- ordDEGs[ordDEGs$X %in% ordGeneNames_ProteinIDs$V4, ];
> nrow(filtered_ordGeneNames_proteinIDs)
[1] 5767
> nrow(filtered2_ordGeneNames_proteinIDs)
[1] 5746
Of course you have different results:
ordGeneNames_ProteinIDs$V4 %in% ordDEGs$X
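tells you which elements of ordGeneNames_ProteinIDs$V4 are also in ordDEGs$X,
whereas:
ordDEGs$X %in% ordGeneNames_ProteinIDs$V4
tells you which elements of ordDEGs$X are also in ordGeneNames_ProteinIDs$V4. The two results describe different vectors, so a value that is duplicated in one vector but not in the other is counted a different number of times.
Compare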
c(1,2,3,4) %in% c(1,2,1, 2)
[1] TRUE TRUE FALSE FALSE
to
c(1,2,1, 2) %in% c(1,2,3,4)
[1] TRUE TRUE TRUE TRUE
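The same effect can be reproduced with data frames. This is only a sketch with made-up data: `genes` and `degs` are hypothetical stand-ins for the two data frames in the question, not the real ones.

```r
# Hypothetical stand-ins for the two data frames in the question.
genes <- data.frame(V4 = c("A", "A", "B", "C"), stringsAsFactors = FALSE)
degs  <- data.frame(X  = c("A", "B", "D"), stringsAsFactors = FALSE)

# Rows of `genes` whose V4 appears in degs$X: the duplicated "A" is kept twice.
n_genes_kept <- nrow(genes[genes$V4 %in% degs$X, , drop = FALSE])

# Rows of `degs` whose X appears in genes$V4: "A" occurs only once here.
n_degs_kept <- nrow(degs[degs$X %in% genes$V4, , drop = FALSE])

# The set of shared values does not depend on the direction.
shared <- intersect(genes$V4, degs$X)

print(c(genes = n_genes_kept, degs = n_degs_kept))
print(shared)
```

If the two row counts differ in real data, comparing `length(unique(...))` of each filtered column against `length(shared)` is one way to check whether duplicates are the cause.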
a <- character()
b <- "SO is great"
any(a == b)
#> [1] FALSE
all(a == b)
#> [1] TRUE
The manual describes ‘any’ like this:
Given a set of logical vectors, is at least one of the values true?
So, not even one value in the comparison a == b yields TRUE.
If that is the case, how can ‘any’ return FALSE while ‘all’ returns TRUE? ‘all’
is described as: Given a set of logical vectors, are all of the values true?
In a nutshell: all values are TRUE and none are TRUE at the same time?
I am not an expert, but that looks odd.
Questions:
Is there a reasonable explanation for this, or is it just some quirk of R?
What are the ways around this?
Created on 2021-01-08 by the reprex package (v0.3.0)
Usually, when comparing a == b the elements of the shorter vector are recycled as necessary. However, in your case a has no elements, so no recycling occurs and the result is an empty logical vector.
The results of any(a == b) and all(a == b) are coherent with the logical quantifiers for all and exists. If you quantify on an empty range, for all gives the neutral element for logical conjunction (AND) which is TRUE, while exists gives the neutral element for logical disjunction (OR) which is FALSE.
As to how to avoid these situations: check that the vectors have the same length before comparing them, since comparing vectors of different lengths rarely makes sense.
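To make the neutral-element argument concrete, here is a small sketch. The helper `same_values` is a hypothetical name used for illustration only; it is not part of base R.

```r
empty <- logical(0)

# Quantifying over an empty range gives the neutral element of the operator.
all_empty <- all(empty)  # neutral element of AND
any_empty <- any(empty)  # neutral element of OR

# Because of that, appending an empty vector never changes the answer.
x <- c(TRUE, FALSE, TRUE)
all_unchanged <- identical(all(c(x, empty)), all(x))
any_unchanged <- identical(any(c(x, empty)), any(x))

# Hypothetical helper: only compare element-wise when the lengths agree.
same_values <- function(a, b) {
  length(a) == length(b) && all(a == b)
}

print(c(all_empty = all_empty, any_empty = any_empty))
print(same_values(character(0), "SO is great"))
print(same_values("SO is great", "SO is great"))
```

Note that this helper still returns NA when the inputs contain NA, so it only addresses the length mismatch.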
Regarding question number 2, I know of identical. It works well in all the situations I can think of.
a <- "a"
b <- "b"
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- character(0)
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- NA
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- NULL
identical(a, b) # FALSE >> works
#> [1] FALSE
a <- b
identical(a, b) # TRUE >> works
#> [1] TRUE
identical seems to be a good workaround though it still feels like a workaround to a part-time developer like me. Are there more solutions? Better ones? And why does R behave like this in the first place (see question)?
Regarding question 1)
I have no idea whether I am correct, but here are my thoughts:
In R, all() is the complement of any() in the De Morgan sense: all(x) is the same as !any(!x). For consistency, all(logical(0)) is therefore TRUE. So, in your situation you are hitting this edge case.
In mathematics, this is analogous to a set being both open and closed. I'm not a computer scientist, so I can't really speak to why one of the greybeards from way back when implemented it this way in either R or S.
Regarding question 2)
I think the other responses have answered this well.
Another solution provided by the shiny package
is isTruthy().
The package introduced the concept of truthy/falsy that “generally indicates
whether a value, when coerced to a base::logical(), is TRUE or FALSE”
(see the documentation).
require(shiny, quietly = TRUE)
a <- "a"
b <- "b"
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- character(0)
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- NA
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- NULL
isTruthy(a == b) # FALSE >> works
#> [1] FALSE
a <- b
isTruthy(a == b) # TRUE >> works
#> [1] TRUE
One of the advantages is that you can use other operators like %in%
or match(), too.
The situation in R is that you never know what a function will return when it fails.
Some functions return NA, others NULL, and yet others vectors with length == 0.
isTruthy() makes it easier to handle the diversity.
Unfortunately, when one does not write a shiny app it hardly makes sense to load the package because - aside from isTruthy - shiny only adds a large bunch of unneeded Web App features.
Created on 2021-01-10 by the reprex package (v0.3.0)
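If loading shiny only for isTruthy() feels too heavy, base R's isTRUE() covers the scalar cases shown above. This is a sketch of an alternative, not a drop-in replacement: isTRUE() is TRUE only for a single, non-NA logical TRUE, so a longer comparison would need to be wrapped in any() or all() first.

```r
b <- "b"

# Each comparison mirrors one of the cases tested with isTruthy() above.
res <- c(
  different = isTRUE("a" == b),
  empty     = isTRUE(character(0) == b),
  missing   = isTRUE(NA == b),
  null      = isTRUE(NULL == b),
  same      = isTRUE(b == b)
)

print(res)
```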
I have a question very similar to this one: Find matches of a vector of strings in another vector of strings.
I have a vector of clients, and if a name indicates a commercial client, I need to change the type in my data.frame.
So, suppose that:
commercial_names <- c("BAKERY","MARKET", "SCHOOL", "CINEMA")
clients <- c("JOHN XX","REESE YY","BAKERY ZZ","SAMANTHA WW")
I tried the code from the question cited above, but I got a warning and the wrong result:
> grepl(paste(commercial_names, collape="|"),clients)
[1] TRUE TRUE TRUE TRUE
Warning message:
In grepl(paste(commercial_names, collape = "|"), clients) :
argument 'pattern' has length > 1 and only the first element will be used
What am I doing wrong? I would appreciate any help.
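Your code is correct except for a typo (collape instead of collapse):
grepl(paste0(commercial_names, collapse = "|"), clients)
[1] FALSE FALSE TRUE FALSE
Because of the typo, paste() treats collape = "|" as one more vector to paste rather than as the collapse argument, so commercial_names is not collapsed into a single pattern and grepl() uses only the first element.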
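To see what the misspelled argument actually produces, compare the two paste() calls side by side. This is a small sketch that reuses the vectors from the question.

```r
commercial_names <- c("BAKERY", "MARKET", "SCHOOL", "CINEMA")
clients <- c("JOHN XX", "REESE YY", "BAKERY ZZ", "SAMANTHA WW")

# Misspelled argument: "|" is pasted onto every element with the default sep = " ",
# so the result is still a vector of four patterns.
bad <- paste(commercial_names, collape = "|")

# Correct spelling: a single pattern with the names as alternatives.
good <- paste(commercial_names, collapse = "|")

is_commercial <- grepl(good, clients)

print(bad)
print(good)
print(is_commercial)
```

The first element of `bad` is "BAKERY |", which ends in an empty alternative. That empty alternative appears to match every string, which would explain why the question's output is TRUE for all four clients.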
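Not sure how to do this with a one-liner, but a loop seems to do the trick (str_detect comes from the stringr package).
library(stringr)
sapply(clients, function(client) {
  # TRUE if any commercial name occurs in this client string
  any(str_detect(client, commercial_names))
})
#>     JOHN XX    REESE YY   BAKERY ZZ SAMANTHA WW
#>       FALSE       FALSE        TRUE       FALSE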
I found another way to do this, with the %like% operator from the data.table package:
> clients %like% paste(commercial_names,collapse = "|")
[1] FALSE FALSE TRUE FALSE
You can do something like this too:
clients.first <- gsub(" ..", "", clients)
clients.first %in% commercial_names
This returns:
[1] FALSE FALSE TRUE FALSE
You might need to change the regular expression for gsub if your clients data is more heterogeneous though.
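For more heterogeneous data, one option is to keep only the first word of each client and match it exactly. The sketch below uses made-up client names (`clients2` is hypothetical) and assumes the commercial keyword is always the first word. It also shows how exact matching differs from the regex approach.

```r
commercial_names <- c("BAKERY", "MARKET", "SCHOOL", "CINEMA")

# Hypothetical clients with suffixes of varying length.
clients2 <- c("JOHN XX", "BAKERY ZZ LTD", "MARKETING PP")

# Drop everything from the first space onwards, leaving the first word.
first_word <- sub(" .*$", "", clients2)

# Exact match on the first word.
exact <- first_word %in% commercial_names

# Regex match anywhere in the string, for comparison.
partial <- grepl(paste(commercial_names, collapse = "|"), clients2)

print(first_word)
print(exact)
print(partial)
```

"MARKETING PP" is flagged by the regex because it contains "MARKET", but not by the exact match. Which behaviour is right depends on the data.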
Is there a way to apply back-ticks to a vector of .Primitive function names so that it can be passed safely to is.primitive?
Currently I use get(x) for x in is.primitive(x). These first calls,
> is.primitive(`$`)
#[1] TRUE
> is.primitive(get("$"))
#[1] TRUE
are the right ones. All the next calls don't work.
> is.primitive("$")
#[1] FALSE
## which is a bit confusing considering the argument name in
> `$`
#.Primitive("$")
##----
## other tries ...
> is.primitive($)
#Error: unexpected '$' in "is.primitive($"
> is.primitive("`$`")
#[1] FALSE
> is.primitive(`"$"`)
#Error in is.primitive(`"$"`) : object '"$"' not found
> sQuote("$")
#[1] "‘$’"
> is.primitive(sQuote("$"))
#[1] FALSE
> as.name("$") ## most promising! ...
#`$`
> is.primitive(as.name("$")) ## ...but no
#[1] FALSE
The reason I'm doing this is because I'd like to perform some analysis on the objects in package:base using something like
Vis.primitive <- Vectorize(is.primitive)
The vector x I'll be using is
x <- ls("package:base")
It is worth reading the help for functional programming
(?Map/?Filter)
This gives examples of how to use Filter to perform such analyses.
You could do something like
Filter(is.primitive, sapply(ls(baseenv()), get, baseenv()))
An alternative would be to use match.fun to search for the function
e.g.
match.fun('$')
There is no need to mess around with `!
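Applied to a character vector of names, the two suggestions could look like the sketch below. The vector `nms` is just an illustrative sample; `base_prims` collects the names rather than the function objects.

```r
nms <- c("$", "sum", "paste")

# match.fun() looks each function up by name, so no back-ticks are needed.
is_prim <- vapply(nms, function(nm) is.primitive(match.fun(nm)), logical(1))

# Names of all primitives in base, filtering on the name instead of the object.
base_prims <- Filter(
  function(nm) is.primitive(get(nm, envir = baseenv())),
  ls(baseenv())
)

print(is_prim)
print(head(base_prims))
```

get() is used for the full listing because ls(baseenv()) also contains non-function objects such as pi or letters, for which match.fun() would throw an error.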
Given two vectors:
x = c('a','b')
lookup = c('a','c','d','e','f')
test if each element in x is present in lookup. One way of doing it:
all(!is.na(match(x, lookup)))
I find this solution a bit verbose for R and wonder if there is a better/shorter version.
%in% does this:
all(x %in% lookup)
## [1] FALSE
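You can also use setdiff, which returns the elements of x that are missing from lookup, so a non-empty result (TRUE below) means that not every element of x is present. See the associated help page for other set operations.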
setdiff(x,lookup)
[1] "b"
> as.logical(length( setdiff(x,lookup) ) )
[1] TRUE
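The three approaches answer the same question and can be checked against each other. This is a sketch using the vectors from the question; the variable names are only for illustration.

```r
x <- c("a", "b")
lookup <- c("a", "c", "d", "e", "f")

# Three equivalent ways to ask "is every element of x in lookup?"
via_in      <- all(x %in% lookup)
via_match   <- !anyNA(match(x, lookup))
via_setdiff <- length(setdiff(x, lookup)) == 0

# setdiff() additionally reports which elements are missing.
missing_from_lookup <- setdiff(x, lookup)

print(c(via_in, via_match, via_setdiff))
print(missing_from_lookup)
```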
I managed to do the following:
stuff <- c("banana_fruit","apple_fruit","coin","key","crap")
fruits <- stuff[stuff %in% grep("fruit",stuff,value=TRUE)]
but I can't select the not-so-healthy stuff. The usual ideas, like
no_fruit <- stuff[stuff %not in% grep("fruit",stuff,value=TRUE)]
#or
no_fruit <- stuff[-c(stuff %in% grep("fruit",stuff,value=TRUE))]
don't work. The latter seems to just ignore the "-".
> stuff[grep("fruit",stuff)]
[1] "banana_fruit" "apple_fruit"
> stuff[-grep("fruit",stuff)]
[1] "coin" "key" "crap"
You can only use negative subscripts with numeric/integer vectors, not logical because:
> -TRUE
[1] -1
If you want to negate a logical vector, use !:
> !TRUE
[1] FALSE
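A short sketch of what happens when a logical index is negated with - instead of !, using the vector from the question:

```r
stuff <- c("banana_fruit", "apple_fruit", "coin", "key", "crap")
is_fruit <- grepl("fruit", stuff)

# Unary minus coerces the logical vector to integers: TRUE -> -1, FALSE -> 0.
neg_index <- -is_fruit

# As an index, -1 drops the first element and the zeros are ignored.
with_minus <- stuff[neg_index]

# Logical negation flips each element, which is what was intended.
with_bang <- stuff[!is_fruit]

print(neg_index)
print(with_minus)
print(with_bang)
```

So the minus sign is not ignored: it turns the logical index into a numeric one that removes only the first element.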
As Joshua mentioned: you can't use - to negate your logical index; use ! instead.
stuff[!(stuff %in% grep("fruit",stuff,value=TRUE))]
See also the stringr package for this kind of thing (load it with library(stringr) first):
stuff[!str_detect(stuff, "fruit")]
There is also a parameter called 'invert' in grep that does essentially what you're looking for:
> stuff <- c("banana_fruit","apple_fruit","coin","key","crap")
> fruits <- stuff[stuff %in% grep("fruit",stuff,value=TRUE)]
> fruits
[1] "banana_fruit" "apple_fruit"
> grep("fruit", stuff, value = T)
[1] "banana_fruit" "apple_fruit"
> grep("fruit", stuff, value = T, invert = T)
[1] "coin" "key" "crap"
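If an operator in the spirit of the `%not in%` attempt is preferred, one can be defined with Negate(). The name `%not_in%` is my own choice for this sketch and is not something base R provides.

```r
stuff <- c("banana_fruit", "apple_fruit", "coin", "key", "crap")

# Hypothetical operator: the logical negation of %in%.
`%not_in%` <- Negate(`%in%`)

no_fruit <- stuff[stuff %not_in% grep("fruit", stuff, value = TRUE)]

print(no_fruit)
```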