Is there an easy, straightforward way (possibly a builtin function) that could match one vector as a whole in another vector?
Example:
target <- c(1,2,3)
A <- c(4,5,6,1,2,3)
B <- c(4,5,6,3,2,1)
my_match(target, A) # TRUE
my_match(target, B) # FALSE
I tried %in%, match and pmatch but these won't give the desired result. For example, both target %in% A and target %in% B will give the result [1] TRUE TRUE TRUE, which is not what I want.
Here is another version:
multi_match=function(target,A) {
lA=length(A)
lt=length(target)
if (lt>lA) return(FALSE)
any(colSums(sapply(1:(lA-lt+1),function(i) A[i:(i+lt-1)])==target)==lt)
}
Let's try it with some data
target <- c(1,2,3)
A <- c(4,5,6,1,2,3,1,2,3,1,3)
B <- c(4,5,6,3,2,1)
multi_match(target,A)
#TRUE
multi_match(target,B)
#FALSE
#"wrong" input order - trivially no match
multi_match(A,target)
#FALSE
And an extension of the multi_match function above to multi_which.
multi_which=function(target,A) {
lA=length(A)
lt=length(target)
if (lt>lA) return(integer(0))
which(colSums(sapply(1:(lA-lt+1),function(i) A[i:(i+lt-1)])==target)==lt)
}
multi_which(target,A)
#[1] 4 7
multi_which(target,B)
#integer(0)
#"wrong" input order - trivially no match
multi_which(A,target)
#integer(0)
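One caveat with the two functions above: when target has length 1, sapply returns a plain vector instead of a matrix, and colSums then fails. Below is a minimal sketch of a variant that avoids this by testing each window with vapply. The name multi_which2 is made up for illustration, and the sketch assumes neither vector contains NA.

```r
# Hypothetical helper (not from the answer above): start positions at which
# `target` occurs as a contiguous run inside `A`. Assumes no NA values.
multi_which2 <- function(target, A) {
  lA <- length(A)
  lt <- length(target)
  if (lt == 0L || lt > lA) return(integer(0))
  starts <- seq_len(lA - lt + 1L)
  # vapply returns exactly one logical per start position, so nothing is
  # simplified to a matrix and length-1 targets work as well.
  hits <- vapply(
    starts,
    function(i) all(A[i:(i + lt - 1L)] == target),
    logical(1)
  )
  starts[hits]
}

target <- c(1, 2, 3)
A <- c(4, 5, 6, 1, 2, 3, 1, 2, 3, 1, 3)
multi_which2(target, A)
#[1] 4 7
multi_which2(3, A)
#[1]  6  9 11
```

Wrapping the result in length(...) > 0 gives the TRUE/FALSE version.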
Try:
grepl(paste(target,collapse=","),paste(A,collapse=","))
grepl(paste(target,collapse=","),paste(B,collapse=","))
This concatenates the vectors into strings and looks for a substring in the second argument that matches the first.
You could put this into a function that returns true or false:
my_match <- function(x,y,dlm=",") grepl(paste(x,collapse=dlm),paste(y,collapse=dlm))
my_match(target,A)
[1] TRUE
my_match(target,B)
[1] FALSE
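Be aware that plain substring matching can produce false positives when one number is a prefix or suffix of another, and that grepl treats the pattern as a regular expression by default. A possible workaround, sketched here under the assumption that the delimiter never appears inside an element, is to wrap both strings in the delimiter and match literally. The name my_match_safe is hypothetical.

```r
# The original helper from the answer above, repeated for comparison
my_match <- function(x, y, dlm = ",") grepl(paste(x, collapse = dlm), paste(y, collapse = dlm))

# Hypothetical variant: assumes `dlm` never occurs inside an element
my_match_safe <- function(x, y, dlm = ",") {
  # Leading and trailing delimiters stop "1,2,3" from matching inside "11,2,33"
  pattern  <- paste0(dlm, paste(x, collapse = dlm), dlm)
  haystack <- paste0(dlm, paste(y, collapse = dlm), dlm)
  # fixed = TRUE treats the pattern as a literal string, not a regex
  grepl(pattern, haystack, fixed = TRUE)
}

my_match(c(1, 2, 3), c(11, 2, 33))              # TRUE, a false positive
my_match_safe(c(1, 2, 3), c(11, 2, 33))         # FALSE
my_match_safe(c(1, 2, 3), c(4, 5, 6, 1, 2, 3))  # TRUE
```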
One possible way is to use match and check that the resulting positions increase by exactly one at each step:
all(diff(match(target, A)) == 1) && length(match(target, A)) == length(target)
Or as a function
> exact_match <- function(p, x) all(diff(match(p, x)) == 1) && length(match(p, x)) == length(p)
> exact_match(target,A)
[1] TRUE
> exact_match(target,B)
[1] FALSE
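Two limitations are worth knowing before relying on this. match returns only the first position of each element, so a run that starts after an earlier duplicate is missed, and elements that are absent produce NA rather than FALSE. A short demonstration, reusing exact_match as defined above:

```r
exact_match <- function(p, x) all(diff(match(p, x)) == 1) && length(match(p, x)) == length(p)

target <- c(1, 2, 3)

# 1, 2, 3 occurs at positions 2 to 4, but match() reports the first 3 (position 1)
exact_match(target, c(3, 1, 2, 3))
#[1] FALSE

# None of the elements is present, so match() returns NA and the result is NA
exact_match(target, c(4, 5, 6))
#[1] NA
```

Wrapping the call in isTRUE() turns the NA case into FALSE, but it does not help with repeated elements.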
Related
Let's say I have this list:
List_example <- list('short'= 10,'medium'= 20,'long'=200)
How can I check whether short, medium and long are all integers in one go?
Try the code below
> all(sapply(List_example, `%%`, 1) == 0)
[1] TRUE
With sapply :
List_example <- list('short'= 10,'medium'= 20,'long'=200)
all(sapply(List_example, is.numeric))
#[1] TRUE
To check for integers specifically use is.integer.
If an object has R's integer type, it is clearly a whole number. If it has double type, we can check whether it equals itself rounded.
is_int <- function(x) is.integer(x) || (is.numeric(x) && identical(round(x), x))
all(sapply(List_example, is_int))
## [1] TRUE
L <- list(3, 5L, "xyz")
all(sapply(L, is_int))
## [1] FALSE
If what you mean is that you want to find out if they all have R integer type then we have the following since the numbers in the example are all doubles.
all(sapply(List_example, is.integer))
## [1] FALSE
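If the values come out of floating-point arithmetic, an exact comparison with the rounded value can be too strict. Here is a sketch of a tolerance-based check, modelled on the is.wholenumber example in R's ?integer help page. The name is_whole and the choice of tolerance are assumptions for illustration.

```r
# Hypothetical helper: TRUE if x is numeric and within `tol` of a whole number
is_whole <- function(x, tol = .Machine$double.eps^0.5) {
  is.numeric(x) && all(abs(x - round(x)) < tol)
}

List_example <- list(short = 10, medium = 20, long = 200)
all(vapply(List_example, is_whole, logical(1)))
## [1] TRUE

x <- sqrt(2)^2          # mathematically 2, but not exactly 2 as a double
identical(round(x), x)
## [1] FALSE
is_whole(x)
## [1] TRUE
is_whole("xyz")
## [1] FALSE
```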
I have data as below:
vec <- c("ABC|ADC|1","ABC|ADG|2")
I need to check if below substring is present or not
"ADC|DFG", it should return false for this as I need to match exact pattern.
"ABC|ADC|1|5" should return True as this is a child element for the first element in vector.
I tried using grepl but it returns true if I just pass ADC as well, any help is appreciated.
grepl returns TRUE because the pipe character | is special in a regular expression: a|b means "match a or b". All you need to do is escape it.
frtest<-c("ABC|ADC","ABC|ADC|1|2","ABC|ADG","ABC|ADG|2|5")
# making the last number and its pipe optional
test <- gsub('(\\|\\d)$', '(\\1)?', frtest)
# escaping all pipes
test<-gsub('\\|' ,'\\\\|',test)
# testing if any of the strings is in vec
res <- sapply(test, function(x) any(grepl(x, vec)) )
# reassigning the names so they're readable
names(res) <-frtest
#>     ABC|ADC ABC|ADC|1|2     ABC|ADG ABC|ADG|2|5
#>        TRUE        TRUE        TRUE        TRUE
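If the search string should always be taken literally, escaping is not the only option: grepl has a fixed argument that switches off regular-expression interpretation altogether. A small sketch using the data from the question:

```r
vec <- c("ABC|ADC|1", "ABC|ADG|2")

# As a regex, "ADC|DFG" means "ADC" or "DFG", so the first element matches
grepl("ADC|DFG", vec)
#> [1]  TRUE FALSE

# With fixed = TRUE the pipe is an ordinary character
grepl("ADC|DFG", vec, fixed = TRUE)
#> [1] FALSE FALSE
grepl("ABC|ADC", vec, fixed = TRUE)
#> [1]  TRUE FALSE
```

This only covers the "is the string contained in an element of vec" direction. It does not handle the optional trailing number the way the regex above does.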
For two vectors vec and test, this returns a vector which is TRUE if either the corresponding element of test is the start of one of the elements of vec, or one of the elements of vec is the start of the corresponding element of test.
vec <- c("ABC|ADC|1","ABC|ADG|2")
test <- c("ADC|DFG", "ABC|ADC|1|5", "ADC|1", "ABC|ADC")
colSums(sapply(test, startsWith, vec) | t(sapply(vec, startsWith, test))) > 0
# ADC|DFG ABC|ADC|1|5 ADC|1 ABC|ADC
# FALSE TRUE FALSE TRUE
Can someone explain why %in% returns false in this case? The string <sentiment> exists in the larger string.
> x<-"hahahaha <sentiment>too much</sentiment> <feature>doge</feature>."
> "<sentiment>" %in% x
[1] FALSE
%in% checks whether the former element matches any of the elements in the latter. In this case x only has the element "hahahaha <sentiment>too much</sentiment> <feature>doge</feature>.", not "<sentiment>", so "<sentiment>" %in% x returns FALSE. For example, the following returns TRUE:
y = c(x, "<sentiment>")
# > y
# [1] "hahahaha <sentiment>too much</sentiment> <feature>doge</feature>."
# [2] "<sentiment>"
"<sentiment>" %in% y
# [1] TRUE
If you want to check whether "<sentiment>" is a substring of x, use grepl:
grepl("<sentiment>", x, fixed = TRUE)
# [1] TRUE
or use str_detect from stringr:
stringr::str_detect(x, stringr::fixed("<sentiment>"))
# [1] TRUE
%in% is the matching operator, defined in terms of the match function. It looks for whole elements in a vector (or similar), not for a substring within a string.
To search within a string, use one of the pattern-matching functions, such as grepl.
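To make the difference concrete, here is a short sketch that puts element matching and substring matching side by side. It uses only base R, and x is the string from the question.

```r
x <- "hahahaha <sentiment>too much</sentiment> <feature>doge</feature>."

# Element matching: is "<sentiment>" equal to one of the elements of x?
"<sentiment>" %in% x
# [1] FALSE
match("<sentiment>", x, nomatch = 0L) > 0L   # what %in% does internally
# [1] FALSE

# Substring matching: does "<sentiment>" occur inside the string?
grepl("<sentiment>", x, fixed = TRUE)
# [1] TRUE

# Where does it start? (character position of the first match)
as.integer(regexpr("<sentiment>", x, fixed = TRUE))
# [1] 10
```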
I would like to determine if a vector is either always increasing or always decreasing in R.
Ideally, if I had these three vectors:
asc=c(1,2,3,4,5)
des=c(5,4,3,2,1)
non=c(1,3,5,4,2)
I would hope that the first two would return TRUE, and the last would return FALSE.
I tried a few approaches. First, I tried:
> is.ordered(asc)
[1] FALSE
> is.ordered(des)
[1] FALSE
> is.ordered(non)
[1] FALSE
And I also tried:
> order(non)
[1] 1 5 2 4 3
And hoped that I could simply compare this vector with 1,2,3,4,5 and 5,4,3,2,1, but even that returns a string of logicals, rather than a single true or false:
> order(non)==c(1,2,3,4,5)
[1] TRUE FALSE FALSE TRUE FALSE
Maybe is.unsorted is the function you're looking for:
> is.unsorted(asc)
[1] FALSE
> is.unsorted(rev(des)) # here you need 'rev'
[1] FALSE
> is.unsorted(non)
[1] TRUE
From the Description of is.unsorted you can find:
Test if an object is not sorted (in increasing order), without the cost of sorting it.
Here's one way using ?is.unsorted:
is.sorted <- function(x, ...) {
!is.unsorted(x, ...) || !is.unsorted(rev(x), ...)
}
Have a look at the additional arguments to is.unsorted, which can be passed here as well.
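For instance, is.unsorted takes a strictly argument that decides whether ties count as sorted. A quick sketch of passing it through the wrapper (the function is repeated here so the snippet runs on its own; it assumes the input has no NA values):

```r
is.sorted <- function(x, ...) {
  !is.unsorted(x, ...) || !is.unsorted(rev(x), ...)
}

is.sorted(c(1, 2, 2, 3))                      # ties allowed by default
# [1] TRUE
is.sorted(c(1, 2, 2, 3), strictly = TRUE)     # ties now count as unsorted
# [1] FALSE
is.sorted(c(5, 4, 3, 2, 1), strictly = TRUE)  # strictly decreasing
# [1] TRUE
```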
Here is one way without is.unsorted() to check whether a vector is sorted. This function returns TRUE if the elements of the given vector are in ascending order and FALSE otherwise:
is.sorted <- function(x) {
  all(sort(x, decreasing = FALSE) == x)
}
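Since the question asks about both directions, another possibility is to look at the differences between neighbouring elements. This is only a sketch: is_monotonic is a made-up name, ties are treated as sorted, and the input is assumed to be numeric without NA values.

```r
# Hypothetical helper: TRUE if x never decreases or never increases
is_monotonic <- function(x) {
  d <- diff(x)
  all(d >= 0) || all(d <= 0)
}

asc <- c(1, 2, 3, 4, 5)
des <- c(5, 4, 3, 2, 1)
non <- c(1, 3, 5, 4, 2)

is_monotonic(asc)
# [1] TRUE
is_monotonic(des)
# [1] TRUE
is_monotonic(non)
# [1] FALSE
```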
Let's say we have a statement that produces integer(0), e.g.
a <- which(1:3 == 5)
What is the safest way of catching this?
That is R's way of printing a zero length vector (an integer one), so you could test for a being of length 0:
R> length(a)
[1] 0
It might be worth rethinking the strategy you are using to identify which elements you want, but without further specific details it is difficult to suggest an alternative strategy.
If it's specifically zero length integers, then you want something like
is.integer0 <- function(x)
{
is.integer(x) && length(x) == 0L
}
Check it with:
is.integer0(integer(0)) #TRUE
is.integer0(0L) #FALSE
is.integer0(numeric(0)) #FALSE
You can also use assertive for this.
library(assertive)
x <- integer(0)
assert_is_integer(x)
assert_is_empty(x)
x <- 0L
assert_is_integer(x)
assert_is_empty(x)
## Error: is_empty : x has length 1, not 0.
x <- numeric(0)
assert_is_integer(x)
assert_is_empty(x)
## Error: is_integer : x is not of class 'integer'; it has class 'numeric'.
Maybe off-topic, but R features two nice, fast and empty-aware functions for reducing logical vectors -- any and all:
if(any(x=='dolphin')) stop("Told you, no mammals!")
Inspired by Andrie's answer, you could use identical and sidestep any attribute problems: a is the empty vector of its class, so combining it with a single element of that class gives back just that element:
attr(a, "foo") <- "bar"
identical(1L, c(a, 1L))
#> [1] TRUE
Or more generally:
is.empty <- function(x, mode = NULL){
if (is.null(mode)) mode <- class(x)
identical(vector(mode, 1), c(x, vector(class(x), 1)))
}
b <- numeric(0)
is.empty(a)
#> [1] TRUE
is.empty(a,"numeric")
#> [1] FALSE
is.empty(b)
#> [1] TRUE
is.empty(b,"integer")
#> [1] FALSE
if ( length(a <- which(1:3 == 5) ) ) print(a) else print("nothing returned for 'a'")
#[1] "nothing returned for 'a'"
On second thought I think any is more beautiful than length(.):
if ( any(a <- which(1:3 == 5) ) ) print(a) else print("nothing returned for 'a'")
if ( any(a <- 1:3 == 5 ) ) print(a) else print("nothing returned for 'a'")
You can easily catch integer(0) with the identical function:
x = integer(0)
identical(x, integer(0))
[1] TRUE
foo = function(x){identical(x, integer(0))}
foo(x)
[1] TRUE
foo(0)
[1] FALSE
Another option is rlang::is_empty (useful if you're working in the tidyverse).
The rlang namespace does not seem to be attached when attaching the tidyverse via library(tidyverse). In that case you can use purrr::is_empty, which is simply imported from the rlang package.
By the way, rlang::is_empty uses Gavin's approach of testing for zero length.
rlang::is_empty(which(1:3 == 5))
#> [1] TRUE
isEmpty() is provided by the S4Vectors package from Bioconductor. It is not part of base R, so the package has to be installed and loaded first.
library(S4Vectors)
a <- which(1:3 == 5)
isEmpty(a)
# [1] TRUE