R: whether all the elements of X are present in Y

In R, how do you test for elements of one vector NOT present in another vector?
X <- c('a','b','c','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
I want to know whether all the elements of X are present in Y (a TRUE or FALSE answer).

You can use all and %in% to test if all values of X are also in Y:
all(X %in% Y)
#[1] TRUE

You want setdiff:
> setdiff(X, Y) # all elements present in X but not Y
character(0)
> length(setdiff(X, Y)) == 0
[1] TRUE

A warning about setdiff: if your input vectors have repeated elements, setdiff will ignore the duplicates. This may or may not be what you want.
I wrote a package, vecsets, and here's the difference in what you'll get. Note that I modified X to demonstrate the behavior.
library(vecsets)
X <- c('a','b','c','d','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
setdiff(X,Y)
character(0)
vsetdiff(X,Y)
[1] "d"
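For readers without vecsets installed, a duplicate-aware difference can be sketched in base R. The helper name multiset_diff is hypothetical, and this is not a description of how vsetdiff is implemented; it only tags each value with its occurrence number so that the second 'd' in X counts as distinct from the first.

```r
# Hypothetical helper (not part of vecsets): a difference that respects duplicates.
multiset_diff <- function(a, b) {
  # Number each occurrence of a value: first "d" -> 1, second "d" -> 2, ...
  occurrence <- function(v) ave(seq_along(v), v, FUN = seq_along)
  key_a <- paste(a, occurrence(a), sep = "\t")
  key_b <- paste(b, occurrence(b), sep = "\t")
  # Keep the elements of a whose (value, occurrence) pair has no partner in b
  a[!(key_a %in% key_b)]
}

X <- c('a','b','c','d','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
multiset_diff(X, Y)  # "d": X has two d's, Y only one
```

The sketch assumes the values never contain a tab character, since a tab is used to join each value to its occurrence number.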

Related

R function to check if a vector is a subset of another [duplicate]

I need to check if values_saved_dice is a subset of x, for example
values_saved_dice <- c(2,2,2)
x <- c(6,3,2,2,5)
I tried the following, expecting it to return FALSE:
all(is.element(values_saved_dice, x))
But it returns TRUE, when it should be FALSE because the number 2 appears three times in values_saved_dice and only twice in x.
Would appreciate any help, thanks!
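We may paste each vector into a string and use grepl to check whether the substring is found. Note that this tests for a contiguous run of values, and it can give false positives when the values have more than one digit: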
grepl(paste(values_saved_dice, collapse = ""), paste(x, collapse = ""))
[1] FALSE
You could create a function:
cont = function(x, y) {
z = x[x %in% setdiff(x, y)]
length(z) == length(x) - length(y)
}
Output:
> values_saved_dice <- c(2,2,2) # triple 2
> x <- c(6,3,2,2,5)
> cont(x, values_saved_dice)
[1] FALSE
> values_saved_dice <- c(2,2) # double 2
> x <- c(6,3,2,2,5)
> cont(x, values_saved_dice)
[1] TRUE
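Another way to treat the question is as a comparison of counts: each value must occur in x at least as often as it occurs in values_saved_dice. The following is a minimal sketch under that reading; the helper name is_multisubset is hypothetical, and it matches values by their character representation, so it is best suited to integers or strings.

```r
# Hypothetical helper: TRUE when every value in `sub` occurs in `full`
# at least as many times as it occurs in `sub`.
is_multisubset <- function(sub, full) {
  need <- table(sub)
  # Count only the values we need; values absent from `full` get a count of 0
  have <- table(factor(full, levels = names(need)))
  all(as.vector(have) >= as.vector(need))
}

x <- c(6,3,2,2,5)
is_multisubset(c(2,2,2), x)  # FALSE: three 2s requested, only two available
is_multisubset(c(2,2), x)    # TRUE
is_multisubset(c(1,1,2), c(1,2,2,3))  # FALSE: only one 1 available
```

Unlike a string search, this does not depend on the order of the elements or on how many digits each value has.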

Create indicator variables within a list

I have a list containing sequences of numbers. I want to create a list that indicates all non-zero elements up to the first element that matches a defined limit. I also want to create a list that indicates all non-zero elements after the first element to match the defined limit.
I prefer a base R solution. Presumably the solution will use lapply, but I have not been able to come up with a simple solution.
Below is a minimally reproducible example in which the limit is 2:
my.limit <- 2
my.samples <- list(0,c(1,2),0,c(0,1,1),0,0,0,0,0,c(1,1,2,2,3,4),c(0,1,2),0,c(0,0,1,1,2,2,3))
Here are the two desired lists:
within.limit <- list(0,c(1,1),0,c(0,1,1),0,0,0,0,0,c(1,1,1,0,0,0),c(0,1,1),0,c(0,0,1,1,1,0,0))
outside.limit <- list(0,c(0,0),0,c(0,0,0),0,0,0,0,0,c(0,0,0,1,1,1),c(0,0,0),0,c(0,0,0,0,0,1,1))
We can use match with the nomatch argument set to a very large number. It must be greater than the length of any vector in the list; Inf does not work here because nomatch is coerced to integer.
within.limit1 <- lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= match(my.limit, x, nomatch = 1000)))
outside.limit1 <- lapply(my.samples, function(x)
+(seq_along(x) > match(my.limit, x, nomatch = 1000)))
Checking that the output matches the desired lists:
all(mapply(function(x, y) all(x == y), within.limit, within.limit1))
#[1] TRUE
all(mapply(function(x, y) all(x == y), outside.limit, outside.limit1))
#[1] TRUE
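If the sentinel value for nomatch feels fragile, the cut-off position can be computed explicitly instead. This is a sketch only: the helper name split_at_limit is hypothetical, and it assumes that a vector without the limit should be treated as lying entirely within it, which is what the match approach above does too.

```r
my.limit <- 2
my.samples <- list(0, c(1,2), 0, c(0,1,1), 0, 0, 0, 0, 0,
                   c(1,1,2,2,3,4), c(0,1,2), 0, c(0,0,1,1,2,2,3))

# Hypothetical helper: indicator vectors for one sample
split_at_limit <- function(x, limit) {
  hit <- which(x == limit)
  # Position of the first element equal to the limit, or the end if there is none
  cut <- if (length(hit) > 0) hit[1] else length(x)
  idx <- seq_along(x)
  list(within  = +(x != 0 & idx <= cut),
       outside = +(x != 0 & idx > cut))
}

parts <- lapply(my.samples, split_at_limit, limit = my.limit)
within.limit2  <- lapply(parts, `[[`, "within")
outside.limit2 <- lapply(parts, `[[`, "outside")
```

Both indicators come from one pass over each vector, so the two result lists cannot drift apart.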
I would do
within.limit <- lapply(my.samples, function(x)
+(x!=0 & (x<my.limit | cumsum(x == my.limit)==1)))
outside.limit <- lapply(my.samples, function(x)
+(x!=0 & (x>my.limit | cumsum(x == my.limit)>1)))
foo <- function(samples, limit, within = TRUE) {
# pick the comparison: positions up to the first match, or positions after it
`%cp%` <- if (within) `<=` else `>`
lapply(samples, function(x) pmin(x, seq_along(x) %cp% match(limit, x, nomatch = 1e8)))
}
> all.equal(foo(my.samples, my.limit, FALSE), outside.limit)
# [1] TRUE
> all.equal(foo(my.samples, my.limit, TRUE), within.limit)
# [1] TRUE
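We can use findInterval, provided each vector is sorted in increasing order. With left.open = TRUE it counts the elements strictly below my.limit, so adding 1 gives the position of the first element that reaches the limit:
lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= findInterval(my.limit, x, left.open = TRUE) + 1))
and
lapply(my.samples, function(x) +(seq_along(x) > findInterval(my.limit, x, left.open = TRUE) + 1))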

Issue matching string in list

I am trying to see if a list contains a particular string but I am having an issue.
> k
[1] "Investment"
> t
[[1]]
[1] "Investment" "Non-Investment"
> class(k)
[1] "character"
> class(t)
[1] "list"
> k %in% t
[1] FALSE
Shouldn't the above code return TRUE rather than FALSE?
You need to unlist the list:
X <- "investment"
Y <- list(c("non-investment", "investment"))
X %in% unlist(Y)
Note I've changed it to X and Y: t is a base function so it's best not to overwrite it because it might cause conflicts!
One thing to consider is a list with multiple vectors: decide whether you want to search across all of the vectors or within a specific one. Use unlist to check all vectors at once, and double square brackets to check a specific vector. To illustrate, Y below contains two vectors and the string X is in the second one. X %in% unlist(Y) returns TRUE, while X %in% Y[[1]] returns FALSE because %in% only checks the first vector:
X <- "alpha"
Y <- list(c("non-investment", "investment"), c("alpha", "beta"))
X %in% unlist(Y)
X %in% Y[[1]]
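If you also want to know which element of the list holds the string, a per-element test is one option. This is a small sketch; the variable name found_in is only illustrative.

```r
X <- "alpha"
Y <- list(c("non-investment", "investment"), c("alpha", "beta"))

# One logical per list element: does that vector contain X?
found_in <- vapply(Y, function(v) X %in% v, logical(1))
found_in         # FALSE TRUE
which(found_in)  # 2: X is in the second vector
any(found_in)    # same answer as X %in% unlist(Y)
```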
Note that if you had specified Y as just a vector, which is essentially what it is in your example because there are no other sublists, then you could just use:
X <- "investment"
Y <- c("non-investment", "investment")
X %in% Y
The problem is that t is a length-one list containing a character vector, so %in% compares k against the whole vector rather than its elements. Try k %in% t[[1]]. You may want to use unlist().

R: Filter vectors by 'two-way' partial match

With two vectors
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
is there a way to filter both by 'two-way' partial matching (remove elements in one vector if they contain or are contained in any element in the other vector) so that the results are these two vectors:
x1 <- c()
y1 <- c("nomatch")
To explain - every element of x is either a substring or a superstring of one of the elements of y, hence x1 is empty. Update - it is not sufficient for a substring to match the initial chars - a substring might be found anywhere in the string it matches. Example above has been updated to reflect this.
I originally thought ?pmatch might be handy, but your edit clarifies you don't just want to match the start of items. Here's a function that should work:
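remover <- function(x,y) {
# for each element of x, the indices of y that contain it (and vice versa)
pmx <- lapply(x, grep, x=y)
pmy <- lapply(y, grep, x=x)
# an element is a hit if it matched something or something matched it
hitx <- lengths(pmx) > 0 | seq_along(x) %in% unlist(pmy)
hity <- lengths(pmy) > 0 | seq_along(y) %in% unlist(pmx)
list(
x[!hitx],
y[!hity]
)
}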
remover(x,y)
#[[1]]
#character(0)
#
#[[2]]
#[1] "nomatch"
It correctly does nothing when no match is found (thanks @Frank for picking up the earlier error):
remover("yo","nomatch")
#[[1]]
#[1] "yo"
#
#[[2]]
#[1] "nomatch"
We can do the following:
# Return data.frame of matches of a in b
m <- function(a, b) {
data.frame(sapply(a, function(w) grepl(w, b), simplify = F));
}
# Match x and y and remove
x0 <- x[!apply(m(x, y), 2, any)]
y0 <- y[!apply(m(x, y), 1, any)]
# Match y and x and remove
x1 <- x0[!apply(m(y0, x0), 1, any)]
y1 <- y0[!apply(m(y0, x0), 2, any)]
x1;
#character(0)
y1;
#[1] "nomatch"
I build a matrix of all possible matches in both directions, combine the two with | (a match in either direction is equally a match), and then use it to subset x and y:
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
bool_mat <- sapply(x,function(z) grepl(z,y)) | t(sapply(y,function(z) grepl(z,x)))
x1 <- x[!apply(bool_mat,2,any)] # character(0)
y1 <- y[!apply(bool_mat,1,any)] # [1] "nomatch"
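All of the answers above pass the strings to grep or grepl as regular expressions, so an element such as "a.c" would match "abc". If the strings should be compared literally, one possible variation is the sketch below. The function name two_way_filter is hypothetical, and it assumes both vectors are non-empty.

```r
# Hypothetical helper: drop elements that contain, or are contained in,
# any element of the other vector (literal matching, no regex).
two_way_filter <- function(x, y) {
  # hit[i, j] is TRUE when x[i] contains y[j] or y[j] contains x[i]
  hit <- outer(seq_along(x), seq_along(y), Vectorize(function(i, j) {
    grepl(y[j], x[i], fixed = TRUE) || grepl(x[i], y[j], fixed = TRUE)
  }))
  list(x = x[rowSums(hit) == 0], y = y[colSums(hit) == 0])
}

x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
two_way_filter(x, y)
```

Because rows of the matrix always correspond to x and columns to y, the indices of the two vectors are never mixed up.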

Identifying source of FALSE

My question is, does there exist a function that, given a logical statement, identifies the source of FALSE (if it is false)?
For example,
x=1; y=1; z=1;
x==1 & y==1 & z==2
Obviously it is the value of z that makes the statement false. In general though, is there a function that lets me identify the variable(s) in a logical statement whose value makes the statement false?
Instead of writing x==1 & y==1 & z==2 you could define
cn <- c(x == 1, y == 1, z == 2)
or
cn <- c(x, y, z) == c(1, 1, 2)
and use all(cn). Then
which(!cn)
# [1] 3
gives the source(s) of FALSE.
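To get variable names back rather than positions, the conditions can be given names. This is a small sketch using the values from the question; the names are chosen by hand and are not derived automatically.

```r
x <- 1; y <- 1; z <- 1

# Name each condition after the variable it tests
checks <- c(x = x == 1, y = y == 1, z = z == 2)

all(checks)             # FALSE
names(checks)[!checks]  # "z"
```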
In general, no, there is no such function that you are looking for, but for different logical statements a similar approach should work, although it might be too lengthy to pursue.
Considering (!(x %in% c(1,2,3)) & y==3) | z %in% c(4,5), we get FALSE if z %in% c(4,5) is FALSE and (!(x %in% c(1,2,3)) & y==3) is FALSE simultaneously. So, if (!(x %in% c(1,2,3)) & y==3) | z %in% c(4,5) returns FALSE, we are sure about z and still need to check x and y, so that the list of problematic variables can be obtained as follows:
if(!((!(x %in% c(1,2,3)) & y==3) | z %in% c(4,5)))
c("x", "y", "z")[c(x %in% c(1,2,3), !y == 3, TRUE)]
# [1] "x" "y" "z"
or
a <- !(x %in% c(1,2,3))
b <- y == 3
c <- z %in% c(4,5)
if(!((a & b) | c))
c("x", "y", "z")[c(!a, !b, TRUE)]
# [1] "x" "y" "z"
I like @julius's answer but there is also the stopifnot function.
x <- 1; y <- 1; z <- 2
stopifnot(x == 1, y == 1, z == 1)
#Error: z == 1 is not TRUE
Note that the result is an error if there are any false statements and nothing if they're all true. It also stops at the first false statement, so if you had something like
x <- T; y <- F; z <- F
stopifnot(x, y, z)
#Error: y is not TRUE
you would not be told that z is FALSE in this case.
So the result isn't a logical or an index but instead is either nothing or an error. This doesn't seem desirable but it is useful if the reason you're using it is for checking inputs to a function or something similar where you want to produce an error on invalid inputs and just keep on moving if everything is fine. I mention stopifnot because it seems like this might be the situation you're in. I'm not sure.
Here is a silly example where you might use it. In this case you apparently only want positive numbers as input and reject everything else:
doublePositiveNumber <- function(x){
stopifnot(is.numeric(x), x >= 0)
return(2*x)
}
which results in
> doublePositiveNumber("hey")
Error: is.numeric(x) is not TRUE
> doublePositiveNumber(-2)
Error: x >= 0 is not TRUE
> doublePositiveNumber(2)
[1] 4
So here you guarantee you get the inputs you want and produce an error message for the user that hopefully tells them what the issue is.
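If you would rather see every failing condition than stop at the first one, something along these lines could work. It is only a sketch: failed_checks is a hypothetical helper, not a base R function, and it treats NA or non-logical results as failures.

```r
# Hypothetical helper: return the text of every condition that is not TRUE
failed_checks <- function(...) {
  # Capture the unevaluated expressions so they can be reported as text
  exprs <- vapply(as.list(substitute(list(...)))[-1],
                  function(e) paste(deparse(e), collapse = " "),
                  character(1))
  # Evaluate each condition; anything other than all-TRUE counts as a failure
  ok <- vapply(list(...), function(v) isTRUE(all(v)), logical(1))
  unname(exprs[!ok])
}

x <- TRUE; y <- FALSE; z <- FALSE
failed_checks(x, y, z)  # "y" "z"
```

Unlike stopifnot, this returns a character vector and raises no error, so the caller decides what to do with the failures.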
