I have two vectors, and I want a logical vector that tells me whether each element of z is equal to any element of x.
z <- rep(c("AA","AB","AC","AD","AE"), 40)
x <- c("AA","AD","BB")
z == x
I used z == x, but the TRUE/FALSE values are not correct.
R also shows the warning "longer object length is not a multiple of shorter object length".
You are looking for %in% (see ?"%in%" for details):
z %in% x
head(z %in% x)
# [1] TRUE FALSE FALSE TRUE FALSE TRUE
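To see why == gives the wrong answer, here is a small sketch using the question's own vectors. == compares position by position and recycles the shorter vector, whereas %in% checks each element against the whole of x. The identity with match() is the definition given in R's ?match help page.

```r
z <- rep(c("AA", "AB", "AC", "AD", "AE"), 40)
x <- c("AA", "AD", "BB")

# `==` compares position by position, recycling x:
# z[1] vs x[1], z[2] vs x[2], z[3] vs x[3], z[4] vs x[1], ...
# (the call still triggers the recycling warning, so it is suppressed here)
pairwise <- suppressWarnings(z == x)
head(pairwise)
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE

# `%in%` asks, for each element of z, whether it occurs anywhere in x;
# it is equivalent to match(z, x, nomatch = 0) > 0
membership <- z %in% x
identical(membership, match(z, x, nomatch = 0) > 0)
# [1] TRUE
sum(membership)   # 40 "AA" + 40 "AD"
# [1] 80
```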
How do I get numbers rather than TRUE/FALSE from a cumulative sum comparison?
This is my code:
x <- runif(50)
y <- 10
cumsum(x) <= y
It only returns logical (boolean) values:
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
If I try to fix this by wrapping it in as.numeric() or as.integer(), it returns:
[1] 1 1 1 1 0 0 0 0 0 0
The problem with your code is that you are testing whether cumsum(x) is <= y, and a comparison like this always returns a logical vector.
If you want the elements of x whose running total stays within y, you can do the following:
x <- runif(50)
y <- 10
idx <- which(cumsum(x) <= y)
x[idx]
idx stores the indices at which the condition cumsum(x) <= y holds, and x[idx] returns the values of x at those indices.
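Because runif() output changes on every run, here is the same idea with a small, hypothetical fixed vector so the result can be checked by hand:

```r
# Small, deterministic data so the result is easy to verify
x <- c(3, 2, 4, 1)
y <- 6

cumsum(x)
# [1]  3  5  9 10

# Positions where the running total is still <= y
idx <- which(cumsum(x) <= y)
idx
# [1] 1 2

# The original values at those positions
x[idx]
# [1] 3 2
```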
The output will look like this:
[1] 0.95362053 0.09553108 0.82016689 0.69883212 0.35495891 0.16966706
[7] 0.15281253 0.16532772 0.81920233 0.82249972 0.30146590 0.69536763
[13] 0.32764723 0.40601504 0.70189321 0.16597773 0.25304473 0.41667253
[19] 0.52499118 0.27216339 0.27347937
You are not returning the values of cumsum(x) but the values of the logical statement "cumsum(x) is less than or equal to y". The results of this statement are booleans, not numbers.
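If the goal is the running totals themselves (an assumption about what the asker wants), store cumsum(x) once and use the logical vector to subset it. The data below are hypothetical fixed values instead of runif():

```r
x <- c(3, 2, 4, 1)
y <- 6

cs <- cumsum(x)   # the running totals
cs[cs <= y]       # keep only the totals that do not exceed y
# [1] 3 5
```

Since runif() values are non-negative, the running total never decreases, so the kept totals are always a leading run of cs.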
Suppose I want to filter vector x based on which values appear approximately in vector y:
x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
If I didn't want approximate comparison, I'd use %in%:
x %in% y
[1] FALSE FALSE FALSE
whereas the result I need is:
# something like: x %near_in% y
[1] TRUE TRUE FALSE
The help file for dplyr::near(x, y, tol) describes its arguments as "x, y: Numeric vectors to compare", but this is not entirely true: y has to be either the same length as x or a single value, because all near() does is an element-wise abs() comparison:
near <- function (x, y, tol = .Machine$double.eps^0.5)
{
abs(x - y) < tol
}
If we try this, the subtraction x - y recycles y's values to the length of x (with a warning), and we get:
abs(x - y)
[1] 0.000006789 0.000006789 2.000006789
Warning message:
In x - y : longer object length is not a multiple of shorter object length
My current solution is to use sapply() on y's elements to create an n x m matrix (3 x 2 here), then use apply() to check whether any() element of each row (one row per value of x) is TRUE:
apply(sapply(y, function(y_val) near(x, y_val, 0.0001)), 1, any)
[1] TRUE TRUE FALSE
But this seems cumbersome! What if I had thousands of values in y? Wouldn't I be creating a temporary matrix with thousands of columns? Is there a better way?
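One way to keep the asker's logic without building the n x m matrix is to loop over x and compare each element against all of y at once. This is a hypothetical sketch: the operator name %near_in% comes from the question, and the default tolerance of 1e-4 is borrowed from the asker's own near(x, y_val, 0.0001) call.

```r
# Hypothetical operator: TRUE where an element of x lies within tol of any element of y
`%near_in%` <- function(x, y, tol = 1e-4) {
  # One length(y) comparison per element of x; no n x m matrix is kept around
  vapply(x, function(xi) any(abs(xi - y) < tol), logical(1))
}

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
x %near_in% y
# [1]  TRUE  TRUE FALSE
```

This saves memory, but the total work is still proportional to length(x) * length(y).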
You could floor or round the values:
tol <- 1e-5
floor(x/tol)
#> [1] 112345 212345 312345
floor(y/tol)
#> [1] 112345 212345
floor(x/tol) %in% floor(y/tol)
#> [1] TRUE TRUE FALSE
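One caveat with binning: two values can be closer than tol yet fall on opposite sides of a bin edge, so floor() (or round()) puts them in different bins. The values below are made up, and tol = 0.25 is chosen only so the division is exact in floating point:

```r
tol <- 0.25
a <- 1.99
b <- 2.01

abs(a - b) < tol                     # the two values are within tol of each other
# [1] TRUE
floor(a / tol) %in% floor(b / tol)   # but they land in bins 7 and 8
# [1] FALSE
```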
I am not quite sure, but have a look at the DescTools package, where Closest() finds the closest values. Note that the result below has one entry per element of y, not of x:
library(DescTools)
y %in% sapply(x, function(i) Closest(y, i))
[1] TRUE TRUE
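In the same nearest-value spirit, base R's findInterval() can locate each element of x between its neighbours in a sorted copy of y, so only the two nearest y values need checking. This is a sketch under stated assumptions: near_in() is a hypothetical name, x is assumed to contain no NA values, and the default tolerance of 1e-4 is taken from the question.

```r
# Hypothetical helper: TRUE where x[i] is within tol of its nearest value in y
near_in <- function(x, y, tol = 1e-4) {
  ys <- sort(y)                 # findInterval() needs a sorted vector; sort() drops NAs
  i  <- findInterval(x, ys)     # ys[i] is the largest value <= x (i = 0 if there is none)

  # Distance to the nearest y at or below x (Inf when there is none)
  left <- rep(Inf, length(x))
  has_left <- i > 0
  left[has_left] <- x[has_left] - ys[i[has_left]]

  # Distance to the nearest y above x (Inf when there is none)
  right <- rep(Inf, length(x))
  has_right <- i < length(ys)
  right[has_right] <- ys[i[has_right] + 1] - x[has_right]

  pmin(left, right) < tol
}

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
near_in(x, y)
# [1]  TRUE  TRUE FALSE
```

Sorting y once and then doing a binary search per element of x should scale roughly as O((n + m) log m) rather than O(n * m).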
I want to replace a few elements of one vector with the whole of a second vector. The condition is that the replaced elements of the first vector are equal to a third vector. Here is an example:
a <- 1:10
b <- 5:7
v <- rnorm(2, mean = 1, sd = 5)
My output should be:
c(a[1:4], v, a[8:10])
I have already tried:
replace(a, a == b, v)
a[a == b] <- v
but with little success. Can anyone help?
The == operator is best used to compare vectors of the same length, or when one of the vectors has length 1.
Try this, and notice that in neither case do you get the positional match that you want.
> a == b
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In a == b : longer object length is not a multiple of shorter object length
> b == a
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In b == a : longer object length is not a multiple of shorter object length
Instead, use match(): it returns the index positions in a at which the values of b are found.
> match(b, a)
[1] 5 6 7
Then:
a <- 1:10
b <- 5:7
v <- rnorm(3, mean=1, sd=5)
a[match(b, a)] <- v
The results:
a
[1] 1.0000000 2.0000000 3.0000000 4.0000000 -4.6843669 0.9014578 -0.7601413 8.0000000
[9] 9.0000000 10.0000000
Here's another option:
a[a %in% b] <- v
Since in the example described in the OP there are three common numbers in the vectors a and b, while v <- rnorm(2, mean = 1, sd = 5) contains only two numbers, v will be recycled and a warning will be issued.
The warning and recycling can be prevented, e.g., by defining v as:
v <- rnorm(sum(a %in% b), mean = 1, sd = 5)
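The two answers are not interchangeable when b is not in the same order as the matching elements of a. match() puts v[k] at the position of b[k], while %in% fills the matching positions from left to right. A sketch with hypothetical fixed values in place of rnorm():

```r
a <- 1:10
b <- c(7, 5, 6)        # same values as 5:7, but in a different order
v <- c(-1, -2, -3)     # fixed values so the result is reproducible

a1 <- a
a1[match(b, a)] <- v   # v[k] goes to the position of b[k] in a
a1
#  [1]  1  2  3  4 -2 -3 -1  8  9 10

a2 <- a
a2[a %in% b] <- v      # v fills positions 5, 6, 7 in order
a2
#  [1]  1  2  3  4 -1 -2 -3  8  9 10
```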
Suppose I have two vectors, A and B. I want a logical vector with the same length as A that tells me, for each element of A, whether it is in B. Which function does this?
I think you're looking for %in%:
A <- c(0,2,4,6)
B <- c(8,7,6,5,4)
A %in% B
[1] FALSE FALSE TRUE TRUE
A <- c(0,2,4,6)
B <- c(8,7,6,5,4)
x <- is.element(A, B)
x
## [1] FALSE FALSE TRUE TRUE
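Both answers give the same result: R documents is.element(x, y) as identical to x %in% y. The sketch below also shows that the test is not symmetric and how to use the result to subset A:

```r
A <- c(0, 2, 4, 6)
B <- c(8, 7, 6, 5, 4)

identical(is.element(A, B), A %in% B)
# [1] TRUE

# The result always has the length of the first argument
B %in% A
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Elements of A that are in B, and those that are not
A[A %in% B]
# [1] 4 6
A[!A %in% B]
# [1] 0 2
```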
I have two vectors. I want to check whether the first element of the first vector is between the first and second elements of the second vector, then whether the second element of the first vector is between the third and fourth elements of the second vector, and so on. How can I do this in R?
For example, if we have two vectors
a = c(1.5, 2, 3.5)
b = c(1, 2, 3, 5, 3, 8)
the final result should be TRUE for 1.5, FALSE for 2, and TRUE for 3.5.
x <- c(1.5,3.5,3.5,3.5,4)
y <- 1:5
x > y & x < c(y[-1],NA)
#[1] TRUE FALSE TRUE FALSE FALSE
You need to take care of vector lengths and decide what you want the result to be for the last element of x.
A more robust solution:
x <- c(1.5,3.5,3.5,3.5,4)
findInterval(x,y) == seq_along(x)
#[1] TRUE FALSE TRUE FALSE FALSE
x1 <- c(1.5,3.5)
findInterval(x1,y) == seq_along(x1)
#[1] TRUE FALSE
x2 <- c(1.5,3.5,1:5+0.5)
findInterval(x2,y) == seq_along(x2)
#[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE
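The two approaches above also differ at the boundaries. The strict comparison treats a value equal to y[i] as outside, whereas findInterval() uses intervals closed on the left, [y[i], y[i+1]), and treats the last interval [y[n], Inf) as open-ended. A small illustration with hypothetical values:

```r
y  <- 1:5
x3 <- c(1, 2.5, 3.5, 4.5, 5)   # values sitting on or between the breaks

# Strict comparison: x3[1] equals its lower bound, and the last element
# has no upper bound (NA), so both come out FALSE
x3 > y & x3 < c(y[-1], NA)
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# findInterval(): left-closed intervals, and the last one has no upper limit
findInterval(x3, y) == seq_along(x3)
# [1] TRUE TRUE TRUE TRUE TRUE
```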
Here's one way.
s <- seq_along(a)
b[s] < a[s] & a[s] < b[s+1]
# [1] TRUE FALSE TRUE
This may not be the ideal or the fastest solution, but it works.
a <- rnorm(99)
b <- rnorm(100)
m <- cbind(b[-length(b)], b[-1])
a > m[,1] & a < m[,2]
You should check the lengths of both input vectors: this relies on b having exactly one more element than a.
Here is a one-line solution:
sapply(seq_along(a), function(i) a[i] > b[i] & a[i] < b[i + 1])
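The answers above compare a[i] with the overlapping neighbours b[i] and b[i+1]. Read literally, the question pairs b without overlap: (b[1], b[2]), then (b[3], b[4]), and so on. A sketch of that literal reading, assuming length(b) is exactly twice length(a):

```r
a <- c(1.5, 2, 3.5)
b <- c(1, 2, 3, 5, 3, 8)

# Assumption: b holds one (lower, upper) pair per element of a
stopifnot(length(b) == 2 * length(a))

lower <- b[c(TRUE, FALSE)]   # odd positions: 1, 3, 3
upper <- b[c(FALSE, TRUE)]   # even positions: 2, 5, 8

a > lower & a < upper
# [1]  TRUE FALSE  TRUE
```

On this example data both readings happen to give TRUE FALSE TRUE, but they differ in general.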