R get index satisfying the condition [duplicate] - r

I am looking for an expression that returns the index of the first element of a vector satisfying a condition.
For example:
I have a vector b = c(0.1, 0.2, 0.7, 0.9)
I want to know the first index of b for which, say, b > 0.65. In this case the answer should be 3.
I tried which.min(subset(b, b > 0.65))
But this gives me 1 instead of 3.
Please help

Use which and take the first element of the result:
which(b > 0.65)[1]
#[1] 3

Be careful: which.max gives a misleading result if the condition is never met. It returns 1 instead of NA:
> a <- c(1, 2, 3, 2, 5)
> a >= 6
[1] FALSE FALSE FALSE FALSE FALSE
> which(a >= 6)[1]
[1] NA # desirable
> which.max(a >= 6)
[1] 1 # not desirable
Why? When all elements are equal, which.max returns 1:
> b <- c(2, 2, 2, 2, 2)
> which.max(b)
[1] 1
Note: FALSE < TRUE
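A minimal sketch that wraps the advice above in a helper. The name first_index is hypothetical, not a base R function; match(TRUE, cond) is shown as an equivalent base R alternative.

```r
# Hypothetical helper: index of the first TRUE in a logical vector, NA if none.
first_index <- function(cond) which(cond)[1]

b <- c(0.1, 0.2, 0.7, 0.9)
first_index(b > 0.65)   # 3
first_index(b > 1)      # NA, because which() returns integer(0)

# match() looks up the first position of TRUE and also gives NA when absent.
match(TRUE, b > 0.65)   # 3
match(TRUE, b > 1)      # NA
```

Both forms skip NA entries in the condition, since neither which() nor match(TRUE, ...) treats NA as TRUE.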

You may use which.max:
which.max(b > 0.65)
# [1] 3
From ?which.max: "For a logical vector x, [...] which.max(x) return[s] the index of the first [...] TRUE [...]"
b > 0.65
# [1] FALSE FALSE TRUE TRUE
You should also have a look at the result of your code subset(b, b > 0.65) to see why it can't give you the desired result.
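To follow that suggestion, the sketch below inspects the intermediate result. subset() drops the non-matching elements, so positions in its result no longer correspond to positions in b.

```r
b <- c(0.1, 0.2, 0.7, 0.9)

kept <- subset(b, b > 0.65)
kept              # 0.7 0.9: only two elements remain
which.min(kept)   # 1: position within kept, not within b

# Working on the full-length logical vector keeps the original positions.
which(b > 0.65)   # 3 4
```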

Related

Wrong answer of R using == operator [duplicate]

df <- structure(list(x = 1:10, time = c(0.5, 0.5, 1, 2, 3, 0.5, 0.5,
1, 2, 3)), .Names = c("x", "time"), row.names = c(NA, -10L), class = "data.frame")
df[df$time %in% c(0.5, 3), ]
## x time
## 1 1 0.5
## 2 2 0.5
## 5 5 3.0
## 6 6 0.5
## 7 7 0.5
## 10 10 3.0
df[df$time == c(0.5, 3), ]
## x time
## 1 1 0.5
## 7 7 0.5
## 10 10 3.0
What is the difference between %in% and == here?
The problem is vector recycling.
Your first line does exactly what you'd expect. It checks which elements of df$time are in c(0.5, 3) and returns the rows for which that is true.
Your second line is trickier. It's actually equivalent to
df[df$time == rep(c(0.5,3), length.out=nrow(df)),]
To see this, let's see what happens if we use the vector rep(0.5, 10):
rep(0.5, 10) == c(0.5, 3)
[1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
See how it is TRUE at every odd position. Essentially it's comparing each 0.5 to the recycled vector c(0.5, 3, 0.5, 3, 0.5, ...)
You can manipulate a vector to produce no matches this way. Take the vector: rep(c(3, 0.5), 5):
rep(c(3, 0.5), 5) == c(0.5, 3)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
They're all FALSE; you are matching every 0.5 with 3 and vice versa.
In
df$time == c(0.5,3)
the c(0.5,3) first gets recycled to the length of df$time, i.e. c(0.5,3,0.5,3,0.5,3,0.5,3,0.5,3). Then the two vectors are compared element-by-element.
On the other hand,
df$time %in% c(0.5,3)
checks whether each element of df$time belongs to the set {0.5, 3}.
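The equivalence described above can be checked directly. This is a small sketch using the time column from the question as a plain vector (named tm here for illustration); rep_len() is base R.

```r
tm <- c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 1, 2, 3)

# What == effectively compares against after recycling c(0.5, 3).
recycled <- rep_len(c(0.5, 3), length(tm))

identical(tm == c(0.5, 3), tm == recycled)   # TRUE
which(tm == c(0.5, 3))                       # 1 7 10
which(tm %in% c(0.5, 3))                     # 1 2 5 6 7 10
```

The positions printed by the last two lines match the row names in the two data frame subsets shown in the question.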
This is an old thread, but I haven't seen this answer anywhere and it might be relevant for some people.
Another difference between the two is handling of NAs (missing values).
NA == NA
[1] NA
NA %in% c(NA)
[1] TRUE
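A short sketch of how this NA difference shows up when subsetting. The vector x is made up for illustration.

```r
x <- c(0.5, NA, 3)

x == 0.5          # TRUE NA FALSE
x %in% 0.5        # TRUE FALSE FALSE

# An NA in a logical index yields an NA element in the result.
x[x == 0.5]       # 0.5 NA
x[x %in% 0.5]     # 0.5
```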

Check if a number is between two others

I am looking for a function that verifies if a number is between two other numbers. I also need to control if I want a strict comparison (a
I know the function between() in dplyr, but it requires knowing in advance which number is the lower bound and which is the upper.
MyNumber = 8
First = 2
Second = 10
# This will return TRUE
between(MyNumber, lower = First, upper = Second)
# But this will return FALSE
between(MyNumber, lower = Second, upper = First)
# This will return TRUE. I want it to return FALSE
First = 8
between(MyNumber, lower = First, upper = Second)
I need a function that returns TRUE regardless of the order of the bounds.
Something like:
between2 <- function(number,bounds) { number > min(bounds) & number < max(bounds)}
between2(8, c(2,10))
[1] TRUE
between2(8, c(10,2))
[1] TRUE
This function also deals with your added condition
between2(8,c(8,10))
[1] FALSE
You could do it with simple arithmetic:
between <- function(number, first, second) { (first - number) * (second - number) < 0 }
Here are some example outputs:
> between(8, 2, 10)
[1] TRUE
> between(8, 10, 2)
[1] TRUE
> between(8, 10, 12)
[1] FALSE
> between(8, 1, 2)
[1] FALSE
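A brief sketch of why the product test works and how it behaves at the bounds. The name between_prod and the inclusive check at the end are illustrative additions, not part of the answer above.

```r
# (first - number) and (second - number) have opposite signs exactly when
# number lies strictly between the two bounds, so their product is negative.
between_prod <- function(number, first, second) {
  (first - number) * (second - number) < 0
}

between_prod(8, 2, 10)   # TRUE:  (-6) * 2 = -12
between_prod(8, 10, 2)   # TRUE:  2 * (-6) = -12
between_prod(8, 8, 10)   # FALSE: 0 * 2 = 0, so a bound is excluded

# Using <= 0 instead would include the bounds.
(8 - 8) * (10 - 8) <= 0  # TRUE
```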
You could use %in% with the : operator, once you know first and last:
first <- 2
last <- 10
number <- 8
number %in% first:last
[1] TRUE
first <- 10
last <- 2
number <- 8
number %in% first:last
[1] TRUE
first <- 10
last <- 12
number <- 8
number %in% first:last
[1] FALSE
first <- 12
last <- 10
number <- 8
number %in% first:last
[1] FALSE
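As a function, where strict controls whether the comparison is strict:
my_between <- function(n, f, l, strict = FALSE) {
  lo <- min(f, l) # order the bounds so that f > l also works
  hi <- max(f, l)
  if (!strict) {
    n %in% lo:hi # if strict == FALSE (default)
  } else {
    # if strict == TRUE: drop both bounds; nothing is left when hi - lo < 2
    (hi - lo >= 2) & n %in% (lo + 1):(hi - 1)
  }
}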
my_between(8, 2, 10)
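One caveat worth checking before relying on this approach: first:last produces a sequence in steps of 1, so the membership test only succeeds for values that land exactly on that sequence. A quick sketch:

```r
8 %in% 2:10      # TRUE
8.5 %in% 2:10    # FALSE, although 8.5 lies between 2 and 10

# With a non-integer bound the sequence starts at that bound.
2.5:10           # 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
8 %in% 2.5:10    # FALSE
```

For non-integer inputs, a comparison-based test such as the min()/max() approach is the safer choice.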
What's wrong with
f_between <- function (num, L, R) num>=min(L,R) & num<=max(L,R)
f_between(8, 2, 10)
#[1] TRUE
f_between(6, 6, 10)
#[1] TRUE
f_between(2, -10, -2)
#[1] FALSE
f_between(3, 5, 7)
#[1] FALSE
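Building on this, a possible way to add the strict/non-strict switch the question asks for. between_any and its strict argument are hypothetical names; pmin() and pmax() are used so the function also works element-wise on vectors.

```r
between_any <- function(num, a, b, strict = FALSE) {
  lo <- pmin(a, b)   # lower bound regardless of argument order
  hi <- pmax(a, b)   # upper bound regardless of argument order
  if (strict) {
    num > lo & num < hi
  } else {
    num >= lo & num <= hi
  }
}

between_any(8, 2, 10)                  # TRUE
between_any(8, 10, 2)                  # TRUE
between_any(8, 8, 10)                  # TRUE
between_any(8, 8, 10, strict = TRUE)   # FALSE
between_any(c(1, 5, 10), 10, 2)        # FALSE TRUE TRUE
```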

Replace elements of vector by vector

I want to replace a few elements of a vector with the whole of a second vector. The condition is that the replaced elements of the first vector are equal to the elements of a third vector. Here is an example:
a <- 1:10
b <- 5:7
v <- rnorm(2, mean = 1, sd = 5)
my output should be
c(a[1:4], v, a[8:10])
I have already tried
replace(a, a == b, v)
a[a == b] <- v
but with little success. Can anyone help?
The == operator is best used to compare vectors of the same length, or when one of the vectors has length 1.
Try this, and notice that in neither case do you get the positional match that you desire.
> a == b
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In a == b : longer object length is not a multiple of shorter object length
> b == a
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In b == a : longer object length is not a multiple of shorter object length
Instead, use match() - this gives you the index position where there is a match in the values.
> match(b, a)
[1] 5 6 7
Then:
a <- 1:10
b <- 5:7
v <- rnorm(3, mean=1, sd=5)
a[match(b, a)] <- v
The results:
a
[1] 1.0000000 2.0000000 3.0000000 4.0000000 -4.6843669 0.9014578 -0.7601413 8.0000000
[9] 9.0000000 10.0000000
Here's another option:
a[a %in% b] <- v
Since in the example described in the OP there are three common numbers in the vectors a and b while v <- rnorm(2, mean = 1, sd = 5) contains only 2 numbers, the vector v will be recycled and a warning will be issued.
The warning and recycling can be prevented, e.g., by defining v as
v <- rnorm(sum(a %in% b), mean = 1, sd = 5)
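The match() and %in% approaches agree in this example because b is in the same order as the matching elements of a. When b is ordered differently they diverge, as this sketch with made-up values shows: match(b, a) pairs each element of v with the corresponding element of b, while a %in% b fills positions in the order of a.

```r
b <- c(7, 5, 6)
v <- c(70, 50, 60)   # intended pairing: 7 -> 70, 5 -> 50, 6 -> 60

a1 <- 1:10
a1[match(b, a1)] <- v
a1[5:7]              # 50 60 70

a2 <- 1:10
a2[a2 %in% b] <- v
a2[5:7]              # 70 50 60
```

Which behavior is wanted depends on whether v is meant to line up with b or simply to fill the matching slots of a from left to right.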
