Count number of vector values in range with R

In R, if you test a condition on a vector instead of a scalar, it will return a vector containing the result of the comparison for each value in the vector. For example...
> v <- c(1,2,3,4,5)
> v > 2
[1] FALSE FALSE TRUE TRUE TRUE
In this way, I can determine the number of elements in a vector that are above or below a certain number, like so.
> sum(v > 2)
[1] 3
> sum(v < 2)
[1] 1
Does anyone know how I can determine the number of values in a given range? For example, how would I determine the number of values greater than 2 but less than 5?

Try
> sum(v > 2 & v < 5)
[1] 2
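A small wrapper can make the bounds explicit. This is only a sketch: `count_in_range` is a hypothetical helper name (not a base R function), and dropping `NA` values with `na.rm = TRUE` is an assumption about how missing data should be counted.

```r
# Hypothetical helper, not part of base R: counts elements of v that lie
# strictly between lower and upper, or inclusively when inclusive = TRUE.
count_in_range <- function(v, lower, upper, inclusive = FALSE) {
  inside <- if (inclusive) v >= lower & v <= upper else v > lower & v < upper
  sum(inside, na.rm = TRUE)  # assumption: NA values are not counted
}

v <- c(1, 2, 3, 4, 5)
count_in_range(v, 2, 5)                    # 3 and 4 qualify: 2
count_in_range(v, 2, 5, inclusive = TRUE)  # 2, 3, 4 and 5 qualify: 4
```

Because `&` works elementwise, both comparisons are evaluated for every element and only positions where both are `TRUE` are counted.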

There are also the %<% and %<=% comparison operators in the TeachingDemos package, which let you chain the comparisons (here x is your vector, e.g. v from the question):
library(TeachingDemos)
sum( 2 %<% x %<% 5 )
sum( 2 %<=% x %<=% 5 )
which give the same results as:
sum( 2 < x & x < 5 )
sum( 2 <= x & x <= 5 )
Which is better is probably more a matter of personal preference.

Use which:
set.seed(1)
x <- sample(10, 50, replace = TRUE)
length(which(x > 3 & x < 5))
# [1] 6
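One practical difference between `sum()` and `length(which())` shows up with missing values: a comparison involving `NA` yields `NA`, `sum()` propagates it, and `which()` silently drops it. A short illustration (the vector is made up for the example):

```r
v <- c(1, 2, NA, 3, 4, 5)
cond <- v > 2 & v < 5      # FALSE FALSE NA TRUE TRUE FALSE
sum(cond)                  # NA: the missing comparison propagates
sum(cond, na.rm = TRUE)    # 2
length(which(cond))        # 2: which() only returns positions that are TRUE
```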

Related

How to concisely deal with subsets when their lengths become zero?

To exclude elements from a vector x,
x <- c(1, 4, 3, 2)
we can subtract a vector of positions:
excl <- c(2, 3)
x[-excl]
# [1] 1 2
This also works dynamically,
(excl <- which(x[-which.max(x)] > quantile(x, .25)))
# [1] 2 3
x[-excl]
# [1] 1 2
until excl is of length zero:
excl.nolength <- which(x[-which.max(x)] > quantile(x, .95))
length(excl.nolength)
# [1] 0
x[-excl.nolength]
# numeric(0)
I could reformulate that, but excl is applied to many objects, for example:
letters[1:4][-excl.nolength]
# character(0)
I know I could use setdiff, but that's rather long and hard to read:
x[setdiff(seq(x), excl.nolength)]
# [1] 1 4 3 2
letters[1:4][setdiff(seq(letters[1:4]), excl.nolength)]
# [1] "a" "b" "c" "d"
Now, I could exploit the fact that nothing is excluded if the element number is greater than the number of elements:
length(x)
# [1] 4
x[-5]
# [1] 1 4 3 2
To generalize that I should probably use .Machine$integer.max:
tmp <- which(x[-which.max(x)] > quantile(x, .95))
excl <- if (!length(tmp) == 0) tmp else .Machine$integer.max
x[-excl]
# [1] 1 4 3 2
Wrapped into a function,
e <- function(x) if (!length(x) == 0) x else .Machine$integer.max
that's quite handy and clear (with excl set back to c(2, 3), since it was overwritten above):
excl <- c(2, 3)
x[-e(excl)]
# [1] 1 2
x[-e(excl.nolength)]
# [1] 1 4 3 2
letters[1:4][-e(excl.nolength)]
# [1] "a" "b" "c" "d"
But it seems a little fishy to me...
Is there a better equally concise way to deal with a subset of length zero in base R?
Edit
excl is the dynamic result of an earlier function call (as shown with which above) and may or may not have length zero. If length(excl) == 0, nothing should be excluded. Subsequent lines of code, e.g. x[-excl], should ideally not have to change, or change as little as possible.
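The root cause is that negating an empty index leaves it empty, and an empty index selects nothing. A minimal check of that behaviour (the name `empty` is only for illustration):

```r
x <- c(1, 4, 3, 2)
empty <- integer(0)

identical(-empty, empty)          # TRUE: -integer(0) is still integer(0)
length(x[-empty])                 # 0: an empty index selects no elements
identical(x[-empty], x[empty])    # TRUE: so x[-excl] behaves like x[excl]
```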
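You can overwrite [ with your own function. Note that this simple version only handles single-index subsetting such as x[-excl]; calls like m[i, j] fail while it is defined.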
"[" <- function(x,y) {if(length(y)==0) x else .Primitive("[")(x,y)}
x <- c(1, 4, 3, 2)
excl <- c(2, 3)
x[-excl]
#[1] 1 2
excl <- integer()
x[-excl]
#[1] 1 4 3 2
rm("[") #Go back to normal mode
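If masking `[` feels too invasive, the same length test can live in a small helper instead. `drop_at` is a hypothetical name used only for this sketch, and it assumes `excl` holds positive positions.

```r
# Hypothetical helper: remove the positions in excl, or return x unchanged
# when excl is empty (so x[-integer(0)] is never evaluated).
drop_at <- function(x, excl) {
  if (length(excl) == 0) x else x[-excl]
}

x <- c(1, 4, 3, 2)
drop_at(x, c(2, 3))                # 1 2
drop_at(x, integer(0))             # 1 4 3 2
drop_at(letters[1:4], integer(0))  # "a" "b" "c" "d"
```

Existing lines such as `x[-excl]` still need to become `drop_at(x, excl)`, but the global `[` is left untouched.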
I would argue this is somewhat opinion based.
For example, I find
x <- x[-if(length(excl <- which(x[-which.max(x)] > quantile(x, .95))) == 0) .Machine$integer.max else excl]
very unreadable, but some people like one-liners. Reading package code, you'll more often find it split up along the lines of one of the suggestions you gave:
excl <- which(x[-which.max(x)] > quantile(x, .95))
if(length(excl) != 0)
x <- x[-excl]
Alternatively, you could avoid which and simply use the logical vector for subsetting, which most people would probably consider cleaner:
x <- x[!x[-which.max(x)] > quantile(x, .95)]
This avoids the zero-length index problem, at the cost of some efficiency. Note, however, that the logical index here is one element shorter than x (-which.max(x) drops an element), so R recycles it when subsetting.
As a side note, the very example used above and in the question seems somewhat off. First, which.max returns only the first index of the maximum value; in addition, the indices are offset for every value removed. More likely the intended example would be
x <- x[!(x > quantile(x, .95))[-which(x == max(x))]]
How about this?
a <- letters[1:3]
excl1 <- c(1,3)
excl2 <- c()
a[!(seq_along(a) %in% excl1)]
# [1] "b"
a[!(seq_along(a) %in% excl2)]
# [1] "a" "b" "c"

Get index of element in unfiltered vector

I have an rle object from a vector and would like to get the sum of the run lengths (counting runs with values == TRUE and values == FALSE) before and after the longest run with values == TRUE.
This much I can do:
se <- c(3,1,2,3,1,2,3,4,5,6,5,8,9,9,9,5,4,3,4,5,4,3,2)
obj <- rle(se > 4)
obj
Run Length Encoding
lengths: int [1:5] 8 8 3 1 3
values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
#Getting the maximum with values == TRUE
with(obj, max(lengths[values]))
[1] 8
However, I'm not sure how to proceed from here. If I tried to find the maximum 8 via match(), I would first hit the run with values == FALSE, and I cannot rely on there being no other run of length 8 with values == FALSE to the right of the maximum I am looking for.
For the showed example the expected result would be:
Sum1 = 8
Sum2 = 7
Thank you a lot for your help,
yasel
Here is an idea that uses rleid from data.table in order to create TRUE/FALSE groups, i.e.
i1 <- setNames(se > 4, data.table::rleid(se > 4))
i2 <- names(which.max(tapply(names(i1)[i1], names(i1)[i1], length)))
sum1 <- length(i1[as.integer(names(i1)) < as.integer(i2)])  # names are character, so compare them as integers
sum2 <- length(i1[as.integer(names(i1)) > as.integer(i2)])
sum1
#[1] 8
sum2
#[1] 7
You can also make it a function,
f1 <- function(x, size) {
i1 <- setNames(x > size, data.table::rleid(x > size))
i2 <- names(which.max(tapply(names(i1)[i1], names(i1)[i1], length)))
return(c(length(i1[as.integer(names(i1)) < as.integer(i2)]), length(i1[as.integer(names(i1)) > as.integer(i2)])))
}
f1(se, 4)
#[1] 8 7
Another possible approach:
with(obj, {
i <- which(values & lengths == max(lengths[values]))[1]
c(sum(lengths[seq_len(i-1)]), sum(lengths[-seq_len(i)]))
})
output:
[1] 8 7
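The same idea can be packaged in base R without data.table, using the run end positions from `cumsum()`. This is a sketch: `around_longest_run` is a hypothetical name, and it assumes `x` has no `NA` and at least one value above `threshold`; ties between equally long runs go to the leftmost one.

```r
# Hypothetical helper (base R only): number of elements before and after the
# longest run of values above `threshold`.
around_longest_run <- function(x, threshold) {
  r <- rle(x > threshold)
  # first run that is TRUE and as long as the longest TRUE run
  i <- which(r$values & r$lengths == max(r$lengths[r$values]))[1]
  ends <- cumsum(r$lengths)          # position where each run ends
  before <- ends[i] - r$lengths[i]   # elements before run i
  after <- length(x) - ends[i]       # elements after run i
  c(before = before, after = after)
}

se <- c(3,1,2,3,1,2,3,4,5,6,5,8,9,9,9,5,4,3,4,5,4,3,2)
around_longest_run(se, 4)  # before = 8, after = 7
```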

How does R remember which values were indexed?

Code:
x <- c(1, 1, 3, 2, 3, 5)
x[x < 3] <- x[x < 3] * 100
x
Output:
[1] 100 100 3 200 3 5
I expected to get the output “[1] 100 100 200”. How does R remember the indexes where x < 3? When running x[x < 3], you get the output “1,1,2” (and the indexes of those numbers are 1, 2 and 4).
So R remembers the values where x < 3 and manipulates those values. But how does R know the indexes of those values?
If you want to get the result of your call you need to assign it to a new variable:
x2 <- (x[x < 3] <- x[x<3] * 100)
x2
#[1] 100 100 200
Note that x will still be changed (as a result of your assignment x[x < 3] <- x[x<3] * 100).
If that is not desired, start again from the original x and simply do:
x2 <- x[x<3]*100
x2
#[1] 100 100 200
x
#[1] 1 1 3 2 3 5
If you look at what is x < 3 you'll find it yields
[1] TRUE TRUE FALSE TRUE FALSE FALSE
so x[x<3] <- x[x < 3] * 100 means multiply by 100 the first, second and fourth value of x (x[x < 3]*100) and replace the first, second and fourth value of x by these new values (x[x<3] <-).
It's not remembering the locations of the x < 3 values. The left-hand side of your equation is explicitly the locations of the x < 3 values.
So the thing on the right might be (100, 100, 200), but the thing on the left refers to the slots at positions 1, 2 and 4.
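The replacement can be spelled out step by step. R evaluates `x[i] <- value` roughly as `x <- \`[<-\`(x, i, value = value)`: the logical index and the right-hand side are both computed from the old `x`, and `[<-` writes the new values into the positions where the index is `TRUE`. The names `idx` and `new_vals` are only for illustration.

```r
x <- c(1, 1, 3, 2, 3, 5)
idx <- x < 3               # logical mask: TRUE TRUE FALSE TRUE FALSE FALSE
new_vals <- x[idx] * 100   # right-hand side: 100 100 200
which(idx)                 # positions that will be overwritten: 1 2 4
x <- `[<-`(x, idx, value = new_vals)  # what x[idx] <- new_vals does
x                          # 100 100 3 200 3 5
```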
In your code, you ask R to take each value of x that is less than 3 and multiply it by 100 (x[x < 3]*100). R works with the positions of those elements: it checks the values of x one by one, and those that satisfy the condition are multiplied by 100 and written back in place. Printing x then returns all of its values. To keep only the modified values, assign them to a new name, as in @Cath's answer.
In addition, it can be convenient to write a general function. Here is one:
myfun <- function(x, n){
res <- lapply(x, function(i) if (i < n) i * 100 else 0)
res
}
x <- list(1, 1, 3, 2, 3, 5)
n <- 3
Then,
> myfun(x, n)
[[1]]
[1] 100
[[2]]
[1] 100
[[3]]
[1] 0
[[4]]
[1] 200
[[5]]
[1] 0
[[6]]
[1] 0
Or simply use @Cath's comment:
myfun2 <- function(x, n){
res <- x[x < n]*100
res
}
myfun2(c(1, 1, 3, 2, 3, 5), 3)
#[1] 100 100 200

R: Different range checks for different elements of a vector

Is there an R idiom for performing a different (integer) range check for each element of a vector?
My function is passed a two-element (integer) vector of the form v = c(m, n) and must make the following range checks:
1 <= m <= M
1 <= n <= N
For my current task, I've implemented them by manually accessing each element, and running the associated range check against it.
# Check if this is a valid position on an M x N chess board.
validate = function (square) {
row = square[1]
col = square[2]
(row %in% 1:M) && (col %in% 1:N)
}
I wonder whether there's a more compact way of doing the range checks, especially if we were to generalize it to K-element vectors.
Since you're presumably setting up different criteria for each v[j], I'd recommend creating a list out of your range criteria. Like:
Rgames> set.seed(10)
Rgames> foo<-sample(1:5,5,rep=TRUE)
Rgames> foo
[1] 3 2 3 4 1
Rgames> bar<-list(one=1:5, two=3:5,three=1:3,four=c(2,4),five=c(1,4) )
Rgames> checkit<-NA
Rgames> for(j in 1:5) checkit[j]<-foo[j]%in%bar[[j]]
Rgames> checkit
[1] TRUE FALSE TRUE TRUE TRUE
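The loop can also be written with `mapply()`, which walks `foo` and `bar` in parallel. In this sketch the values of `foo` are hard-coded from the printed output above, so it does not depend on the random seed.

```r
foo <- c(3, 2, 3, 4, 1)
bar <- list(one = 1:5, two = 3:5, three = 1:3, four = c(2, 4), five = c(1, 4))
# For each position j, test whether foo[j] is among the allowed values bar[[j]].
checkit <- mapply(function(value, allowed) value %in% allowed, foo, bar)
checkit  # TRUE FALSE TRUE TRUE TRUE
```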
If I understand your goal correctly, the inequality operators are vectorized in R, so you can make use of this fact.
limits <- c(M=3, N=4, 5)
v <- c(m=2, n=5, 8)
result <- 1 <= v & v <= limits
# m n
# TRUE FALSE FALSE
And if you want a single value that's FALSE if any of the limits are exceeded, then you can wrap the inequality expression with all.
all(1 <= v & v <= limits)
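Applied to the question's `validate()`, the vectorized comparison generalizes to K-element positions. In this sketch the upper bounds are passed in as a vector `upper` (an assumption; the original reads `M` and `N` from the enclosing environment), and `square == round(square)` stands in for the whole-number check that `%in% 1:M` performed.

```r
# Sketch: TRUE when every square[k] is a whole number in 1..upper[k].
validate <- function(square, upper) {
  length(square) == length(upper) &&
    all(square >= 1 & square <= upper & square == round(square))
}

validate(c(2, 5), upper = c(8, 8))    # TRUE
validate(c(0, 5), upper = c(8, 8))    # FALSE: below 1
validate(c(3, 9), upper = c(8, 8))    # FALSE: above the bound
validate(c(2.5, 3), upper = c(8, 8))  # FALSE: not a whole number
```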
Maybe something like this:
`%between%` <- function(x,rng){
all(x <= max(rng,na.rm = TRUE)) && all(x >= min(rng,na.rm = TRUE))
}
> 3 %between% c(1,10)
[1] TRUE
> 3:5 %between% c(1,10)
[1] TRUE
> 9:12 %between% c(1,10)
[1] FALSE
With tweaks depending on how you want to handle NAs, and other edge cases.

find contiguous responses greater than x

I'm trying to find the point at which participants reach 8 contiguous responses in a row that are greater than 3. For example:
x <- c(2,2,4,4,4,4,5,5,5,5,7)
I want to return the value 10.
I tried this code (thanks @DWin):
which( rle(x)$values>3 & rle(x)$lengths >= 8)
sum(rle(x)$lengths[ 1:(min(which(rle(x)$lengths >= 8))-1) ]) + 8
The problem with the code above is that it only works if the qualifying responses are all identical (and greater than 3). Otherwise no single run reaches length 8, so the first line returns integer(0) and the second line fails.
If
x <- c(2,2,4,4,4,4,4,4,4,4,7)
the code works fine, but this isn't what my data looks like.
Thanks in advance!
Why don't you create a new vector that contains the identical values that rle needs to work properly? You can use ifelse() for this and put everything into a function:
FUN <- function(x, value, runlength) {
x2 <- ifelse(x > value, 1, 0)
r <- rle(x2)
# first run of 1s that is long enough
i <- which(r$values == 1 & r$lengths >= runlength)[1]
ret <- sum(r$lengths[seq_len(i - 1)]) + runlength
return(ret)
}
> FUN(x, value = 3, runlength = 8)
[1] 10
You could simply recode your data so that the responses only distinguish the measure of interest (greater than 3); your code then works as is, with x replaced by x1.
x1 <- ifelse( x > 3, 4, 0 )
But if I were already doing this, I might rewrite the code slightly more clearly this way:
runl <- rle(x1)
i <- which( runl$lengths >= 8 & runl$values > 3 )[1]
sum( runl$lengths[seq_len(i-1)] ) + 8
Here's a vectorized way of doing it with just cumsum and cummax. Let's take an example that has a short (less than length 8) sequence of elements greater than 3 as well as a long one, just to make sure it's doing the right thing.
> x <- c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9)
> x3 <- x > 3
> cumsum(x3) - cummax(cumsum(x3)*(!x3))
[1] 0 0 1 2 3 4 0 0 1 2 3 4 5 6 7 8 9 10 0 0 1
> which( cumsum(x3) - cummax(cumsum(x3)*(!x3)) == 8)[1]
[1] 16
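The cumsum/cummax trick wraps neatly into a function that returns NA when no run is long enough. `first_run_end` is a hypothetical name for this sketch, and it assumes `x` contains no `NA` (cumsum would propagate it).

```r
# Hypothetical helper: index at which the first run of `runlength`
# consecutive values above `threshold` is completed, or NA if none is.
first_run_end <- function(x, threshold, runlength) {
  above <- x > threshold
  cs <- cumsum(above)
  run_len <- cs - cummax(cs * !above)  # length of the current run at each position
  which(run_len == runlength)[1]
}

first_run_end(c(2,2,4,4,4,4,5,5,5,5,7), threshold = 3, runlength = 8)  # 10
first_run_end(c(2,2,4,5,6,7,2,2,4,9,8,7,6,5,4,5,6,9,2,2,9), 3, 8)     # 16
first_run_end(c(4,4,1), 3, 8)                                          # NA
```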
