R: Different range checks for different elements of a vector

Is there an R idiom for performing a different (integer) range check for each element of a vector?
My function is passed a two-element (integer) vector of the form v = c(m, n) and must make the following range checks:
1 <= m <= M
1 <= n <= N
For my current task, I've implemented them by manually accessing each element, and running the associated range check against it.
# Check if this is a valid position on an M x N chess board.
validate = function (square) {
row = square[1]
col = square[2]
(row %in% 1:M) && (col %in% 1:N)
}
I wonder whether there's a more compact way of doing the range checks, especially if we were to generalize it to K-element vectors.

Since you're presumably setting up different criteria for each v[j], I'd recommend creating a list out of your range criteria. Like:
Rgames> set.seed(10)
Rgames> foo<-sample(1:5,5,rep=TRUE)
Rgames> foo
[1] 3 2 3 4 1
Rgames> bar<-list(one=1:5, two=3:5,three=1:3,four=c(2,4),five=c(1,4) )
Rgames> checkit<-NA
Rgames> for(j in 1:5) checkit[j]<-foo[j]%in%bar[[j]]
Rgames> checkit
[1] TRUE FALSE TRUE TRUE TRUE
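The loop above can also be written without an explicit index. A minimal sketch, assuming the same foo and bar as in the session above (the helper name check_each is made up for illustration):

```r
# Values to check, and one set of allowed values per position
# (same data as the session above, entered directly).
foo <- c(3, 2, 3, 4, 1)
bar <- list(one = 1:5, two = 3:5, three = 1:3, four = c(2, 4), five = c(1, 4))

# check_each is a hypothetical helper: it pairs each element of v with the
# allowed set at the same position and tests membership.
check_each <- function(v, allowed) {
  stopifnot(length(v) == length(allowed))
  unname(mapply(function(x, ok) x %in% ok, v, allowed))
}

checkit <- check_each(foo, bar)
checkit
# [1]  TRUE FALSE  TRUE  TRUE  TRUE
```

Wrapping the result in all() gives a single TRUE/FALSE for the whole vector.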

If I understand your goal correctly, the inequality operators are vectorized in R, so you can make use of this fact.
limits <- c(M=3, N=4, 5)
v <- c(m=2, n=5, 8)
result <- 1 <= v & v <= limits
# m n
# TRUE FALSE FALSE
And if you want a single value that's FALSE if any of the limits are exceeded, then you can wrap the inequality expression with all.
all(1 <= v & v <= limits)
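For the K-element case from the question, the vectorized comparison can be wrapped in a small function. This is only a sketch: in_bounds is a hypothetical name, and the 8 x 8 board size is assumed for the example.

```r
# in_bounds is a hypothetical helper: element i of v must satisfy
# lower[i] <= v[i] <= upper[i]. A length-1 lower bound is recycled.
in_bounds <- function(v, upper, lower = 1) {
  stopifnot(length(v) == length(upper))
  all(lower <= v & v <= upper)
}

M <- 8; N <- 8                      # board size, assumed for the example
in_bounds(c(3, 8), c(M, N))         # TRUE: both coordinates are on the board
in_bounds(c(0, 5), c(M, N))         # FALSE: row 0 is below the lower bound
in_bounds(c(2, 9, 4), c(3, 9, 4))   # TRUE: works for K = 3 as well
```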

Maybe something like this:
`%between%` <- function(x,rng){
all(x <= max(rng,na.rm = TRUE)) && all(x >= min(rng,na.rm = TRUE))
}
> 3 %between% c(1,10)
[1] TRUE
> 3:5 %between% c(1,10)
[1] TRUE
> 9:12 %between% c(1,10)
[1] FALSE
With tweaks depending on how you want to handle NAs, and other edge cases.
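One possible tweak, sketched here under the assumption that a missing value should count as "not in range": return one result per element instead of a single value, and turn NA into FALSE. The name between_each is made up for this example.

```r
# between_each is a hypothetical element-wise variant of %between%.
between_each <- function(x, rng) {
  lo <- min(rng, na.rm = TRUE)
  hi <- max(rng, na.rm = TRUE)
  ok <- x >= lo & x <= hi
  ok[is.na(ok)] <- FALSE   # assumption: NA is treated as out of range
  ok
}

between_each(c(3, NA, 12), c(1, 10))
# [1]  TRUE FALSE FALSE
```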

Related

Limit 2 bounds of a vector in IF condition

I want to limit the two bounds of a vector in an if condition. However, I get the warning "the condition has length > 1 and only the first element will be used" when I try to use the following code:
rho <- c(0.8,0,-0.5)
sigma.S <- 0.4
sigma.M <- 0.1
mu.S <- 0.06
T <- 1
N <- 365
dt <- T/N
m <- c(100,102,100,99,101)
z <- rnorm(N)
P <- matrix(0, N, 1)
P[1] <- m[1]
for (i in 2:N){
P[i] <- P[i-1]*( 1 + sigma.M*sqrt(dt)*z[i] )
}
tPts <- c(0,91,182,273,364)
yPts <- c(m[1]-P[1],m[2]-P[92],m[3]-P[183],m[4]-P[274],m[5]-P[365])
a <- tPts[1]
A <- yPts[1]
for(i in 2:5){
t <- seq(0,364,1)
b <- tPts[i]
B <- yPts[i]
if(a<=t & t<=b){
y <- ( B*(t-a) - A*(t-b) )/(b-a)
return(y)
}
a <- b
A <- B
}
Can anyone see what the problem is here? Thanks in advance!
We could change the if condition inside the loop by wrapping each comparison with all:
if(all(a<=t) && all(t<=b))
This assumes the condition must hold along the whole length of 't'. Both a <= t and t <= b return a logical vector of the same length as 't', which here is a sequence from 0 to 364. Even when one operand has length 1, as 'a' and 'b' do, the comparison operator recycles that element across the longer vector:
5 < (1:6)
#[1] FALSE FALSE FALSE FALSE FALSE TRUE
whereas if/else requires a condition of length 1.
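If the goal is instead to fill in y segment by segment, the logical vector can be used as an index rather than as an if condition. A minimal sketch, assuming that piecewise-linear interpolation between the knots is what the loop is after; the yPts values here are made up.

```r
# Knots of a piecewise-linear curve (values made up for illustration).
tPts <- c(0, 91, 182, 273, 364)
yPts <- c(0, 2.5, -1, 0.5, 3)

t <- seq(0, 364, 1)
y <- numeric(length(t))

for (i in 2:length(tPts)) {
  a <- tPts[i - 1]; b <- tPts[i]
  A <- yPts[i - 1]; B <- yPts[i]
  idx <- a <= t & t <= b            # logical vector, one entry per element of t
  # Fill in only the elements of y that belong to this segment.
  y[idx] <- (B * (t[idx] - a) - A * (t[idx] - b)) / (b - a)
}

# Base R's approx() performs the same linear interpolation.
isTRUE(all.equal(y, approx(tPts, yPts, xout = t)$y))
# [1] TRUE
```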

How to concisely deal with subsets when their lengths become zero?

To exclude elements from a vector x,
x <- c(1, 4, 3, 2)
we can subtract a vector of positions:
excl <- c(2, 3)
x[-excl]
# [1] 1 2
This also works dynamically,
(excl <- which(x[-which.max(x)] > quantile(x, .25)))
# [1] 2 3
x[-excl]
# [1] 1 2
until excl is of length zero:
excl.nolength <- which(x[-which.max(x)] > quantile(x, .95))
length(excl.nolength)
# [1] 0
x[-excl.nolength]
# integer(0)
I could kind of reformulate that, but I have many objects to which excl is applied, say:
letters[1:4][-excl.nolength]
# character(0)
I know I could use setdiff, but that's rather long and hard to read:
x[setdiff(seq(x), excl.nolength)]
# [1] 1 4 3 2
letters[1:4][setdiff(seq(letters[1:4]), excl.nolength)]
# [1] "a" "b" "c" "d"
Now, I could exploit the fact that nothing is excluded if the element number is greater than the number of elements:
length(x)
# [1] 4
x[-5]
# [1] 1 4 3 2
To generalize that I should probably use .Machine$integer.max:
tmp <- which(x[-which.max(x)] > quantile(x, .95))
excl <- if (!length(tmp) == 0) tmp else .Machine$integer.max
x[-excl]
# [1] 1 4 3 2
Wrapped into a function,
e <- function(x) if (!length(x) == 0) x else .Machine$integer.max
that's quite handy and clear:
x[-e(excl)]
# [1] 1 2
x[-e(excl.nolength)]
# [1] 1 4 3 2
letters[1:4][-e(excl.nolength)]
# [1] "a" "b" "c" "d"
But it seems a little fishy to me...
Is there a better equally concise way to deal with a subset of length zero in base R?
Edit
excl is the dynamic result of an earlier function call (as shown with which above) and might be of length zero or not. If length(excl) == 0, nothing should be excluded. Ideally, the following lines of code, e.g. x[-excl], should not have to change at all, or should change as little as possible.
You can overwrite [ with your own function.
"[" <- function(x,y) {if(length(y)==0) x else .Primitive("[")(x,y)}
x <- c(1, 4, 3, 2)
excl <- c(2, 3)
x[-excl]
#[1] 1 2
excl <- integer()
x[-excl]
#[1] 1 4 3 2
rm("[") #Go back to normal mode
I would argue this is somewhat opinion based.
For example, I find:
x <- x[-if(length(excl <- which(x[-which.max(x)] > quantile(x, .95))) == 0) .Machine$integer.max else excl]
very unreadable, but some people like one-liners. Reading package code, you'll often find this split up instead, along the lines of one of the many suggestions you gave:
excl <- which(x[-which.max(x)] > quantile(x, .95))
if(length(excl) != 0)
x <- x[-excl]
Alternatively, you could avoid which and simply use the logical vector for subsetting, which most people would likely consider cleaner:
x <- x[!x[-which.max(x)] > quantile(x, .95)]
This would avoid zero-length index problem, at the cost of some loss of efficiency.
As a side note, the very example used above and in the question seems somewhat off. First, which.max only returns the first index that is equal to the max value. In addition, the indices will be offset for every value removed. More likely the expected example would be
x <- x[!(x > quantile(x, .95))[-which(x == max(x))]]
How about this?
a <- letters[1:3]
excl1 <- c(1,3)
excl2 <- c()
a[!(seq_along(a) %in% excl1)]
a[!(seq_along(a) %in% excl2)]
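The underlying issue is that -integer(0) is still integer(0), so x[-excl] selects nothing instead of dropping nothing. A minimal sketch of a wrapper that makes the empty case explicit (drop_idx is a hypothetical name, not a base R function):

```r
# drop_idx is a hypothetical helper: remove the positions in excl from x,
# and return x unchanged when excl is empty.
drop_idx <- function(x, excl) {
  if (length(excl) == 0L) x else x[-excl]
}

x <- c(1, 4, 3, 2)
drop_idx(x, c(2, 3))                # 1 2
drop_idx(x, integer(0))             # 1 4 3 2
drop_idx(letters[1:4], integer(0))  # "a" "b" "c" "d"
```

Compared with overriding `[`, this leaves normal subsetting untouched for the rest of the session.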

Get index of element in unfiltered vector

I have a rle object from a vector and would like to get the sum of lengths (with values == TRUE & values == FALSE) before and after the maximum of lengths (with values == TRUE)
That I can do:
se <- c(3,1,2,3,1,2,3,4,5,6,5,8,9,9,9,5,4,3,4,5,4,3,2)
obj <- rle(se > 4)
obj
Run Length Encoding
lengths: int [1:5] 8 8 3 1 3
values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
#Getting the maximum with values == True
with(obj, max(lengths[values]))
8
However, now I am not sure how to proceed. If I tried to find the maximum 8 via match(), I would first find the element with values == FALSE, and I cannot rely on there not being another lengths = 8 with values == FALSE to the right of the maximum I am looking for.
For the showed example the expected result would be:
Sum1 = 8
Sum2 = 7
Thank you a lot for your help,
yasel
Here is an idea that uses rleid from data.table in order to create TRUE/FALSE groups, i.e.
i1 <- setNames(se > 4, data.table::rleid(se > 4))
i2 <- names(which.max(tapply(names(i1)[i1], names(i1)[i1], length)))
sum1 <- length(i1[names(i1) < i2])
sum2 <- length(i1[names(i1) > i2])
sum1
#[1] 8
sum2
#[1] 7
You can also make it a function,
f1 <- function(x, size) {
i1 <- setNames(x > size, data.table::rleid(x > size))
i2 <- names(which.max(tapply(names(i1)[i1], names(i1)[i1], length)))
return(c(length(i1[names(i1) < i2]), length(i1[names(i1) > i2])))
}
f1(se, 4)
#[1] 8 7
Another possible approach:
with(obj, {
i <- which(values & lengths==max(lengths[values]))
c(sum(lengths[seq_len(i-1)]), sum(lengths[-seq_len(i)]))
})
output:
[1] 8 7
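A base R variant that cannot accidentally match a FALSE run of the same length, sketched under the assumption that there is at least one TRUE run and that the leftmost one should win on ties:

```r
se <- c(3,1,2,3,1,2,3,4,5,6,5,8,9,9,9,5,4,3,4,5,4,3,2)
obj <- rle(se > 4)

# Zero out the FALSE runs so which.max() can only pick a TRUE run;
# on ties it returns the first (leftmost) one.
i <- which.max(obj$lengths * obj$values)

# Sum the run lengths strictly before and strictly after run i.
sums <- c(sum(obj$lengths[seq_len(i - 1)]),
          sum(obj$lengths[-seq_len(i)]))
sums
# [1] 8 7
```

seq_len(i - 1) is empty when the longest TRUE run comes first, so the first sum is then 0 rather than an error.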

Test for equality among all elements of a single numeric vector

I'm trying to test whether all elements of a vector are equal to one another. The solutions I have come up with seem somewhat roundabout, both involving checking length().
x <- c(1, 2, 3, 4, 5, 6, 1) # FALSE
y <- rep(2, times = 7) # TRUE
With unique():
length(unique(x)) == 1
length(unique(y)) == 1
With rle():
length(rle(x)$values) == 1
length(rle(y)$values) == 1
A solution that would let me include a tolerance value for assessing 'equality' among elements would be ideal to avoid FAQ 7.31 issues.
Is there a built-in function for this type of test that I have completely overlooked? identical() and all.equal() compare two R objects, so they won't work here.
Edit 1
Here are some benchmarking results. Using the code:
library(rbenchmark)
John <- function() all( abs(x - mean(x)) < .Machine$double.eps ^ 0.5 )
DWin <- function() {diff(range(x)) < .Machine$double.eps ^ 0.5}
zero_range <- function() {
if (length(x) == 1) return(TRUE)
x <- range(x) / mean(x)
isTRUE(all.equal(x[1], x[2], tolerance = .Machine$double.eps ^ 0.5))
}
x <- runif(500000);
benchmark(John(), DWin(), zero_range(),
columns=c("test", "replications", "elapsed", "relative"),
order="relative", replications = 10000)
With the results:
test replications elapsed relative
2 DWin() 10000 109.415 1.000000
3 zero_range() 10000 126.912 1.159914
1 John() 10000 208.463 1.905251
So it looks like diff(range(x)) < .Machine$double.eps ^ 0.5 is fastest.
Why not simply use the variance:
var(x) == 0
If all the elements of x are equal, you will get a variance of 0.
This works only for double and integers though.
Edit based on the comments below:
A more generic option would be to check the number of unique elements in the vector, which must be 1 in this case. This has the advantage that it works with all classes, not just double and integer, for which a variance can be calculated.
length(unique(x)) == 1
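Two caveats about the variance test are worth seeing in action. The short sketch below uses made-up values; comparing the variance against .Machine$double.eps is just one possible choice of threshold, not a recommendation from the answer.

```r
x <- c(0.1 + 0.2, 0.3)            # equal on paper, not in floating point

exact  <- var(x) == 0                    # FALSE: tiny difference, positive variance
approx <- var(x) < .Machine$double.eps   # TRUE: compare against a small threshold
single <- var(5)                         # NA: variance is undefined for one value

c(exact, approx, is.na(single))
# [1] FALSE  TRUE  TRUE
```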
If they're all numeric values and tol is your tolerance, then...
all( abs(y - mean(y)) < tol )
is the solution to your problem.
EDIT:
After looking at this, and other answers, and benchmarking a few things the following comes out over twice as fast as the DWin answer.
abs(max(x) - min(x)) < tol
It is a bit surprising that this is faster than diff(range(x)), since diff shouldn't be much different from - and abs with two numbers, and requesting the range should optimize getting the minimum and maximum. Both diff and range are primitive functions. But the timing doesn't lie.
And, in addition, as @Waldi pointed out, abs is superfluous here.
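Put together as a function, the min/max comparison might look like the sketch below. The name all_equal_tol and the default tolerance are assumptions for this example; tol is an absolute tolerance, and a vector containing NA yields NA.

```r
# all_equal_tol is a hypothetical helper name; tol is an absolute tolerance.
all_equal_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  if (length(x) < 2L) return(TRUE)   # zero or one element: trivially equal
  max(x) - min(x) < tol
}

all_equal_tol(c(0.3, 0.1 + 0.2, 0.1 * 3))  # TRUE despite rounding differences
all_equal_tol(c(1, 2, 3, 4, 5, 6, 1))      # FALSE
all_equal_tol(rep(2, times = 7))           # TRUE
```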
I use this method, which compares the min and the max, after dividing by the mean:
# Determine if range of vector is FP 0.
zero_range <- function(x, tol = .Machine$double.eps ^ 0.5) {
if (length(x) == 1) return(TRUE)
x <- range(x) / mean(x)
isTRUE(all.equal(x[1], x[2], tolerance = tol))
}
If you were using this more seriously, you'd probably want to remove missing values before computing the range and mean.
You can just check all(v==v[1])
> isTRUE(all.equal( max(y) ,min(y)) )
[1] TRUE
> isTRUE(all.equal( max(x) ,min(x)) )
[1] FALSE
Another along the same lines:
> diff(range(x)) < .Machine$double.eps ^ 0.5
[1] FALSE
> diff(range(y)) < .Machine$double.eps ^ 0.5
[1] TRUE
You can use identical() and all.equal() by comparing the first element to all others, effectively sweeping the comparison across:
R> compare <- function(v) all(sapply( as.list(v[-1]),
+ FUN=function(z) {identical(z, v[1])}))
R> compare(x)
[1] FALSE
R> compare(y)
[1] TRUE
R>
That way you can swap identical() for a tolerance-aware comparison such as all.equal() when you need an epsilon.
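A minimal sketch of the same sweep with a tolerance. identical() has no tolerance argument, so this assumes all.equal() is an acceptable stand-in; note that its tolerance is relative by default, not absolute. compare_tol is a made-up name.

```r
# compare_tol is a hypothetical variant of compare(): every element after the
# first is compared to the first with all.equal() and the given tolerance.
compare_tol <- function(v, tol = sqrt(.Machine$double.eps)) {
  all(vapply(v[-1],
             function(z) isTRUE(all.equal(z, v[1], tolerance = tol)),
             logical(1)))
}

compare_tol(c(1, 1 + 1e-10, 1 - 1e-10))  # TRUE
compare_tol(c(1, 2, 3, 4, 5, 6, 1))      # FALSE
```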
Since I keep coming back to this question over and over, here's an Rcpp solution that will generally be much much faster than any of the R solutions if the answer is actually FALSE (because it will stop the moment it encounters a mismatch) and will have the same speed as the fastest R solution if the answer is TRUE. For example for the OP benchmark, system.time clocks in at exactly 0 using this function.
library(inline)
library(Rcpp)
fast_equal = cxxfunction(signature(x = 'numeric', y = 'numeric'), '
NumericVector var(x);
double precision = as<double>(y);
for (int i = 0, size = var.size(); i < size; ++i) {
if (var[i] - var[0] > precision || var[0] - var[i] > precision)
return Rcpp::wrap(false);
}
return Rcpp::wrap(true);
', plugin = 'Rcpp')
fast_equal(c(1,2,3), 0.1)
#[1] FALSE
fast_equal(c(1,2,3), 2)
#[1] TRUE
I wrote a function specifically for this. It can check not only the elements of a vector, but also whether all elements in a list are identical. Of course, it also handles character vectors and all other types of vector well, and it warns about inputs of length 0 or 1.
all_identical <- function(x) {
if (length(x) == 1L) {
warning("'x' has a length of only 1")
return(TRUE)
} else if (length(x) == 0L) {
warning("'x' has a length of 0")
return(logical(0))
} else {
TF <- vapply(1:(length(x)-1),
function(n) identical(x[[n]], x[[n+1]]),
logical(1))
all(TF)
}
}
Now try some examples.
x <- c(1, 1, 1, NA, 1, 1, 1)
all_identical(x) ## Return FALSE
all_identical(x[-4]) ## Return TRUE
y <- list(fac1 = factor(c("A", "B")),
fac2 = factor(c("A", "B"), levels = c("B", "A"))
)
all_identical(y) ## Return FALSE as fac1 and fac2 have different level order
You do not actually need to use min, mean, or max.
Based on John's answer:
all(abs(x - x[[1]]) < tolerance)
Here is an alternative using the min, max trick, but for a data frame. In the example I am comparing columns, but the margin parameter of apply can be changed to 1 for rows.
valid = sum(!apply(your_dataframe, 2, function(x) diff(c(min(x), max(x)))) == 0)
If valid == 0, then all the elements within each column are the same.
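To see which columns are constant rather than only how many are not, a per-column check can be sketched with vapply. The data frame below is made up, and length(unique()) is swapped in for the min/max trick so that non-numeric columns work too.

```r
# Example data frame, made up for illustration.
df <- data.frame(a = c(1, 1, 1), b = c(1, 2, 3), c = c("x", "x", "x"))

# One TRUE/FALSE per column: TRUE when the column holds a single distinct value.
constant_col <- vapply(df, function(col) length(unique(col)) == 1L, logical(1))
constant_col
#     a     b     c
#  TRUE FALSE  TRUE
```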
Another solution, which uses the data.table package and is compatible with strings and NA, is uniqueN(x) == 1

Count number of vector values in range with R

In R, if you test a condition on a vector instead of a scalar, it will return a vector containing the result of the comparison for each value in the vector. For example...
> v <- c(1,2,3,4,5)
> v > 2
[1] FALSE FALSE TRUE TRUE TRUE
In this way, I can determine the number of elements in a vector that are above or below a certain number, like so.
> sum(v > 2)
[1] 3
> sum(v < 2)
[1] 1
Does anyone know how I can determine the number of values in a given range? For example, how would I determine the number of values greater than 2 but less than 5?
Try
> sum(v > 2 & v < 5)
There are also the %<% and %<=% comparison operators in the TeachingDemos package which allow you to do this like:
sum( 2 %<% x %<% 5 )
sum( 2 %<=% x %<=% 5 )
which gives the same results as:
sum( 2 < x & x < 5 )
sum( 2 <= x & x <= 5 )
Which is better is probably more a matter of personal preference.
Use which:
set.seed(1)
x <- sample(10, 50, replace = TRUE)
length(which(x > 3 & x < 5))
# [1] 6
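The answers above differ mainly in whether the bounds are inclusive and in how NA is treated. A small sketch that makes both choices explicit (count_in_range is a hypothetical helper, and skipping NA is an assumption):

```r
# count_in_range is a hypothetical helper; NA values are skipped.
count_in_range <- function(x, lo, hi, inclusive = FALSE) {
  hit <- if (inclusive) x >= lo & x <= hi else x > lo & x < hi
  sum(hit, na.rm = TRUE)
}

v <- c(1, 2, 3, 4, 5)
count_in_range(v, 2, 5)                    # 2 (the values 3 and 4)
count_in_range(v, 2, 5, inclusive = TRUE)  # 4 (the values 2, 3, 4 and 5)
count_in_range(c(1, NA, 3, 4), 2, 5)       # 2 (the NA is skipped)
```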
