Reduce with less than symbol - r

I never think to use Reduce, but I have a problem I thought it would be good for. I want to check that each successive element of a vector is equal to or larger than the previous element. I can do this with sapply, but my attempt with Reduce fails. How can I do this with Reduce?
#This works
y <- c(1,2,3,2,4,4)
sapply(seq_along(y)[-length(y)], function(i) y[i] <= y[i+1])
#attempts
Reduce('<', c(1,2,3,2,4,4), accumulate = TRUE)
Reduce('<', c(1,2,3,2,4,4))

The diff() function would be a logical choice here (others having explained nicely why Reduce() is not appropriate). It is already set up to compare the differences between elements of a vector and is already vectorised.
> !diff(y) < 0
[1] TRUE TRUE FALSE TRUE TRUE
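A minimal sketch of the same idea, written as diff(y) >= 0 so that operator precedence is not a concern. Wrapping the result in all() answers whether the whole vector is non-decreasing, and base R's is.unsorted() is shown as a cross-check. The name step_ok is only illustrative.

```r
y <- c(1, 2, 3, 2, 4, 4)

# Element-wise: is each element >= the one before it?
step_ok <- diff(y) >= 0
step_ok
# [1]  TRUE  TRUE FALSE  TRUE  TRUE

# Whole-vector answer: is y non-decreasing everywhere?
all(step_ok)
# [1] FALSE

# Cross-check with is.unsorted() (ties count as sorted by default)
!is.unsorted(y)
# [1] FALSE
```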

Desperately bored? I was:
myFun <- function(x,z){
if(is.null(names(z))) names(z) <- z
if(is.null(names(x))) names(x) <- x
if(as.numeric(names(x)) <= as.numeric(names(z))) res <- TRUE else res <- FALSE
names(res) <- names(z)
return(res)
}
as.logical(Reduce(myFun, y, accumulate = TRUE)[-1])
# [1] TRUE TRUE FALSE TRUE TRUE

My understanding from ?Reduce is that Reduce first compares the first and second elements. Since 1 < 2 returns TRUE, that result is reused and compared with the third element, and so on. TRUE is coerced to 1 in a numeric comparison, so from the second step on you are always testing 1 < y[i] for i in 3:length(y), which is always true here. Alternatively you could try:
head(y, -1) <= tail(y, -1)
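To see why the plain Reduce call cannot work, it may help to look at the intermediate values. The sketch below prints them and then shows a pairwise comparison with mapply as one possible way to keep a functional style. This is an illustration under those assumptions, not the approach from any of the answers, and the name pairwise is made up for the example.

```r
y <- c(1, 2, 3, 2, 4, 4)

# Each step compares the PREVIOUS RESULT (TRUE, i.e. 1) with the next element,
# not the previous element with the next element.
acc <- Reduce(`<`, y, accumulate = TRUE)
acc
# [1] 1 1 1 1 1 1   (first entry is y[1]; the rest are TRUE coerced to 1)

# Pairwise comparison of neighbours: apply `<=` to (y[i], y[i + 1])
pairwise <- mapply(`<=`, head(y, -1), tail(y, -1))
pairwise
# [1]  TRUE  TRUE FALSE  TRUE  TRUE
```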

I don't think Reduce can be used here: it will in general end up with something like f(f(x[1],x[2]),x[3]), so your comparison for the third element will be TRUE < 3.
identical(y,sort(y))
would appear to be a more efficient solution if you only need to know whether the whole vector is non-decreasing.

Related

ifelse is acting oddly by giving answer other than designed, in r [duplicate]

These two functions should give similar results, shouldn't they?
f1 <- function(x, y) {
if (missing(y)) {
out <- x
} else {
out <- c(x, y)
}
return(out)
}
f2 <- function(x, y) ifelse(missing(y), x, c(x, y))
Results:
> f1(1, 2)
[1] 1 2
> f2(1, 2)
[1] 1
This is not related to missing, but rather to your wrong use of ifelse. From help("ifelse"):
ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.
The "shape" of your test is a length-one vector. Thus, a length-one vector is returned. ifelse is not just different syntax for if and else.
The same result occurs outside of the function:
> ifelse(FALSE, 1, c(1, 2))
[1] 1
The function ifelse is designed for use with vectorised arguments. With a length-one test, it looks at the first element of test and returns the first element of yes if it is TRUE, or the first element of no if it is FALSE. Here it ignores the trailing elements of no and returns only the first one, which happens to equal the yes value; that is the confusing part. It is clearer what is going on with different arguments:
> ifelse(FALSE, 1, c(2, 3))
[1] 2
> ifelse(c(FALSE, FALSE), 1, c(2,3))
[1] 2 3
It is important to remember that everything (even length 1) is a vector in R, and that some functions deal with each element individually ('vectorised' functions) and some with the vector as a whole.
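A short side-by-side sketch of the two constructs, using made-up values, may make the difference concrete: ifelse() returns something the length of its test, while if/else picks a whole object based on a single condition.

```r
x <- 1
y <- 2
cond <- FALSE

# ifelse(): the result has the length of `test`, so only one element comes back
a <- ifelse(cond, x, c(x, y))
a
# [1] 1

# if / else: a single condition selects a whole object
b <- if (cond) x else c(x, y)
b
# [1] 1 2

# ifelse() with a vector test picks element by element from yes and no
d <- ifelse(c(TRUE, FALSE, TRUE), c(1, 2, 3), c(10, 20, 30))
d
# [1]  1 20  3
```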

Finding if values exists in previous n values of vector r

Am I missing a better way of doing this, or at least a way that allows the window size to vary?
Say I have a vector, v.
v <- c(T,T,F,F,F,F,F,T,T,T,T,F,F,F,F,T,F,F,F,F,F,F,T,F)
I wish to convert this vector such that FALSEs are turned to TRUEs if a TRUE appeared within the previous 3 elements. e.g. the F's at positions 3,4,5 would also switch to T's as there is a T at position 2. The F at position 6 would not.
Solution if only interested in a window of 3:
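library(dplyr) # assumption: lag() here is dplyr::lag(); stats::lag() does not shift a plain vector
vlag1 <- lag(v)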
vlag2 <- lag(vlag1)
vlag3 <- lag(vlag2)
ifelse(v==T|vlag1==T|vlag2==T|vlag3==T,T,F)
Gives the desired result:
TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
But what if we wanted to vary the window to e.g. 4 or 5 - is there a better way?
You could do this with rollapply from the zoo package:
library(zoo)
rollapply(v,
width = 4, align = "right", partial = TRUE,
FUN = function(x) ifelse(TRUE %in% x, TRUE, FALSE))
Note that I have set width = 4 here, not 3. In your question, you said you wanted to check the previous 3 elements. Width includes the ith element. So, if you want to base the result on the previous 3, you have to set width to 4. You also need to include align = "right" to look back from the ith element (the default centers on the ith element, and you can also look ahead with align = "left").
This is a more manual solution:
# Input vector
v <- c(T,T,F,F,F,F,F,T,T,T,T,F,F,F,F,T,F,F,F,F,F,F,T,F)
# Size of the window
k <- 3
# Output vector
outp <- logical(length(v))
for (i in seq_along(v)) {
  # Values in the window: element i and up to k elements before it
  aux <- v[seq(max(1, i - k), i)]
  # TRUE if any value in the window is TRUE
  outp[i] <- any(aux)
}
outp
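The same window check can also be done without a loop. The following is a sketch under the assumption that v is a logical vector with no NA values; the helper name any_in_window is hypothetical. It counts the TRUEs in each window as a difference of cumulative sums.

```r
# Hypothetical helper: TRUE at position i if any of v[(i - k):i] is TRUE
any_in_window <- function(v, k) {
  cs <- cumsum(v)                                # running count of TRUEs
  lagged <- c(rep(0, k + 1), cs)[seq_along(v)]   # count up to position i - k - 1
  (cs - lagged) > 0                              # at least one TRUE in the window?
}

v <- c(T,T,F,F,F,F,F,T,T,T,T,F,F,F,F,T,F,F,F,F,F,F,T,F)
res3 <- any_in_window(v, 3)   # look back 3 elements, as in the question
res5 <- any_in_window(v, 5)   # a wider window needs only a different k
```

Because the window size is an ordinary argument, trying 4 or 5 is a matter of changing k.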
Here is one option with data.table
library(data.table)
n <- 3
r1 <- v | Reduce(`|`, shift(v, seq_len(n), fill = FALSE)) # include v itself, not only its lags
identical(r1, r2)
#[1] TRUE
where 'r2' is the OP's output from ifelse

comparing two numerical variables in R

I have run the R function stl() and use its generated residuals for a Grubbs test. The code is the following:
library(outliers) # provides grubbs.test()
stl.res = stl(dataset, s.window='periodic')
residuals = as.numeric(stl.res$time.series[, "remainder"])
grubbs.result = grubbs.test(residuals)
strsplit(grubbs.result$alternative," ")[[1]][3]
## [1] "38.4000349179783"
outlier = as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3])
outlier
## [1] 38.40003492
which(residuals == outlier)
## integer(0)
My question is why which() returns integer(0). Actually residuals[1920] = 38.4000349179783, so the call to which() should return 1921, not an empty result. I guess this is a problem with precision. I have tried many ways, but have not managed to solve it.
If it's really a precision issue (which would be R FAQ 7.31), then there are various ways to get around it.
# this is an approximation of your test
x <- c(1, 2, 38.4000349179783, 4, 5)
y <- 38.40003492
> x == y
[1] FALSE FALSE FALSE FALSE FALSE
# so which doesn't return anything
# one basic approach
> which(abs(x - y) < .00001)
[1] 3
You could also rig up something using all.equal(), but checking for a difference less than your pre-selected limit is probably easiest.
Maybe you can use isTRUE() and all.equal() as below:
which(sapply(residuals, function(v) isTRUE(all.equal(v, outlier))))
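As a small sketch of the tolerance approach on the approximated data from above: the threshold tol is an assumed value that should be chosen to suit the data, and which.min() is shown as an alternative when only the single closest value is wanted.

```r
x <- c(1, 2, 38.4000349179783, 4, 5)
y <- 38.40003492
tol <- 1e-5   # assumed tolerance; pick one that suits the data

# Indices of all values within tol of the target
near <- which(abs(x - y) < tol)
near
# [1] 3

# Index of the single closest value, with no tolerance to choose
closest <- which.min(abs(x - y))
closest
# [1] 3
```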

Creating a Pure function in R

I'm used to using Pure functions in Mathematica. How might I use them in R? For example:
Given a list of numbers, I want to assign TRUE/FALSE depending on whether the number is positive/negative.
z <- do.call(rnorm,list(n=10)) # Generate 10 numbers
f <- function(x) { x > 0 ? TRUE : FALSE } # Searching for proper syntax
b <- lapply(z,f)
Thanks
Narrowly translated, your function would be:
f <- function(x) { if (x > 0) TRUE else FALSE }
(you don't need to use ifelse() because this is a context in which x will be a scalar (i.e., a length-1 vector))
f <- function(x) { x > 0 }
would give the same result in your lapply call: so would
lapply(z,">",0)
As commented above you could use ifelse(z>0,TRUE,FALSE).
But there's no need to specify logical return values, because the result of z>0 is already a logical vector. The idiomatic way to do this would be
z <- rnorm(10) ## no need for do.call() in this example
z > 0
(logical comparison is vectorized in R)
Really easy:
b = z > 0
Most simple operations in R are already vectorized.
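For cases where a function really is needed, R's closest equivalent of a Mathematica pure function is an anonymous function passed inline. A minimal sketch with fixed, made-up values so the output is reproducible:

```r
z <- c(-1.2, 0.5, 3.1, -0.4)

# Anonymous function passed directly to sapply()
pos <- sapply(z, function(x) x > 0)
pos
# [1] FALSE  TRUE  TRUE FALSE

# Filter() keeps the elements for which the anonymous predicate is TRUE
kept <- Filter(function(x) x > 0, z)
kept
# [1] 0.5 3.1

# From R 4.1.0 on, \(x) x > 0 is shorthand for function(x) x > 0
```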

Test for equality among all elements of a single numeric vector

I'm trying to test whether all elements of a vector are equal to one another. The solutions I have come up with seem somewhat roundabout, both involving checking length().
x <- c(1, 2, 3, 4, 5, 6, 1) # FALSE
y <- rep(2, times = 7) # TRUE
With unique():
length(unique(x)) == 1
length(unique(y)) == 1
With rle():
length(rle(x)$values) == 1
length(rle(y)$values) == 1
A solution that would let me include a tolerance value for assessing 'equality' among elements would be ideal to avoid FAQ 7.31 issues.
Is there a built-in function for this type of test that I have completely overlooked? identical() and all.equal() compare two R objects, so they won't work here.
Edit 1
Here are some benchmarking results. Using the code:
library(rbenchmark)
John <- function() all( abs(x - mean(x)) < .Machine$double.eps ^ 0.5 )
DWin <- function() {diff(range(x)) < .Machine$double.eps ^ 0.5}
zero_range <- function() {
if (length(x) == 1) return(TRUE)
x <- range(x) / mean(x)
isTRUE(all.equal(x[1], x[2], tolerance = .Machine$double.eps ^ 0.5))
}
x <- runif(500000);
benchmark(John(), DWin(), zero_range(),
columns=c("test", "replications", "elapsed", "relative"),
order="relative", replications = 10000)
With the results:
          test replications elapsed relative
2       DWin()        10000 109.415 1.000000
3 zero_range()        10000 126.912 1.159914
1       John()        10000 208.463 1.905251
So it looks like diff(range(x)) < .Machine$double.eps ^ 0.5 is fastest.
Why not simply use the variance:
var(x) == 0
If all the elements of x are equal, you will get a variance of 0.
This works only for doubles and integers, though.
Edit based on the comments below:
A more generic option is to check the number of unique elements in the vector, which must be 1 in this case. This has the advantage that it works with all classes, not just the double and integer types for which a variance can be calculated.
length(unique(x)) == 1
If they're all numeric values and tol is your tolerance, then...
all( abs(y - mean(y)) < tol )
is the solution to your problem.
EDIT:
After looking at this and other answers, and benchmarking a few things, the following comes out over twice as fast as the DWin answer.
abs(max(x) - min(x)) < tol
It is a bit surprising that this is faster than diff(range(x)), since diff shouldn't be much different from - and abs with two numbers, and requesting the range should optimize getting the minimum and maximum. range is a primitive function (diff is a regular generic function). But the timing doesn't lie.
In addition, as @Waldi pointed out, abs is superfluous here.
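Wrapped up as a function, the max/min comparison might look like the sketch below. The name all_equal_within and the default tolerance are assumptions for illustration, and missing values are not handled.

```r
# Hypothetical helper: are all elements of x equal within tol?
all_equal_within <- function(x, tol = sqrt(.Machine$double.eps)) {
  if (length(x) < 2) return(TRUE)   # zero or one element: trivially equal
  (max(x) - min(x)) < tol           # max >= min, so abs() is not needed
}

all_equal_within(c(1, 2, 3, 4, 5, 6, 1))   # FALSE
all_equal_within(rep(2, times = 7))        # TRUE

# Values that differ only by floating-point rounding count as equal here,
# whereas an exact test treats them as different
w <- c(0.1 + 0.2, 0.3)
all_equal_within(w)        # TRUE
length(unique(w)) == 1     # FALSE
```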
I use this method, which compares the min and the max, after dividing by the mean:
# Determine if range of vector is FP 0.
zero_range <- function(x, tol = .Machine$double.eps ^ 0.5) {
if (length(x) == 1) return(TRUE)
x <- range(x) / mean(x)
isTRUE(all.equal(x[1], x[2], tolerance = tol))
}
If you were using this more seriously, you'd probably want to remove missing values before computing the range and mean.
You can just check all(v==v[1])
> isTRUE(all.equal( max(y) ,min(y)) )
[1] TRUE
> isTRUE(all.equal( max(x) ,min(x)) )
[1] FALSE
Another along the same lines:
> diff(range(x)) < .Machine$double.eps ^ 0.5
[1] FALSE
> diff(range(y)) < .Machine$double.eps ^ 0.5
[1] TRUE
You can use identical() or all.equal() by comparing the first element to all others, effectively sweeping the comparison across:
R> compare <- function(v) all(sapply( as.list(v[-1]),
+ FUN=function(z) {identical(z, v[1])}))
R> compare(x)
[1] FALSE
R> compare(y)
[1] TRUE
R>
That way you can add any epsilon as needed, by swapping identical() for isTRUE(all.equal()) with a tolerance.
Since I keep coming back to this question over and over, here's an Rcpp solution that will generally be much much faster than any of the R solutions if the answer is actually FALSE (because it will stop the moment it encounters a mismatch) and will have the same speed as the fastest R solution if the answer is TRUE. For example for the OP benchmark, system.time clocks in at exactly 0 using this function.
library(inline)
library(Rcpp)
fast_equal = cxxfunction(signature(x = 'numeric', y = 'numeric'), '
NumericVector var(x);
double precision = as<double>(y);
for (int i = 0, size = var.size(); i < size; ++i) {
if (var[i] - var[0] > precision || var[0] - var[i] > precision)
return Rcpp::wrap(false);
}
return Rcpp::wrap(true);
', plugin = 'Rcpp')
fast_equal(c(1,2,3), 0.1)
#[1] FALSE
fast_equal(c(1,2,3), 2)
#[1] TRUE
I wrote a function specifically for this, which can check not only the elements of a vector but also whether all elements of a list are identical. It handles character vectors and all other vector types as well. It also warns about inputs of length 0 or 1.
all_identical <- function(x) {
if (length(x) == 1L) {
warning("'x' has a length of only 1")
return(TRUE)
} else if (length(x) == 0L) {
warning("'x' has a length of 0")
return(logical(0))
} else {
TF <- vapply(1:(length(x)-1),
function(n) identical(x[[n]], x[[n+1]]),
logical(1))
if (all(TF)) TRUE else FALSE
}
}
Now try some examples.
x <- c(1, 1, 1, NA, 1, 1, 1)
all_identical(x) ## Return FALSE
all_identical(x[-4]) ## Return TRUE
y <- list(fac1 = factor(c("A", "B")),
fac2 = factor(c("A", "B"), levels = c("B", "A"))
)
all_identical(y) ## Return FALSE as fac1 and fac2 have different level order
You do not actually need to use min, mean, or max.
Based on John's answer:
all(abs(x - x[[1]]) < tolerance)
Here is an alternative using the min/max trick, but for a data frame. In the example I am comparing columns, but the margin parameter of apply can be changed to 1 for rows.
valid = sum(!apply(your_dataframe, 2, function(x) diff(c(min(x), max(x)))) == 0)
If valid == 0, then all the elements within each column are the same.
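One caveat with apply() is that it first converts the data frame to a matrix, so a data frame with mixed column types is coerced to character and diff() no longer works. A sketch that checks each column separately instead, using hypothetical example data and the unique() test from earlier:

```r
# Hypothetical example data with numeric and character columns
df <- data.frame(a = c(1, 1, 1), b = c(1, 2, 3), c = c("x", "x", "x"))

# One logical per column: does the column hold a single distinct value?
constant_cols <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
constant_cols
#     a     b     c 
#  TRUE FALSE  TRUE 

# TRUE only if every column is constant
all(constant_cols)
# [1] FALSE
```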
Another solution, which uses the data.table package and is compatible with strings and NA, is uniqueN(x) == 1.

Resources