Finding extreme values in vector elements in R

I have a vector like this:
x<-c(-0.193,-0.126,-0.275,-0.375,-0.307,-0.347,-0.159,-0.268,-0.013,0.070,0.346,
0.376,0.471,0.512,0.291,0.554,0.185,0.209,0.057,0.058,-0.157,-0.291,-0.509,
-0.534,-0.239,-0.389,0.060,0.250,0.279,0.116,0.052,0.201,0.407,0.360,0.065,
-0.167,-0.572,-0.984,-1.044,-1.039,-0.831,-0.584,-0.425,-0.362,-0.154,0.207,
0.550,0.677,0.687,0.856,0.683,0.375,0.298,0.581,0.546,0.098,-0.081)
I would like to find the position of the lowest number each time 5 or more consecutive values are less than -0.5. In the example that is the value -1.044.
How do I find this?
What I have done is this:
xx<-ifelse(x>.5,1,NA)
xx
aa<-rle(xx)
zz <- rep(FALSE, length(xx))
zz[sequence(aa$lengths) == 1] <- aa$lengths >= 5 & aa$values == 1
zz
But then I just find the position of the first value and not the extreme.
Any help?

Thanks for posting what you've tried.
I'd just use a logical comparison for xx:
xx <- x < -0.5
Then your rle logic becomes:
aa <- rle(xx)
zz <- aa$lengths >= 5 & aa$values
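From there, identify which values of zz are TRUE and use cumsum to get the indices of x (this is oversimplified since there is only one instance, but you get the picture):
first <- which(zz)
idxs <- cumsum(aa$lengths[1:first])
start <- idxs[first - 1] + 1  # the run starts one past the end of the previous run
min(x[start:idxs[first]])     # which.min(x[start:idxs[first]]) + start - 1 gives the position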
In the instance where you have multiple matches, first will be a vector with length > 1. In that case, write a function and apply it to each element of first:
myfun <- function(y) {
  idxs <- c(0, cumsum(aa$lengths[1:y]))
  # idxs[y] is the last index of the previous run, so start one past it
  min(x[(idxs[y] + 1):idxs[y + 1]])
}
set.seed(20)
x <- rnorm(100)
xx <- x < -0.5
aa <- rle(xx)
zz <- aa$lengths >= 3 & aa$values
first <- which(zz)
sapply(first, myfun)
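Since the question asks for the position rather than the value, here is a minimal sketch of the same rle idea that returns positions for every qualifying run. The helper name run_min_pos is made up for illustration; it is not part of the answer above.

```r
# Hypothetical helper: position of the minimum within every run of at least
# `n` consecutive values below `threshold`.
run_min_pos <- function(x, threshold, n) {
  r <- rle(x < threshold)
  ends <- cumsum(r$lengths)        # last index of each run
  starts <- ends - r$lengths + 1   # first index of each run
  keep <- which(r$values & r$lengths >= n)
  # which.min() is relative to the run, so shift it by the run's start
  vapply(keep, function(k) starts[k] + which.min(x[starts[k]:ends[k]]) - 1,
         numeric(1))
}

# The vector from the question
x <- c(-0.193,-0.126,-0.275,-0.375,-0.307,-0.347,-0.159,-0.268,-0.013,0.070,0.346,
       0.376,0.471,0.512,0.291,0.554,0.185,0.209,0.057,0.058,-0.157,-0.291,-0.509,
       -0.534,-0.239,-0.389,0.060,0.250,0.279,0.116,0.052,0.201,0.407,0.360,0.065,
       -0.167,-0.572,-0.984,-1.044,-1.039,-0.831,-0.584,-0.425,-0.362,-0.154,0.207,
       0.550,0.677,0.687,0.856,0.683,0.375,0.298,0.581,0.546,0.098,-0.081)

pos <- run_min_pos(x, -0.5, 5)
pos      # 39
x[pos]   # -1.044
```

Computing starts and ends for all runs up front avoids the special case for a qualifying run at the very beginning of the vector.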

A function with the apply function inside:
find.val <- function(x, threshold, n, all = TRUE) {
  tmp <- rle(x < threshold)
  cs <- cumsum(tmp$lengths)
  # first and last index of every run
  dfcs <- data.frame(indices = c(0, cs[-length(cs)]) + 1, l = cs)
  # position of the minimum within each run, kept only for the qualifying runs
  pos <- apply(dfcs, 1, function(y) which.min(x[y[1]:y[2]]) + y[1] - 1)[tmp$values & tmp$lengths >= n]
  if (all) return(pos)
  pos[which.min(x[pos])]
}
If you set all=TRUE you get the positions of all matches; otherwise you get only the position of the lowest match.
Example:
find.val(x, -0.5, 5, all = TRUE)

Related

Conditional removal of rows in R

Consider the following matrix:
testMat <- matrix(c(1,2,1,3,
3,2,3,3,
1,3,1,3,
2,3,3,3,
3,3,3,3,
3,2,3,1), byrow = T, ncol = 4)
and following condition:
cond <- c(0,0) # binary
Problem: If cond[1] is 0 and there is a 1 in either the first or third column, the corresponding rows will be removed. Similarly if cond[2] is 0 and there is a 1 in either the second or fourth column, the corresponding rows will be removed. For example the new matrix will be:
newMat <- testMat[-c(1,3,6),] # for cond <- c(0,0)
newMat <- testMat[-c(1,3),] # for cond <- c(0,1)
newMat <- testMat[-c(6),] # for cond <- c(1,0)
newMat <- testMat # for cond <- c(1,1)
I tried the following, which is both wrong and clumsy.
newMat <- testMat[-(cond[1] == 0 & testMat[,c(1,3)] == 1),]
newMat <- newMat[-(cond[2] == 0 & newMat[,c(2,4)] == 1),]
Can you help to find a base R solution?
This is ugly, but seems to work:
(generalized for length of cond, assuming that the matrix has length(cond)*2 columns)
keepRows <- apply(testMat, 1,
  function(rw, cond) {
    # one flag per element of cond
    logicrow <- logical(length(cond))
    for (b in seq_along(cond)) {
      logicrow[b] <- ifelse(cond[b] == 0, all(rw[b] != 1) & all(rw[length(cond) + b] != 1), TRUE)
    }
    all(logicrow)
  }, cond = cond)
newMat <- testMat[keepRows, ]
(edited according to comment)
Assuming 1) cond can be of arbitrary length, 2) testMat has an even number of columns, and 3) the rule is to look at the i-th and (i+2)-th column of testMat
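cond <- c(0, 0)
unlist(
  sapply(
    seq_along(cond),
    function(i) {
      # number of 1s per row in the two columns tied to cond[i]
      j <- rowSums(testMat[, c(i, i + 2)] == 1)
      # nrow(testMat) + 1 is a dummy out-of-range index, so negative indexing still works when nothing matches
      if (cond[i] == 0 && sum(j) > 0) which(j > 0) else nrow(testMat) + 1
    }
  )
)
[1] 1 3 6
This returns the rows that satisfy your conditions; you can then remove them:
testMat[-.Last.value, ]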
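For comparison, a loop-free base R sketch of the same rule. It assumes, as the first answer does, that cond[i] governs column i and column i + length(cond); the helper name drop_rows is made up for illustration.

```r
testMat <- matrix(c(1,2,1,3,
                    3,2,3,3,
                    1,3,1,3,
                    2,3,3,3,
                    3,3,3,3,
                    3,2,3,1), byrow = TRUE, ncol = 4)

# Hypothetical helper: drop rows that contain a 1 in a column pair whose cond is 0
drop_rows <- function(mat, cond) {
  k <- length(cond)
  is_one <- mat == 1
  # hit[, i] is TRUE when a row has a 1 in column i or in column i + k
  hit <- is_one[, seq_len(k), drop = FALSE] | is_one[, k + seq_len(k), drop = FALSE]
  # only the column pairs with cond == 0 can trigger a removal
  remove <- rowSums(hit[, cond == 0, drop = FALSE]) > 0
  mat[!remove, , drop = FALSE]
}

drop_rows(testMat, c(0, 0))  # drops rows 1, 3 and 6
drop_rows(testMat, c(1, 1))  # drops nothing
```

Indexing with a logical keep/remove vector sidesteps the problem that negative indexing with an empty index would return no rows at all.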

Comparing partitions from split() using a nested for loop containing an if statement

Consider the below MWE that splits a distance matrix and attempts to compare partitions:
set.seed(1234) # set random seed for reproducibility
# generate random normal variates
x <- rnorm(5)
y <- rnorm(5)
df <- data.frame(x, y) # merge vectors into dataframe
d <- dist(x) # generate distance matrix
splt <- split(d, 1:5) # split the 10 distances into 5 partitions of 2 values each
# compare partitions
for (i in 1:length(splt)) {
  for (j in 1:length(splt)) {
    if(splt[[i]] != splt[[j]]) {
      a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
      b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
    }
  }
}
# Error in if (splt[[i]] != splt[[j]]) { : the condition has length > 1
The above for loop should compare all unique partitions (i.e., (1, 2), (1, 3), ... ,(4, 5)). However, R complains that the condition has length greater than 1.
The result for comparing partition 1 (splt[[1]]) and partition 2 (splt[[2]]) for instance should be a = b = 1.
a <- length(which(splt[[1]] >= min(splt[[2]]))) / length(splt[[1]])
b <- length(which(splt[[2]] <= max(splt[[1]]))) / length(splt[[2]])
I know the solution is to instead use ifelse() but there is no else within the nested loop.
Any ideas on how to proceed?
Is your problem the error message? That is, why R does not like your comparison splt[[i]] != splt[[j]]? The reason is that the comparison returns a vector, one result per element, while if() expects a single TRUE or FALSE:
> splt[[1]] != splt[[2]]
[1] TRUE TRUE
If I understand you correctly, splt[[i]] is equal to splt[[j]] if all entries are equal and different otherwise. If so, change the comparison to be !(all(splt[[i]] == splt[[j]])).
The total loop looks like this:
for (i in 1:length(splt)) {
  for (j in 1:length(splt)) {
    if (!(all(splt[[i]] == splt[[j]]))) {
      a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
      b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
    }
  }
}
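Note that the loop above visits both (i, j) and (j, i) and overwrites a and b on every pass. If only the unique pairs are wanted and the results should be kept, one possible sketch uses combn(); the names pairs, res and out are just for illustration.

```r
set.seed(1234)
x <- rnorm(5)
d <- dist(x)
splt <- split(d, 1:5)

# each column of `pairs` is one unordered pair (i, j) with i < j
pairs <- combn(length(splt), 2)
res <- apply(pairs, 2, function(p) {
  si <- splt[[p[1]]]
  sj <- splt[[p[2]]]
  # mean() of a logical vector is the proportion of TRUE values,
  # i.e. the same as length(which(...)) / length(...)
  c(i = p[1], j = p[2],
    a = mean(si >= min(sj)),
    b = mean(sj <= max(si)))
})
out <- t(res)  # one row per pair, columns i, j, a, b
out
```

Because i < j in every pair, no partition is compared with itself, so the if() test is no longer needed.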

Remove a vector from another vector

I would like to remove from the vector wine below the vector b=c(1,0).
The result should be d=c(1,1,0).
library(gtools)
wine=c(1,1,1,0,0)
x=combinations(5,2,v=wine,set=FALSE,repeats.allowed=FALSE)
y=matrix(NA,nrow(x),3)
I want to find the complementary matrix y of x.
Thanks for your time.
The following uses a function I have posted here. The function finds where in y the vector x occurs returning an index vector into y.
First, get where b occurs in wine. Then the location is used to remove the found vector.
occurs <- function(x, y) {
  m <- length(x)
  n <- length(y)
  # every start position in y where x could still fit
  candidate <- seq.int(length.out = n - m + 1L)
  for (i in seq.int(length.out = m)) {
    # keep only the candidates whose i-th element matches x[i]
    candidate <- candidate[x[i] == y[candidate + i - 1L]]
  }
  candidate
}
wine <- c(1,1,1,0,0)
b <- c(1,0)
i <- occurs(b, wine)
d <- wine[-(i + seq_along(b) - 1L)]
d
#[1] 1 1 0
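One thing to watch with negative indexing: if b does not occur at all, the index is empty and wine[-integer(0)] returns an empty vector rather than wine unchanged. A minimal sketch that guards against this and removes only the first occurrence; remove_first is a hypothetical name, not the function linked above.

```r
# Hypothetical helper: remove the first occurrence of b from v, if there is one
remove_first <- function(b, v) {
  m <- length(b)
  if (m == 0 || m > length(v)) return(v)
  for (s in seq_len(length(v) - m + 1)) {
    window <- s:(s + m - 1)
    if (all(v[window] == b)) return(v[-window])
  }
  v  # no occurrence found: return the input untouched
}

remove_first(c(1, 0), c(1, 1, 1, 0, 0))  # 1 1 0
remove_first(c(2, 2), c(1, 1, 1, 0, 0))  # 1 1 1 0 0
```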

How to condition a computation and then add all computations done in R?

I am experimenting with R and I can't find a way to do the following:
1- If x == 3, multiply x by the "y" value of the same row
2- Add up all the computations done in step 1.
x <- 3426278722533992028364647392927338
y <- 7479550949037487987438746984798374
x <- as.numeric(strsplit(as.character(x), "")[[1]])
y <- as.numeric(strsplit(as.character(y), "")[[1]])
Table <- table(x,y)
Table <- data.frame(Table)
Table$Freq <- NULL
So I tried creating a function:
Calculation <- function (x,y) {
z <- if(x == 3){ x * y }
w <- sum(z)
}
x and y are the columns of the data.frame.
This prints an error which I struggle to solve...
Thanks for your time,
Kylian Pattje
2 things here:
1. Use ifelse in your function,
Calculation <- function (x,y) {
  z <- ifelse(x == 3, x * y, NA)
  w <- sum(z, na.rm = TRUE)
  return(w)
}
2. Make sure your variables are NOT factors,
Table[] <- lapply(Table, function(i) as.numeric(as.character(i)))
Calculation(Table$x, Table$y)
#[1] 84
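To see why both points matter, here is a small sketch on made-up vectors (the data below are hypothetical, not the Table from the question). It shows what as.numeric() does to a factor, and that the ifelse() version is equivalent to plain logical subsetting.

```r
# as.numeric() on a factor returns the internal level codes, not the labels
f <- factor(c("3", "7", "9"))
as.numeric(f)                 # 1 2 3
as.numeric(as.character(f))   # 3 7 9

# hypothetical data: multiply x by y wherever x == 3, then add the products
x <- c(3, 1, 3, 2)
y <- c(4, 5, 6, 7)
total <- sum(ifelse(x == 3, x * y, NA), na.rm = TRUE)
total                         # 3*4 + 3*6 = 30

# the same result with logical subsetting
sum(x[x == 3] * y[x == 3])    # 30
```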

Overlay rasters in r to meet three different conditions using ifelse

I'm trying to run an overlay function with rasters where I want to meet all of 3 different conditions at each cell (using == and > or < operators) and produce a single raster as output.
Running ifelse with the & operator seems to look at the conditions in a linear fashion from left to right - If the first two conditions are met then it will produce the if condition as output, regardless of the third condition. && can't be used here because the result is not vectorized.
You can see this in the example below: the resulting raster makes it look as if not all three conditions are being evaluated. The first clue is that it produces results even where the third raster is NA.
I think I can get the result I want by first checking the condition of x and y and then with this result separately checking the condition of z with a different function, but I was hoping to be able to do it all in one function (seems like this should be possible, at least).
Hoping someone can point me in the right direction.
library(raster)
fn <- system.file("external/test.grd", package="raster")
s <- stack(fn, fn,fn)
#Create grids
s[[1]] <- round(runif(ncell(s), 1, 2))
s[[2]] <- round(runif(ncell(s), 1, 2))
s[[3]] <- round(runif(ncell(s), 1, 2))
#convert some values in s[[3]] to NA
s[[3]][s[[3]] == 1]<- NA
#run overlay function
result.rast <- overlay(s[[1]], s[[2]], s[[3]], fun = function(x, y, z) {
  ifelse(x == 2 & y == 1 & z == 2, 1, 0)
})
I do not see evidence for the third condition not being used. NA values are a special case. See function f2 for some things you can do.
It is easier to see what's going on with a smaller raster
library(raster)
set.seed(0)
r <- raster(ncol=10, nrow=10, xmn=0, xmx=10, ymn=0, ymx=10)
r1 <- setValues(r, round(runif(ncell(r), 1, 2)))
r2 <- setValues(r, round(runif(ncell(r), 1, 2)))
r3 <- setValues(r, round(runif(ncell(r), 1, 2)))
r3[r3 == 1] <- NA
s <- stack(r1, r2, r3)
res1 <- overlay(s, fun = function(x, y, z) {
  ifelse(x == 2 & y == 1 & z == 2, 1, 0)
})
# A more complex function that keeps NAs
f2 <- function(x, y, z) {
  a <- rep(0, length(x))
  a[x == 2 & y == 1 & z == 2] <- 1
  a[is.na(x) | is.na(y) | is.na(z)] <- NA
  a
}
res2 <- overlay(s, fun = f2)
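The NA behaviour can be checked without any rasters. The sketch below uses plain made-up vectors to show R's three-valued logic: FALSE & NA is FALSE, so a cell that already fails on x or y gets 0 even when z is NA, while TRUE & TRUE & NA stays NA.

```r
# hypothetical cell values for three layers
x <- c(2, 2, 1, 2)
y <- c(1, 1, 1, 2)
z <- c(2, NA, NA, NA)

cond <- x == 2 & y == 1 & z == 2
cond                          # TRUE NA FALSE FALSE

ifelse(cond, 1, 0)            # 1 NA 0 0

# to treat "unknown" as "condition not met", test against TRUE explicitly
ifelse(cond %in% TRUE, 1, 0)  # 1 0 0 0
```

So a 0 in a cell where the third layer is NA does not mean the third condition was skipped; it means an earlier condition had already decided the result.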
Did you try stackApply?
You can also use each layer of the raster stack as vectors.
Here is an example (there might be a better way to reference the cells in the RasterStack, though):
tt <- raster(ncol=4,nrow=5)
tt[] <- 1
tts <- stack(tt,tt,tt)
tts[[1]][4,2]<-NA
# now the condition
tt2 <- (tts[[1]] == 1 & tts[[2]] == 1 & tts[[3]] == 1)
plot(tt2)

Resources