R: which() with vector in condition

I have data
test <- 1:10
and I would like to obtain the indices of test that fulfill different related conditions. For example,
which(test>5)[1]
which(test>8)[1]
which(test>9)[1]
yield
[1] 6
[1] 9
[1] 10
when carried out individually, but is there a way to execute them simultaneously using a vector like
bounds <- c(5,8,9)
That then yields a vector containing the indices for each value in bounds?

A couple of options are
findInterval(bounds, test) + 1
#[1] 6 9 10
which is the fastest, or
max.col(outer(bounds, test, `<`), 'first')
#[1] 6 9 10
which is the slowest, along with the option suggested in a comment below the OP's post:
sapply(bounds, function(x) which(test > x)[1])
#[1] 6 9 10
which is neither the fastest nor the slowest.
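If you want to check those timing claims on your own data, here is a rough benchmarking sketch (it assumes the microbenchmark package is installed; the rankings can shift with the sizes of test and bounds):
library(microbenchmark)
test <- 1:10
bounds <- c(5, 8, 9)
microbenchmark(
  findInterval = findInterval(bounds, test) + 1,
  max.col      = max.col(outer(bounds, test, `<`), 'first'),
  sapply       = sapply(bounds, function(x) which(test > x)[1])
)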

Just use sapply:
sapply(bounds, function(x) which(test>x)[1])
[1] 6 9 10

concatenate vectors from two lists by names [duplicate]

So I'm heavily simplifying my actual problem, but I am trying to find a way to append values inside vectors from one list to values in vectors in another list, and to do it by name (assuming the two lists are not ordered). So this is the setup to the problem (the numbers themselves are arbitrary here):
Data1 <- list( c(1),c(2),c(3))
names(Data1) <- c("A", "B","C")
Data2 <- list(c(11), c(12), c(13))
names(Data2) <- c("B","A","C")
Now what I'm trying to do is find a way to get a third list, say Data3, so that calling Data3[["A"]] will give me the same result as calling c(1, 12):
[1] 1 12
so entering Data3 at the console should give:
[1] 1 12
[2] 2 11
[3] 3 13
Essentially I'm looking to append many values from one list of vectors to another list of vectors, and do it by names rather than order, if that makes sense. (I did think about trying some loops, but I feel like there should be a simpler way.)
nm = names(Data1)
setNames(lapply(nm, function(x){
c(Data1[[x]], Data2[[x]])
}), nm)
#$A
#[1] 1 12
#$B
#[1] 2 11
#$C
#[1] 3 13
list(do.call("cbind", list(Data1, Data2)))
[,1] [,2]
A 1 11
B 2 12
C 3 13
If you don't mind your output to be a dataframe:
Data3 <- rbind(data.frame(Data1), data.frame(Data2))
Then Data3[["A"]] will give you:
[1] 1 12
We can use Map and arrange the elements of Data2 in the same order as Data1 (or vice versa) using names and then concatenate them.
Map(c, Data1, Data2[names(Data1)])
#$A
#[1] 1 12
#$B
#[1] 2 11
#$C
#[1] 3 13
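If the two lists might not share exactly the same set of names, a small defensive check before combining (not part of the original answers, just a sketch) avoids silently misaligning elements:
stopifnot(setequal(names(Data1), names(Data2)))  # both lists must carry the same names
Data3 <- Map(c, Data1, Data2[names(Data1)])
Data3[["A"]]
#[1] 1 12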

Count if a particular element is within a given range?

I have a vector made up of lists of length 10.
I have two other vectors storing their lower and upper quantiles.
Is there a way to extract the data between the quantile for each list of 10?
Basically I am looking to see how many of these have a specific number.
sims is the vector with the data (each element is a list).
So far I have tried to use %in%:
for (i in 1:100){
a <- 80.0 %in% sims[[i]]
}
I was going to check how many of these are TRUE and then count them; however, this only returns FALSE and also doesn't guarantee that the value is within the range.
Is there an easier way than sorting each list, extracting the relevant data, and then checking whether it has the value?
Since you don't provide a sample dataset, here is a reproducible example based on some sample data I generate:
set.seed(2018)
lst <- replicate(4, sample(10), simplify = FALSE)
qrt <- lapply(lst, quantile, probs = c(0.25, 0.75))
Here I've generated the 25% and 75% quantiles for every vector in lst; the result is a list with as many elements as lst.
We can now use Map to select only those entries from the list elements that fall within the quantile range
Map(function(x, y) x[x >= y[1] & x <= y[2]], lst, qrt)
#[[1]]
#[1] 4 5 7 6
#
#[[2]]
#[1] 4 6 5 7
#
#[[3]]
#[1] 6 5 4 7
#
#[[4]]
#[1] 4 7 6 5
To count the number of elements within the quantile range
Map(function(x, y) sum(x >= y[1] & x <= y[2]), lst, qrt)
#[[1]]
#[1] 4
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 4
#
#[[4]]
#[1] 4
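If you would rather have the counts as a plain numeric vector than as a list, mapply (the simplifying counterpart of Map) is a small variation on the same idea; this sketch reuses the lst and qrt objects from above:
counts <- mapply(function(x, y) sum(x >= y[1] & x <= y[2]), lst, qrt)
counts
#[1] 4 4 4 4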

find indices of values within tolerance range in R

say I have vector x
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol <- 0.4
how do I get the indices of the groups of elements that are 'unique' within the tolerance range (tol) as in the list below. I don't know how many of these groups there are beforehand.
[[1]]
[1] 1 2 3 5
[[2]]
[1] 4 6
[[3]]
[1] 7
thanks
Not 100% reliable, since it uses unique on lists, but you can try:
unique(apply(outer(x,x,function(a,b) abs(a-b)<tol),1,which))
#[[1]]
#[1] 1 2 3 5
#
#[[2]]
#[1] 4 6
#
#[[3]]
#[1] 7
The point #Roland raised in the comments showed that there is some ambiguity in your requirements. For instance, if x <- c(1, 1.3, 1.6), my line gives three groups: 1-2, 2-3 and 1-2-3. This is because, from 1's point of view, it is similar only to 1.3, but from 1.3's point of view, it is similar to both 1 and 1.6.
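To see that ambiguity concretely, here is what the same one-liner returns for that small example (a quick illustration, reusing tol <- 0.4):
x <- c(1, 1.3, 1.6)
unique(apply(outer(x, x, function(a, b) abs(a - b) < tol), 1, which))
#[[1]]
#[1] 1 2
#
#[[2]]
#[1] 1 2 3
#
#[[3]]
#[1] 2 3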
An alternative using nn2 from RANN to find nearest neighbors within radius for clustering:
library(RANN)
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol <- 0.4
nn <- nn2(x,x,k=length(x),searchtype="radius",radius=tol)
m <- unique(apply(nn$nn.idx,1,sort), MARGIN=2)
sapply(seq_len(ncol(m)), function(i) m[which(m[,i] > 0),i])
##[[1]]
##[1] 1 2 3 5
##
##[[2]]
##[1] 4 6
##
##[[3]]
##[1] 7
x <- c(1, 1.3, 1.6)
nn <- nn2(x,x,k=length(x),searchtype="radius",radius=tol)
m <- unique(apply(nn$nn.idx,1,sort), MARGIN=2)
sapply(seq_len(ncol(m)), function(i) m[which(m[,i] > 0),i])
##[[1]]
##[1] 1 2
##
##[[2]]
##[1] 1 2 3
##
##[[3]]
##[1] 2 3
Notes:
The call to nn2 finds all nearest neighbors for each element of x with respect to all elements of x within a radius equal to tol. The result nn$nn.idx is a matrix whose rows contain the indices that are nearest neighbors for each element in x. The matrix is dense and filled with zeroes as needed.
Clustering is performed by sorting each row so that unique rows can be extracted. The output m is a matrix where each column contains the indices in a cluster. Again, this matrix is dense and filled with zeroes as needed.
The resulting list is extracted by subsetting each column to remove the zero entries.
This is likely more efficient for large x because nn2 uses a KD-Tree, but it suffers from the same issue for elements that overlap (with respect to the tolerance) as pointed out by nicola.
Maybe it's a hammer to kill a mosquito, but I thought of univariate density clustering: the dbscan library enables you to do exactly that:
library(dbscan)
groups <- dbscan(as.matrix(x), eps=tol, minPts=1)$cluster
#### [1] 1 1 1 2 1 2 3
You don't need to know the number of groups in advance.
It gives you the cluster number as output, but if you prefer, you can take the group means and round them to the closest integer. Once you've got this, you can generate the list, for instance, like this:
split(seq_along(x), groups)
#### $`1`
#### [1] 1 2 3 5
#### ...
Edit: Behaviour with overlapping:
This algorithm attributes the same group to all elements that are within the tolerance range of one another (it works by proximity). So you might end up with fewer groups than expected if there is overlapping.
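For instance, with the overlapping example discussed above, dbscan chains the points together and reports a single cluster (a quick check; the cluster labels themselves are arbitrary):
dbscan(as.matrix(c(1, 1.3, 1.6)), eps=0.4, minPts=1)$cluster
#### [1] 1 1 1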
Here is another attempt with the cut function from base R. We first create the range vector named sq and then go through the elements of x that fall within each specific range.
sq <- seq(min(x)-tol,max(x)+tol*2,tol*2)
# [1] 0.6 1.4 2.2 3.0
sapply(1:(length(sq)-1), function(i) which(!is.na(cut(x, breaks =c(sq[i], sq[i+1])))))
# [[1]]
# [1] 1 2 3 5
# [[2]]
# [1] 4 6
# [[3]]
# [1] 7
It does not produce any duplicates (no need to use unique, as is the case for #nicola's answer).
It works as follows: in sapply, we first search for elements within the range [0.6, 1.4], then within [1.4, 2.2], and finally within [2.2, 3.0].
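A closely related variant (just a sketch, reusing the same sq breaks) passes all the breaks to cut at once and lets split do the grouping; any empty interval would simply show up as a zero-length element:
split(seq_along(x), cut(x, breaks = sq))
# $`(0.6,1.4]`
# [1] 1 2 3 5
# $`(1.4,2.2]`
# [1] 4 6
# $`(2.2,3]`
# [1] 7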

Splitting a vector into several chunks and accessing each chunk

My question is a continuation of this:
Split a vector into chunks
What would be the best possible way to access all these chunks? For example, is there an easy way to access these mini-vectors if I have around a hundred of them? I need to find the minimum of each of these chunks and store the results in a new vector.
Look at the plyr package; there is a family of functions to process lists or vectors. In the post you mentioned, you see lists. Thus, use llply to take a list as input and return a list as output; for vectors, aaply is your choice.
# Examples from ?lapply
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
> x
$a
[1] 1 2 3 4 5 6 7 8 9 10
$beta
[1] 0.04978707 0.13533528 0.36787944 1.00000000 2.71828183 7.38905610 20.08553692
$logic
[1] TRUE FALSE FALSE TRUE
llply(x, mean)
llply(x, quantile, probs = 1:3/4)
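If you prefer to stay in base R, the same pattern works with split and sapply. Here is a small sketch of the task in the question (chunking a vector and taking each chunk's minimum), assuming chunks of size 10:
x <- 1:100
chunks <- split(x, ceiling(seq_along(x) / 10))  # list of ten chunks of length 10
sapply(chunks, min)                             # minimum of each chunk, as a named vector
#  1  2  3  4  5  6  7  8  9 10
#  1 11 21 31 41 51 61 71 81 91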

seq and seq_along, best of both worlds?

If I want to number all elements in two vectors, where vector 1 gets all odd numbers and vector 2 gets all even numbers, I can do this, assuming the vectors are of length 10:
seq(1, 10, by=2)
[1] 1 3 5 7 9
seq(2, 11, by=2)
[1] 2 4 6 8 10
but if my vector has only one element I will run into problems:
seq(2)
[1] 1 2
so I use:
seq_along(2)
[1] 1
BUT I can't use by= in seq_along(). How do I get the reliability of seq_along() with the functionality of seq()?
This example might clear things.
Imagine I have two lists:
list1 <- list(4)
list2 <- list(4)
list1 must get even names along the element of the list.
list2 must get odd names along the element of the list.
I don't know how long the list elements will be.
seq_along(list1[[1]]) # this will know to only give one name but I can't make it even
seq(list2[[1]]) # this knows to give 1 name
#and
seq(2, list1[[1]], by=2) # this gives me even numbers but too many names
Here's a function that adds a 'by' argument to seq_along:
seq_along_by = function(x, by=1L, from = 1L) (seq_along(x) - 1L) * by + from
and some test cases
> seq_along_by(integer(), 2L)
integer(0)
> seq_along_by(1, 2L)
[1] 1
> seq_along_by(1:4, 2L)
[1] 1 3 5 7
> seq_along_by(1:4, 2.2)
[1] 1.0 3.2 5.4 7.6
> seq_along_by(1:4, -2.2)
[1] 1.0 -1.2 -3.4 -5.6
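Applied to list1 and list2 from the question, this gives even numbering for list1 and odd numbering for list2, no matter how long the elements are (a quick usage sketch):
> seq_along_by(list1[[1]], 2L, from = 2L)
[1] 2
> seq_along_by(list2[[1]], 2L)
[1] 1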
One way I just found is:
y <- seq_along(1:20)
y[y %% 2 == 0 ]
[1] 2 4 6 8 10 12 14 16 18 20
y[ !y %% 2 == 0 ]
[1] 1 3 5 7 9 11 13 15 17 19
But this will only work when my vectors are of even length. I must be able to do better.
I'm not sure what you are trying to do, but if you want to split odd and even elements in a vector, you can do just that:
x <- 1:19
split(x,x%%2)
$`0`
[1] 2 4 6 8 10 12 14 16 18
$`1`
[1] 1 3 5 7 9 11 13 15 17 19
To extract the odd and even numbered elements, use lapply on this list using seq_along to enumerate the element numbers:
x <- rep(c("odd","even"),times=4)
lapply(split(seq_along(x),seq_along(x)%%2),function(y) "["(x,y))
$`0`
[1] "even" "even" "even" "even"
$`1`
[1] "odd" "odd" "odd" "odd"
This can of course be made into a function:
split_oe <- function(x) lapply(split(seq_along(x),seq_along(x)%%2),function(y) "["(x,y))
split_oe(1:10)
$`0`
[1] 2 4 6 8 10
$`1`
[1] 1 3 5 7 9
> split_oe(2)
$`1`
[1] 2
I'm adding another answer to address what may be your intent of the question rather than the question as you've stated it.
Let's assume you have a couple arrays, A1 and A2, with values, and you want to link an index to those values, so you can say index[n] and get a corresponding value from A1[n/2 + 1] if n is odd and A2[n/2] if n is even.
We would build a new vector, index, like so:
# Sample arrays
A1 <- sample(LETTERS, 5, rep=TRUE)
A2 <- sample(LETTERS, 5, rep=TRUE)
n_Max <- length(c(A1,A2))
index <- integer(n_Max)
index[seq(1,n_Max,by=2)] <- A1
index[seq(2,n_Max,by=2)] <- A2
Now, index[n] returns A1 values when n is odd, and returns A2 values when n is even. This breaks if length(A2) is not equal to or one less than length(A1).
If I understand correctly, what you really want is to get the seq function to return only odd or even numbers, 1..max or 2..max, respectively. You would write that like so:
seq(1, max, by=2) # Odd numbers
seq(2, max, by=2) # Even numbers
Where max is the top number in your series. The only time this will break is if max is less than 2.
Update 1: There seems to be a bit of discussion about what the OP is requesting. If we assume there are two existing vectors to be numbered, we can obtain the total number of vector items using max <- length(c(vector1, vector2)), which gives the maximum number being used. Then, the indices would be assigned like so:
vector1 <- seq(1, max, by=2)
vector2 <- seq(2, max, by=2)
And this will work for any set EXCEPT when one vector does not have any elements at all.
Update 2: There is one final approach, which you can take if your vectors do not represent all values between 1 and max. This is how it would work:
vector1 <- seq(1, length(vector1) * 2, by=2)
vector2 <- seq(2, length(vector2) * 2, by=2)
This independently assigns the values of vector1 and vector2 according to their own lengths.
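For example, with two short vectors (purely hypothetical data, just to illustrate the numbering), this yields the following index sequences:
vector1 <- c("a", "b", "c")
vector2 <- c("x", "y")
seq(1, length(vector1) * 2, by=2)  # odd positions for vector1
# [1] 1 3 5
seq(2, length(vector2) * 2, by=2)  # even positions for vector2
# [1] 2 4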
