Combining vector indexes and queries - r

I want to select the first 5 elements of a vector, plus any elements that are greater than a certain threshold. For example:
v = c(10, 11, 2, 8, 5, 2, 10)
v[1:5] # return the first 5 elements
v[which(v>5)] # returns all elements > 5
How do I combine the two queries to return 10, 11, 2, 8, 5, 10? That is, the first 5 elements, plus the final 10 because it is greater than 5.

We could use union:
union(v[1:5], v[which(v>5)])
Or, as commented by @Vlo, take the union of the indices instead, which also keeps duplicate values (union on the values themselves drops the repeated 10):
v[union(1:5, which(v>5))]
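A small sketch comparing the two approaches on the vector from the question. The names by_value, by_index and by_mask are only illustrative, and the logical-mask variant is an additional option that was not part of the original answer:

```r
v <- c(10, 11, 2, 8, 5, 2, 10)

# Union of the values: duplicates are collapsed, so the second 10 is lost
by_value <- union(v[1:5], v[v > 5])

# Union of the indices: each position is kept once, duplicate values survive
by_index <- v[union(1:5, which(v > 5))]

# Equivalent logical mask: "position is within the first 5 OR value exceeds 5"
by_mask <- v[seq_along(v) <= 5 | v > 5]

print(by_value)  # 10 11 2 8 5
print(by_index)  # 10 11 2 8 5 10
print(by_mask)   # 10 11 2 8 5 10
```

The index-based and mask-based versions both return the elements in their original order.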

Related

How to get the positions of the k greatest (or smallest) elements of a vector?

I have a vector and I want to find the indices of the k greatest elements, not the elements themselves, which I could get with sort. One idea would be to attach indices to the values and use a custom sort function that only compares the first element of each pair (a classical solution to this problem), but surely there has to be a simpler way? Note that performance isn't a concern.
First, I create an example vector:
vector <- c(1, 3, 6, 2, 7, 8, 10, 4)
Next, you can use the following code which will output the top k elements as x with index ix:
k <- 3
lst <- sort(vector, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),k))
Output:
$x
[1] 10 8 7
$ix
[1] 7 6 5
As you can see, ix gives the index of the top k elements.
Alternatively, using rank (here for k = 3, hence the < 4):
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 5 6 7
If you have ties, the result is this:
x <- c(10, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 1 6 7
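Another common option, sketched here as an addition to the answers above, is order(), which returns the sorting permutation directly. The variable names k and top_idx are illustrative:

```r
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
k <- 3

# order() gives the indices that would sort x; take the first k of the
# decreasing ordering to get the positions of the k largest values
top_idx <- head(order(x, decreasing = TRUE), k)

print(top_idx)     # 7 6 5
print(x[top_idx])  # 10 8 7
```

Unlike the rank-based filter, this always returns exactly k indices (or fewer if the vector is shorter than k), ordered from largest to smallest value. For the k smallest elements, drop decreasing = TRUE.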

How to go along a numeric vector and mark the index of currently minimal value until finding a smaller one?

I want to obtain the indexes of minimal values such as:
v1 <- c(20, 30, 5, 18, 2, 10, 8, 4)
The result is:
1 3 5
Explanation:
Going over v1, we start at value 20. Before moving on, we note the minimal value so far (20) and its index (1). We ignore the adjacent element because it is greater than 20, so 20 still holds the record for smallest. Then we move to 5, which is smaller than 20. Now that 5 is the smallest, we note its index (3). Since 18 isn't smaller than the winner so far (5), we ignore it and keep going right. Since 2 is the smallest so far, it is the new winner and its position is noted (5). No value to the right of 2 is smaller, so that's it. Finally, the positions are:
1 # for `20`
3 # for `5`
5 # for `2`
Clearly, the output should always start with 1, because we never know what comes next.
Another example:
v2 <- c(7, 3, 4, 4, 4, 10, 12, 2, 7, 7, 8)
# output:
1 2 8
which.min() seems to be pretty relevant, but I'm not sure how to use it to get the desired result.
You can use:
which(v1 == cummin(v1))
[1] 1 3 5
If you have duplicated cumulative minimums and don't want the duplicates indexed, you can use:
which(v1 == cummin(v1) & !duplicated(v1))
Or:
match(unique(cummin(v1)), v1)
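To see how the three variants differ, here is a small sketch on a made-up vector v3 (not from the question) in which the running minimum repeats:

```r
v3 <- c(5, 5, 3, 3, 1)  # hypothetical input with repeated running minimums

# Every position that equals the running minimum, repeats included
all_hits <- which(v3 == cummin(v3))

# Only the first position of each new minimum
first_hits_a <- which(v3 == cummin(v3) & !duplicated(v3))
first_hits_b <- match(unique(cummin(v3)), v3)

print(all_hits)      # 1 2 3 4 5
print(first_hits_a)  # 1 3 5
print(first_hits_b)  # 1 3 5
```

On v1 and v2 from the question all three expressions agree, because no running minimum is repeated there.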
This is the verbose way:
library(purrr)
v1 <- c(20, 30, 5, 18, 2, 10, 8, 4)
v1 %>%
  length() %>%
  seq() %>%
  map_dbl(~ which.min(v1[1:.x])) %>%
  unique()
#> [1] 1 3 5
Created on 2021-12-08 by the reprex package (v2.0.1)

calculating distance between values in vector R

I have the following multiset X, in which I want to find the distances between all pairs of numbers. Is there any way to put this into a for loop so that, if I were given a multiset of a different size, I wouldn't have to write it out manually as I did below?
The final answer is [0, 2, 2, 3, 3, 4, 5, 6, 7, 8, 10] (sorted) for this example.
X <- c(0, 10, 8, 3, 6)
L <- length(X)
print(L)
## for (i in seq(from = 1, to = L)) {}
# Manual version: every pairwise absolute difference written out by hand.
# X has only 5 elements, so there is no X[6]; the values are combined with c()
# because print() takes a single object.
print(c(abs(X[1]-X[2]), abs(X[1]-X[3]),
        abs(X[1]-X[4]), abs(X[1]-X[5]),
        abs(X[2]-X[3]), abs(X[2]-X[4]),
        abs(X[2]-X[5]),
        abs(X[3]-X[4]), abs(X[3]-X[5]),
        abs(X[4]-X[5])))
You may see this vector as a column vector and apply dist:
sort(dist(X))
# [1] 2 2 3 3 4 5 6 7 8 10
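Since the question asks for a loop that works for any size, here is a minimal sketch of the explicit double loop, next to a combn() one-liner that enumerates the same pairs. The names loop_dist and pair_dist are illustrative:

```r
X <- c(0, 10, 8, 3, 6)
L <- length(X)

# Explicit double loop over all pairs i < j
loop_dist <- numeric(0)
for (i in seq_len(L - 1)) {
  for (j in (i + 1):L) {
    loop_dist <- c(loop_dist, abs(X[i] - X[j]))
  }
}

# Same pairs without a loop: combn() applies the function to each 2-element subset
pair_dist <- combn(X, 2, FUN = function(p) abs(p[1] - p[2]))

print(sort(loop_dist))  # 2 2 3 3 4 5 6 7 8 10
print(sort(pair_dist))  # 2 2 3 3 4 5 6 7 8 10
```

Both produce the same ten pairwise distances as sort(dist(X)).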

R: find value/index which is not in rank

Imagine a vector that should have ordered values, but one or several values are out of order. How can I find the index of the values that break rank?
I tried to use diff(), but I couldn't find a way that works for all the different cases. I'm somewhat lost with this. Thanks for any help.
E.g. a rank-breaking element at index 4:
t <- c(4, 6, 10, 30, 15, 20, 31) # how to find the index of the value 30?
which(diff(t)<0)
> 4
This works, but what if the element breaking rank is 2 instead of 30? Then the result is 3, although the value 2 is at index 4:
t <- c(4, 6, 10, 2, 15, 20, 31) # how to find the index of the value 2?
which(diff(t)<0)
> 3
Or what if it is the last element? Then the result is 6, although the value 9 is at index 7:
t <- c(4, 6, 10, 12, 15, 20, 9) # how to find the index of the value 9?
which(diff(t)<0)
> 6
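One possible approach that covers all three cases, sketched under the assumption that exactly one element is out of place: test, for each position, whether removing that element leaves the rest sorted. find_rank_breaker is a hypothetical helper name, not something from the original post:

```r
# Assumption: a single element breaks the order.
# Returns every index whose removal leaves the remaining values sorted.
find_rank_breaker <- function(x) {
  removable <- vapply(
    seq_along(x),
    function(i) !is.unsorted(x[-i]),
    logical(1)
  )
  which(removable)
}

r1 <- find_rank_breaker(c(4, 6, 10, 30, 15, 20, 31))  # value 30
r2 <- find_rank_breaker(c(4, 6, 10, 2, 15, 20, 31))   # value 2
r3 <- find_rank_breaker(c(4, 6, 10, 12, 15, 20, 9))   # value 9

print(r1)  # 4
print(r2)  # 4
print(r3)  # 7
```

This checks n candidate removals, so it is quadratic in the vector length. It can also return more than one index when the situation is ambiguous (for c(1, 3, 2, 4), removing either the 3 or the 2 leaves a sorted vector), and it returns every index if the input is already sorted.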

Sort, by the difference of two numbers

I want to sort an array so that the differences between consecutive numbers increase, ending with the difference between the biggest and smallest number.
Without loops.
I think I need a sort that I can give a condition to, but I can't find out how.
Something like this:
sort(arr, decreasing = FALSE, by = max(a) - min(a))
sort(arr, decreasing = FALSE, condition = max(a) - min(a))
The sorted array has to look like this: the difference between the first and second number is the smallest of all in the array, the difference between the second and third is the second smallest, and so on.
Example (I think it looks like this):
array(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)
sorted_array(9, 11, 7, 13, 6, 22, 3, 23, 2, 32)
I think another way is to construct the sorted array by putting the biggest number in the last position, before that the smallest, then the second biggest, the second smallest, and so on.
Sorry for the bad explanation.
This is an idea of how it could work, but only for arrays of even length. If you want to use this solution with arrays of odd length, you can handle that case with an if. I have to admit that it would need to be urgent before I would use a construction like this instead of a loop.
x <- c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)
n <- length(x)
m <- floor(n/2)
rev(
  as.numeric(
    rbind(
      sort(x)[n-c(0:(m-1))],  # the m largest values, in decreasing order
      sort(x)[1:m]            # the m smallest values, in increasing order
    )
  )
)
I attempted to come up with a non-for-loop construction. I first sorted the sequence and split it into two halves by naming the first half "a_N" and the second half "b_N", then "folded" it into a two-row matrix with the first half reversed, and finally read it out by unfolding with c:
my_arr <- sort(c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)) # sort first, as described above
names(my_arr) <- paste0( rep( c("a","b"), each=length(my_arr)/2), order(my_arr) )
c( rbind( sort( my_arr[grep("a", names(my_arr))], decreasing=TRUE), # first half, reversed
          my_arr[grep("b", names(my_arr))]) ) # second half
#[1] 9 11 7 13 6 22 3 23 2 32
You can see the intermediate value of the matrix:
rbind( sort( my_arr[grep("a", names(my_arr))], decreasing=TRUE), my_arr[grep("b", names(my_arr))])
     a5 a4 a3 a2 a1
[1,]  9  7  6  3  2
[2,] 11 13 22 23 32
And since R matrices are read out in column order you get the desired interleaving with c() which also removes the names.
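The same interleaving idea can be wrapped in a small function that also accepts odd lengths by placing the middle value first. This is only a sketch: interleave_outward is a hypothetical name, and putting the median first is an assumption about how the odd case should behave, since the question does not say:

```r
# Sort, then alternate between the lower half (walking down) and the
# upper half (walking up). For odd lengths the middle value goes first.
interleave_outward <- function(x) {
  s <- sort(x)
  n <- length(s)
  m <- n %/% 2
  lo <- rev(s[seq_len(m)])          # lower half, largest first
  hi <- s[seq_len(m) + (n - m)]     # upper half, smallest first
  mid <- if (n %% 2 == 1) s[m + 1] else NULL
  # rbind() stacks lo over hi; c() reads the matrix column by column
  c(mid, rbind(lo, hi))
}

even_case <- interleave_outward(c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13))
odd_case  <- interleave_outward(c(5, 1, 3, 2, 4))

print(even_case)  # 9 11 7 13 6 22 3 23 2 32
print(odd_case)   # 3 2 4 1 5
```

In both examples the absolute differences between neighbours never decrease from left to right, which can be checked with diff(abs(diff(even_case))) >= 0.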
