R: find the value/index which is not in rank

Imagine a vector which should have ordered values, but one or several values are not in order. How can I find the index of the values that break ranks?
I tried to use diff() but I couldn't find a way that works for all the different cases. I'm somewhat lost with this. Thanks for any help.
E.g. an element breaking rank at index 4:
t <- c(4, 6, 10, 30, 15, 20, 31) # how to find the index of the value 30?
which(diff(t)<0)
> 4
This works, but what if the element breaking rank is 2 instead of 30?
t <- c(4, 6, 10, 2, 15, 20, 31) # how to find the index of the value 2?
which(diff(t)<0)
> 3
Or what if it is the last element?
t <- c(4, 6, 10, 12, 15, 20, 9) # how to find the index of the value 9?
which(diff(t)<0)
> 6
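One possible approach, as a minimal sketch that assumes exactly one element is out of place: drop each element in turn and check with base R's is.unsorted() whether the remainder is in order. The helper name find_breaker is hypothetical, and x is used instead of t so that base::t() is not masked.

```r
# Hypothetical helper: indexes of elements whose removal leaves the
# rest of the vector in non-decreasing order.
find_breaker <- function(x) {
  ok <- vapply(seq_along(x), function(i) !is.unsorted(x[-i]), logical(1))
  which(ok)
}

find_breaker(c(4, 6, 10, 30, 15, 20, 31))  # 4 (the value 30)
find_breaker(c(4, 6, 10, 2, 15, 20, 31))   # 4 (the value 2)
find_breaker(c(4, 6, 10, 12, 15, 20, 9))   # 7 (the value 9)
```

Note the limits of this sketch: on an already sorted vector every index qualifies, two swapped neighbours both qualify, and with several unrelated out-of-place values the result can be empty.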

Related

How to go along a numeric vector and mark the index of currently minimal value until finding a smaller one?

I want to obtain the indexes of minimal values such as:
v1 <- c(20, 30, 5, 18, 2, 10, 8, 4)
The result is:
1 3 5
Explanation:
Over v1, we start at value 20. Without moving on, we note the minimal value (20) and its index (1). We ignore the adjacent element because it is greater than 20, so 20 still holds the record for smallest. Then we move to 5, which is smaller than 20. Now that 5 is the smallest, we note its index (3). Since 18 isn't smaller than the winner so far (5), we ignore it and keep moving right. Since 2 is the smallest so far, it is the new winner and its position is noted (5). No value to the right of 2 is smaller, so that's it. Finally, the positions are:
1 # for `20`
3 # for `5`
5 # for `2`
Clearly, the output should always start with 1, because we never know what comes next.
Another example:
v2 <- c(7, 3, 4, 4, 4, 10, 12, 2, 7, 7, 8)
# output:
1 2 8
which.min() seems to be pretty relevant, but I'm not sure how to use it to get the desired result.
You can use:
which(v1 == cummin(v1))
[1] 1 3 5
If you have duplicated cumulative minimums and don't want the duplicates indexed, you can use:
which(v1 == cummin(v1) & !duplicated(v1))
Or:
match(unique(cummin(v1)), v1)
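To see how the three variants differ, here is a small sketch on a vector with tied cumulative minimums. The vector v3 is an illustrative assumption, not taken from the question.

```r
v3 <- c(7, 3, 3, 10, 2, 2)  # illustrative vector with repeated minimums

which(v3 == cummin(v3))                    # 1 2 3 5 6 (ties included)
which(v3 == cummin(v3) & !duplicated(v3))  # 1 2 5
match(unique(cummin(v3)), v3)              # 1 2 5
```

The last two agree because match() returns the first position of each distinct running minimum.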
This is the verbose way:
library(purrr)
v1 <- c(20, 30, 5, 18, 2, 10, 8, 4)
v1 %>%
  length() %>%
  seq() %>%
  map_dbl(~ which.min(v1[1:.x])) %>%
  unique()
#> [1] 1 3 5
Created on 2021-12-08 by the reprex package (v2.0.1)

Sort, by the difference of two numbers

I want to sort an array in increasing order of the difference between the biggest and smallest numbers.
Without loops.
I think I need a sort that accepts a condition, but I can't find out how.
Something like this:
sort(arr, decreasing = FALSE, by = max(a) - min(a))
sort(arr, decreasing = FALSE, condition = max(a) - min(a))
The sorted array has to look like this: the difference between the first and second numbers is the smallest of all in the array, the difference between the second and third is the second smallest, and so on.
Example (I think it looks like this):
array(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)
sorted_array(9, 11, 7, 13, 6, 22, 3, 23, 2, 32)
I think another way is to construct the sorted array by putting the biggest number in the last position, then, working backwards, the smallest, the second biggest, the second smallest, ...
Sorry for the bad explanation.
This is an idea of how it could work, but only for arrays of even length. If you want to use this solution with odd-length arrays, you can handle them with an if. I must admit that it would have to be urgent for me to use a construction like this instead of a loop.
x <- c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)
n <- length(x)
m <- floor(n/2)
rev(
  as.numeric(
    rbind(
      sort(x)[n - c(0:(m - 1))], # upper half, largest first
      sort(x)[1:m]               # lower half, smallest first
    )
  )
)
I attempted to come up with a construction without a for loop. I first sorted the sequence, then split it into two halves by naming the first half "a_N" and the second half "b_N", then "folded" it into a two-row matrix with the first half reversed, and finally read it out by unfolding with c:
my_arr <- sort(c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)) # sort first, so the name suffixes follow rank
names(my_arr) <- paste0(rep(c("a", "b"), each = length(my_arr) / 2), order(my_arr))
c(rbind(sort(my_arr[grep("a", names(my_arr))], decreasing = TRUE), # first half, reversed
        my_arr[grep("b", names(my_arr))]))                          # second half
#[1] 9 11 7 13 6 22 3 23 2 32
You can see the intermediate value of the matrix:
rbind( sort( my_arr[grep("a", names(my_arr))], decreasing=TRUE), my_arr[grep("b", names(my_arr))])
     a5 a4 a3 a2 a1
[1,]  9  7  6  3  2
[2,] 11 13 22 23 32
And since R matrices are read out in column order you get the desired interleaving with c() which also removes the names.
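For vectors of odd length, one option is to emit the median first and then interleave the two halves outward. This is only a sketch: the function name spread_sort is hypothetical, and it assumes that "successive differences never shrink" is the intended ordering.

```r
# Hypothetical helper: order x so that successive absolute differences
# never shrink, for even or odd lengths.
spread_sort <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n < 2) return(s)
  m <- n %/% 2
  lower <- rev(head(s, m))  # lower half, largest first
  upper <- tail(s, m)       # upper half, smallest first
  middle <- if (n %% 2 == 1) s[m + 1] else NULL  # median, odd lengths only
  c(middle, c(rbind(lower, upper)))  # rbind() + c() interleaves column by column
}

spread_sort(c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13))  # 9 11 7 13 6 22 3 23 2 32
spread_sort(c(5, 1, 9))                            # 5 1 9
```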

Combining vector indexes and queries

I want to select the first 5 elements of a vector and those that are greater than a certain threshold. For example:
v = c(10, 11, 2, 8, 5, 2, 10)
v[1:5] # return the first 5 elements
v[which(v>5)] # returns all elements > 5
How do I combine the two queries to return 10, 11, 2, 8, 5, 10? That is the first 5 elements, plus 10 because greater than 5.
We could use union
union(v[1:5], v[which(v>5)])
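Or, as commented by @Vlo, take the union of the indexes instead (needed when there are duplicate values, because union() on the values themselves drops the repeated 10):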
v[union(1:5, which(v>5))]
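A quick side-by-side sketch of the two variants on the question's vector, plus a logical-mask alternative that is not from the answer but relies only on base R:

```r
v <- c(10, 11, 2, 8, 5, 2, 10)

union(v[1:5], v[v > 5])       # 10 11 2 8 5    (the second 10 is dropped)
v[union(1:5, which(v > 5))]   # 10 11 2 8 5 10
v[seq_along(v) <= 5 | v > 5]  # 10 11 2 8 5 10 (logical mask)
```

Working on indexes or a mask keeps the original order and any repeated values.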

Conditional Replacement Column Content--many ids to be updated

Thinking I could take the easy way out, I was going to use ifelse to replace id codes in an entire dataset. I have a specific dataset with an id column. I have to replace these old ids with updated ids, but there are 50k+ rows with 270 unique ids. So, I first tried:
df$id <- ifelse(df$id == 2, 1,
         ifelse(df$id == 3, 5,
         ifelse(df$id == 4, 5,
         ifelse(df$id == 6, NA,
         ifelse(df$id == 7, 7,
         ifelse(df$id == 285, NA,
         ifelse(df$id == 8, 10, .....
         ifelse(df$id == 200, 19, df$id)
While this would have worked, I am limited to 51 nests, and I cannot split them up because each batch would only cover 1/4 of the set. The updates for the first half would then interfere, as the codes do overlap.
I then tried
df$id[df$id== 2] <- 1
and I was going to do that for every code. However, if I update all twos to one, a later rule that maps "1" to some number X would then change both the old and the new "1", whereas I only want the old "1" to become X... I actually think this rules out the ifelse approach too, even if 51 were not the limit. Is there a function similar to VLOOKUP in Excel? Any ideas?
Thanks!
Old forum related to replacing cell contents, but does not work in my case.
Replace contents of factor column in R dataframe
partial example
df <- data.frame(id=seq(1, 10))
old.id <- c(2, 3, 4, 6)
new.id <- c(1, 5, 5, NA)
df$id[df$id %in% old.id] <- new.id[unlist(sapply(df$id, function(x) which(old.id==x)))]
output
> df
   id
1   1
2   1
3   5
4   5
5   5
6  NA
7   7
8   8
9   9
10 10
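A lookup in the spirit of VLOOKUP can also be sketched with match(), which resolves every row against the original ids in one pass, so overlapping old and new codes cannot interfere. The data below is illustrative (it deliberately swaps codes 1 and 2), not the asker's.

```r
ids    <- c(1, 2, 3, 2, 1, 6)  # illustrative ids
old.id <- c(1, 2, 6)           # codes to replace
new.id <- c(2, 1, NA)          # their replacements (1 and 2 are swapped)

idx <- match(ids, old.id)      # position in the lookup table, NA if absent
updated <- ifelse(is.na(idx), ids, new.id[idx])
updated                        # 2 1 3 1 2 NA
```

Testing is.na(idx) rather than the replacement value means a deliberate mapping to NA is kept, while ids that are missing from the table stay unchanged.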

Finding groups of contiguous numbers in a list [duplicate]

This is a duplicate question to this, except for R rather than Python.
I'd like to identify groups of contiguous (some people call them continuous) integers in a list, where duplicate entries are treated as existing within the same range. Therefore:
myfunc(c(2, 3, 4, 4, 5, 12, 13, 14, 15, 16, 17, 17, 20))
returns:
min max
  2   5
 12  17
 20  20
Although any output format would be fine. My current brute-force, for-loop method is pretty slow.
(Apologies if I could have easily re-interpreted the Python answer and I'm being stupid!)
Just use diff:
x = c(2, 3, 4, 4, 5, 12, 13, 14, 15, 16, 17, 17, 20)
start = c(1, which(diff(x) != 1 & diff(x) != 0) + 1)
end = c(start[-1] - 1, length(x)) # each group ends just before the next one starts
x[start]
# 2 12 20
x[end]
# 5 17 20
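The same diff() idea can be wrapped into a function that returns the min/max table from the question. This is a sketch: the name contiguous_ranges is hypothetical, and it sorts and de-duplicates the input first, which is an assumption beyond the answer above.

```r
# Hypothetical helper: min and max of each run of consecutive integers.
contiguous_ranges <- function(x) {
  x <- sort(unique(x))  # assumption: order and duplicates do not matter
  if (length(x) == 0) {
    return(data.frame(min = x, max = x))
  }
  breaks <- which(diff(x) != 1)  # last position of every run but the final one
  start <- c(1, breaks + 1)
  end <- c(breaks, length(x))
  data.frame(min = x[start], max = x[end])
}

contiguous_ranges(c(2, 3, 4, 4, 5, 12, 13, 14, 15, 16, 17, 17, 20))
# three rows: 2-5, 12-17, 20-20
```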

Resources