Finding matching position of numeric values in R

The numeric vector weitage is given as:
> weitage
[1] 20 10 50 10 5 5
Then,
sort_wei <- sort(weitage, decreasing = TRUE)
sort_wei
[1] 50 20 10 10 5 5
match(sort_wei,weitage)
results in 3 1 2 2 5 5, because match() returns only the position of the first match for each value. But the positions I actually need are 3 1 2 4 5 6. How can I get these positions? Can I use match() in R?

We can use the order function instead, which returns the indices that would sort the input vector in the requested order:
order(weitage, decreasing=TRUE)
#[1] 3 1 2 4 5 6
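A short sketch of why order() fits here: it returns positions rather than values, and tied values (the two 10s and the two 5s) keep their original left-to-right order, so indexing weitage with the result reproduces the sorted vector. The name idx is only illustrative.

```r
weitage <- c(20, 10, 50, 10, 5, 5)

# order() returns positions; tied values keep their original left-to-right order
idx <- order(weitage, decreasing = TRUE)
idx
#> [1] 3 1 2 4 5 6

# Indexing with those positions reproduces sort(weitage, decreasing = TRUE)
weitage[idx]
#> [1] 50 20 10 10  5  5
```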

Related

Remove the n smallest elements from a vector, including repeating elements

I have the following vector:
v = c(1,2,3,1,3,2,3,4,3,3,1, 5, 5,2)
I would like to obtain the vector
v_new = c(3,3,2,3,4,3,3,5,5,2)
from which I removed the four smallest elements, which are 1, 1, 1 and 2. Please note that I do not want to remove the other occurrences of the number 2. The function order almost gives me what I need, but its output seems odd: it ensures that v[order(v)] gives the elements in increasing order, rather than giving the rank of each element. rank also gives something strange:
v[rank(v)]
[1] 2 3 3 2 3 3 3 5 3 3 2 5 5 3
Any help would be much appreciated.
order is what you need, but to make it work, you need negative indexing. By itself, order returns the set of indices that would sort the input vector:
v = c(1,2,3,1,3,2,3,4,3,3,1,5,5,2)
order(v)
#> [1] 1 4 11 2 6 14 3 5 7 9 10 8 12 13
v[order(v)]
#> [1] 1 1 1 2 2 2 3 3 3 3 3 4 5 5
You can use negative indexing to remove elements from a vector:
(5:1)[c(-1, -2)]
#> [1] 3 2 1
Putting the two together: to remove the n smallest elements from a vector, negate the first n elements of the result of order (here n = 4):
v[-order(v)[1:4]]
#> [1] 3 3 2 3 4 3 3 5 5 2
Note that order breaks ties by original position, which is why the first 2 (at index 2) is the one removed.
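The same idea can be wrapped in a small helper. This is a sketch: drop_n_smallest is a hypothetical name, not a base R function, and it assumes n is a non-negative whole number. The guard for n = 0 matters because v[-integer(0)] selects nothing and would return an empty vector instead of v.

```r
# Hypothetical helper (not part of the answer above): drop the n smallest
# values, breaking ties by position exactly as order() does.
drop_n_smallest <- function(v, n) {
  n <- min(n, length(v))       # cannot drop more elements than exist
  if (n == 0) return(v)        # v[-integer(0)] would return an empty vector
  v[-order(v)[seq_len(n)]]
}

v <- c(1, 2, 3, 1, 3, 2, 3, 4, 3, 3, 1, 5, 5, 2)
drop_n_smallest(v, 4)
#> [1] 3 3 2 3 4 3 3 5 5 2
drop_n_smallest(v, 0)          # returns v unchanged
```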

Get the indices of the last element of each run in a vector

How do I get the index of the last element of each run?
For example, consider the vector
x=c(1,2,3,4,4,4,5,6,6,7,8,9,9,9,9)
I want to get the output vector
x1 = c(1, 2, 3, 6, 7, 9, 10, 11, 15)
I tried using:
rank(x)
but it does not give the desired result.
(Probably a duplicate, but here you go.)
You can use the magic powers of ?rle combined with cumsum:
cumsum(rle(x)$lengths)
#[1] 1 2 3 6 7 9 10 11 15
The output of rle is:
rle(x)
#Run Length Encoding
# lengths: int [1:9] 1 1 1 3 1 2 1 1 4
# values : num [1:9] 1 2 3 4 5 6 7 8 9
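Since cumsum(rle(x)$lengths) gives the last index of each run, subtracting each run's length and adding 1 gives the first index. A minimal sketch (the names r, ends and starts are illustrative):

```r
x <- c(1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9, 9, 9)
r <- rle(x)

ends <- cumsum(r$lengths)           # last index of each run
starts <- ends - r$lengths + 1L     # first index of each run

# One row per run: its value and where it starts and ends in x
data.frame(value = r$values, start = starts, end = ends)
```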
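Using the which() function in R (this assumes each distinct value forms a single run, as in the sorted example x):
k <- unique(x)                        # distinct values, in order of first appearance
x1 <- integer(length(k))
for (i in seq_along(k)) {
  x1[i] <- tail(which(x == k[i]), 1)  # last position holding value k[i]
}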
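The loop above relies on each value appearing in one run only: for x <- c(1, 1, 2, 1) it would report 4 as the end of the 1s and miss the end of the first run at position 2. A which()-based alternative that compares neighbours instead of values avoids this. It is a swapped-in technique using diff(), not part of either answer, and it assumes exact equality comparisons are appropriate for x:

```r
x <- c(1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9, 9, 9)

# A position ends a run when the next element differs; the last element
# always ends a run. diff(x) != 0 compares each element with its successor.
run_ends <- which(c(diff(x) != 0, TRUE))
run_ends
#> [1]  1  2  3  6  7  9 10 11 15

# Unlike the unique()/which() loop, this also handles a value that recurs
# in separate runs:
y <- c(1, 1, 2, 1)
which(c(diff(y) != 0, TRUE))
#> [1] 2 3 4
```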

Finding the minimum positive value

I guess I don't know which.min as well as I thought.
I'm trying to find the occurrence in a vector of a minimum value that is positive.
TIME <- c(0.00000, 4.47104, 6.10598, 6.73993, 8.17467, 8.80862, 10.00980, 11.01080, 14.78110, 15.51520, 16.51620, 17.11680)
For each value z from 1 to 19, I want the index of the element of TIME that is closest to, but above, z. I tried the following code:
vec <- sapply(seq(1,19,1), function(z) which.min((z-TIME > 0)))
vec
#[1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 1 1
I expected the last two values of vec to be 12, 12. I suspected this happens because it treats 0.0000 as the value closest to 0.
So I thought that maybe, because I exported the data from external software, 0.0000 wasn't really 0. But:
TIME[1]==0 #TRUE
That confused me further. Why do these return index 1, when I would expect an error?
which.min(0 > 0 ) #1
which.min(-1 > 0 ) #1
I'll be glad to be put right.
EDIT:
In a nutshell, what is a better way to get this result:
#[1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 12 12
which, for each value z from 1 to 19, gives the index of the element of TIME with the smallest positive difference TIME - z (or the last index, 12, when no element of TIME exceeds z).
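On the which.min() puzzle itself: which.min() accepts a logical vector, treats FALSE as 0 and TRUE as 1, and returns the position of the first minimum. It does not error on a length-one input or on ties; it simply returns the first position. For z = 18 and 19 every element of z - TIME > 0 is TRUE, so all entries tie and position 1 comes back. A small demonstration:

```r
TIME <- c(0.00000, 4.47104, 6.10598, 6.73993, 8.17467, 8.80862,
          10.00980, 11.01080, 14.78110, 15.51520, 16.51620, 17.11680)

# Logical input is treated as 0/1; the first minimum wins, no error is raised
which.min(0 > 0)            # single FALSE -> position 1
which.min(-1 > 0)           # single FALSE -> position 1
which.min(c(TRUE, TRUE))    # all tied -> first position, 1

# z = 3: the first FALSE (first TIME >= 3) is at position 2
which.min(3 - TIME > 0)
# z = 18: every comparison is TRUE, so all entries tie and 1 is returned
which.min(18 - TIME > 0)
```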
The natural function to use here (both to limit typing and for efficiency) is actually not which.min + sapply but the cut function, which will determine which range of times each of the values 1:19 falls into:
cut(1:19, breaks=TIME, right=FALSE)
# [1] [0,4.47) [0,4.47) [0,4.47) [0,4.47) [4.47,6.11) [4.47,6.11) [6.74,8.17)
# [8] [6.74,8.17) [8.81,10) [8.81,10) [10,11) [11,14.8) [11,14.8) [11,14.8)
# [15] [14.8,15.5) [15.5,16.5) [16.5,17.1) <NA> <NA>
# 11 Levels: [0,4.47) [4.47,6.11) [6.11,6.74) [6.74,8.17) [8.17,8.81) ... [16.5,17.1)
From this, you can easily determine what you're looking for, which is the index of the smallest element in TIME greater than the cutoff:
(x <- as.numeric(cut(1:19, breaks=TIME, right=FALSE))+1)
# [1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 NA NA
The last two entries appear as NA because no element of TIME exceeds 18 or 19. If you want to replace these with the index of the largest element of TIME, you can do so with replace:
replace(x, is.na(x), length(TIME))
# [1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 12 12
Here's one way:
x <- t(outer(TIME, 1:19, `-`))    # x[j, i] = TIME[i] - j
max.col(ifelse(x < 0, x, Inf), ties.method = "first")
# [1] 2 2 2 2 3 3 5 5 7 7 8 9 9 9 10 11 12 12 12
It's computationally wasteful to take all the differences in this way, since both vectors are ordered.
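Because TIME is sorted, base R's findInterval() can exploit that ordering with a binary search instead of building all pairwise differences. This is a swapped-in alternative to both answers, sketched under the assumption that TIME is non-decreasing (findInterval() requires that) and that, as in the question, the last index should be returned when no element of TIME exceeds z:

```r
TIME <- c(0.00000, 4.47104, 6.10598, 6.73993, 8.17467, 8.80862,
          10.00980, 11.01080, 14.78110, 15.51520, 16.51620, 17.11680)
z <- 1:19

# findInterval(z, TIME) returns i with TIME[i] <= z < TIME[i + 1]
# (0 if z < TIME[1], length(TIME) if z >= the last element), so i + 1 is
# the index of the first element strictly greater than z.
idx <- findInterval(z, TIME) + 1L

# When no element exceeds z, fall back to the last index, matching the
# replace() step of the cut() answer.
idx <- pmin(idx, length(TIME))
idx
```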

R table function

If I have a vector numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4), and I use 'table(numbers)', I get
names 1 2 4 5
counts 2 5 4 1
What if I want it to also include 3 or, more generally, all numbers from 1 to max(numbers), even those that do not appear in numbers? How would I generate output like this:
names 1 2 3 4 5
counts 2 5 0 4 1
If you want R to count values that don't occur, create a factor and set its levels explicitly; table returns a count for every level, including levels with zero occurrences.
table(factor(numbers, levels=1:max(numbers)))
# 1 2 3 4 5
# 2 5 0 4 1
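The factor approach is not limited to positive integers. A sketch with an illustrative vector vals that includes 0 and a negative value, using the observed range min(vals):max(vals) as the levels:

```r
vals <- c(-1, 0, 0, 2)

# Levels cover every integer in the observed range, so the unseen value 1
# still gets a count of 0
tab <- table(factor(vals, levels = min(vals):max(vals)))
tab
as.vector(tab)   # plain integer counts for levels -1, 0, 1, 2
```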
For this particular example (positive integers), tabulate would also work:
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)
tabulate(numbers)
# [1] 2 5 0 4 1
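Two details of tabulate() worth knowing, shown in a short sketch: the nbins argument fixes the length of the result (so trailing zero counts can be included), and values outside 1..nbins, including 0 and negative numbers, are silently ignored rather than counted. The result is an unnamed integer vector, so names are added by hand here:

```r
numbers <- c(1, 1, 2, 4, 2, 2, 2, 2, 5, 4, 4, 4)

# nbins = 7 returns counts for 1..7, including the trailing zeros for 6 and 7
counts <- tabulate(numbers, nbins = 7)
names(counts) <- seq_along(counts)
counts

# Values outside 1..nbins (here 0 and 8) are dropped without a warning
tabulate(c(0, 1, 1, 8), nbins = 3)
#> [1] 2 0 0
```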

Replicating vector elements by index

I have an integer vector:
a <- c(1,1,3,1,4)
where each element in a indicates how many times its index should be replicated in a new vector.
So the resulting vector should be:
b <- c(1,2,3,3,3,4,5,5,5,5)
What would be the most efficient way to do this?
For example, using rep:
rep(seq_along(a),a)
1 2 3 3 3 4 5 5 5 5
Another, less efficient, option is to use inverse.rle:
inverse.rle(list(lengths=a,values=seq_along(a)))
[1] 1 2 3 3 3 4 5 5 5 5
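A quick sketch checking that the rep() and inverse.rle() answers agree, and showing that a 0 in the counts simply omits that index from the result (the vector c(2, 0, 1) is just an example):

```r
a <- c(1, 1, 3, 1, 4)

# times = a repeats index i exactly a[i] times
b <- rep(seq_along(a), times = a)
identical(b, inverse.rle(list(lengths = a, values = seq_along(a))))

# A count of 0 drops that index entirely
rep(seq_along(c(2, 0, 1)), times = c(2, 0, 1))
#> [1] 1 1 3
```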
