What is the difference between sort(), rank(), and order() in R?
Can you explain with examples?
sort() sorts the vector in ascending order (by default).
rank() gives the rank of each number in the vector, with the smallest number receiving rank 1.
order() returns the indices that would put the vector in sorted order.
For example, if these functions are applied to the vector v <- c(3, 1, 2, 5, 4):
sort(c(3, 1, 2, 5, 4)) will give c(1, 2, 3, 4, 5)
rank(c(3, 1, 2, 5, 4)) will give c(3, 1, 2, 5, 4)
order(c(3, 1, 2, 5, 4)) will give c(2, 3, 1, 5, 4).
If you index the vector with these indices in this order, you get the sorted vector. Notice how v[2] = 1, v[3] = 2, v[1] = 3, v[5] = 4 and v[4] = 5.
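The link between the three functions can be checked directly. This is a small sketch using the same vector, stored as v; the names sorted_by_order and rank_from_order are only illustrative.

```r
v <- c(3, 1, 2, 5, 4)

# Indexing v by order(v) reproduces sort(v)
sorted_by_order <- v[order(v)]
identical(sorted_by_order, sort(v))

# When there are no ties, rank() is the inverse permutation of order()
rank_from_order <- order(order(v))
all(rank_from_order == rank(v))
```

Both checks return TRUE for this vector. The second identity only holds when the vector has no ties, because rank() averages tied ranks by default.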
R also has tie handling. With the default method (ties.method = "average"), tied values each receive the mean of the ranks they would occupy. If you run rank(c(3, 1, 2, 5, 4, 2)), 1 gets rank 1; the two 2s would occupy ranks 2 and 3, so each is assigned rank 2.5; 3 then gets rank 4.0, and so on. So
rank(c(3, 1, 2, 5, 4, 2)) will give you the output [1] 4.0 1.0 2.5 6.0 5.0 2.5
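Other tie rules are available through the ties.method argument of rank(). The sketch below compares three of them on the same vector; the variable names are illustrative.

```r
v_ties <- c(3, 1, 2, 5, 4, 2)

# Default: tied values get the average of the ranks they span
r_avg <- rank(v_ties)                          # 4.0 1.0 2.5 6.0 5.0 2.5

# "min": tied values all get the lowest of those ranks
r_min <- rank(v_ties, ties.method = "min")     # 4 1 2 6 5 2

# "first": ties are broken by position, so ranks form a permutation
r_first <- rank(v_ties, ties.method = "first") # 4 1 2 6 5 3
```

"first" is the option to reach for when every element needs a distinct rank.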
Hope this is helpful.
I have a vector and I want to find the indices of the k greatest elements, not the elements themselves, which I could get with sort. One idea would be to pair each value with its index and use a custom sort function that compares only the first element of each pair (a classical solution to this problem), but surely there has to be a simpler way? Note that performance isn't a concern.
First I create an example vector:
vector <- c(1, 3, 6, 2, 7, 8, 10, 4)
Next, you can use the following code which will output the top k elements as x with index ix:
k <- 3
lst <- sort(vector, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),k))
Output:
$x
[1] 10 8 7
$ix
[1] 7 6 5
As you can see, ix gives the indices of the top k elements.
Using rank (here for k = 3, hence the threshold 4):
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 5 6 7
If you have ties, the result is this:
x <- c(10, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 1 6 7
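Another option, tying back to order(), is to sort the indices by decreasing value and keep the first k. This is a minimal sketch rather than a replacement for the answers above; top_k_idx is an illustrative name.

```r
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
k <- 3

# Indices of x arranged from largest value to smallest, truncated to k
top_k_idx <- head(order(x, decreasing = TRUE), k)

top_k_idx      # 7 6 5
x[top_k_idx]   # 10 8 7
```

Unlike the rank-based filter, this always returns exactly k indices (for k no larger than the vector length), even when there are ties at the cutoff, and the indices come back ordered from the largest value down.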
Let's say you have the following vector of numbers:
1, 2, 3, 4, 5
I want to find all possible combinations of numbers with the combination length 3. The combinations must not overlap, i.e. 1, 2, 3 is the same as 1, 3, 2 and only one of those should appear in the output!
So, the answers would be:
1, 2, 3
1, 2, 4
1, 2, 5
1, 3, 4
1, 3, 5
1, 4, 5
2, 3, 4
2, 3, 5
2, 4, 5
3, 4, 5
This is just a simple example; in reality I have a vector of length 10000 and I need to find all combinations of length 8000. What code would you use to generate those combinations in R?
@chinsoon12 suggested the package RcppAlgos. I investigated it and found that the following works:
library(RcppAlgos)
comboIter(1:10000, 8000)
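For the small five-element example, base R's combn() is enough. The sketch below reproduces the ten combinations listed in the question; combos is an illustrative name.

```r
# combn() returns one combination per column; transpose for one per row
combos <- t(combn(1:5, 3))

nrow(combos)   # 10, the same as choose(5, 3)
combos[1, ]    # 1 2 3
combos[10, ]   # 3 4 5
```

This approach builds every combination in memory at once, so it does not scale to choosing 8000 out of 10000: the number of combinations is far too large to store, which is why an iterator that yields them one at a time is suggested instead.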
New to R. I have seen a lot of similar questions where tables are used to count the number of occurrences, but I want to take each integer in vector_1 (e.g. 1 through 10), count how many times it occurs in vector_2, and return those counts in a third vector, vector_3.
Desired Result:
vector_1 <- c(1:10)
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
vector_3 <- c(0, 0, 1, 2, 1, 0, 1, 0, 1, 1)
I have tried using for loops such as:
for (i in 1:10) {
  for (j in vector_2) {
    print(i) <- vector_3
  }
}
Obviously this code doesn't work, but I am just not finding a good way to do a summation of the occurrences between the vectors. Any guidance or alternate approaches would be welcomed.
*Edit: almost all answers that I have seen to similar questions use tables to count the occurrences within vector_2; I haven't come across questions that compare the two vectors and then output the result.
Your code doesn't make sense to me. Anyway, you can easily compare each value in vector 1 with each value in vector 2 using outer. rowSums then can give you the required counts.
vector_1 <- c(1:10)
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
rowSums(outer(vector_1, vector_2, "=="))
#[1] 0 0 1 2 1 0 1 0 1 1
You can also convert vector_2 to a factor with levels 1:10 and tabulate it, so that values that never occur are counted as zero:
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
vector_2 <- factor(vector_2,levels = 1:10)
table(vector_2)
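For comparison, here is a sketch of what the loop in the question was probably aiming for, followed by an equivalent vapply() call. It assumes the goal is exactly the vector_3 shown under "Desired Result"; vector_3_vapply is an illustrative name.

```r
vector_1 <- 1:10
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)

# Loop version: for each element of vector_1, count its matches in vector_2
vector_3 <- integer(length(vector_1))
for (i in seq_along(vector_1)) {
  vector_3[i] <- sum(vector_2 == vector_1[i])
}
vector_3   # 0 0 1 2 1 0 1 0 1 1

# Same counts without an explicit loop
vector_3_vapply <- vapply(vector_1, function(v) sum(vector_2 == v), integer(1))
```

Only one loop is needed, because the comparison vector_2 == vector_1[i] is already vectorised over vector_2.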
I want to identify duplicate cases and number them as a vector (such as with an ID variable). Any case without any direct matches should be labeled as a fixed value (such as zero). Any case with a corresponding duplicate should be labeled 1, with each subsequent case being labeled n+1. So, if I have an ID variable like this 1, 2, 2, 2, 3, 4, 4, 5, I'd want the corresponding vector to produce: 0, 1, 2, 3, 0, 1, 2, 0.
How can I do this?
duplicated() identifies the first case as a non-duplicate, so that doesn't work on its own.
Base R, ave with seq_along:
x <- c(1, 2, 2, 2, 3, 4, 4, 5)
ave(seq_along(x), x, FUN = function(g) if (length(g) > 1) seq_along(g) else 0)
#> [1] 0 1 2 3 0 1 2 0
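The same result can be built in two explicit steps, which may be easier to follow: first flag every value that occurs more than once, then number the cases within each group. This is an alternative sketch, not the answer's method; has_dup, within_id and result are illustrative names.

```r
x <- c(1, 2, 2, 2, 3, 4, 4, 5)

# TRUE for every case whose value appears more than once (including the first)
has_dup <- x %in% x[duplicated(x)]

# Running counter within each group of equal values: 1, 2, 3, ...
within_id <- ave(seq_along(x), x, FUN = seq_along)

# Keep the counter for duplicated values, use 0 for unique ones
result <- ifelse(has_dup, within_id, 0)
result   # 0 1 2 3 0 1 2 0
```

The x %in% x[duplicated(x)] step works around the limitation mentioned in the question, since it also marks the first occurrence of a repeated value.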
Assume you have a vector with runs of consecutive values:
v <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3)
How can it best be reduced to one value per run together with the length of each run? I.e. the first run is 1 repeated three times; 2nd run: 2 repeated four times; 3rd run: 1 repeated two times, and so on:
v.df <- data.frame(value = c(1, 2, 1, 3),
repetitions = c(3, 4, 2, 4))
In a procedural language I might just iterate through a loop and build the data.frame as I go, but with a large dataset in R such an approach is inefficient. Any advice?
Use rle() (run-length encoding). Most simply:
data.frame(rle(v)[])
or, naming the columns explicitly:
with(rle(v), data.frame(values, lengths))
This should get you what you need:
  values lengths
1      1       3
2      2       4
3      1       2
4      3       4
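To see what rle() itself returns, and to confirm that no information is lost, the encoding can be inspected and then reversed with inverse.rle(). A short sketch; runs and restored are illustrative names.

```r
v <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3)

# rle() returns a list with two components: lengths and values
runs <- rle(v)
runs$values    # 1 2 1 3
runs$lengths   # 3 4 2 4

# inverse.rle() expands the runs back into the original vector
restored <- inverse.rle(runs)
identical(restored, v)   # TRUE
```

Because rle() is implemented with vectorised operations, it avoids the element-by-element loop the question was worried about.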