Assume you have a vector with runs of consecutive values:
v <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3)
How can it best be reduced to one value per run plus the length of each run? I.e. the first run is 1 repeated three times; 2nd run: 2 repeated four times; 3rd run: 1 repeated two times, and so on:
v.df <- data.frame(value = c(1, 2, 1, 3),
repetitions = c(3, 4, 2, 4))
In a procedural language I might just iterate through a loop and build the data.frame as I go, but with a large dataset in R such an approach is inefficient. Any advice?
rle() computes this run-length encoding directly. Either
data.frame(rle(v)[])
or
with(rle(v), data.frame(values, lengths))
should get you what you need. The second form prints:
  values lengths
1      1       3
2      2       4
3      1       2
4      3       4
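As a small sketch of the same idea, the result of rle() can be put into a data frame with the column names from the question and then expanded back into the original vector as a sanity check. The names runs, runs.df and rebuilt are only illustrative.

```r
v <- c(1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3)

# Run-length encoding: one entry per run of identical consecutive values
runs <- rle(v)

# Same layout as the data frame asked for in the question
runs.df <- data.frame(value = runs$values, repetitions = runs$lengths)
print(runs.df)

# Round trip: expanding each value by its run length restores the input
rebuilt <- rep(runs.df$value, runs.df$repetitions)
print(identical(rebuilt, v))
```

inverse.rle(runs) performs the same expansion directly on the rle object.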
What is the difference between sort(), rank(), and order() in R?
Can you explain with examples?
sort() sorts the vector in ascending order.
rank() gives the rank of each number in the vector, with the smallest number receiving rank 1.
order() returns the indices that would put the vector in sorted order.
For example, if these functions are applied to the vector v <- c(3, 1, 2, 5, 4):
sort(c(3, 1, 2, 5, 4)) will give c(1, 2, 3, 4, 5)
rank(c(3, 1, 2, 5, 4)) will give c(3, 1, 2, 5, 4)
order(c(3, 1, 2, 5, 4)) will give c(2, 3, 1, 5, 4).
If you index the vector with these indices in this order, you get the sorted vector. Notice how v[2] = 1, v[3] = 2, v[1] = 3, v[5] = 4 and v[4] = 5.
R also handles ties. If you run rank(c(3, 1, 2, 5, 4, 2)), the value 1 gets rank 1. Since there are two 2s, which would occupy ranks 2 and 3, R by default assigns each of them the average rank 2.5. Next, 3 gets rank 4.0, so
rank(c(3, 1, 2, 5, 4, 2)) will give you the output [4.0 1.0 2.5 6.0 5.0 2.5]
Hope this is helpful.
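The relationships described above can be checked directly. This is a minimal sketch using the same vectors as the answer; the variable names are only illustrative, and the ties.method values shown are two of the options documented for rank().

```r
v <- c(3, 1, 2, 5, 4)

sorted <- sort(v)    # values in ascending order:   1 2 3 4 5
ranks  <- rank(v)    # rank of each element:        3 1 2 5 4
idx    <- order(v)   # positions of sorted values:  2 3 1 5 4

# Indexing by order() reproduces sort()
print(identical(v[idx], sorted))

# Without ties, applying order() twice gives the ranks
print(all(order(idx) == ranks))

# Ties: the default method averages the tied ranks
w <- c(3, 1, 2, 5, 4, 2)
rank_avg   <- rank(w)                         # 4 1 2.5 6 5 2.5
rank_min   <- rank(w, ties.method = "min")    # 4 1 2 6 5 2
rank_first <- rank(w, ties.method = "first")  # 4 1 2 6 5 3
```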
Let's say you have the following vector of numbers:
1, 2, 3, 4, 5
I want to find all possible combinations of numbers with the combination length 3. The combinations must not overlap, i.e. 1, 2, 3 is the same as 1, 3, 2 and only one of those should appear in the output!
So, the answers would be:
1, 2, 3
1, 2, 4
1, 2, 5
1, 3, 4
1, 3, 5
1, 4, 5
2, 3, 4
2, 3, 5
2, 4, 5
3, 4, 5
This is just a simple example; in reality I have a vector of length 10000 and I need to find all combinations of length 8000. What code would you use to generate those combinations in R?
@chinsoon12 suggested the package RcppAlgos. I investigated it and found that the following works:
comboIter(1:10000, 8000)
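For the small five-element example, base R's combn() is enough to reproduce the listing in the question. The sketch below also uses lchoose() to show why the full-size problem can only be iterated over lazily rather than stored. The variable names are illustrative.

```r
x <- 1:5

# combn() returns one combination per column; transpose for one per row
combos <- t(combn(x, 3))
print(combos)

# Number of combinations in the small case
n_small <- choose(5, 3)   # 10

# For n = 10000, k = 8000 the count has more than 2000 decimal digits,
# so work on the log scale instead of computing it directly
log10_count <- lchoose(10000, 8000) / log(10)
print(log10_count)
```

With that many combinations, materializing them all is not feasible, which is why an iterator that produces them one at a time is the practical choice.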
New to R. I have seen a lot of similar questions where tables are used to count the number of occurrences, but I want something slightly different: for each integer in vector_1 (e.g. 1 through 10), count how many times it occurs in vector_2, and return those counts in a third vector, vector_3.
Desired Result:
vector_1 <- c(1:10)
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
vector_3 <- c(0, 0, 1, 2, 1, 0, 1, 0, 1, 1)
I have tried using for loops such as:
for (i in 1:10) {
for (j in vector_2) {
print(i) <- vector_3
}
}
Obviously this code doesn't work, but I am just not finding a good way to do a summation of the occurrences between the vectors. Any guidance or alternate approaches would be welcomed.
*Edit: almost all answers that I have seen to similar questions use tables to count the occurrences within vector_2; I haven't come across questions that compare the two vectors and then output the result.
Your code doesn't make sense to me. Anyway, you can easily compare each value in vector 1 with each value in vector 2 using outer. rowSums then can give you the required counts.
vector_1 <- c(1:10)
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
rowSums(outer(vector_1, vector_2, "=="))
#[1] 0 0 1 2 1 0 1 0 1 1
Also you can create a factor variable:
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)
vector_2 <- factor(vector_2,levels = 1:10)
table(vector_2)
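To compare the approaches, here is a short sketch that computes vector_3 three ways: with outer(), with a factor whose levels come from vector_1, and with an explicit per-element count via sapply(). The sapply() variant is an addition for illustration, not part of the answers above, and the counts_* names are hypothetical.

```r
vector_1 <- 1:10
vector_2 <- c(3, 4, 4, 5, 7, 9, 10)

# 1. Compare every element of vector_1 with every element of vector_2
counts_outer <- rowSums(outer(vector_1, vector_2, "=="))

# 2. Tabulate a factor so that absent values still get a count of zero
counts_table <- as.vector(table(factor(vector_2, levels = vector_1)))

# 3. Count matches one value at a time
counts_sapply <- sapply(vector_1, function(i) sum(vector_2 == i))

print(counts_outer)
print(counts_table)
print(counts_sapply)
```

The outer() version builds a length(vector_1) by length(vector_2) matrix, so for long vectors the factor-based table is likely the lighter option.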
I need a function that counts the frequency of each value per row in a data frame, checks whether any value appears 6 or more times, and, if so, writes that value to a new column. If not, it should write "nope" to that column instead.
In the example below, the values in the rows are 1, 2, or 3. So if one of the values 1, 2, or 3 appears 6 or more times in a row, that value has to appear in the new column. If none of the values appears 6 or more times in a row, the value in the new column should be "nope".
example
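Try applying the table function to each row:
make_count_col <- function(x) {
  # Tabulate and decide inside a single apply(): apply(x, 1, table) on its
  # own returns a list rather than a matrix when rows contain different
  # sets of values, which would break a second apply() over its columns.
  x$newcolumn <- apply(x, 1, function(row) {
    cnt <- table(row)
    if (max(cnt) < 6) {
      "nope"
    } else {
      names(cnt)[which.max(cnt)]  # the value that occurs most often
    }
  })
  x
}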
Your example, replicated:
x <- as.data.frame(matrix(c(1, 2, 1, 2, 2, 2, 2, 2, 3,
                            2, 3, 1, 1, 3, 2, 1, 1, 3), nrow = 2, byrow = TRUE))
colnames(x) <- paste0('svo', 1:9)
make_count_col(x)
  svo1 svo2 svo3 svo4 svo5 svo6 svo7 svo8 svo9 newcolumn
1    1    2    1    2    2    2    2    2    3         2
2    2    3    1    1    3    2    1    1    3      nope
I want to identify duplicate cases and number them as a vector (such as with an ID variable). Any case without any direct matches should be labeled as a fixed value (such as zero). Any case with a corresponding duplicate should be labeled 1, with each subsequent case being labeled n+1. So, if I have an ID variable like this 1, 2, 2, 2, 3, 4, 4, 5, I'd want the corresponding vector to produce: 0, 1, 2, 3, 0, 1, 2, 0.
How can I do this?
duplicated() treats the first case in each group as a non-duplicate, so that doesn't work on its own.
Base R, using ave() with seq_along():
x <- c(1, 2, 2, 2, 3, 4, 4, 5)
ave(seq_along(x), x, FUN = function(g) if (length(g) > 1) seq_along(g) else 0)
#> 0 1 2 3 0 1 2 0
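The same ave() call can be wrapped in a small helper. The name number_duplicates is hypothetical and used only for this sketch. Because ave() groups by value rather than by position, it also numbers duplicates that are not adjacent, as the second call suggests.

```r
# Number the members of each duplicated group 1, 2, 3, ...;
# values that occur only once get 0
number_duplicates <- function(id) {
  ave(seq_along(id), id, FUN = function(g) {
    if (length(g) > 1) seq_along(g) else 0L
  })
}

res_sorted <- number_duplicates(c(1, 2, 2, 2, 3, 4, 4, 5))
print(res_sorted)    # 0 1 2 3 0 1 2 0

# Unsorted input with non-adjacent duplicates
res_mixed <- number_duplicates(c("b", "a", "b", "c", "a", "b"))
print(res_mixed)     # 1 1 2 0 2 3
```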