I have a bunch of observations
x = c(1, 2, 4, 1, 6, 7, 11, 11, 12, 13, 14)
that I want to turn into the groups:
y = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
I.e., I want the first 5 integers (1 to 5) to constitute one group, the next 5 integers (6 to 10) to constitute the next group, and so on.
Is there a straightforward way to accomplish this without a loop?
Clarification: I need to programmatically create the groups from the input vector (x).
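We can use %/% (integer division) to create the groups; subtracting 1 first keeps the multiples of 5 in the lower group, as the question asks
(x - 1) %/% 5 + 1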
#[1] 1 1 1 1 2 2 3 3 3 3 3
You can use ceiling to create groups
ceiling(x/5)
# [1] 1 1 1 1 2 2 3 3 3 3 3
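The sample x contains no multiple of 5, so a quick boundary check may be worthwhile. The sketch below uses a hypothetical test vector x2 (not from the question) and assumes that 5 and 10 should close their groups, as the question describes.

```r
# Hypothetical test vector that includes the group boundaries 5 and 10
x2 <- c(1, 4, 5, 6, 9, 10, 11)

g_ceiling <- ceiling(x2 / 5)       # 1 1 1 2 2 2 3
g_shifted <- (x2 - 1) %/% 5 + 1    # 1 1 1 2 2 2 3
g_plain   <- x2 %/% 5 + 1          # 1 1 2 2 2 3 3: 5 and 10 move up a group
```

With integer input the first two expressions should agree; the unshifted integer division only matches them when no value is an exact multiple of 5.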
Related
This question is related to "identify whenever values repeat in r".
While searching for an answer there, this new question arose:
I have this vector:
vector <- c(1, 1, 2, 3, 5, 6, 6, 7, 1, 1, 1, 1, 2, 3, 3)
I would like to identify each consecutive (by 1) integer sequence e.g. 1,2,3,.. or 3,4,5,.. or 4,5,6,7,...
BUT
It should allow ties 1,1,2,3,.. or 3,3,4,5,... or 4,5,5,6,6,7
The expected output would be a list like:
sequence1 <- c(1, 1, 2, 3)
sequence2 <- c(5, 6, 6, 7)
sequence3 <- c(1, 1, 1, 1, 2, 3, 3)
So far, the nearest approach I found is "Check whether vector in R is sequential?", but I could not transfer it to what I want.
An option is with diff and cumsum
split(vector, cumsum(c(TRUE, abs(diff(vector)) > 1)))
Output:
$`1`
[1] 1 1 2 3
$`2`
[1] 5 6 6 7
$`3`
[1] 1 1 1 1 2 3 3
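To see why this works, it may help to look at the intermediate steps. The names steps, new_group, group_id and runs are introduced here only for illustration.

```r
vector <- c(1, 1, 2, 3, 5, 6, 6, 7, 1, 1, 1, 1, 2, 3, 3)

# Differences between neighbours: 0 marks a tie, 1 a consecutive step
steps <- diff(vector)

# A new run starts at the first element and wherever the jump exceeds 1
new_group <- c(TRUE, abs(steps) > 1)

# Running count of run starts gives one id per element
group_id <- cumsum(new_group)
group_id
# 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3

# unname() drops the "1", "2", "3" labels that split() adds
runs <- unname(split(vector, group_id))
```

Because of abs(), a step of -1 is also treated as part of the same run. If only ascending runs should count, a condition such as steps > 1 | steps < 0 would be one way to express that.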
I have a vector of the following form:-
a <- c(4, 6, 3, 6, 1)
I want to build a vector in which each index of a is repeated as many times as the value stored at that index.
Like the first index has value 4, so there should be 4 ones, followed by 6 twos, followed by 3 threes, and so on.
The resulting vector should be of the following form:-
b <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5)
Thanks in advance.
We can use rep:
a <- c(4, 6, 3, 6, 1)
rep(seq_along(a), a)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5
We can use sequence
cumsum(sequence(a) == 1)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5
Or using uncount
library(dplyr)
library(tidyr)
tibble(a) %>%
  mutate(rn = row_number()) %>%
  uncount(a)
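The rep and sequence approaches differ if a count can be zero. A small sketch, using a hypothetical vector a0 that is not part of the question:

```r
# Hypothetical counts with a zero in the middle
a0 <- c(2, 0, 3)

by_rep <- rep(seq_along(a0), a0)      # 1 1 3 3 3: index 2 is skipped
by_seq <- cumsum(sequence(a0) == 1)   # 1 1 2 2 2: ids are renumbered
```

If the ids must stay tied to positions in the input, rep with seq_along looks like the safer choice.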
Given two sorted vectors, how can you get the index of the closest values from one onto the other?
For example, given:
a = 1:20
b = seq(from=1, to=20, by=5)
how can I efficiently get the vector
c = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
which, for each value in a, provides the index of the largest value in b that is less than or equal to it. But the solution needs to work for unpredictable (though sorted) contents of a and b, and needs to be fast when a and b are large.
You can use findInterval, which constructs a sequence of intervals given by breakpoints in b and returns the interval indices in which the elements of a are located (see also ?findInterval for additional arguments, such as behavior at interval boundaries).
a = 1:20
b = seq(from = 1, to = 20, by = 5)
findInterval(a, b)
#> [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
We can use cut
as.integer(cut(a, breaks = unique(c(b-1, Inf)), labels = seq_along(b)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
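The b - 1 shift in the cut call relies on integer-valued data. As a sketch for non-integer input (the vectors a2 and b2 are made up for illustration), left-closed intervals can be requested directly:

```r
a2 <- c(0.5, 1, 5.9, 6, 20.5)
b2 <- c(1, 6, 11, 16)

# findInterval uses intervals [b2[i], b2[i+1]); values below b2[1] give 0
idx_fi <- findInterval(a2, b2)
idx_fi
# 0 1 1 2 4

# cut with right = FALSE builds the same left-closed intervals;
# labels = FALSE returns integer codes, and values below b2[1] become NA
idx_cut <- cut(a2, breaks = c(b2, Inf), right = FALSE, labels = FALSE)
idx_cut
# NA 1 1 2 4
```

The two differ only in how they report values below the first breakpoint (0 versus NA).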
I have two vectors
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
I want to know how many times each element in b occurs in a. So the result should be
c(3, 3, 2, 1, 2, 0)
All the methods I found, like match(), ==, %in% etc., are not suited to entire vectors. I know I can use a loop over all elements in b,
counts <- integer(length(b))
for (i in seq_along(b)) {
  counts[i] <- sum(a == b[i], na.rm = TRUE)
}
but this is used often and takes too long. That's why I'm looking for a vectorized way, or a way to use apply().
You can do this using factor and table
table(factor(a, unique(b)))
#
#1 2 3 4 5 6
#3 3 2 1 2 0
Since you mentioned match, here is a possibility without an sapply loop (thanks to @thelatemail). match returns positions in b, so the factor levels are seq_along(b)
table(factor(match(a, b), seq_along(b)))
#
#1 2 3 4 5 6
#3 3 2 1 2 0
Here is a base R option, using sapply with which:
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
sapply(b, function(x) length(which(a == x)))
#[1] 3 3 2 1 2 0
Here is a vectorised method
x = expand.grid(b,a)
rowSums( matrix(x$Var1 == x$Var2, nrow = length(b)))
# [1] 3 3 2 1 2 0
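Another base R sketch, not taken from the answers above: tabulate counts integer codes directly, so combining it with match avoids building a factor. The vector b2 is a made-up lookup whose values differ from their positions.

```r
a  <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b  <- c(1, 2, 3, 4, 5, 6)
b2 <- c(5, 1, 9)   # hypothetical lookup, values are not 1..n

# match() maps each element of a to its position in b (NA if absent);
# tabulate() counts positions 1..nbins and silently ignores NA
counts  <- tabulate(match(a, b),  nbins = length(b))
counts2 <- tabulate(match(a, b2), nbins = length(b2))

counts    # 3 3 2 1 2 0
counts2   # 2 3 0
```

Passing nbins explicitly keeps trailing zero counts, such as the 0 for the value 6.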
I'm working on a Monte-Carlo simulation type problem and need to generate a vector of repeated random numbers, with the matching numbers grouped together, but in random order.
It's easier to explain with an example. If I had:
1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12
I would like it sorted as:
7, 7, 7, 3, 3, 3, 12, 12, 12, 1, 1, 1 (or with the groups of matching numbers in any order but ascending/descending).
The reason I need the random order is because my MC simulation is for 2 variables, so if both are in order they won't vary independently.
I've got as far as:
sort(rep(runif(50,1,10),10), decreasing = FALSE)
Which generates 50 random numbers between 1 and 10, repeats each 10 times, then sorts the 50 groups of 10 matching random numbers in ascending order (or it could easily be descending order if I changed "FALSE" to "TRUE"). I just can't figure out the last step of getting 50 groups of 10 matching numbers in random order. Can anyone help?
Here is one option with split
unlist(sample(split(v1, v1)), use.names = FALSE)
#[1] 3 3 3 1 1 1 12 12 12 7 7 7
Or another option is match with unique
v1[order(match(v1, sample(unique(v1))))]
data
v1 <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
An option could be:
v <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
lst <- split(v, v)  # one list element per distinct value
sapply(sample(seq_along(lst)), function(i) lst[[i]])
#     [,1] [,2] [,3] [,4]
#[1,]    3   12    7    1
#[2,]    3   12    7    1
#[3,]    3   12    7    1
#OR for having just a vector
as.vector(sapply(sample(seq_along(lst)), function(i) lst[[i]]))
#[1] 3 3 3 12 12 12 7 7 7 1 1 1
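For the original simulation setup, a possibly simpler route (a sketch, not taken from the answers above) is to skip the sort entirely: runif already returns its draws in random order, and rep with each = keeps the copies of each draw adjacent. set.seed and the names draws and sim are only there for illustration.

```r
set.seed(42)                     # only for reproducibility of the sketch

draws <- runif(50, 1, 10)        # 50 random values, already in random order
sim   <- rep(draws, each = 10)   # each value repeated 10 times, side by side

length(sim)                      # 500
head(sim, 12)                    # first group of 10, then start of the second
```

The shuffling answers above remain the way to go when the vector already exists in interleaved or sorted form.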