Extract every nth element of a vector - r

I would like to create a vector in which each element is the i+6th element of another vector.
For example, in a vector of length 120 I want to create another vector of length 20 in which each element is value i, i+6, i+12, i+18... of the initial vector, i.e. I want to extract every 6th element of the original.

a <- 1:120
b <- a[seq(1, length(a), 6)]

Another trick for getting sequential pieces (beyond the seq solution already mentioned) is to use a short logical vector and use vector recycling:
foo[ c( rep(FALSE, 5), TRUE ) ]
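A small sketch of how the recycling trick lines up with the seq solution, assuming foo is the same 1:120 vector used elsewhere in this thread. The names every_sixth and from_first are only for illustration.

```r
foo <- 1:120

# Pattern of length 6 with TRUE only in the last slot; R recycles it along foo,
# so positions 6, 12, 18, ... are kept.
every_sixth <- foo[c(rep(FALSE, 5), TRUE)]

# Moving the TRUE to the first slot keeps positions 1, 7, 13, ... instead.
from_first <- foo[c(TRUE, rep(FALSE, 5))]

# Both agree with the corresponding seq() index.
stopifnot(identical(every_sixth, foo[seq(6, length(foo), by = 6)]))
stopifnot(identical(from_first, foo[seq(1, length(foo), by = 6)]))
```

So the position of the TRUE inside the short pattern decides where the selection starts.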

I think you are asking two things which are not necessarily the same.
"I want to extract every 6th element of the original"
You can do this by indexing a sequence:
foo <- 1:120
foo[1:20*6]
"I would like to create a vector in which each element is the i+6th element of another vector."
An easy way to do this is to pad a logical vector with FALSEs up to position i+6:
foo <- 1:120
i <- 1
foo[1:(i+6)==(i+6)]
# [1] 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119
i <- 10
foo[1:(i+6)==(i+6)]
# [1] 16 32 48 64 80 96 112
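To make it clearer what the comparison 1:(i+6)==(i+6) produces, here is a short sketch. It only unpacks the expression used above; the name pattern is illustrative.

```r
foo <- 1:120
i <- 1

# The comparison yields i + 5 FALSEs followed by a single TRUE.
pattern <- 1:(i + 6) == (i + 6)
stopifnot(identical(pattern, c(rep(FALSE, i + 5), TRUE)))

# Recycled along foo, it keeps every (i + 6)th element: 7, 14, ..., 119.
stopifnot(identical(foo[pattern], foo[seq(i + 6, length(foo), by = i + 6)]))
```

Note that under this reading i changes the step size, not the starting position.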

To select every nth element from any starting position in the vector
nth_element <- function(vector, starting_position, n) {
vector[seq(starting_position, length(vector), n)]
}
# E.g.
vec <- 1:12
nth_element(vec, 1, 3)
# [1] 1 4 7 10
nth_element(vec, 2, 3)
# [1] 2 5 8 11

To select every n-th element with an offset/shift of f = 0, ..., n-1, use the modulo operator %%:
vec[seq_along(vec) %% n == f]
Of course, you can wrap this in a nice function:
nth_element <- function(vec, interval, offset = 0) {
vec[seq_along(vec) %% interval == offset %% interval]
}
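A quick check of the offset semantics, written with base R's %% operator. The helper name every_nth is hypothetical, chosen only so it does not clash with the functions above.

```r
# Keep the elements whose 1-based position is congruent to `offset`
# modulo `interval`.
every_nth <- function(vec, interval, offset = 0) {
  vec[seq_along(vec) %% interval == offset %% interval]
}

vec <- 1:12
every_nth(vec, 3)      # offset 0: positions 3, 6, 9, 12
every_nth(vec, 3, 1)   # offset 1: positions 1, 4, 7, 10
every_nth(vec, 3, 2)   # offset 2: positions 2, 5, 8, 11
```

An offset of 0 therefore starts at position n rather than at position 1, which differs from the seq-based version with starting_position = 1.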

Related

Grow vector by certain length and value combinations

I have this vector:
a = c(4,5,6,81,82,83)
My desired result is the following:
b = c(1,2,3,4,5,6,78,79,80,81,82,83)
My logic is: There are two different sequences in a (this can be checked by using length(which(diff(a)>1))+1). Each one of them has to be extended backwards to reach the length of 1:end_of_first_seq (end_of_first_seq = a[which(diff(a)>1)[1]]). Thus, in this case the length of each sequence should be 6. Each sequence must therefore grow three steps backwards, so 4,5,6 becomes 1,2,3,4,5,6 and 81,82,83 becomes 78,79,80,81,82,83, all within the same vector.
Is there any fast way to do this? (this is a simple example, the number of sequences can be higher). It is worth mentioning all "previous" sequences are the same length (in this case, 3) and they are separated by at least two values (a case like 6,7,8,9,10,11 cannot happen). I know I can do this with loops but speed is a factor.
If all sequences have the same length:
vec <- c(4,5,6,81,82,83)
LEN <- 3 # sequence length
want <- matrix(vec, ncol = LEN, byrow = TRUE)
want <- cbind(want - LEN, want)
want <- as.vector(t(want))
want
# [1] 1 2 3 4 5 6 78 79 80 81 82 83
We calculate the target length of each sequence and, since all the sequences are the same length, we can extract every nth value and create a sequence between two points for each one.
length_of_each_seq <- a[which.max(diff(a)>1)]
n <- 3
vals <- a[seq(n, length(a), by = n)]
c(mapply(`:`, vals - (length_of_each_seq - 1), vals))
#[1] 1 2 3 4 5 6 78 79 80 81 82 83
where vals holds the end of each sequence
vals
#[1] 6 83
and vals - (length_of_each_seq - 1) is from where we need to start
vals - (length_of_each_seq - 1)
#[1] 1 78
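If the original sequence length is not known up front, the run ends can be found from the gaps with diff, as described in the question. The following is a sketch under that assumption; the names breaks, ends and target_len are illustrative.

```r
a <- c(4, 5, 6, 81, 82, 83)

# Positions after which a gap larger than 1 follows (ends of all runs but the last).
breaks <- which(diff(a) > 1)
n_runs <- length(breaks) + 1        # number of runs: 2

# Last value of every run, including the final one.
ends <- a[c(breaks, length(a))]     # 6 83

# Target length is the end value of the first run, as in the question.
target_len <- a[breaks[1]]          # 6

# Build each extended run ending at its original end value, then flatten.
b <- c(sapply(ends, function(e) seq(e - target_len + 1, e)))
b
# [1]  1  2  3  4  5  6 78 79 80 81 82 83
```

This assumes there are at least two runs; with a single run, breaks is empty and target_len would need to be supplied some other way.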

select multiple head() and tail() values in a vector

I have a vector as follows:
v <- c(1,3,4,5,6,7,8,9,NA,NA,NA,NA,27,25,30,41,NA,NA)
How can I extract the values 1, 9, 27 and 41 (i. e. the first and last position of each subset without NAs)?
I thought about using head(v, 1) and tail(v, 1) in combination. However I don't have an idea how to 'stop' at the NAs and restart again after them.
We create a grouping variable with rleid based on the logical vector (is.na(v)), use that in tapply to select the first and last values of each group, unlist the list output, remove the NA elements with na.omit and remove the attributes with c.
library(data.table)
c(na.omit(unlist(tapply(v, rleid(is.na(v)), function(x) c(x[1],
x[length(x)])), use.names=FALSE)))
#[1] 1 9 27 41
Or another option is rle from base R
v[with(rle(!is.na(v)), {
i1 <- cumsum(lengths)
i2 <- lengths[values]
c(rbind(i1[values] - i2 + 1 , i1[values]))
})]
#[1] 1 9 27 41
Another possible solution via base R could be to split based on NA entries in the vector, lapply the head and tail functions and remove NA's, i.e.
ind <- unname(unlist(lapply(split(v, cumsum(c(1, diff(is.na(v)) != 0))), function(i)
c(head(i, 1), tail(i, 1)))))
ind[!is.na(ind)]
#[1] 1 9 27 41
A base R solution:
x = na.omit( v[is.na(c(NA,diff(v))) | is.na(c(diff(v),NA))] )
as.numeric(x)
# [1] 1 9 27 41
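For comparison, here is a sketch that locates the run boundaries directly with shifted logical vectors. It is not taken from any of the answers above, and all names other than v are illustrative.

```r
v <- c(1, 3, 4, 5, 6, 7, 8, 9, NA, NA, NA, NA, 27, 25, 30, 41, NA, NA)

ok <- !is.na(v)
prev_ok <- c(FALSE, head(ok, -1))   # is the previous element non-NA?
next_ok <- c(tail(ok, -1), FALSE)   # is the next element non-NA?

starts <- which(ok & !prev_ok)      # first position of each non-NA run: 1 13
ends   <- which(ok & !next_ok)      # last position of each non-NA run:  8 16

# Interleave first and last values run by run.
res <- c(rbind(v[starts], v[ends]))
res
# [1]  1  9 27 41
```

A run consisting of a single value would be reported twice here, once as its start and once as its end.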

R expand.grid with row restrictions

I have a numeric vector x of length N and would like to create a vector of the within-set sums of all of the following sets: any possible combination of the x elements with at most M elements in each combination. I put together a slow iterative approach; what I am looking for here is a way without using any loops.
Consider the approach I have been taking, in the following example with N=5 and M=4
M <- 4
x <- 11:15
y <- as.matrix(expand.grid(rep(list(0:1), length(x))))
result <- y[rowSums(y) <= M, ] %*% x
However, as N gets large (above 22 for me), the expand.grid output becomes too big and gives an error (replace x above with x <- 11:55 to observe this). Ideally there would be an expand.grid function that permits restrictions on the rows before constructing the full matrix, which (at least for what I want) would keep the matrix size within memory limits.
Is there a way to achieve this without causing problems for large N?
Your problem has to do with the sheer number of combinations.
What you appear to be doing is listing all the different combinations of 0s and 1s in a sequence the length of x.
In your example x has length 5 and you have 2^5 = 32 combinations.
When x has length 22 you have 2^22 = 4194304 combinations.
Couldn't you use a binary encoding instead?
In your case that would mean
0 stands for 00000
1 stands for 00001
2 stands for 00010
3 stands for 00011
...
It will not solve your problem completely, but you should be able to get a bit further than now.
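One way to read the binary-encoding suggestion is to decode integers into 0/1 rows on the fly and process them in chunks, so the full matrix never has to exist at once. This is only a sketch of that idea; the function subset_sums and the chunk boundaries are assumptions made for illustration.

```r
x <- 11:15
M <- 4

# Sums for the subsets encoded by the integers in `codes`,
# where bit j of a code says whether x[j + 1] is in the subset.
subset_sums <- function(codes, x, M) {
  bits <- sapply(seq_along(x) - 1, function(j) (codes %/% 2^j) %% 2)
  bits <- matrix(bits, nrow = length(codes))   # keep a matrix even for one code
  keep <- rowSums(bits) <= M                   # at most M elements per subset
  as.vector(bits[keep, , drop = FALSE] %*% x)
}

# Process the 2^5 = 32 codes in two chunks instead of one big matrix.
result2 <- c(subset_sums(0:15, x, M), subset_sums(16:31, x, M))
length(result2)
# [1] 31
```

Each chunk only needs a matrix with as many rows as it has codes, but the total number of codes still grows as 2^N, so this eases the memory limit without removing the underlying cost.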
Try this:
c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k)))))
It generates the same result as with your expand.grid approach, shown below for the test data.
M <- 4
x <- 11:15
# expand.grid approach
y <- as.matrix(expand.grid(rep(list(0:1), length(x))))
result <- y[rowSums(y) <= M, ] %*% x
# combn approach
result1 <- c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k)))))
all(sort(result[,1]) == sort(result1))
# [1] TRUE
This should be fast (it takes 0.227577 secs on my machine, with N=22, M=4):
x <- 1:22 # N = 22
M <- 4
c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k)))))
# [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 3 4 5 6 7
You may want to keep only the unique sums with
unique(c(0, unlist(lapply(1:M, function(k) colSums(combn(x, k))))))
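To see why the combn approach stays small where expand.grid does not, it may help to count the rows each one has to hold. A short sketch using the N = 22, M = 4 case from above:

```r
N <- 22
M <- 4

# Number of subsets with at most M elements (including the empty set).
n_restricted <- sum(choose(N, 0:M))
n_restricted
# [1] 9109

# Number of rows expand.grid builds before any filtering.
n_full <- 2^N
n_full
# [1] 4194304
```

The restricted count grows polynomially in N for fixed M, whereas the full grid doubles with every extra element.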

Create master list adding iterations of values in another list with known interval in R

I have a solution that works, but would appreciate ideas to improve the code to avoid using loops if possible.
I have a list of values, this is read in from a csv file, but takes the form
startingvalues = c(1, 7, 20, 32, 47)
I want to create a new list, that reads in each of these starting values and adds the next 2 (or 7 or 15 etc.) numbers, then goes to the next. For the above example this would be
newlist = c(1,2,3,7,8,9,20,21,22,32,33,34,47,48,49)
I have code that works, but I suspect there is a more elegant way to do this. I am not particularly worried about speed but would like to avoid the loop if there is a better way to do this.
newlist = c() # initialise an empty list
for (i in 1:length(startingvalues)) {
list1 = seq(startingvalues[i], startingvalues[i] + 2, by = 1)
newlist = c(newlist, list1)
}
Any suggestions to improve my coding would be appreciated. This may be the best way to do this, however I suspect it isn't.
How about something like this
extend <- function(x,y) unlist(lapply(x, seq.int, length.out=y+1))
extend(startingvalues, 2)
# [1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49
The first parameter is the vector of numbers and the second is how far you want to extend each number. We just use lapply for the iteration and unlist the result at the end. This is better than appending at each iteration, which is not very efficient.
Here's another alternative
extend <- function(x,y) c(outer(0:y, x, `+`))
The outer() will build a matrix but we coerce back to a vector with c().
We can use rep with + to get the expected output
unique(sort(startingvalues + rep(0:2, each = length(startingvalues))))
#[1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49
Or, as @thelatemail mentioned, replicating 'startingvalues' and making use of recycling would be better, as the sort can be avoided
s1 <- 0:2
rep(startingvalues, each=length(s1)) + s1
#[1] 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49
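As a sanity check, the loop-free variants suggested in this thread can be compared on the example data. This sketch only restates those approaches side by side; the variable names and k (how far to extend) are illustrative.

```r
startingvalues <- c(1, 7, 20, 32, 47)
k <- 2   # extend each starting value by k further numbers

via_lapply <- unlist(lapply(startingvalues, seq.int, length.out = k + 1))
via_outer  <- c(outer(0:k, startingvalues, `+`))
via_rep    <- rep(startingvalues, each = k + 1) + 0:k

# All three give 1 2 3 7 8 9 20 21 22 32 33 34 47 48 49.
stopifnot(all(via_lapply == via_outer), all(via_outer == via_rep))
```

Unlike the unique(sort(...)) version, these keep the original order and do not drop repeated values if two extended ranges happen to overlap.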
