Consecutive Sum of a Vector (R)

This is a follow-up to a previous question. There, it was suggested to use rollapply to calculate the sum of the 1st, 2nd and 3rd entries of a vector, then the 2nd, 3rd and 4th, and so on.
My question is how to calculate the sum of the 1st, 2nd and 3rd entries, then the 4th, 5th and 6th, and so on. That is, a rolling sum without overlapping windows. Can this be done easily?

Same idea: you just need to specify the by argument (the default is 1).
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
zoo::rollapply(x, 3, by = 3, sum)
#[1] 10 20 12
# Or another base R option:
sapply(split(x, ceiling(seq_along(x)/3)), sum)
# 1 2 3
#10 20 12
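A minimal base R sketch wrapping the split approach in a reusable helper (block_sums is a hypothetical name, not from either answer). One difference worth knowing: rollapply keeps only complete windows by default (partial = FALSE), while split keeps a shorter final group when the length is not a multiple of the block size.

```r
# Hypothetical helper: sums of non-overlapping blocks of length n.
# A trailing block shorter than n is kept and summed as-is.
block_sums <- function(x, n) {
  groups <- ceiling(seq_along(x) / n)   # 1,1,1,2,2,2,3,...
  unname(vapply(split(x, groups), sum, numeric(1)))
}

x <- c(1, 5, 4, 5, 7, 8, 9, 2, 1)
block_sums(x, 3)
#[1] 10 20 12
block_sums(c(x, 6), 3)   # length 10: the last block holds a single value
#[1] 10 20 12  6
```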

Using tapply in base R:
set.seed(1)
vec <- sample(10, 20, replace = TRUE)
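#[1] 3 4 6 10 3 9 10 7 7 1 3 2 7 4 8 5 8 10 4 8
# (output from R < 3.6.0; the default sample() algorithm changed in R 3.6.0, so newer versions produce a different vec)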
unname(tapply(vec, (seq_along(vec)-1) %/% 3, sum))
# [1] 13 22 24 6 19 23 12
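To see what the grouping index in the tapply call looks like, here is a sketch on the first ten values of vec typed in literally (an assumption made so the example does not depend on the random number generator). Integer division of the zero-based position by 3 gives every block of three positions the same label.

```r
vec <- c(3, 4, 6, 10, 3, 9, 10, 7, 7, 1)
grp <- (seq_along(vec) - 1) %/% 3   # zero-based position divided by 3
grp
#[1] 0 0 0 1 1 1 2 2 2 3
unname(tapply(vec, grp, sum))       # the last group has only one element
#[1] 13 22 24  1
```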
Alternatively,
colSums(matrix(vec[1:(ceiling(length(vec)/3)*3)], nrow = 3), na.rm = TRUE)
#[1] 13 22 24 6 19 23 12
vec[1:(ceiling(length(vec)/3)*3)] pads the vector with NA when its length is not divisible by 3, so it can be reshaped into a matrix with 3 rows. Setting na.rm = TRUE then makes colSums ignore those NAs.
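The same computation broken into steps, so the padding is visible. The vector printed above is typed in literally here (an assumption so the sketch does not depend on the random number generator).

```r
vec <- c(3, 4, 6, 10, 3, 9, 10, 7, 7, 1, 3, 2, 7, 4, 8, 5, 8, 10, 4, 8)
n <- 3
padded_len <- ceiling(length(vec) / n) * n   # 21, the next multiple of 3
padded <- vec[seq_len(padded_len)]           # indexing past the end yields NA
tail(padded, 3)
#[1]  4  8 NA
m <- matrix(padded, nrow = n)                # filled by column: one block per column
colSums(m, na.rm = TRUE)
#[1] 13 22 24  6 19 23 12
```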
Yet another one using cut and aggregate:
x <- ceiling(length(vec)/3)*3
df <- data.frame(vec=vec[1:x], col=cut(1:x, breaks = seq(0,x,3)))
aggregate(vec~col, df, sum, na.rm = TRUE)[[2]]
#[1] 13 22 24 6 19 23 12
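A small sketch of what cut contributes: with breaks = seq(0, 9, 3) it labels each position with the interval it falls in, and aggregate then sums per label. The data reuses the nine-element x from the first answer. As a side note, the formula method of aggregate drops rows containing NA by default (na.action = na.omit), so the padded NA row in the answer above is removed before sum is called.

```r
x <- c(1, 5, 4, 5, 7, 8, 9, 2, 1)
# Label each position with its interval: (0,3], (3,6], (6,9]
grp <- cut(seq_along(x), breaks = seq(0, length(x), 3))
levels(grp)
#[1] "(0,3]" "(3,6]" "(6,9]"
res <- aggregate(x ~ grp, data = data.frame(x, grp), FUN = sum)
res$x
#[1] 10 20 12
```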

We can use roll_sum from RcppRoll, which should be very efficient, to compute every window sum and then keep every third one:
library(RcppRoll)
roll_sum(x, n=3)[c(TRUE, FALSE, FALSE)]
#[1] 10 20 12
Data:
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
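If RcppRoll is not available, the same "compute every window, then keep every third" idea can be sketched in base R with differences of a cumulative sum. This is a substitute technique, not a description of how roll_sum works internally; keeping windows 1, 4, 7, ... with seq() avoids hard-coding the c(TRUE, FALSE, FALSE) pattern for n = 3.

```r
x <- c(1, 5, 4, 5, 7, 8, 9, 2, 1)
n <- 3
cs <- cumsum(c(0, x))
# Sum of x[i:(i + n - 1)] equals cs[i + n] - cs[i]
rolling <- cs[(n + 1):length(cs)] - cs[1:(length(cs) - n)]
rolling
#[1] 10 14 16 20 24 19 12
rolling[seq(1, length(rolling), by = n)]   # windows starting at 1, 4, 7
#[1] 10 20 12
```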

You can define the window size, keep the cumulative sum at every n-th position, and take the differences:
x <-c(1, 5, 4, 5, 7, 8, 9, 2, 1)
n <- 3
diff(c(0, cumsum(x)[seq_along(x) %% n == 0]))
#[1] 10 20 12
P.S. This uses the input from the answer by @Sotos.
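The cumsum approach silently drops a trailing block shorter than n, because only positions that are multiples of n are kept. A hedged sketch of one way to keep that block is to append the last position to the block ends. The tenth value 6 is made up for illustration, and the code assumes length(x) >= n (otherwise seq() stops with an error).

```r
x <- c(1, 5, 4, 5, 7, 8, 9, 2, 1, 6)   # length 10, not a multiple of 3
n <- 3
cs <- cumsum(x)
ends <- seq(n, length(x), by = n)       # 3, 6, 9: ends of complete blocks
diff(c(0, cs[ends]))
#[1] 10 20 12
ends_all <- unique(c(ends, length(x)))  # also close the partial last block
diff(c(0, cs[ends_all]))
#[1] 10 20 12  6
```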

Related

Is there an easy way of performing arithmetic on elements in a vector in R?

The input can be a vector of numbers or a string of digits, and the output should be each element plus that element's position (1, 2, 3, ...).
myFunction(c(4,10))
[1] 5 12
myFunction(1:10)
[1] 2 4 6 8 10 12 14 16 18 20
You can use:
myfunction <- function(x) x + seq_along(x)
myfunction(c(4, 10))
#[1] 5 12
myfunction(1:10)
#[1] 2 4 6 8 10 12 14 16 18 20
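The question also mentions a string of digits as input. Assuming that means one digit per element (an interpretation, not something the question spells out), the string can be split into digits first. The string "4107" is made up for illustration.

```r
myfunction <- function(x) x + seq_along(x)
# Assumption: each character of the string is one element
digits <- as.integer(strsplit("4107", "")[[1]])
digits
#[1] 4 1 0 7
myfunction(digits)
#[1]  5  3  3 11
```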

How to condense non-sequential integers?

I'm trying to condense a non-sequential set of integers into ranges to subset haplotype data. I could do it manually, but given that I have hundreds to do, I'd rather not if there's an alternative.
class(haplotype1[[1]])
#[1] "integer"
haplotype1[[1]]
#[1] 1 2 3 4 5 7 8 9 10 11
I want to get [1:5, 7:11], which seems simple, but I haven't found a solution that exactly matches my problem.
Thanks!
Using cumsum to create groups of consecutive values:
x <- haplotype1[[1]]
tapply(x, cumsum(c(TRUE, diff(x) != 1)), FUN = function(i) paste(i[1], i[length(i)], sep = ':'))
# 1 2
#"1:5" "7:11"
It's unclear what type of object you want to create. I would just store the start and end values.
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
starts <- x[!c(FALSE, diff(x) == 1L)]
#[1] 1 7
ends <- x[!c(diff(x) == 1L, FALSE)]
#[1] 5 11
paste(starts, ends, sep = ":")
#[1] "1:5" "7:11"
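If the aim is to keep the ranges as data rather than as strings, here is a sketch that stores the start and end of each run in a data frame and expands them back into the original integers with Map and seq. The column names start and end are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
brk <- c(TRUE, diff(x) != 1)            # TRUE where a new run starts
runs <- data.frame(start = x[brk],
                   end   = x[c(brk[-1], TRUE)])  # last element before each break
runs$start
#[1] 1 7
runs$end
#[1]  5 11
# Expand the stored ranges back into the full set of integers
expanded <- unlist(Map(seq, runs$start, runs$end))
expanded
#[1]  1  2  3  4  5  7  8  9 10 11
```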
Maybe you want something like this?
vec <- c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
split(vec, cumsum(c(1,diff(vec)>1)))
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 7 8 9 10 11

Find values that are between list of numbers

I have two vectors of numbers like the ones below.
x <- c(1, 5, 10, 17, 21, 30)
y <- c(2, 7, 19)
In my dataset, x divides the range 1 to 30 into segments (1-5, 5-10, 10-17, 17-21 and 21-30). Is it possible to match each number in y to the segment it falls in? In this case, I'd want c(1, 5, 17) as the output, because 2 is between 1 and 5, 7 is between 5 and 10, and 19 is between 17 and 21.
?findInterval to the rescue:
x[findInterval(y,x)]
#[1] 1 5 17
Using cut is another option (note that it returns a factor):
cut(y, breaks = x, labels = x[-length(x)])
#[1] 1 5 17
#Levels: 1 5 10 17 21
It can also be done with labels = FALSE, which returns the interval number so you can index x directly:
x[cut(y, breaks = x, labels = FALSE)]
#[1] 1 5 17
You can also do this with sapply and a simple function:
sapply(y, function(a) x[max(which(x<a))])
#[1] 1 5 17
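The approaches above agree for the y in the question, but they treat values that sit exactly on a break differently: findInterval uses intervals closed on the left, [a, b), while cut defaults to intervals closed on the right, (a, b]. The boundary values below are made up to show the difference; cut's right = FALSE argument switches it to the findInterval convention.

```r
x <- c(1, 5, 10, 17, 21, 30)
y_edge <- c(1, 5, 10)   # values exactly on a break
x[findInterval(y_edge, x)]                              # [a, b)
#[1]  1  5 10
x[cut(y_edge, breaks = x, labels = FALSE)]              # (a, b]: 1 falls outside
#[1] NA  1  5
x[cut(y_edge, breaks = x, labels = FALSE, right = FALSE)]
#[1]  1  5 10
```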

Order function returns different values as vector

I want to sort this vector in decreasing order:
x <- c(10, 10, 7, 3, 6, 2, 2, 7, 8, 1, 3, 1, 1, 1, 5, 5, 5, 4, 4, 2, 1, 4, 4, 3, 3, 2, 2, 1)
order(x)
But it returns numbers that are different from the values in x:
## [1] 10 12 13 14 21 28 6 7 20 26 27 4 11 24 25 18 19 22 23 15 16 17 5 3 8 9 1 2
The order function returns a permutation of indices, not the sorted vector:
http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html
> x <- c (3, 2, 4, 1)
> order(x)
[1] 4 2 1 3
The result (4 2 1 3) means that the smallest item is the 4th (which is 1), the second smallest is the 2nd (which is 2), ... and the biggest is the 3rd item (which is 4).
If you want the sorted vector itself, use the sort function:
> sort(x)
[1] 1 2 3 4
To sort in decreasing order, set the decreasing argument:
> sort(x, decreasing = TRUE)
[1] 4 3 2 1
order(x) returns the indices of the elements of x, arranged so that the corresponding values are in increasing order. Note that the smallest element of x is at position 10, the second smallest (actually just as small, it is also 1) is at position 12, and so on.
To get x in decreasing order, you can either use
sort(x, decreasing=TRUE)
or use order as an index:
x[order(-x)]
(Why -x? Because order returns indices for increasing order. Negating the numbers reverses their ordering, so you get the indices for decreasing order.)
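A short sketch of the relationship between order and sort: indexing x by its permutation gives the sorted vector. The -x trick only works for numeric data; order also accepts decreasing = TRUE, which works for character vectors too (the letters below are made up for illustration).

```r
x <- c(3, 2, 4, 1)
o <- order(x)
o
#[1] 4 2 1 3
x[o]                              # indexing by the permutation sorts x
#[1] 1 2 3 4
identical(x[o], sort(x))
#[1] TRUE
x[order(x, decreasing = TRUE)]    # same result as x[order(-x)] for numeric x
#[1] 4 3 2 1
s <- c("b", "c", "a")
s[order(s, decreasing = TRUE)]    # -s would be an error for character data
#[1] "c" "b" "a"
```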

Map numbers to smallest in a vector of numbers in R

Given a vector of numbers, I'd like to map each one to the smallest number in a separate vector that it does not exceed, i.e. the smallest value in that vector that is greater than or equal to it. For example:
# Given these
v1 <- 1:10
v2 <- c(2, 5, 11)
# I'd like to return
result <- c(2, 2, 5, 5, 5, 11, 11, 11, 11, 11)
Try
cut(v1, c(0, v2), labels = v2)
[1] 2 2 5 5 5 11 11 11 11 11
Levels: 2 5 11
which can be converted to a numeric vector using as.numeric(as.character(...)); as.numeric() alone would return the factor's integer codes instead of the labels.
Another way (thanks for the edit, @Ananda):
v2[findInterval(v1, v2 + 1) + 1]
# [1] 2 2 5 5 5 11 11 11 11 11
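The v2 + 1 trick relies on v1 containing integers. A sketch of a variant that also handles non-integer values uses findInterval's left.open argument, which makes the intervals (v2[i], v2[i + 1]], so a value equal to a break stays with that break. Values above max(v2) would still give NA, as in the original.

```r
v1 <- 1:10
v2 <- c(2, 5, 11)
# left.open = TRUE: index i such that v2[i] < v1 <= v2[i + 1] (0 if v1 <= v2[1])
res <- v2[findInterval(v1, v2, left.open = TRUE) + 1]
res
#[1]  2  2  5  5  5 11 11 11 11 11
v2[findInterval(c(2.5, 4.9), v2, left.open = TRUE) + 1]   # non-integer inputs
#[1] 5 5
v2[findInterval(2.5, v2 + 1) + 1]   # the integer-only trick maps 2.5 to 2
#[1] 2
```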
