Rolling sum in R - r

df <- data.frame(x = seq(1:10))
I want this:
df$y <- c(1, 2, 3, 4, 5, 15, 20 , 25, 30, 35)
i.e. each y is the sum of previous five x values. This implies the first
five y will be same as x
What I get is this:
df$y1 <- c(df$x[1:4], RcppRoll::roll_sum(df$x, 5))
x y y1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 15
6 15 20
7 20 25
8 25 30
9 30 35
10 35 40
In summary, I need y but I am only able to achieve y1

1) enhanced sum function Define a function Sum which sums its first 5 values if it receives 6 values and returns the last value otherwise. Then use it with partial=TRUE in rollapplyr:
Sum <- function(x) if (length(x) < 6) tail(x, 1) else sum(head(x, -1))
rollapplyr(x, 6, Sum, partial = TRUE)
## [1] 1 2 3 4 5 15 20 25 30 35
2) sum 6 and subtract off original Another possibility is to take the running sum of 6 elements filling in the first 5 elements with NA and subtracting off the original vector. Finally fill in the first 5.
replace(rollsumr(x, 6, fill = NA) - x, 1:5, head(x, 5))
## [1] 1 2 3 4 5 15 20 25 30 35
3) specify offsets A third possibility is to use the offset form of width to specify the prior 5 elements:
c(head(x, 5), rollapplyr(x, list(-(1:5)), sum))
## [1] 1 2 3 4 5 15 20 25 30 35
4) alternative specification of offsets In this alternative we specify an offset of 0 for each of the first 5 elements and offsets of -(1:5) for the rest.
width <- replace(rep(list(-(1:5)), length(x)), 1:5, list(0))
rollapply(x, width, sum)
## [1] 1 2 3 4 5 15 20 25 30 35
Note
The scheme for filling in the first 5 elements seems quite unusual and you might consider using partial sums for the first 5 with NA or 0 for the first one since there are no prior elements fir that one:
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = NA)
## [1] NA 1 3 6 10 15 20 25 30 35
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = 0)
## [1] 0 1 3 6 10 15 20 25 30 35
rollapplyr(x, 6, sum, partial = TRUE) - x
## [1] 0 1 3 6 10 15 20 25 30 35

A simple approach would be:
df <- data.frame(x = seq(1:10))
mysum <- function(x, k = 5) {
res <- rep(NA, length(x))
for (i in seq_along(x)) {
if (i <= k) { # edited ;-)
res[i] <- x[i]
} else {
res[i] <- sum(x[(i-k):(i-1)])
}
}
res
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35

mysum <- function(x, k = 5) {
res <- x[1:k]
append<-sapply(2:(len(x)+1-k),function(i) sum(x[i:(i+k-1)]))
return(c(res,append))
}
mysum(df$x)

Related

R how to find a series of common values in a vector (identifying growing season)

I'm looking for a way to identify a growing season which consists of a number of days greater than say 60 between the last frost day of spring and the first frost day in the fall. A general version of this problem is this. If I have a vector of numbers like testVec, I want the item numbers of the beginning and end range of values where the number of items is 5 or greater and all of them are greater than 0.
testVec <- c(1,3,4,0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
In this example, the relevant range is 1,3,4,6,7,5,9 which is testVec[9] to testVec[15]
One option could be:
testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]
[1] 1 3 4 6 7 5 9
Here, the idea is to, first, create runs of values that are smaller or equal to zero and bigger than zero. Second, it checks whether the runs of values bigger than zero are of length 5 or more. Finally, it subsets the original vector for the runs of values bigger than zero with length 5 or more.
1) rleid This also handles any number of sequences including zero. rleid(ok) is a vector the same length as ok such that the first run of identical elements is replaced with 1, the second run with 2 and so on. The result is a list of vectors where each vector has its positions in the original input as its names.
library(data.table)
getSeq <- function(x) {
names(x) <- seq_along(x)
ok <- x > 0
s <- split(x[ok], rleid(ok)[ok])
unname(s)[lengths(s) >= 5]
}
getSeq(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
getSeq(numeric(16))
## list()
getSeq(c(testVec, 10 * testVec))
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
##
## [[2]]
## 25 26 27 28 29 30 31
## 10 30 40 60 70 50 90
If a data frame were desired then following gives the values and which sequence the row came from. The row names indicate the positions in the original input.
gs <- getSeq(c(testVec, 10 * testVec))
names(gs) <- seq_along(gs)
if (length(gs)) stack(gs) else gs
## values ind
## 9 1 1
## 10 3 1
## 11 4 1
## 12 6 1
## 13 7 1
## 14 5 1
## 15 9 1
## 25 10 2
## 26 30 2
## 27 40 2
## 28 60 2
## 29 70 2
## 30 50 2
## 31 90 2
2) gregexpr Replace each element that is > 0 with 1 and each other element with 0 pasting the 0's and 1's into a single character string. Then use gregexpr to look for sequences of 1's at least 5 long and for the ith such nonoverlapping sequence return the first positions, g, and lengths, attr(g, "match.length"). Define a function vals which extracts the values at the required positions from testVec of the ith such nonoverlapping sequence returning a list such that the ith component of the list is the ith such sequence. The names in the output vector are its positions in the input.
getSeq2 <- function(x) {
g <- gregexpr("1{5,}", paste(+(x > 0), collapse = ""))[[1]]
vals <- function(i) {
ix <- seq(g[i], length = attr(g, "match.length")[i])
setNames(x[ix], ix)
}
if (length(g) == 1 && g == -1) list() else lapply(seq_along(g), vals)
}
getSeq2(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
The above handles any number of sequences including 0 but if we knew there were exactly one sequence (which is the case for the example in the question) then it could be simplified to the following where the return value is just that vector:
g <- gregexpr("1{5,}", paste(+(testVec > 0), collapse = ""))[[1]]
ix <- seq(g, length = attr(g, "match.length"))
setNames(testVec[ix], ix)
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
You could "fix" #tmfmnk's solution like this:
f1 <- function(x, threshold, n) {
range(which(with(rle(x > threshold), rep(lengths * values >= n, lengths))))
}
x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
f1(x, 0, 5)
#[1] 9 15
But that does not work well when there are multiple runs
xx <- c(x, x)
f1(xx, 0, 5)
#[1] 9 31
Here is another, not so concise approach that returns the start and end of the longest run (the first one if there are ties).
f2 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
a <- ave(y, cumsum(!y), FUN=cumsum)
m <- max(a)
if (m < n) return (c(NA, NA))
i <- which(a == m)[1]
c(i-m+1, i)
}
f2(x, 0, 5)
#[1] 9 15
f2(xx, 0, 5)
#[1] 9 15
or with rle
f3 <- function(x, threshold, n) {
y <- x > threshold
r <- rle(y)
m <- max(r$lengths)
if (m < n) return (c(NA, NA))
i <- sum(r$lengths[1:which.max(r$lengths)[1]])
c(i-max(r$lengths)+1, i)
}
f3(x, 0, 5)
#[1] 9 15
f3(xx, 0, 5)
#[1] 9 15
If you wanted the first run that is at least n, that is you do not want a next run, even if it is longer, you could do
f4 <- function(x, threshold, n) {
y <- with(rle(x > threshold), rep(lengths * values >= n, lengths))
i <- which(y)[1]
j <- i + which(!y[-c(1:i)])[1] - 1
c(i, j)
}

Piece-wise function in R

I am trying to make a piece-wise function. This is a really basic one. I want y to be a list of values (preferably not just a list of integers but a list of real numbers like (1.34, 20.92) in the future).
How might I make a piece-wise function?
y <- 1:10
if (y < 2){
print("CAN'T COMPUTE")
} else if (y >= 2 & y < 6){
print(y^2)
} else {
print(y * 2)
}
Let me give it a try:
library("dplyr")
y <- 1:10
y %>%
as_tibble() %>%
mutate(res = case_when(y < 2 ~ "CAN'T COMPUTE",
y >= 2 & y < 6 ~ as.character(y^2),
TRUE ~ as.character(y*2)))
Here's the results:
# A tibble: 10 x 2
value res
<int> <chr>
1 1 CAN'T COMPUTE
2 2 4
3 3 9
4 4 16
5 5 25
6 6 12
7 7 14
8 8 16
9 9 18
10 10 20
Here are a some base R approaches. We have used NA instead of a character string in order to produce a numeric vector result. The first uses a nested ifelse. The second uses a single ifelse to select between NA and the other values and computes the other values using a formula. The third computes which leg of the result is wanted (1, 2 or 3) and then uses switch to select that leg. The fourth is a variation of three that uses findInterval to compute the leg number.
ifelse(y < 2, NA, ifelse(y < 6, y^2, 2*y))
## [1] NA 4 9 16 25 12 14 16 18 20
ifelse(y < 2, NA, (y < 6) * y^2 + (y >= 6) * 2*y)
## [1] NA 4 9 16 25 12 14 16 18 20
mapply(switch, 1 + (y >= 2) + (y >= 6), NA, y^2, 2*y)
## [1] NA 4 9 16 25 12 14 16 18 20
mapply(switch, findInterval(y, c(-Inf, 2, 6, Inf), left.open = FALSE), NA, y^2, 2*y)
## [1] NA 4 9 16 25 12 14 16 18 20

Extract multiple ranges from a numeric vector

First, I simplify my question. I want to extract certain ranges from a numeric vector. For example, extracting 3 ranges from 1:20 at the same time :
1 < x < 5
8 < x < 12
17 < x < 20
Therefore, the expected output is 2, 3, 4, 9, 10, 11, 18, 19.
I try to use the function findInterval() and control arguments rightmost.closed and left.open to do that, but any arguments sets cannot achieve the goal.
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
x[findInterval(x, v) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19
x[findInterval(x, v, rightmost.closed = T) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19 20
x[findInterval(x, v, left.open = T) %% 2 == 1]
# [1] 2 3 4 5 9 10 11 12 18 19 20
By the way, the conditions can also be a matrix like that :
[,1] [,2]
[1,] 1 5
[2,] 8 12
[3,] 17 20
I don't want to use for loop if it's not necessary.
I am grateful for any helps.
I'd probably do it using purrr::map2 or Map, passing your lower-bounds and upper-bounds as arguments and filtering your dataset with a custom function
library(purrr)
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)
map2(
lower_bounds, upper_bounds, function(lower, upper) {
x[x > lower & x < upper]
}
)
You may use data.table::inrange and its incbounds argument. Assuming ranges are in a matrix 'm', as shown in your question:
x[data.table::inrange(x, m[ , 1], m[ , 2], incbounds = FALSE)]
# [1] 2 3 4 9 10 11 18 19
m <- matrix(v, ncol = 2, byrow = TRUE)
You were on the right path, and left.open indeed helps, but rightmost.closed actually concerns only the last interval rather than the right "side" of each interval. Hence, we need to use left.open twice. As you yourself figured out, it looks like an optimal way to do that is
x[findInterval(x, v) %% 2 == 1 & findInterval(x, v, left.open = TRUE) %% 2 == 1]
# [1] 2 3 4 9 10 11 18 19
Clearly there are alternatives. E.g.,
fun <- function(x, v)
if(length(v) > 1) v[1] < x & x < v[2] | fun(x, v[-1:-2]) else FALSE
x[fun(x, v)]
# [1] 2 3 4 9 10 11 18 19
I found an easy way just with sapply() :
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
(v.df <- as.data.frame(matrix(v, 3, 2, byrow = T)))
# V1 V2
# 1 1 5
# 2 8 12
# 3 17 20
y <- sapply(x, function(x){
ind <- (x > v.df$V1 & x < v.df$V2)
if(any(ind)) x else NA
})
y[!is.na(y)]
# [1] 2 3 4 9 10 11 18 19

repeat the nth element every other n in a sequence

the original vector x:
x = 1:20
and what i look for is a vector y that repeats the n-th element in x every other n, for instance, when n=4:
n = 4
y = c(1,2,3,4,4,5,6,7,8,8,9,10,11,12,12,13,14,15,16,16,17,18,19,20,20)
i'm actually doing it for matrices and i think it relates to the use of apply here when margin=2 but couldn't figure it out right off the bat,
could anyone kindly show me a quick solution?
We can also use
v1 <- rep(1, length(x))
v1[c(FALSE, FALSE, FALSE, TRUE)] <- 2
rep(x, v1)
#[1] 1 2 3 4 4 5 6 7 8 8 9 10 11 12 12 13 14 15 16 16 17 18 19 20 20
Or as #MichaelChirico commented, the 2nd line of code can be made more general with
v1[seq_along(v1) %% n == 0L] = 2
Or in a one-liner with ifelse (from #JonathanCarroll's comments)
rep(x, ifelse(seq_along(x) %% n, 1, 2))
Indeed matrices are the way to go
duplast = function(M) rbind(M, M[nrow(M), ])
c(duplast(matrix(x, nrow = 4L)))
# [1] 1 2 3 4 4 5 6 7 8 8 9 10 11 12 12 13 14 15 16 16 17 18 19 20
# [25] 20
If you wanted to use apply:
c(apply(matrix(x, nrow = 4L), 2L, function(C) c(C, C[length(C)])))

Get a seq() in R with alternating steps

The seq function in R would give me a sequence from x to y with a constant step m:
seq(x, y, m)
E.g. seq(1,9,2) = c(1,3,5,7,9).
What would be the most elegant way to get a sequence from x to y with alternating steps m1 and m2, such that something like "seq(x, y, c(m1, m2))" would give me c(x, x + m1, (x + m1) + m2, (x + m1 + m2) + m1, ..., y), each time adding one of the steps (not necessarily reaching up to y, of course, as in seq)?
Example: x = 1; y = 19; m1 = 2; m2 = 4 and I get c(1,3,7,9,13,15,19).
I arrived the solution by:
1. Use cumsum with a vector c(from,rep(by,times),...), with by repeated times = ceiling((to-from)/sum(by)) times.
2. Truncate the sequence by !(seq > to).
seq_alt <- function(from, to, by) {
seq <- cumsum(c(from,rep(by,ceiling((to-from)/sum(by)))))
return(seq[! seq > to])
}
First n terms of this sequence you can generate with
x = 1; m1 = 2; m2 = 4
n <- 0:10 # first 11 terms
x + ceiling(n/2)*m1 + ceiling((n-1)/2)*m2
# [1] 1 3 7 9 13 15 19 21 25 27 31
Here is another idea,
fun1 <- function(x, y, j, z){
if(j >= y) {return(x)}else{
s1 <- seq(x, y, j+z)
s2 <- seq(x+j, y, j+z)
return(sort(c(s1, s2)))
}
}
fun1(1, 19, 2, 4)
#[1] 1 3 7 9 13 15 19
fun1(1, 40, 4, 3)
#[1] 1 5 8 12 15 19 22 26 29 33 36 40
fun1(3, 56, 7, 10)
#[1] 3 10 20 27 37 44 54
fun1(1, 2, 2, 4)
#[1] 1
Here is an alternative that uses diffinv This method over allocates the values, so as a stopping rule, I get the elements that are less than or equal to the stopping value.
seqAlt <- function(start, stop, by1, by2) {
out <- diffinv(rep(c(by1, by2), ceiling(stop / (by1 + by2))), xi=start)
return(out[out <= stop])
}
seqAlt(1, 19, 2, 4)
[1] 1 3 7 9 13 15 19
You could use Reduce with accumulate = TRUE to iteratively add either 2 or 4:
Reduce(`+`, rep(c(2,4), 10), init = 1, accumulate = TRUE)
# [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
The number of times you repeat c(2,4) will determine sequence length; since it is 10 above, the sequence is length 20.
The purrr package has an accumulate wrapper, if you prefer the syntax:
purrr::accumulate(rep(c(2,4), 10), `+`, .init = 1)
## [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
perfect example of recycling vectors in R
# 1.
x = 1; y = 19; m1 = 2; m2 = 4
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 1 3 7 9 13 15 19
# 2.
x = 3; y = 56; m1 = 7; m2 = 10
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 3 10 20 27 37 44 54

Resources