Using apply in a 'window' - r

Is there a way to use apply functions on 'windows' or 'ranges'? This example should serve to illustrate:
a <- 11:20
Now I want to calculate the sums of consecutive elements. i.e.
[11+12, 12+13, 13+14, ...]
The ways I can think of handling this are:
a <- 11:20
b <- NULL
for(i in 1:(length(a)-1))
{
b <- c(b, a[i] + a[i+1])
}
# b is 23 25 27 29 31 33 35 37 39
or alternatively,
d <- sapply( 1:(length(a)-1) , function(i) a[i] + a[i+1] )
# d is 23 25 27 29 31 33 35 37 39
Is there a better way to do this?
I'm hoping there's something like:
e <- windowapply( a, window=2, function(x) sum(x) ) # fictional function
# e should be 23 25 27 29 31 33 35 37 39

Here's an alternative using rollapply from the zoo package:
> library(zoo)
> rollapply(a, width=2, FUN=sum )
[1] 23 25 27 29 31 33 35 37 39
The zoo package also offers a rollsum function:
> rollsum(a, 2)
[1] 23 25 27 29 31 33 35 37 39

We can define a general moving() function:
moving <- function(f){
g <- function(i , x, n , f, ...) f(x[(i-n+1):i], ...)
function(x, n, ...) {
N <- length(x)
vapply(n:N, g, x , n , f, FUN.VALUE = numeric(1), ...)
}
}
moving() returns a function which, in turn, can be used to generate any moving_f() function:
moving_sum <- moving(sum)
moving_sum(x = 11:20, n = 2)
Similarly, extra arguments can be passed through to f, for example trim for mean:
moving_mean <- moving(mean)
moving_mean(x = rpois(22, 6), n = 5, trim = 0.1)

You can achieve your windowapply function by first creating a list of index vectors and then *applying over them, using each one as extraction indices:
j <- lapply(seq_len(length(a) - 1), function(i) c(i, i + 1))
sapply(j, function(j) sum(a[j]))
## [1] 23 25 27 29 31 33 35 37 39
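For reference, the fictional windowapply() from the question can be sketched in base R with embed(), which lays out every run of window consecutive elements as one matrix row. This is only a minimal sketch: the name and argument order are taken from the question's wish, not from any package, and f is assumed to return a single value per window.

```r
# Hypothetical helper named after the question's fictional windowapply().
# embed() returns each window in reverse order, so rev() restores it.
windowapply <- function(x, window, f, ...) {
  windows <- embed(x, window)          # one row per window
  apply(windows, 1, function(w) f(rev(w), ...))
}

a <- 11:20
e <- windowapply(a, window = 2, sum)
e
# [1] 23 25 27 29 31 33 35 37 39
```

Because the window is handed to f in its original order, the same helper should also work for order-sensitive functions, not just for sum.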

Related

Making a "Race" Between Two Variables

I would like to make two variables ("a" and "b") that keep
taking a random value that is ALWAYS less than their current value (i.e. a1 > a2 > a3 > ... > an and b1 > b2 > b3 > ... > bn)
until one of them is less than or equal to 0.
I show a demo below:
#iteration 1
a1 = 100 - rnorm(1,5,10)
b1 = 100 -rnorm(1,5,10)
a2 = a1 - rnorm(1,5,10)
b2 = b1 -rnorm(1,5,10)
a3 = a2 - rnorm(1,5,10)
b3 = b2 -rnorm(1,5,10)
#etc.
I would then like to repeat this many times and stack the results of all iterations in one table.
Currently, I am doing this manually and then using the bind_rows() command to "pile" the iterations on top of each other. Can someone please show me a faster way to do this?
Thank you!
You could write a small recursive function:
fun <- function(x){
if(any(x < 0)) x
else rbind(x, fun(x - abs(rnorm(length(x),5,10)) ))
}
Now for 1 draw of A and B:
set.seed(1)
fun(c(A=100, B=100))
          A          B
x 100.00000 100.000000
x  98.73546  93.163567
x  95.37918  72.210759
x  87.08410  69.006075
x  77.20981  56.622828
x  66.45199  54.676712
x  46.33418  45.778279
x  45.12178  28.631280
x  28.87247  24.080617
x  24.03437   9.642254
   10.82216  -1.296759
We can wrap this in a function to replicate the race. The code below stays in base R, although it could be simplified with the tidyverse:
random_seq <- function(n, start){
fun <- function(x){
if(any(x < 0)) c(x)
else rbind(x, fun(x - abs(rnorm(length(x),5,10)) ))
}
R <- replicate(n, data.frame(fun(start), row.names = NULL), simplify = FALSE)
S <- do.call(rbind, Map(cbind, id = seq_along(R), R))
U <- transform(S, time = ave(id, id, FUN = seq_along))
reshape(U, direction = 'wide', idvar = 'id', sep = '')
}
set.seed(1)
random_seq(4, c(A=20,B=20))
   id A1 B1         A2        B2        A3        B3        A4       B4
1   1 20 20 18.7354619 13.163567 15.379176 -7.789241        NA       NA
4   2 20 20 11.7049223 16.795316  1.830632  4.412069 -8.927182 2.465953
8   3 20 20 -0.1178117 11.101568        NA        NA        NA       NA
10  4 20 20 18.7875942  2.853001  2.538285 -1.697663        NA       NA
BONUS:
If you are interested, fun can generate the numbered names directly:
fun <- function(x){
nms <- as.numeric(sub('\\D+', '',names(x))) + 1
names(x) <- paste0(sub("\\d+", '', names(x)), nms)
if(any(x < 0)) c(x)
else c(x, Recall(x - abs(rnorm(length(x),5,10)) ))
}
fun(c(A0=20, B0=30))
A1 B1 A2 B2 A3 B3
20.000000 30.000000 11.234808 23.323201 -9.611483 1.544311
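The recursive helper above can also be written as a plain loop, which avoids deep recursion when the start value is large relative to the step size. The following is a minimal sketch under stated assumptions: race_once and its arguments are hypothetical names, the step is abs(rnorm()) as in the answer above, and the loop stops as soon as any value is less than or equal to 0, as the question asks.

```r
# Hypothetical iterative version of the recursive fun() above.
race_once <- function(start = c(A = 100, B = 100), mean = 5, sd = 10) {
  path <- list(start)
  x <- start
  while (all(x > 0)) {
    # abs() keeps every step non-negative, so the series never increases
    x <- x - abs(rnorm(length(x), mean, sd))
    path[[length(path) + 1]] <- x
  }
  do.call(rbind, path)   # one row per step, one column per variable
}

set.seed(1)
one_race <- race_once()
races <- replicate(3, race_once(), simplify = FALSE)
```

Each element of races is a matrix whose last row is the first step at which A or B reached 0 or below.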
Here's a function that runs a single series from its start value down to 0. It is nicely configurable, and we can use replicate to run it as many times as needed, returning a list.
to_0 = function(start = 100, fun = runif, ..., n = 1000) {
if(start <= 0) stop("Must start greater than 0")
result = start - c(0, cumsum(fun(n, ...)))
if(all(result > 0)) stop("Didn't reach 0, set a higher n or check inputs.")
first_0 = match(TRUE, result < 0)
result[seq_len(first_0)]
}
I used runif as the default instead of your rnorm because you say you want the series to be strictly decreasing. rnorm(1, 5, 10) is sometimes positive and sometimes negative, so subtracting it will sometimes increase the value.
I cut off the series at the first negative value. Since the runs have different lengths, a data.frame seems like a bad choice; keeping them in a list is better. We can use lengths() to see how long each vector in the list is.
The function is parametrized, so you can easily try out other distributions or custom functions, e.g., to_0(start = 100, fun = rexp, rate = 0.1). Below I demonstrate with the uniform distribution starting at 10.
set.seed(47)
race = replicate(n = 100, to_0(start = 10))
head(race)
# [[1]]
# [1] 10.00000000 9.02303800 8.64912196 7.88761993 7.06512831 6.49158390 5.80017147 5.41110962 4.94216364 4.39885390 3.47396185
# [12] 3.33516427 2.63317707 2.47098343 1.87167641 1.36564030 0.46366678 0.06316398 0.03221901 -0.03913915
#
# [[2]]
# [1] 10.00000000 9.27320918 8.54814801 7.77974923 7.34440424 7.27499236 6.76825217 6.75134855 6.20214287 5.43031741 4.56633348
# [12] 3.59288910 3.24547860 2.60269295 1.75639299 1.73279651 1.72371866 1.38211688 0.71933800 0.04916749 -0.40714758
#
# [[3]]
# [1] 10.00000000 9.08923490 9.06189460 8.69397353 8.30179409 8.11077841 7.96295850 7.49701585 6.52812608 6.26480567 5.34558158
# [12] 5.31801508 4.90573089 3.98774633 3.89046321 3.70358854 3.61482042 3.53824450 3.36900151 2.86522484 2.23295349 1.80544403
# [23] 0.82311022 0.73664857 -0.09385818
#
# [[4]]
# [1] 10.0000000 9.2172681 8.4175584 8.1672679 7.3683421 7.3373712 7.0319788 6.6512214 5.7210315 5.2732412 4.6817849 4.1065416
# [13] 3.9452541 3.4009742 2.5018050 1.5316136 0.7175295 0.4410275 -0.1859260
#
# [[5]]
# [1] 10.00000000 9.91914621 9.90238843 9.82993154 9.33156028 8.90827720 8.44160294 7.46348397 6.76539075 6.27298443 5.97401412
# [12] 5.03395592 4.55537992 3.75737919 2.82175869 2.75045000 2.70081885 2.67523320 2.20266408 2.12695183 1.25880525 0.57011279
# [23] 0.03173135 -0.79275633
#
# [[6]]
# [1] 10.0000000 9.9292630 9.6154147 9.0754730 8.7814754 8.5273701 7.6998567 6.8127609 5.9944598 5.6232599 5.1505038 4.8676191
# [13] 4.6337121 4.5868438 4.0435219 3.0981151 2.2621741 1.9925101 1.2104707 0.9334569 0.7574446 0.1643009 -0.5220925
lengths(race)
# [1] 20 21 25 19 24 23 21 24 23 22 25 24 19 19 23 17 19 23 25 21 24 25 18 22 24 25 19 19 23 22 19 26 20 23 24 24 22 21 25 23 21 28 19 20 16 20
# [47] 22 25 20 22 23 23 24 22 19 23 23 23 22 18 22 23 24 21 21 23 21 22 20 25 22 23 21 17 20 20 16 25 21 21 21 20 20 19 24 19 23 24 26 25 20 21
# [93] 23 17 27 18 30 24 21 23
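If a single table is still wanted, as the bind_rows() approach in the question suggests, a list of runs with different lengths can be stacked in long format. This is a sketch only: the small race list below is made-up stand-in data, and the column names run, step and value are assumptions.

```r
# Toy stand-in for the list returned by replicate(); the values are made up.
race <- list(c(10, 6.5, 2.1, -0.4), c(10, 4.2, -1.3))

race_long <- data.frame(
  run   = rep(seq_along(race), lengths(race)),  # which run each value belongs to
  step  = sequence(lengths(race)),              # position within its run
  value = unlist(race)
)
race_long
```

The long layout needs no NA padding, unlike a wide table with one column per step.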

Using a loop to create a polynomial model gives R trouble understanding it?

I create a lot of polynomial models to compare them, so I used a loop like this:
library(ISLR)
library(boot)
data(Wage)
list = list()
for (i in 1:10){
list[[i]] = lm(wage ~ poly(age, i), data = Wage)
assign(paste("fit.aov", i, sep = ""), list[[i]])
}
agelims <- range(Wage$age)
age.grid <- seq(agelims[1], agelims[2])
If I run the following code
preds <- predict(fit.aov1, data.frame(age = age.grid), se=TRUE)
I receive the following error:
Error: variable 'poly(age, i)' was fitted with type "nmatrix.1" but type "nmatrix.10" was supplied
In addition: Warning message:
In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
longer object length is not a multiple of shorter object length
However, if I create each model manually like this
fit1 = lm(wage ~ poly(age, 1), data = Wage)
Then the predict() function runs just fine.
Here we need to build the formula from a string (with sprintf or paste) so that the value of i is substituted into it:
lst1 <- vector('list', 10)
for (i in 1:10){
fmla <- sprintf("wage~ poly(age,%d)", i)
print(fmla)
lst1[[i]] = lm(as.formula(fmla), data = Wage)
lst1[[i]]$call <- parse(text =fmla )[[1]]
assign(paste("fit.aov", i, sep = ""), lst1[[i]])
}
Testing with predict:
predict(fit.aov1, data.frame(age = age.grid), se=TRUE)
#$fit
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14
# 94.43570 95.14298 95.85025 96.55753 97.26481 97.97208 98.67936 99.38663 100.09391 100.80119 101.50846 102.21574 102.92301 103.63029
# 15 16 17 18 19 20 21 22 23 24 25 26 27 28
#104.33757 105.04484 105.75212 106.45939 107.16667 107.87394 108.58122 109.28850 109.99577 110.70305 111.41032 112.11760 112.82488 113.53215
# 29 30 31 32 33 34 35 36 37 38 39 40 41 42
#114.23943 114.94670 115.65398 116.36126 117.06853 117.77581 118.48308 119.19036 119.89764 120.60491 121.31219 122.01946 122.72674 123.43402
# 43 44 45 46 47 48 49 50 51 52 53 54 55 56
#124.14129 124.84857 125.55584 126.26312 126.97039 127.67767 128.38495 129.09222 129.79950 130.50677 131.21405 131.92133 132.62860 133.33588
# 57 58 59 60 61 62 63
#134.04315 134.75043 135.45771 136.16498 136.87226 137.57953 138.28681
# ...
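The issue is that the formula wage ~ poly(age, i) stores the symbol i, not its value. When predict() re-evaluates poly(age, i), it looks i up again and finds the value left over from the end of the loop (10), so the degree no longer matches the fitted model; that is why the error mentions "nmatrix.1" and "nmatrix.10".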
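Another way to get the value of i into the formula, without going through a string, is bquote(). The sketch below is not the answer's method; it assumes the built-in mtcars data as a stand-in for Wage so that it runs without extra packages.

```r
# Assumption: mtcars stands in for the Wage data from the question.
fits <- lapply(1:3, function(i) {
  # .(i) is replaced by the current value of i before lm() sees the formula
  eval(bquote(lm(mpg ~ poly(hp, .(i)), data = mtcars)))
})

formula(fits[[1]])   # the degree appears as a literal value, not as i
new_hp <- data.frame(hp = c(100, 150, 200))
predict(fits[[1]], new_hp, se.fit = TRUE)$fit
```

Because each stored formula contains the literal degree, predict() no longer depends on whatever i happens to be afterwards.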

Extracting chunks from a matrix by columns

Say I have a matrix with 1000 columns. I want to create a new matrix with every other n columns from the original matrix, starting from column i.
So let say that n=3 and i=5, then the columns I need from the old matrix are 5,6,7,11,12,13,17,18,19 and so on.
Use two seq() calls to create the start and stop bounds, then use mapply() on those to build the actual column index intervals. Then use normal bracket notation to extract from your matrix.
set.seed(1)
# using 67342343's test case
M <- matrix(runif(100^2), ncol = 100)
n <- 3
i <- 5
starts <- seq(i, ncol(M), n*2)
stops <- seq(i+(n-1), ncol(M), n*2)
col_index <- c(mapply(seq, starts, stops)) # thanks Jaap and Sotos
col_index
[1] 5 6 7 11 12 13 17 18 19 23 24 25 29 30 31 35 36 37 41 42 43 47 48 49 53 54 55 59 60 61 65 66 67 71 72 73 77 78
[39] 79 83 84 85 89 90 91 95 96 97
M[, col_index]
Another solution is based on the fact that R uses index recycling:
i <- 5; n <- 3
M <- matrix(runif(100^2), ncol = 100)
id <- seq(i, ncol(M), by = 1)[rep(c(TRUE, FALSE), each = n)]
M_sub <- M[, id]
I would write a function that determines the indices of the columns you want, and then call that function as needed.
col_indexes <- function(mat, start = 1, by = 1){
n <- ncol(mat)
inx <- seq(start, n, by = 2*by)
inx <- c(sapply(inx, function(i) i:(i + by -1)))
inx[inx <= n]
}
m <- matrix(0, nrow = 1, ncol = 20)
icol <- col_indexes(m, 5, 3)
icol
[1] 5 6 7 11 12 13 17 18 19
Here is a method using outer.
c(outer(5:7, seq(0L, 95L, 6L), "+"))
[1] 5 6 7 11 12 13 17 18 19 23 24 25 29 30 31 35 36 37 41 42 43 47 48 49 53
[26] 54 55 59 60 61 65 66 67 71 72 73 77 78 79 83 84 85 89 90 91 95 96 97
To generalize this, you could do
idx <- c(outer(seq(i, i + n - 1), seq(0L, ncol(M) - i, 2 * n), "+"))
The idea is to construct the initial set of columns (5:7 or seq(i, i + n - 1)), calculate the offsets of every subsequent set (seq(0L, 95L, 6L) or seq(0L, ncol(M) - i, 2 * n)), then use outer to calculate the sum of every combination of these two vectors.
You can then subset the matrix using [ like M[, idx].
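The same selection can also be expressed with modular arithmetic: a column is kept when its offset from the start column falls in the first n positions of each block of 2 * n. This is a minimal sketch; block_cols is a hypothetical helper name, and it assumes i is not larger than the number of columns.

```r
# Hypothetical helper: keep n columns, skip n columns, starting at column i.
block_cols <- function(ncols, i, n) {
  cols <- seq(i, ncols)
  cols[(cols - i) %% (2 * n) < n]
}

M <- matrix(0, nrow = 2, ncol = 20)
idx <- block_cols(ncol(M), i = 5, n = 3)
idx
# [1]  5  6  7 11 12 13 17 18 19
M_sub <- M[, idx]
```

A partial block at the right edge is handled naturally, because only existing column numbers are ever tested.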

Finding local maxima and minima in R

I'm trying to create a function to find a "maxima" and "minima". I have the following data:
y
157
144
80
106
124
46
207
188
190
208
143
170
162
178
155
163
162
149
135
160
149
147
133
146
126
120
151
74
122
145
160
155
173
126
172
93
I have tried this function to find "maxima"
localMaxima <- function(x) {
# Use -Inf instead if x is numeric (non-integer)
y <- diff(c(-.Machine$integer.max, x)) > 0L
rle(y)$lengths
y <- cumsum(rle(y)$lengths)
y <- y[seq.int(1L, length(y), 2L)]
if (x[[1]] == x[[2]]) {
y <- y[-1]
}
y
}
maks <- localMaxima(x)
And the function to find "minima":
localMinima <- function(x) {
# Use -Inf instead if x is numeric (non-integer)
y <- diff(c(.Machine$integer.max, x)) > 0L
rle(y)$lengths
y <- cumsum(rle(y)$lengths)
y <- y[seq.int(1L, length(y), 2L)]
if (x[[1]] == x[[2]]) {
y <- y[-1]
}
y
}
mins <- localMinima(x)
And the result is not 100% right
maks = 1 5 7 10 12 14 16 20 24 27 31 33 35
mins = 3 6 8 11 13 15 19 23 26 28 32 34 36
The result should be
maks = 5 7 10 12 14 16 20 24 27 31 33 35
mins = 3 6 8 11 13 15 19 23 26 28 32 34
Finding local maxima and minima in R comes close, but doesn't quite fit.
How can I fix this?
Thank you very much
You could define two functions like the below which produce the vectors you need:
library(data.table)
# shift() lags or leads a vector by the amount given as the second argument;
# the default is to lag.
# The rationale behind the code below is that the values adjacent to a local
# minimum are both greater than the minimum itself. The opposite is true for a
# local maximum. I think this is what you are trying to achieve, and one way to
# do it is the following code:
maximums <- function(x) which(x - shift(x, 1) > 0 & x - shift(x, 1, type='lead') > 0)
minimums <- function(x) which(x - shift(x, 1) < 0 & x - shift(x, 1, type='lead') < 0)
Output:
> maximums(y)
[1] 5 7 10 12 14 16 20 24 27 31 33 35
> minimums(y)
[1] 3 6 8 11 13 15 19 23 26 28 32 34
This is a function I wrote a while back (and it's more general than you need). It finds peaks in sequential data x, where I define a peak as a local maximum with m points on either side of it having a lower value than it (so a bigger m leads to a more stringent criterion for peak finding):
find_peaks <- function (x, m = 3){
shape <- diff(sign(diff(x, na.pad = FALSE)))
pks <- sapply(which(shape < 0), FUN = function(i){
z <- i - m + 1
z <- ifelse(z > 0, z, 1)
w <- i + m + 1
w <- ifelse(w < length(x), w, length(x))
if(all(x[c(z : i, (i + 2) : w)] <= x[i + 1])) return(i + 1) else return(numeric(0))
})
pks <- unlist(pks)
pks
}
So for your case, with m = 1:
find_peaks(x, m = 1)
#[1] 5 7 10 12 14 16 20 24 27 31 33 35
And for the minima:
find_peaks(-x, m = 1)
#[1] 3 6 8 11 13 15 19 23 26 28 32 34
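For strict interior extrema, the same result can be obtained in base R without any package by looking at where the sign of the first difference flips. This is a sketch that assumes no two neighbouring values are equal (true for the data in the question); plateaus would need extra handling.

```r
# Data from the question.
y <- c(157, 144, 80, 106, 124, 46, 207, 188, 190, 208, 143, 170,
       162, 178, 155, 163, 162, 149, 135, 160, 149, 147, 133, 146,
       126, 120, 151, 74, 122, 145, 160, 155, 173, 126, 172, 93)

slope <- sign(diff(y))                 # +1 where rising, -1 where falling
maxima <- which(diff(slope) == -2) + 1 # rising then falling
minima <- which(diff(slope) ==  2) + 1 # falling then rising

maxima
# [1]  5  7 10 12 14 16 20 24 27 31 33 35
minima
# [1]  3  6  8 11 13 15 19 23 26 28 32 34
```

The first and last points are never reported, since they have only one neighbour.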

saving vectors from a loop in a list in r

Hi, I have never edited a question of mine before, but I'll give it a try. What the code actually does is not that important. For me, the only relevant part is saving the "liste" vectors in a new list :D
test <- list()
test <- replicate(5, sample(1:100, 50), simplify = FALSE) # Creates a list of 5 vectors
> test[[1]]
[1] 90 96 20 86 32 77 83 33 64 29 88 97 78 81 40 60 89 19 31 59 26 38 34 71 5 80 85
[28] 3 70 87 41 50 6 18 37 58 9 76 91 62 12 30 42 94 72 95 100 10 68 82
S <- test[[1]]
x <- diff(S) # the following algorithm creates "liste" (a vector) for test[[1]]
trendtest <- list()
k <- NULL
d <- NULL
t <- vector("list",length(x))
A <- vector("list",length(x))
z <- vector("list",length(x)-2)
za <- vector("list",length(x)-2)
liste <- NULL
dreisum <- sapply(1:(length(x)-2), function(i) sum(x[c(i,(i+1))]))
dreisumi <- lapply(1:(length(x)-2), function(i) dreisum[i:(length(x)-2)])
zdreisumi<- lapply(1:(length(x)-4), function(i) dreisumi[[i]] [3:length(dreisumi[[i]])]<0)
zadreisumi<- lapply(1:(length(S)-4), function(i) dreisumi[[i]][3:length(dreisumi[[i]])]>0)
Si <- lapply(1:(length(x)-2), function(i) S[i:(length(x))])
i <- 1
h <- 1
while(i<(length(x)-3) & h!=Inf){
k <- c(k,k <- (S[i]-S[i+2])/(-2))
d <- c(d,d <- (S[i+2]*i-S[i]*(i+2))/(-2))
t[[i]] <- i:(length(x))
A[[i]] <- k[length(liste)+1]*t[[i]]+d[length(liste)+1]
A[[i]][3] <- S[i+2]
z[[i]] <- Si[[i]][3:length(Si[[i]])]<A[[i]][3:length(A[[i]])]
za[[i]] <- Si[[i]][3:length(Si[[i]])]>A[[i]][3:length(A[[i]])]
if(k[length(liste)+1]>0 & S[i+3]>A[[i]][4] & is.element(TRUE,z[[i]])){h <- (min(which(z[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]>0 & S[i+3]<A[[i]][4] & is.element(TRUE,za[[i]])){h <- (min(which(za[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]<0 & S[i+3]>A[[i]][4] & is.element(TRUE,z[[i]])){h <- (min(which(z[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]<0 & S[i+3]<A[[i]][4] & is.element(TRUE,za[[i]])){h <- (min(which(za[[i]]!=FALSE))+1)}else{
if(k[length(liste)+1]>0 & S[i+3]>A[[i]][4] & (all(z[[i]]==FALSE))){h <- (min(which(zdreisumi[[i]]!=FALSE))+2)}else{
if(k[length(liste)+1]>0 & S[i+3]<A[[i]][4] & (all(za[[i]]==FALSE))){h <- (min(which(zdreisumi[[i]]!=FALSE))+2)}else{
if(k[length(liste)+1]<0 & S[i+3]>A[[i]][4] & (all(z[[i]]==FALSE))){h <- (min(which(zadreisumi[[i]]!=FALSE))+2)}else{
if(k[length(liste)+1]<0 & S[i+3]<A[[i]][4] & (all(za[[i]]==FALSE))){h <- (min(which(zadreisumi[[i]]!=FALSE))+2)}}}}}}}}
liste <- c(liste,i)
i <- i+h-1
if((length(x)-3)<=i & i<=length(x)){liste <- c(liste,i)}}
> liste
[1] 1 3 7 10 12 16 18 20 24 27 30 33 36 39 41 46
Actually, the whole code is not so interesting for my problem because it works! The example above is for test[[1]] only. BUT I want a for-loop (or whatever) to take ALL vectors in "test" and save ALL 5 "liste" vectors in a new list (let's call it "trendtest" ... whatever :D)
The following will do what you ask for:
Delete the line trendtest <- list().
Take the code from x <- diff(S) to the last line (except the very last line, which only prints liste) and insert it at the position indicated by the placeholder __CODE_HERE__.
trendtest <- lapply(test, FUN = function(S) {
__CODE_HERE__
return(liste)
})
This is the "R way" of doing what you want. Alternatively, you could do the following (which is closer to your initial approach, but less the "R way"):
trendtest <- vector("list", length(test))
for (u in seq_along(test)) { # seq_along() is also safe when test is empty
S <- test[[u]]
__CODE_HERE__
trendtest[[u]] <- liste
}
Note that there will be an error message which is due to the sample data (which doesn't fit the algorithm provided) and is unrelated to saving liste in trendtest.
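To see the lapply pattern run end to end, here is a self-contained sketch in which the long algorithm is replaced by a tiny stand-in. The function find_liste is hypothetical and does not reproduce the question's logic; it only shows how one result vector per element of test ends up in trendtest.

```r
set.seed(42)
test <- replicate(5, sample(1:100, 50), simplify = FALSE)

# Hypothetical stand-in for the algorithm in the question:
# returns the positions i where S[i + 1] > S[i].
find_liste <- function(S) {
  x <- diff(S)
  which(x > 0)
}

trendtest <- lapply(test, find_liste)
length(trendtest)     # one result vector per element of test
str(trendtest[[1]])
```

Wrapping the algorithm in a function like this also keeps its helper variables (k, d, t and so on) out of the global environment.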

Resources