How to calculate 5-day cumulative sums using the apply family in R

I have a matrix (data frame) with 6940 rows and 100 columns. I need to compute 5-day cumulative sums over the data set, one window at a time. So far I have written a for loop for this:
cum<- matrix(data=q1,nrow=6940,ncol=100)
for (j in 1:100){
for (i in 1:6940){
cum[i,j]<-sum(q1[i,j],q1[i+1,j],q1[i+2,j],q1[i+3,j],q1[i+4,j],na.rm=T)
}
}
I wanted to know whether there is a function in the apply family that does the same, as this code is very slow.
For example, if I generate a matrix with
ens <- matrix(rnorm(200),20)
I want the cumulative sum of 5 rows at a time, i.e. the sum of row1:row5, row2:row6, row3:row7 and so on, in the form of a data frame.
I tried using apply in this form:
apply(apply(apply(apply( apply(m, 2, cumsum),2, cumsum), 2, cumsum),2,cumsum),2,cumsum)
but the problem is that I don't get the cumulative sum in blocks of 5, only an overall cumulative sum.

Here is one approach using the stats::filter function to calculate the rolling sums and apply to loop over the columns:
m <- matrix(1:48, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 13 25 37
# [2,] 2 14 26 38
# [3,] 3 15 27 39
# [4,] 4 16 28 40
# [5,] 5 17 29 41
# [6,] 6 18 30 42
# [7,] 7 19 31 43
# [8,] 8 20 32 44
# [9,] 9 21 33 45
#[10,] 10 22 34 46
#[11,] 11 23 35 47
#[12,] 12 24 36 48
apply(m, 2, stats::filter, filter = rep(1, 5), sides = 1)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
This might have to be adjusted depending on how you want to handle windows with fewer than 5 values (e.g., here at the beginning).
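For the windows described in the question (row i through row i+4, with missing values dropped), one possible base-R sketch takes differences of column-wise cumulative sums. The name roll_sum_forward is hypothetical, and truncating the windows at the bottom of the matrix is only an assumption about the desired edge handling. The sketch also assumes the matrix has at least two rows.

```r
# Hypothetical helper: forward-looking rolling sum over k rows, per column.
# NAs are treated as 0 (like na.rm = TRUE); windows near the bottom are truncated.
roll_sum_forward <- function(x, k = 5) {
  x[is.na(x)] <- 0
  n <- nrow(x)
  cs <- rbind(0, apply(x, 2, cumsum))      # cs[r + 1, ] = sum of rows 1..r
  last <- pmin(seq_len(n) + k - 1, n)      # last row of each window
  cs[last + 1, , drop = FALSE] - cs[seq_len(n), , drop = FALSE]
}

m <- matrix(1:48, ncol = 4)
res <- roll_sum_forward(m)
res[1, ]    # sums of rows 1 to 5
res[12, ]   # only row 12 is left in the last window
```

Because each window is a difference of two cumulative sums, the cost per column does not grow with the window length.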

Another option is roll_sumr from RcppRoll (data from @Roland's post):
library(RcppRoll)
apply(m, 2, roll_sumr, 5)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
As @alexis_laz mentioned in the comments, roll_sumr can also take a matrix directly, which is more efficient.
roll_sumr(m, 5, by = 1)
Benchmarks
set.seed(24)
m1 <- matrix(sample(1:50, 5000*5000, replace=TRUE), ncol=5000)
system.time(apply(m1, 2, roll_sumr, 5))
# user system elapsed
# 1.84 0.16 1.99
system.time(roll_sumr(m1, 5, by = 1))
# user system elapsed
# 0.59 0.15 0.74
system.time(apply(m1, 2, stats::filter, filter = rep(1, 5), sides = 1))
# user system elapsed
# 4.46 0.20 4.68

Another, less sophisticated approach: create 5 grouping variables and sum by each of them in turn.
Here:
library(data.table)
m <- data.table(matrix(1:48, ncol = 4))
m[, index := .I]
m[, i1 := floor((index - 1) / 5) * 5 + 1]
m[, i2 := floor((index - 2) / 5) * 5 + 2]
m[, i3 := floor((index - 3) / 5) * 5 + 3]
m[, i4 := floor((index - 4) / 5) * 5 + 4]
m[, i5 := floor((index - 5) / 5) * 5 + 5]
cumsumm <- rbindlist(list(m[, list(value = sum(V1)), by = "i1"]
, m[, list(value = sum(V1)), by = "i2"]
, m[, list(value = sum(V1)), by = "i3"]
, m[, list(value = sum(V1)), by = "i4"]
, m[, list(value = sum(V1)), by = "i5"]), use.names=F)[i1 > 0, ]
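The idea of combining five offset versions of the data can also be sketched in base R by adding five shifted copies of the rows. This is an illustration rather than a rewrite of the data.table code above: roll_sum_shift is a hypothetical name, only complete windows are returned, and the matrix is assumed to have at least k rows.

```r
# Hypothetical helper: sum k shifted copies of the rows.
# Row r of the result is the sum of rows r..(r + k - 1) of x.
roll_sum_shift <- function(x, k = 5) {
  n <- nrow(x)
  shifted <- lapply(0:(k - 1), function(s) {
    x[(1 + s):(n - k + 1 + s), , drop = FALSE]
  })
  Reduce(`+`, shifted)
}

m <- matrix(1:48, ncol = 4)
out <- roll_sum_shift(m)
dim(out)    # 8 complete windows, 4 columns
out[1, ]    # sums of rows 1 to 5
```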

Related

Multiply specific columns of one matrix with specific columns of another matrix for many indices

I have two large matrices P and Q around (10k x 50k dim in both, but to test this yourself a random 10x10 matrix for P and Q is sufficient). I have a list of indices, e.g.
i j
1 4
1 625
1 9207
2 827
... ...
etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?
edit: asked for a reproducible example
P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3)
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3)
i <- c(1,1,2)
j <- c(2,1,3)
gives output (if in dot product form)
1: 0
2: 1
3: 1
P <- matrix(1:50, nrow = 5,ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1,2,4,7)
j <- c(5,3,7,2)
P
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
Q
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
P[,i] * Q[, j]
# [,1] [,2] [,3] [,4]
# [1,] 21 66 496 186
# [2,] 44 84 544 224
# [3,] 69 104 594 264
# [4,] 96 126 646 306
# [5,] 125 150 700 350
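The element-wise product still has to be collapsed to one number per index pair. A minimal sketch, assuming i and j have the same length: the column sums of that product are the requested dot products, and the full cross-product matrix is never built.

```r
P <- matrix(1:50, nrow = 5, ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1, 2, 4, 7)
j <- c(5, 3, 7, 2)

# Column sums of the element-wise product are the pairwise dot products.
# drop = FALSE keeps the matrix shape even when i and j have length 1.
dots <- colSums(P[, i, drop = FALSE] * Q[, j, drop = FALSE])
dots
```

For k index pairs this works on an n x k matrix, which matters when the list of indices is long.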
Using matrix multiplication, you can do
diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1
Here is a second solution with apply.
apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1
To verify these agree in a second example:
set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
inds <- matrix(sample(10, 10, replace=TRUE), 5)
matrix multiplication
diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317
and apply
apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317

R Apply() function instead of a loop requiring the index of the values

I use a for loop (which works well) to randomly replace two values in each row of a dataset with NA (the indices of these values change randomly from row to row).
Now I would like to use apply() to do exactly the same thing.
I tried this code (along with many other things that return NA everywhere):
my_fun<-function(x){if (j %in% sample(1:ncol(y),2)) {x[j]<-NA}}
apply(y,1,my_fun)
But it doesn't work (it makes no change to the initial dataset).
The problem is that the object j is not found. j should be the column number.
Does anyone have an idea?
From your description, I gather that you want:
my_fun <- function(x) { x[sample(1:length(x), 2)] <- NA; x }
apply(y, 1, my_fun) # returns each row as a column; to keep the original layout, transpose:
t(apply(y, 1, my_fun))
Testing the function:
set.seed(42)
y <- matrix(1:60, 10)
y
t(apply(y, 1, my_fun))
# > t(apply(y, 1, my_fun))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 11 21 31 NA NA
# [2,] 2 NA 22 32 NA 52
# [3,] 3 13 NA NA 43 53
# [4,] NA 14 24 34 NA 54
# [5,] 5 15 25 NA 45 NA
# [6,] 6 16 NA NA 46 56
# [7,] 7 NA 27 37 47 NA
# [8,] 8 18 NA 38 NA 58
# [9,] NA 19 29 39 49 NA
# [10,] 10 20 NA 40 50 NA
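The same effect can be sketched without apply() by building a two-column (row, column) index matrix and assigning NA in one step. The names cols, idx and y_na are made up for this illustration, and the seed is arbitrary.

```r
set.seed(1)
y <- matrix(1:60, 10)
n <- nrow(y)

# Draw 2 column positions per row: a 2 x n matrix, one column of picks per row of y.
cols <- replicate(n, sample(ncol(y), 2))
# Two-column index matrix of (row, column) pairs.
idx <- cbind(rep(seq_len(n), each = 2), as.vector(cols))

y_na <- y
y_na[idx] <- NA
rowSums(is.na(y_na))   # 2 for every row
```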

R: How to sum pairs in a Matrix by row?

This is probably easy. I have a matrix:
testM <- matrix(1:40, ncol = 4, byrow = FALSE)
testM
[,1] [,2] [,3] [,4]
[1,] 1 11 21 31
[2,] 2 12 22 32
[3,] 3 13 23 33
[4,] 4 14 24 34
[5,] 5 15 25 35
[6,] 6 16 26 36
[7,] 7 17 27 37
[8,] 8 18 28 38
[9,] 9 19 29 39
[10,] 10 20 30 40
and I want to "reduce" the matrix summing column pairs by row. Expected result:
[,1] [,2]
[1,] 12 52
[2,] 14 54
[3,] 16 56
[4,] 18 58
[5,] 20 60
[6,] 22 62
[7,] 24 64
[8,] 26 66
[9,] 28 68
[10,] 30 70
I tried this, but it doesn't work:
X <- apply(1:(ncol(testM)/2), 1, function(x) sum(testM[x], testM[x+1]) )
Error in apply(1:(ncol(testM)/2), 1, function(x) sum(testM[x], testM[x + :
dim(X) must have a positive length
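The error comes from the first argument: apply() needs an object with dimensions, and 1:(ncol(testM)/2) is a plain vector. In addition, testM[x] picks a single element, not a column. A minimal sketch of the attempted idea, assuming an even number of columns (first is a name introduced here for illustration):

```r
testM <- matrix(1:40, ncol = 4, byrow = FALSE)

dim(1:(ncol(testM) / 2))   # NULL: a plain vector has no dimensions for apply() to use

# Loop over the first column of each pair instead, and add whole columns.
first <- seq(1, ncol(testM), by = 2)
X <- sapply(first, function(k) testM[, k] + testM[, k + 1])
X[1:2, ]
```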
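Recycled logical indices select the odd and the even columns, which can then be added:
testM[, c(TRUE, FALSE)] + testM[, c(FALSE, TRUE)]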
## [,1] [,2]
## [1,] 12 52
## [2,] 14 54
## [3,] 16 56
## [4,] 18 58
## [5,] 20 60
## [6,] 22 62
## [7,] 24 64
## [8,] 26 66
## [9,] 28 68
## [10,] 30 70
Here's a solution using rowSums()
sapply( list(1:2,3:4) , function(i) rowSums(testM[,i]) )
If the number of columns is arbitrary, it gets more complicated:
li <- split( 1:ncol(testM) , rep(1:(ncol(testM)/2), times=1 , each=2))
sapply( li , function(i) rowSums(testM[,i]) )
We can do a matrix multiplication:
M <- matrix(c(1,1,0,0, 0,0,1,1), 4, 2)
testM %*% M
Another solution with tapply():
g <- gl(ncol(testM)/2, 2)
t(apply(testM, 1, FUN=tapply, INDEX=g, sum))
How about:
matrix(c(testM[, 1] + testM[, 2], testM[, 3] + testM[, 4]), nrow = 10)
A solution built around your initial idea:
sapply(seq(2, ncol(testM), 2), function(x) apply(testM[, (x-1):x], 1, sum))

Replace NA with previous and next rows mean in R

How can I quickly replace an NA with the mean of the values in the previous and next rows?
name grade
1 A 56
2 B NA
3 C 70
4 D 96
such that B's grade would be 63.
Or you may try na.approx from package zoo: "Missing values (NAs) are replaced by linear interpolation"
library(zoo)
x <- c(56, NA, 70, 96)
na.approx(x)
# [1] 56 63 70 96
This also works if you have more than one consecutive NA:
vals <- c(1, NA, NA, 7, NA, 10)
na.approx(vals)
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
na.approx is based on the base function approx, which may be used instead:
vals <- c(1, NA, NA, 7, NA, 10)
xout <- seq_along(vals)
x <- xout[!is.na(vals)]
y <- vals[!is.na(vals)]
approx(x = x, y = y, xout = xout)$y
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
Assume you have a data.frame df like this:
> df
name grade
1 A 56
2 B NA
3 C 70
4 D 96
5 E NA
6 F 95
Then you can use the following:
> ind <- which(is.na(df$grade))
> df$grade[ind] <- sapply(ind, function(i) with(df, mean(c(grade[i-1], grade[i+1]))))
> df
name grade
1 A 56
2 B 63
3 C 70
4 D 96
5 E 95.5
6 F 95
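A vectorised variant of the same neighbour average can be sketched with shifted copies of the column. It is only an illustration on a bare vector; like the sapply() version, it leaves an NA in place when the first or last value is missing or when a neighbour is itself NA.

```r
grade <- c(56, NA, 70, 96, NA, 95)

prev <- c(NA, head(grade, -1))   # value in the previous row
nxt  <- c(tail(grade, -1), NA)   # value in the next row
gap  <- is.na(grade)

filled <- grade
filled[gap] <- ((prev + nxt) / 2)[gap]
filled
```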
An alternative, using the median instead of the mean, is the na.roughfix function from the randomForest package.
As described in the documentation, it works with a data frame or numeric matrix.
Specifically, for numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered.
Using the same examples as @Henrik,
library(randomForest)
x <- c(56, NA, 70, 96)
na.roughfix(x)
#[1] 56 70 70 96
or with a larger matrix:
y <- matrix(1:50, nrow = 10)
y[sample(1:length(y), 4, replace = FALSE)] <- NA
y
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21 31 41
# [2,] 2 12 22 32 42
# [3,] 3 NA 23 33 NA
# [4,] 4 14 24 34 44
# [5,] 5 15 25 35 45
# [6,] 6 16 NA 36 46
# [7,] 7 17 27 37 47
# [8,] 8 18 28 38 48
# [9,] 9 19 29 39 49
# [10,] 10 20 NA 40 50
na.roughfix(y)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21.0 31 41
# [2,] 2 12 22.0 32 42
# [3,] 3 16 23.0 33 46
# [4,] 4 14 24.0 34 44
# [5,] 5 15 25.0 35 45
# [6,] 6 16 24.5 36 46
# [7,] 7 17 27.0 37 47
# [8,] 8 18 28.0 38 48
# [9,] 9 19 29.0 39 49
#[10,] 10 20 24.5 40 50

Convert list of vectors into matrix

I have a matrix:
b<-matrix(NA,ncol=100,nrow=10)
and a list of vectors:
load(url("https://dl.dropboxusercontent.com/u/22681355/a.Rdata"))
This list contains 100 vectors. I would like to assign each vector in the list to one column of the matrix. Is this possible?
Have a look at ?do.call and ?cbind, e.g:
## create an example list with 3 vectors
l <- list(c(1:3), c(4:6), c(7:9))
## columnwise binding of all vectors in the list `l`
do.call(cbind, l)
Or you could use a simple for loop:
for (i in seq(along=l)) {
n <- length(l[[i]])
b[seq(n), i] <- l[[i]]
}
Something like this (don't remember where I got this from):
cbind.fill <- function(nm) {
nm <- lapply(nm, as.matrix)
n <- max(sapply(nm, nrow))
do.call(cbind, lapply(nm, function(x) rbind(x, matrix(, n - nrow(x), ncol(x)))))
}
l <- list(c(1:3), c(4:8), c(7:9))
cbind.fill(l)
## > cbind.fill(l)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
## [4,] NA 7 NA
## [5,] NA 8 NA
One trick is to first "lengthen" the vectors so they are all the same length (in your case 10). Here I start by creating dummy data (only 10 columns so I can show results easily, but this extends to your 100 column case):
set.seed(1)
lst <- replicate(10, sample(1:100, sample(5:10, 1))) # 10 vectors, length 5-10
Now lengthen, and cbind
lst <- lapply(lst, function(x) { length(x) <- 10; x }) # make all length 10
do.call(cbind, lst)
That's it:
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 38 63 100 39 19 65 87 41 9 78
# [2,] 57 7 38 87 82 78 44 91 87 96
# [3,] 90 21 77 34 66 55 24 29 34 43
# [4,] 20 18 91 47 78 52 7 45 82 70
# [5,] 87 66 21 58 11 76 10 32 98 39
# [6,] 98 37 62 NA 69 3 31 62 32 31
# [7,] NA 73 12 NA 39 45 49 25 45 72
# [8,] NA 47 25 NA NA 69 NA 97 83 NA
# [9,] NA NA 36 NA NA 64 NA NA 80 NA
# [10,] NA NA NA NA NA NA NA NA NA NA
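The lengthen-and-bind steps can also be folded into a single call, sketched here on a small made-up list. Padding to the longest vector (rather than a fixed 10) is an assumption of this example.

```r
l <- list(1:3, 4:8, 7:9)

# `length<-` pads a vector with NA when asked for a longer length.
padded <- sapply(l, `length<-`, max(lengths(l)))
padded
```

Each list element becomes one column, with NA filling the positions beyond the end of the shorter vectors.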
If you index a position beyond a vector's length, you get NA in the extra positions. So a simple mapply does the job:
set.seed(1)
lst <- replicate(10, sample(1:100, sample(5:10, 1))) # Simulating data (Thanks @BrodieG!)
mapply(function(x) x[1:10], lst) # You just need to change the maximum length
If all vectors have the same length, you can also unlist them and reshape the result:
matrix(unlist(list(c(1:3), c(4:6), c(7:9))), ncol = 3)
