I have read the description of by.column for rollapply in the manual, but I couldn't understand how to use it. See below:
x=matrix(1:60,nrow=10)
library('zoo')
rollapply(x,3,mean,fill=NA,align="right",by.column=FALSE)
[1] NA NA 27 28 29 30 31 32 33 34
When I use by.column = FALSE, it applies mean to a rolling window of 3 rows across all columns at once, e.g. mean(x[1:3,]) for the first window.
Now, if I use by.column = TRUE, I get:
x=matrix(1:60,nrow=10)
rollapply(x,3,mean,fill=NA,align="right",by.column=TRUE)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] NA NA NA NA NA NA
[2,] NA NA NA NA NA NA
[3,] 2 12 22 32 42 52
[4,] 3 13 23 33 43 53
[5,] 4 14 24 34 44 54
[6,] 5 15 25 35 45 55
[7,] 6 16 26 36 46 56
[8,] 7 17 27 37 47 57
[9,] 8 18 28 38 48 58
[10,] 9 19 29 39 49 59
I can't make sense of the result. Could anyone please explain what by.column is for, and maybe provide an example?
by.column = TRUE (which is the default) with FUN = mean does a rolling mean separately for each column. The ith column of the result would be:
rollapplyr(x[, i], 3, mean, fill = NA)
by.column = FALSE inputs all columns at once to the function so in this case it would be the same as:
c(NA, NA, sapply(1:8, function(ix) mean(x[seq(ix, ix+2), ])))
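As a rough illustration, both modes can be reproduced in base R without zoo. The names width, starts, per_column and all_columns are made up for this sketch, and it only mirrors the fill = NA, align = "right" case from the question.

```r
x <- matrix(1:60, nrow = 10)
width <- 3
starts <- seq_len(nrow(x) - width + 1)   # first row of each window: 1..8

# by.column = TRUE: every column gets its own rolling mean,
# so the result keeps the shape of x (10 x 6)
per_column <- apply(x, 2, function(col_values) {
  c(rep(NA, width - 1),
    sapply(starts, function(i) mean(col_values[i:(i + width - 1)])))
})

# by.column = FALSE: each window is the whole 3 x 6 block of rows,
# so the function returns a single value per window
all_columns <- c(rep(NA, width - 1),
                 sapply(starts, function(i) mean(x[i:(i + width - 1), ])))

per_column[3, ]   # 2 12 22 32 42 52
all_columns[3]    # 27
```

by.column = FALSE tends to be the useful setting when the function needs several columns of the window together, for instance a rolling regression of one column on another.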
Related
I want to know the command in R to lag a matrix.
I have defined x as:
> (x <- matrix(1:50, 10, 5))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
I want to create l.x:
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] 1 11 21 31 41
[3,] 2 12 22 32 42
[4,] 3 13 23 33 43
[5,] 4 14 24 34 44
[6,] 5 15 25 35 45
[7,] 6 16 26 36 46
[8,] 7 17 27 37 47
[9,] 8 18 28 38 48
[10,] 9 19 29 39 49
lag will coerce your object to a time-series (ts class to be specific) and only shifts the time index. It does not change the underlying data.
You need to manually lag the matrix yourself by adding rows of NA at the beginning and removing the same number of rows at the end. Here's an example of a function that does just that:
lagmatrix <- function(x, k) {
  # ensure 'x' is a matrix
  stopifnot(is.matrix(x))
  if (k == 0)
    return(x)
  na <- matrix(NA, nrow=abs(k), ncol=ncol(x))
  if (k > 0) {
    nr <- nrow(x)
    # prepend NA and remove the last k rows
    rbind(na, x[-((nr-k+1):nr), , drop=FALSE])
  } else {
    # append NA and remove the first abs(k) rows
    rbind(x[-seq_len(abs(k)), , drop=FALSE], na)
  }
}
Or you can use a lag function that does what you expect. For example, xts::lag.xts.
> xts::lag.xts(x)
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] 1 11 21 31 41
[3,] 2 12 22 32 42
[4,] 3 13 23 33 43
[5,] 4 14 24 34 44
[6,] 5 15 25 35 45
[7,] 6 16 26 36 46
[8,] 7 17 27 37 47
[9,] 8 18 28 38 48
[10,] 9 19 29 39 49
> is.matrix(xts::lag.xts(x))
[1] TRUE
Here is one manual method in base R with head and rbind:
rbind(NA, head(x, 9))
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] 1 11 21 31 41
[3,] 2 12 22 32 42
[4,] 3 13 23 33 43
[5,] 4 14 24 34 44
[6,] 5 15 25 35 45
[7,] 6 16 26 36 46
[8,] 7 17 27 37 47
[9,] 8 18 28 38 48
[10,] 9 19 29 39 49
More generally, as noted by @akrun, head(., -1) will work for a matrix of any size:
rbind(NA, head(x, -1))
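The same idea extends to shifting by more than one row, or in the other direction. Below is a minimal sketch under that assumption; shift_rows is a hypothetical helper name, not a function from any package, and it uses plain row indexing rather than head so that it behaves the same for lags and leads.

```r
# Hypothetical helper: shift the rows of a matrix down (k > 0) or up (k < 0),
# padding the vacated rows with NA.
shift_rows <- function(x, k = 1) {
  stopifnot(is.matrix(x), abs(k) <= nrow(x))
  if (k == 0) return(x)
  pad <- matrix(NA, nrow = abs(k), ncol = ncol(x))
  if (k > 0) {
    # keep the first nrow(x) - k rows, NA rows go on top
    rbind(pad, x[seq_len(nrow(x) - k), , drop = FALSE])
  } else {
    # drop the first abs(k) rows, NA rows go at the bottom
    rbind(x[-seq_len(abs(k)), , drop = FALSE], pad)
  }
}

x <- matrix(1:50, 10, 5)
shift_rows(x, 1)[1:3, ]    # a row of NA, then rows 1 and 2 of x
shift_rows(x, -2)[8:10, ]  # row 10 of x, then two rows of NA
```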
We can use apply with the lag function from dplyr:
library(dplyr)
apply(x, 2, dplyr::lag)
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] 1 11 21 31 41
# [3,] 2 12 22 32 42
# [4,] 3 13 23 33 43
# [5,] 4 14 24 34 44
# [6,] 5 15 25 35 45
# [7,] 6 16 26 36 46
# [8,] 7 17 27 37 47
# [9,] 8 18 28 38 48
#[10,] 9 19 29 39 49
Or
rbind(NA, x[-nrow(x),])
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] 1 11 21 31 41
# [3,] 2 12 22 32 42
# [4,] 3 13 23 33 43
# [5,] 4 14 24 34 44
# [6,] 5 15 25 35 45
# [7,] 6 16 26 36 46
# [8,] 7 17 27 37 47
# [9,] 8 18 28 38 48
#[10,] 9 19 29 39 49
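Below is a pure dplyr solution without the need for apply. The only annoyance is that the matrix needs to be converted to a data.frame for this to work.
library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
x %>% as.data.frame() %>% mutate(across(everything(), lag))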
Matlab can do this task. So far I have not been able to get it right using matrix(), t(), and reShape().
My intention is to reshape a series into a matrix with a fixed 10 rows, where the number of columns varies with the length of the data series. If any elements are left over, they can be discarded.
For example:
Row #1 1 2 3 4
Row #2 5 6 7 8
Row #3 9 10 11 12
Row #4 13 14 15 16
Row #5 17 18 19 20
Row #6 21 22 23 24
Row #7 25 26 27 28
Row #8 29 30 31 32
Row #9 33 34 35 36
Row #10 37 38 39 40
If any elements are left over (i.e., 41~49), they can simply be discarded.
Any suggestions?
This is what I think you are asking for: a vector of arbitrary length and content, turned into a matrix with 10 rows and a number of columns based on the data length.
#your series of arbitrary length
data = 1:49
#calculate number of columns based on length
col = as.integer(length(data)/10)
#max index
maxIndx = 10*col
#create and transpose matrix
yourMtx = t(matrix(data[seq_len(maxIndx)],col,10))
#your matrix
> yourMtx
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
[6,] 21 22 23 24
[7,] 25 26 27 28
[8,] 29 30 31 32
[9,] 33 34 35 36
[10,] 37 38 39 40
#create reverse matrix
revMtx = yourMtx[,rev(seq_len(ncol(yourMtx)))]
#reverse matrix
> revMtx
[,1] [,2] [,3] [,4]
[1,] 4 3 2 1
[2,] 8 7 6 5
[3,] 12 11 10 9
[4,] 16 15 14 13
[5,] 20 19 18 17
[6,] 24 23 22 21
[7,] 28 27 26 25
[8,] 32 31 30 29
[9,] 36 35 34 33
[10,] 40 39 38 37
If I understand your question correctly, this looks to be an approach you could use.
# generate my series
myseries <- 1:49
# specify number of columns and rows
ncols <- 4
nrows <- 10
# create a matrix with the first ncols*nrows elements and fill by row
mymatrix <- matrix(myseries[1:(ncols*nrows)],
ncol = ncols, nrow = nrows, byrow = TRUE)
mymatrix
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
[6,] 21 22 23 24
[7,] 25 26 27 28
[8,] 29 30 31 32
[9,] 33 34 35 36
[10,] 37 38 39 40
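If the series length is not known in advance, the number of columns can be derived from it instead of being hard-coded. The following is only a sketch of that idea; series_to_matrix is a made-up name, and it relies on integer division (%/%) to drop the leftover elements.

```r
# Hypothetical helper: fill a matrix with a fixed number of rows, row by row,
# discarding any trailing elements that do not fill a complete matrix.
series_to_matrix <- function(series, nrows = 10) {
  ncols <- length(series) %/% nrows          # whole columns that fit
  keep <- seq_len(nrows * ncols)             # indices of the elements to keep
  matrix(series[keep], nrow = nrows, ncol = ncols, byrow = TRUE)
}

m <- series_to_matrix(1:49)
dim(m)    # 10 4
m[10, ]   # 37 38 39 40
```

Using seq_len() rather than 1:(nrows * ncols) keeps the helper safe when the series is shorter than nrows, in which case it returns a matrix with zero columns.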
I have a 60-column matrix, and I want to reverse some of its rows.
I came across the following two ways to do this:
#rtr is an integer vector with the indices of the rows I want to reverse
matrix[rtr,]<-matrix[rtr,ncol(matrix):1]
and
matrix[rtr,]<-rev(matrix[rtr,])
Are these two implementations expected to produce the same result, or
are there some differences between them?
Thanks in advance
This seems to be a pretty easy thing to test
mm <- matrix(1:(6*7), ncol=6)
m2 <- m1 <- mm
rtr<-c(1,6,7)
m1[rtr,]<-m1[rtr, ncol(m1):1]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 36 29 22 15 8 1
# [2,] 2 9 16 23 30 37
# [3,] 3 10 17 24 31 38
# [4,] 4 11 18 25 32 39
# [5,] 5 12 19 26 33 40
# [6,] 41 34 27 20 13 6
# [7,] 42 35 28 21 14 7
m2[rtr,]<-rev(m2[rtr,])
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 42 35 28 21 14 7
# [2,] 2 9 16 23 30 37
# [3,] 3 10 17 24 31 38
# [4,] 4 11 18 25 32 39
# [5,] 5 12 19 26 33 40
# [6,] 41 34 27 20 13 6
# [7,] 36 29 22 15 8 1
We can see they produce different output. rev() treats the selected block as one long vector, so the latter also changes the order of the rows rather than just reversing each row "in place".
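The reason for the difference is easier to see on a tiny matrix. This is a small sketch with an example matrix m chosen for illustration; it also shows one way to keep using rev() while staying inside each row.

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# rev() ignores the dimensions and reverses the underlying vector
rev(m)                          # 6 5 4 3 2 1

# reversing the column index keeps every value in its own row
by_index <- m[, ncol(m):1]      # row 1: 5 3 1, row 2: 6 4 2

# applying rev() row by row gives the same result as the index version
by_apply <- t(apply(m, 1, rev))
all(by_index == by_apply)       # TRUE
```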
How could I replace an NA with the mean of the values in its previous and next rows in a fast manner?
name grade
1 A 56
2 B NA
3 C 70
4 D 96
such that B's grade would be 63.
Or you may try na.approx from package zoo: "Missing values (NAs) are replaced by linear interpolation"
library(zoo)
x <- c(56, NA, 70, 96)
na.approx(x)
# [1] 56 63 70 96
This also works if you have more than one consecutive NA:
vals <- c(1, NA, NA, 7, NA, 10)
na.approx(vals)
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
na.approx is based on the base function approx, which may be used instead:
vals <- c(1, NA, NA, 7, NA, 10)
xout <- seq_along(vals)
x <- xout[!is.na(vals)]
y <- vals[!is.na(vals)]
approx(x = x, y = y, xout = xout)$y
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
Assume you have a data.frame df like this:
> df
name grade
1 A 56
2 B NA
3 C 70
4 D 96
5 E NA
6 F 95
Then you can use the following:
> ind <- which(is.na(df$grade))
> df$grade[ind] <- sapply(ind, function(i) with(df, mean(c(grade[i-1], grade[i+1]))))
> df
name grade
1 A 56.0
2 B 63.0
3 C 70.0
4 D 96.0
5 E 95.5
6 F 95.0
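Since the question asks for something fast, the same neighbour mean can also be computed without a loop by shifting the column. This is a sketch using made-up variable names (prev_val, next_val, filled) on the grades from the example above.

```r
grade <- c(56, NA, 70, 96, NA, 95)

# value from the previous row and from the next row, NA at the ends
prev_val <- c(NA, head(grade, -1))
next_val <- c(tail(grade, -1), NA)

# replace each NA with the mean of its two neighbours, keep everything else
filled <- ifelse(is.na(grade), (prev_val + next_val) / 2, grade)
filled   # 56.0 63.0 70.0 96.0 95.5 95.0
```

An NA in the first or last position, or next to another NA, stays NA in this version because one of its neighbours is missing; interpolation as in na.approx is the better fit for runs of consecutive NAs.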
An alternative solution, using the median instead of the mean, is the na.roughfix function from the randomForest package.
As described in the documentation, it works with a data frame or numeric matrix.
Specifically, for numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered.
Using the same examples as @Henrik,
library(randomForest)
x <- c(56, NA, 70, 96)
na.roughfix(x)
#[1] 56 70 70 96
or with a larger matrix:
y <- matrix(1:50, nrow = 10)
y[sample(1:length(y), 4, replace = FALSE)] <- NA
y
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21 31 41
# [2,] 2 12 22 32 42
# [3,] 3 NA 23 33 NA
# [4,] 4 14 24 34 44
# [5,] 5 15 25 35 45
# [6,] 6 16 NA 36 46
# [7,] 7 17 27 37 47
# [8,] 8 18 28 38 48
# [9,] 9 19 29 39 49
# [10,] 10 20 NA 40 50
na.roughfix(y)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21.0 31 41
# [2,] 2 12 22.0 32 42
# [3,] 3 16 23.0 33 46
# [4,] 4 14 24.0 34 44
# [5,] 5 15 25.0 35 45
# [6,] 6 16 24.5 36 46
# [7,] 7 17 27.0 37 47
# [8,] 8 18 28.0 38 48
# [9,] 9 19 29.0 39 49
#[10,] 10 20 24.5 40 50
I have an R object which is to be shared with another analyst. I write:
dput(objectname,"filename.R")
and then the object can be reconstructed by sharing the .R file and using the dget("filename.R") function.
Now what I want to do is share just the object structure and not the file. Is it possible?
I tried the following:
a <- dput(b)
structure(1:50, .Dim = c(10L, 5L))
But I cannot reconstruct this object. All I get is:
dget(a)
?
> a <- dput(b)
structure(1:50, .Dim = c(10L, 5L))
> a # 'reconstructing' b
[,1] [,2] [,3] [,4] [,5]
[1,] 1 11 21 31 41
[2,] 2 12 22 32 42
[3,] 3 13 23 33 43
[4,] 4 14 24 34 44
[5,] 5 15 25 35 45
[6,] 6 16 26 36 46
[7,] 7 17 27 37 47
[8,] 8 18 28 38 48
[9,] 9 19 29 39 49
[10,] 10 20 30 40 50
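Note that dput prints the structure of b to the console and also returns b itself, so after a <- dput(b) the variable a already holds a copy of b. dget(a) does not work because dget expects a file name or connection, not an object. To recover b, just print a.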
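To share the structure as plain text rather than as a file, one option is to capture the deparsed code in a string and evaluate it on the other side. This is a minimal sketch of that approach; txt and b2 are names chosen for the example, and the exact text produced by deparse can differ slightly between R versions.

```r
b <- matrix(1:50, 10, 5)

# the same kind of text that dput() prints, kept as a single string
txt <- paste(deparse(b), collapse = "\n")
cat(txt, "\n")

# the recipient pastes the text and evaluates it to rebuild the object
b2 <- eval(parse(text = txt))
identical(b, b2)   # TRUE
```

Pasting the printed structure(...) call directly at the R prompt and assigning the result works the same way.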