I have a number of subarrays, say 2 (for simplicity), each with the same number of rows and columns. Each spot in the subarrays is occupied by a number in [1, 10].
What I would like to do is move rows randomly between subarrays according to some rate of movement m in [0, 1]. m = 0 corresponds to no movement, while m = 1 means that any row in any subarray can be moved.
I take inspiration from:
How to swap a number of the values between 2 rows in R
but my problem is a bit different than this. I do know that sample() would be needed here.
Is there an easy way to go about accomplishing this?
This doesn't do it, but I believe I'm on the right track anyway.
m <- 0.2
a <- array(dim = c(5, 5, 2)) # 5 rows, 5 columns, 2 subarrays
res <- rep(sample(nrow(a), size = ceiling(nrow(a)*m), replace = FALSE)) # sample 20% of rows from array a.
Any assistance is appreciated.
It is significantly easier if you can use a matrix (2-dim array).
set.seed(2)
m <- 0.2
d <- c(10, 4)
a <- array(sample(prod(d)), dim = d)
a
# [,1] [,2] [,3] [,4]
# [1,] 8 17 14 1
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 7 35 3 32
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
(I'm going to set the seed again to something that conveniently gives me something "interesting" to show.)
set.seed(2)
ind <- which(runif(d[1]) < m)
ind
# [1] 1 4 7
The first randomness, runif, is compared against m and generates the indices that may change. The second randomness, sample below, takes those indices and possibly reorders them. (In this case, it reorders "1,4,7" to "4,1,7", meaning the third of the rows-that-may-change will be left unchanged.)
a[ind,] <- a[sample(ind),]
a
# [,1] [,2] [,3] [,4]
# [1,] 7 35 3 32 # <-- row 4
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 8 17 14 1 # <-- row 1
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13 # <-- row 7, unchanged
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
Note that this is probabilistic, which means a probability of 0.2 does not guarantee you 20% (or even any) of the rows will be swapped.
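To see how much the number of selected rows can vary, here is a small simulation sketch. The seed, the 1000 repetitions and the names n_rows and n_selected are my own assumptions, chosen only for illustration.

```r
set.seed(1)                      # arbitrary seed, only for repeatability
m <- 0.2
n_rows <- 10

# number of rows picked by runif(n_rows) < m in each of 1000 repetitions
n_selected <- replicate(1000, sum(runif(n_rows) < m))

table(n_selected)                # spread of counts; 0 rows is a possible outcome
mean(n_selected) / n_rows        # close to m on average, not in every draw
```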
(Since I'm guessing you'd really like to preserve your 3-dim (or even n-dim) array, you might be able to use aperm to transfer between array <--> matrix.)
EDIT 1
As an alternative to a probabilistic use of runif, you can use:
ind <- head(sample(d[1]), n = floor(d[1] * m))
to get closer to your goal of "20%". Since d[1]*m will often not be an integer, floor rounds it down, so you'll get the price-is-right winner: closest to but not over your desired percentage.
EDIT 2
A reversible method for transforming an n-dimensional array into a matrix and back again. Caveat: though the logic appears solid, my testing has only included a couple arrays.
array2matrix <- function(a) {
d <- dim(a)
ind <- seq_along(d)
a2 <- aperm(a, c(ind[2], ind[-2]))
dim(a2) <- c(d[2], prod(d[-2]))
a2 <- t(a2)
attr(a2, "origdim") <- d
a2
}
The reversal uses the "origdim" attribute if still present; this will work as long as your modifications to the matrix do not clear its attributes. (Simple row-swapping does not.)
matrix2array <- function(m, d = attr(m, "origdim")) {
ind <- seq_along(d)
m2 <- t(m)
dim(m2) <- c(d[2], d[-2])
aperm(m2, c(ind[2], ind[-2]))
}
(These two functions should probably do some more error-checks, such as is.null(d).)
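As a rough idea of what such checks could look like, here is a sketch of the reversal with a few guards added. The name matrix2array_checked and the exact error messages are hypothetical, and the checks are only the ones I could think of, not an exhaustive list.

```r
# Hypothetical variant of matrix2array() with basic input checks.
matrix2array_checked <- function(m, d = attr(m, "origdim")) {
  if (is.null(d)) stop("no dimensions given and no 'origdim' attribute found")
  if (!is.matrix(m)) stop("'m' must be a matrix")
  if (length(d) < 2L) stop("'d' must describe at least two dimensions")
  if (ncol(m) != d[2] || nrow(m) != prod(d[-2]))
    stop("dim(m) is not compatible with 'd'")
  ind <- seq_along(d)
  m2 <- t(m)
  dim(m2) <- c(d[2], d[-2])
  aperm(m2, c(ind[2], ind[-2]))
}

# A 10 x 4 matrix standing in for the matrix form of a 5 x 4 x 2 array.
m <- matrix(1:40, nrow = 10)
a <- matrix2array_checked(m, d = c(5, 4, 2))
dim(a)
```

Without an "origdim" attribute and without an explicit d, the call stops with an error instead of failing somewhere inside dim<-.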
A sample run:
set.seed(2)
dims <- 5:2
a <- array(sample(prod(dims)), dim=dims)
Quick show:
a[,,1,1:2,drop=FALSE]
# , , 1, 1
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# , , 1, 2
# [,1] [,2] [,3] [,4]
# [1,] 118 25 14 93
# [2,] 65 21 16 77
# [3,] 87 82 3 38
# [4,] 92 12 78 17
# [5,] 49 4 75 80
The transformation:
m <- array2matrix(a)
dim(m)
# [1] 30 4
head(m)
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# [6,] 67 47 1 54
Proof of reversibility:
identical(matrix2array(m), a)
# [1] TRUE
EDIT 3, "WRAP UP of all code"
Creating fake data:
dims <- c(5,4,2)
(a <- array(seq(prod(dims)), dim=dims))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 25 30 35 40
(m <- array2matrix(a))
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 23 28 33 38
# [9,] 24 29 34 39
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
The random-swapping of rows. I'm using 50% here.
pct <- 0.5
nr <- nrow(m)
set.seed(3)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 8 4 3 9
(ind2 <- sample(ind1))
# [1] 3 2 9 8 4
m[ind1,] <- m[ind2,]
m
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 2 7 12 17
# [9,] 4 9 14 19
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
(Note that I pre-made ind1 and ind2 here, mostly to see what was going on internally. You can replace m[ind2,] with m[sample(ind1),] for the same effect.)
BTW: if we had instead used a seed of 2, we would notice that 2 rows are not swapped:
set.seed(2)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 7 5 10 6
(ind2 <- sample(ind1))
# [1] 6 2 5 10 7
Because of this, I chose a seed of 3 for the demonstration. Unswapped rows like these may give the appearance of things not working, but lacking more controlling code, sample does not ensure that positions change: it is certainly reasonable to expect that "randomly swap rows" could randomly choose to move row 2 to row 2. Take for example:
set.seed(267)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 3 6 5 7 2
(ind2 <- sample(ind1))
# [1] 3 6 5 7 2
The first randomly chooses five rows, and then reorders them randomly into an unchanged order. (I suggest that if you want to force that they are all movements, you should ask a new question asking about just forcing a sample vector to change.)
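For what it's worth, one simple way to force every selected row to move is to reshuffle until no index stays in place. This is only a sketch of one possible approach, not something from the answer above; shuffle_all_moved is a hypothetical helper name.

```r
set.seed(267)                       # the seed from the example above
nr <- 10
pct <- 0.5
ind1 <- sample(nr, size = ceiling(nr * pct))

# Hypothetical helper: reshuffle until no index stays in its own position.
# Needs at least two distinct indices, otherwise no such ordering exists.
shuffle_all_moved <- function(ind) {
  stopifnot(length(ind) >= 2L, !anyDuplicated(ind))
  repeat {
    out <- ind[sample.int(length(ind))]   # avoids sample()'s length-1 surprise
    if (all(out != ind)) return(out)
  }
}

ind2 <- shuffle_all_moved(ind1)
any(ind1 == ind2)                   # FALSE: every selected row changes position
```

The loop may need a few attempts, since a random shuffle often leaves at least one index in place.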
Anyway, we can regain the original dimensionality with the second function:
(a2 <- matrix2array(m))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 2 7 12 17
# [4,] 4 9 14 19
# [5,] 25 30 35 40
In the first plane of the array, rows 1 and 5 are unchanged; in the second plane, rows 1, 2, and 5 are unchanged. Five rows the same, five rows moved around (but otherwise unchanged within each row).
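For the 3-dimensional case in the question, the whole round trip can be sketched as a single function. swap_rows is a hypothetical name, it assumes exactly three dimensions, and it uses the exact-count selection (ceiling(nr * pct)) rather than the runif version.

```r
# Hypothetical wrapper: move whole rows around within and between the planes
# of a 3-dim array; pct is the share of rows eligible to move.
swap_rows <- function(a, pct) {
  d <- dim(a)
  stopifnot(length(d) == 3L, pct >= 0, pct <= 1)
  m <- aperm(a, c(1, 3, 2))            # rows, planes, columns
  dim(m) <- c(d[1] * d[3], d[2])       # one matrix row per (row, plane) pair
  nr <- nrow(m)
  ind <- sample(nr, size = ceiling(nr * pct))
  # with a single index, sample(ind) would sample from 1:ind, so skip that case
  if (length(ind) > 1L) m[ind, ] <- m[sample(ind), ]
  dim(m) <- c(d[1], d[3], d[2])
  aperm(m, c(1, 3, 2))                 # back to rows, columns, planes
}

set.seed(3)
a <- array(seq(40), dim = c(5, 4, 2))
b <- swap_rows(a, 0.5)
dim(b)
```

Rows stay intact; only their position (and possibly their plane) changes.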
I am following the thread "2d matrix to 3d stacked array in r" and have a question about the aperm function.
1) I get the first part of the solution, but did not understand the c(2,1,3) used in the function. Could you kindly clarify that?
2) Also I am trying a slight variation of the example in that thread.
My case is as follows:
For a matrix similar to the one in that example:
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)
[,1] [,2] [,3] [,4] [,5]
[1,] 27 69 27 80 74
[2,] 38 39 39 11 70
[3,] 58 77 2 73 48
[4,] 91 50 39 42 87
[5,] 21 72 87 83 44
[6,] 90 100 35 65 25
[7,] 95 39 49 79 8
[8,] 67 78 60 56 10
[9,] 63 94 50 53 32
[10,] 7 22 19 79 52
[11,] 21 66 83 3 67
[12,] 18 13 67 48 41
I am trying to rearrange such that I have a 3 (row) X 5 (col) x 11 (third dim) array.
So, essentially the rows would overlap and show something like:
,,1
27 69 27 80 74
38 39 39 11 70
58 77 2 73 48
,,2
38 39 39 11 70
58 77 2 73 48
91 50 39 42 87
,,3
58 77 2 73 48
91 50 39 42 87
21 72 87 83 44
and so on until we hit ,,11
Would someone have any experience with this?
Thanks!
Just stumbled over this question. Though the answer comes a little late, here are two options for you.
First, you need to extend mat in such a way that its rows overlap. We can use this vector for row indexing.
#[1] 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8 7 8 9 8 9 10 9 10 11 10 11 12
I used rollapply from the zoo package to create it as follows:
library(zoo)
row_nums <- c(t(rollapply(1:nrow(mat), width = 3, FUN = rep, 1)))
mat <- mat[row_nums, ]
dim(mat)
#[1] 30 5
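If you would rather not depend on zoo, the same index vector can probably be built in base R with outer. This is a sketch under my own naming (width, n_windows, mat_ext are not from the answer); it keeps the extended matrix in a separate variable instead of overwriting mat.

```r
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)

width <- 3
n_windows <- nrow(mat) - width + 1           # 12 - 3 + 1 = 10 windows

# column w of the outer() result holds rows w, w + 1, w + 2
row_nums <- as.vector(outer(seq_len(width) - 1, seq_len(n_windows), "+"))
row_nums

mat_ext <- mat[row_nums, ]
dim(mat_ext)
```

The window count also shows why 12 rows with a width of 3 give 10 overlapping blocks.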
Now use the matsplitter function that @Mr.Flick provided in this answer (please consider upvoting his answer), reproduced below, to get the desired output:
matsplitter(mat, 3, 5)
#, , 1
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 27 69 27 80 74
#[2,] 38 39 39 11 70
#[3,] 58 77 2 73 48
#
#, , 2
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 38 39 39 11 70
#[2,] 58 77 2 73 48
#[3,] 91 50 39 42 87
#
#, , 3
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 58 77 2 73 48
#[2,] 91 50 39 42 87
#[3,] 21 72 87 83 44
#
#, , 4
# ...
Note that you will end up with an array of dimension 3 x 5 x 10, not 11.
matsplitter <- function(M, r, c) {
rg <- (row(M) - 1) %/% r + 1
cg <- (col(M) - 1) %/% c + 1
rci <- (rg - 1) * max(cg) + cg
N <- prod(dim(M)) / r / c
cv <- unlist(lapply(1:N, function(x)
M[rci == x]))
dim(cv) <- c(r, c, N)
cv
}
Here is a solution using aperm as in the linked answer (assuming that mat was extended as above and is of dimension 30 x 5).
aperm(`dim<-`(t(mat), list(5, 3, 10)), c(2, 1, 3))
t(mat): transposes mat (new dimension: 5 x 30)
`dim<-`(t(mat), list(5, 3, 10)): changes the dimension of t(mat) from 5 X 30 to 5 x 3 x 10
aperm(..., c(2, 1, 3)): permutes the dimensions of the array `dim<-`(t(mat), list(5, 3, 10)) from 5 x 3 x 10 to 3 x 5 x 10, i.e. the second dimension becomes the first, the first dimension becomes the second and the third dimension stays the same.
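A tiny example may make the role of c(2, 1, 3) easier to see. The array x below is made up for illustration only.

```r
x <- array(1:24, dim = c(2, 3, 4))
y <- aperm(x, c(2, 1, 3))      # new dim 1 = old dim 2, new dim 2 = old dim 1

dim(x)                         # 2 3 4
dim(y)                         # 3 2 4

# an element keeps its value, only the first two subscripts trade places
y[3, 1, 2] == x[1, 3, 2]       # TRUE
```

In other words, each slice y[, , k] is the transpose of x[, , k].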
I have a matrix with 6940 rows and 100 columns. I need to compute 5-day cumulative sums over the data set. Right now I have a for loop for this, as follows:
cum<- matrix(data=q1,nrow=6940,ncol=100)
for (j in 1:100){
for (i in 1:6940){
cum[i,j]<-sum(q1[i,j],q1[i+1,j],q1[i+2,j],q1[i+3,j],q1[i+4,j],na.rm=T)
}
}
I would like to know whether there is a function in the apply family that does the same, as this code is very time consuming.
For example, if I generate a matrix using the command
ens <- matrix(rnorm(200),20)
I want the cumulative sum of 5 rows at a time, i.e. the sum of row1:row5, row2:row6, row3:row7 and so on, in the form of a data frame.
I tried using the apply function in this form:
apply(apply(apply(apply( apply(m, 2, cumsum),2, cumsum), 2, cumsum),2,cumsum),2,cumsum)
but the problem is that I don't get the cumulative sum in blocks of 5, only an overall cumulative sum.
Here is one approach using the stats::filter function to calculate the rolling sums and apply to loop over the columns:
m <- matrix(1:48, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 13 25 37
# [2,] 2 14 26 38
# [3,] 3 15 27 39
# [4,] 4 16 28 40
# [5,] 5 17 29 41
# [6,] 6 18 30 42
# [7,] 7 19 31 43
# [8,] 8 20 32 44
# [9,] 9 21 33 45
#[10,] 10 22 34 46
#[11,] 11 23 35 47
#[12,] 12 24 36 48
apply(m, 2, stats::filter, filter = rep(1, 5), sides = 1)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
This might have to be adjusted depending on how you want to handle windows with fewer than 5 values (e.g., here at the beginning).
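If partial windows at the start should be summed rather than returned as NA, one possible base R variant uses cumsum. roll_sum_partial is a hypothetical helper name, and this is a sketch of one way to treat the short windows, not the only one.

```r
# Hypothetical helper: trailing rolling sum of width k; the first k - 1
# positions hold the sum of however many values are available so far.
roll_sum_partial <- function(v, k) {
  cs <- cumsum(v)
  lagged <- c(rep(0, k), cs)[seq_along(cs)]   # cumulative sum k steps earlier
  cs - lagged
}

m <- matrix(1:48, ncol = 4)
res <- apply(m, 2, roll_sum_partial, k = 5)
res[1:6, ]
```

From row 5 onward the values match the filter output above; rows 1 to 4 hold partial sums.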
Another option is roll_sumr from the RcppRoll package (data from @Roland's post)
library(RcppRoll)
apply(m, 2, roll_sumr, 5)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
As @alexis_laz mentioned in the comments, roll_sumr can take a matrix as well, which is more efficient.
roll_sumr(m, 5, by = 1)
Benchmarks
set.seed(24)
m1 <- matrix(sample(1:50, 5000*5000, replace=TRUE), ncol=5000)
system.time(apply(m1, 2, roll_sumr, 5))
# user system elapsed
# 1.84 0.16 1.99
system.time(roll_sumr(m1, 5, by = 1))
# user system elapsed
# 0.59 0.15 0.74
system.time(apply(m1, 2, stats::filter, filter = rep(1, 5), sides = 1))
# user system elapsed
# 4.46 0.20 4.68
Another, less sophisticated approach: create 5 grouping variables and sum by each of them in turn.
Here:
library(data.table)
m <- data.table(matrix(1:48, ncol = 4))
m[, index := .I]
m[, i1 := floor((index - 1) / 5) * 5 + 1]
m[, i2 := floor((index - 2) / 5) * 5 + 2]
m[, i3 := floor((index - 3) / 5) * 5 + 3]
m[, i4 := floor((index - 4) / 5) * 5 + 4]
m[, i5 := floor((index - 5) / 5) * 5 + 5]
cumsumm <- rbindlist(list(m[, list(value = sum(V1)), by = "i1"]
, m[, list(value = sum(V1)), by = "i2"]
, m[, list(value = sum(V1)), by = "i3"]
, m[, list(value = sum(V1)), by = "i4"]
, m[, list(value = sum(V1)), by = "i5"]), use.names=F)[i1 > 0, ]
I have read the description of by.column for rollapply in the manual, but I couldn't understand how to use it. See below:
x=matrix(1:60,nrow=10)
library('zoo')
rollapply(x,3,mean,fill=NA,align="right",by.column=FALSE)
[1] NA NA 27 28 29 30 31 32 33 34
When I use by.column = FALSE, it applies mean to a rolling window of 3 rows (the width), e.g. mean(x[1:3,]) for the first window.
Now, if I use by.column = TRUE, then I get:
x=matrix(1:60,nrow=10)
rollapply(x,3,mean,fill=NA,align="right",by.column=TRUE)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] NA NA NA NA NA NA
[2,] NA NA NA NA NA NA
[3,] 2 12 22 32 42 52
[4,] 3 13 23 33 43 53
[5,] 4 14 24 34 44 54
[6,] 5 15 25 35 45 55
[7,] 6 16 26 36 46 56
[8,] 7 17 27 37 47 57
[9,] 8 18 28 38 48 58
[10,] 9 19 29 39 49 59
I can't make sense of the result. Could anyone please explain what by.column is for and maybe provide an example?
by.column = TRUE (which is the default) with FUN = mean does a rolling mean separately for each column. The ith column of the result would be:
rollapplyr(x[, i], 3, mean, fill = NA)
by.column = FALSE inputs all columns at once to the function so in this case it would be the same as:
c(NA, NA, sapply(1:8, function(ix) mean(x[seq(ix, ix+2), ])))
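To compare the two modes side by side without zoo, here is a base R sketch that reproduces both results for the matrix in the question. The names width, starts, pad, per_column and all_columns are mine.

```r
x <- matrix(1:60, nrow = 10)
width <- 3
starts <- seq_len(nrow(x) - width + 1)      # first row of each window
pad <- rep(NA, width - 1)                   # right-aligned, so NAs come first

# like by.column = TRUE: one rolling mean per column
per_column <- sapply(seq_len(ncol(x)), function(j)
  c(pad, sapply(starts, function(i) mean(x[i:(i + width - 1), j]))))

# like by.column = FALSE: each window's rows from all columns go to mean() at once
all_columns <- c(pad, sapply(starts, function(i) mean(x[i:(i + width - 1), ])))

per_column[3, ]    # 2 12 22 32 42 52
all_columns[3]     # 27
```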
How can I melt a lower triangular matrix (including the diagonal)?
11 NA NA NA NA
12 22 NA NA NA
13 23 33 NA NA
14 24 34 44 NA
15 25 35 45 55
A <- t(matrix (c(11, NA, NA, NA, NA, 12, 22, NA, NA, NA,
13, 23, 33, NA, NA, 14, 24, 34, 44, NA,15, 25,
35, 45, 55), ncol = 5))
> A
[,1] [,2] [,3] [,4] [,5]
[1,] 11 NA NA NA NA
[2,] 12 22 NA NA NA
[3,] 13 23 33 NA NA
[4,] 14 24 34 44 NA
[5,] 15 25 35 45 55
into a data.frame with row and col columns (preserving the following order):
col row value
1 1 11
1 2 12
1 3 13
1 4 14
1 5 15
2 2 22
2 3 23
2 4 24
2 5 25
3 3 33
3 4 34
3 5 35
4 4 44
4 5 45
5 5 55
If you want the indices as columns as well, this should work:
m <- matrix(1:25,5,5)
m[upper.tri(m)] <- NA
m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 NA NA NA NA
[2,] 2 7 NA NA NA
[3,] 3 8 13 NA NA
[4,] 4 9 14 19 NA
[5,] 5 10 15 20 25
cbind(which(!is.na(m),arr.ind = TRUE),na.omit(as.vector(m)))
row col
[1,] 1 1 1
[2,] 2 1 2
[3,] 3 1 3
[4,] 4 1 4
[5,] 5 1 5
[6,] 2 2 7
[7,] 3 2 8
[8,] 4 2 9
[9,] 5 2 10
[10,] 3 3 13
[11,] 4 3 14
[12,] 5 3 15
[13,] 4 4 19
[14,] 5 4 20
[15,] 5 5 25
I guess I'll explain this a bit. I'm using three "tricks":
The arr.ind argument to which to get the indices
The very useful na.omit function to avoid some extra typing
The fact that R stores matrices in column major form, hence as.vector returns the values in the right order.
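Applied to the matrix A from the question, the same three ideas can be sketched as a data.frame with the columns in the requested order. The names idx and melted are mine, and matrix indexing with A[idx] replaces the na.omit step.

```r
A <- t(matrix(c(11, NA, NA, NA, NA, 12, 22, NA, NA, NA,
                13, 23, 33, NA, NA, 14, 24, 34, 44, NA,
                15, 25, 35, 45, 55), ncol = 5))

# which() walks the matrix in column-major order, so the rows of idx are
# already sorted by column first, then by row
idx <- which(!is.na(A), arr.ind = TRUE)

melted <- data.frame(col = idx[, "col"],
                     row = idx[, "row"],
                     value = A[idx])          # index by (row, col) pairs
head(melted)
```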
My one liner.
reshape2::melt(A, varnames = c('row', 'col'), na.rm = TRUE)
Here's my first solution:
test <- rbind(c(11,NA,NA,NA,NA),
c(12,22,NA,NA,NA),
c(13,23,33,NA,NA),
c(14,24,34,44,NA),
c(15,25,35,45,55)) ## Load the matrix
test2 <- as.vector(test) ## "melt" it into a vector
test <- cbind( test2[!is.na(test2)] ) ## get rid of NAs, cbind it into a column
Results are:
> test
[,1]
[1,] 11
[2,] 12
[3,] 13
[4,] 14
[5,] 15
[6,] 22
[7,] 23
[8,] 24
[9,] 25
[10,] 33
[11,] 34
[12,] 35
[13,] 44
[14,] 45
[15,] 55
Alternatively, you can use the matrix command:
test <- rbind(c(11,NA,NA,NA,NA),
c(12,22,NA,NA,NA),
c(13,23,33,NA,NA),
c(14,24,34,44,NA),
c(15,25,35,45,55)) ## Load the matrix
test2 <- matrix(test, ncol=1)
test <- cbind( test2[!is.na(test2), ] )
## same as above, except now explicitly noting rows to replace.
Here is my attempt:
# enter the data
df <- c(11,12,13,14,15,NA,22,23,24,25,NA,NA,33,34,35,NA,NA,NA,44,45,NA,NA,NA,NA,55)
dim(df) <- c(5,5)
df
# make new data frame with rows and column indicators
melteddf <- data.frame(
value=df[lower.tri(df,diag=T)],
col=rep(1:ncol(df),ncol(df):1),
row=unlist(sapply(1:nrow(df),function(x) x:nrow(df)))
)
I wish I had known about the arr.ind argument of which before now, though.
Here is a method using arrayInd, which is basically the same as @joran's but might be useful in other settings:
na.omit( data.frame(arrayInd(1:prod(dim(A)), dim(A)), value=c(A)) )
X1 X2 value
1 1 1 11
2 2 1 12
3 3 1 13
4 4 1 14
5 5 1 15
7 2 2 22
8 3 2 23
9 4 2 24
10 5 2 25
13 3 3 33
14 4 3 34
15 5 3 35
19 4 4 44
20 5 4 45
25 5 5 55