n <- 5
a <- matrix(1:n^2, nrow = n, byrow = TRUE)
The output is:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
How do I shift the 1 to the current position of the 25, moving every other element back one place in reading order, so that it looks like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 4 5 6
[2,] 7 8 9 10 11
[3,] 12 13 14 15 16
[4,] 17 18 19 20 21
[5,] 22 23 24 25 1
a <- t(a); a[] <- c(a[-1], a[1]); a <- t(a)
a
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 3 4 5 6
# [2,] 7 8 9 10 11
# [3,] 12 13 14 15 16
# [4,] 17 18 19 20 21
# [5,] 22 23 24 25 1
c(a) unwinds (flattens) the matrix into a vector. It does this column-first, so c(a) gives [1] 1 6 11 16 21 2 .... We want it to be row-first, though, so:
t(a) transposes it, so that what was row-first is now column-first, and c() and friends read it in the order we want.
c(a[-1], a[1]) is just "concatenate all except the first with the first", the classic way to move the first element of a vector to the end.
a[] <- assigns into the existing object while keeping its attributes (including its dimensions), so the result of a calculation that does not preserve the "dimensionality" of the object (here, a plain vector) is poured back into the same matrix shape.
After we've rearranged, we then transpose back to the original shape and row/column-order.
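The same transpose idea extends to shifting by more than one position. Below is a minimal sketch; the function name shift_row_major is hypothetical and not part of the answer. It rotates the elements by k positions in row-major order, and a negative k rotates the other way.

```r
# Hypothetical helper (not from the answer): rotate the elements of a matrix
# by k positions in row-major order, using the same transpose idea.
shift_row_major <- function(a, k = 1) {
  v <- c(t(a))                          # unwind row-first: t() then column-major read
  n <- length(v)
  idx <- (seq_len(n) + k - 1) %% n + 1  # for k = 1: 2, 3, ..., n, 1
  matrix(v[idx], nrow = nrow(a), byrow = TRUE)  # refill row-first
}

n <- 5
a <- matrix(1:n^2, nrow = n, byrow = TRUE)
c(a)                   # column-first: 1 6 11 16 21 2 7 ...
shift_row_major(a)     # same result as t(a); a[] <- c(a[-1], a[1]); t(a)
shift_row_major(a, 3)  # first three elements wrapped to the end
```

Building the index with %% avoids the classic seq_len(0) pitfall: x[-integer(0)] returns an empty vector, so a naive c(v[-seq_len(k)], v[seq_len(k)]) breaks when k is 0.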
Here is a base R one-liner
> t(`dim<-`(t(a)[seq_along(a)%%length(a)+1],rev(dim(a))))
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 4 5 6
[2,] 7 8 9 10 11
[3,] 12 13 14 15 16
[4,] 17 18 19 20 21
[5,] 22 23 24 25 1
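The one-liner packs several steps together. Here it is unpacked one step at a time as a sketch; the intermediate names idx, v and res are illustrative only.

```r
n <- 5
a <- matrix(1:n^2, nrow = n, byrow = TRUE)

# 1. Index vector 2, 3, ..., 25, 1: each position points at its successor.
idx <- seq_along(a) %% length(a) + 1

# 2. t(a) read as a vector is a in row-major order; pick the rotated values.
v <- t(a)[idx]

# 3. Give the vector the transposed shape (columns x rows); this is what
#    `dim<-`(v, rev(dim(a))) does in the one-liner ...
dim(v) <- rev(dim(a))

# 4. ... and transpose back to a's orientation.
res <- t(v)
res
```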
I am new to R. I have a matrix with 3 columns and 8,000 rows, made up of groups of 500 rows, i.e. 16 matrices of 500 x 3 stacked on top of each other. I want to take the first 300 rows of each group and put the 16 resulting 300 x 3 blocks into a new matrix. How do I do that?
Two 6 x 2 matrices stacked on top of each other:
m <- matrix(1:24, ncol = 2)
# [,1] [,2]
# [1,] 1 13
# [2,] 2 14
# [3,] 3 15
# [4,] 4 16
# [5,] 5 17
# [6,] 6 18
# [7,] 7 19
# [8,] 8 20
# [9,] 9 21
#[10,] 10 22
#[11,] 11 23
#[12,] 12 24
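Make it an array with dimensions (rows per matrix, number of matrices, columns), then use aperm() to swap the last two so each slice along the third dimension is one of the stacked matrices: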
a <- array(m, c(6, 2, 2))
a <- aperm(a, c(1, 3, 2))
First three rows of each matrix:
a[1:3,,]
#, , 1
#
# [,1] [,2]
#[1,] 1 13
#[2,] 2 14
#[3,] 3 15
#
#, , 2
#
# [,1] [,2]
#[1,] 7 19
#[2,] 8 20
#[3,] 9 21
Use this if you need a matrix:
matrix(aperm(a[1:3,,], c(1, 3, 2)), ncol = 2)
# [,1] [,2]
#[1,] 1 13
#[2,] 2 14
#[3,] 3 15
#[4,] 7 19
#[5,] 8 20
#[6,] 9 21
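For the real data, note that in array(m, c(6, 2, 2)) the dimensions are (rows per matrix, number of matrices, columns). In the toy example the last two are both 2, so their order is invisible. For the 8000 x 3 case the call would be array(m, c(500, 16, 3)). Below is a sketch that wraps this up; first_rows_per_group is a hypothetical name. It skips the aperm() round trip because, in that dimension order, the first n rows of every group can be sliced out directly.

```r
# Hypothetical helper: keep the first n rows of every block of r rows
# in a matrix made of equal-sized blocks stacked on top of each other.
first_rows_per_group <- function(m, r, n) {
  g <- nrow(m) %/% r
  stopifnot(g * r == nrow(m), n <= r)
  # Column-major storage lets m fill an array with dims
  # (row within group, group, column) directly.
  a <- array(m, c(r, g, ncol(m)))
  # Keep the first n rows of each group, then flatten back to a matrix.
  matrix(a[seq_len(n), , , drop = FALSE], ncol = ncol(m))
}

# Small check against the 12 x 2 example above (two groups of 6 rows)
m <- matrix(1:24, ncol = 2)
first_rows_per_group(m, r = 6, n = 3)

# The real case: 16 groups of 500 rows, 3 columns, keep 300 rows each
big <- matrix(seq_len(8000 * 3), ncol = 3)
dim(first_rows_per_group(big, r = 500, n = 300))
```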
You need to generate the row indices 1:300, 501:800, ..., then subset out these rows. If your matrix is called mat you can do that like this:
new_mat <- mat[as.numeric(sapply((0:15 * 500), "+", 1:300)),]
If you're just looking for a way to select the first 300 rows of each group, this could be a solution.
Given m, your 8000 x 3 matrix composed of 16 groups stacked on top of each other:
r <- 500 # rows for each group
g <- 16 # number of groups
n <- 300 # first n rows to select
new_m <- m[rep(rep(c(TRUE, FALSE), c(n, r - n)), g), ]
dim(new_m)
#> [1] 4800 3
new_m is now a 4800 x 3 matrix.
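The logical mask here and the integer indices built with sapply() in the previous answer select the same rows. Below is a small-scale check, with the sizes shrunk (4 rows per group, 2 groups, keep 3) so the output stays readable. The variable names are illustrative.

```r
r <- 4; g <- 2; n <- 3            # small stand-ins for 500, 16, 300

# Logical mask: n TRUEs then r - n FALSEs, repeated for every group
mask <- rep(rep(c(TRUE, FALSE), c(n, r - n)), g)

# Integer indices as in the sapply() answer: 1:n, r + 1:n, 2r + 1:n, ...
idx <- as.numeric(sapply((0:(g - 1)) * r, "+", seq_len(n)))

which(mask)
idx

m <- matrix(1:24, nrow = r * g)
m[mask, ]
```

Because R recycles logical indices, m[rep(c(TRUE, FALSE), c(n, r - n)), ] would select the same rows whenever nrow(m) is a multiple of r. The explicit rep(..., g) just makes the intent clearer.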
In case you are working with keras or reticulate, you could use array_reshape.
#### 0. parameters
nrows <- 4 # rows per group (500 in your example)
ncols <- 3 # number of columns (3 in your example)
ngrps <- 2 # number of groups (16 in your example)
nslct <- 3 # first n rows to select per group (300 in your example)
#### 1. create an example matrix
m <- matrix(1:24, nrows*ngrps, ncols)
m
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
#> [4,] 4 12 20
#> [5,] 5 13 21
#> [6,] 6 14 22
#> [7,] 7 15 23
#> [8,] 8 16 24
dim(m)
#> [1] 8 3
#--> c(ngrps * nrows, ncols)
#### 2. reshape in groups
m <- reticulate::array_reshape(m, c(ngrps,nrows,ncols))
dim(m)
#> [1] 2 4 3
# --> c(n_groups, n_rows, n_cols)
m[1,,]
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
#> [4,] 4 12 20
m[2,,]
#> [,1] [,2] [,3]
#> [1,] 5 13 21
#> [2,] 6 14 22
#> [3,] 7 15 23
#> [4,] 8 16 24
#### 3. select first nslct rows for each group
new_m <- m[,seq_len(nslct),]
# that's the result for each group
new_m[1,,]
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
new_m[2,,]
#> [,1] [,2] [,3]
#> [1,] 5 13 21
#> [2,] 6 14 22
#> [3,] 7 15 23
#### 4. recreate one matrix
reticulate::array_reshape(new_m, c(nslct*ngrps,ncols))
#> [,1] [,2] [,3]
#> [1,] 1 9 17
#> [2,] 2 10 18
#> [3,] 3 11 19
#> [4,] 5 13 21
#> [5,] 6 14 22
#> [6,] 7 15 23
Created on 2020-11-23 by the reprex package (v0.3.0)
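If you would rather not depend on reticulate, you can emulate a row-major reshape with base R's aperm(), because reversing the dimension order turns R's column-major reading into a row-major one. This sketch assumes, consistent with the reprex output above, that array_reshape() fills in row-major order. It also assumes the input already has a dim attribute. The name row_major_reshape is hypothetical.

```r
# Hypothetical helper: row-major ("C order") reshape in base R, meant to mimic
# the reprex output of reticulate::array_reshape(). Assumes x has a dim attribute.
row_major_reshape <- function(x, new_dim) {
  # Reading aperm(x, reversed dims) column-major gives x in row-major order
  v <- aperm(x, rev(seq_along(dim(x))))
  # Fill the reversed target shape column-major, then reverse the dims back
  aperm(array(v, rev(new_dim)), rev(seq_along(new_dim)))
}

nrows <- 4; ncols <- 3; ngrps <- 2; nslct <- 3
m <- matrix(1:24, nrows * ngrps, ncols)

a3 <- row_major_reshape(m, c(ngrps, nrows, ncols))
a3[1, , ]                                          # rows 1-4 of m
new_m <- a3[, seq_len(nslct), , drop = FALSE]
row_major_reshape(new_m, c(nslct * ngrps, ncols))  # rows 1-3 and 5-7 of m
```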
A simple way to generate an array that mimics the fill of your matrix is to use the transpose of the matrix as the input for the array function. Here is a simple example:
n <- rep(1:3, each = 4)
m1 <- matrix(n, ncol = 2, byrow = TRUE)
> m1
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 2 2
[4,] 2 2
[5,] 3 3
[6,] 3 3
m2 <- t(m1)
a1 <- array(m2, c(2, 2, 3))
> a1
, , 1
[,1] [,2]
[1,] 1 1
[2,] 1 1
, , 2
[,1] [,2]
[1,] 2 2
[2,] 2 2
, , 3
[,1] [,2]
[1,] 3 3
[2,] 3 3
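Every row of m1 repeats a single value, so the example cannot show where each element lands. A quick variation with distinct values (made-up data for illustration) shows that consecutive rows of m1 become the columns of each slice. Each slice is therefore the transpose of a block of rows, and an extra aperm() swaps them back if you need each slice to equal the block itself.

```r
# Same construction as above, but with distinct values so the placement is visible
m1 <- matrix(1:12, ncol = 2, byrow = TRUE)   # rows: (1, 2), (3, 4), ..., (11, 12)
a1 <- array(t(m1), c(2, 2, 3))
a1[, , 1]        # columns are m1[1, ] and m1[2, ]
t(m1[1:2, ])     # the same 2 x 2 matrix

# Swapping the first two dimensions makes each slice equal its block of rows
a1b <- aperm(a1, c(2, 1, 3))
a1b[, , 1]       # equals m1[1:2, ]
```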
I have a number of subarrays, say 2 (for simplicity), each with the same number of rows and columns. Each spot in the subarrays is occupied by a number in [1, 10].
What I would like to do is move rows randomly between subarrays according to some rate of movement m in [0, 1]. m = 0 corresponds to no movement, while m = 1 means that any row across all subarrays can be moved.
I take inspiration from:
How to swap a number of the values between 2 rows in R
but my problem is a bit different from that one. I do know that sample() will be needed here.
Is there an easy way to go about accomplishing this?
This doesn't do it, but I believe I'm on the right track anyway.
m <- 0.2
a <- array(dim = c(5, 5, 2)) # 5 rows, 5 columns, 2 subarrays
res <- sample(nrow(a), size = ceiling(nrow(a) * m), replace = FALSE) # sample 20% of the row indices of a
Any assistance is appreciated.
It is significantly easier if you can use a matrix (2-dim array).
set.seed(2)
m <- 0.2
d <- c(10, 4)
a <- array(sample(prod(d)), dim = d)
a
# [,1] [,2] [,3] [,4]
# [1,] 8 17 14 1
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 7 35 3 32
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
(I'm going to set the seed again to something that conveniently gives me something "interesting" to show.)
set.seed(2)
ind <- which(runif(d[1]) < m)
ind
# [1] 1 4 7
The first randomness, runif, is compared against m and generates the indices that may change. The second randomness, sample below, takes those indices and possibly reorders them. (In this case, it reorders "1,4,7" to "4,1,7", meaning the third of the rows-that-may-change will be left unchanged.)
a[ind,] <- a[sample(ind),]
a
# [,1] [,2] [,3] [,4]
# [1,] 7 35 3 32 # <-- row 4
# [2,] 28 37 40 26
# [3,] 22 38 16 29
# [4,] 8 17 14 1 # <-- row 1
# [5,] 34 11 23 4
# [6,] 36 33 19 31
# [7,] 5 24 30 13 # <-- row 7, unchanged
# [8,] 39 6 27 25
# [9,] 15 10 12 9
# [10,] 18 2 21 20
Note that this is probabilistic, which means a probability of 0.2 does not guarantee you 20% (or even any) of the rows will be swapped.
(Since I'm guessing you'd really like to preserve your 3-dim (or even n-dim) array, you might be able to use aperm to transfer between array <--> matrix.)
EDIT 1
As an alternative to the probabilistic use of runif, you can use:
ind <- head(sample(d[1]), n = d[1] * m)
to get closer to your goal of "20%". Since d[1]*m will often not be an integer, head silently truncates/floors the number, so you'll get the price-is-right winner: closest to but not over your desired percentage.
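Below is a sketch of the EDIT 1 variant end to end, reusing the setup from above. The number of candidate rows is now fixed, but sample(ind) can still leave some of them (or, with only two rows, possibly all of them) in place. The before and changed names are illustrative.

```r
set.seed(2)
m <- 0.2
d <- c(10, 4)
a <- array(sample(prod(d)), dim = d)
before <- a

# Pick exactly floor(d[1] * m) row indices (here 2), then shuffle those rows
ind <- head(sample(d[1]), n = d[1] * m)
length(ind)
a[ind, ] <- a[sample(ind), ]

# Only the selected rows can differ from the original
changed <- which(rowSums(a != before) > 0)
all(changed %in% ind)
```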
EDIT 2
A reversible method for transforming an n-dimensional array into a matrix and back again. Caveat: though the logic appears solid, my testing has only included a couple arrays.
array2matrix <- function(a) {
  d <- dim(a)
  ind <- seq_along(d)
  a2 <- aperm(a, c(ind[2], ind[-2]))
  dim(a2) <- c(d[2], prod(d[-2]))
  a2 <- t(a2)
  attr(a2, "origdim") <- d
  a2
}
The reversal uses the "origdim" attribute if still present; this will work as long as your modifications to the matrix do not clear its attributes. (Simple row-swapping does not.)
matrix2array <- function(m, d = attr(m, "origdim")) {
  ind <- seq_along(d)
  m2 <- t(m)
  dim(m2) <- c(d[2], d[-2])
  aperm(m2, c(ind[2], ind[-2]))
}
(These two functions should probably do some more error-checks, such as is.null(d).)
A sample run:
set.seed(2)
dims <- 5:2
a <- array(sample(prod(dims)), dim=dims)
Quick show:
a[,,1,1:2,drop=FALSE]
# , , 1, 1
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# , , 1, 2
# [,1] [,2] [,3] [,4]
# [1,] 118 25 14 93
# [2,] 65 21 16 77
# [3,] 87 82 3 38
# [4,] 92 12 78 17
# [5,] 49 4 75 80
The transformation:
m <- array2matrix(a)
dim(m)
# [1] 30 4
head(m)
# [,1] [,2] [,3] [,4]
# [1,] 23 109 61 90
# [2,] 84 15 27 102
# [3,] 68 95 83 24
# [4,] 20 53 117 46
# [5,] 110 62 43 8
# [6,] 67 47 1 54
Proof of reversibility:
identical(matrix2array(m), a)
# [1] TRUE
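Below is a sketch of the two functions with the is.null(d) check suggested above, followed by a round-trip test on 2-, 3- and 4-dimensional arrays. The error messages and the test harness are illustrative additions, not part of the original answer.

```r
array2matrix <- function(a) {
  d <- dim(a)
  if (is.null(d)) stop("'a' must have a dim attribute")
  ind <- seq_along(d)
  a2 <- aperm(a, c(ind[2], ind[-2]))  # move the column dimension to the front
  dim(a2) <- c(d[2], prod(d[-2]))     # collapse all other dimensions
  a2 <- t(a2)                         # one original row per matrix row
  attr(a2, "origdim") <- d
  a2
}

matrix2array <- function(m, d = attr(m, "origdim")) {
  if (is.null(d)) stop("original dimensions unknown: pass 'd' explicitly")
  ind <- seq_along(d)
  m2 <- t(m)
  dim(m2) <- c(d[2], d[-2])
  aperm(m2, c(ind[2], ind[-2]))       # swapping dims 1 and 2 undoes itself
}

# Round trip on a few shapes: array -> matrix -> array, and back again
set.seed(2)
shapes <- list(c(5, 4), c(5, 4, 2), 5:2)
round_trip_ok <- sapply(shapes, function(dims) {
  a <- array(sample(prod(dims)), dim = dims)
  m <- array2matrix(a)
  identical(matrix2array(m), a) && identical(array2matrix(matrix2array(m)), m)
})
round_trip_ok
```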
EDIT 3, "WRAP UP of all code"
Creating fake data:
dims <- c(5,4,2)
(a <- array(seq(prod(dims)), dim=dims))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 25 30 35 40
(m <- array2matrix(a))
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 23 28 33 38
# [9,] 24 29 34 39
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
The random-swapping of rows. I'm using 50% here.
pct <- 0.5
nr <- nrow(m)
set.seed(3)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 8 4 3 9
(ind2 <- sample(ind1))
# [1] 3 2 9 8 4
m[ind1,] <- m[ind2,]
m
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# [6,] 21 26 31 36
# [7,] 22 27 32 37
# [8,] 2 7 12 17
# [9,] 4 9 14 19
# [10,] 25 30 35 40
# attr(,"origdim")
# [1] 5 4 2
(Note that I pre-made ind1 and ind2 here, mostly to see what was going on internally. You can replace m[ind2,] with m[sample(ind1),] for the same effect.)
BTW: if we had instead used a seed of 2, we would notice that 2 rows are not swapped:
set.seed(2)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 2 7 5 10 6
(ind2 <- sample(ind1))
# [1] 6 2 5 10 7
Because of this, I chose a seed of 3 for the demonstration, since unchanged rows can give the impression that the code is not working. Lacking more controlling code, sample does not ensure that positions change, and it is reasonable to expect that "randomly swap rows" could randomly choose to move row 2 to row 2. Take for example:
set.seed(267)
(ind1 <- sample(nr, size = ceiling(nr * pct)))
# [1] 3 6 5 7 2
(ind2 <- sample(ind1))
# [1] 3 6 5 7 2
The first call randomly chooses five rows, and the second happens to put them back in their original order. (If you want to force every selected row to move, I suggest asking a new question about just forcing a sample vector to change.)
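One way to force movement is offered here only as a sketch. sample() already returns the selected indices in random order, so rotating that vector by one position (the same c(x[-1], x[1]) idiom used in the first answer) sends every selected row somewhere else whenever at least two rows are chosen. This only produces cyclic rearrangements, so it is not a uniformly random choice among all rearrangements in which every row moves. The matrix below is made-up data.

```r
set.seed(267)
nr <- 10
pct <- 0.5
m <- matrix(1:40, nrow = nr)      # made-up data; column 1 identifies each row

ind1 <- sample(nr, size = ceiling(nr * pct))
# ind1 is already in random order, so rotating it by one position sends
# every selected row to a different selected position.
ind2 <- c(ind1[-1], ind1[1])
all(ind1 != ind2)

m_new <- m
m_new[ind1, ] <- m[ind2, ]
all(m_new[ind1, 1] != m[ind1, 1])  # every selected row moved
```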
Anyway, we can regain the original dimensionality with the second function:
(a2 <- matrix2array(m))
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 1 6 11 16
# [2,] 3 8 13 18
# [3,] 23 28 33 38
# [4,] 24 29 34 39
# [5,] 5 10 15 20
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 21 26 31 36
# [2,] 22 27 32 37
# [3,] 2 7 12 17
# [4,] 4 9 14 19
# [5,] 25 30 35 40
In the first plane of the array, rows 1 and 5 are unchanged; in the second plane, rows 1, 2, and 5 are unchanged. Five rows the same, five rows moved around (but otherwise unchanged within each row).
Suppose I have a matrix m and a positive integer vector v. I want a new matrix m_new in which each row of m (say m[i, ]) is replicated v[i] times. For example:
m = matrix(1:6, nrow = 3)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
v = c(3, 1, 2)
And m_new should be:
[,1] [,2]
[1,] 1 4 # m[1, ] is replicated by
[2,] 1 4 # v[1] = 3
[3,] 1 4 # times
[4,] 2 5
[5,] 3 6
[6,] 3 6
A for loop works for small cases:
m_new = matrix(0, sum(v), ncol(m))
k = 1
for(i in 1:nrow(m)){
  for(j in k:(k+v[i]-1)){
    m_new[j, ] = m[i, ]
  }
  k = k + v[i]
}
but in practice m usually has a large number of rows. Is there a more efficient way to do this?
m[rep(1:nrow(m), times = v), ]
# [,1] [,2]
# [1,] 1 4
# [2,] 1 4
# [3,] 1 4
# [4,] 2 5
# [5,] 3 6
# [6,] 3 6
> v <- 1:5
> m <- matrix(1:25, ncol=5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
> apply(m, 2, function(c) rep(c,v))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 2 7 12 17 22
[4,] 3 8 13 18 23
[5,] 3 8 13 18 23
[6,] 3 8 13 18 23
[7,] 4 9 14 19 24
[8,] 4 9 14 19 24
[9,] 4 9 14 19 24
[10,] 4 9 14 19 24
[11,] 5 10 15 20 25
[12,] 5 10 15 20 25
[13,] 5 10 15 20 25
[14,] 5 10 15 20 25
[15,] 5 10 15 20 25
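The two answers produce the same matrix. Here is a quick check using v <- 1:5, as implied by the second answer's output. The index-based version also copes with zeros in v, which simply drop the row, and drop = FALSE keeps a matrix even if only one row survives.

```r
m <- matrix(1:25, ncol = 5)
v <- 1:5

# Index-based: repeat each row index v[i] times
by_index <- m[rep(seq_len(nrow(m)), times = v), , drop = FALSE]

# Column-wise: repeat each column's entries
by_apply <- apply(m, 2, function(col) rep(col, v))

dim(by_index)
all(by_index == by_apply)

# A zero in v drops that row entirely: keeps row 2 once and row 5 twice
m[rep(seq_len(nrow(m)), times = c(0, 1, 0, 0, 2)), , drop = FALSE]
```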
I have four tables, each with 4 rows and 4 columns. Here they are.
For the 1st table,
t1 <- array(1:20, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
For the 2nd table,
t2 <- array(6:25, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 6 10 14 18
[2,] 7 11 15 19
[3,] 8 12 16 20
[4,] 9 13 17 21
For the 3rd table,
t3 <- array(11:30, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 11 15 19 23
[2,] 12 16 20 24
[3,] 13 17 21 25
[4,] 14 18 22 26
For the 4th table,
t4 <- array(21:30, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 21 25 29 23
[2,] 22 26 30 24
[3,] 23 27 21 25
[4,] 24 28 22 26
For each table, I have a fixed y-value:
t1 = 0.1
t2 = 3
t3 = 0.5
t4 = 7
In other words:
y <- c( 0.1, 3, 0.75, 7)
Then I want to extract the x-values for each cell across the four tables. That is, for the [1,1] cell the x-values extracted should be (1, 6, 11, 21), to be paired with the y-values (0.1, 3, 0.5, 7). I repeat this cell by cell until the end of the table, i.e. the [4,4] cell. That gives a total of 16 sets of x-values, as follows:
cell x-values
[1,1] (1,6,11,21)
[1,2] (5,10,15,25)
…..
[4,4] (16, 21,26,26)
Then I want to calculate the R2 of the linear regression for each y-x pair. In other words, I want a total of 16 R2 values, as follows:
For [1,1] cell, linear regression between (0.1, 3, 0.5, 7) and (1,6,11,21) = 0.6853
For [1,2] cell, linear regression between (0.1, 3, 0.5, 7) and (5,10,15,25) = 0.6853
…..
For [4,4] cell, linear regression between (0.1, 3, 0.5, 7) and (16, 21,26,26) = 0.2719
Finally, I want to get a table with the following two columns
cell R2
[1,1] 0.6853
[1,2] 0.6853
….
[4,4] 0.2719
I learned that for a linear regression of y on x, I can use the following command:
Rcoefficient <- summary(lm(y ~ x))$r.squared
However, I have trouble reading each set of x-values from the four tables. I tried reshape, but I still cannot get it right. Could anyone suggest an efficient way to do this in R? My real tables are very large, with over 1000 rows and columns.
Thanks a lot.
I would[*] manipulate the arrays in place, by concatenating them into a 4 x 4 x 4 array:
t1 <- array(1:20, dim=c(4,4))
t2 <- array(6:25, dim=c(4,4))
t3 <- array(11:30, dim=c(4,4))
t4 <- array(21:30, dim=c(4,4))
tt <- array(c(t1,t2,t3,t4), dim = c(4,4,4))
## now you can remove the original arrays
which gives:
> tt
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
, , 2
[,1] [,2] [,3] [,4]
[1,] 6 10 14 18
[2,] 7 11 15 19
[3,] 8 12 16 20
[4,] 9 13 17 21
, , 3
[,1] [,2] [,3] [,4]
[1,] 11 15 19 23
[2,] 12 16 20 24
[3,] 13 17 21 25
[4,] 14 18 22 26
, , 4
[,1] [,2] [,3] [,4]
[1,] 21 25 29 23
[2,] 22 26 30 24
[3,] 23 27 21 25
[4,] 24 28 22 26
Then we use aperm() to rearrange the dimensions of the array so that the indices you requested are in the right order. We create a matrix from this array as a final step.
X <- matrix(aperm(tt, c(3,1,2)), ncol = 4, byrow = TRUE)
The aperm(tt, c(3,1,2)) step produces
> aperm(tt, c(3,1,2))
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 6 7 8 9
[3,] 11 12 13 14
[4,] 21 22 23 24
, , 2
[,1] [,2] [,3] [,4]
[1,] 5 6 7 8
[2,] 10 11 12 13
[3,] 15 16 17 18
[4,] 25 26 27 28
, , 3
[,1] [,2] [,3] [,4]
[1,] 9 10 11 12
[2,] 14 15 16 17
[3,] 19 20 21 22
[4,] 29 30 21 22
, , 4
[,1] [,2] [,3] [,4]
[1,] 13 14 15 16
[2,] 18 19 20 21
[3,] 23 24 25 26
[4,] 23 24 25 26
where the values you want for each cell now sit in the columns of each slice. We exploit this when creating the matrix: R reads the permuted array as a vector, column by column, so with byrow = TRUE every consecutive run of four values becomes one row of X. X results in
> X
[,1] [,2] [,3] [,4]
[1,] 1 6 11 21
[2,] 2 7 12 22
[3,] 3 8 13 23
[4,] 4 9 14 24
[5,] 5 10 15 25
[6,] 6 11 16 26
[7,] 7 12 17 27
[8,] 8 13 18 28
[9,] 9 14 19 29
[10,] 10 15 20 30
[11,] 11 16 21 21
[12,] 12 17 22 22
[13,] 13 18 23 23
[14,] 14 19 24 24
[15,] 15 20 25 25
[16,] 16 21 26 26
Then we can proceed as per January's answer and fit the regression (though note I explicitly pass in y, as the scoping rules of lm() are non-standard and I'm being defensive):
y <- c( 0.1, 3, 0.75, 7)
r2 <- apply(X, 1, function(x, y) summary(lm(y ~ x))$r.squared, y = y)
This results in:
> head(r2)
[1] 0.7160542 0.7160542 0.7160542 0.7160542 0.7160542 0.7160542
Note that there is an inconsistency in your text and code. You state the response is (0.1, 3, 0.5, 7) but define y as c( 0.1, 3, 0.75, 7). The results I show use the latter but your results used the former, hence the difference.
[*] without knowing more about the context I'm not sure I'd be wanting to fit millions of linear models...
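The question also asks for a two-column table of cell and R2. The sketch below assumes the tables are combined as above. The rows of X follow the cells in column-major order ([1,1], [2,1], ..., [4,4]), and expand.grid() varies its first argument fastest, so it can produce matching labels. The rows can then be reordered to the row-by-row listing used in the question. The names cells and res are illustrative.

```r
t1 <- array(1:20, dim = c(4, 4))
t2 <- array(6:25, dim = c(4, 4))
t3 <- array(11:30, dim = c(4, 4))
t4 <- array(21:30, dim = c(4, 4))
tt <- array(c(t1, t2, t3, t4), dim = c(4, 4, 4))
X <- matrix(aperm(tt, c(3, 1, 2)), ncol = 4, byrow = TRUE)
y <- c(0.1, 3, 0.75, 7)
r2 <- apply(X, 1, function(x, y) summary(lm(y ~ x))$r.squared, y = y)

# Labels in the same column-major order as the rows of X
cells <- expand.grid(row = seq_len(dim(tt)[1]), col = seq_len(dim(tt)[2]))
res <- data.frame(cell = sprintf("[%d,%d]", cells$row, cells$col), R2 = r2)

# Reorder to the row-by-row listing in the question: [1,1], [1,2], ...
res <- res[order(cells$row, cells$col), ]
head(res)
```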
First, let's reformat the data.
EDIT: this code is less than optimal; see Gavin's solution in the other answer.
t <- NULL
for( row in 1:nrow( t1 ) ) {
for( col in 1:ncol( t1 ) ) {
t <- rbind( t, c( t1[ row, col ], t2[ row, col ], t3[ row, col ], t4[ row, col ] ) )
}
}
This will produce a matrix with four columns (one for each table), and nrow * ncol rows - as many rows as you have cells in one table. Check it with dim( t ). Running regression is now easy:
apply( t, 1, function( x ) { summary( lm( y ~ x ) )$r.squared } )
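Since the real tables have over 1000 rows and columns (on the order of a million regressions), this shortcut may help. For a simple regression with an intercept, R2 equals the squared Pearson correlation between x and y, so cor() can handle every cell in one call. The sketch below checks this against the lm() route. Note that cor() returns NA (with a warning) for a cell whose values are all identical, so such cells may need separate handling.

```r
t1 <- array(1:20, dim = c(4, 4))
t2 <- array(6:25, dim = c(4, 4))
t3 <- array(11:30, dim = c(4, 4))
t4 <- array(21:30, dim = c(4, 4))
tt <- array(c(t1, t2, t3, t4), dim = c(4, 4, 4))
X <- matrix(aperm(tt, c(3, 1, 2)), ncol = 4, byrow = TRUE)  # one row per cell
y <- c(0.1, 3, 0.75, 7)

# Correlate every column of t(X) (i.e. every cell) with y at once, then square
r2_fast <- drop(cor(t(X), y))^2

# Reference: one lm() fit per cell
r2_lm <- apply(X, 1, function(x) summary(lm(y ~ x))$r.squared)
all.equal(r2_fast, r2_lm)
```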