Initializing data frames from matrices - r

I am having some trouble understanding how to initialize data frames with matrices. Consider the following:
m1 = cbind(1:5,11:15)
m2 = cbind(21:25, 31:35)
d = data.frame(m1)
d$m2 = m2
How can I directly create a data frame from m1 for which d$m1 returns a matrix, as d$m2 does in my example?

Use I() to specify that the matrices should be treated "as is":
> d<-data.frame(m1=I(m1),m2=I(m2))
> d$m1
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
> d$m2
[,1] [,2]
[1,] 21 31
[2,] 22 32
[3,] 23 33
[4,] 24 34
[5,] 25 35
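To see why I() matters, compare what data.frame() does with and without it. A minimal check, reusing m1 and m2 from the question: without I(), a named matrix argument is split into one column per matrix column (named m1.1, m1.2), whereas with I() each matrix is kept as a single matrix column.

```r
m1 <- cbind(1:5, 11:15)
m2 <- cbind(21:25, 31:35)

# Without I(): the matrix is split into separate columns
plain <- data.frame(m1 = m1)
names(plain)        # "m1.1" "m1.2"

# With I(): each matrix is stored "as is" as one matrix column
d <- data.frame(m1 = I(m1), m2 = I(m2))
ncol(d)             # 2
is.matrix(d$m1)     # TRUE
dim(d$m1)           # 5 2
```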

Related

Sorting specified rows in a matrix by the first column in R

I have a character matrix mtr of n rows and 3 columns.
I have a numeric vector nmb with some numbers, for example 4, 5, 6.
I want to sort only the rows of mtr whose row numbers are contained in nmb, by the first column of the matrix.
So in my case I want to leave the matrix untouched except for rows 4, 5, 6, which I would like to be sorted by the first column and, of course, written back into mtr.
How could I do that? Thanks.
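You can do it this way (order() returns positions within the nmb subset, so they are mapped back to row numbers of mtr through nmb):
mtr[nmb,] <- mtr[nmb[order(mtr[nmb,1])],]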
I think this will do it
mtr[nmb,] <- mtr[nmb,][order(mtr[nmb,1]),]
An example:
nmb <- 4:6
mtr <- matrix(30:1, ncol=3)
> mtr
[,1] [,2] [,3]
[1,] 30 20 10
[2,] 29 19 9
[3,] 28 18 8
[4,] 27 17 7
[5,] 26 16 6
[6,] 25 15 5
[7,] 24 14 4
[8,] 23 13 3
[9,] 22 12 2
[10,] 21 11 1
> mtr[nmb,] <- mtr[nmb,][order(mtr[nmb,1]),]
> mtr
[,1] [,2] [,3]
[1,] 30 20 10
[2,] 29 19 9
[3,] 28 18 8
[4,] 25 15 5 <-
[5,] 26 16 6 <- sorted
[6,] 27 17 7 <-
[7,] 24 14 4
[8,] 23 13 3
[9,] 22 12 2
[10,] 21 11 1
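The same idea can be wrapped in a small function so it can be reused for any set of rows and any sort column. This is a sketch; the name sort_rows_by_col is hypothetical. Using drop = FALSE keeps the subset a matrix even when only one row is selected.

```r
# Hypothetical helper: sort the rows listed in `rows` by column `col`,
# leaving every other row of `m` untouched.
sort_rows_by_col <- function(m, rows, col = 1) {
  sub <- m[rows, , drop = FALSE]
  m[rows, ] <- sub[order(sub[, col]), , drop = FALSE]
  m
}

mtr <- matrix(30:1, ncol = 3)
res <- sort_rows_by_col(mtr, 4:6)
res[4:6, 1]   # 25 26 27
```

If mtr is a character matrix, as in the question, order() compares the strings alphabetically, so "10" sorts before "9"; wrap the key in as.numeric() if the column holds numbers stored as text.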

Delete rows based on the values of the rows above

I have the following data set:
data <- cbind(c(1,2,3,4,5,6,7,8,9,10,11),c(1,11,21,60,30,2,61,12,3,35,63))
I would like to select the rows for which the number in the second column is greater than the highest number reached up to that point. The result should look like this:
[,1] [,2]
[1,] 1 1
[2,] 2 11
[3,] 3 21
[4,] 4 60
[5,] 7 61
[6,] 11 63
You want to try cummax:
> d[ d[,2] == cummax(d[,2]) ,]
[,1] [,2]
[1,] 1 1
[2,] 2 11
[3,] 3 21
[4,] 4 60
[5,] 7 61
[6,] 11 63
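PS. data is also the name of a standard R function (utils::data). Since R variables and functions share one namespace (R's design was influenced by Scheme, which is a "Lisp-1"), your variable masks that function. R skips non-function objects when it looks up a name in a call, so data() calls still work, but the clash is confusing.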
The cummax function should work well
data[ data[,2]==cummax(data[,2]),]
returns
[,1] [,2]
[1,] 1 1
[2,] 2 11
[3,] 3 21
[4,] 4 60
[5,] 7 61
[6,] 11 63
as desired.
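Note that == cummax() also keeps a row whose value merely ties the running maximum. If only strictly greater values should be kept, compare each value with the maximum of the rows before it. A minimal sketch (the variable is named m rather than data to avoid masking utils::data):

```r
m <- cbind(1:11, c(1, 11, 21, 60, 30, 2, 61, 12, 3, 35, 63))
v <- m[, 2]

# Running maximum of the *previous* rows; -Inf so the first row always qualifies
prev_max <- c(-Inf, head(cummax(v), -1))
m[v > prev_max, ]

# The two rules only differ when a value ties the running maximum
w <- c(1, 5, 5, 3, 6)
w == cummax(w)                        # TRUE TRUE TRUE FALSE TRUE
w > c(-Inf, head(cummax(w), -1))      # TRUE TRUE FALSE FALSE TRUE
```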

R: summing over an interval of rows

I'm a beginner R user and I need to write a function that sums the rows of a data frame over a fixed interval (every 4 rows).
I've tried the following code:
camp<-function(X){
i<-1
n<-nrow(X)
xc<-matrix(nrow=36,ncol=m)
for (i in 1:n){
xc<-apply(X[i:(i+4),],2,sum)
rownames(xc[i])<-rownames(X[i])
i<-i+5
}
return(xc)
}
The result is "Error in X[i:(i + 4), ] : index out of range".
How can I solve this? Any suggestions?
Thanks.
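For reference, the loop above fails because X[i:(i+4),] spans five rows rather than four, reassigning i inside a for loop does not skip ahead (the next iteration resets it from 1:n), and near the end i+4 exceeds nrow(X). A minimal corrected sketch of the same loop idea; the block-size argument k is an addition, not part of the original code:

```r
# Corrected version of the loop idea; k (block size) is a hypothetical argument.
camp <- function(X, k = 4) {
  X <- as.matrix(X)
  n <- nrow(X)
  starts <- seq(1, n, by = k)                   # first row of each block
  xc <- matrix(NA_real_, nrow = length(starts), ncol = ncol(X),
               dimnames = list(NULL, colnames(X)))
  for (b in seq_along(starts)) {
    rows <- starts[b]:min(starts[b] + k - 1, n) # last block may be shorter
    xc[b, ] <- colSums(X[rows, , drop = FALSE])
  }
  xc
}

camp(matrix(1:36, nrow = 12))
```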
The zoo package has rollapply(), which is pretty handy for this kind of thing:
# Make some data
set.seed(1)
m <- matrix( sample( 10 , 32 , repl = TRUE ) , 8 )
# [,1] [,2] [,3] [,4]
#[1,] 3 7 8 3
#[2,] 4 1 10 4
#[3,] 6 3 4 1
#[4,] 10 2 8 4
#[5,] 3 7 10 9
#[6,] 9 4 3 4
#[7,] 10 8 7 5
#[8,] 7 5 2 6
# Sum every 4 rows
require( zoo )
tmp <- rollapply( m , width = 4 , by = 4 , align = "left" , FUN = sum )
# [,1] [,2] [,3] [,4]
#[1,] 23 13 30 12
#[2,] 29 24 22 24
You can also use rowSums() on the result if you actually want to aggregate the columns into a single value for each block of 4 rows:
rowSums( tmp )
#[1] 78 99
Here is a way to do it:
## Sample data
m <- matrix(1:36, nrow=12)
## Create a "group" index
fac <- (seq_len(nrow(m))-1) %/% 4
## Apply sum
apply(m, 2, function(v) tapply(v, fac, sum))
Sample data:
[,1] [,2] [,3]
[1,] 1 13 25
[2,] 2 14 26
[3,] 3 15 27
[4,] 4 16 28
[5,] 5 17 29
[6,] 6 18 30
[7,] 7 19 31
[8,] 8 20 32
[9,] 9 21 33
[10,] 10 22 34
[11,] 11 23 35
[12,] 12 24 36
Result:
[,1] [,2] [,3]
0 10 58 106
1 26 74 122
2 42 90 138
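Base R also has rowsum(), which sums the rows of a matrix (or data frame) within groups, so the apply()/tapply() pair can be replaced by a single call with the same group index. A minimal sketch using the same sample data:

```r
m <- matrix(1:36, nrow = 12)
# Group index: rows 1-4 -> 0, rows 5-8 -> 1, rows 9-12 -> 2
fac <- (seq_len(nrow(m)) - 1) %/% 4
# rowsum() adds together all rows of m that share the same group value
rowsum(m, fac)
```

It should give the same 3 x 3 result, with the group values 0, 1, 2 as row names.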

Linear regression for each cell in a table

I have four tables, each with 4 rows and 4 columns. The four tables are as follows.
For the 1st table,
t1 <- array(1:20, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
For the 2nd table,
t2 <- array(6:25, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 6 10 14 18
[2,] 7 11 15 19
[3,] 8 12 16 20
[4,] 9 13 17 21
For the 3rd table,
t3 <- array(11:30, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 11 15 19 23
[2,] 12 16 20 24
[3,] 13 17 21 25
[4,] 14 18 22 26
For the 4th table,
t4 <- array(21:30, dim=c(4,4))
[,1] [,2] [,3] [,4]
[1,] 21 25 29 23
[2,] 22 26 30 24
[3,] 23 27 21 25
[4,] 24 28 22 26
For each table, I have a fixed y-value:
t1 = 0.1
t2 = 3
t3 = 0.5
t4 = 7
In other words:
y <- c( 0.1, 3, 0.75, 7)
Then, I want to extract the x-values from each cell of the four tables. That is, for the [1,1] cell, the x-values extracted should be (1, 6, 11, 21), to be paired with the y-values (0.1, 3, 0.5, 7). I repeat this step cell by cell until the end of the table, i.e. the [4,4] cell. Thus, I get a total of 16 sets of x-values as follows:
cell x-values
[1,1] (1,6,11,21)
[1,2] (5,10,15,25)
…..
[4,4] (16, 21,26,26)
Then I want to calculate the R2 of the linear regression for each y-x pair. In other words, I want to get a total of 16 R2 values as follows:
For [1,1] cell, linear regression between (0.1, 3, 0.5, 7) and (1,6,11,21) = 0.6853
For [1,2] cell, linear regression between (0.1, 3, 0.5, 7) and (5,10,15,25) = 0.6853
…..
For [4,4] cell, linear regression between (0.1, 3, 0.5, 7) and (16, 21,26,26) = 0.2719
Finally, I want to get a table with the following two columns
cell R2
[1,1] 0.6853
[1,2] 0.6853
….
[4,4] 0.2719
I learnt that to do linear regression for x and y series of data, I can use following command:
Rcoefficient <- summary(lm(y ~ x, data=faithful))$r.squared
However, I have trouble reading each set of x-values from the four tables. I tried to use reshape, but I still cannot get it right. Could the experts on Stack Overflow suggest an efficient way to do this in R? My real tables are very large, with over 1000 columns and rows.
Thanks a lot.
I would[*] manipulate the arrays in place, by concatenating them into a 4 x 4 x 4 array:
t1 <- array(1:20, dim=c(4,4))
t2 <- array(6:25, dim=c(4,4))
t3 <- array(11:30, dim=c(4,4))
t4 <- array(21:30, dim=c(4,4))
tt <- array(c(t1,t2,t3,t4), dim = c(4,4,4))
## now you can remove the original arrays
which gives:
> tt
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
, , 2
[,1] [,2] [,3] [,4]
[1,] 6 10 14 18
[2,] 7 11 15 19
[3,] 8 12 16 20
[4,] 9 13 17 21
, , 3
[,1] [,2] [,3] [,4]
[1,] 11 15 19 23
[2,] 12 16 20 24
[3,] 13 17 21 25
[4,] 14 18 22 26
, , 4
[,1] [,2] [,3] [,4]
[1,] 21 25 29 23
[2,] 22 26 30 24
[3,] 23 27 21 25
[4,] 24 28 22 26
Then we use aperm() to rearrange the dimensions of the array so that the indices you requested are in the right order. We create a matrix from this array as a final step.
X <- matrix(aperm(tt, c(3,1,2)), ncol = 4, byrow = TRUE)
The aperm(tt, c(3,1,2)) step produces
> aperm(tt, c(3,1,2))
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 6 7 8 9
[3,] 11 12 13 14
[4,] 21 22 23 24
, , 2
[,1] [,2] [,3] [,4]
[1,] 5 6 7 8
[2,] 10 11 12 13
[3,] 15 16 17 18
[4,] 25 26 27 28
, , 3
[,1] [,2] [,3] [,4]
[1,] 9 10 11 12
[2,] 14 15 16 17
[3,] 19 20 21 22
[4,] 29 30 21 22
, , 4
[,1] [,2] [,3] [,4]
[1,] 13 14 15 16
[2,] 18 19 20 21
[3,] 23 24 25 26
[4,] 23 24 25 26
where the four values for each cell now sit in a column. We exploit this when creating the matrix: R treats the permuted array as a vector filled column by column, so byrow = TRUE puts each cell's four values into one row of X. X results in:
> X
[,1] [,2] [,3] [,4]
[1,] 1 6 11 21
[2,] 2 7 12 22
[3,] 3 8 13 23
[4,] 4 9 14 24
[5,] 5 10 15 25
[6,] 6 11 16 26
[7,] 7 12 17 27
[8,] 8 13 18 28
[9,] 9 14 19 29
[10,] 10 15 20 30
[11,] 11 16 21 21
[12,] 12 17 22 22
[13,] 13 18 23 23
[14,] 14 19 24 24
[15,] 15 20 25 25
[16,] 16 21 26 26
Then we can proceed as per @January's answer and fit the regressions (note that I explicitly pass in y, because lm() uses non-standard scoping rules and I'm being defensive):
y <- c( 0.1, 3, 0.75, 7)
r2 <- apply(X, 1, function(x, y) summary(lm(y ~ x))$r.squared, y = y)
This results in:
> head(r2)
[1] 0.7160542 0.7160542 0.7160542 0.7160542 0.7160542 0.7160542
Note that there is an inconsistency in your text and code. You state the response is (0.1, 3, 0.5, 7) but define y as c( 0.1, 3, 0.75, 7). The results I show use the latter but your results used the former, hence the difference.
[*] Without knowing more about the context, I'm not sure I'd want to be fitting millions of linear models...
First, let's reformat the data.
EDIT: this code is less than optimal; see Gavin's solution in the other answer.
t <- NULL
for( row in 1:nrow( t1 ) ) {
for( col in 1:ncol( t1 ) ) {
t <- rbind( t, c( t1[ row, col ], t2[ row, col ], t3[ row, col ], t4[ row, col ] ) )
}
}
This will produce a matrix with four columns (one for each table) and nrow * ncol rows, i.e. as many rows as there are cells in one table. Check it with dim( t ). Running the regressions is now easy:
apply( t, 1, function( x ) { summary( lm( y ~ x ) )$r.squared } )
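Because the R2 of a simple linear regression with an intercept equals the squared Pearson correlation between x and y, the per-cell lm() fits can be replaced by one vectorized computation, which may matter for tables with over 1000 rows and columns (about a million fits). A minimal sketch, assuming the cell values are arranged one cell per row, as in the matrix X built with aperm() in the first answer; r2_rows is a hypothetical helper name, not a function from any package:

```r
# Rebuild the cell-by-table matrix X as in the aperm() answer
t1 <- array(1:20, dim = c(4, 4))
t2 <- array(6:25, dim = c(4, 4))
t3 <- array(11:30, dim = c(4, 4))
t4 <- array(21:30, dim = c(4, 4))
tt <- array(c(t1, t2, t3, t4), dim = c(4, 4, 4))
X <- matrix(aperm(tt, c(3, 1, 2)), ncol = 4, byrow = TRUE)
y <- c(0.1, 3, 0.75, 7)

# Hypothetical helper: R^2 of lm(y ~ x) for every row x of X, computed as the
# squared Pearson correlation (valid for simple regression with an intercept).
r2_rows <- function(X, y) {
  yc <- y - mean(y)                 # centred response
  Xc <- X - rowMeans(X)             # centre each row (recycled down the columns)
  num <- drop(Xc %*% yc)            # cross-products, one per row
  num^2 / (rowSums(Xc^2) * sum(yc^2))
}

r2_fast <- r2_rows(X, y)
# Cross-check against fitting one lm() per row
r2_lm <- apply(X, 1, function(x) summary(lm(y ~ x))$r.squared)
all.equal(r2_fast, r2_lm)           # TRUE
```

A row with constant values would produce NaN here (the denominator is zero), so such rows may need special handling.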

Sum a list of matrices [duplicate]

This question already has answers here:
How to sum a numeric list elements
I have a list where each element is a 5x5 matrix, e.g.:
[[1]]
V1 V2 V3 V4 V5
[1,] 0.000000 46.973700 21.453500 338.547000 10.401600
[2,] 43.020500 0.000000 130.652000 840.526000 56.363700
[3,] 12.605600 173.238000 0.000000 642.075000 19.628100
[4,] 217.946000 626.368000 481.329000 0.000000 642.341000
[5,] 217.946000 626.368000 481.329000 0.000000 642.341000
[[2]]
V1 V2 V3 V4 V5
[1,] 0.000000 47.973700 21.453500 338.547000 10.401600
[2,] 143.020500 0.000000 130.652000 840.526000 56.363700
[3,] 312.605600 17.238000 0.000000 642.075000 19.628100
[4,] 17.946000 126.368000 481.329000 0.000000 642.341000
[5,] 217.946000 626.368000 481.329000 0.000000 642.341000
...
How can I use an apply-like function to sum matrices [[1]] to [[n]] and return a 5x5 matrix as the result (each element being the sum of the corresponding elements of all the matrices in the list)?
Use Reduce.
## dummy data
.list <- list(matrix(1:25, ncol = 5), matrix(1:25, ncol = 5))
Reduce('+', .list)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 2 12 22 32 42
## [2,] 4 14 24 34 44
## [3,] 6 16 26 36 46
## [4,] 8 18 28 38 48
## [5,] 10 20 30 40 50
I think @mnel's answer is the more efficient one, but this is another approach:
apply(simplify2array(.list), c(1,2), sum)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 12 22 32 42
[2,] 4 14 24 34 44
[3,] 6 16 26 36 46
[4,] 8 18 28 38 48
[5,] 10 20 30 40 50
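The apply() call above sums over the third dimension of the stacked array; rowSums() with dims = 2 does the same in compiled code, and accepts na.rm = TRUE if some entries are missing. A minimal sketch on the same dummy data:

```r
.list <- list(matrix(1:25, ncol = 5), matrix(1:25, ncol = 5))

arr <- simplify2array(.list)      # 5 x 5 x 2 array, one slice per matrix
s <- rowSums(arr, dims = 2)       # sum over the third dimension
s[1, ]                            # 2 12 22 32 42
```

Unlike Reduce(), this builds the full 3-D array first, so it needs memory for all the matrices at once.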
You could use do.call with some monkeying around, but it loses its elegance:
.list <- list(matrix(1:25, ncol=5), matrix(1:25,ncol=5), matrix(1:25,ncol=5))
x <- .list[[1]]
lapply(seq_along(.list)[-1], function(i){
x <<- do.call("+", list(x, .list[[i]]))
})
x
