I have a matrix:
b<-matrix(NA,ncol=100,nrow=10)
and a list of vectors:
load(url("https://dl.dropboxusercontent.com/u/22681355/a.Rdata"))
This list contains 100 vectors. I would like to assign each vector in the list to one column of the matrix. Is this possible?
Have a look at ?do.call and ?cbind, e.g.:
## create an example list with 3 vectors
l <- list(c(1:3), c(4:6), c(7:9))
## columnwise binding of all vectors in the list `l`
do.call(cbind, l)
Or you could use a simple for loop:
for (i in seq_along(l)) {
  n <- length(l[[i]])
  b[seq_len(n), i] <- l[[i]]
}
Something like this (don't remember where I got this from):
cbind.fill <- function(nm) {
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  # pad each matrix with NA rows up to n rows, then bind column-wise
  do.call(cbind, lapply(nm, function(x) rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}
l <- list(c(1:3), c(4:8), c(7:9))
cbind.fill(l)
## > cbind.fill(l)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
## [4,] NA 7 NA
## [5,] NA 8 NA
One trick is to first "lengthen" the vectors so they are all the same length (in your case 10). Here I start by creating dummy data (only 10 columns so I can show results easily, but this extends to your 100 column case):
set.seed(1)
lst <- replicate(10, sample(1:100, sample(5:10, 1))) # 10 vectors, length 5-10
Now lengthen and cbind:
lst <- lapply(lst, function(x) { length(x) <- 10; x }) # make all length 10
do.call(cbind, lst)
That's it:
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 38 63 100 39 19 65 87 41 9 78
# [2,] 57 7 38 87 82 78 44 91 87 96
# [3,] 90 21 77 34 66 55 24 29 34 43
# [4,] 20 18 91 47 78 52 7 45 82 70
# [5,] 87 66 21 58 11 76 10 32 98 39
# [6,] 98 37 62 NA 69 3 31 62 32 31
# [7,] NA 73 12 NA 39 45 49 25 45 72
# [8,] NA 47 25 NA NA 69 NA 97 83 NA
# [9,] NA NA 36 NA NA 64 NA NA 80 NA
# [10,] NA NA NA NA NA NA NA NA NA NA
If you index a position beyond the vector's length, you get NA in the extra positions. So a simple mapply does the job:
set.seed(1)
lst <- replicate(10, sample(1:100, sample(5:10, 1))) # Simulating data (thanks @BrodieG!)
mapply(function(x) x[1:10], lst) # You just need to change the maximum length
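The same out-of-range indexing can be sketched without hard-coding the target length, assuming you want to pad every vector to the length of the longest one. The list `l` below is a made-up example, not the data from the question.

```r
# Hypothetical example list with vectors of unequal length
l <- list(1:3, 4:8, 7:9)

# Pad to the length of the longest vector
n <- max(lengths(l))

# Indexing past the end of a vector yields NA, so every column gets length n
b <- sapply(l, function(x) x[seq_len(n)])
b
```

Because all padded vectors have the same length, sapply simplifies the result to a matrix with one column per list element.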
matrix(unlist(list(c(1:3), c(4:6), c(7:9))), ncol = 3)
I have two large matrices, P and Q, each roughly 10k x 50k (to test this yourself, random 10x10 matrices for P and Q are sufficient). I have a list of index pairs, e.g.
i j
1 4
1 625
1 9207
2 827
... ...
etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?
edit: asked for a reproducible example
P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3)
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3)
i <- c(1,1,2)
j <- c(2,1,3)
gives output (if in dot product form)
1: 0
2: 1
3: 1
P <- matrix(1:50, nrow = 5,ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1,2,4,7)
j <- c(5,3,7,2)
P
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
Q
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
P[,i] * Q[, j]
# [,1] [,2] [,3] [,4]
# [1,] 21 66 496 186
# [2,] 44 84 544 224
# [3,] 69 104 594 264
# [4,] 96 126 646 306
# [5,] 125 150 700 350
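The elementwise product above has one column per (i, j) pair, so summing each column should give the dot products. A minimal sketch on the same 5 x 10 example; the names `dots` and `check` are only for illustration.

```r
P <- matrix(1:50, nrow = 5, ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1, 2, 4, 7)
j <- c(5, 3, 7, 2)

# Column k of P[, i] * Q[, j] holds the elementwise product of
# P[, i[k]] and Q[, j[k]]; its sum is the dot product of that pair
dots <- colSums(P[, i] * Q[, j])
dots
# [1]  355  530 2980 1330

# Cross-check against an explicit loop over the index pairs
check <- mapply(function(a, b) sum(P[, a] * Q[, b]), i, j)
stopifnot(all.equal(dots, check))
```

This only ever computes the products that are needed, whereas an approach based on diag(t(P[, i]) %*% Q[, j]) builds the full length(i) x length(j) matrix and then discards everything off the diagonal.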
Using matrix multiplication, you can do
diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1
Here is a second solution with apply.
apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1
To verify these agree in a second example:
set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
inds <- matrix(sample(10, 10, replace=TRUE), 5)
matrix multiplication
diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317
and apply
apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317
I have a large matrix that contains various features extracted from microscopic cell images. The different features are distributed across the columns, the individual cells across the rows of that matrix. However, the measurements come from time lapse microscopy, such that each individual cell has 90 rows (time points) in that matrix. So this matrix has the dimension [cell_amount*90; feature_amount].
My goal is to:
1. calculate the difference of subsequent time points for each cell (the "derivative" of the time series), and then
2. create a new matrix that contains an aggregation of those differences for each cell (so that new matrix has the dimension [cell_amount; feature_amount]).
I set up some code in R to test my problem, where I have 4 cells, 4 features (columns) and each cell has 3 time point values. So the first cell is on rows 1-3, the second on rows 4-6, and so on. From this I calculate the difference of the values:
A <- matrix(sample(1:100, 4*12), ncol = 4)
B <- abs( A - dplyr::lag(A) )
B[seq(1,nrow(B), 3),] <- NA
This results in a matrix where the first row of each cell contains NA values:
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] 82 29 54 22
[3,] 32 44 18 31
[4,] NA NA NA NA
[5,] 22 61 10 33
[6,] 19 64 54 35
[7,] NA NA NA NA
[8,] 59 18 6 10
[9,] 34 47 70 6
[10,] NA NA NA NA
[11,] 60 23 68 22
[12,] 17 13 12 9
The resulting matrix containing an aggregation for those values for each cell, in this case the variance, should then look like:
[,1] [,2] [,3] [,4]
[1,] 1250 112.5 648 40.5
[2,] 4.5 4.5 968 2
[3,] 312.5 420.5 2048 8
[4,] 924.5 50 1568 84.5
How can I calculate this new matrix in R? Any help is appreciated.
Because you used a random sample without a seed, I can't re-create your A matrix. However, here's a recreation of your B matrix.
B <- matrix(scan(text="
NA NA NA NA
82 29 54 22
32 44 18 31
NA NA NA NA
22 61 10 33
19 64 54 35
NA NA NA NA
59 18 6 10
34 47 70 6
NA NA NA NA
60 23 68 22
17 13 12 9"), ncol=4, byrow=TRUE)
If you really want to keep this a matrix, you can reshape it into a multi-dimensional array (time point x cell x feature) and then use apply over the margins to get the value of interest, for example
apply(array(B, dim=c(3,4,4)), 2:3, var, na.rm=TRUE)
# [,1] [,2] [,3] [,4]
# [1,] 1250.0 112.5 648 40.5
# [2,] 4.5 4.5 968 2.0
# [3,] 312.5 420.5 2048 8.0
# [4,] 924.5 50.0 1568 84.5
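The array reshaping above can also cover the differencing step, so that no NA rows are needed. This is a sketch under assumptions: the seed and the names `n_time`, `n_cell`, `n_feat`, `arr`, `d` and `res` are mine, and it assumes every cell has the same number of time points (at least 3, so that each cell yields two or more differences).

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
n_time <- 3; n_cell <- 4; n_feat <- 4
A <- matrix(sample(1:100, n_time * n_cell * n_feat), ncol = n_feat)

# Reshape so that dimension 1 is time, 2 is cell, 3 is feature
arr <- array(A, dim = c(n_time, n_cell, n_feat))

# Absolute differences between subsequent time points within each cell;
# the result has dim c(n_time - 1, n_cell, n_feat)
d <- abs(apply(arr, 2:3, diff))

# Aggregate the differences per cell and feature
res <- apply(d, 2:3, var)
dim(res)
# [1] 4 4
```

Since diff is applied within each time series separately, differences never cross the boundary between two cells.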
You could also create a proper grouping variable (one group of 3 consecutive rows per cell) and use aggregate()
row_sample <- rep(seq_len(nrow(B)/3), each=3)
aggregate(B, list(row_sample), var, na.rm=TRUE)
#   Group.1     V1    V2   V3   V4
# 1       1 1250.0 112.5  648 40.5
# 2       2    4.5   4.5  968  2.0
# 3       3  312.5 420.5 2048  8.0
# 4       4  924.5  50.0 1568 84.5
I use a for loop (which works well) to replace two randomly chosen values in each row of a dataset with NA (the indices of these values change randomly from row to row).
Now I would like to use apply() to do exactly the same thing.
I tried this code (along with many other things, which return NA everywhere):
my_fun<-function(x){if (j %in% sample(1:ncol(y),2)) {x[j]<-NA}}
apply(y,1,my_fun)
But it doesn't work (it does not change the initial dataset).
The problem is that the object j is not found. j should be the column number.
Does anyone have an idea?
From your description I gather that you want:
my_fun <- function(x) { x[sample(seq_along(x), 2)] <- NA; x }
apply(y, 1, my_fun) # returns the modified rows as columns, or
t(apply(y, 1, my_fun)) # transpose back to the orientation of y
Testing the function:
set.seed(42)
y <- matrix(1:60, 10)
y
t(apply(y, 1, my_fun))
# > t(apply(y, 1, my_fun))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 11 21 31 NA NA
# [2,] 2 NA 22 32 NA 52
# [3,] 3 13 NA NA 43 53
# [4,] NA 14 24 34 NA 54
# [5,] 5 15 25 NA 45 NA
# [6,] 6 16 NA NA 46 56
# [7,] 7 NA 27 37 47 NA
# [8,] 8 18 NA 38 NA 58
# [9,] NA 19 29 39 49 NA
# [10,] 10 20 NA 40 50 NA
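A quick way to check this kind of function is to count the NAs per row. The sketch below uses a hypothetical variant `na_k` with a parameter `k` for the number of values to blank out; the name, the parameter and the seed are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility
y <- matrix(1:60, 10)

# k is a hypothetical parameter: how many values to blank out per row
na_k <- function(x, k = 2) {
  x[sample(length(x), k)] <- NA
  x
}

# apply() returns each row's result as a column, hence the t()
out <- t(apply(y, 1, na_k))

stopifnot(identical(dim(out), dim(y)))
rowSums(is.na(out))
# [1] 2 2 2 2 2 2 2 2 2 2
```

The function has to return the modified vector x; the original attempt returned nothing useful because the assignment happened inside an if and x was never returned.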
I have a matrix (data frame) with 6940 rows and 100 columns. I need to compute 5-day cumulative sums over the data set. So far I have built a for loop for this, as follows:
cum<- matrix(data=q1,nrow=6940,ncol=100)
for (j in 1:100){
  for (i in 1:6940){
    cum[i,j]<-sum(q1[i,j],q1[i+1,j],q1[i+2,j],q1[i+3,j],q1[i+4,j],na.rm=T)
  }
}
I wanted to know whether there is a function in the apply family that does the same, as this code is very time-consuming.
For example, if I generate a matrix using the command
ens <- matrix(rnorm(200),20)
I want the cumulative sum of 5 rows at a time, i.e. the sum of row1:row5, row2:row6, row3:row7 and so on, in the form of a data frame.
I tried using apply in this form:
apply(apply(apply(apply( apply(ens, 2, cumsum),2, cumsum), 2, cumsum),2,cumsum),2,cumsum)
but the problem is that I don't get the cumulative sum in blocks of 5, only an overall cumulative sum.
Here is one approach using the stats::filter function to calculate the rolling sums and apply to loop over the columns:
m <- matrix(1:48, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 13 25 37
# [2,] 2 14 26 38
# [3,] 3 15 27 39
# [4,] 4 16 28 40
# [5,] 5 17 29 41
# [6,] 6 18 30 42
# [7,] 7 19 31 43
# [8,] 8 20 32 44
# [9,] 9 21 33 45
#[10,] 10 22 34 46
#[11,] 11 23 35 47
#[12,] 12 24 36 48
apply(m, 2, stats::filter, filter = rep(1, 5), sides = 1)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
This might have to be adjusted depending on how you want to handle windows with fewer than 5 values (e.g., here at the beginning).
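For comparison, the same trailing window sums can be sketched in base R with cumulative sums. The helper `roll_sum_cs` is a hypothetical name, and the sketch assumes the data contain no NAs (cumsum would propagate them to every later window).

```r
# Trailing rolling sum of width k via cumulative sums (assumes no NAs in x)
roll_sum_cs <- function(x, k) {
  stopifnot(length(x) >= k)
  cs <- cumsum(c(0, x))
  n <- length(x)
  # the sum of x[(t - k + 1):t] equals cs[t + 1] - cs[t - k + 1]
  c(rep(NA, k - 1), cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)])
}

m <- matrix(1:48, ncol = 4)
res <- apply(m, 2, roll_sum_cs, k = 5)
res[5:6, ]
#      [,1] [,2] [,3] [,4]
# [1,]   15   75  135  195
# [2,]   20   80  140  200
```

Dropping the leading NA rows with res[-(1:4), ] leaves one row per complete window, i.e. row1:row5, row2:row6 and so on, which is the layout the question asks for.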
Another option is roll_sumr from RcppRoll (data from @Roland's post):
library(RcppRoll)
apply(m, 2, roll_sumr, 5)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] NA NA NA NA
# [3,] NA NA NA NA
# [4,] NA NA NA NA
# [5,] 15 75 135 195
# [6,] 20 80 140 200
# [7,] 25 85 145 205
# [8,] 30 90 150 210
# [9,] 35 95 155 215
#[10,] 40 100 160 220
#[11,] 45 105 165 225
#[12,] 50 110 170 230
As @alexis_laz mentioned in the comments, roll_sumr can take a matrix as well, which is more efficient.
roll_sumr(m, 5, by = 1)
Benchmarks
set.seed(24)
m1 <- matrix(sample(1:50, 5000*5000, replace=TRUE), ncol=5000)
system.time(apply(m1, 2, roll_sumr, 5))
# user system elapsed
# 1.84 0.16 1.99
system.time(roll_sumr(m1, 5, by = 1))
# user system elapsed
# 0.59 0.15 0.74
system.time(apply(m1, 2, stats::filter, filter = rep(1, 5), sides = 1))
# user system elapsed
# 4.46 0.20 4.68
Another, less sophisticated approach: create 5 grouping variables, one per window offset, and sum by each of them in turn (shown here for column V1 only).
Here:
library(data.table)
m <- data.table(matrix(1:48, ncol = 4))
m[, index := .I]
# i1..i5 hold the start row of the 5-row block each row falls into, for offsets 1 to 5
m[, i1 := floor((index - 1) / 5) * 5 + 1]
m[, i2 := floor((index - 2) / 5) * 5 + 2]
m[, i3 := floor((index - 3) / 5) * 5 + 3]
m[, i4 := floor((index - 4) / 5) * 5 + 4]
m[, i5 := floor((index - 5) / 5) * 5 + 5]
# stack the block sums for all offsets and drop blocks that start before row 1
cumsumm <- rbindlist(list(m[, list(value = sum(V1)), by = "i1"]
                        , m[, list(value = sum(V1)), by = "i2"]
                        , m[, list(value = sum(V1)), by = "i3"]
                        , m[, list(value = sum(V1)), by = "i4"]
                        , m[, list(value = sum(V1)), by = "i5"]), use.names=F)[i1 > 0, ]
Here is a 10 x 12 matrix:
mat <- matrix(runif(120, 0, 1), 10)
I am trying to find column sums for subsets of a matrix (specifically, column sums for columns 1 through 4, 5 through 8, and 9 through 12) by row. The desired output would be a 10 x 3 matrix.
I tried the approaches from this answer using tapply and by (with detours to rowsum and aggregate), but encountered errors with all of them.
What the OP is describing is called a row sum in R:
# using Matthew Lundberg's example data
x <- matrix(1:36, 3,12)
g = split(seq(ncol(x)), (seq(ncol(x)) - 1) %/% 4 )
sapply(g, function(cols) rowSums( x[, cols] ))
# 0 1 2
# [1,] 22 70 118
# [2,] 26 74 122
# [3,] 30 78 126
It's typical to have grouping variables over rows/observations not columns/variables. To reach this case, the OP could transpose:
rowsum( t(x), (seq(ncol(x))-1) %/% 4 )
# [,1] [,2] [,3]
# 0 22 26 30
# 1 70 74 78
# 2 118 122 126
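A different technique for the same grouped sums, sketched here as an alternative rather than as what the answer proposes, is to multiply by a 0/1 indicator matrix. The names `grp`, `ind` and `res` are only for illustration.

```r
x <- matrix(1:36, 3, 12)

# Group label for every column: 0 0 0 0 1 1 1 1 2 2 2 2
grp <- (seq_len(ncol(x)) - 1) %/% 4

# 12 x 3 indicator matrix: entry [c, g] is 1 if column c belongs to group g
ind <- outer(grp, unique(grp), "==") * 1

# Multiplying sums the columns of x within each group
res <- x %*% ind
res
#      [,1] [,2] [,3]
# [1,]   22   70  118
# [2,]   26   74  122
# [3,]   30   78  126
```

This returns the result in the row-by-group orientation directly, so no transpose is needed.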
You can do this with a brute-force approach, specifying each column within apply:
t(apply(x, 1, function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))))
It's easier to see with non-random data, and a shorter matrix for input:
> x <- matrix(1:36, 3,12)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 4 7 10 13 16 19 22 25 28 31 34
[2,] 2 5 8 11 14 17 20 23 26 29 32 35
[3,] 3 6 9 12 15 18 21 24 27 30 33 36
> t(apply(x, 1, function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))))
[,1] [,2] [,3]
[1,] 22 70 118
[2,] 26 74 122
[3,] 30 78 126
You can also split the vector with split, and while this is more idiomatic for R and more flexible, it is not really more readable:
> t(apply(x, 1, function(y) sapply(split(y, ceiling(seq_along(y)/4)), sum)))
1 2 3
[1,] 22 70 118
[2,] 26 74 122
[3,] 30 78 126
We could convert to array, use apply with MARGIN=1 and get the colSums
n <- 4
t(apply(array(mat, dim=c(nrow(mat), n, ncol(mat)/n)), 1, colSums))
Or another option is melt/acast from library(reshape2)
library(reshape2)
acast(melt(mat), Var1~(Var2-1)%/%n, value.var='value', sum)
The wrapper function recast can be used to make this compact
recast(mat, Var1~(Var2-1)%/%4, id.var=NULL, sum)