I have a question on the following issue:
Suppose I have some matrices
A1 <- matrix(runif(rowsA1*T), rowsA1, T)
…
AD <- matrix(runif(rowsAD*T), rowsAD, T)
The number of matrices is variable (but most certainly not too large).
Is there a way to perform the following more efficiently (but in a set-up that allows for a variable number of matrices):
f1 <- function(A1, A2, ..., AD) {
  for(i in 1:nrow(A1)) {
    for(j in 1:nrow(A2)) {
      ...
        for(d in 1:nrow(AD)) {
          ret[i,j,...,d] <- \sum_{t=1}^T (A1[i,t]*A2[j,t]*...*AD[d,t])
        }
      ...
    }
  }
  ret
}
Thank you very much for your help!
Romain
---------------------------------- Edit with example ----------------------------------
A1 <- |a b c| A2 <- |j k l| A3 <- |s t u|
|d e f| |m n o| |v w x|
|g h i| |p q r| |y z ä|
And I want for instance to get the following:
ret[1,1,1] <- a*j*s + b*k*t + c*l*u
ret[2,1,3] <- d*j*y + e*k*z + f*l*ä
Hopefully this makes my point clearer.
---------------------------------- Edit Nov. 26th, 2013 -------------------------------
Hi @flodel. I tried to implement your code, but there seems to be an issue once there are more than three matrices.
Suppose I have the following matrices
A1 <- matrix(runif(4*3), nrow = 4, ncol = 3)
A2 <- matrix(runif(3*3), nrow = 3, ncol = 3)
A3 <- matrix(runif(2*3), nrow = 2, ncol = 3)
A4 <- matrix(runif(1*3), nrow = 1, ncol = 3)
and plugging them into your code
output.f1 <- f1(A1,A2,A3,A4)
provides the correct number of dimensions
dim(output.f1)
# [1] 4 3 2 1
but the output is full of NAs
output.f1
# , , 1, 1
# [,1] [,2] [,3]
# [1,] 0.13534704 NA NA
# [2,] 0.07360135 NA NA
# [3,] 0.07360135 NA NA
# [4,] 0.07360135 NA NA
# , , 2, 1
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] NA NA NA
# [4,] NA NA NA
Thanks for some help...
Best,
Romain
Give this a try. Because it relies on one big apply loop, it might be slow with large matrices, but it works for any number of matrices, and they do not need to have the same number of rows:
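f1 <- function(...) {
  args <- list(...)
  nrows <- sapply(args, nrow)
  # one row per combination of row indices (i, j, ..., d)
  idx <- do.call(expand.grid, lapply(nrows, seq.int))
  get.row <- function(i, mat) mat[i, ]
  # multiply the selected rows elementwise, then sum over t
  get.val <- function(i.vec) sum(Reduce(`*`, Map(get.row, i.vec, args)))
  idx$val <- apply(idx, 1, get.val)
  ret <- array(NA, dim = nrows)
  # matrix indexing: each row of the index matrix addresses one cell
  ret[as.matrix(idx[, seq_along(args)])] <- idx$val
  ret
}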
Example usage:
A1 <- matrix(1:12, nrow = 4, ncol = 3)
A2 <- matrix(1:9, nrow = 3, ncol = 3)
A3 <- matrix(1:6, nrow = 2, ncol = 3)
out <- f1(A1, A2, A3)
Check:
identical(out[3, 2, 1],
sum(A1[3, ] * A2[2, ] * A3[1, ]))
# [1] TRUE
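If the apply loop over every index combination turns out to be too slow, one possible alternative is to loop over the T columns instead and accumulate outer products, so the number of R-level iterations is T rather than the product of the row counts. This is only a sketch, not part of the answer above: the name f2 is hypothetical, and it assumes all matrices have the same number of columns.

```r
# f2 is a hypothetical name; assumes every matrix has the same number of columns
f2 <- function(...) {
  args <- list(...)
  nrows <- vapply(args, nrow, integer(1))
  ret <- array(0, dim = nrows)
  for (t in seq_len(ncol(args[[1]]))) {
    # t-th column of every matrix, one vector per matrix
    cols <- lapply(args, function(m) m[, t])
    # outer product across all matrices, added to the running sum over t
    ret <- ret + array(Reduce(outer, cols), dim = nrows)
  }
  ret
}

A1 <- matrix(1:12, nrow = 4, ncol = 3)
A2 <- matrix(1:9, nrow = 3, ncol = 3)
A3 <- matrix(1:6, nrow = 2, ncol = 3)
A4 <- matrix(1:3, nrow = 1, ncol = 3)
out2 <- f2(A1, A2, A3, A4)

# same kind of check as above, here with four matrices
all.equal(out2[3, 2, 1, 1], sum(A1[3, ] * A2[2, ] * A3[1, ] * A4[1, ]))
```

Each iteration allocates a full-size array, so this trades memory for fewer iterations.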
Related
I have two matrices, one of them has a NA value and I want to use a function that only runs if there are NAs present in the data, so if I run the function it should only work on df2 and not df1. How would I do this?
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1,2,3,NA), nrow = 2, ncol = 2)
Based on the comment above, here is a complete answer (assuming I understand what you are getting at). The function is set up to do something or not to the matrix depending on whether it has NA values.
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1,2,3,NA), nrow = 2, ncol = 2)
myfunc <- function(m) {
  ret <- m
  if (!anyNA(m)) {
    print("This matrix has no NAs")
  } else {
    print("This matrix has NAs")
  }
  return(ret)
}
myfunc(df1)
# [1] "This matrix has no NAs"
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
myfunc(df2)
# [1] "This matrix has NAs"
# [,1] [,2]
# [1,] 1 3
# [2,] 2 NA
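If the goal is for the function to do its work only when NAs are present and to hand the matrix back untouched otherwise, an early return keeps that explicit. A minimal sketch: the name fill_if_na and the choice of replacing NAs with a fill value are placeholders for whatever the real function should do.

```r
# fill_if_na is a hypothetical stand-in for the real NA-handling function
fill_if_na <- function(m, fill = 0) {
  if (!anyNA(m)) return(m)   # no NAs: return the matrix unchanged
  m[is.na(m)] <- fill        # NAs present: act on them
  m
}

df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1, 2, 3, NA), nrow = 2, ncol = 2)

res1 <- fill_if_na(df1)  # df1 has no NAs, so it comes back as is
res2 <- fill_if_na(df2)  # the NA at [2, 2] is replaced by the fill value
```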
I have a dataset with very large matrices and vectors. I would like to multiply a matrix by a vector that has a single "1" element, with the rest being zero. I would like to loop this calculation so that every possible position of the 1 within the vector has been multiplied with the matrix, and then store the results in a vector.
I give an example of what I'm trying to do.
I have two matrices, a and b:
a <- matrix(1:16, nrow = 4, byrow = TRUE)
b <- matrix(17:32, nrow = 4, byrow = TRUE)
and a vector with 1's and 0's. As I don't know how to loop well yet, I write down the code for each combination:
c1 <- rep(0, times = 4)
c1[1] <- 1
c2 <- rep(0, times = 4)
c2[2] <- 1
c3 <- rep(0, times = 4)
c3[3] <- 1
c4 <- rep(0, times = 4)
c4[4] <- 1
I multiply a with each of the vector combinations c, diagonalize it, multiply this with b and sum each row and column. I then store this result in a vector results:
d1 <- sum(colSums(b %*% diag(as.vector(a %*% c1), nrow = 4)))
d2 <- sum(colSums(b %*% diag(as.vector(a %*% c2), nrow = 4)))
d3 <- sum(colSums(b %*% diag(as.vector(a %*% c3), nrow = 4)))
d4 <- sum(colSums(b %*% diag(as.vector(a %*% c4), nrow = 4)))
results <- cbind(d1, d2, d3, d4)
that gives:
d1 d2 d3 d4
[1,] 2824 3216 3608 4000
Is there a good line of code that does this more efficiently than what I did here?
Because of the special structure of your calculation you can shorten it to
a <- matrix(1:16, nrow = 4, byrow = TRUE)
b <- matrix(17:32, nrow = 4, byrow = TRUE)
results <- apply(a, 2, function(ai) sum(b %*% ai))
# [1] 2824 3216 3608 4000
or even shorter
colSums(b %*% a)
# [1] 2824 3216 3608 4000
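The reason this works: multiplying a by a unit vector simply picks out one column of a, and summing all entries of b %*% diag(v) is the same as sum(colSums(b) * v). A quick check of the shortcut against the original step-by-step calculation, with the unit vectors taken from the columns of an identity matrix (the names unit, long and short are only for this illustration):

```r
a <- matrix(1:16, nrow = 4, byrow = TRUE)
b <- matrix(17:32, nrow = 4, byrow = TRUE)

unit <- diag(4)  # column k is the vector with a 1 in position k

# the original calculation, once per unit vector
long <- sapply(seq_len(ncol(unit)), function(k) {
  sum(b %*% diag(as.vector(a %*% unit[, k]), nrow = 4))
})

# the shortcut from the answer
short <- colSums(b %*% a)

all.equal(long, short)
```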
I want to reduce time and memory usage when building a symmetric matrix, that is, one where sol[i, j] is the same as sol[j, i], by cutting down the number of iterations. (I previously used outer for this, but it consumes more memory than I have.)
My code so far:
# Prepare input
subss <- list(a = c(1, 2, 4), b = c(1, 2, 3), c = c(4, 5))
A <- matrix(runif(25), ncol = 5, nrow = 5)
# Pre allocate memory
sol <- matrix(nrow = length(subss), ncol = length(subss),
dimnames = list(names(subss), names(subss)))
x <- 0
for (i in seq_along(subss)) {
  # Omit for the subsets I already calculated ?
  for (j in seq_along(subss)) {
    x <- x + 1
    message(x)
    # The function I use here might result in a NA
    sol[i, j] <- mean(A[subss[[i]], subss[[j]]])
    sol[j, i] <- sol[i, j] # Will overwrite when it shouldn't
  }
}
This uses 9 iterations. How can I skip the redundant ones and do just 6?
I need to calculate the symmetric values, so this question doesn't apply. Also this other one doesn't work either because there might be many combinations and at some point it can't allocate the vector in memory.
A for loop will usually be slower than outer. Try byte-compiling the loop or implementing it in Rcpp.
subss <- list(a = c(1, 2, 4), b = c(1, 2, 3), c = c(4, 5))
set.seed(42)
A <- matrix(runif(25), ncol = 5, nrow = 5)
#all combinations of indices
ij <- combn(seq_along(subss), 2)
#add all i = j
ij <- matrix(c(ij, rep(seq_along(subss), each = 2)), nrow = 2)
#preallocate
res <- numeric(ncol(ij))
#only one loop
for (k in seq_len(ncol(ij))) {
  message(k)
  res[k] <- mean(A[subss[[ij[1, k]]], subss[[ij[2, k]]]])
}
#1
#2
#3
#4
#5
#6
#create symmetric sparse matrix
library(Matrix)
sol <- sparseMatrix(i = ij[1,], j = ij[2,],
x = res, dims = rep(length(subss), 2),
symmetric = TRUE, index1 = TRUE)
#3 x 3 sparse Matrix of class "dsCMatrix"
#
#[1,] 0.7764715 0.6696987 0.7304413
#[2,] 0.6696987 0.6266553 0.6778936
#[3,] 0.7304413 0.6778936 0.5161089
I found a way with plain for loops:
x <- 0
for (i in seq_along(subss)) {
  for (j in seq_len(i)) { # or for (j in 1:i) as proposed below
    x <- x + 1
    message(x)
    sol[i, j] <- mean(A[subss[[i]], subss[[j]]])
    sol[j, i] <- sol[i, j]
  }
}
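The same six (i, j) pairs can also be generated up front in base R, without the Matrix package, by asking for the indices of the upper triangle. A sketch, assuming as in the question that sol[i, j] and sol[j, i] are meant to be equal; the names ij and vals are only for this illustration:

```r
subss <- list(a = c(1, 2, 4), b = c(1, 2, 3), c = c(4, 5))
set.seed(42)
A <- matrix(runif(25), ncol = 5, nrow = 5)

n <- length(subss)
sol <- matrix(NA_real_, nrow = n, ncol = n,
              dimnames = list(names(subss), names(subss)))

# (row, col) indices of the upper triangle including the diagonal:
# n * (n + 1) / 2 pairs instead of n * n
ij <- which(upper.tri(sol, diag = TRUE), arr.ind = TRUE)

# one evaluation per pair
vals <- apply(ij, 1, function(k) mean(A[subss[[k[1]]], subss[[k[2]]]]))

sol[ij] <- vals           # fill the upper triangle
sol[ij[, 2:1]] <- vals    # mirror the same values into the lower triangle
```

The apply call is still a loop under the hood, so the gain is in the number of evaluations, not in vectorisation.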
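# sol2 is assumed to be preallocated the same way as sol in the question
sol2 <- matrix(nrow = length(subss), ncol = length(subss),
               dimnames = list(names(subss), names(subss)))
for (i in seq_along(subss)) {
  for (j in 1:i) {
    message(i, ' ', j, ' - ', mean(A[subss[[i]], subss[[j]]])) # Check iterations and value
    sol2[i, j] <- sol2[j, i] <- mean(A[subss[[i]], subss[[j]]])
  }
}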
I checked the values from your script and they aren't symmetric:
1 1 - 0.635455905252861
1 2 - 0.638608284398086
1 3 - 0.488700995299344
2 1 - 0.568414432255344
2 2 - 0.602851431118324
2 3 - 0.516099992596234
3 1 - 0.595461705311512
3 2 - 0.656920690399905
3 3 - 0.460815121419728
My values (same as @Llopis):
1 2 - 0.638608284398086
1 3 - 0.488700995299344
2 2 - 0.602851431118324
2 3 - 0.516099992596234
3 2 - 0.656920690399905
3 3 - 0.460815121419728
I have two lists and I want to make them consistent in terms of NA: wherever there is an NA in one of the two lists, put an NA at the same position in the other, without changing anything else in the structure of the lists.
set.seed(123)
m1 <- matrix(nrow=2,ncol=2,data=runif(4))
m1[1,1] <- NA
m2 <- matrix(nrow=2,ncol=2,data=runif(4))
m2[1,2] <- NA
lis <- list(m1, m2)
m1 <- matrix(nrow=2,ncol=2,data=runif(4))
m2 <- matrix(nrow=2,ncol=2,data=runif(4))
m2[2,1] <- NA
bis <- list(m1, m2)
I tried this, but with no success: bis[is.na(lis)]=NA
Desired output:
> lis
[[1]]
[,1] [,2]
[1,] NA 0.9568333
[2,] 0.4566147 0.4533342
[[2]]
[,1] [,2]
[1,] 0.9404673 NA
[2,] 0.0455565 NA
> bis
[[1]]
[,1] [,2]
[1,] NA 0.9568333
[2,] 0.4566147 0.4533342
[[2]]
[,1] [,2]
[1,] 0.6775706 NA
[2,] 0.5726334 NA
Using Map to create a list of matrices with the NA positions as NA:
naposmtx <- Map(function(mtx1, mtx2){
  nasmtx <- mtx1 + mtx2 # because NA + non-NA = NA
  nasmtx[!is.na(nasmtx)] <- 0
  nasmtx
}, lis, bis)
Then:
lis <- Map(`+`, lis, naposmtx)
bis <- Map(`+`, bis, naposmtx)
Here's an option:
z <- Map("|", lapply(lis, is.na), lapply(bis, is.na))
bis <- Map(function(mat, idx) {mat[idx] <- NA; mat}, bis, z)
lis <- Map(function(mat, idx) {mat[idx] <- NA; mat}, lis, z)
However, there may be faster / more efficient approaches due to the numerous Map and lapply calls.
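Either approach can be checked by confirming that the NA positions agree across the two lists afterwards. A small self-contained sketch using the elementwise-OR version on the data from the question (set_na is just a hypothetical name for the anonymous function used above):

```r
set.seed(123)
m1 <- matrix(runif(4), nrow = 2, ncol = 2); m1[1, 1] <- NA
m2 <- matrix(runif(4), nrow = 2, ncol = 2); m2[1, 2] <- NA
lis <- list(m1, m2)
m1 <- matrix(runif(4), nrow = 2, ncol = 2)
m2 <- matrix(runif(4), nrow = 2, ncol = 2); m2[2, 1] <- NA
bis <- list(m1, m2)

# TRUE wherever either list has an NA at that position
z <- Map(`|`, lapply(lis, is.na), lapply(bis, is.na))

set_na <- function(mat, idx) { mat[idx] <- NA; mat }
lis <- Map(set_na, lis, z)
bis <- Map(set_na, bis, z)

# the NA patterns of the two lists should now be the same
identical(lapply(lis, is.na), lapply(bis, is.na))
```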
For the case of >2 lists you can use the following approach (assuming that each list has the same length):
# create a named list - naming is important if you want to
# assign them back to the global environment later on
mylist <- list(lis = lis, bis = bis, kis = kis)
n <- max(lengths(mylist))
z <- lapply(1:n, function(i) {
  Reduce(`+`, Map(function(y) is.na(y[[i]]), mylist))>0
})
mylist <- lapply(mylist, function(mat) {
  Map(function(m, idx) {m[idx] <- NA; m}, mat, z)
})
# to assign them back to the global environment, run:
list2env(mylist, envir = .GlobalEnv)
Now your original lists are modified in the global environment.
Sample data:
set.seed(123)
n <- 4
lis <- list(
  m1 = matrix(nrow=n,ncol=n,data=sample(c(NA, 1:10), n*n, TRUE)),
  m2 = matrix(nrow=n,ncol=n,data=sample(c(NA, 1:10), n*n, TRUE))
)
bis <- list(
  m1 = matrix(nrow=n,ncol=n,data=sample(c(NA, 1:10), n*n, TRUE)),
  m2 = matrix(nrow=n,ncol=n,data=sample(c(NA, 1:10), n*n, TRUE))
)
kis <- list(
  m1 = matrix(nrow=n,ncol=n,data=sample(c(NA, 1:10), n*n, TRUE)),
  m2 = matrix(nrow=n,ncol=n,data=sample(c(NA, 1:10), n*n, TRUE))
)
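With this kind of sample data, the combined NA mask can also be built by folding the per-list masks together with Reduce and an elementwise OR, which avoids indexing the lists by position. This is a sketch of an alternative, not the code from the answer; make_list is a hypothetical helper that only shortens the data setup, and all lists are assumed to have the same length and matrix dimensions.

```r
set.seed(123)
n <- 4
# hypothetical helper: one list of two n x n matrices with random NAs
make_list <- function() {
  list(m1 = matrix(sample(c(NA, 1:10), n * n, TRUE), nrow = n, ncol = n),
       m2 = matrix(sample(c(NA, 1:10), n * n, TRUE), nrow = n, ncol = n))
}
mylist <- list(lis = make_list(), bis = make_list(), kis = make_list())

# per-list NA masks, then OR them together position by position
na_masks <- lapply(mylist, function(l) lapply(l, is.na))
z <- Reduce(function(x, y) Map(`|`, x, y), na_masks)

# apply the combined mask to every matrix in every list
mylist <- lapply(mylist, function(l) {
  Map(function(m, idx) { m[idx] <- NA; m }, l, z)
})

# all three lists should now share the same NA pattern
identical(lapply(mylist$lis, is.na), lapply(mylist$kis, is.na))
```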
I have another simple R question that hopefully someone can help with. I have a series of dataframes that share a repetitive name structure. I would like to loop through them and perform some analysis. Here is a hardcoded example of what I want to do using some fake data:
#Create some fake data
n1 = c(2, 3, 5, 7)
s1 = c(1, 1, 2, 0)
b1 = c(6, 0, 0, 0)
Tank001.df = data.frame(n1, s1, b1)
n2 = c(1, 2, 4, 6)
s2 = c(2, 2, 0, 0)
b2 = c(8, 9, 10, 0)
Tank002.df = data.frame(n2, s2, b2)
n3 = c(7, 12, 0, 0)
s3 = c(5, 3, 0, 0)
b3 = c(8, 9, 10, 4)
Tank003.df = data.frame(n3, s3, b3)
The first action I would like to automate is the conversion of 0 values to NA. Here is the hardcoded version, but ideally I would automate this depending on how many Tankxxx.df dataframes I have:
#Convert zeros to NA
Tank001.df[Tank001.df==0] <- NA
Tank002.df[Tank002.df==0] <- NA
Tank003.df[Tank003.df==0] <- NA
Finally I would like to complete a series of queries of the data, a simple example of which might be the number of values smaller than 5 in each dataframe:
#Return the number of values smaller than 5
Tank001.less.than.5 <- numeric(length(Tank001.df))
for (i in 1:(length(Tank001.df))) {Tank001.less.than.5[i] <- sum(Tank001.df[[i]] < 5,na.rm=TRUE)}
Tank002.less.than.5 <- numeric(length(Tank002.df))
for (i in 1:(length(Tank002.df))) {Tank002.less.than.5[i] <- sum(Tank002.df[[i]] < 5,na.rm=TRUE)}
Tank003.less.than.5 <- numeric(length(Tank003.df))
for (i in 1:(length(Tank003.df))) {Tank003.less.than.5[i] <- sum(Tank003.df[[i]] < 5,na.rm=TRUE)}
Ideally I would also like to know how to write the results of such simple calculations to a new dataframe. In this case for example Less.than.5$TankXXX etc.
Any help would be greatly appreciated.
Create a list of your data.frames and use a combination of lapply and sapply as follows:
TankList <- list(Tank001.df, Tank002.df, Tank003.df)
lapply(TankList, function(x) {
  x[x == 0] <- NA
  sapply(x, function(y) sum(y < 5, na.rm = TRUE))
})
# [[1]]
# n1 s1 b1
# 2 3 0
#
# [[2]]
# n2 s2 b2
# 3 2 0
#
# [[3]]
# n3 s3 b3
# 0 1 1
This also works with a single lapply and colSums:
l <- list(Tank001.df, Tank002.df, Tank003.df) # create a list
lapply(l, function(x) colSums("is.na<-"(x, !x) < 5, na.rm = TRUE))
# [[1]]
# n1 s1 b1
# 2 3 0
#
# [[2]]
# n2 s2 b2
# 3 2 0
#
# [[3]]
# n3 s3 b3
# 0 1 1
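To pick up however many Tankxxx.df dataframes exist and to collect the counts in one dataframe, as asked in the question, the objects can be gathered by name pattern with ls and mget. A sketch, assuming every dataframe has three columns in the same order (the column names differ between tanks, so they are dropped); the row labels n, s and b are an assumption made for illustration:

```r
Tank001.df <- data.frame(n1 = c(2, 3, 5, 7), s1 = c(1, 1, 2, 0), b1 = c(6, 0, 0, 0))
Tank002.df <- data.frame(n2 = c(1, 2, 4, 6), s2 = c(2, 2, 0, 0), b2 = c(8, 9, 10, 0))
Tank003.df <- data.frame(n3 = c(7, 12, 0, 0), s3 = c(5, 3, 0, 0), b3 = c(8, 9, 10, 4))

# collect every object whose name looks like Tank<digits>.df into a named list
TankList <- mget(ls(pattern = "^Tank[0-9]+\\.df$"))

counts <- vapply(TankList, function(x) {
  x[x == 0] <- NA                        # zeros become NA
  unname(colSums(x < 5, na.rm = TRUE))   # number of values below 5 per column
}, numeric(3))

# one column per tank, one row per variable
Less.than.5 <- as.data.frame(counts)
rownames(Less.than.5) <- c("n", "s", "b")
Less.than.5$Tank001.df
```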