How to create multiple matrices based on a formula using two data frames and sum those matrices up in one go? - r

I'm fairly new to R and am thus not that knowledgeable yet about its different functionalities. I'm wondering if there is a more efficient way to replicate the following other than writing and running 230 lines of code.
I have two matrices, Z and E, which contain continuous numerical data and have the dimensions 7x229 and 17x229 respectively. For each column (so 229 times) I want to create a new 119x119 matrix by using the (repeated) formula below
ZZEE1 <- kronecker((Z[,1] %*% t(Z[,1])), (E[,1] %*% t(E[,1])))
ZZEE2 <- kronecker((Z[,2] %*% t(Z[,2])), (E[,2] %*% t(E[,2])))
ZZEE3 <- kronecker((Z[,3] %*% t(Z[,3])), (E[,3] %*% t(E[,3])))
ZZEE4 <- kronecker((Z[,4] %*% t(Z[,4])), (E[,4] %*% t(E[,4])))
#...
ZZEE228 <- kronecker((Z[,228] %*% t(Z[,228])), (E[,228] %*% t(E[,228])))
ZZEE229 <- kronecker((Z[,229] %*% t(Z[,229])), (E[,229] %*% t(E[,229])))
After this is done, I want to add all 229 matrices up into one matrix like this (not complete)
Sum_ZZEE <- ZZEE1 + ZZEE2 + ZZEE3 + ZZEE4 + ZZEE228 + ZZEE229 #Sum of all matrices from ZZEE1 to ZZEE229
Is there a quicker fix out there that will do exactly this? I have tried to find an answer online but did not find something that worked or something that I understood to the extent that I could modify it to my own data/code. As far as I understood it, there might be a fix with the function() function, but I would not know how to code it correctly. Getting the 'Sum_ZZEE' matrix is the final goal, I do not necessarily need the individual matrices stored in the workspace. Much obliged!

First construct a list of matrices: the following two code chunks are equivalent, use whichever is clearer to you.
ZZ_list <- lapply(1:229,
function(i) kronecker((Z[,i] %*% t(Z[,i])), (E[,i] %*% t(E[,i])))
)
or
ZZ_list <- list()
for (i in 1:229) {
ZZ_list[[i]] <- kronecker((Z[,i] %*% t(Z[,i])), (E[,i] %*% t(E[,i])))
}
Then use Reduce() (unfortunately sum() doesn't work the way you want):
answer <- Reduce("+", ZZ_list)
There might be some super-clever answer that works in pure linear algebra (e.g. with stacking/unstacking operators) ...
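The pure linear algebra route hinted at above can be sketched as follows. It relies on the identity (z z') ⊗ (e e') = (z ⊗ e)(z ⊗ e)', so the whole sum collapses to K %*% t(K), where column i of K is kronecker(Z[, i], E[, i]). The random Z and E below are stand-ins with the dimensions from the question, and the names K, sum_ref and sum_fast are only illustrative.

```r
set.seed(1)
# Stand-in data with the dimensions from the question (assumed values)
Z <- matrix(rnorm(7 * 229), nrow = 7)
E <- matrix(rnorm(17 * 229), nrow = 17)

# Reference: list of Kronecker products, summed with Reduce()
ZZ_list <- lapply(seq_len(ncol(Z)), function(i)
  kronecker(Z[, i] %*% t(Z[, i]), E[, i] %*% t(E[, i])))
sum_ref <- Reduce("+", ZZ_list)

# Column i of K equals kronecker(Z[, i], E[, i]):
# row (a - 1) * 17 + b holds Z[a, i] * E[b, i]
K <- Z[rep(seq_len(nrow(Z)), each = nrow(E)), , drop = FALSE] *
     E[rep(seq_len(nrow(E)), times = nrow(Z)), , drop = FALSE]

# The sum over i of K[, i] %*% t(K[, i]) is simply K %*% t(K)
sum_fast <- tcrossprod(K)

# Compare the two results up to floating-point tolerance
all.equal(sum_ref, sum_fast)
```

This avoids building 229 separate 119x119 matrices, which matters more as the number of columns grows.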

Related

faster 'outer' implementation in R

I was trying to use outer() function in R to create a matrix by pairwise evaluation of elements in a vector of dimension n. Specifically, let x be n-dimensional vector and I want to compare each pair of the elements of x. To do so, I use the following naive implementation using outer() function.
# example code
n <- 500
x <- rnorm(n)
f <- function(x, y){
as.numeric(x<y)+0.5*as.numeric(x==y)
}
#new.mat <- outer(seq_len(n), seq_len(n), f) this was posted wrongly
new.mat <- outer(x, x, f) # edited
This implementation becomes extremely slow as n increases, and I would like to know a more efficient way of doing this job. I would really appreciate it if you could share your trick.
Thanks,
Alemu
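One possible direction, offered as an unbenchmarked sketch: for finite values, the comparison can be expressed through the sign of the pairwise differences, which needs a single outer() call with a built-in operator instead of a user-defined function. The names ref and alt are illustrative, and whether this is actually faster on your machine should be checked with system.time().

```r
set.seed(123)
n <- 500
x <- rnorm(n)
x[2] <- x[1]  # force an off-diagonal tie so the 0.5 case is exercised

# Original pairwise function from the question
f <- function(x, y) as.numeric(x < y) + 0.5 * as.numeric(x == y)
ref <- outer(x, x, f)

# sign(x - y) is -1, 0 or 1 for x < y, x == y, x > y,
# so 0.5 * (1 - sign) yields 1, 0.5 or 0 respectively
alt <- 0.5 * (1 - sign(outer(x, x, "-")))

# Check that both constructions agree
all.equal(ref, alt)
```

Note that this rewrite assumes no NA or infinite entries in x, since Inf - Inf is NaN.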

Solve non-linear equations using "nleqslv" package

I tried to solve these non-linear equations using nleqslv, but it does not work well. I think I know the reason: I didn't separate the two unknowns onto different sides of the equations.
My questions are:
1. Are there any other packages that could solve this kind of equation?
2. Is there an effective way in R to rearrange the equations so that they meet the requirements of the package nleqslv?
Thank you.
Here is the code; p[1] and p[2] are the two unknowns I want to solve for.
dslnex<-function(p){
p<-numeric(2)
0.015=sum(exp(Calib2$Median_Score*p[1]+p[2])*weight_pd_bad)
cum_dr<-0
for (i in 1:length(label)){
cum_dr[i]<-exp(Calib2$Median_Score*p[1]+p[2][1:i]*weight_pd_bad[1:i]/0.015
}
mid<-0
for (i in 1:length(label)){
mid[i]<-sum(cum_dr[1:i])/2
}
0.4=(sum(mid*weight_pd_bad)-0.5)/(0.5*(1-0.015))
}
pstart<-c(-0.000679354,-4.203065891)
z<- nleqslv(pstart, dslnex, jacobian=TRUE,control=list(btol=.01))
Following up on my comment, I have rewritten your function as follows, correcting errors and inefficiencies.
Errors and other changes are explained in inline comments.
# no need to use dslnex as name for your function
# dslnex <- function(p){
# any valid name will do
f <- function(p) {
# do not do this
# you are overwriting p as passed by nleqslv
# p<-numeric(2)
# declare return vector
y <- numeric(2)
y[1] <- 0.015 - (sum(exp(Calib2$Median_Score*p[1]+p[2])*weight_pd_bad))
# do not do this
# cum_dr is initialized as a scalar and will be made into a vector
# which will be grown as a new element is inserted (can be very inefficient)
# cum_dr<-0
# so declare cum_dr to be a vector with length(label) elements
cum_dr <- numeric(length(label))
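for (i in 1:length(label)){
# the closing parenthesis of exp() was missing; it is placed after p[2]
# by analogy with y[1], which is an assumption
cum_dr[i]<-exp(Calib2$Median_Score*p[1]+p[2])[1:i]*weight_pd_bad[1:i]/0.015
}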
# same problem as above
# mid<-0
mid <- numeric(length(label))
for (i in 1:length(label)){
mid[i]<-sum(cum_dr[1:i])/2
}
y[2] <- 0.4 - (sum(mid*weight_pd_bad)-0.5)/(0.5*(1-0.015))
# return vector y
y
}
pstart <- c(-0.000679354,-4.203065891)
# pass the rewritten function f
z <- nleqslv(pstart, f, jacobian=TRUE,control=list(btol=.01))
nleqslv is intended for solving square systems of equations of the form f(x) = 0,
so the function must return a vector of the same length as the x vector.
You should now be able to proceed, provided your system of equations has a solution and there are no further errors in your equations. I have my doubts about the [1:i] in the expression for cum_dr and about the expression for mid[i]. The loop calculating mid can possibly be written as a single statement: mid <- cumsum(cum_dr)/2. Up to you.
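As a quick check of the cumsum() suggestion, here is a small sketch with made-up values for cum_dr (the real values would come from the poster's data); mid_loop and mid_vec are illustrative names.

```r
# Hypothetical stand-in values; the real cum_dr comes from the poster's data
cum_dr <- c(0.10, 0.25, 0.05, 0.40, 0.20)

# Loop version, as in the rewritten function
mid_loop <- numeric(length(cum_dr))
for (i in seq_along(cum_dr)) {
  mid_loop[i] <- sum(cum_dr[1:i]) / 2
}

# Single-statement version using the running sum
mid_vec <- cumsum(cum_dr) / 2

# Both give the same vector up to floating-point tolerance
all.equal(mid_loop, mid_vec)
```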

A function for calculating the eigenvalues of a matrix in R

I want to write a function like eigen() to calculate the eigenvalues and eigenvectors of an arbitrary matrix. I wrote the following code to calculate the eigenvalues, and I need a function or method to solve the resulting equation.
eig <- function(x){
if(nrow(x)!=ncol(x)) stop("dimension error")
ff <- function(lambda){
for(i in 1:nrow(x)) x[i,i] <- x[i,i] - lambda
}
det(x)
}
I need to solve det(x) = 0, which is a polynomial equation in lambda, to find the values of lambda. Is there any way to do this?
Here is one solution using uniroot.all:
library(rootSolve)
myeig <- function(mat){
myeig1 <- function(lambda) {
y = mat
diag(y) = diag(mat) - lambda
return(det(y))
}
myeig2 <- function(lambda){
sapply(lambda, myeig1)
}
uniroot.all(myeig2, c(-10, 10))
}
R > x <- matrix(rnorm(9), 3)
R > eigen(x)$values
[1] -1.77461906 -1.21589769 -0.01010515
R > myeig(x)
[1] -1.77462211 -1.21589767 -0.01009019
Computing the determinant is a bad idea because it is not numerically stable: you can easily get Inf even for a moderately large matrix. I suggest reading the following answers first (otherwise you will have no idea what my code is doing):
Are eigenvectors returned by R function eigen() wrong?
eigenvectors when A-lx is singular with no solution
then use either of the following
NullSpace(A - diag(lambda, nrow(A)))
nullspace(A - diag(lambda, nrow(A)))
The solution from @liuminzhao won't work if there is a repeated (double) eigenvalue. The function will fail to find that root because the characteristic polynomial does not change sign there (it touches zero without crossing it), and a sign change is what rootSolve::uniroot.all() looks for. So you would need another way to locate it, such as searching for a local minimum (e.g. with optim()). Moreover, it will fail to determine the multiplicity of repeated eigenvalues.
A better way is to find the characteristic polynomial, which is easily done with pracma::charpoly(), and then find its roots with polyroot().
par <- pracma::charpoly(M) # find parameters of the CP of matrix M
par <- par[length(par):1] # reverse order for polyroot()
roots <- Re(polyroot(par)) # keep real part of the polyroot()
The pracma::charpoly() is not too complicated in itself, see its source code, starting at line a1 <- a.
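For readers without pracma installed, the same idea can be sketched in base R. The helper below, char_poly(), is a hypothetical name and is not a copy of pracma's source; it uses the Faddeev-LeVerrier recursion, which builds the coefficients from traces of matrix products. Like the snippet above, it keeps only the real parts of the roots, so it assumes real eigenvalues.

```r
# Coefficients of det(lambda * I - A), highest power first
# (Faddeev-LeVerrier recursion; char_poly is an illustrative helper)
char_poly <- function(A) {
  stopifnot(nrow(A) == ncol(A))
  n <- nrow(A)
  coefs <- numeric(n + 1)
  coefs[1] <- 1
  M <- diag(n)
  for (k in seq_len(n)) {
    AM <- A %*% M
    coefs[k + 1] <- -sum(diag(AM)) / k   # minus the trace, divided by k
    M <- AM + coefs[k + 1] * diag(n)
  }
  coefs
}

# Symmetric matrix with distinct eigenvalues
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3)
ev_A <- sort(Re(polyroot(rev(char_poly(A)))))  # polyroot() wants lowest power first

# Matrix with a repeated eigenvalue (2 appears twice)
B <- diag(c(2, 2, 5))
ev_B <- sort(Re(polyroot(rev(char_poly(B)))))

all.equal(ev_A, sort(eigen(A)$values), tolerance = 1e-6)
```

Repeated roots are returned once per multiplicity by polyroot(), although they are typically less accurate than simple roots.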

How do I repeat a calculation for each row of a matrix in R?

I am very very new to programming and R. I have tried to find an answer to my question, but part of the problem is I don't know exactly what to search.
I am trying to repeat a calculation (statistical distance) for each row of a matrix. Here is what I have so far:
pollution1 <-as.matrix(pollution[,5:6])
ss <- var(pollution1)
ssinv <- solve(ss)
xbar <- colMeans(pollution1)
t(pollution1[1,]-xbar)%*%ssinv%*%(pollution1[1,]-xbar)
This gets me only the first statistical distance, but I don't want to retype this line with a different matrix row to get all of them.
From what I have read, I may need a loop or to use apply(), but haven't had success on my own. Any help with this, and advice on how to search for help so I don't need to post, would be appreciated.
Thank you.
You might also consider the mahalanobis function: from ?mahalanobis,
Returns the squared Mahalanobis distance of all rows in ‘x’ and
the vector mu = ‘center’ with respect to Sigma = ‘cov’. This is
(for vector ‘x’) defined as
D^2 = (x - mu)' Sigma^-1 (x - mu)
Of course, it's good to learn how to use apply too ...
What about just using apply
apply(pollution1, 1, function(i) t(i-xbar) %*% ssinv %*% (i-xbar))
Also, it's helpful if you make your example reproducible, for example:
pollution1 = matrix(rnorm(100), ncol=2)
ss = var(pollution1)
ssinv = solve(ss)
xbar = colMeans(pollution1)
t(pollution1[1,]-xbar) %*% ssinv %*% (pollution1[1,]-xbar)
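Building on the reproducible example, the following sketch checks that the apply() version, mahalanobis(), and a fully vectorized formula all give the same distances. The random data stands in for the real pollution matrix, and the names d_apply, d_maha, d_vec and Xc are only illustrative.

```r
set.seed(42)
pollution1 <- matrix(rnorm(100), ncol = 2)  # stand-in data, 50 rows
ss <- var(pollution1)
ssinv <- solve(ss)
xbar <- colMeans(pollution1)

# Row-by-row quadratic form via apply()
d_apply <- apply(pollution1, 1,
                 function(r) t(r - xbar) %*% ssinv %*% (r - xbar))

# Built-in squared Mahalanobis distance
d_maha <- mahalanobis(pollution1, center = xbar, cov = ss)

# Vectorized form: center the rows, then take row-wise quadratic forms
Xc <- sweep(pollution1, 2, xbar)
d_vec <- rowSums((Xc %*% ssinv) * Xc)

all.equal(as.numeric(d_apply), as.numeric(d_maha))
```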

Vectorize function to avoid loop

I'm trying to speed up my code because it's running very long. I already found out where the problem lies. Consider the following example:
x<-c((2+2i),(3+1i),(4+1i),(5+3i),(6+2i),(7+2i))
P<-matrix(c(2,0,0,3),nrow=2)
out<-sum(c(0.5,0.5)%*%mtx.exp(P%*%(matrix(c(x,0,0,x),nrow=2)),5))
I have a vector x of complex values with 12^11 entries, and I want to calculate the sum shown in the third line of the code. (I need the function mtx.exp, from the package Biodem, because this is a power of a complex matrix; I found out that the %^% operator does not support complex arguments.)
So my problem is that if I try
sum(c(0.5,0.5)%*%mtx.exp(P%*%(matrix(c(x,0,0,x),nrow=2)),5))
I get an error: "Error in pot %*% pot : non-conformable arguments." So my solution was to use a loop:
tmp<-NULL
for (i in 1:length(x)){
tmp[length(tmp)+1]<-sum(c(0.5,0.5)%*%mtx.exp(P%*%matrix(c(x[i],0,0,x[i]),nrow=2),5))
}
But as I said, this takes very long. Do you have any ideas how to speed up the code? I also tried sapply, but that takes just as long as the loop.
I hope you can help me, because I have to run this function approximately 500 times, and the first try took more than 3 hours, which is not very satisfying.
Thank you very much.
The code can be sped up by pre-allocating your vector,
tmp <- rep(NA,length(x))
but I do not really understand what you are trying to compute: in the first example, you are trying to take the power of a non-square matrix; in the second, you are taking the power of a diagonal matrix (which can be done with ^).
The following seems to be equivalent to your computations:
sum(P^5/2) * x^5
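To see why this holds for the diagonal P in the question, here is a small check. Since Biodem may not be installed, mtx.exp is replaced by a hypothetical helper mat_pow() that just multiplies the matrix by itself repeatedly; that substitution is an assumption about what mtx.exp computes for integer powers.

```r
x <- c((2+2i), (3+1i), (4+1i), (5+3i), (6+2i), (7+2i))
P <- matrix(c(2, 0, 0, 3), nrow = 2)

# Hypothetical stand-in for Biodem's mtx.exp: plain repeated multiplication
mat_pow <- function(M, k) {
  out <- diag(nrow(M))
  for (j in seq_len(k)) out <- out %*% M
  out
}

# Element-by-element version, as in the question's loop
loop_res <- vapply(x, function(u) {
  sum(c(0.5, 0.5) %*% mat_pow(P %*% matrix(c(u, 0, 0, u), nrow = 2), 5))
}, complex(1))

# Closed form: valid here because P is diagonal and the other factor is u * I
vec_res <- sum(P^5 / 2) * x^5

all.equal(loop_res, vec_res)
```

The closed form works on the whole vector at once, so no loop over x is needed in this special case.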
EDIT
If P is not diagonal and C not scalar, I do not see any easy simplification of mtx.exp( P %*% C, 5 ).
You could try something like
y <- sapply(x, function(u)
sum(
c(0.5,0.5)
%*%
mtx.exp( P %*% matrix(c(u,0,0,u),nrow=2), 5 )
)
)
but if your vector really has 12^11 entries, that will take an insanely long time.
Alternatively, since you have a very large number of very small (2*2) matrices, you can explicitly compute the product P %*% C and its 5th power (using some computer algebra system: Maxima, Sage, Yacas, Maple, etc.) and use the resulting formulas: these are just (50 lines of) straightforward operations on vectors.
/* Maxima code */
p: matrix([p11,p12], [p21,p22]);
c: matrix([c1,0],[0,c2]);
display2d: false;
factor(p.c . p.c . p.c . p.c . p.c);
I then copy and paste the result in R:
c1 <- dnorm(abs(x),0,1); # C is still a diagonal matrix
c2 <- dnorm(abs(x),1,3);
p11 <- P[1,1]
p12 <- P[1,2]
p21 <- P[2,1]
p22 <- P[2,2]
# Result of the Maxima computations:
# I just add all the elements of the resulting 2*2 matrix,
# but you may want to do something slightly different with them.
c1*(c2^4*p12*p21*p22^3+2*c1*c2^3*p11*p12*p21*p22^2
+2*c1*c2^3*p12^2*p21^2*p22
+3*c1^2*c2^2*p11^2*p12*p21*p22
+3*c1^2*c2^2*p11*p12^2*p21^2
+4*c1^3*c2*p11^3*p12*p21+c1^4*p11^5)
+
c2*p12
*(c2^4*p22^4+c1*c2^3*p11*p22^3+3*c1*c2^3*p12*p21*p22^2
+c1^2*c2^2*p11^2*p22^2+4*c1^2*c2^2*p11*p12*p21*p22
+c1^3*c2*p11^3*p22+c1^2*c2^2*p12^2*p21^2
+3*c1^3*c2*p11^2*p12*p21+c1^4*p11^4)
+
c1*p21
*(c2^4*p22^4+c1*c2^3*p11*p22^3+3*c1*c2^3*p12*p21*p22^2
+c1^2*c2^2*p11^2*p22^2+4*c1^2*c2^2*p11*p12*p21*p22
+c1^3*c2*p11^3*p22+c1^2*c2^2*p12^2*p21^2
+3*c1^3*c2*p11^2*p12*p21+c1^4*p11^4)
+
c2*(c2^4*p22^5+4*c1*c2^3*p12*p21*p22^3
+3*c1^2*c2^2*p11*p12*p21*p22^2
+3*c1^2*c2^2*p12^2*p21^2*p22
+2*c1^3*c2*p11^2*p12*p21*p22
+2*c1^3*c2*p11*p12^2*p21^2+c1^4*p11^3*p12*p21)
