Implementing a particular approach to calculating a log-likelihood using matrix operations - r

I came across a mathematical expression for the log-likelihood in a CrossValidated.com answer and am unclear how I should implement it in R. I'm not sure SO renders math the way CV does, but this is the first equation in the second (not accepted) answer:
$$
\ell(\mu, \Sigma) = C - \frac{m}{2}\log|\Sigma| - \frac{1}{2} \sum_{i=1}^m \text{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]
$$
I'm focusing on the third term of that equation, and according to another answer on that page the trace operation is not actually necessary there. I suppose I could look at one of the several implementations in the various packages that exist, but I suspect they use more economical approaches that don't clearly follow the equation's procedure, as @onyambu does in the answer below.
I'm reusing code from an earlier SO example:
library(MASS)
# Make covariance matrix. See note above re the implications of using a correlation matrix.
S = matrix(c(1.0,  0.2, 0.1, 0.35, 0.0,
             0.2,  1.0, 0.0, 0.4,  0.0,
             0.1,  0.0, 1.0, 0.0,  0.4,
             0.35, 0.4, 0.0, 1.0,  0.6,
             0.0,  0.0, 0.4, 0.6,  1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 200 observations
obs = mvrnorm(n = 200, mu = mus, Sigma = S)
This was in response to a question that has since been answered correctly, though not via a summation of matrix expressions. I think I can do it with a for-loop that creates the individual contribution of each data point:
llmat.term3 <- matrix(NA, 200, 1)
for (n in 1:200) {
  llmat.term3[n] <- t(obs[n, ] - mus) %*% solve(S) %*% (obs[n, ] - mus)
}
sum(llmat.term3)
#[1] 982.7356
.... but I'm wondering if there is a more compact matrix approach? Or, I suppose, something to fill in the gaps in my linear algebra knowledge that explains why sum(u * solve(S, u)) is the same as sum_{i=1}^{N} t(obs[i,] - mus) %*% solve(S) %*% (obs[i,] - mus).

In your code you have:
S = matrix(c(1.0,  0.2, 0.1, 0.35, 0.0,
             0.2,  1.0, 0.0, 0.4,  0.0,
             0.1,  0.0, 1.0, 0.0,  0.4,
             0.35, 0.4, 0.0, 1.0,  0.6,
             0.0,  0.0, 0.4, 0.6,  1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 200 observations
set.seed(123)
obs = MASS::mvrnorm(n = 200, mu = mus, Sigma = S)
llmat.term3 <- matrix(NA, 200, 1)
for (n in 1:200) {
  llmat.term3[n] <- t(obs[n, ] - mus) %*% solve(S) %*% (obs[n, ] - mus)
}
sum(llmat.term3)
#[1] 982.7356
Compare to more compact approaches:
u <- t(obs) - mus
sum(diag(solve(S, tcrossprod(u))))
#> [1] 982.7356
sum(u * solve(S, u))
#> [1] 982.7356
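To fill in the linear-algebra gap the question mentions: each column of u = t(obs) - mus is one deviation x^(i) - mus, so the columns of solve(S, u) are Sigma^{-1}(x^(i) - mu), and multiplying elementwise by u and summing adds up exactly the quadratic forms:
$$
\sum_{i=1}^m (\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu) = \sum_{i=1}^m \sum_{k=1}^p u_{ki}\,\left[\Sigma^{-1}U\right]_{ki}
$$
where U is the p x m matrix u. That double sum is precisely sum(u * solve(S, u)). The trace variant follows from the cyclic property tr(U^T Sigma^{-1} U) = tr(Sigma^{-1} U U^T), which is sum(diag(solve(S, tcrossprod(u)))).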
Though the two expressions give the same result (check = 'equal' below confirms it), the first seems to be quicker than the second, presumably because tcrossprod(u) collapses everything to a small 5 x 5 matrix before solve() is called, whereas solve(S, u) has to carry all n right-hand sides through the solve. The for loop takes forever to compute.
Unit: milliseconds
 expr       min        lq       mean    median        uq       max neval
    a 4532.6753 4679.4043 5470.94765 4815.1294 6061.3284 7789.5116    10
    b    2.8991    3.2693    3.73495    3.3675    3.7777    6.9719    10
    c    7.8176    8.5473   12.03060    9.2542   16.4089   20.1742    10
set.seed(123)
n <- 200000
obs = MASS::mvrnorm(n = n, mu = mus, Sigma = S)
u <- t(obs) - mus
microbenchmark::microbenchmark(
  a = {
    llmat.term3 <- matrix(NA, n, 1)
    for (i in seq(n)) {
      llmat.term3[i] <- t(obs[i, ] - mus) %*% solve(S) %*% (obs[i, ] - mus)
    }
    sum(llmat.term3)
  },
  b = sum(diag(solve(S, tcrossprod(u)))),
  c = sum(u * solve(S, u)),
  check = 'equal', times = 10)
NB: it took me a while to find the seed you used. Next time, include it with your data generation.

Related

How do I assign values to each other in triplets using R?

The situation is as follows:
I need to create a dataset of triplets. There is a discrete distribution of stock prices S <- c(80, 100, 120, 140, 160) with probabilities P <- c(0.2, 0.3, 0.2, 0.2, 0.1); a call option C <- pmax(S - 120, 0) = c(0, 0, 0, 20, 40); and the liability of an option that pays 30 if the price lands in a certain region and zero otherwise, namely L = 30 * I{110 <= S <= 150} = c(0, 0, 30, 30, 0). It is important to mention that if S[1] = 80 is drawn, then C[1] and L[1] must come with it; this holds for i = 1, 2, 3, 4, 5. How do you create a dataset of N = 10000 simulations where each value of i corresponds to the other two values for the same i?
This is the code I have so far. Note that X_1 = S, X_2 = C and Y = L.
X_1 <- function(n) {
  sample(c(80, 100, 120, 140, 160), size = n,
         prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace = T)
}
X_2 <- function(n) {
  sample(X_1 - 120, size = n,
         prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace = T)
}
Y <- function(n) {
  sample(L, size = n,
         prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace = T)
}
## Creating triplets ##
df <- data.frame(S_T = X_1(10000), C_T = X_2(10000), L_T = Y(10000))
df
I'm not sure if you want C_T to be dependent on the S_T values. If you do, I think you just want to call X_1, assign the results to an object, then use that as the argument to X_2 (or just subtract 120, which is what X_2 does).
X_1 <- function(n) {
  sample(c(80, 100, 120, 140, 160), size = n,
         prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace = T)
}
# Call that function
S_T <- X_1(10)   # for practice
C_T <- S_T - 120 # that's all you're doing in function X_2, if you want to use S_T
If you want C_T to contain values independent of S_T, you can call one function from within another:
X_1 <- function(n) {
  sample(c(80, 100, 120, 140, 160), size = n,
         prob = c(0.2, 0.3, 0.2, 0.2, 0.1), replace = T)
}
X_2 <- function(n) {
  X_1(n) - 120
}
S_T <- X_1(10) # Same as above
C_T <- X_2(10) # Gives values not dependent on S_T
EDIT to address comment below:
It's hard to read the comment, but it looks like you want to create a function that takes the results of X_1 and returns a result based on a condition. You can write another function that reads each element one at a time via nested ifelse calls, then feed it the results of X_1:
Y <- function(x) {
  ifelse(x == 80,  0,
  ifelse(x == 100, 0,
  ifelse(x == 120, 30,
  ifelse(x == 140, 30,
                   0)))) # last case: S = 160 is outside 110-150, so it pays 0
}
sapply(X_1(10), Y) # Apply Y to one element of X_1's output at a time. Assign the result to L or whatever you wish to call it.
If this all works for you, you can accept the answer.
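As a compact alternative sketch (with S, P, C and L as defined in the question): sample row indices once, then index all three payoff vectors with the same draws so each triplet stays aligned.
S <- c(80, 100, 120, 140, 160)
P <- c(0.2, 0.3, 0.2, 0.2, 0.1)
C <- pmax(S - 120, 0)           # c(0, 0, 0, 20, 40)
L <- 30 * (S >= 110 & S <= 150) # c(0, 0, 30, 30, 0)
i <- sample(seq_along(S), size = 10000, prob = P, replace = TRUE)
df <- data.frame(S_T = S[i], C_T = C[i], L_T = L[i])
head(df)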

Solving a system of multiple ODEs in R

I have a system with X connected patches, each of which has a simple predator-prey model like this:
$$
\frac{dC_i}{dt} = r_c C_i - d\,C_i + e\,P_i + b \sum_j A_{ij} C_j
$$
$$
\frac{dP_i}{dt} = r_p P_i + e\,P_i + b \sum_j A_{ij} P_j
$$
where C_i and P_i are consumer and prey abundances in patch i; r_c and r_p are growth rates; d is the rate of consumer death; e is the rate of prey intake. The last term in each equation is the influx of consumers or prey: A_{ij} is a matrix indicating whether patches i and j are connected, and b is the rate of migration from patch j. (My system is more complicated, but this simple example will do.)
A system like this requires as many ODE systems as there are patches. Any idea how to implement it? I know how to implement the single-patch case (no indices and no influx term) with deSolve, so a solution with deSolve is preferred.
The following matrix predator-prey model may serve as a starting point:
library(deSolve)
model <- function(t, n, parms) {
  with(parms, {
    dn <- r * n + n * (A %*% n)
    list(dn)
  })
}
parms <- list(
  r = c(r1 = 0.1, r2 = 0.1, r3 = -0.1, r4 = -0.1),
  A = matrix(c(
    0.0, 0.0, -0.2,  0.0,  # prey 1
    0.0, 0.0,  0.0, -0.1,  # prey 2
    0.2, 0.0,  0.0,  0.0,  # predator 1; eats prey 1
    0.0, 0.1,  0.0,  0.0), # predator 2; eats prey 2
    nrow = 4, ncol = 4, byrow = TRUE)
)
times = seq(0, 500, 0.1)
n0 = c(n1 = 1, n2 = 1, n3 = 2, n4 = 2)
out <- ode(n0, times, model, parms)
plot(out)
Here is a more metapopulation-related example, following our off-list discussion. Please note that this is just experimental and comes WITHOUT WARRANTY. Feedback and improvements are welcome.
library(deSolve)
n <- 7 # number of metapopulations
beta <- rep(c(-500, 500, 0), each = n)
gamma <- rep(c(0, -365/13, 365/13), each = n)
## case(1) a "fully connected" system
#mig <- 1e-10 # migration rate
#As <- matrix(mig, nrow=n, ncol=n)
#diag(As) <- 0
## case (2) directed move
mig <- 0.0001 # migration rate
As <- matrix(0, nrow=n, ncol=n)
As[1:(n-1), 2:n] <- diag(mig, n-1)
As[2:n, 1:(n-1)] <- As[2:n, 1:(n-1)] + diag(mig, n-1)
## case (3) enter migration matrix manually ...
## expand movement to full matrix, within respective states S, I, R
## assumes that all states move equally; this can of course be changed
A <- matrix(0, nrow = 3 * n, ncol = 3 * n)
A[1:n, 1:n] <- As
A[(n+1):(2*n), (n+1):(2*n)] <- As
A[(2*n+1):(3*n), (2*n+1):(3*n)] <- As
## balance: what moves to other cells needs to be removed from the cell itself
diag(A) <- -rowSums(A)
## migration matrix A
## - positive values: what moves from the neighbors
## - negative values: what moves to the neighbors
A
S <- rep(0.99, times=n)
I <- c(0.01, rep(0, n-1)) # only first sub-population infected
R <- rep(0, times=n)
Y0 <- c(S, I, R)
sirmodel <- function(t, Y, parameters) {
  S <- Y[1:n]
  I <- Y[(n+1):(2*n)]
  # dS <- -beta*S*I
  # dI <- beta*S*I - gamma*I
  # dR <- gamma*I
  dY <- beta * S * I + gamma * I + Y %*% A
  list(dY)
}
times <- seq(from = 0, to = 0.2, length.out = 100)
out <- ode(y = Y0, times = times, func = sirmodel, parms = NULL)
windows(height = 6, width = 2 * n) # somewhat bigger window; Windows-only, use x11() or quartz() elsewhere
plot(out, xlab = "time", ylab = "-", mfrow = c(3, n))
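To connect this back to the question's own notation, here is a minimal sketch of the two-species patch model with the influx term exactly as written; the parameter values and the chain-shaped adjacency matrix are made up for illustration. To conserve individuals you would additionally subtract the outflow, as the diag(A) <- -rowSums(A) step does above.
library(deSolve)
np <- 3                          # number of patches (illustrative)
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), np, np)  # assumed connectivity: patches in a chain
parms <- list(r_c = 0.1, r_p = 0.5, d = 0.2, e = 0.1, b = 0.01, A = A)
patchModel <- function(t, y, parms) {
  with(parms, {
    C <- y[1:np]                 # consumer abundances, one per patch
    P <- y[(np + 1):(2 * np)]    # prey abundances
    dC <- r_c * C - d * C + e * P + b * (A %*% C)
    dP <- r_p * P + e * P + b * (A %*% P)
    list(c(dC, dP))
  })
}
y0 <- c(C = rep(1, np), P = rep(2, np))
out <- ode(y0, times = seq(0, 20, 0.1), patchModel, parms)
plot(out)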

Sampling iteratively without a for loop in R

Even though I think the issue I have may be simple, I nevertheless can't figure it out. Here's the thing:
I have the following list and vector. The list is used to fill up the vector:
probabilities = list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3, 0.4, 0.3))
nextState = c()
for (counter in 1:3) {
  nextState[counter] = sample(1:3, size = 1, prob = probabilities[[counter]])
}
The code works fine. However, when expanding to larger lists (>10,000 elements), the loop becomes aggravatingly slow. Since the loop above is used multiple times in the larger code, the time consumed is way too much. Would there be a way to achieve the same result without looping?
Additional question:
Thanks guys, you've been a big help. One additional question: how would you approach the same issue if the probabilities and nextState were interdependent? That is, how could I avoid the for loop then? Perhaps some code to clarify:
M <- list(matrix(c(0.1, 0.2, 0.7, 0.2, 0.2, 0.6, 0.3, 0.3, 0.4), nrow = 3, ncol = 3),
          matrix(c(0.3, 0.3, 0.4, 0.5, 0.5, 0, 0.1, 0.1, 0.8), nrow = 3, ncol = 3))
probabilities <- list()
nextState <- c(2, NA, NA)
for (i in 1:2) {
  probabilities[[i]] <- M[[i]][nextState[i], ]
  nextState[i + 1] <- sample(1:3, size = 1, prob = probabilities[[i]])
}
If you've got any idea, then you truly are miracle workers!!
Try sapply:
nextState <- sapply(probabilities, function(x) sample(1:3, size = 1, prob = x))
Benchmarks:
# Unit: microseconds
#    expr      min       lq      mean    median       uq      max neval
#     for 2115.170 2223.475 2436.0797 2283.2755 2371.546 10048.64   100
#  sapply   24.704   29.524  164.0261   37.3565   41.123 12763.03   100
microbenchmark::microbenchmark(
  `for` = {
    nextState = c()
    for (counter in 1:3) {
      nextState[counter] = sample(1:3, size = 1, prob = probabilities[[counter]])
    }
  },
  sapply = sapply(probabilities, function(x) sample(1:3, size = 1, prob = x)),
  times = 100)
Another possibility, with the purrr package:
library(purrr)
nextState <- map_int(probabilities, function(x) sample(1:3, size = 1, prob = x))
Data:
probabilities = list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3,0.4,0.3))
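For the additional question: that is a Markov chain, so each draw truly depends on the previous state and the loop cannot simply be vectorised away. As a sketch, Reduce() with accumulate = TRUE at least expresses the recursion without an explicit for loop (M and the starting state 2 are taken from the question above):
nextState <- Reduce(
  function(state, M_i) sample(1:3, size = 1, prob = M_i[state, ]),
  M,
  2,                # initial state
  accumulate = TRUE
)
nextState           # the initial state followed by one draw per matrix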

R use apply when arguments contain arrays and scalars

I want to avoid the following loop:
for (i in 1:2) {
  vectVal[i] = myFunc(M[,,i], S[,,i], phi2, sig2)
}
by using the apply function.
The problem is that the arguments passed to the apply function contain arrays (--> M and S) and scalars (--> phi2 and sig2).
I tried the following:
apply(M,3,myFunc, S = S, phi2 = phi2, sig2 = sig2)
which resulted in an error message because S is an array and not a matrix as required in myFunc (see below):
Here is a reproducible code:
M = array(data = c(
  0.5, 0.7, 0.45,
  0.5, 0.3, 0.45,
  0.5, 0.7, 0.3,
  0.5, 0.3, 0.7,
  0.5, 0.7, 0.45,
  0.5, 0.3, 0.55),
  dim = c(3, 2, 2)
)
S = array(data = c(
   0.7723229, -0.2149794, -0.2159068,
  -0.2149794,  0.7723229, -0.2083123,
  -0.2159068, -0.2083123,  0.7723229,
   0.7723229, -0.2149794, -0.2149794,
  -0.2149794,  0.7723229, -0.1783025,
  -0.2149794, -0.1783025,  0.7723229,
   0.7723229, -0.2149794, -0.2176665,
  -0.2149794,  0.7723229, -0.2111496,
  -0.2176665, -0.2111496,  0.7723229),
  dim = c(3, 3, 2) # only the first 18 values (two 3 x 3 slices) are used
)
phi2 = 0.5
sig2 = 0.3
myFunc = function(M, S, phi2, sig2) {
  valMult = M[, 1] %*% diag(S) # inner product of M's first column with S's diagonal
  valEnd = valMult + phi2 - sig2
  return(valEnd)
}
vectVal = vector(length = 2)
for (i in 1:2) {
  vectVal[i] = myFunc(M[,,i], S[,,i], phi2, sig2)
}
vectVal
Does anyone have an idea?
One (not particularly efficient) way would be to use plyr to split your arrays into lists (each element of a list is one slice along the third dimension of the array). You could then use mapply to run your function like so:
require(plyr)
ml <- alply(M, 3)
sl <- alply(S, 3)
mapply(myFunc, ml, sl, phi2, sig2)
#        1        2
# 1.474333 1.358484
Update:
A more vectorised alternative (but still not as fast as for and %*% [see @JorisMeys' comment below]) is to take the diag of each slice of S and then use colSums and elementwise multiplication to achieve the same result:
s <- apply(S, 3, diag)
colSums(M[, 1, ] * s) + phi2 - sig2
# [1] 1.474333 1.358484
Update, update:
@JorisMeys has written a vectorised extractor function for getting the diagonal elements of 3D square arrays. Check this out.
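For completeness, a base-R variant of the plyr approach (a sketch; asplit() requires R >= 3.6):
ml <- asplit(M, 3) # list of 3 x 2 matrices, one per slice
sl <- asplit(S, 3) # list of 3 x 3 matrices
mapply(myFunc, ml, sl, MoreArgs = list(phi2 = phi2, sig2 = sig2))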

How to set parameters' sum to 1 in constrained optimization

Here's the code (I'm sorry it's so long, but it was the first example I had); I'm using the CVaR example from the CreditMetrics package by A. Wittmann and the DEoptim solver to optimize:
library(CreditMetrics)
library(DEoptim)

N <- 3
n <- 100000
r <- 0.003
ead <- rep(1/N, N)
rc <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D")
lgd <- 0.99
rating <- c("BBB", "AA", "B")
firmnames <- c("firm 1", "firm 2", "firm 3")
alpha <- 0.99

# correlation matrix
rho <- matrix(c(  1, 0.4, 0.6,
                0.4,   1, 0.5,
                0.6, 0.5,   1), 3, 3, dimnames = list(firmnames, firmnames),
              byrow = TRUE)

# one-year empirical migration matrix from the Standard & Poor's website
rc <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D")
M <- matrix(c(90.81,  8.33,  0.68,  0.06,  0.08,  0.02,  0.01,  0.01,
               0.70, 90.65,  7.79,  0.64,  0.06,  0.13,  0.02,  0.01,
               0.09,  2.27, 91.05,  5.52,  0.74,  0.26,  0.01,  0.06,
               0.02,  0.33,  5.95, 85.93,  5.30,  1.17,  1.12,  0.18,
               0.03,  0.14,  0.67,  7.73, 80.53,  8.84,  1.00,  1.06,
               0.01,  0.11,  0.24,  0.43,  6.48, 83.46,  4.07,  5.20,
               0.21,  0,     0.22,  1.30,  2.38, 11.24, 64.86, 19.79,
               0,     0,     0,     0,     0,     0,     0,    100
)/100, 8, 8, dimnames = list(rc, rc), byrow = TRUE)

cm.CVaR(M, lgd, ead, N, n, r, rho, alpha, rating)
y <- cm.cs(M, lgd)[which(names(cm.cs(M, lgd)) == rating)]
Now I write my function...
fun <- function(w) {
  # ...
  - (t(w) %*% y - r) / cm.CVaR(M, lgd, ead = w, N, n, r,
                               rho, alpha, rating)
}
...and I want to optimize it:
DEoptim(fn = fun, lower = rep(0, N), upper = rep(1, N),
        control = DEoptim.control())
Can you tell me what I have to insert at # ... so that sum(w) = 1 holds during optimization?
Below I show the optimization results following flodel's tips:
# The first trick is to include B as a large number to force the algorithm to put sum(w) = 1
fun <- function(w) {
  - (t(w) %*% y - r) / cm.CVaR(M, lgd, ead = w, N, n, r, rho, alpha, rating) +
    abs(10000 * (sum(w) - 1))
}
DEoptim(fn = fun, lower = rep(0, N), upper = rep(1, N),
        control = DEoptim.control())
$optim$bestval
[1] -0.05326055
$optim$bestmem
       par1        par2        par3
0.005046258 0.000201286 0.994752456
parsB <- c(0.005046258, 0.000201286, 0.994752456)
> fun(parsB)
            [,1]
[1,] -0.05326089
...and...
As you can see, the first trick works better in that it finds a result smaller than the second one's. Unfortunately, it also seems to take longer.
# The second trick is to use w <- w / sum(w) in the function itself
fun <- function(w) {
  w <- w / sum(w)
  - (t(w) %*% y - r) / cm.CVaR(M, lgd, ead = w, N, n, r, rho, alpha, rating) #+
  #abs(10000 * (sum(w) - 1))
}
DEoptim(fn = fun, lower = rep(0, N), upper = rep(1, N),
        control = DEoptim.control())
$optim$bestval
[1] -0.0532794
$optim$bestmem
        par1         par2         par3
1.306302e-15 2.586823e-15 9.307001e-01
parsC <- c(1.306302e-15, 2.586823e-15, 9.307001e-01)
parC <- parsC / sum(parsC)
> fun(parC)
           [,1]
[1,] -0.0532794
Any comments?
Should I increase the number of iterations because the function being optimized is "too stochastic"?
Try:
w <- w / sum(w)
and if DEoptim gives you an optimal solution w* such that sum(w*) != 1 then w*/sum(w*) should be your optimal solution.
Another approach is to solve over all your variables but one. We know the value of the last variable must be 1 - sum(w) so in the body of the function, have:
w <- c(w, 1-sum(w))
and do the same to the optimal solution returned by DEoptim: w* <- c(w*, 1-sum(w*))
Both solutions require that you re-formulate your problem into an unconstrained (not counting for variable bounds) optimization so DEoptim can be used; which forces you to do a little extra work outside of DEoptim to recover the solution to the original problem.
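A minimal sketch of that second reformulation, reusing the objects defined in the question (fun2 is a hypothetical name):
fun2 <- function(w) {
  w <- c(w, 1 - sum(w)) # recover the last weight inside the objective
  - (t(w) %*% y - r) / cm.CVaR(M, lgd, ead = w, N, n, r, rho, alpha, rating)
}
res <- DEoptim(fn = fun2, lower = rep(0, N - 1), upper = rep(1, N - 1),
               control = DEoptim.control())
wstar <- res$optim$bestmem
wstar <- c(wstar, 1 - sum(wstar)) # same post-transformation on the solution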
In reply to your comment, if you want DEoptim to give you the correct answer right away (i.e. without the need for a post-transformation), you could also try including a penalty cost in your objective function: for example, add B * abs(sum(w) - 1), where B is some arbitrarily large number, so sum(w) will be forced to 1.
I think you should add a penalty for any deviation from one.
Add the term (sum(weights) - 1)^2 * 1e10 to your minimization problem. You should see that this huge penalty will force the weights to sum to 1!
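As a quick sketch of that penalty (reusing the question's objects; fun_pen is a hypothetical name, and 1e10 is the arbitrary large constant suggested above):
fun_pen <- function(w) {
  - (t(w) %*% y - r) / cm.CVaR(M, lgd, ead = w, N, n, r, rho, alpha, rating) +
    (sum(w) - 1)^2 * 1e10 # huge quadratic penalty pushes sum(w) toward 1
}
DEoptim(fn = fun_pen, lower = rep(0, N), upper = rep(1, N),
        control = DEoptim.control())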
With the trick you applied:
fun <- function(w) {
  w <- w / sum(w)
  - (t(w) %*% y - r) / cm.CVaR(M, lgd, ead = w, N, n, r, rho, alpha, rating) #+
  #abs(10000 * (sum(w) - 1))
}
Why would you not use optim in this case? I think it will be much faster.
