My attempt is:
library(igraph)
set.seed(41)
n<-10
A <- sample.int(2, n*n, TRUE) - 1L
dim(A) <- c(n, n)
m <- sum(A)
g <- graph_from_adjacency_matrix(A)
k_in <- degree(g, v = V(g), mode = "in", loops = TRUE, normalized = FALSE)#; k_in
k_out <- degree(g, v = V(g), mode = "out", loops = TRUE, normalized = FALSE)#; k_out
p <- (k_in %*% t(k_out) / (2*m))/(k_in %*% t(k_out) / (2*m) + k_in %*% t(k_out) / (2*m))
round(p, 3)
All values of the probability matrix p are 0.5.
I think the error is in the denominator of p, because the matrix A is not symmetric.
Question. How do I specify the denominator correctly?
Edit. After Stéphane Laurent's answer.
I think we should have four different values: k_j_out, k_i_in, k_i_out, k_j_in.
Finally, I need to obtain the weight matrix, W.
I <- diag(n)
W <- A %*% (I - p) - t(A) %*% (I - p)
And I think this matrix should be symmetric.
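A quick numerical check of that claim (a sketch; W comes from the two lines above):
isSymmetric(W)  # TRUE if W is symmetric up to numerical tolerance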
The 2m is unnecessary because it appears in both the numerator and the denominator. You can do:
p <- (k_in %*% t(k_out))/(k_out %*% t(k_in) + k_in %*% t(k_out))
Or, the same with less computations:
M <- k_in %*% t(k_out)
M / (M + t(M))
EDIT
We can check:
i <- 2; j <- 3
k_out[j] * k_in[i] / (k_out[j] * k_in[i] + k_out[i] * k_in[j])
# 0.5384615
p[i,j]
# 0.5384615
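Since M/(M + t(M)) + t(M)/(M + t(M)) is a matrix of ones, a whole-matrix sanity check is also possible (assuming every vertex has nonzero in- and out-degree, so no denominator vanishes):
range(p + t(p))
# should be 1 1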
Following this question: How to get the value of `t` so that my function `h(t)=epsilon` for a fixed `epsilon`?
I first sample the 500 eigenvectors v of a random matrix G and then generate 1000 random initial vectors of dimension 500. I normalize them and store them in xmats.
#make this example reproducible
set.seed(100001)
n <- 500
#Sample GOE random matrix
A <- matrix(rnorm(n*n, mean=0, sd=1), n, n)
G <- (A + t(A))/sqrt(2*n)
ev <- eigen(G)
l <- ev$values
v <- ev$vectors
#parameters of the multivariate normal distribution
mean <- rep(0, n)
var <- diag(n)
#simulate from the multivariate normal distribution
initial <- MASS::mvrnorm(n = 1000, mu = mean, Sigma = var) #1000 random vectors
#normalize each initial vector so the initial data are uniformly distributed on the unit sphere
xmats <- lapply(1:1000, function(i) initial[i, ]/norm(initial[i, ], type = "2"))
Then I define the function h1t and a root finder:
h1t <- function(t, x_0) {
  h10 <- c(x_0 %*% v[, n])
  denom <- vapply(t, function(.t) {
    sum((x_0 %*% v)^2 * exp(-4*(l - l[n]) * .t))
  }, numeric(1L))
  abs(h10) / sqrt(denom)
}
find_t <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  uniroot(function(t) h1t(t, x) - epsilon, range,
          tol = .Machine$double.eps)$root
}
I want to get res:
res <- lapply(xmats, find_t)
However, it throws the error: Error in uniroot(function(t) h1t(t, x) - epsilon, range, tol = .Machine$double.eps) : f() values at end points not of opposite sign
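A workaround (a sketch; find_t_safe is a name I made up) is to wrap uniroot in tryCatch, so that starting vectors for which h1t(t) - epsilon never changes sign over the range return NA instead of stopping the loop:
find_t_safe <- function(x, epsilon = 0.01, range = c(-50, 50)) {
  tryCatch(
    uniroot(function(t) h1t(t, x) - epsilon, range,
            tol = .Machine$double.eps)$root,
    error = function(e) NA_real_  # no sign change in the bracket
  )
}
res <- lapply(xmats, find_t_safe)  # drop NAs before plotting, e.g. hist(na.omit(unlist(res)))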
res is a list; I ran hist(unlist(res)) and it worked well.
I have been trying to figure out the core part of the varimax function in R. I found a wiki link that writes out the algorithm. But why is B <- t(x) %*% (z^3 - z %*% diag(drop(rep(1, p) %*% z^2))/p) computed? I am also not sure why the SVD of the matrix B is taken. The iteration step presumably maximizes/minimizes the variance, and the singular values would then really be variances of principal components, but I am unsure about that too. I am pasting the whole code of varimax for convenience, but the relevant part, and therefore my question about what is actually happening under the hood, is the for loop.
function (x, normalize = TRUE, eps = 1e-05)
{
    nc <- ncol(x)
    if (nc < 2)
        return(x)
    if (normalize) {
        sc <- sqrt(drop(apply(x, 1L, function(x) sum(x^2))))
        x <- x/sc
    }
    p <- nrow(x)
    TT <- diag(nc)
    d <- 0
    for (i in 1L:1000L) {
        z <- x %*% TT
        B <- t(x) %*% (z^3 - z %*% diag(drop(rep(1, p) %*% z^2))/p)
        sB <- La.svd(B)
        TT <- sB$u %*% sB$vt
        dpast <- d
        d <- sum(sB$d)
        if (d < dpast * (1 + eps))
            break
    }
    z <- x %*% TT
    if (normalize)
        z <- z * sc
    dimnames(z) <- dimnames(x)
    class(z) <- "loadings"
    list(loadings = z, rotmat = TT)
}
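For intuition, here is a small sketch (my own illustration, not from the R sources): the raw varimax criterion is the sum, over factors, of the variance of the squared loadings, and the quantity d = sum(sB$d) is used as a convergence measure for this maximization. A toy check with stats::varimax:
varimax_criterion <- function(z) sum(apply(z^2, 2, var))  # variance of squared loadings per column
set.seed(1)
L <- matrix(rnorm(20), 10, 2)       # toy loadings matrix
r <- varimax(L, normalize = FALSE)  # rotate without Kaiser normalization
varimax_criterion(unclass(r$loadings)) >= varimax_criterion(L)  # should be TRUE: rotation increases the criterion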
Edit: The algorithm is described in the book "Factor Analysis of Data Matrices" by Paul Horst (published by Holt, Rinehart and Winston), and the actual sources can be found therein. This book is also cited in the documentation of the varimax function in R.
I calculated the integral of the product of a Gaussian density and some function.
First, I did it with the function int2() from the rmutil package.
Then, I did it with Gauss-Hermite quadrature points.
The two results I obtained are different.
Should I consider the Gauss-Hermite result the correct one and the numerical integration only an approximation?
I provide an example below:
1. rmutil::int2()
library(rmutil)
Sig <- matrix(c(0.2^2, 0, 0, 0.8^2), ncol = 2)
Mu <- c(2, 0)
to.integrate <- function(B0, B1) {
  first.int <- 1/0.8 * (1.2 * exp(B0 + B1 * 0.5))^(-1/0.8) * gamma(1/0.8)
  B <- matrix(c(B0, B1), ncol = 1)
  multi.norm <- 1 / (2 * pi * det(Sig)^(1/2)) *
    exp(-0.5 * t(B - Mu) %*% solve(Sig) %*% (B - Mu))
  return(first.int %*% multi.norm)
}
result_int2 <- int2(to.integrate, a = c(-Inf, -Inf), b = c(Inf, Inf),
                    eps = 1.0e-6, max = 16, d = 5)
2. Compute multivariate Gaussian quadrature points:
library(statmod)
mgauss.hermite <- function(n, mu, sigma) {
  dm <- length(mu)
  gh <- gauss.quad(n, 'hermite')
  gh <- cbind(gh$nodes, gh$weights)
  idx <- as.matrix(expand.grid(rep(list(1:n), dm)))
  pts <- matrix(gh[idx, 1], nrow(idx), dm)
  wts <- apply(matrix(gh[idx, 2], nrow(idx), dm), 1, prod)
  eig <- eigen(sigma)
  rot <- eig$vectors %*% diag(sqrt(eig$values))
  pts <- t(rot %*% t(pts) + mu)
  return(list(points = pts, weights = wts))
}
nod_wei <- mgauss.hermite(10, mu=Mu, sigma=Sig)
gfun <- function(B0, B1) {
  first.int <- 1/0.8 * (1.2 * exp(B0 + B1 * 0.5))^(-1/0.8) * gamma(1/0.8)
  return(first.int)
}
result_GH <- sum(gfun(nod_wei$points[, 1], nod_wei$points[, 2]) * nod_wei$weights)/pi
result_int2
result_GH
The mistake came from the way the points were calculated in the mgauss.hermite function: physicists' Gauss-Hermite nodes are designed for the weight exp(-x^2), so mapping them to an N(mu, Sigma) density requires an extra factor of sqrt(2).
I therefore changed the decomposition of the Sigma matrix to a Cholesky decomposition multiplied by the square root of 2.
With that change, the results of the two methods became very similar.
Below is the corrected mgauss.hermite function:
mgauss.hermite <- function(n, mu, sigma) {
  dm <- length(mu)
  gh <- gauss.quad(n, 'hermite')
  gh <- cbind(gh$nodes, gh$weights)
  idx <- as.matrix(expand.grid(rep(list(1:n), dm)))
  pts <- matrix(gh[idx, 1], nrow(idx), dm)
  wts <- apply(matrix(gh[idx, 2], nrow(idx), dm), 1, prod)
  rot <- sqrt(2) * t(chol(sigma))  # Cholesky factor scaled by sqrt(2)
  pts <- t(rot %*% t(pts) + mu)
  return(list(points = pts, weights = wts))
}
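Re-running the quadrature step from above with the corrected function gives a result close to result_int2. (The division by pi is pi^(dm/2) with dm = 2: each Gauss-Hermite dimension integrates against exp(-x^2), contributing a factor of sqrt(pi) to the total weight.)
nod_wei <- mgauss.hermite(10, mu = Mu, sigma = Sig)
result_GH <- sum(gfun(nod_wei$points[, 1], nod_wei$points[, 2]) * nod_wei$weights) / pi
result_GH  # now close to result_int2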
Here is the current script I have:
delta <- 1/52
T <- 0.5
S0 <- 25
sigma <- 0.30
K <- 25
r <- 0.05
n <- 1000000
m <- T/delta
S <- numeric(m + 1)
S[1] <- S0
#Payoff of the Asian option
asian_option_price <- function() {
  for (j in 1:m) {
    W <- rnorm(1)
    S[j + 1] <- S[j] * exp((r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W)
  }
  Si.bar <- mean(S)
  exp(-r * T) * max(Si.bar - K, 0)
}
#Loops (raply comes from the plyr package)
library(plyr)
C <- raply(n, asian_option_price(), .progress = "text")
My issue is that I need to use "-W" for a second simulation right after this one is done. The way the script is written, "W" is generated inside the loop, which makes it impossible (I think) to use the corresponding "-W" afterwards. I think I need an independent matrix filled with rnorm(), e.g. mat <- matrix(rnorm(m*n, mean = 0, sd = 1), m, n), so that I can simply use -mat in the second simulation. I don't understand how to take "W" out of the loop while still using its corresponding matrix. Any help would be very useful. Thanks!
Your idea to preallocate all the random numbers is correct. You could then loop over the individual entries. However, it is faster to go for a vectorized approach:
delta <- 1/52
T <- 0.5
S0 <- 25
sigma <- 0.30
K <- 25
r <- 0.05
n <- 100000
m <- ceiling(T/delta)
W <- matrix(rnorm(n*m), nrow = m, ncol = n)
S <- apply(exp((r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W), 2, cumprod)
S <- S0 * rbind(1, S)
Si_bar <- apply(S, 2, mean)
mean(pmax(Si_bar - K, 0)) * exp(-r * T)
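For the antithetic run the question asks about, the same W can be reused with its sign flipped; a sketch continuing from the code above (W and Si_bar as defined there):
#Antithetic paths: same W, opposite sign
S2 <- apply(exp((r - 0.5 * sigma^2) * delta - sigma * sqrt(delta) * W), 2, cumprod)
S2 <- S0 * rbind(1, S2)
Si_bar2 <- apply(S2, 2, mean)
#Average the two discounted payoffs pathwise for the antithetic estimator
mean((pmax(Si_bar - K, 0) + pmax(Si_bar2 - K, 0))/2) * exp(-r * T)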
I am trying to implement the Regularized Latent Semantic Indexing (RLSI) algorithm on R.
The original paper can be found here:
http://research.microsoft.com/en-us/people/hangli/sigirfp372-wang.pdf
Below is my code.
Here, I generate a matrix D from two matrices U and V. Each column of U corresponds to a topic vector and is made to be sparse. After that, I apply RLSI to the D matrix to see if I can factorize it into two matrices, one of which has sparse vectors like U.
However, the resulting U is far from sparse; in fact, every element of it is nonzero.
Is there something wrong with my code?
Thank you very much in advance.
library(magrittr)
# functions
updateU <- function(D, U, V) {
  S <- V %*% t(V)
  R <- D %*% t(V)
  for (m in 1:M) {
    u_m <- rep(0, K)
    u_previous <- u_m
    diff_u <- 100
    while (diff_u > 0.1) {
      for (k in 1:K) {
        w_mk <- R[m, k] - S[k, -k] %*% U[m, -k]
        in_hinge <- (abs(w_mk) - 0.5 * lambda_1)
        u_m[k] <- (ifelse(in_hinge > 0, in_hinge, 0) * ifelse(w_mk >= 0, 1, -1)) / S[k, k]
      }
      diff_u <- sum(u_m - u_previous)
      u_previous <- u_m
    }
    U[m, ] <- u_m
  }
  return(U)
}
updateV <- function(D, U, V) {
  Sigma <- solve(t(U) %*% U + lambda_2 * diag(K))
  Phi <- t(U) %*% D
  V <- Sigma %*% Phi
  return(V)
}
# Set constants
M <- 5000
N <- 1000
K <- 30
lambda_1 <- 1
lambda_2 <- 0.5
# Create D
originalU <- c(rpois(50000, lambda = 10), rep(0, 100000)) %>% sample(., 150000) %>% matrix(., M, K)
originalV <- rpois(30000, lambda = 5) %>% sample(., 30000) %>% matrix(., K, N)
D <- originalU %*% originalV
# Initialize U and V
V <- matrix(rpois(30000, lambda = 5), K, N)
U <- matrix(0, M, K)
# Run RLSI (iterate 100 times for now)
for (t in 1:100) {
  cat(t, ": ")
  U <- updateU(D, U, V)
  V <- updateV(D, U, V)
  loss <- sum((D - U %*% V)^2)
  cat(loss, "\n")
}
I've got it: each row in U has to be reset to a zero vector every time the updateU function is run.
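For reference, a sketch of updateU with that fix applied. Zeroing row m at the start of each pass is my reading of the fix; I also take the absolute change in the convergence test, since the signed sum in the original can cancel:
updateU <- function(D, U, V) {
  S <- V %*% t(V)
  R <- D %*% t(V)
  for (m in 1:M) {
    U[m, ] <- rep(0, K) #reset the row so U[m, -k] does not keep stale values
    u_m <- rep(0, K)
    u_previous <- u_m
    diff_u <- 100
    while (diff_u > 0.1) {
      for (k in 1:K) {
        w_mk <- R[m, k] - S[k, -k] %*% U[m, -k]
        in_hinge <- abs(w_mk) - 0.5 * lambda_1
        u_m[k] <- (ifelse(in_hinge > 0, in_hinge, 0) * ifelse(w_mk >= 0, 1, -1)) / S[k, k]
      }
      diff_u <- sum(abs(u_m - u_previous)) #absolute change avoids sign cancellation
      u_previous <- u_m
    }
    U[m, ] <- u_m
  }
  return(U)
}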