How to find when a matrix converges with a loop - r

I've been given a matrix:
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)
Using the functions mpow, rows_equal, and matrices_equal, I want to find when P^n converges; in other words, the n at which all the rows of the matrix are equal and P^n = P^(n+1).
By just using the functions I have managed to deduce that the matrix converges at around n = 19-21.
However, I want to find the right n using a loop. Below are the functions mpow, rows_equal and matrices_equal. I know they can be written differently, but please keep them as they are.
mpow <- function(P, n, d = 4) {
  if (n == 0) diag(nrow(P))
  else if (n == 1) P
  else P %*% mpow(P, n - 1)
}
rows_equal <- function(P, d = 4) {
  P_new <- trunc(P * 10^d)
  for (k in 2:nrow(P_new)) {
    if (!all(P_new[1, ] == P_new[k, ])) {
      return(FALSE)
    }
  }
  return(TRUE)
}
matrices_equal <- function(A, B, d = 4) {
  A_new <- trunc(A * 10^d)
  B_new <- trunc(B * 10^d)
  if (all(A_new == B_new)) TRUE else FALSE
}
Now, to write the loop, I think it should go something along these lines.
First, create a function like so:
when_converged <- function(P) {...}
and then loop with
for (n in 1:50)
to try values of n up to, for example, 50.
However, I don't know how to write the code correctly. Can anyone help me with that?
Thank you for reading my question.

Actually, a much better way is to do this:
## transition probability matrix
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)
## a function to find stationary distribution
stydis <- function(P, tol = 1e-16) {
  n <- 1; e <- 1
  P0 <- P ## transition matrix P0
  while (e > tol) {
    P <- P %*% P0 ## resulting matrix P
    e <- max(abs(sweep(P, 2, colMeans(P))))
    n <- n + 1
  }
  cat(paste("convergence after", n, "steps\n"))
  P[1, ]
}
Then when you call the function:
stydis(P)
# convergence after 71 steps
# [1] 0.002590674 0.025906736 0.116580311 0.310880829 0.272020725 0.272020725
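As a cross-check on the iterated result, the stationary distribution can also be computed directly as the left eigenvector of P for eigenvalue 1. This is a minimal sketch, not part of the original answer; the name stationary_eigen is hypothetical, and it assumes the chain has a single stationary distribution.

```r
## transition probability matrix (same as above)
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)

## hypothetical helper: stationary distribution from the left eigenvector
stationary_eigen <- function(P) {
  ev <- eigen(t(P))                     # left eigenvectors of P are eigenvectors of t(P)
  idx <- which.min(abs(ev$values - 1))  # pick the eigenvalue closest to 1
  v <- Re(ev$vectors[, idx])            # drop the (zero) imaginary parts
  v / sum(v)                            # normalise so the entries sum to 1
}

pi_hat <- stationary_eigen(P)
pi_hat
max(abs(pi_hat %*% P - pi_hat))         # should be close to zero
```

If the two approaches agree, pi_hat should match the vector printed by stydis(P) up to rounding.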
The function stydis essentially repeats
P <- P %*% P0
until P converges. Convergence is judged numerically by the max norm (the largest absolute entry) of the discrepancy matrix:
sweep(P, 2, colMeans(P))
which subtracts each column's mean from that column, so every entry is zero exactly when all rows are equal. Once the max norm drops to tol or below (1e-16 by default), the loop stops.
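To make the convergence measure concrete, here is a small sketch on a made-up 2x2 matrix (the matrix A is purely illustrative and not from the question). It shows what sweep with colMeans produces and how the largest absolute entry is obtained, either with max(abs(.)) or with base R's norm(., type = "M").

```r
## illustrative matrix whose rows are not yet equal
A <- matrix(c(0.2, 0.8,
              0.4, 0.6), nrow = 2, byrow = TRUE)

## subtract each column's mean from that column
D <- sweep(A, 2, colMeans(A))
D

## largest absolute entry of the discrepancy matrix
e <- max(abs(D))
e_norm <- norm(D, type = "M")  # same quantity via the "maximum modulus" norm
c(e, e_norm)
```

Here the column means are 0.3 and 0.7, so every entry of D is 0.1 in absolute value.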
As you can see, convergence takes 71 steps. Now, we can obtain faster "convergence" by controlling tol (tolerance):
stydis(P, tol = 1e-4)
# convergence after 17 steps
# [1] 0.002589361 0.025898057 0.116564506 0.310881819 0.272068444 0.271997814
But if you check:
mpow(P, 17)
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.002589361 0.02589806 0.1165645 0.3108818 0.2720684 0.2719978
# [2,] 0.002589415 0.02589722 0.1165599 0.3108747 0.2720749 0.2720039
# [3,] 0.002589738 0.02589714 0.1165539 0.3108615 0.2720788 0.2720189
# [4,] 0.002590797 0.02590083 0.1165520 0.3108412 0.2720638 0.2720515
# [5,] 0.002592925 0.02592074 0.1166035 0.3108739 0.2719451 0.2720638
# [6,] 0.002588814 0.02590459 0.1166029 0.3109419 0.2720166 0.2719451
Only about the first 4 decimal places agree across the rows, because you set tol = 1e-4.
A double-precision floating-point number carries roughly 16 significant decimal digits, so I would suggest you use tol = 1e-16 for a reliable convergence test.
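For the loop the question literally asks for, the following is a minimal sketch that reuses the question's helper functions (with the unbalanced parentheses in mpow removed). The max_n argument and the NA return value are assumptions added for illustration; convergence is defined as in the question, to d truncated decimal places.

```r
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)

## helper functions from the question
mpow <- function(P, n, d = 4) {
  if (n == 0) diag(nrow(P))
  else if (n == 1) P
  else P %*% mpow(P, n - 1)
}
rows_equal <- function(P, d = 4) {
  P_new <- trunc(P * 10^d)
  for (k in 2:nrow(P_new)) {
    if (!all(P_new[1, ] == P_new[k, ])) return(FALSE)
  }
  return(TRUE)
}
matrices_equal <- function(A, B, d = 4) {
  A_new <- trunc(A * 10^d)
  B_new <- trunc(B * 10^d)
  if (all(A_new == B_new)) TRUE else FALSE
}

## smallest n <= max_n for which all rows of P^n are equal and P^n equals P^(n+1)
when_converged <- function(P, max_n = 50, d = 4) {
  for (n in 1:max_n) {
    Pn <- mpow(P, n)
    if (rows_equal(Pn, d) && matrices_equal(Pn, Pn %*% P, d)) {
      return(n)
    }
  }
  NA_integer_  # no convergence within max_n steps
}

n_conv <- when_converged(P)
n_conv
```

Because mpow is recursive, each iteration recomputes the power from scratch; for larger max_n it would be cheaper to keep a running product, as stydis does.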

Related

Vectorizing many for loops

I have a summation that I can calculate using four for loops but I wonder if this can be simplified, maybe using a vectorized function, to reduce the computation time. Something similar to the Kronecker product (in R: kronecker(x, x)), or maybe something using outer?
The summation is:
$$f_{ij} = \sum_{k \in E} \sum_{l \in E} h_{ik} \, h_{jl} \, g_{kl}$$
where E is the sample space of integers from 1 to 9. The indices i and j are also integers from 1 to 9.
So f, g, and h are all matrices of dimension 9x9.
The h matrix is fixed and I have that but I am simulating g many times and then I choose the one that minimizes another function. The problem is, one thousand simulations, which is too few, takes about 1 second. I really want to try a million, but that many would take a long time.
I have the for loops in a function:
sim <- function(y, nreps, h) {
  G <- vector("list", nreps) # list containing random values from Dirichlet distribution
  F <- vector("list", nreps) # list containing the f matrices
  M <- vector("numeric", nreps) # vector to store the results
  require(gtools)
  for (n in 1:nreps) {
    f <- matrix(0, nrow = 9, ncol = 9) # initialize f
    g <- gtools::rdirichlet(9, rep(1, 9)) # simulate g
    for (i in 1:9) {
      for (j in 1:9) {
        for (k in 1:9) {
          for (l in 1:9) {
            f[i, j] <- f[i, j] + h[i, k] * h[j, l] * g[k, l] # summation (see above)
          }
        }
      }
    }
    F[[n]] <- f # store f matrix
    G[[n]] <- g # store g matrix
    M[n] <- sum((y - f)^2) # sum of squared differences between y and f
  }
  m <- which.min(M) # which M is the minimum?
  return(list(g = G[[m]], m = M[m]))
}
And I call the function with
sim(y=f.y1, nreps=1000, h=x)
Here is the data:
> dput(f.y1)
structure(c(0.0182002022244692, 0.0121334681496461, 0.0101112234580384,
0, 0, 0, 0, 0, 0, 0.0485338725985844, 0.0940343781597573, 0.112234580384226,
0.0434782608695652, 0.00910010111223458, 0.00101112234580384,
0, 0, 0, 0.0333670374115268, 0.110212335692619, 0.132457027300303,
0.0808897876643074, 0.0222446916076845, 0.0070778564206269, 0.00101112234580384,
0, 0, 0.0070778564206269, 0.0202224469160768, 0.0596562184024267,
0.0616784630940344, 0.0262891809908999, 0.0070778564206269, 0,
0, 0, 0.00202224469160768, 0.00505561172901921, 0.0151668351870576,
0.0182002022244692, 0.0111223458038423, 0.00404448938321537,
0, 0, 0, 0.00202224469160768, 0.00404448938321537, 0.00505561172901921,
0.00505561172901921, 0.00202224469160768, 0.00202224469160768,
0, 0, 0, 0, 0.00202224469160768, 0.00202224469160768, 0.00202224469160768,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), class = "table", dim = c(9L, 9L), dimnames = structure(list(
c("0", "1", "2", "3", "4", "5", "6", "7", "8"), c("0", "1",
"2", "3", "4", "5", "6", "7", "8")), names = c("", "")))
> dput(x)
structure(c(0.61, 0.16, 0.03, 0.005, 0, 0, 0, 0, 0, 0.32, 0.61,
0.16, 0.03, 0.005, 0, 0, 0, 0, 0.06, 0.16, 0.61, 0.16, 0.03,
0.005, 0, 0, 0, 0.01, 0.06, 0.16, 0.61, 0.16, 0.03, 0.01, 0,
0, 0, 0.01, 0.03, 0.16, 0.61, 0.16, 0.03, 0.01, 0, 0, 0, 0.01,
0.03, 0.16, 0.61, 0.16, 0.06, 0.01, 0, 0, 0, 0.005, 0.03, 0.16,
0.61, 0.16, 0.06, 0, 0, 0, 0, 0.005, 0.03, 0.16, 0.61, 0.32,
0, 0, 0, 0, 0, 0.005, 0.03, 0.16, 0.61), dim = c(9L, 9L))
And you'll need to load the gtools package for the rdirichlet function. Thanks heaps!
library(gtools)
Luckily this particular example is just "simple" matrix multiplication, so can easily be vectorised with:
sim1 <- function(y, nreps, h) {
  G <- vector("list", nreps) # list containing random values from Dirichlet distribution
  F <- vector("list", nreps) # list containing the f matrices
  M <- vector("numeric", nreps) # vector to store the results
  require(gtools)
  for (n in 1:nreps) {
    g <- gtools::rdirichlet(9, rep(1, 9)) # simulate g
    f <- h %*% g %*% t(h)
    F[[n]] <- f # store f matrix
    G[[n]] <- g # store g matrix
    M[n] <- sum((y - f)^2) # sum of squared differences between y and f
  }
  m <- which.min(M) # which M is the minimum?
  return(list(g = G[[m]], m = M[m]))
}
Run function for comparison
#Original version
set.seed(0)
system.time(a <- sim(y=f.y1, nreps=1000, h=x))
# user system elapsed
# 0.97 0.03 1.00
#revised version
set.seed(0)
system.time(b <- sim1(y=f.y1, nreps=1000, h=x))
# user system elapsed
# 0.01 0.00 0.02
#Check they give the same answer
all.equal(a, b)
#[1] TRUE
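The reason the four loops collapse is that the double sum over k and l of h[i,k] * g[k,l] * h[j,l] is exactly entry (i, j) of h %*% g %*% t(h). The sketch below checks this on random matrices, so it does not need gtools or the data above; the names f_loop, f_mat and f_tcp are illustrative. It also shows tcrossprod, a base R function that computes x %*% t(y) without forming the transpose explicitly.

```r
set.seed(1)
h <- matrix(runif(81), nrow = 9, ncol = 9)  # stand-in for the fixed h matrix
g <- matrix(runif(81), nrow = 9, ncol = 9)  # stand-in for one simulated g

## explicit quadruple loop, as in the question
f_loop <- matrix(0, nrow = 9, ncol = 9)
for (i in 1:9) for (j in 1:9) for (k in 1:9) for (l in 1:9) {
  f_loop[i, j] <- f_loop[i, j] + h[i, k] * h[j, l] * g[k, l]
}

## matrix form, as in the answer
f_mat <- h %*% g %*% t(h)

## same product via tcrossprod(x, y), which computes x %*% t(y)
f_tcp <- tcrossprod(h %*% g, h)

all.equal(f_loop, f_mat)
all.equal(f_mat, f_tcp)
```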

constrOptim initial values

I'm attempting to use the constrOptim() function in R to optimise:
2x + 2y + 3z
subject to:
-2x + y + z <= 1
4x - y + 3z <= 3
x, y, z >= 0
What I have so far is this:
ui = matrix(c(2,-1,-1,-4, 1,-3, 1, 0, 0, 0, 1, 0, 0, 0, 1),
nrow = 5,
byrow = T)
ci = c(-1, -3, 0, 0, 0)
theta = c(0, 1, 0)
constrOptim(
theta = theta,
f = func,
ui = ui,
ci = ci)
This gives me the error that "initial value is not in the interior of the feasible region". However, if I run the following as a test:
ui = matrix(c(2,-1,-1,-4, 1,-3, 1, 0, 0, 0, 1, 0, 0, 0, 1),
nrow = 5,
byrow = T)
ci = c(-1, -3, 0, 0, 0)
theta = c(0, 1, 0)
ui %*% theta - ci
I get (0 4 0 1 0), which is definitely >=0.
My question is why do I get an error telling me that ui %*% theta - ci is not >= 0, when it clearly is? What am I missing?
Edit: managed to sort it out thanks to Stéphane Laurent.
Any ideas how best to plot the feasible region in R? Any useful packages?
The starting value must be in the interior of the feasible region, so you need > 0 and not >= 0. You can use theta = c(0.1, 1, 0.1)
ui = matrix(c(2,-1,-1,-4, 1,-3, 1, 0, 0, 0, 1, 0, 0, 0, 1),
nrow = 5,
byrow = T)
ci = c(-1, -3, 0, 0, 0)
theta = c(0.1, 1, 0.1)
all(ui %*% theta - ci > 0) # TRUE
constrOptim(
theta = theta,
f = function(xyz) c(crossprod(c(2,2,3), xyz)),
grad = NULL,
ui = ui,
ci = ci)
BTW, it seems obvious to me that the solution is c(0,0,0).
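The difference between the two starting values can be checked before calling constrOptim. This is a small sketch; the helper name is_interior is hypothetical, and it simply tests the strict inequality ui %*% theta - ci > 0 described above.

```r
ui <- matrix(c(2, -1, -1, -4, 1, -3, 1, 0, 0, 0, 1, 0, 0, 0, 1),
             nrow = 5,
             byrow = TRUE)
ci <- c(-1, -3, 0, 0, 0)

## hypothetical helper: TRUE only if every constraint holds strictly
is_interior <- function(theta, ui, ci) {
  all(ui %*% theta - ci > 0)
}

on_boundary <- is_interior(c(0, 1, 0), ui, ci)      # some constraints are exactly 0
strictly_in <- is_interior(c(0.1, 1, 0.1), ui, ci)  # every constraint is positive
c(on_boundary, strictly_in)
```

The first starting value satisfies the constraints only with equality in three rows, which is why it is rejected.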
The cause of the error is explained in the answer by Stéphane Laurent.
An alternative to constrOptim is fmincon from the pracma package, which runs without error even when the initial value lies on the boundary, i.e., theta = c(0,1,0):
ui = matrix(c(2,-1,-1,-4, 1,-3, 1, 0, 0, 0, 1, 0, 0, 0, 1),
nrow = 5,
byrow = T)
ci = c(-1, -3, 0, 0, 0)
theta = c(0, 1, 0)
func <- function(v) crossprod(c(2,2,3),v)
res <- pracma::fmincon(theta,
f = func,
A = -ui,
b = -ci)
such that
> res
$par
[1] 0 0 0
$value
[,1]
[1,] 0
$convergence
[1] 0
$info
$info$lambda
$info$lambda$lower
[,1]
[1,] 0
[2,] 0
[3,] 0
$info$lambda$upper
[,1]
[1,] 0
[2,] 0
[3,] 0
$info$lambda$ineqlin
[1] 0 0 2 2 3
$info$grad
[,1]
[1,] 2
[2,] 2
[3,] 3
$info$hessian
[,1] [,2] [,3]
[1,] 1 0.00 0
[2,] 0 0.03 0
[3,] 0 0.00 1

How to return an element from the function that return a list in r?

I built my own function like this:
library(VineCopula)
Matrix <- c(5, 2, 3, 1, 4,
0, 2, 3, 4, 1,
0, 0, 3, 4, 1,
0, 0, 0, 4, 1,
0, 0, 0, 0, 1)
Matrix <- matrix(Matrix, 5, 5)
family <-par <- par2 <- list()
for(i in 1:3){
# define R-vine pair-copula family matrix
family[[i]] <- c(0, 1, 3, 4, 4,
0, 0, 3, 4, 1,
0, 0, 0, 4, 1,
0, 0, 0, 0, 3,
0, 0, 0, 0, 0)
family[[i]] <- matrix(family[[i]], 5, 5)
# define R-vine pair-copula parameter matrix
par[[i]] <- c(0, 0.2, 0.9, 1.5, 3.9,
0, 0, 1.1, 1.6, 0.9,
0, 0, 0, 1.9, 0.5,
0, 0, 0, 0, 4.8,
0, 0, 0, 0, 0)
par[[i]] <- matrix(par[[i]], 5, 5)
# define second R-vine pair-copula parameter matrix
par2[[i]] <- matrix(0, 5, 5)
}
my_func <- function(Matrix, family, par, par2) {
  x <- list()
  for (i in 1:3) {
    x[[i]] <- RVineMatrix(Matrix = Matrix, family = family[[i]], par = par[[i]], par2 = par2[[i]])
  }
  x
}
This returns a list. How can I then extract a specific element from the result of my function? For example, how can I get my_func$Matrix or my_func$par[1]?
Note: family <- par <- par2 <- list(). I also tried return(x[i]$family[i]), but it returns NULL.
To run the function:
y <- my_func(Matrix = Matrix,family = family,par = par,par2 = par2)
> y$Matrix
NULL
Your function returns a list whose elements are objects of class RVineMatrix, each with an element named Matrix. See the help page ?RVineMatrix, section Value. So you need y[[1]]$Matrix.
y <- my_func(Matrix, family, par, par2)
class(y)
[1] "list"
class(y[[1]])
[1] "RVineMatrix"
y[[1]]$Matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 5 0 0 0 0
[2,] 2 2 0 0 0
[3,] 3 3 3 0 0
[4,] 1 4 4 4 0
[5,] 4 1 1 1 1
You'll need to assign the outcome of that function to something. For example:
y <- my_func(Matrix = ... , family = ..., par = ..., par2 = ...)
where ... above are the values of your arguments. Based on your my_func definition, the end result is an unnamed list so to access its elements you use:
y[[1]]
y[[2]]
y[[3]]
or just y to access all elements.
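To pull the same named element out of every item at once, lapply can be used. The sketch below uses a plain list of lists as a stand-in for the list of RVineMatrix objects, so it runs without VineCopula; the toy contents are made up for illustration only.

```r
## stand-in for the result of my_func(): an unnamed list of three objects,
## each with elements named Matrix and par
y <- list(
  list(Matrix = diag(2), par = matrix(0.1, 2, 2)),
  list(Matrix = diag(2), par = matrix(0.2, 2, 2)),
  list(Matrix = diag(2), par = matrix(0.3, 2, 2))
)

outer_lookup <- y$Matrix  # NULL: the outer list has no element named Matrix
first_matrix <- y[[1]]$Matrix  # element of the first object
all_pars <- lapply(y, function(obj) obj$par)  # the par element of every object

is.null(outer_lookup)
length(all_pars)
```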

All rows in a matrix equal

I'm trying to get the command rows_equal to work, but I'm not managing to do so.
The matrix in question is:
P <- matrix(c(0, 0, 0, 0.5, 0, 0.5, 0.1, 0.1, 0, 0.4, 0, 0.4, 0, 0.2, 0.2, 0.3, 0, 0.3, 0, 0, 0.3, 0.5, 0, 0.2, 0, 0, 0, 0.4, 0.6, 0, 0, 0, 0, 0, 0.4, 0.6), nrow = 6, ncol = 6, byrow = TRUE)
What I'm trying to figure out is how large "n" has to be in P^n that will make all the rows in the matrix equal.
mpow <- function(P, n) {
  if (n == 0) return(diag(nrow(P)))
  else if (n == 1) return(P)
  else return(P %*% mpow(P, n - 1))
}
rows_equal <- function(P, d = 4) {
  P_new <- trunc(P * 10^d)
  for (k in 2:nrow(P_new)) {
    if (!all(P_new[1, ] == P_new[k, ])) {
      return(FALSE)
    }
  }
  return(TRUE)
}
This is what I enter into RStudio; however, I don't see what I'm doing wrong. Isn't the command rows_equal supposed to give us FALSE or TRUE?
Thank you for reading my question.
Confused student with a large headache.
Aside from formatting, I made two changes to your code:
you forgot to use your mpow function and used trunc(P * 10^d) instead.
I replaced == with all.equal, which allows for some numerical imprecision.
Using all.equal to compare numerical values is typically preferable to == in these situations.
rows_equal <- function(P, d = 4) {
  P_new <- mpow(P, d)
  for (k in 2:nrow(P_new)) {
    # all.equal returns a character vector (not FALSE) on mismatch, so wrap it in isTRUE
    if (!isTRUE(all.equal(P_new[1, ], P_new[k, ]))) {
      return(FALSE)
    }
  }
  return(TRUE)
}
This outputs
> rows_equal(P, 10)
[1] FALSE
> rows_equal(P, 50)
[1] TRUE
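A short illustration of why all.equal is preferred over == for floating-point values, and why its result is best wrapped in isTRUE: when the values differ, all.equal returns a description of the difference rather than FALSE. The variable names are illustrative.

```r
a <- 0.1 + 0.2
b <- 0.3

exact_match <- a == b                       # exact comparison of doubles
close_match <- isTRUE(all.equal(a, b))      # comparison within a small tolerance
far_apart   <- isTRUE(all.equal(1, 1.1))    # genuinely different values

c(exact_match, close_match, far_apart)
is.character(all.equal(1, 1.1))             # the mismatch is reported as text
```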

Creating Monte Carlo data for Dummy Variables in R

I'm setting up a Monte Carlo simulation, and I have been trying to create a set of dummy variables for 180 countries and 12 time periods. Given the large number of data points, is there a shorter way to create dummy variables for time and country fixed effects without pulling them out of an Excel file?
For example:
F.T(1) 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0...... (extends until 180 countries)
F.T(2) 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0...... (extends until 180 countries)
Any help would be greatly appreciated.
Using replicate with your random number generator of choice should do the trick.
Here is an example using a simple binomial distribution with prob = 1/2:
replicate(12, rbinom(180, 1, .5), simplify=FALSE)
I think it might be easier/faster to create all data with rbinom first and then convert it into a matrix instead of calling rbinom 12 times. That is:
set.seed(45)
t <- rbinom(180*12, 1, 0.5)
dim(t) <- c(180, 12)
Just to see if there's a difference, here's a benchmark:
# I use simplify = TRUE here.
FUN1 <- function(n, a) {
  set.seed(45)
  replicate(n, rbinom(a, 1, .5), simplify = TRUE)
}
FUN2 <- function(n, a) {
  set.seed(45)
  t <- rbinom(n * a, 1, 0.5)
  dim(t) <- c(a, n)
  t
}
require(rbenchmark)
benchmark(t1 <- FUN1(1000, 12000), t2 <- FUN2(1000, 12000),
order="elapsed", replications=5)
# test replications elapsed relative user.self sys.self
# 2 t2 <- FUN2(1000, 12000) 5 3.991 1.000 3.859 0.111
# 1 t1 <- FUN1(1000, 12000) 5 5.337 1.337 4.785 0.472
identical(t1, t2)
# [1] TRUE
To answer your question in comment:
w <- rep(diag(12)[1:9, ], N)
dim(w) <- c(9, 12*N)
w <- t(w)
colnames(w) <- paste0("t", 1:9)
Or even better:
w2 <- do.call(rbind, replicate(N, diag(12)[, 1:9], simplify = FALSE))
colnames(w2) <- paste0("t", 1:9)
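For deterministic time and country fixed-effect dummies (rather than random 0/1 draws), one option is to build the panel index with expand.grid and let model.matrix generate the indicator columns. This is a sketch under assumed names (panel, time_dummies, country_dummies), with rows ordered so that the 12 periods of country 1 come first, then those of country 2, and so on, as in the pattern shown in the question.

```r
n_countries <- 180
n_periods <- 12

## one row per country-period pair; period varies fastest
panel <- expand.grid(
  period  = factor(seq_len(n_periods)),
  country = factor(seq_len(n_countries))
)

## full sets of indicator columns (no intercept, so no level is dropped)
time_dummies    <- model.matrix(~ period - 1, data = panel)
country_dummies <- model.matrix(~ country - 1, data = panel)

dim(time_dummies)     # 2160 rows, 12 columns
dim(country_dummies)  # 2160 rows, 180 columns
```

In a regression with an intercept, one level of each set would normally be dropped to avoid perfect collinearity.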

Resources