Finding the stationary distribution of a Markov process given a transition probability matrix - r

There have been two threads related to this issue on Stack Overflow:
How can I obtain stationary distribution of a Markov Chain given a transition probability matrix describes what a transition probability matrix is, and demonstrates how a stationary distribution is reached by taking powers of this matrix;
How to find when a matrix converges with a loop uses an R loop to determine when the matrix power converges.
The above is straightforward, but very expensive. If we have a transition matrix of order n, then each iteration computes a matrix-matrix multiplication at a cost of O(n^3).
Is there a more efficient way to do this? One thing that occurs to me is to use the eigendecomposition. A Markov matrix is known to:
be diagonalizable in the complex domain: A = E * D * E^{-1};
have a real eigenvalue of 1, with all other (complex) eigenvalues smaller than 1 in modulus.
The stationary distribution is the eigenvector associated with the eigenvalue 1, i.e., the first eigenvector.
Well, the theory is nice, but I can't get it to work. Taking the matrix P in the first linked question:
P <- structure(c(0, 0.1, 0, 0, 0, 0, 0, 0.1, 0.2, 0, 0, 0, 0, 0, 0.2,
0.3, 0, 0, 0.5, 0.4, 0.3, 0.5, 0.4, 0, 0, 0, 0, 0, 0.6, 0.4,
0.5, 0.4, 0.3, 0.2, 0, 0.6), .Dim = c(6L, 6L))
If I do:
Re(eigen(P)$vectors[, 1])
# [1] 0.4082483 0.4082483 0.4082483 0.4082483 0.4082483 0.4082483
What's going on? According to previous questions, the stationary distribution is:
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708

Well, to use the eigendecomposition, we need to work with t(P).
The definition of a transition probability matrix differs between probability / statistics and linear algebra. In statistics, all rows of P sum to 1, while in linear algebra, all columns of P sum to 1. In the statistics convention the stationary distribution x satisfies x'P = x', i.e., t(P) x = x, so x is an ordinary (right) eigenvector of t(P). So instead of eigen(P), we need eigen(t(P)):
e <- Re(eigen(t(P))$vectors[, 1])
e / sum(e)
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708
As we can see, we've only used the first eigenvector, i.e., the eigenvector of the largest eigenvalue. Therefore, there is no need to compute all eigenvalues / eigenvectors using eigen. The power method can be used to find the eigenvector of the largest eigenvalue. Let's implement this in R:
stydis1 <- function (A) {
  n <- dim(A)[1L]
  ## checking
  if (any(.rowSums(A, n, n) != 1))
    stop(" 'A' is not a Markov matrix")
  ## implement power method
  e <- runif(n)
  oldnorm <- sqrt(c(crossprod(e)))
  repeat {
    e <- crossprod(A, e)
    newnorm <- sqrt(c(crossprod(e)))
    if (abs(newnorm / oldnorm - 1) < 1e-8) break
    e <- e / newnorm
    oldnorm <- newnorm
  }
  ## rescale `e` so that it sums up to 1
  c(e / sum(e))
}
stydis1 (P)
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708
And the result is correct.
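As a quick sanity check (my addition, not part of the original answer), the returned vector should be numerically invariant under P:
e <- stydis1(P)
max(abs(e %*% P - e))
# should be tiny, on the order of the stopping tolerance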
In fact, we don't have to exploit the eigendecomposition at all. We can adapt the method used in your second linked question. There, we took matrix powers, which is expensive, as you commented; but why not recast it as matrix-vector multiplication?
stydis2 <- function (A) {
  n <- dim(A)[1L]
  ## checking
  if (any(.rowSums(A, n, n) != 1))
    stop(" 'A' is not a Markov matrix")
  ## direct computation
  b <- A[1, ]
  oldnorm <- sqrt(c(crossprod(b)))
  repeat {
    b <- crossprod(A, b)
    newnorm <- sqrt(c(crossprod(b)))
    if (abs(newnorm / oldnorm - 1) < 1e-8) break
    oldnorm <- newnorm
  }
  ## return stationary distribution
  c(b)
}
stydis2 (P)
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708
We start from an arbitrary initial distribution, say A[1, ], and iteratively apply the transition matrix until the distribution converges. Again, the result is correct.
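How fast both functions converge is governed by the modulus of the second-largest eigenvalue of P (the largest is 1). A quick way to inspect it, as a side check of my own:
## power iteration error shrinks roughly by this factor per step
sort(Mod(eigen(P, only.values = TRUE)$values), decreasing = TRUE)[2]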

Your vector y = Re(eigen(P)$vectors[, 1]) is not a distribution (since it doesn't add up to one) and solves Py = y, not x'P = x. The solution from your linked Q&A does approximately solve the latter:
x = c(0.00259067357512953, 0.0259067357512953, 0.116580310880829,
0.310880829015544, 0.272020725388601, 0.272020725388601)
all(abs(x %*% P - x) < 1e-10) # TRUE
By transposing P, you can use your eigenvalue approach:
x2 = Re(eigen(t(P))$vectors[, 1])
x2 <- x2/sum(x2)
(function(x) all(abs(x %*% P - x) < 1e-10))(x2)
# TRUE
Up to numerical error, it finds the same stationary vector in this instance.

By the definition of the stationary probability vector, it is a left-eigenvector of the transition probability matrix with unit eigenvalue. We can find objects of this kind by computing the eigendecomposition of the matrix, identifying the unit eigenvalues and then computing the stationary probability vectors for each of these unit eigenvalues. Here is a function in R to do this.
stationary <- function(P) {
  #Get matrix information
  K <- nrow(P)
  NAMES <- rownames(P)
  #Compute the eigendecomposition
  EIGEN <- eigen(P)
  VALS <- EIGEN$values
  RVECS <- EIGEN$vectors
  LVECS <- solve(RVECS)  #rows are the left eigenvectors
  #Find the unit eigenvalue(s)
  RES <- zapsmall(Mod(VALS - as.complex(rep(1, K))))
  IND <- which(RES == 0)
  N <- length(IND)
  #Find the stationary vector(s)
  OUT <- matrix(0, nrow = N, ncol = K)
  rownames(OUT) <- sprintf('Stationary[%s]', 1:N)
  colnames(OUT) <- NAMES
  for (i in 1:N) {
    SSS <- Re(eigen(t(P))$vectors[, IND[i]])
    OUT[i, ] <- SSS/sum(SSS)
  }
  #Give the output
  OUT
}
(Note: The eigendecomposition computed by eigen is subject to some numerical error, so no eigenvalue is exactly equal to one. For this reason we zapsmall the modular deviation from one to identify the unit eigenvalue(s). This will give us the correct answer so long as there is no true eigenvalue that is less than one, but so close to one that it also gets "zapped" to one.)
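To illustrate with the P above (my own check): the modular deviation of the leading eigenvalue from one sits at round-off level, and zapsmall sends it to exactly zero:
VALS <- eigen(P)$values
Mod(VALS - as.complex(rep(1, nrow(P))))[1]            # round-off level, ~1e-16
zapsmall(Mod(VALS - as.complex(rep(1, nrow(P)))))[1]  # exactly 0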
Applying this function to your transition probability matrix correctly identifies the unique stationary probability vector in this case. There is a small amount of numerical error in the computation, but this should be manageable in most cases.
#Compute the stationary probability vector
S <- stationary(P)
#Show this vector and confirm stationarity
S
[,1] [,2] [,3] [,4] [,5] [,6]
Stationary[1] 0.002590674 0.02590674 0.1165803 0.3108808 0.2720207 0.2720207
S %*% P
[,1] [,2] [,3] [,4] [,5] [,6]
Stationary[1] 0.002590674 0.02590674 0.1165803 0.3108808 0.2720207 0.2720207
#Show error in computation
c(S %*% P - S)
[1] 4.336809e-17 2.775558e-17 1.110223e-16 -2.775558e-16 1.665335e-16 -5.551115e-17

Related

Sum of correlation matrix convergence

Assume a correlation matrix P with a diagonal of zero. I want to determine the order n at which the sum of all the correlation matrix orders converges, i.e., diag(3) + P + P%^%2 + P%^%3 + ... + P%^%n converges, meaning the L1 norm drops below a tol. I looked into How to find when a matrix converges with a loop but this doesn't do it for me, since it doesn't keep the orders, nor does it sum them up. I can do it in a really lengthy and lousy way with for loops and all, but I don't want to, since I have a big df with many time windows, so I'm looking for something efficient. Thanks!
P <- matrix(c(0,0.1,0.8,0.1,0,-0.7,0.8,-0.7,0), nrow = 3, ncol = 3, byrow = TRUE)
Some notes: The %^% operator is from the expm package. To sum the matrices I used matrix(mapply(sum, diag(3), P, P%^%2, P%^%3, MoreArgs=list(na.rm=T)), ncol=3).
x %^% n computes the nth power of x efficiently, but it is inefficient to compute x %^% i for all i from 0 to n, because each x %^% i requires O(log(i)) matrix multiplications.
In general, the most efficient way to compute all of the powers of x up to the nth is recursive multiplication by x, possibly taking advantage of the diagonalizability of x.
The difference is nontrivial for large n: whereas
x2 <- x %^% 2
x3 <- x %^% 3
x4 <- x %^% 4
## and so on
requires O(log(n!)) = O(n * log(n)) matrix multiplications,
x2 <- x %*% x
x3 <- x2 %*% x
x4 <- x3 %*% x
## and so on
requires just O(n).
Here is a function that recursively computes the powers of a matrix x and their sum until it encounters a power whose 1-norm is less than tol. It begins by checking that the spectral radius of x is less than 1, which is a necessary and sufficient condition for the norm of x %^% n to converge to 0, and hence also necessary and sufficient for convergence of the power series. It does not attempt to diagonalize x, which would simplify computation of the power series but complicate computation of norms.
f <- function(x, tol = 1e-06, nmax = 1e+03) {
  stopifnot(max(abs(eigen(x, only.values = TRUE)$values)) < 1)
  pow <- sum <- diag(nrow(x))
  nrm <- rep.int(NA_real_, nmax + 1)
  i <- 1
  while ((nrm[i] <- norm(pow, "1")) >= tol && i <= nmax) {
    pow <- pow %*% x
    sum <- sum + pow
    i <- i + 1
  }
  list(x = x, tol = tol, nmax = nmax, n = i - 1, sum = sum,
       norm = nrm[seq_len(i)], converged = nrm[i] < tol)
}
Your matrix P has spectral radius greater than 1, hence:
P <- matrix(c(0, 0.1, 0.8, 0.1, 0, -0.7, 0.8, -0.7, 0), 3L, 3L, byrow = TRUE)
f(P)
Error in f(P) :
max(abs(eigen(x, only.values = TRUE)$values)) < 1 is not TRUE
We can always construct a matrix P whose spectral radius is less than 1, for the purpose of testing f:
set.seed(1L)
m <- 3L
V <- matrix(rnorm(m * m), m, m)
D <- diag(runif(m, -0.9, 0.9))
P <- V %*% D %*% solve(V)
all.equal(sort(eigen(P)$values), sort(diag(D))) # [1] TRUE
(fP <- f(P))
$x
[,1] [,2] [,3]
[1,] 0.26445172 0.5317116 -0.2432849
[2,] 0.04932194 0.6332122 0.1496390
[3,] -0.31174920 0.6847937 0.1682702
$tol
[1] 1e-06
$nmax
[1] 1000
$n
[1] 60
$sum
[,1] [,2] [,3]
[1,] 1.53006915 2.081717 -0.07302465
[2,] -0.04249899 4.047528 0.74063387
[3,] -0.60849191 2.552208 1.83947562
$norm
[1] 1.000000e+00 1.849717e+00 1.223442e+00 1.008928e+00 7.799426e-01
[6] 6.131516e-01 4.795602e-01 3.754905e-01 2.938577e-01 2.299751e-01
[11] 1.799651e-01 1.408263e-01 1.101966e-01 8.622768e-02 6.747162e-02
[16] 5.279503e-02 4.131077e-02 3.232455e-02 2.529304e-02 1.979107e-02
[21] 1.548592e-02 1.211727e-02 9.481396e-03 7.418905e-03 5.805067e-03
[26] 4.542288e-03 3.554202e-03 2.781054e-03 2.176090e-03 1.702724e-03
[31] 1.332329e-03 1.042507e-03 8.157298e-04 6.382837e-04 4.994374e-04
[36] 3.907945e-04 3.057848e-04 2.392672e-04 1.872193e-04 1.464934e-04
[41] 1.146266e-04 8.969179e-05 7.018108e-05 5.491455e-05 4.296896e-05
[46] 3.362189e-05 2.630810e-05 2.058529e-05 1.610736e-05 1.260351e-05
[51] 9.861865e-06 7.716607e-06 6.038009e-06 4.724558e-06 3.696822e-06
[56] 2.892650e-06 2.263410e-06 1.771049e-06 1.385792e-06 1.084340e-06
[61] 8.484627e-07
$converged
[1] TRUE
Hence convergence is attained at n = 60. You can check that the reported sum is correct by comparing against the directly (but inefficiently) calculated value:
library("expm")
all.equal(Reduce(`+`, lapply(0:fP$n, function(i) P %^% i)), fP$sum) # [1] TRUE
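As a further check (my addition, not from the original answer): when the spectral radius of x is below 1, the infinite series converges to solve(I - x), so the truncated sum should be close to that limit:
## the geometric series sums to (I - x)^{-1}; allow for truncation error
all.equal(fP$sum, solve(diag(3L) - fP$x), tolerance = 1e-04) # expect TRUE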
And just for fun:
plot(0:fP$n, fP$norm)

Manual simulation of Markov Chain in R

Consider the Markov chain with state space S = {1, 2}, transition matrix
P = [ 1/2  1/2 ]
    [  0    1  ]
(rows sum to one; this is the matrix used in the answer below) and initial distribution α = (1/2, 1/2).
Simulate 5 steps of the Markov chain (that is, simulate X0, X1, ..., X5). Repeat the simulation 100 times. Use the results of your simulations to solve the following problems.
Estimate P(X1 = 1|X0 = 1). Compare your result with the exact probability.
My solution:
# returns Xn
func2 <- function(alpha1, mat1, n1)
{
  xn <- alpha1 %*% matrixpower(mat1, n1+1)
  return (xn)
}
alpha <- c(0.5, 0.5)
mat <- matrix(c(0.5, 0.5, 0, 1), nrow=2, ncol=2)
n <- 10
for (variable in 1:100)
{
  print(func2(alpha, mat, n))
}
What is the difference if I run this code once or 100 times (as is said in the problem-statement)?
How can I find the conditional probability from here on?
Let
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2) # Different than yours
be the initial distribution and the transition matrix. Your func2 only finds the n-th step distribution, which isn't needed here, and it doesn't simulate anything. Instead we may use
chainSim <- function(alpha, mat, n) {
  out <- numeric(n)
  out[1] <- sample(1:2, 1, prob = alpha)
  for(i in 2:n)
    out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
  out
}
where out[1] is generated using only the initial distribution and then for subsequent terms we use the transition matrix.
Then we have
set.seed(1)
# Doing once
chainSim(alpha, mat, 1 + 5)
# [1] 2 2 2 2 2 2
so this chain was initiated at 2 and got stuck there, because state 2 is absorbing under the specified transition probabilities.
Doing it for 100 times we have
# Doing 100 times
sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 100)
rowMeans(sim - 1)
# [1] 0.52 0.78 0.87 0.94 0.99 1.00
where the last line shows how often we ended up in state 2 rather than 1. That gives one (out of many) reasons why 100 repetitions are more informative: a single simulation got stuck at state 2, while repeating it 100 times let us explore many more possible paths.
Then the conditional probability can be found with
mean(sim[2, sim[1, ] == 1] == 1)
# [1] 0.4583333
while the true probability is 0.5 (given by the upper left entry of the transition matrix).
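With more replications the estimate approaches the exact value; a quick hedged check of my own:
sim2 <- replicate(chainSim(alpha, mat, 2), n = 1e5)
mean(sim2[2, sim2[1, ] == 1] == 1)
# should be close to the exact 0.5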

What is wrong with my starting values

I am using nleqslv package in R to solve nonlinear system of equations. The R codes are given below;
require(nleqslv)
x <- c(6,12,18,24,30)
NMfun1 <- function(k,n) {
  y <- rep(NA, length(k))
  y[1] <- -(5/k[1])+sum(x^k[2]*exp(k[3]*x))+2*sum(k[4]*x^k[2]*exp(-k[1]*x^k[2]*exp(k[3]*x)+k[3]*x)/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  y[2] <- -sum(log(x))-sum(1/(k[2]+k[3]*x))+sum(k[1]*x^k[2]*exp(k[3]*x)*log(x))+2*sum(k[1]*k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x)+k[3]*x)*log(x)/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  y[3] <- -sum(x/(k[2]+k[3]*x))+sum(k[1]*x^(k[2]+1)*exp(k[3]*x))-sum(x)+2*sum(k[4]*x^k[2]*exp(-k[1]*x^k[2]*exp(k[3]*x)+k[3]*x)/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  y[4] <- -(5/(1-k[4]))+2*sum(exp(-k[1]*x^k[2]*exp(k[3]*x))/(1-k[4]*exp(-k[1]*x^k[2]*exp(k[3]*x))))
  return(y)
}
kstart <- c(0.05, 0, 0.35, 0.9)
NMfun1(kstart)
nleqslv(kstart, NMfun1, control=list(btol=.0001),method="Newton")
The estimated values for k obtained are: 0.04223362 -0.08360564 0.14216026 0.37854908. But the estimated values of k are supposed to be greater than zero.
OK, so you want real solutions larger than 0, if they exist of course.
Make a new function that squares its input argument before passing it to NMfun1, and then use the searchZeros function in the nleqslv package to search for solutions, like this:
NMfun1.alt <- function(k0,n) NMfun1(k0^2,n)
# use set.seed for reproducibility
set.seed(413)
# generate 100 random starting values
xstart <- matrix(runif(4*100,min=0,max=1), nrow=100,ncol=4)
z <- searchZeros(xstart,NMfun1.alt)
z
ksol <- z$x^2
ksol
# in this case there are two solutions
NMfun1(ksol[1,])
NMfun1(ksol[2,])
The output of the last four non-comment lines of this code is:
> ksol <- z$x^2
> ksol
[,1] [,2] [,3] [,4]
[1,] 0.002951051 1.669142 0.03589502 0.001167185
[2,] 0.002951051 1.669142 0.03589502 0.001167185
> NMfun1(ksol[1,])
[1] 3.231138e-11 3.602561e-13 -4.665268e-12 -1.119105e-13
> NMfun1(ksol[2,])
[1] 1.532663e-12 1.085046e-14 6.894485e-14 -2.664535e-15
You will see that the solution contained in object z has a negative element, which becomes positive after squaring.
From this experiment it appears that your system has a single positive solution.
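An equivalent trick (my own variation, not part of the answer above) is to exponentiate instead of square, since exp() also maps any real input to a strictly positive parameter:
NMfun1.exp <- function(k0, n) NMfun1(exp(k0), n)
zexp <- searchZeros(log(xstart), NMfun1.exp)  # reuse the same starting values
if (!is.null(zexp)) exp(zexp$x)  # positive solutions, if any roots were found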

How to get optim working with matrix multiplication inside the function to be maximized in R

I am attempting to maximize a likelihood over a matrix parameter of dimension 2x2. The likelihood also depends on a couple of fixed matrix inputs: the data, denoted Y, and a covariance matrix, Sigma.star (which I am passing in as its lower-triangular elements). I would like to keep those fixed and run optim over the function, optimizing A in my code.
My issue is that optim seems to be erroring because it is optimizing something that appears inside an object I'm using for matrix algebra. Is there some way to make it work without programming every little calculation out?
The specific error is:
Error in diag(1, nrow = (m^2)) - A %x% A : non-conformable arrays
But A kronecker A should be an m^2 x m^2 matrix just like the identity…
Code:
library(MCMCpack)
library(mvtnorm)
set.seed(1000)
Likelihood.orig <- function(A, Y, Sigma.star){
  Sigma <- xpnd(Sigma.star)
  n <- nrow(Y)
  if(is.vector(A) == TRUE){
    A <- as.matrix(A, nrow=nrow(Sigma), ncol=ncol(Sigma))
  }
  m <- nrow(A)
  V <- matrix(solve(diag(1, nrow=(m^2)) - A %x% A) %*% as.vector(Sigma), nrow=m, ncol=m)
  temp1 <- (-.5)*log(abs(det(V)))
  temp2 <- (-(n-1)/2)*log(abs(det(Sigma)))
  temp3 <- t(Y[,1, drop=FALSE]) %*% (solve(V)) %*% Y[,1, drop=FALSE]
  terms <- numeric(n-1)
  for(i in 2:n){
    terms[i-1] <- t(Y[,i, drop=FALSE] - A %*% Y[,i-1, drop=FALSE]) %*% (solve(Sigma)) %*% (Y[,i] - A %*% Y[,i-1])
  }
  return(temp1 + temp2 - .5*(temp3 + sum(terms)))
}
Generate.Y <- function(n, A, Sigma){
  m <- nrow(A)
  Y <- matrix(0, nrow=m, ncol=n)
  V <- matrix(solve(diag(1, nrow=m^2) - A %x% A) %*% as.vector(Sigma), nrow=m, ncol=m)
  Y[,1] <- rmvnorm(1, numeric(nrow(A)), V)
  for(i in 2:n){
    Y[,i] <- A %*% Y[,i-1, drop=FALSE] + t(rmvnorm(1, mean = numeric(m), sigma = Sigma))
  }
  return(Y)
}
n<-500
A.true<-matrix(c(.8, .3, 0, .5), nrow=2, ncol=2)
Sigma<-matrix(c(1, 0, 0, .5), nrow=2, ncol=2)
Y<-matrix(0, nrow=2, ncol=n)
Y<-Generate.Y(n, A.true, Sigma)
m=nrow(Y)
lower.Sigma<-vech(Sigma)
optim(par=c(1, 0, 0, 1), fn=Likelihood.orig, method="Nelder-Mead",
control=list(maxit=500, fnscale=-1), Sigma.star=lower.Sigma, Y=Y)
Your approach is correct, i.e., make optim optimize over a vector, and only turn that vector into a matrix inside the function you are trying to maximize.
However, you need to use matrix and not as.matrix to create that matrix. See the difference between:
as.matrix(1:4, nrow=2, ncol=2) # wrong tool
# [,1]
# [1,] 1
# [2,] 2
# [3,] 3
# [4,] 4
and
matrix(1:4, nrow=2, ncol=2)
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
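Concretely, the reshaping step inside Likelihood.orig would become something like this (a sketch of the fix; untested against your full data):
if (is.vector(A)) {
  # matrix() really reshapes the length-4 parameter vector into 2x2;
  # as.matrix() returned a 4x1 matrix, causing the non-conformable error
  A <- matrix(A, nrow = nrow(Sigma), ncol = ncol(Sigma))
}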
For problems of this type, I would highly recommend you learn the R debugging tools (browser, debug, debugonce, etc.). See General suggestions for debugging in R for examples.

how to calculate the Euclidean norm of a vector in R?

I tried norm, but I think it gives the wrong result: the norm of c(1, 2, 3) should be sqrt(1*1+2*2+3*3), but it returns 6.
x1 <- 1:3
norm(x1)
# Error in norm(x1) : 'A' must be a numeric matrix
norm(as.matrix(x1))
# [1] 6
as.matrix(x1)
# [,1]
# [1,] 1
# [2,] 2
# [3,] 3
Does anyone know what's the function to calculate the norm of a vector in R?
norm(c(1,1), type="2") # 1.414214
norm(c(1, 1, 1), type="2") # 1.732051
This is a trivial function to write yourself:
norm_vec <- function(x) sqrt(sum(x^2))
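For the example from the question:
norm_vec(c(1, 2, 3))
# [1] 3.741657  (= sqrt(14))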
I was surprised that nobody had tried profiling the results for the above suggested methods, so I did that. I've used a random uniform function to generate a list and used that for repetition (just a simple back-of-the-envelope benchmark):
> uut <- lapply(1:100000, function(x) {runif(1000, min=-10^10, max=10^10)})
> norm_vec <- function(x) sqrt(sum(x^2))
> norm_vec2 <- function(x){sqrt(crossprod(x))}
>
> system.time(lapply(uut, norm_vec))
user system elapsed
0.58 0.00 0.58
> system.time(lapply(uut, norm_vec2))
user system elapsed
0.35 0.00 0.34
> system.time(lapply(uut, norm, type="2"))
user system elapsed
6.75 0.00 6.78
> system.time(lapply(lapply(uut, as.matrix), norm))
user system elapsed
2.70 0.00 2.73
It seems that taking the power and then sqrt manually is faster than the built-in norm, for real-valued vectors at least. This is probably because norm internally does an SVD:
> norm
function (x, type = c("O", "I", "F", "M", "2"))
{
    if (identical("2", type)) {
        svd(x, nu = 0L, nv = 0L)$d[1L]
    }
    else .Internal(La_dlange(x, type))
}
and the SVD function internally converts the vector into a matrix, and does more complicated stuff:
> svd
function (x, nu = min(n, p), nv = min(n, p), LINPACK = FALSE)
{
    x <- as.matrix(x)
    ...
EDIT (20 Oct 2019):
There have been some comments pointing out a correctness issue that the above test case doesn't bring out:
> norm_vec(c(10^155))
[1] Inf
> norm(c(10^155), type="2")
[1] 1e+155
This happens because numbers too large for double precision overflow to infinity in R (here, (10^155)^2 = 10^310):
> 10^309
[1] Inf
So, the earlier conclusion needs a caveat:
taking the power and then sqrt manually is faster than the built-in norm for real-valued vectors of small enough numbers.
How small? Small enough that the sum of squares doesn't overflow.
norm(x, type = c("O", "I", "F", "M", "2"))
The default is "O".
"O", "o" or "1" specifies the one norm, (maximum absolute column sum);
"F" or "f" specifies the Frobenius norm (the Euclidean norm of x treated as if it were a vector);
norm(as.matrix(x1),"o")
The result is 6, same as norm(as.matrix(x1))
norm(as.matrix(x1),"f")
The result is sqrt(1*1+2*2+3*3)
So, norm(as.matrix(x1), "f") is the answer.
We can also find the norm as:
Result <- sum(abs(x)^2)^(1/2)
or alternatively:
Result <- sqrt(t(x) %*% x)
Both will give the same answer (the latter as a 1x1 matrix).
I'mma throw this out there too as an equivalent R expression:
norm_vec <- function(x) sqrt(crossprod(x))
Don't confuse R's crossprod with a similarly named vector/cross product. That naming is known to cause confusion, especially for those with a physics/mechanics background.
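Note also (my addition) that crossprod(x) returns a 1x1 matrix, so you may want to drop the dimensions to get a plain scalar:
norm_vec <- function(x) drop(sqrt(crossprod(x)))  # drop() strips the 1x1 dim
norm_vec(c(1, 2, 3))
# [1] 3.741657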
The answer for the Euclidean length of a vector (k-norm), with scaling to avoid destructive underflow and overflow, is
norm <- function(x, k) { max(abs(x))*(sum((abs(x)/max(abs(x)))^k))^(1/k) }
See below for the explanation.
1. Euclidean length of a vector with no scaling:
norm() below is a vector-valued function which computes the length of each column vector. It takes two arguments: the vector x of class matrix, and the type of norm k of class integer.
norm <- function(x, k) {
  # x = matrix with column vector and with dimensions mx1 or mxn
  # k = type of norm with integer from 1 to +Inf
  stopifnot(k >= 1)         # check that k is at least 1
  stopifnot(length(k) == 1) # check that k has length 1; k is not vectorized
  if(k == Inf) {
    # infinity norm
    return(apply(x, 2, function(vec) max(abs(vec)) ))
  } else {
    # k-norm
    return(apply(x, 2, function(vec) (sum((abs(vec))^k))^(1/k) ))
  }
}
x <- matrix(c(1,-2,3,-4)) # column matrix
sapply(c(1:4, Inf), function(k) norm(x = x, k = k))
# [1] 10.000000 5.477226 4.641589 4.337613 4.000000
As k grows, the k-norm decreases from the 1-norm (10.0) toward the infinity-norm (4.0).
The 2-norm is the Euclidean norm in Euclidean n-dimensional space.
Note:
In the norm() function definition, for vectors with real components, the absolute values can be dropped for even-order norms (k = 2, 4, ...), since even powers are nonnegative anyway.
If you are confused with the norm function definition, you can read each one individually as given below.
norm_1 <- function(x) sum(abs(x))
norm_2 <- function(x) (sum((abs(x))^2))^(1/2)
norm_3 <- function(x) (sum((abs(x))^3))^(1/3)
norm_4 <- function(x) (sum((abs(x))^4))^(1/4)
norm_k <- function(x, k) (sum((abs(x))^k))^(1/k)
norm_inf <- function(x) max(abs(x))
2. Euclidean length of a vector with scaling to avoid destructive overflow and underflow issues:
Note-2:
The only problem with the norm() solution above is that it does not guard against overflow or underflow problems, as alluded to here and here.
Fortunately, someone has already solved this problem for the 2-norm (Euclidean length) in the BLAS (basic linear algebra subroutines) Fortran library. A description of this problem can be found in the textbook "Numerical Methods and Software" by Kahaner, Moler and Nash - Chapter 1, Section 1.3, pages 7-9.
The Fortran subroutine is dnrm2.f, which handles the destructive overflow and underflow issues in norm() by scaling with the maximum of the vector components; those issues arise from the powering operations inside norm().
I will show how to implement dnrm2.f in R below.
#1. find the maximum among components of vector-x
max_x <- max(x)
#2. scale or divide the components of vector by max_x
scaled_x <- x/max_x
#3. take square of the scaled vector-x
sq_scaled_x <- (scaled_x)^2
#4. sum the square of scaled vector-x
sum_sq_scaled_x <- sum(sq_scaled_x)
#5. take square root of sum_sq_scaled_x
rt_sum_sq_scaled_x <- sqrt(sum_sq_scaled_x)
#6. multiply the maximum of vector x with rt_sum_sq_scaled_x
max_x*rt_sum_sq_scaled_x
A one-liner for the above 6 steps of dnrm2.f in R is:
# Euclidean length of vector - 2-norm
max(x)*sqrt(sum((x/max(x))^2))
(This assumes max(x) > 0; the generalized one-liner below scales by max(abs(x)) instead, which works for any nonzero vector.)
Let's try some example vectors and compute their 2-norm (see other solutions in this thread) for this problem.
x = c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
max(x)*sqrt(sum((x/max(x))^2))
# [1] 1.227355e+300
x <- (c(1,-2,3,-4))
max(x)*sqrt(sum((x/max(x))^2))
# [1] 5.477226
Therefore, the recommended way to implement a generalized k-norm in R is the single line above, which guards against the destructive overflow and underflow problems. To improve this one-liner, you could use the unscaled norm() for vectors whose components are neither too small nor too large, and the scaled knorm() otherwise, since scaling every vector results in too many calculations. I did not implement this improvement in the knorm() given below.
# one-liner for k-norm - generalized form for all norms including infinity-norm:
max(abs(x))*(sum((abs(x)/max(abs(x)))^k))^(1/k)
# knorm() function using the above one-liner.
knorm <- function(x, k) {
  # x = matrix with column vector and with dimensions mx1 or mxn
  # k = type of norm with integer from 1 to +Inf
  stopifnot(k >= 1)         # check that k is at least 1
  stopifnot(length(k) == 1) # check that k has length 1; k is not vectorized
  # convert elements of matrix to their absolute values
  x <- abs(x)
  if(k == Inf) { # infinity-norm
    return(apply(x, 2, function(vec) max(vec)))
  } else {       # k-norm
    return(apply(x, 2, function(vec) {
      max_vec <- max(vec)
      return(max_vec*(sum((vec/max_vec)^k))^(1/k))
    }))
  }
}
# 2-norm
x <- matrix(c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299))
sapply(2, function(k) knorm(x = x, k = k))
# [1] 1.227355e+300
# 1-norm, 2-norm, 3-norm, 4-norm, and infinity-norm
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [1] 2.480000e+300 1.227355e+300 9.927854e+299 9.027789e+299 8.000000e+299
x <- matrix(c(1,-2,3,-4))
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [1] 10.000000 5.477226 4.641589 4.337613 4.000000
x <- matrix(c(1,-2,3,-4, 0, -8e+299, -6e+299, 5e+299, -8e+298, -5e+299), nc = 2)
sapply(c(1:4, Inf), function(k) knorm(x = x, k = k))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.00e+01 5.477226e+00 4.641589e+00 4.337613e+00 4e+00
# [2,] 2.48e+300 1.227355e+300 9.927854e+299 9.027789e+299 8e+299
If you have a data.frame or a data.table DT and want to compute the Euclidean norm (2-norm) across each row, the apply function can be used.
apply(X = DT, MARGIN = 1, FUN = norm, '2')
Example:
>DT
accx accy accz
1: 9.576807 -0.1629486 -0.2587167
2: 9.576807 -0.1722938 -0.2681506
3: 9.576807 -0.1634264 -0.2681506
4: 9.576807 -0.1545590 -0.2681506
5: 9.576807 -0.1621254 -0.2681506
6: 9.576807 -0.1723825 -0.2682434
7: 9.576807 -0.1723825 -0.2728810
8: 9.576807 -0.1723825 -0.2775187
> apply(X = DT, MARGIN = 1, FUN = norm, '2')
[1] 9.581687 9.582109 9.581954 9.581807 9.581932 9.582114 9.582245 9.582378
Following AbdealiJK's answer, I experimented further to gain some insight. Here's one example.
x = c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299)
sqrt(sum(x^2))
norm(x, type='2')
The first result is Inf and the second one is 1.227355e+300, which is correct, as I show in the code below.
library(Rmpfr)
y <- mpfr(x, 120)
sqrt(sum(y*y))
The result is 1227354879...; I didn't count the number of trailing digits, but it looks right. I know there is another way around this OVERFLOW problem, which is to first apply the log function to all numbers and sum up, which I do not have time to implement!
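That log-based idea can be sketched as follows (my own implementation, not from the answer above): work with 2*log|x|, subtract the maximum before exponentiating (the log-sum-exp trick), and halve at the end:
# 2-norm computed in the log domain to avoid overflow;
# assumes x has at least one nonzero element (log(0) = -Inf otherwise)
lognorm2 <- function(x) {
  lx <- 2 * log(abs(x))
  m <- max(lx)
  exp(0.5 * (m + log(sum(exp(lx - m)))))
}
lognorm2(c(-8e+299, -6e+299, 5e+299, -8e+298, -5e+299))
# ~1.227355e+300, matching norm(x, type = "2")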
Create your matrix column-wise using cbind; then the norm function works well with the Frobenius norm (the Euclidean norm) as an argument.
x1<-cbind(1:3)
norm(x1,"f")
[1] 3.741657
sqrt(1*1+2*2+3*3)
[1] 3.741657
