singular matrix or not: conflict between determinant and rank - r

I made a correlation matrix from exponentially smoothed returns, using Appendix C in https://openresearch-repository.anu.edu.au/bitstream/1885/65527/2/01_Pozzi_Exponential_smoothing_weighted_2012.pdf as a guide.
It's a 101 by 101 matrix, but I don't know if it is singular or not, due to the following conflicting results:
pracma::Rank says its rank is 101;
matrixcalc::is.singular.matrix returns TRUE;
base::determinant.matrix gives a very close-to-zero value.
pracma::Rank(try.wgtd.cor)
#[1] 101
matrixcalc::is.singular.matrix(try.wgtd.cor)
#[1] TRUE
base::determinant.matrix(try.wgtd.cor, logarithm = FALSE)
#$modulus
#[1] 2.368591e-55
#attr(,"logarithm")
#[1] FALSE
#
#$sign
#[1] 1
#
#attr(,"class")
#[1] "det"
Does anyone know why/how this could be?

No no no, don't rely on the determinant. A small determinant does not necessarily mean singularity. For example, the following diagonal matrix is not singular at all, yet it has a very small determinant.
## all diagonal elements are 0.1; dimension 101 x 101
D <- diag(0.1, nrow = 101, ncol = 101)
## the obvious way to compute det(D)
prod(diag(D))
#[1] 1e-101
## use base::determinant.matrix
determinant.matrix(D, logarithm = FALSE)$modulus
#[1] 1e-101
The determinant equals the product of the eigenvalues. So, in general, if all eigenvalues of a large matrix are smaller than 1 in absolute value, the determinant will be very small for sure!
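To check the eigenvalue statement numerically, here is a small sketch on the same diagonal matrix (the names ev, log_det and cond_D are mine, introduced only for illustration). It compares the log-determinant with the sum of the log eigenvalues, and also computes the condition number, which does not change when the matrix is rescaled.

```r
## 101 x 101 diagonal matrix with 0.1 on the diagonal
D <- diag(0.1, nrow = 101, ncol = 101)

## eigenvalues of the symmetric matrix D (all equal to 0.1)
ev <- eigen(D, symmetric = TRUE, only.values = TRUE)$values

## log|det(D)| as computed by R versus the sum of the log eigenvalues
log_det <- as.numeric(determinant(D, logarithm = TRUE)$modulus)
all.equal(log_det, sum(log(ev)))

## condition number: ratio of the largest to the smallest singular value
sv <- svd(D)$d
cond_D <- max(sv) / min(sv)
cond_D
```

A condition number close to 1 means the matrix is as far from singular as it can be, even though its determinant is about 1e-101.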
matrixcalc::is.singular.matrix is based on the determinant, so do not trust it. In addition, its result is too subjective, because you can change the answer by tweaking its tol argument.
By contrast, pracma::Rank uses both QR and SVD factorizations to determine the rank, which is far more reliable. Here is the source code of Rank (with my comments):
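The subjectivity of a determinant threshold can be sketched with a stand-in function. Note that det_says_singular below is a hypothetical helper written for this illustration, not the matrixcalc source; it only mimics the idea "the matrix is singular if |det| is below tol".

```r
## Hypothetical determinant-based test (an assumption for illustration,
## NOT the implementation used by matrixcalc)
det_says_singular <- function(x, tol = 1e-8) {
  abs(det(x)) < tol
}

## a perfectly well-behaved matrix: 0.1 times the identity
D <- diag(0.1, nrow = 101, ncol = 101)

## the verdict depends entirely on the threshold we pick
verdict_default <- det_says_singular(D)               # det is about 1e-101
verdict_tiny    <- det_says_singular(D, tol = 1e-200)
c(verdict_default, verdict_tiny)
```

The same invertible matrix is declared singular or non-singular depending only on tol, which is why such a test says little about rank.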
function (M)
{
    if (length(M) == 0)
        return(0)
    if (!is.numeric(M))
        stop("Argument 'M' must be a numeric matrix.")
    if (is.vector(M))
        M <- matrix(c(M), nrow = length(M), ncol = 1)
    ## detect rank by QR factorization
    r1 <- qr(M)$rank
    ## detect rank by SVD factorization
    sigma <- svd(M)$d
    tol <- max(dim(M)) * max(sigma) * .Machine$double.eps
    r2 <- sum(sigma > tol)
    ## check consistency
    if (r1 != r2)
        warning("Rank calculation may be problematic.")
    return(r2)
}
In conclusion, your 101 x 101 matrix try.wgtd.cor actually has full rank!
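Since try.wgtd.cor is not available here, the SVD rule from Rank can be tried on two matrices whose rank is known in advance. This is only a sketch: svd_rank is my own name for a helper that copies the tolerance formula from the source code above.

```r
## SVD-based rank with the same tolerance rule as in Rank() above
## (svd_rank is a hypothetical helper name)
svd_rank <- function(M) {
  sigma <- svd(M)$d
  tol <- max(dim(M)) * max(sigma) * .Machine$double.eps
  sum(sigma > tol)
}

## truly singular: the middle column is the average of the outer two
S <- matrix(as.numeric(1:9), nrow = 3)

## tiny determinant (about 1e-101) but full rank
D <- diag(0.1, nrow = 101, ncol = 101)

rank_S  <- svd_rank(S)
rank_D  <- svd_rank(D)
qr_rank <- qr(S)$rank
c(rank_S, rank_D, qr_rank)
```

The singular matrix is flagged because one singular value falls below the tolerance, while the scaled identity keeps all 101 singular values well above it.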

Related

Why am I getting NAs in this calculation in R?

While working on an Rcpp program, I used the sample() function, which gave me the following error: "NAs not allowed in probability." I traced this issue to the fact that the probability vector I used had NA values in it. I have no idea how. Below is some R code that captures the errors:
n.0=20
n.1=20
n.reps=1
beta0.vals=rep(seq(-.3,.1,,n.0),n.reps)
beta1.vals=rep(seq(-7,0,,n.1),n.reps)
beta.grd=as.matrix(expand.grid(beta0.vals,beta1.vals))
n.rnd=200
beta.rnd.grd=cbind(runif(n.rnd,min(beta0.vals),max(beta0.vals)),runif(n.rnd,min(beta1.vals),max(beta1.vals)))
beta.grd=rbind(beta.grd,beta.rnd.grd)
N = 22670
count = 0
for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
  beta.ind = 0 # indicator for current pair of beta values
  for(j in 1:N){ # iterate through all possible Nsums
    logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
    phi01 = exp(logit)/(1 + exp(logit))
    if(is.na(phi01)){
      count = count + 1
    }
  }
}
cat("Total number of invalid probabilities: ", count)
Here, $\beta_0 \in (-0.3, 0.1), \beta_1 \in (-7, 0), N = 22670, N_\text{sum} \in (1, N)$. Note that $N$ and $N_\text{sum}$ are integers, whereas the beta values may not be.
Since mathematically, $\phi_{01} \in (0,1)$, I'm assuming that NAs are arising because R is not liking extremely small values. I am receiving an overwhelming amount of NA values, too. More so than numbers. Why would I be getting NAs in this code?
Include print(logit) next to count = count + 1 and you will find lots of logit > 1000 values. exp(1000) is Inf, so you divide Inf by Inf, which gives NaN, and is.na() treats NaN as NA:
> exp(500)
[1] 1.403592e+217
> Inf/Inf
[1] NaN
> is.na(NaN)
[1] TRUE
So your problem is not numbers that are too small but numbers that are too large: exp(x) overflows to Inf as soon as x is larger than roughly 709:
> exp(709)
[1] 8.218407e+307
> exp(710)
[1] Inf
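The threshold can be read off from the largest finite double. The following sketch uses only base R; the variable names are mine.

```r
## largest finite double-precision number
xmax <- .Machine$double.xmax

## exp(x) overflows once x exceeds log(xmax), which is a bit below 710
overflow_at <- log(xmax)
overflow_at

## just below and just above the threshold
finite_below <- is.finite(exp(709))
finite_above <- is.finite(exp(710))

## Inf / Inf is NaN, and is.na() reports TRUE for NaN
ratio <- exp(710) / (1 + exp(710))
c(finite_below, finite_above, is.nan(ratio), is.na(ratio))
```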
Bernhard's answer correctly identifies the problem:
If logit is large, exp(logit) = Inf.
Here is a solution:
for(i in 1:dim(beta.grd)[1]){ # iterate through 600 possible beta values in beta grid
  beta.ind = 0 # indicator for current pair of beta values
  for(j in 1:N){ # iterate through all possible Nsums
    logit = beta.grd[i,1]/N*(j - .1*N)^2 + beta.grd[i,2];
    ## This one isn't great because exp(logit) can be very large
    # phi01 = exp(logit)/(1 + exp(logit))
    ## So, we say instead
    ## phi01 = 1 / ( 1 + exp(-logit) )
    phi01 = plogis(logit)
    if(is.na(phi01)){
      count = count + 1
    }
  }
}
cat("Total number of invalid probabilities: ", count)
# Total number of invalid probabilities: 0
We can use the more stable 1 / (1 + exp(-logit)) (to convince yourself of this, multiply your expression by exp(-logit) / exp(-logit)), and luckily, either way, R has a built-in function plogis() that calculates these probabilities quickly and accurately.
You can see from the help file (?plogis) that this function evaluates the expression I gave, but you can also double-check to assure yourself:
x = rnorm(1000)
y = 1 / (1 + exp(-x))
z = plogis(x)
all.equal(y, z)
[1] TRUE
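The check above uses moderate values of x, where all three forms agree. The difference only shows at the extremes, which the following sketch makes visible (naive_logistic and stable_logistic are names I made up for the two formulas discussed above).

```r
## the formula from the question and the rearranged, more stable one
naive_logistic  <- function(x) exp(x) / (1 + exp(x))
stable_logistic <- function(x) 1 / (1 + exp(-x))

## include values far outside the range where exp() is finite
x <- c(-1000, -10, 0, 10, 1000)

p_naive  <- naive_logistic(x)   # Inf / Inf at x = 1000
p_stable <- stable_logistic(x)  # 1 / (1 + Inf) is 0, 1 / (1 + 0) is 1
p_plogis <- plogis(x)

rbind(p_naive, p_stable, p_plogis)
```

Only the naive version produces a NaN; the other two return 0 and 1 at the ends, as the mathematics suggests.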

Why do I get many NA's in a "for" loop that simulates Poisson random variables

I keep getting error messages saying that I am creating hundreds of NA's in my for loop. Where do those NA come from? Any help would be greatly appreciated!
drip <- function(rate = 1, minutes = 120) {
  count <- 0
  for(i in 1:(minutes)) {
    count <- count + rpois(1, rate)
    rate <- rate * runif(1, 0, 5)
  }
  count
}
drip()
You get integer overflow. Try
set.seed(0)
rpois(1, 1e+8)
#[1] 100012629
rpois(1, 1e+9)
#[1] 999989683
rpois(1, 1e+10)
#[1] NA
#Warning message:
#In rpois(1, 1e+10) : NAs produced
As soon as lambda is too large, the 32-bit integer representation is insufficient and NA is returned. (Recall that Poisson random variables are integers, and the largest integer R can store is .Machine$integer.max = 2147483647.)
Your loop makes rate (lambda) grow dynamically, so it can eventually become too big. Running your function with a smaller minutes, say 10, is fine.
By contrast, ppois and dpois, which produce double-precision floating-point numbers, are fine with a large lambda.
dpois(1e+8, 1e+8)
#[1] 3.989423e-05
dpois(1e+9, 1e+9)
#[1] 1.261566e-05
dpois(1e+10, 1e+10)
#[1] 3.989423e-06
dpois(1e+11, 1e+11)
#[1] 1.261566e-06
ppois(1e+8, 1e+8)
#[1] 0.5000266
ppois(1e+9, 1e+9)
#[1] 0.5000084
ppois(1e+10, 1e+10)
#[1] 0.5000027
ppois(1e+11, 1e+11)
#[1] 0.5000008
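To get a feel for how quickly the rate in the question's loop becomes too big, note that log(rate) performs a random walk whose average step is E[log U] = log(5) - 1, about 0.61, for U uniform on (0, 5). The rate therefore tends to pass the integer limit (log of about 21.5) after a few dozen minutes. A rough simulation sketch, with variable names of my own choosing:

```r
set.seed(1)
minutes <- 120

## multiplicative growth factors, as in the question's loop
growth <- runif(minutes, 0, 5)

## rate used in each minute, starting from rate = 1
rate_path <- c(1, cumprod(growth))[seq_len(minutes)]

## first minute whose rate exceeds the largest representable integer
first_big <- which(rate_path > .Machine$integer.max)[1]
first_big

## average step of log(rate): theoretical value versus simulation
step_theory <- log(5) - 1
step_sim <- mean(log(runif(1e5, 0, 5)))
c(step_theory, step_sim)
```

From that minute on, rpois is asked for values that do not fit into a 32-bit integer and starts returning NA.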
The task description says: "With each passing minute, the rate parameter increases by x%, where x is a random value from the uniform distribution on the interval [0, 5]."
So the rate increases by x%, not by a factor of x. You should therefore use
rate <- rate * (1 + runif(1, 0, 5) / 100)
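Put back into the function from the question, the corrected update could look like the sketch below. With at most 5% growth per minute, the rate stays below 1.05^120 (roughly 350), so no overflow can occur.

```r
drip <- function(rate = 1, minutes = 120) {
  count <- 0
  for (i in seq_len(minutes)) {
    ## number of drips in this minute
    count <- count + rpois(1, rate)
    ## grow the rate by x percent, with x drawn from Unif(0, 5)
    rate <- rate * (1 + runif(1, 0, 5) / 100)
  }
  count
}

set.seed(0)
total <- drip()
total
```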

Simulation and apply functions in matrix, R

I have a couple of questions regarding the piece of code shown below. The function called "func1" returns a vector of length 15; replicating it 50 times gives a matrix I called "M", and "M2" is its transpose, with 50 rows and 15 columns. W0 is the initial value for the next part of the code. If I run the function called "Rowresult" on each row, it also gives me a 50*15 matrix.
My first question is: if I want to run the "Rowresult" function for different W0 values, such as W0 = 10, 20, 30, and I want to get 3 matrices of size 50*15, one per W0 value, how could I achieve it?
My second question is: if you try my code in R, you will see a matrix called "wealth_result2" as a result. Once I have this big 50*15 matrix, I would like to divide it into three matrices of the same size, each 50*5 (so they share the same rows but have different columns: the first matrix takes columns 1-5, the second takes columns 6-10, and the third takes columns 11-15). Then I want to work out how many positive rows (rows with all numbers positive) there are in each of the 50*5 matrices. How could I achieve this?
N=15
func1<-function(N){
  alpha1 = 8.439e-02
  beta1 = 8.352e-01
  mu = 7.483e-03
  omega = 1.343e-04
  X_0 = -3.092031e-02
  sigma_0 = 0.03573968
  eps = rt (N,7.433e+00)
  # loops
  Xn= numeric (N)
  sigma= numeric (N)
  sigma[1] = sigma_0
  Xn[1] = X_0
  for (t in 2:N){
    sigma[t] = sqrt (omega + alpha1 * (Xn[t-1])^2 + beta1* (sigma[t-1])^2)
    Xn[t] = sigma[t] * eps[t]
  }
  Y = mu + Xn
}
# return matrix
M<-replicate(50,func1(N))
# returns matrix
M2<-t(M)
View(M2)
# wealth with initial wealth 10
W0=10
# 10,20,30,40
r= c(0.101309031, -0.035665516, -0.037377270, -0.005928941, 0.036612849,
0.062404039, 0.124240950, -0.034843633, 0.004770613, 0.005018101,
0.097685945, -0.090660099, 0.004863099, 0.029215984, 0.020835366)
Rowresult<- function(r){
  const = exp(cumsum(r))
  exp.cum = cumsum(1/const)
  wealth=const*(W0 - exp.cum)
  wealth
}
# wealth matrix
wealth_result <-apply(M2,1,Rowresult)
wealth_result2 <-t(wealth_result )
View(wealth_result2)
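This delivers the counts of positive entries in each block of five columns (a block of 50 all-positive rows gives 250). Note that it counts single entries, so to count rows that are positive in every column you still have to compare each row's count with 5: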
> sapply(1:3, function(m) sum( rowSums( wealth_result2[ , (1:5)+(m-1)*5 ] >0 )) )
[1] 250 230 2
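Both parts of the question can be sketched as follows. Everything here is an illustration under assumptions: Rowresult2 is a hypothetical variant of Rowresult that takes W0 as an argument, M2_demo is a random stand-in for M2 (not the GARCH simulation from the question), and count_positive_rows is my own helper name.

```r
## Rowresult with W0 passed explicitly instead of read from the global environment
Rowresult2 <- function(r, W0) {
  const <- exp(cumsum(r))
  exp.cum <- cumsum(1 / const)
  const * (W0 - exp.cum)
}

## stand-in for M2: 50 paths of 15 returns each
set.seed(42)
M2_demo <- matrix(rnorm(50 * 15, mean = 0.007, sd = 0.04), nrow = 50)

## first question: one 50 x 15 wealth matrix per initial wealth
W0_values <- c(10, 20, 30)
wealth_list <- lapply(W0_values, function(w) {
  t(apply(M2_demo, 1, Rowresult2, W0 = w))
})
names(wealth_list) <- paste0("W0_", W0_values)

## second question: rows that are positive in ALL columns of each block
count_positive_rows <- function(W, block_size = 5) {
  n_blocks <- ncol(W) / block_size
  sapply(seq_len(n_blocks), function(m) {
    cols <- (m - 1) * block_size + seq_len(block_size)
    block <- W[, cols, drop = FALSE]
    sum(rowSums(block > 0) == block_size)
  })
}

lapply(wealth_list, count_positive_rows)
```

The comparison rowSums(block > 0) == block_size is what turns a count of positive entries into a count of all-positive rows.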

How to construct a sequence with a pattern in R

I would like to construct a sequence with length 50 of the following type:
Xn+1=4*Xn*(1-Xn). For your information, this is the Logistic Map for r=4. In the case of the Logistic Map with parameter r = 4 and an initial state in (0,1), the attractor is also the interval (0,1) and the probability measure corresponds to the beta distribution with parameters a = 0.5 and b = 0.5. (The Logistic Map is a polynomial mapping (equivalently, recurrence relation) of degree 2, often cited as an archetypal example of how complex, chaotic behaviour can arise from very simple non-linear dynamical equations). How can I do this in R?
There are some ready-to-use solutions on the net. I cite the general solution from mage's blog, where you can find a more detailed description.
logistic.map <- function(r, x, N, M){
  ## r: bifurcation parameter
  ## x: initial value
  ## N: number of iterations
  ## M: the iterates with index N-M to N are returned (M + 1 values)
  z <- numeric(N)
  z[1] <- x
  for(i in c(1:(N-1))){
    z[i+1] <- r *z[i] * (1 - z[i])
  }
  ## Return the last M + 1 iterates
  z[c((N-M):N)]
}
For the OP's example, this returns all 50 values:
logistic.map(4,0.2,50,49)
This isn't really an R question, is it? More basic programming. Anyway, you probably need an accumulator and a value to process.
values <- 0.2 ## this accumulates as a vector, starting with 0.2
xn <- values ## xn gets the first value
for (it in 2:50) { ## start the loop from the second iteration
  xn <- 4L*xn*(1L-xn) ## perform the sequence function
  values <- c(values, xn) ## add the new value to the vector
}
values
# [1] 0.2000000000 0.6400000000 0.9216000000 0.2890137600 0.8219392261 0.5854205387 0.9708133262 0.1133392473 0.4019738493 0.9615634951 0.1478365599 0.5039236459
# [13] 0.9999384200 0.0002463048 0.0009849765 0.0039360251 0.0156821314 0.0617448085 0.2317295484 0.7121238592 0.8200138734 0.5903644834 0.9673370405 0.1263843622
# [25] 0.4416454208 0.9863789723 0.0537419811 0.2034151221 0.6481496409 0.9122067356 0.3203424285 0.8708926280 0.4497546341 0.9899016128 0.0399856390 0.1535471506
# [37] 0.5198816927 0.9984188732 0.0063145074 0.0250985376 0.0978744041 0.3531800204 0.9137755744 0.3151590962 0.8633353611 0.4719496615 0.9968527140 0.0125495222
# [49] 0.0495681269 0.1884445109
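Growing a vector with c() inside a loop works for 50 values but copies the vector on every step. Two alternatives, as a sketch only: a loop over a preallocated vector (logistic_seq is a name I made up) and base R's Reduce() with accumulate = TRUE.

```r
## preallocated loop; r = 4 is the case asked about
logistic_seq <- function(x0, n, r = 4) {
  x <- numeric(n)
  x[1] <- x0
  for (i in seq_len(n - 1)) {
    x[i + 1] <- r * x[i] * (1 - x[i])
  }
  x
}
x_loop <- logistic_seq(0.2, 50)

## the same recurrence with Reduce(); the second argument of the
## function (the step counter) is ignored
x_reduce <- Reduce(function(x, step) 4 * x * (1 - x),
                   seq_len(49), init = 0.2, accumulate = TRUE)

head(x_loop, 4)
all.equal(x_loop, x_reduce)
```

Because the map is chaotic, values far along the sequence are sensitive to floating-point rounding, so small differences between platforms are to be expected in the later entries.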

Generate multivariate normal r.v.'s with rank-deficient covariance via Pivoted Cholesky Factorization

I'm just beating my head against the wall trying to get a Cholesky decomposition to work in order to simulate correlated price movements.
I use the following code:
cormat <- as.matrix(read.csv("http://pastebin.com/raw/qGbkfiyA"))
cormat <- cormat[,2:ncol(cormat)]
rownames(cormat) <- colnames(cormat)
cormat <- apply(cormat,c(1,2),FUN = function(x) as.numeric(x))
chol(cormat)
#Error in chol.default(cormat) :
# the leading minor of order 8 is not positive definite
cholmat <- chol(cormat, pivot=TRUE)
#Warning message:
# In chol.default(cormat, pivot = TRUE) :
# the matrix is either rank-deficient or indefinite
rands <- array(rnorm(ncol(cholmat)), dim = c(10000,ncol(cholmat)))
V <- t(t(cholmat) %*% t(rands))
#Check for similarity
cor(V) - cormat ## Not all zeros!
#Check the standard deviations
apply(V,2,sd) ## Not all ones!
I'm not really sure how to properly use the pivot = TRUE statement to generate my correlated movements. The results look totally bogus.
Even if I have a simple matrix and I try out "pivot" then I get bogus results...
cormat <- matrix(c(1,.95,.90,.95,1,.93,.90,.93,1), ncol=3)
cholmat <- chol(cormat)
# No Error
cholmat2 <- chol(cormat, pivot=TRUE)
# No warning... pivot changes column order
rands <- array(rnorm(ncol(cholmat)), dim = c(10000,ncol(cholmat)))
V <- t(t(cholmat2) %*% t(rands))
#Check for similarity
cor(V) - cormat ## Not all zeros!
#Check the standard deviations
apply(V,2,sd) ## Not all ones!
There are two errors with your code:
You did not use the pivoting index to revert the pivoting done to the Cholesky factor. Note that pivoted Cholesky factorization of a positive semi-definite matrix A computes:
P'AP = R'R
where P is a column pivoting matrix, and R is an upper triangular matrix. To recover A from R, we need to apply the inverse of P (i.e., P'):
A = PR'RP' = (RP')'(RP')
A multivariate normal with covariance matrix A is generated by:
XRP'
where X is multivariate normal with zero mean and identity covariance.
Your generation of X
X <- array(rnorm(ncol(R)), dim = c(10000,ncol(R)))
is wrong. First, the number of columns should not be ncol(R) but the number of rows of R that are actually used, i.e., the numerical rank of A, denoted by r. Second, you are recycling rnorm(ncol(R)) to fill the whole array, so the resulting matrix is not random at all. Therefore, cor(X) is never close to an identity matrix. The correct code is:
X <- matrix(rnorm(10000 * r), 10000, r)
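Both points can be checked numerically on the 3 x 3 toy matrix from the question. This is a sketch with variable names of my own (RPt, X_bad, X_good); it verifies the two factorization identities and shows what recycling does to the array.

```r
A <- matrix(c(1, .95, .90, .95, 1, .93, .90, .93, 1), ncol = 3)
R <- suppressWarnings(chol(A, pivot = TRUE))
pivot <- attr(R, "pivot")

## R'R reproduces the pivoted matrix P'AP, i.e. A[pivot, pivot]
ok_pivoted <- all.equal(crossprod(R), A[pivot, pivot], check.attributes = FALSE)

## undoing the pivoting: (RP')'(RP') reproduces A itself
RPt <- R[, order(pivot)]
ok_original <- all.equal(crossprod(RPt), A)

## the array from the question recycles just 3 random numbers
set.seed(1)
X_bad <- array(rnorm(3), dim = c(10000, 3))
n_distinct_bad <- length(unique(as.vector(X_bad)))

## a proper draw has as many random numbers as entries
X_good <- matrix(rnorm(10000 * 3), 10000, 3)
n_distinct_good <- length(unique(as.vector(X_good)))

c(ok_pivoted, ok_original)
c(n_distinct_bad, n_distinct_good)
```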
As a model implementation of the above theory, consider your toy example:
A <- matrix(c(1,.95,.90,.95,1,.93,.90,.93,1), ncol=3)
We compute the upper triangular factor (suppressing possible rank-deficient warnings) and extract inverse pivoting index and rank:
R <- suppressWarnings(chol(A, pivot = TRUE))
piv <- order(attr(R, "pivot")) ## reverse pivoting index
r <- attr(R, "rank") ## numerical rank
Then we generate X. For a better result we centre X so that its column means are 0.
X <- matrix(rnorm(10000 * r), 10000, r)
## for best effect, we centre `X`
X <- sweep(X, 2L, colMeans(X), "-")
Then we generate target multivariate normal:
## compute `V = RP'`
V <- R[1:r, piv]
## compute `Y = X %*% V`
Y <- X %*% V
We can verify that Y has target covariance A:
cor(Y)
# [,1] [,2] [,3]
#[1,] 1.0000000 0.9509181 0.9009645
#[2,] 0.9509181 1.0000000 0.9299037
#[3,] 0.9009645 0.9299037 1.0000000
A
# [,1] [,2] [,3]
#[1,] 1.00 0.95 0.90
#[2,] 0.95 1.00 0.93
#[3,] 0.90 0.93 1.00
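The toy matrix above has full rank, so it does not exercise the rank-deficient case. As a further sketch, here is the same recipe applied to a covariance matrix that I constructed to have rank 2 (the third variable is the sum of the first two); A2, L and the other names are assumptions made for this illustration.

```r
## rank-2 covariance of a 3-dimensional vector
L <- rbind(c(1, 0), c(0, 1), c(1, 1))
A2 <- tcrossprod(L)

## pivoted Cholesky, numerical rank and reverse pivoting index
R2 <- suppressWarnings(chol(A2, pivot = TRUE))
r2 <- attr(R2, "rank")
piv2 <- order(attr(R2, "pivot"))

## keep only the first r2 rows of the factor and undo the pivoting
V2 <- R2[seq_len(r2), piv2, drop = FALSE]
factor_ok <- all.equal(crossprod(V2), A2)

## generate centred standard normals with r2 columns, then transform
set.seed(123)
X2 <- matrix(rnorm(10000 * r2), 10000, r2)
X2 <- sweep(X2, 2L, colMeans(X2), "-")
Y2 <- X2 %*% V2

r2
round(cov(Y2), 2)
```

The sample covariance of Y2 should be close to A2, and the third column of Y2 equals the sum of the first two up to rounding error, which is exactly the linear dependence encoded in A2.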
