sparse-QR more rows than original matrix - r

I did a sparse QR decomposition with the "Matrix" package in R like this:
a <- Matrix(runif(20), nrow = 5, sparse = T)
a[3:5,] <- 0 #now **a** is a 5X4 matrix
b <- qr.R(qr(a), complete = T) #but now **b** is a 7X4 matrix!
Does anyone know why? Note that if I keep a dense, the bug(?) does not appear.

I'll assume you did not see the warning, otherwise you would have mentioned it, right?
Warning message:
In qr.R(qr(a), complete = T) :
qr.R(< sparse >) may differ from qr.R(< dense >) because of permutations
Now if you are asking what those permutations mean, it's a different story...
The help("sparseQR-class") page may have more info on the issue:
However, because the matrix Q is not uniquely defined, the results of qr.qy and qr.qty do not necessarily match those from the corresponding dense matrix calculations.
Maybe it is the same with qr.R?
Finally, further down on the same help page:
qr.R --- signature(qr = "sparseQR"): compute the upper triangular R matrix of the QR decomposition. Note that this currently warns because of possible permutation mismatch with the classical qr.R() result, and you can suppress these warnings by setting options() either "Matrix.quiet.qr.R" or (the more general) either "Matrix.quiet" to TRUE.
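As a minimal sketch of the option mentioned on that help page, the warning can be silenced around the call and the previous setting restored afterwards. The seed is an assumption added for reproducibility, and the number of rows of b may depend on the installed Matrix version, so no output is shown here.

```r
library(Matrix)

set.seed(1)                                   # assumption: any seed works
a <- Matrix(runif(20), nrow = 5, sparse = TRUE)
a[3:5, ] <- 0                                 # a is still 5 x 4

old <- options(Matrix.quiet.qr.R = TRUE)      # option named on the help page
b <- qr.R(qr(a), complete = TRUE)
options(old)                                  # restore the previous setting

dim(a)
dim(b)                                        # compare the row counts yourself
```

Only the number of columns of b is guaranteed to match a; the row count is what the question is about.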

Related

Confusion about the matrix "B" returned by `quantreg::boot.rq`

When invoking boot.rq like this
b_10 = boot.rq(x, y, tau = .1, bsmethod = "xy", cov = TRUE, R = reps, mofn = mofn)
what does the B matrix (size R x p) in b_10 contain: bootstrapped coefficient estimates or bootstrapped standard errors?
The Value section in the documentation says:
A list consisting of two elements: A matrix B of dimension R by p is returned with the R resampled estimates of the vector of quantile regression parameters. [...]
So, it seems to be the coefficient estimates. But Description section says:
These functions can be used to construct standard errors, confidence intervals and tests of hypotheses regarding quantile regression models.
So it seems to be bootstrapped standard errors.
What is it really?
Edit:
I also wonder what difference the option cov = TRUE makes. Thanks!
The bootstrapped values are different depending on whether I use cov = TRUE or not. The code was written by someone else so I'm not sure why that option was put there.
It stores the bootstrap coefficients. Each row of B is a sample of coefficients, and you have R rows.
These samples are the basis of further inference. We can compute various statistics from them. For example, to compute bootstrap mean and standard error, we can do:
colMeans(B)
apply(B, 2, sd)
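A small self-contained sketch of the same idea, using a simulated stand-in for B (the column names and the numbers are made up for illustration, not output from boot.rq). It also adds a percentile interval, which is one common way to use the resampled coefficients:

```r
set.seed(42)
# Hypothetical R x p matrix of resampled coefficients (R = 200, p = 2)
B <- cbind(intercept = rnorm(200, mean = 1, sd = 0.1),
           slope     = rnorm(200, mean = 2, sd = 0.3))

boot_mean <- colMeans(B)                       # bootstrap mean per coefficient
boot_se   <- apply(B, 2, sd)                   # bootstrap standard error
boot_ci   <- apply(B, 2, quantile, probs = c(0.025, 0.975))  # percentile CI

boot_mean
boot_se
boot_ci
```

Each column of B is treated as a sample from the sampling distribution of one coefficient, so the standard errors are derived from B rather than stored in it.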
Do you also happen to know what difference the option cov = TRUE makes?
Are you sure that cov = TRUE works? First of all, boot.rq itself has no such argument. It may be passed in via .... However, ... is forwarded to boot.rq.pxy (if bsmethod = "pxy") or boot.rq.pwxy (if bsmethod = "pwxy"), neither of which deals with a cov argument. Furthermore, you use bsmethod = "xy", so ... will be silently ignored. As far as I could see, cov = TRUE has no effect at all.
It works in the sense that R doesn't throw me an error.
That is what "silently ignored" means. You can pass whatever into .... They are just ignored.
The bootstrapped values are different depending on whether I use cov = TRUE or not. The code was written by someone else so I'm not sure why that option was put there.
Random sampling won't give identical results on different runs. I suggest you fix a random seed then do testing:
set.seed(0); ans1 <- boot.rq(..., cov = FALSE)
set.seed(0); ans2 <- boot.rq(..., cov = TRUE)
all.equal(ans1$B, ans2$B)
If you don't get TRUE, come back to me.
You're right. It's just because of the different seeds. Thanks!!
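The two points in this exchange (arguments swallowed by ... and the effect of the seed) can be reproduced without quantreg. This is only a sketch; toy_boot is a hypothetical function written for the illustration, not part of any package:

```r
# Hypothetical resampler: accepts '...' but never uses it
toy_boot <- function(x, R = 5, ...) {
  replicate(R, mean(sample(x, replace = TRUE)))
}

x <- c(2.1, 3.4, 1.9, 5.0, 4.2)

set.seed(0); ans1 <- toy_boot(x, cov = FALSE)
set.seed(0); ans2 <- toy_boot(x, cov = TRUE)
all.equal(ans1, ans2)            # TRUE: same seed, 'cov' is ignored

ans3 <- toy_boot(x, cov = TRUE)  # no reseeding, so the draws differ
```

Without resetting the seed, ans3 is drawn from a different point in the random number stream, which is what made the results look dependent on cov.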

Invert singular matrices in R

I am trying to grasp the basic concept of invertible and non-invertible matrices.
I created a random non-singular square matrix
S <- matrix(rnorm(100, 0, 1), ncol = 10, nrow = 10)
I know that this matrix is positive definite (thus invertible) because when I decompose the matrix S into its eigenvalues, their product is positive.
eig_S <- eigen(S)
eig_S$values
[1] 3.0883683+0.000000i -2.0577317+1.558181i -2.0577317-1.558181i 1.6884120+1.353997i 1.6884120-1.353997i
[6] -2.1295086+0.000000i 0.1805059+1.942696i 0.1805059-1.942696i -0.8874465+0.000000i 0.8528495+0.000000i
solve(S)
According to this paper, we can compute the inverse of a non-singular matrix by its SVD too.
Where $S = U D V^T$
(where U and V are eigenvectors and D eigenvalues, please do correct me if I am wrong).
The inverse then is $S^{-1} = V D^{-1} U^T$.
Indeed, I can run the formula in R:
s <- svd(S)
s$v%*%solve(diag(s$d))%*%t(s$u)
Which produces exactly the same result as solve(S).
My first question is:
1) Does s$d indeed represent the eigenvalues of S? Because s$d and eig_S$values are quite different.
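On this first question, a quick numerical check (a sketch with a freshly simulated S, not the exact matrix above) suggests that s$d holds the singular values of S rather than its eigenvalues. They match the square roots of the eigenvalues of t(S) %*% S, and their product equals the absolute value of the determinant:

```r
set.seed(1)
S <- matrix(rnorm(100), nrow = 10, ncol = 10)

sv      <- svd(S)$d                                    # singular values of S
ev_gram <- eigen(crossprod(S), symmetric = TRUE)$values  # eigenvalues of t(S) %*% S

all.equal(sv, sqrt(ev_gram))       # singular values = sqrt of those eigenvalues
all.equal(prod(sv), abs(det(S)))   # product of singular values = |det(S)|
```

Singular values are always real and non-negative, whereas the eigenvalues of a non-symmetric matrix can be negative or complex, which is why the two printed vectors look so different.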
Now the second part,
If I create a singular matrix
I <- matrix(rnorm(100, 0, 1), ncol = 5, nrow = 20)
I <- I%*%t(I)
eig_I <- eigen(I)
eig_I$values
[1] 3.750029e+01 2.489995e+01 1.554184e+01 1.120580e+01 8.674039e+00 3.082593e-15 5.529794e-16 3.227684e-16
[9] 2.834454e-16 5.876634e-17 -1.139421e-18 -2.304783e-17 -6.636508e-17 -7.309336e-17 -1.744084e-16 -2.561197e-16
[17] -3.075499e-16 -4.150320e-16 -7.164553e-16 -3.727682e-15
The solve function will produce an error
solve(I)
system is computationally singular: reciprocal condition number =
1.61045e-19
So, again according to the same paper we can use the SVD
i <- svd(I)
solve(i$u %*% diag(i$d) %*% t(i$v))
which produces the same error.
Then I tried to use the Cholesky decomposition for matrix inversion
Conj(t(I))%*%solve(I%*%Conj(t(I)))
and again I get the same error.
Could someone please explain where I am using the equations wrong?
I know that for matrix I%*%Conj(t(I)), the determinant of the eigenvalue matrix is positive but the matrix is not a full rank due to the initial multiplication that I did.
j <- eigen(I%*%Conj(t(I)))
det(diag(j$values))
[1] 3.17708e-196
qr(I %*% Conj(t(I)))$rank
[1] 5
UPDATE 1: Following the comments below, and after going through the paper/Wikipedia page again, I used these two snippets, which produce some results, but I am not sure about their validity. The first example seems more believable. The SVD solution
i$v%*%diag(1/i$d)%*%t(i$u)
and the Cholesky
Conj(t(I))%*%(I%*%Conj(t(I)))^(-1)
I am not sure if I interpreted the two sources correctly though.
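One way to check the two snippets is to build a Moore-Penrose pseudoinverse from the SVD while dropping the singular values that are numerically zero, and then test the defining property A A+ A = A. This is a sketch under assumptions: the matrix is re-simulated (named A here instead of I), and the tolerance rule is a common convention, not something taken from the paper. Note also that in R `^(-1)` takes element-wise reciprocals rather than a matrix inverse, and that diag(1/i$d) divides by singular values that are zero up to rounding.

```r
set.seed(1)
X <- matrix(rnorm(100), nrow = 20, ncol = 5)
A <- X %*% t(X)                       # 20 x 20, rank 5, hence singular

sv   <- svd(A)
tol  <- max(dim(A)) * max(sv$d) * .Machine$double.eps  # assumed cutoff
keep <- sv$d > tol                    # singular values treated as nonzero

U <- sv$u[, keep, drop = FALSE]
V <- sv$v[, keep, drop = FALSE]
# t(U) / d divides row i of t(U) by d[i], i.e. it computes D^{-1} U^T
A_pinv <- V %*% (t(U) / sv$d[keep])

sum(keep)                             # numerical rank
all.equal(A %*% A_pinv %*% A, A)      # defining property of the pseudoinverse
```

MASS::ginv implements the same idea with its own tolerance argument, so it can serve as a cross-check.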

Compute the null space of a sparse matrix

I found the function (null OR nullspace) to find the null space of a regular matrix in R, but I couldn't find any function or package for a sparse matrix (sparseMatrix).
Does anybody know how to do this?
If you take a look at the code of ggm::null, you will see that it is based on the QR decomposition of the input matrix.
On the other hand, the Matrix package provides its own method to compute the QR decomposition of a sparse matrix.
For example:
require(Matrix)
A <- matrix(rep(0:1, 3), 3, 2)
As <- Matrix(A, sparse = TRUE)
qr.Q(qr(A), complete=TRUE)[, 2:3]
qr.Q(qr(As), complete=TRUE)[, 2:3]
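A dense sketch of the QR-based construction, assuming the usual recipe of keeping the columns of the complete Q beyond the numerical rank (here the null space of t(A)). The residual check at the end is the part worth reusing with the sparse factorization, since sparse QR may involve permutations:

```r
A  <- matrix(rep(0:1, 3), 3, 2)
qa <- qr(A)
r  <- qa$rank                           # numerical rank of A (assumed > 0 here)

# Columns of the complete Q beyond the rank span the null space of t(A)
N <- qr.Q(qa, complete = TRUE)[, -seq_len(r), drop = FALSE]

dim(N)
max(abs(crossprod(A, N)))               # t(A) %*% N should be numerically zero
```

If the rank were 0, `-seq_len(r)` would select no columns, so that case needs separate handling.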

Pseudoinverse of large sparse matrix in R

I am trying to calculate the pseudoinverse of a large sparse matrix in R using the singular value decomposition. The matrix L is roughly 240,000 x 240,000, and I have it stored as type dgCMatrix. $L$ represents the Laplacian of a large diameter graph, and I happen to know that the pseudoinverse $L^+$ should also be sparse. From empirical observations of a smaller subset of this graph, L is ~.07% sparse, while $L^+$ is ~2.5% sparse.
I have tried using pinv, ginv, and other standard pseudoinverse functions, but they error out due to memory constraints. I then tried to opt for the sparse matrix svd provided by the package irlba, which I was then going to use to compute the pseudoinverse using the standard formula after converting all outputs to sparse matrices. My code is here:
library(Matrix)
library(irlba)
lim = 40
digits = 4
SVD = irlba(L, lim)
U = round(SVD$u, digits)
nonZeroU = which(abs(U) > 0, arr.ind = T)
# swapping i and j stores the transpose of U
sparseU = sparseMatrix(i = nonZeroU[,2], j = nonZeroU[,1], x = U[nonZeroU])
V = round(SVD$v, digits)
nonZeroV = which(abs(V) > 0, arr.ind = T)
sparseV = sparseMatrix(i = nonZeroV[,1], j = nonZeroV[,2], x = V[nonZeroV])
D = as(Diagonal(x = 1/SVD$d), "sparseMatrix")
pL = D %*% sparseU
pL = sparseV %*% pL
I am able to get to the last line without an issue, but then I get an error due to memory constraints that says
Error in sparseV %*% pL :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Of course I could piece together the pseudoinverse entry by entry using a for loop and vector multiplications, but I would like to be able to calculate it using a simple function that takes advantage of the sparsity of the resultant pseudoinverse matrix. Is there any way to use the SVD of L to efficiently and approximately compute the pseudoinverse $L^+$, other than calculating each row individually?
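One option, sketched here on a tiny dense stand-in rather than the 240,000 x 240,000 matrix, is to never form $L^+$ at all and instead apply its SVD factors to a vector. The path graph, the tolerance rule, and the vector b are assumptions made for the illustration. Keep in mind that a truncated SVD that keeps only the largest singular values drops exactly the small nonzero ones that dominate the pseudoinverse.

```r
n <- 6
# Laplacian of a path graph on n nodes (small dense stand-in for L)
Adj <- matrix(0, n, n)
Adj[cbind(1:(n - 1), 2:n)] <- 1
Adj <- Adj + t(Adj)
L <- diag(rowSums(Adj)) - Adj

sv   <- svd(L)
keep <- sv$d > max(dim(L)) * max(sv$d) * .Machine$double.eps  # assumed cutoff
U <- sv$u[, keep, drop = FALSE]
V <- sv$v[, keep, drop = FALSE]
d <- sv$d[keep]

b <- c(1, -1, 0, 0, 0, 0)            # sums to zero, so it lies in the range of L
x <- V %*% (crossprod(U, b) / d)     # L^+ b without building the n x n matrix L^+

max(abs(L %*% x - b))                # residual, should be numerically zero
```

The same three-factor product works with factors from any truncated solver, at the cost of an approximation error that depends on which singular values were kept.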

Help using predict() for kernlab's SVM in R?

I am trying to use the kernlab R package to do Support Vector Machines (SVM). For my very simple example, I have two pieces of training data. A and B.
(A and B are of type matrix - they are adjacency matrices for graphs.)
So I wrote a function which takes A+B and generates a kernel matrix.
> km
[,1] [,2]
[1,] 14.33333 18.47368
[2,] 18.47368 38.96053
Now I use kernlab's ksvm function to generate my predictive model. Right now, I'm just trying to get the darn thing to work - I'm not worried about training error, etc.
So, Question 1: Am I generating my model correctly? Reasonably?
# y are my classes. In this case, A is in class "1" and B is in class "-1"
> y
[1] 1 -1
> model2 = ksvm(km, y, type="C-svc", kernel = "matrix");
> model2
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 1
[1] " Kernel matrix used as input."
Number of Support Vectors : 2
Objective Function Value : -0.1224
Training error : 0
So far so good. We created our custom kernel matrix, and then we created a ksvm model using that matrix. We have our training data labeled as "1" and "-1".
Now to predict:
> A
[,1] [,2] [,3]
[1,] 0 1 1
[2,] 1 0 1
[3,] 0 0 0
> predict(model2, A)
Error in as.matrix(Z) : object 'Z' not found
Uh-oh. This is okay. Kind of expected, really. "Predict" wants some sort of vector, not a matrix.
So let's try some things:
> predict(model2, c(1))
Error in as.matrix(Z) : object 'Z' not found
> predict(model2, c(1,1))
Error in as.matrix(Z) : object 'Z' not found
> predict(model2, c(1,1,1))
Error in as.matrix(Z) : object 'Z' not found
> predict(model2, c(1,1,1,1))
Error in as.matrix(Z) : object 'Z' not found
> predict(model2, km)
Error in as.matrix(Z) : object 'Z' not found
Some of the above tests are nonsensical, but that is my point: no matter what I do, I just can't get predict() to look at my data and do a prediction. Scalars don't work, vectors don't work. A 2x2 matrix doesn't work, nor does a 3x3 matrix.
What am I doing wrong here?
(Once I figure out what ksvm wants, then I can make sure that my test data can conform to that format in a sane/reasonable/mathematically sound way.)
If you think about how the support vector machine might "use" the kernel matrix, you'll see that you can't really do this in the way you're trying (as you've seen :-)
I actually struggled a bit with this when I first was using kernlab + a kernel matrix ... coincidentally, it was also for graph kernels!
Anyway, let's first realize that since the SVM doesn't know how to calculate your kernel function, it needs to have these values already calculated between your new (testing) examples, and the examples it picks out as the support vectors during the training step.
So, you'll need to calculate the kernel matrix for all of your examples together. You'll later train on some and test on the others by removing rows + columns from the kernel matrix when appropriate. Let me show you with code.
We can use the example code in the ksvm documentation to load our workspace with some data:
library(kernlab)
example(ksvm)
You'll need to hit return a few (2) times in order to let the plots draw, and let the example finish, but you should now have a kernel matrix in your workspace called K. We'll need to recover the y vector that it should use for its labels (as it has been trampled over by other code in the example):
y <- matrix(c(rep(1,60),rep(-1,60)))
Now, pick a subset of examples to use for testing
holdout <- sample(1:ncol(K), 10)
From this point on, I'm going to:
1. Create a training kernel matrix named trainK from the original K kernel matrix.
2. Create an SVM model from my training set trainK.
3. Use the support vectors found from the model to create a testing kernel matrix testK ... this is the weird part. If you look at the code in kernlab to see how it uses the support vector indices, you'll see why it's being done this way. It might be possible to do this another way, but I didn't see any documentation/examples on predicting with a kernel matrix, so I'm doing it "the hard way" here.
4. Use the SVM to predict on these features and report accuracy.
Here's the code:
trainK <- as.kernelMatrix(K[-holdout,-holdout]) # 1
m <- ksvm(trainK, y[-holdout], kernel='matrix') # 2
testK <- as.kernelMatrix(K[holdout, -holdout][,SVindex(m), drop=F]) # 3
preds <- predict(m, testK) # 4
sum(sign(preds) == sign(y[holdout])) / length(holdout) # == 1 (perfect!)
That should just about do it. Good luck!
Responses to comment below
what does K[-holdout,-holdout] mean? (what does the "-" mean?)
Imagine you have a vector x, and you want to retrieve elements 1, 3, and 5 from it, you'd do:
x.sub <- x[c(1,3,5)]
If you want to retrieve everything from x except elements 1, 3, and 5, you'd do:
x.sub <- x[-c(1,3,5)]
So K[-holdout,-holdout] returns all of the rows and columns of K except for the ones we want to hold out.
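A tiny runnable version of the same indexing, with made-up values for x, K, and holdout:

```r
x <- c(10, 20, 30, 40, 50, 60)
x[c(1, 3, 5)]       # 10 30 50
x[-c(1, 3, 5)]      # 20 40 60

K <- matrix(1:16, nrow = 4)     # toy 4 x 4 "kernel" matrix
holdout <- c(2, 4)
sub <- K[-holdout, -holdout]    # keeps rows 1, 3 and columns 1, 3
sub
```

The result is the 2 x 2 block of K that involves only the examples that were not held out.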
What are the arguments of your as.kernelMatrix - especially the [,SVindex(m),drop=F] argument (which is particularly strange because it looks like that entire bracket is a matrix index of K?)
Yeah, I inlined two commands into one:
testK <- as.kernelMatrix(K[holdout, -holdout][,SVindex(m), drop=F])
Now that you've trained the model, you want to give it a new kernel matrix with your testing examples. K[holdout,] would give you only the rows which correspond to the testing examples in K, and all of the columns of K.
SVindex(m) gives you the indexes of your support vectors from your original training matrix -- remember, those rows/cols have holdout removed. So for those column indices to be correct (ie. reference the correct sv column), I must first remove the holdout columns.
Anyway, perhaps this is more clear:
testK <- K[holdout, -holdout]
testK <- testK[,SVindex(m), drop=FALSE]
Now testK only has the rows of our testing examples and the columns that correspond to the support vectors. testK[1,1] will have the value of the kernel function computed between your first testing example, and the first support vector. testK[1,2] will have the kernel function value between your 1st testing example and the second support vector, etc.
Update (2014-01-30) to answer comment from @wrahool
It's been a while since I've played with this, so the particulars of kernlab::ksvm are a bit rusty, but in principle this should be correct :-) ... here goes:
what is the point of testK <- K[holdout, -holdout] - aren't you removing the columns that correspond to the test set?
Yes. The short answer is that if you want to predict using a kernel matrix, you have to supply a matrix of dimension rows by support vectors. For each row of the matrix (the new example you want to predict on), the values in the columns are simply the values of the kernel function evaluated between that example and each support vector.
The call to SVindex(m) returns the index of the support vectors given in the dimension of the original training data.
So, first doing testK <- K[holdout, -holdout] gives me a testK matrix with the rows of the examples I want to predict on, and the columns are from the same examples (dimension) the model was trained on.
I further subset the columns of testK by SVindex(m) to only give me the columns which (now) correspond to my support vectors. Had I not done the first [, -holdout] selection, the indices returned by SVindex(m) may not correspond to the right examples (unless all N of your testing examples are the last N columns of your matrix).
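The index bookkeeping can be checked without fitting a model. In this sketch, K is a toy linear kernel and sv_idx is a hypothetical stand-in for what SVindex(m) would return, i.e. positions within the training subset:

```r
set.seed(1)
X <- matrix(rnorm(24), nrow = 8)
K <- tcrossprod(X)                   # 8 x 8 linear kernel matrix, X %*% t(X)

holdout   <- c(2, 5)                 # test examples
train_ids <- setdiff(seq_len(nrow(K)), holdout)   # 1 3 4 6 7 8
sv_idx    <- c(1, 4, 6)              # hypothetical SVindex(m), relative to training set

testK <- K[holdout, -holdout][, sv_idx, drop = FALSE]
dim(testK)                           # 2 3: test examples by support vectors

train_ids[sv_idx]                    # 1 6 8: the same support vectors in the full K
all.equal(testK, K[holdout, train_ids[sv_idx], drop = FALSE])   # TRUE
```

Indexing the full K directly with sv_idx would pick columns 1, 4, and 6 instead of 1, 6, and 8, which is the mismatch the [, -holdout] step avoids.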
Also, what exactly does the drop = FALSE condition do?
It's a bit of defensive coding to ensure that after the indexing operation is performed, the object that is returned is of the same type as the object that was indexed.
In R, if an indexing operation selects a single row or column of a 2D (or higher(?)) object, you are returned an object of the lower dimension. I don't want to pass a numeric vector into predict, because it wants a matrix.
For instance
x <- matrix(rnorm(50), nrow=10)
class(x)
[1] "matrix"
dim(x)
[1] 10 5
y <- x[, 1]
class(y)
[1] "numeric"
dim(y)
NULL
The same will happen with data.frames, etc.
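For comparison, here is the same selection with and without drop = FALSE, on a matrix and on a data frame (a small sketch with simulated values):

```r
x <- matrix(rnorm(50), nrow = 10)

dim(x[, 1])                  # NULL: collapsed to a plain numeric vector
dim(x[, 1, drop = FALSE])    # 10 1: still a one-column matrix

df <- as.data.frame(x)
class(df[, 1])               # "numeric"
class(df[, 1, drop = FALSE]) # "data.frame"
```

With drop = FALSE the result keeps its two dimensions even when only one support vector column is selected.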
First off, I have not used kernlab much. But simply looking at the docs, I do see working examples for the predict.ksvm() method. Copying and pasting, and omitting the prints to screen:
## example using the promotergene data set
data(promotergene)
## create test and training set
ind <- sample(1:dim(promotergene)[1],20)
genetrain <- promotergene[-ind, ]
genetest <- promotergene[ind, ]
## train a support vector machine
gene <- ksvm(Class~.,data=genetrain,kernel="rbfdot",
kpar=list(sigma=0.015),C=70,cross=4,prob.model=TRUE)
## predict gene type probabilities on the test set
genetype <- predict(gene,genetest,type="probabilities")
That seems pretty straight-laced: use random sampling to generate a training set genetrain and its complement genetest, then fitting via ksvm and a call to a predict() method using the fit, and new data in a matching format. This is very standard.
You may find the caret package by Max Kuhn useful. It provides a general evaluation and testing framework for a variety of regression, classification and machine learning methods and packages, including kernlab, and contains several vignettes plus a JSS paper.
Steve Lianoglou is right.
In kernlab it is a bit weird: when predicting, it requires the kernel matrix between each test example and the support vectors as input. You need to compute this matrix yourself.
For example, a test matrix [n x m], where n is the number of test samples and m is the number of support vectors in the learned model (ordered in the sequence of SVindex(model)).
Example code
trmat <- as.kernelMatrix(kernels[trainidx,trainidx])
tsmat <- as.kernelMatrix(kernels[testidx,trainidx])
#training
model = ksvm(x=trmat, y=trlabels, type = "C-svc", C = 1)
#testing
thistsmat = as.kernelMatrix(tsmat[,SVindex(model)])
tsprediction = predict(model, thistsmat, type = "decision")
kernels is the input kernel matrix. trainidx and testidx are ids for training and test.
Build the labels yourself from the elements of the solution. Use this alternate predictor method, which takes a ksvm model (m) and data in the original training format (d):
predict.alt <- function(m, d){
sign(d[, m@SVindex] %*% m@coef[[1]] - m@b)
}
K is a kernelMatrix for training. For validation's sake, if you run predict.alt on the training data you will notice that the alternate predictor method switches values alongside the fitted values returned by ksvm. The native predictor behaves in an unexpected way:
aux <- data.frame(fit=kout@fitted, native=predict(kout, K), alt=predict.alt(m=kout, d=as.matrix(K)))
sample_n(aux, 10)
fit native alt
1 0 0 -1
100 1 0 1
218 1 0 1
200 1 0 1
182 1 0 1
87 0 0 -1
183 1 0 1
174 1 0 1
94 1 0 1
165 1 0 1
