R: Computing covariance matrix of a large dataset

I'm trying to compute the covariance matrix of a very large image data matrix. I have tried both
cov(data)
and
data %*% t(data)/ (nrow(t(data))-1)
and ended up with a matrix of NaN values, which makes no sense to me. The covariance matrix has the correct size, but I don't see why the values are all NaN. If I try
cov(data)
and
t(data) %*% data/ (nrow(data)-1)
I get an error message saying
Error: cannot allocate vector of size ...
I have also tried using bigcor(), but I get this error every time:
Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In ff(vmode = "double", dim = c(NCOL, NCOL)) :
NAs introduced by coercion to integer range
Any idea of what could be causing this and how to fix it?
I'm following this tutorial:
https://rpubs.com/dherrero12/543854
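A quick base-R sketch of what may be going on (assuming a small toy matrix `m` in place of the real image data): the cross-product formula only reproduces cov() after the columns are centered, and a single NA/NaN anywhere in the input propagates through the whole cross-product, which would explain an all-NaN result.

```r
set.seed(1)
m <- matrix(rnorm(20), nrow = 5, ncol = 4)

# cov() centers each column first; a raw cross-product does not
mc <- scale(m, center = TRUE, scale = FALSE)
manual_cov <- t(mc) %*% mc / (nrow(m) - 1)
all.equal(unname(manual_cov), unname(cov(m)))  # TRUE

# one NaN in the input contaminates the result
m[1, 1] <- NaN
any(is.nan(cov(m)))                            # TRUE
cov_pair <- cov(m, use = "pairwise.complete.obs")  # drops incomplete pairs
```

So before fighting the memory limit, it may be worth checking the data for NA/NaN values and making sure the matrix is centered before forming any cross-product.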

Related

Increasing Max Weights in neural network

I have a very large data set with around 700000 rows and 26 predictor variables. My neural network cannot run on all of it because the default max weights is set to 100. I am trying to use MaxNWts to set it to 10000, but this gives me the following error. What can I do so it runs, even if it takes a bit longer?
trainingData=sample(1:nrow(data),0.7*nrow(data))
predictors=c(1:21,23:27)
myNet=nnet.formula(data[,22]~ data$ï..PatientMRN+data$IsNewToProvider+data$IsOverbooked+data$IsOverride+data$AppointmentDateandTime+data$VisitType+data$DayofWeek+data$DepartmentSpecialty+data$LeadDays,data=data, subset=trainingData, size=7, MaxNWts = 10000)
Forgot to include it; this is the error message I get:
Error in nnet.default(x, y, w, ...) :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(x, y, w, ...) : NAs introduced by coercion
Instead of trying to change the maximum number of weights, you should first normalize your data column-wise (before splitting into train and test!) so that the maximal value of each column is 1.0. Since all values will then satisfy 0 <= value <= 1.0, your weights won't be huge.
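A minimal sketch of the normalization suggested above (assumptions: the predictor columns are numeric and non-constant; a constant column would divide by zero and needs special handling, and the toy column names below are made up):

```r
# min-max normalization: map each column onto [0, 1]
normalize01 <- function(x) (x - min(x)) / (max(x) - min(x))

df <- data.frame(LeadDays = c(10, 20, 30), IsOverbooked = c(0, 1, 1))
df_norm <- as.data.frame(lapply(df, normalize01))
range(df_norm$LeadDays)  # 0 1
```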

Errors in using MatrixModels package in R: CHOLMOD factorization was unsuccessful

I am trying to estimate a linear model using sparse matrices. The code I used is:
MatrixModels:::lm.fit.sparse( Dc, y, w = NULL, offset = NULL,
method = c("cholesky"),
tol = 1e-7, singular.ok = TRUE, order = NULL,
transpose = FALSE);
The error message is:
Error in .solve.dgC.chol(as(if (transpose) tx else t(x), "CsparseMatrix"), :
CHOLMOD factorization was unsuccessful
In addition: Warning message:
In .solve.dgC.chol(as(if (transpose) tx else t(x), "CsparseMatrix"), :
Cholmod warning 'matrix not positive definite' at file ../Supernodal/t_cholmod_super_numeric.c, line 729
Dc is a sparse matrix of class dgCMatrix and y is a vector. Dc is the product of D and W', where D is a sparse matrix with a sparsity level of more than 95% and W' is a transform matrix. I have checked the columns and rows of Dc and there is no all-zero column/row.
One possible reason for the error is that Dc is too dense to be used in lm.fit.sparse. However, the matrix is too large for the dense matrix class (the dimension of Dc is 1311247 x 8192). Could anyone please point out whether I did anything wrong, or how I could solve this problem? Thank you!
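One thing worth checking, sketched below with a toy matrix: CHOLMOD's "matrix not positive definite" warning means crossprod(Dc) is rank-deficient, which can happen through linearly dependent columns even when no row or column is all-zero. Since crossprod(Dc) is only 8192 x 8192, its rank can be inspected directly with Matrix::rankMatrix; if it comes out below 8192, trying method = "qr" (which does not require positive definiteness) may be an option.

```r
library(Matrix)
set.seed(1)
X <- Matrix(rnorm(200), nrow = 50, ncol = 4, sparse = TRUE)
X <- cbind(X, X[, 1] + X[, 2])        # dependent column; no zero rows/columns
S <- crossprod(X)                     # what CHOLMOD factorizes, here 5 x 5
as.integer(rankMatrix(as.matrix(S)))  # 4 < 5: a Cholesky of S must fail
```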

Error in panel spatial model in R using spml

I am trying to fit a panel spatial model in R using the package spml. I first define the NxN weighting matrix as follows
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)
dlist <- nbdists(neib, coordinates(coord))
idlist <- lapply(dlist, function(x) 1/x)
w50 <- nb2listw(neib,zero.policy=TRUE, glist=idlist, style="W")
Thus I define two observations to be neighbours if they are distant within a range of 50km at most. The weights attached to each pairs of neighbour observations correspond to the inverse of their distance, so that closer neighbours receive higher weights. I also use the option zero.policy=TRUE so that observations which do not have neighbours are associated with a vector of zero weights.
Once I do this I try to fit the panel spatial model in the following way
mod <- spml(y ~ x , data = data_p, listw = w50, na.action = na.fail, lag = F, spatial.error = "b", model = "within", effect = "twoways" ,zero.policy=TRUE)
but I get the following error and warning messages
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
I believe this to be related to the observations without neighbours. Can anyone please help me with this? Is there any way to deal with non-neighbour observations besides the zero.policy option?
Many many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalized.
2) Treat any NA values properly, both in the dataset and in the W matrix.
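A base-R sketch of the two checks above, using a toy weights list in place of `w50$weights` (a listw object stores one weight vector per observation; with style = "W" each non-empty vector should sum to 1, and zero.policy leaves empty vectors for observations with no neighbours):

```r
# 1) row-normalization check on the neighbour weight vectors
wlist <- list(c(0.5, 0.5), c(0.2, 0.3, 0.5), numeric(0))  # empty = no neighbours
row_sums <- sapply(wlist, sum)
all(row_sums == 0 | abs(row_sums - 1) < 1e-10)  # TRUE -> row-normalized

# 2) scan the panel data for NA / non-finite values, column by column
df <- data.frame(y = c(1, NA, 3), x = c(2, 4, Inf))
colSums(!sapply(df, is.finite))  # y: 1, x: 1 -> these break lag.listw
```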

R, Glasso error

I have been trying to compute the glasso matrix for a covariance matrix input:
SP_glasso_matrix = glasso(SP_covar_matrix, rho=0)
Warning message returned is:
Warning message:
In glasso(SP_covar_matrix, rho = 0) :
  With rho=0, there may be convergence problems if the input matrix is not of full rank
Is there something wrong with my covariance matrix? What is rho and how do I set it?
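A short sketch, assuming the CRAN `glasso` package (note the function name is lower-case `glasso`, as the warning itself shows): `rho` is the L1 penalty applied to the entries of the estimated inverse covariance. With rho = 0 you get the unpenalized maximum-likelihood estimate, which is only well-posed when the input covariance matrix has full rank; any rho > 0 regularizes the problem and sidesteps the warning.

```r
library(glasso)
set.seed(1)
S <- cov(matrix(rnorm(100), nrow = 20, ncol = 5))  # full-rank toy input
fit <- glasso(S, rho = 0.1)  # rho > 0: penalized, well-posed
dim(fit$wi)  # 5 5 -- regularized estimate of the inverse covariance
```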

Incorrect dimensions error while using cv.glmnet function in R

cv.out.lasso <- cv.glmnet(x[train, ], y[train], alpha=1)
I am trying to use cv.glmnet, but I am getting this error:
Error in x[train, ] : incorrect number of dimensions.
I checked the dimensions of the matrix x[train, ] and also the length of y. The number of columns of the matrix equals the length of the y vector.
I don't see what's wrong here. Can someone please help?
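A base-R sketch of what typically triggers this particular error: `x[train, ]` requires `x` to be two-dimensional, so if `x` is a plain vector (for example, a single extracted column, or data that was never passed through as.matrix / model.matrix), subsetting it with two indices fails with exactly "incorrect number of dimensions".

```r
x_vec <- rnorm(10)                     # not a matrix
train <- 1:7
res <- try(x_vec[train, ], silent = TRUE)
inherits(res, "try-error")             # TRUE: incorrect number of dimensions

x_mat <- matrix(rnorm(20), nrow = 10)  # e.g. via as.matrix() / model.matrix()
dim(x_mat[train, ])                    # 7 2 -- two-index subsetting works
```

So it may be worth checking `class(x)` and `dim(x)` before the subsetting, not just the dimensions of the already-subset object.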
