I'm trying to compute the covariance matrix of a very large image data matrix. I have tried both
cov(data)
and
data %*% t(data)/ (nrow(t(data))-1)
and ended up with a matrix of NaN values, which makes no sense to me. The dimensions of the covariance matrix are correct, but I don't understand why every value is NaN. If I try
cov(data)
and
t(data) %*% data/ (nrow(data)-1)
I get an error message saying
Error: cannot allocate vector of size ...
I have also tried using bigcor(), but I get this error every time:
Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In ff(vmode = "double", dim = c(NCOL, NCOL)) :
NAs introduced by coercion to integer range
Any idea of what could be causing this and how to fix it?
I'm following this tutorial:
https://rpubs.com/dherrero12/543854
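Two quick checks may help narrow this down. The sketch below is only a diagnostic under stated assumptions: cov_matrix_cost and count_non_finite are hypothetical helper names, and the 50,000 columns are an assumed figure, not taken from the data above. The first helper estimates how large a dense p x p result would be and whether its cell count exceeds .Machine$integer.max, which would be consistent with the "coercion to integer range" warning. The second counts non-finite entries in the input, one possible source of NaN results worth ruling out.

```r
# Hypothetical helper (not part of any package): estimate the cost of a
# dense p x p covariance matrix stored as doubles.
cov_matrix_cost <- function(p) {
  n_cells <- as.numeric(p)^2               # doubles avoid integer overflow
  list(
    cells = n_cells,
    gib = n_cells * 8 / 2^30,              # 8 bytes per double
    exceeds_integer_max = n_cells > .Machine$integer.max
  )
}

# Hypothetical helper: count NA, NaN and Inf entries in a numeric matrix.
count_non_finite <- function(m) sum(!is.finite(m))

# Toy illustration: an assumed 50,000 columns and a small matrix
cost <- cov_matrix_cost(50000)
toy <- matrix(c(1, 2, NA, 4, Inf, 6), nrow = 2)
n_bad <- count_non_finite(toy)
```

Note that cov(data) and t(data) %*% data produce an ncol(data) x ncol(data) result, whereas data %*% t(data) produces an nrow(data) x nrow(data) result, so the value passed as p depends on which form is used.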
I'm trying to run a factor analysis on a set of 80 dichotomous variables (1440 cases) using the hetcor function from the polycor package and the instructions I found here: http://researchsupport.unt.edu/class/Jon/Benchmarks/BinaryFA_L_JDS_Sep2014.pdf
Sadly, after I select just the variables of interest from the rest of my dataset and run the factor analysis on them, I consistently get the following error and warnings:
Error in optim(0, f, control = control, hessian = TRUE, method = "BFGS") :
non-finite finite-difference value [1]
In addition: Warning messages:
1: In log(P) : NaNs produced
2: In log(P) : NaNs produced
This happens when I run the following command, which is the step described in the PDF above:
testMat <- hetcor(data)$cor
No idea what this means or how to proceed... Your thoughts are appreciated. Thank you!
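One thing that may be worth checking before calling hetcor is whether any pair of variables has an empty cell in its 2 x 2 cross-table, since very sparse tables can make the likelihood hard to optimise. The sketch below does not establish the cause of the error. find_sparse_pairs is a hypothetical helper, toy is made-up data, and the code assumes the variables are coded 0/1.

```r
# Hypothetical helper: return the pairs of columns whose 2 x 2 cross-table
# contains at least one empty cell. Assumes every column is coded 0/1.
find_sparse_pairs <- function(df) {
  pairs <- combn(names(df), 2, simplify = FALSE)
  has_zero <- vapply(pairs, function(p) {
    tab <- table(factor(df[[p[1]]], levels = c(0, 1)),
                 factor(df[[p[2]]], levels = c(0, 1)))
    any(tab == 0)
  }, logical(1))
  do.call(rbind, pairs[has_zero])          # NULL when no pair is flagged
}

# Made-up data: columns a and c are identical, so their table has empty cells
toy <- data.frame(a = c(0, 0, 1, 1, 0, 1),
                  b = c(0, 1, 0, 1, 1, 0),
                  c = c(0, 0, 1, 1, 0, 1))
sparse <- find_sparse_pairs(toy)
```

Each row of the returned matrix names one flagged pair of variables.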
I'm trying to reproduce the following example from David Ruppert's "Statistics and Data Analysis for Financial Engineering", which fits a Student's t-distribution to the historical risk-free rate:
library(MASS)
data(Capm, package = "Ecdat")
x <- Capm$rf
fitt <- fitdistr(x,"t", start = list(m=mean(x),s=sd(x)), df=3)
as.numeric(fitt$estimate)
0.437310595161651 0.152205764779349
The output is accompanied by the following warning message, repeated nine times:
Warning message:
In log(s): NaNs produced
According to the R help file, MASS::fitdistr uses maximum likelihood to find the optimal parameters. However, when I do the optimization manually (same book), everything runs smoothly and there are no warnings:
library(fGarch)
loglik_t <- function(beta) {sum( - dt((x - beta[1]) / beta[2],
beta[3], log = TRUE) + log(beta[2]) )}
start <- c(mean(x), sd(x), 5)
lower <- c(-1, 0.001, 1)
fit_t <- optim(start, loglik_t, hessian = T, method = "L-BFGS-B", lower = lower)
fit_t$par
0.44232633269102 0.163306955396773 4.12343777572566
The fitted parameters agree with the fitdistr estimates to within acceptable standard errors, and in addition to the mean and sd I also get df.
Can somebody please advise me:
Why does MASS::fitdistr produce warnings, whereas the manual optimization via optim() succeeds without a warning?
Why is there no df in the MASS::fitdistr output?
Is there a way to run MASS::fitdistr on this data without a warning and get df?
Disclaimer:
a similar question was asked a couple of times without an answer here and here
You are not passing the lower argument to fitdistr, so the optimizer searches over both positive and negative parameter values. Whenever it tries a negative s, log(s) returns NaN, which is where the warnings come from. By passing the lower argument to the function
fitt <- fitdistr(x,"t", start = list(m=mean(x),s=sd(x)), df=3, lower=c(-1, 0.001))
you get no NaNs, just as in your manual optimisation.
EDIT:
fitt <- fitdistr(x,"t", start = list(m=mean(x),s=sd(x),df=3),lower=c(-1, 0.001,1))
returns a non-integer estimate of the degrees of freedom. However, I guess its rounded value, round(fitt$estimate['df'], 0), can be used as the fitted degrees-of-freedom parameter.
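The two calls above differ only in whether df is fixed or estimated. Here is a minimal sketch of that difference on simulated data. x_sim is a hypothetical stand-in for Capm$rf (not the book's data), and the simulation settings are assumptions chosen to give a similar scale.

```r
library(MASS)

set.seed(42)
# Simulated stand-in for the risk-free rate: location 0.4, scale 0.15, df 4
x_sim <- 0.4 + 0.15 * rt(2000, df = 4)

# df held fixed at 3: only m and s are estimated
fit_fixed <- fitdistr(x_sim, "t",
                      start = list(m = mean(x_sim), s = sd(x_sim)),
                      df = 3, lower = c(-1, 0.001))

# df included in start: m, s and df are all estimated
fit_free <- fitdistr(x_sim, "t",
                     start = list(m = mean(x_sim), s = sd(x_sim), df = 3),
                     lower = c(-1, 0.001, 1))

names(fit_fixed$estimate)
names(fit_free$estimate)
```

Per the fitdistr documentation, supplying lower (or upper) makes it use the L-BFGS-B method, so the bound on s keeps the optimizer away from negative scale values in both fits.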
A reproducible example that produces the error every time:
(Note that the error occurs every time, even without set.seed.)
library(MASS)
set.seed(seed = 1)
data<-rnorm(n = 10000,mean = 0.0002,sd = 0.001)
fitdistr(x = data,densfun = "t")
The error message is:
Error in stats::optim(x = c(-0.000426453810742332, 0.000383643324222082, :
non-finite finite-difference value [2]
In addition: Warning message:
In log(s) : NaNs produced
The problem is the "non-finite finite-difference value": fitdistr does not return a result.
My knowledge:
From what I have read, this could mean that a parameter becomes negative during the iterations, and that the solution could be to provide a better, or at least different, starting value. However, I could not figure out how to do this, and I am not sure whether this is really the issue.
MY QUESTION:
a) Why do I get this error message,
and
b) how can I fix it in R, so that I can fit the Student's t-distribution to my normally distributed data?
I'm running different models of this form:
gamm(H_1_3~ s(wcomp.x.cum, bs='cr')+s(wcomp.y.cum, bs='cr')+s(h_AST, bs='cr'),
na.action=na.omit,data=lag4_1DAY, method='REML', weights=vf)
R doesn't throw an error (i.e. I get output), but I do get a warning like this one:
Warning message:
In logLik.reStruct(object, conLin):
Singular precision matrix in level -3, block 1
What does it mean?
Is it a problem, or can I live with it?