I'd like to report the standard errors of the clustering parameters (kappa, sigma) of an inhomogeneous Thomas point process model that I've fitted in spatstat. Yue and Loh (2015) reported doing this by a parametric bootstrap. I'm not very experienced with this concept, or with applying it to point process models. How would I do this?
My first guess is to simulate my kppm a number of times and re-fit the resulting simulated points with the same covariates. Then, calculate the standard errors from the clustering parameters of each subsequent fitting. Is this correct? If so, how many simulations would be considered acceptable in this case? Thanks in advance for any pointers!
Basically your own description is completely correct.
My first guess is to simulate my kppm a number of times and re-fit the
resulting simulated points with the same covariates. Then, calculate
the standard errors from the clustering parameters of each subsequent
fitting.
The only question left is how many simulations to do. Basically the answer is: "As many as you have time to do!". It is common to see people do 1000 simulations, so why don't you start there?
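Just to make the loop concrete, here is a minimal sketch in R, assuming your fitted model object is called fit; the exact way of extracting kappa and sigma may differ slightly depending on your spatstat version, so check the output of parameters(fit) first.

library(spatstat)

# 'fit' stands for the fitted model, e.g.
# fit <- kppm(X ~ covariate1 + covariate2, clusters = "Thomas")

nsim <- 1000
boot <- matrix(NA_real_, nrow = nsim, ncol = 2,
               dimnames = list(NULL, c("kappa", "sigma")))

for (i in seq_len(nsim)) {
  Xsim  <- simulate(fit, nsim = 1)[[1]]   # one simulated pattern from the fitted model
  refit <- update(fit, Xsim)              # re-fit the same model (same covariates) to it
  p     <- parameters(refit)              # fitted parameters; check the exact element names
  boot[i, ] <- c(p$kappa, p$scale)        # for the Thomas model, the 'scale' parameter is sigma
}

apply(boot, 2, sd)                        # parametric bootstrap standard errors of kappa and sigma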
I am running an SEM using lavaan that includes 5 latent variables. I also have 5 regression equations (Y ~ ...) in which the outcomes are manifest variables and the regressors are a mix of latent variables and indicators.
When I use maximum likelihood estimation, the model runs without problems. But when I switch to WLSMV estimation (adding the argument estimator = "WLSMV"), I run into two problems. The first is that execution becomes extremely slow, taking several hours to run a single model. Any idea why this is happening and whether there is a way to fix it?
The second problem is that when I try to fit multigroup SEMs and start constraining the model I get the following warning:
lavaan WARNING: the optimizer (NLMINB) claimed the model converged,
but not all elements of the gradient are (near) zero;
the optimizer may not have found a local solution
use check.gradient = FALSE to skip this check.
Any idea what this means? What are the implications? Is this a problem? How do I fix it? Should I simply stay with maximum likelihood?
IMPORTANT: when I remove the regressions and keep only the measurement part (the five latent variables), the function executes quickly and I stop getting the warning message. Does this mean that WLSMV should not be used once the CFA becomes an SEM?
Thanks in advance!
You have a big model for a small sample, I bet, and a particularly small one for the DWLS estimator with a mean- and variance-adjusted (MV) chi-squared test statistic, i.e. WLSMV. You can try to simplify your model, increase your sample, or use a different estimator, such as "MLR", maximum likelihood estimation with robust (Huber–White) standard errors.
I suggest that you check the chapter by Finney, DiStefano, and Kopp (2016).
Finney, S. J., DiStefano, C., & Kopp, J. P. (2016). Overview of estimation methods and preconditions for their application with structural equation modeling. In K. Schweizer & C. DiStefano (Eds.), Principles and methods of test construction: Standards and recent advances (pp. 135-165). Hogrefe Publishing. https://doi.org/10.1027/00449-000
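For illustration only, here is a minimal sketch of switching estimators in lavaan; the model syntax, the variable names, and mydata are placeholders, not your model.

library(lavaan)

# placeholder syntax: one latent factor and one structural regression
model <- '
  F1 =~ x1 + x2 + x3
  y1 ~ F1 + x4
'

# robust maximum likelihood: ML point estimates with Huber-White (sandwich) SEs
fit_mlr   <- sem(model, data = mydata, estimator = "MLR")

# for comparison, the DWLS-based estimator discussed above
fit_wlsmv <- sem(model, data = mydata, estimator = "WLSMV")

summary(fit_mlr, fit.measures = TRUE)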
I'm fitting a multiple linear regression model. I used the bptest function to test for heteroscedasticity, and the result was significant at the 0.05 level.
How can I resolve the issue of heteroscedasticity?
Try using a different type of linear regression (a short R sketch follows this list):
Ordinary Least Squares (OLS) for homoscedasticity.
Weighted Least Squares (WLS) for heteroscedasticity without correlated errors.
Generalized Least Squares (GLS) for heteroscedasticity with correlated errors.
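A quick sketch of the WLS and GLS options in R; y, x1, x2, and dat are placeholder names, and the variance structures are only examples, not a recommendation for your data.

library(nlme)

# OLS fit (assumes constant error variance)
fit_ols <- lm(y ~ x1 + x2, data = dat)

# WLS: weights inversely proportional to the assumed error variance
# (here illustrated with variance proportional to x1)
fit_wls <- lm(y ~ x1 + x2, data = dat, weights = 1 / x1)

# GLS: model the variance (and, if needed, correlation) structure explicitly
fit_gls <- gls(y ~ x1 + x2, data = dat,
               weights = varPower(form = ~ x1))   # variance as a power of x1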
Welcome to SO, Arun.
Personally, I don't think heteroskedasticity is something you "solve". Rather, it's something you need to allow for in your model.
You haven't given us any of your data, so let's assume that the variance of your residuals increases with the magnitude of your predictor. A simplistic approach to handling this is to transform the data so that the variance becomes constant; one way of doing that might be to log-transform your data. That might give you a more constant variance, but it also transforms your model, and your errors are no longer IID on the original scale.
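For example (with placeholder names), the transformation and a re-check of the Breusch-Pagan test might look like this:

library(lmtest)

fit_raw <- lm(y ~ x, data = dat)        # original fit with non-constant variance
fit_log <- lm(log(y) ~ x, data = dat)   # log-transformed response (requires y > 0)

bptest(fit_log)                          # re-check heteroskedasticity on the new scale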
Alternatively, you might have two groups of observations that you want to compare with a t-test, but the variance in one group is larger than in the other. That's a different sort of heteroskedasticity. There are variants of the standard "pooled variance" t-test that can handle that.
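The usual such variant is Welch's t-test, which does not pool the variances; in R (placeholder names again) it is simply:

# Welch's t-test: var.equal = FALSE is in fact the default in R
t.test(y ~ group, data = dat, var.equal = FALSE)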
I realise this isn't an answer to your question in the conventional sense. I would have made it a comment, but I knew before I started that I'd need more words than a comment would let me have.
I am trying to investigate the relationship between some Google Trends Data and Stock Prices.
I performed the augmented Dickey-Fuller (ADF) test and the KPSS test to make sure that both time series are integrated of the same order (I(1)).
However, after I took first differences, the ACF plot was completely insignificant (except for the value of 1 at lag 0, of course), which told me that the differenced series behave like white noise.
Nevertheless, I tried to estimate a VAR model, which you can see attached.
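In R, the steps described above would look roughly like this (the package choices are illustrative and not necessarily exactly what I ran):

library(tseries)   # adf.test, kpss.test
library(vars)      # VAR, serial.test

# 'GoogleTrends.ts' and 'Stocks.ts' stand for the two time series
adf.test(GoogleTrends.ts);  kpss.test(GoogleTrends.ts)
adf.test(Stocks.ts);        kpss.test(Stocks.ts)

# first differences to obtain I(0) series, then check the ACF
dGT <- diff(GoogleTrends.ts)
dST <- diff(Stocks.ts)
acf(dGT);  acf(dST)

# VAR on the differenced series, lag order chosen by information criterion
y <- cbind(GoogleTrends = dGT, Stocks = dST)
var_fit <- VAR(y, lag.max = 8, ic = "AIC")
summary(var_fit)
serial.test(var_fit)     # residual autocorrelation check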
As you can see, only one constant is significant. I have already read that, because Stocks.ts.l1 is not significant in the equation for GoogleTrends and GoogleTrends.ts.l1 is not significant in the equation for Stocks, there is no dynamic relationship between the two time series, and each could also be modelled independently of the other with an AR(p) model.
I checked the residuals of the model. They fulfil the assumptions (normality of the residuals is not entirely satisfied but acceptable, there is homoscedasticity, the model is stable, and there is no autocorrelation).
But what does it mean if no coefficient is significant, as in the Stocks.ts equation? Is the model just inappropriate for the data because the data don't follow an AR process? Or is the model just so bad that a constant would describe the data better than the model does? Or a combination of the above? Any suggestions on how I could proceed with my analysis?
Thanks in advance
I wish to confirm my understanding of the CV procedure in the glmnet package so that I can explain it to a reviewer of my paper. I will be grateful if someone can add information to clarify the answer further.
Specifically, I had a binary classification problem with 29 input variables and 106 rows. Instead of splitting into training/test data (and further decreasing the training data), I went with the lasso, choosing lambda through cross-validation as a means to minimise overfitting. After training the model with cv.glmnet, I tested its classification accuracy on the same dataset (bootstrapped x 10000 for error intervals). I acknowledge that overfitting cannot be eliminated in this setting, but the lasso, with its penalizing term chosen by cross-validation, should lessen its effect.
My explanation to the reviewer (who is a doctor like me) of how cv.glmnet does this is:
In each step of 10 fold cross-validation, data were divided randomly
into two groups containing 9/10th data for training and 1/10th for
internal validation (i.e., measuring binomial deviance/error of model
developed with that lambda). Lambda vs. deviance was plotted. When the
process was repeated 9 more times, 95% confidence intervals of lambda
vs. deviance were derived. The final lambda value to go into the model
was the one that gave the best compromise between high lambda and low
deviance. High lambda is the factor that minimises overfitting because
the regression model is not allowed to improve by assigning large
coefficients to the variables. The model is then trained on the entire
dataset using least squares approximation that minimises model error
penalized by lambda term. Because the lambda term is chosen through
cross-validation (and not from the entire dataset), the choice of
lambda is somewhat independent of the data.
I suspect my explanation can be improved a great deal, or that the experts reading this will point out flaws in the methodology.
Thanks in advance.
A bit late I guess, but here goes.
By default, coefficients and predictions from a cv.glmnet fit use lambda.1se. It is the largest λ at which the cross-validated error (here the binomial deviance) is within one standard error of the minimal error. Along the lines of overfitting, this usually reduces overfitting by selecting a simpler model (fewer non-zero terms) whose error is still close to that of the model with the least error. You can also check out this post. I am not quite sure whether this is what you mean by "The final lambda value to go into the model was the one that gave the best compromise between high lambda and low deviance."
The main issue with your approach is calculating its accuracy on the same training data. This does not tell you how well the model will perform on unseen data, and bootstrapping the same data does not correct for that. For an estimate of the error you should actually use the error from the cross-validation. If a model trained on 90% of the data does not work well on the held-out 10%, I don't see how training it on all of the data would change that.
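For what it's worth, here is a minimal sketch of that lambda choice and of reporting the cross-validated error rather than training accuracy; x and y stand for your predictor matrix and binary outcome.

library(glmnet)

set.seed(1)
cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "deviance",
                   nfolds = 10, alpha = 1)     # lasso with 10-fold CV

cvfit$lambda.min   # lambda with the smallest cross-validated deviance
cvfit$lambda.1se   # largest lambda within one SE of that minimum (default for coef/predict)

# cross-validated deviance at the chosen lambda: a more honest estimate
# of out-of-sample performance than accuracy on the training data
cvfit$cvm[cvfit$lambda == cvfit$lambda.1se]

# class predictions at lambda.1se
pred <- predict(cvfit, newx = x, s = "lambda.1se", type = "class")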
I am looking into time series data compression at the moment.
The idea is to fit a curve to a time series of n points so that the maximum deviation at any of the points is not greater than a given threshold. In other words, none of the values that the curve takes at the points where the time series is defined should be "further away" than a certain threshold from the actual values.
So far I have found out how to do nonlinear regression using the least-squares estimation method in R (the nls function) and in other languages, but I haven't found any package that implements nonlinear regression with the L-infinity norm.
I have found literature on the subject:
http://www.jstor.org/discover/10.2307/2006101?uid=3737864&uid=2&uid=4&sid=21100693651721
or
http://www.dtic.mil/dtic/tr/fulltext/u2/a080454.pdf
I could try to implement this in R, for instance, but I am first looking to see whether it has already been done, so that I could maybe reuse it.
I have found a solution that I don't believe to be "very scientific": I use nonlinear least-squares regression to find starting values for the parameters, which I subsequently use as starting points in R's optim function to minimise the maximum deviation of the curve from the actual points.
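To make that concrete, a sketch of this two-stage approach could look like the following; the exponential model, the starting values, and the names t and y (time points and observed values) are purely illustrative.

# illustrative model; replace with your own curve
f <- function(par, t) par[1] * exp(par[2] * t)

# stage 1: least-squares fit with nls to get reasonable starting values
ls_fit <- nls(y ~ a * exp(b * t), start = list(a = 1, b = 0.1))
start  <- coef(ls_fit)

# stage 2: minimise the maximum absolute deviation (L-infinity norm) with optim
maxdev <- function(par) max(abs(y - f(par, t)))
minimax_fit <- optim(start, maxdev, method = "Nelder-Mead")

minimax_fit$value   # achieved maximum deviation; compare against the threshold
minimax_fit$par     # parameters of the minimax (Chebyshev) fit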
Any help would be appreciated. The idea is to be able to find out if this type of curve-fitting is possible on a given time series sequence and to determine the parameters that allow it.
I hope there are other people out there who have already encountered this problem and who could help me ;)
Thank you.