Note that the previous question flagged as a possible duplicate is not a duplicate because the previous question concerns backwards elimination and this question concerns forward entry.
I am currently performing a simulation where I want to show how stepwise regression is a biased estimator. In particular, previous researchers seem to have used one of the stepwise procedures in SPSS (or something identical to it). This involves using the p-value of the F statistic for the R-squared change to determine whether an additional variable should be added to the model. Thus, for my simulation results to have the most impact, I need to replicate the SPSS stepwise regression procedure in R.
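For reference, the statistic involved is the F for the change in R-squared when q predictors are added, with n cases and k predictors in the larger model:
$$F_{\text{change}} = \frac{(R^2_{\text{new}} - R^2_{\text{old}})/q}{(1 - R^2_{\text{new}})/(n - k - 1)},$$
and its p-value on (q, n - k - 1) degrees of freedom is compared against the alpha-to-enter criterion.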
While R has a number of stepwise procedures (e.g., based on AIC), the ones that I have found are not the same as the SPSS procedure.
I have found this function by Paul Rubin. It seems to work, but the input and output of the function are a little strange. I've started tweaking it so that it (a) takes a formula as input and (b) returns the best-fitting model. The logic of the function is what I'm after.
I have also found this question on backwards stepwise regression. Note that backward elimination is different from forward entry because backward elimination removes non-significant terms whereas forward entry adds significant terms.
Nonetheless, it would be great if there was another function in an existing R package that could do what I want.
Is there an R function designed to perform forward entry stepwise regression using p-values of the F change?
Ideally, it could take a DV, a set of IVs (either as named variables or as a formula), and a data.frame, and would return the model that the stepwise regression selects as "best". For my purposes, the inclusion of interaction terms is not an issue.
The function two.ways.stepfor in the Bioconductor package maSigPro implements a form of forward entry stepwise regression based on p-values.
However, while the alpha-to-enter and alpha-to-remove can be specified, they must be the same; in SPSS the two can be different.
The package can be installed with:
source("http://bioconductor.org/biocLite.R")
biocLite("maSigPro")
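Alternatively, if only the forward-entry logic is needed, it can be sketched in base R with add1(), which reports the F test (and its p-value) for adding each candidate term. This is just a rough sketch: the wrapper name forward.p, its arguments, and the 0.05 entry threshold are my own choices rather than anything taken from SPSS or from a package.
forward.p <- function(dv, ivs, data, alpha.enter = 0.05) {
  fit <- lm(as.formula(paste(dv, "~ 1")), data = data)   # start from the intercept-only model
  remaining <- ivs
  while (length(remaining) > 0) {
    scope <- as.formula(paste("~", paste(remaining, collapse = " + ")))
    tab <- add1(fit, scope = scope, test = "F")[-1, ]    # F-change tests; drop the <none> row
    best <- which.min(tab[["Pr(>F)"]])
    if (tab[["Pr(>F)"]][best] >= alpha.enter) break      # no candidate passes alpha-to-enter
    term <- rownames(tab)[best]
    fit <- update(fit, as.formula(paste(". ~ . +", term)))
    remaining <- setdiff(remaining, term)
  }
  fit
}
# e.g. forward.p("y", c("x1", "x2", "x3"), data = mydata)  # "y", "x1", ..., and mydata are hypothetical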
Unfortunately, I had convergence (and singularity) issues when fitting my GLMM models in R. When I tried the same analysis in SPSS, I got no such warning messages and the results are only slightly different. Does it mean I can interpret the results from SPSS without worries? Or do I have to test for singularity/convergence issues to be sure?
You have two questions. I will answer both.
First Question
Does it mean I can interpret the results from SPSS without worries?
You do not want to do this. The reason is that mixed models have a very specific parameterization; the original article about lme4 by its authors shows the common lme4 syntax and the model each formula implies.
With that syntax come assumptions about what your model is saying. If, for example, you are running a model with random intercepts only, you are assuming that the slopes do not vary at all. If you include correlated random slopes and random intercepts, you are assuming that there is a relationship between the slopes and the intercepts, which may be either positive or negative. If you present results as-is without knowing why the model produced that summary, you may fail to explain your data accurately.
The reason, as highlighted by one of the comments, is that SPSS runs off defaults whereas R requires you to specify the model explicitly. I'm not surprised that the model failed to converge in R but not in SPSS, given that SPSS assumes no correlation between random slopes and intercepts. An uncorrelated model is more likely to converge than a correlated one, because estimating the extra correlation parameters makes the fit considerably harder. However, without knowing how you modeled your data, it is impossible to know what the differences actually are. Perhaps if you edit your question with those details it can be answered more directly, but just know that SPSS and R do not fit these models the same way by default.
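To make the distinction concrete, here are illustrative lme4 formulas; the data frame dat and the variables y, x, and subject are hypothetical.
library(lme4)
m_corr   <- lmer(y ~ x + (1 + x | subject), data = dat)                  # correlated random intercepts and slopes
m_uncorr <- lmer(y ~ x + (1 | subject) + (0 + x | subject), data = dat)  # uncorrelated, closer to the SPSS default described above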
Second Question
Or do I have to test for singularity/convergence issues to be sure?
SPSS and R both have singularity checks as a default (check this page as an example). If your model fails to converge, you should drop it and use an alternative model (usually something that has a simpler random effects structure or improved optimization).
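If you want to check explicitly rather than rely on the defaults, lme4 stores diagnostics with the fitted model; a minimal sketch (continuing with lme4 loaded as above, where fit is a hypothetical lmer/glmer object):
isSingular(fit, tol = 1e-4)      # TRUE if the estimated random-effects structure is singular
fit@optinfo$conv$lme4$messages   # convergence-check messages stored with the fit
allFit(fit)                      # refit with several optimizers to see whether estimates agree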
I asked this question on RCommunity but haven't had anyone bite... so I'm here!
My current project involves predicting whether some trees will survive under future climate change scenarios. Against better judgement (like using Maxent), I've decided to pursue this with a GLM, which requires presence and absence data. Every time I generate my absence data (as I was only given presence data) using randomPoints from dismo, the resulting GLM has different significant variables. I found a package called My.stepwise that has a My.stepwise.glm function (here: My.stepwise.glm: Stepwise Variable Selection Procedure for Generalized Linear... in My.stepwise: Stepwise Variable Selection Procedures for Regression Analysis), and this goes through a forward/backward selection process to find the best variables and returns a model ready for you.
My problem is that I don't want to run My.stepwise.glm just once and use the model it spits out. I'd like to run it roughly 100 times with different pseudo-absence data, see which variables it returns each time, then take the most frequent variables and move forward with building my model using those. The issue is that the My.stepwise.glm function ends with print(summary(initial.model)) rather than returning anything, and I would like to be able to access the output the way step() does, where the fitted model is returned and you can then say fit$coefficients and get the coefficients as numerics. Can anyone help me with this?
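One workaround, sketched below under my own assumptions: use step() in place of My.stepwise.glm, because step() returns the fitted glm, and tally which predictors survive across repeated pseudo-absence draws. Note that step() selects on AIC rather than the significance thresholds My.stepwise uses; build_pa_data() is a placeholder for your randomPoints()-based pseudo-absence step, and presence is an assumed column name.
selected <- vector("list", 100)
for (i in seq_len(100)) {
  dat <- build_pa_data(seed = i)                                  # placeholder: presences plus fresh pseudo-absences
  full_form <- reformulate(setdiff(names(dat), "presence"), response = "presence")
  null_fit <- glm(presence ~ 1, data = dat, family = binomial)
  fit <- step(null_fit, scope = full_form, direction = "both", trace = 0)
  selected[[i]] <- setdiff(names(coef(fit)), "(Intercept)")
}
sort(table(unlist(selected)), decreasing = TRUE)                  # how often each predictor is retained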
The only package I know of that does unconditional quantile regression in R is uqr. Unfortunately, it has been removed from CRAN. Even though I can still use it, its functionality is limited (e.g., it does not conduct significance tests or allow comparing effects across quantiles). I'm wondering if anyone knows how to conduct UQR in R, with either functions they wrote or some other means.
There are many limitations in terms of tests and asymptotic theory regarding unconditional quantile regressions, especially if you are thinking of the version proposed in Firpo, Fortin, and Lemieux (2009), "Unconditional Quantile Regressions".
The application, however, is straightforward. You need only two elements:
1. the unconditional quantile (estimated with any of your favorite packages);
2. the density of the outcome at the quantile you obtained in (1).
After that, you apply the RIF function:
$$RIF(y; q(t)) = q(t) + \frac{t - \mathbf{1}\{y \le q(t)\}}{f\big(q(t)\big)}$$
Once you have this, you just use the RIF in place of your dependent variable when you call lm(). And that is it.
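A minimal sketch of that recipe in R; tau and the data frame dat with columns y, x1, and x2 are illustrative names, not from any package.
tau  <- 0.5
dens <- density(dat$y)
q    <- quantile(dat$y, probs = tau)            # the unconditional quantile
fq   <- approx(dens$x, dens$y, xout = q)$y      # density of y evaluated at that quantile
dat$rif <- q + (tau - (dat$y <= q)) / fq        # RIF for the tau-th unconditional quantile
summary(lm(rif ~ x1 + x2, data = dat))          # RIF (unconditional quantile) regression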
HTH
I want to learn how to do nonlinear regression in R. I have managed to learn the basics of the nls function, but, as we know, it is crucial in nonlinear regression to use good initial parameters. I tried to figure out how the selfStart and getInitial functions work but failed. The documentation is very scarce and not very useful. I wanted to learn these functions via simple simulated data. I simulated data from a logistic model:
n <- 100     # number of observations
d <- 10000   # our parameters: d, b, e
b <- -2
e <- 50
set.seed(n)
X <- rnorm(n, -e/b, 2)                        # many observations near the point where the logistic function grows the fastest
Y <- d/(1 + exp(b*X + e)) + rnorm(n, 0, 200)  # simulated data
Now I want to do the regression with the function f(x) = d/(1 + exp(b*x + e)), but I don't know how to use selfStart or getInitial. Could you help me? But please, don't tell me about SSlogis. I'm aware it's a function designed to find initial parameters in logistic regression, but it seems to work only in regressions with one explanatory variable, and I'd like to learn how to do logistic regression with more than one explanatory variable and even how to do general nonlinear regression with a function that I defined myself.
I will be very grateful for your help.
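For what it is worth, here is a minimal sketch of a hand-rolled selfStart for this model, fitted to the simulated X and Y above. The initializer just guesses d from max(y) and gets b and e from a linear fit of log(d/y - 1) on x; the names fmod, fmod_init, and SSmy are my own, and this only illustrates the mechanics rather than a recommended initializer.
fmod <- function(x, d, b, e) d / (1 + exp(b * x + e))
fmod_init <- function(mCall, data, LHS, ...) {
  xy <- sortedXyData(mCall[["x"]], LHS, data)
  d0 <- 1.05 * max(xy$y)                         # rough upper asymptote
  ok <- xy$y > 0                                 # keep points where the linearization is defined
  lin <- lm(log(d0/xy$y[ok] - 1) ~ xy$x[ok])     # log(d/y - 1) = e + b*x
  value <- c(d0, coef(lin)[[2]], coef(lin)[[1]])
  names(value) <- mCall[c("d", "b", "e")]
  value
}
SSmy <- selfStart(fmod, initial = fmod_init, parameters = c("d", "b", "e"))
dat <- data.frame(X = X, Y = Y)
getInitial(Y ~ SSmy(X, d, b, e), data = dat)     # inspect the automatic start values
fit <- nls(Y ~ SSmy(X, d, b, e), data = dat)     # nls() calls getInitial() itself
coef(fit)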
I don't know why the calculation of good initial parameters fails in R. The aim of my answer is to provide a method to find good enough initial parameters.
Note that a non-iterative method exists which doesn't require initial parameters. The principle is explained in this paper, pp. 37-46: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
A simplified version is shown below.
If the results are not sufficiently accurate, they can be used as initial parameters in the usual iterative non-linear regression software, such as nls in R.
A numerical example is shown below. Usually the number of points is much higher; here it is deliberately low in order to make it easier to check the results when one edits and runs the code.
I've found the package Impute.jl but it's only able to use these simple methods:
drop: remove missing.
locf: last observation carried forward
nocb: next observation carried backward
interp: linear interpolation of values in vector
fill: replace with a specific value or a function...
There doesn't seem to be any advanced "multiple imputation" method.
How can I use more advanced methods when I have several variables?
For example: fully conditional specification (mice), Bayesian methods, random forests, multilevel imputation, nested imputation, censored data, categorical data, survival data...
I don't mean writing my own code, but finding a Julia package able to do it automatically. Other software has it (R, Stata, SAS…).