Is there a way to loop an R classification model while swapping out one variable?

I'm basically trying to do what is described in this question, R Loop for Variable Names to run linear regression model (i.e. swapping out one independent variable while all the others stay constant),
except that I'd like to run a classification model rather than a linear regression. Any classification model will do; the simpler the better. Can anyone provide a basic outline of what this kind of code would look like?
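A minimal sketch of one way to do this, using logistic regression via glm(); the data frame dat, the response y, the fixed predictors x1 and x2, and the candidate variable names are all hypothetical:

candidates <- c("var_a", "var_b", "var_c")   # variables to swap in, one at a time

models <- lapply(candidates, function(v) {
  # build the formula y ~ x1 + x2 + v, keeping the other predictors fixed
  fml <- reformulate(c("x1", "x2", v), response = "y")
  glm(fml, data = dat, family = binomial)
})
names(models) <- candidates

sapply(models, AIC)   # e.g. compare the fits by AIC

Any other classifier with a formula interface (e.g. rpart::rpart) can be dropped in the same way in place of glm().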

Related

R glm anova: putting in each individual term first sequentially, or using chi-squared to see the effect of each individual term

I need a hand analysing a model I've built. The model analyses the effects of a number of variables on the chances that someone quits smoking. The output of the model is as follows:
I want to run anova on the model using a chi-squared test. The current way I am doing this is as follows, where each term is added sequentially:
As well as the effect of dependence, I also want to see the effect of each of the other variables in the same way, so that they can be compared. This means that I need to do one of the following:
Run anova, adding each term in first sequentially. At the moment, the only way I can think of to do this is to write a new model for each variable, adding that variable first, e.g.
Run anova, but compare the model with each term to the model without it. I'm not sure how to do this, though (one way is shown in the sketch below).
Any help or advice on how to achieve any of these would be great! Please ask for any more details!
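For the second option, base R's drop1() compares the full model against the model with each single term removed, using a chi-squared likelihood-ratio test, so the result does not depend on the order of the terms. A minimal sketch, assuming a fitted logistic model like the one described (the variable and data names here are hypothetical):

fit <- glm(quit ~ dependence + age + sex, data = smokers, family = binomial)

# likelihood-ratio (chi-squared) test of each term against the model without it;
# unlike sequential anova(fit, test = "Chisq"), this is order-independent
drop1(fit, test = "Chisq")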

How to do Two-Part Models in R

I'm currently working with a dataset that has lots of zeros in the predictor variables as well as in the response variable. The response variable is continuous and very skewed to the right.
I'm trying to apply a discrete-continuous model where, in the first stage, I fit a binomial logit model for the zeros, and in the second stage I fit a regression model for the nonzero observations.
Stata lets you do this type of analysis very easily, but I am using RStudio and have not found any clear packages that implement such an approach. I'd greatly appreciate it if someone could point me to which package I should be using; an example would be greatly appreciated too.
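For what it's worth, a two-part (hurdle-style) model can be fitted in base R without a dedicated package. A minimal sketch, assuming a data frame dat with a non-negative response y and predictors x1 and x2 (all names hypothetical); the Gamma GLM with log link for the positive part is one common choice for right-skewed data:

# Part 1: logistic model for whether the response is nonzero
dat$nonzero <- as.integer(dat$y > 0)
part1 <- glm(nonzero ~ x1 + x2, data = dat, family = binomial)

# Part 2: model for the response among the nonzero observations only
part2 <- glm(y ~ x1 + x2, data = subset(dat, y > 0),
             family = Gamma(link = "log"))

# Combined prediction: E[y] = P(y > 0) * E[y | y > 0]
p_nonzero <- predict(part1, newdata = dat, type = "response")
mu_pos    <- predict(part2, newdata = dat, type = "response")
y_hat     <- p_nonzero * mu_pos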

Initial parameters in nonlinear regression in R

I want to learn how to do nonlinear regression in R. I managed to learn the basics of the nls function, but, as we know, good initial parameters are crucial in nonlinear regression. I tried to figure out how the selfStart and getInitial functions work but failed. The documentation is scarce and not very useful. I wanted to learn these functions via simple simulated data. I simulated data from a logistic model:
n <- 100      # number of observations
d <- 10000    # our parameters
b <- -2
e <- 50
set.seed(n)
X <- rnorm(n, -e/b, 2)   # gives many observations near the point where the logistic function grows fastest
Y <- d/(1 + exp(b*X + e)) + rnorm(n, 0, 200)   # simulated data
Now I want to fit the function f(x) = d/(1+exp(b*x+e)), but I don't know how to use selfStart or getInitial. Could you help me? But please, don't just tell me about SSlogis. I'm aware it's a function designed to find initial parameters in logistic regression, but it seems to work only in regression with one explanatory variable, and I'd like to learn how to do logistic regression with more than one explanatory variable, and even general nonlinear regression with a function that I define myself.
I will be very grateful for your help.
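For concreteness, a hand-rolled selfStart model for this function could look like the sketch below. The initializer is an illustrative heuristic of my own, not from any package: it exploits the fact that log(d/y - 1) = b*x + e under the model, so a rough d gives b and e from a straight-line fit.

logisticInit <- function(mCall, data, LHS, ...) {
  xy <- sortedXyData(mCall[["x"]], LHS, data)
  d0 <- 1.05 * max(xy$y)   # rough guess for the upper asymptote d
  # log(d0/y - 1) is approximately linear in x when y follows the model
  fit <- lm(log(d0/y - 1) ~ x, data = xy, subset = y > 0)
  setNames(c(d0, coef(fit)[[2]], coef(fit)[[1]]), mCall[c("d", "b", "e")])
}

SSmylogis <- selfStart(~ d/(1 + exp(b*x + e)),
                       initial = logisticInit,
                       parameters = c("d", "b", "e"))

getInitial(Y ~ SSmylogis(X, d, b, e), data = data.frame(X, Y))   # inspect the starting values
fit <- nls(Y ~ SSmylogis(X, d, b, e))
summary(fit)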
I don't know why the calculation of good initial parameters fails in R. The aim of my answer is to provide a method to find good enough initial parameters.
Note that a non-iterative method exists which doesn't require initial parameters. The principle is explained in this paper, pp. 37-46: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
A simplified version is shown below.
If the results are not sufficient, they can be used as initial parameters in a usual non-linear regression package such as those in R.
A numerical example is shown below. Usually the number of points is much higher; here it is deliberately low in order to make it easier to check the results when one edits and runs the code.

How to do cross-validation on a two-part regression model?

I am currently working on regression analysis to model EQ-5D-5L health data. This data is inflated at the upper bound, i.e. at 1, and one of the approaches I use is two-part models. In that approach, I combine a logistic model for the binary outcome (1 or not 1) with a regression model for the continuous data.
The issue comes when trying to cross-validate (K-fold) the two-part models: I cannot find a way to include both "parts" of the model in the caret package in R, and I have not been able to find anybody who has solved this problem.
When I generate predictions from the two-part model, the predictions from the two separate models are essentially multiplied together. So the models are developed separately, as they model different things from the same variable (a binary and a continuous outcome), but are joined together when used to predict values.
Could I somehow cross-validate each part of the model separately and get some kind of useful answer out of it?
Hope you guys can help.
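Since the two parts are only joined at prediction time, one workable route is to skip caret and write the K-fold loop by hand: refit both parts on each training fold and score the combined prediction on the held-out fold. A minimal sketch; the data frame dat, the response eq5d, and the predictors x1 and x2 are hypothetical, and the combination rule shown, E[y] = P(y = 1) + (1 - P(y = 1)) * E[y | y < 1], is one common choice for ceiling-inflated data:

set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(dat)))
rmse <- numeric(k)

for (i in 1:k) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]

  # Part 1: probability of being at the ceiling (eq5d == 1)
  m1 <- glm(I(eq5d == 1) ~ x1 + x2, data = train, family = binomial)
  # Part 2: regression for the non-ceiling observations
  m2 <- lm(eq5d ~ x1 + x2, data = subset(train, eq5d < 1))

  # combine the two parts on the held-out fold
  p1   <- predict(m1, newdata = test, type = "response")
  mu   <- predict(m2, newdata = test)
  pred <- p1 + (1 - p1) * mu

  rmse[i] <- sqrt(mean((test$eq5d - pred)^2))
}
mean(rmse)   # cross-validated RMSE for the combined model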

R: Which variables to include in model?

I'm fairly new to R and am currently trying to find the best model to predict my dependent variable from a number of predictor variables. I have 20 predictor variables and I want to see which ones I should include in my model and which ones I should exclude.
I am currently just running models with different predictor variables in each and comparing them to see which one has the lowest AIC, but this is taking a really long time. Is there an easier way to do this?
Thank you in advance.
This is more of a theoretical question actually...
In principle, if all of the predictors are actually exogenous to the model, they can all be included together; assuming you have enough data (N >> 20) and the predictors are not too similar (which could give rise to multicollinearity), that should help prediction. In practice, you need to think about whether each of (or any of) your predictors is actually exogenous to the model (that is, independent of the error term in the model). If they are not, they will bias the estimates. (Omitting explanatory variables that are actually necessary also imparts a bias.)
If predictive accuracy (even spurious in-sample accuracy) is the goal, then techniques like LASSO (as mentioned in the comments) could also help.
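A minimal sketch of the LASSO route with the glmnet package, assuming a data frame dat whose column y is the response and whose remaining 20 columns are the predictors (names hypothetical):

library(glmnet)

x <- as.matrix(dat[, setdiff(names(dat), "y")])   # predictor matrix
y <- dat$y

# cross-validate the penalty strength lambda
cvfit <- cv.glmnet(x, y)

# coefficients at the lambda with the lowest CV error;
# predictors shrunk exactly to zero are effectively excluded
coef(cvfit, s = "lambda.min")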
