I'm running a model without an intercept, using only categorical predictors, and I have included an interaction term between my main predictor and another covariate. The output still shows a reference group for the interaction terms, even though I suppressed the intercept. Is there an R equivalent of Stata's ibn. factor-variable operator?
The ibn. factor-variable operator in Stata ensures that all levels of the factor variable are included in the model (the factor variable "loses" its base level).
In R you can achieve this by resetting the contrasts for the factor variables. fabians describes how to do this here: All Levels of a Factor in a Model Matrix in R
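A minimal sketch of that approach, using toy data (the names `grp` and `trt` are made up for illustration): resetting a factor's contrasts with `contrasts(..., contrasts = FALSE)` makes `model.matrix()` keep every level, including inside interactions, where suppressing the intercept alone is not enough.

```r
# Toy data: a 3-level factor crossed with a 2-level factor.
d <- data.frame(grp = factor(rep(c("a", "b", "c"), 2)),
                trt = factor(rep(c("x", "y"), 3)))

# Default coding: even with the intercept suppressed, the interaction
# still drops a reference level of trt (6 columns).
ncol(model.matrix(~ 0 + grp + grp:trt, data = d))

# Resetting the contrasts keeps all levels of trt in the interaction
# (9 columns: 3 main-effect columns + 3 x 2 interaction columns).
ncol(model.matrix(~ 0 + grp + grp:trt, data = d,
                  contrasts.arg = list(trt = contrasts(d$trt, contrasts = FALSE))))
```

The same `contrasts` list can be passed to `lm()`/`glm()` via their `contrasts` argument.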
I am trying to use interact_plot() in R to visualize my interaction.
I have three variables - a continuous outcome variable, a binary predictor, and a continuous moderator.
I used this code:
interact_plot(fit2, pred=rise_indicator, modx=Years_Education_recode)
I got the following error message:
Focal predictor ("pred") cannot be a factor. Either use it as modx, convert it to a numeric dummy variable, or use the cat_plot function for factor by factor interaction plots.
I changed my variable to numeric using the following code:
impact$rise_indicator<-as.numeric(as.character(impact$rise_indicator))
is.factor(impact$rise_indicator)
[1] FALSE
When I re-run my model, I still get the same error message. I'm not sure why that is.
Focal predictor ("pred") cannot be a factor. Either use it as modx, convert it to a numeric dummy variable, or use the cat_plot function for factor by factor interaction plots.
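One thing worth checking (an assumption, since the model-fitting code isn't shown): `interact_plot()` inspects the fitted model object, so `fit2` would need to be re-fit after the conversion; changing the column in the data frame alone does not update a model that was already fit on the factor version. The conversion itself can be sketched with a toy stand-in for the `impact` data frame:

```r
# Toy stand-in for the question's data: a 0/1 factor.
impact <- data.frame(rise_indicator = factor(c(0, 1, 1, 0)))

# as.numeric() alone would return the internal level codes (1, 2, ...);
# converting to character first recovers the original 0/1 values.
impact$rise_indicator <- as.numeric(as.character(impact$rise_indicator))
is.factor(impact$rise_indicator)  # FALSE
range(impact$rise_indicator)      # 0 1, not 1 2
```

After this, the model should be re-fit on the converted data before calling `interact_plot()` again.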
I was following a tutorial on model building using logistic regression.
In the tutorial, columns having a numeric data type and with 3 levels were converted into factors using the as.factor function. I wanted to know the reason for this conversion.
If vectors of class "numeric" with a small number of unique values are left in that form, logistic regression, i.e. glm(form, family="binomial", ...), will return a single coefficient. Generally, that is not what the data will support, so the authors of that tutorial are advising that these vectors be converted to factors, so that glm's default handling of categorical values will occur. It's possible that those authors already know for a fact that the underlying data-gathering process encoded categorical data with numeric levels and that the data-input process was not "told" to treat them as categorical. That could have been handled at read time using the colClasses parameter of whichever read.* function was used.
The default handling of factors by most R regression routines uses the first level as part of the baseline (Intercept) estimate and estimates a coefficient for each of the other levels. If you had left that vector as numeric, you would have gotten an estimate that could be interpreted as the slope of an effect of an ordinal variable. The statistical test associated with such an encoding of an ordinal relationship is often called a "linear test of trend" and is sometimes a useful result when the data situation in the "real world" can be interpreted as an ordinal relationship.
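The difference is easy to see on simulated data (all names below are made up for illustration): the numeric coding yields a single slope, while the factor coding yields one coefficient per non-reference level.

```r
# A 3-valued predictor, modeled as numeric vs. as a factor.
set.seed(42)
x <- sample(c(1, 5, 10), 200, replace = TRUE)
y <- rbinom(200, 1, plogis(ifelse(x == 1, -1, ifelse(x == 5, 0.5, 0.2))))

fit_num <- glm(y ~ x,         family = binomial)  # single slope for x
fit_fac <- glm(y ~ factor(x), family = binomial)  # one coef per non-reference level

length(coef(fit_num))  # 2: intercept + slope
length(coef(fit_fac))  # 3: intercept + factor(x)5 + factor(x)10
```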
Alternate title: Model matrix and set of coefficients show different numbers of variables
I am using the mice package for R to do some analyses. I wanted to compare two models (held in mira objects) using pool.compare(), but I keep getting the following error:
Error in model.matrix(formula, data) %*% coefs : non-conformable arguments
The binary operator %*% indicates matrix multiplication in R.
The expression model.matrix(formula, data) produces "The design matrix for a regression-like model with the specified formula and data" (from the R Documentation for model.matrix {stats}).
In the error message, coefs is drawn from est1$qbar, where est1 is a mipo object, and the qbar element is "The average of complete data estimates. The multiple imputation estimate." (from the documentation for mipo-class {mice}).
In my case
est1$qbar is a numeric vector of length 36
data is a data.frame with 918 observations of 82 variables
formula is class 'formula' containing the formula for my model
model.matrix(formula, data) is a matrix with dimension 918 x 48.
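A toy illustration of why `%*%` fails here (the dimensions mirror the question; the matrices themselves are filler): a 918 x 48 matrix cannot be multiplied by a length-36 vector.

```r
# Design matrix with 48 columns, coefficient vector of length 36.
X <- matrix(1, nrow = 918, ncol = 48)
coefs <- rep(0.5, 36)

# Matrix multiplication requires ncol(X) == length(coefs).
res <- try(X %*% coefs, silent = TRUE)
inherits(res, "try-error")  # TRUE: "non-conformable arguments"
```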
How can I resolve/prevent this error?
As occasionally happens, I found the answer to my own question while writing the question.
The clue was that the estimates for categorical variables in est1.qbar only exist if that level of that variable was present in the data. Some of my variables are factor variables where not every level is represented. This caused the warning "contrasts dropped from factor variable name due to missing levels", which I foolishly ignored.
On the other hand, looking at dimnames(model.matrix.temp)[[2]] shows that the model matrix has one column for each level of each factor variable, regardless of whether that level of that variable was present in the data. So, although the contrasts for missing factor levels are dropped in terms of estimating the coefficients, those factor levels still appear in the model matrix. This means that the model matrix has more columns than the length of est1.qbar (the vector of estimated coefficients), so matrix multiplication is not going to work.
The answer here is to fix the factor variables so that there are no unused levels. This can be done with the factor() function (as explained here). Unfortunately, this needs to be done on the original dataset, prior to imputation.
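The fix can be sketched on a toy data frame (the name `grp` is illustrative): re-applying `factor()` rebuilds the level set from the observed values.

```r
# A factor declared with three levels, but level "c" never occurs.
d <- data.frame(grp = factor(c("a", "b", "a"), levels = c("a", "b", "c")))

nlevels(d$grp)          # 3
d$grp <- factor(d$grp)  # drops the unused level
# equivalently: d$grp <- droplevels(d$grp)
nlevels(d$grp)          # 2
```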
I have run a logit model through glmnet. I am extracting the coefficients at the minimum lambda, and it gives me the results I expect. However, I have a factor variable with nine unique values, and glmnet produces a single coefficient for it, which is expected for a binary variable but not for a factor...
library(glmnet)
coef(model.obj, s = 'lambda.min')
#output:
TraumaticInj 2.912419e-02
Toxin .
OthInj 4.065266e-03
CurrentSTDYN 7.601812e-01
GeoDiv 1.372628e-02 #this is a factor variable w/ 9 options...
so my questions:
1) how should I interpret a single coefficient from a factor variable in glmnet?
2) is there a method to extract the coefficients for the different factors of the variable?
glmnet doesn't handle factor variables. You have to convert them to dummies using e.g. model.matrix(). So the result you are seeing is glmnet treating your factor variable as a single numeric variable.
It can't be done, because glmnet doesn't handle factor variables. This is pretty much answered here: How does glmnet's standardize argument handle dummy variables?
This comment by @R_User in the answer is particularly insightful:
@DTRM - In general, one does not standardize categorical variables to
retain the interpretability of the estimated regressors. However, as
pointed out by Tibshirani here:
statweb.stanford.edu/~tibs/lasso/fulltext.pdf, "The lasso method
requires initial standardization of the regressors, so that the
penalization scheme is fair to all regressors. For categorical
regressors, one codes the regressor with dummy variables and then
standardizes the dummy variables" - so while this causes arbitrary
scaling between continuous and categorical variables, it's done for
equal penalization treatment. – R_User Dec 6 '13 at 1:20
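The dummy expansion can be sketched with `model.matrix()` on toy data (`GeoDiv` is the 9-level factor from the question; the other names are filler):

```r
# Toy data: a 9-level factor plus one continuous predictor.
d <- data.frame(GeoDiv = factor(rep(letters[1:9], length.out = 54)),
                TraumaticInj = rnorm(54))

# Remove the intercept column (glmnet fits its own intercept); GeoDiv
# is coded as eight treatment dummies, with one level as the baseline.
X <- model.matrix(~ GeoDiv + TraumaticInj, data = d)[, -1]
ncol(X)  # 9: eight dummies for GeoDiv plus TraumaticInj
# fit <- glmnet::cv.glmnet(X, y, family = "binomial")  # requires glmnet
# coef(fit, s = "lambda.min")  # one row per dummy column
```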
I have a variable called experience which was coded as numeric and contains 3 values (1,5,10). I changed the class to factor using df$experience<-factor(df$experience) and it changed to factor.
Next I run a GLM model as
reg<-glm(cbind(win,loss)~experience, data=df, family=binomial)
but when I get summary(reg), only one level of the experience variable shows in the table (i.e. experience10).
Shouldn't there also be another categorical variable, experience5?
After using sub-samples of my data, the other levels were showing up. I guess it was just a matter of multicollinearity, where the other level's coefficient could not be estimated somehow!
Thanks!
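For reference, a sketch on simulated toy data of what the summary should look like when all three levels are observed and estimable: one coefficient per non-reference level.

```r
# Simulated wins/losses for three experience levels (1, 5, 10).
set.seed(7)
df <- data.frame(experience = factor(rep(c(1, 5, 10), each = 20)),
                 win = rbinom(60, 10, 0.5))
df$loss <- 10 - df$win

reg <- glm(cbind(win, loss) ~ experience, data = df, family = binomial)
names(coef(reg))  # "(Intercept)" "experience5" "experience10"
```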