Unbinding a list of factors in R

I've got a simple question I hope.
I'm trying to assign predicted factor levels from an ordinal logistic regression to a variable in order to calculate the hit-rate of the model. I used the following code:
telecomdata$predfpm1 <- predict(fpm1, type = "class")
fpm1 is an ordered logistic regression, fitted as shown below:
fpm1 <- clm(proposition ~ DMpropHigh + age + rel_length + education + gender + income + num_phones + arpu_index + calls_out_6_index + calls_in_6_index + calls_6_index + DM , data = telecomdata)
The numbers look okay, but the format is wrong. R creates a list of factors covering all observations, rather than assigning each individual predicted value to its observation. Can anyone help me with how to fix this?
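If fpm1 comes from ordinal::clm, predict() returns a named list rather than a plain vector, and the predicted classes sit in its fit element. A minimal sketch of the fix (assuming the ordinal package and that proposition is the observed outcome):
library(ordinal)
# predict.clm returns a list; the predicted classes are in the $fit element
pred <- predict(fpm1, type = "class")
telecomdata$predfpm1 <- pred$fit
# hit rate: share of observations whose predicted class matches the observed one
mean(telecomdata$predfpm1 == telecomdata$proposition, na.rm = TRUE)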

Related

My panel linear regression with log variables returns an error about non-finite values, but there are no logs of zero or negative values

I'm trying to run a fixed-effects regression on my panel data (using the plm package). The regression on levels worked well, as did the first regressions using log variables (I'm taking logs of only the dependent variable and some independent variables, those in monetary terms). However, my regressions with logs stopped working.
library(AER)
library(plm)
# declare the panel structure: individual ('Estado') and time ('Ano') indexes
dd <- pdata.frame(painel, index = c('Estado', 'Ano'))
# Model 1 - within model with individual fixed effects
mod_1_within <- plm(PIB ~ txinad + op + desoc + Divliq + Esc_15 + RT + DC + DK + Gini + I(DK*Gini) + I(DC*Gini), data = dd, effect = 'individual')
summary(mod_1_within)
#this worked well
#Model 2 - Model 1 with the monetary variables in log (the others are % or indexes):
mod_1_within_log<- plm(log(PIB) ~ txinad + log(RT) + op + desoc + Divliq + Esc_15 + log(DC) + log(DK) + Gini + I(Gini*log(DC)) + I(Gini*log(DK)), data = dd, effect = 'individual')
summary(mod_1_within_log)
#This returns:
> mod_1_within_log<- plm(log(PIB) ~ txinad + log(RT) + op + desoc + Divliq + Esc_15 + log(DC) + log(DK) + Gini + I(Gini*log(DC)) + I(Gini*log(DK)), data = dd, effect = 'individual')
Error in model.matrix.pdata.frame(data, rhs = 1, model = model, effect = effect, :
model matrix or response contains non-finite values (NA/NaN/Inf/-Inf)
> summary (mod_1_within_log)
Error in summary(mod_1_within_log) : object 'mod_1_within_log' not found
This is occurring even though no logged variable has negative or zero values. I will take this opportunity to ask another question: if a variable has a zero value, is there a way to set that value to NA and then take the log of that variable?
Thanks in advance!
I assume the reason you're getting that error is that you have Inf or -Inf values in your logged predictors or logged outcome.
To check whether that is the case, look at the untransformed variables (before the log) and see whether any observation has a value of zero. If so, that is the problem: R returns -Inf from log(0), so when you run the FE model, plm throws that error because it can't deal with NaN or Inf values.
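A quick way to check, which also answers the follow-up about zeros (a sketch using the monetary variables from the model; adjust the names to your data):
# count non-positive values in the untransformed monetary variables
sapply(dd[, c("PIB", "RT", "DC", "DK")], function(x) sum(x <= 0, na.rm = TRUE))
# optionally turn zeros into NA so log() yields NA instead of -Inf;
# plm then drops those rows instead of stopping with an error
dd$DC[dd$DC == 0] <- NA
dd$DK[dd$DK == 0] <- NA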

Stepwise regression in R with mixed models: number of rows changing [duplicate]

I want to run a stepwise regression in R to choose the best fit model, my code is attached here:
full.modelfixed <- glm(died_ed ~ age_1 + gender + race + insurance + injury + ais + blunt_pen +
                         comorbid + iss + min_dist + pop_dens_new + age_mdn + male_pct +
                         pop_wht_pct + pop_blk_pct + unemp_pct + pov_100x_npct +
                         urban_pct,
                       data = trauma, family = binomial(link = 'logit'), na.action = na.exclude)
library(MASS) # stepAIC() comes from the MASS package
reduced.modelfixed <- stepAIC(full.modelfixed, direction = "backward")
It fails with the error message:
Error in stepAIC(full.modelfixed, direction = "backward") :
number of rows in use has changed: remove missing values?
Almost every variable in the data has some missing values, so I cannot simply drop all rows with missing values (data = na.omit(data)).
Any idea on how to fix this?
Thanks!!
This should probably be in a stats forum (stats.stackexchange) but briefly there are a number of considerations.
The main one is that when comparing two models they need to be fitted on the same dataset (i.e. you need to be able to nest the models within each other).
For example:
glm1 <- glm(Dependent~indep1+indep2+indep3, family = binomial, data = data)
glm2 <- glm(Dependent~indep1+indep2, family = binomial, data = data)
Now imagine that we are missing values of indep3 but not indep1 or indep2.
When we run glm1 we are running it on a smaller dataset - the subset for which we have the dependent variable and all three independent ones (i.e. we exclude any rows where indep3 is missing).
When we run glm2, the rows missing a value for indep3 are included, because those rows do contain the dependent variable, indep1 and indep2, which are the variables in the model.
We can no longer directly compare models as they are fitted on different datasets.
I think broadly you can either:
1) Limit the analysis to data which are complete (a sketch follows below)
2) If appropriate, consider multiple imputation
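Applied to the model above, option 1 could look like this (a sketch; it restricts the data to rows complete for every variable in the full model, so that stepAIC() compares all candidate models on identical rows):
# keep only rows complete for every variable appearing in the full model
model_vars <- all.vars(formula(full.modelfixed))
trauma_cc <- trauma[complete.cases(trauma[, model_vars]), ]
full.modelfixed <- update(full.modelfixed, data = trauma_cc)
reduced.modelfixed <- stepAIC(full.modelfixed, direction = "backward")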
Hope that helps.
You can use the mice package to do the imputation; working with the completed data set will then not give you errors.

Predicting probability of disease according to a continuous variable adjusting for confounding variables

I have a question regarding the R package "margins". I'm estimating a logistic model:
modelo1 <- glm(VD2 ~ VE12 + VE.cont + VE12:VE.cont + VC1 + VC2 + VC3 + VC4, family="binomial", data=data)
Where:
VD2 is a dichotomous outcome variable (1 disease / 0 no disease)
VE12 is a dichotomous exposure variable (with values 0 and 1)
VE.cont is a continuous exposure variable
VCx (the rest of the variables) are confounding variables.
My objective is to obtain the predicted probability of disease (VD2) for a vector of values of VE.cont and for each VE12 group, but adjusting for the VCx variables. In other words, I would like to obtain the dose-response line between VD2 and VE.cont by VE12 group, but assuming the same distribution of VCx for each dose-response line (i.e. without confounding).
Following the nomenclature of this article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052139/), I think I should do a "marginal standardisation" (method 1), which can be done with Stata, but I'm not sure how to do it with R.
I'm using this syntax (with R):
cdat0 <- cplot(modelo1, x="VE.cont", what="prediction", data = data[data[["VE12"]] == 0,], draw=T, ylim=c(0,0.3))
cdat1 <- cplot(modelo1, x="VE.cont", what="prediction", data = data[data[["VE12"]] == 1,], draw="add", col="blue")
but I'm not sure if I'm doing it right, because this approach gives results similar to using the model without confounding variables together with the function predict.glm:
modelo0 <- glm(VD2 ~ VE12 + VE.cont + VE12:VE.cont, family="binomial", data=data)
Perhaps I should use the margins function instead, but I don't understand the results, because the values obtained in the VE.cont column are not on the probability scale (between 0 and 1).
x <- c(1,2,3,4,5)
margins::margins(modelo1, at=list("VE.cont"=x, "VE12"=c(0,1)), type="response")
This is an example of the figure I would like to obtain (figure not included here).
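For reference, marginal standardisation (method 1 in the linked article) can be written directly with predict.glm: fix VE.cont and VE12 at chosen values for every row, keep the observed distribution of the confounders VCx, and average the predicted probabilities. A rough sketch, assuming VE12 enters the model as a numeric 0/1 variable:
# average predicted probability with the exposures set to fixed values for all rows
std_pred <- function(ve.cont, ve12, model, dat) {
  dat$VE.cont <- ve.cont
  dat$VE12 <- ve12
  mean(predict(model, newdata = dat, type = "response"))
}
grid <- expand.grid(VE.cont = c(1, 2, 3, 4, 5), VE12 = c(0, 1))
grid$prob <- mapply(std_pred, grid$VE.cont, grid$VE12,
                    MoreArgs = list(model = modelo1, dat = data))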

Plot predicted values from lmer longitudinal analysis

I'm analyzing some longitudinal data using the lme4 package (lmer function) with 3 levels: measurement points nested in individuals nested in households. I'm interested in linear and non-linear change curves surrounding a specific life event. My model has many time predictors (indicating linear change before and after the event occurs, and indicating non-linear change (i.e., squared time variables) before and after the event occurs). Additionally, I have several Level-2 predictors that do not vary with time (i.e., personality traits) and some control variables (e.g., age, gender). So far, I have not included any random slopes or cross-level interactions.
This is my model code:
model.RI <- lmer(outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq + postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c + (1 | ID) + (1 | House))
outcome = my dependent variable
time = year 1, year 2, year 3 ... (until year 9); this variable symbolizes something like a testing effect
female_c = gender centered
age_c = age centered
age_c2 = age squared centered
preLin = time variable indicating time to the event (this variable is 0 after the event has occurred and negative before it, e.g. -1 one year before the event, -2 two years before the event, etc.)
preLin.sq = squared values of preLin
postLin = time variable indicating time after the event (this variable is 0 before the event and increases after the event has occurred; e.g. is +1 one year after the event)
postLin.sq = squared values of postLin
per1.c through per5.c = personality traits on Level 2 (centered)
ID = indicating the individual
House = indicating the household
I was wondering how I could plot the predicted values of this lmer model (e.g., using ggplot2?). I've plotted change curves using method = "gam" in R. This is a rather data-driven method to inspect the data without pre-defining whether the curve is linear, quadratic, or whatever. I would now like to check whether my parametric lmer model is comparable to that data-driven gam plot I already have. Do you have any advice on how to do this?
I would be more than happy to get some help on this! Please also feel free to ask if I was not precise enough on my explanation of what I would like to do!
Thanks a lot!
This is how my gam plot looks (link to figure not included here), and I hope to get something similar when plotting the predicted values of my lmer model!
You can use the ggpredict() function from the ggeffects package. If you want to plot predicted values of time (preLin), you would simply write:
ggpredict(model.RI, "preLin")
The function returns a data frame (see the package's articles), which you can use in ggplot, but you can also plot the results directly:
ggpredict(model.RI, "preLin") %>% plot()
or
p <- ggpredict(model.RI, "preLin")
plot(p)
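If you prefer to build the plot yourself, the data frame returned by ggpredict() has the columns x, predicted, conf.low and conf.high, so a minimal ggplot sketch would be:
library(ggplot2)
p <- ggpredict(model.RI, "preLin")
ggplot(p, aes(x = x, y = predicted)) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
  geom_line()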
You could also use the sjPlot package; however, for marginal effects / predicted values, sjPlot::plot_model() internally just calls ggeffects::ggpredict(), so the results would basically be identical.
Another note on your model: if you have longitudinal data, you should also include your time variable as a random slope. I'm not sure how postLin actually relates to preLin, but if preLin captures all your measurements, you should at least write your model like this:
model.RI <- lmer(
outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
(1 + preLin | ID) + (1 + preLin | House)
)
If you also assume a quadratic trend for each person (ID), you could even add the squared term as random slope.
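For example (a sketch; whether the data can support this many random effects is a separate question):
model.RI <- lmer(
outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
(1 + preLin + preLin.sq | ID) + (1 + preLin | House)
)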
As your figure example suggests using splines, you could also try this:
library(splines)
model.RI <- lmer(
outcome ~ time + female_c + age_c + age_c2 + bs(preLin) +
postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
(1 + preLin | ID) + (1 + preLin | House)
)
p <- ggpredict(model.RI, "preLin")
plot(p)
Examples for splines are also demonstrated on the website I mentioned above.
Edit:
Another note is related to nesting: you're currently modelling a fully crossed (cross-classified) model. If ID is completely nested within House, the random parts would look like this:
... + (1 + preLin | House / ID)

predict after multiple imputation in R

I used the mice package in R to perform multiple imputation for my data:
### multiple inputation by chained equations
### multiple imputation by chained equations
imp.data <- mice(data, maxit = 5, m = 5, seed = 92385, print = F)
I want to run a logistic regression model after the MI, and predict the outcome based on the model:
model <- with(imp.data, glm(died ~ agecat + female_1 + insurance + mech + transfer +
iss + mxaisbr1 + maxais + cm_chf_1 + cm_mets_1 + cm_liver_1 +
cm_htn_c_1 + cm_bldloss_1 + state, family = binomial))
However, the predict command does not work:
predict(pool(model), type = c('response'))
It would be much simpler if I had a single data set with imputed values, but the imputation produced 5 imputed data sets, which makes the post-estimation complicated.
Any idea?
Thanks!!
I'm not sure whether the imputed data sets are saved as a data.frame or a matrix...
But if you extract one of them as a data.frame, you can attach the columns of interest to your original frame:
imp1 <- complete(imp.data, 1) # extract the first completed data set
original.df$NewImputedColumn <- imp1[, 1] # assuming you want column 1
Now you can easily keep copies of the imputed columns and still work with a single data.frame in your predictive models.
This is what I tend to do anyway; it might not be the standard way (I'm not sure).
You are only taking the imputed values from 1 of the imputed data sets, right?
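A more standard route (a sketch): since pool() returns pooled coefficients (a "mipo" object) with no predict() method, one common workaround is to predict from each of the m = 5 per-imputation fits, which with() stores in the analyses element, and average the predicted probabilities:
# predict from each of the 5 fitted models and average row-wise
preds <- sapply(model$analyses, predict, type = "response")
avg_pred <- rowMeans(preds)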
