How to fit a model I built to another data set and get residuals? - r

I fitted a mixed model to Data A as follows:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
Next, I want to see how the model fits Data B and also get the estimated residuals. Is there a function in R that I can use to do so?
(I tried the following method but got all new coefficients.)
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=B)

The reason you are getting new coefficients in your second attempt with data=B is that the function lme returns a model fitted to your data set using the formula you provide, and stores that model in the variable model as you have selected.
To get more information about a model you can type summary(model_name). the nlme library includes a method called predict.lme which allows you to make predictions based on a fitted model. You can type predict(my_model) to get the predictions using the original data set, or type predict(my_model, some_other_data) as mentioned above to generate predictions using that model but with a different data set.
In your case to get the residuals you just need to subtract the predicted values from observed values. So use predict(my_model,some_other_data) - some_other_data$dependent_var, or in your case predict(model,B) - B$Y.

You model:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
2 predictions based on your model:
pred1=predict(model,newdata=A,type='response')
pred2=predict(model,newdata=B,type='response')
missed: A function that calculates the percent of false positives, with cut-off set to 0.5.
(predicted true but in reality those observations were not positive)
missed = function(values,prediction){sum(((prediction > 0.5)*1) !=
values)/length(values)}
missed(A,pred1)
missed(B,pred2)

Related

How to identify the non-zero coefficients in final caret elastic net model -

I have used caret to build a elastic net model using 10-fold cv and I want to see which coefficients are used in the final model (i.e the ones that aren't reduced to zero). I have used the following code to view the coefficients, however, this apears to create a dataframe of every permutation of coefficient values used, rather than the ones used in the final model:
tr_control = train_control(method="cv",number=10)
formula = response ~.
model1 = caret::train(formula,
data=training,
method="glmnet",
trControl=tr_control,
metric = "Accuracy",
family = "binomial")
Then to extract the coefficients from the final model and using the best lambda value, I have used the following:
data.frame(as.matrix(coef(model1$finalModel, model1$bestTune$.lambda)))
However, this just returns a dataframe of all the coefficients and I can see different instances of where the coefficients have been reduced to zero, however, I'm not sure which is the one the final model uses. Using some slightly different code, I get slightly different results, but in this instance, non of the coefficients are reduced to zero, which suggests to me that the the final model isn't reducing any coefficients to zero:
data.frame(as.matrix(coef(model1$finalModel, model1$bestTune$lambda))) #i have removed the full stop preceeding lambda
Basically, I want to know which features are in the final model to assess how the model has performed as a feature reduction process (alongside standard model evaluation metrics such as accuracy, sensitivity etc).
Since you do not provide any example data I post an example based on the iris built-in dataset, slightly modified to fit better your need (a binomial outcome).
First, modify the dataset
library(caret)
set.seed(5)#just for reproducibility
iris
irisn <- iris[iris$Species!="virginica",]
irisn$Species <- factor(irisn$Species,levels = c("versicolor","setosa"))
str(irisn)
summary(irisn)
fit the model (the caret function for setting controls parameters for train is trainControl, not train_control)
tr_control = trainControl(method="cv",number=10)
model1 <- caret::train(Species~.,
data=irisn,
method="glmnet",
trControl=tr_control,
family = "binomial")
You can extract the coefficients of the final model as you already did:
data.frame(as.matrix(coef(model1$finalModel, model1$bestTune$lambda)))
Also here the model did not reduce any coefficients to 0, but what if we add a random variable that explains nothing about the outcome?
irisn$new1 <- runif(nrow(irisn))
model2 <- caret::train(Species~.,
data=irisn,
method="glmnet",
trControl=tr_control,
family = "binomial")
var <- data.frame(as.matrix(coef(model2$finalModel, model2$bestTune$lambda)))
Here, as you can see, the coefficient of the new variable was turning to 0. You can extract the variable name retained by the model with:
rownames(var)[var$X1!=0]
Finally, the accuracy metrics from the test set can be obtained with
confusionMatrix(predict(model1,test),test$outcome)

Predict fitted Probabilities for gmnl regression

I'm using the gmnl function to fit a mixed multinomial logit model.
Since I'm further interested in the predicted probaliities of that model, I want to obtain them by applying something like the predic function.
m4=gmnl(int_choice ~ 1+fico+annual_inc+int_emp_length+| time +grade+ last_fico |0, data = mldata, model="mixl",R=50,panel=TRUE,correlation = TRUE,ranp=c(annual_inc="n",int_emp_length="n"))
## how to mimic predict??
p_hat=predict(m4,type="probs")
Any suggestions?
What you are looking for is a simple conversion rule like this:
Convert a logit (glm output) to probability by computing exp() of the coefficients, which will give you the odds.
Convert the odds into probability using this formula: prob = odds / (1 + odds).
Very good explanation with examples can be found here: https://sebastiansauer.github.io/convert_logit2prob/
The solution can't be found in the vignette but is documented in the help-file. Typing help("fitted.gmnl") yields the following:
fitted(object, outcome = TRUE, ...)
if TRUE, then the fitted and residuals methods return a vector that corresponds to the chosen alternative, otherwise it returns a matrix where each column corresponds to each alternative.
I type str() over the gmnl model
I find an internal attribute called prob.alt that gives choice - residuals
So in your case m4$prob.alt gives some usefull values, (find the max by row gives the predicted choice)
(In my case (a latent class mnl) this do not help since it has the predicted latent class probabilities)

Obtaining predicted (i.e. expected) values from the orm function (Ordinal Regression Model) from rms package in R

I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know for each and every value of Y in my dataset what the predicted value from the model would be. I tried predict(reg, type="mean") and this produced values that are close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS where you can obtain the E(Y) given a set of predictors. If possible, please provide code I can run to do this with a very short explanation.

extract predicted values from Partial least square regression in R

I have following:
library(pls)
pcr(price ~ X, 6, data=cars, validation="CV")
it works, but because I have a small dataset, I cannot divide in into training and test and therefore I want to perform cross-validation and then extract predicted data for AUC and accuracy. But I could not find how I can extract the predicted data.Which parameter is it?
When you fit a cross-validated principal component regression model with pcr() and the validation= argument, one of the components of the output list is called validation. This contains the results of the cross validation. This in turn is a list and it has a component called pred, which contains the cross-validated predictions.
An example adapted from example("pcr"):
sens.pcr <- pcr(sensory ~ chemical, data = oliveoil, validation = "CV")
sens.pcr$validation$pred
As an aside, it's generally a good idea to set your random seed immediately prior to performing cross validation to ensure reproducibility of your results.

glmer - predict with binomial data (cbind count data)

I am trying to predict values over time (Days in x axis) for a glmer model that was run on my binomial data. Total Alive and Total Dead are count data. This is my model, and the corresponding steps below.
full.model.dredge<-glmer(cbind(Total.Alive,Total.Dead)~(CO2.Treatment+Lime.Treatment+Day)^3+(Day|Container)+(1|index),
data=Survival.data,family="binomial")
We have accounted for overdispersion as you can see in the code (1:index).
We then use the dredge command to determine the best fitted models with the main effects (CO2.Treatment, Lime.Treatment, Day) and their corresponding interactions.
dredge.models<-dredge(full.model.dredge,trace=FALSE,rank="AICc")
Then made a workspace variable for them
my.dredge.models<-get.models(dredge.models)
We then conducted a model average to average the coefficients for the best fit models
silly<-model.avg(my.dredge.models,subset=delta<10)
But now I want to create a graph, with the Total Alive on the Y axis, and Days on the X axis, and a fitted line depending on the output of the model. I understand this is tricky because the model concatenated the Total.Alive and Total.Dead (see cbind(Total.Alive,Total.Dead) in the model.
When I try to run a predict command I get the error
# 9: In UseMethod("predict") :
# no applicable method for 'predict' applied to an object of class "mer"
Most of your problem is that you're using a pre-1.0 version of lme4, which doesn't have the predict method implemented. (Updating would be easiest, but I believe that if you can't for some reason, there's a recipe at http://glmm.wikidot.com/faq for doing the predictions by hand by extracting the fixed-effect design matrix and the coefficients ...)There's actually not a problem with the predictions, which predict the log-odds (by default) or the probability (if type="response"); if you wanted to predict numbers, you'd have to multiply by N appropriately.
You didn't give one, but here's a reproducible (albeit somewhat trivial) example using the built-in cbpp data set (I do get some warning messages -- no non-missing arguments to max; returning -Inf -- but I think this may be due to the fact that there's only one non-trivial fixed-effect parameter in the model?)
library(lme4)
packageVersion("lme4") ## 1.1.4, but this should work as long as >1.0.0
library(MuMIn)
It's convenient for later use (with ggplot) to add a variable for the proportion:
cbpp <- transform(cbpp,prop=incidence/size)
Fit the model (you could also use glmer(prop~..., weights=size, ...))
gm0 <- glmer(cbind(incidence, size - incidence) ~ period+(1|herd),
family = binomial, data = cbpp)
dredge.models<-dredge(gm0,trace=FALSE,rank="AICc")
my.dredge.models<-get.models(dredge.models)
silly<-model.avg(my.dredge.models,subset=delta<10)
Prediction does work:
predict(silly,type="response")
Creating a plot:
library(ggplot2)
theme_set(theme_bw()) ## cosmetic
g0 <- ggplot(cbpp,aes(period,prop))+
geom_point(alpha=0.5,aes(size=size))
Set up a prediction frame:
predframe <- data.frame(period=levels(cbpp$period))
Predict at the population level (ReForm=NA -- this may have to be REForm=NA in lme4 `1.0.5):
predframe$prop <- predict(gm0,newdata=predframe,type="response",ReForm=NA)
Add it to the graph:
g0 + geom_point(data=predframe,colour="red")+
geom_line(data=predframe,colour="red",aes(group=1))

Resources