plotting semivariograms with non-nlme package models - r

I am trying to plot a semivariogram of my model residuals for a generalised mixed effects model in R. Doing this for a mixed effects model with a normal distribution is straightforward with the nlme package; here is an example using the quakes dataset.
library(nlme)
data(quakes)
head(quakes)
model1 <- lme(mag ~ depth, random = ~ 1 | stations, data = quakes)
summary(model1)
semivario <- Variogram(model1, form = ~ long + lat, resType = "normalized")
plot(semivario, smooth = TRUE)
I want to fit a model with a non-normal distribution, which I can't do with nlme, so I have tried glmer and glmmPQL. I have turned 'mag' into a binary variable and then tried to reapply the Variogram function to the resulting models.
quakes$thresh <- ifelse(quakes$mag > 5, 0, 1)
library(MASS)
model2 <- glmmPQL(as.factor(thresh) ~ depth, random = ~ 1 | stations,
                  family = binomial, data = quakes)
summary(model2)
semivario <- Variogram(model2, form = ~ long + lat, resType = "normalized")
plot(semivario, smooth = TRUE)
library(lme4)
model3 <- glmer(as.factor(thresh) ~ depth + (1 | stations),
                family = binomial, data = quakes)
summary(model3)
semivario <- Variogram(model3, form = ~ long + lat, resType = "normalized")
plot(semivario, smooth = TRUE)
Neither of these appears to work for plotting the variogram: the glmmPQL model complains that lat and long can't be found, and the glmer model complains that no distance is specified.
How can I plot a semivariogram of the residuals from these models? Is the Variogram function from the nlme package unusable for them, and if so, what alternatives can I use?
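One possible workaround (a sketch of my own, not from the original post, and assuming the sp and gstat packages are available) is to bypass nlme's Variogram entirely: extract the residuals from the fitted model and compute an empirical semivariogram of them with gstat::variogram, using long and lat as coordinates. The residual type ("pearson" for the glmer fit) and the object names below are my assumptions.
library(sp)
library(gstat)
# residuals of the fitted glmer model, paired with their coordinates
resid_df <- data.frame(long = quakes$long,
                       lat  = quakes$lat,
                       res  = residuals(model3, type = "pearson"))
coordinates(resid_df) <- ~ long + lat        # promote to a SpatialPointsDataFrame
vg <- variogram(res ~ 1, resid_df)           # empirical semivariogram of the residuals
plot(vg)
A similar call using residuals(model2, type = "normalized") should work for the glmmPQL fit.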

Related

How to plot the roc of a glm model with multiple terms in R?

I have a glm model with multiple terms. I need to plot the ROC curve and find the AUC. I tried using roc() and multiclass.roc() but get Error in plot.new() : figure margins too large.
library(AER)
library(pROC)
data("Affairs")
str(Affairs)
Affairs$affairs <- as.factor(Affairs$affairs)
m3 <- glm(affairs ~ gender + age + yearsmarried + religiousness + rating,
          family = binomial, data = Affairs)
honk <- roc(affairs ~ gender + age + yearsmarried + religiousness + rating, data = Affairs)
plot(honk)
honk$auc
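One hedged way around both errors (my own sketch, not from the post; the had_affair variable and model name m3b are hypothetical) is to keep the outcome binary instead of a many-level factor, fit the glm on that, and hand the observed response plus the fitted probabilities to pROC::roc():
library(AER)
library(pROC)
data("Affairs")
Affairs$had_affair <- as.numeric(Affairs$affairs > 0)    # hypothetical binary outcome
m3b <- glm(had_affair ~ gender + age + yearsmarried + religiousness + rating,
           family = binomial, data = Affairs)
probs <- predict(m3b, type = "response")                 # fitted probabilities
honk <- roc(Affairs$had_affair, probs)                   # response first, predictor second
plot(honk)
honk$auc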

Refit an arima model with new training data in fable package

I have a function which takes a fitted model and then refits that model to new training data (this is for step-ahead cross validation). For lm models it works like this:
# create data
training_data <- data.frame(
  date = seq.Date(from = as.Date("2020-01-01"), by = 1, length.out = 365),
  x = 1:365,
  y = 1:365 + rnorm(n = 365)
)
# specify and fit model
lm_formula <- as.formula(y ~ x)
my_lm <- lm(lm_formula, data = training_data)
# refit on new training data
update(my_lm, data = new_training_data)
Is there a way to do the same thing for arima models fitted with the fable package? I'm creating the models like this
library(fable)
library(forecast)
arima_formula <- as.formula(y ~ x + PDQ(0, 0, 0))
my_arima <- as_tsibble(training_data) %>% model(ARIMA(arima_formula))
But I can't figure out a way to take the my_arima model that I've already fitted and pass it new_training_data, either using update or by extracting the formula and refitting as a new model. Note that although I've included the model formula in the reprex above, my function takes a fitted model rather than a formula. So just fitting a new model using arima_formula is not an option.
Thank you.
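One option worth trying (a sketch under my assumptions: new_training_data is a placeholder, date is used as the tsibble index, and I'm relying on the refit() generic exported by fabletools) is to call refit() on the fitted mable, which re-applies the existing model specification to new data without needing the original formula:
library(fable)
library(tsibble)
library(dplyr)
# hypothetical new training data with the same columns as training_data
new_training_data <- mutate(training_data, y = y + rnorm(n(), sd = 0.5))
my_arima <- training_data %>%
  as_tsibble(index = date) %>%
  model(arima = ARIMA(y ~ x + PDQ(0, 0, 0)))
refitted <- refit(my_arima,
                  new_data = as_tsibble(new_training_data, index = date),
                  reestimate = TRUE)    # re-estimate the coefficients on the new data
report(refitted)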

Is it normal to get the same Goodness of Fit values for a logit model and an LPM model, based on the same data?

I am using the Stata dataset ANES.dta with information about the 2000 presidential election in the USA. I build two models on this dataset: one logit and one LPM (linear probability model). I want to compare the two models using the following goodness-of-fit measures: accuracy, sensitivity and specificity.
I am new to R (I have mainly used Stata so far), so I'm wondering: is it normal to get exactly the same values in the confusion matrices for a logit model and an LPM model based on the same data? Am I doing something wrong?
rm(list=ls())
library(foreign)
dat <- read.dta("ANES.dta", convert.factors = FALSE)
dat_clear <- na.omit(dat)
head(dat_clear)
#Logit model
m1_logit <- glm(vote ~ gender + income + pro_choice,
                data = dat_clear, family = binomial(link = "logit"),
                na.action = na.omit)
summary(m1_logit)
#LPM
m2_lpm <- lm(vote ~ gender + income + pro_choice,
             data = dat_clear, na.action = na.omit)
summary(m2_lpm)
#Confusion matrix for logit model
dat_clear$prediction_log <- predict(m1_logit, newdata = dat_clear, type = "response")
dat_clear$vote_pred_log <- as.numeric(dat_clear$prediction_log > .5)
table(observed = dat_clear$vote, predicted = dat_clear$vote_pred_log)
#Confusion matrix for LPM model
dat_clear$prediction_lpm <- predict(m2_lpm, newdata = dat_clear, type = "response")
dat_clear$vote_pred_lpm <- as.numeric(dat_clear$prediction_lpm > .5)
table(observed = dat_clear$vote, predicted = dat_clear$vote_pred_lpm)
This is what the confusion matrices look like
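For what it's worth, a quick check of my own (not part of the original post) is to compare the two sets of predictions directly: identical confusion matrices only mean that both models land on the same side of the 0.5 cut-off for every observation, even though the fitted values themselves differ.
# cross-tabulate the two classifications and compare the fitted values
table(logit = dat_clear$vote_pred_log, lpm = dat_clear$vote_pred_lpm)
plot(dat_clear$prediction_log, dat_clear$prediction_lpm,
     xlab = "logit fitted probability", ylab = "LPM fitted value")
abline(h = 0.5, v = 0.5, lty = 2)    # points in the same quadrant get the same class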

Getting estimated means after multiple imputation using the mitml, nlme & geepack R packages

I'm running multilevel multiple imputation through the package mitml (using the panImpute() function) and am fitting linear mixed models and marginal models through the packages nlme and geepack together with mitml's with() function.
I can get the estimates, p-values etc. for those through the testEstimates() function, but I'm also looking to get estimated means across my model predictors. I've tried the emmeans package, which I normally use for getting estimated means when running nlme & geepack without multiple imputation, but when I do, emmeans tells me "Can't handle an object of class “mitml.result”".
I'm wondering whether there is a way to get pooled estimated means from the multiple imputation analyses I've run.
The data frames I'm analyzing are longitudinal/repeated measures and in long format. In the linear mixed model I want to get the estimated means for a 2x2 interaction effect, and in the marginal model I'm trying to get estimated means for the 6 levels of the 'time' variable. The outcome in all models is continuous.
Here's my code
# mixed model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100, group = "treatment")
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, lme(Dep ~ time*treatment, random = ~ 1|id, method = "ML", na.action = na.exclude, control = list(opt = "optim")))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
# marginal model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100)
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, geeglm(Dep ~ time, id = id, corstr ="unstructured"))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
is there a way to get pooled estimated means from the multiple imputation analyses I've run?
This is not a reprex without Data, so I can't verify that this works for you. But emmeans provides support for mira-class (lists of) models from the mice package. So if you fit your model in with() using a mids object rather than a mitml.list object, you can use that to obtain marginal means of your outcome (and any contrasts or pairwise comparisons afterwards).
Using example data found here, which uncomfortably loads an external workspace:
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
## imputation
library(mice)
ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth[c(3, 5, 6, 7)] <- "norm"
pred <- ini$pred
pred[, "pupil"] <- 0
imp <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
## analysis
library(lme4) # fit multilevel model
mod <- with(imp, lmer(popular ~ sex + (1|class)))
library(emmeans) # obtain pooled estimates of means
(em <- emmeans(mod, specs = ~ sex))
pairs(em) # test comparison

Stepwise Regression with ROC

I am learning data science with R on DataCamp. In one exercise, I have to build a stepwise regression model. Even though I create the stepwise model successfully, the roc() function doesn't accept the response and gives an error like: "'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead"
I want to learn how to handle this problem, so I wrote my code below.
# Specify a null model with no predictors
null_model <- glm(donated ~ 1, data = donors, family = "binomial")
# Specify the full model using all of the potential predictors
full_model <- glm(donated ~ ., data = donors, family = "binomial")
# Use a forward stepwise algorithm to build a parsimonious model
step_model <- step(null_model, scope = list(lower = null_model, upper = full_model), direction = "forward")
# Estimate the stepwise donation probability
step_prob <- predict(step_model, type = "response")
# Plot the ROC of the stepwise model
library(pROC)
ROC <- roc(step_prob, donors$donated)
plot(ROC, col = "red")
auc(ROC)
I changed the order of the roc() function's arguments and the error was solved: roc()'s default method expects the response first and the predictor second, so passing step_prob first made pROC treat the fitted probabilities as a response with many levels.
library(pROC)
ROC <- roc(donors$donated, step_prob)
plot(ROC, col = "red")
auc(ROC)
