Calculating a confusion matrix for a fixed effect logit in R

I would like to ask how to calculate a confusion matrix for a fixed effect logit model (the bife package).
With a basic logit model (glm) there is no problem, but with the fixed effect logit there is:
for some reason the number of predictions differs between the logit and the fixed effect logit.
Example:
library(bife)
library(tidyverse)
library(caret)
dataset <- psid
logit <- glm(LFP ~ AGE + I(AGE^2) + log(INCH) + KID1 + KID2 + KID3, data = dataset, family = "binomial")
mod <- bife(LFP ~ AGE + I(AGE^2) + log(INCH) + KID1 + KID2 + KID3 | ID, dataset)
summary(mod)
summary(logit)
predict(logit)
predict(mod)
Y <- factor(dataset$LFP)
PRE <- factor(round(predict(logit, type = "response")))
PRE_FIX <- factor(round(predict(mod, type = "response")))
confusionMatrix(Y, PRE)
# Not working
confusionMatrix(Y, PRE_FIX)

It is possible to compute the confusion matrix with base table() instead:
confusionMatrix <- table(true = Y, pred = round(predict(mod, type = "response")))
The result can then be read as a confusion matrix of the following shape:
      pred
true   0               1
  0    TruePositive    FalseNegative
  1    FalsePositive   TrueNegative
(here 0 is treated as the positive class, which matches caret's default of taking the first factor level as positive)
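Note that table() requires true and pred to have the same length, which is exactly where the question's mismatch comes from: bife drops individuals whose LFP never varies within the panel (their fixed effects are not identified), so it returns fewer predictions than glm. A quick check (a sketch, not part of the original answer):
## compare prediction lengths, reusing logit, mod and dataset from above
length(predict(logit, type = "response"))  # one prediction per row of psid
length(predict(mod, type = "response"))    # shorter if individuals were dropped
nrow(dataset)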


Different results between kriging on residuals and universal kriging (R)

I have a dataset with 132 observations, 1 response variable (Lopend_gemiddelde), and 9 predictors on which I perform three methods: 1) multiple linear regression (MLR), 2) kriging on the residuals (based on a multiple linear regression), and 3) universal kriging. The results for the methods differ slightly in terms of performance (R2, RMSE, MAE). As an example, the performances for the three methods, based on a 20-fold cross validation are:
Performance criterion   Multiple linear regression   Kriging on residuals   Universal kriging
R2                      0.337                        0.323                  0.333
RMSE                    7.585                        7.718                  7.615
MAE                     6.118                        6.170                  6.084
My two questions related to these results:
Why does the addition of kriged residuals to the MLR result in poorer performance (in terms of R2, RMSE, and MAE), compared to the MLR alone?
What causes the difference in model performance (in terms of R2, RMSE, and MAE) between the kriging on residuals on the one hand, and the universal kriging on the other hand?
My related code for the first method (MLR) is displayed below. test$predicted is used to calculate the R2, RMSE, and MAE.
## == 1) multiple linear regression == ##
#multiple linear regression, input data is trained dataset (75% of original)
model_train = lm(Lopend_gemiddelde ~ 1 + nightlight_450 + nightlight_4950 + population_3000 + road_class_1_5000 + road_class_2_1000 + road_class_2_5000 + road_class_3_100 + road_class_3_300 + trafBuf50, data = train)
#predict NO2 by the trained model (linear)
test$predicted = predict(model_train, test)
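For reference, a minimal sketch of how the three criteria can be computed from these predictions (this helper is not part of the original code and assumes the observed response is available as test$Lopend_gemiddelde):
## Sketch: R2, RMSE and MAE from observed vs. predicted values
obs <- test$Lopend_gemiddelde
prd <- test$predicted
c(R2   = cor(obs, prd)^2,
  RMSE = sqrt(mean((obs - prd)^2)),
  MAE  = mean(abs(obs - prd)))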
The related code for the second method (kriging on residuals) is displayed below (inspired by this post). The kriged residuals are added to the prediction values that derive from the multiple linear regression method (i.e. method 1). I project my kriging on a 100m resolution grid (grid_sp). predicted_model$PredAddedKrigedResi is used to calculate the R2, RMSE, and MAE.
## == 2) kriging on the residuals == ##
train_df = data.frame(train)
data_model_train <- data.frame(x = train_df$coords.x1, y = train_df$coords.x2,resid=resid(model_train))
#fitting the variogram (autofit)
coordinates(data_model_train) = ~x+y
variogram_train = autofitVariogram(resid ~ 1, data_model_train)
#create dataset including variogram parameters
autofit_params <- variogram_train$var_model
#use variogram with autofitted parameters
lz.ok_train_resid <- krige(resid ~ 1, data_model_train, grid_sp, autofit_params)
#convert to raster
raster_train_resid <- raster(lz.ok_train_resid['var1.pred'])
#predict NO2 by the trained model (linear)
test$predicted = predict(model_train, test)
#spatially join predicted values by trained model with corresponding kriged residuals.
predicted_model = raster::extract(raster_train_resid, test, sp=T) #sp = T: keep all data
#add kriged residuals to the predicted values by the trained model
predicted_model$PredAddedKrigedResi <- predicted_model$predicted + predicted_model$var1.pred
The related code for the universal kriging method is displayed below. I project my kriging on a 100m resolution grid (grid100) that, this time, includes information on the 9 predictors to account for trends. predicted_model$predictedUK is used to calculate the R2, RMSE, and MAE.
## == 3) universal kriging == ##
variogram_uk = autofitVariogram(Lopend_gemiddelde ~ 1 + nightlight_450 + nightlight_4950 + population_3000 + road_class_1_5000 + road_class_2_1000 + road_class_2_5000 + road_class_3_100 + road_class_3_300 + trafBuf50, input_data = train)
autofit_params_uk <- variogram_uk$var_model
#run kriging in parallel over grid chunks (assumes the cluster cl, no_cores and the index list parts were defined earlier; that setup is not shown)
system.time(parallelX <- parLapply(cl = cl, X = 1:no_cores, fun = function(x) krige(formula = Lopend_gemiddelde ~ 1 + nightlight_450 + nightlight_4950 + population_3000 + road_class_1_5000 + road_class_2_1000 + road_class_2_5000 + road_class_3_100 + road_class_3_300 + trafBuf50, locations = train, newdata = grid100[parts[[x]],], model = autofit_params_uk)))
#merge the parallel kriging results into one object (the original merge step was omitted; do.call(rbind, ...) is one way, assuming sp's rbind methods)
mergeParallelX <- do.call(rbind, parallelX)
# Create SpatialPixelsDataFrame from mergeParallelX
mergeParallelX <- SpatialPixelsDataFrame(points = mergeParallelX, data = mergeParallelX@data)
#convert to raster
raster_uk <- raster(mergeParallelX["var1.pred"])
#spatially join the universal kriging predictions with the test locations
predicted_model = raster::extract(raster_uk, test, sp=T) #sp = T: keep all data
#rename
predicted_model <- predicted_model %>% rename(predictedUK = "var1.pred")

Extracting linear predictors from a generalized linear model for panel data (pglm)

I would like to extract the linear predictors from a pglm model.
With a basic glm model this can be done simply via the $linear.predictors component:
#For example
library(plm)
library(pglm)
data(UnionWage) # from pglm-package
punions <- pdata.frame(UnionWage, c("id", "year"))
punions <- subset(punions, wage > 0)
glm.model <- glm(wage ~ exper + rural + married, data=punions, family = "poisson")
glm.model$linear.predictors
How can I extract or compute linear predictors from a pglm model like
pglm.model <- pglm(wage ~ exper + rural + married, data=punions, model="random", family="poisson")
I hope somebody can help!
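One hedged workaround (a sketch, not from the original post) is to rebuild the linear predictor of the fixed part by hand from the model matrix and coef(pglm.model). This assumes the estimated coefficients carry the same names as the model-matrix columns; pglm's random-effects models also return variance parameters (e.g. sigma), which are dropped here.
## Sketch: manual linear predictors for the fixed part of a pglm model
X <- model.matrix(~ exper + rural + married, data = punions)
beta <- coef(pglm.model)[colnames(X)]  # keep only coefficients matching X
eta <- drop(X %*% beta)                # linear predictor, ignoring random effects
head(eta)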

lmer multilevel fit with intercept constraint

I regularly have this problem: I want to fit a multilevel regression with a constraint, and I don't know how to do that. I usually end up using lavaan, since it allows constraints on the regression coefficients. But lavaan can't fit random-slope models (only random intercepts, and in truth I don't know how to set a constraint on the intercept in lavaan either), and I would like a multilevel approach.
So basically I have a variable y with a second-order polynomial dependence on x, with coefficients that depend on the subject ID:
library(data.table)
library(ggplot2)
library(lme4)  # needed for lmer() below
df <- data.table(x = rep(0:10, 5), ID = rep(LETTERS[1:5], each = 11))
df[,a:= rnorm(1,2,1),by = ID]
df[,b:= rnorm(1,1,0.2),by = ID]
df[,y := rnorm(.N,0,10) + a*x + b*x^2 ]
ggplot(df,aes(x,y,color = ID))+
geom_point()
and I can fit a normal multilevel model:
lmer(y ~ x + I(x^2) + (x + I(x^2) | ID), df)
But I would like to constrain the intercept to be 0. Is there a simple way to do so?
Thank you
You can suppress the intercept with -1. For example:
coef(summary(lmer(y ~ x + I(x^2) + (x + I(x^2) | ID), df)))
              Estimate Std. Error    t value
(Intercept) -1.960196   4.094491 -0.4787398
x            2.535092   1.754963  1.4445275
I(x^2)       1.015212   0.130004  7.8090889

coef(summary(lmer(y ~ -1 + x + I(x^2) + (x + I(x^2) | ID), df)))
        Estimate Std. Error  t value
x       1.831692  0.9780500 1.872800
I(x^2)  1.050261  0.1097583 9.568856
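Note (an addition, not in the original answer): -1 only removes the fixed-effect intercept. The random-effects term (x + I(x^2) | ID) still contains a random intercept for each ID; to drop that as well, write (-1 + x + I(x^2) | ID) or equivalently (0 + x + I(x^2) | ID).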

Calculating working residuals of a Gamma GLM model

I am trying to calculate the working residuals of a Gamma GLM model. I'm doing this manually because I want to calculate the partial residuals step-by-step. My model and its coefficients and predictions are described below:
library(datasets)
data(mtcars)
model <- glm(mpg ~ cyl + disp + hp, data=mtcars, family=Gamma)
coefs <- coef(model)
pred <- coefs[1] + coefs[2]*mtcars$cyl + coefs[3]*mtcars$disp + coefs[4]*mtcars$hp
I tried to calculate the working residuals by applying the formula (value - fitted.value) / fitted.value, which works fine for a Poisson glm. However, it didn't work for Gamma: the values differ from those returned by resid():
(mtcars$mpg - (-pred^(-1))) / (-pred^(-1))
resid(model, type="working")
Does anybody know how to estimate such working residuals to then calculate the partial residuals?
The working residuals are just model$residuals; see ?glm.
## setup
library(datasets)
data(mtcars)
model <- glm(mpg ~ cyl + disp + hp, data = mtcars, family = Gamma)
## family info
oo <- Gamma(link = "inverse")
## compute linear predictor manually (assuming no model offset)
coefs <- coef(model)
eta <- coefs[1] + coefs[2] * mtcars$cyl + coefs[3] * mtcars$disp +
coefs[4] * mtcars$hp
## compute working residuals
resi_working <- (mtcars$mpg - oo$linkinv(eta)) / oo$mu.eta(eta)
## validation
range(resi_working - model$residuals)
#[1] 0 0
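Since the goal is partial residuals, a hedged follow-up sketch (not from the original answer): a partial residual adds a term's centered contribution to the working residual, which is also what resid(model, type = "partial") returns.
## Sketch: partial residuals for hp = working residuals + centered hp term
## (assumes R's convention of centering each term, as predict(type = "terms") does)
part_hp <- resi_working + coefs[4] * (mtcars$hp - mean(mtcars$hp))
range(part_hp - resid(model, type = "partial")[, "hp"])
# approximately 0 0 (up to floating-point error)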

Population-level prediction from bam {mgcv}

Using bam, I made a logistic mixed model with the following form:
PresAbs ~ s(Var1) + s(Var2) + ... + s(Varn) + s(RandomVar, bs = "re")
RandomVar is a factor and I am not interested in the predictions for each of its levels. How can I obtain population-level predictions, comparable to predict.lme?
One way is to just exclude the random-effect spline from the predictions.
Using the example from ?gam.models
library("mgcv")
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each groups has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
m1 <- gam(y ~ s(fac,bs="re")+s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="ML")
we want to exclude the term s(fac), written exactly as it appears in the output of
summary(m1)
For the observed data, the population-level predictions are
predict(m1, exclude = 's(fac)')
but you can supply newdata to generate predictions for other combinations of the covariates.
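A hedged illustration of the newdata case (not from the original answer): the covariates of an excluded smooth generally still need to be present in newdata, so fac must be supplied even though s(fac) contributes nothing to the sum; any existing level will do.
## Sketch: population-level prediction at new covariate values
nd <- data.frame(x0 = 0.5, x1 = 0.5, x2 = 0.5, x3 = 0.5,
                 fac = factor(1, levels = levels(fac)))
predict(m1, newdata = nd, exclude = "s(fac)")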
