I'm trying to make a predicted-probability plot for a logit model using clustered robust standard errors. The margins package should supposedly let you do this via cplot(), but there seems to be a bug: cplot() doesn't recognize the optional vcov argument. Below is a minimal working example. Does anyone know how to fix the bug or do this another way?
require("margins")
require("sandwich")
##Generating random numbers
set.seed(10)
y<-factor(rbinom(n=1000,size=1,prob=.5))
x <- rnorm(n=1000, mean=100,sd=1)
z<- rbinom(n=1000,size=3,prob=.5)
#creating a "dataset"
dta<-data.frame(x,y,z)
##Basic logit model
model <-glm(y~x,family="binomial"(link="logit"),data=dta)
##Creating variance-covariance matrix, clustered by z
vcov <- vcovCL(model, cluster=z)
##Making a plot
cplot(model,"x",vcov=vcov,what="prediction")
#can see below that vcov has no effect (if not obvious from plot)
print(cplot(model,"x",vcov=vcov,what="prediction",draw=FALSE))
print(cplot(model,"x",what="prediction",draw=FALSE))
You could use the following code:
# Predict values with clustered standard errors
pred.dta <- ggeffects::ggpredict(
  model = model,
  terms = "x [all]",
  vcov.fun = "vcovCL",
  vcov.type = "HC1",
  vcov.args = list(cluster = z)
)

# Plot predictions
ggplot2::ggplot(data = pred.dta,
                ggplot2::aes(x = x, y = predicted)) +
  ggplot2::geom_line() +
  ggplot2::geom_errorbar(ggplot2::aes(ymin = conf.low, ymax = conf.high), width = 0.1)
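As a side note, ggeffects objects also come with a plot() method, so (assuming a reasonably recent version of ggeffects) you can get a quick default plot without building the ggplot by hand:
# Quick default plot of the ggeffects object
plot(pred.dta)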
For comparison, this is the same code but without the clustered errors:
# Predict values with model-based (non-clustered) standard errors
pred.dta <- ggeffects::ggpredict(
  model = model,
  terms = "x [all]"
)

# Plot predictions
ggplot2::ggplot(data = pred.dta,
                ggplot2::aes(x = x, y = predicted)) +
  ggplot2::geom_line() +
  ggplot2::geom_errorbar(ggplot2::aes(ymin = conf.low, ymax = conf.high), width = 0.1)
I am using the metafor package to combine beta coefficients from linear regression models, using the code below. I supplied the reported beta and se values to the rma() function, but the 95% confidence intervals on the forest plot differ from the ones reported in the studies. I also tried it with the mtcars dataset, running three models and combining the coefficients; the 95% CIs on the forest plot still differ from those of the original models, and the deviations are far too large to be rounding errors. A reproducible example is below.
library(metafor)
library(dplyr)

lm1 <- lm(hp ~ mpg, data = mtcars[1:15, ])
lm2 <- lm(hp ~ mpg, data = mtcars[1:32, ])
lm3 <- lm(hp ~ mpg, data = mtcars[13:32, ])

study <- c("study1", "study2", "study3")
beta_coef <- c(lm1$coefficients[2],
               lm2$coefficients[2],
               lm3$coefficients[2]) %>% as.numeric()
se <- c(1.856, 1.31, 1.458)
ci_lower <- c(confint(lm1)[2, 1],
              confint(lm2)[2, 1],
              confint(lm3)[2, 1]) %>% as.numeric()
ci_upper <- c(confint(lm1)[2, 2],
              confint(lm2)[2, 2],
              confint(lm3)[2, 2]) %>% as.numeric()

df <- cbind(study = study,
            beta_coef = beta_coef,
            se = se,
            ci_lower = ci_lower,
            ci_upper = ci_upper) %>% as.data.frame()

pooled <- rma(yi = beta_coef, vi = se, slab = study)
forest(pooled)
Compare the confidence intervals on the forest plot with the ones in the data frame df above.
The vi argument is for specifying the sampling variances, but you are passing it the standard errors. Either square them or use the sei argument:
pooled <- rma(yi = beta_coef, sei = se, slab = study)
But you will still find a discrepancy, since the CIs in the forest plot are constructed from a normal distribution, while the CIs you obtained from the regression models are based on t-distributions. If you want exactly the same CIs in the forest plot, you could just pass the CI bounds to the function like this:
forest(beta_coef, ci.lb = ci_lower, ci.ub = ci_upper)
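To see that normal-versus-t discrepancy concretely, compare the two interval constructions for study 1 (a quick illustrative check using the objects defined above):
# CI that forest() draws: normal quantiles
beta_coef[1] + c(-1, 1) * qnorm(0.975) * se[1]
# CI from the original regression: t quantiles with the model's residual df
beta_coef[1] + c(-1, 1) * qt(0.975, df = df.residual(lm1)) * se[1]  # ~ confint(lm1)[2, ]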
If you want to add a summary polygon from some meta-analysis to the forest plot, you can do this with addpoly(). So the complete code for this example would be:
forest(beta_coef, ci.lb = ci_lower, ci.ub = ci_upper, ylim = c(-1.5, 6))
addpoly(pooled, row = -1)
abline(h = 0)
I am using spgwr::ggwr() to fit a generalized geographically weighted regression with a Poisson model and log link. The results provide local coefficient estimates, but I can't see how to get their standard errors (or t statistics) to compute pseudo p-values.
Below is a toy example using the SpatialEpi::NYleukemia dataset:
library(SpatialEpi)
library(spgwr)

## Load data
data(NYleukemia)
population <- NYleukemia$data$population
cases <- ceiling(NYleukemia$data$cases * 100)
centroids <- latlong2grid(NYleukemia$geo[, 2:3])

# data frame
nyleuk <- data.frame(centroids, cases, population)

# coordinates as a two-column matrix
coordny <- cbind(centroids[, 1], centroids[, 2])

# set a kernel bandwidth
bw <- 0.5

# fit ggwr()
m_pois <- ggwr(cases ~ offset(log(population)),
               data = nyleuk, gweight = gwr.Gauss,
               adapt = bw, family = poisson(link = "log"),
               type = "working", coords = coordny)

# returns a spatial object with local coefficients
# but no standard errors :(
head(m_pois$SDF@data)
Is there any way I can get standard errors for the coefficients? Thanks!
You can obtain standard errors for the local coefficients by running GWmodel::ggwr.basic(). spgwr::ggwr() returns only the coefficients, with no standard errors.
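Adapting the toy example above, a minimal sketch of that approach (assuming the GWmodel API: ggwr.basic() expects a Spatial* object; note that spgwr's adapt and GWmodel's bw parameterize the bandwidth differently, so the value 0.5 would need rethinking, and I have not verified how ggwr.basic() handles the offset() term):
library(GWmodel)
library(sp)
# ggwr.basic() works on Spatial* objects, so promote the data frame
nyleuk_sp <- SpatialPointsDataFrame(coords = coordny, data = nyleuk)
# fixed Gaussian kernel; bw is a fixed distance here, not an adaptive proportion
m_pois_gw <- ggwr.basic(cases ~ offset(log(population)),
                        data = nyleuk_sp,
                        bw = bw,
                        family = "poisson",
                        kernel = "gaussian")
# the SDF slot holds local estimates together with their standard errors
head(m_pois_gw$SDF@data)  # look for the *_SE columns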
For a bit of background, I am using the nnet package to build a simple neural network.
My dataset has a number of factor and continuous features. To handle the continuous variables I apply centering and scaling, which subtracts each variable's mean and divides by its standard deviation.
I'm trying to produce an ROC curve and AUC from the results of the neural network model.
Below is the code used to build my basic model:
model1 <- nnet(Cohort ~ . - Cohort,
               data = train.sample,
               size = 1)
To get some predictions, I call the following function:
train.predictions <- predict(model1, train.sample)
Now, this assigns train.predictions a large matrix of 0 and 1 values. What I want is the class probability for each prediction, so I can plot an ROC curve using the pROC package.
So, I tried adding the following parameter to my predict function:
train.predictions <- predict(model1, train.sample, type="prob")
But I get an error:
Error in match.arg(type) : 'arg' should be one of “raw”, “class”
How can I go about getting class probabilities from the output?
Assuming your test/validation data set is in train.test, and train.labels contains the true class labels:
train.predictions <- predict(model1, train.test, type = "raw")

## This might not be necessary:
detach(package:nnet, unload = TRUE)

library(ROCR)

## train.labels: a vector, matrix, list, or data frame containing the true
## class labels; must have the same dimensions as 'predictions'.

## compute a simple ROC curve (x-axis: fpr, y-axis: tpr)
pred <- prediction(train.predictions, train.labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf, lwd = 2, col = "blue", main = "ROC - Title")
abline(a = 0, b = 1)
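Since the question mentions pROC, here is an equivalent sketch with that package (assuming train.labels is a binary vector); pROC also reports the AUC directly:
library(pROC)
# build the ROC object from the raw network outputs
roc_obj <- roc(response = train.labels, predictor = as.numeric(train.predictions))
plot(roc_obj, col = "blue", main = "ROC - Title")
auc(roc_obj)  # area under the curve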
When using predict() on an object returned by glmmPQL (from the MASS package in R), I can't seem to get the standard errors returned. Here's a representative example of my workflow with some dummy data:
library(MASS)  # for glmmPQL (nlme is loaded as a dependency)

#### simulate some representative data
set.seed(1986)
dep <- rbinom(200, 1, .5)  # binomial dependent variable
set.seed(1987)
ind <- rnorm(200)          # Gaussian independent variable
set.seed(1988)
ran <- rep(1:5, 40)        # random factor

#### use PQL to fit a binomial GLMM
anTest <- glmmPQL(dep ~ ind, random = ~ 1 | ran, family = binomial)

#### specify values of ind at which to predict; expand.grid() is overkill here...
newData <- expand.grid(ind = seq(min(ind), max(ind), length.out = 100))

#### generate predictions
pred <- predict(anTest, newdata = newData, type = "response", level = 0, se.fit = TRUE)
(newData <- data.frame(newData, fit = pred))
However, as you can see, even though se.fit is set to TRUE, the function returns only the predictions. What's going on here? I've tried this with simulated Poisson and Gaussian data and I still get no standard errors. Help please!
I'm running RStudio v0.98.490 on Apple OS X 10.9.1.
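One common workaround, sketched here under the assumption that predict() on glmmPQL/lme fits simply ignores se.fit: build the fixed-effects design matrix for newData yourself, propagate the coefficient covariance through it, and back-transform from the link scale (fixef() and vcov() are the usual nlme accessors):
## design matrix matching the fixed-effects formula
X <- model.matrix(~ ind, data = newData)
## prediction and standard error on the link (logit) scale
fit_link <- as.vector(X %*% fixef(anTest))
se_link  <- sqrt(diag(X %*% vcov(anTest) %*% t(X)))
## back-transform to the response scale; 1.96 gives approximate 95% bands
newData$fit <- plogis(fit_link)
newData$lwr <- plogis(fit_link - 1.96 * se_link)
newData$upr <- plogis(fit_link + 1.96 * se_link)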
How can I export a table of the results used to make the chart I generated for the linear regression model below?
d <- data.frame(x = c(200110, 86933, 104429, 240752, 255332, 75998,
                      204302, 97321, 342812, 220522, 110990, 259706, 65733),
                y = c(200000, 110000, 165363, 225362, 313284, 113972,
                      137449, 113106, 409020, 261733, 171300, 344437, 89000))

lm1 <- lm(y ~ x, data = d)
p_conf1 <- predict(lm1, interval = "confidence")

nd <- data.frame(x = seq(0, 80000, length = 510000))
p_conf2 <- predict(lm1, interval = "confidence", newdata = nd)

plot(y ~ x, data = d, ylim = c(-21750, 600000), xlim = c(0, 600000))  ## data
abline(lm1)  ## fit
matlines(d$x, p_conf1[, c("lwr", "upr")], col = 2, lty = 1, type = "b", pch = "+")
matlines(nd$x, p_conf2[, c("lwr", "upr")], col = 4, lty = 1, type = "b", pch = "+")
Still not entirely sure what you want, but this seems reasonable:
dat1 <- data.frame(d, p_conf1)
dat2 <- data.frame(nd, y = NA, p_conf2)
write.csv(rbind(dat1, dat2), file = "linpredout.csv")
It includes x, y (equal to the observation, or NA for non-observed points), the predicted value fit, and the lwr/upr confidence bounds.
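To sanity-check the export, read the file back in (assuming the same working directory):
head(read.csv("linpredout.csv"))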
This will return a matrix that has some of the information needed to construct the confidence intervals:
> coef(summary(lm1))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21749.037058 2.665203e+04 0.8160369 4.317954e-01
x 1.046954 1.374353e-01 7.6177997 1.037175e-05
Any text on linear regression should have the formula for the confidence interval; you may need to calculate some ancillary quantities, depending on which formula you use. The code for predict is visible; just type this at the console:
predict.lm
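For example, a minimal sketch of building the 95% CI for the slope by hand from that matrix; it should match confint(lm1):
cf  <- coef(summary(lm1))
est <- cf["x", "Estimate"]
se  <- cf["x", "Std. Error"]
est + c(-1, 1) * qt(0.975, df = df.residual(lm1)) * se  # compare confint(lm1)["x", ]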
And don't forget that confidence intervals are different from prediction intervals.
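To see the difference on this model, compare the two interval types at the same new points; the prediction intervals will be noticeably wider:
predict(lm1, newdata = data.frame(x = c(100000, 200000, 300000)), interval = "confidence")
predict(lm1, newdata = data.frame(x = c(100000, 200000, 300000)), interval = "prediction")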