Plotting binomial glm with interactions in numeric variables

I want to know if it is possible to plot a binomial GLM with interactions in numeric variables. In my case:
## Artificial data set
set.seed(20)
d <- data.frame(
  mating = sample(0:1, 200, replace = TRUE),
  behv = scale(rpois(200, 10)),
  condition = scale(rnorm(200, 5))
)
# Fitted binomial GLM
model <- glm(mating ~ behv + condition, data = d, family = binomial)
summary(model)
In a situation where behv and condition are significant in the model:
# Plotting first for behv
x <- d$behv                            # behv values
x2 <- rep(mean(d$condition), nrow(d))  # condition fixed at its mean
# Points
plot(d$mating ~ d$behv)
# Curve
curve(exp(model$coefficients[1] + model$coefficients[2]*x + model$coefficients[3]*x2) /
      (1 + exp(model$coefficients[1] + model$coefficients[2]*x + model$coefficients[3]*x2)))
But it doesn't work! Is there another, correct approach?
Thanks

It seems like your desired output is a plot of the conditional means (or best-fit line). You can do this by computing predicted values with the predict function.
I'm going to change your example a bit, to get a nicer looking result.
d$mating <- ifelse(d$behv > 0, rbinom(200, 1, .8), rbinom(200, 1, .2))
model <- glm(mating ~ behv + condition, data = d, family = binomial)
summary(model)
Now, we make a newdata dataframe with your desired values:
newdata <- d
newdata$condition <- mean(newdata$condition)
newdata$yhat <- predict(model, newdata, type = "response")
Finally, we sort newdata by the x-axis variable (otherwise we'd get lines that zig-zag all over the plot), and then plot:
newdata <- newdata[order(newdata$behv), ]
plot(newdata$mating ~ newdata$behv)
lines(x = newdata$behv, y = newdata$yhat)
Output: a scatterplot of mating against behv with the fitted logistic curve overlaid.
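For comparison, a minimal ggplot2 sketch of the same plot (my addition, assuming the ggplot2 package; the original answer used base graphics):
library(ggplot2)
# Points plus the predicted curve from the sorted newdata above
ggplot(newdata, aes(x = behv, y = mating)) +
  geom_point() +
  geom_line(aes(y = yhat))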

Related

How to predict a GAM model with a random effect in R?

I am trying to predict from a GAM model with a random effect, to produce a 3D surface plot with plot_ly.
Here is my code:
library(mgcv)
x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(1, times = 100)  # random effect
r[51:100] <- 2            # replace 1 with 2, making two groups
df <- data.frame(x, y, z, r)
gam_fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)  # fit
# Create matrix data for the `add_surface` function in `plot_ly`
newx <- seq(0, 1, len = 20)
newy <- seq(0, 1, len = 30)
newxy <- expand.grid(x = newx, y = newy)
z <- matrix(predict(gam_fit, newdata = newxy), 20, 30)  # predictions as a matrix
However, the last line results in an error:
Error in model.frame.default(ff, data = newdata, na.action = na.act) :
variable lengths differ (found for 'r')
In addition: Warning message:
In predict.gam(gam_fit, newdata = newxy) :
not all required variables have been supplied in newdata!
Thanks to a previous answer, I am sure that the code above works without the random effect, as shown here.
How can I predict gam models with random effect?
Assuming you want the surface conditional upon the random effects (but not for a specific level of the random effect), there are two ways.
The first is to provide a level for the random effect but exclude that term from the predicted values, using the exclude argument to predict.gam(). The second is again to use exclude, but this time to provide no data for the random effect and instead stop predict.gam() from checking the newdata via the argument newdata.guaranteed = TRUE.
Option 1:
newxy1 <- with(df, expand.grid(x = newx, y = newy, r = 2))
z1 <- predict(gam_fit, newdata = newxy1, exclude = 's(r)')
z1 <- matrix(z1, 20, 30)
Option 2:
z2 <- predict(gam_fit, newdata = newxy, exclude = 's(r)',
              newdata.guaranteed = TRUE)
z2 <- matrix(z2, 20, 30)
These produce the same result:
> all.equal(z1, z2)
[1] TRUE
A couple of notes:
Which you use will depend on how complex the rest of your model is. I would generally use the first option as it provides an extra check against me doing something stupid when creating the data. But in this instance, with a simple model and set of covariates, it seems safe enough to trust that newdata is OK.
Your example uses a random slope (was that intended?), not a random intercept, as r is not a factor. If your real example uses a factor random effect then you'll need to be a little more careful when creating the newdata, as you need to get the levels of the factor right. For example:
expand.grid(x = newx, y = newy,
            r = with(df, factor(2, levels = levels(r))))
should get the right set-up for a factor r.
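Putting that together, a minimal sketch (not from the original answer; the names gam_fit2, newxy2, and z3 are hypothetical) of predicting with a factor random intercept:
# Refit with r as a factor, giving a random intercept per group
df$r <- factor(df$r)
gam_fit2 <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)
# newdata must carry a factor r with the original levels,
# even though s(r) is excluded from the prediction
newxy2 <- expand.grid(x = newx, y = newy,
                      r = factor(2, levels = levels(df$r)))
z3 <- matrix(predict(gam_fit2, newdata = newxy2, exclude = "s(r)"), 20, 30)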

Logistic Regression's ROC Goes Abnormal

Currently, I'm learning about logistic regression and LDA (Linear Discriminant Analysis) classification. I'm trying to generate the data differently to learn logistic regression and LDA behavior.
Here is the data visualization of the 2-dimensional predictors, with class plotted as color:
Here is my code:
library(ggplot2)
library(MASS)
library(pROC)
set.seed(1)
a <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(0.4, 0, 0, 0.4), nrow = 2, ncol = 2))
b <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(10, 0, 0, 10), nrow = 2, ncol = 2))
# I want to make sure b1 is separated from a
b1 <- b[sqrt(b[,1]^2 + b[,2]^2) > 4, ]
df <- as.data.frame(rbind(a, b1))
names(df) <- c('x', 'y')
labelA <- rep('A', nrow(a))
labelB <- rep('B', nrow(b1))
# Put the label column in the data frame
df$labs <- c(labelA, labelB)
ggplot(df, aes(x = x, y = y, col = labs)) + geom_point()
prd <- glm(as.factor(labs) ~ x + y, family = binomial('probit'), data = df)
prd_score <- predict(prd, type = 'response')
plot(roc(df$labs, prd_score))
auc(roc(df$labs, prd_score))
And this is the ROC curve plot:
It's really frustrating because I couldn't find any mistake in my code that would generate this kind of problem. Can anyone point out a mistake in my code that produces this weird ROC, or explain why the ROC could become weird like that?
NB: Please assume that the generated data set above is the training data and I want to predict the training data again.
There is no mistake in your code.
Your dataset is a typical example of data that cannot be separated with a linear combination of the features. Therefore linear classification methods such as logistic regression or LDA won't help you here. This is why your ROC curve looks "weird"; it's totally normal, and it's only telling you that your model fails to separate the data.
You need to investigate non-linear classification techniques. Given the radial distribution of the data, I can imagine that support vector machines (SVM) with a radial basis kernel could do the trick.
library(e1071)
# We need a numeric label for SVM regression
labelA <- rep(0, nrow(a))
labelB <- rep(1, nrow(b1))
df$labsNum <- c(labelA, labelB)
# We create a radial basis model
svm_prd <- svm(labsNum ~ x + y, data = df, kernel = "radial", type = "eps-regression")
svm_score <- predict(svm_prd)
plot(roc(df$labs, svm_score))  # note: use svm_score here, not prd_score
auc(roc(df$labs, svm_score))
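Alternatively, a sketch (my suggestion, not part of the original answer) using a classification SVM with probability estimates instead of eps-regression:
# Classification SVM; probability = TRUE enables class-probability estimates
svm_cls <- svm(as.factor(labs) ~ x + y, data = df,
               kernel = "radial", probability = TRUE)
cls_pred <- predict(svm_cls, df, probability = TRUE)
cls_score <- attr(cls_pred, "probabilities")[, "B"]  # P(class == "B")
plot(roc(df$labs, cls_score))
auc(roc(df$labs, cls_score))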

Plotting quadratic curves with poisson glm with interactions in categorical/numeric variables

I want to know if it's possible to plot quadratic curves with a Poisson GLM with interactions in categorical/numeric variables. In my case:
## Artificial data set
set.seed(20)
d <- data.frame(
  behv = c(rpois(100, 10), rpois(100, 100)),
  mating = sort(rep(c("T1", "T2"), 200)),
  condition = scale(rnorm(200, 5))
)
# Quadratic condition term
d$condition2 <- (d$condition)^2
# Fitted Poisson GLM
md <- glm(behv ~ mating + condition + condition2, data = d, family = poisson)
summary(md)
In a situation where mating, condition, and condition2 are significant in the model, I do the following:
# Create x values
x <- d$condition
x2 <- (d$condition)^2
# T1 estimation
y1 <- exp(md$coefficients[1] + md$coefficients[3]*x + md$coefficients[4]*x2)
# T2 estimation
y2 <- exp(md$coefficients[1] + md$coefficients[2] + md$coefficients[3]*x + md$coefficients[4]*x2)
# Separate data sets
d_T1 <- d[d[,2] != "T2", ]
d_T2 <- d[d[,2] != "T1", ]
# Plot
plot(d_T1$condition, d_T1$behv, main = "", xlab = "condition", ylab = "behv",
     xlim = c(-4, 3), ylim = c(0, 200), col = "black")
points(d_T2$condition, d_T2$behv, col = "gray")
lines(x, y1, col = "black")
lines(x, y2, col = "grey")
This doesn't work, and I don't get the curves I want. I'd like one curve for T1 and another for T2 in the mating variable. Is there any solution for this?
In the code below, we use the poly function to generate a quadratic model without needing to create an extra column in the data frame. In addition, we create a prediction data frame to generate model predictions across the range of condition values and for each level of mating. The predict function with type="response" generates predictions on the scale of the outcome, rather than on the linear predictor scale, which is the default. Also, we change 200 to 100 in creating the data for mating in order to avoid having the exact same outcome data for each level of mating.
library(ggplot2)
# Fake data
set.seed(20)
d <- data.frame(
  behv = c(rpois(100, 10), rpois(100, 100)),
  mating = sort(rep(c("T1", "T2"), 100)),  # Changed from 200 to 100
  condition = scale(rnorm(200, 5))
)
# Model with quadratic condition
md <- glm(behv ~ mating + poly(condition, 2, raw = TRUE), data = d, family = poisson)
#summary(md)
# Get predictions over the range of condition values
pred.data = data.frame(
  condition = rep(seq(min(d$condition), max(d$condition), length = 50), 2),
  mating = rep(c("T1", "T2"), each = 50)
)
pred.data$behv = predict(md, newdata = pred.data, type = "response")
Now plot with ggplot2 and with base R:
ggplot(d, aes(condition, behv, colour = mating)) +
  geom_point() +
  geom_line(data = pred.data)

plot(NULL, xlim = range(d$condition), ylim = range(d$behv),
     xlab = "Condition", ylab = "behv")
with(subset(d, mating == "T1"), points(condition, behv, col = "red"))
with(subset(d, mating == "T2"), points(condition, behv, col = "blue"))
with(subset(pred.data, mating == "T1"), lines(condition, behv, col = "red"))
with(subset(pred.data, mating == "T2"), lines(condition, behv, col = "blue"))
legend(-3, 70, title = "Mating", legend = c("T1", "T2"), pch = 1,
       col = c("red", "blue"))  # legend colours matched to the points above

Plotting two curves with poisson glm with interactions in categorical variable

I want to know if it is possible to plot two curves with a Poisson GLM with interactions in categorical variables. In my case:
## Artificial data set
set.seed(20)
d <- data.frame(
  behv = c(rpois(100, 10), rpois(100, 100)),
  mating = sort(rep(c("T1", "T2"), 200)),
  condition = scale(rnorm(200, 5))
)
# Fitted Poisson GLM
model <- glm(behv ~ mating + condition, data = d, family = poisson)
summary(model)
In a situation where mating (categorical) and condition (numeric) are significant in the model:
newdata <- d
newdata$condition <- mean(d$condition)
newdata$yhat <- predict(model, newdata, type = "response")
newdata <- newdata[order(newdata$mating), ]
plot(newdata$behv ~ newdata$condition, ylab = "behv",
     xlab = "condition", xlim = c(-3, 3))
lines(x = newdata$mating, y = newdata$yhat)
This doesn't work. I'd like one curve for T1 and another curve for T2 in the mating variable, with the change along the x-axis given by the condition variable. I tried selecting coefficients for each level of the mating variable, but that doesn't work either. Any ideas?
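A minimal sketch along the lines of the previous answer (a suggestion, not a verified solution; it assumes the d and model above, and, as noted in the previous answer, the mating vector likely needs 100 repeats rather than 200 so the lengths match): predict over a grid of condition values for each level of mating, then draw one line per level.
# Prediction grid: 50 condition values for each mating level
pred <- expand.grid(
  condition = seq(min(d$condition), max(d$condition), length = 50),
  mating = c("T1", "T2")
)
pred$yhat <- predict(model, newdata = pred, type = "response")
# Points coloured by mating level, one fitted curve per level
plot(d$condition, d$behv, xlab = "condition", ylab = "behv",
     col = ifelse(d$mating == "T1", "black", "grey"))
with(subset(pred, mating == "T1"), lines(condition, yhat, col = "black"))
with(subset(pred, mating == "T2"), lines(condition, yhat, col = "grey"))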

coefplot in R with only some of the independent variables

I would like a coefplot with only some of the independent variables. My regression equation is a fixed effects regression, as follows:
aa1 <- glm(Eighty_Twenty ~ Market_Share_H + Market_Share_L +
           Purchase_Frequency_H + Purchase_Frequency_L +
           factor(product_group))
coefplot(aa1)
However, I do NOT want to plot the coefficients of the factor(product_group) variables, since there are many product groups. Instead, I would like a coefplot with only the coefficients of the other variables. How can I do this?
From the help pages (see ?coefplot.default), you can select which predictors or coefficients you want in your plot.
# Some example data
df <- data.frame(Eighty_Twenty = rbinom(100, 1, 0.5),
                 Market_Share_H = runif(100),
                 Market_Share_L = runif(100),
                 Purchase_Frequency_H = rpois(100, 40),
                 Purchase_Frequency_L = rpois(100, 40),
                 product_group = sample(letters[1:3], 100, TRUE))
# Model
aa1 <- glm(Eighty_Twenty ~ Market_Share_H + Market_Share_L +
           Purchase_Frequency_H + Purchase_Frequency_L +
           factor(product_group), df, family = "binomial")
library(coefplot)
# Coefficient plot with the intercept
coefplot(aa1, coefficients = c("(Intercept)", "Market_Share_H", "Market_Share_L",
                               "Purchase_Frequency_H", "Purchase_Frequency_L"))
# Coefficient plot specifying predictors (no intercept)
coefplot(aa1, predictors = c("Market_Share_H", "Market_Share_L",
                             "Purchase_Frequency_H", "Purchase_Frequency_L"))
