Plotting multiple lift curves - r

I am new to R and trying to learn. I am trying to plot lift curves of multiple classifiers in one graph. I can't figure out a way to do it. I know the below two classifiers are essentially the same but they both give different graphs and I just want to combine the two. Below is the code I tried. Could someone please point me in the right direction
fullmod = glm(Response ~ page_views_90d+win_visits+osx_visits+mc_1+mc_2+mc_3+mc_4+mc_5+mc_6+store_page+orders+orderlines+bookings+purchase, data=training, family=binomial)
summary(fullmod)
fullmod.results <- predict(fullmod, newdata = testing, type='response')
plotLift(fitted.results, test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
redmod1 = glm(Response ~ win_visits+osx_visits+mc_2+mc_4+mc_6+store_page+orders+orderlines+bookings+purchase, data=training, family=binomial)
redmod1.results <- predict(redmod1, newdata = testing, type = 'response')
plotLift(redmod1.results, test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)
# Attempt to plot multiple classifiers
plotLift((redmod1.results, fullmod.results), test_data_full$class, cumulative = TRUE,col="orange", n.buckets = 5)

Here is a way to plot multiple lift curves using the caret library. But first some data:
set.seed(1)
for_lift <- data.frame(Class = factor(rep(1:2, each = 50)),
model1 = sort(runif(100), decreasing = TRUE),
model2 = runif(100),
model3 = runif(100))
Here the Class column is the real classes
model1 is the predicted probabilities by the first model and so on.
Now create a lift object from the data using:
library(caret)
lift_curve <- lift(Class ~ model1 + model2, data = for_lift)
and plot it
xyplot(lift_curve, auto.key = list(columns = 3))
If you would like to plot with ggplot:
library(ggplot2)
ggplot(lift_curve$data)+
geom_line(aes(CumTestedPct, CumEventPct, color = liftModelVar))+
xlab("% Samples tested")+
ylab("% Samples found")+
scale_color_discrete(guide = guide_legend(title = "method"))+
geom_polygon(data = data.frame(x = c(0, lift_curve$pct, 100, 0),
y = c(0, 100, 100, 0)),
aes(x = x, y = y), alpha = 0.1)

Related

Mixed model plotting with R - showing the data points

I have run a mixed effects binary model using the following code:
model = glmer(A ~ B + (1|C), data = data, family = "binomial")
summary(model)
I am now plotting the marginal fixed effects for a variable of interest (B). I have taken the code from the nice page on:
https://cran.r-project.org/web/packages/ggeffects/vignettes/practical_logisticmixedmodel.html
To produce the graph I have used:
ggpredict(model, "B")
plot(ggpredict(model, "B"))
The following is created which I like. But I want also the data points from the variable B to show on the graph. How can I add these in? Thanks.
welcome to stackoverflow :)
Sadly, I dont know how to (/whether it is possible) to add points to your plot of the ggpredict-object, since I am no good with ggplots :/
But I can do a workaround with baseplot. Only thing missing are the grey confidence intervals...which may bw crucial for good looks? :D
Cheers
#using the example data from the link you provided:
library(magrittr)
library(ggeffects)
library(sjmisc)
library(lme4)
library(splines)
set.seed(123)
#creating the data:
dat <- data.frame(
outcome = rbinom(n = 100, size = 1, prob = 0.35),
var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
var_cont = rnorm(n = 100, mean = 10, sd = 7),
group = sample(letters[1:4], size = 100, replace = TRUE)
)
dat$var_cont <- sjmisc::std(dat$var_cont)
#model creation:
m1 <- glmer( outcome ~ var_binom + var_cont + (1 | group),
data = dat,
family = binomial(link = "logit")
)
#save results:
m1_results <- ggpredict(m1, "var_cont")
#same plot you did:
plot(m1_results)
#workaround using baseplot:
#plotting the raw data:
plot(dat$outcome~dat$var_cont,
pch = 16,
ylab = "outcome",
xlab = "var_cont",
yaxt = "n")
#adding yaxis with percentages:
axis(2, at = pretty(dat$outcome), lab=paste0(pretty(dat$outcome) * 100," %"), las = TRUE)
#adding the model taken from ggpredict:
lines(m1_results$predicted~m1_results$x,
type = "l")
#upper and lower conf intervals:
lines(m1_results$conf.low~m1_results$x,
lty=2)
lines(m1_results$conf.high~m1_results$x,
lty=2)

brms package in R smoother

I have this data frame in R:
x = rep(seq(-10,10,1),each=5)
y = rep(0,length(x) )
weights = sample( seq(1,20,1) ,length(x), replace = TRUE)
weights = weights/sum(weights)
groups = rep( letters[1:5], times =length(x)/5 )
and some data that looks like this:
library(ggplot2)
ggplot(data = dat, aes(x = x, y = y, color = group))+geom_point( aes(size = weights))+
ylab("outcome")+
xlab("predictor x1")+
geom_vline(xintercept = 0)+ geom_hline(yintercept = 0)
fit_brms = brm(y~ s(x)+(1|group), data = dat)
by_group = marginal_effects(fit_brms, conditions = data.frame(group = dat$group) ,
re_formula = NULL, method = "predict")
plot(by_group, ncol = 5, points = TRUE)
I'd like to make a hierarchical nonlinear model so that there is a different nonlinear fit for each group.
In brms I have the code below which is doing a spline fit on the x predictor with random intercepts on group the fitted line is the same for all groups. the difference is where the lines cross the y intercept. Is there a way to make the non-linear fit be different for each group's data points?
ON page 13 here : https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf
It states "As the smooth term itself cannot be modeled as varying by year in a multilevel manner,we add a basic varying intercept in an effort to account for variation between years"
So the spline will be the same for all groups it appears? The only difference in the plots is where the spline cross the y intercept. That seems very restrictive. Can this be modified to make the spline unique to each group?
Use the formula: y ~ s(x, by = group) + (1|group)

Plotting lift curve in MLR

I would like to know how to plot lift curves in MLR especially for a Benchmark experiment with multiple algorithms and tasks. Help with ROC curve plotting will also be appreciated.
Thanks.
I am not a mlr user but here is a general way.
First some data:
Two class problem
iris2 = iris[iris$Species!="setosa",]
iris2$Species = factor(iris2$Species)
1st model:
log_model = glm(Species~., data = iris2, family = "binomial")
prob = predict(log_model, iris2, type = "response") #get the logistic regression prob
2nd model:
library(e1071)
svm_model = svm(Species~., data = iris2, probability = TRUE)
prob_svm = predict(svm_model, iris2, probability = TRUE)
prob_svm = attr(prob_svm , "probabilities")[,2] #get the probability for svm model
make a data frame from classes (1/0 coding) and additional columns for predicted probabilities for each model
for_lift = data.frame(Class = as.factor(ifelse(iris2$Species == "versicolor", 1, 0)), glm = prob, svm = prob_svm)
make a lift object
library(caret)
lift_obj = lift(Class ~ glm+svm, data = for_lift)
xyplot(lift_obj, auto.key = list(columns = 2,
lines = TRUE,
points = FALSE))
You can use the same data frame to plot ROC curves
library(pROC)
plot(pROC::roc(response = for_lift$Class,
predictor = for_lift$glm,
levels=c(0, 1)),
lwd=1.5)
plot(
pROC::roc(response = for_lift$Class,
predictor = for_lift$svm ,
levels=c(0, 1)),
add=T, lty=2, lwd=1.5)
legend(0.9, 0.9, c("logistic", "svm"), lty = c(1,2))
You can also check the ROCR package: https://cran.r-project.org/web/packages/ROCR/ROCR.pdf it has methods to plot both types of plots
Additionally if you are a ggplot2 user you can use the lift_obj to plot lift and ROC curves with it also.
library(ggplot2)
p1 = ggplot(lift_obj$data)+
geom_line(aes(CumTestedPct, CumEventPct, color = liftModelVar))+
xlab("% Samples tested")+
ylab("% Samples found")+
scale_color_discrete(guide = guide_legend(title = "method"))+
geom_polygon(data = data.frame(x = c(0, lift_obj$pct, 100, 0),
y = c(0, 100, 100, 0)),
aes(x = x, y = y), alpha = 0.1)
p2 = ggplot(lift_obj$data)+
geom_line(aes(1-Sp , Sn, color = liftModelVar))+
scale_color_discrete(guide = guide_legend(title = "method"))
library(cowplot)
plot_grid(p1, p2, labels=c("lift", "ROC"))

Visualize Multilevel Growth Model with nlme/ggplot2 vs lme4/ggplot2

I am trying to visualize the results of an nlme object without success. When I do so with an lmer object, the correct plot is created. My goal is to use nlme and visualize a fitted growth curve for each individual with ggplot2. The predict() function seems to work differently with nlme and lmer objects.
model:
#AR1 with REML
autoregressive <- lme(NPI ~ time,
data = data,
random = ~time|patient,
method = "REML",
na.action = "na.omit",
control = list(maxlter=5000, opt="optim"),
correlation = corAR1())
nlme visualization attempt:
data <- na.omit(data)
data$patient <- factor(data$patient,
levels = 1:23)
ggplot(data, aes(x=time, y=NPI, colour=factor(patient))) +
geom_point(size=1) +
#facet_wrap(~patient) +
geom_line(aes(y = predict(autoregressive,
level = 1)), size = 1)
when I use:
data$fit<-fitted(autoregressive, level = 1)
geom_line(aes(y = fitted(autoregressive), group = patient))
it returns the same fitted values for each individual and so ggplot produces the same growth curve for each. Running test <-data.frame(ranef(autoregressive, level=1)) returns varying intercepts and slopes by patient id. Interestingly, when I fit the model with lmer and run the below code it returns the correct plot. Why does predict() work differently with nlme and lmer objects?
timeREML <- lmer(NPI ~ time + (time | patient),
data = data,
REML=T, na.action=na.omit)
ggplot(data, aes(x = time, y = NPI, colour = factor(patient))) +
geom_point(size=3) +
#facet_wrap(~patient) +
geom_line(aes(y = predict(timeREML)))
In creating a reproducible example, I found that the error was not occurring in predict() nor in ggplot() but instead in the lme model.
Data:
###libraries
library(nlme)
library(tidyr)
library(ggplot2)
###example data
df <- data.frame(replicate(78, sample(seq(from = 0,
to = 100, by = 2), size = 25,
replace = F)))
##add id
df$id <- 1:nrow(df)
##rearrange cols
df <- df[c(79, 1:78)]
##sort columns
df[,2:79] <- lapply(df[,2:79], sort)
##long format
df <- gather(df, time, value, 2:79)
##convert time to numeric
df$time <- factor(df$time)
df$time <- as.numeric(df$time)
##order by id, time, value
df <- df[order(df$id, df$time),]
##order value
df$value <- sort(df$value)
Model 1 with no NA values fits successfully.
###model1
model1 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=5000, opt="optim"),
correlation = corAR1(0, form=~time|id,
fixed=F))
Introducing NA's causes invertible coefficient matrix error in model 1.
###model 1 with one NA value
df[3,3] <- NA
model1 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=2000, opt="optim"),
correlation = corAR1(0, form=~time|id,
fixed=F))
But not in model 2, which has a more simplistic within-group AR(1) correlation structure.
###but not in model2
model2 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=2000, opt="optim"),
correlation = corAR1(0, form = ~1 | id))
However, changing opt="optim" to opt="nlminb" fits model 1 successfully.
###however changing the opt to "nlminb", model 1 runs
model3 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=2000, opt="nlminb"),
correlation = corAR1(0, form=~time|id,
fixed=F))
The code below visualizes model 3 (formerly model 1) successfully.
df <- na.omit(df)
ggplot(df, aes(x=time, y=value)) +
geom_point(aes(colour = factor(id))) +
#facet_wrap(~id) +
geom_line(aes(y = predict(model3, level = 0)), size = 1.3, colour = "black") +
geom_line(aes(y = predict(model3, level=1, group=id), colour = factor(id)), size = 1)
Note that I am not exactly sure what changing the optimizer from "optim" to "nlminb" does and why it works.

Graph GLM in ggplot2 where x variable is categorical

I need to graph the predicted probabilities of a logit regression in ggplot2. Essentially, I am trying to graph a glm by each treatment condition within the same graph. However, I am getting quite confused about how to do this seeing that my treat variable (i.e. the x I am interested in) is categorical.This means that when I try to graph the treatment effects using ggplot I just get a bunch of points at 0, 1, and 2 but no lines.
My question is... How could I graph the logit prediction lines in this case? Thanks in advance!
set.seed(96)
df <- data.frame(
vote = sample(0:1, 200, replace = T),
treat = sample(0:3, 200, replace = T))
glm_output <- glm(vote ~ as.factor(treat), data = df, family = binomial(link = "logit"))
predicted_vote <- predict(glm_output, newdata = df, type = "link", interval = "confidence", se = TRUE)
df <- cbind(df, data.frame(predicted_vote))
Since the explanatory variable treat is categorical, it will make more sense if you use boxplot instead like the following:
ggplot(df, aes(x = treat, y = predicted_prob)) +
geom_boxplot(aes(fill = factor(treat)), alpha = .2)
If you want to see the predicted probabilities by glm across different values of some of other explanatory variables you may try this:
ggplot(df, aes(x = treat, y = predicted_prob)) +
geom_boxplot(aes(fill = factor(treat)), alpha = .2) + facet_wrap(~gender)
# create age groups
df$age_group <- cut(df$age, breaks=seq(0,100,20))
ggplot(df, aes(x = treat, y = predicted_prob)) +
geom_boxplot(aes(fill = factor(treat)), alpha = .2) + facet_grid(age_group~gender)

Resources