Here is my model. Exam_taken is a binary variable (0/1), while Social_class (a 1-10 scale) and GDP are continuous variables.
fit <- glm(Exam_taken ~ Gender + Social_class * GDP, data = final, family = binomial(link = "probit"))
summary(fit)
I need to draw graphs. Goal 1) the relationship between Social_class and Exam_taken; Goal 2) the interaction of Social_class*GDP on Exam_taken.
I encountered two problems.
I used the following code for Goal 1:
#exclude missing values
final=subset(final, final$Social_class!="NA")
final=subset(final, final$Exam_taken!="NA")
#graph
library(popbio)
logi.hist.plot(final$Social_class, final$Exam_taken, boxp=FALSE, type = "hist")
I got an error "Error in seq.default(min(independ),max(independ),len=100):'from' must be a finite number"
How can I fix it? Thank you so much!
I have no idea how to draw the interaction with two continuous variables on a binary outcome. Can anyone provide some directions? Thanks!
It can be difficult to represent a regression involving three independent variables, since it is effectively a four-dimensional structure. However, since one of the variables (Gender) has only two levels, and Social class has 10 discrete levels, we can display the model using color scales and facets. First we create a data frame with all combinations of Gender and Social class at every value of GDP from, say, $1000 to $100,000:
pred_df <- expand.grid(Gender = c("Male", "Female"),
Social_class = 1:10,
GDP = 1:100 * 1000)
Now we get the probability of taking the exam at each combination:
pred_df$fit <- predict(fit, newdata = pred_df, type = "response")
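For a probit model, `type = "response"` returns the standard normal CDF of the linear predictor, so each value in `fit` is `pnorm()` of the link-scale prediction. A quick check of that, using the simulated replica data from the "Data used" section (not the asker's data):

```r
# Simulated replica data, copied from the "Data used" section of this answer
set.seed(1)
final <- data.frame(Gender = rep(c("Male", "Female"), 100),
                    Social_class = sample(10, 200, TRUE),
                    GDP = 1000 * sample(20:60, 200, TRUE))
final$Exam_taken <- rbinom(200, 1,
                           c(0, 0.1) + 0.05 * final$Social_class +
                             final$GDP / 1e5 - 0.2)

fit <- glm(Exam_taken ~ Gender + Social_class * GDP, data = final,
           family = binomial(link = "probit"))

pred_df <- expand.grid(Gender = c("Male", "Female"),
                       Social_class = 1:10,
                       GDP = 1:100 * 1000)

# Link scale: the linear predictor (a z-value under the probit link)
eta <- predict(fit, newdata = pred_df, type = "link")
# Response scale: the predicted probability of taking the exam
prob <- predict(fit, newdata = pred_df, type = "response")

# The inverse probit link is the standard normal CDF
same_scale <- isTRUE(all.equal(unname(prob), pnorm(unname(eta))))
same_scale
```

Plotting `prob` against GDP is therefore the same as plotting the link-scale predictions pushed through `pnorm()`, which is why the curves are S-shaped.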
We can then plot the model predictions like so:
library(ggplot2)
ggplot(pred_df, aes(GDP, fit, colour = Social_class, group = Social_class)) +
geom_line() +
facet_grid(Gender~.) +
scale_x_continuous(labels = scales::dollar, limits = c(0, 1e5)) +
labs(y = "Probability of taking exam",
color = "Social class") +
scale_color_viridis_c(breaks = 1:10) +
theme_minimal(base_size = 16) +
guides(color = guide_colorbar(barheight = unit(50, "mm")))
Data used
Obviously, we don't have your data, but we can make a reasonable replica given clues from your description and code.
set.seed(1)
final <- data.frame(Gender = rep(c("Male", "Female"), 100),
Social_class = sample(10, 200, TRUE),
GDP = 1000 * sample(20:60, 200, TRUE))
final$Exam_taken <- rbinom(200, 1,
c(0, 0.1) + 0.05 * final$Social_class +
final$GDP/1e5 - 0.2)
You can use the sjPlot package to plot the predicted values from the model. If you save the output of the plot_model() function, you can modify its appearance using ggplot2.
Here is one of many pages that can show you other options with this package:
https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_model_estimates.html
library(sjPlot)
plot_model(fit, type = "int")
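On the Goal 1 error: the message shows that logi.hist.plot() calls `seq(min(independ), max(independ), len = 100)`, so `min(final$Social_class)` was not a finite number. Plausible causes (assumptions, since the data aren't shown) are a column stored as text or factor, leftover non-finite values, or an empty vector after subsetting: `min()` of a zero-length vector returns `Inf` with a warning. Comparing with the string `"NA"` only drops missing values incidentally, because `subset()` discards rows where the condition is `NA`; `is.na()` or `is.finite()` states the intent directly. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data standing in for `final` (not the asker's data):
# Social_class arrived as text, with a literal "NA" string and a real NA.
final <- data.frame(
  Social_class = c("3", "7", "NA", NA, "10", "5"),
  Exam_taken   = c(0, 1, 1, 0, 1, NA)
)

# 1. Inspect the column types first; text or factor columns are a common culprit.
str(final)

# 2. Coerce to numeric; anything that is not a number becomes NA.
final$Social_class <- suppressWarnings(as.numeric(as.character(final$Social_class)))

# 3. Keep only rows where both variables are finite numbers.
final <- subset(final, is.finite(Social_class) & is.finite(Exam_taken))

# 4. Guard against an empty result: min() of a zero-length vector is Inf.
stopifnot(nrow(final) > 0)

# library(popbio)
# logi.hist.plot(final$Social_class, final$Exam_taken, boxp = FALSE, type = "hist")
```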
Related
Is it possible to display smooth terms from different GAMs in R if those terms are estimated using the same form of data?
I have two ecological datasets with values of species diversity along an elevational gradient. One site ranges from 1700-2800 m a.s.l. while the other ranges from 2500-3800 m a.s.l. I've modelled the relationship of species diversity against elevation using gam in R and I would like to display the smooth term from each site's GAM in the same plot area. Something like this:
I know you can use compare_smooths to compare smooths between two GAMs but I was wondering if there was a more versatile approach.
Hopefully this question is fine without reproducible data.
The easiest way would be to create predictions from the two models covering the elevational ranges you want, and then plot the fitted values.
For example (code not checked as I am AFK right now)
library("gratia")
library("dplyr")
newd <- data.frame(Elevation = c(seq(1700, 2800, length = 100),
seq(2500, 3800, length = 100)),
Site = rep(c("A", "B"), each = 100))
fv1 <- fitted_values(gam1, data = filter(newd, Site == "A"),
scale = "response")
fv2 <- fitted_values(gam2, data = filter(newd, Site == "B"),
scale = "response")
fv <- bind_rows(fv1, fv2)
Then you should have enough data in fv to plot, say
library("ggplot2")
fv %>% ggplot(aes(x = Elevation, colour = Site)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = Site), alpha = 0.2) +
geom_line(aes(y = fitted)) +
geom_point(data = foo, aes(x = Elevation, y = Diversity))
(where the last line is for adding the raw data, but I'm not sure how you have this arranged, so figure that bit out for yourself.)
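If you'd rather not depend on gratia, the same idea can be sketched with mgcv alone: call `predict()` on a grid covering each site's range with `se.fit = TRUE`, then form an approximate 95% interval as fit +/- 1.96 * se. The data below are simulated, and `gam1`, `gam2` and the helper `pred_site()` are hypothetical stand-ins. With a Gaussian response the link is the identity, so the interval is already on the response scale; for other families you would build it on the link scale and back-transform.

```r
library(mgcv)

set.seed(42)
# Hypothetical simulated diversity data for two sites with overlapping ranges
siteA <- data.frame(Elevation = runif(80, 1700, 2800))
siteA$Diversity <- 20 + 5 * sin(siteA$Elevation / 300) + rnorm(80)
siteB <- data.frame(Elevation = runif(80, 2500, 3800))
siteB$Diversity <- 15 + 4 * cos(siteB$Elevation / 400) + rnorm(80)

gam1 <- gam(Diversity ~ s(Elevation), data = siteA, method = "REML")
gam2 <- gam(Diversity ~ s(Elevation), data = siteB, method = "REML")

# Predict over a site's own elevation range, with an approximate 95% interval
pred_site <- function(model, from, to, site, n = 100) {
  nd <- data.frame(Elevation = seq(from, to, length.out = n))
  p <- predict(model, newdata = nd, se.fit = TRUE)
  fit <- as.numeric(p$fit)
  se <- as.numeric(p$se.fit)
  data.frame(nd, Site = site, fitted = fit,
             lower = fit - 1.96 * se, upper = fit + 1.96 * se)
}

fv <- rbind(pred_site(gam1, 1700, 2800, "A"),
            pred_site(gam2, 2500, 3800, "B"))
head(fv)
```

The resulting `fv` has `fitted`, `lower` and `upper` columns, so the first three layers of the ggplot2 code above apply to it unchanged.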
I am trying to make an interaction plot in sjPlot showing percent probabilities of my outcome under two conditions of my predictive variable. Everything works perfectly, except the show.values = T and sort.est = T arguments, which don't seem to do anything. Is there a way to get this to work? Or, if not, how can I extract the data frame sjPlot is using to create this figure? I'm looking for some way to either label or tabulate the displayed probability values. Thank you!
Here is some example data and what I have so far:
set.seed(100)
dat <- data.frame(Species = rep(letters[1:10], each = 5),
threat_cat = rep(c("recreation", "climate", "pollution", "fire", "invasive_spp"), 10),
impact.pres = sample(0:1, size = 50, replace = T),
threat.pres = sample(0:1, size = 50, replace = T))
mod <- glm(impact.pres ~ 0 + threat_cat/threat.pres,
data = dat, family = "binomial")
library(sjPlot)
library(ggpubr)
plot_model(mod, type = "int",
title = "",
axis.title = c("Threat category", "Predicted probabilities of threat being observed"),
legend.title = "Threat predicted",
colors = c("#f2bf10",
"#4445ad"),
line.size = 2,
dot.size = 4,
sort.est = T,
show.values = T)+
coord_flip()+
theme_pubr(legend = "right", base_size = 30)
sjPlot produces a ggplot object, so you can examine its aesthetic mappings and underlying data. After a bit of digging you will find that the default mapping already places text labels at the correct x and y positions, so all you need to do is add a geom_text layer and map only the label aesthetic. You can get the labels from a column called predicted in the data stored in the ggplot object.
The upshot is that if you add the following layer to your plot:
geom_text(aes(label = scales::percent(predicted)),
position = position_dodge(width = 1), size = 8)
This prints each predicted probability as a percentage label beside its point.
Getting the labels in order is trickier. You have to fiddle with the internal components of the plot to do this. Suppose we store the above plot as p, then we can sort by the predicted percentages by doing:
p$data <- as.data.frame(p$data)
ord <- p$data$x[p$data$group == 1][order(p$data$predicted[p$data$group == 1])]
p$data$x <- match(p$data$x, ord)
p$scales$scales[[1]]$labels <- p$scales$scales[[1]]$labels[ord]
p
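To tabulate the numbers rather than read them off the plot, you can also compute the predicted probabilities directly with `predict()`. Because `0 + threat_cat/threat.pres` gives each category its own intercept and its own threat.pres effect, the model is saturated, so each prediction should equal the observed proportion in that category/threat.pres cell. I'd expect these to match the `predicted` column sjPlot stores, but that is an assumption I haven't checked against `plot_model()` itself.

```r
# The question's example data and model
set.seed(100)
dat <- data.frame(Species = rep(letters[1:10], each = 5),
                  threat_cat = rep(c("recreation", "climate", "pollution",
                                     "fire", "invasive_spp"), 10),
                  impact.pres = sample(0:1, size = 50, replace = TRUE),
                  threat.pres = sample(0:1, size = 50, replace = TRUE))
mod <- glm(impact.pres ~ 0 + threat_cat/threat.pres,
           data = dat, family = "binomial")

# One row per threat category x threat.pres combination
probs <- expand.grid(threat_cat = sort(unique(dat$threat_cat)),
                     threat.pres = 0:1, stringsAsFactors = FALSE)
probs$predicted <- predict(mod, newdata = probs, type = "response")
probs$label <- sprintf("%.1f%%", 100 * probs$predicted)

# Sorted by the predicted probability when the threat is present
present <- probs[probs$threat.pres == 1, ]
present[order(present$predicted), ]

# Wide table: one row per category, one column per threat.pres value
xtabs(predicted ~ threat_cat + threat.pres, data = probs)
```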
I trained a model using rpart and I want to generate a plot displaying the Variable Importance for the variables it used for the decision tree, but I cannot figure out how.
I was able to extract the Variable Importance. I've tried ggplot but none of the information shows up. I tried using the plot() function on it, but it only gives me a flat graph. I also tried plot.default, which is a little better but still not what I want.
Here's rpart model training:
argIDCART = rpart(Argument ~ .,
data = trainSparse,
method = "class")
Got the variable importance into a data frame.
argPlot <- as.data.frame(argIDCART$variable.importance)
Here is a section of what that prints:
argIDCART$variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958
I'd like to plot a graph that shows the variable/feature name and its numerical importance. I just can't get it to do that. It appears to only have one column. I tried separating them using the separate function, but can't do that either.
ggplot(argPlot, aes(x = "variable importance", y = "feature"))
Just prints blank.
The other plots look really bad.
plot.default(argPlot)
Looks like it plots the points, but doesn't put the variable name.
Since no reproducible example is available, I based my answer on the kyphosis dataset that ships with rpart, using the ggplot2 package for plotting and other tidyverse packages for data manipulation.
library(rpart)
library(tidyverse)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
df <- data.frame(imp = fit$variable.importance)
df2 <- df %>%
tibble::rownames_to_column() %>%
dplyr::rename("variable" = rowname) %>%
dplyr::arrange(imp) %>%
dplyr::mutate(variable = forcats::fct_inorder(variable))
ggplot2::ggplot(df2) +
geom_col(aes(x = variable, y = imp, fill = variable),
col = "black", show.legend = F) +
coord_flip() +
scale_fill_grey() +
theme_bw()
ggplot2::ggplot(df2) +
geom_segment(aes(x = variable, y = 0, xend = variable, yend = imp),
size = 1.5, alpha = 0.7) +
geom_point(aes(x = variable, y = imp, col = variable),
size = 4, show.legend = F) +
coord_flip() +
theme_bw()
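The reason the asker's data frame seems to have only one column is that `variable.importance` is a named numeric vector: `as.data.frame()` moves the names into the row names, which is why the code above uses `rownames_to_column()`. If you don't need ggplot2, a base-graphics sketch can use the names directly (again on the kyphosis data, not the asker's model):

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# A named numeric vector: the names are the variables, the values their importance
imp <- fit$variable.importance
str(imp)

# Sort so the most important variable ends up at the top of a horizontal barplot
imp <- sort(imp)
op <- par(mar = c(5, 8, 2, 2))   # widen the left margin for the variable names
barplot(imp, horiz = TRUE, las = 1, xlab = "Variable importance")
par(op)
```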
If you want to see the variable names, it may be best to use them as the labels on the x-axis.
plot(argIDCART$variable.importance, xlab = "variable",
ylab = "Importance", xaxt = "n", pch = 20)
axis(1, at = seq_along(argIDCART$variable.importance),
labels = names(argIDCART$variable.importance))
(You may need to resize the window to see the labels properly.)
If you have a lot of variables, you may want to rotate the variable names so that they do not overlap.
par(mar = c(7, 4, 3, 2))
plot(argIDCART$variable.importance, xlab = "variable",
ylab = "Importance", xaxt = "n", pch = 20)
axis(1, at = seq_along(argIDCART$variable.importance),
labels = names(argIDCART$variable.importance), las = 2)
Data
# stand-in for the rpart fit: variable.importance is a named numeric vector
argIDCART <- list(variable.importance = c(noth = 23.339346,
humanitarian = 16.584430,
council = 13.140252,
law = 11.347241,
presid = 11.231916,
treati = 9.945111,
support = 8.670958))
Sorry if this is not well asked, first ever question.
Aim: to calculate the bone mineral density T-score (+/- 2.5 SD from the sex- and age-specific BMD value), in order to say whether a patient is osteoporotic or not.
I am trying to do this graphically using ggplot2 and geom_smooth().
I am using the NHANES dataset (https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DXXFEM_H.htm#DXXINBMD), which is accessed through the nhanesA package.
Load packages:
library(nhanesA)
library(ggplot2)
I am interested only in the intertrochanteric BMD, age and sex.
Load data:
nhanesTableVars('EXAM', "DXXFEM_D")
DXXFEM_D <- nhanes('DXXFEM_D')
fem_d <- DXXFEM_D
demo_d <- nhanes('DEMO_D')
demo_d <- nhanesTranslate('DEMO_D', 'RIAGENDR', data=demo_d)
DXXFEM_D_vars <- nhanesTableVars('EXAM', 'DXXFEM_D', namesonly=TRUE)
DXXFEM_D <- nhanesTranslate('DXXFEM_D', DXXFEM_D_vars, data=DXXFEM_D)
FEM_demo <- merge(demo_d, DXXFEM_D)
FEM_demo_1 <- FEM_demo[,c(5,6,55)]
Then I attempted the plot, but the level argument of geom_smooth() does not work with level = 2.5.
Plot BMD with SD:
ggplot(data = FEM_demo_1, aes(x = RIDAGEYR, y = DXXINBMD, group = RIAGENDR, color = RIAGENDR)) +
geom_smooth(se = TRUE, level = 2.5) +
scale_x_continuous(minor_breaks = seq(0,85,1), breaks = seq(0,85,5))
1) Ideally I would like a plot that shows the mean, -1 SD (which corresponds to osteopaenia) and -2 SD (the cut-off for osteoporosis), which can be used to translate BMD into clinical criteria. Is there a way to do this?
2) Is there any way to do this numerically?
Thanks
Here is code for a plot with the mean, -1 SD and -2 SD. You can add styling to your liking. The means and SDs are calculated beforehand and stored in a data frame.
data <- aggregate(FEM_demo_1$DXXINBMD, by=list(FEM_demo_1$RIAGENDR, FEM_demo_1$RIDAGEYR), FUN=mean, na.rm=TRUE)
names(data) <- c("gender", "age", "mean")
data[,"sd"] <- aggregate(FEM_demo_1$DXXINBMD, by=list(FEM_demo_1$RIAGENDR, FEM_demo_1$RIDAGEYR), FUN=sd, na.rm=TRUE)[3]
ggplot(data=data, aes(x=age, group=gender))+
geom_smooth(se = FALSE, aes(y=mean))+
geom_smooth(se = FALSE, aes(y=mean-sd))+
geom_smooth(se = FALSE, aes(y=mean-(2*sd)))
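For the numeric version (question 2), one approach is to standardise each BMD value against the mean and SD of its own sex and age group, then classify the resulting score. This is a sketch on simulated data that only borrows the question's column names (RIAGENDR, RIDAGEYR, DXXINBMD); the 10-year age bands and the cut-offs of -1 and -2.5 SD are assumptions, so substitute -2 if that is the threshold you want. Note that a clinical T-score is usually referenced to young adults rather than an age-matched group, so the score below is closer to an age-matched Z-score.

```r
set.seed(2)
# Hypothetical simulated data with the same column names as FEM_demo_1
n <- 300
FEM_demo_1 <- data.frame(
  RIAGENDR = sample(c("Male", "Female"), n, replace = TRUE),
  RIDAGEYR = sample(20:85, n, replace = TRUE)
)
FEM_demo_1$DXXINBMD <- rnorm(n, mean = 1.1 - 0.004 * FEM_demo_1$RIDAGEYR, sd = 0.15)

# Assumed 10-year age bands, so each sex/age group has enough people for an SD
FEM_demo_1$age_band <- cut(FEM_demo_1$RIDAGEYR, breaks = seq(20, 90, by = 10),
                           right = FALSE)

# Group mean and SD, repeated for every row in the group
grp_mean <- ave(FEM_demo_1$DXXINBMD, FEM_demo_1$RIAGENDR, FEM_demo_1$age_band,
                FUN = function(x) mean(x, na.rm = TRUE))
grp_sd <- ave(FEM_demo_1$DXXINBMD, FEM_demo_1$RIAGENDR, FEM_demo_1$age_band,
              FUN = function(x) sd(x, na.rm = TRUE))

# Standardised score: how many group SDs each person is from their group mean
FEM_demo_1$z <- (FEM_demo_1$DXXINBMD - grp_mean) / grp_sd

# Classify with the assumed cut-offs (intervals are closed on the right)
FEM_demo_1$category <- cut(FEM_demo_1$z, breaks = c(-Inf, -2.5, -1, Inf),
                           labels = c("at or below -2.5 SD", "-2.5 to -1 SD",
                                      "above -1 SD"))
table(FEM_demo_1$RIAGENDR, FEM_demo_1$category)
```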
I am trying to plot the interaction between a fixed effect and random factor. sjPlot seems like a good package for this, but I am having trouble changing the line types and colors. I would like to change the line colors to a grayscale scheme with different line types to differentiate the groups. I've experimented with the geom.colors argument and the sjp.setTheme function, but so far have not been able to get the desired results.
The example code below shows my initial attempts, borrowing from the example on the sjPlot website:
library(sjPlot)
library(sjmisc)
library(lme4)
data(efc)
efc$hi_qol <- dicho(efc$quol_5)
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_labels(efc$e15relat)
mydf <- data.frame(hi_qol = efc$hi_qol,
sex = to_factor(efc$c161sex),
c12hour = efc$c12hour,
neg_c_7 = efc$neg_c_7,
grp = efc$grp)
fit <- glmer(hi_qol ~ sex + c12hour + neg_c_7 + (1 | grp),
data = mydf, family = binomial("logit"))
sjp.glmer(fit, type="ri.slope", facet.grid=F, vars="neg_c_7")
To change the line colors, I tried setting geom.colors="black", but that didn't appear to do anything.
sjp.glmer(fit, type="ri.slope", facet.grid=F, geom.colors="black", vars="neg_c_7")
Next I tried changing the theme used by sjPlot to change the line type, but that didn't work either.
sjp.setTheme(geom.linetype = c(1:8))
sjp.glmer(fit, type="ri.slope", facet.grid=F, vars="neg_c_7")
Am I missing something obvious or is changing the line types and colors more complex?
The sjPlot package does not support changing the line type, only the color. The linetype aesthetic is currently not mapped by the sjp-functions. However, you can access the data used for the plot and create your own interaction plot:
library(ggplot2)
library(sjmisc)
library(sjPlot)
data(efc)
# create binary response
y <- ifelse(efc$neg_c_7 < median(stats::na.omit(efc$neg_c_7)), 0, 1)
# create data frame for fitted model
mydf <- data.frame(y = as.factor(y),
sex = as.factor(efc$c161sex),
barthel = as.numeric(efc$barthtot))
# fit model
fit <- glm(y ~ sex * barthel, data = mydf, family = binomial(link = "logit"))
p <- sjp.int(fit, geom.colors = "gs")
ggplot(p$data.list[[1]], aes(x = x, y = y, linetype = grp)) + geom_line()
Changing the colors for interaction plots works with the geom.colors-argument, see Details in ?sjp.grpfrq.
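The same idea also works without sjPlot: compute predicted probabilities over a grid of barthel values for each sex with `predict(type = "response")`, then draw them with base graphics, where `lty` and `col` can be set directly. The data below are simulated stand-ins for the efc variables, not the package data.

```r
set.seed(3)
# Hypothetical simulated data mimicking the y ~ sex * barthel model above
n <- 400
mydf <- data.frame(sex = factor(sample(c("male", "female"), n, replace = TRUE)),
                   barthel = runif(n, 0, 100))
lin <- -1 + 0.02 * mydf$barthel +
  ifelse(mydf$sex == "female", 0.8 - 0.02 * mydf$barthel, 0)
mydf$y <- rbinom(n, 1, plogis(lin))

fit <- glm(y ~ sex * barthel, data = mydf, family = binomial(link = "logit"))

# Predicted probabilities on a grid: one column per level of sex
grid_x <- seq(0, 100, length.out = 50)
pred <- sapply(levels(mydf$sex), function(s)
  predict(fit, type = "response",
          newdata = data.frame(sex = factor(s, levels = levels(mydf$sex)),
                               barthel = grid_x)))

# Grayscale colors plus distinct line types
cols <- c("black", "grey50")
matplot(grid_x, pred, type = "l", lty = 1:2, col = cols, lwd = 2,
        xlab = "Barthel index", ylab = "Predicted probability")
legend("topright", legend = colnames(pred), lty = 1:2, col = cols, lwd = 2)
```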
I'm not sure how to do this with sjPlot, but you could use the interaction.plot function to generate the plot and add col = c(...) to change the colors of the lines.
interaction.plot(x.factor, trace.factor, response, col = c("red", "blue"))
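A runnable version using the built-in ToothGrowth data, with a grayscale palette and different line types as the question asked. Note that `interaction.plot()` draws a summary (the mean by default) of the response in each cell of two factors, so it suits factor predictors rather than fitted curves over a continuous covariate.

```r
# Built-in ToothGrowth data stands in for the asker's model
cols <- c("black", "grey50")
with(ToothGrowth,
     interaction.plot(x.factor = factor(dose), trace.factor = supp, response = len,
                      col = cols, lty = c(1, 2), lwd = 2,
                      xlab = "Dose", ylab = "Mean tooth length",
                      trace.label = "Supplement"))

# The cell means that the plot draws
cell_means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))
cell_means
```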