I would like to create a forest plot after fitting Cox survival models. However, I would like to display only some of the covariates in the graph. Does anyone know if this is possible? Maybe using ggforest2?
Thanks
library(survival)
library(survminer)
model <- coxph(Surv(time, status) ~ sex + rx + adhere,
data = colon )
ggforest(model)
colon <- within(colon, {
sex <- factor(sex, labels = c("female", "male"))
differ <- factor(differ, labels = c("well", "moderate", "poor"))
extent <- factor(extent, labels = c("submuc.", "muscle", "serosa", "contig."))
})
bigmodel <-
coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
data = colon )
ggforest(bigmodel)
The current version of ggforest on my machine does not let me select which variables appear in the plot. Another function, forestmodel::forest_model, has a covariates = argument that is supposed to allow this. However, the current version of forestmodel may not handle it correctly, as you can see from the following graph:
colon <- within(colon, {
sex <- factor(sex, labels = c("female", "male"))
differ <- factor(differ, labels = c("well", "moderate", "poor"))
extent <- factor(extent, labels = c("submuc.", "muscle", "serosa", "contig."))
})
bigmodel <-
coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
data = colon )
forestmodel::forest_model(bigmodel, covariates = c("sex", "rx"))
It might be something the original contributor could fix. At some stage, I was able to generate something like this with a minor modification of a previous version of the function. However, after I reinstalled the updated package, it no longer works.
EDIT
Another, more flexible approach takes two steps. First, collect the model information (I use broom::tidy here, but you can use other methods). Second, use forestplot::forestplot to produce the graph. Again, you could also use another meta-analysis package for this.
Let's continue with the above bigmodel
library(forestplot)
library(tidyverse)
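# Save model information (conf.int = TRUE is needed to get conf.low / conf.high)
df <- broom::tidy(bigmodel, exponentiate = TRUE, conf.int = TRUE)
# pick up the first 4 coefficients: sex, the two rx contrasts and adhere
df1 <- df[1:4, ] %>%
  transmute(
    HR = round(estimate, 2),
    low = conf.low,
    high = conf.high)
# label matrix: one header row plus one row per coefficient
row_names <- cbind(c("Name", "Sex", "Lev", "Lev + 5FU", "adhere"),
                   c("HR", df1$HR))
# add an empty first row so the estimates line up with the header row
df1 <- rbind(rep(NA, 3), df1)
forestplot(labeltext = row_names,
           mean = df1$HR, lower = df1$low, upper = df1$high,
           is.summary = c(TRUE, rep(FALSE, 4)),
           zero = 1,
           xlog = TRUE)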
This produces the following graph. It may take a little bit more learning to generate a satisfactory graph, but you are in control, relatively.
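The same two-step idea (tidy the model, then plot only the rows you want) can also be sketched with plain ggplot2. This is a minimal illustration, not a replacement for ggforest: the names fit_all, est, keep and sel are made up for the example, and it assumes a ggplot2 version recent enough (3.3.0 or later) to draw horizontal point ranges from xmin/xmax.

```r
library(survival)
library(broom)
library(ggplot2)

# Fit the full model on the colon data shipped with survival
fit_all <- coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
                 data = colon)

# One row per coefficient, on the hazard-ratio scale, with confidence limits
est <- tidy(fit_all, exponentiate = TRUE, conf.int = TRUE)

# Keep only the coefficients that belong to the chosen covariates
keep <- c("sex", "rx")
sel <- est[grepl(paste0("^(", paste(keep, collapse = "|"), ")"), est$term), ]

# Hazard ratios with confidence intervals on a log axis
p <- ggplot(sel, aes(x = estimate, y = term)) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  scale_x_log10() +
  labs(x = "Hazard ratio (95% CI)", y = NULL) +
  theme_minimal()
p
```

Matching on the start of the term name is a shortcut; it would also pick up any other covariate whose name begins with "sex" or "rx", so check sel$term before plotting.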
Here is my model. Exam_taken is a binary variable (0,1), and social class (1-10 scale) and GDP are continuous variables.
fit<-glm(Exam_taken~Gender+Social_class*GDP, data=final, family=binomial(link="probit"))
summary(fit)
I need to draw graphs. Goal 1) the relationship between Social_class and Exam_taken; Goal 2) the interaction of Social_class*GDP on Exam_taken.
I encountered two problems.
I used the following code for Goal 1:
#exclude missing values
final=subset(final, final$Social_class!="NA")
final=subset(final, final$Exam_taken!="NA")
#graph
library(popbio)
logi.hist.plot(final$Social_class, final$Exam_taken, boxp=FALSE, type = "hist")
I got an error "Error in seq.default(min(independ),max(independ),len=100):'from' must be a finite number"
How can I fix it? Thank you so much
I have no idea how to draw the interaction with two continuous variables on a binary outcome. Can anyone provide some directions? Thanks!
It can be difficult to represent a regression involving three independent variables, since it is effectively a four-dimensional structure. However, since one of the variables (Gender) has only two levels, and Social class has 10 discrete levels, we can display the model using color scales and facets. First we create a data frame with all combinations of Gender and Social class at every value of GDP from, say, $1000 to $100,000 (in steps of $1000):
pred_df <- expand.grid(Gender = c("Male", "Female"),
Social_class = 1:10,
GDP = 1:100 * 1000)
Now we get the probability of taking the exam at each combination:
pred_df$fit <- predict(fit, newdata = pred_df, type = "response")
We can then plot the model predictions like so:
library(ggplot2)
ggplot(pred_df, aes(GDP, fit, colour = Social_class, group = Social_class)) +
geom_line() +
facet_grid(Gender~.) +
scale_x_continuous(labels = scales::dollar, limits = c(0, 1e5)) +
labs(y = "Probability of taking exam",
color = "Social class") +
scale_color_viridis_c(breaks = 1:10) +
theme_minimal(base_size = 16) +
guides(color = guide_colorbar(barheight = unit(50, "mm")))
Data used
Obviously, we don't have your data, but we can make a reasonable replica given clues from your description and code.
set.seed(1)
final <- data.frame(Gender = rep(c("Male", "Female"), 100),
Social_class = sample(10, 200, TRUE),
GDP = 1000 * sample(20:60, 200, TRUE))
final$Exam_taken <- rbinom(200, 1,
c(0, 0.1) + 0.05 * final$Social_class +
final$GDP/1e5 - 0.2)
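On the type = "response" predictions used above: for a probit model these are the standard normal CDF applied to the linear predictor. The sketch below checks that on a small simulated data set; toy, probit_fit, p_resp and p_link are hypothetical names used only for this illustration and have nothing to do with the asker's data.

```r
set.seed(1)

# Simulated binary outcome driven by one continuous predictor
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, pnorm(0.5 * toy$x))

probit_fit <- glm(y ~ x, data = toy, family = binomial(link = "probit"))

# Probabilities straight from predict() ...
p_resp <- predict(probit_fit, type = "response")

# ... and the same thing by hand: linear predictor pushed through pnorm()
p_link <- pnorm(predict(probit_fit, type = "link"))

all.equal(p_resp, p_link)
```

This is why the plotted fit values stay between 0 and 1 even though the model is linear on the probit scale.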
You can use the sjPlot package to plot the predicted values from the model. If you save the output of the plot_model() function, you can modify its appearance using ggplot2.
Here is one of many pages that can show you other options with this package:
https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_model_estimates.html
library(sjPlot)
plot_model(fit, type = "int")
I trained a model using rpart and I want to generate a plot displaying the Variable Importance for the variables it used for the decision tree, but I cannot figure out how.
I was able to extract the Variable Importance. I've tried ggplot but none of the information shows up. I tried using the plot() function on it, but it only gives me a flat graph. I also tried plot.default, which is a little better but still not what I want.
Here's rpart model training:
argIDCART = rpart(Argument ~ .,
data = trainSparse,
method = "class")
Got the variable importance into a data frame.
argPlot <- as.data.frame(argIDCART$variable.importance)
Here is a section of what that prints:
argIDCART$variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958
I'd like to plot a graph that shows the variable/feature name and its numerical importance. I just can't get it to do that. It appears to only have one column. I tried separating them using the separate function, but can't do that either.
ggplot(argPlot, aes(x = "variable importance", y = "feature"))
Just prints blank.
The other plots look really bad.
plot.default(argPlot)
Looks like it plots the points, but doesn't put the variable name.
Since no reproducible example is available, I based my answer on a dataset that ships with R (kyphosis, from rpart), using ggplot2 and other packages for data manipulation.
library(rpart)
library(tidyverse)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
df <- data.frame(imp = fit$variable.importance)
df2 <- df %>%
tibble::rownames_to_column() %>%
dplyr::rename("variable" = rowname) %>%
dplyr::arrange(imp) %>%
dplyr::mutate(variable = forcats::fct_inorder(variable))
ggplot2::ggplot(df2) +
geom_col(aes(x = variable, y = imp),
col = "black", show.legend = FALSE) +
coord_flip() +
scale_fill_grey() +
theme_bw()
ggplot2::ggplot(df2) +
geom_segment(aes(x = variable, y = 0, xend = variable, yend = imp),
size = 1.5, alpha = 0.7) +
geom_point(aes(x = variable, y = imp, col = variable),
size = 4, show.legend = FALSE) +
coord_flip() +
theme_bw()
If you want to see the variable names, it may be best to use them as the labels on the x-axis.
plot(argIDCART$variable.importance, xlab="variable",
ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART))
(You may need to resize the window to see the labels properly.)
If you have a lot of variables, you may want to rotate the variable names so that they do not overlap.
par(mar=c(7,4,3,2))
plot(argIDCART$variable.importance, xlab="variable",
ylab="Importance", xaxt = "n", pch=20)
axis(1, at=1:7, labels=row.names(argIDCART), las=2)
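Data (a data-frame stand-in for the printed output, which is why row.names(argIDCART) returns the variable names here; with a real rpart object, use names(argIDCART$variable.importance) instead)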
argIDCART = read.table(text="variable.importance
noth 23.339346
humanitarian 16.584430
council 13.140252
law 11.347241
presid 11.231916
treati 9.945111
support 8.670958",
header=TRUE)
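For a fitted rpart object rather than the data-frame stand-in above, the variable names are the names() of the importance vector. A minimal base-R sketch, assuming the kyphosis data from rpart; tree and imp are names chosen for this illustration:

```r
library(rpart)

# Small example tree on a dataset that ships with rpart
tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Named numeric vector: names are the variables, values their importance
imp <- tree$variable.importance

# Horizontal bars with the variable names as axis labels;
# rev() puts the first entry of imp at the top of the plot
par(mar = c(4, 7, 2, 1))
barplot(rev(imp), horiz = TRUE, las = 1, xlab = "Importance")
```

Because the labels come from names(imp), nothing needs to be hard-coded like at = 1:7.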
While I am enjoying using package visreg to visualize my regressions, there's one thing that I can't yet control: the number of columns when faceting. See the following factor-by-curve generalized additive regression for example:
library(dplyr)
library(mgcv)
library(visreg)
data(airquality)
test <- gam(
Ozone ~ s(Temp, by = Month),
data = airquality %>% mutate(Month = as.factor(Month))
)
If I do
visreg(test, xvar = "Temp", by = "Month", gg = TRUE)
I get a 1-row, 5-column factor-by-curves.
Funnily enough, if I take the gg = TRUE out, it becomes 2-row. But whichever is the case I would like to be able to control the number of columns and rows when faceting. So far I have been unsuccessful, by either manipulating the ellipsis argument of visreg or by directly trying to manipulate the resulting ggplot object.
So for example, if I wanted to do visreg with gg = TRUE with 3-row, 2-column, what would be my best chance---or is there another package that is recommended?
You can just modify the ggplot object and add facet_wrap in the usual way (with ggplot2 loaded):
p <- visreg(test, xvar = "Temp", by = "Month", gg = TRUE)
p + facet_wrap(vars(Month), nrow = 3)
You don't actually need to create p first, this gives the same result:
visreg(test, xvar = "Temp", by = "Month", gg = TRUE) +
facet_wrap(vars(Month), nrow = 3)
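For reference, here is how nrow and ncol behave in facet_wrap on a plain ggplot, independent of visreg. This is only a sketch on the airquality data; aq and p are names made up for the example.

```r
library(ggplot2)

# Drop rows with missing values and treat Month as a factor (5 levels)
aq <- na.omit(airquality[, c("Ozone", "Temp", "Month")])
aq$Month <- factor(aq$Month)

# Five panels laid out on a 3-row, 2-column grid; the last cell stays empty
p <- ggplot(aq, aes(Temp, Ozone)) +
  geom_point() +
  facet_wrap(vars(Month), nrow = 3, ncol = 2)
p
```

Giving only nrow (or only ncol) is enough; ggplot2 works out the other dimension from the number of panels.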
Sorry if this is not well asked, first ever question.
Aim: to calculate the bone mineral density T-score (+/- 2.5 SD for sex and age specific BMD value). To say whether a patient is osteoporotic or not.
I am trying to do this graphically using ggplot2 and geom_smooth.
I am using the NHANES dataset (https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DXXFEM_H.htm#DXXINBMD) which is accessed through nhanesA package.
r load programs:
library(nhanesA)
library(ggplot2)
I am interested only in the intertrochanteric BMD, age and sex.
r load data:
nhanesTableVars('EXAM', "DXXFEM_D")
DXXFEM_D <- nhanes('DXXFEM_D')
fem_d <- DXXFEM_D
demo_d <- nhanes('DEMO_D')
demo_d <- nhanesTranslate('DEMO_D', 'RIAGENDR', data=demo_d)
DXXFEM_D_vars <- nhanesTableVars('EXAM', 'DXXFEM_D', namesonly=TRUE)
DXXFEM_D <- nhanesTranslate('DXXFEM_D', DXXFEM_D_vars, data=DXXFEM_D)
FEM_demo <- merge(demo_d, DXXFEM_D)
FEM_demo_1 <- FEM_demo[,c(5,6,55)]
Then I attempted the plot, but the level argument in geom_smooth does not work with level set to 2.5.
r plot BMD with SD:
ggplot(data = FEM_demo_1, aes(x = RIDAGEYR, y = DXXINBMD, group = RIAGENDR, color = RIAGENDR)) +
geom_smooth(se = TRUE, level = 2.5) +
scale_x_continuous(minor_breaks = seq(0,85,1), breaks = seq(0,85,5))
1) I would ideally like a plot which shows the mean, -1SD (which refers to Osteopaenia) and -2SD which refers to cut off for osteoporosis which can be used to translate BMD into clinical criteria. Is there a way to do this?
2) Is there any way to do this numerically?
Thanks
Here is the code for a plot with the mean, -1SD and -2SD. You can add styling to your liking. The mean and SD are calculated beforehand in a data frame.
data <- aggregate(FEM_demo_1$DXXINBMD, by=list(FEM_demo_1$RIAGENDR, FEM_demo_1$RIDAGEYR), FUN=mean, na.rm=TRUE)
names(data) <- c("gender", "age", "mean")
data[,"sd"] <- aggregate(FEM_demo_1$DXXINBMD, by=list(FEM_demo_1$RIAGENDR, FEM_demo_1$RIDAGEYR), FUN=sd, na.rm=TRUE)[3]
ggplot(data=data, aes(x=age, group=gender))+
geom_smooth(se = FALSE, aes(y=mean))+
geom_smooth(se = FALSE, aes(y=mean-sd))+
geom_smooth(se = FALSE, aes(y=mean-(2*sd)))
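On the numerical side of the question, a T-score is usually computed as (measured BMD - reference mean) / reference SD, where the reference is a young-adult group of the same sex. The sketch below does that on simulated data, so every number in it is made up: the 20-29 reference age band, the simulated BMD values and the column names are all assumptions for illustration, and the cut-offs of -1 and -2.5 follow the common WHO-style convention; swap in whatever thresholds you actually use.

```r
set.seed(42)

# Simulated stand-in for the age / sex / BMD columns (not NHANES data)
bmd <- data.frame(age = sample(20:80, 300, replace = TRUE),
                  sex = sample(c("Male", "Female"), 300, replace = TRUE))
bmd$bmd <- 1.1 - 0.004 * (bmd$age - 20) - 0.1 * (bmd$sex == "Female") +
  rnorm(300, sd = 0.12)

# Sex-specific reference mean and SD from the assumed young-adult band
ref <- subset(bmd, age >= 20 & age <= 29)
ref_mean <- tapply(ref$bmd, ref$sex, mean)
ref_sd <- tapply(ref$bmd, ref$sex, sd)

# T-score: distance from the reference mean in reference SD units
bmd$t_score <- as.numeric((bmd$bmd - ref_mean[bmd$sex]) / ref_sd[bmd$sex])

# Intervals are (-Inf, -2.5], (-2.5, -1], (-1, Inf)
bmd$category <- cut(bmd$t_score,
                    breaks = c(-Inf, -2.5, -1, Inf),
                    labels = c("osteoporosis", "osteopenia", "normal"))

table(bmd$sex, bmd$category)
```

With real data, the same three steps apply: pick the reference rows, compute their mean and SD by sex, then standardise every measurement against them.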
I'm new to R and statistics and haven't been able to figure out how one would go about plotting predicted values vs. Actual values after running a multiple linear regression. I have come across similar questions (just haven't been able to understand the code). I would greatly appreciate it if you explain the code.
This is what I have done so far:
# Attach file containing variables and responses
q <- read.csv("C:/Users/A/Documents/Design.csv")
attach(q)
# Run a linear regression
model <- lm(qo~P+P1+P4+I)
# Summary of linear regression results
summary(model)
The plot of predicted vs. actual is so I can graphically see how well my regression fits on my actual data.
It would be better if you provided a reproducible example, but here's an example I made up:
set.seed(101)
dd <- data.frame(x=rnorm(100),y=rnorm(100),
z=rnorm(100))
dd$w <- with(dd,
rnorm(100,mean=x+2*y+z,sd=0.5))
It's (much) better to use the data argument; you should almost never use attach():
m <- lm(w~x+y+z,dd)
plot(predict(m),dd$w,
xlab="predicted",ylab="actual")
abline(a=0,b=1)
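To put a number on what the predicted-vs-actual plot shows: for a linear model with an intercept, the squared correlation between predicted and actual values equals the model's R-squared. A small sketch on simulated data of the same shape as above (toy, fit_lm and the other names are chosen for this illustration only):

```r
set.seed(101)

# Simulated data: w depends linearly on x, y and z plus noise
toy <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
toy$w <- with(toy, rnorm(100, mean = x + 2 * y + z, sd = 0.5))

fit_lm <- lm(w ~ x + y + z, data = toy)
pred <- predict(fit_lm)   # fitted value for every row of toy

# Squared correlation of predicted vs actual ...
r2_from_plot <- cor(pred, toy$w)^2
# ... matches the R-squared reported by summary()
r2_from_fit <- summary(fit_lm)$r.squared

# Typical size of the vertical distance from the 1:1 line
rmse <- sqrt(mean((toy$w - pred)^2))

c(r2_from_plot = r2_from_plot, r2_from_fit = r2_from_fit, rmse = rmse)
```

Points hugging the 1:1 line correspond to a high R-squared and a small RMSE.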
Besides predicted vs actual plot, you can get an additional set of plots which help you to visually assess the goodness of fit.
# --- run the previous code by Ben Bolker first ---
par(mfrow = c(2, 2))
plot(m)
A tidy way of doing this would be to use broom::augment():
library(tidyverse)
library(cowplot)
library(broom)
set.seed(101)
# Using Ben's data above:
dd <- data.frame(x=rnorm(100),y=rnorm(100),
z=rnorm(100))
dd$w <- with(dd,rnorm(100,mean=x+2*y+z,sd=0.5))
m <- lm(w~x+y+z,dd)
m %>% augment() %>%
ggplot() +
geom_point(aes(.fitted, w)) +
geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") +
labs(x = "Fitted", y = "Actual") +
theme_bw()
This works especially nicely for deeply nested lists of regressions.
To illustrate this, consider some nested list of regressions:
Reglist <- list()
Reglist$Reg1 <- dd %>% do(reg = lm(as.formula("w~x*y*z"), data = .)) %>% mutate( Name = "Type 1")
Reglist$Reg2 <- dd %>% do(reg = lm(as.formula("w~x+y*z"), data = .)) %>% mutate( Name = "Type 2")
Reglist$Reg3 <- dd %>% do(reg = lm(as.formula("w~x"), data = .)) %>% mutate( Name = "Type 3")
Reglist$Reg4 <- dd %>% do(reg = lm(as.formula("w~x+z"), data = .)) %>% mutate( Name = "Type 4")
Now is where the power of the above tidy plotting framework comes to life...:
Graph_Creator <- function(Reglist){
Reglist %>% pull(reg) %>% .[[1]] %>% augment() %>%
ggplot() +
geom_point(aes(.fitted, w)) +
geom_smooth(aes(.fitted, w), method = "lm", se = FALSE, color = "lightgrey") +
labs(x = "Fitted", y = "Actual",
title = paste0("Regression Type: ", Reglist$Name) ) +
theme_bw()
}
Reglist %>% map(~Graph_Creator(.)) %>%
cowplot::plot_grid(plotlist = ., ncol = 1)
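In case it is unclear what augment() contributes here: assuming it is the broom generic, it returns the model's data with extra columns such as .fitted and .resid, which is what the plotting code maps to the axes. A minimal check on simulated data (toy, fit_lm and aug are names used only in this sketch):

```r
library(broom)

set.seed(101)
toy <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
toy$w <- with(toy, rnorm(100, mean = x + 2 * y + z, sd = 0.5))
fit_lm <- lm(w ~ x + y + z, data = toy)

# One row per observation: original columns plus .fitted, .resid, ...
aug <- augment(fit_lm)
head(aug[, c("w", ".fitted", ".resid")])

# .fitted is the same as fitted(), and .resid is actual minus fitted
all.equal(as.numeric(aug$.fitted), as.numeric(fitted(fit_lm)))
all.equal(aug$w - aug$.fitted, aug$.resid)
```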
Same as @Ben Bolker's solution, but producing a ggplot object instead of using base R graphics:
# first generate the dd data set using the code in Ben's solution, then...
library(ggpubr)
m <- lm(w~x+y+z,dd)
ggscatter(x = "prediction",
y = "actual",
data = data.frame(prediction = predict(m),
actual = dd$w)) +
geom_abline(intercept = 0,
slope = 1)