Differentiating each line with a different linetype in `ggsurv` plots (or in `plot`)

I am using RStudio. I am using the ggsurv function from the GGally package to draw Kaplan-Meier curves for my data (for survival analysis), following the tutorial here. I am using it instead of plot because ggsurv takes care of legends by itself.
As shown on the link, multiple curves are differentiated by color. I want to differentiate based on linetype. The tutorial does not seem to have any option for that. Following is my command:
surv1 <- survfit(Surv(DaysOfTreatment,Survived)~AgeOnFirstContactGroup)
print(ggsurv(surv1, lty.est = 3)+ ylim(0, 1))
lty.est = 3 (or 2) gives the same dashed line for all the curves. I want a differently dashed line for each curve. Using lty = type gives the error: object 'type' not found. lty = type would work in ggplot, but ggplot does not deal with survfit objects directly.
Please show me how to differentiate curves by linetype in either ggsurv or simple plot (although I would prefer ggsurv because it takes care of legends).

From the documentation for ggsurv
lty.est: linetype of the survival curve(s). Vector length should be
either 1 or equal to the number of strata.
So, to get a different line type for each stratum, set lty.est equal to a vector of the same length as the number of lines you are plotting, with each value corresponding to a different line type.
For example, using the lung data from the survival package
library(GGally)
library(survival)
data(lung)
surv1 <- survfit(Surv(time,status) ~ sex, data = lung)
ggsurv(surv1, lty.est=c(1,2), surv.col = 1)
Gives the following plot
You can add ggplot themes or other ggplot elements to the plot too. For example, we can improve the appearance using the cowplot theme as follows
library(ggplot2)
library(cowplot)
ggsurv(surv1, lty.est=c(1,2), surv.col = 1) + theme_cowplot()
If you need to change the legend labels after differentiating by linetype, then you can do it this way
ggsurv(surv1, lty.est=c(1,2), surv.col = 1) +
guides(colour = "none") +
scale_linetype_discrete(name = 'Sex', breaks = c(1,2), labels = c('Male', 'Female'))
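The question also asks about plain plot. As a minimal sketch of that route: plot() for survfit objects accepts a vector for lty, but the legend then has to be drawn by hand. The axis labels below are placeholders, not taken from the question.

```r
library(survival)

# Same model as above, fitted to the lung data shipped with survival
surv1 <- survfit(Surv(time, status) ~ sex, data = lung)

# One line type per stratum; the strata names label the legend
n_strata <- length(surv1$strata)
plot(surv1, lty = seq_len(n_strata),
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = names(surv1$strata),
       lty = seq_len(n_strata), bty = "n")
```

Deriving the line types from the number of strata keeps the plot and the legend in step if the grouping variable changes.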

Related

How does one control the appearance (e.g. line size, line type, colour) of mqgam plots produced using plot.mgamViz from the "mgcViz" package?

I am using quantile regression in R with the qgam package and visualising them using the mgcViz package, but I am struggling to understand how to control the appearance of the plots. The package effectively turns gams (in my case mqgams) into ggplots.
Simple reprex:
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
plot.mgamViz(getViz(egfit))
I am able to control things that can be added, for example the axis labels and theme of the plot, but I'm struggling to affect things that would normally be addressed in the aes() or geom_x() functions.
How would I control the thickness of the line? If this were a normal geom_smooth() or geom_line() I'd simply put size = 1 inside of the geoms, but I cannot see how I'd do so here.
How can I control the linetype of these lines? The "id" is continuous and one cannot supply a linetype to a continuous scale. If this were a normal plot I would convert "id" to a character, but I can't see a way of doing so with the plot.mgamViz function.
How can I supply a new colour scale? It seems as though if I provide it with a new colour scale it invents new ID values to put on the legend that don't correlate to the actual "id" values, e.g.
plot.mgamViz(getViz(egfit)) + scale_colour_viridis_c()
I fully expect this to be relatively simple and that I'm missing something obvious, and I imagine the answers to all three of these subquestions are very similar to one another. Thanks in advance.
You need to extract your ggplot element using this:
p1 <- plot.mgamViz(getViz(egfit))
p <- p1$plots[[1]]$ggObj
Then, id should be as.factor:
p$data$id <- as.factor(p$data$id)
Now you can play with ggplot elements as you prefer:
library(mgcViz)
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
p1 <- plot.mgamViz(getViz(egfit))
# Taking gg infos and convert id to factor
p <- p1$plots[[1]]$ggObj
p$data$id <- as.factor(p$data$id)
# Changing ggplot attributes
p <- p +
geom_line(linetype = 3, size = 1)+
scale_color_brewer(palette = "Set1")+
labs(x="Petal Length", y="s(Petal Length)", color = "My ID labels:")+
theme_classic(14)+
theme(legend.position = "bottom")
p
Here is the generated plot:
Hope it is useful!
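To see why the factor conversion matters without fitting a model, here is a small ggplot2-only sketch. The data frame d is a hypothetical stand-in for the data stored in the extracted ggObj, and the line types and palette are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for the plot data: a numeric id column
# identifying three fitted quantile curves
d <- data.frame(
  x  = rep(1:10, times = 3),
  id = rep(c(0.25, 0.5, 0.75), each = 10)
)
d$y <- d$x * d$id

# A continuous id cannot be mapped to linetype; a factor can
d$id <- as.factor(d$id)

p <- ggplot(d, aes(x, y, colour = id, linetype = id)) +
  geom_line() +
  scale_linetype_manual(values = c("dashed", "solid", "dotted")) +
  scale_colour_brewer(palette = "Set1")
p
```

Once id is a factor, discrete colour and linetype scales apply and the legend shows the actual id values.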

Adjusting facet order and legend labels when using plot_model function of sjplot

I have successfully used the plot_model function of sjplot to plot a multinomial logistic regression model. The regression contains an outcome (Info Sought, with 3 levels) and 2 continuous predictors (DSA, ASA). I have also changed the values of ASA in the plot_model so as to plot predicted effect outcomes based on the ASA mean value and SDs:
plot1 <- plot_model(multinomialmodel, type = "pred", terms = c("DSA", "ASA[meansd]"))
I have two customization questions:
1) Facet Order: The facet order is based on the default alphabetical order of the outcome levels ("Expand" then "First Pic" then "Multiple Pics"). Is there a means by which to adjust this? I tried resorting the levels with factor() (as exampled here with ggplot2) prior to running and plotting the model, but this did not cause any changes in the resulting facet order. Perhaps instead something through ggplot2, as exampled in the first solution provided here?
2) Legend Labels: The legend currently labels the plotted lines with the -1 SD, mean, and +1 SD values for ASA; is there a way to adjust these labels to instead simply say "-1 SD", "mean", and "+1 SD" instead of the raw values?
Thanks!
First I replicate your plot using your supplied data:
library(dplyr)
library(readr)
library(nnet)
library(sjPlot)
"ASA,DSA,Info_Sought
-0.108555801,0.659899854,First Pic
0.671946671,1.481880373,First Pic
2.184170211,-0.801398848,First Pic
-0.547588442,1.116555698,First Pic
-1.27930951,-0.299077419,First Pic
0.037788412,1.527545958,First Pic
-0.74271406,-0.755733264,Multiple Pics
1.20854212,-1.166723523,Multiple Pics
0.769509479,-0.390408588,Multiple Pics
-0.450025633,-1.02972677,Multiple Pics
0.769509479,0.614234269,Multiple Pics
0.281695434,0.705565438,Multiple Pics
-0.352462824,-0.299077419,Expand
0.671946671,1.481880373,Expand
2.184170211,-0.801398848,Expand
-0.547588442,1.116555698,Expand
-0.157337206,1.070890114,Expand
-1.27930951,-0.299077419,Expand" %>%
read_csv() -> d
multinomialmodel <- multinom(Info_Sought ~ ASA + DSA, data = d)
p1 <- plot_model(multinomialmodel ,
type = "pred",
terms = c("DSA", "ASA[meansd]"))
p1
Your attempt to re-factor did not work because sjPlot::plot_model() does not pay heed. One way to tackle reordering the facets is to produce an initial plot as above and replace the faceting variable in the data with a factor version containing your desired order like so:
p2 <- p1
p2$data$response.level <- factor(p2$data$response.level,
levels = c("Multiple Pics", "First Pic", "Expand"))
p2
Finally, to tackle the legend labeling issue, we can just replace the color scale with one containing your desired labels:
p2 +
scale_color_discrete(labels = c("-1 SD", "mean", "+1 SD"))
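The facet reordering trick is not specific to sjPlot. A minimal sketch with plain ggplot2, using a hypothetical data frame d and a column named panel in place of response.level:

```r
library(ggplot2)

# Hypothetical data with a character faceting column,
# which facet_wrap() would order alphabetically
d <- data.frame(
  x = rep(1:5, times = 3),
  y = rep(c(1, 2, 3), each = 5) * rep(1:5, times = 3),
  panel = rep(c("Expand", "First Pic", "Multiple Pics"), each = 5)
)
p <- ggplot(d, aes(x, y)) +
  geom_line() +
  facet_wrap(~panel)

# Overwrite the column stored in the plot object with an ordered factor
p$data$panel <- factor(p$data$panel,
                       levels = c("Multiple Pics", "First Pic", "Expand"))
p
```

Facets are laid out when the plot is drawn, so changing the factor levels in the stored data is enough to change their order.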
Just following up on @the-mad-statter's answer, I wanted to add a note on how to change the legend title and labels when you're working with a black-and-white graph where the lines differ by linetype (i.e. using sjPlot's colors = "bw" argument).
p1 <- plot_model(multinomialmodel ,
type = "pred",
terms = c("DSA", "ASA[meansd]"),
colors = "bw")
As the lines are all black, if you would like to change the legend title and labels, you need to use the scale_linetype_manual() function instead of scale_color_discrete(), like this:
p1 + scale_linetype_manual(name = "ASA values",
values = c("dashed", "solid", "dotted"),
labels = c("Low (-1 SD)", "Medium (mean)", "High (+1 SD)"))
The resulting graph will look like this:
Note that I also took this opportunity to change how linetypes are assigned to values, making the line corresponding to the mean of ASA solid.

adapt plot code to make a ggplot

I have the following data
[1] 0.09733344 0.17540020 0.14168188 0.54093074 0.78151039 0.28068527
[7] 1.96164429 0.33743328 0.05200734 0.09103039 0.28842044 0.09240131
[13] 0.09143535 0.38142022 0.11700952
from which I did bayesian inference and made a plot with the following code
f_theta<-function(theta,Data){
(theta^length(Data) )*exp(-theta*sum(Data))}
theta<-seq(1,20,length=100)
a=b=0.001
plot(theta,dgamma(theta,a,b),type="l",col="red",
ylim=c(0,2),tck=-0.01,cex.lab=0.8,cex.axis=0.8)
lines(theta,dgamma(theta,length(Data)+a,sum(Data)+b),col="green",lty=1)
lines(theta,f_theta(theta,Data=Data),lty=1,col="blue")
legend('topright',legend=c("Prior","Post","Likelihood")
,col=c("red","green","blue"),lty=1,bty="n",cex=0.8)
But I've seen the following graph
which has code
# ggplot2 examples
library(ggplot2)
# create factors with value labels
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1),
labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
labels=c("4cyl","6cyl","8cyl"))
# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
main="Distribution of Gas Milage", xlab="Miles Per Gallon",
ylab="Density")
but I'm not quite familiar with ggplot library and graphs and I would like some help in order to adapt my code and make a graph similar to last one.
ggplot() assumes that your data are in a particular format (sometimes called "long", but the author of ggplot() dislikes that description), so let's start by putting them into that format:
Data2 = data.frame(
theta = rep(theta, 3),
WhichDistribution = c(rep("Prior",length(theta)), rep("Post",length(theta)), rep("Likelihood",length(theta))),
Density = c(dgamma(theta,a,b), dgamma(theta,length(Data)+a,sum(Data)+b), f_theta(theta,Data=Data))
)
Then we can construct a ggplot() command. ggplot() needs data, aesthetics, and a geometry. Your data will be the data frame just constructed. The aesthetics refer generally to how the qualities of the data will impact the graph (what is on axes, what determines groups, etc.), and the geometry is the kind of plot (not a great wording, sorry).
ggplot(Data2, aes(x=theta, y=Density, group=WhichDistribution, color=WhichDistribution, fill=WhichDistribution))+
# position="identity" in order to not stack the densities
geom_area(alpha=.2, position="identity") +
# gets rid of the title on the legend
theme(legend.title = element_blank())+
# make the horizontal axis label pretty
scale_x_continuous(expression(theta))
You can change alpha to adjust transparency. If you want the horizontal axis to not go all the way to 20, change it in scale_x_continuous():
ggplot(Data2, aes(x=theta, y=Density, group=WhichDistribution, color=WhichDistribution, fill=WhichDistribution))+
# position="identity" in order to not stack the densities
geom_area(alpha=.2, position="identity") +
# gets rid of the title on the legend
theme(legend.title = element_blank())+
# make the horizontal axis label pretty
scale_x_continuous(expression(theta), limits=c(0,7))
qplot() is a quick plotting function that seems to mostly get in the way for people trying to learn the ggplot() language, so you might want to avoid it.
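The code above relies on Data, theta, a and b already existing from the question. A self-contained sketch that can be pasted into a fresh session, with the observations copied from the question and rep(..., each = ) used as a shorter way to build the label column:

```r
library(ggplot2)

# Observations copied from the question
Data <- c(0.09733344, 0.17540020, 0.14168188, 0.54093074, 0.78151039,
          0.28068527, 1.96164429, 0.33743328, 0.05200734, 0.09103039,
          0.28842044, 0.09240131, 0.09143535, 0.38142022, 0.11700952)

# Unnormalised exponential likelihood, as defined in the question
f_theta <- function(theta, Data) theta^length(Data) * exp(-theta * sum(Data))

theta <- seq(1, 20, length = 100)
a <- b <- 0.001   # Gamma(a, b) prior

# One row per (theta, distribution) pair
Data2 <- data.frame(
  theta = rep(theta, 3),
  WhichDistribution = rep(c("Prior", "Post", "Likelihood"), each = length(theta)),
  Density = c(dgamma(theta, a, b),
              dgamma(theta, length(Data) + a, sum(Data) + b),
              f_theta(theta, Data))
)

ggplot(Data2, aes(theta, Density,
                  colour = WhichDistribution, fill = WhichDistribution)) +
  geom_area(alpha = 0.2, position = "identity")
```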

Displaying smoothed (convolved) densities with ggplot2

I'm trying to display some frequencies convolved with a Gaussian kernel in ggplot2. I tried smoothing the lines with:
+ stat_smooth(se = F,method = "lm", formula = y ~ poly(x, 24))
Without success.
I read an article suggesting the frequencies should be convolved with a Gaussian kernel, which ggplot2's stat_density function (http://docs.ggplot2.org/current/stat_density.html) seems to be able to do.
However, I can't seem to replace my geometry with stat_density. Is there anything wrong with my code?
require(reshape2)
library(ggplot2)
library(RColorBrewer)
fileName = "/1.csv" # downloadable there: https://www.dropbox.com/s/l5j7ckmm5s9lo8j/1.csv?dl=0
mydata = read.csv(fileName,sep=",", header=TRUE)
dataM = melt(mydata,c("bins"))
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
ggplot(data=dataM,
aes(x=bins, y=value, colour=variable)) +
geom_line() + scale_x_continuous(limits = c(0, 2))
This code produces the following plot:
I'm looking at smoothing the lines a little bit, so they look more like this:
(from http://journal.frontiersin.org/Journal/10.3389/fncom.2013.00189/full)
Since my comments solved your problem, I'll convert them to an answer:
The density function takes individual measurements and calculates a kernel density distribution by convolution (gaussian is the default kernel). For example, plot(density(rnorm(1000))). You can control the smoothness with the bw (bandwidth) parameter. For example, plot(density(rnorm(1000), bw=0.01)).
But your data frame is already a density distribution (analogous to the output of the density function). To generate a smoother density estimate, you need to start with the underlying data and run density on it, adjusting bw to get the smoothness where you want it.
If you don't have access to the underlying data, you can smooth out your existing density distributions as follows:
ggplot(data=dataM, aes(x=bins, y=value, colour=variable)) +
geom_smooth(se=FALSE, span=0.3) +
scale_x_continuous(limits = c(0, 2))
Play around with the span parameter to get the smoothness you want.
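If an explicit Gaussian convolution of the binned values is wanted rather than the loess fit used by geom_smooth, base R can do it with stats::filter. This is a different technique from the one in the answer, sketched here on simulated values because the asker's 1.csv is not available; bins, value and sd_bins are hypothetical.

```r
# Hypothetical binned frequencies standing in for one column of the CSV
set.seed(1)
bins  <- seq(0, 2, length.out = 101)
value <- dnorm(bins, mean = 1, sd = 0.3) + rnorm(length(bins), sd = 0.1)

# Gaussian kernel spanning +/- 3 standard deviations, measured in bins
sd_bins <- 3
offsets <- seq(-3 * sd_bins, 3 * sd_bins)
kernel  <- dnorm(offsets, sd = sd_bins)
kernel  <- kernel / sum(kernel)   # weights sum to 1

# sides = 2 centres the kernel on each bin; bins too close to either
# end to be covered by the full kernel come back as NA
smoothed <- as.numeric(stats::filter(value, kernel, sides = 2))
```

A larger sd_bins gives a smoother curve at the cost of more NA values at the edges; the smoothed column can then be plotted with geom_line() like the original values.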

Density plots with multiple groups

I am trying to produce something similar to densityplot() from the lattice package, using ggplot2 after using multiple imputation with the mice package. Here is a reproducible example:
require(mice)
dt <- nhanes
impute <- mice(dt, seed = 23109)
x11()
densityplot(impute)
Which produces:
I would like to have some more control over the output (and I am also using this as a learning exercise for ggplot). So, for the bmi variable, I tried this:
bar <- NULL
for (i in 1:impute$m) {
foo <- complete(impute,i)
foo$imp <- rep(i,nrow(foo))
foo$col <- rep("#000000",nrow(foo))
bar <- rbind(bar,foo)
}
imp <-rep(0,nrow(impute$data))
col <- rep("#D55E00", nrow(impute$data))
bar <- rbind(bar,cbind(impute$data,imp,col))
bar$imp <- as.factor(bar$imp)
x11()
ggplot(bar, aes(x=bmi, group=imp, colour=col)) + geom_density()
+ scale_fill_manual(labels=c("Observed", "Imputed"))
which produces this:
So there are several problems with it:
1. The colours are wrong. It seems my attempt to control the colours is completely wrong/ignored
2. There are unwanted horizontal and vertical lines
3. I would like the legend to show Imputed and Observed but my code gives the error invalid argument to unary operator
Moreover, it seems like quite a lot of work to do what is accomplished in one line with densityplot(impute) - so I wondered if I might be going about this in the wrong way entirely ?
Edit: I should add the fourth problem, as noted by @ROLO:
4. The range of the plots seems to be incorrect.
The reason it is more complicated using ggplot2 is that you are using densityplot from the mice package (mice::densityplot.mids to be precise - check out its code), not from lattice itself. This function has all the functionality for plotting mids result classes from mice built in. If you would try the same using lattice::densityplot, you would find it to be at least as much work as using ggplot2.
But without further ado, here is how to do it with ggplot2:
require(reshape2)
# Obtain the imputed data, together with the original data
imp <- complete(impute,"long", include=TRUE)
# Melt into long format
imp <- melt(imp, c(".imp",".id","age"))
# Add a variable for the plot legend
imp$Imputed<-ifelse(imp$".imp"==0,"Observed","Imputed")
# Plot. Be sure to use stat_density instead of geom_density in order
# to prevent what you call "unwanted horizontal and vertical lines"
ggplot(imp, aes(x=value, group=.imp, colour=Imputed)) +
stat_density(geom = "path",position = "identity") +
facet_wrap(~variable, ncol=2, scales="free")
But as you can see the ranges of these plots are smaller than those from densityplot. This behaviour should be controlled by parameter trim of stat_density, but this seems not to work. After fixing the code of stat_density I got the following plot:
Still not exactly the same as the densityplot original, but much closer.
Edit: for a true fix we'll need to wait for the next major version of ggplot2, see github.
You can ask Hadley to add a fortify method for this mids class. E.g.
fortify.mids <- function(x){
imps <- do.call(rbind, lapply(seq_len(x$m), function(i){
data.frame(complete(x, i), Imputation = i, Imputed = "Imputed")
}))
orig <- cbind(x$data, Imputation = NA, Imputed = "Observed")
rbind(imps, orig)
}
ggplot 'fortifies' non-data.frame objects prior to plotting
ggplot(fortify.mids(impute), aes(x = bmi, colour = Imputed,
group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00"))
Note that each line ends with a '+'; otherwise R treats the command as complete. This is why the legend did not change in your code, and why the line starting with a '+' produced the error.
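The parsing rule behind this can be checked in base R without any plotting. A small sketch (the variable names are arbitrary):

```r
# A line break after a complete expression ends the statement, so the
# second line is parsed as a separate expression: unary plus applied to 2
separate <- parse(text = "a <- 1\n+ 2")
length(separate)   # 2 expressions

# A trailing '+' leaves the expression incomplete, so the parser keeps reading
joined <- parse(text = "a <- 1 +\n2")
length(joined)     # 1 expression
```

With ggplot2 the stray second expression is a unary plus applied to a scale or layer object, which is what raises the error.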
You can melt the result of fortify.mids to plot all variables in one graph
library(reshape)
Molten <- melt(fortify.mids(impute), id.vars = c("Imputation", "Imputed"))
ggplot(Molten, aes(x = value, colour = Imputed, group = Imputation)) +
geom_density() +
scale_colour_manual(values = c(Imputed = "#000000", Observed = "#D55E00")) +
facet_wrap(~variable, scales = "free")
