Stack different plots in a facet manner - r

To practise ggplot and improve my skills in writing R functions, I decided to build a series of functions that produce survival plots with all kinds of extras. I have a good working function for the basic survival plot and am now getting to the extras. One option I would like to add stacks an area plot of the number at risk at each time point on top of the survival plot. It should look just like the output of ggplot's facet_grid, but I did not manage to do it with that function. I do not want two separate plots bound together, as grid.arrange does, but rather two panels that share the same x-axis.
The following code produces the two (simplified) plots that I would like to stack on top of each other. I tried to do this with facet_grid, but I don't think the solution lies there:
library(survival)
library(ggplot2)
data(lung)
s <- survfit(Surv(time, status) ~ 1, data = lung)
dat <- data.frame(time = c(0, s$time),
surv = c(1, s$surv),
nr = c(s$n, s$n.risk))
pl1 <- ggplot(dat, aes(time, surv)) + geom_step()
pl2 <- ggplot(dat, aes(time, nr)) + geom_area()

First, melt your data to long format.
library(reshape2)
dat.long<-melt(dat,id.vars="time")
head(dat.long)
time variable value
1 0 surv 1.0000000
2 5 surv 0.9956140
3 11 surv 0.9824561
4 12 surv 0.9780702
5 13 surv 0.9692982
6 15 surv 0.9649123
Then use subset() so that geom_step() receives only the surv rows and geom_area() only the nr rows. Because facet_grid() splits on the same variable column, each layer ends up in its own facet, and both panels share the time axis. scales="free_y" gives each facet its own y-axis range, which is needed here because surv runs from 0 to 1 while nr runs into the hundreds.
ggplot()+geom_step(data=subset(dat.long,variable=="surv"),aes(time,value))+
geom_area(data=subset(dat.long,variable=="nr"),aes(time,value))+
facet_grid(variable~.,scales="free_y")
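The question asked for the number-at-risk panel to sit on top, whereas melt() orders the facets by column order (surv first). A minimal sketch of one way to control the panel order, assuming the same lung data; the names dat_long and p are illustrative, and the long format is built in base R here instead of with reshape2:

```r
library(survival)
library(ggplot2)

# Same simplified data as in the question
s <- survfit(Surv(time, status) ~ 1, data = survival::lung)
dat <- data.frame(time = c(0, s$time),
                  surv = c(1, s$surv),
                  nr   = c(s$n, s$n.risk))

# Long format built with base R (one block of rows per measure)
dat_long <- rbind(
  data.frame(time = dat$time, variable = "nr",   value = dat$nr),
  data.frame(time = dat$time, variable = "surv", value = dat$surv)
)

# The factor level order decides the facet order: "nr" first means "nr" on top
dat_long$variable <- factor(dat_long$variable, levels = c("nr", "surv"))

p <- ggplot(mapping = aes(time, value)) +
  geom_area(data = subset(dat_long, variable == "nr")) +
  geom_step(data = subset(dat_long, variable == "surv")) +
  facet_grid(variable ~ ., scales = "free_y")
p
```

Swapping the order inside levels = c(...) puts the survival curve back on top.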

Related

How to replicate plot with two panels?

I am trying to replicate this plot, here:
Here is the source of this plot, slide 89:
http://www.drizopoulos.com/courses/Int/JMwithR_CEN-ISBS_2017.pdf
The top of the plot is the hazard function over time, whereas the bottom green curve is the fitted linear mixed effects model over time.
I have been able to plot both of these separately, but I cannot seem to combine them using either par(mfrow=c(2,1)) or the gridExtra package (because only one of them is a ggplot object).
I am using the aids and aids.id datasets (as a part of the JM package) in R.
# Load packages JM, lattice and ggplot2
library("JM")
library("lattice")
library("ggplot2")
#Fit models
lmeFit.aids <- lme(CD4 ~ obstime + obstime:drug,
random = ~ obstime | patient, data = aids)
coxFit.aids <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
#Plot longitudinal process
p1<-ggplot(data=aids,aes(x=obstime,y=fitted(lmeFit.aids)))
p1<-p1+geom_smooth(se=FALSE)
p1
#Plot survival process
library(rms)
p2<-psm(Surv(Time,death)~1,data=aids.id)
survplot(p2,what='hazard')
Thank you!
Up front, patchwork allows you to combine ggplot2 and base graphics into one plot. Adapted from ?wrap_elements,
library(ggplot2)
library(patchwork)
gg <- ggplot(mtcars, aes(mpg, disp)) + geom_point()
gg / wrap_elements(full = ~ plot(mtcars$mpg, mtcars$disp))
I was able to extract the values of the hazard at various time points using the survest() function. Then, I was able to plot this using ggplot, meaning I could use grid.arrange().
est<-survest(p2,,what='hazard')
hazard<-data.frame(time=est$time, hazard=est$surv)
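Once both curves live in data frames, the stacking step itself is short. A minimal sketch, assuming made-up stand-in data rather than the aids fits (hazard_df, long_df and their values are purely illustrative); arrangeGrob() from gridExtra builds the same layout as grid.arrange() without drawing it immediately:

```r
library(ggplot2)
library(gridExtra)

# Illustrative stand-ins, NOT the real hazard or mixed-model estimates
hazard_df <- data.frame(time = seq(0, 20, by = 0.5))
hazard_df$hazard <- 0.02 + 0.003 * hazard_df$time

long_df <- data.frame(obstime = seq(0, 20, by = 0.5))
long_df$fitted <- 10 - 0.2 * long_df$obstime

# Top panel: hazard over time; bottom panel: longitudinal trend
p_top    <- ggplot(hazard_df, aes(time, hazard)) + geom_line()
p_bottom <- ggplot(long_df, aes(obstime, fitted)) +
  geom_line(colour = "darkgreen")

# One column, two rows: the first plot ends up above the second
stacked <- arrangeGrob(p_top, p_bottom, ncol = 1)
grid::grid.newpage()
grid::grid.draw(stacked)
```

Note that the two panels are only placed above each other; their x-axes are not linked, so matching limits would have to be set by hand on each plot.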

Error message while plotting density functions in ggplot

I had a data frame with 750 observations and 250 columns, and I would like to plot two density plots on top of each other. In one case, a particular factor is present, in the other it isn't (commercial activities against non-commercial activities).
I created a subset of the data
CommercialActivityData <- subset(MbadSurvey, Q2== 1)
NonCommercialActivityData <- subset(MbadSurvey, Q2== 2)
I then tried to plot this as follows
p1 <- ggplot(CommercialActivityData, aes(x = water_use_PP)) + geom_density()
p1
However, when I do, I get the following error message
Error: Aesthetics must be either length 1 or the same as the data (51): x
I have 51 data values where there is commercial, and 699 where there isn't.
EDIT: new code!!
I don't have access to your data set so I have simulated your data:
# Creating the data frame
MbadSurvey <- data.frame("water_use_PP"=runif(1000,1,100),
"Q2"=as.factor(round(runif(1000,1,2),0)))
# Load the package
library(ggplot2)
# Overlaid density plots, one per level of Q2
p1 <- ggplot(MbadSurvey, aes(x = water_use_PP,colour = Q2)) + geom_density()
p1
NOTE: The variable Q2 must be a factor (not numeric) so that colour splits the data into one density curve per group!
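If Q2 is stored as the numeric codes 1 and 2, as the subset() calls in the question suggest, it can be converted before plotting. A minimal sketch on simulated data; the data frame name survey and the level labels are assumptions, with 1 taken to mean commercial as in the question:

```r
library(ggplot2)

# Simulated stand-in for the survey, with Q2 stored as numeric codes
set.seed(42)
survey <- data.frame(water_use_PP = runif(200, 1, 100),
                     Q2 = sample(1:2, 200, replace = TRUE))

# Convert the codes to a labelled factor so ggplot2 treats Q2 as groups
survey$Q2 <- factor(survey$Q2, levels = c(1, 2),
                    labels = c("commercial", "non-commercial"))

p <- ggplot(survey, aes(x = water_use_PP, colour = Q2)) + geom_density()
p
```

The labels then appear in the legend in place of the raw codes.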

In R: Get multiple barplots from a single output

I have a data frame (100 x 4). The first column is a set of "bins" 0-100, the remaining columns are the counts for each variable of events within each bin (0 to the maximum number of events).
What I'm trying to do is to plot each of the three columns of data (2:4), alongside each other. Because the counts in each of the bins for each of the data sets is close to identical, the data are overlapped in the histogram/barplots I've created, despite my use of beside=true, and position = dodge.
I've set the first column as both numeric and character, but the results are identical- the bars are overlayed on top of each other. (semi-transparent density plots don't work because I want counts not the distribution densities).
The attached code, based on both R and other documentation produced the attached chart.
barplot(BinCntDF$preT,main=NewMain_Trigger, plot=TRUE,
xlab="sample frequency interval counts (0-100 msec bins)",
names.arg=BinCntDF$dT, las=0,
ylab="bin counts", axes=TRUE, xlim=c(0,100),
ylim=c(0,1000), col="red")
geom_bar(position="dodge")
barplot(BinCntDF$postT, beside=TRUE, add=TRUE)
geom_bar()
The goal is to be able to compare the two (or more) data sets side by side on the same axes, without either overlapping the other(s).
I think you have confused barplot with ggplot2. geom_bar comes from the ggplot2 package and isn't compatible with barplot, which is part of base R graphics.
Compare ?barplot and ?geom_bar and you will see that geom_bar belongs to ggplot2. To achieve what you're after, I have used ggplot2 together with reshape2.
Step 1
Based on your description, I have assumed that your data looks roughly like this:
df <- data.frame(x = 1:10,
c1 = sample(0:100, replace=TRUE, size=10),
c2 = sample(0:50, replace=TRUE, size=10),
c3 = sample(0:70, replace=TRUE, size=10))
To plot it using ggplot2, you first have to transform the data from wide format to long format. You can do this using the melt function from reshape2.
library(reshape2)
a <- melt(df, id=c("x"))
The output would look something like this
> head(a)
x variable value
1 1 c1 62
2 2 c1 47
3 3 c1 20
4 4 c1 64
5 5 c1 4
6 6 c1 52
Step 2
There are plenty of tutorials online on what ggplot2 does and what its arguments mean. I would recommend searching Google or the many threads on SO to understand the details.
ggplot(a, aes(x=x, y=value, group=variable, fill=variable)) +
geom_bar(stat='identity', position='dodge')
Which gives you the output:
In a nutshell:
group groups the variables of interest
stat=identity ensures that no additional aggregations are made on your data
With that many bins (100) and groups (3) the plot will look messy, but try this:
library(reshape2)
library(ggplot2)
set.seed(123)
myDF <- data.frame(bins=1:100, x=sample(1:100, replace=TRUE), y=sample(1:100, replace=TRUE), z=sample(1:100, replace=TRUE))
myDF.m <- melt(myDF, id.vars='bins')
ggplot(myDF.m, aes(x=bins, y=value, fill=variable)) + geom_bar(stat='identity', position='dodge')
You could also try plotting w/ facets:
ggplot(myDF.m, aes(x=bins, y=value, fill=variable)) + geom_bar(stat='identity') + facet_wrap(~ variable)

plotting multiple plots in ggplot2 on same graph that are unrelated

How would one use the smooth.spline() method in a ggplot2 scatterplot?
My data is in a data frame called data, with two columns, x and y.
The spline fit would be sm <- smooth.spline(data$x, data$y). I believe I should use geom_line(), with sm$x and sm$y as the xy coordinates. However, how would one draw a scatterplot and a line plot on the same graph when they come from completely unrelated data? I suspect it has something to do with aes(), but I am getting a little confused.
You can use different data frames in different geoms and refer to the relevant variables with aes, or you can combine the relevant variables from the output of smooth.spline into a single data frame:
# example data
set.seed(1)
dat <- data.frame(x = rnorm(20, 10,2))
dat$y <- dat$x^2 - 20*dat$x + rnorm(20,10,2)
# spline
s <- smooth.spline(dat)
# plot - combine the original x & y and the fitted values returned by
# smooth.spline into a data.frame
library(ggplot2)
ggplot(data.frame(x=s$data$x, y=s$data$y, xfit=s$x, yfit=s$y)) +
geom_point(aes(x,y)) + geom_line(aes(xfit, yfit))
# or you could let ggplot2 fit a smoother itself with geom_smooth
# (for data this small the default is loess, not smooth.spline)
ggplot(dat, aes(x , y)) + geom_point() + geom_smooth()
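The first option mentioned above, a separate data frame per geom, can be sketched as follows. This assumes the same simulated data; fit, grid and fit_df are illustrative names. Predicting on a fine grid also gives a smoother line than joining only the fitted values at the observed x positions:

```r
library(ggplot2)

# Same example data as above
set.seed(1)
dat <- data.frame(x = rnorm(20, 10, 2))
dat$y <- dat$x^2 - 20 * dat$x + rnorm(20, 10, 2)

# Fit the spline and evaluate it on a fine, evenly spaced grid
fit  <- smooth.spline(dat$x, dat$y)
grid <- seq(min(dat$x), max(dat$x), length.out = 200)
fit_df <- as.data.frame(predict(fit, grid))  # columns are named x and y

# Points come from dat; the line layer overrides the data with fit_df
# and inherits aes(x, y) from the ggplot() call
p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_line(data = fit_df)
p
```

Each layer takes its own data argument, so the two data frames do not need the same number of rows.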

5 dimensional plot in r

I am trying to plot a 5 dimensional plot in R. I am currently using the rgl package to plot my data in 4 dimensions, using 3 variables as the x,y,z, coordinates, another variable as the color. I am wondering if I can add a fifth variable using this package, like for example the size or the shape of the points in the space. Here's an example of my data, and my current code:
set.seed(1)
df <- data.frame(replicate(4,sample(1:200,1000,rep=TRUE)))
addme <- data.frame(replicate(1,sample(0:1,1000,rep=TRUE)))
df <- cbind(df,addme)
colnames(df) <- c("var1","var2","var3","var4","var5")
require(rgl)
plot3d(df$var1, df$var2, df$var3, col=as.numeric(df$var4), size=0.5, type='s',xlab="var1",ylab="var2",zlab="var3")
I hope it is possible to do the 5th dimension.
Many thanks,
Here is a ggplot2 option. I usually shy away from 3D plots as they are hard to interpret properly. I also almost never put in 5 continuous variables in the same plot as I have here...
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12))
While this is a bit messy, you can actually reasonably read all 5 dimensions for most points.
A better approach to multi-dimensional plotting opens up if some of your variables are categorical. If all your variables are continuous, you can turn some of them to categorical with cut and then use facet_wrap or facet_grid to plot those.
For example, here I break up var3 and var4 into quintiles and use facet_grid on them. Note that I also keep the color aesthetics as well to highlight that most of the time turning a continuous variable to categorical in high dimensional plots is good enough to get the key points across (here you'll notice that the fill and border colors are pretty uniform within any given grid cell):
df$var4.cat <- cut(df$var4, quantile(df$var4, (0:5)/5), include.lowest=TRUE)
df$var3.cat <- cut(df$var3, quantile(df$var3, (0:5)/5), include.lowest=TRUE)
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12)) +
facet_grid(var3.cat ~ var4.cat)
