How to highlight an item of a time series in a ggplot2 plot (R)

I wish to highlight segments of a time series that lie above or below a certain value, using a distinct colour or shape. In the example data I decompose a mortality time series into its components. My goal is to highlight the segments where the trend component falls below 35 (the dip between 1997 and 2000) and where the residual component is above 100 (the spike). I have tried to use annotate, but that did not produce what I wanted.
#Load library and obtain data
library(gamair)
library(tsModel)
library(ggplot2)
library(reshape2)
data(chicago) # loads the chicago data frame; the return value is only the dataset name
## create variables, decompose TS
chicago$date<-seq(from=as.Date("1987-01-01"), to=as.Date("2000-12-31"),length=5114)
data<- chicago[,c("date","death")]
mort <- tsdecomp(data$death, c(1, 2, 15, 5114))
## Convert matrix to df, rename, melt
df<-as.data.frame(mort)
names(df)[1] <- "Trend"
names(df)[2] <- "Seasonal"
names(df)[3] <- "Residual"
df$date<-seq(as.Date("1987-01-01"), as.Date("2000-12-31"), "day")
meltdf <- melt(df,id="date")
## Plot
ggplot(meltdf,aes(x=date,y=value,colour=variable,group=variable)) + geom_line() +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
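theme(legend.position = "none") +
# dates must be Date objects (1995-01-01 unquoted is arithmetic); the rectangle is drawn in every facet
annotate("rect", xmin=as.Date("1995-01-01"), xmax=as.Date("1996-01-01"), ymin=10, ymax=300, alpha=.2, fill="blue")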

Well, this works, but I must admit it's more work than I'd hoped.
get.box <- function(data) {
rng <- range(data$date) + c(-50,50)
z <- meltdf[meltdf$date>=rng[1] & meltdf$date <=rng[2] & meltdf$variable==unique(data$variable),]
data.frame(variable=unique(z$variable),
xmin=min(z$date),xmax=max(z$date),ymin=min(z$value),ymax=max(z$value))
}
hilight.trend <- get.box(with(meltdf,meltdf[variable=="Trend" & value<35,]))
hilight.resid <- get.box(with(meltdf,meltdf[variable=="Residual" & value>100,]))
ggplot(meltdf,aes(colour=variable,group=variable)) +
geom_line(aes(x=date,y=value)) +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none") +
geom_rect(data=hilight.trend, alpha=0.2, fill="red",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin)) +
geom_rect(data=hilight.resid, alpha=0.2, fill="blue",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin))
You can't really use annotate(...) with facets, because you will get the same annotation on all the facets. So you're left with something like geom_rect(...). The problem here is that geom_rect(...) draws a rectangle for every row in the data. So you need to create an auxiliary dataset with just one row for each variable, containing the x- and y- min and max.
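If the segments to highlight are not known in advance, one way to build such an auxiliary dataset is to collapse consecutive rows that satisfy the condition into one row per run. The sketch below is a minimal illustration on made-up values, not the code used above; find_runs is a hypothetical helper, and it assumes the flag vector contains no NA.

```r
# Hypothetical helper: one row per contiguous run where `flag` is TRUE.
find_runs <- function(date, flag) {
  r <- rle(flag)                    # run-length encoding of the logical vector
  ends <- cumsum(r$lengths)         # last index of each run
  starts <- ends - r$lengths + 1    # first index of each run
  keep <- r$values                  # keep only the runs where flag is TRUE
  data.frame(xmin = date[starts[keep]], xmax = date[ends[keep]])
}

# Toy series standing in for the Trend component (values are made up).
date  <- seq(as.Date("2000-01-01"), by = "day", length.out = 10)
value <- c(40, 34, 33, 36, 37, 30, 31, 32, 38, 39)

boxes <- find_runs(date, value < 35)
boxes$variable <- "Trend"   # facet variable, so the boxes belong to one panel only
print(boxes)
```

The resulting data frame could then be passed to geom_rect(data = boxes, aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf), inherit.aes = FALSE). Because it carries the variable column, the rectangles should appear only in the Trend facet.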

Related

Why is the R script provided on the PubMed site not executing? Is it possible to make it run?

If possible, I need help understanding why the code below is not working. I found this code on the page: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3817376/. Could someone adapt it so that it works?
library(ggplot2)
library(nlme)
head(Theoph)
ggplot(data=Theoph, aes(x=Time, y=conc, group=Subject)) + geom_line() + labs(x=“Time (hr)”, y=“Concentration (mg/L)”)
p <- ggplot(data=Theoph, aes(x=Time, y=conc, group=Subject)) + geom_line() + labs(x=“Time (hr)”, y=“Concentration (mg/L)”) + stat_summary(fun.y=median, geom=“line”,aes(x=ntpd, y=conc, group=1), color=“red”, size=1)
print(p) # “p” is a ggplot object
# create a flag for body weight
Theoph$WT <- ifelse(Theoph$Wt<70, “WT < 70kg”, “WT >= 70kg”)
p + facet_grid(.~WT)
There are a couple things to help you run this.
First, you have curly/smart quotes “ in your code, and should just use plain quotes ". Sometimes we get this excess formatting when we copy/paste code from other sources like this.
Second, you need to use the supplementary materials to calculate ntpd and add it to the Theoph dataset.
Below is code that seemed to work at my end to reproduce the spaghetti plots.
library(ggplot2)
library(nlme)
# Reference:
# https://ascpt.onlinelibrary.wiley.com/doi/10.1038/psp.2013.56
head(Theoph)
ggplot(data = Theoph, aes(x = Time, y = conc, group = Subject)) +
geom_line() +
labs(x = "Time (hr)", y = "Concentration (mg/L)")
##################################################################################
## we need some data manipulation for Figure 1(e) and Figure 1(f)
## the code below calculates an approximate ntpd (nominal time post dose)
## "ntpd" is used for summarizing conc data (calculate mean at each time point)
## create body weight category for <70 kg or >=70 kg
##################################################################################
#--create a cut (time intervals)
Theoph$cut <- cut(Theoph$Time, breaks=c(-0.1,0,1,1.5, 2,3,4,6,8,12,16,20,24))
#--make sure each time point has reasonable data
table(Theoph$cut)
#--calculate approximate ntpd
library(plyr)
tab <- ddply(Theoph, .(cut), summarize, ntpd=round(mean(Time, na.rm=T),2))
#--merge ntpd into Theoph data
Theoph <- merge(Theoph, tab, by=c("cut"), all.x=T)
#--sort the data by Subject and Time, select only necessary columns
Theoph <- Theoph[order(Theoph$Subject, Theoph$Time),c("Subject","Wt","Dose","Time","conc","ntpd")]
#--create body weight category for <70 kg or >=70 kg for Figure 1(f)
Theoph$WT <- ifelse(Theoph$Wt<70, "WT < 70kg", "WT >= 70kg")
#--end of data manipulation
##################################################################################
p <- ggplot(data = Theoph, aes(x=Time, y=conc, group=Subject)) +
geom_line() +
labs(x="Time (hr)", y="Concentration (mg/L)") +
stat_summary(fun = median, geom = "line", aes(x = ntpd, y = conc, group = 1), color = "red", size=1)
print(p)
p + facet_grid(. ~ WT)
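The curly-quote problem mentioned in the first point can also be repaired programmatically rather than by hand. The following is a small sketch using base R only; the variable names pasted and fixed are illustrative, and the input string is a shortened stand-in for the copied code.

```r
# Code as it might look after copy/paste from a web page (curly quotes).
pasted <- "labs(x=\u201cTime (hr)\u201d, y=\u201cConcentration (mg/L)\u201d)"

# Replace curly double quotes, then curly single quotes, with plain ASCII quotes.
fixed <- gsub("[\u201c\u201d]", "\"", pasted)
fixed <- gsub("[\u2018\u2019]", "'", fixed)

cat(fixed, "\n")
```

For a whole script, the same two gsub calls could be applied to the result of readLines() and written back with writeLines().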

Annotate x-axis with N in faceted plot, but preserve empty facets

I asked a question yesterday about annotating the x-axis with N in a faceted plot using a minimal example that turns out to be too simple, relative to my real problem. The answer given there works in the case of complete data, but if you have missing facets you would like to preserve, the combination of facet_wrap options drop=FALSE and scales="free_x" triggers an error: "Error in if (zero_range(from) || zero_range(to)) { : missing value where TRUE/FALSE needed"
Here is a new, less-minimal example. The goal here is to produce a large graph with two panels using grid.arrange; the first showing absolute values over time by treatment group; the second showing the change from baseline over time by treatment group. In the second panel, we need a blank facet when vis=1.
# setup
library(ggplot2)
library(plyr)
library(gridExtra)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
id <- c(c(1:150),c(1:100),c(1:50))
val <- rnorm(300)
data <- data.frame(id,trt,vis,val)
base <- with(subset(data,vis==1),data.frame(id,trt,baseval=val))
data <- merge(data,base,by=c("id","trt")) # merge on trt too, otherwise it is split into trt.x and trt.y
data <- transform(data,chg=ifelse(vis==1,NA,val-baseval))
data.sum <- ddply(data, .(vis, trt), summarise, N=length(na.omit(val)))
data <- merge(data,data.sum)
data <- transform(data, trtN=paste(trt,N,sep="\n"))
mytheme <- theme_bw() + theme(panel.margin = unit(0, "lines"), strip.background = element_blank())
# no missing facets
plot.a <- ggplot(data) + geom_boxplot(aes(x=trtN,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
# first facet should be blank
plot.b <- ggplot(data) + geom_boxplot(aes(x=trtN,y=chg,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
grid.arrange(plot.a,plot.b,nrow=2)
You can add a blank layer to draw all the facets in your second plot. The key is that you need a variable that exists for every level of vis to use as your y variable. In your case you can simply use the variable you used in your first plot.
ggplot(data) +
geom_boxplot(aes(x = trtN, y = chg, group = trt, colour = trt), show.legend = FALSE) +
geom_blank(aes(x = trtN, y = val)) +
facet_wrap(~ vis, switch = "x", nrow = 1, scales = "free_x") +
labs(x="Visit") + mytheme
If your variables have different ranges, you can set the y limits using the overall min and max of your boxplot y variable.
+ scale_y_continuous(limits = c(min(data$chg, na.rm = TRUE), max(data$chg, na.rm = TRUE)))
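To see why the blank layer needs a different y variable, it can help to check which visits have any non-missing values at all. The sketch below uses a reduced, made-up version of the data (the 0.5 baseline is an arbitrary stand-in) and also shows range() as a compact way to get the same limits as c(min(...), max(...)).

```r
set.seed(1)
# Toy data mirroring the question: change from baseline is undefined at visit 1.
vis <- factor(rep(1:3, each = 4))
val <- rnorm(12)
chg <- ifelse(vis == 1, NA, val - 0.5)   # 0.5 is an arbitrary stand-in baseline

# Which visits have at least one usable value for each candidate y variable?
has_chg <- tapply(!is.na(chg), vis, any)
has_val <- tapply(!is.na(val), vis, any)
print(rbind(has_chg, has_val))

# Overall limits of the plotted variable, equivalent to c(min(...), max(...)).
ylim_chg <- range(chg, na.rm = TRUE)
print(ylim_chg)
```

Here chg has nothing to draw at visit 1 while val does, which is why val works as the y variable for geom_blank.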

Annotate x-axis with N in faceted plot

I'm trying to produce a boxplot of some numeric outcome broken down by treatment condition and visit number, with the number of observations in each box placed under the plot, and the visit numbers labeled as well. Here's some fake data that will serve to illustrate, and I give two examples of things I've tried that didn't quite work.
library(ggplot2)
library(plyr)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
val <- rnorm(300)
data <- data.frame(trt,vis,val)
data.sum <- ddply(data, .(vis, trt), summarise,
N=length(na.omit(val)))
mytheme <- theme_bw() + theme(panel.margin = unit(0, "lines"), strip.background = element_blank())
The below code produces a plot that has N labels where I want them. It does this by grabbing summary data from an auxiliary dataset I created. However, I couldn't figure out how to also label visit on the x-axis (ideally, below the individual box labels), or to delineate visits visually in other ways (e.g. lines separating them into panels).
plot1 <- ggplot(data) +
geom_boxplot(aes(x=vis:trt,y=val,group=vis:trt,colour=trt), show.legend=FALSE) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
The plot below is closer to what I want than the one above, in that it has a nice hierarchy of treatments and visits, and a pretty format delineating the visits. However, for each panel it grabs the Ns from the first row in the summary data that matches the treatment condition, because it doesn't "know" that each facet needs to use the row corresponding to that visit.
plot2 <- ggplot(data) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
One workaround is to manipulate your dataset so your x variable is the interaction between trt and N.
Working off what you already have, you can add N to the original dataset via a merge.
test = merge(data, data.sum)
Then make a new variable that is the combination of trt and N.
test = transform(test, trt2 = paste(trt, N, sep = "\n"))
Now make the plot, using the new trt2 variable on the x axis and using scales = "free_x" in facet_wrap to allow for the different labels per facet.
ggplot(test) +
geom_boxplot(aes(x = trt2, y = val, group = trt, colour = trt), show.legend = FALSE) +
facet_wrap(~ vis, drop = FALSE, switch="x", nrow = 1, scales = "free_x") +
labs(x="Visit") +
mytheme
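The merge step can also be skipped by computing N directly on the rows of the original data. This is a base R sketch of that idea using ave(), offered as an alternative rather than as part of the answer above; dat is a hypothetical name chosen to avoid overwriting data.

```r
set.seed(42)
trt <- factor(rep(LETTERS[1:2], 150))
vis <- factor(c(rep(1, 150), rep(2, 100), rep(3, 50)))
val <- rnorm(300)
dat <- data.frame(trt, vis, val)

# Count non-missing values within each visit-by-treatment cell, aligned with the rows of dat.
dat$N <- ave(as.numeric(!is.na(dat$val)), dat$vis, dat$trt, FUN = sum)

# Same label construction as in the answer: treatment and N on separate lines.
dat$trt2 <- paste(dat$trt, dat$N, sep = "\n")

print(unique(dat[, c("vis", "trt", "N")]))
```

The trt2 column can then be used on the x axis exactly as in the plot above.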
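Since this functionality isn't built in, a good workaround is gridExtra:
library(gridExtra)
# lb was not defined in the original answer; assumed here to be the labels built from data.sum
lb <- paste(data.sum$trt, data.sum$N, sep="\n")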
p1 <- ggplot(data[data$vis==1,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[1:2]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
p2 <- ggplot(data[data$vis==2,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[3:4]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
p3 <- ggplot(data[data$vis==3,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[5:6]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
grid.arrange(p1,p2,p3,nrow=1,ncol=3) # fully customizable
Related:
Varying axis labels formatter per facet in ggplot/R
You can also make them vertical or do other transformations:

Merging two plots in ggplot2 and keeping individual features of the plots (geom_text, geom_vline)

I have two geom_line time series with some geom_text & geom_points on each plot.
One of the two plots additionally has a geom_vline. I wonder if it is possible to merge the two, but I have failed to find a solution.
Here are the two plots:
require(xts) # xts() and endpoints() come from xts, which also attaches zoo
require(ggplot2)
set.seed(10)
# plot 1:
tmp1 <- xts(cumsum(rnorm(5000,1,10)), Sys.Date()-5000:1)
data.tmp1 = data.frame(date=as.Date(index(tmp1)),
value=drop(coredata(tmp1)))
data.tmp1.year.end = data.frame(date=as.Date(index(tmp1[endpoints(tmp1, "years", 1)])),
value= drop(coredata(tmp1[endpoints(tmp1, "years", 1)])))
plot1 =
ggplot(data.tmp1, aes(x=date, y=value)) +
geom_line(aes(y=value), size=1) +
geom_point(data=data.tmp1.year.end, col="red") +
geom_text(data=data.tmp1.year.end, label=data.tmp1.year.end$value, vjust=0, hjust=1)
# plot 2:
tmp2 <- xts(cumsum(rnorm(5000,1,100)), Sys.Date()-5000:1)
data.tmp2 = data.frame(date=as.Date(index(tmp2)),
value=drop(coredata(tmp2)))
data.tmp2.year.end = data.frame(date=as.Date(index(tmp2[endpoints(tmp2, "years", 1)])),
value= drop(coredata(tmp2[endpoints(tmp2, "years", 1)])))
tmp2.date =as.Date(c("2008-01-01"))
plot2 =
ggplot(data.tmp2, aes(x=date, y=value)) +
geom_line(aes(y=value), size=1) +
geom_point(data=data.tmp2.year.end, col="red") +
geom_vline(xintercept=as.numeric(tmp2.date), linetype="dotted") +
geom_text(data=data.tmp2.year.end, label=data.tmp2.year.end$value, vjust=0, hjust=1)
The goal now is that plot1 and plot2 share one xaxis and all features of the individual graphs are kept in the corresponding plot.
The result should look like this:
You might try combining your daily data sets and your year end data sets into single data frames and then using ggplot's faceting to display on a single date axis. Code could look like:
data.tmp1 <- cbind(data.tmp1, data_name="tmp1")
data.tmp1.year.end <- cbind(data.tmp1.year.end, data_name="tmp1")
data.tmp2 <- cbind(data.tmp2, data_name="tmp2")
data.tmp2.year.end <- cbind(data.tmp2.year.end, data_name="tmp2")
data.tmp <- rbind(data.tmp1,data.tmp2)
data.tmp.year.end <- rbind(data.tmp1.year.end, data.tmp2.year.end)
ggplot(data.tmp, aes(x=date, y=value)) +
geom_line(aes(y=value), size=1) +
geom_point(data=data.tmp.year.end, col="red") +
geom_text(data=data.tmp.year.end, aes(label=value), vjust=0, hjust=1) +
geom_vline(xintercept=as.numeric(tmp2.date), linetype="dotted") +
facet_grid( data_name ~ . , scales="free_y")
which gives the chart
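Note that a geom_vline given as a plain xintercept value is drawn in every facet. If the line should appear only in the tmp2 panel, as in the original plot2, one option is to give the layer its own small data frame that includes the faceting variable. The sketch below uses a made-up series (toy) instead of the data above, and vline.df is a hypothetical name; the plotting part runs only if ggplot2 is installed.

```r
# Reference line restricted to the "tmp2" panel: the layer's data carries
# the faceting variable, so it is drawn only where data_name matches.
vline.df <- data.frame(date = as.Date("2008-01-01"), data_name = "tmp2")

# Small made-up series standing in for data.tmp (not the original data).
toy <- data.frame(
  date = rep(seq(as.Date("2007-01-01"), by = "month", length.out = 24), 2),
  value = c(cumsum(rep(1, 24)), cumsum(rep(10, 24))),
  data_name = rep(c("tmp1", "tmp2"), each = 24)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = date, y = value)) +
    geom_line() +
    geom_vline(data = vline.df, aes(xintercept = date), linetype = "dotted") +
    facet_grid(data_name ~ ., scales = "free_y")
  print(p)
}
```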

Varying factor order in each facet of ggplot2

I am trying to create a Cleveland dot plot for two categories, in this case J and K. The problem is that the elements A, B, C are in both categories, so R keeps choking on the ordering. I have made a simple example:
x <- c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17])
type <- c(rep("J",10),rep("K",10))
y <- rnorm(n=20,10,2)
data <- data.frame(x,y,type)
data
data$type <- as.factor(data$type)
nameorder <- data$x[order(data$type,data$y)]
data$x <- factor(data$x,levels=nameorder)
ggplot(data, aes(x=y, y=x)) +
geom_segment(aes(yend=x), xend=0, colour="grey50") +
geom_point(size=3, aes(colour=type)) +
scale_colour_brewer(palette="Set1", limits=c("J","K"), guide="none") +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
Ideally, I would want a dot plot for both categories(J,K) individually with each factor(vector x) decreasing with respect to the y vector. What ends up happening is that both categories aren't going from biggest to smallest and are erratic at the end instead. Please help!
Unfortunately, factors can only have one set of levels. The only way I've found to do this is to create two separate data.frames from your data and re-level the factor in each. For example
data <- data.frame(
x = c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17]),
y = rnorm(n=20,10,2),
type= c(rep("J",10),rep("K",10))
)
data$type <- as.factor(data$type)
J<-subset(data, type=="J")
J$x <- reorder(J$x, J$y, max)
K<-subset(data, type=="K")
K$x <- reorder(K$x, K$y, max)
Now we can plot them with
ggplot(mapping = aes(x=y, y=x, xend=0, yend=x)) +
geom_segment(data=J, colour="grey50") +
geom_point(data=J, size=3, aes(colour=type)) +
geom_segment(data=K, colour="grey50") +
geom_point(data=K, size=3, aes(colour=type)) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
which results in
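A different technique, not the one used in the answer above, is to keep a single data frame and make the factor levels unique per facet by appending the facet name to each label, then strip that suffix again when drawing the axis. The sketch below covers the data preparation in base R; the "___" separator and the strip_suffix helper are arbitrary choices made for illustration.

```r
set.seed(1)
data <- data.frame(
  x    = c(LETTERS[1:10], LETTERS[1:3], LETTERS[11:17]),
  y    = rnorm(20, 10, 2),
  type = rep(c("J", "K"), each = 10)
)

# Build a key that is unique per (x, type) pair; "___" is an arbitrary separator.
data$key <- paste(data$x, data$type, sep = "___")

# Order the key levels by type first, then by y within each type.
data$key <- factor(data$key, levels = data$key[order(data$type, data$y)])

# Hypothetical helper that removes the suffix again, for use as axis labels.
strip_suffix <- function(k) sub("___.*$", "", k)
print(strip_suffix(levels(data$key)))
```

In the plot, key would be mapped to the y axis in place of x, with scale_y_discrete(labels = strip_suffix) restoring the plain letters, and facet_grid(type ~ ., scales = "free_y") showing only the levels that belong to each panel.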
