Consider the simple example:
library(ggplot2)
head(mtcars)
# create the plot
ggplot(mtcars, aes(factor(cyl))) + geom_bar() + theme_bw() +
theme(strip.text.x = element_text(size = 20, face="bold"))+
xlab("number of cyl") + ylab("Count")
Now we can obtain the average mpg per cyl with:
aggregate(mpg ~ cyl, data = mtcars, FUN=mean)
How can I put these average values on the x-axis so that they appear below the corresponding cyl? Can one draw a table and somehow indicate that the second row is the "average mpg per cyl"?
Here is a simple way to do it by rewriting the factor level names:
(Note that this is safe only as long as aggregate generates its table in the same order as the factor level names and without any gaps. That seems like it should be the case, but one would have to investigate to make sure. It might be safer to code it as a loop and look at the level names to make sure they match up correctly.)
library(ggplot2)
head(mtcars)
adf <- aggregate(mpg ~ cyl, data = mtcars, FUN=mean)
mtcars$fcyl <- factor(mtcars$cyl)
levels(mtcars$fcyl) <- sprintf("%d \n %.1f",adf$cyl,adf$mpg)
# create the plot
ggplot(mtcars, aes(fcyl)) + geom_bar() + theme_bw() +
theme(strip.text.x = element_text(size = 20, face="bold"))+
xlab("number of cyl") + ylab("Count")
yielding:
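The order-matching concern raised in the note above can be handled without a loop. The following is a minimal sketch, not the answerer's code: it looks up each factor level in the aggregate table with match(), so the labels stay correct even if the rows come back in a different order. The names idx and fcyl (as a standalone vector) are introduced here for illustration only.

```r
# Sketch: build the axis labels by matching on the cyl value, not on row order.
adf <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
fcyl <- factor(mtcars$cyl)

# For each factor level, find the row of adf holding the same cyl value.
idx <- match(levels(fcyl), as.character(adf$cyl))

# Fail loudly if any level has no matching row in the aggregate table.
stopifnot(!anyNA(idx))

# Two-line labels: the cyl value on top, the mean mpg underneath.
levels(fcyl) <- sprintf("%s\n%.1f", levels(fcyl), adf$mpg[idx])
```

The resulting fcyl can be assigned to mtcars$fcyl and plotted exactly as in the answer.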
I'm trying to put expressions into facet labels using label_parsed but with no success:
library(ggplot2)
mpg3 <- mpg
levels(mpg3$drv)[levels(mpg3$drv)=="4"] <- "4^{wd}"
levels(mpg3$drv)[levels(mpg3$drv)=="f"] <- "- Front %.% e^{pi * i}"
levels(mpg3$drv)[levels(mpg3$drv)=="r"] <- "4^{wd} - Front"
ggplot(mpg3, aes(x=displ, y=hwy)) + geom_point() +
facet_grid(. ~ drv, labeller = label_parsed)
The plot that I get lacks expressions - facet labels contain the original levels of drv variable.
If I type levels(mpg3$drv) I get character(0).
There are two problems: first, mpg$drv is a character vector, not a factor; second, you need to set the factor labels, not the levels. I think this is what you want...
mpg3 <- mpg
mpg3$drv <- factor(mpg3$drv,
levels=c("4","f","r"),
labels=c("4^{wd}","- Front %.% e^{pi * i}","4^{wd} - Front"))
ggplot(mpg3, aes(x=displ, y=hwy)) +
geom_point() +
facet_grid(. ~ drv, labeller = label_parsed)
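The two points in this answer can be checked without drawing anything. This is a small sketch on a made-up character vector drv (a stand-in for mpg$drv, not the real column): it confirms the vector is not a factor, converts it with explicit labels, and verifies that every label is parseable R syntax, which is what label_parsed needs.

```r
# Toy stand-in for mpg$drv: a plain character vector, not a factor.
drv <- c("4", "f", "r", "f", "4")
is.factor(drv)  # FALSE, so there are no levels to rename

# Convert to a factor, mapping each original value to a plotmath label.
drv_f <- factor(drv,
                levels = c("4", "f", "r"),
                labels = c("4^{wd}", "- Front %.% e^{pi * i}", "4^{wd} - Front"))

# Every label must parse as an R expression to be usable with label_parsed.
ok <- vapply(levels(drv_f),
             function(s) is.expression(parse(text = s)),
             logical(1))
```

If a label fails to parse, parse() stops with a syntax error that points at the offending string.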
I asked a question yesterday about annotating the x-axis with N in a faceted plot using a minimal example that turns out to be too simple, relative to my real problem. The answer given there works in the case of complete data, but if you have missing facets you would like to preserve, the combination of facet_wrap options drop=FALSE and scales="free_x" triggers an error: "Error in if (zero_range(from) || zero_range(to)) { : missing value where TRUE/FALSE needed"
Here is a new, less-minimal example. The goal here is to produce a large graph with two panels using grid.arrange; the first showing absolute values over time by treatment group; the second showing the change from baseline over time by treatment group. In the second panel, we need a blank facet when vis=1.
# setup
library(ggplot2)
library(plyr)
library(gridExtra)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
id <- c(c(1:150),c(1:100),c(1:50))
val <- rnorm(300)
data <- data.frame(id,trt,vis,val)
base <- with(subset(data,vis==1),data.frame(id,trt,baseval=val))
data <- merge(data,base,by="id")
data <- transform(data,chg=ifelse(vis==1,NA,val-baseval))
data.sum <- ddply(data, .(vis, trt), summarise, N=length(na.omit(val)))
data <- merge(data,data.sum)
data <- transform(data, trtN=paste(trt,N,sep="\n"))
# panel.spacing replaces the deprecated panel.margin in newer ggplot2 releases
mytheme <- theme_bw() + theme(panel.spacing = unit(0, "lines"), strip.background = element_blank())
# no missing facets
plot.a <- ggplot(data) + geom_boxplot(aes(x=trtN,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
# first facet should be blank
plot.b <- ggplot(data) + geom_boxplot(aes(x=trtN,y=chg,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1, scales="free_x") +
labs(x="Visit") + mytheme
grid.arrange(plot.a,plot.b,nrow=2)
You can add a blank layer to draw all the facets in your second plot. The key is that you need a variable that exists for every level of vis to use as your y variable. In your case you can simply use the variable you used in your first plot.
ggplot(data) +
geom_boxplot(aes(x = trtN, y = chg, group = trt, colour = trt), show.legend = FALSE) +
geom_blank(aes(x = trtN, y = val)) +
facet_wrap(~ vis, switch = "x", nrow = 1, scales = "free_x") +
labs(x="Visit") + mytheme
If your variables have different ranges, you can set the y limits using the overall min and max of your boxplot y variable.
+ scale_y_continuous(limits = c(min(data$chg, na.rm = TRUE), max(data$chg, na.rm = TRUE)))
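The limits in that last line can also be computed in one step with range(). A minimal sketch, using a made-up vector chg in place of data$chg:

```r
# Toy change-from-baseline values; NA marks the baseline visit.
chg <- c(NA, -0.4, 1.2, NA, 0.3)

# range() returns c(min, max); na.rm = TRUE skips the missing baseline rows.
y_limits <- range(chg, na.rm = TRUE)

# y_limits can then be passed on as: scale_y_continuous(limits = y_limits)
```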
I'm trying to produce a boxplot of some numeric outcome broken down by treatment condition and visit number, with the number of observations in each box placed under the plot, and the visit numbers labeled as well. Here's some fake data that will serve to illustrate, and I give two examples of things I've tried that didn't quite work.
library(ggplot2)
library(plyr)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
val <- rnorm(300)
data <- data.frame(trt,vis,val)
data.sum <- ddply(data, .(vis, trt), summarise,
N=length(na.omit(val)))
# panel.spacing replaces the deprecated panel.margin in newer ggplot2 releases
mytheme <- theme_bw() + theme(panel.spacing = unit(0, "lines"), strip.background = element_blank())
The below code produces a plot that has N labels where I want them. It does this by grabbing summary data from an auxiliary dataset I created. However, I couldn't figure out how to also label visit on the x-axis (ideally, below the individual box labels), or to delineate visits visually in other ways (e.g. lines separating them into panels).
plot1 <- ggplot(data) +
geom_boxplot(aes(x=vis:trt,y=val,group=vis:trt,colour=trt), show.legend=FALSE) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
The plot below is closer to what I want than the one above, in that it has a nice hierarchy of treatments and visits, and a pretty format delineating the visits. However, for each panel it grabs the Ns from the first row in the summary data that matches the treatment condition, because it doesn't "know" that each facet needs to use the row corresponding to that visit.
plot2 <- ggplot(data) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
One workaround is to manipulate your dataset so your x variable is the interaction between trt and N.
Working off what you already have, you can add N to the original dataset via a merge.
test = merge(data, data.sum)
Then make a new variable that is the combination of trt and N.
test = transform(test, trt2 = paste(trt, N, sep = "\n"))
Now make the plot, using the new trt2 variable on the x axis and using scales = "free_x" in facet_wrap to allow for the different labels per facet.
ggplot(test) +
geom_boxplot(aes(x = trt2, y = val, group = trt, colour = trt), show.legend = FALSE) +
facet_wrap(~ vis, drop = FALSE, switch="x", nrow = 1, scales = "free_x") +
labs(x="Visit") +
mytheme
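For readers without plyr, the data preparation behind this workaround can be sketched in base R. This is an illustrative alternative, not the answerer's code: aggregate() stands in for ddply(), the factors are left unordered for simplicity, and the name counts is introduced here.

```r
set.seed(1)
trt <- factor(rep(LETTERS[1:2], 150))
vis <- factor(c(rep(1, 150), rep(2, 100), rep(3, 50)))
val <- rnorm(300)
data <- data.frame(trt, vis, val)

# Count non-missing observations per visit and treatment
# (the formula interface drops NA rows by default).
counts <- aggregate(val ~ vis + trt, data = data, FUN = length)
names(counts)[names(counts) == "val"] <- "N"

# Attach N to every row, then build the two-line axis label.
test <- merge(data, counts)
test$trt2 <- paste(test$trt, test$N, sep = "\n")
```

Because trt2 differs between visits ("A\n75" in visit 1, "A\n50" in visit 2), each facet needs its own x scale, which is why scales = "free_x" is required.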
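Since this functionality isn't built in, a good workaround is gridExtra:
library(gridExtra)
# lb is not defined in the original post; assumed here to be the label vector from the commented-out paste() call
lb <- paste(data.sum$trt, data.sum$N, sep="\n")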
p1 <- ggplot(data[data$vis==1,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[1:2]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
p2 <- ggplot(data[data$vis==2,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[3:4]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
p3 <- ggplot(data[data$vis==3,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[5:6]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
grid.arrange(p1,p2,p3,nrow=1,ncol=3) # fully customizable
Related:
Varying axis labels formatter per facet in ggplot/R
You can also make them vertical or do other transformations:
I wish to highlight segments above or below a certain value in a time series with a unique colour or shape. In the example data I am decomposing a mortality time series into its components. My goal is to highlight the segments where the mortality in the trend component falls below 35 (the dip between 1997 and 2000) and where the residual component is above 100 (the spike). I have tried to use annotate, but that did not produce what I wanted.
#Load library and obtain data
library(gamair)
library(tsModel)
library(ggplot2)
library(reshape2)
data(chicago)
## create variables, decompose TS
chicago$date<-seq(from=as.Date("1987-01-01"), to=as.Date("2000-12-31"),length=5114)
data<- chicago[,c("date","death")]
mort <- tsdecomp(data$death, c(1, 2, 15, 5114))
## Convert matrix to df, rename, melt
df<-as.data.frame(mort)
names(df)[1] <- "Trend"
names(df)[2] <- "Seasonal"
names(df)[3] <- "Residual"
df$date<-seq(as.Date("1987-01-01"), as.Date("2000-12-31"), "day")
meltdf <- melt(df,id="date")
## Plot
ggplot(meltdf,aes(x=date,y=value,colour=variable,group=variable)) + geom_line() +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none") +
annotate("rect", xmin=as.Date("1995-01-01"), xmax=as.Date("1996-01-01"), ymin=10, ymax=300, alpha = .2, fill="blue")
Well, this works, but I must admit it's more work than I'd hoped.
get.box <- function(data) {
rng <- range(data$date) + c(-50,50)
z <- meltdf[meltdf$date>=rng[1] & meltdf$date <=rng[2] & meltdf$variable==unique(data$variable),]
data.frame(variable=unique(z$variable),
xmin=min(z$date),xmax=max(z$date),ymin=min(z$value),ymax=max(z$value))
}
hilight.trend <- get.box(with(meltdf,meltdf[variable=="Trend" & value<35,]))
hilight.resid <- get.box(with(meltdf,meltdf[variable=="Residual" & value>100,]))
ggplot(meltdf,aes(colour=variable,group=variable)) +
geom_line(aes(x=date,y=value)) +
theme_bw() +
ylab("") + xlab("") +
facet_grid(variable ~ . , scales = "free") +
theme(legend.position = "none") +
geom_rect(data=hilight.trend, alpha=0.2, fill="red",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin)) +
geom_rect(data=hilight.resid, alpha=0.2, fill="blue",
aes(xmax=xmax,xmin=xmin,ymax=ymax,ymin=ymin))
You can't really use annotate(...) with facets, because you will get the same annotation on all the facets. So you're left with something like geom_rect(...). The problem here is that geom_rect(...) draws a rectangle for every row in the data. So you need to create an auxiliary dataset with just one row for each variable, containing the x- and y- min and max.
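The one-row-per-facet idea can be seen on a toy dataset. This is a simplified sketch, not the get.box() function above: it uses made-up values, skips the 50-day padding, and takes the y range from the matching rows only. The names toy and box are introduced here.

```r
# Toy long-format data with the same columns as meltdf.
toy <- data.frame(
  date = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 10), 2),
  variable = rep(c("Trend", "Residual"), each = 10),
  value = c(40, 38, 36, 34, 33, 34, 36, 38, 40, 41,
            0, 5, -3, 120, 130, 4, -2, 1, 0, 3)
)

# Rows of the Trend component that fall below the threshold.
hit <- toy[toy$variable == "Trend" & toy$value < 35, ]

# Collapse them to a single row; the variable column routes the
# rectangle to the matching facet when passed to geom_rect(data = box, ...).
box <- data.frame(
  variable = "Trend",
  xmin = min(hit$date), xmax = max(hit$date),
  ymin = min(hit$value), ymax = max(hit$value)
)
```

Passing hit directly to geom_rect() would draw three overlapping rectangles instead of one, which is the problem the auxiliary dataset avoids.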
I am trying to create a Cleveland dot plot for two categories, in this case J and K. The problem is that the elements A, B, and C appear in both categories, so R keeps farting. I have made a simple example:
library(ggplot2)
x <- c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17])
type <- c(rep("J",10),rep("K",10))
y <- rnorm(n=20,10,2)
data <- data.frame(x,y,type)
data
data$type <- as.factor(data$type)
nameorder <- data$x[order(data$type,data$y)]
data$x <- factor(data$x,levels=nameorder)
ggplot(data, aes(x=y, y=x)) +
geom_segment(aes(yend=x), xend=0, colour="grey50") +
geom_point(size=3, aes(colour=type)) +
scale_colour_brewer(palette="Set1", limits=c("J","K"), guide=FALSE) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
Ideally, I would want a dot plot for both categories(J,K) individually with each factor(vector x) decreasing with respect to the y vector. What ends up happening is that both categories aren't going from biggest to smallest and are erratic at the end instead. Please help!
Unfortunately, factors can only have one set of levels. The only way I've found to do this is to create two separate data.frames from your data and re-level the factor in each. For example:
data <- data.frame(
x = c(LETTERS[1:10],LETTERS[1:3],LETTERS[11:17]),
y = rnorm(n=20,10,2),
type= c(rep("J",10),rep("K",10))
)
data$type <- as.factor(data$type)
J<-subset(data, type=="J")
J$x <- reorder(J$x, J$y, max)
K<-subset(data, type=="K")
K$x <- reorder(K$x, K$y, max)
Now we can plot them with
ggplot(mapping = aes(x=y, y=x, xend=0, yend=x)) +
geom_segment(data=J, colour="grey50") +
geom_point(data=J, size=3, aes(colour=type)) +
geom_segment(data=K, colour="grey50") +
geom_point(data=K, size=3, aes(colour=type)) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
facet_grid(type ~ ., scales="free_y", space="free_y")
which results in
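To see why two data frames are needed, here is a small sketch with made-up values showing that the same labels end up in a different level order in each subset after reorder(). The toy frames J and K below are illustrative and unrelated to the random data above.

```r
# The same three labels appear in both groups, with different y values.
J <- data.frame(x = factor(c("A", "B", "C")), y = c(2, 9, 5))
K <- data.frame(x = factor(c("A", "B", "C")), y = c(8, 1, 4))

# reorder() sorts the levels by the summary statistic (here max) of y.
J$x <- reorder(J$x, J$y, max)
K$x <- reorder(K$x, K$y, max)

levels(J$x)  # "A" "C" "B"
levels(K$x)  # "B" "C" "A"
```

A single factor column cannot hold both orderings at once, which is why each group gets its own data frame and its own layers.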