Formatting output with Knitr, ggplot2 and xtable

I am trying to achieve the following task with Knitr, ggplot2 and xtable:
Generate several annotated plots of beta-distributions with ggplot2
Write the output in a layout such that I have a plot, and a corresponding summary Stats table following it, for every plot.
Write the code such that both PDF and HTML reports can be generated in a presentable way
Here is my attempt at this task (Rnw file):
\documentclass{article}
\begin{document}
Test for ggplot2 with Knitr
<<Initialize, echo=FALSE>>=
library(ggplot2)
library(ggthemes)
library(data.table)
library(grid)
library(xtable)
library(plyr)
pltlist <- list()
statlist <- list()
@
The libraries are loaded. Now run the main loop
<<plotloop, echo=FALSE>>=
for (k in seq(1,7)){
x <- data.table(rbeta(100000,1.6,14+k))
xmean <- mean(x$V1, na.rm=T)
xqtl <- quantile(x$V1, probs = c(0.995), names=F)
xdiff <- xqtl - xmean
dens <- density(x$V1)
xscale <- (max(dens$x, na.rm=T) - min(dens$x, na.rm=T))/100
yscale <- (max(dens$y, na.rm=T))/100
y_max <- max(dens$y, na.rm=T)
y_intercept <- y_max-(10*yscale)
data <- data.frame(x)
y <- ggplot(data, aes(x=V1)) + geom_density(colour="darkgreen", size=2, fill="green",alpha=.3) +
geom_vline(xintercept = xmean, colour="blue", linetype = "longdash") +
geom_vline(xintercept = xqtl, colour="red", linetype = "longdash") +
geom_segment(aes(x=xmean, xend=xqtl, y=y_intercept, yend=y_intercept), colour="red", linetype = "solid", arrow = arrow(length = unit(0.2, "cm"), ends = "both", type = "closed")) +
annotate("text", x = xmean+xscale, y = y_max, label = paste("Val1:",round(xmean,4)), hjust=0) +
annotate("text", x = xqtl+xscale, y = y_max, label = paste("Val2:",round(xqtl,4))) +
annotate("text", x = xmean+10*xscale, y = y_max-15*yscale, label = paste("Val3:",round(xdiff,4))) +
xlim(min(dens$x, na.rm=T), xqtl + 9*xscale) +
xlab("Values") +
ggtitle("Beta Distribution") +
theme_bw() +
theme(plot.title = element_text(hjust = 0, vjust=2))
pltlist[[k]] <- y
statlist[[k]] <- list(mean=xmean, quantile=xqtl)
}
stats <- ldply(statlist, data.frame)
@
Plots are ready. Now Plot them
<<PrintPlots, warning=FALSE, results='asis', echo=FALSE, cache=TRUE, fig.height=3.5>>=
for (k in seq(1,7)){
print(pltlist[[k]])
print(xtable(stats[k,], caption="Summary Statistics", digits=6))
}
@
Plotting Finished.
\end{document}
I am faced with several issues after running this code.
When I run this as plain R code and print the plots from the list, the horizontal line drawn by geom_segment moves all over the place. If I plot the figures individually, without putting them in a list, they look as I would expect.
Only the last plot in the list looks as expected. In all the other plots, the geom_segment line moves around seemingly at random.
I am also unable to put a separate caption for the Plots as I can for the Tables.
Points to note :
I am storing the beta-random numbers in data.table since in our actual code, we are using data.table. However for the purposes of testing ggplot2 in this way, I convert the data.table into a data.frame, as ggplot2 requires.
I also need to generate the random numbers within the loop and generate the plots per iteration (so something like first generating the random numbers and then using melt would not work here), since generating the random numbers is emulating a complex database call per iteration of the loop.
I am using RStudio Version 0.98.1091 and
R version 3.1.2 (2014-10-31) on Windows 8.1
This is the expected Plot:
This is the plot I am getting when plotting from the list:
My output in PDF form :
PDF Output
Please advise if there are any ideas for solutions.
Thank you,
SG

I don't know why the horizontal line in geom_segment is "moving around" from plot to plot, rather than spanning xmean to xqtl. However, I was able to get the horizontal line in the correct location by getting the value from the stats data frame, rather than from direct calculation of the mean and quantile. You just have to create the stats data frame before the loop, rather than after, so that you can use it in the loop.
stats <- ldply(statlist, data.frame)
for (k in seq(1,7)){
...
y <- ggplot(data, aes(x=V1)) +
...
geom_segment(aes(x=stats[k,1], xend=stats[k,2], y=y_intercept, yend=y_intercept),
colour="red", linetype = "solid",
arrow = arrow(length = unit(0.2, "cm"), ends = "both", type = "closed")) +
...
pltlist[[k]] <- y
statlist[[k]] <- list(mean=xmean, quantile=xqtl)
}
Hopefully, someone else will be able to explain the anomalous behavior, but at least this seems to fix the problem.
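One plausible explanation, offered as a sketch rather than a confirmed diagnosis: aes() stores its arguments as unevaluated expressions, and ggplot2 only evaluates them when the plot is built, that is, when it is printed. Since xmean, xqtl and y_intercept are not columns of data, they are looked up in the calling environment at print time, where they hold the values from the last loop iteration. That would match the symptom that only the last plot is correct, and it would also explain why stats[k, ] works: k takes the matching value again in the printing loop. Values passed outside aes(), for example through annotate("segment", ...), are fixed when the plot object is created. The example below uses a smaller sample and a hypothetical variable y_pos to stay short.

```r
library(ggplot2)
library(grid)  # arrow() and unit()

set.seed(1)
pltlist <- list()
for (k in 1:3) {
  x     <- rbeta(10000, 1.6, 14 + k)
  xmean <- mean(x)
  xqtl  <- quantile(x, probs = 0.995, names = FALSE)
  y_pos <- 0.9 * max(density(x)$y)  # height of the horizontal arrow

  pltlist[[k]] <- ggplot(data.frame(V1 = x), aes(x = V1)) +
    geom_density(colour = "darkgreen", fill = "green", alpha = 0.3) +
    # annotate() receives plain numbers, so each plot keeps its own values
    annotate("segment", x = xmean, xend = xqtl, y = y_pos, yend = y_pos,
             colour = "red",
             arrow = arrow(length = unit(0.2, "cm"), ends = "both",
                           type = "closed"))
}

# Printing from the list later should now show a different arrow per plot
print(pltlist[[1]])
```

The same idea applies to geom_vline(xintercept = ...), which already receives its value outside aes() in the question and is therefore not affected.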
For the figure caption, you can add a fig.cap argument to the chunk where you plot the figures, although this results in the same caption for each figure and causes the figures and tables to be plotted in separate groups, rather than interleaved:
<<PrintPlots, warning=FALSE, results='asis', echo=FALSE, cache=TRUE, fig.cap="Caption", fig.height=3.5>>=
for (k in seq(1,7)){
print(pltlist[[k]])
print(xtable(stats[k,], caption="Summary Statistics", digits=6))
}
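A possible refinement, untested against this exact document: as far as I know, fig.cap is evaluated as R code and may be a character vector, with one entry used per plot produced by the chunk, which would give each figure its own caption. The grouping of figures and tables most likely happens because both are LaTeX floats. One way to keep them in source order is to pin them with the H placement from the float package, which needs \usepackage{float} in the preamble. In the sketch below the chunk header is shown as a comment, captions is a hypothetical name, and the stats values are placeholders.

```r
library(xtable)

# One caption per plot; `captions` is a hypothetical helper vector that
# would have to be defined in a chunk before the plotting chunk.
captions <- paste("Beta density, iteration", 1:7)

# Chunk header (Rnw), shown as a comment:
# <<PrintPlots, results='asis', echo=FALSE, fig.cap=captions, fig.pos='H', fig.height=3.5>>=

stats <- data.frame(mean = 0.1, quantile = 0.4)  # placeholder values

# table.placement = "H" asks LaTeX to typeset the table exactly here
print(xtable(stats, caption = "Summary Statistics", digits = 6),
      table.placement = "H")
```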

You might want to use R Markdown and knitr which is easier than using LaTeX and R (as also zhaoy suggested).
You might also want to check out the ReporteRs package. I think it is actually easier to use than knitr. However, you cannot generate PDFs with it. But you can use pandoc to convert them into PDFs.

Related

ggsave cuts off part of the common legend created with ggarrange

I am trying to generate multiple plots from my data by using lapply and then arranging the resulting list with ggarrange. When I try to save the final figure with ggsave part of the legend text is cut off in the png.
First I define what I want to plot along with Plot titles and colors
main.overview <- list(
c("AA", "AA", "black"),
c("X5.HETE", "5-HETE", "red"))
I then define a function to generate the plots.
plot.overview = function(data, mediator) {
analyte <- mediator[[1]]
name <- mediator[[2]]
color <- mediator[[3]]
ggplot(data = data, aes_string(x="Compound",y=analyte)) +
geom_boxplot(aes(fill=Compound)) +
labs(title=name) +
scale_fill_brewer(palette="Reds") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, color = color),axis.title.x = element_blank(),axis.title.y = element_blank())}
Finally I call the function and arrange the plots into a figure
myplots <- lapply(main.overview, plot.overview, data=lm)
arrange <- ggarrange(plotlist = myplots, common.legend = TRUE, nrow=1, legend = "right")
figure <- annotate_figure(arrange, left = text_grob(expression(10^6~cells), rot=90))
ggsave("overview.png", dpi="print", device="png",plot=figure, height=10, width=30, units="cm")
In the final PNG, however, the common legend I put on the right is cut off.
EDIT:
I have figured out part of the problem: it only occurs on my desktop PC and not on my laptop, so it might be caused by additional packages or by different versions of the R libraries.

Dividing long time series in multiple panels with ggplot2

I have a rather long timeseries that I want to plot in ggplot, but it's sufficiently long that even using the full width of the page it's barely readable.
What I want to do instead is to divide the plot into 2 (or more, in the general case) panels one on top of each other.
I could do it manually, but that is cumbersome and makes it hard to give the axes the same scale. Ideally I would like to have something like this:
ggplot(data, aes(time, y)) +
geom_line() +
facet_time(time, n = 2)
And then get something like this:
(This plot was made using facet_wrap(~(year(as.Date(time)) > 2000), ncol = 1, scales = "free_x"), which messes up x axis scale, it works only for 2 panels, and doesn't work well with geom_smooth())
Also, ideally it would also handle summary statistics correctly. For example, using the correct data for geom_smooth() (so facetting wouldn't do it, because at the beginning of every facet it would not use the data in the last chunk of the previous one).
Is there a way to do this?
Thank you!
Below I create two separate plots, one for the period 1982-1999 and one for 1999-2016 and then lay them out using grid.arrange from the gridExtra package. The horizontal axes are scaled equivalently in both plots.
I also generate regression lines outside of ggplot using the loess function so that it can be added using geom_line (you can of course use any regression function here, such as lm, gam, splines, etc). With this approach the regression can be run on the entire time series, ensuring continuity of the regression line across the two panels, even though we break the time series into two halves for plotting.
library(ggplot2) # For the plots themselves
library(dplyr) # For the chaining (%>%) operator
library(purrr) # For the map function
library(gridExtra) # For the grid.arrange function
Function to extract a legend from a ggplot. We'll use this to get one legend across two separate plots.
# http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Fake data
set.seed(255)
dat = data.frame(time=rep(seq(1982,2016,length.out=500),2),
value= c(arima.sim(list(ar=c(0.4, 0.05, 0.5)), n=500),
arima.sim(list(ar=c(0.3, -0.3, 0.6)), n=500)),
group=rep(c("A","B"), each=500))
Generate smoother lines using loess: We want a separate regression line for each level of group, so we use group_by with the chaining operator from dplyr:
dat = dat %>% group_by(group) %>%
mutate(smooth = predict(loess(value ~ time, span=0.1)))
Create a list of two plots, one for each time period: We use map to create separate plots for each time period and return a list with the two plot objects as elements (you can also use base lapply for this instead of map):
pl = map(list(c(1982,1999), c(1999,2016)),
~ ggplot(dat %>% filter(time >= .x[1], time <= .x[2]),
aes(colour=group)) +
geom_line(aes(time, value), alpha=0.5) +
geom_line(aes(time, smooth), size=1) +
scale_x_continuous(breaks=1982:2016, expand=c(0.01,0)) +
scale_y_continuous(limits=range(dat$value)) +
theme_bw() +
labs(x="", y="", colour="") +
theme(strip.background=element_blank(),
strip.text=element_blank(),
axis.title=element_blank()))
# Extract legend as a separate graphics object
leg = g_legend(pl[[1]])
Finally, we lay out both plots (after removing legends) plus the extracted legend:
grid.arrange(arrangeGrob(grobs=map(pl, function(p) p + guides(colour="none")), ncol=1),
leg, ncol=2, widths=c(10,1), left="Value", bottom="Year")
You can do this by storing the plot object, then printing it twice. Each time add an option coord_cartesian:
orig_plot <- ggplot(data, aes(time, y)) +
geom_line()
early <- orig_plot + coord_cartesian(xlim = c(1982, 2000))
late <- orig_plot + coord_cartesian(xlim = c(2000, 2016))
That makes sure that both plots use all the data.
To plot them on the same page, use grid (I got this from the ggplot2 book, which is probably around as a pdf somewhere):
library(grid)
vp1 <- viewport(width = 1, height = .5, just = c("center", "bottom"))
vp2 <- viewport(width = 1, height = .5, just = c("center", "top"))
print(early, vp = vp1)
print(late, vp = vp2)
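The coord_cartesian approach generalises to more than two panels. The following is a minimal sketch, assuming equally wide windows are acceptable; split_panels is a hypothetical helper, not part of ggplot2, and the data are simulated. Because coord_cartesian only zooms, every panel is still built from the full data set, so the smoother is fitted once over the whole series and the y axis is shared.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: cut the x-range into n equal windows and
# return one zoomed copy of the plot per window.
split_panels <- function(p, xrange, n = 2) {
  breaks <- seq(xrange[1], xrange[2], length.out = n + 1)
  lapply(seq_len(n), function(i) {
    p + coord_cartesian(xlim = c(breaks[i], breaks[i + 1]))
  })
}

# Simulated series for illustration only
set.seed(42)
dat <- data.frame(time = seq(1982, 2016, length.out = 500),
                  y = cumsum(rnorm(500)))

base_plot <- ggplot(dat, aes(time, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2)

panels <- split_panels(base_plot, range(dat$time), n = 3)
grid.arrange(grobs = panels, ncol = 1)  # stack the windows vertically
```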

arbitrary number of plots for grid.arrange

I'm trying to plot an arbitrary number of bar plots with rmarkdown, arranged in 2 columns. In my example there will be 20 plots in total, so I was hoping to get 10 plots in each column. However, I can't seem to get this to work with grid.arrange.
plot.categoric = function(df, feature){
df = data.frame(x=df[,feature])
plot.feature = ggplot(df, aes(x=x, fill = x)) +
geom_bar() +
geom_text(aes(label=scales::percent(..count../1460)), stat='count', vjust=-.4) +
labs(x=feature, fill=feature) +
ggtitle(paste0(length(df$x))) +
theme_minimal()
return(plot.feature)
}
plist = list()
for (i in 1:20){
plist = c(plist, list(plot.categoric(train, cat_features[i])))
}
args.list = c(plist, list(ncol=2))
do.call("grid.arrange", args.list)
When I knit this to html I'm getting the following output:
I was hoping I would get something along the lines of:
but even with this the figure sizes are still funky, I've tried playing with heights and widths but still no luck. Apologies if this is a long question
If you have all the ggplot objects in a list then you can easily build the two column graphic via gridExtra::grid.arrange. Here is a simple example that will put eight graphics into a 4x2 matrix.
library(ggplot2)
library(gridExtra)
# Build a set of plots
plots <-
lapply(unique(diamonds$clarity),
function(cl) {
ggplot(subset(diamonds, clarity %in% cl)) +
aes(x = carat, y = price, color = color) +
geom_point()
})
length(plots)
# [1] 8
grid.arrange(grobs = plots, ncol = 2)
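If the panels look squashed in the knitted document, one thing to try is scaling the chunk's figure height with the number of rows, since chunk options are evaluated as R expressions. This is a sketch under assumptions: fig_h, n_col and n_row are hypothetical names, and 2.5 inches per row is an arbitrary starting value.

```r
library(ggplot2)
library(gridExtra)

plots <- lapply(unique(diamonds$clarity), function(cl) {
  ggplot(subset(diamonds, clarity %in% cl),
         aes(x = carat, y = price, colour = color)) +
    geom_point()
})

n_col <- 2
n_row <- ceiling(length(plots) / n_col)  # also covers an odd number of plots
fig_h <- 2.5 * n_row                     # inches; 2.5 per row is arbitrary

# In R Markdown the plotting chunk header could then read
#   {r, fig.height = fig_h, fig.width = 8}
# provided fig_h is defined in an earlier chunk.
grid.arrange(grobs = plots, ncol = n_col)
```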

Saving ggplots to a list in a for loop

I produce nine ggplots within a for loop and subsequently arrange those plots using grid.arrange:
plot_list <- list()
for(i in c(3:ncol(bilanz.vol))) {
histogram <- ggplot(data = bilanz.vol, aes(x = bilanz.vol[,i])) +
geom_histogram() +
scale_x_log10() +
ggtitle(paste(varnames[i]))
# ggsave(filename = paste("Graphs/", vars[i], ".png", sep = ""), width = 16, height = 12, units = "cm")
plot_list <- c(plot_list, list(histogram))
}
library(gridExtra)
png(filename = "Graphs/non-mfi.png", width = 1280, height = 960, units = "px")
do.call(grid.arrange, c(plot_list, list(ncol = 3)))
dev.off()
The code itself works fine and there are no errors. But for some reason I do not understand, the grid shows the same (last) histogram nine times. Still, each plot shows the correct title.
Interestingly, when I uncomment the ggsave line in the code above, each plot is saved correctly (separately) and shows the expected histogram.
Any ideas?
The reason is that ggplot does not evaluate the expression in the aes call until the plot is drawn (at least, so I believe); it just sets up the plot and stores the data inside it. In your case "the data" is the entire data frame bilanz.vol, and since i = ncol(bilanz.vol) after the for loop completes, the expression bilanz.vol[,i] evaluates to the same thing for all plot objects.
To make it work you could do this, which makes sure each plot object contains its own data set my.data:
my.data <- data.frame(x = bilanz.vol[,i])
histogram <- ggplot(data = my.data, aes(x = x)) +
geom_histogram() +
scale_x_log10() +
ggtitle(paste(varnames[i]))
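An alternative sketch, assuming a reasonably recent ggplot2 (3.0 or later): build each plot inside a function and refer to the column by name through the .data pronoun. Each function call has its own environment, so the column name is fixed per plot even though aes() is still evaluated lazily. make_hist is a hypothetical helper and mtcars stands in for bilanz.vol.

```r
library(ggplot2)

# Hypothetical helper: one histogram for the column named `col`
make_hist <- function(df, col) {
  ggplot(df, aes(x = .data[[col]])) +
    geom_histogram(bins = 30) +
    ggtitle(col)
}

# mtcars is used here only as a stand-in data set
plot_list <- lapply(names(mtcars)[1:3], make_hist, df = mtcars)

# Each element now draws its own column
print(plot_list[[2]])
```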

gwidgets ggraphics cutting edge of ggplot

This may be something really obvious, but I am struggling to find a good resource explaining how to use features of gwidgets. With some help I have this script which creates checkboxes which alter a list of file names which are then used to create a plot of the checked files using ggplot. The problem is that the plot is getting cut off at the right edge and I have no idea how to fix this.
EDIT: I see some of you have been busy down-rating me, but now this should work if you run it with the file I provided. I have a suspicion that the problem arises from cairoDevice and the way ggraphics renders the plot.
MeanFrameMelt <- read.table("foo.csv", header = TRUE, sep = ",", row.names=1)
p <- ggplot(MeanFrameMelt, aes(x=variable, y=value, color=Legend, group=Legend))+
geom_line()+
theme(panel.background = element_rect(fill='NA', colour='black', size = 1),
legend.position = "none")+
ylab("Tag Density (mean coverage/bp)")+
xlab("Distance from reference side (bp)")+
scale_x_discrete(breaks=c("V1", "V200", "V400"), labels=c("-10000", "0", "10000"))
GraphFiles <- FileNamesOrig
w <- gwindow("Tag Density Checkboxes", width = 1000)
g <- ggroup(container = w, horizontal = FALSE)
add(g, ggraphics())
lyt <- glayout(container = g, horizontal = FALSE)
print(p)
foo.cvs (this is the MeanFrameMelt)
EDIT 2:
This is what the graph looks like for me. I don't know what is going on, I am exporting the data.frame with this command:
write.table(MeanFrameMelt, file="test.cvs", sep=",", col.names=TRUE)
but then when I run it with the exported file I get exactly what agstudy got. The files are supposed to be identical.
EDIT 3:
Tested it with dput (thank you for the suggestion) and now it's creating the correct plot:
New file
Use dget(file="test.txt")
I just reorganized your code, but I can't reproduce the problem. You have to call the plot actions inside a handler to interact with the user later (e.g. zoom, mouse events). I show an example here.
The first time you run it, you get the plot with an ugly axis. Then, when you click in a region, the plot is refreshed and you get a nice axis.
## I define my plot
p <- ggplot(MeanFrameMelt, aes(x=variable, y=value, color=Legend, group=Legend))+
geom_line()+
theme(panel.background = element_rect(fill='NA', colour='black', size = 1),
legend.position = "none")+
ylab("Tag Density (mean coverage/bp)")+
xlab("Distance from reference side (bp)")
## init gwidgets
library(gWidgetsRGtk2)
w <- gwindow("Tag Density Checkboxes", width = 1000)
g <- ggroup(container = w, horizontal = FALSE)
gg <- ggraphics(container=g)
lyt <- glayout(container = g, horizontal = FALSE)
## I plot it the first time
print(p)
## I add a handler
ID <- addHandlerChanged(gg, handler=function(h,...) {
p <- p + scale_x_discrete(breaks=c("V1", "V200", "V400"),
labels=c("-1000", "0", "1000"))
print(p)
})
print(p)
