ggplot2 histogram legend too large - r

I'm pretty happy with the results I'm getting with R. Most of my stacked histogram plots are looking fine.
However, a few of them have so many categories in the legend that the legend is crowding out the plot.
How can I fix this?
Here is my plot.r, which I call on the command line like this:
RScript plot.r foo.dat foo.png 1600 800
foo.dat
account,operation,call_count,day
cal3510,foo-method,1,2016-10-01
cra4617,foo-method,1,2016-10-03
cus4404,foo-method,1,2016-10-03
hin4510,foo-method,1,2016-10-03
mas4484,foo-method,1,2016-10-04
...
entirety of foo.dat: http://pastebin.com/xnJtJSrU
plot.r
library(ggplot2)
library(scales)
args<-commandArgs(TRUE)
filename<-args[1]
png_filename<-args[2]
wide<-as.numeric(args[3])
high<-as.numeric(args[4])
print(wide)
print(high)
print(filename)
print(png_filename)
dat = read.csv(filename)
dat$account = as.character(dat$account)
dat$operation = as.character(dat$operation)
dat$call_count = as.integer(dat$call_count)
dat$day = as.Date(dat$day)
png(png_filename,width=wide,height=high)
p <- ggplot(dat, aes(x=day, y=call_count, fill=account))
p <- p + geom_histogram(stat="identity")
p <- p + scale_x_date(labels=date_format("%b-%Y"), limits=as.Date(c('2016-10-01','2017-01-01')))
print(p)
dev.off()

Answer from @PierreLafortune, using:
p <- p + theme(legend.position="bottom")
p <- p + guides(fill=guide_legend(nrow=5, byrow=TRUE))
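For anyone who wants to try the two lines above without the original foo.dat, here is a minimal sketch on made-up data. The 30 account names and the Poisson call counts are assumptions for illustration only, and geom_col() stands in for geom_histogram(stat="identity").

```r
library(ggplot2)

# Synthetic stand-in for foo.dat (hypothetical data): 30 accounts over 10 days
set.seed(1)
dat <- data.frame(
  account    = rep(sprintf("acc%02d", 1:30), each = 10),
  day        = rep(seq(as.Date("2016-10-01"), by = "day", length.out = 10), times = 30),
  call_count = rpois(300, 3)
)

p <- ggplot(dat, aes(x = day, y = call_count, fill = account)) +
  geom_col() +                                          # bars from precomputed counts
  theme(legend.position = "bottom") +                   # move the legend under the panel
  guides(fill = guide_legend(nrow = 5, byrow = TRUE))   # wrap legend keys into 5 rows

# print(p)  # draw it on the current device
```

With 30 categories and nrow=5 the keys are laid out in 6 columns under the plot, so the panel keeps the full width. If you have even more categories, raising nrow is probably the first thing to try.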

Related

ggplot2 can't draw a correct plot with only two or three data points

I'm using R to generate some plots of some metrics and getting nice results like this for data that has > 3 data points:
However, I'm noticing that for data with only a few values - I get very poor results.
If I draw a plot with only two data points, I get a blank plot.
foo_two_points.dat
cluster,account,current_database,action,operation,count,day
cluster19,col0063,col0063,foo_two,two_bar,10,2016-10-04 00:00:00-07:00
cluster61,dwm4944,dwm4944,foo_two,two_bar,2,2016-12-14 00:00:00-08:00
If I draw one data point, it works.
foo_one_point.dat
cluster,account,current_database,action,operation,count,day
cluster1,foo0424,foo0424,fooone,,2,2016-11-01 00:00:00-07:00
With three, it almost works, but isn't accurate.
foo_three_points.dat
cluster,account,current_database,action,operation,count,day
cluster23,col2225,col2225,foo_three,bar,9,2016-12-22 00:00:00-08:00
cluster23,col2225,col2225,foo_three,bar,1,2016-12-29 00:00:00-08:00
cluster12,red1782,red1782,foo_three,bar,2,2016-10-25 00:00:00-07:00
4, 5, etc. all seem fine
But two or three points - nope.
Here is my plot.r file:
library(ggplot2)
library(scales)
args<-commandArgs(TRUE)
filename<-args[1]
n = nchar(filename) - 4
thetitle = substring(filename, 1, n)
print(thetitle)
png_filename <- stringi::stri_flatten(stringi::stri_join(c(thetitle,'.png')))
wide<-as.numeric(args[2])
high<-as.numeric(args[3])
legend_left<-as.numeric(args[4])
pos <- if(legend_left == 1) c(1,0) else c(0,1)
place <- if(legend_left == 1) 'left' else 'right'
print(wide)
print(high)
print(filename)
print(png_filename)
dat = read.csv(filename)
dat$account = as.character(dat$account)
dat$action=as.character(dat$action)
dat$operation = as.character(dat$operation)
dat$count = as.integer(dat$count)
dat$day = as.Date(dat$day)
dat[is.na(dat)]<-"N/A"
png(png_filename,width=wide,height=high)
p <- ggplot(dat, aes(x=day, y=count, fill=account, labels=TRUE))
p <- p + geom_histogram(stat="identity")
p <- p + scale_x_date(labels=date_format("%b-%Y"), limits=as.Date(c('2016-10-01','2017-01-01')))
p <- p + theme(legend.position="bottom")
p <- p + guides(fill=guide_legend(nrow=5, byrow=TRUE))
p <- p + theme(text = element_text(size=15))
p<-p+labs(title=thetitle)
print(p)
dev.off()
Here's the command I use to run it:
RScript plot.r foo_five_points.dat 1600 800 0
What am I doing wrong?
I don't know if this is a bug. I think it is actually by design: the bars are getting clipped because they spill over the limits.
I also think this is more of a geom_bar than a geom_histogram, as this doesn't seem to be distribution data, but that is irrelevant to the issue; both behave the same.
One solution is to set the width parameter explicitly in geom_histogram instead of letting it be calculated:
p <- ggplot(dat, aes(x=day, y=count, fill=account, labels=TRUE))
p <- p + geom_histogram(stat="identity",width=1)
p <- p + scale_x_date(labels=date_format("%b-%Y"), limits=as.Date(c('2016-10-1','2017-01-01')))
p <- p + theme(legend.position="bottom")
p <- p + guides(fill=guide_legend(nrow=5, byrow=TRUE))
p <- p + theme(text = element_text(size=15))
p<-p+labs(title=thetitle)
Then your two-point example that is blank above gives you a plot with both bars, which seems right.
I can't be sure that setting the width explicitly will work when you have a lot of data and the bars need to keep getting smaller, though. I suppose you could set it conditionally.
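To make the clipping explanation concrete, here is a rough base-R sketch of the arithmetic. It assumes the default bar width is 90% of the smallest gap between distinct x values (which is how the geom_bar documentation describes it) and that a bar whose edge falls outside the scale limits gets dropped. The helper name spills is made up for this example.

```r
# Hypothetical helper: TRUE for each day whose default-width bar would
# extend past the x limits (assumes width = 0.9 * smallest gap between days).
spills <- function(days, limits) {
  d    <- sort(unique(as.numeric(days)))
  gap  <- if (length(d) > 1) min(diff(d)) else 1   # a single point falls back to 1
  half <- 0.9 * gap / 2
  (as.numeric(days) - half < as.numeric(limits[1])) |
    (as.numeric(days) + half > as.numeric(limits[2]))
}

limits <- as.Date(c("2016-10-01", "2017-01-01"))
one    <- as.Date("2016-11-01")
two    <- as.Date(c("2016-10-04", "2016-12-14"))
three  <- as.Date(c("2016-12-22", "2016-12-29", "2016-10-25"))

spills(one, limits)    # FALSE
spills(two, limits)    # TRUE TRUE
spills(three, limits)  # FALSE TRUE FALSE
```

Under those assumptions the two-point file has a 71-day gap, so each bar is about 64 days wide and both cross a limit, which would explain the blank plot. In the three-point file only the 2016-12-29 bar crosses the upper limit, which fits the "almost works" description.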

ggplot, drawing multiple lines across facets

I drew two panels in a column using ggplot2 facet, and would like to add two vertical lines across the panels at x = 4 and 8. The following is the code:
library(ggplot2)
library(gtable)
library(grid)
dat <- data.frame(x=rep(1:10,2),y=1:20+rnorm(20),z=c(rep("A",10),rep("B",10)))
P <- ggplot(dat,aes(x,y)) + geom_point() + facet_grid(z~.) + xlim(0,10)
Pb <- ggplot_build(P);Pg <- ggplot_gtable(Pb)
for (i in c(4,8)){
  Pg <- gtable_add_grob(Pg, moveToGrob(i/10,0),t=8,l=4)
  Pg <- gtable_add_grob(Pg, lineToGrob(i/10,1),t=6,l=4)
}
Pg$layout$clip <- "off"
grid.newpage()
grid.draw(Pg)
The above code is modified from: ggplot, drawing line between points across facets.
There are two problems in this figure. First, only one vertical line is shown. It seems that moveToGrob only worked once. Second, the line that is shown is not exactly at x = 4. I didn't find the Pb$panel$ranges variable, so is there a way that I can correct the range as well? Thanks a lot.
Updated to ggplot2 V3.0.0
In the simple scenario where panels have common axes and the lines extend across the full y range, you can draw lines over the whole gtable cells, once you have found the correct npc coordinate conversion (cf. previous post, updated because ggplot2 keeps changing):
library(ggplot2)
library(gtable)
library(grid)
dat <- data.frame(x=rep(1:10,2),y=1:20+rnorm(20),z=c(rep("A",10),rep("B",10)))
p <- ggplot(dat,aes(x,y)) + geom_point() + facet_grid(z~.) + xlim(0,10)
pb <- ggplot_build(p)
pg <- ggplot_gtable(pb)
data2npc <- function(x, panel = 1L, axis = "x") {
  range <- pb$layout$panel_params[[panel]][[paste0(axis,".range")]]
  scales::rescale(c(range, x), c(0,1))[-c(1,2)]
}
start <- sapply(c(4,8), data2npc, panel=1, axis="x")
pg <- gtable_add_grob(pg, segmentsGrob(x0=start, x1=start, y0=0, y1=1, gp=gpar(lty=2)), t=7, b=9, l=5)
grid.newpage()
grid.draw(pg)
You can just use geom_vline and avoid the grid mess altogether:
ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_vline(xintercept = c(4, 8)) +
  facet_grid(z ~ .) +
  xlim(0, 10)

How to set width of y-axis tick labels in ggplot2 in R?

I would like to left align the plot panels in a vertical array of ggplot2 graphs in R. The maximum width of the y-axis tick labels varies from graph to graph, breaking this alignment, as shown in the sample code below.
I've tried various plot, panel, and axis.text margin options without success, and have not been able to find an option for controlling the width of the y-axis tick labels.
Guidance appreciated.
#install.packages(c("ggplot2", "gridExtra", "reshape2"), dependencies = TRUE)
require(ggplot2)
require(gridExtra)
require(reshape2)
v <- 1:5
data1 <- data.frame(x=v, y=v)
data2 <- data.frame(x=v, y=1000*v)
plot1 <- ggplot(data=melt(data1, id='x'), mapping=aes_string(x='x', y='value')) + geom_line()
plot2 <- ggplot(data=melt(data2, id='x'), mapping=aes_string(x='x', y='value')) + geom_line()
grid.arrange(plot1, plot2, ncol=1)
You can use the plot_grid() function from the cowplot package to align the plots:
# install.packages(c("ggplot2", "cowplot", "reshape2"), dependencies = TRUE)
library(cowplot)
plot_grid(plot1,plot2,ncol=1,align="v")
Would something like this work for you?
data1$Data <- "data1"
data2$Data <- "data2"
data3 <- rbind(data1, data2)
ggplot(data=data3, aes(x=x, y=y)) + geom_line() + facet_grid(Data~., scales = "free_y")
Like this? (code below)
# install.packages(c("ggplot2", "gridExtra", "reshape2"), dependencies = TRUE)
require(ggplot2)
require(gridExtra)
require(reshape2)
v <- 1:5
data1 <- data.frame(x=v, y=v)
data2 <- data.frame(x=v, y=1000*v)
plot1 <- ggplot(data=melt(data1, id='x'), mapping=aes_string(x='x', y='value')) + geom_line() + scale_y_continuous(breaks=NULL)
plot2 <- ggplot(data=melt(data2, id='x'), mapping=aes_string(x='x', y='value')) + geom_line() + scale_y_continuous(breaks=c(1000,2000))
grid.arrange(plot1, plot2, ncol=1)

write plots to png file

With ggplot2 I can store the output of a ggplot command in an object and call that object within grid.arrange to write it to a file in an R script, as below:
p<-ggplot(x, aes(x=Date, y=Date)) + geom_bar(aes(x=Date,y=Data))
png("data.png", height=700, width=650)
grid.arrange(p, main=textGrob("Data"), gp=gpar(cex=2))
dev.off()
I am creating a bunch of forecast graphs using plot, but I cannot do the same thing there. Does anyone have a suggestion for how I can write the output of plot to a png file in a script?
We don't have data to work with and the question's not clear, so here's an example of what I think the OP is after (a separate file for each plot) using the mtcars data set:
dat <- split(mtcars, mtcars$cyl)
lapply(dat, function(x) {
  ggplot(x, aes(mpg, disp, colour=gear)) + geom_point()
})
# a way to get separate plots for each plot
plot2 <- function(theplot, name, ...) {
  name <- paste0(name, ".png")
  png(filename=name)
  print(theplot)
  dev.off()
} # plotting function
lapply(seq_along(dat), function(i) {
  x <- dat[[i]]
  z <- ggplot(x, aes(mpg, disp, colour=gear)) + geom_point()
  plot2(z, name=paste0("TEST", names(dat)[i]))
})
data <- data.frame(x=1:10,y=rnorm(10))
p <- ggplot(data, aes(x,y)) + geom_point()
p
library(gridExtra)
# Loading required package: grid
grid.arrange(p,p,p)
ggsave('~/Desktop/grid.png')
Does this approach not work with forecast graphs?
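If the forecast graphs are drawn with base plot() rather than ggplot, the same png()/dev.off() bracket should work without print() or grid.arrange. A minimal sketch, using the built-in AirPassengers series as a stand-in for the forecast data (the file name and the choice of series are assumptions for illustration):

```r
# Write a base-graphics plot straight to a PNG file.
out <- file.path(tempdir(), "base_plot.png")   # hypothetical output path

png(out, width = 650, height = 700)            # open the PNG device
plot(AirPassengers, main = "Data")             # base plot() draws on the open device
dev.off()                                      # close the device so the file is written

file.exists(out)
```

To produce one file per graph, open and close a device around each plot() call, the same way the plot2() helper above does for ggplot objects.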

cut off density plot in ggplot2

I use
frame <- read.table(paste('data', fname, sep="/"), sep=",", header=TRUE)
colnames(frame) <- c("pos", "word.length")
plot <- ggplot(frame, aes(x=pos, y=word.length)) + xlim(0,20) + ylim(0,20) + geom_density2d() + stat_density2d(aes(color=..level..))
png(paste("graphs/", fname, ".png", sep=""), width=600, height=600)
print(plot)
dev.off()
to create plots, but they get cut off. How do I fix this?
http://ompldr.org/vZTN0eQ
The data I used to create this plot: http://sprunge.us/gKiL
According to the ggplot2 book, you use scale_x_continuous(limits=c(1,20)) instead of xlim(1,20) for that.

Resources