fix the width of the grid in ggplot2 - r

I am trying to plot two plots side by side with ggplot. However, as the two data sets contain different numbers of observations, the width of the grid is automatically adjusted so that the overall width of the two plots is the same.
However, I need the same spacing between adjacent vertical grid lines in both plots, with different total widths to reflect the different sample sizes in the two cases.
Thanks in advance for any advice.
Here is my code:
dat1 <- data.frame(a=1:10,b=letters[1:10])
dat2 <- data.frame(a=1:6, b=letters[12:17])
require(gridExtra)
plot1 <- ggplot(dat1, aes(x=b, y=a)) + geom_point(size=4)
plot2 <- ggplot(dat2, aes(x=b, y=a)) + geom_point(size=4)
grid.arrange(plot1, plot2, ncol=2)

You can set widths in grid.arrange(). The following automatically adjusts the relative width of the two graphs based on the number of categories in each data frame.
I renamed your variables so that column names are not repeated across the two data frames:
dat1 <- data.frame(A1=1:10,B1=letters[1:10])
dat2 <- data.frame(A2=1:6, B2=letters[12:17])
Load the necessary libraries:
require(gridExtra)
require(ggplot2)
Now plot:
plot1 <- ggplot(dat1, aes(x=B1, y=A1)) + geom_point(size=4)
plot2 <- ggplot(dat2, aes(x=B2, y=A2)) + geom_point(size=4)
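# Count the categories directly: since R 4.0 data.frame() keeps strings as
# character vectors, for which nlevels() returns 0
grid.arrange(plot1, plot2, ncol=2,
             widths=c(length(unique(dat1$B1)), length(unique(dat2$B2))))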
Which gives you the following plot:
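As you can see, adjacent levels are almost equally spaced across both plots. The match is approximate because each plot's share of the width also includes its y-axis labels and margins.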
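If the two plots do not have to be separate objects, one alternative (a sketch, not part of the original answer) is to stack both data frames and let facet_grid() size the panels. With scales = "free_x" and space = "free_x", each panel's width is proportional to the number of x positions it shows, so the spacing between levels is the same in both panels. The set column and the both data frame are names made up for this illustration.

```r
library(ggplot2)

dat1 <- data.frame(a = 1:10, b = letters[1:10])
dat2 <- data.frame(a = 1:6,  b = letters[12:17])

# 'set' is a hypothetical label column marking which data set a row came from
both <- rbind(cbind(dat1, set = "dat1"),
              cbind(dat2, set = "dat2"))

# scales = "free_x" drops unused levels in each panel; space = "free_x"
# makes each panel's width proportional to the number of levels it shows
p <- ggplot(both, aes(x = b, y = a)) +
  geom_point(size = 4) +
  facet_grid(. ~ set, scales = "free_x", space = "free_x")
print(p)
```

The trade-off is that both panels share one y axis and one set of theme settings, which may or may not suit the layout you need.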

Related

R ggplot2 overlapping histogram, adding in legend for overlapping part

I have a histogram that is plotting 2 different groups with some overlap between them. I have been able to manually color the groups and a legend is generated for each group, however I am asking how to add into the legend a color and label for the overlapping part?
For example, in the above histogram I would like to add a legend for the purplish part where A and B overlap (which should be labeled as "Overlap" in the legend, underneath B).
Code for generating above histogram:
set.seed(42)
n <- 100
dat <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
x=rnorm(n))
ggplot(dat, aes(x=x, fill=group)) + geom_histogram(alpha=.5, position="identity") +
scale_fill_manual(values=c("blue","red"))
A solution with partially overlapping bars
Sample code:
library(ggplot2)
ggplot(dat, aes(x=x, fill=group)) +
geom_histogram(position = position_dodge(width = 0.6))+
scale_fill_manual(values=c("blue","red"))+
scale_y_continuous(expand=c(0,0))+
theme_bw()
Plot:

ggplot facet_grid with different y axis scales: reverse axis for a facet panel

I have got four plots with all the same x axis (Time) but different y axis. So I used
library(ggplot2)
library(reshape2) # provides melt()
Gio.m <- melt(Gio, id="AGE")
ggplot(Gio.m[!is.na(Gio.m$value),], aes(x=AGE, y=value, group=1))+
geom_line(aes(color=variable)) +
facet_grid(variable ~ ., scales="free_y") +
theme(legend.position="none")
to make a grid with four scatterplots.
The result looks like this:
My first question is how to stop the output from showing every single y-value on the axis.
My second question is whether it is possible to flip the axis of only one plot within the grid (so that this one panel has a reversed y-axis).
Thanks a lot for your help, and if I should provide more information on the data, please let me know.
For your first question, as already mentioned by @Roman, you most probably have categorical data in the column value after melting the Gio table. To fix that, transform it back to numeric:
if value is character, run Gio.m$value <- as.numeric(Gio.m$value)
if value is a factor, run Gio.m$value <- as.numeric(levels(Gio.m$value))[Gio.m$value] as pointed out here
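A small made-up example shows why the factor case needs the detour through levels(): calling as.numeric() directly on a factor returns the internal level codes rather than the values that are printed.

```r
# Hypothetical values, stored as a factor the way melted data sometimes is
f <- factor(c("10", "2.5", "10", "7"))

codes  <- as.numeric(f)             # internal level codes, not the data
values <- as.numeric(levels(f))[f]  # the numbers the labels represent

print(codes)
print(values)
```

Only values holds the original numbers; codes simply indexes into the sorted levels.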
For the second question - not sure if I understand correctly, but one solution could be this:
1) Generate a plot example and its version with reversed OY axis:
library(ggplot2)
library(grid)
# Plot 1
p1 <- ggplot(mpg, aes(cty, displ)) + geom_point() + facet_grid(drv ~ cyl)
# Plot 2 = plot 1 with OY reversed
p2 <- p1 + scale_y_reverse()
2) Get the grid layout and identify grobs:
# Generate the ggplot2 plot grob for each case
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(p2)
# Draw a diagram of the grid layout; helpful for identifying grobs
grid.show.layout(gtable:::gtable_layout(g1))
# or reduce the font if more practical
grid.show.layout(gtable:::gtable_layout(g1), vp = viewport(gp = gpar(cex=0.7)))
# Check also the layout
g1$layout
Checking and visualizing the layout structure as above can help with identifying the wanted grobs. Here, I want to identify the names of the top panel grobs, so that I replace them with the ones from the graph with reversed OY.
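As a quick alternative to reading the whole layout table, the panel and left-axis names can be listed directly. This is a small sketch that uses only the name column of the layout; the exact names returned depend on the installed ggplot2 version, and panel_names and axis_names are names introduced here.

```r
library(ggplot2)

p1 <- ggplot(mpg, aes(cty, displ)) + geom_point() + facet_grid(drv ~ cyl)
g1 <- ggplotGrob(p1)

# Names of all panel grobs and of the left-hand axis grobs
panel_names <- grep("^panel", g1$layout$name, value = TRUE)
axis_names  <- grep("^axis-l", g1$layout$name, value = TRUE)

print(panel_names)
print(axis_names)
```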
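3) Replace the grobs. This replaces the top row of panels of plot 1 (p1) with the ones from p2, which has the OY axis reversed. The matching axis needs to be replaced as well. The panel names below are the ones from my layout; check g1$layout$name for the names in yours.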
# Replace the panels from g1 with the ones from g2
panels <- c('panel-1-1', 'panel-4-1', 'panel-3-2', 'panel-2-3')
for (p in panels){
g1$grobs[grep(p, g1$layout$name)] <- g2$grobs[grep(p, g2$layout$name)]
}
# Also replace the axis corresponding to those panels
g1$grobs[grep('axis-l-1', g1$layout$name)] <- g2$grobs[grep('axis-l-1', g2$layout$name)]
Check the results:
p1 # the original plot
grid.newpage(); grid.draw(g1) # the edited plot with top panels having OY reversed
I just realized that you facet by only one variable rather than two. In that case it is a bit less complex:
p1 <- ggplot(mpg, aes(cty, displ)) + geom_point() + facet_grid(cyl ~ ., scales="free_y")
p2 <- p1 + scale_y_reverse()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(p2)
g1$grobs[grep("panel-1-1", g1$layout$name)] <- g2$grobs[grep("panel-1-1", g2$layout$name)]
g1$grobs[grep('axis-l-1', g1$layout$name)] <- g2$grobs[grep('axis-l-1', g2$layout$name)]

ggplot2 width of boxplot

I was trying to make 2 separate plots which I want to present side by side in my poster (I need to make them separate and cannot make use of facet_wrap). One of the plots has several boxplots, while the second plot only has one. How can I manipulate the width of the boxplots such that the second boxplot is the same dimension as the width of any one of the individual boxplots in plot 1, when I put the two plots side by side? A reproducible example:
tvalues <- sample(1:10000,1200)
sex <- c(rep('M',600),rep('F',600))
region <- c('R1','R2','R3','R4','R5')
df1 <- data.frame(tvalues,sex,region)
tvalues2 <- sample(1:10000,200)
sex2 <- sample(c('M','F'),200,replace=T)
region2 <- 'R6'
df2 <- data.frame(tvalues2,sex2,region2)
p1 <- ggplot(data=df1,aes(x=region,y=tvalues,color=sex)) +
geom_boxplot(width=0.5)
p2 <- ggplot(data=df2,aes(x=region2,y=tvalues2,color=sex2)) +
geom_boxplot(width=0.5)
Plot 1
Plot2
I suggest dividing the width of the boxes in the second plot by the number of categories of region in the first plot.
p2 <- ggplot(data=df2,aes(x=region2,y=tvalues2,color=sex2)) +
geom_boxplot(width=0.5/length(unique(df1$region)))
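For the example data the arithmetic works out as follows (a short check; n_regions and box_width are names introduced here). Whether the boxes then match exactly also depends on the two plots being drawn with equal panel widths.

```r
# The five region labels from the question's example
region <- c('R1', 'R2', 'R3', 'R4', 'R5')

n_regions <- length(unique(region))  # number of categories in plot 1
box_width <- 0.5 / n_regions         # width to pass to geom_boxplot() in plot 2

print(n_regions)
print(box_width)
```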
In case of a single boxplot like in the following example:
a<- data.frame(obs=rep("A", 50),
value=rnorm(50, 100, 50))
ggplot(a, aes(y=value))+
geom_boxplot()
Wide boxplot
We can add a dummy x aesthetic and set x-axis limits, so that the width option of geom_boxplot() determines the width of the box:
ggplot(a, aes(y=value, x=0))+
geom_boxplot(width=0.7) +
xlim(-1,1)
Thinner boxplot
You can add the following to remove all x-axis text and ticks:
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())

Plotting continuous and discrete series in ggplot with facet

I have data that plots over time with four different variables. I would like to combine them in one plot using facet_grid, where each variable gets its own sub-plot. The following code resembles my data and the way I'm presenting it:
require(ggplot2)
require(reshape2)
subm <- melt(economics, id='date', c('psavert','uempmed','unemploy'))
mcsm <- melt(data.frame(date=economics$date, q=quarters(economics$date)), id='date')
mcsm$value <- factor(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line() +
facet_grid(variable~., scale='free_y') +
geom_step(data=mcsm, aes(date, value)) +
scale_y_discrete(breaks=levels(mcsm$value))
If I leave out scale_y_discrete, R complains that I'm trying to combine a discrete value with a continuous scale. If I include scale_y_discrete, my continuous series lose their scale.
Is there a neat way of solving this, i.e. getting all scales correct? I also see that the legend is sorted alphabetically; can I change that so the legend follows the same order as the sub-plots?
The problem with your data is that value is numeric (continuous) in the data frame subm but a factor (discrete) in mcsm. You can't use the same scale for continuous and discrete values, which is why you get y values only for the last (discrete) facet. It is also not possible to use two scale_y...() functions in one plot.
My approach would be to convert the mcsm values to numeric (saved as value2) and plot those; the quarters then appear as 1, 2, 3 and 4. To fix the legend order, use scale_color_discrete() and supply breaks= in the order you need.
mcsm$value2<-as.numeric(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
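The as.numeric() conversion works here because the factor levels Q1 to Q4 sort in calendar order, so the level codes are 1 to 4. A minimal check with made-up dates:

```r
# Four hypothetical dates, one per quarter
d <- as.Date(c("2020-02-15", "2020-05-15", "2020-08-15", "2020-11-15"))

q <- factor(quarters(d))   # labels "Q1" .. "Q4"
print(levels(q))
print(as.numeric(q))       # level codes used as the y values
```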
UPDATE - solution using grobs
Another approach is to use grobs and the gridExtra library to plot your data as separate plots.
First, save the plot with all legends and data (code as above) as object p. Then use ggplot_build() and ggplot_gtable() to save the plot as a grob object gp. Extract from gp only the part that draws the legend (saved as gp.leg); in this case it is list element number 17.
library(gridExtra)
p<-ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
gp<-ggplot_gtable(ggplot_build(p))
gp.leg<-gp$grobs[[17]]
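The position of the legend in the grob list can differ between plots and ggplot2 versions, so a fixed index may not carry over. A hedged sketch of looking the legend up by its layout name instead: it assumes the legend entries are named "guide-box" or "guide-box-<position>" and that unused positions hold zeroGrob placeholders, which should be checked against gp$layout$name in your installation. The plot used here is a stand-in built from the mpg data.

```r
library(ggplot2)

# Any plot with a legend will do; mpg ships with ggplot2
p  <- ggplot(mpg, aes(cty, displ, colour = drv)) + geom_point()
gp <- ggplotGrob(p)

# Find layout entries whose name starts with "guide-box"
idx <- grep("^guide-box", gp$layout$name)

# Assumption: empty legend positions are zeroGrob placeholders; skip them
is_empty <- vapply(gp$grobs[idx], inherits, logical(1), what = "zeroGrob")
gp.leg <- gp$grobs[[idx[!is_empty][1]]]
```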
Make two new plots, p1 and p2: the first plots the data of subm and the second only the data of mcsm. Use scale_color_manual() to set the same colors as used for plot p. For the first plot, remove the x axis title, text and ticks, and use plot.margin= to set the lower margin to a negative number. For the second plot, set the upper margin to a negative number. facet_grid() should be used for both plots to get the faceted look.
p1 <- ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(0.5,0.5,-0.25,0.5), "lines"),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())+
scale_color_manual(values=c("#F8766D","#00BFC4","#C77CFF"),guide="none")
p2 <- ggplot(data=mcsm, aes(date, value,group=1,col=variable)) + geom_step() +
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(-0.25,0.5,0.5,0.5), "lines"))+ylab("")+
scale_color_manual(values="#7CAE00",guide="none")
Save both plots p1 and p2 as grob objects and then set for both plots the same widths.
gp1 <- ggplot_gtable(ggplot_build(p1))
gp2 <- ggplot_gtable(ggplot_build(p2))
maxWidth = grid::unit.pmax(gp1$widths[2:3],gp2$widths[2:3])
gp1$widths[2:3] <- as.list(maxWidth)
gp2$widths[2:3] <- as.list(maxWidth)
With functions grid.arrange() and arrangeGrob() arrange both plots and legend in one plot.
grid.arrange(arrangeGrob(arrangeGrob(gp1,gp2,heights=c(3/4,1/4),ncol=1),
gp.leg,widths=c(7/8,1/8),ncol=2))

bar and line plot in one chart with a legend under ggplot2

I would like to put a bar and a line plot of two separate but related series on the same chart with a legend (the bar plot is of quarterly growth, the line plot is of annual growth).
I currently do it with a data.frame in wide format and code like this:
p <- ggplot() +
geom_bar(data=df, aes(x=Date, y=quarterly), stat='identity', colour='blue') +
geom_line(data=df, aes(x=Date, y=annual), colour='red')
but I cannot work out how to add a legend with a red line labeled 'Annual Growth' and a blue square labeled 'Quarterly Growth'.
Alternatively, I cannot work out how to have different geoms for different series with a long-form data.frame.
UPDATE:
The following example code gets me part of the way towards a solution, but with a really ugly duplicate legend. Still looking for a complete solution ... This approach is based on putting the data in long form and then plotting subsets of the data ...
library(ggplot2)
library(reshape)
library(plyr)
library(scales)
### --- make a fake data set
x <- rep(as.Date('2012-01-01'), 24) + (1:24)*30
ybar <- 1:24
yline <- ybar + 1
df <- data.frame(x=x, ybar=ybar, yline=yline)
molten <- melt(df, id.vars='x', measure.vars=c('ybar', 'yline'))
molten$line <- ifelse(molten$variable=='yline', TRUE, FALSE)
molten$bar <- ifelse(molten$variable=='ybar', TRUE, FALSE)
### --- subset the data set
df.line <- subset(molten, line==TRUE)
df.bar <- subset(molten, bar==TRUE)
### --- plot it
p <- ggplot() +
geom_bar(data=df.bar, mapping=aes(x=x, y=value, fill=variable, colour=variable),
stat='identity', position='dodge') +
geom_line(data=df.line, mapping=aes(x=x, y=value, colour=variable)) +
# opts() has been removed from ggplot2; labs() and theme() replace it
labs(title="Test Plot") +
theme(legend.position="right")
ggsave(p, width=5, height=3, filename='plot.png', dpi=150)
And an example plot ...
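By subsetting the data passed to each geom. Older ggplot2 versions accepted a subset argument on geoms for this; it has since been removed, so pass data= instead:
x <- 1:10
df <- data.frame(x=x, y=x+1, z=x+2)
molten <- melt(df, id.vars='x') # keep x as the id column
ggplot(molten, aes(x, value, color=variable, fill=variable)) +
geom_bar(data=subset(molten, variable=='y'), stat='identity') +
geom_line(data=subset(molten, variable=='z'))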
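For the wide-format data in the question, a legend can also be produced without reshaping by mapping constant labels inside aes() and assigning the colours with manual scales. This is a sketch under assumptions: the data values are made up for illustration, and geom_col() requires a reasonably recent ggplot2.

```r
library(ggplot2)

# Made-up wide-format data, for illustration only
df <- data.frame(
  Date      = seq(as.Date("2012-03-01"), by = "quarter", length.out = 8),
  quarterly = c(0.5, 0.7, 0.2, -0.1, 0.4, 0.6, 0.3, 0.8),
  annual    = c(2.1, 2.4, 2.0, 1.3, 1.2, 1.1, 1.2, 2.1)
)

# The strings inside aes() become legend labels; the manual scales
# then attach a colour to each label
p <- ggplot(df, aes(x = Date)) +
  geom_col(aes(y = quarterly, fill = "Quarterly Growth")) +
  geom_line(aes(y = annual, colour = "Annual Growth")) +
  scale_fill_manual(name = NULL, values = c("Quarterly Growth" = "blue")) +
  scale_colour_manual(name = NULL, values = c("Annual Growth" = "red"))
print(p)
```

Because fill and colour are separate aesthetics, the bars and the line each get their own legend key, which avoids the duplicated legend from the long-form attempt.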
