How to remove gap at end of ggplot2 graph - r

My input file is here on PasteBin.
My current graph code is:
#Input and data formatting
merg_agg_creek<-read.table("merged aggregated creek.txt",header=TRUE)
library(ggplot2)
library(grid)
source("http://egret.psychol.cam.ac.uk/statistics/R/extensions/rnc_ggplot2_border_themes.r")
CombinedCreek<-data.frame(merg_agg_creek)
Combined<-CombinedCreek[order(CombinedCreek[,2]),]
Combined$Creek <- factor(rep(c('Culvert Creek','North Silcox','South Silcox','Yucca Pen'),c(32,57,51,31)))
Combined$Creek<-factor(Combined$Creek,levels(Combined$Creek)[c(1,4,3,2)])
#The Graph Code
creek <-ggplot(Combined,aes(Month,Density,color=factor(Year),shape=factor(Year)))+scale_color_discrete("Year")+scale_shape_discrete("Year")
creek<-creek + facet_grid(Creek~. ,scales = "free_y")
creek <- creek + geom_jitter(position = position_jitter(width = .3))
creek<-creek+scale_color_grey("Year",end=.6)+theme_bw()
creek<-creek+scale_y_continuous(expression("Number of prey captured " (m^2) ^-1))
creek<-creek+opts( panel.border = theme_L_border() )+ opts(axis.line = theme_segment())
creek<-creek+opts(panel.grid.minor = theme_blank())+opts(panel.grid.major = theme_blank())
creek<-creek+scale_x_discrete("Month",breaks=c(2,5,8,11),labels=c("February","May","August","November"))
creek
The resulting graph is:
Graph
My issue is that by creating the breaks and labels in "scale_x_discrete", a large gap appears on the right-hand side of the plot, between the December data and the facet labels. I tried eliminating this gap by adding "limits=c(0,13)" to the "scale_x_discrete" command, but the resulting graph destroys the x-labels.
How do I remove that gap? Is there something fundamentally flawed in my plot creation?
Thanks!
EDIT: Didzis answered the question below. I just need to change from scale_x_discrete to scale_x_continuous

As the Month in your data is numerical, try replacing
scale_x_discrete("Month",breaks=c(2,5,8,11),labels=c("February","May","August","November"))
with
scale_x_continuous("Month",breaks=c(2,5,8,11),labels=c("February","May","August","November"))
leaving all other parameters the same.
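For readers without the PasteBin file, here is a minimal sketch of the same idea on made-up data. The data frame `demo` and its values are assumptions for illustration only; the point is just that a numeric Month column is paired with scale_x_continuous while the breaks and labels stay as they were.

```r
library(ggplot2)

# Hypothetical stand-in for the Combined data frame: numeric months 1-12
set.seed(1)
demo <- data.frame(
  Month   = rep(1:12, times = 4),
  Density = runif(48, 0, 10),
  Year    = factor(rep(c(2010, 2011), each = 24)),
  Creek   = rep(rep(c("Culvert Creek", "Yucca Pen"), each = 12), times = 2)
)

p <- ggplot(demo, aes(Month, Density, shape = Year)) +
  geom_jitter(width = 0.3, height = 0) +
  facet_grid(Creek ~ ., scales = "free_y") +
  # Month is numeric, so a continuous scale is the matching scale type
  scale_x_continuous(
    "Month",
    breaks = c(2, 5, 8, 11),
    labels = c("February", "May", "August", "November")
  ) +
  theme_bw()

p
```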

You need to use the expand argument in the scale. I think the extra space might be due to the jitter, but if you give it a -2 it goes all the way to the edge. Almost seems like a bug that it's padding so much.
ggplot(Combined,aes(Month,Density,color=factor(Year),shape=factor(Year))) +
scale_shape_discrete("Year") +
facet_grid(Creek~. ,scales = "free_y") +
geom_jitter(position = position_jitter(width = .3)) +
scale_color_grey("Year",end=.6) +
theme_bw() +
scale_y_continuous(expression("Number of prey captured " (m^2) ^-1))+
scale_x_discrete("Month",breaks=c(2,5,8,11),labels=c("February","May","August","November"), expand= c(0,-2)) +
theme(
panel.grid.major=element_blank(),
panel.grid.minor=element_blank()
)
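As a hedged aside, the expand argument can also be set explicitly on a continuous scale instead of using a negative value on a discrete one. The sketch below assumes ggplot2 3.3.0 or later (for the expansion() helper) and uses made-up data; the half-month padding is an arbitrary choice, not something taken from the question.

```r
library(ggplot2)

# Hypothetical data: one point per numeric month
demo <- data.frame(Month = 1:12, Density = (1:12) %% 5)

# expansion() builds the padding vector: no multiplicative padding,
# half a month of additive padding on each side of the axis
pad <- expansion(mult = 0, add = 0.5)

p <- ggplot(demo, aes(Month, Density)) +
  geom_point() +
  scale_x_continuous(
    "Month",
    breaks = c(2, 5, 8, 11),
    labels = c("February", "May", "August", "November"),
    expand = pad
  )

p
```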

Related

Adding a "//" on the x-axis to remove whitespace in one side of the ggplot panel plot

I'm hoping if there's a way to remove whitespace in one side of the panel plot (created by facet_wrap) by adding "//" on the x-axis. Below is sample data and code:
df <- data.frame(
condition = c("cond1","cond2","cond3"),
measure = c("type1","type2"),
value = rep(NA, 6)
)
# all type 1 measure values are between -0.5 and 0.5
# all type 2 measure values are between 1.5 and 2
df[df$measure=="type1",]$value <- runif(3, min=-0.5, max=0.5)
df[df$measure=="type2",]$value <- runif(3, min= 1.5, max=2.0)
# both panels should have same axis tick intervals
custom_breaks = function(x){
seq(round(min(x), 2), round(max(x), 2), 0.2)
}
# create a panel plot with vertical line at y=0 for both panels
ggplot(df, aes(x=condition, y=value, color=measure)) +
geom_point() +
geom_hline(aes(yintercept=0), color="grey") +
scale_y_continuous(breaks=custom_breaks) +
facet_wrap(~measure, scales="free_x") +
coord_flip() +
theme_bw() +
theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank())
This code returns the below plot:
Because the values for type 2 (right panel) are far off from zero, adding a vertical line at y=0 results in lots of whitespace. I'm wondering if there's a way to put a "//" on the x-axis on the right panel after 0 and going straight to 1.5 so there aren't tons of wasted white space. Any help would be greatly appreciated!
Broken axes are generally discouraged because they can lead to misleading visualizations, so this is intentionally not implemented in ggplot2 (as answered by Hadley Wickham himself).
My preferred solutions for something like this are (a) faceting (which you are already doing) or (b) log transformation of the axis, but only if it makes sense for the given data.
Take this bar chart for example (source / link to image): since there is valuable information in the outliers (red circle and arrows), both log transformation and broken axes would distort the representation of reality. The ggforce package implements zoom facets for such cases with the facet_zoom() function.
Your scales = "free_x" is working just fine - the issue is that your geom_hline putting a line at 0 is included in both facets. Here's a way to include it only on the first facet.
ggplot(df, aes(x=condition, y=value, color=measure)) +
geom_point() +
geom_hline(data = data.frame(measure = "type1"), aes(yintercept=0), color="grey") +
scale_y_continuous(breaks=custom_breaks) +
facet_wrap(~measure, scales="free_x") +
coord_flip() +
theme_bw() +
theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank())

Overlay points (and error bars) over bar plot with position_dodge

I have been trying to look for an answer to my particular problem but I have not been successful, so I have just made a MWE to post here.
I tried the answers here with no success.
The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
rep=paste0("rep", rep(1:3, 12)),
value=runif(36)*100)
I have attempted to get the plot I want the following way:
library(ggplot2)
library(RColorBrewer) # provides brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(#or ggsave()
ggplot(mydf, aes(cell, value, fill=scientist )) +
geom_bar(stat="identity", position=position_dodge(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_manual(values=myPal) +
scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlaying the actual value points on the bars makes me wonder what is actually being plotted here: whether the values are taken properly for each bar, and why the max value (or so it seems) is plotted by default.
The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?
I restructured your plotting code a little to make things easier.
The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, dodge2 has to be used.
When you are unsure about "what is plotted where" in bar/column charts, it's always helpful to add the option color="black", which reveals that things are still stacked on top of each other because of your use of dodge instead of dodge2.
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar.
ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
geom_errorbar(stat="summary",position=position_dodge())+
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
gives
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
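The question also asked for bars showing the median with min/max whiskers. One possible way to get that, sketched here rather than taken from the answer above, is to let stat_summary() compute the summaries. This assumes ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments); the set.seed() call, the shared `dodge` object and the point size are additions for illustration.

```r
library(ggplot2)

# Same layout as mydf in the question; set.seed() added for reproducibility
set.seed(42)
mydf <- data.frame(
  cell      = paste0("cell", rep(1:3, each = 12)),
  scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
  timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
  rep       = paste0("rep", rep(1:3, 12)),
  value     = runif(36) * 100
)

# One dodge object shared by all layers so bars, whiskers and points line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(mydf, aes(cell, value, fill = scientist)) +
  # Bar height = median of the three replicates
  stat_summary(fun = median, geom = "col", position = dodge) +
  # Whiskers from the smallest to the largest replicate
  stat_summary(fun = median, fun.min = min, fun.max = max,
               geom = "errorbar", position = dodge, width = 0.3) +
  # Raw replicate values, dodged by scientist so they sit on their own bar
  geom_point(aes(group = scientist), position = dodge, size = 2) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")

p
```

Each cell/scientist/timepoint combination has three replicates, so the first layer should produce twelve bars in total.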

Preserving text size with `grid.arrange`

Problem
I have 4 graphs that I want to display using grid.arrange(). When I display them individually, they look like this:
But when I use grid.arrange(), they become distorted
with them individually looking like
Specific Issues:
The x-axis labels do not scale and overlap, making them unreadable.
The subtitles get cutoff.
Goal
I want to reproduce each plot exactly like the first ideal case in a grid with grid.arrange(). One possible way might be to convert each plot to an image and then use grid.arrange() but I don't know how to do this.
Reproducible Example
Below is an example reproducible code that shows the problem I am having.
library(ggplot2)
library(gridExtra) # provides grid.arrange()
p1 <- ggplot(subset(mtcars, cyl == 4), aes(wt, mpg, colour = cyl)) + geom_point() + labs(title = "TITLE-TITLE-TITLE-TITLE-TITLE-TITLE", subtitle = "-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-") +theme(plot.title = element_text(hjust = 0.5),plot.subtitle = element_text(hjust = 0.5))
p2 <- ggplot(subset(mtcars, cyl == 4), aes(wt, mpg, colour = cyl)) + geom_point() + labs(title = "TITLE-TITLE-TITLE-TITLE-TITLE-TITLE", subtitle = "-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-") +theme(plot.title = element_text(hjust = 0.5),plot.subtitle = element_text(hjust = 0.5))
grid.arrange(p1, p2, ncol = 2)
When you display those graphs individually they simply have more space. So, those are natural distortions and there are perhaps only three ways to solve that.
When exporting the combined graph, make it big enough. If the individual one looks good in 6x5 inches, then surely the combined one will look good in 12x10 inches.
Give correspondingly less space for the problematic parts: x-axis labels and the subtitle. For instance, use something like element_text(size = 6) for plot.subtitle and axis.title.x, add \n to the subtitles and even x-axis labels, try something like element_text(angle = 30) for the latter as well.
Get rid of something unnecessary. As @Richard Telford suggests in the comments, using facet_wrap should work better. That would be due to, e.g., not repeating the y-axis labels and, hence, giving more horizontal space.
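A rough sketch combining the three suggestions, under some assumptions: the split by `am`, the font size, the label angle and the 12x5 inch export size are all illustrative choices, not values from the question.

```r
library(ggplot2)

# facet_wrap() draws both panels in one plot, so the title, subtitle,
# legend and y-axis labels appear once instead of once per panel.
# Splitting by `am` is only an illustrative choice.
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ am, labeller = label_both) +
  labs(
    title = "TITLE-TITLE-TITLE-TITLE-TITLE-TITLE",
    # "\n" breaks the long subtitle over two lines
    subtitle = "-subtitle-subtitle-subtitle-\n-subtitle-subtitle-subtitle-",
    colour = "cyl"
  ) +
  theme(
    plot.title    = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5, size = 9),
    axis.text.x   = element_text(angle = 30, hjust = 1)
  )

# Export with enough width for two panels (a temporary file is used here)
out <- tempfile(fileext = ".pdf")
ggsave(out, p, width = 12, height = 5)
```

If the plots really must stay separate ggplot objects, the same width and height reasoning applies when saving the grid.arrange() result.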

How to adjust the following ggplot2 graph?

I have recently been doing research on global education, and the following graph is an important plot in that research.
ggplot(sam_data,aes(JOY,PV)) +
geom_line(aes(colour = Individualism))+
facet_grid(occupation~as.factor(Gender)) +
theme(legend.key.height = unit(2.0,"cm"),legend.text = element_text(size = 5,face = "plain")) +
scale_color_continuous("Individualism",labels=sam_data$country,breaks =sam_data$Individualism)+
geom_smooth()
The problems are:
1) The correlation lines of the different countries are all combined into one line, instead of being separate lines within each gender and occupation panel.
2) The legend is a mess. I want it to show clearly which country corresponds to which individualism level, but adjusting many of the legend parameters did not help much.
3) Also, I do not know how to remove the white gap produced by the breaks parameter. Any thoughts would be great!
I have solved the second problem by adjusting the aes parameter in the ggplot function. My new code is as follows:
ggplot(sam_data,aes(JOYSCIE,PV1SCIE,group = CNTRYID)) +
geom_point(aes(color = Individualism.comp4))+
facet_grid(recode.OCOD3~as.factor(Gender0women1men)) +
theme(legend.key.height = unit(3.0,"cm"),legend.text = element_text(size = 5,face = "plain")) +
scale_color_gradientn("Individualism",labels=sam_data$CNTRYID,breaks =sam_data$Individualism.comp4,colors = rainbow(4))+
scale_x_continuous(limits = c(-2,2))
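For the first problem (one combined line instead of one per country), a possible approach is to keep the group aesthetic and add a smoother that respects it. The sketch below uses made-up data, since sam_data is not available; the column names `country`, `individualism`, `joy` and `score` are hypothetical stand-ins.

```r
library(ggplot2)

# Hypothetical data: three countries, each with its own individualism level
set.seed(1)
demo <- data.frame(
  country       = rep(c("A", "B", "C"), each = 20),
  individualism = rep(c(20, 55, 90), each = 20),
  joy           = runif(60, -2, 2)
)
demo$score <- 500 + 30 * demo$joy * (demo$individualism / 50) + rnorm(60, sd = 20)

p <- ggplot(demo, aes(joy, score, group = country)) +
  geom_point(aes(colour = individualism)) +
  # group = country is inherited, so one linear fit is drawn per country
  geom_smooth(aes(colour = individualism), method = "lm",
              formula = y ~ x, se = FALSE)

p
```

Adding facet_grid() on top of this should keep the per-country lines within each panel, because grouping is applied inside every facet.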

Unexpected behaviour: italic() causing cut-off of two-line axis label in ggplot2

Use of italics (italic()) in a y-axis label that goes over two lines in ggplot is causing the first line to be partly cut off.
E.g.
ggplot() +
geom_hline(aes(yintercept = 1)) +
labs(y = expression(paste("Something\nsomething", italic(x))))
There's no apparent reason this should be happening; the same thing doesn't happen with very similar code not using italic(), e.g. using hat() instead:
ggplot() +
geom_hline(aes(yintercept = 1)) +
labs(y = expression(paste("Something\nsomething", hat(x))))
Anyone know why this would occur or what to do about it, other than tedious manual altering plot and margin sizes or such?
Not sure why this happens but you can increase the plot margins within ggplot2...
ggplot() +
geom_hline(aes(yintercept = 1)) +
labs(y = expression(paste("Something\nsomething", hat(x)))) +
theme(plot.margin=unit(c(1,1,1,1), "cm"))
