I'm new to R and the ggplot2 library, and I'm trying to build a bar chart with error bars, like this:
pd <- position_dodge(0.80)
ggplot(aqp1) + geom_bar( aes(x=Type, y=Error, fill=Set, colour=Set, group=Set), width=0.75, stat="identity", alpha=1, position=pd) +
geom_errorbar( aes(x=Type, ymin=Error,ymax=Max.Error, colour=NULL, group=Set), position=pd, width=0.1, colour="black", alpha=1, size=0.1) +
theme_light() + labs(title="", x="Sizes", y="Relative error (%)")
The only thing is that I would like to put the y-axis on a log scale, so I tried using the scale_y_log10 function:
## trans_breaks(), trans_format() and math_format() come from the scales package
ggplot(aqp1) + geom_bar( aes(x=Type, y=Error, fill=Set, colour=Set, group=Set), width=0.75, stat="identity", alpha=1, position=pd) +
geom_errorbar( aes(x=Type, ymin=Error,ymax=Max.Error, colour=NULL, group=Set), position=pd, width=0.1, colour="black", alpha=1, size=0.1) +
theme_light() + labs(title="", x="Sizes", y="Relative error (%)") +
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))
But I get a bizarre result: the error bars seem to be on a different scale from the bars.
How can I fix this? I tried using + ylim(10^-2, 10^1) but it doesn't work.
I just want to quote @eipi10:
A bar chart doesn't work well on a log scale, because the bars' baseline is set to 1 (10^0) instead of zero
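This explains why the plot looks wrong: the error bars are drawn at the correct positions, but each bar extends from 1 to its value, so bars for values below 1 hang downward from the baseline.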
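One way around the baseline problem is to drop the bars and mark each value with a point, keeping the error bars. The sketch below is only an illustration: aqp1_demo is a made-up stand-in for the aqp1 data in the question (same column names, invented values), and the choice of geom_point is a suggestion, not something the quoted comment prescribes.

```r
library(ggplot2)

# Hypothetical stand-in for aqp1; the values are invented for illustration
aqp1_demo <- data.frame(
  Type      = rep(c("A", "B"), each = 2),
  Set       = rep(c("s1", "s2"), times = 2),
  Error     = c(0.05, 0.4, 1.5, 3),
  Max.Error = c(0.1, 0.9, 4, 8)
)

pd <- position_dodge(0.8)

# Points have no baseline, so they sit correctly on a log axis
p_log <- ggplot(aqp1_demo, aes(x = Type, y = Error, colour = Set, group = Set)) +
  geom_point(position = pd, size = 3) +
  geom_errorbar(aes(ymin = Error, ymax = Max.Error), position = pd, width = 0.2) +
  scale_y_log10() +
  theme_light() +
  labs(x = "Sizes", y = "Relative error (%)")
p_log
```

Points and error bars now share the same log-transformed positions, so values below and above 1 are read the same way.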
Consider the following figure:
mainplot = ggplot(mtcars, aes(y=mpg,x=wt)) + geom_point() + theme_classic(15) + ylim(c(5,40)) + geom_hline(yintercept=c(15,25), color="red")
gg = ggplot(data.frame(mpg=0), aes(x=mpg))
f = function(mpg,center) {exp(-(mpg - center)^2/(20))}
f15 = function(mpg) {f(mpg,15)}
f25 = function(mpg) {f(mpg,25)}
sideplot = gg + stat_function(fun = f15, linetype="dashed") + stat_function(fun = f25, linetype="dashed") + theme_classic(15) + scale_x_continuous(name=NULL,limits=c(5,40)) + coord_flip() + ylab("f") + theme(axis.title.y=element_blank(),axis.text.y=element_blank(),axis.ticks.y=element_blank()) + geom_vline(xintercept=c(15,25), color="red")
multiplot(mainplot, sideplot, layout=matrix(c(1,1,1,2),nrow=1))
As the figure is made of two independent graphs, the red horizontal lines are interrupted. Is there any way I can make it a continuous line?
It is possible that the easiest solution is to use Adobe Illustrator (or some equivalent) to modify the figure.
Not really a solution, but a workaround:
1. Reduce the margins so the two graphs sit flush against each other.
2. Make your line a dashed line.
3. Remove the y axis line of the sideplot.
mainplot = ggplot(mtcars, aes(y=mpg,x=wt)) + geom_point() + theme_classic(15) + ylim(c(5,40)) + geom_hline(yintercept=c(15,25), color="red", linetype="dashed") + theme(plot.margin = unit(c(1,0,1,1), "cm"))
sideplot = gg + stat_function(fun = f15, linetype="dashed") + stat_function(fun = f25, linetype="dashed") + theme_classic(15) + scale_x_continuous(name=NULL,limits=c(5,40)) + coord_flip() + ylab("f") + theme(axis.line.y=element_blank(),axis.title.y=element_blank(),axis.text.y=element_blank(),axis.ticks.y=element_blank()) + geom_vline(xintercept=c(15,25), color="red", linetype="dashed") + theme(plot.margin = unit(c(1,1,1,0), "cm"))
multiplot(mainplot, sideplot, layout=matrix(c(1,1,1,2),nrow=1))
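Note that multiplot() is not part of ggplot2; it is a helper function from the Cookbook for R (also shipped in the Rmisc package). If it is not available, the same 3:1 side-by-side layout can be sketched with gridExtra, assuming that package is installed. The two plots below (left_demo, right_demo) are simplified placeholders, not the plots from the question.

```r
library(ggplot2)
library(gridExtra)

# Simplified placeholder plots standing in for mainplot and sideplot
left_demo  <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ylim(5, 40)
right_demo <- ggplot(mtcars, aes(mpg)) + geom_density() + xlim(5, 40) + coord_flip()

# widths = c(3, 1) mirrors layout = matrix(c(1,1,1,2), nrow = 1)
combined <- arrangeGrob(left_demo, right_demo, ncol = 2, widths = c(3, 1))
grid::grid.newpage()
grid::grid.draw(combined)
```

This only reproduces the layout; the two panels are still independent plots, so the margin and line-type tweaks above are still needed to make the red lines look continuous.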
I have some data where x is categorical, y is numeric, and color.var is another categorical variable that I would like to color by. My goal is to plot all of the points using position_jitterdodge(), and then highlight a couple of the points, draw a line between them, and add labels, while making sure these highlighted points line up with the corresponding strips of points that were plotted using position_jitterdodge(). The highlighted points are aligned properly when all factors are present in the variable used to dodge, but it does not work well when some factors are missing.
Minimal (non-)working example
library(ggplot2)
Generate some data
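d = data.frame(x = c(rep('x1', 1000), rep('x2', 1000)),
y = runif(n=2000, min=0, max=1),
color.var= rep(c('color1', 'color2'), 1000),
facet.var = rep(c('facet1', 'facet1', 'facet2', 'facet2'), 500),
stringsAsFactors = TRUE) ## keep the text columns as factors in R >= 4.0; as.numeric(x) and complete(color.var) below rely on this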
head(d)
dd = d[c(1,2,3,4,1997,1998, 1999,2000),]
dd
df1 = dd[dd$color.var=='color1',] ## data for first set of points, labels, and the line connecting them
df2 = dd[dd$color.var=='color2',] ## data for second set of points, labels, and the line connecting them
df1
dw = .75 ## Define the dodge.width
Plot all points
Here are all of the points, separated using position_jitterdodge() and the aesthetic fill.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill=color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
theme(axis.title = element_blank()) +
theme(legend.position="top")
That works well.
Additional highlighted points.
Here is the same plot, with additional points in dd added.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=dd, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4 ) +
geom_line(data=dd, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1 ) +
geom_label(data=dd, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5) +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
scale_color_manual(values=c( 'blue', 'gray40')) +
theme(axis.title = element_blank())+
theme(legend.position="top")
This is what I want it to look like. However, this only works properly if both levels of the color.var variable are in the set of points to highlight.
If both levels aren't present in the new data, the horizontal alignment fails.
Highlight points, only one factor present
Here is an example where only the 'color1' factor (blue) is present. Note that data=dd was replaced with data=df1 (data that only contains blue highlighted dots) in this code.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=df1, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4 ) +
geom_line(data=df1, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1 ) +
geom_label(data=df1, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5) +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
scale_color_manual(values=c( 'blue', 'gray40')) +
theme(axis.title = element_blank())+
theme(legend.position="top") +
scale_x_discrete(drop=F)
The highlighted blue dots appear between the blue and gray dots, instead of aligned with the blue dots. Note that the additional code scale_x_discrete(drop=F) had no apparent effect on the alignment.
A manual solution
One possible fix is to edit the x coordinate manually, like this
ggplot(data=d, aes(x=x, y=y)) +
geom_point(aes(fill=color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=df1, aes(x=as.numeric(x)-dw/4, y=y), alpha=.9, size=4 , color='blue') + ## first set of points
geom_line( data=df1, aes(x=as.numeric(x)-dw/4, y=y , group=color.var ), color='blue', size=1) + ## first line
geom_label(data=df1, aes(x=as.numeric(x)-dw/4, y=y , label=round(y,1)), color='blue', vjust=-.25)+ ## first set of labels
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
theme(axis.title = element_blank()) +
theme(legend.position="top")
An adjustment of 1/4 of the dodge.width seems to work. This works fine, but it seems like there should be a better way, especially since I will eventually want to do this with 4-5 sets of highlighted points/lines, which may all be the same color.var, like the blue 'color1' factor above. Repeating this 4-5 times would be cumbersome. I will also eventually want to do this with 5-10 different figures. I suppose dodge.width*1/4 will always work, and copying and pasting might do the trick, but I would like to know if there is a better way.
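The 1/4 figure follows from how dodging splits the dodge width into equal slots and centers each group in its slot. The helper below, dodge_offset(), is a hypothetical function written for this note (it is not part of ggplot2), and it assumes equal-width groups placed the way position_dodge() places them.

```r
# Hypothetical helper: horizontal offset of group i (1-based) out of n groups,
# assuming the dodge width is split into n equal slots centered on the category
dodge_offset <- function(i, n, dodge_width) {
  dodge_width * ((i - 0.5) / n - 0.5)
}

dw <- 0.75
dodge_offset(1, 2, dw)  # first of two groups: -dw/4
dodge_offset(2, 2, dw)  # second of two groups: +dw/4
dodge_offset(2, 3, dw)  # middle of three groups: 0
```

With two levels of color.var the first group sits at -dw/4, which matches the manual adjustment above; with a different number of levels the offset changes, so the 1/4 factor is specific to the two-level case.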
Here is a solution based on @aosmith's comment. Basically, you just need to add this code before calling ggplot:
library(dplyr) ## needed for group_by()
library(tidyr) ## needed for complete()
df1 = df1 %>% group_by(facet.var, x) %>% complete(color.var)
That adds extra rows to the data so that all the levels of color.var are present. Then the code given in the question, along with a couple of small edits that fix the legend, can be used:
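To see what complete() does here, the sketch below runs it on a tiny made-up data frame (hl, invented for illustration) in which color.var is a factor with two levels but only 'color1' occurs. It assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Made-up highlight data: two levels declared, only 'color1' present
hl <- data.frame(
  x         = factor(c("x1", "x2"), levels = c("x1", "x2")),
  y         = c(0.3, 0.7),
  color.var = factor(c("color1", "color1"), levels = c("color1", "color2")),
  facet.var = factor(c("facet1", "facet1"))
)

# One row per level of color.var within each facet.var / x combination
hl_full <- hl %>%
  group_by(facet.var, x) %>%
  complete(color.var) %>%
  ungroup()

hl_full  # 4 rows; the two added 'color2' rows have y = NA
```

The added rows carry NA for y, so nothing is drawn for them, but their presence makes position_dodge() reserve a slot for 'color2'. Note that complete() only fills in unused levels when the column is a factor; for a character column it has no way of knowing that 'color2' exists.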
ggplot() +
geom_point(data=d , aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray', show.legend=T) +
geom_point(data=df1, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4, show.legend=T ) +
geom_line( data=df1, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1, show.legend=F ) +
geom_label(data=df1, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5, show.legend=F) +
facet_wrap(~facet.var) +
scale_fill_manual( values=c( 'lightblue','gray'), name='Background dots', guide=guide_legend(override.aes = list(color=c('lightblue', 'gray')))) +
scale_color_manual(values=c( 'blue', 'gray40') , name='Highlighted dots') +
theme(axis.title = element_blank())+
theme(legend.position="top")+
scale_x_discrete(drop=F)
How can I draw several lines between two facets?
I attempted this by plotting points at the min value of the top graph but they are not between the two facets. See picture below.
This is my code so far:
t <- 1:1000
y1 <- rexp(1000)
y2 <- cumsum(y1)
z <- rep(NA, length(t))
z[100:200] <- 1
df <- data.frame(t=t, values=c(y2,y1), type=rep(c("Bytes","Changes"), each=1000))
points <- data.frame(x=c(10:200,300:350), y=min(y2), type=rep("Bytes",242))
vline.data <- data.frame(type = c("Bytes","Bytes","Changes","Changes"), vl=c(1,5,20,5))
g <- ggplot(data=df, aes(x=t, y=values)) +
geom_line(colour=I("black")) +
facet_grid(type ~ ., scales="free") +
scale_y_continuous(trans="log10") +
ylab("Log values") +
theme(axis.text.x = element_text(angle = 90, hjust = 1), panel.spacing = unit(0, "lines"))+
geom_point(data=points, aes(x = x, y = y), colour="green")
g
To achieve that, you have to set the padding inside the plot panels to zero. You can do that with expand=c(0,0). The changes I made to your code:
When you use scale_y_continuous, you can define the axis label inside that call, so you don't need a separate ylab.
Changed colour=I("black") to colour="black" inside geom_line.
Added expand=c(0,0) to scale_x_continuous and scale_y_continuous.
The complete code:
ggplot(data=df, aes(x=t, y=values)) +
geom_line(colour="black") +
geom_point(data=points, aes(x = x, y = y), colour="green") +
facet_grid(type ~ ., scales="free") +
scale_x_continuous("t", expand=c(0,0)) +
scale_y_continuous("Log values", trans="log10", expand=c(0,0)) +
theme(axis.text.x=element_text(angle=90, vjust=0.5), panel.spacing=unit(0, "lines")) ## panel.margin was renamed panel.spacing in ggplot2 2.2.0
Adding lines can also be done with geom_segment. Normally the lines (segments) will appear in both facets. If you want them to appear between the two facets, you have to restrict them to one facet through the data argument:
ggplot(data=df, aes(x=t, y=values)) +
geom_line(colour="black") +
geom_segment(data=df[df$type=="Bytes",], aes(x=10, y=0, xend=200, yend=0), colour="green", size=2) +
geom_segment(data=df[df$type=="Bytes",], aes(x=300, y=0, xend=350, yend=0), colour="green", size=1) +
facet_grid(type ~ ., scales="free") +
scale_x_continuous("t", expand=c(0,0)) +
scale_y_continuous("Log values", trans="log10", expand=c(0,0)) +
theme(axis.text.x=element_text(angle=90, vjust=0.5), panel.spacing=unit(0, "lines"))
Currently my regression plot looks like this. Notice that
the regression line is deeply buried.
Is there any way I can modify my code here, to show it on top of the dots?
I know I can increase the size but it's still underneath the dots.
p <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5) +
geom_point()
p
Just change the order:
p <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_point() +
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5)
p
The issue is not the color, but the order of the geoms.
If you first call geom_point() and then geom_smooth()
the latter will be on top of the former.
Plot the following for comparison:
Before <-
ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5) +
geom_point()
After <-
ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_point() +
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5)
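The drawing order can also be checked without rendering anything, since a ggplot object keeps its layers in the order they were added and draws them first to last. The sketch below uses a small made-up data frame (demo_df) rather than my_df, and the names p_after and layer_geoms are just illustrative.

```r
library(ggplot2)

# Made-up data for illustration
demo_df <- data.frame(x = 1:10, y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))

p_after <- ggplot(demo_df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red", formula = y ~ x)

# Layers are stored, and drawn, in the order they were added
layer_geoms <- vapply(p_after$layers, function(l) class(l$geom)[1], character(1))
layer_geoms  # "GeomPoint" "GeomSmooth": the smooth is drawn last, so it is on top
```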
How about transparent points?
library(ggplot2)
set.seed(616) ## a bare seed=616 only creates a variable; set.seed() seeds the RNG
x1<- sort(runif(1000))
set.seed(626)
x2<- rnorm(1000)*0.02+sort(runif(1000))
my_df<- data.frame(x= x1, y = x2)
p <- ggplot(data=my_df, aes(x=x,y=y)) +
xlab("x") +
ylab("y")+
geom_smooth(method="lm",se=FALSE,color="red",formula=y~x,size=1.5)+
geom_point(size = 2, alpha = 0.1) ## semi-transparent points let the line show through
p