Minor ticks on log10 scale using ggplot2

I have a dataframe of ~108m rows of data, in 7 columns. I use this R script to make a boxplot of it:
ggplot(expanded_results, aes(factor(hour), dynamic_nox)) +
geom_boxplot(fill="#6699FF", outlier.size = 0.5, lwd=.1) +
scale_y_log10() +
stat_summary(fun.y=mean, geom="line", aes(group=1, colour="red")) +
ylab(expression(Exposure~to~NO[x])) +
xlab(expression(Hour~of~the~day)) +
ggtitle("Hourly exposure to NOx") +
theme(axis.text=element_text(size=12, colour="black"),
axis.title=element_text(size=12, colour="black"),
plot.title=element_text(size=12, colour="black"),
legend.position="none")
The graph looks like this. It's mostly fine, but it would be better to have a labelled tick towards the top of the y-axis. I guess it should be something like 1000, given that the y-axis is on a log10 scale, but I'm not sure how to add it.
Any ideas please?
EDIT: In response to DrDom, who suggested adding scale_y_log10(breaks=c(0,10,100,1000)):
The output of doing that is this:
The output of doing the following:
scale_y_log10(breaks=c(0,10,100,1000), limits=c(0,1000))
Is an error of:
Error in seq.default(dots[[1L]][[1L]], dots[[2L]][[1L]], length = dots[[3L]][[1L]]:
'from' cannot be NA, NaN or infinite
In response to Jaap, who suggested the following code:
library(ggplot2)
library(scales)
ggplot(expanded_results, aes(factor(hour), dynamic_nox)) +
geom_boxplot(fill="#6699FF", outlier.size = 0.5, lwd=.1) +
stat_summary(fun.y=mean, geom="line", aes(group=1, colour="red")) +
scale_y_continuous(breaks=c(0,10,100,1000,3000), trans="log1p") +
labs(title="Hourly exposure to NOx", x=expression(Hour~of~the~day), y=expression(Exposure~to~NO[x])) +
theme(axis.text=element_text(size=12, colour="black"), axis.title=element_text(size=12, colour="black"),
plot.title=element_text(size=12, colour="black"), legend.position="none")
It produces this graph. Have I done something wrong? I'm still missing a '1000' tick label. A tick in between the 10 and the 100 would also be good, given that is where most of the data is.

You can set the tick positions on the log scale by passing a breaks= argument to scale_y_log10(). The breaks must not include 0, because the breaks are log-transformed along with the data and log10(0) is -Inf.
library(ggplot2)
df<-data.frame(x=1:10000,y=1:10000)
ggplot(df,aes(x,y))+geom_line()+
scale_y_log10(breaks=c(1,5,10,85,300,5000))
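
For the minor ticks the question asks about, one option is to generate a 1-2-5 sequence per decade and pass it as breaks. This is a minimal sketch, not a definitive recipe: log_breaks_125() is a hypothetical helper (it is not part of ggplot2 or scales), and it assumes strictly positive limits.

```r
# Hypothetical helper: 1, 2, 5 multiples of each power of ten between lo and hi.
log_breaks_125 <- function(lo, hi) {
  stopifnot(lo > 0, hi > lo)  # zero or negative values cannot sit on a log10 axis
  decades <- 10^(floor(log10(lo)):ceiling(log10(hi)))
  candidates <- sort(as.vector(outer(c(1, 2, 5), decades)))
  candidates[candidates >= lo & candidates <= hi]
}

brks <- log_breaks_125(1, 1000)
print(brks)

# Plot only if ggplot2 is installed; the toy data frame is illustrative.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(x = 1:1000, y = 1:1000)
  p <- ggplot(df, aes(x, y)) + geom_line() +
    scale_y_log10(breaks = brks)
}
```

The same vector could be passed to the boxplot in the question in place of c(0,10,100,1000), which would also put ticks between 10 and 100.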

Instead of using scale_y_log10 you can also use scale_y_continuous together with a log transformation from the scales package. When you use the log1p transformation, you are also able to include a 0 in your breaks: scale_y_continuous(breaks=c(0,1,3,10,30,100,300,1000,3000), trans="log1p")
Your complete code will then look like this (notice that I also combined the title arguments in labs):
library(ggplot2)
library(scales)
ggplot(expanded_results, aes(factor(hour), dynamic_nox)) +
geom_boxplot(fill="#6699FF", outlier.size = 0.5, lwd=.1) +
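# fun.y was deprecated in ggplot2 3.3.0 in favour of fun; colour is set outside aes() so the line is actually red
stat_summary(fun=mean, geom="line", aes(group=1), colour="red") +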
scale_y_continuous(breaks=c(0,1,3,10,30,100,300,1000,3000), trans="log1p") +
labs(title="Hourly exposure to NOx", x=expression(Hour~of~the~day), y=expression(Exposure~to~NO[x])) +
theme(axis.text=element_text(size=12, colour="black"), axis.title=element_text(size=12, colour="black"),
plot.title=element_text(size=12, colour="black"), legend.position="none")
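
The reason a 0 break is allowed under log1p but not under log10 can be checked in base R. A small sketch using the break values from the answer above:

```r
breaks <- c(0, 1, 3, 10, 30, 100, 300, 1000, 3000)

# log10 sends 0 to -Inf, so a 0 break has no finite position on a log10 axis.
log10_pos <- log10(breaks)

# log1p(x) = log(1 + x) maps 0 to 0, so every break has a finite position.
log1p_pos <- log1p(breaks)

# expm1 is the inverse of log1p, which is what lets the axis be labelled in original units.
round_trip <- expm1(log1p_pos)

print(data.frame(breaks, log10_pos, log1p_pos, round_trip))
```

Note that a log1p axis is not a true log10 axis: values close to 0 are spaced almost linearly, so the two scales only look alike for larger values.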

Related

ggplot2: smooth and fill

I'd like to smooth the geom_lines and fill the area between. I've tried stat_smooth() to smooth the lines, and both geom_ribbon() and geom_polygon() but without success.
Apologies for the double barrel question.
bell <- data.frame(
month = c("Launch","1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th","11th","12th"),
rate = c(0,.05,.12,.18,.34,.42,.57,.68,.75,.81,.83,.85,.87))
bell$month <- factor(bell$month, levels = rev(c("Launch","1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th","11th","12th")))
ggplot() +
theme_minimal() +
coord_flip() +
scale_fill_manual(values=cols) +
geom_line(data=bell, aes(x=month, y=.5-(rate/2), group=1), color='pink', size=1) +
geom_line(data=bell, aes(x=month, y=.5+(rate/2), group=1), color='pink', size=1) +
theme(legend.position='none', axis.ticks=element_blank(), axis.text.x=element_blank(),axis.title.x=element_blank())

One option is to calculate the points of the loess regression outside of ggplot and then plot them using geom_line (for a line) or geom_area for a filled area (geom_area is geom_ribbon, but with ymin fixed at zero).
Also, you don't need coord_flip. Instead, just switch your x and y mappings. This is necessary anyway if you want to fill underneath the curve.
In the example below I've created a numeric month variable for the regression. I've also commented out the scale_fill_manual line because your example doesn't provide a cols vector and the plot code doesn't produce a legend anyway. I've also commented out the legend.position='none' line as it's superfluous.
bell$month.num = 0:12
m1 = loess(rate ~ month.num, data=bell)
bell$loess.mod = predict(m1)
ggplot(bell, aes(y=month, group=1)) +
theme_minimal() +
#scale_fill_manual(values=cols) +
geom_area(aes(x=.5-(loess.mod/2)), fill='pink', size=1) +
geom_area(aes(x=.5+(loess.mod/2)), fill='pink', size=1) +
theme(#legend.position='none',
axis.ticks=element_blank(),
axis.text.x=element_blank(),
axis.title.x=element_blank())
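
If the goal is to fill only the band between the two smoothed curves, a variant of the same idea is to predict the loess fit on a finer grid and draw a single geom_ribbon. This is a sketch under assumptions: the 0.25 grid step and the name fine are illustrative choices, and it keeps coord_flip() rather than swapping the mappings as suggested above.

```r
bell <- data.frame(
  month.num = 0:12,
  rate = c(0, .05, .12, .18, .34, .42, .57, .68, .75, .81, .83, .85, .87)
)

# Fit the loess curve outside ggplot, as in the answer.
m1 <- loess(rate ~ month.num, data = bell)

# Predict on a finer grid (step size is an arbitrary choice) for a smoother outline.
fine <- data.frame(month.num = seq(0, 12, by = 0.25))
fine$fit <- predict(m1, newdata = fine)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One ribbon between the lower and upper curve, flipped to run vertically.
  p <- ggplot(fine, aes(x = month.num,
                        ymin = 0.5 - fit / 2,
                        ymax = 0.5 + fit / 2)) +
    geom_ribbon(fill = "pink") +
    coord_flip() +
    theme_minimal()
}
```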

Dodged geom_errorbar with geom_bar using fill, colour, and alpha

I am attempting to make a facet_wrap bar graph with error bars (se) that clearly shows three different categorical variables (Treatment, Horizon, Enzyme) with one response variable (AbundChangetoAvgCtl). Below is the code for some dummy data followed by the ggplot code I have so far. The graphs I've made can be seen at this link:
bargraph figures
Enzyme <- c("Arabinosides","Arabinosides","Arabinosides","Arabinosides","Arabinosides","Arabinosides","Cellulose","Cellulose","Cellulose","Cellulose","Cellulose","Cellulose","Chitin","Chitin","Chitin","Chitin","Chitin","Chitin","Lignin","Lignin","Lignin","Lignin","Lignin","Lignin")
Treatment <- c("Deep","Deep","Int","Int","Low","Low","Deep","Deep","Int","Int","Low","Low","Deep","Deep","Int","Int","Low","Low","Deep","Deep","Int","Int","Low","Low")
Horizon <- c("Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min")
AbundChangetoAvgCtl <- rnorm(24,mean=0,sd=1)
se <- rnorm(24, mean=0.5, sd=0.25)
notrans_noctl_enz_toCtl_summary <- data.frame(Enzyme,Treatment,Horizon,AbundChangetoAvgCtl,se)
ggplot(notrans_noctl_enz_toCtl_summary, aes(x=Horizon, y=AbundChangetoAvgCtl, fill=Horizon, alpha=Treatment)) +
geom_bar(position=position_dodge(), colour="black", stat="identity", aes(fill=Horizon)) +
geom_errorbar(aes(ymin=AbundChangetoAvgCtl-se, ymax=AbundChangetoAvgCtl+se),
width=.2,
position=position_dodge(.9)) +
scale_fill_brewer(palette = "Set1") + theme_bw() +
geom_hline(yintercept=0) +
labs(y = "Rel Gene Abundance Change / Control", x="") +
theme(axis.ticks = element_blank(),
axis.text.x = element_blank(),
strip.text.x = element_text(size=20),
plot.title = element_text(size=22, vjust=2, face="bold"),
axis.title.y = element_text(size=18),
legend.key.size = unit(.75, "in"),
legend.text = element_text(size = 15),
legend.title = element_text(size = 18)) +
facet_wrap(~Enzyme, scales="free")
(figure 1)
So this is close to what I want, however for some reason, the "alpha=Treatment" call in ggplot causes my errorbars to fade (which I don't want) as well as the bar_fill (which I do want). I've tried moving the "alpha=Treatment" to the geom_bar call, as well as adding "alpha=1" to geom_bar, but when I do that, the error bars all move to a single location and overlap (figure 2).
I initially wanted to cluster the bars within facet_wrap, but found the alpha option on this site, which seems to accomplish what I'm looking for as well. Any help would be appreciated. If there is a better way to represent all of this, those ideas are welcome as well.
Also, if there is a way to condense and clarify my legend, that would be extra bonus!
Thanks in advance for your help!
Mike

You need to assign Treatment to the group option in the ggplot() command and then move the alpha=Treatment option to the geom_bar() command. Then the alpha value of geom_errorbar won't be affected by the global option and will be black. Like this:
ggplot(notrans_noctl_enz_toCtl_summary, aes(x=Horizon, y=AbundChangetoAvgCtl, fill=Horizon, group = Treatment)) +
geom_bar(position=position_dodge(), colour="black", stat="identity", aes(fill=Horizon, alpha = Treatment))
Also, I would check that the alpha mapping matches intuition: I would expect the most transparent bars to correspond to the low treatment and the least transparent bars to the high treatment. That is only my intuitive reading, without any background on the research design or data.
For information about formatting legends, see here.
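
Putting the two pieces of advice together with the error bars, a complete version might look like the sketch below. The dummy data are rebuilt with expand.grid and a fixed seed purely for illustration, and using one shared position_dodge object for both layers is my assumption about how to keep bars and error bars aligned.

```r
set.seed(1)
# Illustrative dummy data: 4 enzymes x 3 treatments x 2 horizons = 24 rows.
d <- expand.grid(
  Horizon = c("Org", "Min"),
  Treatment = c("Deep", "Int", "Low"),
  Enzyme = c("Arabinosides", "Cellulose", "Chitin", "Lignin")
)
d$AbundChangetoAvgCtl <- rnorm(nrow(d))
d$se <- abs(rnorm(nrow(d), mean = 0.5, sd = 0.25))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  dodge <- position_dodge(width = 0.9)  # shared by bars and error bars
  p <- ggplot(d, aes(x = Horizon, y = AbundChangetoAvgCtl,
                     fill = Horizon, group = Treatment)) +
    # alpha is mapped only in this layer, so the error bars stay opaque
    geom_bar(aes(alpha = Treatment), stat = "identity",
             position = dodge, colour = "black") +
    geom_errorbar(aes(ymin = AbundChangetoAvgCtl - se,
                      ymax = AbundChangetoAvgCtl + se),
                  width = 0.2, position = dodge) +
    geom_hline(yintercept = 0) +
    facet_wrap(~Enzyme, scales = "free")
}
```

Because group = Treatment is set globally, both layers are dodged by the same variable even though only the bars carry the alpha aesthetic.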

How to align labels with coord_polar?

I am trying to produce a circular "heatmap" in R, and found a solution with coord_polar, and how to distribute the labels around the plot.
My problem is that the labels around the plot seem to be centred and the long names are overlapping the plot. I can't use hjust and vjust to align the text to the edge of the plot.
My code and a subset of my data:
library(reshape)
library(ggplot2)
data <- data.frame(id=c("S_subsp_houtenae_str_ATCC_BAA-1581","S_Heidelberg_S_1_7","S_Haifa_S_11_3","S_Infantis_S_2_3","S_Newport_S_1_4","S_Bredeney_S_1_3","S_Saint_Paul_S_1_5","S_Bovismorbificans_S_3_8","S_Saintpaul_str_SARA26","S_London_S_6_7","S_Mbandaka_S_7_5","S_Corvallis_S_5_6","S_San_Diego_S_9_5","S_Javiana_str_10721"),
A.C2=c(0,0,0,0,0,0,0,0,0,0,0,2,0,0),Col156=c(0,0,0,0,0,4,0,0,0,0,0,0,0,0),
ColRNAI=c(0,8,0,0,8,8,8,0,8,0,0,0,0,0),FIB=c(0,0,0,0,10,0,0,10,10,0,0,0,0,0),
FII=c(0,0,0,0,0,0,0,12,12,0,0,0,0,0),HI2=c(0,15,0,0,15,15,0,0,0,0,0,0,0,0),
HI2A=c(0,15,0,0,15,15,0,0,0,0,0,0,0,0),I1=c(0,17,17,17,0,0,0,0,0,0,0,17,17,0),
I2=c(0,0,0,0,0,0,0,0,0,0,0,18,18,18),N=c(0,0,0,0,0,0,0,19,19,19,19,0,0,0),
P=c(20,20,20,20,20,20,20,0,0,0,0,0,0,0),Q1=c(0,22,0,0,22,0,0,0,0,0,0,22,0,0))
data <- transform(data,id=factor(id,levels=unique(id)))
data.m <- melt(data)
data.m$var2 = as.numeric(data.m$variable) + 15
y_labels = levels(data.m$variable)
y_breaks = seq_along(y_labels) + 15
sequence_length = length(unique(data.m$id))
first_sequence = c(1:(sequence_length%/%2))
second_sequence = c((sequence_length%/%2+1):sequence_length)
first_angles =c(90 - 180/length(first_sequence) * first_sequence)
second_angles = c(-90 - 180/length(second_sequence) * second_sequence)
Palette <- c("#f1f1f1","#302013","#614126","#58DB41","#638A5C","#62D585","#579134","#B8DD95","#9ED84D","#4B6FC8","#2A344D","#47689B","#315CEE","#D9AB68","#E09B33","#FE9E2A","#D97B0C","#6A2F45","#A02A77","#E1C73E","#D16F60","#C13420","#DA435C","#E20338","#000000","#999999")
p = ggplot(data.m, aes(x=id, y=var2, fill=factor(value))) +
geom_tile(colour="white") +
scale_fill_manual(values=Palette) +
scale_y_discrete(breaks=y_breaks, labels=y_labels) +
theme(panel.background=element_blank(),
axis.title=element_blank(),
panel.grid=element_blank(),
axis.text.x=element_text(angle= c(first_angles,second_angles),size=8),
axis.ticks=element_blank(),
axis.text.y=element_blank(),
legend.position="none")
p = p + coord_polar()
plot(p)

I've had similar issues in coord_polar() with labels not responding to either hjust= or vjust= and therefore not aligning as I'd like.
The solution to this, shown here https://stackoverflow.com/a/28846989/4340137, is to use geom_text() to manually label the data.
The example at the link provided does everything you need. Unfortunately, I just can't get it working quickly with your more complicated data structure and SO won't let me leave this as a comment.
Someone else may be able to edit to include the exact code.
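
As a rough illustration of the geom_text() idea (not the exact code from the linked answer), the label angle and justification can be computed per item and mapped as aesthetics. Everything here is a sketch under assumptions: the sample_ labels are placeholders for the real ids, and the angle formula treats the items as evenly spaced around the circle, which is only approximately true for a discrete axis in coord_polar().

```r
ids <- paste0("sample_", 1:14)   # placeholder labels
n <- length(ids)

# Angle of a radial line through item i, measured clockwise from 12 o'clock.
angle <- 90 - 360 * (seq_len(n) - 0.5) / n

# Labels on the left half are rotated by 180 degrees so they are not upside down,
# and right-justified so they still grow away from the plot.
flip <- angle < -90
lab <- data.frame(
  id = factor(ids, levels = ids),
  angle = ifelse(flip, angle + 180, angle),
  hjust = ifelse(flip, 1, 0)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(lab, aes(x = id, y = 1)) +
    geom_tile(fill = "grey80", colour = "white") +
    geom_text(aes(y = 1.6, label = id, angle = angle, hjust = hjust), size = 3) +
    coord_polar() +
    theme_void()
}
```

With the real data, the same lab data frame could be passed to geom_text(data = lab, ...) on top of the heatmap tiles, with y set just beyond the outermost ring.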

In RStudio, when I run the following and zoom, all the labels are outside the circle except the longest one, which may mean the plot margin at the top is too tight (or you might consider shortening the name or using \n for a new line). I changed the axis.text.y argument to theme. I also couldn't get the odd legend in the top left to go away. Even so, the inserted plot suffers from the overlap problem you described.
ggplot(data.m, aes(x=id, y=var2, fill=factor(value))) +
geom_tile(colour="white") +
scale_fill_manual(values=Palette) +
scale_y_discrete(breaks=y_breaks, labels=y_labels) +
theme(panel.background=element_blank(), axis.title=element_blank(), panel.grid=element_blank(),
axis.text.x=element_text(angle= c(first_angles,second_angles),size=8, vjust=-1), # vjust=-1
axis.ticks=element_blank(),
axis.text.y=element_text(vjust = -2), legend.position="none") +
coord_polar()

How to illustrate non available data points in a different shape using ggplot2?

Is there a way to change the shape of the points for missing data in R? I am plotting .csv files like this one in a lollipop style.
Name,chr,Pos,Reads...ME_016,Reads...ME_017,Reads...ME_018,Reads...ME_019
cg01389728,chr10,6620395,33.82,41.38,41.38,38.46
cg01389728,chr10,6620410,0,-,-,-
cg01389728,chr10,6620430,0,0,-,-
cg01389728,chr10,6620447,0,-,0,-
cg01389728,chr10,6620478,0,-,-,-
cg01389728,chr10,6620510,28.33,29.85,25.64,28.13
cg01389728,chr10,6620520,0,0,-,0
cg01389728,chr10,6620531,0,-,50,-
Using ggplot2, my graphs are created with this:
dataset <-read.table("testset", sep=",",na.strings="-", header=TRUE)
dataset <- subset(dataset, select=c(-Name, -chr))
dataset <- melt(dataset, id.vars="Pos")
dataset$variable <- gsub("\\.\\.\\.","_",dataset$variable)
xaxes <- unique(dataset$Pos)
dataset$Pos <- as.factor(dataset$Pos)
ggplot(dataset, aes(x=Pos, y=variable,fill=cut(value, breaks=10))) + geom_point(size=4, shape=21) + geom_line() + scale_fill_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(fill="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
However, I want to set the shape of the missing points ("-") in the plot to an "x", (shape=4) and show them also in the legend.
I've tried approaches like:
scale_fill_manual(values=c(value, NA))
or:
scale_shape_manual(values=c(21,4))
By default, the "-" are also shown with shape 21 and grey colour. There must be a way to manipulate this? Writing a method like this might be the trick, but how to call it for the whole column?
formas <- function(x){
  if(is.na(x)) forma <- 4
  if(!is.na(x)) forma <- 21
  return(forma)
}

This comes pretty close, I think.
ggplot(dataset, aes(x=Pos, y=variable,
color=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_shape_manual(name="",values=c(Missing=4,Present=19))+
scale_color_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
Changes are:
used color instead of fill, with shape=19 for points with data
added shape aesthetic to ggplot(...) call.
removed shape=21 from geom_point(...) call.
added scale_shape_manual(...) to define the shapes for Missing and Present, and turn off the guide label.
I know you wanted filled points with a black outline (it does look better), but when I tried that with the added shape aesthetic, the fill legend does not display the colors correctly. Try it yourself.
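
On the question of how to apply the shape rule to a whole column: ifelse() is vectorised, so it handles every row at once, which the if()-based formas function from the question does not. A small base R sketch with made-up values:

```r
value <- c(33.82, NA, 0, NA, 28.33)   # illustrative values, NA = no data

# One label per element, no loop needed.
status <- ifelse(is.na(value), "Missing", "Present")

# The same lookup that scale_shape_manual(values = ...) performs by name.
shape_codes <- unname(c(Missing = 4, Present = 19)[status])

print(data.frame(value, status, shape_codes))
```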

Here is another approach that comes closer to producing the graph you specified (circular points with black outline and fill color determined by coverage).
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable,
fill=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_fill_manual(name="Coverage in %",
values=fill.colors,
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),
drop=FALSE) +
scale_shape_manual(name="",values=c(Missing=4,Present=21),limits=c("Missing"))+
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5))+
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
The problem in the other answer with using point shape 21 and the fill aesthetic is that, while the fill colors are displayed correctly in the plot, they are not displayed correctly in the legend. One way around that is to force ggplot to set the legend fill colors using
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
Unfortunately, to do that you have to specify the fill colors manually (so that the actual fill and the override fill are the same). This code does that using
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
which creates a color palette that mimics the ggplot default. You could of course use your own color palette here.
While this does come closer to your original intent, I actually think the other answer provides a better data visualization. The black outlines around the points, while "attractive", make it much more difficult to distinguish between fill colors, especially with 10 possible colors (which is at the edge of discernibility anyway).
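
The palette line can be inspected on its own. The sketch below also shows one assumption of mine that is not in the answers: cutting at fixed 10% boundaries instead of breaks=10, so that the ten bins match the "0-10%" ... "90-100%" labels even when the data do not span the full 0 to 100 range.

```r
# Eleven evenly spaced hues from 15 to 375 degrees; the last one wraps around
# to the first hue, so only the first ten are kept.
fill.colors <- hcl(h = seq(15, 375, length.out = 11), l = 65, c = 100)[1:10]
print(fill.colors)

# Fixed bin edges (an assumption): always ten levels, one per palette colour.
value <- c(33.82, 0, 28.33, 50, 41.38, NA)   # illustrative coverage values
bins <- cut(value, breaks = seq(0, 100, by = 10), include.lowest = TRUE)
print(table(bins, useNA = "ifany"))
```

With cut(value, breaks=10), the bin edges are derived from the observed range of value, so the labels only line up with the bins if the data happen to run from 0 to 100.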

I can't see why this is not working:
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable
,color=cut(value, breaks=c(-0.01,10,20,30,40,50,60,70,80,90,100))
,shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
scale_shape_manual(name="",values=c("Missing"=4,"Present"=19),limits=c("Missing"))+
scale_color_manual(name="Coverage in %",
values=ifelse(is.na(dataset$value),"grey",fill.colors),
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),drop=FALSE) +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5)) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
NA values are not shown anymore with an X, and instead of displaying them in "grey", the class 90-100% will be shown in grey. No error message is shown - what is the problem?

Apply coord_flip() to single layer

I would like to have a boxplot showing the same distribution underneath my histogram.
The code below almost works, but coord_flip() is being applied to all layers, instead of just the geom_boxplot layer.
plot1<-ggplot(newdatahistogram, aes_string(x=newdatahistogram[RawLocation])) +
xlab(GGVar) + ylab("Proportion of Instances") +
geom_histogram(aes(y=..density..), binwidth=1, colour="black", fill="white",origin=-0.5) +
scale_x_continuous(limits=c(-3,6), breaks=seq(0,5,by=1), expand=c(.01,0)) +
geom_boxplot(aes_string(x=-1, y=newdatahistogram[RawLocation])) + coord_flip()
How can I apply coord_flip() to a single layer?
Thank you!

I got it to work with a bit of a hack:
plot1 <- ggplot(newdatahistogram, aes_string(x=newdatahistogram[RawLocation], fill=(newdatahistogram[,"PQ"]))) +
xlab(GGVar) + ylab("Proportion of Observation") +
geom_histogram(aes(y=..density..), binwidth=1, colour="black", origin=-0.5) +
scale_x_continuous(limits=c(-1,6), breaks=seq(0,5,by=1), expand=c(.01,0)) +
scale_y_continuous(limits=c(-.2,1), breaks=seq(0,1,by=.2)) +
theme(plot.margin = unit(c(0,0,0,0), "cm"))
plot_box <- ggplot(newdatahistogram) +
geom_boxplot(aes_string(x=1, y=newdatahistogram[RawLocation])) +
scale_y_continuous(breaks=(0:5), labels=NULL, limits=c(-1,6), expand=c(.0,-.03)) +
scale_x_continuous(breaks=NULL) + xlab(NULL) + ylab(NULL) +
coord_flip() + theme_bw() +
theme(plot.margin = unit(c(0,0,.0,0), "cm"),
line=element_blank(),text=element_blank(),
axis.line = element_blank(),title=element_blank(), panel.border=element_blank())
PB = ggplotGrob(plot_box)
plot1 <- plot1 + annotation_custom(grob=PB, xmin=-1.01, xmax=5.95, ymin=-.3,ymax=0)
This saves the rotated boxplot as a grob object and inserts it into the plot under the histogram.
I needed to play with the expansion element a bit to get the scales to line up,
but it works!
Seriously though, I think ggplot should have a horizontal boxplot available without coord_flip()... I tried to edit the boxplot code, but it was way too difficult for me!
Tried to post image, but not enough reputation

You can't: coord_flip always acts on all layers. However, you do have two alternatives:
The solution here shows how to use grid.arrange() to add a marginal histogram. (The comments in the question also link to a nice base-R way to do the same thing)
You could indicate density using a rug plot on one of the four sides of the plot with plot1 + geom_rug(sides='r')
ggplot(mpg, aes(x=class, y=cty)) +
geom_boxplot() + geom_rug(sides="r")
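
For readers on a recent ggplot2, there is a third route that postdates these answers: since version 3.3.0, geom_boxplot() draws a horizontal boxplot when the continuous variable is mapped to x and the grouping variable to y, so coord_flip() is not required for the boxplot itself. A minimal sketch using the built-in mpg data, guarded by a version check:

```r
if (requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("ggplot2") >= "3.3.0") {
  library(ggplot2)
  # Continuous variable on x, discrete on y: the boxes are drawn horizontally.
  p <- ggplot(mpg, aes(x = cty, y = class)) +
    geom_boxplot()
}
```

This does not by itself place the boxplot under a histogram; the grob or grid.arrange() approaches above are still needed for that layout.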
