Add a legend for a geom_text layer to explain labels - r

Consider the following example, where a scatter plot is made and only the "significant" points are colored and labelled.
library(ggplot2)
library(ggrepel)  # provides geom_text_repel()
genes <- read.table("https://gist.githubusercontent.com/stephenturner/806e31fce55a8b7175af/raw/1a507c4c3f9f1baaa3a69187223ff3d3050628d4/results.txt", header = TRUE)
genes$Significant <- ifelse(genes$padj < 0.05, "FDR < 0.05", "Not Sig")
ggplot(genes, aes(x = log2FoldChange, y = -log10(pvalue))) +
  geom_point(aes(color = Significant)) +
  scale_color_manual(values = c("red", "grey")) +
  theme_bw(base_size = 12) + theme(legend.position = "bottom") +
  # Label only the significant genes
  geom_text_repel(
    data = subset(genes, padj < 0.05),
    aes(label = Gene),
    size = 5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.3, "lines")
  )
It yields the following plot
Now imagine that the labels are actually acronyms and that each has a real full-length name (e.g., "DOK6" is the acronym for "Duo Ocarino Kayne 6"). Would it be possible to add a legend to the plot where the keys are the labels used on the plot and the entries are the full-length names behind those labels?

First, I added a Gene2 column for a second legend that lists only the significant genes.
Next, Gene2 was mapped to fill in aes(). (Only color affects the color of the points in geom_point, so fill is free to carry the second legend.)
Finally, scale_fill_discrete was added to build the second legend. All you need to do is put the full-length names where the code says Full names here.
# Keep the gene name only for significant genes; as.character() guards against Gene being a factor
genes$Gene2 <- ifelse(genes$padj < 0.05, as.character(genes$Gene), NA)
ggplot(genes, aes(x = log2FoldChange, y = -log10(pvalue), fill = Gene2)) +
  geom_point(aes(color = Significant)) +
  scale_color_manual(values = c("red", "grey")) +
  theme_bw(base_size = 12) +
  geom_text_repel(
    data = subset(genes, padj < 0.05),
    aes(label = Gene),
    size = 5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.3, "lines")
  ) +
  # Build each label from its own legend key so that keys and labels stay aligned
  scale_fill_discrete(
    labels = function(x) paste0(x, ": Full names here"),
    name = "Significant genes"
  ) +
  theme(legend.position = "right")
Output
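If the full names are available as a lookup table, the placeholder text can be produced by a label function instead. The following is only a minimal sketch under assumptions, not the answerer's code: the `full_names` vector, the `label_full` helper and the toy data frame are made up for illustration (only the "DOK6" entry comes from the question), and geom_text stands in for geom_text_repel so that ggplot2 alone is enough to run it.

```r
library(ggplot2)

# Hypothetical lookup table: acronym -> full-length name.
# Only the DOK6 entry comes from the question; the others are placeholders.
full_names <- c(
  DOK6 = "Duo Ocarino Kayne 6",
  ABC1 = "Placeholder full name 1",
  XYZ2 = "Placeholder full name 2"
)

# Hypothetical helper: receives the legend keys and returns "KEY: full name".
label_full <- function(keys) {
  unname(paste0(keys, ": ", full_names[keys]))
}

# Toy data standing in for the real results table (all values are invented).
toy <- data.frame(
  Gene = c("DOK6", "ABC1", "XYZ2", "NS1", "NS2"),
  log2FoldChange = c(2.1, -1.8, 1.5, 0.2, -0.1),
  pvalue = c(1e-6, 1e-5, 1e-4, 0.4, 0.7),
  padj = c(1e-4, 1e-3, 1e-2, 0.6, 0.9),
  stringsAsFactors = FALSE
)
toy$Significant <- ifelse(toy$padj < 0.05, "FDR < 0.05", "Not Sig")
toy$Gene2 <- ifelse(toy$padj < 0.05, toy$Gene, NA)

p <- ggplot(toy, aes(x = log2FoldChange, y = -log10(pvalue), fill = Gene2)) +
  geom_point(aes(color = Significant)) +
  scale_color_manual(values = c("red", "grey")) +
  geom_text(data = subset(toy, padj < 0.05), aes(label = Gene), vjust = -1) +
  # The label function is applied to the legend keys of the fill scale
  scale_fill_discrete(labels = label_full, name = "Significant genes") +
  theme_bw(base_size = 12)
```

Because the label is computed from the key itself, the legend text cannot drift out of step with the order in which ggplot2 sorts the gene names.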

Related

Adjust ggplot legend

I am trying to make the following changes to the ggplot below (partly illustrated in the picture provided):
change the shading legend to show the economic cycle (shaded is a recession, no shade is an expansion)
add an additional legend to show the economic variables (green is 'CLI' and red is 'Inflation Expectations')
The code so far looks like this:
A <- ggplot(Alldata, aes(Date)) +
  geom_tile(aes(alpha = Recession, y = 1),
            fill = "grey", height = Inf) +
  scale_alpha_continuous(range = c(0, 1), breaks = c(0, 1)) +
  geom_line(aes(y = stdINFEX), col = 'blue', size = .8) +
  ylab('') +
  theme(axis.text.y = element_blank(),  # remove y axis labels
        axis.ticks.y = element_blank()  # remove y axis ticks
  )
A
B <- A + geom_line(aes(y = CLI), col = 'green', size = .8)
B
Maybe this is what you are looking for:
You could set the labels for the legend entries via the labels argument of the scale, e.g. using a named vector you could assign the label Expansion to the value "0".
To get a legend for your lines you have to map on aesthetics, i.e. move color=... inside of aes(). Note that a color name inside aes() is treated as a category label rather than as a color, so I would suggest using meaningful labels there and setting your desired colors via scale_color_manual.
Finally, to set the titles of your legends you could make use of labs().
As you provided no example data (see how to make a minimal reproducible example) I make use of the ggplot2::economics dataset as example data:
library(ggplot2)
set.seed(123)
economics$Recession <- 0
economics$Recession[sample(1:nrow(economics), 100)] <- 1
ggplot(economics, aes(date)) +
  geom_tile(aes(alpha = Recession, y = 1),
    fill = "grey", height = Inf
  ) +
  scale_alpha_continuous(range = c(0, 1),
                         breaks = c(0, 1),
                         labels = c("0" = "Expansion", "1" = "Recession")) +
  geom_line(aes(y = psavert, color = "psavert"), size = .8) +
  geom_line(aes(y = uempmed, color = "uempmed"), size = .8) +
  scale_color_manual(values = c(psavert = "blue", uempmed = "green")) +
  labs(y = NULL, alpha = "Economic Cycle", color = "Economic Variable") +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  )
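The point about color names inside aes() can be checked directly. This is a small sketch using the same ggplot2::economics data; the label "Savings rate" and the object names are my own choices for illustration. It compares the color ggplot2 actually computes for the line in three cases.

```r
library(ggplot2)

# Inside aes(): "green" is only a category label; the default palette picks the colour.
p_inside <- ggplot(economics, aes(date, psavert)) +
  geom_line(aes(colour = "green"))

# Outside aes(): "green" is the literal line colour, but no legend is created.
p_fixed <- ggplot(economics, aes(date, psavert)) +
  geom_line(colour = "green")

# Meaningful label inside aes(), actual colour assigned by the manual scale.
p_scaled <- ggplot(economics, aes(date, psavert)) +
  geom_line(aes(colour = "Savings rate")) +
  scale_colour_manual(values = c("Savings rate" = "green")) +
  labs(colour = "Economic Variable")

# layer_data() returns the values ggplot2 computed for a layer.
unique(layer_data(p_inside)$colour)
unique(layer_data(p_fixed)$colour)
unique(layer_data(p_scaled)$colour)
```

Only the third variant gives both a legend entry and the intended color, which is why the answer pairs labels inside aes() with scale_color_manual.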

Combine legend for fill and colour ggplot to give only single legend

I am plotting a smooth to my data using geom_smooth and using geom_ribbon to plot shaded confidence intervals for this smooth. No matter what I try, I cannot get a single legend that represents both the smooth and the ribbon correctly, i.e. I want a single legend that has the correct colours and labels for both the smooth and the ribbon. I have tried + guides(fill = FALSE) and guides(colour = FALSE), and I also read that giving both colour and fill the same label inside labs() should produce a single unified legend.
Any help would be much appreciated.
Note that I have also tried to reset the legend labels and colours using scale_colour_manual()
The code below produces the figure below. Note that there are two curves here that are essentially overlapping. The relabelling and setting of colours has worked for the geom_smooth legend but not for the geom_ribbon legend, and I still have two legends showing, which is not what I want.
ggplot(pred.dat, aes(x = age.x, y = fit, colour = tagged)) +
  geom_smooth(size = 1.2) +
  geom_ribbon(aes(ymin = lci, ymax = uci, fill = tagged), alpha = 0.2, colour = NA) +
  theme_classic() +
  labs(x = "Age (days since hatch)", y = "Body mass (g)", colour = "", fill = "") +
  scale_colour_manual(labels = c("Untagged", "Tagged"), values = c("#3399FF", "#FF0033")) +
  theme(axis.title.x = element_text(face = "bold", size = 14),
        axis.title.y = element_text(face = "bold", size = 14),
        axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12),
        legend.text = element_text(size = 12))
The problem is that you provide new labels for the color-aesthetic but not for the fill-aesthetic. Consequently ggplot shows two legends because the labels are different.
You can either also provide the same labels for the fill-aesthetic (code option #1 below) or you can set the labels for the levels of your grouping variable ("tagged") before calling ggplot (code option #2).
library(ggplot2)
# make some data
x <- seq(0, 2 * pi, by = 0.01)
pred.dat <- data.frame(x = c(x, x),
                       y = c(sin(x), cos(x)) + rnorm(length(x) * 2, 0, 1),
                       tag = rep(0:1, each = length(x)))
pred.dat$lci <- c(sin(x), cos(x)) - 0.4
pred.dat$uci <- c(sin(x), cos(x)) + 0.4
# option 1: set labels within ggplot call
pred.dat$tagged <- as.factor(pred.dat$tag)
ggplot(pred.dat, aes(x = x, y = y, color = tagged, fill = tagged)) +
  geom_smooth(size = 1.2) +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, color = NA) +
  scale_color_manual(labels = c("untagged", "tagged"), values = c("#F8766D", "#00BFC4")) +
  scale_fill_manual(labels = c("untagged", "tagged"), values = c("#F8766D", "#00BFC4")) +
  theme_classic() + theme(legend.title = element_blank())
# option 2: set labels before ggplot call
pred.dat$tagged <- factor(pred.dat$tag, levels = 0:1, labels = c("untagged", "tagged"))
ggplot(pred.dat, aes(x = x, y = y, color = tagged, fill = tagged)) +
  geom_smooth(size = 1.2) +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, color = NA) +
  theme_classic() + theme(legend.title = element_blank())
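A third possibility, not part of the answer above, is to let one scale serve both aesthetics through the aesthetics argument of scale_colour_manual, so that labels and values are written once and cannot diverge. The sketch below uses made-up data and plain geom_line in place of geom_smooth to keep it short; the asker's colors and labels are reused.

```r
library(ggplot2)

# Made-up data: two curves with a constant-width band around each.
x <- seq(0, 2 * pi, by = 0.1)
pred.dat <- data.frame(
  x = c(x, x),
  fit = c(sin(x), cos(x)),
  tagged = factor(rep(0:1, each = length(x)))
)
pred.dat$lci <- pred.dat$fit - 0.4
pred.dat$uci <- pred.dat$fit + 0.4

p <- ggplot(pred.dat, aes(x = x, y = fit, colour = tagged, fill = tagged)) +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, colour = NA) +
  geom_line() +
  # One scale definition applied to both colour and fill gives one merged legend
  scale_colour_manual(
    labels = c("Untagged", "Tagged"),
    values = c("#3399FF", "#FF0033"),
    aesthetics = c("colour", "fill")
  ) +
  theme_classic() +
  theme(legend.title = element_blank())
```

Both aesthetics still need to be mapped to the same variable, as in the two options above; the shared scale only removes the duplicated labels and values.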

Bubble plot with facets displays data only for one facet instead of all

I have the dataframe below
GO<-c("cytosol (GO:0005829)","cytosol (GO:0005829)")
FE<-c(2.70,4.38)
FDR<-c(0.00159,0.00857)
Facet<-c("ileum 24h","ileum 72h")
CCC<-data.frame(GO,FE,FDR,Facet)
and with this code
CCC %>%
  arrange(desc(CCC$GO)) %>%
  ggplot(aes(x=FDR, y=GO, size=FE, color=FDR)) +
  geom_point(alpha=0.5) +
  scale_size(range = c(0,8), name="Fold enrichment") +
  facet_grid(cols = vars(Facet)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1)
  ) +
  scale_y_discrete(name="GO biological process complete") +
  scale_x_discrete(name ="FDR") +
  scale_colour_gradient(low = "yellow", high = "red", na.value = NA, name="FDR") +
  theme_bw()
I create the bubble plot below. The issue is that I should have had a bubble in both facets but now only one is displayed.
You have plotted points in both facets. It's just that you have told ggplot that you want the sizes to vary between 0 and 8 by setting scale_size(range = c(0, 8)). Since there are only two levels of the variable FE to map to the size scale, this means that the lower value is mapped to size 0 and the higher value is mapped to size 8.
So the simple fix is to either get rid of the size scale and just make the points the same size by setting geom_point(alpha = 0.5, size = 8) or change the range parameter in scale_size so that the smallest value is actually visible.
library(dplyr)
library(ggplot2)
CCC %>%
  arrange(desc(GO)) %>%
  ggplot(aes(x = FDR, y = GO, size = FE, color = FDR)) +
  geom_point(alpha = 0.5) +
  # A non-zero lower bound keeps the smallest bubble visible
  scale_size(range = c(5, 8), name = "Fold enrichment") +
  facet_grid(cols = vars(Facet), scales = "free") +
  scale_y_discrete(name = "GO biological process complete") +
  # FDR is numeric, so it needs a continuous x scale
  scale_x_continuous(name = "FDR") +
  scale_colour_gradient(low = "yellow", high = "red", name = "FDR") +
  # theme_bw() goes first: a complete theme added later would reset the axis text settings
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1))
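The arithmetic behind the disappearing bubble can be sketched in base R. This is a simplified approximation of an area-based size scale, not ggplot2's own code, and `bubble_size` is a hypothetical helper written for this illustration.

```r
# FE values from the question
fe <- c(2.70, 4.38)

# Step 1: rescale the data to [0, 1]; the minimum becomes 0 and the maximum 1.
fe01 <- (fe - min(fe)) / (max(fe) - min(fe))

# Step 2 (hypothetical helper, not a ggplot2 function): map the rescaled values
# onto the requested size range through a square root.
bubble_size <- function(x01, range) {
  range[1] + sqrt(x01) * (range[2] - range[1])
}

bubble_size(fe01, c(0, 8))  # the smaller FE lands on the lower bound, size 0
bubble_size(fe01, c(5, 8))  # the smaller FE lands on the lower bound, size 5
```

With only two values, one of them always sits exactly on the lower end of the range, so a lower bound of 0 makes that point invisible.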

Dotplot: How to change dot sizes of dotplot based on a value in data and make all x axis values into whole numbers

I have made a dotplot for my data but need help with the finishing touches. I have been around Stack Overflow a bit and haven't seen any posts that directly answer my queries yet.
My code for my dotplot is:
ggplot() +
  geom_dotplot(mapping = aes(x = reorder(Description, -p.adjust), y = Count, fill = -p.adjust),
               data = head(X[which(X$p.adjust < 0.05),], n = 15), binaxis = 'y', dotsize = 2,
               method = 'dotdensity', binpositions = 'all', binwidth = NULL) +
  scale_fill_continuous(low = "black", high = "light grey") +
  labs(y = "Associated genes", x = "wikipathways", fill = "p.adjust") +
  theme(axis.text = element_text(size = 8)) +
  ggtitle('') +
  theme(plot.title = element_text(2, face = "bold", hjust = 1),
        legend.key.size = unit(2, "line")) +
  theme(panel.background = element_rect(fill = 'white', colour = 'black')) +
  coord_fixed(ratio = 0.5) +
  coord_flip()
Let's say the X is something along the lines of:
   Description p.adjust Count GeneRatio
1 DescriptionA    0.001     3      3/20
2 DescriptionB    0.002     2      2/20
3 DescriptionC    0.003     5      5/20
4 DescriptionD    0.004    10     10/20
To complete this plot I need two edits.
I would like to base the size of the dots on the GeneRatio, and add a secondary key for this size. Is this possible with ggplot2 dotplots?
Next I would like to keep the X axis values as integers. I'd want to avoid using something like scale_x_continuous(limits = c(2, 10)) as this plot code is part of a function for multiple data sets of various sizes. Thus containing the limits/scale would not work well.
Help would be most appreciated.
If you can switch to a geom_point chart instead of geom_dotplot it's easy to adjust the dot size according to a variable. It also seems to have corrected your axis issue luckily enough.
ggplot() +
  # binaxis, method, binpositions and binwidth belong to geom_dotplot and are omitted here;
  # shape = 21 is a filled circle, so the fill mapping is actually drawn
  geom_point(mapping = aes(x = reorder(Description, -p.adjust), y = Count,
                           fill = -p.adjust, size = GeneRatio),
             data = head(X[which(X$p.adjust < 0.05), ], n = 15), shape = 21) +
  scale_fill_continuous(low = "black", high = "light grey") +
  labs(y = "Associated genes", x = "wikipathways", fill = "p.adjust") +
  theme(axis.text = element_text(size = 8)) +
  ggtitle('') +
  theme(plot.title = element_text(2, face = "bold", hjust = 1),
        legend.key.size = unit(2, "line")) +
  theme(panel.background = element_rect(fill = 'white', colour = 'black')) +
  # coord_fixed() is dropped: the later coord_flip() would replace it anyway
  coord_flip()
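Two details may still need attention: GeneRatio is stored as text such as "3/20", so the size legend treats it as categories, and whole-number axis breaks are not guaranteed for every data set. The sketch below is one possible way to handle both, under assumptions: `parse_ratio` and `integer_breaks` are hypothetical helpers written for this illustration, the data frame mirrors the example table from the question, and Count is put on the x axis directly instead of using coord_flip().

```r
library(ggplot2)

# Example data shaped like the table in the question.
X <- data.frame(
  Description = c("DescriptionA", "DescriptionB", "DescriptionC", "DescriptionD"),
  p.adjust = c(0.001, 0.002, 0.003, 0.004),
  Count = c(3, 2, 5, 10),
  GeneRatio = c("3/20", "2/20", "5/20", "10/20"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn text such as "3/20" into the number 0.15.
parse_ratio <- function(ratio) {
  parts <- strsplit(ratio, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}
X$GeneRatioNum <- parse_ratio(X$GeneRatio)

# Hypothetical helper: keep only whole-number breaks, whatever the axis limits are.
integer_breaks <- function(limits) {
  b <- pretty(limits)
  b[b == round(b)]
}

p <- ggplot(X[X$p.adjust < 0.05, ],
            aes(x = Count, y = reorder(Description, -p.adjust),
                fill = -p.adjust, size = GeneRatioNum)) +
  geom_point(shape = 21) +
  # Breaks are computed from the limits, so no fixed limits are hard-coded
  scale_x_continuous(breaks = integer_breaks) +
  scale_fill_gradient(low = "black", high = "light grey") +
  labs(x = "Associated genes", y = "wikipathways",
       fill = "p.adjust", size = "GeneRatio")
```

Because the breaks function only receives the limits of whatever data is plotted, it can be reused inside a function that handles data sets of different sizes.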

Create legend with manual shapes and colours

I use bars and line to create my plot. The demo code is:
timestamp <- seq(as.Date('2010-01-01'),as.Date('2011-12-01'),by="1 mon")
data1 <- rnorm(length(timestamp), 3000, 30)
data2 <- rnorm(length(timestamp), 30, 3)
df <- data.frame(timestamp, data1, data2)
p <- ggplot()
p <- p + geom_histogram(data=df,aes(timestamp,data1),colour="black",stat="Identity",bindwidth=10)
p <- p + geom_line(data=df,aes(timestamp,y=data2*150),colour="red")
p <- p + scale_y_continuous(sec.axis = sec_axis(~./150, name = "data2"))
p <- p + scale_colour_manual(name="Parameter", labels=c("data1", "data2"), values = c('black', 'red'))
p <- p+ scale_shape_manual(name="Parameter", labels=c("data1", "data2"), values = c(15,95))
p
This results in a plot like this:
This figure does not have a legend. I followed this answer to create a customized legend but it is not working in my case. I want a square and line shape in my legend corresponding to bars and line. How can we get it?
I want legend as shown in below image:
For the type of data you want to display, geom_bar is a better fit than geom_histogram. When you want to manipulate the appearance of the legend(s), you need to place the colour = ... parts inside aes(). To get the desired result it is probably best to use different types of legend for the line and the bars. That way you are better able to change the appearance of the legends with guide_legend and override.aes.
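As a rough illustration of the "different types of legend" idea, here is a simplified variant that is not the proposal itself: it swaps in geom_col with a fill legend for the bars (which gives a square key) and a colour legend for the line (which gives a line key). The grey bar fill and the set.seed call are my own additions.

```r
library(ggplot2)

set.seed(42)
timestamp <- seq(as.Date("2010-01-01"), as.Date("2011-12-01"), by = "1 month")
df <- data.frame(
  timestamp,
  data1 = rnorm(length(timestamp), 3000, 30),
  data2 = rnorm(length(timestamp), 30, 3)
)

p <- ggplot(df, aes(x = timestamp)) +
  # Constant strings inside aes() create the legend entries
  geom_col(aes(y = data1, fill = "data1"), colour = "black") +
  geom_line(aes(y = data2 * 150, colour = "data2")) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 150, name = "data2")) +
  # The manual scales attach the actual colours to those entries
  scale_fill_manual(name = "Parameter", values = c(data1 = "grey35")) +
  scale_colour_manual(name = NULL, values = c(data2 = "red")) +
  # Bars first, line second
  guides(fill = guide_legend(order = 1), colour = guide_legend(order = 2))
```

The two legends are stacked in the order set in guides(), and dropping the title of the second one makes them read as a single "Parameter" legend.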
A proposal for your problem:
ggplot(data = df) +
  geom_bar(aes(x = timestamp, y = data1, colour = "black"),
           stat = "Identity", fill = NA) +
  geom_line(aes(x = timestamp, y = data2*150, linetype = "red"), colour = "red", size = 1) +
  scale_y_continuous(sec.axis = sec_axis(~./150, name = "data2")) +
  scale_linetype_manual(labels = "data2", values = "solid") +
  scale_colour_manual(name = "Parameter\n", labels = "data1", values = "black") +
  guides(colour = guide_legend(override.aes = list(colour = "black", size = 1),
                               order = 1),
         linetype = guide_legend(title = NULL,
                                 override.aes = list(linetype = "solid",
                                                     colour = "red",
                                                     size = 1),
                                 order = 2)) +
  theme_minimal() +
  theme(legend.key = element_rect(fill = "white", colour = NA),
        legend.spacing = unit(0, "lines"))
which gives:
