How to add a custom legend to a plot with ggplot? - r

I want to add a new, fully customized legend to my plot.
I label every point with geom_text_repel.
The new legend should describe what each character in the labels stands for.

You can create a legend by creating "dummy" data that contains the legend key labels. You would then "plot" the dummy data in order to generate the legend, but use blank symbols so that nothing actually gets plotted.
library(ggplot2)
theme_set(theme_classic())
# Fake data for plotting
set.seed(2)
val = sapply(sample(1:4,30,replace=TRUE), function(x) paste(sort(sample(c('c','u','x','t'), x)), collapse=""))
dat = data.frame(x=runif(30), y=runif(30), val)
# Dummy data for creating the legend
leg = data.frame(x1=rep(0,4), y1=rep(0,4), ll = c("c: coor","u: url","x: xss","t: text"))
ggplot(data=dat, aes(x,y)) +
geom_text(aes(label=val)) +
geom_point(data=leg, aes(x1, y1, colour=ll)) +
theme(legend.key.size=unit(15,"pt"),
legend.title=element_blank(),
legend.margin=margin(l=0),
legend.text=element_text(size=12)) +
scale_colour_manual(values=rep("#00000000", 4))
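A variant of the dummy-data approach, as a sketch: a character column is sorted alphabetically in the legend (c, t, u, x), so if you want the entries in a specific order, make the dummy column a factor with explicit levels. The names keys and p are only illustrative.

```r
library(ggplot2)
theme_set(theme_classic())

# Same fake data as above
set.seed(2)
val <- sapply(sample(1:4, 30, replace = TRUE),
              function(x) paste(sort(sample(c("c", "u", "x", "t"), x)), collapse = ""))
dat <- data.frame(x = runif(30), y = runif(30), val)

# Legend labels in the order we want them shown (not alphabetical)
keys <- c("c: coor", "u: url", "x: xss", "t: text")
# Dummy data: one invisible point per key; the factor levels set the legend order
leg <- data.frame(x1 = 0, y1 = 0, ll = factor(keys, levels = keys))

p <- ggplot(dat, aes(x, y)) +
  geom_text(aes(label = val)) +
  geom_point(data = leg, aes(x1, y1, colour = ll)) +
  # "#00000000" is fully transparent black, so the keys draw nothing
  scale_colour_manual(values = rep("#00000000", length(keys))) +
  theme(legend.title = element_blank(),
        legend.text = element_text(size = 12))
p
```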
You could also use geom_text to place the "legend" annotations directly:
leg = data.frame(ll = sort(c("c: coor","u: url","x: xss","t: text")))
leg$y = seq(mean(dat$y) + 0.05*diff(range(dat$y)),
mean(dat$y) - 0.05*diff(range(dat$y)),
length=4)
leg$x = 1.07 * max(dat$x)
ggplot(data=dat, aes(x,y)) +
geom_text(aes(label=val)) +
geom_text(data=leg, aes(label=ll), hjust=0, colour="red") +
annotate(xmin=1.05 * max(dat$x), xmax=1.18 * max(dat$x), ymin=0.95*min(leg$y), ymax=1.04*max(leg$y),
geom="rect", fill=NA, colour="black") +
scale_x_continuous(limits=c(min(dat$x), 1.18*max(dat$x)))

Related

Hide legend elements in ggplot2

I am trying to plot the parameter estimates and levels of hierarchy from a Stan model output. For the legend, I want to remove all labels except "Overall Effects", but I can't figure out how to remove all of the species labels successfully.
Here is the code:
ggplot(dfwide, aes(x=Estimate, y=var, color=factor(sp), size=factor(rndm),
alpha=factor(rndm))) +
geom_point(position =pd) +
geom_errorbarh(aes(xmin=(`2.5%`), xmax=(`95%`)), position=pd,
size=.5, height = 0, width=0) +
geom_vline(xintercept=0) +
scale_colour_manual(values=c("blue", "red", "orangered1","orangered3", "sienna4",
"sienna2", "green4", "green3", "purple2", "magenta2"),
labels=c("Overall Effects", expression(italic("A. pensylvanicum"),
italic("A. rubrum"), italic("A. saccharum"),
italic("B. alleghaniensis"), italic("B. papyrifera"),
italic("F. grandifolia"), italic("I. mucronata"),
italic("P. grandidentata"), italic("Q. rubra")))) +
scale_size_manual(values=c(3, 1, 1, 1, 1, 1, 1, 1, 1, 1)) +
scale_shape_manual(labels="", values=c("1"=16,"2"=16)) +
scale_alpha_manual(values=c(1, 0.4)) + guides(size=FALSE, alpha=FALSE) +
ggtitle(label = "A.") +
scale_y_discrete(limits = rev(unique(sort(dfwide$var))), labels=estimates) +
ylab("") +
labs(col="Effects") + theme(legend.title=element_blank())
The key point is that this answer removes part of the legend by working with grid directly rather than through ggplot2's own functions. Both lattice and ggplot2 are built on grid, so grid functions can modify a plot at a lower level after ggplot2 has drawn it.
Three grid functions are needed: grid.force(), grid.ls() and grid.remove(). After drawing the plot with ggplot2, call grid.force() and then grid.ls() to list every element in the picture (points, lines, text, and so on). Then find the elements you are interested in. This step is interactive, because ggplot2 builds element names from numbers and text that are not always meaningful. Once you have identified the names, remove those elements with grid.remove(). Below is the sample code I made.
library(grid)
library(ggplot2)
set.seed(1)
data <- data.frame(x = rep(1:10, 2), y = sample(1:100, 20),
type = sample(c("A", "B"), 20, replace = TRUE))
ggplot(data, aes(x = x, y =y,color = type))+
geom_point()+
geom_line()+
scale_color_manual(values = c("blue", "darkred"))+
theme_bw()
At this point the whole plot has been drawn; next we remove some of its elements.
grid.force()
grid.ls()
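grid.ls() lists all the element names. Pass the ones you want to drop to grid.remove():
# These grob names come from one particular grid.ls() listing; copy the names from your own output
grid.remove("key-4-1-1.5-2-5-2")
grid.remove("key-4-1-2.5-2-5-2")
grid.remove("label-4-3.5-4-5-4")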
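A minimal sketch of making the lookup less manual: capture the listing with grid.ls(print = FALSE) instead of reading it off the console, then filter the names with a regular expression. The "^(key|label)" pattern is an assumption based on the names used above and may not match every ggplot2 version, so inspect the result before removing anything. pdf(NULL) is only there so the sketch can run without an interactive device.

```r
library(ggplot2)
library(grid)

set.seed(1)
data <- data.frame(x = rep(1:10, 2), y = sample(1:100, 20),
                   type = sample(c("A", "B"), 20, replace = TRUE))
p <- ggplot(data, aes(x = x, y = y, color = type)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values = c("blue", "darkred")) +
  theme_bw()

pdf(NULL)     # off-screen device so the sketch runs non-interactively
print(p)
grid.force()  # expand the drawn gTree so legend children become separate grobs

# Capture the listing as data instead of printing it
grob_names <- grid.ls(print = FALSE)$name
# Assumed pattern: legend keys/labels start with "key" or "label"
legend_names <- grep("^(key|label)", grob_names, value = TRUE)
print(legend_names)
# On an interactive device you would now call grid.remove() on the chosen names

invisible(dev.off())
```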
It's not perfect, but my solution would be to make two plots and combine them. See this post, which I lifted the extraction code from.
I don't have your data, but I think you will get the idea below:
library(ggplot2)
library(gridExtra)
library(grid)
# g_legend credit goes to https://stackoverflow.com/a/11886071/2060081
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
# Legend-only plot: same specification, restricted to the Overall Effects rows
p_legend = ggplot(dfwide[dfwide$sp == 'Overall Effects', ], aes(x=Estimate, y=var, color=factor(sp),
size=factor(rndm),
alpha=factor(rndm))) +
geom_point(position =pd) +
geom_errorbarh(aes(xmin=(`2.5%`), xmax=(`95%`)), position=pd,
size=.5, height = 0, width=0) +
geom_vline(xintercept=0) +
scale_colour_manual(values=c("blue"),
labels=c("Overall Effects")) +
scale_size_manual(values=c(3)) +
scale_shape_manual(labels="", values=c("1"=16,"2"=16)) +
scale_alpha_manual(values=c(1, 0.4)) + guides(size="none", alpha="none") +
ggtitle(label = "A.") +
scale_y_discrete(limits = rev(unique(sort(dfwide$var))), labels=estimates) +
ylab("") +
labs(col="Effects") + theme(legend.title=element_blank())
p_legend = g_legend(p_legend)
One of your plots will just be the legend: subset your data to the Overall Effects rows, extract the legend from that plot, and then draw the two plots together in a grid.
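A sketch of the combining step, under assumptions: since dfwide isn't available, the data frame below is a hypothetical stand-in with only var, Estimate and sp columns. extract_legend is a variant of g_legend that looks the legend up by layout name and accepts both "guide-box" and the "guide-box-<position>" names that newer ggplot2 versions appear to use. The main plot hides its own legend, and arrangeGrob() places the legend-only grob to its right.

```r
library(ggplot2)
library(gridExtra)
library(grid)

# Hypothetical stand-in for dfwide (the real data isn't available)
set.seed(1)
dfwide <- data.frame(
  var      = rep(c("b_temp", "b_photo"), each = 3),
  Estimate = rnorm(6),
  sp       = rep(c("Overall Effects", "Species 1", "Species 2"), times = 2)
)
cols <- c("Overall Effects" = "blue", "Species 1" = "red", "Species 2" = "orangered1")

# Variant of g_legend: finds the legend box by its layout name
extract_legend <- function(p) {
  gt <- ggplot_gtable(ggplot_build(p))
  # Older ggplot2 uses "guide-box"; newer versions may use "guide-box-right" etc.
  idx <- grep("^guide-box", gt$layout$name)
  # Drop empty placeholders for unused legend positions
  idx <- idx[!vapply(gt$grobs[idx], inherits, logical(1), what = "zeroGrob")]
  gt$grobs[[idx[1]]]
}

# Main plot: every group, own legend hidden
p_main <- ggplot(dfwide, aes(x = Estimate, y = var, colour = sp)) +
  geom_point(size = 3) +
  geom_vline(xintercept = 0) +
  scale_colour_manual(values = cols) +
  theme(legend.position = "none")

# Legend source: only the Overall Effects rows
p_leg <- ggplot(dfwide[dfwide$sp == "Overall Effects", ],
                aes(x = Estimate, y = var, colour = sp)) +
  geom_point(size = 3) +
  scale_colour_manual(values = cols) +
  theme(legend.title = element_blank())

leg_grob <- extract_legend(p_leg)
combined <- arrangeGrob(p_main, leg_grob, ncol = 2, widths = c(4, 1))
grid.newpage()
grid.draw(combined)
```

Reusing the same named colour vector in both plots keeps the legend key consistent with the points in the main panel.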

Visualizing crosstab tables with a plot in R - changing colours

I have the following code in R which is modified from here, which plots a crosstab table:
#load ggplot2
library(ggplot2)
# Set up the vectors
xaxis <- c("A", "B")
yaxis <- c("A","B")
# Create the data frame
df <- expand.grid(xaxis, yaxis)
df$value <- c(120,5,30,200)
#Plot the Data
g <- ggplot(df, aes(Var1, Var2)) + geom_point(aes(size = value), colour = "lightblue") + theme_bw() + xlab("") + ylab("")
g + scale_size_continuous(range=c(10,30)) + geom_text(aes(label = value))
It produces the right figure, which is great, but I was hoping to custom colour the four dots, ideally so that the top left and bottom right are both one colour and the top right and bottom left are another.
I have tried to use:
+ scale_color_manual(values=c("blue","red","blue","red"))
but that doesn't seem to work. Any ideas?
I would suggest that you colour by a vector in your data frame. As you don't have a column that gives you this, you can either create one or make a rule based on existing columns (which I have done below):
g <- ggplot(df, aes(Var1, Var2)) + geom_point(aes(size = value, colour = (Var2!=Var1))) + theme_bw() + xlab("") + ylab("")
g + scale_size_continuous(range=c(10,30)) + geom_text(aes(label = value))
The important part is colour = (Var2!=Var1); note that I put this inside the aesthetic mapping (aes) for geom_point.
Edit: if you wish to remove the legend (you annotate the chart with totals, so I guess you don't really need it), you can add g + theme(legend.position="none").
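If you want specific colours rather than the defaults, here is a sketch: create the helper column explicitly (diag is a name made up for this example) and give scale_colour_manual() a named vector, so each value is matched by name rather than by position. This also shows why the original attempt had no effect: a scale only applies to aesthetics mapped inside aes(), and colour = "lightblue" was a fixed setting.

```r
library(ggplot2)

xaxis <- c("A", "B")
yaxis <- c("A", "B")
df <- expand.grid(xaxis, yaxis)
df$value <- c(120, 5, 30, 200)

# Hypothetical helper column: is the cell on the A-A / B-B diagonal?
df$diag <- ifelse(df$Var1 == df$Var2, "same", "different")

g <- ggplot(df, aes(Var1, Var2)) +
  geom_point(aes(size = value, colour = diag)) +
  geom_text(aes(label = value)) +
  scale_size_continuous(range = c(10, 30)) +
  # Named values are matched to the data values by name, not by position
  scale_colour_manual(values = c(same = "blue", different = "red")) +
  theme_bw() + xlab("") + ylab("") +
  theme(legend.position = "none")
g
```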

ggplot: add multiline text annotation outside of plot

I want to add a text box of 10 separate, stacked lines outside of the plot area in ggplot. My text is t = c("a=1", "b=2", "c=3", ... , "j=10"), but these labels are independent of the data.frame I used for my original ggplot. How can I add 10 lines outside of the plot area?
For example, I want to add a textbox around my vector t on the right of the following plot:
df = data.frame(y=rnorm(300), test=rep(c(1,2,3),each=100))
t = c("a=1", "b=2", "c=3", "d=4", "e=5", "f=6", "g=7", "h=8", "i=9", "j=10")
p <- ggplot(df, aes(x=factor(test), y=y))
p <- p + geom_violin() + geom_jitter(height=0, width=0.1)
p <- p + theme(legend.title=element_blank(), plot.margin=unit(c(0.1, 3, 0.1, 0.1), "cm"))
p
Try:
library(gridExtra)
grid.arrange(p, right = tableGrob(matrix(t,ncol=1),
theme = ttheme_minimal(padding = unit(c(3,1),"line"))))
You can create a geom_text layer that uses the values in t as labels, so that they are printed in a legend. Setting alpha=0 in geom_text keeps these labels from being drawn in the plot itself, and legend.key=element_blank() together with override.aes=list(size=0) prints the "legend" labels (the t values) without a meaningless legend key.
p +
geom_text(data = data.frame(t, test=NA, y=NA), aes(label=t, colour=t), alpha=0, x=1, y=1) +
theme(legend.key=element_blank(),
legend.margin=margin(l=-10)) +
guides(colour=guide_legend(override.aes=list(size=0)))
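One more option, sketched as an alternative to the two answers above: join the strings with newlines into a single label, pin it to the panel's right edge with x = Inf, and turn clipping off with coord_cartesian(clip = "off") so it can extend into the enlarged right margin. The hjust value and margin width are guesses you would tune; t is rebuilt here as the a=1 to j=10 sequence.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = rnorm(300), test = rep(c(1, 2, 3), each = 100))
t <- paste0(letters[1:10], "=", 1:10)      # "a=1", "b=2", ..., "j=10"
box_text <- paste(t, collapse = "\n")      # one string, ten stacked lines

p <- ggplot(df, aes(x = factor(test), y = y)) +
  geom_violin() +
  geom_jitter(height = 0, width = 0.1) +
  # x = Inf / y = Inf pin the label to the panel's top-right corner;
  # hjust < 0 pushes it past the right edge into the margin
  annotate("label", x = Inf, y = Inf, label = box_text,
           hjust = -0.1, vjust = 1) +
  # Allow drawing outside the panel instead of clipping the label
  coord_cartesian(clip = "off") +
  # Widen the right margin so the box fits
  theme(plot.margin = unit(c(0.1, 3, 0.1, 0.1), "cm"))
p
```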

X axis label is not showing in clustering dendrogram in ggplot

I have made a clustering dendrogram following some code I found online, but the x-axis is not shown in the graph. I would like the dissimilarity values to appear on the x-axis, but I have not been successful.
females<-cervidae[cervidae$Sex=="female",]
dstf <- daisy(females[,9:14], metric = "euclidean", stand = FALSE)
hcaf <- hclust(dstf, method = "ave")
k <- 3
clustf <- cutree(hcaf,k=k) # k clusters
dendrf <- dendro_data(hcaf, type="rectangle") # convert for ggplot
clust.dff <- data.frame(label=rownames(females), cluster=factor(clustf),
females$Genus, females$Species)
dendrf[["labels"]] <- merge(dendrf[["labels"]],clust.dff, by="label")
rectf <- aggregate(x~cluster,label(dendrf),range)
rectf <- data.frame(rectf$cluster,rectf$x)
ymax <- mean(hcaf$height[length(hcaf$height)-((k-2):(k-1))])
fem=ggplot() +
geom_segment(data=segment(dendrf), aes(x=x, y=y, xend=xend, yend=yend)) +
geom_text(data=label(dendrf), aes(x, y, label= females.Genus, hjust=0,
color=females.Genus),
size=3) +
geom_rect(data=rectf, aes(xmin=X1-.3, xmax=X2+.3, ymin=0, ymax=ymax),
color="red", fill=NA)+
coord_flip() + scale_y_reverse(expand=c(0.2, 0)) +
theme_dendro() + scale_color_discrete(name="Genus") +
theme(legend.position="none")
Here is how my dendrogram looks:
Your code included theme_dendro(), which is described in its help file as:
Sets most of the ggplot options to blank, by returning blank theme
elements for the panel grid, panel background, axis title, axis text,
axis line and axis ticks.
You can force the x-axis line, text, and ticks to be visible in theme():
ggplot() +
geom_segment(data=segment(dendrf), aes(x=x, y=y, xend=xend, yend=yend)) +
geom_text(data=label(dendrf), aes(x, y, label= label, hjust=0,
color=cluster),
size=3) +
geom_rect(data=rectf, aes(xmin=X1-.3, xmax=X2+.3, ymin=0, ymax=ymax),
color="red", fill=NA)+
coord_flip() +
scale_y_reverse(expand=c(0.2, 0)) +
theme_dendro() +
scale_color_discrete(name="Cluster") +
theme(legend.position="none",
axis.text.x = element_text(), # show x-axis labels
axis.ticks.x = element_line(), # show x-axis tick marks
axis.line.x = element_line()) # show x-axis lines
(This demonstration uses a built-in dataset, since I'm not sure what cervidae is. The code used to create it is reproduced below:)
library(cluster); library(ggdendro); library(ggplot2)
hcaf <- hclust(dist(USArrests), "ave")
k <- 3
clustf <- cutree(hcaf,k=k) # k clusters
dendrf <- dendro_data(hcaf, type="rectangle") # convert for ggplot
clust.dff <- data.frame(label=rownames(USArrests),
cluster=factor(clustf))
dendrf[["labels"]] <- merge(dendrf[["labels"]],clust.dff, by="label")
rectf <- aggregate(x~cluster,label(dendrf),range)
rectf <- data.frame(rectf$cluster,rectf$x)
ymax <- mean(hcaf$height[length(hcaf$height)-((k-2):(k-1))])
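To also label the dissimilarity scale, note that theme_dendro() blanks the axis title as well (see the help text quoted above), so it needs turning back on alongside the axis text. A minimal sketch on the same built-in USArrests data, without the cluster rectangles: with coord_flip() the merge heights (the y aesthetic) run along the horizontal axis, so the title is set with labs(y = ...). The title text "Dissimilarity" is just an example.

```r
library(ggplot2)
library(ggdendro)

# Built-in data as a stand-in for cervidae
hc <- hclust(dist(USArrests), "ave")
dd <- dendro_data(hc, type = "rectangle")

p <- ggplot(segment(dd)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_text(data = label(dd), aes(x = x, y = y, label = label),
            hjust = 0, size = 3) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  # y holds the merge heights; coord_flip() puts them on the horizontal axis
  labs(y = "Dissimilarity") +
  theme_dendro() +
  theme(axis.text.x  = element_text(),  # axis labels
        axis.ticks.x = element_line(),  # tick marks
        axis.line.x  = element_line(),  # axis line
        axis.title.x = element_text())  # theme_dendro() blanks the title too
p
```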

More than six shapes in ggplot

I would like to plot lines with different shapes for more than six sets of data, using discrete colors. The problems are: 1) separate legends are generated for line color and shape, but there should be only one legend showing both; 2) when I correct the title of the line-color legend, the colors disappear.
t=seq(0,360,20)
for (ip in seq(0,10)) {
if (ip==0) {
df<-data.frame(t=t,y=sin(t*pi/180)+ip/2,sn=ip+100)
} else {
tdf<-data.frame(t=t,y=sin(t*pi/180)+ip/2,sn=ip+100)
df<-rbind(df,tdf)
}
}
head(df)
# No plot
# Error: A continuous variable can not be mapped to shape
gp <- ggplot(df,aes(x=t,y=y,group=sn,color=sn,shape=sn))
gp <- gp + labs(title = "Demo more than 6 shapes", x="Theat (deg)", y="Magnitude")
gp <- gp + geom_line() + geom_point()
print(gp)
# No plot
# Error: A continuous variable can not be mapped to shape (doesn't like integers)
gp <- ggplot(df,aes(x=t,y=y,group=sn,color=sn,shape=as.integer(sn)))
gp <- gp + labs(title = "Demo more than 6 shapes", x="Theat (deg)", y="Magnitude")
gp <- gp + geom_line() + geom_point()
print(gp)
# Gives a warning about 6 shapes, only shows 6 shapes, and uses continuous sn colors
gp <- ggplot(df,aes(x=t,y=y,group=sn,color=sn,shape=as.factor(sn)))
gp <- gp + labs(title = "Only shows six shapes, and two legends, need discrete colors",
x="Theat (deg)", y="Magnitude")
gp <- gp + geom_line() + geom_point()
print(gp)
# This is close to what is desired, but correct legend title and combine legends
gp <- ggplot(df,aes(x=t,y=y,group=sn,color=as.factor(sn),shape=as.factor(sn %% 6)))
gp <- gp + labs(title = "Need to combine legends and correct legend title", x="Theat (deg)", y="Magnitude")
gp <- gp + geom_line() + geom_point()
print(gp)
# Correct legend title, but now the line color disappears
gp <- ggplot(df,aes(x=t,y=y,group=sn,color=as.factor(sn),shape=as.factor(sn %% 6)))
gp <- gp + labs(title = "Color disappeard, but legend title changed", x="Theat (deg)", y="Magnitude")
gp <- gp + geom_line() + geom_point()
gp <- gp + scale_color_manual("SN",values=as.factor(df$sn))
print(gp)
# Add color and shape in geom_line / geom_point commands,
gp <- ggplot(df,aes(x=t,y=y,group=sn))
gp <- gp + labs(title = "This is close, but legend symbols are wrong", x="Theat (deg)", y="Magnitude")
gp <- gp + geom_line(aes(color=as.factor(df$sn)))
gp <- gp + geom_point(color=as.factor(df$sn),shape=as.factor(df$sn %% 6))
gp <- gp + scale_color_manual("SN",values=as.factor(df$sn))
print(gp)
First, it would be easier to convert sn to a factor.
df$sn <- factor(df$sn)
Then, you need to use scale_shape_manual to specify your shapes to use.
gp <- ggplot(df,aes(x=t, y=y, group=sn,color=sn, shape=sn)) +
scale_shape_manual(values=1:nlevels(df$sn)) +
labs(title = "Demo more than 6 shapes", x="Theat (deg)", y="Magnitude") +
geom_line() +
geom_point(size=3)
gp
This should give you what you want. You need to use scale_shape_manual because, even with sn as a factor, ggplot will only add up to 6 different symbols automatically. After that you have to specify them manually. You can change your symbols in a number of ways. Have a look at these pages for more information on how: http://sape.inf.usi.ch/quick-reference/ggplot2/shape
http://www.cookbook-r.com/Graphs/Shapes_and_line_types/
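A sketch of the full plot that also addresses the question's two legend problems: ggplot2 merges the colour and shape legends only when both scales have the same title, so setting colour = "SN" and shape = "SN" together in labs() gives a single legend with the new title, without touching the colour values. The data frame is built with expand.grid() instead of the loop; it should hold the same t, y and sn values in the same row order, plus an extra ip column.

```r
library(ggplot2)

# Same data as in the question, built without the loop
t  <- seq(0, 360, 20)
df <- expand.grid(t = t, ip = 0:10)
df$y  <- sin(df$t * pi / 180) + df$ip / 2
df$sn <- factor(df$ip + 100)

gp <- ggplot(df, aes(x = t, y = y, group = sn, colour = sn, shape = sn)) +
  geom_line() +
  geom_point(size = 3) +
  scale_shape_manual(values = 1:nlevels(df$sn)) +
  # Identical titles for colour and shape let ggplot2 merge the two legends
  labs(title = "Demo more than 6 shapes", x = "Theta (deg)", y = "Magnitude",
       colour = "SN", shape = "SN")
gp
```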
For me, the key to the error message about the 6 shapes is the part that says "Consider specifying shapes manually".
If you add in the values in scale_shape_manual, I believe you'll get what you want. I made sn a factor in the dataset first.
df$sn = factor(df$sn)
ggplot(df, aes(x = t, y = y, group = sn, color = sn, shape = sn)) +
geom_point() +
geom_line() +
scale_shape_manual(values = 0:10)
I go to the Cookbook for R site when I need to remember which numbers correspond to which shapes.
Edit: The example above adds 11 symbols, the same as the number of unique sn values in your example dataset. Your comments indicate that you have many more unique values of sn than in your example. Be careful with using a long series of numbers in values, as not all numbers are defined as symbols.
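Ignoring whether it is a good idea to have so many shapes in a single graphic, you can use letters and digits as shapes as well as the numbered symbols. Keep the values vector all-character, though: c() coerces any numeric symbol codes mixed in with letters to strings, and ggplot2 then draws those strings as literal text (e.g. "5") rather than as symbol 5. So if you wanted, say, 62 unique shapes for a factor with 62 levels, you could use all upper- and lower-case letters and the digits 0 to 9 as your values.
scale_shape_manual(values = c(letters, LETTERS, as.character(0:9)))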
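A quick check of the letters-and-digits idea with hypothetical data: a 62-level factor with one point per level, each mapped to a one-character string.

```r
library(ggplot2)

# 62 single-character shapes; every value is a one-character string
char_shapes <- c(letters, LETTERS, as.character(0:9))

set.seed(1)
# Hypothetical data: a factor with 62 levels, one point per level
foo <- data.frame(x = rnorm(62), y = rnorm(62), s = factor(seq_len(62)))

p <- ggplot(foo, aes(x, y, shape = s)) +
  scale_shape_manual(values = char_shapes) +
  geom_point(size = 4) +
  theme(legend.position = "none")   # a 62-entry legend is rarely useful
p
```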
You can get about a hundred different shapes if you need them. good.shapes is a vector of the shape numbers that render on my screen without any fill argument.
library(ggplot2)
N = 100; M = 1000
good.shapes = c(1:25,33:127)
foo = data.frame( x = rnorm(M), y = rnorm(M), s = factor( sample(1:N, M, replace = TRUE) ) )
ggplot(aes(x,y,shape=s ), data=foo ) +
scale_shape_manual(values=good.shapes[1:N]) +
geom_point()
