How to align multiple legends and avoid overlapping in ggplot? - r

The bounty expires in 7 days. Answers to this question are eligible for a +50 reputation bounty.
Electrino wants to draw more attention to this question.
I am trying to create a plot that combines 2 separate legends and a grid of multiple plots. The issue I'm having is I'm finding it difficult to align the legends so they are visible and not overlapping. hopefully the example below will explain what I mean.
To begin I am going to create 2 plots. In these two plots I am only interested in the legends, and I am discarding the actual plot (so please ignore the actual plots in these two plots). To get just the legend I am using the cowplot package.
library(ggplot2)
library(cowplot)
# -------------------------------------------------------------------------
# plot 1 ------------------------------------------------------------------
# create fake data
dfLegend_1 <- data.frame(x = LETTERS[1:10], y = c(1:10))
# set colours
pointColours <- c(A = "#F5736A", B = "#D58D00", C = "#A0A300",
D = "#36B300", E = "#00BC7B", F = "#00BCC2",
G = "#00ADF4", H = "#928DFF", I = "#E568F0",
J = "#808080")
# plot
ggLegend_1 <- ggplot(dfLegend_1, aes(x=x, y=y))+
geom_point(aes(fill = pointColours), shape = 22, size = 10) +
scale_fill_manual(values = unname(pointColours),
label = names(pointColours),
name = 'Variable') +
theme(legend.key.size = unit(0.5, "cm")) +
theme_void()
# get legend
legend_1 <- get_legend(ggLegend_1)
# -------------------------------------------------------------------------
# plot 2 ------------------------------------------------------------------
# Create fake data
dflegend_2 <- data.frame(
x = runif(100),
y = runif(100),
z2 = abs(rnorm(100))
)
# plot
ggLegend_2 <- ggplot(dflegend_2, aes(x=x, y = y))+
geom_point(aes(color = z2), shape = 22, size = 10) +
scale_color_gradientn(
colours = rev(colorRampPalette(c('steelblue', '#f7fcfd', 'orange'))(5)),
limits = c(0,10),
name = 'Gradient',
guide = guide_colorbar(
frame.colour = "black",
ticks.colour = "black"
))
# get legend
legend_2 <- get_legend(ggLegend_2)
Then I am creating many plots (in this example, I am creating 20 individual plots) and plotting them on a grid:
# create data
dfGrid <- data.frame(x = rnorm(10), y = rnorm(10))
# make a list of plots
plotList <- list()
for(i in 1:20){
plotList[[i]] <- ggplot(dfGrid) +
geom_ribbon(aes(x = x, ymin = min(y), ymax = 0), fill = "red", alpha = .5) +
geom_ribbon(aes(x = x, ymin = min(0), ymax = max(y)), fill = "blue", alpha = .5) +
theme_void()
}
# plot them on a grid
gridFinal <- cowplot::plot_grid(plotlist = plotList)
Finally, I am joining the two legends together and adding them to the grid of many plots:
# add legends together into on single plot
legendFinal <- plot_grid(legend_2, legend_1, ncol = 1)
# plot everything on the same plot
plot_grid(gridFinal, legendFinal, rel_widths = c(3, 1))
This results in something that looks like this:
As you can see, the legends overlap and are not very well spaced. I was wondering if there is any way to fit everything in whilst having the legends appropriately spaced and readable?
I should also note, that, in general, there can be any number of variables and any number of gridded plots.

One option to fix your issue would be to switch to patchwork to glue your plots and the legends together. Especially I make use of the design argument to assign more space to the Variable legend. However, you should be aware that legends are much less flexible compared to plots, i.e. the size of legends is in absolute units and will not adjust to the available space. Hence, I'm not sure whether my solution will fit your desire for a "one-size-fits-all" approach.
library(patchwork)
design <-
"
ABCDEU
FGHIJV
KLMNOV
PQRSTV
"
plotList2 <- c(plotList, list(legend_2, legend_1))
wrap_plots(plotList2) +
plot_layout(design = design)

Related

R control jitter function - avoid overplotting / non-random jitter

My problems seems simple, I am using ggplot2 with geom_jitter() to plot a variable. (take my picture as an example)
Jitter now adds some random noise to the variable (the variable is just called "1" in this example) to prevent overplotting. So I have now random noise in the y-direction and clearly what otherwise would be completely overplotted is now better visible.
But here is my question:
As you can see, there are still some points, that overplot each other. In my example here, this could be easily prevented, if it wouldn't be random noise in y-direction... but somehow more strategically placed offsets.
Can I somehow alter the geom_jitter() behavior or is there a similar function in ggplot2 that does exactly this?
Not really a minimal example, but also not too long:
library("imputeTS")
library("ggplot2")
data <- tsAirgap
# 2.1 Create required data
# Get all indices of the data that comes directly before and after an NA
na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# starting from index 2 moves all indexes one in front, so no -1 needed for before
na_indx_before <- which(is.na(data[2:length(data)]))
# Get the actual values to the indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))
# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)
# 2.4 Create dataframe for ggplot2
# join the data together in one dataframe
df <- rbind(before, after, all)
# Create the plot
gg <- ggplot(data = df) +
geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type), width = 0.5 , height = 0.5)
gg <- gg + ggplot2::scale_color_manual(
values = c("before" = "skyblue1", "after" = "yellowgreen","source" = "gray66"),
)
gg <- gg + ggplot2::scale_alpha_manual(
values = c("before" = 1, "after" = 1,"source" = 0.3),
)
gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()
So many good suggestions...here is what Bens suggestion would look like for my example:
I changed parts of my code to:
gg <- ggplot(data = df, aes(x = input, color = type, fill = type, alpha = type)) +
geom_dotplot(binwidth = 15)
Would basically also work as intended for me. ggbeeplot as suggested by Jon also worked great for my purpose.
I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing preventing you from making the label into a point.
df <- data.frame(x = rnorm(200),
col = sample(LETTERS[1:3], 200, replace = TRUE),
y = 1)
ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
ggrepel::geom_text_repel(segment.color = NA,
box.padding = 0.01, key_glyph = "point")
A downside of this method is that ggrepel can take a lot time for a large number of points, and will recalculate differently each time you change the plot size. A faster alternative would be to use ggbeeswarm::geom_quasirandom, which uses a deterministic process to define jitter that looks random.
ggplot(df, aes(x,y, color = col)) +
ggbeeswarm::geom_quasirandom(groupOnX = FALSE)

Can I draw a horizontal line at specific number of range of values using ggplot2?

I have data (from excel) with the y-axis as ranges (also calculated in excel) and the x-axis as cell counts and I would like to draw a horizontal line at a specific value in the range, like a reference line. I tried using geom_hline(yintercept = 450) but I am sure it is quite naive and does not work that way for a number in range. I wonder if there are any better suggestions for it :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you to proceed: don't calculate those ranges on Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)

Six Variable Line Graph Without a Legend

I'm looking to replicate the design of the first plot seen in this post:
http://stats.blogoverflow.com/2011/12/andyw-says-small-multiples-are-the-most-underused-data-visualization/
This was taken from [1], but in that book there is no code shown for the figure.
This figure in the link has no legend, and instead shows the labels 'IN', 'MO', ect. in a staggered fashion positioned at the height of the respective line they represent. I know how to make plots using ggplot2, but the specific issue I'm having is writing code to make the labels on the right side of the figure like that. Could someone demonstrate how to do this?
1: Carr, Daniel & Linda Pickle. 2009. Visualizing Data Patterns with Micromaps. Boca Rotan, FL. CRC Press.
This is no easy task, especially to do programmatically. It involves working with grobs, nothing hard, but usually done manually on finished products, rather than generated procedurally.
Data
df <- read.table(text = 'Samples Day.1 Day.2 Day.3 Day.4
Seradigma335 1.2875 1.350 1.850 1.6125
Seradigma322 0.9375 2.400 1.487 1.8125
Sigma 1.1250 1.962 1.237 2.0500
Shapiro_red 0.7750 1.575 1.362 1.0125
Shapiro_w/red 0.7750 1.837 0.975 0.8250', header = T)
(taken from another question here on SO)
Code
library(tidyr)
library(ggplot2)
tidy_df <- df %>%
gather('Day','Value', -Samples)
Simple plot
g <- ggplot(tidy_df) +
geom_line(aes(Day, Value, group = Samples), show.legend = F) +
scale_x_discrete(expand = c(.02,0)) +
scale_y_continuous(limits = c(0,3)) +
theme_grey() +
theme(plot.margin = unit(c(0,.2,0,0), 'npc'),
panel.border = element_rect(color = 'black', fill = NA),
axis.ticks = element_blank()
)
Note the right margin (set with plot.margin = unit(c(0,.2,0,0), 'npc'))
Labels
library(cowplot)
g <- g +
draw_label(label = df$Samples[1], x = 4.1, y = df$Day.4[1], hjust = 0) +
draw_label(label = df$Samples[2], x = 4.1, y = df$Day.4[2], hjust = 0) +
draw_label(label = df$Samples[3], x = 4.1, y = df$Day.4[3], hjust = 0) +
draw_label(label = df$Samples[4], x = 4.1, y = df$Day.4[4], hjust = 0) +
draw_label(label = df$Samples[5], x = 4.1, y = df$Day.4[5], hjust = 0)
We added the labels directly to the plot (as opposed to a grob, see documentation), with an x of 4.1, so it's actually just outside the panel (Day.4 has an x of 4) and covered by the margin (it's called clipping).
Remove clipping
# transform to grob
gt <- ggplotGrob(g)
# set panel's clipping to off
gt$layout$clip[gt$layout$name == "panel"] <- "off"
# draw the grob
ggdraw(gt)
Notes
We can kind of speed up the label creation with a for loop:
for (i in 1:nrow(df)) {
g <- g + draw_label(label = df$Samples[i], x = 4.1, y = df$Day.4[i], hjust = 0)
}
But we can't really map to vars as we are used to with aesthetics.
As I said at the beginning these methods require quite a bit of work to find the right balance and positioning, that can't certainly be expected for a quick plot, but if something has to be published it may be worth.
You can use geom_label with the nudge_x argument like this:
library(data.table) # I always use data.table but not required
library(ggplot2)
a <- c(1:10)
b <- c(seq(1,20,2))
c <- c(seq(1,30,3))
d <- c(1:10)
aa <- data.table(a,b,c,d)
ggplot(aa)+
geom_line(aes(a,b))+
geom_line(aes(a,c))+
geom_line(aes(a,d))+
geom_label(aes(10,10), label = "line 1", nudge_x = 1)+
geom_label(aes(10,19), label = "line 2", nudge_x = 1)+
geom_label(aes(10,29), label = "line 3", nudge_x = 1)
You can directly input the ordered pair that corresponds to the desired location of your label. The plot looks like this:

Adding text outside plot doesn't work in r

I have a simple dataset:
11 observations, 1 variable.
I want to plot them adding my own axis names, but when I want to change the position of them, R keeps plotting them in the exact same spot.
Here is my script:
plot(data[,5], xlab = "", xaxt='n')
axis(1, at = 1:11, labels = F)
text(1:11, par("usr")[3] - 0.1, srt = 90, adj = 1, labels = names, xpd = TRUE)
I am changing the -0.1, to any number but R keeps placing the labels in the exact same spot. I tried with short names like "a" but the result is the same.
Thanks in advance
My data:
10308.9
10201.6
12685.3
3957.93
7677.1
9671.7
11849.4
10755.7
11283.4
11583.8
12066.9
names <- rep("name",11)
My ggplot solution:
# creating the sample dataframe
data <- read.table(text="10308.9
10201.6
12685.3
3957.93
7677.1
9671.7
11849.4
10755.7
11283.4
11583.8
12066.9", header=FALSE)
# adding a names column
data$names <- as.factor(paste0("name",sprintf("%02.0f", seq(1,11,1))))
#creating the plot
require(ggplot2)
ggplot(data, aes(x=names, y=V1)) +
geom_bar(fill = "white", color = "black")
which gives:
When you want to change the order of the bars, you can do that with transform:
# transforming the data (I placed "name04" as the first one)
data2 <- transform(data,
newnames=factor(names,
levels=c("name04","name01","name02","name03","name04","name05","name06","name07","name08","name09","name10","name11"),
ordered =TRUE))
#creating the plot
ggplot(data2, aes(x=newnames, y=V1)) +
geom_bar(stat="identity", fill="white", color="black")
which gives:

Using ggplot2 how can I represent a dot and a line in the legend

Using ggplot2 I am plotting several functions and a series of points. I cannot figure out how to represent the points on the legend. I realize I need to use an aes() function, but I don't fully understand how to do this. I apologize that the example is so long, but I don't know how else to illustrate it.
## add ggplot2
library(ggplot2)
# Declare Chart values
y_label = expression("y_axis"~~bgroup("(",val / km^{2},")"))
x_label = "x_axis"
#############################
## Define functions
# Create a list to hold the functions
funcs <- list()
funcs[]
# loop through to define functions
for(k in 1:21){
# Make function name
funcName <- paste('func', k, sep = '' )
# make function
func = paste('function(x){exp(', k, ') * exp(x*0.01)}', sep = '')
funcs[[funcName]] = eval(parse(text=func))
}
# Specify values
yval = c(1:20)
xval = c(1:20)
# make a dataframe
d = data.frame(xval,yval)
# Specify Range
x_range <- range(1,51)
# make plot
p <-qplot(data = d,
x=xval,y=yval,
xlab = x_label,
ylab = y_label,
xlim = x_range
)+ geom_point(colour="green")
for(j in 1:length(funcs)){
p <- p + stat_function(aes(y=0),fun = funcs[[j]], colour="blue", alpha=I(1/5))
}
# make one function red
p <- p + stat_function(fun = funcs[[i]], aes(color="red"), size = 1) +
scale_colour_identity("", breaks=c("red", "green","blue"),
labels=c("Fitted Values", "Measured values","All values"))
# position legend and make remove frame
p <- p + opts(legend.position = c(0.85,0.7), legend.background = theme_rect(col = 0))
print(p)
Thank you in advance - I have learned I a lot from this community over the last few days.
See below for a solution. The main idea is the following: imagine the points having an invisible line under them, and the lines having invisible points. So each "series" gets color and shape and linetype attributes, and at the end we will manually set them to invisible values (0 for lines, NA for points) as necessary. ggplot2 will merge the legends for the three attributes automatically.
# make plot
p <- qplot(data = d, x=xval, y=yval, colour="Measured", shape="Measured",
linetype="Measured", xlab = x_label, ylab = y_label, xlim = x_range,
geom="point")
#add lines for functions
for(j in 1:length(funcs)){
p <- p + stat_function(aes(colour="All", shape="All", linetype="All"),
fun = funcs[[j]], alpha=I(1/5), geom="line")
}
# make one function special
p <- p + stat_function(fun = funcs[[1]], aes(colour="Fitted", shape="Fitted",
linetype="Fitted"), size = 1, geom="line")
# modify look
p <- p + scale_colour_manual("", values=c("green", "blue", "red")) +
scale_shape_manual("", values=c(19,NA,NA)) +
scale_linetype_manual("", values=c(0,1,1))
print(p)
Setting the colour aesthetic for each geom to a constant may help. Here is a small example:
require(ggplot2)
set.seed(666)
N<-20
foo<-data.frame(x=1:N,y=runif(N),z=runif(N))
p<-ggplot(foo)
p<-p+geom_line(aes(x,y,colour="Theory"))
p<-p+geom_point(aes(x,z,colour="Practice"))
#Optional, if you want your own colours
p<-p+scale_colour_manual("Source",c('blue','red'))
print(p)
This isn't supported natively in ggplot2, but I'm hoping I'll figure out how for a future version.

Resources