Highlight single points in a scatter plot with ggplot2 and ggrepel

I want to highlight 4 single points in a scatter plot with a box around the name associated with each point. I am using ggrepel to draw the boxes around the labels and to repel them from each other.
This is the code I have:
library(ggplot2)
gg <- ggplot(X, aes(x = XX, y = XY)) +
geom_point(col = "steelblue", size = 3) +
geom_smooth(method = "lm", col = "firebrick", se = FALSE) +
labs(title = "XX vs XY", subtitle = "X", y = "XX", x = "XY") +
scale_x_continuous(breaks = seq(76, 82, 1)) +
scale_y_continuous(breaks = seq(15, 19, 1))
library(ggrepel)
gg + geom_text_repel(aes(label = Female), size = 3, data = X)
gg + geom_label_repel(aes(label = Female), size = 2, data = X)
With that code, I get boxes around all the points. However, I only want boxes on 4 specific points and no boxes on the others. How can I do that?
Thanks in advance! Regards,
TD
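A minimal sketch of one common approach: give geom_label_repel() its own data argument that contains only the rows you want labelled, so every other point stays unlabelled. Since X is not shown in the question, the data frame below is a hypothetical stand-in (columns XX, XY and Female as in the question), and the four names in to_label are placeholders.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical stand-in for the question's data frame X
set.seed(42)
X <- data.frame(XX = runif(20, 76, 82),
                XY = runif(20, 15, 19),
                Female = paste0("Name", 1:20))

# Hypothetical choice of the four names to highlight
to_label <- c("Name3", "Name7", "Name12", "Name18")
highlight <- subset(X, Female %in% to_label)

gg <- ggplot(X, aes(x = XX, y = XY)) +
  geom_point(col = "steelblue", size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, col = "firebrick", se = FALSE) +
  # optional: redraw only the highlighted points on top, in another colour
  geom_point(data = highlight, col = "firebrick", size = 3) +
  # boxed labels are drawn only for the rows in 'highlight'
  geom_label_repel(data = highlight, aes(label = Female), size = 2)
```

Print gg to draw it. The same data = highlight trick works for geom_text_repel() if you want labels without boxes.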

Related

ggplot generates two legends with a bubble plot; how do I delete one of them?

The following code block generates a plot with two legends:
Spend7d_bubble <- ggplot(cluster_visuals,
aes(x = ltv_7d, y = avg_daily_sessions,
color = factor(cluster8), size = n)) +
geom_point(alpha = 0.5) +
scale_size_continuous(range = c(2, 25))
Notice how this generates two legends on the right, one for n and one for factor(cluster8).
How can I only include the legend for factor(cluster8) and also rename it to just 'cluster'?
Spend7d_bubble <- ggplot(cluster_visuals,
aes(x = ltv_7d, y = avg_daily_sessions,
color = factor(cluster8), size = n)) +
geom_point(alpha = 0.5) +
scale_size_continuous(range = c(2, 25), guide = 'none') +
labs(color = "Cluster")
Whichever aesthetic (color or size) you don't want a legend for can be set outside aes(): only mapped aesthetics get a legend, which is why there is no legend for alpha, set as a constant in geom_point(). Note that a constant size no longer scales the bubbles by n; to keep that mapping and hide only its legend, use guide = 'none' as above.
ggplot(cluster_visuals,
       aes(x = ltv_7d, y = avg_daily_sessions, color = factor(cluster8))) +
  geom_point(alpha = 0.5, size = 3)  # size = 3 is an arbitrary constant: no size legend
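A small self-contained sketch of the first answer's approach, with hypothetical toy data standing in for cluster_visuals (same column names as the question). It uses guides(size = "none"), which should have the same effect as guide = "none" inside scale_size_continuous(): the bubbles still vary with n, but only the colour legend (renamed to "cluster") is shown.

```r
library(ggplot2)

# Hypothetical stand-in for the question's cluster_visuals data
set.seed(1)
cluster_visuals <- data.frame(ltv_7d = runif(8, 0, 50),
                              avg_daily_sessions = runif(8, 1, 5),
                              cluster8 = 1:8,
                              n = sample(50:500, 8))

# size stays mapped to n, so bubble sizes still vary...
p <- ggplot(cluster_visuals,
            aes(x = ltv_7d, y = avg_daily_sessions,
                color = factor(cluster8), size = n)) +
  geom_point(alpha = 0.5) +
  scale_size_continuous(range = c(2, 25)) +
  guides(size = "none") +   # ...but the size legend is switched off
  labs(color = "cluster")   # rename the remaining colour legend
```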

Is there a way to customize the labels of size on a bubble plot to whatever I'd like in R?

As you can see in the image, R automatically labels the size legend with the values 0, 0.25, ..., 1. I was wondering whether I could replace those labels with text while keeping the actual numerical values from the data for the point sizes.
library(ggplot2)
library(scales)
SLC4A1 <- read.csv(file.choose(), header = TRUE)  # the user's own CSV file; ggplot2 ships no SLC4A1 data set
# bubble chart showing position of polymorphisms on gene, the frequency of each of these
# polymorphisms, where they are prominent on earth, and p-value
SLC4A1ggplot <- ggplot(SLC4A1, aes(Position, log10(Frequency)))+
geom_jitter(aes(col=Geographical.Location, size =(p.value)))+
labs(subtitle="Frequency of Various Polymorphisms", title="SLC4A1 Gene") +
labs(color = "Geographical Location") +
labs(size = "p-value") + labs(x = "Position of Polymorphism on SLC4A1 Gene") +
scale_size_continuous(range=c(1,4.5), trans = "reverse") +
guides(size = guide_legend(reverse = TRUE))
library(ggplot2)
df <- data.frame(x = 1:5, y = 1:5,z = 1:5)
ggplot(df,aes(x = x, y = y, size = z)) +
geom_point()
ggplot(df,aes(x = x, y = y, size = z)) +
geom_point() +
scale_size_continuous(range = 1:2) # control range of circle size
See more here:
https://ggplot2.tidyverse.org/reference/scale_size.html
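To address the labels themselves, a sketch assuming the standard breaks and labels arguments of scale_size_continuous(): breaks picks which data values get a legend key, and labels supplies the text printed next to each key. The sizes of the points are still computed from the numeric column, so only the legend text changes. The z values and the three label strings are hypothetical.

```r
library(ggplot2)

# Hypothetical data: z plays the role of the p-values
df <- data.frame(x = 1:5, y = 1:5, z = c(0.001, 0.01, 0.05, 0.2, 0.8))

p <- ggplot(df, aes(x = x, y = y, size = z)) +
  geom_point() +
  scale_size_continuous(
    range  = c(1, 6),
    breaks = c(0.01, 0.05, 0.5),    # data values that get a legend key
    labels = c("highly significant", "significant", "not significant")  # text shown
  )
```

Since the breaks are given in data units, they must fall inside the range of z to appear in the legend.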

"Density" curve overlay on histogram where vertical axis is frequency (aka count) or relative frequency?

Is there a method to overlay something analogous to a density curve when the vertical axis is frequency or relative frequency? (Not an actual density function, since the area need not integrate to 1.) The following question is similar: "ggplot2: histogram with normal curve", where the user self-answers with the idea of scaling ..count.. inside geom_density(). However, this seems unusual.
The following code produces an overinflated "density" line.
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1 <- seq(4.5, 12, by = 0.1)
hist.1a <- ggplot(df1, aes(v)) +
stat_bin(aes(y = ..count..), color = "black", fill = "blue",
breaks = b1) +
geom_density(aes(y = ..count..))
hist.1a
Joran's comment got me thinking about what the appropriate scaling factor would be. For posterity's sake, here's the result.
When Vertical Axis is Frequency (aka Count)
Thus, the scaling factor for a vertical axis measured in bin counts is N × binwidth, where N is the number of observations.
In this case, with N = 164 and a bin width of 0.1, the aesthetic for y in the smoothed line should be:
y = ..density..*(164 * 0.1)
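Why the factor is N × binwidth: for N observations with density f and bins of width h, the expected count in a bin starting at a is approximately N h f(x), so the density has to be multiplied by N h to sit on the count scale. Dividing counts by N (relative frequency) leaves only h, which matches the 0.1 factor used for hist.1b.

```latex
\mathbb{E}[\text{count in } [a, a+h)] = N \int_{a}^{a+h} f(x)\,dx \;\approx\; N\,h\,f(x),
\qquad N h = 164 \times 0.1 = 16.4
```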
Thus the following code produces a "density" line scaled for a histogram measured in frequency (aka count).
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1 <- seq(4.5, 12, by = 0.1)
hist.1a <- ggplot(df1, aes(x = v)) +
geom_histogram(aes(y = ..count..), breaks = b1,
fill = "blue", color = "black") +
geom_density(aes(y = ..density..*(164*0.1)))
hist.1a
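A sketch of the same idea without hard-coding 164, assuming ggplot2 3.3.0 or later (after_stat() is the current spelling of the ..var.. notation). stat_density() already computes a count variable equal to density × number of observations, so multiplying it by the bin width gives the N × binwidth scaling. Here binwidth replaces the explicit breaks from the question.

```r
library(ggplot2)

set.seed(123)
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
bw <- 0.1    # bin width, shared by both layers

p <- ggplot(df1, aes(x = v)) +
  geom_histogram(binwidth = bw, fill = "blue", color = "black") +
  # 'count' from stat_density is density * N, so count * bw = density * N * bw
  geom_density(aes(y = after_stat(count * bw)))
```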
When Vertical Axis is Relative Frequency
Using the above, we could write
hist.1b <- ggplot(df1, aes(x = v)) +
geom_histogram(aes(y = ..count../164), breaks = b1,
fill = "blue", color = "black") +
geom_density(aes(y = ..density..*(0.1)))
hist.1b
When Vertical Axis is Density
hist.1c <- ggplot(df1, aes(x = v)) +
geom_histogram(aes(y = ..density..), breaks = b1,
fill = "blue", color = "black") +
geom_density(aes(y = ..density..))
hist.1c
Try this instead:
ggplot(df1,aes(x = v)) +
geom_histogram(aes(y = ..ncount..)) +
geom_density(aes(y = ..scaled..))
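What this alternative does, sketched with after_stat() (ggplot2 3.3.0 or later assumed): ncount is count / max(count) and scaled is density / max(density), so both layers peak at exactly 1. The curve and the bars are matched by their peak heights rather than by area, and the y axis no longer shows counts. The binwidth of 0.5 is an arbitrary choice.

```r
library(ggplot2)

set.seed(123)
df1 <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))

p <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(ncount)), binwidth = 0.5) +  # count / max(count)
  geom_density(aes(y = after_stat(scaled)))                       # density / max(density)
```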
library(ggplot2)
# Histogram with a density curve rescaled to the histogram's total bar area
smoothedHistogram <- function(dat, y, bins = 30, xlabel = y, ...) {
  gg <- ggplot(dat, aes(.data[[y]])) +
    geom_histogram(bins = bins, center = 0.5,
                   fill = "midnightblue", color = "#E07102", alpha = 0.8)
  # total bar area = sum(count * bin width), read from the built plot
  gg_build <- ggplot_build(gg)
  area <- sum(with(gg_build[["data"]][[1]], y * (xmax - xmin)))
  gg <- gg +
    stat_density(aes(y = after_stat(density * area)),
                 color = "#BCBD22", linewidth = 2, geom = "line", ...)
  # reorder layers so the density line is drawn first (underneath the bars)
  gg$layers <- gg$layers[2:1]
  gg + xlab(xlabel) +
    theme_bw() + theme(axis.title = element_text(size = 16),
                       axis.text = element_text(size = 12))
}
dat <- data.frame(x = rnorm(10000))
smoothedHistogram(dat, "x")

Adding multiple text annotations to a faceted ggplot geom_histogram

I have the following data.frame:
hist.df <- data.frame(y = c(rnorm(30,1,1), rnorm(15), rnorm(30,0,1)),
gt = c(rep("ht", 30), rep("hm", 15), rep("hm", 30)),
group = c(rep("sc", 30), rep("am", 15), rep("sc",30)))
from which I produce the following faceted histogram ggplot:
main.plot <- ggplot(data = hist.df, aes(x = y)) +
  geom_histogram(alpha = 0.5, position = "identity", binwidth = 2.5,
                 aes(fill = factor(gt))) +
  facet_wrap(~group) +
  # named values, because factor(gt) sorts its levels as "hm", "ht"
  scale_fill_manual(values = c(ht = "darkgreen", hm = "darkmagenta"),
                    name = "gt")
In addition, I have this data.frame:
text.df = data.frame(ci.lo = c(0.001,0.005,-10.1),
ci.hi = c(1.85,2.25,9.1),
group = c("am","sc","sc"),
factor = c("nu","nu","alpha"))
This data frame defines the text annotations I want to add to the faceted histograms, so that the final figure looks like this:
text.df$ci.lo and text.df$ci.hi are confidence intervals for the corresponding text.df$factor, and each row is assigned to a facet through text.df$group.
Note that not every facet has every text.df$factor value.
Ideally, the y-axis limits of each facet will leave enough room above the bars for the text, so that it appears only on the panel background.
Any idea how to achieve this?
Wrapping my comment into an answer:
text.df$ci <- paste0(text.df$factor, ' = [', text.df$ci.lo, ', ', text.df$ci.hi, ']')
new_labels <- aggregate(text.df$ci, by = list(text.df$group),
FUN = function(x) paste(x, collapse = '\n'))$x
hist.df$group <- factor(hist.df$group)
hist.df$group <- factor(hist.df$group,
labels = paste0(levels(hist.df$group), '\n', new_labels))
main.plot <- ggplot(data = hist.df, aes(x = y)) +
geom_histogram(alpha=0.5, position="identity", binwidth = 2.5,
aes(fill = factor(gt))) +
facet_wrap(~group) +
# named values, because factor(gt) sorts its levels as "hm", "ht"
scale_fill_manual(values = c(ht = "darkgreen", hm = "darkmagenta"),
name = "gt")
main.plot + theme(strip.text = element_text(size=20))
If you wish to stick to the original idea, this question has an answer that will help.
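For the original idea (text inside each panel), a sketch assuming a geom_text() layer whose data has the same group column as the facets: each label then lands in its own panel. x = -Inf and y = Inf pin the text to the top-left corner, and expansion() (ggplot2 3.3.0 or later) adds headroom above the bars. The label.df name, the 0.4 expansion and the hjust/vjust values are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)  # reproducible example data, as in the question
hist.df <- data.frame(y = c(rnorm(30, 1, 1), rnorm(15), rnorm(30, 0, 1)),
                      gt = c(rep("ht", 30), rep("hm", 15), rep("hm", 30)),
                      group = c(rep("sc", 30), rep("am", 15), rep("sc", 30)))
text.df <- data.frame(ci.lo = c(0.001, 0.005, -10.1),
                      ci.hi = c(1.85, 2.25, 9.1),
                      group = c("am", "sc", "sc"),
                      factor = c("nu", "nu", "alpha"))

# One label string per facet, one line per confidence interval
text.df$ci <- paste0(text.df$factor, " = [", text.df$ci.lo, ", ", text.df$ci.hi, "]")
label.df <- aggregate(ci ~ group, data = text.df,
                      FUN = function(x) paste(x, collapse = "\n"))

p <- ggplot(hist.df, aes(x = y)) +
  geom_histogram(aes(fill = gt), alpha = 0.5, position = "identity", binwidth = 2.5) +
  # label.df has a 'group' column, so each label is drawn in its own facet
  geom_text(data = label.df, aes(x = -Inf, y = Inf, label = ci),
            hjust = -0.05, vjust = 1.1, inherit.aes = FALSE) +
  # extra room above the bars so the text sits on the panel background
  scale_y_continuous(expand = expansion(mult = c(0, 0.4))) +
  scale_fill_manual(values = c(ht = "darkgreen", hm = "darkmagenta"), name = "gt") +
  facet_wrap(~group)
```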

Is it possible to create inset graphs?

I know that when you use par( fig=c( ... ), new=TRUE ), you can create inset graphs. However, I was wondering whether it is possible to use the ggplot2 library to create 'inset' graphs.
UPDATE 1: I tried using par() with ggplot2, but it does not work.
UPDATE 2: I found a working solution at ggplot2 GoogleGroups using grid::viewport().
Section 8.4 of the book explains how to do this. The trick is to use the grid package's viewports.
library(ggplot2)
library(grid)  # for viewport()
# Any old plot
a_plot <- ggplot(cars, aes(speed, dist)) + geom_line()
#A viewport taking up a fraction of the plot area
vp <- viewport(width = 0.4, height = 0.4, x = 0.8, y = 0.2)
#Just draw the plot twice
png("test.png")
print(a_plot)
print(a_plot, vp = vp)
dev.off()
A much simpler solution uses only ggplot2's annotation_custom() together with ggplotGrob(). Most importantly, this solution works with ggsave().
library(ggplot2)
plotx <- ggplot(mpg, aes(displ, hwy)) + geom_point()
# embed a copy of plotx as a grob in the region x = 5 to 7, y = 30 to 44 (data units)
inset_plot <- plotx +
  annotation_custom(
    ggplotGrob(plotx),
    xmin = 5, xmax = 7, ymin = 30, ymax = 44
  )
ggsave(filename = "inset-plot.png", plot = inset_plot)
Alternatively, you can use the cowplot R package by Claus O. Wilke (cowplot is a powerful extension of ggplot2). The author has an example of plotting an inset inside a larger graph in the package's intro vignette. Here is some adapted code:
library(ggplot2)
library(cowplot)
main.plot <-
ggplot(data = mpg, aes(x = cty, y = hwy, colour = factor(cyl))) +
geom_point(size = 2.5)
inset.plot <- main.plot + theme(legend.position = "none")
plot.with.inset <-
ggdraw() +
draw_plot(main.plot) +
draw_plot(inset.plot, x = 0.07, y = .7, width = .3, height = .3)
# Can save the plot with ggsave()
ggsave(filename = "plot.with.inset.png",
plot = plot.with.inset,
width = 17,
height = 12,
units = "cm",
dpi = 300)
I prefer solutions that work with ggsave. After a lot of googling around, I ended up with this (a general formula for positioning and sizing the plot that you insert):
library(tidyverse)
plot1 = ggplot(mtcars, aes(mpg, wt)) + geom_point()
plot2 = ggplot(mtcars, aes(hp, cyl)) + geom_point()
plot(plot1)
# Specify position of plot2 (in percentages of plot1)
# This is in the top left and 25% width and 25% height
xleft = 0.05
xright = 0.30
ybottom = 0.70
ytop = 0.95
# Calculate position in plot1 coordinates
# Extract x and y ranges from plot1
# (ggplot2 >= 3.0.0 stores them in panel_params; older versions used panel_ranges)
l1 = ggplot_build(plot1)
x1 = l1$layout$panel_params[[1]]$x.range[1]
x2 = l1$layout$panel_params[[1]]$x.range[2]
y1 = l1$layout$panel_params[[1]]$y.range[1]
y2 = l1$layout$panel_params[[1]]$y.range[2]
xdif = x2-x1
ydif = y2-y1
xmin = x1 + (xleft*xdif)
xmax = x1 + (xright*xdif)
ymin = y1 + (ybottom*ydif)
ymax = y1 + (ytop*ydif)
# Get plot2 and make grob
g2 = ggplotGrob(plot2)
plot3 = plot1 + annotation_custom(grob = g2, xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)
plot(plot3)
ggsave(filename = "test.png", plot = plot3)
# Try and make a weird combination of plots
g1 <- ggplotGrob(plot1)
g2 <- ggplotGrob(plot2)
g3 <- ggplotGrob(plot3)
library(gridExtra)
library(grid)
t1 = arrangeGrob(g1,ncol=1, left = textGrob("A", y = 1, vjust=1, gp=gpar(fontsize=20)))
t2 = arrangeGrob(g2,ncol=1, left = textGrob("B", y = 1, vjust=1, gp=gpar(fontsize=20)))
t3 = arrangeGrob(g3,ncol=1, left = textGrob("C", y = 1, vjust=1, gp=gpar(fontsize=20)))
final = arrangeGrob(t1,t2,t3, layout_matrix = cbind(c(1,2), c(3,3)))
grid.arrange(final)
ggsave(filename = "test2.png", plot = final)
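The position arithmetic in the answer above can be wrapped in a small function. This is a sketch: npc_to_data() is a hypothetical helper name, and the ranges would come from ggplot_build(plot1)$layout$panel_params[[1]]$x.range and $y.range as in that answer. It maps fractions of the panel (0 = left/bottom, 1 = right/top) to data coordinates.

```r
# Hypothetical helper: convert panel fractions into data coordinates
npc_to_data <- function(x.range, y.range, left, right, bottom, top) {
  xdif <- diff(x.range)
  ydif <- diff(y.range)
  c(xmin = x.range[1] + left   * xdif,
    xmax = x.range[1] + right  * xdif,
    ymin = y.range[1] + bottom * ydif,
    ymax = y.range[1] + top    * ydif)
}

# Example with made-up panel ranges: top-left inset, 25% wide and 25% high
pos <- npc_to_data(x.range = c(10, 35), y.range = c(1, 6),
                   left = 0.05, right = 0.30, bottom = 0.70, top = 0.95)
pos
```

The result can be passed straight to annotation_custom(grob = g2, xmin = pos[["xmin"]], xmax = pos[["xmax"]], ymin = pos[["ymin"]], ymax = pos[["ymax"]]).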
'ggplot2' >= 3.0.0 makes new approaches to adding insets possible, as tibble objects containing list columns can now be passed as data. The objects in the list column can even be whole ggplots. The latest version of my package 'ggpmisc' provides geom_plot(), geom_table() and geom_grob(), as well as versions that use npc units instead of native data units for locating the insets. These geoms can add multiple insets per call and obey faceting, which annotation_custom() does not. I copy the example from the help page, which adds a zoomed-in detail of the main plot as an inset.
library(tibble)
library(ggpmisc)
p <-
ggplot(data = mtcars, mapping = aes(wt, mpg)) +
geom_point()
df <- tibble(x = 0.01, y = 0.01,
plot = list(p +
coord_cartesian(xlim = c(3, 4),
ylim = c(13, 16)) +
labs(x = NULL, y = NULL) +
theme_bw(10)))
p +
expand_limits(x = 0, y = 0) +
geom_plot_npc(data = df, aes(npcx = x, npcy = y, label = plot))
Or a barplot as inset, taken from the package vignette.
library(tibble)
library(ggpmisc)
p <- ggplot(mpg, aes(factor(cyl), hwy, fill = factor(cyl))) +
stat_summary(geom = "col", fun = mean, width = 2/3) +
labs(x = "Number of cylinders", y = NULL, title = "Means") +
scale_fill_discrete(guide = "none")
data.tb <- tibble(x = 7, y = 44,
plot = list(p +
theme_bw(8)))
ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
geom_plot(data = data.tb, aes(x, y, label = plot)) +
geom_point() +
labs(x = "Engine displacement (l)", y = "Fuel use efficiency (MPG)",
colour = "Engine cylinders\n(number)") +
theme_bw()
The next example shows how to add different inset plots to different panels of a faceted plot. It uses the same example data, split by century. Once split, this data set has one cylinder level missing in one of the inset plots. As each inset is built on its own, we need manual scales to keep the colours and fills consistent across the plots; with other data sets this may not be needed.
library(tibble)
library(ggpmisc)
my.mpg <- mpg
my.mpg$century <- factor(ifelse(my.mpg$year < 2000, "XX", "XXI"))
my.mpg$cyl.f <- factor(my.mpg$cyl)
my_scale_fill <- scale_fill_manual(guide = "none",
values = c("red", "orange", "darkgreen", "blue"),
breaks = levels(my.mpg$cyl.f))
p1 <- ggplot(subset(my.mpg, century == "XX"),
aes(factor(cyl), hwy, fill = cyl.f)) +
stat_summary(geom = "col", fun = mean, width = 2/3) +
labs(x = "Number of cylinders", y = NULL, title = "Means") +
my_scale_fill
p2 <- ggplot(subset(my.mpg, century == "XXI"),
aes(factor(cyl), hwy, fill = cyl.f)) +
stat_summary(geom = "col", fun = mean, width = 2/3) +
labs(x = "Number of cylinders", y = NULL, title = "Means") +
my_scale_fill
data.tb <- tibble(x = c(7, 7),
y = c(44, 44),
century = factor(c("XX", "XXI")),
plot = list(p1, p2))
ggplot() +
geom_plot(data = data.tb, aes(x, y, label = plot)) +
geom_point(data = my.mpg, aes(displ, hwy, colour = cyl.f)) +
labs(x = "Engine displacement (l)", y = "Fuel use efficiency (MPG)",
colour = "Engine cylinders\n(number)") +
scale_colour_manual(guide = "none",
values = c("red", "orange", "darkgreen", "blue"),
breaks = levels(my.mpg$cyl.f)) +
facet_wrap(~century, ncol = 1)
In 2019, the patchwork package entered the stage; with it, you can create insets easily by using the inset_element() function:
library(ggplot2)
library(patchwork)
gg1 = ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point()
gg2 = ggplot(iris, aes(Sepal.Length)) +
geom_density()
gg1 +
inset_element(gg2, left = 0.65, bottom = 0.75, right = 1, top = 1)
