Repelling selective text in ggplot in R

I have a set of text labels to plot with ggplot at (x, y) locations, and only a subset of them overlap. I would like to keep the non-overlapping labels exactly where they are and repel only the overlapping ones (I know which ones these are). For example, the names of the New England states overlap, while in the West nothing overlaps, so I want to keep the western state names in place but repel the New England ones. When I use geom_text_repel, it repels all of the text. If I instead print the non-overlapping subset with geom_text and the rest with geom_text_repel, the repelled labels do not avoid the fixed ones, because they are in different layers. Is there a way to fix a subset of the labels and repel the rest with geom_text_repel, or do I need a completely different solution?
Here is an example:
library(tidyverse)
library(ggrepel)
# state centers (state.center places Alaska and Hawaii just off the West Coast, which suits our maps)
df = data.frame(x = state.center$x, y = state.center$y, z = state.abb)
overlaps = c('RI', 'DE', 'CT', 'MA')
df %>%
  ggplot() +
  geom_point(aes(x, y),
             size = 1) +
  # plot the ones I would like to keep where they are
  # I want these centred on their points
  geom_text(aes(x, y, label = z),
            data = df %>% filter(! z %in% overlaps),
            size = 4) +
  # plot the ones I would like to repel
  geom_text_repel(aes(x, y, label = z),
                  data = df %>% filter(z %in% overlaps),
                  size = 4,
                  min.segment.length = unit(0, "npc")) +
  coord_map() +
  theme_minimal()
df %>%
  ggplot() +
  geom_point(aes(x, y),
             size = 1) +
  # if we repel all instead
  geom_text_repel(aes(x, y, label = z),
                  size = 4,
                  min.segment.length = unit(0, "npc")) +
  coord_map() +
  theme_minimal()
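One partial workaround, sketched here under assumptions and not confirmed to fully solve the problem: the ggrepel documentation describes setting a label to the empty string "" to hide it while its data point still repels the other labels. The sketch keeps the fixed labels in a geom_text layer and passes every point to geom_text_repel, with blank labels for the fixed states, so the repelled labels at least steer clear of those points. ggrepel still does not see the fixed text itself, only the points. The repel_label column is a hypothetical name, and coord_map() is left out so the sketch does not need the mapproj package.

```r
df <- data.frame(x = state.center$x, y = state.center$y, z = state.abb,
                 stringsAsFactors = FALSE)
overlaps <- c("RI", "DE", "CT", "MA")

# Hypothetical helper column: keep the text only for the labels to repel.
# An empty label is not drawn, but its point still pushes other labels away.
df$repel_label <- ifelse(df$z %in% overlaps, df$z, "")

# Build the plot only if both packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  library(ggrepel)
  p <- ggplot(df, aes(x, y)) +
    geom_point(size = 1) +
    # fixed labels, centred exactly on their points
    geom_text(aes(label = z), data = df[!df$z %in% overlaps, ], size = 4) +
    # every point enters the repel layer; only non-empty labels are drawn
    geom_text_repel(aes(label = repel_label), size = 4,
                    min.segment.length = unit(0, "npc")) +
    theme_minimal()
}
```

Call print(p) to draw it; adding coord_map() back is possible if mapproj is installed.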

Related

inner labelling for heatmap, in R ggplot

I am trying to add a number label to each cell of a heatmap. Because it also needs marginal bar charts, I have tried two packages: iheatmapr and ComplexHeatmap.
(1st try) iheatmapr makes it easy to add bars, as below, but I couldn't see how to add labels inside individual heatmap cells.
library(tidyverse)
library(iheatmapr)
library(RColorBrewer)
in_out <- data.frame(
'Economic' = c(2,1,1,3,4),
'Education' = c(0,3,0,1,1),
'Health' = c(1,0,1,2,0),
'Social' = c(2,5,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
GreenLong <- colorRampPalette(brewer.pal(9, 'Greens'))(12)
lowGreens <- GreenLong[1:5]  # R indices start at 1; an index of 0 is silently dropped
in_out_matrix <- as.matrix(in_out)
main_heatmap(in_out_matrix, colors = lowGreens)
in_out_plot <- iheatmap(in_out_matrix,
colors=lowGreens) %>%
add_col_labels() %>%
add_row_labels() %>%
add_col_barplot(y = colSums(in_out_matrix) / sum(in_out_matrix)) %>%
add_row_barplot(x = rowSums(in_out_matrix) / sum(in_out_matrix))
in_out_plot
Then I used save_iheatmap(in_out_plot, "iheatmapr_test.png"), because I couldn't use ggsave(device = ragg::agg_png, ...) with the iheatmapr object.
The iheatmapr object's apparent incompatibility with ggsave() (maybe I am wrong) is a problem for me, because I normally export images with the ragg package's AGG device to preserve font sizes. I suspect some other heatmap packages also create custom objects that may be incompatible with patchwork and ggsave. For reference, this is the ggsave() call I would normally use:
ggsave("png/iheatmapr_test.png", plot = in_out_plot,
device = ragg::agg_png, dpi = 72,
units="in", width=3.453, height=2.5,
scaling = 0.45)
(2nd try) ComplexHeatmap makes it easy to add number labels inside individual heatmap cells, and it also offers marginal bars among its "Annotations". I tried it, but its colour palette system (which uses integers to refer to a set of colours) doesn't suit my RGB vector colour gradient. Overall, it is a sophisticated package clearly designed for graphics more advanced than what I am doing.
I am aiming for the style shown in the screenshot example below, which was made in Excel.
Can anyone suggest a more suitable R package for a simple heatmap like this, with marginal bars and number labels inside the cells?
Instead of relying on packages that offer out-of-the-box solutions, one option is to build your plot from scratch with ggplot2 and patchwork, which gives you much more control over styling, labels, and so on.
Note: The issue with iheatmapr is that it returns a plotly object, not a ggplot. That's why you can't use ggsave.
library(tidyverse)
library(patchwork)
in_out <- data.frame(
'Economic' = c(1,1,1,5,4),
'Education' = c(0,0,0,1,1),
'Health' = c(1,0,1,0,0),
'Social' = c(1,1,0,3,1) )
rownames(in_out) <- c('Habitat', 'Resource', 'Combined', 'Protected', 'Livelihood')
in_out_long <- in_out %>%
mutate(y = rownames(.)) %>%
pivot_longer(-y, names_to = "x")
# Summarise data for marginal plots
yin <- in_out_long %>%
group_by(y) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
xin <- in_out_long %>%
group_by(x) %>%
summarise(value = sum(value)) %>%
mutate(value = value / sum(value))
# Heatmap
ph <- ggplot(in_out_long, aes(x, y, fill = value)) +
geom_tile() +
geom_text(aes(label = value), size = 8 / .pt) +
scale_fill_gradient(low = "#F7FCF5", high = "#00441B") +
theme(legend.position = "bottom") +
labs(x = NULL, y = NULL, fill = NULL)
# Marginal plots
py <- ggplot(yin, aes(value, y)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), hjust = -.1, size = 8 / .pt) +
scale_x_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
px <- ggplot(xin, aes(x, value)) +
geom_col(width = .75) +
geom_text(aes(label = scales::percent(value)), vjust = -.5, size = 8 / .pt) +
scale_y_continuous(expand = expansion(mult = c(.0, .25))) +
theme_void()
# Glue plots together
px + plot_spacer() + ph + py + plot_layout(ncol = 2, widths = c(2, 1), heights = c(1, 2))
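The two summarise() steps compute each row's and each column's share of the grand total, which is what the marginal bars display. As a cross-check, the same proportions can be computed in base R directly from the matrix with rowSums(), colSums() and sum(); the names m, row_share and col_share are illustrative.

```r
in_out <- data.frame(
  Economic  = c(1, 1, 1, 5, 4),
  Education = c(0, 0, 0, 1, 1),
  Health    = c(1, 0, 1, 0, 0),
  Social    = c(1, 1, 0, 3, 1),
  row.names = c("Habitat", "Resource", "Combined", "Protected", "Livelihood")
)
m <- as.matrix(in_out)

# Share of the grand total in each row (y margin) and each column (x margin)
row_share <- rowSums(m) / sum(m)
col_share <- colSums(m) / sum(m)

# Rounded percentages, e.g. for checking the bar labels
round(100 * row_share, 1)
round(100 * col_share, 1)
```

Both share vectors sum to 1, so they match the value / sum(value) step in the summaries above.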

geom_text() in R: how to change the label position of a specific point in geom_point

I am struggling to change the position of a few point labels in a ggplot geom_point plot. My code so far:
p <- ggplot(all_short_filled, aes(x=Modernization, y=Passionate_love)) +
geom_point(size=2)+geom_abline(intercept = 0.965830, slope = -0.001127)+ theme_bw()
p1 <- p + geom_text(label=all_short_filled$Country, position = position_dodge(width = 1),
vjust = -0.5)
p1
It gives me something like this:
I want to change the position of a few overlapping labels (such as Russia and Serbia, or the Netherlands and Belgium), so that, e.g., the label for Serbia goes below its dot rather than above it. Please send help :-)
You could create two label columns in your dataset: one for countries whose labels should be plotted above their point and one for those below. Since I do not have a sample of your data, I used the mtcars dataset to create a reproducible example.
This requires you to know which countries to move, and that list is hard-coded.
library(datasets) # used to create fake data
library(tidyverse)
# create fake dataset for example
df <- tail(datasets::mtcars) %>%
tibble::rownames_to_column("car")
below <- c("Ferrari Dino", "Maserati Bora")
# create two columns for geom_text labels
data <- df %>%
dplyr::mutate(label_above = ifelse(car %in% below, "", car),
label_below = ifelse(car %in% below, car, ""))
# ignore scale_x.. and scale_y.. those were to fit points/labels neatly
ggplot2::ggplot(data, aes(x = hp, y = mpg)) +
geom_point() +
geom_text(aes(label = label_above), vjust = -0.5) + # labels above their points
geom_text(aes(label = label_below), vjust = 1) + # labels below their points
scale_x_continuous(expand = ggplot2::expansion(0.3)) +
scale_y_continuous(expand = ggplot2::expansion(0.15))
That being said, as mentioned in the comments, ggrepel is usually very good at handling this sort of thing.
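A variant of the two-column approach, sketched here as an alternative rather than the answerer's method: geom_text() also accepts vjust as an aesthetic, so a single label column plus a per-row vjust value can move the chosen labels below their points in one layer. The label_vjust column is a hypothetical name, and 1.5 is an arbitrary value that leaves a small gap under the point.

```r
df <- tail(mtcars)
df$car <- rownames(df)
below <- c("Ferrari Dino", "Maserati Bora")

# Hypothetical column: 1.5 places the label below its point,
# -0.5 keeps it above (the same value the answer uses for "above")
df$label_vjust <- ifelse(df$car %in% below, 1.5, -0.5)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(hp, mpg)) +
    geom_point() +
    # vjust is mapped per row instead of being a single constant
    geom_text(aes(label = car, vjust = label_vjust)) +
    scale_x_continuous(expand = expansion(0.3)) +
    scale_y_continuous(expand = expansion(0.15))
}
```

Moving a label to the left or right works the same way with an hjust column.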

Group bars in bar plot using ggplot and empty limits, but avoid inconsistent axis.ticks

I produced this wonderful bar plot (see below). To quickly group my countries by region, I added scale_x_discrete(limits = ORDER) with some empty limits "" (specified in ORDER). This adds empty bars to the plot, which seems to work fine for me, but the axis ticks are not consistent. It does not add axis ticks for the empty bars (which I prefer), except for the last empty bar. Why is that? How can I get rid of this single tick?
ORDER <- c("Kiribati", "Marshall Islands", "Palau", "States of Micronesia",
"",
"Micronesia g." ,
"",
"Fiji", "Nauru", "PNG", "Solomon Islands", "Vanuatu",
"",
"Melanesia g.",
"",
"Cook Islands", "Niue", "Samoa", "Tonga", "Tuvalu",
"",
"Polynesia g."
)
ORDER
ggplot(ESA_coun_p ,aes(x=x, y=y))+
geom_col(position="dodge", na.rm=TRUE)+
scale_x_discrete(limits = ORDER )+
coord_flip()
thothal and Romain B. gave some great replies to the question, each with its pros and cons.
@thothal: Your suggestion of using labels instead of limits makes the plot consistent, as it adds axis ticks to all empty separator bars. However, it may require hard-coding some extra empty observations and reordering factors. It also does not separate the groups very clearly from each other.
@Romain B.: Your suggestion works very well and separates the groups clearly. However, I ran into difficulties with a more sophisticated plot, a "gap bar plot", which makes values easier to compare when there are outliers (see your example, adjusted, below).
library(dplyr)  # for %>% and mutate()
library(ggplot2)
set.seed(10)
test <- data.frame(country = LETTERS[1:12],
region = c(1,1,1,1,2,2,3,4,4,4,5,5),
value = rnorm(12, m = 10))%>%
mutate(value=replace(value, country=='A', 100))
# I'm ordering by <value> here, so in the plot, they'll be ordered as such
test$country <- factor(test$country, levels = test$country[order(test$value)])
######
trans_rate_surf <- 0.02 ## play around: slope above the gap, i.e. how strongly the cut-off values are compressed
white_space_min_surf <- 20 ## a little above the last fully displayed bar
white_space_max_surf <- 80 ## a little below the first cropped bar
#####
trans_surf <- function(x){pmin(x,white_space_min_surf) + trans_rate_surf*pmax(x-white_space_min_surf,0)}
yticks_surf <- c(5, 10, 15, 20, 100) ## not within or too close to the white space
##
test$value_t <- trans_surf(test$value)
ggplot(test, aes(x = country, y = value_t)) + geom_bar(stat = 'identity') + coord_flip()+
geom_rect(aes(xmin=0, xmax=nrow(test)+0.6, ymin=trans_surf(white_space_min_surf), ymax=trans_surf(white_space_max_surf)), fill="white")+
scale_y_continuous(limits=c(0,NA), breaks=trans_surf(yticks_surf), labels=yticks_surf)
If I now add + facet_grid(rows = vars(region), scales = "free_y", space = "free_y"), everything is messed up, because xmax = nrow(test) no longer fits; it would need to be computed per region.
You could add a region variable and facet the plot by it. You can then adjust the spacing between the facets.
You didn't provide data, so I made a dummy test dataframe.
library(ggplot2)
set.seed(10)
test <- data.frame(country = LETTERS[1:12],
region = c(1,1,1,1,2,2,3,4,4,4,5,5),
value = rnorm(12, m = 10))
# I'm ordering by <value> here, so in the plot, they'll be ordered as such
test$country <- factor(test$country, levels = test$country[order(test$value)])
ggplot(test, aes(x = country, y = value)) + geom_bar(stat = 'identity') +
facet_grid(rows = vars(region), scales = "free_y", space = "free_y") + coord_flip() +
theme(panel.spacing = unit(1, "lines")) # play with this to spread more
This yields
While I ordered by value here, you can give the order you want as the levels of your factor.
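For example, to keep each region's countries together and sort by value only within a region, one possible ordering (an assumption, not part of the answer) passes both keys to order() when building the factor levels:

```r
set.seed(10)
test <- data.frame(country = LETTERS[1:12],
                   region = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5),
                   value = rnorm(12, mean = 10))

# Sort by region first, then by value within each region
test$country <- factor(test$country,
                       levels = test$country[order(test$region, test$value)])

levels(test$country)
```

Any character vector containing each country exactly once can serve as levels, so a hand-written order works too.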
EDIT: with a "gap"
A disclaimer: I personally do not think that plots with axis breaks or gaps are a good idea.
This has been discussed extensively on this site before, and there are many ways around it (e.g. transforming your data, using log scales, building indices, etc.).
Since you want to do it this way anyway, here is another workaround: draw a line with a large width across the bars.
trans_rate_surf <- 0.02 ## play around: slope above the gap, i.e. how strongly the cut-off values are compressed
white_space_min_surf <- 20 ## a little above the last fully displayed bar
white_space_max_surf <- 80 ## a little below the first cropped bar
#####
trans_surf <- function(x){pmin(x,white_space_min_surf) + trans_rate_surf*pmax(x-white_space_min_surf,0)}
yticks_surf <- c(5, 10, 15, 20, 100) ## not within or too close to the white space
##
test$value_t <- trans_surf(test$value)
ggplot(test, aes(x = country, y = value_t)) + geom_bar(stat = 'identity') + coord_flip() +
scale_y_continuous(limits=c(0,NA), breaks=trans_surf(yticks_surf), labels=yticks_surf) +
facet_grid(rows = vars(region), scales = "free_y", space = "free_y") +
theme(panel.spacing = unit(1, "lines")) + # play with this to spread more
geom_hline(yintercept = trans_surf(50), linewidth = 10, color = "white") # linewidth needs ggplot2 >= 3.4.0; use size = 10 on older versions
The last line (the wide white geom_hline(), which takes the place of your geom_rect()) is the only thing I've changed from your post's code. As a result, I get:
You should use labels instead of limits. Toy example below, because you did not provide a reprex.
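Both the question's and the answer's code rely on the piecewise transform trans_surf(): values up to white_space_min_surf are left unchanged, and everything above it is compressed by trans_rate_surf. A worked check of where the key values land, with a hypothetical inverse added only for verification: the question's white rectangle spans trans_surf(20) to trans_surf(80), i.e. 20 to 21.2, and the answer's white line at trans_surf(50) sits at 20.6, inside that band.

```r
white_space_min_surf <- 20
trans_rate_surf <- 0.02

# Identity below the gap start; above it, each extra unit counts only 0.02
trans_surf <- function(x) {
  pmin(x, white_space_min_surf) +
    trans_rate_surf * pmax(x - white_space_min_surf, 0)
}

# Hypothetical inverse, useful for checking which data value a position shows
inv_trans_surf <- function(t) {
  ifelse(t <= white_space_min_surf, t,
         white_space_min_surf + (t - white_space_min_surf) / trans_rate_surf)
}

trans_surf(c(5, 20, 50, 80, 100))  # 5.0 20.0 20.6 21.2 21.6
```

This is also why the breaks are passed through trans_surf() while the labels keep the original values: the tick for 100 is drawn at 21.6 on the transformed axis.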
Explanation
With limits you set the, well, limits of the scale. For a discrete scale, the limits are expected to be the unique data values, but your "" entries are not unique. What you actually want is to set the labels of the scale, so you should use the labels argument.
Data
library(tidyverse)
set.seed(1)
my_dat <- mtcars %>%
rownames_to_column() %>%
as_tibble() %>%
select(rowname, mpg) %>%
add_row(rowname = paste0("remove", 1:3), mpg = rep(0, 3)) %>%
slice(sample(NROW(.))) %>%
mutate(rowname = factor(rowname, rowname))
p <- ggplot(my_dat, aes(x=rowname, y = mpg)) +
geom_col(position = "dodge", na.rm=F) +
coord_flip()
rn <- gsub("^remove[0-9]+", "", my_dat$rowname)
Wrong Way using limits
p + scale_x_discrete(limits = rn)
Correct Way using labels
p + scale_x_discrete(labels = rn)
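As a variation on the labels fix, sketched here and not part of the original answer: scale_x_discrete() also accepts a function for labels, which receives the breaks and returns the text to display. Matching the placeholder names by pattern avoids keeping a separate rn vector in the same order as the factor levels. The blank_placeholders() helper is hypothetical, and it assumes the remove<n> naming scheme from the toy example.

```r
# Hypothetical labelling function: blank out the placeholder categories
blank_placeholders <- function(x) ifelse(grepl("^remove[0-9]+$", x), "", x)

blank_placeholders(c("Mazda RX4", "remove1", "Valiant", "remove3"))
# returns c("Mazda RX4", "", "Valiant", "")

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  dat <- data.frame(
    name  = factor(c("a", "remove1", "b", "remove2", "c"),
                   levels = c("a", "remove1", "b", "remove2", "c")),
    value = c(3, 0, 5, 0, 2)
  )
  # limits stay the unique factor levels; only the displayed text changes
  p <- ggplot(dat, aes(name, value)) +
    geom_col() +
    scale_x_discrete(labels = blank_placeholders) +
    coord_flip()
}
```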

geom_violin overlapping plots

By default, the neighboring violins will touch each other at the widest point if the widest point occurs at the same height. I would like to make my violin plots wider so that they overlap each other. Basically, something more similar to a ridge plot:
Is that possible with geom_violin?
I saw the width parameter, but if I set it higher than 1, I get the following warning, which makes me think this may not be the most appropriate approach:
Warning: position_dodge requires non-overlapping x intervals
I don't think geom_violin is meant to do this by design, but we can hack it with some effort.
Illustration using the diamonds dataset from ggplot2:
library(ggplot2)
library(dplyr)
# normal violin plot
p1 <- diamonds %>%
ggplot(aes(color, depth)) +
geom_violin()
# overlapping violin plot
p2 <- diamonds %>%
rename(x.label = color) %>% # rename the x-variable here;
# rest of the code need not be changed
mutate(x = as.numeric(factor(x.label)) / 2) %>%
ggplot(aes(x = x, y = depth, group = x)) +
# plot violins in two separate layers, such that neighbouring x values are
# never plotted in the same layer & there's no overlap WITHIN each layer
geom_violin(data = . %>% filter(x %% 1 != 0)) +
geom_violin(data = . %>% filter(x %% 1 == 0)) +
# add label for each violin near the bottom of the chart
geom_text(aes(y = min(depth), label = x.label), vjust = 2, check_overlap = TRUE) +
# hide x-axis labels as they are irrelevant now
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
gridExtra::grid.arrange(
p1 + ggtitle("Normal violins"),
p2 + ggtitle("Overlapping violins"),
nrow = 2
)
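The trick rests on the x positions. Dividing the factor codes by 2 puts neighbouring groups only 0.5 apart, and consecutive groups alternate between half-integer and whole-number positions, so x %% 1 splits them into two layers in which neighbours are a full unit apart. A base-R check using the seven diamonds colour levels (D to J), without needing the dataset itself:

```r
lvl <- factor(c("D", "E", "F", "G", "H", "I", "J"))
x <- as.numeric(lvl) / 2  # positions 0.5, 1.0, 1.5, ..., 3.5

# Same filters as the two geom_violin() layers above
layer_1 <- as.character(lvl)[x %% 1 != 0]  # half-integer positions
layer_2 <- as.character(lvl)[x %% 1 == 0]  # whole-number positions

layer_1
layer_2
```

Within each layer the violins are spaced 1 unit apart, while across the two layers neighbours are only 0.5 apart, which is where the overlap comes from.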

Create a map with state sizes adapted to a value

Hi visualization lovers,
I am trying to create a color map plot like this one:
(source: https://github.com/hrbrmstr/albersusa)
BUT I want this map to be distorted so that the area of each state is proportional to a value I provide (in particular, I use GDP).
In other words, I want some states to look bigger and some smaller than they are in reality, while still resembling the real US map as much as possible.
It is fine if the states move or their shapes get distorted.
Any ideas? Any ready solutions?
Currently I use R and albersusa package because it is something I am familiar with. Open to change!
My current code for the plot is:
library(ggplot2)
library(ggthemes) # theme_map()
library(viridis) # scale_fill_viridis()
# counties (from albersusa), cmap and smap are defined earlier (not shown)
gmap <-
ggplot() +
geom_map(data = counties@data, map = cmap,
aes(fill = atan(y/x), alpha = x+y, map_id = name),
color = "gray50") +
geom_map(data = smap, map = smap,
aes(x = long, y = lat, map_id = id),
color = "black", size = .5, fill = NA) +
theme_map(base_size = 12) +
theme(plot.title=element_text(size = 16, face="bold",margin=margin(b=10))) +
theme(plot.subtitle=element_text(size = 14, margin=margin(b=-20))) +
theme(plot.caption=element_text(size = 9, margin=margin(t=-15),hjust=0)) +
scale_fill_viridis() + guides(alpha = "none", fill = "none")
Here's a very ugly first try to get you started, using the outlines from the maps package and some data manipulation from dplyr.
library(maps)
library(dplyr)
library(ggplot2)
# Generate the base outlines
mapbase <- map_data("state.vbm")
# Load the centroids
data(state.vbm.center)
# Coerce the list to a dataframe, then add in state names
# Then generate some random value (or your variable of interest, like population)
# Then rescale that value to the range 0.25 to 0.95
df <- state.vbm.center %>% as.data.frame() %>%
mutate(region = unique(mapbase$region),
somevalue = rnorm(50),
scaling = scales::rescale(somevalue, to = c(0.25, 0.95)))
df
# Join your centers and data to the full state outlines
df2 <- df %>%
full_join(mapbase)
df2
# Within each state, scale the long and lat points to be closer
# to the centroid by the scaling factor
df3 <- df2 %>%
group_by(region) %>%
mutate(longscale = scaling*(long - x) + x,
latscale = scaling*(lat - y) + y)
df3
# Plot both the outlines for reference and the rescaled polygons
ggplot(df3, aes(long, lat, group = region, fill = somevalue)) +
geom_path() +
geom_polygon(aes(longscale, latscale)) +
coord_fixed() +
theme_void() +
scale_fill_viridis_c() # built into ggplot2 (>= 3.0.0); scale_fill_viridis() would need library(viridis)
These outlines aren't the best, and the centroids that the polygons shrink toward sometimes make a rescaled polygon extend past the original state outline. But it's a start; you can find better shapes for US states and other centroid algorithms.
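One point worth making explicit: the answer multiplies each vertex's offset from the centroid by scaling, and that multiplies the polygon's area by scaling squared, not by scaling. If the goal is area proportional to a value such as GDP, a sketch under that assumption is to use the square root of the relative value as the scaling factor. The shoelace-area helper, the unit test square and the gdp_share values below are illustrative, not from the answer.

```r
# Shoelace formula for the area of a simple polygon
polygon_area <- function(x, y) {
  j <- c(seq_along(x)[-1], 1)  # index of the next vertex, wrapping around
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Same transformation as longscale/latscale in the answer
scale_about <- function(v, centre, s) s * (v - centre) + centre

sq_x <- c(0, 2, 2, 0)
sq_y <- c(0, 0, 2, 2)
cx <- 1
cy <- 1

s <- 0.5
a0 <- polygon_area(sq_x, sq_y)  # 4
a1 <- polygon_area(scale_about(sq_x, cx, s), scale_about(sq_y, cy, s))  # 1, i.e. s^2 * a0

# To make area proportional to a value instead, scale by the square root
# of each value relative to the largest one (hypothetical shares)
gdp_share <- c(1, 0.25)
area_factor <- sqrt(gdp_share / max(gdp_share))  # 1.0 0.5
```

In the answer's pipeline, this would mean rescaling sqrt(somevalue) rather than somevalue itself, for a value that is positive.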
