ggplot2 separating legend by shape - r

# Data:
zz <- "Small Large Lat Long
1 51 2 11 10
2 49 0 12 11
3 77 7 13 13
4 46 5 12 15
5 32 6 13 14
6 54 3 15 17
7 68 0 14 10
8 39 5 12 13"
Data <- as.data.frame(read.table(text=zz, header = TRUE))
I have a continuous variable, a ratio (Small/Large), which I am plotting successfully.
However, some 0s exist in the 'Large' variable. When this occurs, I just want to plot the 'Small' number, as a ratio is impossible. To do this I have the following:
ratio.both <- Data %>%
filter(Large > 0) %>%
mutate(Ratio = Small/Large)
only.sml<- Data %>%
filter(Large < 1)
I then plot both on the same graph (by lat long data):
ggplot() +
geom_point(data = ratio.both,
aes(x = Long,
y = Lat,
size = Ratio),
stroke = 0,
colour = '#3B3B3B',
shape=16) +
#
geom_point(data = only.sml,
aes(x = Long,
y = Lat,
size = Small,
shape=1),
stroke = 1,
shape=1)
Notice the difference in shape. The resulting plot is not the nicest graph, but it demonstrates the example.
The difference between those which are a ratio (filled) and those which are just the small value is clear on the map but difficult in the legend.
I want the following in the legend:
#Title
Size = both.ratio$Ratio,
Shape/fill = Ratio or small value #whichever is easier

It is much easier to contrast the data with a variable in the table and the built-in aesthetic mappings than to create separate geoms for the two subsets. For example, you can create a new variable that records whether each data point is plotted as a ratio or as the raw 'Small' value. You can then map that variable to shape, color, size, or any other aesthetic, and optionally set the scales manually.
Data %>%
mutate(is_large = ifelse(Large > 0, "Ratio", "Small"),
size = ifelse(is_large == "Ratio", Small/Large, Small)) %>%
ggplot(aes(Long, Lat,
size = size,
shape = is_large)) +
geom_point() +
scale_shape_manual(values = c("Ratio" = 16, "Small" = 1),
name = "Size") +
scale_size_continuous(name = "Ratio/small value")
Or if you want to contrast by point color:
Data %>%
mutate(is_large = ifelse(Large > 0, "Ratio", "Small"),
size = ifelse(is_large == "Ratio", Small/Large, Small)) %>%
ggplot(aes(Long, Lat,
size = size,
color = is_large)) +
geom_point() +
scale_color_manual(values = c("Ratio" = "blue", "Small" = "red"),
name = "Size") +
scale_size_continuous(name = "Ratio/small value")
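Before plotting, it can help to confirm that the two derived columns hold what the legend is meant to show. The following is a minimal base R sketch of that check, assuming the data from the question; it is only a sanity check, not part of the plotting code.

```r
# Rebuild the question's data (the first unnamed column becomes the row names).
zz <- "Small Large Lat Long
1 51 2 11 10
2 49 0 12 11
3 77 7 13 13
4 46 5 12 15
5 32 6 13 14
6 54 3 15 17
7 68 0 14 10
8 39 5 12 13"
Data <- read.table(text = zz, header = TRUE)

# "Ratio" where a ratio can be computed, "Small" where Large is 0.
Data$is_large <- ifelse(Data$Large > 0, "Ratio", "Small")

# Plotted size: the ratio where available, otherwise the raw Small value.
Data$size <- ifelse(Data$is_large == "Ratio", Data$Small / Data$Large, Data$Small)

print(Data[, c("Small", "Large", "is_large", "size")])
```

Rows 2 and 7 have Large equal to 0, so they should be labelled "Small" and keep their raw Small value as the size.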

Related

Using plotmath with geom_label_repel to have subscripts within the labels

I am trying to have subscripts in my geom_label. e.g.
Maine
Apo (km/h) = 9
Qt (m/s) = 90
I am aware of using [x] to get subscripts, but I am not sure how to achieve that when I want to get the label values (partly) from a column. I tried using tidyeval (!!) to no avail. Even simply changing to parse = T gives me errors. It could be something rudimentary that I am overlooking, but after reading the thread "using plotmath in ggrepel labels", I am not sure if it is as simple as I thought.
Here is what I have so far. I have provided the packages and the data I used, along with the data cleaning/preparation steps. Finally, I have shown the code that I used to create the "preliminary" plot.
library(tidyverse)
library(stringr)
library(usmap)
library(ggrepel)
library(rlang)
read.table(text = "State Apo Qt
NJ 1 10
MO 2 20
SD 3 30
NY 4 40
FL 5 50
OK 6 60
NE 7 70
KY 8 80
ME 9 90
CA 10 100
NC 11 110
MA 12 120
CT 13 140", header = T, stringsAsFactor = F) -> ex1
# get the states full names
region <- state.name[match(ex1$State,state.abb)]
region <- str_to_title(region)
# US map data (50 States)
us1 <- usmap::us_map()
# adding full names to the dataset
ex_df <- cbind(region = region, ex1)
# adding dataset values to the map data (only states with data)
us_val1 <- left_join(ex_df, us1, by = c("region" = "full"))
# full map dataset joined by ex1 dataset to draw the map
us_map1 <- left_join(us1, ex_df, by = c("full" ="region")) %>%
mutate(qQt = replace_na(Qt, 0))
# creating a dataset with centroids of the states (only the ones in ex1)
us_centroids1 <-
us_val1 %>%
group_by(region) %>%
summarise(centroid.x = mean(range(x)),
centroid.y = mean(range(y)),
label = unique(State),
`Apo` = unique(Apo),
`Qt` = unique(Qt))
## drawing the plot
ggplot() +
geom_polygon(data = us_map1,
aes(x,y, group = group, fill = Qt),
color = "black",
size = .1) +
geom_label_repel(data = us_centroids1,
aes(centroid.x, centroid.y,
label = paste(region, "\n Apo (km/h) = ", `Apo`, "\n Qt (m/s) =", `Qt`)),
size = 5/14*8,
box.padding = 1,
parse = F) +
scale_fill_gradientn(name = expression(Q[t]~(m/s)),
breaks = c(0, seq(10,130,20)),
labels = c("", seq(10,130,20)),
limits = c(0, 130),
colors = c("#DCDCDC", "lightblue", "green"),
guide = guide_colorbar(barwidth = 0.8, barheight = 18)) +
theme_void()
This is kind of a pain, since plotmath doesn't appear to support line breaks, so you have to work around that with atop(). Use bquote() to insert variable values into the expression. bquote() only works on one element at a time, so we have to pmap() over the three variables.
ggplot() +
geom_polygon(data = us_map1,
aes(x,y, group = group, fill = Qt),
color = "black",
size = .1) +
geom_label_repel(data = us_centroids1,
aes(centroid.x, centroid.y,
label = pmap(list(region, Apo, Qt),
\(x,y,z) bquote(atop(.(x), # first line of lab
atop(A[po] (km/h) == .(y), # second line
Q[t] (m/s) == .(z)) # third line
)
)
)
),
size = 5/14*8,
box.padding = 1,
parse = T) +
scale_fill_gradientn(name = expression(Q[t]~(m/s)),
breaks = c(0, seq(10,130,20)),
labels = c("", seq(10,130,20)),
limits = c(0, 130),
colors = c("#DCDCDC", "lightblue", "green"),
guide = guide_colorbar(barwidth = 0.8, barheight = 18)) +
theme_void()
Created on 2022-07-31 by the reprex package (v2.0.1)
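The bquote() substitution can be checked in isolation before it goes into the plot. This is a small base R sketch with hypothetical values for a single state (the names state, apo and qt are made up for illustration); it leaves out the unit annotations to keep the expression short.

```r
# Hypothetical single-state values, standing in for one row of the centroid data.
state <- "Maine"
apo <- 9
qt <- 90

# .() splices the current values into the unevaluated plotmath expression;
# atop() stacks its two arguments, so nesting it gives three lines.
lbl <- bquote(atop(.(state), atop(A[po] == .(apo), Q[t] == .(qt))))

# Deparsing shows the expression with the values filled in.
print(deparse(lbl))
```

Because atop() only takes two arguments, a three-line label needs the nested form shown here, and the two inner lines are drawn in a smaller size than the first.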

ggplot2 heatmap with tile height and width as aes()

I'm trying to create a heat map for an OD matrix, but I wanted to scale the rows and columns by certain weights. Since these weights are constant across each category I would expect the plot would keep the rows and columns structure.
# Tidy OD matrix
df <- data.frame (origin = c(rep("A", 3), rep("B", 3),rep("C", 3)),
destination = rep(c("A","B","C"),3),
value = c(0, 1, 10, 5, 0, 11, 15, 6, 0))
# Weights
wdf <- data.frame(region = c("A","B","C"),
w = c(1,2,3))
# Add weights to the data.
plot_df <- df %>%
merge(wdf %>% rename(w_origin = w), by.x = 'origin', by.y = 'region') %>%
merge(wdf %>% rename(w_destination = w), by.x = 'destination', by.y = 'region')
Here's what the data looks like:
> plot_df
  destination origin value w_origin w_destination
1           A      A     0        1             1
2           A      C    15        3             1
3           A      B     5        2             1
4           B      A     1        1             2
5           B      B     0        2             2
6           B      C     6        3             2
7           C      B    11        2             3
8           C      A    10        1             3
9           C      C     0        3             3
However, when passing the weights as width and height in the aes() I get this:
ggplot(plot_df,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value),
color = 'black')
It seems to be working for the size of the columns (width), but not quite, because the proportions are not right. And the rows are all over the place and not aligned.
I'm only using geom_tile because I could pass height and width as aesthetics, but I accept other suggestions.
The issue is that your tiles are overlapping. While you can pass the widths and heights as aesthetics, geom_tile will not adjust the x and y positions of the tiles for you. As you are mapping a discrete variable on x and y, your tiles are positioned on an equidistant grid. In your case the tiles are centred at 1, 2 and 3. The tiles are then drawn at these positions with the specified width and height.
This can be seen easily by adding some transparency to your plot:
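The overlap can also be worked out by hand. The sketch below is plain base R, assuming the three levels sit one unit apart at 1, 2 and 3 (where ggplot2 places discrete levels) and that each tile spans its centre plus or minus half its width; the variable names are made up for illustration.

```r
# Discrete levels A, B, C sit at integer positions 1, 2, 3 on the axis.
centre <- c(A = 1, B = 2, C = 3)
w <- c(A = 1, B = 2, C = 3)  # weights from wdf

# Each tile spans centre +/- width / 2.
extent <- data.frame(xmin = centre - w / 2, xmax = centre + w / 2)
print(extent)

# A tile overlaps its right-hand neighbour when it ends after the neighbour starts.
overlaps <- extent$xmax[-nrow(extent)] > extent$xmin[-1]
print(overlaps)
```

Tile B runs from 1 to 3 while tile C already starts at 1.5, so neighbouring tiles are drawn on top of each other.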
library(ggplot2)
library(dplyr)
ggplot(plot_df,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value), color = "black", alpha = .2)
To achieve your desired result you have to compute the x and y positions manually from the desired widths and heights so that the boxes do not overlap. To this end you could switch to a continuous scale and set the desired breaks and labels via scale_x/y_continuous:
breaks <- wdf %>%
mutate(cumw = cumsum(w),
pos = .5 * (cumw + lag(cumw, default = 0))) %>%
select(region, pos)
plot_df <- plot_df %>%
left_join(breaks, by = c("origin" = "region")) %>%
rename(y = pos) %>%
left_join(breaks, by = c("destination" = "region")) %>%
rename(x = pos)
ggplot(plot_df,
aes(x = x,
y = y)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value), color = "black") +
scale_x_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1)) +
scale_y_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1))
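The midpoint formula used for the breaks can be verified with a few lines of base R. This is only a sketch of the arithmetic for the weights 1, 2 and 3; the names left and pos are illustrative.

```r
w <- c(A = 1, B = 2, C = 3)

cumw <- cumsum(w)                  # right edge of each tile: 1, 3, 6
left <- c(0, cumw[-length(cumw)])  # left edge of each tile: 0, 1, 3
pos <- 0.5 * (cumw + left)         # tile centres: 0.5, 2, 4.5

# Each tile's right edge equals the next tile's left edge,
# so the tiles neither overlap nor leave gaps.
right_edges <- unname(pos + w / 2)
left_edges <- unname(pos - w / 2)
stopifnot(isTRUE(all.equal(right_edges[-3], left_edges[-1])))

print(pos)
```

Here left plays the role of lag(cumw, default = 0) in the dplyr version.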
So I think I have a partial solution for you. After playing around with geom_tile, it appears that the order of your data frame matters when you are using height and width.
Here is some example code I came up with based on yours (run your code first). I converted your data frame to a tibble (available through dplyr) to make it easier to sort by a column.
# Converted your dataframe to a tibble dataframe
plot_df_tibble = tibble(plot_df)
# Sorted your dataframe by your w_origin column:
plot_df_tibble2 = plot_df_tibble[order(plot_df_tibble$w_origin),]
# Plotted the sorted data frame:
ggplot(plot_df_tibble2,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value),
color = 'black')
And got this plot:
I should note that if you plot the converted tibble before sorting it, you get the same plot you posted.
It seems like the height and width arguments may not be fully developed for this portion of geom_tile, as I feel that the order of the df should not matter.
Cheers

R - (ggplot2 library) - Legends not showing on graphs

What I'm doing
I'm using a library for R called ggplot2, which allows for a lot of different options for creating graphics and other things. I'm using that to display two different data sets on one graph with different colours for each set of data I want to display.
The Problem
I'm also trying to get a legend to show up in my graph that will tell the user which set of data corresponds to which colour. So far, I've not been able to get it to show.
What I've tried
I've set its position to top/bottom/left/right to make sure nothing was setting its position to none by default, which would have hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")
ggplot(data.frame("Time"=times), aes(x=Time)) +
#Data Set 1
geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
geom_text(x=mean(times)-1, y=max(density(times)$y/2), label="Mean {1} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(times)), color="cyan", linetype="dashed", size=1, alpha = 0.5) +
# Data Set 2
geom_density(data=data.frame("Time"=timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
geom_text(x=mean(timesSec)-1, y=max(density(timesSec)$y/2), label="Mean {2} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(timesSec)), color="orange", linetype="dashed", size=1, alpha = 0.5) +
# Main Graph Info
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
dev.off()
Result
As pointed out by @Ben, you should pass the color into an aes() in order to get the legend displayed.
However, a better way to build the ggplot is to merge your two vectors, "Time" and "Timesec", into a single data frame and reshape it into a longer format. To illustrate this, I created this dummy data frame:
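As a rough illustration of that first point, the sketch below maps a constant label to colour and fill inside aes(), which is what makes ggplot2 draw a legend. The vectors times and timesSec are hypothetical random stand-ins for the question's data, and the labels "Set 1" and "Set 2" are made up.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's `times` and `timesSec` vectors.
set.seed(1)
times <- runif(200, 0, 23)
timesSec <- runif(200, 0, 23)

# Mapping a constant label inside aes() is what creates the legend entry;
# scale_*_manual() then assigns the actual colours to those labels.
p <- ggplot() +
  geom_density(data = data.frame(Time = times),
               aes(x = Time, colour = "Set 1", fill = "Set 1"), alpha = 0.8) +
  geom_density(data = data.frame(Time = timesSec),
               aes(x = Time, colour = "Set 2", fill = "Set 2"), alpha = 0.8) +
  scale_colour_manual(name = "Data set",
                      values = c("Set 1" = "#4271AE", "Set 2" = "orange")) +
  scale_fill_manual(name = "Data set",
                    values = c("Set 1" = "#1A3552", "Set 2" = "gray")) +
  theme(legend.position = "top")

print(p)
```

Giving both scales the same name lets ggplot2 merge the colour and fill keys into one legend.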
Time = sample(1:24, 200, replace = TRUE)
Timesec = sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)
    Time Timesec
1     22      23
2     21       9
3     19       9
4     10       6
5      7      24
6     15       9
...  ...     ...
So, the first step is to reshape your data frame into a longer format. Here, I'm using the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
To add geom_vline and geom_text based on the mean of your values, an easy way is to create a second data frame that gathers the mean and the maximal density values to be plotted:
library(tidyr)
library(dplyr)
df_lab <- df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
group_by(var) %>%
summarise(Mean = mean(val),
Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
So, using df and df_lab, you can generate your entire plot. Here, we pass the color and fill arguments into the aes() and use scale_color_manual and scale_fill_manual to set appropriate colors:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
ggplot(aes(x = val, fill = var, colour = var))+
geom_density(alpha = 0.8)+
scale_color_manual(values = c("#4271AE", "orange"))+
scale_fill_manual(values = c("#1A3552", "gray"))+
geom_vline(inherit.aes = FALSE, data = df_lab,
aes(xintercept = Mean, color = var), linetype = "dashed", size = 1,
show.legend = FALSE)+
geom_text(inherit.aes = FALSE, data = df_lab,
aes(x = Mean-0.5, y = Density/2, label = var, color = var), angle = 90,
show.legend = FALSE)+
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
Does it answer your question?

Connect geom_line only between specified factors

I have a dataset that has diameter values for 4 treatment groups for several different months. I am plotting Diameter ~ Treatment for each month, as well as the Diameter changes between months ~ Treatment.
Dataset looks like this:
# the data that contains diameter for each month and diameter differences between months
> head(gatheredDiameterAndTreatmentData)
Treatment Month Diameter
1 Aux_Drop Diameter_mm.Sep01 55.88
2 Aux_Spray Diameter_mm.Sep01 63.50
3 DMSO Diameter_mm.Sep01 66.04
4 Water Diameter_mm.Sep01 43.18
5 Aux_Drop Diameter_mm.Sep01 38.10
6 Aux_Spray Diameter_mm.Sep01 76.20
# data that contains mean diameter and mean diameter changes for each month
> head(subMeansDiameter)
Treatment Month Diameter SEdiam
1 Aux_Drop Diameter_mm.Dec 83.63857 29.62901
2 Aux_Drop Diameter_mm.Feb01 101.20923 24.84024
3 Aux_Drop Diameter_mm.Feb02 110.00154 22.51364
4 Aux_Drop Diameter_mm.Jan 93.00308 25.13485
5 Aux_Drop Diameter_mm.Mar 116.84000 22.19171
6 Aux_Drop Diameter_mm.Nov01 74.50667 17.40454
Here is my code:
# assign the factors name to pick
factorsOnXaxis.DiameterByMonth = c(
"Diameter_mm.Sep01", "DiameterDiff.Sep01ToDec", "Diameter_mm.Dec", "DiameterDiff.DecToMar", "Diameter_mm.Mar")
# assign name to above factors
factorsOnXaxisName = c('Sep','Dec-Sep','Dec', 'Mar-Dec', 'Mar')
# start plotting
gatheredDiameterAndTreatmentData %>%
subset(Diameter != "NA") %>%
ggplot(aes(x = factor(Month), y = Diameter)) +
geom_point(aes(colour = Treatment), na.rm = TRUE,
position = position_dodge(width = 0.2)) +
geom_point(data = subMeansDiameter, size = 4, aes(colour = Treatment),
na.rm = TRUE, position = position_dodge(width = 0.2)) +
theme_bw() + # remove background
# add custom color to the "Treatment" levels
scale_colour_manual(
values = c("Aux_Drop" = "Purple", "Aux_Spray" = "Red",
"DMSO" = "Orange", "Water" = "Green")) +
# rearrange the x-axis
scale_x_discrete(limits = factorsOnXaxis.DiameterByMonth, labels = factorsOnXaxisName) +
# to connect the "subMeans - Diameter" values across time points
geom_line(data = subMeansDiameter, aes(
x = Month, y = Diameter, group = Treatment, colour = Treatment),
position = position_dodge(width = 0.2))
Which gives me a plot like this:
Instead of geom_line connecting every time point, I want the line to join only specified x-axis factors, i.e.:
between Sep, Dec and Mar
between Dec-Sep and Mar-Dec
I tried to manipulate the code line that uses geom_line as:
geom_line(data = subMeansDiameter, aes(
x = c("DiameterDiff.Sep01ToDec", "DiameterDiff.DecToMar"), y = Diameter, group = Treatment, colour = Treatment),
position = position_dodge(width = 0.2))
to connect the line between Dec-Sep and Mar-Dec.
But this is not working. How can I change my code?
Here are the data files, which I stored as *.tsv.
gatheredDiameterAndTreatmentData = http://s000.tinyupload.com/index.php?file_id=38251290073324236098
subMeans = http://s000.tinyupload.com/index.php?file_id=93947954496987393129
Here you need to define the groups explicitly, as color alone is not enough.
Your example is not reproducible, but here's something that will give you the idea. First, a plot with no explicit group:
ggplot(iris,aes(Sepal.Width, Sepal.Length, color = Species)) + geom_line()
And now here's one with a group aesthetic. I have split the data using the values of Sepal.Length, but you'll most likely use an ifelse depending on the month:
ggplot(iris,aes(Sepal.Width, Sepal.Length, color = Species,
group = interaction(Species, Sepal.Length > 5.5))) +
geom_line()
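Applied to month names like those in the question, the same idea might look like the sketch below. The data frame means is a made-up stand-in shaped like subMeansDiameter (the Diameter values are arbitrary), and the helper column kind is hypothetical; it only marks whether a month is a difference or a level.

```r
library(ggplot2)

# Hypothetical means table shaped like subMeansDiameter (values are made up).
months <- c("Diameter_mm.Sep01", "DiameterDiff.Sep01ToDec", "Diameter_mm.Dec",
            "DiameterDiff.DecToMar", "Diameter_mm.Mar")
means <- expand.grid(Month = months, Treatment = c("DMSO", "Water"),
                     stringsAsFactors = FALSE)
means$Diameter <- seq_len(nrow(means))

# Flag the "difference" months so they form their own line group.
means$kind <- ifelse(grepl("^DiameterDiff", means$Month), "diff", "level")

# One line per Treatment and kind: levels connect Sep-Dec-Mar,
# differences connect Dec-Sep to Mar-Dec.
p <- ggplot(means, aes(x = Month, y = Diameter, colour = Treatment,
                       group = interaction(Treatment, kind))) +
  geom_point() +
  geom_line() +
  scale_x_discrete(limits = months)

print(p)
```

With two treatments and two kinds this gives four separate lines, each joining only the months of its own kind.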

Selectively colored geom_hline

I am using hline from ggplot to construct an axis for a data set I am looking at. Essentially I want to selectively color this axis based on a data frame. This data frame consists of positions (7684, 7685, ..., 7853), each of which corresponds to a letter "a", "b", "c", or "d". I would like to associate each letter with a color and use it to color that interval on the axis.
For example row 1 of this data frame is: (7684, "c") so I would want to color the interval on the axis from 7684 to 7685 the color of "c" which could be red for instance. I have yet to think of a straightforward solution to this, I am not sure if hline would be the way to go with this.
> df
p nucleotide
1 c 7684
2 c 7685
3 t 7686
4 t 7687
5 a 7688
6 c 7689
7 a 7690
8 t 7691
9 a 7692
10 c 7693
That is a small snippet of what I am talking about. Basically, I want to associate df$p with colors and color the interval of the corresponding df$nucleotide.
You never use a for loop in ggplot and you should never use df$.. in an aesthetic.
library(dplyr)
library(ggplot2)
ggplot(df) +
geom_segment(aes(x = nucleotide, xend = lead(nucleotide), y = 1, yend = 1, color = p), size = 4)
#> Warning: Removed 1 rows containing missing values (geom_segment).
This takes us halfway. What it does is draw a segment from x to xend. x is mapped to the nucleotide value, and xend is mapped to lead(nucleotide), meaning the next value. This of course leaves out the last row, as it does not have a next value.
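The effect of lead() is easy to see on the positions alone. The sketch below uses the ten nucleotide positions from the data shown in the question; the name xend is just for illustration.

```r
library(dplyr)

nucleotide <- 7684:7693

# lead() shifts the vector by one, so each position is paired with the next one.
xend <- lead(nucleotide)

# The last position has no successor, so its xend is NA.
print(tail(data.frame(nucleotide, xend), 3))
```

That single NA is the row geom_segment() reports as removed.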
The following code takes care of that, admittedly in a hackish way, by adding a row to the df and then limiting scale_x. It may not be generalizable.
It also adds some graphical embellishment.
df %>%
add_row(p = '', nucleotide = max(.$nucleotide) + 1) %>%
ggplot() +
geom_segment(aes(x = nucleotide, xend = lead(nucleotide), y = 1, yend = 1, color = p), size = 4) +
geom_text(aes(x = nucleotide, y = 1, label = nucleotide), nudge_x = .5, size = 3) +
scale_x_continuous(breaks = NULL, limits = c(min(df$nucleotide), max(df$nucleotide) + 1)) +
scale_color_brewer(palette = 'Dark2', limits = c('a', 'c', 't'), direction = 1) +
theme(aspect.ratio = .2,
panel.background = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank())
#> Warning: Removed 1 rows containing missing values (geom_segment).
#> Warning: Removed 1 rows containing missing values (geom_text).
Data
df <- read.table(text = ' p nucleotide
1 c 7684
2 c 7685
3 t 7686
4 t 7687
5 a 7688
6 c 7689
7 a 7690
8 t 7691
9 a 7692
10 c 7693', header = T)
