I want to build a plot with two y-axes.
The image shows my data frame and the plot, which was made in Excel; I need to do the same in R. I tried the latticeExtra library, but it doesn't show any lines or boxes:
library(latticeExtra)
obj1 <- xyplot(Q_TY_PAPER ~ PU, df, type = "h")
obj2 <- xyplot(COM_USD ~ PU, df, type = "l")
doubleYScale(obj1, obj2, text = c("obj1", "obj2"))
Can you please help me?
Here is a capture of my dataset and the plot that I would like to get:
You need to split your data frame in two: one part for the bar chart, which needs to be reshaped, and one for the line, which needs to be rescaled.
Basically, the line is plotted on the same y-axis as the bar chart, and we add a secondary y-axis whose tick marks show the "real" values of the line.
So, first, we rescale the values plotted as a line. In your example, a value of 8 on the bar chart axis lines up with a value of 500 on the line axis, so we can rescale by applying a ratio of 8/500:
df_line = df[,c("PU","COM_USD")]
df_line$COM_USD_2 = df_line$COM_USD * 8/500
> df_line
PU COM_USD COM_USD_2
1 Client1 464 7.424
2 Client2 237 3.792
3 Client3 179 2.864
4 Client4 87 1.392
5 Client5 42 0.672
6 Client6 27 0.432
7 Client7 10 0.160
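As a quick sanity check of the rescaling, here is a minimal base R sketch. It assumes the same axis maxima read off the Excel chart (8 for the bars, 500 for the line); the names bar_axis_max, line_axis_max, scaled and restored are only illustrative.

```r
# Values of COM_USD from the example data
com_usd <- c(464, 237, 179, 87, 42, 27, 10)

# Assumed axis maxima read off the original chart
bar_axis_max  <- 8
line_axis_max <- 500
ratio <- bar_axis_max / line_axis_max

# Forward transformation: what gets drawn against the primary axis
scaled <- com_usd * ratio

# Inverse transformation: what the secondary axis has to display
restored <- scaled / ratio

print(scaled)
print(all.equal(restored, com_usd))
```

If the inverse does not give back the original values, the secondary axis would show misleading tick labels, so the two factors always have to be reciprocals of each other.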
For the bar chart, we need to pivot the data into a longer format to fit the grammar of ggplot2. For that, we can use pivot_longer from the tidyr package (loaded with tidyverse):
library(tidyverse)
df_bar <- df %>% select(-COM_USD) %>% pivot_longer(., - PU, names_to = "Variable", values_to = "Value")
# A tibble: 21 x 3
PU Variable Value
<fct> <chr> <dbl>
1 Client1 Q_TY_PAPER 7.1
2 Client1 Q_TY_ONLINE 7.1
3 Client1 CURR 6
4 Client2 Q_TY_PAPER 3.8
5 Client2 Q_TY_ONLINE 3.8
6 Client2 CURR 3.9
7 Client3 Q_TY_PAPER 4.4
8 Client3 Q_TY_ONLINE 4.4
9 Client3 CURR 2.3
10 Client4 Q_TY_PAPER 2.6
# … with 11 more rows
Now, you can plot both of them by doing:
library(tidyverse)
ggplot(df_bar, aes(x = PU, y = Value))+
geom_bar(aes(fill = Variable), stat = "identity", position = position_dodge(), alpha = 0.8)+
geom_line(data = df_line, aes(x = PU, y = COM_USD_2, group = 1), size = 2, color = "blue")+
scale_y_continuous(name = "Quantity", limits = c(0,8), sec.axis = sec_axis(~(500/8)*., name = "USD"))+
theme(legend.title = element_blank(),
axis.title.x = element_blank())
As you can see, in scale_y_continuous we set up a secondary axis whose tick values are multiplied by the inverse ratio (500/8), so that they match the values of the plotted line.
Finally, you get the following plot:
DATA
PU = paste0("Client",1:7)
COM_USD = c(464,237,179,87,42,27,10)
Q_TY_PAPER = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
Q_TY_ONLINE = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
CURR = c(6.0,3.9,2.3,0.2,0.2,0.1,0)
df = data.frame(PU,COM_USD, Q_TY_PAPER, Q_TY_ONLINE, CURR)
EDIT: Dealing with long names as x-axis labels
If the client names in your real data are too long, you can use this solution (Two lines of X axis labels in ggplot) to write them on two lines.
So, first modify the PU variable:
PU = c("Jon Jon", "Bob Bob", "Andrew Andrew", "Henry Henry", "Alexander Alexander","Donald Donald", "Jack Jack")
COM_USD = c(464,237,179,87,42,27,10)
Q_TY_PAPER = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
Q_TY_ONLINE = c(7.1,3.8,4.4,2.6,1.2,1.1,0.5)
CURR = c(6.0,3.9,2.3,0.2,0.2,0.1,0)
df = data.frame(PU,COM_USD, Q_TY_PAPER, Q_TY_ONLINE, CURR)
Then, we apply the same code as described above:
df_line = df[,c("PU","COM_USD")]
df_line$COM_USD_2 = df_line$COM_USD * 8/500
library(tidyverse)
df_bar <- df %>% select(-COM_USD) %>% pivot_longer(., - PU, names_to = "Variable", values_to = "Value")
But for the plot, you can use scale_x_discrete and specify the labels, inserting \n to tell R to write the x-axis labels on multiple lines:
ggplot(df_bar, aes(x = PU, y = Value))+
geom_bar(aes(fill = Variable), stat = "identity", position = position_dodge(), alpha = 0.8)+
geom_line(data = df_line, aes(x = PU, y = COM_USD_2, group = 1), size = 2, color = "blue")+
scale_y_continuous(name = "Quantity", limits = c(0,8), sec.axis = sec_axis(~(500/8)*., name = "USD"))+
theme(legend.title = element_blank(),
axis.title.x = element_blank())+
scale_x_discrete(labels = gsub(" ","\n",PU), breaks = PU)
And you get this:
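The label rewriting can be tried on its own, outside the plot. This is a small base R sketch; the third name is a made-up example added only to show what happens with more than one space.

```r
# The first two names come from the example; the third is hypothetical
PU <- c("Jon Jon", "Alexander Alexander", "Mary Ann Smith")

# gsub() replaces every space, so a three-word name ends up on three lines
labels_all <- gsub(" ", "\n", PU, fixed = TRUE)

# sub() replaces only the first space, keeping the rest of the name together
labels_first <- sub(" ", "\n", PU, fixed = TRUE)

cat(labels_all[3], "\n")
cat(labels_first[3], "\n")
```

Either vector can be passed to the labels argument of scale_x_discrete, as long as breaks still holds the original, unmodified names.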
I am trying to have subscripts in my geom_label. e.g.
Maine
Apo (km/h) = 9
Qt (m/s) = 90
I am aware of using [x] to get subscripts, but I am not sure how to do that when I want to build the label values (partly) from a column. I tried tidyeval (!!) to no avail. Even simply setting parse = T gives me errors. It could be something rudimentary that I am overlooking, but after reading the thread "using plotmath in ggrepel labels", I am not sure it is as simple as I thought.
Here is what I have so far. I have provided the packages and the data I used, along with the data cleaning/preparation steps. Finally, I show the code I used to create the "preliminary" plot.
library(tidyverse)
library(stringr)
library(usmap)
library(ggrepel)
library(rlang)
read.table(text = "State Apo Qt
NJ 1 10
MO 2 20
SD 3 30
NY 4 40
FL 5 50
OK 6 60
NE 7 70
KY 8 80
ME 9 90
CA 10 100
NC 11 110
MA 12 120
CT 13 140", header = T, stringsAsFactors = F) -> ex1
# get the states full names
region <- state.name[match(ex1$State,state.abb)]
region <- str_to_title(region)
# US map data (50 States)
us1 <- usmap::us_map()
# adding full names to the dataset
ex_df <- cbind(region = region, ex1)
# adding dataset values to the map data (only states with data)
us_val1 <- left_join(ex_df, us1, by = c("region" = "full"))
# full map dataset joined by ex1 dataset to draw the map
us_map1 <- left_join(us1, ex_df, by = c("full" ="region")) %>%
mutate(qQt = replace_na(Qt, 0))
# creating a dataset with centroids of the states (only the ones in ex1)
us_centroids1 <-
us_val1 %>%
group_by(region) %>%
summarise(centroid.x = mean(range(x)),
centroid.y = mean(range(y)),
label = unique(State),
`Apo` = unique(Apo),
`Qt` = unique(Qt))
## drawing the plot
ggplot() +
geom_polygon(data = us_map1,
aes(x,y, group = group, fill = Qt),
color = "black",
size = .1) +
geom_label_repel(data = us_centroids1,
aes(centroid.x, centroid.y,
label = paste(region, "\n Apo (km/h) = ", `Apo`, "\n Qt (m/s) =", `Qt`)),
size = 5/14*8,
box.padding = 1,
parse = F) +
scale_fill_gradientn(name = expression(Q[t]~(m/s)),
breaks = c(0, seq(10,130,20)),
labels = c("", seq(10,130,20)),
limits = c(0, 130),
colors = c("#DCDCDC", "lightblue", "green"),
guide = guide_colorbar(barwidth = 0.8, barheight = 18)) +
theme_void()
This is kind of a pain, since plotmath doesn't appear to support line breaks, so you have to work around it with atop(). Use bquote() to insert variable values into the expression. This only works on one element at a time, so we have to pmap() (from purrr, loaded with tidyverse) over the three variables.
ggplot() +
geom_polygon(data = us_map1,
aes(x,y, group = group, fill = Qt),
color = "black",
size = .1) +
geom_label_repel(data = us_centroids1,
aes(centroid.x, centroid.y,
label = pmap(list(region, Apo, Qt),
\(x,y,z) bquote(atop(.(x), # first line of lab
atop(A[po] (km/h) == .(y), # second line
Q[t] (m/s) == .(z)) # third line
)
)
)
),
size = 5/14*8,
box.padding = 1,
parse = T) +
scale_fill_gradientn(name = expression(Q[t]~(m/s)),
breaks = c(0, seq(10,130,20)),
labels = c("", seq(10,130,20)),
limits = c(0, 130),
colors = c("#DCDCDC", "lightblue", "green"),
guide = guide_colorbar(barwidth = 0.8, barheight = 18)) +
theme_void()
Created on 2022-07-31 by the reprex package (v2.0.1)
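To see what bquote() produces for a single label, the construction can be run in base R without any plotting. This is only a sketch: make_label() is a hypothetical helper, and Map() stands in for pmap() so that no extra package is needed.

```r
# Build a three-line plotmath label for one row.
# make_label() is a hypothetical helper, not part of ggplot2 or ggrepel.
make_label <- function(region, apo, qt) {
  bquote(atop(.(region),
              atop(A[po](km/h) == .(apo),
                   Q[t](m/s) == .(qt))))
}

# Two rows taken from the example data (ME and NJ)
regions <- c("Maine", "New Jersey")
apo     <- c(9, 1)
qt      <- c(90, 10)

# One unevaluated call per row, with the values already substituted
labels <- Map(make_label, regions, apo, qt)

print(labels[[1]])
```

Each element is a language object in which .(region), .(apo) and .(qt) have been replaced by that row's values, which is why the substitution has to happen row by row.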
What I'm doing
I'm using a library for R called ggplot2, which allows for a lot of different options for creating graphics and other things. I'm using that to display two different data sets on one graph with different colours for each set of data I want to display.
The Problem
I'm also trying to get a legend to show up in my graph that tells the user which set of data corresponds to which colour. So far, I've not been able to get it to show.
What I've tried
I've set its position to top/bottom/left/right to make sure nothing was setting the position to none by default, which would have hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")
ggplot(data.frame("Time"=times), aes(x=Time)) +
#Data Set 1
geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
geom_text(x=mean(times)-1, y=max(density(times)$y/2), label="Mean {1} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(times)), color="cyan", linetype="dashed", size=1, alpha = 0.5) +
# Data Set 2
geom_density(data=data.frame("Time"=timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
geom_text(x=mean(timesSec)-1, y=max(density(timesSec)$y/2), label="Mean {2} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(timesSec)), color="orange", linetype="dashed", size=1, alpha = 0.5) +
# Main Graph Info
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
dev.off()
Result
As pointed out by @Ben, you need to pass the colour inside aes() for the legend to be displayed.
However, a better way to build the ggplot is to merge your two variables "Time" and "Timesec" into a single data frame and reshape it into a longer format. To illustrate this, I created this dummy data frame:
Time = sample(1:24, 200, replace = TRUE)
Timesec = sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)
Time Timesec
1 22 23
2 21 9
3 19 9
4 10 6
5 7 24
6 15 9
... ... ...
So, the first step is to reshape your data frame into a longer format. Here, I'm using the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
To add geom_vline and geom_text based on the mean of your values, an easy way is to create a second data frame holding the mean and the maximum density value for each variable:
library(tidyr)
library(dplyr)
df_lab <- df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
group_by(var) %>%
summarise(Mean = mean(val),
Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
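For readers who want to check the summary without dplyr, the same two quantities can be computed in base R. This is a sketch on freshly simulated data with an arbitrary seed, so the numbers will not match the table above; df_lab_base is an illustrative name.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
Time    <- sample(1:24, 200, replace = TRUE)
Timesec <- sample(1:24, 200, replace = TRUE)
vals <- list(Time = Time, Timesec = Timesec)

df_lab_base <- data.frame(
  var  = names(vals),
  Mean = vapply(vals, mean, numeric(1)),
  # density() returns the estimated curve; its largest y value is the peak height
  Density = vapply(vals, function(v) max(density(v)$y), numeric(1)),
  row.names = NULL
)

print(df_lab_base)
```

The Mean column positions the vertical line and the text horizontally, while half of Density places the text roughly mid-height under each curve.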
So, using df and df_lab, you can generate the entire plot. Here, we pass the color and fill arguments inside aes() and use scale_color_manual and scale_fill_manual to set the appropriate colors:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
ggplot(aes(x = val, fill = var, colour = var))+
geom_density(alpha = 0.8)+
scale_color_manual(values = c("#4271AE", "orange"))+
scale_fill_manual(values = c("#1A3552", "gray"))+
geom_vline(inherit.aes = FALSE, data = df_lab,
aes(xintercept = Mean, color = var), linetype = "dashed", size = 1,
show.legend = FALSE)+
geom_text(inherit.aes = FALSE, data = df_lab,
aes(x = Mean-0.5, y = Density/2, label = var, color = var), angle = 90,
show.legend = FALSE)+
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
Does this answer your question?
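The point about aes() can be checked in isolation with a very small example. The sketch below uses made-up normal data and illustrative object names; it only contrasts a constant colour with a mapped one.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for the made-up data
toy <- data.frame(
  val = c(rnorm(100, mean = 8), rnorm(100, mean = 16)),
  var = rep(c("Time", "Timesec"), each = 100)
)

# Constant colour set outside aes(): both curves are orange and no legend is drawn
p_no_legend <- ggplot(toy, aes(x = val, group = var)) +
  geom_density(colour = "orange")

# Colour mapped to a variable inside aes(): one colour per group and a legend
p_legend <- ggplot(toy, aes(x = val, colour = var)) +
  geom_density()
```

Printing p_no_legend and p_legend side by side shows that only the mapped version gets a colour legend, whatever legend.position is set to.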
I am trying to plot three variables (SA, SA1, SA2), with two variables (SA and SA2) on the left y-axis and one variable (SA1) on the right secondary y-axis. I tried to fix the axis limits using limits = c(1e15,5e15) on the left y-axis while trying to limit the secondary axis to limits = c(3e17,4.2e17), but I am unable to plot the secondary axis with my customized limits. DATA Link
library(ggplot2)
library(xlsx) # provides read.xlsx2
test <- read.xlsx2("filepath/test.xlsx", 1, header=TRUE)
View(test)
test$SA=as.numeric(levels(test$SA))[test$SA]
test$SA1=as.numeric(levels(test$SA1))[test$SA1]
test$SA2=as.numeric(levels(test$SA2))[test$SA2]
# each "+" must end a line rather than start one, otherwise R stops parsing at the line break
g <- ggplot(test,aes(x=year, y= SA, group = 1)) + geom_line(mapping = aes(x = test$year, y = test$SA)) +
  geom_line(mapping = aes(x = test$year, y = test$SA2), color = "red") + geom_line(mapping = aes(x = test$year, y = test$SA1), size = 1, color = "blue")
g+scale_y_continuous(name = "primary axis title",
  sec.axis = sec_axis(~./5, name = "secondary axis title (SA1)"))
The final solution by @dc37 gives me the following result:
ggplot(subset(DF, Var != "SA1"), aes(x = year, y = val, color = Var))+
geom_line()+
scale_y_continuous(name = "Primary axis", sec.axis = sec_axis(~.*100, name = "Secondary"))
Thanks
The sec.axis argument only creates a new axis; it does not change your data and can't be used for plotting data.
To be able to plot data from two groups with very different ranges, you need to scale down SA1 first.
Here, I scaled it down by dividing it by 100 (because the ratio between the max of SA1 and the max of SA and SA2 is close to 100), and I also reshaped your data frame into a longer format more suitable for ggplot2:
library(lubridate)
df$year = parse_date_time(df$year, orders = "%Y") # To set year in a date format
library(dplyr)
library(tidyr)
DF <- df %>% mutate(SA1_100 = SA1/100) %>% pivot_longer(.,-year, names_to = "Var",values_to = "val")
# A tibble: 44 x 3
year Var val
<int> <chr> <dbl>
1 2008 SA 1.41e15
2 2008 SA1 3.63e17
3 2008 SA2 4.07e15
4 2008 SA1_100 3.63e15
5 2009 SA 1.53e15
6 2009 SA1 3.77e17
7 2009 SA2 4.05e15
8 2009 SA1_100 3.77e15
9 2010 SA 1.52e15
10 2010 SA1 3.56e17
# … with 34 more rows
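The factor of 100 can be derived from the data rather than read off by eye. The following base R sketch uses the values from the Data section and rounds the ratio of the maxima to the nearest power of ten; that rounding rule is an assumption made here for convenience, not something ggplot2 requires.

```r
SA  <- c(1.40916e15, 1.5336e15, 1.52473e15, 1.58394e15, 1.59702e15, 1.54936e15,
         1.6077e15, 1.59211e15, 1.73533e15, 1.7616e15, 1.67771e15)
SA1 <- c(3.63e17, 3.77e17, 3.56e17, 3.68e17, 3.68e17, 3.6e17, 3.6e17, 3.68e17,
         3.55e17, 3.58e17, 3.43e17)
SA2 <- c(4.07e15, 4.05e15, 3.94e15, 3.95e15, 3.59e15, 3.53e15, 3.43e15, 3.2e15,
         3.95e15, 3.03e15, 3.16e15)

# Ratio between the largest secondary value and the largest primary value
raw_ratio <- max(SA1) / max(SA, SA2)

# Round to the nearest power of ten to get a readable scaling factor
scale_factor <- 10^round(log10(raw_ratio))

print(raw_ratio)
print(scale_factor)
```

Any other factor works as well, as long as the data are divided by it and the secondary axis multiplies by the same number.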
Then, you can plot it by using (I subset the dataframe to remove "SA1" and keep the transformed column "SA1_100"):
library(ggplot2)
ggplot(subset(DF, Var != "SA1"), aes(x = year, y = val, color = Var))+
geom_line()+
scale_y_continuous(name = "Primary axis", sec.axis = sec_axis(~.*100, name = "Secondary"))
BTW, in ggplot2, you don't need to refer to columns using $; simply write the column name.
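If the two axes need to cover specific ranges, such as the limits mentioned in the question, a pure multiplication is not enough, but a linear transformation with an intercept can map one range onto the other. This is a sketch of the arithmetic only; to_secondary() and to_primary() are hypothetical helper names.

```r
# Limits requested in the question
primary_limits   <- c(1e15, 5e15)
secondary_limits <- c(3e17, 4.2e17)

# Linear map: secondary = intercept + slope * primary
slope     <- diff(secondary_limits) / diff(primary_limits)
intercept <- secondary_limits[1] - slope * primary_limits[1]

to_secondary <- function(y) intercept + slope * y    # used for the axis labels
to_primary   <- function(y) (y - intercept) / slope  # used to rescale the SA1 data

print(c(slope = slope, intercept = intercept))
print(to_secondary(primary_limits))
```

In the plot, SA1 would then be drawn as to_primary(SA1), with limits = primary_limits on the scale and sec_axis(~ intercept + slope * ., name = "Secondary") for the right-hand axis.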
Data
structure(list(year = 2008:2018, SA = c(1.40916e+15, 1.5336e+15,
1.52473e+15, 1.58394e+15, 1.59702e+15, 1.54936e+15, 1.6077e+15,
1.59211e+15, 1.73533e+15, 1.7616e+15, 1.67771e+15), SA1 = c(3.63e+17,
3.77e+17, 3.56e+17, 3.68e+17, 3.68e+17, 3.6e+17, 3.6e+17, 3.68e+17,
3.55e+17, 3.58e+17, 3.43e+17), SA2 = c(4.07e+15, 4.05e+15, 3.94e+15,
3.95e+15, 3.59e+15, 3.53e+15, 3.43e+15, 3.2e+15, 3.95e+15, 3.03e+15,
3.16e+15)), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))
Based on Small ggplot2 plots placed on coordinates on a ggmap
I would like to have the same solution, but with the ggplot function outside the pipeline, applied with purrr::map().
The data for the small bar subplots, which show 2 values, may contain
lon, lat, id, valueA, valueB,
After a tidyr::gather operation it may look like:
Town, Potential_Sum, lon, lat, component , sales
Aaa, 9.00, 20.80, 54.25, A, 5.000
Aaa, 9.00, 20.80, 54.25, B, 4.000
Bbb, 5.00, 19.60, 50.50, A, 3.000
Bbb, 5.00, 19.60, 50.50, B, 2.000
The current working solution is to use do() to generate subplots and then ggplotGrob to generate a column of "grob" objects to be placed at the lon, lat locations on a ggmap.
maxSales <- max(df$sales)
df.grobs <- df %>%
do(subplots = ggplot(., aes(1, sales, fill = component)) +
geom_col(position = "dodge", alpha = 0.50, colour = "white") +
coord_cartesian(ylim = c(0, maxSales)) +
scale_fill_manual(values = c("green", "red"))+
geom_text(aes(label=if_else(sales>0,round(sales), NULL)), vjust=0.35,hjust=1.1, colour="black",
position=position_dodge(.9), size=2.5, angle=90)+
theme_void()+ guides(fill = F)) %>%
mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
x = lon-0.14, y = lat-0.20,
xmax = lon+0.14, ymax = lat+1.2)))
df.grobs %>%
{p + geom_label(aes(x = 15, y = 49.8, label = "A"), colour = c("black"),fill = "green", size=3)+
geom_label(aes(x = 15, y = 5.01, label = "B"), colour = c("black"),fill = "red", size=3)+
.$subgrobs +
geom_text(data=df, aes(label = Miasto), vjust = 3.5,nudge_x = 0.05, size=2.5) +
geom_col(data = df,
aes(0,0, fill = component),
colour = "white")}
p is a ggmap object, map of Poland, on which I would like to place small plots:
# p <-
# get_googlemap(
# "Poland",
# maptype = "roadmap",
# zoom = 6,
# color = "bw",
# crop = T,
# style = "feature:all|element:labels|visibility:off" # 'feature:administrative.country|element:labels|visibility:off'
# ) %>% # or 'feature:all|element:labels|visibility:off'
# ggmap() + coord_cartesian() +
# scale_x_continuous(limits = c(14, 24.3), expand = c(0, 0)) +
# scale_y_continuous(limits = c(48.8, 55.5), expand = c(0, 0))
#
How can I translate this solution to the nest - apply - unnest syntax, so that the ggplot part sits outside the piped expression as a function?
In other words, how can I replace do() with map(parameters, GGPlot_function) and then plot the grobs on a ggmap?
What I have done so far is try to write a ggplot function:
#----barplots----
maxSales <- max(df$sales)
fn_ggplot <- function (df, x, component, maxX) {
x <- enquo(x)
component <-enquo(component)
maxX <-enquo(maxX)
p <- ggplot(df, aes(1, !!x, fill = !!component)) +
geom_col(position = "dodge", alpha = 0.50, colour = "white") +
coord_cartesian(ylim = c(0, !!maxX)) +
scale_fill_manual(values = c("green", "red"))+
geom_text(aes(label=if_else(x>0,round(!!x), NULL)), vjust=0.35,hjust=1.1, colour="black",
position=position_dodge(.9), size=2.5, angle=90)+
theme_void()+ guides(fill = F)
return(p)
}
And I got totally confused trying to apply it like this (I am a constant beginner, unfortunately). This is not working:
df.grobs <- df %>%
mutate(subplots = pmap(list(.,sales,component,Potential_Sum),fn_ggplot)) %>%
mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
x = lon-0.14, y = lat-0.20,
xmax = lon+0.14, ymax = lat+1.2)))
I get errors indicating that I do not know what I am doing, i.e. the lengths of the arguments are incorrect and something else is expected.
message: Element 2 of `.l` must have length 1 or 7, not 2
class: `purrr_error_bad_element_length`
backtrace:
1. dplyr::mutate(...)
12. purrr:::stop_bad_length(...)
13. dplyr::mutate(...)
Call `rlang::last_trace()` to see the full backtrace
> rlang::last_trace()
x
1. +-`%>%`(...)
2. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
4. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
5. | \-global::`_fseq`(`_lhs`)
6. | \-magrittr::freduce(value, `_function_list`)
7. | \-function_list[[i]](value)
8. | +-dplyr::mutate(...)
9. | \-dplyr:::mutate.tbl_df(...)
10. | \-dplyr:::mutate_impl(.data, dots, caller_env())
11. +-purrr::pmap(list(., sales, component, Potential_Sum), fn_ggplot)
12. \-purrr:::stop_bad_element_length(...)
13. \-purrr:::stop_bad_length(...)
data
First, let's build some sample data close to yours but reproducible without the need for an API key.
As a starting point we have a plot of a country map stored in p, and some data in long form to build the charts stored in plot_data.
library(maps)
library(tidyverse)
p <- ggplot(map_data("france"), aes(long,lat,group=group)) +
geom_polygon(fill = "lightgrey") +
theme_void()
set.seed(1)
plot_data <- tibble(lon = c(0,2,5), lat = c(44,48,46)) %>%
group_by(lon, lat) %>%
do(tibble(component = LETTERS[1:3], value = runif(3,min=1,max=5))) %>%
mutate(total = sum(value)) %>%
ungroup()
plot_data
# # A tibble: 9 x 5
# lon lat component value total
# <dbl> <dbl> <chr> <dbl> <dbl>
# 1 0 44 A 2.06 7.84
# 2 0 44 B 2.49 7.84
# 3 0 44 C 3.29 7.84
# 4 2 48 A 4.63 11.0
# 5 2 48 B 1.81 11.0
# 6 2 48 C 4.59 11.0
# 7 5 46 A 4.78 11.9
# 8 5 46 B 3.64 11.9
# 9 5 46 C 3.52 11.9
define a plotting function
We isolate the plotting code in a separate function:
my_plot_fun <- function(data){
ggplot(data, aes(1, value, fill = component)) +
geom_col(position = position_dodge(width = 1),
alpha = 0.75, colour = "white") +
geom_text(aes(label = round(value, 1), group = component),
position = position_dodge(width = 1),
size = 3) +
theme_void()+ guides(fill = "none") # "none" replaces the deprecated fill = F
}
build a wrapper
This function takes a data set, some coordinates and the plotting function as parameters, to annotate at the right spot.
annotation_fun <- function(data, lat,lon, plot_fun) {
subplot = plot_fun(data)
sub_grob <- annotation_custom(ggplotGrob(subplot),
x = lon-0.5, y = lat-0.5,
xmax = lon+0.5, ymax = lat+0.5)
}
The final code
The code then becomes simple, using nest and pmap:
subgrobs <- plot_data %>%
nest(data = -c(lon, lat)) %>% # current tidyr syntax; the list-column is named "data"
pmap(annotation_fun,plot_fun = my_plot_fun)
p + subgrobs
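The reason this works is that pmap() treats the nested table as a list of columns and matches each column to the function argument of the same name, passing any extra arguments through unchanged. The sketch below shows that mechanism without any plotting; describe_fun() and summary_fun are hypothetical stand-ins for annotation_fun() and plot_fun.

```r
library(purrr)

# A nested table is a list of equal-length columns: lon, lat and a list-column "data"
nested <- list(
  lon  = c(0, 2),
  lat  = c(44, 48),
  data = list(c(2.06, 2.49, 3.29), c(4.63, 1.81, 4.59))
)

# Hypothetical stand-in for annotation_fun(): same argument names, text output
describe_fun <- function(data, lat, lon, summary_fun) {
  sprintf("lon=%g lat=%g total=%.2f", lon, lat, summary_fun(data))
}

# Each "row" supplies data, lat and lon by name; summary_fun is passed to every call
out <- pmap(nested, describe_fun, summary_fun = sum)

print(out[[1]])
```

Because the matching is by name, the order of the columns in the nested table does not have to follow the order of the function's arguments.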
Consider this simple example
library(dplyr)
library(forcats)
library(ggplot2)
mydata <- tibble(cat1 = c(1,1,2,2),
cat2 = c('a','b','a','b'),
value = c(10,20,-10,-20),
time = c(1,2,1,2))
mydata <- mydata %>% mutate(cat1 = factor(cat1),
cat2 = factor(cat2))
> mydata
# A tibble: 4 x 4
cat1 cat2 value time
<fct> <fct> <dbl> <dbl>
1 1 a 10.0 1.00
2 1 b 20.0 2.00
3 2 a -10.0 1.00
4 2 b -20.0 2.00
Now, I want to create a chart where I interact the two factor variables.
I know I can use interaction() in ggplot2 (see below).
My big problem is that I do not know how to automate the labeling (and the colouring) of the interactions, so that I can avoid the manual errors that come with scale_colour_manual.
For instance:
ggplot(mydata,
aes(x = time, y = value, col = interaction(cat1, cat2) )) +
geom_point(size=15) + theme(legend.position="bottom")+
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme(legend.position="bottom",
legend.text=element_text(size=12, face = "bold")) +
scale_colour_manual(name = ""
, values=c("red","red4","royalblue","royalblue4")
, labels=c("1-b","1-a"
,"2-a","2-b"))
shows:
which has the wrong labels because of a (deliberate) mistake I made in scale_colour_manual(). Indeed, the bright red dot is 1-a and not 1-b (note how the labels are simply the concatenation of the factor levels). The idea is that with more factor levels, guessing the right order can be tricky.
Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats? Perhaps creating the labels as strings in the dataframe beforehand?
Thanks!
If the number of factor levels for cat1 / cat2 is not fixed (and could potentially be much larger than 2), I would calculate the appropriate colours with hsv() rather than assign them manually.
The colour cheatsheet here summarises the HSV colour model rather nicely:
Hue (h) is essentially your rainbow colour wheel, Saturation (s) determines how intense the colour is, and Value (v) how dark it is. Each parameter accepts values in the range [0, 1].
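A few direct calls make the three parameters concrete. hsv() ships with base R (package grDevices), so this sketch needs no extra packages; the variable names are only illustrative.

```r
# Hue 1 wraps around the colour wheel back to red; full saturation and value
full_red   <- hsv(h = 1,   s = 1,    v = 1)

# Hue 0.5 is on the opposite side of the wheel
full_cyan  <- hsv(h = 0.5, s = 1,    v = 1)

# Same hue, but lower saturation and value give a duller, darker shade
muted_cyan <- hsv(h = 0.5, s = 0.65, v = 0.65)

print(c(full_red, full_cyan, muted_cyan))
# "#FF0000" "#00FFFF" "#3AA6A6"
```

Keeping the hue fixed while varying saturation and value is what lets one family of shades represent the levels of the second factor.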
Here's how I would adapt it for this use case:
mydata2 <- mydata %>%
# use "-" instead of the default "." since we are using that for the labels anyway
mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%
# cat1: assign hue evenly across the whole wheel,
# cat2: restrict both saturation & value to the [0.3, 1], as it can look too
# faint / dark otherwise
mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
# create the vector of colours for scale_colour_manual()
manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
> colour.vector
1-a 1-b 2-a 2-b
"#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000"
With the colours calculated automatically for any number of factors, plotting becomes quite straightforward:
ggplot(mydata2,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector)) +
theme(legend.position = "bottom")
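Part of what makes hand-written labels error-prone is the level order that interaction() produces, which is easy to guess wrong. The base R sketch below shows that order for the example factors and why a named colour vector avoids the problem; the colour values are the ones computed above.

```r
cat1 <- factor(c(1, 1, 2, 2))
cat2 <- factor(c("a", "b", "a", "b"))

inter <- interaction(cat1, cat2, sep = "-")

# By default the first factor varies fastest, so "2-a" comes before "1-b"
print(levels(inter))

# A named vector can be written in any order; lookup is by name, not position
colour.vector <- c("2-b" = "#FF0000", "1-a" = "#3AA6A6",
                   "2-a" = "#A63A3A", "1-b" = "#00FFFF")
print(colour.vector[levels(inter)])
```

scale_colour_manual() matches a named values vector to the factor levels by name, so the colours stay attached to the right combination however the vector is ordered.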
An illustration with more factor levels (the code is the same except for the addition of guide_legend(byrow = TRUE) in the colour scale):
mydata3 <- data.frame(
cat1 = factor(rep(1:3, times = 5)),
cat2 = rep(LETTERS[1:5], each = 3),
value = 1:15,
time = 15:1
) %>%
mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
ggplot(mydata3,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector),
guide = guide_legend(byrow = TRUE)) +
theme(legend.position = "bottom")