How can I visualise points on a single line in R?

I want to plot 3 numerical size values on a single line in R, in order of ascending size, but my research so far has only pointed me towards regular line graphs. I'm looking for something like this:
where size increases from left to right and I can plot my 3 data points on the line to show where each sample falls. It doesn't need to be as complicated as this example; a single standalone line is enough.
How would I go about doing this?

Here's a quick recreation:
library(tidyverse)
mtcars %>%
group_by(gear = as.factor(gear)) %>%
summarize(min = min(wt),
max = max(wt),
mean = mean(wt),
sd = sd(wt),
median = median(wt)) -> summary
ggplot(summary, aes(y=gear)) +
geom_errorbarh(aes(xmin = min, xmax = max), height = 0.04, color = "gray70") +
geom_segment(aes(yend = gear, x = mean-sd, xend = mean+sd), alpha = 0.3,
color = "forestgreen", size = 10) +
geom_point(aes(x = median), shape = 17, color = "darkred") +
geom_text(aes(x = median, label = median), vjust = -1.5) +
theme_minimal() + theme(panel.grid = element_blank())
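For the simpler case the question asks about (one standalone line with three points), a minimal sketch could look like the following. The data frame `samples` and its values are made up for illustration; swap in your own sample names and sizes.

```r
library(ggplot2)

# Hypothetical data: three samples and their sizes (values are invented)
samples <- data.frame(
  sample = c("A", "B", "C"),
  size   = c(2.1, 3.4, 5.8)
)

p <- ggplot(samples, aes(x = size, y = 0)) +
  # The single horizontal line, drawn once across the data range
  annotate("segment",
           x = min(samples$size) - 0.5, xend = max(samples$size) + 0.5,
           y = 0, yend = 0, colour = "gray50") +
  # One point per sample, positioned by its size
  geom_point(size = 3, colour = "darkred") +
  # Label each point with the sample name, just above the line
  geom_text(aes(label = sample), vjust = -1.5) +
  scale_y_continuous(limits = c(-1, 1)) +
  theme_minimal() +
  # Hide everything on the y axis, since it carries no information here
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        panel.grid = element_blank())

print(p)
```

Because x is a continuous axis, the points are automatically ordered by ascending size from left to right.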

Related

How to increase the default y-axis limit in ggplot when scales in facet_wrap() are set to "free"

I am trying to figure out how to change the global default setting of the y-axis limit in ggplot when facet_wrap() is used and scales are "free". I have searched the site but couldn't find any solution to my problem. I know I could build each plot separately and then assemble them using cowplot::plot_grid() or something similar but this will be cumbersome with 10 subplots.
Example data and plot:
library(tidyverse)
d <- tibble(
var = c(rep(LETTERS[1:2], each = 4)),
id = c(rep(1:4, 2)),
mean = c(23, 24, 21, 22, 154, 153, 152, 151),
sd = c(rep(c(2, 10), each = 4)),
diff = c(rep(letters[1:4], 2))
)
ggplot(d, aes(id, mean)) +
geom_point() +
geom_errorbar(aes(ymin = mean - sd,
ymax = mean + sd)) +
geom_text(aes(
x = id,
y = mean + sd,
vjust = -2,
label = diff
)) +
facet_wrap( ~ var, scales = "free_y")
Resulting graph:
The problem I am encountering is that the letters are out of range when using the vjust = -2 argument. I am aware that in this example I can simply change the vjust argument, but in my actual dataset this unfortunately won't resolve the issue. So my question is: assuming vjust = -2 is fixed, is there a way to increase the default y-axis limits when scales = "free" in facet_wrap()?
An option is to expand the axis limits.
ggplot(d, aes(id, mean)) +
geom_point() +
geom_errorbar(aes(ymin = mean - sd,
ymax = mean + sd)) +
geom_text(aes(
x = id,
y = mean + sd,
vjust = -2,
label = diff
)) +
facet_wrap( ~ var, scales = "free_y") +
scale_y_continuous(expand = expansion(mult = c(0.1, 0.2)))
This increases the top of the range by 20% and the bottom by 10%. For continuous scales the default is 5% on each side, i.e. expansion(mult = 0.05).
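To make the effect of the multiplicative expansion concrete, here is a small arithmetic sketch. The helper `padded_range()` is hypothetical (it is not part of ggplot2); it only mimics how a `mult` expansion pads each side by a fraction of the data range, using facet A from the example, where the error bars span 19 to 26.

```r
# Hypothetical helper: pad a range by a fraction of its span on each side
padded_range <- function(lower, upper, mult) {
  span <- upper - lower
  c(lower - mult[1] * span, upper + mult[2] * span)
}

# Facet A: lowest ymin is 21 - 2 = 19, highest ymax is 24 + 2 = 26
limits_a <- padded_range(19, 26, mult = c(0.1, 0.2))
limits_a
# 18.3 27.4
```

Since the padding is proportional to each facet's own range, the same `mult` gives facet B (a wider range) more absolute headroom than facet A.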

Connecting means with stat_summary (geom = 'line') within ticks on the x-axis

I am trying to create a plot in ggplot2 similar to this one:
Here is the code I am using:
Dataset %>%
group_by(Participant, Group, Emotion) %>%
ggplot(aes(y = Score, x = Emotion, fill = Group, colour = Group)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .4) +
geom_point(aes(y = Score, color = Group), position = position_jitter(width = .15), size = 3, alpha = 0.4) +
stat_summary(aes(y = Score, group = Emotion), fun.y = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48') +
stat_summary(fun = mean, geom = 'pointrange', width = 0.2, size = 2, alpha = 1.2, position=position_dodge(width=0.3)) +
stat_summary(fun.data = mean_se, geom='errorbar', width = 0.25, size = 2.2, alpha = 1.2, linetype = "solid",position=position_dodge(width=0.3)) +
guides(color = FALSE) +
scale_color_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Dark2") +
ylim(0, 100) +
graph_theme
What I am failing to do is set up the stat_summary(geom = 'line') to connect the green and orange means within each emotion on the x-axis. Could anyone give any pointers on this? I'd also like all the other features to stay the same if possible (e.g., I wouldn't like to use facet_grid or facet_wrap).
Thank you!
When I change the group argument in stat_summary to 'Group' instead of 'Emotion', means for each group are connected across emotions, but I can't figure out how to connect means of different groups within each emotion:
This is a tricky one. The line needs to connect points that have different x values, but even if you jitter the point layer, the points still technically share the same x value, so the line has nothing to connect. What others have done is add the offset to x manually, forcing the points to have different x positions. For more inspiration check out this, this and this. Here's an example:
library(tidyverse)
set.seed(1)
emotion <- c("anger", "fear", "sadness")
group <- letters[1:2]
participant <- 1:10
dat <- expand_grid(emotion, group, participant) %>%
mutate(across(everything(), as.factor),
score = sample(x = 1:100, size = nrow(.), replace = TRUE))
dat %>%
mutate(new_emot = case_when(
group == "a" ~as.numeric(emotion) - 0.125,
group == "b" ~as.numeric(emotion) + 0.125
)) %>%
ggplot(aes(x = emotion, y = score)) +
stat_summary(aes(color = group), fun = mean, geom = "point", position = position_dodge(width = 0.5)) +
stat_summary(aes(color = group), fun.data = mean_se, geom = "errorbar", width = 0.5, position = position_dodge(width = 0.5)) +
stat_summary(aes(x = new_emot, group = emotion), fun = mean, geom = "line") +
theme_bw()
Created on 2021-03-24 by the reprex package (v1.0.0)
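An alternative sketch of the same idea is to compute the means up front and plot them on a numeric x axis, so that no stat or position adjustment is involved in the line layer. The column names `mean_score` and `x_pos` and the 0.125 offset are choices made for this illustration, and the data are simulated as above.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Simulated data; expand.grid() turns the character vectors into factors
dat <- expand.grid(emotion = c("anger", "fear", "sadness"),
                   group = c("a", "b"),
                   participant = 1:10)
dat$score <- sample(1:100, nrow(dat), replace = TRUE)

# One row per emotion/group, with an explicit numeric x position
means <- dat %>%
  group_by(emotion, group) %>%
  summarise(mean_score = mean(score), .groups = "drop") %>%
  mutate(x_pos = as.numeric(emotion) + ifelse(group == "a", -0.125, 0.125))

p <- ggplot(means, aes(x = x_pos, y = mean_score)) +
  # group = emotion connects the two group means within each emotion
  geom_line(aes(group = emotion), colour = "gray48") +
  geom_point(aes(colour = group), size = 3) +
  # Put the emotion names back on the numeric axis
  scale_x_continuous(breaks = seq_along(levels(means$emotion)),
                     labels = levels(means$emotion)) +
  theme_bw()

print(p)
```

The trade-off is that the offsets are maintained by hand, so they need to match any dodge width used in other layers.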
Setting geom_line to the same position as pointrange and errorbar will solve the problem.
i.e.,
stat_summary(aes(y = Score, group = Emotion), fun = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48', position=position_dodge(width=0.3))

How to improve speed of ggplot bar chart when plotting >1000 points?

I'm producing a bar chart for 1200 observations using ggplot2. Each of these observations has an error bar. There's also an average shown (using geom_line) for these observations overall.
I'm finding the running time is very slow (2 seconds) compared with plotting fewer observations (e.g. with 500 it takes <1 second). Also, every observation must be a separate bar.
I realise it doesn't sound like much time, but it adds up for what I need to do: producing over 100 of these plots and knitting them into an R Markdown document.
Below is a piece of code I've created to replicate the issue - this is using ggplot2 inbuilt diamonds dataset.
library(tidyverse)
diamonds1 <- as.data.frame(mutate(diamonds, upper = x + 1.2, lower = x - 0.4))
diamonds2 <- diamonds1 %>%
group_by(cut) %>%
summarize(Mean = mean(x, na.rm=TRUE))
ChosenColorClarity <- "VVS28451"
diamonds3 <- left_join(diamonds1 ,diamonds2, by = c("cut" = "cut") ) %>%
filter(cut == "Very Good") %>%
mutate(ID = paste0(clarity,row_number() )) %>%
mutate(CutType = case_when(ID==ChosenColorClarity ~ ID,
color == "F" & ID != ChosenColorClarity ~ " Same Color",
TRUE ~ " Other Color"),
CutLabel = ifelse(ID == ChosenColorClarity, "Your Cut", ""))
diamonds4 <- diamonds3[order(-xtfrm(diamonds3$CutLabel)),]
diamonds4 <- diamonds4[1:1255,]
diamonds4$Xval <- as.numeric(reorder(diamonds4$ID, diamonds4$x))
DiamondCutChart = diamonds4 %>%
ggplot(aes(x = Xval,
y = x)) +
geom_bar(aes(fill=CutType), stat = "identity", width = 1) +
geom_errorbar(aes(ymin = lower, ymax = upper)) +
geom_text(aes(label = CutLabel),
position = position_stack(vjust = 0.5),
size = 2.7, angle = 90, fontface = "bold") +
geom_line(aes(y = diamonds4$Mean), group = 1, linetype=2, colour = "#0000ff") +
scale_fill_manual(values = c("#32572C", "#41B1B1", "#db03fc")) +
annotate("text", x = 1, y = diamonds4$Mean, hjust =0, vjust = -0.5,
size = 3.2, colour = "#0000ff",
label=paste0("Mean ",diamonds4$Mean)) +
theme_classic()+
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position = "top") +
labs(fill = "")
StartTime = Sys.time()
DiamondCutChart
EndTime = Sys.time()
EndTime - StartTime
When running this, it takes around 2 seconds. I need this to be less than 1 second to be able to produce multiple plots and rmarkdown outputs in less overall time.
How can I reduce the time it takes to plot the graph from the piece of code?
Any help or pointing in the right direction is greatly appreciated.
I'm assuming for now that you're aiming for raw speed, and visualization that depicts the desired data content. I'm not sure you need geom_bar() if only one bar is a different color. If your real world scenario has 7 different colors mixed randomly among the 1255 bars... this workaround won't work for you. :) Hopefully this will be helpful! :)
The geom_ribbon() is much faster to render than geom_bar(). With 1255 positions I didn't fiddle with its options, but I understand it has step functions to make it appear like bars when zoomed in. Ymmv.
It is so much faster, I decided to use it twice: once to render "bars" and once to render "error bars". In order for geom_ribbon() to work (for me) I created a numeric column for the x-axis values Xval, see below.
The geom_text() step is really only printing one label, and subsetting the data during this step saves a lot of rendering time. You can adjust as needed.
The same goes for the annotate() step: it's actually printing and re-printing the same label 1255 times, which takes a lot of time. Obviously you don't need that. :)
Each of the three steps above saves about 0.6 to 0.7 seconds. Maybe you can mix and match with other geoms as needed.
The final result (on my system) was 0.2 seconds.
diamonds4$Xval <- as.numeric(reorder(diamonds4$ID, diamonds4$x))
DiamondCutChartNew <- diamonds4 %>%
ggplot(aes(x = Xval, y = x)) +
geom_ribbon(aes(ymin = 0, ymax = x), fill="#32572C") +
geom_col(data = subset(diamonds4, nchar(CutLabel) > 0),
aes(x = Xval, y = x),
fill = "#41B1B1") +
geom_ribbon(data = diamonds4,
aes(ymin = lower, ymax = upper), fill="#FF000077") +
geom_line(aes(y = x)) +
geom_text(data = subset(diamonds4, nchar(CutLabel) > 0),
aes(label = CutLabel),
position = position_stack(vjust = 0.5),
size = 2.7, angle = 90, fontface = "bold") +
geom_line(aes(x = Xval, y = Mean), group = 1, linetype = 2, colour = "#0000ff") +
annotate("text", x = 1, y = head(diamonds4$Mean, 1), hjust = 0, vjust = -0.5,
size = 3.2, colour = "#0000ff",
label=paste0("Mean ", head(diamonds4$Mean, 1))) +
theme_classic() +
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position = "top") +
labs(fill = "")
{StartTime = Sys.time()
print(DiamondCutChartNew)
EndTime = Sys.time()
EndTime - StartTime}
Original result (for me):
Time difference of 2.05 secs
The new result:
Time difference of 0.229 secs
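If you want to compare variants repeatedly, a small timing helper can make the measurement less fiddly. This is only a sketch: `time_render()` is a hypothetical name, and it renders to an off-screen PDF device, so the numbers will differ from on-screen or knitr rendering.

```r
library(ggplot2)

# Hypothetical helper: time how long it takes to draw a ggplot object
time_render <- function(plot) {
  grDevices::pdf(NULL)                 # off-screen device, no file is written
  on.exit(grDevices::dev.off())        # always close the device afterwards
  # print() is what actually builds and draws the plot
  unname(system.time(print(plot))[["elapsed"]])
}

# Example with a small built-in dataset
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
elapsed <- time_render(p)
elapsed
```

Calling `time_render(DiamondCutChart)` and `time_render(DiamondCutChartNew)` would then give comparable figures for the two versions.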
Pasting the ProfVis run for this question:
https://rstudio.github.io/profvis/
install.packages("profvis")
library(profvis)
profvis(expr = {
DiamondCutChart <- diamonds4 %>%
ggplot(aes(x = reorder(ID, x),
y = x)) +
geom_bar(aes(fill=CutType), stat = "identity", width = 1) +
geom_errorbar(aes(ymin = lower, ymax = upper)) +
geom_text(aes(label = CutLabel),
position = position_stack(vjust = 0.5),
size = 2.7, angle = 90, fontface = "bold") +
geom_line(aes(y = Mean), group = 1, linetype=2, colour = "#0000ff") +
scale_fill_manual(values = c("#32572C", "#41B1B1")) +
annotate("text", x = 1, y = diamonds4$Mean, hjust =0, vjust = -0.5,
size = 3.2, colour = "#0000ff",
label=paste0("Mean ",diamonds4$Mean)) +
theme_classic()+
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position = "top") +
labs(fill = "")
print(DiamondCutChart)
},
interval = 0.005
)

Graph with a shaded area occupied by multiple lines

PROBLEM STATEMENT
My dataset contains 100 groups, each of which can be plotted as a line with a similar shape against a response variable. I would like to produce a graph where all the space taken up by the 100 curved lines becomes a shaded area, so it is easier to show the variation of the response variable across all the groups. This would also make it easy to see the values or intervals on the x-axis where the response variable has lower variation (the shaded area will be narrower, as most lines overlap) or higher variation.
CODE EXAMPLE
library(tidyverse)
library(ggplot2)
set.seed(1)
# Produce a similar table to the real one
example <- tibble(values = seq(0, 10, 0.1),
sine1 = sin(values + 0.2),
sine2 = sin(values - 0.2),
sine3 = sin(values + 0.4) + 0.2,
sine4 = sin(values - 0.4) - 0.2,
sine5 = sin(values - 0.4) + 0.2,
sine6 = sin(values - 0.2) + 0.4) %>%
pivot_longer(-values) # final format with 3 columns
# Create a line graph, where each line represents a different sine curve
graph1 <- ggplot(example, aes(x = values, y = value, col = name)) +
geom_line(size = 3, show.legend = FALSE, alpha = 0.5) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
graph1
QUESTION
Is there a way of going from this graph...
to this one (or similar)? Note: the thick black line is not strictly necessary
You just need to group by each x value and calculate the minimum and maximum values across the lines. This allows you to plot a geom_ribbon:
example %>%
group_by(values) %>%
summarize(min = min(value), max = max(value)) %>%
ggplot() +
geom_ribbon(aes(x = values, ymin = min, ymax = max), size = 2,
fill = "#29c8e5", color = "black") +
theme_classic()
If you would rather have the ribbon overlying your original plot, you could do:
ribbon <- example %>%
group_by(values) %>%
summarize(min = min(value), max = max(value))
graph1 +
geom_ribbon(aes(x = values, ymin = min, ymax = max),
data = ribbon, size = 0, fill = "#29c8e5",
color = NA, alpha = 0.3, inherit.aes = FALSE)
For what it's worth, I think the first option is more visually striking.
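To see what the summarising step produces, here is a tiny worked sketch with made-up numbers: three lines sampled at two x values. The table `toy` and its contents are invented for illustration only.

```r
library(dplyr)

# Three hypothetical lines (s1, s2, s3) observed at x = 0 and x = 1
toy <- tibble(
  values = c(0, 0, 0, 1, 1, 1),
  name   = rep(c("s1", "s2", "s3"), times = 2),
  value  = c(0.2, -0.1, 0.5, 0.9, 0.4, 0.7)
)

# One row per x value: the lower and upper edge of the shaded area
envelope <- toy %>%
  group_by(values) %>%
  summarise(min = min(value), max = max(value))

envelope
# values = 0: min = -0.1, max = 0.5
# values = 1: min =  0.4, max = 0.9
```

The ribbon is then drawn between `min` and `max` at each x value, so its width at any point is the spread of the lines there.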

Overlay histogram and density with varying alpha

I am trying to create a ggplot histogram with a density overlay, where the alpha changes past the number 1. An example can be seen on 538 under the Every outcome in our simulations section. The alpha differs based on the electoral vote count. I am close to getting a similar graph but I cannot figure out how to get the density and histogram to work together.
Code
library(data.table)
library(ggplot2)
library(magrittr) # provides %>%
dt <- data.table(ratio = rnorm(10000, mean = .5, sd = 1))
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio, alpha = al)) +
geom_histogram(aes(), bins = 100,
fill = 'red') +
geom_density(aes(),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly changes alpha past 1 as desired but the density estimate is not scaled.
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density.., alpha = al), bins = 100,
fill = 'red') +
geom_density(aes(y = ..scaled..),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly scales the density curve, but now the geom_histogram is calculated separately for values under 1 and above 1. I want them calculated as one group.
What am I missing?
The reason why knowing your theme is important is that there's an easy shortcut to this, which is not using alpha, but just drawing a semitransparent rectangle over the left half of your plot:
library(data.table)
library(ggplot2)
library(dplyr)
data.table(ratio = rnorm(10000, mean = .5, sd = 1)) %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density..), bins = 100,
fill = 'red') +
geom_line(aes(), stat = "density", size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
annotate("rect", xmin = -Inf, xmax = 1, ymin = 0, ymax = Inf, fill = "white",
alpha = 0.5) +
theme_bw()
Splitting into two groups and using alpha is possible, but it basically requires you to precalculate the histogram and the density curve. That's fine, but it would be an awful lot of extra effort for very little visual gain.
Of course, if theme_josh has a custom background color and zany gridlines, this approach may not be quite so effective. As long as you set the fill color to the panel background you should get a decent result. (The panel background in the default theme_gray() is "grey92".)
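For completeness, the precalculation route mentioned above could be sketched roughly as follows. This is one possible way to do it, not the answerer's code: the bin width of 0.1, the helper data frames `bars` and `curve`, and the use of base `hist()` and `density()` are all assumptions made for this illustration.

```r
library(ggplot2)

set.seed(1)
ratio <- rnorm(10000, mean = 0.5, sd = 1)

# Bin all values together, so bar heights come from a single histogram
breaks <- seq(floor(min(ratio)), ceiling(max(ratio)), by = 0.1)
h <- hist(ratio, breaks = breaks, plot = FALSE)
bars <- data.frame(mid = h$mids,
                   density = h$density,
                   past_one = h$mids >= 1)   # flag used only for alpha

# Kernel density estimate, on the same density scale as the bars
d <- density(ratio)
curve <- data.frame(x = d$x, y = d$y)

p <- ggplot() +
  geom_col(aes(x = mid, y = density, alpha = past_one), data = bars,
           fill = "red", width = 0.1) +
  geom_line(aes(x = x, y = y), data = curve, colour = "blue") +
  geom_vline(xintercept = 1, colour = "#0080e2") +
  scale_alpha_manual(values = c(`FALSE` = 0.65, `TRUE` = 0.9))

print(p)
```

Because the bars are computed before plotting, mapping alpha to `past_one` no longer splits the histogram into two separately computed groups.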
