possible bug in geom_ribbon - r

I was hoping to plot two time series and shade the space between them according to which series is larger at each time.
Here are the two series. First, in a data frame with an indicator for whichever series is larger at that time:
d1 <- read.csv("https://dl.dropbox.com/s/0txm3f70msd3nm6/ribbon%20data.csv?dl=1")
And this is the melted series.
d2 <- read.csv("https://dl.dropbox.com/s/6ohwmtkhpsutpig/melted%20ribbon%20data.csv?dl=1")
which I plot...
ggplot() + geom_line(data = d2,
aes(x = time, y = value, group = variable, color = variable)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_ribbon(data = d1[d1$big == "B",],
aes(x = time, ymin = csa,
ymax = csb),
alpha = .25,
fill = "#9999CC") +
geom_ribbon(data = d1[d1$big == "A",],
aes(x = time, ymin = csb,
ymax = csa),
alpha = .25,
fill = "#CC6666") +
scale_color_manual(values = c("#CC6666" , "#9999CC"))
which results in...
Why is there a superfluous blue band in the middle of the plot?

Here is a solution. I replaced data = d1[d1$big == "B",] in the first geom_ribbon function with:
data = rbind(d1[d1$big == "B",],
             # factor() keeps the codes numeric even if read.csv() returned big as character
             d1[c((which(diff(as.numeric(factor(d1$big))) == -1) + 1),
                  (which(diff(as.numeric(factor(d1$big))) == 1))), ])
This is necessary because the rows at the start and end of each d1$big == "B" run often have different csa and csb values, and geom_ribbon joins consecutive rows of the filtered data, so a visible ribbon connects one run to the next. The command above adds the last row before and the first row after each of these runs to the data for the first ribbon.
This problem does not exist for d1$big == "A" (the basis for the second ribbon).
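To see which rows the two which(diff(...)) expressions pick, here is a small sketch on a made-up indicator vector (big below is hypothetical and not taken from d1). It assumes the indicator is converted with factor() first, so that "A" maps to 1 and "B" to 2.

```r
# Hypothetical indicator: which series is larger at each time step
big <- factor(c("A", "A", "B", "B", "B", "A", "A", "B", "A"))

code <- as.numeric(big)   # "A" -> 1, "B" -> 2
step <- diff(code)        # +1 where A is followed by B, -1 where B is followed by A

after_b  <- which(step == -1) + 1   # first row after each "B" run
before_b <- which(step == 1)        # last row before each "B" run

after_b
before_b
big[c(after_b, before_b)]           # every selected row is an "A" row
```

The selected rows are the "A" rows that border a "B" run, which is what gets appended to the "B" subset.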
The complete code:
ggplot() +
  geom_line(data = d2,
            aes(x = time, y = value, group = variable, color = variable)) +
  geom_hline(yintercept = 0, linetype = 2) +
  # factor() keeps the codes numeric even if read.csv() returned big as character
  geom_ribbon(data = rbind(d1[d1$big == "B",],
                           d1[c((which(diff(as.numeric(factor(d1$big))) == -1) + 1),
                                (which(diff(as.numeric(factor(d1$big))) == 1))), ]),
              aes(x = time, ymin = csa, ymax = csb),
              alpha = .25, fill = "#9999CC") +
  geom_ribbon(data = d1[d1$big == "A",],
              aes(x = time, ymin = csb, ymax = csa),
              alpha = .25, fill = "#CC6666") +
  scale_color_manual(values = c("#CC6666" , "#9999CC"))
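A different way to keep geom_ribbon from bridging the gaps, sketched here as an alternative rather than as the method used above, is to number each uninterrupted run of big and pass that number as the group aesthetic. The data frame d below is simulated and only mimics the columns of d1 (time, csa, csb, big); the column name run is made up for this example.

```r
library(ggplot2)

# Simulated stand-in for d1 (assumption: same column names as the real data)
set.seed(1)
d <- data.frame(time = 1:50,
                csa  = cumsum(rnorm(50)),
                csb  = cumsum(rnorm(50)))
d$big <- ifelse(d$csa >= d$csb, "A", "B")

# Run id: increases by one every time `big` changes
d$run <- cumsum(c(TRUE, d$big[-1] != d$big[-nrow(d)]))

p <- ggplot(d, aes(x = time)) +
  # one polygon per run, so separate runs are never joined
  geom_ribbon(aes(ymin = pmin(csa, csb), ymax = pmax(csa, csb),
                  fill = big, group = run),
              alpha = 0.25) +
  geom_line(aes(y = csa), colour = "#CC6666") +
  geom_line(aes(y = csb), colour = "#9999CC") +
  scale_fill_manual(values = c(A = "#CC6666", B = "#9999CC"))
p
```

Each run is drawn as its own polygon, so nothing is drawn across a stretch where the other series is larger. Because the series cross between observations, small unshaded gaps can remain at the crossing points.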


How to add variables in a column as text using ggplot, facet_grid and geom_text

Consider the following tibble:
library(tidyverse)
df <-tibble(year = rep(1981:2020,4),
x = rep(letters[1:8],20),
y = rnorm(n = 160,0,1),
group = rep(letters[10:13],40))
I want to plot a grid faceted by the variable group and, as text in each panel, show the years (year) corresponding to that group.
Below is a failed attempt, where the years overlap and are not correct:
ggplot() +
geom_line(data = df, aes(x = x, y = y, color = group)) +
geom_text(
data = df,
aes(
x = x,
y = 3,
label = year,
color = group
),
alpha = 0.7,
show.legend = FALSE
) +
facet_grid( ~ group)
Thanks for support!
I'm not sure I understand what you want, but you can try the following
ggplot() +
geom_line(data = df, aes(x = x, y = y, color = group)) +
geom_text(
data = df,
aes(
x = x,
y = 3,
label = year,
color = group
),
alpha = 0.7,
show.legend = FALSE,
# stat = "unique",
position = position_stack(vjust = 1.5)
) +
facet_grid( ~ group)
If you don't want the year to be repeated, uncomment the stat = "unique" line.
UPDATE
If you want a horizontal alignment you can create a new data frame
df2 <- df %>%
group_by(x, group) %>%
summarise(year = str_c(unique(year), collapse=", "))
ggplot() +
geom_line(data = df, aes(x = x, y = y, color = group)) +
geom_text(
data = df2,
aes(
x = x,
y = 1,
label = year,
color = group
),
alpha = 0.7,
show.legend = FALSE,
position = position_stack(vjust = 1.5),
stat = "unique"
) +
facet_grid( ~ group)
With this alignment, however, the labels will overlap. You can try reducing the font size or using facet_wrap to arrange the panels in two rows.
You can also manipulate the strings in df2 and add "\n" where you need it, but I think this cannot easily be adapted to every string.
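One way to add the "\n" breaks without editing each string by hand is stringr::str_wrap(), which wraps text at whitespace to a target width. This is only a sketch: the width of 12 is an arbitrary choice for this example, and y is left out of df because it is not needed for the labels.

```r
library(dplyr)
library(stringr)

df <- tibble(year  = rep(1981:2020, 4),
             x     = rep(letters[1:8], 20),
             group = rep(letters[10:13], 40))

df2 <- df %>%
  group_by(x, group) %>%
  summarise(year = str_c(unique(year), collapse = ", "), .groups = "drop") %>%
  # insert line breaks so that each label line stays near 12 characters
  mutate(year = str_wrap(year, width = 12))

df2
```

The wrapped df2 can be passed to geom_text() as in the plot above; the labels then run over several lines instead of spreading horizontally.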

trouble combining geom_rect and facet_grid in ggplot

I want to shade part of the background in each facet of a simple plot. If I omit faceting and run geom_rect + geom_point, the expected results appear as shown in the MRE below. If I omit the rectangle and run geom_point + facet_grid, the expected 4 panels have each point in the correct facet. But when I combine geom_rect + geom_point + facet_grid, the points in the first category, and only those, get plotted in every facet. What is going on?
library(ggplot2)
set.seed(42)
syn.dat <- data.frame(
category.1 = as.factor(rep(c("1A", "1B"), each = 8)),
category.2 = as.factor(rep(rep(c("2A", "2B"), times = 2), each = 4)),
x = rep(-1:2, each = 4) + runif(8, max = .4),
y = rep(-1:2, each = 4) + runif(8, max = .4))
ggplot() +
geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = .5,
ymax = Inf), fill = "lightyellow") +
geom_point(data = syn.dat, aes(x = x, y = y)) +
facet_grid(cols = vars(category.1),
rows = vars(category.2))
I'm not totally sure about this, but it may be that you need to explicitly provide the data argument to ggplot itself, in order for facet_grid to correctly pick up all the values?
ggplot(syn.dat) +
  geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = 0.5, ymax = Inf), fill = "lightyellow") +
  geom_point(aes(x = x, y = y)) +
  facet_grid(rows = vars(category.2), cols = vars(category.1))
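Another option, offered as a sketch rather than a confirmed explanation of the behaviour above, is to draw the rectangle with annotate(). An annotation is not tied to the plot data, so ggplot2 repeats it in every panel while the points keep their facets.

```r
library(ggplot2)

set.seed(42)
syn.dat <- data.frame(
  category.1 = as.factor(rep(c("1A", "1B"), each = 8)),
  category.2 = as.factor(rep(rep(c("2A", "2B"), times = 2), each = 4)),
  x = rep(-1:2, each = 4) + runif(8, max = .4),
  y = rep(-1:2, each = 4) + runif(8, max = .4))

p <- ggplot(syn.dat, aes(x = x, y = y)) +
  # background rectangle, repeated in each panel
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 0.5, ymax = Inf,
           fill = "lightyellow") +
  geom_point() +
  facet_grid(rows = vars(category.2), cols = vars(category.1))
p
```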

asymmetrical distance between groups in alluvial diagram

I would like to change the distances between groups in an alluvial diagram using ggplot2 and ggalluvial.
My example is from https://corybrunson.github.io/ggalluvial/articles/ggalluvial.html
library(ggplot2)
library(ggalluvial)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject,
y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow() +
geom_stratum(alpha = .5) +
geom_text(stat = "stratum", size = 3) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses at three points in time")
gives me:
Now I would like to shorten the distance between ms153_NSA and ms432_NSA while keeping the same distance between ms432_NSA and ms460_NSA:
I tried to use following (without success):
https://rdrr.io/cran/ggalluvial/man/stat_flow.html
How to increase the space between the bars in a bar plot in ggplot2?
Even Google does not show me an asymmetrical alluvial diagram.
Here's one way to hack it
Data manipulation:
# get layer data calculated by ggalluvial, & shift 2nd position x leftwards by desired amount
library(dplyr)
p <- ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject,
y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow() +
theme(legend.position = "none") +
ggtitle("vaccination survey responses at three points in time")
shift.amt = 0.2
new.df <- layer_data(p) %>%
mutate(xmin = ifelse(x == 2, xmin - shift.amt, xmin),
xmax = ifelse(x == 2, xmax - shift.amt, xmax),
x = ifelse(x == 2, x - shift.amt, x))
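The shift itself is plain column arithmetic. The sketch below applies the same idea to a tiny hand-made data frame (toy is hypothetical and merely stands in for layer_data(p)), using base R instead of dplyr. Note that in the mutate() call above x is modified last, because the conditions for xmin and xmax still depend on the original x.

```r
# Hypothetical stand-in for the layer data: three axis positions
toy <- data.frame(x    = c(1, 2, 2, 3),
                  xmin = c(0.8, 1.8, 1.8, 2.8),
                  xmax = c(1.2, 2.2, 2.2, 3.2))
shift.amt <- 0.2

# Flag the rows of the second axis once, before any column is changed
is_second <- toy$x == 2
cols <- c("x", "xmin", "xmax")

# Move the second axis leftwards; the other positions stay where they are
toy[is_second, cols] <- toy[is_second, cols] - shift.amt
toy
```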
Plot:
library(ggforce) # needed for geom_diagonal_wide
ggplot(new.df, aes(fill = fill)) +
# recreate each geom layer using the modified data
  geom_diagonal_wide(data = . %>%
                       select(alluvium, fill, side, xmin, xmax, ymin, ymax) %>%
                       group_by(alluvium) %>%
                       summarise(fill = fill[side == "start"],
                                 x = list(c(xmax[side == "start"], xmin[side == "end"],
                                            xmin[side == "end"], xmax[side == "start"])),
                                 y = list(c(ymax[side == "start"], ymax[side == "end"],
                                            ymin[side == "end"], ymin[side == "start"]))) %>%
                       tidyr::unnest(cols = c(x, y)),
                     aes(x = x, y = y, group = alluvium),
                     alpha = 0.5) +
geom_rect(data = . %>% group_by(x, stratum, fill) %>%
summarise(xmin = min(xmin), xmax = max(xmax),
ymin = min(ymin), ymax = max(ymax)),
aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
alpha = 0.5, colour = "black") +
geom_text(data = . %>% group_by(x, stratum, label) %>%
summarise(y = mean(range(y))),
aes(x = x, y = y, label = label),
inherit.aes = FALSE, size = 3) +
# recreate original x-axis breaks/labels, fill values, & axis titles
scale_x_continuous(breaks = sort(unique(new.df$x)),
labels = layer_scales(p)$x$get_labels()) +
scale_fill_identity() + # by default, this won't create a legend
labs(title = "vaccination survey responses at three points in time",
x = p$labels$x, y = p$labels$y)

geom_smooth(): One line, different colors

I am currently trying to customize my plot with the goal to have a plot like this:
If I try to specify the color or linetype in either aes() or mapping = aes(), I get two different smooths, one for each class. This makes sense, because the smoothing is applied once for each type.
If I use group = 1 in the aesthetics, I get one line, but also only one color/linetype.
But I cannot find a solution that gives one smooth line with a different color/linetype for each class.
My code:
ggplot(df2, aes(x = dateTime, y = capacity)) +
#geom_line(size = 0) +
stat_smooth(geom = "area", method = "loess", show.legend = F,
mapping = aes(x = dateTime, y = capacity, fill = type, color = type, linetype = type)) +
scale_color_manual(values = c(col_fill, col_fill)) +
scale_fill_manual(values = c(col_fill, col_fill2))
The result for my data:
Reproducible code:
File: enter link description here (I cannot make this file shorter and copy it here, or else I get smoothing errors for too few data points)
df2 <- read.csv("tmp.csv")
df2$dateTime <- as.POSIXct(df2$dateTime, format = "%Y-%m-%d %H:%M:%OS")
col_lines <- "#8DA8C5"
col_fill <- "#033F77"
col_fill2 <- "#E5E9F2"
ggplot(df2, aes(x = dateTime, y = capacity)) +
stat_smooth(geom = "area", method = "loess", show.legend = F,
mapping = aes(x = dateTime, y = capacity, fill = type, color = type, linetype = type)) +
scale_color_manual(values = c(col_fill, col_fill)) +
scale_fill_manual(values = c(col_fill, col_fill2))
I would suggest modelling the data outside the plotting function and then plotting it with ggplot. I used the pipe (%>%) and mutate from the tidyverse for convenience, but you don't have to. Also, I prefer to keep the line and the fill separate to avoid the dashed line on the right side of your plot.
library(dplyr)
library(ggplot2)
df2$index <- as.numeric(df2$dateTime) # create an index for the loess model
model <- loess(capacity ~ index, data = df2) # model the capacity
plot <- df2 %>% mutate(capacity_predicted = predict(model)) %>% # use the predicted data for the capacity
ggplot(aes(x = dateTime, y = capacity_predicted)) +
geom_ribbon(aes(ymax = capacity_predicted, ymin = 0, fill = type, group = type)) +
geom_line(aes( color = type, linetype = type)) +
scale_color_manual(values = c(col_fill, col_fill)) +
scale_fill_manual(values = c(col_fill, col_fill2)) +
theme_minimal() +
theme(legend.position = "none")
plot
Please tell me if it works (I don't have the original data to test it), and if you would like a version without tidyverse functions.
EDIT:
Not very clean, but a smoother curve can be obtained with this code:
df3 <- data.frame(index = seq(min(df2$index), max(df2$index), length.out = 300),
type = "historic", stringsAsFactors = F)
modelling_date_index <- 1512562500
df3$type[df3$index <= modelling_date_index] = "predict"
plot <- df3 %>% mutate(capacity_predicted = predict(model, newdata = index),
dateTime = as.POSIXct(index, origin = '1970-01-01')) %>%
# arrange(dateTime) %>%
ggplot(aes(x = dateTime, y = capacity_predicted)) +
geom_ribbon(aes(ymax = capacity_predicted, ymin = 0, fill = type, group =
type)) +
geom_line(aes( color = type, linetype = type)) +
scale_color_manual(values = c(col_fill, col_fill)) +
scale_fill_manual(values = c(col_fill, col_fill2)) +
theme_minimal()+
theme(legend.position = "none")
plot
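Since the original tmp.csv is not available, the approach can be tried on simulated data. Everything in the sketch below is made up for illustration: the sine-shaped capacity, the hourly time stamps, and the split into "historic" and "predict" after 150 observations. The index is measured in hours since the first observation rather than in raw seconds, which is a choice made here to keep the numbers small, not something the answer above does.

```r
library(ggplot2)

set.seed(1)
n <- 200
sim <- data.frame(
  dateTime = as.POSIXct("2017-12-01 00:00:00", tz = "UTC") + 3600 * (seq_len(n) - 1)
)
sim$capacity <- 50 + 10 * sin(seq(0, 6, length.out = n)) + rnorm(n)
sim$type <- ifelse(seq_len(n) <= 150, "historic", "predict")

# One loess fit over the whole series, so there is a single smooth curve
sim$index <- as.numeric(difftime(sim$dateTime, min(sim$dateTime), units = "hours"))
model <- loess(capacity ~ index, data = sim)
sim$capacity_predicted <- predict(model)

# Colour and fill vary by type, but the curve comes from the single fit
p <- ggplot(sim, aes(x = dateTime, y = capacity_predicted)) +
  geom_ribbon(aes(ymin = 0, ymax = capacity_predicted, fill = type, group = type)) +
  geom_line(aes(colour = type, linetype = type)) +
  scale_colour_manual(values = c(historic = "#033F77", predict = "#033F77")) +
  scale_fill_manual(values = c(historic = "#033F77", predict = "#E5E9F2")) +
  theme_minimal() +
  theme(legend.position = "none")
p
```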

ggplot2:: Facetting plot with the same reference plot in all panels

I would like to facet a plot, but with a reference plot in each panel. Let me try to show with pictures what I want to achieve. My example data frame:
library(dplyr)
df <- tibble( id = c(rep('ctr',40), rep('pat',80)),
class = c(rep('ctr',40), rep(c('a','b'), each = 40)),
rank = rep (1:20,6),
mean = c(rep(seq(3,-3, length.out = 20),2),
rep(seq(1,-4, length.out = 20),2),
rep(seq(-2,-8, length.out = 20),2)),
sd = rep(seq(1.2,0.8, length.out = 20), times = 6),
exam = rep(c('blue','red'), each = 20, times = 3))
My plot:
# first, create reference plot of the 'controls'
require(ggplot2)
p_ctr <- ggplot() +
geom_line(data = filter(df, id == 'ctr'),
aes(x=rank, y=mean, color=exam), linetype=1) +
geom_ribbon(data = filter(df, id == 'ctr'),
aes(x = rank, ymax = mean+sd, ymin = mean-sd,
fill = exam), alpha = .1) +
scale_colour_manual(values = c("#00b6eb","#eb0041")) +
scale_fill_manual(values = c("#00b6eb","#eb0041"))
# then, overlay with plot of 'patients'
p_ctr + geom_line(data = filter(df, id == 'pat'),
aes(x=rank, y=mean, linetype = class)) +
geom_ribbon(data = filter(df, id == 'pat'),
aes(x = rank, ymax = mean+sd, ymin = mean-sd,
group = class),
alpha = .1) +
facet_wrap(~exam)
That is halfway there:
Ideally, however, I would like to plot the different "classes" in separate panels, but with the control plot as a reference in each panel:
Expected result:
I have tried different combinations of facetting, without good result. I guess, there must be a simple solution?
Maybe like so.
library(dplyr)
library(ggplot2)
df1 <- filter(df, id == 'ctr')
df2 <- filter(df, id == 'pat')
df2 <- dplyr::rename(df2, class_2 = class)
p_ctr <- ggplot() +
geom_line(data = df1, aes(x=rank, y=mean, color=exam)) +
geom_ribbon(data = df1,
aes(x = rank, ymax = mean+sd, ymin = mean-sd, fill = exam),
alpha = .1) +
scale_colour_manual(values = c("#00b6eb","#eb0041")) +
scale_fill_manual(values = c("#00b6eb","#eb0041")) +
geom_line(data = df2,
aes(x=rank, y=mean)) +
geom_ribbon(data = df2,
aes(x = rank, ymax = mean+sd, ymin = mean-sd),
alpha = .1) +
facet_grid(class_2 ~ exam)
p_ctr
Using facet_wrap gives me the following error:
error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
You probably came across this plot while looking for the solution.
p_ctr + geom_line(data = filter(df, id == 'pat'),
aes(x=rank, y=mean)) +
geom_ribbon(data = filter(df, id == 'pat'),
aes(x = rank, ymax = mean+sd, ymin = mean-sd),
alpha = .1) +
# facet_wrap(~exam) +
facet_grid(class ~ exam)
This is basically your reference plot and its overlay, without the linetype and group arguments. Additionally I faceted by class ~ exam. From this plot you see that 'the problem' is that class contains three unique elements: a, b and ctr. That's why I renamed the variable class in df2 to be class_2 which has only two unique elements: a and b. Faceting by class_2 ~ exam then gives the desired output.
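The underlying rule is that a layer whose data lacks the faceting variable is repeated in every panel. A minimal sketch with made-up data (ref, obs and grp are hypothetical names, not part of the example above):

```r
library(ggplot2)

# Reference curve: has no `grp` column, so it is drawn in every panel
ref <- data.frame(x = 1:5, y = c(2, 1, 0, -1, -2))

# Observations: carry the faceting variable `grp`
obs <- data.frame(x   = rep(1:5, times = 2),
                  y   = c(1:5, 5:1) / 2,
                  grp = rep(c("a", "b"), each = 5))

p <- ggplot() +
  geom_line(data = ref, aes(x = x, y = y), colour = "grey50") +
  geom_line(data = obs, aes(x = x, y = y)) +
  facet_wrap(~ grp)

# Rows of the reference layer per panel after faceting
table(layer_data(p, 1)$PANEL)
```

Renaming class to class_2 in df2 has the same effect: df1 no longer contains the faceting variable, so the control curves appear in each row of the grid.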
I hope this helps.
