Difference plot - r

I don't know the name of this type of plot (comments on this are welcome). Essentially it is a bar plot with glyphs that are filled to indicate a loss or gain. The glyph is arrow-like: it encodes direction and magnitude while still allowing the bar geom underneath to be seen.
This looks interesting, but I can't think of how to do it in ggplot2 (grid framework). How could we recreate this plot in the ggplot2/grid framework (base solutions are welcome as well, for completeness)? I am asking specifically about the glyphs, not the text, as the text is already straightforward in ggplot2.
Here is some code to create the data, along with traditional overlaid bar plots, coordinate-flipped dodged bar plots, and line graphs, to show typical ways of visualizing this type of data.
set.seed(10)
x <- sample(30:60, 12)
y <- jitter(x, 60)
library(ggplot2)
dat <- data.frame(
year = rep(2012:2013, each=12),
month = rep(month.abb, 2),
profit = c(x, y)
)
ggplot() +
geom_bar(data=subset(dat, year==2012), aes(x=month, weight=profit)) +
geom_bar(data=subset(dat, year==2013), aes(x=month, weight=profit), width=.5, fill="red")
ggplot(dat, aes(x=month, fill=factor(year))) +
geom_bar(position="dodge", aes(weight=profit)) +
coord_flip()
ggplot(dat, aes(x=month, y=profit, group = year, color=factor(year))) +
geom_line(size=1)

Here is one example, though there may be other ways:
dat <- data.frame(
year = rep(2012:2013, each=12),
month = factor(rep(1:12, 2), labels=month.abb),
profit = c(x, y)
)
dat2 <- reshape2::dcast(dat, month~ year, value.var = "profit")
names(dat2)[2:3] <- paste0("Y", names(dat2)[2:3])
ggplot(dat2) +
geom_bar(aes(x=month, y = Y2012), stat = "identity", fill = "grey80", width = 0.6) +
geom_segment(aes(x=as.numeric(month)-0.4, xend = as.numeric(month)+0.4, y = Y2013, yend = Y2013)) +
geom_segment(aes(x = month, xend = month, y = Y2013, yend = Y2012, colour = Y2013 < Y2012),
arrow = arrow(60, type = "closed", length = unit(0.1, "inches")), size = 1.5) +
theme_bw()
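As a minimal sketch of the same idea without reshape2, the wide table can be built directly and the gain/loss direction precomputed, so the arrow colour maps to a named category rather than a logical value. The column names `change` and `direction` and the two colours are illustrative assumptions, not part of the answer above. Here each segment runs from the 2012 value to the 2013 value, so the arrowhead points at the newer figure.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(10)
x <- sample(30:60, 12)
y <- jitter(x, 60)

# Wide layout: one row per month, one column per year
wide <- data.frame(
  month = factor(1:12, labels = month.abb),
  Y2012 = x,
  Y2013 = y
)

# Signed difference and a readable gain/loss label (hypothetical column names)
wide$change <- wide$Y2013 - wide$Y2012
wide$direction <- ifelse(wide$change < 0, "loss", "gain")

p <- ggplot(wide) +
  # Grey bars show the 2012 baseline
  geom_col(aes(x = month, y = Y2012), fill = "grey80", width = 0.6) +
  # Arrows run from the 2012 value to the 2013 value
  geom_segment(
    aes(x = month, xend = month, y = Y2012, yend = Y2013, colour = direction),
    arrow = grid::arrow(angle = 30, type = "closed",
                        length = grid::unit(0.1, "inches")),
    size = 1
  ) +
  scale_colour_manual(values = c(gain = "steelblue", loss = "firebrick")) +
  theme_bw()

print(p)
```

Mapping colour to a labelled column keeps the legend readable ("gain"/"loss" instead of TRUE/FALSE) and makes the colour assignment explicit.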

Related

How to smooth out a time-series geom_area with fill in ggplot?

I have the following graph and code:
ggplot(long2, aes(x = DATA, y = value, fill = variable)) + geom_area(position="fill", alpha=0.75) +
scale_y_continuous(labels = scales::comma,n.breaks = 5,breaks = waiver()) +
scale_fill_viridis_d() +
scale_x_date(date_labels = "%b/%Y",date_breaks = "6 months") +
ggtitle("Proporcions de les visites, només 9T i 9C") +
xlab("Data") + ylab("% visites") +
theme_minimal() + theme(legend.position="bottom") + guides(fill=guide_legend(title=NULL)) +
annotate("rect", fill = "white", alpha = 0.3,
xmin = as.Date.character("2020-03-16"), xmax = as.Date.character("2020-06-22"),
ymin = 0, ymax = 1)
But the areas show a sawtooth pattern. How can I smooth it out?
I believe your situation is roughly analogous to the following, wherein we have missing x-positions for one group, but not the other at the same position. This causes spikes if you set position = "fill".
library(ggplot2)
x <- seq_len(100)
df <- data.frame(
x = c(x[-c(25, 75)], x[-50]),
y = c(cos(x[-c(25, 75)]), sin(x[-50])) + 5,
group = rep(c("A", "B"), c(98, 99))
)
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "fill")
To smooth out these spikes, it has been suggested to linearly interpolate the data at the missing positions.
# Find all used x-positions
ux <- unique(df$x)
# Split data by group, interpolate data groupwise
df <- lapply(split(df, df$group), function(xy) {
approxed <- approx(xy$x, xy$y, xout = ux)
data.frame(x = ux, y = approxed$y, group = xy$group[1])
})
# Recombine data
df <- do.call(rbind, df)
# Now without spikes :)
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "fill")
Created on 2022-06-17 by the reprex package (v2.0.1)
P.S. I would also have expected a red spike at x=50, but for some reason this didn't happen.
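The same interpolation can be sketched for date-based data shaped like the question's. The data frame below is a hypothetical stand-in for `long2` (the real data was not posted); only the column names `DATA`, `value`, and `variable` are taken from the question. Dates are converted to numbers for `approx()` and the full set of dates is used as the output grid.

```r
library(ggplot2)

# Hypothetical stand-in for long2: each group is missing one date
dates <- seq(as.Date("2020-01-01"), by = "week", length.out = 20)
long2 <- data.frame(
  DATA = c(dates[-5], dates[-12]),
  value = c(seq(10, 28, length.out = 19), seq(5, 23, length.out = 19)),
  variable = rep(c("9T", "9C"), each = 19)
)

# Every date used by any group
all_dates <- sort(unique(long2$DATA))

# Interpolate each group onto the full set of dates
filled <- do.call(rbind, lapply(split(long2, long2$variable), function(d) {
  interp <- approx(as.numeric(d$DATA), d$value, xout = as.numeric(all_dates))
  data.frame(DATA = all_dates, value = interp$y, variable = d$variable[1])
}))

p <- ggplot(filled, aes(x = DATA, y = value, fill = variable)) +
  geom_area(position = "fill", alpha = 0.75) +
  scale_x_date(date_labels = "%b/%Y")

print(p)
```

Note that `approx()` returns `NA` for dates outside a group's own range, so groups that start later or end earlier than the others would still need separate handling.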

Raster-like timeseries graph in ggplot2

I'm trying to recreate a graph like the one here using ggplot2.
I can get pretty close if I mess around with the size and shape of points using coord_equal, but...
Example data and code
library(ggplot2)
library(dplyr)  # provides bind_rows(), used in the loop below
df <- data.frame()
Years <- 1990:2020
for(i in 1:length(Years)) {
Year <- Years[i]
week <-1:52
value <- sort(round(rnorm(52, 50, 30), 0))
df.small <- data.frame(Year = Year, week = week, value = value)
df <- bind_rows(df, df.small)
}
ggplot(df, aes(week, Year, color = value)) +
geom_point(shape = 15, size = 2.7) +
scale_color_gradientn(colours = rainbow(10)) +
coord_equal()
The problem is that with my real data I want to "stretch" the graph so I can see it more clearly (my time series is shorter), and when I don't use coord_equal, the squares don't fill the graphing area:
ggplot(df, aes(week, Year, color = value)) +
geom_point(shape = 15, size = 2.7) +
scale_color_gradientn(colours = rainbow(10))
Is this as simple as using the geom_raster geom?
ggplot(df, aes(week, Year)) +
geom_raster(aes(fill = value)) +
scale_fill_gradientn(colours = rainbow(10)) +
coord_equal()
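Since the question asks for a stretched plot, a variant without `coord_equal()` may be closer to what is wanted: tiles then fill the panel at whatever aspect ratio the device has. This is a sketch under assumed data (a shorter, simulated series built with `expand.grid()`), using `geom_tile()`; `geom_raster()` should behave the same way for a regular grid like this one.

```r
library(ggplot2)

# Simulated stand-in for a shorter time series: 11 years x 52 weeks
set.seed(1)
df <- expand.grid(week = 1:52, Year = 2010:2020)
df$value <- round(rnorm(nrow(df), 50, 30))

p <- ggplot(df, aes(week, Year, fill = value)) +
  # Each tile spans one week by one year and scales with the panel
  geom_tile() +
  scale_fill_gradientn(colours = rainbow(10)) +
  # One axis label per year
  scale_y_continuous(breaks = unique(df$Year))

print(p)
```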

R - geom_bar - 'stack' position without summing the values

I have this data frame
df <- data.frame(profile = rep(c(1,2), times = 1, each = 3), depth = c(100, 200, 300), value = 1:3)
This is my plot
ggplot() +
geom_bar(data = df, aes(x = profile, y = - depth, fill = value), stat = "identity")
My problem is that the y-axis labels don't correspond to the depth values in the data frame.
To help, my desired plot looks like this:
ggplot() +
geom_point(data = df, aes(x = profile, y = depth, colour = value), size = 20) +
xlim(c(0,3))
But with bars instead of points, vertically aligned.
N.B.: I don't want to correct it manually by changing the ticks with scale_y_discrete(labels = (desired_labels)).
Thanks for your help.
Considering you want a y-axis from 0 to -300, facet_grid() seems to be a suitable option, since it keeps the values in separate panels instead of stacking (summing) them.
ggplot() + geom_bar(data = df, aes(x = as.factor(profile), y = -depth, fill = value), stat = 'identity') + facet_grid(~ value)
I have it!
Thanks for your replies, and to this post: R, subtract value from previous row, group by
To summarize, the data:
df <- data.frame(profile = rep(c(1,2), times = 1, each = 3), depth = c(100, 200, 300), value = 1:3)
Then we compute the depth step within each profile, so that stacking the steps reproduces the original depths:
df$diff <- ave(df$depth, df$profile, FUN=function(z) c(z[1], diff(z)))
And finally the plot:
ggplot(df, aes(x = factor(profile), y = -diff, fill = value)) + geom_col()
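To see why this works, here is the arithmetic on the example data in base R. The extra column `stacked` is a hypothetical name added only for this check: it accumulates the steps within each profile, which is what a stacked bar does visually.

```r
df <- data.frame(profile = rep(c(1, 2), each = 3),
                 depth = c(100, 200, 300),
                 value = 1:3)

# Step from the previous depth within each profile (first step starts at 0)
df$diff <- ave(df$depth, df$profile, FUN = function(z) c(z[1], diff(z)))

# Accumulating the steps per profile recovers the original depths
df$stacked <- ave(df$diff, df$profile, FUN = cumsum)

print(df)
# Every step is 100 here, because depths are 100, 200, 300 in each profile
stopifnot(all(df$diff == 100))
stopifnot(all(df$stacked == df$depth))
```

Plotting `depth` directly would stack 100 + 200 + 300 = 600 per profile, which is why the axis in the original plot did not match the data.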

Directlabels package-- labels do not fit in plot area

I want to explore the directlabels package with ggplot. I am trying to plot labels at the endpoint of a simple line chart; however, the labels are clipped by the plot panel. (I intend to plot about 10 financial time series in one plot and I thought directlabels would be the best solution.)
I would imagine there may be another solution using annotate or some other geoms. But I would like to solve the problem using directlabels. Please see code and image below. Thanks.
library(ggplot2)
library(directlabels)
library(tidyr)
#generate data frame with random data, for illustration and plot:
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
tidy_data <- gather(data, month)
names(tidy_data) <- c("month", "asset", "value")
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw()
On data visualization principles, I would like to avoid extending the x-axis to make the labels fit--this would mean having data space with no data. Rather, I would like the labels to extend toward the white space beyond the chart box/panel (if that makes sense).
In my opinion, directlabels is the way to go. Indeed, I would position labels at both the beginning and the end of the lines, creating space for them with the expand argument of scale_x_continuous(). Also note that with the labels, there is no need for the legend.
This is similar to answers here and here.
library(ggplot2)
library(directlabels)
library(grid)
library(tidyr)
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
tidy_data <- gather(data, month)
names(tidy_data) <- c("month", "asset", "value")
ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_continuous(expand = c(0.15, 0)) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x - .3), "first.bumpup")) +
theme_bw()
If you prefer to push the labels into the plot margin, direct labels will do that. But because the labels are positioned outside the plot panel, clipping needs to be turned off.
p1 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_continuous(expand = c(0, 0)) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
theme_bw() +
theme(plot.margin = unit(c(1,4,1,1), "lines"))
# Code to turn off clipping
gt1 <- ggplotGrob(p1)
gt1$layout$clip[gt1$layout$name == "panel"] <- "off"
grid.draw(gt1)
This effect can also be achieved using geom_text (and probably also annotate), that is, without the need for direct labels.
p2 = ggplot(tidy_data, aes(x = month, y = value, group = asset, colour = asset)) +
geom_line() +
geom_text(data = subset(tidy_data, month == 100),
aes(label = asset, colour = asset, x = Inf, y = value), hjust = -.2) +
scale_x_continuous(expand = c(0, 0)) +
scale_colour_discrete(guide = 'none') +
theme_bw() +
theme(plot.margin = unit(c(1,3,1,1), "lines"))
# Code to turn off clipping
gt2 <- ggplotGrob(p2)
gt2$layout$clip[gt2$layout$name == "panel"] <- "off"
grid.draw(gt2)
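In recent ggplot2 versions, clipping can also be switched off inside the plot specification with `coord_cartesian(clip = "off")`, which avoids editing the gtable by hand. The following is a sketch rather than a drop-in replacement: the data is regenerated with an explicit long layout, and the object names `last_pts` and `p3` are assumptions made for this example.

```r
library(ggplot2)

# Simulated data in long format: one row per month and asset
set.seed(42)
tidy_data <- data.frame(
  month = rep(1:100, 2),
  asset = rep(c("stocks", "bonds"), each = 100),
  value = c(cumsum(rnorm(100, mean = 6, sd = 15)),
            cumsum(rnorm(100, mean = 2, sd = 4)))
)

# Last observation of each series, used to place the labels
last_pts <- subset(tidy_data, month == max(month))

p3 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
  geom_line() +
  # hjust < 0 pushes the label to the right of the last point
  geom_text(data = last_pts, aes(label = asset), hjust = -0.1) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_colour_discrete(guide = "none") +
  # Allow drawing outside the panel so the labels are not cut off
  coord_cartesian(clip = "off") +
  theme_bw() +
  # Extra right margin to hold the labels
  theme(plot.margin = margin(1, 4, 1, 1, unit = "lines"))

print(p3)
```

Because the result is still an ordinary ggplot object, it can be saved with `ggsave()` or extended with further layers, which is less convenient once the plot has been converted with `ggplotGrob()`.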
Since you didn't provide a reproducible example, it's hard to say what the best solution is. However, I would suggest manually adjusting the x-scale, using a "buffer" to increase the plot area.
# minimum_value, maximum_value, and buffer are placeholders for your own x-range values
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw() +
xlim(minimum_value, maximum_value + buffer)
Using scale_x_discrete() or scale_x_continuous() would likely also work well here if you want to use the direct labels package. Alternatively, annotate or a simple geom_text would also work well.
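One way to pick the buffer is as a fraction of the x-range. The sketch below assumes months 1 to 100 as in the question and an arbitrary 10% buffer; `x_limits` and `buffer` are illustrative names, and the labels themselves are omitted so the example needs only ggplot2.

```r
library(ggplot2)

set.seed(7)
month <- 1:100
value <- cumsum(rnorm(100, mean = 2, sd = 4))

# Buffer of 10% of the x-range, added on the right-hand side only
buffer <- 0.1 * diff(range(month))
x_limits <- c(min(month), max(month) + buffer)
print(x_limits)

p <- ggplot(data.frame(month, value), aes(month, value)) +
  geom_line() +
  # Widen the x-axis so end-of-line labels have room inside the panel
  scale_x_continuous(limits = x_limits) +
  theme_bw()

print(p)
```

This does extend the data region with empty space, which is the trade-off the question wanted to avoid; it is simply the least invasive option when clipping cannot be changed.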

Plot with multiple breaks of different sizes

I would like to create a plot with multiple breaks of different-sized intervals on the y axis. The closest post I could find is "Show customised X-axis ticks in ggplot2", but it doesn't fully solve my problem.
# dummy data
require(ggplot2)
require(reshape2)
a<-rnorm(mean=15,sd=1.5, n=100)
b<-rnorm(mean=1500,sd=150, n=100)
df<-data.frame(a=a,b=b)
df$x <- factor(seq(100), ordered = T)
df.m <- melt(df)
ggplot(data = df.m, aes(x = x, y=value, colour=variable, group=variable)) +
geom_line() + scale_y_continuous(breaks = c(seq(from = 0, to = 20, by = 1),
seq(from = 1100, to = max(df.m$value), by = 100))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The problem is how to get the first set of breaks to be proportional to the second (thus visible).
Any pointer would be very much appreciated, thanks!
You can try something like this:
# Rearrange the factors in the data.frame
df.m$variable <- factor(df.m$variable, levels = c("b", "a"))
ggplot(data = df.m, aes(x = x, y=value, colour=variable, group=variable)) +
geom_line() + facet_grid(variable~., scales = "free")
Hope this helps

Resources