I want to create in R a graphic similar to the one below to show where a certain person or company ranks relative to its peers. The score will always be between 1 and 100.
Although I am amenable to any ggplot solution it seemed to me that the best way would be to use geom_rect and then to adapt and add the arrowhead described in baptiste's answer to this question. However, I came unstuck on something even simpler - getting the geom_rect to fill properly with a gradient like that shown in the guide to the right of the plot below. This should be easy. What am I doing wrong?
library(ggplot2)
library(scales)
mydf <- data.frame(id = rep(1, 100), sales = 1:100)
ggplot(mydf) +
geom_rect(aes(xmin = 1, xmax = 1.5, ymin = 0, ymax = 100, fill = sales)) +
scale_x_discrete(breaks = 0:2, labels = 0:2) +
scale_fill_gradient2(low = 'blue', mid = 'white', high = 'red', midpoint = 50) +
theme_minimal()
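I think geom_tile() will work better: use sales for both y and fill. Your geom_rect() call draws the same full-height rectangle once per row, so the 100 rectangles sit on top of each other and only the last fill is visible. With geom_tile() you get a separate tile for each sales value, so the gradient shows up.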
ggplot(mydf) +
geom_tile(aes(x = 1, y=sales, fill = sales)) +
scale_x_continuous(limits=c(0,2),breaks=1)+
scale_fill_gradient2(low = 'blue', mid = 'white', high = 'red', midpoint = 50) +
theme_minimal()
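To get closer to the ranking graphic in the question, a marker can be drawn next to the gradient bar. This is only a sketch: `score` is a hypothetical value I made up, and the arrow and label positions are guesses that will need tuning for your layout.

```r
library(ggplot2)

mydf <- data.frame(sales = 1:100)
score <- 72  # hypothetical score to highlight (not from the question)

p <- ggplot(mydf) +
  # one tile per sales value, so the fill gradient is visible
  geom_tile(aes(x = 1, y = sales, fill = sales)) +
  # arrow pointing at the bar at the height of the score
  annotate("segment", x = 1.9, xend = 1.55, y = score, yend = score,
           arrow = arrow(length = unit(0.3, "cm"))) +
  annotate("text", x = 1.95, y = score, label = score, hjust = 0) +
  scale_x_continuous(limits = c(0, 2.2), breaks = NULL) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 50) +
  theme_minimal()
p
```

The tile is centred on x = 1 and is one unit wide by default, which is why the arrow stops just right of 1.5.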
Related
I am trying to create a ggplot histogram with a density overlay, where the alpha changes past the number 1. An example can be seen on 538 under the Every outcome in our simulations section. The alpha differs based on the electoral vote count. I am close to getting a similar graph but I cannot figure out how to get the density and histogram to work together.
Code
library(data.table)
library(ggplot2)
library(magrittr)  # provides the %>% pipe used below
dt <- data.table(ratio = rnorm(10000, mean = .5, sd = 1))
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio, alpha = al)) +
geom_histogram(aes(), bins = 100,
fill = 'red') +
geom_density(aes(),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly changes alpha past 1 as desired but the density estimate is not scaled.
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density.., alpha = al), bins = 100,
fill = 'red') +
geom_density(aes(y = ..scaled..),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly scales the density curve, but now the geom_histogram is calculated separately for values under 1 and above 1. I want them calculated as one group.
What am I missing?
The reason knowing your theme matters is that there is an easy shortcut: skip alpha altogether and draw a semitransparent rectangle over the part of the plot to the left of 1:
library(data.table)
library(ggplot2)
library(dplyr)
data.table(ratio = rnorm(10000, mean = .5, sd = 1)) %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density..), bins = 100,
fill = 'red') +
geom_line(aes(), stat = "density", size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
annotate("rect", xmin = -Inf, xmax = 1, ymin = 0, ymax = Inf, fill = "white",
alpha = 0.5) +
theme_bw()
Splitting into two groups and using alpha is possible, but it basically requires you to precalculate the histogram and the density curve. That's fine, but it would be an awful lot of extra effort for very little visual gain.
Of course, if theme_josh has a custom background color and zany gridlines, this approach may not be quite so effective. As long as you set the fill color to match the panel background you should get a decent result (the default theme_gray() panel background is "grey92"; theme_bw(), used above, has a white panel).
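For completeness, here is a rough sketch of the precalculation route mentioned above, assuming base R's hist() and density() are acceptable stand-ins for what geom_histogram and geom_density compute. The data frame names (`bins`, `curve`) are my own, `linewidth` assumes ggplot2 3.4.0 or later (use `size` on older versions), and note that `breaks = 100` is only a suggestion to hist(), so the bin edges are not guaranteed to land exactly on 1.

```r
library(ggplot2)

set.seed(1)
ratio <- rnorm(10000, mean = 0.5, sd = 1)

# Histogram computed once over all the data, so both sides share one scale
h <- hist(ratio, breaks = 100, plot = FALSE)
bins <- data.frame(mid = h$mids,
                   density = h$density,
                   past_one = h$mids >= 1)  # flag used only for alpha

# Kernel density estimate on the same (density) scale
dens <- density(ratio)
curve <- data.frame(x = dens$x, y = dens$y)

p <- ggplot() +
  geom_col(data = bins, aes(x = mid, y = density, alpha = past_one),
           fill = "red", width = diff(h$breaks)[1]) +
  geom_line(data = curve, aes(x = x, y = y), colour = "blue", linewidth = 1.5) +
  geom_vline(xintercept = 1, colour = "#0080e2", linewidth = 1.2) +
  scale_alpha_manual(values = c(`FALSE` = 0.65, `TRUE` = 0.9), guide = "none")
p
```

Because the bins are computed before plotting, mapping alpha no longer splits the histogram into two separately normalised groups.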
I want to plot the data as separate bubbles, as in the image on the right (I made it in PowerPoint just to visualize the idea).
At the moment I can only create a plot like the one on the left, where the bubbles overlap. How can I do this in R?
b <- ggplot(df, aes(x = Year, y = Type))
b + geom_point(aes(color = Spp, size = value), alpha = 0.6) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(0.5, 12))
You can use position_dodge() as the position argument of geom_point. Applied directly to your code, it would dodge the points horizontally, so the idea is to swap your x and y variables and use coord_flip to get the right orientation:
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6, position = position_dodge(0.9)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(1, 15)) +
coord_flip()
Does this look like what you are trying to achieve?
EDIT: Adding text in the middle of each points
To label each point, you can use geom_text with the same position_dodge2 setting as for geom_point.
NB: I use position_dodge2 instead of position_dodge, with a slightly different width, because I found position_dodge2 better suited to this case.
library(ggplot2)
ggplot(df, aes(y = as.factor(Year), x = Type))+
geom_point(aes(color = Group, size = Value), alpha = 0.6,
position = position_dodge2(width = 1)) +
scale_color_manual(values = c("#0000FF", "#DAA520", "#228B22","#E7B888")) +
scale_size(range = c(3, 15)) +
coord_flip()+
geom_text(aes(label = Value, group = Group),
position = position_dodge2(width = 1))
Reproducible example
As you did not provide a reproducible example, I made one that is maybe not fully representative of your original dataset. If my answer is not working for you, you should consider providing a reproducible example (see here: How to make a great R reproducible example)
Group <- c(LETTERS[1:3],"A",LETTERS[1:2],LETTERS[1:3])
Year <- c(rep(1918,4),rep(2018,5))
Type <- c(rep("PP",3),"QQ","PP","PP","QQ","QQ","QQ")
Value <- sample(1:50,9)
df <- data.frame(Group, Year, Value, Type)
df$Type <- factor(df$Type, levels = c("PP","QQ"))
I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old, and I'm not sure whether anything new has been added to ggplot that would solve this issue.
The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful
EDIT:
The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can use a fillable plotting character (shapes 21 to 25), whose outline is set by colour and whose interior is set by the fill aesthetic:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
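If you do want to keep two separate layers, as in the original attempt, one option is to give both layers the same seeded position_jitter() object so that they receive identical offsets. A minimal sketch, assuming a ggplot2 version in which position_jitter() accepts a `seed` argument; the width and height values are arbitrary:

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# One shared, seeded jitter specification reused by both layers
jit <- position_jitter(width = 0.2, height = 0.2, seed = 1)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(position = jit, size = 3, alpha = 0.6, colour = "skyblue") +  # fill layer
  geom_point(position = jit, size = 3, shape = 1, colour = "black")        # outline layer
p
```

This keeps the outline fully opaque while the fill stays semitransparent, which the single-layer shape 21 approach cannot do because alpha applies to both.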
I'm creating a figure using ggplot. That figure has 27 lines that I want to show but not emphasize, and two lines, the mean and the weighted mean, that I want to emphasize. I would like only these last two lines to appear in the legend of the plot. Here is my code:
p_plot <- ggplot(data = dta, aes(x = date, y = premium, colour = State)) +
geom_line(, show_guide=FALSE) +
scale_color_manual(values=c(rep("gray60", 27)))
p_plot <- p_plot + geom_line(aes(y = premium.m), colour = "blue", size = 1.25,
show_guide=TRUE) + geom_line(aes(y = premium.m.w), colour = "red",
size = 1.25, show_guide=c(TRUE)) + ylab("Pe/pg")
p_plot
The show_guide = FALSE statement in the first geom_line seems to be overridden by the other show_guide=TRUE statements. How can I limit the number of entries in the legend of my figures to the lines "premium.m" and "premium.m.w"? Thank you.
I think this should answer your question: (the code's been slightly modified but the concept is the same)
dta <- data.frame(date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
premium = rnorm(12*26),
State = rep(letters, each = 12))
library(ggplot2)
p_plot <- ggplot(data = dta) +
geom_line(aes(x = date, y = premium, group = State), colour = "grey60")
p_plot <- p_plot + geom_line(aes(x = unique(date), y = as.numeric(tapply(premium, date, mean)), colour = "mean"),
size = 1.25) +
geom_line(aes(x = unique(date), y = as.numeric(tapply(premium, date, median)), colour = "median"),
size = 1.25) + ylab("Pe/pg") + scale_color_discrete("stats")
p_plot
However, this is just an (ugly) workaround and far from best practice for data visualisation (especially for the purposes ggplot was designed for). Anyway, I could provide you with a more elegant solution if you edited your question to add more details.
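One possibly tidier variant, sketched under assumptions: it uses stat_summary() to compute the per-date statistics, which avoids passing aesthetics whose length differs from the data (recent ggplot2 releases may reject that). It assumes a ggplot2 version where stat_summary() takes `fun` and lines take `linewidth`; the blue and red values are taken from the question, and median stands in for the weighted mean as in the code above.

```r
library(ggplot2)

set.seed(1)
dta <- data.frame(
  date = rep(seq.Date(as.Date("2010-01-01"), as.Date("2010-12-01"), "months"), 26),
  premium = rnorm(12 * 26),
  State = rep(letters, each = 12)
)

p_plot <- ggplot(dta, aes(x = date, y = premium)) +
  # background lines: colour is set outside aes(), so they get no legend entry
  geom_line(aes(group = State), colour = "grey60") +
  # summary lines: colour is mapped inside aes(), so only these appear in the legend
  stat_summary(aes(colour = "mean"), fun = mean, geom = "line", linewidth = 1.25) +
  stat_summary(aes(colour = "median"), fun = median, geom = "line", linewidth = 1.25) +
  scale_colour_manual("stats", values = c(mean = "blue", median = "red")) +
  ylab("Pe/pg")
p_plot
```

The key point is the same as in the workaround: a legend entry is created only for aesthetics mapped inside aes(), not for constants set as layer parameters.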
I'm making a straightforward barchart in R using the ggplot2 package. Rather than the grey default I'd like to divide the background into five regions, each a different (but similarly understated) colour. How do I do this?
More specifically, I'd like the five coloured regions to run from 0-25, 25-45, 45-65, 65-85 and 85-100 where the colours represent worse-than-bronze, bronze, silver, gold and platinum respectively. Suggestions for a colour scheme very welcome too.
Here's an example to get you started:
#Fake data
dat <- data.frame(x = 1:100, y = cumsum(rnorm(100)))
#Breaks for background rectangles
rects <- data.frame(xstart = seq(0,80,20), xend = seq(20,100,20), col = letters[1:5])
#As Baptiste points out, the order of the geom's matters, so putting your data as last will
#make sure that it is plotted "on top" of the background rectangles. Updated code, but
#did not update the JPEG...I think you'll get the point.
ggplot() +
geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, ymax = Inf, fill = col), alpha = 0.4) +
geom_line(data = dat, aes(x,y))
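Adapting the same idea to the unequal bands asked for in the question (0-25, 25-45, 45-65, 65-85, 85-100) could look like the sketch below. The tier labels follow the question, but the colour values are only my own suggestion for an understated palette.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = 1:100, y = cumsum(rnorm(100)))

tiers <- c("below bronze", "bronze", "silver", "gold", "platinum")
rects <- data.frame(
  xstart = c(0, 25, 45, 65, 85),
  xend   = c(25, 45, 65, 85, 100),
  tier   = factor(tiers, levels = tiers)  # keep the legend in band order
)

p <- ggplot() +
  # background bands first, so the data is drawn on top of them
  geom_rect(data = rects,
            aes(xmin = xstart, xmax = xend, ymin = -Inf, ymax = Inf, fill = tier),
            alpha = 0.4) +
  geom_line(data = dat, aes(x, y)) +
  scale_fill_manual(values = c("below bronze" = "grey80",  # hypothetical palette
                               "bronze"       = "#CD7F32",
                               "silver"       = "#C0C0C0",
                               "gold"         = "#FFD700",
                               "platinum"     = "#B0C4DE"))
p
```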
I wanted to move the line, or the bars of the histogram, to the foreground, as suggested by baptiste above, and fix the background with + theme(panel.background = element_rect(), panel.grid.major = element_line(colour = "white")). Unfortunately I could only do it by adding the geom_bar twice; hopefully someone can improve the code and make the answer complete.
background <- data.frame(lower = seq( 0 , 3 , 1.5 ),
upper = seq( 1.5, 4.5, 1.5 ),
col = letters[1:3])
ggplot() +
geom_bar( data = mtcars , aes( factor(cyl) ) ) +
geom_rect( data = background ,
mapping = aes( xmin = lower ,
xmax = upper ,
ymin = 0 ,
ymax = 14 ,
fill = col ) ,
alpha = .5 ) +
geom_bar(data = mtcars,
aes(factor(cyl))) +
theme(panel.background = element_rect(),
panel.grid.major = element_line( colour = "white"))
Produces this,
Take a look at this site for colour scheme suggestions.
Since you are after vertical (or horizontal) area highlighting, geom_rect() might be overkill. Consider geom_ribbon() instead:
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
geom_ribbon(aes(xmin=3, xmax=4.2), alpha=0.25) +
theme_minimal()
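If the highlighted band should span the full height of the panel regardless of the data range, another option is annotate("rect") with infinite y limits. A small sketch, reusing the 3 to 4.2 range from the ribbon example:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # a single band from wt = 3 to wt = 4.2, stretched to the panel edges
  annotate("rect", xmin = 3, xmax = 4.2, ymin = -Inf, ymax = Inf, alpha = 0.25) +
  geom_point() +
  theme_minimal()
p
```

Since annotate() takes fixed values rather than a data frame, it suits one or two bands; for several bands with a legend, a geom_rect() layer with its own data frame is easier to manage.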