This question has two parts, one more general and the other a specific case:
Is there a theme or template in R for producing plots that look like the charts published in "The Economist" magazine? Equivalents in other contexts include: Create "The Economist" style graphs from python for Python, and set scheme economist for Stata.
Specifically, what would be the syntax (e.g., in ggplot2) to produce a grouped plot that looks like the example below: coloured, shaped markers with bold lines spanning the range between them (left panel), or rectangular confidence intervals (right panel)?
Source: https://www.economist.com/graphic-detail/2020/04/01/covid-19-may-be-far-more-prevalent-than-previously-thought
Yes: the ggthemes package (an extension of ggplot2) provides theme_economist and theme_economist_white.
For the bar plot, you will need to combine geom_bar and coord_flip.
Examples from the ggthemes documentation:
library("ggplot2")
library("ggthemes")
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(gear))) +
  facet_wrap(~am) +
  # The Economist puts y-axis labels on the right-hand side
  scale_y_continuous(position = "right")
## Standard
p + theme_economist() +
  scale_colour_economist()
## White
p + theme_economist_white() +
  scale_colour_economist()
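As a minimal sketch of the geom_bar + coord_flip suggestion above: the mtcars summary, the object names and the fill colour are my own illustrative choices, not something taken from the ggthemes documentation.

```r
library(ggplot2)
library(ggthemes)

# Illustrative data (hypothetical example): mean mpg per number of cylinders
bar_df <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
bar_df$cyl <- factor(bar_df$cyl)

p_bar <- ggplot(bar_df, aes(x = cyl, y = mpg)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # bar lengths come straight from the data
  coord_flip() +                                     # turn the bars horizontal
  labs(x = "Cylinders", y = "Mean mpg") +
  theme_economist()

p_bar
```

Swapping theme_economist() for theme_economist_white() gives the white-background variant.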
How to reproduce the plot given in example
Since I cannot install the SciencesPo package on my computer, I suggest a ggplot2 + ggthemes approach.
The following is a reasonable starting point. I use the diamonds dataset as an example.
library(dplyr)
library(ggplot2)
library(ggthemes)
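# mean, standard deviation and group size of price for each cut
df <- diamonds %>%
  group_by(cut) %>%
  summarise(mean = mean(price), sigma = sd(price),
            n = n())
# 95% confidence interval for the mean (normal approximation)
df <- df %>%
  mutate(int_minus = mean - 1.96 * sigma / sqrt(n),
         int_plus = mean + 1.96 * sigma / sqrt(n))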
And then the plot:
ggplot(df) +
  geom_segment(aes(x = int_minus, xend = int_plus, y = factor(cut), yend = factor(cut)),
               size = 2L, alpha = 0.4) +
  geom_point(aes(x = mean, y = factor(cut)), shape = 15, color = "blue", size = 4L) +
  theme_economist_white()
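For the "rectangular confidence interval" look from the question, one possible sketch is to draw the interval as a thick segment with square ends, so that it reads as a rectangle. This is my own assumption about how to approximate that style, not the magazine's method. The object names ci_df and p_ci, the colour and the widths are illustrative, and the linewidth argument assumes ggplot2 3.4.0 or later (older versions use size).

```r
library(dplyr)
library(ggplot2)
library(ggthemes)

# Mean price per cut with a 95% interval (normal approximation)
ci_df <- diamonds %>%
  group_by(cut) %>%
  summarise(mean = mean(price), sigma = sd(price), n = n()) %>%
  mutate(int_minus = mean - 1.96 * sigma / sqrt(n),
         int_plus  = mean + 1.96 * sigma / sqrt(n))

p_ci <- ggplot(ci_df, aes(y = cut)) +
  # a thick segment with "butt" (square) ends looks like a rectangle
  geom_segment(aes(x = int_minus, xend = int_plus, yend = cut),
               linewidth = 6, lineend = "butt",
               colour = "steelblue", alpha = 0.4) +
  # square marker at the mean
  geom_point(aes(x = mean), shape = 15, size = 3, colour = "steelblue") +
  labs(x = "Mean price (95% CI)", y = NULL) +
  theme_economist_white()

p_ci
```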
Related
Hi, I am trying to code a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making a mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
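library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)
library(forcats)  # as_factor()

# mean ASB and mean YOI for each Race group
g_Antisocial <- Antisocial %>%
  group_by(Race) %>%
  summarise(ASB = mean(ASB),
            YOI = mean(YOI))
# raw points (faint) with the group means overlaid as larger points
Antisocial %>%
  ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
  geom_point(alpha = .4) +
  geom_point(data = g_Antisocial, size = 4) +
  theme_bw() +
  guides(color = guide_legend("Race"), shape = guide_legend("Race"))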
and this is the output:
@Maninder: there are a few things you need to look at.
First of all, the grammar of graphics in ggplot() works with layers. You can add layers with different data frames for the different geoms you want to plot.
Your code does not work because it mixes up the layer calls and never clearly specifies which data belong to the scatter layer and which to the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting the Antisocial data set
ggplot() +
  geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I will skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend keeping the data = and aes() calls inside each geom_xxxx. This avoids confusion.
You may want to experiment with geom_jitter() instead of geom_point() to get a better presentation of your dataset. The "few" points plotted are the result of many data points sharing the same position (and being overplotted).
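To illustrate the geom_jitter() suggestion, here is a minimal sketch. The data frame sim is simulated and purely hypothetical (the original CSV is not reproduced here), and the jitter amounts are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the Antisocial data (hypothetical values)
sim <- data.frame(
  YOI  = sample(c(90, 92, 94), 300, replace = TRUE),
  Race = sample(c(0, 1), 300, replace = TRUE),
  ASB  = sample(0:6, 300, replace = TRUE)
)

p_jit <- ggplot() +
  # small random offsets separate points that share the same (YOI, ASB) position
  geom_jitter(data = sim,
              aes(x = YOI, y = ASB, colour = as.factor(Race)),
              width = 0.3, height = 0.2, alpha = 0.5) +
  labs(colour = "Race")

p_jit
```

With geom_point() the same data would show at most 21 distinct positions (3 YOI values times 7 ASB values), which is the overplotting described above.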
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean by that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplots use fill for the body (colour sets only the outline). In this setup you also need to supply a group value to geom_line() (I simply set it to 1, which is arbitrary; in other contexts you can map another variable here).
ggplot() +
  geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
  geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1),
            size = 2, linetype = "dashed") +
  labs(x = "YOI", fill = "Race")
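If "mean ASB vs YOI grouped by Race" is instead meant literally, another reading is one mean per YOI and Race combination, drawn as one line per Race. The following is only a sketch under that assumption: the data frame sim is simulated, Table_2 and p_means are hypothetical names, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Simulated stand-in for the Antisocial data (hypothetical values)
sim <- data.frame(
  YOI  = sample(c(90, 92, 94), 300, replace = TRUE),
  Race = sample(c(0, 1), 300, replace = TRUE),
  ASB  = sample(0:6, 300, replace = TRUE)
)

# One mean per YOI x Race combination
Table_2 <- sim %>%
  group_by(YOI, Race) %>%
  summarise(ASB_mean = mean(ASB), .groups = "drop")

p_means <- ggplot(Table_2,
                  aes(x = YOI, y = ASB_mean,
                      colour = as.factor(Race), group = Race)) +
  geom_line() +            # one line per Race
  geom_point(size = 2) +   # marker at each group mean
  labs(colour = "Race")

p_means
```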
Hope this gets you going!
I'm not able to correctly draw a colour aesthetic line in plotly, using a ggplot object. What am I missing?
library(ggplot2)
library(plotly)
df <- data.frame(val = as.numeric(LakeHuron), idx = 1:length(LakeHuron))
p <- ggplot(df, aes(x = idx, y = val, colour = val)) + geom_line()
p <- p + scale_color_gradient2(low="red", mid = "gold", high="green", midpoint = mean(df$val))
p
p2 <- ggplotly(p)
p2
p prints the correct expected output.
When I print the plotly object p2, the points of the line are not joined correctly.
I think the problem appears when I add the colour aesthetic.
Versions:
plotly 4.9, ggplot2 3.1.1
This is due to a difference in how plotly works compared with ggplot2. There is an open issue (updated August 2018) suggesting that this is not possible within the structure ggplot2 uses: a single series in plotly cannot currently have varying colour. ("We don't allow per-segment coloring on line traces")
But fear not! We could construct the plot a little differently using geom_segment to specify each part of the line as a separate segment. This structure is a separate object for each segment and will convert over to plotly fine:
library(dplyr)  # for lead()

df <- data.frame(val = as.numeric(LakeHuron), idx = 1:length(LakeHuron))
# each row becomes a segment from the current point to the next one
p_seg <- ggplot(df, aes(x = idx, y = val,
                        xend = lead(idx), yend = lead(val),
                        colour = val)) +
  geom_segment()
p_seg <- p_seg + scale_color_gradient2(low="red", mid = "gold", high="green", midpoint = mean(df$val))
p_seg
p2 <- ggplotly(p_seg)
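Note that lead() returns NA for the last row, so geom_segment() drops that row with a warning. A variant that builds the segment table explicitly avoids this. This is a sketch with hypothetical names (seg, p_seg2), using only base R and ggplot2:

```r
library(ggplot2)

df <- data.frame(val = as.numeric(LakeHuron), idx = seq_along(LakeHuron))

# One row per segment, from point i to point i + 1: n - 1 rows and no NA
seg <- data.frame(
  x    = head(df$idx, -1),
  y    = head(df$val, -1),
  xend = tail(df$idx, -1),
  yend = tail(df$val, -1)
)

p_seg2 <- ggplot(seg, aes(x = x, y = y, xend = xend, yend = yend, colour = y)) +
  geom_segment() +
  scale_color_gradient2(low = "red", mid = "gold", high = "green",
                        midpoint = mean(df$val))

p_seg2
```

The resulting object can be passed to ggplotly() in the same way as p_seg.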
I've made a violin plot that looks like this:
As we can see most of the data lies near the region where the score is 0.90-0.95. What I wish is to focus on the interval 0.75 to 1.00 by changing the scale giving less space to ratings from 0 to 0.75.
Is there a way to do this?
This is the code I'm currently using to create the violin plot:
ggplot(data=Violin_plots, aes(x = Year, y = Score)) +
geom_violin(aes(fill = Violin_plots$Year), trim = TRUE) +
coord_flip()+
scale_fill_brewer(palette = "Blues") +
theme(legend.position = 'none') +
labs(y = "Rating score",
fill = "Rating year",
title = "Violin-plots of credit rating scores")
While it's possible to transform the scale to focus more on the upper region (e.g. add trans = "exp" as an argument to the scale), a non-linear scale is often hard to interpret appropriately.
For such use cases, I recommend facet_zoom from the ggforce package, which is pretty much built for this exact purpose (see the package vignette).
I also switched from geom_violin() + coord_flip() to geom_violinh from the ggstance package, which extends ggplot2 by providing flipped versions of ggplot components. Example with simulated data below:
library(ggforce) # for facet_zoom
library(ggstance) # for flipped version of geom_violin
ggplot(df,
       aes(x = rating, y = year, fill = year)) +
  geom_violinh() + # no need to specify trim = TRUE as it's the default
  scale_fill_brewer(palette = "Blues") +
  theme(legend.position = 'none') +
  facet_zoom(xlim = c(0.75, 0.98)) # specify zoom range here
Sample data that simulates the characteristics of the data in the question:
df <- diamonds[, c("color", "price")]
df$rating <- (max(df$price) - df$price) / max(df$price)
df$year <- df$color
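You could create a second plot that zooms in on the original plot, without modifying the data, by setting limits on the coordinate system, as ggplot2::coord_cartesian() does. A plot can only have one coordinate system, so adding coord_cartesian() after coord_flip() would replace the flip; the limits are therefore passed to coord_flip() instead:
ggplot(data = Violin_plots, aes(x = Year, y = Score * 100)) +
  geom_violin(aes(fill = Year), trim = TRUE) +
  # y is Score * 100, so the 0.75-1.00 range becomes 75-100
  coord_flip(ylim = c(75, 100)) +
  scale_fill_brewer(palette = "Blues") +
  theme(legend.position = 'none') +
  labs(y = "Rating score", fill = "Rating year", title = "Violin-plots of credit rating scores")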
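The "without modifying the data" part matters because limits set on a scale drop out-of-range values before the density is estimated, whereas limits set on the coordinate system only zoom. The sketch below contrasts the two. The data frame sim_v is simulated and hypothetical, as are the object names.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for Violin_plots (hypothetical values, most mass near 1)
sim_v <- data.frame(
  Year  = factor(rep(2015:2017, each = 200)),
  Score = rbeta(600, 9, 1)
)

p_base <- ggplot(sim_v, aes(x = Year, y = Score, fill = Year)) +
  geom_violin() +
  theme(legend.position = "none")

# Zoom only: densities are still estimated from all rows
p_zoom <- p_base + coord_flip(ylim = c(0.75, 1))

# Scale limits: rows with Score < 0.75 are removed before the density is estimated
p_drop <- p_base + scale_y_continuous(limits = c(0.75, 1)) + coord_flip()

# Number of rows the scale-limit version discards
n_dropped <- sum(sim_v$Score < 0.75)
n_dropped
```

Printing p_drop emits a warning about removed rows, while p_zoom does not, and the violin shapes near 0.75 differ between the two.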
My question relates to plots in ggplot. Running the code below each image should work if you load the "diamonds" dataset that comes with ggplot2.
I am trying to generate a graph like this:
library(ggplot2)
#First plot
p1 <- ggplot(diamonds, aes(color)) + geom_bar(aes(group = cut, y = ..density..))
p1 <- p1 + facet_wrap(~cut)
p1
but I want to color each bar in each facet by factor, like in this plot:
#Second plot
p2 <- ggplot(diamonds, aes(color)) + geom_bar(aes( y = ..density.., fill = color))
p2 <- p2 + facet_wrap(~cut)
p2
The problem is that "group =" and "fill=" appear to interfere with each other when I attempt to call them both; ggplot seems to ignore the "fill" command when "group" is also called.
The call to group is important because it forces the y-axis to scale for each facet, so that densities within each facet add up to 1. However, I'd like to be able to visually distinguish between groups easily using fill colors.
How can I work around this?
The problem is with ..density... It often is a convenient shortcut, but in a more complicated situation like this one it's often easier just to calculate on your own:
library(dplyr)
diam2 <- diamonds %>% group_by(cut) %>%
  mutate(ncut = n()) %>%
  group_by(cut, color) %>%
  summarize(den = n() / first(ncut))
ggplot(diam2, aes(x = color, fill = color, y = den)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ cut)
I should add, comparing my plot with your p1, the shapes are the same but the scale looks a little different (mine being a little lower overall). I'm not sure why.
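As a cross-check on the hand-computed proportions, the same quantity can be obtained in base R with table() and prop.table(), which makes it easy to confirm that the values sum to 1 within each cut. This is a sketch; diam3 and p_den are hypothetical names.

```r
library(ggplot2)  # for the diamonds data and the plot

# Counts of color within cut, then row-wise proportions (each cut sums to 1)
tab      <- table(cut = diamonds$cut, color = diamonds$color)
prop_tab <- prop.table(tab, margin = 1)

# Long format: one row per cut/color pair, proportion stored in `den`
diam3 <- as.data.frame(prop_tab, responseName = "den")

# Sum of proportions within each cut
tapply(diam3$den, diam3$cut, sum)

p_den <- ggplot(diam3, aes(x = color, y = den, fill = color)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ cut)

p_den
```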
Does anyone know how to create scatterplots in R like these ones from GraphPad Prism:
I tried using boxplots but they don't display the data the way I want. The column scatterplots that GraphPad can generate show the data better for me.
Any suggestions would be appreciated.
As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
# `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
g = g + stat_summary(fun = mean,
                     geom = "bar", fill = "white", colour = "black")
Add on some error bars: calculate the upper/lower bounds and adjust the bar width:
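# 95% confidence interval for the mean: t quantile with n - 1 degrees of
# freedom times the standard error sd / sqrt(n)
# (`fun.max` / `fun.min` replace the deprecated `fun.ymax` / `fun.ymin`)
g = g + stat_summary(
  fun.max = function(i) mean(i) + qt(0.975, length(i) - 1) * sd(i) / sqrt(length(i)),
  fun.min = function(i) mean(i) - qt(0.975, length(i) - 1) * sd(i) / sqrt(length(i)),
  geom = "errorbar", width = 0.2)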
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
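A sketch of that alternative, with the summary computed up front in its own data frame. The names dd_sum and g2 are hypothetical, and the interval is a standard t-based 95% confidence interval for the mean, which is my assumption about the intended bounds. The bars are drawn first so that the white fill does not cover the points.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
dd <- data.frame(values = runif(21),
                 type = rep(c("Control", "Treated", "Treated + A"), times = 7))

# Per-group mean and half-width of a 95% t interval
dd_sum <- dd %>%
  group_by(type) %>%
  summarise(avg  = mean(values),
            half = qt(0.975, n() - 1) * sd(values) / sqrt(n()))

g2 <- ggplot() +
  # the "box": one bar per group, reaching up to the mean
  geom_bar(data = dd_sum, aes(x = type, y = avg),
           stat = "identity", fill = "white", colour = "black") +
  geom_errorbar(data = dd_sum,
                aes(x = type, ymin = avg - half, ymax = avg + half),
                width = 0.2) +
  # raw points on top, jittered horizontally only
  geom_jitter(data = dd, aes(x = type, y = values, shape = type),
              width = 0.1, height = 0) +
  theme_bw()

g2
```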
To use base R, have a look at my answer to this question.
If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since it is not possible to change the dots to symbols in geom_dotplot(), I used geom_jitter() as follows:
Dataset %>%
  ggplot(aes(feed, weight, fill = feed)) +
  geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1) +
  stat_summary(fun = mean, geom = "crossbar", width = 0.7,
               col = c("#9E0142", "#3288BD")) +
  scale_fill_manual(values = c("#9E0142", "#3288BD")) +
  scale_colour_manual(values = c("#9E0142", "#3288BD")) +
  theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1