Create one tornado diagram with multiple factors ggplot - r

I am working with data that I am turning into a tornado diagram that in the end I want to look something like this:
Currently my data looks like this:
Using facet_wrap I get a result like this but want it all on one graph to look like the sample at the start:
ggplot(data = df, aes(x = Person)) +
geom_crossbar(aes(y=Mean, ymin= Min, ymax = Max)) +
coord_flip() +
facet_wrap(~Letter, strip.position = "left", scales = "free_x") +
theme(panel.spacing = unit(0, "lines"), strip.background = element_blank(), strip.placement = "outside")
Is there any way to do this in ggplot?
Sorry I had to include the images as links!

How about this? See the code and result below with some explanations after.
# reorder levels in order of appearance
df$Person <- factor(df$Person, levels=unique(df$Person))
ggplot(df, aes(x=Letter)) +
geom_crossbar(
aes(y=Mean, ymin=Min, ymax=Max),
fill='dodgerblue2', width=0.8
) +
facet_grid(Person~., scales='free', space='free', switch = 'y') +
scale_y_continuous(expand=expansion(mult=c(0.3,0.3))) +
labs(x=NULL, y='Axis Label Here') +
theme_classic() +
theme(
strip.placement = 'outside',
strip.background = element_rect(color=NA, fill=NA),
strip.text.y=element_text(angle=90, size=12),
panel.spacing = unit(0,'pt'),
panel.background = element_rect(color='black')
) +
coord_flip()
Now, for some explanation, where I'll step through the code from top to bottom, calling out specific changes/adjustments as they come up.
Re-leveling Person Factor. The purpose here is to ensure that the order in which the "Persons" are listed on the plot matches the order in which they appear in the dataset. You can list them in any order, of course, but the default for a discrete variable is: if it is a factor, the order follows the ordering of the levels; if it is not a factor, the order is alphabetical.
Overall Adjustment for facets. Given that each df$Person has one or more df$Letters associated, and given your example plot, it seems that you actually want to have facets be df$Person, with each having an x aesthetic for df$Letter.
Facet_grid. I use facet_grid() instead of facet_wrap(), since it offers more control. If you use the . ~ facet or facet ~ . notation, it acts just like facet_wrap(), except it will not "wrap" around. There are three critical arguments that are only all available via facet_grid():
scales=. This will remove the extra space in each facet that is not used. Since not every df$Person has the same amount of df$Letter associated, this is very important.
space=. By default, the space occupied by each facet is kept constant. This means that if one facet has 3 letters and another has only 1, the bars in the facet with 3 will be narrower than the single bar in the facet with only 1. Setting space="free" keeps all bar widths constant: it's the facet size that is "free" to adjust to the bars, not the other way around.
switch=. This allows for the strip placement (facet label) to be on the opposite side. It doesn't place it outside though...
Expanding the y scale. This is purely aesthetic. I'm trying to match what you show, which has extra space around the bars.
Theme Elements. There's a decent amount going on here, but basically I'm putting the facet label outside (strip.placement) and removing the box that goes around it usually (strip.background). I also smoosh the facets together (panel.spacing) and decided it was easier to view when you drew some lines between the facets (panel.background).
Some other things would be purely aesthetic, but I think this gets you close to your desired result. If you want to include the information / text... that's a different matter.
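Since the original data was only shared as an image, here is a minimal sketch of a data frame with the same column names (Person, Letter, Mean, Min, Max) that the code above can run against. The names and values are invented for illustration and are not the asker's data.

```r
library(ggplot2)

# Hypothetical data: column names follow the question, values are made up.
df <- data.frame(
  Person = c("Sam", "Sam", "Sam", "Alex", "Alex", "Jo"),
  Letter = c("A", "B", "C", "A", "D", "B"),
  Mean   = c(10, 12, 9, 11, 8, 13),
  Min    = c(6, 9, 4, 7, 5, 10),
  Max    = c(14, 16, 13, 15, 12, 17)
)

# Keep people in order of appearance rather than alphabetical order.
df$Person <- factor(df$Person, levels = unique(df$Person))

p <- ggplot(df, aes(x = Letter)) +
  geom_crossbar(aes(y = Mean, ymin = Min, ymax = Max),
                fill = "dodgerblue2", width = 0.8) +
  # One panel per person; free scales and free space keep bar widths equal.
  facet_grid(Person ~ ., scales = "free", space = "free", switch = "y") +
  coord_flip()
```

Printing p should give one strip per person with one bar per letter; the theme() adjustments from the answer can be added on top unchanged.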

Related

In ggplot2 , I want to plot boxplot+dotplot side by side

In ggplot2, I want to plot a boxplot and a dotplot side by side, as in the attached image. But the code doesn't work. Can anyone help? This code is from the 'R Graphics Cookbook'. Thanks!
library(gcookbook)
library(tidyverse)
ggplot(heightweight,aes(x=sex,y=heightIn ))+
geom_boxplot(aes(x=as.numeric(sex)+0.2),group=sex)+
geom_dotplot(aes(x=as.numeric(sex)-0.2),group=sex,
binaxis = "y",stackdir = 'center',
binwidth = 0.5)
This is a very interesting question. OP is looking to dodge geoms along the x axis, which is not typically difficult to do. The difficulty here lies in that you are dodging the same data using different geoms.
What you can do is use a bit of clever formatting, mapping, and faceting to recreate an example of the type of plot OP shows. For this example solution, I am using the built-in dataset, iris. In the future, OP, please be sure to provide a reproducible example using a built-in dataset, your data, or a sample of your data.
Here's the basic plot showing a dotplot on top of a box plot below - I'll be trying to split the boxplot on the right and dotplot on the left.
ggplot(iris, aes(x=Species, y=Sepal.Width)) +
geom_boxplot(width=0.3) +
geom_dotplot(binaxis = 'y', binwidth=0.04, stackdir = "center")
Dodging is the act of splitting an aesthetic across a specific geom according to a value of a particular column in your data frame. Basically, it means you can have two boxplots next to one another, two points, etc - each one colored or represented differently according to a value for another column in your data. We cannot use dodging to move the boxplot alongside the dotplot because dodging only works across the same geom. You can have two boxplots next to one another for the same specified value of x... but not a boxplot and a dotplot.
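For contrast, here is a small sketch of ordinary dodging within a single geom, using the built-in mtcars data (my choice of dataset, not the asker's). Two boxplots share each x value and are shifted apart by position_dodge().

```r
library(ggplot2)

# cyl and am are numeric in mtcars, so convert them to factors to make them discrete.
p_dodge <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  geom_boxplot(position = position_dodge(width = 0.8))

# The built layer data has one row per box, each with its own shifted x position.
built <- layer_data(p_dodge, 1)
```

This works because both boxes come from the same geom_boxplot() layer; a dotplot layer added alongside would not take part in this dodging of the boxes.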
The solution here is to draw our geoms individually - effectively "manually" doing the dodging. I can't specify a segment within a specific x value (like "x right" vs. "x left"), so the only way to make this work in my mind is to use faceting to create the actual x positions in the dataset, and the positional information for the dodging is going to be specified in the plot using the x axis. This means each value in x (in this example, each Species) will be kind of a mini plot - dotplot on the left and boxplot on the right.
Here's the code and result:
ggplot(iris, aes(x=positional, y=Sepal.Width)) +
geom_dotplot(aes(x = "1"), binaxis="y", binwidth=0.04, stackdir="center") +
geom_boxplot(aes(x = "2"), width=0.6) +
facet_wrap(~Species, strip.position = "bottom") +
theme(
# panel.spacing.x = unit(0, "npc"),
strip.background = element_blank(),
strip.text = element_text(size=12),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank()
)
What's going on here? Well, you'll notice that I'm mapping x to a new "column" in the dataset called "positional". This column does not exist in the dataset, so I define it separately for geom_boxplot() and geom_dotplot(). You have to do this in aes(), since it's required for mapping, but if you map in aes() to a constant value, the plot will be created as if every observation is set at that value. This is useful, because this creates our dotplot on the left (where positional == "1") and our boxplot on the right (where positional == "2").
The rest of the code is just theme stuff and creating the facets. Note that I use strip.position to move the facet labels to the bottom, then remove all the other x axis elements so that the facet labels take their place as the new axis labels.
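To see the constant mapping at work, here is a small sketch that inspects the positions ggplot2 computes. It uses geom_point() in place of geom_dotplot() purely to keep the check simple; that substitution is mine.

```r
library(ggplot2)

p <- ggplot(iris, aes(y = Sepal.Width)) +
  geom_point(aes(x = "1")) +               # every observation placed at "1"
  geom_boxplot(aes(x = "2"), width = 0.6)  # every observation placed at "2"

# The discrete x scale maps the two constants to positions 1 and 2.
left  <- unique(as.numeric(layer_data(p, 1)$x))
right <- unique(as.numeric(layer_data(p, 2)$x))
```

Each layer ends up at a single x position, which is what produces the left/right split inside every facet.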
Finally, you can either keep the spacing between the facets (I kind of like it), or you can also remove that by using another theme() element. Adding theme(panel.spacing = unit(0, "npc")) gives you:

ggplot2 - Add extra space between two legend items

I've created a ggplot2 graph using the basic code below:
my_df %>%
ggplot(aes(conv_norm, vot_norm, color = language:poa)) +
geom_smooth(method = "glm", se=FALSE) +
theme(
...
)
[I've left out the formatting commands from the theme() layer]
And I got a graph that looks like this:
Now, my question is: how can I add extra space only in between two legend items? I've looked online and have found ways to increase the spacing between all items in the legend, but I only want extra spacing between the English items and the Spanish items. Is there a way to add a 1-in distance between these language groups?
Well, I don't know of an elegant, simple solution to do what you are asking to do... but by working with how legends are drawn and adjusting some of the elements, we can come up with a really "hacky" solution. ;)
Here's a sample dataset that kind of simulates what you shared, along with the plot:
set.seed(12345)
my_df <- data.frame(
lang = rep(c(paste('English',1:3), paste('Spanish',1:3)),2),
x = c(rep(0,6), rep(1,6)),
y = rnorm(12, 10,2))
library(ggplot2)
p <- ggplot(my_df, aes(x,y, color=lang)) + geom_line()
p
The approach here is going to be to combine all the following individual steps:
Add a "blank" legend entry. We do this by refactoring and specifying the levels of the column my_df$lang to include a blank entry in the correct position. This will be the final order of the items in the legend.
Use scale_color_manual() to set the colors of the legend items manually. I make sure to use "NA" for the blank entry.
Within scale_color_manual() I use the drop=FALSE setting. This includes all levels for a factor, even if there is no data on the plot to show. This makes our blank entry show on the legend.
Use the legend.key theme element to draw transparent boxes for the legend key items. This is so that you don't have a white or gray box for that blank entry.
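The reason drop=FALSE matters can be sketched in base R: a level with no observations still belongs to the factor, and it only disappears when unused levels are dropped. The short vector below is a made-up stand-in for the lang column.

```r
# Hypothetical stand-in for the lang column.
lang <- c("English 1", "English 2", "Spanish 1", "Spanish 2")

# The empty string is declared as a level even though no row uses it.
lang_f <- factor(lang, levels = c("English 1", "English 2", "", "Spanish 1", "Spanish 2"))

kept    <- levels(lang_f)              # includes the blank level
dropped <- levels(droplevels(lang_f))  # blank level removed
n_blank <- sum(lang_f == "")           # no observations at the blank level
```

A scale that drops unused levels behaves like droplevels() here, so the blank spacer entry would vanish from the legend.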
Putting it all together you get this:
my_df$lang <- factor(my_df$lang, levels=c(paste('English',1:3), '', paste('Spanish',1:3)))
ggplot(my_df, aes(x,y, color=lang)) +
geom_line() +
scale_color_manual(
values=c(rainbow(6)[1:3], 'NA', rainbow(6)[4:6]),
drop=FALSE) +
theme( legend.key = element_rect(fill='NA') )
Alternatively, you could use guides(color=guide_legend(override.aes... to set the colors, but you need the drop=FALSE part within scale_color_manual() to get the blank level to draw in the legend anyway.
Another option would be to create two separate legends, either by using two different aesthetics or by using color twice, e.g. with ggnewscale. Thanks to user chemdork123 for the fake data, +1.
library(tidyverse)
library(ggnewscale)
set.seed(12345)
my_df <- data.frame(
lang = rep(c(paste('English',1:3), paste('Spanish',1:3)),2),
x = c(rep(0,6), rep(1,6)),
y = rnorm(12, 10,2))
ggplot(mapping = aes(x,y)) +
geom_line(data = filter(my_df, grepl("English", lang)), aes(color=lang)) +
scale_color_brewer(NULL, palette = "Dark2") +
new_scale_colour() +
geom_line(data = filter(my_df, grepl("Spanish", lang)), aes(color=lang)) +
scale_color_brewer(palette = "Set1") +
guides(color = guide_legend(order = 1))
Created on 2021-04-11 by the reprex package (v1.0.0)
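The other route mentioned above, two different aesthetics, needs only ggplot2. This is a rough sketch using the same fake data; note that it changes the look of the plot, because the Spanish lines are then told apart by linetype rather than by color.

```r
library(ggplot2)

set.seed(12345)
my_df <- data.frame(
  lang = rep(c(paste("English", 1:3), paste("Spanish", 1:3)), 2),
  x = c(rep(0, 6), rep(1, 6)),
  y = rnorm(12, 10, 2)
)

# Split the data so each language group can use its own aesthetic.
english <- my_df[grepl("English", my_df$lang), ]
spanish <- my_df[grepl("Spanish", my_df$lang), ]

p <- ggplot(mapping = aes(x, y)) +
  geom_line(data = english, aes(color = lang)) +     # legend 1: color
  geom_line(data = spanish, aes(linetype = lang)) +  # legend 2: linetype
  labs(color = "English", linetype = "Spanish")
```

Each aesthetic gets its own legend, and ggplot2 leaves a gap between the two.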

Adding a legend to a combined line and bargraph ggplot

So I know many people have asked similar questions but the code others have used does not seem to be working for my graph hence why I'm wondering if I have done something wrong.
I have this code:
ggplot(dfMonth)
+ geom_col(aes(x=Month, y=NumberMO), size=.7, colour="black", fill="white")
+ geom_line(aes(x=Month, y=NumberME), size=1, colour="black", group=1)
+ xlab("Month")
+ ylab("No. of birds observed")
+ theme_bw()
+ geom_point(x=Month, y=NumberME)
+ scale_colour_manual("" ,values =c("NumberME"="black"), labels=c("Expected No. of birds"))
+ theme(legend.key=element_blank(),legend.title=element_blank(), legend.box="horizontal")
+ theme(axis.title.x = element_text(margin = unit(c(5, 0, 0, 0), "mm")),
axis.title.y = element_text(margin = unit(c(0,3 , 0, 0), "mm")))
Which produces this graph:
So as you can see, the legend showing what the black line with the points means has not been added to my graph, even though I have included the code for it. No error comes up, which is why I'm lost on what's wrong. Any ideas on what I've failed to include?
Thanks
In order for ggplot to know to draw a legend, you need to include one of the aesthetics for a geom within aes(). In this case, if you want a legend to be drawn for your line, you need to include within the aes() in the geom_line() call one of the aesthetics that you have identified for the line: linetype or color works. We'll use color here.
Oh... and in the absence of OP sharing their dataset, here's a made-up example:
set.seed(1234)
dfMonth <- data.frame(
Month=month.name,
NumberMO=sample(50:380, 12),
NumberME=sample(50:380, 12)
)
Now the code to make the plot and ensure the legend is created.
p <- ggplot(dfMonth, aes(x=Month)) +
geom_col(aes(y=NumberMO), size=0.7, color="black", fill="white") +
geom_line(aes(y=NumberME, color='black'), size=1, group=1)
p
We have a legend, but there are some problems. You get the default title of the legend (which is the name of the aesthetic) and the default label (which is whatever text you put inside aes(color=...)). Since we put "black" as the value there, it's applied as the label, not as the actual color. The actual color of the line defaults to the first color of the standard palette used by ggplot2, which in this case is that light red color.
To set the color, the name of the legend, and the label, we should specify the values explicitly. There's only one item in the legend, so there's no strict need to name it, but if you send a named vector to indicate the color for our single line explicitly, you end up with the somewhat strange-looking c('black'='black'). I also included a line break in the label to make it look a bit better. Also, the months were running into each other, so I changed the angle of the x axis labels.
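The difference between setting a color and mapping one can be checked on a tiny made-up data frame (the data below is invented for illustration). Outside aes() the string is used as a color; inside aes() it is treated as a data value and sent through the color scale.

```r
library(ggplot2)

d <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# Constant outside aes(): the line is drawn in black and no legend is created.
p_out <- ggplot(d, aes(x, y)) + geom_line(color = "black")

# Constant inside aes(): "black" becomes a one-level data value with a legend entry.
p_in <- ggplot(d, aes(x, y)) + geom_line(aes(color = "black"))

# Colors actually used for drawing, taken from the built layers.
col_out <- unique(layer_data(p_out, 1)$colour)
col_in  <- unique(layer_data(p_in, 1)$colour)
```

Here col_out is the literal "black", while col_in is whatever the default discrete palette assigns to its first level.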
Finally, you might notice the months were out of order. That's because default ggplot2 behavior is to factor a column of discrete values, which uses alphabetical ordering for the levels. To fix that, you specify the column as a factor before plotting with the correct levels.
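dfMonth$Month <- factor(dfMonth$Month, levels=month.name)
# p still holds the data as it was when p was created, so rebuild it from the updated dfMonth
p <- ggplot(dfMonth, aes(x=Month)) +
geom_col(aes(y=NumberMO), size=0.7, color="black", fill="white") +
geom_line(aes(y=NumberME, color='black'), size=1, group=1)
p + scale_color_manual(
name=NULL, values=c('black'='black'),
labels='Expected No.\nof birds') +
theme(axis.text.x=element_text(angle=30, hjust=1))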
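The alphabetical default described above can be checked in base R with the built-in month.name vector. This is just a small sketch of the factor behavior, independent of ggplot2.

```r
# Without explicit levels, factor() sorts the unique values alphabetically.
alphabetical <- levels(factor(month.name))

# With explicit levels, the calendar order is preserved.
calendar <- levels(factor(month.name, levels = month.name))

head(alphabetical, 3)
head(calendar, 3)
```

The first call puts April at the front, while the second keeps January first, which is the order the x axis then follows.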

How to add legend to ggplot loadings plot?

So I created a loadings plot in an arrow style using ggplot. To make graphing easier, I added a column of colours to the dataframe built from rr.pr$rotation so that each arrow is drawn in the colour I specified. The colours that match the arrows are important, which is why I did it that way. I am now having trouble adding a legend, as ggplot isn't adding one.
Is there a way to add one or do I have to do something to the dataframe?
I was thinking of adding the colours manually, but I am getting stuck.
Green represents Sulfated, Orange represents Sialylated, and Brown represents Neutral. I would like the legend to show that.
Here is the code:
Dataframe
rrload<-data.frame(rr.pr$rotation[c(2,15,17,24,52),c(1:5)])
rrload$class<-c('orange','springgreen3','bisque3','bisque3','bisque3')
rrload1<-rrload[,c(1:5)]
rrload1<-as.numeric(as.matrix(rrload1))
rrload1<-matrix(rrload1,nrow=5,ncol=5,byrow = F)
rrload[,c(1:5)]<-rrload1
Code for plotting it:
ggplot(rrload)+geom_segment(aes(xend=PC1,yend=PC2),x=0,y=0,arrow = arrowstyle2,color=rrload$class)+
geom_text(aes(x=PC1,y=PC2,label=row.names(rrload)),hjust=0,nudge_x = -0.05,vjust=1,nudge_y = 0.025,size=3.5,color='black')+xlim(-0.3,0.3)+ylim(-0.3,0.3)+theme_light()+
theme_minimal()+theme(legend.title = element_text("Class"),axis.text.x = element_text(colour = "black",size = 10),axis.text.y = element_text(colour = "black",size = 10),axis.title.x = element_text(colour = "black",size = 10),axis.title.y = element_text(colour = "black",size = 10),axis.ticks = element_line(color = "black"),panel.grid = element_blank(), panel.border = element_rect(colour = "black",fill = NA,size = 1))+geom_hline(yintercept = 0,linetype="dashed",color="gray69")+geom_vline(xintercept = 0,linetype="dashed",color="gray69")
This is the graph:
Loadings plot
Without access to your full data (your code cannot recreate the rrload dataframe properly), it's hard to help. I managed to estimate the numbers based on the plot you shared. Here's the dataframe I used; note the naming conventions for the columns:
d <- data.frame(
PC1=c(-0.2,-0.2,0.1,0.15,-0.08),
PC2=c(0.13,-0.1,0.2,0.1,-0.2300),
class=c('Neutral','Neutral','Neutral','Sulfated','Silylated'),
name=c('o53','o18','o25','o15','o2')
)
To prepare the data for plotting, I included d$name and d$class. d$class is similar to the column you had, although instead of the color, I'm using the actual name. d$name is the name that I'm using to plot your labels.
Here's the code I used and resulting plot. Explanation will come after:
library(ggrepel)
ggplot(d) + theme_classic() +
geom_vline(xintercept=0, linetype=2, color='gray60') +
geom_hline(yintercept=0, linetype=2, color='gray60') +
geom_segment(
aes(xend=PC1,yend=PC2, color=class), x=0,y=0,
arrow=arrow(type='closed', angle=20, length=unit(0.02,'npc'))
) +
geom_text_repel(
aes(x=PC1, y=PC2, label=name), force=6, min.segment.length = 10, seed=123
) +
ylim(-0.3,0.3) + xlim(-0.3,0.3) +
scale_color_manual(
name='Legend Title',
values=c('Neutral'='bisque3','Sulfated'='springgreen3','Silylated'='orange'))
ggplot2 will create a legend for certain aesthetics, but they must be placed within aes(). Once you do that, ggplot2 will create the legend and automatically assign colors. This means that if we want to create a legend for color=, you need to put it within aes(). The interesting part is that you can put it within aes() anywhere in the call, or just apply to specific geom/geoms. This allows a lot of flexibility in creating your plot. In this case, I only want to color the arrows, so you include color=class within the geom_segment() call. If you put it within the ggplot() call, it would color both the line segment as well as the text geom.
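The effect of where color=class is placed can be sketched by comparing the colors ggplot2 computes for the text layer in both variants. This uses plain geom_text() instead of geom_text_repel() to avoid the extra package; that simplification is mine.

```r
library(ggplot2)

d <- data.frame(
  PC1 = c(-0.2, -0.2, 0.1, 0.15, -0.08),
  PC2 = c(0.13, -0.1, 0.2, 0.1, -0.23),
  class = c("Neutral", "Neutral", "Neutral", "Sulfated", "Silylated"),
  name = c("o53", "o18", "o25", "o15", "o2")
)

# color mapped inside geom_segment(): only the segments are colored by class.
p_local <- ggplot(d) +
  geom_segment(aes(xend = PC1, yend = PC2, color = class), x = 0, y = 0) +
  geom_text(aes(x = PC1, y = PC2, label = name))

# color mapped inside ggplot(): every layer inherits it, including the text.
p_global <- ggplot(d, aes(color = class)) +
  geom_segment(aes(xend = PC1, yend = PC2), x = 0, y = 0) +
  geom_text(aes(x = PC1, y = PC2, label = name))

# Layer 2 is the text layer in both plots.
text_colors_local  <- unique(layer_data(p_local, 2)$colour)
text_colors_global <- unique(layer_data(p_global, 2)$colour)
```

In the first plot all labels share one default color; in the second they take one color per class, just like the arrows.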
I'm also paying attention to the ordering. We want to make sure the background dotted lines for the central axis at 0,0 are "behind" everything, so they go first. Then the segments, and then the text geom.
The scale_color_manual() function is used to specify the colors for the different d$class values explicitly and the name of the legend. You can also just let ggplot2 find a palette by default, or you can specify via a palette (there are a ton of other methods to specify color). BTW - you can also specify the name of the legend via labs(color=....
Finally, I decided to use geom_text_repel() rather than geom_text(). Since the lines go out in every direction, the "nudge" values for each text item are not going to work going in the same direction. In other words, if you plot the text at x=PC1, y=PC2, it will overlap the arrowheads. You noticed this too and applied nudge_ values, which happens to work, but if your data was a bit different, it would not have worked. geom_text_repel from the ggrepel package can work to do this by kind of "pushing" the text away from your points.

ggplot: combining size and color in legend

I've only very recently started learning R. Now what I'm trying to do is to integrate two legends for the same plot. In other words, I want the default size legend to change color depending on its size.
I have been Googling several solutions that apparently all don't seem to work, but again, I'm new to R so maybe I'm just doing something wrong.
My code:
ggplot(Caschool, aes(x=testscr, y=avginc), colour="green") +
geom_point(aes(size=enrltot, color=enrltot)) +
geom_smooth(colour="blue") +
labs(x="Test Score", y="Average Income", title="California Test Score Data", color="Number of Students\nPer District") +
theme(
panel.grid.minor = element_blank(),
panel.grid.major=element_line(colour="grey", size=0.4),
panel.background=element_rect(fill="beige"),
axis.line=element_line(size = 1.2, colour = "black"),
plot.title = element_text(size = rel(2))) +
scale_color_continuous(limits=c(0, 30000), breaks=seq(0,30000, by=2500)) +
guides(color= guide_legend(), size=guide_legend())
Apparently, I'm not allowed to post pictures, or I would have shown what this looks like so far.
ggplot2 can indeed combine size and colour legends into one. However, this only works if they are compatible: they need to have exactly the same breaks, otherwise they cannot be combined.
Let me make an example: Assume you have values between 0 and 10 that you want to map on size and colour. You tell ggplot2 to use small points for values below 5 and large points for larger values. It will then plot a legend with a small and a large point, as expected. Now, you also want to add colour and you require points below 3 to be green and points above to be blue. ggplot2 will also draw a legend for this, but it is impossible to combine the two legends. The small point would have to be both green and blue. The problem can be solved by using the same breaks for colour and size.
In your example, you manually change the breaks of the colour scale, but not those of the size scale. This results in incompatible legends that can not be combined.
I cannot demonstrate this using your data, because I don't have it. So I will create an example with mtcars. The variant with incompatible legends is constructed as follows:
p <- ggplot(mtcars, aes(x=mpg, y=drat)) +
geom_point(aes(size=gear, color=gear)) +
scale_color_continuous(limits=c(2, 5), breaks=seq(2, 5, by=0.5)) +
guides(color= guide_legend(), size=guide_legend())
which gives the following plot:
If I now add the same breaks for size,
p + scale_size_continuous(limits=c(2, 5), breaks=seq(2, 5, by=0.5))
I get a plot with only one legend:
For your code, this means that you should add the following to your plot:
+ scale_size_continuous(limits=c(0, 30000), breaks=seq(0,30000, by=2500))
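One way to keep the two scales from drifting apart is to define the limits and breaks once and pass the same objects to both scales. This is a small sketch on mtcars; the variable names shared_limits and shared_breaks are my own.

```r
library(ggplot2)

# Define limits and breaks once so both scales are guaranteed to agree.
shared_limits <- c(2, 5)
shared_breaks <- seq(2, 5, by = 0.5)

p <- ggplot(mtcars, aes(x = mpg, y = drat)) +
  geom_point(aes(size = gear, color = gear)) +
  scale_color_continuous(limits = shared_limits, breaks = shared_breaks) +
  scale_size_continuous(limits = shared_limits, breaks = shared_breaks) +
  guides(color = guide_legend(), size = guide_legend())
```

If the breaks need to change later, editing shared_breaks updates both scales at once, so the legends stay compatible.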
A little side remark: What do you intend by using colour = "green" in your call to ggplot? I don't see that this has any effect at all, because you set the colour again in both geoms that you use later. Maybe a relic from an older variant of the plot?

Resources