Plotting with secondary axis - r

I am trying to overlay a bar graph (primary axis) and line (secondary axis), but I keep getting an error that I don't understand how to fix. I have tried to follow multiple examples from other questions, but I'm still not getting the result I need.
Here's my code:
ggplot(data = MRIP, aes(x = Length_mm)) +
geom_bar(aes(y = Perc.of.Fish), stat="identity", width = 10, fill = "black") +
geom_line(aes(y = Landings), stat = "identity", size = 2, color = "red") +
scale_y_continuous(name = "Percentage", sec.axis = sec_axis (~./Landings, name = "Landings"))
How do I fix this error: "Error in f(...) : object 'Landings' not found"?

Try this:
coef <- 4000
MRIP %>%
mutate(LandingsPlot=Landings/coef) %>%
ggplot(aes(x = Length_mm)) +
geom_col(aes(y = Perc.of.Fish), width = 10, fill = "black") +
geom_line(aes(y = LandingsPlot), size = 2, color = "red") +
scale_y_continuous(
name = "Percentage",
sec.axis = sec_axis (trans= ~.*coef, name = "Landings")
)
Giving
Why does this work? The scale factor used to define the secondary axis cannot be part of the input data.frame - because if it were, it could potentially vary across rows (as it does here). That would mean you had a separate scale for each row of the input data.frame. That doesn't make sense. So, you scale the secondary variable to take a similar range to that of the primary variable. I chose coef <- 4000 by eye. The exact value doesn't matter, so long as it's sensible.
Having divided by the scale factor to obtain the plotted values, you need to multiply by the scale factor in the transformation in order to get the correct labels on the secondary axis.
Thank you for providing a good MWE. But next time, for extra marks, please post the results of dput() in your question rather than in the comments...
Update
To answer OP's follow up question in the comments: legends are linked to aesthetics. So to get a legend, move the attribute that you want to label inside aes(). You can then define and customise the legend using the appropriate scale_<aesthetic>_<type>. However it's worth noting that if you write, say, aes(colour="black") then "black" is just a character string. It doesn't define a colour. (Using the standard defaults, it will in fact appear as a slightly pinkish red, labelled "black"!) This can be confusing, so it might be a good idea to use arbitrary strings like "a", "b" and "c" (or "Landings" and "Percentage") in the aesthetics. Anyway...
coef <- 4000
#Note fill and color have moved inside aes()
MRIP %>%
mutate(LandingsPlot=Landings/coef) %>%
ggplot(aes(x = Length_mm)) +
geom_col(aes(y = Perc.of.Fish, fill = "black"), width = 10,) +
geom_line(aes(y = LandingsPlot, color = "red"), size = 2) +
scale_y_continuous(
name = "Percentage",
sec.axis = sec_axis (trans= ~.*coef, name = "Landings")
) +
scale_color_manual(values=c("red"), labels=c("Landings"), name=" ") +
scale_fill_manual(values=c("black"), labels=c("Percentage"), name=" ")
Gives

Related

Is there a way to add multiple horizontal lines to a plot with different symbology and a legent?

I need to add horizontal lines (similar to geom_hline) to a plot but retain the ability to add symbology (e.g. different dashed or dotted line) to each and have them appear in a legend. The y-intercept of each comes from the "Concentration" variable and the label needs to come from "Type". I've created an example dataset at the bottom of the question but the one I'm working with is a lot larger, with about 30 different Pesticides and up to 4 Types within each. I am hoping to split and sapply to automate plotting across the entire dataset.
Here is the plot code that I've tried so far and you can see the geom_hline works (as in, the lines appear on the plot), but it doesnt give the ability to change symbology of each or add a legend. Please ignore the box plots and jitter - they arent relevant to this question.
Is there a way i can add these lines to the plot, each with different symbology and a legend to capture that?
Thanks in advance
Plot2 <- ggplot(PestDat, aes(x=factor(Phyla), y=Conc_ug)) +
geom_boxplot(color="dark gray", fill = 'dark gray', alpha = 0.1) +
geom_jitter(aes(shape = Pathway), height=.3, width=.3, size = 1.5) +
scale_y_continuous(trans='log10') +
geom_hline(data = PestDat,aes(yintercept = Concentration),linetype="dashed") +
labs(x = "Phyla", y = "Log10 Concentration (µg/L)") +
labs(title = (unique(ToxDat_Diuron$Pesticide)),
subtitle = "Do we want a subtitle for the plot?",
caption = "Source: Citation, Reliability ?? ")
Plot2
Sample data
PestDat <- data.frame(
Pest = c("Diuron","Diuron","Diuron","Diuron","Atrazine"),
Type = c("PC99","PC95","PC90","PC80","ETV"),
Concentration = c(0.1,0.2,0.3,0.4,0.7),
stringsAsFactors = FALSE)
We can put the linetype aesthetic inside the aes() term to make it use the variable from the data to categorize the type of line. Note that linetype wants a discrete variable, so in this case I used A-B-C.
ggplot(PestDat, aes(x=factor(Pest), y=Concentration)) +
geom_boxplot(color="dark gray", fill = 'dark gray', alpha = 0.1) +
geom_jitter(aes(shape = Type), height=.3, width=.3, size = 1.5) +
scale_y_continuous(trans='log10') +
labs(x = "Phyla", y = "Log10 Concentration (µg/L)",
subtitle = "Do we want a subtitle for the plot?") +
geom_hline(data = data.frame(y_int = c(0.6,0.9, 1.1),
type = c("C","B","A")),
aes(yintercept = y_int, linetype = type))

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

ggplot2 geom_points won't colour or dodge

So I'm using ggplot2 to plot both a bar graph and points. I'm currently getting this:
As you can see the bars are nicely separated and colored in the desired colors. However my points are all uncolored and stacked ontop of eachother. I would like the points to be above their designated bar and in the same color.
#Add bars
A <- A + geom_col(aes(y = w1, fill = factor(Species1)),
position = position_dodge(preserve = 'single'))
#Add colors
A <- A + scale_fill_manual(values = c("A. pelagicus"= "skyblue1","A. superciliosus"="dodgerblue","A. vulpinus"="midnightblue","Alopias sp."="black"))
#Add points
A <- A + geom_point(aes(y = f1/2.5),
shape= 24,
size = 3,
fill = factor(Species1),
position = position_dodge(preserve = 'single'))
#change x and y axis range
A <- A + scale_x_continuous(breaks = c(2000:2020), limits = c(2016,2019))
A <- A + expand_limits(y=c(0,150))
# now adding the secondary axis, following the example in the help file ?scale_y_continuous
# and, very important, reverting the above transformation
A <- A + scale_y_continuous(sec.axis = sec_axis(~.*2.5, name = " "))
# modifying axis and title
A <- A + labs(y = " ",
x = " ")
A <- A + theme(plot.title = element_text(size = rel(4)))
A <- A + theme(axis.text.x = element_text(face="bold", size=14, angle=45),
axis.text.y = element_text(face="bold", size=14))
#A <- A + theme(legend.title = element_blank(),legend.position = "none")
#Print plot
A
When I run this code I get the following error:
Error: Unknown colour name: A. pelagicus
In addition: Warning messages:
1: Width not defined. Set with position_dodge(width = ?)
2: In max(table(panel$xmin)) : no non-missing arguments to max; returning -Inf
I've tried a couple of things but I can't figure out it does work for geom_col and not for geom_points.
Thanks in advance
The two basic problems you have are dealing with your color error and not dodging, and they can be solved by formatting your scale_...(values= argument using a list instead of a vector, and applying the group= aesthetic, respectively.
You'll see the answer to these two question using an example:
# dummy dataset
year <- c(rep(2017, 4), rep(2018, 4))
species <- rep(c('things', 'things1', 'wee beasties', 'ew'), 2)
values <- c(10, 5, 5, 4, 60, 10, 25, 7)
pt.value <- c(8, 7, 10, 2, 43, 12, 20, 10)
df <-data.frame(year, species, values, pt.value)
I made the "values" set for my column heights and I wanted to use a different y aesthetic for points for illustrative purposes, called "pt.value". Otherwise, the data setup is similar to your own. Note that df$year will be set as numeric, so it's best to change that into either Date format (kinda more trouble than it's worth here), or just as a factor, since "2017.5" isn't gonna make too much sense here :). The point is, I need "year" to be discrete, not continuous.
Solve the color error
For the plot, I'll try to create it similar to you. Here note that in the scale_fill_manual object, you have to set the values= argument using a list. In your example code, you are using a vector (c()) to specify the colors and naming. If you have name1=color1, name2=color2,..., this represents a list structure.
ggplot(df, aes(x=as.factor(year), y=values)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')
So the colors are applied correctly and my axis is discrete, and the y values of the points are mapped to pt.value like I wanted, but why don't the points dodge?!
Solve the dodging issue
Dodging is a funny thing in ggplot2. The best reasoning here I can give you is that for columns and barplots, dodging is sort of "built-in" to the geom, since the default position is "stack" and "dodge" represents an alternative method to draw the geom. For points, text, labels, and others, the default position is "identity" and you have to be more explicit in how they are going to dodge or they just don't dodge at all.
Basically, we need to let the points know what they are dodging based on. Is it "species"? With geom_col, it's assumed to be, but with geom_point, you need to specify. We do that by using a group= aesthetic, which let's the geom_point know what to use as criteria for dodging. When you add that, it works!
ggplot(df, aes(x=as.factor(year), y=values, group=species)) +
geom_col(aes(fill=species), position=position_dodge(width=0.62), width=0.6) +
scale_fill_manual(values=
list('ew' = 'skyblue1', 'things' = 'dodgerblue',
'things1'='midnightblue', 'wee beasties' = 'gray')) +
geom_point(aes(y=pt.value), shape=24, position=position_dodge(width=0.62)) +
theme_bw() + labs(x='Year')

Discrete values and geom_ribbon and geom_lines + problems with "discrete" scale

I have got a file like this one:
Month,Open,Closed
2017-08,53,38
2017-09,102,85
2017-10,58,38
2017-11,51,42
2017-12,32,24
2018-01,24,30
2018-02,56,46
2018-03,82,74
2018-04,95,89
2018-05,16,86
I want to plot both lines, and also shade the difference between them. So this works:
ggplot() +geom_line(data=issues.m,aes(x=Month,y=Open,group=1))
+geom_line(data=issues.m,aes(x=Month,y=Closed,group=1))
+geom_ribbon(data=issues.m, aes(x=Month,ymin=Closed,ymax=Open,color=Open-Closed))
+theme_tufte()
+theme(axis.text.x = element_text(angle = 90, hjust = 1))
producing this
First problem here is that I would like the whole area between the two lines shaded if possible, not a single line. How can I do that?
But I would also like to color the two lines. If I add a color to one of them:
ggplot()
+geom_line(data=issues.m,aes(x=Month,y=Open,group=1,color='open'))
+geom_line(data=issues.m,aes(x=Month,y=Closed,group=1))
+geom_ribbon(data=issues.m, aes(x=Month,ymin=Closed,ymax=Open,color=Open-Closed))
+theme_tufte()
+theme(axis.text.x = element_text(angle = 90, hjust = 1))
I get the error:
Error: Continuous value supplied to discrete scale
So, can what I want to do be done at all? Would it be possible to change the colour palette of the ribbon too?
Your error was because you were mapping Open - Closed onto the color, which will be a continuous variable, i.e. the difference between those two values for each month. But you also assigned "open" to color inside the aes in one of your geom_lines. That means you're trying to assign both continuous values and discrete values to the same scale, and that's not going to work.
If all you need to do is get 2 colors, one for each line, you can do this one of two ways, the second of which fits more into the ggplot/tidyverse way of doing things.
First off I turned your dates into date objects to clean up the x-axis and avoid rotating the labels—feel free to experiment with the date breaks that work well in scale_x_date.
The less "tidy" way is to just make two geom_lines, one for Open and one for Closed, and assign a color to each.
library(tidyverse)
df_dated <- df %>%
mutate(month2 = sprintf("%s-01", Month) %>% lubridate::ymd())
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = Open), color = "green3") +
geom_line(aes(y = Closed), color = "red") +
ggthemes::theme_tufte()
But the more idiomatically "tidy" way is to make a long-shaped version of the data so you can map a variable—in this case whether an observation is the opening or closing value—onto an aesthetic such as color. This also gives you a legend—if you don't want it, you can get rid of it in the theme. This lets you set a scale for the colors, instead of hard-coding into each geom_line.
df_date_long <- df_dated %>%
gather(key, value, -month2, -Month)
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = value, color = key), data = df_date_long) +
scale_color_manual(values = c(Open = "green3", Closed = "red")) +
ggthemes::theme_tufte()

'Merging' two plots using ggplot2 and R

Given the following dataset:
data = cbind(1:10,c('open','reopen','closed'),letters[1:3],1:10)
data = rbind(data,cbind(1:10,c('open','closed','reopen'),letters[1:3],5:10))
data = rbind(data,cbind(1:10,c('closed','open','reopen'),letters[1:3],3:10))
data = data.frame(data);
colnames(data) <- c("id","status","author","when")
I'd like to get a plot similar to the following:
ggplot(data, aes(when,id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id,colour = author))
But, as such I get a single legend by 'author' with the status and author values. How can I get the same result but with a legend for author and other for status? My rationale is that I want to layer two plots of the same dataset on top of each other.
I don't think you can have different color scales / legends for one ggplot. You could hack something together (see this question for legend hacking), but in this case where one of your geom's is point, you could just use fill and one of the point options that are filled in.
ggplot(data, aes(when,id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id, fill = author),
shape = 21, color = NA, size = 4)
Here the colors used are the same for each, but you can edit the color or fill scales individually, e.g., adding
scale_fill_brewer(type = "qual") +
scale_color_brewer(type = "qual", palette = 2)
I do agree with AndyClifton that using color in two ways will be hard to distinguish. You could also experiment with line types, point shapes, or even plotting with geom_text using a word, a letter, or a number as a label instead of points. You say you have more than 6 values for author, but it will be very difficult to distinguish more than 6 colors for author, especially when color is also being used for status.
Let's take your data. First you should be aware that you have a problem that your when and id column is a string, so you are plotting 1, 10, 2, 3, ... not 1,...9,10. We can fix that:
data$when.num <-as.numeric(as.character(data$when))
data$id.num <-as.numeric(as.character(data$id))
Then we'll plot it but use different shapes to get two different legends:
require(ggplot2)
p <- ggplot(data, aes(x = when.num, y = id)) +
geom_line(aes(group = id,colour = status)) +
geom_point(aes(group = id,shape = author))
print(p)
And you get this:
I think this is much clearer than using coloured points for the author, but this is a question of taste.

Resources