adding a key for geom_line to legend from geom_area - r

I have a data frame, where I am talking about different flows of water at a dam (water units are kcfs—1000 cubic feet per second—if anyone is interested)
Call it df4plot
date kcfs Flowtype
10/1/2010 50 Power
10/1/2010 10 Spill_Overgen
10/1/2010 8 Spill_Force
10/2/2010 52 Power
10/2/2010 7 Spill_Overgen
10/2/2010 10 Spill_Force
(there are 3x365 rows in the data frame)
So what I want to do is make an aggregated area graph that shows each of these flows
p <- ggplot(data = df4plot, aes(date,kcfs)) +
geom_area(aes(colour = Flowtype, fill=Flowtype), position = "stack")
I want to control the colors used, so I added
plot_colors_aggregate <- c("forestgreen","lightsalmon","dodgerblue")
p <- p +
scale_color_manual(values = plot_colors_aggregate) +
scale_fill_manual(values = plot_colors_aggregate)
Now I want to add a dashed line showing the maximum turbine capacity (the flow limit for power generation), which varies by month. I have a separate data frame for this (365 rows long), df4FGline:
Date FGlimit
10/1/2010 52
10/2/2010 52
…
11/1/2010 60
11/2/2010 60
...
Etc
So now I have
p <- p +
geom_line(data = df4FGline, aes(x=date,y=FGlimit), colour = "darkblue", linetype = "dashed")
p
The legend is currently just the three blocks for the three types of Flowtype. I’d like to add the dashed line for the flow gate limits to the bottom, but I can’t get it to show up there.
It is probably related to my incomplete understanding of aes (help(aes) is AMAZINGLY unhelpful).
I’ve tried something similar to this and something similar to this, but since I’m only trying to add 1 line to a pre-existing legend, maybe?, this is not working for me.
I tried adding “legend = TRUE” inside the parentheses for the geom_line, but it put a dashed line inside each color box in the legend, AND created a 4th entry for the legend, but offset from the rest of the legend (below and to the right)... ARG!
I swear I have the book on order... any help you can share so that I understand this aesthetic thing and how it relates to the legend a little better, I'd be extremely grateful.

This should help:
df <- data.frame(x = 1:10,y = 1:10)
ggplot(df,aes(x = x,y = y)) +
geom_line(aes(linetype = "dashed")) +
scale_linetype_manual(name = "Linetype",values = "dashed")
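Applied to the dam plot from the question, the same idea might look like the sketch below. The data values are made up, the label "Flow gate limit" is just an illustrative choice, and colour = Flowtype is dropped from geom_area to keep the example short. The key point is that linetype is mapped to a constant string inside aes(), which is what makes ggplot2 draw a legend entry for the line.

```r
library(ggplot2)

# Made-up flows standing in for df4plot (three flow types per day)
df4plot <- data.frame(
  date     = rep(as.Date("2010-10-01") + 0:2, each = 3),
  kcfs     = c(50, 10, 8, 52, 7, 10, 51, 9, 9),
  Flowtype = rep(c("Power", "Spill_Overgen", "Spill_Force"), times = 3)
)

# Made-up limits standing in for df4FGline
df4FGline <- data.frame(
  date    = as.Date("2010-10-01") + 0:2,
  FGlimit = c(52, 52, 52)
)

plot_colors_aggregate <- c("forestgreen", "lightsalmon", "dodgerblue")

p <- ggplot(df4plot, aes(date, kcfs)) +
  geom_area(aes(fill = Flowtype), position = "stack") +
  scale_fill_manual(values = plot_colors_aggregate) +
  # linetype is mapped inside aes(), so it gets its own legend key;
  # colour stays outside aes() because it is a fixed setting
  geom_line(data = df4FGline,
            aes(x = date, y = FGlimit, linetype = "Flow gate limit"),
            colour = "darkblue") +
  scale_linetype_manual(name = NULL,
                        values = c("Flow gate limit" = "dashed"))
```

Printing p should show the three fill keys plus a separate dashed-line key underneath them.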

Related

How to add legend to plot with data from multiple data frames

I have scripted a ggplot compiled from two separate data frames, but as it stands there is no legend as the colours aren't included in aes. I'd prefer to keep the two datasets separate if possible, but can't figure out how to add the legend. Any thoughts?
I've tried adding the colours directly to the aes function, but then colours are just added as variables and listed in the legend instead of colouring the actual data.
Plotting this with base r, after creating the plot I would've used:
legend("top",c("Delta 18O","Delta 13C"),fill=c("red","blue"))
and gotten what I needed, but I'm not sure how to replicate this in ggplot.
The following code currently plots exactly what I want, it's just missing the legend... which ideally should match what the above line would produce, except the "18" and "13" need superscripted.
Examples of an old plot using base r (with a correct legend, except lacking superscripted 13 and 18) and the current plot missing the legend can be found here:
Old: https://imgur.com/xgd9e9C
New, missing legend: https://imgur.com/eGRhUzf
Background data
head(avar.data.x)
time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470
head(avar.data.y)
time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470
The following avarn function produces a data frame with three columns and several thousand rows (see header above). These are then graphed over time on a log/log plot.
avar.data.x <- avarn(data3$"d Intl. Std:d 13C VPDB - Value",frequency)
avar.data.y <- avarn(data3$"d Intl. Std:d 18O VPDB-CO2 - Value",frequency)
Create allan deviation plot
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av)),color="red")+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av)),color="blue")+
scale_x_log10()+
scale_y_log10()+
labs(x=expression(paste("Averaging Time ",tau," (seconds)")),y="Allan Deviation (per mil)")
The above plot is only missing a legend to show the name of the two plotted datasets and their respective colours. I would like the legend in the top centre of the graph.
How to superscript legend titles?:
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av),
color =expression(paste("Delta ",18^,"O"))))+
geom_line(data=avar.data.xmod,aes(x=time,y=sqrt(av),
color=expression(paste("Delta ",13^,"C"))))+
scale_color_manual(values = c("blue", "red"),name=NULL) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
Setting color inside aes() and adding a scale_color_*() function to your plot should do the trick.
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av), color = "a"))+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av), color="b"))+
scale_color_manual(
values = c("red", "blue"),
labels = expression(avar.data.x^2, "b")
) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging^2 Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
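For the superscripts specifically, here is a sketch of how the labels could be written with plotmath. The data values are made up to stand in for the avarn() output, and iso_labels is a name introduced only for this example. In plotmath, ^ raises what follows it and * joins pieces without a space.

```r
library(ggplot2)

# Made-up values standing in for the avarn() output
avar.data.x <- data.frame(time = c(1, 2, 4, 8),
                          av   = c(0.030, 0.015, 0.008, 0.004))
avar.data.y <- data.frame(time = c(1, 2, 4, 8),
                          av   = c(0.068, 0.034, 0.017, 0.009))

# One plotmath expression per legend entry, in the order of the keys "a", "b"
iso_labels <- expression("Delta "^18 * "O", "Delta "^13 * "C")

p <- ggplot() +
  geom_line(data = avar.data.y, aes(x = time, y = sqrt(av), color = "a")) +
  geom_line(data = avar.data.x, aes(x = time, y = sqrt(av), color = "b")) +
  scale_color_manual(name = NULL,
                     values = c(a = "red", b = "blue"),
                     labels = iso_labels) +
  scale_x_log10() +
  scale_y_log10() +
  # "top" places the legend above the panel, centred by default
  theme(legend.position = "top")
```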
You can make better use of ggplot's aesthetics by combining both data sets into one. This is particularly easy when your data frames have the same structure. Here, you could then for example use color.
This way you only need one call to geom_line, and it is easier to control the legend(s). You could even write a function to automate your labels.
Also note that white spaces in column names are not great (you're making your own life very difficult) and that you may want to think about automating your avarn calls, e.g. with lapply, which would result in a list of data frames and makes the binding of the data frames even easier.
avar.data.x <- readr::read_table("0 time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470")
avar.data.y <- readr::read_table("0 time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470")
library(tidyverse)
combine_df <- bind_rows(list(a = avar.data.x, b = avar.data.y), .id = 'ID')
ggplot(combine_df)+
geom_line(aes(x = time, y = sqrt(av), color = ID))+
# "a" is the 13C data (blue in the question), "b" is the 18O data (red)
scale_color_manual(values = c("blue", "red"),
labels = c(expression("Delta "^13*"C"), expression("Delta "^18*"O")))
Created on 2019-11-11 by the reprex package (v0.2.1)
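The lapply suggestion above could be sketched as follows. Since avarn() itself is not shown in the question, fake_avarn() below is a purely hypothetical stand-in with made-up arithmetic, and the input vectors are invented; only the pattern (apply over a named list, then bind with an ID column) is the point.

```r
# Hypothetical stand-in for avarn(); the real function is not shown in the question
fake_avarn <- function(values, freq) {
  data.frame(time = seq_along(values) / freq, av = values^2)
}

# Made-up input signals, named so the names can become legend keys later
signals <- list(d13C = c(0.17, 0.12, 0.10),
                d18O = c(0.26, 0.18, 0.15))

# One data frame per signal, returned as a named list
results <- lapply(signals, fake_avarn, freq = 1)

# Stack the list into one data frame, keeping the list names as an ID column
combined <- do.call(
  rbind,
  Map(function(d, id) cbind(ID = id, d), results, names(results))
)
```

The combined data frame can then go straight into a single geom_line() call with color = ID.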

What am I missing to build this plot?

I am trying to make a Rare Earth Elements spider diagram that places concentration in log10 on the y-axis, and each respective element from the Rare Earth Elements on the x-axis. I then am trying to compare several units of rock with each other. An example of what I am looking for and what I am getting is added to the google doc link below.
So, with the code I have added I have two problems:
1. The elements are being listed on the x-axis in alphabetical order, not in the order that I have in my CSV
2. I don't know what I am missing in my code to correlate the points together in each sample to build a line. I couple this with not knowing if that is an issue with my code, or with the way my data is arranged in the CSV.
I have seen someone else tackle this issue by treating the respective elements as dates. I have played with lubridate a bit, but I feel like it wasn't as successful as the code that I've added below... which is saying something.
ggplot(data=dataMGSREE) +
geom_point(mapping = aes(x = Concentration, y = Element, color=Group), show.legend = FALSE) +
coord_flip() +
scale_x_log10()
Analysis Name Element Concentration
HM030218-2 Haycock Upper La 65.00
HM030218-2 Haycock Upper Ce 127.00
HM030218-2 Haycock Upper Pr 13.46
HM030218-2 Haycock Upper Nd 44.00
HM030218-2 Haycock Upper Sm 6.70
HM030218-2 Haycock Upper Eu 0.75
HM030218-2 Haycock Upper Gd 4.48
HM030218-2 Haycock Upper Tb 0.64
HM030218-2 Haycock Upper Dy 3.40
HM030218-2 Haycock Upper Ho 0.73
Something similar to the expected result is listed above, while the actual result is here: https://docs.google.com/document/d/1p7QY8Ie_bmav1XApTSy1TCECvteUcxckZXpsy9Ib7Ew/edit?usp=sharing
Please forgive me also for not knowing how to upload the screenshots on here.
A few things going on
(a) If you want lines, you need to add geom_line() to your plot. You'll also need to add a group aesthetic to indicate which points to connect, presumably group = Analysis inside aes(). This is necessary whenever you plot a line with a discrete variable on an axis.
(b) See this FAQ for getting a custom order of your elements.
(c) If you want points and lines, put aes() inside the original ggplot() call, it will be passed on to both geom_point() and geom_line() so you don't have to re-specify it in subsequent layers
(d) I don't see a reason to use coord_flip here, I'd just map what you want to go on x and y from the start
(e) You don't show a column called Group in your data, so I'm surprised your color = Group works at all...
Something like this:
# change factor levels to order they occur
# you could also custom-specify an order, with, e.g., `levels = c("La", "Ce", "Pr", ...)`
dataMGSREE$Element = factor(dataMGSREE$Element, levels = unique(dataMGSREE$Element))
# plot with changes explained above
ggplot(data = dataMGSREE,
mapping = aes(x = Element, y = Concentration, color = Analysis, group = Analysis)) +
geom_point(show.legend = FALSE) +
geom_line() +
scale_y_log10()
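To try this without the original CSV, here is a small self-contained version. The data frame ree, the sample names, and the second sample's concentrations are all made up for illustration; only the first four values come from the table in the question.

```r
library(ggplot2)

# Made-up data: two hypothetical analyses, four elements each
ree <- data.frame(
  Analysis      = rep(c("sample_1", "sample_2"), each = 4),
  Element       = rep(c("La", "Ce", "Pr", "Nd"), times = 2),
  Concentration = c(65, 127, 13.46, 44, 40, 90, 10, 30)
)

# Keep the elements in order of first appearance rather than alphabetical
ree$Element <- factor(ree$Element, levels = unique(ree$Element))

p <- ggplot(ree, aes(x = Element, y = Concentration,
                     color = Analysis, group = Analysis)) +
  geom_point() +
  geom_line() +   # group = Analysis tells geom_line which points to connect
  scale_y_log10()
```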
The axis ordering for discrete data like Element is determined by how the factor levels are set. It looks like here the factor levels should be in the same order they already are in the data, so you can do:
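# unique() is needed because each element repeats once per analysis,
# and factor() rejects duplicated levels
dataMGSREE$Element = factor(dataMGSREE$Element, levels = unique(dataMGSREE$Element))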
ggplot(data=dataMGSREE) +
# I set color = Analysis here because the example data didn't
# contain a Group column, replace as appropriate
geom_point(mapping = aes(x = Concentration, y = Element, color=Analysis),
show.legend = FALSE) +
coord_flip() +
scale_x_log10()

Customize linetype in ggplot2 OR add automatic arrows/symbols below a line

I would like to use customized linetypes in ggplot. If that is impossible (which I believe to be true), then I am looking for a smart hack to plot arrowlike symbols above, or below, my line.
Some background:
I want to plot some water quality data and compare it to the standard (set by the European Water Framework Directive) in a red line. Here's some reproducible data and my plot:
datum <- seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week")
df <- data.frame(datum = datum, y=rnorm(53,mean=100,sd=40))
(plot1 <-
ggplot(df, aes(x=datum,y=y)) +
geom_line() +
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
However, in this plot it is completely unclear if the Standard is a maximum value (as it would be for example Chloride) or a minimum value (as it would be for Oxygen). So I would like to make this clear by adding small pointers/arrows Up or Down. The best way would be to customize the linetype so that it consists of these arrows, but I couldn't find a way.
Q1: Is this at all possible, defining custom linetypes?
All I could think of was adding extra points below the line:
extrapoints <- data.frame(datum2 = seq.Date(as.Date("2014-01-01"),
as.Date("2014-12-31"),by = "week"),y2=68)
plot1 + geom_point(data=extrapoints, aes(x=datum2,y=y2),
shape=">",size=5,colour="red",rotate=90)
However, I can't seem to rotate these symbols pointing downward. Furthermore, this requires calculating the right spacing of X and distance to the line (Y) every time, which is rather inconvenient.
Q2: Is there any way to achieve this, preferably as automated as possible?
I'm not sure what is requested, but it sounds as though you want arrows that point up or down depending on whether the y-value is greater or less than some expected value. If that's the case, then this does it using geom_segment:
require(grid) # as noted by ?geom_segment
(plot1 <-
ggplot(df, aes(x=datum,y=y)) + geom_line()+
geom_segment(data = data.frame(datum = df$datum, y = 70, up = df$y > 70),
aes(xend = datum , yend =70 + c(-1,1)[1+up]*5), #select up/down based on 'up'
arrow = arrow(length = unit(0.1,"cm"))
) + # adjust units to modify size of arrow-heads
geom_point() +
theme_classic()+
geom_hline(aes(yintercept=70),colour="red"))
If I'm wrong about what was desired and you only wanted a bunch of down arrows, then just take out the stuff about creating and using "up" and use a minus-sign.
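The down-arrows-only variant described above might look like this sketch. The names standard and arrows_df are introduced here for illustration, the arrow length of 5 y-units is arbitrary, and set.seed() is only there to make the made-up data repeatable.

```r
library(ggplot2)
library(grid)  # arrow() and unit()

set.seed(1)  # only so the made-up data is repeatable
df <- data.frame(
  datum = seq.Date(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "week"),
  y     = rnorm(53, mean = 100, sd = 40)
)

standard <- 70  # the red reference line from the question

# One short segment per date, starting on the line and ending 5 units below it
arrows_df <- data.frame(datum = df$datum, y = standard, yend = standard - 5)

p <- ggplot(df, aes(x = datum, y = y)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = standard, colour = "red") +
  geom_segment(data = arrows_df,
               aes(xend = datum, yend = yend),
               arrow = arrow(length = unit(0.1, "cm")),
               colour = "red") +
  theme_classic()
```

Changing yend to standard + 5 would flip every arrow to point upward instead.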

ggplot scale_fill_discrete(breaks = user_countries) creates a second, undesired legend

I am trying to change the factor level ordering of a data frame column to control the legend ordering and ggplot coloring of factor levels specified by country name. Here is my dataframe country_hours:
countries hours
1 Brazil 17
2 Mexico 13
3 Poland 20
4 Indonesia 2
5 Norway 20
6 Poland 20
Here is how I try to plot subsets of the data frame depending on a list of selected countries, user_countries:
make_country_plot<-function(user_countries, country_hours_pre)
{
country_hours = country_hours_pre[which(country_hours_pre$countries %in% user_countries) ,]
country_hours$countries = factor(country_hours$countries, levels = c(user_countries))
p = ggplot(data=country_hours, aes(x=hours, color=countries))
for(name in user_countries){
p = p + geom_bar( data=subset(country_hours, countries==name), aes(y = (..count..)/sum(..count..), fill=countries), binwidth = 1, alpha = .3)
}
p = p + scale_y_continuous(labels = percent) + geom_density(size = 1, aes(color=countries), adjust=1) +
ggtitle("Baltic countries") + theme(plot.title = element_text(lineheight=.8, face="bold")) + scale_fill_discrete(breaks = user_countries)
}
This works great in that the coloring goes according to my desired order as does the top legend, but a second legend appears and shows a different order. Without scale_fill_discrete(breaks = user_countries) I do not get my desired order, but I also do not get two legends. In the plot shown below, the desired order, given by user_countries was
user_countries = c("Lithuania", "Latvia", "Estonia")
I'd like to get rid of this second legend. How can I do it?
I also have another problem, which is that the plotting/coloring is inconsistent between different plots. I'd like the "first" country to always be blue, but it's not always blue. Also the 'real' legend (darker/solid colors) is not always in the same position - sometimes it's below the incorrect/black legend. Why does this happen and how can I make this consistent across plots?
Also, different plots have different numbers of factor groups, sometimes more than 9, so I'd rather stick with standard ggplot coloring as most of the solutions for defining your own colors seem limited in the number of colors you can do (How to assign colors to categorical variables in ggplot2 that have stable mapping?)
You are mapping to two different aesthetics (color and fill) but you changed the scale specifications for only one of them. Doing this will always split a previously combined legend. There is a nice example of this on this page.
To keep your legends combined, you'll want to add scale_color_discrete(breaks = user_countries) in addition to scale_fill_discrete(breaks = user_countries).
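A minimal sketch of that fix, using made-up hours for the three countries named in the question. It uses a single geom_density layer rather than the question's loop of bars, so it only illustrates the legend behaviour, not the full plot.

```r
library(ggplot2)

# Made-up data for the three countries in user_countries
country_hours <- data.frame(
  countries = c("Estonia", "Latvia", "Lithuania",
                "Latvia", "Estonia", "Lithuania"),
  hours     = c(17, 13, 20, 2, 20, 15)
)
user_countries <- c("Lithuania", "Latvia", "Estonia")

# Factor levels control the order in which colors are assigned
country_hours$countries <- factor(country_hours$countries,
                                  levels = user_countries)

p <- ggplot(country_hours,
            aes(x = hours, color = countries, fill = countries)) +
  geom_density(alpha = 0.3) +
  # Identical breaks on both scales keep color and fill in one legend
  scale_color_discrete(breaks = user_countries) +
  scale_fill_discrete(breaks = user_countries)
```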
I don't have enough reputation to comment, but this previous question has a comprehensive answer.
Short answer is to change geom_density so that it doesn't map countries to color. That means just taking everything inside the aes() and putting it outside.
geom_density(size = 1, color=countries, adjust=1)
(This should work. Don't have an example to confirm).

Make scatter plots for multiple subsets of data

Let me first introduce my dataset and my preliminary result, to make the question easier to understand. My dataset looks like this:
Place Species Size Conc.
A BT 24 0.2
A ST 76 1.4
...
B BT 45 1.2
B ST 21 0.7
...
I want to make a scatterplot of Size against Conc. for each Species at each Place. So far I have used ggplot2 to make a graph as below:
scatterplot <- ggplot(mydata, aes(x = Size, y = Conc., color = Species)) +
geom_point(shape = 1)
Though this graph plots by the species group in different color, it summarizes all data in the dataset and fails to plot for different places.
I think the code below
scatterplot <- ggplot(mydata[mydata$Place == "A", ], aes(x = Size, y = Conc., color = Species)) + geom_point(shape = 1)
works for plotting just place A and I can do this for different places one by one. However, in my real dataset, the place variable has tons of different places, and I can't type them all out one by one manually. Thus my question actually is how to let R make those plots for different places automatically at one time?
Try:
ggplot(ddf)+geom_point(aes(Size, Conc.))+facet_grid(Place~Species)
If there are too many places:
ggplot(ddf)+geom_point(aes(Size, Conc., color=Place))+facet_grid(.~Species)
Or, in one graph:
ggplot(ddf)+geom_point(aes(Size, Conc., color=Place,shape=Species), size=5)
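If separate plot objects per place are preferred over facets, one alternative (not what the answer above uses) is to split the data by Place and build one plot per piece. The sketch below uses a made-up four-row version of the dataset, and plots is a name introduced for the example.

```r
library(ggplot2)

# Made-up rows matching the columns shown in the question
mydata <- data.frame(
  Place   = c("A", "A", "B", "B"),
  Species = c("BT", "ST", "BT", "ST"),
  Size    = c(24, 76, 45, 21),
  Conc.   = c(0.2, 1.4, 1.2, 0.7)
)

# split() gives one data frame per place; lapply() builds one plot for each
plots <- lapply(split(mydata, mydata$Place), function(d) {
  ggplot(d, aes(x = Size, y = Conc., color = Species)) +
    geom_point(shape = 1) +
    ggtitle(paste("Place", d$Place[1]))
})
```

The result is a list named by place, so plots[["A"]] prints the plot for place A, and the whole list can be looped over to save each plot to a file.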
