How to add multiple geom_smooth lines to the legend (ggplot)? - r

I am trying to create a plot which includes multiple geom_smooth trendlines within one plot. My current code is as follows:
png(filename="D:/Users/...", width = 10, height = 8, units = 'in', res = 300)
ggplot(Data) +
geom_smooth(aes(BA,BAgp1),colour="red",fill="red") +
geom_smooth(aes(BA,BAgp2),colour="turquoise",fill="turquoise") +
geom_smooth(aes(BA,BAgp3),colour="orange",fill="orange") +
xlab(bquote('Tree Basal Area ('~cm^2~')')) +
ylab(bquote('Predicted Basal Area Growth ('~cm^2~')')) +
labs(title = expression(paste("Other Softwoods")), subtitle = "Tree Level Basal Area Growth") +
theme_bw()
dev.off()
Which yields the following plot:
The issue is I can't for the life of me include a simple legend where I can label what each trendline represents. The dataset is quite large; if it would help in identifying a solution, I can post it somewhere outside Stack Overflow.

Your data is in wide format, like a matrix. ggplot2 only draws a legend for aesthetics that are mapped inside aes(); colours set as constants outside aes(), as in your code, get no legend. The cleanest fix is to transform your data to long format. I simulated 3 curves like yours, and you can see that if you call geom_line or geom_smooth with a variable that separates the groups ("name" in the example below) mapped to colour and fill, it works and produces a legend nicely.
library(dplyr)
library(tidyr)
library(ggplot2)
X = 1:50
#simulate data
Data = data.frame(
BA=X,
BAgp1 = log(X)+rnorm(length(X),0,0.3),
BAgp2 = log(X)+rnorm(length(X),0,0.3) + 0.5,
BAgp3 = log(X)+rnorm(length(X),0,0.3) + 1)
# convert this to long format, use BA as id
Data <- Data %>% pivot_longer(-BA)
#define colors
COLS = c("red","turquoise","orange")
names(COLS) = c("BAgp1","BAgp2","BAgp3")
###
ggplot(Data) +
geom_smooth(aes(BA,value,colour=name,fill=name)) +
# change name of legend here
scale_fill_manual(name="group",values=COLS)+
scale_color_manual(name="group",values=COLS)
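If you would rather not reshape, a sketch of another approach (not part of the answer above) is to map colour and fill to a constant label inside aes() in each layer. ggplot2 treats each label as a level of a discrete scale and draws a legend for it. The labels "Group 1" to "Group 3" are placeholders, and the data are simulated the same way as above.

```r
library(ggplot2)

set.seed(1)  # make the simulated data reproducible
X <- 1:50
Data <- data.frame(
  BA    = X,
  BAgp1 = log(X) + rnorm(length(X), 0, 0.3),
  BAgp2 = log(X) + rnorm(length(X), 0, 0.3) + 0.5,
  BAgp3 = log(X) + rnorm(length(X), 0, 0.3) + 1
)

# Placeholder legend labels, mapped to the colours from the question
COLS <- c("Group 1" = "red", "Group 2" = "turquoise", "Group 3" = "orange")

p <- ggplot(Data, aes(x = BA)) +
  # A constant string inside aes() becomes one legend entry per layer
  geom_smooth(aes(y = BAgp1, colour = "Group 1", fill = "Group 1")) +
  geom_smooth(aes(y = BAgp2, colour = "Group 2", fill = "Group 2")) +
  geom_smooth(aes(y = BAgp3, colour = "Group 3", fill = "Group 3")) +
  # Same name on both scales so colour and fill merge into a single legend
  scale_colour_manual(name = "group", values = COLS) +
  scale_fill_manual(name = "group", values = COLS)
p
```

This keeps the three separate layers from the question; the long-format version scales better once there are more than a handful of groups.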

Related

Use two colour scales possible (with work around)?

I'm trying to plot insect counts of 2 species in 18 experimental plots on a single graph. Since the second species' population peaks later, it is visually doable (see picture below). I would like the 18 population lines of species 1 to be green (using "Greens" from RColorBrewer) and the 18 of species 2 to be red (using "Reds"). I do realize this may be problematic for a colourblind audience, but that is irrelevant here.
I've read in "R ggplot two color palette on the same plot" that this is not possible with standard ggplot2 options, but that post is more than two years old.
There is a sort of "cheat" for points (see "Using two scale colour gradients ggplot2"), but since I prefer lines to show the population through time, I can't use it.
Are there any new "cheats" available for this?
Or does anyone have another idea to visualize my data in a way that shows population trends through time in all plots and shows the difference in timing of the peak? I've included a picture at the bottom that shows my real data, all in the same colour scale though.
Sample code
library(ggplot2)
library(RColorBrewer)
# example data frame
plot <- as.factor(rep(c("A","B","C"),each=5))
time <- as.numeric(rep(c(1:5),times=3))
S1 <- c(1,4,7,5,2, 2,8,9,3,1, 1,6,6,3,1)
S2 <- c(0,0,2,3,2, 1,2,1,5,3, 0,1,1,6,7)
df <- data.frame(time, plot, S1, S2)
# example colour scales
S1Colours <- colorRampPalette(brewer.pal(9,"Greens"))(3)
S2Colours <- colorRampPalette(brewer.pal(9,"Reds"))(3)
names(S1Colours) <- levels(df$plot)
names(S2Colours) <- levels(df$plot)
# example plot
ggplot(data=df) +
geom_line(aes(x=time, y=S1, colour=plot)) +
geom_line(aes(x=time, y=S2, colour=plot)) +
scale_colour_manual(name = "plot", values = S1Colours) +
scale_colour_manual(name = "plot", values = S2Colours)
# this gives the note "Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale."
(Image: plot of the real data, not included.)
I would also create a manual colour scale covering every plot/species combination.
library(tidyverse)
library(RColorBrewer)
df_long=pivot_longer(df,cols=c(S1,S2),names_to = "Species",values_to = "counts") %>% # create long format and
mutate(plot_Species=paste(plot,Species,sep="_")) # make identifiers for combined plot and Species
#make color palette
mycolors=c(colorRampPalette(brewer.pal(9,"Greens"))(sum(grepl("S1",unique(df_long$plot_Species)))),
colorRampPalette(brewer.pal(9,"Reds"))(sum(grepl("S2",unique(df_long$plot_Species)))))
names(mycolors)=c(grep("S1",unique(df_long$plot_Species),value = TRUE),
grep("S2",unique(df_long$plot_Species),value = TRUE))
# example plot
ggplot(data=df_long) +
geom_line(aes(x=time, y=counts, colour=plot_Species)) +
scale_colour_manual(name = "Species by plot", values = mycolors)
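The grepl("S1", ...) step assumes that "S1" and "S2" never occur inside a plot label and relies on the order of unique(). A minimal sketch that builds the named palette directly from the plot levels instead; make_species_palette is a hypothetical helper, and the "plot_Species" naming follows the paste(plot, Species, sep = "_") identifiers above.

```r
library(RColorBrewer)

# Hypothetical helper: one green shade per plot for species S1 and one red
# shade per plot for species S2, named "<plot>_S1" / "<plot>_S2" so they
# match the plot_Species identifiers used in the long data frame.
make_species_palette <- function(plots) {
  n <- length(plots)
  greens <- colorRampPalette(brewer.pal(9, "Greens"))(n)
  reds   <- colorRampPalette(brewer.pal(9, "Reds"))(n)
  setNames(c(greens, reds),
           c(paste(plots, "S1", sep = "_"), paste(plots, "S2", sep = "_")))
}

mycolors <- make_species_palette(c("A", "B", "C"))
print(names(mycolors))
```

With the example data you would call make_species_palette(levels(df$plot)) and pass the result to scale_colour_manual(values = ...). Note that the first colour of each ramp is very pale (close to white), so you may prefer to ramp over a subset such as brewer.pal(9, "Greens")[3:9].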
You can do this easily with the ggnewscale package (disclaimer: I'm the author).
This is how you would do it:
library(RColorBrewer)
library(ggplot2)
library(ggnewscale)
plot <- as.factor(rep(c("A","B","C"),each=5))
time <- as.numeric(rep(c(1:5),times=3))
S1 <- c(1,4,7,5,2, 2,8,9,3,1, 1,6,6,3,1)
S2 <- c(0,0,2,3,2, 1,2,1,5,3, 0,1,1,6,7)
df <- data.frame(time, plot, S1, S2)
# example colour scales
S1Colours <- colorRampPalette(brewer.pal(9,"Greens"))(3)
S2Colours <- colorRampPalette(brewer.pal(9,"Reds"))(3)
names(S1Colours) <- levels(df$plot)
names(S2Colours) <- levels(df$plot)
ggplot(data=df) +
geom_line(aes(x=time, y=S1, colour=plot)) +
scale_colour_manual(name = "plot 1", values = S1Colours) +
new_scale_color() +
geom_line(aes(x=time, y=S2, colour=plot)) +
scale_colour_manual(name = "plot 2", values = S2Colours)
Created on 2019-12-19 by the reprex package (v0.3.0)

R: overlaying trajectory plot and scatter plot

I'm working with ggplot2 and trajectory plots: plots that are like scatter plots, but with lines that connect the points according to a specific rule.
My goal is to overlay a trajectory plot with a scatter plot, and each of them has different data.
First of all, the data:
# first dataset
ideal <- data.frame(ideal=c('a','b')
,x_i=c(0.3,0.8)
,y_i=c(0.11, 0.23))
# second dataset
calculated <- data.frame(calc = c("alpha","alpha","alpha")
,time = c(1,2,3)
,x_c = c(0.1,0.9,0.3)
,y_c = c(0.01,0.26,0.17)
)
Creating a scatter plot of the second data set (calculated) is easy:
library(ggplot2)
ggplot(calculated, aes(x=x_c, y=y_c)) + geom_point()
After that, I created the trajectory plot, using this helpful link:
library(grid)
library(data.table)
qplot(x_c, y_c, data = calculated, color = calc, group = calc)+
geom_path (linetype=1, size=0.5, arrow=arrow(angle=15, type="closed"))+
geom_point (data = calculated, colour = "red")+
geom_point (shape=19, size=5, fill="black")
With this result:
How can I overlay the ideal data to this trajectory plot (without trajectory of course, they should be only points)?
Thanks in advance!
qplot() isn't usually recommended (it is deprecated as of ggplot2 3.4.0). Here's how you could plot the two data frames. However, ggplot might work better for you if the data frames were merged into one with x and y columns plus a method column containing either "calculated" or "ideal".
library(ggplot2)
ideal <- data.frame(ideal=c('a','b')
,x_i=c(0.3,0.8)
,y_i=c(0.11, 0.23)
)
# second dataset
calculated <- data.frame(calc = c("alpha","alpha","alpha")
,time = c(1,2,3)
,x_c = c(0.1,0.9,0.3)
,y_c = c(0.01,0.26,0.17)
)
ggplot(aes(x_c, y_c, color = "calculated"), data = calculated) +
geom_point( size = 5) +
geom_path (linetype=1, size=0.5, arrow = arrow(angle=15, type="closed"))+
geom_point(aes(x_i, y_i, color = "ideal"), data = ideal, size = 5) +
labs(x = "x", y = "y", color = "method")
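The merged layout suggested above could look roughly like this sketch: both data sets are stacked into one long data frame with shared x/y columns and a method column, and geom_path() is given only the calculated rows so the ideal points stay unconnected. The column name "method" and the NA time for the ideal points are illustrative choices.

```r
library(ggplot2)

ideal <- data.frame(ideal = c("a", "b"), x_i = c(0.3, 0.8), y_i = c(0.11, 0.23))
calculated <- data.frame(calc = c("alpha", "alpha", "alpha"), time = c(1, 2, 3),
                         x_c = c(0.1, 0.9, 0.3), y_c = c(0.01, 0.26, 0.17))

# Stack both data sets with shared column names plus a "method" tag
combined <- rbind(
  data.frame(x = calculated$x_c, y = calculated$y_c, time = calculated$time,
             method = "calculated"),
  data.frame(x = ideal$x_i, y = ideal$y_i, time = NA, method = "ideal")
)

p <- ggplot(combined, aes(x, y, colour = method)) +
  geom_point(size = 5) +
  # Trajectory only for the calculated rows (already in time order)
  geom_path(data = subset(combined, method == "calculated"),
            arrow = arrow(angle = 15, type = "closed")) +
  labs(colour = "method")
p
```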

Adding Custom Legend to 2 Data sets in ggplot2

I am trying to simply add a legend to my Nyquist plot, in which I am plotting 2 sets of data: set 1 is experimental (~600 points), and set 2 is a data frame calculated using a transfer function (~1000 points).
I need to plot both and label them. Currently they are both plotted fine, but when I try to add the labels using scale_colour_manual, no legend appears. A way to move the legend around would also be appreciated! Code below.
pdf("nyq_2elc.pdf")
nq2 <- ggplot() + geom_point(data = treat, aes(treat$V1,treat$V2), color = "red") +
geom_point(data = circuit, aes(circuit$realTF,circuit$V2), color = "blue") +
xlab("Real Z") + ylab("-Imaginary Z") +
scale_colour_manual(name = 'hell0',
values =c('red'='red','blue'='blue'), labels = c('Treatment','EQ')) +
ggtitle("Nyquist Plot and Equivilent Circuit for 2 Electrode Treatment Setup at 0 Minutes") +
xlim(0,700) + ylim(0,700)
print(nq2)
dev.off()
ggplot works best with long data frames, so I would combine the data sets like this:
treat$Cat <- "treat"
circuit$Cat <- "circuit"
CombData <- data.frame(rbind(treat, circuit))
ggplot(CombData, aes(x=V1, y=V2, col=Cat))+geom_point()
This should give you the legend you want.
You probably have to change the names/order of the columns of dataframes treat and circuit so they can be combined, but it's hard to tell because you're not giving us a reproducible example.
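The legend in the question does not appear because color = "red" and color = "blue" are set outside aes(); manual scales only act on mapped aesthetics. A sketch of the full fix, using made-up stand-in data: only the column names (V1/V2 in treat, realTF/V2 in circuit) come from the question, the values are arbitrary. It renames the mismatched column, combines the data, relabels the legend, and moves it with theme(legend.position = ...).

```r
library(ggplot2)

# Made-up stand-ins for the question's data frames (values are arbitrary)
treat   <- data.frame(V1 = c(100, 200, 300), V2 = c(150, 250, 200))
circuit <- data.frame(realTF = c(110, 210, 310), V2 = c(140, 240, 210))

# Give both data frames the same column names, then tag each with its source
names(circuit)[names(circuit) == "realTF"] <- "V1"
treat$Cat   <- "treat"
circuit$Cat <- "circuit"
CombData <- rbind(treat, circuit)

p <- ggplot(CombData, aes(x = V1, y = V2, colour = Cat)) +
  geom_point() +
  # Colour is now mapped inside aes(), so this manual scale creates a legend
  scale_colour_manual(name = "Data",
                      values = c(treat = "red", circuit = "blue"),
                      breaks = c("treat", "circuit"),
                      labels = c("Treatment", "EQ")) +
  xlab("Real Z") + ylab("-Imaginary Z") +
  xlim(0, 700) + ylim(0, 700) +
  # Move the legend: "bottom", "top", "left", "right" or "none"
  theme(legend.position = "bottom")
p
```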

How to change origin line position in ggplot bar graph?

Say I'm measuring 10 personality traits and I know the population baseline. I would like to create a chart for individual test-takers to show them their individual percentile ranking on each trait. Thus, the numbers go from 1 (percentile) to 99 (percentile). Given that a 50 is perfectly average, I'd like the graph to show bars going to the left or right from 50 as the origin line. In bar graphs in ggplot, it seems that the origin line defaults to 0. Is there a way to change the origin line to be at 50?
Here's some fake data and default graphing:
df <- data.frame(
names = LETTERS[1:10],
factor = round(rnorm(10, mean = 50, sd = 20), 1)
)
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor)) +
geom_bar(stat="identity") +
coord_flip()
Picking up on @nongkrong's comment, here's some code that will do what I think you want, while relabeling the ticks to match the original range and relabeling the axis to avoid showing the math:
library(ggplot2)
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks=seq(-50,50,10), labels=seq(0,100,10)) + ylab("Percentile") +
coord_flip()
This post was really helpful for me; thanks @ulfelder and @nongkrong. However, I wanted to re-use the code on different data without having to manually adjust the tick labels to fit the new data. To do this in a way that retained ggplot's tick placement, I defined a tiny function and called it in the labels argument:
fix.labels <- function(x){
x + 50
}
ggplot(data = df, aes(x=names, y=factor - 50)) +
geom_bar(stat="identity") +
scale_y_continuous(labels = fix.labels) + ylab("Percentile") +
coord_flip()
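Going one step further, a sketch that makes the baseline a parameter so the same code works for any origin, not just 50. Both shift_labels and plot_around are hypothetical helpers; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical helper: returns a labelling function that adds `offset` back
# to each break, so ticks on the shifted axis read in original units
shift_labels <- function(offset) {
  function(x) x + offset
}

# Hypothetical wrapper: bars grow left/right from `baseline` instead of 0
plot_around <- function(data, baseline = 50) {
  ggplot(data, aes(x = names, y = factor - baseline)) +
    geom_col() +
    scale_y_continuous(labels = shift_labels(baseline)) +
    ylab("Percentile") +
    coord_flip()
}

set.seed(42)
df <- data.frame(names = LETTERS[1:10],
                 factor = round(rnorm(10, mean = 50, sd = 20), 1))
p <- plot_around(df, baseline = 50)
p
```

Because aes() captures the function's environment, `baseline` inside the mapping refers to the argument passed to plot_around().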

R + ggplot : Time series with events

I'm an R/ggplot newbie. I would like to create a geom_line plot of a continuous-variable time series and then add a layer of events. The continuous variable and its timestamps are stored in one data.frame; the events and their timestamps are stored in another data.frame.
What I would really like to do is something like the charts on finance.google.com. In those, the time series is stock-price and there are "flags" to indicate news-events. I'm not actually plotting finance stuff, but the type of graph is similar. I am trying to plot visualizations of log file data. Here's an example of what I mean...
If advisable (?), I would like to use separate data.frames for each layer (one for continuous variable observations, another for events).
After some trial and error this is about as close as I can get. Here, I am using example data from data sets that come with ggplot. "economics" contains some time-series data that I'd like to plot and "presidential" contains a few events (presidential elections).
library(ggplot2)
data(presidential)
data(economics)
presidential <- presidential[-(1:3),]
yrng <- range(economics$unemploy)
ymin <- yrng[1]
ymax <- yrng[1] + 0.1*(yrng[2]-yrng[1])
p2 <- ggplot()
p2 <- p2 + geom_line(mapping=aes(x=date, y=unemploy), data=economics , size=3, alpha=0.5)
p2 <- p2 + scale_x_date("time") + scale_y_continuous(name="unemployed [1000's]")
p2 <- p2 + geom_segment(mapping=aes(x=start,y=ymin, xend=start, yend=ymax, colour=name), data=presidential, size=2, alpha=0.5)
p2 <- p2 + geom_point(mapping=aes(x=start,y=ymax, colour=name ), data=presidential, size=3)
p2 <- p2 + geom_text(mapping=aes(x=start, y=ymax, label=name, angle=20, hjust=-0.1, vjust=0.1),size=6, data=presidential)
p2
Questions:
This is OK for very sparse events, but if there's a cluster of them (as often happens in a log file), it gets messy. Is there some technique I can use to neatly display a bunch of events occurring in a short time interval? I was thinking of position_jitter, but it was really hard for me to get this far. Google Charts stacks these event "flags" on top of each other if there are a lot of them.
I actually don't like putting the event data on the same scale as the continuous measurements. I would prefer to put it in a facet_grid. The problem is that the facets all seem to have to come from the same data.frame (I'm not sure that's true). If so, that also seems less than ideal (or maybe I'm just trying to avoid using reshape?)
Now I like ggplot as much as the next guy, but if you want to make the Google Finance type charts, why not just do it with the Google graphics API?!? You're going to love this:
install.packages("googleVis")
library(googleVis)
dates <- seq(as.Date("2011/1/1"), as.Date("2011/12/31"), "days")
happiness <- rnorm(365)^ 2
happiness[333:365] <- happiness[333:365] * 3 + 20
Title <- NA
Annotation <- NA
df <- data.frame(dates, happiness, Title, Annotation)
df$Title[333] <- "Discovers Google Viz"
df$Annotation[333] <- "Google Viz API interface by Markus Gesmann causes acute increases in happiness."
### Everything above here is just for making up data ###
## from here down is the actual graphics bits ###
AnnoTimeLine <- gvisAnnotatedTimeLine(df, datevar="dates",
numvar="happiness",
titlevar="Title", annotationvar="Annotation",
options=list(displayAnnotations=TRUE,
legendPosition='newRow',
width=600, height=300)
)
# Display chart
plot(AnnoTimeLine)
# Create Google Gadget
cat(createGoogleGadget(AnnoTimeLine), file="annotimeline.xml")
and it produces this fantastic chart:
As much as I like @JD Long's answer, I'll post one that uses only R/ggplot2.
The approach is to create a second data set of events and to use that to determine positions. Starting with what @Angelo had:
library(ggplot2)
data(presidential)
data(economics)
Pull out the event (presidential) data and transform it. Compute the baseline and offset as fractions of the economic data they will be plotted with, and set the bottom (ymin) to the baseline. This is where the tricky part comes in: labels have to be staggered if they are too close together. So determine the spacing between adjacent labels (this assumes the events are sorted by date). If it is less than some amount (I picked about 4 years for data on this scale), note that the label needs to be higher. But it also has to be higher than the one after it, so use rle to get the length of each run of TRUEs (labels that must be raised) and compute an offset vector from it: each run of TRUEs counts down from its length + 1 to 2, and each FALSE gets an offset of 1. Use this to determine the tops of the bars (ymax).
events <- presidential[-(1:3),]
baseline = min(economics$unemploy)
delta = 0.05 * diff(range(economics$unemploy))
events$ymin = baseline
events$timelapse = c(diff(events$start),Inf)
events$bump = events$timelapse < 4*370 # ~4 years
offsets <- rle(events$bump)
events$offset <- unlist(mapply(function(l,v) {if(v){(l:1)+1}else{rep(1,l)}}, l=offsets$lengths, v=offsets$values, USE.NAMES=FALSE))
events$ymax <- events$ymin + events$offset * delta
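To see what the rle()/mapply() step produces, here is a small worked example on a toy logical vector. stagger_offsets is a hypothetical wrapper around the same expression; it adds SIMPLIFY = FALSE, which keeps mapply() from collapsing the result into a matrix when every run happens to have the same length.

```r
# Hypothetical wrapper around the offset computation above: each run of TRUE
# (label too close to the next one) counts down from run length + 1 to 2,
# and every FALSE gets an offset of 1.
stagger_offsets <- function(bump) {
  runs <- rle(bump)
  unlist(mapply(function(l, v) if (v) (l:1) + 1 else rep(1, l),
                l = runs$lengths, v = runs$values,
                SIMPLIFY = FALSE, USE.NAMES = FALSE))
}

bump <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
stagger_offsets(bump)
# [1] 1 3 2 1 2 1
```

Events 2 and 3 are both too close to their successors, so event 2 is raised to level 3, above event 3 at level 2, which in turn sits above event 4 at level 1.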
Putting this together into a plot:
ggplot() +
geom_line(mapping=aes(x=date, y=unemploy), data=economics , size=3, alpha=0.5) +
geom_segment(data = events, mapping=aes(x=start, y=ymin, xend=start, yend=ymax)) +
geom_point(data = events, mapping=aes(x=start,y=ymax), size=3) +
geom_text(data = events, mapping=aes(x=start, y=ymax, label=name), hjust=-0.1, vjust=0.1, size=6) +
scale_x_date("time") +
scale_y_continuous(name="unemployed [1000's]")
You could facet, but it is tricky with different scales. Another approach is composing two graphs. There is some extra fiddling that has to be done to make sure the plots have the same x-range, to make the labels all fit in the lower plot, and to eliminate the x axis in the upper plot.
xrange = range(c(economics$date, events$start))
p1 <- ggplot(data=economics, mapping=aes(x=date, y=unemploy)) +
geom_line(size=3, alpha=0.5) +
scale_x_date("", limits=xrange) +
scale_y_continuous(name="unemployed [1000's]") +
theme(axis.text.x = element_blank(), axis.title.x = element_blank())
ylims <- c(0, (max(events$offset)+1)*delta) + baseline
p2 <- ggplot(data = events, mapping=aes(x=start)) +
geom_segment(mapping=aes(y=ymin, xend=start, yend=ymax)) +
geom_point(mapping=aes(y=ymax), size=3) +
geom_text(mapping=aes(y=ymax, label=name), hjust=-0.1, vjust=0.1, size=6) +
scale_x_date("time", limits=xrange) +
scale_y_continuous("", breaks=NULL, limits=ylims)
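# The original answer used align.plots() from an old R-Forge version of ggExtra,
# which is no longer available; patchwork is swapped in here to stack and align the plots
library(patchwork)
p1 / p2 + plot_layout(heights = c(3, 1))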
Plotly is an easy way to make ggplots interactive. To display events, coerce them into factors which can be displayed as an aesthetic, like color.
The end result is a plot that you can hover the cursor over; the tooltips display the underlying data.
Here is the code for making the ggplot:
library(ggplot2)
# load data
data(presidential)
data(economics)
# events of interest
events <- presidential[-(1:3),]
# strip year from economics and events data frames
economics$year = as.numeric(format(economics$date, format = "%Y"))
# use dplyr to summarise data by year
#install.packages("dplyr")
library(dplyr)
econonomics_mean <- economics %>%
group_by(year) %>%
summarise(mean_unemployment = mean(unemploy))
# add president terms to summarized data frame as a factor
president <- c(rep(NA,14), rep("Reagan", 8), rep("Bush", 4), rep("Clinton", 8), rep("Bush", 8), rep("Obama", 7))
econonomics_mean$president <- president
# create ggplot
p <- ggplot(data = econonomics_mean, aes(x = year, y = mean_unemployment)) +
geom_point(aes(color = president)) +
geom_line(alpha = 1/3)
It only takes one line of code to make the ggplot into a plotly object.
# make it interactive!
#install.packages("plotly")
library(plotly)
ggplotly(p)
Since you are plotting a time series together with qualitative information: most economics books shade the plotting area to indicate a structural change or event in the data, so I recommend something like this:
library(ggplot2)
data(presidential)
data(economics)
ggplot() +
geom_rect(aes(xmin = start,
xmax = end,
ymin = 0, ymax = Inf,
fill = name),
data = presidential,
show.legend = FALSE) +
geom_text(aes(x = start+500,
y = 2000,
label = name,
angle = 90),
data = presidential) +
geom_line(aes(x = date, y = unemploy),
data= economics) +
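# "Blues" has only 9 colours, fewer than the number of presidents, so interpolate one shade per name
scale_fill_manual(values = colorRampPalette(RColorBrewer::brewer.pal(9, "Blues"))(length(unique(presidential$name)))) +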
labs(x = "time", y = "unemploy")
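For events that are single instants rather than intervals (closer to the log-file events in the question), a variant of the same idea, not taken from the answer above, is to mark each event with a vertical line via geom_vline() instead of a shaded rectangle. A minimal sketch with the same example data:

```r
library(ggplot2)

# presidential and economics ship with ggplot2
events <- presidential[-(1:3), ]

p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  # One dashed vertical line per event start date
  geom_vline(data = events, aes(xintercept = start),
             linetype = "dashed", colour = "grey40") +
  # Rotated event labels near the bottom of the panel
  geom_text(data = events,
            aes(x = start, y = min(economics$unemploy), label = name),
            angle = 90, hjust = 0, vjust = -0.5, size = 3,
            inherit.aes = FALSE) +
  labs(x = "time", y = "unemployed [1000's]")
p
```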
