Adding Legends in Graphs without tidy data - r

#Plot the in sample forecasts against the actual values
#Build the confidence interval
Upper95 <- fcast1 + 1.96*sqrt(Var1)
Lower95 <- fcast1 - 1.96*sqrt(Var1)
Upper80 <- fcast1 + 1.28*sqrt(Var1)
Lower80 <- fcast1 - 1.28*sqrt(Var1)
#Create a data frame
dfb <- data.frame(TeslaWeeklyPrices$Date,fcast1,TeslaWeeklyPrices$TeslaPrices,Upper95,Lower95,Upper80,Lower80)
#Make the Plot
Plot1 <- ggplot(dfb, aes(x=TeslaWeeklyPrices.Date, y=TeslaWeeklyPrices.TeslaPrices))+
geom_ribbon(data=dfb,aes(ymin=Upper95,ymax=Lower95),fill = "slategray2")+
geom_ribbon(data=dfb,aes(ymin=Upper80,ymax=Lower80),fill = "bisque")+
geom_line(data=dfb, aes(x=TeslaWeeklyPrices.Date, y=fcast1),size=1, color="red1")+
geom_point(shape = 19, fill = "white", colour = "blue" ,size = 1)+
theme_light(base_size = 11) +
ylab("Tesla Stock price ($)") + xlab("Date (weeks)")
Plot1
That is my code for my graph.
That is how it looks. I want to add legends in my graph without having to tidy my data. Because then I can not format my graph as I want.
After the useful comment I got.
Upper95 <- fcast1 + 1.96*sqrt(Var1)
Lower95 <- fcast1 - 1.96*sqrt(Var1)
Upper80 <- fcast1 + 1.28*sqrt(Var1)
Lower80 <- fcast1 - 1.28*sqrt(Var1)
dfb <- data.frame(TeslaWeeklyPrices$Date,fcast1,TeslaWeeklyPrices$TeslaPrices,Upper95,Lower95,Upper80,Lower80)
Plot1 <- ggplot(dfb, aes(x=TeslaWeeklyPrices.Date, y=TeslaWeeklyPrices.TeslaPrices))+
geom_ribbon(aes(ymin=Upper95, ymax=Lower95, fill='95% prediction level')) +
geom_ribbon(aes(ymin=Upper80, ymax=Lower80, fill='80% prediction level')) +
geom_line(data=dfb, aes(x=TeslaWeeklyPrices.Date, y=fcast1,
color="Predicted Values"),size=1)+
geom_point(shape = 19, aes(color = "Observed Values"),
fill = "white", size = 1 ,)+
scale_fill_manual(values=c('95% prediction level'='slategray2', '80% prediction level'="bisque"), breaks=c('95% prediction level', '80% prediction level')) +
scale_color_manual(values=c("Predicted Values"="red","Observed Values"= "blue"), breaks=c('Predicted Values', 'Observed Values'))+
guides(color=guide_legend(title=NULL),fill=guide_legend(title=NULL) ) +
theme(legend.margin = margin(b=0, t=-1000))+
theme_light(base_size = 12)
Plot1
That is my new code.
So how can I make my blue points appear as points in the legend and not as a line? And how can I set the margin between my two legends to 0?
Can I format the background color of this so it looks like an independent part and not as part of the graph?
That is an example I saw in one paper.

First of all, I do have a bit of an issue with the comment:
I want to add legends in my graph without having to tidy my data.
Because then I can not format my graph as I want.
Tidying your data is often the best solution to do just that (format the graph as you want), but I kind of agree it might be more straightforward in this case to just "brute force" the legend into place. I'll show you how.
Since I don't have your data, I made up my own to mirror what you shared:
set.seed(1234)
time <- 1:50
Var1 <- unlist(lapply(time, function(x) rnorm(1, 1, 0.01)))
fcast1 <- unlist(lapply(time, function(x) { x * rnorm(1, 0.1, 0.01)}))
Upper95 <- fcast1 + 1.96*sqrt(Var1)
Lower95 <- fcast1 - 1.96*sqrt(Var1)
Upper80 <- fcast1 + 1.28*sqrt(Var1)
Lower80 <- fcast1 - 1.28*sqrt(Var1)
dfb <- data.frame(time, fcast1, Upper95, Lower95, Upper80, Lower80)
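As a side note on where the multipliers come from: 1.96 and 1.28 are rounded two-sided normal quantiles. This is a small sketch that assumes approximately normal forecast errors, which the original code implicitly does; the names z95 and z80 are only illustrative.

```r
# Two-sided normal quantiles behind the interval multipliers
z95 <- qnorm(0.975)  # 95% interval leaves 2.5% in each tail
z80 <- qnorm(0.90)   # 80% interval leaves 10% in each tail

round(c(z95, z80), 2)
```

Using qnorm() directly instead of the rounded constants makes it easy to switch to another coverage level.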
And the plot:
ggplot(dfb, aes(time, fcast1)) +
geom_ribbon(aes(ymin=Upper95, ymax=Lower95), fill='slategray2') +
geom_ribbon(aes(ymin=Upper80, ymax=Lower80), fill='bisque') +
geom_line(size=1, color='red1') +
theme_light()
To create the legend without tidy data, you need to build it piece by piece, but still let ggplot2 do the work. ggplot2 creates a legend for every non-positional aesthetic (fill, color, shape, and so on) that is mapped inside aes(). Therefore, to make the legend appear, you only need to put fill and color inside the aes() part of each geom_*() function.
It's not quite that simple, but it becomes clearer once you understand how it works. The value you assign to fill= or color= inside aes() is used as the label in the legend, not as the color. You have to specify the actual colors with a scale_*() function.
ggplot(dfb, aes(time, fcast1)) +
geom_ribbon(aes(ymin=Upper95, ymax=Lower95, fill='Upper')) +
geom_ribbon(aes(ymin=Upper80, ymax=Lower80, fill='Lower')) +
geom_line(size=1, aes(color="Forecast")) +
scale_fill_manual(values=c('Upper'='slategray2', 'Lower'='bisque'), breaks=c('Upper', 'Lower')) +
scale_color_manual(values='red1') +
theme_light()
That looks more like it, but it's not perfect. Perhaps you want the line and fill boxes to appear as "one" legend instead? You can't really merge them (they span two different aesthetics, fill and color); however, we can get the same effect if we do a few things:
Remove the title for the color legend
Change the title for the fill legend
Use the theme elements and margins to move the legends closer together to look as one
Here you can see how you might do that:
p + # this is the code from above
guides(
color=guide_legend(title=NULL),
fill=guide_legend(title='Legend')
) +
theme(legend.margin = margin(b=0, t=-13))
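For reference, here is a self-contained sketch of this step on similar made-up data (the random draws are not identical to the ones above, and p, p_ok and p_reset are placeholder names). It also illustrates one detail worth checking in your own code: a complete theme such as theme_light() replaces earlier theme() settings, so theme(legend.margin = ...) needs to come after it.

```r
library(ggplot2)

# Made-up data in the same spirit as above (assumption, not the real data)
set.seed(1234)
time <- 1:50
fcast1 <- time * rnorm(50, 0.1, 0.01)
Var1 <- rnorm(50, 1, 0.01)
dfb <- data.frame(
  time, fcast1,
  Upper95 = fcast1 + 1.96 * sqrt(Var1), Lower95 = fcast1 - 1.96 * sqrt(Var1),
  Upper80 = fcast1 + 1.28 * sqrt(Var1), Lower80 = fcast1 - 1.28 * sqrt(Var1)
)

p <- ggplot(dfb, aes(time, fcast1)) +
  geom_ribbon(aes(ymin = Lower95, ymax = Upper95, fill = "Upper")) +
  geom_ribbon(aes(ymin = Lower80, ymax = Upper80, fill = "Lower")) +
  geom_line(aes(color = "Forecast")) +
  scale_fill_manual(values = c(Upper = "slategray2", Lower = "bisque"),
                    breaks = c("Upper", "Lower")) +
  scale_color_manual(values = c(Forecast = "red1")) +
  theme_light()

# Complete theme first, tweaks afterwards: the negative top margin is kept
p_ok <- p +
  guides(color = guide_legend(title = NULL),
         fill = guide_legend(title = "Legend")) +
  theme(legend.margin = margin(b = 0, t = -13))

# Adding a complete theme afterwards discards the tweak again
p_reset <- p_ok + theme_light()
```

The -13 is a hand-tuned value; it will likely need adjusting for a different base font size or legend title.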
EDIT:
OP asked if the points could appear on the chart as well. They certainly can, and you have to use a similar method to do that. You can just add color= into aes() for geom_point() like before:
ggplot(dfb, aes(time, fcast1)) +
geom_ribbon(aes(ymin=Upper95, ymax=Lower95, fill='Upper')) +
geom_ribbon(aes(ymin=Upper80, ymax=Lower80, fill='Lower')) +
geom_line(size=1, aes(color="Forecast")) +
geom_point(size=1, aes(color='Actual Values'), shape=19) +
scale_fill_manual(values=c('Upper'='slategray2', 'Lower'='bisque'), breaks=c('Upper', 'Lower')) +
scale_color_manual(values=c('Forecast'='red1', 'Actual Values'='blue')) +
theme_light() +
guides(
color=guide_legend(title=NULL),
fill=guide_legend(title='Legend')
) +
theme(legend.margin = margin(b=0, t=-13))
One small problem there... you'll notice the icon (called a "glyph") next to "Actual Values" and "Forecast" is a line + a point. I think you'd prefer to have a point be the glyph for the point and a line be the glyph for the line. With the default keys we can't get that in the same legend (they are both part of the color legend), so one fix is to separate "Actual Values" into another legend. In this case, we'll just use the shape aesthetic and have a third legend that also has no title.
ggplot(dfb, aes(time, fcast1)) +
geom_ribbon(aes(ymin=Upper95, ymax=Lower95, fill='Upper')) +
geom_ribbon(aes(ymin=Upper80, ymax=Lower80, fill='Lower')) +
geom_line(size=1, aes(color="Forecast")) +
geom_point(size=1, aes(shape='Actual Values'), color='blue') +
scale_fill_manual(values=c('Upper'='slategray2', 'Lower'='bisque'), breaks=c('Upper', 'Lower')) +
scale_color_manual(values=c('Forecast'='red1')) +
scale_shape_manual(values=19) +
theme_light() +
guides(
color=guide_legend(title=NULL),
shape=guide_legend(title=NULL),
fill=guide_legend(title='Legend')
) +
theme(legend.margin = margin(b=0, t=-13))
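As an alternative sketch, not part of the approach above: if you would rather keep both entries in the color legend, guide_legend() accepts override.aes, which lets you blank out the line for one key and the point for the other. The data here is made up and the ribbons are left out to keep the example short.

```r
library(ggplot2)

set.seed(1234)
toy <- data.frame(time = 1:50)
toy$fcast1 <- toy$time * rnorm(50, 0.1, 0.01)

p_keys <- ggplot(toy, aes(time, fcast1)) +
  geom_line(aes(color = "Forecast")) +
  geom_point(aes(color = "Actual Values"), shape = 19, size = 1) +
  scale_color_manual(
    values = c("Forecast" = "red1", "Actual Values" = "blue"),
    breaks = c("Forecast", "Actual Values")  # fixes the key order used below
  ) +
  guides(color = guide_legend(
    title = NULL,
    override.aes = list(
      linetype = c("solid", "blank"),  # line for Forecast, none for Actual Values
      shape = c(NA, 19)                # no point for Forecast, point for Actual Values
    )
  )) +
  theme_light()
```

The vectors in override.aes follow the order given in breaks, so the two need to be kept in sync.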
Now you have all the information needed to become a ggplot master :).

Related

Issue with log_2 scaling using ggplot2 and log2_trans()

I am trying to plot data using ggplot2 in R.
The datapoints occur for each 2^i-th x-value (4, 8, 16, 32,...). For that reason, I want to scale my x-Axis by log_2 so that my datapoints are spread out evenly. Currently most of the datapoints are clustered on the left side, making my plot hard to read (see first image).
I used the following command to get this image:
ggplot(summary, aes(x=xData, y=yData, colour=groups)) +
geom_errorbar(aes(ymin=yData-se, ymax=yData+se), width=2000, position=pd) +
geom_line(position=pd) +
geom_point(size=3, position=pd)
However trying to scale my x-axis with log2_trans yields the second image, which is not what I expected and does not follow my data.
Code used:
ggplot(summary, aes(x=settings.numPoints, y=benchmark.costs.average, colour=solver.name)) +
geom_errorbar(aes(ymin=benchmark.costs.average-se, ymax=benchmark.costs.average+se), width=2000, position=pd) +
geom_line(position=pd) +
geom_point(size=3, position=pd) +
scale_x_continuous(trans = log2_trans(),
breaks = trans_breaks("log2", function(x) 2^x),
labels = trans_format("log2", math_format(2^.x)))
Using scale_x_continuous(trans = log2_trans()) only doesn't help either.
EDIT:
Attached the data for reproducing the results:
https://pastebin.com/N1W0z11x
EDIT 2:
I have used the function pd <- position_dodge(1000) to avoid overlapping of my error bars, which caused the problem.
Removing the position=pd statements solved the issue
Here is a way you could format your x-axis:
# Generate dummy data
x <- 2^seq(1, 10)
df <- data.frame(
x = c(x, x, x),
y = c(0.5*x, x, 1.5*x),
z = rep(letters[seq_len(3)], each = length(x))
)
The plot of this would look like this:
ggplot(df, aes(x, y, colour = z)) +
geom_point() +
geom_line()
Adjusting the x-axis would work like so:
ggplot(df, aes(x, y, colour = z)) +
geom_point() +
geom_line() +
scale_x_continuous(
trans = "log2",
labels = scales::math_format(2^.x, format = log2)
)
The labels argument is just so you have labels in the format 2^x, you could change that to whatever you like.
I have used the function pd <- position_dodge(1000) to avoid overlapping of my error bars, which caused the problem.
Adjusting the amount of position dodge and the width of the error bars according to the new scaling solved the problem.
pd <- position_dodge(0.2) # move them .2 to the left and right
ggplot(summary, aes(x=settings.numPoints, y=benchmark.costs.average, colour=algorithm)) +
geom_errorbar(aes(ymin=benchmark.costs.average-se, ymax=benchmark.costs.average+se), width=0.4, position=pd) +
geom_line(position=pd) +
geom_point(size=3, position=pd) +
scale_x_continuous(
trans = "log2",
labels = scales::math_format(2^.x, format = log2)
)
Adding scale_y_continuous(trans="log2") yields the results I was looking for:
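The reason the dodge width matters is that position adjustments are applied after the scale transformation, so on a log2 axis a dodge of 1000 means 1000 log2 units rather than 1000 data units. A minimal sketch with hypothetical stand-in data (toy, numPoints, cost and solver are made-up names, not the pastebin data):

```r
library(ggplot2)

# Hypothetical stand-in for the benchmark summary
x <- 2^(2:10)
toy <- data.frame(
  numPoints = rep(x, 2),
  cost = c(0.5 * x, x),
  se = 0.1 * c(0.5 * x, x),
  solver = rep(c("a", "b"), each = length(x))
)

pd <- position_dodge(0.2)  # 0.2 units on the log2 scale

p_log2 <- ggplot(toy, aes(numPoints, cost, colour = solver)) +
  geom_errorbar(aes(ymin = cost - se, ymax = cost + se),
                width = 0.4, position = pd) +
  geom_line(position = pd) +
  geom_point(size = 3, position = pd) +
  scale_x_continuous(trans = "log2")

# x positions of the points as ggplot2 sees them: log2 units, slightly dodged
built_x <- layer_data(p_log2, 3)$x
range(built_x)
```

Since the x values run from 2^2 to 2^10, the built positions stay close to the range 2 to 10, which is why a dodge in the thousands pushed everything off the data.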

Viridis and ggplot2/ggmarginal

I am encountering a problem using viridis with ggplot2 and ggmarginal.
I would like to colorize dots on a Bland-Altman Plot that I am plotting with ggplot2:
diff <- (a1$A1_phones - a1$A1_video)
diffp <- (a1$A1_phones - a1$A1_video)/a1$A1_video*100
sd.diff <- sd(diff)
sd.diffp <- sd(diffp)
my.data <- data.frame(a1$A1_video, a1$A1_phones, diff, diffp)
dev.off()
diffplot <- ggplot(my.data, aes(a1$A1_video, diff)) +
geom_point(size=2, colour = rgb(0,0,0, alpha = 0.5)) +
theme_bw() +
#when the +/- 2SD lines will fall outside the default plot limits
#Thanks to commenter for noticing this.
ylim(mean(my.data$diff) - 7*sd.diff, mean(my.data$diff) + 7*sd.diff) +
geom_hline(yintercept = 0, linetype = 3) +
geom_hline(yintercept = mean(my.data$diff)) +
geom_hline(yintercept = mean(my.data$diff) + 2*sd.diff, linetype = 2) +
geom_hline(yintercept = mean(my.data$diff) - 2*sd.diff, linetype = 2) +
ylab("Difference Video vs Algorithm [ms]") +
xlab("Average of Video vs Algorithm [ms]")
p<-ggMarginal(diffplot, type="histogram", bins = 40)+ scale_colour_viridis_d()
It would now be very beautiful to colorize the dots from A1_video differently than those from A1_phones and have viridis drawing a continuous density plot.
I am not sure if this is what you want; please try to be more specific and provide sample data. If you just want the color to change based on another column in the source dataset, it must be specified inside the aes() function:
diffplot <- ggplot(my.data,aes(col=a1$A1_video))
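A slightly fuller sketch of that idea, assuming the data has a grouping column to colour by (the toy data frame and its device column are hypothetical, since no sample data was posted). The viridis scale is added to the ggplot object itself, before it is handed to ggMarginal(), because ggMarginal() returns a drawn object rather than a ggplot that further scales can be added to.

```r
library(ggplot2)

# Hypothetical data: one row per measurement pair plus a grouping column
set.seed(1)
toy <- data.frame(
  avg = runif(100, 200, 400),
  diff = rnorm(100, 0, 15),
  device = sample(c("phone_a", "phone_b"), 100, replace = TRUE)
)

p_ba <- ggplot(toy, aes(avg, diff, colour = device)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_hline(yintercept = mean(toy$diff)) +
  geom_hline(yintercept = mean(toy$diff) + c(-2, 2) * sd(toy$diff),
             linetype = 2) +
  scale_colour_viridis_d() +  # discrete viridis palette, set on the ggplot
  theme_bw()

# With ggExtra installed, the marginals can then be grouped by the same column:
# ggExtra::ggMarginal(p_ba, type = "histogram", bins = 40, groupColour = TRUE)
```

Referring to columns by name inside aes() (avg, diff) rather than with a1$... also keeps the plot tied to the data frame passed to ggplot().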

Hide legend elements in ggplot2

I am trying to plot the parameter estimates and levels of hierarchy from a stan model output. For the legend, I am hoping to remove all labels except for the "Overall Effects" label but I can't figure out how to remove all of the species successfully.
Here is the code:
ggplot(dfwide, aes(x=Estimate, y=var, color=factor(sp), size=factor(rndm),
alpha=factor(rndm))) +
geom_point(position =pd) +
geom_errorbarh(aes(xmin=(`2.5%`), xmax=(`95%`)), position=pd,
size=.5, height = 0, width=0) +
geom_vline(xintercept=0) +
scale_colour_manual(values=c("blue", "red", "orangered1","orangered3", "sienna4",
"sienna2", "green4", "green3", "purple2", "magenta2"),
labels=c("Overall Effects", expression(italic("A. pensylvanicum"),
italic("A. rubrum"), italic("A. saccharum"),
italic("B. alleghaniensis"), italic("B. papyrifera"),
italic("F. grandifolia"), italic("I. mucronata"),
italic("P. grandidentata"), italic("Q. rubra")))) +
scale_size_manual(values=c(3, 1, 1, 1, 1, 1, 1, 1, 1, 1)) +
scale_shape_manual(labels="", values=c("1"=16,"2"=16)) +
scale_alpha_manual(values=c(1, 0.4)) + guides(size=FALSE, alpha=FALSE) +
ggtitle(label = "A.") +
scale_y_discrete(limits = rev(unique(sort(dfwide$var))), labels=estimates) +
ylab("") +
labs(col="Effects") + theme(legend.title=element_blank())
The key point is that ggplot2 draws with grid (as lattice does), so once the plot has been drawn you can remove individual legend elements by working with grid directly.
Three functions are needed: grid.force(), grid.ls() and grid.remove(). After drawing the plot with ggplot2, call grid.force() and grid.ls() to list every element in the picture (points, lines, text, and so on). Finding the elements you are interested in is an interactive process, because the element names ggplot2 generates are combinations of numbers and text and are not always meaningful. Once you have identified the names, use grid.remove() to remove those elements. Below is the sample code I made.
library(grid)
library(ggplot2)
set.seed(1)
data <- data.frame(x = rep(1:10, 2), y = sample(1:100, 20),
type = sample(c("A", "B"), 20, replace = TRUE))
ggplot(data, aes(x = x, y =y,color = type))+
geom_point()+
geom_line()+
scale_color_manual(values = c("blue", "darkred"))+
theme_bw()
At this point the whole picture has been drawn; next we remove some of its elements.
grid.force()
grid.ls()
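grid.ls() lists all the element names; pick the legend keys and labels you want to drop and pass them to grid.remove(). The names below are from my session and may differ in yours.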
grid.remove("key-4-1-1.5-2-5-2")
grid.remove("key-4-1-2.5-2-5-2")
grid.remove("label-4-3.5-4-5-4")
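As an alternative sketch that stays inside ggplot2: the breaks argument of a discrete scale controls which entries appear in the legend, while a named values vector still colours every level. This uses made-up data like the example above; treat it as one possible approach rather than the method of this answer.

```r
library(ggplot2)

set.seed(1)
toy <- data.frame(
  x = rep(1:10, 2),
  y = sample(1:100, 20),
  type = rep(c("A", "B"), each = 10)
)

p_breaks <- ggplot(toy, aes(x = x, y = y, color = type)) +
  geom_point() +
  geom_line() +
  scale_color_manual(
    values = c("A" = "blue", "B" = "darkred"),  # named, so both levels keep their colour
    breaks = "A"                                # only "A" is listed in the legend
  ) +
  theme_bw()
```

Naming the values matters here: with unnamed values, ggplot2 matches them to the breaks in order, which would leave "B" without a colour.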
It's not perfect, but my solution would be to actually make two plots and combine them together. See this post where I lifted the extraction code from.
I don't have your data, but I think you will get the idea below:
library(ggplot2)
library(gridExtra)
library(grid)
#g_legend credit goes to https://stackoverflow.com/a/11886071/2060081
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
p_legend = ggplot(dfwide[sp=='Overall Effects'], aes(x=Estimate, y=var, color=factor(sp),
size=factor(rndm),
alpha=factor(rndm))) +
geom_point(position =pd) +
geom_errorbarh(aes(xmin=(`2.5%`), xmax=(`95%`)), position=pd,
size=.5, height = 0, width=0) +
geom_vline(xintercept=0) +
scale_colour_manual(values=c("blue"),
labels=c("Overall Effects")) +
scale_size_manual(values=c(3)) +
scale_shape_manual(labels="", values=c("1"=16,"2"=16)) +
scale_alpha_manual(values=c(1, 0.4)) + guides(size=FALSE, alpha=FALSE) +
ggtitle(label = "A.") +
scale_y_discrete(limits = rev(unique(sort(dfwide$var))), labels=estimates) +
ylab("") +
labs(col="Effects") + theme(legend.title=element_blank())
p_legend = g_legend(p_legend)
One of your plots will just be the legend. Subset your data based on the Overall Effects and then plot the two plots together as a grid.
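A minimal sketch of the combining step on made-up data. The helper extract_legend() is a hypothetical variant of g_legend() that looks the legend up by layout name, on the assumption that this is more tolerant of ggplot2 versions that name the legend slots "guide-box-right", "guide-box-left" and so on; the gridExtra part only runs if that package is installed.

```r
library(ggplot2)

# Hypothetical helper: return the first non-empty legend grob of a plot
extract_legend <- function(p) {
  gt <- ggplotGrob(p)
  idx <- grep("guide-box", gt$layout$name)
  grobs <- gt$grobs[idx]
  is_empty <- vapply(grobs, function(g) inherits(g, "zeroGrob"), logical(1))
  grobs[!is_empty][[1]]
}

set.seed(1)
toy <- data.frame(
  x = rep(1:10, 2),
  y = sample(1:100, 20),
  type = rep(c("Overall Effects", "Species"), each = 10)
)
cols <- c("Overall Effects" = "blue", "Species" = "red")

# Main plot: all groups, no legend
main_plot <- ggplot(toy, aes(x, y, color = type)) +
  geom_point() +
  scale_color_manual(values = cols) +
  theme(legend.position = "none")

# Legend donor: only the subset whose label should be shown
legend_plot <- ggplot(subset(toy, type == "Overall Effects"),
                      aes(x, y, color = type)) +
  geom_point() +
  scale_color_manual(values = cols) +
  theme(legend.title = element_blank())

leg <- extract_legend(legend_plot)

if (requireNamespace("gridExtra", quietly = TRUE)) {
  combined <- gridExtra::arrangeGrob(ggplotGrob(main_plot), leg,
                                     ncol = 2, widths = c(4, 1))
  # grid::grid.newpage(); grid::grid.draw(combined)  # draw it
}
```

The widths are a starting point; adjust them to give the legend as much room as its label needs.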

Scale geom_density to match geom_bar with percentage on y

Since I was confused about the math last time I tried asking this, here's another try. I want to combine a histogram with a smoothed distribution fit. And I want the y axis to be in percent.
I can't find a good way to get this result. Last time, I managed to find a way to scale the geom_bar to the same scale as geom_density, but that's the opposite of what I wanted.
My current code produces this output:
ggplot2::ggplot(iris, aes(Sepal.Length)) +
geom_bar(stat="bin", aes(y=..density..)) +
geom_density()
The density and bar y values match up, but the scaling is nonsensical. I want percentage on the y axis, not, well, the density.
Some new attempts. We begin with a bar plot modified to show percentages instead of counts:
gg = ggplot2::ggplot(iris, aes(Sepal.Length)) +
geom_bar(aes(y = ..count../sum(..count..))) +
scale_y_continuous(name = "%", labels=scales::percent)
Then we try to add a geom_density to that and somehow get it to scale properly:
gg + geom_density()
gg + geom_density(aes(y=..count..))
gg + geom_density(aes(y=..scaled..))
gg + geom_density(aes(y=..density..))
Same as the first.
gg + geom_density(aes(y = ..count../sum(..count..)))
gg + geom_density(aes(y = ..count../n))
Seems to be off by about factor 10...
gg + geom_density(aes(y = ..count../n/10))
same as:
gg + geom_density(aes(y = ..density../10))
But ad hoc inserting numbers seems like a bad idea.
One useful trick is to inspect the calculated values of the plot. These are not normally saved in the object if one saves it. However, one can use:
gg_data = ggplot_build(gg + geom_density())
gg_data$data[[2]] %>% View
Since we know the density fit around x=6 should be about .04 (4%), we can look around for ggplot2-calculated values that get us there, and the only thing I see is density/10.
How do I get geom_density fit to scale to the same y axis as the modified geom_bar?
Bonus question: why are the grouping of the bars different? The current function does not have spaces in between bars.
Here is an easy solution:
library(scales) # ! important
library(ggplot2)
ggplot(iris, aes(Sepal.Length)) +
stat_bin(aes(y=..density..), breaks = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), by = .1), color="white") +
geom_line(stat="density", size = 1) +
scale_y_continuous(labels = percent, name = "percent") +
theme_classic()
Output:
Try this
ggplot2::ggplot(iris, aes(x=Sepal.Length)) +
geom_histogram(stat="bin", binwidth = .1, aes(y=..density..)) +
geom_density()+
scale_y_continuous(breaks = c(0, .1, .2,.3,.4,.5,.6),
labels =c ("0", "1%", "2%", "3%", "4%", "5%", "6%") ) +
ylab("Percent of Irises") +
xlab("Sepal Length in Bins of .1 cm")
I think your first example is what you want, you just want to change the labels to make it seem like it is percents, so just do that rather than mess around.
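On where the "factor 10" comes from: a density integrates to 1 over x, so density multiplied by the bin width approximates the share of observations per bin, and with bins of 0.1 that is the same as dividing by 10. A sketch that makes the factor explicit instead of hard-coding it (assumes ggplot2 3.3.0 or later for after_stat(); bw is an illustrative name):

```r
library(ggplot2)

bw <- 0.1  # bin width in cm

p_pct <- ggplot(iris, aes(Sepal.Length)) +
  # bar heights are shares of all observations, so they sum to 1
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = bw, colour = "white") +
  # rescale the density curve to "share per bin"
  geom_density(aes(y = after_stat(density * bw))) +
  scale_y_continuous(name = "Percent of irises", labels = scales::percent)
```

Changing bw rescales both layers together, so the curve and the bars stay on the same percentage axis.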

overlaying plots in ggplot2

How to overlay one plot on top of the other in ggplot2 as explained in the following sentences? I want to draw the grey time series on top of the red one using ggplot2 in R (now the red one is above the grey one and I want my graph to be the other way around). Here is my code (I generate some data in order to show you my problem, the real dataset is much more complex):
install.packages("ggplot2")
library(ggplot2)
time <- rep(1:100,2)
timeseries <- c(rep(0.5,100),rep(c(0,1),50))
upper <- c(rep(0.7,100),rep(0,100))
lower <- c(rep(0.3,100),rep(0,100))
legend <- c(rep("red should be under",100),rep("grey should be above",100))
dataset <- data.frame(timeseries,upper,lower,time,legend)
ggplot(dataset, aes(x=time, y=timeseries)) +
geom_line(aes(colour=legend, size=legend)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_colour_manual(limits=c("grey should be above","red should be under"),values = c("grey50","red")) +
scale_fill_manual(values = c(NA, "red")) +
scale_size_manual(values=c(0.5, 1.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Convert the data you are grouping on into a factor and explicitly set the order of the levels. ggplot draws the layers according to this order. Also, it is a good idea to group the scale_manual codes to the geom it is being applied to for readability.
legend <- factor(legend, levels = c("red should be under","grey should be above"))
c <- data.frame(timeseries,upper,lower,time,legend)
ggplot(c, aes(x=time, y=timeseries)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_fill_manual(values = c("red", NA)) +
geom_line(aes(colour=legend, size=legend)) +
scale_colour_manual(values = c("red","grey50")) +
scale_size_manual(values=c(1.5,0.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Note that the order of the values in each scale_*_manual() call now follows the factor levels: "red should be under" first, then "grey should be above".
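To see why the original plot came out the other way around, it helps to compare the default factor levels with the explicit ones. A small sketch using the same labels (the variable names are illustrative):

```r
legend <- c(rep("red should be under", 100), rep("grey should be above", 100))

# Default: levels are sorted alphabetically, so the grey group comes first
# and the red group is drawn last (on top)
default_levels <- levels(factor(legend))

# Explicit: red first, grey last, so grey ends up on top
explicit_levels <- levels(
  factor(legend, levels = c("red should be under", "grey should be above"))
)

default_levels
explicit_levels
```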
