ggplot scale_fill_discrete(breaks = user_countries) creates a second, undesired legend - r

I am trying to change the factor level ordering of a data frame column to control the legend order and the ggplot coloring of the factor levels, which are country names. Here is my data frame country_hours:
  countries hours
1    Brazil    17
2    Mexico    13
3    Poland    20
4 Indonesia     2
5    Norway    20
6    Poland    20
Here is how I try to plot subsets of the data frame depending on a list of selected countries, user_countries:
library(ggplot2)
library(scales) # provides percent() for the y-axis labels
make_country_plot<-function(user_countries, country_hours_pre)
{
country_hours = country_hours_pre[which(country_hours_pre$countries %in% user_countries) ,]
country_hours$countries = factor(country_hours$countries, levels = c(user_countries))
p = ggplot(data=country_hours, aes(x=hours, color=countries))
for(name in user_countries){
p = p + geom_bar( data=subset(country_hours, countries==name), aes(y = (..count..)/sum(..count..), fill=countries), binwidth = 1, alpha = .3)
}
p = p + scale_y_continuous(labels = percent) + geom_density(size = 1, aes(color=countries), adjust=1) +
ggtitle("Baltic countries") + theme(plot.title = element_text(lineheight=.8, face="bold")) + scale_fill_discrete(breaks = user_countries)
p # return the plot visibly
}
This works in that the coloring follows my desired order, as does the top legend, but a second legend appears and shows a different order. Without scale_fill_discrete(breaks = user_countries) I do not get my desired order, but I also do not get two legends. In my plot, the desired order, given by user_countries, was
user_countries = c("Lithuania", "Latvia", "Estonia")
I'd like to get rid of this second legend. How can I do it?
I also have another problem: the plotting/coloring is inconsistent between plots. I'd like the "first" country to always be blue, but it isn't always blue. Also, the 'real' legend (darker/solid colors) is not always in the same position; sometimes it's below the incorrect/black legend. Why does this happen, and how can I make this consistent across plots?
Also, different plots have different numbers of factor groups, sometimes more than 9, so I'd rather stick with the standard ggplot colors, since most solutions for defining your own colors seem limited in how many colors they support (How to assign colors to categorical variables in ggplot2 that have stable mapping?)

You are mapping to two different aesthetics (color and fill), but you changed the scale specification for only one of them. Doing this will always split a previously combined legend. There is a nice example of this on this page.
To keep your legends combined, you'll want to add scale_color_discrete(breaks = user_countries) in addition to scale_fill_discrete(breaks = user_countries).
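A minimal sketch of the combined-legend fix, assuming invented data that only mimics the structure of country_hours (the Baltic hour values are made up) and ggplot2 3.3.0 or later for after_stat(). To keep it short, the per-country loop of geom_bar() layers is swapped for a single geom_histogram() layer; the relevant part is that the fill and colour scales receive identical breaks, so ggplot2 can merge them into one legend:

```r
library(ggplot2)

# Invented example data; only the structure mirrors country_hours.
set.seed(42)
user_countries <- c("Lithuania", "Latvia", "Estonia")
country_hours <- data.frame(
  countries = factor(rep(user_countries, each = 50), levels = user_countries),
  hours = round(c(rnorm(50, 10, 3), rnorm(50, 14, 3), rnorm(50, 18, 3)))
)

p <- ggplot(country_hours, aes(x = hours, fill = countries, colour = countries)) +
  # Semi-transparent per-country histograms on the density scale
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 alpha = 0.3, position = "identity") +
  # Density outlines only (fill = NA overrides the inherited fill mapping)
  geom_density(fill = NA) +
  # Identical breaks on both scales, so the two legends stay combined
  scale_fill_discrete(breaks = user_countries) +
  scale_colour_discrete(breaks = user_countries)
p
```

Legends only merge when title, breaks and labels all match; here both titles default to "countries".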

I don't have enough reputation to comment, but this previous question has a comprehensive answer.
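Short answer: stop mapping countries to color, so that fill is the only scale that produces a legend. You can't simply move color = countries outside aes(), because arguments outside aes() must be constants, and R will then look for an object called countries and fail. Instead, remove color = countries from aes() (in ggplot() as well as in geom_density(), since every layer inherits the top-level mapping) and keep one curve per country with group:
geom_density(aes(group = countries), size = 1, adjust = 1)
(This should work, but I don't have an example to confirm.)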
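A sketch of this single-legend approach, again assuming invented data in the shape of country_hours and ggplot2 3.3.0 or later. countries is mapped only to fill, and geom_density() gets an explicit group instead of a colour mapping, so only one legend is drawn (the density curves use the default line colour):

```r
library(ggplot2)

# Invented example data with the same structure as country_hours.
set.seed(42)
user_countries <- c("Lithuania", "Latvia", "Estonia")
country_hours <- data.frame(
  countries = factor(rep(user_countries, each = 50), levels = user_countries),
  hours = round(c(rnorm(50, 10, 3), rnorm(50, 14, 3), rnorm(50, 18, 3)))
)

# Only fill is mapped, so only a fill scale (and one legend) exists.
p <- ggplot(country_hours, aes(x = hours, fill = countries)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 alpha = 0.3, position = "identity") +
  # group keeps one density curve per country; fill = NA drops the
  # inherited fill mapping so the curves are drawn as outlines only
  geom_density(aes(group = countries), fill = NA)
p
```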

Related

How to add legend to plot with data from multiple data frames

I have scripted a ggplot compiled from two separate data frames, but as it stands there is no legend as the colours aren't included in aes. I'd prefer to keep the two datasets separate if possible, but can't figure out how to add the legend. Any thoughts?
I've tried adding the colours directly to the aes function, but then colours are just added as variables and listed in the legend instead of colouring the actual data.
When plotting this with base R, after creating the plot I would have used:
legend("top", c("Delta 18O", "Delta 13C"), fill = c("red", "blue"))
and gotten what I needed, but I'm not sure how to replicate this in ggplot.
The following code currently plots exactly what I want; it's just missing the legend, which ideally should match what the above line would produce, except that the "18" and "13" need to be superscripted.
Examples of an old plot made with base R (with a correct legend, apart from the missing superscripts on 13 and 18) and of the current plot without a legend can be found here:
Old: https://imgur.com/xgd9e9C
New, missing legend: https://imgur.com/eGRhUzf
Background data
head(avar.data.x)
      time          av       error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470
head(avar.data.y)
      time         av       error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470
The avarn function used below produces a data frame with three columns and several thousand rows (see the head() output above). These are then plotted over time on a log-log plot.
avar.data.x <- avarn(data3$"d Intl. Std:d 13C VPDB - Value",frequency)
avar.data.y <- avarn(data3$"d Intl. Std:d 18O VPDB-CO2 - Value",frequency)
Create Allan deviation plot
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av)),color="red")+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av)),color="blue")+
scale_x_log10()+
scale_y_log10()+
labs(x=expression(paste("Averaging Time ",tau," (seconds)")),y="Allan Deviation (per mil)")
The above plot is only missing a legend to show the name of the two plotted datasets and their respective colours. I would like the legend in the top centre of the graph.
How do I superscript the legend labels? This is my attempt:
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av),
color =expression(paste("Delta ",""^18,"O"))))+
geom_line(data=avar.data.xmod,aes(x=time,y=sqrt(av),
color=expression(paste("Delta ",""^13,"C"))))+
scale_color_manual(values = c("blue", "red"),name=NULL) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
Setting color inside aes() and adding a scale_color_*() function to your plot should do the trick.
library(ggplot2)
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av), color = "a"))+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av), color="b"))+
scale_color_manual(
values = c("red", "blue"),
labels = expression(paste("Delta ", ""^18, "O"), paste("Delta ", ""^13, "C"))
) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
You can make better use of ggplot's aesthetics by combining both data sets into one. This is particularly easy when your data frames have the same structure. You can then map the data set identifier to an aesthetic such as color.
This way you only need one call to geom_line, and it is easier to control the legend(s). You could even write a small function to automate your labels.
Also note that spaces in column names are not great (you're making your own life very difficult), and that you may want to automate your avarn calls, e.g. with lapply, which returns a list of data frames and makes binding them together even easier.
avar.data.x <- readr::read_table("0 time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470")
avar.data.y <- readr::read_table("0 time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470")
library(tidyverse)
combine_df <- bind_rows(list(a = avar.data.x, b = avar.data.y), .id = 'ID')
ggplot(combine_df)+
geom_line(aes(x = time, y = sqrt(av), color = ID))+
scale_color_manual(values = c("blue", "red"),
labels = expression(paste("Delta ", ""^13, "C"), paste("Delta ", ""^18, "O")))
Created on 2019-11-11 by the reprex package (v0.2.1)
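The lapply() suggestion could look roughly like this. The question's avarn() is not shown, so avarn_stub() below is a hypothetical placeholder that only returns a data frame with the same three columns (it is not an Allan variance calculation), and the input vectors are random numbers rather than isotope data. Naming the list elements is what lets bind_rows(.id = "ID") label each row:

```r
library(dplyr)

# Hypothetical placeholder for the question's avarn(); NOT the real
# Allan variance calculation, it only mimics the output columns.
avarn_stub <- function(x, frequency) {
  time <- seq_len(5) / frequency
  data.frame(time = time, av = var(x) / time, error = sqrt(time) / 100)
}

# Invented input vectors, one per measured column.
set.seed(1)
measurements <- list(d13C = rnorm(100), d18O = rnorm(100))

# lapply() keeps the list names, which bind_rows() turns into the ID column.
avar_list <- lapply(measurements, avarn_stub, frequency = 1)
combine_df <- bind_rows(avar_list, .id = "ID")
head(combine_df)
```

The resulting combine_df has the same layout as the one built by hand above, so the single geom_line() call with color = ID works unchanged.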

ggplot2 multi-variable scatterplot, Changing Labels and View in Margins

I am trying to create a scatterplot based on four values. My data is just lists of prices (BASIC, VALUE, DELUXE, ULTIMATE). I want VALUE and DELUXE to be the two axes (x, y), and the size and color of the points to represent the data in the other two columns.
It is hard to set up a reproducible example, because the problem only appears when there are a lot of values. I have about 300 points, with about 30 different color/value labels (for ULTIMATE) and 20 size/value labels (for BASIC).
gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1)
plot(gg)
My code does this well, and lists the colors/sizes with their corresponding values at the side. This is great, but I would like to change how the legend is displayed so that it is not cut off: either "wrap" the values into more columns, or shrink them so that they fit.
Currently, this lists ULTIMATE in three columns to the right of the plot area, but the top of the labels is cut off (the legend extends well above the plot area).
This lists the BASIC size/value labels to the right of the plot area, below the ULTIMATE labels, in one column, so about half of them are cut off at the bottom.
I can increase the margins with:
gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1) +theme(plot.margin = unit(c(4,2,4,2), "cm"))
plot(gg)
This gets more of it in, but creates lots of white space and a smaller view of the plot. I would like to be able to increase just the right margin if necessary, and "wrap" the labels into more columns extending to the right (i.e. put ULTIMATE into 4 columns instead of 3, and BASIC into 3-4 columns instead of 1), so that they are shorter and don't run out of the plot area.
I found some built-in functionality that does this. You add guides() to the plot, pick the color or size legend, and set the number of columns with guide_legend(ncol = ...) (you can also specify rows with nrow). The order argument ranks the legends, i.e. controls which one is drawn first, so my resulting code was:
gg <- ggplot(d, aes(x=DELUXE_PRICE, y=VALUE_PRICE,color=ULTIMATE_PRICE,size=BASIC_PRICE)) + geom_point(alpha = 1) + guides(color = guide_legend(order = 1,ncol = 4),size = guide_legend(order = 2,ncol = 4))
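A self-contained sketch of the guides() fix, assuming invented prices and assuming ULTIMATE and BASIC are discrete (factors), since a continuous variable would get a colour bar rather than a listed legend. ggplot2 will warn that mapping size to a discrete variable is not advised; the plot is still drawn. theme(legend.box = "horizontal") is an optional extra that places the two legends side by side instead of stacking them:

```r
library(ggplot2)

# Invented prices; ULTIMATE and BASIC are factors so they get listed legends.
set.seed(7)
d <- data.frame(
  DELUXE_PRICE = runif(300, 10, 50),
  VALUE_PRICE = runif(300, 5, 30),
  ULTIMATE_PRICE = factor(sample(60:89, 300, replace = TRUE)),
  BASIC_PRICE = factor(sample(1:20, 300, replace = TRUE))
)

gg <- ggplot(d, aes(x = DELUXE_PRICE, y = VALUE_PRICE,
                    color = ULTIMATE_PRICE, size = BASIC_PRICE)) +
  geom_point(alpha = 1) +
  # ncol wraps each legend into columns; order puts colour before size
  guides(color = guide_legend(order = 1, ncol = 4),
         size = guide_legend(order = 2, ncol = 4)) +
  # Optional: lay the two legends out side by side
  theme(legend.box = "horizontal")
gg
```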

ggplot 2 - geom_bar - represent 2 factor variables as fill= and colour =

I'm trying to use geom_bar, and I want to use colors to represent two different variables (both factors).
I want to use color = Breakfast or Lunch and fill = Restaurant.
However, I end up with similar colors, because each factor has only two levels and both scales use the same default palette.
Can anyone help me set different colors for the inside and outside of the bars, and also make the outline thicker?
ggplot(aes(x=Item, y= `Sugars (g)`, fill = Restaurant, color =Breakfast_lunch)) + geom_col()
Thank you!
So you want to have the fill colour of the bars (the fill aesthetic) tied to one variable, Restaurant, and the outline/stroke colour (the colour or color aesthetic) tied to another one, Breakfast_lunch? In that case, you just want to change the actual colours used for each aesthetic (since they have the same defaults).
As @Jack Brookes suggests, you can use scale_fill_manual and scale_colour_manual to change these. You can give them a named vector for the values argument: the names are the possible values of your variables, and the values are the colours you want them to appear as. For example:
ggplot(mydata, aes(x = Item, y = `Sugars (g)`, fill = Restaurant, color = Breakfast_lunch)) +
geom_col() +
scale_fill_manual(values = c("Burger King" = "red", "McDonalds" = "blue")) +
scale_colour_manual(values = c("Breakfast" = "green", "Lunch" = "black"))
(those colours are hideous, though, so maybe choose better ones :P)
The colours you supply to values don't actually have to be named: if you don't provide the names (the values in your data), ggplot2 assigns the colours in the order of the variable's levels (alphabetical order for a character column). So if you just care about having different colours, you don't need to stress about that bit.
Also note, as @neilfws points out, that your original ggplot call is missing a data argument (your data frame) at the start. That's pretty important!
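A runnable sketch that also covers the thicker outline from the question, assuming invented menu data (the items and sugar values are made up) and ggplot2 3.4.0 or later, where outline thickness is set with linewidth; older versions used size for this instead:

```r
library(ggplot2)

# Invented example rows; check.names = FALSE keeps the column name `Sugars (g)`.
mydata <- data.frame(
  Item = c("Pancakes", "Muffin", "Burger", "Fries"),
  `Sugars (g)` = c(14, 22, 9, 0),
  Restaurant = c("Burger King", "McDonalds", "Burger King", "McDonalds"),
  Breakfast_lunch = c("Breakfast", "Breakfast", "Lunch", "Lunch"),
  check.names = FALSE
)

p <- ggplot(mydata, aes(x = Item, y = `Sugars (g)`,
                        fill = Restaurant, colour = Breakfast_lunch)) +
  # linewidth thickens the bar outlines (ggplot2 >= 3.4.0)
  geom_col(linewidth = 1.5) +
  # Named vectors tie each data value to a specific colour
  scale_fill_manual(values = c("Burger King" = "#E69F00", "McDonalds" = "#56B4E9")) +
  scale_colour_manual(values = c("Breakfast" = "grey20", "Lunch" = "#D55E00"))
p
```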

r ggplot change legend order to match final order of data

I have a dataframe which has a set of manufacturers and collected data for those manufacturers. The list of manufacturers and/or the attribute data can change, depending on the run.
I display this as a line chart in ggplot, but what I want is to have the legend order match the 'up/down' order of the final year of data. So for this chart:
[Image: chart showing the default legend order]
I'd like to see the legend order (and color) be Yoyodyne (purple), Widget (green), Wonka (blue) and Acme (red).
I can't (or don't think I can) use scale_color_manual, because from one model run to the next the end order (in 2032) and/or the list of manufacturers may differ.
The code for the chart is (the last part, pz, just simplifies the x-axis display):
px <- ggplot(bym, aes(x=Model.Year, y=AverageCost, colour=Manufacturer))
py <- px + ggtitle("MyChart") + labs(x="Year", y="Foo") + geom_line(size=0.5) + geom_point()
pz <- py + scale_x_continuous(breaks=c(min(bym$Model.Year),max(bym$Model.Year)))
pz
You can set the order of the legend entries by using the dplyr::mutate function together with the factor function. To set the colors in the order you want, create a vector with your desired colors in that order and pass it to scale_color_manual. I have done this in the example below. Mine looks a little different than yours because I removed the intermediate assignments.
library(dplyr)
library(ggplot2)
bym <- data.frame(
Model.Year = rep(seq(2016, 2030, 1), 4),
AverageCost = rnorm(60),
Manufacturer = rep(c("Yoyodyne", "Widget", "Wonka", "Acme"), each = 15)
)
my_colors <- c("purple", "green", "blue", "red")
bym %>%
mutate(Manufacturer = factor(Manufacturer,
levels = c("Yoyodyne", "Widget", "Wonka", "Acme"))) %>%
ggplot(aes(x=Model.Year, y=AverageCost, colour=Manufacturer)) +
ggtitle("MyChart") +
labs(x="Year", y="Foo") +
geom_line(size=0.5) +
geom_point()+
scale_x_continuous(breaks=c(min(bym$Model.Year),max(bym$Model.Year))) +
scale_color_manual(values = my_colors)
Have you tried setting the levels for Manufacturer according to the last year? For example, you can add a column with levels set this way:
# order Manufacturer by AverageCost in the last year
last_year = bym[bym$Model.Year == 2032, ]
colours = last_year[order(-last_year$AverageCost), ]$Manufacturer
# add factor with levels ordered by colours
bym$Colour = factor(bym$Manufacturer, levels=as.character(colours))
Then use Colour for your colour aesthetic.
EDIT: That is, if you want to stick to base R. The answer with dplyr::mutate is much easier to use.
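A sketch that combines both answers so nothing is hard-coded, assuming invented data in the shape of bym. It ranks manufacturers by their cost in whatever the final year in the data is (instead of a fixed 2032), so the legend order follows the line ends even when the years or the list of manufacturers change between model runs:

```r
library(ggplot2)

# Invented data in the shape of bym; years run to 2032 as in the question.
set.seed(1)
bym <- data.frame(
  Model.Year = rep(2016:2032, times = 4),
  AverageCost = rnorm(68),
  Manufacturer = rep(c("Acme", "Widget", "Wonka", "Yoyodyne"), each = 17),
  stringsAsFactors = FALSE
)

# Rank manufacturers by their cost in the final year present (highest first),
# so the legend reads top to bottom like the line ends.
last_year <- bym[bym$Model.Year == max(bym$Model.Year), ]
legend_order <- last_year$Manufacturer[order(-last_year$AverageCost)]
bym$Manufacturer <- factor(bym$Manufacturer, levels = legend_order)

p <- ggplot(bym, aes(x = Model.Year, y = AverageCost, colour = Manufacturer)) +
  geom_line() +
  geom_point()
p
```

With the default hue scale, colours are assigned in level order, so the top line always gets the first colour. If you use forcats, fct_reorder2(Manufacturer, Model.Year, AverageCost) is intended for this kind of line-chart legend and may be a shorter alternative.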

How to recycle colours in a colorbrewer palette using line symbols

I'm using ggplot2 to create quite a few facet-wrapped geom_line plots.
Although each plot only has a maximum of eight lines, when taken together, there are more like twenty categories to show on the legend.
In a similar vein to this:
Recommend a scale colour for 13 or more categories
and this:
In R, how do I change the color value of just one value in ggplot2's scale_fill_brewer?
I'd like to artificially increase the number of colours I can show using ColorBrewer's high-contrast colour sets.
An obvious way to do this would seem to be to 'recycle' the colours in the palette, with a different line symbol each time. So bright red with 'x's on the line could be a different category than bright red with 'o's etc.
Can anyone think how I might do this?
Thanks!
Edit
Here's some (sanitised) data to play with, and the R code I'm using to produce my plot.
Data: http://orca.casa.ucl.ac.uk/~rob/Stack%20Overflow%20question/stack%20overflow%20colours%20question%20data.csv
R code:
csvData <- read.csv("stack overflow colours question data.csv")
p <- ggplot(csvData,
aes(year, percentage_of_output, colour=category, group=category))
p +
geom_line(size=1.2) +
labs(title = "Can I recycle the palette colours?", y = "% of output") +
scale_colour_brewer(palette = "Set1") +
theme(plot.title = element_text(size = rel(1.5))) +
facet_wrap("country_iso3", scales="free_y")
I made a data frame containing 20 groups (labelled with letters).
df<-data.frame(group=rep(c(LETTERS[1:20]),each=5),x=rep(1:5,times=20),y=1:100)
You can use scale_colour_manual() to set the line colours: in the example I used five Set1 colours and repeated them four times (20 in total). To set shapes, I added geom_point() and scale_shape_manual() with four different shapes, each repeated five times (again 20 in total), so every group gets a unique colour/shape combination.
library(ggplot2)
library(RColorBrewer)
ggplot(df,aes(x,y,colour=group))+geom_line()+geom_point(aes(shape=group),size=5)+
scale_colour_manual(values=rep(brewer.pal(5,"Set1"),times=4))+
scale_shape_manual(values=rep(c(15,16,17,18),each=5))
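A slightly more general sketch of the same recycling idea, assuming the toy data above and an arbitrary choice of five Set1 colours and five shapes (15, 16, 17, 18, 8). rep(..., length.out = n_groups) builds vectors of exactly the right length for any number of groups up to 25 here, and the uniqueness check guards against two groups ending up with the same colour/shape pair:

```r
library(ggplot2)
library(RColorBrewer)

# Same toy data as above: 20 groups labelled A-T.
df <- data.frame(group = rep(LETTERS[1:20], each = 5),
                 x = rep(1:5, times = 20),
                 y = 1:100)

n_groups <- length(unique(df$group))
n_colours <- 5
# Cycle the palette: colours 1..5, 1..5, ...
line_colours <- rep(brewer.pal(n_colours, "Set1"), length.out = n_groups)
# Switch shape after each full cycle of colours: 15 x5, 16 x5, ...
point_shapes <- rep(c(15, 16, 17, 18, 8), each = n_colours, length.out = n_groups)

# Every colour/shape pair must be unique, otherwise two groups look identical.
stopifnot(!anyDuplicated(paste(line_colours, point_shapes)))

p <- ggplot(df, aes(x, y, colour = group, shape = group)) +
  geom_line() +
  geom_point(size = 3) +
  scale_colour_manual(values = line_colours) +
  scale_shape_manual(values = point_shapes)
p
```

Because colour and shape map the same variable with the same title and breaks, ggplot2 merges them into one legend showing both the line colour and the point symbol.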
