Order of the boxplots and legend labels with ggplot - r

I want to create a boxplot with ggplot2 and I'd like to organize the plot in the order of the data frame.
I know that R orders the boxplots alphabetically. How can I:
Organize the X axis in the order Taste - Color - Capacity
Switch the boxes, i.e. first the green and then the orange, instead of orange and then green
Switch the legend order too, first NaCl and then Al_{2}
library(ggplot2)
library(readxl)
Chemical <- rep(c("NaCl", "Al2"), times = 3, each = 4)
Quality <- rep(c("Taste", "Color of package", "Capacity"), times = 1, each = 8)
Accepted <- seq(0, 100, by = 100/23)
DF <- data.frame(Chemical, Quality, Accepted)
ggplot(DF, aes(x = Quality, y = Accepted, fill = Chemical)) +
geom_boxplot() +
scale_fill_manual(values = c("orange", "green"),
labels = expression("Al"[2], "NaCl")) +
xlab("") +
theme(legend.position = "top", legend.title = element_blank())

You have different methods to control the output. A quick solution would be:
ggplot(DF, aes(x = Quality, y = Accepted, fill = Chemical)) +
geom_boxplot() +
scale_fill_manual(values = c("green", "orange"),
labels = expression("Al"[2], "NaCl")) +
xlab("") +
theme(legend.position = "top", legend.title = element_blank()) +
guides(fill=guide_legend(reverse=TRUE)) +
scale_x_discrete(limits=c("Taste", "Color of package", "Capacity"))
The argument guides(fill = guide_legend(reverse = TRUE)) reverses the legend, swapping the values in scale_fill_manual changes which color each box gets, and scale_x_discrete(limits = ...) fixes the order on the X axis.
It is also possible to reorder the levels with DF$Quality <- factor(DF$Quality, levels = c("Taste", "Color of package", "Capacity")) and achieve the same result without using scale_x_discrete().
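If you would rather fix the order in the data than in the scales, a minimal sketch along these lines may help. It reuses the DF from the question; the named vector fill_cols is an illustrative name that is not in the original post, and it assumes NaCl is meant to be the green box. Setting the factor levels of both Quality and Chemical controls the X axis order, the order of the boxes within each group and the legend order at once.

```r
library(ggplot2)

# Data from the question
Chemical <- rep(c("NaCl", "Al2"), times = 3, each = 4)
Quality  <- rep(c("Taste", "Color of package", "Capacity"), times = 1, each = 8)
Accepted <- seq(0, 100, by = 100/23)
DF <- data.frame(Chemical, Quality, Accepted)

# Factor levels define the plotting order (instead of alphabetical order)
DF$Quality  <- factor(DF$Quality, levels = c("Taste", "Color of package", "Capacity"))
DF$Chemical <- factor(DF$Chemical, levels = c("NaCl", "Al2"))

# Hypothetical named palette: names tie each colour to a level explicitly
fill_cols <- c(NaCl = "green", Al2 = "orange")

p <- ggplot(DF, aes(x = Quality, y = Accepted, fill = Chemical)) +
  geom_boxplot() +
  # labels follow the level order: NaCl first, then Al2
  scale_fill_manual(values = fill_cols, labels = expression("NaCl", "Al"[2])) +
  xlab("") +
  theme(legend.position = "top", legend.title = element_blank())
```

Printing p draws the plot. With a named palette, reordering the levels later does not silently swap the colours.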

Related

making changes on boxplot objects

This is my code:
ggplot(summer.months, aes(x = month, y = Temp_mean, linetype = position, color = canopy, fill = position)) +
geom_boxplot() +
theme_bw() +
ggtitle(" Temperature changes in elevated and lying deadwood in summer under different canopies") +
labs(y = "temperature values(C°)", x = "months") +
scale_fill_manual(values = c("white", "white", "green", "black"))
My professor said:
I have to put the number of objects in the legend of the graph, put the legend in the upper right-hand corner of the graph, and make the legend bigger.
I also have to put the months in chronological order like 11, 12, 1, 2, 3, 4, ... (and show the names of the months in the graph instead of numbers).
I created a basic ggplot, but the problem is that I can't make the changes they want because the names and order of the objects are fixed that way in my Excel data.
Please share some of your data; the output of dput(head(summer.months)) might be sufficient. Anyway, here's an example using the built-in dataset mpg to illustrate a few adjustments:
library(tidyverse)
## changing variable for x-Axis into ordered factor - this is a bit of a workaround. If using dates,
## it is better to use datatype date and adjust axis labels accordingly
my_mpg <- mpg %>%
mutate(class = factor(class, levels = c("compact", "midsize", "suv", "2seater", "minivan", "pickup", "subcompact"), ordered = TRUE))
ggplot(my_mpg, aes(x = class, y = hwy, linetype = class, colour = fl, fill = drv)) +
geom_boxplot() +
scale_fill_manual(values = c("white", "white", "green", "black")) +
## using subtitle to add information about the dataset
labs(title = "title", subtitle = paste("#lines: ", nrow(mpg))) +
theme_bw() +
theme(legend.justification = "top", ## move legend to top
legend.text = element_text(size = 10), ## adjust text sizes in legend
legend.title = element_text(size =10),
legend.key.size = unit(20, "pt"), ## if required: adjust size of legend keys
plot.subtitle = element_text(hjust = 1.0)) ## shift subtitle to the right
You might find further hints in ggplot2 reference and the ggplot2 book.
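For the month ordering requested in the question, a small sketch might look like the following. The data frame toy is a made-up stand-in for summer.months (the real data was not posted), and the column name month_name is an assumption. The idea is to turn the numeric month into a factor whose levels are listed in the desired order and whose labels are the month names.

```r
library(ggplot2)

# Hypothetical stand-in for summer.months: 5 observations per month
set.seed(1)
toy <- data.frame(
  month     = rep(c(1, 2, 3, 4, 11, 12), each = 5),
  Temp_mean = rnorm(30, mean = 10, sd = 3)
)

# Desired order on the axis: 11, 12, 1, 2, 3, 4
month_order <- c(11, 12, 1, 2, 3, 4)
toy$month_name <- factor(toy$month,
                         levels = month_order,
                         labels = month.name[month_order])

# Number of observations per month, e.g. to show in a subtitle or label
n_per_month <- table(toy$month_name)

p <- ggplot(toy, aes(x = month_name, y = Temp_mean)) +
  geom_boxplot() +
  labs(x = "months", y = "temperature (degrees C)") +
  theme_bw()
```

The Excel file does not need to change; only the factor created in R decides the order and the names shown on the axis.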

How do I add a legend to identify vertical lines in ggplot?

I have a chart that shows mobile usage by operating system. I'd like to add vertical lines to identify when those operating systems were released. I'll go through the chart and then the code.
The chart -
The code -
dev %>%
group_by(os) %>%
mutate(monthly_change = prop - lag(prop)) %>%
ggplot(aes(month, monthly_change, color = os)) +
geom_line() +
geom_vline(xintercept = as.numeric(ymd("2013-10-01"))) +
geom_text(label = "KitKat", x = as.numeric(ymd("2013-10-01")) + 80, y = -.5)
Instead of adding the text in the plot, I'd like to create a legend to identify each of the lines. I'd like to give each of them its own color and then have a legend to identify each. Something like this -
Can I make my own custom legend like that?
1) Define a data frame that contains the line data and then use geom_vline with it. Note that BOD is a data frame that comes with R.
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
geom_vline(aes(xintercept = xintercept, color = Lines), line.data, size = 1) +
scale_colour_manual(values = line.data$color)
2) Alternatively, put the labels right on the plot itself, using the line.data data frame above, to avoid an extra legend. This also has the advantage of avoiding possible multiple legends for the same aesthetic.
ggplot(BOD, aes( Time, demand ) ) +
geom_point() +
annotate("text", line.data$xintercept, max(BOD$demand), hjust = -.25,
label = line.data$Lines) +
geom_vline(aes(xintercept = xintercept), line.data, size = 1)
3) If the real problem is that you want two color legends then there are two packages that can help.
3a) ggnewscale Any color geom that appears after invoking new_scale_color will get its own scale.
library(ggnewscale)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
color = c("red", "blue"), stringsAsFactors = FALSE)
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
scale_colour_manual(values = c("red", "orange")) +
new_scale_color() +
# map colour to Lines so that the values below are matched to the levels lower, upper in order
geom_vline(aes(xintercept = xintercept, colour = Lines), line.data,
size = 1) +
scale_colour_manual(values = line.data$color)
3b) relayer The experimental relayer package (only on GitHub) allows one to define two color aesthetics, color and color2, say, and then have separate scales for each one.
library(dplyr)
library(relayer)
BOD$g <- gl(2, 3, labels = c("group1", "group2"))
ggplot(BOD, aes( Time, demand ) ) +
geom_point(aes(colour = g)) +
geom_vline(aes(xintercept = xintercept, colour2 = Lines), line.data,
size = 1) %>% rename_geom_aes(new_aes = c("colour" = "colour2")) +
scale_colour_manual(aesthetics = "colour", values = c("red", "orange")) +
scale_colour_manual(aesthetics = "colour2", values = line.data$color)
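One detail worth spelling out for approach 1): an unnamed values vector is matched to the legend entries in level order (alphabetical for character columns), which is easy to get wrong. A hedged sketch of a safer variant uses a named vector; line.cols is an illustrative name that does not appear in the answer above.

```r
library(ggplot2)

line.data <- data.frame(xintercept = c(2, 4), Lines = c("lower", "upper"),
                        color = c("red", "blue"), stringsAsFactors = FALSE)

# Named vector: names are the legend entries, values are the colours
line.cols <- setNames(line.data$color, line.data$Lines)

p <- ggplot(BOD, aes(Time, demand)) +
  geom_point() +
  geom_vline(aes(xintercept = xintercept, colour = Lines), data = line.data) +
  # values are matched by name, so row order in line.data does not matter
  scale_colour_manual(values = line.cols)
```

With names in place, adding or reordering rows in line.data does not change which colour each line gets.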
You can definitely make your own custom legend, but it is a bit complicated, so I'll take you through it step-by-step with some fake data.
The fake data contains 100 samples from a normal distribution (standing in for monthly_change in your data), 5 groupings (similar to the os variable in your data) and a sequence of dates from an arbitrary starting point.
library(tidyverse)
library(lubridate)
y <- rnorm(100)
df <- tibble(y) %>%
mutate(os = factor(rep_len(1:5, 100)),
date = seq(from = ymd('2013-01-01'), by = 1, length.out = 100))
You already use the colour aes for your call to geom_line, so you will need to choose a different aes to map onto the calls to geom_vline. Here, I use linetype and a call to scale_linetype_manual to manually edit the linetype legend to how I want it.
ggplot(df, aes(x = date, y = y, colour = os)) +
geom_line() +
# set `xintercept` to your date and `linetype` to the name of the os which starts
# at that date in your `aes` call; set colour outside of the `aes`
geom_vline(aes(xintercept = min(date),
linetype = 'os 1'), colour = 'red') +
geom_vline(aes(xintercept = median(date),
linetype = 'os 2'), colour = 'blue') +
# in the call to `scale_linetype_manual`, `name` will be the legend title;
# set `values` to 1 for each os to force a solid vertical line;
# use `guide_legend` and `override.aes` to change the colour of the lines in the
# legend to match the colours in the calls to `geom_vline`
scale_linetype_manual(name = 'lines',
values = c('os 1' = 1,
'os 2' = 1),
guide = guide_legend(override.aes = list(colour = c('red',
'blue'))))
And there you go, a nice custom legend. Please do remember next time that if you can provide your data, or a minimal reproducible example, we can better answer your question without having to generate fake data.

Add geom_hline to legend

After searching the web both yesterday and today, the only way I could get a legend working was to follow the solution by 'Brian Diggs' in this post:
Add legend to ggplot2 line plot
Which gives me the following code:
library(ggplot2)
ggplot()+
geom_line(data=myDf, aes(x=count, y=mean, color="TrueMean"))+
geom_hline(yintercept = myTrueMean, color="SampleMean")+
scale_colour_manual("",breaks=c("SampleMean", "TrueMean"),values=c("red","blue"))+
labs(title = "Plot showing convergence of Mean", x="Index", y="Mean")+
theme_minimal()
Everything works just fine if I remove the color of the hline, but if I give the hline a color value that is not an actual color (like "SampleMean"), I get an error that it's not a color (only for the hline).
How can adding such a common thing as a legend be such a big problem? There must be an easier way?
To create the original data:
#Initial variables
myAlpha=2
myBeta=2
successes=14
n=20
fails=n-successes
#Posterior values
postAlpha=myAlpha+successes
postBeta=myBeta+fails
#Calculating the mean and SD
myTrueMean=(myAlpha+successes)/(myAlpha+successes+myBeta+fails)
myTrueSD=sqrt(((myAlpha+successes)*(myBeta+fails))/((myAlpha+successes+myBeta+fails)^2*(myAlpha+successes+myBeta+fails+1)))
#Simulate the data
simulateBeta=function(n,tmpAlpha,tmpBeta){
tmpValues=rbeta(n, tmpAlpha, tmpBeta)
tmpMean=mean(tmpValues)
tmpSD=sd(tmpValues)
returnVector=c(count=n, mean=tmpMean, sd=tmpSD)
return(returnVector)
}
#Make a df for the data
myDf=data.frame(t(sapply(2:10000, simulateBeta, postAlpha, postBeta)))
The given solution works in most cases, but not for geom_hline (or geom_vline). For these you usually don't have to use aes, but when you need to generate a legend you have to wrap the arguments in aes:
library(ggplot2)
ggplot() +
geom_line(aes(count, mean, color = "TrueMean"), myDf) +
geom_hline(aes(yintercept = myTrueMean, color = "SampleMean")) +
scale_colour_manual(values = c("red", "blue")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = NULL) +
theme_minimal()
Looking at the original data, you can use geom_point for better visualisation (I also added some theme changes):
ggplot() +
geom_point(aes(count, mean, color = "Observed"), myDf,
alpha = 0.3, size = 0.7) +
geom_hline(aes(yintercept = myTrueMean, color = "Expected"),
linetype = 2, size = 0.5) +
scale_colour_manual(values = c("blue", "red")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = "Mean type") +
theme_minimal() +
guides(color = guide_legend(override.aes = list(
linetype = 0, size = 4, shape = 15, alpha = 1))
)
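The mechanism above (a constant string inside aes becomes a legend entry) can be sketched on a smaller scale. Everything here is illustrative: toy, ref_line and cols are made-up names and the data is simulated, not the myDf from the question. Naming the values avoids depending on the alphabetical order of the labels.

```r
library(ggplot2)

# Hypothetical small data set: running mean of 50 random draws
set.seed(1)
toy <- data.frame(count = 1:50, mean = cumsum(rnorm(50)) / (1:50))
ref_line <- data.frame(y = 0)   # reference value for the horizontal line

# Names are the strings used inside aes(); values are the actual colours
cols <- c("Running mean" = "blue", "Reference" = "red")

p <- ggplot(toy, aes(count, mean)) +
  geom_line(aes(colour = "Running mean")) +
  geom_hline(data = ref_line, aes(yintercept = y, colour = "Reference")) +
  scale_colour_manual(values = cols, breaks = names(cols), name = NULL)
```

The strings inside aes are labels, not colours; the scale translates them, which is why a string like "SampleMean" fails when passed to color outside aes.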

Manually change order of y axis items on complicated stacked bar chart in ggplot2

I've been stuck on an issue and can't find a solution. I've tried many suggestions on Stack Overflow and elsewhere about manually ordering a stacked bar chart, since that should be a pretty simple fix, but those suggestions don't work with the huge complicated mess of code I plucked from many places. My only issue is y-axis item ordering.
I'm making a series of stacked bar charts, and ggplot2 changes the ordering of the items on the y-axis depending on which dataframe I am trying to plot. I'm trying to make 39 of these plots and want them to all have the same ordering. I think ggplot2 only wants to plot them in ascending order of their numeric mean or something, but I'd like all of the bar charts to first display the group "Bird Advocates" and then "Cat Advocates." (This is also the order they appear in my data frame, but that ordering is lost at the coord_flip() point in plotting.)
I think that taking the data frame through so many changes is why I can't just add something simple at the end or use the reorder() function. Adding things into aes() also doesn't work, since the stacked bar chart I'm creating seems to depend on those items being exactly a certain way.
Here's one of my data frames where ggplot2 is ordering my y-axis items incorrectly, plotting "Cat Advocates" before "Bird Advocates":
Group,Strongly Opposed,Opposed,Slightly Opposed,Neutral,Slightly Support,Support,Strongly Support
Bird Advocates,0.005473026,0.010946052,0.012509773,0.058639562,0.071149335,0.31118061,0.530101642
Cat Advocates,0.04491726,0.07013396,0.03624901,0.23719464,0.09141056,0.23404255,0.28605201
And here's all the code that takes that and turns it into a plot:
library(ggplot2)
library(reshape2)
library(plotly)
#Importing data from a .csv file
data <- read.csv("data.csv", header=TRUE)
data$s.Strongly.Opposed <- 0-data$Strongly.Opposed-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Opposed <- 0-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Slightly.Opposed <- 0-data$Slightly.Opposed-.5*data$Neutral
data$s.Neutral <- 0-.5*data$Neutral
data$s.Slightly.Support <- 0+.5*data$Neutral
data$s.Support <- 0+data$Slightly.Support+.5*data$Neutral
data$s.Strongly.Support <- 0+data$Support+data$Slightly.Support+.5*data$Neutral
#to percents
data[,2:15]<-data[,2:15]*100
#melting
mdfr <- melt(data, id=c("Group"))
mdfr<-cbind(mdfr[1:14,],mdfr[15:28,3])
colnames(mdfr)<-c("Group","variable","value","start")
#remove dot in level names
mylevels<-c("Strongly Opposed","Opposed","Slightly Opposed","Neutral","Slightly Support","Support","Strongly Support")
mdfr$variable<-droplevels(mdfr$variable)
levels(mdfr$variable)<-mylevels
pal<-c("#bd7523", "#e9aa61", "#f6d1a7", "#999999", "#c8cbc0", "#65806d", "#334e3b")
ggplot(data=mdfr) +
geom_segment(aes(x = Group, y = start, xend = Group, yend = start+value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
geom_hline(yintercept = 0, color =c("#646464")) +
coord_flip() +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white")) +
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
The plot:
I think this works, you may need to play around with the axis limits/breaks:
library(dplyr)
mdfr <- mdfr %>%
mutate(group_n = as.integer(case_when(Group == "Bird Advocates" ~ 2,
Group == "Cat Advocates" ~ 1)))
ggplot(data=mdfr) +
geom_segment(aes(x = group_n, y = start, xend = group_n, yend = start + value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
scale_x_continuous(limits = c(0,3), breaks = c(1, 2), labels = c("Cat", "Bird")) +
geom_hline(yintercept = 0, color =c("#646464")) +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
coord_flip() +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white"))+
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
produces this plot:
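You want to convert the 'Group' variable to a factor with levels in the order in which you want the bars to appear. Note that coord_flip() draws the first level at the bottom, so list the levels from bottom to top:
mdfr$Group <- factor(mdfr$Group, levels = c("Cat Advocates", "Bird Advocates"))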
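Since coord_flip() draws the first factor level at the bottom, one way to keep the same top-to-bottom order across many data frames is a small helper like the one below. This is only a sketch: order_groups and toy are hypothetical names, and the helper assumes each data frame has a Group column as in the question.

```r
# Hypothetical helper: give it the order you want to read from top to bottom
order_groups <- function(df, top_to_bottom) {
  # reverse the order because the first level ends up at the bottom after coord_flip()
  df$Group <- factor(df$Group, levels = rev(top_to_bottom))
  df
}

# Toy stand-in for one of the melted data frames
toy <- data.frame(Group = c("Cat Advocates", "Bird Advocates"),
                  value = c(28.6, 53.0))

toy <- order_groups(toy, c("Bird Advocates", "Cat Advocates"))
levels(toy$Group)
```

Calling the same helper on each of the 39 data frames before plotting should give every chart the same ordering, regardless of the values in it.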

ggplot generating two legends when only one is wanted

In R I'm trying to generate a plot where I want to apply unique colors, line types, transparencies, and line thicknesses by case grouping. As currently implemented, two legends are generated instead of one. The second legend is the only one whose title I can change. Presumably I've made a mistake; any help would be greatly appreciated.
Ultimately I want to generate a single legend and have the style changes and labeling changes take effect.
library(ggplot2)
temp_df <- data.frame(year = integer(50), value = numeric(50), case = character(50))
temp_df$year <- 1:50
temp_df$value <- runif(50)
temp_df$case <- "A"
df <- temp_df
temp_df$value <- runif(50)
temp_df$case <- "B"
df <- rbind(df, temp_df)
LineTypes <- c("solid", "dotted")
colors <- c("red", "black")
linealphas <- c(1, .8)
linesizes <- c(1, 2)
Plot <- ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case))+
scale_linetype_manual(values = LineTypes)+
scale_color_manual(values = colors)+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas)+
scale_size_manual(values = linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
theme_minimal()
Plot
If you want ggplot to merge the legends then they all have to have the same title. You can specify the legend title with the name argument in the scales:
ggplot(df, aes(x = year, y = value, group = case))+
geom_line(aes(linetype = case, color = case, size = case, alpha = case)) +
scale_linetype_manual(values = LineTypes, name = "Scenario")+
scale_color_manual(values = colors, name = "Scenario")+
scale_y_continuous(limits = c(0, 1), labels = scales::percent)+
scale_alpha_manual(values = linealphas, name = "Scenario")+
scale_size_manual(values = linesizes, name = "Scenario")+
xlab("Year")+
ylab("Percentage%")+
theme_minimal()
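The same rule (identical titles and labels lead to one merged legend) can also be expressed with a single labs() call instead of repeating name in every scale. The sketch below is an assumption-laden variant, not the answer's code: legend_title is an illustrative name, the data is regenerated with a fixed seed, and the size aesthetic is left out to keep the example short.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(year  = rep(1:50, times = 2),
                 value = runif(100),
                 case  = rep(c("A", "B"), each = 50))

legend_title <- "Scenario"   # one title shared by every mapped aesthetic

p <- ggplot(df, aes(x = year, y = value, group = case)) +
  geom_line(aes(linetype = case, colour = case, alpha = case)) +
  scale_linetype_manual(values = c("solid", "dotted")) +
  scale_colour_manual(values = c("red", "black")) +
  scale_alpha_manual(values = c(1, 0.8)) +
  # same title for all three aesthetics, so the legends can be merged
  labs(linetype = legend_title, colour = legend_title, alpha = legend_title) +
  theme_minimal()
```

If one of the aesthetics keeps a different title (for example the default "case"), it gets its own legend again.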
A coworker pointed out a resolution to me: the key was to remove the other guides so that only one of the styles I had defined was used for the legend.
guides(size = "none")+
guides(alpha = "none")+
guides(linetype = "none")+
His explanation for this was that R doesn't recognize that the vectors of factors defining the properties of the plot are necessarily related. As a result it will generate multiple legends when only one is desired.
library(ggplot2)
temp_df<-data.frame(year=integer(50),value=numeric(50),case=character(50))
temp_df$year<-1:50
temp_df$value<-runif(50)
temp_df$case<-"A"
df<-temp_df
temp_df$value<-runif(50)
temp_df$case<-"B"
df<-rbind(df,temp_df)
LineTypes<-c("solid","dotted")
colors<-c("red","black")
linealphas<-c(1,.8)
linesizes<-c(1,2)
Plot<-ggplot(df,aes(x=year,y=value,group=case))+
geom_line(aes(linetype=case, color=case, size=case, alpha =case))+
scale_linetype_manual(values=LineTypes)+
scale_color_manual(values=colors)+
scale_y_continuous(limits=c(0,1),labels = scales::percent)+
scale_alpha_manual(values=linealphas)+
scale_size_manual(values=linesizes)+
xlab("Year")+
ylab("Percentage%")+
labs(color = "Scenario")+
guides(size = "none")+
guides(alpha = "none")+
guides(linetype = "none")+
theme_minimal()
Plot
Can't you just remove the line "labs(color = "Scenario")"?
This is the plot that gets generated. Not sure if it's missing anything that you need.
The result seems fine to me:
