Changing colour schemes between facets - r

I have a data.frame, something like the following:
set.seed(100)
df <- data.frame(year = rep(2011:2014, 3),
class = rep(c("high", "middle", "low"), each = 4),
age_group = rep(1:3, each = 4),
value = sample(1:2, 12, replace = TRUE))
and I am looking to produce, by facet-ing (by the variable age_group) three plots which look similar to those produced by the following code:
library(ggplot2)
blue <- c("#bdc9e1", "#74a9cf", "#0570b0")
ggplot(df) + geom_bar(aes(x = year, y = value,
fill = factor(class, levels = c("high", "middle", "low"))),
stat = "identity") +
scale_fill_manual(values = c(blue)) +
guides(fill = "none")
However, I want each facet to have a different colour scheme, with all the colours specified by me.
I appear to want a more specific version of what is going on here: ggplot2: Change color for each facet in bar chart
So, using the data I have provided, I am looking to get three facet-ed plots, split by age_group where the fill is given in each plot by the level of class, and all colours (9 total) would be specified manually by myself.
Edit: For clarification, the facet that I would like to end up with is indeed provided by the following code:
ggplot(df) + geom_bar(aes(x = year, y = value,
fill = factor(class, levels = c("high", "middle", "low"))),
stat = "identity") +
scale_fill_manual(values = c(blue)) +
guides(fill = "none") +
facet_wrap(~ age_group)
with the added level of control of colour subset by the class variable.

I'm not entirely sure why you want to do this, so it is a little hard to know whether or not what I came up with addresses your actual use case.
First, I generated a different data set that actually has each class in each age_group:
set.seed(100)
df <- data.frame(year = rep(2011:2014, 3),
class = rep(c("high", "middle", "low"), each = 12),
age_group = rep(1:3, each = 4),
value = sample(1:2, 36, replace = TRUE))
If you are looking for a similar dark-to-light gradient within each age_group you can accomplish this directly using alpha and not worry about adding extra data columns:
ggplot(df) +
geom_bar(aes(x = year, y = value,
fill = factor(age_group)
, alpha = class ),
stat = "identity") +
facet_wrap(~age_group) +
scale_alpha_discrete(range = c(0.4,1)) +
scale_fill_brewer(palette = "Set1"
, name = "age_group")
Here, I set the range of the alpha to give reasonably visible colors, and just chose a default palette from RColorBrewer to show the idea. This gives:
It also gives a relatively usable legend as a starting point, though you could modify it further (here is a similar legend answer I gave to a different question: https://stackoverflow.com/a/39046977/2966222 )
Alternatively, if you really, really want to specify the colors yourself, you can add a column to the data, and base the color off of that:
df$forColor <-
factor(paste0(df$class, " (", df$age_group , ")")
, levels = paste0(rep(c("high", "middle", "low"), times = 3)
, " ("
, rep(1:3, each = 3)
, ")") )
Then, use that as your fill. Note here that I am using the RColorBrewer brewer.pal to pick colors. I find that the first color is too light to show up for bars like this, so I excluded it.
library(RColorBrewer)
ggplot(df) +
geom_bar(aes(x = year, y = value,
fill = forColor),
stat = "identity") +
scale_fill_manual(values = c(brewer.pal(4, "Blues")[-1]
, brewer.pal(4, "Reds")[-1]
, brewer.pal(4, "Purples")[-1]
)
, name = "Class (age_group)") +
facet_wrap(~age_group)
gives:
The legend is rather busy, but could be modified similar to the other answer I linked to. This would then allow you to set whatever 9 (or more, for different use cases) colors you wanted.
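To set all nine colours by hand rather than through brewer.pal, a named vector can be passed to scale_fill_manual(). The following is a minimal sketch under that assumption: my_cols is a name introduced here for illustration, its hex codes are placeholders (the blues come from the question, the others are arbitrary), and its names have to match the levels of forColor.

```r
library(ggplot2)

# Example data with every class present in every age_group
set.seed(100)
df <- data.frame(
  year      = rep(2011:2014, times = 9),
  class     = rep(c("high", "middle", "low"), each = 12),
  age_group = rep(rep(1:3, each = 4), times = 3),
  value     = sample(1:2, 36, replace = TRUE)
)

# One hand-picked colour per class/age_group combination (placeholder values)
my_cols <- c(
  "high (1)" = "#0570b0", "middle (1)" = "#74a9cf", "low (1)" = "#bdc9e1",
  "high (2)" = "#cb181d", "middle (2)" = "#fb6a4a", "low (2)" = "#fcae91",
  "high (3)" = "#238b45", "middle (3)" = "#74c476", "low (3)" = "#bae4b3"
)

# Fill variable whose levels match the names of my_cols
df$forColor <- factor(paste0(df$class, " (", df$age_group, ")"),
                      levels = names(my_cols))

p <- ggplot(df, aes(x = year, y = value, fill = forColor)) +
  geom_col() +
  scale_fill_manual(values = my_cols, guide = "none") +
  facet_wrap(~ age_group)
p
```

Because the vector is named, colours are matched to levels by name, so the order of the entries in my_cols does not affect which colour a bar gets; the factor levels control the stacking order.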

Related

Reordering within 3-factor grouped plot

Using ggplot2, I'm attempting to reorder a data representation with 3 factors: condition, sex, and time.
library(ggplot2)
library(dplyr)
DF <- data.frame(value = rnorm(100, 20, sd = 0.1),
cond = c(rep("a",25),rep("b",25),rep("a",25),rep("b",25)),
sex = c(rep("M",50),rep("F",50)),
time = rep(c("1","2"),50)
)
ggplot(data=DF, aes( x = time,
y = value,
fill = cond,
colour = sex,
)
) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40")) +
ggtitle("aF,aM,bF,bM") +
theme(legend.position = "top")
Badly ordered plot.
The way ggplot2 automatically orders condition first and interleaves sex poses the issue. It defaults to an interleaved "aF,aM,bF,bM" order regardless of which factor I assign to which aesthetic.
For analysis purposes, my preferred order is "aM,bM,aF,bF". Order sex first and interleave condition. I tried to fix it by converting the 2x2 factor assignments to one group with 4 levels, which gives me complete control over the order:
DF %>% mutate(grp = as.factor(paste0(cond,sex))) -> DF
level_order <- c("aM", "bM", "aF", "bF")
ggplot(data=DF, aes( x = time,
y = value,
fill = factor(grp, levels = level_order),
colour = sex
)
) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080","#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40", "grey40", "grey10")) +
ggtitle("aM,bM,aF,bF") +
theme(legend.position = "top")
Ordering OK, bad representation.
However, artificial grouping like this has its downsides: subjects are not assigned to a group; they are male or female (which can't be changed) and are assigned to some condition. Also, the plot legend is unnecessarily cluttered, with 6 keys instead of 4, and it doesn't convey all that well that this is a 2x2 repeated-measures design.
I'm not sure if what I'm trying to do makes sense (I hope this isn't some massive brain fart), any help would be appreciated.
The order in which you place the aesthetics controls the priority of their groupings. Thus, if you switch the positions of fill and colour, you will get the result you are looking for (i.e. you want colour to be grouped first, and then fill):
ggplot(data=DF, aes( x = time,
y = value,
colour = sex,
fill = cond)) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40")) +
theme(legend.position = "top")
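If the male boxes should also come before the female ones, as in the requested "aM,bM,aF,bF" order, one option is to set the levels of sex explicitly in addition to the aesthetic order. This is a sketch under that assumption and is not part of the original answer; the data are regenerated here so the block runs on its own.

```r
library(ggplot2)

set.seed(1)
DF <- data.frame(
  value = rnorm(100, 20, sd = 0.1),
  cond  = rep(c("a", "b", "a", "b"), each = 25),
  sex   = rep(c("M", "F"), each = 50),
  time  = rep(c("1", "2"), times = 50)
)

# Put "M" before "F" so that male boxes are grouped first
DF$sex <- factor(DF$sex, levels = c("M", "F"))

# colour (sex) is listed before fill (cond), as in the answer above
p <- ggplot(DF, aes(x = time, y = value, colour = sex, fill = cond)) +
  geom_boxplot(outlier.shape = NA) +
  scale_fill_manual(values = c("#69b3a2", "#404080")) +
  scale_color_manual(values = c("grey10", "grey40")) +
  theme(legend.position = "top")
p
```

Note that scale_color_manual() assigns its values in level order, so with these levels "grey10" goes to M and "grey40" to F.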

How do you make a line graph with multiple lines from multiple variables in R

I have two dataframes and I want to plot a comparison between them. The plot and dataframes look like so
df2019 <- data.frame(Role = c("A","B","C"),Women_percent = c(65,50,70),Men_percent = c(35,50,30), Women_total =
c(130,100,140), Men_total = c(70,100,60))
df2016 <- data.frame(Role= c("A","B","C"),Women_percent = c(70,45,50),Men_percent = c(30,55,50),Women_total =
c(140,90,100), Men_total = c(60,110,100))
all_melted <- reshape2::melt(
rbind(cbind(df2019, year=2019), cbind(df2016, year=2016)),
id=c("year", "Role"))
There's no reason I need the data in melted form; I only did it because I was plotting bar graphs with it. Now I need a line graph, and I don't know how to make line graphs from melted data, or how to keep the 2019/2016 tag if the data is not melted. When I try to make a line graph, I don't know how to specify which "variable" will be used. I want the lines to be the Women/Men percent values, and the labels to be the totals. (In this picture the geom_text shows the percent values; I want it to use the total values.)
Crucially I want the linetype to be dotted in 2016 and for the legend to show that
I think it would be simplest to rbind the two frames after labelling them with their year, then reshape the result so that you have columns for role, year, gender, percent and total.
I would then use a bit of alpha scale trickery to hide the points and labels from 2016:
library(dplyr)
library(tidyr)
library(ggplot2)
df2016$year <- 2016
df2019$year <- 2019
rbind(df2016, df2019) %>%
pivot_longer(cols = 2:5, names_sep = "_", names_to = c("Gender", "Type")) %>%
pivot_wider(names_from = Type) %>%
ggplot(aes(Role, percent, color = Gender,
linetype = factor(year),
group = paste(Gender, year))) +
geom_line(size = 1.3) +
geom_point(size = 10, aes(alpha = year)) +
geom_text(aes(label = total, alpha = year), colour = "black") +
scale_colour_manual(values = c("#07aaf6", "#ef786f")) +
scale_alpha(range = c(0, 1), guide = guide_none()) +
scale_linetype_manual(values = c(2, 1)) +
labs(y = "Percent", color = "Gender", linetype = "Year")
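To make the reshaping step easier to follow, here is the same rbind / pivot_longer / pivot_wider chain on its own, without the plot. It is only a sketch for inspecting the intermediate result; the name long is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df2019 <- data.frame(Role = c("A", "B", "C"),
                     Women_percent = c(65, 50, 70), Men_percent = c(35, 50, 30),
                     Women_total = c(130, 100, 140), Men_total = c(70, 100, 60))
df2016 <- data.frame(Role = c("A", "B", "C"),
                     Women_percent = c(70, 45, 50), Men_percent = c(30, 55, 50),
                     Women_total = c(140, 90, 100), Men_total = c(60, 110, 100))

# Label each frame with its year before stacking them
df2016$year <- 2016
df2019$year <- 2019

long <- rbind(df2016, df2019) %>%
  # "Women_percent" is split into Gender = "Women" and Type = "percent"
  pivot_longer(cols = 2:5, names_sep = "_", names_to = c("Gender", "Type")) %>%
  # Spread Type back out so percent and total become separate columns
  pivot_wider(names_from = Type, values_from = value)

head(long)
```

Each row of long then describes one Role, year and Gender combination, with percent available for the y aesthetic and total for the label.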

Geom_bar with R (Beginner)

Good morning all,
I am working on data that I would like to show as a bar graph, with the bars split by my two departments. I generated a data frame that looks like this:
test <- data.frame(type_transport = sample(c("ON FOOT", "CAR", "TRANSPORT COMMON"), 5000, replace = TRUE), type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE), department = sample(c("department1", "department2"), 5000, replace = TRUE), troncon_km = sample(x = 0:17, 5000, replace = TRUE))
With the following command, I get a bar graph:
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity")
https://zupimages.net/viewer.php?id=20/19/vt1s.png
Now, I would like to split these bars in half, to display the data according to my two departments. For this, I use position = "dodge":
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity", position = "dodge")
But there is a problem. The Y scale is far too small compared to reality (we go from several thousand on the first graph to 15 on the second). I obviously missed something ...
https://zupimages.net/viewer.php?id=20/19/sbh5.png
I do not understand.
Thank you.
All bars have the same height because geom_bar(stat = "identity") draws one bar per observation, with the bar's height equal to that observation's value. With the default stacking these bars pile up to the group total, but with position = "dodge" the bars of a group are drawn on top of one another, so only the tallest is visible. Since every category in both departments has at least one observation of 17, all bars show that value.
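This can be checked numerically without plotting. The sketch below assumes a data frame like the one in the question, with columns type_route, department and troncon_km, and compares the per-group maximum (what overlapping bars show) with the per-group sum (what stacked bars show). The names tallest and stacked are introduced here for illustration.

```r
set.seed(1)
test <- data.frame(
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(0:17, 5000, replace = TRUE)
)

# Height of the tallest single observation per group:
# this is what remains visible when the bars overlap
tallest <- aggregate(troncon_km ~ department + type_route, data = test, FUN = max)

# Total per group: this is what stacked bars add up to
stacked <- aggregate(troncon_km ~ department + type_route, data = test, FUN = sum)

tallest
stacked
```

The maxima can never exceed 17, while the sums run into the thousands, which matches the difference in y scale between the two plots.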
There are several ways to move forward:
1.
ggplot(test, aes(type_route, troncon_km, fill = department)) +
stat_summary(geom = "bar", position = "dodge", fun = sum)
The fun argument (named fun.y before ggplot2 3.3.0) can be any other summary function (e.g. mean or median).
2.
library("tidyverse")
total_km <- test %>%
group_by(department, type_route) %>%
summarise(total_km = sum(troncon_km))
ggplot(total_km, aes(type_route, total_km, fill = department)) +
geom_bar(stat = "identity", position = "dodge")
Again you can change the sum() function within the summarise() to your liking.
3.
Using the same data frame total_km, only a little bit shorter with geom_col:
ggplot(total_km, aes(type_route, total_km, fill = department)) +
geom_col(position = "dodge")
Hope this helps.

Add second y axis based on same dataframe

I am trying to create a boxplot using ggplot2, and need to have two axes from the same data frame representing two different scales. Essentially I am plotting surface area to volume ratios per two different species for three appendages, and one of the appendages has a very high SA:V ratio in comparison to the other two, which makes it difficult to have them all on the same graph.
I've recreated my data and code for the boxplot to demonstrate what I am talking about. If possible I would like the dorsal fins to be displayed on the same graph, but on a different y axis scale (that will also be shown on the graph) just so the boxes of the boxplot are all visible.
SAV <- c(seq(.35, .7, .01), seq(.09, .125, .001), seq(.09, .125, .001))
Type <- c(rep("Pectoral Fin", 36), rep("Dorsal fin", 36), rep("Fluke", 36))
Species <- c(rep(c(rep("Sp1", 18), rep("Sp2", 18)), 3))
appendage <- data.frame(SAV, Type, Species)
ggplot(aes(y = appendage$SAV,
x = factor(appendage$Type, levels = c("Dorsal fin", "Fluke")),
fill = appendage$Species),
data = appendage) +
geom_boxplot(outlier.shape = NA) +
labs(y = expression("SA:V("*cm^-1*")"), x="") +
scale_x_discrete(labels = c("PF", "DF", "F")) +
scale_fill_manual(values = c("black", "gray"))
If any one could help me with this that would be great!
One possibility is to use facet_wrap.
library(dplyr)
appendage %>%
mutate(
Type = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF"))) %>%
ggplot(aes(Type, SAV, fill = Species)) +
geom_boxplot(outlier.shape=NA) +
labs(y=expression("SA:V("*cm^-1*")"),x="") +
scale_fill_manual(values=c("black","gray")) +
facet_wrap(~Type, scales="free") +
theme(axis.ticks.x = element_blank(),
strip.background = element_blank(),
strip.text.x = element_blank())
First off, like what others have commented, I do not recommend this type of plot. Dual axes have a tendency to make comparisons harder, & visually confuse the audience even when they are aware of it.
That said, it is possible to achieve this using ggplot2, & I'll show one approach below, once we get past several other issues in the original code:
Issue 1: You are passing a data frame to ggplot(). The dollar sign $ has no place in aes() in such cases.
Instead of:
ggplot(aes(y = appendage$SAV,
x = factor(appendage$Type), # ignore the levels for now; see next issue
fill = appendage$Species),
data = appendage) +
...
Use:
ggplot(aes(y = SAV,
x = factor(Type),
fill = Species),
data = appendage) +
...
Issue 2: Which appendage has the extraordinarily high SA:V?
From the code used to generate the sample dataset, it should be "Pectoral Fin", but the final result shows "DF". I assume the mapping between full terms & axis labels to be:
"Pectoral Fin" -> "PF"
"Dorsal fin" -> "DF"
"Fluke" -> "F"
... so this looks like a slip up between passing Type as a factor to the x parameter in aes(), and setting the axis labels in scale_x_discrete().
Since you're using factor(), it would be neater to set the labels there as well. Keeping it in the same place makes such things easier to spot.
Instead of:
ggplot(aes(y = SAV,
x = factor(Type, levels = c("Dorsal fin", "Fluke")),
fill = Species),
data = appendage) +
...
Use:
ggplot(aes(y = SAV,
x = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF")),
fill = Species),
data = appendage) +
...
I switched the order of factors as I feel it makes (marginally) more sense visually for the x-axis category corresponding to the secondary y-axis (typically on the right) to be on the right of other x-axis categories. You can change that if this isn't the desired case. Just make sure both levels = ... and labels = ... are changed together.
Solution for secondary y-axis
Manually re-scale the values of the offending appendage (whichever fin that turns out to be) until its range is somewhat similar to that of other appendages. (In the example below, I used a simple division of y / 5, but more complicated functions can be used too.)
Specify the sec.axis argument of scale_y_continuous() with sec_axis(), using the inverse of the re-scaling function as the transformation. (In this case y * 5.)
Label the original y-axis (left) and the secondary y-axis (right) accordingly to make it clear which appendage(s) each axis's scale applies to.
Final code + result:
k = 5 #rescale factor
ggplot(aes(y = ifelse(Type == "Pectoral Fin",
SAV / k, SAV),
x = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF")),
fill = Species),
data = appendage) +
geom_boxplot(outlier.shape = NA) +
scale_y_continuous(sec.axis = sec_axis(trans = ~. * k,
name = expression("SA:V ("*cm^-1*") PF"))) +
labs(y = expression("SA:V ("*cm^-1*") DF / F"), x = "") +
scale_fill_manual(values = c("black", "gray"))
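As a quick sanity check of the re-scaling idea, the sketch below applies the division by k to the "Pectoral Fin" rows only and then undoes it with the inverse, which is what the secondary axis does for the tick labels. The column name plotted and the variable restored are introduced here for illustration.

```r
SAV  <- c(seq(.35, .7, .01), seq(.09, .125, .001), seq(.09, .125, .001))
Type <- c(rep("Pectoral Fin", 36), rep("Dorsal fin", 36), rep("Fluke", 36))
appendage <- data.frame(SAV, Type)

k <- 5  # rescale factor, as in the answer
is_pf <- appendage$Type == "Pectoral Fin"

# Forward transformation: shrink only the offending appendage
appendage$plotted <- ifelse(is_pf, appendage$SAV / k, appendage$SAV)

# After rescaling, the ranges of all three appendages are comparable
tapply(appendage$plotted, appendage$Type, range)

# Inverse transformation (what sec_axis(~ . * k) applies to the axis breaks)
restored <- ifelse(is_pf, appendage$plotted * k, appendage$plotted)
```

If restored matches the original SAV values, the left axis can be read for DF and F and the right axis for PF without distorting either.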

Separate boxes for two grouping variables when color by only one variable

Here is an example from the geom_boxplot man page:
p = ggplot(mpg, aes(class, hwy))
p + geom_boxplot(aes(colour = drv))
which looks like this:
I would like to make a very similar plot, but with (yearmon formatted) dates where the class variable is in the example, and a factor variable where drv is in the example.
Here is some sample data:
df_box = data_frame(
Date = sample(
as.yearmon(seq.Date(from = as.Date("2013-01-01"), to = as.Date("2016-08-01"), by = "month")),
size = 10000,
replace = TRUE
),
Source = sample(c("Inside", "Outside"), size = 10000, replace = TRUE),
Value = rnorm(10000)
)
I have tried a bunch of different things:
Put an as.factor around the date variable, then I no longer have the nicely spaced out date scale for the x-axis:
df_box %>%
ggplot(aes(
x = as.factor(Date),
y = Value,
# group = Date,
color = Source
)) +
geom_boxplot(outlier.shape = NA) +
theme_bw() +
xlab("Month Year") +
theme(
axis.text.x = element_text(hjust = 1, angle = 50)
)
On the other hand, if I use Date as an additional group variable as suggested here, adding color no longer has any additional impact:
df_box %>%
ggplot(aes(
x = Date,
y = Value,
group = Date,
color = Source
)) +
geom_boxplot() +
theme_bw()
Any ideas as to how to achieve the output of #1 while still maintaining a yearmon scale x-axis?
Since you need separate boxes for each combination of Date and Source, use interaction(Source, Date) as the group aesthetic:
ggplot(df_box, aes(x = Date, y = Value,
colour = Source,
group = interaction(Source, Date))) +
geom_boxplot()
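The grouping idea can be tried on a small, self-contained data set. The sketch below is an assumption-laden variant of the question's data: it uses plain Date values (first of each month) instead of yearmon so that it only depends on ggplot2, and the names month_starts and n_boxes are introduced here for illustration.

```r
library(ggplot2)

set.seed(42)
month_starts <- seq(as.Date("2013-01-01"), as.Date("2013-06-01"), by = "month")
df_box <- data.frame(
  Date   = rep(month_starts, each = 40),
  Source = rep(c("Inside", "Outside"), times = 120),
  Value  = rnorm(240)
)

# Number of distinct Source/Date combinations, i.e. how many boxes to expect
n_boxes <- nlevels(interaction(df_box$Source, df_box$Date, drop = TRUE))

# One group per Source/Date pair, coloured by Source only
p <- ggplot(df_box, aes(x = Date, y = Value,
                        colour = Source,
                        group = interaction(Source, Date))) +
  geom_boxplot()
p
```

With a continuous x axis, ggplot2 does not split the data by colour on its own, so the explicit group aesthetic is what produces one box per Source within each month.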
