Using ggplot2, I'm attempting to reorder a data representation with 3 factors: condition, sex, and time.
library(ggplot2)
library(dplyr)
DF <- data.frame(value = rnorm(100, 20, sd = 0.1),
cond = c(rep("a",25),rep("b",25),rep("a",25),rep("b",25)),
sex = c(rep("M",50),rep("F",50)),
time = rep(c("1","2"),50)
)
ggplot(data=DF, aes( x = time,
y = value,
fill = cond,
colour = sex,
)
) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40")) +
ggtitle("aF,aM,bF,bM") +
theme(legend.position = "top")
Badly ordered plot.
The way ggplot2 automatically orders condition first and interleaves sex poses the issue. It defaults to an interleaved "aF,aM,bF,bM" order regardless of which factor I assign to which aesthetic.
For analysis purposes, my preferred order is "aM,bM,aF,bF": ordered by sex first, with condition interleaved. I tried to fix it by collapsing the 2x2 factor assignments into a single group with 4 levels, which gives me complete control over the order:
DF %>% mutate(grp = as.factor(paste0(cond,sex))) -> DF
level_order <- c("aM", "bM", "aF", "bF")
ggplot(data=DF, aes( x = time,
y = value,
fill = factor(grp, levels = level_order),
colour = sex
)
) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080","#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40", "grey40", "grey10")) +
ggtitle("aM,bM,aF,bF") +
theme(legend.position = "top")
Ordering OK, bad representation.
However, artificial grouping like this has its downsides: subjects are not assigned to a group; they are male or female (which can't be changed) and are assigned to some condition. The plot legend is also unnecessarily cluttered, with 6 keys instead of 4, and it doesn't convey the 2x2 repeated measures design all that well.
I'm not sure if what I'm trying to do makes sense (I hope this isn't some massive brain fart), any help would be appreciated.
The order in which you place the aesthetics controls the priority of their groupings. Thus, if you switch the positions of fill and colour, you will get the result you are looking for (i.e. you want colour to be grouped first, and then fill):
ggplot(data=DF, aes( x = time,
y = value,
colour = sex,
fill = cond)) +
geom_boxplot(size = 1, outlier.shape = NA) +
scale_fill_manual(values=c("#69b3a2", "#404080")) +
scale_color_manual(values=c("grey10", "grey40")) +
theme(legend.position = "top")
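One caveat, offered as a hedged sketch rather than as part of the answer above: swapping the aesthetics groups by sex first, but which sex comes first should still follow the level order of sex, and a character column is ordered alphabetically ("F" before "M"). Assuming the "aM,bM,aF,bF" order from the question is the goal, setting the factor levels explicitly is one way to put M first. The fixed seed is an addition for reproducibility and is not in the original post.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, not part of the original post
DF <- data.frame(
  value = rnorm(100, 20, sd = 0.1),
  cond  = c(rep("a", 25), rep("b", 25), rep("a", 25), rep("b", 25)),
  sex   = c(rep("M", 50), rep("F", 50)),
  time  = rep(c("1", "2"), 50)
)

# Make "M" the first level so that it is placed before "F"
DF$sex <- factor(DF$sex, levels = c("M", "F"))

# colour (sex) is listed before fill (cond), as in the answer above
p <- ggplot(DF, aes(x = time, y = value, colour = sex, fill = cond)) +
  geom_boxplot(outlier.shape = NA) +
  scale_fill_manual(values = c("#69b3a2", "#404080")) +
  scale_color_manual(values = c("grey10", "grey40")) +
  theme(legend.position = "top")
# print(p) draws the plot
```

The legend keeps 4 keys (2 for fill, 2 for colour), so the 2x2 design stays visible.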
I have an issue with ggplot() reordering my data; example code is below. I reordered the factor levels of feed to my liking, but after the str_extract() call in facet_wrap(), the facets revert to the order they had before I reordered them. Is there a way to prevent that from occurring? For my actual code, it is important for me to use regex within facet_wrap() in ggplot.
library(dplyr)    # mutate()
library(ggplot2)
library(stringr)  # str_extract()
data <- chickwts
data <- mutate(data, time = 1:nrow(data))
lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
"casein", "horsebean")
data$feed <- factor(data$feed, levels = lvl)
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~str_extract(feed,"[a-z]+"))
You could put the factor inside the facet_wrap:
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~ factor(str_extract(feed,"[a-z]+"), levels = lvl))
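The reason this works can be sketched in a few lines: str_extract() returns a plain character vector, so the custom level order is lost, and wrapping the result in factor() with the same levels restores it. The variable names extracted and restored are illustrative only.

```r
library(stringr)

lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
         "casein", "horsebean")
feed <- factor(chickwts$feed, levels = lvl)

# str_extract() works on text and returns a character vector,
# so the level order of the factor does not survive
extracted <- str_extract(as.character(feed), "[a-z]+")
is.character(extracted)

# A character vector is ordered alphabetically when turned into a factor
levels(factor(extracted))

# Re-applying the levels restores the custom order
restored <- factor(extracted, levels = lvl)
levels(restored)
```

facet_wrap() lays out panels in level order, which is why the factor() call has to wrap the regex result rather than the input.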
I'm trying to plot proportions with geom_bar() combining fill and facet_grid.
library(tidyverse)
set.seed(123)
df <- tibble(val_num = c(rep(1, 60), rep(2, 40), rep(1, 30), rep(2, 70)),
val_cat = ifelse(val_num == 1, "cat", "mouse"),
val_fill = sample(c("black", "white", "gray"), 200, replace = TRUE),
group = rep(c("A", "B"), each = 100))
ggplot(df) +
stat_count(mapping = aes(x = val_cat, y = ..count../tapply(..count.., ..x.. , sum)[..x..],
fill = val_fill),
position = position_dodge2(preserve = "single")) +
facet_grid(.~ group)
However, it seems that the proportions are calculated for all cats (or all mice) in groups A and B together. In other words, the proportions in the first three columns do not sum to 1.
It should be solved by adding group = group to the mapping. However:
ggplot(df) +
stat_count(mapping = aes(x = val_cat, y = ..count../tapply(..count.., ..x.. , sum)[..x..],
fill = val_fill, group = group),
position = position_dodge2(preserve = "single")) +
facet_grid(.~ group)
the plot ignores the fill aesthetic (and, moreover, does not solve the issue). I tried specifying group in different ways, including interaction(), but without any real success.
I would like to solve the problem within ggplot and avoid data manipulation before plotting.
So it wasn't as easy as I thought, because I don't tend to use the stat_xxx() functions. Although you seem set on not manipulating the data beforehand, here is an approach you can use.
grouped.df <- df %>%
group_by( group, val_fill ) %>%
count( val_cat ) %>%
ungroup() %>%
group_by( group, val_cat ) %>%
mutate( prop=n/sum(n) ) %>%
ungroup()
grouped.df %>%
ggplot() +
geom_col( aes(x=val_cat,y=prop,fill=val_fill), position="dodge" ) +
facet_wrap( ~ group )
to produce
But getting back to your "no data manipulation approach", I think your error is within your y variable. For example, consider the following code and output.
df %>%
ggplot() +
stat_count( aes(x=val_cat,y=..count..,color=val_fill,label=tapply(..count.., ..x.. , sum)[..x..]),
geom="text" ) +
facet_wrap( ~ group )
In the plot above, the y value is the numerator of your attempted proportion and the label value is the denominator of your attempted proportion. I think all you need to do is mess around some more with your tapply() function calls until you have the right combination of y and label.
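As a sanity check on the pre-aggregation approach, the sketch below recomputes the proportions and confirms that they sum to 1 within each combination of group and val_cat, which is the denominator the question is after. It uses a plain data.frame instead of a tibble, and the name check is illustrative only.

```r
library(dplyr)

set.seed(123)
val_num <- c(rep(1, 60), rep(2, 40), rep(1, 30), rep(2, 70))
df <- data.frame(
  val_num  = val_num,
  val_cat  = ifelse(val_num == 1, "cat", "mouse"),
  val_fill = sample(c("black", "white", "gray"), 200, replace = TRUE),
  group    = rep(c("A", "B"), each = 100)
)

# Count each group / animal / fill combination, then divide by the
# total for that group and animal (not by the total across groups)
grouped.df <- df %>%
  count(group, val_cat, val_fill) %>%
  group_by(group, val_cat) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# One row per group and animal; every total should be 1
check <- grouped.df %>%
  group_by(group, val_cat) %>%
  summarise(total = sum(prop), .groups = "drop")
check
```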
I'm a little bit stuck with ggplot2, trying to plot several data frames in one plot.
I have several data frames; here I'll present just two examples.
The data frames have the same header but different values. Let's say that I want to count the balls that I have in 2 boxes.
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
What I'm trying to do is make a bar plot of my counts.
At the moment, what I have is:
(plot_test=ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1,stat = "identity",color='green')+
geom_bar(data=test2,stat = "identity",color='blue')
)
I want x = Color and y = Count, with the bars from the test2 data frame next to those from test1. Right now they overlap each other. So each name will appear twice on x, but I want to plot the data frames in different colours and show their names in the legend.
For example "Green bar" = test1
"Blue bar" = test2
Thank you for your time and your help.
Best regards
You have two options here:
Either tweak the size and position of the bars
ggplot(NULL, aes(x= Color, y=Count)) +
geom_bar(data=test1, aes(color='test1'), stat = "identity",
width=.4, position=position_nudge(x = -0.2)) +
geom_bar(data=test2, aes(color='test2'), stat = "identity",
width=.4, position=position_nudge(x = 0.2))
or, what I recommend, join the two data frames together and then plot
library(dplyr)
test1 %>%
full_join(test2, by = 'Color') %>%
data.table::melt(id.vars = 'Color') %>%
ggplot(aes(x= Color, y=value, fill = variable)) +
geom_bar(stat = "identity", position = 'dodge')
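A variant of the join-then-reshape approach, sketched with tidyr::pivot_longer() in place of data.table::melt(), in case reshaping a plain data frame through data.table is not available in your setup. The suffix values and the name long are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

name   <- c("red", "blue", "green", "purple", "white", "black")
value1 <- c(2, 3, 4, 2, 6, 8)
value2 <- c(1, 5, 7, 3, 4, 2)
test1  <- data.frame(Color = name, Count = value1)
test2  <- data.frame(Color = name, Count = value2)

# Join on Color, then stack the two count columns into long format
long <- test1 %>%
  full_join(test2, by = "Color", suffix = c(".test1", ".test2")) %>%
  pivot_longer(-Color, names_to = "variable", values_to = "value")

# One dodged bar per data frame for each colour
p <- ggplot(long, aes(x = Color, y = value, fill = variable)) +
  geom_col(position = "dodge")
# print(p) draws the plot
```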
Try this:
name=c('red','blue','green','purple','white','black')
value1=c(2,3,4,2,6,8)
value2=c(1,5,7,3,4,2)
test1=data.frame("Color"=name,"Count"=value1)
test2=data.frame("Color"=name,"Count"=value2)
test1$var <- 'test1'
test2$var <- 'test2'
test_all <- rbind(test1,test2)
(plot_test=ggplot(data=test_all) +
geom_bar(aes(x=Color,y=Count,color=var),
stat = "identity", position=position_dodge(1))+
scale_color_manual(values = c('green', 'blue'))
)
This will do what you were trying to do:
balls <- data.frame(
count = c(c(2,3,4,2,6,8),c(1,5,7,3,4,2)),
colour = c(c('red','blue','green','purple','white','black'),c('red','blue','green','purple','white','black')),
box = c(rep("1", times = 6), rep("2", times = 6))
)
ggplot(balls, aes(x = colour, y = count, fill = box)) +
geom_col(position = "dodge") + # dodge places the two boxes side by side instead of stacking them
scale_fill_manual(values = c("green","blue"))
This is better because it facilitates comparisons between the box counts:
ggplot(balls, aes(x = colour, y = count)) +
geom_col() +
facet_wrap(~ box, ncol = 1, labeller = as_labeller(c("1" = "Box #1", "2" = "Box #2")))
I am trying to create a boxplot using ggplot2, and need to have two axes from the same data frame representing two different scales. Essentially I am plotting surface area to volume ratios per two different species for three appendages, and one of the appendages has a very high SA:V ratio in comparison to the other two, which makes it difficult to have them all on the same graph.
I've recreated my data and code for the boxplot to demonstrate what I am talking about. If possible I would like the dorsal fins to be displayed on the same graph, but on a different y axis scale (that will also be shown on the graph) just so the boxes of the boxplot are all visible.
SAV <- c(seq(.35, .7, .01), seq(.09, .125, .001), seq(.09, .125, .001))
Type <- c(rep("Pectoral Fin", 36), rep("Dorsal fin", 36), rep("Fluke", 36))
Species <- c(rep(c(rep("Sp1", 18), rep("Sp2", 18)), 3))
appendage <- data.frame(SAV, Type, Species)
ggplot(aes(y = appendage$SAV,
x = factor(appendage$Type, levels = c("Dorsal fin", "Fluke")),
fill = appendage$Species),
data = appendage) +
geom_boxplot(outlier.shape = NA) +
labs(y = expression("SA:V("*cm^-1*")"), x="") +
scale_x_discrete(labels = c("PF", "DF", "F")) +
scale_fill_manual(values = c("black", "gray"))
If any one could help me with this that would be great!
One possibility is to use facet_wrap.
appendage %>%
mutate(
Type = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF"))) %>%
ggplot(aes(Type, SAV, fill = Species)) +
geom_boxplot(outlier.shape=NA) +
labs(y=expression("SA:V("*cm^-1*")"),x="") +
scale_fill_manual(values=c("black","gray")) +
facet_wrap(~Type, scales="free") +
theme(axis.ticks.x = element_blank(),
strip.background = element_blank(),
strip.text.x = element_blank())
First off, as others have commented, I do not recommend this type of plot. Dual axes have a tendency to make comparisons harder, & visually confuse the audience even when they are aware of it.
That said, it is possible to achieve this using ggplot2, & I'll show one approach below, once we get past several other issues in the original code:
Issue 1: You are passing a data frame to ggplot(). The dollar sign $ has no place in aes() in such cases.
Instead of:
ggplot(aes(y = appendage$SAV,
x = factor(appendage$Type), # ignore the levels for now; see next issue
fill = appendage$Species),
data = appendage) +
...
Use:
ggplot(aes(y = SAV,
x = factor(Type),
fill = Species),
data = appendage) +
...
Issue 2: Which appendage has the extraordinarily high SA:V?
From the code used to generate the sample dataset, it should be "Pectoral Fin", but the final result shows "DF". I assume the mapping between full terms & axis labels to be:
"Pectoral Fin" -> "PF"
"Dorsal fin" -> "DF"
"Fluke" -> "F"
... so this looks like a slip-up between passing Type as a factor to the x parameter in aes(), and setting the axis labels in scale_x_discrete().
Since you're using factor(), it would be neater to set the labels there as well. Keeping it in the same place makes such things easier to spot.
Instead of:
ggplot(aes(y = SAV,
x = factor(Type, levels = c("Dorsal fin", "Fluke")),
fill = Species),
data = appendage) +
...
Use:
ggplot(aes(y = SAV,
x = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF")),
fill = Species),
data = appendage) +
...
I switched the order of factors as I feel it makes (marginally) more sense visually for the x-axis category corresponding to the secondary y-axis (typically on the right) to be on the right of other x-axis categories. You can change that if this isn't the desired case. Just make sure both levels = ... and labels = ... are changed together.
Solution for secondary y-axis
Manually re-scale the values of the offending appendage (whichever fin that turns out to be) until its range is somewhat similar to that of other appendages. (In the example below, I used a simple division of y / 5, but more complicated functions can be used too.)
Specify the sec.axis argument of scale_y_continuous() via sec_axis(), using the inverse of the re-scaling function as the transformation. (In this case y * 5.)
Label the original y-axis (left) and the secondary y-axis (right) accordingly to make it clear which appendage(s) each axis's scale applies to.
Final code + result:
k = 5 #rescale factor
ggplot(aes(y = ifelse(Type == "Pectoral Fin",
SAV / k, SAV),
x = factor(Type,
levels = c("Dorsal fin", "Fluke", "Pectoral Fin"),
labels = c("DF", "F", "PF")),
fill = Species),
data = appendage) +
geom_boxplot(outlier.shape = NA) +
scale_y_continuous(sec.axis = sec_axis(trans = ~. * k,
name = expression("SA:V ("*cm^-1*") PF"))) +
labs(y = expression("SA:V ("*cm^-1*") DF / F"), x = "") +
scale_fill_manual(values = c("black", "gray"))
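A small check of the re-scaling idea, as a sketch: the forward transform applied to the data and the transform handed to sec_axis() must be exact inverses, otherwise the right-hand axis mislabels the boxes. The helper names to_primary and to_secondary are hypothetical and not part of ggplot2.

```r
k <- 5  # rescale factor, as in the answer above

# Hypothetical helpers: forward transform for the data,
# inverse transform for the secondary axis
to_primary   <- function(y) y / k
to_secondary <- function(y) y * k

pf     <- seq(.35, .7, .01)     # Pectoral Fin values from the question
others <- seq(.09, .125, .001)  # Dorsal fin / Fluke values

# After re-scaling, the pectoral fin range is comparable to the others
range(to_primary(pf))
range(others)

# Round trip: the secondary axis recovers the original values
all.equal(to_secondary(to_primary(pf)), pf)
```

If a non-linear re-scaling is used instead of a simple division, the same round-trip check applies to whatever inverse is passed to sec_axis().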
If I want to order the bars in a ggplot2 barchart from largest to smallest, then I'd usually update the factor levels of the bar category, like so
one_group <- data.frame(
height = runif(5),
category = gl(5, 1)
)
o <- order(one_group$height, decreasing = TRUE)
one_group$category <- factor(one_group$category, levels = one_group$category[o])
p_one_group <- ggplot(one_group, aes(category, height)) +
geom_bar(stat = "identity")
p_one_group
If I have several groups of barcharts that I'd like in different facets, with each facet having bars ordered from largest to smallest (and different x-axes), then the technique breaks down.
Given some sample data
two_groups <- data.frame(
height = runif(10),
category = gl(5, 2),
group = gl(2, 1, 10, labels = letters[1:2])
)
and the plotting code
p_two_groups <- ggplot(two_groups, aes(category, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x")
p_two_groups
what do I need to do to get the bar ordering right?
If it helps, an equivalent problem to solve is: how do I update factor levels after I've done the faceting?
Here is a hack:
two_groups <- transform(two_groups, category2 = factor(paste(group, category)))
two_groups <- transform(two_groups, category2 = reorder(category2, rank(height)))
ggplot(two_groups, aes(category2, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
scale_x_discrete(labels=two_groups$category, breaks=two_groups$category2)
1. Make a UNIQUE factor variable for all entries (category2).
2. Reorder the variable based on the height.
3. Plot on the variable: aes(x=category2).
4. Re-label the axis using the original value (category) for the variable (category2) in scale_x_discrete.
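The same steps can be sketched with the bars running from largest to smallest, as the question asks, by reordering on the negated height. This is a variation on the hack, not the answerer's exact code; the seed and the name axis_labels are additions for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed for a reproducible example
two_groups <- data.frame(
  height   = runif(10),
  category = gl(5, 2),
  group    = gl(2, 1, 10, labels = letters[1:2])
)

# 1. unique factor value for every group / category pair
two_groups$category2 <- factor(paste(two_groups$group, two_groups$category))
# 2. order its levels by decreasing height
two_groups$category2 <- reorder(two_groups$category2, -two_groups$height)

# 4. named vector that maps each category2 value back to its category label
axis_labels <- setNames(as.character(two_groups$category),
                        as.character(two_groups$category2))

# 3. plot on category2 and relabel the axis
p <- ggplot(two_groups, aes(category2, height)) +
  geom_col() +
  facet_grid(. ~ group, scales = "free_x") +
  scale_x_discrete(labels = axis_labels)
# print(p) draws the plot
```

Because scales = "free_x" drops unused levels in each facet, each panel shows only its own bars, in the global descending order.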
Here is a hack to achieve what you want. I was unable to figure out how to get the category values below the tick marks. So if someone can help fix that, it would be wonderful. Let me know if this works
library(plyr) # ddply()
# add a within-group height rank variable to the data frame
two_groups <- ddply(two_groups, .(group), transform, hrank = rank(height))
# plot the graph; opts() and theme_blank() were removed from ggplot2,
# so theme() and element_blank() are used instead
p_two_groups <- ggplot(two_groups, aes(-hrank, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
theme(axis.text.x = element_blank()) +
geom_text(aes(y = 0, label = category), vjust = 1.5)