Ordering geom_col NOT by fill value - r

Please resist your instinct to jump at defining factor levels. I am trying to make a bar plot with text annotations. I'm using geom_col with a y value aesthetic, and geom_text with a separate data frame in which the value has been converted into a cumulative sum. The order matters here: I want the bars stacked in the same order in which the cumulative sum is calculated.
Example
library(ggplot2)
library(data.table)
example_df <- data.frame(gender = c('M', 'F', 'F', 'M'), month = c('1', '1', '2', '2'),
value = c(10, 20, 30, 40), name = c('Jack', 'Kate', 'Nassrin', 'Malik'))
setDT(example_df)
text_df <- example_df[, .(value=cumsum(value), name=name), by='month']
ggplot(example_df) + geom_col(aes(x=month, y=value, fill=gender)) +
geom_text(data=text_df, aes(x=month, y=value, label=name), vjust=1)
In the resulting plot, the left side is exactly what I want. Jack is labeled at 10 over the M color, and Kate is labeled above that over the F color. The right side, though, is wrong. Nassrin is labeled at 30, but over the M color, which has a height of 40. This is because geom_col by default orders by fill, which is converted to a factor in alphabetic order. What I want here is for the left bar to be ordered M, F but the right one F, M. Is this possible? Or is my best solution to reorder my cumulative sum (which would lead to a different plot than I intend)?

Set group and fill separately. The order of stacking (i.e. the position) is controlled by group, and when you don't define that it gets set automatically (in this case the definition of fill is used). So:
library(forcats)  # provides fct_inorder() and fct_rev()
ggplot(example_df) +
  geom_col(aes(x = month, y = value, group = fct_rev(fct_inorder(name)), fill = gender)) +
  geom_text(data = text_df, aes(x = month, y = value, label = name), vjust = 1)
Note that we can also let ggplot do the cumulative sums for us. Then we can use just the original data.frame, simplifying your plot to:
ggplot(example_df, aes(month, value, group = fct_rev(fct_inorder(name)))) +
  geom_col(aes(fill = gender)) +
  geom_text(aes(label = name), position = 'stack', vjust = 1)
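As a quick check of the stacking behaviour described above, the sketch below builds the simplified plot and reads the label heights back with layer_data(). It assumes a base R factor with reversed first-appearance levels as a stand-in for fct_rev(fct_inorder(name)), so forcats is not required; the names stack_order, p and labels_built are made up for illustration.

```r
library(ggplot2)

example_df <- data.frame(
  gender = c('M', 'F', 'F', 'M'),
  month  = c('1', '1', '2', '2'),
  value  = c(10, 20, 30, 40),
  name   = c('Jack', 'Kate', 'Nassrin', 'Malik')
)

# Base R stand-in (an assumption) for fct_rev(fct_inorder(name)):
# levels in reverse order of first appearance.
example_df$stack_order <- factor(example_df$name,
                                 levels = rev(unique(example_df$name)))

p <- ggplot(example_df, aes(month, value, group = stack_order)) +
  geom_col(aes(fill = gender)) +
  geom_text(aes(label = name), position = 'stack', vjust = 1)

# Layer 2 is geom_text; its y column holds the stacked label heights.
labels_built <- layer_data(p, 2)
labels_built[order(labels_built$x, labels_built$y), c('label', 'y')]
```

If the stacking follows the intended order, the heights should match the cumulative sums in text_df (10 and 30 for month 1, 30 and 70 for month 2).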

Related

data points misaligned when using a third value with position jitterdodge

Edited with sample data:
When I try to plot a grouped boxplot together with jittered points using position=position_jitterdodge(), and add an additional group indicated by e.g. shape, I end up with a graph where the jittered points are misaligned within the individual groups:
n <- 16
data <- data.frame(
age = factor(rep(c('young', 'old'), each=8)),
group=rep(LETTERS[1:2], n/2),
yval=rnorm(n)
)
ggplot(data, aes(x=group, y=yval))+
geom_boxplot(aes(color=group), outlier.shape = NA)+
geom_point(aes(color=group, shape=age, fill=group),size = 1.5, position=position_jitterdodge())+
scale_shape_manual(values = c(21,24))+
scale_color_manual(values=c("black", "#015393"))+
scale_fill_manual(values=c("white", "#015393"))+
theme_classic()
Is there a way to suppress that additional separation?
Thank you!
OP, I think I get what you are trying to explain. It seems the points are grouped according to age, rather than treated as the same for each group. The reason for this is that you have not specified what to group together. In order to jitter the points, they are first grouped together according to some aesthetic, then the jitter is applied. If you don't specify the grouping, then ggplot2 makes a guess as to how you want to group the points.
In this case, it is grouping according to age and group, since both are defined to be used in the aesthetics (x=, fill=, and color= are assigned to group and shape= is assigned to age).
To define that you only want to group the points by the column group, you can use the group= aesthetic modifier. (reposting your data with a seed so you see the same thing)
set.seed(8675309)
n <- 16
data <- data.frame(
age = factor(rep(c('young', 'old'), each=8)),
group=rep(LETTERS[1:2], n/2),
yval=rnorm(n)
)
ggplot(data, aes(x=group, y=yval))+
geom_boxplot(aes(color=group), outlier.shape = NA)+
geom_point(aes(color=group, shape=age, fill=group, group=group),size = 1.5, position=position_jitterdodge())+
scale_shape_manual(values = c(21,24))+
scale_color_manual(values=c("black", "#015393"))+
scale_fill_manual(values=c("white", "#015393"))+
theme_classic()
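To see the grouping described above, one option is to count the groups ggplot2 actually builds. The following is only a sketch: it drops the boxplot, colours and jitter to keep the built data deterministic, and the names p_default, p_explicit and the two counters are hypothetical.

```r
library(ggplot2)

set.seed(8675309)
n <- 16
data <- data.frame(
  age   = factor(rep(c('young', 'old'), each = 8)),
  group = rep(LETTERS[1:2], n / 2),
  yval  = rnorm(n)
)

# Without an explicit group, every discrete mapped variable contributes.
p_default <- ggplot(data, aes(x = group, y = yval)) +
  geom_point(aes(shape = age))

# With group = group, only that column defines the groups.
p_explicit <- ggplot(data, aes(x = group, y = yval)) +
  geom_point(aes(shape = age, group = group))

n_groups_default  <- length(unique(layer_data(p_default)$group))
n_groups_explicit <- length(unique(layer_data(p_explicit)$group))
c(default = n_groups_default, explicit = n_groups_explicit)
```

The default plot should report one group per combination of group and age, while the explicit version should report one per level of group, which is what position_jitterdodge() then dodges on.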

Boxplot ggplot2: Show mean value and number of observations in grouped boxplot

I wish to add the number of observations to this boxplot, not by group but separated by factor. Also, I wish to display the number of observations as part of the x-axis label, so that it looks something like this: ("PF (N=12)").
Furthermore, I would like to display the mean value of each box inside of the box, displayed in millions in order not to have a giant number for each box.
Here is what I have got:
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
}
mean.n <- function(x){x <- x/1000000
return(c(y = median(x)*0.97, label = round(mean(x),2)))
}
ggplot(Soils_noctrl) +
geom_boxplot(aes(x=Slope,y=Events.g_Bacteria, fill = Detergent),
varwidth = TRUE) +
stat_summary(aes(x = Slope, y = Events.g_Bacteria), fun.data = give.n, geom = "text",
fun = median,
position = position_dodge(width = 0.75))+
ggtitle("Cell Abundance")+
stat_summary(aes(x = Slope, y = Events.g_Bacteria),
fun.data = mean.n, geom = "text", fun = mean, colour = "red")+
facet_wrap(~ Location, scale = "free_x")+
scale_y_continuous(name = "Cell Counts per Gram (Millions)",
breaks = round (seq(min(0),
max(100000000), by = 5000000),1),
labels = function(y) y / 1000000)+
xlab("Sample")
And so far it looks like this:
As you can see, the mean value is at the bottom of the plot and the number of observations are in the boxes but not separated
Thank you for your help! Cheers
TL;DR - you need to supply a group= aesthetic, since ggplot2 does not know on which column data it is supposed to dodge the text geom.
Unfortunately, we don't have your data, but here's an example set that can showcase the rationale here and the function/need for group=.
library(ggplot2)
library(dplyr)  # provides %>%, group_by() and summarize() used below
set.seed(1234)
df1 <- data.frame(detergent=c(rep('EDTA',15),rep('Tween',15)), cells=c(rnorm(15,10,1),rnorm(15,10,3)))
df2 <- data.frame(detergent=c(rep('EDTA',20),rep('Tween',20)), cells=c(rnorm(20,1.3,1),rnorm(20,4,2)))
df3 <- data.frame(detergent=c(rep('EDTA',30),rep('Tween',30)), cells=c(rnorm(30,5,0.8),rnorm(30,3.3,1)))
df1$smp='Sample1'
df2$smp='Sample2'
df3$smp='Sample3'
df <- rbind(df1,df2,df3)
Instead of using stat_summary(), I'm just going to create a separate data frame to hold the mean values I want to include as text on my plot:
summary_df <- df %>% group_by(smp, detergent) %>% summarize(m=mean(cells))
Now, here's the plot and use of geom_text() with dodging:
p <- ggplot(df, aes(x=smp, y=cells)) +
geom_boxplot(aes(fill=detergent))
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2)),
color='blue', position=position_dodge(0.8)
)
You'll notice the numbers are all separated along y= just fine, but the "dodging" is not working. This is because we have not supplied any information on how to do the dodging. In this case, the group= aesthetic can be supplied to let ggplot2 know which column to use for the dodging:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), group=detergent),
color='blue', position=position_dodge(0.8)
)
You don't have to supply the group= aesthetic if you supply another aesthetic such as color= or fill=. In cases where you give both a color= and group= aesthetic, the group= aesthetic will override any of the others for dodging purposes. Here's an example of the same, but where you don't need a group= aesthetic because I've moved color= up into the aes() (changing fill to greyscale so that you can see the text):
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), color=detergent),
position=position_dodge(0.8)
) + scale_fill_grey()
FUN FACT: Dodging still works even if you supply geom_text() with a nonsensical aesthetic that would normally work for dodging, such as fill=. You get a warning message Ignoring unknown aesthetics: fill, but the dodging still works:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), fill=detergent),
position=position_dodge(0.8)
)
# gives you the same plot as if you just supplied group=detergent, but with black text
In your case, changing your stat_summary() line to this should work:
stat_summary(aes(x = Slope, y = Events.g_Bacteria, group = Detergent),...

ggplot - retaining axis label coloring with reordered data

I'm making a horizontal bar chart where each observation has a numeric count variable associated with it. I want to show the bars for each variable ordered by (descending) count, which is no problem. However I also want to highlight the variable name based on a third dichotomous variable. I found how to do the latter in another post on here, but I have been unable to combine the two. Here's an example of what I mean:
library(ggplot2)
testdata<-data.frame("var"=c('V1','V2','V3','V4'),"cat"=c('Y','N','Y','N'),
"count"=c(1,5,2,10))
ggplot(testdata, aes(var,count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
That's the horizontal bar chart with highlighting, which works fine - V1/V3 are red and V2/V4 are green.
However when I try to sort it doesn't keep the groups:
ggplot(testdata, aes(reorder(var,count),count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+theme_classic()+
theme(axis.ticks.y=element_blank())+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
In this second graph, V2 and V3 are the wrong color.
I also tried sorting the data by count first, and then using the first ggplot statement, however it still plots the data by variable name instead of count (and even if it did work, I would have to resolve tied count values). Any ideas? What I really need is for the dataframe in the "ifelse" colour to match the dataframe in the aes statement. I tried using the data frame that was sorted by descending count in the colour statement, but that also did not work.
Thanks
edit: more code
testdata$var = with(testdata, reorder(var, count))
ggplot(testdata, aes(var,count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+theme_classic()+
theme(axis.ticks.y=element_blank())+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
My comment was partially incorrect. The order of the levels is the only thing that matters for the order of the axis, but when we do ifelse(testdata$cat == "N", "darkgreen", "darkred") of course it goes in the order of the data! So we need the order of the levels and the order of the data to be the same:
testdata$var = with(testdata, reorder(var, count))
testdata = testdata[order(testdata$var), ]
ggplot(testdata, aes(var, count)) +
geom_bar(
stat = 'identity',
colour = 'blue',
fill = 'blue',
width = 0.3
) +
coord_flip(ylim = c(0, 10)) + theme_classic() +
theme(axis.ticks.y = element_blank()) +
theme(axis.text.y =
element_text(
colour = ifelse(testdata$cat == "N", "darkgreen", "darkred"),
size = 15
))
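The mismatch between level order and row order can be inspected without drawing anything. This is a small sketch in base R using the question's data; axis_order, colours_unsorted, sorted and colours_sorted are illustrative names, not part of the original code.

```r
testdata <- data.frame(
  var   = c('V1', 'V2', 'V3', 'V4'),
  cat   = c('Y', 'N', 'Y', 'N'),
  count = c(1, 5, 2, 10)
)
testdata$var <- with(testdata, reorder(var, count))

# The axis follows the factor levels ...
axis_order <- levels(testdata$var)
# ... but ifelse() returns colours in row order.
colours_unsorted <- ifelse(testdata$cat == "N", "darkgreen", "darkred")

# Sorting the rows by the factor puts both in the same order.
sorted <- testdata[order(testdata$var), ]
colours_sorted <- ifelse(sorted$cat == "N", "darkgreen", "darkred")

data.frame(axis_order, colours_unsorted, colours_sorted)
```

Reading the resulting table row by row shows which colour each axis label receives before and after sorting the data frame.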

Coloring geom_bars based upon values in dataset

I would like my bars to be red when the value is below zero. This is not the actual data I am working with, but I hope this will serve as a reproducible example:
library(ggplot2)
library(car)
mtcars$carnames <- rownames(mtcars)
rownames(mtcars) <- 1:nrow(mtcars)
subsetCars <- as.data.frame(head(mtcars, n = 20))
subsetCars[1,4] <- -50
myplot.p <- ggplot(subsetCars, aes(x = subsetCars$carnames, y = subsetCars$hp))
myplot.p + geom_bar(stat = 'identity',
fill = ifelse(subsetCars$hp > 0, "lightblue", "firebrick")) +
coord_flip()
One bar is colored red, but not the one with the negative value. I have a similar problem with the current viz I am working on as well.
Advice?
Note that your fill argument in geom_bar takes a vector created by ifelse, of which the first element is "firebrick" and all other elements are "lightblue". So the first (bottom-most) bar will be filled with red. However, the first bar does not correspond to the row with the negative value, since the observations have been re-ordered by carnames in alphabetical order.
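The index mismatch can be checked directly in base R. The sketch below assumes, as described above, that the discrete axis places the car names in sorted order; fill_vec, red_index, axis_levels and axis_position are hypothetical names used only here.

```r
mtcars$carnames <- rownames(mtcars)
subsetCars <- head(mtcars, n = 20)
subsetCars[1, 4] <- -50  # column 4 is hp

# Colour vector in row order: only element 1 is "firebrick".
fill_vec <- ifelse(subsetCars$hp > 0, "lightblue", "firebrick")
red_index <- which(fill_vec == "firebrick")

# Position of the negative-hp car once the names are sorted for the axis.
axis_levels <- sort(unique(subsetCars$carnames))
axis_position <- match(subsetCars$carnames[1], axis_levels)

c(red_index = red_index, axis_position = axis_position)
```

The two numbers differ, which is why the red fill lands on a different bar than the one with the negative value.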
A more idiomatic way of plotting your desired chart is
myplot.p <- ggplot(subsetCars, aes(x = carnames, y = hp, fill = hp < 0))
myplot.p + geom_bar(stat = 'identity') +
scale_fill_manual("Negative hp", values = c("lightblue", "firebrick")) +
coord_flip()
where the $ subsetting is unnecessary, as @alistaire pointed out, and the fill aesthetic can be stated in ggplot().
The problem you are having is that when you are specifying the fill, ggplot is not assigning that aesthetic in the same order that it is for the names. So to make sure the order is preserved, you need to put the greater than zero variable with the other aesthetics.
The unfortunate side effect of this is you need to manually set the colors and remove the fill scale legend.
ggplot(subsetCars, aes(x = carnames, y = hp,
                       fill = hp > 0)) +
geom_bar(stat = 'identity') +
coord_flip() +
scale_fill_manual(values = c("TRUE" = "lightblue",
"FALSE" = "firebrick")) +
theme(legend.position = "none")
I hope that helps!

How to reorder the x axis on a stacked area plot

I have the following data frame and want to plot a stacked area plot:
library(ggplot2)
set.seed(11)
df <- data.frame(a = rlnorm(30), b = as.factor(1:10), c = rep(LETTERS[1:3], each = 10))
ggplot(df, aes(x = as.numeric(b), y = a, fill = c)) +
geom_area(position = 'stack') +
theme_grey() +
scale_x_discrete(labels = levels(as.factor(df$b))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The resulting plot on my system looks like this:
Unfortunately, the x-axis doesn't seem to show up. I want to plot the values of df$b rotated so that they don't overlap, and ultimately I would like to sort them in a specific way (haven't gotten that far yet, but I will take any suggestions).
Also, according to ?factor() using as.numeric() with a factor is not the best way to do it. When I call ggplot but leave out the as.numeric() for aes(x=... the plot comes up empty.
Is there a better way to do this?
Leave b as a factor. You will further need to add a group aesthetic which is the same as the fill aesthetic. (This tells ggplot how to "connect the dots" between separate factor levels.)
ggplot(df, aes(x = b, y = a, fill = c, group = c)) +
geom_area(position = 'stack') +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
As for the order, the x-axis will go in the order of the factor levels. To change the order of the axis, simply change the order of the factor levels. reorder() works well if you are basing it on a numeric column (or a function of a numeric column). For arbitrary orders, just specify the order of the levels directly in a factor call, something like: df$b = factor(df$b, levels = c("1", "5", "2", ...)). For more examples of this, see the r-faq Order bars in ggplot. Yours isn't a barplot but the principle is identical.
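Both ways of setting the level order can be sketched on the question's data. The columns b_by_total and b_custom are hypothetical, and the reversed order is only a placeholder for whatever arbitrary order is wanted.

```r
set.seed(11)
df <- data.frame(a = rlnorm(30), b = as.factor(1:10),
                 c = rep(LETTERS[1:3], each = 10))

# Order the levels by a numeric summary: the total of a per level of b.
df$b_by_total <- reorder(df$b, df$a, FUN = sum)

# Arbitrary order, spelled out in full (placeholder: simply reversed).
df$b_custom <- factor(df$b, levels = as.character(10:1))

levels(df$b_by_total)
levels(df$b_custom)
```

Mapping x to one of these columns instead of b in the geom_area() call should then draw the axis in that level order. Note that every level has to be listed in the explicit factor() call, since values missing from levels become NA.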
