I want to draw a dot plot with horizontal lines by groups.
The df object stores the points and the df.line object stores the lines I want to add to the dot plot. The horizontal lines are not the mean/median value of the points; they are standards I want to show in this figure.
I tried geom_hline, geom_line, geom_errorbar, and stat_summary, but none of them works the way I want.
Could anyone teach me how to do it?
library(ggplot2)
library(tidytext)
set.seed(42)
df=data.frame(site=c(rep("a",5),rep("b",5),rep("c",5)),
sample=c(1:5,1:5,1:5),
value=c(runif(5, min=0.54, max=0.56),runif(5, min=0.52, max=0.6),runif(5,
min=0.3, max=0.4)),
condition=c(rep("c1",5),rep("c2",5),rep("c2",5)))
df.line=data.frame(site=c("a","b","c"),standard=c(0.55,0.4,0.53))
ggplot(df)+
geom_point(aes(x=tidytext::reorder_within(site,value,condition,fun=mean),
y=value))+
facet_grid(~condition,space="free_x",scales = "free_x")+
scale_x_reordered()
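First, merge df and df.line together so that every point carries its site's standard. Next, move the main aes() call into ggplot() so that later layers inherit it. Then use stat_summary() on the standard column: because the standard is constant within each site, its mean is the standard itself, and an error bar with ymin = ymax collapses to a horizontal line: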
library(dplyr)
merge(df,df.line) %>%
ggplot(aes(x=tidytext::reorder_within(site,value,condition,fun=mean),
y=value))+
geom_point()+
stat_summary(aes(y = standard, ymax = after_stat(y), ymin = after_stat(y)),
fun = mean, geom = "errorbar", color = "red", width = 0.3) +
facet_grid(~condition,space="free_x",scales = "free_x")+
scale_x_reordered()
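To see why the merge step matters, here is a minimal base-R sketch. It rebuilds df and df.line with the question's column names, but the value column is placeholder data rather than the asker's exact numbers. It shows that merge() joins on the shared site column and repeats each site's standard on every row, which is what lets stat_summary() draw one line per site:

```r
# Rebuild the two data frames from the question (values are placeholders)
set.seed(42)
df <- data.frame(
  site      = rep(c("a", "b", "c"), each = 5),
  sample    = rep(1:5, times = 3),
  value     = runif(15, min = 0.3, max = 0.6),
  condition = rep(c("c1", "c2", "c2"), each = 5)
)
df.line <- data.frame(site = c("a", "b", "c"), standard = c(0.55, 0.4, 0.53))

# merge() joins on the only shared column name, "site"
merged <- merge(df, df.line)

# Every row keeps its point and gains its site's standard
head(merged)

# Number of distinct standards per site: 1 everywhere, so the mean of
# `standard` within a site is simply that site's standard
distinct_standards <- tapply(merged$standard, merged$site,
                             function(s) length(unique(s)))
distinct_standards
```

Because the standard never varies within a site, any summary function (mean, median, min) would give the same line here.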
I wish to add the number of observations to this boxplot, not by group but separated by factor. Also, I wish to display the number of observations in addition to the x-axis label that it looks something like this: ("PF (N=12)").
Furthermore, I would like to display the mean value of each box inside of the box, displayed in millions in order not to have a giant number for each box.
Here is what I have got:
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
}
mean.n <- function(x){x <- x/1000000
return(c(y = median(x)*0.97, label = round(mean(x),2)))
}
ggplot(Soils_noctrl) +
geom_boxplot(aes(x=Slope,y=Events.g_Bacteria, fill = Detergent),
varwidth = TRUE) +
stat_summary(aes(x = Slope, y = Events.g_Bacteria), fun.data = give.n, geom = "text",
fun = median,
position = position_dodge(width = 0.75))+
ggtitle("Cell Abundance")+
stat_summary(aes(x = Slope, y = Events.g_Bacteria),
fun.data = mean.n, geom = "text", fun = mean, colour = "red")+
facet_wrap(~ Location, scale = "free_x")+
scale_y_continuous(name = "Cell Counts per Gram (Millions)",
breaks = round (seq(min(0),
max(100000000), by = 5000000),1),
labels = function(y) y / 1000000)+
xlab("Sample")
And so far it looks like this:
As you can see, the mean value sits at the bottom of the plot, and the numbers of observations are inside the boxes but are not separated by detergent.
Thank you for your help! Cheers
TL;DR - you need to supply a group= aesthetic, since ggplot2 does not know on which column data it is supposed to dodge the text geom.
Unfortunately, we don't have your data, but here's an example set that can showcase the rationale here and the function/need for group=.
set.seed(1234)
df1 <- data.frame(detergent=c(rep('EDTA',15),rep('Tween',15)), cells=c(rnorm(15,10,1),rnorm(15,10,3)))
df2 <- data.frame(detergent=c(rep('EDTA',20),rep('Tween',20)), cells=c(rnorm(20,1.3,1),rnorm(20,4,2)))
df3 <- data.frame(detergent=c(rep('EDTA',30),rep('Tween',30)), cells=c(rnorm(30,5,0.8),rnorm(30,3.3,1)))
df1$smp='Sample1'
df2$smp='Sample2'
df3$smp='Sample3'
df <- rbind(df1,df2,df3)
Instead of using stat_summary(), I'm just going to create a separate data frame to hold the mean values I want to include as text on my plot:
library(dplyr)  # provides %>%, group_by() and summarize()
summary_df <- df %>% group_by(smp, detergent) %>% summarize(m=mean(cells))
Now, here's the plot and use of geom_text() with dodging:
p <- ggplot(df, aes(x=smp, y=cells)) +
geom_boxplot(aes(fill=detergent))
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2)),
color='blue', position=position_dodge(0.8)
)
You'll notice the numbers are separated along y just fine, but the dodging is not working. This is because we have not told ggplot2 how to do the dodging. Supplying the group= aesthetic tells ggplot2 which column to dodge by:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), group=detergent),
color='blue', position=position_dodge(0.8)
)
You don't have to supply the group= aesthetic if you supply another aesthetic such as color= or fill=. In cases where you give both a color= and group= aesthetic, the group= aesthetic will override any of the others for dodging purposes. Here's an example of the same, but where you don't need a group= aesthetic because I've moved color= up into the aes() (changing fill to greyscale so that you can see the text):
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), color=detergent),
position=position_dodge(0.8)
) + scale_fill_grey()
FUN FACT: Dodging still works even if you supply geom_text() with a nonsensical aesthetic that would normally work for dodging, such as fill=. You get a warning message Ignoring unknown aesthetics: fill, but the dodging still works:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), fill=detergent),
position=position_dodge(0.8)
)
# gives you the same plot as if you just supplied group=detergent, but with black text
In your case, changing your stat_summary() line to this should work:
stat_summary(aes(x = Slope, y = Events.g_Bacteria, group = Detergent),...
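The question also asked for axis labels like "PF (N=12)". One possible way, sketched here on made-up data shaped like the example set above (the column names smp, detergent and cells and the object axis_labels are just for illustration), is to count the rows per x category and pass a named vector to scale_x_discrete(labels = ...):

```r
library(ggplot2)

# Made-up data with 30, 40 and 60 observations per sample
set.seed(1234)
df <- data.frame(
  smp       = rep(c("Sample1", "Sample2", "Sample3"), times = c(30, 40, 60)),
  detergent = c(rep(c("EDTA", "Tween"), each = 15),
                rep(c("EDTA", "Tween"), each = 20),
                rep(c("EDTA", "Tween"), each = 30)),
  cells     = rnorm(130, mean = 5, sd = 2)
)

# Count observations per x category and build "name (N=count)" labels,
# named by the original category so ggplot2 can match them to the breaks
n_per_sample <- table(df$smp)
axis_labels <- setNames(
  paste0(names(n_per_sample), " (N=", as.vector(n_per_sample), ")"),
  names(n_per_sample)
)

p <- ggplot(df, aes(x = smp, y = cells)) +
  geom_boxplot(aes(fill = detergent)) +
  scale_x_discrete(labels = axis_labels)
p
```

Note that with facets the counts here are per x category over the whole data set, not per panel, so a faceted plot like the one in the question would need the counts computed per facet instead.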
I am trying to add corresponding labels to the color in the bar in a histogram. Here is a reproducible code.
ggplot(aes(displ),data =mpg) + geom_histogram(aes(fill=class),binwidth = 1,col="black")
This code gives a histogram and give different colors for the car "class" for the histogram bars. But is there any way I can add the labels of the "class" inside corresponding colors in the graph?
The inbuilt functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, if you are looking to do more advanced styling it is often required to create the data before you build the plot. In your case you have overlapping labels which are visually messy.
The following code builds a binned frequency table from the data frame:
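library(ggplot2)   # provides the mpg data set
library(plyr)      # provides ddply() and .()
library(reshape2)  # provides melt()
# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
# Optional: inspect the raw counts per displ/class combination
melt(table(mpg_df[, c("displ", "class")]))
# Bin data
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)
# Count the rows in each class/bin combination
mpg_df <- ddply(mpg_df, .(class, bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")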
You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")
I don't think it makes a lot of sense duplicating the labels, but it may be more useful showing the frequency of each group:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")
Update
I realised you can actually filter a label selectively using the computed variable ..count.. that stat_bin creates (newer ggplot2 versions write this as after_stat(count)). No need to preformat the data!
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))
This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)
This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.
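For newer ggplot2 versions, the same idea can be written with after_stat() instead of the ..count.. notation. This is a minimal sketch, assuming a ggplot2 version in which after_stat() is available (3.3.0 or later, as far as I know); the threshold of 4 is the one used above:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, fill = class)) +
  geom_histogram(binwidth = 1, colour = "black") +
  # Label only the segments that hold more than 4 observations;
  # `count` is the per-bin, per-class count computed by stat_bin
  stat_bin(binwidth = 1, geom = "text",
           position = position_stack(vjust = 0.5),
           aes(label = after_stat(ifelse(count > 4, count, ""))))
p
```

Wrapping the whole ifelse() in a single after_stat() call keeps the expression readable, since every name inside it then refers to the computed data.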
Looking at the examples from the other stackoverflow links you shared, all you need to do is change the vjust parameter.
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", vjust=1.5)
That said, it looks like you have other issues. Namely, the labels stack on top of each other because there aren't many observations at each point. Instead I'd just let people use the legend to read the graph.
Say I am working with the following (fake) data:
library(ggplot2)
library(reshape2)  # provides melt()
var1 <- runif(20, 0, 30)
var2 <- runif(20, 0, 40)
year <- c(1900:1919)
data_gg <- cbind.data.frame(var1, var2, year)
I melt the data for ggplot:
data_melt <- melt(data_gg, id.vars='year')
and I make a grouped barplot for var1 and var2:
plot1 <- ggplot(data_melt, aes(as.factor(year), value)) +
geom_bar(aes(fill = variable), position = "dodge", stat="identity")+
xlab('Year')+
ylab('Density')+
theme_light()+
theme(panel.grid.major.x=element_blank())+
scale_fill_manual(values=c('goldenrod2', 'firebrick2'), labels=c("Var1",
"Var2"))+
theme(axis.title = element_text(size=15),
axis.text = element_text(size=12),
legend.title = element_text(size=13),
legend.text = element_text(size=12))+
theme(legend.title=element_blank())
Finally, I want to add a line showing the sum (Var1 + Var2) for each year. I managed to draw it using stat_summary, but it does not show up in the legend.
plot1 + stat_summary(fun.y = sum, aes(as.factor(year), value, colour="sum"),
group=1, color='steelblue', geom = 'line', size=1.5)+
scale_colour_manual(values=c("sum"="blue"))+
labs(colour="")
How can I make it so that it appears in the legend?
To be precise, and without being a ggplot2 expert: the thing you need to change in your code is to remove the color argument that sits outside the aes() of the stat_summary call.
stat_summary(fun.y = sum, aes(as.factor(year), value, col="sum"), group=1, geom = 'line', size=1.5)
Apparently, the color argument outside the aes function (so defining color as an argument) overrides the aesthetics mapping. Therefore, ggplot2 cannot show that mapping in the legend.
As far as the group argument is concerned it is used to connect the points for making the line, the details of which you can read here: ggplot2 line chart gives "geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But it is not necessary to put it inside the aes call. In fact, if you leave it outside, the chart will not change.
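Putting this together, here is a self-contained sketch of the fix. It assumes a ggplot2 version where stat_summary() takes fun = (the newer name for fun.y =), and it builds the long-format data directly instead of melting, so data_long is a stand-in for the question's data_melt:

```r
library(ggplot2)

# Stand-in for the melted data from the question
set.seed(1)
data_long <- data.frame(
  year     = rep(1900:1919, times = 2),
  variable = rep(c("var1", "var2"), each = 20),
  value    = c(runif(20, 0, 30), runif(20, 0, 40))
)

p <- ggplot(data_long, aes(factor(year), value)) +
  geom_col(aes(fill = variable), position = "dodge") +
  # colour is mapped inside aes() and NOT set as a fixed argument,
  # so the line gets its own legend entry
  stat_summary(aes(colour = "sum", group = 1), fun = sum, geom = "line") +
  # the actual line colour is chosen here instead
  scale_colour_manual(values = c(sum = "steelblue")) +
  labs(x = "Year", y = "Density", colour = NULL, fill = NULL)
p
```

The design point is that the colour is picked in scale_colour_manual() rather than as a fixed argument on the layer, which keeps the mapping alive for the legend.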
Let's say I want to make a histogram
So I use the following code
v100<-c(runif(100))
v100
library(ggplot2)
private_plot<-ggplot()+aes(v100)+geom_histogram(binwidth = (0.1),boundary=0
)+scale_x_continuous(breaks=seq(0,1,0.1), lim=c(0,1))
private_plot
How do I separate my columns so that the whole thing is more pleasing to the eye?
I tried this but it somehow doesn't work:
Adding space between bars in ggplot2
Thanks
You could set the line color of the histogram bars with the col parameter, and the filling color with the fill parameter. This is not really adding space between the bars, but it makes them visually distinct.
library(ggplot2)
set.seed(9876)
v100<-c(runif(100))
# use col = "grey" to set the line color
ggplot() +
aes(v100) +
geom_histogram(binwidth = 0.1, fill="black", col="grey") +
scale_x_continuous(breaks = seq(0,1,0.1), lim = c(0,1))
Yielding this graph:
Please let me know whether this is what you want.
If you want to increase the space between the bars, e.g. to indicate that the values are discrete, one option is to plot your histogram as a bar plot. In that case, you have to summarize the data yourself and use geom_col() instead of geom_histogram(). If you want to increase the space further, you can reduce the width parameter.
library(tidyverse)
lambda <- 1:6
pois_bar <-
map(lambda, ~rpois(1e5, .x)) %>%
set_names(lambda) %>%
as_tibble() %>%
gather(lambda, value, convert = TRUE) %>%
count(lambda, value)
pois_bar %>%
ggplot() +
aes(x = value, y = n) +
geom_col(width = .5) +
facet_wrap(~lambda, scales = "free", labeller = "label_both")
Just use color and fill options to distinguish between the body and border of bins:
library(ggplot2)
set.seed(1234)
df <- data.frame(sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5))))
ggplot(df, aes(x=weight)) +
geom_histogram(color="black", fill="white")
In cases where you are creating a "histogram" over a range of integers, you could use:
ggplot(data) + geom_bar(aes(x = value, y = ..count..))
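As a small runnable illustration of that idea (the rolls data frame is made up for the example), geom_bar() counts the rows at each integer value by default, and a width below 1 leaves a visible gap between neighbouring bars:

```r
library(ggplot2)

# Made-up integer data: 100 rolls of a six-sided die
set.seed(1)
rolls <- data.frame(value = sample(1:6, 100, replace = TRUE))

# geom_bar() uses stat_count by default, so no y aesthetic is needed;
# width = 0.7 leaves space between the bars
p <- ggplot(rolls, aes(x = value)) +
  geom_bar(width = 0.7) +
  scale_x_continuous(breaks = 1:6)
p
```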
I just came across this issue. My solution was to add vertical lines at the points separating my bins. I use "theme_classic" and have a white background. I set my bins to break at 10, 20, 30, etc. So I just added 9 vertical lines with:
geom_vline(xintercept=10, linetype="solid", color = "white", size=2)+
geom_vline(xintercept=20, linetype="solid", color = "white", size=2)+
etc
A silly hack, but it works.
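The repeated geom_vline() calls can probably be collapsed into one, since xintercept accepts a vector. A minimal sketch on made-up data (the scores data frame and the 0 to 100 range are assumptions for the example):

```r
library(ggplot2)

# Made-up data on a 0-100 scale
set.seed(1)
scores <- data.frame(value = runif(200, min = 0, max = 100))

# The nine inner bin boundaries: 10, 20, ..., 90
bin_edges <- seq(10, 90, by = 10)

p <- ggplot(scores, aes(x = value)) +
  geom_histogram(breaks = seq(0, 100, by = 10), fill = "grey30") +
  # One call draws all nine separators
  geom_vline(xintercept = bin_edges, colour = "white") +
  theme_classic()
p
```

The line thickness is left at its default here; the original used size = 2, which recent ggplot2 versions spell linewidth = 2.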
I have two data frames: one I am using to create the bars in a barchart and a second that I am using to create a shaded "target region" behind the bars using geom_rect.
Here is example data:
test.data <- data.frame(crop=c("A","B","C"), mean=c(6,4,12))
target.data <- data.frame(crop=c("ONE","TWO"), mean=c(31,12), min=c(24,9), max=c(36,14))
I start with the means of test.data for the bars and means of target.data for the line in the target region:
library(ggplot2)
a <- ggplot(test.data, aes(y=mean, x=crop)) + geom_hline(aes(yintercept = mean, color = crop), target.data) + geom_bar(stat="identity")
a
So far so good, but then when I try to add a shaded region to display the min-max range of target.data, there is an issue. The shaded region appears just fine, but somehow, the crops from target.data are getting added to the x-axis. I'm not sure why this is happening.
b <- a + geom_rect(aes(xmin=-Inf, xmax=Inf, ymin=min, ymax=max, fill = crop), data = target.data, alpha = 0.5)
b
How can I add the geom_rect shapes without adding those extra names to the x-axis of the bar-chart?
This is a solution to your question, but I'd like to better understand your problem, because we might be able to make a more interpretable plot. All you have to do is add x = NULL to the aes() of your geom_rect() call. I took the liberty of renaming target.data to add.data and its 'crop' column to 'brop' to minimize any confusion.
test.data <- data.frame(crop=c("A","B","C"), mean=c(6,4,12))
add.data <- data.frame(brop=c("ONE","TWO"), mean=c(31,12), min=c(24,9), max=c(36,14))
ggplot(test.data, aes(y=mean, x=crop)) +
geom_hline(data = add.data, aes(yintercept = mean, color = brop)) +
geom_bar(stat="identity") +
geom_rect(data = add.data, aes(xmin=-Inf, xmax=Inf, x = NULL, ymin=min, ymax=max, fill = brop),
alpha = 0.5, show.legend = F)
In ggplot calls, all of the aesthetics in aes() are inherited from the initial call:
ggplot(data, aes(x=foo, y=bar)).
That means that regardless of which layers I add (geom_rect(), geom_hline(), etc.), ggplot looks for 'foo' to assign to x and 'bar' to assign to y, unless you specifically tell it otherwise. So, as aeosmith pointed out, you can clear all inherited aesthetics for a layer with inherit.aes = FALSE, or you can knock out single variables one at a time by reassigning them as NULL.
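For comparison, here is a sketch of the inherit.aes = FALSE variant, using the same test.data and add.data as above. The bars are drawn first so that the discrete x scale is set up by the crop column before the rectangles, which only carry numeric limits, are added:

```r
library(ggplot2)

test.data <- data.frame(crop = c("A", "B", "C"), mean = c(6, 4, 12))
add.data  <- data.frame(brop = c("ONE", "TWO"), mean = c(31, 12),
                        min = c(24, 9), max = c(36, 14))

p <- ggplot(test.data, aes(x = crop, y = mean)) +
  geom_col() +
  # inherit.aes = FALSE drops both x = crop and y = mean for this layer,
  # so only the aesthetics listed here are used
  geom_rect(data = add.data,
            aes(xmin = -Inf, xmax = Inf, ymin = min, ymax = max, fill = brop),
            inherit.aes = FALSE, alpha = 0.5)
p
```

With this layer order the shaded bands sit on top of the bars, which the alpha = 0.5 transparency keeps readable.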