I made a figure using geom_point from ggplot2 (only part of it is shown). The colors represent three classes. The black bar is the mean (not relevant to the question).
The data structure is the following (stored in a list):
V1 V2 V3
1 L. brevis 5 class1
3 L. sp. 13 class1
4 L. rhamnosus 14 class1
5 L. lindneri 17 class1
6 L. plantarum 17 class1
7 L. acidophilus 18 class1
8 L. acidophilus 18 class1
10 L. plantarum 18 class1
... ... .. ...
Here V1 is the species on the x-axis, V2 is the position of the data points on the y-axis, and V3 is the class (color).
Now I would like to show the percentages for each of the three classes on top of the figure (Or maybe even as pie charts :-) ). I made an example for "L. acidophilus" on the image (66.7% / 33.3%).
The legend explaining groups ideally is also produced by R but I can do it manually.
How do I do that?
Forgot to add the 0% for group three on top of column "L. acidophilus"... Sorry for that.
EDIT: Here is the ggplot2 code:
p <- ggplot(myData, aes(x=V1, y=V2)) +
geom_point(aes(color=V3, fill=V3), size=2.5, cex=5, shape=21, stroke=1) +
scale_color_manual(values=colBorder, labels=c("Class I","Class II","Class III","This study")) +
scale_fill_manual(values=col, labels=c("Class I","Class II","Class III","This study")) +
theme_bw() +
theme(axis.text.x=element_text(angle=50,hjust=1,face="italic", color="black"), text = element_text(size=12),
axis.text.y=element_text(color="black"), panel.grid.major = element_line(color="gray85",size=.15), panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank(), axis.ticks = element_line(size = 0.3), panel.border = element_rect(fill=NA, colour = "black", size=0.3)) +
stat_summary(aes(shape="mean"), fun.y=mean, size = 6, shape=95, colour="black", geom="point") +
guides(fill=guide_legend(title="Class", order=1), color=guide_legend(title="Class",order=1), shape=guide_legend(title="Blup", order=2))
Option A: Secondary Axis
You can do this using a secondary x axis (new to ggplot2 v2.2.0), but it's hard to do with a categorical variable on the x axis because it doesn't work with scale_x_discrete(), only scale_x_continuous(). So, you have to convert the factor to integer, plot based on that, and then overwrite the labels on the primary x axis.
For example:
set.seed(123)
df <- iris[sample.int(nrow(iris),size=300,replace=TRUE),]
# Assume we are grouping by species
# Some group-level stats -- how about count and mean/sdev of sepal length
library(dplyr)
df_stats <- df %>%
group_by(Species) %>%
summarize(stat_txt = paste0(c('N=','avg=','sdev='),
c(n(),round(mean(Sepal.Length),2),round(sd(Sepal.Length),3) ),
collapse='\n') )
library(ggplot2)
ggplot(data = df,
aes(x = as.integer(Species),
y = Sepal.Length)) +
geom_point() +
stat_summary(aes(shape="mean"), fun=mean, size = 6, shape=95,  # fun replaces fun.y, deprecated since ggplot2 3.3.0
colour="black", geom="point") +
theme_bw() +
scale_x_continuous(breaks=1:length(levels(df$Species)),
limits = c(0,length(levels(df$Species))+1),
labels = levels(df$Species),
minor_breaks=NULL,
sec.axis=sec_axis(~.,
breaks=1:length(levels(df$Species)),
labels=df_stats$stat_txt)) +
xlab('Species') +
theme(axis.text.x = element_text(hjust=0))
Option B: grid.arrange your statistics as a separate chart atop your main chart.
This is a little more straightforward, but the two charts don't quite perfectly line up, possibly because of the ticks and labels being suppressed on the axes of the top chart.
library(ggplot2)
library(gridExtra)
p <-
ggplot(data = df,
aes(x = Species,
y = Sepal.Length)) +
geom_point() +
stat_summary(aes(shape="mean"), fun=mean, size = 6, shape=95,  # fun replaces fun.y, deprecated since ggplot2 3.3.0
colour="black", geom="point") +
theme_bw() +
theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))
annot <-
ggplot(data=df_stats, aes(x=Species, y = 0)) +
geom_text(aes(label=stat_txt), hjust=0) +
theme_minimal() +
scale_x_discrete(breaks=NULL) +
scale_y_continuous(breaks=NULL) +
xlab(NULL) + ylab('')
grid.arrange(annot, p, heights=c(1,8))
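Both options above annotate with N/avg/sdev, whereas the question asks for class percentages per species. The following is a minimal sketch of how such labels could be computed in base R. The toy `myData` values are made up for illustration (only the column names V1, V2, V3 come from the question), and `pct` and `pct_labels` are hypothetical names.

```r
# Hypothetical toy data that mirrors the question's columns:
# V1 = species (x-axis), V2 = y value, V3 = class
myData <- data.frame(
  V1 = c("L. brevis", "L. acidophilus", "L. acidophilus",
         "L. acidophilus", "L. plantarum", "L. plantarum"),
  V2 = c(5, 18, 18, 20, 17, 18),
  V3 = factor(c("class1", "class1", "class1", "class2", "class1", "class3"),
              levels = c("class1", "class2", "class3"))
)

# Share of each class within each species, in percent (rows sum to 100).
# Keeping V3 as a factor with all three levels retains the 0% entries.
pct <- 100 * prop.table(table(myData$V1, myData$V3), margin = 1)

# One text label per species, e.g. "66.7% / 33.3% / 0.0%"
pct_labels <- apply(pct, 1, function(row) {
  paste0(sprintf("%.1f", row), "%", collapse = " / ")
})

pct_labels
```

The labels come out in alphabetical species order, which is also the default order of a discrete x axis built from a character column, so they could be passed as `labels` to `sec_axis()` in Option A or used as the `label` column of the annotation chart in Option B.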
I am currently working with these "lollipop" plots made with ggplot2.
I am trying to have the year values on the axis plotted only where observations are displayed. I guess this requires individual axis formatting, but I couldn't find anything for this so far. I have a multitude of such plots to do, so that's why I marked the red ones manually.
Any ideas?
This is my code and my plot:
# Create data
value1 <- c(33000000,45000000,45000000, 60000000,65000000,40000000)
value2 <- c(102984862,129342769,147717833,300228084,240159255,312242626)
value3 <- c(2002,2004,2007,2010,2012,2016)
data <- data.frame(value1=value1, value2=value2, value3=value3)
# Plot
ggplot(data) +
geom_segment( aes(x=value3, xend=value3, y=value1, yend=value2), color="grey") +
geom_point( aes(x=value3, y=value1), color=rgb(0.0,0.0,0.0,0.9), size=2 ) +
geom_point( aes(x=value3, y=value2), color=rgb(0.0,0.9,0.0,0.9), size=2 ) +
coord_flip()+
theme_grey() +
scale_y_continuous(name="", limits = c(0, 400000000)) +
theme(legend.position = "none",panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.text.x = element_blank())
You can add scale_x_continuous(breaks = value3) to the plot, so that axis labels are drawn only at the years that have observations.
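As a sketch of how this fits into the full plot, the version below keeps the years in the data frame and uses them as the breaks. The names `dat` and `year` are introduced here for illustration, and `minor_breaks = NULL` is an optional extra that is not part of the original suggestion.

```r
library(ggplot2)

# Data from the question, with the years stored in the data frame
dat <- data.frame(
  value1 = c(33000000, 45000000, 45000000, 60000000, 65000000, 40000000),
  value2 = c(102984862, 129342769, 147717833, 300228084, 240159255, 312242626),
  year   = c(2002, 2004, 2007, 2010, 2012, 2016)
)

p <- ggplot(dat) +
  geom_segment(aes(x = year, xend = year, y = value1, yend = value2),
               color = "grey") +
  geom_point(aes(x = year, y = value1), color = rgb(0, 0, 0, 0.9), size = 2) +
  geom_point(aes(x = year, y = value2), color = rgb(0, 0.9, 0, 0.9), size = 2) +
  # Put a break (and therefore a label) only at the observed years
  scale_x_continuous(name = "", breaks = dat$year, minor_breaks = NULL) +
  scale_y_continuous(name = "", limits = c(0, 400000000)) +
  coord_flip() +
  theme_grey()

p
```

Because the breaks are taken from the data, the same code can be reused for each of the many plots without marking years by hand.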
I am trying to reproduce a similar figure
ggplot2_ecdf
My Data looks like this
Category Value
A 2
A 3
A 4
A 2
A 4
B 2
B 1
B 6
C 1
C 2
C 3
C 3
I would like to plot the distribution with the categories on the x-axis and the values on the y-axis. Since some of them have similar values, stat_ecdf() would be a good way to visualize the distribution, with a curve per category that horizontally displaces similar points (as in the linked figure).
I have used a beeswarm plot in ggplot, but I would like to use stat_ecdf to get a displaced distribution (showing each entry as a dot per category), and also add a median line in red.
What I tried
a <- ggplot(df, aes(x=Category, y=value)) +
stat_ecdf()+
scale_y_continuous() +
theme_light() +
theme(axis.text.x = element_text(angle = 90)) +
xlab('category') +
ylab('values')
a
I'm a bit limited on time today, but maybe this can point you in the right direction.
a <- ggplot(data = df,
aes(x = value)) +
stat_ecdf(geom = "point",
size = 1,
pad = FALSE) +
xlab("category") +
ylab("values") +
facet_wrap(~ Category,
scales = "free_x",
strip.position = "bottom") +
coord_cartesian(clip = "off") +
theme_minimal() +
theme(axis.text.x = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank())
a
Update:
I played around a bit more. Hopefully this looks a bit better.
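If a single panel is preferred over facets, the horizontal displacement could also be computed by hand. This is only a sketch of a different technique from the stat_ecdf() answer above: it uses a rank-based empirical cumulative position so that tied values get distinct positions. The column names `ecdf_pos` and `x`, and the 0.8 spread factor, are assumptions for illustration.

```r
# Data from the question
df <- data.frame(
  Category = c("A", "A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Value    = c(2, 3, 4, 2, 4, 2, 1, 6, 1, 2, 3, 3)
)

# Empirical cumulative position of each point within its category.
# Ties are broken by order of appearance so tied values are spread apart.
df$ecdf_pos <- ave(df$Value, df$Category,
                   FUN = function(v) rank(v, ties.method = "first") / length(v))

# Horizontal position: category index plus an offset of at most 0.4
df$x <- as.integer(factor(df$Category)) + (df$ecdf_pos - 0.5) * 0.8

# Per-category medians, e.g. for a red median line
medians <- tapply(df$Value, df$Category, median)
```

These columns could then be drawn with geom_point(aes(x = x, y = Value)) on a continuous x scale whose breaks 1, 2, 3 are relabelled with the category names, with a red segment at each median.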
I have the following plot:
score = c(5,4,8,5)
Group = c('A','A','B','B')
Time = c('1','2','1','2')
df = data.frame(score,Group,Time)
df$Group = factor(df$Group)
df$Time = factor(df$Time)
a = ggplot(df, aes(x=Time, y=score, fill=Group)) +
geom_bar(position=position_dodge(), stat="identity", width = 0.8, color = 'black')
How do I reorder the bars such that Group A will be grouped together, followed by Group B, and the x-axis will be labelled as Time 1,2,1,2 for each bar? As shown below:
Having repeated elements on an axis is kind of against the principles of how ggplot2 works, but we can cheat a bit. I would suggest you follow @RLave's suggestion of using faceting. If that doesn't suit you, here is an attempt without faceting:
df2 <- rbind(df, data.frame(score=NA, Group=c('A'), Time=c('9')))
df2$x <- as.character(interaction(df2$Group, df2$Time))
ggplot(df2, aes(x=x, y=score, fill=Group)) +
geom_col(position='dodge', colour='black') +
scale_x_discrete(labels=c('1','2','','1','2')) +
theme(axis.ticks.x = element_blank(), panel.grid.major.x = element_blank())
As you can see, we have to create a dummy variable for the x-axis, and manually put on the labels.
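To see why the labels vector has five entries with an empty string in the middle, it may help to inspect the dummy variable on its own. The sketch below rebuilds `df2` from the question's data and lists the distinct x values in sorted order, which is the order a discrete axis uses for a character column.

```r
# Data from the question
df <- data.frame(
  score = c(5, 4, 8, 5),
  Group = factor(c("A", "A", "B", "B")),
  Time  = factor(c("1", "2", "1", "2"))
)

# Extra row with no score: it becomes an empty slot between the two groups
df2 <- rbind(df, data.frame(score = NA, Group = "A", Time = "9"))
df2$x <- as.character(interaction(df2$Group, df2$Time))

# Distinct x values in axis order
x_levels <- sort(unique(df2$x))
x_levels
# "A.1" "A.2" "A.9" "B.1" "B.2"
```

The placeholder "A.9" sorts after the real Group A bars and before Group B, so the third label is the blank one.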
Now consider a better solution using facet:
ggplot(df, aes(x=Time, y=score, fill=Group)) +
geom_col(width = 1, color = 'black') +
facet_grid(~Group) +
theme(strip.background = element_blank(), strip.text = element_blank(), panel.spacing.x=grid::unit(3, 'pt'))
The distance between the panels is adjusted with the theme argument panel.spacing.x.
I want to add the number of observations below each boxplot (as in the figure; no need for the red square). :)
However, I don't know how to annotate this type of boxplot (see the figure "multiple boxplot annotate number of observations").
Does anyone know how to do it?
This is the code that I used to plot this figure.
ggplot(data=MIOT1, aes(stage, time, fill=resp)) +
geom_boxplot(color= "black", lwd=0.3) +
stat_summary(fun.y=mean, geom="point", shape=0, size=1, colour="black", position=position_dodge(width=0.75)) +
scale_fill_manual(values=c("grey25", "grey50", "grey67")) +
annotation_custom(mygrobA) +
scale_y_continuous(limits=c(-10,124)) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background = element_rect(colour="black"),
panel.border = element_rect(colour = "black", fill="transparent")) +
xlab(bquote(' ')) +
ylab(bquote('Minimum Consecutive Time (s)')) +
labs(title="SATIATION\n") +
theme(axis.title.y = element_text(colour="black",size=10,face="bold"),
axis.text.x = element_text(colour="black",size=8, face="plain"),
axis.text.y = element_text(colour="black",size=8, face="plain"),
axis.title.x = element_text(colour="black",size=10,face="bold")) +
theme(panel.background = element_rect(fill = "white")) +
theme(plot.title = element_text(lineheight=.8, size=10, face="bold")) +
theme(legend.title=element_blank(), legend.key = element_rect(fill = NA, colour = NA)) +
theme(legend.position="none") +
theme(legend.background = element_rect(fill=NA)) +
theme(plot.margin = unit(c(.25,.25,.0,.0), "cm"))
EXAMPLE DATA
MIOT1 is a numeric variable (y-axis), and I am considering two grouping factors (development stage- x axis) and the response (unresponsive, coastal, lagoon).
Something like
stage resp time
pre U 100
pre U 80
pre U 50
pre C 20
flex U 80
flex U 90
flex C 10
flex C 20
post U 40
post U 30
post U 60
post C 80
post C 100
post L 50
post L 40
Thank you!
Pedro
Here's a simple example of how to do it, using the built-in mtcars data frame:
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
geom_text(stat="count", aes(label=..count..), y=min(mtcars$mpg)- 0.6)
In your case, it will be something like
ggplot(data=MIOT1, aes(stage, time, fill=resp)) +
geom_boxplot(color= "black", lwd=0.3) +
geom_text(stat="count", aes(label=..count..), y=min(MIOT1$time))
where you may have to adjust the y location of the text labels and you might also need to adjust the range of the y-axis to make room for the labels.
UPDATE: I was able to reproduce the error you reported, but I'm not sure how to fix it. Instead, you can pre-summarize the data and then add it to the plot. Here's an example:
library(dplyr)
# Get counts by desired grouping variables
counts = mtcars %>% group_by(cyl, am) %>% tally
ggplot(mtcars, aes(factor(cyl), mpg, fill=factor(am))) +
geom_boxplot(position=position_dodge(0.9)) +
geom_text(data=counts, aes(label=n, y=min(mtcars$mpg) - 0.6),
position=position_dodge(0.9))
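Applied to the example data from the question, the same pre-summarize approach might look like the sketch below. The data frame is transcribed from the question's sample rows; the base R `aggregate()` call stands in for the dplyr `tally`, and the y offset of 5 is an arbitrary choice that will likely need adjusting.

```r
library(ggplot2)

# Example data transcribed from the question
MIOT1 <- data.frame(
  stage = factor(c(rep("pre", 4), rep("flex", 4), rep("post", 7)),
                 levels = c("pre", "flex", "post")),
  resp  = c("U", "U", "U", "C",
            "U", "U", "C", "C",
            "U", "U", "U", "C", "C", "L", "L"),
  time  = c(100, 80, 50, 20, 80, 90, 10, 20, 40, 30, 60, 80, 100, 50, 40)
)

# Number of observations per stage/response combination
counts <- aggregate(time ~ stage + resp, data = MIOT1, FUN = length)
names(counts)[names(counts) == "time"] <- "n"

# Use the same dodge for boxes and labels so they line up
dodge <- position_dodge(0.9)

p <- ggplot(MIOT1, aes(stage, time, fill = resp)) +
  geom_boxplot(position = dodge) +
  geom_text(data = counts,
            aes(label = n, y = min(MIOT1$time) - 5),
            position = dodge)

p
```

If a scale_y_continuous(limits = ...) is added as in the original plot, the lower limit has to leave room for the labels, otherwise they are dropped.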
IN SUMMARY EIPI10 ANSWERED MY QUESTION:
library(dplyr)
# Get counts by desired grouping variables
counts = mtcars %>% group_by(cyl, am) %>% tally
ggplot(mtcars, aes(factor(cyl), mpg, fill=factor(am))) +
geom_boxplot(position=position_dodge(0.9)) +
geom_text(data=counts, aes(label=n, y=min(mtcars$mpg) - 0.6), position=position_dodge(0.9))
I am trying to demonstrate the soil type (soil column) at different depths in the ground using box plots. However, as the sampling interval is not consistent, there are also gaps in between the samples.
My questions are as follows:
1. Is it possible to put the box plots within the same column, i.e. all box plots in one straight column?
2. Is it possible to remove the x-axis labels and ticks when using ggdraw? I tried to remove them in the plot itself, but they appear again when I use ggdraw.
My code looks like this:
SampleID <- c("Rep-1", "Rep-2", "Rep-3", "Rep-4")
From <- c(0,2,4,9)
To <- c(1,4,8,10)
Mid <- (From+To)/2
ImaginaryVal <- c(1,1,1,1)
Soiltype <- c("organic", "silt","clay", "sand")
df <- data.frame(SampleID, From, To, Mid, ImaginaryVal, Soiltype)
plot <- ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype,
middle=`Mid`, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity") + scale_y_reverse(breaks = seq(0,10,0.5)) + xlab('Soiltype') + ylab('Depth (m)') + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
ggdraw(switch_axis_position(plot + theme_bw(8), axis = 'x'))
In the image I have pointed out what I want, using the red arrows and lines.
You can use position = position_dodge() like so:
plot <- ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype, middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity", position = position_dodge(width=0)) +
scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') +
ylab('Depth (m)') +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
edit: I don't think you need cowplot at all, if this is what you want your plot to look like:
ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype, middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity", position = position_dodge(width=0)) +
scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') +
ylab('Depth (m)') +
theme_bw() +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
xlab("") +
ggtitle("Soiltype")
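Since each sample is just a depth interval, the same column could also be drawn with rectangles instead of identity box plots. This is a sketch of an alternative technique (geom_rect), not the approach used in the answer above; the fixed xmin/xmax of 0 and 1 are arbitrary.

```r
library(ggplot2)

# Depth intervals from the question
df <- data.frame(
  From     = c(0, 2, 4, 9),
  To       = c(1, 4, 8, 10),
  Soiltype = c("organic", "silt", "clay", "sand")
)

p <- ggplot(df) +
  # One rectangle per sample, all sharing the same horizontal extent
  geom_rect(aes(xmin = 0, xmax = 1, ymin = From, ymax = To, fill = Soiltype),
            colour = "black") +
  scale_y_reverse(breaks = seq(0, 10, 0.5)) +
  xlab("") +
  ylab("Depth (m)") +
  ggtitle("Soiltype") +
  theme_bw() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p
```

Unsampled depth ranges (1 to 2 m and 8 to 9 m here) simply remain empty, so the gaps in the sampling stay visible.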