R: ggplot2: Adding count labels to histogram with density overlay

I have a time series that I'm examining for data heterogeneity, and I wish to explain some important facets of it to some data analysts. I have a density histogram overlaid with a KDE plot (in order to see both plots). However, the original data are counts, and I want to place the count values as labels above the histogram bars.
Here is some code:
tix_hist <- ggplot(tix, aes(x=Tix_Cnt)) +
geom_histogram(aes(y = ..density..), colour="black", fill="orange", binwidth=50) +
xlab("Bin") + ylab("Density") + geom_density(aes(y = ..density..),fill=NA, colour="blue") +
scale_x_continuous(breaks=seq(1,1700,by=100))
# opts() and theme_text() were removed from ggplot2; ggtitle(), theme() and element_text() replace them
tix_hist + ggtitle("Ticket Density To-Date") + theme(
plot.title = element_text(face="bold", size=18),
axis.title.x = element_text(face="bold", size=16),
axis.title.y = element_text(face="bold", size=14, angle=90),
axis.text.x = element_text(face="bold", size=14),
axis.text.y = element_text(face="bold", size=14)
)
I thought about extrapolating count values using the KDE bandwidth, etc. Is it possible to extract the numeric output of a ggplot frequency histogram into a data frame and add it as a 'layer'? I'm not savvy on the layer() function yet, but any ideas would be helpful. Many thanks!

If you want the y-axis to show the bin count while also overlaying a density curve on the histogram,
first draw geom_histogram() and note the binwidth value (this is very important), then add a geom_density() layer whose height is multiplied by that binwidth so the curve is on the count scale.
If you don't know how to choose the binwidth value, you can calculate one:
my_binwidth <- (max(tix$Tix_Cnt) - min(tix$Tix_Cnt)) / 30
(This is approximately what geom_histogram does by default, since it uses 30 bins.)
The code is given below
(suppose the binwidth value you just calculated is 0.001):
tix_hist <- ggplot(tix, aes(x=Tix_Cnt)) ;
tix_hist<- tix_hist + geom_histogram(aes(y=..count..),colour="blue",fill="white",binwidth=0.001);
tix_hist<- tix_hist + geom_density(aes(y=0.001*..count..),alpha=0.2,fill="#FF6666",adjust=4);
print(tix_hist);
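The answer above puts counts on the y-axis but does not add the labels the question asks for. A minimal sketch of one way to do that is to add a second stat_bin() layer drawn with the "text" geom, using the same binwidth as the histogram. The data below are made up for illustration (the real `tix` is not shown in the question), and after_stat() assumes ggplot2 3.3.0 or later; older versions would use ..count.. instead.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the asker's data
tix <- data.frame(Tix_Cnt = round(runif(500, 1, 1700)))
bw <- 50  # one binwidth shared by every layer

p <- ggplot(tix, aes(x = Tix_Cnt)) +
  # Histogram on the count scale
  geom_histogram(aes(y = after_stat(count)), binwidth = bw,
                 colour = "black", fill = "orange") +
  # Same binning, drawn as text: one label per bar showing its count
  stat_bin(aes(y = after_stat(count), label = after_stat(count)),
           binwidth = bw, geom = "text", vjust = -0.5) +
  # KDE rescaled to counts: density * n * binwidth
  geom_density(aes(y = after_stat(count) * bw), colour = "blue") +
  xlab("Bin") + ylab("Count")

print(p)
```

Because the text layer reuses the histogram's binning, each label sits directly above its bar. layer_data(p, 2) returns the binned counts as a data frame if the numbers are needed elsewhere.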

Related

Is it possible to add a 'break' to an axis that plots a categorical factor?

I am using ggplot2 to plot a mixed-design dataset in a violin plot.
The data was collected over three sessions: Baseline (collected on Day 1), Post-training (collected on Day 3) and Follow-up (collected on Day 30) and two groups: (1) Active and (2) Sham. For the sessions I have a categorical factor called 'Session' with the labels: Baseline, Post-training and Follow-up which are plotted on the x-axis. (Please ignore the rough state of the draft plot and dummy data for demonstration purposes).
level_order <- factor(tidied_data$Session, level = c('Baseline (Day 1)', 'Post-training (Day 3)', 'Follow-up (Day 30)'))
tidied_data %>%
ggplot(aes(x=level_order, y=Amplitude, fill=Group)) +
geom_violin(position=position_dodge(1), trim=FALSE) +
geom_jitter(binaxis='y', stackdir='center',
position=position_dodge(1)) +
stat_summary(fun = "mean", geom = "point",
size = 3, position=position_dodge(1), color="white") +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width=0.3, position=position_dodge(1), color="white") +
theme_bw() + # removes background colour
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + # removes grid lines
theme(panel.border = element_blank()) + # removes border lines
theme(axis.line = element_line(colour = "black")) + # adds axis lines
labs(title = "Group x Session",
x = "Session",
y = "Amplitude")
I want to demonstrate to the viewer that there is a different time-course between Baseline (Day 1), Post-training (Day 3) and follow-up (Day 30), it's a 30-day scale essentially.
From previous threads I've seen that this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.
I've come across the package 'ggbreak', where you can use the function 'scale_x_break' or 'scale_y_break' to set an axis break on a continuous variable. This doesn't work for the three time-points, presumably because Session is a categorical factor.
Can anyone recommend a way that I can 'break' the axis to demonstrate the different length of time between the three sessions, or alternatively another way I could demonstrate this to the viewer? I've thought about adding custom spacing between bars, but I can only manage to set this to the same width for each bar, not different widths between different bars.
Any help would be greatly appreciated! Thanks in advance!
I can't recommend using a discontinuous axis for this, but you can use facets to separate the sessions visually as small multiples. Example with a standard dataset below:
library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(am))) +
geom_violin() +
facet_grid(~ cyl, scales = "free_x") +
theme_classic() +
theme(strip.text = element_blank()) # Hide strip text
Created on 2021-08-20 by the reprex package (v1.0.0)
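The same faceting idea can be sketched on data shaped like the question's. The values below are simulated dummy data, and only the column names (Session, Group, Amplitude) are taken from the question, so treat this as an illustration rather than the asker's actual plot.

```r
library(ggplot2)

set.seed(42)
sessions <- c("Baseline (Day 1)", "Post-training (Day 3)", "Follow-up (Day 30)")

# Simulated stand-in for tidied_data: 20 observations per Session x Group cell
tidied_data <- expand.grid(
  Session = factor(sessions, levels = sessions),
  Group   = c("Active", "Sham"),
  rep     = 1:20
)
tidied_data$Amplitude <- rnorm(nrow(tidied_data))

p <- ggplot(tidied_data, aes(x = Session, y = Amplitude, fill = Group)) +
  geom_violin(position = position_dodge(1), trim = FALSE) +
  # One panel per session; free_x drops the unused categories in each panel
  facet_grid(~ Session, scales = "free_x") +
  theme_classic() +
  theme(strip.text = element_blank())  # hide strip text, axis labels suffice

print(p)
```

The gaps between panels signal that the sessions are separate time points, without implying a proportional time scale.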

ggplot2 dodged boxplot with geom_point dodging and unequal number of subgroups

I am attempting to plot a dodged boxplot but I run into a couple of difficulties. First of all, the x-axis basically has 2 types of grouping: the "letter-groups" (A, B, C etc.) are the main groups, which I specify as my "x" aesthetic (X_main_group). Within each main group I have subgroups called "X_group", and the boxes are coloured by subgroup. The problem is that each letter group has a different number of subgroups, e.g. for x=A I have 4 subgroups but for x=B I have only one. First, the dodging of the plotted points no longer works (see the example plot below), as the points do not align with the dodged boxplots. Second, the boxes are no longer centered around the x-axis tick, which is most clear for x=B. How can I fix this?
I would also like to achieve small x-axis ticks below each subgroup (so 4 ticks for x=A, 1 tick for x=B, 3 for x=C etc.), but this has lower priority. I have attached the figure, and in red I drew some examples of what I hope to achieve with the tick marks. The ggplot2 code is shown below. I would like to provide a reproducible piece of code, but I cannot manage to write code that creates a data frame with unequal numbers of subgroups so that people who want to help can run it. I can only make "symmetrical" data frames...
cbpallette <- c("#999999", "#666666", "#333333", "#000000", "#003300")
p1 <- ggplot(data=df, aes(x=X_main_group,y=Intensity, colour=factor(X_group))) + stat_boxplot(geom = "errorbar", width=.4, position = position_dodge(0.5, preserve="single")) + geom_boxplot(width=0.5, outlier.shape=NA, position=position_dodge(preserve = "single")) + theme_classic() + geom_point(position=position_jitterdodge(), alpha=0.3)
p2 <- p1 + scale_colour_manual(values = cbpallette) + theme(legend.position = "none") + theme(axis.ticks.length = unit(-0.1, "cm"), axis.text.x = element_text(size=30, vjust=-0.4), axis.text.y=element_text(size=35, hjust = 0.5, angle=45), axis.title = element_blank())
p3 <- p2 + theme(axis.text.x = element_text(margin = margin(t = .5, unit = "cm")), axis.text.y = element_text(margin = margin(r = .5, unit = "cm")))
p3
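For anyone wanting to reproduce the problem, here is a minimal sketch of one way to build an unbalanced data frame of the kind described. The column names come from the question. The subgroup layout (4 for A, 1 for B, 3 for C), the number of points per box, and the simulated intensities are assumptions for illustration.

```r
set.seed(1)

# Hypothetical layout: one row per (main group, subgroup) combination that exists
layout <- data.frame(
  X_main_group = c("A", "A", "A", "A", "B", "C", "C", "C"),
  X_group      = c(1, 2, 3, 4, 1, 1, 2, 3)
)

n_per_box <- 10  # observations per box (an arbitrary choice)

# Repeat each layout row n_per_box times, then attach simulated intensities
df <- layout[rep(seq_len(nrow(layout)), each = n_per_box), ]
rownames(df) <- NULL
df$Intensity <- rnorm(nrow(df), mean = 10, sd = 2)

# Cross-tabulate to confirm the unequal number of subgroups per main group
table(df$X_main_group, df$X_group)
```

Listing the existing combinations explicitly, rather than crossing all levels with expand.grid(), is what keeps the design unbalanced.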

Dodged geom_errorbar with geom_bar using fill, colour, and alpha

I am attempting to make a facet_wrap bar graph with error bars (se) that clearly shows three different categorical variables (Treatment, Horizon, Enzyme) with one response variable (AbundChangetoAvgCtl). Below is the code for some dummy data followed by the ggplot code I have so far. The graphs I've made can be seen at this link:
bargraph figures
Enzyme <- c("Arabinosides","Arabinosides","Arabinosides","Arabinosides","Arabinosides","Arabinosides","Cellulose","Cellulose","Cellulose","Cellulose","Cellulose","Cellulose","Chitin","Chitin","Chitin","Chitin","Chitin","Chitin","Lignin","Lignin","Lignin","Lignin","Lignin","Lignin")
Treatment <- c("Deep","Deep","Int","Int","Low","Low","Deep","Deep","Int","Int","Low","Low","Deep","Deep","Int","Int","Low","Low","Deep","Deep","Int","Int","Low","Low")
Horizon <- c("Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min","Org","Min")
AbundChangetoAvgCtl <- rnorm(24,mean=0,sd=1)
se <- rnorm(24, mean=0.5, sd=0.25)
notrans_noctl_enz_toCtl_summary <- data.frame(Enzyme,Treatment,Horizon,AbundChangetoAvgCtl,se)
ggplot(notrans_noctl_enz_toCtl_summary, aes(x=Horizon, y=AbundChangetoAvgCtl, fill=Horizon, alpha=Treatment)) +
geom_bar(position=position_dodge(), colour="black", stat="identity", aes(fill=Horizon)) +
geom_errorbar(aes(ymin=AbundChangetoAvgCtl-se, ymax=AbundChangetoAvgCtl+se),
width=.2,
position=position_dodge(.9)) +
scale_fill_brewer(palette = "Set1") + theme_bw() +
geom_hline(yintercept=0) +
labs(y = "Rel Gene Abundance Change / Control", x="") +
theme(axis.ticks = element_blank(),
axis.text.x = element_blank(),
strip.text.x = element_text(size=20),
plot.title = element_text(size=22, vjust=2, face="bold"),
axis.title.y = element_text(size=18),
legend.key.size = unit(.75, "in"),
legend.text = element_text(size = 15),
legend.title = element_text(size = 18)) +
facet_wrap(~Enzyme, scales="free")
(figure 1)
So this is close to what I want, however for some reason, the "alpha=Treatment" call in ggplot causes my errorbars to fade (which I don't want) as well as the bar_fill (which I do want). I've tried moving the "alpha=Treatment" to the geom_bar call, as well as adding "alpha=1" to geom_bar, but when I do that, the error bars all move to a single location and overlap (figure 2).
I initially wanted to cluster the bars within facet_wrap, but found the alpha option on this site, which seems to accomplish what I'm looking for as well. Any help would be appreciated. If there is a better way to represent all of this, those ideas are welcome as well.
Also, if there is a way to condense and clarify my legend, that would be extra bonus!
Thanks in advance for your help!
Mike
You need to assign Treatment to the group option in the ggplot() command and then move the alpha=Treatment option to the geom_bar() command. Then the alpha value of geom_errorbar won't be affected by the global option and will be black. Like this:
ggplot(notrans_noctl_enz_toCtl_summary, aes(x=Horizon, y=AbundChangetoAvgCtl, fill=Horizon, group = Treatment)) +
geom_bar(position=position_dodge(), colour="black", stat="identity", aes(fill=Horizon, alpha = Treatment))
Also, I would check that the alpha=Treatment mapping makes the low treatment more transparent and the high treatment less transparent. At least that would be my intuitive reading, without any background on the research design or data.
For information about formatting legends, see here.
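For completeness, the suggestion above can be sketched as a full plot that includes the error bars. The data frame is a compact rebuild of the question's dummy data under the shorter, hypothetical name `d`, and abs() is an added assumption to keep the simulated standard errors non-negative.

```r
library(ggplot2)

set.seed(1)
# Compact rebuild of the question's dummy data (24 rows)
d <- data.frame(
  Enzyme    = rep(c("Arabinosides", "Cellulose", "Chitin", "Lignin"), each = 6),
  Treatment = rep(rep(c("Deep", "Int", "Low"), each = 2), times = 4),
  Horizon   = rep(c("Org", "Min"), times = 12),
  AbundChangetoAvgCtl = rnorm(24),
  se        = abs(rnorm(24, mean = 0.5, sd = 0.25))
)

p <- ggplot(d, aes(x = Horizon, y = AbundChangetoAvgCtl,
                   fill = Horizon, group = Treatment)) +
  # alpha is mapped only in this layer, so only the bars fade
  geom_bar(aes(alpha = Treatment), stat = "identity", colour = "black",
           position = position_dodge(0.9)) +
  # group = Treatment is inherited, so the error bars still dodge with the bars
  geom_errorbar(aes(ymin = AbundChangetoAvgCtl - se,
                    ymax = AbundChangetoAvgCtl + se),
                width = 0.2, position = position_dodge(0.9)) +
  geom_hline(yintercept = 0) +
  theme_bw() +
  facet_wrap(~ Enzyme, scales = "free")

print(p)
```

Keeping group in the top-level aes() is what lets both layers dodge identically, while the layer-level alpha stays out of geom_errorbar().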

In ggplot2, how to add another y-axis in the figure?

Here is my data (called "data", stored as a CSV file):
attitude,order,min,max,mean,SpRate
Commanding,7,0.023005096,1.6517,0.681777825,5.66572238
Friendly,10,0.20565908,1.7535,0.843770095,6.191950464
Hostile,12,0.105828885,2.4161,1.128603777,6.493506494
Insincere,1,0.110689225,1.5551,0.730545923,5.115089514
Irony,4,0.089307133,2.2395,0.955312553,5.249343832
Joking,2,0.165717303,2.1871,0.94512688,5.141388175
Neutral,5,-0.044620705,1.5322,0.696879247,5.420054201
Polite,11,0.170151929,1.8467,0.873735105,6.191950464
Praising,8,0.192402573,2.0631,0.972857404,5.797101449
Rude,13,0.249746688,2.2885,1.100819511,6.644518272
Serious,6,0.011312206,1.7195,0.693606814,5.649717514
Sincere,9,-0.09135461,1.6409,0.659525513,5.813953488
Suggesting,3,0.072541529,1.8345,0.82999014,5.249343832
Here is my code:
library(ggplot2)
ggplot (data, aes(x=order))+
geom_rect(aes(xmin=order-0.1, xmax=order+0.1, ymin = min, ymax=max), size=1, alpha=0,color="black")+
geom_bar(aes(y=SpRate, fill="SpRate"),stat="identity", alpha=0.2, width=0.9)+
geom_point(aes(y=min, shape="min"), size=5, fill="white")+
geom_point(aes(y=mean, shape="mean"), size=5)+
geom_point(aes(y=max, shape="max"), size=5)+
scale_x_continuous(breaks=c(1:13), labels=c("Insincere","Joking","Suggesting","Irony","Neutral","Serious","Commanding","Praising","Sincere","Friendly","Polite","Hostile", "Rude"))+
xlab("")+ylab("")+theme_bw()+
theme(axis.text.x=element_text(size=25,angle=45, vjust=0.5, color="black"))+
theme(legend.text = element_text(size = 20))+
theme(legend.title = element_text(size = 20))+
labs(shape = "f0:", fill = "SpRate:")+
scale_shape_manual(values=c("min"=15,"max"=16,"mean"=18))+
scale_fill_manual(values= "black")+
theme(axis.text.y = element_text(size=20))
As you can see from the plot, there are really two plots: rectangles with points, and a bar plot. The single y-axis does not suit the bar plot well, so how can I add another y-axis on the right of the plot that fits the bar plot better? (i.e. I want the y-axis for the rectangles to run from 0 to 2.5 and the one for the bar plot from 0 to 7.)
You could add the second y-axis in ggplot2.
Use this example for one panel plot (http://rpubs.com/kohske/dual_axis_in_ggplot2)
Use my example for multiple panel plot (Dual y axis in ggplot2 for multiple panel figure)
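As an alternative to the linked approaches, recent versions of ggplot2 provide sec_axis() for a secondary axis that is a fixed transformation of the primary one. The sketch below uses only the first three rows of the data and a scaling factor k = 7 / 2.5 chosen from the ranges given in the question. The axis titles and the simplified styling are assumptions.

```r
library(ggplot2)

# First three attitudes (by order) from the question's data
data <- data.frame(
  attitude = c("Insincere", "Joking", "Suggesting"),
  order    = 1:3,
  min      = c(0.110689225, 0.165717303, 0.072541529),
  max      = c(1.5551, 2.1871, 1.8345),
  mean     = c(0.730545923, 0.94512688, 0.82999014),
  SpRate   = c(5.115089514, 5.141388175, 5.249343832)
)

k <- 7 / 2.5  # maps the 0-2.5 primary range onto the 0-7 secondary range

p <- ggplot(data, aes(x = order)) +
  # Bars are divided by k so they are drawn on the primary scale
  geom_col(aes(y = SpRate / k), alpha = 0.2, width = 0.9) +
  geom_rect(aes(xmin = order - 0.1, xmax = order + 0.1,
                ymin = min, ymax = max), fill = NA, colour = "black") +
  geom_point(aes(y = mean), size = 3) +
  scale_x_continuous(breaks = data$order, labels = data$attitude) +
  # The right-hand axis multiplies by k to show SpRate in its original units
  scale_y_continuous(name = "f0", limits = c(0, 2.5),
                     sec.axis = sec_axis(~ . * k, name = "SpRate")) +
  theme_bw()

print(p)
```

sec_axis() only relabels the axis. The bar heights must be rescaled by the inverse transformation, as done with SpRate / k here.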

Set different bandwidths in ggplot2 facet_grid plotting

Suppose I have a data set called "data", generated as follows:
library(reshape2) # Reshape data, needed in command "melt"
library(ggplot2) # apply ggplot
density <-rep (0.05, each=800)
tau <-rep (0.05, each=800)
# define two different models: network and non-network
model <-rep(1:2, each=400, times=1)
## Create data and factors for the plot
df <- melt(rnorm(800, -3, 0.5))
data <- as.data.frame(cbind(density, tau, model, df$value))
data$density <- factor(data$density,levels=0.05,
labels=c("Density=0.05"))
data$tau <- factor(data$tau,levels=0.05,
labels=c("tau=0.05"))
data$model<- factor(data$model,levels=c(1,2),
labels=c("Yes",
"No"))
ggplot(data=data, aes(x=V4, shape=model, colour=model, lty=model)) +
stat_density(adjust=1, geom="line",position="identity") +
facet_grid(tau~density, scale="free") +
geom_vline(xintercept=-3, lty="dashed") +
ggtitle("Kernel Density") +
xlab("Data") +
ylab("Kernel Density") +
theme(plot.title=element_text(face="bold", size=17), # change font size of title
axis.text.x= element_text(size=14),
axis.text.y= element_text(size=14),
legend.title=element_text(size=14),
legend.text =element_text(size=12),
strip.text.x=element_text(size=14), # change font size of the column strip labels
strip.text.y=element_text(size=14)) # change font size of the row strip labels
Looking at the data, variable V4 is separated into two subsets by the model (Yes [1:400] and No [401:800]), and the kernel density is plotted without changing the original bandwidth, since adjust=1.
What I want to do is this: for the Yes model, the bandwidth should be 10 times the original, while for the No model the bandwidth should stay unchanged. Can I do something like setting adjust=c(10, 1)? I know how to do this with plot()+lines(), but I want to do it in ggplot() for further analysis.
I wouldn't recommend this, since it creates a very misleading plot, but you can do it with two calls to stat_density(...).
ggplot(data=data, aes(x=V4, shape=model, colour=model, lty=model)) +
stat_density(data=data[data$model=="Yes",], adjust=10,
geom="line",position="identity") +
stat_density(data=data[data$model=="No",], adjust=1,
geom="line",position="identity") +
facet_grid(tau~density, scale="free") +
geom_vline(xintercept=-3, lty="dashed") +
ggtitle("Kernel Density") +
xlab("Data") +
ylab("Kernel Density") +
theme(plot.title=element_text(face="bold", size=17),
axis.text.x= element_text(size=14),
axis.text.y= element_text(size=14),
legend.title=element_text(size=14),
legend.text =element_text(size=12),
strip.text.x=element_text(size=14),
strip.text.y=element_text(size=14))
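To see what adjust=10 does to the bandwidth, here is a small check using stats::density(), which takes the same adjust argument. The sample is simulated to resemble one model's 400 values and is not the asker's data.

```r
set.seed(1)
# Simulated stand-in for the 400 "Yes" observations
x <- rnorm(400, mean = -3, sd = 0.5)

bw_default <- density(x)$bw               # default bandwidth, adjust = 1
bw_wide    <- density(x, adjust = 10)$bw  # bandwidth after adjust = 10

# adjust is a plain multiplier on the selected bandwidth
all.equal(bw_wide, 10 * bw_default)
```

A tenfold bandwidth smooths the curve heavily, which is why the two lines in the plot above are not directly comparable.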
