How to show percent labels on histogram bars using ggplot2

I have seen lots of questions about converting counts on the y axis into percentages, but most of them are about bar plots.
I want to do the same thing in a histogram, but I can't get the labels to show clearly on the bars. Please tell me where I am going wrong.
library(ggplot2)
x = runif(100, min = 0, max = 10)
data1 <- data.frame(x = x)
ggplot(aes(x = x), data = data1)+
geom_histogram(aes(y = (..count..)/sum(..count..)), bins = 10, breaks =
seq(0,10,1), fill = "blue", col = "black")+
geom_text(aes(y = ((..count..)/sum(..count..)),
label = scales::percent((..count..)/sum(..count..))),
stat = "count", vjust = -10)+
scale_y_continuous(labels = scales::percent)
Output:

Using scale_y_continuous with breaks and labels will solve your problem.
library(ggplot2)
data1 <- data.frame(x = runif(100, min = 0, max = 10))
ggplot(data1, aes(x = x)) + stat_bin(aes(y = ..count..))
ggplot(data1, aes(x = x)) +
  geom_histogram(fill = "blue", col = "black") +
  # the y axis still holds counts, so each break is divided by the number of rows to label it as a percent
  scale_y_continuous(breaks = seq(0, 10, 1),
                     labels = function(b) scales::percent(b / nrow(data1))) +
  geom_text(aes(y = (..count..), label = scales::percent((..count..)/sum(..count..))),
            stat = "bin", colour = "green", vjust = 2)
Or you can specify where to place the percentage labels, like this:
geom_text(aes(y = (..count..)+0.5))
Of course, you can change the color as well, for example:
stat = "bin", colour = "your preferred color"
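You can also change the width of the bins as follows (use the same binwidth in the geom_text(stat = "bin") layer so the labels stay aligned with the bars):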
geom_histogram(fill = "blue", col = "black", binwidth = 0.5)
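The labels in the question misbehave because stat = "count" counts each distinct x value (and almost every runif() value is distinct) rather than each histogram bin, and vjust = -10 then pushes the text far above the bars. Below is a minimal sketch of the percent-axis version the question was aiming for. It assumes ggplot2 >= 3.3.0 for after_stat(), the newer spelling of ..count... The bars and the text layer share the same breaks (brks is an illustrative name), so each label lands on its own bar.

```r
library(ggplot2)

set.seed(1)  # only to make the demo reproducible
data1 <- data.frame(x = runif(100, min = 0, max = 10))
brks <- seq(0, 10, 1)  # shared by both layers so labels line up with bars

p <- ggplot(data1, aes(x = x)) +
  # bar height = share of all observations that fall in the bin
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 breaks = brks, fill = "blue", colour = "black") +
  # same binning again, drawn as text just above each bar
  stat_bin(aes(y = after_stat(count / sum(count)),
               label = after_stat(scales::percent(count / sum(count), accuracy = 1))),
           geom = "text", breaks = brks, vjust = -0.5) +
  scale_y_continuous(labels = scales::percent)
p
```

If you change the bins, change brks (or binwidth) in both layers together. Otherwise the text is computed on a different binning than the bars.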

Related

geom_label_repel not properly referencing the sec.axis

I am working with a ggplot that has two axes: one for the geom_bar component and the other for the geom_line component. For this, I am using the sec.axis argument with sec_axis().
I wanted to insert a box showing the last value of the geom_line component, but I am struggling because, when I use geom_label_repel, the y aesthetic seems to refer to the geom_bar component's axis.
I'll provide similar data to illustrate what I mean.
library(dplyr)
library(scales)
library(ggplot2)
library(ggrepel)
df <- data.frame(day = as.character(seq(from = 1, to = 100, by = 1)),
total = rbinom(n=100,30,0.5),
prop = runif(100))
df <- df %>% arrange(day)
df$`percentage` <- label_percent(accuracy = 0.01)(df$prop)
ggplot(data = df,
aes(x = day, y = total)) +
geom_bar(stat = "identity", fill = "lightgreen", width = 0.35) +
geom_line(data = df,
aes(x = day, y = (prop)*15, group = 1),
color = "red", size = 1,inherit.aes = TRUE) +
scale_y_continuous(
labels = function(x) format(x, scientific = FALSE),
#breaks = seq(from = 0, to = 10000000,by = 100000),
sec.axis = sec_axis(trans = ~./15,
name = "Secondary axis",
breaks = seq(from = 0, to = 10, by = 0.1),
scales::percent))+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))+
geom_label_repel(data=df[nrow(df),],
aes(x = day,
y = prop*1,
label = round(prop*100,2)),
color = 'red',
segment.alpha = 0.5) +
scale_x_discrete(expand = expansion(add = c(0, 7)))
Which outputs the following image:
As you can see, it correctly picks up the last value of the prop column, as intended, but the label is not automatically placed beside the geom_line.
I have tried adjusting the nudge_x and nudge_y arguments, but that didn't get me anywhere, and I want the label placement to be automatic.
Can anyone help?
The sec.axis is essentially decorative: ggplot2 plots every layer against the primary y axis, and sec_axis() only draws a relabelled copy of it. To make the label follow the line, apply the same transform as in your geom_line call (y = prop*15):
library(tidyverse)
library(ggrepel)
df <- data.frame(day = as.character(seq(from = 1, to = 100, by = 1)),
total = rbinom(n=100,30,0.5),
prop = runif(100))
df <- df %>% arrange(day)
df$`percentage` <- scales::label_percent(accuracy = 0.01)(df$prop)
ggplot(data = df,
aes(x = day, y = total)) +
geom_bar(stat = "identity", fill = "lightgreen", width = 0.35) +
geom_line(data = df,
aes(x = day, y = (prop)*15, group = 1),
color = "red", size = 1,inherit.aes = TRUE) +
scale_y_continuous(
labels = function(x) format(x, scientific = FALSE),
#breaks = seq(from = 0, to = 10000000,by = 100000),
sec.axis = sec_axis(trans = ~./15,
name = "Secondary axis",
breaks = seq(from = 0, to = 10, by = 0.1),
scales::percent))+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))+
geom_label_repel(data=df[nrow(df),],
aes(x = day,
y = prop*15,
label = round(prop*100,2)),
color = 'red',
segment.alpha = 0.5) +
scale_x_discrete(expand = expansion(add = c(0, 7)))
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
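A minimal sketch of the same fix with the scaling factor stored in one variable. This way the line, the label and the secondary axis cannot drift apart if the factor changes. It assumes ggplot2 >= 3.4.0 (for linewidth, as the warning above suggests) and a numeric day column instead of the question's character one. The names k and last_row are illustrative, and the label is formatted with scales::percent() rather than round().

```r
library(ggplot2)
library(ggrepel)

set.seed(123)
# Demo data shaped like the question's, but with a numeric day (an assumption)
df <- data.frame(day = 1:100,
                 total = rbinom(n = 100, size = 30, prob = 0.5),
                 prop = runif(100))

k <- 15  # the one scaling factor shared by the line, the label and sec_axis()
last_row <- df[nrow(df), ]

p <- ggplot(df, aes(x = day)) +
  geom_col(aes(y = total), fill = "lightgreen", width = 0.35) +
  # the line is drawn on the primary scale, so prop is multiplied by k
  geom_line(aes(y = prop * k), colour = "red", linewidth = 1) +
  # the label uses the same transform, so it sits on the end of the line
  geom_label_repel(data = last_row,
                   aes(y = prop * k, label = scales::percent(prop, accuracy = 0.01)),
                   colour = "red", segment.alpha = 0.5) +
  # the secondary axis only relabels the primary scale by undoing k
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "Secondary axis",
                                         labels = scales::percent))
p
```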

How to present the results of a data frame on a serial scale using ggplot, as in the attached example?

I have this data frame:
Raw.Score = c(0,1,2,3,4,5,6,7,8)
Severity = c(-3.56553994,-2.70296933,-1.63969850,-0.81321707,-0.04629182,
0.73721320,1.61278518,2.76647043,3.94804472)
x = data.frame(Raw.Score = Raw.Score, Severity = Severity)
Raw.Score holds raw scores from 0 to 8 (think of them as the labels of the severity values).
Severity holds relative values that give the location of each score on the scale.
I want to present the results graphically using ggplot, as in the following example (the example uses different numbers, but I want something similar).
As a fun exercise in ggplot-ing, here is one approach that achieves, or comes close to, your desired result.
Raw.Score = c(0,1,2,3,4,5,6,7,8)
Severity = c(-3.56553994,-2.70296933,-1.63969850,-0.81321707,-0.04629182,
0.73721320,1.61278518,2.76647043,3.94804472)
dat <- data.frame(Raw.Score, Severity)
library(ggplot2)
dat_tile <- data.frame(
Severity = seq(-4.1, 4.1, .05)
)
dat_axis <- data.frame(
Severity = seq(-4, 4, 2)
)
tile_height = .15
ymax <- .5
ggplot(dat, aes(y = 0, x = Severity, fill = Severity)) +
# Axis line
geom_hline(yintercept = -tile_height / 2) +
# Colorbar
geom_tile(data = dat_tile, aes(color = Severity), height = tile_height) +
# Segments connecting top and bottom labels
geom_segment(aes(xend = Severity, yend = -ymax, y = ymax), color = "orange") +
# Axis ticks aka dots
geom_point(data = dat_axis,
y = -tile_height / 2, shape = 21, stroke = 1, fill = "white") +
# ... and labels
geom_text(data = dat_axis, aes(label = Severity),
y = -tile_height / 2 - .1, vjust = 1, fontface = "bold") +
# Bottom labels
geom_label(aes(y = -ymax, label = scales::number(Severity, accuracy = .01))) +
# Top labels
geom_point(aes(y = ymax, color = Severity), size = 8) +
geom_text(aes(y = ymax, label = Raw.Score), fontface = "bold") +
# Colorbar annotations
annotate(geom = "text", fontface = "bold", label = "MILD", color = "black", x = -3.75, y = 0) +
annotate(geom = "text", fontface = "bold", label = "SEVERE", color = "white", x = 3.75, y = 0) +
# Fixing the scales
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(limits = c(-ymax, ymax)) +
# Color gradient
scale_fill_gradient(low = "orange", high = "red", guide = "none") +
scale_color_gradient(low = "orange", high = "red", guide = "none") +
# Get rid of all non-data ink
theme_void() +
# Add some plot margin
theme(plot.margin = rep(unit(10, "pt"), 4)) +
coord_cartesian(clip = "off")

Changing the color of error bars in a ggplot2 chart

I have a problem with error bars in a bar chart in ggplot. I have an interaction between a categorical (condition) and a continuous (moderator) variable. I want to show error bars, but they are the same color as the bars, which makes them impossible to interpret.
I tried adding color = "black" etc. to the error bars, but it doesn't change anything.
Here is the code:
moderator = runif(n = 100, min = 1, max = 7)
condition <- rep(letters[1:2], length.out = 100)
y = runif(n = 100, min = 1, max = 100)
df <- data.frame(moderator, condition, y)
lm21 <- lm(y~ condition* moderator, data = df)
summary(lm21)
library(ggeffects)
library(ggplot2)
library(magrittr)
pd <- position_dodge()
ggeffect(lm21, terms = c("condition", "moderator")) %>%
plot(show.title = FALSE) +
stat_summary(fun.y = mean, geom = "bar", position = pd, width = 0.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
position = pd, size = 8.5, alpha=13.2) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_color_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_x_continuous(name = "Condition",
breaks = 0:1,
labels = c("Low","High"))
The graph looks like this:
How can I change the color of error bars so that they are fully visible?
Thank you in advance!
I converted the ggeffect() output to a data.frame and ended up with the following; I hope it's what you wanted.
Sorry, the widths are tuned by hand: I adjusted them until the error bars sat in the middle of the bars. Maybe someone more experienced knows a cleaner way.
ggplot(as.data.frame(ggeffect(lm21, terms = c("condition", "moderator"))), aes(x = factor(x))) +
geom_col(aes(y = predicted, fill = factor(group)), position = position_dodge2(width = .5, preserve = "single", padding = 0)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high, group = factor(group)), position = position_dodge(width = .9), width = .15) +
geom_point(aes(y = predicted, group = factor(group)), position = position_dodge2(width = .9)) +
scale_fill_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_x_discrete(name = "Condition", labels = c("Low","High")) +
theme_light()
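A self-contained sketch of the same data-frame approach, with one change that removes the hand-tuning: the bars and the error bars share a single position_dodge() object. Each error bar is then centred on its bar whatever width the bars have. To avoid depending on ggeffects, the predictions come from base R's predict(..., interval = "confidence"). The moderator is evaluated at its mean and at plus or minus one SD. That choice of levels, the demo data, and the names df_demo, fit_demo, grid and plot_df are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Demo data in the shape of the question's example
df_demo <- data.frame(moderator = runif(100, min = 1, max = 7),
                      condition = rep(letters[1:2], length.out = 100),
                      y = runif(100, min = 1, max = 100))
fit_demo <- lm(y ~ condition * moderator, data = df_demo)

# Assumed moderator levels: mean - 1 SD, mean, mean + 1 SD ("Low", "Medium", "High")
m <- mean(df_demo$moderator)
s <- sd(df_demo$moderator)
grid <- expand.grid(condition = c("a", "b"), moderator = c(m - s, m, m + s))
# predict() returns a matrix with columns fit, lwr and upr
plot_df <- cbind(grid, as.data.frame(predict(fit_demo, newdata = grid,
                                             interval = "confidence")))
# expand.grid varies condition fastest, so each moderator level spans two rows
plot_df$level <- factor(rep(c("Low", "Medium", "High"), each = 2),
                        levels = c("Low", "Medium", "High"))

dodge <- position_dodge(width = 0.9)  # one dodge shared by bars and error bars
p <- ggplot(plot_df, aes(x = condition, y = fit, fill = level)) +
  geom_col(position = dodge, width = 0.9) +
  # colour set outside aes() applies the same colour to every error bar
  geom_errorbar(aes(ymin = lwr, ymax = upr), position = dodge,
                width = 0.2, colour = "black") +
  scale_fill_discrete(name = "Control") +
  scale_y_continuous("Voting") +
  scale_x_discrete(name = "Condition", labels = c("Low", "High"))
p
```

The limits = c(0, 100) from the question are left out on purpose. With limits, any bar or interval that crosses them is dropped rather than clipped.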
OK, it's not the simplest way, but here is what I did:
p = ggeffect(lm21, terms = c("condition", "moderator")) %>%
plot(show.title = FALSE) +
stat_summary(fun.y = mean, geom = "bar", position = pd, width = 0.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
position = pd, size = 8.5, alpha=13.2) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_color_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_x_continuous(name = "Condition",
breaks = 0:1,
labels = c("Low","High"))+
scale_colour_manual(values = rep('black',3))+
theme(legend.position = 'none')
The output is:
The only drawback is the legend: with every group drawn in black it no longer distinguishes the groups, which is why it is hidden with legend.position = 'none'. You can use the post "How to plot just the legends in ggplot2?" to extract the legend from the original plot and then combine it with this one.
I hope this is what you wanted
Here is another solution based on grobs manipulation.
p <- ggeffect(lm21, terms = c("condition", "moderator")) %>%
plot(show.title = FALSE) +
stat_summary(fun.y = mean, geom = "bar", position = pd, width = 0.25) +
stat_summary(fun.data = mean_cl_boot, geom = "errorbar",
position = pd, size = 8.5, alpha=13.2) +
scale_y_continuous("Voting", limits = c(0, 100)) +
scale_color_discrete(name = "Control", labels = c("Low", "Medium", "High")) +
scale_x_continuous(name = "Condition",
breaks = 0:1,
labels = c("Low","High"))
# Change the order of ggplot layers (error bars are printed after mean bars)
p$layers <- p$layers[c(3,1,2,4)]
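# Set colors of polyline grob (error bars); the indices [[6]] and [[5]] are specific
# to this plot's layout, so inspect q$grobs (e.g. with str()) if your plot differs
q <- ggplotGrob(p)
q$grobs[[6]]$children[[5]]$gp$col <- rep("black",6)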
grid::grid.draw(q)

How to plot upper and lower bounds as a shaded band using facet_wrap in R?

I am trying to automate plotting my data using ggplot and its facet_wrap functionality. I want a single y-axis label instead of an individual label on each plot (i.e., A_Ob, B_Ob, etc.), and a single x-axis label rather than every plot having its own, as in the output below. Below is my sample code using the gridExtra package. However, I would like to do it with facet_wrap, as I have many other plots to draw and I think it will save me some time.
graphics.off()
rm(list = ls())
library(tidyverse)
library(gridExtra)
G1 = data.frame(A_Ob = runif(1000, 5, 50), A_Sim = runif(1000, 3,60), A_upper = runif(1000, 10,70), A_lower = runif(1000, 0, 45 ),
B_Ob = runif(1000, 5, 50), B_Sim = runif(1000, 3,60), B_upper = runif(1000, 10,70), B_lower = runif(1000, 0, 45 ),
C_Ob = runif(1000, 5, 50), C_Sim = runif(1000, 3,60), C_upper = runif(1000, 10,70), C_lower = runif(1000, 0, 45 ),
D_Ob = runif(1000, 5, 50), D_Sim = runif(1000, 3,60), D_upper = runif(1000, 10,70), D_lower = runif(1000, 0, 45 ),
Pos = 1:1000)
A1 = ggplot(data = G1, aes(x = Pos))+
geom_line(aes(y = A_Ob), col = "black")+
geom_line(aes(y = A_Sim), col = "blue")+
geom_vline(xintercept = 750, color = "red", size=1.5)+
geom_ribbon(aes(ymin = A_upper, ymax = A_lower), fill = "grey70")
B1 = ggplot(data = G1, aes(x = Pos))+
geom_line(aes(y = B_Ob), col = "black")+
geom_line(aes(y = B_Sim), col = "blue")+
geom_vline(xintercept = 750, color = "red", size=1.5)+
geom_ribbon(aes(ymin = B_upper, ymax = B_lower), fill = "grey70")
C1 = ggplot(data = G1, aes(x = Pos))+
geom_line(aes(y = C_Ob), col = "black")+
geom_line(aes(y = C_Sim), col = "blue")+
geom_vline(xintercept = 750, color = "red", size=1.5)+
geom_ribbon(aes(ymin = C_upper, ymax = C_lower), fill = "grey70")
D1 = ggplot(data = G1, aes(x = Pos))+
geom_line(aes(y = D_Ob), col = "black")+
geom_line(aes(y = D_Sim), col = "blue")+
geom_vline(xintercept = 750, color = "red", size=1.5)+
geom_ribbon(aes(ymin = D_upper, ymax = D_lower), fill = "grey70")
grid.arrange(A1,B1,C1,D1, nrow = 4)
Here is the result of the code
You need to reshape your data frame into a longer format, with separate columns for the Ob, Sim, upper and lower values.
The melt() function from the data.table package can do this:
library(data.table)
setDT(G1)
Ob_cols = grep("_Ob",colnames(G1),value = TRUE)
Sim_cols = grep("_Sim",colnames(G1),value = TRUE)
Upper_cols = grep("_upper",colnames(G1), value = TRUE)
Lower_cols = grep("_lower", colnames(G1), value = TRUE)
g.m <- melt(G1, measure.vars = list(Ob_cols,Sim_cols,Upper_cols,Lower_cols), value.name = c("OBS","SIM","UP","LOW"))
levels(g.m$variable) <- c("A","B","C","D")
head(g.m)
#>    Pos variable       OBS       SIM       UP      LOW
#> 1:   1        A  5.965488 29.167666 26.66783 29.97259
#> 2:   2        A 23.855719  8.570245 43.75830 30.65616
#> 3:   3        A 16.947887 51.201047 15.20758 39.76122
#> 4:   4        A 49.883306  3.715319 34.38066 20.73177
#> 5:   5        A  5.021938  3.102880 30.05036 32.05123
#> 6:   6        A 19.887176 15.400853 53.67156 28.54982
and now, you can plot it:
library(ggplot2)
ggplot(g.m, aes(x = Pos))+
geom_line(aes(y = OBS), color = "black")+
geom_line(aes(y = SIM), color = "blue")+
geom_vline(xintercept = 750,color = "red", size = 1.5)+
geom_ribbon(aes(ymin = UP, ymax = LOW), fill = "grey70")+
facet_grid(variable~.)
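For comparison, here is the same reshape with tidyr::pivot_longer() (tidyr >= 1.0.0) instead of data.table::melt(), sketched on a two-variable version of the question's data. G_demo and long are illustrative names. The ".value" sentinel keeps Ob, Sim, upper and lower as separate columns. The ribbon is drawn first, with ymin = lower and ymax = upper, so it sits behind the lines instead of covering them. linewidth assumes ggplot2 >= 3.4.0.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Two-variable version of the question's wide data (A and B only, for brevity)
G_demo <- data.frame(Pos = 1:1000,
                     A_Ob = runif(1000, 5, 50), A_Sim = runif(1000, 3, 60),
                     A_upper = runif(1000, 10, 70), A_lower = runif(1000, 0, 45),
                     B_Ob = runif(1000, 5, 50), B_Sim = runif(1000, 3, 60),
                     B_upper = runif(1000, 10, 70), B_lower = runif(1000, 0, 45))

# "A_Ob" is split at "_": "A" goes to `variable`, "Ob" becomes a column name
long <- pivot_longer(G_demo, cols = -Pos,
                     names_to = c("variable", ".value"), names_sep = "_")

p <- ggplot(long, aes(x = Pos)) +
  # ribbon first so it is drawn underneath the two lines
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey70") +
  geom_line(aes(y = Ob), colour = "black") +
  geom_line(aes(y = Sim), colour = "blue") +
  geom_vline(xintercept = 750, colour = "red", linewidth = 1.5) +
  facet_wrap(~ variable, ncol = 1) +
  labs(y = "Ob")  # one shared y-axis title instead of A_Ob, B_Ob, ...
p
```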
EDIT: Adding annotations & renaming labels
To rename the facet labels, redefine the levels of variable; to show each label above its panel, use facet_wrap() with ncol = 1 instead of facet_grid().
To add several annotations to a single panel, define a data frame for geom_text() that includes the faceting variable (variable), so ggplot2 knows which panel each label belongs to.
Putting it all together:
# renaming names of each facets:
levels(g.m$variable) <- c("M1","M2","M3","M4")
# Defining annotations to add:
df_text <- data.frame(label = c("Calibration", "Validation"),
x = c(740,760),
y = c(65,65),
hjust = c(1,0),
variable = factor("M1", levels = c("M1","M2","M3","M4")))
# Plotting
ggplot(g.m, aes(x = Pos))+
geom_line(aes(y = OBS), color = "black")+
geom_line(aes(y = SIM), color = "blue")+
geom_vline(xintercept = 750,color = "red", size = 1.5)+
geom_ribbon(aes(ymin = UP, ymax = LOW), fill = "grey70")+
facet_wrap(~ variable, ncol = 1)+
theme(strip.text.x = element_text(hjust = 0),
strip.background = element_rect(fill = "white"))+
geom_text(data = df_text, aes(x = x, y = y, label = label, hjust = hjust), color = "red")
Does it look like what you were expecting?

Histogram with normal distribution in R using ggplot2 for illustrations

I'm trying to plot a histogram with ggplot2.
I wrote the following code in R:
library(ggplot2)
library(dplyr)   # for %>%
library(scales)  # for comma
dnorm.count <- function(x, mean = 0, sd = 1, log = FALSE, n = 1, binwidth = 1){
n * binwidth * dnorm(x = x, mean = mean, sd = sd, log = log)
}
mtcars %>%
ggplot(aes(x = mpg)) +
geom_histogram(bins =60,color = "white", fill = "#9FE367",boundary = 0.5) +
geom_vline(aes(xintercept = mean(mpg)),
linetype="dashed",
size = 1.6,
color = "#FF0000")+
geom_text(aes(label = ..count..), stat= "count",vjust = -0.6)+
stat_function(fun = dnorm.count, color = "#6D67E3",
args = list(mean= mean(mtcars$mpg),
sd = sd(mtcars$mpg),
n = nrow(mtcars)),
lwd = 1.2) +
scale_y_continuous(labels = comma, name = "Frequency") +
scale_x_continuous(breaks=seq(0,max(mtcars$mpg)))+
geom_text(aes(label = paste0("mean = ", round(mean(mtcars$mpg), 2)),
x = mean(mtcars$mpg)*1.2,
y = mean(mtcars$mpg)/5))+
geom_vline(aes(xintercept = sd(mpg)), linetype="dashed",size = 1.6, color = "#FF0000")
What I got is this!
My question: how do I plot a histogram similar to this one using ggplot2, and is it possible to turn the code into an R function?
Edit: to better explain what I'm trying to do:
I want to create a histogram exactly like the attached reference using ggplot2, and then turn it into a function to reduce the amount of code. Any package plus ggplot2 is fine. The histogram should have lines marking the mean and the standard deviations, as in the reference, and, if possible, label the standard deviations in the plot the way the reference image does; that's what I'm trying to achieve.
If your question is how to plot histograms like the one in your last figure, this short piece of code produces a very similar result.
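library(ggplot2)
set.seed(42)
data <- rnorm(1e5)
# build the data frame explicitly so the column really is called "x"
# (piping a bare vector into as.data.frame() names the column ".")
p <- ggplot(data.frame(x = data), aes(x = x)) +
geom_histogram(fill = "white", col = "black", bins = 30 ) +
# ..count.. is density times n; multiplying by roughly the bin width
# (0.3 is about the data range / 30 bins) puts the curve on the histogram's count scale
geom_density(aes( y = 0.3 *..count..)) +
labs(x = "Statistics", y = "Probability/Density") +
theme_bw() + theme(axis.text = element_blank())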
You could use annotate() to add symbols or text, and segments to show the intervals on the plot, like this:
p + annotate(x = sd(data)/2 , y = 8000, geom = "text", label = "σ", size = 10) +
annotate(x = sd(data) , y = 6000, geom = "text", label = "2σ", size = 10) +
annotate(x = sd(data)*1.5 , y = 4000, geom = "text", label = "3σ", size = 10) +
# annotate(geom = "segment") draws each segment once; geom_segment() with fixed
# x/xend/y/yend would repeat the same segment for every row of the data
annotate(geom = "segment", x = 0, xend = sd(data), y = 7500, yend = 7500) +
annotate(geom = "segment", x = 0, xend = sd(data)*2, y = 5500, yend = 5500) +
annotate(geom = "segment", x = 0, xend = sd(data)*3, y = 3500, yend = 3500)
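The question also asked whether the plot can be wrapped in a function. Here is a minimal sketch, not taken from the answer above. hist_with_sd() is a hypothetical helper that draws three things. The first is the histogram. The second is a normal curve scaled to counts (n × binwidth × dnorm(), the same idea as the question's dnorm.count()). The third is a set of dashed lines at the mean and at mean ± 1, 2 and 3 SD. It assumes ggplot2 >= 3.4.0 (for linewidth) and takes the column name as a string.

```r
library(ggplot2)

# Hypothetical helper, not from any package: histogram of one numeric column,
# a normal curve scaled to counts, and dashed lines at the mean and mean +/- k SD.
hist_with_sd <- function(df, var, binwidth = NULL, k = 1:3) {
  x <- df[[var]]
  mu <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  n <- sum(!is.na(x))
  if (is.null(binwidth)) binwidth <- diff(range(x, na.rm = TRUE)) / 30
  # positions and labels of the reference lines ("\u03c3" is the Greek sigma)
  ref <- data.frame(pos = c(mu, mu - k * s, mu + k * s),
                    lab = c("mean", paste0("-", k, "\u03c3"), paste0("+", k, "\u03c3")))
  ggplot(df, aes(x = .data[[var]])) +
    geom_histogram(binwidth = binwidth, fill = "#9FE367", colour = "white") +
    # density times n times bin width puts the curve on the count scale
    stat_function(fun = function(z) n * binwidth * dnorm(z, mean = mu, sd = s),
                  colour = "#6D67E3", linewidth = 1.2) +
    geom_vline(data = ref, aes(xintercept = pos),
               linetype = "dashed", colour = "#FF0000") +
    geom_text(data = ref, aes(x = pos, y = Inf, label = lab),
              vjust = 1.5, colour = "#FF0000") +
    labs(y = "Frequency")
}

p <- hist_with_sd(mtcars, "mpg", binwidth = 1)
p
```

Because the function returns a ggplot object, you can keep adding layers to it, for example p + theme_bw().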
This chunk of code would give you something like this:
