I am trying to make a composite plot in R using the packages ggplot2 and ggpubr.
I have no problem making the composite plot itself. The issue is that each plot should have a normal distribution curve specific to its own dataset, but when I generate the composite plot, both plots show the same curve: that of the last dataset.
How can I generate the composite plot with each plot having its own specific normal distribution curve?
CODE AND OUTPUT PLOTS
## PLOT 1 ##
library(ggplot2)
results_matrix_C <- data.frame(matrix(rnorm(20), nrow = 20))
colnames(results_matrix_C) <- c("X")
m <- mean(results_matrix_C$X)
sd <- sd(results_matrix_C$X)
dnorm_C <- function(x) {
  norm_C <- dnorm(x, m, sd)
  return(norm_C)
}
e <- 1
dnorm_one_sd_C <- function(x) {
  norm_one_sd_C <- dnorm(x, m, sd)
  # Set values at or below e to NA so only the region x > e is shaded
  norm_one_sd_C[x <= e] <- NA
  return(norm_one_sd_C)
}
C <- ggplot(results_matrix_C, aes(x = X)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, colour = "black", fill = "white") +
  stat_function(fun = dnorm_one_sd_C, geom = "area", fill = "#CE9A05", color = "#CE9A05", alpha = 0.25, linewidth = 1) +
  stat_function(fun = dnorm_C, colour = "#CE0539", linewidth = 1) +
  theme_classic()
## PLOT 2 ##
results_matrix_U <- data.frame(matrix(rnorm(20) + 1, nrow = 20))
colnames(results_matrix_U) <- c("X")
m <- mean(results_matrix_U$X)
sd <- sd(results_matrix_U$X)
dnorm_U <- function(x) {
  norm_U <- dnorm(x, m, sd)
  return(norm_U)
}
e <- 2
dnorm_one_sd_U <- function(x) {
  norm_one_sd_U <- dnorm(x, m, sd)
  # Set values at or below e to NA so only the region x > e is shaded
  norm_one_sd_U[x <= e] <- NA
  return(norm_one_sd_U)
}
U <- ggplot(results_matrix_U, aes(x = X)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, colour = "black", fill = "white") +
  stat_function(fun = dnorm_one_sd_U, geom = "area", fill = "#CE9A05", color = "#CE9A05", alpha = 0.25, linewidth = 1) +
  stat_function(fun = dnorm_U, colour = "#CE0539", linewidth = 1) +
  theme_classic()
library(ggpubr)
ggarrange(C, U,
nrow = 1, ncol = 2)
As you can see in the composite plot, the first panel shows the normal distribution curve of the second dataset instead of its own curve from my initial plot (Plot 1).
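A likely explanation, sketched in base R with made-up parameter values: dnorm_C() and dnorm_one_sd_C() do not store m, sd and e. R looks them up in the global environment each time the functions are called. As I understand it, ggplot2 only evaluates stat_function() layers when a plot is actually drawn, which here happens inside ggarrange(), after the PLOT 2 code has already overwritten those globals.

```r
# Globals, as in the question
m  <- 0
sd <- 1

# dnorm_C() is defined while m = 0 and sd = 1 ...
dnorm_C <- function(x) dnorm(x, m, sd)

# ... but the globals are overwritten (as in PLOT 2) before it is called
m  <- 1
sd <- 2

# R looks m and sd up at call time, so the "Plot 1" curve uses Plot 2's values
dnorm_C(0) == dnorm(0, 1, 2)   # TRUE
dnorm_C(0) == dnorm(0, 0, 1)   # FALSE
```

Printing C right after creating it (before the PLOT 2 code runs) would show the expected curve, which fits this explanation.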
UPDATE
Variable e is the threshold for the shaded area: the area under the distribution curve is shaded where x > e.
m = mean of the dataset
sd = standard deviation of the dataset
m and sd are used to generate the normal distribution curves
SOLVED
Defining the functions inline in stat_function() and passing mean, sd and e to them through the args argument worked, i.e.:
## PLOT 1 ##
library(ggplot2)
results_matrix_C <- data.frame(matrix(rnorm(20), nrow = 20))
colnames(results_matrix_C) <- c("X")
mean <- mean(results_matrix_C$X)
sd <- sd(results_matrix_C$X)
e <- 1
C <- ggplot(results_matrix_C, aes(x = X)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, colour = "black", fill = "white") +
  stat_function(
    fun = function(x, mean, sd, e) {
      norm_one_sd_C <- dnorm(x, mean, sd)
      norm_one_sd_C[x <= e] <- NA
      return(norm_one_sd_C)
    },
    args = list(mean = mean, sd = sd, e = e), geom = "area", fill = "#CE9A05", color = "#CE9A05", alpha = 0.25, linewidth = 1) +
  stat_function(
    fun = function(x, mean, sd) {
      dnorm(x = x, mean = mean, sd = sd)
    },
    args = list(mean = mean, sd = sd), colour = "#CE0539", linewidth = 1) +
  theme_classic()
## PLOT 2 ##
results_matrix_U <- data.frame(matrix(rnorm(20) + 1, nrow = 20))
colnames(results_matrix_U) <- c("X")
mean <- mean(results_matrix_U$X)
sd <- sd(results_matrix_U$X)
e <- 2
U <- ggplot(results_matrix_U, aes(x = X)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, colour = "black", fill = "white") +
  stat_function(
    fun = function(x, mean, sd, e) {
      norm_one_sd_U <- dnorm(x, mean, sd)
      norm_one_sd_U[x <= e] <- NA
      return(norm_one_sd_U)
    },
    args = list(mean = mean, sd = sd, e = e), geom = "area", fill = "#CE9A05", color = "#CE9A05", alpha = 0.25, linewidth = 1) +
  stat_function(
    fun = function(x, mean, sd) {
      dnorm(x = x, mean = mean, sd = sd)
    },
    args = list(mean = mean, sd = sd), colour = "#CE0539", linewidth = 1) +
  theme_classic()
library(ggpubr)
ggarrange(C, U,
nrow = 1, ncol = 2)
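As far as I can tell, the args route works because stat_function() stores the values passed through args when the layer is created, so later reassignments of mean, sd and e no longer reach it. A minimal alternative, sketched in base R and assuming you prefer named helper functions, is a function factory. make_dnorm() below is a hypothetical helper, not part of ggplot2. Each call returns a density function with its own copy of the parameters, and force() evaluates them immediately rather than lazily.

```r
# Hypothetical factory: each call returns a density function with its own
# copy of the parameters; `above` blanks out x <= above, as dnorm_one_sd_C() does
make_dnorm <- function(mu, sigma, above = -Inf) {
  force(mu); force(sigma); force(above)   # evaluate the arguments right now
  function(x) {
    y <- dnorm(x, mean = mu, sd = sigma)
    y[x <= above] <- NA
    y
  }
}

set.seed(42)
X_C <- rnorm(20)
X_U <- rnorm(20) + 1

curve_C <- make_dnorm(mean(X_C), sd(X_C))
shade_C <- make_dnorm(mean(X_C), sd(X_C), above = 1)
curve_U <- make_dnorm(mean(X_U), sd(X_U))
shade_U <- make_dnorm(mean(X_U), sd(X_U), above = 2)

# Each closure keeps its own mu/sigma; in ggplot2 they would be used as e.g.
#   stat_function(fun = curve_C, colour = "#CE0539")
#   stat_function(fun = shade_C, geom = "area", fill = "#CE9A05", alpha = 0.25)
c(C = curve_C(0), U = curve_U(0))
```

The design choice here is to compute the parameters once per dataset and bind them to the function, so the order in which the plots are built or drawn no longer matters.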
Related
I'm trying to add more standard-deviation bands to my current plot. I need to add the 1 SD and 3 SD bands; I've already added the 2 SD band.
This is my code:
tidyverse_downloads_rollmean <- treasury %>%
tq_mutate(
# tq_mutate args
select = yield,
mutate_fun = rollapply,
# rollapply args
width = 360,
align = "right",
FUN = mean,
# mean args
na.rm = TRUE,
# tq_mutate args
col_rename = "mean_360"
)
This is for the 2 SD band, but I also need the 1 SD and 3 SD bands in the same plot:
custom_stat_fun_2 <- function(x, na.rm = TRUE) {
m <- mean(x, na.rm = na.rm)
s <- sd(x, na.rm = na.rm)
hi <- m + 2*s
lo <- m - 2*s
ret <- c(mean = m, stdev = s, hi.95 = hi, lo.95 = lo)
return(ret)
}
I added these statistics to my data:
rollstats<- treasury %>%
tq_mutate(
select = yield,
mutate_fun = rollapply,
# rollapply args
width = 360,
align = "right",
by.column = FALSE,
FUN = custom_stat_fun_2,
# FUN args
na.rm = TRUE
)
This is my plot:
rollstats %>%
ggplot(aes(x = date)) +
# Data
geom_line(aes(y = yield), color = "grey40", alpha = 0.5, size =1) +
geom_ribbon(aes(ymin = lo.95, ymax = hi.95), alpha = 0.4) +
geom_point(aes(y = mean), linetype = 2, size = 0.5, alpha = 0.5) +
# Aesthetics
labs(title = "tidyverse packages: Volatility and Trend", x = "",
subtitle = "360-Day Moving Average with 95% Confidence Interval Bands (+/-2 Standard Deviations)") +
scale_color_tq(theme = "light") +
theme_tq() +
theme(legend.position="none")
This is my output:
But I want something like this:
So how can I add the 1 SD and 3 SD bands? Is there another way to plot the 1, 2 and 3 SD bands in the same plot? Thanks in advance!
You haven't provided a reprex, so it's hard to help you. tidyquant has a function that plots standard-deviation bands for you (geom_bbands()). But here's an idea with only ggplot2, using different data. First calculate the 1st, 2nd and 3rd standard-deviation bands:
library(tidyquant)
custom_stat_fun_2 <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  hi1 <- m + s
  lo1 <- m - s
  hi2 <- m + 2*s
  lo2 <- m - 2*s
  hi3 <- m + 3*s
  lo3 <- m - 3*s
  ret <- c(mean = m, stdev = s, hi1 = hi1, lo1 = lo1, hi2 = hi2, lo2 = lo2, hi3 = hi3, lo3 = lo3)
  return(ret)
}
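If you might want other multiples later, the band calculation can be generalised. This is a base-R sketch. band_stats() and its k argument are hypothetical names, not tidyquant functions. The output keeps the same hi1/lo1/... naming as custom_stat_fun_2(), only in a different column order, which should not matter for the pivot_longer() step. Keep k to single digits if you reuse the "(.*)(\\d)" pattern, because it captures exactly one trailing digit.

```r
# Hypothetical generalisation of custom_stat_fun_2(): bands at mean +/- k * sd
band_stats <- function(x, k = 1:3, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  hi <- setNames(m + k * s, paste0("hi", k))
  lo <- setNames(m - k * s, paste0("lo", k))
  c(mean = m, stdev = s, hi, lo)
}

band_stats(c(1, 2, 3, 4, 5))
```

It could then be passed as FUN = band_stats in the tq_mutate() call in place of custom_stat_fun_2.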
treasury <- treasuryTR::get_yields("DGS10", format_out = "tibble")
rollstats<- treasury |>
tq_mutate(
select = DGS10,
mutate_fun = rollapply,
# rollapply args
width = 360,
align = "right",
by.column = FALSE,
FUN = custom_stat_fun_2
) |>
na.omit()
Reshape (melt) the data frame so it has one column for hi and one for lo, then set the factor levels so the bands plot in reverse order (widest band first):
rollsds <- tidyr::pivot_longer(rollstats,cols = starts_with(c("hi", "lo")),
names_to = c(".value", "sd"), names_pattern = "(.*)(\\d)")
rollsds$sd <- factor(as.character(rollsds$sd), levels=c(3,2,1))
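The names_pattern regex does the splitting. The greedy (.*) takes everything up to the final digit and becomes the ".value" part, which is the output column name, hi or lo. The last digit goes into the new sd column. A quick base-R check of what the two groups capture:

```r
cols <- c("hi1", "lo1", "hi2", "lo2", "hi3", "lo3")

# First capture group -> output column name (".value")
sub("(.*)(\\d)", "\\1", cols)   # "hi" "lo" "hi" "lo" "hi" "lo"

# Second capture group -> value of the new 'sd' column
sub("(.*)(\\d)", "\\2", cols)   # "1" "1" "2" "2" "3" "3"
```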
Plot
library(ggplot2)
rollstats |>
ggplot(aes(x = date)) +
# Data
geom_ribbon(data=rollsds, aes(ymax = hi, ymin=lo, fill=sd, color=sd), alpha=0.3) +
geom_line(aes(y = mean), linetype = 2, size = 0.5, alpha = 0.5) +
geom_line(aes(y = DGS10), color = "midnightblue", alpha = 0.7, size =1) +
# Aesthetics
theme_tq()
library(ggplot2)
library(fitdistrplus)
library(MASS)  # fitdistr() comes from MASS
set.seed(1)
dat <- data.frame(n = rlnorm(1000))
# binwidth
bw <- 0.2
# fit a lognormal distribution
fit_params <- fitdistr(dat$n, "lognormal")
ggplot(dat, aes(n)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw, colour = "black") +
  stat_function(fun = dlnorm, linewidth = 1, color = 'gray',
                args = list(meanlog = fit_params$estimate[1], sdlog = fit_params$estimate[2]))
# my defined function
myfun <- function(x, a, b) 1/(sqrt(2*pi*b*(x-1)))*exp(-0.5*((log(x-a)/b)^2)) # a and b are meanlog and sdlog resp.
I'd like to fit a modified lognormal, defined by myfun, to the density histogram. How do I add this function to the plot?
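Maybe you are looking for this. Some parts of the curve cannot appear because of the domain of your myfun: it only gives finite values where both x - 1 and x - a are positive, and returns NaN elsewhere: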
library(ggplot2)
library(fitdistrplus)
library(MASS)  # fitdistr() comes from MASS
set.seed(1)
dat <- data.frame(n = rlnorm(1000))
# binwidth
bw <- 0.2
# fit a lognormal distribution
fit_params <- fitdistr(dat$n, "lognormal")
# my defined function
myfun <- function(x, a, b) 1/(sqrt(2*pi*b*(x-1)))*exp(-0.5*((log(x-a)/b)^2))
# a and b are meanlog and sdlog resp.
#Plot
ggplot(dat, aes(n)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw, colour = "black") +
  stat_function(fun = myfun, linewidth = 1, color = 'gray',
                args = list(a = fit_params$estimate[1], b = fit_params$estimate[2]))
Output:
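As a side note, fitdistr(x, "lognormal") fits the lognormal by maximum likelihood, which has a closed form. meanlog is the mean of log(x), and sdlog is the standard deviation of log(x) computed with divisor n rather than n - 1. This base-R sketch reproduces the estimates without MASS and should match what fitdistr() returns for the same data:

```r
set.seed(1)
vals <- rlnorm(1000)   # same draw as dat$n above

# Closed-form lognormal MLE
lx <- log(vals)
meanlog_hat <- mean(lx)
sdlog_hat   <- sqrt(mean((lx - meanlog_hat)^2))   # divisor n, not n - 1

c(meanlog = meanlog_hat, sdlog = sdlog_hat)

# The fitted density could then be drawn with, e.g.,
#   stat_function(fun = dlnorm, args = list(meanlog = meanlog_hat, sdlog = sdlog_hat))
```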
As the title indicates, I am trying to plot the normal distribution and the binomial distribution in the same plot using R. My attempt is below. Is there any reason why my normal distribution looks so off? I have double-checked the mean and standard deviation, and everything looks fine.
n <- 151
p <- 0.2409
dev <- 4
mu <- n*p
sigma <- sqrt(n*p*(1 - p))
xmin <- round(max(mu - dev*sigma, 0))
xmax <- round(min(mu + dev*sigma, n))
x <- seq(xmin, xmax)
y <- dbinom(x, n, p)
barplot(y,
        col = 'lightblue',
        names.arg = x,
        main = 'Binomial distribution, n=151, p=0.2409')
range <- seq(mu - dev*sigma, mu + dev*sigma, 0.01)
height <- dnorm(range, mean = mu, sd = sigma)
lines(range, height, col = 'red', lwd = 3)
barplot() is just the wrong function for your case. If you really want to use it, you'd have to reconcile the x axes of barplot() and lines(), because barplot() does not place the bars at your x values but at its own positions.
By default, barplot() puts each bar (and returns its midpoint) at
head(c(barplot(y, plot = FALSE)))
# [1] 0.7 1.9 3.1 4.3 5.5 6.7
These positions can be changed with the space and width arguments, or a combination of both:
head(c(barplot(y, plot = FALSE, space = 0)))
# [1] 0.5 1.5 2.5 3.5 4.5 5.5
head(c(barplot(y, plot = FALSE, space = 0, width = 3)))
# [1] 1.5 4.5 7.5 10.5 13.5 16.5
You can just use plot() to avoid dealing with all of that:
n <- 151
p <- 0.2409
dev <- 4
mu <- n*p
sigma <- sqrt(n*p*(1 - p))
xmin <- round(max(mu - dev*sigma, 0))
xmax <- round(min(mu + dev*sigma,n))
x <- seq(xmin, xmax)
y <- dbinom(x,n,p)
plot(x, y, type = 'h', lwd = 10, lend = 3, col = 'lightblue',
ann = FALSE, las = 1, bty = 'l', yaxs = 'i', ylim = c(0, 0.08))
title(main = sprintf('Binomial distribution, n=%s, p=%.3f', n, p))
lines(x, dnorm(x, mean = mu, sd = sigma), col = 'red', lwd = 7)
xx <- seq(min(x), max(x), length.out = 1000)
lines(xx, dnorm(xx, mean = mu, sd = sigma), col = 'white')
The "bars" in this figure depend on your choice of lwd and on your device dimensions. If you need finer control over them, you can use rect(), which takes a little more work:
w <- 0.75
plot(x, y, type = 'n', ann = FALSE, las = 1, bty = 'l', yaxs = 'i', ylim = c(0, 0.08))
rect(x - w / 2, 0, x + w / 2, y, col = 'lightblue')
lines(xx, dnorm(xx, mean = mu, sd = sigma), col = 'red', lwd = 3)
title(main = sprintf('Binomial distribution, n=%s, p=%.3f', n, p))
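To check that only the x coordinates were the problem, not the normal approximation itself, you can compare the two curves numerically at the integer x values. This is a small base-R sketch that repeats the setup so it runs on its own:

```r
n <- 151
p <- 0.2409
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
x <- seq(round(max(mu - 4 * sigma, 0)), round(min(mu + 4 * sigma, n)))

pb <- dbinom(x, n, p)                    # binomial probabilities
pn <- dnorm(x, mean = mu, sd = sigma)    # normal density at the same x

max(pb)               # height of the tallest bar
max(abs(pb - pn))     # largest pointwise gap between the two curves
x[which.max(pb)]      # mode of the binomial, next to mu
```

If the largest gap is small compared with the tallest bar, the curve itself is fine, and only its placement on the barplot axis was off.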
You can use the ggplot2 package:
library(ggplot2)
n <- 151
p <- 0.2409
dev <- 4
mean <- n*p
sd <- sqrt(n*p*(1-p))
xmin <- round(max(mean - dev*sd, 0))
xmax <- round(min(mean + dev*sd, n))
x <- seq(xmin, xmax)
y <- dbinom(x, n, p)
df <- data.frame(x, y)
ggplot(df, aes(x = x, y = y)) +
  geom_col(fill = 'dodgerblue3') +
  labs(title = "Binomial distribution, n=151, p=0.2409",
       x = "",
       y = "") +
  theme_minimal() +
  # Overlay the normal curve; each bar is the probability of one integer
  # value (bar width 1), so the density needs no rescaling
  stat_function(
    fun = function(x, mean, sd) {
      dnorm(x = x, mean = mean, sd = sd)
    }, col = "red", linewidth = 1.4,
    args = list(mean = mean, sd = sd))
You could do it using the ggplot2 package, evaluating the normal density at the same x values as the binomial probabilities:
n <- 151
p <- 0.2409
dev <- 4
mu <- n*p
sigma <- sqrt(n*p*(1 - p))
xmin <- round(max(mu - dev*sigma, 0))
xmax <- round(min(mu + dev*sigma, n))
x <- seq(xmin, xmax)
y <- dbinom(x, n, p)
z <- dnorm(x, mean = mu, sd = sigma)
library(magrittr)
library(ggplot2)
data.frame(x, y, z) %>%
  ggplot(aes(x = x)) +
  geom_col(aes(y = y)) +
  geom_line(aes(y = z), colour = "red")
I am trying to adjust the layers of a plot that uses both stat_function and geom_vline. My problem is that the vertical line is not perfectly aligned with the green area:
[Figure: density plot with a vertical line (not aligned)]
In another post I saw a solution for aligning two separate plots; in my case, however, I want to align them within the same plot.
library(ggplot2)
library(magrittr)  # for %>%
all_mean <- mean(mtcars$wt, na.rm = T) %>% round(2)
all_sd <- sd(mtcars$wt, na.rm = T) %>% round(2)
my_score <- mtcars[1, "wt"]
dd <- function(x) { dnorm(x, mean = all_mean, sd = all_sd) }
z <- (my_score - all_mean)/all_sd
pc <- round(100*(pnorm(z)), digits = 0)
t1 <- paste0(as.character(pc), "th percentile")
p33 <- all_mean + (qnorm(0.3333) * all_sd)
p67 <- all_mean + (qnorm(0.6667) * all_sd)
funcShaded <- function(x, lower_bound) {
  y = dnorm(x, mean = all_mean, sd = all_sd)
  y[x < lower_bound] <- NA
  return(y)
}
greenShaded <- function(x, lower_bound) {
  y = dnorm(x, mean = all_mean, sd = all_sd)
  y[x > (all_mean*2)] <- NA
  return(y)
}
ggplot(data.frame(x = c(min(mtcars$wt - 2), max(mtcars$wt + 2))), aes(x = x)) +
  stat_function(fun = dd, colour = "black") +
  stat_function(fun = greenShaded, args = list(lower_bound = pc),
                geom = "area", fill = "green", alpha = 1) +
  stat_function(fun = funcShaded, args = list(lower_bound = my_score),
                geom = "area", fill = "white", alpha = .9) +
  geom_vline(aes(xintercept = my_score), colour = "black")
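stat_function() evaluates the function at n evenly spaced points along your x range (101 by default). This means your curve has limited resolution: the white area can only start at one of those points, which may lie slightly to the right of my_score. Simply increase n for the funcShaded layer: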
ggplot(data.frame(x=c(min(mtcars$wt-2), max(mtcars$wt+2))), aes(x=x)) +
stat_function(fun=dd, colour="black") +
stat_function(fun = greenShaded, args = list(lower_bound = pc),
geom = "area", fill = "green", alpha = 1)+
stat_function(fun = funcShaded, args = list(lower_bound = my_score),
geom = "area", fill = "white", alpha = .9, n = 1e3)+
geom_vline(aes(xintercept=my_score), colour="black")
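To see how large the misalignment can be, this base-R sketch assumes that stat_function() evaluates fun on seq(min, max, length.out = n) over the x range, which is my understanding of its default behaviour. It computes the grid step and the distance from my_score to the first grid point where funcShaded() is no longer NA. grid_gap() is a hypothetical helper.

```r
my_score <- mtcars[1, "wt"]
x_range  <- c(min(mtcars$wt - 2), max(mtcars$wt + 2))

# Hypothetical helper: grid step and offset of the shaded edge from my_score
grid_gap <- function(n) {
  grid <- seq(x_range[1], x_range[2], length.out = n)
  step <- diff(grid)[1]
  # first grid point at or above my_score = where the white area starts
  first_inside <- min(grid[grid >= my_score])
  c(step = step, gap = first_inside - my_score)
}

grid_gap(101)    # step of roughly 0.08
grid_gap(1000)   # step of roughly 0.008
```

The gap is always smaller than one grid step, so raising n shrinks it proportionally.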
I have two populations, A and B, distributed spatially, each with one binary character z. I want to make a hexbin plot of the difference between the two populations in the proportion of that character in each hexagon. Here is the code for two theoretical populations A and B:
library(hexbin)
library(ggplot2)
set.seed(2)
xA <- rnorm(1000)
set.seed(3)
yA <- rnorm(1000)
set.seed(4)
zA <- sample(c(1, 0), 1000, replace = TRUE, prob = c(0.2, 0.8))
hbinA <- hexbin(xA, yA, xbins = 40, IDs = TRUE)
A <- data.frame(x = xA, y = yA, z = zA)
set.seed(5)
xB <- rnorm(1000)
set.seed(6)
yB <- rnorm(1000)
set.seed(7)
zB <- sample(c(1, 0), 1000, replace = TRUE, prob = c(0.4, 0.6))
hbinB <- hexbin(xB, yB, xbins = 40, IDs = TRUE)
B <- data.frame(x = xB, y = yB, z = zB)
ggplot(A, aes(x, y, z = z)) + stat_summary_hex(fun = function(z) sum(z)/length(z), alpha = 0.8) +
scale_fill_gradientn(colours = c("blue","red")) +
guides(alpha = FALSE, size = FALSE)
ggplot(B, aes(x, y, z = z)) + stat_summary_hex(fun = function(z) sum(z)/length(z), alpha = 0.8) +
scale_fill_gradientn (colours = c("blue","red")) +
guides(alpha = FALSE, size = FALSE)
Here are the two resulting graphs:
My goal is to make a third hexbin graph showing the difference between the two hexbin values at the same coordinates, but I don't even know how to start. I have done something similar with the raster package, but I need it as hexbins.
Thanks a lot
You need to make sure that both plots use exactly the same binning. To achieve this, I think it is best to do the binning beforehand and then plot the results with geom_hex(stat = "identity"). With the variables from your code sample you can do:
## find the bounds for the complete data
xbnds <- range(c(A$x, B$x))
ybnds <- range(c(A$y, B$y))
nbins <- 30
# function to make a data.frame for geom_hex that can be used with stat_identity
makeHexData <- function(df) {
  h <- hexbin(df$x, df$y, nbins, xbnds = xbnds, ybnds = ybnds, IDs = TRUE)
  data.frame(hcell2xy(h),
             z = tapply(df$z, h@cID, FUN = function(z) sum(z)/length(z)),
             cid = h@cell)
}
Ahex <- makeHexData(A)
Bhex <- makeHexData(B)
## not all cells are present in each binning, we need to merge by cellID
byCell <- merge(Ahex, Bhex, by = "cid", all = T)
## when calculating the difference empty cells should count as 0
byCell$z.x[is.na(byCell$z.x)] <- 0
byCell$z.y[is.na(byCell$z.y)] <- 0
## make a "difference" data.frame
Diff <- data.frame(x = ifelse(is.na(byCell$x.x), byCell$x.y, byCell$x.x),
y = ifelse(is.na(byCell$y.x), byCell$y.y, byCell$y.x),
z = byCell$z.x - byCell$z.y)
## plot the results
ggplot(Ahex) +
geom_hex(aes(x = x, y = y, fill = z),
stat = "identity", alpha = 0.8) +
scale_fill_gradientn (colours = c("blue","red")) +
guides(alpha = FALSE, size = FALSE)
ggplot(Bhex) +
geom_hex(aes(x = x, y = y, fill = z),
stat = "identity", alpha = 0.8) +
scale_fill_gradientn (colours = c("blue","red")) +
guides(alpha = FALSE, size = FALSE)
ggplot(Diff) +
geom_hex(aes(x = x, y = y, fill = z),
stat = "identity", alpha = 0.8) +
scale_fill_gradientn (colours = c("blue","red")) +
guides(alpha = FALSE, size = FALSE)
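The key step is that both populations are binned on the same grid, so equal cell IDs mean the same location. merge(..., all = TRUE) then lines the cells up, and empty cells become 0. Here is the same idea with square bins instead of hexagons, as a base-R sketch so it runs without extra packages. The bin_props() helper, the popA/popB data and the 10 x 10 grid are illustrative assumptions, not part of hexbin.

```r
set.seed(2)
popA <- data.frame(x = rnorm(1000), y = rnorm(1000),
                   z = sample(c(1, 0), 1000, replace = TRUE, prob = c(0.2, 0.8)))
popB <- data.frame(x = rnorm(1000), y = rnorm(1000),
                   z = sample(c(1, 0), 1000, replace = TRUE, prob = c(0.4, 0.6)))

# Shared break points, computed from both populations together
nb  <- 10
xbr <- seq(min(popA$x, popB$x), max(popA$x, popB$x), length.out = nb + 1)
ybr <- seq(min(popA$y, popB$y), max(popA$y, popB$y), length.out = nb + 1)

# Hypothetical helper: proportion of z == 1 in each square cell
bin_props <- function(df) {
  ix  <- findInterval(df$x, xbr, rightmost.closed = TRUE)
  iy  <- findInterval(df$y, ybr, rightmost.closed = TRUE)
  cid <- (iy - 1) * nb + ix                # one ID per cell, shared by A and B
  prop <- tapply(df$z, cid, mean)          # mean of a 0/1 vector = proportion
  data.frame(cid = as.integer(names(prop)), z = as.vector(prop))
}

byCell <- merge(bin_props(popA), bin_props(popB), by = "cid", all = TRUE)
byCell$z.x[is.na(byCell$z.x)] <- 0         # empty cells count as 0
byCell$z.y[is.na(byCell$z.y)] <- 0
byCell$diff <- byCell$z.x - byCell$z.y
head(byCell)
```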