Histogram with normal curve and error bars in R

# Import data (diameters is the asker's data frame, loaded elsewhere)
data <- diameters$V1
error <- 0.005 # measurement error in mm
#make histogram
h <- hist(data, breaks = "FD", density = 10,
col = "lightblue", xlab = "Diameter", main = "Overall")
# Make normal curve
xfit <- seq(min(data), max(data), length = 40)
yfit <- dnorm(xfit, mean = mean(data), sd = sd(data))
yfit <- yfit * diff(h$mids[1:2]) * length(data)
#Draw normal curve
lines(xfit, yfit, col = "black", lwd = 2)
Output: [figure: histogram with the fitted normal curve]
Expectation: [figure: the same histogram with error bars]
Is it possible to add error bars to the histogram using the value of +/- error without any external libraries?

You should be able to draw them with the arrows() function:
## Create a histogram from random data
hist(sample(runif(100)))
arrows(x0 = 0.15, y0 = 11, x1 = 0.15, y1 = 13, code = 3, length = 0.05, angle = 90)
x0 and x1 specify the start and end x coordinates (for a vertical line, keep them the same).
y0 and y1 specify the start and end y coordinates, i.e. the extent of the line to draw.
code = 3 tells R to draw a double-headed 'arrow', angle = 90 makes each arrowhead a flat cap perpendicular to the line, and length = 0.05 sets the length of the arrowhead edges (in inches), which controls how wide the error bar caps are.
See ?arrows for more details.
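Because arrows() is vectorised, one call can put a bar on every bin of the question's histogram. The sketch below assumes the ±error (0.005 mm) is an uncertainty in diameter, so it draws horizontal bars of ±error centred on each bin midpoint at the height of the bar; the simulated data are a hypothetical stand-in for diameters$V1, which is not shown. For vertical bars, swap the roles of the x and y arguments.

```r
# Simulated stand-in for diameters$V1 (hypothetical data)
set.seed(1)
data <- rnorm(200, mean = 2, sd = 0.01)
error <- 0.005  # mm, as in the question

# Histogram with Freedman-Diaconis breaks, as in the question
h <- hist(data, breaks = "FD", col = "lightblue",
          xlab = "Diameter", main = "Overall")

# Normal curve scaled from density to counts (bin width * n)
xfit <- seq(min(data), max(data), length = 40)
yfit <- dnorm(xfit, mean = mean(data), sd = sd(data)) *
  diff(h$mids[1:2]) * length(data)
lines(xfit, yfit, col = "black", lwd = 2)

# Horizontal error bars: one per non-empty bin, spanning mid +/- error
# at the top of the bar; empty bins are skipped so no bars sit on the axis
keep <- h$counts > 0
arrows(x0 = h$mids[keep] - error, y0 = h$counts[keep],
       x1 = h$mids[keep] + error, y1 = h$counts[keep],
       code = 3, angle = 90, length = 0.03)
```

Skipping empty bins is a presentational choice; drop the keep filter if you want a bar for every bin.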

Related

How to customize Bland-Altman plot in R

I am trying to compare two measurement methods with a Bland-Altman plot, which is basically this:
method.1 <- rnorm(20)
method.2 <- rnorm(20)
plot((method.1 + method.2)/2, method.1 - method.2)
I've found a package that I like:
devtools::install_github("deepankardatta/blandr")
library(blandr)
blandr.draw(method.1, method.2, plotter = "rplot")
Which gives me the following result:
Bland-Altman plot with blandr package
The upper band is Mean + 1.96 SD (+/- 95% CI)
The lower band is Mean - 1.96 SD (+/- 95% CI)
The middle band is Mean +/- 95% CI
I like the way it is, although I wish I could change the band colours, line types and point shapes, or add a legend.
I wish I could override the blandr.draw() function, or just create my own plot (the same as blandr.draw()) in base R, so I can customize it the way I want. I failed to contact the package author...
Additionally, a ggplot version of a similar plot (blandr.draw(method.1, method.2)) would be appreciated.
So here is my self-made Bland-Altman plot; maybe it will be useful to others.
Sample Bland-Altman plot
All calculations (limits of agreement and 95% confidence intervals) are based on the 1999 Bland and Altman paper "Measuring agreement in method comparison studies".
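For reference, the quantities computed in the code that follows can be written out as below, where the differences are df$Y, \(\bar d\) is their mean, \(s\) their standard deviation, \(n\) the number of pairs and \(t = t_{0.975,\,n-1}\). The constant 3.8416 in the code is \(1.96^2\).

```latex
\begin{aligned}
\text{LoA}_{\pm} &= \bar d \pm 1.96\, s \\
\text{CI}(\bar d) &= \bar d \pm t\,\frac{s}{\sqrt{n}} \\
\text{CI}(\text{LoA}_{\pm}) &= \text{LoA}_{\pm} \pm t\, s\,\sqrt{\frac{1}{n} + \frac{1.96^2}{2(n-1)}}
\end{aligned}
```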
I still don't know how to shade the bands between the confidence intervals, probably with the rect() function.
# Sample data:
method.1 <- rnorm(100)
method.2 <- rnorm(100)
df <- data.frame(
X = (method.1 + method.2)/2,
Y = (method.1 - method.2)
)
# Number of measurements to calculate degrees of freedom for t-distribution:
n = length(df$Y)
t = qt(0.975, df = n - 1) # t-distribution
mean <- mean(df$Y)
LoA <- 1.96*sd(df$Y) # Limits of Agreement (half-width around the mean)
# 95% Confidence Intervals:
LoA_CI <- t * sqrt( (1/n + 3.8416/(2*(n - 1))) ) * sd(df$Y)
mean_CI <- t * sd(df$Y)/sqrt(n)
# To calculate position of partition lines:
LoA_up_plus <- mean + LoA + LoA_CI
LoA_up <- mean + LoA
LoA_up_minus <- mean + LoA - LoA_CI
mean_plus <- mean + mean_CI
mean_minus <- mean - mean_CI
LoA_down_plus <- mean - LoA + LoA_CI
LoA_down <- mean - LoA
LoA_down_minus <- mean - LoA - LoA_CI
# Save PNG file:
png(filename = "BA_norm.png",
width = 3000, height = 2100, units = "px", res = 300)
# Plot:
plot(Y ~ X, df,
# When I have a lot of data my points are overlapping each other
# that's why I make them semi-transparent with 'alpha':
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75,
main = "Bland-Altman plot for Method 1 and Method 2",
xlab = "Mean of results",
ylab = "Method 1 - Method 2 difference"
)
# Background colour for your plot; if you don't want it,
# just skip the following four lines of code:
rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4],
col = "#c2f0f0") #here you can put desired background colour hex
points(Y ~ X, df,
col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75)
# Adding lines:
abline(h = 0, lwd = 0.7) # solid line for Y = 0
# Display rounded values of partition lines positions:
text(x = 1.5, y = LoA_up_plus, # x and y position of text
paste(round(LoA_up, 2), "\u00B1", round(LoA_CI, 2)), pos = 1)
abline(h = LoA_up_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_up, col = "blue", lty = "dashed")
abline(h = LoA_up_minus, col = "#68cbf8", lty = "dotted")
text(x = 1.5, y = mean_plus,
paste(round(mean, 2), "\u00B1", round(mean_CI, 2)), pos = 3)
abline(h = mean_plus, col = "#ff9e99", lty = "dotted")
abline(h = mean, col = "red", lty = "longdash")
abline(h = mean_minus, col = "#ff9e99", lty = "dotted")
text(x = 1.5, y = LoA_down_plus,
paste(round(LoA_down, 2), "\u00B1", round(LoA_CI, 2)), pos = 3)
abline(h = LoA_down_plus, col = "#68cbf8", lty = "dotted")
abline(h = LoA_down, col = "blue", lty = "dashed")
abline(h = LoA_down_minus, col = "#68cbf8", lty = "dotted")
# Close saving PNG file function:
dev.off()
It is probably possible to condense all those abline() calls.
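Both open points can be handled in base R: abline() accepts a vector for h, with col and lty recycled over it, and rect() is vectorised too, so each pair of CI lines can become one shaded band. The sketch below recomputes the same statistics on fresh simulated data (renaming t and mean to t_crit and bias so the base functions are not shadowed); the colours and transparency are illustrative choices, not the blandr defaults.

```r
# Self-contained sketch: same statistics as above, on fresh simulated data
set.seed(123)
method.1 <- rnorm(100)
method.2 <- rnorm(100)
X <- (method.1 + method.2) / 2
Y <- method.1 - method.2

n <- length(Y)
t_crit <- qt(0.975, df = n - 1)
bias <- mean(Y)                 # mean difference
LoA <- 1.96 * sd(Y)             # half-width of the limits of agreement
LoA_CI <- t_crit * sqrt(1 / n + 1.96^2 / (2 * (n - 1))) * sd(Y)
mean_CI <- t_crit * sd(Y) / sqrt(n)

# Set up axes only; points are drawn last so they sit on top of the bands
plot(X, Y, type = "n",
     xlab = "Mean of results", ylab = "Method 1 - Method 2 difference",
     ylim = range(Y, bias + LoA + LoA_CI, bias - LoA - LoA_CI))
usr <- par("usr")

# One vectorised rect() call shades all three CI bands:
# upper LoA, mean difference, lower LoA
centres <- c(bias + LoA, bias, bias - LoA)
half    <- c(LoA_CI, mean_CI, LoA_CI)
rect(xleft = usr[1], xright = usr[2],
     ybottom = centres - half, ytop = centres + half,
     col = adjustcolor(c("blue", "red", "blue"), alpha.f = 0.15),
     border = NA)

# All reference lines in one abline() call; col and lty recycle over h
abline(h = c(0, centres),
       col = c("black", "blue", "red", "blue"),
       lty = c("solid", "dashed", "longdash", "dashed"))

points(X, Y, col = rgb(0, 0, 0, alpha = 0.5), pch = 16, cex = 0.75)
```

Drawing the points after the bands keeps them visible; a legend() call can be added at the end in the same way if needed.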

How to lower height of normal distribution curve

I would like to lower the height of my normal distribution curve so that the full curve can be seen on the graph.
histCvferr <- hist(cvf_ref_err, breaks = 10, density = 60,
col = "lightgray", xlab = "Residuals", main = "")
xfit <- seq(min(cvf_ref_err), max(cvf_ref_err), length = 40)
yfit <- dnorm(xfit, mean = mean(cvf_ref_err), sd = sd(cvf_ref_err))
yfit <- yfit * diff(histCvferr$mids[1:2]) * length(cvf_ref_err) # scale density to counts (bin width * n)
lines(xfit, yfit, col = "black", lwd = 2)
As you can see, the top of the curve is cut off.
Also, how can I change the bins so that they are black outlines with no fill?
Try passing the ylim argument to hist() to extend the range of the y axis, for example the following. I hope this helps.
ylim = c(0, 70)
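Rather than hard-coding 70, the upper y limit can be computed from whichever is taller, the tallest bar or the peak of the scaled curve. The sketch below also addresses the second question: col = NA removes the fill and border = "black" keeps the outlines. The simulated residuals are a hypothetical stand-in for cvf_ref_err, which is not shown.

```r
# Simulated stand-in for the asker's residuals (hypothetical data)
set.seed(7)
cvf_ref_err <- rnorm(150, mean = 0, sd = 2)

# First pass: compute the histogram without drawing it
h <- hist(cvf_ref_err, breaks = 10, plot = FALSE)

# Normal curve scaled from density to counts (bin width * n)
xfit <- seq(min(cvf_ref_err), max(cvf_ref_err), length = 40)
yfit <- dnorm(xfit, mean = mean(cvf_ref_err), sd = sd(cvf_ref_err)) *
  diff(h$mids[1:2]) * length(cvf_ref_err)

# Upper y limit covers both the tallest bar and the curve's peak, plus 5% headroom
ymax <- 1.05 * max(h$counts, yfit)

# Draw the bins as black outlines with no fill
plot(h, col = NA, border = "black", ylim = c(0, ymax),
     xlab = "Residuals", main = "")
lines(xfit, yfit, col = "black", lwd = 2)
```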

overlaying two normal distributions over two histograms on one plot in R

I'm trying to graph two normal distributions over two histograms in the same plot in R. Here is an example of what I would like it to look like:
Here is my current code, but I can't get the second normal distribution to overlay properly:
g = R_Hist$`AvgFeret,20-60`
m<-mean(g)
std<-sqrt(var(g))
h <- hist(g, breaks = 20, xlab="Average Feret Diameter", main = "Histogram of 60-100um beads", col=adjustcolor("red", alpha.f =0.2))
xfit <- seq(min(g), max(g), length = 680)
yfit <- dnorm(xfit, mean=mean(g), sd=sd(g))
yfit <- yfit*diff(h$mids[1:2]) * length(g)
lines(xfit, yfit, col = "red", lwd=2)
k = R_Hist$`AvgFeret,60-100`
ms <-mean(k)
stds <-sqrt(var(k))
j <- hist(k, breaks=20, add=TRUE, col = adjustcolor("blue", alpha.f = 0.3))
xfit <- seq(min(j), max(j), length = 314)
yfit <- dnorm(xfit, mean=mean(j), sd=sd(j))
yfit <- yfit*diff(j$mids[1:2]) * length(j)
lines(xfit, yfit, col="blue", lwd=2)
and here is the graph this code is generating:
I haven't yet figured out how to rescale the axes, so any help on that would also be appreciated, but I'm sure I can look that up! Should I be using ggplot2 for this application? If so, how do you overlay a normal curve in that library?
Also, as a side note, here are the errors generated when graphing the second (blue) line:
To put both histograms on the same scale, the easiest approach might be to run hist() with plot = FALSE first to get the counts.
h <- hist(g, breaks = 20, plot = FALSE)
j <- hist(k, breaks = 20, plot = FALSE)
ymax <- max(c(h$counts, j$counts))
xmin <- 0.9 * min(c(g, k))
xmax <- 1.1 * max(c(g,k))
Then you can simply use parameters xlim and ylim in your first call to hist():
h <- hist(g, breaks = 20,
xlab="Average Feret Diameter",
main = "Histogram of 60-100um beads",
col=adjustcolor("red", alpha.f =0.2),
xlim=c(xmin, xmax),
ylim=c(0, ymax))
The errors for the second (blue) line occur because you used j (the histogram object) where you needed k (the raw values):
xfit <- seq(min(k), max(k), length = 314)
yfit <- dnorm(xfit, mean=mean(k), sd=sd(k))
yfit <- yfit*diff(j$mids[1:2]) * length(k)
As for the ggplot2 approach, you can find a good answer here and in the posts linked therein.
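Putting the base R pieces together, a sketch that sizes both axes before drawing, so that neither the bars nor the curves are clipped, might look like this. It uses simulated data in place of R_Hist, and scaled_normal() is a hypothetical helper introduced here to avoid repeating the scaling code, which is where the j/k mix-up crept in. The x range is taken from the histogram breaks rather than 0.9/1.1 multiples, which also works for data near or below zero.

```r
# Simulated stand-ins for the two bead-size columns (hypothetical data)
set.seed(2024)
g <- rnorm(680, mean = 40, sd = 6)   # e.g. AvgFeret,20-60
k <- rnorm(314, mean = 80, sd = 8)   # e.g. AvgFeret,60-100

# Hypothetical helper: normal curve over a sample, scaled to the counts
# of histogram hh (density * bin width * number of observations)
scaled_normal <- function(values, hh, n_points = 200) {
  x <- seq(min(values), max(values), length.out = n_points)
  y <- dnorm(x, mean = mean(values), sd = sd(values)) *
    diff(hh$mids[1:2]) * length(values)
  list(x = x, y = y)
}

# First pass: histograms without plotting, to size the axes
h <- hist(g, breaks = 20, plot = FALSE)
j <- hist(k, breaks = 20, plot = FALSE)
fit_g <- scaled_normal(g, h)
fit_k <- scaled_normal(k, j)

xlim <- range(h$breaks, j$breaks)
ylim <- c(0, max(h$counts, j$counts, fit_g$y, fit_k$y))

# Second pass: draw both histograms and both curves
plot(h, col = adjustcolor("red", alpha.f = 0.2), xlim = xlim, ylim = ylim,
     xlab = "Average Feret Diameter", main = "Bead diameters")
plot(j, col = adjustcolor("blue", alpha.f = 0.3), add = TRUE)
lines(fit_g$x, fit_g$y, col = "red", lwd = 2)
lines(fit_k$x, fit_k$y, col = "blue", lwd = 2)
```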

How to add confidence intervals to 3D surface?

I have a 40x40 matrix of values obtained by interpolation with the akima package, which I use to create a 3D surface.
I estimated 95% CIs using Monte Carlo simulations from the predicted values, and now I want to add them for year 0 to my 3D graph.
I'm doing something wrong, and I don't understand how to plot vertical lines to represent the CIs.
My lines look like this:
And I want the CIs to look like those in this image:
Here are my data (a Dropbox link, because they are longer than the space allowed here): https://www.dropbox.com/s/c6iyd2r00k5jbws/data.rtf?dl=0
and my code:
persp(xyz,theta = 45, phi = 25,border="grey40", ticktype = "detailed", zlim=c(0,.8))->res2
y.bin <- rep(1,25)
x.bin <- seq(-10,10,length.out = 25)
points (trans3d(x.bin, y.bin, z = y0, pmat = res2), col = 1, lwd=2)
lines (trans3d(x.bin, y.bin, z = LCI, pmat = res2), col = 1, lwd=2)
lines (trans3d(x.bin, y.bin, z = UCI, pmat = res2), col = 1, lwd=2)
The problem is that lines() joins all the lower CI values into one continuous line and all the upper CI values into another. If you instead loop over the points and, at each one, draw a line between the lower and upper values, the plot looks closer to what you want. Note that, in the example, many of the y0 points do not lie on their 3D intervals.
# data from link is imported
persp(xyz,theta = 45, phi = 25,border="grey40", ticktype = "detailed", zlim=c(0,.8))->res2
y.bin <- rep(1,25)
x.bin <- seq(-10,10,length.out = 25)
# y0 points
points (trans3d(x.bin, y.bin, z = y0, pmat = res2), col = 1, lwd=2)
# lines between upper and lower CIs for each location
for (i in seq_along(LCI)) {
lines (trans3d(rep(x.bin[i],2), rep(y.bin[i],2), z = c(LCI[i],UCI[i]), pmat = res2), col = 1, lwd=2)
}
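The loop can also be replaced by a single vectorised call: trans3d() returns the projected 2D coordinates as a list with x and y components, so segments() can join each lower point to its upper point in one go. The sketch below uses a synthetic surface and made-up y0, LCI and UCI values (all hypothetical), because the original data are only available through the link; with the real data, only the last four lines are needed.

```r
# Synthetic 40x40 surface standing in for the akima interpolation (hypothetical)
x <- seq(-10, 10, length.out = 40)
y <- seq(0, 10, length.out = 40)
z <- outer(x, y, function(a, b) 0.4 + 0.3 * exp(-a^2 / 50) * exp(-b / 5))

res <- persp(x, y, z, theta = 45, phi = 25, border = "grey40",
             ticktype = "detailed", zlim = c(0, 0.8))

# Hypothetical point estimates and CIs along the slice y = 1
x.bin <- seq(-10, 10, length.out = 25)
y.bin <- rep(1, 25)
y0  <- 0.4 + 0.3 * exp(-x.bin^2 / 50) * exp(-1 / 5)
LCI <- y0 - 0.05
UCI <- y0 + 0.05

# Project lower and upper ends to 2D, then draw every CI with one call
lo <- trans3d(x.bin, y.bin, LCI, pmat = res)
hi <- trans3d(x.bin, y.bin, UCI, pmat = res)
points(trans3d(x.bin, y.bin, y0, pmat = res), pch = 16)
segments(lo$x, lo$y, hi$x, hi$y, lwd = 2)
```

Using arrows(..., code = 3, angle = 90, length = 0.03) instead of segments() would add flat caps, as in the first answer on this page.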

Plot normal, left and right skewed distribution in R

I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not tied to the normal distribution, I suggest using the beta distribution, which can be symmetric, right-skewed or left-skewed depending on its shape parameters.
hist(rbeta(10000,5,2)) # left-skewed (long tail to the left)
hist(rbeta(10000,2,5)) # right-skewed (long tail to the right)
hist(rbeta(10000,5,5)) # symmetric
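To show all three shapes side by side, the sketch below draws each beta sample with its exact density overlaid and returns a sample skewness, which should be negative for Beta(5, 2), near zero for Beta(5, 5) and positive for Beta(2, 5). The skewness() helper is a simple moment estimator written here for illustration; it is not a base R function.

```r
# Hypothetical helper: sample skewness as the third standardised moment
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

set.seed(99)
shapes <- list("Left skewed (5, 2)"  = c(5, 2),
               "Symmetric (5, 5)"    = c(5, 5),
               "Right skewed (2, 5)" = c(2, 5))

par(mfrow = c(1, 3))
skews <- sapply(names(shapes), function(label) {
  s <- shapes[[label]]
  draws <- rbeta(10000, s[1], s[2])
  hist(draws, breaks = 40, probability = TRUE, col = "lightblue",
       border = "white", main = label, xlab = "x")
  # Exact beta density on top of the sample histogram
  curve(dbeta(x, s[1], s[2]), from = 0, to = 1, add = TRUE, lwd = 2)
  skewness(draws)
})
par(mfrow = c(1, 1))
```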
I finally got it working with help from both of you, although I was mainly relying on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=TRUE, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=F,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="posterior distribution of p")
Just use the fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
Here mean is the location parameter, sd the scale parameter and xi the skewness parameter.
Examples
library(fGarch)
## snorm -
# Random numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
