Boxplots with 95% Confidence Intervals in R

I'm trying to generate boxplots in R that display the 95% confidence intervals of the mean but I can't find any way to display this statistic. I typically use ggplot2 for data visualisation in R but I'm open to using another package if necessary. Does anyone have any suggestions on how to do this? Thanks.

Here is an idea, assuming a normal distribution:
set.seed(123)
a = cumsum(rnorm(100))
n=length(a)
mm=mean(a)
dd=sd(a)
error <- qnorm(0.975)*dd/sqrt(n)
inf <- mm-error
sup <- mm+error
boxplot(a,col=3)
lines(c(0.75,1.25),c(inf,inf),col=4)
lines(c(0.75,1.25),c(mm,mm),col=2,lwd=2)
lines(c(0.75,1.25),c(sup,sup),col=4)
legend("topleft", c("95% CI", "Mean"), lty=1,col = c(4, 2),bty ="n")
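A variation on the same idea, as a minimal sketch: the normal quantile is swapped for a t quantile, which gives a slightly wider interval and matters more for small samples. The helper name mean_ci is made up for this illustration and is not part of any package.

```r
# Hypothetical helper (not from any package): t-based CI of the mean
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)
  # Half-width uses the t distribution with n - 1 degrees of freedom
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(lower = m - half, mean = m, upper = m + half)
}

set.seed(123)
a <- cumsum(rnorm(100))   # same data as in the answer above
ci <- mean_ci(a)

boxplot(a, col = 3)
# One horizontal segment each for the lower limit, the mean and the upper limit
segments(0.75, ci, 1.25, ci, col = c(4, 2, 4), lwd = c(1, 2, 1))
legend("topleft", c("95% CI", "Mean"), lty = 1, col = c(4, 2), bty = "n")
```

The limits should agree with t.test(a)$conf.int, which is a quick way to check the helper.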

Related

abline() is not working with weighted.hist()

I used the plotrix library to plot a histogram with weights. The histogram shows up as expected, but when I try to plot the mean as a vertical line, it does not show up at all.
Here's a snippet of my code:
library("plotrix")
library("zoom")
vals = seq.int(from = 52.5 , to = 97.5 , by = 5)
weights <- c(18.01,18.26,16.42,14.07,11.67,9.19,6.46,3.85,1.71,0.34)/100
mean <- sum(vals*weights)
wh <- weighted.hist(x = vals , w = weights , freq = FALSE)
abline(v = mean)
abline() seems to work only with the normal hist() function.
I am sorry if the question sounds stupid. I am an R newbie, but I did my research and could not find any helpful info.
Thanks in advance.
You should provide a sample of your data. Your calculation of the weighted mean is only correct if your weights sum to 1. If they do not, you should use weighted.mean(vals, weights) or sum(vals * weights/sum(weights)). The following example is slightly modified from the one on the weighted.hist manual page (help(weighted.hist)):
vals <- sample(1:10, 300, TRUE)
weights <- (101:400)/100
weighted.hist(vals, weights, breaks=1:10, main="Test weighted histogram")
(mean <- weighted.mean(vals, weights))
# [1] 5.246374
The histogram starts at 1, but this is 0 on the x-axis coordinates so we need to subtract 1 to get the line in the right place:
abline(v=mean-1, col="red")
Using your data, we need to identify the first boundary to adjust the mean so it plots in the correct location:
wh$breaks[1]
# [1] 52.5
abline(v=mean - wh$breaks[1], col="red")
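To make the point about weights concrete, here is a small check using the values from the question. The weights there add up to 0.9998 rather than 1, so the plain sum is slightly too low. The variable names naive and wmean are only for this illustration.

```r
vals <- seq(52.5, 97.5, by = 5)
weights <- c(18.01, 18.26, 16.42, 14.07, 11.67, 9.19, 6.46, 3.85, 1.71, 0.34) / 100

sum(weights)                            # not exactly 1
naive <- sum(vals * weights)            # correct only if the weights sum to 1
wmean <- weighted.mean(vals, weights)   # normalises by sum(weights)

# The two differ by exactly the factor sum(weights)
c(naive = naive, weighted = wmean)
```

Using a name like wmean instead of mean also avoids masking the base function mean().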

Setting custom LCL and UCL limits with qcc (Rstudio)

Perhaps this is an easy question, but I am quite new to R and am struggling to define custom UCL and LCL limits in xBar control charts. In production we already have set tolerances that must be met, and I would like to set the limits (LCL and UCL) according to those tolerances, but I do not know how.
Here is a simple example to illustrate:
library(qcc)
data(pistonrings)
diameter <- pistonrings$diameter
q1 <- qcc(diameter, type = "xbar.one", plot = TRUE)
This creates the xBar chart, with the two limits computed from the measurements and the confidence interval. I would like to set them as follows (just as an example) and compute the results according to these values:
LCL: 73.99
UCL: 74.02
Is it possible?
I fixed the issue. It was enough to specify the limits in the qcc() call:
q1 <- qcc(diameter, type = "xbar.one", plot = TRUE, limits = c(73.99,74.02))
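Independently of qcc, a quick base R check of which measurements fall outside fixed tolerance limits might look like the sketch below. The data are simulated stand-ins for pistonrings$diameter (an assumption, so that the snippet runs without the qcc package), and the names lcl, ucl and out are only for illustration.

```r
set.seed(1)
# Simulated measurements standing in for pistonrings$diameter (assumption)
diameter <- rnorm(200, mean = 74.003, sd = 0.01)

lcl <- 73.99   # lower tolerance limit from the question
ucl <- 74.02   # upper tolerance limit from the question

# TRUE for every observation outside the fixed limits
out <- diameter < lcl | diameter > ucl

plot(diameter, type = "b", ylim = range(c(diameter, lcl, ucl)),
     xlab = "Observation", ylab = "Diameter")
abline(h = c(lcl, ucl), lty = 2, col = "red")
points(which(out), diameter[out], pch = 19, col = "red")
```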

Smooth curve through points and include the origin in R

I am a beginner in R and started with graphics recently.
I have managed to program a working empirical cumulative distribution function (written by hand, not using the standard ecdf() function) and to generate a plot. However, the plot is not as it should be: there are two issues with it and I am not sure how to solve them (I have done my 'research' but have not found a solution).
This is my code:
set.seed(1)
n = 50
x = rpois(n, 2.2)
cdf = function(x,n)
{
  v=c()
  for(z in 1:max(x))
  {
    a = length(x[x<=z])/n
    v = c(v, a)
  }
  plot(v,type="l", main="empirical cumulative distribution function", xlab="x", ylab="cumulative probability", xlim=c(0,6), ylim=c(0,1.0))
}
cdf(x, n)
There are two issues with this plot:
The lines are straight but it should be a smooth curve through all points.
The origin is not included (now the curve starts at x = 1).
How can these issues be resolved in an elegant way?
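Try the following spline interpolator. Note that v is local to cdf(), so either have the function return v or put this call inside the function in place of the existing plot():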
plot(spline(c(0, v)), type = "l")
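A slightly fuller sketch of the same approach, under a few assumptions: the function returns its values instead of plotting, the evaluation points start at 0 rather than 1, and the spline is fitted against those points so the x-axis is not shifted. The name ecdf_values is made up for this example. Since x is Poisson, some observations are 0, so the curve starts at x = 0 with a positive probability rather than at the origin itself.

```r
set.seed(1)
n <- 50
x <- rpois(n, 2.2)

# Hypothetical rewrite of cdf(): returns the values instead of plotting them
ecdf_values <- function(x) {
  z <- 0:max(x)                                 # start at 0, not 1
  p <- sapply(z, function(k) mean(x <= k))      # proportion of values <= k
  data.frame(z = z, p = p)
}

v <- ecdf_values(x)

# Interpolate against the actual x positions; the default cubic spline
# may overshoot slightly between points
sm <- spline(v$z, v$p, n = 200)

plot(sm, type = "l", main = "empirical cumulative distribution function",
     xlab = "x", ylab = "cumulative probability", ylim = c(0, 1))
points(v$z, v$p)
```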

rarecurve() plotted with Standard Error

Does rarecurve() (vegan) accept standard error for plotting?
If so, how can I plot such a curve?
I am following a classical script for this, with the BCI dataset:
S <- specnumber(BCI)
(raremax <- min(rowSums(BCI)))
Srare <- rarefy(BCI, raremax)
plot(S, Srare, xlab = "Observed No. of Species", ylab = "Rarefied No. of Species")
abline(0, 1)
rarecurve(BCI, step = 20, sample = raremax, col = "blue", cex = 0.6)
Statistically speaking, providing a function like this would be helpful to most vegan users.
Thank you!
André
rarecurve does not give you SE. The reason is obvious and was already given to you: there is enough clutter without extra curves. If you really want to do this, you must do it manually. That is not too complicated, because the rarefy function accepts a vector of sample sizes and gives you all the numbers you need. The following draws a basic plot using one site of the Barro Colorado data set:
library(vegan)
data(BCI)
sum(BCI[1,]) # site 1, 448 tree stems
N <- seq(2, 448, by=8)
S <- rarefy(BCI[1,], N, se = TRUE)
plot(N, S[1,], type="l", lwd=3)
lines(N, S[1,] + 2*S[2,]) ## 2*SE is good enough for 95% CI
lines(N, S[1,] - 2*S[2,])
Statistically speaking, this gives you only the error caused by the subsampling process assuming that the observed data have no random variation. To me this makes little sense, and I find the rarefaction SE's misleading and meaningless. That does not stop me providing them in vegan.

R superimposing bivariate normal density (ellipses) on scatter plot

There are similar questions on the website, but I could not find an answer to this seemingly very simple problem. I fit a mixture of two gaussians on the Old Faithful Dataset:
if(!require("mixtools")) { install.packages("mixtools"); require("mixtools") }
data_f <- faithful
plot(data_f$waiting, data_f$eruptions)
data_f.k2 = mvnormalmixEM(as.matrix(data_f), k=2, maxit=100, epsilon=0.01)
data_f.k2$mu # estimated mean coordinates for the 2 multivariate Gaussians
data_f.k2$sigma # estimated covariance matrix
I simply want to superimpose two ellipses for the two Gaussian components of the model described by the mean vectors data_f.k2$mu and the covariance matrices data_f.k2$sigma. To get something like:
For those interested, here is the MATLAB solution that created the plot above.
If you are interested in the colors as well, you can use the posterior to get the appropriate groups. I did it with ggplot2, but first I show the colored solution using @Julian's code.
# group data for coloring
data_f$group <- factor(apply(data_f.k2$posterior, 1, which.max))
# plotting
plot(data_f$eruptions, data_f$waiting, col = data_f$group)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]], col=i)
And for my version using ggplot2.
# needs ggplot2 package
require("ggplot2")
# ellipsis data
ell <- cbind(data.frame(group=factor(rep(1:length(data_f.k2$mu), each=250))),
             do.call(rbind, mapply(ellipse, data_f.k2$mu, data_f.k2$sigma,
                                   npoints=250, SIMPLIFY=FALSE)))
# plotting command
p <- ggplot(data_f, aes(color=group)) +
  geom_point(aes(waiting, eruptions)) +
  geom_path(data=ell, aes(x=`2`, y=`1`)) +
  theme_bw(base_size=16)
print(p)
You can use the ellipse function from the mixtools package. The initial problem was that this function swaps x and y relative to your plot. I'll try to figure this out and update the answer. (I'll leave the colors to somebody else...)
plot( data_f$eruptions,data_f$waiting)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]])
Using mixtools internal plotting function:
plot.mixEM(data_f.k2, whichplots=2)
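For anyone who wants to see what such an ellipse is, here is a minimal base R sketch that computes the contour directly from a mean vector and a covariance matrix. The helper ellipse_points is made up for this example and is not the mixtools function. The two groups are formed with a crude threshold on eruptions as a stand-in for the EM posterior (an assumption, so that the snippet runs without mixtools).

```r
# Hypothetical helper: points on the (1 - alpha) contour of a bivariate normal
ellipse_points <- function(mu, sigma, alpha = 0.05, npoints = 250) {
  theta <- seq(0, 2 * pi, length.out = npoints)
  circle <- rbind(cos(theta), sin(theta))      # unit circle, 2 x npoints
  r <- sqrt(qchisq(1 - alpha, df = 2))         # contour radius in Mahalanobis units
  L <- t(chol(sigma))                          # sigma = L %*% t(L)
  t(mu + r * L %*% circle)                     # npoints x 2 matrix of (x, y)
}

data_f <- faithful
# Crude stand-in for the EM grouping (assumption): split on eruption length
grp <- ifelse(data_f$eruptions < 3, 1, 2)

plot(data_f$waiting, data_f$eruptions, col = grp,
     xlab = "waiting", ylab = "eruptions")
for (g in 1:2) {
  d <- as.matrix(data_f[grp == g, c("waiting", "eruptions")])
  lines(ellipse_points(colMeans(d), cov(d)), col = g)
}
```

Because the columns are taken in the order waiting, eruptions, the ellipse matches the axes of the plot and no swapping of x and y is needed.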
