Adding point of significance to histogram in R - r

Below is my code, which counts how many times the population mean falls within the confidence intervals of samples drawn from that population. Basically, I am hoping to show that 95% confidence intervals work.
rp<-function(x,s,n){ #x-population data, s-number of samples taken from population, n-size of samples
m<-mean(x)
ci.mat=NULL
tot=0
for(i in 1:s){
cix<-t.test(sample(x,n))$conf.int #obtain the confidence interval for one sample of x (repeated s times)
if(cix[1]<m & m<cix[2]){tot<-tot+1} #total number of confidence intervals containing pop mean
ci.mat<-rbind(ci.mat,cbind(cix[1],cix[2]))
}
par(mfrow=c(2,1))
hist(ci.mat[,1],main="Lower Limits for Sample Confidence Intervals",xlab="Lower Limit")
hist(ci.mat[,2],main="Upper Limits for Sample Confidence Intervals",xlab="Upper Limit")
return(data.frame(mean(x),tot/s))
}
I am hoping to add the population mean to my histograms, so I can show where the confidence intervals did not include the mean. So wherever the mean lies on the histogram with the lower limits, any values to the right of it would be part of a confidence interval that did not include the mean. I have no experience with modifying plots in R, so I don't even know if this is possible. Thanks for any help!
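One possible way to mark the population mean is a vertical line drawn with abline(v = ...) right after each hist() call. The following is only a minimal sketch, not taken from the thread: the population x, the sample size of 30 and the 1000 repetitions are made-up values for illustration.

```r
# Hypothetical population; the values are assumptions for illustration only
set.seed(42)
x <- rnorm(10000, mean = 50, sd = 10)
m <- mean(x)

# One row per sample: lower and upper limit of the 95% confidence interval
ci.mat <- t(replicate(1000, as.numeric(t.test(sample(x, 30))$conf.int)))

par(mfrow = c(2, 1))
hist(ci.mat[, 1], main = "Lower Limits for Sample Confidence Intervals",
     xlab = "Lower Limit")
abline(v = m, col = "red", lwd = 2)   # lower limits to the right of this line miss the mean
hist(ci.mat[, 2], main = "Upper Limits for Sample Confidence Intervals",
     xlab = "Upper Limit")
abline(v = m, col = "red", lwd = 2)   # upper limits to the left of this line miss the mean

# Share of intervals that do not contain the population mean
missed <- mean(ci.mat[, 1] > m | ci.mat[, 2] < m)
```

Inside rp() the same two abline(v = m, ...) calls could go directly after the existing hist() calls, since m is already computed there.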

Related

Calculate and Output of Crude and Adjusted Odds Ratios and CI

I'm new to using R. Most of my past experience is with the CDC's EpiINFO package.
I've been involved in numerous medical studies wherein we generate tables showing both the crude odds ratio and an adjusted odds ratio for 3 to 10 negative outcomes, comparing a control group and two or more other groups. Typically, I would already have a table with a row for each outcome and a set of columns for each group. In each row, I would have the total of all members of each group, the number who experienced the negative outcome, the number that did not have the negative outcome. From these I can calculate the unadjusted odds ratio, even in Excel.
How do I use R to calculate an adjusted odds ratio (and 95% confidence intervals), adjusting for age, race, state of residence, etc.?
Ideally, I'd like a function that would produce an output in a single line that I could cut and paste into my Excel sheet that would show these eight values:
Crude OR, Lower 95% CI, Upper 95% CI, p, Adjusted OR, Lower 95% CI, Upper 95% CI, p
Does such a function or package exist?
Thanks!
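A common route is logistic regression with glm(): the exponentiated coefficient of the group term is the odds ratio, crude when the group is the only predictor and adjusted when covariates are added. The following is a minimal sketch under stated assumptions: the data frame dat, its columns and the helper or_row are hypothetical, the data is simulated, and the intervals are Wald intervals from confint.default().

```r
# Simulated, hypothetical data: one row per subject
set.seed(1)
n <- 500
dat <- data.frame(
  group = factor(sample(c("control", "treated"), n, replace = TRUE)),
  age   = round(runif(n, 20, 80))
)
lp <- -2 + 0.5 * (dat$group == "treated") + 0.02 * dat$age
dat$outcome <- rbinom(n, 1, plogis(lp))

# Hypothetical helper: odds ratio, Wald 95% CI and p-value for one model term
or_row <- function(fit, term) {
  est <- coef(summary(fit))[term, ]
  ci  <- confint.default(fit)[term, ]
  c(OR = exp(est[["Estimate"]]), lower = exp(ci[[1]]),
    upper = exp(ci[[2]]), p = est[["Pr(>|z|)"]])
}

crude    <- glm(outcome ~ group, family = binomial, data = dat)
adjusted <- glm(outcome ~ group + age, family = binomial, data = dat)

res <- c(crude = or_row(crude, "grouptreated"),
         adjusted = or_row(adjusted, "grouptreated"))

# One tab-separated line (plus a header) that can be pasted into Excel
write.table(t(round(res, 3)), sep = "\t", quote = FALSE, row.names = FALSE)
```

Further covariates such as race or state of residence would be added to the adjusted formula in the same way as age.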

Setting the confidence interval in acf plots

I plotted the autocorrelation function of a given set of residuals I obtained from estimating a linear regression model:
> require("stats")
> acf(Reg$residuals)
It resulted in the following graphic:
I then wanted to look up what kind of confidence interval (95%, 99%) is displayed, but there is no information on that within the help section of the function. In addition to that I could not find a way to adjust the confidence interval manually.
Is there a way to manually set the confidence interval displayed?
See ?plot.acf:
plot(x, ci = 0.95, ...)
and:
ci: coverage probability for confidence interval. Plotting of the confidence interval is suppressed if ci is zero or negative.
That is, the default is 95% confidence intervals, and e.g.:
plot(acf(Reg$residuals, plot = FALSE), ci = 0.99)
should plot the 99% confidence intervals.

confidence interval for population proportion in R

How do I calculate a 95% confidence interval for a nominal variable? For instance, I have a sample size of 987 (300 smokers, 687 non-smokers). I need to calculate the 95% confidence interval for the percentage of smokers in the population.
Do read the Wikipedia link provided by Otto. An easy to use implementation in R can be found in the Hmisc package.
library(Hmisc)
binconf(300, 987, alpha = 0.05) # 300 smokers out of 987
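If installing Hmisc is not an option, base R offers comparable intervals. This is a minimal sketch using the counts from the question; the variable names are made up for illustration.

```r
smokers <- 300
n <- 987

# Wilson score interval with continuity correction
ci.wilson <- prop.test(smokers, n, conf.level = 0.95)$conf.int

# Exact (Clopper-Pearson) interval
ci.exact <- binom.test(smokers, n, conf.level = 0.95)$conf.int

ci.wilson
ci.exact
```

Both intervals are for the proportion of smokers; multiply by 100 to report percentages.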

obtaining quantiles from complete gaussian fit of data in R

I have been struggling with how R calculates quantiles and the normal fitting of data.
I have data (NDVI values) that follows a truncated normal distribution (see figure)
I am interested in getting the lowest 10th percentile value (p=0.1) from the data and from the fitting normal distribution curve.
As I understand it, because the data is truncated, the two should be quite different: I expect the quantile from the data to be higher than the one calculated from the normal distribution, but this is not so. From what I understand of the quantile help page, the quantile from the data should come from the default quantile function:
q=quantile(y, p=0.1)
while the quantile from the normal distribution is :
qx=quantile(y, p=0.1, type=9)
However, the two results are very close in all cases, which makes me wonder what type of distribution R fits to the data to calculate the quantile (a truncated normal distribution?).
I have also tried to calculate the quantile based on the fitted normal curve (using fitdist from the fitdistrplus package):
library(fitdistrplus)
fitted=fitdist(as.numeric(y), "norm", discrete = T)
fit.q=as.numeric(quantile(fitted, p=0.1)[[1]][1])
but I see no difference.
So my questions are:
To what curve does R fit the data for calculating quantiles, in particular for type=9 ? How can I calculate the quantile based on the complete normal distribution (including the lower tail)?
I don't know how to generate a reproducible example for this, but the data is available at https://dl.dropboxusercontent.com/u/26249349/data.csv
Thanks!
R is using the empirical ordering of the data when determining quantiles, rather than assuming any particular distribution.
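To illustrate the point, here is a small sketch on made-up, clearly non-normal data (the vector y below is simulated, not the NDVI data). The type argument only changes how quantile() interpolates between neighbouring order statistics; type = 9 uses a rule tuned for normally distributed data, but it does not fit a normal curve.

```r
set.seed(1)
y <- rexp(1000)   # hypothetical skewed data

q7 <- quantile(y, probs = 0.1)             # default (type 7)
q9 <- quantile(y, probs = 0.1, type = 9)   # different interpolation rule, same data
qn <- qnorm(0.1, mean(y), sd(y))           # 10th percentile of a fitted normal

c(type7 = unname(q7), type9 = unname(q9), fitted.normal = qn)
```

The two empirical quantiles are nearly identical, while the quantile of the fitted normal distribution lands somewhere else entirely.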
The 10th percentile for your truncated data and a normal distribution fit to your data happen to be pretty close, although the 1st percentile is quite a bit different. For example:
# Load the fitdistrplus package (provides fitdist) and the data
library(fitdistrplus)
df = read.csv("data.csv", header=TRUE, stringsAsFactors=FALSE)
# Fit a normal distribution to the data
df.dist = fitdist(df$x, "norm", discrete = T)
Now let's get quantiles of the fitted distribution and the original data. I've included the 1st percentile in addition to the 10th percentile. You can see that the fitted normal distribution's 10th percentile is just a bit lower than that of the data. However, the 1st percentile of the fitted normal distribution is much lower.
quantile(df.dist, p=c(0.01, 0.1))
Estimated quantiles for each specified probability (non-censored data)
p=0.01 p=0.1
estimate 1632.829 2459.039
quantile(df$x, p=c(0.01, 0.1))
1% 10%
2064.79 2469.90
quantile(df$x, p=c(0.01, 0.1), type=9)
1% 10%
2064.177 2469.400
You can also see this by direct ranking of the data and by getting the 1st and 10th percentiles of a normal distribution with mean and sd equal to the fitted values from fitdist:
# 1st and 10th percentiles of data by direct ranking
df$x[order(df$x)][round(c(0.01,0.1)*5780)]
[1] 2064 2469
# 1st and 10th percentiles of fitted distribution
qnorm(c(0.01,0.1), df.dist$estimate[1], df.dist$estimate[2])
[1] 1632.829 2459.039
Let's plot histograms of the original data (blue) and of fake data generated from the fitted normal distribution (red). The area of overlap is purple.
# Histogram of data (blue)
hist(df$x, xlim=c(0,8000), ylim=c(0,1600), col="#0000FF80")
# Overlay histogram of random draws from fitted normal distribution (red)
set.seed(685)
x.fit = rnorm(length(df$x), df.dist$estimate[1], df.dist$estimate[2])
hist(x.fit, add=TRUE, col="#FF000080")
Or we can plot the empirical cumulative distribution function (ecdf) for the data (blue) and the random draws from the fitted normal distribution (red). The horizontal grey line marks the 10th percentile:
plot(ecdf(df$x), xlim=c(0,8000), col="blue")
lines(ecdf(x.fit), col="red")
abline(0.1,0, col="grey40", lwd=2, lty="11")
Now that I've gone through this, I'm wondering if you were expecting fitdist to return the parameters of the normal distribution we would have gotten had your data really come from a normal distribution and not been truncated. Rather, fitdist returns a normal distribution with the mean and sd of the (truncated) data at hand, so the distribution returned by fitdist is shifted to the right compared to where we might have "expected" it to be.
c(mean=mean(df$x), sd=sd(df$x))
mean sd
3472.4708 790.8538
df.dist$estimate
mean sd
3472.4708 790.7853
Or, another quick example: x is normally distributed with mean ~ 0 and sd ~ 1. xtrunc removes all values less than -1, and xtrunc.dist is the output of fitdist on xtrunc:
set.seed(55)
x = rnorm(6000)
xtrunc = x[x > -1]
xtrunc.dist = fitdist(xtrunc, "norm")
round(cbind(sapply(list(x=x,xtrunc=xtrunc), function(x) c(mean=mean(x),sd=sd(x))),
xtrunc.dist=xtrunc.dist$estimate),3)
x xtrunc xtrunc.dist
mean -0.007 0.275 0.275
sd 1.009 0.806 0.806
And you can see in the ecdf plot below that the truncated data and the normal distribution fitted to the truncated data have about the same 10th percentile, while the 10th percentile of the untruncated data is (as we would expect) shifted to the left.
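If the goal is the 10th percentile of the complete (untruncated) normal distribution, one option is to fit a truncated normal by maximum likelihood and then use qnorm with the estimated parameters. The following is only a sketch under a strong assumption: the truncation point a is known (here -1, as in the example above). The function negloglik and the other names are hypothetical, and optim is used instead of fitdist.

```r
set.seed(55)
x <- rnorm(6000)
a <- -1                 # assumed known lower truncation point
xtrunc <- x[x > a]

# Negative log-likelihood of a normal distribution truncated below at a;
# the sd is optimised on the log scale so that it stays positive
negloglik <- function(par, x, a) {
  mu <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(x, mu, sigma, log = TRUE) -
         pnorm(a, mu, sigma, lower.tail = FALSE, log.p = TRUE))
}

start <- c(mean(xtrunc), log(sd(xtrunc)))
fit <- optim(start, negloglik, x = xtrunc, a = a)

mu.hat <- fit$par[1]
sigma.hat <- exp(fit$par[2])

q10.full  <- qnorm(0.1, mu.hat, sigma.hat)   # 10th percentile of the complete normal
q10.trunc <- quantile(xtrunc, probs = 0.1)   # empirical 10th percentile of truncated data
```

Here mu.hat and sigma.hat should land near the parameters of the untruncated x, so q10.full falls below the truncation point while q10.trunc cannot.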

Significance level of ACF and PACF in R

I want to obtain the limits that determine the significance of autocorrelation coefficients and partial autocorrelation coefficients, but I don't know how to do it.
I obtained the partial autocorrelogram using the function pacf(data). I want R to print the values indicated in the figure.
The limits that determine the significance of autocorrelation coefficients are: ±(exp(2*1.96/√(N-3)) - 1)/(exp(2*1.96/√(N-3)) + 1).
Here N is the length of the time series, and I used the 95% confidence level.
The correlation values that correspond to the m % confidence intervals chosen for the test are given by 0 ± i/√N where:
N is the length of the time series
i is the number of standard deviations we expect m % of the correlations to lie within under the null hypothesis that there is zero autocorrelation.
Since the observed correlations are assumed to be normally distributed:
i≈2 (more precisely 1.96) for a 95% confidence level (acf's default),
i≈2.58 for a 99% confidence level (i=3 corresponds to about 99.7%),
and so on as dictated by the properties of a Gaussian distribution
Figure A1, Page 1011 here provides a nice example of how the above principle applies in practice.
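The ± i/√N limits can be computed directly, with i taken from qnorm. This is a minimal sketch on simulated white noise (the series x is made up for illustration); as far as I know, the default limits drawn by plot.acf follow the same formula.

```r
set.seed(123)
x <- rnorm(200)                 # hypothetical series
r <- acf(x, plot = FALSE)
N <- r$n.used                   # number of observations used by acf

ci <- 0.95
limit <- qnorm((1 + ci) / 2) / sqrt(N)   # i / sqrt(N), with i = 1.96 for 95%

c(lower = -limit, upper = limit)

# Lags whose autocorrelation falls outside the limits (lag 0 excluded)
which(abs(r$acf[-1]) > limit)
```

The same limit applies to pacf(x, plot = FALSE), except that its output has no lag 0 entry to drop.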
After investigating the acf and pacf functions and the psychometric package with its CIz and CIr functions, I found this simple code to do the task:
Compute the confidence interval for Fisher's z:
ciz = c(-1,1)*(-qnorm((1-alpha)/2)/sqrt(N-3))
Here alpha is the confidence level (typically 0.95) and N is the number of observations.
Compute the confidence interval for r:
cir = (exp(2*ciz)-1)/(exp(2*ciz)+1)
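The two steps can be wrapped in a small function. This is a sketch only; the name fisher_limits is made up. Since (exp(2z)-1)/(exp(2z)+1) equals tanh(z), the back-transformation can be written with tanh.

```r
# Hypothetical helper: significance limits for r via Fisher's z
fisher_limits <- function(N, level = 0.95) {
  z <- qnorm((1 + level) / 2) / sqrt(N - 3)   # half-width on the z scale
  c(lower = tanh(-z), upper = tanh(z))        # back-transform to the r scale
}

lims <- fisher_limits(100)
lims
```

For long series these limits are very close to the ± 1.96/√N limits, because tanh(z) is approximately z for small z.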

Resources