Chi-square distribution in R

I am trying to fit a chi-square distribution using fitdistr() in R. The documentation is here (and not very useful to me): https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html
Question 1: chi_df below has the following output: 3.85546875 (0.07695236). What is the second number? The variance or the standard deviation?
Question 2: fitdistr estimates the parameter 'k' (the degrees of freedom) of the chi-square distribution. How do I fit the data so that I also get the scaling constant 'A'? I am crudely rescaling the fitted curve to the peak of the histogram at the end of the code below, which is obviously not good.
Question 3: Is the chi-square distribution only defined for a certain x-range? (The variance is 2k, while the mean is k. This must require some constrained x-range... A stats question, not a programming one...)
library(MASS); ## provides fitdistr()
nnn = 1000;
## Generating a chi-sq distribution
chii <- rchisq(nnn,4, ncp = 0);
## Plotting Histogram
chi_hist <- hist(chii);
## Fitting. Gives probability density which must be scaled.
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3));
chi_k <- chi_df[[1]][1];
## Plotting a fitted line:
## Spanning x-length of chi-sq data
x_chi_fit <- 1:nnn*((max(chi_hist[[1]][])-min(chi_hist[[1]][]))/nnn);
## Y data using eqn for probability function
y_chi_fit <- (1/(2^(chi_k/2)*gamma(chi_k/2)) * x_chi_fit^(chi_k/2-1) * exp(-x_chi_fit/2));
## Normalizing to the peak of the histogram
y_chi_fit <- y_chi_fit*(max(chi_hist[[2]][]/max(y_chi_fit)));
## Plotting the line
lines(x_chi_fit,y_chi_fit,lwd=2,col="green");
Thanks for your help!

As commented above, ?fitdistr says
An object of class ‘"fitdistr"’, a list with four components,
...
sd: the estimated standard errors,
... so the parenthesized number is the standard error of the estimated parameter.
The scale parameter doesn't need to be estimated; you need either to scale the density by the number of observations times the width of your histogram bins, or just use freq=FALSE when drawing your histogram. See code below.
The chi-squared distribution is defined on the non-negative reals, which makes sense since it's the distribution of a sum of squared standard Normal variables (this is a statistical, not a programming question).
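A short simulation can illustrate why the support is non-negative and why the mean and variance are k and 2k. This is only a sketch: the seed, sample size and variable names are arbitrary choices of mine, not part of the question.

```r
## Sketch: a chi-squared variable with k degrees of freedom is a sum of
## k squared standard normals, so it can never be negative.
set.seed(1)                          # arbitrary seed, for reproducibility only
k <- 4
n <- 1e5
z <- matrix(rnorm(n * k), ncol = k)  # n rows of k standard normal draws
chisq_sim <- rowSums(z^2)            # each row sum is one chi-squared(k) draw

min(chisq_sim) >= 0                  # support is [0, Inf)
mean(chisq_sim)                      # should be close to k
var(chisq_sim)                       # should be close to 2 * k
```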
Set up data:
nnn <- 1000
## ensure reproducibility; not a big deal in this case,
## but good practice
set.seed(101)
## Generating a chi-sq distribution
chii <- rchisq(nnn,4, ncp = 0)
Fitting.
library(MASS)
## use method="Brent" based on warning
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3),
method="Brent",lower=0.1,upper=100)
chi_k <- chi_df[[1]][1]
(For what it's worth, it looks like there might be a bug in the print method for fitdistr when method="Brent" is used. You could also use method="BFGS" and wouldn't need to specify bounds ...)
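If you want the standard error as a number rather than reading it off the printed output, the estimate and sd components of the fitted object can be used directly. The following is a minimal sketch assuming the same simulated data as above; the 95% interval is a normal-approximation (Wald) interval, which is my own addition and only a rough guide.

```r
library(MASS)

set.seed(101)
chii <- rchisq(1000, df = 4)
fit <- fitdistr(chii, "chi-squared", start = list(df = 3),
                method = "Brent", lower = 0.1, upper = 100)

k_hat <- fit$estimate[[1]]   # estimated degrees of freedom
k_se  <- fit$sd[[1]]         # its standard error (the number in parentheses)

## approximate 95% Wald interval (assumes the estimate is roughly normal)
k_ci <- k_hat + c(-1, 1) * qnorm(0.975) * k_se
k_ci
```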
Histograms
chi_hist <- hist(chii,breaks=50,col="gray")
## scale by N and width of histogram bins
curve(dchisq(x,df=chi_k)*nnn*diff(chi_hist$breaks)[1],
add=TRUE,col="green")
## or plot histogram already scaled to a density
chi_hist <- hist(chii,breaks=50,col="gray",freq=FALSE)
curve(dchisq(x,df=chi_k),add=TRUE,col="green")
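The reason no scaling constant has to be fitted is that histogram counts and histogram densities differ only by the factor N times bin width. A small check of that identity, as a sketch with names of my own choosing:

```r
set.seed(101)
n <- 1000
x <- rchisq(n, df = 4)

## compute the histogram without drawing it
h <- hist(x, breaks = 50, plot = FALSE)
bin_width <- diff(h$breaks)

## counts are the densities multiplied by N and the bin width
all.equal(h$counts, h$density * n * bin_width)

## the density-scaled histogram has total area 1
sum(h$density * bin_width)
```

So a density curve such as dchisq() can be overlaid on a freq=FALSE histogram as is, or multiplied by N and the bin width to match a count histogram.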

Related

Generate beta-binomial distribution from existing vector

Is it possible to/how can I generate a beta-binomial distribution from an existing vector?
My ultimate goal is to generate a beta-binomial distribution from the below data and then obtain the 95% confidence interval for this distribution.
My data are body condition scores recorded by a veterinarian. The values of body condition range from 0-5 in increments of 0.5. It has been suggested to me here that my data follow a beta-binomial distribution, discrete values with a restricted range.
set1 <- as.data.frame(c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5))
colnames(set1) <- "numbers"
I see that there are multiple functions which appear to be able to do this, betabinomial() in VGAM and rbetabinom() in emdbook, but my stats and coding knowledge is not yet sufficient to be able to understand and implement the instructions provided on the function help pages, at least not in a way that has been helpful for my intended purpose yet.
We can look at the distribution of your values; the y-axis is the probability:
x1 = set1$numbers*2
h = hist(x1,breaks=seq(0,10))
bp = barplot(h$counts/length(x1),names.arg=(h$mids+0.5)/2,ylim=c(0,0.35))
You can try to fit it, but you have too few data points to estimate the three parameters needed for a beta-binomial. Hence I fix the size at 10 and the probability so that the mean matches the mean of your scores, and estimate only theta. Looking at the distribution above, this seems OK:
library(bbmle)
library(emdbook)
library(MASS)
mtmp <- function(prob,size,theta) {
-sum(dbetabinom(x1,prob,size,theta,log=TRUE))
}
m0 <- mle2(mtmp,start=list(theta=100),
data=list(size=10,prob=mean(x1)/10),control=list(maxit=1000))
THETA=coef(m0)[1]
We can also use a normal distribution:
normal_fit = fitdistr(x1,"normal")
MEAN=normal_fit$estimate[1]
SD=normal_fit$estimate[2]
Plot both of them:
lines(bp[,1],dbetabinom(1:10,size=10,prob=mean(x1)/10,theta=THETA),
col="blue",lwd=2)
lines(bp[,1],dnorm(1:10,MEAN,SD),col="orange",lwd=2)
legend("topleft",c("normal","betabinomial"),fill=c("orange","blue"))
I think you are actually ok with using a normal estimation and in this case it will be:
normal_fit$estimate
mean sd
6.560000 1.134196
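Since the goal was a 95% interval, one rough way to get it from the normal approximation is to take the 2.5% and 97.5% quantiles of the fitted normal. This is a sketch in base R on the original 0-5 scale (not the doubled scale used above); it gives the central 95% range of the fitted distribution and ignores that the scores are discrete and bounded.

```r
scores <- c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5)

mu_hat <- mean(scores)
## maximum-likelihood sd (divides by n, as fitdistr(..., "normal") does)
sd_hat <- sqrt(mean((scores - mu_hat)^2))

## central 95% range of the fitted normal distribution
range95 <- qnorm(c(0.025, 0.975), mean = mu_hat, sd = sd_hat)
range95
```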

Simulate Values that follow a Distribution Curve in R

I want to simulate demand values that follow different distributions (as in the example above: starts off linear > exponential > invlog > etc.). I'm a bit confused by the notion of probability distributions but thought I could use rnorm, rexp, rlogis, etc. Is there any way I could do so?
I think it may be this but in R: Generating smoothed randoms that follow a distribution
Simulating random values from commonly-used probability distributions in R is fairly trivial using rnorm(), rexp(), etc, if you know what distribution you want to use, as well as its parameters. For example, rnorm(10, mean=5, sd=2) returns 10 draws from a normal distribution with mean 5 and sd 2.
rnorm(10, mean = 5, sd = 2)
## [1] 5.373151 7.970897 6.933788 5.455081 6.346129 5.767204 3.847219 7.477896 5.860069 6.154341
## or here's a histogram of 10000 draws...
hist(rnorm(10000, 5, 2))
You might be interested in an exponential distribution - check out hist(rexp(10000, rate=1)) to get the idea.
The easiest solution will be to investigate what probability distribution(s) you're interested in and their implementation in R.
It is still possible to return random draws from some custom function, and there are a few techniques out there for doing it - but it might get messy. Here's a VERY rough implementation of drawing randomly from probabilities defined by the region of x^3 - 3x^2 + 4 between zero and 3.
## first a vector of random uniform draws from the region
unifdraws <- runif(10000, 0, 3)
## assign a probability of "keeping" draws based on scaled probability
pkeep <- (unifdraws^3 - 3*unifdraws^2 + 4)/4
## randomly keep observations based on this probability
keep <- rbinom(10000, size=1, prob=pkeep)
draws <- unifdraws[keep==1]
## and there it is!
hist(draws)
## of course, it's less than 10000 now, because we rejected some
length(draws)
## [1] 4364
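The number of draws kept is not arbitrary: the expected acceptance rate is the area under the curve divided by the area of the bounding box (width 3, height 4, since the curve peaks at 4 on this interval). A sketch of that check, with a seed and names of my own choosing:

```r
f <- function(x) x^3 - 3 * x^2 + 4

## area under the curve on [0, 3], relative to the 3 x 4 bounding box
area <- integrate(f, lower = 0, upper = 3)$value
expected_rate <- area / (3 * 4)

## repeat the accept/reject step and compare
set.seed(42)                       # arbitrary seed
u <- runif(10000, 0, 3)
keep <- runif(10000) < f(u) / 4    # same rule as rbinom(..., prob = f(u) / 4)
observed_rate <- mean(keep)

c(expected = expected_rate, observed = observed_rate)
```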

Simulating a draw from the distribution of $X$ (in R)

I have a pdf $f(x)=4x^3$ of a random variable $X$, and I need to simulate a draw from this distribution.
My solution consists of finding the cdf from the pdf (1st issue):
> pdf <- function(x){4*x^3}
> cdf <- integrate(pdf,lower=0,upper=x)
Error in integrate(pdf, lower = 0, upper = x) : object 'x' not found
Once I get the cdf $F$, I will set $X=F^{-1}(U)$, where $U$ is uniform on $(0,1)$. I notice that the pdf is that of a Beta distribution with $\alpha=4$ and $\beta=1$.
Is it best to find $F^{-1}$ via an inverse beta function? Is there a quick way to find the inverse of a beta function in R?
Since you have identified your pdf as beta, just use rbeta to sample.
s1 <- rbeta(5000,4,1)
In the case where the distribution is non-standard and you cannot solve analytically, you can use rejection sampling. Let's pretend we don't know your pdf is beta and we don't know how to integrate or invert it.
pdf <- function(x) 4*x^3 # on [0,1]
First we draw from our proposal distribution
p <- runif(50000)
Calculate the density values under our pdf
dp <- pdf(p)
And randomly accept/reject in proportion
s2 <- p[runif(50000) < dp/max(dp)]
You should find the distributions of s1 and s2 comparable, using histograms or, preferably, a qqplot.
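For this particular pdf the inverse-cdf route from the question also works in closed form, since $F(x)=x^4$ on $[0,1]$ and so $F^{-1}(u)=u^{1/4}$. Here is a sketch of that approach; the function names are my own, and cdf_num only shows one way around the integrate() error in the question (the upper limit has to be a number, so it is wrapped in a function).

```r
## closed-form cdf and its inverse for f(x) = 4 x^3 on [0, 1]
cdf     <- function(x) x^4
inv_cdf <- function(u) u^(1/4)

## numerical cdf: integrate() needs a numeric upper limit
cdf_num <- function(x) {
  integrate(function(t) 4 * t^3, lower = 0, upper = x)$value
}

## inverse-transform sampling: push uniform draws through the inverse cdf
set.seed(1)                # arbitrary seed
u  <- runif(5000)
s3 <- inv_cdf(u)

## qbeta() is R's built-in inverse cdf for the beta distribution
all.equal(s3, qbeta(u, 4, 1))
```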

Fitting the Poisson distribution

I was unable to calculate the maximum likelihood estimator and BIC for the Poisson distribution. I was able to get the histogram but couldn't superimpose a kernel density estimate on it.
Can you please tell me where I went wrong?
x.pois<-rpois(Y1, 20)
hist(x.pois, breaks=100,freq=FALSE)
lines(density(Y1, bw=0.8), col="red")
library(MASS)
fitdistr(Y1,densfun="pois")
my.mle<-fitdistr(Y1, densfun="poison")
print(my.mle)
BIC(my.mle)
You need to (1) spell "poisson" correctly; (2) use x.pois (the Poisson sample), not Y1 (which should be the number of points you're trying to sample, based on your code example).
Note that kernel density estimates, and histograms, of discrete distributions don't necessarily make a lot of sense.
Y1 <- 100
set.seed(101) ## for reproducibility
x.pois<-rpois(Y1, 20)
hist(x.pois, breaks=100,freq=FALSE)
lines(density(x.pois, bw=0.8), col="red")
library(MASS)
(my.mle<-fitdistr(x.pois, densfun="poisson"))
## lambda
## 20.6700000
## ( 0.4546427)
BIC(my.mle)
## [1] 572.7861
Update: your other question makes it clear that Y1 really is your sample, in which case the whole rpois() sampling step is just a red herring. In that case you should just leave out the first three lines, and substitute Y1 for x.pois, in the code above.
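For the Poisson case the numbers can also be worked out by hand, which may help to see what fitdistr() and BIC() are doing: the maximum likelihood estimate of lambda is the sample mean, its asymptotic standard error is sqrt(lambda/n), and BIC is -2 times the log-likelihood plus log(n) per estimated parameter. A sketch in base R, with variable names of my own:

```r
set.seed(101)
x.pois <- rpois(100, 20)
n <- length(x.pois)

lambda_hat <- mean(x.pois)           # Poisson MLE is the sample mean
lambda_se  <- sqrt(lambda_hat / n)   # asymptotic standard error

## log-likelihood at the MLE, then BIC with one estimated parameter
loglik <- sum(dpois(x.pois, lambda_hat, log = TRUE))
bic <- -2 * loglik + log(n) * 1

c(lambda = lambda_hat, se = lambda_se, BIC = bic)
```

This is consistent with the output above: for lambda = 20.67 and n = 100, sqrt(20.67/100) is about 0.4546.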

Density of a Two-Piece Normal (or Split Normal) Distribution

Is there a density function for the two-piece Normal distribution on CRAN? I thought I would check before I code one. I have checked the distribution task view. It is not listed there. I have looked in a couple of likely packages, but to no avail.
Update: I have added dsplitnorm, psplitnorm, qsplitnorm and rsplitnorm functions to the fanplot package.
If you choose to construct your own version of the distribution, you might be interested in distr. It (and the related packages distrEx, distrSim, distrTEst, distrTeach and distrDoc) have been written to provide a unified interface for constructing new distributions from existing ones. (I constructed this example with the help of the wonderful vignette that accompanies the distrDoc package and which can be gotten by typing vignette("distr").)
This implements the split normal distribution, which may not be exactly what you are after. Using the distr toolset, though, it shouldn't be too hard to adjust this to fit your exact needs.
library(distr)
## Construct the distribution object.
## Here, it's a split normal distribution with mode=0, and lower- and
## upper-half standard deviations of 2 and 1, respectively.
splitNorm <- UnivarMixingDistribution(Truncate(Norm(0,2), upper=0),
Truncate(Norm(0,1), lower=0),
mixCoeff=c(0.5, 0.5))
## Construct its density function ...
dsplitNorm <- d(splitNorm)
## ... and a function for sampling random variates from it
rsplitNorm <- r(splitNorm)
## Compare the density it returns to that from dnorm()
dsplitNorm(-1)
# [1] 0.1760327
dnorm(-1, sd=2)
# [1] 0.1760327
## Sample and plot a million random variates from the distribution
x <- rsplitNorm(1e6)
hist(x, breaks=100, col="grey")
## Plot the distribution's continuous density
plot(splitNorm, to.draw.arg="d")
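Note that with equal mixing weights the density above jumps at the mode (dnorm(0, sd=2) on the left versus dnorm(0, sd=1) on the right). The two-piece normal as it is usually parameterized is continuous at the mode, which corresponds to weights sd1/(sd1+sd2) and sd2/(sd1+sd2). If that is the version you need, a base-R density is short to write. The following is a sketch only; dsplitnorm_sketch is a hypothetical helper of mine, not the fanplot or distr implementation.

```r
## Hypothetical helper: continuous two-piece normal density with
## standard deviation sd1 below the mode and sd2 above it.
dsplitnorm_sketch <- function(x, mode = 0, sd1 = 1, sd2 = 1) {
  A <- sqrt(2 / pi) / (sd1 + sd2)      # common normalizing constant
  s <- ifelse(x <= mode, sd1, sd2)     # pick the sd for each side
  A * exp(-(x - mode)^2 / (2 * s^2))
}

## probability mass on each side of the mode
left  <- integrate(dsplitnorm_sketch, -Inf, 0, sd1 = 2, sd2 = 1)$value
right <- integrate(dsplitnorm_sketch, 0, Inf, sd1 = 2, sd2 = 1)$value
c(left = left, right = right, total = left + right)
```

The left and right masses come out as sd1/(sd1+sd2) and sd2/(sd1+sd2), so passing those as mixCoeff in the distr construction above should give the same continuous density.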
