Trying to fit an F distribution to a vector in R

Would anyone know why the following code fails to execute fitdist, with the error "the function mle failed to estimate the parameters, with the error code 100"?
I have encountered this error in the past when working with the normal distribution; the solution in that case was to increase the variance of the vector (by multiplying it by, say, 100), but that does not help in this case. Please note all elements in the vector are positive. Thank you.
library(fitdistrplus)
VH <- c(0.36, 0.3, 0.36, 0.47, 0, 0.05, 0.4, 0, 0, 0.15, 0.89, 0.03, 0.45, 0.21, 0, 0.18, 0.04, 0.53, 0, 0.68, 0.06, 0.09, 0.58, 0.03, 0.23, 0.27, 0, 0.12, 0.12, 0, 0.32, 0.07, 0.04, 0.07, 0.39, 0, 0.25, 0.28, 0.42, 0.55, 0.04, 0.07, 0.18, 0.17, 0.06, 0.39, 0.65, 0.15, 0.1, 0.32, 0.52, 0.55, 0.71, 0.93, 0, 0.36)
f <- fitdist(na.exclude(VH),"f", start =list(df1=1, df2=2))

The error you get here is actually somewhat informative:
simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters
Error in fitdist(na.exclude(VH), "f", start = list(df1 = 1, df2 = 2)) :
the function mle failed to estimate the parameters,
with the error code 100
That means something went wrong right away, not in the middle of the optimization process.
Taking a guess, I looked and saw that there are zero values in your data (so your statement that all the elements are positive is not technically correct -- they're all non-negative ...). With these starting parameters the F distribution has an infinite density at 0: df(0,1,2) is Inf.
If I exclude the zero values, I get an answer ...
f <- fitdist(na.exclude(VH[VH>0]),"f", start =list(df1=1, df2=2))
... the estimated value for the second shape parameter is very large (approx. 6e6, with a big uncertainty), but seems to fit OK ...
par(las=1); hist(VH,freq=FALSE,col="gray")
curve(df(x,1.37,6.45e6),add=TRUE)
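The diagnosis above can be checked in base R without fitdistrplus. The sketch below uses a hypothetical toy vector (`toy`, not the asker's data) and a hand-written negative log-likelihood (`negloglik`, an illustrative helper, not the function fitdist uses internally) to show why an optimizer cannot start from df1=1, df2=2 when a zero is present.

```r
# Density of F(df1 = 1, df2 = 2) at zero and at a positive value
df(0, df1 = 1, df2 = 2)    # Inf
df(0.3, df1 = 1, df2 = 2)  # finite

# Hypothetical toy vector containing one zero (not the asker's data)
toy <- c(0, 0.05, 0.3, 0.47)

# Illustrative negative log-likelihood for the F distribution
negloglik <- function(par, obs) {
  -sum(log(df(obs, df1 = par[1], df2 = par[2])))
}

negloglik(c(1, 2), toy)           # -Inf: not a finite starting value
negloglik(c(1, 2), toy[toy > 0])  # finite once the zero is dropped
```

A non-finite objective at the starting point is consistent with the "function cannot be evaluated at initial parameters" message from optim.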

Related

How to plot truncated distributions (truncdist) with fitdistrplus?

I am attempting to plot goodness of fit curves to truncated distributions from the fitdistrplus package using its plot function.
library(fitdistrplus)
library(truncdist)
library(truncnorm)
dataNum <- c(433.6668, 413.0450, 435.9952, 449.7559, 457.3629, 498.6187, 598.0335, 637.5611, 644.9193, 634.4843, 620.8676, 590.6622, 581.6411, 572.5022, 594.0925, 587.7293, 608.4948, 626.7594, 599.0286, 611.2966, 572.1749, 545.0071, 490.0298, 478.8484, 458.8293, 437.4878, 467.7026, 477.4094, 467.4182, 519.3056, 599.0155, 648.8603, 623.0672, 606.3737, 552.3653, 558.7612, 553.1345, 549.5961, 546.0578, 565.4582, 562.6825, 606.6225, 578.1584, 572.6201, 546.4735, 514.8147, 479.4638, 462.7702, 430.3652, 452.9671)
If I use the library(truncnorm) to fit a truncated normal distribution, everything works fine.
fit.dataNormTrunc2 <- fitdist(dataNum, "truncnorm", fix.arg=list(a=min(dataNum)), start = list(mean = mean(dataNum), sd = sd(dataNum)))
plot(fit.dataNormTrunc2)
However, if I try to use the truncdist package, only the histogram comparison plot prints without any of the other plots (e.g. qq-plot). I also get an error:
Error in qtNorm(p = c(0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, :
unused argument (p = c(0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, 0.17, 0.19, 0.21, 0.23, 0.25, 0.27, 0.29, 0.31, 0.33, 0.35, 0.37, 0.39, 0.41, 0.43, 0.45, 0.47, 0.49, 0.51, 0.53, 0.55, 0.57, 0.59, 0.61, 0.63, 0.65, 0.67, 0.69, 0.71, 0.73, 0.75, 0.77, 0.79, 0.81, 0.83, 0.85, 0.87, 0.89, 0.91, 0.93, 0.95, 0.97, 0.99))
The code used is:
dtNorm <- function(x, mean, sd) {
dtrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
ptNorm <- function(x, mean, sd) {
ptrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
qtNorm <- function(x, mean, sd) {
qtrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
fit.dataNormTrunc <- fitdist(dataNum, "tNorm", start = c(mean=mean(dataNum), sd=sd(dataNum)))
plot(fit.dataNormTrunc)
I have also tried the truncdist approach with the lognormal function, and again the other 3 plots don't print out and I get the same error about the values not being used.
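The error message shows that the plotting code calls qtNorm with an argument named p, while the wrapper above names its first argument x. A minimal sketch of wrappers that follow the usual d/p/q naming convention (x, q, p) is given below. It is written in base R rather than with truncdist so that it is self-contained, and `lower_bound` is a stand-in for min(dataNum); both choices are assumptions for illustration.

```r
# Lower truncation point; stands in for min(dataNum) from the question
lower_bound <- 413.045

# Truncated-normal density: first argument named x
dtNorm <- function(x, mean, sd) {
  tail_mass <- pnorm(lower_bound, mean, sd, lower.tail = FALSE)
  ifelse(x >= lower_bound, dnorm(x, mean, sd) / tail_mass, 0)
}

# Truncated-normal distribution function: first argument named q
ptNorm <- function(q, mean, sd) {
  p_low <- pnorm(lower_bound, mean, sd)
  ifelse(q >= lower_bound, (pnorm(q, mean, sd) - p_low) / (1 - p_low), 0)
}

# Truncated-normal quantile function: first argument named p
qtNorm <- function(p, mean, sd) {
  p_low <- pnorm(lower_bound, mean, sd)
  qnorm(p_low + p * (1 - p_low), mean, sd)
}

# The quantile function can now be called with `p = ...`
qtNorm(p = c(0.25, 0.5, 0.75), mean = 540, sd = 70)
# Round trip through the distribution function
ptNorm(q = qtNorm(p = 0.5, mean = 540, sd = 70), mean = 540, sd = 70)
```

The same renaming (q for ptNorm, p for qtNorm) should carry over to the truncdist-based wrappers, after which fitdist(dataNum, "tNorm", ...) followed by plot() should be able to produce the remaining panels.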

Bayes Factor values in the R package BayesFactor

I've followed the instructions on how to run a Bayesian 't-test' using default priors in the BayesFactor package in R.
Some of the returned values are astronomical.
Here is an example comparison with a huge Bayes factor:
#install.packages('BayesFactor')
library(BayesFactor)
condition1 <- c(0.94, 0.9, 0.96, 0.74, 1, 0.98, 0.86, 0.92, 0.918367346938776,
0.96, 0.4, 0.816326530612245, 0.8, 0.836734693877551, 0.56, 0.66,
0.605263157894737, 0.836734693877551, 0.84, 0.9, 0.92, 0.714285714285714,
0.82, 0.5, 0.565217391304348, 0.8, 0.62)
condition2 <- c(0.34, 0.16, 0.23, 0.19, 0.71, 0.36, 0.02, 0.83, 0.11, 0.06,
0.27, 0.347368421052632, 0.21, 0.13953488372093, 0.11340206185567,
0.14, 0.142857142857143, 0.257731958762887, 0.15, 0.29, 0.67,
0.0515463917525773, 0.272727272727273, 0.0895522388059701, 0.0204081632653061,
0.13, 0.0612244897959184)
bf = ttestBF(x = condition1, condition2, paired = TRUE)
bf
This returns:
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 144035108289 ±0%
Against denominator:
Null, mu = 0
---
Bayes factor type: BFoneSample, JZS
For the most part the comparisons range from below 1 up to a few hundred. But I'm concerned that this value (144035108289!) is indicative of something erroneous on my part.
FYI: the p-value in the null-hypothesis test on the same data as above = 4.649279e-14.
Any assurances or insights into this returned BF would be much appreciated.
I calculated the BF using manual input of t-value and sample size like this using the same package:
exp(ttest.tstat(t=14.63, n1=27, rscale = 0.707)[['bf']])
It gives the same BF. This seems to be largely due to the very large t-value (14.63) for a sample of this size (27). The returned BF appears to be legitimate.
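As a further cross-check that does not depend on the package, the Bayes factor can be approximated by one-dimensional numerical integration. The sketch below assumes the JZS formula for a one-sample (or paired) t-test as given by Rouder et al. (2009), with the prior scale r applied to the effect size; `jzs_bf10` is a hypothetical helper name, and the result depends on the default tolerances of integrate.

```r
# JZS Bayes factor (alternative vs. null) for a one-sample / paired t-test.
# Assumed formula: Rouder et al. (2009), prior scale r on the effect size.
jzs_bf10 <- function(t, n, r = 0.707) {
  nu <- n - 1
  # Likelihood term under the null hypothesis
  null_lik <- (1 + t^2 / nu)^(-(nu + 1) / 2)
  # Integrand under the alternative: likelihood term given g,
  # weighted by an inverse-chi-square(1) prior on g
  integrand <- function(g) {
    (1 + n * g * r^2)^(-1 / 2) *
      (1 + t^2 / ((1 + n * g * r^2) * nu))^(-(nu + 1) / 2) *
      (2 * pi)^(-1 / 2) * g^(-3 / 2) * exp(-1 / (2 * g))
  }
  alt_lik <- integrate(integrand, lower = 0, upper = Inf)$value
  alt_lik / null_lik
}

bf10 <- jzs_bf10(t = 14.63, n = 27)
bf10
```

The printed value can be compared with the output of ttestBF and ttest.tstat above; agreement in order of magnitude would support the conclusion that the large Bayes factor is not a coding error.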

Understanding and implementing numerical integration with a quantile function in R

I need to calculate this integral below, using R:
The q_theta(x) function I managed to do in R with quantile regression (package: quantreg).
matrix=structure(c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2,
0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31,
0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53,
0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64,
0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75,
0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86,
0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97,
0.98, 0.99, -22.2830664155772, -22.2830664155772, -19.9298291765612,
-18.2066426767652, -15.2657135034479, -14.921522915965, -13.5035945028536,
-13.1557269916064, -12.9495709618481, -11.6168348488161, -11.3999095021713,
-10.6962766764396, -10.0588239375837, -9.12944363439522, -8.15648778610587,
-8.04133299299019, -7.66558386420434, -7.50906566627427, -6.95626096568998,
-6.90630556403136, -6.53374879831376, -6.39324677042686, -6.20705804899049,
-6.09754765999465, -5.91272058217526, -5.75771166206242, -5.3770131257001,
-5.20892464393192, -5.07372162687422, -4.96706814289334, -4.64404095131293,
-4.1567394053577, -4.13209444755342, -3.85483644113723, -3.64855238293205,
-3.53054113507559, -3.46035383338799, -3.03155417364444, -2.93100183005178,
-2.90491824855193, -2.64056616049773, -2.51857727614607, -2.25163805172486,
-2.00934783937474, -1.89925824841417, -1.71405007411747, -1.65905834683964,
-1.47502511311988, -1.42755073292529, -1.20464216637298, -1.08574103345057,
-0.701134735371922, -0.590656010656201, -0.290335898959635, -0.0575062007348038,
0.0778328375033378, 0.165234593185889, 0.230651883848336, 0.316817885358695,
0.34841775605248, 0.516869604496075, 0.59743162507581, 0.857843937404964,
0.939734010162078, 1.12533017928147, 1.27037182428776, 1.52040854525927,
1.76577933448152, 2.07456447851822, 2.17389787235523, 2.27567786362425,
2.3850323163509, 2.55365596853891, 2.61208242890655, 2.77359226593771,
2.93275094039929, 3.07968072488942, 3.0822647851901, 3.26452177629061,
3.46223321951649, 3.66011832966054, 3.85710605543097, 4.05385887531972,
4.83943843494744, 5.05864734149161, 5.25501778319145, 5.38941130574907,
5.88571117751377, 6.5116611852713, 6.98632496342285, 7.21816245728101,
7.73244825971004, 7.80401007592906, 8.34648625541999, 9.83184090479964,
10.8324874884172, 11.3060100107816, 12.3048113953808, 13.1300123358331
), .Dim = c(99L, 2L), .Dimnames = list(NULL, c("Theta", "q(x)_(Theta)"
)))
This is my q_theta(x) function that I estimated in R. The questions I have are:
a> If x follows a standard normal distribution, this integral is zero, right?
b> Otherwise, as in my case, the integral is not zero. How do I treat q_(1-Theta)(x)? Is it simply sort(matrix[,"q(x)_(Theta)"],decreasing=TRUE)?
And the integration would be:
sintegral(thau[1:50], (matrix[,"q(x)_(Theta)"][1:50] - sort(matrix[,"q(x)_(Theta)"],TRUE)[1:50])[1:50])$value
The median would be a common point of these two functions, right?
Thanks.
Recalling your previous post, "Building a function by defining X and Y and then Integrating in R", we built a linear interpolation function:
## note `rule = 2` to enable "extrapolation";
## otherwise `rule = 1` gives `NA` outside [0.01, 0.5]
integrand <- approxfun(mat[, 1], y, rule = 2)
Then we can perform numeric integration on [0, 0.5]:
integrate(integrand, lower = 0, upper = 0.5)
# -5.594405 with absolute error < 4e-04
Now for a>, let's have a proof first.
Note that your quantile function is not that of a normal distribution, so this result does not hold. You can verify this:
quant <- approxfun(mat[, 1], mat[, 2], rule = 2)
integrate(quant, lower = 0, upper = 0.5)
# -3.737973 with absolute error < 0.00029
Compared with the previous integration result, -5.594405, the difference is not a factor of 2.
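For contrast, the factor-of-2 relationship can be checked on the standard normal itself. The sketch below assumes the integrand is q_theta minus q_(1-theta) on [0, 0.5], as the asker's sintegral call suggests; `integrand_norm` is an illustrative name. For a distribution symmetric about zero, q_(1-theta) = -q_theta, so the integrand is exactly twice q_theta.

```r
# Standard normal: q_(1 - theta) = -q_theta, so the integrand doubles.
# lower.tail = FALSE gives the (1 - theta) quantile without forming 1 - theta.
integrand_norm <- function(theta) {
  qnorm(theta) - qnorm(theta, lower.tail = FALSE)
}

full <- integrate(integrand_norm, lower = 0, upper = 0.5)$value
half <- integrate(qnorm, lower = 0, upper = 0.5)$value

# The integral of qnorm over [0, 0.5] has the closed form -dnorm(0)
c(full = full, twice_half = 2 * half, closed_form = -2 * dnorm(0))
```

The three numbers should agree up to integration error, which is the symmetric case; the interpolated quantile function in the question does not satisfy this relationship.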

Interpolate within points in a vector

Vector V1 contains 56 observations for X, and vector BS contains a bootstrapped sample of V1 of length 100000. I would like to interpolate linearly within points in BS to fill in any missing values. For example, V1 contains no 0.27 values, and hence neither does BS. But BS would contain a few 0.28 and 0.26. I would like the interpolation to create a few 0.27 values and add those to BS. And so on for any missing values within the two extremes in the vector.
V1 <- c(0.18, 0.2, 0.24, 0.35, -0.22, -0.17, 0.28, -0.28, -0.14, 0.03, 0.87, -0.2, 0.06, -0.1, -0.72, 0.18, 0.01, 0.31, -0.36, 0.61, -0.16, -0.07, -0.13, 0.01, -0.09, 0.26, -0.14, 0.08, -0.62, -0.2, 0.3, -0.21, -0.11, 0.05, 0.06, -0.28, -0.27, 0.17, 0.42, -0.05, -0.15, 0.05, -0.07, -0.22, -0.34, 0.16, 0.34, 0.1, -0.12, 0.24, 0.45, 0.37, 0.61, 0.9, -0.25, 0.02)
BS <- sample(V1, 100000, replace=TRUE)
The approxfun functions do not seem to help, as they are for interpolating between data sets. I have found a few questions/answers covering interpolation between different data sets, but not within one data set. Thank you for your help.
EDIT: please note I do not want to fit a normal distribution (or any other) to create those points.
You can use approx() (or approxfun()) to do this by treating BS as the y-coordinate and using sequential x-coordinates:
set.seed(1L); BS <- sample(V1,1e5L,T);
res <- approx(seq_along(BS),BS,n=length(BS)*2L-1L)$y;
The specification of n here is important. It ensures that exactly one interpolated value will be produced halfway between each adjacent pair of input values.
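The effect of this choice of n is easier to see on a short vector. The sketch below uses a hypothetical four-element vector `v` standing in for BS.

```r
# Hypothetical short vector standing in for BS
v <- c(0.26, 0.28, 0.30, 0.24)

# n = 2 * length - 1 places one output point on each input
# and one halfway between each adjacent pair
res_small <- approx(seq_along(v), v, n = length(v) * 2L - 1L)$y
res_small

# Odd positions reproduce the input; even positions are the midpoints
res_small[c(TRUE, FALSE)]
res_small[c(FALSE, TRUE)]
```

Here the midpoint between 0.26 and 0.28 supplies the 0.27 that was absent from the input.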
Here's a plot of an excerpt of the result, centered around the first occurrence of an adjacent pair of 0.26 and 0.28:
i <- which(BS[-length(BS)]==0.26 & BS[-1L]==0.28)[1L];
j <- i*2L-1L;
xlim <- c(j-6L,j+8L);
ylim <- c(-1,1);
xticks <- seq(xlim[1L],xlim[2L]);
yticks <- seq(ylim[1L],ylim[2L],0.05);
plot(NA,xlim=xlim,ylim=ylim,xlab='res index',ylab='y',axes=F,xaxs='i',yaxs='i');
abline(v=xticks,col='lightgrey');
abline(h=yticks,col='lightgrey');
axis(1L,xticks,cex.axis=0.7);
axis(2L,yticks,sprintf('%.02f',round(yticks,2L)),las=1L,cex.axis=0.7);
x <- seq(xlim[1L],xlim[2L],2L); y <- BS[seq(i-3L,len=8L)];
points(x,y,pch=16L,col='red',xpd=NA);
x <- seq(xlim[1L],xlim[2L]); y <- res[x];
points(x,y,pch=4L,cex=1.2,col='blue',xpd=NA);
text(x+0.24,y+0.03,y,cex=0.7,xpd=NA);
legend(xlim[1L]+1.5,0.87,c('input value','interpolated'),col=c('red','blue'),pch=c(16L,4L));

How to implement the bootstrap in R

I posted a thread about this problem, but it was put on hold, so I have rephrased it as a programming question. My code is below. I am trying to find the simulated confidence level of a sample using the bootstrap.
# Step One: Generating the data from lognormal distribution
MC <- 1000; # Number of samples to simulate
xbar = c(1:MC);
mu = 1;
sigma= 1.5;
the_mean <- exp(mu+sigma^2/2); # true mean of the lognormal distribution
n= 10;
for(i in 1:MC)
{
mySample <- rlnorm(n=n, meanlog=mu, sdlog=sigma);
xbar[i] <- mean(mySample); # sample mean of the i-th simulated sample
}
# Step Two: Compute 95% Bootstrap CI with B=1000
B = 1000
xbar_star = c(1:B)
for(b in 1:B)
{
x_star = sample(n,n, replace=TRUE)
xbar_star[b] = mean(x_star)
}
quantile(xbar, p=c(0.025, 0.975))
If you run this code you can see that the output is 975.025 when it should actually be 0.90.
I don't understand why my output is wrong.
We aren't trying to find the confidence interval, but the simulated confidence level. How does the actual coverage percentage (obtained through simulation) compare with the nominal confidence level (which is 95%)? This is my code when my samples were given in a practice problem...
library(boot)
x = c(0.22, 0.23, 0.26, 0.27, 0.28, 0.28, 0.29,
0.33, 0.34, 0.35, 0.38, 0.39, 0.39, 0.42, 0.42,
0.43, 0.45, 0.46, 0.48, 0.5, 0.5, 0.51, 0.52,
0.54, 0.56, 0.56, 0.57, 0.57, 0.6, 0.62, 0.63,
0.67, 0.69, 0.72, 0.74, 0.76, 0.79, 0.81, 0.82,
0.84, 0.89, 1.11, 1.13, 1.14, 1.14, 1.2, 1.33)
B = 10000
xbar = mean(x)
n = length(x)
xbar_star = c(1:B)
for(b in 1:B)
{
x_star = sample(x=x, size=n, replace=TRUE)
xbar_star[b] = mean(x_star)
}
# empirical percentile method
quantile(xbar_star, p=c(0.025, 0.975))
> quantile(xbar_star, p=c(0.025, 0.975))
2.5% 97.5%
0.5221277 0.6797926
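One way to estimate the simulated confidence level described above is to nest the bootstrap inside the Monte Carlo loop: for each simulated sample, build a percentile interval and record whether it contains the true mean. The sketch below is one possible arrangement, not the asker's intended solution; MC and B are reduced so that it runs quickly, and the names `true_mean` and `covered` are illustrative.

```r
set.seed(1)
MC <- 500; B <- 500; n <- 10        # kept small so the sketch runs quickly
mu <- 1; sigma <- 1.5
true_mean <- exp(mu + sigma^2 / 2)  # mean of the lognormal distribution

covered <- logical(MC)
for (i in seq_len(MC)) {
  my_sample <- rlnorm(n, meanlog = mu, sdlog = sigma)
  # Bootstrap: resample the observed data (not 1:n) with replacement
  xbar_star <- replicate(B, mean(sample(my_sample, size = n, replace = TRUE)))
  # Percentile interval for this sample
  ci <- quantile(xbar_star, probs = c(0.025, 0.975))
  covered[i] <- ci[1] <= true_mean && true_mean <= ci[2]
}

# Proportion of intervals containing the true mean,
# to be compared with the nominal level of 0.95
mean(covered)
```

Note that the resampling step draws from my_sample itself; sample(n, n, replace=TRUE) would instead resample the integers 1 to n.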

Resources