How to implement the bootstrap in R - r

So I posted a thread about this problem, but it got on hold. So I rephrased so it can be it a programming question. This is my code below. I am trying to find the stimulated confidence level of a sample using the bootstrap.
# Step One: Generating the data from lognormal distribution
MC <-1000; # Number of samples to simulate
xbar = c(1:MC);
mu = 1;
sigma= 1.5;
the_mean <- exp(mu+sigma^2/2);
n= 10;
for(i in 1:MC)
{
mySample <- rlnorm(n=n meanlog=mu, sdlog=sigma);
xbar [i] <- the_mean(mySample);
}
# Step Two: Compute 95% Bootstrap CI with B=1000
B = 1000
xbar_star = c(1:B)
for(b in 1:B)
{
x_star = sample(n,n, replace=TRUE)
xbar_star[b] = mean(x_star)
}
quantile(xbar, p=c(0.025, 0.975))
If you implement this code you can see that the output is 975.025 when it should actually be 0. 90.
I don't understand why my output is wrong.
We arent trying to find the Confidence Interval, but the stimulated Confidence Level. How does the actual coverage percentage (obtained through simulation) compare with the nominal confidence level (which is 95%)? This is my code when my samples were given in a practice problem...
library(boot)
x = c(0.22, 0.23, 0.26, 0.27, 0.28, 0.28, 0.29,
0.33, 0.34, 0.35, 0.38, 0.39, 0.39, 0.42, 0.42,
0.43, 0.45, 0.46, 0.48, 0.5, 0.5, 0.51, 0.52,
0.54, 0.56, 0.56, 0.57, 0.57, 0.6, 0.62, 0.63,
0.67, 0.69, 0.72, 0.74, 0.76, 0.79, 0.81, 0.82,
0.84, 0.89, 1.11, 1.13, 1.14, 1.14, 1.2, 1.33)
B = 10000
xbar = mean(x)
n = length(x)
xbar_star = c(1:B)
for(b in 1:B)
{
x_star = sample(x=x, size=n, replace=TRUE)
xbar_star[b] = mean(x_star)
}
# empirical percentile method
quantile(xbar_star, p=c(0.025, 0.975))
> quantile(xbar_star, p=c(0.025, 0.975))
2.5% 97.5%
0.5221277 0.6797926

Related

Optimise function on two criteria in R

I'm trying to optimise an exponential model by minimising the sum of squares, however I can't work out how to optimise using two separate criteria.
I need to find values for "a" and "b" that minimise the output of the function. I have entered estimates in the code below, but need the output of this code to tell me the sum of squares (already printing), but also values for "a" and "b" when the model was correctly optimised.
c <- c(0.08, 0.17, 0.25, 0.33, 0.41, 0.49, 0.57, 0.65, 0.73, 0.81, 0.88, 0.96, 1.04, 1.11, 1.19, 1.26)
my_fun <- function(a, b, c){
predVar1 <- a * (1-exp(-c/b))
sum((predVar1 - c)^2)
}
a <- 9
b <- 1.4
my_fun(a, b, c)
Thanks
Did some rearranging based on how I'm familiar with using optim. Also, I changed c to d because I don't like messing with accidentally overwriting the c() function.
d <- c(0.08, 0.17, 0.25, 0.33, 0.41, 0.49, 0.57, 0.65, 0.73, 0.81, 0.88, 0.96, 1.04, 1.11, 1.19, 1.26)
my_fun <- function(parameters, d){
a <- parameters[1]
b <- parameters[2]
predVar1 <- a * (1-exp(-d/b))
return(sum((predVar1 - d)^2))
}
a <- 9
b <- 1.4
results <- optim(c(a, b), my_fun, d = d)
results$par
#[1] 700.8850 700.4793
results$value
#[1] 4.37461e-07

How to plot truncated distributions (truncdist) with fitdistrplus?

I am attempting to plot goodness of fit curves to truncated distributions from the fitdistrplus package using its plot function.
library(fitdistrplus)
library(truncdist)
library(truncnorm)
dataNum <- c(433.6668, 413.0450, 435.9952, 449.7559, 457.3629, 498.6187, 598.0335, 637.5611, 644.9193, 634.4843, 620.8676, 590.6622, 581.6411, 572.5022, 594.0925, 587.7293, 608.4948, 626.7594, 599.0286, 611.2966, 572.1749, 545.0071, 490.0298, 478.8484, 458.8293, 437.4878, 467.7026, 477.4094, 467.4182, 519.3056, 599.0155, 648.8603, 623.0672, 606.3737, 552.3653, 558.7612, 553.1345, 549.5961, 546.0578, 565.4582, 562.6825, 606.6225, 578.1584, 572.6201, 546.4735, 514.8147, 479.4638, 462.7702, 430.3652, 452.9671)
If I use the library(truncnorm) to fit a truncated normal distribution, everything works fine.
fit.dataNormTrunc2 <- fitdist(dataNum, "truncnorm", fix.arg=list(a=min(dataNum)), start = list(mean = mean(dataNum), sd = sd(dataNum)))
plot(fit.dataNormTrunc2)
However, if I try to use the truncdist package, only the histogram comparison plot prints without any of the other plots (e.g. qq-plot). I also get an error:
Error in qtNorm(p = c(0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, :
unused argument (p = c(0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13, 0.15, 0.17, 0.19, 0.21, 0.23, 0.25, 0.27, 0.29, 0.31, 0.33, 0.35, 0.37, 0.39, 0.41, 0.43, 0.45, 0.47, 0.49, 0.51, 0.53, 0.55, 0.57, 0.59, 0.61, 0.63, 0.65, 0.67, 0.69, 0.71, 0.73, 0.75, 0.77, 0.79, 0.81, 0.83, 0.85, 0.87, 0.89, 0.91, 0.93, 0.95, 0.97, 0.99))
The code used is:
dtNorm <- function(x, mean, sd) {
dtrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
ptNorm <- function(x, mean, sd) {
ptrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
qtNorm <- function(x, mean, sd) {
qtrunc(x, "norm", mean, sd, a=min(dataNum), b=Inf)
}
fit.dataNormTrunc <- fitdist(dataNum, "tNorm", start = c(mean=mean(dataNum), sd=sd(dataNum)))
plot(fit.dataNormTrunc)
I have also tried the truncdist approach with the lognormal functionand again the other 3 plots don't print out and I get the same error about the values not being used.

Bayes Factor values in the R package BayesFactor

I've followed the instructions on how to run a Bayesian 't-test' using default priors in the BayesFactor package in R.
Some of the returned values are astronomical.
Here is an example comparison with a huge Bayes factor:
#install.packages('BayesFactor')
library(BayesFactor)
condition1 <- c(0.94, 0.9, 0.96, 0.74, 1, 0.98, 0.86, 0.92, 0.918367346938776,
0.96, 0.4, 0.816326530612245, 0.8, 0.836734693877551, 0.56, 0.66,
0.605263157894737, 0.836734693877551, 0.84, 0.9, 0.92, 0.714285714285714,
0.82, 0.5, 0.565217391304348, 0.8, 0.62)
condition2 <- c(0.34, 0.16, 0.23, 0.19, 0.71, 0.36, 0.02, 0.83, 0.11, 0.06,
0.27, 0.347368421052632, 0.21, 0.13953488372093, 0.11340206185567,
0.14, 0.142857142857143, 0.257731958762887, 0.15, 0.29, 0.67,
0.0515463917525773, 0.272727272727273, 0.0895522388059701, 0.0204081632653061,
0.13, 0.0612244897959184)
bf = ttestBF(x = condition1, condition2, paired = TRUE)
bf
This returns:
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 144035108289 ±0%
Against denominator:
Null, mu = 0
---
Bayes factor type: BFoneSample, JZS
For the most part the comparisons range from below 1 up to a few hundred. But I'm concerned that this value (144035108289!) is indicative of something erroneous on my part.
FYI: the p-value in the null-hypothesis test on the same data as above = 4.649279e-14.
Any assurances or insights into this returned BF would be much appreciated.
I calculated the BF using manual input of t-value and sample size like this using the same package:
exp(ttest.tstat(t=14.63, n1=27, rscale = 0.707)[['bf']])
It gives the same BF. It seems this is largely due to a relatively big sample size (27). The returned BF appears to be on the up-and-up.

Understanding and implementing numerical integration with a quantile function in R

I need to calculate this integral below, using R:
The q_theta(x) function I managed to do in R with quantile regression (package: quantreg).
matrix=structure(c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09,
0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2,
0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31,
0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53,
0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64,
0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75,
0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86,
0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97,
0.98, 0.99, -22.2830664155772, -22.2830664155772, -19.9298291765612,
-18.2066426767652, -15.2657135034479, -14.921522915965, -13.5035945028536,
-13.1557269916064, -12.9495709618481, -11.6168348488161, -11.3999095021713,
-10.6962766764396, -10.0588239375837, -9.12944363439522, -8.15648778610587,
-8.04133299299019, -7.66558386420434, -7.50906566627427, -6.95626096568998,
-6.90630556403136, -6.53374879831376, -6.39324677042686, -6.20705804899049,
-6.09754765999465, -5.91272058217526, -5.75771166206242, -5.3770131257001,
-5.20892464393192, -5.07372162687422, -4.96706814289334, -4.64404095131293,
-4.1567394053577, -4.13209444755342, -3.85483644113723, -3.64855238293205,
-3.53054113507559, -3.46035383338799, -3.03155417364444, -2.93100183005178,
-2.90491824855193, -2.64056616049773, -2.51857727614607, -2.25163805172486,
-2.00934783937474, -1.89925824841417, -1.71405007411747, -1.65905834683964,
-1.47502511311988, -1.42755073292529, -1.20464216637298, -1.08574103345057,
-0.701134735371922, -0.590656010656201, -0.290335898959635, -0.0575062007348038,
0.0778328375033378, 0.165234593185889, 0.230651883848336, 0.316817885358695,
0.34841775605248, 0.516869604496075, 0.59743162507581, 0.857843937404964,
0.939734010162078, 1.12533017928147, 1.27037182428776, 1.52040854525927,
1.76577933448152, 2.07456447851822, 2.17389787235523, 2.27567786362425,
2.3850323163509, 2.55365596853891, 2.61208242890655, 2.77359226593771,
2.93275094039929, 3.07968072488942, 3.0822647851901, 3.26452177629061,
3.46223321951649, 3.66011832966054, 3.85710605543097, 4.05385887531972,
4.83943843494744, 5.05864734149161, 5.25501778319145, 5.38941130574907,
5.88571117751377, 6.5116611852713, 6.98632496342285, 7.21816245728101,
7.73244825971004, 7.80401007592906, 8.34648625541999, 9.83184090479964,
10.8324874884172, 11.3060100107816, 12.3048113953808, 13.1300123358331
), .Dim = c(99L, 2L), .Dimnames = list(NULL, c("Theta", "q(x)_(Theta)"
)))
This is my q_theta(x) function that I estimated in R. One of the question I have is:
a> If x is a standard normal distribution this integral is zero; Right?
b> Otherwise, in my case, the integral is not zero. How do I treat the q_1-Theta(x)? Its simply the sort(matrix[,"q(x)_(Theta)"],decreasing=TRUE) ?
And the integration would be:
sintegral(thau[1:50], (matrix[,"q(x)_(Theta)"][1:50] - sort(matrix[,"q(x)_(Theta)"],TRUE)[1:50])[1:50])$value
The median would be a comun point of this two functions. Right?
Thanks.
Recall your previous post Building a function by defining X and Y and then Integrating in R, we build a linear interpolation function
## note `rule = 2` to enable "extrapolation";
## otherwise `rule = 1` gives `NA` outside [0.01, 0.5]
integrand <- approxfun(mat[, 1], y, rule = 2)
Then we can perform numeric integration on [0, 0.5]:
integrate(integrand, lower = 0, upper = 0.5)
# -5.594405 with absolute error < 4e-04
Now for a>, let's have a proof first.
Note, your quantile function is not for normal distribution, so this result does not hold. You can actually verify this
quant <- approxfun(mat[, 1], mat[, 2], rule = 2)
integrate(quant, lower = 0, upper = 0.5)
# -3.737973 with absolute error < 0.00029
Compared with previous integration result -5.594405, the difference is not a factor of 2.

Ratio of polynomials approximation

I am trying to fit a polynomial to my dataset, which looks like that (full dataset is at the end of the post):
The theory predicts that the formulation of the curve is:
which looks like this (for x between 0 and 1):
When I try to make a linear model in R by doing:
mod <- lm(y ~ poly(x, 2, raw=TRUE)/poly(x, 2))
I get the following curve:
Which is much different from what I would expect. Have you got any idea how to fit a new curve from this data so that it would be similar to the one, which theory predicts? Also, it should have only one minimum.
Full dataset:
Vector of x values:
x <- c(0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12,
0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25,
0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38,
0.39, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51,
0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63, 0.64,
0.65, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77,
0.78, 0.79, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90,
0.91, 0.92, 0.93, 0.94, 0.95)
Vector of y values:
y <- c(4.104, 4.444, 4.432, 4.334, 4.285, 4.058, 3.901, 4.382,
4.258, 4.158, 3.688, 3.826, 3.724, 3.867, 3.811, 3.550, 3.736, 3.591,
3.566, 3.566, 3.518, 3.581, 3.505, 3.454, 3.529, 3.444, 3.501, 3.493,
3.362, 3.504, 3.365, 3.348, 3.371, 3.389, 3.506, 3.310, 3.578, 3.497,
3.302, 3.530, 3.593, 3.630, 3.420, 3.467, 3.656, 3.644, 3.715, 3.698,
3.807, 3.836, 3.826, 4.017, 3.942, 4.208, 3.959, 3.856, 4.157, 4.312,
4.349, 4.286, 4.483, 4.599, 4.395, 4.811, 4.887, 4.885, 5.286, 5.422,
5.527, 5.467, 5.749, 5.980, 6.242, 6.314, 6.587, 6.790, 7.183, 7.450,
7.487, 8.566, 7.946, 9.078, 9.308, 10.267, 10.738, 11.922, 12.178, 13.243,
15.627, 16.308, 19.246, 22.022, 25.223, 29.752)
Use nls to fit a nonlinear model. Note that the model formula is not uniquely defined as displayed in the question since if we multiply all the coefficients by any number the result will still give the same predictions. To avoid this we need to fix one coefficient. A first try used the coefficients shown in the question as starting values (except fixing one) but that failed so dropping C was tried and the resulting coefficients fed into a second fit with C = 1.
st <- list(a = 43, b = -14, c = 25, B = 18)
fm <- nls(y ~ (a + b * x + c * x^2) / (9 + B * x), start = st)
fm2 <- nls(y ~ (a + b * x + c * x^2) / (9 + B * x + C * x^2), start = c(coef(fm), C = 1))
plot(y ~ x)
lines(fitted(fm2) ~ x, col = "red")
(continued after chart)
Note: Here is an example of using nls2 to get starting values with random search. We assume that the coefficients each lie between -50 and 50.
library(nls2)
set.seed(123) # for reproducibility
v <- c(a = 50, b = 50, c = 50, B = 50, C = 50)
st0 <- as.data.frame(rbind(-v, v))
fm0 <- nls2(y ~ (a + b * x + c * x^2) / (9 + B * x + C * x^2), start = st0,
alg = "random", control = list(maxiter = 1000))
fm3 <- nls(y ~ (a + b * x + c * x^2) / (9 + B * x + C * x^2), st = coef(fm0))
Since you already have a theoretic prediction, you don't seem in need of a new model, and it's really only a plotting task:
png(); plot(y~x)
lines(x,mod,col="blue")
dev.off()
You cannot expect lm to produce a good approximation to a non-linear problem. The denominator involving x in that theoretic expression makes this inherently nonlinear.

Resources