MASS package's "fitdistr": error when dealing with manipulated random data - r

Background:
Below I generate some random beta data in R and manipulate its shape a bit to arrive at what I call "Final" in my code, which I then histogram.
Question:
I'm wondering why, when I try to fit a "beta" distribution to the "Final" data using the MASS package's "fitdistr" function, I get the following error (and is there any suggestion on how to avoid it)?
Error in stats::optim(x = c(0.461379379270288, 0.0694261016478062, 0.76934266883081, :
initial value in 'vmmin' is not finite
Here is my R code:
require(MASS)
## Generate some data and manipulate it
set.seed(47)
Initial = rbeta(1e5, 2, 3)
d <- density(Initial)
b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
b.5 <- b.5 / (max(b.5) / max(d$y)) # Scale down to max of original density
b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
b.6 <- b.6 / (max(b.6) / max(d$y))
# Collect maximum densities at each x to use as sample probability weights
p <- pmax(d$y, b.5, b.6)
Final <- sample(d$x, 1e4, replace = TRUE, prob = p) ## THIS IS MY FINAL DATA
hist(Final, freq = F, ylim = c(0, 2)) ## HERE IS A HISTOGRAM
m <- MASS::fitdistr(Final, "beta", ## RUN THIS TO SEE HOW THE ERROR COMES UP
start = list(shape1 = 1, shape2 = 1))

Here is the code. It is the same as your code; I just replaced the non-positive values. (Because density() extends its x grid beyond the range of the data, some of the sampled values are <= 0, where the beta density is zero, so the log-likelihood at the starting values is not finite and optim() stops with the 'vmmin' error.)
library(MASS)
set.seed(47)
Initial = rbeta(1e5, 2, 3)
d <- density(Initial)
b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
b.5 <- b.5 / (max(b.5) / max(d$y)) # Scale down to max of original density
b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
b.6 <- b.6 / (max(b.6) / max(d$y))
# Collect maximum densities at each x to use as sample probability weights
p <- pmax(d$y, b.5, b.6)
Final <- sample(d$x, 1e4, replace = TRUE, prob = p) ## THIS IS MY FINAL DATA
hist(Final, freq = F, ylim = c(0, 2)) ## HERE IS A HISTOGRAM
# replace non-positive values with the smallest value > 0
Final[Final <= 0] <- min(Final[Final > 0])
hist(Final, freq = F, ylim = c(0, 2))
m <- MASS::fitdistr(x = Final, densfun = "beta",
start = list(shape1 = 1, shape2 = 1))
Here are the shape parameters:
> m
      shape1       shape2  
  1.99240852   2.90219720 
 (0.02649853) (0.04010168)
Take note that it gives some warnings.
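As a quick check of where the original error comes from, you can rerun the question's code up to the point where Final is created (before the replacement above) and inspect it; dropping the offending values is an alternative fix (a sketch, not the only possible one):
range(Final)                        # dips below 0 because density() widens the x grid beyond the data
sum(Final <= 0 | Final >= 1)        # values where dbeta() is 0, so the log-likelihood is -Inf
Final.ok <- Final[Final > 0 & Final < 1]   # drop them instead of replacing them
m2 <- MASS::fitdistr(Final.ok, "beta", start = list(shape1 = 1, shape2 = 1))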

Related

Monte Carlo simulations for VAR models

I've been trying to estimate VAR models using Monte Carlo Simulation. I have 3 endogenous variables. I need some guidance regarding this.
First of all, I want to add an outlier as a percentage of the sample size.
Second (a second simulation for the same model), I want to use a contaminated multivariate normal error distribution, something like 0.9 N(0, I) + 0.1 N((0,0,0)', diag(100, 100, 100)), instead of the outlier.
Could you tell me how to do these?
Thank you.
library(tsDyn)       # provides VECM()
library(MonteCarlo)  # provides MonteCarlo() and MakeTable()
RR <- function(n, out){
# n is number of observations
k <- 3 # Number of endogenous variables
p <- 2 # Number of lags
# add outlier
n[1]<- n[1]+out
# Generate coefficient matrices
B1 <- matrix(c(.1, .3, .4, .1, -.2, -.3, .03, .1, .1), k) # Coefficient matrix of lag 1
B2 <- matrix(c(0, .2, .1, .07, -.4, -.1, .5, 0, -.1), k) # Coefficient matrix of lag 2
M <- cbind(B1, B2) # Companion form of the coefficient matrices
# Generate series
DT <- matrix(0, k, n + 2*p) # Raw series with zeros
for (i in (p + 1):(n + 2*p)){ # Generate series with e ~ N(0,1)
DT[, i] <- B1%*%DT[, i-1] + B2%*%DT[, i-2] + rnorm(k, 0, 1)
}
DT <- ts(t(DT[, -(1:p)])) # Convert to time series format
#names <- c("V1", "V2", "V3") # Rename variables
colnames(DT) <- c("Y1", "Y2", "Y3")
#plot.ts(DT) # Plot the series
# estimate VECM
vecm1 <- VECM(DT, lag = 2, r = 2, include = "const", estim ="ML")
vecm2 <- VECM(DT, lag = 2, r = 1, include = "const", estim ="ML")
# mse
mse1 <- mean(vecm1$residuals^2)
mse2 <- mean(vecm2$residuals^2)
#param_list <- unname(param_list)
return(list("mse1" = mse1, "mse2" = mse2, "mse3" = mse3))
}
# define the parameter grids (the parameter ranges we want to run the function over)
n_grid = c(50, 80, 200, 400)
out_grid = c(0 ,5, 10)
# collect parameter grids in a list (to enter it into the Monte Carlo function)
prml = list("n" = n_grid, "out" = out_grid)
# run simulation
RRS <- MonteCarlo(func = RR, nrep = 1000, param_list = prml)
summary(RRS)
# make table:
rows = "n"
cols = "out"
MakeTable(output = RRS, rows = rows, cols = cols)
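The contaminated error distribution described in the question, 0.9 N(0, I) + 0.1 N(0, diag(100, 100, 100)), can be drawn with a simple two-component mixture step. Here is a minimal sketch; the helper name rcontam and its arguments are illustrative, not part of any package:
library(MASS)  # for mvrnorm()
# Draw one k-dimensional error vector from 0.9 * N(0, I) + 0.1 * N(0, 100 * I)
rcontam <- function(k = 3, eps = 0.1, big.var = 100){
  if (runif(1) < eps){
    mvrnorm(1, mu = rep(0, k), Sigma = diag(big.var, k))  # contaminating component
  } else {
    mvrnorm(1, mu = rep(0, k), Sigma = diag(1, k))        # main component
  }
}
# Inside the loop of RR(), rnorm(k, 0, 1) could then be replaced by rcontam(k)
# to run the second simulation with contaminated errors.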

How can I predict values in factorial experiments (2^k) with centre points in R?

How can I predict values in factorial experiments with centre points in R, using the FrF2 package together with the predict function or the broom package?
My code:
library(FrF2)
plan.person = FrF2(nfactors = 5, resolution = 5, replications = 2,
ncenter = 1, randomize = FALSE,
factor.names = list(
A = c(8, 5),
B = c(70, 30),
C = c(0.5, 0),
D = c(1000, 700),
E = c(70, 10)))
resp <- c(84.55, 66.34, -1, 69.18, 73.01, 64.52, 0.73, 47.61, 68.18, 59.87,
26, 72.57, 78.08, 73.81, 26, 59.38, 71.41, 88.64, 64.92, 4, 68.81,
80, 69.66, -1.36, 54.50, 79.24, 78.53, -1, 72.63, 89.97, 87.98,
-11, 65.68, 82.46)
newplan <- add.response(design = plan.person, response = resp)
model <- lm(newplan, use.center = T)
# summary(model)
d <- within(newplan, {
A <- as.numeric(as.character(A))
B <- as.numeric(as.character(B))
C <- as.numeric(as.character(C))
D <- as.numeric(as.character(D))
E <- as.numeric(as.character(E)) })
A = seq(5, 8, 1)
B = seq(30, 70, length.out = length(A))
C = seq(0, 0.5, length.out = length(A))
D = seq(700, 1000, length.out = length(A))
E = seq(10, 70, length.out = length(A))
data <- expand.grid(A = A, B = B,
C = C, D = D,
E = E)
data$p <- predict(model, newdata = data)
Because of the centre point, the following message appears:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
   variable lengths differ (found for 'center')
"A two-level experiment with center points can detect, but not fit, quadratic effects."
(https://www.itl.nist.gov/div898/handbook/pri/section3/pri336.htm)
That is, R can't predict these values because you need to make additional assumptions about what the curve looks like to predict points not at your design points.
Note that computationally, you can get the software to work by adding a center term. The error is because this term is in the regression but not in the data set. You could add one with data$center <- FALSE (because none of the points in data are at the center), but this will not do the right thing, as it will not take the potential curvature into account when predicting non-central points, it would simply predict a twisted plane (that is, linear with interactions) with a single bump at the center.
Of course, it's also equivalent to just fitting the model with use.center=FALSE, as the center point doesn't affect the fit of the other points.
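A minimal sketch of the data$center workaround described above, purely to get predict() to run (with the caveat that it ignores curvature, and assuming the factor coding in the new data lines up with the design):
data$center <- FALSE                      # none of the predicted points are at the centre
data$p <- predict(model, newdata = data)  # treats the surface as linear with interactions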
If you remove the central value, you can do this after model <- lm(newplan, use.center = T):
1 - Filter the p-values < 0.05:
library(dplyr)   # for %>%, slice(), filter(), mutate(), select(), rename()
coe <- broom::tidy(model) %>%
slice(-7) %>% #remove center
filter(p.value < 0.05)
m_beta <- coe$estimate
2 - Build a grid:
A = seq(5, 8, 0.5)
B = seq(30, 70, length.out = length(A))
exp <- expand.grid(A = A, B = B) %>%
mutate(bo = as.numeric(1)) %>%
mutate(ult = A*B) %>%
select(bo, A, B, ult) %>%
as.matrix()
3 - Compute the predictions by matrix multiplication:
reg <- t(m_beta %*% t(exp))
exp <- cbind(exp, reg) %>%
as.data.frame() %>%
rename(reg = V5)
But I believe this only solves the computational problem, or rather simplifies it; the linear regression itself should probably be redone as well. Still, with this code you can explore and see what other issues remain.

Vectorized R function to produce sets of histograms

I have a vectorized R function (see below). At each run, the function plots two histograms. My goal is that when the argument n is a vector (see the example of use below), the function plots length(n) separate sets of these histograms (e.g., if n is a vector of length 2, I expect two sets of histograms, i.e., 4 individual histograms).
I have tried the following with no success. Is there a way to do this?
t.sim = Vectorize(function(n, es, n.sim){
d = numeric(n.sim)
p = numeric(n.sim)
for(i in 1:n.sim){
N = sqrt((n^2)/(2*n))
x = rnorm(n, es, 1)
y = rnorm(n, 0, 1)
a = t.test(x, y, var.equal = TRUE)
d[i] = a[[1]]/N
p[i] = a[[3]]
}
par(mfcol = c(2, length(n)))
hist(p) ; hist(d)
}, "n")
# Example of use:
t.sim(n = c(30, 300), es = .1, n.sim = 1e3) # `n` is a vector of length 2, so I expect
# 4 histograms in my graphical device
Vectorize seems to be based on mapply, which essentially calls the function repeatedly while cycling through your input vector. Hence, the easier way out is probably to set up the plotting layout with par() outside the function:
t.sim = Vectorize(function(n, es, n.sim){
d = numeric(n.sim)
p = numeric(n.sim)
for(i in 1:n.sim){
N = sqrt((n^2)/(2*n))
x = rnorm(n, es, 1)
y = rnorm(n, 0, 1)
a = t.test(x, y, var.equal = TRUE)
d[i] = a[[1]]/N
p[i] = a[[3]]
}
# par(mfcol = c(2, npar))
hist(p) ; hist(d)
}, "n")
#inputs
data <- c(30,300)
par(mfcol = c(2, length(data)))
t.sim(n = data, es = c(.1), n.sim = 1e3)

Locate a point on a graph in R

I have a probability plot with point-wise confidence intervals fitted to the data. Using the graph, I want to compute or locate the 0.001 quantile from the confidence bands. I used the function locator() to find the location of the point. Is there any other method that can be used to find the x-value given the y-value on a graph?
The code I used is as follows.
library(fitdistrplus)  # for fitdist() and bootdist()
library(Hmisc)         # for approxExtrap()
times <- c (17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.40,
51.84, 51.96, 54.12, 55.56, 67.80, 68.64, 68.64,
68.88, 84.12, 93.12, 98.64, 105.12, 105.84, 127.92,
128.04,173.40)
N <- length (times)
t <- c (5, 10, 15)
rank.times <- rank (times) # Use average ranks for the tied observations
ecdf.times <- (rank.times - 0.5) / N
quant.ecdf <- log (-log (1 - ecdf.times))
weibull.ml <- suppressWarnings (fitdist (times, "weibull"))
weibull.cdf <- pweibull (times, shape = weibull.ml$estimate[1],
scale = weibull.ml$estimate[2])
wei <- log (-log (1 - weibull.cdf))
wei.extra <- approxExtrap (log (times), wei, log (t), method = "linear")
quant.wei <- c (wei.extra$y, wei)
set.seed (123)
B <- 2000
wei.boot <- suppressWarnings (bootdist (weibull.ml, bootmethod = "param", niter = B))
boot.cdf.we <- matrix (NA, nrow = B, ncol = N)
for (i in 1:B){
boot.cdf.we[i, ] <- pweibull (times, shape = wei.boot$estim$shape[i],
scale = wei.boot$estim$scale[i])
}
p <- 0.025
upper.wei <- NULL
lower.wei <- NULL
for (i in 1:N) {
upper.wei[i] <- log (-log (1 - quantile (boot.cdf.we[,i], probs = p)))
lower.wei[i] <- log (-log (1 - quantile (boot.cdf.we[,i], probs = 1-p)))
}
extra.wei.l <- approxExtrap (log (times), lower.wei, log (t), method = "linear")
lower.weibull <- c (extra.wei.l$y, lower.wei)
extra.wei.u <- approxExtrap (log (times), upper.wei, log (t), method = "linear")
upper.weibull <- c (extra.wei.u$y, upper.wei)
times.ext <- c (t, times)
loc1 <- c (.001, .005, .02, .05, .1, .2, .4, .6, .8, .9, .95, .98, .995)
loc2 <- log (-log (1 - loc1))
loc3 <- c (5, 9.77, 20, 50, 100, 200)
plot (times, quant.ecdf, log = "x", axes = FALSE, xlab = "Millions of cycles",
ylab = "Proportion failing", pch = 16, type = "p", main = "Weibull - Complete",
xlim = c (5, 200), ylim = c (-6.95, 1.7))
lines (times.ext, quant.wei)
lines (times.ext, upper.weibull)
lines (times.ext, lower.weibull)
abline (h = loc2[1])
segments (9.77789, -7.5, 9.77789, loc2[1])
axis (1, at = loc3, labels = loc3)
axis (2, at = loc2, labels = loc1, las = 2)
Thanks in advance!!
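Instead of locator(), one option is to invert the plotted curve numerically with approx(), swapping x and y and interpolating on log(times.ext) to mirror the log-scaled x axis. A sketch using the objects built above (here the lower.weibull band; the same works for upper.weibull or quant.wei):
# the 0.001 quantile line is at y = loc2[1]; interpolate x as a function of y
inv <- approx (x = lower.weibull, y = log (times.ext), xout = loc2[1])
exp (inv$y)   # x-value (millions of cycles) where the band crosses the 0.001 line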

Graphical output of density for the function gammamixEM (package mixtools)

I'm using the function gammamixEM from the package mixtools. How can I obtain the graphical density output, as in the function normalmixEM (i.e., the second plot in plot(..., which = 2))?
Update:
Here is a reproducible example for the function gammamixEM:
library(mixtools)
x <- c(rgamma(200, shape = 0.2, scale = 14), rgamma(200,
shape = 32, scale = 10), rgamma(200, shape = 5, scale = 6))
out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE)
Here is a reproducible example for the function normalmixEM:
data(faithful)
attach(faithful)
out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
plot(out, which=2)
I would like to obtain this graphical output of density from the function gammamixEM.
Here you go.
out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
x <- out
whichplots <- 2
density = 2 %in% whichplots
loglik = 1 %in% whichplots
def.par <- par(ask=(loglik + density > 1), "mar") # only ask and mar are changed
mix.object <- x
k <- ncol(mix.object$posterior)
x <- sort(mix.object$x)
a <- hist(x, plot = FALSE)
maxy <- max(max(a$density), .3989*mix.object$lambda/mix.object$sigma)
I just had to dig into the source code of plot.mixEM.
So, now to do this with gammamixEM:
x <- c(rgamma(200, shape = 0.2, scale = 14), rgamma(200,
shape = 32, scale = 10), rgamma(200, shape = 5, scale = 6))
gammamixEM.out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE)
mix.object <- gammamixEM.out
k <- ncol(mix.object$posterior)
x <- sort(mix.object$x)
a <- hist(x, plot = FALSE)
# gammamixEM output has no $sigma component, so just use the histogram's maximum density
maxy <- max(a$density)
main2 <- "Density Curves"
xlab2 <- "Data"
col2 <- 2:(k+1)
hist(x, prob = TRUE, main = main2, xlab = xlab2,
ylim = c(0,maxy))
for (i in 1:k) {
  # use the fitted gamma parameters; mixtools stores them in gamma.pars
  # (rows "alpha" = shape and "beta" = scale; if your mixtools version
  # parameterises beta as a rate, use rate = instead of scale =)
  lines(x, mix.object$lambda[i] *
          dgamma(x, shape = mix.object$gamma.pars["alpha", i],
                 scale = mix.object$gamma.pars["beta", i]),
        col = col2[i])
}
I believe it should be pretty straightforward to continue this example a bit if you want to add the labels, smooth lines, etc. Here's the source of the plot.mixEM function.
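If you want to look at that source directly from your R session rather than on the web, something like this should show it:
getAnywhere("plot.mixEM")   # prints the S3 method's source even if it is not exported
# or, if it is exported in your version of mixtools:
mixtools::plot.mixEM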
