An NA error in my Gibbs sampler for mixture model - r

I am working on a Gibbs sampler and my code is as follows. The idea is to (1) sample pi first, (2) sample delta, and (3) sample beta.
library(foreign)
cognitive = read.dta("http://www.stat.columbia.edu/~gelman/arm/examples/child.iq/kidiq.dta")
summary(cognitive)
cognitive$mom_work = as.numeric(cognitive$mom_work > 1)
cognitive$mom_hs = as.numeric(cognitive$mom_hs > 0)
# Modify column names of the data set
colnames(cognitive) = c("kid_score", "hs", "IQ", "work", "age")
x<-cbind(cognitive$hs, cognitive$IQ, cognitive$work, cognitive$age)
y<-cognitive$kid_score
lmmodel<-lm(y~x-1, data=cognitive)
NSim=3000 #iteration
Betahat=solve(t(x)%*%x)%*%t(x)%*%y
Error in if (delta[ite, j] == 1) rnorm(1, mu1, sigma1) else rnorm(1, mu0, :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rbinom(1, 1, prob = (p1/(p0 + p1))) : NAs produced
2: In rbinom(1, 1, prob = (p1/(p0 + p1))) : NAs produced

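The error is caused by the line prob=(pi[ite]*exp(-beta[ite-1,j]^2/(2*10^2)))/(((1-pi[ite])*10^3)*exp(-beta[ite-1,j]^2/(2*10^(-4)))+pi[ite-1]*exp(-beta[ite-1,j]^2/(2*10^2))): at some iteration prob becomes greater than 1, so rbern() returns NA. Note that the numerator uses pi[ite] while the denominator uses pi[ite-1], so the ratio is not guaranteed to stay at or below 1. Check your formula.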
UPD. For debugging, add the following before your delta[ite,j]=rbern(... line:
prob_full <- (pi[ite]*exp(-beta[ite-1,j]^2/(2*10^2)))/(((1-pi[ite])*10^3)*exp(-beta[ite-1,j]^2/(2*10^(-4)))+pi[ite-1]*exp(-beta[ite-1,j]^2/(2*10^2)));
cat('\n',ite,j,prob_full)
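To illustrate the failure mode, here is a minimal sketch. The first part shows that rbinom() returns NA (with a warning) when prob is outside [0, 1], which is what then breaks the if() test. The helper incl_prob() is hypothetical and not from the question: it assumes the intended model is a spike-and-slab mixture with a N(0, 10^2) slab and a N(0, 10^-4) spike (my reading of the constants in the formula) and uses the same pi value in both terms, so the result always lies in [0, 1].

```r
# A probability above 1 makes rbinom() return NA (with a warning),
# and `if (NA == 1)` then fails with "missing value where TRUE/FALSE needed".
bad <- suppressWarnings(rbinom(1, 1, prob = 1.2))
is.na(bad)

# Hypothetical helper (an assumption, not the asker's code): inclusion
# probability for a slab N(0, sd_slab^2) versus a spike N(0, sd_spike^2),
# computed on the log scale with ONE value of pi for both components.
incl_prob <- function(beta, pi_cur, sd_slab = 10, sd_spike = 0.01) {
  log_w1 <- log(pi_cur) + dnorm(beta, 0, sd_slab, log = TRUE)     # slab weight
  log_w0 <- log1p(-pi_cur) + dnorm(beta, 0, sd_spike, log = TRUE) # spike weight
  1 / (1 + exp(log_w0 - log_w1))  # equals w1 / (w0 + w1), always within [0, 1]
}

p <- incl_prob(beta = c(0, 0.5, 50), pi_cur = 0.3)
p
```

Working with log weights also avoids the underflow of exp(-beta^2/(2*10^(-4))) for moderately large beta.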

Related

Error in if (delta < tol) break : missing value where TRUE/FALSE needed

library("mvmeta")
library("dosresmeta")
lin <- dosresmeta(formula = logHR ~ dose, id = study,
type = type, se = se, cases = cases,
n = personyear, data = breast)
Error in if (delta < tol) break : missing value where TRUE/FALSE needed
In addition: Warning message:
In log(Ax[v != 0]) : NaNs produced
Answer
At least one of your values in breast$cases is a negative number. You can check this with:
table(breast$cases < 0, useNA = "always")
Rationale
The dosresmeta function will call the covar.logrr function, which will call the grl function (try running traceback() after getting the error). The essential code in grl is:
cases <- eval(mf.cases, data, enclos = sys.frame(sys.parent()))
Ax <- Axp <- cases
y[v!=0] + log(A0) + log(n[v!=0]) - log(Ax[v!=0]) - log(n[v==0])
Essentially, the function takes the cases column, assigns it to Ax, and then tries to run log(Ax[v!=0]) (see your error).
It is known that taking the log of a negative number will yield a NaN, as stated by your error:
log(-1)
# [1] NaN
# Warning message:
# In log(-1) : NaNs produced
Therefore, I think that your cases column contains negative numbers.
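As a small illustration of the check above, here is a sketch on a made-up vector. The values in cases are hypothetical stand-ins for breast$cases, not the real data; the point is only to show how to locate the offending rows and confirm that they are the ones producing NaN.

```r
# Hypothetical stand-in for breast$cases, with one negative entry and one NA
cases <- c(12, 30, -1, 25, NA)

# Same tabulation as suggested in the answer
table(cases < 0, useNA = "always")

# Positions of the negative values
neg_idx <- which(cases < 0)
neg_idx

# log() of a negative number is NaN; log(NA) stays NA
logs <- suppressWarnings(log(cases))
is.nan(logs)
```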

Correct usage of stats4::mle

I want to use the stats4::mle function to estimate the two best-fitting parameters of a distribution.
I would like to be sure my usage is correct and get guidance on avoiding this error:
"Error in optim(start, f, method = method, hessian = TRUE, ...) :
initial value in 'vmmin' is not finite
In addition: Warning message:
In log(mu) : NaNs produced"
The function I would like to fit is exp(beta0*a + beta1*b), and I want to estimate the betas.
Sample code:
a <- mydata$a # first variable
b <- mydata$b # second variable
y <- mydata$y # observed result
nll <- function(beta0, beta1) {
mu = y - exp(beta0 * a + beta1 * b)
- sum(log(mu))
}
est <- stats4::mle(minuslog = nll, start = list(beta0 = 0.0001, beta1 = 0.0001))
est
So:
1. Is this the correct way of doing things?
2. For the error, I understand this is due to values of mu reaching 0 or below, but I don't know what I can do about it.
Thanks for your help.
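One way to see where the "initial value in 'vmmin' is not finite" message comes from is to evaluate the objective at the starting values before calling mle(). The sketch below uses synthetic data (a, b, and y are made up, since mydata is not shown) and only diagnoses the problem; it does not claim to be the right likelihood for the model.

```r
set.seed(1)
# Synthetic stand-ins for mydata$a, mydata$b, mydata$y (assumptions)
a <- runif(20)
b <- runif(20)
y <- c(0.5, runif(19, min = 2, max = 5))  # one observation below 1

# Same objective as in the question
nll <- function(beta0, beta1) {
  mu <- y - exp(beta0 * a + beta1 * b)
  -sum(log(mu))
}

# At beta0 = beta1 = 0.0001, exp(...) is slightly above 1, so any y below 1
# gives a negative mu, log(mu) is NaN, and the whole sum is NaN.
start_val <- suppressWarnings(nll(0.0001, 0.0001))
is.finite(start_val)
```

If this check fails on the real data, optim() cannot start, whatever optimisation method is chosen.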

Error using random forest (MICE package) during imputation

I would like to use the random forest method to impute missing values. I have read some papers that claim that MICE with random forest performs better than parametric MICE.
In my case, I already ran a model with the default mice method, got the results and played with them. However, when I switched the method to random forest, I got an error and I'm not sure why. I've seen some questions about errors with random forest and mice, but none of them match my case. My variables have more than a single NA.
imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
Does anyone have any idea why I'm getting this error?
EDIT
I tried changing all variables to numeric instead of using dummy variables; it returned the same error plus some warnings:
impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac CliForm
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
In addition: There were 50 or more warnings (use warnings() to see the first 50)
50: In randomForest.default(x = xobs, y = yobs, ntree = 1, ... :
The response has five or fewer unique values. Are you sure you want to do regression?
EDIT1
I also tried with only 5 imputations and a smaller subset of the data (only 2000 rows) and got a few different errors:
> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)
iter imp variable
1 1 Vac Radio Origin Job Alc Smk Drugs Prison Commu Hmless Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign
function call (arg 11)
In addition: Warning messages:
1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
2: In max(ncat) : no non-missing arguments to max; returning -Inf
3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion
I also encountered this error when I had only one fully observed variable, which I'm guessing is the cause in your case too. My colleague Anoop Shah provided me with a fix (below) and Prof van Buuren (mice's author) has said he will include it in the next update of the package.
In R type the following to enable you to redefine the rf impute function.
fixInNamespace("mice.impute.rf", "mice")
The corrected function to paste in is then:
mice.impute.rf <- function (y, ry, x, ntree = 100, ...){
    ntree <- max(1, ntree)
    # as.matrix() keeps a single predictor column as a one-column matrix
    xobs <- as.matrix(x[ry, ])
    xmis <- as.matrix(x[!ry, ])
    yobs <- y[ry]
    # Grow one tree and collect, for each missing case, the observed
    # y values that fall in the same terminal node (the donors)
    onetree <- function(xobs, xmis, yobs, ...) {
        fit <- randomForest(x = xobs, y = yobs, ntree = 1, ...)
        leafnr <- predict(object = fit, newdata = xobs, nodes = TRUE)
        nodes <- predict(object = fit, newdata = xmis, nodes = TRUE)
        donor <- lapply(nodes, function(s) yobs[leafnr == s])
        return(donor)
    }
    forest <- sapply(1:ntree, FUN = function(s) onetree(xobs,
        xmis, yobs, ...))
    # Draw one donor at random per missing case, pooling over all trees
    impute <- apply(forest, MARGIN = 1, FUN = function(s) sample(unlist(s),
        1))
    return(impute)
}
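The following base-R sketch shows the mechanism that I believe is behind the "argument is of length zero" error; the link to the internals of randomForest is an assumption on my part. Subsetting a one-column matrix with x[ry, ] silently drops it to a plain vector, nrow() of a vector is NULL, and a test such as if (n == 0) on NULL fails with exactly that message. The toy x and ry below are made up.

```r
# Toy predictor matrix with a single column, and a made-up response indicator
x  <- matrix(c(1.5, 2.0, 3.2, 4.1, 5.7), ncol = 1)
ry <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

dropped <- x[ry, ]                  # becomes a plain numeric vector
n_dropped <- nrow(dropped)          # NULL, so `if (n == 0)` has length zero

wrapped <- as.matrix(x[ry, ])       # the fix above: back to a 3 x 1 matrix
kept    <- x[ry, , drop = FALSE]    # alternative: never drop the dimension

# Reproduce the error message without randomForest
msg <- tryCatch(if (n_dropped == 0) "empty", error = function(e) conditionMessage(e))
msg
```

Both as.matrix() and drop = FALSE give a proper one-column matrix in this case; drop = FALSE also preserves the shape when only a single row is selected from a wider matrix.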

Error when using midasr package

I've made a minimal reproducible example for the problem I'm facing.
Data for Y(monthly dependent variable):
monthlytest <- c(-.035, 0.455)
ytest <- ts(monthlytest, start=c(2008,8), frequency=12)
Data for X(daily explanatory variable):
lol1 <- paste(2008, sprintf("%02s",rep(1:12, each=30)), sprintf("%02s", 1:30), sep="-") [211:270]
lol2 <- seq(0.015, 0.078, length.out=60)
xtest <- zoo(lol2, order.by = lol1)
Load package:
library(midasr)
library(zoo)
Run regression:
beta <- midas_r(ytest ~ mls(ytest, 1, 1) + mls(xtest, 3:30, 30))
When this final line of code is run I get this error, what am I doing wrong?
Error in matrix(NA, nrow = n - nrow(X), ncol = ncol(X)) :
invalid 'nrow' value (< 0)
The error is produced by the function mls:
> mls(xtest, 3:30, 30)
Error in matrix(NA, nrow = n - nrow(X), ncol = ncol(X)) :
invalid 'nrow' value (< 0)
This happens because mls expects a numeric argument. Converting xtest to numeric solves the problem:
xtn <- as.numeric(xtest)
beta <- midas_r(ytest ~ mls(ytest, 1, 1) + mls(xtn, 3:30, 30))
This produces another error:
Error in prepmidas_r(y, X, mt, Zenv, cl, args, start, Ofunction, user.gradient, :
argument "start" is missing, with no default
This means that you did not specify start, which is mandatory for the function midas_r. Your model is the unrestricted MIDAS model, so you need to either use the function midas_u or supply start=NULL. But even this does not help:
> beta <- midas_r(ytest ~ mls(ytest, 1, 1) + mls(xtn, 3:30, 30),start=NULL)
Error in midas_r.fit(prepmd) :
Not possible to estimate MIDAS model, more parameters than observations
You have two low-frequency observations, which in theory allows you to estimate two parameters, but your model has 29. So you need at least 30 low-frequency observations (since you lose one observation due to the lagged dependent variable) to estimate this model.
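The count of 29 can be reproduced with a little arithmetic on the lag specifications in the formula. This is only a sketch of the bookkeeping, using base R and not midasr itself; the variable names are mine.

```r
# Lag specifications taken from the formula in the question
y_lags <- 1       # mls(ytest, 1, 1): one lag of the dependent variable
x_lags <- 3:30    # mls(xtest, 3:30, 30): 28 high-frequency lags

n_params <- length(y_lags) + length(x_lags)   # lag coefficients to estimate
n_obs    <- 2                                 # monthly observations in ytest
n_usable <- n_obs - max(y_lags)               # one is lost to the lagged y

c(parameters = n_params, usable_observations = n_usable)
```

With one usable observation and 29 lag coefficients, the model cannot be estimated regardless of how the data are passed in.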

Fitting a 3 parameter Weibull distribution

I have been doing some data analysis in R and I am trying to figure out how to fit my data to a 3 parameter Weibull distribution. I found how to do it with a 2 parameter Weibull but have come up short in finding how to do it with a 3 parameter.
Here is how I fit the data using the fitdistr function from the MASS package:
y <- fitdistr(x[[6]], 'weibull')
x[[6]] is a subset of my data and y is where I am storing the result of the fitting.
First, you might want to look at the FAdist package. However, it is not hard to go from rweibull3 to rweibull:
> rweibull3
function (n, shape, scale = 1, thres = 0)
thres + rweibull(n, shape, scale)
<environment: namespace:FAdist>
and similarly from dweibull3 to dweibull
> dweibull3
function (x, shape, scale = 1, thres = 0, log = FALSE)
dweibull(x - thres, shape, scale, log)
<environment: namespace:FAdist>
so we have this
> x <- rweibull3(200, shape = 3, scale = 1, thres = 100)
> fitdistr(x, function(x, shape, scale, thres)
    dweibull(x-thres, shape, scale), list(shape = 0.1, scale = 1, thres = 0))
      shape          scale          thres
    2.42498383     0.85074556   100.12372297
 (  0.26380861) (  0.07235804) (  0.06020083)
Edit: As mentioned in the comments, various warnings and errors appear when trying to fit the distribution in this way:
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
non-finite finite-difference value [3]
There were 20 warnings (use warnings() to see them)
Error in optim(x = c(60.7075705026659, 60.6300379017397, 60.7669410153573, :
L-BFGS-B needs finite values of 'fn'
In dweibull(x, shape, scale, log) : NaNs produced
For me, at first it was only "NaNs produced", and since this was not the first time I had seen it and the estimates were good, I assumed it was not important. After some searching, it turned out to be quite a common problem, and I could find neither a cause nor a solution. One alternative could be the stats4 package and its mle() function, but that seemed to have some problems too. What I can offer is a modified version of code by danielmedic, which I have checked a few times:
thres <- 60
x <- rweibull(200, 3, 1) + thres
EPS <- sqrt(.Machine$double.eps) # "epsilon" for very small numbers
# Log-likelihood of the 3-parameter Weibull (data shifted by thres)
llik.weibull <- function(shape, scale, thres, x)
{
    sum(dweibull(x - thres, shape, scale, log = TRUE))
}
thetahat.weibull <- function(x)
{
    if (any(x <= 0)) stop("x values must be positive")
    # Objective for the optimizer: negative log-likelihood
    toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)
    # Starting values for shape, scale and threshold
    mu <- mean(log(x))
    sigma2 <- var(log(x))
    shape.guess <- 1.2 / sqrt(sigma2)
    scale.guess <- exp(mu + (0.572 / shape.guess))
    thres.guess <- 1
    # lower = EPS is recycled, so all three parameters are kept positive
    res <- nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower = EPS)
    c(shape = res$par[1], scale = res$par[2], thres = res$par[3])
}
thetahat.weibull(x)
    shape     scale     thres
 3.325556  1.021171 59.975470
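The non-finite values reported by optim in the earlier attempt are consistent with how the shifted log-likelihood behaves at the edge of the parameter space. The sketch below is an illustration on freshly simulated data, not a diagnosis of the original run: as soon as thres exceeds the smallest observation, at least one shifted value is negative, its log-density is -Inf, and the whole sum is non-finite. A non-positive shape gives NaN instead.

```r
set.seed(123)
x <- rweibull(50, shape = 3, scale = 1) + 60   # simulated data, true threshold 60

# Same shifted log-likelihood as in llik.weibull above
llik <- function(shape, scale, thres, x) {
  sum(dweibull(x - thres, shape, scale, log = TRUE))
}

ll_ok  <- llik(3, 1, thres = 59.9, x = x)             # threshold below min(x): finite
ll_bad <- llik(3, 1, thres = min(x) + 0.01, x = x)    # threshold above min(x): -Inf
ll_nan <- suppressWarnings(llik(-1, 1, thres = 59.9, x = x))  # invalid shape: NaN

c(ok = ll_ok, bad = ll_bad, nan = ll_nan)
```

An optimizer that steps into either region sees a non-finite objective, which is why bounding the parameters (as nlminb does with lower) helps.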
An alternative is the "lmom" package, which estimates the parameters by the method of L-moments:
library(lmom)
thres <- 60
x <- rweibull(200, 3, 1) + thres
moments = samlmu(x, sort.data = TRUE)
log.moments <- samlmu( log(x), sort.data = TRUE )
weibull_3parml <- pelwei(moments)
weibull_3parml
     zeta      beta     delta
59.993075  1.015128  3.246453
However, I don't know how to compute goodness-of-fit statistics with this package or with the solution above. Other packages make goodness-of-fit statistics easy to obtain. In any case, you can use alternatives such as ks.test or chisq.test.
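As a rough sketch of the ks.test route, the fitted three-parameter Weibull can be tested by shifting the data by the threshold and comparing against pweibull. The parameter values below are placeholders (I use the true generating values in place of fitted estimates), and the usual caveat applies: the p-value is only approximate when the parameters were estimated from the same data.

```r
set.seed(42)
thres <- 60
x <- rweibull(200, shape = 3, scale = 1) + thres   # simulated data

# Placeholder "estimates" (assumption: substitute your fitted values here)
shape_hat <- 3
scale_hat <- 1
thres_hat <- 60

# Kolmogorov-Smirnov test of the shifted data against a 2-parameter Weibull
res <- ks.test(x - thres_hat, "pweibull", shape = shape_hat, scale = scale_hat)
res$statistic
res$p.value
```

A small p-value would indicate that the fitted distribution does not describe the data well.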
