R: Profile-likelihood based confidence intervals

I am using the function plkhci from library Bhat to construct profile-likelihood based confidence intervals, and I get this warning when I run
r <- dfp(x,f=nlogf)
Warning message: In dqstep(list(label = x$label, est = btrf(xt, x$low,
x$upp), low = x$low, : oops: unable to find stepsize, use default
Can I ignore this warning, given that I still get the output?
Here is the complete code:
library(Bhat)
beta0<--8
beta1<-0.03
gamma<-0.0105
alpha<-0.05
n<-100
u<-runif(n)
u
x<-rnorm(n)
x
c<-rexp(100,1/1515)
c
t1<-(1/gamma)*log(1-((gamma/(exp(beta0+beta1*x)))*(log(1-u))))
t1
t<-pmin(t1,c)
t
delta<-1*(t1>c)
delta
length(delta)
cp<-length(delta[delta==1])/n
cp
delta[delta==1]<-ifelse(rbinom(length(delta[delta==1]),1,0.5),1,2)
delta
deltae<-ifelse(delta==0, 1,0)
deltar<-ifelse(delta==1, 1,0)
deltai<-ifelse(delta==2, 1,0)
dat=data.frame(t,delta, deltae,deltar,deltai,x)
dat$interval[delta==2] <- as.character(cut(dat$t[delta==2], breaks=seq(0, 600, 100)))
labs <- cut(dat$t[delta==2], breaks=seq(0, 600, 100))
dat$lower[delta==2]<-as.numeric( sub("\\((.+),.*", "\\1", labs) )
dat$upper[delta==2]<-as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) )
data0<-dat[which(dat$delta==0),]#uncensored data
data1<-dat[which(dat$delta==1),]#right censored data
data2<-dat[which(dat$delta==2),]#interval censored data
nlogf<-function(para)
{
b0<-para[1]
b1<-para[2]
g<-para[3]
e<-sum((b0+b1*data0$x)+g*data0$t+(1/g)*exp(b0+b1*data0$x)*(1-exp(g*data0$t)))
r<-sum((1/g)*exp(b0+b1*data1$x)*(1-exp(g*data1$t)))
i<-sum(log(exp((1/g)*exp(b0+b1*data2$x)*(1-exp(g*data2$lower)))-exp((1/g)*exp(b0+b1*data2$x)*(1-exp(g*data2$upper)))))
l<-e+r+i
return(-l)
}
x <- list(label=c("beta0","beta1","gamma"),est=c(-8,0.03,0.0105),low=c(-10,0,0),upp=c(10,1,1))
r <- dfp(x,f=nlogf)
x$est <- r$est
plkhci(x,nlogf,"beta0")
plkhci(x,nlogf,"beta1")
plkhci(x,nlogf,"gamma")

I am giving you a long answer, but it will help you see that you can chase down your own error messages most of the time (sometimes this way of looking at functions will not work). It is good to see what is happening inside a method when it throws a warning, because sometimes it is fine and sometimes you need to fix your data.
This function is REALLY involved! You can look at it by typing dfp into the R command line (NO TRAILING PARENTHESES) and it will print out the whole function.
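For example (dqstep and logit.hessian are documented Bhat functions, so the same trick works for them; getAnywhere() is only needed if a function is not exported):
library(Bhat)
dfp                   # typing the name with no parentheses prints the source
dqstep                # same trick for the function named in your warning
getAnywhere(dqstep)   # fallback that also finds unexported functions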
17 lines from the end, you will see an assignment:
del <- dqstep(x, f, sens = 0.01)
You can see that this calls the function dqstep, which is reflected in your warning.
You can view this function by typing dqstep at the R command line again. Reading through it (also long, but not so tedious), there is this section of boolean logic:
if (r < 0 | is.na(r) | b == 0) {
warning("oops: unable to find stepsize, use default")
cat("problem with ", x$label[i], "\n")
break
}
This is the culprit: it produces the message you are getting. The line right above it spells out how r is calculated. You are feeding this function your x from the prior call plus a sensitivity value (which I assume dfp generates; it is huge and ugly, so I did not untangle all of it). When that nested code computes an r value below zero, an r value of NA, or a b value of zero, the message is displayed.
The second part of the message tells you that it was likely b == 0, because b is in the denominator and the division returned an infinite value, so no step size is returned from this nested function to the variable del in dfp.
The step size is then fed into this call:
h <- logit.hessian(x, f, del, dapprox = FALSE, nfcn)
which you can look into by typing logit.hessian into the R commandline.
When you do, you see that del is a step size on the logit scale, with a default value of del = rep(0.002, length(x$est)), which the function sets for you because running dqstep returned no value.
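If you want to see the failure in isolation, you can call dqstep yourself (a sketch: dfp actually hands dqstep a back-transformed copy of your list, as the warning shows, so this is only an approximation of what happens inside):
del <- dqstep(x, nlogf, sens = 0.01)   # same sens value dfp uses
del   # a warning or NA here is what makes dfp fall back to rep(0.002, length(x$est))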
So, you now get to decide if using that step size in the calculation of your confidence interval seems right or if there is a problem with your data which needs resolving to make this work better for you.
When I ran it, line by line, I got this message:
Error in if (denom <= 0) { : missing value where TRUE/FALSE needed
at this line of code:
r <- dfp(x,f=nlogf(x))
Which makes me think I was correct.
That is how I chase down issues I have with messages from packages when I get a message like yours.
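For completeness, a generic version of the same chase using only base-R debugging tools (nothing Bhat-specific):
options(warn = 2)       # promote warnings to errors so they halt execution
r <- dfp(x, f = nlogf)  # re-run the offending call
traceback()             # print the call stack that led to the warning
options(warn = 0)       # restore the default behaviour
# debug(dqstep)         # or step through the suspect function interactively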

Related

Error related to randomisation test within lapply() function in R

I have 30 datasets that are combined in a data list. I want to analyze the spatial point patterns with the L function along with a randomisation test. The code follows.
The first block works well for a single dataset (data1), but once it is applied to a list of datasets with lapply() as shown in the second block, it gives me a very long error like so:
"Error in Kcross(X, i, j, ...) : No points have mark i = Acoraceae
Error in envelopeEngine(X = X, fun = fun, simul = simrecipe, nsim =
nsim, : Exceeded maximum number of errors"
Can anybody tell me what is wrong with the second block?
library(spatstat)
grp <- factor(data1$species)
window <- ripras(data1$utmX, data1$utmY)
pp.grp <- ppp(data1$utmX, data1$utmY, window=window, marks=grp)
L.grp <- alltypes(pp.grp, Lest, correlation = "Ripley")
LE.grp <- alltypes(pp.grp, Lcross, nsim = 100, envelope = TRUE)
plot(L.grp)
plot(LE.grp)
L.LE.sp <- lapply(data.list, function(x) {
  grp <- factor(x$species)
  window <- ripras(x$utmX, x$utmY)
  pp.grp <- ppp(x$utmX, x$utmY, window = window, marks = grp)
  L.grp <- alltypes(pp.grp, Lest, correlation = "Ripley")
  LE.grp <- alltypes(pp.grp, Lcross, envelope = TRUE)
  result <- list(L.grp = L.grp, LE.grp = LE.grp)
  return(result)
})
plot(L.LE.sp$LE.grp[1])
This question is about the R package spatstat.
It would help if you could add a minimal working example, including data, which demonstrates this problem.
If that is not available, please generate the error on your computer, then type traceback() and capture the output and post it here. This will trace the location of the error.
Without this information, my best guess is the following:
The error message says No points have mark i=Acoraceae. That means that the code is expecting a point pattern to include points of type Acoraceae but found that there were none. This can happen because in alltypes(... envelope=TRUE) the code generates random point patterns according to complete spatial randomness. In the simulated patterns, the number of points of type Acoraceae (say) will be random according to a Poisson distribution with a mean equal to the number of points of type Acoraceae in the observed data. If the number of Acoraceae in the actual data is small then there is a reasonable chance that the simulated pattern will contain no Acoraceae at all. This is probably what is causing the error message No points have mark i=Acoraceae.
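If this guess is right, you can see it directly in your data by counting points per species in the pattern you already built (using the objects from the question's single-dataset code):
sort(table(marks(pp.grp)))   # species with very few points are the ones likely to vanish in a CSR simulation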
If this interpretation is correct then you should be able to suppress the error by including the argument fix.marks=TRUE, that is,
alltypes(pp.grp, Lcross, envelope=TRUE, fix.marks=TRUE, nsim=99)
I'm not suggesting this is necessarily appropriate for your application, but this should remove the error message if my guess is correct.
In the latest development version of spatstat, available on github, the code for envelope has been tweaked to detect this error.
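If you need the lapply() over all 30 datasets to finish even when one of them is problematic, a defensive variant of your code (a sketch combining the fix.marks suggestion above with tryCatch) would be:
L.LE.sp <- lapply(data.list, function(x) {
  tryCatch({
    grp <- factor(x$species)
    window <- ripras(x$utmX, x$utmY)
    pp.grp <- ppp(x$utmX, x$utmY, window = window, marks = grp)
    list(L.grp = alltypes(pp.grp, Lest),
         LE.grp = alltypes(pp.grp, Lcross, envelope = TRUE,
                           fix.marks = TRUE, nsim = 99))
  }, error = function(e) e)   # keep the error object instead of aborting the whole loop
})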

coxph : "**Error in if (any(infs)) warning(paste("Loglik converged before variable ", : missing value where TRUE/FALSE needed"**

I am doing a multi-touch attribution problem using the coxph() function. It's a large dataset with around 1 million rows, but currently I am running a subset of ~100,000.
I have removed all the missing values from my data, yet I am getting this error:
Error in if (any(infs)) warning(paste("Loglik converged before variable ",
:missing value where TRUE/FALSE needed
Here is the Cox function:
library(survival)
SurvObj <- Surv(Final_Data$NormalizedStartTime,Final_Data$NormalizedEndTime,event = Final_Data$Converted)
model2 <- coxph(SurvObj ~ Clicks + RFR + Impressions + Other + `Site-ID` + `Creative-ID`, data = Final_Data1)
Thanks in Advance for the help :)
[Screenshot: the error and the summary of Final_Data]
The line above, "Loglik" and so on, is meant to give information about a dubious test, where loglik converges beforehandedly.
The full line, when produced correctly, should be something akin to the following:
"Loglik converged before variable 100; beta may be infinite."
It is produced by the following code in agreg.Rnw:
https://r-forge.r-project.org/scm/viewvc.php/pkg/survival/noweb/agreg.Rnw?diff_format=c&sortdir=down&sortby=author&revision=11529&root=survival&view=markup
if (any(infs))
warning(paste("Loglik converged before variable ",
paste((1:nvar)[infs],collapse=","),
"; beta may be infinite. "))
From here we can see that any() must yield TRUE or FALSE for the if() to work. If infs contains NaN, any(infs) returns NA, and the if() cannot evaluate it.
The inner part functions like this:
paste("Loglik converged before variable ",
paste((1:1)[NaN],collapse=","),
"; beta may be infinite. ")
[1] "Loglik converged before variable NA ; beta may be infinite. "
So the inner paste() would run fine, if the code could ever reach it. But it does not, since evaluating
infs <- NaN
if (any(infs))
warning(paste("Loglik converged before variable ",
paste((1:nvar)[infs],collapse=","),
"; beta may be infinite. "))
Error in if (any(infs)) warning(paste("Loglik converged before variable ", :
missing value where TRUE/FALSE needed
gives the exact error you had. The infs variable is generated earlier, via infs <- abs(agfit$u %*% var), and agfit is produced via .Call(Cagfit4, ...), so the problem lies in the underlying C code for the function.
For some of my data, both the agfit$u and agfit$imat are NaNs. The $u and $imat are generated from
u2 = SET_VECTOR_ELT(rlist, 1, allocVector(REALSXP, nvar));
u = REAL(u2);
and
PROTECT(imat2 = allocVector(REALSXP, nvar*nvar));
nprotect =1;
if (NAMED(covar2)>0) {
PROTECT(covar2 = duplicate(covar2));
nprotect++;
}
covar= dmatrix(REAL(covar2), nused, nvar);
imat = dmatrix(REAL(imat2), nvar, nvar);
respectively, in the agfit4 C code. I am not that good in C, so I cannot say what the problem is there. It could be a bug, or the Cox function may simply not be usable for your data, or both. Nevertheless, something should be done about this, since I've seen others asking about this error too. Unfortunately I am not skilled enough to fix it; I can only point to the problem and holler "hey! somebody else please take care of this" :-).
My possible solutions would be:
1) check if your data is usable with the Cox function at all (e.g. if you have 2000 cases of 0 and 2 cases of 1, the Cox function may not be suitable anyway, and the error is suggesting you find another way for the analysis :-) )
2) modify the code so the any(infs) evaluation removes the NAs, resulting in FALSE and skipping the error, via the following (see the small demonstration after this list):
if (any(infs, na.rm=T)) (this could screw up the code, though)
3) fix the C code so that the agfit4 does not produce NaNs in the output object. (only for the skilled, not me)
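To see exactly what option 2 changes (plain base-R behaviour):
infs <- NaN
any(infs)                # NA, so if (any(infs)) fails with "missing value where TRUE/FALSE needed"
any(infs, na.rm = TRUE)  # FALSE, so the warning branch is simply skipped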
I had this problem too. It turns out one of my variables had infinite values in it. The problem was solved once I replaced those values.
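A quick way to screen a data frame for such values before calling coxph (a sketch; Final_Data1 is the data frame from the question's model call):
sapply(Final_Data1[sapply(Final_Data1, is.numeric)],
       function(z) sum(!is.finite(z)))   # count of NA/NaN/Inf per numeric column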

R: Error "incorrect number of subscripts on matrix" when trying boot with roc

I am using RStudio and trying to use roc from package pROC with boot for bootstrapping. I am following the code at this link. The code from that link uses another function with boot, which works fine, but when I try roc, it gives an error.
Below is my code. (In the output I am printing the dimensions of the sample to see how many times re-sampling is done. Here R=5, so sampling is done 6 times and then the error occurs.)
library(boot)
library(pROC)
roc_boot <- function(D, d) {
  E = D[d, ]
  print(dim(E))
  return(roc(E$x, E$y))
}
x = round(runif(100))
y = runif(100)
D = data.frame(x, y)
b = boot(D, roc_boot, R=5)
Output:
[1] 100 2
[1] 100 2
[1] 100 2
[1] 100 2
[1] 100 2
[1] 100 2
Error in boot(D, roc_boot, R = 5) :
incorrect number of subscripts on matrix
What is the problem here?
If I replace roc with some other function like sum, then it works perfectly (it prints the 6 lines without any error). It also gives different answers when booted multiple times (while keeping D the same).
Please notice that the error occurs after all the re-sampling is done. I cannot find the source of this particular error. I have looked at other answers like this one, but they don't seem to apply to my case. Can someone also explain why this error occurs and what it means, generally?
EDIT:
I returned only the area under the curve using the following function:
roc_boot <- function(D, d) {
  E = D[d, ]
  objectROC <- roc(E$x, E$y)
  return(objectROC$auc)
}
This gives a value for the area under the curve, but it is the same as the answer without bootstrapping, meaning there is no improvement. I thought I needed to return the whole roc object for the bootstrapping to give any improvement.
It turns out you can't return a roc object from the statistic function in boot; it has to return a numeric value. So the following modification gets rid of the error (as edited into the question):
roc_boot <- function(D, d) {
  E = D[d, ]
  objectROC <- roc(E$x, E$y)
  return(objectROC$auc)
}
Moreover, as suggested by @Calimo, boot only gives you a confidence interval; it does not improve the point estimate itself. In my case, there is a slight improvement in the confidence interval.
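With the numeric statistic in place, the bootstrap confidence interval comes from boot.ci (a sketch using the objects defined in the question; R = 2000 is an arbitrary choice):
library(boot)
library(pROC)
b <- boot(D, roc_boot, R = 2000)
boot.ci(b, type = "perc")   # percentile bootstrap CI for the AUC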

Estimate parameters of Frechet distribution using mmedist or fitdist(with mme) error

I'm relatively new to R and I would appreciate it if you could take a look at the following code. I'm trying to estimate the shape parameter of the Frechet distribution (or inverse Weibull) using mmedist (I also tried fitdist, which calls mmedist), but I get the following error:
Error in mmedist(data, distname, start = start, fix.arg = fix.arg, ...) :
the empirical moment function must be defined.
The code that I use is below:
require(actuar)
library(fitdistrplus)
library(MASS)
#values
n=100
scale = 1
shape=3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
memp=minvweibull(c(1,2), shape=3, rate=1, scale=1)
# estimating the parameters
para_lm = mmedist(data_fre,"invweibull",start=c(shape=3,scale=1),order=c(1,2),memp = "memp")
Please note that I have tried changing the code many times to see if my mistake was in the syntax, but I always get the same error.
I'm aware of the example in the documentation. I've tried that as well, but with no luck. Please note that for the method to work, the order of the moment must be smaller than the shape parameter.
The example is the following:
require(actuar)
#simulate a sample
x4 <- rpareto(1000, 6, 2)
#empirical raw moment
memp <- function(x, order)
ifelse(order == 1, mean(x), sum(x^order)/length(x))
#fit
mmedist(x4, "pareto", order=c(1, 2), memp="memp",
start=c(shape=10, scale=10), lower=1, upper=Inf)
Thank you in advance for any help.
You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist.
The first change you need to make is on line 94 of mmedist:
if (!exists("memp", mode = "function"))
That line checks whether "memp" is a function that exists, as opposed to whether the argument that you have actually passed exists as a function.
if (!exists(as.character(expression(memp)), mode = "function"))
The second change relates to the fact that the optim routine actually calls funobj, which calls DIFF2, which in turn (see line 112) calls the user-supplied memp function -- minvweibull in your case -- with two arguments: obs, which resolves to your data, and order. Since minvweibull does not take the data as its first argument, this fails.
This is expected, as the help page tells you:
memp A function implementing empirical moments, raw or centered but
has to be consistent with distr argument. This function must have
two arguments : as a first one the numeric vector of the data and as a
second the order of the moment returned by the function.
How can you fix this? Pass the function moment from the moments package. Here is the complete code (assuming that you have made the change above and created a new function called foo_mmedist):
# values
n = 100
scale = 1
shape = 3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
# estimating the parameters
para_lm = foo_mmedist(data_fre, "invweibull",
start= c(shape=5,scale=2), order=c(1, 2), memp = moment)
You can check that optimization has occurred as expected:
> para_lm$estimate
shape scale
2.490816 1.004128
Note, however, that this actually reduces to a crude way of doing an overdetermined method of moments, and I am not sure that it is theoretically appropriate.
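For reference, an equivalent empirical-moment function that you could pass instead of moments::moment, written to match the documented two-argument signature (a sketch, still assuming the foo_mmedist change above):
memp2 <- function(x, order) sum(x^order) / length(x)   # raw empirical moment
para_lm <- foo_mmedist(data_fre, "invweibull",
                       start = c(shape = 5, scale = 2),
                       order = c(1, 2), memp = memp2)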

Rstudio - Error in user-created function - Object not found

First things first: my skills in R are somewhat lacking, so there is a chance I may be using something incorrectly in the following. If I go wrong somewhere, please let me know.
I've been having a problem in RStudio where I create two functions for formulae, then use nls() to build a model from them, with which I want to make a plot. When I try to run the line that creates the model, I get an error message saying an object is missing. It is always the last argument of the first "formula" function, in this case 'p'.
I'll provide my code here, then explain what I am trying to do for a little context:
DATA <- read.csv(file.choose(), as.is=T)
formula <- function(m, h, g, p){(2*m)/(m+(sqrt(m^2+1)))*p*g*(h^2/2)}
formula.2 <- function(P, V, g){P*V*g}
m = 0.85
p = 766.42
g = 9.81
P = 0.962
h = DATA$lithothick
V = DATA$Vol
fit.1 <- nls(formula (P, V, g) ~ formula(m, h, g, p), data = DATA)
If I run it as shown, I get the error:
Error in (2 * m)/(m + (sqrt(m^2 + 1))) * p : 'p' is missing
However, it will complain about h instead if I rearrange the arguments in the formula to (m, g, p, h):
Error in h^2 : 'h' is missing
Now, what I'm trying to do is this; I have a .csv file with 3 thicknesses (0.002, 0.004, 0.006 meters) and 3 volumes (10, 25, 50 milliliters). I am trying to see how the rates of strength and buoyancy increase (in relation to each other) as the thickness and volume for each object (respectively) increases. I was hoping to come out with a graph showing the upward trend for each property (strength and buoyancy), as I believe them to be unequal (one exponential the other linear). I hope that isn't more confusing than clarifying, but any pointers would be GREATLY appreciated.
You cannot overload functions this way in R. What you can do is provide optional arguments (which is a kind of overloading), with the syntax function(mandatory, optional = "").
For what you are trying to do, you have to call formula.2 if you want the three-argument formula.
A workaround could be to use one function with one optional argument and check whether that argument has been supplied. Something like:
formula = function(m, h, g, p="") {
if (is.numeric(p)) {
(2*m)/(m+(sqrt(m^2+1)))*p*g*(h^2/2)
} else {
m*h*g
}
}
This is ugly and a very bad way to do it (your variables do not really mean the same thing from one call to the other) but it works.
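A quick check that the workaround behaves as described (values taken from the question):
formula(m = 0.85, h = 0.002, g = 9.81)              # p missing, so it falls back to m*h*g
formula(m = 0.85, h = 0.002, g = 9.81, p = 766.42)  # p supplied, so the full expression is used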
