Multivariate Copulas and uniroot error in R

I am trying to construct a multivariate copula model for time series (19 observations) of 7 risk indicators. First, I would like to find the most suitable copula for this dataset. I expect the best fit to be the survival Gumbel copula, but I still want to make sure I am on the right track. For this, I am using the VineCopula package in R. To begin with, I convert my data to uniform margins by ranking each column:
library(VineCopula)
umr= apply(dataset,2,rank)/(nrow(dataset)+1)
Then, I try to run the following function:
st_rvine= RVineStructureSelect(dataset, familyset = NA, type = 0, selectioncrit = "AIC", indeptest = FALSE, level = NA, progress = FALSE, weights = NA, treecrit = "tau",rotations = TRUE, se=FALSE, presel = TRUE, method = "mle",cores = 1)
After this however I get the following error:
Error in uniroot(function(x) tau - frankTau(x), lower = 0 + .Machine$double.eps^0.5, :
f() values at end points not of opposite sign
I still haven't found what I may be doing wrong. Can anyone guide me on this, please? Is there a mistake in my code, or is there a step that I am missing?
Thank you in advance.
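uniroot() needs f() to take opposite signs at the two ends of its bracket. The error message shows the Frank parameter being found by solving tau - frankTau(x) = 0 on a bracket that starts just above 0. Because Frank's Kendall's tau only approaches 1 as the parameter grows, an empirical tau at (or very near) +/-1 can leave no sign change on a bounded bracket, and with 19 observations a pair of strongly co-trending indicators can produce exactly that. This is my reading of the message, not a confirmed diagnosis. The base-R sketch below reimplements Frank's tau with my own helper names (debye1, frank_tau, frank_theta) and an arbitrary bracket of [0.01, 100], reproduces the error, and flags column pairs with |tau| near 1 on a small hypothetical dataset. Separately, note that the call above passes dataset rather than the rank-transformed umr; the VineCopula documentation describes the data argument as having uniform margins.

```r
# First Debye function: D1(theta) = (1 / theta) * integral from 0 to theta of t / (exp(t) - 1) dt
debye1 <- function(theta) {
  integrand <- function(t) ifelse(t == 0, 1, t / expm1(t))
  integrate(integrand, lower = 0, upper = theta)$value / theta
}

# Kendall's tau of the Frank copula for theta > 0: 1 - 4 / theta + 4 / theta * D1(theta)
frank_tau <- function(theta) 1 - 4 / theta + 4 / theta * debye1(theta)

# Invert tau -> theta on a fixed bracket (illustrative bounds, not VineCopula's)
frank_theta <- function(tau, lower = 0.01, upper = 100) {
  uniroot(function(x) tau - frank_tau(x), lower = lower, upper = upper)$root
}

theta_half <- frank_theta(0.5)          # a moderate tau inverts without trouble

# frank_tau(100) is about 0.96, so a tau of 0.99 gives no sign change on the bracket
err_msg <- tryCatch(frank_theta(0.99), error = function(e) conditionMessage(e))
print(err_msg)

# Flag pairs of columns whose empirical Kendall's tau is close to +/-1
set.seed(1)
n <- 19
trend <- cumsum(abs(rnorm(n)))                      # strictly increasing series
dataset <- cbind(a = trend, b = 2 * trend + 1,      # hypothetical indicators:
                 c = rnorm(n))                      # a and b move in lockstep
umr <- apply(dataset, 2, rank) / (nrow(dataset) + 1)
tau_mat <- cor(umr, method = "kendall")
near_one <- which(abs(tau_mat) > 0.95 & upper.tri(tau_mat), arr.ind = TRUE)
near_one
```

Running the same cor(umr, method = "kendall") check on the real 7 indicators would show whether any pair sits at the edge of what the Frank inversion can handle.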

Related

R bootnet case-dropping bootstrap stops running with no specific error message

I'm running a network analysis in R using qgraph and bootnet. When running the case-dropping bootstrap to estimate correlation-stability coefficients of centrality indices, the algorithm simply "gets stuck" with no specific error message.
I know there are issues with my data (e.g., the sample is quite small, N = 112; a subset of variables are highly correlated with each other, while a few others share little to no correlation with the rest). However, I have never had an analysis simply stop midway before, and I'm trying to figure out what exactly is causing this.
This is the current code:
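library(dplyr)    # select()
library(huge)     # huge.npn()
library(bootnet)  # estimateNetwork(), bootnet()
netdt23 <- select(dati,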
ToM,
"MT" = Mentalization,
"PpR" = Popular_Response,
"CM" = Complete_Meaning,
"PE" = Problem_Elaboration,
"PS" = Problem_Solving,
"NE" = Negative_Emotions
)
npn23 <- huge.npn(netdt23)
net23 <- estimateNetwork(
npn23,
default = "EBICglasso",
corMethod = "cor_auto",
lambda.min.ratio = 1e-15,
threshold = TRUE
)
btfun <- function(x){
nt <- estimateNetwork(x,
default = "EBICglasso",
corMethod = "cor_auto",
lambda.min.ratio = 1e-15,
threshold = TRUE
)
return(nt$graph)
}
btntCen23 <- bootnet(
npn23, fun = btfun, type = "case",
statistics = c("edge", "strength", "betweenness"), #"closeness",
nBoots = 2000,
caseMin = .05, caseMax = .95, caseN = 19)
While running, bootnet occasionally gives warnings and/or errors, but these are never the last output before it stops (apparently the analysis keeps running past these issues and stops later). Examples include:
Error in lav_samplestats_icov(COV = cov[[g]], ridge = ridge.eps, x.idx = x.idx[[g]], : lavaan ERROR: sample covariance matrix is not positive-definite
An empty network was selected to be the best fitting network. Possibly set 'lambda.min.ratio' higher to search more sparse networks. You can also change the 'gamma' parameter to improve sensitivity (at the cost of specificity).
Any help would be greatly appreciated. If I left out necessary information, please let me know and I'll edit the question.
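A quick arithmetic check fits the lavaan message. If I read the bootnet documentation correctly, caseMin and caseMax are the minimum and maximum proportions of cases to drop, so caseMax = .95 leaves only about 6 of the 112 cases for 7 nodes. A sample covariance matrix from n rows has rank at most n - 1, so it cannot be positive-definite when n <= p. The base-R sketch below lists the affected drop proportions and checks the rank on random data; the rounding rule for retained cases is my assumption, not bootnet's exact code. It does not by itself explain the hang, but it suggests the upper end of the drop range produces degenerate subsamples.

```r
n_total <- 112                                # N in the question
p <- 7                                        # number of nodes in netdt23
drop <- seq(0.05, 0.95, length.out = 19)      # caseMin, caseMax, caseN from the question
kept <- round(n_total * (1 - drop))           # assumption: about N * (1 - drop) cases retained
data.frame(drop, kept)[kept <= p, ]           # drop proportions leaving n <= p

# With n <= p rows the sample covariance matrix is rank-deficient (rank <= n - 1)
set.seed(123)
x <- matrix(rnorm(min(kept) * p), nrow = min(kept), ncol = p)
cov_rank <- qr(cov(x))$rank
min_eigen <- min(eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values)
cov_rank
min_eigen
```

If this is the cause, lowering caseMax (the bootnet default is .75, as far as I know) would keep every subsample well above 7 cases.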

Error with svyglm function in survey package in R: "all variables must be in design=argument"

I'm new to Stack Overflow. I'm working on a project with NHIS data, but I cannot get the svyglm function to work even for a simple, unadjusted logistic regression with a binary predictor and a binary outcome (ultimately I'd like to use multiple categorical predictors, but one step at a time).
El_under_glm<-svyglm(ElUnder~SO2, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in eval(extras, data, env) :
object '.survey.prob.weights' not found
I changed the variables to 0 and 1 instead:
Under_narm$SO2REG<-ifelse(Under_narm$SO2=="Heterosexual", 0, 1)
Under_narm$ElUnderREG<-ifelse(Under_narm$ElUnder=="No", 0, 1)
But then get a different issue:
El_under_glm<-svyglm(ElUnderREG~SO2REG, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in svyglm.survey.design(ElUnderREG ~ SO2REG, design = SAMPdesign, :
all variables must be in design= argument
This is the design I'm using to account for the weights -- I'm pretty sure it's correct:
SAMPdesign=svydesign(data=Under_narm, id= ~NHISPID, weight= ~SAMPWEIGHT)
Any and all assistance appreciated! I've got a good grasp of stats but am a slow coder. Let me know if I can provide any other information.
Using some make-believe sample data I was able to get your model to run by setting rescale = TRUE. The documentation states
Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use FALSE to not rescale
weights.
So one solution may simply be to set rescale = TRUE.
library(survey)
# sample data
Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),
ElUnder = sample(0:1, 1000, replace = TRUE),
NHISPID = paste0("id", 1:1000),
SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))
# with 'rescale' = TRUE
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(), # this family avoids warnings
rescale=TRUE) # Weights rescaled to the sum of the sample size.
summary(El_under_glm, correlation = TRUE) # use correlation with summary()
Otherwise, looking at the code for this method with survey:::svyglm.survey.design, there may be a bug. I could be wrong, but as I read it, when rescale is FALSE, .survey.prob.weights never gets assigned a value.
if (is.null(g$weights))
g$weights <- quote(.survey.prob.weights)
else g$weights <- bquote(.survey.prob.weights * .(g$weights)) # bug?
g$data <- quote(data)
g[[1]] <- quote(glm)
if (rescale)
data$.survey.prob.weights <- (1/design$prob)/mean(1/design$prob)
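The rescale step in the excerpt divides the weights 1/design$prob by their mean, so they sum to the sample size, as the quoted documentation says (as I understand it, svydesign stores prob as 1/weight). A base-R sketch with hypothetical data, using plain glm rather than svyglm, also shows why rescaling is harmless for point estimates: multiplying all weights by a constant leaves the glm coefficients unchanged. This only concerns point estimates, not the design-based standard errors.

```r
set.seed(2)
n <- 200
w <- sample(c(0.5, 2), n, replace = TRUE)    # hypothetical sampling weights
prob <- 1 / w                                # selection probabilities as 1 / weight
rescaled <- (1 / prob) / mean(1 / prob)      # the rescale = TRUE formula from the excerpt
sum(rescaled)                                # equals n up to rounding

# Same model with raw and rescaled weights
x <- factor(rep(1:2, length.out = n))
y <- rbinom(n, 1, 0.4)
fit_raw <- glm(y ~ x, family = quasibinomial(), weights = w)
fit_rescaled <- glm(y ~ x, family = quasibinomial(), weights = rescaled)
# Differences are at the level of the IRLS convergence tolerance
all.equal(coef(fit_raw), coef(fit_rescaled), tolerance = 1e-6)
```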
There may be a workaround: assign a vector of numeric values to .survey.prob.weights in the global environment. I have no idea what these values should be, but your error goes away if you do something like the following. (.survey.prob.weights needs one value per row of the data; note that rep(1:2, 1000) in the sample data makes Under_narm 2000 rows long, because data.frame recycles the shorter columns.)
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
.survey.prob.weights <- rep(1, nrow(Under_narm))  # one value per row (2000 here)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(),
rescale=FALSE)
summary(El_under_glm, correlation = TRUE)

How to use covariates in rddtools rdd_reg_lm function?

I am trying to run a parametric RD regression using the rddtools R package. However, the package documentation is not very clear to me.
First: the function to define an RD object is:
rdd_data(y, x, covar, cutpoint, z, labels, data)
where covar is described in the help file only as "Exogeneous variables". But what type should it be? A data frame? A list?
Second: the function rdd_reg_lm again asks for covariates, with this signature:
rdd_reg_lm(rdd_object, covariates = NULL, order = 1, bw = NULL,
slope = c("separate", "same"), covar.opt = list(strategy = c("include",
"residual"), slope = c("same", "separate"), bw = NULL),
covar.strat = c("include", "residual"), weights)
According to the help file, the covariates argument is simply a "Formula to include covariates". Again, it is not clear to me exactly how these covariates should be specified.
Moreover, is it possible to include multiple covariates in rdd_data() and rdd_reg_lm()?
I would appreciate some help here. I have already read the help and vignette files again and again and searched many blogs, and still found nothing.
I have already checked this question:
How to include a linear trend in a regression discontinuity design using rddtools
which showed me the following example:
rd.medic <- rdd_data(y = er ,x = ageyrs, covar = ageyrs, cutpoint=65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object=rd.medic, covariates = 'ageyrs', slope =
("same"), covar.opt = list("include"))
Even so, the syntax is still not clear to me, as I am trying to add multiple covariates without success.
Thanks!
You can create a data frame with your covariates and then include it in rdd_data.
covariates<-data.frame(z1=ageyrs, z2=ageyrs2)
rd.medic <- rdd_data(y = er ,x = ageyrs, covar = covariates, cutpoint=65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object=rd.medic, covariates =TRUE, slope =("same"))
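To make the "include" strategy concrete: as I understand the rddtools documentation, covar.strat = "include" adds the covariates as extra regressors, while "residual" first residualises the outcome on them. The base-R lm sketch below uses simulated data and hypothetical variable names and is not rddtools' internal code; it shows the include idea with two covariates and separate slopes on each side of the cutoff (dropping the D:xc interaction would correspond to a common slope).

```r
set.seed(42)
n <- 500
cutpoint <- 65
x <- runif(n, 55, 75)                 # running variable (e.g. age)
z1 <- rnorm(n)                        # hypothetical continuous covariate
z2 <- rbinom(n, 1, 0.5)               # hypothetical binary covariate
D <- as.numeric(x >= cutpoint)        # treatment indicator at the cutoff
xc <- x - cutpoint                    # centred running variable
y <- 1 + 2 * D + 0.1 * xc + 0.5 * z1 - 0.3 * z2 + rnorm(n, sd = 0.5)

# Covariates enter as extra regressors; D * xc = D + xc + D:xc (separate slopes)
fit <- lm(y ~ D * xc + z1 + z2)
coef(fit)[c("D", "z1", "z2")]         # the coefficient on D is the estimated jump
```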

Using tsboot to obtain confidence interval from a regression with lags

I would like to bootstrap the regression coefficients of a return model that includes two lags.
I have a vector snp_ret of returns obtained with quantmod. The data look like this:
head(snp_ret)
ret
1998-10-13 -0.2920975
1998-10-14 1.0728374
1998-10-15 4.0882022
1998-10-16 0.8489058
1998-10-19 0.5635226
1998-10-20 0.1448549
Bootstrapping the coefficients should be simple:
library(boot)  # tsboot()
library(dyn)   # dyn() for regressions on lagged zoo series
getC=function(myData){
return(coef(lm(formula = dyn(ret ~ lag(ret, c(-1,-9))), data=myData) ))
}
tsboot(snp_ret, getC, R = 100, l = 18, sim = "fixed")
The following error appears:
Error in merge.zoo(ret, lag(ret, c(-1, -9)), retclass = "list", all
= TRUE) : series cannot be merged with non-unique index entries in a series
I suspect that it has to do with the fact that regression has two lags, but do not know how to proceed.
If possible, please help.
All right, I found a workaround, so maybe this will be interesting to somebody else: using the arima function instead of lag operators helped.
getC <- function(myData) {
reg <- suppressWarnings(arima(myData, order = c(9, 0, 0), fixed = c(NA, 0,0,0,0,0,0,0,NA,NA)))
return(coef(reg)[c(1,9,10)])  # ar1, ar9 and intercept
}
Note that arima has an unusual way of selecting lags: you have to fix at zero the coefficients of the lags you don't want to include.
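The error mentions non-unique index entries, which suggests (I have not confirmed this) that the block-resampled series keeps the original dates, so repeated blocks repeat dates and merge.zoo refuses them. Another possible workaround, sketched below with simulated returns standing in for snp_ret, strips the time index and builds lags 1 and 9 with base R's embed(), keeping lm as the estimator. The helper name lag_coefs is hypothetical.

```r
library(boot)

set.seed(1)
# Simulated daily returns standing in for snp_ret (hypothetical data)
ret <- as.numeric(arima.sim(list(ar = 0.2), n = 500))

# OLS of r_t on r_{t-1} and r_{t-9}
lag_coefs <- function(x) {
  x <- as.numeric(x)            # drop any time index, so no duplicate dates can arise
  m <- embed(x, 10)             # column 1 = r_t, column k + 1 = r_{t-k}
  d <- data.frame(r = m[, 1], lag1 = m[, 2], lag9 = m[, 10])
  coef(lm(r ~ lag1 + lag9, data = d))
}

bt <- tsboot(ret, lag_coefs, R = 100, l = 18, sim = "fixed")
ci <- boot.ci(bt, type = "perc", index = 2)   # percentile interval for the lag-1 coefficient
ci
```

With the quantmod series, passing as.numeric(snp_ret) to tsboot should have the same effect of removing the dates before resampling.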

Power law fitted by `fitdist()` function in package `fitdistrplus`

I generate some random variates using the rplcon() function in the poweRlaw package:
data <- rplcon(1000,10,2)
Now I want to know which known distributions fit the data best. Lognormal? Exponential? Gamma? Power law? Power law with exponential cutoff?
So I use the fitdist() function from the fitdistrplus package:
fit.lnormdl <- fitdist(data,"lnorm")
fit.gammadl <- fitdist(data, "gamma", lower = c(0, 0))
fit.expdl <- fitdist(data,"exp")
Because the power law and the power law with exponential cutoff are not among the base probability distributions (according to the CRAN Task View: Probability Distributions), I wrote the d, p and q functions of the power law based on example 4 of ?fitdist:
dplcon <- function (x, xmin, alpha, log = FALSE)
{
if (log) {
pdf = log(alpha - 1) - log(xmin) - alpha * (log(x/xmin))
pdf[x < xmin] = -Inf
}
else {
pdf = (alpha - 1)/xmin * (x/xmin)^(-alpha)
pdf[x < xmin] = 0
}
pdf
}
pplcon <- function (q, xmin, alpha, lower.tail = TRUE)
{
cdf = 1 - (q/xmin)^(-alpha + 1)
if (!lower.tail)
cdf = 1 - cdf
cdf[q < round(xmin)] = 0
cdf
}
qplcon <- function(p,xmin,alpha) alpha*p^(1/(1-xmin))
Finally, I use the code below to estimate the power law parameters xmin and alpha:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=1))
But it throws an error:
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(data, "plcon", start = list(xmin = 1, alpha = 1)) :
the function mle failed to estimate the parameters,
with the error code 100
I searched Google and Stack Overflow and found many questions about similar errors, but none of the solutions I read and tried worked for my case. What should I do to estimate the parameters correctly?
Thank you to everyone who helps!
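Inverting the CDF used in pplcon, p = 1 - (q/xmin)^(1 - alpha), gives q = xmin * (1 - p)^(1/(1 - alpha)). The qplcon defined in the question (alpha*p^(1/(1-xmin))) does not appear to be this inverse. It plays no part in the mle error, but quantile-based methods such as method = "qme" call the q function. A quick base-R check, with hypothetical helper names cdf_pl and qplcon_inv:

```r
# CDF of the continuous power law, as in pplcon (lower tail, q >= xmin)
cdf_pl <- function(q, xmin, alpha) 1 - (q / xmin)^(1 - alpha)

# Solving p = 1 - (q / xmin)^(1 - alpha) for q gives the quantile function
qplcon_inv <- function(p, xmin, alpha) xmin * (1 - p)^(1 / (1 - alpha))

p <- c(0.1, 0.5, 0.9)
q <- qplcon_inv(p, xmin = 10, alpha = 2)   # equals 10 / (1 - p) when alpha = 2
cdf_pl(q, xmin = 10, alpha = 2)            # recovers p
```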
This was an interesting one. I am not entirely happy with what I found, but I will describe it and hopefully it helps.
When you call fitdist, by default it uses mledist from the same package. This in turn calls stats::optim, a general-purpose optimizer. Its return value includes a convergence code (see ?optim for details), but the 100 you see is not one of the codes optim returns. So I pulled apart the code for mledist and fitdist to find where that error code comes from. Unfortunately it is used in more than one place as a general catch-all error code. If you break the code down, after various checks, what fitdist is trying to do here is the following:
# Negative log-likelihood, as minimised by mledist
fnobj <- function(par, fix.arg, obs, ddistnam) {
-sum(do.call(ddistnam, c(list(obs), as.list(par),
as.list(fix.arg), log = TRUE)))
}
vstart <- c(xmin = 1, alpha = 1)  # the start values from the question
ddistname=dplcon
fix.arg = NULL
meth = "Nelder-Mead"
lower = -Inf
upper = Inf
optim(par = vstart, fn = fnobj,
fix.arg = fix.arg, obs = data, ddistnam = ddistname,
hessian = TRUE, method = meth, lower = lower,
upper = upper)
If we run this code, we get a more useful error, "function cannot be evaluated at initial parameters", which makes sense given the function definition: xmin = 0 or alpha = 1 yields a log-likelihood of -Inf. So I tried different initial values, but the few random choices I tried all returned a new error, "non-finite finite-difference value 1".
Searching the optim source for these two errors, I found they are not raised in the R code itself; there is, however, a .External2 call, so I assume they come from the compiled code. The non-finite error implies that some function evaluation gives a non-finite result, and dplcon does so when alpha <= 1 or xmin <= 0. fitdist lets you pass additional arguments on to mledist (or to the estimation function of whichever method you choose; mle is the default), and one of these is lower, which sets lower bounds on the parameters being optimized. So I imposed these limits and tried again:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2), lower = c(xmin = 0, alpha = 1))
Annoyingly, this still gives error code 100. Tracking it down yields the error "L-BFGS-B needs finite values of 'fn'". The optimization method changes from the default Nelder-Mead to L-BFGS-B when you specify bounds, and this error arises somewhere in the compiled code, presumably near the bounds of xmin or alpha, where the log-likelihood heads towards infinity and the numerical calculation becomes unstable.
I decided to try quantile matching rather than maximum likelihood to find out more:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2),
method= "qme",probs = c(1/3,2/3))
fitpl
## Fitting of the distribution ' plcon ' by matching quantiles
## Parameters:
## estimate
## xmin 0.02135157
## alpha 46.65914353
This suggests that the optimal value of xmin is close to 0, its lower limit. The reason I am not satisfied is that I can't get a maximum-likelihood fit of the distribution using fitdist, but hopefully this explanation helps, and quantile matching gives an alternative.
Edit:
After learning a little more about power law distributions in general, it makes sense that this does not work as you expect. The power parameter alpha has a likelihood function that can be maximised conditional on a given xmin. However, no such expression exists for xmin, since the likelihood function is increasing in xmin. Typically xmin is estimated by minimising a Kolmogorov-Smirnov statistic; see this mathoverflow question and the d_jss_paper vignette of the poweRlaw package for more info and associated references.
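The conditional MLE mentioned above has a closed form, alpha_hat = 1 + n / sum(log(x_i / xmin)), the standard estimator for the continuous power law. For fixed alpha, the log-likelihood is n log(alpha - 1) + n (alpha - 1) log(xmin) - alpha sum(log(x_i)), which grows with xmin up to min(x); that is why maximising over xmin directly does not work. A base-R sketch with simulated data (all names are mine):

```r
set.seed(10)
n <- 1000
xmin <- 10
alpha <- 2
u <- runif(n)
x <- xmin * (1 - u)^(1 / (1 - alpha))      # inverse-CDF sampling from the power law

# Closed-form MLE of alpha given xmin
alpha_hat <- 1 + n / sum(log(x / xmin))

# Log-likelihood for fixed alpha as a function of a candidate xmin <= min(x)
loglik_xmin <- function(m, a) sum(log(a - 1) + (a - 1) * log(m) - a * log(x))
grid <- seq(1, min(x), length.out = 20)
ll <- sapply(grid, loglik_xmin, a = alpha_hat)
all(diff(ll) > 0)                          # increasing in xmin
```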
There is functionality to estimate the parameters of the power law distribution in the poweRlaw package itself.
library(poweRlaw)
m = conpl$new(data)
xminhat = estimate_xmin(m)$xmin
m$setXmin(xminhat)
alphahat = estimate_pars(m)$pars
c(xmin = xminhat, alpha = alphahat)
