Power law fitted by the `fitdist()` function in package `fitdistrplus` - r

I generate some random variables using the rplcon() function in the poweRlaw package:
data <- rplcon(1000,10,2)
Now, I want to know which known distribution fits the data best. Lognormal? Exponential? Gamma? Power law? Power law with exponential cutoff?
So I use the fitdist() function in package fitdistrplus:
fit.lnormdl <- fitdist(data,"lnorm")
fit.gammadl <- fitdist(data, "gamma", lower = c(0, 0))
fit.expdl <- fitdist(data,"exp")
Since the power-law distribution and the power law with exponential cutoff are not among the base probability distributions according to the CRAN Task View: Probability Distributions, I wrote the d, p, and q functions for the power law, based on example 4 in ?fitdist:
dplcon <- function(x, xmin, alpha, log = FALSE) {
  if (log) {
    pdf <- log(alpha - 1) - log(xmin) - alpha * log(x / xmin)
    pdf[x < xmin] <- -Inf
  } else {
    pdf <- (alpha - 1) / xmin * (x / xmin)^(-alpha)
    pdf[x < xmin] <- 0
  }
  pdf
}
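As a quick sanity check (my own addition, not part of the original question), the density defined above should integrate to 1 over [xmin, Inf); for example, with the parameters used to simulate the data:

integrate(dplcon, lower = 10, upper = Inf, xmin = 10, alpha = 2)
# should report a value very close to 1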
pplcon <- function(q, xmin, alpha, lower.tail = TRUE) {
  cdf <- 1 - (q / xmin)^(-alpha + 1)
  if (!lower.tail)
    cdf <- 1 - cdf
  cdf[q < round(xmin)] <- 0
  cdf
}
qplcon <- function(p, xmin, alpha) xmin * (1 - p)^(-1 / (alpha - 1)) # inverse of pplcon
Finally, I use the code below to obtain the xmin and alpha parameters of the power law:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=1))
But it throws an error:
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(data, "plcon", start = list(xmin = 1, alpha = 1)) :
the function mle failed to estimate the parameters,
with the error code 100
I have searched Google and Stack Overflow, and many questions with similar errors turn up, but after reading and trying their solutions, none of them works for my issue. What should I do to estimate the parameters correctly?
Thank you to anyone who can help!

This was an interesting one. I am not entirely happy with what I discovered, but I will tell you what I found and see if it helps.
On calling fitdist, by default it uses mledist from the same package, which in turn calls stats::optim, a general-purpose optimization function. Its return value includes a convergence error code; see ?optim for details. The 100 you see is not one of the codes returned by optim, so I pulled apart the code of mledist and fitdist to find where that error code comes from. Unfortunately it is defined in more than one place and acts as a generic catch-all error code. If you strip away the various preliminary checks, what fitdist is essentially trying to do here is the following:
fnobj <- function(par, fix.arg, obs, ddistnam) {
  -sum(do.call(ddistnam, c(list(obs), as.list(par),
                           as.list(fix.arg), log = TRUE)))
}
vstart <- list(xmin = 1, alpha = 1)  # the starting values from the question
ddistname <- dplcon
fix.arg <- NULL
meth <- "Nelder-Mead"
lower <- -Inf
upper <- Inf
optim(par = vstart, fn = fnobj,
      fix.arg = fix.arg, obs = data, ddistnam = ddistname,
      hessian = TRUE, method = meth, lower = lower,
      upper = upper)
If we run this code we find a more useful error: "function cannot be evaluated at initial parameters". That makes sense if we look at the function definition: xmin = 0 or alpha = 1 makes the log-likelihood non-finite. OK, so the obvious next step is to try different initial values. I tried a few random choices, but all returned a new error, "non-finite finite-difference value [1]".
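To see the first failure concretely (a quick check of my own, not in the original answer), evaluate the log-density at the question's starting values; with alpha = 1 the log(alpha - 1) term is -Inf for every observation:

dplcon(c(10, 20), xmin = 1, alpha = 1, log = TRUE)
## [1] -Inf -Inf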
Searching the optim source further for the origin of these two errors, they are not part of the R source itself; there is, however, a .External2 call, so I can only assume the errors come from there. The non-finite error implies that some function evaluation gives a non-numeric result, and dplcon does exactly that when alpha <= 1 or xmin <= 0. fitdist lets you specify additional arguments that get passed to mledist (or to another estimation routine, depending on the method you choose; mle is the default), one of which is lower, for setting lower bounds on the parameters to be optimized. So I tried imposing these limits and trying again:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2), lower = c(xmin = 0, alpha = 1))
Annoyingly, this still gives error code 100. Tracking it down yields the error "L-BFGS-B needs finite values of 'fn'". The optimization method changes from the default Nelder-Mead to L-BFGS-B once you specify bounds, and somewhere in the external C code this error arises, presumably close to the limits of xmin or alpha, where the numerical calculation becomes unstable as the log-likelihood heads towards infinity.
To find out more, I tried quantile matching rather than maximum likelihood:
fitpl <- fitdist(data,"plcon",start = list(xmin=1,alpha=2),
method= "qme",probs = c(1/3,2/3))
fitpl
## Fitting of the distribution ' plcon ' by matching quantiles
## Parameters:
## estimate
## xmin 0.02135157
## alpha 46.65914353
which suggests that the optimal value of xmin is close to 0, its lower limit. The reason I am not satisfied is that I can't get a maximum-likelihood fit of the distribution using fitdist; hopefully this explanation helps, though, and the quantile matching gives you an alternative.
Edit:
After learning a little more about power-law distributions in general, it makes sense that this does not work as you expect. The power parameter alpha has a likelihood function which can be maximised conditional on a given xmin. However, no such expression exists for xmin, because the likelihood function is increasing in xmin. Typically, xmin is instead estimated by minimising a Kolmogorov-Smirnov statistic; see this Math Overflow question and the d_jss_paper vignette of the poweRlaw package for more information and associated references.
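For intuition, here is a minimal sketch of that two-step recipe (my own illustration, not code from the answer or the package): conditional on a candidate xmin, alpha has the closed-form MLE 1 + n/sum(log(x/xmin)), and xmin is then chosen to minimise the Kolmogorov-Smirnov distance between the empirical and fitted tail CDFs.

ks.for.xmin <- function(xmin, x) {
  tail.x <- sort(x[x >= xmin])
  n <- length(tail.x)
  alphahat <- 1 + n / sum(log(tail.x / xmin))   # conditional MLE of alpha
  fitted <- 1 - (tail.x / xmin)^(1 - alphahat)  # fitted power-law CDF on the tail
  empirical <- seq_len(n) / n                   # empirical CDF on the tail
  max(abs(fitted - empirical))                  # KS distance
}
candidates <- head(sort(unique(data)), -5)      # drop the largest values to keep a usable tail
ks <- sapply(candidates, ks.for.xmin, x = data)
xminhat <- candidates[which.min(ks)]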
There is functionality to estimate the parameters of the power law distribution in the poweRlaw package itself.
m <- conpl$new(data)                 # continuous power-law object
xminhat <- estimate_xmin(m)$xmin     # KS-minimising estimate of xmin
m$setXmin(xminhat)
alphahat <- estimate_pars(m)$pars    # conditional MLE of alpha given xmin
c(xmin = xminhat, alpha = alphahat)  # should be close to the true xmin = 10, alpha = 2

Related

nlminb problem, convergence error code = 1 message = iteration limit reached without convergence (10)

I am trying to find the best-fitting model for my data using library(nlme) and the lme function in R. Here is my model when the slope is fixed:
FixedRopeLength <- lme(EnergyCost ~ RopeLength, data = data,
                       random = ~1 | Subject, method = "ML")
summary(FixedRopeLength)
To see whether a random slope provides a better model than a fixed slope, I let the slope vary across Subject as follows:
RandomRopeLength <- lme(EnergyCost ~ RopeLength, data = data,
                        random = ~RopeLength | Subject, method = "ML")
summary(RandomRopeLength)
However, I got this error:
Error in lme.formula(EnergyCost ~ RopeLength, data = data, random =
~RopeLength | : nlminb problem, convergence error code = 1
message = iteration limit reached without convergence (10)
Any solution?
Thank you so much for your help. Your code worked; I only needed to adapt it to the lme function. Here is the code that resolves the aforementioned error:
RandomRopeLength <- lme(EnergyCost ~ RopeLength, data = data,
                        random = ~RopeLength | Subject, method = "ML",
                        control = list(msMaxIter = 1000, msMaxEval = 1000))
summary(RandomRopeLength)
Thanks!
?lme shows that there is a control argument, which redirects you to ?lmeControl, which gives you
msMaxIter: maximum number of iterations for the optimization step
inside the ‘lme’ optimization. Default is ‘50’.
and
msMaxEval: maximum number of evaluations of the objective function
permitted for nlminb. Default is ‘200’.
These correspond to eval.max and iter.max from ?nlminb. Since I'm not sure which of these is the problem, I would re-run the model with
control = lmeControl(msMaxIter = 1000, msMaxEval = 1000)
However, I'll warn you that once you have a problem that experiences numerical problems with the default parameter settings, adjusting the parameter settings may just lead to other problems farther down the line ...
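If raising the iteration limits still fails, another knob worth knowing about (my suggestion, not tested on these data) is the opt argument of lmeControl, which switches the inner optimizer from nlminb to optim:

RandomRopeLength <- lme(EnergyCost ~ RopeLength, data = data,
                        random = ~RopeLength | Subject, method = "ML",
                        control = lmeControl(opt = "optim"))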

Fréchet distribution parameters estimation in R?

I need to estimate the parameters of a Fréchet distribution. I am using the fitdistrplus and evd packages in R, but I don't know what values to use to initialize the parameters.
library(fitdistrplus)
library(evd)
# Data
x<-c(19.1,20.2,14.3,19.0,18.8,18.5,20.0,18.6,11.4,15.6,17.4,16.2,15.7,14.3,14.9,14.0,20.2,17.4,18.6,17.0,16.0,12.2,10.8,12.4,10.2,19.8,23.4)
fit.frechet <- fitdist(x, "frechet")
which generates the following error:
Error in computing default starting values.
Error in manageparam(start.arg = start, fix.arg = fix.arg, obs = data, :
Error in start.arg.default(obs, distname) :
Unknown starting values for distribution frechet.
When I supply starting values:
fit.frechet2<-fitdist(x,"frechet", start = list(loc=0,scale=1, shape=1))
Output:
Warning messages:
1: In fitdist(x, "frechet", start = list(loc = 0, scale = 1, shape = 1)) :
The dfrechet function should return a vector of with NaN values when input has inconsistent parameters and not raise an error
2: In fitdist(x, "frechet", start = list(loc = 0, scale = 1, shape = 1)) :
The pfrechet function should return a vector of with NaN values when input has inconsistent parameters and not raise an error
3: In sqrt(diag(varcovar)) : NaNs produced
4: In sqrt(1/diag(V)) : NaNs produced
5: In cov2cor(varcovar) :
diag(.) had 0 or NA entries; non-finite result is doubtful
Fitting of the distribution ' frechet ' by maximum likelihood
Parameters:
estimate Std. Error
loc -12128345 40.10705
scale 12128360 40.10705
shape 3493998 NaN
How can I estimate the parameters of the Fréchet distribution in R?
Well, you could try limiting your values and starting with some reasonable estimates. For example,
fit.frechet<-fitdist(x, "frechet", method = "mle", lower = c(0, 0, 0), start = list(loc=1,scale=12, shape=4))
will produce a couple of expected warnings, and
print(fit.frechet)
will print somewhat reasonable values
loc 2.146861e-07
scale 1.449643e+01
shape 4.533351e+00
with a plot of the fit against the empirical distribution:
plot(fit.frechet,demp=TRUE)
UPDATE
I would say that Fréchet might not be a good fit for your data. I tried Weibull and it looks a lot better; check it yourself:
fit.weibull<-fitdist(x, "weibull", method = "mle", lower = c(0, 0))
print(fit.weibull)
plot(fit.weibull, demp=TRUE)
Output is
shape 5.865337
scale 17.837188
Note that the scale parameter is similar to the Fréchet one and could have been guessed just from the histogram. Given the data, the plot of the Weibull fit looks quite good.
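To back up the visual comparison with numbers, fitdistrplus also provides gofstat(), which reports goodness-of-fit statistics and AIC/BIC for several fits at once; a quick sketch using the two fit objects above (my addition, not in the original answer):

gofstat(list(fit.frechet, fit.weibull), fitnames = c("frechet", "weibull"))

Lower AIC/BIC and smaller distance statistics indicate the better-fitting distribution.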

Error using fitdist with truncated distributions

I am trying to fit a certain truncated distribution to a data set. For example, for a lognormal distribution, I define the density function of the truncated distribution as:
dtlnorm <- function(x, meanlog, sdlog, low)
  dlnorm(x, meanlog, sdlog) / (1 - plnorm(low, meanlog, sdlog)) * (x >= low)
where low is the truncation point.
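For reference, the matching distribution function under the same truncation convention would look like this (my sketch, not part of the original question):

ptlnorm <- function(q, meanlog, sdlog, low)
  (plnorm(q, meanlog, sdlog) - plnorm(low, meanlog, sdlog)) /
    (1 - plnorm(low, meanlog, sdlog)) * (q >= low)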
My data is the following vector:
Data <- c(1068295.00589834, 1406446.49289834, 1540330.78489834, 1152321.94489834,
3108649.66189834, 3718417.97089834, 2981945.18089834, 4552923.31989834,
5747260.98289834, 2105461.57989834, 1044515.95889834, 1133641.75289834,
3847920.72789834, 2536441.02989834, 3073854.15789834, 1591039.28389834,
2592446.73289834, 4989152.55189834, 2426457.45489834, 120265066.499898,
6888222046.1999, 1092811.87089834, 3440123.51689834, 74684298.1398983,
1475038.27689834, 1124226.39489834, 11739544.5798983, 1187688.74489834,
1023193.88789834, 18784663.9698983)
To fit the distribution, I write:
fitdist(Data, distr = "tlnorm", method = "mle",
        start = list(meanlog = 0, sdlog = 0), fix.arg = list(low = 100))
But the following error appears:
Error in fitdist(Data, distr = "tlnorm", method = "mle", start = list(meanlog = 0, :
the function mle failed to estimate the parameters,
with the error code 100
I do not know what is happening. Can somebody help me? Thank you!

Unknown error message when attempting to find MLE in R

I'm trying to find the MLE of a distribution whose pdf is specified as mixture in the code. I've provided the code below, which gives the error
"Error in optim(start, f, method = method, hessian = TRUE, ...) :
L-BFGS-B needs finite values of 'fn'"
"claims" is the dataset im using. I tried the same code with just the first two values of "claims" and encountered the same problem, so for a reproducible example the first two values are 1536.77007 and 1946.92409.
The limits on the parameters of the distribution are 0 < p < 1, a > 0, and b > 0, hence the lower and upper bounds in the mle call. Any help is much appreciated.
# create a mixture of two exponential distributions
mixture <- function(x, p, a, b) {
  p * a * exp(-a * x) + (1 - p) * b * exp(-b * x)
}
# find the MLE of the mixture distribution
library(stats4)  # provides mle()
LL <- function(p, a, b) {
  X <- mixture(claims, p, a, b)
  -sum(log(X))
}
mle(LL, start = list(p = 0.5, a = 1/100, b = 1/100),
    method = "L-BFGS-B", lower = c(0, 0, 0), upper = c(1, Inf, Inf))
Edit: Not really sure why dput() was requested, but anyway:
#first two values of claims put into dput() (the actual values are above)
dput(claims[1:2])
c(307522.103, 195633.5205)

R portfolio analytics chart.EfficientFrontier function

I am trying to use the chart.EfficientFrontier function in the PortfolioAnalytics package in R to chart an efficient frontier object that I have created, but it keeps failing. Basically I am trying to find a frontier that will minimize annualized standard deviation. Eventually, once I get this working, I would also like to maximize annualized return.
First, I created an annualized standard deviation function using this code:
pasd <- function(R, weights){
  as.numeric(StdDev(R = R, weights = weights) * sqrt(12)) # hardcoded for monthly data
  # as.numeric(StdDev(R = R, weights = weights) * sqrt(4)) # hardcoded for quarterly data
}
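The question does not show how the portfolio specification was built; it was presumably assembled along these lines (my reconstruction, chosen to match the printout below), with the custom pasd function attached as a risk objective by name:

library(PortfolioAnalytics)
prt <- portfolio.spec(assets = colnames(returns))
prt <- add.constraint(portfolio = prt, type = "leverage")
prt <- add.constraint(portfolio = prt, type = "long_only")
prt <- add.objective(portfolio = prt, type = "return", name = "mean")
prt <- add.objective(portfolio = prt, type = "risk", name = "pasd")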
I imported a csv file with monthly returns and my portfolio object looks like this:
> prt
**************************************************
PortfolioAnalytics Portfolio Specification
**************************************************
Call:
portfolio.spec(assets = colnames(returns))
Number of assets: 3
Asset Names
[1] "Global REITs" "Au REITs" "Au Util and Infra"
Constraints
Enabled constraint types
- leverage
- long_only
Objectives:
Enabled objective names
- mean
- pasd
Now I successfully create an efficient frontier object using this line:
prt.ef <- create.EfficientFrontier(R = returns, portfolio = prt, type = "DEoptim", match.col = "pasd")
But when I try to plot it, I get the following error messages:
> chart.EfficientFrontier(prt.ef, match.col="pasd")
Error in StdDev(R = R, weights = weights) :
argument "weights" is missing, with no default
In addition: There were 26 warnings (use warnings() to see them)
Error in StdDev(R = R, weights = weights) :
argument "weights" is missing, with no default
Error in StdDev(R = R, weights = weights) :
argument "weights" is missing, with no default
Error in xlim[2] * 1.15 : non-numeric argument to binary operator
Anyone know why this is the case? When I use summary(prt.ef) I can see the weights, but why is the chart.EfficientFrontier function failing?
As @WaltS suggested, you need to be consistent in implementing functions to annualize both mean and risk returns.
But actually, to get annualized statistics you have two options, and you are currently using neither:
1) Do the optimization with the monthly data, keeping the original risk and return functions in the specification. For plotting you can annualize with:
Port.Anua.Returns=prt.ef$frontier[,1]*12
Port.Anua.StDev=prt.ef$frontier[,2]*12^.5
The weights will be the same for monthly or annualized portfolios.
prt.ef$frontier[,-(1:3)]
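A minimal plotting sketch for option 1 (my code, assuming the objects computed above):

plot(Port.Anua.StDev, Port.Anua.Returns, col = 2, pch = 21,
     xlab = "Annualized Risk", ylab = "Annualized Return")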
2) Transform your monthly returns into annualized returns by multiplying by 12. Then do the optimization with the usual procedure; all risk and return figures in prt.ef$frontier will then already be annualized.
Regarding the jagged line in the EF: using your portfolio specification, I was able to recreate the same behavior. For the following plot I used the edhec data and your specification with the original mean and StdDev in the objectives:
data(edhec)
returns <- edhec[,1:3]
That behavior must be driven by the specification or the optimization algorithm you are using. I did the same optimization with solve.QP from the quadprog package; this is the result.
Update
The code is here:
require(quadprog)
# min_x (-d^T x + 1/2 x^T D x)  s.t.  A x >= b
MV_QP <- function(nx, tarRet, Sig = NULL, long_only = FALSE){
  if (is.null(Sig)) Sig <- cov(nx)
  dvec <- rep(0, ncol(Sig))
  meq <- 2
  Amat <- rbind(rep(1, ncol(Sig)),
                apply(nx, 2, mean))
  bvec <- c(1, tarRet)
  if (long_only) {
    meq <- 1
    Amat <- Amat[-1, ]
    Amat <- rbind(Amat,
                  diag(1, ncol(Sig)),
                  rep(1, ncol(Sig)),
                  rep(-1, ncol(Sig)))
    bvec <- bvec[-1]
    bvec <- c(bvec,
              rep(0, ncol(Sig)), .98, -1.02)
  }
  sol <- solve.QP(Dmat = Sig, dvec, t(Amat), bvec, meq = meq)$solution
  sol
}
steps <- 50
x <- returns
µ.b <- apply(X = x, 2, FUN = mean)
long_only <- TRUE
range.bl <- seq(from = min(µ.b), to = max(µ.b) * ifelse(long_only, 1, 1.6),
                length.out = steps)
risk.bl <- t(sapply(range.bl, function(targetReturn) {
  w <- MV_QP(x, targetReturn, long_only = long_only)
  c(sd(x %*% w), w)
}))
weightsl <- round(risk.bl[, -1], 4)
colnames(weightsl) <- colnames(x)
weightsl
risk.bl <- risk.bl[, 1]
rets.bl <- weightsl %*% µ.b
fan <- 12
plot(x = risk.bl * fan^.5, y = rets.bl * fan, col = 2, pch = 21,
     xlab = "Annualized Risk",
     ylab = "Annualized Return", main = "long only EF with solve.QP")
Adding to Robert's comments, the optimization with monthly returns is a quadratic programming (QP) problem with linear constraints. When mean is the return objective and StdDev or var is the risk objective, optimize.portfolio and create.EfficientFrontier select the ROI method as the solver, which uses solve.QP, an efficient solver for this sort of problem. When the risk objective is changed to pasd, these functions no longer recognize the problem as a QP, so they use DEoptim, a general nonlinear solver better suited to nonconvex problems than to convex QP ones; see Differential Evolution with DEoptim. This seems to be the cause of the jagged efficient frontier.
In order to have create.EfficientFrontier use solve.QP, which is much more efficient and accurate for this type of problem, you can make a custom moment function to compute the mean and variance and then specify it with the argument momentFUN. However, create.EfficientFrontier at least in part uses means computed directly from the returns rather than the mu from momentFUN. To deal with that, multiply the returns by 12 and divide the variance by 12, as shown in the example below.
library(PortfolioAnalytics)
data(edhec)
returns <- edhec[, 1:3]
# define moment function
annualized.moments <- function(R, scale = 12, portfolio = NULL){
  out <- list()
  out$mu <- matrix(colMeans(R), ncol = 1)
  out$sigma <- cov(R) / scale
  return(out)
}
# define portfolio
prt <- portfolio.spec(assets = colnames(returns))
prt <- add.constraint(portfolio = prt, type = "long_only")
# leverage defaults to weight_sum = 1, so it is equivalent to the full_investment constraint
prt <- add.constraint(portfolio = prt, type = "leverage")
prt <- add.objective(portfolio = prt, type = "risk", name = "StdDev")
# calculate and plot efficient frontier
prt_ef <- create.EfficientFrontier(R = 12 * returns, portfolio = prt, type = "mean-StdDev",
                                   match.col = "StdDev", momentFUN = "annualized.moments", scale = 12)
xlim <- range(prt_ef$frontier[, 2]) * c(1, 1.5)
ylim <- range(prt_ef$frontier[, 1]) * c(.80, 1.05)
chart.EfficientFrontier(prt_ef, match.col = "StdDev", chart.assets = FALSE,
                        labels.assets = FALSE, xlim = xlim, ylim = ylim)
points(with(annualized.moments(12 * returns, scale = 12), cbind(sqrt(diag(sigma)), mu)), pch = 19)
text(with(annualized.moments(12 * returns, scale = 12), cbind(sqrt(diag(sigma)), mu)),
     labels = colnames(returns), cex = .8, pos = 4)
chart.EF.Weights(prt_ef, match.col = "StdDev")
The means and standard deviations of the assets also need to be adjusted and so are plotted outside of chart.EfficientFrontier and shown on the chart below.
At the end of the day it would be simpler, as Robert suggests, to compute the weights for the efficient frontier using the monthly returns and then compute the portfolio returns and standard deviations using annualized asset means and standard deviations and the monthly weights which are the same in both cases. However, perhaps this example is useful to show the use of custom moment and objective functions.
This does not find the reason for the error, but setting the axis limits makes it partially work!
prt.ef$frontier # see the EF
xylims <- apply(prt.ef$frontier[, c(2, 1)], 2, range) * c(.98, 1.01)
chart.EfficientFrontier(prt.ef, match.col = "pasd",
                        main = "Portfolio Optimization",
                        xlim = xylims[, 1], ylim = xylims[, 2])
# or
plot(prt.ef$frontier[, c(2, 1)], col = 2)
OK, so I tried the pasd function that WaltS suggested, and chart.EfficientFrontier seemed to work, but it gave me a jagged line rather than a smooth one.
I have now created an annualized return function using this code:
pamean <- function(R, weights = NULL){
  Return.annualized(apply(as.xts(t(t(R) * weights)), 1, sum))
}
and added this as an objective to my portfolio prt.
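Presumably this was done via something like the following (the exact call is not shown in the original post, so this is my guess at it):

prt <- add.objective(portfolio = prt, type = "return", name = "pamean")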
> prt
**************************************************
PortfolioAnalytics Portfolio Specification
**************************************************
Call:
portfolio.spec(assets = colnames(returns))
Number of assets: 3
Asset Names
[1] "Global REITs" "Au REITs" "Au Util and Infra"
Constraints
Enabled constraint types
- long_only
- leverage
Objectives:
Enabled objective names
- pamean
- pasd
I then create the efficient frontier again using this line:
> prt.ef <- create.EfficientFrontier(R=returns, portfolio=prt, type="DEoptim", match.col="pasd")
but when I use the summary function I see that only one frontier point has been generated. What does the error message mean, and why was only one point generated?
> summary(prt.ef)
**************************************************
PortfolioAnalytics Efficient Frontier
**************************************************
Call:
create.EfficientFrontier(R = returns, portfolio = prt, type = "DEoptim",
match.col = "pasd")
Efficient Frontier Points: 1
Error in `colnames<-`(`*tmp*`, value = character(0)) :
attempt to set 'colnames' on an object with less than two dimensions
