For return data, I am researching the importance of skewness and kurtosis for the CVaR calculation. We first compare some distributions by estimating their parameters with fitdist() from the R package "fitdistrplus". However, we want to do this for a number of distributions (see picture: SGT, GT, SGED, GED, t, norm).
Below is sample code for the SGT, where there is a problem: it produces NaNs for the standard errors of the parameters p and q. I also don't really know how to choose the starting values.
library(fitdistrplus)
library(sgt)   # provides dsgt/psgt, which fitdist() needs for distr = "sgt"
SGTstart <- list(mu = 0, sigma = 2, lambda = 0.5, p = 2, q = 8)
SGTfit_R <- fitdist(data = as.vector(coredata(R)), distr = "sgt",
                    method = "mle", start = SGTstart)
summary(SGTfit_R)
A sample of the data to make it reproducible (the return vector of my stock index):
c("0", "-1,008599424", "0,73180187", "0,443174024", "-0,351935172", "-1,318784086", "-2,171323799", "1,270243431", "-0,761354019", "0,417350946", "0,906432976", "-0,066736422", "-0,867085373", "-0,119914361", "-0,300989601", "0,482518259", "0,787365385", "-1,443105439", "-0,318546686", "-3,467674998", "1,041540157", "1,371281289", "-1,176752782", "-1,116893343", "-0,127522915", "-0,658070287", "1,098348016", "0,296391358", "-0,810635352", "-0,041779322", "0,353974233", "0,120090141", "0,304927119", "-1,22772592", "0,040768364", "1,182218724", "0,123136685", "-0,682709972", "-0,174093506", "-0,539704174", "0,579080595", "0,326346169", "0,205503526", "-0,771928642", "1,490828799", "0,734822712", "-0,025733101", "0,246531452", "-0,695585736", "-0,732413919", "0,806417952", "0,396105099", "0,024558388", "-0,791232528", "0,730410255", "-1,438890702", "0,668400286", "1,440996039", "0,731823553", "0,177515522", "0,740085418", "0,926248628", "-0,63516084", "-0,89996829", "1,655117371", "0,501033581", "0,06526534", "1,320866692", "-0,496350734", "-0,10157668", "0,022333393", "-1,236934596", "-1,070586427", "0,661662029", "0,871334714", "0,758891429", "0,064748766", "-0,305132153", "-0,424033661", "1,223444774", "-0,441840866", "-0,661390655", "-2,148399329", "0,843067435", "0,601099664", "-0,329590349", "0,210791225", "-0,341341769", "-0,555892395", "0,624026986", "0,218851965", "-0,015859171", "0,524283138", "-0,855634719", "0,339281481", "0,038507713", "-1,943784688", "0,315857689", "-0,368982834", "-1,111684011", "-0,2409217", "0,421815833", "-0,079319721", "0,915338199", "0,537387704", "-0,023004636", "-0,331854888", "0,702733882", "-1,084343115", "0,16901282", "0,559404916", "-0,538587484", "0,153683523", "-0,336562411", "-0,274946953", "0,862901957", "0,117407383", "1,205205829", "0,633347347", "0,058712615", "-0,083562948", "1,343190727", "1,281380185", "0,750972389", "-1,538678151", "0,228222073", "0,635385022", "0,037379479", "-0,491444798", "-1,220272752", "1,093162287", "1,499512169", "0,041394336", "-0,113330512", "0,657485999", "-0,264647978", "0,115056075", "-0,009763771", "0,454629881", "0,322398317", "0,347112494", "0,948127411", "0,461194301", "-0,407013048", "-0,469481931", "-0,536045151", "0,114726251", "0,396772868", "0,525885581")
Best, Enjo
The answer was to use the sgt package.
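For completeness, a hedged sketch of that route: the sgt package ships its own fitter, sgt.mle (check ?sgt.mle for the exact interface; the argument names below follow its documentation):

library(sgt)
x <- as.vector(coredata(R))   # numeric return vector, as above
SGTfit <- sgt.mle(X.f = ~ x,
                  start = list(mu = 0, sigma = 2, lambda = 0.5, p = 2, q = 8))
print(summary(SGTfit))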
I am trying to estimate the parameters of the following equation:
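(The equation image did not come through; reconstructed from the code and the moment conditions below, it is presumably the truncated Poisson-mixture log-likelihood:)

\ln L = -\frac{T}{2}\ln(2\pi) - T\lambda_x + \sum_{t=1}^{T} \ln \sum_{l=0}^{7} \frac{\lambda_x^{l}}{l!\,\sqrt{\sigma_x^{2}+l\phi_x^{2}}} \exp\!\left(-\frac{\bigl(r_t-(\alpha_x+l\theta_x)\bigr)^{2}}{2\left(\sigma_x^{2}+l\phi_x^{2}\right)}\right)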
First, I need to write the function with the double summation. Then I will use an optimization method (nlm, optim, etc.) to get the estimated parameters. Here is my attempt (though I am not sure how to write the double-summation part at the end, the two for loops):
There are 5 parameters: alpha_x, sigma_x, lambda_x, theta_x, phi_x; in the code they map to step_3[1] = alpha_x, step_3[2] = sigma_x, step_3[3] = lambda_x, step_3[4] = theta_x, step_3[5] = phi_x.
s3 <- function(step_3){   # the function called s3 in the nlm() line below; the header was missing
  abcd <- 0               # log-likelihood accumulator (missing in the original)
  for (a in 1:nrow(returns[[4]][, 1])){
    abc <- 0
    for (b in 0:7){
      abc <- abc + step_3[3]^b / factorial(b) /
        sqrt(step_3[2]^2 + b * step_3[5]^2) *
        exp(-(as.numeric(returns[[4]][a, 1]) - (step_3[1] + b * step_3[4]))^2 /
              (2 * (step_3[2]^2 + b * step_3[5]^2)))
    }
    abcd <- abcd + log(abc)
  }
  # negative log-likelihood
  abcd <- -abcd + 0.5 * nrow(returns[[4]][, 1]) * log(2 * pi) +
    nrow(returns[[4]][, 1]) * step_3[3]
  return(abcd)
}
nlm(s3, p = c(0.01, 0.01, 0.01, 0.01, 0.01), hessian = TRUE)
I am getting estimation results, but the estimates are way off.
My question is how to implement the double summation sum_{t=1}^{T} ln( sum_{l=0}^{7} ... ) in a function, efficiently (any suggestions for sapply etc. would be appreciated). The function will be used in optimization.
(l = 0:7, T = number of observations.)
return1 = some vector with price returns (all returns between -1 and 1).
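A vectorized sketch of that double sum, assuming return1 is the numeric return vector and the parameter order from above; the exponential terms for all t and l are built in one T-by-8 matrix instead of two for loops:

s3_vec <- function(par, r = return1){
  # par = c(alpha_x, sigma_x, lambda_x, theta_x, phi_x)
  l <- 0:7
  v <- par[2]^2 + l * par[5]^2                         # component variances
  m <- par[1] + l * par[4]                             # component means
  w <- par[3]^l / factorial(l) / sqrt(v)               # component weights
  E <- exp(-sweep(outer(r, m, "-")^2, 2, 2 * v, "/"))  # T x 8 exponential terms
  inner <- as.vector(E %*% w)                          # inner sum over l, per t
  -sum(log(inner)) + 0.5 * length(r) * log(2 * pi) + length(r) * par[3]
}
# usage: nlm(s3_vec, p = c(0.01, 0.01, 0.01, 0.01, 0.01), hessian = TRUE)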
After the estimation, the sample mean and variance should be roughly equal to:
sample_mean = alpha_x + lambda_x * theta_x
sample_variance = sigma_x^2 + lambda_x * (theta_x^2 + phi_x^2)
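A quick sanity check of those two moment conditions after fitting (a sketch; est is assumed to hold the five estimates in the order above, using the vectorized function from the earlier sketch):

est <- nlm(s3_vec, p = c(0.01, 0.01, 0.01, 0.01, 0.01))$estimate
c(model_mean = est[1] + est[3] * est[4], sample_mean = mean(return1))
c(model_var = est[2]^2 + est[3] * (est[4]^2 + est[5]^2), sample_var = var(return1))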
Thanks.
Here is a sample that can be used for the analysis:
c(0.02423747 ,-0.00419738, -0.03153278, 0.05343888, -0.0175492, 0.00848472, 0.01043673, -0.02123556, 0.01954968, -0.06462473, 0.02679873, 0.07971938, 0.00050474, -0.01768566,
-0.05206762, -0.00413691, 0.06390499, 0.00269576, 0.01520837, 0.00091051, 0.03499043, -0.00121999, -0.00123521, -0.01961684, 0.03360355, 0.01803711, 0.01772631, 0.036523,
-0.00038927, 0.00905013, 0.01150976, 0.00480223, 0.01916402, 0.00054628, 0.01911904, 0.02194556, 0.00371314, 0.03376601, 0.0546574, -0.03972611, -0.0272525, 0.00271509,
0.02137819, 0.00483075, 0.03538795, 0.02981431, 0.00428509, -0.07192935, 0.01770175, -0.09626522, 0.07574215, 0.02929555, 0.01776551, 0.0385604, -0.06804089, 0.0666583,
0.01304272, -0.01825728, 0.01703525, 0.02022584, 0.03348027, 0.02818876, -0.00162942, -0.08785954, -0.13366772, 0.10243928)
I am working to reproduce some results from OxMetrics (Ox Professional version 7.10) in R, but I am having a hard time figuring out exactly how to get the right specification in R. I do not expect identical estimates, but somewhat similar estimates should be possible (see below for the estimates from OxMetrics and from R).
Can anyone here help me figure out how to do in R what OxMetrics does?
I've tried forecast::arfima, forecast::Arima, fracdiff::fracdiff, and arfima::arfima. So far I came closest with the latter.
Below are the data and code.
The results below are from the OxMetrics ARFIMA(2,0,2) model estimated using maximum likelihood and from R using arfima() from the arfima package (code below the longer data string).
           OxMetrics    R (using arfima())
AR(1)        1.41763       1.78547
AR(2)       -0.51606      -0.79782
MA(1)       -0.89892      -0.08406
MA(2)        0.30821       0.48083
Constant    -0.09382      -0.09423
y <- c(-0.0527281830620101, -0.0483283435220523,
-0.0761110069836706, -0.0425588714546148,
-0.0629789511239869, -0.118944956578757,
-0.156545103342326, -0.138106089421937,
-0.107335059908618, -0.145013381825552,
-0.100753517322066, -0.0987268545186417,
-0.0454663306471916, -0.0404439816954447,
-0.110574863632305, -0.0933955365797221,
-0.0915045759209185, -0.110397691370645,
-0.0944201704700927, -0.121257467376357,
-0.109785472344257, -0.0890776818684245,
-0.0554059943242384, -0.0700566531543618,
-0.0366694695635905, -0.0687369752462432,
-0.0651380598746858, -0.134224646388692,
-0.0670924768348229, -0.0835771023087037,
-0.0709997877276756, -0.116003735777656,
-0.0794873243023737, -0.067057402058551,
-0.0698663891865543, -0.0511133873895728,
-0.0513203609998669, -0.0894001277309737,
-0.0398284483421012, -0.0514468502511471,
-0.0599700163953942, -0.0661889418696937,
-0.079516218903545, -0.0685966077135509,
-0.0861445337428064, -0.0923966209966709,
-0.133444703431511, -0.131567692883267,
-0.127157375630663, -0.136327904368355,
-0.102133208996487, -0.109453799095327,
-0.103333580486325, -0.0982528240902063,
-0.139243862997714, -0.112067682286408,
-0.0741501704478233, -0.0885574830826608,
-0.0819203358523941, -0.0891168040724528,
-0.0331415164887199, -0.038039022334333,
0.000471320939768205, -0.0250547289467331,
-0.0411983586070352, -0.0463752713008887,
-0.0184870766950889, -0.0318185253129144,
-0.0623828610377037, -0.0718563679309012,
-0.0635702270765757, -0.0929728977267059,
-0.0894248292570765, -0.0919046741661464,
-0.0844700793317346, -0.112800098282505,
-0.141344968548085, -0.127965917566584,
-0.143980868315393, -0.154901662762077,
-0.130634570152671, -0.150417664726561,
-0.163723312802416, -0.146099566906346,
-0.14837251795191, -0.144887288973472,
-0.14232221415307, -0.142825446351853,
-0.158838097005599, -0.14340614330986,
-0.118935233992604, -0.109627188482776,
-0.120889714109902, -0.119484146944083,
-0.0950435556738212, -0.134667374330086,
-0.155051119642286, -0.134094795193097,
-0.128627607285988, -0.133954472488274,
-0.119286541395138, -0.135714339904381,
-0.0903767618937357, -0.109592987693797,
-0.0770998518949151, -0.108375176935532,
-0.136901231908067, -0.0856673865524131,
-0.108854388315838, -0.0708359081737591,
-0.106961434062811, -0.0429126711978416,
-0.0550592121225453, -0.0715845951018634,
-0.0509376225313689, -0.0570175197192393,
-0.0724229547086495, -0.0867303057832318,
-0.089712447506396, -0.125158029708487,
-0.122260116350003, -0.0905629436620448,
-0.090357598491857, -0.097173095034008,
-0.0674973361276239, -0.12411935716644,
-0.0957789729967162, -0.088838044599159,
-0.110065127067576, -0.108172925482296)
# install.packages(c("arfima"), dependencies = TRUE)
# library(arfima)
arfima::arfima(y, order = c(2, 0, 2))
The solution is to set the second element of the numeach option to 0, i.e.
arfima::arfima(y, order = c(2, 0, 2), numeach = c(2, 0))
this controls the number of starts for the fractional parameter.
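For reference, a short sketch of the full call; with the fractional starts disabled, the summary should show AR/MA estimates comparable to the OxMetrics column above:

fit <- arfima::arfima(y, order = c(2, 0, 2), numeach = c(2, 0))
summary(fit)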
Calc <- function(th, t){
  mean    <- th[1]
  beta.0  <- th[2]
  beta.1  <- th[3]
  beta.2  <- th[4]
  sigma.0 <- th[5]
  lambda  <- th[6]
  returns <- returnzzzz$ReturnontheSP500Index
  r <- 0.000137
  n <- length(returns)
  z <- rnorm(n, 0, 1)
  sigma.sqs <- vector(length = n)   # sigma is a variance
  sigma.sqs[1] <- sigma.0^2
  for (i in 1:n){
    # note: z is the full vector here (so z^2 has length n), sigma.sqs is never
    # updated beyond sigma.sqs[1], and returns[i + 1] writes one element past the
    # data -- these are likely behind the warnings and the singular Hessian below
    h <- beta.0 + beta.1 * sigma.sqs[i] + beta.2 * sigma.sqs[i] * z^2
    returns[i + 1] <- r + lambda * sqrt(h) - 0.5 * h + sqrt(h) * z
  }
  return(list(et = returns, ht = sigma.sqs))
}
GarchLogL <- function(th, t){
  res <- Calc(th, t)
  sigma.sqs <- res$ht
  returns <- res$et
  return(-sum(dnorm(returns[-1], mean = th[1], sd = sqrt(sigma.sqs[-1]), log = TRUE)))
}

GarchLogLSimpl <- function(th, y) GarchLogL(c(0, th), y)
fit2 <- nlm(GarchLogLSimpl,
            p = rep(1, 5),
            hessian = TRUE,
            y = returnzzzz,   # extra arguments are passed on to GarchLogLSimpl
            iterlim = 500)
sqrt(diag(solve(fit2$hessian)))
Hi, this is the first time for me here; I hope I do everything right. With this code I want to implement a maximum likelihood estimation of the N-GARCH, but I get this from R: "There were 50 or more warnings (use warnings() to see the first 50)", and this error: Error in solve.default(fit2$hessian) : Lapack routine dgesv: system is exactly singular: U[1,1] = 0.
Unfortunately I am not a really good programmer and I just adapted a GARCH model code for my needs. My return data is
# [,1]
#1996-01-02 0.007793
#1996-01-03 0.000950
#1996-01-04 -0.005826
#1996-01-05 -0.001587
#1996-01-08 0.002821
#1996-01-09 -0.014568
like this, as an xts. I hope I provided enough information; if not, please just comment.
I really appreciate your help!!
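As a stopgap for the solve() error, one can check the Hessian's condition number before inverting it; a minimal sketch, assuming fit2 from the code above:

h <- fit2$hessian
if (rcond(h) > .Machine$double.eps) {
  print(sqrt(diag(solve(h))))   # standard errors from the inverse Hessian
} else {
  warning("Hessian is numerically singular; check the recursion in Calc()")
}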
I got new code that might be able to solve it, but I get another error with this one:
Ngarch <- function(rtn){
  # estimate an NGARCH(1,1) model; rtn is the return series
  mu  <- mean(rtn)
  par <- c(mu, 0.01, 0.8, 0.01, 0.7)   # starting values
  low <- c(-10, 0, 0, 0, 0)
  upp <- c(10, 1, 1, 0.4, 2)
  mm  <- optim(par, glkn, method = "Nelder-Mead", hessian = TRUE)
  # mm <- optim(par, glkn, method = "L-BFGS-B", hessian = TRUE, lower = low, upper = upp)
  par <- mm$par
  H  <- mm$hessian
  Hi <- solve(H)
  cat(" ", "\n")
  cat("Estimation results of NGARCH(1,1) model:", "\n")
  cat("estimates: ", par, "\n")
  se <- sqrt(diag(Hi))
  cat("std.errors: ", se, "\n")
  tra <- par / se
  cat("t-ratio: ", tra, "\n")
  # compute the volatility series and residuals
  ht <- var(rtn)
  T  <- length(rtn)
  if (T > 40) ht <- var(rtn[1:40])
  at <- rtn - par[1]
  for (i in 2:T){
    sig2t <- par[2] + par[3] * ht[i - 1] + par[4] * (at[i - 1] - par[5] * sqrt(ht[i - 1]))^2
    ht <- c(ht, sig2t)
  }
  sigma.t <- sqrt(ht)
  Ngarch <- list(residuals = at, volatility = sigma.t)
}
glkn <- function(par){
  # note: the likelihood reads the returns from "tmp.txt" instead of using rtn,
  # so that file must exist in the working directory
  rtn  <- read.table("tmp.txt")[, 1]
  glkn <- 0
  ht   <- var(rtn)
  T    <- length(rtn)
  if (T > 40) ht <- var(rtn[1:40])
  at <- rtn[1] - par[1]
  for (i in 2:T){
    ept <- rtn[i] - par[1]
    at  <- c(at, ept)
    sig2t <- par[2] + par[3] * ht[i - 1] + par[4] * ht[i - 1] * (at[i - 1] / sqrt(ht[i - 1]) - par[5])^2
    ht   <- c(ht, sig2t)
    glkn <- glkn + 0.5 * (log(sig2t) + ept^2 / sig2t)
  }
  glkn
}
The error I get is: Error in optim(par, glkn, method = "Nelder-Mead", hessian = T) : non-finite finite-difference value [2].
And again those warnings; how can I get rid of them?
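That error usually means the objective returned NaN or Inf for some trial parameters (here log(sig2t) once sig2t goes non-positive), which breaks the finite-difference Hessian. A minimal sketch of a workaround, reusing glkn from above: a large penalty replaces anything non-finite (the 1e10 is arbitrary), and suppressing the warnings also silences the "50 or more warnings" message:

glkn_safe <- function(par){
  val <- suppressWarnings(glkn(par))
  if (is.finite(val)) val else 1e10   # penalty keeps optim away from bad regions
}
mm <- optim(c(0, 0.01, 0.8, 0.01, 0.7), glkn_safe,
            method = "Nelder-Mead", hessian = TRUE)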
Thanks for your help!
I'm having a problem with the residual sum of squares of a fit. The residual sum of squares is very high, which would indicate that the fit is not good; visually, however, the fit looks fine despite this very high residual value ... Can anyone help me figure out what's going on?
My data:
x=c(0.017359, 0.019206, 0.020619, 0.021022, 0.021793, 0.022366, 0.025691, 0.025780, 0.026355, 0.028858, 0.029766, 0.029967, 0.030241, 0.032216, 0.033657,
0.036250, 0.039145, 0.040682, 0.042334, 0.043747, 0.044165, 0.044630, 0.046045, 0.048138, 0.050813, 0.050955, 0.051910, 0.053042, 0.054853, 0.056886,
0.058651, 0.059472, 0.063770,0.064567, 0.067415, 0.067802, 0.068995, 0.070742,0.073486, 0.074085 ,0.074452, 0.075224, 0.075853, 0.076192, 0.077002,
0.078273, 0.079376, 0.083269, 0.085902, 0.087619, 0.089867, 0.092606, 0.095944, 0.096327, 0.097019, 0.098444, 0.098868, 0.098874, 0.102027, 0.103296,
0.107682, 0.108392, 0.108719, 0.109184, 0.109623, 0.118844, 0.124023, 0.124244, 0.129600, 0.130892, 0.136721, 0.137456, 0.147343, 0.149027, 0.152818,
0.155706,0.157650, 0.161060, 0.162594, 0.162950, 0.165031, 0.165408, 0.166680, 0.167727, 0.172882, 0.173264, 0.174552,0.176073, 0.185649, 0.194492,
0.196429, 0.200050, 0.208890, 0.209826, 0.213685, 0.219189, 0.221417, 0.222662, 0.230860, 0.234654, 0.235211, 0.241819, 0.247527, 0.251528, 0.253664,
0.256740, 0.261723, 0.274585, 0.278340, 0.281521, 0.282332, 0.286166, 0.288103, 0.292959, 0.295201, 0.309456, 0.312158, 0.314132, 0.319906, 0.319924,
0.322073, 0.325427, 0.328132, 0.333029, 0.334915, 0.342098, 0.345899, 0.345936, 0.350355, 0.355015, 0.355123, 0.356335, 0.364257, 0.371180, 0.375171,
0.377743, 0.383944, 0.388606, 0.390111, 0.395080, 0.398209, 0.409784, 0.410324, 0.424782 )
y= c(34843.40, 30362.66, 27991.80 ,28511.38, 28004.74, 27987.13, 22272.41, 23171.71, 23180.03, 20173.79, 19751.84, 20266.26, 20666.72, 18884.42, 17920.78, 15980.99, 14161.08, 13534.40, 12889.18, 12436.11,
12560.56, 12651.65, 12216.11, 11479.18, 10573.22, 10783.99, 10650.71, 10449.87, 10003.68, 9517.94, 9157.04, 9104.01, 8090.20, 8059.60, 7547.20, 7613.51, 7499.47, 7273.46, 6870.20, 6887.01,
6945.55, 6927.43, 6934.73, 6993.73, 6965.39, 6855.37, 6777.16, 6259.28, 5976.27, 5835.58, 5633.88, 5387.19, 5094.94, 5129.89, 5131.42, 5056.08, 5084.47, 5155.40, 4909.01, 4854.71,
4527.62, 4528.10, 4560.14, 4580.10, 4601.70, 3964.90, 3686.20, 3718.46, 3459.13, 3432.05, 3183.09, 3186.18, 2805.15, 2773.65, 2667.73, 2598.55, 2563.02, 2482.63, 2462.49, 2478.10,
2441.70, 2456.16, 2444.00, 2438.47, 2318.64, 2331.75, 2320.43, 2303.10, 2091.95, 1924.55, 1904.91, 1854.07, 1716.52, 1717.12, 1671.00, 1602.70, 1584.89, 1581.34, 1484.16, 1449.26,
1455.06, 1388.60, 1336.71, 1305.60, 1294.58, 1274.36, 1236.51, 1132.67, 1111.35, 1095.21, 1097.71, 1077.05, 1071.04, 1043.99, 1036.22, 950.26, 941.06, 936.37, 909.72, 916.45,
911.01, 898.94, 890.68, 870.99, 867.45, 837.39, 824.93, 830.61, 815.49, 799.77, 804.84, 804.88, 775.53, 751.95, 741.01, 735.86, 717.03, 704.57, 703.74, 690.63,
684.24, 650.30, 652.74, 612.95 )
Then I make the fit using the nlsLM function (minpack.lm package):
library(magicaxis)
library(minpack.lm)
sig.backg=3*10^(-3)
mod <- nlsLM(y ~ a *( 1 + (x/b)^2 )^c+sig.backg,
start = c(a = 0, b = 1, c = 0),
trace = TRUE)
## plot data
magplot(x, y, main = "data", log = "xy", pch=16)
## plot fitted values
lines(x, fitted(mod), col = 2, lwd = 4 )
This is the reported residual value:
> print(mod)
Nonlinear regression model
model: y ~ a * (1 + (x/b)^2)^c + sig.backg
data: parent.frame()
a b c
68504.2013 0.0122 -0.6324
residual sum-of-squares: 12641435
Number of iterations to convergence: 34
Achieved convergence tolerance: 0.0000000149
The residual sum-of-squares is too high: 12641435 ...
Is that really the case, or is something wrong with the fit? Is it bad?
It makes sense, since the squared mean of your response variable is 38110960. You can scale your data if you prefer to work with smaller numbers.
The residual sum of squares doesn't have much meaning without knowing the total sum of squares (from which R^2 can be calculated). Its value will increase if your data have large values or if you add more data points, regardless of how good your fit is. Also, you may want to look at a plot of your residuals versus fitted values; there is a clear pattern there that should be explained by your model to ensure that your errors are normally distributed.
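A short sketch to put that number in context, using the mod object fitted above: compute a pseudo-R^2 from the residual and total sums of squares, and inspect the residuals for structure:

rss <- sum(residuals(mod)^2)   # the residual sum-of-squares reported above
tss <- sum((y - mean(y))^2)    # total sum of squares
1 - rss / tss                  # pseudo-R^2 of the nonlinear fit

plot(fitted(mod), residuals(mod), xlab = "fitted", ylab = "residual")
abline(h = 0, lty = 2)   # systematic structure here means the model misses something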