I am going through my multivariate class notes, which use the cuhre function from the R2Cuba package to evaluate a probability defined by constraints on the variables X and Y. Here is the complete chunk of code:
integrand <- function(z){
x <- z[1]
y <- z[2]
if (0<x & x<1 & 0<y & y<1 & x+y>1)
f = 6*x*y^2
else
f = 0
return(f)
}
NDIM = 2
NCOMP = 1
int <- cuhre(NDIM, NCOMP, integrand, rel.tol = 1e-3, abs.tol = 1e-12,
flags = list(verbose = 2, final = 0))$value
int
The result is:
> int
[1] 0.8998863
I understand that this is the probability based on the rule:
0<x<1 & 0<y<1 & x+y>1
What I don't understand is that integrand is defined as a function taking a single argument z, so only one parameter is passed. When it is handed to cuhre below, it is given no arguments, yet the integration runs over both variables and the probability gets saved in int. I follow roughly half of what is going on, but I am not 100% clear on how it works. Also, can Cuhre be used to calculate a marginal probability as well?
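A minimal base-R sketch of what is going on, under the assumption that the integrator calls integrand itself, each time with a numeric vector z of length NDIM holding one point (x, y) of the unit square. The grid sum below is a crude midpoint rule used only for illustration; it is not the algorithm Cuhre uses, and the helper names (mid, grid, approx_prob, marginal_x) are made up for this example.

```r
# Same integrand as in the question: z is a length-2 vector c(x, y)
integrand <- function(z) {
  x <- z[1]
  y <- z[2]
  if (0 < x && x < 1 && 0 < y && y < 1 && x + y > 1) 6 * x * y^2 else 0
}

# The integrator supplies the argument; by hand it looks like this
point_value <- integrand(c(0.8, 0.9))

# Crude midpoint rule on an n x n grid over the unit square (illustration only)
n <- 200
mid <- (seq_len(n) - 0.5) / n
grid <- as.matrix(expand.grid(x = mid, y = mid))
approx_prob <- sum(apply(grid, 1, integrand)) / n^2
print(approx_prob)  # close to the exact value 9/10

# Marginal density of X at a given x: integrate the joint density over y only.
# The constraint x + y > 1 becomes the lower limit y = 1 - x.
marginal_x <- function(x) {
  integrate(function(y) 6 * x * y^2, lower = 1 - x, upper = 1)$value
}
print(marginal_x(0.5))
```

The same idea should carry over to cuhre: for a marginal, fix x, use NDIM = 1, and integrate over y alone.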
I am working on a portfolio optimization algorithm, and part of the problem consists of generating moment-matching scenarios.
Because of its simplicity and speed, I chose to follow the paper "An algorithm for moment-matching scenario generation with application to financial portfolio optimization" (Ponomareva, Roman and Date).
The problem is that, even though the mathematics is very simple, I am stuck because some of the probability weights p_i are negative even though the formulas in the paper should ensure otherwise. If I add a loop that reruns the algorithm until it finds an all-positive combination, it essentially runs forever.
Here is the part of the code, based on the paper, where things get stuck:
dummy1 = 0
while (dummy1 <=0 | dummy1 >= 1) {
dummy1 = round(rnorm(1, mean = 0.5, sd = 0.25), 2)
}
diag.cov.returns = diag(cov.returns)
Z = dummy1 * sqrt (diag.cov.returns) #Vector Z according to paper formula
ZZT = Z %*% t(Z)
LLT = cov.returns - ZZT
L = chol(LLT) #cholesky decomposition to get matrix L
s = sample (1:5, 1)
F1 = 0
F2 = -1
S = (2*N*s)+3
while (((4*F2)-(3*F1*F1)) < 0) {
#Gamma = (2*s*s)*(((N*mean.fourth) - (0.75*(sum(Z^4)* (N*mean.third/sum(Z^3))^2)))/sum(L^4))
#Gamma is necessary if we want to get p from Uniform Distribution
#U = runif(s, 0, 1)
U = rgamma(s, shape = 1, scale = ((1/exp(1)):1))
#p = (s*(N/Gamma)) + ((1/(2*N*s)) - (s/(N*Gamma)))*U
p = (-log(U, base = exp(1)))
p = p/(((2*sum(p))+max(p))*N*s) #this is the array expected to have positive and bounded between 0 and 1
q1 = 1/p
pz = p
p[s+1] = (1-(2*N*sum(p))) #extra point necessary to get the 3 moment-matching probabilities
F1 = (N*mean.third*sqrt(p[s+1]))/(sum(Z^3))
F2 = p[s+1]*(((N*mean.fourth) - (1/(2*s*s))*sum(L^4)*(sum(1/p)))/sum(Z^4))
}
alpha = (0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
beta = -(0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
w1 = 1/(alpha*(alpha+beta))
w2 = 1/(beta*(alpha+beta))
w0 = 1 - (1/(alpha*beta))
P = rep(pz, 2*N) #Vector with Probabilities starting from p + 3 extra probabilities to match third and fourth moments
P[(2*N*s)+1] = p[s+1]*w0
P[(2*N*s)+2] = p[s+1]*w1
P[(2*N*s)+3] = p[s+1]*w2
Unfortunately I cannot disclose the input dataset containing the fund returns, but I can be more specific. I start from a data.frame() containing the returns of N assets (in my case 11 funds, with monthly returns from 30/01/2001 to 30/09/2020). From it I compute the mean returns, the covariance matrix, the central third and fourth moments (NOT skewness and kurtosis) and their averages. The algorithm then proceeds as shown above. The point where I get stuck is that p also takes negative values. This is a problem because the first s elements of p are later used as probabilities in P.
I hope this makes the problem clearer. I should add that the paper reports the data used by the authors, but it would have to be entered into R by hand. In any case, any data.frame() containing asset returns will do.
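One possible source of the negative values can be checked in isolation, without any of the return data. In the posted code U is drawn with rgamma, so U can exceed 1, and -log(U) is then negative; with U drawn from runif(0, 1), as in the commented-out line, -log(U) is always positive. The sketch below only demonstrates that sign behaviour. It assumes the scale that the expression ((1/exp(1)):1) actually evaluates to, namely the single value exp(-1), and it does not check the paper's formulas.

```r
set.seed(1)

# The posted scale argument collapses to one number, exp(-1)
posted_scale <- (1 / exp(1)):1
print(posted_scale)

# Gamma draws are unbounded above, so some exceed 1 and give -log(U) < 0
U_gamma <- rgamma(10000, shape = 1, scale = exp(-1))
frac_negative <- mean(-log(U_gamma) < 0)
print(frac_negative)

# Uniform(0, 1) draws lie strictly inside (0, 1), so -log(U) > 0 always
U_unif <- runif(10000)
all_positive <- all(-log(U_unif) > 0)
print(all_positive)
```

If this is indeed the cause, switching back to uniform draws (or drawing p directly from an exponential distribution) should keep the first s elements of p positive.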
I'm trying to use DEoptim to find the global minimum of z on -1 < x < 1, -1 < y < 1, but I'm getting Error in FUN(newX[, i], ...) : argument "y" is missing, with no default and I don't know what I'm supposed to do about the missing "y".
install.packages("Rmpfr")
install.packages("DEoptim")
library(gmp)
library(Rmpfr)
library(parallel) # https://cran.r-project.org/web/packages/DEoptim/vignettes/DEoptim.pdf
library(DEoptim)
z = function(x,y) {
(exp(sin(60.0*x)) + sin(50.0*exp(y)) + sin(80.0*sin(x)) + sin(sin(70.0*y)) - sin(10.0*(x+y)) + (x*x+y*y)/4.0)
}
optimized_Minimum <- DEoptim(z, lower = c(-1,-1), upper = c(1,1),
control=list(storepopfrom=1, trace=FALSE))
# optimized_Minimum <- optim(z, lower = c(-1,-1), upper = c(1,1), method = "Brent")
DEoptim is not expecting you to pass it 2 separate arguments to your function (x and y), but you can still solve for multiple variables.
You need to pass in a vector rather than 2 separate variables with the DEoptim package, as with the optim function.
I tested this with the functions from the linked solution and it worked:
fxcalc <- function(s,t){(1-(1-(parametros$ap/xm)^(s))^t)*100}
suma <- function(s,t){(parametros$fx-fxcalc(s,t))^2}
func <- function(st){
s <- st[1]
t <- st[2]
sum(suma(s,t))
}
optimized_Minimum <- DEoptim(func, lower = c(-1,-1), upper = c(1,1),
control=list(storepopfrom=1, trace=FALSE))
summary(optimized_Minimum)
***** summary of DEoptim object *****
best member : 1 1
best value : 0
after : 200 generations
fn evaluated : 402 times
*************************************
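Applied to the z function from the question, the same idea might look like the sketch below. The wrapper name z_vec is made up for this example, and the DEoptim call is guarded so the snippet still runs when the package is not installed.

```r
# Objective from the question, written in terms of two scalars
z <- function(x, y) {
  exp(sin(60.0 * x)) + sin(50.0 * exp(y)) + sin(80.0 * sin(x)) +
    sin(sin(70.0 * y)) - sin(10.0 * (x + y)) + (x * x + y * y) / 4.0
}

# Wrapper taking ONE numeric vector c(x, y), which is what DEoptim passes in
z_vec <- function(par) z(par[1], par[2])

# Quick check that the wrapper evaluates at a single point
print(z_vec(c(0, 0)))

if (requireNamespace("DEoptim", quietly = TRUE)) {
  set.seed(1)
  fit <- DEoptim::DEoptim(z_vec, lower = c(-1, -1), upper = c(1, 1),
                          control = DEoptim::DEoptim.control(trace = FALSE))
  print(fit$optim$bestmem)  # location of the best point found
  print(fit$optim$bestval)  # objective value at that point
}
```

The function has many local minima, so it may be worth rerunning with different seeds or a larger population before trusting the result.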
In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
You need to read the code if it's not documented. Start with the help page:
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L) # .... not your situation
#....
else{
r <- as.double(drop(xacf))
z <- .Fortran(C_eureka, as.integer(order.max), r, r,
coefs = double(order.max^2), vars = double(order.max),
double(order.max))
coefs <- matrix(z$coefs, order.max, order.max)
partialacf <- array(diag(coefs), dim = c(order.max, 1L,
1L))
var.pred <- c(r[1L], z$vars)
#.......
order <- if (aic)
(0L:order.max)[xaic == 0L]
else order.max
ar <- if (order)
coefs[order, seq_len(order)]
else numeric()
var.pred <- var.pred[order + 1L]
var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think I'm finding it here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that I think is returning the var.pred estimate. I'm not a time series guy, and it's your responsibility to review this process for applicability to your problem.
subroutine eureka (lr,r,g,f,var,a)
c
c solves Toeplitz matrix equation toep(r)f=g(1+.)
c by Levinson's algorithm
c a is a workspace of size lr, the number
c of equations
c
snipped
c estimate the innovations variance
var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
if (l .eq. lr) return
d = 0.0d0
q = 0.0d0
do 50 i = 1, l
k = l-i+2
d = d + a(i)*r(k)
q = q + f(l,i)*r(k)
50 continue
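Putting the two pieces together for the order-1 example in the question: the Fortran recursion gives var(1) = r(1) * (1 - f(1,1)^2), where r(1) is the lag-0 autocovariance, and the R code then rescales by n/(n - (order + 1)). The sketch below tries to reproduce fit$var.pred by hand on that basis. It is my reading of the code shown above, not a documented formula, and the names r, phi and manual are made up for this example.

```r
set.seed(82346)
temp <- arima.sim(n = 10, list(ar = 0.5), sd = 1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic = FALSE,
          order.max = 1)

n <- length(temp)

# Autocovariances at lags 0 and 1, computed the same way ar.yw does
r <- drop(acf(temp, type = "covariance", lag.max = 1, plot = FALSE,
              demean = FALSE)$acf)

# Order-1 Yule-Walker coefficient and the Levinson innovations variance
phi <- r[2] / r[1]
yw_var <- r[1] * (1 - phi^2)

# Rescaling applied at the end of ar.yw.default: n / (n - (order + 1))
manual <- yw_var * n / (n - 2)

print(c(manual = manual, var.pred = fit$var.pred))
```

Note that r[1] * (1 - phi^2) equals the "YW estimate" line in the question, so on this reading var.pred is that estimate multiplied by 10/8.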
I would like to integrate the following function, named betalog:
g <- function(x,a,b){
if (a < 0 | b < 0) stop()
temp <- (a-1)*log(x) + (b-1)*log(1-x)
return( exp(temp) )
}
betalog<- function(x,a,b)
{
temp <- g(x=x,a=a,b=b)* log(x/(1-x))
return( temp )
}
The function g is the integrand of the beta function. In theory, betalog should be integrable over any interval [0, alpha] if 0 < alpha < 1, a > 0 and b > 0.
However, I cannot numerically integrate betalog when a is very small:
a <- 0.00001
b <- 1
alpha <- 0.5
integrate(betalog,a=a,b=b,lower=0,upper=alpha,subdivisions=1000000L)
Error in integrate(betalog, a = a, b = b, lower = 0, upper = alpha, subdivisions =
1000000L) :
non-finite function value
In fact, I cannot even compute the incomplete beta function with R's integrate function when a is very small:
integrate(g,a=a,b=b,lower=0,upper=alpha,subdivisions=1000000L)
Error in integrate(g, a = a, b = b, lower = 0, upper = alpha, subdivisions = 1000000L) :
roundoff error is detected in the extrapolation table
Can anyone give me a tip on how to integrate such an incomplete-beta-like function in R?
> betalog(0, a, b)
[1] -Inf
Your function is singular at the lower bound. Recall that to compute an improper integral you must replace the singular bounds with dummy variables and take the limit from the correct side towards that bound. In particular,
> integrate(betalog,a=a,b=b,lower=0.000001,upper=alpha,subdivisions=10000000L)
-94.60292 with absolute error < 0.00014
> integrate(betalog,a=a,b=b,lower=.Machine$double.xmin * 1000,upper=alpha,subdivisions=10000
-244894.7 with absolute error < 10
> integrate(betalog,a=a,b=b,lower=.Machine$double.xmin,upper=alpha,subdivisions=10000000L)
Error in integrate(betalog, a = a, b = b, lower = .Machine$double.xmin, :
non-finite function value
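The estimates grow rapidly as the lower bound shrinks, so the integral looks divergent. It is in fact finite for a > 0, because x^(a-1)*log(x) is integrable at 0, but its value is of order -1/a^2 (about -1e10 for a = 0.00001), and almost all of it comes from x values smaller than .Machine$double.xmin, which integrate cannot reach. This is tricky territory; even state-of-the-art symbolic algebra systems do not settle it: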
http://www.wolframalpha.com/input/?i=Integral%28x%5E%280.00001+-1%29+ln%28x%2F%281-x%29%29%2C+x%2C0%2C+0.5%29
Whatever the case, R is not the correct tool for this problem.
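For the special case b = 1 used in the question, the integral can be worked out by hand, which sidesteps the singularity entirely. Integrating x^(a-1)*log(x) by parts and expanding -log(1-x) as a power series gives alpha^a * (log(alpha)/a - 1/a^2) + sum over k >= 1 of alpha^(a+k) / (k*(a+k)). The sketch below codes that expression and checks it against integrate for a harmless value of a. The function name betalog_b1 and the truncation at 200 terms are choices made for this example, and the formula applies only when b = 1.

```r
# Closed form plus truncated series for b = 1 (derived by hand, see lead-in)
betalog_b1 <- function(a, alpha, n_terms = 200) {
  k <- seq_len(n_terms)
  log_part <- alpha^a * (log(alpha) / a - 1 / a^2)     # integral of x^(a-1)*log(x)
  series_part <- sum(alpha^(a + k) / (k * (a + k)))    # integral of -x^(a-1)*log(1-x)
  log_part + series_part
}

# Sanity check against integrate() where the integrand is well behaved (a = 2)
a_check <- 2
numeric_val <- integrate(function(x) x^(a_check - 1) * log(x / (1 - x)),
                         lower = 0, upper = 0.5)$value
closed_val <- betalog_b1(a_check, 0.5)
print(c(numeric = numeric_val, closed = closed_val))

# The case from the question: finite, but on the order of -1/a^2
small_a_val <- betalog_b1(1e-5, 0.5)
print(small_a_val)
```

For general b there is no such simple expression, and a substitution that removes the endpoint singularity would be needed before handing the problem to integrate.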
I came across an interesting presentation and set out to replicate and understand the code presented on page 32.
The code from the presentation is as follows:
#Unicredit banks code
library(evir)
library(fExtremes)
# Quantile function of lognormal-GPD severity distribution
qlnorm.gpd = function(p, theta, theta.gpd, u)
{
Fu = plnorm(u, meanlog=theta[1], sdlog=theta[2])
x = ifelse(p<Fu,
qlnorm( p=p, meanlog=theta[1], sdlog=theta[2] ),
qgpd( p=(p - Fu) / (1 - Fu) , xi=theta.gpd[1], mu=theta.gpd[2], beta=theta.gpd[3]) )
return(x)
}
# Random sampling function of lognormal-GPD severity distribution
rlnorm.gpd = function(n, theta, theta.gpd, u)
{
r = qlnorm.gpd(runif(n), theta, theta.gpd, u)
}
set.seed(1000)
nSim = 1000000 # Number of simulated annual losses
H = 1500 # Threshold body-tail
lambda = 791.7354 # Parameter of Poisson body
theta1 = 2.5 # Parameter mu of lognormal (body)
theta2 = 2 # Parameter sigma of lognormal (body)
theta1.tail = 0.5 # Shape parameter of GPD (tail)
theta2.tail = H # Location parameter of GPD (tail)
theta3.tail = 1000 # Scale parameter of GPD (tail)
sj = rep(0,nSim) # Annual loss distribution inizialization
freq = rpois(nSim, lambda) # Random sampling from Poisson
for(i in 1:nSim) # Convolution with Monte Carlo method
sj[i] = sum(rlnorm.gpd(n=freq[i], theta=c(theta1,theta2), theta.gpd=c(theta1.tail, theta2.tail, theta3.tail), u=H))
However I get this error which I cannot resolve:
Error: min(p, na.rm = TRUE) >= 0 is not TRUE
APPENDED Question
Many thanks to Shadow.
I don't know how to change the function reference. Is it as easy as changing qgpd.fExtremes to qgpd.evir?
Thanks again to Shadow for pointing this out.
For anyone who wishes to call a function from a different package (in the above example, evir instead of fExtremes), it is as simple as prefixing the call with evir:::.
Example:
evir:::qgpd( p=(p - Fu) / (1 - Fu) , xi=theta.gpd[1], mu=theta.gpd[2], beta=theta.gpd[3])
The reason you get an error here is that the packages fExtremes and evir both implement different versions of the function qgpd. In the evir version, p can be less than 0, while the fExtremes package only implements qgpd for p>=0.
The easiest solution to this is to change the qgpd function call to evir:::qgpd.
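A package-independent alternative is to avoid passing negative probabilities in the first place. ifelse evaluates both of its branches on the whole vector p, so the GPD quantile function is also called for the body points, where (p - Fu) / (1 - Fu) is negative. The sketch below splits p by index instead. To stay self-contained it uses a hand-written GPD quantile, qgpd_manual, which is a stand-in written for this example (valid for xi != 0) and is not the function from either package; qlnorm_gpd is likewise an illustrative name.

```r
# Stand-in GPD quantile function (assumption: standard formula, xi != 0)
qgpd_manual <- function(p, xi, mu = 0, beta = 1) {
  mu + (beta / xi) * ((1 - p)^(-xi) - 1)
}

# Lognormal body below the threshold u, GPD tail above it
qlnorm_gpd <- function(p, theta, theta.gpd, u) {
  Fu <- plnorm(u, meanlog = theta[1], sdlog = theta[2])
  x <- numeric(length(p))
  in_body <- p < Fu
  # Each quantile function only sees the probabilities that belong to it
  x[in_body] <- qlnorm(p[in_body], meanlog = theta[1], sdlog = theta[2])
  x[!in_body] <- qgpd_manual((p[!in_body] - Fu) / (1 - Fu),
                             xi = theta.gpd[1], mu = theta.gpd[2],
                             beta = theta.gpd[3])
  x
}

# Parameters from the question; the last probability falls in the tail
q <- qlnorm_gpd(c(0.1, 0.5, 0.999), theta = c(2.5, 2),
                theta.gpd = c(0.5, 1500, 1000), u = 1500)
print(q)
```

With this structure the rescaled tail probabilities always lie in [0, 1), so a stricter qgpd implementation should no longer complain.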