Compute the integration in R

I am computing a cumulative distribution function (CDF), so the result should lie in [0, 1]. The equation for the CDF is:
\begin{align}
F(x) = \int_{\hat{a}}^{x} \frac{2}{\hat{b}-\hat{a}} \sideset{}{'}\sum_{k=0}^{N-1} C_{k} \cos\left( \left(y - \hat{a}\right) \frac{k \pi}{\hat{b} - \hat{a}} \right) \, dy
\end{align}
where
Ck is a vector,
the cos term is a vector, and
length(Ck) = length(cos term) = N.
I am sure the equation is correct, but I am afraid my code is incorrect.
Here is my code:
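# integrand: (2/(b-a)) * sum_k myCk[k] * cos((x - hat.a) * uk[k])
# note: the equation's prefactor is 2/(hat.b - hat.a); check that b - a matches it
integrand <- function(x, myCk)
{
  (2/(b-a)) * (t(myCk) %*% as.matrix(cos((x - hat.a) * uk)))
}
f <- function(x){integrand(x, myCk)}
# define a vectorized version of this function
fv <- Vectorize(f, "x")
res <- integrate(fv, upper = r, lower = hat.a, subdivisions = 2000)$value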
res returns the cumulative distribution function, but the result can be larger than 1.
myCk is a vector generated by another function.
hat.a is the lower bound of integration, and it is negative.
uk is a vector generated by a function. The length of uk equals the length of myCk.
I appreciate your advice!
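A hedged sketch, not from the original post: because every term of the integrand is a cosine, the integral can be done term by term in closed form, which avoids integrate() entirely and makes the primed-sum convention explicit. In the Fourier-cosine notation the equation appears to follow, the prime on the sum means the k = 0 term is weighted by 1/2. Since sin(k*pi) = 0 for every k, F at the upper bound reduces exactly to C_0; if the halving is skipped it comes out as 2 C_0 instead, which is one way to get a value above 1. Even with the halving, a truncated cosine series can dip slightly below zero in places, so small overshoots at interior points are possible. The function name cos_series_cdf, its arguments, and the example coefficients are hypothetical; it assumes uk = k*pi/(b - a) as in the equation.

```r
# Sketch (hypothetical helper): closed-form CDF of a cosine-series density
#   f(y) = 2/(b-a) * sum'_{k=0}^{N-1} Ck[k+1] * cos(uk * (y - a)),  uk = k*pi/(b-a)
# where the prime means the k = 0 term is weighted by 1/2.
# x must be a single value; use sapply() to evaluate at several points.
cos_series_cdf <- function(x, Ck, a, b) {
  N  <- length(Ck)
  uk <- (0:(N - 1)) * pi / (b - a)
  w  <- c(0.5, rep(1, N - 1))  # primed sum: halve the k = 0 term
  # integral of cos(uk * (y - a)) dy from a to x:
  #   k = 0: x - a;   k > 0: sin(uk * (x - a)) / uk
  integ <- c(x - a, sin(uk[-1] * (x - a)) / uk[-1])
  2 / (b - a) * sum(w * Ck * integ)
}

# Made-up example values and a numerical cross-check
Ck <- c(1, 0.3, -0.2)
a <- -1
b <- 3
dens <- function(y) {
  k <- 0:(length(Ck) - 1)
  w <- c(0.5, rep(1, length(Ck) - 1))
  sapply(y, function(yy) 2 / (b - a) * sum(w * Ck * cos(k * pi * (yy - a) / (b - a))))
}
cos_series_cdf(1.5, Ck, a, b)
integrate(dens, lower = a, upper = 1.5)$value  # should agree closely
cos_series_cdf(b, Ck, a, b)                    # equals Ck[1] up to rounding
```

If your myCk already has its first element halved, the w weights should be dropped; otherwise the closed form and the integrate() result should match.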

Related

How to calculate the Kullback-Leibler divergence of kernel density estimates in R

I used kernel estimation to get a nonparametric probability density function. Then I want to compare the 'distance' between the tails of two kernel density estimates of continuous variables, using the Kullback-Leibler divergence. I have tried the following code:
kl_l <- function(x,y) {
integrand <- function(x,y) {
f.x <- fitted(density(x, bw="nrd0"))
f.y <- fitted(density(y, bw="nrd0"))
return((log(f.x)-log(f.y))*f.x)
}
return(integrate(integrand, lower=-Inf,upper=quantile(density(x, bw="nrd0"),0.25))$value)
# the Kullback-Leibler equation
}
When I run kl_l(a, b) for a, b = 19 continuous variables, it returns an error:
Error in density(y, bw = "nrd0") : argument "y" is missing, with no default
Is there any way to calculate this?
(If anyone wants to see the actual equation: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2019/attention-to-the-tails-global-financial-conditions-and-exchange-rate-risks.pdf page 13.)
In short, I think you just need to move the f.x and f.y outside the integrand (and possibly replace fitted with approxfun):
kl_l <- function(x, y) {
  f.x <- approxfun(density(x, bw = "nrd0"))
  f.y <- approxfun(density(y, bw = "nrd0"))
  integrand <- function(z) {
    return((log(f.x(z)) - log(f.y(z))) * f.x(z))
  }
  # upper bound: 25% sample quantile of x (quantile() does not accept a density object)
  return(integrate(integrand, lower = -Inf, upper = quantile(x, 0.25))$value)
  # the Kullback-Leibler equation
}
Expanding a little:
Looking at the paper you referenced, it appears as though you need to first create the two fitted distributions f and g. So if your variable a contains observations under the 1-standard-deviation increase in global financial conditions, and b contains the observations under average global financial conditions, you can create two functions as in your example:
f <- approxfun(density(a))
g <- approxfun(density(b))
Then define the integrand:
integrand <- function(x) log(f(x) / g(x)) * f(x)
The upper bound:
upper <- quantile(b, 0.25)  # sample quantile; quantile() has no method for density objects
And finally, integrate over x within the specified bounds. Note that each value of x in the numerical computation has to go into both f and g. In your function kl_l, x and y were going into the integrand separately, which I think is incorrect; in any case, integrate only varies the first argument of the integrand.
integrate(integrand, lower = -Inf, upper = upper)$value
One thing to check: approxfun returns NA for values outside the range of the density grid, which can break the integration (with lower = -Inf it will, because integrate stops on non-finite function values), so you'll need to adjust for those points (if you expect the density to go to zero there, for example).
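Putting those pieces together, here is a hedged sketch that sidesteps the NA problem. It assumes you want the contribution of KL(f || g) over the lower tail, up to the 25% quantile of the first sample. The function name kl_lower_tail, the finite lower limit (the largest grid point where both approxfun() interpolants are defined, used instead of -Inf) and the convention 0 * log(0) = 0 are assumptions, not taken from the paper; it also assumes the two samples overlap. A KL integral truncated to a region like this is not guaranteed to be non-negative.

```r
# Sketch (hypothetical helper): lower-tail Kullback-Leibler contribution between two KDEs
kl_lower_tail <- function(x, y, p = 0.25) {
  dx <- density(x, bw = "nrd0")
  dy <- density(y, bw = "nrd0")
  f <- approxfun(dx$x, dx$y)  # returns NA outside range(dx$x)
  g <- approxfun(dy$x, dy$y)  # returns NA outside range(dy$x)
  # assumption: finite lower limit where both interpolants are defined, instead of -Inf
  lower <- max(min(dx$x), min(dy$x))
  # upper limit: the p-quantile of the sample x
  upper <- unname(quantile(x, p))
  stopifnot(lower < upper)
  integrand <- function(z) {
    fz <- f(z)
    gz <- g(z)
    # assumption: 0 * log(0) is taken as 0
    ifelse(fz > 0, fz * log(fz / gz), 0)
  }
  integrate(integrand, lower = lower, upper = upper, subdivisions = 1000L)$value
}

# Usage with simulated data
set.seed(1)
a <- rnorm(200)
b <- rnorm(200, mean = 0.5)
kl_lower_tail(a, b)
kl_lower_tail(a, a)  # identical densities give 0
```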

Spatial Autoregressive Maximum Likelihood in Julia: Multiple Parameters

I have the following code that evaluates the likelihood function for a spatial autoregressive model in Julia, like so:
function like_sar2(betas,rho,sige,y,x,W)
n = length(y)
A = speye(n) - rho*W
e = y-x*betas-rho*sparse(W)*y
epe = e'*e
tmp2 = 1/(2*sige)
llike = -(n/2)*log(pi) - (n/2)*log(sige) + log(det(A)) - tmp2*epe
end
I am trying to maximize this function, but I'm not sure how to pass inputs of different sizes so that the Optim.jl package will accept them. I have tried the following:
optimize(like_sar2,[betas;rho;sige;y;x;W],BFGS())
and
optimize(like_sar2,tuple(betas,rho,sige,y,x,W),BFGS())
In the first case, the matrix in brackets does not conform due to dimension mismatch and in the second, the Optim package doesn't allow tuples.
I'd like to try and maximize this likelihood function so that it can return the numerical Hessian matrix (using the Optim options) so that I can compute t-statistics for the parameters.
If there is an easier way to obtain the numerical Hessian for such a function I'd use that, but it appears that packages like ForwardDiff only accept single inputs.
Any help would be greatly appreciated!
Not 100% sure I correctly understand how your function works, but it seems to me like you're using the likelihood to estimate the coefficient vector beta, with the other input variables fixed. The way to do this would be to amend the function as follows:
using Optim, LinearAlgebra, SparseArrays
# Initialize some parameters
coeffs = rand(10)
rho = 0.1
ys = rand(10)
xs = rand(10,10)
Wmat = rand(10,10)
sige = 0.5
# Construct likelihood with parameters fixed at pre-defined values
function like_sar2(β::Vector{Float64}, ρ=rho, σε=sige, y=ys, x=xs, W=Wmat)
    n = length(y)
    A = I - ρ*W                  # speye(n) was removed in Julia 1.0; I is the identity
    ε = y - x*β - ρ*sparse(W)*y
    epe = ε'*ε
    tmp2 = 1/(2*σε)
    llike = -(n/2)*log(π) - (n/2)*log(σε) + log(det(A)) - tmp2*epe
end
# Optimize, with starting value zero for all beta coefficients.
# optimize() minimizes, so pass the negative log-likelihood in order to maximize it.
optimize(β -> -like_sar2(β), zeros(10), NelderMead())
If you need to optimize more than your beta parameters (in the general autoregressive models I've used, the autocorrelation parameter was often estimated jointly with the other coefficients), you can append them to the beta vector and unpack them inside the function, like so:
using LinearAlgebra, SparseArrays
append!(coeffs, rho)   # coeffs now holds the 10 betas followed by rho
function like_sar3(coeffs::Vector{Float64}, σε=sige, y=ys, x=xs, W=Wmat)
    β = coeffs[1:10]; ρ = coeffs[11]
    n = length(y)
    A = I - ρ*W
    ε = y - x*β - ρ*sparse(W)*y
    epe = ε'*ε
    tmp2 = 1/(2*σε)
    llike = -(n/2)*log(π) - (n/2)*log(σε) + log(det(A)) - tmp2*epe
end
The key is that you end up with one vector of inputs to pass into your function.

Find the probability density of a new data point using "density" function in R

I am trying to estimate the PDF of continuous data with an unknown distribution, using the "density" function in R. Given a new data point, I want to find its probability density under the kernel density estimate returned by "density".
How can I do that?
If your new point will be within the range of values produced by density, it's fairly easy to do -- I'd suggest using approx (or approxfun if you need it as a function) to handle the interpolation between the grid-values.
Here's an example:
set.seed(2937107)
x <- rnorm(10,30,3)
dx <- density(x)
xnew <- 32.137
approx(dx$x,dx$y,xout=xnew)
If we plot the density and the new point, we can see it's doing what you need.
[Plot: the density estimate dx with the new point xnew marked]
This will return NA if the new value would need to be extrapolated. If you want to handle extrapolation, I'd suggest direct computation of the KDE for that point (using the bandwidth from the KDE you have).
This question is a year old, but nevertheless here is a complete solution. Let's call
d <- density(xs)
and define h = d$bw. Your KDE estimate is completely determined by
the elements of xs,
the bandwidth h,
the type of kernel functions.
Given a new value t, you can compute the corresponding y(t), using the following function, which assumes you have used Gaussian kernels for estimation.
myKDE <- function(t){
  kernelValues <- rep(0, length(xs))
  for(i in seq_along(xs)){
    transformed <- (t - xs[i]) / h
    kernelValues[i] <- dnorm(transformed, mean = 0, sd = 1) / h
  }
  return(sum(kernelValues) / length(xs))
}
myKDE computes y(t) directly from the definition of the KDE.
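For a Gaussian kernel the loop in myKDE collapses to a mean of dnorm() values, so a vectorised variant is short. This is a sketch: kde_at is a hypothetical name, and it assumes the default Gaussian kernel and reuses the bandwidth d$bw from density(). Unlike approx() on the density grid, it also works for points outside range(d$x).

```r
# Sketch (hypothetical helper): evaluate a Gaussian KDE at new points,
# using the bandwidth h chosen by density()
kde_at <- function(t, xs, h) {
  # for each t: mean over i of dnorm((t - xs[i]) / h) / h
  sapply(t, function(ti) mean(dnorm((ti - xs) / h)) / h)
}

set.seed(2937107)
xs <- rnorm(10, 30, 3)
d <- density(xs)
kde_at(c(32.137, 45), xs, d$bw)    # 45 is a point far in the right tail
approx(d$x, d$y, xout = 32.137)$y  # grid-based value, close to the first result
```

The small difference between the two values at 32.137 comes from the binning and interpolation that density() and approx() use.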
See the dnorm documentation:
dnorm(data_point, its_mean, its_stdev)

Function to find Negative binomial distribution in Julia

I am using the code below to compute negative binomial probabilities in R:
dnbinom(n11, size=p[1], prob=p[2]/(p[2]+E))
where dnbinom is R's negative binomial probability mass function,
and n11 and E are integer vectors.
Now I want to run the same code in Julia. Which function should I use in place of dnbinom?
The function must take the arguments (x, size, prob):
x = vector of quantiles (non-negative integer counts).
size = target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer.
prob = probability of success in each trial. 0 < prob <= 1.
Below is my full code (updated per the answers given, but still not working):
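using Distributions
data = query("Select count_a,EXP_COUNT from SM_STAT_ALGO_LOCALTRADE_SOC;")
# note: f is called here, before it is defined below
f([0.2,0.06,1.4,1.8,0.1],data[:,1],data[:,2])
# note: NegativeBinomial takes two parameters (r, p); n11 belongs in pdf(dist, n11),
# and because E is a vector the distribution and pdf calls need broadcasting (.)
function f(x::Vector,n11,E)
    return sum(-log(x[5] * pdf(NegativeBinomial(x[1], x[2]/(x[2]+E), n11)) + (1-x[5]) * pdf(NegativeBinomial(x[3], x[4]/(x[4]+E),n11))))
end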
Assuming that you want the probabilities of a vector of outcomes, you can do
using Distributions
function dnbinom(x, size, prob)
dist = NegativeBinomial(size,prob)
map(y->pdf(dist,y), x)
end
@show dnbinom([3,5], 10, 0.1)
To get the equivalent of dnbinom in R
dnbinom(1, 1, 0.5)
# [1] 0.25
you can use
using Distributions
pdf(NegativeBinomial(), 1)
# 0.25000000000000006
in Julia.

Fitting an inverse function

I have a function which looks like:
g(x) = f(x) - a^b / f(x)^b
g(x) - known function, data vector provided.
f(x) - hidden process.
a,b - parameters of this function.
From the above we get the relation:
f(x) = inverse(g(x))
My goal is to optimize the parameters a and b so that f(x) is as close as possible to a normal distribution. Looking at the normal Q-Q plot of f(x) (attached), my purpose is to minimize the distance between the points of f(x) and the straight line that represents the normal distribution, by optimizing a and b.
I wrote the below code:
g_fun <- function(x) {x - a^b/x^b}
inverse = function (f, lower = 0, upper = 2000) {
function (y) uniroot((function (x) f(x) - y), lower = lower, upper = upper)[1]
}
f_func = inverse(function(x) g_fun(x))
# let's make up an example
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
# Calculate f(x) by using the inverse of g(x), when a=a0 and b=b0
for (i in 1:10) {
f[i] <- f_fun(g[i])
}
I have two question:
How to pass parameters a and b to the functions?
How to perform this optimization task, meaning find a and b such that f(x) would approximate normal distribution.
Not sure how you were able to produce the Q-Q plot, since your examples do not work: you do not specify the values of a and b, and you define f_func but call f_fun. Anyway, here are my answers to your questions:
How to pass parameters a and b to the functions? - Just pass them as arguments to the functions.
How to perform this optimization task, meaning find a and b such that f(x) would approximate normal distribution? - The same way any optimization task is done. Define a cost function, then minimize it.
Here is the revised code. I have added a and b as parameters, removed the inverse function and incorporated it into f_func, which now accepts vector input, so no for loop is needed.
g_fun <- function(x,a,b) {x - a^b/x^b}
f_func = function(y,a,b,lower = 0, upper = 2000){
sapply(y,function(z) { uniroot(function(x) g_fun(x,a,b) - z, lower = lower, upper = upper)$root})
}
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
f <- f_func(g,1,1) # using a = 1 and b = 1
#[1] 0.9918427 1.0149329 0.9873386 1.0016774 0.9737270 0.9905320 1.0025893
#[8] 1.0341199 1.0132947 1.0258569
f_func(g,2,10)
#[1] 1.876408 1.880554 1.875578 1.878138 1.873094 1.876170 1.878304 1.884049
#[9] 1.880256 1.882544
Now for the optimization part: it depends on what you mean by "f(x) would approximate a normal distribution". You can compare the mean squared error from the Q-Q line if you want. Also, since you say approximate, how close is good enough? You can go with shapiro.test and keep searching until the p-value is above 0.05, so that normality is not rejected (beware that there may not be a solution).
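One way to turn "mean squared error from the Q-Q line" into a single number, as a sketch: fit a least-squares line through the normal Q-Q points and report the residual sum of squares relative to the total, which equals 1 - r^2 for that fit. Unlike a raw MSE, it does not shrink just because a and b shrink the spread of f. qq_cost is a hypothetical name, and the least-squares line is an assumption: it differs from the quartile-based line that qqline() draws.

```r
# Sketch (hypothetical helper): scale-free distance of a sample from its normal Q-Q line
qq_cost <- function(f) {
  q <- qqnorm(f, plot.it = FALSE)  # q$x: theoretical quantiles, q$y: the data
  1 - cor(q$x, q$y)^2              # residual SS / total SS of the least-squares line
}

z <- qnorm(ppoints(10))  # exactly normal quantiles
qq_cost(z)               # essentially 0
qq_cost(exp(z))          # a skewed sample gives a larger cost
```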
shapiro.test(f_func(g,1,2))$p
#[1] 0.9484821
cost <- function(x,y) shapiro.test(f_func(g,x,y))$p
Now that we have a cost function, how do we go about minimizing it? There are many different ways to do numerical optimization; take a look at the optim function: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optim.html.
optim(c(1,1),cost)
This final line does not work, but without proper data and context this is as far as I can go. Hope this helps.
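The last line fails because optim() calls the objective with a single parameter vector, so cost(x, y) receives c(1, 1) as x and y is missing. Here is a sketch that unpacks the vector and negates the Shapiro-Wilk p-value (optim() minimises, and a larger p-value means less evidence against normality). It returns a penalty of 1 whenever uniroot() or shapiro.test() fails for a given (a, b). The name cost2 and the penalty value are assumptions, and with only ten points the optimum says little about normality.

```r
g_fun <- function(x, a, b) x - a^b / x^b
f_func <- function(y, a, b, lower = 0, upper = 2000) {
  sapply(y, function(z) {
    uniroot(function(x) g_fun(x, a, b) - z, lower = lower, upper = upper)$root
  })
}
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971,
       0.005172, 0.067114, 0.026415, 0.051062)

# optim() passes parameters as one vector: par[1] is a, par[2] is b.
# Negate the p-value because optim() minimises.
cost2 <- function(par) {
  tryCatch(
    -shapiro.test(f_func(g, par[1], par[2]))$p.value,
    error = function(e) 1  # assumed penalty when uniroot() or shapiro.test() fails
  )
}

fit <- optim(c(1, 1), cost2)
fit$par     # candidate (a, b)
-fit$value  # Shapiro-Wilk p-value at that point
```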
