This is my first post so I apologize for any formatting issues.
I'm trying to calculate the expected value of a collection of numbers in Julia, given a probability distribution that is a mixture of two Beta distributions. Using the following code gives the error seen below:
using Distributions, Expectations, Statistics
d = MixtureModel([Beta(1,1),Beta(3,1.6)],[0.5,0.5])
E = expectation(d)
E*rand(32,1)
MethodError: no method matching *(::MixtureExpectation{Vector{IterableExpectation{Vector{Float64}, Vector{Float64}}}, Vector{Float64}}, ::Matrix{Float64})
If I use just a single Beta distribution, the above syntax works fine:
d = Beta(1,1)
E = expectation(d)
E*rand(32,1)
# output: 0.503
And if I use function notation in the expectation, I can calculate expectations of functions using the Mixture model as well.
d = MixtureModel([Beta(1,1),Beta(3,1.6)],[0.5,0.5])
E = expectation(d)
E(x -> x^2)
It just seems not to work with the multiplication (`E * v`) syntax shown above.
A single distribution yields an IterableExpectation, which supports multiplication by an array, while a mixture distribution yields a MixtureExpectation, which supports multiplication only by a scalar (presumably because each component carries its own quadrature nodes, so a single vector of node values would be ambiguous). You can run typeof(E) to check which type you have in your code.
julia> methodswith(IterableExpectation)
[1] *(r::Real, e::IterableExpectation) in Expectations at C:\JuliaPkg\Julia-1.8.0\packages\Expectations\hZ5Gh\src\iterable.jl:53
[2] *(e::IterableExpectation, h::AbstractArray) in Expectations at C:\JuliaPkg\Julia-1.8.0\packages\Expectations\hZ5Gh\src\iterable.jl:44
...
julia> methodswith(MixtureExpectation)
[1] *(r::Real, e::MixtureExpectation) in Expectations at C:\JuliaPkg\Julia-1.8.0\packages\Expectations\hZ5Gh\src\mixturemodels.jl:15
...
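If you need the `E * v` array form with a mixture, one possible workaround (my sketch, not from the original answer) is to use the component expectations directly: each one is an IterableExpectation, which per the method list above supports `E * v`, where `v` holds function values at that component's own quadrature nodes (via the `nodes` accessor, assuming your Expectations.jl version exports it). The weighted combination of the component expectations is then the mixture expectation:

using Distributions, Expectations

components = [Beta(1, 1), Beta(3, 1.6)]
w = [0.5, 0.5]
Es = expectation.(components)  # one IterableExpectation per component

f(x) = x^2
# Es[i] * v computes the expectation of the node values v under component i;
# the weighted sum of component expectations is the mixture expectation.
Ef = sum(w[i] * (Es[i] * f.(nodes(Es[i]))) for i in eachindex(w))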
I am trying to solve an economic problem using the SymPy package in Julia. In this economic problem I have exogenous variables and endogenous variables, and I am indexing them all. I have two questions:
How do I access the indexed variables in order to pass in calibrated values (for exogenous variables, calibrated in another environment) or formulas (for endogenous variables, determined by the first-order conditions of the agents' maximization problem, worked out with pencil and paper)? This will also let me study the behavior of the equilibrium when I perturb exogenous variables. First, consider my attempt to pass calibrated values to the exogenous variables.
using SymPy
# To index
n, N = sympy.symbols("n N", integer=true)
N = 3 # It can change
# Household
#exogenous variables
α = sympy.IndexedBase("α")
@syms γ
α2 = sympy.Sum(α[n], (n, 1, N))
equation_1 = Eq(α2 + γ, 1)
equation_1 says that the α's plus γ sum to one. So I would like to pass values to the α vector according to another vector, alpha3, of calibrated parameters.
# Suppose
alpha3 = [1,2,3]
for n in 1:N
    α[n] = alpha3[n]
end
MethodError: no method matching setindex!(::Sym, ::Int64, ::Int64)
I will certainly do this step once the system is solved. Now I want to pass formulas or expressions as functions of prices. Prices are endogenous and unknown variables. (As said before, the expressions were calculated using paper and pencil.)
# Price vector, Endogenous, unknown in the system equations
P = sympy.IndexedBase("P")
# Other exogenous variables to be calibrated.
z = sympy.IndexedBase("z")
s = sympy.IndexedBase("s")
Y = sympy.IndexedBase("Y")
# S[n] and D[n], supply and demand, are endogenous, but determined by the first-order conditions of the agents' maximization problem
# Supply and Demand
S = sympy.IndexedBase("S")
D = sympy.IndexedBase("D")
# (Hypothetical functions that I have to pass)
# S[n] = s[n]*P[n]
# D[n] = z[n]/P[n]
Once I can write the formulas for S[n] and D[n], consider the second question:
How do I specify the indexed endogenous variables (all prices in their indexed form P[n]) as the unknowns of the system of non-linear equations? I will ignore the possibility that my system cannot be solved; suppose it has a single solution or infinitely many (a manifold). So let's assume that I have more equations than variables:
# For all n, I want determine N indexed equations (looping?)
Eq_n = Eq(S[n] - D[n],0)
# Some other equations relating the P[n]'s
Eq0 = Eq(sympy.Sum(P[n]*Y[n] , (n, 1, N)), 0 )
# Equations system
eq_system = [Eq_n,Eq0]
# Solving
solveset(eq_system,P[n])
Many thanks
There isn't any direct support for the IndexedBase feature of SymPy. As such, the syntax alpha[n] is not available. You can call the method __getitem__ directly, as with
alpha.__getitem__(n)
I don't see a corresponding __setitem__ documented, so I'm not sure whether
α[n]= alpha3[n]
is valid in sympy itself. But if there is some other assignment method, you would likely just call that instead of using [ for assignment.
As for the last question about equations, I'm not sure, but you would presumably find the size of the IndexedBase object and use that to loop.
If possible, using native Julia constructs would be preferred. For this example, you might just consider an array of variables. The recently changed @syms macro makes this easy to generate.
For example, I think the following mostly replicates what you are trying to do:
@syms n::integer, N::integer
# exogenous variables
N = 3
@syms α[1:3] # hard-code 3 here, or use α = [Sym("α$i") for i ∈ 1:N]
@syms γ
α2 = sum(α[i] for i ∈ 1:N)
equation_1 = Eq(α2 + γ, 1)
alpha3 = [1,2,3]
for n in 1:N
    α[n] = alpha3[n]
end
@syms P[1:3], z[1:3], s[1:3], Y[1:3], S[1:3], D[1:3]
Eq_n = [Eq(S[n], D[n]) for n ∈ 1:N]
Eq0 = Eq(sum(P .* Y), 0)
eq_system = vcat(Eq_n, Eq0)
solve(eq_system, P)
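One caveat worth adding (my note, not part of the original answer): the loop rebinds the entries of the Julia vector α, but equation_1 was built from the original symbols and is unchanged by the loop. To substitute calibrated values into the existing expression, keep a copy of the symbols and use subs; a minimal sketch:

α_syms = copy(α)  # save the symbols before the loop overwrites them
# ... after running the loop above ...
equation_1_cal = subs(equation_1, (α_syms .=> alpha3)...)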
Analogous to the example given in GSL.jl/examples/Quadrature.jl, I am trying to integrate a function. However, since this function has a singularity, I need to use the Cauchy weight. My idea was to use the following code:
using GSL
function Q(p)
    ws_size = 200
    ws = GSL.integration_workspace_alloc(ws_size)
    f_ = x -> 1/(x+p)
    f = GSL.@gsl_function(f_)
    result = Cdouble[0][1]
    epsrel = 1e-10
    epsabs = 1e-10
    abserr = Cdouble[0][1]
    limit = Csize_t[0][1]
    result = integration_qawc(f, 0., 1.e4, p, epsabs, epsrel, limit, ws, result, abserr)
    GSL.integration_workspace_free(ws)
    return result
end
However, I get the following error
UndefVarError: f_ not defined
Stacktrace:
[1] (::getfield(Main, Symbol("##117#118")))(::Float64, ::Ptr{Nothing}) at /home/varantir/.julia/packages/GSL/IVE5m/src/manual_wrappers.jl:45
[2] integration_qawc at /home/varantir/.julia/packages/GSL/IVE5m/src/gen/direct_wrappers/gsl_integration_h.jl:570 [inlined]
[3] Q(::Float64) at ./In[250]:14
[4] top-level scope at In[251]:1
This seems a little strange to me, since I have clearly defined f_. Any ideas?
For a weird reason, this doesn't throw an error, but it returns 0:
function Q(p)
    ws_size = 200
    ws = GSL.integration_workspace_alloc(ws_size)
    f0(x::Float64)::Float64 = 1/(x+p)
    f = GSL.@gsl_function(f0)
    result = Cdouble[0][1]
    epsrel = 1e-10
    epsabs = 1e-10
    abserr = Cdouble[0][1]
    limit = Csize_t[0][1]
    result = integration_qawc(f, 0., 1.e4, p, epsabs, epsrel, limit, ws, result, abserr)
    GSL.integration_workspace_free(ws)
    return result
end
From the docs of integration_qawc:
The adaptive bisection algorithm of QAG is used, with modifications to ensure that subdivisions do not occur at the singular point x = c. When a subinterval contains the point x = c or is close to it then a special 25-point modified Clenshaw-Curtis rule is used to control the singularity. Further away from the singularity the algorithm uses an ordinary 15-point Gauss-Kronrod integration rule.
As an alternative, using QuadGK.jl:
using QuadGK
function G2(p)
    f(x) = 1/(x+p)
    a = 0.0
    b = 1e4
    if a < -p < b
        res, err = quadgk(f, a, -p, b, rtol=1e-10, atol=1e-10)
        return res
    else
        res, err = quadgk(f, a, b, rtol=1e-10, atol=1e-10)
        return res
    end
end
from the QuadGK docs:
The algorithm is an adaptive Gauss-Kronrod integration technique: the integral in each interval is estimated using a Kronrod rule (2*order+1 points) and the error is estimated using an embedded Gauss rule (order points). The interval with the largest error is then subdivided into two intervals and the process is repeated until the desired error tolerance is achieved.
These quadrature rules work best for smooth functions within each interval, so if your function has a known discontinuity or other singularity, it is best to subdivide your interval to put the singularity at an endpoint. For example, if f has a discontinuity at x=0.7 and you want to integrate from 0 to 1, you should use quadgk(f, 0,0.7,1) to subdivide the interval at the point of discontinuity. The integrand is never evaluated exactly at the endpoints of the intervals, so it is possible to integrate functions that diverge at the endpoints as long as the singularity is integrable (for example, a log(x) or 1/sqrt(x) singularity).
The default order is 7, so this is equivalent to the GSL integration.
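To make the quoted advice concrete, here is a small sketch (my own illustration, with a made-up integrand): a function with an integrable singularity at x = 0.7 is handled by placing the singular point at an interval boundary, so quadgk never evaluates the integrand exactly there:

using QuadGK

# integrable log singularity at x = 0.7; split the interval at that point
f(x) = log(abs(x - 0.7))
res, err = quadgk(f, 0, 0.7, 1, rtol=1e-10)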
I am having the hardest time trying to implement this equation in a nonlinear solver in R. I am trying both the nleqslv and BB packages, but so far I'm getting nothing but errors. I have searched and read documentation until my eyes have bled, but I cannot wrap my brain around it. The equation itself works like this:
The Equation
s2 * sum(price^(2*x+2)) - s2.bar * sum(price^(2*x)) = 0
Where s2, s2.bar and price are known vectors of equal length.
The last attempt I tried in BB was this:
gamma = function(x){
    n = length(x)
    f = numeric(n)
    f[n] = s2*sum(price^(2*x[n]+2)) - s2.bar*sum(price^(2*x[n]))
    f
}
g0 = rnorm(length(price))
results = BBsolve(par=g0, fn=gamma)
From your description of the various parts used in the function, you seem to have muddled up the formula.
Your function gamma should most probably be written as:
gamma <- function(x){
    f <- s2*sum(price^(2*x+2)) - s2.bar*sum(price^(2*x))
    f
}
s2, price, and s2.bar are vectors according to your description, so the formula you gave will return a vector.
Since you have not given any data, we cannot test. I have tried testing with randomly generated values for s2, price, and s2.bar; sometimes one gets a solution with both nleqslv and BB, but not always.
In the case of package nleqslv, the default method will not always work.
Since the package provides several different methods, you should use its function testnslv to see whether any of them finds a solution.
I am teaching myself how to run some Markov models in R, by working through the textbook "Hidden Markov Models for Time Series: An Introduction using R". I am a bit stuck on how to go about implementing something that is mentioned in the text.
So, I have the following function:
f <- function(samples,lambda,delta) -sum(log(outer(samples,lambda,dpois)%*%delta))
Which I can optimize with respect to, say, lambda using:
optim(par, fn=f, samples=x, delta=d)
where "par" is the initial guess for lambda, for some x and d.
Which works perfectly fine. However, in the part of the text corresponding to the example I am trying to reproduce, they note: "The parameters delta and lambda are constrained by sum(delta_i)=1 for i=1,...,m, delta_i>0, and lambda_i>0. It is therefore necessary to reparametrize if one wishes to use an unconstrained optimizer such as nlm. One possibility is to maximize the likelihood with respect to the 2m-1 unconstrained parameters."
The unconstrained parameters are given by
eta<-log(lambda)
tau <- log(delta/(1-sum(delta)))
I don't entirely understand how to go about implementing this. How would I write a function to optimize over this transformed parameter space?
When using optim() without parameter transformations, like so:
simpleFun <- function(x)
    (x-3)^2

out = optim(par=5,
            fn=simpleFun)
the set of parameter estimates can be obtained via out$par, which is 3 in this case, as you might expect. Alternatively, you can wrap your function f in a transformation of the parameters, like so:
out = optim(par=5,
            fn=function(x)
                # apply the transformation x -> x^3
                simpleFun(x^3))
and now, the trick to getting the correct set of optimal parameters for your function: you need to apply the same transformation to the parameter estimates, as in:
(out$par)^3
#> 2.99741
(And yes, the parameter estimate is slightly different. For this contrived example you could set method="BFGS" for a slightly better estimate. Anyhow, this goes to show that the choice of transformation does matter in some cases, but that's for another discussion...)
To complete the answer, it sounds like you want to use a wrapper like so:
# the function to be optimized
f <- function(samples,lambda,delta)
    -sum(log(outer(samples,lambda,dpois)%*%delta))
out <- optim(# par is now a 2m vector (here m = 3)
par = c(eta1 = 1,
eta2 = 1,
eta3 = 1,
tau1 = 1,
tau2 = 1,
tau3 = 1),
# a wrapper that applies the constraints
fn=function(x,samples){
# exp() guarantees that the values of lambda are > 0
lambda = exp(x[1:3])
# delta is also > 0
delta = exp(x[4:6])
# and now it sums to 1
delta = delta / sum(delta)
f(samples,lambda,delta)
},
samples=samples)
The above guarantees that the parameters passed to f() satisfy the constraints, and as long as you apply the same transformation to out$par, optim() will estimate an optimal set of parameters for f().
I'm trying to learn a little Julia by doing some Bayesian analysis. In Peter Hoff's textbook, he describes a process of sampling from a posterior predictive distribution of a Poisson-Gamma model in which he:
Samples values from the Gamma distribution
Samples values from the Poisson distribution, passing a vector of lambdas
Here is what this looks like in R:
a <- 2
b <- 1
sy1 <- 217; n1 <- 111
theta1.mc <- rgamma(1000, a+sy1, b+n1)
y1.mc <- rpois(1000, theta1.mc)
In Julia, I see that distributions can't take a vector of parameters. So, I end up doing something like this:
using Distributions
a = 2
b = 1
sy1 = 217; n1 = 111
theta_mc = rand(Gamma(a+217, 1/(b+n1)), 5000)
y1_mc = map(x -> rand(Poisson(x)), theta_mc)
While I was initially put off at the distribution function not taking a vector and working Just Like R™, I like that I'm not needing to set my number of samples more than once. That said, I'm not sure I'm doing this idiomatically, either in terms of how people would work with the distributions package, or more generically how to compose functions.
Can anyone suggest a better, more idiomatic approach than my example code?
I would usually do something like the following, which uses list comprehensions:
a, b = 2, 1
sy1, n1 = 217, 111
theta_mc = rand(Gamma(a + sy1, 1 / (b + n1)), 1000)
y1_mc = [rand(Poisson(theta)) for theta in theta_mc]
One source of confusion may be that Poisson isn't really a function; it's a type constructor, and it returns an object. So vectorizing over theta doesn't really make sense, since that wouldn't construct one object but many -- which would then require another step to call rand on each generated object.
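That two-step pattern can also be written with broadcasting: broadcast the constructor to get one Poisson object per theta, then broadcast rand over the resulting objects. A minimal sketch, equivalent to the comprehension above:

using Distributions

a, b = 2, 1
sy1, n1 = 217, 111
theta_mc = rand(Gamma(a + sy1, 1 / (b + n1)), 1000)
# one Poisson object per theta, then one draw from each
y1_mc = rand.(Poisson.(theta_mc))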