Related
The renewal function for Weibull distribution m(t) with t = 10 is given as below.
I want to find the value of m(t). I wrote the following r code to compute m(t)
last_term = NULL
gamma_k = NULL
n = 50
for(k in 1:n){
gamma_k[k] = gamma(2*k + 1)/factorial(k)
}
for(j in 1: (n-1)){
prev = gamma_k[n-j]
last_term[j] = gamma(2*j + 1)/factorial(j)*prev
}
final_term = NULL
find_value = function(n){
for(i in 2:n){
final_term[i] = gamma_k[i] - sum(last_term[1:(i-1)])
}
return(final_term)
}
all_k = find_value(n)
af_sum = NULL
m_t = function(t){
for(k in 1:n){
af_sum[k] = (-1)^(k-1) * all_k[k] * t^(2*k)/gamma(2*k + 1)
}
return(sum(na.omit(af_sum)))
}
m_t(20)
The output is m(t) = 2.670408e+93. Does my iteratvie procedure correct? Thanks.
I don't think it will work. First, lets move Γ(2k+1) from denominator of m(t) into Ak. Thus, Ak will behave roughly as 1/k!.
In the nominator of the m(t) terms there is t2k, so roughly speaking you're computing sum with terms
100k/k!
From Stirling formula
k! ~ kk, making terms
(100/k)k
so yes, they will start to decrease and converge to something but after 100th term
Anyway, here is the code, you could try to improve it, but it breaks at k~70
N <- 20
A <- rep(0, N)
# compute A_k/gamma(2k+1) terms
ps <- 0.0 # previous sum
A[1] = 1.0
for(k in 2:N) {
ps <- ps + A[k-1]*gamma(2*(k-1) + 1)/factorial(k-1)
A[k] <- 1.0/factorial(k) - ps/gamma(2*k+1)
}
print(A)
t <- 10.0
t2 <- t*t
r <- 0.0
for(k in 1:N){
r <- r + (-t2)^k*A[k]
}
print(-r)
UPDATE
Ok, I calculated Ak as in your question, got the same answer. I want to estimate terms Ak/Γ(2k+1) from m(t), I believe it will be pretty much dominated by 1/k! term. To do that I made another array k!*Ak/Γ(2k+1), and it should be close to one.
Code
N <- 20
A <- rep(0.0, N)
psum <- function( pA, k ) {
ps <- 0.0
if (k >= 2) {
jmax <- k - 1
for(j in 1:jmax) {
ps <- ps + (gamma(2*j+1)/factorial(j))*pA[k-j]
}
}
ps
}
# compute A_k/gamma(2k+1) terms
A[1] = gamma(3)
for(k in 2:N) {
A[k] <- gamma(2*k+1)/factorial(k) - psum(A, k)
}
print(A)
B <- rep(0.0, N)
for(k in 1:N) {
B[k] <- (A[k]/gamma(2*k+1))*factorial(k)
}
print(B)
shows that
I got the same Ak values as you did.
Bk is indeed very close to 1
It means that term Ak/Γ(2k+1) could be replaced by 1/k! to get quick estimate of what we might get (with replacement)
m(t) ~= - Sum(k=1, k=Infinity) (-1)k (t2)k / k! = 1 - Sum(k=0, k=Infinity) (-t2)k / k!
This is actually well-known sum and it is equal to exp() with negative argument (well, you have to add term for k=0)
m(t) ~= 1 - exp(-t2)
Conclusions
Approximate value is positive. Probably will stay positive after all, Ak/Γ(2k+1) is a bit different from 1/k!.
We're talking about 1 - exp(-100), which is 1-3.72*10-44! And we're trying to compute it precisely summing and subtracting values on the order of 10100 or even higher. Even with MPFR I don't think this is possible.
Another approach is needed
OK, so I ended up going down a pretty different road on this. I have implemented a simple discretization of the integral equation which defines the renewal function:
m(t) = F(t) + integrate (m(t - s)*f(s), s, 0, t)
The integral is approximated with the rectangle rule. Approximating the integral for different values of t gives a system of linear equations. I wrote a function to generate the equations and extract a matrix of coefficients from it. After looking at some examples, I guessed a rule to define the coefficients directly and used that to generate solutions for some examples. In particular I tried shape = 2, t = 10, as in OP's example, with step = 0.1 (so 101 equations).
I found that the result agrees pretty well with an approximate result which I found in a paper (Baxter et al., cited in the code). Since the renewal function is the expected number of events, for large t it is approximately equal to t/mu where mu is the mean time between events; this is a handy way to know if we're anywhere in the neighborhood.
I was working with Maxima (http://maxima.sourceforge.net), which is not efficient for numerical stuff, but which makes it very easy to experiment with different aspects. At this point it would be straightforward to port the final, numerical stuff to another language such as Python.
Thanks to OP for suggesting the problem, and S. Pappadeux for insightful discussions. Here is the plot I got comparing the discretized approximation (red) with the approximation for large t (blue). Trying some examples with different step sizes, I saw that the values tend to increase a little as step size gets smaller, so I think the red line is probably a little low, and the blue line might be more nearly correct.
Here is my Maxima code:
/* discretize weibull renewal function and formulate system of linear equations
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*
* This is a program for Maxima, a computer algebra system.
* http://maxima.sourceforge.net/
*/
"Definition of the renewal function m(t):" $
renewal_eq: m(t) = F(t) + 'integrate (m(t - s)*f(s), s, 0, t);
"Approximate integral equation with rectangle rule:" $
discretize_renewal (delta_t, k) :=
if equal(k, 0)
then m(0) = F(0)
else m(k*delta_t) = F(k*delta_t)
+ m(k*delta_t)*f(0)*(delta_t / 2)
+ sum (m((k - j)*delta_t)*f(j*delta_t)*delta_t, j, 1, k - 1)
+ m(0)*f(k*delta_t)*(delta_t / 2);
make_eqs (n, delta_t) :=
makelist (discretize_renewal (delta_t, k), k, 0, n);
make_vars (n, delta_t) :=
makelist (m(k*delta_t), k, 0, n);
"Discretized integral equation and variables for n = 4, delta_t = 1/2:" $
make_eqs (4, 1/2);
make_vars (4, 1/2);
make_eqs_vars (n, delta_t) :=
[make_eqs (n, delta_t), make_vars (n, delta_t)];
load (distrib);
subst_pdf_cdf (shape, scale, e) :=
subst ([f = lambda ([x], pdf_weibull (x, shape, scale)), F = lambda ([x], cdf_weibull (x, shape, scale))], e);
matrix_from (eqs, vars) :=
(augcoefmatrix (eqs, vars),
[submatrix (%%, length(%%) + 1), - col (%%, length(%%) + 1)]);
"Subsitute Weibull pdf and cdf for shape = 2 into discretized equation:" $
apply (matrix_from, make_eqs_vars (4, 1/2));
subst_pdf_cdf (2, 1, %);
"Just the right-hand side matrix:" $
rhs_matrix_from (eqs, vars) :=
(map (rhs, eqs),
augcoefmatrix (%%, vars),
[submatrix (%%, length(%%) + 1), col (%%, length(%%) + 1)]);
"Generate the right-hand side matrix, instead of extracting it from equations:" $
generate_rhs_matrix (n, delta_t) :=
[delta_t * genmatrix (lambda ([i, j], if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))];
"Generate numerical right-hand side matrix, skipping over formulas:" $
generate_rhs_matrix_numerical (shape, scale, n, delta_t) :=
block ([f, F, numer: true], local (f, F),
f: lambda ([x], pdf_weibull (x, shape, scale)),
F: lambda ([x], cdf_weibull (x, shape, scale)),
[genmatrix (lambda ([i, j], delta_t * if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))]);
"Solve approximate integral equation (shape = 3, t = 1) via LU decomposition:" $
fpprintprec: 4 $
n: 20 $
t: 1;
[AA, bb]: generate_rhs_matrix_numerical (3, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Iterative solution of approximate integral equation (shape = 3, t = 1):" $
xx: bb;
for i thru 10 do xx: AA . xx + bb;
xx - (AA.xx + bb);
xx_iterative: xx;
"Should find iterative and LU give same result:" $
xx_diff: xx_iterative - xx_by_lu[1];
sqrt (transpose(xx_diff) . xx_diff);
"Try shape = 2, t = 10:" $
n: 100 $
t: 10 $
[AA, bb]: generate_rhs_matrix_numerical (2, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Baxter, et al., Eq. 3 (for large values of t) compared to discretization:" $
/* L.A. Baxter, E.M. Scheuer, D.J. McConalogue, W.R. Blischke.
* "On the Tabulation of the Renewal Function,"
* Econometrics, vol. 24, no. 2 (May 1982).
* H(t) is their notation for the renewal function.
*/
H(t) := t/mu + sigma^2/(2*mu^2) - 1/2;
tx_points: makelist ([float (k/n*t), xx_by_lu[1][k, 1]], k, 1, n);
plot2d ([H(u), [discrete, tx_points]], [u, 0, t]), mu = mean_weibull(2, 1), sigma = std_weibull(2, 1);
https://en.wikipedia.org/wiki/Superellipse
I have read the SO questions on how to point-pick from a circle and an ellipse.
How would one uniformly select random points from the interior of a super-ellipse?
More generally, how would one uniformly select random points from the interior of the curve described by an arbitrary super-formula?
https://en.wikipedia.org/wiki/Superformula
The discarding method is not considered a solution, as it is mathematically unenlightening.
In order to sample the superellipse, let's assume without loss of generality that a = b = 1. The general case can be then obtained by rescaling the corresponding axis.
The points in the first quadrant (positive x-coordinate and positive y-coordinate) can be then parametrized as:
x = r * ( cos(t) )^(2/n)
y = r * ( sin(t) )^(2/n)
with 0 <= r <= 1 and 0 <= t <= pi/2:
Now, we need to sample in r, t so that the sampling transformed into x, y is uniform. To this end, let's calculate the Jacobian of this transform:
dx*dy = (2/n) * r * (sin(2*t)/2)^(2/n - 1) dr*dt
= (1/n) * d(r^2) * d(f(t))
Here, we see that as for the variable r, it is sufficient to sample uniformly the value of r^2 and then transform back with a square root. The dependency on t is a bit more complicated. However, with some effort, one gets
f(t) = -(n/2) * 2F1(1/n, (n-1)/n, 1 + 1/n, cos(t)^2) * cos(t)^(2/n)
where 2F1 is the hypergeometric function.
In order to obtain uniform sampling in x,y, we need now to sample uniformly the range of f(t) for t in [0, pi/2] and then find the t which corresponds to this sampled value, i.e., to solve for t the equation u = f(t) where u is a uniform random variable sampled from [f(0), f(pi/2)]. This is essentially the same method as for r, nevertheless in that case one can calculate the inverse directly.
One small issue with this approach is that the function f is not that well-behaved near zero - the infinite slope makes it quite challenging to find a root of u = f(t). To circumvent this, we can sample only the "upper part" of the first quadrant (i.e., area between lines x=y and x=0) and then obtain all the other points by symmetry (not only in the first quadrant but also for all the other ones).
An implementation of this method in Python could look like:
import numpy as np
from numpy.random import uniform, randint, seed
from scipy.optimize import brenth, ridder, bisect, newton
from scipy.special import gamma, hyp2f1
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
seed(100)
def superellipse_area(n):
#https://en.wikipedia.org/wiki/Superellipse#Mathematical_properties
inv_n = 1. / n
return 4 * ( gamma(1 + inv_n)**2 ) / gamma(1 + 2*inv_n)
def sample_superellipse(n, num_of_points = 2000):
def f(n, x):
inv_n = 1. / n
return -(n/2)*hyp2f1(inv_n, 1 - inv_n, 1 + inv_n, x)*(x**inv_n)
lb = f(n, 0.5)
ub = f(n, 0.0)
points = [None for idx in range(num_of_points)]
for idx in range(num_of_points):
r = np.sqrt(uniform())
v = uniform(lb, ub)
w = bisect(lambda w: f(n, w**n) - v, 0.0, 0.5**(1/n))
z = w**n
x = r * z**(1/n)
y = r * (1 - z)**(1/n)
if uniform(-1, 1) < 0:
y, x = x, y
x = (2*randint(0, 2) - 1)*x
y = (2*randint(0, 2) - 1)*y
points[idx] = [x, y]
return points
def plot_superellipse(ax, n, points):
coords_x = [p[0] for p in points]
coords_y = [p[1] for p in points]
ax.set_xlim(-1.25, 1.25)
ax.set_ylim(-1.25, 1.25)
ax.text(-1.1, 1, '{n:.1f}'.format(n = n), fontsize = 12)
ax.scatter(coords_x, coords_y, s = 0.6)
params = np.array([[0.5, 1], [2, 4]])
fig = plt.figure(figsize = (6, 6))
gs = gridspec.GridSpec(*params.shape, wspace = 1/32., hspace = 1/32.)
n_rows, n_cols = params.shape
for i in range(n_rows):
for j in range(n_cols):
n = params[i, j]
ax = plt.subplot(gs[i, j])
if i == n_rows-1:
ax.set_xticks([-1, 0, 1])
else:
ax.set_xticks([])
if j == 0:
ax.set_yticks([-1, 0, 1])
else:
ax.set_yticks([])
#ensure that the ellipses have similar point density
num_of_points = int(superellipse_area(n) / superellipse_area(2) * 4000)
points = sample_superellipse(n, num_of_points)
plot_superellipse(ax, n, points)
fig.savefig('fig.png')
This produces:
I'm looking at using Sobol sequences as a variance reduction technique to speed up a simulation. However, the convergence of the QMC method seems very weak.
Does anyone know how or why this is the case? Is the advantage of QMC lost on these high dimensional cases? I get similar results for the Brownian bridge.
Code below attempts to simulate in Julia the following
$$ \mathbb{E} \int B_t dt $$
where $B_t$ is a standard Brownian motion. The true value is $\frac{1}{2}T^2$.
using PyPlot
import Distributions: Normal
import PyPlot: plt
using Sobol
function pseudo_path_variation(dt, T, M)
# The pseudo random approach
N = round(Int, T/dt);
Z = zeros(N, M);
d = Normal(0, 1.0);
sum = 0.0;
for j in 1:M, i in 1:N-1
Z[i+1, j] = Z[i, j] + sqrt(dt)*rand(d, 1)[1];
end
# Calculate sum
return sumabs2(Z)*dt/M;
end
function quasi_path_variation(dt, T, M)
# An attempt at the above using a N dimensional sobol sequence
N = round(Int, T/dt);
Z = zeros(N, M);
d = Normal(0, 1.0);
sum = 0.0;
# The N dimensional sobol sequence
s = SobolSeq(N);
# Burn in the sequence
for i in 1:10
next(s);
end
for j in 1:M
B = next(s);
for i in 1:N-1
Z[i+1, j] = Z[i, j] + sqrt(dt)*quantile(d, B[i]);
end
end
# Calculate sum
return sumabs2(Z)*dt/M;
end
dt = 0.5;
T = 10;
M = 1000;
estims = zeros(M);
for N = 1:M-1
estims[N+1] = quasi_path_variation(dt, T, N)
end
p = plot(linspace(0, M, M), estims);
estims = zeros(M);
for N = 1:M-1
estims[N+1] = pseudo_path_variation(dt, T, N)
end
p = plot(linspace(0, M, M), estims);
This is a comment, not an answer, but my SO score is not high enough for comments.
Timing with tic()..toc() - completed in 3 seconds.
tic()
dt = 0.5;
T = 10;
...
...
p = plot(linspace(0, M, M), estims);
toc()
elapsed time: 2.928828918 seconds
I am trying to estimate the following model
where I provide uniform priors and I code the likelihood . The latter comes from this paper and goes as follows:
In the theano/pymc3 implementation I am computing the first and second term on the rhs in first_term and second_term. Finally logp sum over the entire sample.
Theano, on his own is producing some output yet when I integrate it in a pymc3 model the following error appears:
TypeError: ('Bad input argument to theano function with name "<ipython-input-90-a5304bf41c50>:27" at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
I think the problem is how to supply pymc3 variables to theano.
from pymc3 import Model, Uniform, DensityDist
import theano.tensor as T
import theano
import numpy as np
p_test, theta_test = .1, .1
X = np.asarray([[1,2,3],[1,2,3]])
### theano test
theano.config.compute_test_value = 'off'
obss = T.matrix('obss')
p, theta = T.scalar('p'), T.scalar('theta')
def first_term(obs, p, theta):
x, tx, n = obs[0], obs[1], obs[2]
first_comp = p ** x * (1 - p) ** (n - x) * (1 - theta) ** n
return(first_comp)
def second_term(obs, p, theta):
x, tx, n = obs[0], obs[1], obs[2]
components, updates = theano.scan(
lambda t, p, theta, x, tx: p ** x * (1 - theta) ** (tx-x+t) * theta * (1 - theta) ** (tx + t),
sequences=theano.tensor.arange(n), non_sequences = [p, theta, x, tx]
)
return(components)
def logp(X, p_hat, theta_hat):
contributions, updates = theano.scan(lambda obs, p, theta: first_term(obs, p, theta) + T.sum( second_term(obs, p, theta) ) ,
sequences = obss, non_sequences = [p, theta]
)
ll = contributions.sum()
get_ll = theano.function(inputs = [obss, p, theta], outputs = ll)
return(get_ll(X, p_hat , theta_hat))
print( logp( X, p_test, theta_test ) ) # It works!
### pymc3 implementation
with Model() as bg_model:
p = Uniform('p', lower = 0, upper = 1)
theta = Uniform('theta', lower = 0, upper = .2)
def first_term(obs, p, theta):
x, tx, n = obs[0], obs[1], obs[2]
first_comp = p ** x * (1 - p) ** (n - x) * (1 - theta) ** n
return(first_comp)
def second_term(obs, p, theta):
x, tx, n = obs[0], obs[1], obs[2]
components, updates = theano.scan(
lambda t, p, theta, x, tx: p ** x * (1 - theta) ** (tx-x+t) * theta * (1 - theta) ** (tx + t),
sequences=theano.tensor.arange(n), non_sequences = [p, theta, x, tx]
)
return(components)
def logp(X):
contributions, updates = theano.scan(lambda obs, p, theta: first_term(obs, p, theta) + T.sum( second_term(obs, p, theta) ) ,
sequences = obss, non_sequences = [p, theta]
)
ll = contributions.sum()
get_ll = theano.function(inputs = [obss, p, theta], outputs = ll)
return(get_ll(X, p, theta))
y = pymc3.DensityDist('y', logp, observed = X) # Nx4 y = f(f,x,tx,n | p, theta)
My first guess was modifying logp with return(get_ll(X, p.eval(), theta.eval())) but then theano complains about some mysterious p_interval missing from the graph. Any clue?
I figured it out by: i) simplifying things ii) avoiding using theano operators when coding the likelihood and iii) using the build-in wrapper Deterministic for deterministic transformations of variables (my lifetime). To speed up computation I vectorised the likelihood by writing the second term on the rhs as solution of a geometric series. Here is the code in case someone want to test it on his own lifetime application.
from pymc3 import Model, Uniform, Deterministic
import pymc3
from scipy import optimize
import theano.tensor as T
X = array([[ 5, 64, 8, 13],[ 4, 71, 23, 41],[ 7, 70, 4, 19])
#f, n, x, tx where f stands for the frequency of the triple (n, x, tx)
class CustomDist(pymc3.distributions.Discrete):
def __init__(self, p, theta, *args, **kwargs):
super(CustomDist, self).__init__(*args, **kwargs)
self.p = p
self.theta = theta
def logp(self, X):
p = self.theta
theta = self.theta
f, n, x, tx = X[0],(X[1] + 1),X[2],(X[3] + 1) #time indexed at 0, x > n
ll = f * T.log( p ** x * (1 - p) ** (n - x) * (1 - theta) ** n +
[(1 - p) ** (tx - x) * (1 - theta) ** tx - (1 - p) ** (n - x) * (1 - theta) ** n] / (1 - (1 - p) * (1 - theta)) )
# eliminated -1 since would result in negatice ll
return(T.sum( ll ))
with Model() as bg_model:
p = Uniform('p', lower = 0, upper = 1)
theta = Uniform('theta', lower = 0, upper = 1)
like = CustomDist('like', p = p, theta = theta, observed=X.T) #likelihood
lt = pymc3.Deterministic('lt', p / theta)
# start = {'p':.5, 'theta':.1}
start = pymc3.find_MAP(fmin=optimize.fmin_powell)
step = pymc3.Slice([p, theta, lt])
trace = pymc3.sample(5000, start = start, step = [step])
pymc3.traceplot(trace[2000:])
There is no enough documentation and my math knowledge is limited.
The model
sol = solvers.qp(P=P, q=q,G=G,h=h, A=L, b=t)
pcost dcost gap pres dres
0: 6.3316e+08 6.3316e+08 7e+00 3e+00 7e-10
1: 6.3316e+08 6.3316e+08 2e+00 1e+00 3e-10
2: 6.3316e+08 6.3316e+08 2e+00 1e+00 3e-10
3: 1.3393e+10 -3.5020e+09 2e+10 9e-01 2e-10
4: 7.6898e+08 -1.7925e+10 4e+10 8e-01 2e-10
5: 2.1728e+09 -3.8363e+09 6e+09 6e-16 2e-14
6: 2.1234e+09 2.0310e+09 9e+07 2e-16 3e-16
7: 2.0908e+09 2.0899e+09 1e+06 1e-18 2e-16
8: 2.0905e+09 2.0905e+09 1e+04 7e-21 9e-17
9: 2.0905e+09 2.0905e+09 1e+02 2e-16 3e-16
Optimal solution found.
The result
{'dual infeasibility': 2.538901219845834e-16,
'dual objective': 2090476416.743256,
'dual slack': 59.256743485146764,
'gap': 95.35084344459145,
'iterations': 9,
'primal infeasibility': 2.220446049250331e-16,
'primal objective': 2090476512.0941,
'primal slack': 1.0136346108956689e-08,
'relative gap': 4.561201584523881e-08,
's': <2x1 matrix, tc='d'>,
'status': 'optimal',
'x': <2x1 matrix, tc='d'>,
'y': <1x1 matrix, tc='d'>,
'z': <2x1 matrix, tc='d'>}
How can I inteprert s,x,z which one is the result of my variable ?
If I print s and x, I always get something near to 1 and another near to 0, which seems to be infinitely incorrect.
>>> np.array(sol['x'])
>>> array([[ 9.99999990e-01],[ 1.01363461e-08]])
I think this short text from Stanford might be helpful for you. Highlighting the quoted section about the result:
Many properties about the solution can be extracted from the sol variable >
(dictionary).
In particular, the ‘status’, ‘x’, and ‘primal objective’ are probably the most >important. If status is optimal, then the latter two give the optimal solution and >its objective
value. You can access these fields as you would with a regular Python dictionary:
Also, this documentation provides a in-depth explanation. For understanding the result, please go to page 69
#full explain here
https://blog.dominodatalab.com/fitting-support-vector-machines-quadratic-programming
#full example here
import numpy as np
import pandas as pd
import cvxopt
import matplotlib
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
cvxopt.solvers.options['show_progress'] = True
if __name__ == '__main__':
x_min = 0
x_max = 5.5
y_min = 0
y_max = 2
iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']], columns=iris['feature_names'] + ['target'])
#Retain only 2 linearly separable classes
iris_df = iris_df[iris_df["target"].isin([0,1])]
iris_df["target"] = iris_df[["target"]].replace(0,-1)
#Select only 2 attributes
iris_df = iris_df[["petal length (cm)", "petal width (cm)", "target"]]
print(iris_df.head())
print()
X = iris_df[["petal length (cm)", "petal width (cm)"]].to_numpy()
y = iris_df[["target"]].to_numpy()
n = X.shape[0]
H = np.dot(y * X, (y * X).T)
q = np.repeat([-1.0], n)[..., None]
A = y.reshape(1, -1)
b = 0.0
G = np.negative(np.eye(n))
h = np.zeros(n)
P = cvxopt.matrix(H)
q = cvxopt.matrix(q)
G = cvxopt.matrix(G)
h = cvxopt.matrix(h)
A = cvxopt.matrix(A)
b = cvxopt.matrix(b)
solved = cvxopt.solvers.qp(P, q, G, h, A, b)
print()
alphas = np.array(solved["x"])
w = np.dot((y * alphas).T, X)[0]
S = (alphas > 1e-5).flatten()
b = np.mean(y[S] - np.dot(X[S], w.reshape(-1, 1)))
print("W:", w)
print("b:", b)
xx = np.linspace(x_min, x_max)
a = -w[0] / w[1]
yy = a * xx - (b) / w[1]
margin = 1 / np.sqrt(np.sum(w ** 2))
yy_neg = yy - np.sqrt(1 + a ** 2) * margin
yy_pos = yy + np.sqrt(1 + a ** 2) * margin
plt.figure(figsize=(8, 8))
plt.plot(xx, yy, "b-")
plt.plot(xx, yy_neg, "m--")
plt.plot(xx, yy_pos, "m--")
colors = ["steelblue", "orange"]
plt.scatter(X[:, 0],
X[:, 1],
c=y.ravel(),
alpha=0.5,
cmap=matplotlib.colors.ListedColormap(colors),
edgecolors="black")
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.title('svm line')
plt.show()