Related
I'm trying to solve this in LPSolve IDE:
/* Objective function */
min: x + y;
/* Variable bounds */
r_1: 2x = 2y;
r_2: x + y = 1.11 x y;
r_3: x >= 1;
r_4: y >= 1;
but the response I get is:
Model name: 'LPSolver' - run #1
Objective: Minimize(R0)
SUBMITTED
Model size: 4 constraints, 2 variables, 5 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
The model is INFEASIBLE
lp_solve unsuccessful after 2 iter and a last best value of 1e+030
How come this can happen when x=1.801801802 and y=1.801801802 are possible solutions here?
How To Find The Solution
Let's do some math.
Your problem is:
min x+y
s.t. 2x = 2y
x + y = 1.11 x y
x >= 1
y >= 1
The first constraint 2x = 2y can be simplified to x=y. We now substitute throughout the problem:
min 2*x
s.t. 2*x = 1.11 x^2
x >= 1
And rearrange:
min 2*x
s.t. 1.11 x^2-2*x=0
x >= 1
From geometry we know that 1.11 x^2-2*x makes an upward-opening parabola with a minimum less than zero. Therefore, there are exactly two points. These are given by the quadratic equation: 200/111 and 0.
Only one of these satisfies the second constraint: 200/111.
Why Can't I Find This Constraint With My Solver
The easy way out is to say it's because the x^2 term (x*y before the substitution is nonlinear). But it goes a little deeper than that. Nonlinear problems can be easy to solve as long as they are convex. A convex problem is one whose constraints form a single, contiguous space such that any line drawn between two points in the space stays within the boundaries of the space.
Your problem is not convex. The constraint 1.11 x^2-2*x=0 defines an infinite number of points. No two of these points can be connected by a straight line which stays in the space defined by the constraint because that space is curved. If the constraint were instead 1.11 x^2-2*x<=0 then the space would be convex because all points could be connected with straight lines that stay in its interior.
Nonconvex problems are part of a broader class of problems called NP-Hard. This means that there is not (and perhaps cannot) be any easy way of solving the problem. We have to be smart.
Solvers that can handle mixed-integer programming (MIP/MILP) can solve many non-convex problems efficiently, as can other techniques such as genetic algorithms. But, beneath the hood, these techniques all rely on glorified guess-and-check.
So your solver fails because the problem is nonconvex and your solver is neither smart enough to use MIP to guess-and-check its way to a solution nor smart enough to use the quadratic equation.
How Then Can I Solve The Problem?
In this particular instance, we are able to use mathematics to quickly find a solution because, although the problem is nonconvex, it is part of a class of special cases. Deep thinking by mathematicians has given us a simple way of handling this class.
But consider a few generalizations of the problem:
(a) a x^3+b x^2+c x+d=0
(b) a x^4+b x^3+c x^2+d x+e =0
(c) a x^5+b x^4+c x^3+d x^2+e x+f=0
(a) has three potential solutions which must be checked (exact solutions are tricky), (b) has four (trickier), and (c) has five. The formulas for (a) and (b) are much more complex than the quadratic formula and mathematicians have shown that there is no formula for (c) that can be expressed using "elementary operations". Instead, we have to resort to glorified guess-and-check.
So the techniques we used to solve your problem don't generalize very well. This is what it means to live in the realm of the nonconvex and NP-hard, and it's a good reason to fund research in mathematics, computer science, and related fields.
I am trying to solve the following inequality constraint:
Given time-series data for N stocks, I am trying to construct a portfolio weight vector to minimize the variance of the returns.
the objective function:
min w^{T}\sum w
s.t. e_{n}^{T}w=1
\left \| w \right \|\leq C
where w is the vector of weights, \sum is the covariance matrix, e_{n}^{T} is a vector of ones, C is a constant. Where the second constraint (\left \| w \right \|) is an inequality constraint (2-norm of the weights).
I tried using the nloptr() function but it gives me an error: Incorrect algorithm supplied. I'm not sure how to select the correct algorithm and I'm also not sure if this is the right method of solving this inequality constraint.
I am also open to using other functions as long as they solve this constraint.
Here is my attempted solution:
data <- replicate(4,rnorm(100))
N <- 4
fn<-function(x) {cov.Rt<-cov(data); return(as.numeric(t(x) %*%cov.Rt%*%x))}
eqn<-function(x){one.vec<-matrix(1,ncol=N,nrow=1); return(-1+as.numeric(one.vec%*%x))}
C <- 1.5
ineq<-function(x){
z1<- t(x) %*% x
return(as.numeric(z1-C))
}
uh <-rep(C^2,N)
lb<- rep(0,N)
x0 <- rep(1,N)
local_opts <- list("algorithm"="NLOPT_LN_AUGLAG,",xtol_rel=1.0e-7)
opts <- list("algorithm"="NLOPT_LN_AUGLAG,",
"xtol_rel"=1.0e-8,local_opts=local_opts)
sol1<-nloptr(x0,eval_f=fn,eval_g_eq=eqn, eval_g_ineq=ineq,ub=uh,lb=lb,opts=opts)
This looks like a simple QP (Quadratic Programming) problem. It may be easier to use a QP solver instead of a general purpose NLP (NonLinear Programming) solver (no need for derivatives, functions etc.). R has a QP solver called quadprog. It is not totally trivial to setup a problem for quadprog, but here is a very similar portfolio example with complete R code to show how to solve this. It has the same objective (minimize risk), the same budget constraint and the lower and upper-bounds. The example just has an extra constraint that specifies a minimum required portfolio return.
Actually I misread the question: the second constraint is ||x|| <= C. I think we can express the whole model as:
This actually looks like a convex model. I could solve it with "big" solvers like Cplex,Gurobi and Mosek. These solvers support convex Quadratically Constrained problems. I also believe this can be formulated as a cone programming problem, opening up more possibilities.
Here is an example where I use package cccp in R. cccp stands for
Cone Constrained Convex Problems and is a port of CVXOPT.
The 2-norm of weights doesn't make sense. It has to be the 1-norm. This is essentially a constraint on the leverage of the portfolio. 1-norm(w) <= 1.6 implies that the portfolio is at most 130/30 (Sorry for using finance language here). You want to read about quadratic cones though. w'COV w = w'L'Lw (Cholesky decomp) and hence w'Cov w = 2-Norm (Lw)^2. Hence you can introduce the linear constraint y - Lw = 0 and t >= 2-Norm(Lw) [This defines a quadratic cone). Now you minimize t. The 1-norm can also be replaced by cones as abs(x_i) = sqrt(x_i^2) = 2-norm(x_i). So introduce a quadratic cone for each element of the vector x.
I would like to build a simple structure from motion program according to Tomasi and Kanade [1992]. The article can be found below:
https://people.eecs.berkeley.edu/~yang/courses/cs294-6/papers/TomasiC_Shape%20and%20motion%20from%20image%20streams%20under%20orthography.pdf
This method seems elegant and simple, however, I am having trouble calculating the metric constraints outlined in equation 16 of the above reference.
I am using R and have outlined my work thus far below:
Given a set of images
I want to track the corners of the three cabinet doors and the one picture (black points on images). First we read in the points as a matrix w where
Ultimately, we want to factorize w into a rotation matrix R and shape matrix S that describe the 3 dimensional points. I will spare as many details as I can but a complete description of the maths can be gleaned from the Tomasi and Kanade [1992] paper.
I supply w below:
w.vector=c(0.2076,0.1369,0.1918,0.1862,0.1741,0.1434,0.176,0.1723,0.2047,0.233,0.3593,0.3668,0.3744,0.3593,0.3876,0.3574,0.3639,0.3062,0.3295,0.3267,0.3128,0.2811,0.2979,0.2876,0.2782,0.2876,0.3838,0.3819,0.3819,0.3649,0.3913,0.3555,0.3593,0.2997,0.3202,0.3137,0.31,0.2718,0.2895,0.2867,0.825,0.7703,0.742,0.7251,0.7232,0.7138,0.7345,0.6911,0.1937,0.1248,0.1723,0.1741,0.1657,0.1313,0.162,0.1657,0.8834,0.8118,0.7552,0.727,0.7364,0.7232,0.7288,0.6892,0.4309,0.3798,0.4021,0.3965,0.3844,0.3546,0.3695,0.3583,0.314,0.3065,0.3989,0.3876,0.3857,0.3781,0.3989,0.3593,0.5184,0.4849,0.5147,0.5193,0.5109,0.4812,0.4979,0.4849,0.3536,0.3517,0.4121,0.3951,0.3951,0.3781,0.397,0.348,0.5175,0.484,0.5091,0.5147,0.5128,0.4784,0.4905,0.4821,0.7722,0.7326,0.7326,0.7232,0.7232,0.7119,0.7402,0.7006,0.4281,0.3779,0.3918,0.3863,0.3825,0.3472,0.3611,0.3537,0.8043,0.7628,0.7458,0.7288,0.727,0.7213,0.7364,0.6949,0.5789,0.5491,0.5761,0.5817,0.5733,0.5444,0.5537,0.5379,0.3649,0.3536,0.4177,0.3951,0.3857,0.3819,0.397,0.3461,0.697,0.671,0.6821,0.6821,0.6719,0.6412,0.6468,0.6235,0.3744,0.3649,0.4159,0.3819,0.3781,0.3612,0.3763,0.314,0.7008,0.6691,0.6794,0.6812,0.6747,0.6393,0.6412,0.6235,0.7571,0.7345,0.7439,0.7496,0.7402,0.742,0.7647,0.7213,0.5817,0.5463,0.5696,0.5779,0.5761,0.5398,0.551,0.5398,0.7665,0.7326,0.7439,0.7345,0.7288,0.727,0.7515,0.7062,0.8301,0.818,0.8571,0.8878,0.8766,0.8561,0.858,0.8394,0.4121,0.3876,0.4347,0.397,0.38,0.3631,0.3668,0.2971,0.912,0.8962,0.9185,0.939,0.9259,0.898,0.8887,0.8571,0.3989,0.3781,0.4215,0.3725,0.3612,0.3461,0.3423,0.2782,0.9092,0.8952,0.9176,0.9399,0.925,0.8971,0.8887,0.8571,0.4743,0.4536,0.4894,0.4517,0.446,0.4328,0.4385,0.3706,0.8273,0.8171,0.8571,0.8878,0.8766,0.8543,0.8561,0.8394,0.4743,0.4554,0.4969,0.4668,0.4536,0.4404,0.4536,0.3857)
w=matrix(w.vector,ncol=16,nrow=16,byrow=FALSE)
Then create registered measurement matrix wm according to equation 2 as
by
wm = w - rowMeans(w)
We can decompose wm into a '2FxP' matrix o1 a diagonal 'PxP' matrix e and 'PxP' matrix o2 by using a singular value decomposition.
svdwm <- svd(wm)
o1 <- svdwm$u
e <- diag(svdwm$d)
o2 <- t(svdwm$v) ## dont forget the transpose!
However, because of noise, we only pay attention to the first 3 columns of o1, first 3 values of e and the first 3 rows of o2 by:
o1p <- svdwm$u[,1:3]
ep <- diag(svdwm$d[1:3])
o2p <- t(svdwm$v)[1:3,] ## dont forget the transpose!
Now we can solve for our rhat and shat in equation (14)
by
rhat <- o1p%*%ep^(1/2)
shat <- ep^(1/2) %*% o2p
However, these results are not unique and we still need to solve for R and S by equation (15)
by using the metric constraints of equation (16)
Now I need to find Q. I believe there are two potential methods but am unclear how to employ either.
Method 1 involves solving for B where B=Q%*%solve(Q) then using Cholesky decomposition to find Q. Method 1 appears to be the common choice in literature, however, little detail is given as to how to actually solve the linear system. It is apparent that B is a '3x3' symmetric matrix of 6 unknowns. However, given the metric constraints (equations 16), I don't know how to solve for 6 unknowns given 3 equations. Am I forgetting a property of symmetric matrices?
Method II involves using non-linear methods to estimate Q and is less commonly used in structure from motion literature.
Can anyone offer some advice as to how to go about solving this problem? Thanks in advance and let me know if I need to be more clear in my question.
can be written as .
can be written as .
can be written as .
so our equations are:
So the first equation can be written as:
which is equivalent to
To keep it short we define now:
(I know the spacings are terrably small, but yes, this is a Vector...)
So for all equations in all different Frames f, we can write one big equation:
(sorry for the ugly formulas...)
Now you just need to solve the -Matrix using Cholesky decomposition or whatever...
I've been searching for an answer for this question for quite a while, so I'm hoping someone can help me. I'm using dbscan from the fpc library in R. For example, I am looking at the USArrests data set and am using dbscan on it as follows:
library(fpc)
ds <- dbscan(USArrests,eps=20)
Choosing eps was merely by trial and error in this case. However I am wondering if there is a function or code available to automate the choice of the best eps/minpts. I know some books recommend producing a plot of the kth sorted distance to its nearest neighbour. That is, the x-axis represents "Points sorted according to distance to kth nearest neighbour" and the y-axis represents the "kth nearest neighbour distance".
This type of plot is useful for helping choose an appropriate value for eps and minpts. I hope I have provided enough information for someone to be help me out. I wanted to post a pic of what I meant however I'm still a newbie so can't post an image just yet.
There is no general way of choosing minPts. It depends on what you want to find. A low minPts means it will build more clusters from noise, so don't choose it too small.
For epsilon, there are various aspects. It again boils down to choosing whatever works on this data set and this minPts and this distance function and this normalization. You can try to do a knn distance histogram and choose a "knee" there, but there might be no visible one, or multiple.
OPTICS is a successor to DBSCAN that does not need the epsilon parameter (except for performance reasons with index support, see Wikipedia). It's much nicer, but I believe it is a pain to implement in R, because it needs advanced data structures (ideally, a data index tree for acceleration and an updatable heap for the priority queue), and R is all about matrix operations.
Naively, one can imagine OPTICS as doing all values of Epsilon at the same time, and putting the results in a cluster hierarchy.
The first thing you need to check however - pretty much independent of whatever clustering algorithm you are going to use - is to make sure you have a useful distance function and appropriate data normalization. If your distance degenerates, no clustering algorithm will work.
MinPts
As Anony-Mousse explained, 'A low minPts means it will build more clusters from noise, so don't choose it too small.'.
minPts is best set by a domain expert who understands the data well. Unfortunately many cases we don't know the domain knowledge, especially after data is normalized. One heuristic approach is use ln(n), where n is the total number of points to be clustered.
epsilon
There are several ways to determine it:
1) k-distance plot
In a clustering with minPts = k, we expect that core pints and border points' k-distance are within a certain range, while noise points can have much greater k-distance, thus we can observe a knee point in the k-distance plot. However, sometimes there may be no obvious knee, or there can be multiple knees, which makes it hard to decide
2) DBSCAN extensions like OPTICS
OPTICS produce hierarchical clusters, we can extract significant flat clusters from the hierarchical clusters by visual inspection, OPTICS implementation is available in Python module pyclustering. One of the original author of DBSCAN and OPTICS also proposed an automatic way to extract flat clusters, where no human intervention is required, for more information you can read this paper.
3) sensitivity analysis
Basically we want to chose a radius that is able to cluster more truly regular points (points that are similar to other points), while at the same time detect out more noise (outlier points). We can draw a percentage of regular points (points belong to a cluster) VS. epsilon analysis, where we set different epsilon values as the x-axis, and their corresponding percentage of regular points as the y axis, and hopefully we can spot a segment where the percentage of regular points value is more sensitive to the epsilon value, and we choose the upper bound epsilon value as our optimal parameter.
One common and popular way of managing the epsilon parameter of DBSCAN is to compute a k-distance plot of your dataset. Basically, you compute the k-nearest neighbors (k-NN) for each data point to understand what is the density distribution of your data, for different k. the KNN is handy because it is a non-parametric method. Once you choose a minPTS (which strongly depends on your data), you fix k to that value. Then you use as epsilon the k-distance corresponding to the area of the k-distance plot (for your fixed k) with a low slope.
For details on choosing parameters, see the paper below on p. 11:
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3), 19.
For two-dimensional data: use default value of minPts=4 (Ester et al., 1996)
For more than 2 dimensions: minPts=2*dim (Sander et al., 1998)
Once you know which MinPts to choose, you can determine Epsilon:
Plot the k-distances with k=minPts (Ester et al., 1996)
Find the 'elbow' in the graph--> The k-distance value is your Epsilon value.
If you have the resources, you can also test a bunch of epsilon and minPts values and see what works. I do this using expand.grid and mapply.
# Establish search parameters.
k <- c(25, 50, 100, 200, 500, 1000)
eps <- c(0.001, 0.01, 0.02, 0.05, 0.1, 0.2)
# Perform grid search.
grid <- expand.grid(k = k, eps = eps)
results <- mapply(grid$k, grid$eps, FUN = function(k, eps) {
cluster <- dbscan(data, minPts = k, eps = eps)$cluster
sum <- table(cluster)
cat(c("k =", k, "; eps =", eps, ";", sum, "\n"))
})
See this webpage, section 5: http://www.sthda.com/english/wiki/dbscan-density-based-clustering-for-discovering-clusters-in-large-datasets-with-noise-unsupervised-machine-learning
It gives detailed instructions on how to find epsilon. MinPts ... not so much.
Gaffer on Games has a great article about using RK4 integration for better game physics. The implementation is straightforward, but the math behind it confuses me. I understand derivatives and integrals on a conceptual level, but haven't manipulated equations in a long while.
Here's the brunt of Gaffer's implementation:
void integrate(State &state, float t, float dt)
{
Derivative a = evaluate(state, t, 0.0f, Derivative());
Derivative b = evaluate(state, t+dt*0.5f, dt*0.5f, a);
Derivative c = evaluate(state, t+dt*0.5f, dt*0.5f, b);
Derivative d = evaluate(state, t+dt, dt, c);
const float dxdt = 1.0f/6.0f * (a.dx + 2.0f*(b.dx + c.dx) + d.dx);
const float dvdt = 1.0f/6.0f * (a.dv + 2.0f*(b.dv + c.dv) + d.dv)
state.x = state.x + dxdt * dt;
state.v = state.v + dvdt * dt;
}
Can anybody explain in simple terms how RK4 works? Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f? How is averaging derivatives up to the 4th order different from doing a simple euler integration with a smaller timestep?
After reading the accepted answer below, and several other articles, I have a grasp on how RK4 works. To answer my own questions:
Can anybody explain in simple terms how RK4 works?
RK4 takes advantage of the fact that
we can get a much better approximation
of a function if we use its
higher-order derivatives rather than
just the first or second derivative.
That's why the Taylor series
converges much faster than Euler
approximations. (take a look at the
animation on the right side of that
page)
Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f?
The Runge-Kutta method is an
approximation of a function that
samples derivatives of several points
within a timestep, unlike the Taylor
series which only samples derivatives
of a single point. After sampling
these derivatives we need to know how
to weigh each sample to get the
closest approximation possible. An
easy way to do this is to pick
constants that coincide with the
Taylor series, which is how the
constants of a Runge-Kutta equation
are determined.
This article made it clearer for
me. Notice how (15) is the Taylor
series expansion while (17) is the
Runge-Kutta derivation.
How is averaging derivatives up to the 4th order different from doing a simple euler integration with a smaller timestep?
Mathematically, it converges much
faster than doing many Euler
approximations. Of course, with enough
Euler approximations we can gain equal
accuracy to RK4, but the computational
power needed doesn't justify using
Euler.
This may be a bit oversimplified so far as actual math, but meant as an intuitive guide to Runge Kutta integration.
Given some quantity at some time t1, we want to know the quantity at another time t2. With a first-order differential equation, we can know the rate of change of that quantity at t1. There is nothing else we can know for sure; the rest is guessing.
Euler integration is the simplest way to guess: linearly extrapolate from t1 to t2, using the precisely known rate of change at t1. This usually gives a bad answer. If t2 is far from t1, this linear extrapolation will fail to match any curvature in the ideal answer. If we take many small steps from t1 to t2, we'll have the problem of subtraction of similar values. Roundoff errors will ruin the result.
So we refine our guess. One way is to go ahead and do this linear extrapolation anyway, then hoping it's not too far off from truth, use the differential equation to compute an estimate of the rate of change at t2. This, averaged with the (accurate) rate of change at t1, better represents the typical slope of the true answer between t1 and t2. We use this to make a fresh linear extrapolation from to t1 to t2. It's not obvious if we should take the simple average, or give more weight to the rate at t1, without doing the math to estimate errors, but there is a choice here. In any case, it's a better answer than Euler gives.
Perhaps better, make our initial linear extrapolation to a point in time midway between t1 and t2, and use the differential equation to compute the rate of change there. This gives roughly as good an answer as the average just described. Then use this for a linear extrapolation from t1 to t2, since our purpose it to find the quantity at t2. This is the midpoint algorithm.
You can imagine using the mid-point estimate of the rate of change to make another linear extrapolation of the quantity from t1 to the midpoint. With the differential equation we get an better estimate of the slope there. Using this, we end by extrapolating from t1 all the way to t2 where we want an answer. This is the Runge Kutta algorithm.
Could we do a third extrapolation to the midpoint? Sure, it's not illegal, but detailed analysis shows diminishing improvement, such that other sources of error dominate the final result.
Runge Kutta applies the differential equation to the intial point t1, twice to the midpoint, and once at the final point t2. The in-between points are a matter of choice. It is possible to use other points between t1 and t2 for making those improved estimates of the slope. For example, we could use t1, a point one third the way toward t2, another 2/3 the way toward t2, and at t2. The weights for the average of the four derivatives will be different. In practice this doesn't really help, but might have a place in testing since it ought to give the same answer but will provide a different set of round off errors.
As to your question why: I recall once writing a cloth simulator where the cloth was a series of springs interconnected at nodes. In the simulator, the force exerted by the spring is proportional to how far the spring is stretched. The force causes acceleration at the node, which causes velocity which moves the node which stretches the spring. There are two integrals (integrating acceleration to get velocity, and integrating velocity to get position) and if they are inaccurate, the errors snowball: Too much acceleration causes too much velocity which causes too much stretch which causes even more acceleration, making the whole system unstable.
It is difficult to explain without graphics, but I'll try: Say you have f(t), where f(0) = 10, f(1) = 20, and f(2) = 30.
A proper integration of f(t) over the interval 0 < t < 1 would give you the surface under the graph of f(t) over that interval.
The rectangle rule integration approximates that surface with a rectangle where the breadth is the delta in time and the length is the new value of f(t), so in the interval 0 < t < 1 , it will yield 20 * 1 = 20, and in the next interval 1
Now if you were to plot these points and draw a line through them you'll see that it is actually triangular, with a surface of 30 (units), and therefore the Euler integration is inadequate.
To get a more accurate estimation of the surface (integral) you can take smaller intervals of t, evaluating at for example f(0), f(0.5), f(1), f(1.5) and f(2).
If you're still following me, the RK4 method is then simply a way of estimating values of f(t) for t0 < t < t0+dt invented by people smarter than myself for getting accurate estimates of the integral.
(but as others have said, read the Wikipedia article for a more detailed explanation. RK4 is in the category of numerical integration)
RK4 in the simplest sense is making a approximation function that that is based on 4 derivatives and point for each time step: Your initial condition at starting point A, a first approximated slope B based on data point A at your time step/2 and the slope from A, a third approximation C , which has an correction value for the slope at B to reflect the shape changes of your function, and finally a final slope based on the corrected slope at point C.
So basically this method lets you calculate using a starting point, an averaged midpoint which has corrections built into both parts to adjust for the shape, and a doubly corrected endpoint. This makes the effective contribution from each data point 1/6 1/3 1/3 and 1/6, so most of your answer is based on your corrections for the shape of your function.
It turns out that the order of an RK approximation (Euler is considered an RK1) corresponds to how its accuracy scales with smaller time steps.
The relationship between RK1 approximations is linear, so for 10 times the precision you get roughly 10 times better convergence.
For RK4, 10 times the precision yields you about 10^4 times better convergence. So while your calcuation time increases linearly in RK4, it increases your accuracy polynomially.