I have a Mixed Integer Programming (MIP) problem, currently modelled with Python's PuLP library. My issue, however, is very generic; the syntax doesn't matter here.
I want to add a constraint to my model that works like this:
if b=1 then x=y
The variable b is a binary variable taking values 0 or 1. x and y are variables that represent the current stock level: x is a continuous variable, y an integer variable.
I know constraints can only be modelled in the following format:
a*x+c <= y # a, c are constants, x, y variables
I hope there is some workaround that lets me model the "if b then x equals y" constraint described above.
Here are my approaches so far:
b*y <= x
y >= x*b # works in theory, but multiplication of 2 variables is not allowed
For 2 binary variables x and y the following is true:
M*y > x # represents: if x then y (M is a sufficiently large constant)
I guess the solution involves a large M constant, and maybe even additional helper variables.
A little background: I want to model an inventory problem with continuous stock levels. However, order decisions should only be possible in integer numbers. I therefore need the stock level to be modelled as a float in general, but as an integer at the point of order (b == 1).
I hope someone can help here, even if this is more theoretical than directly coding-related. Hints to further resources that might help are also highly appreciated.
b=1 => x=y
can be modeled as:
y - M*(1 - b) <= x <= y + M*(1 - b)
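For instance, in PuLP this pair of constraints might look like the following sketch (the value of M and all variable bounds are assumptions for illustration; M just has to exceed any possible |x - y|):

from pulp import LpProblem, LpVariable, LpMinimize

prob = LpProblem("inventory", LpMinimize)
M = 1000  # assumed upper bound on the stock level; must exceed |x - y|
b = LpVariable("b", cat="Binary")                # order decision
x = LpVariable("x", lowBound=0)                  # continuous stock level
y = LpVariable("y", lowBound=0, cat="Integer")   # integer stock level

# both constraints are vacuous when b = 0; together they force x = y when b = 1
prob += x <= y + M * (1 - b)
prob += x >= y - M * (1 - b)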
ModelingToolkit.jl is such a great package that I frequently expect too much of it. For example, I often find myself with a model which boils down to the following:
@variables t x(t) y(t)
@parameters a b C
d = Differential(t)
eqs = [
d(x) ~ a * y - b * x,
d(y) ~ b * x - a * y,
0 ~ x + y - C
]
@named sys = ODESystem(eqs)
Now, I know that I could get this down to one equation by substitution of the 0 ~ x + y - C. But in reality my systems are much larger, less trivial and programmatically generated, so I would like ModelingToolkit.jl to do it for me.
I have tried using structural_simplify, but the extra equation gets in the way:
julia> structural_simplify(sys)
ERROR: ExtraEquationsSystemException: The system is unbalanced. There are 2 highest order derivative variables and 3 equations.
More equations than variables, here are the potential extra equation(s):
Then I found the tutorial on DAE index reduction, and thought that dae_index_lowering might work for me:
julia> dae_index_lowering(sys)
ERROR: maxiters=8000 reached! File a bug report if your system has a reasonable index (<100), and you are using the default `maxiters`. Try to increase the maxiters by `pantelides(sys::ODESystem; maxiters=1_000_000)` if your system has an incredibly high index and it is truly extremely large.
So the question is whether ModelingToolkit.jl currently has a feature which will do the transformation, or if a different approach is necessary?
The problem is that the system is unbalanced, i.e. there are more equations than there are states. In general it is impossible to prove that an overdetermined system of this sort is well-defined. Thus to solve it, you have to delete one of the equations. If you know the conservation law must hold true, then you can delete the second differential equation:
using ModelingToolkit
@variables t x(t) y(t)
@parameters a b C
d = Differential(t)
eqs = [
d(x) ~ a * y - b * x,
0 ~ x + y - C
]
@named sys = ODESystem(eqs)
simpsys = structural_simplify(sys)
And that will simplify down to a single equation. The problem is that, in general, it cannot prove that y(t) will still be the same after deleting that differential equation. In this specific case, maybe it could one day prove that the conservation law must hold given the differential equation system. But even if it could, the format would then be for you to give only the differential equations and let it remove equations by substituting proved conservation laws: so you would still give only two equations for the two-state system.
I initially posted this question on stats.stackexchange.com, but it was closed for being focused on programming. Hopefully I can get some help here.
I will not put many theoretical details here, to keep it simple, but my final goal is to implement a Hidden Markov Model using R.
Although I am fine with the theoretical model construction, when I tried to implement it I realized that I do not know basic things about computational statistics. My question goes in this direction.
Let X and Y be random variables such that X ∈ {0, 1} and Y | X = x ~ N(x, sigma^2), with P(X = 0) = p and P(X = 1) = 1 - p. If P denotes distribution (a density in the continuous case), how can I compute

P(X = x | Y = y) = P(X = x) * P(Y = y | X = x) / sum_x' P(X = x') * P(Y = y | X = x')

using R?
I mean, what is the exact meaning of this multiplication of two distributions (one discrete and one continuous)? How can I do this using R? The answer is obviously a function of y, but how is it represented in my code?
Is there any change if Y is also discrete? For instance, if Y | X = x follows some discrete distribution whose parameter depends on x. How would that affect the implemented code?
I know my questions are not very specific, but I am very lost on how to start. My goal with this question is to understand how I can "translate" what I have written on paper into the computer.
Translation
The equations describe how to compute the probability distribution of X given an observation of Y=y and values for parameters p and sigma. Ultimately, you want to implement a function p_X_given_Y that takes a value of Y and returns a probability distribution for X. A good place to start is to implement the two functions used in the RHS of the expression. Something like,
p_X <- function (x, p=0.5) { switch(as.character(x), "0"=p, "1"=1-p, 0) }
p_Y_given_X <- function (y, x, sigma=1) { dnorm(y, x, sd=sigma) }
Note that p and sigma are picked arbitrarily here. These functions can then be used to define the p_X_given_Y function:
p_X_given_Y <- function (y) {
# numerators: for each x \in X
ps <- sapply(c("0"=0,"1"=1),
function (x) { p_X(x) * p_Y_given_X(y, x) })
# divide out denominator
ps / sum(ps)
}
which can be used like:
> p_X_given_Y(y=0)
# 0 1
# 0.6224593 0.3775407
> p_X_given_Y(y=0.5)
# 0 1
# 0.5 0.5
> p_X_given_Y(y=2)
# 0 1
# 0.1824255 0.8175745
These numbers should make intuitive sense (given p=0.5): Y=0 is more likely to come from X=0, Y=0.5 is equally likely to be X=0 or X=1, etc. This is only one way of implementing it, where the idea is to return the "distribution of X", which in this case is simply a named numeric vector: the names ("0", "1") correspond to the support of X, and the values correspond to the probability masses.
Some alternative implementations might be:
a p_X_given_Y(x, y) that also takes a value for x and returns the corresponding probability mass
a p_X_given_Y(y) that returns another function that takes an x argument and returns the corresponding probability mass (i.e., the probability mass function); a sketch of this variant follows below
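For illustration, the second alternative might look like this in Python (a sketch only; scipy's norm.pdf plays the role of R's dnorm, and all names are mine):

from scipy.stats import norm

def p_X_given_Y(y, p=0.5, sigma=1.0):
    # Bayes numerators, one per value in the support of X
    ps = {x: {0: p, 1: 1 - p}[x] * norm.pdf(y, loc=x, scale=sigma)
          for x in (0, 1)}
    total = sum(ps.values())                 # the denominator
    return lambda x: ps.get(x, 0) / total    # the pmf of X given Y = y

pmf = p_X_given_Y(y=0)
print(pmf(0), pmf(1))   # ~0.622 and ~0.378, matching the R output above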
Since lpSolve does not allow != as a constraint direction, what is an alternative way to get the same result?
I would like to maximize x1 + x2 with constraints x1 <= 5 and x2 != 5, and keep using the lpSolve R package.
I've tried using a combination of > and < in order to replicate the behaviour of !=; however, I do not obtain the result I expected.
f.obj <- c(1, 1)
f.con <- matrix(c(1, 0, 0, 1), nrow = 2, ncol = 2, byrow = TRUE)
f.dir <- c("<=", "!=")
f.rhs <- c(5, 5)
lp("max", f.obj, f.con, f.dir, f.rhs)$solution
Since lpSolve does not support !=, I get the error message:
Error in lp("max",f.obj,f.con,f.dir,f.rhs): Unknown constraint direction found
EDIT
I would like to maximize x1 + x2 with constraints x1 <= 5, x2 < 10 and x2 != 9.
So the solution would be 5 and 8.
You can't do that, even in theory, since the resulting constraint set is not closed. It is like trying to minimize x^2 over the set x > 0: for any proposed solution x0 in that set, the solution x0/2 is better, so there is no optimum.
I would just use x <= 5 as your constraint and if the constraint is not active (i.e. it turns out that x < 5) then you have found the solution; otherwise, there is no solution. If there is no solution you can try x <= 5 - eps for an arbitrarily chosen eps.
ADDED:
If what you intended was that the variables x1 and x2 are integer then
x < 10 and x != 9
is equivalent to
x <= 8
Note that lp has the all.int argument which defaults to FALSE.
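Although the question uses R's lpSolve, the reformulation itself is solver-independent; here is a quick sketch with Python's PuLP for concreteness:

from pulp import LpProblem, LpVariable, LpMaximize, value

prob = LpProblem("edited_example", LpMaximize)
x1 = LpVariable("x1", cat="Integer")
x2 = LpVariable("x2", cat="Integer")
prob += x1 + x2        # objective
prob += x1 <= 5
prob += x2 <= 8        # replaces x2 < 10 and x2 != 9 for integer x2
prob.solve()
print(value(x1), value(x2))   # 5.0 8.0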
ADDED 2:
If you just want to find multiple feasible solutions then if opt is the value of the objective from the first solution rerun adding the constraint (assuming a maximization problem):
objective <= opt - eps
where eps is an arbitrary small constant.
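Continuing the PuLP sketch above, the rerun would look like this (eps = 1 is safe here because the variables are integer):

opt = value(prob.objective)    # 13 from the first solve
prob += x1 + x2 <= opt - 1     # cut off the first optimum
prob.solve()
print(value(x1) + value(x2))   # 12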
Also note that if x1 and x2 are two optimal solutions to an LP, then, since the constraint set is necessarily convex, any convex combination of those solutions is also feasible; and because the objective is linear, all of those convex combinations must also be optimal. So if there is more than one optimum, there are an infinite number of optimal solutions, and you can't simply enumerate them.
ADDED 3:
The feasible set of a linear program forms a polytope, and if an optimal value exists, at least one vertex must attain it. If more than one vertex attains the same optimal value, then the points on the line segment connecting them are all optimal as well. Although there are an infinite number of optimal points in that case, there are only a finite number of vertices, so you could enumerate them using the vertexenum package and then evaluate the objective at each one. If there is one vertex whose objective value is greater than all the others, then that is the optimum. If there are multiple, then we know that those, plus all convex combinations of those, are optimal. This might work if your problem is not too large.
Given a set of variables x_1, ..., x_n, I want to find the values of the coefficients for this equation:
y = a_1*x_1 +... +a_n*x_n + c
where a_1, a_2, ..., a_n are all unknown. Thinking of this from the perspective of a data frame, I want to create this value of y for every row in the data.
My question is: with y, a_1, ..., a_n and c all unknown, is there a way for me to find a set of solutions a_1, ..., a_n under the condition that corr(y, x_1), corr(y, x_2), ..., corr(y, x_n) are all greater than 0.7? For simplicity, take correlation here to be Pearson correlation. I know there would be no unique solution, but how can I construct a set of solutions for a_1, ..., a_n that fulfills this condition?
I spent a day searching for the idea but could not get any information out of it. An answer in any programming language is welcome, or at least some reference for this.
No, it is not possible in general. It may be possible in some special cases.
Given x₁, x₂, ... you want to find y = a₁x₁ + a₂x₂ + ... + c so that all the correlations between y and the x's are greater than some target R. Since the correlation is
Corr(y, xi) = Cov(y, xi) / Sqrt[ Var(y) * Var(xi) ]
your constraint is
Cov(y, xi) / Sqrt[ Var(y) * Var(xi) ] > R
which can be rearranged to
Cov(y, xi)² > R² * Var(y) * Var(xi)
and this needs to be true for all i.
Consider the simple case where there are only two columns x₁ and x₂, and further assume that they both have mean zero (so you can ignore the constant c) and variance 1, and that they are uncorrelated. In that case y = a₁x₁ + a₂x₂ and the covariances and variances are
Cov(y, x₁) = a₁
Cov(y, x₂) = a₂
Var(x₁) = 1
Var(x₂) = 1
Var(y) = (a₁)² + (a₂)²
so you need to simultaneously satisfy
(a₁)² > R² * ((a₁)² + (a₂)²)
(a₂)² > R² * ((a₁)² + (a₂)²)
Adding these inequalities together, you get
(a₁)² + (a₂)² > 2 * R² * ((a₁)² + (a₂)²)
which means that in order to satisfy both of the inequalities, you must have R < Sqrt(1/2) (by cancelling the common factor on both sides of the inequality). So the very best you could do in this simple case is to choose a₁ = a₂ (the exact value doesn't matter as long as they are equal), and both of the correlations Corr(y, x₁) and Corr(y, x₂) will be equal to 0.707. You cannot achieve correlations higher than this between y and all of the x's simultaneously in this case.
For the more general case with n columns (each of which has mean zero, variance 1 and zero correlation between columns) you cannot simultaneously achieve correlations greater than 1 / sqrt(n) (as pointed out in the comments by @kazemakase).
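A quick numerical check of that bound (a sketch with numpy; the columns are approximately uncorrelated standard normals, so n = 4 should give correlations near 1/sqrt(4) = 0.5):

import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal((100_000, n))   # ~uncorrelated, mean 0, variance 1
y = x.sum(axis=1)                       # i.e. a_i = 1 for every i, c = 0

corrs = [np.corrcoef(y, x[:, i])[0, 1] for i in range(n)]
print(corrs)   # each entry is approximately 0.5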
In general, the more independent variables there are, the lower the correlation you will be able to achieve between y and the x's. Also (although I haven't mentioned it above) the correlations between the x's matter. If they are in general positively correlated, you will be able to achieve a higher target correlation between y and the x's. If they are in general uncorrelated or negatively correlated, you will only be able to achieve low correlations between y and the x's.
I am not an expert in this field, so read with extreme prejudice!
I am a bit confused by your y
Your y is a single constant, and you want the correlation between it and all the x_i values to be > 0.7? I am no math/statistics expert, but my feeling is that this is achievable only if the correlations between the x_i, x_j uphold the same condition. In that case you can simply take the average of the x_i like this:
y=(x_1+x_2+x_3+...+x_n)/n
so a_i = 1.0/n and c = 0.0. But still the question is:
What meaning does a correlation between just 2 numbers have?
More reasonable would be if y were a function dependent on x, for example like this:
y(x) = a_1*(x-x_1)+... +a_n*(x-x_n) + c
or any other equation (hard to suggest one without knowing where this came from and for what purpose). Then you can compute the correlation between the two sets
X = { x_1 , x_2 ,..., x_n }
Y = { y(x_1),y(x_2),...y(x_n) }
In that case I would try an approximation search for the c, a_i constants to maximize the correlation between X and Y, but the resulting complexity for the whole thing would be insane. So instead I would tweak just one constant at a time:
1. set some safe c, a_1, a_2, ... constants
2. tweak a_1: compute the correlation for (a_1 - delta) and (a_1 + delta), then choose the direction that favors the correlation and keep going in that direction until the correlation coefficient starts to drop; then you can recursively do this again with a smaller delta (btw, this is exactly what my approx class from the link above does)
3. loop step #2 through all the a_i
4. loop this whole thing a few times to enhance precision

Maybe you could also compute c after each run to minimize the distance between the X and Y sets.
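A rough Python sketch of that tweak-one-constant-at-a-time idea (the scoring function, starting point, step size and round count are all arbitrary choices of mine, not part of the answer):

import numpy as np

def coordinate_search(score, a0, delta=0.5, rounds=4):
    # hill-climb one coefficient at a time, halving the step each round
    a = np.asarray(a0, dtype=float)
    best = score(a)
    for _ in range(rounds):               # step 4: repeat for precision
        for i in range(len(a)):           # step 3: loop over all a_i
            for step in (delta, -delta):  # step 2: probe both directions
                while True:               # walk while the score improves
                    a[i] += step
                    s = score(a)
                    if s > best:
                        best = s
                    else:
                        a[i] -= step      # undo the unhelpful step
                        break
        delta /= 2                        # smaller delta each round
    return a, best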
I'm basically looking for a summation function that will compute multinomials given the number of variables and a degree.
Example
2 variables, degree 2:
x^2+y^2+x*y+x+y+1
Thanks.
See Knuth The Art of Computer Programming, Vol. 4, Fascicle 3 for a comprehensive answer.
Short answer: it's enough to generate all multinomial expressions in n variables with degree exactly d. Then, for your problem, you can either put together the answers with degrees ≤d, or add a dummy variable "1".
The problem of generating all expressions with degree exactly d is thus simply one of generating all ordered partitions (i.e., all nonnegative integer solutions to x1 + ... + xn = d), and this can be done with a simple backtracking algorithm. ("Depth-first search")
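A small sketch of that backtracking in Python (yielding exponent tuples rather than symbolic expressions; the dummy-variable trick handles degrees below d):

def exact_degree(n, d):
    # all nonnegative integer solutions to e_1 + ... + e_n = d
    if n == 1:
        yield (d,)
        return
    for e in range(d + 1):
        for rest in exact_degree(n - 1, d - e):
            yield (e,) + rest

def up_to_degree(n, d):
    # dummy-variable trick: pad with one slack exponent so the total is d,
    # then drop it; what remains is every monomial of degree <= d
    for sol in exact_degree(n + 1, d):
        yield sol[:-1]

print(sorted(up_to_degree(2, 2), reverse=True))
# [(2, 0), (1, 1), (1, 0), (0, 2), (0, 1), (0, 0)]
# i.e. x^2, x*y, x, y^2, y, 1 -- the example from the question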
Given N variables, and a maximum degree of D, you have an array of D slots to fill with all possible combinations of variables.
[_, _, ..., _, _]
You are allowed to fill the slots with any of the N variables, at most D times in total. Since multiplication is commutative, it suffices to ignore the ordering of the variables. As such, this problem reduces to generating (1) partitions of an integer and (2) subsets of a set.
I hope this is at least a start to your solution.
This also looks like a dynamic-programming variant of the 0-1 knapsack problem. Here we would be interested in all possible leaves of the decision tree.