Solving a single, long linear equation in R - many unknown variables - r

I have a single twelve-term equation for different classes of disease prevalence (nx) that looks like this:
y = f*b0*n0 + f*b1*n1 + f*b2*n2 + ... + f*b12*n12
I also have an equality constraint, such that
n0 + n1 + n2 + ... + n12 = 1 ## prevalence must add up to one
The unknowns are nx (where x=[1,12] ).
I would prefer to solve this analytically, but can also make do with a numerical solution.
I would be grateful for any pointers on R packages or approaches (or is this unsolvable?). The project is in support of the current humanitarian response in Iraq - so hopefully a worthwhile cause.

Related

How to solve equation with rotation and translation matrices?

I working on computer vision task and have this equation:
R0*c + t0 = R1*c + t1 = Ri*c + ti = ... = Rn*c + tn ,
n is about 20 (but can be more if needs)
where each pair of R,t (rotation matrix and translation vector in 3D) is a result of i-measurement and they are known, and vector c is what I whant to know.
I've got result with ceres solver. It's good that it can handle outliers but I think it's overkill for this task.
So what methods I should use for two situations:
With outliers
Without outliers
To handle outliers you can use RANSAC:
* In each iteration randomly pick i,j (a "sample") and solve c:
Ri*c + ti = Rj*c + tj
- Set Y = Ri*c + ti
* Apply to a larger population:
- Select S={k} for which ||Rk*c + tk - Y||<e
e ~ 3*RMS of errors without outliers
- Find optimal c for all k equations (with least mean square)
- Give it a "grade": size of S
* After few iterations use optimal c found for Max "grade".
* Number of iterations: log(1-p)/log(1-w^2)
[https://en.wikipedia.org/wiki/Random_sample_consensus]
p = 0.001 (for example. It is the required certainty of the result)
w is an assumption of nonoutliers/n.

MatchIt: Optimal matching fails

I've been using the MatchIt package in R to do treatment versus control matching, and I can't get the optimal matching to work with my own dataset.
If I run the following code:
m.out <- matchit(match_formula, data=stats, method='optimal', distance='logit', ratio=2)
where the formula is
treatment ~ t_1 + t_2 + t_3 + t_4 + t_5 + t_6 + t_7 + t_8 + t_9 +
t_10 + t_11
then I end up with the error
Error in fullmatch.matrix(d, min.controls = ratio, max.controls = ratio, : omit.fraction must be NULL or numeric between -1 and 1
I couldn't find anywhere in the matchit method to specify an omit.fraction variable or what that even does. Is there any way to get around this bug and perform optimal matching?
Figured it out!
Because I had more control than treatment units, optmatch was unable to assign all control units to at least one treatment, which led to the error above. The solution was to switch the control and treatment populations and use a matching ratio of 1 in order to match all control units to a treatment unit.
Bonus: the optimal matches were an even better fit than the "nearest neighbor" matches, which I guess is to be expected.

Find out if a solution exists for multiple equations (in N) [duplicate]

This question already has answers here:
Algorithm for solving systems of linear inequalities
(5 answers)
Closed 8 years ago.
Consider the following equations:
X > Y
X + Y > 7
Y <= 10
X >= 0
Y >= 0
I want to find out if there exists a solution that fulfills all of them (natural numbers).
I don't care about the exact solution, I just want to know if there is a solution at all
I have read about Microsoft Solver Foundation or other linear programming libraries, but I'm not sure if they can solve problems like this.
Especially I'm not sure if the can solve equations with variables on each side, like
X > Y, or X + Y > Z
most examples are of the form:
X * 10 + Y * 30 > constant
I need it to be able to solve systems with maximum of 4-8 variables, all in range of 0-100
Another important constraint I have, the library needs to be fast. I need to be able to solve systems of like 7 equations in like 0,00001 seconds
Interesting question. Feels a lot like the integer-knapsack problem.
First of all, whether variables are on each side is irrelevant, since an equation like
X + Y > Z
can be rewritten to
X + Y - Z > 0
So let's assume that all constraints are of the format
(const1 * var1) + ... + (const8 * var8) > const
To support less variables, just use the value 0 for one of the constants.
The way to visualize this is to see the case of 2 variables as determining the convex hull of the 'lines' corresponding to the constraints. So each constraint can be drawn as a 2D line, and only values on one side of the line are allowed.
To visualize this for 3 variables, it's the same as whether the convex hull of 'planes' determined by the constraint have any grid points ('natural numbers') in them.
The trouble in this case is the fact that the solution should have only natural numbers: this makes normal linear algebra impossible, since a grid is imposed. I would not know of any library supporting such restrictions.
But it would not be too difficult to write a solution yourself: the idea is to find a solution by trying every number by pruning aggressively.
So in your example: test all X in the range 0 to 100. Now go to the next variable, and determine the valid range for the free variable based on the constraints. Worked out for x == 8: then the range for y would be:
0 .. 7 because of constraint x > y
0 .. 100 because of constraint x + y > 7 (since x is already 8)
0 .. 9 because of constraint y < 10
...and we repeat this for all constraints. The final constraint for y is then 0 .. 7, because that is the most tight constraint. Now repeat this process for the left-over unbound variables, and you're done if you find at least one solution.
I expect this code to be about 100 lines with dynamic programming; computation time very much depends on the input and vary wildly.
For example, a set of equations which would take a long time:
A + B + C + D + E + F + G + H > 400.5
A + B + C + D + E + F + G + H < 400.6
As a human we can deduce that since we're requiring natural numbers, there is no solution to these equations. However, this solution is not prunable using the method described above, all combinations of A .. G will have to be tested before it will be concluded that there is no fitting H. Therefore it will look at about all possibilities. Not really pleasant, but unavoidable.

How to calculate log(sum of terms) from its component log-terms

(1) The simple version of the problem:
How to calculate log(P1+P2+...+Pn), given log(P1), log(P2), ..., log(Pn), without taking the exp of any terms to get the original Pi. I don't want to get the original Pi because they are super small and may cause numeric computer underflow.
(2) The long version of the problem:
I am using Bayes' Theorem to calculate a conditional probability P(Y|E).
P(Y|E) = P(E|Y)*P(Y) / P(E)
I have a thousand probabilities multiplying together.
P(E|Y) = P(E1|Y) * P(E2|Y) * ... * P(E1000|Y)
To avoid computer numeric underflow, I used log(p) and calculate the summation of 1000 log(p) instead of calculating the product of 1000 p.
log(P(E|Y)) = log(P(E1|Y)) + log(P(E2|Y)) + ... + log(P(E1000|Y))
However, I also need to calculate P(E), which is
P(E) = sum of P(E|Y)*P(Y)
log(P(E)) does not equal to the sum of log(P(E|Y)*P(Y)). How should I get log(P(E)) without solving for P(E|Y)*P(Y) (they are extremely small numbers) and adding them.
You can use
log(P1+P2+...+Pn) = log(P1[1 + P2/P1 + ... + Pn/P1])
= log(P1) + log(1 + P2/P1 + ... + Pn/P1])
which works for any Pi. So factoring out maxP = max_i Pi results in
log(P1+P2+...+Pn) = log(maxP) + log(1+P2/maxP + ... + Pn/maxP)
where all the ratios are less than 1.

Minimization with constraint on all parameters in R

I want to minimize a simple linear function Y = x1 + x2 + x3 + x4 + x5 using ordinary least squares with the constraint that the sum of all coefficients have to equal 5. How can I accomplish this in R? All of the packages I've seen seem to allow for constraints on individual coefficients, but I can't figure out how to set a single constraint affecting coefficients. I'm not tied to OLS; if this requires an iterative approach, that's fine as well.
The basic math is as follows: we start with
mu = a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4
and we want to find a0-a4 to minimize the SSQ between mu and our response variable y.
if we replace the last parameter (say a4) with (say) C-a1-a2-a3 to honour the constraint, we end up with a new set of linear equations
mu = a0 + a1*x1 + a2*x2 + a3*x3 + (C-a1-a2-a3)*x4
= a0 + a1*(x1-x4) + a2*(x2-x4) + a3*(x3-x4) + C*x4
(note that a4 has disappeared ...)
Something like this (untested!) implements it in R.
Original data frame:
d <- data.frame(y=runif(20),
x1=runif(20),
x2=runif(20),
x3=runif(20),
x4=runif(20))
Create a transformed version where all but the last column have the last column "swept out", e.g. x1 -> x1-x4; x2 -> x2-x4; ...
dtrans <- data.frame(y=d$y,
sweep(d[,2:4],
1,
d[,5],
"-"),
x4=d$x4)
Rename to tx1, tx2, ... to minimize confusion:
names(dtrans)[2:4] <- paste("t",names(dtrans[2:4]),sep="")
Sum-of-coefficients constraint:
constr <- 5
Now fit the model with an offset:
lm(y~tx1+tx2+tx3,offset=constr*x4,data=dtrans)
It wouldn't be too hard to make this more general.
This requires a little more thought and manipulation than simply specifying a constraint to a canned optimization program. On the other hand, (1) it could easily be wrapped in a convenience function; (2) it's much more efficient than calling a general-purpose optimizer, since the problem is still linear (and in fact one dimension smaller than the one you started with). It could even be done with big data (e.g. biglm). (Actually, it occurs to me that if this is a linear model, you don't even need the offset, although using the offset means you don't have to compute a0=intercept-C*x4 after you finish.)
Since you said you are open to other approaches, this can also be solved in terms of a quadratic programming (QP):
Minimize a quadratic objective: the sum of the squared errors,
subject to a linear constraint: your weights must sum to 5.
Assuming X is your n-by-5 matrix and Y is a vector of length(n), this would solve for your optimal weights:
library(limSolve)
lsei(A = X,
B = Y,
E = matrix(1, nrow = 1, ncol = 5),
F = 5)

Resources