Intervals for term structures - math

For a student learning platform (mathematics) we have managed to include Maxima and evaluate terms/equations/numbers for equivalence. For this we have programmed an algorithm that randomly chooses numbers for all the variables and then compares the two results to see whether they lead to the same values or not (more mathematically speaking, we treat the terms as functions and compare them at specific points).
Now comes the problem: unfortunately, there must be the possibility to define ranges for the coefficients of variables. For example, the correct solution [4,5]x^2-[3,4]x at the position x=10 leads to [4,5]*10^2-[3,4]*10. Here we have to find the minimum/maximum of this expression, with e.g. the range 4 to 5 as the coefficient before x^2. I have not been able to do this with native Maxima functions, so I am asking here for help. I am also wondering whether this can be combined with other functions such as sin, exp, etc., or whether that makes the whole optimisation problem too complex (in which case we should only allow polynomials).
Your help is greatly appreciated!
Best, Leon

To summarize what we said in the comments, we have something like sum(a[k]*e[k], k, 1, n), where the coefficients a[k] are constrained by intervals I[k] and each e[k] is an expression in x. Given that x is a specific value, the sum is a linear combination of the a[k], and the extreme values are attained at the corners of the hypercube given by the Cartesian product of the intervals.
A simple solution is to just enumerate the corners of the hypercube and evaluate the sum at each corner, and see which is greatest. (If there are ties, that means that the sum is not actually a function of some coefficient. Given the problem statement, that means the corresponding e[k] is zero. Let's look for and omit such coefficients, then there can only be a unique maximum.)
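(As an aside, the corner enumeration is easy to illustrate outside Maxima as well. Here is a small Python sketch of the same idea, where itertools.product plays the role of the Cartesian product; the function name and interface are mine, illustrative only.)
from itertools import product

def max_over_corners(coeff_values, intervals):
    # coeff_values: the e[k] evaluated at x = x1 (plain numbers)
    # intervals: (lo, hi) bounds for the corresponding coefficients a[k]
    kept = [(v, iv) for v, iv in zip(coeff_values, intervals) if v != 0]
    return max(
        (sum(a * v for a, (v, _) in zip(corner, kept)), corner)
        for corner in product(*(iv for _, iv in kept))
    )

# a*x + b*(-x^2) + c*x^3 at x = 3 with a in [3,4], b in [-2,2], c in [4,5]:
print(max_over_corners([3, -9, 27], [(3, 4), (-2, 2), (4, 5)]))
# -> (165, (4, -2, 5)), matching the first Maxima example below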
Here's my attempt at a solution, hope I've understood what's going on and what needs to happen. Assume without checking that a, e, and I are all the same length, namely n.
find_maximum_corner (a, e, I, x, x1) :=
  block ([n, ee, ii_omit, a_omit, ii_keep, a_keep, e_keep, I_keep,
          corners_positions, corners_equations, corners_values,
          maximum_value, ii_maximum_value],
    n: length(a),
    ee: subst (x = x1, sum (a[k]*e[k], k, 1, n)),
    ii_omit: sublist_indices (e, lambda ([e1], subst (x = x1, e1) = 0)),
    a_omit: makelist (a[i], i, ii_omit),
    ii_keep: sublist (makelist (i, i, 1, n), lambda ([i1], not member (i1, ii_omit))),
    a_keep: makelist (a[i], i, ii_keep),
    e_keep: makelist (e[i], i, ii_keep),
    I_keep: makelist (I[i], i, ii_keep),
    corners_positions: apply (cartesian_product_list, I_keep),
    corners_equations: map (lambda ([l], map (lambda ([a1, l1], a1 = l1), a_keep, l)),
                            corners_positions),
    corners_values: map (lambda ([eqs], subst (eqs, ee)), corners_equations),
    maximum_value: lmax (corners_values),
    ii_maximum_value: sublist_indices (corners_values, lambda ([v], v = maximum_value)),
    [maximum_value, corners_equations[ii_maximum_value[1]], a_omit]);
That returns a list comprising the maximum value, the corner at which the sum reaches its maximum, and the list of variables omitted because the corresponding e[k] is zero at x = x1.
This solution makes use of cartesian_product_list which was recently added (in Maxima 5.43). If you are working with a version older than 5.43, I can write out a simple implementation of it.
With this solution I get:
(%i6) find_maximum_corner ([a, b, c], [x, -x^2, x^3], [[3, 4], [-2, 2], [4, 5]], x, 3);
(%o6) [165, [a = 4, b = - 2, c = 5], []]
(%i7) find_maximum_corner ([a, b, c], [x, -(x - 3)^2, x^3], [[3, 4], [-2, 2], [4, 5]], x, 3);
(%o7) [147, [a = 4, c = 5], [b]]
the second example showing a variable that drops out because the corresponding expression is zero.
It's not necessary for the expressions e[k] to be polynomials; they can be any functions of x (provided that subst(x = x1, e[k]) simplifies to a number when x1 is a number -- this is the case for most or all of the built-in math functions).

Remainder sequences

I would like to compute the remainder sequence of two polynomials, as used in computing the GCD. If I understood the Wikipedia article about pseudo-remainder sequences correctly, one way to compute it is to use Euclid's algorithm:
gcd(a, b) := if b = 0 then a else gcd(b, rem(a, b))
meaning I will collect the rem() parts. If, however, the coefficients are integers, the intermediate fractions grow very quickly, so there are the so-called "pseudo-remainder sequences", which try to keep the coefficients as small integers.
My question is: if I understood correctly (did I?), the two above sequences differ only by a constant factor, but when I try to run the following example I get different results. Why? The first remainder sequence differs by a factor of -2, OK, but why is the second sequence so different? I presume subresultants() works correctly, but why does g % (f % g) not work?
f = Poly(x**2*y + x**2 - 5*x*y + 2*x + 1, x, y)
g = Poly(2*x**2 - 12*x + 1, x)
print()
print(subresultants(f, g)[2])
print(subresultants(f, g)[3])
print()
print(f % g)
print(g % (f % g))
which results in
Poly(-2*x*y - 16*x + y - 1, x, y, domain='ZZ')
Poly(-9*y**2 - 54*y + 225, x, y, domain='ZZ')
Poly(x*y + 8*x - 1/2*y + 1/2, x, y, domain='QQ')
Poly(2*x**2 - 12*x + 1, x, y, domain='QQ')
"the two above sequences differ only by a constant factor"
For polynomials of one variable, they do. For multivariate polynomials, they don't.
The division of multivariate polynomials is a somewhat tricky business: the result depends on the chosen order of monomials (by default, sympy uses lexicographic order). When you ask it to divide 2*x**2 - 12*x + 1 by x*y + 8*x - 1/2*y + 1/2, it observes that the leading monomial of the denominator is x*y, and there is no monomial in the numerator that is divisible by x*y. So the quotient is zero, and everything is a remainder.
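(To see this concretely, here is a small sketch, assuming SymPy under Python 3; I give both polynomials the generators x, y so that they live in the same ring:)
from sympy import symbols, Poly, div

x, y = symbols('x y')
f = Poly(x**2*y + x**2 - 5*x*y + 2*x + 1, x, y)
g = Poly(2*x**2 - 12*x + 1, x, y)

r1 = f % g         # Poly(x*y + 8*x - y/2 + 1/2, x, y, domain='QQ')
q, r = div(g, r1)  # leading monomial of r1 under lex order is x*y;
print(q, r)        # no monomial of g is divisible by it, so q == 0 and r == g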
The computation of subresultants (as it's implemented in sympy) treats polynomials in x,y as single-variable polynomials in x whose coefficients happen to come from the ring of polynomials in y. It is certain to produce a sequence of subresultants whose degree with respect to x keeps decreasing until it reaches 0: the last polynomial of the sequence will not have x in it. The degree with respect to y may (and does) go up, since the algorithm has no problem multiplying the terms by any polynomials in y in order to get x to drop out.
The upshot is that both computations work correctly, they just do different things.

Prolog Recursion (Factorial of a Power Function)

I am having some trouble with my CS assignment. I am trying to call another rule that I created previously within a new rule that will calculate the factorial of a power function (e.g. Y = (X^N)!). I think the problem with my code is that Y in exp(Y,X,N) is not carrying over when I call factorial(Y,Z), but I am not entirely sure. I have been trying to find an example of this, but I haven't been able to find anything.
I am not expecting an answer since this is homework, but any help would be greatly appreciated.
Here is my code:
/* 1.2: Write recursive rules exp(Y, X, N) to compute mathematical function Y = X^N, where Y is used
to hold the result, X and N are non-negative integers, and X and N cannot be 0 at the same time
as 0^0 is undefined. The program must print an error message if X = N = 0.
*/
exp(_,0,0) :-
    write('0^0 is undefined').
exp(1,_,0).
exp(Y,X,N) :-
    N > 0, !, N1 is N - 1, exp(Y1, X, N1), Y is X * Y1.
/* 1.3: Write recursive rules factorial(Y,X,N) to compute Y = (X^N)! This function can be described as the
factorial of exp. The rules must use the exp that you designed.
*/
factorial(0,X) :-
    X is 1.
factorial(N,X) :-
    N > 0, N1 is N - 1, factorial(N1,X1), X is X1 * N.
factorial(Y,X,N) :-
    exp(Y,X,N), factorial(Y,Z).
The Z variable in factorial/3 is mentioned only once (a so-called 'singleton variable'): it gets bound by the goal factorial(Y,Z), but the binding is never used anywhere.
As noted in the comments under the question, short-circuiting it to _ won't work; you have to unify it with a sensible value. Think about what you want to compute, and link the head of the clause with exp and factorial through parameters: introduce a parameter "in the middle" that is not mentioned in the head.
Edit: I'll rename your variables for you; maybe you'll see more clearly what you did:
factorial(Y, X, Result) :-
    exp(Y, X, Result), factorial(Y, UnusedResult).
now you should see what your factorial/3 really computes, and how to fix it.
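(If it helps, here is the intended data flow sketched in Python rather than Prolog, so as not to give the whole game away; all the names here are hypothetical:)
def power(x, n):               # plays the role of exp(P, X, N)
    return x ** n

def fact(n):                   # plays the role of factorial(N, F)
    return 1 if n == 0 else n * fact(n - 1)

def factorial_of_power(x, n):  # what factorial/3 is meant to compute
    p = power(x, n)            # the parameter "in the middle"
    return fact(p)             # the result flows back out instead of being dropped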

Vectorizing code to calculate (squared) Mahalanobis Distance

EDIT 2: this post seems to have been moved from CrossValidated to StackOverflow due to it being mostly about programming, but that means my fancy MathJax doesn't work anymore. Hopefully this is still readable.
Say I want to calculate the squared Mahalanobis distance between two vectors x and y with covariance matrix S. This is a fairly simple function defined by
M2(x, y; S) = (x - y)^T * S^-1 * (x - y)
With python's numpy package I can do this as
# x, y = numpy.ndarray of shape (n,)
# s_inv = numpy.ndarray of shape (n, n)
diff = x - y
d2 = diff.T.dot(s_inv).dot(diff)
or in R as
diff <- x - y
d2 <- t(diff) %*% s_inv %*% diff
In my case, though, I am given
m by n matrix X
n-dimensional vector mu
n by n covariance matrix S
and want to find the m-dimensional vector d such that
d_i = M2(x_i, mu; S) ( i = 1 .. m )
where x_i is the ith row of X.
This is not difficult to accomplish using a simple loop in python:
d = numpy.zeros((m,))
for i in range(m):
    diff = x[i,:] - mu
    d[i] = diff.T.dot(s_inv).dot(diff)
Of course, the fact that the outer loop happens in Python instead of in native code in the numpy library means it's not as fast as it could be. n and m are about 3-4 and several hundred thousand, respectively, and I'm doing this somewhat often in an interactive program, so a speedup would be very useful.
Mathematically, the only way I've been able to formulate this using basic matrix operations is
d = diag( X' * S^-1 * X'^T )
where
x'_i = x_i - mu
which is simple to write a vectorized version of, but this is unfortunately outweighed by the inefficiency of calculating a 10-billion-plus element matrix and only taking the diagonal... I believe this operation should be easily expressible using Einstein notation, and thus could hopefully be evaluated quickly with numpy's einsum function, but I haven't even begun to figure out how that black magic works.
So, I would like to know: is there either a nicer way to formulate this operation mathematically (in terms of simple matrix operations), or could someone suggest some nice vectorized (python or R) code that does this efficiently?
BONUS QUESTION, for the brave
I don't actually want to do this once, I want to do it k ~ 100 times. Given:
m by n matrix X
k by n matrix U
Set of n by n covariance matrices each denoted S_j (j = 1..k)
Find the m by k matrix D such that
D_i,j = M2(x_i, u_j; S_j)
Where i = 1..m, j = 1..k, x_i is the ith row of X and u_j is the jth row of U.
I.e., vectorize the following code:
# s_inv is (k x n x n) array containing "stacked" inverses
# of covariance matrices
d = numpy.zeros( (m, k) )
for j in range(k):
    for i in range(m):
        diff = x[i, :] - u[j, :]
        d[i, j] = diff.T.dot(s_inv[j, :, :]).dot(diff)
First off, it seems like maybe you're getting S and then inverting it. You shouldn't do that; it's slow and numerically inaccurate. Instead, you should get the Cholesky factor L of S so that S = L L^T; then
M^2(x, y; L L^T)
= (x - y)^T (L L^T)^-1 (x - y)
= (x - y)^T L^-T L^-1 (x - y)
= || L^-1 (x - y) ||^2,
and since L is triangular L^-1 (x - y) can be computed efficiently.
As it turns out, scipy.linalg.solve_triangular will happily do a bunch of these at once if you reshape it properly:
import numpy as np
import scipy.linalg

L = np.linalg.cholesky(S)
y = scipy.linalg.solve_triangular(L, (X - mu[np.newaxis]).T, lower=True)
d = np.einsum('ij,ij->j', y, y)
Breaking that down a bit, y[i, j] is the ith component of L^-1 (X_j - \mu). The einsum call then does
d_j = \sum_i y_{ij} y_{ij}
= \sum_i y_{ij}^2
= || y_j ||^2,
like we need.
Unfortunately, solve_triangular won't vectorize across its first argument, so you should probably just loop there. If k is only about 100, that's not going to be a significant issue.
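(For concreteness, a sketch of that loop for the bonus case; the function name is mine, and X, U, S are as in the question:)
def mahalanobis2_all(X, U, S):
    # X: (m, n) points; U: (k, n) centers; S: (k, n, n) covariance matrices
    m, k = X.shape[0], U.shape[0]
    D = np.empty((m, k))
    for j in range(k):
        L = np.linalg.cholesky(S[j])
        y = scipy.linalg.solve_triangular(L, (X - U[j]).T, lower=True)
        D[:, j] = np.einsum('ij,ij->j', y, y)
    return D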
If you are actually given S^-1 rather than S, then you can indeed do this with einsum more directly. Since S is quite small in your case, it's also possible that actually inverting the matrix and then doing this would be faster. As soon as n is a nontrivial size, though, you're throwing away a lot of numerical accuracy by doing this.
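(For the single-center case, that direct einsum would look something like the following, with X, mu, s_inv as in the question; a sketch, not benchmarked:)
Xc = X - mu[np.newaxis, :]                   # rows are x_i - mu
d = np.einsum('ij,jk,ik->i', Xc, s_inv, Xc)  # d_i = (x_i - mu)^T S^-1 (x_i - mu)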
To figure out what to do with einsum, write everything in terms of components. I'll go straight to the bonus case, writing S_j^-1 = T_j for notational convenience:
D_{ij} = M^2(x_i, u_j; S_j)
= (x_i - u_j)^T T_j (x_i - u_j)
= \sum_k (x_i - u_j)_k ( T_j (x_i - u_j) )_k
= \sum_k (x_i - u_j)_k \sum_l (T_j)_{k l} (x_i - u_j)_l
= \sum_{k l} (X_{i k} - U_{j k}) (T_j)_{k l} (X_{i l} - U_{j l})
So, if we make arrays X of shape (m, n), U of shape (k, n), and T of shape (k, n, n), then we can write this as
diff = X[np.newaxis, :, :] - U[:, np.newaxis, :]
D = np.einsum('jik,jkl,jil->ij', diff, T, diff)
where diff[j, i, k] = X[i, k] - U[j, k].
Dougal nailed this one with an excellent and detailed answer, but I thought I'd share a small modification that I found increases efficiency, in case anyone else is trying to implement this. Straight to the point:
Dougal's method was as follows:
def mahalanobis2(X, mu, sigma):
    L = np.linalg.cholesky(sigma)
    y = scipy.linalg.solve_triangular(L, (X - mu[np.newaxis,:]).T, lower=True)
    return np.einsum('ij,ij->j', y, y)
A mathematically equivalent variant I tried is
def mahalanobis2_2(X, mu, sigma):
    # Cholesky decomposition of the inverse of the covariance matrix
    # (doing this in either order should be equivalent)
    linv = np.linalg.cholesky(np.linalg.inv(sigma))
    # just do regular matrix multiplication with this matrix
    y = (X - mu[np.newaxis,:]).dot(linv)
    # same as above, but note the different index at the end, because the
    # matrix y is transposed here compared to above
    return np.einsum('ij,ij->i', y, y)
I ran both versions head-to-head 20 times using identical random inputs and recorded the times (in milliseconds). For X as a 1,000,000 x 3 matrix (mu of length 3 and sigma 3x3) I get:
Method 1 (min/max/avg): 30/62/49
Method 2 (min/max/avg): 30/47/37
That's about a 30% speedup for the 2nd version. I'm mostly going to be running this in 3 or 4 dimensions but to see how it scaled I tried X as 1,000,000 x 100 and got:
Method 1 (min/max/avg): 970/1134/1043
Method 2 (min/max/avg): 776/907/837
which is about the same improvement.
I mentioned this in a comment on Dougal's answer but adding here for additional visibility:
The first pair of methods above take a single center point mu and covariance matrix sigma and calculate the squared Mahalanobis distance to each row of X. My bonus question was to do this multiple times with many sets of mu and sigma and output a two-dimensional matrix. The set of methods above can be used to accomplish this with a simple for loop, but Dougal also posted a more clever example using einsum.
I decided to compare these methods with each other by using them to solve the following problem: Given k d-dimensional normal distributions (with centers stored in rows of k by d matrix U and covariance matrices in the last two dimensions of the k by d by d array S), find the density at the n points stored in rows of the n by d matrix X.
The density of a multivariate normal distribution is a function of the squared Mahalanobis distance of the point to the mean. Scipy has an implementation of this as scipy.stats.multivariate_normal.pdf to use as a reference. I ran all three methods against each other 10x using identical random parameters each time, with d=3, k=96, n=5e5. Here are the results, in points/sec:
[Method]: (min/max/avg)
Scipy: 1.18e5/1.29e5/1.22e5
Fancy 1: 1.41e5/1.53e5/1.48e5
Fancy 2: 8.69e4/9.73e4/9.03e4
Fancy 2 (cheating version): 8.61e4/9.88e4/9.04e4
where Fancy 1 is the better of the two methods above and Fancy 2 is Dougal's 2nd solution. Since Fancy 2 needs to calculate the inverses of all the covariance matrices, I also tried a "cheating version" where it was passed these as a parameter, but it looks like that didn't make a difference. I had planned on including the non-vectorized implementation, but that was so slow it would have taken all day.
What we can take away from this is that using Dougal's first method is about 20% faster than however Scipy does it. Unfortunately, despite its cleverness, the 2nd method is only about 60% as fast as the first. There are probably some other optimizations that can be done, but this is already fast enough for me.
I also tested how this scaled with higher dimensionality. With d=100, k=96, n=1e4:
Scipy: 7.81e3/7.91e3/7.86e3
Fancy 1: 1.03e4/1.15e4/1.08e4
Fancy 2: 3.75e3/4.10e3/3.95e3
Fancy 2 (cheating version): 3.58e3/4.09e3/3.85e3
Fancy 1 seems to have an even bigger advantage this time. Also worth noting that Scipy threw a LinAlgError 8/10 times, probably because some of my randomly generated 100x100 covariance matrices were close to singular (which may mean that the other two methods are not as numerically stable; I did not actually check the results).

From expensive search to Integer Programming or Constraint Programming?

Consider m by n matrices M, all of whose entries are 0 or 1. For a given M, the question is whether there exists a nonzero vector v, all of whose entries are -1, 0 or 1, for which Mv = 0. For example,
      [0 1 1 1]
M_1 = [1 0 1 1]
      [1 1 0 1]
In this example, there is no such vector v.
      [1 0 0 0]
M_2 = [0 1 0 0]
      [0 0 1 0]
In this example, the vector (0,0,0,1) gives M_2v = 0.
Given m and n, I would like to find out whether there exists such an M for which there is no nonzero v with Mv = 0.
If m = 3 and n = 4 then the answer is yes as we can see above.
I am currently solving this problem by trying all different M and v, which is very expensive.
However, is it possible to express the problem as an integer programming or constraint programming problem, so that I can use an existing software package such as SCIP instead, which might be more efficient?
This question is probably more mathematical than programming-related. I haven't found the final answer yet, but at least some ideas are here:
We can re-state the problem in the following way.
Problem A: Fix positive integers m and n. Let S be the set of n-dimensional vectors whose entries are 0 or 1. Does there exist an m by n matrix M whose entries are 0 or 1, such that for any two different vectors v_1 and v_2 in S, the vectors Mv_1 and Mv_2 are different? (Or, you may say that the matrix M, considered as a map from n-dimensional vectors to m-dimensional vectors, is injective on the set S.)
In brief: given the pair (m, n), does there exist such an injective M?
Problem A is equivalent to the original problem. Indeed, if Mv_1 = Mv_2 for two different v_1 and v_2 in S, then we have M(v_1 - v_2) = 0, and the vector v_1 - v_2 has only 0, 1, -1 as entries. The converse is obviously also true.
Another reinterpretation is:
Problem B: Let m and n be positive integers and S be the set of n-dimensional vectors whose entries are 0 or 1. Can we find m vectors r_1, ..., r_m in S such that, for any pair of different vectors v_1 and v_2 in S, there exists an r_i which satisfies <v_1, r_i> != <v_2, r_i>? Here <x, y> is the usual inner product.
In brief: can we choose m vectors in S to distinguish everyone in S by taking inner product with the chosen ones?
Problem B is equivalent to Problem A, because you can identify the matrix M with m vectors in S.
In the following, I will use both descriptions of the problem freely.
Let's call the pair (m, n) a "good pair" if the answer to Problem A (or B) is yes.
With the description of Problem B, it is clear that, for a given n, there is a minimal m such that (m, n) is a good pair. Let us write m(n) for this minimal m associated to n.
Similarly, for a given m, there is a maximal n such that (m, n) is good. This is because, if (m, n) is good, i.e. there is an injective M as stated in Problem A, then for any n' <= n, erasing any n - n' columns of M will give an injective M'. Let us write n(m) for this maximal n associated to m.
So the task becomes to calculate the functions m(n) and/or n(m).
We first prove several lemmas:
Lemma 1: We have m(n + k) <= m(n) + m(k).
Proof: If M is an m(n) by n injective matrix for the pair (m(n), n) and K is an m(k) by k injective matrix for the pair (m(k), k), then the (m(n) + m(k)) by (n + k) matrix
[M 0]
[0 K]
works for the pair (m(n) + m(k), n + k). To see this, let v_1 and v_2 be any pair of different (n + k)-dimensional vectors. We may cut both of them into two pieces: the first n entries and the last k entries. If their first pieces are not equal, then they can be distinguished by one of the first m(n) rows of the above matrix; if their first pieces are equal, then their second pieces must be different, hence they can be distinguished by one of the last m(k) rows of the above matrix.
Remark: The sequence m(n) is thus a subadditive sequence.
A simple corollary:
Corollary 2: We have m(n + 1) <= m(n) + 1, hence m(n) <= n.
Proof: Take k = 1 in Lemma 1.
Note that, from other known values of m(n) you can get better upper bounds. For example, since we know that m(4) <= 3, we have m(4n) <= 3n. Anyway, these always give you O(n) upper bounds.
The next lemma gives you a lower bound.
Lemma 3: m(n) >= n / log2(n + 1).
Proof: Let T be the set of m(n)-dimensional vectors whose entries lie in {0, 1, ..., n}. Any m(n) by n matrix M gives a map from S to T, sending v to Mv.
Since there exists an M such that the above map is injective, the size of the set T must be at least the size of the set S. The size of T is (n + 1)^m(n), and the size of S is 2^n, thus we have:
(n + 1)^m(n) >= 2^n
or equivalently, m(n) >= n / log2(n + 1).
Back to programming
I have to say that I haven't figured out a good algorithm.
You might restate the problem as a Set Cover Problem, as follows:
Let U be the set of nonzero n-dimensional vectors with entries 1, 0 or -1, and let S be as above. Every vector w in S gives a subset C_w of U: C_w = {v in U : <w, v> != 0}. The question is then: can we find m vectors w in S such that the union of the subsets C_w is equal to U?
The general Set Cover Problem is NP-complete, but the Wikipedia article on set cover gives an integer linear programming formulation.
Anyway, this cannot take you much further than n = 10, I guess.
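For what it's worth, the inner feasibility check itself (given M, is there a nonzero v in {-1, 0, 1}^n with Mv = 0?) has a direct integer-programming encoding. Here is a minimal sketch, assuming the PuLP package; the split v_j = p_j - q_j into two binary variables is my own choice, not something from the question:
import pulp

def has_nonzero_kernel_vector(M):
    m, n = len(M), len(M[0])
    prob = pulp.LpProblem("kernel", pulp.LpMinimize)
    # split v_j = p_j - q_j with binary p_j, q_j, so that v_j is in {-1, 0, 1}
    p = [pulp.LpVariable("p%d" % j, cat="Binary") for j in range(n)]
    q = [pulp.LpVariable("q%d" % j, cat="Binary") for j in range(n)]
    for j in range(n):
        prob += p[j] + q[j] <= 1  # forbid p_j = q_j = 1
    for i in range(m):
        prob += pulp.lpSum(M[i][j] * (p[j] - q[j]) for j in range(n)) == 0
    prob += pulp.lpSum(p[j] + q[j] for j in range(n)) >= 1  # v must be nonzero
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.LpStatus[prob.status] == "Optimal"
For a given (m, n) you would still have to search over candidate matrices M, but each inner check becomes a solver call instead of an enumeration over all 3^n vectors.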
I'll keep editing this answer if I have further results.
I think using Boolean matrix multiplication will allow you to solve the Mv = 0 problem with only 1s and 0s more efficiently. Using this method you should be able to solve it without worrying about rank deficiencies due to the RHS equaling zero. Here is a link to documentation on some algorithms for using BMM:
http://theory.stanford.edu/~virgi/cs367/lecture2.pdf
If I understand the question, you are asking, for a given m and n, whether there exists a matrix M (a linear transformation) with a trivial kernel, that is, Ker(M) = {0}.
Recall that this is the same as the nullspace of M being zero: Null(M) = 0.
For the system Mv = 0, the nullspace is {0} if the rank of the matrix M is equal to the dimension of v. So your question comes down to asking about the existence of an m by n matrix with rank equal to dim(v) = n.
The problem in this form has been discussed here
Generate "random" matrix of certain rank over a fixed set of elements
You can also frame this question in terms of determinants, because if M is square and has determinant 0 then the nullspace is nontrivial. So you can think about this question in terms of constructing a 0-1 matrix with a desired determinant.
I hope this helps.

Lambda calculus for functional programming

In lambda calculus, (λx. λy. λs. λz. x s (y s z)) is used for the addition of two Church numerals. How can we explain this? Is there any good resource on the lambda calculus for functional programming? Your help is much appreciated.
Actually, λf1. λf2. λs. λz. (f1 s (f2 s z)) computes addition because it is, in effect, substituting (f2 s z), the number represented by f2, for the "zero" inside (f1 s z).
Example: let's take two for f2, which is s (s z) in expanded form. f1 is one: s z. Replace that last z by (f2 s z), i.e. s (s z), and you get s (s (s z)), the expanded form for three.
This would be easier with a blackboard and hand-waving, sorry.
In lambda calculus, you code a datatype in terms of the operations it induces. For instance, a boolean is just a choice function that takes in input two values a and b and either returns a or b:
true = \a,b.a
false = \a,b.b
What is the use of a natural number? Its main computational purpose is to provide a bound to iteration. So, we code a natural number n as an operator that takes in input a function f and a value x, and iterates the application of f over x for n times:
n = \f,x.f(f(....(f x)...))
with n occurrences of f.
Now, if you want to iterate the function f n + m times starting from x, you must first iterate n times, that is (n f x), and then iterate m additional times, starting from the previous result, that is
m f (n f x)
Similarly, if you want to iterate n*m times you need to iterate m times
the operation of iterating n times f (like in two nested loops), that is
m (n f) x
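(To experiment with these encodings, here is a small sketch in Python; the names and the to_int decoder are mine, not part of the classical presentation:)
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda n, m: lambda f: lambda x: m(f)(n(f)(x))   # m f (n f x)
mul  = lambda n, m: lambda f: lambda x: m(n(f))(x)      # m (n f) x

to_int = lambda n: n(lambda k: k + 1)(0)  # decode by iterating +1 over 0
two   = succ(succ(zero))
three = succ(succ(succ(zero)))
print(to_int(add(two, three)), to_int(mul(two, three)))  # prints: 5 6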
The previous encoding of datatypes is more formally explained in terms of constructors and corresponding eliminators (the so-called Böhm-Berarducci encoding).
