When executing Mathematica's NullSpace command on a symbolic matrix, Mathematica makes some assumptions about the variables and I would like to know what they are.
For example,
In[1]:= NullSpace[{{a, b}, {c, d}}]
Out[1]= {}
but the unstated assumption is that
a d != b c.
How can I determine what assumptions the NullSpace command uses?
The underlying assumptions, so to speak, are enforced by internal uses of PossibleZeroQ. If that function cannot deem an expression to be zero then it will be regarded as nonzero, hence eligible for use as a pivot in row reduction (which is generally what is used for symbolic NullSpace).
---edit---
The question was raised regarding what might be visible in zero testing in symbolic linear algebra. By default the calls to PossibleZeroQ go through internal routes. PossibleZeroQ was later built on top of those.
There is always a question in Mathematica kernel code development of what should go through the main evaluator loop and what (e.g. for purposes of speed) should short circuit. Only the former is readily traced.
One can influence the process in symbolic linear algebra by specifying a non-default zero test. Could be e.g.
myTest[ee_]:= (Print[zerotesting[ee]]; PossibleZeroQ[ee])
and then use ZeroTest->myTest in NullSpace.
---end edit---
Found this:
In this case, if you expand your matrix by one column, the assumption shows up:
NullSpace[{{a, b, 1}, {c, d, 1}}]
{{-((-b+d)/(-b c+a d)),-((a-c)/(-b c+a d)),1}}
Perhaps useful in some situations
This will be a strange question: I know what to do, and I am actually doing it, and it works, but I don't know how to write about it. Looking for solutions to a homogeneous matrix equation, say AX=0, I use the kernel of the parameter matrix A. But, the world being imperfect as it is, the matrix does not have a "perfect" kernel; it does have an "imperfect" one if you set a nonzero "tolerance" parameter. FWIW I'm using Scilab, the function is kernel(A,tol).
Now what are the correct terms for "imperfect kernel", or "tolerance" (of what?), how should this whole process be described in correct English and maths terminology? Should I say something like a "least-squares kernel"? "Approximate kernel"? Is tol the "tolerance of kernel-determination algorithm"? Sounds lame to me...
Depending on the method used (QR or SVD; a third flag lets you choose between them in the Scilab implementation), the tolerance determines when pivots (QR case) or singular values (SVD case) are considered to be zero. The kernel is then taken to be the associated subspace.
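To make the role of the tolerance concrete, here is a minimal sketch in Python. It is not Scilab's algorithm: it uses plain Gauss-Jordan elimination with partial pivoting instead of QR or SVD, and the name approximate_kernel is made up for this illustration. The only point is that a pivot whose magnitude is at most tol is treated as zero, which is what turns an empty kernel into a non-empty "approximate" one.

```python
def approximate_kernel(A, tol=1e-10):
    """Return a basis (list of vectors) of the approximate null space of A.

    A is a list of rows. Gauss-Jordan elimination with partial pivoting is
    used; a pivot candidate with absolute value <= tol is treated as zero.
    """
    rows = [[float(v) for v in row] for row in A]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        if r == n_rows:
            break
        # Largest entry in column c at or below row r.
        p = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[p][c]) <= tol:
            continue  # no usable pivot: column c becomes a free column
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(n_rows):
            if i != r:
                factor = rows[i][c]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1

    # One basis vector per free column.
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        v = [0.0] * n_cols
        v[free] = 1.0
        for row_idx, pc in enumerate(pivot_cols):
            v[pc] = -rows[row_idx][free]
        basis.append(v)
    return basis


# A nearly singular matrix: exact elimination finds no kernel,
# a small tolerance finds the direction close to (-2, 1).
A = [[1.0, 2.0], [2.0, 4.0 + 1e-12]]
exact = approximate_kernel(A, tol=0.0)
approx = approximate_kernel(A, tol=1e-9)
```

With tol=0.0 the tiny second pivot is accepted and the kernel is empty; with tol=1e-9 it is rejected and one basis vector remains.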
I am very confused in how CLP works in Prolog. Not only do I find it hard to see the benefits (I do see it in specific cases but find it hard to generalise those) but more importantly, I can hardly make up how to correctly write a recursive predicate. Which of the following would be the correct form in a CLP(R) way?
factorial(0, 1).
factorial(N, F):- {
N > 0,
PrevN = N - 1,
factorial(PrevN, NewF),
F = N * NewF}.
or
factorial(0, 1).
factorial(N, F):- {
N > 0,
PrevN = N - 1,
F = N * NewF},
factorial(PrevN, NewF).
In other words, I am not sure when I should write code outside the constraints. To me, the first case would seem more logical, because PrevN and NewF belong to the constraints. But if that's true, I am curious to see in which cases it is useful to use predicates outside the constraints in a recursive function.
There are several overlapping questions and issues in your post, probably too many to coherently address to your complete satisfaction in a single post.
Therefore, I would like to state a few general principles first, and then—based on that—make a few specific comments about the code you posted.
First, I would like to address what I think is most important in your case:
LP ⊆ CLP
This means simply that CLP can be regarded as a superset of logic programming (LP). Whether it is to be considered a proper superset or if, in fact, it makes even more sense to regard them as denoting the same concept is somewhat debatable. In my personal view, logic programming without constraints is much harder to understand and much less usable than with constraints. Given that also even the very first Prolog systems had a constraint like dif/2 and also that essential built-in predicates like (=)/2 perfectly fit the notion of "constraint", the boundaries, if they exist at all, seem at least somewhat artificial to me, suggesting that:
LP ≈ CLP
Be that as it may, the key concept when working with CLP (of any kind) is that the constraints are available as predicates, and used in Prolog programs like all other predicates.
Therefore, whether you have the goal factorial(N, F) or { N > 0 } is, at least in principle, the same concept: Both mean that something holds.
Note the syntax: The CLP(ℛ) constraints have the form { C }, which is {}(C) in prefix notation.
Note that the goal factorial(N, F) is not a CLP(ℛ) constraint! Neither is the following:
?- { factorial(N, F) }.
ERROR: Unhandled exception: type_error({factorial(_3958,_3960)},...)
Thus, { factorial(N, F) } is not a CLP(ℛ) constraint either!
Your first example therefore cannot work for this reason alone already. (In addition, you have a syntax error in the clause head: factorial (, so it also does not compile at all.)
When you learn to work with a constraint solver, check out the predicates it provides. For example, CLP(ℛ) provides {}/1 and a few other predicates, and has a dedicated syntax for stating relations that hold about floating point numbers (in this case).
Other constraint solvers provide their own predicates for describing the entities of their respective domains. For example, CLP(FD) provides (#=)/2 and a few other predicates to reason about integers. dif/2 lets you reason about any Prolog term. And so on.
From the programmer's perspective, this is exactly the same as using any other predicate of your Prolog system, whether it is built-in or stems from a library. In principle, it's all the same:
A goal like list_length(Ls, L) can be read as: "The length of the list Ls is L."
A goal like { X = A + B } can be read as: The number X is equal to the sum of A and B. For example, if you are using CLP(Q), it is clear that we are talking about rational numbers in this case.
In your second example, the body of the clause is a conjunction of the form (A, B), where A is a CLP(ℛ) constraint, and B is a goal of the form factorial(PrevN, NewF).
The point is: The CLP(ℛ) constraint is also a goal! Check it out:
?- write_canonical({a,b,c}).
{','(a,','(b,c))}
true.
So, you are simply using {}/1 from library(clpr), which is one of the predicates it exports.
You are right that PrevN and NewF belong to the constraints. However, factorial(PrevN, NewF) is not part of the mini-language that CLP(ℛ) implements for reasoning over floating point numbers. Therefore, you cannot pull this goal into the CLP(ℛ)-specific part.
From a programmer's perspective, a major attraction of CLP is that it blends in completely seamlessly into "normal" logic programming, to the point that it can in fact hardly be distinguished at all from it: The constraints are simply predicates, and written down like all other goals.
Whether you label a library predicate a "constraint" or not hardly makes any difference: All predicates can be regarded as constraints, since they can only constrain answers, never relax them.
Note that both examples you post are recursive! That's perfectly OK. In fact, recursive predicates will likely be the majority of situations in which you use constraints in the future.
However, for the concrete case of factorial, your Prolog system's CLP(FD) constraints are likely a better fit, since they are completely dedicated to reasoning about integers.
I am trying to model linear feedback shift registers in haskell. These can be modeled by polynomials over finite fields, so I am using numeric-prelude to get type classes that resemble mathematical algebraic structures more closely than those in the normal prelude.
I am by no means an expert in abstract algebra, so I have become a little confused about the IntegralDomain type class. The problem is that my book on abstract algebra (A book of abstract algebra by Charles C. Pinter) and the type classes seem to be conflicting with each other.
According to the book, the ring of polynomials over an integral domain is itself an integral domain. Also, a ring of polynomials over a field is only an integral domain, but with the special property (the book mentions that it is special) that the division algorithm holds.
That is, if F[x] is a polynomial over a field, then for a in F[x] and b!=0 in F[x], there exists q,r in F[x] such that b*q+r=a, and the degree of r is less than that of b.
The fact that this property is special to polynomials over a field, to me implies that it does not hold over any integral domain.
On the other hand, according to the type classes of numeric-prelude, a polynomial over a field (that is ZeroTestable) is also an IntegralDomain. But according to the documentation, there are several laws for IntegralDomain, one of them being:
(a `div` b) * b + (a `mod` b) === a
http://hackage.haskell.org/packages/archive/numeric-prelude/0.4.0.1/doc/html/Algebra-IntegralDomain.html#t:C
This to me looks like the division algorithm, but then the division algorithm would be true in any integral domain, including a polynomial ring over an integral domain, contradicting my book. It is also worth noting that a polynomial over an integral domain does not have an instance for IntegralDomain in numeric-prelude (not that I can see at least; the fact that every type class is simply called C makes the documentation a little hard to read). So maybe the IntegralDomain in numeric-prelude is an integral domain with the extra property that the division algorithm holds?
So is the IntegralDomain in numeric-prelude really an integral domain?
rubber duck debugging post script: While writing this question I got an idea for part of a possible explanation. Is it the requirement that "the degree of r is less than that of b." which makes the whole difference? That requirement is not in the numeric-prelude IntegralDomain. Then again, some of the other laws might imply this fact...
According to the book, a polynomial over an integral domain, is itself an integral domain.
That's not correctly phrased. The ring of polynomials over an integral domain is again an integral domain.
The ring of polynomials in one indeterminate over a field is even a principal ideal domain, as witnessed by the division algorithm, since every polynomial of degree 0 is a unit then.
In a general integral domain R, you have nonzero non-units, and if a is one, then you cannot write
X = q*a + r
with the degree of r smaller than that of a (which is 0).
Is it the requirement that "the degree of r is less than that of b." which makes the whole difference?
Precisely. That requirement guarantees that the division algorithm terminates. In a general integral domain, you can have a "canonical" choice of remainders modulo any fixed ring element, but the canonical remainder need not be "smaller" in any meaningful way, so an attempt to use the division algorithm need not terminate.
Then again, some of the other laws might imply this fact
None of the laws in Algebra.IntegralDomain imply that.
The law
(a+k*b) `mod` b === a `mod` b
is, I believe, hard to implement for a completely general integral domain, which could somewhat restrict the actual instances, but for something like Z[X] or R[X,Y] which are not PIDs, an instance is possible.
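To see concretely why the field matters, here is a small sketch of polynomial long division in Python (not Haskell, and unrelated to how numeric-prelude implements div and mod). The function name poly_divmod and the coefficient-list representation are choices made for this illustration. Each step divides by the leading coefficient of b, which is only guaranteed to work when coefficients come from a field; in exchange, the degree of the remainder drops every round, so the loop terminates with deg r < deg b.

```python
from fractions import Fraction


def poly_divmod(a, b):
    """Divide polynomial a by b over the rationals.

    Polynomials are coefficient lists, lowest degree first
    (so [1, 0, 2] means 1 + 2x^2). Returns (q, r) such that
    a == q*b + r and deg r < deg b; the zero polynomial is [].
    """
    r = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while b and b[-1] == 0:
        b.pop()
    while r and r[-1] == 0:
        r.pop()
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")

    q = [Fraction(0)] * max(len(r) - len(b) + 1, 1)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        # This division is the step that needs a field: the leading
        # coefficient of b must be invertible.
        coeff = r[-1] / b[-1]
        q[shift] = coeff
        for i, bc in enumerate(b):
            r[i + shift] -= coeff * bc
        # The leading term is now exactly zero, so the degree drops.
        while r and r[-1] == 0:
            r.pop()
    return q, r


# (x^3 + 2x + 1) divided by (x^2 + 1): quotient x, remainder x + 1.
q1, r1 = poly_divmod([1, 2, 0, 1], [1, 0, 1])

# X divided by 2: over Q the quotient is X/2 with remainder 0, but 1/2 is
# not an integer, so no such q and r exist in Z[X].
q2, r2 = poly_divmod([0, 1], [2])
```

The second call is the situation from the answer above: dividing X by the nonzero non-unit 2 only succeeds because the sketch silently moved from Z to Q.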
I have a function that takes a floating point number and returns a floating point number. It can be assumed that if you were to graph the output of this function it would be 'n' shaped, i.e. there would be a single maximum point, and no other points on the function with a zero slope. We also know that the input value that yields this maximum output will lie between two known points, perhaps 0.0 and 1.0.
I need to efficiently find the input value that yields the maximum output value to some degree of approximation, without doing an exhaustive search.
I'm looking for something similar to Newton's Method which finds the roots of a function, but since my function is opaque I can't get its derivative.
I would like to down-thumb all the other answers so far, for various reasons, but I won't.
An excellent and efficient method for minimizing (or maximizing) smooth functions when derivatives are not available is parabolic interpolation. It is common to write the algorithm so it temporarily switches to golden-section search when parabolic interpolation does not progress as fast as golden-section would; this combination is known as Brent's minimizer.
I wrote such an algorithm in C++. Any offers?
UPDATE: There is a C version of the Brent minimizer in GSL. The archives are here: ftp://ftp.club.cc.cmu.edu/gnu/gsl/ Note that it will be covered by some flavor of GNU "copyleft."
As I write this, the latest-and-greatest appears to be gsl-1.14.tar.gz. The minimizer is located in the file gsl-1.14/min/brent.c. It appears to have termination criteria similar to what I implemented. I have not studied how it decides to switch to golden section, but for the OP, that is probably moot.
UPDATE 2: I googled up a public domain Java version, translated from FORTRAN. I cannot vouch for its quality. http://www1.fpl.fs.fed.us/Fmin.java I notice that the hard-coded machine epsilon ("machine precision" in the comments) is 1/2 the value for a typical PC today. Change the value of eps to 2.22045e-16.
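For readers who want to see the fallback step on its own, here is a minimal sketch of plain golden-section search in Python. It is not Brent's method (there is no parabolic interpolation step) and not the GSL or Fmin code; the function name and the default tolerance are choices made for this illustration. It assumes the function has a single maximum on [a, b].

```python
import math


def golden_section_max(f, a, b, tol=1e-7):
    """Locate the maximizer of a unimodal function f on [a, b].

    The bracket shrinks by a factor of about 0.618 per iteration and
    needs only one new function evaluation each time.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Example: an 'n'-shaped function with its maximum at x = 0.5.
x_max = golden_section_max(lambda x: -x * (x - 1.0), 0.0, 1.0)
```

Golden-section search converges only linearly, which is why Brent-style minimizers prefer the parabolic step whenever it is making good progress.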
Edit 2: The method described in Jive Dadson's answer is a better way to go about this. I'm leaving my answer up since it's easier to implement, if speed isn't too much of an issue.
Use a form of binary search, combined with numeric derivative approximations.
Given the interval [a, b], let x = (a + b) /2
Let epsilon be something very small.
Is (f(x + epsilon) - f(x)) positive? If yes, the function is still growing at x, so you recursively search the interval [x, b]
Otherwise, search the interval [a, x].
There might be a problem if the max lies between x and x + epsilon, but you might give this a try.
Edit: The advantage to this approach is that it exploits the known properties of the function in question. That is, I assumed by "n"-shaped, you meant, increasing-max-decreasing. Here's some Python code I wrote to test the algorithm:
def f(x):
    return -x * (x - 1.0)

def findMax(function, a, b, maxSlope):
    x = (a + b) / 2.0
    e = 0.0001
    slope = (function(x + e) - function(x)) / e
    if abs(slope) < maxSlope:
        return x
    if slope > 0:
        return findMax(function, x, b, maxSlope)
    else:
        return findMax(function, a, x, maxSlope)
Typing findMax(f, 0, 3, 0.01) should return 0.504, as desired.
For optimizing a concave function, which is the type of function you are talking about, without evaluating the derivative I would use the secant method.
Given the two initial values x[0]=0.0 and x[1]=1.0 I would proceed to compute the next approximations as:
def next_x(x, xprev):
    return x - f(x) * (x - xprev) / (f(x) - f(xprev))
and thus compute x[2], x[3], ... until the change in x becomes small enough.
Edit: As Jive explains, this solution is for root finding which is not the question posed. For optimization the proper solution is the Brent minimizer as explained in his answer.
The Levenberg-Marquardt algorithm is a Newton-like optimizer. It has a C/C++ implementation, levmar, that doesn't require you to define the derivative function. Instead it evaluates the objective function in the current neighborhood to move toward the maximum.
BTW: this website appears to have been updated since I last visited it; I hope it's the same one I remember. Apparently it now also supports other languages.
Given that it's only a function of a single variable and has one extremum in the interval, you don't really need Newton's method. Some sort of line search algorithm should suffice. This wikipedia article is actually not a bad starting point, if short on details. Note in particular that you could just use the method described under "direct search", starting with the end points of your interval as your two points.
I'm not sure if you'd consider that an "exhaustive search", but it should actually be pretty fast I think for this sort of function (that is, a continuous, smooth function with only one local extremum in the given interval).
You could reduce it to a simple linear fit on the deltas, finding the place where it crosses the x axis. A linear fit can be done very quickly.
Or just take 3 points (left/top/right) and fit the parabola.
It depends mostly on the nature of the underlying relation between x and y, I think.
Edit: this applies in case you have an array of values, as the question's title states. When you have a function, use Newton-Raphson.
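The three-point parabola idea can be sketched in a few lines of Python. The helper name parabola_vertex is made up for this illustration; the formula is the standard closed form for the abscissa of the vertex of the parabola through three points, and it assumes the points are not collinear.

```python
def parabola_vertex(x1, y1, x2, y2, x3, y3):
    """Return the x position of the vertex of the parabola through
    (x1, y1), (x2, y2), (x3, y3)."""
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        raise ValueError("points are collinear; no unique parabola vertex")
    return x2 - 0.5 * num / den


def f(x):
    return -x * (x - 1.0)


# Left, middle and right samples of an 'n'-shaped function.
x_peak = parabola_vertex(0.0, f(0.0), 0.3, f(0.3), 1.0, f(1.0))
```

For an exactly quadratic function such as this f, one step lands on the maximum; for other smooth functions the step would be repeated with the three best points.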
I'm writing a program in Python and I need to find the derivative of a function (a function expressed as a string).
For example: x^2+3*x
Its derivative is: 2*x+3
Are there any scripts available, or is there something helpful you can tell me?
If you are limited to polynomials (which appears to be the case), there would basically be three steps:
Parse the input string into a list of coefficients to x^n
Take that list of coefficients and convert it into a new list of coefficients according to the rules for differentiating a polynomial.
Take the list of coefficients for the derivative and create a nice string describing the derivative polynomial function.
If you need to handle polynomials like a*x^15125 + x^2 + c, using a dict for the list of coefficients may make sense, but it requires a little more attention when iterating through the coefficients.
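The three steps above can be sketched as follows. This is only an illustration under narrow assumptions: integer coefficients, a single variable named x, non-negative integer exponents written as x^n, and no parentheses. The function names are invented here, and symbolic coefficients such as a or c are not handled. Coefficients are stored in a dict keyed by exponent, as suggested for sparse polynomials.

```python
def parse_poly(text):
    """Step 1: parse e.g. 'x^2+3*x-5' into {exponent: coefficient}."""
    coeffs = {}
    # Turn every '-' into '+-' so that splitting on '+' keeps the signs.
    text = text.replace(" ", "").replace("-", "+-")
    for term in text.split("+"):
        if not term:
            continue
        if "x" in term:
            coef_part, _, exp_part = term.partition("x")
            coef_part = coef_part.rstrip("*")
            # A bare 'x' or '-x' has an implicit coefficient of 1 or -1.
            coef = int(coef_part + "1") if coef_part in ("", "-") else int(coef_part)
            exp = int(exp_part.lstrip("^")) if exp_part else 1
        else:
            coef, exp = int(term), 0
        coeffs[exp] = coeffs.get(exp, 0) + coef
    return coeffs


def differentiate(coeffs):
    """Step 2: power rule, c*x^e -> (c*e)*x^(e-1); constants vanish."""
    return {e - 1: c * e for e, c in coeffs.items() if e != 0 and c != 0}


def format_poly(coeffs):
    """Step 3: turn {exponent: coefficient} back into a string."""
    out = ""
    for e in sorted(coeffs, reverse=True):
        c = coeffs[e]
        if c == 0:
            continue
        mag = abs(c)
        if e == 0:
            body = str(mag)
        else:
            var = "x" if e == 1 else f"x^{e}"
            body = var if mag == 1 else f"{mag}*{var}"
        if c < 0:
            out += "-" + body
        else:
            out += ("+" if out else "") + body
    return out or "0"


def derivative(text):
    return format_poly(differentiate(parse_poly(text)))


result = derivative("x^2+3*x")
```

For the example from the question, result is the string 2*x+3.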
sympy does it well.
You may find what you are looking for in the answers already provided. I, however, would like to give a short explanation on how to compute symbolic derivatives.
The business is based on operator overloading and the chain rule of derivatives. For instance, the derivative of v^n is n*v^(n-1)dv/dx, right? So, if you have v=3*x and n=3, what would the derivative be? The answer: if f(x)=(3*x)^3, then the derivative is:
f'(x)=3*(3*x)^2*(d/dx(3*x))=3*(3*x)^2*(3)=3^4*x^2
The chain rule allows you to "chain" the operation: each individual derivative is simple, and you just "chain" the complexity. Another example, the derivative of u*v is v*du/dx+u*dv/dx, right? If you get a complicated function, you just chain it, say:
d/dx(x^3*sin(x))
u=x^3; v=sin(x)
du/dx=3*x^2; dv/dx=cos(x)
d/dx(u*v)=v*du/dx+u*dv/dx=3*x^2*sin(x)+x^3*cos(x)
As you can see, differentiation is only a chain of simple operations.
Now, operator overloading.
If you can write a parser (try Pyparsing) then you can request it to evaluate both the function and derivative! I've done this (using Flex/Bison) just for fun, and it is quite powerful. For you to get the idea, the derivative is computed recursively by overloading the corresponding operator, and recursively applying the chain rule, so the evaluation of "*" would correspond to u*v for function value and u*der(v)+v*der(u) for derivative value (try it in C++, it is also fun).
So there you go, I know you don't mean to write your own parser - by all means use existing code (visit www.autodiff.org for automatic differentiation of Fortran and C/C++ code). But it is always interesting to know how this stuff works.
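Here is a minimal sketch of the operator-overloading idea in Python rather than C++ or Flex/Bison. The class name Dual and the helpers sin and derivative are made up for this illustration, and only +, *, constant powers and sin are covered. Each value carries its derivative along, and every overloaded operator applies the corresponding differentiation rule.

```python
import math


class Dual:
    """A value together with its derivative with respect to x."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative is 0.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (u*v)' = u*v' + v*u'
        other = Dual.lift(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + other.value * self.deriv)

    __rmul__ = __mul__

    def __pow__(self, n):
        # Power rule with chain rule, n constant: (v^n)' = n*v^(n-1)*v'
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)


def sin(u):
    # Chain rule: sin(u)' = cos(u)*u'
    u = Dual.lift(u)
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)


def derivative(f, x):
    """Evaluate f at x with dx/dx = 1 and read off the derivative."""
    return f(Dual(x, 1.0)).deriv


# The two examples from the text, evaluated at x = 2.
d_cube = derivative(lambda x: (3 * x) ** 3, 2.0)       # 3^4 * x^2 at x = 2
d_prod = derivative(lambda x: x ** 3 * sin(x), 2.0)    # 3x^2 sin x + x^3 cos x
```

Note that this computes derivative values at a point, not a symbolic expression; producing a string such as 2*x+3 needs the parse-tree style of approach instead.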
Cheers,
Juan
Better late than never?
I've always done symbolic differentiation in whatever language by working with a parse tree.
But I also recently became aware of another method using complex numbers.
The parse tree approach consists of translating the following tiny Lisp code into whatever language you like:
(defun diff (s x)
  (cond
    ((eq s x) 1)
    ((atom s) 0)
    ((or (eq (car s) '+) (eq (car s) '-))
     (list (car s)
           (diff (cadr s) x)
           (diff (caddr s) x)))
    ; ... and so on for multiplication, division, and basic functions
    ))
and following it with an appropriate simplifier, so you get rid of additions of 0, multiplying by 1, etc.
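Since the question is about Python, here is one possible translation of that idea, offered as a sketch rather than a faithful port. Expressions are nested tuples of the form (op, left, right), atoms are numbers or variable names, and only +, - and * are handled. The simplify function is a deliberately tiny stand-in for the "appropriate simplifier".

```python
def diff(s, x):
    """Differentiate expression tree s with respect to variable x."""
    if s == x:
        return 1
    if not isinstance(s, tuple):
        return 0  # a number or a different variable
    op, u, v = s
    if op in ("+", "-"):
        return (op, diff(u, x), diff(v, x))
    if op == "*":
        # Product rule: (u*v)' = u'*v + u*v'
        return ("+", ("*", diff(u, x), v), ("*", u, diff(v, x)))
    raise ValueError(f"unsupported operator: {op!r}")


def simplify(s):
    """Remove additions of 0 and multiplications by 0 or 1."""
    if not isinstance(s, tuple):
        return s
    op, u, v = s[0], simplify(s[1]), simplify(s[2])
    if op == "+":
        if u == 0:
            return v
        if v == 0:
            return u
    if op == "-" and v == 0:
        return u
    if op == "*":
        if u == 0 or v == 0:
            return 0
        if u == 1:
            return v
        if v == 1:
            return u
    return (op, u, v)


# x*x + 3*x, i.e. x^2 + 3*x
expr = ("+", ("*", "x", "x"), ("*", 3, "x"))
result = simplify(diff(expr, "x"))
```

The result is ("+", ("+", "x", "x"), 3), that is x + x + 3; collecting x + x into 2*x would need a stronger simplifier.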
But the complex method, while completely numeric, has a certain magical quality. Instead of programming your computation F in double precision, do it in double precision complex.
Then, if you need the derivative of the computation with respect to variable X, set the imaginary part of X to a very small number h, like 1e-100.
Then do the calculation and get the result R.
Now real(R) is the result you would normally get, and imag(R)/h = dF/dX
to very high accuracy!
How does it work? Take the case of multiplying complex numbers:
(a+bi)(c+di) = ac + i(ad+bc) - bd
Now suppose the imaginary parts are all zero, except we want the derivative with respect to a.
We set b to a very small number h. Now what do we get?
(a+hi)(c) = ac + hci
So the real part of this is ac, as you would expect, and the imaginary part, divided by h, is c, which is the derivative of ac with respect to a.
The same sort of reasoning seems to apply to all the differentiation rules.
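A minimal sketch of the complex-step trick in Python, using the standard cmath module. The function names are invented for this illustration, and the computation F must be written so that it accepts complex arguments (for example cmath.sin instead of math.sin).

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-100):
    """Approximate df/dx at real x by perturbing the imaginary part."""
    return f(complex(x, h)).imag / h


def F(z):
    # x^3 * sin(x), written with operations that work on complex numbers.
    return z * z * z * cmath.sin(z)


x0 = 2.0
approx = complex_step_derivative(F, x0)
# Analytic derivative for comparison: 3x^2 sin x + x^3 cos x
exact = 3 * x0 ** 2 * math.sin(x0) + x0 ** 3 * math.cos(x0)
```

Unlike a finite difference, no subtraction of nearly equal numbers takes place, which is why h can be taken so small without losing accuracy.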
Symbolic Differentiation is an impressive introduction to the subject, at least for a non-specialist like me :) The code is written in C++, btw.
Look up automatic differentiation. There are tools for Python. Also, this.
If you are thinking of writing the differentiation program from scratch, without utilizing other libraries as help, then the algorithm/approach of computing the derivative of any algebraic equation I described in my blog will be helpful.
You can try creating a class that will represent a limit rigorously and then evaluate it for (f(x)-f(a))/(x-a) as x approaches a. That should give a pretty accurate value of the limit.
If you're using a string as input, you can split it into individual terms using the + or - characters as delimiters. You can then apply the power rule to each term: x^3 gives 3x^2, and for a more complicated term like a/(x^3), i.e. a*x^(-3), you treat the other variables as constants, so differentiating gives -3a/(x^4). The power rule alone should be enough; however, it will require extensive use of factorization.
Unless you use a ready-made library, differentiating is quite complex, because you need to parse and handle functions and expressions.
Differentiation by itself is an easy task, since it's mechanical and can be done algorithmically, but you need a basic structure to store a function.