Recursive hypothesis-building with ambiguities - what's it called?

There's a problem I've encountered a lot (in the broad fields of data analysis or AI). However, I can't name it, probably because I don't have a formal CS background. Please bear with me, I'll give two examples:
Imagine natural language parsing:
The flower eats the cow.
You have a program that takes each word, and determines its type and the relations between them. There are two ways to interpret this sentence:
1) flower (substantive) -- eats (verb) --> cow (object)
using the usual SVO word order, or
2) cow (substantive) -- eats (verb) --> flower (object)
using a more poetic word order. The program would rule out other possibilities, e.g. "flower" as a verb, since it follows "the". It would then rank the remaining possibilities: 1) has a more natural word order than 2), so it gets more points. But once you include the world knowledge that flowers can't eat cows, 2) still wins. So it might return both hypotheses, giving 1) a score of 30 and 2) a score of 70.
Then it remembers both hypotheses and continues parsing the text, branching off: one branch assumes 1), the other 2). If a branch reaches a contradiction, or a ranking of ~0, it is discarded. In the end it again presents ranked hypotheses, but for the whole text.
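To make the scoring concrete, here is a minimal Python sketch of the sentence example. The weights and the can_eat() rule are invented purely for illustration; a real system would learn or tune them.

# Hypothetical sketch: ranking the two readings of "The flower eats the cow".
# Weights and the can_eat() rule are made up for illustration.

def can_eat(eater, food):
    # toy world-knowledge rule: cows eat flowers, flowers don't eat cows
    return (eater, food) == ("cow", "flower")

# each reading: (subject, verb, object, follows_usual_SVO_order)
readings = [
    ("flower", "eat", "cow", True),    # reading 1: surface SVO
    ("cow", "eat", "flower", False),   # reading 2: poetic OVS
]

def score(reading):
    subj, verb, obj, svo = reading
    s = 1.0
    s *= 0.8 if svo else 0.4                   # prefer the usual word order
    s *= 1.0 if can_eat(subj, obj) else 0.2    # penalize world-knowledge violations
    return s

for r in sorted(readings, key=score, reverse=True):
    print(r, round(score(r), 2))
# reading 2 ("cow eats flower") ends up on top, much like the 70/30 split above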
For a different example, imagine optical character recognition:
[ASCII sketch: two vertical strokes with a smudged horizontal stroke between them, readable either as an "H" or as "Il" / "ll" plus an artifact]
I could look at the strokes and say, sure this is an "H". After identifying the H, I notice there are smudges around it, and give it a slightly poorer score.
Alternatively, I could run my smudge recognition first, and notice that the horizontal line looks like an artifact. After removal, I recognize that this is ll or Il, and give it some ranking.
After processing the whole image, it can be Hlumination, lllumination or Illumination. Using a dictionary and the total ranking, I decide that it's the last one.
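A rough Python sketch of that final dictionary step (the per-character scores, the tiny word list and the dictionary boost are all invented numbers, just to show the ranking mechanics):

import itertools

# hypothetical per-position hypotheses with scores from the stroke/smudge analysis
first_char = [("H", 0.4), ("Il", 0.3), ("ll", 0.3)]
rest = [("lumination", 1.0)]

dictionary = {"illumination"}   # toy dictionary

candidates = []
for (c, c_score), (r, r_score) in itertools.product(first_char, rest):
    word = c + r
    score = c_score * r_score
    if word.lower() in dictionary:
        score *= 10              # strong boost for dictionary hits
    candidates.append((word, score))

for word, score in sorted(candidates, key=lambda t: t[1], reverse=True):
    print(word, score)
# "Illumination" (built from the "Il" hypothesis) wins once the dictionary is consulted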
The general problem is always some kind of parsing / understanding. Examples:
Natural languages or ambiguous languages
OCR
Path finding
Dealing with ambiguous or incomplete user input - which interpretations make sense, which is the most plausible?
It's recursive.
It can bail out early (when a branch / interpretation doesn't make sense, or will certainly end up with a score of 0). So it's probably some kind of backtracking.
It keeps all options in mind in light of ambiguities.
It's based on simple rules at the bottom, e.g. can_eat(cow, flower) = true.
It keeps a plausibility ranking of interpretations.
It's recursive on a meta level: It can fork / branch off into different 'worlds' where it assumes different hypotheses when dealing with the next part of data.
It'll forward the individual rankings, probably using Bayesian probability, to dependent hypotheses (a sketch of this bookkeeping follows after this list).
In practice, there will be methods to train this thing, determine ranking coefficients, and there will be cutoffs if the tree becomes too big.
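Putting those bullet points together, the bookkeeping could look roughly like this beam-search-style skeleton. interpretations_of(), the beam width and the score threshold are placeholders, not any particular library's API.

# Hypothetical skeleton: each branch carries a running score, local hypothesis
# scores are multiplied in, hopeless branches are dropped early, and the set of
# branches is cut off when it grows too large.

def interpretations_of(chunk):
    # placeholder for whatever proposes scored local hypotheses (parses, characters, ...)
    return [((chunk, "reading A"), 0.7), ((chunk, "reading B"), 0.3)]

def analyse(chunks, beam_width=10, min_score=1e-6):
    branches = [([], 1.0)]                      # (choices so far, running score)
    for chunk in chunks:
        next_branches = []
        for history, branch_score in branches:
            for hypothesis, p in interpretations_of(chunk):
                s = branch_score * p            # forward the ranking multiplicatively
                if s > min_score:               # bail out of hopeless branches early
                    next_branches.append((history + [hypothesis], s))
        next_branches.sort(key=lambda b: b[1], reverse=True)
        branches = next_branches[:beam_width]   # cutoff if the tree becomes too big
    return branches                             # ranked hypotheses for the whole input

print(analyse(["chunk 1", "chunk 2"]))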
I have no clue what this is called. One might guess 'decision tree' or 'recursive descent', but I know those terms mean different things.
I know Prolog can solve simple cases of this, like genealogies and finding out who is whose uncle. But you have to give it all the data in code, and it doesn't seem convenient or powerful enough for my real-life cases.
I'd like to know: what is this problem called? Are there common strategies for dealing with it? Is there good literature on the topic? Are there libraries, ideally for C(++) or Python, where you can just define a bunch of rules and it works out all the rankings and hypotheses?

I don't think there is one answer that fits all the bullet points you have. But I hope my links will lead you closer to an answer or might give you a different question.
I think the closest answer is a Bayesian network, since, as I understand it, you have probabilities affecting each other; it is also related to conditional probability and fuzzy logic.
You also describe a bit of genetic programming, as well as artificial neural networks.
I can name-drop some more topics which might be related:
http://en.wikipedia.org/wiki/Rule-based_programming
http://en.wikipedia.org/wiki/Expert_system
http://en.wikipedia.org/wiki/Knowledge_engineering
http://en.wikipedia.org/wiki/Fuzzy_system
http://en.wikipedia.org/wiki/Bayesian_inference

Related

Will Dijkstra or A* work correctly with a cost that is a function of the full path?

What I'm considering is this: when a node becomes the current node, compute "on the fly" the cost to each neighbor, where the cost is a function of the complete path to arrive at the current node. I can't think how this would break the assumptions of the algorithm, but I have a feeling it might.
I'm doing the on the fly computation for storage reasons anyway, but the new thing would be having the costs be a function of more than the two nodes involved. Could it work?
As far as I can see, it doesn't break the assumptions of the Dijkstra algorithm, i.e. you can still use it. However, doing so requires you to completely reformulate your graph.
In more detail: you can no longer use simple node indices {1, ..., N}; your state needs to be something like {(1, all-ways-to-get-there), ..., (N, all-ways-to-get-there)}. This brings in exponential scaling.
The reason is that the Dijkstra algorithm -- like dynamic programming -- relies on the problem being splittable into parts that can be solved on their own, which is not the case here.
Here is an example of why it can't be done by "normal" Dijkstra: say the function that assigns a cost to a given path {node_1, node_2, ..., node_N} is called f and is arbitrary. Then it is completely irrelevant what your current best cost or best path {node_1, ..., node_{N-1}} is, as you can't infer anything from it -- all you can do is work out every possible path, which grows exponentially and is hopeless for large graphs.
If your function fulfills some requirements, however, there might be better options. For example, in the simplest case where your function is linear, f({path1} + {path2}) = f({path1}) + f({path2}), the "original" Dijkstra algorithm is recovered.
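For reference, here is a minimal Dijkstra sketch in Python (with a toy graph, purely illustrative). The marked line is exactly where the additivity assumption enters; a cost that depends on the whole path has nowhere to plug in there without expanding the state as described above.

import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, edge_cost), ...]}
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph[u]:
            nd = d + w                    # <-- additivity: best cost so far plus one edge weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}   # toy graph
print(dijkstra(g, "a"))   # {'a': 0, 'b': 1, 'c': 2}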
If it's possible to pre-compute the cost of travelling between each pair of nodes, then there is absolutely no reason you can't use Dijkstra or A*, as long as none of your edge weights can be negative.
If it's not possible to pre-compute the cost, then it's likely that you're doing something wrong in your pathfinding, as it likely depends on the state of the search. :)

R Constrained Combination generation

Say I have a set of strings:
x=c("a1","b2","c3","d4")
If I have a set of rules that must be met:
if "a1" and "b2" are together in a group, then "c3" cannot be in that group.
if "d4" and "a1" are together in a group, then "b2" cannot be in that group.
I was wondering what sort of efficient algorithms are suitable for generating all combinations that meet those rules? What research, papers or anything else talk about this type of constrained combination generation problem?
In the above problem, assume it's combn(x, 3), i.e. all groups of size 3.
I don't know anything about R, so I'll just address the theoretical aspect of this question.
First, the constraints are really boolean predicates of the form "a1 ^ b2 -> ¬c3" and so on. That means that all valid combinations can be represented by one binary decision diagram, which can be created by taking each of the constraints and ANDing them together. In theory you might make an exponentially large BDD that way (that usually doesn't happen, but depends on the structure of the problem), but that would mean that you can't really list all combinations anyway, so it's probably not too bad.
For example the BDD generated for those two constraints would be (I think - not tested - just to give an idea)
But since this is really about a family of sets, a ZDD probably works even better. The difference, roughly, between a BDD and a ZDD is that a BDD compresses nodes that have equal sub-trees (in the total tree of all possibilities), while the ZDD compresses nodes where the solid edge (i.e. "set this variable to 1") goes to False. Both re-use equal sub-trees and thus form a DAG.
The ZDD of the example would be (again not tested)
I find ZDDs a bit easier to manipulate in code, because any time a variable can be set, it will appear in the ZDD. In contrast, in a BDD, "skipped" nodes have to be detected, including "between the last node and the leaf", so for a BDD you have to keep track of your universe. For a ZDD, most operations are independent of the universe (except complement, which is rarely needed in the family-of-sets scenario). A downside is that you have to be aware of the universe when constructing the constraints, because they have to contain "don't care" paths for all the variables not mentioned in the constraint.
You can find more information about both BDDs and ZDDs in The Art of Computer Programming volume 4A, chapter 7.1.4, there is an old version available for free here.
These methods are particularly nice for representing large numbers of such combinations, and for manipulating them in some way before generating all the possibilities. So this will also work when there are many items and many constraints (such that the final count of combinations is not too large), (usually) without creating intermediate results of exponential size.
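For an instance as small as the one in the question, though, the constraints can simply be ANDed together as predicates and checked by brute force; the BDD/ZDD machinery only starts to pay off as the universe and the constraint set grow. A sketch in Python rather than R, purely to illustrate the predicate view:

from itertools import combinations

x = ["a1", "b2", "c3", "d4"]

# the two constraints as boolean predicates over a candidate group
constraints = [
    lambda g: not ({"a1", "b2"} <= g and "c3" in g),   # a1 and b2 together -> no c3
    lambda g: not ({"d4", "a1"} <= g and "b2" in g),   # d4 and a1 together -> no b2
]

valid = [c for c in combinations(x, 3)
         if all(rule(set(c)) for rule in constraints)]
print(valid)   # [('a1', 'c3', 'd4'), ('b2', 'c3', 'd4')]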

How to quantitatively measure how simplified a mathematical expression is

I am looking for a simple method to assign a number to a mathematical expression, say between 0 and 1, that conveys how simplified that expression is (being 1 as fully simplified). For example:
eval('x+1') should return 1.
eval('1+x+1+x+x-5') should return some value less than 1, because it is far from being simple (i.e., it can be further simplified).
The parameter of eval() could be either a string or an abstract syntax tree (AST).
A simple idea that occurred to me was to count the number of operators (?)
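As a rough illustration of the operator-counting idea (a heuristic only, not yet the 0-to-1 measure asked for), one could walk the expression's AST and count operator nodes, for instance with Python's ast module:

import ast

def count_operators(expr: str) -> int:
    # count binary and unary operator nodes in an expression
    tree = ast.parse(expr, mode="eval")
    return sum(isinstance(node, (ast.BinOp, ast.UnaryOp)) for node in ast.walk(tree))

print(count_operators("x+1"))          # 1
print(count_operators("1+x+1+x+x-5"))  # 5

Turning such a count into a score between 0 and 1 would still require knowing the operator count of the fully simplified form, which is exactly the hard part the answer below gets at.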
EDIT: Let simplified be equivalent to how close a system is to the solution of a problem. E.g., given an algebra problem (i.e. limit, derivative, integral, etc), it should assign a number to tell how close it is to the solution.
The closest metaphor I can come up with is how a maths professor would look at an incomplete problem and mentally assess it in order to tell how close the student is to the solution. Like in a maths exam, where the student didn't finish a problem worth 20 points, but the professor assigns 8 out of 20. Why would he come up with 8/20, and can we program such a thing?
I'm going to break a Stack Overflow rule and post this as an answer instead of a comment, not only because I'm pretty sure the answer is that you can't (at least, not the way you imagine), but also because I believe it can be educational to a certain degree.
Let's assume that a criterion of simplicity can be established (akin to a normal form). It seems to me that you are very close to trying to solve something analogous to the Entscheidungsproblem or the halting problem. I doubt that, in a rule system complex enough for typical algebra, you can find a method that gives a correct and definitive answer to the number of steps of a series of term reductions (ipso facto an arbitrary-length computation) without actually performing it. Such an answer would imply knowing in advance whether that computation terminates, and so contradict the fact that automatic theorem proving is, for any sufficiently powerful logic capable of representing arithmetic, an undecidable problem.
In the given example, the teacher is either actually performing that computation mentally (going step by step, applying his own sequence of rules), or giving an estimate based on his experience. But there's no generic algorithm that guarantees his sequence of steps is the simplest possible, nor that his resulting expression is the simplest one (except for trivial expressions), and hence any quantification of "distance" to a solution is meaningless.
If all this weren't true, your problem would be simple: you'd know the number of steps, you'd know how many steps you've taken so far, and you'd divide the latter by the former ;-)
Now, returning to the criterion of simplicity, I also advise you to take a look at Hilbert's 24th problem, which specifically asked for a "Criteria of simplicity, or proof of the greatest simplicity of certain proofs.", and at the slightly related topic of proof compression. If you are philosophically inclined to understand these subjects further, I would suggest reading the classic Gödel, Escher, Bach.
Further notes: to understand why, consider a well-known mathematical object, the Mandelbrot set. Each pixel's colour is calculated by determining whether the solution to the equation z(n+1) = z(n)^2 + c for a specific c is bounded; that is, "a complex number c is part of the Mandelbrot set if, when starting with z(0) = 0 and applying the iteration repeatedly, the absolute value of z(n) remains bounded however large n gets." Despite the equation being extremely simple (square a number and add a constant), there is absolutely no way to know whether the orbit will remain bounded without actually performing an infinite number of iterations, or until a cycle is found (disregarding complex heuristics). In this sense, every fractal picture out there is a rough approximation that typically uses an escape-time algorithm as a heuristic to provide an educated guess as to whether the solution will be bounded or not.
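For concreteness, the escape-time heuristic mentioned above amounts to something like this (the iteration cap is an arbitrary choice, which is precisely the point):

def escapes(c, max_iter=1000):
    # escape-time heuristic: guess whether the orbit of z -> z^2 + c stays bounded
    z = 0 + 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:        # once |z| > 2 the orbit is guaranteed to diverge
            return True       # definitely unbounded
    return False              # probably bounded -- iterating alone can never be sure

print(escapes(1 + 0j))    # True: the orbit 0, 1, 2, 5, ... blows up
print(escapes(-1 + 0j))   # False: the orbit cycles 0, -1, 0, -1, ... so -1 is in the set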

Computer Adaptive Testing 1PL Ability Calculation Math: How to implement?

Preamble:
I have been implementing my own CAT system. The resources that have helped me most are these:
An On-line, Interactive, Computer Adaptive Testing Tutorial, 11/98 -- A good explanation of how to pick a test question based on which one would return the most information. Fascinating idea, really. The equations are not illustrated with examples, however... but there is a simulation to play with. Unfortunately the simulation is down!
Computer-Adaptive Testing: A Methodology Whose Time Has Come -- This has similar equations, although it does not use IRT or the Newton-Raphson Method. It is also Rasch, not 3PL. It does, however, have a BASIC program that is far more explicit than the usual equations that are cited. I have converted portions of the program in order to get my own system to experiment with, but I would prefer to use 1PL and/or 3PL.
Rasch Dichotomous Model vs. One-parameter Logistic Model -- This clears some stuff up, but perhaps only makes me more dangerous at this stage.
Now, the question.
I want to be able to measure someone's ability level based on a series of questions that are rated at a 1PL difficulty level and of course the person's answers and whether or not they are correct.
I first have to have a function that calculates the probability of a correct response to a given item. This equation gives the probability function for 1PL:
Probability correct = e^(ability - difficulty) / (1+ e^(ability - difficulty))
I'll go with this one arbitrarily for now. Using an ability estimate of 0, we get the following probabilities:
difficulty --> P(correct | ability = 0)
-0.3 --> 0.574442516811659
-0.2 --> 0.549833997312478
-0.1 --> 0.52497918747894
0 --> 0.5
0.1 --> 0.47502081252106
0.2 --> 0.450166002687522
0.3 --> 0.425557483188341
This makes sense. A problem targeting their level is 50/50... and the questions are harder or easier depending on which direction you go. The harder questions have a smaller chance of coming out correct.
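In code, the 1PL probability function is a one-liner; this sketch reproduces the table above:

import math

def p_correct(ability, difficulty):
    # 1PL / Rasch probability of a correct response
    return math.exp(ability - difficulty) / (1 + math.exp(ability - difficulty))

for difficulty in (-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3):
    print(difficulty, p_correct(0, difficulty))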
Now... consider a test taker who has answered five questions at these difficulties: -0.1, 0, 0.1, 0.2, 0.1. Assume they got them all correct except the one at difficulty 0.2. Assuming an ability level of 0... I would want some equations to indicate that this person is slightly above average.
So... how to calculate that with 1PL? This is where it gets hard.
Looking at the equations on the various pages... I will start with an assumed ability level... and then gradually adjust it after each question, more or less like the following:
Starting Ability: B0 = 0
Ability after problem 1: B1 = B0 + [summations and function evaluated for item 1 at ability B0]
Ability after problem 2: B2 = B1 + [summations and functions evaluated for items 1-2 at ability B1]
Ability after problem 3: B3 = B2 + [summations and functions evaluated for items 1-3 at ability B2]
Ability after problem 4: B4 = B3 + [summations and functions evaluated for items 1-4 at ability B3]
Ability after problem 5: B5 = B4 + [summations and functions evaluated for items 1-5 at ability B4]
And so on.
Just reading papers on this, this is the gist of what the algorithm should be doing. But there are so many different ways to do this. The behaviour of my code is clearly wrong, as I get division-by-zero errors... so this is where I get lost. I've messed with information functions and taken derivatives, but my college-level math is not cutting it.
Can someone explain to me how to do this part? The literature I've read is short on examples, and the descriptions of the math appear incomplete to me. I suppose I'm asking how to do this with a 3PL model that assumes c is always zero and a is always 1.7 (or maybe -1.7 -- whatever works). I was trying to get to 1PL somehow anyway.
Edit: A visual guide to item response theory is the best explanation of how to do this I've seen so far, but the text gets confusing at the most critical point. I'm closer to getting this, but I'm still not understanding something. Also... the pattern of summations and functions isn't in this text like I expected.
How to do this:
This is an inefficient solution, but it works and is reasonably intuitive.
The last link I mentioned in the edit explains this.
Given a probability function, a set of question difficulties, and a corresponding set of evaluations -- i.e., whether or not they got each one correct.
With that, I can get a series of functions that will tell you the chance of their giving that exact response. Now... multiply all of those functions together.
We now have a big mess! But it's a single function in terms of the unknown ability variable that we want to find.
Next... run a slew of numbers through this function. Whatever returns the maximum value is the test taker's ability level. This can be used to either determine the standard error or to pick the next question for computer adaptive testing.
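A minimal sketch of that brute-force maximum-likelihood search, using the five responses from the question (the grid range and step size are arbitrary choices):

import math

def p_correct(ability, difficulty):
    return math.exp(ability - difficulty) / (1 + math.exp(ability - difficulty))

def likelihood(ability, items):
    # items: list of (difficulty, answered_correctly); multiply the per-item chances
    result = 1.0
    for difficulty, correct in items:
        p = p_correct(ability, difficulty)
        result *= p if correct else (1 - p)
    return result

items = [(-0.1, True), (0.0, True), (0.1, True), (0.2, False), (0.1, True)]

grid = [i / 100 for i in range(-400, 401)]          # abilities from -4.00 to 4.00
best = max(grid, key=lambda b: likelihood(b, items))
print(best)   # roughly 1.45: above the average ability of 0 for this response pattern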

Function point to kloc ratio as a software metric... the "Name That Tune" metric?

What do you think of using a metric of function point to lines of code as a metric?
It makes me think of the old game show "Name That Tune". "I can name that tune in three notes!" I can write that functionality in 0.1 klocs! Is this useful?
It would certainly seem to promote library usage, but is that what you want?
I think it's a terrible idea. Just as bad as paying programmers by lines of code that they write.
In general, I prefer concise code over verbose code, but only as long as it still expresses the programmers' intention clearly. Maximizing function points per kloc is going to encourage everyone to write their code as briefly as they possibly can, which goes beyond concise and into cryptic. It will also encourage people to join adjacent lines of code into one line, even if said joining would not otherwise be desirable, just to reduce the number of lines of code. The maximum allowed line length would also become an issue.
KLOC is tolerable if you strictly enforce code standards, kind of like using page requirements for a report: no putting five statements on a single line or removing most of the whitespace from your code.
I guess one way you could decide how effective it is for your environment is to look at several different applications and modules, get a rough estimate of the quality of the code, and compare that to the size of the code. If you can demonstrate that code quality is consistent within your organization, then KLOC isn't a bad metric.
In some ways, you'll face the same battle with any similar metric. If you count feature or function points, or simply features or modules, you'll still want to weight them in some fashion. Ultimately, you'll need some sort of subjective supplement to the objective data you'll collect.
"What do you think of using a metric of function point to lines of code as a metric?"
I don't get the question. The above ratio is -- for a given language and team -- a simple statistical fact, and it tends toward a mean value with a small standard deviation.
There are lots of degrees of freedom: how you count function points, what language you're using, how (collectively) clever the team is. If you don't change those things, the value stays steady.
After a few projects together, you have a solid expectation that 1200 function points will be 12,000 lines of code in your preferred language/framework/team organization.
KSloc / FP is a bare statistical observation. Clearly, there's something else about this that's bothering you. Could you be more specific in your question?
The metric of Function Points to Lines of Code is actually used to generate the language level charts (actually, it is Function Points to Statements) to give an approximate sense of how powerful a programming language is. Here is an example: http://web.cecs.pdx.edu/~timm/dm/functionpoints.html
I wouldn't recommend using that ratio for anything else, except high level approximations like the language level chart.
Promoting library usage is a good thing, but the other thing to keep in mind is you will lose in the ratio when you are building the libraries, and will only pay it off with dividends of savings over time. Bean-counters won't understand that.
I personally would like to see a Function point to ABC metric ratio -- as I am curious about how the ABC metric (which indicates size and includes complexity as part of the info) would relate - perhaps linear, perhaps exponential, etc... www.softwarerenovation.com/ABCMetric.pdf
All metrics suck. My theory has always been that if you have to have them, then use the easiest thing you can to gather them and be done with it and onto important things.
That generally means something along the lines of
grep -c ";" *.h *.cpp | awk -F: '/:/ {x += $2} END {print x}'
If you are looking for a "metric" to track code efficiency, don't. If you insist, again try something stupid but easy, like source file size (see the grep command above, without the awk pipe) or McCabe complexity (with a counter program).

Resources