Drools for Morphological Analysis - rules

Is Drools suitable for writing rules for stemming and/or POS tagging? Suggestions for a better rule language are welcome. I have read many papers in this field that use the rule-based approach, but none of them mentioned what library or framework was used to write the rules.
My rules are like the following:
if (length = 3 & first_letter in group1 and second_letter in group2) then ...
if (length = 3 & first_letter in group1 and second_letter not_in group2) then ...
if (length = 3 & first_letter not_in group1 and second_letter in group2) then ...
if (length = 3 & first_letter not_in group1 and second_letter not_in group2) then ...
if (length = 4...
... and so on.
The problem is that these rules are too many to handle. Imagine that there are ten letter-groups, and that there is a case for each letter belonging to each group. I could easily have over a thousand rules to classify a word correctly. I wrote 30 of those rules in plain C# code and that was enough for me to see how inefficient this approach was. I already have my rules organized as a tree on paper. I just need the right framework to insert, represent, tweak, and test them.
I hope my question is clear. Thank you.

You can certainly use Drools for that. Drools can handle many thousands of rules (I've seen kbases with 30k+ rules), much more complex than the ones you present above, without breaking a sweat.
The main issue I see is not the runtime, but the maintenance of your rules. Writing them manually seems like a lot of work for your use case, no matter which language/engine you choose. Maybe you can use a decision table to define your rules, as that usually involves a lot less "typing"? Or maybe you can have a script generate all the rules for you? Drools supports both.
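To illustrate the "have a script generate the rules" idea, here is a minimal sketch in Python. It only emits plain text in the question's own notation, not actual Drools syntax; the names POSITIONS and generate_rules are hypothetical, and the letter groups are placeholders taken from the question's examples.

```python
from itertools import product

# Hypothetical (letter position, letter group) pairs, mirroring the question.
POSITIONS = [("first_letter", "group1"), ("second_letter", "group2")]


def generate_rules(length, positions):
    """Return one rule string per in/not_in combination of the positions."""
    rules = []
    for choices in product(("in", "not_in"), repeat=len(positions)):
        conditions = [f"length = {length}"]
        for (letter, group), op in zip(positions, choices):
            conditions.append(f"{letter} {op} {group}")
        rules.append("if (" + " and ".join(conditions) + ") then ...")
    return rules


if __name__ == "__main__":
    for rule in generate_rules(3, POSITIONS):
        print(rule)
```

With two positions this prints the four length-3 rules from the question; adding positions or groups to the table grows the rule set without any extra typing.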


Is there any way to do example based programming? [closed]

Is it possible to automatically generate a program/function by writing a certain number of examples which show the before and after? How many examples would be needed to ensure correctness and lack of holes? What is the name of such an automatic process?
It's called program synthesis or perhaps guessing. Your question reminds me of the 2013 ICFP programming contest where you were supposed to guess programs and all you had was the output for your chosen test input in a very simple functional language.
So imagine you wanted to make the identity function:
(define-by-example identity
(0 -> 0)
(1 -> 1))
Is something missing? What if I told you to make a function that returns its own value for 0 and 1 only? How would you make that different from the above example? What if it's a non-linear transformation? E.g., how many examples does it take to get a polynomial correct?
I guess we're back to something like this:
(define-by-example fib
(0 -> 0)
(1 -> 1)
(n -> (+ (fib (- n 1)) (fib (- n 2)))))
But many languages have exactly this. Even Scheme's dirty cousin Racket has this:
#!racket
(define (fib n)
  (match n
    [0 0]
    [1 1]
    [n (+ (fib (- n 1)) (fib (- n 2)))]))
This is the Hello World of Haskell so we better write that next :-)
-- haskell
fib 0 = 0
fib 1 = 1
fib n = fib(n - 1) + fib(n - 2)
Now there are smart people out there who try to find the best method of transferring the idea down to a syntax that is clear for the computer and us. I'm sure the essence of "by example" is good, but there needs to be a way of explaining the correlations too, and I cannot think of a better way to do that than maths, or code where maths doesn't apply.
This is a subdomain of machine learning called supervised learning. The most popular method to solve these problems is by using neural networks. Another method is genetic programming.
In the general case it is not possible to guarantee correctness, no matter how large the training set (number of examples) is. But in a typical application this is not required. The training is considered successful if a high enough percentage of the results is correct.
Is it possible to automatically generate a program/function by writing a certain amount of examples which show the before and after?
I'd say yes, but ...
... it's not exactly that easy, and certainly not that generally applicable. There are basically two "approaches" that I know of:
a) Try to generate a program
Just pack together program code (most probably instructions from some instruction set) until you get the desired results. This can be done with brute force, but then it's hardly possible to synthesize non-trivial functions, let alone complete programs. Alternatively, use techniques from your favorite artificial intelligence tool set, like hill climbing, simulated annealing, genetic programming and all the rest.
The only reference that I know of is the "Superoptimizer" by Henry Massalin, which tries to generate a function that computes the same as a given function, just with fewer instructions.
It's probably better to use some higher level representation of "computation" than assembler instructions (probably the Lisp AST?), but that's just my guess.
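A minimal sketch of the brute-force variant of approach a), in Python. It enumerates small arithmetic expressions and returns the first one that reproduces every example. The tiny grammar (x, the constants 0 to 2, + and *) and the names LEAVES, combine and synthesize are assumptions made for illustration; real synthesizers use far richer representations and smarter search.

```python
from itertools import product

# Candidate expressions are (source_text, function) pairs. The grammar is an
# assumption chosen to keep the search space tiny.
LEAVES = [("x", lambda x: x)] + [
    (str(c), (lambda c: lambda x: c)(c)) for c in (0, 1, 2)
]


def combine(left, right):
    """Yield the sum and the product of two candidate expressions."""
    (left_text, left_fn), (right_text, right_fn) = left, right
    yield (f"({left_text} + {right_text})", lambda x: left_fn(x) + right_fn(x))
    yield (f"({left_text} * {right_text})", lambda x: left_fn(x) * right_fn(x))


def synthesize(examples, max_depth=3):
    """Return the text of the first expression matching all examples, or None."""
    level = list(LEAVES)
    for depth in range(max_depth):
        for text, fn in level:
            if all(fn(x) == y for x, y in examples):
                return text
        if depth + 1 < max_depth:
            # Grow the candidate pool by combining every pair found so far.
            level = level + [
                expr
                for left, right in product(level, repeat=2)
                for expr in combine(left, right)
            ]
    return None


if __name__ == "__main__":
    print(synthesize([(0, 0), (1, 1)]))
    print(synthesize([(0, 1), (1, 2), (2, 5)]))
```

For the examples (0 -> 0) and (1 -> 1) the search stops at x, although (x * x) fits them equally well, which is exactly the ambiguity raised above. The candidate pool also grows roughly quadratically with each level, which is why brute force stalls on anything non-trivial.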
b) Write a meta program that "somehow" learns to behave like the desired program
This is actually very common nowadays in the form of neural networks, e.g. for image or voice recognition. Here you don't try to get a program that does what you want (e.g. "recognize birds in arbitrary images") but rather write a general program that is capable of learning to "behave differently" and train that in order to behave as you want.
Note that there's a lot of effort going into understanding what these meta programs actually do after they learned their intended function. Just taking the first relevant result from a Google search is already an interesting read.

Most efficient way to print differences of two arrays?

Recently, a colleague of mine asked me how he could test the equalness of two arrays. He had two sources of Address and wanted to assert that both sources contained exactly the same elements, although order didn't matter.
Both using Array or like List in Java, or IList would be okay, but since there could be two equal Address objects, things like Sets can't be used.
In most programming languages, a List already has an equals method doing the comparison (assuming that the collection was ordered before doing it), but there is no information about the actual differences; only that there are some, or none.
The output should inform about elements that are in one collection but not in the other, and vice-versa.
An obvious approach would be to iterate through one of the collections (if one of them is), and just call contains(element) on the other one, then do it the other way around afterwards. Assuming a complexity of O(n) for contains, that would result in about 2n² comparisons, i.e. O(n²), if I'm correct.
Is there a more efficient way of getting the information "A1 and A2 aren't in List1, A3 and A4 aren't in List2"? Are there data structures better suited for doing this job than lists? Is it worth sorting the collections first and using a custom, binary-search contains?
The first thing that comes to mind is using set difference
In pseudo-python
addr1 = set(originalAddr1)
addr2 = set(originalAddr2)
in1notin2 = addr1 - addr2
in2notin1 = addr2 - addr1
allDifferences = in1notin2 | in2notin1
From here you can see that set difference is O(len(set)) and union is O(len(set1) + len(set2)) giving you a linear time solution with this python specific set implementation, instead of quadratic as you suggest.
I believe other popular languages tend to implement these types of data structures pretty much the same way, but I can't really be sure about this.
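Since the question notes that two equal Address objects may occur, plain sets would hide duplicates. A minimal sketch of the same idea with a multiset, using Python's collections.Counter; the function name multiset_diff is hypothetical, and the elements are assumed to be hashable with a sensible equality.

```python
from collections import Counter


def multiset_diff(first, second):
    """Return (only_in_first, only_in_second), counting duplicates."""
    count_first, count_second = Counter(first), Counter(second)
    # Counter subtraction keeps only elements with a positive remaining count.
    only_in_first = list((count_first - count_second).elements())
    only_in_second = list((count_second - count_first).elements())
    return only_in_first, only_in_second


if __name__ == "__main__":
    print(multiset_diff(["a", "a", "b"], ["a", "c"]))
```

Building the counters and subtracting them takes expected linear time in the size of the two inputs.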
Is it worth to sort the collection [...]?
Compare the naive approach, O(n²), to sorting two lists in O(n log n) and then comparing them in O(n), or to sorting one list in O(n log n) and iterating over the other in O(n), doing a binary search for each element.
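The sort-then-compare variant can be sketched as follows in Python. It assumes the elements can be ordered (for Address objects that would mean defining a comparison or a sort key); sorted_diff is a hypothetical name.

```python
def sorted_diff(first, second):
    """Return (only_in_first, only_in_second) by walking two sorted copies."""
    a, b = sorted(first), sorted(second)
    only_a, only_b = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Present in both: consume one occurrence from each side.
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    # Whatever is left over on either side has no partner.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b


if __name__ == "__main__":
    print(sorted_diff([3, 1, 2, 2], [2, 4, 1]))
```

The two sorts dominate at O(n log n); the merge-style walk afterwards is linear and handles duplicates naturally.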

Simple and short if clauses for combined statements

TRUE/FALSE if clauses are easily and quickly done in R. However, if the argument gets more complex, it also gets ugly very soon.
For instance:
I might want to execute different operations for a row(foo) dependent on the value in one cell (foo[1]).
Let the intervals be 0:39 and 40:59 and 60:100
Something like this does not exist:
(if foo[1] "in" 40:60){...
In fact, I only see ways of at least two if clauses and two else statements and the action for the first interval somewhere at the bottom of the code. With more intervals(or any other condition) it is getting more complex.
Is there a best practice (for this purpose or others) with a simple construction and nice design to read?
Not totally sure, but I would suggest to use something like:
f <- approxfun(0:100,c(rep(1,40),rep(2,20),rep(3,41)),method="c")
fac <- f(foo)
tapply(foo,fac,FUN,...)
where you can use any function FUN.
Not totally following your question. Are you looking for a switch statement? Have a look at this example:
ccc <- c("b","QQ","a","A","bb")
for(ch in ccc)
cat(ch,":",switch(EXPR = ch, a=1, b=2:3), "\n")

Real-world examples of recursion [closed]

What are real-world problems where a recursive approach is the natural solution besides depth-first search (DFS)?
(I don't consider Tower of Hanoi, Fibonacci number, or factorial real-world problems. They are a bit contrived in my mind.)
A real world example of recursion
How about anything involving a directory structure in the file system. Recursively finding files, deleting files, creating directories, etc.
Here is a Java implementation that recursively prints out the content of a directory and its sub-directories.
import java.io.File;

public class DirectoryContentAnalyserOne {

    private static StringBuilder indentation = new StringBuilder();

    public static void main(String[] args) {
        // Here you pass the path to the directory to be scanned
        getDirectoryContent("C:\\DirOne\\DirTwo\\AndSoOn");
    }

    private static void getDirectoryContent(String filePath) {
        File currentDirOrFile = new File(filePath);
        if (!currentDirOrFile.exists()) {
            return;
        } else if (currentDirOrFile.isFile()) {
            System.out.println(indentation + currentDirOrFile.getName());
            return;
        } else {
            System.out.println("\n" + indentation + "|_" + currentDirOrFile.getName());
            // Indent one level for the children of this directory
            indentation.append("   ");
            String[] children = currentDirOrFile.list();
            if (children != null) {
                for (String currentFileOrDirName : children) {
                    // Recurse into each child (file or sub-directory)
                    getDirectoryContent(currentDirOrFile + File.separator + currentFileOrDirName);
                }
            }
            // Restore the previous indentation level
            indentation.delete(indentation.length() - 3, indentation.length());
        }
    }
}
There are lots of mathy examples here, but you wanted a real world example, so with a bit of thinking, this is possibly the best I can offer:
You find a person who has contracted a given contagious infection, which is non-fatal and fixes itself quickly (Type A), except for one in 5 people (we'll call these Type B) who become permanently infected with it, show no symptoms, and merely act as spreaders.
This creates quite annoying waves of havoc whenever a Type B infects a multitude of Type As.
Your task is to track down all the Type Bs and immunise them to stop the backbone of the disease. Unfortunately, though, you can't administer a nationwide cure to all, because the people who are Type As are also deadly allergic to the cure that works for Type B.
The way you would do this is social discovery: given an infected person (Type A), choose all their contacts in the last week, marking each contact on a heap. When you test a person and find they are infected, add them to the "follow up" queue. When a person is a Type B, add them to the "follow up" queue at the head (because you want to stop this fast).
After processing a given person, select the person from the front of the queue and apply immunization if needed. Get all their previously unvisited contacts, and then test to see if they're infected.
Repeat until the queue of infected people becomes empty, and then wait for another outbreak.
(OK, this is a bit iterative, but it's an iterative way of solving a recursive problem, in this case a breadth-first traversal of a population base trying to discover likely paths to problems. Besides, iterative solutions are often faster and more effective, and I compulsively remove recursion everywhere so much it's become instinctive. ... dammit!)
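The queue-driven traversal described here can be sketched in Python roughly as follows. This is a simplified illustration: it leaves out the Type B priority and the immunization step, and the names trace_contacts, contacts and infected are hypothetical stand-ins for real contact data and test results.

```python
from collections import deque


def trace_contacts(start, contacts, infected):
    """Breadth-first walk from `start`; return infected people in visit order."""
    queue = deque([start])
    visited = {start}
    found = []
    while queue:
        person = queue.popleft()
        if person not in infected:
            continue  # healthy people are not followed up
        found.append(person)
        for other in contacts.get(person, []):
            if other not in visited:
                visited.add(other)
                queue.append(other)
    return found


if __name__ == "__main__":
    contacts = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": [], "dan": ["ann"]}
    print(trace_contacts("ann", contacts, infected={"ann", "bob", "dan"}))
```

The visited set is what keeps the walk from looping when two people list each other as contacts.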
Quicksort, merge sort, and most other N-log N sorts.
Matt Dillard's example is good. More generally, any walking of a tree can generally be handled by recursion very easily. For instance, compiling parse trees, walking over XML or HTML, etc.
Recursion is often used in implementations of the Backtracking algorithm. For a "real-world" application of this, how about a Sudoku solver?
Recursion is appropriate whenever a problem can be solved by dividing it into sub-problems, that can use the same algorithm for solving them. Algorithms on trees and sorted lists are a natural fit. Many problems in computational geometry (and 3D games) can be solved recursively using binary space partitioning (BSP) trees, fat subdivisions, or other ways of dividing the world into sub-parts.
Recursion is also appropriate when you are trying to guarantee the correctness of an algorithm. Given a function that takes immutable inputs and returns a result that is a combination of recursive and non-recursive calls on the inputs, it's usually easy to prove the function is correct (or not) using mathematical induction. It's often intractable to do this with an iterative function or with inputs that may mutate. This can be useful when dealing with financial calculations and other applications where correctness is very important.
Surely many compilers out there use recursion heavily. Computer languages are inherently recursive themselves (i.e., you can embed 'if' statements inside other 'if' statements, etc.).
Disabling/setting read-only for all children controls in a container control. I needed to do this because some of the children controls were containers themselves.
public static void SetReadOnly(Control ctrl, bool readOnly)
{
    //set the control read only
    SetControlReadOnly(ctrl, readOnly);
    if (ctrl.Controls != null && ctrl.Controls.Count > 0)
    {
        //recursively loop through all child controls
        foreach (Control c in ctrl.Controls)
            SetReadOnly(c, readOnly);
    }
}
People often sort stacks of documents using a recursive method. For example, imagine you are sorting 100 documents with names on them. First place documents into piles by the first letter, then sort each pile.
Looking up words in the dictionary is often performed by a binary-search-like technique, which is recursive.
In organizations, bosses often give commands to department heads, who in turn give commands to managers, and so on.
Famous Eval/Apply cycle from SICP
(source: mit.edu)
Here is the definition of eval:
(define (eval exp env)
  (cond ((self-evaluating? exp) exp)
        ((variable? exp) (lookup-variable-value exp env))
        ((quoted? exp) (text-of-quotation exp))
        ((assignment? exp) (eval-assignment exp env))
        ((definition? exp) (eval-definition exp env))
        ((if? exp) (eval-if exp env))
        ((lambda? exp)
         (make-procedure (lambda-parameters exp)
                         (lambda-body exp)
                         env))
        ((begin? exp)
         (eval-sequence (begin-actions exp) env))
        ((cond? exp) (eval (cond->if exp) env))
        ((application? exp)
         (apply (eval (operator exp) env)
                (list-of-values (operands exp) env)))
        (else
         (error "Unknown expression type - EVAL" exp))))
Here is the definition of apply:
(define (apply procedure arguments)
  (cond ((primitive-procedure? procedure)
         (apply-primitive-procedure procedure arguments))
        ((compound-procedure? procedure)
         (eval-sequence
          (procedure-body procedure)
          (extend-environment
           (procedure-parameters procedure)
           arguments
           (procedure-environment procedure))))
        (else
         (error
          "Unknown procedure type - APPLY" procedure))))
Here is the definition of eval-sequence:
(define (eval-sequence exps env)
  (cond ((last-exp? exps) (eval (first-exp exps) env))
        (else (eval (first-exp exps) env)
              (eval-sequence (rest-exps exps) env))))
eval -> apply -> eval-sequence -> eval
Recursion is used in things like BSP trees for collision detection in game development (and other similar areas).
Real world requirement I got recently:
Requirement A: Implement this feature after thoroughly understanding Requirement A.
Recursion is applied to problems (situations) that you can break up (reduce) into smaller parts, where each part looks similar to the original problem.
Good examples of things that contain smaller parts similar to themselves are:
tree structure (a branch is like a tree)
lists (part of a list is still a list)
containers (Russian dolls)
sequences (part of a sequence looks like the next)
groups of objects (a subgroup is still a group of objects)
Recursion is a technique to keep breaking the problem down into smaller and smaller pieces, until one of those pieces becomes small enough to be a piece of cake. Of course, after you break them up, you then have to "stitch" the results back together in the right order to form a total solution of your original problem.
Some recursive sorting algorithms, tree-walking algorithms, map/reduce algorithms, divide-and-conquer are all examples of this technique.
In computer programming, most stack-based call-return type languages already have the capabilities built in for recursion: i.e.
break the problem down into smaller pieces ==> call itself on a smaller subset of the original data,
keep track on how the pieces are divided ==> call stack,
stitch the results back ==> stack-based return
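The break-down and stitch-back steps described above can be seen in a minimal merge sort sketch in Python (merge sort is chosen here only as a familiar instance of the technique):

```python
def merge_sort(items):
    """Sort a list by splitting it, sorting the halves, and merging them."""
    if len(items) <= 1:
        return list(items)  # small enough to be a piece of cake
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # break down: recurse on each half
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # Stitch the two sorted halves back together in order.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sort([5, 2, 4, 1, 3]))
```

The call stack keeps track of which halves are still pending, so no explicit bookkeeping is needed.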
Feedback loops in a hierarchical organization.
Top boss tells top executives to collect feedback from everyone in the company.
Each executive gathers his/her direct reports and tells them to gather feedback from their direct reports.
And on down the line.
People with no direct reports -- the leaf nodes in the tree -- give their feedback.
The feedback travels back up the tree with each manager adding his/her own feedback.
Eventually all the feedback makes it back up to the top boss.
This is the natural solution because the recursive method allows filtering at each level -- the collating of duplicates and the removal of offensive feedback. The top boss could send a global email and have each employee report feedback directly back to him/her, but there are the "you can't handle the truth" and the "you're fired" problems, so recursion works best here.
Parsers and compilers may be written in a recursive-descent method. Not the best way to do it, as tools like lex/yacc generate faster and more efficient parsers, but conceptually simple and easy to implement, so they remain common.
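As a rough illustration of the recursive-descent style, here is a small Python sketch that evaluates arithmetic expressions. Each grammar rule becomes one method, and nesting in the input turns into nested calls. The grammar, the class name Parser and the helper names tokenize and evaluate are assumptions made for this example; unary minus is not handled.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()  # recursion: a parenthesised expr is an expr
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or not token[0].isdigit():
            raise ValueError(f"expected a number, got {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value


if __name__ == "__main__":
    print(evaluate("2 * (3 + 4) - 6 / 3"))
```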
I have a system that uses pure tail recursion in a few places to simulate a state machine.
Some great examples of recursion are found in functional programming languages. In functional programming languages (Erlang, Haskell, ML/OCaml/F#, etc.), it's very common to have any list processing use recursion.
When dealing with lists in typical imperative OOP-style languages, it's very common to see lists implemented as linked lists ([item1 -> item2 -> item3 -> item4]). However, in some functional programming languages, you find that lists themselves are implemented recursively, where the "head" of the list points to the first item in the list, and the "tail" points to a list containing the rest of the items ([item1 -> [item2 -> [item3 -> [item4 -> []]]]]). It's pretty creative in my opinion.
This handling of lists, when combined with pattern matching, is VERY powerful. Let's say I want to sum a list of numbers:
let rec Sum numbers =
    match numbers with
    | [] -> 0
    | head::tail -> head + Sum tail
This essentially says "if we were called with an empty list, return 0" (allowing us to break the recursion), else return the value of head + the value of Sum called with the remaining items (hence, our recursion).
For example, I might have a list of URLs; I then break apart all the URLs each URL links to, and then I reduce the total number of links to/from all URLs to generate "values" for a page (an approach that Google takes with PageRank and that you can find defined in the original MapReduce paper). You can do this to generate word counts in a document also. And many, many, many other things as well.
You can extend this functional pattern to any type of MapReduce code where you are taking a list of something, transforming it, and returning something else (whether another list, or some zip command on the list).
XML, or traversing anything that is a tree. Although, to be honest, I pretty much never use recursion in my job.
A "real-world" problem solved by recursion would be nesting dolls. Your function is OpenDoll().
Given a stack of them, you would recursively open the dolls, calling OpenDoll() if you will, until you've reached the innermost doll.
Parsing an XML file.
Efficient search in multi-dimensional spaces. E. g. quad-trees in 2D, oct-trees in 3D, kd-trees, etc.
Hierarchical clustering.
Come to think of it, traversing any hierarchical structure naturally lends itself to recursion.
Template metaprogramming in C++, where there are no loops and recursion is the only way.
Suppose you are building a CMS for a website, where your pages are in a tree structure, with say the root being the home-page.
Suppose also your {user|client|customer|boss} requests that you place a breadcrumb trail on every page to show where you are in the tree.
For any given page n, you may want to walk up to the parent of n, and its parent, and so on, recursively, to build a list of nodes back up to the root of the page tree.
Of course, you're hitting the db several times per page in that example, so you may want to use some SQL aliasing where you look up page-table as a, and page-table again as b, and join a.id with b.parent so you make the database do the recursive joins. It's been a while, so my syntax is probably not helpful.
Then again, you may just want to only calculate this once and store it with the page record, only updating it if you move the page. That'd probably be more efficient.
Anyway, that's my $.02
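The walk up the page tree can be sketched in a few lines of Python. The in-memory parents dictionary is a hypothetical stand-in for the page table, and breadcrumb is an illustrative name.

```python
def breadcrumb(page_id, parents):
    """Return the path from the root page down to `page_id`."""
    parent = parents.get(page_id)
    if parent is None:
        return [page_id]  # reached the root (home page)
    # Build the trail for the parent first, then append this page.
    return breadcrumb(parent, parents) + [page_id]


if __name__ == "__main__":
    parents = {"home": None, "products": "home", "widgets": "products"}
    print(" > ".join(breadcrumb("widgets", parents)))
```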
You have an organization tree that is N levels deep. Several of the nodes are checked, and you want to expand out to only those nodes that have been checked.
This is something that I actually coded.
It's nice and easy with recursion.
In my job we have a system with a generic data structure that can be described as a tree. That means that recursion is a very effective technique to work with the data.
Solving it without recursion would require a lot of unnecessary code. The problem with recursion is that it is not easy to follow what happens. You really have to concentrate when following the flow of execution. But when it works the code is elegant and effective.
Calculations for finance/physics, such as compound averages.
Parsing a tree of controls in Windows Forms or WebForms (.NET Windows Forms / ASP.NET).
The best example I know is quicksort, it is a lot simpler with recursion. Take a look at:
shop.oreilly.com/product/9780596510046.do
www.amazon.com/Beautiful-Code-Leading-Programmers-Practice/dp/0596510047
(Click on the first subtitle under the chapter 3: "The most beautiful code I ever wrote").
Phone and cable companies maintain a model of their wiring topology, which in effect is a large network or graph. Recursion is one way to traverse this model when you want to find all parent or all child elements.
Since recursion is expensive from a processing and memory perspective, this step is commonly only performed when the topology is changed and the result is stored in a modified pre-ordered list format.
Inductive reasoning, the process of concept-formation, is recursive in nature. Your brain does it all the time, in the real world.
Ditto the comment about compilers. The abstract syntax tree nodes naturally lend themselves to recursion. All recursive data structures (linked lists, trees, graphs, etc.) are also more easily handled with recursion. I do think that most of us don't get to use recursion a lot once we are out of school because of the types of real-world problems, but it's good to be aware of it as an option.

Smart design of a math parser?

What is the smartest way to design a math parser? What I mean is a function that takes a math string (like: "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I did write one in VB6 ages ago but it ended up being way too bloated and not very portable (or smart for that matter...). General ideas, pseudocode or real code are appreciated.
A pretty good approach would involve two steps. The first step involves converting the expression from infix to postfix (e.g. via Dijkstra's shunting yard) notation. Once that's done, it's pretty trivial to write a postfix evaluator.
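A minimal Python sketch of those two steps, assuming only the four left-associative operators and parentheses. The names OPS, to_postfix and eval_postfix are illustrative; there is no error handling for unbalanced parentheses, and unrecognised characters are silently skipped by the tokenizer.

```python
import operator
import re

# Operator table: symbol -> (precedence, function).
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def to_postfix(expression):
    """Convert an infix string to a list of postfix tokens (shunting yard)."""
    output, stack = [], []
    for token in re.findall(r"\d+\.?\d*|[-+*/()]", expression):
        if token in OPS:
            # Pop operators of higher OR EQUAL precedence, because all
            # operators here are left-associative.
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[token][0]:
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(token)  # a number
    while stack:
        output.append(stack.pop())
    return output


def eval_postfix(tokens):
    """Evaluate postfix tokens with a single operand stack."""
    stack = []
    for token in tokens:
        if token in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[token][1](left, right))
        else:
            stack.append(float(token))
    return stack[0]


if __name__ == "__main__":
    postfix = to_postfix("2 + 3 / 2 + (2 * 5)")
    print(postfix, eval_postfix(postfix))
```

Popping on equal precedence matters for expressions such as 8 - 3 - 2, which would otherwise be evaluated right to left.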
I wrote a few blog posts about designing a math parser. There is a general introduction, basic knowledge about grammars, sample implementation written in Ruby and a test suite. Perhaps you will find these materials useful.
You have a couple of approaches. You could generate dynamic code and execute it in order to get the answer without needing to write much code. Just perform a search on runtime generated code in .NET and there are plenty of examples around.
Alternatively you could create an actual parser and generate a little parse tree that is then used to evaluate the expression. Again this is pretty simple for basic expressions. Check out codeplex as I believe they have a math parser on there. Or just look up BNF which will include examples. Any website introducing compiler concepts will include this as a basic example.
Codeplex Expression Evaluator
If you have an "always on" application, just post the math string to google and parse the result. Simple way but not sure if that's what you need - but smart in some way i guess.
I know this is old, but I came across this trying to develop a calculator as part of a larger app and ran across some issues using the accepted answer. The links were IMMENSELY helpful in understanding and solving this problem and should not be discounted. I was writing an Android app in Java, and for each item in the expression "string," I actually stored a String in an ArrayList as the user typed on the keypad. For the infix-to-postfix conversion, I iterated through each String in the ArrayList, then evaluated the newly arranged postfix ArrayList of Strings. This was fantastic for a small number of operands/operators, but longer calculations were consistently off, especially as the expressions started evaluating to non-integers. The provided link for infix-to-postfix conversion suggests popping the stack if the scanned item is an operator and the item on top of the stack has a higher precedence. I found that this is almost correct. Popping the top item if its precedence is higher than OR EQUAL to that of the scanned operator finally made my calculations come out correct. Hopefully this will help anyone working on this problem, and thanks to Justin Poliey (and fas?) for providing some invaluable links.
The related question Equation (expression) parser with precedence? has some good information on how to get started with this as well.
-Adam
Assuming your input is an infix expression in string format, you could convert it to postfix and, using a pair of stacks: an operator stack and an operand stack, work the solution from there. You can find general algorithm information at the Wikipedia link.
ANTLR is a very nice LL(*) parser generator. I recommend it highly.
Developers always want to have a clean approach, and try to implement the parsing logic from ground up, usually ending up with the Dijkstra Shunting-Yard Algorithm. Result is neat looking code, but possibly ridden with bugs. I have developed such an API, JMEP, that does all that, but it took me years to have stable code.
Even with all that work, you can see even from that project page that I am seriously considering to switch over to using JavaCC or ANTLR, even after all that work already done.
11 years into the future from when this question was asked: If you don't want to re-invent the wheel, there are many exotic math parsers out there.
There is one that I wrote years ago which supports arithmetic operations, equation solving, differential calculus, integral calculus, basic statistics, function/formula definition, graphing, etc.
It's called ParserNG and it's free.
Evaluating an expression is as simple as:
MathExpression expr = new MathExpression("(34+32)-44/(8+9(3+2))-22");
System.out.println("result: " + expr.solve());
result: 43.16981132075472
Or using variables and calculating simple expressions:
MathExpression expr = new MathExpression("r=3;P=2*pi*r;");
System.out.println("result: " + expr.getValue("P"));
Or using functions:
MathExpression expr = new MathExpression("f(x)=39*sin(x^2)+x^3*cos(x);f(3)");
System.out.println("result: " + expr.solve());
result: -10.65717648378352
Or to evaluate the derivative at a given point (note that it does symbolic, not numerical, differentiation behind the scenes, so the accuracy is not limited by the errors of numerical approximations):
MathExpression expr = new MathExpression("f(x)=x^3*ln(x); diff(f,3,1)");
System.out.println("result: " + expr.solve());
result: 38.66253179403897
Which differentiates x^3 * ln(x) once at x=3.
The number of times you can differentiate is 1 for now.
or for Numerical Integration:
MathExpression expr = new MathExpression("f(x)=2*x; intg(f,1,3)");
System.out.println("result: " + expr.solve());
result: 7.999999999998261... approx: 8
This parser is decently fast and has lots of other functionality.
Work has been concluded on porting it to Swift via bindings to Objective C and we have used it in graphing applications amongst other iterative use-cases.
DISCLAIMER: ParserNG is authored by me.
