Setup: 1) a string trie database built from linked nodes, where each node holds a vector of links to the next nodes and every path ends at a leaf; 2) a recursive regular-expression function in which A) the '*' char continues down all paths until the string length limit is reached, then continues down the remaining string path if it is valid, and B) the '?' char continues down all paths for one char, then continues down the remaining string path if it is valid; 3) after the regular-expression pass, the candidate strings are measured by edit distance against the 'try' string.
Problem: the regular expression works fine for adding chars or swapping '?' for a char, but if the remaining string contains an error there is no valid path to a terminating leaf, which makes the matching function useless. I tried adding a 'step-over' '?' char when the end of the node vector is reached, which then follows every path of that node, allowing this step-over only once. This resulted in a memory exception, and I cannot see logically why it is accessing the vector out of range. Backtracking?
Questions: 1) how can the regular expression step over an invalid char and continue with the path? 2) why does swapping the 'sticking' char for '?' result in an overflow?
Function:
void Ontology::matchRegExpHelper(nodeT *w, string inWild, Set<string> &matchSet, string out, int level, int pos, int stepover)
{
    if (inWild == "") {
        matchSet.add(out);
    } else {
        if (w->alpha.size() == pos) {
            int testLength = out.length() + inWild.length();
            if (stepover == 0 && matchSet.size() == 0 && out.length() > 8 && testLength == tokenLength) { // candidate generator
                inWild[0] = '?';
                matchRegExpHelper(w, inWild, matchSet, out, level, 0, stepover+1);
                // No return here: when this call comes back, execution falls through to the
                // w->alpha[pos] accesses below with pos == w->alpha.size(), which is out of range.
            } else
                return; // give up on this path
        }
        if (inWild[0] == '?' || (inWild[0] == '*' && (out.length() + inWild.length()) == level)) { // wild
            matchRegExpHelper(w->alpha[pos].next, inWild.substr(1), matchSet, out+w->alpha[pos].letter, level, 0, stepover); // follow path -> if ontology is full, treat '*' like a '?'
        } else if (inWild[0] == '*')
            matchRegExpHelper(w->alpha[pos].next, '*'+inWild.substr(1), matchSet, out+w->alpha[pos].letter, level, 0, stepover); // keep adding chars
        if (inWild[0] == w->alpha[pos].letter) // follow self
            matchRegExpHelper(w->alpha[pos].next, inWild.substr(1), matchSet, out+w->alpha[pos].letter, level, 0, stepover); // follow char
        matchRegExpHelper(w, inWild, matchSet, out, level, pos+1, stepover); // check next path
    }
}
Error Message:
+str "Attempt to access index 1 in a vector of size 1." std::basic_string<char,std::char_traits<char>,std::allocator<char> >
+err {msg="Attempt to access index 1 in a vector of size 1." } ErrorException
Note: this function works fine for hundreds of test strings with '*' wildcards if the extra step-over gate is not used.
Semi-solved: I placed a pos < w->alpha.size() condition on each path that calls w->alpha[pos]..., which prevented the backtracking calls from accessing the vector with an out-of-bounds index. I still have other issues to work out: it loops infinitely, adding the '?' and backtracking to remove it, then repeating. But I'm moving forward now.
Revised question: why, during backtracking, is the position index accumulating and/or not decrementing, so that at some point it calls w->alpha[pos]... with an invalid position that is either left over from the next node or was somehow incremented by pos+1 when passing upward?
SOLVED: I combined the regular-expression wildcard function into the matching function as loops.
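For question 1, one way to "step over" a wrong character is to give the search a mismatch budget: at a literal character that matches no child, it may follow every child anyway, spending one unit of budget. The following is a minimal Python sketch under stated assumptions, not the poster's C++ code: the trie uses a dict of children instead of the vector of nodeT links, and the names Trie, match and max_mismatches are hypothetical. It only models substitutions (not inserted or deleted characters), so the edit-distance ranking from step 3 would still be needed. Every branch ends in an explicit return, so no code path touches a child after the loop over children has finished.

```python
from typing import Dict, Set


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a stored string ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def match(self, pattern: str, max_mismatches: int = 0) -> Set[str]:
        """'?' matches one char, '*' matches any run (possibly empty), and up to
        max_mismatches literal chars may be replaced by whatever the trie offers."""
        results: Set[str] = set()
        self._match(self.root, pattern, 0, "", max_mismatches, results)
        return results

    def _match(self, node: TrieNode, pattern: str, i: int, prefix: str,
               budget: int, results: Set[str]) -> None:
        if i == len(pattern):
            if node.is_word:
                results.add(prefix)
            return
        ch = pattern[i]
        if ch == "*":
            # Option 1: '*' matches nothing more; move on in the pattern.
            self._match(node, pattern, i + 1, prefix, budget, results)
            # Option 2: '*' swallows one more char and stays active.
            for letter, child in node.children.items():
                self._match(child, pattern, i, prefix + letter, budget, results)
            return
        for letter, child in node.children.items():
            if ch == "?" or ch == letter:
                self._match(child, pattern, i + 1, prefix + letter, budget, results)
            elif budget > 0:
                # Step over a wrong char by substituting this child's letter.
                self._match(child, pattern, i + 1, prefix + letter, budget - 1, results)
```

For example, with "cat", "car", "cart", "dog" and "dot" stored, match("cbt") finds nothing, while match("cbt", max_mismatches=1) returns {"cat"}.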
Related
Hi, I'm currently learning about recursive in-order binary tree traversal using C#. There's one main aspect I cannot understand, in particular with the code below.
public void InOrder(BinaryTreeNode node)
{
    if (node != null)
    {
        InOrder(node.Left);
        Console.WriteLine(node.Value);
        InOrder(node.Right);
    }
}
If I had a binary tree that looked like this...
        9
      /   \
     4     20
    / \   /  \
   1   6 15  170
I know that by recursively calling InOrder(node.Left) I will eventually reach the leftmost leaf of the binary tree, where node.Left will equal null because there are no more nodes.
The tree would look like this...
        9
      /   \
     4     20
    / \   /  \
   1   6 15  170
  /
null
Because node.Left == null, the first recursive call
InOrder(node.Left)
will terminate, and
Console.WriteLine(node.Value)
will execute,
printing a value of 1.
Eventually these null values move up the call stack as each node is analysed and printed, and the tree starts to look like this as the null value moves up the tree...
9
/ \
4 20
/ \ / \
null 6 15 170
/ \ / \
null null null
Eventually all the nodes in the tree are equal to null, and all nodes are printed in order to an output of ...
1, 4, 6, 9, 15, 20, 170
What I don't understand is how this null value is moving up the tree, and changing all the nodes that have been analysed to null when there is no return value. Normally there would be a base case like...
if (node == null)
{
    return null;
}
For this, I understand that null is being returned, so it will persist/return up the call stack. But for the first block of code above, there is no return statement.
I also find it just as confusing when there is only a return statement without a return value like...
if (node == null)
{
    return;
}
Again there is no return of null specified, so how does this null value move up the tree as each node is evaluated?
There isn't a problem with any of this code; it works as expected and prints all the nodes of the binary tree in order. This is more about understanding recursion, and why the first block of code still works even though no null return value is specified.
Thanks in advance for the help.
there is no return of null specified, so how does this null value move up the tree as each node is evaluated?
The function will still return, even if there is no value to return. It's done executing, so control is passed back to the caller.
if (node != null)   // skipped entirely when the node is null
{
    InOrder(node.Left);
    Console.WriteLine(node.Value);
    InOrder(node.Right);
}
For the tree you gave, this is what happens at the node with value=1:
It's not null, so we go into the if block.
We evaluate InOrder(node.Left), which is just InOrder(null):
    It's null, so the if block is skipped.
    We return to the caller, InOrder(node with value=1).
Console.WriteLine(node.Value) prints 1.
etc...
Although you can't 'see' the base case in the code, it's still there :) just implicitly.
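To make the implicit base case visible, here is the same traversal as a small Python sketch (Python rather than C#, with hypothetical names Node and in_order, and values appended to a list instead of printed). The explicit return is equivalent to the C# version skipping its if block: control simply goes back to the caller, and nothing in the tree is ever set to null.

```python
from typing import List, Optional


class Node:
    def __init__(self, value: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.value = value
        self.left = left
        self.right = right


def in_order(node: Optional[Node], out: List[int]) -> None:
    if node is None:
        return  # base case: hand control back to the caller; the tree is untouched
    in_order(node.left, out)   # when this call finishes, execution resumes here
    out.append(node.value)     # "visit" the node (the C# code prints it)
    in_order(node.right, out)
    # falling off the end is an implicit "return None"


root = Node(9, Node(4, Node(1), Node(6)), Node(20, Node(15), Node(170)))
values: List[int] = []
in_order(root, values)
```

After the call, values holds the nodes in order and root.left.left.value is still 1: returning never modifies the nodes.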
Is there an easy way to understand when you can just call the recursive method and when you have to assign its result to a variable?
For example...
Just calling the recursive function to traverse:
self.recurse(node.left)
self.recurse(node.right)
Having to assign the result of the recursive call to node.left and node.right:
node.left = self.recurse(node.left)
node.right = self.recurse(node.right)
Another example: to delete a node in a BST you have to assign the result of the recursive call to root.left and root.right... I get it, but not completely. Is there an easy way to understand when you can just call the recursive function and when you have to assign its result to node.left, node.right, etc.?
def deleteNode(self, root: TreeNode, key: int) -> TreeNode:
    if not root:
        return root
    if key < root.val:
        root.left = self.deleteNode(root.left, key)
    elif key > root.val:
        root.right = self.deleteNode(root.right, key)
    else:
        if not root.left:
            return root.right
        elif not root.right:
            return root.left
        # self.successor is not shown; presumably it returns the smallest value in root.right
        root.val = self.successor(root.right)
        root.right = self.deleteNode(root.right, root.val)
    return root
To understand the two scenarios above (a simple recursive call, and assigning the result of a recursive call to a variable), just try to understand the following code.
Let's say you have a TREE where every node contains a value that is either negative or positive, and you want to count how many nodes have a positive value.
The TREE structure for this problem looks like this:
class TREE {
    Integer val;
    TREE left = null, right = null;
}
Now you give me this problem to solve, and I write a function that counts the nodes with a positive value:
int countNodes(TREE node) {
    if (node == null) {
        return 0;
    } else {
        int count = 0; // counts how many nodes have a positive value
        if (node.val >= 0) {
            count += 1; // the value is positive, so increment count
        }
        // and we check every other node in the TREE
        countNodes(node.left);
        countNodes(node.right);
        // and return the final result
        return count;
    }
}
Now I give my function back to you and you run it. But what's this? Something is badly WRONG! It gives the wrong result, over and over again!!
But why???
Let's analyze it.
if (node.val >= 0) {
    count += 1;
}
Up to this point we were right! The problem is that we incremented count but never used the counts from the recursive calls. Each time we called the function recursively, a new stack frame was created with its own variable named "count", but the value returned from that frame was thrown away!
To use each call's "count", we need to add the value returned by every recursive call to the current count. That is how we keep a link between the current stack frame, the previous stack frame, the one before that, and so on. We need to change the function countNodes a little, like this:
if (node.val >= 0) {
    count += 1;
}
count += countNodes(node.left);  // add the count returned for the left subtree
count += countNodes(node.right); // add the count returned for the right subtree
return count;
Now everything will be all right; this code works correctly!
The scenario above maps directly onto your problem:
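The difference between the two versions can be checked directly. This Python sketch (hypothetical names Tree, count_nonnegative_wrong and count_nonnegative, mirroring the Java above) keeps the original `>= 0` test, which counts zero as well as positive values. The first version discards what the recursive calls return; the second folds each subtree's result into the current frame's count.

```python
from typing import Optional


class Tree:
    def __init__(self, val: int, left: "Optional[Tree]" = None,
                 right: "Optional[Tree]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def count_nonnegative_wrong(node: Optional[Tree]) -> int:
    if node is None:
        return 0
    count = 1 if node.val >= 0 else 0
    count_nonnegative_wrong(node.left)   # result discarded: this subtree's count is lost
    count_nonnegative_wrong(node.right)  # same here
    return count


def count_nonnegative(node: Optional[Tree]) -> int:
    if node is None:
        return 0
    count = 1 if node.val >= 0 else 0
    count += count_nonnegative(node.left)   # add the left subtree's answer
    count += count_nonnegative(node.right)  # add the right subtree's answer
    return count


root = Tree(3, Tree(-1, Tree(4)), Tree(5, None, Tree(-2)))
```

Here count_nonnegative(root) gives 3 (for 3, 4 and 5), while the wrong version only ever reports the root's own contribution, 1.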
self.recurse(node.left)
self.recurse(node.right)
This is nothing but a simple traversal over all the nodes.
But if you need to use the result returned by each recursive call, you need to assign (or accumulate) it into a variable. That's what is happening with:
node.left = self.recurse(node.left)
node.right = self.recurse(node.right)
I hope this (somewhat long) explanation helps you go further. Happy coding! :)
Here are two pieces of code:
1. Find the kth smallest integer in a binary search tree:
void FindKthSmallest(struct TreeNode* root, int& k)
{
    if (root == NULL) return;
    if (k == 0) return; // k==0 means target node has been found
    FindKthSmallest(root->left, k);
    if (k > 0) // k==0 means target node has been found
    {
        k--;
        if (k == 0) { // target node is current node
            cout << root->data;
            return;
        } else {
            FindKthSmallest(root->right, k);
        }
    }
}
2. Find the number of nodes in a binary tree:
int Size(struct TreeNode* root)
{
    if (root == NULL) return 0;
    int l = Size(root->left);
    int r = Size(root->right);
    return (l + r + 1);
}
My Question:
In both these codes, I will have to keep track of the number of nodes I visit. Why is it that code 1 requires passing a parameter by reference to keep track of the number of nodes I visit, whereas code 2 does not require any variable to be passed by reference ?
The first code (1) is looking for the kth smallest node in your BST. An in-order traversal visits BST nodes in ascending order, so the search starts down the left side of the tree, where the smallest values are. It makes two checks:
root == NULL - the subtree is empty, so there is nothing to visit.
k == 0 - the target node has already been found (as the code's own comments say), so there is nothing left to do.
Then it recursively traverses the left subtree. When that returns, if k > 0 it decrements k, counting the current node as visited. This is why k must be passed by reference: the decrement has to be visible to the caller and to every later call, not just to the current stack frame. If k reaches zero, the current node is the kth smallest and is printed; otherwise the traversal continues into the right subtree.
For the second code (2), you are just counting the nodes in the tree, starting at the root and recursively counting each subsequent node (left or right) until no more nodes can be found. The result is the number of nodes in the left subtree plus the number in the right subtree, plus 1 for the current node, which neither recursive call counted. Because each call returns its count, no pass-by-reference variable is needed, although you could implement one if you chose to.
Does this help?
Passing the parameter by reference lets you keep track of the count across the whole recursive process; otherwise each call would work on its own copy and the count would reset. It lets every call modify the same variable in memory, rather than a local copy.
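Both styles can solve the kth-smallest problem. This Python sketch uses hypothetical names (Node, kth_smallest_shared, kth_smallest_returned). Python cannot pass an int by reference, so the first function uses a nonlocal counter as a stand-in for C++'s int&; the second follows the style of Size() and has each call return how many nodes it consumed, so nothing needs to be shared.

```python
from typing import Optional, Tuple


class Node:
    def __init__(self, val: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.val = val
        self.left = left
        self.right = right


def kth_smallest_shared(root: Optional[Node], k: int) -> Optional[int]:
    """Style of code 1: one counter shared by every call (stand-in for int&)."""
    remaining = k
    found: Optional[int] = None

    def visit(node: Optional[Node]) -> None:
        nonlocal remaining, found
        if node is None or remaining == 0:
            return
        visit(node.left)
        if remaining > 0:
            remaining -= 1  # every pending call sees this decrement
            if remaining == 0:
                found = node.val
                return
            visit(node.right)

    visit(root)
    return found


def kth_smallest_returned(node: Optional[Node], k: int) -> Tuple[int, Optional[int]]:
    """Style of code 2: return (nodes used up in this subtree, answer if found)."""
    if node is None:
        return 0, None
    used_left, found = kth_smallest_returned(node.left, k)
    if found is not None:
        return used_left, found
    if used_left + 1 == k:
        return used_left + 1, node.val
    used_right, found = kth_smallest_returned(node.right, k - used_left - 1)
    return used_left + 1 + used_right, found


root = Node(9, Node(4, Node(1), Node(6)), Node(20, Node(15), Node(170)))
```

For this tree, k = 4 gives 9 either way; k = 8 gives None because the tree only has 7 nodes.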
I need to write a recursive function Repl that takes as input an expression e in Expr and returns an expression in Expr wherein each number is replaced by the number 1.
For example, if e is the expression
((((9 + 5) ∗ 2) ∗ (2 + (4 ∗ 6))))
then Repl(e) is the expression
((((1 + 1) ∗ 1) ∗ (1 + (1 ∗ 1))))
Can anybody help me with how to go about this?
An iterative version is easy to write, but how do I write it recursively?
It is not clear why you would want a recursive solution for this problem, but the solution is relatively straightforward. Here is pseudocode:
string replace(string s, bool seenDigit) {
    if (s == "") {
        // The string is empty: we are done
        return "";
    }
    if (s[0] is digit) {
        if (seenDigit) {
            // This is a second, third, etc. digit in a multi-digit chain
            // It has been replaced with "1" already, so we cut it out
            return replace(s.substring(1), true);
        } else {
            // This is the first digit in a chain of one or more digits
            // Replace it with "1", and tell the next level that we've
            // done the replacement already
            return "1" + replace(s.substring(1), true);
        }
    } else {
        // Non-digits do not get replaced
        return s[0] + replace(s.substring(1), false);
    }
}
s[0] means the first character; string+string denotes concatenation. The initial call is replace(e, false).
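A runnable Python translation of the pseudocode above, assuming (as the answer does) that the expression is handled as a string rather than as an Expr tree, since the question does not define Expr. The function name replace and the default seen_digit=False are choices made for this sketch. CPython's default recursion limit is about 1000 frames, so this one-frame-per-character version is only suitable for short strings.

```python
def replace(s: str, seen_digit: bool = False) -> str:
    if s == "":
        return ""  # empty string: done
    if s[0].isdigit():
        if seen_digit:
            # later digit of a multi-digit number: already replaced by "1", drop it
            return replace(s[1:], True)
        # first digit of a number: emit "1" and tell the next level
        return "1" + replace(s[1:], True)
    # non-digits are copied through unchanged
    return s[0] + replace(s[1:], False)
```

With the question's example, replace("((((9 + 5) ∗ 2) ∗ (2 + (4 ∗ 6))))") returns "((((1 + 1) ∗ 1) ∗ (1 + (1 ∗ 1))))", and a multi-digit number such as "345" collapses to a single "1".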
Making @dasblinkenlight's solution tail recursive:
string replace(string sToGo, string sSoFar, bool inNumber) {
    if (sToGo == "") {
        return sSoFar;
    }
    if (sToGo[0] is digit) {
        if (inNumber) {
            // later digit of a multi-digit number: drop it
            return replace(sToGo.substring(1), sSoFar, true);
        } else {
            // first digit of a number: append "1"
            return replace(sToGo.substring(1), sSoFar + "1", true);
        }
    } else {
        return replace(sToGo.substring(1), sSoFar + sToGo[0], false);
    }
}
Notice that every return is either a direct value (the base case) or directly returns what a recursive call gives back. This means the program doesn't need to keep track of pending recursive calls, because there is nothing to do with the returned value other than pass it up the chain. If the compiler or interpreter performs tail-call optimization, the primary downside of recursion (the stack overhead) can therefore be eliminated.
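To see what tail-call elimination buys, here is the accumulator version in Python next to the loop it is equivalent to (both function names are hypothetical). Note that CPython does not eliminate tail calls, so replace_tail still uses one stack frame per character; the loop is what an optimizing compiler would effectively turn it into, and it handles arbitrarily long input.

```python
def replace_tail(s_to_go: str, s_so_far: str = "", in_number: bool = False) -> str:
    # Tail-recursive: each recursive call is the last thing the function does.
    if s_to_go == "":
        return s_so_far
    ch = s_to_go[0]
    if ch.isdigit():
        return replace_tail(s_to_go[1:], s_so_far if in_number else s_so_far + "1", True)
    return replace_tail(s_to_go[1:], s_so_far + ch, False)


def replace_loop(s: str) -> str:
    # The same state (s_so_far, in_number) updated in place: one frame, no recursion.
    s_so_far, in_number = "", False
    for ch in s:
        if ch.isdigit():
            if not in_number:
                s_so_far += "1"
            in_number = True
        else:
            s_so_far += ch
            in_number = False
    return s_so_far
```

The loop carries exactly the arguments the tail call would have passed, which is why a tail-recursive function can be rewritten this way mechanically.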
I've been experimenting with genetic algorithms lately, and now I'd like to build mathematical expressions out of the genomes (put simply, the goal is to find an expression that matches a certain outcome).
I have genomes consisting of genes, which are represented by bytes. A genome can look like this: {12, 127, 82, 35, 95, 223, 85, 4, 213, 228}. The length is not predefined (although it must fall within a certain range), and neither is the form it takes. That is, any entry can take any byte value.
Now the trick is to translate this into mathematical expressions. It's fairly easy to build basic expressions, for example: pick the first 2 values and treat them as operands, pick the 3rd value and treat it as an operator (+, -, *, /, ^, mod), pick the 4th value as an operand and the 5th value as an operator again, applied to the result of the 3rd value's operator on the first 2 operands. (Or just handle it as a postfix expression.)
The complexity rises when you start allowing precedence rules. For example, when the entry under index 2 represents a '(', you're bound to need a ')' somewhere further on: not at entry 3, but not necessarily at entry 4 either.
Of course, the same goes for many other things: you can't end up with an operator at the end, you can't end up with a dangling number, etc.
Now, I could write a HUGE switch statement (for example) covering all the possibilities, but that would make the code unreadable. I was hoping someone out there knows a good strategy for tackling this.
Thanks in advance!
** EDIT **
On request: the goal I'm trying to achieve is an application that can find a function for a set of numbers. As for the example I gave in the comment below: {4, 11, 30}, and it might come up with the function (X ^ 3) + X
Belisarius in a comment gave a link to an identical topic: Algorithm for permutations of operators and operands
My code:
private static double ResolveExpression(byte[] genes, double valueForX)
{
    // following: https://stackoverflow.com/questions/3947937/algorithm-for-permutations-of-operators-and-operands/3948113#3948113
    // Note: a valid postfix expression of binary operators has an odd length. With an even
    // genes.Length (2 or more), the last gene is forced to be an operator while only one
    // operand is on the stack, so the second Pop() throws InvalidOperationException.
    Stack<double> operandStack = new Stack<double>();
    for (int index = 0; index < genes.Length; index++)
    {
        int genesLeft = genes.Length - index;
        byte gene = genes[index];
        bool createOperand;
        // only add an operand when enough genes remain for the operators it will need
        if (genesLeft > operandStack.Count)
        {
            // only when there are at least 2 operands on the stack
            if (operandStack.Count >= 2)
            {
                // randomly determine whether to create an operand by treating everything below 127 as an operand and the rest as an operator (better than / 2 due to 0 values)
                createOperand = gene < byte.MaxValue / 2;
            }
            else
            {
                // otherwise we definitely need an operand, since an operator would be illegal
                createOperand = true;
            }
        }
        else
        {
            // definitely an operator, since otherwise there would be too many operands to complete
            createOperand = false;
        }
        if (createOperand)
        {
            operandStack.Push(GeneToOperand(gene, valueForX));
        }
        else
        {
            // the right operand was pushed last, so it comes off the stack first
            double right = operandStack.Pop();
            double left = operandStack.Pop();
            double result = PerformOperator(gene, left, right);
            operandStack.Push(result);
        }
    }
    // there should be 1 operand left on the stack, which is the final result
    return operandStack.Pop();
}
private static double PerformOperator(byte gene, double left, double right)
{
    // There are 6 options currently supported, namely: +, -, *, /, ^ and log (math)
    int code = gene % 6;
    switch (code)
    {
        case 0:
            return left + right;
        case 1:
            return left - right;
        case 2:
            return left * right;
        case 3:
            return left / right;
        case 4:
            return Math.Pow(left, right);
        case 5:
            return Math.Log(left, right); // logarithm of left in base right
        default:
            throw new InvalidOperationException("Impossible state");
    }
}
private static double GeneToOperand(byte gene, double valueForX)
{
    // We only support numbers 0 - 9 and X
    int code = gene % 11; // Get a value between 0 and 10
    if (code == 10)
    {
        // 10 is a placeholder for x
        return valueForX;
    }
    else
    {
        return code;
    }
}
#endregion // Helpers
}
Use "post-fix" notation. That handles priorities very nicely.
Post-fix notation handles the "grouping" or "priority rules" trivially.
For example, the expression b**2-4*a*c, in post-fix is
b, 2, **, 4, a, *, c, *, -
To evaluate a post-fix expression, you simply push the values onto a stack and execute the operations.
So the above becomes something approximately like the following.
stack.push( b )
stack.push( 2 )
x, y = stack.pop(), stack.pop(); stack.push( y ** x )
stack.push( 4 )
stack.push( a )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
stack.push( c )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
x, y = stack.pop(), stack.pop(); stack.push( y - x )
To make this work, you need to partition your string of bytes into values and operators. You also need to check the "arity" of all your operators to be sure that the number of operators and the number of operands balance out. In this case, the number of operands is the number of binary operators + 1. Unary operators don't require extra operands.
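A minimal Python sketch of the stack evaluation and the arity check described above. It assumes the genome has already been partitioned into tokens (numbers and operator strings); the names eval_postfix and is_balanced and the unary "neg" operator are illustrative choices, not part of the answer. The right operand is popped first, exactly as in the y ** x, y - x steps listed earlier.

```python
import operator
from typing import Callable, Dict, List, Union

Token = Union[float, str]

BINARY: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}
UNARY: Dict[str, Callable[[float], float]] = {"neg": operator.neg}


def eval_postfix(tokens: List[Token]) -> float:
    stack: List[float] = []
    for tok in tokens:
        if isinstance(tok, str) and tok in BINARY:
            x = stack.pop()  # right operand is on top
            y = stack.pop()  # left operand is underneath
            stack.append(BINARY[tok](y, x))
        elif isinstance(tok, str) and tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        else:
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError(f"unbalanced expression: {len(stack)} values left")
    return stack[0]


def is_balanced(tokens: List[Token]) -> bool:
    """Arity check: the stack must never be too shallow for an operator
    and must end with exactly one value."""
    depth = 0
    for tok in tokens:
        if isinstance(tok, str) and tok in BINARY:
            if depth < 2:
                return False
            depth -= 1
        elif isinstance(tok, str) and tok in UNARY:
            if depth < 1:
                return False
        else:
            depth += 1
    return depth == 1
```

With a = 1, b = 5, c = 6, the token list [5, 2, "**", 4, 1, "*", 6, "*", "-"] encodes b**2-4*a*c and evaluates to 1.0.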
As ever with GA a large part of the solution is choosing a good representation. RPN (or post-fix) has already been suggested. One concern you still have is that your GA might throw up expressions which begin with operators (or mismatch operators and operands elsewhere) such as:
+,-,3,*,4,2,5,+,-
A (small) part of the solution would be to define evaluations for operand-less operators. For example one might decide that the sequence:
+
evaluates to 0, which is the identity element for addition. Naturally
*
would evaluate to 1. Mathematics may not have figured out what the identity element for division is, but APL has.
Now you have the basis of an approach that doesn't care whether you get the right sequence of operators and operands, but you still have a problem when you have too many operands for the number of operators. That is, what is the interpretation of the following (postfix) sequence?
2,4,5,+,3,4,-
which (possibly) evaluates to
2,9,-1
Well, now you have to invent your own convention if you want to reduce this to a single value. But you could adopt the convention that the GA has created a vector-valued function.
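The two conventions just described can be combined into one forgiving evaluator. This is a sketch under assumptions, with a hypothetical name eval_tolerant: an operator that finds too few operands substitutes its identity element (0 for + and -, 1 for * and /), and whatever remains on the stack is returned as a vector. Division by a real zero operand still raises ZeroDivisionError.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

Token = Union[float, str]

# operator -> (function, identity element used for a missing operand)
OPS: Dict[str, Tuple[Callable[[float, float], float], float]] = {
    "+": (operator.add, 0.0),
    "-": (operator.sub, 0.0),
    "*": (operator.mul, 1.0),
    "/": (operator.truediv, 1.0),
}


def eval_tolerant(tokens: List[Token]) -> List[float]:
    stack: List[float] = []
    for tok in tokens:
        if isinstance(tok, str):
            fn, identity = OPS[tok]
            x = stack.pop() if stack else identity  # right operand, or identity
            y = stack.pop() if stack else identity  # left operand, or identity
            stack.append(fn(y, x))
        else:
            stack.append(float(tok))
    return stack  # more than one value left: read as a vector-valued result
```

Under these conventions, 2,4,5,+,3,4,- evaluates to [2.0, 9.0, -1.0], a lone + to [0.0], and the earlier sequence +,-,3,*,4,2,5,+,- to [0.0, -3.0].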
EDIT: response to OP's comment ...
If a byte can represent either an operator or an operand, and if your program places no restrictions on where a genome can be split for reproduction, then there will always be a risk that the offspring represents an invalid sequence of operators and operands. Consider, instead of having each byte encode either an operator or an operand, a byte could encode an operator+operand pair (you might run out of bytes quickly so perhaps you'd need to use two bytes). Then a sequence of bytes might be translated to something like:
(plus 1)(plus x)(power 2)(times 3)
which could evaluate, following a left-to-right rule with a meaningful interpretation for the first term, to 3((x+1)^2)
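A sketch of the operator+operand pair encoding, under stated assumptions: two bytes per pair, an operator table of plus/times/power chosen by byte modulo table size, operands 0 to 9 plus x chosen by byte % 11 (borrowing the asker's GeneToOperand convention), and a fold that starts from 0 so the first pair is meaningful. All names are hypothetical. Every genome decodes to a valid expression, and a trailing odd byte is simply ignored. Overflow from repeated powers is not handled here.

```python
from typing import Callable, Dict, List, Tuple

OPERATORS: List[Tuple[str, Callable[[float, float], float]]] = [
    ("plus", lambda acc, v: acc + v),
    ("times", lambda acc, v: acc * v),
    ("power", lambda acc, v: acc ** v),
]


def decode(genome: List[int]) -> List[Tuple[str, str]]:
    """Read the genome two bytes at a time: (operator byte, operand byte)."""
    pairs = []
    for i in range(0, len(genome) - 1, 2):
        op_name = OPERATORS[genome[i] % len(OPERATORS)][0]
        operand = genome[i + 1] % 11  # 0-9 are constants, 10 stands for x
        pairs.append((op_name, "x" if operand == 10 else str(operand)))
    return pairs


def evaluate(genome: List[int], x: float) -> float:
    """Fold the pairs left to right, starting from 0."""
    funcs: Dict[str, Callable[[float, float], float]] = dict(OPERATORS)
    acc = 0.0
    for op_name, operand in decode(genome):
        value = x if operand == "x" else float(operand)
        acc = funcs[op_name](acc, value)
    return acc


genome = [0, 1, 0, 10, 2, 2, 1, 3]
```

Here decode(genome) yields (plus 1)(plus x)(power 2)(times 3), i.e. 3((x+1)^2), so evaluate(genome, 2.0) gives 27.0.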