I have some Perl code (for performance analysis) first developed under Linux which now needs to be ported to the mainframe. Apparently REXX is the scripting language of choice on that platform but this Perl script relies heavily on associative arrays (basically arrays where the index is a string).
Is there a way to do that in REXX? How would I code up something like:
$arr{"Pax"} = "Diablo";
$arr{"Bob"} = "Dylan";
print $arr{"Pax"} . "\n";
if (defined $arr{"no"}) {
print "Yes\n";
} else {
print "No\n";
}
You can use stem variables. They are not exactly like arrays, but very similar.
/* REXX */
NAME = 'PAX'
ARRAY.NAME = "DIABLO"
NAME = 'BOB'
ARRAY.NAME = "DYLAN"
NAME = 'PAX'
SAY "ARRAY.PAX IS" ARRAY.NAME
NAME = 'BOB'
SAY "ARRAY.BOB IS" ARRAY.NAME
NAME = 'SANDY'
SAY "ARRAY.SANDY IS" ARRAY.NAME
/* An unassigned compound variable evaluates to its own name */
IF ARRAY.NAME = "ARRAY.SANDY" THEN SAY "ARRAY.SANDY IS EMPTY"
The above Rexx will print
ARRAY.PAX IS DIABLO
ARRAY.BOB IS DYLAN
ARRAY.SANDY IS ARRAY.SANDY
ARRAY.SANDY IS EMPTY
Stem variables can also be compound, like a.b.c.
A stem variable that has not been assigned a value returns its own name.
As far as I know, there is no way to iterate over a stem that does not use consecutive numbers as the index.
IBM Manual with reference to Stem variables
Perl is also available on z/OS as a free extra feature: IBM Ported Tools for z/OS.
I just want to add a bit more to the answer given by Deuian.
I agree, REXX stem variables are the answer.
Simple REXX variables default to their own name. For example:
/* REXX */
SAY X
will print "X" until X is assigned some other value:
/* REXX */
X = 'A'
SAY X
will print "A".
No big surprise so far. Stem variables are a bit different. The
head of the stem is never evaluated, only the bit after the initial dot
is.
To illustrate:
/* REXX */
X. = 'empty' /* all unassigned stem values take on this value */
A. = 'nil'
B = 'A' /* simple variable B is assigned value A */
X = 'A' /* simple variable X is assigned value A */
SAY X.A /* prints: empty */
X.A = 'hello' /* Stem X. associates value of A with 'hello' */
SAY X.A /* prints: hello */
SAY X.B /* prints: hello */
SAY X.X /* prints: hello */
Notice the X and the A stem names are not evaluated, however, the
X and A variables appearing after them are. Some people find this a
bit confusing - think about it for a while and it makes
great sense.
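For readers coming from other languages, the lookup rule can be modelled roughly as follows. This is a Python sketch, not REXX: the Stem class and the resolve helper are hypothetical names invented for the illustration, and it ignores multi-part tails such as a.b.c.

```python
class Stem:
    """Toy model of a REXX stem such as X. (illustration only)."""

    def __init__(self, name):
        self.name = name        # the stem name itself is never evaluated
        self.default = None     # set by an assignment like  X. = 'empty'
        self.values = {}        # evaluated tail -> stored value

    def get(self, tail):
        if tail in self.values:
            return self.values[tail]
        if self.default is not None:
            return self.default
        # An unassigned compound variable yields its own derived name.
        return f"{self.name}.{tail}"

    def set(self, tail, value):
        self.values[tail] = value


def resolve(simple_vars, name):
    """A simple variable evaluates to its value, or to its own name if unassigned."""
    return simple_vars.get(name, name)


simple_vars = {}
x = Stem("X")
x.default = "empty"                        # X. = 'empty'
simple_vars["B"] = "A"                     # B = 'A'
simple_vars["X"] = "A"                     # X = 'A'

first = x.get(resolve(simple_vars, "A"))   # SAY X.A with A unassigned
x.set(resolve(simple_vars, "A"), "hello")  # X.A = 'hello'
results = [x.get(resolve(simple_vars, t)) for t in ("A", "B", "X")]
print(first, results)
```

The three lookups at the end all land on the same entry because the tails A, B and X each evaluate to the string 'A' before the stem is consulted.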
The z/OS version of REXX does not provide a natural way to iterate over
a stem variable. The easiest way to do this is to build your own index.
For example:
/* REXX */
X. = ''
DO I = 1 TO 10
J = RANDOM(100, 500) /* Random # 100..500 */
X.INDEX = X.INDEX J /* concatenate J's with 1 space between */
X.J = 'was picked' /* Associate 'was picked' with whatever J evaluates to */
END
DO I = 1 TO WORDS(X.INDEX) /* Number of blank delimited 'words' */
J = WORD(X.INDEX, I) /* Extract 1 'word' at a time */
SAY J X.J /* Print 'word' and its associated value */
END
Pretty trivial, but it illustrates the idea. Just be sure that INDEX (or whatever name you
choose to hold the indexing names) never pops up as an associative value! If this is a possibility, use some other variable to hold the index.
Last point. Notice each of my examples begins with /* REXX */. You may find
that this needs to be the first line of your REXX programs under z/OS.
I have a list of phone numbers that sometimes have a person in parenthesis at the end. I need to extract the person's name (and add that as a note in a separate field). Here is an example of the data:
(517)234-6789(Bob)
701-556-2345
(325)663-5977
(215)789-8585
425-557-7745(Pauline)
There is always a () around the person's name, but often there is also a () around the area code, so I can't use the ( as a way to know a name has started. I'd like to create a loop that goes through the phone number string and if it sees alpha characters, builds a string that will be assigned to a variable as the name.
Something like this. I am making up the IS-ALPHA syntax, of course. That is what I am looking for, or something where I don't have to list every letter.
PROCEDURE CreatePhoneNote (INPUT cPhone AS CHARACTER)
DEFINE VARIABLE cPersonName AS CHARACTER NO-UNDO.
DEFINE VARIABLE cThisChar AS CHARACTER NO-UNDO.
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.
DO iCount = 1 TO LENGTH(cPhone):
cThisChar = SUBSTRING(cPhone,iCount,1).
IF IS_ALPHA(cThisChar) THEN cPersonName = cPersonName + cThisChar.
END.
//etc.....
END PROCEDURE.
Since these are the fun questions, just one more isAlpha answer that does not use hard-coded ASCII codes but leans on the property / assumption that an alpha character has an upper and lower case version:
function isAlpha returns logical (
i_cc as char
):
return compare( upper( i_cc ), '<>', lower( i_cc ), 'case-sensitive' ).
end function.
With some code to test the function:
// test
def var ic as int.
do ic = 0 to 255:
if isAlpha( chr(ic) ) then
message ic chr( ic ).
end.
And then you see that the hard-coded ASCII answer did not take characters with diacritics into account. :-)
Watch it run on ProgressAblDojo.
Watch it run again on ProgressAblDojo with a fix to help ProgressAblDojo over its ignorance of its own codepage.
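The same upper/lower trick can be sketched outside ABL. The Python version below is only an illustration under the assumption that characters are Unicode code points rather than bytes in a session code page; is_alpha and extract_name are hypothetical helper names.

```python
def is_alpha(ch):
    """True when the character has distinct upper- and lower-case forms."""
    return ch.upper() != ch.lower()


def extract_name(phone):
    """Keep only the alphabetic characters of a phone string."""
    return "".join(ch for ch in phone if is_alpha(ch))


for sample in ["(517)234-6789(Bob)", "701-556-2345", "425-557-7745(Pauline)"]:
    print(repr(extract_name(sample)))
```

Digits, parentheses and dashes have no case, so they drop out without any list of letters being spelled out.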
I can't comment, so here is a suggestion as an answer: look at the first character of the string. If it is a (, you know the area code is parenthesized and can handle that case. The next method only works for USA numbers (which seems to be what you have): check whether the length of the string matches the fixed length of a bare number. With the sample data that is 12 characters for the 701-556-2345 form and 13 for the (325)663-5977 form (10 digits plus the separators). If the string is longer than that, there is a name at the end, and you can split the string at that index.
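A minimal sketch of this length-based idea, written in Python rather than ABL. The lengths are assumptions taken from the sample data in the question (13 characters when the area code is in parentheses, 12 otherwise), and split_phone is a hypothetical name.

```python
def split_phone(raw):
    """Split 'number(Name)' into (number, name); name is '' when absent."""
    # Assumed fixed lengths, based on the sample data:
    # "(325)663-5977" is 13 characters, "701-556-2345" is 12.
    number_length = 13 if raw.startswith("(") else 12
    number, rest = raw[:number_length], raw[number_length:]
    # Whatever is left should be "(Name)"; strip the surrounding parentheses.
    if rest.startswith("(") and rest.endswith(")"):
        name = rest[1:-1]
    else:
        name = ""
    return number, name


for sample in ["(517)234-6789(Bob)", "701-556-2345", "425-557-7745(Pauline)"]:
    print(split_phone(sample))
```

This avoids inspecting every character, at the cost of breaking on any number that does not follow one of the two assumed layouts.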
You don't have to go through each character. You can use the open parenthesis to break up the string and get the data after the last parenthesis. This may run faster if you have a large amount of data.
DEFINE VARIABLE cPhone AS CHARACTER NO-UNDO INITIAL "(517)234-6789(Bob)".
DEFINE VARIABLE cPersonName AS CHARACTER NO-UNDO.
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.
DEFINE VARIABLE iNum AS INTEGER NO-UNDO.
iCount = NUM-ENTRIES(cPhone, "("). /* See how many open parentheses there are */
cPersonName = ENTRY(iCount, cPhone, "("). /* Get the string after the last open paren */
iNum = INTEGER(SUBSTRING(cPersonName, 1, 1)) NO-ERROR. /* See if the first character is a number */
IF iNum > 0 THEN
cPersonName = "". /* If it's a number, there is no name so blank out the variable */
ELSE
cPersonName = SUBSTRING(cPersonName, 1, LENGTH(cPersonName) - 1). /* Drop the closed paren */
MESSAGE cPersonName VIEW-AS ALERT-BOX INFORMATION.
You can do this using the ABL's ASC() function.
/* 65..90 is A-Z, 97..122 is a-z; the two ranges must be combined with OR */
if (asc(cThisChar) ge 65 and asc(cThisChar) le 90)
or (asc(cThisChar) ge 97 and asc(cThisChar) le 122)
then
cPersonName = cPersonName + cThisChar.
ASC() works simply enough for all codepages for codepoints between 0 and 255; for others it'll depend on your -cpinternal / session:cpinternal value.
This is a bit of unexpected behavior that's likely to bite beginners. First, is this intended? Second, what other things does Raku use to guess which object to create? Does it start off thinking it's Block or Hash and change later, or does it decide on the end?
You can construct a Hash with braces and the fat arrow:
my $color-name-to-rgb = {
'red' => 'FF0000',
};
put $color-name-to-rgb.^name; # Hash
Using the other Pair notation creates a Hash too.
my $color-name-to-rgb = {
:red('FF0000'),
};
But, absent the fat arrow, I get a Block instead:
my $color-name-to-rgb = {
'red', 'FF0000',
};
put $color-name-to-rgb.^name; # Block
The Hash docs only mention that using $_ inside the braces creates a Block.
There are other ways to define a hash, but I'm asking about this particular bit of syntax and not looking for the workarounds I already know about.
$ perl6 -v
This is Rakudo version 2017.04.3 built on MoarVM version 2017.04-53-g66c6dda
implementing Perl 6.c.
When it's a Hash
Your question[1] and this answer only apply to braced blocks in term position[2].
Braced code that precisely follows the rule explained below constructs a Hash:
say WHAT { } # (Hash)
say WHAT { %foo } # (Hash)
say WHAT { %foo, ... } # (Hash)
say WHAT { foo => 42, ... } # (Hash)
say WHAT { :foo, ... } # (Hash)
say WHAT { key => $foo, ... } # (Hash)
The rule
If the block is empty, or contains just a list whose first element is a % sigil'd variable (eg %foo) or a literal pair (eg :bar), and it does not have a signature or include top level statements, it's a Hash. Otherwise it's a Block.
To force Block or Hash interpretation
To force a {...} term to construct a Block instead of a Hash, write a ; at the start i.e. { ; ... }.
To write an empty Block term, write {;}.
To write an empty Hash term, write {}.
To force a {...} term to construct a Hash instead of a Block, follow the rule (explained in detail in the rest of this answer), or write %(...) instead.
An explicit signature means it's a Block
Some braced code has an explicit signature, i.e. it has explicit parameters such as $foo below. It always constructs a Block no matter what's inside the braces:
say WHAT { key => $foo, 'a', 'b' } # (Hash)
say WHAT -> $foo { key => $foo, 'a', 'b' } # (Block)
An implicit signature also means it's a Block
Some braced code has an implicit signature that is generated due to some explicit choice of coding within the block:
Use of a "pronoun" inside {...} means it's a Block with a signature (an implicit signature if it doesn't already have an explicit one). The pronouns are $_, @_, and %_.
This includes implied use of $_ inside {...} due to a .method call with no left hand side argument. In other words, even { .foo } has a signature ((;; $_? is raw)) due to .foo's lack of a left hand side argument.
Use of a "placeholder" variable (e.g. $^foo).
As with an explicit signature, if braced code has an implicit signature then it always constructs a Block no matter what's inside the braces:
say WHAT { key => $_ } # (Block)
say WHAT { key => 'value', .foo, .bar } # (Block)
Top level statements mean it's a Block
say WHAT { :foo; (do 'a'), (do 'b') } # (Block)
say WHAT { :foo, (do 'a'), (do 'b') } # (Hash)
The second line contains multiple statements but they're producing values within individual elements of a list that's the single top level expression.
A top level declaration of an identifier means it's a Block
A declaration is a statement, but I've included this section just in case someone doesn't realize that.
say WHAT { :foo, $baz, {my $bar} } # (Hash)
say WHAT { :foo, $baz, (my $bar) } # (Block)
The first line contains a Block as a key that contains a declaration (my $bar). But that declaration belongs to the inner {my $bar} Block, not the outer {...}. So the inner Block is just a value as far as the outer {...} is concerned, and thus that outer braced code is still interpreted as a Hash.
In contrast the second line declares a variable directly within the outer {...}. So it's a Block.
Still Blocks, not Hashes
Recall that, to be a Hash, the content of braced code must be a list that begins with either a % sigil'd variable or a literal pair. So these all produce Blocks:
my $bar = key => 'value';
say WHAT { $bar, %baz } # (Block)
say WHAT { |%baz } # (Block)
say WHAT { %@quux } # (Block)
say WHAT { 'a', 'b', key => $foo } # (Block)
say WHAT { Pair.new: 'key', $foo } # (Block)
Footnotes
1 This "Hash or Block?" question is an example of DWIM design. In Raku culture, good DWIM design is considered a good thing. But every DWIM comes with corresponding WATs[3]. The key to good DWIM design is ensuring that, in general, WATs' barks are worse than their bites[4]; and that the barks are useful[5]; and that the net benefits of the DWIM are considered to far outweigh all the barking and biting.[6]
2 A term is Raku's analog of a noun or noun phrase in English. It's a value.
Examples of braced blocks that are terms:
.say given { ... } # closure? hash?
say 42, { ... } # closure? hash?
Examples of braced blocks that are not terms:
if True { ... } # always a closure
class foo { ... } # always a package
put bar{ ... } # always a hash index
This answer only discusses braced blocks that are terms. For more details about terms, or more specifically "term position" (places in the grammar where a braced block will be interpreted as a term), see the comments below this answer.
3 WAT refers to a dev's incredulous surprise when something seems crazy to them. It's known that, even for well designed DWIMs, for each one that works for most folk, most of the time, there are inevitably one or more related WATs that surprise some folk, some of the time, including some of the same folk who at other times benefit from the DWIM.
4 The bite of the WATs related to this DWIM varies. It's typically a bark (error message) that makes the problem obvious. But it can also be much more obscure:
say { a => 42 }() ; # No such method 'CALL-ME' for invocant of type 'Hash' WAT? Oh.
say { a => $_ }<a> ; # Type Block does not support associative indexing. WAT? Oh.
say { a => $_, b => 42, c => 99 } .elems # 1 WAT?????
5 A "bark" is an error message or warning in documentation. These can often be improved. cf Lock.protect({}) fails, but with surprising message.
6 Community member opinions differ on whether DWIM design in general, or any given DWIM in particular, is worth it. cf my perspective vs Sam's answer to this question.
The preferred Perl6 way is to use %( ) to create hashes.
my $color-name-to-rgb = %(
'red', 'FF0000',
);
I would not recommend people use braces to create hashes, ever. If they want to make a hash then %( ) is the proper way to do it.
If you are coming from the Perl 5 world it's best to just get in the habit of using %( ) instead of { } when creating a Hash.
I have written a program to export files to a specific directory, and I feel I have written some unnecessary logic, so I would like to know the shortest and best way to export the files. Let me share what I tried:
DEFINE VARIABLE cData AS CHARACTER NO-UNDO.
DEFINE VARIABLE i AS INTEGER NO-UNDO.
DEFINE VARIABLE icount AS INTEGER NO-UNDO.
DEFINE VARIABLE cName AS CHARACTER NO-UNDO.
DEFINE VARIABLE cPath AS CHARACTER NO-UNDO.
DEFINE TEMP-TABLE ttdata
FIELD GetName AS CHARACTER
FIELD iValue AS INTEGER.
ASSIGN
icount = 2
cPath = "*******".
DO I = 1 TO icount:
IF I = 1 THEN cName = "David".
IF I = 2 THEN cName = "Macavo".
CREATE ttdata.
ASSIGN
ttdata.GetName = cName
ttdata.iValue = 100.
END.
/** ttdata has two records now*/
FOR EACH ttdata.
RUN CallProc.p (INPUT ttdata.GetName,
INPUT ttdata.iValue).
END.
PROCEDURE CallProc:
DEFINE INPUT PARAMETER getName AS CHARACTER NO-UNDO.
DEFINE INPUT PARAMETER iValue AS INTEGER NO-UNDO.
OUTPUT TO cPath.
PUT UNFORMATTED ttdata.GetName ttdata.GetName.
OUTPUT CLOSE.
END PROCEDURE.
With my logic it works and exports 2 files as I expected, but I think it is a poor idea to call another procedure. Please help with this case.
I am going to use the sports2000 db in my example. Everyone has a copy so it is easy to run the sample.
define stream outFile. /* using a named stream rather than the default, unnamed, stream avoids unintended conflicts if someone else's code is lazily using the unnamed stream */
function mkTemp returns character ( input tmpid as character, input extension as character ):
define variable fname as character no-undo.
run adecomm/_tmpfile.p ( tmpid, extension, output fname ).
/* create the temp file with no content
*/
output stream outFile to value( fname ).
output stream outFile close.
return fname.
end.
procedure doStuff:
define input parameter tmpfile as character no-undo.
define input parameter custid as integer no-undo.
output stream outFile to value( tmpFile ) append. /* open the existing file in append mode */
put stream outFile "customer:" custId skip.
for each order no-lock where order.custNum = custId and orderStatus <> "shipped" and salesRep = "bbb":
put stream outFile orderNum " " promised skip.
end.
output stream outFile close.
return.
end.
define variable i as integer no-undo.
define variable tmpName as character no-undo.
/* tmpName = mkTemp( "xyzzy", ".tmp" ). */ /* if you only need one temp file get the name here and comment it out below */
for each customer no-lock:
tmpName = mkTemp( "xyzzy", ".tmp" ). /* use this if every customer should get a distinct temp file */
run doStuff ( tmpName, custNum ).
/* if there is no good reason to be calling the doStuff() procedure then just remove it and do it inline like this: */
/*
*
output stream outFile to value( tmpName ) append. /* open the existing file in append mode */
put stream outFile "customer:" customer.custNum skip.
for each order no-lock where order.custNum = customer.CustNum and orderStatus <> "shipped" and salesRep = "bbb":
put stream outFile orderNum " " promised skip.
end.
output stream outFile close.
*/
i = i + 1.
if i >= 3 then leave. /* just do 3 customers for the sample run... */
end.
The program doesn't look too bad at first glance, but there are several issues.
The DEFINE TEMP-TABLE could use a NO-UNDO.
You should probably use "FOR EACH ttdata:" instead of "FOR EACH ttdata." which is old style.
You are running CallProc.p which is an external program and not the internal procedure included in your example. If your code actually runs you would have to show us the code in CallProc.p.
Assuming the code from CallProc, the file you open is literally named cPath. (I don't get why you are saying two files are written.) If you want the file to be named "*******" you have to write value(cPath) instead of cPath, but "*******" is an invalid name in Windows anyway.
It doesn't hurt too much to run a procedure for every line. The bigger issue is that you open and close the file every time. Open the file before the for each and close it afterwards. If you are using a current OpenEdge version you should close it inside a finally block.
Also you are opening the file without APPEND which means that you are overwriting it every time, so only the last record gets written.
As for not using a procedure this should be pretty trivial, especially since you don't use the parameters you pass to the procedure. You are currently outputting ttdata.GetName twice though, which is probably an error. Also you are missing a SKIP at the end of the put statement and a space in between since UNFORMATTED doesn't add any spaces. I suppose you should have written PUT UNFORMATTED getName " " iValue skip.
I suppose this is some kind of homework?
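The main points above (open the output once before the loop, write one line per record with a separator and a line break, close in a finally block) can be sketched as follows. This is a Python illustration of the pattern, not ABL; export_records and the temporary export.txt path are hypothetical.

```python
import os
import tempfile


def export_records(records, path):
    """Write every (name, value) pair to one file that is opened a single time."""
    out = open(path, "w", encoding="utf-8")   # opened once, before the loop
    try:
        for name, value in records:
            # Explicit space between the fields and a line break per record.
            out.write(f"{name} {value}\n")
    finally:
        out.close()                           # runs even if a write fails


records = [("David", 100), ("Macavo", 100)]
path = os.path.join(tempfile.mkdtemp(), "export.txt")
export_records(records, path)

with open(path, encoding="utf-8") as check:
    content = check.read()
print(content)
```

Opening inside the loop without append mode would leave only the last record in the file, which is the overwrite problem described above.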
If you want two (or more) separate export files, you'll need to give them unique names. I did that here by reusing your 'I' variable and reassigning cPath each time. And although I don't agree that calling a separate procedure to write the file is a poor idea, I've incorporated it into the single FOR-EACH loop. I've also fixed some of the points that idspispopd made.
DEFINE VARIABLE i AS INTEGER NO-UNDO.
DEFINE VARIABLE icount AS INTEGER NO-UNDO.
DEFINE VARIABLE cName AS CHARACTER NO-UNDO.
DEFINE VARIABLE cPath AS CHARACTER NO-UNDO.
DEFINE TEMP-TABLE ttdata NO-UNDO
FIELD GetName AS CHARACTER
FIELD iValue AS INTEGER.
ASSIGN
icount = 2.
DO I = 1 TO icount:
/* Using a CASE statement makes it easier to add in other values in the future */
CASE I:
WHEN 1 THEN cName = "David".
WHEN 2 THEN cName = "Macavo".
END CASE.
CREATE ttdata.
ASSIGN
ttdata.GetName = cName
ttdata.iValue = 100.
END.
/** ttdata has two records now*/
I = 1.
FOR EACH ttdata NO-LOCK:
cPath = ".\" + STRING(I) + ".txt".
OUTPUT TO VALUE(cPath).
PUT UNFORMATTED ttdata.GetName " " ttdata.iValue SKIP. /* UNFORMATTED adds no spaces, so add one explicitly */
OUTPUT CLOSE.
I = I + 1.
END.
I'm having some issues with bison (again).
I'm trying to pass a string value through a "recursive rule" in my grammar file using $$,
but when I print the value I have passed, the output looks like a wrong reference ( AU�� ) instead of the value I wrote in my input file.
line: tok1 tok2
| tok1 tok2 tok3
{
int len=0;
len = strlen($1) + strlen($3) + 3;
char out[len];
strcpy(out,$1);
strcat(out," = ");
strcat(out,$3);
printf("out -> %s;\n",out);
$$ = out;
}
| line tok4
{
printf("line -> %s\n",$1);
}
Here I've reported a simplified part of the code.
Giving in input the token tok1 tok2 tok3 it should assign to $$ the out variable (with the printf I can see that in the first part of the rule the out variable has the correct value).
When tok4 is matched next, I'm in the recursive part of the rule. But when I print the $1 value (which should be equal to out since I have passed it through $$), I don't get the right output.
You cannot set:
$$ = out;
because the string that out refers to is just about to vanish into thin air, as soon as the block in which it was declared ends.
In order to get away with this, you need to malloc the storage for the new string.
Also, you need strlen($1) + strlen($3) + 4; because you need to leave room for the NUL terminator.
It's important to understand that C does not really have strings. It has pointers to char (char*), but those are really pointers. It has arrays (char []), but you cannot use an array as an aggregate. For example, in your code, out = $1 would be illegal, because you cannot assign to an array. (Also because $1 is a pointer, not an array, but that doesn't matter because any reference to an array, except in sizeof, is effectively reduced to a pointer.)
So when you say $$ = out, you are making $$ point to the storage represented by out, and that storage is just about to vanish. So that doesn't work. You can say $$ = $1, because $1 is also a pointer to char; that makes $$ and $1 point to the same character. (That's legal but it makes memory management more complicated. Also, you need to be careful with modifications.) Finally, you can say strcpy($$, out), but that relies on $$ already pointing to a string which is long enough to hold out, something which is highly unlikely, because what it means is to copy the storage pointed to by out into the location pointed to by $$.
Also, as I noted above, when you are using "string" functions in C, they all insist that the sequence of characters pointed to by their "string" arguments (i.e. the pointer-to-character arguments) must be terminated with a 0 character (that is, the character whose code is 0, not the character 0).
If you're used to programming in languages which actually have a string datatype, all this might seem a bit weird. Practice makes perfect.
The bottom line is that what you need to do is to create a new region of storage large enough to contain your string, like this (I removed out because it's not necessary):
$$ = malloc(len + 1); // room for NUL
strcpy($$, $1);
strcat($$, " = ");
strcat($$, $3);
// You could replace the strcpy/strcat/strcat with:
// sprintf($$, "%s = %s", $1, $3)
Note that storing mallocd data (including the result of strdup and asprintf) on the parser stack (that is, as $$) also implies the necessity to free it when you're done with it; otherwise, you have a memory leak.
I've solved it by changing the $$ = out; line into strcpy($$,out); and now it works properly.
I've been experimenting with genetic algorithms lately, and now I'd like to build mathematical expressions out of the genomes (put simply, the goal is to find an expression that matches a certain outcome).
I have genomes consisting of genes which are represented by bytes. One genome can look like this: {12, 127, 82, 35, 95, 223, 85, 4, 213, 228}. The length is not predefined (although it must fall in a certain range), and neither is the form it takes. That is, any entry can take any byte value.
Now the trick is to translate this to mathematical expressions. It's fairly easy to determine basic expressions, for example: pick the first 2 values and treat them as operands, pick the 3rd value and treat it as an operator ( +, -, *, /, ^ , mod ), pick the 4th value as an operand, and pick the 5th value as an operator again, working on the result of the 3rd operator applied to the first 2 operands (or just handle it as a postfix expression).
The complexity rises when you start allowing priority rules. Now when, for example, the entry under index 2 represents a '(', you're bound to have a ')' somewhere further on, not at entry 3, but not necessarily at entry 4 either.
Of course the same goes for many things, you can't end up with an operator at the end, you can't end up with a loose number etc.
Now I can make a HUGE switch statement (for example) covering all the possibilities, but this will make the code unreadable. I was hoping someone out there knows a good strategy for taking this on.
Thanks in advance!
** EDIT **
On request: The goal I'm trying to achieve is to make an application which can resolve a function for a set of numbers. As for the example I've given in the comment below: {4, 11, 30} and it might come up with the function (X ^ 3) + X
Belisarius in a comment gave a link to an identical topic: Algorithm for permutations of operators and operands
My code:
private static double ResolveExpression(byte[] genes, double valueForX)
{
// following: https://stackoverflow.com/questions/3947937/algorithm-for-permutations-of-operators-and-operands/3948113#3948113
Stack<double> operandStack = new Stack<double>();
for (int index = 0; index < genes.Length; index++)
{
int genesLeft = genes.Length - index;
byte gene = genes[index];
bool createOperand;
// only when there are enough possible operators left, possibly add operands
if (genesLeft > operandStack.Count)
{
// only when there are at least 2 operands on the stack
if (operandStack.Count >= 2)
{
// randomly determine whether to create an operand by treating everything below 127 as an operand and the rest as an operator (better than / 2 due to 0 values)
createOperand = gene < byte.MaxValue / 2;
}
else
{
// else we need an operand for sure since an operator is illegal
createOperand = true;
}
}
else
{
// false for sure since there would otherwise be too many operands to complete the expression
createOperand = false;
}
if (createOperand)
{
operandStack.Push(GeneToOperand(gene, valueForX));
}
else
{
// the top of the stack is the right-hand operand, the one below it is the left-hand operand
double right = operandStack.Pop();
double left = operandStack.Pop();
double result = PerformOperator(gene, left, right);
operandStack.Push(result);
}
}
// should be 1 operand left on the stack which is the ending result
return operandStack.Pop();
}
private static double PerformOperator(byte gene, double left, double right)
{
// There are 6 options currently supported, namely: +, -, *, /, ^ and log (math)
int code = gene % 6;
switch (code)
{
case 0:
return left + right;
case 1:
return left - right;
case 2:
return left * right;
case 3:
return left / right;
case 4:
return Math.Pow(left, right);
case 5:
return Math.Log(left, right);
default:
throw new InvalidOperationException("Impossible state");
}
}
private static double GeneToOperand(byte gene, double valueForX)
{
// We only support numbers 0 - 9 and X
int code = gene % 11; // Get a value between 0 and 10
if (code == 10)
{
// 10 is a placeholder for x
return valueForX;
}
else
{
return code;
}
}
Use "post-fix" notation. That handles priorities very nicely.
Post-fix notation handles the "grouping" or "priority rules" trivially.
For example, the expression b**2-4*a*c, in post-fix is
b, 2, **, 4, a, *, c, *, -
To evaluate a post-fix expression, you simply push the values onto a stack and execute the operations.
So the above becomes something approximately like the following.
stack.push( b )
stack.push( 2 )
x, y = stack.pop(), stack.pop(); stack.push( y ** x )
stack.push( 4 )
stack.push( a )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
stack.push( c )
x, y = stack.pop(), stack.pop(); stack.push( y * x )
x, y = stack.pop(), stack.pop(); stack.push( y - x )
To make this work, you need to partition your string of bytes into values and operators. You also need to check the "arity" of all your operators to be sure that the number of operators and the number of operands balance out. In this case, the number of binary operators + 1 is the number of operands. Unary operators don't require extra operands.
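The stack evaluation and the arity check can be sketched together. This is a minimal Python illustration, not a definitive implementation: the OPERATORS table, the "neg" unary operator and the function names are assumptions made up for the example.

```python
import operator

# Hypothetical operator table: token -> (arity, implementation).
OPERATORS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    "**": (2, operator.pow),
    "neg": (1, operator.neg),
}


def is_balanced(tokens):
    """Check that every operator finds its operands and exactly one value remains."""
    depth = 0
    for token in tokens:
        if token in OPERATORS:
            arity = OPERATORS[token][0]
            if depth < arity:
                return False
            depth -= arity - 1      # consumes `arity` values, pushes one result
        else:
            depth += 1
    return depth == 1


def evaluate(tokens, variables):
    """Evaluate a postfix token list; names are looked up in `variables`."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            arity, func = OPERATORS[token]
            # Popping reverses the order, so flip the operands back.
            args = [stack.pop() for _ in range(arity)][::-1]
            stack.append(func(*args))
        else:
            stack.append(variables.get(token, token))
    return stack.pop()


expr = ["b", 2, "**", 4, "a", "*", "c", "*", "-"]
print(is_balanced(expr), evaluate(expr, {"a": 1, "b": 5, "c": 6}))
```

Running is_balanced before evaluate gives a cheap way to reject (or penalise) genomes whose operators and operands do not line up.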
As ever with GA a large part of the solution is choosing a good representation. RPN (or post-fix) has already been suggested. One concern you still have is that your GA might throw up expressions which begin with operators (or mismatch operators and operands elsewhere) such as:
+,-,3,*,4,2,5,+,-
A (small) part of the solution would be to define evaluations for operand-less operators. For example one might decide that the sequence:
+
evaluates to 0, which is the identity element for addition. Naturally
*
would evaluate to 1. Mathematics may not have figured out what the identity element for division is, but APL has.
Now you have the basis of an approach which doesn't care if you get the right sequence of operators and operands, but you still have a problem when you have too many operands for the number of operators. That is, what is the interpretation of the following (postfix) sequence?
2,4,5,+,3,4,-
which (possibly) evaluates to
2,9,-1
Well, now you have to invent your own convention if you want to reduce this to a single value. But you could adopt the convention that the GA has created a vector-valued function.
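A rough Python sketch of this lenient evaluation: missing operands are replaced by an identity value, and whatever is left on the stack is returned as a vector. The identities chosen for "-" and "/" are a convention assumed for this example only, and evaluate_lenient is a hypothetical name.

```python
import operator

# Operator -> (function, value substituted when an operand is missing).
# 0 for "+" and 1 for "*" follow the text; "-" and "/" are assumed conventions.
OPERATORS = {
    "+": (operator.add, 0),
    "-": (operator.sub, 0),
    "*": (operator.mul, 1),
    "/": (operator.truediv, 1),
}


def evaluate_lenient(tokens):
    """Evaluate postfix tokens, padding missing operands with identities.

    Returns the whole stack, so surplus operands yield a vector of values.
    """
    stack = []
    for token in tokens:
        if token in OPERATORS:
            func, identity = OPERATORS[token]
            right = stack.pop() if stack else identity
            left = stack.pop() if stack else identity
            stack.append(func(left, right))
        else:
            stack.append(token)
    return stack


print(evaluate_lenient(["+"]))
print(evaluate_lenient(["*"]))
print(evaluate_lenient([2, 4, 5, "+", 3, 4, "-"]))
```

With this convention every token sequence evaluates to something, so the GA never has to discard a genome for being syntactically invalid.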
EDIT: response to OP's comment ...
If a byte can represent either an operator or an operand, and if your program places no restrictions on where a genome can be split for reproduction, then there will always be a risk that the offspring represents an invalid sequence of operators and operands. Consider, instead of having each byte encode either an operator or an operand, a byte could encode an operator+operand pair (you might run out of bytes quickly so perhaps you'd need to use two bytes). Then a sequence of bytes might be translated to something like:
(plus 1)(plus x)(power 2)(times 3)
which could evaluate, following a left-to-right rule with a meaningful interpretation for the first term, to 3((x+1)^2)
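One possible encoding of the operator+operand pair idea, sketched in Python. The byte layout (high nibble selects the operator, low nibble the operand, with 10 standing for x), the starting value of 0 and the names OPS, decode and evaluate are all assumptions made for this illustration.

```python
import operator

# Hypothetical operator table, indexed by the high nibble of a gene.
OPS = [
    ("plus", operator.add),
    ("minus", operator.sub),
    ("times", operator.mul),
    ("power", operator.pow),
]


def decode(gene):
    """Split one byte into (operator name, operator function, operand code)."""
    name, func = OPS[(gene >> 4) % len(OPS)]
    operand_code = (gene & 0x0F) % 11    # 0..9 are constants, 10 stands for x
    return name, func, operand_code


def evaluate(genome, x):
    """Apply each (operator, operand) pair left to right, starting from 0."""
    result = 0
    for gene in genome:
        _, func, code = decode(gene)
        result = func(result, x if code == 10 else code)
    return result


# (plus 1)(plus x)(power 2)(times 3)
genome = bytes([0x01, 0x0A, 0x32, 0x23])
print(evaluate(genome, 2))
```

Because every byte decodes to a complete pair, a genome can be cut at any byte boundary during crossover and both halves still evaluate.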