Convert EBNF to BNF - bnf

I am struggling to convert this EBNF to BNF. Using the image:
I converted this to EBNF and would like to now convert this to BNF.
The EBNF I have two alternatives:
number_constant ::= ( | "-") digit+ ("." digit+ | )
number_constant ::= "-"? digit+ ("." digit+)?
The part where I am struggling is the middle of the diagram, I have digit defined as 1-9 so can't use digit as keyword. I was thinking of breaking down the diagram such as the first part:
<min> ::= ' ' | "-"
Then for the mid part:
<dig> ::= <digit> | <digit> <dig>
Combined this would look simply like:
<number_constant> ::= <min> <dig> <last_part>
Then I am unsure of the last part.
Any help is appreciated.

Your dig solution seems correct.
The last part can be implemented with:
<last_part> ::= "." <dig> | ""

Extended BNF sure lets you have things a lot more concise.
Here's a variation based on the semantics of what goes into making up a decimal number:
<number_constant> ::= <integer>
| <integer> '.' <whole_number>
<integer> ::= <integer>
| '- <whole_number>
<whole_number> ::= Digit
| <whole_number> Digit

Related

StartTag: invalid element name Error: 1: StartTag: invalid element name

I have an xml which I am trying to parse using xmlParse in R. I have a number of xml's which are very similar to what I am trying below and I have no issues, however when trying the exact same process using one of my xml's, I get the below error message.
a = "productlist1374.xml"
b = xmlParse(a)
StartTag: invalid element name
Error: 1: StartTag: invalid element name
Only certain characters are permitted in XML names by the W3C XML BNF for component names:
Name ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
[#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
[#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
[#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
[#x10000-#xEFFFF]
NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
[#x203F-#x2040]
You've not posted your XML, but clearly one or more of your start tags uses a character or characters that are not allowed.

How to replace text sequences ending in a fixed pattern within a long text string in R?

I have a column within a data frame containing long text sequences (often in the thousands of characters) of the format:
abab(VR) | ddee(NR) | def(NR) | fff(VR) | oqq | pqq | ppf(VR)
i.e. a string, a suffix in brackets, then a delimiter
I'm trying to work out the syntax in R to delete the items that end in (VR), including the trailing pipe if present, so that I'm left with:
ddee(NR) | def(NR) | oqq | pqq
I cannot work out the regular expression (or gsub) that will remove these entries and would like to request if anyone could help me please.
If you want to use gsub, you can remove the pattern in two stages:
gsub(" \\| $", "", gsub("\\w+\\(VR\\)( \\| )?", "", s))
# firstly remove all words ending with (VR) and optional | following the pattern and
# then remove the possible | at the end of the string
# [1] "ddee(NR) | def(NR) | oqq | pqq"
regular expression \\w+\\(VR\\) will match words ending with (VR), parentheses are escaped by \\;
( \\| )? matches optional delimiter |, this makes sure it will match the pattern both in the middle and at the end of the string;
possible | left out at the end of the string can be removed by a second gsub;
Here is a method using strsplit and paste with the collapse argument:
paste(sapply(strsplit(temp, split=" +\\| +"),
function(i) { i[setdiff(seq_along(i), grep("\\(VR\\)$", i))] }),
collapse=" | ")
[1] "ddee(NR) | def(NR) | oqq | pqq"
We split on the pipe and spaces, then feed the resulting list to sapply which uses the grep function to drop any elements of the vector that end with "(VR)". Finally, the result is pasted together.
I added a subsetting method with setdiff so that vectors without any "(VR)" will return without any modification.

EBNF Definition of Identifier

The EBNF definition of an identifier is (a-zA-Z, _ ){a-zA-Z0-9, _ }. Can someone explain this definition and give me a valid identifier by this definition.
The syntax of EBNF like languages differ a lot.
Normally I would define something like this:
letter = "a" | "b" | ... | "z" | "A" | ... | "Z";
digit = "0" | "1" | "2" | ... | "9";
identifier = letter , { letter | digit | "_" } ;
Your form looks like a mixture of EBNF and regex.
It is hard to tell what this means if I don't know which language we are talking about.
But by pure guessing, I would say it describes a C-like identifier (e.g. variable name) like "myVar_0123ab".
The identifier has to start with a letter, or an underline '_', followed by letters, underlines and digits.

Representing simple mathematics using BNF

I have written the following BNF "code", which attempts to describe simple mathematics using BNF. The issue I am having is that I have no idea how to add parentheses (brackets).
Digit ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
Digits ::= <Digit>|<Digit><Digit>;
Number ::= <Digits>|<Digits>.<Digits>;
Addition ::= <Value> + <Value>;
Subtraction ::= <Value> - <Value>;
Multiplication ::= <Value> * <Value>;
Division ::= <Value> / <Value>;
Value ::= <Number>|<Addition>|<Subtraction>|<Multiplication>|<Division>;
The other issue is that I'm not sure that the BNF is 100% correct, as the Value "description" doesn't look right to me.
Digit ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
Digits ::= <Digit>|<Digit><Digits>;
Number ::= <Digits>|<Digits>.<Digits>;
Operator ::= "+" | "-" | "*" | "/"
Bracket_Left ::= "("
Bracket_Right ::= ")"
Value ::= <Number>|<Bracket_Left><Value><Bracket_Right>|<Value><Operator><Value>
Maybe not the most elegant solution, but should work. Always keep in mind the power of recursion.
If you are after operator precedence too, you should use well known method by a recursion (right one in my example):
AddSub ::= <MulDiv> ("+" | "-") <AddSub> | <MulDiv>;
MulDiv ::= <Brackets> ("*" | "/") <MulDiv> | <Brackets>;
Brackets ::= "(" <AddSub> ")" | <Decimal>;
Decimal ::= <Integer> "." <Integer> | <Integer>;
Integer ::= <Digit> <Integer> | <Digit>;
Digit ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
and operator precedence is automatically followed by parser, without further intervention. I didn't invent this method, it is there for decades, but I have to admit it's kind of genial.

Backus Naur Form Assoicativity

Is this the correct way to implement right associativity for Exponentiation PowExp? So that 2^3^4 is actually (2^(3^4))
<Exp> ::= <Exp> + <MulExp>
| <Exp> - <MulExp>
| <MulExp>
<MulExp> ::= <MulExp> * <PowExp>
| <MulExp> / <PowExp>
| <PowExp>
<PowExp> ::= <NegExp> ^ <PowExp>
|<NegExp>
<NegExp> ::= - <RootExp>
| <RootExp>
<RootExp> ::= ( <Exp> )
| 1 | 2 | 3 | 4
The way you've written it is correct.
Incidentally, you might want to reconsider your hierarchy; in regular math, −34 is −(34), not (−3)4. So you might want - 3 ^ 4 to mean - (3 ^ 4), in which case NegExp would include PowExp rather than the other way around. (But I suppose it could be confusing if -3 ^ 4 means -(3 ^ 4), so maybe there's no intuitive order-of-operations here? Another possibility is to require parentheses for either reading, by having PowExp and NegExp both depend directly on RootExp.)

Resources