I have a regular expression:
10+(0+11)*1
How do I convert this regular expression to a finite state automaton?
There are algorithms used in the proofs of equivalence between REs and NFAs on the one hand, and between NFAs and DFAs on the other. Applying them is one option, and it does not require any insight into the language the RE generates.
Another option is to try to understand the language first, and then write a DFA for the language from scratch.
The way I will show you involves the Myhill-Nerode theorem, which says that a regular language has as many equivalence classes under the indistinguishability relation as there are states in a minimal DFA for that language. Two strings are indistinguishable with respect to the language if exactly the same set of strings can be appended to each of them to get a string in the language. For instance, the strings a, ab and ba are pairwise distinguishable w.r.t. L(ab), since a can be followed by b, ab by the empty string only, and ba by nothing (the empty set) to get a string in L. This tells us a minimal DFA for ab requires at least three states.
For your language, L(10 + (0 + 11)*1), we observe:
the empty string is the first we look at. It needs a state - the DFA's initial state - and can be followed by any string in L to get a string in L. Call this state [e].
the string 0 can be followed by any string in L((0 + 11)*1), such as 1, to get a string in the language, but not by 10, which can follow the empty string; this makes it different from the empty string, so a new state is required. Call this [0].
the string 1 is itself in the language, so it can be followed by the empty string to get a string in the language; this makes it distinguishable from the empty string and the string 0, neither of which is in the language. Call its state [1].
the string 00 can be followed by exactly the same strings as could follow 0 to get a string in the language. This means 00 does not need a new state; 00 will take our DFA to state [0].
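the string 01 is itself in the language, so it can be followed by the empty string, and it can also be followed by strings of the form 1(0 + 11)*1 to get a string in the language. Unlike 1, it cannot be followed by 0. This is new, so we need a new state. Call this [01].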
the string 10 can be followed by the empty string only to get a string in L. This is new, so call its state [10].
the string 11 can be followed by exactly the same strings as could follow 0 to get a string in the language. 11 will take our DFA to state [0].
the string 010 can't ever lead to a string in the language; it must lead to a dead state in our minimal DFA. Call this [010].
the string 011 is indistinguishable from strings 0 and 11.
the strings 100 and 101 can't lead to a string in the language, so they must take our DFA to the dead state [010].
The states we found we needed are these: [e], [0], [1], [01], [10] and [010]. The transitions are not too hard to figure out:
[e] transitions to [0] and [1] on inputs 0 and 1, respectively
[0] transitions to [0] and [01] on inputs 0 and 1, respectively
[1] transitions to [10] and [0] on inputs 0 and 1, respectively
[01] transitions to [010] and [0] on inputs 0 and 1, respectively
[10] transitions to [010] on inputs 0 or 1
[010] always transitions to itself
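The accepting states are [1], [01] and [10], since 1, 01 and 10 are the representative strings that belong to the language. You now have a minimal DFA for your language, as well as a proof of such minimality.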
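As a sanity check, the transition table above can be written down and compared against the regular expression by brute force. This is only a sketch: the dictionary layout and the names `TRANSITIONS`, `ACCEPTING`, `accepts` and `agrees_up_to` are made up for illustration, the accepting states are assumed to be the ones whose representative string is in the language, and `|` is Python's spelling of the `+` used above.

```python
import re
from itertools import product

# States are named after the shortest string that reaches them.
TRANSITIONS = {
    "e":   {"0": "0",   "1": "1"},
    "0":   {"0": "0",   "1": "01"},
    "1":   {"0": "10",  "1": "0"},
    "01":  {"0": "010", "1": "0"},
    "10":  {"0": "010", "1": "010"},
    "010": {"0": "010", "1": "010"},  # dead state
}
# Assumption: a state accepts when its representative string is in L.
ACCEPTING = {"1", "01", "10"}


def accepts(word):
    """Run the DFA on a string over {0, 1} and report acceptance."""
    state = "e"
    for symbol in word:
        state = TRANSITIONS[state][symbol]
    return state in ACCEPTING


# The same language as a Python regex, used only for cross-checking.
PATTERN = re.compile(r"10|(0|11)*1")


def agrees_up_to(max_len):
    """Compare DFA and regex on every binary string up to max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            if accepts(word) != bool(PATTERN.fullmatch(word)):
                return False
    return True
```

Calling `agrees_up_to(8)` compares the two on all 511 binary strings of length at most eight. Agreement on short strings is not a proof, but it catches a wrong transition quickly.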
Consider the following code in built-in-library-tests.robot:
*** Test Cases ***
Use "Convert To Hex"
    ${hex_value} =    Convert To Hex    255    base=10    prefix=0x    # Result is 0xFF
    # Question: How does the following statement work step by step?
    Should Be True    ${hex_value}==${0xFF}    # Is ${0xFF} considered by Robot a string or an integer value in base 16?
    # To answer my own question, here is a hypothesis:
    # For Python to execute the expression:
    # Should Be True    a_python_expression_in_a_string_without_quotes    # i.e. 0xFF==255
    # To reach that target, I think of a 2-step solution:
    # Step 1: When a variable is used in the expression using the normal ${hex_value} syntax, its value is replaced before the expression is evaluated.
    # This means that the value used in the expression will be the string representation of the variable value, not the variable value itself.
    Should Be True    0xFF==${0xFF}
    # Step 2: When the hexadecimal value 0xFF is given in ${} decoration, Robot converts the value to its
    # integer representation 255 and puts the string representation of 255 into the expression
    Should Be True    0xFF==255
The test above passes with all its steps. I want to check with my community, is my 2 step hypothesis solution correct or not? Does Robot exactly go through these steps, before evaluating the final expression 0xFF==255 in Python?
Robot receives the expression as the string ${hex_value}==${0xFF}. It then performs variable substitution, yielding the string 0xFF==255. This string is then passed to Python's eval function.
The reason for the right hand side being 255 is described in the user guide:
It is possible to create integers also from binary, octal, and hexadecimal values using 0b, 0o and 0x prefixes, respectively. The syntax is case insensitive.
${0xFF} gets replaced with 255, and ${hex_value} gets substituted with whatever is in that variable. In this case, that variable contains the four-character string 0xFF.
Thus, ${hex_value}==${0xFF} gets converted to 0xFF==255, and that gets passed to eval as a string.
In other words, it's exactly the same as if you had typed eval("0xFF==255") at a Python interactive prompt.
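The substitution-then-eval sequence can be imitated in plain Python. This only mimics the behaviour described above and is not Robot Framework's actual implementation; the names `hex_value`, `substituted` and `result` are chosen for illustration.

```python
# ${hex_value} holds the string returned by Convert To Hex.
hex_value = "0xFF"

# ${0xFF} is an integer (255); inserting it into the expression
# uses its string form, just as the f-string does here.
substituted = f"{hex_value}=={0xFF}"

# The finished string is what gets evaluated as a Python expression.
result = eval(substituted)

print(substituted)
print(result)
```

Here `substituted` is the string `0xFF==255`, and Python parses the left-hand side as a hexadecimal integer literal, so the comparison is between two integers.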
I am trying to find a regex that matches a string only if it does not end with three or more '0' characters. Intuitively, I tried:
.*[^0]{3,}$
But this does not match when there are one or two zeroes at the end of the string.
If you have to do it without lookbehind assertions (i. e. in JavaScript):
^(?:.{0,2}|.*(?!000).{3})$
Otherwise, use hsz's answer.
Explanation:
^          # Start of string
(?:        # Either match...
 .{0,2}    # a string of up to two characters
|          # or
 .*        # any string
 (?!000)   # (unless followed by three zeroes)
 .{3}      # followed by three characters
)          # End of alternation
$          # End of string
You can try using a negative look-behind, i.e.:
(?<!000)$
Tests:
Test  Target String  Matches
1     654153640      Yes
2     5646549800     Yes
3     848461158000   No
4     84681840000    No
5     35450008748    Yes
Please keep in mind that negative look-behinds aren't supported in every language, however.
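For an engine that does support look-behind, the five test strings can be checked directly. The sketch below uses Python's `re` module, which accepts fixed-width look-behinds; the names `NOT_THREE_ZEROES`, `targets` and `results` are illustrative only.

```python
import re

# Succeeds only at an end-of-string position not preceded by "000".
NOT_THREE_ZEROES = re.compile(r"(?<!000)$")

targets = [
    "654153640",
    "5646549800",
    "848461158000",
    "84681840000",
    "35450008748",
]

# Map each target string to whether the pattern finds a match.
results = {t: bool(NOT_THREE_ZEROES.search(t)) for t in targets}

for target, matched in results.items():
    print(target, "Yes" if matched else "No")
```

Note that the pattern matches an empty position, so it is useful for testing a string, not for extracting text from it.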
What's wrong with the no-look-behind, more general-purpose ^(.(?!.*0{3,}$))*$?
The general pattern is ^(.(?!.* + not-ending-with-pattern + $))*$. You don't have to reverse engineer the state machine like Tim's answer does; you just insert the pattern you don't want to match at the end.
This is one of those things that RegExes aren't that great at, because the string isn't very regular (whatever that means). The only way I could come up with was to give it every possibility.
.*[^0]..$|.*.[^0].$|.*..[^0]$
which simplifies to
.*([^0]|[^0].|[^0]..)$
That's fine if you only want strings not ending in three 0s, but strings not ending in ten 0s would be long. But thankfully, this string is a bit more regular than some of these sorts of combinations, and you can simplify it further.
.*[^0].{0,2}$
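The look-behind-free patterns from this thread can be compared against a plain "does not end in 000" check on every short string. This is a sketch under assumptions: the test alphabet is just `0` and `1`, strings are at most five characters long, and the names `PATTERNS` and `disagreements` are made up for illustration.

```python
import re
from itertools import product

PATTERNS = {
    "lookahead":  r"^(?:.{0,2}|.*(?!000).{3})$",
    "per_char":   r"^(.(?!.*0{3,}$))*$",
    "char_class": r".*[^0].{0,2}$",
}


def disagreements(pattern, max_len=5, alphabet="01"):
    """List strings where the pattern and str.endswith disagree."""
    regex = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            expected = not s.endswith("000")
            if bool(regex.search(s)) != expected:
                bad.append(s)
    return bad


for name, pattern in PATTERNS.items():
    print(name, disagreements(pattern))
```

Under these assumptions the first pattern agrees everywhere, the per-character pattern accepts the exact string `000` (its lookahead is only tested after the first character has been consumed), and the character-class pattern rejects the empty string, `0` and `00` because it needs at least one non-zero character. Whether those edge cases matter depends on the input.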
I am reading the grammar of SQLite and have a few questions about the following paragraph.
// The name of a column or table can be any of the following:
//
%type nm {Token}
nm(A) ::= id(A).
nm(A) ::= STRING(A).
nm(A) ::= JOIN_KW(A).
nm is used quite widely in the grammar. The lemon parser documentation says:
Typically the data type of a non-terminal is a pointer to the root of
a parse-tree structure that contains all information about that
non-terminal
%type expr {Expr*}
Should I understand {Token} actually stands for a syntactic grouping which is a non-terminal token that "is a parse-tree structure that contains all.."?
What is nm short for in this case? Is it simply "name"?
What is the period sign (dot .) that each nm(A) declaration ends with?
No, you should understand that Token is a C object type used for the semantic value of nms.
(It is defined in sqliteInt.h and consists of a pointer to a character array that is not null-terminated, plus the length of that array.)
The comment immediately above the definition of nm starts with the words "the name", which definitely suggests to me that nm is an abbreviation for "name", yes. That is also consistent with its semantic type, as above, which is basically a name (or at least a string of characters).
All lemon productions end with a dot. It tells lemon where the end of the production is, just as semicolons tell a C compiler where the end of a statement is. This makes it easier to parse consecutive productions, since otherwise the parser would have to look several symbols ahead to see the ::=.
I am using the Rlibstree package version 0.3-2 with the function getLongestCommonSubstring
So I have character strings that only contain 0-9 and >; they look like this:
string A:
0113>0213>0212>0312>0411>0611>0711>0812>1012>1112>1212>1412>1313>1413>1412>1311>1211>1212>1012>1013>0912>0812>0712>0513>0612>0511>0410>0309>0209>0308>0207>0107>0007>0109>0010>0110>0010>0008>0007>0106>0105>0204>0304>0503>0603>0701>0801>0802>0803>0904>1003>1002>1001>1002>1103>1004>0904>0803>0802>0701>0702>0603>0503>0403>0303>0204>0105>0104>0203>0302>0401>0302>0203>0204>0104>0105>0106>0107>0307>0308>0409>0410>0311>0212>0113>0213>0113>0213
String B:
0113>0213>0212>0312>0411>0511>0410>0409>0308>0307>0207>0107>0108>0109>0010>0110>0010>0009>0107>0207>0307>0308>0309>0209>0309>0410>0411>0611>0711>0812>0912>1012>1112>1212>1412>1313>1412>1212>1112>1012>1013>0912>0812>0612>0613>0513>0612>0611>0511>0411>0312>0213>0113>0213>0113>0212>0311>0411>0312>0213>0212>0311>0312>0311>0411>0410>0409>0308>0307>0207>0107>0106>0105>0204>0304>0503>0604>0603>0602>0601>0701>0801>0802>0803>0804>0904>1004>1003>1002>1001>1002>1001>1003>1004>0904>0803>0802>0801>0701>0602>0604>0504>0404>0304>0104>0105>0107>0108>0109>0108>0107>0207>0308>0409>0410>0311>0212>0213
String C:
0113>0213>0113>0213>0113>0213>0212>0311>0411>0611>0812>0912>1012>1212>1312>1412>1413>1314>1313>1213>1413>1412>1411>1311>1212>1011>0911>0811>0712>0611>0411>0410>0409>0309>0209>0309>0408>0410>0510>0611>0712>0611>0511>0411>0311>0310>0409>0309>0307>0207>0108>0109>0110>0010>0109>0108>0107>0006>0106>0105>0204>0203>0303>0204>0203>0302>0401>0402>0401>0302>0203>0304>0404>0504>0503>0604>0705>0605>0705>0604>0505>0504>0603>0503>0403>0303>0203>0104>0105>0005>0107>0108>0109>0108>0107>0207>0107>0106>0104>0204>0304>0404>0504>0603>0604>0603>0503>0504>0603>0702>0701>0801>0802>0804>0904>1004>1003>1002>1001>1002>1003>1104>1205>1304>1303>1403>1404>1403>1304>1205>1104>0904>0804>0802>0801>0701>0602>0703>0604>0704>0602>0701>0601>0602>0603>0504>0404>0303>0203>0204>0105>0106>0107>0207>0308>0408>0409>0308>0309>0409>0410>0411>0511>0611>0812>0912>1012>1112>1012>0912>1013>1012>1112>1212>1312>1313>1213>1313>1312>1412>1313>1312>1413>1313>1213>1313>1312>1112>1012>0911>1011>1112>1312>1412>1312>1413>1313>1312>1212>1112>0911>0811>0711>0511>0411>0312>0212>0312>0411>0511>0611>0612>0413>0513>0612>0611>0411>0312>0212>0213>0212>0213>0113>0213>0113
I want to compare my input strings with String A.
See example below:
If I compare A and B, there is no problem: it finds two longest common substrings. Happy!
getLongestCommonSubstring(c(A,B))
[1] "07>0106>0105>0204>0304>0503>060" "12>1012>1112>1212>1412>1313>141"
But if I compare A and C, something goes wrong. As you can see in the results below,
I get \xc1 or ! at the end, and these special characters change on every run.
Execute First time:
getLongestCommonSubstring(c(A,C))
[1] "04>1003>1002>1001>1002>1\xc1" ">0603>0503>0403>0303>020!"
Execute Second time:
getLongestCommonSubstring(c(A,C))
[1] "04>1003>1002>1001>1002>11" ">0603>0503>0403>0303>020!"
Execute Third time:
getLongestCommonSubstring(c(A,C))
[1] "04>1003>1002>1001>1002>1\xc1" ">0603>0503>0403>0303>020\xc1"
With these special or escape characters in the string, I cannot use functions like nchar(); the characters are redundant and annoying.
As far as I can tell, the only difference between B and C is their length; their format is the same, so I really cannot figure out why this happens.
I am studying mathematical computation and I am completely stuck on this task! I don't even know how to go about starting it!
**Write a program in Fortran that can parse a single line of well-formed HTML or XML markup so that it takes input on a single line (guaranteed to not exceed 80 characters in total) like
<tag>lots of lovely text</tag>
where
tag might be anything from 1 to 37 ASCII characters and will not contain spaces
text could contain spaces and be anything from 1 to 73 characters in length
so that the program outputs one of two lines:
tag : text if the two occurrences of tag match inside <...> and
syntax error if anything else is input.
Any help is hugely appreciated !**
There are a number of intrinsic functions for working with strings that may be helpful.
result = index(string, substring) - returns the starting position of the first occurrence of substring within string, counting from one, or zero if substring does not occur. (Fortran 77)
result = scan(string, set) - returns the position of the first character in string that belongs to set, or zero if there is none. (Fortran 95)
result = verify(string, set) - checks that all the characters in string are present in set; it returns zero if they are, and otherwise the position of the first character that is not. (Fortran 95)
There are a few user-contributed string tokenization functions on the Fortran Wiki that might be helpful:
delim, strtok, and find_field. Also, FLIBS includes some string manipulation and tokenization routines that might be useful as examples.
Finally, there are a number of existing open-source XML parsers written in Fortran: xmlf90 and xml-fortran. Looking at the source code for these libraries should be helpful.
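To get started, it may help to see the control flow before writing any Fortran. The sketch below is in Python, not Fortran, and assumes the input has the shape `<tag>text</tag>`, which is inferred from the length limits in the question. The function name `parse_line` is hypothetical, and the comments note where Fortran's `index` and `scan` would play the same role.

```python
def parse_line(line):
    """Return 'tag : text' for <tag>text</tag>, else 'syntax error'."""
    error = "syntax error"
    if len(line) > 80 or not line.startswith("<"):
        return error

    close = line.find(">")          # Fortran analogue: index(line, '>')
    if close < 2:                   # no '>' at all, or an empty tag "<>"
        return error

    tag = line[1:close]
    if len(tag) > 37 or " " in tag:  # Fortran analogue: scan(tag, ' ') /= 0
        return error

    closing = "</" + tag + ">"
    if not line.endswith(closing):   # the two tag names must match
        return error

    # Everything between the opening and closing tags is the text.
    text = line[close + 1 : len(line) - len(closing)]
    if not 1 <= len(text) <= 73:
        return error

    return tag + " : " + text


print(parse_line("<tag>lots of lovely text</tag>"))
print(parse_line("<a>mismatched</b>"))
```

The sketch relies on the question's guarantee that the markup is well formed, so it does not reject a `<` or `>` appearing inside the text. A Fortran version would follow the same steps using substring slices such as `line(2:close-1)`.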