Ada 2012 RM - Comments and String Literals - ada

I am journeying through the Ada 2012 RM and would like to see if there is a hole in my understanding or a hole in the RM. Assuming that
put_line ("-- this is a not a comment");
is legal code, how can I deduce its legality from the RM, since section 2.7 states that "a comment starts with two adjacent hyphens and extends up to the end of the line.", while section 2.6 states "a string_literal is formed by a sequence of graphic characters (possibly none) enclosed between two quotation marks used as string brackets." It seems like there is tension between the two sections and that 2.7 would win, but that is apparently not the case.

To get a clearer understanding here, you need to have a look at section 2.2 in the RM.
2.2 (1) states:
The text of each compilation is a sequence of separate lexical elements. Each lexical element is formed from a sequence of characters, and is either a delimiter, an identifier, a reserved word, a numeric_literal, a character_literal, a string_literal, or a comment. The meaning of a program depends only on the particular sequences of lexical elements that form its compilations, excluding comments.
And 2.2 (3/2), which states:
"In some cases an explicit separator is required to separate adjacent lexical elements. A separator is any of a separator_space, a format_effector, or the end of a line, as follows:
A separator_space is a separator except within a comment, a string_literal, or a character_literal.
The character whose code point is 16#09# (CHARACTER TABULATION) is a separator except within a comment.
The end of a line is always a separator.
One or more separators are allowed between any two adjacent lexical elements, before the first of each compilation, or after the last."
and
A delimiter is either one of the following special characters:
& ' ( ) * + , - . / : ; < = > |
or one of the following compound delimiters each composed of two adjacent special characters:
=> .. ** := /= >= <= << >> <>
Each of the special characters listed for single character delimiters is a single delimiter except if this character is used as a character of a compound delimiter, or as a character of a comment, string_literal, character_literal, or numeric_literal.
So, once you filter out the white-space of a program text and break it down into a sequence of lexical elements, a lexical element corresponding to a string literal begins with a double quote character, and a lexical element corresponding to a comment begins with --.
These are clearly different syntax items, and do not conflict with each other.
This also explains why
X := A - -1
+ B;
gives a different result than
X := A --1
+ B;
In the first case the space between the hyphens makes them two separate lexical elements, so each is lexed as a minus delimiter followed by the numeric literal 1. In the second case the two adjacent hyphens start a comment, so --1 is discarded up to the end of the line.
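As an illustration of why the string literal wins, here is a minimal Python sketch of a left-to-right scanner that only recognises -- as the start of a comment when it is not already inside a string_literal. The function name split_code_and_comment is made up for this example, and the handling of character literals is deliberately simplified (only the plain 'x' form is skipped); it is not how any real Ada compiler is implemented.

```python
def split_code_and_comment(line):
    """Split one source line into (code, comment), scanning left to right.

    A comment can only start between lexical elements, so "--" inside a
    string literal is just two graphic characters of that literal.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == '"':
                if line[i + 1:i + 2] == '"':
                    i += 1  # "" inside a string literal is a doubled quote
                else:
                    in_string = False  # closing string bracket
        elif ch == '"':
            in_string = True  # opening string bracket
        elif ch == "'" and line[i + 2:i + 3] == "'":
            i += 2  # simplified: skip a character literal such as '"'
        elif line.startswith("--", i):
            return line[:i], line[i:]  # comment runs to the end of the line
        i += 1
    return line, ""
```

With this sketch, the put_line example is returned whole with an empty comment, while X := A --1 is split just before the two hyphens.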

Related

What do these characters inside the parentheses of a parameter expansion (${(…)}) do?

Hi, I am new to zsh and am trying to create a multi-line prompt, and came across this line of code:
local pad=${(pl.$pad_len.. .)}
My 1st question is what is the pl inside the parentheses? Is it a command or operator or a flag(s)?
And my 2nd question is what are the dots that follow $pad_len?
Those are Zsh parameter expansion flags.
l.$pad_len. makes the given (in this case, empty) string exactly $pad_len long, either by truncating it from the left or by padding it on the left with spaces.
l.$pad_len.. . does the same as the above, but specifies explicitly to use the space character for padding — which is unnecessary, since the default is to pad with spaces.
The .s here are arbitrary separators used to enclose each argument for the preceding flag. It doesn’t matter which (matching pair of) punctuation characters you use for this, as long as they enclose each argument in pairs. So, l:$pad_len:: : and l<$pad_len>< > do the exact same thing.
p makes l support print escape codes in the second argument — which is unnecessary, since we don’t use any here.
So, a shorter way to write this would be
local pad=${(l.$pad_len.)}
If you want to do this operation on a non-empty string, you can either pass the name of a variable
local foo=bar
local pad=${(l.$pad_len.)foo}
or pass a literal string with :-
local pad=${(l.$pad_len.):-bar}
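To make the pad-or-truncate behaviour described above concrete, here is a rough Python model of it. The function zsh_left_pad is a hypothetical name for this illustration only, and it assumes a single-character fill; zsh itself also accepts longer fill strings and a second fill argument, which this sketch does not attempt to reproduce.

```python
def zsh_left_pad(value, width, fill=" "):
    """Approximate ${(l.width..fill.)value} for a one-character fill.

    The result is exactly `width` characters long: longer values lose
    characters on the left, shorter values are padded on the left.
    """
    if width <= 0:
        return ""
    truncated = value[-width:]  # keep the rightmost `width` characters
    return truncated.rjust(width, fill)
```

Called with an empty string, as in the prompt code from the question, it simply returns width spaces.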

Distinguish between transpose and command string in Julia lexer

For my thesis I am implementing a parser/lexer for Julia, however some areas are being a bit of a problem.
For background, Julia has a special token that gives the transpose (`); there is also a 'command string' that uses this same token to wrap the string (`command`). The problem I am having is that I can't seem to get a regex that will match properly.
i.e.
this should match for a transpose:
a`
as well as
a` b`
and
a`
b`
and this should match the command string
`a`
and also:
` a
b `
The issue I'm having is that either the lexer matches a command string when there are two transposes in a file, or, when there is a newline in a command string, the parser fails because both backticks are seen as transposes. To me the two cases seem mutually exclusive.
The regexes in the order in which they are in the lexer are:
option 1:
COMMAND
: '`' (ESC|.)*? '`'
;
TRANSPOSE
: '\'' | '`'
;
option 2:
COMMAND
: '`' ( '\\' | ~[\\\r\n\f] )* '`'
;
TRANSPOSE
: '\'' | '`'
;
As has been noted in the comments, the transpose operator in Julia is actually ', rather than `. What has not been noted yet is that there is at least one critical difference between how ' is used and how ` is used that makes your job a lot easier. Specifically:
1. Unlike `, which may be used to quote command strings of any length, ' is only ever used to quote characters. Consequently, the only valid uses of ' as a quotation are around single characters (e.g. 'a') or one of the special escape sequences beginning with \ such as '\n' for newline (the full list being to my knowledge \a, \b, \f, \n, \r, \t, \v, \', \", and \\).
Consequently, the 's in a sequence like [a' b'] can only be interpreted as transposes, since ' b' is not a valid Char.
2. While juxtaposition can be taken to mean multiplication in Julia, and multiplication can in turn be used to concatenate strings in Julia (long story: string concatenation is the associative, noncommutative binary operation of a free monoid and thus analogous to multiplication), juxtaposition is not currently allowed as a way to multiply strings or characters.
Consequently, a sequence like a'b' can only be interpreted as a' * b' and not a * 'b'.
Combining these two more broadly, unless I am missing some edge case, it appears that a new ' following any character other than whitespace, parentheses, or a valid infix operator, is always parsed as transpose, rather than the opening quote of a character literal.
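The rule of thumb above can be sketched as a small scanner. The following Python is only an illustration of the heuristic, not the actual Julia lexer: the name classify_quotes and the OPENERS set are assumptions made for this example, the set of "infix operator" characters is approximate, only opening brackets are treated as openers, and Char forms such as '\x41' are not covered.

```python
import re

# One character, or a backslash escape of one character, between quotes.
CHAR_LITERAL = re.compile(r"'(?:\\.|[^\\'])'")

# Characters after which a ' may open a Char literal (assumed, approximate).
OPENERS = set(" \t\r\n([{,;") | set("+-*/\\^%<>=!&|~:")


def classify_quotes(src):
    """Return a list of ("char", literal) or ("transpose", "'") per quote."""
    found = []
    i = 0
    while i < len(src):
        if src[i] == "'":
            prev = src[i - 1] if i > 0 else " "
            match = CHAR_LITERAL.match(src, i) if prev in OPENERS else None
            if match:
                found.append(("char", match.group()))
                i = match.end()  # consume the whole literal
                continue
            found.append(("transpose", "'"))
        i += 1
    return found
```

In an ANTLR grammar the same idea would probably be expressed with a predicate on the previous token rather than with a pure regex, since the decision depends on context to the left of the quote.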

Using GLOB to match each character in an SQLite TEXT field

I have a TEXT field in an SQLite table that is required to be exactly 17 characters long. Let's call it myfield.
For any arbitrary row in this table, how can we use SQLite's GLOB operator within a CHECK constraint to match each character in the myfield column entry (and, if possible, enforce the 17 character constraint)?
Every character in this entry can be any digit 0 - 9, or any uppercase letter excluding I, O, and Q. Also, the 9th character of the myfield entry can only be a digit 0 - 9 or the letter X, but I'm still trying to get the previous conditions first.
What have I tried?
CHECK(myfield GLOB '[A-HJ-NPR-Z0-9]{17}') - This doesn't work, and I think it's because GLOB doesn't support the curly bracket notation - correct me if I'm wrong.
CHECK(length(myfield) == 17 AND myfield GLOB '[A-HJ-NPR-Z0-9]') - Also doesn't work, presumably because the second check condition only matches a single character, contradicting the first.
I'm convinced there's a simpler solution than setting up 17 check conditions for each character in the string!
In SQLite there is a trick to emulate a function like MySql's REPEAT() which returns a string repeated n times.
So by:
REPLACE(HEX(ZEROBLOB(8)), '00', '[A-HJ-NPR-Z0-9]')
you get the string '[A-HJ-NPR-Z0-9]' repeated 8 times:
[A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9]
You can use this trick to construct (by concatenations) the string that you need after GLOB in your CHECK constraint:
CHECK(myfield GLOB
REPLACE(HEX(ZEROBLOB(8)), '00', '[A-HJ-NPR-Z0-9]') ||
'[X0-9]' ||
REPLACE(HEX(ZEROBLOB(8)), '00', '[A-HJ-NPR-Z0-9]')
)
See a simplified demo.
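For readers who want to try the constraint locally, here is a small sketch using Python's standard sqlite3 module. The table name t and the helper accepts are invented for this example; the CHECK expression itself is the one from the answer above. Because every bracket class matches exactly one character and GLOB must match the whole value, the pattern also enforces the length of 17.

```python
import sqlite3

CHAR_CLASS = "[A-HJ-NPR-Z0-9]"

# Hypothetical table `t`; the CHECK body is the expression from the answer.
SCHEMA = f"""
CREATE TABLE t (
    myfield TEXT NOT NULL CHECK (
        myfield GLOB
            REPLACE(HEX(ZEROBLOB(8)), '00', '{CHAR_CLASS}') ||
            '[X0-9]' ||
            REPLACE(HEX(ZEROBLOB(8)), '00', '{CHAR_CLASS}')
    )
)
"""


def accepts(value):
    """Return True if `value` can be inserted without violating the CHECK."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute("INSERT INTO t (myfield) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed
    finally:
        conn.close()
```

A 17-character value with a digit or X in ninth position is accepted; values of the wrong length, or containing I, O, Q or lowercase letters, are rejected.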
Simply copy and paste the character class 17 times:
CHECK (myfield GLOB '[A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][X0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9][A-HJ-NPR-Z0-9]')
For the 9th element, use [X0-9] instead, so that "the 9th character of the myfield entry can only be a digit 0 - 9 or the letter X" is checked too.
Because each bracket expression matches exactly one character and GLOB has to match the whole value, this also enforces the 17 character constraint.

Regex: How to extract text from last parenthesis

What is a correct regular expression to extract the string "(procedure)" -or in general text from inside the parenthesis - from the strings below
input string examples are
Positron emission tomography using flutemetamol (18F) with computed
tomography of brain (procedure)
another example
Urinary tract infection prophylaxis (procedure)
Possible approaches are:
Go to end of the text, and look for first opening parenthesis and take subset from that position to the end of the text
from beginning of text, identify last '(' char and do that position to end as substring
Other strings can be (different "tag" is extracted)
[1] "Xanthoma of eyelid (disorder)" "Ventricular tachyarrhythmia (disorder)"
[3] "Abnormal urine odor (finding)" "Coloboma of iris (disorder)"
[5] "Macroencephaly (disorder)" "Right main coronary artery thrombosis (disorder)"
(general regex is sought) (or a solution in R is even better)
If it is the last part of the string then this regex will do it:
/\(([^()]*)\)$/
Explanation: Look for an open ( and match everything after it that isn't ( or ), followed by a ) at the end of the string.
https://regex101.com/r/cEsQtf/1
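The same end-anchored pattern can be tried outside of R as well. Below is a short Python sketch; the helper name last_tag is made up for this illustration.

```python
import re

# A "(" followed by anything that is not a parenthesis, then ")" at the end.
LAST_GROUP = re.compile(r"\(([^()]*)\)$")


def last_tag(text):
    """Return the text inside the final parentheses, or None if absent."""
    match = LAST_GROUP.search(text)
    return match.group(1) if match else None
```

Because [^()]* cannot cross another parenthesis, an earlier group such as (18F) is never picked up by mistake.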
sub can do that with the right regex
Text = c("Positron emission tomography using flutemetamol (18F)
with computed tomography of brain (procedure)",
"Urinary tract infection prophylaxis (procedure)",
"Xanthoma of eyelid (disorder)",
"Ventricular tachyarrhythmia (disorder)",
"Abnormal urine odor (finding)",
"Coloboma of iris (disorder)",
"Macroencephaly (disorder)",
"Right main coronary artery thrombosis (disorder)")
sub(".*\\((.*)\\).*", "\\1", Text)
[1] "procedure" "procedure" "disorder" "disorder" "finding" "disorder"
[7] "disorder" "disorder"
Addendum: Detailed explanation of the regex
The question asks to find the content of the final set of parentheses in the strings. This expression is slightly confusing because it includes two different uses of parentheses. One is to represent parentheses in the string being processed and the other is to set up a "capturing group", the way that we specify what part should be returned by the expression. The expression is made up of five basic units:
1. Initial .* - matches everything up to the final open parenthesis.
Note that this is relying on "greedy matching"
2. \\( ... \\) - matches the final set of parentheses.
Because ( by itself means something else, we need to "escape" the
parentheses by preceding them with \. That is we want the regular
expression to say \( ... \). However, the way R interprets strings,
if we just typed \( and \), R would interpret the \ as escaping the (
and so interpret this as just ( ... ). So we escape the backslash.
R will interpret \\( ... \\) as \( ... \) meaning the literal
characters ( & ).
3. ( ... ) Inside the pair in part 2
This is making use of the special meaning of parentheses. When we
enclose an expression in parentheses, whatever value is inside them
will be stored in a variable for later use. That variable is called
\1, which is what was used in the substitution pattern. Again, if
we just wrote \1, R would interpret it as if we were trying to escape
the 1. Writing \\1 is interpreted as the character \ followed by 1,
i.e. \1.
4. Central .* Inside the pair in part 3
This is what we are looking for, all characters inside the parentheses.
5. Final .*
This is in the expression to match any characters that may follow the
final set of parentheses.
The sub function will use this to replace the matched pattern (in this case, all characters in the string) with the substitution pattern \1 i.e. the contents of the variable containing whatever was in the first (in our case only) capturing group - the stuff inside the final parentheses.
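The greedy-matching idea carries over to other regex engines. As a hedged illustration, this is roughly what the same substitution looks like in Python; the function name last_parenthesised is invented here, and note that in Python . does not match a newline by default, so the sketch assumes single-line input.

```python
import re


def last_parenthesised(text):
    """Replace the whole string with the contents of its last (...) group.

    The leading .* is greedy, so it swallows everything up to the final
    "(" that still lets the rest of the pattern match.
    """
    return re.sub(r".*\((.*)\).*", r"\1", text)
```

As with sub in R, a string that contains no parentheses is returned unchanged because the pattern does not match.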
You can actually use the following to extract the text inside nested parentheses at the end of string:
x <- c("FELON IN POSSESSION OF AMMUNITION (ACTUAL POSSESSION) (79023)",
"FAIL TO DISPLAY REGISTRATION - POSSESSION REQUIRED (320.0605(1))")
sub(".*(\\(((?:[^()]++|(?1))*)\\))$", "\\2", x, perl=TRUE)
See the online R demo and the regex demo.
Details:
.* - any zero or more chars other than line break chars, as many as possible
(\(((?:[^()]++|(?1))*)\)) - Capturing group 1 (necessary for recursion to take place):
\( - a ( char
((?:[^()]++|(?1))*) - Capturing group 2 (our value): zero or more occurrences of any one or more chars other than ( and ), or the whole Group 1 pattern
\) - a ) char
$ - end of string.
The whole string is thus, when matched, replaced with the value of Group 2. If there is no match, the string remains what it was.
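Not every regex engine supports the recursive (?1) construct; Python's built-in re module, for one, does not. Where recursion is unavailable, the balanced group at the end of the string can be found with a plain depth counter instead. The sketch below is a swapped-in technique, not a translation of the regex above, and last_balanced_group is a hypothetical name.

```python
def last_balanced_group(text):
    """Return the contents of the balanced (...) group that ends the string.

    Scans from the right, counting ")" up and "(" down; the group is
    complete when the depth returns to zero. Returns None if the string
    does not end in a balanced group.
    """
    if not text.endswith(")"):
        return None
    depth = 0
    for i in range(len(text) - 1, -1, -1):
        if text[i] == ")":
            depth += 1
        elif text[i] == "(":
            depth -= 1
            if depth == 0:
                return text[i + 1:-1]
    return None
```

For the two sample strings above this yields 79023 and 320.0605(1), the same values the recursive pattern captures in Group 2.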

Regular expression to match maximum of five words

I have a regular expression
^[a-zA-Z+#-.0-9]{1,5}$
which validates that the word contains alphanumeric characters and a few special characters, and that its length is no more than 5 characters.
How do I make this regular expression accept a maximum of five words, each matching the above regular expression?
^[a-zA-Z+#\-.0-9]{1,5}(\s[a-zA-Z+#\-.0-9]{1,5}){0,4}$
Also, you could use for example [ ] instead of \s if you just want to accept space, not tab and newline. And you could write [ ]+ (or \s+) for any number of spaces (or whitespaces), not just one.
Edit: Removed the invalid solution and fixed the bug mentioned by unicornaddict.
I believe this may be what you're looking for. It forces at least one word of your desired pattern, then zero to four of the same, each preceded by one or more white-space characters:
^XX(\s+XX){0,4}$
where XX is your actual one-word regex.
It's separated into two distinct sections so that you're not required to have white-space at the end of the string. If you want to allow for such white-space, simply add \s* at that point. For example, allowing white-space both at start and end would be:
^\s*XX(\s+XX){0,4}\s*$
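To see the one-word-then-up-to-four-more structure in action, here is a small Python sketch that builds the pattern from the single-word regex (with the hyphen escaped). The names WORD, UP_TO_FIVE_WORDS and is_valid are chosen for this example only.

```python
import re

# The single-word pattern from the question, with the hyphen escaped.
WORD = r"[a-zA-Z+#\-.0-9]{1,5}"

# One word, then zero to four more, each preceded by whitespace.
UP_TO_FIVE_WORDS = re.compile(rf"{WORD}(?:\s+{WORD}){{0,4}}")


def is_valid(text):
    """True if text is one to five whitespace-separated valid words."""
    return UP_TO_FIVE_WORDS.fullmatch(text) is not None
```

Using fullmatch plays the role of the ^ and $ anchors, so trailing whitespace is rejected unless \s* is added as described above.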
Your regex has a small bug. Inside a character class, a hyphen surrounded on both sides acts as a range metacharacter, so #-. means every character from # to period. The class therefore matches letters, digits, +, #, hyphen and period (the hyphen only because it happens to lie inside that range), but also $, %, &, ', (, ), * and the comma. To avoid this you'll have to escape the hyphen:
^[a-zA-Z+#\-.0-9]{1,5}$
Or put it at the beginning or end of the char class, so that it's treated literally:
^[-a-zA-Z+#.0-9]{1,5}$
^[a-zA-Z+#.0-9-]{1,5}$
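The effect of the unescaped hyphen is easy to check empirically. The following Python sketch compares the original class with the escaped one over the ASCII range; BUGGY, FIXED and extra_characters are names introduced for this illustration.

```python
import re

BUGGY = re.compile(r"[a-zA-Z+#-.0-9]{1,5}")   # "#-." is read as a range
FIXED = re.compile(r"[a-zA-Z+#\-.0-9]{1,5}")  # hyphen escaped, literal


def extra_characters():
    """ASCII characters accepted by BUGGY but rejected by FIXED."""
    return [
        chr(code)
        for code in range(128)
        if BUGGY.fullmatch(chr(code)) and not FIXED.fullmatch(chr(code))
    ]
```

Both versions accept a hyphen; the difference lies only in the additional punctuation that the accidental range lets through.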
Now to match a max of 5 such words you can use:
^(?:[a-zA-Z+#\-.0-9]{1,5}\s+){1,5}$
EDIT: This solution has a severe limitation: it matches only input that ends in white space. To overcome this limitation, see the answer by Jakob.
