Make tr replace only exact characters given in arguments - unix

I'm trying to convert filenames to remove unacceptable characters, but tr doesn't always treat its input arguments exactly as they're given.
For example:
$ echo "(hello) - {world}" | tr '()-{}' '_'
_______ _ _______
...whereas I only intended to replace (, ), -, { and }, all the characters between ) and { in ASCII collation order were replaced as well -- so every letter in the input also became a _!
Is there a way to make tr replace only the exact characters given in its argument?

tr's syntax is surprisingly complicated. It supports ranges, character classes, collation-based equivalence matching, etc.
To avoid surprises (when a string matches any of that syntax unexpectedly), we can convert our literal characters to a string of \### octal specifiers of those characters' ordinals:
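trExpressionFor() {
    # od prints 16 bytes per line, so join the lines before turning each separating space into a backslash
    printf %s "$1" | od -v -A n -b | tr -d '\n' | tr ' ' '\\'
}
trL() { # name short for "tr-literal"
    tr "$(trExpressionFor "$1")" "$(trExpressionFor "$2")"
}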
...used as:
$ trExpressionFor '()-{}'
\050\051\055\173\175
$ echo "(hello) - {world}" | trL '()-{}' '_'
_hello_ _ _world_
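If the set of characters is fixed and known in advance, a lighter alternative is to order the set so that the hyphen comes last, where tr has nothing to pair it with as a range. A minimal sketch, assuming the same characters as in the question; the function name sanitize is made up for illustration, and like the question's command it relies on tr padding the one-character replacement set:

```shell
sanitize() {
    # '-' is the last character of the set, so it is taken literally
    # instead of forming the range ')-{'.
    tr '(){}-' '_'
}

printf '%s\n' "(hello) - {world}" | sanitize   # prints _hello_ _ _world_
```

This only helps when you control the order of the characters; for arbitrary input the octal approach above is the safer choice.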

Properly catching an unclosed string in ANTLR4

I have to define a string literal in ANTLR4 and catch UNCLOSE_STRING exceptions.
Strings are surrounded by a pair of "" and may contain the supported escapes:
\b \f \r \n \t \' \\
The only way for " to appear inside a string is to be preceded by a ' (that is, '").
I have tried various ways to define a string literal, but they were all caught by UNCLOSE_STRING:
program: global_variable_part function_declaration_part EOF;
<!-- Shenanigans of statements ...-->
fragment Character: ~( [\b\f\r\n\t"\\] | '\'') | Escape | '\'"';
fragment Escape: '\\' ( 'b' | 'f' | 'r' | 'n' | 't' | '\'' | '\\');
fragment IllegalEscape: '\\' ~( 'b' | 'f' | 'r' | 'n' | 't' | '\'' | '\\') ;
STR_LIT: '"' Character* '"' {
content = str(self.text)
self.text = content[1:-1]
};
UNCLOSE_STRING: '"' Character* ([\b\f\r\n\t\\] | EOF) {
esc = ['\b', '\t', '\n', '\f', '\r', '\\']
content = str(self.text)
raise UncloseString(content)
};
For example, "ab'"c\\n def" should match STR_LIT, but only Unclosed String: ab'"c\n def" was produced.
This is quite close to the specification for strings in Java. Don't be afraid to "borrow" from other grammars. A slight modification to the Java lexer rules that (I think) matches your needs would be:
StringLiteral
    : '"' StringCharacters? '"'
    ;
fragment
StringCharacters
    : StringCharacter+
    ;
fragment
StringCharacter
    : ~["\\\r\n]
    | EscapeSequence
    ;
fragment
EscapeSequence
    : '\\' [btnfr'\\]
    | '\'"' // <-- the '" escape match
    ;
If you know of another language that's a closer match, you can see how it handles strings by looking up its grammar here (ANTLR4 Grammars).

jq: How to output quotes on raw output on windows

Using raw output I have to quote some values of the output.
echo [{"a" : "b"}] | jq-win64.exe --raw-output ".[] | \"Result is: \" + .a + \".\""
generates
Result is: b.
but how can I generate
Result is: "b".
Unfortunately it has to run on Windows called from inside a CMD file.
To get a literal " into the output, you need to escape both the quote and the backslash that escapes it:
$ echo [{"a" : "b"}] | jq-win64.exe --raw-output ".[] | \"Result is: \\\"\" + .a + \"\\\".\""
Result is: "b".
A hacky workaround with less backslashing could be:
jq -r ".[] | \"Result is: \" + (.a|tojson) + \".\""
[REVISED to reflect OP goal.]
Since you're trying to output double quotes in a double quoted string, you need to escape the inner quotes. And to escape the inner quotes, you need to also escape the escaping backslashes. So a literal double quote would have to be entered as \\\". You can do this a little cleaner by using string interpolation instead of regular string concatenation.
jq -r ".[] | \"Result is: \\\"\(.a)\\\".\""

How do I fetch this substring using awk?

I have a string, let's say
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
Now I want to fetch only CUSTOM_executable from the above string. This is what I have tried so far in Unix:
echo $k|awk -F '_' '{print $2}'
Can you explain how I can do this?
Try this:
$ echo "$k"
CHECK_111_CUSTOM_executable.acs
Code:
echo "$k" | awk 'BEGIN{FS=OFS="_"}{sub(/\.acs$/, "");print $3, $4}'
Assume the variable ${SOMETHING} has the value SOMETHING just for simplicity.
The following assignment, therefore,
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
sets the value of k to CHECK_SOMETHING_CUSTOM_executable.acs.
When this is split into fields on _ by awk -F '_' (note that the single quotes aren't necessary here), you get the following fields:
$ echo "$k" | awk -F _ '{for (i=0; i<=NF; i++) {print i"="$i}}'
0=CHECK_SOMETHING_CUSTOM_executable.acs
1=CHECK
2=SOMETHING
3=CUSTOM
4=executable.acs
So to get the output you want simply use
echo "$k" | awk -F _ -v OFS=_ '{print $3,$4}'
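If the SOMETHING variable may itself contain underscores, for example 111_222_333 or 111_222_333_444, count fields from the end instead:
$ k=CHECK_${SOMETHING}_CUSTOM_executable.acs
$ echo "$k" | awk 'BEGIN{FS=OFS="_"}{ print $(NF-1),$NF }'
(Or)
echo "$k" | awk -F_ '{ print $(NF-1), $NF }' OFS=_
Explanation:
NF is the number of fields in the current input record, so $(NF-1) and $NF are the last two fields. Note that $NF still carries the .acs extension here.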
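Counting from the end can be combined with stripping the extension in a single awk call. A sketch, assuming the last two underscore-separated fields are always the wanted part; last_two is a hypothetical helper name:

```shell
last_two() {
    # Split on "_", strip the final ".ext" from the last field,
    # then print the last two fields joined by "_".
    printf '%s\n' "$1" |
        awk -F_ -v OFS=_ '{ sub(/\.[^.]*$/, "", $NF); print $(NF-1), $NF }'
}

last_two CHECK_111_222_333_CUSTOM_executable.acs   # prints CUSTOM_executable
```

Because the fields are addressed relative to NF, the number of underscores inside SOMETHING does not matter.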
Try this simple awk:
awk -F[._] '{print $3"_"$4}' <<<"$k"
CUSTOM_executable
The -F[._] defines both dot and underscore as field separators. awk then prints fields 3 and 4 of $k.
If k contains k='CHECK_${111_111}_CUSTOM_executable.acs', then use fields $4 and $5 instead:
awk -F[._] '{print $4"_"$5}' <<<"$k"
CHECK_${111_111}_CUSTOM_executable.acs
|$1 | |$2 | |$3| | $4 | |   $5   | |$6|
You do not need awk; this can be done easily in bash. I assume that $SOMETHING does not contain _ characters (and that the CUSTOM and executable parts are plain text that also contain no _). Then:
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
l=${k#*_}; l=${l#*_}; l=${l%.*};
This cuts everything from the beginning up to the 2nd _ character, and chops off the last . character and everything after it. The result is stored in the shell variable l.
If $SOMETHING may contain _, a little more work is needed (I assume the CUSTOM and executable parts do not contain _):
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
l=${k%_*}; l=${l%_*}; l=${k#"${l}_"}; l=${l%.*};
This chops off everything from the second-to-last _ character onward, then cuts that result (plus the following _) off the front of the original string. The prefix is quoted so that any glob characters in it are taken literally. The last statement removes the extension. The result is in the shell variable l.
Or it can be done using a regex:
[[ $k =~ ([^_]+_[^_]+)\.[^.]+$ ]] && l=${BASH_REMATCH[1]}
This matches two words separated by _ and followed by .<extension> at the end of the string. Only the part before the extension is captured, and the result is in the shell variable l.
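The parameter-expansion approach can be wrapped in a function and checked against both kinds of input. A sketch, assuming bash (or another shell that supports local) and that the last two underscore-separated words are the wanted part; tail_name is a hypothetical name:

```shell
tail_name() {
    local k=$1 head
    head=${k%_*}             # drop the last "_word.ext"
    head=${head%_*}          # drop the second-to-last "_word"
    k=${k#"${head}_"}        # remove that prefix (taken literally) from the original
    printf '%s\n' "${k%.*}"  # remove the extension
}

tail_name CHECK_111_CUSTOM_executable.acs           # prints CUSTOM_executable
tail_name 'CHECK_${111_111}_CUSTOM_executable.acs'  # prints CUSTOM_executable
```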
I hope this helps!

How to Use Chaining Translation in Unix

I want to translate the word "abcd" into upper case "ABCD" using the tr command, then translate "ABCD" to digits, e.g. 1234.
I want to chain the two translations together (lower case to upper case, then upper case to 1234) using pipes, and also pipe the final output into more.
I'm not able to chain the second part.
echo "abcd" | tr '[:lower:]' '[:upper:]' > file1
Here I'm not sure how to add the second translation to the same command.
You can't do it in a single tr command; you can do it in a single pipeline:
echo "abcd" | tr '[:lower:]' '[:upper:]' | tr 'ABCD' '1234'
Note that your [:lower:] and [:upper:] notation will translate more than abcd to ABCD. If you want to extend the mapping of digits so A-I map to 1-9, that's doable; what maps to 0?
If you want to do it in a single command, then you could write:
echo "abcdABCD" | tr 'abcdABCD' '12341234'
Or, abbreviated slightly:
$ echo 'abecedenarian-DIABOLICALISM' | tr 'a-dA-D' '1-41-4'
12e3e4en1ri1n-4I12OLI31LISM
$
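The idea of extending the mapping can be sketched for A through I, leaving later letters untouched since nothing obvious maps to 0. This assumes the C locale, so that the A-I range means exactly those nine letters; letters_to_digits is a hypothetical name:

```shell
letters_to_digits() {
    # Upper-case first, then map A..I onto 1..9; everything else passes through.
    LC_ALL=C tr '[:lower:]' '[:upper:]' | LC_ALL=C tr 'A-I' '1-9'
}

printf '%s\n' abcd | letters_to_digits   # prints 1234
```

The output can be piped on to more in the usual way, e.g. `... | letters_to_digits | more`.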

How to convert BNF to EBNF

How can I convert this BNF to EBNF?
<vardec> ::= var <vardeclist>;
<vardeclist> ::= <varandtype> {;<varandtype>}
<varandtype> ::= <ident> {,<ident>} : <typespec>
<ident> ::= <letter> {<idchar>}
<idchar> ::= <letter> | <digit> | _
EBNF, or Extended Backus-Naur Form, is standardized as ISO/IEC 14977:1996, which is available in PDF from ISO for free*. It is not widely used by computer language standards. There's also a paper that describes it, and that paper contains this table summarizing EBNF notation.
Table 1: Extended BNF
Extended BNF     Operator   Meaning
-------------------------------------------------------------
unquoted words              Non-terminal symbol
" ... "                     Terminal symbol
' ... '                     Terminal symbol
( ... )                     Brackets
[ ... ]                     Optional symbols
{ ... }                     Symbols repeated zero or more times
{ ... }-                    Symbols repeated one or more times†
=                in         Defining symbol
;                post       Rule terminator
|                in         Alternative
,                in         Concatenation
-                in         Except
*                in         Occurrences of
(* ... *)                   Comment
? ... ?                     Special sequence
The * operator is used with a preceding (unsigned) integer number; it does not seem to allow for variable numbers of repetitions — such as 1-15 characters after an initial character to make identifiers up to 16 characters long. This lis
In the standard, open parenthesis ( is called start group symbol and close parenthesis ) is called end group symbol; open square bracket [ is start option symbol and close square bracket ] is end option symbol; open brace { is start repeat symbol and close brace } is end repeat symbol. Single quotes ' are called first quote symbol and double quotes " are second quote symbol.
* Yes, free — even though you can also pay 74 CHF for it if you wish. Look at the Note under the box containing the chargeable items.
The question seeks to convert this 'BNF' into EBNF:
<vardec> ::= var <vardeclist>;
<vardeclist> ::= <varandtype> {;<varandtype>}
<varandtype> ::= <ident> {,<ident>} : <typespec>
<ident> ::= <letter> {<idchar>}
<idchar> ::= <letter> | <digit> | _
The BNF is not formally defined, so we have to make some (easy) guesses as to what it means. The translation is routine (it could be mechanical if the BNF were formally defined):
vardec = 'var', vardeclist, ';';
vardeclist = varandtype, { ';', varandtype };
varandtype = ident, { ',', ident }, ':', typespec;
ident = letter, { idchar };
idchar = letter | digit | '_';
The angle brackets have to be removed around non-terminals; the definition symbol ::= is replaced by =; the terminals such as ; and _ are enclosed in quotes; concatenation is explicitly marked with ,; and each rule is ended with ;. The grouping and alternative operations in the original happen to coincide with the standard notation. Note that explicit concatenation with the comma means that multi-word non-terminals are unambiguous.
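Part of this translation is purely textual. A rough sketch with sed that handles only two of the steps (dropping the angle brackets and replacing ::= with =), assuming non-terminal names consist of lower-case letters only; quoting terminals, adding commas, and terminating rules still have to be done by hand. The function name is made up for illustration:

```shell
bnf_to_ebnf_partial() {
    # 1) <name> -> name   2) ::= -> =
    sed -e 's/<\([a-z][a-z]*\)>/\1/g' -e 's/::=/=/'
}

printf '%s\n' '<ident> ::= <letter> {<idchar>}' | bnf_to_ebnf_partial
# prints: ident = letter {idchar}
```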
† Casual study of the standard itself suggests that the {...}- notation is not part of the standard, just of the paper. However, as jmmut notes in a comment, the standard does define the meaning of {…}-:
§5.8 Syntactic term
…
When a syntactic-term is a syntactic-factor followed by
an except-symbol followed by a syntactic-exception it
represents any sequence of symbols that satisfies both of
the conditions:
a) it is a sequence of symbols represented by the syntactic-factor,
b) it is not a sequence of symbols represented by the
syntactic-exception.
…
NOTE - { "A" } - represents a sequence of one or more A's because it is a syntactic-term with an empty syntactic-exception.
Remove the angle brackets and put all terminals into quotes:
vardec ::= "var" vardeclist ";"
vardeclist ::= varandtype { ";" varandtype }
varandtype ::= ident { "," ident } ":" typespec
ident ::= letter { idchar }
idchar ::= letter | digit | "_"
