Zsh escape backslash - zsh

I noticed a while ago already that in zsh, you can get a \ by typing \\ like in bash.
> echo \\
\
But there's a strange phenomenon with 4 backslashes in zsh.
bash$ echo \\\\
\\
zsh> echo \\\\
\
Do you know why? Is it a bug?

No, this is not a bug. The echo implementations in these shells simply
have different default settings for the interpretation of backslash sequences.
In either shell the command-line parser removes one layer of backslashes,
converting 4 backslashes to 2. That argument is then passed to the echo
builtin command. When echo interprets backslash sequences (the zsh default),
it outputs 1 backslash for that sequence; when echo does no backslash
interpretation (the bash default), it outputs both backslashes.
In either shell's implementation of echo the -e or -E option can be used
to respectively enable or disable backslash interpretation. So the following
will produce the same output in either shell:
echo -e \\\\
echo -E \\\\
Both shells also have shell-level options to alter the default behaviour of
their echo command. In zsh the default can be changed with setopt BSD_echo;
in bash the equivalent command is shopt -s xpg_echo.
If you're trying to write portable shell scripts, you'd be best served by
avoiding use of echo altogether; it is one of the least portable commands
around. Use printf instead.
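As a rough illustration of the printf advice, the sketch below prints the same argument twice: once with %s, which never interprets backslashes, and once with %b, which applies echo-style escape processing. The variable names are only for illustration.

```shell
# The shell's own parser turns the 4 typed backslashes into 2 before printf runs.
literal=$(printf '%s\n' \\\\)      # %s copies the argument as-is
interpreted=$(printf '%b\n' \\\\)  # %b also processes backslash escapes

printf 'with %%s: %s\n' "$literal"
printf 'with %%b: %s\n' "$interpreted"
```

Because the choice is made in the format string rather than by a shell option, this should behave the same in bash, zsh and other POSIX-style shells.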

Related

Single quotes in awk's system

I am trying to run bioawk (an extension of awk for fasta files) from awk's system functionality:
awk -v var=$i '{system("~/bin/bioawk-master/bioawk -c fastx '\''{if ($name==\""var"\"){print \">\"$name\"\\\\n\"$seq}}'\'' ../../prokka/"$2"/"$1"/"$1".ffn")}'
The result prints the literal "\n" between the values of $name and $seq instead of the intended newline.
What it prints:
NAME\nSEQUENCE
What I would like it to print:
NAME
SEQUENCE
When I print the bioawk command that I want to run with:
awk -v var=$i '{system("echo ~/bin/bioawk-master/bioawk -c fastx '\''{if ($name==\""var"\"){print \">\"$name\"\\\\n\"$seq}}'\'' ../../prokka/"$2"/"$1"/"$1".ffn")}'
I get:
~/bin/bioawk-master/bioawk -c fastx {if ($name=="CANHHJNM_03494"){print ">"$name"\n"$seq}} ../../prokka/p190631-dr-tm-dc-sp-pi/EP41/EP41.ffn
I can see that it is missing the single quotes surrounding the brackets. I thought having '\'' would solve this issue, but obviously it doesn't. Any help with this problem would be much appreciated.
Not sure this will solve your problem, but the (second) easiest way to handle single quotes in an awk script is to define the quote externally as a variable:
$ awk -v q="'" 'BEGIN{print q "single_quoted" q}'
'single_quoted'
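Another option, sketched here on the assumption that your awk accepts octal escapes in string literals (the common implementations do), is to write the quote as \047 so that no shell-level quoting tricks are needed. The variable names are illustrative only.

```shell
# Pass the quote in as an awk variable...
via_variable=$(awk -v q="'" 'BEGIN { print q "single_quoted" q }')
# ...or spell it as the octal escape \047 inside the awk program itself.
via_octal=$(awk 'BEGIN { print "\047single_quoted\047" }')

printf '%s\n' "$via_variable" "$via_octal"
```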

Zsh backslash madness?

Zsh seems to do some weird backslashing when you try to echo a bunch of backslashes. I cannot seem to figure out a very clear pattern to this. Any reasons for this madness? Of course, if I actually wanted to use backslashes properly, then I'd use proper quoting etc, but why does this happen in the first place?
Here's a small example to show the same:
$ echo \\
\
$ echo \\ \\
\ \
$ echo \\ \\ \\
\ \ \
$ echo \\ \\ \\ \\
\ \ \ \
$ echo \\\\ \\ \\
\ \ \
$ echo \\\\\\ \\
\\ \
$ echo \\\\\\\\
\\
I initially independently discovered this a while ago, but was reminded of it by this tweet by Zach Riggle.
In the first step, the echo command is not special. The command line is parsed by rules that are independent of what command is being executed. The overall effect of this step is to convert your command from a series of characters to a series of words.
The two general parsing rules you need to know to understand this example are: the space character separates words, and the backslash character escapes special characters, including itself.
So the command echo \\ becomes a list of 2 words:
echo
\
The first backslash escapes the second one, resulting in a single backslash being in the second word.
echo \\ \\ \\ \\
becomes this list of words:
echo
\
\
\
\
Now command line parsing is done. Only now does the shell look for a command named by the first word. Up until now, the fact that the command is echo has been irrelevant. If you'd said cat \\ \\ \\ \\, cat would be invoked with 4 argument words, each containing a single backslash.
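To see the word list without echo getting involved, one option is a small helper that prints each argument between angle brackets using printf '%s', which performs no backslash processing of its own. show_args is a hypothetical name used only for this sketch.

```shell
# Print every argument on its own line, exactly as the command received it.
show_args() {
    printf '<%s>\n' "$@"
}

show_args \\ \\ \\ \\   # four words, one backslash each
show_args \\\\          # one word containing two backslashes
show_args a\\tb         # one word: a, backslash, t, b
```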
Normally when you run echo you'll be getting the shell builtin command. The zsh builtin echo has configurable behavior. I like to use setopt BSD_ECHO to select BSD-style echo behavior, but from your sample output it appears you are in the default mode, SysV-style.
BSD-style echo doesn't do any backslash processing; it just prints its arguments as it received them.
SysV echo processes backslash escapes like in C strings - \t becomes a tab character, \r becomes a carriage return, etc. Also \c is interpreted as "end the output without a newline".
So if you said echo a\\tb then the shell parsing would result in a single backslash in the argument word given to echo, and echo would interpret a\tb and print a and b separated by a tab. It would be more readable if written as echo 'a\tb', using apostrophes to provide quoting at the shell-command-parsing level. Likewise echo \\\\ is two backslashes after command line parsing, so echo sees \\ and outputs one backslash. If you wanted to print a literal a\tb without using any other form of quoting, you'd have to say echo a\\\\tb.
So the shell has a simple rule - two backslashes on the command line to make one backslash in the argument word. And echo has a simple rule - two backslashes in the argument word to make one backslash in the output.
But there's a problem... when echo does its thing, a backslash followed by t means output a tab, a backslash followed by a backslash means output a backslash... but there are lots of combinations that don't mean anything. A backslash followed by T for example is not a valid escape sequence. In C it would be a warning or an error. But the echo command tries to be more tolerant.
Try echo \\T or echo '\T' and you will discover that a backslash followed by anything that doesn't have a defined meaning as a backslash escape will just cause echo to output both characters as-is.
Which brings us to the last case: what if the backslash isn't followed by anything at all? What if it's the last character in the argument word? In that case, echo just outputs the backslash.
So in summary, two backslashes in the argument word result in one backslash in the output. But one backslash in the argument word also results in one backslash in the output, if it is the last character in the word or if the backslash together with the next character don't form a valid escape sequence.
The command line echo \\\\ thus becomes the word list
echo
\\
which outputs a single backslash "properly", with quoting applied at all levels.
The command line echo \\ becomes the word list
echo
\
which outputs a single backslash "messily", because echo found a stray backslash at the end of the argument and was generous enough to output it for you even though it wasn't escaped.
The rest of the examples should be clear from these principles.

strange echo output

Can anybody explain this behaviour of the bash shell which is driving me nuts
[root#ns1 bin]# export test=`whois -h whois.lacnic.net 187.14.6.108 | grep -i inetnum: | awk '{print $2}'`
[root#ns1 bin]# echo $test
187.12/14
[root#ns1 bin]# echo "iptables -I INPUT -s $test -J DROP"
-J DROP -I INPUT -s 187.12/14
[root#ns1 bin]#
Why is my echo screwed up? It is being changed by the contents of $test.
If you change $test to "ABC" all is fine. Is it related to the slash?
Why is my echo screwed up? It is being changed by the contents of
$test.
Because your test contains a carriage return. Remove it:
test=$(whois -h whois.lacnic.net 187.14.6.108 | grep -i inetnum: | awk '{print $2}' | tr -d '\r')
The string passed to echo is actually something like
iptables -I INPUT -s 187.12/14\r -J DROP
which, due to the carriage return, is visible only as
-J DROP -I INPUT -s 187.12/14
The CR moves the cursor to the start-of-line, where it then overwrites previous characters.
You could try
echo "$test" | od -bc
to verify this.
This is almost certainly a carriage return. echo is doing its job correctly and emitting the string to your terminal; the problem is that your terminal is treating a part of the string as a command for it to follow (specifically, a CR character, $'\r', telling it to send the cursor to the beginning of the existing line).
If you want to see the contents of $test in a way which doesn't allow your terminal to interpret escape sequences or other control characters, run the following (note that the %q format string is a bash extension, not available in pure-POSIX systems):
printf '%q\n' "$test"
This will show you the precise contents formatted and escaped for use by the shell, which will be illuminative as to why they are problematic.
To remove $'\r', which is almost certainly the character giving you trouble, you can run the following parameter expansion:
test=${test//$'\r'/}
Unlike solutions that pipe the value through an extra process (such as tr), this happens inside the already-running bash shell, and is thus more efficient.
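For shells without the ${var//pattern/} expansion, a trailing carriage return can also be stripped with the POSIX suffix-removal expansion. The sketch below fakes the whois output with printf; raw, cr and clean are illustrative names, not part of the original script.

```shell
cr=$(printf '\r')                 # a lone carriage return
raw=$(printf '187.12/14\r')       # stand-in for the whois/awk result

printf '%s\n' "$raw" | od -c      # the dump makes the \r visible

clean=${raw%"$cr"}                # drop one trailing CR, if present
printf 'cleaned value: %s\n' "$clean"
```

Command substitution only strips trailing newlines, which is why the carriage return survives in the first place.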

how to use sed from a tcl file

I'm trying to use the Unix "sed" command from within a tcl file, like this:
(to change multiple spaces to one space)
exec /bin/sed 's/ \+/ /g' $file
I also tried exec /bin/sed 's/ \\+/ /g' $file (an extra backslash).
Neither version works, and I get the error
/bin/sed: -e expression #1, char 1: Unknown command: `''
The command works fine when run from a linux terminal
What am I doing wrong?
What you're doing wrong is using ' (single quote) characters. They're not special to Tcl at all. The equivalent in Tcl is enclosing a word in {braces}; it gives no special treatment at all to the characters inside. Thus, what you seek to do would be:
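exec /bin/sed {s/ \+/ /g} $file
Mind you, if you're doing something more complex and Tcl's restriction of quoting to whole words gets in the way, then you might instead hand the whole command to a shell (note the doubled backslash, since Tcl processes backslashes inside double quotes):
exec /bin/sh -c "sed 's/ \\+/ /g' $file"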
Or, real idiomatic Tcl just doesn't use sed for something this simple:
set f [open $file]
set replacedContents [regsub -all { +} [read $f] " "]
close $f
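Independent of Tcl's quoting, the expression itself can be checked from a shell. This sketch uses the portable form "a space followed by zero or more spaces" instead of the GNU-specific \+; squeeze_spaces is a hypothetical helper name.

```shell
# Collapse every run of spaces in the first argument to a single space.
squeeze_spaces() {
    printf '%s\n' "$1" | sed 's/  */ /g'
}

squeeze_spaces 'a   b    c'
```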
Use exec /bin/sed "s/\ +/\ /g" $file
The '\ ' tells Tcl that there is a space there, and the double quotes group the expression into a single word.

grep a tab in UNIX

How do I grep tab (\t) in files on the Unix platform?
If using GNU grep, you can use the Perl-style regexp:
grep -P '\t' *
The trick is to use a $ sign before the single quotes. It also works for cut and other tools.
grep $'\t' sample.txt
I never managed to make the '\t' metacharacter work with grep.
However I found two alternate solutions:
Using <Ctrl-V> <TAB> (hitting Ctrl-V then typing tab)
Using awk: foo | awk '/\t/'
From this answer on Ask Ubuntu:
Tell grep to use the regular expressions as defined by Perl (Perl has
\t as tab):
grep -P "\t" <file name>
Use the literal tab character:
grep "^V<tab>" <filename>
Use printf to print a tab character for you:
grep "$(printf '\t')" <filename>
One way is (this is with Bash)
grep -P '\t'
-P turns on Perl regular expressions so \t will work.
As user unwind says, it may be specific to GNU grep. The alternative is to literally insert a tab in there if the shell, editor or terminal will allow it.
Another way of inserting the tab literally inside the expression is using the lesser-known $'\t' quotation in Bash:
grep $'foo\tbar' # matches eg. 'foo<tab>bar'
(Note that if you're matching for fixed strings you can use this with -F mode.)
Sometimes using variables can make the notation a bit more readable and manageable:
tab=$'\t' # `tab=$(printf '\t')` in POSIX
id='[[:digit:]]\+'
name='[[:alpha:]_][[:alnum:]_-]*'
grep "$name$tab$id" # matches eg. `bob2<tab>323`
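A small end-to-end check of the variable approach, using made-up sample data so that it can be run anywhere. It sticks to the POSIX printf form of the assignment; sample, matches and count are illustrative names.

```shell
tab=$(printf '\t')   # POSIX way to get a literal tab into a variable
sample=$(printf 'no tabs here\nfoo\tbar\nbob2\t323\n')

# Lines that contain at least one tab, and how many there are.
matches=$(printf '%s\n' "$sample" | grep "$tab")
count=$(printf '%s\n' "$sample" | grep -c "$tab")

printf '%s\n' "$matches"
printf 'lines with a tab: %s\n' "$count"
```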
There are basically two ways to address it:
(Recommended) Use regular expression syntax supported by grep(1). Modern grep(1) supports two forms of POSIX 1003.2 regex syntax: basic (obsolete) REs, and modern REs. The syntax is described in detail in the re_format(7) and regex(7) man pages, which are part of BSD and Linux systems respectively. GNU grep(1) also supports Perl-compatible REs as provided by the pcre(3) library.
In regex language the tab symbol is usually encoded by the \t atom. The atom is supported by BSD extended regular expressions (egrep, grep -E on BSD-compatible systems), as well as by Perl-compatible REs (pcregrep, GNU grep -P).
Both basic regular expressions and Linux extended REs apparently have no support for \t. Please consult the man page of the UNIX utility in question to find out which regex language it supports (hence the difference between sed(1), awk(1), and pcregrep(1) regular expressions).
Therefore, on Linux:
$ grep -P '\t' FILE ...
On BSD-like systems:
$ egrep '\t' FILE ...
$ grep -E '\t' FILE ...
Pass the tab character into pattern. This is straightforward when you edit a script file:
# no tabs for Python please! (the quoted character below is a literal tab)
grep -q '	' *.py && exit 1
However, when working in an interactive shell you may need to rely on shell and terminal capabilities to type the proper symbol into the line. On most terminals this can be done through the Ctrl+V key combination, which instructs the terminal to treat the next input character literally (the V is for "verbatim"):
$ grep '<Ctrl>+<V><TAB>' FILE ...
Some shells may offer advanced support for command typesetting. For example, in bash(1) words of the form $'string' are treated specially:
bash$ grep $'\t' FILE ...
Please note, though, that while this is nice on a command line it may produce compatibility issues when the script is moved to another platform. Also, be careful with quotes when using the specials; please consult bash(1) for details.
For Bourne shell (and not only) the same behaviour may be emulated using command substitution augmented by printf(1) to construct proper regex:
$ grep "`printf '\t'`" FILE ...
Use echo to insert the tab for you: grep "$(echo -e \\t)"
grep "$(printf '\t')" worked for me on Mac OS X
A good choice is to use sed.
sed -n '/\t/p' file
Examples (works in bash, sh, ksh, csh,..):
[~]$ cat testfile
12 3
1 4 abc
xa c
a c\2
1 23
[~]$ sed -n '/\t/p' testfile
xa c
a c\2
[~]$ sed -n '/\ta\t/p' testfile
a c\2
(This answer has been edited following suggestions in comments. Thank you all)
Use gawk, set the field delimiter to tab (\t) and check the number of fields. If there is more than 1, the line contains at least one tab:
awk -F"\t" 'NF>1' file
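A quick way to try the field-count idea on throwaway input, alongside the plain regex match mentioned in an earlier answer. The sample text and variable names are invented for this sketch.

```shell
# Keep only lines that split into more than one tab-separated field.
by_fields=$(printf 'plain line\ncol1\tcol2\n' | awk -F'\t' 'NF > 1')
# The same selection expressed as a regular expression match.
by_regex=$(printf 'plain line\ncol1\tcol2\n' | awk '/\t/')

printf '%s\n' "$by_fields" "$by_regex"
```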
One more way that works in ksh, dash, etc.: use printf to insert the tab:
grep "$(printf 'BEGIN\tEND')" testfile.txt
On ksh I used the following, where ^I stands for a literal tab character rather than the two characters ^ and I:
grep "[^I]" testfile
The answer is simpler. Write your grep and within the quote type the tab key, it works well at least in ksh
grep " " *
Using the 'sed-as-grep' method, but replacing the tabs with a visible character of personal preference is my favourite method, as it clearly shows both which files contain the requested info, and also where it is placed within lines:
sed -n 's/\t/\*\*\*\*/gp' file_name
If you wish to make use of line/file info, or other grep options, but also want to see the visible replacement for the tab character, you can achieve this by
grep -[options] -P '\t' file_name | sed 's/\t/\*\*\*\*/g'
As an example:
$ printf 'A\tB\nfoo\tbar\n' > test
$ grep -inH -P '\t' test | sed 's/\t/\*\*\*\*/g'
test:1:A****B
test:2:foo****bar
EDIT: Obviously the above is only useful for viewing file contents to locate tabs --- if the objective is to handle tabs as part of a larger scripting session, this doesn't serve any useful purpose.
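Since \t inside a sed expression is a GNU extension, a variant that passes a real tab through a shell variable may travel better. This is only a sketch with invented sample input; tab and marked are illustrative names.

```shell
tab=$(printf '\t')   # literal tab, obtained portably

# Number the matching lines with grep -n, then make each tab visible.
marked=$(printf 'A\tB\nfoo\tbar\nplain\n' | grep -n "$tab" | sed "s/$tab/****/g")

printf '%s\n' "$marked"
```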
This works well for AIX. I am searching for lines containing JOINED<\t>ACTIVE
voradmin cluster status | grep JOINED$'\t'ACTIVE
vorudb201 1 MEMBER(g) JOINED ACTIVE
*vorucaf01 2 SECONDARY JOINED ACTIVE
You might want to use grep "$(echo -e '\t')"
The only requirement is that echo be capable of interpreting backslash escapes.
These alternative methods of identifying the tab character are all functional, and I really like the ones using awk, as I couldn't quite remember the syntax for single binary characters. However, it should also be possible to assign a shell variable a value in a POSIX-portable fashion (i.e. TAB=`echo "@" | tr "\100" "\011"`) and then use it everywhere, also in a POSIX-portable fashion (i.e. grep "$TAB" filename). While this solution works well with TAB, it will also work with other binary characters if the desired value is passed to tr in place of the one for TAB.
The $'\t' notation given in other answers is shell-specific -- it seems to work in bash and zsh but is not universal.
NOTE: The following is for the fish shell and does not work in bash:
In the fish shell, one can use an unquoted \t, for example:
grep \t foo.txt
Or one can use the hex or unicode notations e.g.:
grep \X09 foo.txt
grep \U0009 foo.txt
(these notations are useful for more esoteric characters)
Since these values must be unquoted, one can combine quoted and unquoted values by concatenation:
grep "foo"\t"bar"
You can also use a Perl one-liner instead of grep resp. grep -P:
perl -ne 'print if /\t/' FILENAME
You can type
grep \t foo
or
grep '\t' foo
to search for the tab character in the file foo. You can probably also use other escape codes, though I've only tested \n. Although it's rather time-consuming, and it's unclear why you would want to, in zsh you can also type the tab character first, move back to the beginning of the line, type grep, and enclose the tab in quotes.
Look for blank spaces many times [[:space:]]*
grep [[:space:]]*'.''.'
Will find something like this:
'the tab' ..
These are single quotations ('), and not double ("). This is how you make concatenation in grep. =-)
