Zsh oneliner to Extract part of string until and including token


I need a solution in zsh
I have a string like http://xxx.abc.mp3?yyy:mno. Is there a one-liner in zsh that extracts the string up to and including mp3, that is, http://xxx.abc.mp3? I can do this in bash, but I need a way to do it in zsh.

You didn't say how you did it in bash, but the same substitution works with both shells:
str='http://xxx.abc.mp3?yyy:mno'
printf "%s\n" "${str/%\?*}"
Removes the text that matches a question mark and 0 or more characters until the end of the string.
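The same result can also be had with the `%%` suffix-removal form of parameter expansion, which POSIX shells, bash, and zsh all understand. A minimal sketch, where the function name strip_query is only an illustrative assumption:

```shell
# Hypothetical helper: print the part of a string before the first "?".
strip_query() {
  # %% removes the longest suffix matching the pattern \?* ,
  # i.e. everything from the first question mark to the end.
  printf '%s\n' "${1%%\?*}"
}

strip_query 'http://xxx.abc.mp3?yyy:mno'
```

A string without a question mark is printed unchanged, because the pattern then matches nothing.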

Related

how to echo literal variable value with zsh

I have a simple shell function to convert a *nix style path to Windows style (I happen to be using Windows Subsystem for Linux).
# convert "/mnt/c/Users/josh" to "C:\Users\josh"
function winpath(){
enteredPath=$1
newPath="${enteredPath/\/mnt\/c/C:}" # replace /mnt/c with C:
newPath="${newPath//\//\\}" # replace / with \
echo $newPath
}
The desired behavior is:
$ winpath /mnt/c/Users/josh
C:\Users\josh
This works correctly in bash, but in zsh, echo seems to do some extra interpolation of the $newPath value. It behaves like this:
$ winpath /mnt/c/Users/josh
C:sers\josh
What character sequence is echo interpolating, and why is it removing the \U? Most importantly, how do I return the literal value?
I've tried digging through the zsh documentation, but it's a jungle. Thanks in advance!

zsh's echo processes certain escape sequences that bash's echo does not by default. \U introduces a 4-byte Unicode code point, but since the following 8 characters are not a valid hexadecimal number, no character is substituted.
I would recommend using printf, as its behavior is much more predictable from shell to shell.
printf '%s\n' "$newPath"
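As a sketch of how the whole function might look with printf, here is a variant that should behave the same in bash, zsh, and plain sh. Note that it swaps the `${var//...}` substitutions from the question for a prefix strip and tr, which is an assumption made for portability, not the asker's code:

```shell
# Convert "/mnt/c/Users/josh" to "C:\Users\josh" without relying on echo.
winpath() {
  case $1 in
    /mnt/c/*) rest=${1#/mnt/c} ;;        # drop the /mnt/c prefix
    *) printf '%s\n' "$1"; return ;;     # not under /mnt/c: print unchanged
  esac
  # tr turns every / into \ ; %s prints the result without interpreting it.
  printf 'C:%s\n' "$(printf '%s' "$rest" | tr '/' '\\')"
}

winpath /mnt/c/Users/josh
```

Because the path only ever reaches printf as a %s argument, sequences such as \U are never treated as escapes.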

The problem is that you are using the internal command echo, instead of the external one. If you would write
command echo $newPath
you would get the expected output. command forces zsh to look up the command word according to the current PATH, ignoring internal commands, aliases or functions of the same name.

Find and replace: \'

I'm trying to replace every reference of \' with &apos; in a file.
I've used variations of: sed -e s/\'/"\&apos;"/g file.txt
But they always replace every.single.(single).quote
Any help would be greatly appreciated.

Not sure it's the best solution, but I could do it like this:
sed "s/[\]'/\&apos;/g" file.txt
(putting the backslash character in a character range so it doesn't interfere with the following quote, and protect with double quotes)
Or just extending your syntax, without quotes but using almost the same trick:
sed -e s/[\\]\'/"\&apos;"/g file.txt

An approach trying to conserve as much of the "single-quotedness" of the sed command as possible:
sed 's/\\'"'"'/\&apos;/g'
Just escaping \ with \\ and getting a single quote into the command with '"'"': the first single quote ends the command so far, then we have a double-quoted single quote ("'"), and finally an opening single quote for the rest of the command.
Alternatively, double quoting the whole command and escaping both the backslash and single quote:
sed "s/\\\'/\&apos;/g"

The correct syntax is:
$ echo "foo'bar" | sed 's/'\''/\&apos;/'
foo&apos;bar
Every script (sed, awk, whatever) should always be enclosed in single quotes, and you just use other single quotes to stop and restart the script delimiters, breaking out to the shell for the minimal portion of the script that's absolutely necessary, in this case long enough to use \'. You need to break out to the shell to specify that ' because, per shell rules, no script enclosed in single quotes can contain a ', not even if you try to escape it.

echo "foo'bar" | gawk '{gsub(/\47/,"\\&apos;")}1'
foo&apos;bar
The tricky part here is replacing a single quote with text that contains an ampersand. First, to make the single quote manageable, use its octal code, \47; then escape the ampersand with two backslashes. And all of a sudden it becomes feasible :)

Count number of slashes in string with zsh

I'm trying to get the number of slashes in a string with zsh. I thought it should work like this (remove all non-/ characters and then count what is left):
foo="sdfds/sf/sdf/sdf/sd/f//sdf/"
print ${#foo//[^/]/}
But I get preexec:26: bad pattern: [^. It seems to work for all characters except /. I tried to escape it with backslashes, but it did not work until I added 3 backslashes:
print ${#foo//[^\\\/]/}
Why do I need to find an escaped slash in the string?
edit: Yes, it seems to work with one backslash using zsh -f.
My setopt:
autocd
autopushd
nobeep
completeinword
correct
extendedglob
extendedhistory
nohistbeep
histfindnodups
histignorealldups
histignoredups
histignorespace
histnofunctions
histnostore
histreduceblanks
histsavenodups
histverify
nohup
incappendhistory
interactive
interactivecomments
longlistjobs
monitor
nonomatch
promptsubst
pushdignoredups
sharehistory
shinstdin
transientrprompt
zle
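For comparison, the count can also be obtained without putting any pattern inside the parameter expansion, which sidesteps the escaping question entirely. This is only a sketch using tr and wc, not an explanation of the zsh behaviour above, and the function name count_slashes is an illustrative assumption:

```shell
# Hypothetical helper: count the "/" characters in a string.
count_slashes() {
  # tr -cd '/' deletes every character except "/"; wc -c counts what is left.
  # The arithmetic expansion strips the padding some wc implementations add.
  printf '%s\n' "$(( $(printf '%s' "$1" | tr -cd '/' | wc -c) ))"
}

count_slashes 'sdfds/sf/sdf/sdf/sd/f//sdf/'   # prints 8
```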

Unix sort text file with user-defined newline character

I have a plain text file where the newline character is not "\n" but a special character.
Now I want to sort this file.
Is there a direct way to specify a custom newline character when using the Unix sort command?
I would rather not use a script for this if possible.
Please note the data in the text file contains \n, \r\n, and \t characters (the reason for such data is application-specific, so please don't comment on that).
The sample data is as below:
1111\n1111<Ctrl+A>
2222\t2222<Ctrl+A>
3333333<Ctrl+A>
Here Ctrl+A is the newline character.

Use perl -001e 'print sort <>' to do this:
prompt$ cat -tv /tmp/a
2222^I2222^A3333333^A1111
1111^A
prompt$ perl -001e 'print sort <>' /tmp/a | cat -tv
1111
1111^A2222^I2222^A3333333^Aprompt$
That works because character 001 (octal 1) is control-A ("\cA"), which is your record terminator in this dataset.
You can also use the code point in hex using -0xHHHHH. Note that it must be a single code point, not a string, using this shortcut. There are ways of doing it for strings and even regexes that involve infinitesimally more code.
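If Perl is not available, a similar effect may be possible with sort itself by temporarily turning the Ctrl-A terminators into NUL bytes. This sketch assumes a sort that supports the -z option (NUL-terminated records), as GNU sort does, and input that contains no NUL bytes of its own. The function name sort_ctrl_a is an illustrative assumption:

```shell
# Sort records terminated by Ctrl-A (octal 001).
# Assumes sort supports -z and the data contains no NUL bytes.
sort_ctrl_a() {
  # Swap Ctrl-A for NUL, sort NUL-terminated records, then swap back.
  tr '\001' '\000' | LC_ALL=C sort -z | tr '\000' '\001'
}

sep=$(printf '\001')   # the Ctrl-A record terminator
printf '2222\t2222%s3333333%s1111\n1111%s' "$sep" "$sep" "$sep" |
  sort_ctrl_a | cat -tv
```

Embedded \n and \t characters stay inside their records because only the NUL byte is treated as a record boundary while sorting.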

grep a tab in UNIX

How do I grep tab (\t) in files on the Unix platform?

If using GNU grep, you can use the Perl-style regexp:
grep -P '\t' *

The trick is to use $ sign before single quotes. It also works for cut and other tools.
grep $'\t' sample.txt

I never managed to make the '\t' metacharacter work with grep.
However I found two alternate solutions:
Using <Ctrl-V> <TAB> (hitting Ctrl-V then typing tab)
Using awk: foo | awk '/\t/'

From this answer on Ask Ubuntu:
Tell grep to use the regular expressions as defined by Perl (Perl has
\t as tab):
grep -P "\t" <file name>
Use the literal tab character:
grep "^V<tab>" <filename>
Use printf to print a tab character for you:
grep "$(printf '\t')" <filename>

One way is (this is with Bash)
grep -P '\t'
-P turns on Perl regular expressions so \t will work.
As user unwind says, it may be specific to GNU grep. The alternative is to literally insert a tab in there if the shell, editor or terminal will allow it.

Another way of inserting the tab literally inside the expression is using the lesser-known $'\t' quotation in Bash:
grep $'foo\tbar' # matches eg. 'foo<tab>bar'
(Note that if you're matching for fixed strings you can use this with -F mode.)
Sometimes using variables can make the notation a bit more readable and manageable:
tab=$'\t' # `tab=$(printf '\t')` in POSIX
id='[[:digit:]]\+'
name='[[:alpha:]_][[:alnum:]_-]*'
grep "$name$tab$id" # matches eg. `bob2<tab>323`
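A runnable version of the same idea, written so that it does not depend on the $'...' quoting or on the GNU \+ operator. The function name match_name_id and the sample input are illustrative assumptions:

```shell
# Build the pattern from named pieces; printf supplies the tab so the
# script does not depend on the $'...' quoting extension.
tab=$(printf '\t')
id='[[:digit:]][[:digit:]]*'        # one or more digits, in portable BRE
name='[[:alpha:]_][[:alnum:]_-]*'

# Hypothetical filter: keep lines that contain name<tab>id.
match_name_id() {
  grep "$name$tab$id"
}

# Only the first line separates the two parts with a tab.
printf 'bob2\t323\nbob2 323\n' | match_name_id
```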

There are basically two ways to address it:
(Recommended) Use the regular expression syntax supported by grep(1). Modern grep(1) supports two forms of POSIX 1003.2 regex syntax: basic (obsolete) REs and modern REs. The syntax is described in detail in the re_format(7) and regex(7) man pages, which are part of BSD and Linux systems respectively. GNU grep(1) also supports Perl-compatible REs as provided by the pcre(3) library.
In regex language the tab symbol is usually encoded by the \t atom. The atom is supported by BSD extended regular expressions (egrep, grep -E on BSD-compatible systems), as well as by Perl-compatible REs (pcregrep, GNU grep -P).
Both basic regular expressions and Linux extended REs apparently have no support for \t. Please consult each UNIX utility's man page to find out which regex language it supports (hence the difference between sed(1), awk(1), and pcregrep(1) regular expressions).
Therefore, on Linux:
$ grep -P '\t' FILE ...
On BSD-like systems:
$ egrep '\t' FILE ...
$ grep -E '\t' FILE ...
Pass the tab character itself in the pattern. This is straightforward when you edit a script file (<TAB> below stands for a literal tab character typed into the file):
# no tabs for Python please!
grep -q '<TAB>' *.py && exit 1
However, when working in an interactive shell you may need to rely on shell and terminal capabilities to type the proper symbol into the line. On most terminals this can be done through Ctrl+V key combination which instructs terminal to treat the next input character literally (the V is for "verbatim"):
$ grep '<Ctrl>+<V><TAB>' FILE ...
Some shells may offer advanced support for command typesetting. For example, in bash(1), words of the form $'string' are treated specially:
bash$ grep $'\t' FILE ...
Please note, though, that while this is convenient on the command line, it may cause compatibility issues when the script is moved to another platform. Also, be careful with quotes when using the specials; consult bash(1) for details.
For Bourne shell (and not only) the same behaviour may be emulated using command substitution augmented by printf(1) to construct proper regex:
$ grep "`printf '\t'`" FILE ...
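Putting the command-substitution approach to work, the "no tabs for Python" check might be sketched as below. The function name has_tab and the scratch file are illustrative assumptions, and mktemp is assumed to be available:

```shell
# Hypothetical check: succeed if the given file contains a tab character.
has_tab() {
  # printf produces the tab, so no literal tab has to be typed.
  grep -q "$(printf '\t')" "$1"
}

tmp=$(mktemp)                               # scratch file for the demonstration
printf 'def f():\n\treturn 1\n' > "$tmp"    # second line starts with a tab
if has_tab "$tmp"; then
  printf '%s\n' 'tab found'
fi
rm -f "$tmp"
```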

Use echo to insert the tab for you: grep "$(echo -e \\t)"

grep "$(printf '\t')" worked for me on Mac OS X

A good choice is to use sed.
sed -n '/\t/p' file
Examples (works in bash, sh, ksh, csh,..):
[~]$ cat testfile
12 3
1 4 abc
xa c
a c\2
1 23
[~]$ sed -n '/\t/p' testfile
xa c
a c\2
[~]$ sed -n '/\ta\t/p' testfile
a c\2
(This answer has been edited following suggestions in comments. Thank you all)

Use gawk: set the field delimiter to tab (\t) and check the number of fields. If there is more than one, the line contains at least one tab.
awk -F"\t" 'NF>1' file

One more way, which works in ksh, dash, etc.: use printf to insert the tab:
grep "$(printf 'BEGIN\tEND')" testfile.txt

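On ksh I used the following, where ^I stands for a literal tab character (Ctrl+I) inside the brackets, not the two characters ^ and I: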
grep "[^I]" testfile

The answer is simpler. Write your grep and, within the quotes, press the Tab key; it works well, at least in ksh (<TAB> below stands for the literal tab character):
grep "<TAB>" *

Using the 'sed-as-grep' method, but replacing the tabs with a visible character of personal preference is my favourite method, as it clearly shows both which files contain the requested info, and also where it is placed within lines:
sed -n 's/\t/\*\*\*\*/gp' file_name
If you wish to make use of line/file info, or other grep options, but also want to see the visible replacement for the tab character, you can achieve this by
grep -[options] -P '\t' file_name | sed 's/\t/\*\*\*\*/g'
As an example:
$ printf 'A\tB\nfoo\tbar\n' > test
$ grep -inH -P '\t' test | sed 's/\t/\*\*\*\*/g'
test:1:A****B
test:2:foo****bar
EDIT: Obviously the above is only useful for viewing file contents to locate tabs --- if the objective is to handle tabs as part of a larger scripting session, this doesn't serve any useful purpose.

This works well for AIX. I am searching for lines containing JOINED<\t>ACTIVE
voradmin cluster status | grep JOINED$'\t'ACTIVE
vorudb201 1 MEMBER(g) JOINED ACTIVE
*vorucaf01 2 SECONDARY JOINED ACTIVE

You might want to use grep "$(echo -e '\t')"
Only requirement is echo to be capable of interpretation of backslash escapes.

These alternative methods for identifying binary characters are all functional, and I really like the ones using awk, as I couldn't quite remember the syntax for single binary characters. However, it should also be possible to assign the value to a shell variable in a POSIX-portable fashion (i.e. TAB=`echo "@" | tr "\100" "\011"`), and then use it everywhere, again in a POSIX-portable fashion (i.e. grep "$TAB" filename). While this solution works well with TAB, it also works with other binary characters when another value is passed to tr in the assignment instead of the value for the TAB character.

The $'\t' notation given in other answers is shell-specific -- it seems to work in bash and zsh but is not universal.
NOTE: The following is for the fish shell and does not work in bash:
In the fish shell, one can use an unquoted \t, for example:
grep \t foo.txt
Or one can use the hex or unicode notations e.g.:
grep \X09 foo.txt
grep \U0009 foo.txt
(these notations are useful for more esoteric characters)
Since these values must be unquoted, one can combine quoted and unquoted values by concatenation:
grep "foo"\t"bar"

You can also use a Perl one-liner instead of grep resp. grep -P:
perl -ne 'print if /\t/' FILENAME

You can type
grep \t foo
or
grep '\t' foo
to search for the tab character in the file foo. You can probably also use other escape codes, though I've only tested \n. Although it's rather time-consuming, and it's unclear why you would want to, in zsh you can also type the tab character, go back to the beginning of the line, type grep, and enclose the tab in quotes.

Look for blank spaces many times [[:space:]]*
grep [[:space:]]*'.''.'
Will find something like this:
'the tab' ..
These are single quotations ('), and not double ("). This is how you make concatenation in grep. =-)
