filename expansion on assigning a non-array variable - zsh

This is about Zsh 5.5.1.
Say I have a glob pattern which expands to exactly one file, and I would like to assign this file to a variable. This works:
# N: No error if no files match. D: Match dot files. Y1: Expand to exactly one entry.
myfile=(*(NDY1))
and echo $myfile will show the file (or directory). But this one does not work:
myfile=*(NDY1)
In the latter case, myfile holds the pattern itself, i.e. *(NDY1).
Of course I could do some cheap trick, such as creating a child process via
myfile=$(echo *(NDY1))
but is there a way to do the assignment without such tricks?

By default, zsh does not do filename expansion in scalar assignment, but the option GLOB_ASSIGN could help. (This option is provided for backwards compatibility only.)
local myfile=''
() {
setopt localoptions globassign
myfile=*(NDY1)
}
echo $myfile
;#>> something
Here are some descriptions in zsh docs:
The value of a scalar parameter may also be assigned by writing:
name=value
In scalar assignment, value is expanded as a single string, in which the elements of arrays are joined together; filename expansion is not performed unless the option GLOB_ASSIGN is set.
--- zshparam(1), Description, zsh parameters
GLOB_ASSIGN <C>
If this option is set, filename generation (globbing) is performed on the right hand side of scalar parameter assignments of the form 'name=pattern' (e.g. 'foo=*'). If the result has more than one word the parameter will become an array with those words as arguments. This option is provided for backwards compatibility only: globbing is always performed on the right hand side of array assignments of the form name=(value) (e.g. foo=(*)) and this form is recommended for clarity; with this option set, it is not possible to predict whether the result will be an array or a scalar.
--- zshoptions(1), GLOB_ASSIGN, Expansion and Globbing, Description Of Options, zsh options

Related

Delete part of file name (up to but NOT including string)

I have a script that was very kindly provided for me a while ago which allowed me to generate input files by inserting coordinates from a series of .xyz files into a template file (Create new files by copying contents of coordinate files into template file).
I'm trying to adapt that script to do something very similar, but different in a very slight, but annoying way. In the script, the new directories created to house these new files are named like this:
# File name is in the form '....Hnnn.xyz';
# this will parse nnn from that name.
local inputNumber=$coordFile
# Remove '.xyz'.
inputNumber=${inputNumber%.xyz}
# Remove everything up to and including the 'H'.
inputNumber=${inputNumber##*H}
# Subdirectory name is based on the input number.
local outDir=$baseDir/D$inputNumber
# Create the directory if it doesn't exist.
if [[ ! -d $outDir ]]; then
mkdir $outDir
fi
This worked for my last problem, because the files were all named in the form xxxx_DH000.xyz. However, now the files I have are named using the form xxxx.000.xyz. While everything else in the script works, I cannot figure out how to name the new directories in the form 000.
The line in the script which I think needs to be edited slightly is where it says inputNumber=${inputNumber##*H}. What I cannot figure out is how to get the script to delete everything up to but not including a 0. I've searched online, but the only questions/answers I've found relating to the renaming of files by stripping part of the original names speak about deleting everything 'up to and including' a string.
I was able to generate directories named 1, 2, 3, etc. with inputNumber=${inputNumber##*0}, however I want all three digits present (i.e. I would like to create directories 001, 002, 003, etc.).
As an aside, I cannot use the . as the cutoff point, as there are multiple .s in each file name. An example of one of the file names is tma.h2s-2-pes-b97m-d4-tz.011.xyz.
Is there some way to get the script to simply name the files based on the full three digit number?
Although it's not needed in this case, zsh does support deleting text just before a matched pattern in a string. These parameter expansions will remove everything prior to the first 0 in the string, but keep the 0:
inputNumber='tma.h2s-2-pes-b97m-d4-tz.011.xyz'
inputNumber=${inputNumber:r} # remove '.xyz'
inputNumber=${(SM)inputNumber##0*}
print ${inputNumber}
# ==> 011
This includes a few zsh-isms:
${...:r} returns the 'root' of a filename, removing the extension.
(S) - parameter expansion flag to change the behavior of the ## expansion. It will now search for patterns in the middle of a string, not just at the beginning.
(M) - flag to include the pattern match (the 0*) in the result.
This depends on the number always starting with 0, which may not be a good choice - what file comes after 099?
This next version uses a zsh extended glob pattern to find a number between two periods, and returns that number - i.e. it will find the number in .11., .011., or .2345., but not in .x11.:
coordFile='tma.h2s-2-pes-b97m-d4-tz.022.xyz'
inputNumber=${(*)coordFile//(#b)*.(<->).*/${match}}
print ${inputNumber}
# ==> 022
Some of the pieces:
${...//.../...} - substitution expansion.
(*) - enables extendedglob for this expansion.
(#b) - globbing flag to enable 'backreferences', so that $match will work.
<-> - matches a number. This can be restricted to a range if needed, like <100-199>.
(<->) - puts the number into a match group.
*. and .* - everything before and after the number; these are not in the match group.
${match} - the matched string from the parenthesized part of the pattern. This is used as the replacement for the entire string, so we get just the number. If more than one part of the input string matches the pattern, this will be the last one. match is actually an array, but since there's only one match group in the pattern, it does not need to be indexed with ${match[1]}.
This variant uses a standard regular expression to find the number:
coordFile='tma.h2s-2-pes-b97m-d4-tz.033.xyz'
match=
[[ $coordFile =~ .*\\.([[:digit:]]+)\\..* ]]
inputNumber=${match[1]}
print ${inputNumber}
# ==> 033
After the [[ ]] test, the match array will contain matches from any parenthesized groups in the regular expression - here, that will be a set of one or more digits in between two periods / full stops.
But, as @choroba and Fravadona have noted, since the number will always be at the end of the string, you can use the standard #/##/%/%% expansions to remove parts of the string based only on the .s. This is a common idiom that will be familiar to many shell programmers, and will also work in bash (note that other parts of your original script depend on zsh).
inputNumber='tma.h2s-2-pes-b97m-d4-tz.044.xyz'
inputNumber=${inputNumber%.xyz}
inputNumber=${inputNumber##*.}
print ${inputNumber}
# ==> 044
In zsh everything can be consolidated into a single nested substitution:
baseDir='files/are/here'
coordFile='tma.h2s-2-pes-b97m-d4-tz.055.xyz'
local outDir=$baseDir/D${${coordFile:r}##*.}
print $outDir
# ==> files/are/here/D055
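The dot-based approach can also be written as a small portable function. This is only a sketch, assuming nothing beyond POSIX parameter expansion; the function name coord_number and the two shorter sample file names are made up for illustration. It shows that the full number survives regardless of leading zeros or width, which addresses the "what comes after 099?" concern:

```shell
# Hypothetical helper: print the number between the last two dots
# of a name such as 'prefix.011.xyz'.
coord_number() {
    name=${1%.xyz}                  # strip the '.xyz' suffix
    printf '%s\n' "${name##*.}"     # strip everything up to the last '.'
}

# Print the directory name that would be used for each sample file.
for f in tma.h2s-2-pes-b97m-d4-tz.011.xyz tma.099.xyz tma.100.xyz; do
    printf 'D%s\n' "$(coord_number "$f")"
done
```

The loop only prints the names (D011, D099, D100); creating the directories is left to the original script.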

Redirecting man to a file it is not identical to the text in the console

I am trying to print the man page for ls and I am getting output in my file with repeated characters. I am relatively new to bash and I don't know where to start with this issue.
This is the command I typed
man ls | cat > file.txt
I expected output like in the terminal
DESCRIPTION
For each operand that names a file of a type other than directory, ls displays its
name as well as any requested, associated information. For each operand that names a
file of type directory, ls displays the names of files contained within that direc-
tory, as well as any requested, associated information.
If no operands are given, the contents of the current directory are displayed. If
more than one operand is given, non-directory operands are displayed first; directory
and non-directory operands are sorted separately and in lexicographical order.
The following options are available:
-# Display extended attribute keys and sizes in long (-l) output.
-1 (The numeric digit ``one''.) Force output to be one entry per line. This is
the default when output is not to a terminal.
-A List all entries except for . and ... Always set for the super-user.
-a Include directory entries whose names begin with a dot (.).
-B Force printing of non-printable characters (as defined by ctype(3) and cur-
rent locale settings) in file names as \xxx, where xxx is the numeric value
of the character in octal.
-b As -B, but use C escape codes whenever possible.
-C Force multi-column output; this is the default when output is to a terminal.
But what I got as output in my file was like this
DDEESSCCRRIIPPTTIIOONN
For each operand that names a _f_i_l_e of a type other than directory, llss
displays its name as well as any requested, associated information. For
each operand that names a _f_i_l_e of type directory, llss displays the names
of files contained within that directory, as well as any requested, asso-
ciated information.
If no operands are given, the contents of the current directory are dis-
played. If more than one operand is given, non-directory operands are
displayed first; directory and non-directory operands are sorted sepa-
rately and in lexicographical order.
The following options are available:
--## Display extended attribute keys and sizes in long (--ll) output.
--11 (The numeric digit ``one''.) Force output to be one entry per
line. This is the default when output is not to a terminal.
--AA List all entries except for _. and _._.. Always set for the super-
user.
--aa Include directory entries whose names begin with a dot (_.).
--BB Force printing of non-printable characters (as defined by
ctype(3) and current locale settings) in file names as \_x_x_x,
where _x_x_x is the numeric value of the character in octal.
--bb As --BB, but use C escape codes whenever possible.
--CC Force multi-column output; this is the default when output is to
a terminal.
--cc Use time when file status was last changed for sorting (--tt) or
What would make it do this and how can I get the man page in readable text?
Some systems have a man program which notices whether it is sending output to the terminal or to a pipe and behaves differently in each case.
For example, on Ubuntu Linux, man man documents this environment variable:
MAN_KEEP_FORMATTING
Normally, when output is not being directed to a terminal (such
as to a file or a pipe), formatting characters are discarded to
make it easier to read the result without special tools. How-
ever, if $MAN_KEEP_FORMATTING is set to any non-empty value,
these formatting characters are retained. This may be useful
for wrappers around man that can interpret formatting charac-
ters.
In your case, it seems that man does not behave differently when sending output to a pipe.
There may be an option to turn on the behaviour you are looking for, but it may be simpler just to strip the unwanted characters out of the output. A common method is to use col:
man ls | col -bx > file.txt
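The doubled letters most likely come from overstriking: bold is emitted as a character, a backspace, and the same character again, while underlining is an underscore, a backspace, and the character. Where col is not installed, a rough equivalent is to delete every "character + backspace" pair with sed. This is a sketch under that assumption, not a full replacement for col; the function name strip_overstrike is hypothetical and the input below is simulated rather than real man output:

```shell
# Hypothetical helper: delete every "character + backspace" pair from stdin.
strip_overstrike() {
    bs=$(printf '\b')       # a literal backspace character
    sed "s/.${bs}//g"
}

# Simulated man output: bold "DE" followed by underlined "fi".
printf 'D\bDE\bE _\bf_\bi\n' | strip_overstrike
```

With the simulated input this prints DE fi. On real output it would be used as man ls | strip_overstrike > file.txt.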

julia to regex match lines in a file like grep

I would like to see a code snippet of julia that will read a file and return lines (string type) that match a regular expression.
I welcome multiple techniques, but output should be equivalent to the following:
$> grep -E '^AB[AJ].*TO' 'webster-unabridged-dictionary-1913.txt'
ABACTOR
ABATOR
ABATTOIR
ABJURATORY
I'm using GNU grep 3.1 here, and the first line of each entry in the file is the all caps word on its own.
You could also use the filter function to do this in one line.
filter(line -> ismatch(r"^AB[AJ].*TO",line),readlines(open("webster-unabridged-dictionary-1913.txt")))
filter applies a function returning a Boolean to each element of an array, and returns only those elements for which the function returns true. The function in this case is the anonymous function line -> ismatch(r"^AB[AJ].*TO",line), which names each element of the array being filtered (each line, in this case) line and tests it against the regex.
I think this might not be the best solution for very large files as the entire file needs to be loaded into memory before filtering, but for this example it seems to be just as fast as the for loop using eachline. Another difference is that this solution returns the results as an array rather than printing each of them, which depending on what you want to do with the matches might be a good or bad thing.
My favored solution uses a simple loop and is very easy to understand.
julia> open("webster-unabridged-dictionary-1913.txt") do f
for i in eachline(f)
if ismatch(r"^AB[AJ].*TO", i) println(i) end
end
end
ABACTOR
ABATOR
ABATTOIR
ABJURATORY
notes
Lines with tab separations have the tabs preserved (no literal output of '\t')
my source file in this example has the dictionary words in all caps alone on one line above the definition; the complete line is returned.
the file I/O operation is wrapped in a do block syntax structure, which expresses an anonymous function more conveniently than lambda x -> f(x) syntax for multi-line functions. This is particularly expressive with the file open() command, defined with a try-finally-close operation when called with a function as an argument.
Julia docs: Strings/Regular Expressions
regex objects take the form r"<regex_literal_here>"
the regex itself is a string
based on perl PCRE library
matches become regex match objects
example
julia> reg = r"^AB[AJ].*TO";
julia> typeof(reg)
Regex
julia> test = match(reg, "ABJURATORY")
RegexMatch("ABJURATO")
julia> typeof(test)
RegexMatch
Putting ; in front of a command is Julia's way of running shell commands from the REPL, so this also works there:
;grep -E '^AB[AJ].*TO' 'webster-unabridged-dictionary-1913.txt'

zsh: command substitution, proper quoting and backslash (again)

(Note: This is a successor question to my posting zsh: Command substitution and proper quoting , but now with an additional complication).
I have a function _iwpath_helper, which outputs to stdout a path, which possibly contains spaces. For the sake of this discussion, let's assume that _iwpath_helper always returns a constant text, for instance
function _iwpath_helper
{
echo "home/rovf/my directory with spaces"
}
I also have a function quote_stripped, which expects one parameter; if this parameter is surrounded by quotes, it removes them and returns the remaining text. If the parameter is not surrounded by quotes, it returns it unchanged. Here is its definition:
function quote_stripped
{
echo ${1//[\"\']/}
}
Now I combine both functions in the following way:
target=$(quote_stripped "${(q)$(_iwpath_helper)}")
(Of course, 'quote_stripped' would be unnecessary in this toy example, because _iwpath_helper doesn't return a quote-delimited path here, but in the real application, it sometimes does).
The problem now is that the variable target contains a real backslash character, i.e. if I do a
echo +++$target+++
I see
+++home/rovf/my\ directory\ with\ spaces+++
and if I try to
cd $target
I get on my system the error message, that the directory
home/rovf/my/ directory/ with/ spaces
would not exist.
(In case you are wondering where the forward slashes come from: I'm running on Cygwin, and I guess that the cd command just interprets backslashes as forward slashes in this case, to better accommodate the Windows environment).
I guess the backslashes which physically appear in the variable target are caused by the (q) expansion flag which I apply to $(_iwpath_helper). My problem is now that I cannot simply drop the (q), because without it, the function quote_stripped would receive in parameter $1 only the first part of the string, up to the first space (/home/rovf/my).
How can I write this correctly?
I think you just want to avoid trying to strip quotes manually, and use the (Q) expansion flag. Compare:
% v="a b c d"
% echo "$v"
a b c d
% echo "${(q)v}"
a\ b\ c\ d
% echo "${(Q)${(q)v}}"
a b c d
chepner was right: The way I tried to unquote the string was silly (I was thinking too much in a "Bourne Shell way"), and I should have used the (Q) flag.
Here is my solution:
target="${(Q)$(_iwpath_helper)}"
No need for the quote_stripped function anymore....
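On the worry that dropping (q) would make the path split at the first space: a command substitution inside double quotes is passed on as a single word, so no extra quoting layer is needed to keep it together. A minimal sketch, reusing the function names from the question; count_args is a hypothetical helper added only to show the argument count, and tr stands in for the ${1//...} substitution purely to keep the sketch portable across shells:

```shell
_iwpath_helper() {
    echo "home/rovf/my directory with spaces"
}

# Same intent as the question's quote_stripped: drop quote characters.
quote_stripped() {
    printf '%s\n' "$1" | tr -d "\"'"
}

# Hypothetical helper: report how many arguments were received.
count_args() {
    echo "$#"
}

# Quoted substitution: the whole path arrives as one argument.
count_args "$(_iwpath_helper)"

target=$(quote_stripped "$(_iwpath_helper)")
printf '+++%s+++\n' "$target"
```

Here count_args reports 1, and target holds the path with its spaces intact and no backslashes.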

What is meaning of ##*/ in unix?

I found syntax like below.
${VARIABLE##*/}
what is the meaning of ##*/ in this?
I know meaning of */ in ls */ but not aware about what above syntax does.
This example will make it clear:
VARIABLE='abcd/def/123'
echo "${VARIABLE#*/}"
def/123
echo "${VARIABLE##*/}"
123
##*/ strips the longest match of anything followed by / from the start of the value.
#*/ strips the shortest match of anything followed by / from the start of the value.
PS: Using all capital variable names is not considered very good practice in Unix shell. Better to use variable instead of VARIABLE.
From man bash:
${parameter#word}
${parameter##word}
Remove matching prefix pattern. The word is expanded to produce
a pattern just as in pathname expansion. If the pattern matches
the beginning of the value of parameter, then the result of the
expansion is the expanded value of parameter with the shortest
matching pattern (the ``#'' case) or the longest matching
pattern (the ``##'' case) deleted. If parameter is @ or *, the
pattern removal operation is applied to each positional
parameter in turn, and the expansion is the resultant list. If
parameter is an array variable subscripted with @ or *, the
pattern removal operation is applied to each member of the array
in turn, and the expansion is the resultant list.
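The suffix operators % and %% mirror # and ## from the other end of the value. A short sketch putting all four side by side; the variable name file and the sample value are made up for illustration:

```shell
file='abcd/def/123.tar.gz'

echo "${file#*/}"     # shortest prefix matching */ removed: def/123.tar.gz
echo "${file##*/}"    # longest prefix matching */ removed:  123.tar.gz
echo "${file%.*}"     # shortest suffix matching .* removed: abcd/def/123.tar
echo "${file%%.*}"    # longest suffix matching .* removed:  abcd/def/123
```

The ##*/ form gives the same result as basename for a path like this one, without starting a separate process.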

Resources