Avoid Unix PATH duplication with "set -f"

I've seen a script that modifies the Unix $PATH, and in order to avoid duplicate entries, it uses the following technique:
set path = ($path:q /some/new/path)
set path = ($path:q /another/directory)
set -f path = ($path:q)
I don't understand how this works...
The documentation for the "-f" flag says:
Disable file name generation
which doesn't make any sense to me. And what's this strange ":q"?
Thanks!
EDIT:
This Super User Question helped me understand that ":q" is a modifier.
And the tcsh man page explains it:
When the `:q' modifier is applied to
a substitution the variable will expand to multiple words
with each word separated by a blank and quoted to prevent
later command or filename substitution
Second Edit:
Actually, it seems that "-f" alone does the magic:
~$ set days = (Sunday Monday Tuesday Monday Sunday)
~$ echo $days
Sunday Monday Tuesday Monday Sunday
~$ set -f days = ($days)
~$ echo $days
Sunday Monday Tuesday
Still, I don't understand how this is the result of "disable file name generation".

Disabling file name generation is usually needed when we encounter file names that contain *, ?, {}, etc. Care should be taken while handling these files so that we don't process a file name as a wildcard pattern. Create a file named stack* with vim 'stack*'; later, we shouldn't delete this file with rm stack*, since all other files starting with stack would also get deleted. An alternative way to delete the file is to quote the name, as in rm "stack*". If required, file name generation can be re-enabled with set +f in the shell.
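A minimal sketch of that behavior in a POSIX-style shell (the file names here are made up for illustration):
touch 'stack*' stackoverflow stackexchange
set -f        # disable file name generation (globbing)
ls stack*     # the pattern is passed literally; only the file 'stack*' is listed
set +f        # re-enable globbing
ls stack*     # now the pattern expands and all three files are listed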

Your confusion arises from the fact that you're reading the ksh manual, but you're using the tcsh shell. Tcsh syntax is very different from the vastly more common POSIX shell syntax.
The set command is built into the shell, so when you run set in tcsh you get an entirely different command from the set you would run in ksh.
From man tcsh:
set [-r] [-f|-l] name=(wordlist) ... (+)
...
If -f or -l are specified, set only unique words keeping their
order. -f prefers the first occurrence of a word, and -l the
last.
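To see the deduplication in action, here is a minimal tcsh sketch (the directory name is arbitrary):
set path = ($path:q /usr/local/bin)
set path = ($path:q /usr/local/bin)   # the same directory added twice
set -f path = ($path:q)               # keep only the first occurrence of each word
echo $path                            # /usr/local/bin now appears only once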

Related

How to select all files from one sample?

I have a problem figuring out how to make the input directive select only the files belonging to each {samples} wildcard in the rule below.
rule MarkDup:
    input:
        expand("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam", zip,
               samples=samples['sample'],
               lanes=samples['lane'],
               flowcells=samples['flowcell']),
    output:
        bam = "Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics = "Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics",
    shell:
        "gatk --java-options -Djava.io.tempdir=`pwd`/tmp \
        MarkDuplicates \
        $(echo ' {input}' | sed 's/ / --INPUT /g') \
        -O {output.bam} \
        --VALIDATION_STRINGENCY LENIENT \
        --METRICS_FILE {output.metrics} \
        --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 200000 \
        --CREATE_INDEX true \
        --TMP_DIR Outputs/MarkDuplicates/tmp"
Currently it will create correctly named output files, but it selects all files that match the pattern based on all wildcards. So I'm perhaps halfway there. I tried changing {samples} to {{samples}} in the input directive, like so:
expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam", zip,
lanes=samples['lane'],
flowcells=samples['flowcell']),`
but this broke the previous rule somehow. So the solution is something like
input:
    "{sample}_*.bam"
But clearly this doesn't work.
Is it possible to collect all files that match {sample}_*.bam with a function and use that as input? And if so, will the function still work with $(echo ' {input}' etc...) in the shell directive?
If you just want all the files in the directory, you can use a lambda function
from glob import glob

rule MarkDup:
    input:
        lambda wcs: glob('Outputs/MergeBamAlignment/%s*.bam' % wcs.samples)
    output:
        bam="Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics="Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics"
    shell:
        ...
Just be aware that this approach can't do any checking for missing files, since it will always report that the files needed are the files that are present. If you do need confirmation that the upstream rule has been executed, you can have the previous rule touch a flag, which you then require as input to this rule (though you don't actually use the file for anything other than enforcing execution order).
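A hypothetical sketch of that flag-file pattern (the flag path and rule wiring are invented; the upstream rule would declare output: touch("Outputs/MergeBamAlignment/{samples}.done") using Snakemake's touch() marker):
from glob import glob

rule MarkDup:
    input:
        # the flag, touch()-ed by the upstream rule, enforces execution order
        flag="Outputs/MergeBamAlignment/{samples}.done",
        # the glob then collects whatever merged BAMs exist for the sample
        bams=lambda wcs: glob('Outputs/MergeBamAlignment/%s*.bam' % wcs.samples)
    output:
        bam="Outputs/MarkDuplicates/{samples}_markedDuplicates.bam"
    shell:
        "gatk MarkDuplicates $(echo ' {input.bams}' | sed 's/ / --INPUT /g') -O {output.bam}"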
If I understand correctly, zip needs to be applied only to {lanes} and {flowcells} and not to {samples}. In that case, nesting two expand instances can achieve that.
input:
    expand(expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam",
                  zip, lanes=samples['lane'], flowcells=samples['flowcell']),
           samples=samples['sample'])
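To see what the nested expand produces, here is a small Python sketch using snakemake.io.expand (the sample-sheet values are hypothetical):
from snakemake.io import expand

samples = {'sample': ['A', 'B'], 'lane': ['L1', 'L2'], 'flowcell': ['FC1', 'FC2']}

# The inner expand pairs lanes with flowcells (zip) and leaves {{samples}} intact:
inner = expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam",
               zip, lanes=samples['lane'], flowcells=samples['flowcell'])
# ['Outputs/MergeBamAlignment/{samples}_L1_FC1.merged.bam',
#  'Outputs/MergeBamAlignment/{samples}_L2_FC2.merged.bam']

# The outer expand then fills {samples} with every sample name:
print(expand(inner, samples=samples['sample']))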
PS: output.tmp file uses {sample} instead of {samples}. Typo?

How old is file?

I have a shell script that will check how many days old a file is. I did stat -f "%m%t%Sm %N" "$file". But I want to store this in a variable and then compare the current time with the file's creation time.
Assuming you're using bash, you can capture the output of commands with something like:
fdate=$(stat -f "%m%t%Sm %N" "$file")
and then do whatever you will with the results:
echo ${fdate}
That's assuming the command itself works in the first place. If it does, you can ignore the text below.
The GNU stat program uses -f to specify that you want to query the filesystem rather than a file, and the other options you have don't seem to make sense in the context of your question.
Using GNU stat, you can get the time since the last file update(1) as:
ageInSeconds=$(($(date -u +%s) - $(stat --printf "%Y" "$file")))
This subtracts the last modification time of the file from the current time (both expressed as seconds since the epoch) to give you the age in seconds.
To turn that into days, assuming you're not overly concerned about the possible error from leap seconds (an error of, at most, one part in about 15.7 million, or 0.000006%), you can just divide it by 86,400:
ageInDays=$((($(date -u +%s) - $(stat --printf "%Y" "$file")) / 86400))
(1) Note that, although stat purports to have a %W format specifier that gives the birth of the file, this doesn't always work (it returns zero). You could check that first if you're really interested in when the file was created rather than last updated but you may have to be prepared to accept the possibility the information is not available. I've used last modification time above since, frequently, it's used for things like detecting changes.
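Putting it together, a minimal sketch of the comparison the question asks for, based on last modification time (assumes GNU stat; the 7-day threshold is an arbitrary example):
#!/bin/sh
file="$1"
maxAgeDays=7   # hypothetical threshold
ageInDays=$((($(date -u +%s) - $(stat --printf "%Y" "$file")) / 86400))
if [ "$ageInDays" -gt "$maxAgeDays" ]
then
    echo "$file was last modified $ageInDays days ago"
fi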

How do I read, modify and write to the same file without involving a temporary file in zsh?

I like keeping my history files uncluttered. Since zsh has excellent history searching features, there is no need to save all the commands that I repeatedly use (e.g., finger, pwd, ls, etc.) multiple times. To strip the history file of all duplicate lines, I did sort .zhistory|uniq -du. Now, I'd like to write this back to the same file, so that if I simply put this in my .zshrc, every time I log in my history is trimmed and clean. If I try sort .zhistory|uniq -du>.zhistory, the resulting file is empty! On the other hand, if I do sort .zhistory|uniq -du>tempfile, it writes to tempfile correctly. Any idea how I can write to the same file?
You might be able to use a variable:
file='.zhistory' && var=$(sort -u "$file") && echo "$var" > "$file"
The reason you can't write to the same file is that the redirection occurs first and truncates the file before the utility ever sees it.
You can prevent duplicate lines in the first place. Use setopt with one or more of the following settings (from man zshoptions); a short .zshrc sketch follows the list:
HIST_EXPIRE_DUPS_FIRST
If the internal history needs to be trimmed to add the current
command line, setting this option will cause the oldest history
event that has a duplicate to be lost before losing a unique
event from the list. You should be sure to set the value of
HISTSIZE to a larger number than SAVEHIST in order to give you
some room for the duplicated events, otherwise this option will
behave just like HIST_IGNORE_ALL_DUPS once the history fills up
with unique events.
HIST_FIND_NO_DUPS
When searching for history entries in the line editor, do not
display duplicates of a line previously found, even if the
duplicates are not contiguous.
HIST_IGNORE_ALL_DUPS
If a new command line being added to the history list duplicates
an older one, the older command is removed from the list (even
if it is not the previous event).
HIST_IGNORE_DUPS (-h)
Do not enter command lines into the history list if they are
duplicates of the previous event.
HIST_SAVE_NO_DUPS
When writing out the history file, older commands that duplicate
newer ones are omitted.
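For instance, a minimal .zshrc sketch (which options you pick is a matter of taste):
setopt HIST_IGNORE_ALL_DUPS    # drop older duplicates as new commands arrive
setopt HIST_SAVE_NO_DUPS       # omit duplicates when writing the history file
setopt HIST_EXPIRE_DUPS_FIRST  # trim duplicated events before unique ones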
The program sponge can be useful for writing back to the same file you read.
(For the example's sake, assume you don't know about sed -i.)
echo "say what again" > file
sed s/what/woot/ file > file
Too bad: file is now empty; you lost your file.
echo "say what again" > file
sed s/what/woot/ file | sponge file
does what you want
(Be careful not to write sponge > file or the file will be empty again.)
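Applied to the history-trimming command from the question (assuming sponge, from the moreutils package, is installed), that would be:
sort .zhistory | uniq -du | sponge .zhistory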
The fact that I didn't have an answer to this question annoyed me sufficiently that I wrote one - call this inplace, make it executable, and put it on your path:
#!/bin/bash
# inplace: run a command with the same file as both input and (eventual) output.
BACKUP_EXT=
while getopts "b:" flag
do
    case "$flag" in
        b) BACKUP_EXT="$OPTARG" ;;   # -b EXT keeps a backup as filename.EXT
    esac
done
shift $((OPTIND - 1))
CMD="$1"
shift
for filename in "$@"
do
    # write the command's output to a temporary file first
    TMP_FILE="$(mktemp -t inplace.XXXXXX)"   # explicit template; bare 'mktemp -t' is not portable
    bash -c "$CMD" <"$filename" >"$TMP_FILE"
    if [[ -n "$BACKUP_EXT" ]]
    then
        mv "$filename" "$filename.$BACKUP_EXT"
    fi
    # only now replace the original with the finished output
    mv "$TMP_FILE" "$filename"
done
You may now say:
inplace 'sort | uniq -du' .zhistory
Incidentally, there's a way to do that uniqification without having to sort - but that's an answer for another question!

cron syntax for date

The following statement works at the command prompt, but it does not work in a cron job.
myvar=`date +'%d%m'`; echo $myvar >> append.txt
The cron log shows that only part of the date statement is run.
How do I use it in a cron job?
Escape the percent signs with a backslash (\%).
My general rule of thumb is "do not write scripts in the crontab file". That means I don't place anything other than a simple script name (with absolute path) and possibly some control arguments in the crontab file. In particular, I do not place I/O redirection or variable evaluations in the crontab file; such things go in a (shell) script run by the cron job.
This avoids the trouble - and works across a wide variety of variants of cron, both ancient and modern.
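For example (the schedule and paths are hypothetical), the crontab entry would contain only:
0 0 * * * /home/user/bin/append-date.sh
and the script itself would carry the logic:
#!/bin/sh
myvar=$(date +'%d%m')
echo "$myvar" >> /home/user/append.txt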
from man 5 crontab:
The sixth field (the rest of the line) specifies the command to be run. The entire command portion of the line, up to a newline or % character, will be executed by /bin/sh or by the shell specified in the SHELL variable of the cronfile. Percent signs (%) in the command, unless escaped with backslash (\), will be changed into newline characters, and all data after the first % will be sent to the command as standard input.
Your %s are being changed to newlines, and the latter part of your command is being fed to the command as stdin. As Ignacio says, you need to escape the %s with a \
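For example, the original entry with the percent signs escaped (the schedule and file path are placeholders):
0 0 * * * myvar=`date +\%d\%m`; echo $myvar >> /home/user/append.txt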

Why did my use of the read command not do what I expected?

I wreaked some havoc on my computer when I played with the commands suggested by vezult [1]. I expected the one-liner to ask for the file names to be removed. However, it immediately removed my files in a folder:
> find ./ -type f | while read x; do rm "$x"; done
I expected it to wait for my typing on stdin [2]. I cannot understand its action. How does the read command work, and where do you use it?
What happened there is that read reads from stdin. When you put it at the end of a pipe, it reads from that pipe.
So the output of your find becomes
file1
file2
and so on; read reads that and replaces x successively with file1 then file2, and so your loop becomes
rm "file1"
rm "file2"
and sure enough, that rm's every file starting at the current directory ".".
A couple of hints.
You didn't need the "/".
It's better and safer to say
find . -type f
because should you happen to type ". /" (i.e., dot SPACE slash), find will start at the current directory and then go looking again starting at the root directory. That trick, given the right privileges, would delete every file on the computer. "." is already the name of a directory; you don't need to add the slash.
The find or rm commands will do this
It sounds like what you wanted to do was go through all the files in all the directories starting at the current directory ".", and have it ASK if you want to delete it. You could do that with
find . -type f -exec rm -i {} \;
or
find . -type f -ok rm {} \;
and not need a loop at all. You can also do
rm -r -i *
and get nearly the same effect, except that it will try to delete directories too. If the directory is empty, that'll even work.
Another thought
Come to think of it, unless you have a LOT of files, you could also do
rm -i `find . -type f`
Now the find in backquotes will become a bunch of file names on the command line, and the '-i' interactive flag on rm will ask the yes or no question.
Charlie Martin gives you a good dissection and explanation of what went wrong with your specific example, but doesn't address the general question of:
When should you use the read command?
The answer to that is - when you want to read successive lines from some file (quite possibly the standard output of some previous sequence of commands in a pipeline), possibly splitting the lines into several separate variables. The splitting is done using the current value of '$IFS', which normally means on blanks and tabs (newlines don't count in this context; they separate lines). If there are multiple variables in the read command, then the first word goes into the first variable, the second into the second, ..., and the residue of the line into the last variable. If there's only one variable, the whole line goes into that variable.
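A small sketch of that splitting behavior (the words are arbitrary):
echo "alpha beta gamma delta" | while read first second rest
do
    # the first two words go into their own variables; the residue goes into $rest
    echo "first=$first second=$second rest=$rest"
done
# prints: first=alpha second=beta rest=gamma delta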
There are many uses. This is one of the simpler scripts I have that uses the split option:
#!/bin/ksh
#
# @(#)$Id: mkdbs.sh,v 1.4 2008/10/12 02:41:42 jleffler Exp $
#
# Create basic set of databases
MKDUAL=$HOME/bin/mkdual.sql
ELEMENTS=$HOME/src/sqltools/SQL/elements.sql
cat <<! |
mode_ansi with log mode ansi
logged with buffered log
unlogged
stores with buffered log
!
while read dbs logging
do
    if [ "$dbs" = "unlogged" ]
    then bw=""; cw=""
    else bw="-ebegin"; cw="-ecommit"
    fi
    sqlcmd -xe "create database $dbs $logging" \
        $bw -e "grant resource to public" -f $MKDUAL -f $ELEMENTS $cw
done
The cat command with a here-document has its output sent to a pipe, so the output goes into the while read dbs logging loop. The first word goes into $dbs and is the name of the (Informix) database I want to create. The remainder of the line is placed into $logging. The body of the loop deals with unlogged databases (where begin and commit do not work), then runs a program sqlcmd (completely separate from the Microsoft newcomer of the same name; it's been around since about 1990) to create a database and populate it with some standard tables and data - a simulation of the Oracle 'dual' table, and a set of tables related to the 'table of elements'.
Other scripts that use the read command are bigger (by far), but generally read lines containing one or more file names and some other attributes of relevance, and then apply an appropriate transform to the files using the attributes.
Osiris JL: file * | grep 'sh.*script' | sed 's/:.*//' | xargs wgrep read
esqlcver:read version letter
jlss: while read directory
jlss: read x || exit
jlss: read x || exit
jlss: while read file type link owner group perms
jlss: read x || exit
jlss: while read file type link owner group perms
kb: while read size name
mkbod: while read directory
mkbod:while read dist comp
mkdbs:while read dbs logging
mkmsd:while read msdfile master
mknmd:while read gfile sfile version notes
publictimestamp:while read name type title
publictimestamp:while read name type title
Osiris JL:
'Osiris JL: ' is my command line prompt; I ran this in my 'bin' directory. 'wgrep' is a variant of grep that only matches entire words (to avoid words like 'already'). This gives some indication of how I've used it.
The 'read x || exit' lines are for an interactive script that reads a response from standard input, but exits if the command gets EOF (for example, if standard input comes from /dev/null).