Parsing variable in loop incorrectly [duplicate] - zsh

I want to run certain actions on a group of lexicographically named files (01-09 before 10). I have to use a rather old version of FreeBSD (7.3), so I can't use yummies like echo {01..30} or seq -w 1 30.
The only working solution I found is printf "%02d " {1..30}. However, I can't figure out why I can't use $1 and $2 instead of 1 and 30. When I run my script (bash ~/myscript.sh 1 30), printf says {1..30}: invalid number
AFAIK, variables in bash are typeless, so why won't printf accept my argument as an integer?

Bash supports C-style for loops:
s=1
e=30
for ((i=s; i<=e; i++)); do printf "%02d " "$i"; done
The syntax you attempted doesn't work because brace expansion happens before parameter expansion: at the time brace expansion runs, the word is still literally {$1..$2}, not {1..30}, so the shell leaves it alone.
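You can watch the ordering at an interactive prompt; a quick, hypothetical demonstration:

$ set -- 1 30
$ echo {$1..$2}
{1..30}

Brace expansion ran first, saw non-numeric contents, and left the word alone; parameter expansion then filled in $1 and $2, so echo (or printf) received the single literal word {1..30} -- hence the "invalid number" complaint.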
The answer given by @Kent works because eval restarts the parsing process from the beginning. I tend to suggest avoiding habitual use of it, as eval can introduce hard-to-recognize bugs -- if your command were whitelisted to be run by sudo and $1 were, say, '$(rm -rf /; echo 1)', the C-style-for-loop example would safely fail, and the eval example... not so much.
Granted, 95% of the scripts you write may not be accessible to folks executing privilege escalation attacks, but the remaining 5% can really ruin one's day; following good practices 100% of the time keeps you out of sloppy habits.
Thus, if one really wants to pass a range of numbers to a single command, the safe thing is to collect them in an array:
a=( )
for ((i=s; i<=e; i++)); do a+=( "$i" ); done
printf "%02d " "${a[@]}"

I guess you are looking for this trick:
#!/bin/bash
s=1
e=30
printf "%02d " $(eval echo {$s..$e})

Ok, I finally got it!
#!/bin/bash
#BSD-only iteration method
#for day in `jot $1 $2`
for ((day=$1; day<=$2; day++))
do
    printf "%02d\n" "$day"
done
I initially wanted to use the loop variable as a "day" in file names, but now I see that in my exact case it's easier to iterate over plain numbers (1, 2, 3, etc.) and convert them to lexicographic form inside the loop. When using jot, remember that $1 is the count of numbers and $2 is the starting point.
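For the record, jot (which is in the FreeBSD base system) can also do the zero-padding itself via its printf-style -w option; a hypothetical one-liner for 01 through 30:

jot -w %02d 30 1

Here 30 is the count and 1 is the starting point, matching the argument order noted above.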

Related

When to use spawn.with_shell and when is plain spawn enough?

I'm confused about when I should use awful.spawn and when to use awful.spawn.with_shell. To me they look and work the same.
The only difference I see is that in awful.spawn you can set client rules and make a callback.
I would appreciate any examples or rules on when to use each one.
awful.spawn.with_shell really does no more than spawn the given command with a shell: https://github.com/awesomeWM/awesome/blob/c539e0e4350a42f813952fc28dd8490f42d934b3/lib/awful/spawn.lua#L370-L371
function spawn.with_shell(cmd)
    if cmd and cmd ~= "" then
        cmd = { util.shell, "-c", cmd }
        return capi.awesome.spawn(cmd, false)
    end
end
So, why would one want that? Some things are done by shells. For example, output redirections (echo hi > some_file), command sequences (echo 1; echo 2) and pipes (echo hello | grep ell) are all handled by a shell. None of these work when starting a process directly.
Why would one not want a shell? For example, argument escaping is way more complicated when a shell is involved. When you, e.g., want to print a pipe symbol (no idea why one would need that), then awful.spawn({"echo", "|"}) just works, while with a shell you need to escape the pipe symbol the appropriate number of times. I guess that awful.spawn.with_shell("echo \\\|") would work, but I am not sure, and that is exactly the point.
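The same effect is easy to see at the shell level, independent of Awesome; a hypothetical illustration of how each extra shell layer adds a round of quoting:

$ /bin/echo '|'            # one layer of quoting, consumed by the interactive shell
|
$ sh -c "/bin/echo '|'"    # an extra shell layer needs an extra layer of quoting
|
$ sh -c "/bin/echo |"      # drop the inner quotes and the inner shell sees an unterminated pipe and errors out

awful.spawn({"echo", "|"}) corresponds to passing the argument list directly, with no shell layer at all.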
Also, a shell that does nothing useful is an extra process, and starting it is a tiny bit slower than going without a shell, but this difference is really unimportant.

/usr/xpg4/bin/grep -q [^0-9] does not always work as expected

I have a Unix ksh script that has been in daily use for years (kicked off at night by the crontab). Recently one function in the script has been behaving erratically, which never happened before. I have tried various ways to find out why, but without success.
The function validates an input string, which is supposed to be a string of 10 numeric characters. The function checks if the string length is 10, and whether it contains any non-numeric characters:
#! /bin/ksh
# The function:
is_valid_id () {
    # Takes one argument, which is the ID being tested.
    if [[ $(print ${#1}) -ne 10 ]] || print "$1" | /usr/xpg4/bin/grep -q [^0-9] ; then
        return 1
    else
        return 0
    fi
}
cat $input_file | while read line ; do
    id=$(print $line | awk -F: '{print $5}')
    # Calling the function:
    is_valid_id $id
    stat=$?
    if [[ $stat -eq 1 ]] ; then
        print "The ID $id is invalid. Request rejected.\n" >> $ERRLOG
        continue
    else
        ...
    fi
done
The problem with the function is that, every night, out of scores or hundreds of requests, it flags the IDs in several of them as invalid. I visually inspected the input data and found that all the "invalid" IDs are actually strings of 10 numeric characters, as they should be. The error seems random, because it happens with only some of the requests. However, while the rejected requests persistently come back, it is consistently the same IDs that are picked out as invalid, day after day.
I did the following:
The Unix machine had been running for almost a year and might have needed a refresh, so the system admin rebooted it at my request. But the problem persisted after the reboot.
I manually ran exactly the same two tests in the function, at command prompt, and the IDs that have been found invalid at night are all valid.
I know the same commands may behave differently when invoked manually versus from a script. To see how the function behaves in a script, I ran the above code excerpt as a small script to reproduce the problem. And indeed, some (though not all) of the IDs found invalid at night are also found invalid by the small troubleshooting script.
I then modified that troubleshooting script to run the two tests one at a time, and found it is the /usr/xpg4/bin/grep -q [^0-9] test that erroneously reports some of the IDs as containing non-numeric character(s). Yet the IDs are all numeric characters, at least visually.
I checked whether there is any problem with the xpg4 grep command file (ls -l /usr/xpg4/bin/grep), to see whether it was put there recently. But its timestamp is from 2005 (this machine runs Solaris 10).
The data comes from a central ERP system, into which data is entered from different locations using all kinds of terminal machines running all kinds of operating systems that support various character sets and encodings; the ERP system simply accepts them all. Could characters from other encodings visually appear as numeric characters while their encoded values are not what the /usr/xpg4/bin/grep command expects on our Unix machine? I tried the od (octal dump) command, but it did not help me much, as I am not familiar with it. Maybe I need to learn more about od to solve this problem.
My temporary work-around is omitting the /usr/xpg4/bin/grep -q [^0-9] test. But the problem has not been solved. What can I try next?
Your validity test function is more complicated than it needs to be. E.g., why do you use a command substitution with print for ${#1}? Why not use ${#1} directly? Next, forking grep to test for a non-number is a slow and expensive operation. What about this equivalent function, 100% POSIX and blazingly fast:
is_valid_id () {
    # Takes one argument, which is the ID being tested.
    if test ${#1} -ne 10; then
        return 1 # ID length not exactly 10.
    fi
    case $1 in
        (*[!0-9]*) return 1;; # ID contains a non-digit.
        (*) return 0;; # ID is exactly 10 digits.
    esac
}
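A hypothetical smoke test of the function at the prompt:

is_valid_id 0123456789 && echo valid      # exactly 10 digits: prints "valid"
is_valid_id 01234X6789 || echo invalid    # contains a non-digit: prints "invalid"
is_valid_id 123456789  || echo invalid    # only 9 characters: prints "invalid"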
Or even simpler, if you don't mind repeating yourself:
is_valid_id () {
    # Takes one argument, which is the ID being tested.
    case $1 in
        ([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) # 10 digits.
            return 0;;
        (*)
            return 1;;
    esac
}
This also avoids your unquoted use of a grep pattern, which is error-prone in the presence of one-character file names. Does this work better?
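To make the unquoted-pattern hazard concrete, here is a hypothetical reproduction (nothing in it is specific to the xpg4 grep); it may or may not be what bites your cron job, but it shows the mechanism:

cd "$(mktemp -d)"    # empty scratch directory
touch a; echo a > b  # create two single-character, non-digit file names
print "0123456789" | /usr/xpg4/bin/grep -q [^0-9] && echo rejected
# Pathname expansion turns the unquoted [^0-9] into "a b", so the command
# actually run is: grep -q a b
# grep searches the FILE "b" for the PATTERN "a", ignores stdin entirely,
# succeeds, and the all-digit ID is "rejected".

If such file names exist in the job's working directory at night, but not where you test by hand, you would see exactly this kind of erratic behavior. Quoting the pattern ('[^0-9]') or using the case-based functions above avoids it.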

Renaming multiple files using parameter unix

I have a rename script, rename.sh, below. I want to introduce a variable so that I can pass a date argument when executing the script, like ./rename.sh 20151103, such that 20151103 replaces 20140306 in the script.
for f in *.CDR*; do
    echo mv "$f" "${f/-20140306/-0-20140306}"
done
I'm thinking of automating this, as I don't want to manually edit the script each time I'm doing a rename. Any other method would be highly welcome.
#!/bin/bash
pattern="$1"
for f in *.CDR*; do
    echo mv "$f" "${f/-${pattern}/-0-${pattern}}"
done
Explanation:
The #!-line says we're running this as a bash script.
The script will populate the variables $1, $2 etc. with the arguments handed to it on the command line. These are called the positional parameters ($0 usually holds the name of the script).
We take $1, because we know that should contain the pattern we're replacing, and assign it to the variable $pattern. In much more complex scripts, here is where we would handle command line switches (with getopts, but that's an answer for another day).
We quote $1, just because. (It's good practice to quote user input, just to be sure no shell-globbing characters, such as *, get expanded.)
The rest is the script like you had from before, but with the string 20140306 replaced by ${pattern}. I'm using ${pattern} rather than $pattern here for readability only. In general, you need to use ${a} rather than $a if you, for example, interpolate a string like "${a}nospaceafter".
Then it should just be a matter of making the script executable before testing it:
$ chmod +x rename.sh
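Because the mv is still prefixed with echo, the script only prints what it would do; a hypothetical dry run (file name invented for illustration):

$ ./rename.sh 20151103
mv ABC-20151103.CDR ABC-0-20151103.CDR

Once the printed commands look right, remove the echo to perform the renames for real.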
This is one of the methods you can consider:
#!/bin/bash
input=$1
for f in *.CDR*; do
    echo mv "$f" "${f/-$input/-0-$input}"
done

grep -f maximum number of patterns?

I'd like to use grep on a text file with -f to match a long list (10,000) of patterns. Turns out that grep doesn't like this (who knew?). After a day, it hadn't produced anything. Smaller lists work almost instantaneously.
I was thinking I might split my long list up and do it a few times. Any idea what a good maximum length for the pattern list might be?
Also, I'm rather new to unix. Alternative approaches are welcome. The list of patterns, or search terms, is in a plaintext file, one per line.
Thank you everyone for your guidance.
From comments, it appears that the patterns you are matching are fixed strings. If that is the case, you should definitely use -F. That will increase the speed of the matching considerably. (Using 479,000 strings to match on an input file with 3 lines using -F takes under 1.5 seconds on a moderately powered machine. Not using -F, that same machine is not yet finished after several minutes.)
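A hypothetical invocation, assuming the strings sit one per line in patterns.txt:

grep -F -f patterns.txt data.txt    # -F: treat each pattern as a literal string, not a regex
grep -Fx -f patterns.txt data.txt   # add -x if a pattern must match a whole line

With -F, grep can use a multi-string matching algorithm instead of compiling thousands of regular expressions, which is where the speedup comes from.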
I got the same problem with approx. 4 million patterns to search for in a file with 9 million lines. It seems to be a problem of RAM, so I came up with this neat little workaround, which might be slower than splitting and joining but needs just this one line:
while read line; do grep "$line" fileToSearchIn; done < patternFile
I needed to use the workaround since the -F flag is no solution for files that large...
EDIT: This seems to be really slow for large files. After some more research I found faSomeRecords and other really awesome tools among the Kent NGS-editing-Tools.
I tried it on my own by extracting 2 million FASTA records from a 5.5 million record file. It took approx. 30 seconds.
cheers
EDIT: direct download link
Here is a bash script you can run on your files (or if you would like, a subset of your files). It will split the key file into increasingly large blocks, and for each block attempt the grep operation. The operations are timed - right now I'm timing each grep operation, as well as the total time to process all the sub-expressions.
Output is in seconds - with some effort you can get ms, but with the problem you are having it's unlikely you need that granularity.
Run the script in a terminal window with a command of the form
./timeScript keyFile textFile 100 > outputFile
This will run the script, using keyFile as the file where the search keys are stored, and textFile as the file where you are looking for keys, and 100 as the initial block size. On each loop the block size will be doubled.
In a second terminal, run the command
tail -f outputFile
which will follow the output that the other process writes into outputFile.
I recommend that you open a third terminal window, and that you run top in that window. You will be able to see how much memory and CPU your process is taking - again, if you see vast amounts of memory consumed it will give you a hint that things are not going well.
This should allow you to find out when things start to slow down - which is the answer to your question. I don't think there's a "magic number" - it probably depends on your machine, and in particular on the file size and the amount of memory you have.
You could take the output of the script and put it through a grep:
grep entire outputFile
You will end up with just the summaries - block size, and time taken, e.g.
Time for processing entire file with blocksize 800: 4 seconds
If you plot these numbers against each other (or simply inspect the numbers), you will see when the algorithm is optimal, and when it slows down.
Here is the code: I did not do extensive error checking but it seemed to work for me. Obviously in your ultimate solution you need to do something with the outputs of grep (instead of piping it to wc -l which I did just to see how many lines were matched)...
#!/bin/bash
# script to look at difference in timing
# when grepping a file with a large number of expressions
# assume first argument = name of file with list of expressions
# second argument = name of file to check
# optional third argument = initial block size (default 100)
#
# split f1 into chunks of 1, 2, 4, 8... expressions at a time
# and print out how long it took to process all the lines in f2
if (($# < 2)); then
    echo Warning: need at least two parameters.
    echo Usage: timeScript keyFile searchFile [initial blocksize]
    exit 0
fi
f1_linecount=`cat $1 | wc -l`
echo linecount of file1 is $f1_linecount
f2_linecount=`cat $2 | wc -l`
echo linecount of file2 is $f2_linecount
echo
if (($# < 3)); then
    blockLength=100
else
    blockLength=$3
fi
while (($blockLength < f1_linecount))
do
    echo Using blocks of $blockLength
    # split is a standard command that splits the file:
    # -l tells it to break after $blockLength lines
    # and the block$blockLength parameter is a prefix for the output files
    split -l $blockLength $1 block$blockLength
    Tstart="$(date +%s)"
    Tbefore=$Tstart
    for fn in block*
    do
        echo "grep -f $fn $2 | wc -l"
        echo number of lines matched: `grep -f $fn $2 | wc -l`
        Tnow="$(date +%s)"
        echo Time taken: $(($Tnow - $Tbefore)) s
        Tbefore=$Tnow
    done
    echo Time for processing entire file with blocksize $blockLength: $(($Tnow - $Tstart)) seconds
    blockLength=$((2*$blockLength))
    # remove the split files - no longer needed
    rm block*
    echo block length is now $blockLength and f1 linecount is $f1_linecount
done
exit 0
You could certainly give sed a try to see whether you get a better result, but it is a lot of work to do either way on a file of any size. You didn't provide any details on your problem, but if you have 10k patterns I would be trying to think about whether there is some way to generalize them into a smaller number of regular expressions.
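As a hypothetical example of that generalization: if the 10,000 patterns were serial numbers like ID00001 through ID09999, a single expression would replace them all:

grep -E 'ID[0-9]{5}' textFile

Even collapsing the list into a few hundred such expressions can change the running time dramatically.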
Here is a perl script "match_many.pl" which addresses a very common subset of the "large number of keys vs. large number of records" problem. Keys are accepted one per line from stdin. The two command line parameters are the name of the file to search and the field (white space delimited) which must match a key. This subset of the original problem can be solved quickly since the location of the match (if any) in the record is known ahead of time and the key always corresponds to an entire field in the record. In one typical case it searched 9400265 records with 42899 keys, matching 42401 of the keys and emitting 1831944 records in 41s. The more general case, where the key may appear as a substring in any part of a record, is a more difficult problem that this script does not address. (If keys never include white space and always correspond to an entire word the script could be modified to handle that case by iterating over all fields per record, instead of just testing the one, at the cost of running M times slower, where M is the average field number where the matches are found.)
#!/usr/bin/perl -w
use strict;
use warnings;
my $kcount;
my ($infile,$test_field) = @ARGV;
if(!defined($infile) || "$infile" eq "" || !defined($test_field) || ($test_field <= 0)){
    die "syntax: match_many.pl infile field"
}
my %keys; # hash of keys
$test_field--; # external range (1,N) to internal range (0,N-1)
$kcount=0;
while(<STDIN>) {
    my $line = $_;
    chomp($line);
    $keys{$line} = 1;
    $kcount++
}
print STDERR "keys read: $kcount\n";
my $records = 0;
my $emitted = 0;
open(INFILE, $infile) or die "Could not open $infile";
while(<INFILE>) {
    if(substr($_,0,1) =~ /#/){ # skip comment lines
        next;
    }
    my $line = $_;
    chomp($line);
    $line =~ s/^\s+//;
    my @fields = split(/\s+/, $line);
    if(exists($keys{$fields[$test_field]})){
        print STDOUT "$line\n";
        $emitted++;
        $keys{$fields[$test_field]}++;
    }
    $records++;
}
$kcount=0;
while( my( $key, $value ) = each %keys ){
    if($value > 1){
        $kcount++;
    }
}
close(INFILE);
print STDERR "records read: $records, emitted: $emitted; keys matched: $kcount\n";
exit;
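A hypothetical invocation, matching the interface described above (keys on stdin, file to search and field number as arguments):

./match_many.pl records.txt 2 < keys.txt > matched.txt

Here the second white-space-delimited field of every record is tested against the key set; matching records go to stdout and the summary counts to stderr.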

Unix ksh: loop over a file and put the result into a variable

I have a simple thing to do, but I'm a novice in UNIX.
So, I have a file and on each line I have an ID.
I need to go through the file and put all ID's into one variable.
I've tried something like I would do in Java, but it does not work.
for variable in `cat myFile.txt`
do
    param=`echo "${param} ${variable}"`
done
It does not seem to add all the values into param.
Thanks.
I'd use:
param=$(<myFile.txt)
The parameter will have white space (actually newlines) between the values. When used without quotes, the shell will expand those to spaces, as in:
cat $param
If used with quotes, the values will remain on separate lines, as in:
echo "$param"
Note that the Korn shell special-cases the '$(<file)' notation and does not fork and execute any command.
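A small illustration of the quoting difference, assuming myFile.txt holds one ID per line:

param=$(<myFile.txt)
echo $param      # unquoted: the newlines collapse to spaces, all IDs on one line
echo "$param"    # quoted: the embedded newlines survive, one ID per line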
Also note that your original idea can be made to work more simply:
param=
for variable in `cat myFile.txt`
do
    param="${param} ${variable}"
done
This introduces a blank at the front of the parameter; it seldom matters. Interestingly, you can avoid the blank at the front by having one at the end, using param="${param}${variable} ". This also works without messing things up, though it looks as though it jams things together. Also, the '${var}' notation is not necessary, though it does no harm either.
And, finally for now, it is better to replace the back-tick command with '$(cat myFile.txt)'. The difference becomes crucial when you need to nest commands:
perllib=$(dirname $(dirname $(which perl)))/lib
vs
perllib=`dirname \`dirname \\\`which perl\\\`\``/lib
I know which I prefer to type (and read)!
Try this:
param=`cat myFile.txt | tr '\n' ' '`
The tr command translates all occurrences of \n (new line) to spaces. Then we assign the result to the param variable.
Lovely.
param="$(< myFile.txt)"
or
while read line
do
    param="$param$line"$'\n'
done < myFile.txt
With awk:
var=$(awk '1' ORS=" " file)
Or in plain ksh:
while read -r line
do
    t="$t $line"
done < file
echo $t
