Single quotes in awk's system - unix

I am trying to run bioawk (an extension of awk for fasta files) from awk's system functionality:
awk -v var=$i '{system("~/bin/bioawk-master/bioawk -c fastx '\''{if ($name==\""var"\"){print \">\"$name\"\\\\n\"$seq}}'\'' ../../prokka/"$2"/"$1"/"$1".ffn")}'
The result prints the literal "\n" between the values of $name and $seq instead of the intended newline.
What it prints:
NAME\nSEQUENCE
What I would like it to print:
NAME
SEQUENCE
When I print the bioawk command that I want to run with:
awk -v var=$i '{system("echo ~/bin/bioawk-master/bioawk -c fastx '\''{if ($name==\""var"\"){print \">\"$name\"\\\\n\"$seq}}'\'' ../../prokka/"$2"/"$1"/"$1".ffn")}'
I get:
~/bin/bioawk-master/bioawk -c fastx {if ($name=="CANHHJNM_03494"){print ">"$name"\n"$seq}} ../../prokka/p190631-dr-tm-dc-sp-pi/EP41/EP41.ffn
I can see that it is missing the single quotes surrounding the brackets. I thought having '\'' would solve this issue, but obviously it doesn't. Any help with this problem would be much appreciated.

Not sure this will solve your problem, but the (second) easiest way to handle single quotes in an awk script is to define one externally as a variable:
$ awk -v q="'" 'BEGIN{print q "single_quoted" q}'
'single_quoted'
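Not verified against bioawk, but applied to your command the same idea would look roughly like this: build the bioawk invocation in a string, with q supplying the single quotes around the inner program, and pass your loop variable with -v as before (paths are yours, unchanged):
awk -v q="'" -v var="$i" '{
  # q inserts the literal single quotes, so the shell started by system() does not expand $name or $seq;
  # \\n becomes \n in the command string, which bioawk then prints as a real newline
  cmd = "~/bin/bioawk-master/bioawk -c fastx " q "{if ($name==\"" var "\"){print \">\"$name\"\\n\"$seq}}" q " ../../prokka/" $2 "/" $1 "/" $1 ".ffn"
  system(cmd)
}'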

Related

Unix command to replace first column of a .csv file

I want a unix command (that I will call in a ControlM job) that changes the value of the first column of my .csv file (not the header line) to the date of the previous day (expected format: YYYY-MM-DD).
I tried many commands but none of them does what I want:
tmp=$(mktemp) && awk -F\| -v val=`date -d yesterday +%F` 'NR>1 {gsub($1,val)}' file.csv > "$tmp" && mv "$tmp" file.csv
or :
awk -F\| -v val=`date -d yesterday +%F` '{gsub($1, val)}1' file.csv
I even tried gensub but it's not working.
Example of what I want :
Input :
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-05;2017-11-15;BRIDGE;HELLO
2019-03-05;2018-03-17;WORK;DATA
Output I want (as today is 2019-03-07):
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-06;2017-11-15;BRIDGE;HELLO
2019-03-06;2018-03-17;WORK;DATA
Can you help please and give me examples of commands that should work, I'm not finding a solution.
Thanks a lot
Could you please try the following first? (It does not save the output into file.csv itself; it prints to the terminal. Once you're happy with it, use the command provided at the end of this post.)
awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv
Problems identified in OP's code (and fixed in my suggestion):
1- Use of backticks is deprecated now for capturing a command's output, so use val=$(date ...) instead to set awk's variable named val.
2- With -F you set your field separator to \| (a pipe), but your sample Input_file is clearly delimited with ; (semicolon), NOT |, so the separator never matches, and that is one reason the change is not reflected in the output.
3- gsub($1,val) replaces the whole line with only the value of variable val, because the syntax of gsub is gsub(regex_or_value_to_be_replaced, new_value, target_line_or_variable). Since the wrong field separator was defined, the whole line is treated as $1, so awk -F\| -v val=$(date -d yesterday +%F) 'NR>1 {gsub($1,val)} 1' file.csv prints only the previous date on the data lines (a short illustration follows at the end of this answer).
4- The 4th and main issue is that you never print anything, so whatever other mistakes were made, you will not see any output either on the terminal or in an output file.
If happy with that, you can then run your own command to make the changes in the Input_file itself. (I am assuming that you have a proper value in your tmp variable here.)
tmp=$(mktemp) && awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv > "$tmp" && mv "$tmp" file.csv
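As a short illustration of point 3, here is what gsub($1,val) does when the field separator never matches, using one of your sample rows (my own throwaway test):
$ echo '2019-03-05;2017-11-15;BRIDGE;HELLO' | awk -F'|' -v val='2019-03-06' '{gsub($1,val)} 1'
2019-03-06
Because | never occurs in the line, $1 is the whole record, so gsub replaces the entire line with val.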

unix combine grep w and v command

I want to search a file and include the text #!/bin/bash, but exclude any other line that has a # sign. These two commands: grep -w '#!/bin/bash' file and grep -v '^#' file each do one part of this job. I would like this to be a single command, so here's what I've tried.
grep -w '#!/bin/bash' | grep -v '^#' file
This excludes lines beginning with #, but doesn't include the line #!/bin/bash
grep -w '#!/bin/bash' -v '^#' file
This just prints every line but #!/bin/bash
grep "^[^#]\|^#\!/bin/bash$" test.sh
Explanation:
^[^#] means the line starts with something other than #
\| is an OR (alternation)
^#\!/bin/bash$ is the exact line #!/bin/bash
So .. it looks as if you're trying to strip comments from bash files without removing their shebang.
The grep command can search for regular expressions, but isn't so good at applying rules of logic. You could do something like this:
grep -v '^#[^!]' input.sh
But you'd fail to strip comments that are affixed to the ends of lines. Note that I'm being a little more liberal with this regex, since it's entirely possible that a script might use something other than /bin/bash for its shebang. :-)
Another possibility would be to use awk. This lets you apply logic that cannot be expressed within a regular expression. For example, if you want to keep the commented line only if it is a shebang on the first line of the file, and remove all other comments, awk can express that as follows:
awk '
NR==1 && /^#!/; # if we are on the first line and find a shebang, print it.
/^#/ { next }   # if this is a comment line, skip it.
1               # print everything else.
' input.sh
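For example, with a hypothetical input.sh like the one below, only the first-line shebang survives and every other comment line is dropped:
$ cat input.sh
#!/bin/bash
# remove this comment
echo "hello"
$ awk 'NR==1 && /^#!/; /^#/ { next } 1' input.sh
#!/bin/bash
echo "hello"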

awk getline not accepting external variable from a file

I have a file test.sh from which I am executing the following awk command.
awk -f x.awk < result/output.txt >>difference.txt
x.awk
while (getline < result/$bld/$DeviceType)
The variables DeviceType and bld are available in test.sh; I have declared them as exported:
export DeviceType=$line
Even then, while executing the test.sh file, the script stops at the following line
awk -f x.awk < result/output.txt >>difference.txt
and I am getting
awk: x.awk:4: (FILENAME=- FNR=116) fatal: division by zero attempted
error.
The awk script is read by awk, not touched by the shell. Inside an awk script, $bld means 'the field designated by the number in the variable bld' (that's the awk variable bld, which is unset here). That is also where your error comes from: awk parses result/$bld/$DeviceType as two divisions, and dividing by a field whose numeric value is zero gives "division by zero attempted".
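A quick throwaway example of the difference (my own data, not yours): bld set with -v is an awk variable, and $bld is then a reference to that field number:
$ echo 'a b c' | awk -v bld=2 '{ print $bld }'
b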
You can set awk variables on the command line (officially with the -v option):
awk -v bld="$bld" -v dev="$DeviceType" -f x.awk < result/output.txt >> difference.txt
Whether that does what you want is still debatable. Most likely you need x.awk to contain something like:
BEGIN { file = sprintf("result/%s/%s", bld, dev); }
{ while ((getline < file) > 0) print }
awk is not shell just like C is not shell. You should not expect to be able to access shell variables within an awk program any more than you can access shell variables within a C program.
To pass the VALUE of shell variables to an awk script, see http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details but essentially:
awk -v awkvar="$shellvar" '{ ... use awkvar ...}'
is usually the right approach.
Having said that, whatever you're trying to do, this looks like the wrong approach. If you are considering using getline, make sure to read http://awk.freeshell.org/AllAboutGetline first and understand all of its caveats, but if you tell us what it is you're trying to do, with sample input and expected output, we can almost certainly help you come up with a better approach that has nothing to do with getline.

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133   1239
1290fsdsf   3234
From this, I need to extract
1239
3234
The delimiter for all records will always be 3 blanks.
I need to do this in a unix script (.scr) and write the output to another file or use it as input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above is not working; it gives me syntax errors near awk -F.
I tried writing the output to a file. The following worked in command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This is working and writing the records to output.txt on the command line. But the same command does not work in the unix script (it is a .scr file).
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
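For instance, on the two sample rows from the question (with the three-blank delimiter written out), the pipeline gives exactly the wanted values:
$ printf '2U2133   1239\n1290fsdsf   3234\n' | tr -s ' ' | cut -d ' ' -f 2
1239
3234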
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done
The only residual issue is whether the while loop runs in a sub-shell and therefore does not modify your main shell script's variables, only its own copies of those variables.
With bash, you can use process substitution:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
In this particular case, you can use the following line
sed 's/   /\t/g' <file_name> | cut -f 2
to get your second column.
In bash you can start from something like this:
for n in $(cat "${Directory_path}/test_file.txt" | cut -d " " -f 4)
do
    grep -c "$n" "${Directory_path}"/file*.txt
done
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo "Directoty path" (last line of your script).
Cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F'   ' -lane 'print $F[1]'
Instead of the triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.
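Assuming the triple-space delimiter from the question, a quick check of that one-liner on the sample rows (my own run):
$ printf '2U2133   1239\n1290fsdsf   3234\n' | perl -F'   ' -lane 'print $F[1]'
1239
3234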

UNIX: Replace Newline w/ Colon, Preserving Newline Before EOF

I have a text file ("INPUT.txt") of the format:
A<LF>
B<LF>
C<LF>
D<LF>
X<LF>
Y<LF>
Z<LF>
<EOF>
which I need to reformat to:
A:B:C:D:X:Y:Z<LF>
<EOF>
I know you can do this with 'sed'. There's a billion google hits for doing this with 'sed'. But I'm trying to emphasis readability, simplicity, and using the correct tool for the correct job. 'sed' is a line editor that consumes and hides newlines. Probably not the right tool for this job!
I think the correct tool for this job would be 'tr'. I can replace all the newlines with colons with the command:
cat INPUT.txt | tr '\n' ':'
There's 99% of my work done. I have a problem now, though. By replacing all the newlines with colons, I not only get an extraneous colon at the end of the sequence, but I also lose the newline at the end of the input. It looks like this:
A:B:C:D:X:Y:Z:<EOF>
Now, I need to remove the colon from the end of the input. However, if I attempt to pass this processed input through 'sed' to remove the final colon (which would now, I think, be a proper use of 'sed'), I find myself with a second problem. The input is no longer terminated by a newline at all! 'sed' fails outright, for all commands, because it never finds the end of the first line of input!
It seems like appending a newline to the end of some input is a very, very common task, and considering I myself was just sorely tempted to write a program to do it in C (which would take about eight lines of code), I can't imagine there's not already a very simple way to do this with the tools already available to you in the Linux kernel.
This should do the job (cat and echo are unnecessary):
tr '\n' ':' < INPUT.TXT | sed 's/:$/\n/'
Using only sed:
sed -n ':a; $ ! {N;ba}; s/\n/:/g;p' INPUT.TXT
Bash without any externals:
string=($(<INPUT.TXT))
string=${string[@]/%/:}
string=${string//: /:}
string=${string%*:}
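With the sample INPUT.TXT, echoing the result should give (my own check of those expansions):
$ echo "$string"
A:B:C:D:X:Y:Z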
Using a loop in sh:
colon=''
while read -r line
do
string=$string$colon$line
colon=':'
done < INPUT.TXT
Using AWK:
awk '{a=a colon $0; colon=":"} END {print a}' INPUT.TXT
Or:
awk '{printf colon $0; colon=":"} END {printf "\n" }' INPUT.TXT
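A quick sanity check of the first awk version, piping the sample lines in (my own run):
$ printf 'A\nB\nC\nD\nX\nY\nZ\n' | awk '{a=a colon $0; colon=":"} END {print a}'
A:B:C:D:X:Y:Z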
Edit:
Here's another way in pure Bash:
string=($(<INPUT.TXT))
saveIFS=$IFS
IFS=':'
newstring="${string[*]}"
IFS=$saveIFS
Edit 2:
Here's yet another way which does use echo:
echo "$(tr '\n' ':' < INPUT.TXT | head -c -1)"
Old question, but
paste -sd: INPUT.txt
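For the sample input this prints exactly the requested output, trailing newline included (quick check; -s and -d are required of paste by POSIX, so it should be portable):
$ printf 'A\nB\nC\nD\nX\nY\nZ\n' | paste -sd: -
A:B:C:D:X:Y:Z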
Here's yet another solution (assumes a character set where ':' is octal 72, e.g. ASCII):
perl -l72 -pe '$\="\n" if eof' INPUT.TXT
