sub and gsub function? - awk

I have this command:
$ find $PWD -name "*.jpg" | awk '{system( "echo " $(sub(/\//, "_")) ) }'
_home/mol/Pulpit/test/1.jpg
Now the same thing, but using gsub:
$ find $PWD -name "*.jpg" | awk '{system( "echo " $(gsub(/\//, "_")) ) }'
mol@mol:~$
I want to get the result:
_home_mol_Pulpit_test_1.jpg
Thank you for your help.
EDIT:
I put 'echo' to test the command:
$ find $PWD -name "*.jpg" | awk '{gsub("/", "_")} {system( "echo " mv $0 " " $0) }'
_home_mol_Pulpit_test_1.jpg _home_pic_Pulpit_test_1.jpg
mol@mol:~$
I want to get the result:
$ find $PWD -name "*.jpg" | awk '{gsub("/", "_")} {system( "echo " mv $0 " " $0) }'
/home/pic/Pulpit/test/1.jpg _home_pic_Pulpit_test_1.jpg

That won't work if the string contains more than one match... try this:
echo "/x/y/z/x" | awk '{ gsub("/", "_") ; system( "echo " $0) }'
or better (if the echo isn't a placeholder for something else):
echo "/x/y/z/x" | awk '{ gsub("/", "_") ; print $0 }'
In your case you want to make a copy of the value before changing it:
echo "/x/y/z/x" | awk '{ c=$0; gsub("/", "_", c) ; system( "echo " $0 " " c )}'
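If the end goal is actually to rename the files, a small shell loop avoids system() and its quoting pitfalls entirely. This is a sketch, not your exact pipeline; the echo keeps it a dry run:

```shell
# Dry run: print an mv command for each .jpg found,
# with "/" replaced by "_" in the target name.
find "$PWD" -name "*.jpg" | while IFS= read -r f; do
    new=$(printf '%s\n' "$f" | awk '{ gsub("/", "_") } 1')
    echo mv "$f" "$new"    # drop the echo to really rename
done
```

Remove the echo once the printed commands look right.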

Related

UNIX shell script reading csv

I have a csv file. I would like to put the fields into different variables. Suppose there are three fields in each line of the csv file. I have this code:
csvfile=test.csv
while read inline; do
var1=`echo $inline | awk -F',' '{print $1}'`
var2=`echo $inline | awk -F',' '{print $2}'`
var3=`echo $inline | awk -F',' '{print $3}'`
.
.
.
done < $csvfile
This code works. However, if a field contains an embedded comma, it does not. Any suggestion? For example:
how,are,you
I,"am, very",good
this,is,"a, line"
This may not be the perfect solution but it will work in your case.
File content:
[cloudera@quickstart Documents]$ cat cd.csv
a,b,c
d,"e,f",g
csvfile=cd.csv
while read inline; do
var1=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $1}' | sed 's/*/,/g'`
var2=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $2}' | sed 's/*/,/g'`
var3=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $3}' | sed 's/*/,/g'`
echo $var1 " " $var2 " " $var3
done< $csvfile
Output :
[cloudera@quickstart Documents]$ sh a.sh
a b c
d e,f g
So basically we first handle the "," inside the quoted data by replacing it with "*", extract the desired column with awk, and then revert "*" back to "," to get the actual field value.
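A variant of the same idea that avoids spawning three awk pipelines per line: hide the quoted commas once, emit the fields tab-separated, and let read split them. This is a sketch assuming bash (for the process substitution) and that the data never contains the \034 placeholder byte:

```shell
csvfile=cd.csv
while IFS=$'\t' read -r var1 var2 var3; do
    echo "$var1" " " "$var2" " " "$var3"
done < <(awk -F'"' -v OFS='' '
    { for (i = 2; i <= NF; i += 2) gsub(",", "\034", $i) }  # hide quoted commas
    { gsub(",", "\t"); gsub("\034", ",") }                   # split on the rest, restore
    1' "$csvfile")
```

One awk process handles the whole file, and no sed round-trip is needed.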

How can I extract all repeated patterns in a line to comma-separated format

I am extracting a pattern of interest from a file. Each line contains repeated occurrences of the pattern, and I want to collect all occurrences in each line in a comma-separated format. For example, each line contains a string like this:
Line1: InterPro:IPR000504 InterPro:IPR003954 InterPro:IPR012677 Pfam:PF00076 PROSITE:PS50102 SMART:SM00360 SMART:SM00361 EMBL:CP002684 Proteomes:UP000006548 GO:GO:0009507 GO:GO:0003723 GO:GO:0000166 Gene3D:3.30.70.330 SUPFAM:SSF54928 eggNOG:KOG0118 eggNOG:COG0724 InterPro:IPR003954
Line2: InterPro:IPR000306 InterPro:IPR002423 InterPro:IPR002498 Pfam:PF00118 Pfam:PF01363 Pfam:PF01504 PROSITE:PS51455 SMART:SM00064 SMART:SM00330 InterPro:IPR013083 Proteomes:UP000006548 GO:GO:0005739 GO:GO:0005524 EMBL:CP002686 GO:GO:0009555 GO:GO:0046872 GO:GO:0005768 GO:GO:0010008 Gene3D:3.30.40.10 InterPro:IPR017455
I want to extract all InterPro IDs from each line, like this:
IPR000504,IPR003954,IPR012677,IPR003954
IPR000306,IPR002423,IPR002498,IPR013083,IPR017455
I have used this script:
while read line; do
NUM=$(echo $line | grep -oP 'InterPro:\K[^ ]+' | wc -l)
if [ $NUM -eq 0 ];then
echo "NA" >> InterPro.txt;
fi;
if [ ! $NUM -eq 0 ];then
echo $line | grep -oP 'InterPro:\K[^ ]+' | tr '\n' ',' >> InterPro.txt;
fi;
done <./File.txt
The problem is that once I run this script, all the matched values from File.txt are printed on one line. I want the matches from each input line printed on a separate line.
Thank you in advance
With awk:
awk '{for (i=1; i<=NF; ++i) {if ($i~/^InterPro:/) {gsub(/InterPro:/, "", $i); x=x","$i}} gsub (/^,/, "", x); print x; x=""}' file
Output:
IPR000504,IPR003954,IPR012677,IPR003954
IPR000306,IPR002423,IPR002498,IPR013083,IPR017455
With indent and more meaningful variable names:
awk '
{
for (column=1; column<=NF; ++column)
{
if ($column~/^InterPro:/)
{
gsub(/InterPro:/, "", $column)
line=line","$column
}
}
gsub (/^,/, "",line)
print line
line=""
}' file
With bash builtin commands:
while IFS= read -r line; do
for column in $line; do
[[ $column =~ ^InterPro:(.*) ]] && new+=",${BASH_REMATCH[1]}"
done
echo "${new#,*}"
unset new
done < file
Finally, I changed the script and got the desired results:
while read line; do
NUM=$(echo $line | grep -oP 'InterPro:\K[^ ]+' | wc -l)
if [ $NUM -eq 0 ];then
echo "NA" >> InterPro.txt;
fi;
if [ ! $NUM -eq 0 ];then
echo $line | grep -oP 'InterPro:\K[^ ]+' | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}' | sed 's/ /,/g' >> InterPro.txt;
fi;
done <./File.txt
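For the record, the per-line joining can also be done with paste -s, which keeps each input line's matches on their own output line and still prints NA when there is no match (GNU grep with -P assumed, as in your script):

```shell
# Extract InterPro IDs from each line and join them with commas;
# lines with no match produce "NA".
while IFS= read -r line; do
    ids=$(printf '%s\n' "$line" | grep -oP 'InterPro:\K[^ ]+' | paste -sd, -)
    echo "${ids:-NA}"
done < ./File.txt >> InterPro.txt
```

This avoids counting the matches first and the sed hold-space juggling.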

How to reverse a string in ksh

Please help me with this problem. I have an array containing 1000 numbers, which are treated as strings, and I want to reverse each of them one by one. My problem is how to do the reversal, because I have to use ksh; with bash or something similar it would be easy. What I have now is below, but
rev="$rev${copy:$y:1}" doesn't work in ksh.
i=0
while [[ $i -lt 999 ]]
do
    rev=""
    var=${xnumbers[$i]}
    copy=${var}
    len=${#copy}
    y=$(expr $len - 1)
    while [[ $y -ge 0 ]]
    do
        rev="$rev${copy:$y:1}"
        echo "y = " $y
        y=$(expr $y - 1)
    done
    echo "i = " $i
    echo "rev = " $rev
    #xnumbers[$i]=$(expr $xnumbers[$i] "|" $rev)
    echo "xum = " ${xnumbers[$i]}
    echo "##############################################"
    i=$(expr $i + 1)
done
I am not sure why we cannot use the rev utility.
$ echo 798|rev
897
You can also try:
$ echo 798 | awk '{ for(i=length;i!=0;i--)x=x substr($0,i,1);}END{print x}'
897
If you can print the contents of the array to a file, you can then process the file with this awk one-liner.
awk '{s1=split($0,A,""); line=""; for (i=s1;i>0;i--) line=line A[i];print line}' file
Check this:
other_var=`echo ${xnumbers[$i]} | awk '{s1=split($0,A,""); line=""; for (i=s1;i>0;i--) line=line A[i];print line}'`
I have tested this on Ubuntu with ksh, same results:
number="789"
other_var=`echo $number | awk '{s1=split($0,A,""); line=""; for (i=s1;i>0;i--) line=line A[i];print line}'`
echo $other_var
987
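Putting that one-liner into a function, the 1000-element loop from the question collapses to something like this. A sketch only: the array syntax shown is ksh93/bash, and the sample values are placeholders for your data:

```shell
# Reverse a string one character at a time with awk.
reverse() {
    printf '%s\n' "$1" |
        awk '{ for (i = length; i != 0; i--) x = x substr($0, i, 1) } END { print x }'
}

xnumbers=(123 456 789)    # placeholder data
i=0
while [[ $i -lt ${#xnumbers[@]} ]]; do
    xnumbers[i]=$(reverse "${xnumbers[$i]}")
    i=$((i + 1))
done
echo "${xnumbers[@]}"     # prints: 321 654 987
```

Note this treats the values purely as strings, so leading zeros produced by the reversal are kept.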
You could use cut, paste and rev together, just change printf to cat file.txt:
paste -d' ' <(printf "%s data\n" {1..100} | cut -d' ' -f1) <(printf "%s data\n" {1..100} | cut -d' ' -f2 |rev)
Or rev alone, if it's not a numbered file, as clarified by the OP.

How to run multiple awk commands

I would like to run multiple awk commands in a single script.
For example, Master.csv.gz is located at /cygdrive/e/Test/Master.csv.gz, and the input files are located in different subdirectories such as /cygdrive/f/Jan/Input_Jan.csv.gz, /cygdrive/f/Feb/Input_Feb.csv.gz, and so on.
All input files have the .gz extension.
The commands below work fine when executed one at a time:
Command#1
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
Output#1:
Name,Age,Location
abc,20,xxx
Command#2
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Output#2:
Name,Age,Location
def,40,yyy
cat Output.txt
Name,Age,Location
abc,20,xxx
def,40,yyy
I have tried running the commands below via a single script, but got errors:
Attempt#1: awk -f Test.awk
cat Test.awk
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Error : Attempt#1: awk -f Test.awk
awk: Test.awk:1: ^ invalid char ''' in expression
awk: Test.awk:1: ^ syntax error
Attempt#2: sh Test.sh
cat Test.sh
#!/bin/sh
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Error : Attempt#2: sh Test.sh
Test.sh: line 2: syntax error near unexpected token `('
Desired Output:
Name,Age,Location
abc,20,xxx
def,40,yyy
Looking for your suggestions.
Update#2-Month Name
Ed Morton, thanks for the input. However, the output order is not right: "Jan2014" is printed on the next line. Please suggest.
cat Output.txt:
Name,Age,Location
abc,20,xxx
Jan2014
def,40,yyy
Feb2014
Expected Output
Name,Age,Location
abc,20,xxx,Jan2014
def,40,yyy,Feb2014
All you need is:
#!/bin/bash
awk -F, 'FNR==NR{a[$2]; next} $2 in a' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
<(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) \
<(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) \
>> Output.txt
If you want to print the month name too then the simplest thing would be:
#!/bin/bash
awk -F, 'FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
mth="Jan" <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) \
mth="Feb" <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) \
>> Output.txt
but you could remove the redundant specifying of the month name 3 times on each line with:
#!/bin/bash
mths=(Jan Feb)
awk -F, 'FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
mth="${mths[$((i++))]}" <(gzip -dc "/cygdrive/f/${mths[$i]}/Input_${mths[$i]}.csv.gz") \
mth="${mths[$((i++))]}" <(gzip -dc "/cygdrive/f/${mths[$i]}/Input_${mths[$i]}.csv.gz") \
>> Output.txt
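If more months keep getting added, a plain shell loop may read more clearly than indexing the array inline. This sketch assumes the directory layout from the question; the master file is decompressed once per month, which is fine for a handful of months:

```shell
#!/bin/bash
# Tag each matching input row with its month and append to Output.txt.
for mth in Jan Feb; do
    awk -F, -v m="$mth" 'FNR==NR { a[$2]; next } $2 in a { print $0 "," m }' \
        <(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
        <(gzip -dc "/cygdrive/f/$mth/Input_$mth.csv.gz")
done >> Output.txt
```

Adding a month is then a one-word change to the for list.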
Your first attempt failed because you were trying to call awk in an awk script, and your second attempt failed because the bash process substitution, <(...), is not defined by POSIX, and is not guaranteed to work with /bin/sh. Here is an awk script that should work.
#!/usr/bin/awk -f
BEGIN {
if (ARGC < 3) exit 1;
ct = "cat ";
gz = "gzip -dc "
f = "\"" ARGV[1] "\"";
c = (f~/\.gz$/)?gz:ct;
while ((c f | getline t) > 0) {
split(t, a, ",");
A[a[2]] = t;
}
close(c f);
for (n = 2; n < ARGC; n++) {
f = "\"" ARGV[n] "\"";
c = (f~/\.gz$/)?gz:ct;
while ((c f | getline t) > 0) {
split(t, a, ",");
if (a[2] in A) print t;
}
close(c f);
}
exit;
}
Usage (make the script executable first with chmod +x):
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Jan/Input_Jan.csv.gz
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Feb/Input_Feb.csv.gz
or
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Jan/Input_Jan.csv.gz \
    /cygdrive/f/Feb/Input_Feb.csv.gz

AWK Include Whitespaces in Command

I have this string: "./Delivery Note.doc 1", where:
$1 = ./Delivery
$2 = Note.doc
$3 = 1
I need to execute the sum command concatenating $1 and $2 while keeping the white space (./Delivery Note.doc). I tried this, but it trims the whitespace:
| '{ command="sum -r "$1 $2"
Result: ./DeliveryNote.doc
To execute the sum command:
echo "./Delivery Note.doc 1" | awk '{ command="sum -r \""$1" "$2"\""; print command}' | bash
$ echo "./Delivery Note.doc 1" | awk '{ command="sum -r "$1" "$2; print command}'
sum -r ./Delivery Note.doc
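If the name can contain any number of spaces, not just one, a variant that strips only the trailing field is safer. A sketch, assuming the final field is always the count, as in your sample:

```shell
echo "./Delivery Note.doc 1" |
    awk '{ name = $0
           sub(/ [^ ]*$/, "", name)          # drop the trailing count field
           printf "sum -r \"%s\"\n", name }'
# prints: sum -r "./Delivery Note.doc"
```

The quoting in the printf keeps the name intact when the command is later piped to a shell.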