Find all words starting with a fixed string in a file? - unix

How can I find all the words in my CSV file that start with $?
My file is like:
Test1,$Var1,$varCab1,$Vargab1,Comment1
Test2,$Var2,$varCab2,$Vargab2,Comment2
Test3,$Var3,$varCab3,$Vargab3,Comment3
As an output I want
$Var1
$varCab1
$Vargab1
$Var2
$varCab2
$Vargab2
$Var3
$varCab3
$Vargab3

Try the following (grep -oE '\$\w+' filename):
$ cat 1.csv
Test1,$Var1,$varCab1,$Vargab1,Comment1
Test2,$Var2,$varCab2,$Vargab2,Comment2
Test3,$Var3,$varCab3,$Vargab3,Comment3
$ grep -oE '\$\w+' 1.csv
$Var1
$varCab1
$Vargab1
$Var2
$varCab2
$Vargab2
$Var3
$varCab3
$Vargab3
Using awk:
$ awk -F, '{ for(i=1;i<=NF;i++) if ($i ~ /^\$/) print $i; }' 1.csv
$Var1
$varCab1
$Vargab1
$Var2
$varCab2
$Vargab2
$Var3
$varCab3
$Vargab3

Use tr and grep:
$ tr ',' '\n' < inputfile | grep "^[$]"
$Var1
$varCab1
$Vargab1
$Var2
$varCab2
$Vargab2
$Var3
$varCab3
$Vargab3

Using perl:
perl -ne 'for (m/\$\w+/g) { print $_, "\n" }' < inputfile
Or even shorter:
perl -ne 'print map("$_\n", m/\$\w+/g)' < inputfile
Explanation:
The regular expression \$\w+ matches a $ followed by one or more word characters.
The m//g expression returns a list of matches.
perl -ne runs the expression for each line of the input, placing the line in $_, which is then used by the m//g expression.
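The same pattern can be tried out from the shell without perl. This is only a sketch: the function name dollar_words is made up for illustration, and [[:alnum:]_] is used in place of \w so the pattern does not depend on the GNU extension. The -o option is not in POSIX, but GNU and BSD grep both provide it.

```shell
# Print every "$word" token found in the given files (or in stdin if none).
dollar_words() {
    grep -oE '\$[[:alnum:]_]+' "$@"
}

# Feed one sample line through the function.
printf '%s\n' 'Test1,$Var1,$varCab1,$Vargab1,Comment1' | dollar_words
```

Because the pattern is not anchored to a field boundary, it also picks up a $word that appears in the middle of a field.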

Related

How to coerce AWK to evaluate string as math expression?

Is there a way to evaluate a string as a math expression in awk?
balter@spectre3:~$ echo "sin(0.3) 0.3" | awk '{print $1,sin($2)}'
sin(0.3) 0.29552
I would like to know a way to also have the first input evaluated to 0.29552.
You can just create your own eval function which calls awk again to execute whatever command you want it to:
$ cat tst.awk
{ print eval($1), sin($2) }
function eval(str, cmd,line,ret) {
cmd = "awk \047BEGIN{print " str "; exit}\047"
if ( (cmd | getline line) > 0 ) {
ret = line
}
close(cmd)
return ret
}
$ echo 'sin(0.3) 0.3' | awk -f tst.awk
0.29552 0.29552
$ echo '4*7 0.3' | awk -f tst.awk
28 0.29552
$ echo 'tolower("FOO") 0.3' | awk -f tst.awk
foo 0.29552
awk lacks an eval(...) function. This means that you cannot do string to code translation based on input after the awk program initializes. Ok, perhaps it could be done, but not without writing your own parsing and evaluation engine in awk.
I would recommend using bc for this effort, like
[edwbuck@phoenix ~]$ echo "s(0.3)" | bc -l
.29552020666133957510
Note that this would require sin to be shortened to s as that's the bc sine operation.
Here's a simple one liner!
math(){ awk "BEGIN{printf $1}"; }
Examples of use:
math 1+1
Yields "2"
math 'sqrt(25)'
Yields "5"
x=100; y=5; math "sqrt($x) + $y"
Yields "15"
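A possible variation on the one-liner, offered as a sketch rather than a fix: using print instead of printf adds the trailing newline and avoids treating the result as a format string. As with the original, the argument is executed as awk code, so this assumes the input is trusted.

```shell
# Evaluate an arithmetic expression by pasting it into an awk BEGIN block.
# Assumption: the arguments are trusted; they are run as awk source code.
math() {
    awk "BEGIN { print ($*) }"
}

math '1+1'          # prints 2
math 'sqrt(25)'     # prints 5
x=100; y=5
math "sqrt($x) + $y"   # prints 15
```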
With gawk version 4.1.2:
echo "sin(0.3) 0.3" | awk '{split($1,a,/[()]/);f=a[1];print @f(a[2]),sin($2)}'
It works with tolower(FOO) too.
You can try Perl, as it has an eval() function.
$ echo "sin(0.3)" | perl -ne ' print eval '
0.29552020666134
$
For the given input,
$ echo "sin(0.3) 0.3" | perl -ne ' /(\S+)\s+(\S+)/ and print eval($1), " ", $2 '
0.29552020666134 0.3
$

How can I extract all repeated pattern in a line to comma separated format

I am extracting a pattern of interest from a file. Each line contains the pattern several times, and I want to collect all occurrences on each line in comma-separated format. For example, each line contains a string like this:
Line1: InterPro:IPR000504 InterPro:IPR003954 InterPro:IPR012677 Pfam:PF00076 PROSITE:PS50102 SMART:SM00360 SMART:SM00361 EMBL:CP002684 Proteomes:UP000006548 GO:GO:0009507 GO:GO:0003723 GO:GO:0000166 Gene3D:3.30.70.330 SUPFAM:SSF54928 eggNOG:KOG0118 eggNOG:COG0724 InterPro:IPR003954
Line2: InterPro:IPR000306 InterPro:IPR002423 InterPro:IPR002498 Pfam:PF00118 Pfam:PF01363 Pfam:PF01504 PROSITE:PS51455 SMART:SM00064 SMART:SM00330 InterPro:IPR013083 Proteomes:UP000006548 GO:GO:0005739 GO:GO:0005524 EMBL:CP002686 GO:GO:0009555 GO:GO:0046872 GO:GO:0005768 GO:GO:0010008 Gene3D:3.30.40.10 InterPro:IPR017455
I want to extract all InterPro IDs for each line, like this:
IPR000504,IPR003954,IPR012677,IPR003954
IPR000306,IPR002423,IPR002498,IPR013083,IPR017455
I have used this script:
while read line; do
NUM=$(echo $line | grep -oP 'InterPro:\K[^ ]+' | wc -l)
if [ $NUM -eq 0 ];then
echo "NA" >> InterPro.txt;
fi;
if [ ! $NUM -eq 0 ];then
echo $line | grep -oP 'InterPro:\K[^ ]+' | tr '\n' ',' >> InterPro.txt;
fi;
done <./File.txt
The problem is that when I run this script, all the matched values from File.txt are printed on one line. I want the matched values of each input line printed on a separate line.
Thank you in advance
With awk:
awk '{for (i=1; i<=NF; ++i) {if ($i~/^InterPro:/) {gsub(/InterPro:/, "", $i); x=x","$i}} gsub (/^,/, "", x); print x; x=""}' file
Output:
IPR000504,IPR003954,IPR012677,IPR003954
IPR000306,IPR002423,IPR002498,IPR013083,IPR017455
With indentation and more meaningful variable names:
awk '
{
  for (column=1; column<=NF; ++column)
  {
    if ($column~/^InterPro:/)
    {
      gsub(/InterPro:/, "", $column)
      line=line","$column
    }
  }
  gsub (/^,/, "",line)
  print line
  line=""
}' file
With bash builtin commands:
while IFS= read -r line; do
for column in $line; do
[[ $column =~ ^InterPro:(.*) ]] && new+=",${BASH_REMATCH[1]}"
done
echo "${new#,*}"
unset new
done < file
Finally, I changed the script and got the desired results:
while read line; do
NUM=$(echo $line | grep -oP 'InterPro:\K[^ ]+' | wc -l)
if [ $NUM -eq 0 ];then
echo "NA" >> InterPro.txt;
fi;
if [ ! $NUM -eq 0 ];then
echo $line | grep -oP 'InterPro:\K[^ ]+' | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}' | sed 's/ /,/g' >> InterPro.txt;
fi;
done <./File.txt
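The awk answers above print an empty line when a line has no InterPro entry, while the original script writes NA. A sketch that combines the two behaviours in a single awk pass might look like this; the function name extract_interpro is hypothetical.

```shell
# For each input line, print the InterPro IDs joined by commas,
# or "NA" when the line contains no InterPro: field.
extract_interpro() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^InterPro:/) {
                id = $i
                sub(/^InterPro:/, "", id)            # strip the prefix
                out = (out == "" ? id : out "," id)  # join with commas
            }
        }
        print (out == "" ? "NA" : out)
    }' "$@"
}

printf '%s\n' 'InterPro:IPR000504 Pfam:PF00076 InterPro:IPR003954' 'Pfam:PF00118' |
    extract_interpro
# prints:
# IPR000504,IPR003954
# NA
```

It can be run as extract_interpro File.txt > InterPro.txt, which avoids starting grep twice per line.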

Remove duplicated string stored in variable

I have a variable $var with this content:
var=word1,word2,word3,word1,word3
I need to delete the duplicate words and store the result in the same variable $var.
Try
var="word1,word2,word3,word1,word3"
list=$(echo $var | tr "," "\n")
var=($(printf "%s\n" "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
echo "${var[@]}"
If open to perl then:
$ var="word1,word2,word3,word1,word3"
$ var=$(perl -F, -lane'{$h{$_}++ or push @a, $_ for @F; print join ",", @a}' <<< "$var")
$ echo "$var"
word1,word2,word3
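If perl is not available, roughly the same result can be had with tr, awk and paste. This is a sketch only: the helper name dedupe_csv is made up, and it assumes the words contain no embedded newlines. It keeps the first occurrence of each word in its original position and returns a comma-separated string.

```shell
# Remove duplicate items from a comma-separated string, keeping first occurrences.
dedupe_csv() {
    printf '%s\n' "$1" |
        tr ',' '\n' |           # one word per line
        awk '!seen[$0]++' |     # print a word only the first time it is seen
        paste -sd, -            # join the lines back with commas
}

var=word1,word2,word3,word1,word3
var=$(dedupe_csv "$var")
echo "$var"    # prints word1,word2,word3
```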

How to reverse a string in ksh

Please help me with this problem. I have an array of 1000 lines, each holding a number that is treated as a string, and I want to reverse each of them one by one. My problem is how to reverse them, because I have to use ksh; with bash or something else it would be easy. This is what I have now, but
rev="$rev${copy:$y:1}" doesn't work in ksh.
i=0
while [[ $i -lt 999 ]]
do
rev=""
var=${xnumbers[$i]}
copy=${var}
len=${#copy}
y=$(expr $len - 1)
while [[ $y -ge 0 ]]
do
rev="$rev${copy:$y:1}"
echo "y = " $y
y=$(expr $y - 1)
done
echo "i = " $i
echo "rev = " $rev
#xnumbers[$i]=$(expr $xnumbers[$i] "|" $rev)
echo "xum = " ${xnumbers[$i]}
echo "##############################################"
i=$(expr $i + 1)
done
I am not sure why we cannot use the rev utility.
$ echo 798|rev
897
You can also try:
$ echo 798 | awk '{ for(i=length;i!=0;i--)x=x substr($0,i,1);}END{print x}'
897
If you can print the contents of the array to a file, you can then process the file with this awk one-liner.
awk '{s1=split($0,A,""); line=""; for (i=s1;i>0;i--) line=line A[i];print line}' file
Check this!!
other_var=`echo ${xnumbers[$i]} | awk '{s1=split($0,A,""); line=""; for (i=s1;i>0;i--) line=line A[i];print line}'`
I have tested this on Ubuntu with ksh, same results:
number="789"
other_var=`echo $number | awk '{s1=split($0,A,""); line=""; for (i=s1;i>0;i--) line=line A[i];print line}'`
echo $other_var
987
You could use cut, paste and rev together; just change printf to cat file.txt:
paste -d' ' <(printf "%s data\n" {1..100} | cut -d' ' -f1) <(printf "%s data\n" {1..100} | cut -d' ' -f2 |rev)
Or rev alone if it's not a numbered file, as clarified by the OP.
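If external tools are to be avoided entirely, the reversal can also be sketched with nothing but POSIX parameter expansion, which ksh supports, so the ${copy:$y:1} substring syntax is not needed. The function name reverse_string is hypothetical, and the sketch assumes single-byte characters such as digits.

```shell
# Reverse a string using only POSIX parameter expansion.
reverse_string() {
    str=$1
    rev=
    while [ -n "$str" ]; do
        rest=${str%?}             # everything except the last character
        rev=$rev${str#"$rest"}    # append the last character to the result
        str=$rest                 # continue with the shortened string
    done
    printf '%s\n' "$rev"
}

reverse_string 798    # prints 897
```

Inside the original loop this could be used as rev=$(reverse_string "${xnumbers[$i]}").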

If the last column is equal "R" then... Is it possible? In unix

I need to find the last column from a variable that contains some fields. I need to write something like:
if [ #the last column = "R" ];
then
value=`echo "'$value'"`
fi
Is it possible?
With awk you can try:
awk '$NF=="R"' <<< "$var"
Test:
$ var="this is a var with last as R"
$ awk '$NF=="R"' <<< "$var"
this is a var with last as R
$ var1="This should not be printed"
$ awk '$NF=="R"' <<< "$var1"
$
The condition can be:
if [[ $value == *' 'R ]]
then
echo $value
fi
No need for an external language, like awk.
Using the =~ binary operator:
$ var="Some arbitrary string ending in R"
$ unset value
$ [[ "$var" =~ $'R$' ]] && value=${var}
$ echo $value
Some arbitrary string ending in R
$ var="Some arbitrary string ending in Q"
$ unset value
$ [[ "$var" =~ $'R$' ]] && value=${var}
$ echo $value
More universal code assuming separation by spaces:
case $var in
(*\ R) printf "%s\n" "$var"
esac
Or:
if [ "${var##* }" = R ]; then
printf "%s\n" "$var"
fi
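The checks above can be wrapped in a small helper. This is a sketch under the assumption that fields are separated by whitespace; the name last_field_is is invented for illustration. Unlike the *' 'R pattern, it also accepts a string that consists of the single field R, and unlike a test for a trailing R character, it rejects a last field such as BAR.

```shell
# Succeed if the last whitespace-separated field of $1 equals $2.
# The body runs in a subshell so that set -f and the variables do not leak out.
last_field_is() (
    want=$2
    last=
    set -f                           # no filename expansion while splitting
    set -- $1                        # split the string into fields
    for last in "$@"; do :; done     # after the loop, last holds the final field
    [ "$last" = "$want" ]
)

var="this is a var with last as R"
if last_field_is "$var" R; then
    printf '%s\n' "$var"
fi
```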
