I've got stuck with a (maybe very easy) function in awk: I'm trying to divide two fields row by row with the following code:
awk 'BEGIN{FS=OFS="\t"} $43 > 0 && $31 > 0 {$43/$31; print}' file.tsv
But I keep getting this error: fatal: division by zero attempted. I've already checked that the denominator is always nonzero (and indeed, I think the code should be discarding zeroes), and I have no idea what's happening... any suggestion, please? Thanks a lot!
EDIT: The input table has this format (awk 'BEGIN{FS=OFS="\t"} {print $31,$43}' file.tsv | head -4):
triCount_PM triSum_altPM
3 25
3 7
3 0
E.g. "fnord" > 0 evaluates to true in Awk; you really need to make sure the values are properly numeric. A common trick for coercing a number interpretation is to add zero.
awk 'BEGIN{FS=OFS="\t"} 0+$43 > 0 && 0+$31 > 0 { print $43/$31 }' file.tsv
A bare print always prints $0 (which is initialized to the current input line, though you can change it directly or indirectly from your program); to print something else, pass that "something else" as an argument to print.
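A quick way to check both claims (a minimal sketch, not part of the original answer):

$ awk 'BEGIN { a = ("fnord" > 0); b = (0 + "fnord" > 0); print a, b }'
1 0
$ echo "25 3" | awk '{ print; print $2 }'
25 3
3

The first command shows the string comparison succeeding until zero is added; the second shows bare print emitting $0 while print $2 emits just that field.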
The division by zero comes from the header line, which does not have numbers in those fields, so both fields evaluate to zero. To make this work, you need to skip the header (NR == 1 tests for the first line):
$ awk 'NR==1{print "$43/$31"; next} $43>0 && $31>0 {print $43/$31}' file
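Applied to the sample rows shown in the question's edit (25/3, 7/3, and a skipped row where $43 is 0), the output should look roughly like this:

$43/$31
8.33333
2.33333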
I have a file with multiple lines (no. of lines unknown)
DD0TRANSID000019021210504250003379433005533665506656000008587201902070168304000.0AK 0000L00000.00 N 01683016832019021220190212N0000.001683065570067.000000.00000.0000000000000NAcknowledgment
DD0TRANSID000019021210505110003379433005535567606656000008587201902085381804000.0FC 0000L00000.00 N 53818538182019021220190212N0000.053818065570067.000000.00000.0000000000000NFirst Contact
DD0TRANSID000019021210510360003379433005535568006656000008587201902085381804000.0SR 0000L00000.00 N 53818538182019021220190212N0000.053818065570067.000000.00000.0000000000000NStatus Report
The text TRANSID000 is in every line, from the 3rd to the 10th position.
I need to be able to replace it with TRAN000066, incrementing the number by 1 on each line.
66 is a variable I am getting from another file (say nextcounter) for storing the start of the counter. Once the program updates all the lines, I should be able to capture the last number and update the nextcounter file with it.
Output
DD0TRAN00066019021210504250003379433005533665506656000008587201902070168304000.0AK 0000L00000.00 N 01683016832019021220190212N0000.001683065570067.000000.00000.0000000000000NAcknowledgment
DD0TRAN00067019021210505110003379433005535567606656000008587201902085381804000.0FC 0000L00000.00 N 53818538182019021220190212N0000.053818065570067.000000.00000.0000000000000NFirst Contact
DD0TRAN00068019021210510360003379433005535568006656000008587201902085381804000.0SR 0000L00000.00 N 53818538182019021220190212N0000.053818065570067.000000.00000.0000000000000NStatus Report
I have tried awk, sed, and perl, but they do not give me the desired results.
Please suggest.
Simple loop
s=66; while read l; do echo "$l" | sed "s/TRANSID000/TRAN$(printf '%06d' $s)/"; s=$((s+1)); done < inputFile > outputFile; echo $s > counterFile
Walter A's answer is almost perfect, but it is missing the required lines 3-10 limit.
So the improved answer is:
awk -v start=66 'NR > 2 && NR < 11{ sub(/TRANSID000/, "TRAN0000" start++); print }' inputfile
When you want to use sed, you might want to use a loop, avoiding the ugly:
sed '=' inputfile | sed -r '{N;s/(.*)\n(.*)(TRANSID000)(.*)/echo "\2TRAN0$((\1+65))\4"/e}'
It is much easier with awk:
awk -v start=66 '{ sub(/TRANSID000/, "TRAN0" start++); print }' inputfile
EDIT:
The OP asks to replace TRANSID with TRAN0; I showed this in the edited solution.
When I look at the example output, the additional 0 is not needed.
Another question is what happens when the counter goes above 99. Should one of the leading zeroes be dropped (with a construction like printf "%.4d") so the line length stays the same, or will the line length grow by 1? (A sketch of the fixed-width variant follows the two alternatives below.)
DD0TRAN00099019...
DD0TRAN00100019...
# or
DD0TRAN00099019...
DD0TRAN000100019...
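Here is a hedged sketch of the fixed-width variant, which also writes the counter back as the question asks (the file name nextcounter is taken from the question; the printf "%.4d" idea is the one mentioned above, not a confirmed requirement). Redirect stdout to your output file as before.

awk -v start=66 '{ sub(/TRANSID000/, "TRAN0" sprintf("%.4d", start++)); print }
                 END { print start > "nextcounter" }' inputfile

With this formatting, 99 becomes 0099 and 100 becomes 0100, so the line length stays constant; the END block stores the next unused counter value, the same value the shell loop above writes.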
I have a dataset from which I am trying to select the first 10 columns, and the last 27 columns (from the 125th column onwards to the final 152nd column).
awk 'BEGIN{FS="\t"} { printf $1,$2,$3,$4,$5,$6,$7,$8,$9,$10; for(i=125; i<=NF; ++i) printf $i""FS; print ""}' Bigdata.txt > Smalldata.txt
When I try this code, it gives me the first 12 columns (with their data) and all the headers for all 152 columns from my original big data file. How do I select both columns 1-10 and 125-152 to go into a new file? I am new to Linux and any guidance would be appreciated.
Don't reinvent the wheel: if you already know the number of columns, cut is the tool for this task.
$ cut -f1-10,125-152 bigdata
tab is the default delimiter.
If you don't know the number of columns, awk comes to the rescue!
$ cut -f1-10,$(awk '{print NF-27"-"NF; exit}' file) file
awk will print the end range by reading the first line of the file.
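For the 152-column file described in the question, the inner awk prints 125-152 (since 152-27 = 125), so the command substitution makes the full command effectively:

$ cut -f1-10,125-152 file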
Using the KISS principle
awk 'BEGIN{FS=OFS="\t"}
{ c=""; for(i=1;i<=10;++i) { printf "%s%s", c, $i; c=OFS }
for(i=NF-27;i<=NF;++i) { printf "%s%s", c, $i }
printf "%s", ORS }' file
Could you please try the following; since no samples were provided, I couldn't test it. You do NOT need to write the field values 1...10 manually; you can use a loop for that too.
awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=10;i++){printf("%s%s",$i,OFS)};for(i=(NF-27);i<=NF;i++){printf("%s%s",$i,i==NF?ORS:OFS)}}' Input_file > output_file
Also, you do NOT need to worry about headers here, since we are simply printing the lines; no logic is applied to any specific line, so there is no need to add any special handling for the 1st line.
EDIT: One more point: it seems you meant that the column values from the different ranges should come out on a single line (for a single line from the Input). If that is the case, then my code above should handle it, since I print the OFS as the separator between values and print a newline only after the last field's value, so all the selected fields from one Input_file line stay on the same output line.
Explanation: Adding detailed explanation here.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here, which will be executed before Input_file is getting read.
FS=OFS="\t" ##Setting FS and OFS as TAB here.
} ##Closing BEGIN section here for this awk code.
{ ##Starting a new BLOCK which will be executed when Input_file is being read.
for(i=1;i<=10;i++){ ##Running a for loop which will run 10 times from i=1 to i=10 value.
printf("%s%s",$i,OFS) ##Printing value of specific field with OFS value.
} ##Closing for loop BLOCK here.
for(i=(NF-27);i<=NF;i++){ ##Running a for loop over the last fields, from NF-27 to NF, as per OP requirements.
printf("%s%s",$i,i==NF?ORS:OFS) ##Printing the field value and checking the condition i==NF: if this is the last field of the line, print ORS (a newline), else print OFS.
} ##Closing block for, for loop now.
}' Input_file > output_file ##Mentioning Input_file name here, whose output is going into output_file.
I am attempting to use the sqrt function of the awk command in my script, but all it returns is 0. Is there anything wrong with my script below?
echo "enter number"
read root
awk 'BEGIN{ print sqrt($root) }'
This is my first time using the awk command, are there any mistakes that I am not understanding here?
Maybe you can try this.
echo "enter number"
read root
echo "$root" | awk '{print sqrt($0)}'
You have to give awk some data to read as input, so you can pipe it in with echo.
The BEGIN block is for doing things (like printing a header, etc.) before
awk starts reading the input.
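A small illustration of that difference (a minimal sketch; the input numbers are arbitrary):

$ printf '4\n9\n' | awk 'BEGIN { print "square roots:" } { print sqrt($0) }'
square roots:
2
3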
$ echo "enter number"
enter number
$ read root
3
$ awk -v root="$root" 'BEGIN{ print sqrt(root) }'
1.73205
See the comp.unix.shell FAQ for the 2 correct ways to pass the value of a shell variable to an awk script.
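Without quoting the FAQ itself, two widely used safe approaches look like this (both avoid expanding the shell variable inside the awk program text):

$ awk -v root="$root" 'BEGIN{ print sqrt(root) }'            # -v assignment, as shown above
$ root="$root" awk 'BEGIN{ print sqrt(ENVIRON["root"]) }'    # read it from the environment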
UPDATE : My proposed solution turns out to be potentially dangerous. See Ed Morton's answer for a better solution. I'll leave this answer here as a warning.
Because of the single quotes, $root is interpreted by awk, not by the shell. awk treats root as an uninitialized variable, whose value is the empty string, treated as 0 in a numeric context. $root is therefore the root'th field of the current line -- in this case $0, which is the entire line. Since it's in a BEGIN block, there is no current line, so $root is the empty string -- which again is treated as 0 when passed to sqrt().
You can see this by changing your command line a bit:
$ awk 'BEGIN { print sqrt("") }'
0
$ echo 2 | awk '{ print sqrt($root) }'
1.41421
NOTE: The above is merely to show what's wrong with the original command, and how it's interpreted by the shell and by awk.
One solution is to use double quotes rather than single quotes. The shell expands variable references within double quotes:
$ echo "enter number"
enter number
$ read x
2
$ awk "BEGIN { print sqrt($x) }" # DANGEROUS
1.41421
You'll need to be careful when doing this kind of thing. The interaction between quoting and variable expansion in the shell vs. awk can be complicated.
UPDATE: In fact, you need to be extremely careful. As Ed Morton points out in a comment, this method can result in arbitrary code execution given a malicious value for $x, which is always a risk for a value read from user input. His answer avoids that problem.
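To make the risk concrete, here is a deliberately harmless illustration (hypothetical input; a hostile value could run any command):

$ x='system("echo Gotcha")'
$ awk "BEGIN { print sqrt($x) }"
Gotcha
0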
(Note that I've changed the name of your shell variable from $root to $x, since it's the number whose square root you want, not the root itself.)
In UNIX, I would like to filter my 3-column file based on the "DP" values within the 3rd column.
I'd like to obtain only rows that have DP values higher than 7.
A|49.14|AC=2;AF=0.500;AN=4;BaseQRankSum=1.380;DP=6;Dels=0.00;
T|290.92|AC=2;AF=1.00;AN=2;DP=8;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;
T|294.75|AC=6;AF=1.00;AN=6;DP=9;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=6;
I'm using "|" here to separate my three columns.
Here's one simple solution
$ echo "A|49.14|AC=2;AF=0.500;AN=4;BaseQRankSum=1.380;DP=6;Dels=0.00;
AC=6;AF=1.00;AN=6;DP=9;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=6;T|290.92|AC=2;AF=1.00;AN=2;DP=8;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;
MLEAC=6;" \
| awk '{dpVal=$0;sub(/.*DP=/, "", dpVal);sub(/;.*$/,"", dpVal); if (dpVal>7) print}'
output
T|290.92|AC=2;AF=1.00;AN=2;DP=8;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;
T|294.75|AC=6;AF=1.00;AN=6;DP=9;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=6;
This makes a copy of each line ($0), then strips away everything before DP= and everything after the trailing ; char for that field, leaving just the value of DP. That value is tested, and if the test is true the whole line is printed (the default action of awk's print is to print the whole line, but you can tell it to print anything you like, maybe print "Found it:" $0 or zillions of variants).
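One hedged caveat (borrowing the add-zero trick from the earlier answer above): after the sub() calls, dpVal is a plain string, so a two-digit value like DP=10 would compare as a string and fail dpVal>7. Adding zero forces a numeric comparison:

awk '{dpVal=$0; sub(/.*DP=/, "", dpVal); sub(/;.*$/, "", dpVal); if (dpVal+0 > 7) print}'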
EDIT:
I would like to keep all the first 53 lines intact and save them as well to my Output.txt file.
Yes, very easy; you're on the right track. With awk it is very easy to have multiple conditions process different parts of a file. Try this:
awk 'FNR <= 53 {print}
     FNR > 53 {
       dpVal=$0; sub(/.*DP=/, "", dpVal); sub(/;.*$/, "", dpVal)
       if (dpVal>7) print
     }' File.vcf > Output.txt
(I don't have a file to test with, so let me know if this isn't right.)
IHTH
I have a very big file (more than 10000 columns). I would like to change 3 entries in the second column and keep everything else the same, including the field separator.
For example:
ab123\t123\t0.1
ab234\t120\t0.5
I would like to check if the second column has the entry 120, change it to 1201, and keep everything else the same.
I tried awk. It works fine but replaces the tab delimiters with spaces.
awk '{ if ( $2 == 120 ) { $2 = 1201 }; print}' file
How can I do this without losing the tab delimiters?
You want to set FS (field separator) and OFS (output field separator) to tabs:
awk '$2==120{$2=1201}1' FS='\t' OFS='\t' file
OFS is the important variable here, as awk uses its value to separate the fields on output.
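A quick before/after, using the sample row from the question (a minimal sketch, not part of the original answer):

$ printf 'ab234\t120\t0.5\n' | awk '{ if ($2 == 120) { $2 = 1201 }; print }'
ab234 1201 0.5
$ printf 'ab234\t120\t0.5\n' | awk '$2==120{$2=1201}1' FS='\t' OFS='\t'
ab234	1201	0.5

The first command rebuilds $0 with the default OFS (a space); the second keeps the tabs.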
EDIT:
The structure of awk is conditional{block}: if the conditional evaluates to TRUE, then the block is executed. So with $2==120{$2=1201}, the conditional is $2==120 (is the second field the value 120?) and the block is {$2=1201} (assign the second field the value 1201). The default block in awk is {print $0}, so:
awk '$2==120{$2=1201}{print $0}'
Can be re-written as:
awk '$2==120{$2=1201}1'
Here 1 is a conditional which always evaluates to TRUE, and because we don't specify a block, the default {print $0} is executed.
For multiple conditions just add more structures i.e:
awk '$2==120{$2=1201}$3==130{$3=1301}1'
This is more of an if if structure, as both blocks can be executed; an if else would use the next statement to jump to the next line in the file, i.e.:
awk '$2==120{$2=1201;next}{$2=1202}1'
If the first block is executed here, the second field takes the value 1201 and we grab the next line; otherwise the second field takes the value 1202. So the second field will always take a new value, either 1201 or 1202.
An if elif would be:
awk '$2==120{$2=1201;next}$3==130{$3=1301}1'
Here the second field may take a new value; if it does, the third field will not be updated even if its condition is true, because that condition never gets evaluated. The third field can only be updated if the first condition is FALSE and the second is TRUE.
sed -r 's/^[^\t]+\t120\b/&1/' file