How to subtract values in different lines using awk? - unix

I have data separated by "|". This data arrives every 15 minutes. What I want to do is subtract the values between consecutive lines and multiply by 100, but it doesn't seem to work.
bash-4.2$ cat kresna.txt
2019-05-29 16:48:01||196579|1637589633|0|109423435|101347165|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||0|0|111|1554983|1554990|0||0|6347782|0|0|0|0|||1637602667|8747|13287295146|283512|1636036853|38771|||326516100|101703893|145340456|6988739|224616616|107247291|7764|101598218|19745231|0
2019-05-29 17:03:01||197446|1637876915|0|109456309|101349847|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||0|0|111|1552437|1552441|0||0|6336110|0|0|0|0|||1637889948|8747|13290533845|283553|1636326689|38771|||326591972|101734973|145373623|6990480|224660545|107268556|7764|101629298|19748302|0
awk -F "|" '{if(NR>1){print (1 - ($45+$47+$49-_m) / ($44+$46+$48-_n) *100)};_n=$44+$46+$48;_m=$45+$47+$49}' kresna.txt
0.998926
The expected output is 99.98.

I don't see a problem with the code except that the grouping of the arithmetic operands is incorrect. You want to apply the multiplication by 100 to the result of the division, so parenthesize the subtraction and division before multiplying. You can also drop the if statement in favor of an NR > 1 pattern:
awk -F "|" 'NR > 1 { print ((1 - ($45+$47+$49-_m) / ($44+$46+$48-_n)) *100)} { _n=$44+$46+$48; _m=$45+$47+$49 }'
99.9989
Even with printf(), it needs at least 3 digits of precision to print the value without rounding it up to the next whole number:
awk -F "|" 'NR > 1 { printf "%.3f\n", ((1 - ($45+$47+$49-_m) / ($44+$46+$48-_n)) *100)} { _n=$44+$46+$48; _m=$45+$47+$49 }'
99.999
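The rounding behavior can be seen in isolation (a minimal sketch using the value from the run above):
awk 'BEGIN { printf "%.2f\n%.3f\n", 99.9989, 99.9989 }'
100.00
99.999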

Related

Extract two consecutive lines that have non-consecutive strings

I have a very large text file with 2 columns and more than 10 million lines.
In most lines, column 2 holds a number that is the number in column 2 of the previous line plus 1. However, a few thousand lines behave differently (see example below).
Input file:
A 1
A 2
A 3
A 10
A 11
A 12
A 40
A 41
I would like to extract each pair of consecutive lines that does not respect the +1 increment in column 2.
Desired output file:
A 3
A 10
A 12
A 40
Is there (preferably) an awk command that can do that?
I tried several approaches comparing column 2 of two consecutive lines, but so far without success (see the code below).
awk 'FNR==1 {print; next} $2==p2+1 {print p $0; p=""; next} {p=$0 ORS; p2=$2}' input.txt > output.txt
Thanks for your help.
Would you please try the following:
awk 'NR>1 {if ($2!=p2+1) print p ORS $0} {p=$0; p2=$2}' input.txt > output.txt
Output:
A 3
A 10
A 12
A 40
The variable names are similar to yours: p holds the previous line and p2 holds the second column of the previous line.
The condition NR>1 suppresses printing on the 1st line.
if ($2!=p2+1) print p ORS $0 prints the pairs of lines which meet the condition.
The block {p=$0; p2=$2} preserves the values of the current line for the next iteration.
I like perl for text processing that needs arithmetic.
$ perl -ane 'print $p and print if $.>1 and $F[1]!=$fp+1; $fp=$F[1]; $p=$_' input.txt
A 3
A 10
A 12
A 40
This is using -a to autosplit into @F.
Skips the comparison on the first line, where there is no prior value yet: $.>1
On subsequent lines, prints the previous line and the current line if the 2nd field isn't exactly one more than the prior 2nd field: print $p and print if $F[1]!=$fp+1
Saves the 2nd field as $fp and the entire line as $p: $fp=$F[1]; $p=$_
Assumptions:
columns are tab-delimited
the 1st column may contain white space (this isn't demonstrated in the sample provided by OP but it also hasn't been ruled out)
lines of interest must have the same value in the 1st column (ie, if the values in the 1st column differ then we don't bother with comparing the values in the 2nd column and instead proceed to the next input line)
if 3 consecutive lines meet the criteria, the 2nd/middle line is only printed once
Setup:
$ cat input.txt
A 1
A 2
A 3 # match
A 10 # match
A 11
A 12 # match
A 23 # match
A 40 # match
A 41
X to Z 101
X to Z 102 # match
X to Z 104 # match
X to Z 105
NOTE: comments only added here to highlight the lines that match the search criteria
One awk idea:
awk -F'\t' '
FNR==1 { prevline=$0 }
FNR>1  { if ($1 == prev1 && $2+0 != prev2+1) {
             if (prevline) print prevline
             print
             prevline=""        # make sure this line is not printed again if next line also meets criteria
         }
         else
             prevline=$0
       }
       { prev1=$1; prev2=$2 }
' input.txt
This generates:
A 3
A 10
A 12
A 23
A 40
X to Z 102
X to Z 104
This might work for you (GNU sed):
sed -nE 'N;h
s/.*\s+(.*)\n.*(\s.*)/echo "$((\1+1))\2"/e;/^(.*)\s\1$/!{x;p;x};x;D' file
Open a two line window throughout the length of the file.
Make a copy of the window and increment the 2nd column of the first line by one. If this amended value is not equal to the 2nd column of the second line, print both unadulterated lines (from the held copy).
Delete the first line and repeat.
N.B. This may print the second of these lines twice if the following line meets the same criteria.
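The sliding-window mechanics on their own are just the classic N/P/D cycle; a minimal sketch of the idiom, stripped of the arithmetic:
sed -n 'N;P;D' file
Here N appends the next line to form a two-line window, P prints the first line of the window, and D deletes that first line and restarts the cycle with the remainder, so the window slides down the file one line at a time (under -n the final line is swallowed by N).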

OFS when using if-else statement in awk

I have a simple text file, delimited by multiple spaces, and with a varying number of columns (6 or 5).
What I am trying to do is, for the rows with more than 5 columns, combine the last 2 columns into one, doing:
cat data.txt | awk '{if(NF>5) print $1,$2,$3,$4,$5"_"$6; else print $0} OFS="," ' > data.csv
The problem is that the OFS is not working for the else statement.
Example - input:
a d e t er ap
b q j n mm
Output that I am getting:
a,d,e,t,er_ap
b q j n mm
Desirable output:
a,d,e,t,er_ap
b,q,j,n,mm
Any suggestions?
Set your OFS in the BEGIN block so that it's a comma before any processing happens. Also, when you print $0 without manipulating the line in any way, awk just spits the line out as-is, with whatever delimiters are in place in the source file; the record is only rebuilt with OFS once you assign to a field. Personally I think that's dumb, but that's awk. As a workaround, set one column equal to itself to force that rebuild, then print:
awk 'BEGIN{OFS=","}{if(NF>5) print $1,$2,$3,$4,$5"_"$6; else {$1=$1;print $0}}' data.txt
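To see the rebuild behavior in isolation (a minimal sketch), print $0 before and after the self-assignment; only the rebuilt record picks up OFS:
echo 'b q j n mm' | awk 'BEGIN{OFS=","} {print $0; $1=$1; print $0}'
b q j n mm
b,q,j,n,mm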
If you anticipate more than 6 columns, you can have it join everything after column 5 with underscores using some printf trickery too:
awk '{for (i=1;i<=NF;i++){printf (i==NF)?"%s\n":(i>=5)?"%s_":"%s,", $i}}' data.txt
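For example, with a hypothetical 7-column line added to the sample input, everything from column 5 onward gets joined with underscores:
printf 'a d e t er ap gh\nb q j n mm\n' | awk '{for (i=1;i<=NF;i++){printf (i==NF)?"%s\n":(i>=5)?"%s_":"%s,", $i}}'
a,d,e,t,er_ap_gh
b,q,j,n,mm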

transpose column to row

I have tried to transpose the file below using awk:
n counts
1 -0.1520
1 0.0043
1 -0.4903
10 0.0316
10 -0.4076
10 -0.1175
200 0.2720
200 -0.2007
200 0.0559
I need output like this:
1 -0.1520 0.0043 -0.4903
10 0.0316 -0.4076 -0.1175
200 0.2720 -0.2007 0.0559
I tried the following, but it didn't work:
awk 'NR==1{print} NR>1{a[$1]=a[$1]" "$2}END{for (i in a){print i " " a[i]}}'
Thank you
It's working; the only issue is the output order, since for (i in a) iterates in an unspecified order. Try below:
awk 'NR==1{print} NR>1{a[$1]=a[$1]" "$2}END{for (i in a){print i " " a[i]}}' file | tac
or you can use sort
awk 'NR==1{print} NR>1{a[$1]=a[$1]" "$2}END{for (i in a){print i " " a[i]}}' file | sort -k1 -n
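If you'd rather preserve the original input order without a second pass through tac or sort, one sketch (not part of the answer above) records first-seen keys in a companion array:
awk 'NR==1{print; next}                # pass the header through
     !($1 in a){order[++k]=$1}         # remember the order keys first appear
     {a[$1]=a[$1]" "$2}                # accumulate values per key
     END{for (j=1; j<=k; j++) print order[j] a[order[j]]}' file
n counts
1 -0.1520 0.0043 -0.4903
10 0.0316 -0.4076 -0.1175
200 0.2720 -0.2007 0.0559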
mmm awk.
$ cat bar.awk
#! /usr/bin/awk -f
BEGIN{getline}
$1 != n {if(row)print row; n=$1; row = $0; next}
$1 == n {row = row FS $2}
END{ print row }
$ ./bar.awk foo
1 -0.1520 0.0043 -0.4903
10 0.0316 -0.4076 -0.1175
200 0.2720 -0.2007 0.0559
the getline eats the header
the $1 != n rule notices when the first column changes; its next skips the append rule for that line
n starts out empty/0, so if the first column of the first data row is also zero there will be a problem and you will have to initialize n to something else
when the first column changes it is time to print the previous row and begin collecting the next row
(if the row is empty, as it is initially, don't print it)
when the first column is the same as the previous row, just append the second value to your row
finally print the last row
the FS is whatever the current field separator is.
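To sidestep the initialization caveat entirely, a variant sketch can track an explicit "seen a data row yet" flag instead of comparing against an uninitialized n:
#! /usr/bin/awk -f
BEGIN { getline }                          # eat the header
seen && $1 == n { row = row FS $2; next }  # same key: append the value
{ if (seen) print row                      # new key: flush the previous row
  n = $1; row = $0; seen = 1 }
END { if (seen) print row }                # flush the final row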

Comparing consecutive rows within a file

I have a file with 1s and 0s in each column, where the field separator is ",":
1,0,0,1,1,1,0,0,0,0,1,0,0,1,1,0,1,0
0,1,0,1,1,1,0,1,0,1,0,0,0,0,0,0,0,0
1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0
1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0
1,0,1,0,0,0,0,1,1,1,1,1,1,1,1,1,0,1
1,0,1,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0
1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0
1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0
1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0
1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0
What I want to do is look at the file in pairs of rows, compare them, and if they are exactly the same, output a 1. So for this example, rows 1 & 2 are different so they don't get a 1, rows 3 & 4 are exactly the same so they get a 1, rows 5 & 6 differ in 1 column so they don't get a 1, and so on.
So the desired output could be something like:
1
1
1
Because here there are exactly 3 pairs of rows (rows are paired consecutively) that are exactly the same: rows 3 & 4, 7 & 8, and 9 & 10. The comparison should not reuse a row, so if you compare rows 1 & 2, you shouldn't then compare rows 2 & 3.
You can do this with awk like:
awk -F, '!(NR%2) {print $0==p} {p=$0}' data
0
1
0
1
1
Every even-numbered line prints a 0 if the current line doesn't match the saved value of p, or a 1 if it matches.
If you truly only want the 1s, which is throwing away any information about which pairs matched, you could:
awk -F, '!(NR%2)&&$0==p {print 1} {p=$0}' data
1
1
1
Alternatively, you could output matching pair line numbers like:
awk -F, '!(NR%2)&&$0==p {print NR-1 "," NR} {p=$0}' data
3,4
7,8
9,10
Or just the counts of all matched pairs:
awk -F, '!(NR%2)&&$0==p {c++} {p=$0} END{ print c}' data
3
Another useful variant might be just to return the matching lines directly:
awk -F, '!(NR%2)&&$0==p {print} {p=$0}' data
1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0
1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0
1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0
I would use a shell script like this:
while read line
do
    if test "$prevline" = "$line"
    then
        echo 1
    fi
    prevline=$line
done < data
I'm not 100% sure about your requirement to "not reuse a row", but I think that could be achieved by changing the inner part of the loop to:
if test "$prevline" = "$line"
then
    echo 1
    line=""    # don't reuse a line
fi
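If you need strictly non-overlapping pairs (rows 1 & 2, then 3 & 4, and so on), a different sketch reads two lines per iteration so no row can ever be compared twice (assuming the input file is named data):
#!/bin/sh
# consume the input in disjoint pairs: two reads per loop iteration
while read -r first && read -r second
do
    if test "$first" = "$second"
    then
        echo 1
    fi
done < data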

print if all values are higher

I have a file like:
A 50.40,60.80,56.60,67.80,51.20,78.40,63.80,64.2
B 37.40,37.40,38.40,38.80,58.40,58.80,45.00,44.8
.
.
.
I want to print the lines where all the values in column 2 are more than 50.
output:
A 50.40,60.80,56.60,67.80,51.20,78.40,63.80,64.2
I tried:
cat file | tr ',' '\t' | awk '{for (i=2; i<=NF; i++){if($i<50) continue; else print $i}}'
I hope you meant that r tag you added to your question.
tab <- read.table("file")
splt <- strsplit(as.character(tab[[2]]), ",")
rows <- unlist(lapply(splt, function(a) all(as.numeric(a) > 50)))
tab[rows,]
This reads your file as a space-separated table, splits the second column into individual values (resulting in a list of character vectors), then computes a logical value for each row depending on whether or not all of its values are > 50. These results are combined into a logical vector which is then used to subset your data.
The field separator can be any regular expression, so if you include commas in FS your approach works:
awk '{ for(i=2; i<=NF; i++) if($i<=50) next } 1' FS='[ \t,]+' infile
Output:
A 50.40,60.80,56.60,67.80,51.20,78.40,63.80,64.2
Explanation
The for-loop runs through the comma-separated values in the second column, and if any of them is lower than or equal to 50, next is executed, i.e. skip to the next line. If the first block is passed, the 1 is encountered, which evaluates to true and executes the default block: { print $0 }.
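If you'd rather not fold the commas into FS, a sketch of the same test that split()s only the second column:
awk '{
    n = split($2, v, ",")                  # break column 2 into its values
    ok = 1
    for (i = 1; i <= n; i++)
        if (v[i] <= 50) { ok = 0; break }  # one low value disqualifies the line
    if (ok) print
}' file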
