I have the following data:
24692 -rw-rw-r--+ 1 da01 da01 25284427 Aug 31 09:06 collected_BOT.227031
24660 -rw-rw-r--+ 1 da01 da01 25248756 Aug 31 09:35 collected_BOT.227032
24748 -rw-rw-r--+ 1 da01 da01 25338868 Aug 31 10:03 collected_BOT.227033
24740 -rw-rw-r--+ 1 da01 da01 25331322 Aug 31 10:31 collected_BOT.227034
sample:
grep 1303 collected_BOT.227034 | more
1559254293,151840703,AJ1X,10178801756650692,VA,VB,0,0,2,2,1303,1,L1O,6797,129,1,3,601,0,GVW1,9110,551,17,000000,0001,000000,,6,4,,1,1,,0
1559254294,151840704,AJ2X,10178801756650693,VA,VB,0,0,2,2,1303,1,L2O,6797,203,1,3,601,0,GVW2,9110,552,17,000000,0001,000000,,6,4,,1,1,,0
1559254295,151840705,AJ3X,10178801756650694,VA,VB,0,0,2,2,1303,1,L3O,6797,664,1,3,601,0,GVW3,9110,552,17,000000,0001,000000,,6,4,,1,1,,0
$15 is the duration.
I just want to calculate the total of $15 in file collected_BOT.227034 (only for lines where $11 is 1303).
awk -F, '$11==1303{sum+=$15} END {print sum}' collected_BOT.227034
-F, sets the field separator to ,
$11==1303 checks whether the 11th field is numerically equal to 1303
If so, sum+=$15 adds the value of the 15th field to the sum variable (an uninitialized awk variable counts as zero in arithmetic)
END {print sum} prints the value of sum after all the lines of the input file have been processed
Edit:
Thanks @Mark Setchell for pointing out that $11==1303 can be used instead of $11 ~ /^1303$/.
Also, use print sum+0 if you need the output to be 0 even when no lines match, or initialize it explicitly with a BEGIN{sum=0} block.
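The same one-liner can be wrapped in a small shell function so the code is passed in instead of hard-coded. This is only a sketch: the name sum_duration is made up, and it assumes (as above) comma-separated input with the code in field 11 and the duration in field 15. It uses the sum+0 trick so that 0 is printed when nothing matches:

```shell
#!/bin/sh
# sum_duration CODE FILE
# Hypothetical helper: prints the total of field 15 over the lines of FILE
# whose field 11 equals CODE, or 0 if no line matches.
sum_duration() {
    awk -F, -v code="$1" '$11 == code { sum += $15 } END { print sum + 0 }' "$2"
}
```

For example, sum_duration 1303 collected_BOT.227034 should print the same total as the one-liner.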
Great solution, @sp asic.
No need to use a regular expression for field $11, though:
awk -F, '$11=="1303" {sum+=$15} END {print sum}' collected_BOT.227034
(Beware: use == and not =. With =, the pattern assigns "1303" to field $11; since that non-empty string counts as true, every line matches and $15 is summed over the whole file.)
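A quick way to see the == versus = difference, using three made-up 15-field lines where only field 11 (the code) and field 15 (the duration) matter:

```shell
#!/bin/sh
# Made-up sample: field 11 is the code, field 15 the duration.
data='a,b,c,d,e,f,g,h,i,j,1303,l,m,n,10
a,b,c,d,e,f,g,h,i,j,1304,l,m,n,20
a,b,c,d,e,f,g,h,i,j,1303,l,m,n,30'

# Comparison: only the two 1303 lines are summed (prints 40).
printf '%s\n' "$data" | awk -F, '$11 == "1303" { sum += $15 } END { print sum + 0 }'

# Assignment by mistake: the pattern always evaluates to the non-empty
# string "1303", which is true, so every line is summed (prints 60).
printf '%s\n' "$data" | awk -F, '$11 = "1303" { sum += $15 } END { print sum + 0 }'
```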
I have a file that contains several comma-separated columns, including a customer ID in the first column.
One customer ID may occur on several rows, but always refers to the same real customer.
How do I run basic calculations in a shell script based on this ID column? For example, calculating the sum of the mileages (the 5th field) for the given customer ID.
102,305,Jin,Kerala,40
104,308,Paul,US,45
105,350,Nina,AUS,50
102,390,Jin,Kerala,10
104,395,Paul,US,35
102,399,Jin,Kerala,35
The 5th field is the mileage and the 1st field is the customer ID.
This is a simple awk script that will sum up the mileages and print the customer IDs together with the sums at the end:
#!/usr/bin/awk -f
BEGIN { FS = "," }
{
customer_id = $1;
mileage = $5;
total_mileage[customer_id] += mileage;
}
END {
for (customer_id in total_mileage) {
print customer_id, total_mileage[customer_id];
}
}
To run (after making it executable with chmod +x script.awk):
$ ./script.awk data.in
102 85
104 80
105 50
Alternatively, as a "one-liner":
$ awk -F, '{t[$1]+=$5} END {for (c in t){print c,t[c]}}' data.in
102 85
104 80
105 50
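Since for (c in t) visits the keys in an unspecified order, the output above is not guaranteed to come out sorted. Here is a sketch that also counts the rows and computes the average per customer, then pipes the result through sort. The name per_customer is hypothetical, and it assumes the ID in field 1 and the mileage in field 5, as in the question:

```shell
#!/bin/sh
# per_customer FILE
# Hypothetical helper: for comma-separated FILE (ID in field 1, mileage in
# field 5) print "id total count average", sorted numerically by ID.
per_customer() {
    awk -F, '
        { total[$1] += $5; count[$1]++ }
        END {
            for (id in total)
                printf "%s %s %d %.2f\n", id, total[id], count[id], total[id] / count[id]
        }' "$1" | sort -n
}
```

On the sample data above this should print:
102 85 3 28.33
104 80 2 40.00
105 50 1 50.00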
While I agree with @wilx that using a database might be smarter, this sample awk script should get you started:
awk -v FS=',' '{miles[$1] += $5}
END { for (customerid in miles) {
print customerid, miles[customerid]; } }' customers
You can get a list of unique IDs using something like (assuming the first column is the ID):
awk '{print $1}' inputFile | sort -u
This outputs the first field of every single line in the input file inputFile, sorts them and removes duplicates.
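Note that awk '{print $1}' splits on whitespace, which suits the space-separated test file used below but not the comma-separated data in the question. A minimal sketch for the comma case, assuming the customer ID is the first comma-separated field (both function names are made up):

```shell
#!/bin/sh
# unique_ids FILE: each distinct first comma-separated field, sorted.
unique_ids() {
    awk -F, '{ print $1 }' "$1" | sort -u
}

# unique_ids_in_order FILE: same IDs in first-seen order, no sort needed;
# an ID is printed only the first time seen[] is incremented for it.
unique_ids_in_order() {
    awk -F, '!seen[$1]++ { print $1 }' "$1"
}
```

For this simple case, cut -d, -f1 FILE | sort -u is equivalent to unique_ids.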
You can then use that method with a bash loop to process each of the unique IDs with another awk command to perform some action on them. In the following snippet, I print out the matching lines for each ID:
for id in $(awk '{print $1}' inputFile | sort -u) ; do
echo "${id}:"
awk -v id="${id}" '$1==id {print "  "$0}' inputFile
done
In that code, for each individual ID, it first outputs the ID then uses awk to only process lines matching that ID. The action carried out is to output the full line with indentation.
Of course, you can do anything you wish with the lines matching each ID. Here is an example that more closely matches your requirements.
First, here's an input file I used for testing - we can assume field 1 is the customer ID and field 2 the mileage:
$ cat inputFile
a 1
b 2
c 3
a 4
b 5
c 6
a 7
b 8
c 9
b 10
c 11
c 12
And here's a command-line transcript of the proposed method (note that $ and + are the input prompt and the continuation prompt respectively; they are not part of the actual commands):
$ for id in $(awk '{print $1}' inputFile | sort -u) ; do
+ awk -v id="${id}" '
+ $1==id {print $0; sum += $2 }
+ END {print "Total: " sum}
+ ' inputFile
+ done
a 1
a 4
a 7
Total: 12
b 2
b 5
b 8
b 10
Total: 25
c 3
c 6
c 9
c 11
c 12
Total: 41
Keep in mind that, for non-huge data sets, it's also possible to do this in a single-pass awk script, using associative arrays to store the totals and then outputting all the data in the END block. I tend to prefer the multi-pass approach myself, since it minimises the possibility of running out of memory. The trade-off, of course, is that it will take longer, since the file is read once to collect the IDs and then once more per ID.
For a single-pass solution, you can use something like:
$ awk '{sum[$1] += $2} END {for (key in sum) { print key": "sum[key]}}' inputFile
which gives you:
a: 12
b: 25
c: 41
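A middle ground between the multi-pass and single-pass approaches above is to let sort group equal IDs together and have awk keep only one running total at a time, printing it whenever the ID changes. This is a sketch assuming whitespace-separated "id value" lines like inputFile; sort can spill to temporary files for large inputs, so awk's memory use stays small regardless of how many IDs there are. The function name stream_totals is hypothetical:

```shell
#!/bin/sh
# stream_totals FILE
# Sort on the first field so equal IDs are adjacent, then emit a total
# each time the ID changes (only one running sum is held in memory).
stream_totals() {
    sort -k1,1 "$1" | awk '
        NR > 1 && $1 != prev { print prev ": " sum; sum = 0 }
        { prev = $1; sum += $2 }
        END { if (NR > 0) print prev ": " sum }'
}
```

Unlike the for (key in sum) loop, the output comes out in sorted ID order.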
Please help me calculate a moving (rolling back) weekly sum of Amount ($4) per distributor ($3) and per rolling date ($2).
I want to set variables like:
RollingStartDate=01/05/2015, RollingInterval=7 and RollingEndDate=08/05/2015
For example:
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
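The window boundaries listed above follow one rule: the window for date D starts at D minus (RollingInterval - 1) days. A small sketch of that arithmetic, assuming GNU date is available (it cannot parse DD/MM/YYYY directly, so the date is reordered to YYYY-MM-DD first); window_start is a made-up name:

```shell
#!/bin/sh
# window_start DD/MM/YYYY DAYS
# Hypothetical helper (needs GNU date): print the first day of a DAYS-long
# window ending on the given date, i.e. the date minus (DAYS - 1) days.
window_start() {
    d=${1%%/*}          # DD
    rest=${1#*/}
    m=${rest%%/*}       # MM
    y=${rest#*/}        # YYYY
    # -u keeps the calculation in UTC so DST changes cannot interfere.
    date -u -d "$y-$m-$d $(( $2 - 1 )) days ago" +%d/%m/%Y
}
```

For example, window_start 08/05/2015 7 should print 02/05/2015, matching the last line of the list above.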
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: 8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for 8th May 2015 Rolling 7 Days data set
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I am able to obtain the above output from this command:
awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in b) print i","b[i]}'
Kindly suggest how to derive the weekly split-up data sets and then sum them.
Desired Output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit #1
1. The logic is to find the sum of Amount billed to each distributor over a 7-day range, i.e. if I need to calculate the sum for 1st May, then I need to consider the line items from 1st May, 30th Apr, 29th Apr, 28th Apr, 27th Apr, 26th Apr and 25th Apr. That is equivalent to 1st May minus 6 days back; likewise, the 2nd May rolling window runs from 2nd May back to 26th Apr (2nd May minus 6 days).
2. The date format is DD/MM/YYYY, so 02/05/2015 is 2nd May.
Since the file contains 2 to 3 months of details, I don't want to take the first date in the file (25/04/2015) and then do the minus-6-days analysis; "RollingStartDate" tells from which date the data needs to be considered, and "RollingInterval" allows a 7-day, 14-day or 30-day (monthly) rolling-back analysis.
"RollingEndDate" excludes any future-dated data the actual file may contain; in this case, line items dated 9th or 15th May would need to be excluded.
Here's a solution that just excludes dates that don't have 7 days before them instead of requiring a specific start/stop range:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0",1,$2))
if (begSecs=="") {
begSecs = endSecs + ((window-1) * secsPerDay)
}
amount[endSecs][$3] += $4
dists[$3]
}
END {
for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
for (dayNr=1; dayNr<=window; dayNr++) {
rollSecs = currSecs - ((dayNr-1) * secsPerDay)
for (dist in dists) {
sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
}
}
for (dist in dists) {
print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
delete sum[dist]
}
}
}
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
To use some different window size than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
The above uses GNU awk for true 2D arrays and time functions. Hopefully it's clear enough that you can make any modifications you need to include/exclude specific date ranges.
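If gawk isn't available, here is a sketch in plain POSIX awk that also takes the RollingStartDate, RollingEndDate and RollingInterval values from the question. It is not the answer's method: instead of mktime/strftime it keys each row by a YYYYMMDD string and steps through the calendar with small day_after/day_before helpers. The function name rolling_sum and its argument order are assumptions:

```shell
#!/bin/sh
# rolling_sum START END INTERVAL FILE
# Hypothetical helper (POSIX awk). START and END are DD/MM/YYYY; for every
# day from START to END it prints, per distributor, the sum of Amount over
# that day and the INTERVAL-1 days before it. Rows after END are never read
# back, so future-dated data is ignored.
rolling_sum() {
    awk -F, -v OFS=, -v start_date="$1" -v end_date="$2" -v interval="$3" '
    # Days in month m of year y (Gregorian calendar).
    function dim(m, y) {
        if (m == 2) return (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? 29 : 28
        return (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31
    }
    # DD/MM/YYYY -> YYYYMMDD, which compares correctly as a string.
    function key(dmy,    p) {
        split(dmy, p, "/")
        return sprintf("%04d%02d%02d", p[3], p[2], p[1])
    }
    # Next calendar day of a YYYYMMDD key.
    function day_after(k,    y, m, d) {
        y = substr(k, 1, 4) + 0; m = substr(k, 5, 2) + 0; d = substr(k, 7, 2) + 1
        if (d > dim(m, y)) { d = 1; if (++m > 12) { m = 1; y++ } }
        return sprintf("%04d%02d%02d", y, m, d)
    }
    # Previous calendar day of a YYYYMMDD key.
    function day_before(k,    y, m, d) {
        y = substr(k, 1, 4) + 0; m = substr(k, 5, 2) + 0; d = substr(k, 7, 2) - 1
        if (d < 1) {
            if (--m < 1) { m = 12; y-- }
            d = dim(m, y)
        }
        return sprintf("%04d%02d%02d", y, m, d)
    }
    NR == 1 { next }                        # skip the header line
    {
        amt[key($2), $3] += $4              # total per (day, distributor)
        if (!($3 in seen)) { seen[$3]; order[++n] = $3 }   # first-seen order
    }
    END {
        print "RollingDate", "Distributor", "Amount"
        stop = key(end_date)
        for (cur = key(start_date); cur <= stop; cur = day_after(cur)) {
            for (i = 1; i <= n; i++) {
                sum = 0; k = cur
                for (j = 0; j < interval; j++) {   # cur and the days before it
                    if ((k, order[i]) in amt) sum += amt[k, order[i]]
                    k = day_before(k)
                }
                print substr(cur, 7, 2) "/" substr(cur, 5, 2) "/" substr(cur, 1, 4), order[i], sum
            }
        }
    }' "$4"
}
```

Running rolling_sum 01/05/2015 08/05/2015 7 Input.csv should reproduce the desired output from the question, with abc123 before xyz456 because distributors are listed in the order they first appear.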
I am trying to find the date that was seven days before today.
CURRENT_DT=`date +"%F %T"`
diff=$CURRENT_DT-7
echo $diff
I am trying things like the above to find the date 7 days before the current date. Could anyone help me out, please?
GNU date will do the math for you:
date --date "7 days ago"
Other versions will require you to convert the current date into seconds since the Unix epoch first, manually subtract 7 days' worth of seconds, and convert that back into the desired form. Consult the documentation for your version of date for details on how to convert to and from Unix timestamps. Here's an example using GNU date again:
x=$(date +%s)
x=$((x - 7 * 24 * 60 * 60))
date --date @$x
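To keep the "%F %T" format from the question, both GNU date approaches accept an output format. A sketch wrapping them in two made-up functions that take an optional base date (defaulting to now); -u keeps the arithmetic in UTC so a daylight-saving change inside the week cannot shift the result by an hour:

```shell
#!/bin/sh
# Both helpers are hypothetical and need GNU date.

# Relative-date form: "BASE 7 days ago".
seven_days_before() {
    if [ -n "$1" ]; then
        date -u -d "$1 7 days ago" +"%F %T"
    else
        date -u -d "7 days ago" +"%F %T"
    fi
}

# Epoch form: convert to seconds, subtract 7 * 24 * 60 * 60, convert back.
seven_days_before_epoch() {
    x=$(date -u -d "${1:-now}" +%s)
    x=$(( x - 7 * 24 * 60 * 60 ))
    date -u -d "@$x" +"%F %T"
}
```

For example, seven_days_before 2015-04-07 and seven_days_before_epoch 2015-04-07 should both print 2015-03-31 00:00:00.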
Here is a simple Perl one-liner which (unlike the GNU date examples) also works on Unix systems without GNU date:
perl -e 'use POSIX qw(ctime); printf "%s", ctime(time - (7 * 24 * 60 * 60));'
(Tested with Solaris 10, and a token Linux system, of course - with the caveat that Perl is not necessarily part of one's configuration, merely very likely).
Adding this one for macOS, whose BSD date uses -v to adjust the date:
date -v-7d
> Tue Apr 3 15:16:31 EDT 2018
date
> Tue Apr 10 15:16:33 EDT 2018
Need that formatted?
date -v-7d +%Y-%m-%d
> 2018-04-03
Ksh's printf can do time calculation:
$ printf '%(%Y-%m-%d)T\n'
2015-04-07
$ printf '%(%Y-%m-%d)T\n' '7 days ago'
2015-03-31
$
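I haven't used Unix in a while, but I found this in one of my scripts; it prints the Unix timestamp (seconds since the epoch) from 7 days ago, since 604800 = 7 * 24 * 60 * 60: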
echo `date +%s`-604800 | bc
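# e.g. "Tue Apr 3" (awk collapses date's space padding, so "Apr  3" becomes "Apr 3")
DATE=$(date --date "7 days ago" | awk '{print $1, $2, $3}')
echo "$DATE"
# Stop if no line of test.log contains that date
if [ -z "$(grep -i "$DATE" test.log)" ]; then
exit 1
fi
# Otherwise delete from line 1 up to and including the first line containing it
sed -i "1,/$DATE/d" test.log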