Please help to calculate Moving/Rolling back Weekly Sum of Amount($4) based on Distributor wise ($2) and Rolling Date wise.
Want to set vaiable like
RollingStartDate ==01/05/2015 and RollingInterval==7 and RollingEndDate ==08/05/2015
For Example :
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: 8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for 8th May 2015 Rolling 7 Days data set
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I am able to obtain the above output from this command :
awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in a) print i","b[i]}'
Kindly suggest how to derive weekly split-up data sets then Sum.
Desired Output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit#1
1.
The logic is to find a Sum of Amount is billed to the distributor for the period of 7days range, i.e if i need to calculate sum for 1st May then I need to consider the line items from 1st May,30th Apr,29th Apr,28th Apr,27th Apr,26th Apr and 25th Apr , It is equivalent to 1st May (-) minus 6 days back ... like wise 2nd May rolling date is equal to from 2nd May to 26th May ( 2nd May minus 6 days back ..)
2.
Date format is DD/MM/YYYY - 02/05/2015 is 2nd May
Since the file contains 2 to 3 months deatils , dont want to select the first date (25/04/2015) from file then do minus 6 days back analysis , hence "RollingStartDate" will help from which dates need to consider the data , "RollingInterval" will help to do the analysis for "7 days" moving back or "14 days" moving back or "30 days monthly " moving back analysis.
"RollingEndDate" will help to avoid if actual file contains any future date data availabe , in this case if 09th or 15th may date line items need to be excluded ...
Here's a solution that just excludes dates that don't have 7 days before them instead of requiring a specific start/stop range:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
if (begSecs=="") {
begSecs = endSecs + ((window-1) * secsPerDay)
}
amount[endSecs][$3] += $4
dists[$3]
}
END {
for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
for (dayNr=1; dayNr<=window; dayNr++) {
rollSecs = currSecs - ((dayNr-1) * secsPerDay)
for (dist in dists) {
sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
}
}
for (dist in dists) {
print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
delete sum[dist]
}
}
}
.
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
.
To use some different window size than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
The above uses GNU awk for true 2D arrays and time functions. Hopefully it's clear enough that you can make any modifications you need to include/exclude specific date ranges.
Help needed. I want to increment Date (which is a string) column in csv by one day.
e.g. (Date Format yyyy-MM-dd)
Col1,Col2,Col3
ABC,001,1900-01-01
XYZ,002,2000-01-01
Expected OutPut
Col1,Col2,Col3
ABC,001,1900-01-02
XYZ,002,2000-01-02
There's one standard Unix utility that has all the date magic from September 14, 1752 through December 31, 9999 built-in: the calendar cal. Instead of reinventing the wheel and do messy date calculations we will use its intelligence to our advantage. The basic problem is: given a date, is it the last day of a month? If not, simply increment the day. If yes, reset day to 1 and increment month (and possibly year).
However, the output of cal is unspecified and it may look like this:
$ cal 2 1900
February 1900
Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28
What we would need is a list of days, 1 2 3 ... 28. We can do this by skipping everything up to the "1":
set -- $(cal 2 1900)
while test $1 != 1; do shift; done
Now the number of args gives us the number of days in February 1900:
$ echo $#
28
Putting it all together in a script:
#!/bin/sh
read -r header
printf "%s\n" "$header"
while IFS=,- read -r col1 col2 y m d; do
case $m-$d in
(12-31) y=$((y+1)) m=01 d=01;;
(*)
set -- $(cal $m $y)
# Shift away the month and weekday names.
while test $1 != 1; do shift; done
# Is the day the last day of a month?
if test ${d#0} -eq $#; then
# Yes: increment m and reset d=01.
m=$(printf %02d $((${m#0}+1)))
d=01
else
# No: increment d.
d=$(printf %02d $((${d#0}+1)))
fi
;;
esac
printf "%s,%s,%s-%s-%s\n" "$col1" "$col2" $y $m $d
done
Running it on this input:
Col1,Col2,Col3
ABC,001,1900-01-01
ABC,001,1900-02-28
ABC,001,1900-12-31
XYZ,002,2000-01-01
XYZ,002,2000-02-28
XYZ,002,2000-02-29
yields
Col1,Col2,Col3
ABC,001,1900-01-02
ABC,001,1900-03-01
ABC,001,1901-01-01
XYZ,002,2000-01-02
XYZ,002,2000-02-29
XYZ,002,2000-03-01
I made one little assumption: The first two columns don't contain a - or escaped comma. If they do, the IFS=,- read will act up.
Using the date command, this can be done in awk:
awk 'BEGIN{FS=OFS=","}NR>1{("date -d\""$3" +1 day\" +%Y-%m-%d")|getline newdate; $3=newdate; print}' file.in
If you can extract the date from the file, you can use this:
d="1900-01-01" # date from file
date --date '#'$(( $(date --date $d +"%s") + 86400 ))
I am trying to find a file that are 0 days old. Below are the steps I performed to test this
$ ls
$ ls -ltr
total 0
$ touch tmp.txt
$ ls -ltr
total 0
-rw-r----- 1 tstUser tstUser 0 Feb 28 20:02 tmp.txt
$ find * -mtime 0
$
$ find * -mtime -1
tmp.txt
$
Why is '-mtime 0' not getting me the file?
What is the exact difference between '-mtime 0' and '-mtime -1'?
Im sure there must be other ways to find files that are 0 days old in unix, but im curious in understanding how this '-mtime' actually works.
This is a not user friendly aspect of find - you have to understand how the matching actually works to correctly define your search criteria. The following explanation is based on GNU find (findutils) 4.4.2.
find tests -atime, -ctime, -mtime work on 24 hour periods, therefore let's define "file age" as
floor (current_timestamp - file_modification_timestamp / 86400)
Given three files modified 1 hour ago, 25 hours ago and 49 hours ago
$ touch -t $(date -d "1 hour ago" +"%m%d%H%M") a.txt
$ touch -t $(date -d "25 hours ago" +"%m%d%H%M") b.txt
$ touch -t $(date -d "49 hours ago" +"%m%d%H%M") c.txt
file ages (as defined above) are
$ echo "($(date +"%s") - $(stat -c %Y a.txt)) / 86400" | bc
0
$ echo "($(date +"%s") - $(stat -c %Y b.txt)) / 86400" | bc
1
$ echo "($(date +"%s") - $(stat -c %Y c.txt)) / 86400" | bc
2
Given the above, here's what find does
$ find -type f -mtime 0 # find files with file age == 0, i.e. files modified less than 24 hours ago
./a.txt
$ find -type f -mtime -1 # find files with file age < 1, i.e. files modified less than 24 hours ago
./a.txt
$ find -mtime 1 # find files with file age == 1, i.e. files modified more than (or equal to) 24 hours ago, but less than 48 hours ago
./b.txt
$ find -mtime +1 # find files with file age > 1, i.e. files modified more than 48 hours ago
./c.txt
This shows that -mtime 0 and -mtime -1 give equivalent results.
-mmin gives the same test with finer granularity - argument is minutes instead of 24 hour periods.
I'm unable to reproduce your problem using the aforementioned version of find
$ touch tmp.txt
$ find * -mtime 0
tmp.txt
$ find * -mtime -1
tmp.txt
-mtime n
File's data was last modified n*24 hours ago. See the comments
for -atime to understand how rounding affects the interpretation
of file modification times.
So, -mtime 0 would be equal to: "File's data was last modified 0 hours ago.
While -mtime 1 would be: "File's data was last modified 24 hours ago"
Edit:
Numeric arguments can be specified as
+n for greater than n,
-n for less than n,
n for exactly n.
So I guess -1 would be modified within the last 24 hours, while 1 would be exactly one day.
The meaning of those three possibilities are as following:
n: exactly n 24-hour periods (days) ago, 0 means today.
+n: "more then n 24-hour periods (days) ago", or older then n,
-n: less than n 24-hour periods (days) ago (-n), or younger then n. It's evident that -1, and 0 are the same and both means "today".
NOTE: If you use parameters with find command in scripts be careful when -mtime parameter is equal zero. Some (earlier) versions of GNU find incorrectly interpret the following expression
Sourece: http://www.softpanorama.org/Tools/Find/index.shtml