To calculate Moving/Rolling back Weekly (7 days) Sum - unix

Please help me calculate a moving/rolling-back weekly sum of Amount ($4) per Distributor ($2), for each rolling date.
I want to set variables like
RollingStartDate == 01/05/2015, RollingInterval == 7 and RollingEndDate == 08/05/2015
For example:
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: 8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for 8th May 2015 Rolling 7 Days data set
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
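(As a quick check of that window: abc123 sums to 25 + 30 + 32 + 58 = 145 and xyz456 to 75 + 35 + 12 + 43 + 87 + 98 = 350.)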
I am able to obtain the above output from this command :
awk -F, '{key=$3; b[key]=b[key]+$4} END {for (i in b) print i","b[i]}'
Kindly suggest how to derive the weekly split-up data sets and then sum them.
Desired Output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit#1
1.
The logic is to find the sum of the Amount billed to a distributor over a 7-day range. For example, to calculate the sum for 1st May I need to consider the line items from 1st May, 30th Apr, 29th Apr, 28th Apr, 27th Apr, 26th Apr and 25th Apr, i.e. 1st May minus 6 days back. Likewise, the rolling range for 2nd May runs from 2nd May back to 26th Apr (2nd May minus 6 days back).
2.
Date format is DD/MM/YYYY - 02/05/2015 is 2nd May
Since the file contains 2 to 3 months of details, I don't want to select the first date (25/04/2015) from the file and then do the minus-6-days-back analysis. Hence "RollingStartDate" controls from which date the data should be considered, and "RollingInterval" lets the analysis be run as a "7 days" moving-back, "14 days" moving-back or "30 days monthly" moving-back analysis.
"RollingEndDate" helps to exclude any future-dated data the actual file may contain; in this case, line items dated 09th or 15th May would need to be excluded.

Here's a solution that just excludes dates that don't have 7 days before them instead of requiring a specific start/stop range:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
    if (begSecs=="") {
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4
    dists[$3]
}
END {
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
To use a different window size than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
The above uses GNU awk for true 2D arrays and time functions. Hopefully it's clear enough that you can make any modifications you need to include/exclude specific date ranges.
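For instance, the RollingStartDate / RollingInterval / RollingEndDate variables from Edit#1 could be layered on top of the same approach. This is only a rough, untested sketch under the same GNU awk assumptions; the variable names start and end and the file name tstRange.awk are placeholders:
$ cat tstRange.awk
function d2secs(d) { return mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0",1,d)) }
BEGIN {
    FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60
    startSecs = d2secs(start); endSecs = d2secs(end)
}
NR==1 { print "RollingDate", $3, $4; next }
{
    t = d2secs($2)
    if (t <= endSecs) {          # ignore line items dated after RollingEndDate
        amount[t][$3] += $4
        dists[$3]
    }
}
END {
    for (currSecs=startSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}
It would be invoked along the lines of awk -v start=01/05/2015 -v end=08/05/2015 -v window=7 -f tstRange.awk file, and any 09/05/2015 or 15/05/2015 line items would simply never be accumulated.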

Related

pyspark daterange calculations in spark

I am trying to process website login session data by each user. I am reading an S3 session log file into an RDD. The data looks something like this.
-----------------------------------------------------
User  | Site     | Session start   | Session end
-----------------------------------------------------
Joe   | Waterloo | 9/21/19 3:04 AM | 9/21/19 3:18 AM
Stacy | Kirkwood | 8/4/19 3:06 PM  | 8/4/19 3:54 PM
John  | Waterloo | 9/21/19 8:48 AM | 9/21/19 9:05 AM
Stacy | Kirkwood | 8/4/19 4:16 PM  | 8/4/19 5:41 PM
...
...
I want to find out how many users were logged in each second of the hour on a given day.
Example: I might be processing this data for 9/21/19 only. So I would need to remove all other records and then sum user sessions for each second of the hour for all 24 hours of 9/21/19. The output should possibly be 24 rows for all the hours of 9/21/19, with counts for each second of the day (yikes, second-by-second data!).
Is this possible to do in pyspark using either RDDs or DataFrames?
(Apologies for the tardiness in building the grid.)
Thanks
my dataset
from pyspark.sql import functions as F
from pyspark.sql.functions import udf, when
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType

data = [['Joe', 'Waterloo', '9/21/19 3:04 AM', '9/21/19 3:18 AM'],
        ['Stacy', 'Kirkwood', '8/4/19 3:06 PM', '8/4/19 3:54 PM'],
        ['John', 'Waterloo', '9/21/19 8:48 AM', '9/21/19 9:05 AM'],
        ['Stacy', 'Kirkwood', '9/21/19 4:06 PM', '9/21/19 4:54 PM'],
        ['Mo', 'Hashmi', '9/21/19 1:06 PM', '9/21/19 5:54 PM'],
        ['Murti', 'Hash', '9/21/19 1:00 PM', '9/21/19 3:00 PM'],
        ['Floo', 'Shmi', '9/21/19 9:10 PM', '9/21/19 11:54 PM']]
cSchema = StructType([StructField("User", StringType()),
                      StructField("Site", StringType()),
                      StructField("Sesh-Start", StringType()),
                      StructField("Sesh-End", StringType())])
df = spark.createDataFrame(data, schema=cSchema)
display(df)
parse timestamp
df1 = (df
       .withColumn("Start", F.from_unixtime(F.unix_timestamp("Sesh-Start", 'MM/dd/yyyy hh:mm aa'),
                                            '20yy-MM-dd HH:mm:ss').cast("timestamp"))
       .withColumn("End", F.from_unixtime(F.unix_timestamp("Sesh-End", 'MM/dd/yyyy hh:mm aa'),
                                          '20yy-MM-dd HH:mm:ss').cast("timestamp"))
       .drop("Sesh-Start", "Sesh-End"))
build and register udf, for multiple hours per person
def yo(a, b):
    # return the list of clock hours spanned by a session [a, b]
    from datetime import datetime
    d1 = datetime.strptime(str(a), '%Y-%m-%d %H:%M:%S')
    d2 = datetime.strptime(str(b), '%Y-%m-%d %H:%M:%S')
    y = []
    if d1.hour == d2.hour:
        y.append(d1.hour)
    else:
        for i in range(d1.hour, d2.hour + 1):
            y.append(i)
    return y

rng = udf(yo, ArrayType(IntegerType()))
explode list of hours into column
df2=df1.withColumn("new", rng(F.col("Start"),F.col("End"))).withColumn("new1",F.explode("new")).drop("new")
get seconds for each hour
df3 = df2.withColumn("Seconds",
                     when(F.hour("Start") == F.hour("End"), F.col("End").cast('long') - F.col("Start").cast('long'))
                     .when(F.hour("Start") == F.col("new1"), 3600 - F.minute("Start") * 60)
                     .when(F.hour("End") == F.col("new1"), F.minute("End") * 60)
                     .otherwise(3600))
create temp view and query it
df3.createOrReplaceTempView("final")
display(spark.sql("Select new1, sum(Seconds) from final group by new1 order by new1"))
Lennart's answer (below) could be more performant because it uses a join to get all the different hours, whereas I use a UDF, which could be slower. My code will work for any user who can be online for any number of hours. My data used only the day required, so you could use a day filter like the one in the next answer to limit your query to the day in question.
Try to check this:
Initialize the filter.
from pyspark.sql import functions as F
filter = F.to_date(F.lit("2019-09-21"))
startFilter = F.to_timestamp(F.lit("2019-09-21 00:00:00.000"))
endFilter = F.to_timestamp(F.lit("2019-09-21 23:59:59.999"))
Generate the range of hours (0 .. 23).
hours = spark.range(24)
Get the actual user sessions that match the filter, clamped to the day's boundaries.
df = sessions \
    .where((filter >= F.to_date(sessions.start)) & (filter <= F.to_date(sessions.end))) \
    .select(sessions.user,
            F.when(sessions.start < startFilter, startFilter).otherwise(sessions.start).alias("start"),
            F.when(sessions.end > endFilter, endFilter).otherwise(sessions.end).alias("end"))
Combine the matching user sessions with the range of hours.
df2 = df.join(hours, hours.id.between(F.hour(df.start), F.hour(df.end)), 'inner') \
    .select(df.user, hours.id.alias("hour"),
            (F.when(F.hour(df.end) > hours.id, 3600).otherwise(F.minute(df.end) * 60 + F.second(df.end)) -
             F.when(F.hour(df.start) < hours.id, 0).otherwise(F.minute(df.start) * 60 + F.second(df.start))).alias("seconds"))
Generate the summary: the user count and the sum of seconds for each hour of sessions.
df2.groupBy(df2.hour) \
    .agg(F.count(df2.user).alias("user counts"),
         F.sum(df2.seconds).alias("seconds")) \
    .show()
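As a worked check of the seconds expression: for Joe's 9/21/19 3:04 AM - 3:18 AM session, hour 3 gets (18 * 60 + 0) - (4 * 60 + 0) = 840 seconds, i.e. 14 minutes.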
Hope this helps.

Calculating time difference in sas (military time)

I am having trouble figuring out how to calculate the duration of a time variable.
Any thoughts on how to tackle this?
A military time value encoded as an integer number (hhmm) can be processed by converting the number to a SAS time value and then performing delta computations using certain assumptions.
data sleep_log;
input name $ boots_down boots_up;
datalines;
Joe 2000 0600 slept over midnight
Joe 1000 1230 slept into lunch
Joe 1630 1700 30 winks
Joe 0100 0100 out cold!
run;
data sleep_data;
  set sleep_log;

  down = hms(
    int(boots_down / 100)   /* extract hours */
  , mod(boots_down , 100)   /* extract minutes */
  , 0                       /* seconds not logged, use zero */
  );

  up = hms(
    int(boots_up / 100)     /* extract hours */
  , mod(boots_up , 100)     /* extract minutes */
  , 0                       /* seconds not logged, use zero */
  );

  * SAS time values are linear and simple arithmetic can apply;
  if up <= down
    then delta = '24:00't + up - down;  /* presume roll over midnight */
    else delta = up - down;

  format down up delta time5.;
run;
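For the first record, for example, boots_down=2000 gives down = hms(20, 0, 0) = 72,000 seconds and boots_up=0600 gives up = hms(6, 0, 0) = 21,600 seconds; since up <= down, delta = 86,400 + 21,600 - 72,000 = 36,000 seconds, which the time5. format displays as 10:00.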
A more robust log would also record the day, eliminating presumptions and providing a proper time dimension.
You can extract the hours and minutes from your numeric military time HHMM, then create a SAS time using the HMS() function.
Extract hours: divide your HHMM by 100 and keep the integer part,
Extract minutes: take the remainder (MOD) of HHMM by 100,
Create a new time variable using HMS(hour, minute, second),
Create a new datetime for each using DHMS(date, hour, minute, second).
Full Code:
data have;
  input sleep awake date_s date_w;
  informat date_s date9. date_w date9.;
  format sleep z4. awake z4. date_s date9. date_w date9.;
datalines;
2300 0500 12feb2018 13feb2018
2000 0300 11feb2018 12feb2018
0530 1230 10feb2018 10feb2018
;
run;

data want;
  set have;
  new_sleep_time = hms(int(sleep/100), int(mod(sleep,100)), 0);
  new_awake_time = hms(int(awake/100), int(mod(awake,100)), 0);
  dt_awake = dhms(date_w, hour(new_awake_time), minute(new_awake_time), 0);
  dt_sleep = dhms(date_s, hour(new_sleep_time), minute(new_sleep_time), 0);
  diff = dt_awake - dt_sleep;
  keep new_sleep_time new_awake_time dt_awake dt_sleep diff;
  format new_sleep_time time8. new_awake_time time8. diff time8. dt_awake datetime21. dt_sleep datetime21.;
run;
Output:
new_sleep_time=23:00:00 new_awake_time=5:00:00 diff=6:00:00 dt_awake=13FEB2018:05:00:00 dt_sleep=12FEB2018:23:00:00
new_sleep_time=20:00:00 new_awake_time=3:00:00 diff=7:00:00 dt_awake=12FEB2018:03:00:00 dt_sleep=11FEB2018:20:00:00
new_sleep_time=5:30:00 new_awake_time=12:30:00 diff=7:00:00 dt_awake=10FEB2018:12:30:00 dt_sleep=10FEB2018:05:30:00

Gnuplot: data normalization

I have several time-based datasets which are of very different scales, e.g.
[set 1]
2010-01-01 10
2010-02-01 12
2010-03-01 13
2010-04-01 19
…
[set 2]
2010-01-01 920
2010-02-01 997
2010-03-01 1010
2010-04-01 1043
…
I'd like to plot the relative growth of both since 2010-01-01. To put both curves on the same graph I have to normalize them. So I basically need to pick the first Y value and use it as a weight:
plot "./set1" using 1:($2/10), "./set2" using 1:($2/920)
But I want to do it automatically instead of hard-coding 10 and 920 as dividers. I don't even need the max value of the second column, I just want to pick the first value or, better, a value for a given date.
So my question: is there a way to parametrize the value of a given column which corresponds a given value of the given X column (X is a time axis)? Something like
plot "./set1" using 1:($2/$2($1="2010-01-01")), "./set2" using 1:($2/$2($1="2010-01-01"))
where $2($1="2010-01-01") is the feature I'm looking for.
Picking the first value is quite easy. Simply remember its value and divide all data values by it:
ref = 0
plot "./set1" using 1:(ref = ($0 == 0 ? $2 : ref), $2/ref),\
"./set2" using 1:(ref = ($0 == 0 ? $2 : ref), $2/ref)
Using the value at a given date is more involved:
Using an external tool (awk)
ref1 = system('awk ''$1 == "2010-01-01" { print $2; exit; }'' set1')
ref2 = system('awk ''$1 == "2010-01-01" { print $2; exit; }'' set2')
plot "./set1" using 1:($2/ref1), "./set2" using 1:($2/ref2)
Using gnuplot
You can use gnuplot's stats command to pick the desired value, but you must pay attention to do all time settings only after that:
a) String comparison
stats "./set1" using (strcol(1) eq "2010-01-01" ? $2 : 1/0)
ref1 = STATS_max
...
set timefmt ...
set xdata time
...
plot ...
b) Compare the actual time value (works like this only since version 5.0):
reftime = strptime("%Y-%m-%d", "2010-01-01")
stats "./set1" using (timecolumn(1, "%Y-%m-%d") == reftime ? $2 : 1/0)
ref1 = STATS_max
...
set timefmt ...
set xdata time
...
plot ...

File renaming based on file content in UNIX

I have two patterns, namely QUARTERDATE and FILENAME, inside the file.
Both will have some value, as in the example below.
My requirement is to rename the file like FILENAME_QUARTERDATE.
My file(myfile.txt) will be as below:
QUARTERDATE: 03/31/14 - 06/29/14
FILENAME : LEAD
field1 field2
34567
20.0 5,678
20.0 5,678
20.0 5,678
20.0 5,678
20.0 5,678
I want the file name to be LEAD_201402.txt.
The date range in the file is for Quarter 2, so I have given it as 201402.
Thanks in advance for the replies.
newname=$(awk '/QUARTERDATE/ { split($4, d, "/");
                               quarter=sprintf("%04d%02d", 2000+d[3], int((d[1]-1)/3)+1); }
               /FILENAME/    { fn = $3; print fn "_" quarter; exit; }' "$file")
mv "$file" "$newname"
How is a quarter defined?
As noted in comments to the main question, the problem is as yet ill-defined.
What data would appear in the previous quarter's QUARTERDATE line? Could Q1 ever start with a date in December of the previous year? Could the end date of Q2 ever be in July (or Q1 in April, or Q3 in October, or Q4 in January)? Since the first date of Q2 is in March, these alternatives need to be understood. Could a quarter ever start early and end late simultaneously (a 14 week quarter)?
To which the response was:
QUARTERDATE of Q2 will start as 1st Monday of April and end as last Sunday of June.
Which triggered a counter-response:
2014-03-31 is a Monday, but hardly a Monday in April. What this mainly means is that your definition of a quarter is, as yet, not clear. For example, next year, 2015-03-30 is a Monday, but 'the first Monday in April' is 2015-04-06. The last Sunday in March 2015 is 2015-03-29. So which quarter does the week (Mon) 2015-03-30 to (Sun) 2015-04-05 belong to, and why? If you don't know (both how and why), we can't help you reliably.
Plausible working hypothesis
The lessons of Y2K have been forgotten already (why else are two digits used for the year, dammit!).
Quarters run for an integral number of weeks.
Quarters start on a Monday and end on a Sunday.
Quarters remain aligned with the calendar quarters, rather than drifting around the year. (There are 13 weeks in 91 days, and 4 such quarters in a year, but there's a single extra day in an ordinary year and two extra in a leap year, which mean that occasionally you will get a 14-week quarter, to ensure things stay aligned.)
The date for the first date in a quarter will be near 1st January, 1st April, 1st July or 1st October, but the month might be December, March (as in the question), June or September.
The date for the last date in a quarter will be near 31st March, 30th June, 30th September, 31st December, but the month might be April, July, October or January.
By adding 1 modulo 12 (values in the range 1..12, not 0..11) to the start month, you should end up with a month firmly in the calendar quarter.
By subtracting 1 modulo 12 (values in the range 1..12 again) from the end month, you should end up with a month firmly in the calendar quarter.
If the data is valid, the 'start + 1' and 'end - 1' months should be in the same quarter.
The early year might be off-by-one if the start date is in December (but that indicates Q1 of the next year).
The end year might be off-by-one if the end date is in January (but that indicates Q4 of the prior year).
More resilient code
Despite the description above, it is possible to write code that detects the quarter despite any or all of the idiosyncrasies of the quarter start and end dates. This code borrows a little from Barmar's answer, but the algorithm is more resilient to the vagaries of the calendar and the quarter start and end dates.
#!/bin/sh
awk '/QUARTERDATE/ {
        split($2, b, "/")
        split($4, e, "/")
        if (b[1] == 12) { q = 1; y = e[3] }
        else if (e[1] == 1) { q = 4; y = b[3] }
        else
        {
            if (b[3] != e[3]) {
                print "Year mismatch (" $2 " vs " $4 ") in file " FILENAME
                exit 1
            }
            m = int((b[1] + e[1]) / 2)
            q = int((m - 1) / 3) + 1
            y = e[3]
        }
        quarter = sprintf("%.4d%.2d", y + 2000, q)
    }
    /FILENAME/ {
        print $3 "_" quarter
        # exit
    }' "$@"
The calculation for m adds the start month plus one to the end month minus one and then does integer division by two. With the extreme cases already taken care of, this always yields a month number that is in the correct quarter.
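For the sample QUARTERDATE: 03/31/14 - 06/29/14, for instance, m = int((3 + 6) / 2) = 4, q = int((4 - 1) / 3) + 1 = 2 and y = 14, so the script prints LEAD_201402.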
The comment in front of the exit associated with FILENAME allows testing more easily. When processing each file separately, as in Barmar's example, that exit is an important optimization. Note that the error message gives an empty file name if the input comes from standard input. (Offhand, I'm not sure how to print the error message to standard error rather than standard output, other than by a platform-specific technique such as print "message" > "/dev/stderr" or print "message" > "/dev/fd/2".)
Given this sample input data (semi-plausible start and end dates for 6 quarters from 2014Q1 through 2015Q2):
QUARTERDATE: 12/30/13 - 03/30/14
FILENAME : LEAD
QUARTERDATE: 03/31/14 - 06/29/14
FILENAME : LEAD
QUARTERDATE: 06/30/14 - 09/28/14
FILENAME : LEAD
QUARTERDATE: 09/29/14 - 12/28/14
FILENAME : LEAD
QUARTERDATE: 12/29/14 - 03/29/15
FILENAME : LEAD
QUARTERDATE: 03/30/15 - 06/29/15
FILENAME : LEAD
The output from this script is:
LEAD_201401
LEAD_201402
LEAD_201403
LEAD_201404
LEAD_201501
LEAD_201502
You can juggle the start and end dates of the quarters within reason and you should still get the required output. But always be wary of calendrical calculations; they are almost invariably harder than you expect.

Unix: Increment date column by one day in csv file

Help needed. I want to increment the Date column (which is a string) in a csv file by one day.
e.g. (Date Format yyyy-MM-dd)
Col1,Col2,Col3
ABC,001,1900-01-01
XYZ,002,2000-01-01
Expected OutPut
Col1,Col2,Col3
ABC,001,1900-01-02
XYZ,002,2000-01-02
There's one standard Unix utility that has all the date magic from September 14, 1752 through December 31, 9999 built-in: the calendar cal. Instead of reinventing the wheel and do messy date calculations we will use its intelligence to our advantage. The basic problem is: given a date, is it the last day of a month? If not, simply increment the day. If yes, reset day to 1 and increment month (and possibly year).
However, the output of cal is unspecified and it may look like this:
$ cal 2 1900
February 1900
Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28
What we would need is a list of days, 1 2 3 ... 28. We can do this by skipping everything up to the "1":
set -- $(cal 2 1900)
while test $1 != 1; do shift; done
Now the number of args gives us the number of days in February 1900:
$ echo $#
28
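The same idiom works for a leap year; as a quick sanity check (not in the original answer), February 2000 should report 29 days:
$ set -- $(cal 2 2000)
$ while test $1 != 1; do shift; done
$ echo $#
29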
Putting it all together in a script:
#!/bin/sh
read -r header
printf "%s\n" "$header"
while IFS=,- read -r col1 col2 y m d; do
    case $m-$d in
        (12-31) y=$((y+1)) m=01 d=01;;
        (*)
            set -- $(cal $m $y)
            # Shift away the month and weekday names.
            while test $1 != 1; do shift; done
            # Is the day the last day of a month?
            if test ${d#0} -eq $#; then
                # Yes: increment m and reset d=01.
                m=$(printf %02d $((${m#0}+1)))
                d=01
            else
                # No: increment d.
                d=$(printf %02d $((${d#0}+1)))
            fi
            ;;
    esac
    printf "%s,%s,%s-%s-%s\n" "$col1" "$col2" $y $m $d
done
Running it on this input:
Col1,Col2,Col3
ABC,001,1900-01-01
ABC,001,1900-02-28
ABC,001,1900-12-31
XYZ,002,2000-01-01
XYZ,002,2000-02-28
XYZ,002,2000-02-29
yields
Col1,Col2,Col3
ABC,001,1900-01-02
ABC,001,1900-03-01
ABC,001,1901-01-01
XYZ,002,2000-01-02
XYZ,002,2000-02-29
XYZ,002,2000-03-01
I made one little assumption: The first two columns don't contain a - or escaped comma. If they do, the IFS=,- read will act up.
Using the date command, this can be done in awk:
awk 'BEGIN{FS=OFS=","} NR==1{print; next} {cmd="date -d \""$3" +1 day\" +%Y-%m-%d"; cmd|getline newdate; close(cmd); $3=newdate; print}' file.in
If you can extract the date from the file, you can use this:
d="1900-01-01" # date from file
date --date '#'$(( $(date --date $d +"%s") + 86400 ))
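If GNU awk is available, the per-line call to date can be avoided altogether. Here is a minimal sketch under that assumption (and assuming the platform's time routines accept dates before 1970, since the sample data starts in 1900):
gawk 'BEGIN { FS=OFS="," }
      NR==1 { print; next }                               # pass the header through unchanged
      { split($3, d, "-")                                 # d[1]=year, d[2]=month, d[3]=day
        secs = mktime(d[1] " " d[2] " " d[3] " 12 0 0")   # noon sidesteps DST transitions
        $3 = strftime("%Y-%m-%d", secs + 24*60*60)        # add one day and reformat
        print }' file.in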
