File renaming based on file content in UNIX

I have two patterns, QUARTERDATE and FILENAME, inside the file.
Both will have a value, as in the example below.
My requirement is to rename the file to FILENAME_QUARTERDATE.
My file(myfile.txt) will be as below:
QUARTERDATE: 03/31/14 - 06/29/14
FILENAME : LEAD
field1 field2
34567
20.0 5,678
20.0 5,678
20.0 5,678
20.0 5,678
20.0 5,678
I want the file name to be LEAD_201402.txt
The date range in the file is for Quarter 2, so I have given it as 201402.
Thanks in advance for the replies.

newname=$(awk '/QUARTERDATE/ { split($4, d, "/");
quarter=sprintf("%04d%02d", 2000+d[3], int((d[1]-1)/3)+1); }
/FILENAME/ { fn = $3; print fn "_" quarter; exit; }' "$file")
mv "$file" "$newname.txt"

How is a quarter defined?
As noted in comments to the main question, the problem is as yet ill-defined.
What data would appear in the previous quarter's QUARTERDATE line? Could Q1 ever start with a date in December of the previous year? Could the end date of Q2 ever be in July (or Q1 in April, or Q3 in October, or Q4 in January)? Since the first date of Q2 is in March, these alternatives need to be understood. Could a quarter ever start early and end late simultaneously (a 14 week quarter)?
To which the response was:
QUARTERDATE of Q2 will start as 1st Monday of April and end as last Sunday of June.
Which triggered a counter-response:
2014-03-31 is a Monday, but hardly a Monday in April. What this mainly means is that your definition of a quarter is, as yet, not clear. For example, next year, 2015-03-30 is a Monday, but 'the first Monday in April' is 2015-04-06. The last Sunday in March 2015 is 2015-03-29. So which quarter does the week (Mon) 2015-03-30 to (Sun) 2015-04-05 belong to, and why? If you don't know (both how and why), we can't help you reliably.
Plausible working hypothesis
The lessons of Y2K have been forgotten already (why else are two digits used for the year, dammit!).
Quarters run for an integral number of weeks.
Quarters start on a Monday and end on a Sunday.
Quarters remain aligned with the calendar quarters, rather than drifting around the year. (There are 13 weeks in 91 days, and 4 such quarters in a year, but there's a single extra day in an ordinary year and two extra in a leap year, which mean that occasionally you will get a 14-week quarter, to ensure things stay aligned.)
The date for the first date in a quarter will be near 1st January, 1st April, 1st July or 1st October, but the month might be December, March (as in the question), June or September.
The date for the last date in a quarter will be near 31st March, 30th June, 30th September, 31st December, but the month might be April, July, October or January.
By adding 1 modulo 12 (values in the range 1..12, not 0..11) to the start month, you should end up with a month firmly in the calendar quarter.
By subtracting 1 modulo 12 (values in the range 1..12 again) from the end month, you should end up with a month firmly in the calendar quarter.
If the data is valid, the 'start + 1' and 'end - 1' months should be in the same quarter.
The start year might be off-by-one if the start date is in December (but that indicates Q1 of the next year).
The end year might be off-by-one if the end date is in January (but that indicates Q4 of the prior year).
More resilient code
Despite the description above, it is possible to write code that detects the quarter despite any or all of the idiosyncrasies of the quarter start and end dates. This code borrows a little from Barmar's answer, but the algorithm is more resilient to the vagaries of the calendar and the quarter start and end dates.
#!/bin/sh
awk '/QUARTERDATE/ {
split($2, b, "/")
split($4, e, "/")
if (b[1] == 12) { q = 1; y = e[3] }
else if (e[1] == 1) { q = 4; y = b[3] }
else
{
if (b[3] != e[3]) {
print "Year mismatch (" $2 " vs " $4 ") in file " FILENAME
exit 1
}
m = int((b[1] + e[1]) / 2)
q = int((m - 1) / 3) + 1
y = e[3]
}
quarter = sprintf("%.4d%.2d", y + 2000, q)
}
/FILENAME/ {
print $3 "_" quarter
# exit
}' "$@"
The calculation for m adds the start month plus one to the end month minus one and then does integer division by two. With the extreme cases already taken care of, this always yields a month number that is in the correct quarter.
The comment in front of the exit associated with FILENAME allows testing more easily. When processing each file separately, as in Barmar's example, that exit is an important optimization. Note that the error message gives an empty file name if the input comes from standard input. (Offhand, I'm not sure how to print the error message to standard error rather than standard output, other than by a platform-specific technique such as print "message" > "/dev/stderr" or print "message" > "/dev/fd/2".)
Given this sample input data (semi-plausible start and end dates for 6 quarters from 2014Q1 through 2015Q2):
QUARTERDATE: 12/30/13 - 03/30/14
FILENAME : LEAD
QUARTERDATE: 03/31/14 - 06/29/14
FILENAME : LEAD
QUARTERDATE: 06/30/14 - 09/28/14
FILENAME : LEAD
QUARTERDATE: 09/29/14 - 12/28/14
FILENAME : LEAD
QUARTERDATE: 12/29/14 - 03/29/15
FILENAME : LEAD
QUARTERDATE: 03/30/15 - 06/29/15
FILENAME : LEAD
The output from this script is:
LEAD_201401
LEAD_201402
LEAD_201403
LEAD_201404
LEAD_201501
LEAD_201502
You can juggle the start and end dates of the quarters within reason and you should still get the required output. But always be wary of calendrical calculations; they are almost invariably harder than you expect.
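For comparison, here is the same month-averaging logic as a minimal Python sketch. The function names quarter_label and new_name are made up for illustration, the ".txt" extension is taken from the question, and the field positions assume the exact line layout shown in the sample file.

```python
def quarter_label(start, end):
    """Return 'YYYYQQ' for the MM/DD/YY start and end dates of a quarter."""
    start_month, _, start_year = (int(part) for part in start.split("/"))
    end_month, _, end_year = (int(part) for part in end.split("/"))
    if start_month == 12:
        # Quarter starts in late December: it is Q1 of the end year.
        quarter, year = 1, end_year
    elif end_month == 1:
        # Quarter ends in early January: it is Q4 of the start year.
        quarter, year = 4, start_year
    else:
        if start_year != end_year:
            raise ValueError(f"Year mismatch ({start} vs {end})")
        # Averaging the two months lands inside the calendar quarter.
        middle = (start_month + end_month) // 2
        quarter = (middle - 1) // 3 + 1
        year = end_year
    # Two-digit years are assumed to be 20xx, as in the awk script.
    return f"{2000 + year:04d}{quarter:02d}"


def new_name(lines, extension=".txt"):
    """Build 'NAME_YYYYQQ.txt' from the QUARTERDATE and FILENAME lines."""
    label = name = None
    for line in lines:
        fields = line.split()
        if line.startswith("QUARTERDATE") and len(fields) >= 4:
            label = quarter_label(fields[1], fields[3])
        elif line.startswith("FILENAME") and len(fields) >= 3:
            name = fields[2]
        if label and name:
            return f"{name}_{label}{extension}"
    raise ValueError("QUARTERDATE or FILENAME line not found")
```

Passing the first two lines of myfile.txt to new_name gives the name the question asks for; the actual rename (for example with os.rename) is left out so the sketch has no side effects.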


How to parse CCYY-MM-DDThh:mm:ss[.sss...] date format

As we all know, date parsing in Go has its quirks*.
However, I have now come up against needing to parse a datetime string in CCYY-MM-DDThh:mm:ss[.sss...] to a valid date in Go.
This CCYY format is a format that seems to be ubiquitous in astronomy, essentially the CC is the current century, so although we're in 2022, the century is the 21st century, meaning the date in CCYY format would be 2122.
How do I parse a date string in this format, when we can't specify a coded layout?
Should I just parse in that format, and subtract one "century" e.g., 2106 becomes 2006 in the parsed datetime...?
Has anyone come up against this niche problem before?
*(I for one would never have been able to remember January 2nd, 3:04:05 PM of 2006, UTC-0700 if it wasn't the exact time of my birth! I got lucky)
The time package does not support parsing centuries. You have to handle it yourself.
Also note that a simple subtraction is not enough, as e.g. the 21st century takes place between January 1, 2001 and December 31, 2100 (the year may start with 20 or 21). If the year ends with 00, you do not have to subtract 100 years.
I would write a helper function to parse such dates:
func parse(s string) (t time.Time, err error) {
t, err = time.Parse("2006-01-02T15:04:05[.000]", s)
if err == nil && t.Year()%100 != 0 {
t = t.AddDate(-100, 0, 0)
}
return
}
Testing it:
fmt.Println(parse("2101-12-31T12:13:14[.123]"))
fmt.Println(parse("2122-10-29T12:13:14[.123]"))
fmt.Println(parse("2100-12-31T12:13:14[.123]"))
fmt.Println(parse("2201-12-31T12:13:14[.123]"))
Which outputs (try it on the Go Playground):
2001-12-31 12:13:14.123 +0000 UTC <nil>
2022-10-29 12:13:14.123 +0000 UTC <nil>
2100-12-31 12:13:14.123 +0000 UTC <nil>
2101-12-31 12:13:14.123 +0000 UTC <nil>
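The same century rule can be sketched outside Go. The Python version below is only an illustration: parse_ccyy is a made-up name, and it assumes the square brackets in the format description mean "optional fraction" rather than literal characters, so it accepts timestamps with or without fractional seconds.

```python
from datetime import datetime


def parse_ccyy(text):
    """Parse CCYY-MM-DDThh:mm:ss[.sss...] and map the century to a calendar year."""
    # Assumption: the fractional part is optional, so try both layouts.
    for layout in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            parsed = datetime.strptime(text, layout)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognised timestamp: {text!r}")
    # Years ending in 00 close their century, so they are left unchanged.
    if parsed.year % 100 != 0:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed
```

Shifting by exactly 100 years is safe for 29 February here, because a leap year that does not end in 00 is still a leap year 100 years earlier.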
As for remembering the layout's time:
January 2, 15:04:05, 2006 (zone: -0700) is a common order in the US, and in this representation parts are in increasing numerical order: January is month 1, 15 hour is 3PM, year 2006 is 6. So the ordinals are 1, 2, 3, 4, 5, 6, 7.
I for one would never have been able to remember January 2nd, 3:04:05 PM of 2006, UTC-0700 if it wasn't the exact time of my birth! I got lucky.
The reason for the Go time package layout is that it is derived from the Unix (and Unix-like) date command format. For example, on Linux,
$ date
Fri Apr 15 08:20:43 AM EDT 2022
$
Now, count from left to right,
Month = 1
Day = 2
Hour = 3 (or 15 = 12 + 3)
Minute = 4
Second = 5
Year = 6
Note: Rob Pike is an author of The Unix Programming Environment

What does NNN mean in date format <YYMMDDhhmmssNNN><C|D|G|H>?

Hi, I have a date format that I want to convert to the correct GMT date:
<YYMMDDhhmmssNNN><C|D|G|H>
Sample value on that date:
210204215026000C
I get this explanation for part NNN :
NNN If flag is C or D then NNN is the number of hours relative to GMT,
if flag is G or H, NNN is the number of quarter hours relative to GMT
C|D|G|H C and G = Ahead of GMT, D and H = Behind GMT
But I do not understand how the number of hours relative to GMT can need 3 digits. As far as I know, 2 digits should be enough, since an offset in hours ranges from 0 to 23. Also, what does "quarter hours relative to GMT" mean?
I want to use Scala or Java.
I don’t know why they set 3 digits aside for the offset. I agree with you that 2 digits suffice for all cases. Maybe they just wanted to be very sure they would never run out of space, and maybe they even overdid this a bit. 3 digits is not a problem as long as the actual values are within the range that java.time.ZoneOffset can handle, +/-18 hours. In your example NNN is 000, so 0 hours from GMT, which certainly is OK and trivial to handle.
A quarter hour is a quarter of an hour. As Salman A mentioned in a comment, 22 quarter hours ahead of Greenwich means an offset of +05:30, currently used in Sri Lanka and India. If the producer of the string wants to use this option, they can give numbers up to 72 (still comfortably within 2 digits). 18 * 4 = 72, so 18 hours equals 72 quarter hours. To imagine a situation where 2 digits would be too little, think an offset of 25 hours. I wouldn’t think it realistic, on the other hand no one can guarantee that it will never happen.
Java solution: how to parse and convert to GMT time
I am using these constants:
private static final Pattern DATE_PATTERN
= Pattern.compile("(\\d{12})(\\d{3})(\\w)");
private static final DateTimeFormatter FORMATTER
= DateTimeFormatter.ofPattern("uuMMddHHmmss");
private static final int SECONDS_IN_A_QUARTER_HOUR
= Math.toIntExact(Duration.ofHours(1).dividedBy(4).getSeconds());
Parse and convert like this:
String sampleValue = "210204215026000C";
Matcher matcher = DATE_PATTERN.matcher(sampleValue);
if (matcher.matches()) {
LocalDateTime ldt = LocalDateTime.parse(matcher.group(1), FORMATTER);
int offsetAmount = Integer.parseInt(matcher.group(2));
char flag = matcher.group(3).charAt(0);
// offset amount denotes either hours or quarter hours
boolean quarterHours = flag == 'G' || flag == 'H';
boolean negative = flag == 'D' || flag == 'H';
if (negative) {
offsetAmount = -offsetAmount;
}
ZoneOffset offset = quarterHours
? ZoneOffset.ofTotalSeconds(offsetAmount * SECONDS_IN_A_QUARTER_HOUR)
: ZoneOffset.ofHours(offsetAmount);
OffsetDateTime dateTime = ldt.atOffset(offset);
OffsetDateTime gmtDateTime = dateTime.withOffsetSameInstant(ZoneOffset.UTC);
System.out.println("GMT time: " + gmtDateTime);
}
else {
System.out.println("Invalid value: " + sampleValue);
}
Output is:
GMT time: 2021-02-04T21:50:26Z
I think my code covers all valid cases. You will probably want to validate that the flag is indeed C, D, G or H, and also handle the potential DateTimeException and NumberFormatException from the parsing and creating the ZoneOffset (NumberFormatException should not happen).
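Since the question leaves the language fairly open, the flag handling can also be sketched in Python. This is only an illustration under the same reading of the format: to_gmt is a made-up name, and unlike the Java code above the regular expression rejects any flag other than C, D, G or H.

```python
import re
from datetime import datetime, timedelta, timezone

# 12 digits of YYMMDDhhmmss, 3 digits of offset amount, then the flag.
PATTERN = re.compile(r"(\d{12})(\d{3})([CDGH])")


def to_gmt(value):
    """Convert a <YYMMDDhhmmssNNN><C|D|G|H> string to an aware UTC datetime."""
    match = PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"Invalid value: {value!r}")
    local = datetime.strptime(match.group(1), "%y%m%d%H%M%S")
    amount = int(match.group(2))
    flag = match.group(3)
    # G and H count quarter hours, C and D count whole hours.
    minutes = amount * (15 if flag in "GH" else 60)
    # D and H mean behind GMT, so the offset is negative.
    if flag in "DH":
        minutes = -minutes
    # timezone() raises ValueError for offsets of 24 hours or more.
    offset = timezone(timedelta(minutes=minutes))
    return local.replace(tzinfo=offset).astimezone(timezone.utc)
```

For the sample value, to_gmt("210204215026000C").isoformat() gives 2021-02-04T21:50:26+00:00, matching the Java output. With 022G (22 quarter hours ahead, i.e. +05:30) the same local time becomes 16:20:26 GMT.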

Start of previous year

**DATE FROM:**
def format=new java.text.SimpleDateFormat("yyyyMMdd")
def cal=Calendar.getInstance()
cal.get(Calendar.YEAR);
cal.set(Calendar.MONTH, 0);
cal.set(Calendar.DAY_OF_MONTH, 31);
[format.format(cal.getTime())]
**DATE TO:**
def format=new java.text.SimpleDateFormat("yyyyMMdd")
def cal=Calendar.getInstance()
cal.add(Calendar.DAY_OF_MONTH,-cal.get(Calendar.DAY_OF_MONTH))
[format.format(cal.getTime())]
When the year changes (2020 to 2021), it confuses January of the previous year with January of this year.
I have to correct it so that in January (December reporting) it extracts data for the period 31.01 - 31.12 of the previous year.
The job went wrong because it extracted data from 31.01.2021 to 31.12.2020.
// retrieve details of the current date
def cal = Calendar.instance;
def currentYear = cal.get(Calendar.YEAR);
def currentMonth = cal.get(Calendar.MONTH);
// set the instance to the start of the previous month
if ( currentMonth == 0 ) {
cal.set(currentYear-1, 11, 1);
} else {
cal.set(currentYear, (currentMonth-1), 1);
}
// extract the date, and format to a string
Date previousMonthStart = cal.time;
String previousMonthStartFormatted = previousMonthStart.format('yyyy-MM-dd');
If all you are looking for is the start of the previous year as in your title then the following code:
import java.time.*
def startOfPreviousYear = LocalDate.now()
.withDayOfMonth(1)
.withMonth(1)
.minusYears(1)
println startOfPreviousYear
def againStartingFromJanuary = LocalDate.of(2021, 1, 15)
.withDayOfMonth(1)
.withMonth(1)
.minusYears(1)
println againStartingFromJanuary
demonstrates one way to accomplish this. When run, this prints (with now being today's date of 2021.Mar.10):
─➤ groovy solution.groovy
2020-01-01
2020-01-01
updated after comments
You can get the end of previous and current months with something like this:
import java.time.*
def endOfPreviousMonth = LocalDate.now()
.withDayOfMonth(1)
.minusDays(1)
def endOfCurrentMonth = LocalDate.now()
.withDayOfMonth(1)
.plusMonths(1)
.minusDays(1)
println "end of last month: ${endOfPreviousMonth}"
println "end of current month: ${endOfCurrentMonth}"
which with current date prints:
end of last month: 2021-02-28
end of current month: 2021-03-31
or if we are in January:
def endOfPreviousMonth = LocalDate.of(2021, 1, 15)
.withDayOfMonth(1)
.minusDays(1)
def endOfCurrentMonth = LocalDate.of(2021, 1, 15)
.withDayOfMonth(1)
.plusMonths(1)
.minusDays(1)
println "end of last month: ${endOfPreviousMonth}"
println "end of current month: ${endOfCurrentMonth}"
which prints:
─➤ groovy solution.groovy
end of last month: 2020-12-31
end of current month: 2021-01-31
In general, when your target is defined relative to the current date (previous month, next month, three months ago, etc.), avoid manual date arithmetic where possible. Use the APIs that Java hands you. The date classes take care of rolling years, rolling months, rolling days, leap years, and all the other details that you really do not want to spend time solving yourself.
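To show that the idea is not specific to Java or Groovy, here is a minimal Python sketch of the reporting period from the question. It rests on one reading of the requirement: DATE TO is the last day of the previous month, and DATE FROM is 31 January of whatever year DATE TO falls in. The function names are made up for illustration.

```python
from datetime import date, timedelta


def end_of_previous_month(today):
    """Last day of the month before the one containing `today`."""
    return today.replace(day=1) - timedelta(days=1)


def reporting_period(today):
    """Return (date_from, date_to) for the job described in the question."""
    date_to = end_of_previous_month(today)
    # Take the year from date_to, not from today, so that a run in
    # January reports on the previous year rather than mixing years.
    date_from = date(date_to.year, 1, 31)
    return date_from, date_to
```

Run on 15 January 2021 this yields 2020-01-31 to 2020-12-31, which is the period the question says the job should have extracted.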

To calculate Moving/Rolling back Weekly (7 days) Sum:

Please help me calculate a moving (rolling back) weekly sum of Amount ($4) per Distributor ($3) and per rolling date.
I want to set variables like
RollingStartDate ==01/05/2015 and RollingInterval==7 and RollingEndDate ==08/05/2015
For Example :
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: 8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for 8th May 2015 Rolling 7 Days data set
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I am able to obtain the above output from this command :
awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in b) print i","b[i]}'
Kindly suggest how to derive weekly split-up data sets then Sum.
Desired Output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit#1
1.
The logic is to find the sum of Amount billed to each distributor over a 7-day range, i.e. if I need to calculate the sum for 1st May then I need to consider the line items from 1st May, 30th Apr, 29th Apr, 28th Apr, 27th Apr, 26th Apr and 25th Apr. That is equivalent to 1st May minus 6 days back. Likewise, the 2nd May rolling range runs from 2nd May back to 26th Apr (2nd May minus 6 days).
2.
Date format is DD/MM/YYYY - 02/05/2015 is 2nd May
Since the file contains 2 to 3 months of details, I don't want to take the first date in the file (25/04/2015) and then do the 6-days-back analysis from there. Hence "RollingStartDate" says from which date the rolling dates start, and "RollingInterval" selects a "7 days", "14 days" or "30 days (monthly)" moving-back analysis.
"RollingEndDate" excludes any future-dated data in the file; in this case, line items dated 09th or 15th May need to be excluded.
Here's a solution that just excludes dates that don't have 7 days before them instead of requiring a specific start/stop range:
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
if (begSecs=="") {
begSecs = endSecs + ((window-1) * secsPerDay)
}
amount[endSecs][$3] += $4
dists[$3]
}
END {
for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
for (dayNr=1; dayNr<=window; dayNr++) {
rollSecs = currSecs - ((dayNr-1) * secsPerDay)
for (dist in dists) {
sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
}
}
for (dist in dists) {
print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
delete sum[dist]
}
}
}
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
To use some different window size than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
The above uses GNU awk for true 2D arrays and time functions. Hopefully it's clear enough that you can make any modifications you need to include/exclude specific date ranges.

correct sum of hours in access

I have two columns in an Access 2010 database with a calculated field:
time_from time_until calculated_field(time_until-time_from)
10:45 15:00 4:15
13:15 16:00 2:45
11:10 16:00 4:50
08:00 15:00 7:00
08:00 23:00 15:00
Now so far, it is good: calculated field did its job to tell me total hours and mins...
now, I need a sum of a calculated field....
I put in an expression builder: =Sum([time_until]-[time_from])
I guess total sum should give me 33:50... but it gives me some 9:50. why is this happening? Is there a way to fix this?
update:
when I put like this:
=Format(Sum([vrijeme_do]-[vrijeme_od])*24)
I get a decimal point number... which I suppose is correct....
for example, 25hrs and 30mins is shown as 25,5
but, how do I format this 25,5 to look like 25:30?
As @Arvo mentioned in his comment, this is a formatting problem. Your expected result for the sum of calculated_field is 33:50. However that sum is a Date/Time value, and since the number of hours is greater than 24, the day portion of the Date/Time is advanced by 1 and the remainder 9:50 is displayed as the time. Apparently your total is formatted to display only the time portion; the day portion is not displayed.
But the actual Date/Time value for the sum of calculated_field is #12/31/1899 09:50#. You can use a custom function to display that value in your desired format:
? duration_hhnn(#12/31/1899 09:50#)
33:50
This is the function:
Public Function duration_hhnn(ByVal pInput As Date) As String
Dim lngDays As Long
Dim lngMinutes As Long
Dim lngHours As Long
Dim strReturn As String
lngDays = Int(pInput)
lngHours = Hour(pInput)
lngMinutes = Minute(pInput)
lngHours = lngHours + (lngDays * 24)
strReturn = lngHours & ":" & Format(lngMinutes, "00")
duration_hhnn = strReturn
End Function
Note the function returns a string value so you can't do further date arithmetic on it directly.
Similar to the answer from @HansUp, it can be done without VBA code like so:
Format(24 * Int(SUM(elapsed_time)) + Hour(SUM(elapsed_time)), "0") & ":" & Format(SUM(elapsed_time), "Nn")
I guess you are trying to show the total in a text box? the correct expression would be =SUM([calculated_field_name]).
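The arithmetic behind both answers is easy to check outside Access. In this small Python sketch (format_duration is a made-up name, and durations are assumed to be non-negative), the total is kept as a duration and the hours are allowed to run past 24 instead of rolling over into a day:

```python
from datetime import timedelta


def format_duration(total):
    """Format a non-negative timedelta as h:mm without wrapping at 24 hours."""
    minutes = round(total.total_seconds() / 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}"


# The five calculated_field values from the question.
durations = [
    timedelta(hours=4, minutes=15),
    timedelta(hours=2, minutes=45),
    timedelta(hours=4, minutes=50),
    timedelta(hours=7),
    timedelta(hours=15),
]
total = sum(durations, timedelta())
print(format_duration(total))
```

The total here is 1 day plus 9:50, which is why Access shows 9:50 when only the time portion is displayed. The same function also covers the update to the question: format_duration(timedelta(hours=25.5)) turns the decimal 25,5 into 25:30.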
