File manipulation in Unix

cat sample_file.txt(Extracted job info from Control-M)
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 7 2019 4:45 AM,Oct 7 2019 4:45 AM,1,1,Oct 6 2019 12:00 AM,Ended OK,3ppnc
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 17 2019 4:02 AM,Oct 17 2019 4:02 AM,3,1,Oct 16 2019 12:00 AM,Ended OK,3pqgq
I need to load this file into a DB table (Oracle),
but I need to make sure that the day always has two digits (for example, 7 becomes 07).
(example: Oct 07 2019 6:32 AM)
I used this command to get all the dates in every line:
cat sample_file.txt | grep "," | while read line
do
l_start_date=`echo $line|cut -d ',' -f4`
l_end_date=`echo $line|cut -d ',' -f5`
l_order_date=`echo $line|cut -d ',' -f8`
echo $l_start_date
echo $l_end_date
echo $l_order_date
done
Output:
Oct 7 2019 4:45 AM
Oct 7 2019 4:45 AM
Oct 6 2019 12:00 AM
Oct 17 2019 4:02 AM
Oct 17 2019 4:02 AM
Oct 16 2019 12:00 AM
expected output:
FROM: Oct 7 2019 6:32 AM
To: Oct 07 2019 6:32 AM
I used this sed command, but it also adds a zero to days that already have two digits (17):
sed 's|,Oct |,Oct 0|g' sample_file.txt
Oct 17 was changed to Oct 017:
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 07 2019 4:45 AM,Oct 07 2019 4:45 AM,1,1,Oct 06 2019 12:00 AM,Ended OK,3ppnc
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 017 2019 4:02 AM,Oct 017 2019 4:02 AM,3,1,Oct 016 2019 12:00 AM,Ended OK,3pqgq

I wish it was easier, but I only managed the following:
f.awk:
function fmt(s) {
split(s,a," "); a[2]=substr(a[2]+100,2)
return a[1] " " a[2] " "a[3] " " a[4] " " a[5]
}
BEGIN {FS=",";OFS=","}
{gsub(/ +/," ");
$4=fmt($4); $5=fmt($5); $8=fmt($8);
print}
This is a little awk script that first removes superfluous blanks and then picks out particular columns (4,5 and 8) and reformats the second part of each date string into a two-digit number.
You run the script like this:
awk -f f.awk sample_file.txt
output:
upctm,pmdw_aud,pmdw_aud_ext_06-GAPAnalysYTD,Oct 07 2019 6:32 AM,Oct 07 2019 6:32 AM,17,17,Oct 06 2019 12:00 AM,Ended OK,3pu9v
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 07 2019 4:02 AM,Oct 07 2019 4:02 AM,3,1,Oct 06 2019 12:00 AM,Ended OK,3pqgq
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 07 2019 4:45 AM,Oct 07 2019 4:45 AM,1,1,Oct 06 2019 12:00 AM,Ended OK,3ppnc
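The two-digit conversion in fmt() relies on a small trick: adding 100 to the day and dropping the first character. A minimal sketch of that step in isolation, next to the printf alternative, might look like this (pad2 is a hypothetical helper name, not part of the answer):

```shell
# pad2 is a hypothetical helper name; it prints both variants for comparison.
pad2() {
    awk -v n="$1" 'BEGIN {
        print substr(n + 100, 2)   # 7 + 100 = 107; dropping the first character leaves "07"
        printf "%02d\n", n         # same result via a format string
    }'
}
```

Calling pad2 7 should print 07 twice, and pad2 17 should print 17 twice, so days that already have two digits are left alone.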

With a fixed locale, you can make a fixed replacement like
sed -r 's/(Jan|Feb|Oct|Whatever) ([1-9]) /\1 0\2 /g' sample_file.txt
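If listing the month names is not wanted, the same idea can probably be written against any three-letter English month abbreviation. The sketch below assumes a sed that supports -E (GNU and BSD sed do) and that every date field starts right after a comma or at the start of the line, as in the sample file; pad_days is a hypothetical name:

```shell
# pad_days: zero-pad single-digit days in fields shaped like "Oct 7 2019 ...".
# Assumption: a date field starts at the beginning of the line or after a comma.
pad_days() {
    sed -E 's/(^|,)([A-Z][a-z][a-z]) ([1-9]) /\1\2 0\3 /g' "$@"
}
```

Usage would be pad_days sample_file.txt. Because the pattern requires a space right after the single digit, a day such as 17 is not touched.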

Related

How can I write a shell script that calls another script for each day from a start date up to the current date

How can I write a shell script that calls another script for each day?
Currently I have:
#!/bin/sh
./update.sh 2020 02 01
./update.sh 2020 02 02
./update.sh 2020 02 03
But I want to specify just the start date (2020 02 01) and let it run update.sh for every day up to the current date, and I don't know how to manipulate dates in a shell script.
I made a stab at it, but it is rather messy; I would prefer it if the script could work out the dates itself.
#!/bin/bash
for j in {4..9}
do
for k in {1..9}
do
echo "update.sh" 2020 0$j 0$k
./update.sh 2020 0$j 0$k
done
done
for j in {10..12}
do
for k in {10..31}
do
echo "update.sh" 2020 $j $k
./update.sh 2020 $j $k
done
done
for j in {1..9}
do
for k in {1..9}
do
echo "update.sh" 2021 0$j 0$k
./update.sh 2021 0$j 0$k
done
done
for j in {1..9}
do
for k in {10..31}
do
echo "update.sh" 2021 0$j $k
./update.sh 2021 0$j $k
done
done
You can use date to convert your input dates into seconds in order to compare. Also use date to add one day.
#!/bin/bash
start_date=$(date -I -d "$1") # Input in format yyyy-mm-dd
end_date=$(date -I) # Today in format yyyy-mm-dd
echo "Start: $start_date"
echo "Today: $end_date"
d=$start_date # In case you want start_date for later?
end_d=$(date -d "$end_date" +%s) # End date in seconds
while [ $(date -d "$d" +%s) -le $end_d ]; do # Check dates in seconds
# Replace `echo` in the below with your command/script
echo ${d//-/ } # Output the date but replace - with [space]
d=$(date -I -d "$d + 1 day") # Next day
done
In this example, I use echo but replace this with the path to your update.sh.
Sample output:
[user@server:~]$ ./dateloop.sh 2021-08-29
Start: 2021-08-29
Today: 2021-09-20
2021 08 29
2021 08 30
2021 08 31
2021 09 01
2021 09 02
2021 09 03
2021 09 04
2021 09 05
2021 09 06
2021 09 07
2021 09 08
2021 09 09
2021 09 10
2021 09 11
2021 09 12
2021 09 13
2021 09 14
2021 09 15
2021 09 16
2021 09 17
2021 09 18
2021 09 19
2021 09 20
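The same loop can be wrapped in a function so that the end date is a parameter as well. This is only a sketch under the same assumption as the script above (GNU date with -d and -I); each_day is a hypothetical name, and -u is added so the day arithmetic is done in UTC:

```shell
# each_day START END: print every date from START to END (inclusive) as "yyyy mm dd".
# Assumes GNU date; each_day is a hypothetical helper name.
each_day() {
    d=$(date -u -I -d "$1") || return 1
    end_s=$(date -u -d "$2" +%s) || return 1
    while [ "$(date -u -d "$d" +%s)" -le "$end_s" ]; do
        echo "$d" | tr '-' ' '            # 2020-02-01 -> 2020 02 01
        d=$(date -u -I -d "$d +1 day")    # advance one calendar day
    done
}
```

One way to drive update.sh from it would be: each_day 2020-02-01 "$(date -u -I)" | while read -r y m d; do ./update.sh "$y" "$m" "$d"; done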

How to combine two files sequentially based on certain conditions in Unix

I am trying to format files in Unix (In this case RHEL).
File 1
AAAAA|AAA|1582|YNYY
BBBBB|BAV|1234|NYYY
File 1 has 2 sample records (rows). There are 4 columns in each record. In column 4 we have 4 status values, one character each.
File 2
20190103|W 2019 01
20190203|W 2019 02
20190303|W 2019 03
20190403|W 2019 04
Output has to be as follows:
AAAAA|1582|Y|20190103|W 2019 01
AAAAA|1582|N|20190203|W 2019 02
AAAAA|1582|Y|20190303|W 2019 03
AAAAA|1582|Y|20190403|W 2019 04
BBBBB|1234|N|20190103|W 2019 01
BBBBB|1234|Y|20190203|W 2019 02
BBBBB|1234|Y|20190303|W 2019 03
BBBBB|1234|Y|20190403|W 2019 04
I have tried AWK and Paste but am not able to get the required output.
Using awk
awk -F'|' '{split($4,a,""); b=$1"|"$2"|"$3} { getline < "file2"; for (i in a ) print b"|"a[i]"|"$0 }' < file1
Demo:
$ cat file1 file2
AAAAA|AAA|1582|YNYY
BBBBB|BAV|1234|NYYY
20190103|W 2019 01
20190203|W 2019 02
20190303|W 2019 03
20190403|W 2019 04
$ awk -F'|' '{split($4,a,""); b=$1"|"$2"|"$3} { getline < "file2"; for (i in a ) print b"|"a[i]"|"$0 }' < file1
AAAAA|AAA|1582|Y|20190103|W 2019 01
AAAAA|AAA|1582|N|20190103|W 2019 01
AAAAA|AAA|1582|Y|20190103|W 2019 01
AAAAA|AAA|1582|Y|20190103|W 2019 01
BBBBB|BAV|1234|N|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
$
Explanation:
awk -F'|' <-- Set the field separator to |
'{split($4,a,""); <-- Split the 4th field into single characters and store them in array a
b=$1"|"$2"|"$3} <-- Store columns 1-3 in variable b
getline < "file2"; <-- Read the next record from file2 (one record per file1 row)
for (i in a ) print b"|"a[i]"|"$0 <-- Loop through array a and print b, the status character and the record read from file2
Note: getline < "file2" replaces $0 and NF with the record read from file2; NR is not changed by this form of getline.
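Note that the demo output above repeats one file2 record for all four status characters and keeps column 2, whereas the question asks for the i-th status character to be paired with the i-th line of file2 and for column 2 to be dropped. A sketch that follows the requested layout, assuming file2 is small enough to hold in memory and is not empty, could read file2 first (combine_status is a hypothetical name):

```shell
# combine_status FILE1 FILE2 (hypothetical helper name)
combine_status() {
    awk -F'|' '
        # First file on the command line (FILE2): remember every line in order.
        NR == FNR { period[++n] = $0; next }
        # Second file (FILE1): one output row per status character in column 4.
        {
            for (i = 1; i <= n; i++)
                print $1 "|" $3 "|" substr($4, i, 1) "|" period[i]
        }
    ' "$2" "$1"
}
```

With the sample files, combine_status file1 file2 should print eight rows, starting with AAAAA|1582|Y|20190103|W 2019 01. Using substr() with an index also avoids depending on the iteration order of for (i in a), which awk does not guarantee.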

Convert date with Time Zone formats in R

I have my dates in the following formats: Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time), 43167, or Fri May 18 2018 00:00:00 GMT-0700 (PDT), all mixed in one column. What would be the easiest way to convert all of these to a simple YYYY-mm-dd (2018-04-13) format? Here is the column:
dates <- c('Fri May 18 2018 00:00:00 GMT-0700 (PDT)',
'43203',
'Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'43167','43201',
'Fri May 18 2018 00:00:00 GMT-0700 (PDT)',
'Tue May 29 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'Tue May 01 2018 00:00:00 GMT-0700 (PDT)',
'Fri May 25 2018 00:00:00 GMT-0700 (Pacific Standard Time)',
'Fri Apr 06 2018 00:00:00 GMT-0700 (PDT)','43173')
Expected format:2018-05-18, 2018-04-13, 2018-04-25, ...
I believe similar questions have been asked several times before. However, there
is a crucial point which needs special attention:
What is the origin for the dates given as integer (or as character string which can be converted to integer to be exact)?
If the data is imported from the Windows version of Excel, origin = "1899-12-30" has to be used. For details, see the Example section in help(as.Date) and the Other Applications section of the R Help Desk article by Gabor Grothendieck and Thomas Petzoldt.
For conversion of the date time strings, the mdy_hms() function from the lubridate package is used. In addition, I am using data.table syntax for its conciseness:
library(data.table)
data.table(dates)[!dates %like% "^\\d+$", new_date := as.Date(lubridate::mdy_hms(dates))][
is.na(new_date), new_date := as.Date(as.integer(dates), origin = "1899-12-30")][]
dates new_date
1: Fri May 18 2018 00:00:00 GMT-0700 (PDT) 2018-05-18
2: 43203 2018-04-13
3: Wed Apr 25 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-04-25
4: 43167 2018-03-08
5: 43201 2018-04-11
6: Fri May 18 2018 00:00:00 GMT-0700 (PDT) 2018-05-18
7: Tue May 29 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-05-29
8: Tue May 01 2018 00:00:00 GMT-0700 (PDT) 2018-05-01
9: Fri May 25 2018 00:00:00 GMT-0700 (Pacific Standard Time) 2018-05-25
10: Fri Apr 06 2018 00:00:00 GMT-0700 (PDT) 2018-04-06
11: 43173 2018-03-14
The assumption that the integers use the origin of the Windows version of Excel seems to hold.
If only a vector of Date values is required:
data.table(dates)[!dates %like% "^\\d+$", new_date := as.Date(lubridate::mdy_hms(dates))][
is.na(new_date), new_date := as.Date(as.integer(dates), origin = "1899-12-30")][, new_date]
[1] "2018-05-18" "2018-04-13" "2018-04-25" "2018-03-08" "2018-04-11" "2018-05-18"
[7] "2018-05-29" "2018-05-01" "2018-05-25" "2018-04-06" "2018-03-14"
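The integer conversion can be cross-checked outside R. The sketch below assumes GNU date and the same Windows Excel origin of 1899-12-30; under that origin, serial 25569 falls on 1970-01-01, so the serial can be turned into a Unix timestamp directly (excel_serial_to_date is a hypothetical name):

```shell
# excel_serial_to_date N: convert an Excel (Windows) day serial to yyyy-mm-dd.
# Assumption: origin 1899-12-30, which puts serial 25569 on 1970-01-01.
excel_serial_to_date() {
    date -u -d "@$(( ($1 - 25569) * 86400 ))" +%F
}
```

For example, excel_serial_to_date 43203 should print 2018-04-13, matching row 2 of the table above.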

Converting UTC Time to Local Time with Days of Week and Date Included

I have the following 2 columns as part of a larger data frame. The Timezone_Offset is the difference in hours for the local time (US West Coast in the data I'm looking at). In other words, UTC + Offset = Local Time.
I'm looking to convert the UTC time to the local time, while also correctly changing the day of the week and date, if necessary. For instance, here are the first 5 rows of the two columns.
UTC Timezone_Offset
Sun Apr 08 02:42:03 +0000 2012 -7
Sun Jul 01 03:27:20 +0000 2012 -7
Wed Jul 11 04:40:18 +0000 2012 -7
Sat Nov 17 01:31:36 +0000 2012 -8
Sun Apr 08 20:50:30 +0000 2012 -7
Things get tricky when the day of the week and date also have to be changed. For instance, looking at the first row, the local time should be Sat Apr 07 19:42:03 +0000 2012. In the second row, the month also has to be changed.
Sorry, I'm fairly new to R. Could someone possibly explain how to do this? Thank you so much in advance.
Parse as UTC, then apply the offset in seconds, i.e. the offset in hours times 60*60:
data <- read.csv(text="UTC, Timezone_Offset
Sun Apr 08 02:42:03 +0000 2012, -7
Sun Jul 01 03:27:20 +0000 2012, -7
Wed Jul 11 04:40:18 +0000 2012, -7
Sat Nov 17 01:31:36 +0000 2012, -8
Sun Apr 08 20:50:30 +0000 2012, -7", stringsAsFactors=FALSE)
data$pt <- as.POSIXct(strptime(data$UTC, "%a %b %d %H:%M:%S %z %Y", tz="UTC"))
data$local <- data$pt + data$Timezone_Offset*60*60
Result:
> data[,3:4]
pt local
1 2012-04-08 02:42:03 2012-04-07 19:42:03
2 2012-07-01 03:27:20 2012-06-30 20:27:20
3 2012-07-11 04:40:18 2012-07-10 21:40:18
4 2012-11-17 01:31:36 2012-11-16 17:31:36
5 2012-04-08 20:50:30 2012-04-08 13:50:30
>
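The arithmetic itself (parse as UTC, then add the offset in seconds) is not specific to R. As a rough cross-check, assuming GNU date and a timestamp already rewritten as yyyy-mm-dd HH:MM:SS, the same steps could be sketched like this (utc_to_local is a hypothetical name):

```shell
# utc_to_local "yyyy-mm-dd HH:MM:SS" OFFSET_HOURS
# Parses the timestamp as UTC, adds OFFSET_HOURS * 3600 seconds, prints the shifted wall-clock time.
utc_to_local() {
    secs=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$(( secs + $2 * 3600 ))" '+%F %T'
}
```

utc_to_local "2012-04-08 02:42:03" -7 should print 2012-04-07 19:42:03, the same value as row 1 of the local column above.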

Search for text between two time frame using sed

I have log files with time stamps. I want to search for text between two time stamps using sed even if the first time stamp or the last time stamp are not present.
For example, if I search between 9:30 and 9:40 then it should return text even if neither 9:30 nor 9:40 is there but the time stamp is between 9:30 and 9:40.
I am using a sed one liner:
sed -n '/7:30:/,/7:35:/p' xyz.log
But it only returns the right data if both time stamps are present; it prints everything if one of the time stamps is missing. And if the time is in 12 hr format, it pulls data for both AM and PM.
Additionally, I have different time stamp formats for different log files so I need a generic command.
Here are some time format examples:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Some of them contain AM/PM i.e. 12 hr format and others contain 24 hr format so I have to account for that as well.
I have tried this as well but it doesn't work:
sed -n -e '/^2012-07-19 18:22:48/,/2012-07-23 22:39:52/p' history.log
With the serious medley of time formats you have to parse, sed is not the correct tool to use. I'd automatically reach for Perl, but Python would do too, and you probably could do it in awk if you put your mind to it. You need to normalize the time formats (you don't say anything about date, so I assume you're working only with the time portion).
#!/usr/bin/env perl
use strict;
use warnings;
use constant debug => 0;
my $lo = "09:30";
my $hi = "09:40";
my $lo_tm = to_minutes($lo);
my $hi_tm = to_minutes($hi);
while (<>)
{
print "Read: $_" if debug;
if (m/\D\d\d?:\d\d:\d\d/)
{
my $tm = normalize_hhmm($_);
print "Normalized: $tm\n" if debug;
print $_ if ($tm >= $lo_tm && $tm<= $hi_tm);
}
}
sub to_minutes
{
my($val) = @_;
my($hh, $mm) = split /:/, $val;
if ($hh < 0 || $hh > 24 || $mm < 0 || $mm >= 60 || ($hh == 24 && $mm != 0))
{
print STDERR "to_minutes(): garbage = $val\n";
return undef;
}
return $hh * 60 + $mm;
}
sub normalize_hhmm
{
my($line) = @_;
my($hhmm, $ampm) = $line =~ m/\D(\d\d?:\d\d):\d\d\s*(AM|PM|am|pm)?/;
my $tm = to_minutes($hhmm);
if (defined $ampm)
{
if ($ampm =~ /(am|AM)/)
{
$tm -= 12 * 60 if ($tm >= 12 * 60);
}
else
{
$tm += 12 * 60 if ($tm < 12 * 60);
}
}
return $tm;
}
I used the sample data:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Feb 2 00:00:00 AM
Feb 2 00:59:00 AM
Feb 2 01:00:00 AM
Feb 2 01:00:00 PM
Feb 2 11:00:00 AM
Feb 2 11:00:00 PM
Feb 2 11:59:00 AM
Feb 2 11:59:00 PM
Feb 2 12:00:00 AM
Feb 2 12:00:00 PM
Feb 2 12:59:00 AM
Feb 2 12:59:00 PM
Feb 2 00:00:00
Feb 2 00:59:00
Feb 2 01:00:00
Feb 2 11:59:59
Feb 2 12:00:00
Feb 2 12:59:59
Feb 2 13:00:00
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 2 09:41:00
Feb 2 23:00:00
Feb 2 23:59:00
Feb 2 24:00:00
Feb 3 09:30:00
Feb 3 09:40:00
and it produced what I consider the correct output:
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 3 09:30:00
Feb 3 09:40:00
I'm sure this isn't the only way to do the processing; it seems to work, though.
If you need to do date analysis, then you need to use one of the date or time manipulation packages from CPAN to deal with the problems. The code above also hard codes the times in the script. You'd probably want to handle them as command line arguments, which is perfectly doable, but isn't scripted above.
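As mentioned above, awk could do this as well, and the bounds can be passed as arguments. The following is only a sketch of the same normalization idea, not a port of the Perl script: it takes the first h:mm:ss or hh:mm:ss it finds on a line, applies an AM/PM marker that directly follows it, and ignores dates entirely (time_window is a hypothetical name):

```shell
# time_window LO HI: print lines from stdin whose time of day lies between LO and HI (HH:MM, inclusive).
time_window() {
    awk -v lo="$1" -v hi="$2" '
        # Convert "hh:mm" or "hh:mm:ss" to minutes since midnight (seconds are ignored).
        function mins(hhmm,   p) { split(hhmm, p, ":"); return p[1] * 60 + p[2] }
        BEGIN { lo_m = mins(lo); hi_m = mins(hi) }
        match($0, /[0-9][0-9]?:[0-9][0-9]:[0-9][0-9]/) {
            t = mins(substr($0, RSTART, RLENGTH))
            rest = substr($0, RSTART + RLENGTH)
            if (rest ~ /^ *[Aa][Mm]/ && t >= 720) t -= 720       # 12:xx AM -> 00:xx
            else if (rest ~ /^ *[Pp][Mm]/ && t < 720) t += 720   # 1:xx PM -> 13:xx
            if (t >= lo_m && t <= hi_m) print
        }
    '
}
```

It would be called as time_window 09:30 09:40 < xyz.log. As in the Perl version, seconds are dropped before comparing, so 09:40:59 still counts as inside the window.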
