How to combine two files sequentially based on certain conditions in Unix - unix

I am trying to format files in Unix (In this case RHEL).
File 1
AAAAA|AAA|1582|YNYY
BBBBB|BAV|1234|NYYY
File 1 has 2 sample records (rows). Each record has 4 columns, and column 4 holds 4 status values.
File 2
20190103|W 2019 01
20190203|W 2019 02
20190303|W 2019 03
20190403|W 2019 04
Output has to be as follows:
AAAAA|1582|Y|20190103|W 2019 01
AAAAA|1582|N|20190203|W 2019 02
AAAAA|1582|Y|20190303|W 2019 03
AAAAA|1582|Y|20190403|W 2019 04
BBBBB|1234|N|20190103|W 2019 01
BBBBB|1234|Y|20190203|W 2019 02
BBBBB|1234|Y|20190303|W 2019 03
BBBBB|1234|Y|20190403|W 2019 04
I have tried AWK and Paste but am not able to get the required output.

Using awk
awk -F'|' '{split($4,a,""); b=$1"|"$2"|"$3} { getline < "file2"; for (i in a ) print b"|"a[i]"|"$0 }' < file1
Demo:
$ cat file1 file2
AAAAA|AAA|1582|YNYY
BBBBB|BAV|1234|NYYY
20190103|W 2019 01
20190203|W 2019 02
20190303|W 2019 03
20190403|W 2019 04
$ awk -F'|' '{split($4,a,""); b=$1"|"$2"|"$3} { getline < "file2"; for (i in a ) print b"|"a[i]"|"$0 }' < file1
AAAAA|AAA|1582|Y|20190103|W 2019 01
AAAAA|AAA|1582|N|20190103|W 2019 01
AAAAA|AAA|1582|Y|20190103|W 2019 01
AAAAA|AAA|1582|Y|20190103|W 2019 01
BBBBB|BAV|1234|N|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
$
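Explanation:
awk -F'|' <-- Set the field separator to |
'{split($4,a,""); <-- Split the 4th field into single characters and store them in array a
b=$1"|"$2"|"$3} <-- Store columns 1-3, joined by |, in variable b
getline < "file2"; <-- Read the next line of file2 into $0 (one line per file1 record)
for (i in a ) print b"|"a[i]"|"$0 <-- Loop over array a and print b, the status character, and the line read from file2
Note: getline < "file2" overwrites $0 and NF (a plain getline without a redirect also advances NR and FNR), which is why the file1 fields are saved in b and a before it is called. The order in which for (i in a) visits the elements is not guaranteed by awk.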
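The demo above pairs each file1 record with a single file2 line and keeps column 2, which is not quite the output the question asks for. A minimal sketch that should produce the requested layout is shown here, assuming the n-th status character belongs to the n-th line of file2; the function name combine is hypothetical, and file2 is read first so its lines can be cached:

```shell
#!/bin/sh
# Hypothetical helper: combine FILE1 FILE2
#   FILE1: records like AAAAA|AAA|1582|YNYY (status string in field 4)
#   FILE2: one line per status position
combine() {
    awk -F'|' '
        NR == FNR { week[FNR] = $0; n = FNR; next }   # first file on the command line: cache file2
        {
            # One output row per cached file2 line; the i-th status
            # character is paired with the i-th line. Column 2 is dropped.
            for (i = 1; i <= n; i++)
                print $1 "|" $3 "|" substr($4, i, 1) "|" week[i]
        }
    ' "$2" "$1"
}
```

Called as combine file1 file2, this uses substr rather than split with an empty separator, so it does not depend on the iteration order of for (i in a).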

Related

How can I write a shell script that calls another script for each day, from a start date up to the current date?

How can I write a shell script that calls another script for each day?
I currently have:
#!/bin/sh
./update.sh 2020 02 01
./update.sh 2020 02 02
./update.sh 2020 02 03
but I want to specify just the start date (2020 02 01) and let it run update.sh for every day up to the current date. I don't know how to manipulate dates in a shell script.
I made a stab at it, but it is rather messy, and I would prefer the script to work out the dates itself.
#!/bin/bash
for j in {4..9}
do
    for k in {1..9}
    do
        echo "update.sh" 2020 0$j 0$k
        ./update.sh 2020 0$j 0$k
    done
done
for j in {10..12}
do
    for k in {10..31}
    do
        echo "update.sh" 2020 $j $k
        ./update.sh 2020 $j $k
    done
done
for j in {1..9}
do
    for k in {1..9}
    do
        echo "update.sh" 2021 0$j 0$k
        ./update.sh 2021 0$j 0$k
    done
done
for j in {1..9}
do
    for k in {10..31}
    do
        echo "update.sh" 2021 0$j $k
        ./update.sh 2021 0$j $k
    done
done
You can use date to convert your input dates into seconds in order to compare. Also use date to add one day.
#!/bin/bash
start_date=$(date -I -d "$1")      # Input in format yyyy-mm-dd
end_date=$(date -I)                # Today in format yyyy-mm-dd
echo "Start: $start_date"
echo "Today: $end_date"
d=$start_date                      # Keep start_date in case it is needed later
end_d=$(date -d "$end_date" +%s)   # End date in seconds
while [ "$(date -d "$d" +%s)" -le "$end_d" ]; do   # Compare dates in seconds
    # Replace `echo` below with your command/script
    echo ${d//-/ }                 # Output the date with - replaced by a space
    d=$(date -I -d "$d + 1 day")   # Next day
done
In this example, I use echo but replace this with the path to your update.sh.
Sample output:
[user@server:~]$ ./dateloop.sh 2021-08-29
Start: 2021-08-29
Today: 2021-09-20
2021 08 29
2021 08 30
2021 08 31
2021 09 01
2021 09 02
2021 09 03
2021 09 04
2021 09 05
2021 09 06
2021 09 07
2021 09 08
2021 09 09
2021 09 10
2021 09 11
2021 09 12
2021 09 13
2021 09 14
2021 09 15
2021 09 16
2021 09 17
2021 09 18
2021 09 19
2021 09 20
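To call update.sh directly instead of echoing the dates, the same loop can be wrapped in a small function. The sketch below is only an illustration under a few assumptions: it relies on GNU date (for -d and -I, as shipped with RHEL), the name run_daily is hypothetical, and the end date is passed explicitly so that the word today can be used for the current date:

```shell
#!/bin/sh
# Hypothetical helper: run_daily START END COMMAND [ARGS...]
# Runs COMMAND once per day from START to END (inclusive), appending the
# year, month and day as three separate arguments. Assumes GNU date.
run_daily() (
    d=$(date -I -d "$1") || exit 1            # normalise the start to yyyy-mm-dd
    end_s=$(date -d "$2" +%s) || exit 1       # end date in seconds ("today" works too)
    shift 2
    while [ "$(date -d "$d" +%s)" -le "$end_s" ]; do
        # Unquoted on purpose: "yyyy mm dd" is split into three arguments.
        "$@" $(printf '%s' "$d" | tr '-' ' ') || exit 1
        d=$(date -I -d "$d + 1 day")          # next day
    done
)

# Example, with echo standing in for the real script:
run_daily 2020-02-27 2020-03-01 echo update.sh
```

For the original question the call would be run_daily 2020-02-01 today ./update.sh. The function body runs in a subshell, so its variables do not leak into the calling script, and the loop stops as soon as the command fails.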

file manipulation unix

cat sample_file.txt (job info extracted from Control-M)
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 7 2019 4:45 AM,Oct 7 2019 4:45 AM,1,1,Oct 6 2019 12:00 AM,Ended OK,3ppnc
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 17 2019 4:02 AM,Oct 17 2019 4:02 AM,3,1,Oct 16 2019 12:00 AM,Ended OK,3pqgq
I need to load this file into a DB table (Oracle),
but I need to make sure that the day has 2 digits (for example, 7 becomes 07).
(example: Oct 07 2019 6:32 AM)
I used this command to get all the dates in every line:
cat sample_file.txt | grep "," | while read line
do
    l_start_date=`echo $line|cut -d ',' -f4`
    l_end_date=`echo $line|cut -d ',' -f5`
    l_order_date=`echo $line|cut -d ',' -f8`
    echo $l_start_date
    echo $l_end_date
    echo $l_order_date
done
Output:
Oct 7 2019 4:45 AM
Oct 7 2019 4:45 AM
Oct 6 2019 12:00 AM
Oct 17 2019 4:02 AM
Oct 17 2019 4:02 AM
Oct 16 2019 12:00 AM
expected output:
FROM: Oct 7 2019 6:32 AM
To: Oct 07 2019 6:32 AM
I tried this sed command, but it also pads days that already have 2 digits (17):
sed 's|,Oct |,Oct 0|g' sample_file.txt
Oct 17 was changed to Oct 017:
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 07 2019 4:45 AM,Oct 07 2019 4:45 AM,1,1,Oct 06 2019 12:00 AM,Ended OK,3ppnc
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 017 2019 4:02 AM,Oct 017 2019 4:02 AM,3,1,Oct 016 2019 12:00 AM,Ended OK,3pqgq
I wish it was easier, but I only managed the following:
f.awk:
function fmt(s) {
    split(s,a," "); a[2]=substr(a[2]+100,2)   # e.g. 7 -> 107 -> "07"
    return a[1] " " a[2] " " a[3] " " a[4] " " a[5]
}
BEGIN {FS=",";OFS=","}
{
    gsub(/ +/," ")   # squeeze runs of blanks into a single blank
    $4=fmt($4); $5=fmt($5); $8=fmt($8)
    print
}
This is a little awk script that first removes superfluous blanks and then picks out particular columns (4,5 and 8) and reformats the second part of each date string into a two-digit number.
You run the script like this:
awk -f f.awk sample_file.txt
output:
upctm,pmdw_aud,pmdw_aud_ext_06-GAPAnalysYTD,Oct 07 2019 6:32 AM,Oct 07 2019 6:32 AM,17,17,Oct 06 2019 12:00 AM,Ended OK,3pu9v
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 07 2019 4:02 AM,Oct 07 2019 4:02 AM,3,1,Oct 06 2019 12:00 AM,Ended OK,3pqgq
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 07 2019 4:45 AM,Oct 07 2019 4:45 AM,1,1,Oct 06 2019 12:00 AM,Ended OK,3ppnc
With a fixed locale, you can make a fixed replacement like
sed -r 's/(Jan|Feb|Oct|Whatever) ([1-9]) /\1 0\2 /g' sample_file.txt
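The month list in that command can be avoided by matching any capitalised three-letter word in front of a single-digit day. The following is a rough sketch under the assumption that month names are always abbreviated like Oct and that no other field contains a similar "Abc 7 " pattern; the function name pad_day is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: pad single-digit days that follow a three-letter
# month abbreviation, e.g. "Oct 7 2019" -> "Oct 07 2019".
# Two-digit days are left alone because the pattern requires a blank
# directly after the single digit.
pad_day() {
    sed -E 's/([A-Z][a-z]{2}) +([1-9]) /\1 0\2 /g' "$@"
}

# Example:
printf '%s\n' 'job,Oct 7 2019 4:45 AM,Oct 17 2019 4:02 AM,Ended OK' | pad_day
```

Used as pad_day sample_file.txt, it also squeezes several blanks between the month and the day into one, which matters if the export pads single-digit days with an extra space.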

Time series SparkR missing value

I'm working with SparkR on Time Series and I have a question.
After some operations I got something like this, where DayHour represents the day and the hour of the ID's Value.
DayHour ID Value
01 00 4704 10
01 01 4705 11
.
.
.
04 23 4705 12
The problem is that I have some gaps; for example, 01 01 and 01 02 are missing here:
DayHour ID Value
01 00 4704 13
01 03 4704 12
I have to fill the gaps in the whole dataset like this:
DayHour ID Value
01 00 4704 13
01 01 4704 0
01 02 4704 0
01 03 4704 12
For each ID I have to fill every gap with a row holding the missing DayHour, the ID, and Value = 0.
A solution in either R or SparkR would be useful.
I represented your data in data frame df_r
> df_r <- data.frame(DayHour=c("01 00","01 01","01 02","01 03","01 06","01 07"),
+                    ID = c(4704,4705,4705,4706,4706,4706),Value=c(10,11,12,13,14,15))
> df_r
DayHour ID Value
1 01 00 4704 10
2 01 01 4705 11
3 01 02 4705 12
4 01 03 4706 13
5 01 06 4706 14
6 01 07 4706 15
where the missing hours are 01 04 and 01 05
# Remove the white space so that DayHour looks like "0100"
df_r$DayHour <- sub(" ", "", df_r$DayHour)
# Build every DayHour in sequence (days 01-04, hours 00-23)
x <- 0:23
y <- 1:4
all_day_hour <- data.frame(Hour = rep(x, 4), Day = rep(y, each = 24))
all_day_hour$Hour <- sprintf("%02d", all_day_hour$Hour)
all_day_hour$Day <- sprintf("%02d", all_day_hour$Day)
all_day_hour_1 <- transform(all_day_hour, DayHour = paste0(Day, Hour))
all_day_hour_1 <- all_day_hour_1[c(3)]
# Loop over the IDs and fill the gaps for each one
library(dplyr)
df.new <- data.frame()
factors <- unique(df_r$ID)
for (i in seq_along(factors)) {
  df_r1 <- filter(df_r, ID == factors[i])
  # The outer merge adds a row of NAs for every missing DayHour
  df_data1 <- merge(df_r1, all_day_hour_1, by = "DayHour", all = TRUE)
  df_data1$ID <- factors[i]                    # filled rows get the current ID
  df_data1$Value[is.na(df_data1$Value)] <- 0   # filled rows get Value 0
  df.new <- rbind(df.new, df_data1)
}

Search for text between two time frame using sed

I have log files with time stamps. I want to search for text between two time stamps using sed even if the first time stamp or the last time stamp are not present.
For example, if I search between 9:30 and 9:40 then it should return text even if neither 9:30 nor 9:40 is there but the time stamp is between 9:30 and 9:40.
I am using a sed one liner:
sed -n '/7:30:/,/7:35:/p' xyz.log
But it only returns the right data if both time stamps are present; it prints everything if one of the time stamps is missing. And if the time is in 12-hour format, it pulls data for both AM and PM.
Additionally, I have different time stamp formats for different log files so I need a generic command.
Here are some time format examples:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Some of them contain AM/PM i.e. 12 hr format and others contain 24 hr format so I have to account for that as well.
I have tried this as well but it doesn't work:
sed -n -e '/^2012-07-19 18:22:48/,/2012-07-23 22:39:52/p' history.log
With the serious medley of time formats you have to parse, sed is not the correct tool to use. I'd automatically reach for Perl, but Python would do too, and you probably could do it in awk if you put your mind to it. You need to normalize the time formats (you don't say anything about date, so I assume you're working only with the time portion).
#!/usr/bin/env perl
use strict;
use warnings;
use constant debug => 0;
my $lo = "09:30";
my $hi = "09:40";
my $lo_tm = to_minutes($lo);
my $hi_tm = to_minutes($hi);
while (<>)
{
    print "Read: $_" if debug;
    if (m/\D\d\d?:\d\d:\d\d/)
    {
        my $tm = normalize_hhmm($_);
        print "Normalized: $tm\n" if debug;
        print $_ if ($tm >= $lo_tm && $tm <= $hi_tm);
    }
}
sub to_minutes
{
    my($val) = @_;
    my($hh, $mm) = split /:/, $val;
    if ($hh < 0 || $hh > 24 || $mm < 0 || $mm >= 60 || ($hh == 24 && $mm != 0))
    {
        print STDERR "to_minutes(): garbage = $val\n";
        return undef;
    }
    return $hh * 60 + $mm;
}
sub normalize_hhmm
{
    my($line) = @_;
    my($hhmm, $ampm) = $line =~ m/\D(\d\d?:\d\d):\d\d\s*(AM|PM|am|pm)?/;
    my $tm = to_minutes($hhmm);
    if (defined $ampm)
    {
        if ($ampm =~ /(am|AM)/)
        {
            $tm -= 12 * 60 if ($tm >= 12 * 60);
        }
        else
        {
            $tm += 12 * 60 if ($tm < 12 * 60);
        }
    }
    return $tm;
}
I used the sample data:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Feb 2 00:00:00 AM
Feb 2 00:59:00 AM
Feb 2 01:00:00 AM
Feb 2 01:00:00 PM
Feb 2 11:00:00 AM
Feb 2 11:00:00 PM
Feb 2 11:59:00 AM
Feb 2 11:59:00 PM
Feb 2 12:00:00 AM
Feb 2 12:00:00 PM
Feb 2 12:59:00 AM
Feb 2 12:59:00 PM
Feb 2 00:00:00
Feb 2 00:59:00
Feb 2 01:00:00
Feb 2 11:59:59
Feb 2 12:00:00
Feb 2 12:59:59
Feb 2 13:00:00
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 2 09:41:00
Feb 2 23:00:00
Feb 2 23:59:00
Feb 2 24:00:00
Feb 3 09:30:00
Feb 3 09:40:00
and it produced what I consider the correct output:
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 3 09:30:00
Feb 3 09:40:00
I'm sure this isn't the only way to do the processing; it seems to work, though.
If you need to do date analysis, then you need to use one of the date or time manipulation packages from CPAN to deal with the problems. The code above also hard codes the times in the script. You'd probably want to handle them as command line arguments, which is perfectly doable, but isn't scripted above.
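As mentioned above, awk could do the same job, and the bounds can be taken from the command line. The following is only a sketch of that idea, not a port of the Perl script: the function name time_window is hypothetical, it looks at the first HH:MM:SS it finds on each line, it ignores the date just like the code above, and it does not validate its input:

```shell
#!/bin/sh
# Hypothetical helper: time_window LO HI [FILE...]
# Prints lines whose first HH:MM:SS time stamp lies between LO and HI
# (both given as HH:MM, 24-hour clock, compared to the minute).
# An AM/PM marker directly after the time is taken into account.
time_window() {
    lo=$1; hi=$2; shift 2
    awk -v lo="$lo" -v hi="$hi" '
    function mins(hhmm,   p) { split(hhmm, p, ":"); return p[1] * 60 + p[2] }
    BEGIN { lo_m = mins(lo); hi_m = mins(hi) }
    match($0, /[0-9][0-9]?:[0-9][0-9]:[0-9][0-9]/) {
        t = mins(substr($0, RSTART, RLENGTH))
        rest = substr($0, RSTART + RLENGTH)        # text right after the time
        if (rest ~ /^ *(AM|am)/ && t >= 720) t -= 720       # 12:xx AM -> 00:xx
        else if (rest ~ /^ *(PM|pm)/ && t < 720) t += 720   # 1:xx PM -> 13:xx
        if (t >= lo_m && t <= hi_m) print
    }' "$@"
}
```

A call such as time_window 09:30 09:40 xyz.log then prints the matching lines; with no file argument the function reads standard input.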

GridView Layout/Output

I have a website using ASP.NET 2.0 with SQL Server as the database and C# 2005 as the programming language. In one of the pages I have a GridView with the following layout.
Date -> Time -> QtyUsed
The sample values are as follows. (Since this GridView/report is generated for a specific month only, I extract and display only the day part of the date, ignoring the month and year.)
01 -> 09:00 AM -> 05
01 -> 09:30 AM -> 03
01 -> 10:00 AM -> 09
02 -> 09:00 AM -> 10
02 -> 09:30 AM -> 09
02 -> 10:00 AM -> 11
03 -> 09:00 AM -> 08
03 -> 09:30 AM -> 09
03 -> 10:00 AM -> 12
Now the user wants the layout to be like:
Time        01  02  03  04  05  06  07  08  09
----------------------------------------------
09:00 AM -> 05  10  08
09:30 AM -> 03  09  09
10:00 AM -> 09  11  12
The main requirement is that the days should be in the column header from 01 to the last date (the reason why I extracted only the day part from the date). The Timeslots should be down as rows.
From my experience with Excel, the idea of Transpose comes to my mind to solve this, but I am not sure.
Please help me in solving this problem.
Thank you.
Lalit Kumar Barik
You will have to generate the dataset accordingly. I am guessing you are doing some kind of grouping based on the hour so generate a column for each hour of the day and populate the dataset accordingly.
In SQL Server, there is a PIVOT function that may be of use.
The MSDN article specifies usage and gives an example.
The example is as follows
Table DailyIncome looks like
VendorId   IncomeDay  IncomeAmount
---------- ---------- ------------
SPIKE      FRI        100
SPIKE      MON        300
FREDS      SUN        400
SPIKE      WED        500
...
To show
VendorId   MON         TUE         WED         THU         FRI         SAT         SUN
---------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
FREDS      500         350         500         800         900         500         400
JOHNS      300         600         900         800         300         800         600
SPIKE      600         150         500         300         200         100         400
Use this select
SELECT * FROM DailyIncome
PIVOT( AVG( IncomeAmount )
FOR IncomeDay IN
([MON],[TUE],[WED],[THU],[FRI],[SAT],[SUN])) AS AvgIncomePerDay
Alternatively, you could select all of the data from DailyIncome and build a DataTable with the data pivoted.
