How to add a start time in date format and a run time in milliseconds to find the end time in a unix script? - unix

I have a file with a column of start times in the format shown below.
2019-10-30T08:04:30Z
2019-10-30T08:04:25Z
2019-10-30T08:04:30Z
I also have a file with the run times of the executed jobs, given in milliseconds.
2647ms
360ms
10440ms
.
.
.
How do I add the respective rows of both columns in those files and write the resulting end times to a separate file?

paste start_time.txt execution_time.txt | while IFS="$(printf '\t')" read -r f1 f2
do
# strip the trailing "ms" and convert the execution time to seconds
execution_time_ms="${f2%ms}"
execution_time_s=$(awk "BEGIN {printf \"%.3f\", $execution_time_ms/1000}")
# build a date string that adds those seconds to the start time
date_format="$f1 +$execution_time_s seconds"
# compute the end time from that string
end_time=$(date --date="$date_format" +'%Y-%m-%d %H:%M:%S')
end_time="${end_time/ /T}"
end_time+='Z'
# append the end time to the output file
echo "$end_time" >> end_time.txt
done
This is a script which solves your problem.
As you mentioned, it uses three files:
start_time.txt (Input file)
execution_time.txt (Input file)
end_time.txt (Output file)
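
If GNU date is available, a shorter variant converts the start time to epoch seconds, adds the runtime, and formats the sum back into a timestamp. This is just a sketch, assuming the same two input files and that truncating the runtime to whole seconds is acceptable:
# Alternative sketch: epoch arithmetic with GNU date; the integer division
# truncates the runtime to whole seconds.
paste start_time.txt execution_time.txt | while IFS="$(printf '\t')" read -r start ms
do
end=$(( $(date --date="$start" +%s) + ${ms%ms} / 1000 ))
date --utc --date="@$end" +'%Y-%m-%dT%H:%M:%SZ'
done > end_time.txt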

Related

Command to perform full outer join with duplicate entries in key/join column

I have three files. I need to join them based on one column and perform some transformations.
file1.dat (Column 1 is used for joining)
123,is1,ric1,col1,smbc1
123,is2,ric1,col1,smbc1
234,is3,ric3,col3,smbc2
345,is4,ric4,,smbc2
345,is4,,col5,smbc2
file2.dat (Column 1 is used for joining)
123,abc
234,bcd
file3.dat (Column 4 is used for joining)
r0c1,r0c2,r0c3,123,r0c5,r0c6,r0c7,r0c8
r2c1,r2c2,r2c3,123,r2c5,r2c6,r2c7,r2c8
r3c1,r3c2,r3c3,234,r3c5,r3c6,r3c7,r3c8
r4c1,r4c2,r4c3,345,r4c5,r4c6,r4c7,r4c8
Expected Output (output.dat)
123,r0c5,is1,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r0c5,is2,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
123,r2c5,is2,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,bcd,r3c8,r3c6,col3,r3c7,r3c1,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c1,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c1,r4c2,r4c3
I wrote the following awk command.
awk '
BEGIN {FS=OFS=","}
FILENAME == ARGV[1] { temp_join_one[$1] = $2"|"$3"|"$4"|"$5; next}
FILENAME == ARGV[2] { exchtbunload[$1] = $2; next}
FILENAME == ARGV[3] { s_temp_join_one = temp_join_one[$4];
split(s_temp_join_one, array_temp_join_one,"|");
v3=(array_temp_join_one[1]==""?"N/A":array_temp_join_one[1]);
v4=(array_temp_join_one[2]==""?"N/A":array_temp_join_one[2]);
v5=(array_temp_join_one[4]==""?"N/A":array_temp_join_one[4]);
v6=(exchtbunload[$4]==""?"N/A":exchtbunload[$4]);
v9=(array_temp_join_one[3]==""?"N/A":array_temp_join_one[3]);
v11=($2=""?"N/A":$2);
print $4, $5, v3, v4, v5, v6, $8, $6, v9, $7, $1, v11, $3 > "output.dat" }
' file1.dat file2.dat file3.dat
I need to join all three files.
The final output file should have all the values from file3 irrespective of whether they are in other two files and the corresponding columns should be empty(or N/A) if it is not present in other two files. (The order of the columns is not a very big problem. I can use awk to rearrange them.)
But my problem is, as the key is not unique, I am not getting the expected output. My output has only three lines.
I tried to apply the solution suggested using a join condition. It works with smaller files, but the files I have are close to 3-5 GB in size, and they are in numerical order, not lexicographical order. Sorting them looks like it would take a lot of time.
Any suggestion would be helpful.
Thanks in advance.
With join, assuming the files are sorted by the key:
$ join -t, -1 1 -2 4 <(join -t, -a1 -a2 -e "N/A" -o1.1,1.2,1.3,1.4,1.5,2.1 file1 file2) \
file3 -o1.1,2.5,1.2,1.3,1.5,1.6,2.8,2.6,1.4,2.7,2.2,2.3
123,r0c5,is1,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
123,r0c5,is2,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is2,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,234,r3c8,r3c6,col3,r3c7,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c2,r4c3
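Since the asker's files are in numerical rather than lexicographical order, join may reject them as unsorted. One workaround, sketched here on the assumption that there is enough temporary disk space for sorted copies, is to re-sort each file on its key as plain text first:
# join wants lexicographic order on the key column; file1 and file2 are
# keyed on column 1, file3 on column 4.
sort -t, -k1,1 file1 > file1.sorted
sort -t, -k1,1 file2 > file2.sorted
sort -t, -k4,4 file3 > file3.sorted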
I really like the answer using join, but it does require that the files are sorted by the key column. Here's a version that doesn't have that restriction. Working under the theory that the best tool for doing database-like things is a database, it imports the CSV files into tables of a temporary SQLite database and then runs a SELECT on them to get your desired output:
(edit: Revised version based on new information about the data)
#!/bin/sh
# Usage: ./merge.sh file1.dat file2.dat file3.dat > output.dat
file1=$1
file2=$2
file3=$3
rm -f scratch.db
sqlite3 -batch -noheader -csv -nullvalue "N/A" scratch.db <<EOF | perl -pe 's#(?:^|,)\K""(?=,|$)#N/A#g'
CREATE TABLE file1(f1_1 INTEGER, f1_2, f1_3, f1_4, f1_5);
CREATE TABLE file2(f2_1 INTEGER, f2_2);
CREATE TABLE file3(f3_1, f3_2, f3_3, f3_4 INTEGER, f3_5, f3_6, f3_7, f3_8);
.import $file1 file1
.import $file2 file2
.import $file3 file3
-- Build indexes to speed up joining and sorting gigs of data.
CREATE INDEX file1_idx ON file1(f1_1);
CREATE INDEX file2_idx ON file2(f2_1);
CREATE INDEX file3_idx ON file3(f3_4);
SELECT f3_4, f3_5, f1_2, f1_3, f1_5, f2_2, f3_8, f3_6, f1_4, f3_7, f3_1
, f3_2, f3_3
FROM file3
LEFT JOIN file1 ON f1_1 = f3_4
LEFT JOIN file2 ON f2_1 = f3_4
ORDER BY f3_4;
EOF
rm -f scratch.db
Note: This will use a temporary database file that's going to be the size of all your data and then some because of indexes. If you're space constrained, I have an idea for doing it without temporary files, given the information that the join columns are sorted numerically, but it's enough work that I'm not going to bother unless asked.

Replace column in header of a large .txt file - unix

I need to replace the date in the header of a large file. The header has multiple columns, using | (pipe) as the separator, like this:
A|B05|1|xxc|2018/06/29|AC23|SoOn
So I need the same header but with the date (5th column) updated: A|B05|1|xxc|2018/08/29|AC23
Any solutions for me? I tried with awk and sed, but both gave me errors beyond my understanding. I'm new to this and I really want to understand the solution, so could you please help me?
You can use the command below, which replaces the 5th column of every line with the content of the newdate variable:
awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}{ $5 = newdate }1' infile > outfile
Explanation
awk -v newdate="2018/08/29" ' # call awk, and set variable newdate
BEGIN{
FS=OFS="|" # set input and output field separator
}
{
$5 = newdate # assign fifth field with a content of variable newdate
}1 # 1 at the end does default operation
# print current line/row/record, that is print $0
' infile > outfile
If you want to skip the first line, in case you have a header, then use FNR>1:
awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR>1{ $5 = newdate }1' infile > outfile
If you want to replace the 5th column in the 1st row only, then use FNR==1:
awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR==1{ $5 = newdate }1' infile > outfile
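For example, feeding the sample header from the question to the FNR==1 variant on stdin (a quick check that does not touch the real file) gives:
$ printf 'A|B05|1|xxc|2018/06/29|AC23|SoOn\n' |
  awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR==1{ $5 = newdate }1'
A|B05|1|xxc|2018/08/29|AC23|SoOn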
If you still have problems, frame your question with sample input and expected output, so that it is easy to interpret your problem.
Short sed solution:
sed -Ei '1s~\|[0-9]{4}/[0-9]{2}/[0-9]{2}\|~|2018/08/29|~' file
-E - use extended regular expressions
-i - modify the file in-place
1s - substitute only in the 1st (header) line
[0-9]{4}/[0-9]{2}/[0-9]{2} - date pattern
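To preview the substitution without modifying the file, drop -i and feed the sample header on stdin:
$ printf 'A|B05|1|xxc|2018/06/29|AC23|SoOn\n' |
  sed -E '1s~\|[0-9]{4}/[0-9]{2}/[0-9]{2}\|~|2018/08/29|~'
A|B05|1|xxc|2018/08/29|AC23|SoOn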

Grep to count occurrences of file A in file B

I have two files; each line of file A may occur in file B, and for each line in file A I would like to count how many times it occurs in file B. For example:
File A:
GAGGACAGACTACTAAAGCC
CTTGCCGCAGATTATCAGAG
CCAGCTTGATGTGTCCTGTG
TGATAGGCAGTGGAACACTG
File B:
NTCTTGAGGAAAGGACGAATCTGCGGAGGACAGACTACTAAAGCCGTTTGAGAGCTAGAACGAGCAAGTTAAGAGA
TCTTGAGGAAAGGACGAAACTCCGGAGGACAGACTACTAAAGCCGTTTTAGAGCTAGAAAGCGCAAGTTAAACGAC
NTCTTGAGGAAAGGACGAATCTGCGCTTGCCGCAGATTATCAGAGGTATGAGAGCTAGAACGAGCAAGTTAAGAGC
TCTTGAGGAAAGGACGAAAGTGCGCTTGCCGCAGATTATCAGAGGTTTTAGAGCTAGAAAGAGCAAGTTAAAATAA
GATCTAGTGGAAAGGACGATTCTCCGCTTGCCGCAGATTATCAGAGGTTGTAGAGCTAGAACTAGCAAGTGACAAG
ATCTTGAGGAAAGGACGAATCTGCGCTTGCCGCAGATTATCAGAGGTTTGAGAGCTAGAACTAGCAAGTTAATAGA
CGATCAAGTGGAAGGACGATTCTCCGTGATAGGCAGTGGAACACTGGATGTAGAGCTAGAAATAGCAAGTGAGCAG
ATCTAGAGGAAAGGACGAATCTCCGTGATAGGCAGTGGAACACTGGTATGAGAGCTAGAACTAGCAAGTTAATAGA
TCTTGAGGAAAGGACGAAACTCCGTGATAGGCAGTGGAACACTGGTTTTAGAGCTAGAAAGCGCAAGTTAAAAGAC
And the output should be File C:
2 GAGGACAGACTACTAAAGCC
4 CTTGCCGCAGATTATCAGAG
0 CCAGCTTGATGTGTCCTGTG
3 TGATAGGCAGTGGAACACTG
I would like to do this using grep, and I've tried a few variations of -c, -o and -f, but I can't seem to get the right output.
How can I achieve this?
Try this
for i in `cat a`; do echo "$i `grep $i -c b`"; done
In this case, if a line from file A occurs several times in one line of file B, it is counted as one occurrence. If you want to count each of those occurrences separately (overlapping matches are still not counted), use this:
for i in `cat a`; do printf $i; grep $i -o b | wc -l; done
And this variant might be quicker:
cat b | grep "`cat a`" -o | sort | uniq -c
#!/usr/bin/perl
open A, "A"; # open file "A" to handle A
open B, "B"; # open file "B" to handle B
chomp(@keys = <A>); # read keys to array, strip line-feeds
@counts{@keys} = (0) x @keys; # initialize hash counts for keys
while(<B>){ # iterate file handle B line by line
foreach $k (@keys){ # iterate keys array
if (/$k/) { # if key matches line
$counts{$k}++; # increase count for key by one
}
}
}
print "$counts{$_} $_\n" for (keys %counts);
Linux command to compare files:
comm FileA FileB
comm produces three-column output. Column one contains lines unique to FileA, column two contains lines unique to FileB, and column three contains lines common to both files. Note that comm expects both input files to be sorted.
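If file B is large, re-reading it once for every pattern gets slow. A single-pass awk alternative, offered only as a sketch (it treats each line of file A as a plain substring rather than a regex, and prints the counts in no particular order):
awk 'NR==FNR { pat[$0] = 0; next }                  # first file: remember patterns
     { for (p in pat) if (index($0, p)) pat[p]++ }  # second file: check each pattern
     END { for (p in pat) print pat[p], p }' A B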

pattern match and create multiple files LINUX

I have a pipe-delimited file with over 20M rows. In the 4th column I have a date field. I have to take the partial value (YYYYMM) from the date field and write the matching data to a new file, appending that value to the file name. Thanks for all your inputs.
Inputfile.txt
XX|1234|PROCEDURES|20160101|RC
XY|1634|PROCEDURES|20160115|RC
XM|1245|CODES|20170124|RC
XZ|1256|CODES|20170228|RC
OutputFile_201601.txt
XX|1234|PROCEDURES|20160101|RC
XY|1634|PROCEDURES|20160115|RC
OutputFile_201701.txt
XM|1245|CODES|20170124|RC
OutputFile_201702.txt
XZ|1256|CODES|20170228|RC
Using awk:
$ awk -F\| '{f="outputfile_" substr($4,1,6) ".txt"; print >> f ; close (f)}' file
$ ls outputfile_201*
outputfile_201601.txt outputfile_201701.txt outputfile_201702.txt
Explained:
$ awk -F\| ' # pipe as delimiter
{
f="outputfile_" substr($4,1,6) ".txt" # form output filename
print >> f # append record to file
close(f) # close output file
}' file
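Opening and closing the output file for every one of the 20M records keeps the number of simultaneously open files low, but it costs time. If the input happens to be grouped by month (an assumption, not stated in the question), a sketch that only switches files when the month changes:
awk -F\| '{
  m = substr($4,1,6)              # YYYYMM of the current record
  if (m != prev) {                # month changed: switch output file
    if (prev != "") close(f)
    f = "outputfile_" m ".txt"
    prev = m
  }
  print >> f                      # append record to the current file
}' file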

How to check that one given time, i.e. StartTime, is not greater than another given time, i.e. EndTime, in a ksh shell?

I have two inputs, a StartTime and an EndTime. I have to check that the StartTime entered is not greater than the EndTime; if it is, an error has to be displayed.
My input format is
./filename Jan 10 16 20:00:00 Jun 12 16 00:00:00
I am using this logic:
$Start=$(date --date="$1 $2 $4 $3" +%s)
$End=$(date --date="$5 $6 $8 $7" +%s)
if [[ " $Start" > "$End" ]]
then
{
echo "Starttime cannot be greater than endtime"
exit
}
fi
This code works in a bash shell, but shows an error for the --date option in a ksh shell. Any idea how I can replace it so that it works in ksh?
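
For reference, --date is an option of the external GNU date command, not a bash or ksh built-in, so an error about it from ksh usually means a different, non-GNU date is being picked up in that environment. Separately, the $ in front of the assignments and the leading space in " $Start" look like typos. A minimal sketch of the corrected comparison, assuming GNU date is reachable, might look like:
#!/bin/ksh
# Compare two timestamps by converting both to epoch seconds (GNU date assumed).
Start=$(date --date="$1 $2 $4 $3" +%s)
End=$(date --date="$5 $6 $8 $7" +%s)
if (( Start > End )); then
echo "Starttime cannot be greater than endtime"
exit 1
fi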
