How to add a start time (in date format) and a run time (in milliseconds) to find the end time in a unix script? - unix
I have a file with a column of start times in the format shown below.
2019-10-30T08:04:30Z
2019-10-30T08:04:25Z
2019-10-30T08:04:30Z
I also have a file with the run time of each executed job, in milliseconds.
2647ms
360ms
10440ms
.
.
.
How do I add the corresponding rows from the two files and write the resulting end times to a separate file?
paste start_time.txt execution_time.txt | while IFS="$(printf '\t')" read -r f1 f2
do
    # strip the trailing "ms" to get the run time in milliseconds
    execution_time_ms="${f2%ms}"
    # GNU date only accepts whole seconds in relative offsets, so round the
    # milliseconds to the nearest second (the output has second resolution anyway)
    execution_time_s=$(awk "BEGIN {printf \"%d\", ($execution_time_ms + 500) / 1000}")
    # add the offset to the start time; -u keeps the result in UTC to match the trailing Z
    end_time=$(date -u --date="$f1 +$execution_time_s seconds" +'%Y-%m-%dT%H:%M:%SZ')
    # append end time
    echo "$end_time" >> end_time.txt
done
This script solves your problem.
It uses three files, as you mentioned:
start_time.txt (Input file)
execution_time.txt (Input file)
end_time.txt (Output file)
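With the sample values above, and the rounded-seconds approach used in the script, end_time.txt should contain something like:

2019-10-30T08:04:33Z
2019-10-30T08:04:25Z
2019-10-30T08:04:40Z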
Related
Command to perform full outer join with duplicate entries in key/join column
I have three files. I need to join them based on one column and perform some transformations.

file1.dat (column 1 is used for joining):

123,is1,ric1,col1,smbc1
123,is2,ric1,col1,smbc1
234,is3,ric3,col3,smbc2
345,is4,ric4,,smbc2
345,is4,,col5,smbc2

file2.dat (column 1 is used for joining):

123,abc
234,bcd

file3.dat (column 4 is used for joining):

r0c1,r0c2,r0c3,123,r0c5,r0c6,r0c7,r0c8
r2c1,r2c2,r2c3,123,r2c5,r2c6,r2c7,r2c8
r3c1,r3c2,r3c3,234,r3c5,r3c6,r3c7,r3c8
r4c1,r4c2,r4c3,345,r4c5,r4c6,r4c7,r4c8

Expected output (output.dat):

123,r0c5,is1,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r0c5,is2,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
123,r2c5,is2,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,bcd,r3c8,r3c6,col3,r3c7,r3c1,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c1,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c1,r4c2,r4c3

I wrote the following awk command:

awk '
BEGIN {FS=OFS=","}
FILENAME == ARGV[1] { temp_join_one[$1] = $2"|"$3"|"$4"|"$5; next}
FILENAME == ARGV[2] { exchtbunload[$1] = $2; next}
FILENAME == ARGV[3] {
    s_temp_join_one = temp_join_one[$4];
    split(s_temp_join_one, array_temp_join_one,"|");
    v3=(array_temp_join_one[1]==""?"N/A":array_temp_join_one[1]);
    v4=(array_temp_join_one[2]==""?"N/A":array_temp_join_one[2]);
    v5=(array_temp_join_one[4]==""?"N/A":array_temp_join_one[4]);
    v6=(exchtbunload[$4]==""?"N/A":exchtbunload[$4]);
    v9=(array_temp_join_one[3]==""?"N/A":array_temp_join_one[3]);
    v11=($2=""?"N/A":$2);
    print $4, $5, v3, v4, v5, v6, $8, $6, v9, $7, $1, v11, $3 > "output.dat"
}
' file1.dat file2.dat file3.dat

I need to join all three files. The final output file should have all the rows from file3 irrespective of whether they are in the other two files, and the corresponding columns should be empty (or N/A) if the key is not present in the other two files. (The order of the columns is not a very big problem; I can use awk to rearrange them.)

But my problem is that, as the key is not unique, I am not getting the expected output. My output has only three lines.

I tried to apply the solution suggested using a join condition. It works with smaller files, but the files I have are close to 3-5 GB in size, and they are in numerical order, not lexicographical order. Sorting them looks like it would take a lot of time.

Any suggestion would be helpful. Thanks in advance.
With join, assuming the files are sorted by the key:

$ join -t, -1 1 -2 4 <(join -t, -a1 -a2 -e "N/A" -o1.1,1.2,1.3,1.4,1.5,2.1 file1 file2) \
       file3 -o1.1,2.5,1.2,1.3,1.5,1.6,2.8,2.6,1.4,2.7,2.2,2.3
123,r0c5,is1,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
123,r0c5,is2,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is2,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,234,r3c8,r3c6,col3,r3c7,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c2,r4c3
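If the inputs are not already in the order join expects, they can be pre-sorted first. join compares keys as strings, so the sort must be lexicographic (plain sort -k, not sort -n). A minimal sketch (the .sorted file names are only for illustration):

sort -t, -k1,1 file1 > file1.sorted
sort -t, -k1,1 file2 > file2.sorted
sort -t, -k4,4 file3 > file3.sorted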
I really like the answer using join, but it does require that the files are sorted by the key column. Here's a version that doesn't have that restriction. Working under the theory that the best tool for doing database-like things is a database, it imports the CSV files into tables of a temporary SQLite database and then runs a SELECT on them to get your desired output:

(edit: Revised version based on new information about the data)

#!/bin/sh
# Usage: ./merge.sh file1.dat file2.dat file3.dat > output.dat
file1=$1
file2=$2
file3=$3
rm -f scratch.db
sqlite3 -batch -noheader -csv -nullvalue "N/A" scratch.db <<EOF | perl -pe 's#(?:^|,)\K""(?=,|$)#N/A#g'
CREATE TABLE file1(f1_1 INTEGER, f1_2, f1_3, f1_4, f1_5);
CREATE TABLE file2(f2_1 INTEGER, f2_2);
CREATE TABLE file3(f3_1, f3_2, f3_3, f3_4 INTEGER, f3_5, f3_6, f3_7, f3_8);
.import $file1 file1
.import $file2 file2
.import $file3 file3
-- Build indexes to speed up joining and sorting gigs of data.
CREATE INDEX file1_idx ON file1(f1_1);
CREATE INDEX file2_idx ON file2(f2_1);
CREATE INDEX file3_idx ON file3(f3_4);
SELECT f3_4, f3_5, f1_2, f1_3, f1_5, f2_2, f3_8, f3_6, f1_4, f3_7, f3_1, f3_2, f3_3
  FROM file3
  LEFT JOIN file1 ON f1_1 = f3_4
  LEFT JOIN file2 ON f2_1 = f3_4
  ORDER BY f3_4;
EOF
rm -f scratch.db

Note: This will use a temporary database file that's going to be the size of all your data and then some because of indexes. If you're space constrained, I have an idea for doing it without temporary files, given the information that the join columns are sorted numerically, but it's enough work that I'm not going to bother unless asked.
Replace column in header of a large .txt file - unix
I need to replace the date in the header of a large file. The header has multiple columns, using | (pipe) as the separator, like this:

A|B05|1|xxc|2018/06/29|AC23|SoOn

I need the same header but with the date (5th column) updated:

A|B05|1|xxc|2018/08/29|AC23|SoOn

Any solutions for me? I tried with awk and sed, but both gave me errors that were beyond me. I'm new to this and I really want to understand the solution, so could you please help me?
You can use the command below, which replaces the 5th column of every line with the content of the newdate variable:

awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}{ $5 = newdate }1' infile > outfile

Explanation:

awk -v newdate="2018/08/29" '   # call awk and set the variable newdate
BEGIN{
    FS=OFS="|"                  # set input and output field separators
}
{
    $5 = newdate                # assign the fifth field the content of variable newdate
}1                              # 1 at the end triggers the default action,
                                # which prints the current record, that is print $0
' infile > outfile

If you want to skip the first line (in case it is a header you do not want to touch), use FNR>1:

awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR>1{ $5 = newdate }1' infile > outfile

If you want to replace the 5th column in the 1st row (the header) only, use FNR==1:

awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR==1{ $5 = newdate }1' infile > outfile

If you still have a problem, frame your question with sample input and expected output so that it is easy to interpret.
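If the new date should simply be today's date rather than a hard-coded string, the variable can be filled from date first. A small sketch, assuming a date command that supports the usual + format string:

newdate=$(date +%Y/%m/%d)                  # e.g. 2018/08/29
awk -v newdate="$newdate" 'BEGIN{FS=OFS="|"}FNR==1{ $5 = newdate }1' infile > outfile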
Short sed solution:

sed -Ei '1s~\|[0-9]{4}/[0-9]{2}/[0-9]{2}\|~|2018/08/29|~' file

-i                           - modify the file in-place
1s                           - substitute only in the 1st (header) line
[0-9]{4}/[0-9]{2}/[0-9]{2}   - date pattern
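For example, applied to the sample header from the question (with -i dropped so the file is left untouched while testing):

$ head -1 file
A|B05|1|xxc|2018/06/29|AC23|SoOn
$ sed -E '1s~\|[0-9]{4}/[0-9]{2}/[0-9]{2}\|~|2018/08/29|~' file | head -1
A|B05|1|xxc|2018/08/29|AC23|SoOn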
Grep to count occurrences of file A in file B
I have two files. Each line of file A may occur in file B, and I would like to count, for each line in file A, how many times it occurs in file B. For example:

File A:

GAGGACAGACTACTAAAGCC
CTTGCCGCAGATTATCAGAG
CCAGCTTGATGTGTCCTGTG
TGATAGGCAGTGGAACACTG

File B:

NTCTTGAGGAAAGGACGAATCTGCGGAGGACAGACTACTAAAGCCGTTTGAGAGCTAGAACGAGCAAGTTAAGAGA
TCTTGAGGAAAGGACGAAACTCCGGAGGACAGACTACTAAAGCCGTTTTAGAGCTAGAAAGCGCAAGTTAAACGAC
NTCTTGAGGAAAGGACGAATCTGCGCTTGCCGCAGATTATCAGAGGTATGAGAGCTAGAACGAGCAAGTTAAGAGC
TCTTGAGGAAAGGACGAAAGTGCGCTTGCCGCAGATTATCAGAGGTTTTAGAGCTAGAAAGAGCAAGTTAAAATAA
GATCTAGTGGAAAGGACGATTCTCCGCTTGCCGCAGATTATCAGAGGTTGTAGAGCTAGAACTAGCAAGTGACAAG
ATCTTGAGGAAAGGACGAATCTGCGCTTGCCGCAGATTATCAGAGGTTTGAGAGCTAGAACTAGCAAGTTAATAGA
CGATCAAGTGGAAGGACGATTCTCCGTGATAGGCAGTGGAACACTGGATGTAGAGCTAGAAATAGCAAGTGAGCAG
ATCTAGAGGAAAGGACGAATCTCCGTGATAGGCAGTGGAACACTGGTATGAGAGCTAGAACTAGCAAGTTAATAGA
TCTTGAGGAAAGGACGAAACTCCGTGATAGGCAGTGGAACACTGGTTTTAGAGCTAGAAAGCGCAAGTTAAAAGAC

And the output should be

File C:

2 GAGGACAGACTACTAAAGCC
4 CTTGCCGCAGATTATCAGAG
0 CCAGCTTGATGTGTCCTGTG
3 TGATAGGCAGTGGAACACTG

I would like to do this using grep, and I've tried a few variations of -c, -o and -f, but I can't seem to get the right output. How can I achieve this?
Try this:

for i in `cat a`; do echo "$i `grep $i -c b`"; done

In this case, if a line from file A occurs several times in one line of file B, it is counted as one occurrence. If you want to count such occurrences separately (but without overlapping matches), use this:

for i in `cat a`; do printf $i; grep $i -o b | wc -l; done

And maybe this variant would be quicker:

cat b | grep "`cat a`" -o | sort | uniq -c
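For reference, a small sketch that prints the result in the count-first layout of File C shown in the question. It assumes the files are literally named A and B, and it counts matching lines with grep -c, which is enough here because each sequence appears at most once per line of file B:

# write "count pattern" lines, one per line of file A
while read -r seq; do
    printf '%s %s\n' "$(grep -c -- "$seq" B)" "$seq"
done < A > C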
#!/usr/bin/perl
open A, "A";                      # open file "A" to handle A
open B, "B";                      # open file "B" to handle B
chomp(@keys = <A>);               # read keys to array, strip line-feeds
@counts{@keys} = (0) x @keys;     # initialize hash counts for keys
while (<B>) {                     # iterate file handle B line by line
    foreach $k (@keys) {          # iterate keys array
        if (/$k/) {               # if key matches line
            $counts{$k}++;        # increase count for key by one
        }
    }
}
print "$counts{$_} $_\n" for (keys %counts);
Linux command to compare files:

comm FileA FileB

comm produces three-column output: column one contains lines unique to FileA, column two contains lines unique to FileB, and column three contains lines common to both files.
Pattern match and create multiple files (Linux)
I have a pipe-delimited file with over 20M rows. The 4th column is a date field. I have to take the partial value (YYYYMM) from the date field and write the matching data to a new file, appending that value to the file name. Thanks for all your inputs.

Inputfile.txt:

XX|1234|PROCEDURES|20160101|RC
XY|1634|PROCEDURES|20160115|RC
XM|1245|CODES|20170124|RC
XZ|1256|CODES|20170228|RC

OutputFile_201601.txt:

XX|1234|PROCEDURES|20160101|RC
XY|1634|PROCEDURES|20160115|RC

OutputFile_201701.txt:

XM|1245|CODES|20170124|RC

OutputFile_201702.txt:

XZ|1256|CODES|20170228|RC
Using awk:

$ awk -F\| '{f="outputfile_" substr($4,1,6) ".txt"; print >> f; close(f)}' file
$ ls outputfile_201*
outputfile_201601.txt  outputfile_201701.txt  outputfile_201702.txt

Explained:

$ awk -F\| '                               # pipe as delimiter
{
    f="outputfile_" substr($4,1,6) ".txt"  # form output filename
    print >> f                             # append record to file
    close(f)                               # close output file
}' file
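Because print >> f appends, rerunning the command adds duplicate rows to any files left over from a previous run. A small precaution (assuming nothing else uses the outputfile_ prefix) is to clear them first:

rm -f outputfile_*.txt
awk -F\| '{f="outputfile_" substr($4,1,6) ".txt"; print >> f; close(f)}' file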
How to check that one given time (StartTime) is not greater than another given time (EndTime) in the ksh shell?
I have two inputs, a StartTime and an EndTime. I have to check that the StartTime entered is not greater than the EndTime; if it is, an error should be displayed. My input format is:

./filename Jan 10 16 20:00:00 Jun 12 16 00:00:00

I am using the logic:

Start=$(date --date="$1 $2 $4 $3" +%s)
End=$(date --date="$5 $6 $8 $7" +%s)
if [[ "$Start" > "$End" ]]
then
{
    echo "Starttime cannot be greater than endtime"
    exit
}
fi

This code works in the bash shell, but the --date option gives an error in the ksh shell. Any idea how I can replace it so it works in ksh?
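For what it's worth, one portable workaround when the system's date does not support --date is to build the epoch values with perl's core Time::Piece module instead. This is only a sketch, assuming perl is installed; the to_epoch helper name is made up for illustration:

# convert "Mon DD YY HH:MM:SS" to seconds since the epoch
to_epoch() {
    perl -MTime::Piece -e 'print Time::Piece->strptime($ARGV[0], "%b %d %y %H:%M:%S")->epoch' "$1"
}

Start=$(to_epoch "$1 $2 $3 $4")
End=$(to_epoch "$5 $6 $7 $8")
if [ "$Start" -gt "$End" ]; then
    echo "Starttime cannot be greater than endtime"
    exit 1
fi

Note that the comparison uses the numeric -gt test; the original [[ "$Start" > "$End" ]] compares the values as strings.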