Format date with a 3-char month name abbreviation using AWK - datetime

I have a datetime format as shown in the example below, which I want to convert to dd-mm-yyyy hh:mm:ss with AWK. How can I do this?
Current format:
3Jun2020 9:33:24; HG3456
7Jun2020 15:25:10; CH4747
10Jun2020 8:49:18; EU4821
12Jun2020 7:13:57; PP3478
Desired output:
03-06-2020 09:33:24; HG3456
07-06-2020 15:25:10; CH4747
10-06-2020 08:49:18; EU4821
12-06-2020 07:13:57; PP3478

Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS=";" }
{
split($1,t,/[ :]/)
lgth = length(t[1])
dayNr = substr(t[1],1,lgth - 7)
mthAbbr = substr(t[1],lgth - 6,3)
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",mthAbbr) + 2) / 3
yrNr = substr(t[1],lgth - 3)
$1 = sprintf("%02d-%02d-%04d %02d:%02d:%02d", dayNr, mthNr, yrNr, t[2], t[3], t[4])
print
}
$ awk -f tst.awk file
03-06-2020 09:33:24; HG3456
07-06-2020 15:25:10; CH4747
10-06-2020 08:49:18; EU4821
12-06-2020 07:13:57; PP3478
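The month-number trick in the script above deserves a note: index() returns the 1-based offset of the abbreviation inside the packed string, and since each abbreviation is 3 characters wide, (offset + 2) / 3 maps Jan to 1 through Dec to 12. A standalone illustration:

```shell
# index("Jan...Dec","Jun") = 16; (16 + 2) / 3 = 6
echo "Jun" | awk '{ print (index("JanFebMarAprMayJunJulAugSepOctNovDec",$1) + 2) / 3 }'
```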

I would use GNU AWK for this task in the following way. Let file.txt content be
3Jun2020 9:33:24; HG3456
7Jun2020 15:25:10; CH4747
10Jun2020 8:49:18; EU4821
12Jun2020 7:13:57; PP3478
then
awk '{sub(/Jan/,"-01-",$1);sub(/Feb/,"-02-",$1);sub(/Mar/,"-03-",$1);sub(/Apr/,"-04-",$1);sub(/May/,"-05-",$1);sub(/Jun/,"-06-",$1);sub(/Jul/,"-07-",$1);sub(/Aug/,"-08-",$1);sub(/Sep/,"-09-",$1);sub(/Oct/,"-10-",$1);sub(/Nov/,"-11-",$1);sub(/Dec/,"-12-",$1);print}' file.txt
output
3-06-2020 9:33:24; HG3456
7-06-2020 15:25:10; CH4747
10-06-2020 8:49:18; EU4821
12-06-2020 7:13:57; PP3478
Explanation: replace Jan with -01-, Feb with -02-, Mar with -03-, and so on, then print. Note that this leaves single-digit days and hours unpadded, so it does not fully match the desired output. Disclaimer: the code might need adjusting under a different locale.
(tested in GNU Awk 5.0.1)
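For comparison, a more compact sketch (plain POSIX awk; nothing GNU-specific is assumed) that builds the month map once in BEGIN and also zero-pads the day and hour:

```shell
awk 'BEGIN { n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
             # map each abbreviation to its zero-padded month number
             for (i = 1; i <= n; i++) mon[m[i]] = sprintf("%02d", i) }
     match($1, /[A-Za-z]+/) {
         day  = substr($1, 1, RSTART - 1)           # digits before the month
         year = substr($1, RSTART + RLENGTH)        # digits after the month
         split($2, t, ":")
         $1 = sprintf("%02d-%s-%s", day, mon[substr($1, RSTART, RLENGTH)], year)
         $2 = sprintf("%02d:%s:%s", t[1], t[2], t[3])
     } 1' file.txt
```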

perl -MPOSIX -MDate::Parse -pe 's{^\S+\s+\S+(?=;)}{strftime("%d-%m-%Y %T", strptime($&))}e; s/^0//' file

If sed is an option, you can execute the date command within the replacement.
$ sed "s/\([^;]*\)\(.*\)/date -d '\1' '+%d-%m-%Y %T\2'/e" input_file
03-06-2020 09:33:24; HG3456
07-06-2020 15:25:10; CH4747
10-06-2020 08:49:18; EU4821
12-06-2020 07:13:57; PP3478
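The e flag at the end of the s command is a GNU sed extension: after the substitution, the pattern space is executed as a shell command and replaced by its output. A minimal, self-contained illustration (assuming GNU sed and GNU date):

```shell
# The line becomes 'date -d "2018-05-10" +%A', which sed then executes.
echo '2018-05-10' | sed 's/.*/date -d "&" +%A/e'
```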

Related

Converting timestamp to EPOCH in awk

I am converting timestamps to epoch seconds in awk, but I get incorrect output for repeated timestamps.
Input:
20180614 00:00:00
20180614 00:00:23
20180614 22:45:00
20180614 22:45:21
20180614 00:00:00
20180614 00:00:23
Expected Output :
1528930800
1528930823
1528930800
1529012721
1528930800
1528930823
I did
awk '{ ts="\""$0"\""; ("date +%s -d "ts)| getline epochsec; print epochsec}'
output after running above command:
1528930800
1528930823
1529012700
1529012721
1529012721
1529012721
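The wrong values for the repeated timestamps come from the missing close(): awk keys pipes by their command string, so when an identical timestamp recurs, getline reads from the already-exhausted pipe, fails, and leaves epochsec holding the previous line's value. A minimal corrected sketch:

```shell
# Close each command pipe after reading, so a repeated timestamp
# re-runs date instead of reusing an exhausted pipe.
awk '{ cmd = "date +%s -d \"" $0 "\""
       if ((cmd | getline epochsec) > 0) print epochsec
       close(cmd) }' file
```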
With GNU xargs:
xargs -I {} date +%s -d {} < file
Output:
1528927200
1528927223
1529009100
1529009121
1528927200
1528927223
A slightly shorter GNU awk version uses FIELDWIDTHS, which is available from gawk 2.13 onwards:
awk 'BEGIN{FIELDWIDTHS="4 2 3 2 1 2 1 2"}{print mktime($1" "$2" "$3$4" "$6" "$8)}'
Since gawk-4.2 you can skip intervening fields:
awk 'BEGIN{FIELDWIDTHS="4 2 2 1:2 1:2 1:2"}{print mktime($1" "$2" "$3" "$4" "$5" "$6)}'
Or, even shorter, using FPAT:
awk 'BEGIN{FPAT="[0-9][0-9]"}{print mktime($1$2" "$3" "$4" "$5" "$6" "$7)}'
Note: a single awk mktime call per line is faster than anything that shells out to date, because you do not have to spawn a separate binary for every line; with the mktime solutions you invoke one process in total. Nonetheless, the xargs solution given by Cyrus is by far the most convenient one.
You could use system function
$ awk '{system("date +%s -d \""$0"\"")}' ip.txt
1528914600
1528914623
1528996500
1528996521
1528914600
1528914623
Or use sed
$ sed 's/.*/date +%s -d "&"/e' ip.txt
1528914600
1528914623
1528996500
1528996521
1528914600
1528914623
As per AllAboutGetline article, you'll need
$ awk '{ ts="date +%s -d \""$0"\""; while ((ts|getline ep)>0) print ep; close(ts) }' ip.txt
1528914600
1528914623
1528996500
1528996521
1528914600
1528914623
However, getline is not needed at all for this case; avoid it unless you really need it and know how to use it correctly.
Using GNU awk mktime function:
awk '{gsub(":"," ",$2); print mktime(substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " " $2)}' file
To add to Cyrus's answer, the following works on macOS, whose BSD date handles date-to-epoch conversion differently (it parses with -j -f):
xargs -I {} date -j -u -f "%a %b %d %T %Z %Y" {} +%s < file


how to append multiple columns of data to tab delimited file efficiently in one-pass

So far I have figured out how to add just one column; I have two more columns to add.
I have simplified my code here, but actually I source the columns as variables from another file, and I have 100 such files, each 50 MB, to which I should add these columns.
This is input file
1|True
2|Fals
I want the output to be
1|True|2018-05-10|2018-05-11|2018-05-12
2|Fals|2018-05-10|2018-05-11|2018-05-12
I have written this
sed -i "s/.$/|2018-05-10/" $file
EDIT: As per the OP, the dates are not system dates but variables, so adding the following.
awk -v var_1="$var1" -v var_2="$var2" -v var_3="$var3" '{print $0 OFS var_1 OFS var_2 OFS var_3}' OFS="|" Input_file
Here var_1, var_2 and var_3 are awk variables, initialized from the bash variables var1, var2 and var3.
With GNU date following may help you on same.
awk -v today=$(date +%Y-%m-%d) -v tomorrow=$(date +%Y-%m-%d --date="+ 1 day") -v day_after=$(date +%Y-%m-%d --date="+ 2 day") '{print $0 OFS today OFS tomorrow OFS day_after}' OFS="|" Input_file
Adding a non-one-liner form of the solution too:
awk -v today=$(date +%Y-%m-%d) -v tomorrow=$(date +%Y-%m-%d --date="+ 1 day") -v day_after=$(date +%Y-%m-%d --date="+ 2 day") '
{
print $0 OFS today OFS tomorrow OFS day_after
}
' OFS="|" Input_file
To save the changes into Input_file itself, append > temp_file && mv temp_file Input_file to the code above.
If you have the following input file:
$ more file
1|True
2|Fals
Then you can use the following sed command to process it in one pass:
DATE1=2018-05-10; DATE2=2018-05-11; DATE3=2018-05-12; sed -i "s/$/|$DATE1|$DATE2|$DATE3/" file
output:
$ more file
1|True|2018-05-10|2018-05-11|2018-05-12
2|Fals|2018-05-10|2018-05-11|2018-05-12
You can assign any value to the three dates, for instance with date +%Y-%m-%d --date="SPECIFIC DAY".

jq parsing date to timestamp

I have the following script:
curl -s -S 'https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=BTC-NBT&tickInterval=thirtyMin&_=1521347400000' | jq -r '.result|.[] |[.T,.O,.H,.L,.C,.V,.BV] | @tsv | tostring | gsub("\t";",") | "(\(.))"'
This is the output:
(2018-03-17T18:30:00,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(2018-03-17T19:00:00,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(2018-03-17T19:30:00,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(2018-03-17T20:00:00,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
I want to replace the date with timestamp.
I can make this conversion with date in the shell
date -d '2018-03-17T18:30:00' +%s%3N
1521325800000
I want this result:
(1521325800000,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(1521327600000,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(1521329400000,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(1521331200000,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
This data is stored in MySQL.
Is it possible to execute the date conversion with jq or another command like awk, sed, perl in a single command line?
Here is an all-jq solution that assumes the "Z" (UTC+0) timezone.
In brief, simply replace .T by:
((.T + "Z") | fromdate | tostring + "000")
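Applied to the JSON directly rather than to the text output, the replacement can be sketched on a one-element sample (the field values below are placeholders, not real ticker data):

```shell
# fromdate parses an ISO8601 "Z" timestamp to epoch seconds;
# appending "000" turns it into milliseconds as a string.
echo '{"result":[{"T":"2018-03-17T18:30:00","O":1,"H":2,"L":3,"C":4,"V":5,"BV":6}]}' |
jq -r '.result[] | [((.T + "Z")|fromdate|tostring + "000"),.O,.H,.L,.C,.V,.BV]
       | @tsv | gsub("\t";",") | "(\(.))"'
```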
To verify this, consider:
timestamp.jq
[splits("[(),]")]
| .[1] |= ((. + "Z")|fromdate|tostring + "000") # milliseconds
| .[1:length-1]
| "(" + join(",") + ")"
Invocation
jq -rR -f timestamp.jq input.txt
Output
(1521311400000,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(1521313200000,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(1521315000000,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(1521316800000,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
Here is an unportable awk solution. It is not portable because it relies on the system date command; on the system I'm using, the relevant invocation looks like: date -j -f "%Y-%m-%eT%T" STRING "+%s"
awk -F, 'BEGIN{OFS=FS}
NF==0 { next }
{ sub(/\(/,"",$1);
cmd="date -j -f \"%Y-%m-%eT%T\" " $1 " +%s";
cmd | getline $1;
$1=$1 "000"; # milliseconds
printf "%s", "(";
print;
}' input.txt
Output
(1521325800000,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(1521327600000,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(1521329400000,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(1521331200000,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
Solution with sed:
sed -e 's/(\([^,]\+\)\(,.*\)/echo "(\$(date -d \1 +%s%3N),\2"/g' | ksh
Test:
<commande_curl> | sed -e 's/(\([^,]\+\)\(,.*\)/echo "(\$(date -d \1 +%s%3N),\2"/g' | ksh
or:
<commande_curl> > results_curl.txt
cat results_curl.txt | sed -e 's/(\([^,]\+\)\(,.*\)/echo "(\$(date -d \1 +%s%3N),\2"/g' | ksh

No results while making precise matching using awk

I am having rows like this in my source file:
"Sumit|My Application|PROJECT|1|6|Y|20161103084527"
I want to make an exact match on column 3, i.e. I do not want to use the '~' operator in my awk command. The command:
awk -F '|' '($3 ~ /'"$Var_ApPJ"'/) {print $3}' ${Var_RDR}/${Var_RFL};
fetches the correct result, but the command:
awk -F '|' '($3 == "${Var_ApPJ}") {print $3}' ${Var_RDR}/${Var_RFL};
fails to do so. Can anyone explain why? I want to use '==' because I do not want a match when the value is "PROJECT1" in the source file.
Parameter: Var_ApPJ="PROJECT"
${Var_RDR}/${Var_RFL} refers to the source file.
Refer to the awk documentation on using shell variables in awk programs to see how to pass a variable to awk.
I found an alternative way to get '==' semantics with '~' by anchoring the pattern:
awk -F '|' '($3 ~ "^'"${Var_ApPJ}"'$") {print $3}' ${Var_RDR}/${Var_RFL};
Here is the problem: inside the single-quoted awk program, "${Var_ApPJ}" is never expanded by the shell, so awk compares $3 against the literal string ${Var_ApPJ}. Pass the shell variable to awk with -v instead:
awk -F '|' -v var="$Var_ApPJ" '$3 == var {print $3}' ${Var_RDR}/${Var_RFL};
Here var is an awk variable initialized from the shell variable Var_ApPJ.
vipin#kali:~$ cat kk.txt
a 5 b cd ef gh
vipin#kali:~$ awk -v var1="5" '$2 == var1 {print $3}' kk.txt
b
vipin#kali:~$
OR
#cat kk.txt
a 5 b cd ef gh
#var1="5"
#echo $var1
5
#awk '$2 == "'"$var1"'" {print $3}' kk.txt ### without "{}"
b
#
#awk '$2 == "'"${var1}"'" {print $3}' kk.txt ### with "{}"
b
#
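The practical difference: with '~' the right-hand side is a regex that matches anywhere in the field, so PROJECT1 also matches PROJECT, while '==' compares the whole string. A small demonstration:

```shell
# For each row, report whether column 3 matches exactly (==) and/or as a regex (~).
printf 'a|b|PROJECT1|1\na|b|PROJECT|2\n' |
awk -F '|' -v v="PROJECT" '{ print $3, ($3 == v ? "exact" : "no"), ($3 ~ v ? "regex" : "no") }'
```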
