awk: convert quoted date-time format to Unix timestamp [duplicate] - unix

I am converting timestamps to epoch seconds in awk, but I am getting incorrect output for repeated timestamps.
Input:
20180614 00:00:00
20180614 00:00:23
20180614 22:45:00
20180614 22:45:21
20180614 00:00:00
20180614 00:00:23
Expected output:
1528930800
1528930823
1529012700
1529012721
1528930800
1528930823
I tried:
awk '{ ts="\""$0"\""; ("date +%s -d "ts)| getline epochsec; print epochsec}'
Output after running the above command:
1528930800
1528930823
1529012700
1529012721
1529012721
1529012721

With GNU xargs:
xargs -I {} date +%s -d {} < file
Output:
1528927200
1528927223
1529009100
1529009121
1528927200
1528927223

A slightly shorter GNU awk version uses FIELDWIDTHS, which is available from gawk-2.13 onwards:
awk 'BEGIN{FIELDWIDTHS="4 2 3 2 1 2 1 2"}{print mktime($1" "$2" "$3$4" "$6" "$8)}'
Since gawk-4.2 you can skip intervening fields:
awk 'BEGIN{FIELDWIDTHS="4 2 2 1:2 1:2 1:2"}{print mktime($1" "$2" "$3" "$4" "$5" "$6)}'
Or, even shorter, using FPAT:
awk 'BEGIN{FPAT="[0-9][0-9]"}{print mktime($1$2" "$3" "$4" "$5" "$6" "$7)}'
Note: a single awk mktime invocation will be faster than anything that makes system calls to date, because you do not have to spawn a binary for every input line; with the awk mktime solutions only a single process runs. Nonetheless, the xargs solution given by Cyrus is by far the most convenient one.
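To see the difference yourself, a rough and unscientific comparison could look like this, assuming the input lines live in file; time and the >/dev/null redirects are only there for measuring and are not part of the solutions:
time awk 'BEGIN{FPAT="[0-9][0-9]"}{print mktime($1$2" "$3" "$4" "$5" "$6" "$7)}' file >/dev/null
time xargs -I {} date +%s -d {} < file >/dev/null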

You could use the system() function:
$ awk '{system("date +%s -d \""$0"\"")}' ip.txt
1528914600
1528914623
1528996500
1528996521
1528914600
1528914623
Or use GNU sed, whose e flag executes the pattern space as a shell command:
$ sed 's/.*/date +%s -d "&"/e' ip.txt
1528914600
1528914623
1528996500
1528996521
1528914600
1528914623
As per the AllAboutGetline article: the pipe for a given command string stays open after the first read, so for repeated timestamps getline hits EOF and leaves the variable unchanged. You'll need to close() the command:
$ awk '{ ts="date +%s -d \""$0"\""; while ((ts|getline ep)>0) print ep; close(ts) }' ip.txt
1528914600
1528914623
1528996500
1528996521
1528914600
1528914623
However, getline is not needed at all for this case; avoid using it unless you really need it and know how to use it.

Using the GNU awk mktime function:
awk '{gsub(":"," ",$2); print mktime(substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " " $2)}' file

To add to Cyrus's answer, the following works on macOS, where BSD date handles date-time to epoch conversion differently:
xargs -I {} date -j -u -f "%a %b %d %T %Z %Y" {} +%s < file
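The format string above describes a ctime-style date such as "Thu Jun 14 00:00:00 UTC 2018"; for this question's input format an adapted call would presumably be (an untested sketch with BSD date, assuming UTC input):
xargs -I {} date -j -u -f "%Y%m%d %T" {} +%s < file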

Related

How to run awk on a file with cedilla (Ç) as delimiter

I have a file with the below contents:
cat file1.dat
anuÇ89Çhyd
binduÇ45Çchennai
I would like to print the second column with Ç as delimiter.
The output should be:
89
45
The manpage of awk mentions the following:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
So, this command does what you want:
cat file1.dat | awk -F'Ç' '{print $2}'
Given:
$ cat file
anuÇ89Çhyd
binduÇ45Çchennai
You can use cut:
$ cut -f 2 -d 'Ç' file
awk:
$ awk -F'Ç' '{print $2}' file
sed:
$ sed -E 's/^[^Ç]*Ç([^Ç]*).*/\1/' file
GNU grep:
$ grep -oP '^[^Ç]*Ç\K[^Ç]+(?=Ç)' file
Perl:
$ perl -lnE 'print $1 if /^[^Ç]*Ç([^Ç]+)Ç/' file
All of those print:
89
45


How to append multiple columns of data to a tab-delimited file efficiently in one pass

So far I have figured out how to add just one column; I have two more columns to add.
I have simplified my code here, but in reality I source the columns as variables from another file, and I have 100 such files of 50 MB each to which I must add these columns.
This is the input file:
1|True
2|Fals
I want the output to be:
1|True|2018-05-10|2018-05-11|2018-05-12
2|Fals|2018-05-10|2018-05-11|2018-05-12
I have written this:
sed -i "s/.$/|2018-05-10/" $file
EDIT: As per the OP, the dates are not system dates but variables, so adding the following:
awk -v var_1="$var1" -v var_2="$var2" -v var_3="$var3" '{print $0 OFS var_1 OFS var_2 OFS var_3}' OFS="|" Input_file
Where var_1, var_2 and var_3 are awk variables and var1, var2 and var3 are bash variables.
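For example, with the hypothetical values var1=2018-05-10, var2=2018-05-11 and var3=2018-05-12, running the command above on the sample input yields:
1|True|2018-05-10|2018-05-11|2018-05-12
2|Fals|2018-05-10|2018-05-11|2018-05-12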
With GNU date, the following may help you:
awk -v today="$(date +%Y-%m-%d)" -v tomorrow="$(date +%Y-%m-%d --date='+ 1 day')" -v day_after="$(date +%Y-%m-%d --date='+ 2 day')" '{print $0 OFS today OFS tomorrow OFS day_after}' OFS="|" Input_file
Here is a non-one-liner form of the solution too:
awk -v today="$(date +%Y-%m-%d)" -v tomorrow="$(date +%Y-%m-%d --date='+ 1 day')" -v day_after="$(date +%Y-%m-%d --date='+ 2 day')" '
{
print $0 OFS today OFS tomorrow OFS day_after
}
' OFS="|" Input_file
To save the changes into Input_file itself, append > temp_file && mv temp_file Input_file to the above code.
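That is, roughly (a sketch based on the one-liner above):
awk -v today="$(date +%Y-%m-%d)" -v tomorrow="$(date +%Y-%m-%d --date='+ 1 day')" -v day_after="$(date +%Y-%m-%d --date='+ 2 day')" '{print $0 OFS today OFS tomorrow OFS day_after}' OFS="|" Input_file > temp_file && mv temp_file Input_file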
If you have the following input file:
$ more file
1|True
2|Fals
Then you can use the following sed command to process it in one pass:
DATE1=2018-05-10; DATE2=2018-05-11; DATE3=2018-05-12; sed -i "s/$/|$DATE1|$DATE2|$DATE3/" file
Output:
$ more file
1|True|2018-05-10|2018-05-11|2018-05-12
2|Fals|2018-05-10|2018-05-11|2018-05-12
You can assign any value to your three dates by using date +%Y-%m-%d --date="SPECIFIC DAY" or any other value.
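For instance, a sketch that derives the three dates with GNU date instead of hard-coding them (the relative --date expressions are only examples):
DATE1=$(date +%Y-%m-%d); DATE2=$(date +%Y-%m-%d --date="+ 1 day"); DATE3=$(date +%Y-%m-%d --date="+ 2 day"); sed -i "s/$/|$DATE1|$DATE2|$DATE3/" file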

jq parsing date to timestamp

I have the following script:
curl -s -S 'https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=BTC-NBT&tickInterval=thirtyMin&_=1521347400000' | jq -r '.result|.[] |[.T,.O,.H,.L,.C,.V,.BV] | @tsv | tostring | gsub("\t";",") | "(\(.))"'
This is the output:
(2018-03-17T18:30:00,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(2018-03-17T19:00:00,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(2018-03-17T19:30:00,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(2018-03-17T20:00:00,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
I want to replace the date with a timestamp.
I can make this conversion with date in the shell:
date -d '2018-03-17T18:30:00' +%s%3N
1521325800000
I want this result:
(1521325800000,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(1521327600000,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(1521329400000,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(1521331200000,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
This data is stored in MySQL.
Is it possible to execute the date conversion with jq or another command like awk, sed, perl in a single command line?
Here is an all-jq solution that assumes the "Z" (UTC+0) timezone.
In brief, simply replace .T by:
((.T + "Z") | fromdate | tostring + "000")
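Spliced into the original pipeline, that would presumably read:
curl -s -S 'https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=BTC-NBT&tickInterval=thirtyMin&_=1521347400000' | jq -r '.result|.[] |[((.T + "Z")|fromdate|tostring + "000"),.O,.H,.L,.C,.V,.BV] | @tsv | tostring | gsub("\t";",") | "(\(.))"'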
To verify this, consider:
timestamp.jq
[splits("[(),]")]
| .[1] |= ((. + "Z")|fromdate|tostring + "000") # milliseconds
| .[1:length-1]
| "(" + join(",") + ")"
Invocation
jq -rR -f timestamp.jq input.txt
Output
(1521311400000,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(1521313200000,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(1521315000000,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(1521316800000,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
Here is an unportable awk solution. It is not portable because it relies on the system date command; on the system I'm using, the relevant invocation looks like: date -j -f "%Y-%m-%eT%T" STRING "+%s"
awk -F, 'BEGIN{OFS=FS}
NF==0 { next }
{ sub(/\(/,"",$1);
cmd="date -j -f \"%Y-%m-%eT%T\" " $1 " +%s";
cmd | getline $1;
close(cmd);  # close the pipe so a repeated timestamp is re-evaluated
$1=$1 "000"; # milliseconds
printf "%s", "(";
print;
}' input.txt
Output
(1521325800000,0.00012575,0.00012643,0.00012563,0.00012643,383839.45768188,48.465051)
(1521327600000,0.00012643,0.00012726,0.00012642,0.00012722,207757.18765437,26.30099514)
(1521329400000,0.00012726,0.00012779,0.00012698,0.00012779,97387.01596624,12.4229077)
(1521331200000,0.0001276,0.0001278,0.00012705,0.0001275,96850.15260027,12.33316229)
Solution with sed:
sed -e 's/(\([^,]\+\)\(,.*\)/echo "(\$(date -d \1 +%s%3N)\2"/g' | ksh
Test:
<curl_command> | sed -e 's/(\([^,]\+\)\(,.*\)/echo "(\$(date -d \1 +%s%3N)\2"/g' | ksh
Or:
<curl_command> > results_curl.txt
cat results_curl.txt | sed -e 's/(\([^,]\+\)\(,.*\)/echo "(\$(date -d \1 +%s%3N)\2"/g' | ksh
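With GNU sed you could also reuse the e flag shown earlier and drop the ksh pipe (an untested sketch along the same lines):
sed 's/(\([^,]\+\)\(,.*\)/echo "($(date -d \1 +%s%3N)\2"/e' results_curl.txt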

Awk change decimal formats

I've got a file containing decimal values formatted like 9.85E-4. How can I make awk format this value to 0.000985?
Use printf with the %f format specifier:
awk '{printf "%f\n", your_field .... }' file
Example
$ cat a
9.85E-4
23
$ awk '{printf "%f\n", $1}' a
0.000985
23.000000
From The GNU Awk User's Guide, section 5.5.2 Format-Control Letters:
%e, %E
Print a number in scientific (exponential) notation.
%f
Print a number in floating-point notation.
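The default precision of %f is six decimal places; a quick sketch of an explicit precision, plus %g, which picks a compact representation, against the same test file:
$ awk '{printf "%.8f\n", $1}' a
0.00098500
23.00000000
$ awk '{printf "%g\n", $1}' a
0.000985
23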
