I need to convert an epoch timestamp in milliseconds to a date:
My input documents are:
{"_id":"ae53d3ec-8fc3-44fc-a7eb-f2f32da4eaae","birthDate":{"value":{"$date":{"$numberLong":"-469929600000"}}}}
{"_id":"ef92c3e4-d5d7-4b81-8a0b-1ab1eac10331","birthDate":{"value":{"$date":{"$numberLong":"-854755200000"}}}}
I need to get:
{"_id":"ae53d3ec-8fc3-44fc-a7eb-f2f32da4eaae","birthDate":"1955-02-10"}
{"_id":"ef92c3e4-d5d7-4b81-8a0b-1ab1eac10331","birthDate":"1942-12-01"}
Any ideas?
Grab the source value, chop off the last three digits to turn the milliseconds into seconds, and call todate to convert the number into a date string (GMT); the final [:10] keeps just the date part.
jq -c '.birthDate |= (.value."$date"."$numberLong"[:-3] | tonumber | todate[:10])'
{"_id":"ae53d3ec-8fc3-44fc-a7eb-f2f32da4eaae","birthDate":"1955-02-10"}
{"_id":"ef92c3e4-d5d7-4b81-8a0b-1ab1eac10331","birthDate":"1942-12-01"}
Demo
Alternatively, divide by 1000 and format with strftime:
.birthDate |= (
  .value."$date"."$numberLong" |  # epoch milliseconds, as a string
  tonumber |                      # make it a number
  . / 1000 |                      # milliseconds -> seconds
  strftime("%F")                  # format as YYYY-MM-DD (GMT)
)
With -c, this produces the desired output exactly.
{"_id":"ae53d3ec-8fc3-44fc-a7eb-f2f32da4eaae","birthDate":"1955-02-10"}
{"_id":"ef92c3e4-d5d7-4b81-8a0b-1ab1eac10331","birthDate":"1942-12-01"}
Demo on jqplay
I have been trying to understand jq, and the following puzzle is giving me a headache: I can construct two expressions, A and B, which seem to produce the same output. And yet, when I surround them with [] array construction brackets (as in [A] and [B]), they produce different output. In this case, the expressions are:
A := jq '. | add'
B := jq -s '.[] | add'
Concretely:
$ echo '[1,2] [3,4]' | jq '.'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq '. | add'
3
7
# Now surround with array construction and we get two values:
$ echo '[1,2] [3,4]' | jq '[. | add]'
[3]
[7]
$ echo '[1,2] [3,4]' | jq -s '.[]'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq -s '.[] | add'
3
7
# Now surround with array construction and we get only one value:
$ echo '[1,2] [3,4]' | jq -s '[.[] | add]'
[3,7]
What is going on here? Why is it that the B expression, which applies the --slurp setting but appears to produce identical intermediate output to the A expression, produces different output when surrounded with [] array construction brackets?
When jq is fed a stream, such as [1,2] [3,4] with its two inputs, it runs the filter independently on each input. That's why jq '[. | add]' produces two results: each addition is separately wrapped in an array.
When jq is given the --slurp (-s) option, it first combines the stream into a single array, so there is only one input. Therefore jq -s '[.[] | add]' yields one result only: the multiple additions are caught by the array constructor, which is executed just once.
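You can make the slurped input visible by echoing it back (with -c for compact output):
$ echo '[1,2] [3,4]' | jq -cs '.'
[[1,2],[3,4]]
Without -s (as in the very first command above), the same filter echoes two separate arrays, one per input.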
How can I (generically) transform the input file below into the output file below, using jq? The record format of the output file is: array_index | key | value
Input file:
[{"a": 1, "b": 10},
{"a": 2, "d": "fred", "e": 30}]
Output File:
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Here's a solution using tostream, which creates a stream of paths and their values. Use select to keep only the events that carry a value, flatten to put path and value on one level, and join for the output format:
jq -r 'tostream | select(has(1)) | flatten | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Demo
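For intuition, this is roughly what tostream emits for the sample input (assuming it is saved as file.json); closing events carry only a path, no value at index 1, which is why select(has(1)) drops them:
$ jq -c 'tostream' file.json
[[0,"a"],1]
[[0,"b"],10]
[[0,"b"]]
[[1,"a"],2]
[[1,"d"],"fred"]
[[1,"e"],30]
[[1,"e"]]
[[1]]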
Or a very similar one using paths to get the paths, scalars for the filter, and getpath for the corresponding value:
jq -r 'paths(scalars) as $p | [$p[], getpath($p)] | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Demo
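Here, paths(scalars) produces just the path arrays, which the filter then extends with the corresponding value (again assuming the input is in file.json):
$ jq -c 'paths(scalars)' file.json
[0,"a"]
[0,"b"]
[1,"a"]
[1,"d"]
[1,"e"]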
< file.json jq -r 'to_entries
| .[]
| .key as $k
| ((.value | to_entries)[]
| [$k, .key, .value])
| @csv'
Output:
0,"a",1
0,"b",10
1,"a",2
1,"d","fred"
1,"e",30
To match the requested format exactly, you would still need to remove the double quotes (and switch the delimiter to |).
to_entries can be used to loop over the elements of arrays and objects in a way that gives both the key (index) and the value of the element.
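For instance, applied to the outer array, to_entries turns the indices into keys (a quick check, assuming the input is in file.json):
$ jq -c 'to_entries' file.json
[{"key":0,"value":{"a":1,"b":10}},{"key":1,"value":{"a":2,"d":"fred","e":30}}]
Applying it a second time to each value exposes the inner object's keys, which is what the full filter does: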
jq -r '
to_entries[] |
.key as $id |
.value |
to_entries[] |
[ $id, .key, .value ] |
join("|")
'
Demo on jqplay
Replace join("|") with @csv to get proper CSV.
2021-08-04T22:55:12+0000
I want to convert the above string to local time.
But fromdateiso8601 does not work on this format. What is the best way to convert this kind of string to local time?
EDIT: I tried the following, but the 23:55:12 part does not change when the timezone changes. I'd expect it to change as TZ is changed.
$ TZ=America/New_York jq '.x | gsub("[+]0000"; "Z") | fromdateiso8601| gmtime | strftime("%Y-%m-%dT%H:%M:%S%Z")' <<< '{"x": "2021-08-04T22:55:12+0000" }'
"2021-08-04T23:55:12EST"
$ TZ=Europe/Madrid jq '.x | gsub("[+]0000"; "Z") | fromdateiso8601| gmtime | strftime("%Y-%m-%dT%H:%M:%S%Z")' <<< '{"x": "2021-08-04T22:55:12+0000" }'
"2021-08-04T23:55:12CET"
The first problem here is that jq's handling of timezones (TZ) is buggy; the second is that jq's built-ins do not recognize timezone offsets.
Unfortunately, there is to my knowledge not much that can easily be done about jq's TZ-related bugginess other than using gojq, the Go implementation of jq, instead.
Fortunately, time offsets can be handled quite easily, e.g. using the datetime_to_seconds filter as defined below.
So, for the time being, the following solution to the stated problem assumes the use of gojq rather than stedolan/jq. It has two main steps:
1. Use the generic filter datetime_to_seconds to convert the timestamp with an offset to "seconds since the epoch";
2. Use strflocaltime, which recognizes the environment variable TZ.
The solution is embedded in a bash script to facilitate a comparison between different versions of jq and gojq.
bash script
#!/bin/bash
# Syntax: go TZ
function go {
TZ="$1" $jq -Rr '
# Convert a timestamp with a possibly empty timezone offset to seconds since the Epoch.
# Input should be a string of the form yyyy-mm-ddThh:mm:ss or yyyy-mm-ddThh:mm:ss<OFFSET>
# where <OFFSET> is Z, or has the form [-+]hh:mm or [-+]hhmm
# If no timezone offset is explicitly given, it is taken to be Z.
def datetime_to_seconds:
if test("T.*[-+]")  # only an offset sign after the time part counts, not the date's dashes
then
sub("(?<s>[-+])(?<d1>[0-9]{2})(?<d2>[0-9]{2})$"; "\(.s)\(.d1):\(.d2)")
| capture("(?<datetime>^.*T[0-9:]+)(?<s>[-+])(?<hh>[0-9]+):?(?<mm>[0-9]*)")
| (.datetime +"Z" | fromdateiso8601) as $seconds
| (if .s == "+" then -1 else 1 end) as $plusminus
| (.mm | if . == "" then 0 else . end) as $mm
| ([.hh,$mm] | map(tonumber) |.[0] *= 60 | add * 60 * $plusminus) as $offset
| ($seconds + $offset)
else . + (if test("Z") then "" else "Z" end) | fromdateiso8601
end;
datetime_to_seconds
| strflocaltime("%Y-%m-%dT%H:%M:%S %Z")
'
}
for jq in jq-1.6 jqMaster gojq ; do
echo $jq is $($jq --version)
done
echo
for TZ in America/New_York Europe/Madrid ;do
for jq in jq-1.6 jqMaster gojq ; do
for time in 2021-08-04T22:55:12+0000 ; do
echo $jq $TZ $time
echo $time | go $TZ
echo
done
done
done
Output
Here is the output with some "#" annotations.
jq-1.6 is jq-master-2e01ff1
jqMaster is jq-1.6-129-g80052e5-dirty
gojq is gojq 0.12.4 (rev: 244f9f7/go1.16.4)
jq-1.6 America/New_York 2021-08-04T22:55:12+0000
2021-08-04T19:55:12 EST # wrong
jqMaster America/New_York 2021-08-04T22:55:12+0000
2021-08-04T18:55:12 EST # wrong
gojq America/New_York 2021-08-04T22:55:12+0000
2021-08-04T18:55:12 EDT # correct
jq-1.6 Europe/Madrid 2021-08-04T22:55:12+0000
2021-08-05T01:55:12 CET # wrong
jqMaster Europe/Madrid 2021-08-04T22:55:12+0000
2021-08-05T00:55:12 CET # wrong
gojq Europe/Madrid 2021-08-04T22:55:12+0000
2021-08-05T00:55:12 CEST # correct
We have a log file where we store the searches happening on our platform. Now there is a departure date and I want to find the searches where departure date is after 330 days from today.
I am trying to run a query to find the difference between the departure date column and logTime (the time the event was written to the log), but I am getting the error below:
Query could not be parsed at 'datetime("departureDate")' on line [5,54]
Token: datetime("departureDate")
Line: 5
Position: 54
The date format of departureDate is mm/dd/yyyy, and the logTime format is the typical datetime format of Application Insights.
Query that I am running is below:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',datetime("departureDate"),datetime("logTime")) > 200
As suggested, I ran the query below, but now I am getting 0 results even though there is data that satisfies the given criteria.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
Example:
departureDate: 04/09/2020
logTime: 8/13/2019 8:45:39 AM -04:00
I also tried the query below to check whether the date format is supported, and it gave the correct response.
customEvents
| project datetime_diff('day', datetime('04/30/2020'),datetime('8/13/2019 8:25:51 AM -04:00'))
Please use the query below. Use the todatetime() function to convert the string to a datetime:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
The double quotes inside the datetime() operator in the where clause should be removed.
Your code should look like:
where datetime_diff('day',datetime(departureDate),datetime(logTime)) > 200
Let's say I have a file like this with 2 columns
56-cde
67-cde
56-cao
67-cgh
78-xyz
456-hhh
456-jjjj
45678-nnmn
45677-abdc
45678-aief
I am trying to get an output like this:
56-cde
56-cao
67-cde
67-cgh
456-hhh
456-jjjj
45678-aief
45678-nnmn
So basically, instead of printing the unique values, I need to print the duplicates.
I tried to accomplish this using awk like this:
cat input.txt | awk -F"-" '{print $1,$2}' | sort -n | uniq -w 2 -D
This does show me which values in column 1 are duplicated, along with their respective column 2 values. But since I am hardcoding the comparison width to 2 characters, it only catches duplicates among the 2-digit numbers in column 1. Is there a way to do this using awk?
Thanks in advance.
See if your uniq has a -D option. My Cygwin version does:
cat input.txt | sort | uniq -w 2 -D
Another awk solution without arrays (but with a presort):
sort -n file | awk -F- '
NR==1 {p=$1; a=$0; next}        # first line: remember key and line
p==$1 {a=a RS $0; c++; next}    # same key: append the line, count the duplicate
c     {print a}                 # key changed: print the group if it had duplicates
      {a=$0; p=$1; c=0}         # start a new group
END   {if (c) print a}          # flush the last group
'
This is what I came up with (just an awk program, no external sort, uniq etc.):
BEGIN { FS = "-" }
{ arr[$1] = arr[$1] "-" $2 }    # collect all suffixes per number, dash-separated
END {
    for (i in arr) {
        # split on FS ("-") yields an empty first field, so n < 3 means only one entry
        if ((n = split(arr[i], a)) < 3) continue
        for (j = 2; j <= n; ++j)
            print i "-" a[j]
    }
}
It collects all numbers along with the different strings attached in arr (assuming the strings won't contain dashes -).
With gawk, you could use arrays of arrays in order to avoid the concatenation and splitting with dashes.
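A minimal sketch of that gawk variant (assumes GNU awk, which supports arrays of arrays; input.txt is the sample file from the question):
awk -F- '
{ arr[$1][++cnt[$1]] = $0 }           # group whole lines under their numeric key
END {
    for (i in arr)
        if (cnt[i] > 1)               # keep only keys occurring more than once
            for (j = 1; j <= cnt[i]; j++)
                print arr[i][j]
}' input.txt
Lines within a group keep their input order, but the order of the groups themselves is unspecified (for-in order in awk).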
I would handle the varying-number-of-digits case by pre-conditioning the data so that the number field is a fixed large width (and use that width in uniq):
cat input.txt | awk -F- '{printf "%12d-%s\n",$1,$2}' | sort | uniq -w 12 -D
If you need the output left-justified as well, just tack on this post-conditioning step:
| awk '{print $1}'
Using Perl
$ cat two_cols.txt
56-cde
67-cde
56-cao
67-cgh
78-xyz
456-hhh
456-jjjj
45678-nnmn
45677-abdc
45678-aief
$ perl -F"-" -lane ' @t=@{$kv{$F[0]}}; push(@t,$_); $kv{$F[0]}=[@t]; END { while(($x,$y)=each(%kv)){ print join("\n",@{$y}) if scalar @{$y}>1 }} ' two_cols.txt
67-cde
67-cgh
56-cde
56-cao
456-hhh
456-jjjj
45678-nnmn
45678-aief
$