Insert date as epoch_seconds, output as formatted date - datetime

I have a set of timestamps formatted as seconds since the epoch. I'd like to insert to ElasticSearch as epoch_seconds but when querying would like to see the output as a pretty date, e.g. strict_date_optional_time.
My below mapping preserves the format that the input came in - is there any way to normalize the output to just one format via the mapping api?
Current Mapping:
PUT example
{
"mappings": {
"time": {
"properties": {
"time_stamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_second"
}
}
}
}
}
Example docs
POST example/time
{
"time_stamp": "2018-03-18T00:00:00.000Z"
}
POST example/time
{
"time_stamp": "1521389162" // Would like this to output as: 2018-03-18T16:05:50.000Z
}
GET example/_search output:
{
"total": 2,
"max_score": 1,
"hits": [
{
"_source": {
"time_stamp": "1521389162", // Stayed as epoch_second
}
},
{
"_source": {
"time_stamp": "2018-03-18T00:00:00.000Z"
}
}
]
}

Elasticsearch differentiates between the _source and the so called stored fields. The first one is supposed to represent your input.
If you actually use stored fields (by specifying store=true in your mapping) then specify multiple date formats this is easy: (emphasis mine)
Multiple formats can be specified by separating them with || as a separator. Each format will be tried in turn until a matching format is found. The first format will be used to convert the milliseconds-since-the-epoch value back into a string.
I have tested this with elasticsearch 5.6.4 and it works fine:
PUT /test -d '{ "mappings": {"doc": { "properties": {"post_date": {
"type":"date",
"format":"basic_date_time||epoch_millis",
"store":true
} } } } }'
PUT /test/doc/2 -d '{
"user" : "test1",
"post_date" : "20150101T121030.000+01:00"
}'
PUT /test/doc/1 -d '{
"user" : "test2",
"post_date" : 1525167490500
}'
Note how two different input-formats will result in the same format when using GET /test/_search?stored_fields=post_date&pretty=1
{
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"post_date" : [
"20150101T111030.000Z"
]
}
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"post_date" : [
"20180501T093810.500Z"
]
}
}
]
}
If you want to change the input (in _source) you're not so lucky, the mapping-transform feature has been removed:
This was deprecated in 2.0.0 because it made debugging very difficult. As of now there really isn’t a feature to use in its place other than transforming the document in the client application.
If, instead of changing the stored data you are interested in formatting the output, have a look at this answer to Format date in elasticsearch query (during retrieval)

Related

how to parse using Jsonpath?

I want to know the value of 'pageName' where the value of 'createdTs' is '333' in JSON like below, how should I use Jsonpath?
[
{
"Test1":{
"pageName":"Apple"
},
"Test1-1":{
"createdTs":"333"
}
},
{
"Test2":{
"pageName":"Tomato"
},
"Test2-1":{
"createdTs":"555"
}
}
]
.Test1-1[?(#.createdTs == 333)] would look like this:
[
{
"createdTs" : "333"
}
]
The parsing result I want is this.
[
{
"Test1":{
"pageName":"Apple"
},
"Test1-1":{
"createdTs":"333"
}
}
]
if you are using Jayway JSONPath a Java DSL for reading JSON documents, then you can use the these jsonpath expressions.
$[?(#.['Test1-1'].createdTs == 333)]
OR using in filter operator
$[?(333 in #..createdTs)]
OR using contains filter operator
$[?(#..createdTs contains 333)]

How to keep FULL_TRANSITIVE compatibility while adding new types to nested map in avro schema?

I have an existing avro schema that contains a field with a nested map of map of a record type (let's call it RecordA for now). I'm wondering if it's possible to add a new record type, RecordB, to this nested map of maps while maintaining FULL_TRANSIENT compatibility?
My thinking was that as long as the inner maps gets defaulted to an empty map it still adheres to the schema so it's backwards/forward compatible.
I've tried to redefine the type map<map<RecordA>> maps to map<map<union{RecordA, RecordB}>> maps in an .avdl file, but the schema registry is telling me this is not compatible.
I've also tried to default each map individually to an empty map ({ }) in a generated .avsc file, but schema registry says that's incompatible as well.
I do want to acknowledge that I know map<map<..>> is a bad practice, but what's been done has been done.
Registered Schema (original) .avdl:
record Outer {
map<map<RecordA>> maps;
}
record RecordA {
string value;
string updateTime;
}
Attempt with .avdl:
record Outer {
map<map<union{RecordA, RecordB}>> maps = {};
}
record RecordA {
string value;
string updateTime;
}
record RecordB {
union{null, array<string>} values = null;
union{null, string} updateTime = null;
}
Attempt with .avsc:
{
"name" : "maps",
"type" : {
"type" : "map",
"values" : {
"type" : "map",
"values" : [ {
"type" : "record",
"name" : "RecordA",
"fields" : [ {
"name" : "value",
"type" : "string"
}, {
"name" : "updateTime",
"type" : "string"
} ],
"default": { }
}, {
"type" : "record",
"name" : "RecordB",
"fields" : [ {
"name" : "value",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "values",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "updateTime",
"type" : [ "null", "string" ],
"default" : null
} ],
"default": { }
} ]
}
},
"default" : { }
}
The end goal is to have a map of maps to a record who has a field that can either be a string or array<string>. The original schema was registered to a schema-registry where the field has type string with no union {} with null or a default, so I believe the map needs to be map to a union of types with either version of the field.
Each try has returned the following from the schema-registry compatibility API
{
"is_compatible": false
}
Any insight would be very much appreciated!

Elasticsearch setting format for custom date

This is my date format:
10:00 2019-06-03
According to the Elasticsearch documents, I can do this:
{
"mappings": {
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
}
}
}
}
However, when I do this, it doesn't recognise this as a date (and therefore convert it to a timestamp. Does anyone understand why?
https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
Let's say we have the below mapping for the date field you've in your question
PUT <your_index_name>
{
"mappings":{
"properties":{
"date":{
"type":"date",
"format":"HH:mm yyyy-MM-dd||yyyy-MM-dd HH:mm"
}
}
}
}
Notice how I've added the two different types of date formats
Let me add two documents now:
POST mydate/_doc/1
{
"date": "10:00 2019-06-03"
}
POST mydate/_doc/2
{
"date": "2019-06-03 10:00"
}
Notice the above two date values. Semantically they both mean exactly the same. This has to be preserved while querying.
Now if the user wants to search based on semantic meaning of what a date value should be, then he/she should get both the documents.
POST <your_index_name>/_search
{
"query": {
"match": {
"date": "10:00 2019-06-03"
}
}
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "mydate",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"date" : "10:00 2019-06-03"
}
},
{
"_index" : "mydate",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"date" : "2019-06-03 10:00"
}
}
]
}
}
Which is what observed in the response. Both those documents are returned.
This would only be possible if the underlying mechanism to store the values are exactly the same. Inside the inverted index, both these values would be stored as the same long number.
Now if you remove that semantic definition, then both these values are no different than just simple strings, where you know, 10:00 2019-06-03 and 2019-06-03 10:00 are both different, and adhere to semantics of what a string should be (And if date performs like this, why have date datatype at all, correct).
What we specify as format in the mapping is how the date value should appear to the user.
Note the below info from this link:
Internally, dates are converted to UTC (if the time-zone is specified)
and stored as a long number representing milliseconds-since-the-epoch.
Queries on dates are internally converted to range queries on this
long representation, and the result of aggregations and stored fields
is converted back to a string depending on the date format that is
associated with the field.
Hope this helps!

ElasticSearch - difference between two date fields

I have an index in ElasticSearch with two fields of date type (metricsTime & arrivalTime). A sample document is quoted below. In Kibana, I created a scripted field delay for the difference between those two fields. My painless script is:
doc['arrivalTime'].value - doc['metricsTime'].value
However, I got the following error message when navigating to Kibana's Discover tab: class_cast_exception: Cannot apply [-] operation to types [org.joda.time.MutableDateTime] and [org.joda.time.MutableDateTime].
This looks same as the error mentioned in https://discuss.elastic.co/t/problem-in-difference-between-two-dates/121655. But the answer in that page suggests that my script is correct. Could you please help?
Thanks!
{
"_index": "events",
"_type": "_doc",
"_id": "HLV274_1537682400000",
"_version": 1,
"_score": null,
"_source": {
"metricsTime": 1537682400000,
"box": "HLV274",
"arrivalTime": 1539930920347
},
"fields": {
"metricsTime": [
"2018-09-23T06:00:00.000Z"
],
"arrivalTime": [
"2018-10-19T06:35:20.347Z"
]
},
"sort": [
1539930920347
]
}
Check the list of Lucene Expressions to check what expressions are available for date field and how you could use them
Just for sake of simplicity, check the below query. I have created two fields metricsTime and arrivalTime in a sample index I've created.
Sample Document
POST mydateindex/mydocs/1
{
"metricsTime": "2018-09-23T06:00:00.000Z",
"arrivalTime": "2018-10-19T06:35:20.347Z"
}
Query using painless script
POST mydateindex/_search
{ "query": {
"bool": {
"must": {
"match_all": {
}
},
"filter": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference",
"lang" : "painless",
"params": {
"difference": 2
}
}
}
}
}
}
}
}
}
Note the below line in the query
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference"
Now if you change the value of difference from 2 to 26 (which is one more than the difference in the dates) then you see that the above query would not return the document.
But nevertheless, I have mentioned the query as an example as how using scripting you can compare two different and please do refer to the link I've shared.

API Gateway and DynamoDB PutItem for String Set

I can't seem to find how to correctly call PutItem for a StringSet in DynamoDB through API Gateway. If I call it like I would for a List of Maps, then I get objects returned. Example data is below.
{
"eventId": "Lorem",
"eventName": "Lorem",
"companies": [
{
"companyId": "Lorem",
"companyName": "Lorem"
}
],
"eventTags": [
"Lorem",
"Lorem"
]
}
And my example template call for companies:
"companies" : {
"L": [
#foreach($elem in $inputRoot.companies) {
"M": {
"companyId": {
"S": "$elem.companyId"
},
"companyName": {
"S": "$elem.companyName"
}
}
} #if($foreach.hasNext),#end
#end
]
}
I've tried to call it with String Set listed, but it errors out still and tells me that "Start of structure or map found where not expected" or that serialization failed.
"eventTags" : {
"SS": [
#foreach($elem in $inputRoot.eventTags) {
"S":"$elem"
} #if($foreach.hasNext),#end
#end
]
}
What is the proper way to call PutItem for converting an array of strings to a String Set?
If you are using JavaScript AWS SDK, you can use document client API (docClient.createSet) to store the SET data type.
docClient.createSet - converts the array into SET data type
var docClient = new AWS.DynamoDB.DocumentClient();
var params = {
TableName:table,
Item:{
"yearkey": year,
"title": title
"product" : docClient.createSet(['milk','veg'])
}
};

Resources