Elasticsearch setting format for custom date - datetime

This is my date format:
10:00 2019-06-03
According to the Elasticsearch documentation, I can do this:
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "HH:mm yyyy-MM-dd"
      }
    }
  }
}
However, when I do this, it doesn't recognise this as a date (and therefore doesn't convert it to a timestamp). Does anyone understand why?
https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html

Let's say we have the below mapping for the date field you have in your question:
PUT mydate
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "HH:mm yyyy-MM-dd||yyyy-MM-dd HH:mm"
      }
    }
  }
}
Notice how I've added the two different date formats, separated by ||.
Let me add two documents now:
POST mydate/_doc/1
{
  "date": "10:00 2019-06-03"
}
POST mydate/_doc/2
{
  "date": "2019-06-03 10:00"
}
Notice the above two date values: semantically they mean exactly the same thing, and this has to be preserved while querying.
Now, if the user searches based on the semantic meaning of a date value, they should get both documents.
POST mydate/_search
{
  "query": {
    "match": {
      "date": "10:00 2019-06-03"
    }
  }
}
Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "mydate",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "date" : "10:00 2019-06-03"
        }
      },
      {
        "_index" : "mydate",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "date" : "2019-06-03 10:00"
        }
      }
    ]
  }
}
Both documents are returned, which is exactly what we observe in the response.
This is only possible if the underlying stored values are exactly the same: inside the inverted index, both these values are stored as the same long number.
Now, if you remove that semantic definition, both these values are nothing more than plain strings, and as strings 10:00 2019-06-03 and 2019-06-03 10:00 are different (and if the date type behaved like that, why have a date datatype at all?).
What we specify as format in the mapping is how the date value should appear to the user.
Note the below info from this link:
Internally, dates are converted to UTC (if the time-zone is specified)
and stored as a long number representing milliseconds-since-the-epoch.
Queries on dates are internally converted to range queries on this
long representation, and the result of aggregations and stored fields
is converted back to a string depending on the date format that is
associated with the field.
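To see this in action, you can ask for the doc values of the field in an explicit format; both documents should come back with the same underlying millisecond value. Here is a minimal sketch against the mydate index above (assuming a version where docvalue_fields accepts an object with a format):
POST mydate/_search
{
  "_source": false,
  "docvalue_fields": [
    {
      "field": "date",
      "format": "epoch_millis"
    }
  ]
}
Both hits should report the same value, 1559556000000 (2019-06-03T10:00 UTC), under fields.date, despite the two differently formatted inputs.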
Hope this helps!

Related

Elasticsearch query using Kibana does not work using Java Rest Client API

Can someone help determine why a Kibana query does not return hits when using the Elasticsearch Java REST Client API?
I am currently using
Elasticsearch/Kibana: 7.16.2
Elasticsearch Java Client: 6.6.2
I am reluctant to upgrade the Java client due to the numerous geometry-related updates that would be needed.
fields:
mydatefield: timestamp of doc
category: keyword field
We have 1000 or more records for each category a day.
I want an aggregation that shows categories by day and includes the first and last "date" for the category.
This query works in Kibana
GET /mycategories/_search
{
  "size": 0,
  "aggregations": {
    "bucket_by_date": {
      "date_histogram": {
        "field": "mydatefield",
        "format": "yyyy-MM-dd",
        "interval": "1d",
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 1
      },
      "aggregations": {
        "unique_categories": {
          "terms": {
            "field": "category",
            "size": 10,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ]
          },
          "aggregations": {
            "min_mydatefield": {
              "min": {
                "field": "mydatefield"
              }
            },
            "max_mydatefield": {
              "max": {
                "field": "mydatefield"
              }
            }
          }
        }
      }
    }
  }
}
The first bucket of the result: category1 and category2 for 2022-05-07, with min and max mydatefield for each category.
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2593,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "bucket_by_date" : {
      "buckets" : [
        {
          "key_as_string" : "2022-05-07",
          "key" : 1651881600000,
          "doc_count" : 2,
          "unique_categories" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "category1",
                "doc_count" : 1,
                "min_mydatefield" : {
                  "value" : 1.651967952E12,
                  "value_as_string" : "2022-05-07T13:22:17.000Z"
                },
                "max_mydatefield" : {
                  "value" : 1.651967952E12,
                  "value_as_string" : "2022-05-07T23:59:12.000Z"
                }
              },
              {
                "key" : "category2",
                "doc_count" : 1,
                "min_mydatefield" : {
                  "value" : 1.651967947E12,
                  "value_as_string" : "2022-05-07T03:47:23.000Z"
                },
                "max_mydatefield" : {
                  "value" : 1.651967947E12,
                  "value_as_string" : "2022-05-07T23:59:07.000Z"
                }
              }
            ]
          }
        },
I have successfully coded other, less complex aggregations without problems. However, I have not been able to get this one to work with either an AggregationBuilder or a WrapperQuery; zero results are returned:
{"took":0,"timed_out":false,"_shards":{"total":0,"successful":0,"skipped":0,"failed":0},"hits":{"total":0,"max_score":0.0,"hits":[]}}
Before executing the query, I copy the SearchRequest.source() into Kibana, and it runs and returns the desired information.
Below is the AggregationBuilder code that seems to replicate my Kibana query, but returns no results.
AggregationBuilder aggregation =
    AggregationBuilders
        .dateHistogram("bucket_by_date")
        .format("yyyy-MM-dd")
        .minDocCount(1)
        .dateHistogramInterval(DateHistogramInterval.DAY)
        .field("mydatefield")
        .subAggregation(
            AggregationBuilders
                .terms("unique_categories")
                .field("category")
                .subAggregation(
                    AggregationBuilders
                        .min("min_mydatefield")
                        .field("mydatefield"))
                .subAggregation(
                    AggregationBuilders
                        .max("max_mydatefield")
                        .field("mydatefield")));
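One thing worth checking, though it may not be the root cause here: on Elasticsearch 7.x the date_histogram parameter interval (which the 6.x client's dateHistogramInterval(...) serializes to) is deprecated in favour of calendar_interval / fixed_interval. A sketch of the same histogram in 7.x form, reusing the index and field names from the question:
GET /mycategories/_search
{
  "size": 0,
  "aggregations": {
    "bucket_by_date": {
      "date_histogram": {
        "field": "mydatefield",
        "format": "yyyy-MM-dd",
        "calendar_interval": "1d",
        "min_doc_count": 1
      }
    }
  }
}
Also note that the empty response reports "_shards":{"total":0}, i.e. the request touched no shards at all, which usually means it did not match any index; verifying the index name the Java client actually targets may be worthwhile.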

How to keep FULL_TRANSITIVE compatibility while adding new types to nested map in avro schema?

I have an existing Avro schema that contains a field with a nested map of a map of a record type (let's call it RecordA for now). I'm wondering if it's possible to add a new record type, RecordB, to this nested map of maps while maintaining FULL_TRANSITIVE compatibility?
My thinking was that as long as the inner maps get defaulted to an empty map, the data still adheres to the schema, so it's backwards/forwards compatible.
I've tried to redefine the field map<map<RecordA>> maps as map<map<union{RecordA, RecordB}>> maps in an .avdl file, but the schema registry is telling me this is not compatible.
I've also tried to default each map individually to an empty map ({ }) in a generated .avsc file, but schema registry says that's incompatible as well.
I do want to acknowledge that I know map<map<..>> is a bad practice, but what's been done has been done.
Registered Schema (original) .avdl:
record Outer {
  map<map<RecordA>> maps;
}
record RecordA {
  string value;
  string updateTime;
}
Attempt with .avdl:
record Outer {
  map<map<union{RecordA, RecordB}>> maps = {};
}
record RecordA {
  string value;
  string updateTime;
}
record RecordB {
  union{null, array<string>} values = null;
  union{null, string} updateTime = null;
}
Attempt with .avsc:
{
  "name" : "maps",
  "type" : {
    "type" : "map",
    "values" : {
      "type" : "map",
      "values" : [ {
        "type" : "record",
        "name" : "RecordA",
        "fields" : [ {
          "name" : "value",
          "type" : "string"
        }, {
          "name" : "updateTime",
          "type" : "string"
        } ],
        "default": { }
      }, {
        "type" : "record",
        "name" : "RecordB",
        "fields" : [ {
          "name" : "value",
          "type" : [ "null", "string" ],
          "default" : null
        }, {
          "name" : "values",
          "type" : [ "null", "string" ],
          "default" : null
        }, {
          "name" : "updateTime",
          "type" : [ "null", "string" ],
          "default" : null
        } ],
        "default": { }
      } ]
    }
  },
  "default" : { }
}
The end goal is to have a map of maps whose values are a record with a field that can be either a string or an array<string>. The original schema was registered to a schema registry with that field typed as a plain string, with no union with null and no default, so I believe the map values need to become a union of types covering either version of the field.
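For illustration, the desired field shape in .avsc form might look like the fragment below; this is a hypothetical sketch of the goal, not a schema that has passed the registry's compatibility check:
{
  "name" : "value",
  "type" : [ "string", {
    "type" : "array",
    "items" : "string"
  } ]
}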
Each try has returned the following from the schema-registry compatibility API:
{
  "is_compatible": false
}
Any insight would be very much appreciated!

ElasticSearch - difference between two date fields

I have an index in ElasticSearch with two fields of date type (metricsTime & arrivalTime). A sample document is quoted below. In Kibana, I created a scripted field delay for the difference between those two fields. My painless script is:
doc['arrivalTime'].value - doc['metricsTime'].value
However, I got the following error message when navigating to Kibana's Discover tab: class_cast_exception: Cannot apply [-] operation to types [org.joda.time.MutableDateTime] and [org.joda.time.MutableDateTime].
This looks the same as the error mentioned in https://discuss.elastic.co/t/problem-in-difference-between-two-dates/121655, but the answer on that page suggests that my script is correct. Could you please help?
Thanks!
{
  "_index": "events",
  "_type": "_doc",
  "_id": "HLV274_1537682400000",
  "_version": 1,
  "_score": null,
  "_source": {
    "metricsTime": 1537682400000,
    "box": "HLV274",
    "arrivalTime": 1539930920347
  },
  "fields": {
    "metricsTime": [
      "2018-09-23T06:00:00.000Z"
    ],
    "arrivalTime": [
      "2018-10-19T06:35:20.347Z"
    ]
  },
  "sort": [
    1539930920347
  ]
}
Check the list of Lucene Expressions to see which expressions are available for date fields and how you can use them.
Just for the sake of simplicity, check the below query. I have created the two fields metricsTime and arrivalTime in a sample index.
Sample Document
POST mydateindex/mydocs/1
{
  "metricsTime": "2018-09-23T06:00:00.000Z",
  "arrivalTime": "2018-10-19T06:35:20.347Z"
}
Query using painless script
POST mydateindex/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "script": {
              "script": {
                "inline": "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference",
                "lang": "painless",
                "params": {
                  "difference": 2
                }
              }
            }
          }
        }
      }
    }
  }
}
Note the below line in the query
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference"
Now if you change the value of difference from 2 to 26 (exactly the dayOfYear difference between the two dates), the above query no longer returns the document, since the script requires the difference to be strictly greater than params.difference.
Nevertheless, I have included the query as an example of how you can compare two date fields using scripting; please do refer to the link I've shared.
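As for the class_cast_exception itself: with the Joda-based date fields of older Elasticsearch versions, a common workaround is to subtract the underlying millisecond values rather than the date objects. A sketch for the scripted field, reusing the field names from the question:
doc['arrivalTime'].value.millis - doc['metricsTime'].value.millis
This yields the delay in milliseconds; on Elasticsearch 7 and later, where date fields are Java-time based, the equivalent would be along the lines of doc['arrivalTime'].value.toInstant().toEpochMilli() - doc['metricsTime'].value.toInstant().toEpochMilli().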

Place detail "place.opening_hours" not returning data for all markers

I am having an issue where I make a request to the Places Library of the Google Maps JavaScript API for Place Details, but the response does not return data under the opening_hours object for all markers, only for some of them. I have checked that those markers mark actual places from my text search and that they return data for the other fields. I also checked, in the native Google Maps app, the places that return "undefined" for opening_hours, and they do show opening and closing hours there. Here is some code to better understand what I mean:
map.ts:
//Request for all places based on query search
var request = {
  location: this.myLocation,
  radius: '400',
  query: "McDonalds"
};
// Callback function for places
function callback(results, status) {
  if (status == google.maps.places.PlacesServiceStatus.OK) {
    for (let i = 0; i < results.length; i++) {
      let placeLoc = results[i].geometry.location;
      scopeObj.addMarker(results[i], placeLoc);
    }
  }
};
let service = new google.maps.places.PlacesService(this.map);
service.textSearch(request, callback);
//Request for place details of each place (this would loop through all place ids stored in an array)
var request = {
  placeId: details.place_id,
  fields: ['name', 'formatted_address', 'formatted_phone_number', 'opening_hours']
};
function callback(place, status) {
  if (status == google.maps.places.PlacesServiceStatus.OK) {
    console.log(place.opening_hours);
    /* Logs an object if data is found, and undefined
       otherwise. The question is why, given that status
       returns OK for all places that have been marked
       with markers on the map? */
  }
}
let service = new google.maps.places.PlacesService(this.map);
service.getDetails(request, callback);
Using my own implementation of the API, I am receiving opening_hours information in responses from various places. I've omitted all the other fluff, but this is a Place Details request I've made and can repeat on numerous businesses while still getting the info:
{
  "html_attributions" : [],
  "result" : {
    OMITTED
    "name" : "McDonald's",
    "opening_hours" : {
      "open_now" : true,
      "periods" : [
        { "close" : { "day" : 0, "time" : "1700" }, "open" : { "day" : 0, "time" : "0800" } },
        { "close" : { "day" : 1, "time" : "2100" }, "open" : { "day" : 1, "time" : "0800" } },
        { "close" : { "day" : 2, "time" : "2100" }, "open" : { "day" : 2, "time" : "0800" } },
        { "close" : { "day" : 3, "time" : "2100" }, "open" : { "day" : 3, "time" : "0800" } },
        { "close" : { "day" : 4, "time" : "2100" }, "open" : { "day" : 4, "time" : "0800" } },
        { "close" : { "day" : 5, "time" : "2100" }, "open" : { "day" : 5, "time" : "0800" } },
        { "close" : { "day" : 6, "time" : "1700" }, "open" : { "day" : 6, "time" : "0800" } }
      ],
      "weekday_text" : [
        "Monday: 8:00 AM – 9:00 PM",
        "Tuesday: 8:00 AM – 9:00 PM",
        "Wednesday: 8:00 AM – 9:00 PM",
        "Thursday: 8:00 AM – 9:00 PM",
        "Friday: 8:00 AM – 9:00 PM",
        "Saturday: 8:00 AM – 5:00 PM",
        "Sunday: 8:00 AM – 5:00 PM"
      ]
    },
    OMITTED
  },
  "status" : "OK"
}
The only difference I can see between your implementation and mine is that you're specifying which fields you'd like in your request.
You can try retrieving all fields rather than specifically requesting the fields like you've done:
var request = {
  placeId: details.place_id
};
The fields parameter is optional, and specifying it could be what's causing your issue.

Insert date as epoch_seconds, output as formatted date

I have a set of timestamps formatted as seconds since the epoch. I'd like to index them into Elasticsearch as epoch_second, but when querying I'd like to see the output as a pretty date, e.g. strict_date_optional_time.
My below mapping preserves whatever format the input came in; is there any way to normalize the output to just one format via the mapping API?
Current Mapping:
PUT example
{
  "mappings": {
    "time": {
      "properties": {
        "time_stamp": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_second"
        }
      }
    }
  }
}
Example docs
POST example/time
{
  "time_stamp": "2018-03-18T00:00:00.000Z"
}
POST example/time
{
  "time_stamp": "1521389162" // Would like this to output as: 2018-03-18T16:06:02.000Z
}
GET example/_search output:
{
  "total": 2,
  "max_score": 1,
  "hits": [
    {
      "_source": {
        "time_stamp": "1521389162" // Stayed as epoch_second
      }
    },
    {
      "_source": {
        "time_stamp": "2018-03-18T00:00:00.000Z"
      }
    }
  ]
}
Elasticsearch differentiates between the _source and the so-called stored fields. The first one is supposed to represent your input.
If you actually use stored fields (by specifying store=true in your mapping) and specify multiple date formats, this is easy (emphasis mine):
Multiple formats can be specified by separating them with || as a separator. Each format will be tried in turn until a matching format is found. The first format will be used to convert the milliseconds-since-the-epoch value back into a string.
I have tested this with elasticsearch 5.6.4 and it works fine:
PUT /test -d '{
  "mappings": {
    "doc": {
      "properties": {
        "post_date": {
          "type": "date",
          "format": "basic_date_time||epoch_millis",
          "store": true
        }
      }
    }
  }
}'
PUT /test/doc/2 -d '{
  "user" : "test1",
  "post_date" : "20150101T121030.000+01:00"
}'
PUT /test/doc/1 -d '{
  "user" : "test2",
  "post_date" : 1525167490500
}'
Note how the two different input formats result in the same output format when using GET /test/_search?stored_fields=post_date&pretty=1:
{
  "hits" : [
    {
      "_index" : "test",
      "_type" : "doc",
      "_id" : "2",
      "_score" : 1.0,
      "fields" : {
        "post_date" : [
          "20150101T111030.000Z"
        ]
      }
    },
    {
      "_index" : "test",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 1.0,
      "fields" : {
        "post_date" : [
          "20180501T093810.500Z"
        ]
      }
    }
  ]
}
If you want to change the input (in _source), you're not so lucky: the mapping-transform feature has been removed:
This was deprecated in 2.0.0 because it made debugging very difficult. As of now there really isn’t a feature to use in its place other than transforming the document in the client application.
If, instead of changing the stored data you are interested in formatting the output, have a look at this answer to Format date in elasticsearch query (during retrieval)
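As a further option in more recent Elasticsearch versions, you can format date output at query time with docvalue_fields instead of stored fields. A minimal sketch against the example index from the question (assuming a version where docvalue_fields accepts an object with a format):
GET example/_search
{
  "docvalue_fields": [
    {
      "field": "time_stamp",
      "format": "strict_date_optional_time"
    }
  ]
}
Each hit then carries fields.time_stamp rendered in the requested format, regardless of which format the document was originally supplied in.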
