elasticsearch - Date format requires exactly 3 decimals - datetime

I'm having trouble with date parsing in elasticsearch 7.10.1.
Here's (a relevant part of) the mapping for the index:
"utcTime": {
"type": "date",
"format": "strict_date_optional_time_nanos"
}
Date format reference.
Some of the documents are accepted, for example documents with:
"utcTime": "2021-02-17T09:50:13.173Z"
"utcTime": "2021-02-17T09:51:44.158Z"
Note that in both cases, there are exactly 3 decimals to the seconds.
This, on the other hand, is rejected:
"utcTime": "2021-02-17T09:51:45.07Z"
illegal_argument_exception: failed to parse date field [2021-02-17T09:51:45.07Z] with format [yyyy-MM-dd''T''HH:mm:ss.SSSXX]
In this case, there are only two decimals. I'm using Newtonsoft's JSON.net to do the serialization, with a format that should always include 3 decimals, but it doesn't seem to do so anyway. It'll include at most 3 decimals, though.
How can I tell elasticsearch to accept date formats with anywhere between 0 and 3 decimals for the seconds?
EDIT
I finally found the issue, which had nothing to do with the mapping, but rather with a pipeline processor date_index_name.
PUT _ingest/pipeline/test_reroute_pipeline
{
"description" : "Route documents to another index",
"processors" : [
{
"date_index_name": {
"field": "utcTime",
"date_rounding": "d",
"index_name_prefix": "rerouted-"
}
}
]
}
Because the date_formats parameter wasn't defined, the processor appeared to remember the format of the first date it received: if that date had 2 decimals, it would require 2 every time; if it had 3, it would require 3.
Specifying the date format solved the issue for good:
PUT _ingest/pipeline/test_reroute_pipeline
{
"description" : "Route documents to another index",
"processors" : [
{
"date_index_name": {
"field": "utcTime",
"date_rounding": "d",
"index_name_prefix": "rerouted-",
"date_formats": ["ISO8601"]
}
}
]
}
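If the pipeline cannot be changed, a client-side alternative would be to pad the fractional seconds to exactly three digits before indexing. The following is a minimal Python sketch of that idea, not part of the original setup; the helper name pad_millis is hypothetical, and it assumes UTC timestamps ending in Z with at most three decimals, as described above.

```python
import re

# Matches e.g. 2021-02-17T09:51:45Z or 2021-02-17T09:51:45.07Z (0 to 3 decimals).
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,3}))?Z$")


def pad_millis(timestamp: str) -> str:
    """Return the timestamp with exactly three fractional-second digits."""
    match = _TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError(f"unexpected timestamp: {timestamp!r}")
    base, fraction = match.group(1), match.group(2) or ""
    # Right-pad with zeros: "07" becomes "070", "" becomes "000".
    return f"{base}.{fraction.ljust(3, '0')}Z"


print(pad_millis("2021-02-17T09:51:45.07Z"))
```

Padding on the right keeps the value unchanged, since .07 and .070 denote the same instant.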

I just tried on a fresh new 7.10.1 cluster and it also accepted 1, 2, 3 decimals for the seconds part.
Looking at the error message you get
illegal_argument_exception: failed to parse date field [2021-02-17T09:51:45.07Z] with format [yyyy-MM-dd''T''HH:mm:ss.SSSXX]
The format that seems to be in effect is yyyy-MM-dd''T''HH:mm:ss.SSSXX, which differs from strict_date_optional_time_nanos (yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ).
If you check the real mapping from your index, I'm pretty sure the utcTime field doesn't have strict_date_optional_time_nanos as the format.
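One way to check this is to fetch the mapping (GET <index>/_mapping) and read the field's format from the response. Below is a small Python sketch of that lookup; the index name my-index and the sample response are hypothetical, and the helper only handles top-level fields in the 7.x response layout.

```python
from typing import Optional


def field_format(mapping_response: dict, index: str, field: str) -> Optional[str]:
    """Return the 'format' of a top-level field, or None if none is set."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    return properties.get(field, {}).get("format")


# Hypothetical response body of GET my-index/_mapping on a 7.x cluster.
sample = {
    "my-index": {
        "mappings": {
            "properties": {
                "utcTime": {
                    "type": "date",
                    "format": "strict_date_optional_time_nanos",
                }
            }
        }
    }
}

print(field_format(sample, "my-index", "utcTime"))
```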


Pinot fasthll and distinctcounthll returns different values

We are using Pinot HLL and were advised to switch from fasthll to distinctcounthll, but the counts are very different: with the same condition, we see a 1000x difference.
Example:
SELECT fasthll(my_hll), distinctcounthll(my_hll)
FROM counts_table WHERE timestamp >= 1500768000
I get results:
"aggregationResults": [
{
"function": "fastHLL_my_hll",
"value": "68685244"
}, {
"function": "distinctCountHLL_my_hll",
"value": "50535"
}]
Could anyone suggest what's the big difference between them?
Please refer to pinot-issue-5153.
fasthll converts each string into a HyperLogLog object, which may represent thousands of unique values. distinctcounthll treats the string as a plain value, not as a HyperLogLog object, so it returns an approximation of how many unique serialized HyperLogLog strings there are; that value should be close to the total number of rows scanned.
fasthll is deprecated because of its slow deserialization. You can instead generate a BYTES column holding the serialized HyperLogLog using org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog) and query it with distinctcounthll.
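The difference can be illustrated with a toy Python sketch. It does not use Pinot or a real HyperLogLog: plain sets of user IDs stand in for the serialized sketches, and both function names are hypothetical.

```python
# Each row stores one "sketch"; a frozenset of user IDs stands in for a
# serialized HyperLogLog (an assumption for illustration, not Pinot's format).
rows = [
    frozenset({"u1", "u2", "u3"}),
    frozenset({"u3", "u4"}),
    frozenset({"u5", "u6", "u7", "u8"}),
]


def merged_cardinality(sketches):
    """fasthll-like: merge all sketches, then count the unique members."""
    merged = set()
    for sketch in sketches:
        merged |= sketch
    return len(merged)


def distinct_serialized_values(sketches):
    """distinctcounthll on strings: count the distinct serialized blobs."""
    return len({",".join(sorted(sketch)) for sketch in sketches})


print(merged_cardinality(rows), distinct_serialized_values(rows))
```

The first number counts the underlying users, while the second only counts rows with distinct contents, which is why the two results in the question differ so widely.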

Saving date in microsecond format in ElasticSearch

I am trying to import a set of events from a MySQL database into Elasticsearch using the JDBC input plugin with Logstash. The event records in the database contain date fields with microsecond precision, and in practice there are records that differ only at the microsecond level.
While importing the data, Elasticsearch truncates the microsecond timestamps to milliseconds. How can I store the data with microsecond precision? The Elasticsearch documentation says it follows the Joda-Time API for date formats, which does not support microseconds and truncates them, adding a Z at the end of the timestamp.
Sample timestamp after truncation : 2018-05-02T08:13:29.268Z
Original timestamp in database : 2018-05-02T08:13:29.268482
The Z is not a result of the truncation but the GMT timezone.
ES supports microseconds, too, provided you've specified the correct date format in your mapping.
If the date field in your mapping is specified like this:
"date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
}
Then you can index your dates with the same microsecond precision you have in your database.
UPDATE
Here is a full re-creation that shows you that it works:
PUT myindex
{
"mappings": {
"doc": {
"properties": {
"date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
}
}
}
}
}
PUT myindex/doc/1
{
"date": "2018-05-02T08:13:29.268482"
}
Side note: the "date" datatype stores values at millisecond resolution in Elasticsearch, so if nanosecond precision is needed in date range queries, the appropriate datatype is date_nanos.
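To see why the resolution matters, here is a small Python sketch (plain arithmetic, no Elasticsearch involved) that converts two timestamps to epoch values at both resolutions. The second timestamp is a made-up event within the same millisecond as the one from the question.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def epoch_micros(timestamp: str) -> int:
    """Microseconds since the epoch, using integer arithmetic only."""
    delta = datetime.fromisoformat(timestamp) - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


first = epoch_micros("2018-05-02T08:13:29.268482")
second = epoch_micros("2018-05-02T08:13:29.268913")  # hypothetical second event

# Millisecond resolution (as with "date"): the two events look identical.
print(first // 1000 == second // 1000)
# Nanosecond resolution (as with "date_nanos"): they remain distinct.
print(first * 1000 == second * 1000)
```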

How do I create a Slot that accepts a currency amount

I want to receive a dollar amount in my utterance. So, for example, if I ask Alexa:
Send $100.51 to Kroger.
(pronounced, One hundred dollars and fifty one cents) I want to receive the value 100.51 in a proper slot.
I have tried searching and I defined my utterance slots like this:
"slots": [
{
"name": "Amount",
"type": "AMAZON.NUMBER"
}
]
But on JSON input I only get this result:
"slots": {
"Amount": {
"name": "Amount",
"value": "?"
}
}
How can I make Alexa accept currency values?
I'm a bit confused by what you wrote in your last sentence and the code, but I'll confirm that there is no built-in intent or slot for handling currency.
So, you'll have to do it manually using the AMAZON.NUMBER slot type, as you seem to be trying.
I would imagine that you will want to create utterances with two AMAZON.NUMBER slots: one for dollars and one for cents.
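On the skill side, the two slot values could then be combined into one amount. The following Python sketch assumes the slots are named Dollars and Cents (hypothetical names) and arrive in the same shape as the JSON shown in the question; it does not handle unrecognized values such as "?".

```python
from decimal import Decimal


def amount_from_slots(slots: dict) -> Decimal:
    """Combine hypothetical Dollars and Cents slots into one decimal amount."""
    # A missing or empty slot is treated as zero.
    dollars = slots.get("Dollars", {}).get("value") or "0"
    cents = slots.get("Cents", {}).get("value") or "0"
    return Decimal(dollars) + Decimal(cents) / 100


slots = {
    "Dollars": {"name": "Dollars", "value": "100"},
    "Cents": {"name": "Cents", "value": "51"},
}
print(amount_from_slots(slots))
```

Decimal is used instead of float so that cent values are represented exactly.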
Easy: make a custom slot and use $10.11, $.03, and $1003.84 as the sample values. It will then work as currency, accepting users' dollars-and-cents utterances and converting them to a $XX.XX format.

Moment throwing TypeError string.indexOf when using Backgrid

I am using Backgrid to display JSON results from the backend DB. One of the fields is a time in milliseconds since 1970 (e.g. 1362578461000). When the Backbone view receives this data, Backgrid passes it to Moment for formatting. Moment then throws a JavaScript TypeError, reported at line 758 of moment.js (v2.0.0):
TypeError: string.indexOf is not a function
The column format looks like this:
var columns = [{
name: "startTime",
label: "Start Time",
editable: false,
cell: "moment"
}, {
name: "endTime",
label: "End Time",
editable: false,
cell: "moment"
}];
Putting a breakpoint in Firebug, it looks like Moment sees the value as an integer rather than a string.
utc()moment.js (line 960)
input = 1362578461000
format = "YYYY-MM-DDTHH:mm:ssZ"
lang = undefined
And the call to makeDateFromStringAndFormat looks like this:
makeDateFromStringAndFormat()moment.js (line 758)
config = Object { _useUTC=true, _isUTC=true, _i=1362578461000, more...}
Any ideas as to what I can do to fix/get around this?
Thanks
Author of Backgrid here. There are 2 parts to your question:
Backgrid.js has only gained compatibility with moment.js 2.0.0 in 0.2.5 released yesterday.
The moment cell doesn't accept integers as input in the model because it tries to convert timezones and locales, so your model values have to be in a datetime string format that moment knows how to parse.
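If the backend can be changed, one option is to send the timestamps as datetime strings in the first place. This is only a sketch under that assumption, written in Python for illustration; the function name is hypothetical and the input is taken to be milliseconds since the epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_iso(millis: int) -> str:
    """Convert milliseconds since the epoch to an ISO 8601 string in UTC."""
    return (EPOCH + timedelta(milliseconds=millis)).isoformat()


print(epoch_millis_to_iso(1362578461000))  # 2013-03-06T14:01:01+00:00
```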

No Idea how to create a specific MapReduce in CouchDB

I've got 3 types of documents in my db:
{
param: "a",
timestamp: "t"
} (Type 1)
{
param: "b",
partof: "a"
} (Type 2)
{
param: "b",
timestamp: "x"
} (Type 3)
(I can't alter the layout...;-( )
Type 1 defines a start timestamp, it's like the start event. A Type 1 is connected to several Type 3 docs by Type 2 documents.
I want to get the latest Type 3 (highest timestamp) and the corresponding type 1 document.
How may I organize my Map/Reduce?
Easy. For highly relational data, use a relational database.
As user jhs stated before me, your data is relational, and if you can't change it, then you might want to reconsider using CouchDB.
By relational we mean that each "type 1" or "type 3" document in your data "knows" only about itself, while "type 2" documents hold the knowledge about the relation between documents of the other types. With CouchDB, you can only index by fields in the documents themselves, and go one level deeper when querying by using include_docs=true. Thus, what you asked for cannot be achieved with a single CouchDB query, because some of the desired data is two levels away from the requested document.
Here is a two-query solution:
{
"views": {
"param-by-timestamp": {
"map": "function(doc) { if (doc.timestamp) emit(doc.timestamp, [doc.timestamp, doc.param]); }",
"reduce": "function(keys, values) { return values.reduce(function(p, c) { return c[0] > p[0] ? c : p }) }"
},
"partof-by-param": {
"map": "function(doc) { if (doc.partof) emit(doc.param, doc.partof); }"
}
}
}
Query it first with param-by-timestamp?reduce=true to get the latest timestamp in value[0] and its corresponding param in value[1], then query again with partof-by-param?key="<what you got in the previous query>". If you need to fetch the full documents together with the timestamp and param, you will have to use include_docs=true and emit the correct _id values.
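The flow of the two queries can be simulated in memory. The Python sketch below does not talk to CouchDB; it mimics what the two views return, using made-up timestamp values for the three document types from the question.

```python
# Sample documents with hypothetical timestamps, one per type.
docs = [
    {"param": "a", "timestamp": "2013-01-01T00:00:00"},  # Type 1
    {"param": "b", "partof": "a"},                        # Type 2
    {"param": "b", "timestamp": "2013-01-02T00:00:00"},  # Type 3
]


def latest_by_timestamp(docs):
    """Mimics param-by-timestamp with reduce: the row with the highest timestamp."""
    rows = [(d["timestamp"], d["param"]) for d in docs if "timestamp" in d]
    return max(rows, key=lambda row: row[0])


def partof_by_param(docs, key):
    """Mimics partof-by-param?key=...: the partof values for a given param."""
    return [d["partof"] for d in docs if "partof" in d and d["param"] == key]


timestamp, param = latest_by_timestamp(docs)  # first query
print(param, partof_by_param(docs, param))    # second query
```

As in the real views, the first lookup considers every document that has a timestamp, so it relies on the latest timestamp belonging to a Type 3 document.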
