I am trying to save a set of events from a MySQL database into Elasticsearch using the jdbc input plugin with Logstash. The event records in the database contain date fields with microsecond precision; in practice, there are records that differ from each other only by microseconds.
While importing the data, Elasticsearch truncates the microsecond timestamps to millisecond precision. How can I store the data with microsecond precision? The Elasticsearch documentation says dates are handled via the Joda-Time API, which does not support microseconds and truncates the value, adding a Z at the end of the timestamp.
Sample timestamp after truncation: 2018-05-02T08:13:29.268Z
Original timestamp in the database: 2018-05-02T08:13:29.268482
The Z is not a result of the truncation; it just denotes the GMT/UTC timezone.
ES supports microseconds, too, provided you've specified the correct date format in your mapping.
If the date field in your mapping is specified like this:
"date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
}
Then you can index your dates with exactly the microsecond precision you have in your database.
UPDATE
Here is a full re-creation that shows you that it works:
PUT myindex
{
"mappings": {
"doc": {
"properties": {
"date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
}
}
}
}
}
PUT myindex/doc/1
{
"date": "2018-05-02T08:13:29.268482"
}
Side note: the date datatype stores values with millisecond resolution in Elasticsearch, so if nanosecond-level precision is needed in date range queries, the appropriate datatype is date_nanos.
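For completeness, here is a minimal sketch of such a mapping, assuming an Elasticsearch 7.x cluster (where mappings no longer take a type name like doc); the index and field names are only illustrative:

PUT myindex_nanos
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date_nanos"
      }
    }
  }
}

With this mapping, a value such as 2018-05-02T08:13:29.268482 is stored without losing the sub-millisecond part.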
I am trying to graph data in Azure Time Series Insights. For the sensor in question, I can only graph the event count; none of the values provided by the JSON are available for graphing, even though the raw data is clearly all present.
I can see the attribute state in the raw data, but it cannot be selected for graphing.
The raw data view:
The property selection for the entity:
The raw JSON (before it lands in Time Series Insights) is as follows (taken from another identical sensor). The entity_id and last_updated fields are used as the device id and update time for the event source:
{
"entity_id": "sensor.temperature_9",
"state": "21.0",
"attributes": {
"on": true,
"unit_of_measurement": "°C",
"friendly_name": "XXXX Schlafzimmer Temp",
"icon": "mdi:thermometer",
"device_class": "temperature"
},
"last_changed": "2021-03-02T07:45:23.239584+00:00",
"last_updated": "2021-03-02T07:45:23.239584+00:00",
"context": {
"id": "32d7edfe14b5738ee47509d026c6d5d3",
"parent_id": null,
"user_id": null
}
}
How can I graph the state from raw data?
Figured it out: the state value from the JSON is used by many objects; some report a numeric value and some an enumeration, which makes the field invalid for direct selection as a numeric data type.
Instead, defining a numeric variable with the value expression toDouble($event.state.String) in a Time Series type, and then assigning that type to the instance, allows the correct value to be graphed.
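For anyone hitting the same wall, below is a sketch of what such a type definition can look like when pushed through the Time Series Insights types API; the id, name, description and the avg aggregation are illustrative, only the toDouble($event.state.String) expression comes from the actual fix:

{
  "put": [
    {
      "id": "00000000-0000-0000-0000-000000000001",
      "name": "NumericSensor",
      "description": "Sensor whose state string is parsed as a number",
      "variables": {
        "state": {
          "kind": "numeric",
          "value": { "tsx": "toDouble($event.state.String)" },
          "aggregation": { "tsx": "avg($value)" }
        }
      }
    }
  ]
}

After assigning this type to the instance, the state variable becomes selectable for graphing.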
I'm having trouble with date parsing in elasticsearch 7.10.1.
Here's (a relevant part of) the mapping for the index:
"utcTime": {
"type": "date",
"format": "strict_date_optional_time_nanos"
}
Date format reference.
Some of the documents are accepted, for example documents with:
"utcTime": "2021-02-17T09:50:13.173Z"
"utcTime": "2021-02-17T09:51:44.158Z"
Note that in both cases, there are exactly 3 decimals to the seconds.
This, on the other hand, is rejected:
"utcTime": "2021-02-17T09:51:45.07Z"
illegal_argument_exception: failed to parse date field [2021-02-17T09:51:45.07Z] with format [yyyy-MM-dd''T''HH:mm:ss.SSSXX]
In this case, there are only two decimals. I'm using Newtonsoft's Json.NET for the serialization, with a format that should always include 3 decimals, but it apparently doesn't; it includes at most 3 decimals.
How can I tell elasticsearch to accept date formats with anywhere between 0 and 3 decimals for the seconds?
EDIT
I finally found the issue, which had nothing to do with the mapping, but rather with a date_index_name ingest pipeline processor.
PUT _ingest/pipeline/test_reroute_pipeline
{
"description" : "Route documents to another index",
"processors" : [
{
"date_index_name": {
"field": "utcTime",
"date_rounding": "d",
"index_name_prefix": "rerouted-"
}
}
]
}
Because the date_formats parameter wasn't defined, the processor seemed to stick to the format of the first date it received: if that had 2 decimals, it required 2 every time; if it had 3, it required 3.
Specifying the date format solved the issue for good:
PUT _ingest/pipeline/test_reroute_pipeline
{
"description" : "Route documents to another index",
"processors" : [
{
"date_index_name": {
"field": "utcTime",
"date_rounding": "d",
"index_name_prefix": "rerouted-",
"date_formats": ["ISO8601"]
}
}
]
}
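To double-check, the pipeline can be exercised with the ingest simulate API, feeding it both a 2-decimal and a 3-decimal timestamp (values taken from the examples above):

POST _ingest/pipeline/test_reroute_pipeline/_simulate
{
  "docs": [
    { "_source": { "utcTime": "2021-02-17T09:51:45.07Z" } },
    { "_source": { "utcTime": "2021-02-17T09:51:44.158Z" } }
  ]
}

Both documents should now resolve to a rerouted-* daily index without a parse error.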
I just tried on a fresh 7.10.1 cluster and it happily accepted 1, 2, and 3 decimals for the seconds part.
Looking at the error message you get
illegal_argument_exception: failed to parse date field [2021-02-17T09:51:45.07Z] with format [yyyy-MM-dd''T''HH:mm:ss.SSSXX]
The format that actually seems to be set is yyyy-MM-dd''T''HH:mm:ss.SSSXX, which is different from strict_date_optional_time_nanos, i.e. yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ.
If you check the real mapping from your index, I'm pretty sure the utcTime field doesn't have strict_date_optional_time_nanos as the format.
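You can confirm that by asking the cluster for the live mapping of just that field (index name assumed here):

GET your-index/_mapping/field/utcTime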
As I understand it, the Chrome browser uses the WebKit time format for timestamps within the browser history database. WebKit time is expressed as microseconds since January 1, 1601.
I've found numerous articles that seemingly have the answer to my question, but none have worked so far. The common answer is to use the formula below to convert WebKit time to a human-readable local time:
SELECT datetime((time/1000000)-11644473600, 'unixepoch', 'localtime') AS time FROM table;
Sources:
https://linuxsleuthing.blogspot.com/2011/06/decoding-google-chrome-timestamps-in.html
What is the format of Chrome's timestamps?
I'm trying to convert the timestamps while gathering the data through Osquery, using the configuration below.
"chrome_browser_history" : {
"query" : "SELECT urls.id id, urls.url url, urls.title title, urls.visit_count visit_count, urls.typed_count typed_count, urls.last_visit_time last_visit_time, urls.hidden hidden, visits.visit_time visit_time, visits.from_visit from_visit, visits.visit_duration visit_duration, visits.transition transition, visit_source.source source FROM urls JOIN visits ON urls.id = visits.url LEFT JOIN visit_source ON visits.id = visit_source.id",
"path" : "/Users/%/Library/Application Support/Google/Chrome/%/History",
"columns" : ["path", "id", "url", "title", "visit_count", "typed_count", "last_visit_time", "hidden", "visit_time", "visit_duration", "source"],
"platform" : "darwin"
}
"schedule": {
"chrome_history": {
"query": "select distinct url,datetime((last_visit_time/1000000)-11644473600, 'unixepoch', 'localtime') AS time from chrome_browser_history where url like '%nhl.com%';",
"interval": 10
}
}
The resulting events have timestamps from the year 1600:
"time":"1600-12-31 18:46:16"
If I change the config to pull the raw timestamp with no conversion, I get stamps such as the following:
"last_visit_time":"1793021894"
From what I've read about WebKit time, it is expressed as 17-digit numbers, which is clearly not what I'm seeing. So I'm not sure whether this is an osquery, Chrome, or query issue at this point. All help and insight appreciated!
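For reference, running the raw value above through the same formula reproduces the year-1600 result, so whatever reaches the scheduled query is already far too small to be a full WebKit timestamp:

-- 1793021894 / 1000000 is only ~1793 seconds, and 1793 - 11644473600 lands
-- just after the 1601 epoch, which 'localtime' then renders as late 1600:
SELECT datetime(1793021894/1000000 - 11644473600, 'unixepoch');
-- 1601-01-01 00:29:53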
Solved. The datetime conversion needs to take place within the table definition query, i.e. the query defined under "chrome_browser_history".
"chrome_browser_history" : {
"query" : "SELECT urls.id id, urls.url url, urls.title title, urls.visit_count visit_count, urls.typed_count typed_count, datetime(urls.last_visit_time/1000000-11644473600, 'unixepoch') last_visit_time, urls.hidden hidden, visits.visit_time visit_time, visits.from_visit from_visit, visits.visit_duration visit_duration, visits.transition transition, visit_source.source source FROM urls JOIN visits ON urls.id = visits.url LEFT JOIN visit_source ON visits.id = visit_source.id",
"path" : "/Users/%/Library/Application Support/Google/Chrome/%/History",
"columns" : ["path", "id", "url", "title", "visit_count", "typed_count", "last_visit_time", "hidden", "visit_time", "visit_duration", "source"],
"platform" : "darwin"
}
"schedule": {
"chrome_history": {
"query": "select distinct url,last_visit_time from chrome_browser_history where url like '%nhl.com%';",
"interval": 10
}
}
Trying to make the conversion within the osquery scheduled query (as I was trying before) will not work, i.e.:
"schedule": {
"chrome_history": {
"query": "select distinct url,datetime((last_visit_time/1000000)-11644473600, 'unixepoch', 'localtime') AS time from chrome_browser_history where url like '%nhl.com%';",
"interval": 10
}
}
Try:
SELECT datetime(last_visit_time/1000000-11644473600, \"unixepoch\") as last_visited, url, title, visit_count FROM urls;
This is from something I wrote up a while ago: a one-liner that runs osqueryi with an ATC configuration to read in the Chrome history file, export it as JSON, and curl the JSON to an API endpoint:
https://gist.github.com/defensivedepth/6b79581a9739fa316b6f6d9f97baab1f
The things you're working with are pretty straight SQLite, so I would start by debugging inside sqlite3.
First, you should verify the data is what you expect. On my machine, I see:
$ cp Library/Application\ Support/Google/Chrome/Profile\ 1/History /tmp/
$ sqlite3 /tmp/History "select last_visit_time from urls limit 2"
13231352154237916
13231352154237916
Second, I would verify the underlying math:
sqlite> select datetime(last_visit_time/1000000-11644473600, "unixepoch") from urls limit 2;
2020-04-14 15:35:54
2020-04-14 15:35:54
It would be easier to test your config snippet if you included it as text we can copy/paste.
I have a Cosmos DB collection whose size is around 100 GB.
I have successfully created a good partition key; I have around 4,600 partitions over 70M records. However, I still need to query on two datetime fields, which are stored as strings rather than in an epoch format.
Example json:
"someField1": "UNKNOWN",
"someField2": "DATA",
"endDate": 7014541201,
"startDate": 7054864502,
"someField3": "0",
"someField3": "0",
I notice that when I run SELECT * FROM tbl versus SELECT * FROM tbl WHERE startDate > {someDate} AND endDate < {someDate1}, the latency difference is only around 1 s, so this filtering does not reduce my latency.
Is it better to store dates as numbers? Does Cosmos DB have better performance on epoch range queries?
I am using SQL API.
Also, when I try to add Hash indexes on startDate and endDate, they are basically converted into two indexes.
Example:
"path": "/startDate/?",
"indexes": [
{
"kind": "Hash",
"dataType": "String",
"precision": 3
}
]
},
This is converted to:
"path": "/startDate/?",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
Is that normal behaviour, or is it related to my data?
Thanks.
I checked the query metrics, and for 4k records the query to Cosmos DB executes in 100 ms. I would like to ask whether it is normal behaviour that
var option = new FeedOptions { PartitionKey = new PartitionKey(partitionKey), MaxItemCount = -1};
var query= client.CreateDocumentQuery<MyModel>(collectionLink, option)
.Where(tl => tl.StartDate >= DateTimeToUnixTimestamp(startDate) && tl.EndDate <= DateTimeToUnixTimestamp(endDate))
.AsEnumerable().ToList();
this query returns 10k results (in Postman it's around 9 MB) in 10-12 s? This partition contains around 50k records.
Retrieved Document Count : 12,356
Retrieved Document Size : 12,963,709 bytes
Output Document Count : 3,633
Output Document Size : 3,819,608 bytes
Index Utilization : 29.00 %
Total Query Execution Time : 264.31 milliseconds
Query Compilation Time : 0.12 milliseconds
Logical Plan Build Time : 0.07 milliseconds
Physical Plan Build Time : 0.06 milliseconds
Query Optimization Time : 0.01 milliseconds
Index Lookup Time : 51.10 milliseconds
Document Load Time : 140.51 milliseconds
Runtime Execution Times
Query Engine Times : 55.61 milliseconds
System Function Execution Time : 0.00 milliseconds
User-defined Function Execution Time : 0.00 milliseconds
Document Write Time : 10.56 milliseconds
Client Side Metrics
Retry Count : 0
Request Charge : 904.73 RUs
I am from the CosmosDB engineering team.
Since your collection has 70M records, I assume that the observed latency is only on the first roundtrip (or first page) of results. Note that the observed latency can also be improved by tweaking FeedOptions.MaxDegreeOfParallelism to -1 when executing the query.
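Here is a sketch of that change applied to the query from your question (MyModel, DateTimeToUnixTimestamp, collectionLink and partitionKey are assumed from your snippet):

var option = new FeedOptions
{
    PartitionKey = new PartitionKey(partitionKey),
    MaxItemCount = -1,
    MaxDegreeOfParallelism = -1 // let the SDK choose the degree of parallelism
};

var query = client.CreateDocumentQuery<MyModel>(collectionLink, option)
    .Where(tl => tl.StartDate >= DateTimeToUnixTimestamp(startDate) &&
                 tl.EndDate <= DateTimeToUnixTimestamp(endDate))
    .AsEnumerable()
    .ToList();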
Regarding the difference between the two queries themselves, please note that SELECT * without a filter is a full scan query, which is probably a bit faster to return its first results than the query with two filters, which does a little more work against the local indexes across all the partitions; that may explain the observed latency.
Regarding your other question, we no longer support the Hash indexing policy on new collections. Please see here: https://learn.microsoft.com/en-us/azure/cosmos-db/index-types#index-kind. We automatically convert Hash indexes to Range with full precision.
You may also fetch QueryMetrics for your query and analyze the results to figure out why you have latency. Details are here: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-query-metrics#query-execution-metrics
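As a rough sketch of how to pull those metrics with the same SDK you are already using (PopulateQueryMetrics and FeedResponse.QueryMetrics are SDK features; the variable names are assumed from your snippet, and this needs to run inside an async method):

// requires: using Microsoft.Azure.Documents.Linq;
var option = new FeedOptions
{
    PartitionKey = new PartitionKey(partitionKey),
    PopulateQueryMetrics = true
};

var docQuery = client.CreateDocumentQuery<MyModel>(collectionLink, option)
    .Where(tl => tl.StartDate >= DateTimeToUnixTimestamp(startDate) &&
                 tl.EndDate <= DateTimeToUnixTimestamp(endDate))
    .AsDocumentQuery();

while (docQuery.HasMoreResults)
{
    var page = await docQuery.ExecuteNextAsync<MyModel>();
    // One QueryMetrics entry per physical partition touched by this page.
    foreach (var partitionMetrics in page.QueryMetrics)
    {
        Console.WriteLine($"{partitionMetrics.Key}: {partitionMetrics.Value}");
    }
}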
I have to compare a user-entered date, "Dt" (in mm/dd/yyyy format), with a date in RavenDB, "ReleaseDate" (a timestamp like "/Date(1187668800000)/"). For this I am using the following code, which almost gets the job done, but I need a little help to tie up the loose ends...
How can I compare the two dates so that the query runs successfully?
public ActionResult Calculation(DateTime? Dt)
{
var store = new DocumentStore { Url = "http://localhost:80" };
store.Initialize();
var CalcModel = new CalcViewModel();
using (var session = store.OpenSession())
{
//Converting user entered date dt in mm/dd/yyyy format to total
//milliseconds - So that later I can compare this value to RavenDB
//time stamp date format (older versions)
DateTime d1 = new DateTime(1970, 1, 1);
DateTime d2 = Dt.Value.ToUniversalTime();
TimeSpan ts = new TimeSpan(d2.Ticks - d1.Ticks);
double tmillisecs = ts.TotalMilliseconds; //Not yet using this value.
CalcModel.MoviesByDate = session.Query<Movies>()
.Where(x => x.ReleaseDate.Ticks == ts.Ticks)
.Count();
// this is where I need to compare two dates - ts.ticks gives the
// required value of date (1187668800000) multiplied by 10000.
}
return View(CalcModel);
}
Right now, when I debug, I know what value ts.Ticks is showing... and it's, as I said in the code comments, the required value multiplied by 10,000. But I have no clue at run time what the value of x.ReleaseDate or x.ReleaseDate.Ticks is. Am I doing this correctly? Thanks for the help.
Umm... I think you seriously misunderstand how SQL dates work and how that applies to .NET. The whole point about dates is that they're stored in a numeric format, not a text one. So when you have a DateTime object, it's not stored as the text of the date; it's stored as a numeric type that you can convert to any format you want.
Because the .NET provider converts database-native datetime objects to DateTime objects, you can just compare them natively, i.e.:
DateTime d1 = new DateTime(1970, 1, 1);
CalcModel.MoviesByDate = session.Query<Movies>()
.Where(x => x.ReleaseDate.Date == d1.Date)
.Count();
Regardless of how RavenDB stores the dates internally, when the DateTime object is materialized in the query, it will be in native .NET format.
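If the goal is specifically to count all movies released on the day the user entered, a sketch of that (reusing Dt, Movies and CalcModel from the question) could look like:

// Compare against a whole-day range so the time component of ReleaseDate
// does not need to match exactly; Dt is the user-entered nullable DateTime.
var dayStart = Dt.Value.Date;
var dayEnd = dayStart.AddDays(1);

CalcModel.MoviesByDate = session.Query<Movies>()
    .Where(x => x.ReleaseDate >= dayStart && x.ReleaseDate < dayEnd)
    .Count();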