Elasticsearch query that works in Kibana does not work using the Java REST Client API - kibana

Can someone help determine why a query that works in Kibana does not return hits when using the Elasticsearch Java REST Client API?
I am currently using:
Elasticsearch/Kibana: 7.16.2
Elasticsearch Java Client: 6.6.2
I am reluctant to upgrade the Java client because of the numerous geometry-related updates that would be needed.
Fields:
mydatefield: timestamp of the document
category: keyword field
We have 1000 or more records for each category per day.
I want an aggregation that buckets categories by day and includes the first and last "mydatefield" for each category.
This query works in Kibana
GET /mycategories/_search
{
  "size": 0,
  "aggregations": {
    "bucket_by_date": {
      "date_histogram": {
        "field": "mydatefield",
        "format": "yyyy-MM-dd",
        "interval": "1d",
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 1
      },
      "aggregations": {
        "unique_categories": {
          "terms": {
            "field": "category",
            "size": 10,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ]
          },
          "aggregations": {
            "min_mydatefield": {
              "min": {
                "field": "mydatefield"
              }
            },
            "max_mydatefield": {
              "max": {
                "field": "mydatefield"
              }
            }
          }
        }
      }
    }
  }
}
The first bucket of the result: category1 and category2 for 2022-05-07, with the min and max "mydatefield" for each category.
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2593,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "bucket_by_date" : {
      "buckets" : [
        {
          "key_as_string" : "2022-05-07",
          "key" : 1651881600000,
          "doc_count" : 2,
          "unique_categories" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "category1",
                "doc_count" : 1,
                "min_mydatefield" : {
                  "value" : 1.651967952E12,
                  "value_as_string" : "2022-05-07T13:22:17.000Z"
                },
                "max_mydatefield" : {
                  "value" : 1.651967952E12,
                  "value_as_string" : "2022-05-07T23:59:12.000Z"
                }
              },
              {
                "key" : "category2",
                "doc_count" : 1,
                "min_mydatefield" : {
                  "value" : 1.651967947E12,
                  "value_as_string" : "2022-05-07T03:47:23.000Z"
                },
                "max_mydatefield" : {
                  "value" : 1.651967947E12,
                  "value_as_string" : "2022-05-07T23:59:07.000Z"
                }
              }
            ]
          }
        },
I have successfully coded other, less complex aggregations without problems. However, I have not been able to get this working with either an AggregationBuilder or a WrapperQuery; zero results are returned.
{"took":0,"timed_out":false,"_shards":{"total":0,"successful":0,"skipped":0,"failed":0},"hits":{"total":0,"max_score":0.0,"hits":[]}}
Before executing the query, I copy the SearchRequest.source() into Kibana, and it runs and returns the desired information.
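Since the copied source() runs unchanged in Kibana, one workaround worth trying is to send that exact JSON through the low-level client behind the RestHighLevelClient rather than through the high-level builders. A rough sketch, not code from the post (the class and method names and the hard-coded /mycategories/_search endpoint are assumptions):
import java.io.IOException;

import org.apache.http.util.EntityUtils;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;

public class RawSearchPassthrough {

    // Sends the exact JSON produced by SearchRequest.source() through the low-level client,
    // so the body executed is byte-for-byte what was pasted into Kibana.
    static String searchRaw(RestHighLevelClient client, SearchRequest searchRequest) throws IOException {
        Request raw = new Request("GET", "/mycategories/_search");
        raw.setJsonEntity(searchRequest.source().toString());
        Response response = client.getLowLevelClient().performRequest(raw);
        return EntityUtils.toString(response.getEntity());
    }
}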
Below is the AggregationBuilder code that seems to replicate my Kibana query but returns no results.
AggregationBuilder aggregation =
    AggregationBuilders
        .dateHistogram("bucket_by_date")
        .format("yyyy-MM-dd")
        .minDocCount(1)
        .dateHistogramInterval(DateHistogramInterval.DAY)
        .field("mydatefield")
        .subAggregation(
            AggregationBuilders
                .terms("unique_categories")
                .field("category")
                .subAggregation(
                    AggregationBuilders
                        .min("min_mydatefield")
                        .field("mydatefield")
                )
                .subAggregation(
                    AggregationBuilders
                        .max("max_mydatefield")
                        .field("mydatefield")
                )
        );
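For reference, here is a minimal sketch of how the builder above might be wired into a complete request with the 6.x high-level client. None of this comes from the original post; the class and method names and the index name mycategories are assumptions, while size(0) mirrors the "size": 0 of the Kibana query:
import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class CategoryHistogramSearch {

    // Executes the aggregation built above; hits themselves are not needed.
    static SearchResponse run(RestHighLevelClient client, AggregationBuilder aggregation) throws IOException {
        SearchSourceBuilder source = new SearchSourceBuilder()
                .size(0)                       // same as "size": 0 in the Kibana query
                .aggregation(aggregation);

        // The failing response reported "_shards":{"total":0}, i.e. no shards were searched,
        // so it is worth confirming the index name passed here matches the Kibana target.
        SearchRequest request = new SearchRequest("mycategories").source(source);

        return client.search(request, RequestOptions.DEFAULT);
    }
}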

Related

Convert String to Int on AWS DocumentDB

I am currently trying to write a Metabase question against AWS DocumentDB and I am running into an issue where I need to convert a string to an integer. Unfortunately, it seems that AWS DocumentDB does not support $toInt, and I am not sure how to get around it. Here is the query:
[
  { "$match": {
      "metaData.fileSize": { "$exists": true }
  }},
  { "$project": {
      "file_size": "$metaData.fileSize",
      "timestamp": 1,
      "past7Days": { "$subtract": [ ISODate(), 604800000 ] }
  }},
  { "$project": {
      "file_size": 1,
      "timestamp": 1,
      "dayofweek": { "$dayOfWeek": ["$timestamp"] },
      "past7DaysComp": { "$subtract": [ "$timestamp", "$past7Days" ] }
  }},
  { "$group": {
      "_id": { "dayofweek": "$dayofweek" },
      "size": { "$avg": "$file_size" }
  }}
]
The $group returns nothing for size since file_size is not a numeric type. Any ideas how to convert file_size to an integer, double, or float?

CosmosDB $elemMatch syntax error

I am getting a strange syntax error for some commands in the MongoDB API for CosmosDB. Say I have a collection called "Collection" with two documents:
{
  "_id" : 1,
  "arr" : [
    { "_id" : 11 },
    { "_id" : 12 }
  ]
}
{
  "_id" : 2,
  "arr" : [
    { "_id" : 21 },
    { "_id" : 22 }
  ]
}
If I try to run the query
db.getCollection('Collection').find( { _id : 2 }, { arr : { $elemMatch : { _id : 21 } } })
I get the result
{
  "_t" : "OKMongoResponse",
  "ok" : 0,
  "code" : 9,
  "errmsg" : "Syntax error, incorrect syntax near '10'.",
  "$err" : "Syntax error, incorrect syntax near '10'."
}
But the command works perfectly fine on my locally hosted instance of MongoDB, returning the expected result:
{
  "_id" : 2,
  "arr" : [
    { "_id" : 21 }
  ]
}
Anyway, this is certainly not a syntax error, but there is no helpful error message. If this is not yet supported by CosmosDB, is there any way to only get certain embedded documents stored in an array?
If I try to use an aggregation pipeline to just extract the document in the array (I realize this should give a different result than the command above, but it would also work for my purposes), like so:
db.getCollection('Collection').aggregate([{ "$unwind" : "$arr" }, { "$match" : { "arr._id" : 21 } }] )
I get the result
{
  "_t" : "OKMongoResponse",
  "ok" : 0,
  "code" : 118,
  "errmsg" : "$match is currently only supported when it is the first and only stage of the aggregation pipeline. Please restructure your query to combine multiple $match stages into a single $match stage.",
  "$err" : "$match is currently only supported when it is the first and only stage of the aggregation pipeline. Please restructure your query to combine multiple $match stages into a single $match stage."
}
So that doesn't work for me either.
Try this
db.collection.aggregate([
  {
    $match: { "_id": 2 }
  },
  {
    $project: {
      arr: {
        $filter: {
          input: "$arr",
          as: "ar",
          cond: { $eq: [ "$$ar._id", 21 ] }
        }
      }
    }
  }
])
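For reference, the same $filter projection expressed with the MongoDB Java driver; this is only a sketch, and the connection string, database name, and class name are placeholders:
import static com.mongodb.client.model.Aggregates.match;
import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Filters.eq;

import java.util.Arrays;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ElemMatchWorkaround {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll = client.getDatabase("test").getCollection("Collection");

            // $filter keeps only the array elements whose _id equals 21, mirroring the shell pipeline above.
            Document arrFilter = new Document("$filter",
                    new Document("input", "$arr")
                            .append("as", "ar")
                            .append("cond", new Document("$eq", Arrays.asList("$$ar._id", 21))));

            for (Document doc : coll.aggregate(Arrays.asList(
                    match(eq("_id", 2)),
                    project(new Document("arr", arrFilter))))) {
                System.out.println(doc.toJson());
            }
        }
    }
}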

Example Dgraph recurse sum query

New Dgraph user wondering if anyone can provide me with an example recursive count and sum query to help get me going.
The data looks like this (there are more predicates, but left out for simplicity):
{
  "uid" : <0x1>,
  "url" : "example.com",
  "link" : [
    {
      "uid" : <0x2>,
      "url" : "example2.com",
      "link" : [
        {
          "uid" : <0x4>,
          "url" : "example4.com",
          "link" : [
            {
              "uid" : <0x6>,
              "url" : "example6.com",
              "link" : [
                {
                  etc...
                }
              ]
            }
          ]
        },
        {
          "uid" : <0x5>,
          "url" : "example5.com"
        }
      ]
    },
    {
      "uid" : <0x2>,
      "url" : "example2.com",
      "link" : [
        {
          etc ....
        }
      ]
    }
  ]
}
Just a home page with n links, each of which has n links, and the depth can obviously vary. I am hoping for a good example of how to count all the links for each URL and sum them up. I will add different filters to the query at some point, but for now I just want a basic query to get me going. Thanks.

Using Usergrid, how do I get related entities nested in a single JSON response and not only the link to them?

When I query /mycollections?ql=Select * where name='dfsdfsdfsdfsdfsdf' I get
{
  "action" : "get",
  "application" : "859e6180-de8a-11e4-9360-f1aabbc15f58",
  "params" : {
    "ql" : [ "Select * where name='dfsdfsdfsdfsdfsdf'" ]
  },
  "path" : "/mycollections",
  "uri" : "http://localhost:8080/myorg/myapp/mycollections",
  "entities" : [ {
    "uuid" : "2ff8961a-dea8-11e4-996b-63ce373ace35",
    "type" : "mycollection",
    "name" : "dfsdfsdfsdfsdfsdf",
    "created" : 1428577466865,
    "modified" : 1428577466865,
    "metadata" : {
      "path" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35",
      "connections" : {
        "relations" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations"
      }
    }
  } ],
  "timestamp" : 1428589309204,
  "duration" : 53,
  "organization" : "myorg",
  "applicationName" : "myapp",
  "count" : 1
}
Now if I query /mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations I get the second entity
{
  "action" : "get",
  "application" : "859e6180-de8a-11e4-9360-f1aabbc15f58",
  "params" : { },
  "path" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations",
  "uri" : "http://localhost:8080/myorg/myapp/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations",
  "entities" : [ {
    "uuid" : "56a1185a-dec1-11e4-9ac0-e9343f86b604",
    "type" : "secondcollection",
    "name" : "coucou",
    "created" : 1428588269141,
    "modified" : 1428588269141,
    "metadata" : {
      "connecting" : {
        "relations" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations/56a1185a-dec1-11e4-9ac0-e9343f86b604/connecting/relations"
      },
      "path" : "/mycollections/2ff8961a-dea8-11e4-996b-63ce373ace35/relations/56a1185a-dec1-11e4-9ac0-e9343f86b604"
    }
  } ],
  "timestamp" : 1428589668542,
  "duration" : 51,
  "organization" : "myorg",
  "applicationName" : "myapp"
}
What I want is for Usergrid to nest the related entity directly in the first JSON response, instead of only providing its path, so that I only need to make a single HTTP request instead of two.
You cannot. Usergrid is not designed that way. You need to write an extra wrapper REST endpoint to simulate one response.
I am not sure what DB you are using. If you are using a document DB like MongoDB, you can write a Node.js script to do this manipulation. Apigee has volvo.js; check whether scripting is possible there.
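To illustrate the wrapper-endpoint idea from the answer above, here is a rough Java sketch that makes the two Usergrid calls and nests the related entities into the first response. It assumes Java 11's HttpClient and Jackson; the base URL, field names, and method names are illustrative, and authentication is omitted:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class NestedEntityFetcher {

    private static final String BASE = "http://localhost:8080/myorg/myapp";
    private static final HttpClient HTTP = HttpClient.newHttpClient();
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Fetches the collection entity and its "relations" connection, then nests
    // the related entities under a "relations" field of the first entity.
    static JsonNode fetchWithRelations(String uuid) throws Exception {
        JsonNode main = get(BASE + "/mycollections/" + uuid);
        JsonNode related = get(BASE + "/mycollections/" + uuid + "/relations");

        ObjectNode entity = (ObjectNode) main.path("entities").get(0);
        entity.set("relations", related.path("entities"));
        return entity;
    }

    private static JsonNode get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
        return MAPPER.readTree(response.body());
    }
}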

problems on elasticsearch with parent child documents

We work with two types of documents in Elasticsearch (ES): items and slots, where items are parents of slot documents.
We define the index with the following command:
curl -XPOST 'localhost:9200/items' -d @itemsdef.json
where itemsdef.json has the following definition
{
  "mappings" : {
    "item" : {
      "properties" : {
        "id" : { "type" : "long" },
        "name" : {
          "type" : "string",
          "_analyzer" : "textIndexAnalyzer"
        },
        "location" : { "type" : "geo_point" }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "activityIndexAnalyzer" : {
          "alias" : ["activityQueryAnalyzer"],
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
        },
        "textIndexAnalyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
        },
        "textQueryAnalyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
        }
      },
      "filter" : {
        "spanish_stop" : {
          "type" : "stop",
          "ignore_case" : true,
          "enable_position_increments" : true,
          "stopwords_path" : "analysis/spanish-stopwords.txt"
        },
        "spanish_synonym" : {
          "type" : "synonym",
          "synonyms_path" : "analysis/spanish-synonyms.txt"
        },
        "word_delimiter_impl" : {
          "type" : "word_delimiter",
          "generate_word_parts" : true,
          "generate_number_parts" : true,
          "catenate_words" : true,
          "catenate_numbers" : true,
          "split_on_case_change" : false
        }
      }
    }
  }
}
Then we add the child document definition using the following command:
curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json
Where slotsdef.json has the following definition:
{
  "slot" : {
    "_parent" : { "type" : "item" },
    "_routing" : {
      "required" : true,
      "path" : "parent_id"
    },
    "properties" : {
      "id" : { "type" : "long" },
      "parent_id" : { "type" : "long" },
      "activity" : {
        "type" : "string",
        "_analyzer" : "activityIndexAnalyzer"
      },
      "day" : { "type" : "integer" },
      "start" : { "type" : "integer" },
      "end" : { "type" : "integer" }
    }
  }
}
Finally we perform a bulk index with the following command:
curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json
Where testbulk.json holds the following data:
{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}
We can see through the ES Head plugin that the definitions seem to be OK. We test the analyzers to check that they have been loaded and that they work. Both documents appear in the ES Head browser view. But if we try to retrieve the child item using the API, ES responds that it does not exist:
$ curl -XGET 'localhost:9200/items/slot/126'
{"_index":"items","_type":"slot","_id":"126","exists":false}
When we import 50 documents, all parent documents can be retrieved through the API, but only SOME of the requests for child elements get a successful response.
My guess is that it has something to do with how documents are distributed across shards and with routing, which is not clear to me.
Any clue on how to retrieve individual child documents? ES Head shows they have been stored, but HTTP GETs to localhost:9200/items/slot/XXX respond randomly with "exists":false.
The child documents use the parent's id for routing. So, in order to retrieve a child document you need to specify the parent id in the routing parameter of your request:
curl "localhost:9200/items/slot/126?routing=35"
If the parent id is not available, you will have to search for the child documents:
curl "localhost:9200/items/slot/_search?q=id:126"
or switch to an index with a single shard.
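For anyone doing the same from Java with the high-level REST client (the client used in the main question above), a routed get looks roughly like this; the class and method names are assumptions, and the index, type, id, and routing values follow the curl example:
import java.io.IOException;

import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

public class RoutedGet {

    // Without routing, the get is sent to the shard computed from the child's own id,
    // which is usually not the shard the parent-routed document was indexed on.
    static GetResponse getSlot(RestHighLevelClient client, String slotId, String parentId) throws IOException {
        GetRequest request = new GetRequest("items", "slot", slotId).routing(parentId);
        return client.get(request, RequestOptions.DEFAULT);
    }
}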
