Error using elastic on a Linux server but no error on Windows - r

When I execute
elastic::Search(index=index,body=body,size=1000,scroll="3m")
on a Linux server, I receive the following error:
invalid char in json text. <!DOCTYPE HTML PUBLIC "-//W3C//
On Windows everything is fine. However, if I execute elastic::Search with a different body, it works. So here is the body that fails:
'{
  "_source": ["DOC_ID", "DELIVERY_ID", "CONTRIB_TS", "LANG", "SYS_NOT", "SURVEIL"],
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "CONTENT": "XXX" } }
      ],
      "filter": [
        { "term": { "DELIVERY_ID": "100" } },
        { "term": { "SYS_NOT": "0" } }
      ]
    }
  },
  "highlight": {
    "pre_tags": [""],
    "post_tags": [""],
    "fields": {
      "CONTENT": { "fragment_size": 200 }
    }
  }
}'
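The error text suggests that, on the Linux server, the response coming back is an HTML page (a proxy or login page, for example) rather than JSON. A rough way to check what the endpoint actually returns from R, using the httr package (the URL below is a placeholder, not taken from the question):
library(httr)

# placeholder endpoint; substitute your own host, port and index name
res <- POST(
  "http://localhost:9200/my_index/_search?size=1000&scroll=3m",
  body = body,                # the JSON body string shown above
  content_type_json()
)
# if this starts with "<!DOCTYPE HTML", the server (or something in front of it)
# is answering with an HTML error page instead of JSON
cat(substr(content(res, as = "text", encoding = "UTF-8"), 1, 300))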

Related

Kibana two different DSL Queries behaving as OR

I'm trying to create two queries that appear as blue buttons in the visualizer, and I want to apply both as if they were an OR, so that I can filter my logs by INFO, by ERROR, or by both at the same time. If I enable one or the other alone they work as expected, but if I enable both it's as if the final query were INFO AND ERROR, when what I want is INFO OR ERROR. Both queries are similar, one with ERROR and the other with INFO:
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "message": "INFO"
          }
        }
      ]
    }
  }
}
I have used both filter and should.
I did look at the Inspect output, but I can't make sense of it.
Any idea if this is possible at all?
Thanks.
EDITED for clarification after 1st reply:
What I need is two different, separate queries (one with "status": "info" and the other with "status": "error"), because I want to attach them to the blue buttons that appear when you click "Add a filter". So I end up with two blue buttons, ERROR and INFO, and when both are enabled I want both kinds of lines to show. At the moment they work individually, but when I enable both I think it behaves like ERROR AND INFO, and no line has both, so what I want is some kind of ERROR OR INFO so it displays both. Any idea?
EDIT 2:
From my last comment below: looking at the Inspect output with the two scripts, each one in its own button, it shows
Inspect
"query": {
  "bool": {
    "must": [ <--- my scripts below get wrapped in this MUST
      {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "message": "INFO"
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "message": "ERROR"
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      ...
and the scripts I have in the two buttons
INFO
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "message": "INFO"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
ERROR
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "message": "ERROR"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
So if there is no way to change the way Kibana wraps the scripts, I guess I'm stuck...
EDIT
You can use the "is one of" filter operator; it will be an OR query.
You can also use a should query:
POST _bulk
{ "index" : { "_index" : "test_should", "_id" : "1" } }
{ "status" : "info" }
{ "index" : { "_index" : "test_should", "_id" : "2" } }
{ "status" : "error" }

GET test_should/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "status": "info"
          }
        },
        {
          "match": {
            "status": "error"
          }
        }
      ]
    }
  }
}
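If keeping two separate pills isn't essential, another option (a sketch, not part of the answer above) is a single custom filter added via Kibana's "Edit as Query DSL", holding both clauses in one bool/should so that the outer must wraps only one OR block:
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "message": "INFO" } },
        { "match_phrase": { "message": "ERROR" } }
      ],
      "minimum_should_match": 1
    }
  }
}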

Converting a painless script into a visualisation on Kibana (Logs from AWS Connect)

I have logs being shipped from AWS Connect to Kibana through AWS OpenSearch. I have written the following script to return the latest status of an Agent like so:
GET agent-logs-*/_search
{
  "script_fields": {
    "data": {
      "script": {
        "lang": "painless",
        "source": "params._source.CurrentAgentSnapshot.Configuration.Username + ', ' + params._source.CurrentAgentSnapshot.AgentStatus.Name + ', ' + params._source.EventTimestamp"
      }
    }
  },
  "collapse": {
    "field": "CurrentAgentSnapshot.Configuration.Username.keyword"
  },
  "sort": [
    {
      "EventTimestamp": {
        "order": "desc"
      }
    }
  ]
}
This returns the following:
{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 65,
    "successful" : 65,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "agent-logs-2022-06-28",
        "_type" : "_doc",
        "_id" : "",
        "_score" : null,
        "fields" : {
          "data" : [
            "al.pacino#email.com, Available, 2022-06-28T10:52:01.238Z"
          ],
          "CurrentAgentSnapshot.Configuration.Username.keyword" : [
            "al.pacino#email.com"
          ]
        },
        "sort" : [
          1656413521238
        ]
      },
      {
        "_index" : "agent-logs-2022-06-28",
        "_type" : "_doc",
        "_id" : "",
        "_score" : null,
        "fields" : {
          "data" : [
            "robert.deniro#email.com, Available, 2022-06-28T10:50:45.622Z"
          ],
          "CurrentAgentSnapshot.Configuration.Username.keyword" : [
            "robert.deniro#email.com"
          ]
        },
        "sort" : [
          1656413445622
        ]
      },
      {
        "_index" : "agent-logs-2022-06-26",
        "_type" : "_doc",
        "_id" : "",
        "_score" : null,
        "fields" : {
          "data" : [
            "marlon.brando#email.com, Offline, 2022-06-26T14:51:55.203Z"
          ],
          "CurrentAgentSnapshot.Configuration.Username.keyword" : [
            "marlon.brando#email.com"
          ]
        },
        "sort" : [
          1656255115203
        ]
      }
    ]
  }
}
I want to take the data lines from the JSON, e.g. "al.pacino#email.com, Available, 2022-06-28T10:52:01.238Z", and represent them in a visualisation such as a Data Table, to get a list of agents with their corresponding status.
With the raw agent-logs there is a delay in which status changes and heartbeats overlap, causing an inaccurate count of the statuses, hence the need for this script.
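For reference, a sketch of an aggregation that returns the same "latest status per agent" information without script_fields, using a terms aggregation with a top_hits sub-aggregation (field names are taken from the query above; the terms size of 500 is a placeholder):
GET agent-logs-*/_search
{
  "size": 0,
  "aggs": {
    "agents": {
      "terms": {
        "field": "CurrentAgentSnapshot.Configuration.Username.keyword",
        "size": 500
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "EventTimestamp": { "order": "desc" } } ],
            "_source": [
              "CurrentAgentSnapshot.Configuration.Username",
              "CurrentAgentSnapshot.AgentStatus.Name",
              "EventTimestamp"
            ]
          }
        }
      }
    }
  }
}
Whether a particular visualisation type can consume this directly varies, so treat it as a starting point rather than a drop-in Data Table definition.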

Strange output query Elastic Search

I just started using Elasticsearch. I've got everything set up correctly. I'm using Firebase + Flashlight + Elasticsearch.
In my front-end I'm building queries based on different search params. I insert them into a node in Firebase at /search/requests/. Flashlight picks this up and puts the response into /search/response; this works like a charm!
However, I'm not sure how to write my queries properly. I'm getting strange results when I try to combine two must match queries. I'm using the Query DSL.
My documents in Elasticsearch under deliverables/doc have the following schema:
...
{
  "createdBy" : "admin#xx.org",
  "createdOn" : 1501200000000,
  "deadLine" : 1508716800000,
  "description" : {
    "value" : "dummy description"
  },
  "key" : "<FBKEY>",
  "programmes" : [ {
    "code" : "95000",
    "name" : "Test programme",
    "programYear" : 2017
  } ],
  "projects" : [ {
    "projectCode" : "113200",
    "projectName" : "Test project",
    "projectYear" : 2017
  } ],
  "reportingYear" : 2017,
  "status" : "Open",
  "type" : "writing",
  "updatedBy" : "admin#xx.org",
  "updatedOn" : 1501200000000,
},
...
My query has the following structure.
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "createdBy": "xx#company.org"
          },
          "match": {
            "programmes.code": "95000"
          }
        }
      ]
    }
  }
}
In my output I'm also getting documents that don't match exactly on those two fields; they have a very low score as well. Is this normal?
My mapping, automatically created using Flashlight
Update 1
I just tried this query; however, it still gives me strange results and does not filter on both fields:
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "programmes.code": "890000"
              }
            },
            {
              "match": {
                "createdBy": "admin#xx.org"
              }
            }
          ]
        }
      }
    }
  }
}
The must clause used in a bool query is executed in query context (all the documents are returned in decreasing order of score) and contributes to the score; see link.
If you want it to be executed as a filter, then use the following query:
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "createdBy": "xx#company.org"
              }
            },
            {
              "match": {
                "programmes.code": "95000"
              }
            }
          ]
        }
      }
    }
  }
}
NOTE:
By default a string field is analyzed; update the mapping of the string fields to not_analyzed in order to use them in filter queries. Refer: mapping-intro
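For example, a minimal sketch of such a mapping for the fields used above (an assumption, not the poster's actual Flashlight-generated mapping; this is the pre-5.x string syntax):
{
  "mappings": {
    "doc": {
      "properties": {
        "createdBy": { "type": "string", "index": "not_analyzed" },
        "programmes": {
          "properties": {
            "code": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
With not_analyzed fields you can also switch the match clauses inside the filter to term clauses for exact-value matching.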

Elastic Search Date Parsing Error

I'm pretty new at configuring Elasticsearch and I am having problems trying to parse a log date, which seems like it should be a trivial thing to do.
Any insight for a newbie?
"error": {
  "root_cause": [
    {
      "type": "mapper_parsing_exception",
      "reason": "failed to parse [Message.LogTime]"
    }
  ],
  "type": "mapper_parsing_exception",
  "reason": "failed to parse [Message.LogTime]",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Invalid format: \"2015-11-12 01:37:35.490\" is malformed at \" 01:37:35.490\""
  }
}
My JSON payload
{
  "LoggerType": "ErrorAndInfo",
  "Message": {
    "LogId": 0,
    "LogStatus": 0,
    "LogTime": "2015-11-12 01:37:35.490",
    "VersionInfo": "",
    "AdditionalInformation": null
  }
}
Elastic Search Template Mapping
"mappings": {
  "log_message" : {
    "_all" : { "enabled": false },
    "properties": {
      "LoggerType" : { "type" : "string" },
      "Message" : {
        "properties": {
          "LogId": { "type" : "integer" },
          "LogStatus": { "type" : "integer" },
          "LogTime": {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          },
          "VersionInfo": {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}
I figured it out. You will have to re-create your index for the changes to be applied.
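For anyone else hitting this, a sketch of what "re-create your index" can look like with curl (the index name is a placeholder and assumes it matches the template's pattern, so the template above is applied when the index is created again):
curl -XDELETE 'localhost:9200/my-log-index'
# the next indexed document re-creates the index from the template,
# so Message.LogTime is mapped as a date with format "yyyy-MM-dd HH:mm:ss.SSS"
curl -XPOST 'localhost:9200/my-log-index/log_message' -d '{
  "LoggerType": "ErrorAndInfo",
  "Message": {
    "LogId": 0,
    "LogStatus": 0,
    "LogTime": "2015-11-12 01:37:35.490",
    "VersionInfo": "",
    "AdditionalInformation": null
  }
}'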

Elasticsearch PHP longest prefix match

I am currently using the FOSElasticaBundle in Symfony2 and I am having a hard time trying to build a search to match the longest prefix.
I am aware of the 100 examples that are on the Internet to perform autocomplete-like searches using this. However, my problem is a little different.
In an autocomplete type of search the database holds the longest alphanumeric string (in length of characters) and the user just provides the shortest portion, let's say the user types "jho" and Elasticsearch can easily provide "Jhon, Jhonny, Jhonas".
My problem is the reverse: I would like to provide the longer alphanumeric string, and I want Elasticsearch to give me the longest match in the database.
For example: I could provide "123456789" and my database can have [12,123,14,156,16,7,1234,1,67,8,9,123456,0], in this case the longest prefix match in the database for the number that the user provided is "123456".
I am just starting with Elasticsearch, so I don't really have close-to-working settings or anything.
If there is any information not clear or missing let me know and I will provide more details.
Update 1 (Using Val's 2nd Update)
Index: Download 1800+ indexes
Settings:
curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefix": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}'
Query:
curl -XPOST localhost:9200/tests/test/_search?pretty=true -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    },
    "_score": "desc"
  },
  "query": {
    "filtered": {
      "query": {
        "match": {
          "my_string.prefix": "8092232423"
        }
      },
      "filter": {
        "script": {
          "script": "doc.my_string.value.length() <= maxlength",
          "params": {
            "maxlength": 10
          }
        }
      }
    }
  }
}'
With this configuration the query returns the following results:
{
  "took" : 61,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1754,
    "max_score" : null,
    "hits" : [ {
      "_index" : "tests",
      "_type" : "test",
      "_id" : "AU8LqQo4FbTZPxBtq3-Q",
      "_score" : 0.13441172,
      "_source" : { "my_string" : "80928870" },
      "sort" : [ 8.0, 0.13441172 ]
    } ]
  }
}
Bonus question
I would like to provide an array of numbers for that search and get the matching prefix for each one in an efficient way, without having to perform a separate query for each number.
Here is my take on it.
Basically, what we need to do is to slice and dice the field (called my_string below) at indexing time with an edgeNGram tokenizer (called edge_ngram_tokenizer below). That way a string like 123456789 will be tokenized to 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789 and all tokens will be indexed and searchable.
So let's create a tests index with a custom analyzer called edge_ngram_analyzer and a test mapping containing a single string field called my_string. You'll note that the my_string field is a multi-field declaring a prefixes sub-field which will contain all the tokenized prefixes.
curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefixes": {
              "type": "string",
              "index_analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}'
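As an optional sanity check, the _analyze API can be run against the index we just created; it should list the tokens 12, 123, 1234, ... up to 123456789:
curl -XGET 'localhost:9200/tests/_analyze?analyzer=edge_ngram_analyzer&text=123456789&pretty'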
Then let's index a few test documents using the _bulk API:
curl -XPOST localhost:9200/tests/test/_bulk -d '
{"index":{}}
{"my_string":"12"}
{"index":{}}
{"my_string":"1234"}
{"index":{}}
{"my_string":"1234567890"}
{"index":{}}
{"my_string":"abcd"}
{"index":{}}
{"my_string":"abcdefgh"}
{"index":{}}
{"my_string":"123456789abcd"}
{"index":{}}
{"my_string":"abcd123456789"}
'
The thing that I found particularly tricky was that the matching result could be either longer or shorter than the input string. To achieve that we have to combine two queries, one looking for shorter matches and another for longer matches. So the match query will find documents with shorter "prefixes" matching the input and the query_string query (with the edge_ngram_analyzer applied on the input string!) will search for "prefixes" longer than the input string. Both enclosed in a bool/should and sorted by a decreasing string length (i.e. longest first) will do the trick.
Let's do some queries and see what unfolds:
This query will return the one document with the longest match for "123456789", i.e. "123456789abcd". In this case, the result is longer than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_string.prefixes": "123456789"
          }
        },
        {
          "query_string": {
            "query": "123456789",
            "default_field": "my_string.prefixes",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      ]
    }
  }
}'
The second query will return the one document with the longest match for "123456789abcdef", i.e. "123456789abcd". In this case, the result is shorter than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_string.prefixes": "123456789abcdef"
          }
        },
        {
          "query_string": {
            "query": "123456789abcdef",
            "default_field": "my_string.prefixes",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      ]
    }
  }
}'
I hope that covers it. Let me know if not.
As for your bonus question, I'd simply suggest using the _msearch API and sending all queries at once.
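A rough sketch of what that could look like against the tests index above (simplified, single-clause query bodies for brevity; _msearch takes alternating header and body lines, one JSON object per line):
curl -XPOST localhost:9200/tests/test/_msearch -d '
{}
{"size":1,"sort":{"_script":{"script":"doc.my_string.value.length()","type":"number","order":"desc"}},"query":{"match":{"my_string.prefixes":"123456789"}}}
{}
{"size":1,"sort":{"_script":{"script":"doc.my_string.value.length()","type":"number","order":"desc"}},"query":{"match":{"my_string.prefixes":"abcdefgh"}}}
'
The responses come back in the same order, under a single responses array.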
UPDATE: Finally, make sure that scripting is enabled in your elasticsearch.yml file using the following:
# if you have ES <1.6
script.disable_dynamic: false
# if you have ES >=1.6
script.inline: on
UPDATE 2: I'm leaving the above as the use case might fit someone else's needs. Now, since you only need "shorter" prefixes (makes sense!), we need to change the mapping and the query a little bit.
The mapping would be like this:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefixes": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer" <--- only change
            }
          }
        }
      }
    }
  }
}
And the query would now be a bit different, but it will always return only the longest prefix that is shorter than or equal in length to the input string. Please try it out. I advise re-indexing your data to make sure everything is set up properly.
{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    },
    "_score": "desc" <----- also add this line
  },
  "query": {
    "filtered": {
      "query": {
        "match": {
          "my_string.prefixes": "123" <--- input string
        }
      },
      "filter": {
        "script": {
          "script": "doc.my_string.value.length() <= maxlength",
          "params": {
            "maxlength": 3 <---- this needs to be set to the length of the input string
          }
        }
      }
    }
  }
}

Resources