Elasticsearch & ElasticPress search by match_phrase with & inside query - WordPress

I have a problem with my queries when the user wraps the search term in " or ' - in that case I expect a match_phrase, but I don't know how to retrieve posts when the phrase contains &.
For example, I'm searching for Something & Something as a phrase. When I don't use ' or " I do see posts containing Something & Something, but in that case multi_match is used instead.
Here is what I've tried:
{
  "from": 0,
  "size": 10,
  "sort": {
    "post_date": {
      "order": "desc"
    }
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "query": "Something & Something"
              }
            }
          ]
        }
      },
      "exp": {
        "post_date_gmt": {
          "scale": "270d",
          "decay": 0.5,
          "offset": "90d"
        }
      },
      "score_mode": "avg",
      "boost_mode": "sum"
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "terms": {
            "post_type.raw": [ "post" ]
          }
        },
        {
          "terms": {
            "post_status": [ "publish" ]
          }
        }
      ]
    }
  }
}
But this doesn't return any posts; the response reports a total of 0 hits. Does anyone have an idea or suggestion about what I'm doing wrong?

match_phrase is very restrictive, and in most cases it is recommended to use it inside a should clause to increase the score, rather than inside a must, because it requires the user to type the value exactly as it is stored.
Example document
POST test_jakub/_doc
{
  "query": "Something & Something",
  "post_type": {
    "raw": "post"
  },
  "post_status": "publish",
  "post_date_gmt": "2021-01-01T12:10:30Z",
  "post_date": "2021-01-01T12:10:30Z"
}
With this document, searching for "anotherthing Something & Something" will return no results; that's why it is a bad idea to use match_phrase here.
You can take two approaches:
If you need this kind of tight matching, take a look at the slop parameter, which adds some flexibility to the match_phrase query by allowing words in the phrase to be omitted or transposed.
Switch to a regular match query (recommended). In most cases this will work fine, and if you want to give extra score to phrase matches you can add the match_phrase as a should clause.
POST test_jakub/_search
{
  "from": 0,
  "size": 10,
  "sort": {
    "post_date": {
      "order": "desc"
    }
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "query": {
                  "query": "anotherthing something & something",
                  "slop": 2
                }
              }
            }
          ],
          "must": [
            {
              "match": {
                "query": "anotherthing something & something"
              }
            }
          ]
        }
      },
      "exp": {
        "post_date_gmt": {
          "scale": "270d",
          "decay": 0.5,
          "offset": "90d"
        }
      },
      "score_mode": "avg",
      "boost_mode": "sum"
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "terms": {
            "post_type.raw": [ "post" ]
          }
        },
        {
          "terms": {
            "post_status": [ "publish" ]
          }
        }
      ]
    }
  }
}
One last piece of advice: avoid using "query" as a field name, because it leads to confusion and breaks Kibana autocomplete in Dev Tools.

Related

How to use multiple if/then in JSON schema

I have a JSON schema defined as below -
{
  "type": "object",
  "properties": {
    "prop1": { "type": "string" },
    "prop2": { "type": "string" },
    "prop3": { "type": "string" },
    "prop4": { "type": "string" }
  },
  "anyOf": [
    {
      "if": {
        "properties": {
          "prop1": { "const": "v1" },
          "prop2": { "const": "v2" }
        }
      },
      "then": {
        "required": [ "prop1", "prop2", "prop3" ]
      }
    },
    {
      "if": {
        "properties": {
          "prop1": { "const": "v11" },
          "prop2": { "const": "v22" }
        }
      },
      "then": {
        "required": [ "prop1", "prop2", "prop4" ]
      }
    }
  ],
  "required": [ "prop1", "prop2" ]
}
A few scenarios I would like to validate:
{
  "prop1": "aaa"
}
//should say prop2 is required --This works
{
  "prop1": "aaa",
  "prop2": "bbb"
}
//should validate to true --This works
{
  "prop1": "v1",
  "prop2": "v2"
}
//should say prop3 is required --This DOESN'T work
{
  "prop1": "v11",
  "prop2": "v22"
}
//should say prop4 is required --This DOESN'T work
Could someone please help me fix the two test cases above that don't work?
You need to change your anyOf to an allOf to make sure both conditions are checked.
Due to the way you have written your if conditions, only one of them can be true at a time. For the other one (or for both), the else clause is evaluated instead; since you haven't provided an else clause, it defaults to true. That means at least one of the anyOf branches is always true, so the anyOf as a whole passes.
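For example, a sketch of the corrected schema (identical to your original except that anyOf becomes allOf):
{
  "type": "object",
  "properties": {
    "prop1": { "type": "string" },
    "prop2": { "type": "string" },
    "prop3": { "type": "string" },
    "prop4": { "type": "string" }
  },
  "allOf": [
    {
      "if": {
        "properties": {
          "prop1": { "const": "v1" },
          "prop2": { "const": "v2" }
        }
      },
      "then": {
        "required": [ "prop1", "prop2", "prop3" ]
      }
    },
    {
      "if": {
        "properties": {
          "prop1": { "const": "v11" },
          "prop2": { "const": "v22" }
        }
      },
      "then": {
        "required": [ "prop1", "prop2", "prop4" ]
      }
    }
  ],
  "required": [ "prop1", "prop2" ]
}
With allOf, both if/then pairs are checked, so {"prop1": "v1", "prop2": "v2"} now fails because prop3 is required, and {"prop1": "v11", "prop2": "v22"} fails because prop4 is required.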

How to fix a problem with dynamic date templates?

I have a problem with dynamic date templates.
I'm using ElasticSearch 6.2.4
My steps:
1) Create an index with the following settings:
PUT /test1
{
  "settings": {
    "index": {
      "number_of_shards": 9,
      "number_of_replicas": 0,
      "max_rescore_window": 2000000000,
      "max_result_window": 2000000000
    }
  },
  "mappings": {
    "files": {
      "properties": {
        "Дата добавления в БД": {
          "type": "date"
        }
      },
      "numeric_detection": true,
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "long"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "dates": {
            "match_mapping_type": "date",
            "mapping": {
              "format": "yyyy-MM-dd HH:mm:ss||yyyy/MM/dd HH:mm:ss||yyyyMMdd_HH:mm:ss",
              "type": "date"
            }
          }
        }
      ]
    }
  }
}
2) Try to index new records (I have only one):
POST /test1/files/_bulk
{"create":{"_index":"test1","_type":"files","_id":"0"}}
{"Дата добавления в БД":"2019/04/12 11:42:21"}
3) I get the following output:
{
  "took": 1,
  "errors": true,
  "items": [
    {
      "create": {
        "_index": "test1",
        "_type": "files",
        "_id": "0",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse [Дата добавления в БД]",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Invalid format: \"2019/04/12 11:42:21\" is malformed at \"/04/12 11:42:21\""
          }
        }
      }
    }
  ]
}
I can't understand where my mistake is.
I tried to find information about this problem on Google, but unfortunately I haven't found a solution. Maybe this question is silly, but I've already racked my brain over it.
Please help me...
I can't fully understand why, but this option works:
{
  "settings": {
    "index": {
      "number_of_shards": 9,
      "number_of_replicas": 0,
      "max_rescore_window": 2000000000,
      "max_result_window": 2000000000
    }
  },
  "mappings": {
    "files": {
      "dynamic_date_formats": [ "yyyy-MM-dd HH:mm:ss", "yyyy/MM/dd HH:mm:ss", "yyyyMMdd_HH:mm:ss" ],
      "numeric_detection": true,
      "date_detection": true,
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "long"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}
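If you re-create the index with these settings and re-send the same bulk request from step 2, the document should now be indexed without the mapper_parsing_exception (a quick verification sketch, reusing the same test1 index name):
POST /test1/files/_bulk
{"create":{"_index":"test1","_type":"files","_id":"0"}}
{"Дата добавления в БД":"2019/04/12 11:42:21"}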
Link to documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/dynamic-field-mapping.html
Thanks for your attention :)

Elastic search 5.0 duplicate removal/optimisation

As of ES 5.0, fielddata is disabled on text fields by default. How can I remove duplicates / achieve the same result with the existing settings, i.e. with fielddata disabled, for the query below?
{
  "aggs": {
    "query": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "top": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  },
  "size": 0,
  "query": {
    "multi_match": {
      "query": "laura",
      "operator": "OR",
      "fields": [ "name" ]
    }
  }
}
You would have to enable fielddata on the text field in ES 5.x. Use it with caution, as fielddata consumes a lot of heap space.
Update your mapping with:
PUT your_index/_mapping/your_type
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    }
  }
}
and then run the query.

Elasticsearch PHP longest prefix match

I am currently using the FOSElasticaBundle in Symfony2 and I am having a hard time trying to build a search to match the longest prefix.
I am aware of the hundred or so examples on the Internet for performing autocomplete-like searches with it. However, my problem is a little different.
In an autocomplete-style search, the database holds the longest alphanumeric string (in number of characters) and the user provides the shorter portion; say the user types "jho" and Elasticsearch can easily suggest "Jhon, Jhonny, Jhonas".
My problem is the reverse: I would like to provide the longest alphanumeric string and have Elasticsearch return the longest match from the database.
For example, I could provide "123456789" and my database could contain [12,123,14,156,16,7,1234,1,67,8,9,123456,0]; in this case the longest prefix match in the database for the number the user provided is "123456".
I am just starting with Elasticsearch, so I don't have anything close to working settings yet.
If any information is unclear or missing, let me know and I will provide more details.
Update 1 (Using Val's 2nd Update)
Index: Download 1800+ indexes
Settings:
curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefix": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}'
Query:
curl -XPOST localhost:9200/tests/test/_search?pretty=true -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    },
    "_score": "desc"
  },
  "query": {
    "filtered": {
      "query": {
        "match": {
          "my_string.prefix": "8092232423"
        }
      },
      "filter": {
        "script": {
          "script": "doc.my_string.value.length() <= maxlength",
          "params": {
            "maxlength": 10
          }
        }
      }
    }
  }
}'
With this configuration the query returns the following results:
{
  "took" : 61,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1754,
    "max_score" : null,
    "hits" : [ {
      "_index" : "tests",
      "_type" : "test",
      "_id" : "AU8LqQo4FbTZPxBtq3-Q",
      "_score" : 0.13441172,
      "_source" : { "my_string" : "80928870" },
      "sort" : [ 8.0, 0.13441172 ]
    } ]
  }
}
Bonus question
I would like to provide an array of numbers to that search and get the matching prefix for each one efficiently, without having to run a separate query for each number.
Here is my take at it.
Basically, what we need to do is to slice and dice the field (called my_string below) at indexing time with an edgeNGram tokenizer (called edge_ngram_tokenizer below). That way a string like 123456789 will be tokenized to 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789 and all tokens will be indexed and searchable.
So let's create a tests index, a custom analyzer called edge_ngram_analyzer analyzer and a test mapping containing a single string field called my_string. You'll note that the my_string field is a multi-field declaring a prefixes sub-field which will contain all the tokenized prefixes.
curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefixes": {
              "type": "string",
              "index_analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}'
Then let's index a few test documents using the _bulk API:
curl -XPOST localhost:9200/tests/test/_bulk -d '
{"index":{}}
{"my_string":"12"}
{"index":{}}
{"my_string":"1234"}
{"index":{}}
{"my_string":"1234567890"}
{"index":{}}
{"my_string":"abcd"}
{"index":{}}
{"my_string":"abcdefgh"}
{"index":{}}
{"my_string":"123456789abcd"}
{"index":{}}
{"my_string":"abcd123456789"}
'
The thing that I found particularly tricky was that the matching result could be either longer or shorter than the input string. To achieve that we have to combine two queries, one looking for shorter matches and another for longer matches. So the match query will find documents with shorter "prefixes" matching the input and the query_string query (with the edge_ngram_analyzer applied on the input string!) will search for "prefixes" longer than the input string. Both enclosed in a bool/should and sorted by a decreasing string length (i.e. longest first) will do the trick.
Let's do some queries and see what unfolds:
This query will return the one document with the longest match for "123456789", i.e. "123456789abcd". In this case, the result is longer than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_string.prefixes": "123456789"
          }
        },
        {
          "query_string": {
            "query": "123456789",
            "default_field": "my_string.prefixes",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      ]
    }
  }
}'
The second query will return the one document with the longest match for "123456789abcdef", i.e. "123456789abcd". In this case, the result is shorter than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_string.prefixes": "123456789abcdef"
          }
        },
        {
          "query_string": {
            "query": "123456789abcdef",
            "default_field": "my_string.prefixes",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      ]
    }
  }
}'
I hope that covers it. Let me know if not.
As for your bonus question, I'd simply suggest using the _msearch API and sending all queries at once.
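For example, here is a minimal sketch of an _msearch call for two input numbers (the bodies below are trimmed down to the simple match part for brevity; in practice each body would be the full longest-prefix query shown above):
curl -XPOST localhost:9200/tests/test/_msearch -d '
{}
{"size": 1, "sort": {"_script": {"script": "doc.my_string.value.length()", "type": "number", "order": "desc"}}, "query": {"match": {"my_string.prefixes": "123456789"}}}
{}
{"size": 1, "sort": {"_script": {"script": "doc.my_string.value.length()", "type": "number", "order": "desc"}}, "query": {"match": {"my_string.prefixes": "987654"}}}
'
The response contains one entry in its "responses" array per query, in the same order as the requests.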
UPDATE: Finally, make sure that scripting is enabled in your elasticsearch.yml file using the following:
# if you have ES <1.6
script.disable_dynamic: false
# if you have ES >=1.6
script.inline: on
UPDATE 2: I'm leaving the above in place, as that use case might fit someone else's needs. Now, since you only need "shorter" prefixes (makes sense!!), we need to change the mapping and the query a little bit.
The mapping would be like this:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefixes": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer" <--- only change
            }
          }
        }
      }
    }
  }
}
The query is now a bit different, but it will always return only the longest prefix that is shorter than or equal in length to the input string. Please try it out. I advise re-indexing your data to make sure everything is set up properly.
{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    },
    "_score": "desc" <----- also add this line
  },
  "query": {
    "filtered": {
      "query": {
        "match": {
          "my_string.prefixes": "123" <--- input string
        }
      },
      "filter": {
        "script": {
          "script": "doc.my_string.value.length() <= maxlength",
          "params": {
            "maxlength": 3 <---- this needs to be set to the length of the input string
          }
        }
      }
    }
  }
}

Running queries created by Kibana using Java API?

Is it feasible to run queries created by Kibana using the Java API?
I mean, take ready-made queries from Kibana dashboards that are created dynamically by users, and pass them as a parameter in Java?
This is an example of a query coming from a Kibana dashboard:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "*"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "#timestamp": {
                  "gte": 1274879129857,
                  "lte": 1432645529858
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "aggs": {
    "3": {
      "terms": {
        "field": "ruleid",
        "size": 20,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
