ElasticSearch match if value in list of values - symfony

I'm working with Elasticsearch (on a Symfony 4 project with FOSElasticaBundle) and I don't know how to build a query that does:
Match if value is in a list of values
I have a field "code" and I need to retrieve an element ONLY if the value of this field "code" is "first", "second", or "third".
I tried with terms:
must query: nothing is retrieved; maybe must means "first" AND "second" AND "third"?
should query: everything is retrieved, even documents whose value is none of "first", "second", or "third", so is the should query useless?
I also tried match (should, must) and term one by one... nothing.
So, how can I do that in Elasticsearch? Thanks!

PUT some-test

PUT /some-test/doc/1
{
  "foo": "first"
}

PUT /some-test/doc/2
{
  "foo": "second"
}

PUT /some-test/doc/3
{
  "foo": "fourth"
}

POST some-test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "foo": "first second third"
          }
        }
      ]
    }
  }
}
gives me
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "some-test",
        "_type": "doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "foo": "second"
        }
      },
      {
        "_index": "some-test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "foo": "first"
        }
      }
    ]
  }
}
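If the goal is an exact match against a list of values rather than analyzed full-text matching, a `terms` query is usually the idiomatic tool. A sketch against the same index (note that `terms` matches exact indexed tokens, so on an analyzed string field the values must match the indexed terms, e.g. be lowercased, or target a not-analyzed keyword sub-field):

```
POST some-test/_search
{
  "query": {
    "terms": {
      "foo": ["first", "second", "third"]
    }
  }
}
```

A document matches if its `foo` value equals any one of the listed terms, which is exactly the "value in list of values" semantics asked for.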

Related

DynamoDB how to filter by attributes of array items?

{
  "id": {"N": "1"},
  "attributes": {
    "L": [
      {
        "M": {
          "name": { "S": "AA" }
        }
      },
      {
        "M": {
          "name": { "S": "BB" }
        }
      }
    ]
  }
},
{
  "id": {"N": "1"},
  "attributes": {
    "L": [
      {
        "M": {
          "name": { "S": "BB" }
        }
      }
    ]
  }
}
With the above data in the same partition (id as the partition key), how can I find records where any of the attributes has name = 'BB'?
I can filter by the Nth item, for example:
KeyConditionExpression: 'id = :myId',
ExpressionAttributeValues: {'myId':{N:'1'},'myValue':{S:'BB'}},
ExpressionAttributeNames:{'#name': 'name'},
FilterExpression: 'attributes[0].#name=:myValue'
This would return only the 2nd item. But is there a way to filter by ANY item in the array? It should return both records.
I tried setting FilterExpression to attributes[*].#name=:myValue or attributes.#name=:myValue; neither works.
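DynamoDB filter expressions do not support wildcards in document paths (there is no `attributes[*]` form), so one common workaround is to query the partition and filter the returned items in application code. A sketch in plain Python (no AWS call; the record shapes are copied from the question):

```python
# Filter queried items client-side, since a FilterExpression cannot
# address "any" element of a list attribute.

def any_attribute_named(item, value):
    """True if any element of item['attributes'] has name == value."""
    return any(
        attr["M"]["name"]["S"] == value
        for attr in item["attributes"]["L"]
    )

# The two sample records from the question, as returned by a Query.
items = [
    {
        "id": {"N": "1"},
        "attributes": {"L": [
            {"M": {"name": {"S": "AA"}}},
            {"M": {"name": {"S": "BB"}}},
        ]},
    },
    {
        "id": {"N": "1"},
        "attributes": {"L": [
            {"M": {"name": {"S": "BB"}}},
        ]},
    },
]

matches = [item for item in items if any_attribute_named(item, "BB")]
# Both records match, since each has an attribute named "BB".
```

Note that this trades read capacity for flexibility: the query still returns every item in the partition, and the narrowing happens after the fact.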

Elasticsearch & ElasticPress search by match_phrase with & inside query

I have a problem with my queries when I use " or ': then I expect a match_phrase, but I don't know how I can retrieve posts when the match_phrase contains &.
For example, I use Something & Something as the phrase; when I don't use ' or ", I can see posts with Something & Something, but then I'm using multi_match.
Something I've tried:
{
  "from": 0,
  "size": 10,
  "sort": {
    "post_date": {
      "order": "desc"
    }
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "query": "Something & Something"
              }
            }
          ]
        }
      },
      "exp": {
        "post_date_gmt": {
          "scale": "270d",
          "decay": 0.5,
          "offset": "90d"
        }
      },
      "score_mode": "avg",
      "boost_mode": "sum"
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "terms": {
            "post_type.raw": [
              "post"
            ]
          }
        },
        {
          "terms": {
            "post_status": [
              "publish"
            ]
          }
        }
      ]
    }
  }
}
But this doesn't return any post; the hits total is 0. Does anyone have any idea or suggestion about what I'm doing wrong?
match_phrase is very restrictive, and in most cases it is recommended to use it inside a should clause to increase the score rather than in a must, because it requires the user to type the value exactly as it is.
Example document
POST test_jakub/_doc
{
  "query": "Something & Something",
  "post_type": {
    "raw": "post"
  },
  "post_status": "publish",
  "post_date_gmt": "2021-01-01T12:10:30Z",
  "post_date": "2021-01-01T12:10:30Z"
}
With this document, searching for "anotherthing Something & Something" will return no results; that's why it is a bad idea to use match_phrase here.
You can take 2 approaches:
If you need this kind of tight query, take a look at the slop parameter, which adds some flexibility to the match_phrase query by allowing words in the phrase to be omitted or transposed.
Switch to a regular match query (recommended). In most cases this will work fine, but if you want to give extra score to phrase matches you can add the match_phrase as a should clause.
POST test_jakub/_search
{
  "from": 0,
  "size": 10,
  "sort": {
    "post_date": {
      "order": "desc"
    }
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match_phrase": {
                "query": {
                  "query": "anotherthing something & something",
                  "slop": 2
                }
              }
            }
          ],
          "must": [
            {
              "match": {
                "query": "anotherthing something & something"
              }
            }
          ]
        }
      },
      "exp": {
        "post_date_gmt": {
          "scale": "270d",
          "decay": 0.5,
          "offset": "90d"
        }
      },
      "score_mode": "avg",
      "boost_mode": "sum"
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "terms": {
            "post_type.raw": [
              "post"
            ]
          }
        },
        {
          "terms": {
            "post_status": [
              "publish"
            ]
          }
        }
      ]
    }
  }
}
One last piece of advice: avoid using "query" as a field name, because it leads to confusion and will break Kibana's autocomplete in Dev Tools.

Updating item in DynamoDB fails for the UpdateExpression syntax

My table data looks like the one below:
{
  "id": {
    "S": "alpha-rocket"
  },
  "images": {
    "SS": [
      "apple/value:50",
      "Mango/aa:284_454_51.0.0",
      "Mango/bb:291",
      "Mango/cc:4"
    ]
  },
  "product": {
    "S": "fruit"
  }
}
Below is my code to update the table. The variables I am passing to the function are: product_id is "alpha-rocket", image_val is "284_454_53.0.0", and image is "Mango/aa:284_454_53.0.0".
I am trying to update the value of Mango/aa from 284_454_51.0.0 to 284_454_53.0.0, but I am getting the error "The document path provided in the update expression is invalid for update".
import boto3

def update_player_score(product_id, image_val, image):
    dynamo = boto3.resource('dynamodb')
    tbl = dynamo.Table('orcus')
    result = tbl.update_item(
        Key={
            "product": "fruit",
            "id": product_id,
        },
        # This is the expression that fails with "The document path
        # provided in the update expression is invalid for update"
        UpdateExpression="SET images.#image_val = :image_val",
        ExpressionAttributeNames={
            "#image_val": image,
        },
        ExpressionAttributeValues={
            ":image_val": image_val,
        },
        ReturnValues="ALL_NEW",
    )
    return result
Is there a way to update the value of Mango/aa, or to replace the full string "Mango/aa:284_454_51.0.0" with "Mango/aa:284_454_53.0.0"?
You cannot update a string in a list by matching the string. If you know its index, you can replace the value of the string by index:
SET images[1] = :image_val
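In boto3 terms, the index-based update could be built like this (a sketch; the table name, key names, and values are taken from the question, and the helper `build_update_params` is hypothetical, no AWS call is made here):

```python
# Sketch: overwrite a list element by index. Note that the list index
# must be a literal inside the expression itself; only attribute names
# and values can be substituted through placeholders.

def build_update_params(product_id, index, new_value):
    """Build update_item kwargs that replace images[index] wholesale."""
    return {
        "Key": {"product": "fruit", "id": product_id},
        "UpdateExpression": f"SET images[{index}] = :image_val",
        "ExpressionAttributeValues": {":image_val": new_value},
        "ReturnValues": "ALL_NEW",
    }

params = build_update_params("alpha-rocket", 1, "Mango/aa:284_454_53.0.0")
# boto3.resource("dynamodb").Table("orcus").update_item(**params)
```

This replaces the whole string at that position, which covers the "replace the full string" variant of the question.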
It seems like maybe what you want is not a list of strings, but another map. So instead of your data looking like it does, you'd make it look like this, which would allow you to do the update you're looking for:
{
  "id": {
    "S": "alpha-rocket"
  },
  "images": {
    "M": {
      "apple": {
        "M": {
          "value": {
            "S": "50"
          }
        }
      },
      "Mango": {
        "M": {
          "aa": {
            "S": "284_454_51.0.0"
          },
          "bb": {
            "S": "291"
          },
          "cc": {
            "S": "4"
          }
        }
      }
    }
  },
  "product": {
    "S": "fruit"
  }
}
I would also consider putting the different values in different "rows" in the table and using queries to build the objects.

Jq parsing : Selecting an object from a list of objects based on criteria

I have a JSON like this.
{
  "servers": [
    {
      "id": "1",
      "addresses": {
        "services_z1": [
          {
            "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:bc:db:7d",
            "addr": "10.3.3.18",
            "version": 4,
            "OS-EXT-IPS:type": "fixed"
          }
        ]
      }
    },
    {
      "id": "2",
      "addresses": {
        "services_z1": [
          {
            "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:bc:db:7d",
            "addr": "10.3.3.19",
            "version": 4,
            "OS-EXT-IPS:type": "fixed"
          }
        ]
      }
    },
    {
      "id": "3",
      "addresses": {
        "services_z1": [
          {
            "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:bc:db:7d",
            "addr": "10.3.3.20",
            "version": 4,
            "OS-EXT-IPS:type": "fixed"
          }
        ]
      }
    },
    {
      "id": "4",
      "addresses": {
        "services_z1": [
          {
            "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:bc:db:7d",
            "addr": "10.3.3.21",
            "version": 4,
            "OS-EXT-IPS:type": "fixed"
          }
        ]
      }
    }
  ]
}
I am trying to find the server id for which the addr value is 10.3.3.18. How can I achieve that?
I know it would be something like jq '.servers[] | select(some criteria)', but I am not able to form that criteria.
Any pointer would be of huge help.
You want something like the following:
jq '.servers[]|select(.addresses.services_z1[].addr=="10.3.3.18")|.id'
This says to look through all of the servers, match those that have .addresses.services_z1[].addr=="10.3.3.18", and then print the id of those servers.
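For comparison, the same selection can be expressed in plain Python with the standard json module (a sketch; the sample document is abbreviated to two of the servers from the question):

```python
import json

doc = """
{"servers": [
  {"id": "1", "addresses": {"services_z1": [{"addr": "10.3.3.18"}]}},
  {"id": "2", "addresses": {"services_z1": [{"addr": "10.3.3.19"}]}}
]}
"""

data = json.loads(doc)
# Same logic as the jq filter: keep servers where any address in
# services_z1 matches, then project out the id.
ids = [
    server["id"]
    for server in data["servers"]
    if any(
        addr["addr"] == "10.3.3.18"
        for addr in server["addresses"]["services_z1"]
    )
]
# → ["1"]
```

The `any(...)` mirrors jq's behavior of `select` matching when the inner iteration `services_z1[]` produces at least one satisfying element.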

Elasticsearch PHP longest prefix match

I am currently using the FOSElasticaBundle in Symfony2 and I am having a hard time trying to build a search to match the longest prefix.
I am aware of the 100 examples that are on the Internet to perform autocomplete-like searches using this. However, my problem is a little different.
In an autocomplete-type search, the database holds the longer alphanumeric strings (in number of characters) and the user provides only a short portion; say the user types "jho" and Elasticsearch can easily provide "Jhon, Jhonny, Jhonas".
My problem is the reverse: I would like to provide a long alphanumeric string and I want Elasticsearch to give me the longest match in the database.
For example: I could provide "123456789" and my database could have [12,123,14,156,16,7,1234,1,67,8,9,123456,0]; in this case the longest prefix match in the database for the number the user provided is "123456".
I am just starting with Elasticsearch so I don't really have a close to working settings or anything.
If there is any information not clear or missing let me know and I will provide more details.
Update 1 (Using Val's 2nd Update)
Index: Download 1800+ indexes
Settings:
curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefix": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}'
Query:
curl -XPOST localhost:9200/tests/test/_search?pretty=true -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    },
    "_score": "desc"
  },
  "query": {
    "filtered": {
      "query": {
        "match": {
          "my_string.prefix": "8092232423"
        }
      },
      "filter": {
        "script": {
          "script": "doc.my_string.value.length() <= maxlength",
          "params": {
            "maxlength": 10
          }
        }
      }
    }
  }
}'
With this configuration the query returns the following results:
{
  "took": 61,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1754,
    "max_score": null,
    "hits": [
      {
        "_index": "tests",
        "_type": "test",
        "_id": "AU8LqQo4FbTZPxBtq3-Q",
        "_score": 0.13441172,
        "_source": {
          "my_string": "80928870"
        },
        "sort": [ 8.0, 0.13441172 ]
      }
    ]
  }
}
Bonus question
I would like to provide an array of numbers for that search and get the matching prefix for each one efficiently, without having to run the query once per number.
Here is my take at it.
Basically, what we need to do is to slice and dice the field (called my_string below) at indexing time with an edgeNGram tokenizer (called edge_ngram_tokenizer below). That way a string like 123456789 will be tokenized to 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789 and all tokens will be indexed and searchable.
So let's create a tests index, a custom analyzer called edge_ngram_analyzer analyzer and a test mapping containing a single string field called my_string. You'll note that the my_string field is a multi-field declaring a prefixes sub-field which will contain all the tokenized prefixes.
curl -XPUT localhost:9200/tests -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefixes": {
              "type": "string",
              "index_analyzer": "edge_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}'
Then let's index a few test documents using the _bulk API:
curl -XPOST localhost:9200/tests/test/_bulk -d '
{"index":{}}
{"my_string":"12"}
{"index":{}}
{"my_string":"1234"}
{"index":{}}
{"my_string":"1234567890"}
{"index":{}}
{"my_string":"abcd"}
{"index":{}}
{"my_string":"abcdefgh"}
{"index":{}}
{"my_string":"123456789abcd"}
{"index":{}}
{"my_string":"abcd123456789"}
'
The thing that I found particularly tricky was that the matching result could be either longer or shorter than the input string. To achieve that we have to combine two queries, one looking for shorter matches and another for longer matches. So the match query will find documents with shorter "prefixes" matching the input and the query_string query (with the edge_ngram_analyzer applied on the input string!) will search for "prefixes" longer than the input string. Both enclosed in a bool/should and sorted by a decreasing string length (i.e. longest first) will do the trick.
Let's do some queries and see what unfolds:
This query will return the one document with the longest match for "123456789", i.e. "123456789abcd". In this case, the result is longer than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_string.prefixes": "123456789"
          }
        },
        {
          "query_string": {
            "query": "123456789",
            "default_field": "my_string.prefixes",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      ]
    }
  }
}'
The second query will return the one document with the longest match for "123456789abcdef", i.e. "123456789abcd". In this case, the result is shorter than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_string.prefixes": "123456789abcdef"
          }
        },
        {
          "query_string": {
            "query": "123456789abcdef",
            "default_field": "my_string.prefixes",
            "analyzer": "edge_ngram_analyzer"
          }
        }
      ]
    }
  }
}'
I hope that covers it. Let me know if not.
As for your bonus question, I'd simply suggest using the _msearch API and sending all queries at once.
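A minimal `_msearch` request could look like the sketch below (reusing the index and field from above; each search is one header line, here empty, followed by one query body on a single line):

```
POST tests/test/_msearch
{}
{"size": 1, "query": {"match": {"my_string.prefixes": "123456789"}}}
{}
{"size": 1, "query": {"match": {"my_string.prefixes": "98765"}}}
```

The results come back in a single `responses` array, in the same order as the request bodies, so each input number can be paired with its longest-prefix hit.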
UPDATE: Finally, make sure that scripting is enabled in your elasticsearch.yml file using the following:
# if you have ES <1.6
script.disable_dynamic: false
# if you have ES >=1.6
script.inline: on
UPDATE 2: I'm leaving the above as the use case might fit someone else's needs. Now, since you only need "shorter" prefixes (makes sense!!), we need to change the mapping and the query a little bit.
The mapping would be like this:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "my_string": {
          "type": "string",
          "fields": {
            "prefixes": {
              "type": "string",
              "analyzer": "edge_ngram_analyzer" <--- only change
            }
          }
        }
      }
    }
  }
}
And the query is now a bit different, but it will always return only the longest prefix that is shorter than or equal in length to the input string. Please try it out. I advise re-indexing your data to make sure everything is set up properly.
{
  "size": 1,
  "sort": {
    "_script": {
      "script": "doc.my_string.value.length()",
      "type": "number",
      "order": "desc"
    },
    "_score": "desc" <----- also add this line
  },
  "query": {
    "filtered": {
      "query": {
        "match": {
          "my_string.prefixes": "123" <--- input string
        }
      },
      "filter": {
        "script": {
          "script": "doc.my_string.value.length() <= maxlength",
          "params": {
            "maxlength": 3 <---- this needs to be set to the length of the input string
          }
        }
      }
    }
  }
}