How to invert a query using Artifactory Query Language? - artifactory

In Artifactory, I have a list of "repo" properties, some of which come with a "path" property. I need to find all items that do not match these properties. Is there a way to structure my logic so that this can be done in one query?
Here's my current, broken logic:
curl -u $credentials \
-X POST https://my-artifactory-server.com/artifactory/api/search/aql \
-H content-type:text/plain -d 'items.find({
"$and": [
{"repo" : {"$neq" : "top_level_directory"}},
{
"$and": [
{"repo" : {"$neq" : "other_top_level_directory"}},
{"path" : {"$nmatch" : "sub_directory/*"}}
]
}
]
}).include("repo","path","name","size","modified")'
The issue with this is that while the "repo" property is unique, the "path" property is not. All subdirectories would thus be excluded from the query result.

Yes. An AQL query is a logical expression, so it can be inverted with the usual rules of Boolean logic.
I'm assuming the following is the original query:
{
"$or": [
{ "repo": { "$eq": "top_level_directory" } },
{
"$and": [
{ "repo": { "$eq": "other_top_level_directory" } },
{ "path": { "$match": "sub_directory/*" } }
]
}
]
}
A - repo == top_level_directory
B - repo == other_top_level_directory
C - path =~ sub_directory/*
Original query:
A ∨ (B ∧ C)
Inversion:
¬(A ∨ (B ∧ C))
<=> (De Morgan)
¬A ∧ ¬(B ∧ C)
<=> (De Morgan)
¬A ∧ (¬B ∨ ¬C)
This gives the following query:
{
"$and": [
{ "repo": { "$neq": "top_level_directory" } },
{
"$or": [
{ "repo": { "$neq": "other_top_level_directory" } },
{ "path": { "$nmatch": "sub_directory/*" } }
]
}
]
}
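The same inversion can also be done mechanically. The following is a minimal Python sketch, not part of AQL or any JFrog tooling: `negate`, `NEGATED_OPERATOR`, and `NEGATED_CONNECTIVE` are hypothetical names, the operator spellings are copied from the queries above, and every criterion is assumed to be a single-key object as in this example.

```python
import json

# Hypothetical lookup tables; operator spellings are copied from the queries above.
NEGATED_OPERATOR = {"$eq": "$neq", "$neq": "$eq", "$match": "$nmatch", "$nmatch": "$match"}
NEGATED_CONNECTIVE = {"$or": "$and", "$and": "$or"}


def negate(criterion):
    """Negate a single-key criterion by applying De Morgan's laws recursively."""
    (key, value), = criterion.items()
    if key in NEGATED_CONNECTIVE:
        # not (X or Y) == (not X) and (not Y), and the other way round
        return {NEGATED_CONNECTIVE[key]: [negate(child) for child in value]}
    # Leaf of the form {"field": {"$op": operand}}
    (operator, operand), = value.items()
    return {key: {NEGATED_OPERATOR[operator]: operand}}


original = {
    "$or": [
        {"repo": {"$eq": "top_level_directory"}},
        {"$and": [
            {"repo": {"$eq": "other_top_level_directory"}},
            {"path": {"$match": "sub_directory/*"}},
        ]},
    ]
}

inverted = negate(original)
print(json.dumps(inverted, indent=2))
```

Negating the result a second time returns the original criteria, which is a quick sanity check for the operator tables.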

Related

JQ: Delete duplicate entry inplace

I am trying to delete a key whose value is duplicated elsewhere. That is, I would like to delete all occurrences (duplicates) after the first occurrence. Here is a sample JSON file I am working with:
{
"clouds":{
"finfolk-vmaas":{
"auth-types":[
"oauth1"
],
"endpoint":"http://10.125.0.10:5240/MAAS/",
"type":"maas"
},
"vsphere":{
"auth-types":[
"userpass"
],
"endpoint":"10.247.0.3",
"regions":{
"QA":{
"endpoint":"10.247.0.3"
}
},
"type":"vsphere"
}
}
}
I would like to get this after the deletion:
{
"clouds":{
"finfolk-vmaas":{
"auth-types":[
"oauth1"
],
"endpoint":"http://10.125.0.10:5240/MAAS/",
"type":"maas"
},
"vsphere":{
"auth-types":[
"userpass"
],
"endpoint":"10.247.0.3",
"regions":{
"QA":{}
},
"type":"vsphere"
}
}
}
Essentially, I want to remove this duplicate key-value pair "endpoint":"10.247.0.3" and leave the enclosing braces {}.
Here is a simple jq query that I am trying to play with:
jq -cs 'unique_by(.endpoint)' clouds.json
For each object in .clouds[], this saves the object reduced to its endpoint field as $endpoint, then recursively walks all child objects. From any child object that contains the stored endpoint, (only) the endpoint field is deleted.
.clouds[] |= ({endpoint} as $endpoint | .[] |= walk(
(objects | select(contains($endpoint))) |= del(.endpoint)
))
{
"clouds": {
"finfolk-vmaas": {
"auth-types": [
"oauth1"
],
"endpoint": "http://10.125.0.10:5240/MAAS/",
"type": "maas"
},
"vsphere": {
"auth-types": [
"userpass"
],
"endpoint": "10.247.0.3",
"regions": {
"QA": {}
},
"type": "vsphere"
}
}
}
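For readers who prefer to see the logic outside jq, here is a rough Python sketch of the same idea. The function names `strip_matching` and `dedupe_endpoints` are made up for illustration, and unlike jq's `contains` (which does substring matching on strings) this version compares endpoints by exact equality.

```python
def strip_matching(node, key, value):
    """Return a copy of node without nested `key` entries equal to `value`."""
    if isinstance(node, dict):
        return {
            k: strip_matching(v, key, value)
            for k, v in node.items()
            if not (k == key and v == value)
        }
    if isinstance(node, list):
        return [strip_matching(item, key, value) for item in node]
    return node


def dedupe_endpoints(document):
    """Keep each cloud's top-level endpoint and drop identical nested copies."""
    clouds = {}
    for name, cloud in document["clouds"].items():
        endpoint = cloud.get("endpoint")
        if endpoint is None:
            clouds[name] = cloud  # nothing to compare against
            continue
        # Only the children of the cloud are cleaned; its own endpoint stays.
        clouds[name] = {
            field: strip_matching(value, "endpoint", endpoint)
            for field, value in cloud.items()
        }
    return {**document, "clouds": clouds}


sample = {
    "clouds": {
        "vsphere": {
            "auth-types": ["userpass"],
            "endpoint": "10.247.0.3",
            "regions": {"QA": {"endpoint": "10.247.0.3"}},
            "type": "vsphere",
        }
    }
}
cleaned = dedupe_endpoints(sample)
print(cleaned["clouds"]["vsphere"]["regions"])
```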

JQ filter specific item based on inner item

Having the following Array
[
[
{ "field" : { "name": "appname" }, "value": { "value" : "app1" } },
{ "field" : { "name": "appstat" }, "value": { "value" : "UP" } }
],
[
{ "field" : { "name": "appname" }, "value": { "value" : "app2" } },
{ "field" : { "name": "appstat" }, "value": { "value" : "DOWN" } }
],
[
{ "field" : { "name": "appname" }, "value": { "value" : "app3" } },
{ "field" : { "name": "appstat" }, "value": { "value" : "READY"} }
]
]
I want to be able to select specific items based on the appname.
So, for example, I could run something like
jq .[] app3
and the response should be READY.
This should get you there:
jq -r --arg q "app3" '
.[]
| select(.[] | .field.name == "appname" and .value.value == $q)
| .[]
| select(.field.name == "appstat").value.value
'
READY
However, your data structure seems rather complicated. You'd be better off (at least for this use case) with a simpler array of objects in which to look up key-value pairs. For example, transform your input like so:
jq 'map(map({(first(.field.name)): first(.value.value)}) | add)'
[
{
"appname": "app1",
"appstat": "UP"
},
{
"appname": "app2",
"appstat": "DOWN"
},
{
"appname": "app3",
"appstat": "READY"
}
]
That way, your lookup would be as simple as
jq -r --arg q "app3" '.[] | select(.appname == $q).appstat'
READY
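The flatten-then-look-up approach can be sketched in Python as well. This is only an illustration of the idea, with hypothetical function names `flatten` and `status_of`, and it assumes every inner item has the `field.name` and `value.value` shape shown in the question.

```python
def flatten(groups):
    """Turn each list of field/value pairs into one plain dict."""
    return [
        {item["field"]["name"]: item["value"]["value"] for item in group}
        for group in groups
    ]


def status_of(groups, appname):
    """Return the appstat of every record whose appname matches."""
    return [
        record.get("appstat")
        for record in flatten(groups)
        if record.get("appname") == appname
    ]


data = [
    [
        {"field": {"name": "appname"}, "value": {"value": "app1"}},
        {"field": {"name": "appstat"}, "value": {"value": "UP"}},
    ],
    [
        {"field": {"name": "appname"}, "value": {"value": "app3"}},
        {"field": {"name": "appstat"}, "value": {"value": "READY"}},
    ],
]
print(status_of(data, "app3"))
```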

jq: avoid empty arrays mapped field

Here my jq script:
def pick_nationality:
select(.NACIONALITAT) |
{nation: {country: .NACIONALITAT, code: "some code"} };
def pick_surname:
select(.SURNAME) |
{name: {surname: .SURNAME, code: "some code"} };
def pick_extension:
{ use: "official", extension: [pick_nationality, pick_surname] };
map(pick_extension)
Input json is like:
{
"SURNAME": "surname1"
}
{
"NACIONALITAT": "nacionalitat1"
}
However, sometimes none of the input objects contain any of the lookup fields:
{
"field1": "value1"
}
{
"field2": "value2"
}
Above script returns:
[
{
"use": "official",
"extension": []
},
{
"use": "official",
"extension": []
}
]
I'd like the extension field to be omitted instead:
[
{
"use": "official"
},
{
"use": "official"
}
]
Any ideas?
You can simply append
| del(..|select(. == []))
to the end of your script to remove all such empty arrays.
Extend your function pick_extension to get the desired output:
def pick_extension:
[pick_nationality, pick_surname] as $extension
| { use: "official" }
| if $extension | length > 0 then . + {extension: $extension} else . end;
If no extension could be picked, the empty array is no longer added to the JSON object.
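The conditional-key pattern is the same in any language. As a hedged illustration, a Python equivalent of the three functions might look like this; it mirrors jq's `select` by treating only null and false as "absent".

```python
def is_present(value):
    """jq's select(.X) passes everything except null and false."""
    return value is not None and value is not False


def pick_nationality(record):
    value = record.get("NACIONALITAT")
    if is_present(value):
        return {"nation": {"country": value, "code": "some code"}}
    return None


def pick_surname(record):
    value = record.get("SURNAME")
    if is_present(value):
        return {"name": {"surname": value, "code": "some code"}}
    return None


def pick_extension(record):
    picked = (pick_nationality(record), pick_surname(record))
    extension = [entry for entry in picked if entry is not None]
    result = {"use": "official"}
    if extension:  # only add the key when something was picked
        result["extension"] = extension
    return result


print(pick_extension({"field1": "value1"}))
print(pick_extension({"SURNAME": "surname1"}))
```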

How to change the include section of an AQL query in a file spec

I want to change the output of an AQL string formatted as a file spec for Artifactory.
The query looks like this:
{
"files": [
{
"aql": {
"items.find":{
"repo":"gradle-dev-local",
"$or":[
{
"$and": [
{ "stat.downloads": { "$eq":null } },
{ "updated": { "$before": "7d" } }
]
},
{
"$and": [
{ "stat.downloads": { "$gt": 0 } },
{ "stat.downloaded": { "$before": "30d" } }
]
}
]
}
}
}
]
}
In a pure AQL REST API call, I would include the following:
"include":["repo", "name", "path", "updated", "sha256", "stat.downloads", "stat.downloaded"]
But when I add it to the file spec, it is not placed in the right part of the query, resulting in the following error message:
Failed to parse query: items.find({
"repo":"mfm-gradle-dev-local",
"$or":[
{
"$and": [
{ "stat.downloads": { "$eq":null } },
{ "updated": { "$before": "7d" } }
]
},
{
"$and": [
{ "stat.downloads": { "$gt": 0 } },
{ "stat.downloaded": { "$before": "30d" } }
]
}
]
},
"include":["repo", "name", "path", "updated", "sha256", "stat.downloads", "stat.downloaded"]
).include("name","repo","path","actual_md5","actual_sha1","size","type","property"), it looks like there is syntax error near the following sub-query: "include":["repo", "name", "path", "updated", "sha256", "stat.downloads", "stat.downloaded"]
How do I format the AQL so that the include statement gets passed as well?
If you're using the JFrog CLI, there is an open issue (github.com/jfrog/jfrog-cli-go/issues/320) for being able to add includes in the search queries (both using the -s parameter and file specs). Please feel free to add additional information to that issue, if we've missed anything so far.
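Until then, one possible workaround, assuming that calling the REST API directly is acceptable in your setup, is to build the AQL text yourself and POST it to api/search/aql as plain text, the way the curl call in the first question on this page does. The Python sketch below only renders the request body; `build_aql` is a hypothetical helper, and nothing is sent over the network.

```python
import json


def build_aql(criteria, include_fields):
    """Render an items.find(...).include(...) statement as plain text."""
    fields = ", ".join(json.dumps(field) for field in include_fields)
    return f"items.find({json.dumps(criteria)}).include({fields})"


criteria = {
    "repo": "gradle-dev-local",
    "$or": [
        {"$and": [
            {"stat.downloads": {"$eq": None}},  # rendered as JSON null
            {"updated": {"$before": "7d"}},
        ]},
        {"$and": [
            {"stat.downloads": {"$gt": 0}},
            {"stat.downloaded": {"$before": "30d"}},
        ]},
    ],
}
include_fields = ["repo", "name", "path", "updated", "sha256",
                  "stat.downloads", "stat.downloaded"]

aql_text = build_aql(criteria, include_fields)
print(aql_text)
```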

Elasticsearch PHP longest prefix match

I am currently using the FOSElasticaBundle in Symfony2 and I am having a hard time trying to build a search to match the longest prefix.
I am aware of the 100 examples that are on the Internet to perform autocomplete-like searches using this. However, my problem is a little different.
In an autocomplete type of search the database holds the longest alphanumeric string (in length of characters) and the user just provides the shortest portion, let's say the user types "jho" and Elasticsearch can easily provide "Jhon, Jhonny, Jhonas".
My problem is the reverse: I would like to provide the longest alphanumeric string and have Elasticsearch return the longest match in the database.
For example: I could provide "123456789" and my database can have [12,123,14,156,16,7,1234,1,67,8,9,123456,0], in this case the longest prefix match in the database for the number that the user provided is "123456".
I am just starting with Elasticsearch, so I don't really have anything close to working settings.
If there is any information not clear or missing let me know and I will provide more details.
Update 1 (Using Val's 2nd Update)
Index: Download 1800+ indexes
Settings:
curl -XPUT localhost:9200/tests -d '{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "edge_ngram_tokenizer",
"filter": [ "lowercase" ]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
}
}
},
"mappings": {
"test": {
"properties": {
"my_string": {
"type": "string",
"fields": {
"prefix": {
"type": "string",
"analyzer": "edge_ngram_analyzer"
}
}
}
}
}
}
}'
Query:
curl -XPOST localhost:9200/tests/test/_search?pretty=true -d '{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
},
"_score": "desc"
},
"query": {
"filtered": {
"query": {
"match": {
"my_string.prefix": "8092232423"
}
},
"filter": {
"script": {
"script": "doc.my_string.value.length() <= maxlength",
"params": {
"maxlength": 10
}
}
}
}
}
}'
With this configuration the query returns the following results:
{
"took" : 61,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1754,
"max_score" : null,
"hits" : [ {
"_index" : "tests",
"_type" : "test",
"_id" : "AU8LqQo4FbTZPxBtq3-Q",
"_score" : 0.13441172,
"_source":{"my_string":"80928870"},
"sort" : [ 8.0, 0.13441172 ]
} ]
}
}
Bonus question
I would like to provide an array of numbers for that search and get the matching prefix for each one in an efficient way without having to perform the query each time
Here is my take at it.
Basically, what we need to do is to slice and dice the field (called my_string below) at indexing time with an edgeNGram tokenizer (called edge_ngram_tokenizer below). That way a string like 123456789 will be tokenized to 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789 and all tokens will be indexed and searchable.
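To make the tokenization concrete, here is a small Python sketch that imitates what an edge n-gram tokenizer followed by a lowercase filter produces for one input string. It is an approximation for illustration only (`edge_ngrams` is a made-up helper, not an Elasticsearch API), using the min_gram and max_gram values from the settings in this answer.

```python
def edge_ngrams(text, min_gram=2, max_gram=25):
    """Return the prefixes of text with lengths from min_gram to max_gram."""
    text = text.lower()  # mirrors the "lowercase" filter
    longest = min(len(text), max_gram)
    return [text[:length] for length in range(min_gram, longest + 1)]


print(edge_ngrams("123456789"))
```

Note that a value shorter than min_gram yields no tokens at all under this scheme.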
So let's create a tests index, a custom analyzer called edge_ngram_analyzer and a test mapping containing a single string field called my_string. You'll note that the my_string field is a multi-field declaring a prefixes sub-field which will contain all the tokenized prefixes.
curl -XPUT localhost:9200/tests -d '{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "edge_ngram_tokenizer",
"filter": [ "lowercase" ]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
}
}
},
"mappings": {
"test": {
"properties": {
"my_string": {
"type": "string",
"fields": {
"prefixes": {
"type": "string",
"index_analyzer": "edge_ngram_analyzer"
}
}
}
}
}
}
}'
Then let's index a few test documents using the _bulk API:
curl -XPOST localhost:9200/tests/test/_bulk -d '
{"index":{}}
{"my_string":"12"}
{"index":{}}
{"my_string":"1234"}
{"index":{}}
{"my_string":"1234567890"}
{"index":{}}
{"my_string":"abcd"}
{"index":{}}
{"my_string":"abcdefgh"}
{"index":{}}
{"my_string":"123456789abcd"}
{"index":{}}
{"my_string":"abcd123456789"}
'
The thing that I found particularly tricky was that the matching result could be either longer or shorter than the input string. To achieve that we have to combine two queries, one looking for shorter matches and another for longer matches. So the match query will find documents with shorter "prefixes" matching the input and the query_string query (with the edge_ngram_analyzer applied on the input string!) will search for "prefixes" longer than the input string. Both enclosed in a bool/should and sorted by a decreasing string length (i.e. longest first) will do the trick.
Let's do some queries and see what unfolds:
This query will return the one document with the longest match for "123456789", i.e. "123456789abcd". In this case, the result is longer than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
}
},
"query": {
"bool": {
"should": [
{
"match": {
"my_string.prefixes": "123456789"
}
},
{
"query_string": {
"query": "123456789",
"default_field": "my_string.prefixes",
"analyzer": "edge_ngram_analyzer"
}
}
]
}
}
}'
The second query will return the one document with the longest match for "123456789abcdef", i.e. "123456789abcd". In this case, the result is shorter than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
}
},
"query": {
"bool": {
"should": [
{
"match": {
"my_string.prefixes": "123456789abcdef"
}
},
{
"query_string": {
"query": "123456789abcdef",
"default_field": "my_string.prefixes",
"analyzer": "edge_ngram_analyzer"
}
}
]
}
}
}'
I hope that covers it. Let me know if not.
As for your bonus question, I'd simply suggest using the _msearch API and sending all queries at once.
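The _msearch endpoint expects newline-delimited JSON: one header line followed by one body line per search, with a trailing newline at the end. A minimal Python sketch for building such a payload from a list of input strings could look like the following. `build_msearch_payload` is a hypothetical helper, and the body is deliberately reduced to a bare match query on my_string.prefixes, so substitute whichever full query from this answer you actually use.

```python
import json


def build_msearch_payload(inputs, field="my_string.prefixes"):
    """Build a newline-delimited _msearch payload, one search per input."""
    lines = []
    for value in inputs:
        lines.append(json.dumps({}))  # empty header: index and type come from the URL
        lines.append(json.dumps({"size": 1, "query": {"match": {field: value}}}))
    return "\n".join(lines) + "\n"  # the payload must end with a newline


payload = build_msearch_payload(["123456789", "8092232423"])
print(payload, end="")
```

The resulting text can then be posted to the index's _msearch endpoint, for example with curl's --data-binary option so that the newlines are preserved.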
UPDATE: Finally, make sure that scripting is enabled in your elasticsearch.yml file using the following:
# if you have ES <1.6
script.disable_dynamic: false
# if you have ES >=1.6
script.inline: on
UPDATE 2 I'm leaving the above as the use case might fit someone else's needs. Now, since you only need "shorter" prefixes (makes sense !!), we need to change the mapping a little bit and the query.
The mapping would be like this:
{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "edge_ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
}
}
},
"mappings": {
"test": {
"properties": {
"my_string": {
"type": "string",
"fields": {
"prefixes": {
"type": "string",
"analyzer": "edge_ngram_analyzer" <--- only change
}
}
}
}
}
}
}
The query is also a bit different: it returns only the longest stored prefix that is shorter than or equal in length to the input string. Please try it out. I advise re-indexing your data to make sure everything is set up properly.
{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
},
"_score": "desc" <----- also add this line
},
"query": {
"filtered": {
"query": {
"match": {
"my_string.prefixes": "123" <--- input string
}
},
"filter": {
"script": {
"script": "doc.my_string.value.length() <= maxlength",
"params": {
"maxlength": 3 <---- this needs to be set to the length of the input string
}
}
}
}
}
}
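When checking what this query returns, it can help to have a plain reference for the intended result. The Python sketch below computes the longest stored value that is a prefix of the input, independent of Elasticsearch, using the sample values from the question. `longest_prefix` is a hypothetical helper; unlike the mapping above, which uses a min_gram of 2, it also considers one-character entries.

```python
def longest_prefix(candidates, value):
    """Return the longest candidate that is a prefix of value, or None."""
    matches = [candidate for candidate in candidates if value.startswith(candidate)]
    return max(matches, key=len, default=None)


stored = ["12", "123", "14", "156", "16", "7", "1234", "1", "67", "8", "9", "123456", "0"]
print(longest_prefix(stored, "123456789"))  # prints 123456
```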

Resources