Consider this JSON document:
$ DATA='{ "url": "https::/abc/", "issues": { "1": { "number": 1}, "2": {"number": 2 } } }'
$ echo $DATA | jq .
{
"url": "https::/abc/",
"issues": {
"1": {
"number": 1
},
"2": {
"number": 2
}
}
}
I'm trying to add a new field called extra to each object under issues, composed of the .url field and that object's .number field.
So the result should look like this:
{
"url": "https::/abc/",
"issues": {
"1": {
"number": 1,
"extra": "https::/abc/1"
},
"2": {
"number": 2,
"extra": "https::/abc/2"
}
}
}
I can get part of the way there with the .url field by storing it in a variable ($URL):
$ echo $DATA | jq '.url as $URL | .issues[] += { "extra" : "\( $URL )" } '
{
"url": "https::/abc/",
"issues": {
"1": {
"number": 1,
"extra": "https::/abc/"
},
"2": {
"number": 2,
"extra": "https::/abc/"
}
}
}
The problem comes with getting access to .number.
Just referencing .number in the composite value returns null:
$ echo $DATA | jq '.url as $URL | .issues[] += { "extra" : "\( $URL )\( .number )" } '
{
"url": "https::/abc/",
"issues": {
"1": {
"number": 1,
"extra": "https::/abc/null"
},
"2": {
"number": 2,
"extra": "https::/abc/null"
}
}
}
My next attempt was to store .number in a variable, but that didn't work:
$ echo $DATA | jq '.url as $URL | .issues[].number as $NUM += { "extra" : "\( $URL )\( $NUM )" } '
jq: error: syntax error, unexpected +=, expecting '|' (Unix shell quoting issues?) at <top-level>, line 1:
.url as $URL | .issues[].number as $NUM += { "extra" : "\( $URL )\( $NUM )" }
jq: 1 compile error
Any ideas on what I'm missing?
By using += to update your objects, you're losing the context of what you're updating, so naturally you can't reference the current item. Use the update assignment |= instead. That way, . refers to the item being updated and you can reference its other properties directly:
.url as $URL | .issues[] |= . + { "extra" : "\( $URL )\( .number )" }
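Applied to the sample document (with -c for compact output), this produces exactly the result you asked for:
$ echo $DATA | jq -c '.url as $URL | .issues[] |= . + { "extra" : "\( $URL )\( .number )" }'
{"url":"https::/abc/","issues":{"1":{"number":1,"extra":"https::/abc/1"},"2":{"number":2,"extra":"https::/abc/2"}}}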
You can still use a variable for .number; just use |= instead of += to keep the context:
.url as $url | .issues[] |= .number as $num | .extra = $url + ($num | tostring)
Store the .url in a variable, traverse to the level of the .issues[] items, and update (|=) the .extra field. Use string interpolation "\(…)", tostring, or @text to convert the number to a string so it can be appended to the previously stored URL string:
. as {$url} | .issues[] |= (.extra = $url + "\(.number)")
{
"url": "https::/abc/",
"issues": {
"1": {
"number": 1,
"extra": "https::/abc/1"
},
"2": {
"number": 2,
"extra": "https::/abc/2"
}
}
}
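For completeness, the tostring and @text alternatives mentioned above would look like this (same output):
. as {$url} | .issues[] |= (.extra = $url + (.number | tostring))
. as {$url} | .issues[] |= (.extra = $url + (.number | @text))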
Given two files with JSON data from the same source type, where the JSON objects look something like this:
file1:
[
{
"data": {
"id": "2",
"nodes": [
{
"stuff": "foo"
}
]
}
},
{
"data": {
"id": "6",
"nodes": [
{
"stuff": "bar"
}
]
}
},
{
"data": {
"id": "61",
"nodes": [
{
"stuff": "baz"
}
]
}
},
{
"data": {
"id": "63",
"nodes": [
{
"stuff": "qux"
}
]
}
}
]
file2:
[
{
"data": {
"id": "61",
"nodes": [
{
"stuff": "baz"
}
]
}
},
{
"data": {
"id": "63",
"nodes": [
{
"stuff": "qux"
}
]
}
}
]
I'm trying to remove objects in the array in the first file with the matching IDs in the second file so that the resultant output would be:
[
{
"data": {
"id": "2",
"nodes": [
{
"stuff": "foo"
}
]
}
},
{
"data": {
"id": "6",
"nodes": [
{
"stuff": "bar"
}
]
}
}
]
I've tried a bunch of ways to accomplish this, but I haven't found a proper solution yet.
A couple of attempts have been various permutations of the following with accompanying errors:
jq -n --argfile src /var/tmp/w-src.json --argfile dst /var/tmp/w-dst.json '
$dst
| [.data[].id] as $ids
| $src
| .data | map(select(.id | in($ids[])))
jq: error: select/0 is not defined at <top-level>, line 5:
| .data | map($ids | map(select .id == .))
jq: 1 compile error
jq -n --argfile src /var/tmp/w-src.json --argfile dst /var/tmp/w-dst.json '
$dst
| [.data[].id] as $ids
| $src
| .data[] | select(.id | in($ids[]))
'
jq: error (at <unknown>): Cannot check whether string has a string key
Ideally it would be super cool to do some kind of operation like:
$src.data[] - $dst.data[]
(kinda Ruby-ish) and I admit I haven't tried this, but I will for kicks and giggles.
I'm trying not to have to use a function and I want to accomplish this using jq. I'm probably not too far off, but I'm at a loss. Any thoughts?
You could compile a list of IDs from the second file using input, check against it using IN, and either use del to delete the matching items, or map to keep those that don't match:
jq '
(input | map(.data.id)) as $del | del(.[] | select(IN(.data.id; $del[])))
' file1.json file2.json
or
jq '
(input | map(.data.id)) as $del | map(select(IN(.data.id; $del[]) | not))
' file1.json file2.json
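Either way, the output is the expected array; with -c added it prints as:
[{"data":{"id":"2","nodes":[{"stuff":"foo"}]}},{"data":{"id":"6","nodes":[{"stuff":"bar"}]}}]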
If you can assert that objects with identical IDs are also identical in their other parts, and you don't have many items (element-wise subtraction is costly), you can even just subtract the second file's array from the first:
jq '. - input' file1.json file2.json
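Note that array subtraction compares whole elements for equality, which is why the objects must match in all of their parts, not just the IDs. A quick illustration:
jq -n '[{"id":1},{"id":2}] - [{"id":2}]'
[
  {
    "id": 1
  }
]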
I am new to jq and have been stuck on this problem for a while. Any help is appreciated.
I have two json files,
In file1.json:
{
"version": 4,
"group1": [
{
"name":"olditem1",
"content": "old content"
}
],
"group2": [
{
"name":"olditem2"
}
]
}
And in file2.json:
{
"group1": [
{
"name" : "newitem1"
},
{
"name":"olditem1",
"content": "new content"
}
],
"group2": [
{
"name" : "newitem2"
}
]
}
Expected result is:
{
"version": 4,
"group1": [
{
"name":"olditem1",
"content": "old content"
},
{
"name" : "newitem1"
}
],
"group2": [
{
"name":"olditem2"
},
{
"name" : "newitem2"
}
]
}
Criteria for the merge:
Has to merge only group1 and group2
Match only by name
I have tried
jq -S '.group1+=.group1|.group1|unique_by(.name)' file1.json file2.json
but this keeps only group1; all the other info is lost.
This approach uses INDEX to build a dictionary of unique elements keyed by their .name field, reduce to iterate over the group fields to be merged, and an initial state created by combining the slurped (-s) input files using add, after removing the group fields (which are processed separately) using del.
jq -s '
[ "group1", "group2" ] as $gs | . as $in | reduce $gs[] as $g (
map(del(.[$gs[]])) | add; .[$g] = [INDEX($in[][$g][]; .name)[]]
)
' file1.json file2.json
{
"version": 4,
"group1": [
{
"name": "olditem1",
"content": "new content"
},
{
"name": "newitem1"
}
],
"group2": [
{
"name": "olditem2"
},
{
"name": "newitem2"
}
]
}
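The key piece here is INDEX, which reduces a stream of rows into an object keyed by the given expression; later rows overwrite earlier ones with the same key, which is why file2's "new content" wins for olditem1 above. A minimal illustration:
jq -n '[{"name":"a","v":1},{"name":"a","v":2}] | INDEX(.[]; .name)'
{
  "a": {
    "name": "a",
    "v": 2
  }
}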
Given the following array:
[
[
{ "field" : { "name": "appname" }, "value": { "value" : "app1" } },
{ "field" : { "name": "appstat" }, "value": { "value" : "UP" } }
],
[
{ "field" : { "name": "appname" }, "value": { "value" : "app2" } },
{ "field" : { "name": "appstat" }, "value": { "value" : "DOWN" } }
],
[
{ "field" : { "name": "appname" }, "value": { "value" : "app3" } },
{ "field" : { "name": "appstat" }, "value": { "value" : "READY"} }
]
]
I want to be able to select specific items based on the appname, so that I can do something like
jq .[] app3
and the response should be READY.
This should get you there:
jq -r --arg q "app3" '
.[]
| select(.[] | .field.name == "appname" and .value.value == $q)
| .[]
| select(.field.name == "appstat").value.value
'
READY
However, your data structure seems rather complicated. You'd be better off (at least for this use case) with a simpler array of objects for looking up key-value pairs. For example, transform your input like so:
jq 'map(map({(first(.field.name)): first(.value.value)}) | add)'
[
{
"appname": "app1",
"appstat": "UP"
},
{
"appname": "app2",
"appstat": "DOWN"
},
{
"appname": "app3",
"appstat": "READY"
}
]
That way, your lookup would be as simple as
jq -r --arg q "app3" '.[] | select(.appname == $q).appstat'
READY
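If you don't need the intermediate document, the two steps also compose into a single call:
jq -r --arg q "app3" 'map(map({(first(.field.name)): first(.value.value)}) | add) | .[] | select(.appname == $q).appstat'
READY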
Here is my jq script:
def pick_nationality:
select(.NACIONALITAT) |
{nation: {country: .NACIONALITAT, code: "some code"} };
def pick_surname:
select(.SURNAME) |
{name: {surname: .SURNAME, code: "some code"} };
def pick_extension:
{ use: "official", extension: [pick_nationality, pick_surname] };
map(pick_extension)
The input JSON is like:
{
"SURNAME": "surname1"
}
{
"NACIONALITAT": "nacionalitat1"
}
However, some input objects don't contain any lookup field:
{
"field1": "value1"
}
{
"field2": "value2"
}
The above script returns:
[
{
"use": "official",
"extension": []
},
{
"use": "official",
"extension": []
}
]
I'd like the extension field not to appear:
[
{
"use": "official"
},
{
"use": "official"
}
]
Any ideas?
You can simply append
| del(..|select(. == []))
to the end of your script in order to remove all such empty arrays.
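Be aware that this deletes every empty array anywhere in the document, so only use it if no other field can legitimately hold []. A minimal illustration:
jq -n '{"use": "official", "extension": []} | del(..|select(. == []))'
{
  "use": "official"
}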
Extend your pick_extension function for the desired output:
def pick_extension:
[pick_nationality, pick_surname] as $extension
| { use: "official" }
| if $extension | length > 0 then . + {extension: $extension} else . end;
This way, if no extension could be picked, the empty array is no longer added to the JSON object.
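For an end-to-end check, here is a sketch that reuses your definitions, assuming the input objects are slurped from a hypothetical input.json:
jq -s '
def pick_nationality:
  select(.NACIONALITAT) |
  {nation: {country: .NACIONALITAT, code: "some code"} };
def pick_surname:
  select(.SURNAME) |
  {name: {surname: .SURNAME, code: "some code"} };
def pick_extension:
  [pick_nationality, pick_surname] as $extension
  | { use: "official" }
  | if $extension | length > 0 then . + {extension: $extension} else . end;
map(pick_extension)
' input.json
Fed the two objects without lookup fields, this prints the desired result with no extension fields.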
I am currently using the FOSElasticaBundle in Symfony2 and I am having a hard time trying to build a search to match the longest prefix.
I am aware of the 100 examples that are on the Internet to perform autocomplete-like searches using this. However, my problem is a little different.
In an autocomplete type of search the database holds the longest alphanumeric string (in length of characters) and the user just provides the shortest portion, let's say the user types "jho" and Elasticsearch can easily provide "Jhon, Jhonny, Jhonas".
My problem is the reverse: I provide the long alphanumeric string, and I want Elasticsearch to return the longest match present in the database.
For example: I could provide "123456789" and my database can have [12,123,14,156,16,7,1234,1,67,8,9,123456,0], in this case the longest prefix match in the database for the number that the user provided is "123456".
I am just starting with Elasticsearch so I don't really have a close to working settings or anything.
If there is any information not clear or missing let me know and I will provide more details.
Update 1 (Using Val's 2nd Update)
Index: Download 1800+ indexes
Settings:
curl -XPUT localhost:9200/tests -d '{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "edge_ngram_tokenizer",
"filter": [ "lowercase" ]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
}
}
},
"mappings": {
"test": {
"properties": {
"my_string": {
"type": "string",
"fields": {
"prefix": {
"type": "string",
"analyzer": "edge_ngram_analyzer"
}
}
}
}
}
}
}'
Query:
curl -XPOST localhost:9200/tests/test/_search?pretty=true -d '{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
},
"_score": "desc"
},
"query": {
"filtered": {
"query": {
"match": {
"my_string.prefix": "8092232423"
}
},
"filter": {
"script": {
"script": "doc.my_string.value.length() <= maxlength",
"params": {
"maxlength": 10
}
}
}
}
}
}'
With this configuration the query returns the following results:
{
"took" : 61,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1754,
"max_score" : null,
"hits" : [ {
"_index" : "tests",
"_type" : "test",
"_id" : "AU8LqQo4FbTZPxBtq3-Q",
"_score" : 0.13441172,
"_source":{"my_string":"80928870"},
"sort" : [ 8.0, 0.13441172 ]
} ]
}
}
Bonus question
I would like to provide an array of numbers to that search and get the matching prefix for each one efficiently, without having to perform a separate query each time.
Here is my take at it.
Basically, what we need to do is to slice and dice the field (called my_string below) at indexing time with an edgeNGram tokenizer (called edge_ngram_tokenizer below). That way a string like 123456789 will be tokenized to 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789 and all tokens will be indexed and searchable.
So let's create a tests index, a custom analyzer called edge_ngram_analyzer, and a test mapping containing a single string field called my_string. You'll note that the my_string field is a multi-field declaring a prefixes sub-field which will contain all the tokenized prefixes.
curl -XPUT localhost:9200/tests -d '{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "edge_ngram_tokenizer",
"filter": [ "lowercase" ]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
}
}
},
"mappings": {
"test": {
"properties": {
"my_string": {
"type": "string",
"fields": {
"prefixes": {
"type": "string",
"index_analyzer": "edge_ngram_analyzer"
}
}
}
}
}
}
}'
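Before indexing anything, you can sanity-check the analyzer with the _analyze API (using the ES 1.x query-string syntax):
curl -XGET 'localhost:9200/tests/_analyze?analyzer=edge_ngram_analyzer&text=123456789'
It should return the tokens 12, 123, 1234, … up to 123456789, as described above.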
Then let's index a few test documents using the _bulk API:
curl -XPOST localhost:9200/tests/test/_bulk -d '
{"index":{}}
{"my_string":"12"}
{"index":{}}
{"my_string":"1234"}
{"index":{}}
{"my_string":"1234567890"}
{"index":{}}
{"my_string":"abcd"}
{"index":{}}
{"my_string":"abcdefgh"}
{"index":{}}
{"my_string":"123456789abcd"}
{"index":{}}
{"my_string":"abcd123456789"}
'
The thing I found particularly tricky is that the matching result could be either longer or shorter than the input string. To handle that, we have to combine two queries: one looking for shorter matches and another for longer ones. The match query will find documents whose "prefixes" are shorter than the input, and the query_string query (with the edge_ngram_analyzer applied to the input string!) will search for "prefixes" longer than the input string. Both enclosed in a bool/should, with results sorted by decreasing string length (i.e. longest first), will do the trick.
Let's do some queries and see what unfolds:
This query will return the one document with the longest match for "123456789", i.e. "123456789abcd". In this case, the result is longer than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
}
},
"query": {
"bool": {
"should": [
{
"match": {
"my_string.prefixes": "123456789"
}
},
{
"query_string": {
"query": "123456789",
"default_field": "my_string.prefixes",
"analyzer": "edge_ngram_analyzer"
}
}
]
}
}
}'
The second query will return the one document with the longest match for "123456789abcdef", i.e. "123456789abcd". In this case, the result is shorter than the input.
curl -XPOST localhost:9200/tests/test/_search -d '{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
}
},
"query": {
"bool": {
"should": [
{
"match": {
"my_string.prefixes": "123456789abcdef"
}
},
{
"query_string": {
"query": "123456789abcdef",
"default_field": "my_string.prefixes",
"analyzer": "edge_ngram_analyzer"
}
}
]
}
}
}'
I hope that covers it. Let me know if not.
As for your bonus question, I'd simply suggest using the _msearch API and sending all queries at once.
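A minimal sketch of such a call (the _msearch body is newline-delimited JSON: an empty header line per query, since index and type are already in the URL, each followed by one of the search bodies from above; the trailing newline is required):
curl -XGET localhost:9200/tests/test/_msearch -d '
{}
{"size": 1, "sort": {"_script": {"script": "doc.my_string.value.length()", "type": "number", "order": "desc"}}, "query": {"match": {"my_string.prefixes": "123456789"}}}
{}
{"size": 1, "sort": {"_script": {"script": "doc.my_string.value.length()", "type": "number", "order": "desc"}}, "query": {"match": {"my_string.prefixes": "abcdefgh"}}}
'
The response contains one entry in its responses array per query, in the same order.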
UPDATE: Finally, make sure that scripting is enabled in your elasticsearch.yml file using the following:
# if you have ES <1.6
script.disable_dynamic: false
# if you have ES >=1.6
script.inline: on
UPDATE 2: I'm leaving the above in place, as the use case might fit someone else's needs. Now, since you only need "shorter" prefixes (which makes sense!), we need to change the mapping and the query a little bit.
The mapping would be like this:
{
"settings": {
"analysis": {
"analyzer": {
"edge_ngram_analyzer": {
"tokenizer": "edge_ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
}
}
},
"mappings": {
"test": {
"properties": {
"my_string": {
"type": "string",
"fields": {
"prefixes": {
"type": "string",
"analyzer": "edge_ngram_analyzer" <--- only change
}
}
}
}
}
}
}
And the query is now a bit different, but it will always return only the longest prefix that is shorter than or equal in length to the input string. Please try it out. I advise you to re-index your data to make sure everything is set up properly.
{
"size": 1,
"sort": {
"_script": {
"script": "doc.my_string.value.length()",
"type": "number",
"order": "desc"
},
"_score": "desc" <----- also add this line
},
"query": {
"filtered": {
"query": {
"match": {
"my_string.prefixes": "123" <--- input string
}
},
"filter": {
"script": {
"script": "doc.my_string.value.length() <= maxlength",
"params": {
"maxlength": 3 <---- this needs to be set to the length of the input string
}
}
}
}
}
}