jq: update objects based on an array of object names

I'm trying to update some objects based on a list of objects. For example I want to turn this:
{
"names": ["a","c"],
"del": {
"a": true,
"b": true,
"c": true
}
}
into this:
{
"names": ["a","c"],
"del": {
"a": false,
"b": true,
"c": false
}
}
So for each object name in .names, update its corresponding object in .del.
The solution I came up with seems inefficient, and I was wondering if there is a better way.
[foreach .names[] as $name (.;.del[$name] = false ; .) ] | last

I think using last is a good indication that you don't care about intermediate values, and foreach is described as:
The foreach syntax is similar to reduce, but intended to allow the construction of limit and reducers that produce intermediate results
There is an equivalent reduce:
reduce .names[] as $name (.; .del[$name]=false)
When both are possible, reduce is the better choice, both for communicating intent to other programmers and for potential performance. (If the reduce implementation were found to be slower than a pattern with foreach, then jq could reimplement reduce with that pattern.)
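To make the difference between the two forms concrete, here is a rough Python analogy (not jq itself). It assumes functools.reduce as a stand-in for jq's reduce and itertools.accumulate as a stand-in for foreach; the helper clear_flag is a name made up for this sketch.

```python
import copy
from functools import reduce
from itertools import accumulate

doc = {"names": ["a", "c"], "del": {"a": True, "b": True, "c": True}}

def clear_flag(state, name):
    """Return a copy of state with state["del"][name] set to False."""
    new_state = copy.deepcopy(state)
    new_state["del"][name] = False
    return new_state

# Analogous to jq's reduce: only the final state is produced.
final = reduce(clear_flag, doc["names"], doc)

# Analogous to [foreach ...] | last: every intermediate state is built
# and kept, and then all but the last one are thrown away.
# (Unlike foreach, accumulate also yields the initial state first.)
steps = list(accumulate(doc["names"], clear_flag, initial=doc))
last_step = steps[-1]
```

Both routes end in the same document, but the second one materializes every intermediate copy only to discard it, which is the overhead the reduce form avoids.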

Related

Multiple add if doesn't exist steps Gremlin

I have an injected array of values, and I want to add vertices if they don't exist. I use the fold and coalesce steps, but that doesn't work in this instance because I'm trying to do it for multiple vertices. Since one vertex exists, I can no longer get a null value, and the unfold inside the coalesce step returns a value from there on. As a result, vertices that don't exist yet are not added.
This is my current traversal:
const traversal = await g
?.inject([
{ twitterPostId: 'kay', like: true, retweet: false },
{ twitterPostId: 'fay', like: true, retweet: false },
{ twitterPostId: 'nay', like: true, retweet: false },
])
.unfold()
.as('a')
.aggregate('ta')
.V()
.as('b')
.where('b', p.eq('a'))
.by(__.id())
.by('twitterPostId')
.fold()
.coalesce(__.unfold(), __.addV().property(t.id, __.select('ta').unfold().select('twitterPostId')))
.toList();
Returns:
[Bn { id: 'kay', label: 'vertex', properties: undefined }]
Without using coalesce, you can do conditional upserts using what we often refer to as "map injection". The Gremlin does get a little advanced, but here is an example:
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V('3','4','xyz','abc').
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
That query will look for a set of vertices, figure out which ones exist, and for the rest use the ID values and properties from the map called 'p' to create the new vertices. You can build on this pattern in a great many ways, and I find it very useful until mergeV and mergeE are more broadly available.
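The logic of that traversal can be sketched outside Gremlin. The following Python is only an illustration: it assumes a plain dict keyed by vertex id as a stand-in for the graph, and add_missing is a hypothetical helper, not part of any Gremlin driver.

```python
def add_missing(graph, ids, props):
    """Create a vertex for every requested id that is not yet in graph.

    graph: dict mapping vertex id -> property dict (stand-in for the graph)
    ids:   requested ids (plays the role of the 'ids' side effect)
    props: properties for new vertices, keyed by id (the 'p' side effect)
    """
    # Corresponds to V(...).id().fold().as('found')
    found = [i for i in ids if i in graph]
    # Corresponds to select('ids').unfold().where(without('found'))
    missing = [i for i in ids if i not in found]
    # Corresponds to addV('new-vertex').property(...)
    for vertex_id in missing:
        graph[vertex_id] = {"label": "new-vertex", **props.get(vertex_id, {})}
    return missing

graph = {"3": {"label": "existing"}, "4": {"label": "existing"}}
created = add_missing(
    graph,
    ["3", "4", "xyz", "abc"],
    {"xyz": {"type": "dog"}, "abc": {"type": "cat"}},
)
```

The key point is that existence is checked once for the whole list, and the set difference drives the inserts, rather than a per-element coalesce.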
You can also use the list of IDs in the query to check which ones exist. However, this may lead to inefficient query plans depending on the given implementation:
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V().
where(within('ids')).
by(id).
by().
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
This is trickier than the first query, as the V step cannot take a traversal. So you cannot do V(select('ids')) in Gremlin today.

How can I delete all keys that don't match certain names with JQ?

I have a huge JSON file with lots of stuff I don't care about, and I want to filter it down to only the few keys I care about, preserving the structure. I don't mind if the same key name occurs in different paths and I get both of them. I gleaned something very close from the answers to this question, which taught me how to delete all properties with certain values, like all null values:
del(..|nulls)
or, more powerfully
del(..|select(. == null))
I searched high and low if I could write a predicate over the name of a property when I am looking at a property. I come from XSLT where I could write something like this:
del(..|select(name|test("^(foo|bar)$")))
where name/1 would be the function that returns the property name or array index number where the current value comes from. But it seems that jq lacks the metadata on its values, so you can only write predicates about their value, and perhaps the type of their value (that's still just a feature of the value), but you cannot inspect the name, or path leading up to it?
I tried to use paths and leaf_paths and stuff like that, but I have no clue what that would do and tested it out to see how this path stuff works, but it seems to find child paths inside an object, not the path leading up to the present value.
So how could this be done, delete everything but a set of key values? I might have found a way here:
walk(
if type == "object" then
with_entries(
select( ( .key |test("^(foo|bar|...)$") )
and ( .value != "" )
and ( .value != null ) )
)
else
.
end
)
OK, this seems to work. But I still think it would be much easier if we had a way of querying the current property name, array index, or path leading up to the present item being inspected with the simple recursion ..| form.
In analogy to your approach using .. and del, you could use paths and delpaths to operate on a stream of path arrays, and delete a given path if not all of its elements meet your conditions.
delpaths([paths | select(all(IN("foo", "bar") or type == "number") | not)])
For the condition I used IN("foo", "bar") but (type == "string" and test("^(foo|bar)$")) would work as well. To also retain array elements (which have numeric indices), I added or type == "number".
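For readers who want to see what operating on a stream of path arrays amounts to, here is a small Python sketch of the same idea. It is an analogy, not jq: paths and keep_only are names chosen for this example, and the input is assumed to be already-parsed JSON (dicts, lists, scalars).

```python
def paths(value, prefix=()):
    """Yield the path (a tuple of keys and indices) of every nested value."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return
    for key, child in children:
        yield prefix + (key,)
        yield from paths(child, prefix + (key,))

def keep_only(doc, keep):
    """Delete, in place, every path that contains a string key outside keep."""
    doomed = [
        p for p in paths(doc)
        if not all(isinstance(k, int) or k in keep for k in p)
    ]
    # Reverse order removes children before their parents and higher
    # array indices before lower ones, so earlier deletions never
    # invalidate the paths that are still pending.
    for path in sorted(doomed, reverse=True):
        parent = doc
        for key in path[:-1]:
            parent = parent[key]
        del parent[path[-1]]
    return doc

example = {"foo": [{"bar": 1, "x": 2}, {"y": {"foo": 3}}], "z": {"foo": 1}}
filtered = keep_only(example, {"foo", "bar"})
```

As in the jq version, integer path components (array indices) are always accepted, so array elements survive unless one of their ancestor keys is rejected.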
Unlike in XML, there's no concept of attributes in jq. You'll need to delete from objects.
To delete an element of an object, you need to use del( obj[ key ] ) (or use with_entries). You can get a stream of the keys of an object using keys[]/keys_unsorted[] and filter out the ones you don't want to delete.
Finally, you need to invert the result of test because you want to delete those that don't match.
After fixing these problems, we get the following:
INDEX( "foo", "bar" ) as $keep |
del(
.. | objects |
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
Demo on jqplay
Note that I substituted the regex match with a dictionary lookup. You could use test( "^(?:foo|bar)\\z" ) in lieu of $keep[ . ], but a dictionary lookup should be faster than a regex match. And it should be less error-prone too, considering you misused $ and (...) in lieu of \z and (?:...).
The above visits deleted branches for nothing. We can avoid that by using walk instead of the .. operator.
INDEX( "foo", "bar" ) as $keep |
walk(
if type == "object" then
del(
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
else
.
end
)
Demo on jqplay
Since I mentioned one could use with_entries instead of del, I'll demonstrate.
INDEX( "foo", "bar" ) as $keep |
walk(
if type == "object" then
with_entries( select( $keep[ .key ] ) )
else
.
end
)
Demo on jqplay
Here's a solution that uses a specialized variant of walk for efficiency (*). It retains objects even when all of their keys are removed; only trivial changes are needed if a blacklist or some other criterion (e.g., regexp-based) is given instead. WHITELIST should be a JSON array of the key names to be retained.
jq --argjson whitelist WHITELIST '
def retainKeys($array):
INDEX($array[]; .) as $keys
| def r:
if type == "object"
then with_entries( select($keys[.key]) )
| map_values( r )
elif type == "array" then map( r )
else .
end;
r;
retainKeys($whitelist)
' input.json
(*) Note for example:
the use of INDEX
the recursive function, r, has arity 0
for objects, the top-level deletion occurs first.
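The same top-down recursion can be sketched in Python for comparison. This is an analogy to the retainKeys definition above, assuming already-parsed JSON input; retain_keys is a name chosen for this example.

```python
def retain_keys(value, keep):
    """Return a copy of value in which objects keep only the keys in keep.

    Keys are filtered first and only the surviving values are visited,
    so deleted branches are never walked.
    """
    if isinstance(value, dict):
        return {k: retain_keys(v, keep) for k, v in value.items() if k in keep}
    if isinstance(value, list):
        return [retain_keys(v, keep) for v in value]
    return value

whitelist = {"items", "config", "spec", "setting2", "name"}
data = {
    "items": [
        {
            "name": "issue1",
            "spec": {
                "config": {"setting1": "abc", "setting2": {"name": "xyz"}},
                "files": {"name": "cde", "path": "/home"},
                "program": {"name": "apache"},
            },
        },
        {"name": {"etc": 0}},
    ]
}
result = retain_keys(data, whitelist)
```

Using a set for the whitelist plays the role of INDEX: each key test is a constant-time lookup rather than a scan or a regex match.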
Here's a space-efficient, walk-free approach, tailored for the case of a WHITELIST. It uses the so-called "streaming" parser, so the invocation would look like this:
jq -n --stream --argjson whitelist WHITELIST -f program.jq input.json
where WHITELIST is a JSON array of the names of the keys to be retained, and
where program.jq is a file containing the program:
# Input: an array
# Output: the longest head of the array that includes only numbers or items in the dictionary
def acceptable($dict):
last(label $out
| foreach .[] as $x ([];
if ($x|type == "number") or $dict[$x] then . + [$x]
else ., break $out
end));
INDEX( $whitelist[]; .) as $dict
| fromstream(inputs
| if length==2
then (.[0] | acceptable($dict)) as $p
| if ($p|length) == (.[0]|length) - 1 then .[0] = $p | .[1] = {}
elif ($p|length) < (.[0]|length) then empty
else .
end
else .
end )
Note: The reason this is relatively complicated is that it assumes that you want to retain objects all of whose keys have been removed, as illustrated in the following example. If that is not the case, then the required jq program is much simpler.
Example:
WHITELIST: '["items", "config", "spec", "setting2", "name"]'
input.json:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting1": "abc",
"setting2": {
"name": "xyz"
}
},
"files": {
"name": "cde",
"path": "/home"
},
"program": {
"name": "apache"
}
}
},
{
"name": {
"etc": 0
}
}
]
}
Output:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting2": {
"name": "xyz"
}
}
}
},
{
"name": {}
}
]
}
I am going to put my own tentative answer here.
The thing is, the solution I already had in my question shows that I can select keys during forward navigation, but I cannot find out the path leading up to the present value.
I looked around in the source code of jq to see how come we cannot inquire the path leading up to the present value, so we could ask for the key string or array index of the present value. And indeed it looks like jq does not track the path while it walks through the input structure.
I think this is a huge missed opportunity, since the path could so easily be tracked during the tree walk.
This is why I continue thinking that XML with XSLT and XPath is a much more robust data representation and tool chain than JSON. In fact, I find JSON harder to read even than XML. The benefit of the JSON being so close to javascript is really only relevant if - as I do in some cases - I read the JSON as a javascript source code assigning it to a variable, and then instrument it by changing the prototype of the anonymous JSON object so that I have methods to go with them. But changing the prototype is said to cause slowness. Though I don't think it does when setting it for otherwise anonymous JSON objects.
There is JsonPath that tries (by way of the name) to be something like what XPath is for XML. But it is a poor substitute and also has no way to navigate up the parent (or then sibling) axes.
So, in summary, while selecting by key in white or black lists is possible in principle, it is quite hard, because a feature that would be easy for a JSON navigation language to provide is neither specified nor implemented. Another useful feature that could easily be added to jq is backward navigation to the parent or an ancestor of the present value. Currently, if you want to navigate back, you need to capture the ancestor you want to get back to as a variable. It is possible, but jq could be massively improved by keeping track of ancestors and paths.

In Gremlin, how can I group pairs of elements by a property from one of them?

After some traversal I select the elements I'm interested in through select(). How can I group by one of the properties from one specific element.
What I did:
g.V() // ... some traversal happens here where I obtain a and b
select('a','b').by(valueMap('Name', 'Description', 'Label'))
Right now this gets me all the data I'm interested in, something like:
[
{
"a": { "Name": "A name" ... },
"b": { "Name": "other name" ... },
}
...
]
But I know that b.Name repeats among different pairs of a,b, so I would like to group all the a elements under their common b element. I think this should be easy to do, but so far I've been unable to do it.
It's probably better to rewrite the whole traversal, but since you kept it as a secret, here's how you would do the post-grouping:
g.V()...
select('a','b').
by(valueMap('Name', 'Description', 'Label')).
group().
by(select('b')).
by(select('a').fold())
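To show the shape of the result this post-grouping produces, here is a plain Python sketch over already-fetched pairs. It is not Gremlin, and it makes one simplifying assumption: it keys each group on b's Name (as the question describes) rather than on the whole b map, because Python dicts cannot be used as dictionary keys. The function name and sample data are made up for the illustration.

```python
from collections import defaultdict

def group_a_by_b_name(pairs):
    """Collect every 'a' element under the Name of its paired 'b' element."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair["b"]["Name"]].append(pair["a"])
    return dict(groups)

pairs = [
    {"a": {"Name": "first a"}, "b": {"Name": "shared b"}},
    {"a": {"Name": "second a"}, "b": {"Name": "shared b"}},
    {"a": {"Name": "third a"}, "b": {"Name": "other b"}},
]
grouped = group_a_by_b_name(pairs)
```

The first by() modulator in the traversal corresponds to the dictionary key here, and the second by() with fold() corresponds to the list each key accumulates.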

What is the proper idiomatic way of checking if a map has no elements in coffeescript?

since a code example is worth a thousand words:
console.log(@searchEnginesMap, {}, @searchEnginesMap == {}, @searchEnginesMap is {}, @searchEnginesMap.empty?, @searchEnginesMap.length)
returns:
{} {} false false false undefined
what's the correct syntax to get a true value for this? (or how should I correctly check if I have a map with zero elements?)
EDIT: extra credit:
how do you compare these two dictionaries to have them be the same (by value, not by reference):
a = {"foo":"bar?q=%s","baz":"qux?q=%s"}
b = {"foo":"bar?q=%s","baz":"qux?q=%s"}
so I need to know what I can use to get true while comparing these?
Thanks in advance.
There is no CoffeeScript magic solution here. If you want to know if an Object is empty then you have to count the keys. You could use Object.keys:
if Object.keys(obj).length == 0
  # obj is empty
Or you could use a loop:
if (true for v of obj).length == 0
  # obj is empty
The for ... of loop version could be wrapped in a short-circuiting function without much effort.
I would probably wimp out and grab Underscore or Lodash so that I could use _.isEmpty:
if _(obj).isEmpty()
  # obj is empty
That would also solve your second problem because you'd get _.isEqual too:
_(foo: "bar?q=%s", baz: "qux?q=%s").isEqual(baz: "qux?q=%s", foo: "bar?q=%s")
# true
Underscore demo: http://jsfiddle.net/ambiguous/Jad6e/

MapReduce for counting parameter values

I have document like this:
{
"_id": ObjectId("4d17c7963ffcf60c1100002f"),
"title": "Text",
"params": {
"brand": "BMW",
"model": "i3"
}
}
{
"_id": ObjectId("4d17c7963ffcf60c1100002f"),
"title": "Text",
"params": {
"brand": "BMW",
"model": "i5"
}
}
What I need is the count of every params value, like:
brand
---------
BMW (2)
model
---------
i3 (1)
i5 (1)
I think I have to write map/reduce functions. How can I do this? Thanks.
I think I have to write map/reduce functions.
Yes you need a map-reduce for this. For some simple map-reduce examples, please look here.
For your particular case, you first need to change your expectation of the output. The output of the map / reduce is a collection. The collection will look (in your case) something like this:
{ key : { 'brand' : 'BMW' }, value : 2 }
{ key : { 'model' : 'i5' }, value : 1 }
To generate this set you will need a "map" function and a "reduce" function. The "map" function will emit a key and a value. The key is each element of params, the value is the count of 1. The "reduce" function accepts a key and an array of values and returns just a single value. Your question is basically the same as this example on the MongoDB site:
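map = function() {
    if (!this.params) {
        return;
    }
    // Emit one { name: value } key per parameter, each with a count of 1.
    for (var name in this.params) {
        var key = {};
        key[name] = this.params[name];
        emit(key, 1);
    }
}
reduce = function(key, values) {
    // Sum the counts; reduce may be called again on its own partial results.
    var count = 0;
    for (var i = 0; i < values.length; i++) {
        count += values[i];
    }
    return count;
}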
In your map function, enumerate the properties of the params property of the this object. For each property you find, call emit with a key that contains both the name of the property and the value of the property. Pass 1 as the value, e.g. emit({'brand': 'BMW'}, 1), but obviously using variables, not constants!
In your reduce function you are passed a key and an array of values. Sum these values and return the sum. Even though the initial array will be all 1's don't be tempted to use the length of the array because the reduce function can be called iteratively.
You can group the results afterwards from the result collection, applying an index if necessary for performance.
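The warning about not using the array length is easier to see in a small simulation. The Python below is a sketch of the general map/reduce flow, not MongoDB's implementation: map_doc, reduce_counts, map_reduce, and the chunk_size parameter are all hypothetical names, and the chunking merely imitates the engine calling reduce on partial results.

```python
from collections import defaultdict

def map_doc(doc):
    """Emit ((name, value), 1) for every entry in the document's params."""
    for name, value in doc.get("params", {}).items():
        yield (name, value), 1

def reduce_counts(key, values):
    """Sum the counts; the values may already be partial sums."""
    return sum(values)

def map_reduce(docs, chunk_size=2):
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            emitted[key].append(value)
    results = {}
    for key, values in emitted.items():
        # Reduce each chunk, then reduce the partial results again,
        # imitating an engine that calls reduce iteratively.
        partials = [
            reduce_counts(key, values[i:i + chunk_size])
            for i in range(0, len(values), chunk_size)
        ]
        results[key] = reduce_counts(key, partials)
    return results

docs = [
    {"title": "Text", "params": {"brand": "BMW", "model": "i3"}},
    {"title": "Text", "params": {"brand": "BMW", "model": "i5"}},
    {"title": "Text", "params": {"brand": "BMW", "model": "i3"}},
]
counts = map_reduce(docs)
```

With three BMW documents and a chunk size of 2, the second reduce call for the brand key receives [2, 1]. Summing gives the correct 3, whereas taking the length of that array would give 2.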
