What does this do, and is there a simpler way to write it?
Collection>>into: a2block
| all pair |
all := ([:allIn :each| allIn key key: allIn value. each])
-> (pair := nil -> a2block).
pair key: all.
self inject: all
into: [:allIn :each| allIn value: (allIn key value: allIn value value: each). allIn].
^all value
If you want to find out, the best way is to try it, possibly stepping through it in the debugger.
For example, try:
(1 to: 4) into: #+.
(1 to: 4) into: #*.
You'll see a form of reduction.
So it's like inject:into: but without injecting anything.
The a2block consumes the first two elements, then the result of that and the third element, and so on.
So the selector name should be read as inject:into: without the inject: part; like the implementation, it's already a crooked way to name things.
This is implemented in Squeak as reduce:, except that reduce: would be far too obvious a selector, and except that reduce: raises an Error when sent to an empty collection, whereas this method answers the strange recursive artifact.
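For comparison only, here is a Python sketch of the same reduction (Python, not Smalltalk; the name `into` is hypothetical). `functools.reduce` without an initial value likewise seeds the fold with the first element and, like Squeak's reduce:, raises an error (a `TypeError`) on an empty input instead of answering an artifact.

```python
from functools import reduce
import operator

def into(items, step):
    # Hypothetical Python analogue of Collection>>into: / reduce:.
    # With no initial value, reduce() starts from the first element
    # and raises TypeError when `items` is empty.
    return reduce(step, items)

print(into(range(1, 5), operator.add))  # like (1 to: 4) into: #+  -> 10
print(into(range(1, 5), operator.mul))  # like (1 to: 4) into: #*  -> 24
```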
The all artifact holds the block to be executed for the reduction as its key, and the current value of the reduction as its value. At the first step, it arranges to replace the block to be executed with a2block, and the first value with the first element. But the recursive structure used to achieve the replacement is... unnecessarily complex.
A bit less obfuscated way to write the same thing would be:
into: a2block
| block |
self ifEmpty: [self error: 'receiver of into: shall not be empty'].
block := [:newBlock :each| block := newBlock. each].
^self inject: a2block into: [:accum :each| block value: accum value: each]
That's more or less the same principle: at the first iteration, the block is replaced by the reduction block (a2block), and the first element is accumulated.
The reduction only begins at the second iteration.
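The block-swapping trick can be mimicked in Python with a closure, as a rough sketch (the names `into`, `first_call` and `current` are illustrative, not from the original): the seed is the reduction function itself, just as a2block is the seed of inject:into:, and the first call installs it as the step and answers the first element.

```python
import operator
from typing import Any, Callable, Iterable

def into(items: Iterable[Any], step: Callable[[Any, Any], Any]) -> Any:
    items = list(items)
    if not items:
        raise ValueError("receiver of into: shall not be empty")

    def first_call(new_step: Callable[[Any, Any], Any], each: Any) -> Any:
        # First iteration: the accumulator is the seed (the reduction function).
        # Install it as the step and answer the first element.
        nonlocal current
        current = new_step
        return each

    current = first_call
    acc: Any = step  # seed, like `self inject: a2block into: ...`
    for each in items:
        acc = current(acc, each)
    return acc

print(into([1, 2, 3, 4], operator.add))  # 10
```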
The alternative is to check for the first iteration inside the loop... One way to do it:
into: a2block
| first |
self ifEmpty: [self error: 'receiver of into: shall not be empty'].
first := Object new.
^self inject: first into: [:accum :each |
accum == first
ifTrue: [each]
ifFalse: [a2block value: accum value: each]].
There are many other ways to write it, for example more explicitly, using a boolean to test for the first iteration...
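As a sketch of the explicit-boolean variant (again a Python analogue with a hypothetical `into`, not Smalltalk code), the flag both seeds the accumulator and detects the empty case after the loop:

```python
import operator
from typing import Any, Callable, Iterable

def into(items: Iterable[Any], step: Callable[[Any, Any], Any]) -> Any:
    first = True  # explicit first-iteration flag
    acc: Any = None
    for each in items:
        if first:
            acc, first = each, False  # seed the accumulator with the first element
        else:
            acc = step(acc, each)
    if first:  # the loop never ran
        raise ValueError("receiver of into: shall not be empty")
    return acc
```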
I have a huge JSON file with lots of stuff I don't care about, and I want to filter it down to only the few keys I care about, preserving the structure. I don't mind if the same key name occurs in different paths and I get both of them. I gleaned something very close from the answers to this question, which taught me how to delete all properties with certain values, like all null values:
del(..|nulls)
or, more powerfully
del(..|select(. == null))
I searched high and low for a way to write a predicate over the name of a property while looking at that property. I come from XSLT, where I could write something like this:
del(..|select(name|test("^(foo|bar)$")))
where name would be a function that returns the property name or array index the current value comes from. But it seems that jq keeps no such metadata on its values, so you can only write predicates about their value, and perhaps the type of their value (which is still just a feature of the value), but you cannot inspect the name, or the path leading up to it?
I tried to use paths, leaf_paths and similar builtins, but I had no clue what they would do, so I tested them to see how this path stuff works. It seems they find child paths inside an object, not the path leading up to the present value.
So how could this be done: deleting everything except a set of keys? I might have found a way here:
walk(
if type == "object" then
with_entries(
select( ( .key |test("^(foo|bar|...)$") )
and ( .value != "" )
and ( .value != null ) )
)
else
.
end
)
OK, this seems to work. But I still think it would be much easier if we had a way of querying the current property name, array index, or path leading up to the present item being inspected with the simple recursive ..| form.
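For illustration, this is roughly what the wished-for name-aware recursion could look like if the traversal handed each value its path. It is a Python sketch, not jq; `del_where` and its `pred(path, value)` callback are hypothetical names, and `path[-1]` plays the role of the imagined name function.

```python
import re
from typing import Any, Callable, Tuple

Path = Tuple[Any, ...]

def del_where(value: Any, pred: Callable[[Path, Any], bool], path: Path = ()) -> Any:
    """Return a copy of `value` without the members for which pred(path, member) is true.

    The predicate runs before recursing, so removed branches are never visited.
    """
    if isinstance(value, dict):
        return {k: del_where(v, pred, path + (k,))
                for k, v in value.items() if not pred(path + (k,), v)}
    if isinstance(value, list):
        return [del_where(v, pred, path + (i,))
                for i, v in enumerate(value) if not pred(path + (i,), v)]
    return value

# Drop object members not named foo or bar; array elements (integer indices) stay.
def unwanted(path: Path, _value: Any) -> bool:
    return isinstance(path[-1], str) and not re.fullmatch(r"foo|bar", path[-1])
```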
In analogy to your approach using .. and del, you could use paths and delpaths to operate on a stream of path arrays, and delete a given path if not all of its elements meet your conditions.
delpaths([paths | select(all(IN("foo", "bar") or type == "number") | not)])
For the condition I used IN("foo", "bar") but (type == "string" and test("^(foo|bar)$")) would work as well. To also retain array elements (which have numeric indices), I added or type == "number".
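To see what the paths/delpaths combination does, here is a Python analogue (not jq's implementation; `paths` and `delete_unwanted` are hypothetical helpers): enumerate every path, then delete those containing a disallowed key. Because integer indices are always accepted, the first offending component is always an object key, so deleting the shortest offending prefix removes the whole branch without shifting any array indices.

```python
import copy
from typing import Any, Iterator, Set, Tuple

def paths(value: Any, prefix: Tuple = ()) -> Iterator[Tuple]:
    """Yield every path below `value` (root excluded), similar to jq's `paths`."""
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        return
    for key, child in children:
        path = prefix + (key,)
        yield path
        yield from paths(child, path)

def delete_unwanted(doc: Any, keep: Set[str]) -> Any:
    doc = copy.deepcopy(doc)  # leave the caller's document alone
    doomed = set()
    for path in paths(doc):
        for i, component in enumerate(path):
            # Mirrors `IN("foo", "bar") or type == "number"`.
            if not (isinstance(component, int) or component in keep):
                doomed.add(path[: i + 1])  # shortest offending prefix
                break
    for path in doomed:
        parent = doc
        for component in path[:-1]:
            parent = parent[component]
        del parent[path[-1]]
    return doc
```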
Unlike in XML, there's no concept of attributes in jq. You'll need to delete from objects.
To delete an element of an object, you need to use del( obj[ key ] ) (or use with_entries). You can get a stream of the keys of an object using keys[]/keys_unsorted[] and filter out the ones you don't want to delete.
Finally, you need to invert the result of test because you want to delete those that don't match.
After fixing these problems, we get the following:
INDEX( "foo", "bar" ) as $keep |
del(
.. | objects |
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
Note that I substituted the regex match with a dictionary lookup. You could use test( "^(?:foo|bar)\\z" ) in lieu of $keep[ . ], but a dictionary lookup should be faster than a regex match. And it should be less error-prone too, considering you misused $ and (...) in lieu of \z and (?:...).
The above visits deleted branches for nothing. We can avoid that by using walk instead of .. (recursive descent).
INDEX( "foo", "bar" ) as $keep |
walk(
if type == "object" then
del(
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
else
.
end
)
Since I mentioned one could use with_entries instead of del, I'll demonstrate.
INDEX( "foo", "bar" ) as $keep |
walk(
if type == "object" then
with_entries( select( $keep[ .key ] ) )
else
.
end
)
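To make the traversal order visible, here is a Python sketch of a walk in the same spirit (an analogue, not jq itself; `walk` and `keep_keys` are hypothetical names). Like jq's walk as described in the jq manual, it transforms the children first and then applies f to the node, so in this sketch the contents of keys that end up being dropped are still traversed before the object filter runs.

```python
from typing import Any, Callable, Iterable

def walk(f: Callable[[Any], Any], value: Any) -> Any:
    """Bottom-up: rebuild the children first, then apply f to the node itself."""
    if isinstance(value, dict):
        value = {k: walk(f, v) for k, v in value.items()}
    elif isinstance(value, list):
        value = [walk(f, v) for v in value]
    return f(value)

def keep_keys(keep: Iterable[str]) -> Callable[[Any], Any]:
    """Analogue of: if type == "object" then with_entries(select($keep[.key])) else . end"""
    allowed = set(keep)  # plays the role of INDEX("foo", "bar")

    def f(node: Any) -> Any:
        if isinstance(node, dict):
            return {k: v for k, v in node.items() if k in allowed}
        return node

    return f
```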
Here's a solution that uses a specialized variant of walk for efficiency (*). It retains objects whose keys have all been removed; only trivial changes are needed if a blacklist or some other criterion (e.g., regexp-based) is given instead. WHITELIST should be a JSON array of the key names to be retained.
jq --argjson whitelist WHITELIST '
def retainKeys($array):
INDEX($array[]; .) as $keys
| def r:
if type == "object"
then with_entries( select($keys[.key]) )
| map_values( r )
elif type == "array" then map( r )
else .
end;
r;
retainKeys($whitelist)
' input.json
(*) Note, for example:
- the use of INDEX;
- the recursive function, r, has arity 0;
- for objects, the top-level deletion occurs first.
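A Python sketch of the same top-down idea (an analogue, not the jq program; `retain_keys` is a hypothetical name): unwanted keys are dropped before recursing, so removed branches are never visited, and objects whose keys were all removed survive as empty dicts.

```python
from typing import Any, Iterable

def retain_keys(value: Any, whitelist: Iterable[str]) -> Any:
    keys = set(whitelist)  # plays the role of INDEX($array[]; .)

    def r(node: Any) -> Any:
        # Closes over `keys`, so like jq's r it needs no extra parameters.
        if isinstance(node, dict):
            # Top-level deletion first, then recurse only into the survivors.
            return {k: r(v) for k, v in node.items() if k in keys}
        if isinstance(node, list):
            return [r(v) for v in node]
        return node

    return r(value)
```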
Here's a space-efficient, walk-free approach, tailored for the case of a WHITELIST. It uses the so-called "streaming" parser, so the invocation would look like this:
jq -n --stream --argjson whitelist WHITELIST -f program.jq input.json
where WHITELIST is a JSON array of the names of the keys to be retained, and
where program.jq is a file containing the program:
# Input: an array
# Output: the longest head of the array that includes only numbers or items in the dictionary
def acceptable($dict):
last(label $out
| foreach .[] as $x ([];
if ($x|type == "number") or $dict[$x] then . + [$x]
else ., break $out
end));
INDEX( $whitelist[]; .) as $dict
| fromstream(inputs
| if length==2
then (.[0] | acceptable($dict)) as $p
| if ($p|length) == (.[0]|length) - 1 then .[0] = $p | .[1] = {}
elif ($p|length) < (.[0]|length) then empty
else .
end
else .
end )
Note: The reason this is relatively complicated is that it assumes that you want to retain objects all of whose keys have been removed, as illustrated in the following example. If that is not the case, then the required jq program is much simpler.
Example:
WHITELIST: '["items", "config", "spec", "setting2", "name"]'
input.json:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting1": "abc",
"setting2": {
"name": "xyz"
}
},
"files": {
"name": "cde",
"path": "/home"
},
"program": {
"name": "apache"
}
}
},
{
"name": {
"etc": 0
}
}
]
}
Output:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting2": {
"name": "xyz"
}
}
}
},
{
"name": {}
}
]
}
I am going to put my own tentative answer here.
The thing is, the solution I already had in my question works: I can select keys during forward navigation, but I cannot find out the path leading up to the present value.
I looked around in the jq source code to see why we cannot query the path leading up to the present value, which would let us ask for the key string or array index of the present value. And indeed, it looks like jq does not track the path while it walks through the input structure.
I think this is a huge missed opportunity, since the path could so easily be tracked during the tree walk.
This is why I continue to think that XML with XSLT and XPath is a much more robust data representation and tool chain than JSON. In fact, I find JSON even harder to read than XML. The benefit of JSON being so close to JavaScript is really only relevant if, as I do in some cases, I read the JSON as JavaScript source code assigning it to a variable, and then instrument it by changing the prototype of the anonymous JSON object so that I have methods to go with it. Changing the prototype is said to cause slowness, though I don't think it does when setting it for otherwise anonymous JSON objects.
There is JsonPath, which (as its name suggests) tries to be for JSON what XPath is for XML. But it is a poor substitute, and it also has no way to navigate up the parent (or sibling) axes.
So, in summary, while selecting by key with white or black lists is possible in principle, it is quite hard, because a feature that would be easy to have in a JSON navigation language is neither specified nor implemented. Another useful feature that could easily be achieved in jq is backward navigation to the parent or an ancestor of the present value. Currently, if you want to navigate back, you need to capture the ancestor you want to get back to in a variable. It is possible, but jq could be massively improved by keeping track of ancestors and paths.
Is there a standard method for removing duplicate entries in an array, but preserving the order?
e.g.
#(c a b a a b) withoutDuplicates "-> #(c a b)"
I used to use removeDuplicates, but apparently that's an extension method added by Roassal (so I cannot always use it)
Written by hand, the best solution (I have) is
a := #(c a b a a b).
d := OrderedDictionary new.
a do: [ :each | d at: each put: true ].
d keys. "-> #(c a b)"
But is there a standard way?
Same as yours but shorter in terms of text to type, not performance.
(#(c a f b c a d c a e f f) collect: [ :e | e -> true ]) asOrderedDictionary keys
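In Python (only as a point of comparison, not a Pharo API), the same trick works with a plain dict, because dict keys keep insertion order (guaranteed since Python 3.7); `without_duplicates` is a hypothetical name.

```python
def without_duplicates(items):
    # Like the OrderedDictionary version: first occurrence wins, order preserved.
    return list(dict.fromkeys(items))

print(without_duplicates(["c", "a", "b", "a", "a", "b"]))  # ['c', 'a', 'b']
```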
Your solution looks very good to me. Here is another one:
withoutDuplicates
| visited |
visited := Set new.
^self select: [:element |
(visited includes: element) not
ifTrue: [visited add: element];
yourself]
This one is more verbose but uses (only) one additional collection: the visited set. An OrderedDictionary, on the other hand, has two internal collections: a Dictionary and the sequence of orderedKeys. If you are not concerned with space, I would suggest using your solution.
As an aside, I would say that the use of #yourself here is a bit unusual. It follows the pattern:
^boolean ifTrue: [self doThis]; yourself
This has a side effect (self doThis) when boolean is true, and answers boolean in either case. Most people would write it as:
boolean ifTrue: [self doThis].
^boolean
but this requires adding a block temporary, because in our case boolean stands for the expression (visited includes: element) not, which we shouldn't repeat.
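The visited-set approach translates directly; this Python sketch (a hypothetical helper, not part of any Smalltalk library) uses a temporary for the test and a separate statement for the side effect, the more common style described above, instead of the cascade.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)

def without_duplicates(items: Iterable[T]) -> List[T]:
    visited = set()  # plays the role of the visited Set
    result = []
    for element in items:
        is_new = element not in visited  # the "block temporary"
        if is_new:
            visited.add(element)
            result.append(element)
    return result
```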
OR...
... you could take this opportunity to implement OrderedSet in Pharo...
Imagine we have some method
MyClass>>#method: arg
Transcript crShow: 'executed'
So when you do MyClass new method: 1, the transcript is filled with "executed" lines.
Now I want to skip this method if arg is 0. I've tried to install an instead metalink with a condition:
link := MetaLink new
condition: [ :arguments |
arguments first = 0 ]
arguments: #(arguments);
control: #instead.
(MyClass >> #method:) ast link: link
But then the method never runs, and I want it to run when the arg is not 0.
I've also tried to do the condition in the metaobject in this way:
link := MetaLink new
metaObject: [ :ast :arguments :receiver |
arguments first = 0
ifFalse: [
ast compiledMethod
valueWithReceiver: receiver
arguments: arguments ] ];
selector: #value:value:value:;
arguments: #(node arguments receiver);
control: #instead.
(MyClass >> #method:) ast link: link
But in this case you end up in an infinite recursion, as the metalink is called over and over again, although I thought that ast compiledMethod should return a compiled method and not the reflective counterpart.
Yes, it looks like "instead" links are always executed instead of the original method, even if the link condition does not hold; the only difference is whether we return the value of the instead link evaluation or just nil.
Maybe this should be changed for instead links.
As a solution for your use case, you can use a before link that just returns the receiver if the condition holds:
| ml |
ml := MetaLink new.
ml control: #before.
ml condition:[:args | args first = 0] arguments:#(arguments).
ml selector:#value:.
ml metaObject:[:context | context return].
ml arguments:{#context}.
(MyClass>>#method:) ast link:ml.
The #context argument is the key for the thisContext reification (RFThisContextReification).
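Python has no MetaLinks, but the before-link idea (check first, return early, otherwise fall through to the original body) can be sketched with a wrapper installed on an existing method. This is a swapped-in technique for comparison only; `before_skip` is a hypothetical helper, and returning `self` mirrors the answer's "just returns the receiver".

```python
import functools
from typing import Any, Callable

def before_skip(condition: Callable[..., bool]) -> Callable:
    """Hypothetical helper: wrap a method with a "before" check that may return early."""
    def install(method: Callable) -> Callable:
        @functools.wraps(method)
        def wrapper(self: Any, *args: Any) -> Any:
            if condition(*args):  # the condition sees only the arguments, like #(arguments)
                return self  # skip the original body and answer the receiver
            return method(self, *args)  # otherwise run the original method
        return wrapper
    return install

class MyClass:
    def __init__(self) -> None:
        self.transcript = []

    def method(self, arg: int) -> "MyClass":
        self.transcript.append("executed")  # stands in for Transcript crShow:
        return self

# Install on the existing method afterwards, roughly like `ast link: ml`.
MyClass.method = before_skip(lambda arg: arg == 0)(MyClass.method)
```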
I've got a Dictionary like this:
a PluggableDictionary(
Rankable1->8.5
Rankable2->9.0
)
I need just an OrderedCollection with the Rankable objects in descending order:
a OrderedCollection(
Rankable2
Rankable1
)
I noticed it is easy to sort by keys, but I found it a bit more difficult to sort by values. What is the Smalltalk way of doing this?
If you need a one-shot sorted collection in a non-critical loop, you might use something like this (using Pharo syntax to initialize the example dictionary):
pd := PluggableDictionary newFromPairs: { 'a' . 2 . 'b' . 1 . 'c' . 3} .
(pd associations asSortedCollection: [:x :y | x value > y value])
collect: [:assoc | assoc key].
If you need it more often, then you might consider introducing your own class that keeps this sorted collection up to date.
If you're using VisualWorks, you can take advantage of SortFunction and Symbol>>value behavior to reduce all of that down to
(aDictionary associations sort: #value descending) collect: #key
If you can use Grease (e.g., when using Seaside), you can probably use its GROrderedMultiMap. It is intended for small dictionaries with possibly multiple values per key.
As a second option, you could perhaps swap keys and values and just send #asSortedCollection, like this:
(Dictionary newFrom: { 2 -> 'b' . 1-> 'a' })
asSortedCollection "--> a SortedCollection('a' 'b')"
(Tested in Squeak and Pharo)
Got it:
^ ((SortedCollection sortBlock:
[:association :otherAssociation | association value > otherAssociation value])
addAll: theDictionary associations;
yourself) collect: [:association | association key]
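For comparison, the same descending sort by value in Python (a hypothetical helper; plain strings stand in for the Rankable objects). `sorted` with `reverse=True` remains stable, so equal scores keep their original order.

```python
def ranked_keys(scores):
    # Sort the keys by their associated values, highest value first.
    return sorted(scores, key=scores.__getitem__, reverse=True)

print(ranked_keys({"Rankable1": 8.5, "Rankable2": 9.0}))  # ['Rankable2', 'Rankable1']
```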