I've got a Dictionary like this:
a PluggableDictionary(
Rankable1->8.5
Rankable2->9.0
)
I need just an OrderedCollection with the Rankable objects in descending order:
a OrderedCollection(
Rankable2
Rankable1
)
I noticed it is easy to sort by keys, but I found it a bit more difficult to sort by values. What is the Smalltalk way of doing this?
If you need a one-shot sorted collection outside a performance-critical loop, you might use something like this (it uses Pharo syntax to initialize the example dictionary):
pd := PluggableDictionary newFromPairs: { 'a' . 2 . 'b' . 1 . 'c' . 3} .
(pd associations asSortedCollection: [:x :y | x value > y value])
collect: [:assoc | assoc key].
If you need it more often, then you might consider introducing your own class that keeps this collection precomputed.
If you're using VisualWorks, you can take advantage of SortFunction and Symbol>>value: behavior to reduce all of that down to:
(aDictionary associations sort: #value descending) collect: #key
If you can use Grease (e.g., when using Seaside), you can probably use its GROrderedMultiMap. It is intended for small dictionaries, possibly with multiple values per key.
Alternatively, you can probably swap keys and values and just send #asSortedCollection, like this:
(Dictionary newFrom: { 2 -> 'b' . 1-> 'a' })
asSortedCollection "--> a SortedCollection('a' 'b')"
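(Tested in Squeak and Pharo.) Be aware, though, that sending asSortedCollection to a Dictionary sorts its values, not its keys. The result above is correct only because 'a' and 'b' happen to sort in the same order as 1 and 2. Swapping keys and values also silently drops entries whenever two objects share the same score.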
Got it:
^ ((SortedCollection sortBlock:
[:association :otherAssociation | association value > otherAssociation value])
addAll: theDictionary associations;
yourself) collect: [:association | association key]
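Outside Smalltalk, the same idea carries over directly: sort the associations by value, then keep only the keys. Below is a minimal Python sketch, assuming the scores are plain numbers. The function name keys_by_value_descending is made up for illustration.

```python
def keys_by_value_descending(scores):
    """Return the keys of `scores` ordered by their values, largest first."""
    # Sort the (key, value) pairs by value, then project out the keys.
    ordered = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in ordered]


ranks = {"Rankable1": 8.5, "Rankable2": 9.0}
print(keys_by_value_descending(ranks))  # ['Rankable2', 'Rankable1']
```

Python's sort remains stable with reverse=True, so objects with equal scores keep their original relative order instead of being lost, as they would be with the key/value swap.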
I have a huge JSON file with lots of stuff I don't care about, and I want to filter it down to only the few keys I care about, preserving the structure. I don't mind if the same key name occurs in different paths and I get both of them. I gleaned something very close from the answers to this question; it taught me how to delete all properties with certain values, like all null values:
del(..|nulls)
or, more powerfully
del(..|select(. == null))
I searched high and low for a way to write a predicate over a property's name while looking at that property. I come from XSLT, where I could write something like this:
del(..|select(name|test("^(foo|bar)$")))
where name/1 would be the function that returns the property name or array index number where the current value comes from. But it seems that jq lacks the metadata on its values, so you can only write predicates about their value, and perhaps the type of their value (that's still just a feature of the value), but you cannot inspect the name, or path leading up to it?
I tried paths, leaf_paths and the like, but I wasn't sure what they do, so I tested them to see how this path stuff works. They seem to find child paths inside an object, not the path leading up to the present value.
So how can this be done: delete everything except a given set of keys? I might have found a way here:
walk(
if type == "object" then
with_entries(
select( ( .key |test("^(foo|bar|...)$") )
and ( .value != "" )
and ( .value != null ) )
)
else
.
end
)
OK, this seems to work. But I still think it would be so much easier if we had a way to query the current property name, array index, or the path leading up to the item being inspected, using the simple recursive ..| form.
In analogy to your approach using .. and del, you could use paths and delpaths to operate on a stream of path arrays, and delete a given path if not all of its elements meet your conditions.
delpaths([paths | select(all(IN("foo", "bar") or type == "number") | not)])
For the condition I used IN("foo", "bar") but (type == "string" and test("^(foo|bar)$")) would work as well. To also retain array elements (which have numeric indices), I added or type == "number".
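The path-based idea above deletes a path unless every step in it is a whitelisted key or an array index. It can be mimicked in Python to see what paths and delpaths are doing. This is only an illustrative sketch:

- paths, path_is_kept and delete_unlisted_paths are hypothetical helpers, not jq builtins.
- Python's int check stands in for jq's type == "number".

```python
import copy


def paths(value, prefix=()):
    """Yield every path inside `value` (root excluded), similar to jq's `paths`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for step, child in items:
        path = prefix + (step,)
        yield path
        yield from paths(child, path)


def path_is_kept(path, whitelist):
    """True if every step is an array index or a whitelisted key."""
    return all(isinstance(step, int) or step in whitelist for step in path)


def delete_unlisted_paths(doc, whitelist):
    """Return a copy of `doc` with every non-whitelisted key path removed."""
    result = copy.deepcopy(doc)
    # Every path below a failing step fails too, so deleting only the outermost
    # failing paths (those whose parent path is still kept) is enough.
    doomed = [p for p in paths(result)
              if not path_is_kept(p, whitelist) and path_is_kept(p[:-1], whitelist)]
    for path in doomed:
        parent = result
        for step in path[:-1]:
            parent = parent[step]
        del parent[path[-1]]  # the last step is always a dict key here
    return result


doc = {"foo": {"bar": 1, "baz": 2}, "qux": [{"foo": 3}], "bar": [1, {"y": 0}]}
print(delete_unlisted_paths(doc, {"foo", "bar"}))
```

Because array indices always pass the test, a deleted step is always an object key. No array element is ever removed, so no indices shift during deletion.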
Unlike in XML, there's no concept of attributes in jq. You'll need to delete from objects.
To delete an element of an object, you need to use del( obj[ key ] ) (or use with_entries). You can get a stream of the keys of an object using keys[]/keys_unsorted[] and filter out the ones you don't want to delete.
Finally, you need to invert the result of test because you want to delete those that don't match.
After fixing these problems, we get the following:
INDEX( "foo", "bar"; . ) as $keep |
del(
.. | objects |
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
Note that I substituted the regex match with a dictionary lookup. You could use test( "^(?:foo|bar)\\z" ) in lieu of $keep[ . ], but a dictionary lookup should be faster than a regex match. And it should be less error-prone too, considering you misused $ and (...) in lieu of \z and (?:...).
The above visits deleted branches for nothing. We can avoid that by using walk instead of .. (recursive descent):
INDEX( "foo", "bar"; . ) as $keep |
walk(
if type == "object" then
del(
.[
keys_unsorted[] |
select( $keep[ . ] | not )
]
)
else
.
end
)
Since I mentioned one could use with_entries instead of del, I'll demonstrate.
INDEX( "foo", "bar"; . ) as $keep |
walk(
if type == "object" then
with_entries( select( $keep[ .key ] ) )
else
.
end
)
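The earlier note contrasted $ and (...) with \z and (?:...). To make that point concrete, here is a small Python comparison of three checks:

- an exact set lookup,
- the question's regex,
- a stricter anchored regex.

Python's re module is a different regex engine from the one jq uses, so treat this as an illustration of the pitfall rather than a statement about jq's exact behavior. The helper names and the KEEP set are made up.

```python
import re

KEEP = {"foo", "bar"}


def keep_by_lookup(key):
    # Exact membership test: no anchors, groups or escaping to get wrong.
    return key in KEEP


def keep_by_loose_regex(key):
    # Mirrors the question's pattern. In Python, `$` also matches just before
    # a trailing newline, so "foo\n" slips through.
    return re.search(r"^(foo|bar)$", key) is not None


def keep_by_strict_regex(key):
    # \Z anchors at the very end of the string; (?:...) avoids a capture group.
    return re.search(r"^(?:foo|bar)\Z", key) is not None


for key in ["foo", "foo\n", "foobar"]:
    print(repr(key), keep_by_lookup(key), keep_by_loose_regex(key), keep_by_strict_regex(key))
```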
Here's a solution that uses a specialized variant of walk for efficiency (*). It retains objects whose keys have all been removed; only trivial changes are needed if a blacklist or some other criterion (e.g., regexp-based) is given instead. WHITELIST should be a JSON array of the names of the keys to be retained.
jq --argjson whitelist WHITELIST '
def retainKeys($array):
INDEX($array[]; .) as $keys
| def r:
if type == "object"
then with_entries( select($keys[.key]) )
| map_values( r )
elif type == "array" then map( r )
else .
end;
r;
retainKeys($whitelist)
' input.json
(*) Note for example:
the use of INDEX
the recursive function, r, has arity 0
for objects, the top-level deletion occurs first.
Here's a space-efficient, walk-free approach, tailored for the case of a WHITELIST. It uses the so-called "streaming" parser, so the invocation would look like this:
jq -n --stream --argjson whitelist WHITELIST -f program.jq input.json
where WHITELIST is a JSON array of the names of the keys to be retained, and
where program.jq is a file containing the program:
# Input: an array
# Output: the longest head of the array that includes only numbers or items in the dictionary
def acceptable($dict):
last(label $out
| foreach .[] as $x ([];
if ($x|type == "number") or $dict[$x] then . + [$x]
else ., break $out
end));
INDEX( $whitelist[]; .) as $dict
| fromstream(inputs
| if length==2
then (.[0] | acceptable($dict)) as $p
| if ($p|length) == (.[0]|length) - 1 then .[0] = $p | .[1] = {}
elif ($p|length) < (.[0]|length) then empty
else .
end
else .
end )
Note: The reason this is relatively complicated is that it assumes that you want to retain objects all of whose keys have been removed, as illustrated in the following example. If that is not the case, then the required jq program is much simpler.
Example:
WHITELIST: '["items", "config", "spec", "setting2", "name"]'
input.json:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting1": "abc",
"setting2": {
"name": "xyz"
}
},
"files": {
"name": "cde",
"path": "/home"
},
"program": {
"name": "apache"
}
}
},
{
"name": {
"etc": 0
}
}
]
}
Output:
{
"items": [
{
"name": "issue1",
"spec": {
"config": {
"setting2": {
"name": "xyz"
}
}
}
},
{
"name": {}
}
]
}
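The recursive retainKeys definition can be mirrored in Python, which makes it easy to check the example. Each object's keys are filtered before recursing, so dropped subtrees are never visited. Objects whose keys were all removed stay behind as {}. This is a sketch with retain_keys as an illustrative name; for the input and whitelist above, it should reproduce the output shown.

```python
import json


def retain_keys(value, keep):
    """Keep only whitelisted keys in every object, filtering top-down."""
    if isinstance(value, dict):
        # The `if k in keep` filter runs before recursing into v.
        return {k: retain_keys(v, keep) for k, v in value.items() if k in keep}
    if isinstance(value, list):
        return [retain_keys(item, keep) for item in value]
    return value


whitelist = {"items", "config", "spec", "setting2", "name"}
doc = {
    "items": [
        {"name": "issue1",
         "spec": {"config": {"setting1": "abc", "setting2": {"name": "xyz"}},
                  "files": {"name": "cde", "path": "/home"},
                  "program": {"name": "apache"}}},
        {"name": {"etc": 0}},
    ]
}
print(json.dumps(retain_keys(doc, whitelist), indent=2))
```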
I am going to put my own tentative answer here.
The thing is, the solution I already had in my question works: I can select keys during forward navigation, but I cannot find out the path leading up to the present value.
I looked around in the source code of jq to see why we cannot query the path leading up to the present value, which would let us ask for the key string or array index of the present value. And indeed, it looks like jq does not track the path while it walks through the input structure.
I think this is a huge missed opportunity, since the path could so easily be tracked during the tree walk.
This is why I continue to think that XML with XSLT and XPath is a much more robust data representation and tool chain than JSON. In fact, I find JSON even harder to read than XML. JSON's closeness to JavaScript only really matters if, as I do in some cases, I read the JSON as JavaScript source code assigned to a variable and then instrument it by changing the prototype of the anonymous JSON objects so that they come with methods. But changing the prototype is said to cause slowness, though I don't think it does when it is set on otherwise anonymous JSON objects.
There is JSONPath, which (as its name suggests) tries to be for JSON what XPath is for XML. But it is a poor substitute and has no way to navigate the parent (or sibling) axes.
So, in summary, while selecting by key with whitelists or blacklists is possible in principle, it is quite hard, because a feature that would be easy to provide in a JSON navigation language is neither specified nor implemented. Another useful feature that could easily be added to jq is backward navigation to the parent or ancestors of the present value. Currently, if you want to navigate back, you need to capture the ancestor you want to return to in a variable. It is possible, but jq could be massively improved by keeping track of ancestors and paths.
I have an object like a Dictionary('CMFireAutomataModel'->a Dictionary('nbAshes'->193 'nbFires'->851 ) ) and I would like to have something like Dictionary('nbAshes'->193 'nbFires'->851 ).
I don't know how to "unstack" the first dictionary.
Let's say you have a Dictionary whose keys are Strings and whose values are Numbers or Dictionaries of the same sort (i.e., with keys that are strings and values that are dictionaries or numbers). What we want is a way to "promote" or "unstack" all string keys and numbers into the parent dictionary.
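unstack: aDictionary
"Answer a flat dictionary of the same class as aDictionary, in which every
numeric value found at any nesting depth appears under its own key."
| dict |
dict := aDictionary class new.
aDictionary keysAndValuesDo: [:k :v |
v isNumber
ifTrue: [dict at: k put: v]
ifFalse: [
"Copy the flattened entries one by one; Dictionary>>addAll: in Pharo expects a keyed collection, not an Array of associations."
(self unstack: v) keysAndValuesDo: [:key :value | dict at: key put: value]]].
^dict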
Note that I've used aDictionary class new to make sure the method answers with a Dictionary of the same kind (e.g., an IdentityDictionary, etc.).
Note also that the method could go in any class. I haven't put it in Dictionary because I don't think it is general enough (even though that would have simplified the code a little).
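For comparison, here is the same recursion in Python. Like the answer, it assumes that every value is either a number or another dictionary of the same sort. unstack here is an illustrative function, not a library call.

```python
def unstack(d):
    """Flatten nested dicts, promoting every numeric leaf to the top level.

    Later keys overwrite earlier ones with the same name, as at:put: would.
    """
    flat = type(d)()  # same mapping type as the input, like `aDictionary class new`
    for key, value in d.items():
        if isinstance(value, (int, float)):
            flat[key] = value
        else:
            flat.update(unstack(value))
    return flat


nested = {"CMFireAutomataModel": {"nbAshes": 193, "nbFires": 851}}
print(unstack(nested))  # {'nbAshes': 193, 'nbFires': 851}
```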
I have a MarkLogic 8 database containing documents that have two dateTime fields:
created-on
active-since
I am trying to write an XQuery to find all the documents in which the value of active-since is less than the value of created-on.
Currently I am using the following FLWOR expression:
for $entity in fn:collection("entities")
let $id := fn:data($entity//id)
let $created-on := fn:data($entity//created-on)
let $active-since := fn:data($entity//active-since)
where $active-since < $created-on
return
(
$id,
$created-on,
$active-since
)
The above query takes too long to execute, and its execution time will keep growing as the number of documents increases.
Also, I have an element range index on both of the above dateTime fields, but they are not being used here. The cts:element-range-query function only compares one element with a set of atomic values; in my case, I am trying to compare two elements of the same document.
I think there should be a better, optimized solution for this problem.
Please let me know if there is any search function or other approach suitable for this scenario.
This may be efficient enough for you.
Take one of the values and build a range query per value. This all uses the range indexes, so in that sense it is efficient. However, the query that gets built can be large; it reads much like a FLWOR statement. If you really wanted to be a bit more efficient, you could find out which of your elements has fewer unique values (the size of its index) and use that one for your iteration, thus building a smaller query. Also, note that in the element-values call I constrain the values to your collection. This is just in case you happen to have that element in documents outside of your collection; it keeps the list to only those values you know are in your collection:
let $q := cts:or-query(
for $created-on in cts:element-values(xs:QName("created-on"), (), (), cts:collection-query("entities"))
return cts:element-range-query(xs:QName("active-since"), "<", $created-on)
)
return
cts:search(
fn:collection("entities"),
$q
)
So, let's explain what is happening with a simple example.
Let's say I have elements A and B, each with a range index defined.
Let's pretend we have the following combinations in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9
let $q := cts:or-query(
for $a in cts:element-values(xs:QName("A"))
return cts:element-range-query(xs:QName("B"), "<", $a)
)
This would create the following query:
cts:or-query(
(
cts:element-range-query(xs:QName("B"), "<", 2),
cts:element-range-query(xs:QName("B"), "<", 4),
cts:element-range-query(xs:QName("B"), "<", 5)
)
)
And in the example above, the only match would be the document with the combination: (5,4)
You might try using cts:value-tuples(). Pass in three references: active-since, created-on, and the URI reference. Then iterate over the results looking for ones where active-since is less than created-on, and you'll have the URI of the document.
It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
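The filtering step of the tuple approach is just a comparison within each tuple. The Python sketch below uses made-up rows as stand-ins for what cts:value-tuples would return for the active-since, created-on and URI references, in that order. The function name is hypothetical.

```python
from datetime import datetime


def uris_active_before_created(tuples):
    """Return the URIs whose active-since is earlier than created-on.

    Each tuple is (active_since, created_on, uri): one co-occurrence of the
    three range-index values for a single document.
    """
    return [uri for active_since, created_on, uri in tuples if active_since < created_on]


rows = [
    (datetime(2020, 1, 1), datetime(2020, 1, 2), "/entities/1.xml"),
    (datetime(2020, 1, 5), datetime(2020, 1, 1), "/entities/2.xml"),
]
print(uris_active_before_created(rows))  # ['/entities/1.xml']
```

Because each tuple comes from a single document, the comparison never mixes values from different documents.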
I am now using the following script to get the count of documents for which the value of active-since is less than the value of created-on:
fn:sum(
for $value-pairs in cts:value-tuples(
(
cts:element-reference(xs:QName("created-on")),
cts:element-reference(xs:QName("active-since"))
),
("fragment-frequency"),
cts:collection-query("entities")
)
let $created-on := json:array-values($value-pairs)[1]
let $active-since := json:array-values($value-pairs)[2]
return
if($active-since lt $created-on) then cts:frequency($value-pairs) else 0
)
Sorry, I don't have enough reputation to comment, so I have to reply to your answer here. Why do you think that MarkLogic will not return (2,3) and (4,2)? I believe we are using an or-query, which treats a document as a match if any single subquery matches.
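The comment's point can be checked by replaying the five example documents in plain Python, not MarkLogic. The sketch assumes the or-query behaves as the comment describes: a document matches if any one subquery matches. or_query_matches and same_document_matches are illustrative names.

```python
# (A, B) value pairs from the five example documents above.
docs = [(2, 3), (4, 2), (2, 7), (5, 4), (2, 9)]

# cts:element-values on A yields each distinct A value once.
distinct_a = sorted({a for a, _ in docs})


def or_query_matches(doc):
    """Mimic an or-query of "B < a" for every distinct a: a document's B is
    compared with A values from any document, not just its own."""
    _, b = doc
    return any(b < a for a in distinct_a)


def same_document_matches(doc):
    """What the question actually asks for: B < A within one document."""
    a, b = doc
    return b < a


print([d for d in docs if or_query_matches(d)])       # [(2, 3), (4, 2), (5, 4)]
print([d for d in docs if same_document_matches(d)])  # [(4, 2), (5, 4)]
```

The or-query picks up (2,3) because its B value 3 is less than the A value 4 from a different document.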
I have a given dictionary and want to map it to an object of a specific class.
All keys of the dictionary should be mapped to equally named instance variables of the object.
I guess this is a common procedure? What is the common way to accomplish it?
Consider doing something like this:
dict := { #x -> 5 . #y -> 6 } asDictionary. "dictionary as you described"
basicObj := Point basicNew. "basic instance of your object"
dict keysAndValuesDo: [ :key :val |
basicObj instVarNamed: key put: val ].
^ basicObj
This is indeed a common pattern. It is often used in serialization and materialization. You can find an implementation in STON.
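The instVarNamed:put: loop has a close Python analogue using setattr. Below is a minimal sketch; the Point class and the from_dict helper are defined here purely for illustration.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


def from_dict(cls, data):
    """Create an instance of `cls` without running __init__, then copy each
    dictionary entry into the attribute of the same name."""
    obj = cls.__new__(cls)  # roughly analogous to Smalltalk's basicNew
    for name, value in data.items():
        setattr(obj, name, value)
    return obj


p = from_dict(Point, {"x": 5, "y": 6})
print(p.x, p.y)  # 5 6
```

One difference is worth knowing. setattr silently creates an attribute for any key, whereas instVarNamed:put: (in Pharo at least) signals an error for a name that is not an instance variable. You may want to validate the keys first.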
| dict |
dict := #{'foo'->'brown'. 'bar'->'yellow'.
'qix'->'white'. 'baz'->'red'. 'flub'->'green'} asDictionary.
dict at: 'qix'
If I PrintIt, I get 'white'. If I remove 'asDictionary', I still get 'white'. What does a dictionary give me that a collection of associations doesn't?
An expression like #{exp1 . exp2 . exp3} is Amber-specific and creates a HashedCollection, which is a special kind of dictionary whose keys are strings (you probably use things like this a lot in JavaScript).
Other Smalltalks have no such expression. Instead, brace array expressions like {exp1 . exp2 . exp3} (with no leading #) were introduced in Squeak and are also available in Pharo (a fork of Squeak) and Amber. An array expression creates an Array, so you have to use integers with the #at: message. For example, dict at: 2 will return the association 'bar'->'yellow', because it is in the second position of the array you've created.
#asDictionary is a method of a collection that converts it into a dictionary given that the elements of the collection are associations. So if you want to create a dictionary with keys other than strings, you can do it like this:
dict := {
'foo' -> 'brown' .
1 -> 'yellow' .
3 @ 4 -> 'white' .
#(1 2) -> 'red' } asDictionary
A Dictionary is a collection of Associations. It is, in fact, Smalltalk's canonical collection of Associations. (An instance of the Association class is a key-value pair, where the value can be an object of any class.)
The advantage a Dictionary gives you is that it has specialised methods for dealing with Associations, compared to other Collections you might be tempted to use.
A Dictionary provides:
removeKey: aKey, which removes aKey
includesKey: aKey, which checks for the existence of the key
includes: aValue, which checks for the existence of a value
at:put:, which is shorthand for
anAssociation := Association key: aKey value: aValue .
aDictionary add: anAssociation .
e.g.
anAssociation := Association key: 'Hello'
value: 'A greeting people often use' .
aDictionary add: anAssociation .
If the key already exists in the Dictionary, then at:put: will overwrite the pre-existing value with the new value, so it's important to check whether the key is already present when adding new items.
Both the key and the value can be instances of any class, and every key and value in a Dictionary may be of a different class from every other element in the Dictionary.
You can create an Association by
anAssociation := Association key: 'keyOfElement' value: 'valueOfElement'
or, more succinctly,
anAssociation := 'keyOfElement' -> 'valueOfElement'
If all your keys are Symbols, there is also the class IdentityDictionary, which compares keys by identity (==) rather than by equality (=).
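The answers above describe a trade-off: a Dictionary gives hashed lookup and at:put: replacement semantics, while a plain collection of pairs does not. The same trade-off can be illustrated in Python. This is only an analogy:

- lookup_pairs and put_if_absent are hypothetical helpers.
- Python's list and dict stand in for a Smalltalk collection of Associations and a Dictionary.

```python
def lookup_pairs(pairs, key):
    """Find a key by scanning every pair, as a plain collection of associations must."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


def put_if_absent(d, key, value):
    """Store value only if key is new; answer whether anything was stored."""
    if key in d:
        return False
    d[key] = value
    return True


pairs = [("foo", "brown"), ("bar", "yellow"), ("qix", "white")]
colors = dict(pairs)  # hashed lookup instead of a linear scan
print(lookup_pairs(pairs, "qix"), colors["qix"])  # white white

colors["qix"] = "black"  # like at:put:, silently replaces "white"
print(put_if_absent(colors, "qix", "green"), colors["qix"])  # False black
```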