XQuery - How to match two sequences within a quantifier expression

Like many, I'm tackling the Mondial database in XML. It would be a piece of cake if XQuery syntax weren't doing its best to sabotage me.
let $inland := //province/@id
where every $sea in //sea satisfies
  $sea/located/@province != $inland
return $inland
What I am trying to do in the above is find all "inland" provinces, that is, the provinces that don't have a sea next to them. This, however, doesn't work, because $sea/located/@province is one big string containing every single province that the sea borders.
So I tried modifying it to:
let $inland := //province/@id
where every $sea in //sea satisfies
  not(contains($sea/located/@province, $inland))
return $inland
Here I would like the condition to check only whether a province is among the sea's bordering provinces. Simple and straightforward.
Error message:
Stopped at C:/Users/saffekaffe/Desktop/mondial/xml/country_without_island.xml, 2/1:
[XPTY0004] Item expected, sequence found: (attribute id {"prov-Greece-2"},....
How do I get around this?
Example of //sea/located/@province
province="prov-France-5 prov-France-20 prov-France-89 prov-France-99"
Example of //province/@id
id="prov-Greece-2"

There are multiple ways in which XQuery works in a different way than you seem to expect.
The comparison operators = and != have existential semantics if at least one of their arguments is a sequence instead of a single item. This means that $seq1 = $seq2 is equivalent to some $x in $seq1, $y in $seq2 satisfies $x = $y. The query ('foo', 'bar') = ('bar', 'baz', 'quuz') returns true because there is at least one common item.
An XQuery expression like //province/@id evaluates to a sequence of all matching nodes. In your case that would be a sequence of over 1000 province IDs: (id="prov-cid-cia-Greece-2", id="prov-cid-cia-Greece-3", id="prov-cid-cia-Greece-4", [...]). This sequence is then bound to the variable $inland in your let clause. Since you don't iterate over the individual items in $inland (for example using a for clause), the where condition works on the whole sequence of all provinces worldwide at once. So your condition every $sea in //sea satisfies $sea/located/@province != $inland now means:
"For every sea there is a province located next to it that has an @id that is not equal to at least one of all existing province IDs."
This returns false because there are seas with no located children, e.g. the Gulf of Aden.
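To make the existential semantics concrete, here is a minimal Python sketch that mimics the general comparison operators. It is only an analogy, not XQuery itself: the helper names general_eq and general_ne and the short all_ids list are made up for illustration.

```python
def general_eq(left, right):
    """Mimic XQuery's general '=': true if ANY pair of items is equal."""
    return any(x == y for x in left for y in right)


def general_ne(left, right):
    """Mimic XQuery's general '!=': true if ANY pair of items differs."""
    return any(x != y for x in left for y in right)


# Hypothetical stand-in for the sequence bound to $inland.
all_ids = ["prov-Greece-2", "prov-France-5", "prov-France-20"]

# The example from the answer: one common item is enough.
shared_item = general_eq(["foo", "bar"], ["bar", "baz", "quuz"])

# A value that IS in all_ids still differs from the other entries.
differs_from_some = general_ne(["prov-France-5"], all_ids)

# A sea without located children contributes an empty left-hand side,
# so no pair exists and the comparison is false.
empty_left = general_ne([], all_ids)

print(shared_item, differs_from_some, empty_left)  # True True False
```

The last case is the one that makes the every ... satisfies condition fail: a single sea with an empty left-hand side is enough.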
contains($str, $sub) is not a good fit for checking if a substring is contained in a space-delimited string, because it also matches parts of entries: contains("foobar baz quux", "oob") returns true.
Instead you should either split the string into its parts using tokenize($str) and look through its parts, or use contains-token($str, $token).
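The difference between a substring test and a token test can be sketched in Python. The helper contains_token below is a hypothetical approximation of the XQuery function (it ignores collations) and uses the sample attribute value from the question.

```python
def contains_token(value, token):
    """Whole-token match on a whitespace-separated string."""
    return token in value.split()


provinces = "prov-France-5 prov-France-20 prov-France-89 prov-France-99"

# Substring test: matches inside "prov-France-20", which is not intended.
substring_hit = "prov-France-2" in provinces

# Token test: only complete entries count.
token_hit = contains_token(provinces, "prov-France-2")
exact_hit = contains_token(provinces, "prov-France-20")

print(substring_hit, token_hit, exact_hit)  # True False True
```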
Putting it all together, a correct query very similar to your original one is:
for $inland in //province/@id
where
  every $sea in //sea
  satisfies not(contains-token($sea/located/@province, $inland))
return $inland
Another approach would be to first gather all (unique) provinces that are next to seas and then return all provinces not in that sequence:
let $next-to-sea := distinct-values(//sea/located/@province/tokenize(.))
return //province/@id[not(. = $next-to-sea)]
Even more compact (but potentially less efficient):
//province/@id[not(. = //sea/located/@province/tokenize(.))]
On the other end of the spectrum you can use XQuery 3.1 maps to replace the potentially linear search through all seaside provinces with a single lookup:
let $seaside :=
  map:merge(
    for $id in //sea/located/@province/tokenize(.)
    return map{ $id: () }
  )
return //province/@id[not(map:contains($seaside, .))]
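For comparison, the same "collect the seaside IDs once, then do constant-time lookups" idea can be sketched in Python with a set. The tiny SAMPLE document and its province IDs are invented for illustration and only imitate the shape of the Mondial data.

```python
import xml.etree.ElementTree as ET

# Invented miniature document; the real Mondial file is far larger.
SAMPLE = """
<mondial>
  <country>
    <province id="prov-A-1"/>
    <province id="prov-A-2"/>
    <province id="prov-B-1"/>
  </country>
  <sea><located province="prov-A-1 prov-B-1"/></sea>
  <sea/>
</mondial>
"""

root = ET.fromstring(SAMPLE)

# Split every space-separated province attribute into a set of IDs.
seaside = {
    token
    for located in root.findall(".//sea/located")
    for token in located.get("province", "").split()
}

# Keep the provinces whose ID is not in the seaside set.
inland = [
    province.get("id")
    for province in root.iter("province")
    if province.get("id") not in seaside
]

print(inland)  # ['prov-A-2']
```

Note that the sea without located children simply contributes nothing to the set, so it cannot make the result empty.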

Related

How do I get a list of all elements and their attributes via XQuery

I am quite new to XQuery and I am trying to get a list of all elements and all attributes.
It should look like this:
element 1 @attribute x, @attribute y, ...
element 2 @attribute x, @attribute y, ...
element 3 @attribute x, @attribute y, ...
This is what I have tried so far, but I get the error "Item expected, sequence found":
for $x in collection("XYZ")
let $att := local-name(//@*)
let $ele := local-name(//*)
let $eleatt := string-join($ele, $att)
return $eleatt
I feel like I am turning an easy step into a complicated one. Please help.
Thanks in advance, Eleonore
//@* gives you a sequence of attribute nodes, //* a sequence of element nodes. In general, to apply a function like local-name() to each item in a sequence of nodes, you have three options:
Use a final step /local-name(), e.g. //@*/local-name() or //*/local-name()
In XQuery 3.1 use the map operator !, e.g. //@*!local-name()
Use a for .. return expression, e.g. for $att in //@* return local-name($att)
The local-name() function takes a single node as its argument, not a sequence of nodes. To apply the same function to every node in a sequence, use the "!" operator: //*!local-name().
The string-join() function takes two arguments, a list of strings, and a separator. You're trying to pass two lists of strings. You want
string-join((//*!local-name(), //@*!local-name()), ',')
Of course you might also want to de-duplicate the list using distinct-values(), and to distinguish element from attribute names, or to associate attribute names with the element they appear on. That's all eminently possible. But for that, you'll have to ask a more precise question.
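As a rough illustration of "associating attribute names with the element they appear on", here is a Python sketch using the standard library's ElementTree rather than XQuery. The SAMPLE document is invented, and the sketch assumes no namespaces (ElementTree would otherwise prefix tags with the namespace URI).

```python
import xml.etree.ElementTree as ET

# Invented sample document, used only for illustration.
SAMPLE = (
    '<library>'
    '<book id="b1" lang="en"><title short="yes">X</title></book>'
    '<book id="b2"/>'
    '</library>'
)

root = ET.fromstring(SAMPLE)

# Map each element name to the set of attribute names seen on it.
names = {}
for element in root.iter():
    names.setdefault(element.tag, set()).update(element.keys())

# One line per element name: "name @attr1, @attr2".
lines = [
    (tag + " " + ", ".join("@" + attr for attr in sorted(attrs))).rstrip()
    for tag, attrs in names.items()
]

for line in lines:
    print(line)
```

Using a set per element name takes care of the de-duplication mentioned above.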

Match nodes where all relations satisfy constraints

I'm looking to find nodes where all of their relations satisfy a constraint. In my case, the constraint is whether the related node is in a given list.
The graph is basically cocktails, with the relations being ingredients. Given a list of ingredients, I want to know what I can make.
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
...
should return Negroni, Boulevardier, ...
I've been finding this tricky because we want to make sure that all relations of a node satisfy the constraint, but a cocktail's ingredients could very easily be a subset of the list rather than an exact match to the ingredient list.
This is the best I've done so far, and it only works if you have all the ingredients, but nothing extra.
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
MATCH (n:Cocktail)-[h:HAS]-(x)
WITH list, count(list) AS lth, n, COLLECT(DISTINCT x.name) AS cx, collect(DISTINCT h) as hh
WHERE ALL (i IN list WHERE i IN cx)
RETURN n
I've looked at stackoverflow.com/a/62053139/974731. I don't think it solves my problem.
As you can see, the addition of Bourbon removes the Negroni, which shouldn't happen since all we've done is add an ingredient to our bar.
This should return all cocktails whose needed ingredients are in the have list.
WITH ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as have
MATCH (c:Cocktail)-[:HAS]->(x)
WITH have, c, COLLECT(x.name) AS needed
WHERE ALL(n IN needed WHERE n IN have)
RETURN c
Or, if you pass have as a parameter:
MATCH (c:Cocktail)-[:HAS]->(x)
WITH c, COLLECT(x.name) AS needed
WHERE ALL(n IN needed WHERE n IN $have)
RETURN c
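The condition in these queries is a subset test: every needed ingredient must be in the have list. A minimal Python sketch of the same logic, using a hypothetical in-memory recipes dictionary instead of a graph (the Martini entry is added only to show a cocktail that gets excluded):

```python
have = {"Sweet Vermouth", "Gin", "Campari", "Bourbon"}

# Hypothetical recipe data standing in for (c:Cocktail)-[:HAS]->(x).
recipes = {
    "Negroni": {"Gin", "Campari", "Sweet Vermouth"},
    "Boulevardier": {"Bourbon", "Campari", "Sweet Vermouth"},
    "Martini": {"Gin", "Dry Vermouth"},
}

# A cocktail is makeable when its needed set is a subset of have.
makeable = sorted(name for name, needed in recipes.items() if needed <= have)

print(makeable)  # ['Boulevardier', 'Negroni']
```

Adding an ingredient to have can only grow this result, which matches the behaviour the question asks for.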
It's terribly hacky, but this is where I got to:
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
call {
match (ali:Cocktail)--(ii:Ingredient) //pull all nodes
return ali, count(ii) as needed // get count for needed ingredients
}
MATCH (ali)--(i:Ingredient)
WHERE i.name in list // get ingredients that are in the list
WITH distinct ali.name as name, count(ali.name) as available, needed
WHERE available = needed
RETURN name;

Counting nr of elements in a file

I am trying to count the number of harbour elements in an XML file. However, I keep getting the following error:
item expected, sequence found: (element harbour {...}, ...)
The code snippet is the following:
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour eq $harbour)
return concat($harbour, " ", $count)
Input XML:
<voyage>
<number>4411</number>
<leftpage>
<harbour>Rammekens</harbour>
</leftpage>
</voyage>
<voyage>
<number>4412</number>
<leftpage>
<harbour>Texel</harbour>
</leftpage>
</voyage>
Can someone help me out? How do I iterate over the number of harbours in the XML file instead of trying to use /VOC/voyage/leftpage/harbour?
eq is a value comparison, i.e. it is used to compare individual items. That is why the error message tells you that it is expecting a (single) item, but instead found all the harbour elements. You have to use the general comparison operator =. However, if you compare it like this
/VOC/voyage/leftpage/harbour = $harbour
the count would always be 1, because the comparison only checks for existence and yields a single boolean. Instead, you want to keep only those harbour elements whose text equals $harbour. You can do so using a predicate in []. All together it will be
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour[. = $harbour])
return concat($harbour, " ", $count)
Also, if your XQuery processor supports XQuery 3.0, you can use a group by clause, which in my opinion is nicer to read (and could be faster, but this depends on the implementation):
for $voyage in /VOC/voyage
let $harbour := $voyage/leftpage/harbour
let $harbour-name := $harbour/string()
group by $harbour-name
return $harbour-name || " " || count($harbour)
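The grouping idea is the same as counting occurrences per key. A short Python sketch with collections.Counter shows the expected shape of the result; the VOC wrapper element and the third voyage (4413) are assumptions added to the sample from the question so that one harbour occurs twice.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in an assumed <VOC> root;
# voyage 4413 is invented to give Texel a count of 2.
SAMPLE = """
<VOC>
  <voyage><number>4411</number><leftpage><harbour>Rammekens</harbour></leftpage></voyage>
  <voyage><number>4412</number><leftpage><harbour>Texel</harbour></leftpage></voyage>
  <voyage><number>4413</number><leftpage><harbour>Texel</harbour></leftpage></voyage>
</VOC>
"""

root = ET.fromstring(SAMPLE)

# Count how often each harbour name occurs.
counts = Counter(
    harbour.text for harbour in root.findall("./voyage/leftpage/harbour")
)

for name, count in sorted(counts.items()):
    print(name, count)
```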

Compare two elements of the same document in MarkLogic

I have a MarkLogic 8 database in which there are documents which have two date time fields:
created-on
active-since
I am trying to write an XQuery to find all the documents for which the value of active-since is less than the value of created-on.
Currently I am using the following FLWOR expression:
for $entity in fn:collection("entities")
let $id := fn:data($entity//id)
let $created-on := fn:data($entity//created-on)
let $active-since := fn:data($entity//active-since)
where $active-since < $created-on
return
(
$id,
$created-on,
$active-since
)
The above query takes too long to execute and with increase in the number of documents the execution time of this query will also increase.
Also, I have an element-range-index for both of the above-mentioned dateTime fields, but they are not getting used here. The cts-element-query function only compares one element with a set of atomic values. In my case I am trying to compare two elements of the same document.
I think there should be a better and optimized solution for this problem.
Please let me know in case there is any search function or any other approach which will be suitable in this scenario.
This may be efficient enough for you.
Take one of the values and build a range query per value. This all uses the range indexes, so in that sense it is efficient. However, at some point a large query is built. It reads similarly to a FLWOR statement. If you really wanted to be a bit more efficient, you could find out which of your elements has fewer unique values (the size of the index) and use that for your iteration, thus building a smaller query. Also, you will note that in the element-values call I constrain it to your collection. This is just in case you happen to have that element in documents outside of your collection. It keeps the list to only those values you know are in your collection:
let $q := cts:or-query(
  for $created-on in cts:element-values(xs:QName("created-on"), (), (), cts:collection-query("entities"))
  return cts:element-range-query(xs:QName("active-since"), "<", $created-on)
)
return
cts:search(
fn:collection("entities"),
$q
)
So, let's explain what is happening with a simple example:
Let's say I have elements A and B, each with a range index defined.
Let's pretend we have combinations like this in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9
let $q := cts:or-query(
  for $a in cts:element-values(xs:QName("A"))
  return cts:element-range-query(xs:QName("B"), "<", $a)
)
This would create the following query:
cts:or-query(
  (
    cts:element-range-query(xs:QName("B"), "<", 2),
    cts:element-range-query(xs:QName("B"), "<", 4),
    cts:element-range-query(xs:QName("B"), "<", 5)
  )
)
And in the example above, the only match would be the document with the combination: (5,4)
You might try using cts:value-tuples(). Pass in three references: active-since, created-on, and the URI reference. Then iterate over the results looking for ones where active-since is less than created-on, and you'll have the URI of the doc.
It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
I am now using the following script to get the count of documents for which the value of active-since is less than the value of created-on:
fn:sum(
for $value-pairs in cts:value-tuples(
(
cts:element-reference(xs:QName("created-on")),
cts:element-reference(xs:QName("active-since"))
),
("fragment-frequency"),
cts:collection-query("entities")
)
let $created-on := json:array-values($value-pairs)[1]
let $active-since := json:array-values($value-pairs)[2]
return
if($active-since lt $created-on) then cts:frequency($value-pairs) else 0
)
Sorry, I don't have enough reputation, so I have to comment on your answer here. Why do you think that ML will not return (2,3) and (4,2)? I believe we are using an or-query, which matches a document as soon as any single sub-query is true.
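The A/B example can be checked with a small Python simulation. This is only a model, not MarkLogic: it assumes that the or-query matches a document as soon as its B value is less than any distinct value of A found in the index, and compares that with a per-document test of B < A.

```python
# (A, B) pairs from the five example documents.
docs = [(2, 3), (4, 2), (2, 7), (5, 4), (2, 9)]

# Distinct values of A, as an element-values call would list them.
distinct_a = sorted({a for a, _ in docs})

# Modelled or-query: B is less than ANY distinct A value.
or_query_matches = [doc for doc in docs if any(doc[1] < a for a in distinct_a)]

# Intended condition: B is less than A within the same document.
same_doc_matches = [doc for doc in docs if doc[1] < doc[0]]

print(or_query_matches)  # [(2, 3), (4, 2), (5, 4)]
print(same_doc_matches)  # [(4, 2), (5, 4)]
```

Under this model the or-query also returns (2,3), because its B value is compared against A values that come from other documents.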

Python: get all values associated with key in a dictionary, where the values may be a list or a single item

I'm looking to get all values associated with a key in a dictionary. Sometimes the key holds a single dictionary, sometimes a list of dictionaries.
a = {
    'shelf': {
        'book': {'title': 'the catcher in the rye', 'author': 'j d salinger'}
    }
}
b = {
    'shelf': [
        {'book': {'title': 'kafka on the shore', 'author': 'haruki murakami'}},
        {'book': {'title': 'atomised', 'author': 'michel houellebecq'}}
    ]
}
Here's my method to read the titles of every book on the shelf.
def print_books(d):
    if(len(d['shelf']) == 1):
        print(d['shelf']['book']['title'])
    else:
        for book in d['shelf']:
            print(book['book']['title'])
It works, but doesn't look neat or pythonic. The for loop fails on the single value case, hence the if/else.
Can you improve on this?
Given that your code will break if you have a list with a single item (and this is how I think it should be), if you really can't change your data structure, this is a bit more robust and logical:
def print_books(d):
    if isinstance(d['shelf'], dict):
        print(d['shelf']['book']['title'])
    else:
        for book in d['shelf']:
            print(book['book']['title'])
Why not always make 'shelf' map to a list of elements, but in the single element case it's a ... single element list? Then you'd always be able to treat each bookshelf the same.
def print_books(d):
    container = d['shelf']
    # Wrap a single shelf entry in a list so both cases look the same.
    books = container if isinstance(container, list) else [container]
    books = [e['book'] for e in books]
    for book in books:
        print(book['title'])
I would first get the input consistent, then loop through all the books even if only one.
def print_books(d):
    books = d['shelf'] if type(d['shelf']) is list else [d['shelf']]
    for book in books:
        print(book['book']['title'])
I think this looks a little neater and more pythonic, although some might argue that creating a list with one element and looping through it is not as efficient as your original code.
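A variation on the same normalising idea, as a sketch: returning the titles instead of printing them makes the function easy to check against both sample dictionaries. The name book_titles is hypothetical and not from any of the answers above.

```python
def book_titles(d):
    """Return all titles, whether 'shelf' holds one dict or a list of dicts."""
    shelf = d["shelf"]
    # Normalise the single-dict case to a one-element list.
    entries = shelf if isinstance(shelf, list) else [shelf]
    return [entry["book"]["title"] for entry in entries]


a = {"shelf": {"book": {"title": "the catcher in the rye", "author": "j d salinger"}}}
b = {
    "shelf": [
        {"book": {"title": "kafka on the shore", "author": "haruki murakami"}},
        {"book": {"title": "atomised", "author": "michel houellebecq"}},
    ]
}

print(book_titles(a))
print(book_titles(b))
```

Unlike the length check in the question, this also handles a list that happens to contain exactly one book.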
