Recursion with XQuery with example - xquery

I have a database that looks like this (accessible via $database):
<country car_code="F" area="547030" capital="cty-france-paris">
<name>France</name>
<border country="AND" length="60"/>
<border country="E" length="623"/>
<border country="D" length="451"/>
<border country="I" length="488"/>
<border country="CH" length="573"/>
<border country="B" length="620"/>
<border country="L" length="73"/>
<border country="MC" length="4.4"/>
</country>
.....
other countries
I would like to write a function that gives the names of all countries reachable from France (or any other country) via land borders. A first attempt (probably with plenty of syntax errors and other errors, but the semantics of the program should be "more clear"):
declare function local:reachable($country as element())
as (return value should be a sequence of countries )
{
if $country == () (:if empty, it doesn't border to any other country:)
then ()
else(
$country/name UNION (for $bord in $country/border/@country return
local:reachable ($database/country/car_code = @bord ))
)
}
The call to that function:
local:reachable($database/country[@car_code = "F"])
The bordering countries to France should be:
<border country="AND" length="60"/>
<border country="E" length="623"/>
<border country="D" length="451"/>
<border country="I" length="488"/>
<border country="CH" length="573"/>
<border country="B" length="620"/>
<border country="L" length="73"/>
<border country="MC" length="4.4"/>
But we also need to find the bordering countries for these countries.
The final output should be "F", "AND", "E", "D", "I", "CH", "B", "L", "MC"..., X, Y, Z, (and any other countries that border these countries).
I know UNION is not defined, but is there anything else I can use? I only used it to make clearer what I want to do.
One big problem, other than the syntax errors, is that if "F" borders "L" then "L" also borders "F", so my "function" will never terminate. How can I handle that?
Could I get some help with the syntax?
If the question is not clear, please let me know so that I can clarify it further.

Before we start
Here are a few comments on your code:
$country as element() declares a parameter which MUST contain
exactly one element, so it can never be empty; use element()? if
the element is optional, element()* if there can be any number of
them, or element()+ if there must be one or more.
The sequence operator , can be used to construct sequences from
other sequences: (1,2) , (3,4) first constructs two sequences, (1,2) and
(3,4), then constructs another one containing all the items of
both, resulting in (1,2,3,4).
Data
Let me change the countries element slightly, removing the noise
to make it a bit simpler for this demonstration. I also create a
simple yet complete map. Let us say we have 2 adjacent countries, U
and K, and 4 others forming a square (each country is a neighbour of 2
others): N, G, B, and F. Any similarity to existing geography or
politics is only in your eyes :-)
<!--
Map: U K | N G
         | B F
-->
<countries>
<country id="U">
<name>Over the top</name>
<border idref="K"/>
</country>
<country id="K">
<name>Beyond the see</name>
<border idref="U"/>
</country>
<country id="N">
<name>Flatland</name>
<border idref="B"/>
<border idref="G"/>
</country>
<country id="G">
<name>Marxhome</name>
<border idref="N"/>
<border idref="F"/>
</country>
<country id="B">
<name>Beerium</name>
<border idref="N"/>
<border idref="F"/>
</country>
<country id="F">
<name>Grapeandcheese</name>
<border idref="B"/>
<border idref="G"/>
</country>
</countries>
Solution
The solution is a recursive function that consumes a queue of
countries to handle. Meanwhile, it accumulates the result list one
country at a time. It takes the first country in the queue, adds it to
the result, then recurses with the queue extended by all adjacent
countries that are in neither the queue nor the current result. The
augmented result is passed down as well.
xquery version "3.0";
declare variable $countries :=
<countries>
<!-- as above, just copy and paste it -->
</countries>;
declare function local:reachable(
$queue as element(country)*,
$result as element(country)*
) as element(country)*
{
if ( empty($queue) ) then (
(: we do not consider one country reachable from itself :)
tail($result)
)
else (
let $this := head($queue)
let $rest := tail($queue)
let $more := $this/border/@idref[not(. = ($queue, $result)/@id)]
return
local:reachable(
( $rest, $countries/country[@id = $more] ),
( $result, $this ))
)
};
(: for each country, display its reachable countries
:)
for $c in $countries/country
order by $c/@id
let $r := local:reachable($c, ())
return
$c/name || ': ' || string-join($r/@id, ', ')
Result
Beerium: N, G, F
Grapeandcheese: N, G, B
Marxhome: N, B, F
Beyond the see: U
Flatland: G, B, F
Over the top: K
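If you want to check the traversal logic outside an XQuery processor, the same queue-and-result idea can be sketched in a few lines of Python. This is only an illustration: the BORDERS map is a hand-written stand-in for the sample document, and the function returns ids in discovery order, whereas the XQuery above prints them in document order because $r/@id is a path expression.

```python
from collections import deque

# Hypothetical adjacency map mirroring the sample <countries> document.
BORDERS = {
    "U": ["K"],
    "K": ["U"],
    "N": ["B", "G"],
    "G": ["N", "F"],
    "B": ["N", "F"],
    "F": ["B", "G"],
}

def reachable(start, borders=BORDERS):
    """Return the ids reachable from start, in discovery order, excluding start."""
    queue = deque([start])
    result = []
    while queue:
        this = queue.popleft()
        result.append(this)
        # Only enqueue neighbours that are neither queued nor already handled;
        # this is what guarantees termination on cyclic borders.
        more = [b for b in borders[this] if b not in queue and b not in result]
        queue.extend(more)
    # Like tail($result): a country is not considered reachable from itself.
    return result[1:]

print(reachable("B"))  # ['N', 'F', 'G']
print(reachable("U"))  # ['K']
```

The membership test against both the queue and the result plays the role of the predicate on $more in the XQuery version.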

Related

Count pairs of distinct elements using Xquery

I would count all distinct items considering parent and child tag.
For example, if the document contains:
<root>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.4">
<E value="Y"/>
</D>
</C>
</B>
</A>
<A value="4.6">
<B value="3.1">
<C value="1.5">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.0">
<B value="3.6">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
</root>
I would count all distinct items for AB, ABC, ABCD, BCD etc.
I tried this XQuery command, but got only the distinct values for individual items:
count(distinct-values(doc('partitioncollection')/root/A/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/B/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/C/@value)) -> 2
More specifically, if I want to calculate the counts of all possible pairs of values, the results should show:
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
C/E : should be 3 because there are 3 distinct pairs of values:
<C value="1.4"><E value="X"/>
<C value="1.4"><E value="Y"/>
<C value="1.5"><E value="X"/>
A/D : should be 4 because there are 4 distinct pairs of values:
<A value="5.1"><D value="0.2">
<A value="5.1"><D value="0.4">
<A value="4.6"><D value="0.2">
<A value="5.0"><D value="0.2">
etc.
Since this is computationally complex, I believe it would be easier to create a function that takes a single set of tags (e.g. A-D or C-E) and returns the value of the counter.
I am still not sure the problem is precisely described (it is not clear whether the structure is always A/B/C/D/E, nor whether for deeper levels you only want to compare e.g. B/C if they are children of an A with the same value), but in general I think this can be solved using grouping and recursion; I came up with
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
for $element-group in $elements[*]
group by $element-name := node-name($element-group)
let $max-depth := max($element-group/count(.//*)) + 1
for $level in 2 to $max-depth
let $groups :=
for $path-group in $element-group
group by $group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/')
for $value-group in $path-group
group by $value-seq := string-join(subsequence($value-group/descendant-or-self::*/@value, 1, $level), '|')
return head($group-path)
for $level-group in $groups
group by $level-path := $level-group
order by head($level)
return $level-path || ' : ' || count($level-group)
,
if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
At https://xqueryfiddle.liberty-development.net/nbUY4kz/4 this outputs
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
I get the same result with BaseX.
The code might be a bit too complicated if the structure is always A/B/C/D/E; in that case the first grouping on node-name() is not needed.
I also had to group twice inside, on the "path" sequence (e.g. A/B/C) and on the @value sequence. There might be easier ways, but I wasn't able to relate unique @value sequences to the nesting structure without doing the duplicate grouping.
Perhaps
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
let $max-depth := max($elements/count(descendant-or-self::*))
for $level in 2 to $max-depth
let $groups :=
for $path-group in $elements
group by
$group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/'),
$value-seq := string-join(subsequence($path-group/descendant-or-self::*/@value, 1, $level), '|')
return head($group-path)
for $level-group in $groups
group by $level-path := $level-group
order by head($level)
return $level-path || ' : ' || count($level-group)
,
if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
at https://xqueryfiddle.liberty-development.net/nbUY4kz/6 is a bit simpler and still does the job.
I think the previous two attempts work fine as long as the subtrees all have one child but break if not; thus in the general case of an arbitrary number of child elements it seems necessary to compute paths and values from descendants and group them:
declare variable $value-separator as xs:string external := '|';
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
for $element-group in $elements[*]
group by $element-name := node-name($element-group)
return
(
let $groups :=
for $descendant-group in $element-group//*
group by
$path-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/node-name(), '/'),
$value-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/@value, $value-separator)
return $path-key
for $path-group in $groups
group by $path-key := $path-group
return $path-key || ' : ' || count($path-group)
,
if ($element-group/*) then local:distinct-descendants($element-group/*) else ()
)
};
local:distinct-descendants(root/*)
https://xqueryfiddle.liberty-development.net/nbUY4kz/14
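The question also asks for counts of non-adjacent pairs such as A/D or C/E, via a function that takes one set of tags. As a rough cross-check of the expected numbers, here is a minimal Python sketch (not XQuery, and not the approach used in the answers above). The names value_chains and count_distinct are made up for this illustration, and it assumes a "pair" means an element together with any descendant of the next tag.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, one <A> subtree per line.
XML = """<root>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.4"><E value="Y"/></D></C></B></A>
<A value="4.6"><B value="3.1"><C value="1.5"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.0"><B value="3.6"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
</root>"""

def value_chains(element, tags):
    """Yield one tuple of value attributes per descendant chain matching tags."""
    if not tags:
        yield ()
        return
    for node in element.iter(tags[0]):
        if node is element:
            continue  # iter() includes the element itself; only descendants count
        for rest in value_chains(node, tags[1:]):
            yield (node.get("value"),) + rest

def count_distinct(root, tags):
    """Count distinct value tuples for the given tag chain, e.g. ["A", "D"]."""
    return len(set(value_chains(root, tags)))

root = ET.fromstring(XML)
print(count_distinct(root, ["A", "D"]))  # 4
print(count_distinct(root, ["C", "E"]))  # 3
```

Collecting the tuples into a set is what removes duplicates, which is the role the two group by keys play in the XQuery versions.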

What is the "some" meaning in Collect result in Scala

"Some" is not a distinctive search term, so Google seems to simply ignore it when I search.
What I am asking about comes from the following exercise:
b.collect:
Array[(Int, String)] = Array((3,dog), (6,salmon), (3,rat), (8,elephant))
d.collect:
Array[(Int, String)] = Array((3,dog), (3,cat), (6,salmon), (6,rabbit), (4,wolf), (7,penguin))
If I do a join and then collect the result, like b.join(d).collect, I will get the following:
Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)))
which seems understandable. However, if I do b.leftOuterJoin(d).collect, I will get:
Array[(Int, (String, Option[String]))] = Array((6,(salmon,Some(salmon))), (6,(salmon,Some(rabbit))), (3,(dog,Some(dog))), (3,(dog,Some(cat))), (3,(rat,Some(dog))), (3,(rat,Some(cat))), (8,(elephant,None)))
My question is: why are the results expressed differently, i.e. why does the second result contain "Some"? What is the difference between having "Some" and not having it? Can "Some" be removed? Does "Some" have any impact on later operations on the content of the RDD?
Thank you very much.
When you do the normal join as b.join(d).collect, you get Array[(Int, (String, String))].
This is because a join only keeps keys that are present in both RDD b and RDD d, so both sides are always guaranteed to have a value.
But when you use b.leftOuterJoin(d).collect, the return type is Array[(Int, (String, Option[String]))]. This is to handle missing values: in leftOuterJoin, there is no guarantee that all the keys of RDD b are available in RDD d, so the right-hand side is returned as Option[String], which has two possible forms:
Some(String) => if the key is matched in both RDDs
None => if the key is present in b but not in d
You can get rid of Some by unwrapping the value and providing a default for the None case, as below.
val z = b.leftOuterJoin(d).map(x => (x._1, (x._2._1, x._2._2.getOrElse("")))).collect
Now you should get Array[(Int, (String, String))], with output such as
Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)), (8,(elephant,)))
You can replace "" with any other default string you require.
Hope this helps.
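To see why the right-hand side has to be optional, it may help to look at the join semantics without Spark. The following is a plain-Python sketch, not Spark code: left_outer_join is a made-up helper, ordinary lists stand in for the RDDs, and Python's None plays the role of Scala's None (a present value is simply left unwrapped instead of being wrapped in Some).

```python
from collections import defaultdict

def left_outer_join(left, right):
    """Pair every (key, value) in left with each matching right value, or None."""
    by_key = defaultdict(list)
    for key, value in right:
        by_key[key].append(value)
    joined = []
    for key, value in left:
        # A key missing on the right still yields one row, with None as partner.
        for match in by_key.get(key, [None]):
            joined.append((key, (value, match)))
    return joined

b = [(3, "dog"), (6, "salmon"), (3, "rat"), (8, "elephant")]
d = [(3, "dog"), (3, "cat"), (6, "salmon"), (6, "rabbit"), (4, "wolf"), (7, "penguin")]

rows = left_outer_join(b, d)
# Rough equivalent of getOrElse(""): substitute a default for the missing side.
flat = [(k, (l, r if r is not None else "")) for k, (l, r) in rows]
```

Only the (8, elephant) row ends up with None, because 8 is the only key of b that does not occur in d.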

range index on mixed content node in exist db

My xml file is with the structure
<root>
<compound>abc<parts>a b c</parts></compound>
<compound>xyz<parts>x y z</parts></compound>
</root>
I have created a range index on
<range>
<create qname="compound" type="xs:string"/>
</range>
I expected the index terms to be abca b c and xyzx y z, but I found abc and xyz under the index link in the monitoring and profiling window. Also, the search
//compound[.="abca b c"] gives 0 results.
Can anyone help me create an index on the whole contents of compound, i.e. on abca b c, xyzx y z and so on?
Thanks
sony
In XQuery, you have to use the data() function in order to return the values of all the descendants or sub-elements.
So, to test whether the values of the compound element can be returned, you can use the following:
//compound/data()[.="abca b c"]
The nested="yes" attribute solved the problem.
I changed the range index to
<range>
<create qname="compound" type="xs:string" nested="yes" />
</range>

How to perform a 'join' with a sub element in XQuery

I'm not sure how to word what I am trying to do, but I am trying to get all tuples of FID, BID, and SOMETHING. Consider the following XML:
<FOO>
<FID>f1</FID>
<NAME>f1</NAME>
<BAR>
<BID>b1</BID>
<SOMETHING>15</SOMETHING>
</BAR>
<BAR>
<BID>b2</BID>
<SOMETHING>25</SOMETHING>
</BAR>
</FOO>
<FOO>
<FID>f2</FID>
<NAME>f2</NAME>
<BAR>
<BID>b1</BID>
<SOMETHING>35</SOMETHING>
</BAR>
<BAR>
<BID>b3</BID>
<SOMETHING>0</SOMETHING>
</BAR>
</FOO>
What I need is:
b1 f1 15
b1 f2 35
b2 f1 25
b3 f2 0
Anyone know the syntax that I would use?
I tried:
for $foo in /root/FOO
for $bar in /root/FOO/BAR
let $fid := $foo/FID/text() where $foo/BAR/BID/text()=$bar/BID/text()
let $bid := $foo/BAR/BID/text() where $foo/BAR/BID/text()=$bar/BID/text()
let $something := $foo/BAR/SOMETHING/text() where $foo/BAR/BID/text()=$bar/BID/text()
If you'd wanted to order by the <FID/> elements first, it'd be as easy as looping over the <FOO/>s, for each of them over its <BAR/>s and dumping a string each time:
for $foo in /root/FOO
for $bar in $foo/BAR
return string-join(($foo/FID, $bar/BID, $bar/SOMETHING), ' ')
For grouping by <BID/>s, you have to loop over those first and collect the other information relatively:
for $bar in //BAR
order by $bar/BID
return string-join(($bar/BID, $bar/../FID, $bar/SOMETHING), ' ')
A small remark (as I wasn't able to run your code without major cleanup): $foo and $FOO are not the same, XQuery is case sensitive. Furthermore, you're missing a return clause.
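As a quick way to double-check the expected rows, the same parent/child "join" can be sketched in Python with the standard library. This is only an illustration under assumptions: the two FOO elements are wrapped in a root element (as the /root/FOO path in the question suggests), and bid_fid_something is a made-up name.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <root> element.
XML = """<root>
<FOO><FID>f1</FID><NAME>f1</NAME>
  <BAR><BID>b1</BID><SOMETHING>15</SOMETHING></BAR>
  <BAR><BID>b2</BID><SOMETHING>25</SOMETHING></BAR>
</FOO>
<FOO><FID>f2</FID><NAME>f2</NAME>
  <BAR><BID>b1</BID><SOMETHING>35</SOMETHING></BAR>
  <BAR><BID>b3</BID><SOMETHING>0</SOMETHING></BAR>
</FOO>
</root>"""

def bid_fid_something(xml_text):
    """Return 'BID FID SOMETHING' lines, ordered by BID and then FID."""
    root = ET.fromstring(xml_text)
    rows = [
        (bar.findtext("BID"), foo.findtext("FID"), bar.findtext("SOMETHING"))
        # Iterate each FOO's own BAR children, so every BAR stays tied to its parent.
        for foo in root.findall("FOO")
        for bar in foo.findall("BAR")
    ]
    return [" ".join(row) for row in sorted(rows)]

for line in bid_fid_something(XML):
    print(line)
```

The key point is the same as in the XQuery: the inner loop must be relative to the current FOO, otherwise every BAR is paired with every FOO.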

xQuery category nesting issue

I have a problem nesting the result tags in each other the right way.
The result should look like this:
aimed result
<categoryA>
<position>...</position>
<position>...</position>
...
</categoryA>
<categoryB>
<position>...</position>
<position>...</position>
...
</categoryB>
Currently I have only managed to get the right results for the positions. Categories A and B are one hierarchy level above the positions, and the positions should be nested inside the categories. The categories can be referenced by let $y := $d/Bilanz/Aktiva/* (or, respectively, $d/Bilanz/Aktiva/LangfristigesVermoegen and $d/Bilanz/Aktiva/KurzfristigesVermoegen).
Here is my query:
query
let $d := doc('http://etutor.dke.uni-linz.ac.at/etutor/XML?id=5001')/Bilanzen
let $a02 := $d/Bilanz[@jahr='2002']/Aktiva/*
let $a03 := $d/Bilanz[@jahr='2003']/Aktiva/*
for $n02 in $a02//* , $n03 in $a03//*
(:
where name($n02) = name($n03)
where node-name($n02) = node-name($n03)
:)
where name($n02) = name($n03)
return <position name="{node-name($n02)}">
<j2002>{data($n02/@summe)}</j2002>
<j2003>{data($n03/@summe)}</j2003>
<diff>{data($n03/@summe) - data($n02/@summe)}</diff>
</position>
xml
<Bilanzen>
<Bilanz jahr="2002">
<Aktiva>
<LangfristigesVermoegen>
<Sachanlagen summe="1486575.8"/>
<ImmateriellesVermoegen summe="67767.2"/>
<AssoziierteUnternehmen summe="190826.3"/>
<AndereBeteiligungen summe="507692.7"/>
<Uebrige summe="92916.4"/>
</LangfristigesVermoegen>
<KurzfristigesVermoegen>
<Vorraete summe="78830.9"/>
<Forderungen summe="198210.3"/>
<Finanzmittel summe="181102.0"/>
</KurzfristigesVermoegen>
</Aktiva>
<Passiva>
<Eigenkapital>
<Grundkapital summe="91072.4"/>
<Kapitalruecklagen summe="186789.5"/>
<Gewinnruecklagen summe="798176.2"/>
<Bewertungsruecklagen summe="-34922.4"/>
<Waehrungsumrechnung summe="0"/>
<EigeneAktien summe="0"/>
</Eigenkapital>
<AnteileGesellschafter summe="23613.1"/>
<LangfristigeVerb>
<Finanzverbindlichkeiten summe="680007.1"/>
<Steuern summe="36555.8"/>
<Rueckstellungen summe="429286.1"/>
<Baukostenzuschuesse summe="169246.0"/>
<Uebrige summe="36166.9"/>
</LangfristigeVerb>
<KurzfristigeVerb>
<Finanzverbindlichkeiten summe="14614.6"/>
<Steuern summe="65247.6"/>
<Lieferanten summe="94939.2"/>
<Rueckstellungen summe="123664.8"/>
<Uebrige summe="89464.8"/>
</KurzfristigeVerb>
</Passiva>
</Bilanz>
<Bilanz jahr="2003">
<Aktiva>
<LangfristigesVermoegen>
<Sachanlagen summe="1590313.7"/>
<ImmateriellesVermoegen summe="69693.2"/>
<AssoziierteUnternehmen summe="198224.7"/>
<AndereBeteiligungen summe="418489.3"/>
<Uebrige summe="104566.7"/>
</LangfristigesVermoegen>
<KurzfristigesVermoegen>
<Vorraete summe="20609.8"/>
<Forderungen summe="289458.5"/>
<Finanzmittel summe="302445.9"/>
</KurzfristigesVermoegen>
</Aktiva>
<Passiva>
<Eigenkapital>
<Grundkapital summe="91072.4"/>
<Kapitalruecklagen summe="186789.5"/>
<Gewinnruecklagen summe="875723.4"/>
<Bewertungsruecklagen summe="-15459.5"/>
<Waehrungsumrechnung summe="-633.7"/>
<EigeneAktien summe="0"/>
</Eigenkapital>
<AnteileGesellschafter summe="22669.8"/>
<LangfristigeVerb>
<Finanzverbindlichkeiten summe="733990.2"/>
<Steuern summe="68156.8"/>
<Rueckstellungen summe="395997.2"/>
<Baukostenzuschuesse summe="177338.5"/>
<Uebrige summe="38064.9"/>
</LangfristigeVerb>
<KurzfristigeVerb>
<Finanzverbindlichkeiten summe="6634.7"/>
<Steuern summe="97119.1"/>
<Lieferanten summe="89606.0"/>
<Rueckstellungen summe="128237.5"/>
<Uebrige summe="98495.2"/>
</KurzfristigeVerb>
</Passiva>
</Bilanz>
</Bilanzen>
I would really appreciate some help; I have no clue at all. Thank you.
If I understand you correctly, you want the information about LangfristigesVermoegen (and its children) to be grouped in the output under element categoryA, and the information about KurzfristigesVermoegen to be grouped under categoryB.
So you will want first of all to do something to generate the categoryA and categoryB elements. For example,
let $d := doc(...)/Bilanzen
return (
<categoryA>{ ... children of category A here ... }</categoryA>,
<categoryB>{ ... children of category B here ... }</categoryB>
)
The positions in each category can be generated using code similar to what you've now got, except that instead of iterating over
for $n02 in $a02//* , $n03 in $a03//*
you will need to iterate over $a02[self::LangfristigesVermoegen]/* for category A, and over $a02[self::KurzfristigesVermoegen]/* for category B (and similarly, of course, over the corresponding parts of $a03).
If the set of categories is not static and you just want to group things in the output using the same grouping elements present in the input, then you'll want an outer structure something like this:
(: assumption: $anno2002 and $anno2003 are bound to the Aktiva elements of the two years :)
for $assetclass1 in $anno2002/*
let $assetclass2 := $anno2003/*[name() = name($assetclass1)]
return
(element {name($assetclass1)} {
for $old in $assetclass1/*,
$new in $assetclass2/*
where name($old) eq name($new)
return <position name="{node-name($old)}">
<j2002>{data($old/@summe)}</j2002>
<j2003>{data($new/@summe)}</j2003>
<diff>{data($new/@summe) - data($old/@summe)}</diff>
</position>
})

Resources