I would like to count all distinct items, taking the parent and child tags into account.
For example, if the document contains:
<root>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.4">
<E value="Y"/>
</D>
</C>
</B>
</A>
<A value="4.6">
<B value="3.1">
<C value="1.5">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.0">
<B value="3.6">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
</root>
I would like to count all distinct items for AB, ABC, ABCD, BCD etc.
I tried these XQuery expressions, but only got the distinct values for individual items:
count(distinct-values(doc('partitioncollection')/root/A/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/A/B/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/A/B/C/@value)) -> 2
More specifically, if I want to calculate the counts of all possible combinations of values, the results should show:
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
C/E : should be 3 because there are 3 distinct pairs of values:
<C value="1.4"><E value="X"/>
<C value="1.4"><E value="Y"/>
<C value="1.5"><E value="X"/>
A/D : should be 4 because there are 4 distinct pairs of values:
<A value="5.1"><D value="0.2">
<A value="5.1"><D value="0.4">
<A value="4.6"><D value="0.2">
<A value="5.0"><D value="0.2">
etc.
Since this is computationally complex, I believe it would be easier to create a function that takes a single set of tags (i.e. A-D or C-E etc.) and returns the count for it.
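The helper described above (one tag sequence in, one count out) can be sketched outside XQuery. This is only a minimal Python sketch under assumptions: the names value_tuples and count_distinct are hypothetical, and each tag in the sequence is treated as a descendant (not necessarily a child) of the previous one, so A-D and C-E work the same way as A/B.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, one <A> subtree per line.
DOC = """<root>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.4"><E value="Y"/></D></C></B></A>
<A value="4.6"><B value="3.1"><C value="1.5"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.0"><B value="3.6"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
</root>"""


def value_tuples(element, tags, values=()):
    """Yield one tuple of 'value' attributes per nested match of tags."""
    if not tags:
        yield values
        return
    for descendant in element.iter(tags[0]):
        if descendant is element:  # iter() also visits the element itself
            continue
        yield from value_tuples(
            descendant, tags[1:], values + (descendant.get("value"),)
        )


def count_distinct(root, tags):
    """Count distinct value combinations for a tag sequence (hypothetical helper)."""
    return len(set(value_tuples(root, list(tags))))


root = ET.fromstring(DOC)
for tags in (["A", "B"], ["A", "D"], ["C", "E"], ["D", "E"]):
    print("/".join(tags), ":", count_distinct(root, tags))
```

For the sample document this prints 3, 4, 3 and 2, which matches the counts listed above for A/B, A/D, C/E and D/E.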
I am still not sure the problem is precisely described (it is not clear whether the structure is always A/B/C/D/E, nor whether for deeper levels you only want to compare e.g. B/C if they are children of an A with the same value), but in general I think this can be solved using grouping and recursion. I came up with
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
    for $element-group in $elements[*]
    group by $element-name := node-name($element-group)
    let $max-depth := max($element-group/count(.//*)) + 1
    for $level in 2 to $max-depth
    let $groups :=
        for $path-group in $element-group
        group by $group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/')
        for $value-group in $path-group
        group by $value-seq := string-join(subsequence($value-group/descendant-or-self::*/@value, 1, $level), '|')
        return head($group-path)
    for $level-group in $groups
    group by $level-path := $level-group
    order by head($level)
    return $level-path || ' : ' || count($level-group)
    ,
    if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
At https://xqueryfiddle.liberty-development.net/nbUY4kz/4 this outputs
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
I get the same result with BaseX.
The code might be a bit too complicated if the structure is always A/B/C/D/E; in that case the first grouping on the node-name() is not needed.
I also had to group twice inside, on the "path" sequence (e.g. A/B/C) and on the @value sequence. There might be easier ways, but I wasn't able to relate unique @value sequences to the nesting structure without doing the duplicate grouping.
Perhaps
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
    let $max-depth := max($elements/count(descendant-or-self::*))
    for $level in 2 to $max-depth
    let $groups :=
        for $path-group in $elements
        group by
            $group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/'),
            $value-seq := string-join(subsequence($path-group/descendant-or-self::*/@value, 1, $level), '|')
        return head($group-path)
    for $level-group in $groups
    group by $level-path := $level-group
    order by head($level)
    return $level-path || ' : ' || count($level-group)
    ,
    if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
at https://xqueryfiddle.liberty-development.net/nbUY4kz/6 is a bit simpler and still does the job.
I think the previous two attempts work fine as long as the subtrees all have one child but break if not; thus in the general case of an arbitrary number of child elements it seems necessary to compute paths and values from descendants and group them:
declare variable $value-separator as xs:string external := '|';
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
    for $element-group in $elements[*]
    group by $element-name := node-name($element-group)
    return
    (
        let $groups :=
            for $descendant-group in $element-group//*
            group by
                $path-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/node-name(), '/'),
                $value-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/@value, $value-separator)
            return $path-key
        for $path-group in $groups
        group by $path-key := $path-group
        return $path-key || ' : ' || count($path-group)
        ,
        if ($element-group/*) then local:distinct-descendants($element-group/*) else ()
    )
};
local:distinct-descendants(root/*)
https://xqueryfiddle.liberty-development.net/nbUY4kz/14
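To make the idea of that last version easier to follow (build a path key and a value key for every element/descendant pair, then count distinct value keys per path), here is a rough Python sketch of the same grouping. It is not a translation of the XQuery; the function name distinct_path_counts is hypothetical, and tuples are used for the value key instead of a joined string, which avoids picking a separator.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample document from the question, one <A> subtree per line.
DOC = """<root>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.4"><E value="Y"/></D></C></B></A>
<A value="4.6"><B value="3.1"><C value="1.5"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.0"><B value="3.6"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
</root>"""


def distinct_path_counts(root):
    """Map each tag path below root to its number of distinct value sequences."""
    seen = defaultdict(set)  # path key -> set of value keys

    def visit(element, chain):
        # chain holds (tag, value) for the ancestors below root plus this element.
        chain = chain + [(element.tag, element.get("value"))]
        # Every suffix of the chain with at least two entries is one path
        # from some ancestor down to the current element.
        for start in range(len(chain) - 1):
            path_key = "/".join(tag for tag, _ in chain[start:])
            value_key = tuple(value for _, value in chain[start:])
            seen[path_key].add(value_key)
        for child in element:
            visit(child, chain)

    for top in root:
        visit(top, [])
    return {path: len(values) for path, values in seen.items()}


counts = distinct_path_counts(ET.fromstring(DOC))
for path in sorted(counts):
    print(path, ":", counts[path])
```

Because every child is visited separately, the sketch does not depend on each subtree having exactly one child.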
"Some" is not a distinctive term, so Google seems to just ignore it when I search for it.
Here is what I ran into while learning:
b.collect:
Array[(Int, String)] = Array((3,dog), (6,salmon), (3,rat), (8,elephant))
d.collect:
Array[(Int, String)] = Array((3,dog), (3,cat), (6,salmon), (6,rabbit), (4,wolf), (7,penguin))
If I do a join and then collect the result, like b.join(d).collect, I get the following:
Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)))
which seems understandable. However, if I do b.leftOuterJoin(d).collect, I get:
Array[(Int, (String, Option[String]))] = Array((6,(salmon,Some(salmon))), (6,(salmon,Some(rabbit))), (3,(dog,Some(dog))), (3,(dog,Some(cat))), (3,(rat,Some(dog))), (3,(rat,Some(cat))), (8,(elephant,None)))
My question is why the two results are expressed differently. Why does the second result contain "Some"? What is the difference between a value with "Some" and one without? Can "Some" be removed? Does "Some" have any impact on later operations on the contents of the RDD?
Thank you very much.
When you do a normal join with b.join(d).collect, you get Array[(Int, (String, String))].
This is because an inner join only keeps the keys that are present in both RDD b and RDD d, so every result is guaranteed to have a value on both sides.
But when you use b.leftOuterJoin(d).collect, the return type is Array[(Int, (String, Option[String]))]. The Option is there to handle missing values without resorting to null: in a leftOuterJoin there is no guarantee that every key of RDD b is also present in RDD d, so the right-hand value is returned as Option[String], which takes one of two forms:
Some(String) => if the key is present in both RDDs
None => if the key is present in b but not in d
You can get rid of Some by extracting the value from it and supplying a default for the None case, as below.
val z = b.leftOuterJoin(d).map(x => (x._1, (x._2._1, x._2._2.getOrElse("")))).collect
Now you should get Array[(Int, (String, String))], with output like
Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), (3,(dog,cat)), (3,(rat,dog)), (3,(rat,cat)), (8,(elephant,)))
You can replace "" with any other string you require.
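To see the Some/None behaviour without a Spark cluster, the join semantics can be imitated on plain lists. The following is only a Python sketch of the idea, not Spark code: left_outer_join is a hypothetical helper, Python's None stands in for Scala's None, and a plain string stands in for Some(string).

```python
def left_outer_join(left, right):
    """Join two lists of (key, value) pairs, keeping every entry of left."""
    joined = []
    for key, left_value in left:
        matches = [value for other_key, value in right if other_key == key]
        if matches:
            # One output row per matching right-hand value.
            joined.extend((key, (left_value, match)) for match in matches)
        else:
            # No partner on the right: None marks the missing value.
            joined.append((key, (left_value, None)))
    return joined


b = [(3, "dog"), (6, "salmon"), (3, "rat"), (8, "elephant")]
d = [(3, "dog"), (3, "cat"), (6, "salmon"), (6, "rabbit"), (4, "wolf"), (7, "penguin")]

joined = left_outer_join(b, d)

# Equivalent in spirit to getOrElse(""): supply a default where nothing matched.
filled = [
    (key, (left_value, right_value if right_value is not None else ""))
    for key, (left_value, right_value) in joined
]
print(filled)
```

Only the elephant row has no partner in d, so it is the only row that needs the default. The row order differs from the Spark output above, since Spark does not guarantee any particular order after a join.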
Hope this helps.
My XML file has the following structure:
<root>
<compound>abc<parts>a b c</parts></compound>
<compound>xyz<parts>x y z</parts></compound>
</root>
I have created a range index with
<range>
<create qname="compound" type="xs:string"/>
</range>
I expected the index terms to be abca b c and xyzx y z, but I found abc and xyz under the index link in the monitoring and profiling window. Also, the search string
//compound[.="abca b c"] gives 0 results.
Can anyone help with creating an index on the whole contents of compound, i.e. on abca b c, xyzx y z and so on?
Thanks
sony
In XQuery, you can use the data() function to get the atomized value of an element, which includes the text of all its descendant elements.
So, to test whether the full value of the compound element can be matched, you can use the following:
//compound/data()[.="abca b c"]
The nested="yes" attribute solved the problem.
I have changed the range index to
<range>
<create qname="compound" type="xs:string" nested="yes" />
</range>
I'm not sure how to word what I am trying to do, but I am trying to get all tuples of FID, BID, and SOMETHING. Consider the following XML:
<FOO>
<FID>f1</FID>
<NAME>f1</NAME>
<BAR>
<BID>b1</BID>
<SOMETHING>15</SOMETHING>
</BAR>
<BAR>
<BID>b2</BID>
<SOMETHING>25</SOMETHING>
</BAR>
</FOO>
<FOO>
<FID>f2</FID>
<NAME>f2</NAME>
<BAR>
<BID>b1</BID>
<SOMETHING>35</SOMETHING>
</BAR>
<BAR>
<BID>b3</BID>
<SOMETHING>0</SOMETHING>
</BAR>
</FOO>
What I need is:
b1 f1 15
b1 f2 35
b2 f1 25
b3 f2 0
Anyone know the syntax that I would use?
I tried:
for $foo in /root/FOO
for $bar in /root/FOO/BAR
let $fid := $foo/FID/text() where $foo/BAR/BID/text()=$bar/BID/text()
let $bid := $foo/BAR/BID/text() where $foo/BAR/BID/text()=$bar/BID/text()
let $something := $foo/BAR/SOMETHING/text() where $foo/BAR/BID/text()=$bar/BID/text()
If you'd wanted to order by the <FID/> elements first, it'd be as easy as looping over the <FOO/>s, for each of them over its <BAR/>s and dumping a string each time:
for $foo in /root/FOO
for $bar in $foo/BAR
return string-join(($foo/FID, $bar/BID, $bar/SOMETHING), ' ')
For grouping by <BID/>s, you have to loop over those first and collect the other information relatively:
for $bar in //BAR
order by $bar/BID
return string-join(($bar/BID, $bar/../FID, $bar/SOMETHING), ' ')
A small remark (as I wasn't able to run your code without major cleanup): $foo and $FOO are not the same, because XQuery is case sensitive. Furthermore, you're missing a return clause.
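For comparison, the same "loop over FOO, then over its BARs, then order by BID" logic can be sketched in Python. This is an illustration only: the function name bid_fid_something is hypothetical, and the sample is wrapped in a <root> element on the assumption that the real document has one, as the /root/FOO paths suggest.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in an assumed <root> element.
DOC = """<root>
<FOO><FID>f1</FID><NAME>f1</NAME>
  <BAR><BID>b1</BID><SOMETHING>15</SOMETHING></BAR>
  <BAR><BID>b2</BID><SOMETHING>25</SOMETHING></BAR>
</FOO>
<FOO><FID>f2</FID><NAME>f2</NAME>
  <BAR><BID>b1</BID><SOMETHING>35</SOMETHING></BAR>
  <BAR><BID>b3</BID><SOMETHING>0</SOMETHING></BAR>
</FOO>
</root>"""


def bid_fid_something(root):
    """Return 'BID FID SOMETHING' strings, ordered by BID."""
    rows = []
    for foo in root.findall("FOO"):
        fid = foo.findtext("FID")
        # Only the BARs of this FOO, not every BAR in the document.
        for bar in foo.findall("BAR"):
            rows.append((bar.findtext("BID"), fid, bar.findtext("SOMETHING")))
    # sorted()/sort() is stable, so rows with equal BID stay in document order.
    rows.sort(key=lambda row: row[0])
    return [" ".join(row) for row in rows]


lines = bid_fid_something(ET.fromstring(DOC))
print("\n".join(lines))
```

ElementTree elements have no parent pointer, which is why the sketch walks down from each FOO instead of going up from each BAR with a parent step.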
I have a problem nesting the result tags in each other the right way.
The result should look like this:
aimed result
<categoryA>
<position>...</position>
<position>...</position>
...
</categoryA>
<categoryB>
<position>...</position>
<position>...</position>
...
</categoryB>
Currently I have only managed to get the right results for the positions. Categories A and B are one hierarchy level above the positions, and the positions should be nested inside the categories. The categories can be referenced by let $y := $d/Bilanz/Aktiva/* (respectively $d/Bilanz/Aktiva/LangfristigesVermoegen and $d/Bilanz/Aktiva/KurzfristigesVermoegen).
Here is my query:
query
let $d := doc('http://etutor.dke.uni-linz.ac.at/etutor/XML?id=5001')/Bilanzen
let $a02 := $d/Bilanz[@jahr='2002']/Aktiva/*
let $a03 := $d/Bilanz[@jahr='2003']/Aktiva/*
for $n02 in $a02//* , $n03 in $a03//*
(:
where name($n02) = name($n03)
where node-name($n02) = node-name($n03)
:)
where name($n02) = name($n03)
return <position name="{node-name($n02)}">
    <j2002>{data($n02/@summe)}</j2002>
    <j2003>{data($n03/@summe)}</j2003>
    <diff>{data($n03/@summe) - data($n02/@summe)}</diff>
</position>
xml
<Bilanzen>
<Bilanz jahr="2002">
<Aktiva>
<LangfristigesVermoegen>
<Sachanlagen summe="1486575.8"/>
<ImmateriellesVermoegen summe="67767.2"/>
<AssoziierteUnternehmen summe="190826.3"/>
<AndereBeteiligungen summe="507692.7"/>
<Uebrige summe="92916.4"/>
</LangfristigesVermoegen>
<KurzfristigesVermoegen>
<Vorraete summe="78830.9"/>
<Forderungen summe="198210.3"/>
<Finanzmittel summe="181102.0"/>
</KurzfristigesVermoegen>
</Aktiva>
<Passiva>
<Eigenkapital>
<Grundkapital summe="91072.4"/>
<Kapitalruecklagen summe="186789.5"/>
<Gewinnruecklagen summe="798176.2"/>
<Bewertungsruecklagen summe="-34922.4"/>
<Waehrungsumrechnung summe="0"/>
<EigeneAktien summe="0"/>
</Eigenkapital>
<AnteileGesellschafter summe="23613.1"/>
<LangfristigeVerb>
<Finanzverbindlichkeiten summe="680007.1"/>
<Steuern summe="36555.8"/>
<Rueckstellungen summe="429286.1"/>
<Baukostenzuschuesse summe="169246.0"/>
<Uebrige summe="36166.9"/>
</LangfristigeVerb>
<KurzfristigeVerb>
<Finanzverbindlichkeiten summe="14614.6"/>
<Steuern summe="65247.6"/>
<Lieferanten summe="94939.2"/>
<Rueckstellungen summe="123664.8"/>
<Uebrige summe="89464.8"/>
</KurzfristigeVerb>
</Passiva>
</Bilanz>
<Bilanz jahr="2003">
<Aktiva>
<LangfristigesVermoegen>
<Sachanlagen summe="1590313.7"/>
<ImmateriellesVermoegen summe="69693.2"/>
<AssoziierteUnternehmen summe="198224.7"/>
<AndereBeteiligungen summe="418489.3"/>
<Uebrige summe="104566.7"/>
</LangfristigesVermoegen>
<KurzfristigesVermoegen>
<Vorraete summe="20609.8"/>
<Forderungen summe="289458.5"/>
<Finanzmittel summe="302445.9"/>
</KurzfristigesVermoegen>
</Aktiva>
<Passiva>
<Eigenkapital>
<Grundkapital summe="91072.4"/>
<Kapitalruecklagen summe="186789.5"/>
<Gewinnruecklagen summe="875723.4"/>
<Bewertungsruecklagen summe="-15459.5"/>
<Waehrungsumrechnung summe="-633.7"/>
<EigeneAktien summe="0"/>
</Eigenkapital>
<AnteileGesellschafter summe="22669.8"/>
<LangfristigeVerb>
<Finanzverbindlichkeiten summe="733990.2"/>
<Steuern summe="68156.8"/>
<Rueckstellungen summe="395997.2"/>
<Baukostenzuschuesse summe="177338.5"/>
<Uebrige summe="38064.9"/>
</LangfristigeVerb>
<KurzfristigeVerb>
<Finanzverbindlichkeiten summe="6634.7"/>
<Steuern summe="97119.1"/>
<Lieferanten summe="89606.0"/>
<Rueckstellungen summe="128237.5"/>
<Uebrige summe="98495.2"/>
</KurzfristigeVerb>
</Passiva>
</Bilanz>
</Bilanzen>
I would really appreciate some help, I have no clue at all. Thank you.
If I understand you correctly, you want the information about LangfristigesVermoegen (and its children) to be grouped in the output under element categoryA, and the information about KurzfristigesVermoegen to be grouped under categoryB.
So you will want first of all to do something to generate the categoryA and categoryB elements. For example,
let $d := doc(...)/Bilanzen
return (
<categoryA>{ ... children of category A here ... }</categoryA>,
<categoryB>{ ... children of category B here ... }</categoryB>
)
The positions in each category can be generated using code similar to what you've now got, except that instead of iterating over
for $n02 in $a02//* , $n03 in $a03//*
you will need to iterate over $a02[self::LangfristigesVermoegen]/* for category A, and over $a02[self::KurzfristigesVermoegen]/* for category B (and similarly, of course, for $a03).
If the set of categories is not static and you just want to group things in the output using the same grouping elements present in the input, then you'll want an outer structure something like this:
(: assumes $anno2002 and $anno2003 are bound to the Aktiva element of each year :)
for $assetclass1 in $anno2002/*
let $assetclass2 := $anno2003/*[name() = name($assetclass1)]
return
    (element {name($assetclass1)} {
        for $old in $assetclass1/*,
            $new in $assetclass2/*
        where name($old) eq name($new)
        return <position name="{node-name($old)}">
            <j2002>{data($old/@summe)}</j2002>
            <j2003>{data($new/@summe)}</j2003>
            <diff>{data($new/@summe) - data($old/@summe)}</diff>
        </position>
    })
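The outer/inner loop structure above (one output group per asset class, one position per matching child) can also be sketched in Python to check the pairing logic. This is a sketch under assumptions: grouped_positions is a hypothetical name, the data is a trimmed subset of the document from the question, and Decimal is used so that the differences come out without floating-point noise.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Trimmed subset of the Bilanzen document from the question.
DOC = """<Bilanzen>
<Bilanz jahr="2002"><Aktiva>
  <LangfristigesVermoegen>
    <Sachanlagen summe="1486575.8"/><Uebrige summe="92916.4"/>
  </LangfristigesVermoegen>
  <KurzfristigesVermoegen><Vorraete summe="78830.9"/></KurzfristigesVermoegen>
</Aktiva></Bilanz>
<Bilanz jahr="2003"><Aktiva>
  <LangfristigesVermoegen>
    <Sachanlagen summe="1590313.7"/><Uebrige summe="104566.7"/>
  </LangfristigesVermoegen>
  <KurzfristigesVermoegen><Vorraete summe="20609.8"/></KurzfristigesVermoegen>
</Aktiva></Bilanz>
</Bilanzen>"""


def grouped_positions(bilanzen, old_year, new_year):
    """Map each asset class to (name, old sum, new sum, difference) tuples."""
    old_assets = bilanzen.find(f"Bilanz[@jahr='{old_year}']/Aktiva")
    new_assets = bilanzen.find(f"Bilanz[@jahr='{new_year}']/Aktiva")
    result = {}
    for old_class in old_assets:
        # Pair the asset class with the same-named class of the other year.
        new_class = new_assets.find(old_class.tag)
        if new_class is None:
            continue
        positions = []
        for old in old_class:
            new = new_class.find(old.tag)
            if new is None:
                continue
            old_sum = Decimal(old.get("summe"))
            new_sum = Decimal(new.get("summe"))
            positions.append((old.tag, old_sum, new_sum, new_sum - old_sum))
        result[old_class.tag] = positions
    return result


grouped = grouped_positions(ET.fromstring(DOC), "2002", "2003")
for category, positions in grouped.items():
    print(category)
    for name, old_sum, new_sum, diff in positions:
        print("  ", name, old_sum, new_sum, diff)
```

Matching children inside the already matched asset class, rather than across all descendants, is what keeps same-named positions such as Uebrige from being paired across different categories.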