Count pairs of distinct elements using XQuery

I would like to count all distinct combinations of values, taking parent and child tags into account.
For example, if the document contains:
<root>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.4">
<E value="Y"/>
</D>
</C>
</B>
</A>
<A value="4.6">
<B value="3.1">
<C value="1.5">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.0">
<B value="3.6">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
</root>
I would like to count the distinct combinations for A/B, A/B/C, A/B/C/D, B/C/D, etc.
I tried the following XQuery expressions, but they only give the distinct values for individual elements:
count(distinct-values(doc('partitioncollection')/root/A/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/B/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/C/@value)) -> 2
More specifically, if I calculate the counts of all possible combinations of values, the results should be:
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
C/E : should be 3 because there are 3 distinct pairs of values:
<C value="1.4"><E value="X"/>
<C value="1.4"><E value="Y"/>
<C value="1.5"><E value="X"/>
A/D : should be 4 because there are 4 distinct pairs of values:
<A value="5.1"><D value="0.2">
<A value="5.1"><D value="0.4">
<A value="4.6"><D value="0.2">
<A value="5.0"><D value="0.2">
etc.
Since this is computationally complex, I believe it would be easier to create a function that takes a single set of tags (e.g. A-D or C-E) and returns the corresponding count.

I am still not sure the problem is precisely described: it is not clear whether the structure is always A/B/C/D/E, nor whether at deeper levels you only want to compare e.g. B/C if they are children of an A with the same value. In general, though, I think this can be solved using grouping and recursion; I came up with
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
for $element-group in $elements[*]
group by $element-name := node-name($element-group)
let $max-depth := max($element-group/count(.//*)) + 1
for $level in 2 to $max-depth
let $groups :=
for $path-group in $element-group
group by $group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/')
for $value-group in $path-group
group by $value-seq := string-join(subsequence($value-group/descendant-or-self::*/@value, 1, $level), '|')
return head($group-path)
for $level-group in $groups
group by $level-path := $level-group
order by head($level)
return $level-path || ' : ' || count($level-group)
,
if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
At https://xqueryfiddle.liberty-development.net/nbUY4kz/4 this outputs
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
I get the same result with BaseX.
The code might be a bit too complicated if the structure is always A/B/C/D/E, in that case the first grouping on the node-name() is not needed.
I also had to group twice inside, on the "path" sequence (e.g. A/B/C) and on the @value sequence; there might be easier ways, but I wasn't able to relate unique @value sequences to the nesting structure without the duplicate grouping.
Perhaps
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
let $max-depth := max($elements/count(descendant-or-self::*))
for $level in 2 to $max-depth
let $groups :=
for $path-group in $elements
group by
$group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/'),
$value-seq := string-join(subsequence($path-group/descendant-or-self::*/@value, 1, $level), '|')
return head($group-path)
for $level-group in $groups
group by $level-path := $level-group
order by head($level)
return $level-path || ' : ' || count($level-group)
,
if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
at https://xqueryfiddle.liberty-development.net/nbUY4kz/6 is a bit simpler and still does the job.
I think the previous two attempts work fine as long as the subtrees all have one child but break if not; thus in the general case of an arbitrary number of child elements it seems necessary to compute paths and values from descendants and group them:
declare variable $value-separator as xs:string external := '|';
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
for $element-group in $elements[*]
group by $element-name := node-name($element-group)
return
(
let $groups :=
for $descendant-group in $element-group//*
group by
$path-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/node-name(), '/'),
$value-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/@value, $value-separator)
return $path-key
for $path-group in $groups
group by $path-key := $path-group
return $path-key || ' : ' || count($path-group)
,
if ($element-group/*) then local:distinct-descendants($element-group/*) else ()
)
};
local:distinct-descendants(root/*)
https://xqueryfiddle.liberty-development.net/nbUY4kz/14
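The XQuery functions above report contiguous paths such as A/B/C. For the non-adjacent pairs the question also mentions (C/E, A/D), the single function it proposes could be sketched in Python as follows. This is only a cross-check of the expected counts, not a translation of the XQuery; the name count_distinct and the compacted copy of the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, compacted to one <A> subtree per line.
SAMPLE = """<root>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.1"><B value="3.5"><C value="1.4"><D value="0.4"><E value="Y"/></D></C></B></A>
<A value="4.6"><B value="3.1"><C value="1.5"><D value="0.2"><E value="X"/></D></C></B></A>
<A value="5.0"><B value="3.6"><C value="1.4"><D value="0.2"><E value="X"/></D></C></B></A>
</root>"""


def count_distinct(root, tags):
    """Count distinct tuples of value attributes along nested tag chains.

    `tags` lists element names from outermost to innermost, e.g. ["C", "E"].
    Elements between two listed tags are skipped, so the chain does not
    have to be contiguous.
    """
    seen = set()

    def walk(element, index, values):
        if element.tag == tags[index]:
            values = values + (element.get("value"),)
            index += 1
            if index == len(tags):
                seen.add(values)
                return  # chain complete; this sketch does not look deeper
        for child in element:
            walk(child, index, values)

    walk(root, 0, ())
    return len(seen)


root = ET.fromstring(SAMPLE)
for tags in (["A", "B"], ["A", "D"], ["C", "E"], ["D", "E"], ["A", "B", "C", "D"]):
    print("/".join(tags), ":", count_distinct(root, tags))
```

On the sample document this prints 3, 4, 3, 2 and 4, matching the counts listed in the question. Collecting the value tuples in a set keeps the function short; for large documents, grouping inside the XQuery processor as shown above is likely the better fit.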

Related

Regular Expression in teradata

I need to search for a few patterns in a column using regular expressions in Teradata.
One example is shown below:
SELECT
REGEXP_SUBSTR(
REGEXP_SUBSTR('1-2-3','([0-9] *- *[0-9] *- *[0-9])',1, 1, 'i'),
'([0-9] *- *[0-9] *- *[0-9])',
1, 1, 'i'
) AS Tmp,
REGEXP_SUBSTR(
tmp,
'(^[0-9])',1,1,'i') || '-' || REGEXP_SUBSTR(tmp,'([0-9]$)',
1, 1, 'i'
) AS final_exp
;
In the above expression, I am extracting "1-3" from a pattern like "1-2-3". However, the patterns can be anything like 1-2-3-4-5, 1-2,3, 1&2-3 or 1-2,3 &4.
Is there any way to generalize the search pattern? As far as I can tell, a pattern like [-,&]* only matches these characters in that order, whereas in the data they can appear in any order.
A few examples are listed below; the goal is to fetch all the desired results using a single search pattern in the expression.
Column name ==> Result
abc 1-2+3- 4 ==> 1-4
def 10,12 & 13 ==> 10-13
ijk 1,2,3, and 4 lmn ==> 1-4
abc1-2 & 3 def ==> 1-3
ikl 11 &12 -13 ==> 11-13
oAy$ 7-8 and 9 ==> 7-9
RegExp_Substr(col, '(\d+)',1, 1, 'c') || '-' ||
RegExp_Substr(col, '(\d+)(?!.*\d)',1, 1, 'c')
(\d+) = first number
(\d+)(?!.*\d) = last number (a number not followed by another number)
There's also no need for those optional parameters, because it's using the defaults anyway:
RegExp_Substr(col, '(\d+)') || '-' ||
RegExp_Substr(col, '(\d+)(?!.*\d)')
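The two patterns can be tried outside Teradata as well. The following Python sketch applies them to the sample rows from the question; it assumes Python's re module treats \d and the negative lookahead the same way Teradata's regex engine does for these inputs, and the helper name first_to_last is made up for illustration.

```python
import re

FIRST_NUMBER = re.compile(r"(\d+)")
# A run of digits that is not followed by any further digit in the string.
LAST_NUMBER = re.compile(r"(\d+)(?!.*\d)")


def first_to_last(text):
    """Return 'first-last' built from the first and last number, or None."""
    first = FIRST_NUMBER.search(text)
    last = LAST_NUMBER.search(text)
    if first is None or last is None:
        return None
    return first.group(1) + "-" + last.group(1)


samples = [
    "abc 1-2+3- 4",
    "def 10,12 & 13",
    "ijk 1,2,3, and 4 lmn",
    "abc1-2 & 3 def",
    "ikl 11 &12 -13",
    "oAy$ 7-8 and 9",
]
for sample in samples:
    print(sample, "==>", first_to_last(sample))
```

Note that a string containing a single number yields that number twice (for example "5-5"), because the first and the last number are then the same match.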

How to perform a 'join' with a sub element in XQuery

I'm not sure how to word what I am trying to do, but I am trying to get all tuples of FID, BID, and SOMETHING. Consider the following XML:
<FOO>
<FID>f1</FID>
<NAME>f1</NAME>
<BAR>
<BID>b1</BID>
<SOMETHING>15</SOMETHING>
</BAR>
<BAR>
<BID>b2</BID>
<SOMETHING>25</SOMETHING>
</BAR>
</FOO>
<FOO>
<FID>f2</FID>
<NAME>f2</NAME>
<BAR>
<BID>b1</BID>
<SOMETHING>35</SOMETHING>
</BAR>
<BAR>
<BID>b3</BID>
<SOMETHING>0</SOMETHING>
</BAR>
</FOO>
What I need is:
b1 f1 15
b1 f2 35
b2 f1 25
b3 f2 0
Anyone know the syntax that I would use?
I tried:
for $foo in /root/FOO
for $bar in /root/FOO/BAR
let $fid := $foo/FID/text() where $foo/BAR/BID/text()=$bar/BID/text()
let $bid := $foo/BAR/BID/text() where $foo/BAR/BID/text()=$bar/BID/text()
let $something := $foo/BAR/SOMETHING/text() where $foo/BAR/BID/text()=$bar/BID/text()
If you'd wanted to order by the <FID/> elements first, it'd be as easy as looping over the <FOO/>s, for each of them over its <BAR/>s and dumping a string each time:
for $foo in /root/FOO
for $bar in $foo/BAR
return string-join(($foo/FID, $bar/BID, $bar/SOMETHING), ' ')
For grouping by <BID/>s, you have to loop over those first and collect the other information relatively:
for $bar in //BAR
order by $bar/BID
return string-join(($bar/BID, $bar/../FID, $bar/SOMETHING), ' ')
A small remark (as I wasn't able to run your code without major cleanup): $foo and $FOO are not the same, XQuery is case sensitive. Furthermore, you're missing a return clause.
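For comparison, the same join can be sketched in Python with the standard library. This assumes the FOO elements sit under a root element (as the path /root/FOO in the question implies); the function name bid_fid_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in <root> as its own attempt implies.
SAMPLE = """<root>
<FOO><FID>f1</FID><NAME>f1</NAME>
<BAR><BID>b1</BID><SOMETHING>15</SOMETHING></BAR>
<BAR><BID>b2</BID><SOMETHING>25</SOMETHING></BAR>
</FOO>
<FOO><FID>f2</FID><NAME>f2</NAME>
<BAR><BID>b1</BID><SOMETHING>35</SOMETHING></BAR>
<BAR><BID>b3</BID><SOMETHING>0</SOMETHING></BAR>
</FOO>
</root>"""


def bid_fid_rows(root):
    """Return (BID, FID, SOMETHING) tuples ordered by BID."""
    rows = []
    for foo in root.findall("FOO"):
        fid = foo.findtext("FID")
        # Each BAR is paired only with the FOO that contains it.
        for bar in foo.findall("BAR"):
            rows.append((bar.findtext("BID"), fid, bar.findtext("SOMETHING")))
    # sorted() is stable, so rows with equal BID keep document order.
    return sorted(rows, key=lambda row: row[0])


for row in bid_fid_rows(ET.fromstring(SAMPLE)):
    print(" ".join(row))
```

ElementTree elements have no parent pointer, so this sketch walks from each FOO down to its BAR children instead of using the $bar/../FID step of the XQuery version.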

How to use the function "table:get" (table extension) when 2 keys are required?

I have a .txt file with 3 columns: ID-polygon-1, ID-polygon-2 and distance.
When I import my file into NetLogo, I obtain 3 lists [[list1][list2][list3]], which correspond to the 3 columns.
I used table:from-list list to create a table with the content of the 3 lists.
I obtain {{table: [[1 1] [67 518] [815 127]]}} (the table displays the first two lines of my dataset).
For example, I would like to get the distance (list3) between ID-polygon-1 = 1 (list1) and ID-polygon-2 = 67 (list2), that is, 815.
How can I use table:get table key when I need 2 keys (ID-polygon-1 and ID-polygon-2)?
Thanks very much for your help.
Using table:from-list will not help you there: it expects "a list of two element lists, or pairs" where "the first element in the pair is the key and the second element is the value." That's not what you have in your original list.
Furthermore, NetLogo tables (and associative arrays in general) cannot have two keys. They are always just key-value pairs. Nothing prevents the value from being another table, however, and in your case, that is what you need: a table of tables!
There is no primitive to build that directly, however. You will need to build it yourself:
extensions [ table ]
globals [ t ]
to setup
let lists [
[ 1 1 ] ; ID-polygon-1 column
[ 67 518 ] ; ID-polygon-2 column
[ 815 127 ] ; distance column
]
set t table:make
foreach n-values length first lists [ ? ] [
let id1 item ? (item 0 lists)
let id2 item ? (item 1 lists)
let dist item ? (item 2 lists)
if not table:has-key? t id1 [
table:put t id1 table:make
]
table:put (table:get t id1) id2 dist
]
end
Here is what you get when you print the resulting table:
{{table: [[1 {{table: [[67 815] [518 127]]}}]]}}
And here is a small reporter to make it convenient to get a distance from the table:
to-report get-dist [ id1 id2 ]
report table:get (table:get t id1) id2
end
Using get-dist 1 67 will give the 815 result you were looking for.
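The table-of-tables idea is not specific to NetLogo. As a rough illustration, the same structure can be sketched in Python with nested dictionaries; the function names build_distance_table and get_dist are chosen to mirror the NetLogo code above and are otherwise arbitrary.

```python
def build_distance_table(id1_column, id2_column, distance_column):
    """Build an outer dict keyed by id1 whose values are dicts keyed by id2."""
    table = {}
    for id1, id2, dist in zip(id1_column, id2_column, distance_column):
        # Create the inner dict on first sight of id1, then store the distance.
        table.setdefault(id1, {})[id2] = dist
    return table


def get_dist(table, id1, id2):
    """Look up the distance stored for the pair (id1, id2)."""
    return table[id1][id2]


# The three column lists from the question.
table = build_distance_table([1, 1], [67, 518], [815, 127])
print(table)                    # {1: {67: 815, 518: 127}}
print(get_dist(table, 1, 67))   # 815
```

As in the NetLogo version, the outer lookup selects the inner table for the first ID and the inner lookup selects the distance for the second ID.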

Xquery to concatenate

for the below data -
let $x := "Yahooooo !!!! Select one number - "
let $y :=
<A>
<a>1</a>
<a>2</a>
<a>3</a>
<a>4</a>
<a>5</a>
<a>6</a>
<a>7</a>
</A>
I want to get the output as -
`Yahooooo !!!! Select one number - [1 or 2 or 3 or 4 or 5 or 6 or 7]`
In XQuery 3.0, you can use || as a string concatenation operator:
return $x || "[" || fn:string-join($y/a, " or ") || "]"
In XQuery 1.0, you need to use fn:concat():
return fn:concat($x, "[", fn:string-join($y/a, " or "), "]")
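The same join-then-concatenate pattern, sketched in Python for comparison (the list y stands in for the text of the a elements and is an assumption of this example):

```python
x = "Yahooooo !!!! Select one number - "
y = ["1", "2", "3", "4", "5", "6", "7"]  # stands in for the <a> element values

# Join the items with " or ", then wrap them in brackets after the prefix.
result = x + "[" + " or ".join(y) + "]"
print(result)
```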

xQuery category nesting issue

I have a problem nesting the result tags in each other the right way.
The result should look like this:
aimed result
<categoryA>
<position>...</position>
<position>...</position>
...
</categoryA>
<categoryB>
<position>...</position>
<position>...</position>
...
</categoryB>
Currently I have only managed to get the right results for the positions; categoryA and categoryB are one hierarchy level above the positions, and the positions should be nested in the categories. The categories can be referenced by let $y := $d/Bilanz/Aktiva/* (respectively $d/Bilanz/Aktiva/LangfristigesVermoegen and $d/Bilanz/Aktiva/KurzfristigesVermoegen).
Here is my query:
query
let $d := doc('http://etutor.dke.uni-linz.ac.at/etutor/XML?id=5001')/Bilanzen
let $a02 := $d/Bilanz[@jahr='2002']/Aktiva/*
let $a03 := $d/Bilanz[@jahr='2003']/Aktiva/*
for $n02 in $a02//* , $n03 in $a03//*
(:
where name($n02) = name($n03)
where node-name($n02) = node-name($n03)
:)
where name($n02) = name($n03)
return <position name="{node-name($n02)}">
<j2002>{data($n02/@summe)}</j2002>
<j2003>{data($n03/@summe)}</j2003>
<diff>{data($n03/@summe) - data($n02/@summe)}</diff>
</position>
xml
<Bilanzen>
<Bilanz jahr="2002">
<Aktiva>
<LangfristigesVermoegen>
<Sachanlagen summe="1486575.8"/>
<ImmateriellesVermoegen summe="67767.2"/>
<AssoziierteUnternehmen summe="190826.3"/>
<AndereBeteiligungen summe="507692.7"/>
<Uebrige summe="92916.4"/>
</LangfristigesVermoegen>
<KurzfristigesVermoegen>
<Vorraete summe="78830.9"/>
<Forderungen summe="198210.3"/>
<Finanzmittel summe="181102.0"/>
</KurzfristigesVermoegen>
</Aktiva>
<Passiva>
<Eigenkapital>
<Grundkapital summe="91072.4"/>
<Kapitalruecklagen summe="186789.5"/>
<Gewinnruecklagen summe="798176.2"/>
<Bewertungsruecklagen summe="-34922.4"/>
<Waehrungsumrechnung summe="0"/>
<EigeneAktien summe="0"/>
</Eigenkapital>
<AnteileGesellschafter summe="23613.1"/>
<LangfristigeVerb>
<Finanzverbindlichkeiten summe="680007.1"/>
<Steuern summe="36555.8"/>
<Rueckstellungen summe="429286.1"/>
<Baukostenzuschuesse summe="169246.0"/>
<Uebrige summe="36166.9"/>
</LangfristigeVerb>
<KurzfristigeVerb>
<Finanzverbindlichkeiten summe="14614.6"/>
<Steuern summe="65247.6"/>
<Lieferanten summe="94939.2"/>
<Rueckstellungen summe="123664.8"/>
<Uebrige summe="89464.8"/>
</KurzfristigeVerb>
</Passiva>
</Bilanz>
<Bilanz jahr="2003">
<Aktiva>
<LangfristigesVermoegen>
<Sachanlagen summe="1590313.7"/>
<ImmateriellesVermoegen summe="69693.2"/>
<AssoziierteUnternehmen summe="198224.7"/>
<AndereBeteiligungen summe="418489.3"/>
<Uebrige summe="104566.7"/>
</LangfristigesVermoegen>
<KurzfristigesVermoegen>
<Vorraete summe="20609.8"/>
<Forderungen summe="289458.5"/>
<Finanzmittel summe="302445.9"/>
</KurzfristigesVermoegen>
</Aktiva>
<Passiva>
<Eigenkapital>
<Grundkapital summe="91072.4"/>
<Kapitalruecklagen summe="186789.5"/>
<Gewinnruecklagen summe="875723.4"/>
<Bewertungsruecklagen summe="-15459.5"/>
<Waehrungsumrechnung summe="-633.7"/>
<EigeneAktien summe="0"/>
</Eigenkapital>
<AnteileGesellschafter summe="22669.8"/>
<LangfristigeVerb>
<Finanzverbindlichkeiten summe="733990.2"/>
<Steuern summe="68156.8"/>
<Rueckstellungen summe="395997.2"/>
<Baukostenzuschuesse summe="177338.5"/>
<Uebrige summe="38064.9"/>
</LangfristigeVerb>
<KurzfristigeVerb>
<Finanzverbindlichkeiten summe="6634.7"/>
<Steuern summe="97119.1"/>
<Lieferanten summe="89606.0"/>
<Rueckstellungen summe="128237.5"/>
<Uebrige summe="98495.2"/>
</KurzfristigeVerb>
</Passiva>
</Bilanz>
</Bilanzen>
I would really appreciate some help; I have no clue at all. Thank you.
If I understand you correctly, you want the information about LangfristigesVermoegen (and its children) to be grouped in the output under element categoryA, and the information about KurzfristigesVermoegen to be grouped under categoryB.
So you will want first of all to do something to generate the categoryA and categoryB elements. For example,
let $d := doc(...)/Bilanzen
return (
<categoryA>{ ... children of category A here ... }</categoryA>,
<categoryB>{ ... children of category B here ... }</categoryB>
)
The positions in each category can be generated using code similar to what you've now got, except that instead of iterating over
for $n02 in $a02//* , $n03 in $a03//*
you will need to iterate over $a02[self::LangfristigesVermoegen]/* for category A, and over $a02[self::KurzfristigesVermoegen]/* for category B (and similarly over $a03 for $n03).
If the set of categories is not static and you just want to group things in the output using the same grouping elements present in the input, then you'll want an outer structure something like this:
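(: assumption: $anno2002 and $anno2003 are bound to the Aktiva elements of the 2002 and 2003 Bilanz :)
for $assetclass1 in $anno2002/*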
let $assetclass2 := $anno2003/*[name() = name($assetclass1)]
return
(element {name($assetclass1)} {
for $old in $assetclass1/*,
$new in $assetclass2/*
where name($old) eq name($new)
return <position name="{node-name($old)}">
<j2002>{data($old/@summe)}</j2002>
<j2003>{data($new/@summe)}</j2003>
<diff>{data($new/@summe) - data($old/@summe)}</diff>
</position>
})
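To make the grouping logic above easier to follow, here is a rough Python sketch of the same idea: iterate over the asset classes of one year, find the class with the same name in the other year, and emit one position per matching child. It runs on an abbreviated copy of the question's data (Aktiva only, two positions per category); the function name compare_assets and the wrapping result element are assumptions of this example, and Decimal is used only to avoid floating-point noise in the differences.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abbreviated copy of the question's data: Aktiva only, two positions per class.
SAMPLE = """<Bilanzen>
<Bilanz jahr="2002"><Aktiva>
<LangfristigesVermoegen><Sachanlagen summe="1486575.8"/><Uebrige summe="92916.4"/></LangfristigesVermoegen>
<KurzfristigesVermoegen><Vorraete summe="78830.9"/><Finanzmittel summe="181102.0"/></KurzfristigesVermoegen>
</Aktiva></Bilanz>
<Bilanz jahr="2003"><Aktiva>
<LangfristigesVermoegen><Sachanlagen summe="1590313.7"/><Uebrige summe="104566.7"/></LangfristigesVermoegen>
<KurzfristigesVermoegen><Vorraete summe="20609.8"/><Finanzmittel summe="302445.9"/></KurzfristigesVermoegen>
</Aktiva></Bilanz>
</Bilanzen>"""


def compare_assets(root, old_year="2002", new_year="2003"):
    """Group <position> elements under the asset-class names of the input."""
    old_assets = root.find(f"Bilanz[@jahr='{old_year}']/Aktiva")
    new_assets = root.find(f"Bilanz[@jahr='{new_year}']/Aktiva")
    result = ET.Element("result")  # wrapper so the output has a single root
    for old_class in old_assets:
        new_class = new_assets.find(old_class.tag)
        if new_class is None:
            continue  # asset class missing in the newer year
        group = ET.SubElement(result, old_class.tag)
        for old in old_class:
            new = new_class.find(old.tag)
            if new is None:
                continue  # position missing in the newer year
            old_sum = Decimal(old.get("summe"))
            new_sum = Decimal(new.get("summe"))
            position = ET.SubElement(group, "position", name=old.tag)
            ET.SubElement(position, f"j{old_year}").text = str(old_sum)
            ET.SubElement(position, f"j{new_year}").text = str(new_sum)
            ET.SubElement(position, "diff").text = str(new_sum - old_sum)
    return result


print(ET.tostring(compare_assets(ET.fromstring(SAMPLE)), encoding="unicode"))
```

Matching by element name in the inner loop plays the role of the where name($old) eq name($new) clause; positions or classes that exist in only one year are silently skipped, as they would be in the XQuery join.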
