Counting nr of elements in a file - xquery

I am trying to count the number of harbour elements in an XML file. However, I keep getting the following error:
item expected, sequence found: (element harbour {...}, ...)
The code snippet is the following:
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour eq $harbour)
return concat($harbour, " ", $count)
Input XML:
<voyage>
<number>4411</number>
<leftpage>
<harbour>Rammekens</harbour>
</leftpage>
</voyage>
<voyage>
<number>4412</number>
<leftpage>
<harbour>Texel</harbour>
</leftpage>
</voyage>
Can someone help me out? How do I iterate over the number of harbours in the XML file instead of trying to use /VOC/voyage/leftpage/harbour?

eq is a value comparison, i.e. it compares individual items. That is why the error message tells you that it is expecting a (single) item, but instead found all the harbour elements. You have to use the general comparison operator = instead. Also, if you compared it like this:
/VOC/voyage/leftpage/harbour = $harbour
count() would always return 1: the general comparison yields a single boolean (true if any harbour matches), and count() counts that one boolean. Instead, you want to filter the harbour elements whose string value equals $harbour. You can do so with a predicate, []. Altogether it will be
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour[. = $harbour])
return concat($harbour, " ", $count)
If your XQuery processor supports XQuery 3.0, you can also use a group by clause, which in my opinion is nicer to read (and could be faster, but this depends on the implementation):
for $voyage in /VOC/voyage
let $harbour := $voyage/leftpage/harbour
let $harbour-name := $harbour/string()
group by $harbour-name
return $harbour-name || " " || count($harbour)
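To see why the two forms of count() differ, here is a minimal self-contained sketch. It assumes the data is bound inline to a hypothetical $doc variable, and the third voyage (4413) is made up for illustration; it is not part of the question's data.

```xquery
(: $doc and voyage 4413 are illustrative assumptions, not from the question :)
let $doc :=
  <VOC>
    <voyage><number>4411</number><leftpage><harbour>Rammekens</harbour></leftpage></voyage>
    <voyage><number>4412</number><leftpage><harbour>Texel</harbour></leftpage></voyage>
    <voyage><number>4413</number><leftpage><harbour>Texel</harbour></leftpage></voyage>
  </VOC>
return (
  (: general comparison: one boolean for the whole sequence, so count() is 1 :)
  count($doc/voyage/leftpage/harbour = "Texel"),
  (: predicate: keeps the matching harbour elements, so count() is 2 :)
  count($doc/voyage/leftpage/harbour[. = "Texel"])
)
```

The first count stays at 1 no matter how many harbours match, because it only ever sees a single true or false value.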

Related

How to convert string to XPATH in BaseX

How can I convert a string into an XPath expression? Below is the code:
let $ti := "item/title"
let $tiValue := "Welcome to America"
return db:open('test')/*[ $tiValue = $ti]/base-uri()
Here is one way to solve it:
let $ti := "item/title"
let $tiValue := "Welcome to America"
let $input := db:open('test')
let $steps := tokenize($ti, '/')
let $process-step := function($input, $step) { $input/*[name() = $step] }
let $output := fold-left($input, $steps, $process-step)
let $test := $output[. = $tiValue]
return $test/base-uri()
The path string is split into single steps (item, title). With fold-left, all child nodes of the current input (initially db:open('test')) will be matched against the current step (initially, item). The result will be used as new input and matched against the next step (title), and so on. Finally, only those nodes with $tiValue as text value will be returned.
Your question is very unclear - the basic problem is that you've shown us some code that doesn't do what you want, and you're asking us to work out what you want by guessing what was going on in your head when you wrote the incorrect code.
I suspect -- I may be wrong -- that you were hoping this might somehow give you the result of
db:open('test')/*[item/title = $ti]/base-uri()
and presumably $ti might hold different path expressions on different occasions.
XQuery 3.0/3.1 doesn't have any standard way to evaluate an XPath expression supplied dynamically as a string (unless you count the rather devious approach of using fn:transform() to invoke an XSLT transformation that uses the xsl:evaluate instruction).
BaseX, however, has an xquery:eval() function that will do the job for you. See https://docs.basex.org/wiki/XQuery_Module
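As a rough sketch of how that could look, assuming BaseX, a database named 'test' as in the question, and that the path string is meant to be evaluated relative to each root element. Binding the context item through the empty-string key of the bindings map is how I understand the XQuery Module documentation; please check it against your BaseX version.

```xquery
(: Sketch only: assumes BaseX and a database named 'test', as in the question :)
let $ti := "item/title"
let $tiValue := "Welcome to America"
for $root in db:open('test')/*
(: the empty-string key binds the context item for the evaluated string :)
where xquery:eval($ti, map { '': $root }) = $tiValue
return base-uri($root)
```

Since the string is evaluated as code, this is only appropriate when $ti comes from a trusted source.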

Count number of occurrences of a character in an element using xquery

I have a variable which has | separated values like below.
I need to make sure it never has more than 30 values separated by '|', so I believe it would suffice to count the number of occurrences of '|' in the variable:
class=1111|2222|3333|4444
Can you please help in writing xquery for the same.
I am new to xquery.
If you remove all characters but the bar and then use string-length, as in
let $s := '1111|2222|3333|4444' return string-length(translate($s, translate($s, '|', ''), ''))
you get the number of | characters. That use of string-length and the double translate to remove anything but a certain character is an old XPath 1 trick. Of course, as XQuery also has replace, you could as well use
let $s := '1111|2222|3333|4444' return string-length(replace($s, '[^|]+', ''))
You could use the tokenize() function to split the value on the | character, and then count the items in the resulting sequence with fn:count().
Just remember that the tokenize function uses a regex pattern, so you would need to escape the | as \|:
let $PSV := "1111|2222|3333|4444"
let $tokens := fn:tokenize($PSV, "\|")
let $token-count := fn:count($tokens)
return
if ($token-count > 30) then
fn:error((), "Too many pipe separated values")
else
(: thirty or fewer values, do stuff with the $tokens :)
()
Just for good measure, and in case you want to do any performance comparisons, you could try
let $sep := string-to-codepoints('|')
return count(string-to-codepoints($in)[.=$sep])
This has the theoretical advantage that (at least in Saxon) it doesn't construct any new strings or sequences in memory.
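For comparison, the approaches from the answers above can be run side by side on the sample value. This is just a sketch; the variable names are made up for illustration. Note that the first three expressions count separators, while tokenize() counts values, so the two kinds of result differ by one.

```xquery
let $in := "1111|2222|3333|4444"
(: double translate: remove everything that is not a bar, then measure :)
let $bars-translate := string-length(translate($in, translate($in, '|', ''), ''))
(: regex replace: delete every run of non-bar characters :)
let $bars-replace := string-length(replace($in, '[^|]+', ''))
(: codepoints: count the codepoints equal to that of '|' :)
let $bars-codepoints := count(string-to-codepoints($in)[. = string-to-codepoints('|')])
(: tokenize: count the values between the bars :)
let $tokens := count(tokenize($in, '\|'))
return ($bars-translate, $bars-replace, $bars-codepoints, $tokens)
(: result for this input: 3, 3, 3, 4 :)
```

So a limit of 30 values corresponds to at most 29 separators.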

Compare two elements of the same document in MarkLogic

I have a MarkLogic 8 database in which there are documents which have two date time fields:
created-on
active-since
I am trying to write an XQuery to find all the documents for which the value of active-since is less than the value of created-on.
Currently I am using the following FLWOR expression:
for $entity in fn:collection("entities")
let $id := fn:data($entity//id)
let $created-on := fn:data($entity//created-on)
let $active-since := fn:data($entity//active-since)
where $active-since < $created-on
return
(
$id,
$created-on,
$active-since
)
The above query takes too long to execute, and its execution time will keep growing as the number of documents increases.
Also, I have element range indexes on both of the dateTime fields mentioned above, but they are not being used here. The cts:element-range-query function only compares one element with a set of atomic values. In my case I am trying to compare two elements of the same document.
I think there should be a better and optimized solution for this problem.
Please let me know in case there is any search function or any other approach which will be suitable in this scenario.
This may be efficient enough for you.
Take one of the elements and build a range query per value. This all uses the range indexes, so in that sense, it is efficient. However, at some point a large query is built. It reads similar to a FLWOR statement. If you really wanted to be a bit more efficient, you could find out which of your elements has fewer unique values (the size of the index) and use that for your iteration, thus building a smaller query. Also, note that in the element-values call I constrain the values to your collection. This is just in case you happen to have that element in documents outside of your collection. It keeps the list to only those values you know are in your collection:
let $q := cts:or-query(
for $created-on in cts:element-values(xs:QName("created-on"), (), cts:collection-query("entities"))
return cts:element-range-query(xs:QName("active-since"), "<", $created-on)
)
return
cts:search(
fn:collection("entities"),
$q
)
So, let's explain what is happening with a simple example.
Let's say I have elements A and B, each with a range index defined.
Let's pretend we have the following combinations in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9
let $q := cts:or-query(
for $a in cts:element-values(xs:QName("A"))
return cts:element-range-query(xs:QName("B"), "<", $a)
)
This would create the following query:
cts:or-query(
(
cts:element-range-query(xs:QName("B"), "<", 2),
cts:element-range-query(xs:QName("B"), "<", 4),
cts:element-range-query(xs:QName("B"), "<", 5)
)
)
And in the example above, the only match would be the document with the combination: (5,4)
You might try using cts:value-tuples(). Pass in three references: active-since, created-on, and the URI reference. Then iterate over the results looking for ones where active-since is less than created-on, and you'll have the URI of the doc.
It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
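A rough sketch of that suggestion follows. It assumes MarkLogic, range indexes on both elements, and an enabled URI lexicon (which cts:uri-reference() relies on); the variable names are illustrative. Treat it as a starting point to check against the MarkLogic documentation rather than a tested query.

```xquery
xquery version "1.0-ml";
(: Sketch: co-occurrence of URI, created-on and active-since from the indexes :)
for $tuple in cts:value-tuples(
  (
    cts:uri-reference(),
    cts:element-reference(xs:QName("created-on")),
    cts:element-reference(xs:QName("active-since"))
  ),
  (),
  cts:collection-query("entities")
)
let $values := json:array-values($tuple)
(: position 1 is the URI, 2 is created-on, 3 is active-since :)
where $values[3] lt $values[2]
return $values[1]
```

Because the URI is part of each tuple, both dates in a tuple come from the same document, which is the per-document comparison the question asks for.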
I am now using the following script to get the count of documents for which the value of active-since is less than the value of created-on:
fn:sum(
for $value-pairs in cts:value-tuples(
(
cts:element-reference(xs:QName("created-on")),
cts:element-reference(xs:QName("active-since"))
),
("fragment-frequency"),
cts:collection-query("entities")
)
let $created-on := json:array-values($value-pairs)[1]
let $active-since := json:array-values($value-pairs)[2]
return
if($active-since lt $created-on) then cts:frequency($value-pairs) else 0
)
Sorry, I don't have enough reputation, so I have to comment on your answer here. Why do you think that ML will not return (2,3) and (4,2)? I believe an or-query matches a document as soon as any single sub-query is true.
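One way to check the point raised in this comment is to emulate the A/B example in plain XQuery, without any MarkLogic functions. The sketch below is such an emulation, under the assumption that the or-query matches a document when its B value is below any of the distinct A values; the inline $docs sequence simply restates the five combinations from the example.

```xquery
(: Plain-XQuery emulation of the A/B example; no MarkLogic functions are used :)
let $docs := (
  <doc><A>2</A><B>3</B></doc>,
  <doc><A>4</A><B>2</B></doc>,
  <doc><A>2</A><B>7</B></doc>,
  <doc><A>5</A><B>4</B></doc>,
  <doc><A>2</A><B>9</B></doc>
)
(: the distinct A values, as a range-index lookup would list them: 2, 4, 5 :)
let $a-values := distinct-values($docs/A/xs:integer(.))
return (
  (: or-query semantics: B is smaller than at least one A value from any document :)
  string-join(
    for $d in $docs[some $a in $a-values satisfies xs:integer(B) < $a]
    return concat("(", $d/A, ",", $d/B, ")"), " "),
  (: same-document semantics: B is smaller than the A of the same document :)
  string-join(
    for $d in $docs[xs:integer(B) < xs:integer(A)]
    return concat("(", $d/A, ",", $d/B, ")"), " ")
)
```

Under this emulation the first string is "(2,3) (4,2) (5,4)" and the second is "(4,2) (5,4)", so the or-query form selects more documents than a true per-document comparison would.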

Removing consecutive numbers from a sequence in XQuery

XQuery
Input: (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
Output: (1,7,14,17,24,28)
I tried to remove consecutive numbers from the input sequence using XQuery functions, but failed:
xquery version "1.0" encoding "utf-8";
declare namespace ns1="http://www.somenamespace.org/types";
declare variable $request as xs:integer* external;
declare function local:func($reqSequence as xs:integer*) as xs:integer* {
let $nonRepeatSeq := for $count in (1 to count($reqSequence)) return
if ($reqSequence[$count+1] - $reqSequence) then
remove($reqSequence,$count+1)
else ()
return
$nonRepeatSeq
};
local:func((1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28))
Please suggest how to do so in XQuery functional language.
Two simple ways to do this in XQuery. Both rely on being able to assign the sequence of values to a variable, so that we can look at pairs of individual members of it when we need to.
First, just iterate over the values and select (a) the first value, (b) any value which is not one greater than its predecessor, and (c) any value which is not one less than its successor. [OP points out that the last value also needs to be included; left as an exercise for the reader. Or see Michael Kay's answer, which provides a terser formulation of the filter; DeMorgan's Law strikes again!]
let $vseq := (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
for $v at $pos in $vseq
return if ($pos eq 1
or $vseq[$pos - 1] ne $v - 1
or $vseq[$pos + 1] ne $v + 1)
then $v
else ()
Or, second, do roughly the same thing in a filter expression:
let $vseq := (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
return $vseq[
for $i in position() return
$i eq 1
or . ne $vseq[$i - 1] + 1
or . ne $vseq[$i + 1] - 1]
The primary difference between these two ways of performing the calculation and your non-working attempt is that they don't say anything about changing or modifying the sequence; they simply specify a new sequence. By using a filter expression, the second formulation makes explicit that the result will be a subsequence of $vseq; the for expression makes no such guarantee in general (although because for each value it returns either the empty sequence or the value itself, we can see that here too the result will be a subsequence: a copy of $vseq from which some values have been omitted).
Many programmers find it difficult to stop thinking in terms of assignment to variables or modification of data structures, but it's worth some effort.
[Addendum] I may be overlooking something, but I don't see a way to express this calculation in pure XPath 2.0, since XPath 2.0 seems not to have any mechanism that can bind a variable like $vseq to a non-singleton sequence of values. (XPath 3.0 has let expressions, so it's not a challenge there. The second formulation above is itself pure XPath 3.0.)
In XSLT this can be done as:
<xsl:for-each-group select="$in" group-adjacent=". - position()">
<xsl:sequence select="current-group()[1], current-group()[last()]"/>
</xsl:for-each-group>
In XQuery 3.0 you can do it with tumbling windows, but I'm too lazy to work out the detail.
An XPath 2.0 solution (assuming the input sequence is in $in) is:
for $i in 1 to count($in)
return $in[$i][not(. eq $in[$i - 1]+1 and . eq $in[$i+1]-1)]
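The tumbling-window idea mentioned above can be sketched roughly as follows, assuming an XQuery 3.0 processor. Each window collects one run of consecutive integers, and the first and last items of the run are returned. The count() check for one-item runs is an addition of this sketch; the sample input does not exercise it.

```xquery
xquery version "3.0";
let $in := (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
for tumbling window $w in $in
  (: a new window starts as soon as the previous one has ended :)
  start when true()
  (: a window ends when the next item does not continue the run :)
  end $e next $n when $n ne $e + 1
return
  if (count($w) eq 1)
  then $w                    (: isolated value: emit it once :)
  else ($w[1], $w[last()])   (: run: emit its two end points :)
```

For the sample input this yields 1, 7, 14, 17, 24, 28. The last run has no following item, so its end condition is never true and the window simply extends to the end of the sequence.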
There are several logic and XQuery usage errors in your solution, but the main problem with it is that variables in XQuery are immutable, so you cannot reassign a value to one once assigned. Therefore, it's often easier to think about these types of problems in terms of recursive solutions:
declare function local:non-consec(
$prev as xs:integer?,
$rest as xs:integer*
) as xs:integer*
{
if (empty($rest)) then ()
else
let $curr := head($rest)
let $next := subsequence($rest, 2, 1)
return (
if ($prev eq $curr - 1 and $curr eq $next - 1)
then () (: This number is part of a consecutive sequence :)
else $curr,
local:non-consec(head($rest), tail($rest))
)
};
local:non-consec((), (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28))
=>
1
7
14
17
24
28

Access variable from within itself in XQuery

I'm wondering whether in XQuery it is possible to access some elements in a variable from within the variable itself.
For instance, if you have a variable with several numbers and you want to sum them all up inside the variable itself. Can you do that with only one variable? Consider something like this:
let $my_variable :=
<my_variable_root>
<number>5</number>
<number>10</number>
<sum>{sum (??)}</sum>
</my_variable_root>
return $my_variable
Can you put some XPath expression inside sum() to access the value of the preceding number elements? I've tried $my_variable//number/number(text()), //number/number(text()), and preceding-sibling::number/number(text()) - but nothing worked for me.
You cannot do that. The variable is not created until everything in it has been constructed.
But you can use temporary variables inside the constructor, like this:
let $my_variable :=
<my_variable_root>{
let $numbers := (
<number>5</number>,
<number>10</number>
)
return ($numbers, <sum>{sum ($numbers)}</sum>)
} </my_variable_root>
return $my_variable
Or (XQuery 3):
let $my_variable :=
<my_variable_root>{
let $numbers := (5,10)
return (
$numbers ! <number>{.}</number>,
<sum>{sum ($numbers)}</sum>)
} </my_variable_root>
return $my_variable
This is not possible, neither by using the variable name (it is not defined yet), nor using the preceding-sibling axis (no context item bound).
Construct the variable's contents in a FLWOR expression instead:
let $my_variable :=
let $numbers := (
<number>5</number>,
<number>10</number>
)
return
<my_variable_root>
{ $numbers }
<sum>{ sum( $numbers) }</sum>
</my_variable_root>
return $my_variable
If you have similar patterns multiple times, consider writing a function; using XQuery Update might also be an alternative (but does not seem to be the most reasonable one to me, both in terms of readability and probably performance).
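If the pattern does recur, the function suggested above might look something like this. It is only a sketch; the name local:with-sum and its signature are made up for illustration.

```xquery
(: local:with-sum is a hypothetical helper, not part of the question :)
declare function local:with-sum($numbers as xs:decimal*) as element(my_variable_root) {
  <my_variable_root>{
    (: one number element per input value, followed by the total :)
    (for $n in $numbers return <number>{ $n }</number>),
    <sum>{ sum($numbers) }</sum>
  }</my_variable_root>
};

let $my_variable := local:with-sum((5, 10))
return $my_variable
```

The numbers are passed in as plain values, so the sum is computed from the same sequence that produces the number elements and never has to look back into the element under construction.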

Resources