XQuery: Split a string by the Nth occurrence of a character

I need some help with splitting a long string of characters at every Nth occurrence of a certain character. For example,
<string>1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27</string>
should be split at every 9th comma
to become
<string>1,2,3,4,5,6,7,8,9</string>
<string>10,11,12,13,14,15,16,17,18</string>
<string>19,20,21,22,23,24,25,26,27</string>
The length of the original string is not specified and the numbers 1-27 in the example could be words with spaces, but the comma is uniquely a separator.
Thanks!

let $s := <string>1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27</string>
let $len := 9
let $tokens := tokenize($s, ',')
(: round up so that a trailing, shorter group is not dropped :)
for $n in 1 to (count($tokens) + $len - 1) idiv $len
return <string>{
string-join(subsequence($tokens, $len * ($n - 1) + 1, $len), ',')
}</string>

For further reference, here is another solution using XQuery 3.0. Instead of a regular expression that counts separators, it uses a tumbling window.
let $s := '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27'
for tumbling window $w in tokenize($s, ',')
start at $start when true()
end at $end when $end - $start eq 8
return <string>{string-join($w, ',')}</string>
In my opinion, this looks like the model use case for windows, and it reads nicely: use a tumbling window and start at the beginning of the sequence. Unlike a sliding window, which advances only one item per step and therefore overlaps its predecessor, tumbling windows never overlap. End a window once it holds 9 items, i.e. when the end position is 8 greater than the start position. Because the end condition is not marked with only, a final, shorter window is still returned.
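For contrast, here is a minimal sketch of a sliding window over a small, made-up input (the sequence 1 to 5 and the element name window are only for illustration). It requires an XQuery 3.0 processor that implements the window clause.

```xquery
(: Sliding windows of 3 items over 1..5: consecutive windows overlap. :)
for sliding window $w in (1 to 5)
  start at $s when true()
  only end at $e when $e - $s eq 2
return <window>{string-join($w ! string(), ',')}</window>
```

This should return three elements containing 1,2,3 then 2,3,4 then 3,4,5. The only keyword discards the windows starting at 4 and 5, which never reach three items.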

If you've got access to XQuery 3.0, you can also use analyze-string(...) with a bit of regex-fu:
let $string := '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27'
let $result := analyze-string($string, '(?:[^,]+,){8}[^,]+')
return $result/fn:match
Note that the repetition count in the regular expression is one less than the number of values per group: the pattern matches that many values each followed by a comma, and then one final value without a trailing comma.
If you also have to deal with the tail, e.g. when dividing the string into tuples of 8 numbers:
let $string := '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27'
let $result := analyze-string($string, '(?:[^,]+,){7}[^,]+')
return $result/(fn:match/string(), *[last()][self::fn:non-match]/substring(., 2))
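The pattern above can also be built from the group size instead of being hard-coded. The following is only a sketch: the function name local:split-every and its parameters are hypothetical, and it assumes values are never empty and the comma is purely a separator.

```xquery
declare function local:split-every(
  $s as xs:string,
  $n as xs:integer
) as xs:string* {
  (: ($n - 1) repetitions of "value plus comma", then one final value :)
  let $pattern := '(?:[^,]+,){' || ($n - 1) || '}[^,]+'
  let $result := analyze-string($s, $pattern)
  return (
    $result/fn:match/string(),
    (: a trailing non-match holds the leftover values, usually prefixed by a comma :)
    $result/*[last()][self::fn:non-match]/replace(., '^,', '')
  )
};

local:split-every('1,2,3,4,5,6,7,8,9,10', 4)
```

For this input the call should yield the three strings 1,2,3,4 and 5,6,7,8 and 9,10. Checking that the last child is a fn:non-match avoids treating a complete final group as a tail.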

Related

How to convert string to XPATH in BaseX

How can I convert a string into an XPath expression? Below is the code:
let $ti := "item/title"
let $tiValue := "Welcome to America"
return db:open('test')/*[ $tiValue = $ti]/base-uri()
Here is one way to solve it:
let $ti := "item/title"
let $tiValue := "Welcome to America"
let $input := db:open('test')
let $steps := tokenize($ti, '/')
let $process-step := function($input, $step) { $input/*[name() = $step] }
let $output := fold-left($steps, $input, $process-step)
let $test := $output[. = $tiValue]
return $test/base-uri()
The path string is split into single steps (item, title). With fold-left, all child nodes of the current input (initially db:open('test')) will be matched against the current step (initially, item). The result will be used as new input and matched against the next step (title), and so on. Finally, only those nodes with $tiValue as text value will be returned.
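To see the folding in isolation, here is a small sketch that runs against an inline element instead of a database (the sample document is made up for illustration). It assumes the argument order of the final XQuery 3.0 specification: sequence first, then the initial value, then the function.

```xquery
let $input := <root><item><title>Welcome to America</title></item></root>
let $steps := tokenize("item/title", '/')
return fold-left(
  $steps,   (: the path steps to process, one after another :)
  $input,   (: initial value of the accumulator :)
  function($nodes, $step) { $nodes/*[name() = $step] }
)
```

The first step selects the item child of root, the second selects its title child, so the result should be the title element.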
Your question is very unclear - the basic problem is that you've shown us some code that doesn't do what you want, and you're asking us to work out what you want by guessing what was going on in your head when you wrote the incorrect code.
I suspect -- I may be wrong -- that you were hoping this might somehow give you the result of
db:open('test')/*[item/title = $tiValue]/base-uri()
and presumably $ti might hold different path expressions on different occasions.
XQuery 3.0/3.1 doesn't have any standard way to evaluate an XPath expression supplied dynamically as a string (unless you count the rather devious approach of using fn:transform() to invoke an XSLT transformation that uses the xsl:evaluate instruction).
BaseX, however, has an xquery:eval() function that will do the job for you. See https://docs.basex.org/wiki/XQuery_Module
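A rough sketch of how that could look, assuming a BaseX version that provides xquery:eval() with a bindings map (an empty key binds the context item) and a database named test as in the question:

```xquery
let $ti := "item/title"
let $tiValue := "Welcome to America"
(: the path is spliced into the query text; the value is passed as a variable :)
let $query :=
  "declare variable $value external; " ||
  "*[" || $ti || " = $value]/base-uri()"
for $doc in db:open('test')
return xquery:eval($query, map { '': $doc, 'value': $tiValue })
```

Because $ti is concatenated into the query string, it should only ever come from a trusted source; otherwise arbitrary XQuery code could be injected.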

Count number of occurrences of a character in an element using XQuery

I have a variable which has | separated values like below.
I need to make sure it never has more than 30 values separated by '|', so I believe counting the number of occurrences of '|' in the variable would suffice.
class=1111|2222|3333|4444
Can you please help me write an XQuery for this?
I am new to XQuery.
If you remove all characters but the bar and then use string-length, as in
let $s := '1111|2222|3333|4444'
return string-length(translate($s, translate($s, '|', ''), ''))
you get the number of | characters. The double translate to remove everything but a certain character is an old XPath 1.0 trick; since XQuery also has replace, you could just as well use
let $s := '1111|2222|3333|4444'
return string-length(replace($s, '[^|]+', ''))
You could use the tokenize() function to split the value by the | character, and then count the items in the resulting sequence with fn:count().
Just remember that the tokenize function takes a regex pattern, so you need to escape the | as \|:
let $PSV := "1111|2222|3333|4444"
let $tokens := fn:tokenize($PSV, "\|")
let $token-count := fn:count($tokens)
return
  if ($token-count > 30) then
    fn:error((), "Too many pipe separated values")
  else
    (: thirty values or fewer, do stuff with the $tokens :)
    ()
Just for good measure, and in case you want to do any performance comparisons, you could try
let $in := '1111|2222|3333|4444'
let $sep := string-to-codepoints('|')
return count(string-to-codepoints($in)[. = $sep])
This has the theoretical advantage that (at least in Saxon) it doesn't construct any new strings or sequences in memory.

Counting the number of elements in a file

I am trying to count the number of harbour elements in an XML file. However, I keep getting the following error:
item expected, sequence found: (element harbour {...}, ...)
The code snippet is the following:
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour eq $harbour)
return concat($harbour, " ", $count)
Input XML:
<voyage>
<number>4411</number>
<leftpage>
<harbour>Rammekens</harbour>
</leftpage>
</voyage>
<voyage>
<number>4412</number>
<leftpage>
<harbour>Texel</harbour>
</leftpage>
</voyage>
Can someone help me out? How do I iterate over the number of harbours in the XML file instead of trying to use /VOC/voyage/leftpage/harbour?
eq is a value comparison, i.e. it compares individual items. That is why the error message tells you that a (single) item was expected but all the harbour elements were found. You have to use the general comparison operator = instead. However, if you compare like this
/VOC/voyage/leftpage/harbour = $harbour
the count would always be 1, because the comparison yields a single boolean (true if at least one harbour matches). Instead, you want to keep only those harbour elements whose string value equals $harbour. You can do so with a predicate []. All together it will be
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour[. = $harbour])
return concat($harbour, " ", $count)
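The difference between counting a comparison and counting a filtered sequence can be seen in a small self-contained sketch (the three inline harbour elements are made up for illustration):

```xquery
let $harbours := (
  <harbour>Texel</harbour>,
  <harbour>Rammekens</harbour>,
  <harbour>Texel</harbour>
)
return (
  $harbours = 'Texel',            (: true: at least one harbour matches :)
  count($harbours = 'Texel'),     (: 1: counts that single boolean :)
  count($harbours[. = 'Texel'])   (: 2: counts the matching elements :)
  (: $harbours eq 'Texel' would raise a type error, since eq expects single items :)
)
```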
Also, if your XQuery processor supports XQuery 3.0, you can use a group by clause, which in my opinion is nicer to read (and could be faster, but this depends on the implementation):
for $voyage in /VOC/voyage
let $harbour := $voyage/leftpage/harbour
let $harbour-name := $harbour/string()
group by $harbour-name
return $harbour-name || " " || count($harbour)

Removing consecutive numbers from a sequence in XQuery

XQuery
Input: (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
Output: (1,7,14,17,24,28)
I tried to remove consecutive numbers from the input sequence using the XQuery functions but failed doing so
xquery version "1.0" encoding "utf-8";
declare namespace ns1="http://www.somenamespace.org/types";
declare variable $request as xs:integer* external;
declare function local:func($reqSequence as xs:integer*) as xs:integer* {
let $nonRepeatSeq := for $count in (1 to count($reqSequence)) return
if ($reqSequence[$count+1] - $reqSequence) then
remove($reqSequence,$count+1)
else ()
return
$nonRepeatSeq
};
local:func((1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28))
Please suggest how to do so in XQuery functional language.
Two simple ways to do this in XQuery. Both rely on being able to assign the sequence of values to a variable, so that we can look at pairs of individual members of it when we need to.
First, just iterate over the values and select (a) the first value, (b) the last value, (c) any value which is not one greater than its predecessor, and (d) any value which is not one less than its successor. [As the OP pointed out, the last value needs to be included explicitly, because comparing against a missing successor yields the empty sequence rather than true. See Michael Kay's answer for a terser formulation of the filter; De Morgan's law strikes again!]
let $vseq := (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
for $v at $pos in $vseq
return if ($pos eq 1
           or $pos eq count($vseq)
           or $vseq[$pos - 1] ne $v - 1
           or $vseq[$pos + 1] ne $v + 1)
       then $v
       else ()
Or, second, do roughly the same thing in a filter expression:
let $vseq := (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
return $vseq[
  for $i in position() return
    $i eq 1
    or $i eq count($vseq)
    or . ne $vseq[$i - 1] + 1
    or . ne $vseq[$i + 1] - 1]
The primary difference between these two ways of performing the calculation and your non-working attempt is that they don't say anything about changing or modifying the sequence; they simply specify a new sequence. By using a filter expression, the second formulation makes explicit that the result will be a subsequence of $vseq. The for expression makes no such guarantee in general, although here, because it returns either the empty sequence or the value itself for each value, we can see that the result will again be a subsequence: a copy of $vseq from which some values have been omitted.
Many programmers find it difficult to stop thinking in terms of assignment to variables or modification of data structures, but it's worth some effort.
[Addendum] I may be overlooking something, but I don't see a way to express this calculation in pure XPath 2.0, since XPath 2.0 seems not to have any mechanism that can bind a variable like $vseq to a non-singleton sequence of values. (XPath 3.0 has let expressions, so it's not a challenge there. The second formulation above is itself pure XPath 3.0.)
In XSLT this can be done as:
<xsl:for-each-group select="$in" group-adjacent=". - position()">
<xsl:sequence select="current-group()[1], current-group()[last()]"/>
</xsl:for-each-group>
In XQuery 3.0 you can do it with tumbling windows, but I'm too lazy to work out the detail.
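The tumbling-window variant mentioned above might look roughly like the following sketch. It assumes an ascending integer sequence, bound inline to $in here for illustration, and a processor that implements the XQuery 3.0 window clause. A window ends at any item whose successor is not exactly one greater; since the end condition is not marked with only, the last window simply runs to the end of the input.

```xquery
let $in := (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28)
for tumbling window $w in $in
  start when true()
  end $e next $n when $n ne $e + 1
return (
  $w[1],
  (: emit the last item only if the run has more than one member :)
  $w[last()][count($w) gt 1]
)
```

For the sample input this should produce 1, 7, 14, 17, 24, 28. An isolated value forms a window of its own and is emitted once.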
An XPath 2.0 solution (assuming the input sequence is in $in) is:
for $i in 1 to count($in)
return $in[$i][not(. eq $in[$i - 1]+1 and . eq $in[$i+1]-1)]
There are several logic and XQuery usage errors in your solution, but the main problem with it is that variables in XQuery are immutable, so you cannot reassign a value to one once assigned. Therefore, it's often easier to think about these types of problems in terms of recursive solutions:
declare function local:non-consec(
  $prev as xs:integer?,
  $rest as xs:integer*
) as xs:integer*
{
  if (empty($rest)) then ()
  else
    let $curr := head($rest)
    let $next := subsequence($rest, 2, 1)
    return (
      if ($prev eq $curr - 1 and $curr eq $next - 1)
      then () (: This number is part of a consecutive sequence :)
      else $curr,
      local:non-consec(head($rest), tail($rest))
    )
};
local:non-consec((), (1,2,3,4,5,6,7,14,15,16,17,24,25,26,27,28))
=>
1
7
14
17
24
28

XQuery: how to increment a counter variable within a for loop / how to convert an array of string values into nodes

In a Java program, I am using the Saxon library to run some XQuery commands.
First, I use tokenize() to split some text into a number of text values.
An example of the text is--
Mohan Prakash, Ramesh Saini
I use tokenize() on above text with ',' as the delimiter. And store the result of tokenize in variable $var
After this, I want to loop over those text values, and give as output the following--
Mohan Prakash,1
Ramesh Saini,2
As you can see from above, the last value in each row is a counter- for first row, this value is 1, for second row this value=2 and so on...
I thought of using something like the code below--
for $t in $var/position()
return concat($var[$t], ',', $t)
However I am getting an error that I cannot use position() on $var because $var is expected to be a node.
So there are 2 ways to resolve this--
(a) Convert the values in $var to a node
I tried text{$var} but that is not giving accurate results.
How do I convert the array of values into nodes?
(b) Use the following--
for $t in $var
Here, how do I create and use a counter within the for loop, so that the count value can be output along with each value of $t?
You can use the at keyword inside the for clause:
let $var := tokenize('Mohan Prakash, Ramesh Saini', ', ')
for $t at $pos in $var
return concat($t, ',', $pos)
This returns the two strings Mohan Prakash,1 and Ramesh Saini,2.
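If the processor supports XQuery 3.0 (an assumption; the question does not state the Saxon version), the simple map operator is another option. Unlike a path step with /, it accepts atomic values on its left-hand side, so position() is available without converting the strings to nodes:

```xquery
let $var := tokenize('Mohan Prakash, Ramesh Saini', ',\s*')
return $var ! (. || ',' || position())
```

This should yield the same two strings, Mohan Prakash,1 and Ramesh Saini,2.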
This XQuery (which also happens to be an XPath 2.0 expression):
for $text in 'Mohan Prakash, Ramesh Saini',
    $i in 1 to count(tokenize($text, ',\s*'))
return concat(tokenize($text, ',\s*')[$i], ',', $i, '
')
produces the wanted result (the separator pattern ,\s* also swallows the space after each comma):
Mohan Prakash,1
Ramesh Saini,2
