How to minimize the use of FOR loops in XQuery?

I have the code below, and I want to know whether it is possible to get the same result without using a FOR loop:
let $doc := cts:search(fn:doc(),
cts:element-value-query(......))
let $column-values := ("one,two,three....") (: Can be n number of values :)
let $tokenized-values := fn:tokenize($column-values, ",")
let $result := for $i in $doc
let $temp := for $j in $tokenized-values
return fn:concat("$i//*:",$j)
return <root>{xdmp:value($temp)}</root>
return <result>{$result}</result>
The expected result is:
<result>
<root>
<one>abc</one>
<two>456</two>
<three>675</three>
</root>
<root>
<one>dfd</one>
<two>235</two>
<three>765</three>
</root>
</result>
I am getting the expected results, but how can I get them while minimizing the use of FOR loops?
Any suggestions?

To improve performance, you could put a range index on every column you want to pull and use cts:element-value-tuples in lieu of cts:search. This would pull only the elements you want rather than the whole document.
As an alternate syntax for the second for loop, you could change this:
for $j in $tokenized-values
return fn:concat("$i//*:",$j)
to the simple map operator:
$tokenized-values ! fn:concat("$i//*:", .)
Performance is roughly the same, though.
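Another option, sketched below under assumptions, is to drop both the inner loop and the xdmp:value call by filtering on local-name() against the tokenized sequence. The inline $docs sample and the $names variable are hypothetical stand-ins for the search results and column list in the question. Note that the matched elements come back in document order, not in the order of the tokens, and that any descendant with a matching local name is included.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for the cts:search results :)
let $docs := (
  <row><one>abc</one><two>456</two><three>675</three><four>x</four></row>,
  <row><one>dfd</one><two>235</two><three>765</three><four>y</four></row>
)
(: The wanted column names as a sequence of strings :)
let $names := tokenize("one,two,three", ",")
return
  <result>{
    for $d in $docs
    (: A general comparison matches if the local name equals ANY item in $names :)
    return <root>{ $d//*[local-name() = $names] }</root>
  }</result>
```

One loop over the documents remains, but the per-column loop and the dynamic evaluation are gone, which also avoids building query strings by concatenation.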

Related

Joining elements in separate sequences based on their position in Xquery 3.0

I have 3 sequences as shown below:
let $city := ('London', 'Tokyo')
let $country := ('England', 'Japan')
let $region := ('Europe','Asia')
What I would like to do is manipulate the 3 sequences so that the output looks like this:
<Data>
<location>
<city>London</city>
<country>England</country>
<region>Europe</region>
</location>
<location>
<city>Tokyo</city>
<country>Japan</country>
<region>Asia</region>
</location>
</Data>
My plan was to do the following:
1) Add a count to each of the $city, $country and $region sequences respectively as shown below. Please note that I haven't detailed how I'd be doing this but I believe it should be fairly straightforward.
let $city := ('1London', '2Tokyo')
let $country := ('1England', '2Japan')
let $region := ('1Europe','2Asia')
2) Join on the first character of each item in the sequence and manipulate it somehow as shown below. Please note this code doesn't work but it's what I believe could work.
let $city := ('1London', '2Tokyo')
let $country := ('1England', '2Japan')
let $region := ('1Europe','2Asia')
for $locCity in $city, $locCountry in $country, $locRegion in $region
where substring($locCity,1,1) = substring($locCountry ,1,1) and
substring($locCountry ,1,1) = substring($locRegion ,1,1)
let $location := ($locCity,$locCountry,$region)
return
$location
However, this is not working at all. I'm not sure how to proceed really. I'm sure there is a better approach to tackling this problem. Perhaps a "group by" approach might work. Any help would be greatly appreciated. Thanks.
Cheers
Jay
Find out which of the sequences has the most items using max(), then iterate from 1 to that maximum and construct a <location> element for each index. Select the value from each sequence using a positional predicate:
xquery version "1.0";
let $city := ('London', 'Tokyo')
let $country := ('England', 'Japan')
let $region := ('Europe','Asia')
return
<Data>
{
for $i in 1 to max((count($city), count($country), count($region)))
return
<location>
<city>{ $city[$i] }</city>
<country>{ $country[$i] }</country>
<region>{ $region[$i] }</region>
</location>
}
</Data>

Build dictionary in XQuery for loop and count occurrences of similar nodes

I'm trying to count occurrences of a string during a for loop in a dictionary (a BaseX map). It seems that the contents of the dictionary are cleared after each iteration. Is there a way to keep the info throughout the loop?
declare variable $dict as map(*) := map:merge(());
for $x at $cnt in //a order by -$cnt
let $l := (if (map:contains($dict, $x/@line)) then (fn:number(map:get($dict, $x/@line))) else (0))
let $dict := map:put($dict, $x/@line, 1 + $l)
return (
$dict,
if ($x[@speaker="player.computer" or @speaker = "event.object"])
then ( <add sel="(//{fn:name($x)}[@line='{$x/@line}'])[{fn:string(map:get($dict, $x/@line))}]" type="@hidechoices">false</add> )
else ( <remove sel="(//{fn:name($x)}[@line='{$x/@line}'])[1]" />)
)
So for this XML:
<a line="x" />
<a line="y" />
<a line="y" />
<a line="z" />
I should get something like this for the first iteration:
{
"x": 1
}
and this for the last iteration:
{
"x": 1,
"y": 2,
"z": 1
}
I have to construct some text out of this in the end; that's the last part of the output.
Right now I only get the current key/value pair at each iteration, so $dict has only one entry throughout the whole execution, and $l is always 0.
Thankfully this worked:
for $x at $cnt in //a
let $dict := map:merge((
for $y at $pos in //a
let $line := $y/@line
where $pos <= $cnt
group by $line
return map:entry($line, count($y))
))
return (
$dict,
if ($x[@speaker="player.computer" or @speaker = "event.object"])
then ( <add sel="(//{fn:name($x)}[@line='{$x/@line}'])[{fn:string(map:get($dict, $x/@line))}]" type="@hidechoices">false</add> )
else ( <remove sel="(//{fn:name($x)}[@line='{$x/@line}'])[1]" />)
)
For some reason I could not use position() to limit the inner for; it returned all nodes right at the first iteration.
Thanks a lot for your help!
Your whole approach is flawed. XQuery is a functional language, and the way you describe your problem and wrote your query indicates that you do not yet fully grasp the functional programming paradigm (which is fully understandable, as it is quite different from procedural programming). I would suggest you read up on the topic in general.
Instead of iterating over all elements in a procedural way, you can use a FLWOR expression with group by:
let $map := map:merge((
for $x in //a
let $line := $x/@line
group by $line
return map:entry($line, count($x))
))
This holds the result you expected. It iterates over the a elements and groups them together by their line attribute.
Another remark: Your output XML in the sel attribute looks suspiciously like the path to a certain element. Are you aware of the fn:path function, which gives you exactly that?
Based on your update from the comments you can calculate the map multiple times, but just up to the current position:
for $y at $pos in //a
let $map := map:merge((
for $x in //a[position() <= $pos]
let $line := $x/@line
group by $line
return map:entry($line, count($x))
))
return $map
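If only the running occurrence number of the current element is needed (the index used in the sel attribute), a map may not be necessary at all. The sketch below assumes the a elements are siblings under a common parent; the wrapper element root and the output element occurrence are hypothetical names used only for illustration.

```xquery
xquery version "1.0";

(: Hypothetical wrapper around the sample data from the question :)
let $doc :=
  <root>
    <a line="x"/>
    <a line="y"/>
    <a line="y"/>
    <a line="z"/>
  </root>
for $x in $doc/a
(: Count earlier siblings with the same @line, then add the current one :)
let $n := count($x/preceding-sibling::a[@line = $x/@line]) + 1
return <occurrence line="{ $x/@line }" n="{ $n }"/>
```

For the sample data this yields n values of 1, 1, 2, 1 for the lines x, y, y, z. Like the map-based version, it rescans earlier elements on every iteration, so it is a matter of readability rather than a guaranteed speedup.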

How to dynamically create a search query based on a set of quoted strings in MarkLogic

I have the following query, in which I want to build a comma-separated string of values from a list and use that string in an or-query. The query does not return any results, yet when I return just the concatenated string, it shows exactly the value needed for the query.
The query is as follows:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare variable $docURI as xs:string external ;
declare variable $orQuery as xs:string external ;
let $tags :=
<tags>
<tag>"credit"</tag>
<tag>"bank"</tag>
<tag>"private banking"</tag>
</tags>
let $docURI := "/2012-10-22_CSGN.VX_(Citi)_Credit_Suisse_(CSGN.VX)__Model_Update.61198869.xml"
let $orQuery := (string-join($tags/tag, ','))
for $x in cts:search(doc($docURI)/doc/Content/Section/Paragraph, cts:or-query(($orQuery)))
let $r := cts:highlight($x, cts:or-query($orQuery), <b>{$cts:text}</b>)
return <result>{$r}</result>
The exact query that I want to run is:
cts:search(doc($docURI)/doc/Content/Section/Paragraph, cts:or-query(("credit","bank","private banking")))
and when I do
return (string-join($tags/tag, ','))
it gives me exactly what I require:
"credit","bank","private banking"
But why does it not return any result in the or-query?
The string-join step is not needed: it produces one literal string, not a list of terms. In XQuery, sequences are your friend.
I think you want to do something like this:
let $tags-to-search := ($tags/tag/text()!replace(., '^"|"$', '') ) (: a sequence of tags :)
return cts:search(doc($docURI)/doc/Content/Section/Paragraph, cts:word-query($tags-to-search))
cts:word-query is the default query used for the second parameter of cts:search if you pass in a string. When given a sequence, cts:word-query also returns matches for any of its items.
https://docs.marklogic.com/cts:word-query
EDIT: Added the replace step for the quotes as suggested by Abel. This is specific to the data as presented by the original question. The overall approach remains the same.
Maybe you need something like this:
let $orQuery := for $tag in $tags/tag return cts:word-query($tag)
I used fn:tokenize instead, and it worked perfectly for my use case.
I needed this because I was passing these arguments from Java using the XCC API, and the query would not return anything with string values.
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare variable $docURI as xs:string external ;
declare variable $orQuery as xs:string external ;
let $input := "credit,bank"
let $tokens := fn:tokenize($input, ",")
let $docURI := "2012-11-19 0005.HK (Citi) HSBC Holdings Plc (0005.HK)_ Model Update.61503613.pdf"
for $x in cts:search(fn:doc($docURI), cts:or-query(($tokens)))
let $r := cts:highlight($x, cts:or-query(($tokens)), <b>{$cts:text}</b>)
return <result>{$r}</result>

XQuery resulting in Out Of Memory Error

Following is the XML file -
<Info>
<Name>
<P1>Tomy</P1>
<P2>John</P2>
<R>P2</R>
</Name>
<Name>
<P1>Tomy</P1>
<P2>John</P2>
<R>P1</R>
</Name>
<Name>
<P1>Rojer</P1>
<P2>Messi</P2>
<R>P2</R>
</Name>
<Name>
<P1>Messi</P1>
<P2>Carl</P2>
<R>P2</R>
</Name>
<Name>
<P1>Messi</P1>
<P2/>
<R>P1</R>
</Name>
</Info>
P1 is the Player No 1 and P2 is the Player No 2, R is the result of the match.
I want to list the names of the players in the following format, with distinct occurrences.
<P>Tomy V John</P>
<P>Rojer V Messi</P>
<P>Messi V John</P>
<P>Messi</P>
Following is the solution, which gives the desired output -
let $o := doc('sam')//Name
for $x in $o
let $p := if (string-length($x/P2/text()) > 0) then concat($x/P1/text(),' Vs. ',$x/P2/text()) else $x/P1/text()
for $y in $p
group by $y
order by $y
return <a>{$y}</a>
But this query works only for a small set of records. In reality I have 200000+ records, and when I execute this query, I get an Out Of Memory error. I am using BaseX 7.6.
If I remove the group by clause, the query executes without error, but it gives the wrong result, i.e. with duplicate values. I need distinct results.
Any help?
Try this:
for $x in doc('sam')//Name
let $p := if (string-length($x/P2/text()) > 0) then concat($x/P1/text(),' Vs. ',$x/P2/text()) else $x/P1/text()
group by $p
order by $p
return <a>{$p}</a>
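Since only the distinct strings are needed, another way to express this is with fn:distinct-values instead of group by. The following is a minimal sketch using a hypothetical inline $names sample in place of doc('sam')//Name; whether it actually uses less memory than group by on a given BaseX version is an assumption that would need to be measured.

```xquery
xquery version "1.0";

(: Hypothetical sample standing in for doc('sam')//Name :)
let $names := (
  <Name><P1>Tomy</P1><P2>John</P2><R>P2</R></Name>,
  <Name><P1>Tomy</P1><P2>John</P2><R>P1</R></Name>,
  <Name><P1>Messi</P1><P2/><R>P1</R></Name>
)
(: Build one string per match, then keep each distinct string once :)
for $p in distinct-values(
  for $x in $names
  return
    if (string-length($x/P2) > 0)
    then concat($x/P1, ' Vs. ', $x/P2)
    else string($x/P1)
)
order by $p
return <a>{ $p }</a>
```

Here only atomic strings are deduplicated, so no groups of nodes have to be kept per key.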
For anyone looking at this question: another possible cause of the out-of-memory behavior is the wrapping XML element, because XQuery implementations copy nodes when they are placed inside a newly constructed element, e.g.:
... <a>{$y}</a>
In that case something like this could also be the solution:
let $o := doc('sam')//Name
for $x in $o
let $p := if (string-length($x/P2/text()) > 0) then concat($x/P1/text(),' Vs. ',$x/P2/text()) else $x/P1/text()
for $y in $p
group by $y
order by $y
return (# db:copynode false #) { <a>{$y}</a> }
For more, please see this answer:
https://stackoverflow.com/a/48887745/4825622

Updating counter in XQuery

I want to create a counter in xquery. My initial attempt looked like the following:
let $count := 0
for $prod in $collection
let $count := $count + 1
return
<counter>{$count }</counter>
Expected result:
<counter>1</counter>
<counter>2</counter>
<counter>3</counter>
Actual result:
<counter>1</counter>
<counter>1</counter>
<counter>1</counter>
The $count variable is either failing to update or being reset. Why can't I reassign an existing variable? What would be a better way to get the desired result?
Try using 'at':
for $d at $p in $collection
return
element counter { $p }
This will give you the position of each '$d'. If you want to use this together with the order by clause, this won't work since the position is based on the initial order, not on the sort result. To overcome this, just save the sorted result of the FLWOR expression in a variable, and use the at clause in a second FLWOR that just iterates over the first, sorted result.
let $sortResult := for $item in $collection
order by $item/id
return $item
for $sortItem at $position in $sortResult
return <item position="{$position}"> ... </item>
As @Ranon said, all XQuery values are immutable, so you can't update a variable. But if you really need an updateable number (which shouldn't happen too often), you can use recursion:
declare function local:loop($seq, $count) {
if(empty($seq)) then ()
else
let $prod := $seq[1],
$count := $count + 1
return (
<count>{ $count }</count>,
local:loop($seq[position() > 1], $count)
)
};
local:loop($collection, 0)
This behaves exactly as you intended with your example.
In XQuery 3.0 a more general version of this function is even defined in the standard library: fn:fold-left($seq, $zero, $f)
That said, in your example you should definitely use at $count as shown by @tohuwawohu.
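As a rough illustration of the fold-based variant, here is a minimal sketch that assumes an XQuery 3.0 processor with the final fn:fold-left($seq, $zero, $f) signature. The three-item $collection is a hypothetical placeholder, and the accumulator is simply the sequence of counter elements built so far.

```xquery
xquery version "3.0";

(: Hypothetical stand-in for the real $collection :)
let $collection := ("a", "b", "c")
return
  fold-left($collection, (), function($acc, $item) {
    (: The next counter value is the number of results so far plus one :)
    ($acc, <counter>{ count($acc) + 1 }</counter>)
  })
```

This returns three counter elements holding 1, 2 and 3. For a plain position counter the at clause is still the simpler choice; a fold is mainly useful when the carried state is more than an index.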
Immutable variables
XQuery is a functional programming language, which involves amongst others immutable variables, so you cannot change the value of a variable. On the other hand, a powerful collection of functions is available to you, which solves lots of daily programming problems.
let $count := 0
for $prod in $collection
let $count := $count + 1
return
<counter>{$count }</counter>
let $count in line 1 defines this variable for its whole scope, which in this case is all the following lines. let $count in line 3 defines a new $count with the value 0+1, which shadows the first one and is valid only for the rest of the current loop iteration. So you do add one to $count three times, but the result is discarded immediately each time.
BaseX' query info shows the optimized version of this query which is
for $prod in $collection
return element { "counter" } { 1 }
The solution
To get the total number of elements in $collection, you can just use
return count($collection)
For a list of XQuery functions, you could have a look at the XQuery part of functx which contains both a list of XQuery functions and also some other helpful functions which can be included as a module.
Specific to MarkLogic you can also use xdmp:set. But this breaks functional language assumptions, so use it conservatively.
http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/apidoc/ExsltBuiltins.xml&category=Extension&function=xdmp:set
For an example of xdmp:set in real-world code, the search parser https://github.com/mblakele/xqysp/blob/master/src/xqysp.xqy might be helpful.
All the solutions above are valid, but I would like to mention that you can use the XQuery Scripting extension to set variable values:
variable $count := 0;
for $prod in (1 to 10)
return {
$count := $count + 1;
<counter>{$count}</counter>
}
You can try this example live at http://www.zorba-xquery.com/html/demo#twh+3sJfRpHhZR8pHhOdsmqOTvQ=
Use xdmp:set, as in the query below:
let $count := 0
for $prod in (1 to 4)
return ( xdmp:set($count,number($count+1)) ,<counter>{$count }</counter> )
I think you are looking for something like:
XQUERY:
for $x in (1 to 10)
return
<counter>{$x}</counter>
OUTPUT:
<counter>1</counter>
<counter>2</counter>
<counter>3</counter>
<counter>4</counter>
<counter>5</counter>
<counter>6</counter>
<counter>7</counter>
<counter>8</counter>
<counter>9</counter>
<counter>10</counter>
