XQuery "Content for update is empty"

This is the first time I've run into the XQuery (3.1) error "Content for update is empty", and a Google search returns nothing useful.
If I run this simple query to identify nested /tei:p/tei:p:
for $x in $mycollection//tei:p/tei:p
return $x
I get XML fragments like the following:
<p xmlns="http://www.tei-c.org/ns/1.0"/>
<p xmlns="http://www.tei-c.org/ns/1.0">Histoires qui sont maintenant du passé (Konjaku monogatari shū). Traduction, introduction et
commentaires de Bernard Frank, Paris, Gallimard/UNESCO, 1987 [1re éd. 1968] (Connaissance de
l'Orient, Série japonaise, 17), p. 323. </p>
<p xmlns="http://www.tei-c.org/ns/1.0">Ed. Chavannes, Cinq cents contes et apologues extraits du Tripitaka chinois, Paris, t. 4,
1934, Notes complémentaires..., p. 147.</p>
<p xmlns="http://www.tei-c.org/ns/1.0"/>
<p xmlns="http://www.tei-c.org/ns/1.0">Ed. Chavannes, Cinq cents contes et apologues extraits du Tripitaka chinois, Paris, t. 4,
1934, Notes complémentaires..., p. 129.</p>
That is, some have text() and others are empty.
I am trying to "de-duplicate" the /tei:p/tei:p, but the following attempts return the same aforementioned error:
for $x in $mycollection//tei:p/tei:p
return update replace $x with $x/(text()|*)
for $x in $mycollection//tei:p/tei:p
let $y := $x/(text()|*)
return update replace $x with $y
I don't understand what the error is trying to tell me in order to correct the query.
Many, many thanks.
edit:
for $x in $mycollection//tei:p[tei:p and count(node()) eq 1]
let $y := $x/tei:p
return update replace $x with $y
I also tried this, selecting via the self axis and replacing the parent, which resulted in a very vague error, exerr:ERROR node not found:
for $x in $mycollection//tei:p/tei:p
let $y := $x/self::*
return update replace $x/parent::* with $y
solution:
for $x in $local:COLLECTIONS//tei:p/tei:p
return if ($x/(text()|*))
then update replace $x with $x/(text()|*)
else update delete $x

The error message indicates that $y is an empty sequence. The XQuery Update documentation describes the replace statement as follows:
update replace expr with exprSingle
Replaces the nodes returned by expr with the nodes in exprSingle. expr must evaluate to a single element, attribute, or text node. If it is an element, exprSingle must contain a single element node...
In certain cases, as shown in your sample data above, $y would be an empty sequence (the empty <p/> elements have neither text nor child elements), which violates the rule that exprSingle must contain a single element node.
To work around such cases, you can add a conditional expression, with an else clause of either an empty sequence () or a delete statement:
if ($y instance of element()) then
update replace $x with $y
else
update delete $x
If your goal is not simply to work around the error, but to arrive at a more direct solution for replacing "double-nested" elements such as:
<p><p>Mixed <hi>content</hi>.</p></p>
with:
<p>Mixed <hi>content</hi>.</p>
then I'd suggest this query, which takes care not to inadvertently delete nodes that might somehow have slipped in between the two nested <p> elements:
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
for $x in $mycollection//tei:p[tei:p and count(node()) eq 1]
let $y := $x/tei:p
return
update replace $x with $y
Given a $mycollection such as this:
<text xmlns="http://www.tei-c.org/ns/1.0">
<p>Hello</p>
<p><p>Hello there</p></p>
<p>Hello <p>there</p></p>
</text>
The query will transform the collection to be as follows:
<text xmlns="http://www.tei-c.org/ns/1.0">
<p>Hello</p>
<p>Hello there</p>
<p>Hello <p>there</p></p>
</text>
This is the expected result for the query, because only the 2nd <p> element had the nested <p> that could be cleanly stripped off. Obviously, if you can assume your content meets simpler patterns, you can remove the and count(node()) eq 1 condition.
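If you would rather try the unwrapping rule without modifying stored documents, the same condition can be sketched as a side-effect-free transformation in plain XQuery 3.1. This is only a sketch: local:unwrap and local:copy-element are hypothetical helper names, not part of eXist or of the query above, and attributes on a discarded outer <p> are simply dropped.

```xquery
xquery version "3.1";
declare boundary-space strip;
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: copies a tree and replaces any tei:p whose only
   child node is another tei:p with that inner tei:p. :)
declare function local:unwrap($node as node()) as node()* {
  typeswitch ($node)
    case element(tei:p) return
      if ($node/tei:p and count($node/node()) eq 1)
      then local:unwrap($node/tei:p)      (: drop the outer wrapper :)
      else local:copy-element($node)
    case element() return local:copy-element($node)
    default return $node                  (: text, comments, PIs :)
};

(: Hypothetical helper: rebuilds an element, recursing into its children. :)
declare function local:copy-element($e as element()) as element() {
  element { node-name($e) } { $e/@*, $e/node() ! local:unwrap(.) }
};

local:unwrap(
  <text xmlns="http://www.tei-c.org/ns/1.0">
    <p>Hello</p>
    <p><p>Hello there</p></p>
    <p>Hello <p>there</p></p>
  </text>
)
```

Run against the sample collection shown above, this should return the same result as the update query: only the second <p> is unwrapped, and the third is left alone because it has two child nodes.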

Related

Multiple keyword search in xquery with tokenize and match

My attempt to ask this before was apparently too convoluted, trying again!
I am composing a search in XQuery. In one of the fields (title) it should be possible to enter multiple keywords. At the moment only ONE keyword works. When there is more than one, I get the error ERROR XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: concat($atomizable-values as xs:anyAtomicType?, ...) xs:string?. Expected cardinality: zero or one, got 2.
In my XQuery I am trying to tokenize the keywords by \s and then match them individually. I think this method is probably wrong, but I am not sure what other method to use. I am obviously a beginner!
Here is the example XML to be searched:
<files>
<file>
<identifier>
<institution>name1</institution>
<idno>signature</idno>
</identifier>
<title>Math is fun</title>
</file>
<file>
<identifier>
<institution>name1</institution>
<idno>signature1</idno>
</identifier>
<title>philosophy of math</title>
</file>
<file>
<identifier>
<institution>name2</institution>
<idno>signature2</idno>
</identifier>
<title>i like cupcakes</title>
</file>
</files>
Here is the XQuery with example input 'math' for the search field title and 'name1' for the search field institution. This works: the search output is the titles 'math is fun' and 'philosophy of math'. What doesn't work is changing the input ($title) to 'math fun'. Then you get the error message. The desired output is the title 'math is fun'.
xquery version "3.0";
let $institution := 'name1'
let $title := 'math' (:change to 'math fun' and doesn't work anymore, only a single word works:)
let $title-predicate :=
if ($title)
then
if (contains($title, '"'))
then concat("[contains(lower-case(title), '", replace($title, '["]', ''), "')]") (:This works fine:)
else
for $title2 in tokenize($title, '\s') (:HERE IS THE PROBLEM, this only works when the input is a single word, for instance 'math' not 'math fun':)
return
concat("[matches(lower-case(title), '", $title2, "')]")
else ()
let $institution-predicate := if ($institution) then concat('[lower-case(string-join(identifier/institution))', " = '", $institution, "']") else ()
let $eval-string := concat
("doc('/db/Unbenannt.xml')//file",
$institution-predicate,
$title-predicate
)
let $records := util:eval($eval-string)
let $test := count($records)
let $content :=
<inner_container>
<div>
<h2>Search Results</h2>
<ul>
{
for $record in $records
return
<li id="searchList">
<span>{$record//institution/text()}</span> <br/>
<span>{$record//title/text()}</span>
</li>
}
</ul>
</div>
</inner_container>
return
$content
You have to wrap your FLWOR expression with string-join():
string-join(
for $title2 in tokenize($title, '\s')
return
concat("[matches(lower-case(title), '", $title2, "')]")
)
If tokenize($title, '\s') returns a sequence of strings, then
for $title2 in tokenize($title, '\s')
return concat("[matches(lower-case(title), '", $title2, "')]")
will also return a sequence of strings.
Therefore $title-predicate will be a sequence of strings, and you can't supply a sequence of strings as one of the arguments to concat().
So it's clear what's wrong, but fixing it requires a deeper understanding of your query than I have time to acquire.
I find it hard to believe that the approach of generating a query as a string and then doing dynamic evaluation of that query is really necessary.
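As one possible alternative to building a query string and evaluating it, the keyword test can be written directly as predicates. The sketch below is an assumption about what the search is meant to do: it inlines a shortened copy of the sample data as $files instead of reading doc('/db/Unbenannt.xml'), uses contains() rather than matches(), requires every keyword to occur in the title, and leaves out the quoted-phrase branch.

```xquery
xquery version "3.1";

(: Shortened copy of the sample data from the question. :)
declare variable $files :=
  <files>
    <file>
      <identifier><institution>name1</institution></identifier>
      <title>Math is fun</title>
    </file>
    <file>
      <identifier><institution>name1</institution></identifier>
      <title>philosophy of math</title>
    </file>
    <file>
      <identifier><institution>name2</institution></identifier>
      <title>i like cupcakes</title>
    </file>
  </files>;

let $institution := 'name1'
let $title := 'math fun'
(: Split the input into lower-case keywords; an empty input gives no keywords. :)
let $keywords := tokenize(lower-case(normalize-space($title)), ' ')
return
  $files/file
    (: An empty $institution disables this filter. :)
    [not($institution) or lower-case(identifier/institution) = lower-case($institution)]
    (: Every keyword must occur somewhere in the title. :)
    [every $k in $keywords satisfies contains(lower-case(title), $k)]
    /title/string()
```

With the inputs shown, this should return only the title "Math is fun". Because the keywords never become part of a query string, quotes or brackets in the user input cannot break or alter the query.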

How to tidy-up Processing Instructions in Marklogic

I have content in my legacy database that is neither valid HTML nor valid XML. Since cleaning up the legacy data would be difficult, I want to tidy it in MarkLogic using xdmp:tidy. I am currently using MarkLogic 8.
<sub>
<p>
<???†?>
</p>
</sub>
I'm passing this content to tidy like this:
declare variable $xml as node() :=
<content>
<![CDATA[<p><???†?></p>]]>
</content>;
xdmp:tidy(xdmp:quote($xml//text()),
<options xmlns="xdmp:tidy">
<assume-xml-procins>yes</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
As a result it returns :
<p>
<? ?†?>
</p>
This result is still not well-formed XML (I checked it with an XML validator), so when I try to insert it into MarkLogic it throws the error 'MALFORMED BODY | Invalid Processing Instruction names'.
I investigated PIs without much luck. I could try saving the content without the PI, but what is there is not a valid PI in the first place.
That is because what you think is a PI is in fact not a PI.
From W3C:
2.6 Processing Instructions
[Definition: Processing instructions (PIs) allow documents to contain instructions for applications.]
Processing Instructions
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
A PITarget must be a Name, and a Name cannot start with ?, so the PI name in your sample (??†) is invalid.
You probably want to clean up the content before you pass it to tidy, for example:
declare variable $xml as node() :=
<content><![CDATA[<p>Hello <???†?>world</p>]]></content>;
declare function local:copy($input as item()*) as item()* {
for $node in $input
return
typeswitch($node)
case text()
return fn:replace($node,"<\?[^>]+\?>","")
case element()
return
element {name($node)} {
(: output each attribute in this element :)
for $att in $node/@*
return
attribute {name($att)} {$att}
,
(: output all the sub-elements of this element recursively :)
for $child in $node
return local:copy($child/node())
}
(: otherwise pass it through. Used for comments and PIs :)
default return $node
};
xdmp:tidy(local:copy($xml),
<options xmlns="xdmp:tidy">
<assume-xml-procins>no</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
This would do the trick to get rid of all PIs (real and fake PIs)
Regards,
Peter
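To see what the regular expression in the text() branch does on its own, it can be tried on a plain string in any XQuery processor. This is a minimal sketch; the $raw variable is just an illustrative name holding the sample string from the code above.

```xquery
xquery version "3.1";

(: The sample string from the answer, as plain text rather than a CDATA section. :)
let $raw := "<p>Hello <???†?>world</p>"
(: Remove anything that looks like <? ... ?>, whether or not it is a valid PI. :)
return replace($raw, "<\?[^>]+\?>", "")
```

This should return the string <p>Hello world</p>. Note that [^>]+ stops at the first >, so a PI-like run that itself contains a > character would not be matched by this pattern.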

XQuery not inserting child node

Below is the XML structure. It's a sample of my original structure, not the exact one.
<Docs>
<Doc>
<Para>
<P n="1"><B>Constants : T</B>he value of pi is 3.14</P>
<P n="2">pi is a geometric term.</P>
</Para>
</Doc>
<Doc>
<Para>
<P n="1"><B>Constants : T</B>he value of g is 9.81 m/sqr of sec</P>
<P n="2">g is a acceleration due to gravity.</P>
</Para>
</Doc>
<Doc>
<Para>
<P n="1"><B>Constants : T</B>he value of c is 3.00 x 10 power 8 m/sec</P>
<P n="2">c is a speed of light in vacuum.</P>
</Para>
</Doc>
</Docs>
I generated the XML files programmatically. The B node contains Constants : T, whereas it should contain only Constants :. I wrote an XQuery to make the necessary changes, but it is not working as expected.
below is the XQuery - Version 1
for $x in doc('doc1')//Doc
where $x/Para/P[@n="1"]/B/text()="Constants : T"
return
let $p := $x/Para/P[@n="1"]
let $pText := concat("T", $p/text())
let $tag := <P n="1">{$pText}</P>
return
(
delete node $p,
insert node $tag as first into $x/Para,
insert node <B>Constants :</B> as first into $x/Para/P[@n="1"]
)
Version - 2 (Smaller, sweeter but not working !!!)
let $b := <B> Constants :</B>
for $x in doc('doc1')//Doc/Para[P[@n="1"]/B/text()="Constants : T"]/P[@n="1"]
return
(
replace value of node $x with concat("T", $x/text()),
insert node $b/node() as first into $x
)
Neither query is inserting <B>Constants : </B>. Can anybody help me on this?
The problem you are facing has to do with the nature of XQuery Update. It uses a pending update list and applies all updates at the end of the query. The order in which the update operations are applied is well defined and is therefore independent of the order in which you write them in your update statement. See some more information at https://docs.basex.org/wiki/Updates#Pending_Update_List.
So in your case, insert is applied before replace, and the replace then overwrites the node you just inserted.
To resolve this, I would replace only the value of the text node and replace the B node. That way the two operations are independent of one another, and their order of execution no longer matters.
let $b := <B> Constants :</B>
for $x in doc('doc1')//Doc/Para[P[@n="1"]/B/text()="Constants : T"]/P[@n="1"]
return
(
replace value of node $x/text() with concat("T", $x/text()),
replace node $x/B with $b
)
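Another way to sidestep the ordering question entirely is to issue a single update per target node, replacing the whole P element with a freshly constructed one. This is only a sketch under the same assumptions as the queries above (a document named 'doc1' and the sample structure from the question); string-join is used in case P has more than one text node child.

```xquery
(: One update primitive per P, so there is nothing for the pending
   update list to reorder. :)
for $x in doc('doc1')//Doc/Para/P[@n = "1"][B = "Constants : T"]
return
  replace node $x with
    <P n="1"><B>Constants :</B>{ concat("T", string-join($x/text(), "")) }</P>
```

The trade-off is that any other attributes or child elements of the original P are not carried over unless you copy them into the constructor yourself.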

XQuery - concat with same first value

I have a collection of files describing the content of magazines:
<magazine>
<issue.number>22</issue.number>
<article>
<title>first article</title>
<subject>James</subject>
</article>
<article>
<title>second article</title>
<subject>James</subject>
</article>
</magazine>
I want to output a list of articles in a format of <issue.number>, <title>.
So I created an XQuery:
for $x in /magazine
return concat($x/issue.number/text(),',',$x//title/text())
This results in an error, which I think is caused because the <issue.number> value is being returned twice. In eXist DB I get this:
The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: concat($atomizable-values as xdt:anyAtomicType?, ...) xs:string?. Expected cardinality: zero or one, got 2.
So if I can't use concat() what can I use?
The problem is that there are two articles in that magazine element, so you're passing two values as the third parameter of concat(...). Also, you don't need to use text() in here (and for most cases, shouldn't).
Loop over the magazines first, then over the articles within each one. It would also be fine to loop only over the articles, but then you'd have to use the parent axis, which I prefer to avoid.
for $magazine in $xml/magazine
for $article in $magazine/article
return concat($magazine/issue.number, ',', $article//title)
22,first article
22,second article
For concatenation over sequences of strings, have a look at string-join($sequence, $separator). An example would be if you want all titles of one issue in a row.
for $magazine in $xml/magazine
return concat($magazine/issue.number, ': ', string-join($magazine//title, ', '))
which would return
22: first article, second article

How to indicate 'missing' tags in XQuery?

I have an XML file:
$xml := <xml>
<element>
<text>blahblah</text>
</element>
<element>
</element>
<element>
<text>blahblah</text>
</element>
</xml>
I can use the query
for $x in $xml/xml/element/text return string($x)
This gives me a list
blahblah
blahblah
with no indication that there is an element which has no text child. What I'd like to do is use a query which, if there is no such child, returns, say, "missing". How do I do this?
For a sequence of strings (slightly modified version of the first answer):
for $e in $xml/xml/element
return
if ($e/text)
then string($e/text)
else "missing"
or using a let (which seems a little cleaner to me... but it's probably just 6 of one and half dozen of the other):
for $e in $xml/xml/element
let $text := string($e/text)
return
if ($text)
then $text
else "missing"
Hope that helps.
Are you trying to return the "element" elements that don't have any children? (In your example, it's the second occurrence of "element" as the first and last contain "text" elements.)
If so, you can use a predicate in an XPath expression:
/xml/element[not(*)]
This should work:
for $x in $xml/xml/element
return
if ($x/text)
then string($x/text)
else "missing"
In MarkLogic:
for $e in $xml/xml/element
return ($e/text/string(), "missing")[1]
$xml/element/string((text,"missing")[1])
Functions are allowed in XPath expressions, so an explicit loop is not needed here.
The expression (text,"missing")[1]
returns the first item in the sequence made up of the text element (if there is one) followed by the string "missing".
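The same "first item wins" idea can be tried as a self-contained query. This is a minimal sketch that assumes $xml is bound to the <xml> element itself (so the path starts at $xml/element) and uses the XQuery 3.0 simple map operator ! in place of a for loop.

```xquery
xquery version "3.1";

let $xml :=
  <xml>
    <element><text>blahblah</text></element>
    <element/>
    <element><text>blahblah</text></element>
  </xml>
(: For each element, take the string value of its text child if present,
   otherwise fall back to the literal "missing". :)
return $xml/element ! (text/string(), "missing")[1]
```

This should return the three strings blahblah, missing, blahblah, in document order.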
You can use the eXist sandbox to execute code snippets:
http://demo.exist-db.org/exist/sandbox/sandbox.xql

Resources