Encode string as html in eXist-db / XQuery - xquery

I'm trying to generate a treeview from a collection (filesystem). Unfortunately, some files have special characters like ü, ä and ö, and I'd like to have them HTML-encoded, e.g. as &auml;.
When I get them from the variable, they are URL-encoded. First I decode them to UTF-8, and then... I don't know how to go further.
<li>{util:unescape-uri($child, "UTF-8")}
The function util:parse does the exact opposite of what I want.
Here is the recursive function:
xquery version "3.0";
declare namespace ls="ls";
declare option exist:serialize "method=html media-type=text/html omit-xml-declaration=yes indent=yes";
declare function ls:ls($collection as xs:string, $subPath as xs:string) as element()* {
if (xmldb:collection-available($collection)) then
(
for $child in xmldb:get-child-collections($collection)
let $path := concat($collection, '/', $child)
let $sPath := concat($subPath, '/', $child)
order by $child
return
<li>{util:unescape-uri($child, "UTF-8")}
<ul>
{ls:ls($path,$sPath)}
</ul>
</li>,
for $child in xmldb:get-child-resources($collection)
let $sPath := concat($subPath, '/', $child)
order by $child
return
<li> {util:unescape-uri($child, "UTF-8")}</li>
)
else ()
};
let $collection := request:get-parameter('coll', '/db/apps/ebner-online/resources/xss/xml')
return
<ul>{ls:ls($collection,"")}</ul>

Rather than util:unescape-uri(), I would suggest using xmldb:encode-uri() and xmldb:decode-uri(). Use the encode version on a collection or document name when creating/storing it. Use the decode version when displaying the collection or document name. See the function documentation for the xmldb module.
As to forcing &auml; instead of ä, this is an even trickier serialization issue. Both, along with &#228;, are equivalent representations of the same character. Why not just let the character through as ä?
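A minimal sketch of the decode-on-display idea, assuming eXist-db's xmldb module is available under its default prefix. The collection path is only a placeholder, not a path from the question:

```xquery
xquery version "3.0";

(: Assumption: this collection path is a placeholder for illustration. :)
let $collection := '/db/apps/example'
return
    if (xmldb:collection-available($collection)) then
        <ul>{
            for $child in xmldb:get-child-resources($collection)
            order by $child
            return
                (: decode the stored, URI-encoded name only when displaying it :)
                <li>{xmldb:decode-uri(xs:anyURI($child))}</li>
        }</ul>
    else ()
```

The decoded name is an ordinary string, so the serializer writes ü, ä and ö as plain characters in the output encoding.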

Related

Multiple keyword search in xquery with tokenize and match

My attempt to ask this before was apparently too convoluted, trying again!
I am composing a search in Xquery. In one of the fields (title) it should be possible to enter multiple keywords. At the moment only ONE keyword works. When there is more than one there is the error ERROR XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: concat($atomizable-values as xs:anyAtomicType?, ...) xs:string?. Expected cardinality: zero or one, got 2.
In my XQuery I am trying to tokenize the keywords by \s and then match them individually. I think this method is probably wrong, but I am not sure what other method to use. I am obviously a beginner!
Here is the example XML to be searched:
<files>
<file>
<identifier>
<institution>name1</institution>
<idno>signature</idno>
</identifier>
<title>Math is fun</title>
</file>
<file>
<identifier>
<institution>name1</institution>
<idno>signature1</idno>
</identifier>
<title>philosophy of math</title>
</file>
<file>
<identifier>
<institution>name2</institution>
<idno>signature2</idno>
</identifier>
<title>i like cupcakes</title>
</file>
</files>
Here is the Xquery with example input 'math' for the search field title and 'name1' for the search field institution. This works, the search output are the titles 'math is fun' and 'philosophy of math'. What doesn't work is if you change the input ($title) to 'math fun'. Then you get the error message. The desired output is the title 'math is fun'.
xquery version "3.0";
let $institution := 'name1'
let $title := 'math' (:change to 'math fun' and doesn't work anymore, only a single word works:)
let $title-predicate :=
if ($title)
then
if (contains($title, '"'))
then concat("[contains(lower-case(title), '", replace($title, '["]', ''), "')]") (:This works fine:)
else
for $title2 in tokenize($title, '\s') (:HERE IS THE PROBLEM, this only works when the input is a single word, for instance 'math' not 'math fun':)
return
concat("[matches(lower-case(title), '", $title2, "')]")
else ()
let $institution-predicate := if ($institution) then concat('[lower-case(string-join(identifier/institution))', " = '", $institution, "']") else ()
let $eval-string := concat
("doc('/db/Unbenannt.xml')//file",
$institution-predicate,
$title-predicate
)
let $records := util:eval($eval-string)
let $test := count($records)
let $content :=
<inner_container>
<div>
<h2>Search Results</h2>
<ul>
{
for $record in $records
return
<li id="searchList">
<span>{$record//institution/text()}</span> <br/>
<span>{$record//title/text()}</span>
</li>
}
</ul>
</div>
</inner_container>
return
$content
You have to wrap your FLWOR expression with string-join():
string-join(
for $title2 in tokenize($title, '\s')
return
concat("[matches(lower-case(title), '", $title2, "')]")
)
If tokenize($title) returns a sequence of strings, then
for $title2 in tokenize($title, '\s')
return concat("[matches(lower-case(title), '", $title2, "')]")
will also return a sequence of strings.
Therefore $title-predicate will be a sequence of strings, and you can't supply a sequence of strings as one of the arguments to concat().
So it's clear what's wrong, but fixing it requires a deeper understanding of your query than I have time to acquire.
I find it hard to believe that the approach of generating a query as a string and then doing dynamic evaluation of that query is really necessary.
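As one possible way to avoid building and evaluating a query string, the keyword test can be written as an ordinary predicate. This is only a sketch: it inlines a trimmed copy of the sample data instead of reading /db/Unbenannt.xml, and it uses contains() for plain substring matching rather than the regex-based matches() from the question:

```xquery
xquery version "3.0";

(: Inline copy of the sample data, trimmed to the fields the search uses. :)
let $files :=
    <files>
        <file><identifier><institution>name1</institution></identifier><title>Math is fun</title></file>
        <file><identifier><institution>name1</institution></identifier><title>philosophy of math</title></file>
        <file><identifier><institution>name2</institution></identifier><title>i like cupcakes</title></file>
    </files>
let $institution := 'name1'
let $title := 'math fun'
(: Split on whitespace and drop empty tokens caused by stray spaces. :)
let $keywords := tokenize(lower-case($title), '\s+')[. ne '']
return
    $files/file
        [not($institution) or lower-case(identifier/institution) = lower-case($institution)]
        [every $k in $keywords satisfies contains(lower-case(title), $k)]
        /title
(: returns <title>Math is fun</title> :)
```

Because every keyword must be contained in the title, 'math fun' matches only the first file, while 'math' alone would match the first two.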

How to share markup snippets amongst functions in eXist-db?

I wonder whether there is a way how to share html code snippets in eXist-db. I have two different (more expected later) functions returning the same big html form for different results. It is annoying to maintain the same code when I change something in one of these. I have tried:
1. Saving it as an HTML file and loading it with the doc() function (eXist complains that it is not an XML file but a binary one).
2. Saving it as a global variable in a separate module (eXist complains there is a problem with contexts). I don't know how to pass such a variable without the namespace prefix.
3. Saving it as a function returning its own huge variable (eXist complains there is a problem with contexts).
What is the best practice?
UPDATE
Well, I have tried to put the snippet into a variable inside a function loaded as a module. To me, that seems reasonable. However, I get an error when I try to return it:
err:XPDY0002 Undefined context sequence for 'child::snip:snippet' [at line 62, column 13, source: /db/apps/karolinum-apps/modules/app.xql]
In function: app:book-search(node(), map, xs:string?) [34:9:/db/apps/karolinum-apps/modules/app.xql]
I am calling it like so:
declare function app:list-books($node as node(), $model as map(*)) {
for $resource in collection('/db/apps/karolinum-apps/data/mono')
let $author := $resource//tei:titleStmt/tei:author/text()
let $bookName := $resource//tei:titleStmt/tei:title/text()
let $bookUri := base-uri($resource)
let $imgPath := replace($bookUri, '[^/]*?$', '')
let $fileUri := ( '/exist/rest' || $bookUri )
let $fileName := replace($bookUri, '.*?/', '')
return
if ($resource//tei:titleStmt/tei:title)
then
snip:snippet
else ()
};
Any ideas, please?
UPDATE II
Here I have the function in the module:
module namespace snip = "http://46.28.111.241:8081/exist/db/apps/karolinum-apps/modules/snip";
declare function snip:snippet($node as node(), $model as map(*), $author as xs:string, $bookTitle as xs:string, $bookUri as xs:anyURI, $fileUri as xs:anyURI) as element()* {
let $snippet :=
(
<div class="panel panel-default">
<div class="panel-heading">
<h3 class="panel-title">{$bookTitle} ({$author})</h3>
</div>
<div class="panel-body">
...
</div>
</div>
)
return $snippet
};
Here I am trying to call it:
declare function app:list-books($node as node(), $model as map(*)) {
for $resource in collection('/db/apps/karolinum-apps/data/mono')
let $author := $resource//tei:titleStmt/tei:author/text()
let $bookTitle := $resource//tei:titleStmt/tei:title/text()
let $bookUri := base-uri($resource)
let $fileUri := ('/exist/rest' || $bookUri)
let $fileName := replace($bookUri, '.*?/', '')
where not(util:is-binary-doc($bookUri))
order by $bookTitle, $author
return
snip:snippet($author, $bookTitle, $bookUri, $fileUri)
};
It throws:
err:XPST0017 error found while loading module app: Error while loading module app.xql: Function snip:snippet() is not defined in namespace 'http://46.28.111.241:8081/exist/db/apps/karolinum-apps/modules/snip' [at line 35, column 9]
When I tried to put the snippet into a variable, it was not possible to pass there those local variables used (it threw $fileUri is not set). Besides that I tried to change the returned type element()* but nothing helped.
All of your approaches should work. Let me address each one:
1. Is the HTML snippet well-formed XML? If so, save it as, e.g., form.xml or form.html (since by default eXist assumes files with the .html extension are well-formed; see mime-types.xml in your eXist installation folder) and refer to it with doc($path). If it is not well-formed, you can save it as form.txt and pull it in with util:binary-to-string(util:binary-doc($path)). Or make the HTML well-formed and use the first alternative.
2. This too is valid, so you must not be properly declaring or referring to the global variable. What is the exact error you are getting? Can you post a small example snippet that we could run to reproduce your results?
3. See #2.
I was very close. It was necessary to somehow pass parameters to the nested function and omit eXist’s typical $node as node(), $model as map(*) as arguments.
Templating function:
declare function app:list-books($node as node(), $model as map(*)) {
for $resource in collection('/db/apps/karolinum-apps/data/mono')
let $author := $resource//tei:titleStmt/tei:author/text()
let $bookTitle := $resource//tei:titleStmt/tei:title/text()
let $bookUri := base-uri($resource)
let $bookId := xs:integer(util:random() * 10000)
let $fileUri := ('/exist/rest' || $bookUri)
let $fileName := replace($bookUri, '.*?/', '')
where not(util:is-binary-doc($bookUri))
order by $bookTitle, $author
return
snip:snippet($author, $bookTitle, $bookUri, $bookId, $fileUri)
};
Snippet function:
declare function snip:snippet($author as xs:string, $bookTitle as xs:string, $bookUri as xs:anyURI, $bookId as xs:string, $fileUri as xs:anyURI) as element()* {
let $snippet :=
(
<div class="panel panel-default">
...
</div>
)
return $snippet
};
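The XPST0017 error above is consistent with how XQuery identifies functions: by name and number of arguments together. The earlier declaration took six parameters but was called with four. A small sketch of the rule, using a hypothetical local:snippet with two parameters as a stand-in for snip:snippet:

```xquery
xquery version "3.0";

(: Hypothetical stand-in for snip:snippet, declared with exactly two parameters. :)
declare function local:snippet($author as xs:string, $bookTitle as xs:string) as element(div) {
    <div class="panel panel-default">
        <div class="panel-heading">
            <h3 class="panel-title">{$bookTitle} ({$author})</h3>
        </div>
    </div>
};

(: Two arguments match local:snippet#2. Calling it with one or three
   arguments would raise err:XPST0017, because no function with that
   name and arity exists. :)
local:snippet('Jane Doe', 'An Example Book')
```

The author and title strings here are made-up values for illustration only.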

How to tidy-up Processing Instructions in Marklogic

I have a content which is neither a valid HTML nor a XML in my legacy database. Considering the fact, it would be difficult to clean the legacy, I want to tidy this up in MarkLogic using xdmp:tidy. I am currently using ML-8.
<sub>
<p>
<???†?>
</p>
</sub>
I'm passing this content to tidy functionality in a way :
declare variable $xml as node() :=
<content>
<![CDATA[<p><???†?></p>]]>
</content>;
xdmp:tidy(xdmp:quote($xml//text()),
<options xmlns="xdmp:tidy">
<assume-xml-procins>yes</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
As a result it returns :
<p>
<? ?†?>
</p>
This result is not valid XML (I checked it with an XML validator), so when I try to insert it into MarkLogic it throws an error saying 'MALFORMED BODY | Invalid Processing Instruction names'.
I did some investigation around PIs, but without much luck. I could try saving the content without the PI, but what I have is not a valid PI in the first place.
That is because what you think is a PI is in fact not a PI.
From W3C:
2.6 Processing Instructions
[Definition: Processing instructions (PIs) allow documents to contain instructions for applications.]
Processing Instructions
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
So the PI name cannot start with ? as in your sample ??†
You probably want to clean up the content before you pass it to tidy.
Like below:
declare variable $xml as node() :=
<content><![CDATA[<p>Hello <???†?>world</p>]]></content>;
declare function local:copy($input as item()*) as item()* {
for $node in $input
return
typeswitch($node)
case text()
return fn:replace($node,"<\?[^>]+\?>","")
case element()
return
element {name($node)} {
(: output each attribute in this element :)
for $att in $node/@*
return
attribute {name($att)} {$att}
,
(: output all the sub-elements of this element recursively :)
for $child in $node
return local:copy($child/node())
}
(: otherwise pass it through. Used for comments and real PIs :)
default return $node
};
xdmp:tidy(local:copy($xml),
<options xmlns="xdmp:tidy">
<assume-xml-procins>no</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
This would do the trick to get rid of all PIs (real and fake PIs)
Regards,
Peter
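To see in isolation what the regular expression in the text() case does, here is a small sketch that applies the same pattern to the sample string from the question. It uses only the standard replace() function:

```xquery
xquery version "3.0";

let $raw := '<p>Hello <???†?>world</p>'
(: same pattern as in local:copy: "<?" followed by anything except ">" up to "?>" :)
return replace($raw, '<\?[^>]+\?>', '')
(: returns the string <p>Hello world</p> :)
```

Note that the pattern removes every "<?...?>" span, so well-formed PIs in the string are dropped along with the broken ones.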

How to dynamically create a search query based on a set of quoted strings in MarkLogic

I have the following query, where i want to form a string of values from a list and i want to use that comma separated string as an or-query but it does not give any result, however when i return just the concatenated string it gives the exact value needed for the query.
The query is as follows:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare variable $docURI as xs:string external ;
declare variable $orQuery as xs:string external ;
let $tags :=
<tags>
<tag>"credit"</tag>
<tag>"bank"</tag>
<tag>"private banking"</tag>
</tags>
let $docURI := "/2012-10-22_CSGN.VX_(Citi)_Credit_Suisse_(CSGN.VX)__Model_Update.61198869.xml"
let $orQuery := (string-join($tags/tag, ','))
for $x in cts:search(doc($docURI)/doc/Content/Section/Paragraph, cts:or-query(($orQuery)))
let $r := cts:highlight($x, cts:or-query($orQuery), <b>{$cts:text}</b>)
return <result>{$r}</result>
The exact query that i want to run is :
cts:search(doc($docURI)/doc/Content/Section/Paragraph, cts:or-query(("credit","bank","private banking")))
and when i do
return (string-join($tags/tag, ','))
it gives me exactly what i require
"credit","bank","private banking"
But why does it not return any result in or-query?
You don't need string-join here. It produces one single literal string (commas and quotes included), not three separate search terms. In XQuery, sequences are your friend.
I think you want to do something like this:
let $tags-to-search := ($tags/tag/text()!replace(., '^"|"$', '') ) (: a sequence of tags :)
return cts:search(doc($docURI)/doc/Content/Section/Paragraph, cts:word-query($tags-to-search))
cts:word-query is the default query used for parameter 2 of search if you pass in a string. cts:word-query also returns matches for any of the items if it is given a sequence.
https://docs.marklogic.com/cts:word-query
EDIT: Added the replace step for the quotes as suggested by Abel. This is specific to the data as presented by the original question. The overall approach remains the same.
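The difference between the joined string and a sequence can be checked with count(). This sketch uses only standard XQuery 3.0 functions and the tags from the question:

```xquery
xquery version "3.0";

let $tags :=
    <tags>
        <tag>"credit"</tag>
        <tag>"bank"</tag>
        <tag>"private banking"</tag>
    </tags>
(: one string: "credit","bank","private banking" :)
let $joined := string-join($tags/tag, ',')
(: three strings with the surrounding quotes stripped :)
let $terms := $tags/tag ! replace(., '^"|"$', '')
return (count($joined), count($terms))
(: returns 1, 3 :)
```

A search given $joined looks for that one long phrase, quotes and commas included, which is why it finds nothing.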
Maybe you need something like this:
let $orQuery := for $tag in $tags/tag return cts:word-query($tag)
I used fn:tokenize instead, and it worked perfectly for my use case.
The reason is that I was passing these arguments from Java using the XCC API, and it would not return anything with string values.
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare variable $docURI as xs:string external ;
declare variable $orQuery as xs:string external ;
let $input := "credit,bank"
let $tokens := fn:tokenize($input, ",")
let $docURI := "2012-11-19 0005.HK (Citi) HSBC Holdings Plc (0005.HK)_ Model Update.61503613.pdf"
for $x in cts:search(fn:doc($docURI), cts:or-query(($tokens)))
let $r := cts:highlight($x, cts:or-query(($tokens)), <b>{$cts:text}</b>)
return <result>{$r}</result>

split document by using MarkLogic Flow Editor

I am trying to split my incoming documents using "Information Studio Flows" (MarkLogic v 8.0-1.1). The problem is in the "Transform" section.
This is the document I am importing. For simplicity, I have reduced its content to one stwtext element:
<docs>
<stwtext id="RD-10-00258" update="03.2011" seq="RQ-10-00001">
<head>
<ti>
<i>j</i>
</ti>
<ff-list>
<ff id="0103"/>
</ff-list>
</head><p>
Symbol für die
<vw idref="RD-19-04447">Stromdichte</vw>
.
</p>
</stwtext>
</docs>
This is my "xquery transform" content:
xquery version "1.0-ml";
(: Copyright 2002-2015 MarkLogic Corporation. All Rights Reserved. :)
(:
:: Custom action. It must be a CPF action module.
:: Replace this text completely, or use it as a template and
:: add imports, declarations,
:: and code between START and END comment tags.
:: Uses the external variables:
:: $cpf:document-uri: The document being processed
:: $cpf:transition: The transition being executed
:)
import module namespace cpf = "http://marklogic.com/cpf"
at "/MarkLogic/cpf/cpf.xqy";
(: START custom imports and declarations; imports must be in Modules/ on filesystem :)
(: END custom imports and declarations :)
declare option xdmp:mapping "false";
declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;
if ( cpf:check-transition($cpf:document-uri,$cpf:transition))
then
try {
(: START your custom XQuery here :)
let $doc := fn:doc($cpf:document-uri)
return
xdmp:eval(
for $wpt in fn:doc($doc)//stwtext
return
xdmp:document-insert(
fn:concat("/rom-data/", fn:concat($wpt/@id,".xml")),
$wpt
)
)
(: END your custom XQuery here :)
,
cpf:success( $cpf:document-uri, $cpf:transition, () )
}
catch ($e) {
cpf:failure( $cpf:document-uri, $cpf:transition, $e, () )
}
else ()
When I run this snippet, I get the error:
Invalid URI format
and long description of it:
XDMP-URI: (err:FODC0005) fn:doc(fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")) -- Invalid URI format: "
j
Symbol für die
Stromdichte
"
In /18200382103958065126.xqy on line 37
In xdmp:invoke("/18200382103958065126.xqy", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>)
$doc = fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")
In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 179
In execute-action("on-state-enter", "http://marklogic.com/states/initial", "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>, (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline), fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1])
$caller = "on-state-enter"
$state-or-status = "http://marklogic.com/states/initial"
$uri = "/8122584828241226495/12835482492021535301/URI=/content/home/admi..."
$vars = (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...)
$invoke-options = <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>
$pipelines = (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline)
$action-to-execute = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action
$chosen-transition = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]
$raw-module-name = "/18200382103958065126.xqy"
$module-kind = "xquery"
$module-name = "/18200382103958065126.xqy"
In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 320
I thought it was a problem with the "Document setting" in the "load" section of the "Flow editor":
URI=/content{$path}/{$filename}{$dot-ext}
but if I remove it, I receive the same error.
I have no idea what to do. I am really new to this. Please help.
First of all, Information Studio has been deprecated in MarkLogic 8. I would also strongly recommend looking into the aggregate_record feature of MarkLogic Content Pump:
http://docs.marklogic.com/guide/ingestion/content-pump#id_65814
Apart from that, there are several issues with your code. You are calling fn:doc twice, effectively trying to interpret the doc contents as a uri. There is an unnecessary xdmp:eval wrapping the FLWOR statement, which expects a string as first param. I think you can shorten it to (showing inner part of the action only):
(: START your custom XQuery here :)
let $doc := fn:doc($cpf:document-uri)
for $wpt in $doc//stwtext
return
xdmp:document-insert(
fn:concat("/roempp-data/", fn:concat($wpt/@id,".xml")),
$wpt
)
(: END your custom XQuery here :)
HTH!
Many thanks, @grtjn. This is my approach; it is practically the same solution:
(: START your custom XQuery here :)
xdmp:log(fn:doc($cpf:document-uri), "debug"),
let $doc := fn:doc($cpf:document-uri)
return
xdmp:eval('
declare variable $doc external;
for $wpt in $doc//stwtext
return (
xdmp:document-insert(
fn:concat("/roempp-data/", fn:concat($wpt/@id,".xml")),
$wpt,
xdmp:default-permissions(),
"roempp-data"
)
)'
,
(xs:QName("doc"), $doc),
<options xmlns="xdmp:eval">
<database>{xdmp:database("roempp-tutorial")}</database>
</options>
)
(: END your custom XQuery here :)
OK, now it works. However, I found that after loading finishes, I see two documents in MarkLogic:
my split document "/rom-data/RD-10-00258.xml" with the single root element "stwtext" (as desired)
the original document "URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml" with root element "docs"
Is it possible to prevent the original document from being inserted?
