How to search facets using wildcard search - xquery

How can I return all values starting with "Ar" (i.e. matching Ar*) when searching a facet?
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
let $options :=
<options xmlns="http://marklogic.com/appservices/search">
<values name="entity">
<range type="xs:string">
<element ns="http://www.com/mynamespace" name="country" />
</range>
</values>
<return-metrics>false</return-metrics>
</options>
return search:values("entity", $options)

I don't think you can do this with search:values. It does take a start parameter, for which you could specify Ar, but that only provides a lower bound, not an upper bound. Providing a range query with both an upper and a lower bound won't help either if your document fragments contain several values for the element: the query only restricts which fragments match, and every value in those fragments is still returned.
If you can use cts functions directly, I'd use cts:value-match. It handles your wildcard directly (like search:values, it needs a range index on the element):
cts:value-match(cts:element-reference(fn:QName("http://www.com/mynamespace", "country")), "Ar*")
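Since facets usually show counts, here is a slightly fuller sketch that lists each matching value together with its frequency. It assumes the same xs:string element range index on {http://www.com/mynamespace}country that the "entity" values option in the question relies on; the <value> wrapper element is just an illustrative output format.

```xquery
xquery version "1.0-ml";
(: Sketch: facet-style listing of "country" values starting with "Ar".
   Assumes an xs:string element range index on
   {http://www.com/mynamespace}country (default collation). :)
let $ref := cts:element-reference(
              fn:QName("http://www.com/mynamespace", "country"))
(: wildcard match against the range index, sorted ascending :)
for $value in cts:value-match($ref, "Ar*", "ascending")
return
  (: cts:frequency reports how often the lexicon value occurs :)
  <value count="{cts:frequency($value)}">{$value}</value>
```

A cts:query can be passed as the fourth argument of cts:value-match if the values should be limited to the current search results.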
HTH!

Related

eXist-db serialize is expand-xincludes=no ignored?

In eXist-db 4.4, XQuery 3.1, I am compressing a number of XML files to a .zip in a directory. The compression process uses serialize().
The XML files contain some large XIncludes, which according to the documentation are expanded automatically during serialization. I have tried to 'turn off' XInclude expansion in two places in the code (the prolog declaration and the serialize() options map), but the serializer still expands all XIncludes:
declare option exist:serialize "expand-xincludes=no";
declare function zip:get-entries-for-zip()
{
(: get documents prefixed by 'MS609' :)
let $pref := "MS609"
(: get list of document names :)
let $doclist := xmldb:get-child-resources($globalvar:URIdata)[starts-with(., $pref)]
(: output serialized entries :)
let $entries :=
for $n in $doclist
return
<entry name="{$n}" type='text' method='store'>
{serialize(doc(concat($globalvar:URIdata, "/", $n)), map { "method": "xml", "expand-xincludes": "no"})}
</entry>
return $entries
};
The XML data with xincludes to reproduce this problem can be found here http://medieval-inquisition.huma-num.fr/downloads under the description "BM MS609 Edition (tei-xml)".
Many thanks in advance.
The expand-xincludes serialization parameter is specific to eXist and, as such (or at least at present), cannot be set using the fn:serialize() function. Instead, use the util:serialize() function:
util:serialize($document, "expand-xincludes=no")
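Applied to the loop from the question, a sketch might look like this. $data-uri is a hypothetical stand-in for $globalvar:URIdata, and the "MS609.zip" name is only illustrative; the serialization parameters are passed to util:serialize() as a sequence of key=value strings.

```xquery
xquery version "3.1";
(: Sketch: build zip entries without expanding XIncludes by using
   eXist's util:serialize() instead of fn:serialize().
   $data-uri is a hypothetical path standing in for $globalvar:URIdata. :)
let $data-uri := "/db/apps/my-app/data"
let $entries :=
  for $name in xmldb:get-child-resources($data-uri)[starts-with(., "MS609")]
  return
    <entry name="{$name}" type="text" method="store">{
      util:serialize(doc($data-uri || "/" || $name),
                     ("method=xml", "expand-xincludes=no"))
    }</entry>
(: compress the entries and store the resulting zip in the database :)
return xmldb:store("/db", "MS609.zip", compression:zip($entries, false()))
```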
Alternatively, since you're ultimately interested in zipping the contents of a collection, you can skip the explicit serialization step: declare your serialization options in the query's prolog (or set them inline using util:declare-option()), and simply pass compression:zip() the URI path(s) of the collections/documents you want to zip. For example:
xquery version "3.1";
declare option exist:serialize "expand-xincludes=no";
let $sources := "/db/apps/my-app/my-data" (: or a sequence of paths to individual docs:) ! xs:anyURI(.)
let $preserve-collection-structure := false()
let $zip := compression:zip($sources, $preserve-collection-structure)
return
xmldb:store("/db", "my-data.zip", $zip)
For more on serialization options in eXist, see my earlier answer to a similar question: https://stackoverflow.com/a/49290616/659732.

eXist-db / XQuery compression:zip() of XML files saves text only

In eXist-db 4.4, XQuery 3.1, I am using automation to compress a number of XML files. The problem is that the compressed files contain only the text content, not the XML markup.
This function uses compression:zip to create a zip from a batch of documents:
declare option exist:serialize "expand-xincludes=no";
declare option exist:serialize "method=xml media-type=application/xml";
declare function zip:create-zip-by-batch()
{
[...]
let $zipobject := compression:zip(zip:get-entry-for-zip($x), false())
let $zipname := "foozipname.zip"
let $store := xmldb:store("/db/foodirectory", $zipname, $zipobject)
return $store
};
The above calls this function, where the documents are serialized and wrapped in <entry> elements, per the documentation:
declare option exist:serialize "expand-xincludes=no";
declare option exist:serialize "method=xml media-type=application/xml";
declare function zip:get-entry-for-zip($x)
{
[...for each $foo document in $x, create an <entry>...]
let $serialized := serialize($foo, map { "method": "xml" })
let $entry :=
<entry name="somefooname" type='xml' method='store'>
{$serialized}
</entry>
[...return a sequence of $entry...]
};
I think it's missing a configuration for serialization, but I can't figure it out...
Thanks in advance for any help.
Here is a query for eXist demonstrating how to compress XML documents into a ZIP file and store it in the database:
xquery version "3.1";
(: create a test collection with 10 test files: 1.xml = <x>1</x>
thru 10.xml = <x>10</x> :)
let $prepare := xmldb:create-collection("/db", "test")
let $populate := (1 to 10) ! xmldb:store("/db/test", . || ".xml", <x>{.}</x>)
(: construct zip-bound <entry> elements for the documents in the test collection :)
let $entries := collection("/db/test") !
<entry name="{util:document-name(.)}" type="xml" method="store">{
serialize(., map { "method": "xml" })
}</entry>
(: compress the entries and store in database :)
let $zip := compression:zip($entries, false())
return
xmldb:store("/db", "test.zip", $zip)
The resulting ZIP file contains the 10 test XML documents, intact. For a variant showing how to write the ZIP file to a location on your file system, see https://gist.github.com/joewiz/aa8d84500b1f1478779cdf2cc1934348.
For a fuller discussion of serialization options in eXist, see my answer to an earlier question: https://stackoverflow.com/a/49290616/659732.

split document by using MarkLogic Flow Editor

I am trying to split my incoming documents using "Information Studio Flows" (MarkLogic v8.0-1.1). The problem is in the "Transform" section.
This is my input document. For simplicity I reduced its content to one stwtext element:
<docs>
<stwtext id="RD-10-00258" update="03.2011" seq="RQ-10-00001">
<head>
<ti>
<i>j</i>
</ti>
<ff-list>
<ff id="0103"/>
</ff-list>
</head><p>
Symbol für die
<vw idref="RD-19-04447">Stromdichte</vw>
.
</p>
</stwtext>
</docs>
This is my "xquery transform" content:
xquery version "1.0-ml";
(: Copyright 2002-2015 MarkLogic Corporation. All Rights Reserved. :)
(:
:: Custom action. It must be a CPF action module.
:: Replace this text completely, or use it as a template and
:: add imports, declarations,
:: and code between START and END comment tags.
:: Uses the external variables:
:: $cpf:document-uri: The document being processed
:: $cpf:transition: The transition being executed
:)
import module namespace cpf = "http://marklogic.com/cpf"
at "/MarkLogic/cpf/cpf.xqy";
(: START custom imports and declarations; imports must be in Modules/ on filesystem :)
(: END custom imports and declarations :)
declare option xdmp:mapping "false";
declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;
if ( cpf:check-transition($cpf:document-uri,$cpf:transition))
then
try {
(: START your custom XQuery here :)
let $doc := fn:doc($cpf:document-uri)
return
xdmp:eval(
for $wpt in fn:doc($doc)//stwtext
return
xdmp:document-insert(
fn:concat("/rom-data/", fn:concat($wpt/@id,".xml")),
$wpt
)
)
(: END your custom XQuery here :)
,
cpf:success( $cpf:document-uri, $cpf:transition, () )
}
catch ($e) {
cpf:failure( $cpf:document-uri, $cpf:transition, $e, () )
}
else ()
When running the snippet, I get the error:
Invalid URI format
and long description of it:
XDMP-URI: (err:FODC0005) fn:doc(fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")) -- Invalid URI format: "
j
Symbol für die
Stromdichte
"
In /18200382103958065126.xqy on line 37
In xdmp:invoke("/18200382103958065126.xqy", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>)
$doc = fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")
In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 179
In execute-action("on-state-enter", "http://marklogic.com/states/initial", "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>, (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline), fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1])
$caller = "on-state-enter"
$state-or-status = "http://marklogic.com/states/initial"
$uri = "/8122584828241226495/12835482492021535301/URI=/content/home/admi..."
$vars = (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...)
$invoke-options = <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>
$pipelines = (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline)
$action-to-execute = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action
$chosen-transition = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]
$raw-module-name = "/18200382103958065126.xqy"
$module-kind = "xquery"
$module-name = "/18200382103958065126.xqy"
In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 320
I thought it was a problem with the "Document setting" in the "Load" section of the Flow Editor:
URI=/content{$path}/{$filename}{$dot-ext}
But if I remove it, I receive the same error.
I have no idea what to do; I am really new to this. Please help.
First of all, Information Studio has been deprecated in MarkLogic 8. I would also strongly recommend looking into the aggregate_record feature of MarkLogic Content Pump:
http://docs.marklogic.com/guide/ingestion/content-pump#id_65814
Apart from that, there are several issues with your code. You are calling fn:doc twice, effectively trying to interpret the document contents as a URI. There is also an unnecessary xdmp:eval wrapping the FLWOR statement; xdmp:eval expects a string (the query text) as its first parameter. I think you can shorten it to (showing the inner part of the action only):
(: START your custom XQuery here :)
let $doc := fn:doc($cpf:document-uri)
for $wpt in $doc//stwtext
return
xdmp:document-insert(
fn:concat("/roempp-data/", fn:concat($wpt/@id,".xml")),
$wpt
)
(: END your custom XQuery here :)
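Dropped into the CPF action template from the question, the corrected inner part might look roughly like this. It is an untested sketch: the template's boilerplate comments are trimmed, and the /roempp-data/ prefix is taken from the snippet above.

```xquery
xquery version "1.0-ml";
(: Sketch: CPF action that splits each stwtext element into its own
   document named after its id attribute. Untested. :)
import module namespace cpf = "http://marklogic.com/cpf"
  at "/MarkLogic/cpf/cpf.xqy";

declare option xdmp:mapping "false";
declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

if (cpf:check-transition($cpf:document-uri, $cpf:transition))
then
  try {
    (: a single fn:doc() call on the URI; no xdmp:eval needed :)
    let $doc := fn:doc($cpf:document-uri)
    for $wpt in $doc//stwtext
    return
      xdmp:document-insert(
        fn:concat("/roempp-data/", $wpt/@id, ".xml"),
        $wpt
      ),
    cpf:success($cpf:document-uri, $cpf:transition, ())
  }
  catch ($e) {
    cpf:failure($cpf:document-uri, $cpf:transition, $e, ())
  }
else ()
```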
HTH!
Many thanks @grtjn! This is my approach; practically it is the same solution:
(: START your custom XQuery here :)
xdmp:log(fn:doc($cpf:document-uri), "debug"),
let $doc := fn:doc($cpf:document-uri)
return
xdmp:eval('
declare variable $doc external;
for $wpt in $doc//stwtext
return (
xdmp:document-insert(
fn:concat("/roempp-data/", fn:concat($wpt/@id,".xml")),
$wpt,
xdmp:default-permissions(),
"roempp-data"
)
)'
,
(xs:QName("doc"), $doc),
<options xmlns="xdmp:eval">
<database>{xdmp:database("roempp-tutorial")}</database>
</options>
)
(: END your custom XQuery here :)
OK, now it works. That is fine, but I found that after loading is finished, I see two documents in MarkLogic:
my split document "/rom-data/RD-10-00258.xml" with the root element "stwtext" (as desired)
the original document "URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml" with the root element "docs"
Is it possible to prevent the original document from being inserted?

How to find the lowest common ancestor of two nodes in XQuery?

Suppose the input XML is
<root>
<entry>
<title>Test</title>
<author>Me</author>
</entry>
</root>
I would like to find the lowest common ancestor of title and author.
I tried the following code in BaseX:
let $p := doc('t.xq')//title,
$q := doc('t.xq')//author,
$cla := ($p/ancestor-or-self::node() intersect $q/ancestor-or-self::node())
return
$cla
But it returns nothing (blank output).
Your code works totally fine for me, apart from returning all common ancestors.
The Last Common Ancestor
Since intersect returns the common ancestors in document order, and the lowest common ancestor is the last of them, simply add a [last()] predicate.
declare context item := document {
<root>
<entry>
<title>Test</title>
<author>Me</author>
</entry>
</root>
};
let $p := //title,
$q := //author,
$cla := ($p/ancestor-or-self::node() intersect $q/ancestor-or-self::node())[last()]
return
$cla
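The same intersect-then-[last()] idea generalizes to any number of nodes. A minimal sketch, assuming an XQuery 3.1 processor such as BaseX (for fold-left() and inline functions) and nodes that all belong to the same tree; local:lca is a hypothetical helper name:

```xquery
xquery version "3.1";
(: Sketch: lowest common ancestor of one or more nodes from the same tree.
   local:lca is a hypothetical helper name. :)
declare function local:lca($nodes as node()+) as node()? {
  (: intersect the ancestor-or-self chains of all nodes; intersect
     returns document order, so the deepest common node comes last :)
  let $common :=
    fold-left(tail($nodes), head($nodes)/ancestor-or-self::node(),
      function($acc, $n) { $acc intersect $n/ancestor-or-self::node() })
  return $common[last()]
};

let $doc := document {
  <root>
    <entry>
      <title>Test</title>
      <author>Me</author>
    </entry>
  </root>
}
return local:lca(($doc//title, $doc//author))
```

For the sample input this should return the entry element.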
Files and Databases
If the query you posted does not return anything, you might be working on a file t.xq. intersect requires all compared nodes to come from the same database, but each invocation of doc(...) on a file creates a new in-memory database. Either create a database in BaseX with the contents, or do something like
declare variable $doc := doc('t.xq');
and replace subsequent doc(...) calls by $doc (which now references a single in-memory database created for the file).
This is one possible way:
let $db := doc('t.xq'),
$q := $db//*[.//title and .//author][not(.//*[.//title and .//author])]
return
$q
Brief explanation:
[.//title and .//author]: the first predicate keeps only elements that have both a title descendant and an author descendant.
[not(.//*[.//title and .//author])]: the second predicate discards any such element that has a descendant also satisfying the first predicate, so only the innermost matching elements remain.
Output:
<entry>
<title>Test</title>
<author>Me</author>
</entry>
I replaced doc('t.xq') in the bindings of $p and $q with the variable $db, as follows. Now it works (I also added [last()] to get the last, i.e. lowest, common ancestor).
let
$db := doc('t.xq'),
$p := $db//title,
$q := $db//author,
$cla := ($p/ancestor-or-self::node() intersect $q/ancestor-or-self::node())[last()]
return $cla

Distinct attribute names

With XQuery I want to select a specific value from every article within a product.
What I currently have:
Input XML (extract):
<product type="product" id="2246091">
<product type="article">
<attribute identifier="EXAMPLE1" type="BOOLEAN">0</attribute>
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
</product>
<product type="article">
<attribute identifier="EXAMPLE1" type="BOOLEAN">1</attribute>
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
</product>
<product type="article">
<attribute identifier="EXAMPLE1" type="BOOLEAN">0</attribute>
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
</product>
</product>
XQuery:
for $i in //product
[@type = 'product'
and @id = '2246091']
//attribute
[@type='BOOLEAN'
and @identifier= ('EXAMPLE1', 'EXAMPLE2') ]
where $i = '1'
return $i
This returns every attribute element, from every article of the product, whose content is '1' and whose identifier is EXAMPLE1 or EXAMPLE2.
The same attribute identifier (e.g. EXAMPLE1) can occur in article 1 and in article 2.
What I get:
<?xml version="1.0" encoding="UTF-8"?>
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
<attribute identifier="EXAMPLE1" type="BOOLEAN">1</attribute>
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
I tried wrapping distinct-values() around my for loop, but that only returns '1'.
What I would like is to get every attribute only once:
<attribute identifier="EXAMPLE2" type="BOOLEAN">1</attribute>
<attribute identifier="EXAMPLE1" type="BOOLEAN">1</attribute>
It sounds as if what you want is to see one attribute element for each distinct value of the identifier attribute found among the attribute elements whose content is 1. (Or, slightly more challengingly, one attribute element for each set of equivalent attribute elements, where equivalence is defined by deep-equal().)
The distinct-values() function isn't helping you here, because it atomizes its input nodes into simple values (here, just 1).
If matching on the identifier attribute suffices
If the identifier attribute suffices to establish equivalence among the elements, then something like the following should suffice (not tested):
let $ones := //product[@type = 'product'
                       and @id = '2246091']
             //attribute[@type = 'BOOLEAN'
                         and @identifier =
                             ('EXAMPLE1', 'EXAMPLE2')
                         and . = '1'],
    $ids := distinct-values($ones/@identifier)
for $id in $ids
return ($ones[@identifier = $id])[1]
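If your processor supports XQuery 3.0 or later, a group by clause expresses the same "first element per identifier" idea without the separate distinct-values() step. A sketch under that assumption; note that the order in which the groups are returned is implementation-dependent:

```xquery
xquery version "3.0";
(: Sketch, assuming an XQuery 3.0+ processor: group the matching
   attribute elements by @identifier and keep the first of each group. :)
for $attr in //product[@type = 'product' and @id = '2246091']
             //attribute[@type = 'BOOLEAN'
                         and @identifier = ('EXAMPLE1', 'EXAMPLE2')
                         and . = '1']
group by $id := string($attr/@identifier)
(: after grouping, $attr is bound to all members of the group, in input order :)
return $attr[1]
```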
If a more general equivalence test is needed
If @identifier does not suffice to establish equivalence for your purposes, you will have to do something more complicated. In the general case, one way to do it would be to write a function of two arguments (I'll call it local:equivalent()) which returns true iff the two arguments are equivalent for your purposes. Then write a second function that accepts a sequence of items and removes duplicates from it (where 'being a duplicate' means 'local:equivalent() returns true for it and an item already kept'). Something like this might work as a first approximation (not tested):
(: dedup#1: remove duplicates from a sequence :)
declare function local:dedup(
  $items as item()*
) as item()* {
  local:dedup($items, ())
};
(: dedup#2: work through the input sequence one
   by one, dropping items equivalent to one already
   kept and accumulating the rest in input order.
   Cost is roughly n^2 / 2 comparisons. :)
declare function local:dedup(
  $in as item()*,
  $out as item()*
) as item()* {
  if (empty($in))
  then $out
  else let $car := head($in)
       return if (some $i in $out
                  satisfies
                  local:equivalent($i, $car))
              then local:dedup(tail($in), $out)
              else local:dedup(tail($in), ($out, $car))
};
(: equivalent#2: true iff arguments are equivalent :)
declare function local:equivalent(
  $x as item(),
  $y as item()
) as xs:boolean {
  (: determine application-specific equivalence
     however you like ... :)
  deep-equal($x, $y)
};
(: Now do the work :)
let $ones := //product[@type = 'product'
                       and @id = '2246091']
             //attribute[@type = 'BOOLEAN'
                         and @identifier =
                             ('EXAMPLE1', 'EXAMPLE2')
                         and . = '1']
return local:dedup($ones)
Those comfortable with higher-order functions will want to go a step further and remove the dependency on having a function named local:equivalent by allowing both local:dedup functions to accept an additional argument providing the equivalence function.
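A minimal sketch of that higher-order variant, assuming an XQuery 3.0+ processor with higher-order function support. local:dedup-by is a hypothetical name, and fold-left() replaces the explicit recursion; the equivalence used here (same @identifier) is just one example.

```xquery
xquery version "3.1";
(: Sketch: remove duplicates using a caller-supplied equivalence function.
   local:dedup-by is a hypothetical helper name. :)
declare function local:dedup-by(
  $items as item()*,
  $equivalent as function(item(), item()) as xs:boolean
) as item()* {
  (: keep an item only if no already-kept item is equivalent to it :)
  fold-left($items, (), function($kept, $item) {
    if (some $k in $kept satisfies $equivalent($k, $item))
    then $kept
    else ($kept, $item)
  })
};

let $ones := //product[@type = 'product' and @id = '2246091']
             //attribute[@type = 'BOOLEAN'
                         and @identifier = ('EXAMPLE1', 'EXAMPLE2')
                         and . = '1']
(: example equivalence: same identifier attribute :)
return local:dedup-by($ones,
         function($a, $b) { $a/@identifier = $b/@identifier })
```

Passing deep-equal#2 as the second argument instead reproduces the behaviour of local:dedup with the default local:equivalent.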
