How to validate the ContextItem in xquery - xquery

My XSLT is primitive and my XQuery almost non-existent. This should be trivial, so I won't post a whole example.
I have an XQuery that I'm compiling and executing via the .NET saxon9ee-api:
import schema default element namespace "" at "MessingAbout.xsd";
for $v in (validate { doc("MessingAbout.xml") })/element(SQUARE,FILLEDSQUARETYPE)
return <OUTPUT>{$v/@colour}</OUTPUT>
which works very nicely.
I want to use the "ContextItem" though, so that I can query different XML documents. I've got this to work by setting the ContextItem in the XQueryEvaluator to a document:
import schema default element namespace "" at "MessingAbout.xsd";
for $v in /SQUARE
return <OUTPUT>{$v/@colour}</OUTPUT>
But I'd like to validate the context item and then use things like element(SQUARE,FILLEDSQUARETYPE) on the validated result. How do you do this?

I'm not quite sure what you're attempting to do, but given "MessingAbout.xsd":
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:complexType name="FILLEDSQUARETYPE">
<xs:attribute name="colour" type="xs:string"/>
</xs:complexType>
<xs:element name="SQUARE" type="FILLEDSQUARETYPE"/>
</xs:schema>
and "MessingAbout.xml":
<SQUARE colour="red"/>
your first query produces <OUTPUT colour="red"/>, which I assume is what you expect. To use the context item in the second query, I rewrote it as:
import schema default element namespace "" at "MessingAbout.xsd";
for $v in (validate { . })/element(SQUARE,FILLEDSQUARETYPE)
return <OUTPUT>{$v/@colour}</OUTPUT>
and passed the source document on the command line: -q:test2.xq -s:MessingAbout.xml.
That gives me the same result as the first query. I hope that's helpful.

As well as the approaches suggested by Martin and Norm, you have the option of doing the validation in the calling application, e.g. Java or C#. Build the document using an s9api DocumentBuilder with validation options set, and then pass the resulting typed XdmNode as the context item when running the query. This approach is preferable if you want to do more with the validated document than just run one query. If you do it this way, it's useful for the query to assert that it expects a validated document, which you can do with a "declare context item" declaration in the query prolog.
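A minimal sketch of such a prolog declaration, reusing the schema and names from the question (SQUARE, FILLEDSQUARETYPE). It assumes a schema-aware processor with XQuery 3.0 support enabled, and that the calling application supplies an already validated document node as the context item:

```xquery
xquery version "3.0";

import schema default element namespace "" at "MessingAbout.xsd";

(: Assert that the caller supplies a validated document whose root
   element is a schema-valid SQUARE. An untyped document should then
   fail with a type error instead of silently matching nothing. :)
declare context item as document-node(schema-element(SQUARE)) external;

for $v in /element(SQUARE, FILLEDSQUARETYPE)
return <OUTPUT>{$v/@colour}</OUTPUT>
```

With the declaration in place, the query body no longer needs its own validate expression, because the typing is guaranteed by the caller.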


Not able to get XML file via MarkLogic Corb Tool

I want to read an XML input file via the MarkLogic CoRB tool for further processing, but I'm not able to get the file through CoRB:
ML config Properties file:
THREAD-COUNT=16
MODULE-ROOT=/
MODULES-DATABASE=.\\37074\\XQuery\\PROD-MetadataModules
XML-FILE=.\\37074\\input\\asme_module_v3.xml
XML-NODE=rdf:RDF
PROCESS-MODULE=.\\37074\\XQuery\\upload-skos-file.xqy|ADHOC
EXPORT-FILE-DIR=.\\37074\\Report
EXPORT-FILE-NAME=update-Non-member-price-report.xml
EXPORT-FILE-TOP-CONTENT="Record"
URIS-LOADER=com.marklogic.developer.corb.FileUrisXMLLoader
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
DECRYPTER=com.marklogic.developer.corb.JasyptDecrypter
XML input file ('asme_module_v3.xml') that I want to pass to 'upload-skos-file.xqy' via the MarkLogic CoRB tool:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<skos:ConceptScheme rdf:about="http://www.bsigroup.com/asme/">
<skos:hasTopConcept rdf:resource="http://www.bsigroup.com/asme/A112"/>
<skos:hasTopConcept rdf:resource="http://www.bsigroup.com/asme/A120"/>
</skos:ConceptScheme>
</rdf:RDF>
Code in 'upload-skos-file.xqy' file:
xquery version "1.0-ml";
declare variable $URI external;
let $skos-number := $URI
let $_ := xdmp:log("=========================skos-number===========================")
return xdmp:log($skos-number)
The MarkLogic CoRB tool runs successfully, but no entries appear in the MarkLogic log file. I'm not sure where I made a mistake.
The CoRB StreamingXPath is not currently able to register and leverage namespaces and namespace-prefixes, so the XPath targeting namespace-qualified elements can't leverage namespace-prefixes.
A more generic match on the document element with a predicate filtering by local-name() will work though. It's a little ugly and a lot more typing, but works:
XML-NODE=*[local-name()='RDF' and namespace-uri()='http://www.w3.org/1999/02/22-rdf-syntax-ns#']
Or if RDF local-name() is good enough:
XML-NODE=*[local-name()='RDF']

How to get the maximum value of an element using cts:values in MarkLogic?

I want to get the Maximum value of <ID> from all the documents present inside the database.
Sample Document-
<root xmlns="http://marklogic.com/sample">
<node>
<ID>3253523</ID>
<value1>.....</value1>
<value2>.....</value2>
<value3>.....</value3>
<value4>.....</value4>
.....................
</node>
</root>
The approach I tried is as follows:
1. Created a path namespace with prefix sa and URI http://marklogic.com/sample.
2. Created a path range index of type int with the path /sa:root/sa:node/sa:ID.
3. Tried to fetch the maximum value from the database using the code below:
declare namespace sa = "http://marklogic.com/sample";
(cts:values(cts:path-reference('/sa:root/sa:node/sa:ID'), (), "descending"))[1]
But this gives me an empty sequence. I'm not sure what I am missing here.
Any suggestions?
Try passing a map with the namespace bindings as the third argument to cts:path-reference(). See: http://docs.marklogic.com/cts:path-reference
By the way, cts:max() will probably be the most efficient way to get the maximum value from a range index. See: http://docs.marklogic.com/cts:max
The approach would resemble the following fragment:
cts:max(
cts:path-reference('/sa:root/sa:node/sa:ID', (),
map:entry("sa", "http://marklogic.com/sample")
))
Hoping that helps,
As suggested by Elijah Bernstein-Cooper:
I added the xmlns="http://marklogic.com/sample" namespace to the XML you shared and inserted a few XML files into the database.
I then created the path namespace and the path range index and ran the shared cts query, and it worked perfectly. So Elijah is correct: you just need to specify the namespace in the XML.
One small change to your query: in the declare namespace statement, the prefix should be sa, not es.
Hope this helps.

eXist-db ft:query returning zero result while running eXide or oxygen

I am running ft:query on a collection stored in eXist-db, but it returns zero results. If I use the fn:contains function it works perfectly, but ft:query returns nothing. Below are my XML structure, index configuration file, and query:
test.xml
<article xmlns="http://www.rsc.org/schema/rscart38"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
type="ART"
xsi:schemaLocation="http://www.rsc.org/schema/rscart38 http://www.rsc.org/schema/rscart38/rscart38.xsd" dtd="RSCART3.8">
<metainfo last-modified="2012-11-23T19:16:50.023Z">
<subsyear>1997</subsyear>
<collectiontype>rscart</collectiontype>
<collectionname>journals</collectionname>
<docid>A605867A</docid>
<doctitle>NMR studies on hydrophobic interactions in solution Part
2.—Temperature and urea effect on
the self-association of ethanol in water</doctitle>
<summary/>
</article>
collection.xconf
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index rsc="http://www.rsc.org/schema/rscart38"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
type="ART"
xsi:schemaLocation="http://www.rsc.org/schema/rscart38 http://www.rsc.org/schema/rscart38/rscart38.xsd"
dtd="RSCART3.8">
<fulltext default="all" attributes="false"/>
<lucene>
<analyzer id="nosw" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
<param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet"/>
</analyzer>
<text qname="//rsc:article" analyzer="nosw"/>
</lucene>
<create path="//rsc:doctitle" type="xs:string"/>
<create path="//rsc:journal-full-title" type="xs:string"/>
<create path="//rsc:journal-full-title" type="xs:string"/>
</index>
</collection>
test.xq
declare namespace rsc="http://www.rsc.org/schema/rscart38";
let $coll := collection('/db/apps/test/RSC')
let $hits := $coll//rsc:doctitle[ft:query(., 'studies')]
return
$hits
Let's start from your query. The key part of your query is:
$coll//rsc:doctitle[ft:query(., 'studies')]
This performs a full text query for the string studies on rsc:doctitle elements in the collection. For this ft:query() function to work, there must be an index configuration for the named elements. This brings us to your index configuration.
In your index configuration, you have a full text (Lucene) index:
<text qname="//rsc:article" analyzer="nosw"/>
A couple of issues:
The @qname attribute should be a QName - simply, an element or attribute name. You've expressed this as a path. Remove the path //, leaving just rsc:article.
Your code does a full text query on rsc:doctitle, not on rsc:article, so I would expect your code, as written, to return 0 results. Change the existing index to rsc:doctitle, or add a new index on rsc:doctitle so that you could query either one. Reindex the collection afterwards, and as Adam suggested, check the Monex app's Indexing pane to ensure that the database has applied your index configuration as expected.
Lastly, contains() does not require an index to be in place. It benefits from the presence of a range index (i.e., your <create> elements), but range indexes are quite different from full text indexes. To learn more about these, I'd suggest reading the eXist documentation on indexing, http://exist-db.org/exist/apps/doc/indexing.xml.
I am not certain if configuring a Standard Analyzer without stopwords in the way you have done is correct. Can you check with Monex that your index has your terms in it?
Note also, if you created the index config after loading the index, then you need to reindex the collection. When you reindex it is also worth monitoring $EXIST_HOME/webapp/WEB-INF/exist.log to ensure that the indexing is done as expected.
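The steps above (index rsc:doctitle by QName, store the configuration, then reindex) can be sketched in XQuery as follows. This is an illustration under assumptions, not a verified setup: it assumes the data lives in /db/apps/test/RSC as in the question's query, that the query runs as a user with dba rights, and that xmldb:create-collection accepts a multi-step relative path. It also leaves out the custom "nosw" analyzer and uses the default one, since the correctness of that analyzer configuration was questioned above.

```xquery
xquery version "3.0";

(: Assumed locations: data collection taken from the question's query;
   the configuration lives under the matching path in /db/system/config. :)
let $data-collection := "/db/apps/test/RSC"
let $config-collection := "/db/system/config" || $data-collection

let $xconf :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:rsc="http://www.rsc.org/schema/rscart38">
            <lucene>
                <!-- qname is a plain element name, not a path -->
                <text qname="rsc:doctitle"/>
            </lucene>
        </index>
    </collection>

(: Make sure the configuration collection exists, store the file, reindex. :)
let $created := xmldb:create-collection("/db/system/config", substring-after($data-collection, "/"))
let $stored := xmldb:store($config-collection, "collection.xconf", $xconf)
return xmldb:reindex($data-collection)
```

After reindexing, the Monex Indexing pane should list a Lucene index on rsc:doctitle, and the original ft:query predicate can be retried unchanged.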

Improve performance of query with range indexes in eXist-db

Reading the docs http://exist-db.org/exist/apps/doc/indexing.xml
I'm finding it difficult to understand whether and how I can improve the performance of a 'read' query (with 2 parameters: a string and an integer).
Does eXist-db have a default structural index? Can I improve a 2-parameter query with a 'range index'?
More details about my XML db (note there are 2 different dbs simply merged on the same root):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<db>
<docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26001</number>
<details>
<detail>
<description>legge</description>
<number>19</number>
<date>14/01/1994</date>
</detail>
<detail>
<description>decreto legge</description>
<number>453</number>
<date>15/11/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26002</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
<meta>
<number>26016</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
</docs>
<full_text_docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum ...
</text>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum......
</text>
</doc>
</full_text_docs>
</db>
This is my xquery
xquery version "3.0";
let $doc := doc("/db//index_test/test_general.xml")//db/docs/doc
let $fulltxt := doc("/db//index_test/test_general.xml")//db/full_text_docs/doc
return <root> {
for $a in $doc[metas/meta/details/detail[date="03/02/1993" and number = "29"]]/header
return $fulltxt[header/year/text()=$a/year/text() and
header/number/text()=$a/number/text() and
header/type/text()=$a/type/text()
]
} </root>
Basically, I find the detail/number and detail/date that match the input in the first db and use the results to query the second db. The results are all the <full_text_header> documents that match.
I would like to know if I can create indexes for the number and date fields to improve performance. Note that this is the ONLY query I need to optimize (the only one I run on this db); obviously the number and date values change :).
SOLUTION:
For a clear explanation, read joewiz's answer. My problem was getting the .xconf file recognized correctly. It has to be placed in /db/yourcollectiondir. If you're using eXide, when you create the file you should select the XML type with the template "eXist-db collection configuration". When you try to save the file, you will see the prompt "Apply configuration?"; click 'ok'. Then run this XQuery: xmldb:reindex('/db/yourcollectiondir').
Now, if everything is set up correctly, when you run an XQuery involving an index you will see the index usage in "Monitoring and profiling".
As that documentation page states, eXist does create a structural index for all XML stored in the database. This is not an index of values, though, so without further indexes, queries based on value (rather than structure) involve a lookup of values in the DOM. As your data grows larger, looking up values in the DOM gets slower and slower. This is where value-based indexes, such as a range index, save the day. (For a fuller explanation, see the "Indexing" section of Wolfgang Meier's "Tuning the Database" article, which is essential for getting the most performance out of eXist.)
So, yes, you can create indexes for the <number> and <date> fields. I'd recommend the "new range" index, as described on that documentation page. Your collection.xconf file setting up these indexes would look like this:
<collection xmlns="http://exist-db.org/collection-config/1.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<index>
<range>
<create qname="number" type="xs:integer"/>
<create qname="date" type="xs:string"/>
</range>
</index>
</collection>
You have to store this within the /db/system/config/ collection, in a subcollection corresponding to the location of your data in the database. So if your data is located in /db/apps/myapp/data, you would place this collection.xconf file in /db/system/config/db/apps/myapp/data.
Note that the configuration here would only affect the for clause's queries of date and number values, and not the predicates in the return clause, which depend on the values of <year> and <type> elements. So, to ensure your query maximized the use of indexes, you should declare indexes on these; it seems that xs:integer would be the appropriate type for each.
Lastly, I would suggest eliminating the /text() steps, which are completely extraneous. For more on the use/abuse of text(), see Evan Lenz's article, "text() is a code smell".
Update (2016-07-17): With the updated code sample above, I have a couple of additional suggestions. First, since the code is in /db/index_test, we will store our files as follows:
Assuming you're using eXide, when you store the collection.xconf file in a collection, eXide will prompt you to have a copy of the file placed in the correct location in /db/system/config. If you're not using eXide, you need to store the collection.xconf file there yourself.
Using the unmodified query, I can confirm that despite the presence of the collection.xconf file, monex shows no indexes are being applied:
Let's make a few modifications to the file to ensure indexes are properly applied:
xquery version "3.0";
<root> {
for $a in doc("/db/index_test/test_general.xml")//detail[date = "03/02/1993" and number = 29]/ancestor::doc/header
return
doc("/db/index_test/test_general.xml")/db/full_text_docs/doc
[
header/year = $a/year and
header/number = $a/number and
header/type = $a/type
]
} </root>
With these modifications, monex shows that indexes are applied to the comparisons in the for clause:
The insights here are derived from the "Tuning the Database" article. To get full indexing for all comparisons, you will need to define additional indexes and may need to make similar modifications to your query.
One final note: the version of monex you see in these pictures is using a feature I added this weekend, called "Tare", which tries to filter out other operations from the query profiling results in order to help the user see just the effects of their own query. This feature is still just a pull request, so running the current release version, you won't see identical results.

Not picking up modifications to an XML document in the database

I have only just started with MarkLogic and XQuery. I am having a really tough time in modifying the content of one of my XML documents. I just cannot seem to get a change to an element to pick up. Here's my process (I have had to take things back as basic as I could just to try and get it working):
In query console I have one tab open which queries for the contents of one XML doc:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3790.xml")
This brings back the document as below
false
...
3790
Victoria Wilson
</ppl_name>
I now want to update the element using XQuery but it's just not happening. Here's the XQuery:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
let $docxml :=
xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3065.xml")/document/meta/ppl_name
return
for $node in $docxml/*
let $target := xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3790.xml")/document/meta/*[fn:name() = fn:name($node)]
return
xdmp:node-replace($target, $node)
I am basically looking to replace the ppl_name element in the target (3790) with the ppl_name element from the source (3065).
I run the XQuery and it completes without error (making me think it has worked); the return value reads "your query returned an empty sequence".
I then go back to the same tab as I used in step 1 and re-run the XQuery used in step 1. The doc (3790) comes back but it STILL has Victoria Wilson as the ppl_name.
The node returned by xdmp:document-get is an in-memory node from a document on the filesystem. It isn't coming from the database. You can't use xdmp:node-replace on in-memory nodes. That's only for database-resident nodes.
You can insert it using xdmp:document-insert. Then it's in the database, and you can access it using doc and update it using xdmp:node-replace. Or you can use in-memory operations to construct a new version with the changes you want.
See What are in memory elements in marklogic? for previous answers to a similar question, and more tips.
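A minimal sketch of the insert-then-update route, written as a multi-statement query so that each step runs in its own transaction. The /ppl/... database URIs are hypothetical, and the /document/meta/ppl_name path is taken from the question on the assumption that the documents use no namespace:

```xquery
xquery version "1.0-ml";

(: Transaction 1: load both files from disk into the database.
   The /ppl/... URIs are made up for this example. :)
xdmp:document-insert("/ppl/jdbc_ppl_3065.xml",
    xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3065.xml")),
xdmp:document-insert("/ppl/jdbc_ppl_3790.xml",
    xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3790.xml"))
;

xquery version "1.0-ml";

(: Transaction 2: the target node is now database-resident,
   so xdmp:node-replace can update it. :)
xdmp:node-replace(
    fn:doc("/ppl/jdbc_ppl_3790.xml")/document/meta/ppl_name,
    fn:doc("/ppl/jdbc_ppl_3065.xml")/document/meta/ppl_name
)
;

xquery version "1.0-ml";

(: Transaction 3: read the updated element back from the database. :)
fn:doc("/ppl/jdbc_ppl_3790.xml")/document/meta/ppl_name
```

Note that this changes the copy stored in the database, not the file on disk, so checking the result with xdmp:document-get on the original path would still show the old value.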
Here, the node returned by xdmp:document-get is an in-memory node.
If you're working with in-memory elements, import the following module:
import module namespace mem = "http://xqdev.com/in-mem-update" at "/MarkLogic/appservices/utils/in-mem-update.xqy";
Instead of using xdmp:node-replace you can use mem:node-replace(<x/>, <y/>)
