Update multiple XML files using XQuery and Zorba

Is there a simple way using the Zorba XQuery Processor to update multiple XML files and save the output of the modification back in the same file?
So far I have figured out how to process multiple files using the File module and the file:list extension to find all the XML files in a directory. I then loop through each document and run an XQuery Update statement (replace value of node {} with {}). The problem is that this doesn't actually modify the file.
I was using Saxon before, but the license costs are too expensive for this particular project. In Saxon EE, if I ran a "replace value of node" on an open document, the document would be updated on the disk when the query was finished. I suspect Zorba doesn't work this way, instead only modifying the value in memory during the query. If I was editing one file I would just output the modified XML in Zorba and pipe that back to the input file, but in this case I want to update many files. Is that possible in one query?
Here is what the code looks like:
import module namespace file = "http://expath.org/ns/file";
for $file in file:list("XML", true(), "*.xml")
let $doc := doc(concat("XML/", $file))
return
{
for $key in $doc//key
return
replace value of node $key/texture
with replace($key/material/text(), ".mat", ".png")
}

Figured it out! I had to use the XQuery scripting extension that Zorba provides to re-write the result back to the file:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
import module namespace file = "http://expath.org/ns/file";
for $file in file:list("XML", true(), "*.xml")
return
{
variable $doc := doc(concat("XML/", $file));
for $key in $doc//key
return
replace value of node $key/texture
(: the pattern is a regular expression: escape the dot and anchor it to the end :)
with replace($key/material/text(), "\.mat$", ".png");
file:write(concat("XML/", $file), $doc,
<output:serialization-parameters>
<output:indent value="yes"/>
<output:method value="xml"/>
<output:omit-xml-declaration value="no"/>
</output:serialization-parameters>
);
}

Related

how to write query where the input file is passed from the command line (Saxon)

I'm very new to this.
I have a query and an XML file.
I can write a query over that specific file:
for $x in doc("file:///C:/Users/Foo/IdeaProjects/XQuery/src/books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
I have a basic XML file with books in it, and this works nicely in IntelliJ.
But if I wanted to run this query against a file specified on the command line, how would I do it?
The command line for running the above is (as much for other people's reference):
java -cp C:\Users\Foo\.IdeaIC2019.2\config\plugins\xquery-intellij-plugin\lib\Saxon-HE-9.9.1-7.jar net.sf.saxon.Query -t -q:"C:\Users\Foo\IdeaProjects\XQuery\src\w3schools.com.xqy"
and that also works nicely.
The Saxon documentation
https://www.saxonica.com/html/documentation/using-xquery/commandline.html
implies that I can specify an input file using "-d",
and says "The document node of the document is made available to the query as the context item",
but this doesn't really make sense to my one-day-old XQuery skills.
How do I specify in the query that the document comes from the command line? What is the context item, and how do I reference it?
(I can do a bit of XSLT 1.0, so I understand the notion of a context.)
I think the option is named -s (for source), so you can use -s:books.xml. Inside your XQuery main expression, any path is then evaluated with that document as the context item, so you can just write, e.g.:
for $x in /bookstore/book
where $x/price>30
order by $x/title
return $x/title
So the answer is to drop the doc() function:
for $x in bookstore/book
i.e. the same notion as in XSLT.
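As a minimal sketch of what "the context item" means here, the query can also declare it explicitly. This assumes an XQuery 3.0 (or later) processor and that the source document is supplied on the command line with -s:books.xml, as suggested above; the declaration is optional and only documents the expectation that a document node is passed in.

```xquery
xquery version "3.0";

(: Assumption: the processor binds the context item from outside the query,
   e.g. via -s:books.xml on the Saxon command line. :)
declare context item as document-node() external;

(: A leading "/" selects the root of the tree containing the context item,
   so no doc() call is needed. :)
for $x in /bookstore/book
where $x/price > 30
order by $x/title
return $x/title
```

If no source document is supplied, a query written this way should fail with an error about the missing context item rather than silently return nothing.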

Write directly to file from BaseX GUI

I wrote an XQuery expression that has a large result of about 50MB and takes a couple of hours to compute. I execute it in the BaseX GUI, but this is a little inconvenient: it crops the result to a result window, which I then have to save. At this time, BaseX becomes unresponsive and may crash.
Is there a way to directly write the result to a file?
Have a look at BaseX's File Module, which provides broad functionality for reading and writing files and for traversing the file system.
For you, file:write($path as xs:string, $items as item()*) as empty-sequence() will be of special interest: it writes a sequence of items to a file. For example:
file:write(
'/tmp/output.xml',
<root>{
for $i in 1 to 1000000
return <some-large-amount-of-data />
}</root>
)
If your output isn't well-formed XML, consider the file:write-binary, file:write-text and file:write-text-lines functions.
Yet another alternative might be writing to documents in the database instead of files. db:add and db:create from the database module can be used to add the computed results to the current or a new database.
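A minimal sketch of the database alternative, assuming BaseX and its Database Module. The database name 'results' and the document path 'output.xml' are hypothetical placeholders, and the loop stands in for the real long-running computation.

```xquery
(: Create a new database named 'results' (hypothetical name) and store the
   computed document in it under the path 'output.xml'. :)
db:create(
  'results',
  <root>{
    for $i in 1 to 1000
    return <item n="{ $i }"/>
  }</root>,
  'output.xml'
)
```

Because db:create is an updating function, the query returns nothing to the result window; the stored document can be inspected afterwards by opening the database.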

Updating embedded triples using xquery in MarkLogic

I tried to update embedded triples in MarkLogic using XQuery, but it does not seem to work for embedded triples, although the same query works for other triples.
Can you tell me if there is some other option that needs to be specified when updating embedded triples?
The code I used is:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $triples := cts:triples(sem:iri("http://smartlogic.com/document#2012-10-26_DNB.OL_(Citi)_DNB_ASA_(DNB.OL)__Model_Update.61259187.xml"),(),())
for $triple in $triples
let $node := sem:database-nodes($triple)
let $replace :=
<sem:triple>
<sem:subject>http://www.example.com/products/1001_Test
</sem:subject>
{$node/sem:predicate, $node/sem:object}
</sem:triple>
return $node ! xdmp:node-replace(., $replace)
My document contains the following triple:
<sem:triples xmlns:sem="http://marklogic.com/semantics">
<sem:triple>
<sem:subject>http://smartlogic.com/document#2012-10-26_DNB.OL_(Citi)_DNB_ASA_(DNB.OL)__Model_Update.61259187.xml</sem:subject>
<sem:predicate>http://www.smartlogic.com/schemas/docinfo.rdf#cik</sem:predicate>
<sem:object>datatype="http://www.w3.org/2001/XMLSchema#string</sem:object>
</sem:triple>
</sem:triples>
and I want this particular subject to change into something like this:
<sem:subject>http://www.example.com/products/1001_Test</sem:subject>
But when I run the XQuery to update it, nothing changes: the embedded triple in the document remains the same.
I know this because when I checked whether any triple now has the subject I specified, the query returned no results.
I used the following query to test:
SELECT *
WHERE {
<http://www.example.com/products/1001_Test> ?predicate ?object
}
You need to add the option 'all' when you ask for the database nodes backing the triple: sem:database-nodes($triple, 'all').
To be perfectly honest, I am not 100% sure why, but I think this is because your sem:triples element is not the root element of the document it appears on.
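Applied to the query from the question, the suggested change might look like the sketch below. It is not tested against the asker's data; it assumes, as the answer states, that passing the "all" option to sem:database-nodes is what allows embedded triples to be found.

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

let $subject := sem:iri("http://smartlogic.com/document#2012-10-26_DNB.OL_(Citi)_DNB_ASA_(DNB.OL)__Model_Update.61259187.xml")
for $triple in cts:triples($subject, (), ())
(: "all" asks for the backing nodes wherever they sit in a document,
   which is the change suggested in the answer above :)
for $node in sem:database-nodes($triple, "all")
return
  xdmp:node-replace(
    $node,
    <sem:triple>
      <sem:subject>http://www.example.com/products/1001_Test</sem:subject>
      {$node/sem:predicate, $node/sem:object}
    </sem:triple>
  )
```

Keeping the new subject on a single line avoids storing stray whitespace as part of the IRI.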

How to rename a document in MarkLogic?

I have a simple task to do but am unable to find an exact solution for it. I have saved a file as abc.xml in MarkLogic. How can I rename the file to, for example, example.xml using XQuery?
The code I tried:
xquery version "1.0-ml";
xdmp:document-rename ("/aaa.xml","/final.xml");
This is showing an error.
There is no way, that I know of, to change the document URI of an existing document. The only way I can think of is to create a new document with the same content and the new URI, and delete the existing one, in the same transaction.
Where it gets tricky is to make sure to preserve the ownership, the permissions, all the properties, the property document, make sure that the old URI is not used anywhere to link to the existing document, etc.
But usually, the document URI is never really used. You should first consider whether you really need to rename the document, and why.
(Note that saying "this is showing an error" is rarely useful on SO or on mailing lists, if you do not show what the error is.)
Florent is correct: a true 'rename' is not possible, or perhaps not even meaningful (by analogy, think of renaming a file from one disk to another).
"Move", however, is meaningful (copy, then delete, in one transaction).
Defining "move" is use-case dependent, i.e. what metadata also needs to move? Permissions? Collections? Document properties? Inherited permissions?
xmlsh (http://www.xmlsh.org) implements a 'rename' command (http://www.xmlsh.org/MarkLogicRename) for the MarkLogic extension which is really a 'move', with the implementation borrowed from postings on MarkMail (http://markmail.org/).
The implementation is the following XQuery. It doesn't do everything you might want, and it might do more than you want. YMMV.
https://github.com/DALDEI/xmlsh/blob/master/extensions/marklogic/src/org/xmlsh/marklogic/resources/rename.xquery
( it was also written long ago - it is likely to benefit from improvement )
I have a working example; this works for me:
xquery version "1.0-ml";
declare function local:document-rename(
$old-uri as xs:string, $new-uri as xs:string)
as empty-sequence()
{
xdmp:document-delete($old-uri),
let $permissions := xdmp:document-get-permissions($old-uri)
let $collections := xdmp:document-get-collections($old-uri)
return xdmp:document-insert(
$new-uri, doc($old-uri),
if ($permissions) then $permissions
else xdmp:default-permissions(),
if ($collections) then $collections
else xdmp:default-collections(),
xdmp:document-get-quality($old-uri)
)
,
let $prop-ns := namespace-uri(<prop:properties/>)
let $properties :=
xdmp:document-properties($old-uri)/node()
[ namespace-uri(.) ne $prop-ns ]
return xdmp:document-set-properties($new-uri, $properties)
};
(: function call :)
local:document-rename ("/opt/backup/x.xml","y.xml");
MarkLogic has a tutorial up addressing file renaming (moving):
https://developer.marklogic.com/recipe/move-a-document/
Importantly, it uses the function xdmp:lock-for-update() to prevent modifications to the source file while it is being copied to the target location.
Also, if you are doing a batch renaming you'll want to make sure that each file URI you rename corresponds to a document in the database or you'll get runtime errors.
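The two points above (locking the URIs and checking that the source exists) can be combined in a short sketch. This is not the code from the linked recipe; local:move is a hypothetical helper name, and the sketch copies only the content, permissions, and collections, not the document properties or quality.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: "move" a document by copying it to a new URI and
   deleting the original in the same transaction. :)
declare function local:move($old-uri as xs:string, $new-uri as xs:string)
  as empty-sequence()
{
  if (fn:doc-available($old-uri)) then (
    (: take write locks up front so neither URI changes during the copy :)
    xdmp:lock-for-update($old-uri),
    xdmp:lock-for-update($new-uri),
    xdmp:document-insert(
      $new-uri,
      fn:doc($old-uri),
      xdmp:document-get-permissions($old-uri),
      xdmp:document-get-collections($old-uri)
    ),
    xdmp:document-delete($old-uri)
  )
  else ()  (: no document at the old URI: skip it instead of raising an error :)
};

local:move("/aaa.xml", "/final.xml")
```

For a batch rename, silently skipping missing URIs keeps one bad entry from rolling back the whole transaction; raising an error instead may be preferable when a missing document indicates a real problem.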

Not picking up modifications to an XML document in the database

I have only just started with MarkLogic and XQuery. I am having a really tough time in modifying the content of one of my XML documents. I just cannot seem to get a change to an element to pick up. Here's my process (I have had to take things back as basic as I could just to try and get it working):
In query console I have one tab open which queries for the contents of one XML doc:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3790.xml")
This brings back the document as below
false
...
3790
Victoria Wilson
</ppl_name>
I now want to update the element using XQuery but it's just not happening. Here's the XQuery:
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
let $docxml :=
xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3065.xml")/document/meta/ppl_name
return
for $node in $docxml/*
let $target := xdmp:document-get("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/jdbc_ppl_3790.xml")/document/meta/*[fn:name() = fn:name($node)]
return
xdmp:node-replace($target, $node)
I am basically looking to replace the ppl_name element in the target (3790) with the ppl_name element from the source (3065).
I run the XQuery and it completes without error (making me think it has worked); the return value reads "your query returned an empty sequence".
I then go back to the same tab as I used in step 1 and re-run the XQuery used in step 1. The doc (3790) comes back but it STILL has Victoria Wilson as the ppl_name.
The node returned by xdmp:document-get is an in-memory node from a document on the filesystem. It isn't coming from the database. You can't use xdmp:node-replace on in-memory nodes. That's only for database-resident nodes.
You can insert it using xdmp:document-insert. Then it's in the database, and you can access it using doc and update it using xdmp:node-replace. Or you can use in-memory operations to construct a new version with the changes you want.
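A minimal sketch of the first option, assuming it is run in Query Console as two statements separated by a semicolon, so that the inserts are committed before the update runs. The database URIs under /ppl/ are hypothetical, and the update replaces the ppl_name element directly, as described in the question.

```xquery
xquery version "1.0-ml";
(: Statement 1: load both files from the filesystem into the database. :)
for $name in ("jdbc_ppl_3065.xml", "jdbc_ppl_3790.xml")
return
  xdmp:document-insert(
    fn:concat("/ppl/", $name),
    xdmp:document-get(
      fn:concat("C:/Users/Paul/Documents/MarkLogic/xml/ppl/ppl/", $name))
  );

xquery version "1.0-ml";
(: Statement 2: both nodes now live in the database, so xdmp:node-replace
   can update the target. :)
let $source := fn:doc("/ppl/jdbc_ppl_3065.xml")/document/meta/ppl_name
let $target := fn:doc("/ppl/jdbc_ppl_3790.xml")/document/meta/ppl_name
return xdmp:node-replace($target, $source)
```

To check the result, query fn:doc("/ppl/jdbc_ppl_3790.xml") rather than xdmp:document-get, since the file on disk is never modified.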
See "What are in memory elements in marklogic?" for previous answers to a similar question, and more tips.
Here the node returned by xdmp:document-get is an in-memory node.
If you're working with in-memory elements, import the following module:
import module namespace mem = "http://xqdev.com/in-mem-update" at "/MarkLogic/appservices/utils/in-mem-update.xqy";
Then, instead of xdmp:node-replace, you can use mem:node-replace(<x/>, <y/>).
