Recursive copy of a folder with XQuery - recursion

I have to copy an entire project folder into the MarkLogic server, and instead of doing it manually I decided to do it with a recursive function, but it is becoming the worst idea I have ever had. I'm having problems with the transactions and with the syntax, and being new to this I can't find a proper way to solve them. Here's my code; thank you for the help!
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
declare option xdmp:set-transaction-mode "update";
declare function local:recursive-copy($filesystem as xs:string, $uri as xs:string)
{
for $e in xdmp:filesystem-directory($filesystem)/dir:entry
return
if($e/dir:type/text() = "file")
then dls:document-insert-and-manage($e/dir:filename, fn:false(), $e/dir:pathname)
else
(
xdmp:directory-create(concat(concat($uri, data($e/dir:filename)), "/")),
local:recursive-copy($e/dir:pathname, $uri)
)
};
let $filesystemfolder := 'C:\Users\WB523152\Downloads\expath-ml-console-0.4.0\src'
let $uri := "/expath_console/"
return local:recursive-copy($filesystemfolder, $uri)

MLCP would have been nice to use. However, here is my version:
xquery version "1.0-ml";
declare namespace dir = "http://marklogic.com/xdmp/directory";
declare option xdmp:set-transaction-mode "update";
(: pattern to strip from the filesystem path, and the URI prefix that replaces it :)
declare variable $prefix-replace := ('C:/', '/expath_console/');
declare function local:recursive-copy($filesystem as xs:string){
  for $e in xdmp:filesystem-directory($filesystem)/dir:entry
  return
    if($e/dir:type/text() = "file")
    then
      let $source := $e/dir:pathname/text()
      let $dest := fn:replace($source, $prefix-replace[1], $prefix-replace[2])
      (: xdmp:document-load reads the file at $source and stores it at the URI given in the options :)
      let $_ := xdmp:document-load($source,
        <options xmlns="xdmp:document-load">
          <uri>{$dest}</uri>
        </options>)
      return <record>
        <from>{$source}</from>
        <to>{$dest}</to>
      </record>
    else
      local:recursive-copy($e/dir:pathname)
};
let $filesystemfolder := 'C:\Temp'
return <results>{local:recursive-copy($filesystemfolder)}</results>
Please note the following:
I changed my sample to the C:\Temp dir
The output is XML only because by convention I try to do this in case I want to analyze results. It is actually how I found the error related to conflicting updates.
I chose to define a simple prefix replace on the URIs
I saw no need for DLS in your description
I saw no need for the explicit creation of directories in your use case
You were getting conflicting updates because you were using just the filename as the URI. Across the whole directory structure these names are not unique, so the same URI was inserted twice in one transaction, hence the conflicting-updates error.
This is not solid code:
You would have to ensure that a URI is valid. Not all filesystem paths/names are OK for a URI, so you would want to test for this and escape chars if needed.
Large filesystems would time-out, so spawning in batches may be useful.
As an example, I might gather the list of docs as in my XML and then process that list by spawning a new task for every 100 documents. This could be accomplished by a simple loop over xdmp:spawn-function or by using a library such as taskbot by @mblakele.
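The batching idea can be sketched roughly as follows. This is only an illustration, not tested against a real server: the $paths list and the batch size are placeholders, the prefix replacement mirrors the one above, and the <update>true</update> option assumes MarkLogic 9 or later (older releases configure update transactions differently).

```xquery
xquery version "1.0-ml";

(: Hypothetical input: the source paths gathered beforehand, e.g. from the <from> elements :)
declare variable $paths as xs:string* := ("C:/Temp/a.xml", "C:/Temp/sub/b.xml");
declare variable $batch-size as xs:integer := 100;

(: One spawned task per batch of $batch-size paths :)
for $start in (1 to fn:count($paths))[(. - 1) mod $batch-size = 0]
let $batch := fn:subsequence($paths, $start, $batch-size)
return
  xdmp:spawn-function(
    function() {
      for $source in $batch
      return
        xdmp:document-load($source,
          <options xmlns="xdmp:document-load">
            <uri>{fn:replace($source, "C:/", "/expath_console/")}</uri>
          </options>)
    },
    (: assumption: MarkLogic 9+, where the eval options accept <update> :)
    <options xmlns="xdmp:eval">
      <update>true</update>
    </options>)
```

Each spawned task runs in its own transaction on the task server, so one oversized request no longer has to finish within a single timeout.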

Related

Xquery. How to check current incremental backup status?

I have written an XQuery that gets executed while an incremental backup is in progress. I know the backup status returns three possible values: completed, in-progress and failed. I'm not sure of the exact value of the last one, but anyway, this is my XQuery:
xquery version "1.0-ml";
declare function local:escape-for-regex
( $arg as xs:string? ) as xs:string {
replace($arg,
'(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))','\\$1')
} ;
declare function local:substring-before-last
( $arg as xs:string? ,
$delim as xs:string ) as xs:string {
if (matches($arg, local:escape-for-regex($delim)))
then replace($arg,
concat('^(.*)', local:escape-for-regex($delim),'.*'),
'$1')
else ''
} ;
let $server-info := doc("/config/server-info.xml")
let $content-database :="xyzzy"
let $backup-directory:=$server-info/configuration/server-info/backup-directory/text()
let $backup-latest-dateTime := xdmp:filesystem-directory(fn:concat( $backup-directory,'/',$content-database))/dir:entry[1]/dir:filename/text()
let $backup-latest-date := fn:substring-before($backup-latest-dateTime,"-")
let $backup-info := cts:search(/,cts:element-value-query(xs:QName("directory-name"),$backup-latest-date))
let $new-backup := if($backup-info)
then fn:false()
else fn:true()
let $db-bkp-status := if($new-backup)
then (xdmp:database-backup-status(())[./*:forest/*:backup-path[fn:contains(., $backup-latest-dateTime)]][./*:forest/*:incremental-backup eq "false"]/*:status)
else (xdmp:database-backup-status(())[./*:forest/*:backup-path[fn:contains(., $backup-latest-dateTime)]][./*:forest/*:incremental-backup eq "true"][./*:forest/*:incremental-backup-path[fn:contains(., fn:replace(local:substring-before-last(xs:string(fn:current-date()), "-"), "-", ""))]]/*:status)
return $db-bkp-status
We maintain a configuration file that stores backup status. On a new full-backup day, $backup-info returns nothing; on a daily incremental-backup day, it returns the config. I'm using it only to check whether today's backup is a new full backup or an incremental one. On an incremental day $new-backup is false, so the query takes the last branch, i.e. the else condition. That branch doesn't return anything for incremental backups, neither completed nor in-progress. I wonder how MarkLogic picks up the timestamp. Please assist with this.
Feel free to provide your own XQuery from scratch. I can update mine.
I even took the job ID and searched for it in the output of xdmp:database-backup-status(()), but that job ID doesn't exist in the result set either.
MarkLogic provides Admin modules that supply much of the information you are attempting to get via other methods. The Admin UI modules (typically found in /opt/MarkLogic/Modules/MarkLogic/Admin/Lib) contain a lot of helpful code that can be adapted to get these sorts of details. In this case I would refer to database-status-form.xqy:
define function db-mount-state(
$fstats as node()*,
$fcounts as node()*,
$dbid as xs:unsignedLong)
{
let $times := $fstats/fs:last-state-change,
$ls := max($times),
$since :=
if (not(empty($ls)))
then concat(" since ", longDate($ls), " ", longTimeSecs($ls))
else ""
return concat(database-status($dbid,$fstats,$fcounts),$since)
}
define function backup-recov-state($fstats as node()*)
{
if(empty($fstats/fs:backups/fs:backup)
and
empty($fstats/fs:restore))
then
"No backup or restore in progress"
else
if(empty($fstats/fs:backups/fs:backup))
then
"Restore in progress (see below for details)"
else
"Backup in progress (see below for details)"
}
... Call the functions against your database, then pull the details from the elements you want:
let $last-full-backup := max($fstats/fs:last-backup)
let $last-incremental-backup := max($fstats/fs:last-incr-backup)
return ($last-full-backup, $last-incremental-backup)
These are just sample code snippets, not executable as they stand, but they should get you moving in the right direction.
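A more self-contained version of the last step might look like the sketch below. It assumes the forest status comes from xdmp:forest-status, that the fs prefix is bound to the forest-status namespace, and that the last-backup and last-incr-backup elements named above hold dateTime values; the database name is taken from the question. Treat it as a starting point rather than verified code.

```xquery
xquery version "1.0-ml";
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

(: Database name from the question; adjust as needed :)
let $dbid := xdmp:database("xyzzy")
(: One forest-status element per forest of the database :)
let $fstats := xdmp:forest-status(xdmp:database-forests($dbid))
(: Assumption: both elements contain xs:dateTime values :)
let $last-full-backup :=
  fn:max(for $t in $fstats/fs:last-backup return xs:dateTime($t))
let $last-incremental-backup :=
  fn:max(for $t in $fstats/fs:last-incr-backup return xs:dateTime($t))
return ($last-full-backup, $last-incremental-backup)
```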

How to execute XQuery on all XML documents in the folder

I need to make sure a particular node exists in many XML files. I have to switch the context each time I want to query another document.
Is there any way I can execute XQuery on all documents in the directory without switching the context?
I may be a little late, but most probably the following XQuery will do what you wish: it returns the path of each XML file that does not contain a specific element:
let $path := "."
for $file in file:list( $path, true(), '*.xml')
let $path := $path || "/" || $file
where not(
  exists(fetch:xml($path)/foo/bar[text() = "Text"])
)
return $path
If you only want to know whether every XML file in a specific directory contains a specific element, the following query might be useful:
declare variable $path := "/Users/michael/Code/foo";
every $doc in file:list($path, true(), '*.xml') (: returns a sequence of file-names :)
  => for-each(concat($path,"/", ?)) (: returns a sequence of full paths to each file :)
  => for-each(fetch:xml#1) (: returns a sequence of documents :)
satisfies exists(
  $doc/*/*[text() = "Text"]
)
Hope this helps ;-)

eXist-db serialize is expand-xincludes=no ignored?

In eXist-db 4.4, XQuery 3.1, I am compressing a number of XML files to a .zip in a directory. The compression process uses serialize().
The XML files have some large xincludes which according to the documentation are automatically processed in serializing. I have attempted to 'turn off' the xinclude serialization in two places in the code (prologue declare and map), but the serializer is still outputting all xincludes:
declare option exist:serialize "expand-xincludes=no";
declare function zip:get-entries-for-zip()
{
(: get documents prefixed by 'MS609' :)
let $pref := "MS609"
(: get list of document names :)
let $doclist := xmldb:get-child-resources($globalvar:URIdata)[starts-with(., $pref)]
(: output serialized entries :)
let $entries :=
for $n in $doclist
return
<entry name="{$n}" type='text' method='store'>
{serialize(doc(concat($globalvar:URIdata, "/", $n)), map { "method": "xml", "expand-xincludes": "no"})}
</entry>
return $entries
};
The XML data with xincludes to reproduce this problem can be found here http://medieval-inquisition.huma-num.fr/downloads under the description "BM MS609 Edition (tei-xml)".
Many thanks in advance.
The expand-xincludes serialization parameter is specific to eXist and, as such (or at least at present), cannot be set using the fn:serialize() function. Instead, use the util:serialize() function:
util:serialize($document, "expand-xincludes=no")
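Applied to the loop from the question, the call might look like the sketch below. The collection path is a placeholder standing in for $globalvar:URIdata, and the explicit module imports are there only to keep the snippet self-contained.

```xquery
xquery version "3.1";

import module namespace util = "http://exist-db.org/xquery/util";
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Hypothetical collection; replace with your own data collection :)
declare variable $collection := "/db/apps/my-app/my-data";

for $n in xmldb:get-child-resources($collection)[starts-with(., "MS609")]
return
  <entry name="{$n}" type="text" method="store">{
    (: eXist-specific serializer; XIncludes are left unexpanded :)
    util:serialize(doc($collection || "/" || $n), "expand-xincludes=no")
  }</entry>
```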
Alternatively, since you're ultimately interested in zipping the contents of a collection, you can skip the explicit serialization step, declare your serialization options in the query's prolog (or set it inline using util:declare-option()), and simply provide the compression:zip() function the URI path(s) to the collections/documents you want to zip. For example:
xquery version "3.1";
declare option exist:serialize "expand-xincludes=no";
let $sources := "/db/apps/my-app/my-data" (: or a sequence of paths to individual docs:) ! xs:anyURI(.)
let $preserve-collection-structure := false()
let $zip := compression:zip($sources, $preserve-collection-structure)
return
  xmldb:store("/db", "my-data.zip", $zip)
For more on serialization options in eXist, see my earlier answer to a similar question: https://stackoverflow.com/a/49290616/659732.

MarkLogic - How to insert element into XML

How to insert the node in XML.
let $a := <a><b>bbb</b></a>)
return
xdmp:node-insert-after(doc("/example.xml")/a/b, <c>ccc</c>);
Expected Output:
<a><c>ccc</c><b>bbb</b></a>
Please help to get the output.
You should be using xdmp:node-insert-before I believe in the following way:
xdmp:document-insert('/example.xml', <a><b>bbb</b></a>);
xdmp:node-insert-before(fn:doc('/example.xml')/a/b, <c>ccc</c>);
fn:doc('/example.xml');
(: returns <a><c>ccc</c><b>bbb</b></a> :)
Nodes are immutable, so in-memory mutation can only be done by creating a new copy.
The copy can use the unmodified contained nodes from the original:
declare function local:insert-after(
  $prior as node(),
  $inserted as node()+
) as element()
{
  let $container := $prior/parent::element()
  return element {fn:node-name($container)} {
    $container/namespace::*,
    $container/attribute(),
    $prior/preceding-sibling::node(),
    $prior,
    $inserted,
    $prior/following-sibling::node()
  }
};
let $a := <a><b>bbb</b></a>
return local:insert-after($a//b, <c>ccc</c>)
Creating a copy in memory and then inserting the copy is faster than inserting and modifying a document in the database.
Depending on how many documents are inserted, the difference could be significant.
There are community libraries for copying with changes, but sometimes it's as easy to write a quick function (recursive where necessary).
You can use the code below to insert the element into the XML:
xdmp:node-insert-child(fn:doc('directory URI'),element {fn:QName('http://yournamespace','elementName') }{$elementValue})
Here fn:QName puts the new element in the document's namespace, which prevents an xmlns="" declaration from being added to the inserted node.
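The effect of fn:QName can be seen without touching the database. In this small sketch the namespace URI is a made-up placeholder; the first constructor builds the element in that namespace, the second builds it in no namespace, which is what leads to xmlns="" when such a node is inserted under a namespaced parent.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace, standing in for the document's own :)
let $ns := "http://example.com/ns"
let $in-ns := element { fn:QName($ns, "c") } { "ccc" }
let $no-ns := element { "c" } { "ccc" }
(: Returns the namespace URI of each element; the second one is empty :)
return (fn:namespace-uri($in-ns), fn:namespace-uri($no-ns))
```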

To remove the node but keep the value inside intact through XQuery

I have a content.xml modelled as below
<root>
<childnode>
Some text here
</childnode>
</root>
I am trying to remove the <childnode> and update the content.xml with only the value of it
so the output looks like
<root>
Some Text here
</root>
I wrote a function to perform this but anytime I run it it gives me error as "unexpected token: modify". I was thinking of a way to accomplish this without using functx functions.
xquery version "1.0";
declare namespace request="http://exist-db.org/xquery/request";
declare namespace file="http://exist-db.org/xquery/file";
declare namespace system="http://exist-db.org/xquery/system";
declare namespace util="http://exist-db.org/xquery/util";
declare namespace response="http://exist-db.org/xquery/response";
declare function local:contentUpdate() {
let $root := collection('/lib/repository/content')//root/childNode
let $rmChild := for $child in $root
modify
(
return rename node $child as ''
)
};
local:updateTitle()
Thanks in advance
There are multiple problems with your query:
Updating functions must be declared as updating.
You're calling a different function from the one you defined (you probably didn't notice because the syntax errors were reported first).
rename node expects an element, attribute, or processing instruction as its target and a valid QName as the new name; the empty string is not allowed.
At least BaseX doesn't allow updating statements when the code is declared as XQuery 1.0. Maybe eXist doesn't care about this; try adding the declaration back if you need to know.
You do not want to rename but to replace each <childnode/> with its contents; use replace node.
This code fixes all these problems:
declare updating function local:contentUpdate() {
  let $root := collection('/lib/repository/content')
  return
    for $i in $root//childnode
    return
      replace node $i with $i/data()
};
local:contentUpdate()
eXist-db's XQuery Update syntax is documented at http://exist-db.org/exist/update_ext.xml. Note that this syntax predates the release of the XQuery Update Facility 1.0, so the syntax is different and remains unique to eXist-db.
The way to do what you want in eXist-db is as follows:
xquery version "1.0";
declare function local:contentUpdate() {
  let $root := doc('/db/lib/repository/content/content.xml')/root
  return
    update value $root with $root/string()
};
local:contentUpdate()
The primary changes, compared to your original code, are:
Inserted the eXist-db syntax for your update
Prepended '/db' to your collection name, as /db is the root of the database in eXist-db; replaced the collection() call with a doc() call, since you stated you were operating on a single file, content.xml
Changed //root to /root, since "root" is the root element, so the // (descendant-or-self) axis is extraneous
Replaced updateTitle() with the actual name of the function, contentUpdate
Removed the extraneous namespace declarations
For more on why I used $root/string(), see http://community.marklogic.com/blog/text-is-a-code-smell.
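If an in-memory result is enough and nothing has to be written back to the database, the same unwrapping can also be sketched as a plain recursive transform that needs neither update syntax nor functx. This is a different technique from the two answers above, and local:unwrap is a name made up for this illustration.

```xquery
xquery version "1.0";

(: Copies a tree, replacing every <childnode> element by its own content :)
declare function local:unwrap($node as node()) as node()* {
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:unwrap($c) }
    case element(childnode) return
      for $c in $node/node() return local:unwrap($c)
    case element() return
      element { node-name($node) } {
        $node/@*,
        for $c in $node/node() return local:unwrap($c)
      }
    default return $node
};

local:unwrap(<root><childnode>Some text here</childnode></root>)
```

For the sample input this yields a root element whose only child is the text node "Some text here".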

Resources