I need to make sure a particular node exists in many XML files. I have to switch the context each time I want to query another document.
Is there any way I can execute XQuery on all documents in the directory without switching the context?
I may be a little late, but most probably the following XQuery will do what you wish, it returns the path to each XML-File that does not contain a specific element:
let $path := "."
for $file in file:list( $path, true(), '*.xml')
let $path := $path || "/" || $file
where not(
exists(fetch:xml($path)/foo/bar[text() = "Text"])
)
return $path
If you were only interested if there were XML-files in a specific that do or do not contain a specific element the following query might be useful:
declare variable $path := "/Users/michael/Code/foo";
every $doc in file:list($path, true(), '*.xml') (: returns a sequence of file-names :)
=> for-each(concat($path,"/", ?)) (: returns a sequence of full paths to each file :)
=> for-each(fetch:xml#1) (: returns a sequence of documents :)
satisfies exists(
$doc/*/*[text() = "Text"]
)
Hope this helps ;-)
Related
I'm trying to get all the filenames from a directory that have a specific string in the XML elements. I'm using this code for that:
for $x in collection('file:///C:/Sricpts/Software20180101-V1.1.1/workspace/Data_PreProcessing?select=*.xml')
where matches($x, 'INSERT INTO')
return $x
However when I run the code it gives that error:
[FODC0002] Resource 'C:/Program Files (x86)/BaseX/file:/file:///C:/Sricpts/Software20180101-V1.1.1/workspace/Data_PreProcessing?select=*.xml' does not exist.
How can I solve this problem?
Many thanks!
If you use BaseX, there is no need to specify a file filter. All XML documents that occur in the specified directory will be added to your collection:
for $x in collection('file:///C:/Sricpts/Software20180101- V1.1.1/workspace/Data_PreProcessing')
where matches($x, 'INSERT INTO')
return $x
If you want to have more control over the documents to be chosen, you can take advantage of the File Module:
let $root := 'C:\Sricpts\Software20180101- V1.1.1\workspace\Data_PreProcessing'
for $path in file:list($root, true(), '*.xml')
let $doc := doc($root || $path)
where matches($doc, 'INSERT INTO')
return $doc
I have to copy an entire project folder inside the MarkLogic server and instead of doing it manually I decided to do it with a recursive function, but is becoming the worst idea I have ever had. I'm having problems with the transactions and with the syntax but being new I don't find a true way to solve it. Here's my code, thank you for the help!
import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";
declare option xdmp:set-transaction-mode "update";
declare function local:recursive-copy($filesystem as xs:string, $uri as xs:string)
{
for $e in xdmp:filesystem-directory($filesystem)/dir:entry
return
if($e/dir:type/text() = "file")
then dls:document-insert-and-manage($e/dir:filename, fn:false(), $e/dir:pathname)
else
(
xdmp:directory-create(concat(concat($uri, data($e/dir:filename)), "/")),
local:recursive-copy($e/dir:pathname, $uri)
)
};
let $filesystemfolder := 'C:\Users\WB523152\Downloads\expath-ml-console-0.4.0\src'
let $uri := "/expath_console/"
return local:recursive-copy($filesystemfolder, $uri)
MLCP would have been nice to use. However, here is my version:
declare option xdmp:set-transaction-mode "update";
declare variable $prefix-replace := ('C:/', '/expath_console/');
declare function local:recursive-copy($filesystem as xs:string){
for $e in xdmp:filesystem-directory($filesystem)/dir:entry
return
if($e/dir:type/text() = "file")
then
let $source := $e/dir:pathname/text()
let $dest := fn:replace($source, $prefix-replace[1], $prefix-replace[2])
let $_ := xdmp:document-insert($source,
<options xmlns="xdmp:document-load">
<uri>{$dest}</uri>
</options>)
return <record>
<from>{$source}</from>
<to>{$dest}</to>
</record>
else
local:recursive-copy($e/dir:pathname)
};
let $filesystemfolder := 'C:\Temp'
return <results>{local:recursive-copy($filesystemfolder)}</results>
Please note the following:
I changed my sample to the C:\Temp dir
The output is XML only because by convention I try to do this in case I want to analyze results. It is actually how I found the error related to conflicting updates.
I chose to define a simple prefix replace on the URIs
I saw no need for DLS in your description
I saw no need for the explicit creation of directories in your use case
The reason you were getting conflicting updates because you were using just the filename as the URI. Across the whole directory structure, these names were not unique - hence the conflicting update on double inserts of same URI.
This is not solid code:
You would have to ensure that a URI is valid. Not all filesystem paths/names are OK for a URI, so you would want to test for this and escape chars if needed.
Large filesystems would time-out, so spawning in batches may be useful.
A an example, I might gather the list of docs as in my XML and then process that list by spawning a new task for every 100 documents. This could be accomplished by a simple loop over xdmp:spawn-function or using a library such as taskbot by #mblakele
I have a content which is neither a valid HTML nor a XML in my legacy database. Considering the fact, it would be difficult to clean the legacy, I want to tidy this up in MarkLogic using xdmp:tidy. I am currently using ML-8.
<sub>
<p>
<???†?>
</p>
</sub>
I'm passing this content to tidy functionality in a way :
declare variable $xml as node() :=
<content>
<![CDATA[<p><???†?></p>]]>
</content>;
xdmp:tidy(xdmp:quote($xml//text()),
<options xmlns="xdmp:tidy">
<assume-xml-procins>yes</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
As a result it returns :
<p>
<? ?†?>
</p>
Now this result is not the valid xml format (I checked it via XML validator) due to which when I try to insert this XML into the MarkLogic it throws an error saying 'MALFORMED BODY | Invalid Processing Instruction names'.
I did some investigation around PIs but not much luck. I could have tried saving the content without PI but this is also not a valid PI too.
That is because what you think is a PI is in fact not a PI.
From W3C:
2.6 Processing Instructions
[Definition: Processing instructions (PIs) allow documents to contain
instructions for applications.]
Processing Instructions
[16] PI ::= '' Char*)))?
'?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' |
'l'))
So the PI name cannot start with ? as in your sample ??†
You probably want to clean up the content before you pass it to tidy.
Like below:
declare variable $xml as node() :=
<content><![CDATA[<p>Hello <???†?>world</p>]]></content>;
declare function local:copy($input as item()*) as item()* {
for $node in $input
return
typeswitch($node)
case text()
return fn:replace($node,"<\?[^>]+\?>","")
case element()
return
element {name($node)} {
(: output each attribute in this element :)
for $att in $node/#*
return
attribute {name($att)} {$att}
,
(: output all the sub-elements of this element recursively :)
for $child in $node
return local:copy($child/node())
}
(: otherwise pass it through. Used for text(), comments, and PIs :)
default return $node
};
xdmp:tidy(local:copy($xml),
<options xmlns="xdmp:tidy">
<assume-xml-procins>no</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
This would do the trick to get rid of all PIs (real and fake PIs)
Regards,
Peter
i try to split my incoming documents using "Information Studio Flows" (MarkLogic v 8.0-1.1). The problem is in "Transform" section.
This is my importing documents. For simplicity i reduce it content to one stwtext-element
<docs>
<stwtext id="RD-10-00258" update="03.2011" seq="RQ-10-00001">
<head>
<ti>
<i>j</i>
</ti>
<ff-list>
<ff id="0103"/>
</ff-list>
</head><p>
Symbol für die
<vw idref="RD-19-04447">Stromdichte</vw>
.
</p>
</stwtext>
</docs>
This is my "xquery transform" content:
xquery version "1.0-ml";
(: Copyright 2002-2015 MarkLogic Corporation. All Rights Reserved. :)
(:
:: Custom action. It must be a CPF action module.
:: Replace this text completely, or use it as a template and
:: add imports, declarations,
:: and code between START and END comment tags.
:: Uses the external variables:
:: $cpf:document-uri: The document being processed
:: $cpf:transition: The transition being executed
:)
import module namespace cpf = "http://marklogic.com/cpf"
at "/MarkLogic/cpf/cpf.xqy";
(: START custom imports and declarations; imports must be in Modules/ on filesystem :)
(: END custom imports and declarations :)
declare option xdmp:mapping "false";
declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;
if ( cpf:check-transition($cpf:document-uri,$cpf:transition))
then
try {
(: START your custom XQuery here :)
let $doc := fn:doc($cpf:document-uri)
return
xdmp:eval(
for $wpt in fn:doc($doc)//stwtext
return
xdmp:document-insert(
fn:concat("/rom-data/", fn:concat($wpt/#id,".xml")),
$wpt
)
)
(: END your custom XQuery here :)
,
cpf:success( $cpf:document-uri, $cpf:transition, () )
}
catch ($e) {
cpf:failure( $cpf:document-uri, $cpf:transition, $e, () )
}
else ()
by running of snippet, i take the error:
Invalid URI format
and long description of it:
XDMP-URI: (err:FODC0005) fn:doc(fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")) -- Invalid URI format: "
j
Symbol für die
Stromdichte
"
In /18200382103958065126.xqy on line 37
In xdmp:invoke("/18200382103958065126.xqy", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>)
$doc = fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")
In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 179
In execute-action("on-state-enter", "http://marklogic.com/states/initial", "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>, (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline), fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1])
$caller = "on-state-enter"
$state-or-status = "http://marklogic.com/states/initial"
$uri = "/8122584828241226495/12835482492021535301/URI=/content/home/admi..."
$vars = (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...)
$invoke-options = <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>
$pipelines = (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline)
$action-to-execute = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action
$chosen-transition = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]
$raw-module-name = "/18200382103958065126.xqy"
$module-kind = "xquery"
$module-name = "/18200382103958065126.xqy"
In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 320
i thought, it was a problem with "Document setting" in "load" section of "Flow editor"
URI=/content{$path}/{$filename}{$dot-ext}
but if i remove it, i recive the same error.
i have no idea what to do. i am really new. please help
First of all, Information Studio has been deprecated in MarkLogic 8. I would also recommend very much looking in to the aggregate_record feature of MarkLogic Content Pump:
http://docs.marklogic.com/guide/ingestion/content-pump#id_65814
Apart from that, there are several issues with your code. You are calling fn:doc twice, effectively trying to interpret the doc contents as a uri. There is an unnecessary xdmp:eval wrapping the FLWOR statement, which expects a string as first param. I think you can shorten it to (showing inner part of the action only):
(: START your custom XQuery here :)
let $doc := fn:doc($cpf:document-uri)
for $wpt in $doc//stwtext
return
xdmp:document-insert(
fn:concat("/roempp-data/", fn:concat($wpt/#id,".xml")),
$wpt
)
(: END your custom XQuery here :)
HTH!
very many thanks #grtjn and this is my approach. Practically it is the same solution
(: START your custom XQuery here :)
xdmp:log(fn:doc($cpf:document-uri), "debug"),
let $doc := fn:doc($cpf:document-uri)
return
xdmp:eval('
declare variable $doc external;
for $wpt in $doc//stwtext
return (
xdmp:document-insert(
fn:concat("/roempp-data/", fn:concat($wpt/#id,".xml")),
$wpt,
xdmp:default-permissions(),
"roempp-data"
)
)'
,
(xs:QName("doc"), $doc),
<options xmlns="xdmp:eval">
<database>{xdmp:database("roempp-tutorial")}</database>
</options>
)
(: END your custom XQuery here :)
Ok, now it works. It is fine, but i found, that after the loading is over, i see in MarkLogic two documents:
my splited document "/rom-data/RD-10-00258.xml" with one root element "stwtext" (as desired)
origin document "URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml" with root element "docs"
is it possible to prohibit insert of origin document ?
Hi how can i retrieve the URI of all the documents in directory. I create the below xquery to achieve that but that doesn't helps.
for $i in xdmp:directory("/Test/performance/results/","infinity")
let $k := document-uri(fn:doc($i))
return <li>{$k}</li>
For efficiency you should use the URI lexicon.
cts:uris((), (), cts:directory-query("/Test/performance/results/","infinity"))
See https://docs.marklogic.com/cts:uris for documentation.
How about:
xdmp:directory("/Test/performance/results/","infinity") !
<li>{fn:document-uri(.)}</li>
The posters XQuery is incorrect because $i is already a document not a URI
so running fn:doc($i) is not going to work.
Adam's solution will work (to infinate depth) but can be very slow as it requires
fetching every document. On large databases thats very slow.
If you enable the URI Lexicon then you can do much better.
Here is the source for the xmlsh marklogic "ls" command ... embedded in it is fairly simple XQuery.
https://github.com/DALDEI/xmlsh/blob/master/extensions/marklogic/src/org/xmlsh/marklogic/ls.xsh
The relevent code:
xquery version "1.0-ml";
declare variable $root external;
for $p in fn:distinct-values(
for $d in cts:uris($root,"document",
if( $root eq "" ) then () else cts:directory-query($root,"infinity"))
let $p := substring-after( $d , $root )
where $d ne $root
return
if( contains($p,"/") ) then fn:concat($root , substring-before( $p , "/" ) , "/" )
else
concat($root ,$p)
)
where $p eq "" or $p ne $root
return $p
This lists only the direct children of a directory using the uri lexicon
This program is a bit more complex - it tries to use the uri lexicon but if it fails reverts to the xdmp:directory ... it lists either direct or recursive depending on the -r flag
https://github.com/DALDEI/xmlsh/blob/master/extensions/marklogic/src/org/xmlsh/marklogic/list.xsh