how to change the XML structure using XQuery - xquery

I have a XML file containing Employees Name and the Job done by them.
The structure of the XML file is -
<Employee>AAA#A#B#C#D</Employee>
<Employee>BBB#A#B#C#D</Employee>
<Employee>CCC#A#B#C#D</Employee>
<Employee>DDD#A#B#C#D</Employee>
There are thousands of records and I have to change structure to -
<Employee>
<Name>AAA</Name>
<Jobs>
<Job>A</Job>
<Job>B</Job>
<Job>C</Job>
<Job>D</Job>
</Jobs>
</Employee>
How to get this done using XQuery in BaseX ?

3 XQuery functions, substring-before, substring-after and tokenize are used to get
the required output.
substring-before is used to get the Name.
Similarly, the substring-after is used to get the Job portion.
Then the tokenize function, is used to split the Jobs.
let $data :=
<E>
<Employee>AAA#A#B#C#D</Employee>
<Employee>BBB#A#B#C#D</Employee>
<Employee>CCC#A#B#C#D</Employee>
<Employee>DDD#A#B#C#D</Employee>
</E>
for $x in $data/Employee
return
<Employee>
{<Name>{substring-before($x,"#")}</Name>}
{<Jobs>{
for $tag in tokenize(substring-after($x,"#"),'#')
return
<Job>{$tag}</Job>
}</Jobs>
}</Employee>
HTH...

Tokenizing the string is probably easier and faster. tokenize($string, $pattern) splits $string using the regular expression $pattern, head($seq) returns the first value of a sequence and tail($seq) all but the first. You could also use positional predicates of course, but these functions are easier to read.
for $employee in //Employee
let $tokens := tokenize($employee, '[##]')
return element Employee {
element Name { head($tokens) },
element Jobs {
for $job in tail($tokens)
return element Job { $job }
}
}

Related

Dynamic Path in XQUERY

This is my xml-file:
<?xml version="1.0" encoding="UTF-8"?>
<QQ:Envelope xmlns:QQ="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header>
<RR:ABCInfo xmlns:RR="http://abc.test.de/abc/SOAP-Header/1.0">
<RR:Version>2.2.2.2</RR:Version>
<RR:BuildRevision>3333</RR:BuildRevision>
<RR:BuildTimestamp>2019-01-01T00:00:00.000+02:00</RR:BuildTimestamp>
<RR:Start>2019-01-01T10:10:10.101+02:00</RR:Start>
<RR:End>2019-01-01T11:11:11.111+02:00</RR:End>
<RR:Something>2.222 sek.</RR:Something>
<RR:Anything/>
</RR:ABCInfo>
<work:WorkContext xmlns:work="http://test.com/">1234567890abcdefghijklmnopqrstuvwxyz</work:WorkContext>
</SOAP-ENV:Header>
<QQ:Body>
<TT:testA xmlns:TT="http://abc.test.de/XYZ/2.0.1" xmlns:RR="http://abc.test.de/abc/abcdefgh/1.0">
<TT:testB>
<TT:testC>
<TT:testD>
<TT:testE id="1234567" quellID="09876543">
<TT:data>urn:de:abc:test:whatever</TT:data>
<TT:changeDate>2019-02-02T02:02:02.020+02:00</TT:changeDate>
<TT:part1 listURI="urn:de:abc:codeliste:555" listVersionID="V12">
<code>555_777</code>
<name>Fischers Fritze</name>
</TT:part1>
<TT:piece2>Frische Fische fischen</TT:piece2>
<TT:begin>
<TT:date>20191231</TT:date>
</TT:begin>
</TT:testE>
</TT:testD>
</TT:testC>
</TT:testB>
</TT:testA>
</QQ:Body>
</QQ:Envelope>
I have a XQuery, where I have to return XML. The first element in the returning XML is "result". The other elements in the returning XML should be dynamically created.
I get 2 sequences from outside, though I have made 2 fix Sequences in the following example to test it.
In Sequence No 1 I get the names for the other elements.
In Sequence No 2 I get the related path to the element names in Sequence 1.
I open the XML file an read a path (there might be several elements, though in my example is only one.
Then I want to process this result in a loop and return the dynamic elements.
If I access the path with a fix value (variable $c in the following code) I get the correct value, but then I must know the elements in Sequence 1 and the path in Sequence 2.
If I concatenate the path then I get the value from all elements.
This is my XQuery Code:
declare namespace TT="http://abc.test.de/XYZ/2.0.1";
declare namespace QQ="http://schemas.xmlsoap.org/soap/envelope/";
declare function local:getValue($path) as xs:string {
if (fn:exists($path)) then
(
data($path)
) else (
""
)
};
let $a := ('part1', 'piece2', 'beginDate')
let $b := ('TT:part1/name','TT:piece2', 'TT:begin/TT:date')
for $x in doc("Test.XML")/QQ:Envelope/QQ:Body/TT:testA/TT:testB/TT:testC/TT:testD/TT:testE
return <result>
{
for $item at $ind in $a
let $c := local:getValue($x/TT:part1/name)
let $d := local:getValue($x || concat("/", $b[$ind]))
return element { $item } {$c, " --- ", $d}
}
</result>
Is there a possibility to access the path dynamically?
Thank you in advance.
http://www.xqueryfunctions.com/xq/functx_dynamic-path.html could help - at least did it help ME ;)
The functx:dynamic-path function dynamically evaluates a simple path expression. The function only supports element names and attribute names preceded by #, separated by single slashes. The names can optionally be prefixed, but they must use the same prefix that is used in the input document. It does not support predicates, other axes, or other node kinds. Note that most processors have an extension function that evaluates path expressions dynamically in a much more complete way.

MarkLogic - How to insert element into XML

How to insert the node in XML.
let $a := <a><b>bbb</b></a>)
return
xdmp:node-insert-after(doc("/example.xml")/a/b, <c>ccc</c>);
Expected Output:
<a><c>ccc</c><b>bbb</b></a>
Please help to get the output.
You should be using xdmp:node-insert-before I believe in the following way:
xdmp:document-insert('/example.xml', <a><b>bbb</b></a>);
xdmp:node-insert-before(fn:doc('/example.xml')/a/b, <c>ccc</c>);
fn:doc('/example.xml');
(: returns <a><c>ccc</c><b>bbb</b></a> :)
Nodes are immutable, so in-memory mutation can only be done by creating a new copy.
The copy can use the unmodified contained nodes from the original:
declare function local:insert-after(
$prior as node(),
$inserted as node()+
) as element()
{
let $container := $prior/parent::element()
return element {fn:node-name($container)} {
$container/namespace::*,
$container/attribute(),
$prior/preceding-sibling::node(),
$prior,
$inserted,
$prior/following-sibling::node()
}
};
let $a := <a><b>bbb</b></a>
return local:insert-after($a//b, <c>ccc</c>)
Creating a copy in memory and then inserting the copy is faster than inserting and modifying a document in the database.
Depending on how many documents are inserted, the difference could be significant.
There are community libraries for copying with changes, but sometimes it's as easy to write a quick function (recursive where necessary).
You can use below code to insert the element into the XML:
xdmp:node-insert-child(fn:doc('directory URI'),element {fn:QName('http://yournamesapce','elementName') }{$elementValue})
Here we use fn:QName to remove addition of xmlns="" in added node.

XQuery: Fill array the FLWOR way

I have an array and I want to fill it with strings taken from specific XML nodes, like in this pseudocode example:
let $array := array {}
for $child in $collection
where contains(data($child), "Hey")
do $array := array:append($array, data($child))
How would correct code look like to perform such an operation?
So if I have this XML
<root>
<child>Hey</child>
<child>Ho</child>
<child>Hey Ho</child>
</root>
I expect the array to be
array ["Hey", "Hey Ho"]
XQuery is a functional language. As such, variables cannot be reassigned once they have been declared.
The following code should do the trick:
array {
for $child in $collection
where contains(data($child/node1), "Hey")
return $child/node2
}
Please note that the native XQuery data type for values is a sequence. Depending on your use case, maybe you don’t need arrays at all.

Multiple keyword search in xquery with tokenize and match

My attempt to ask this before was apparently too convoluted, trying again!
I am composing a search in Xquery. In one of the fields (title) it should be possible to enter multiple keywords. At the moment only ONE keyword works. When there is more than one there is the error ERROR XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: concat($atomizable-values as xs:anyAtomicType?, ...) xs:string?. Expected cardinality: zero or one, got 2.
In my xquery I am trying to tokenize the keywords by \s and then match them individually. I think this method is probably false but I am not sure what other method to use. I am obviously a beginner!!
Here is the example XML to be searched:
<files>
<file>
<identifier>
<institution>name1</institution>
<idno>signature</idno>
</identifier>
<title>Math is fun</title>
</file>
<file>
<identifier>
<institution>name1</institution>
<idno>signature1</idno>
</identifier>
<title>philosophy of math</title>
</file>
<file>
<identifier>
<institution>name2</institution>
<idno>signature2</idno>
</identifier>
<title>i like cupcakes</title>
</file>
</files>
Here is the Xquery with example input 'math' for the search field title and 'name1' for the search field institution. This works, the search output are the titles 'math is fun' and 'philosophy of math'. What doesn't work is if you change the input ($title) to 'math fun'. Then you get the error message. The desired output is the title 'math is fun'.
xquery version "3.0";
let $institution := 'name1'
let $title := 'math' (:change to 'math fun' and doesn't work anymore, only a single word works:)
let $title-predicate :=
if ($title)
then
if (contains($title, '"'))
then concat("[contains(lower-case(title), '", replace($title, '["]', ''), "')]") (:This works fine:)
else
for $title2 in tokenize($title, '\s') (:HERE IS THE PROBLEM, this only works when the input is a single word, for instance 'math' not 'math fun':)
return
concat("[matches(lower-case(title), '", $title2, "')]")
else ()
let $institution-predicate := if ($institution) then concat('[lower-case(string-join(identifier/institution))', " = '", $institution, "']") else ()
let $eval-string := concat
("doc('/db/Unbenannt.xml')//file",
$institution-predicate,
$title-predicate
)
let $records := util:eval($eval-string)
let $test := count($records)
let $content :=
<inner_container>
<div>
<h2>Search Results</h2>
<ul>
{
for $record in $records
return
<li id="searchList">
<span>{$record//institution/text()}</span> <br/>
<span>{$record//title/text()}</span>
</li>
}
</ul>
</div>
</inner_container>
return
$content
You have to wrap your FLWOR expression with string-join():
string-join(
for $title2 in tokenize($title, '\s')
return
concat("[matches(lower-case(title), '", $title2, "')]")
)
If tokenize($title) returns a sequence of strings, then
for $title2 in tokenize($title, '\s')
return concat("[matches(lower-case(title), '", $title2, "')]")
will also return a sequence of strings
Therefore $title-predicate will be a sequence of strings, and you can't supply a sequence of strings as one of the arguments to concat().
So it's clear what's wrong, but fixing it requires a deeper understanding of your query than I have time to acquire.
I find it hard to believe that the approach of generating a query as a string and then doing dynamic evaluation of that query is really necessary.

How to tidy-up Processing Instructions in Marklogic

I have a content which is neither a valid HTML nor a XML in my legacy database. Considering the fact, it would be difficult to clean the legacy, I want to tidy this up in MarkLogic using xdmp:tidy. I am currently using ML-8.
<sub>
<p>
<???†?>
</p>
</sub>
I'm passing this content to tidy functionality in a way :
declare variable $xml as node() :=
<content>
<![CDATA[<p><???†?></p>]]>
</content>;
xdmp:tidy(xdmp:quote($xml//text()),
<options xmlns="xdmp:tidy">
<assume-xml-procins>yes</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
As a result it returns :
<p>
<? ?†?>
</p>
Now this result is not the valid xml format (I checked it via XML validator) due to which when I try to insert this XML into the MarkLogic it throws an error saying 'MALFORMED BODY | Invalid Processing Instruction names'.
I did some investigation around PIs but not much luck. I could have tried saving the content without PI but this is also not a valid PI too.
That is because what you think is a PI is in fact not a PI.
From W3C:
2.6 Processing Instructions
[Definition: Processing instructions (PIs) allow documents to contain
instructions for applications.]
Processing Instructions
[16] PI ::= '' Char*)))?
'?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' |
'l'))
So the PI name cannot start with ? as in your sample ??†
You probably want to clean up the content before you pass it to tidy.
Like below:
declare variable $xml as node() :=
<content><![CDATA[<p>Hello <???†?>world</p>]]></content>;
declare function local:copy($input as item()*) as item()* {
for $node in $input
return
typeswitch($node)
case text()
return fn:replace($node,"<\?[^>]+\?>","")
case element()
return
element {name($node)} {
(: output each attribute in this element :)
for $att in $node/#*
return
attribute {name($att)} {$att}
,
(: output all the sub-elements of this element recursively :)
for $child in $node
return local:copy($child/node())
}
(: otherwise pass it through. Used for text(), comments, and PIs :)
default return $node
};
xdmp:tidy(local:copy($xml),
<options xmlns="xdmp:tidy">
<assume-xml-procins>no</assume-xml-procins>
<quiet>yes</quiet>
<tidy-mark>no</tidy-mark>
<enclose-text>yes</enclose-text>
<indent>yes</indent>
</options>)
This would do the trick to get rid of all PIs (real and fake PIs)
Regards,
Peter

Resources