How to match space in MarkLogic using CTS functions? - xquery

I need to search for elements that have a space (" ") in their attributes.
For example:
<unit href="http:xxxx/unit/2 ">
Suppose the href attribute above has a trailing space.
I have done this with a FLWOR query, but I need to do it with CTS functions. Any suggestions?
For the FLWOR query I tried this:
let $x := (
for $d in doc()
order by $d//id
return
for $attribute in data($d//@href)
return
if (fn:contains($attribute," ")) then
<td>{(concat( "id = " , $d//id) ,", data =", $attribute)}</td>
else ()
)
return <tr>{$x}</tr>
This is working fine.
With CTS functions I tried:
let $query :=
cts:element-attribute-value-query(xs:QName("methodology"),
xs:QName("href"),
xs:string(" "),
"wildcarded")
let $search := cts:search(doc(), $query)
return fn:count($search)

Your query is looking for " " to be the entire value of the attribute. If you want attributes that merely contain a space, you need to use wildcards. However, whitespace is not indexed except for exact value queries (which by definition are not wildcarded), so you will not get much index support for that query. You will need to run it as a filtered search (the cts:search default, which is what your code above uses) and expect a lot of false positives to be filtered out.
You may be better off creating a string range index on the attribute and running a value match (cts:element-attribute-value-match) against it.
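A minimal sketch of the range-index route, assuming a string range index (codepoint collation) has already been configured on the href attribute of unit elements. The element name comes from the sample in the question, and the wildcard pattern "* *" (any characters, a space, any characters) is an assumption whose handling of leading or trailing spaces is worth verifying on your MarkLogic version:

```xquery
xquery version "1.0-ml";

(: Assumption: a string range index exists on unit/@href.
   Step 1: ask the range index for every distinct href value that
   matches the wildcard pattern "* *", i.e. contains a space. :)
let $hrefs-with-space :=
  cts:element-attribute-value-match(
    xs:QName("unit"), xs:QName("href"), "* *")
return
  if (fn:empty($hrefs-with-space))
  then 0
  else
    (: Step 2: count the documents holding one of those exact values. :)
    fn:count(
      cts:search(fn:doc(),
        cts:element-attribute-range-query(
          xs:QName("unit"), xs:QName("href"), "=", $hrefs-with-space)))
```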

Related

XSLT-style mini transformation in Xquery?

At the moment, in XQuery 3.1 (eXist 4.7), I receive XML fragments like the following from eXist's Lucene full-text search:
let $text :=
<tei:text>
<tei:front>
<tei:div>
<tei:listBibl>
<tei:bibl>There is some</tei:bibl>
<tei:bibl>text in certain elements</tei:bibl>
</tei:listBibl>
</tei:div>
<tei:div>
<tei:listBibl>
<tei:bibl>which are subject <exist:match>to</exist:match> a Lucene search</tei:bibl>
<tei:bibl></tei:bibl>
</tei:listBibl>
</tei:div>
</tei:front>
<tei:body>
<tei:p>and often produces</tei:p>
<tei:p>a hit.</tei:p>
</tei:body>
</tei:text>
Currently I have XQuery send this fragment to an XSLT stylesheet, which transforms it into HTML like this:
<td>...elements which are subject <span class="search-hit">to</span> a Lucene search and often p...
The stylesheet's job is to return 30 characters of text before and after <exist:match/> and to wrap the content of <exist:match/> in a span. There is only one <exist:match/> per transformation.
This all works fine. However, it occurred to me that this is a very small job: effectively a single transformation of one element, the rest being a sort of string-join. I therefore wonder whether it can be done efficiently in XQuery alone.
In trying to do this, I can't find a way to get the string content up to <exist:match/> and the string content after it. My idea, in pseudocode, is to output a result like:
let $textbefore := some function to get the text before <exist:match/>
let $textafter := some function to get the text after <exist:match/>
return <td>...{$textbefore}
<span class="search-hit">
{$text//exist:match/text()}
</span> {$textafter}...</td>
Is this even worth doing in XQuery, compared with the current XQuery -> XSLT pipeline I have?
Many thanks.
I think it can be done as follows:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace tei = "http://example.com/tei";
declare namespace exist = "http://example.com/exist";
declare option output:method 'html';
let $text :=
<tei:text>
<tei:front>
<tei:div>
<tei:listBibl>
<tei:bibl>There is some</tei:bibl>
<tei:bibl>text in certain elements</tei:bibl>
</tei:listBibl>
</tei:div>
<tei:div>
<tei:listBibl>
<tei:bibl>which are subject <exist:match>to</exist:match> a Lucene search</tei:bibl>
<tei:bibl></tei:bibl>
</tei:listBibl>
</tei:div>
</tei:front>
<tei:body>
<tei:p>and often produces</tei:p>
<tei:p>a hit.</tei:p>
</tei:body>
</tei:text>
,
$match := $text//exist:match,
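$text-before-all := normalize-space(string-join($match/preceding::text(), ' ')),
(: keep the last 30 characters: start 29 positions before the end :)
$text-before := substring($text-before-all, string-length($text-before-all) - 29),
$text-after := substring(normalize-space(string-join($match/following::text(), ' ')), 1, 30)
(: whitespace between tags and enclosed expressions is stripped by default,
   so the separating spaces are inserted explicitly with {' '} :)
return
<td>...{$text-before}{' '}<span class="search-hit">{$match/text()}</span>{' '}{$text-after}...</td>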
That is not really much of a query in XQuery either: just some XPath selection plus some possibly expensive string joining and extraction along the preceding and following axes.
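The same logic can also be packaged as a reusable function. This is a sketch only: local:kwic and its $width parameter are hypothetical names, the exist namespace URI is the same placeholder the answer uses (substitute eXist's real one), and the sample paragraph is made up for illustration:

```xquery
declare namespace exist = "http://example.com/exist"; (: placeholder URI, as in the answer :)

(: Hypothetical helper: $width characters of context on each side of the match. :)
declare function local:kwic($match as element(), $width as xs:integer)
  as element(td)
{
  let $before-all := normalize-space(string-join($match/preceding::text(), ' '))
  (: start $width - 1 positions before the end to keep the last $width characters :)
  let $before := substring($before-all, string-length($before-all) - $width + 1)
  let $after :=
    substring(normalize-space(string-join($match/following::text(), ' ')), 1, $width)
  (: boundary whitespace is stripped by default, so spaces are added with {' '} :)
  return
    <td>...{$before}{' '}<span class="search-hit">{string($match)}</span>{' '}{$after}...</td>
};

(: Made-up sample input for illustration. :)
let $doc :=
  <p>which are subject <exist:match>to</exist:match> a Lucene search</p>
return local:kwic($doc//exist:match, 10)
```

With a width of 10, the returned td should have the string value ...re subject to a Lucene s... (the text node inside the match is a descendant, so it is excluded from both the preceding and the following axis).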

Conditionally creating formatted string from elements and attributes in XQuery

I'm trying to convert an xml document into a specific tab separated flat file structure. Most of the elements can be mapped to single columns or concatenated simply using fn:string-join(), but I have some elements where the mapping is more complicated. An example element looks like this:
<record>
<details>
<passports>
<passport country="">0018061/104</passport>
<passport country="UK">0354761445</passport>
<passport country="USA">M001806145</passport>
</passports>
</details>
</record>
and I need to create a column that looks like this:
0018061/104;(UK) 0354761445;(USA) M001806145
That is, if the @country attribute is not empty, its value is put in parentheses; otherwise it is omitted. The element value follows, and the entries are separated by ;.
Here's what I have done so far:
for $record in //record
return concat($record/@uid/string(),
(: ... other columns ... :)
" ", <S>{for $r in //$record/details/passports/passport
return concat("(", $r/@country, ") ", $r, ";")}</S>/string()
,"
")
I'm sure there's an easier way, but this almost does the job - it produces:
() 0018061/104;(UK) 0354761445;(USA) M001806145
Ideally I'd like to know the correct way to do this; otherwise, just removing the empty brackets where @country="" would suffice.
Put an if expression directly inside the concat() in your for loop (I added some newlines for readability; you can of course remove them):
concat(
if ($r/@country != "")
then concat("(", $r/@country, ") ")
else "",
$r,
";"
)
New result of the query:
0018061/104; (UK) 0354761445; (USA) M001806145;
You could also go for an implicit loop
/record/details/passports/passport/string-join(
(
" ",
if (@country != "")
then "(" || @country || ") "
else (),
.
), ""
)
or loop over the results explicitly and still have a cleaner query (if you replace the concatenation operator || with concat(...) calls, the query stays XQuery 1.0 compatible):
for $record in /record/details/passports/passport
return (
" " || (
if ($record/@country != "")
then "(" || $record/@country || ") "
else ()
) || $record
)
Both variants rely on the newlines BaseX inserts between result items; alternatively, you can of course add them explicitly as you did before.
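None of the variants above yields exactly the requested 0018061/104;(UK) 0354761445;(USA) M001806145 (the first leaves a trailing semicolon and extra spaces, the other two drop the semicolons). A minimal sketch that puts ";" only between entries, using the sample record from the question; it uses only XQuery 1.0 functions:

```xquery
(: Sample record from the question. :)
let $record :=
  <record>
    <details>
      <passports>
        <passport country="">0018061/104</passport>
        <passport country="UK">0354761445</passport>
        <passport country="USA">M001806145</passport>
      </passports>
    </details>
  </record>
return string-join(
  for $p in $record/details/passports/passport
  return (
    (: an empty or all-blank @country has a false effective boolean value :)
    if (normalize-space($p/@country))
    then concat("(", $p/@country, ") ", $p)
    else string($p)
  ),
  ";"
)
```

For this record the result is 0018061/104;(UK) 0354761445;(USA) M001806145. Letting string-join() supply the separator avoids having to trim a trailing semicolon afterwards.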

XQuery "flattening" an element

I am extracting data from an XML file and I need to extract a delimited list of sub-elements. I have the following:
for $record in //record
let $person := $record/person/names
return concat($record/@uid/string()
,",", $record/@category/string()
,",", $person/first_name
,",", $person/last_name
,",", $record/details/citizenships
,"
")
The element "citizenships" contains sub-elements called "citizenship", and as the query stands they are all run together into one string, e.g. "UKFrance". I need to keep them in one string but separated, e.g. "UK|France".
Thanks in advance for any help!
fn:string-join($arg1 as xs:string*, $arg2 as xs:string) is what you're looking for here.
In your currently desired usage, that would look something like the following:
fn:string-join($record/details/citizenships/citizenship, "|")
Testing outside your document, with:
fn:string-join(("UK", "France"), "|")
...returns:
UK|France
Note that ("UK", "France") is a sequence of strings, just as a query returning multiple citizenships is a sequence; each of its items is converted to its string value when passed to fn:string-join(), whose first argument is typed as a sequence of strings.
Consider the following (simplified) query:
declare context item := document { <root>
<record uid="1">
<person>
<citizenships>
<citizenship>France</citizenship>
<citizenship>UK</citizenship>
</citizenships>
</person>
</record>
</root> };
for $record in //record
return concat(fn:string-join($record//citizenship, "|"), "
")
...and its output:
France|UK
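Applied to the full query from the question, the change is a single argument of concat(). A sketch with a made-up sample record whose element and attribute names follow the question:

```xquery
(: Hypothetical sample data, shaped after the names used in the question. :)
let $root :=
  <root>
    <record uid="1" category="A">
      <person>
        <names><first_name>Jane</first_name><last_name>Doe</last_name></names>
      </person>
      <details>
        <citizenships>
          <citizenship>UK</citizenship>
          <citizenship>France</citizenship>
        </citizenships>
      </details>
    </record>
  </root>
for $record in $root/record
let $person := $record/person/names
return concat($record/@uid
  ,",", $record/@category
  ,",", $person/first_name
  ,",", $person/last_name
  (: join the citizenship children with "|" instead of running them together :)
  ,",", string-join($record/details/citizenships/citizenship, "|"))
```

For the sample record this returns 1,A,Jane,Doe,UK|France.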

Xquery group by on 2 tags

Below is the XML part of my data.
<A>
<a><Type>Fruit</Type><Name>Banana</Name></a>
<a><Type>Fruit</Type><Name>Orange</Name></a>
<a><Type>Fruit</Type><Name>Apple</Name></a>
<a><Type>Fruit</Type><Name>Lemon</Name></a>
<a><Type>Cars</Type><Name>Toyota</Name></a>
<a><Type>Cars</Type><Name>Lamborghini</Name></a>
<a><Type>Cars</Type><Name>Renault</Name></a>
</A>
The output should be:
<a>Fruits-Banana,Orange,Apple,Lemon</a>
<a>Cars-Toyota,Lamborghini,Renault</a>
I have tried to get the required output, all in vain. I also tried a group by clause, but I get errors.
Any help?
let $x:=
<A>
<a><Type>Fruit</Type><Name>Banana</Name></a>
<a><Type>Fruit</Type><Name>Orange</Name></a>
<a><Type>Fruit</Type><Name>Apple</Name></a>
<a><Type>Fruit</Type><Name>Lemon</Name></a>
<a><Type>Cars</Type><Name>Toyota</Name></a>
<a><Type>Cars</Type><Name>Lamborghini</Name></a>
<a><Type>Cars</Type><Name>Renault</Name></a>
</A>
for $z in distinct-values($x//a/Type)
let $c := $x//a[Type=$z]/Name
return
<a>{concat($z, "-", string-join($c, ","))}</a>
The for clause iterates over the distinct values of the Type elements; for each value, the let clause selects the Name elements of all a elements with that Type.
Then concat() joins the Type text, a hyphen, and the string produced by string-join(), which combines the Name values separated by commas.
HTH :)
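Since the question mentions trying group by: with an XQuery 3.0 processor the same result can be sketched with a group by clause on the question's sample data. This is an alternative to the distinct-values() answer above, not a replacement for it:

```xquery
xquery version "3.0";

let $x :=
  <A>
    <a><Type>Fruit</Type><Name>Banana</Name></a>
    <a><Type>Fruit</Type><Name>Orange</Name></a>
    <a><Type>Fruit</Type><Name>Apple</Name></a>
    <a><Type>Fruit</Type><Name>Lemon</Name></a>
    <a><Type>Cars</Type><Name>Toyota</Name></a>
    <a><Type>Cars</Type><Name>Lamborghini</Name></a>
    <a><Type>Cars</Type><Name>Renault</Name></a>
  </A>
for $item in $x/a
(: string() gives a single atomic grouping key per item :)
group by $type := string($item/Type)
(: after grouping, $item is bound to all a elements of the current group :)
return <a>{concat($type, "-", string-join($item/Name, ","))}</a>
```

Because $item/Name is a path expression, the names within each group come out in document order; the order of the groups themselves is implementation-dependent unless you add an order by clause.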

xQuery substring problem

I now have a full path for a file as a string like:
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml"
Now I need to extract only the folder path, i.e. the string above without the part after the last slash:
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/"
But the substring() function in XQuery only has the forms substring(string, start, len) and substring(string, start), and I can't figure out how to locate the last occurrence of the slash.
Could experts help? Thanks!
Try the tokenize() function (which splits a string into its component parts), then reassemble everything but the last part.
let $full-path := "/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml",
$segments := tokenize($full-path,"/")[position() ne last()]
return
concat(string-join($segments,'/'),'/')
For more details on these functions, check out their reference pages:
fn:tokenize()
fn:string-join()
fn:replace can do the job with a regular expression:
replace("/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml",
"[^/]+$",
"")
This can be done even with a single XPath 2.0 (subset of XQuery) expression:
substring($fullPath,
1,
string-length($fullPath) - string-length(tokenize($fullPath, '/')[last()])
)
where $fullPath should be substituted with the actual string, such as:
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml"
The following code tokenizes, removes the last token, replaces it with an empty string, and joins back.
string-join(
(
tokenize(
"/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml",
"/"
)[position() ne last()],
""
),
"/"
)
It seems to return the desired result on try.zorba-xquery.com. Does this help?
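A minimal sketch wrapping the fn:replace approach in a function; local:folder-path is a hypothetical name:

```xquery
(: Hypothetical helper: drop everything after the last "/" and keep the slash. :)
declare function local:folder-path($path as xs:string) as xs:string
{
  (: "[^/]+$" matches the run of non-slash characters at the end of the string :)
  replace($path, "[^/]+$", "")
};

local:folder-path(
  "/db/Liebherr/Content_Repository/Techpubs/Topics/HyraulicPowerDistribution/Released/TRN_282C_HYD_MOD_1_Drive_Shaft_Rev000.xml"
)
```

The approaches above differ on a path that contains no slash at all: this version returns an empty string, while the tokenize()/concat() version returns "/".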
