I ran into a problem with the functx module when dealing with strings that contain end-of-line characters. I would expect the following test to pass:
declare %unit:test function test:substring-before-last() {
let $title := 'Something
blah other'
let $expected-title := 'Something
blah'
return unit:assert-equals(functx:substring-before-last($title, ' other'),
$expected-title)
};
However, it fails with:
"Something
blah" expected, "Something
blah other"
returned.
Removing the line break makes the test pass. What am I missing? :)
BR
I think the issue is in the definition or implementation of the functx function http://www.xqueryfunctions.com/xq/functx_substring-before-last.html:
declare function functx:substring-before-last
( $arg as xs:string? ,
$delim as xs:string ) as xs:string {
if (matches($arg, functx:escape-for-regex($delim)))
then replace($arg,
concat('^(.*)', functx:escape-for-regex($delim),'.*'),
'$1')
else ''
} ;
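The cause is that the regular expression dot . does not match a newline by default, so the pattern never matches the two-line string, combined with the default behaviour of replace: "If the input string contains no substring that matches the regular expression, the result of the function is a single string identical to the input string." If you add the flags argument 'm' (multi-line mode, in which ^ matches at the start of every line):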
declare function functx:substring-before-last
( $arg as xs:string? ,
$delim as xs:string ) as xs:string {
if (matches($arg, functx:escape-for-regex($delim)))
then replace($arg,
concat('^(.*)', functx:escape-for-regex($delim),'.*'),
'$1', 'm')
else ''
} ;
you get the expected match and replacement, and the comparison succeeds.
Unfortunately, this does not work reliably either. For example, when the word 'end' occurs on two lines, it is removed from both lines, not just from the last one.
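One possible alternative, offered as a minimal sketch and not as the official functx fix, is the 's' (dot-all) flag instead of 'm'. With 's' the dot also matches newlines, while ^ still anchors only at the start of the whole string, so the greedy (.*) runs up to the last occurrence of the delimiter. The two local: functions are hypothetical copies of the functx definitions, declared here only to keep the example self-contained.

```xquery
xquery version "3.0";

(: Hypothetical local copy of functx:escape-for-regex :)
declare function local:escape-for-regex($arg as xs:string?) as xs:string {
  replace($arg, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')
};

(: Same logic as functx:substring-before-last, but with the 's' flag
   so that . also matches line breaks :)
declare function local:substring-before-last(
  $arg as xs:string?,
  $delim as xs:string
) as xs:string {
  if (matches($arg, local:escape-for-regex($delim)))
  then replace($arg,
               concat('^(.*)', local:escape-for-regex($delim), '.*'),
               '$1', 's')
  else ''
};

(: &#10; is a line feed; the result is 'Something', a line feed, then 'blah' :)
local:substring-before-last('Something&#10;blah other', ' other')
```

Because the pattern can only match once, starting at the beginning of the string, an earlier occurrence of the delimiter on another line is left untouched.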
I'm trying to implement base64 encoding in a very simple way. In my approach (let's set aside for a moment whether it's appropriate or not) I need to reverse strings and then concatenate them. After that, the concatenated string is passed to the substring function. The strings are joined properly, but when I use substring, BaseX seems to lose track of it.
The funny thing is that substring works well for all lengths starting at 8. So substring($string, 1, 8) and higher gives correct output, but everything below that is messed up, starting with one missing digit: substring($string, 1, 7) (and below) results in a string of length 6.
Moreover, substring can only start at index 0 or 1. Anything greater returns an empty result.
declare variable $array := [];
declare function bs:encode
( $input as xs:string ) {
bs:integer-to-binary(string-to-codepoints($input), "", $array)
} ;
declare function bs:integer-to-binary
( $input as xs:integer*, $string as xs:string, $array as array(xs:string) ) {
let $strings :=
for $i in $input
return
if ($i != 0)
then if ($i mod 2 = 0)
then bs:integer-to-binary(xs:integer($i div 2), concat($string, 0), $array)
else bs:integer-to-binary(xs:integer($i div 2), concat($string, 1), $array)
else if ($i <= 0)
then array:append($array, $string)
return bs:check-if-eight($strings)
} ;
declare function bs:check-if-eight
( $strings as item()+ ) {
let $fullBinary :=
for $string in $strings
return if (string-length($string) < 8)
then bs:check-if-eight(concat($string, 0))
else $string (: add as private below :)
return bs:concat-strings($fullBinary)
} ;
declare function bs:concat-strings
( $strings as item()+ ) {
let $firstStringToConcat := functx:reverse-string($strings[position() = 1])
let $secondStringToConcat := functx:reverse-string($strings[position() = 2])
let $thirdStringToConcat := functx:reverse-string($strings[position() = 3])
let $concat :=
concat
($firstStringToConcat,
$secondStringToConcat,
$thirdStringToConcat)
(: this returns correct string of binary value for Cat word :)
return bs:divide-into-six($concat)
} ;
declare function bs:divide-into-six
( $binaryString as xs:string) {
let $sixBitString := substring($binaryString, 1, 6)
(: this should return 010000 instead i get 000100 which is not even in $binaryString at all :)
return $sixBitString
} ;
bs:encode("Cat")
I expect the first six characters of the string (010000), but instead I get what looks like a random sequence (00100). The whole module is meant to encode strings into base64, but for now (the part I uploaded) it should just return the first six bits for 'C'.
Alright, so I think I figured it out.
First of all, in the function concat-strings I changed concat to fn:string-join. That allowed me to pass a separator that is placed between the joined strings.
declare function bs:concat-strings ( $strings as item()+ ) {
let $firstStringToConcat := xs:string(functx:reverse-string($strings[position() = 1]))
let $secondStringToConcat := xs:string(functx:reverse-string($strings[position() = 2]))
let $thirdStringToConcat := xs:string(functx:reverse-string($strings[position() = 3]))
let $concat :=
fn:string-join(
($firstStringToConcat,
$secondStringToConcat,
$thirdStringToConcat), 'X')
return bs:divide-into-six($concat) } ;
I saw that my input looked like this:
XXXXXXXX01000011XXXXXXXXXXXXXXXXX01100001XXXXXXXXXXXXXXXXX01110100XXXXXXXX
Obviously it had to be looping somewhere without an explicit for loop, and as I am a novice in XQuery I must have missed it. And indeed, I found it in the check-if-eight function:
> declare function bs:check-if-eight ( $strings as item()+ ) {
> **let $fullBinary :=**
> for $string in $strings
> return if (string-length($string) < 8)
> then bs:check-if-eight(concat($string, 0))
> else $string (: add as private below :)
> **return bs:concat-strings($fullBinary)** } ;
Despite being above the for keyword, the $fullBinary variable was evaluated in a loop and produced empty spaces(?), which showed up clearly when I used X as a separator.
DISCLAIMER: I had thought about this before and used functx:trim, but for some reason it doesn't work as I expected. So it might not work for you either if you have a similar issue.
At this point it was clear that let $fullBinary cannot be bound in the FLWOR expression, or at least cannot trigger the concat-strings function from there. I changed it and now it produces only one string. I'm now trying to work out a new call sequence for the whole module, but I think the main problem is solved.
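A way to avoid the accidental repetition described above is to keep the per-character conversion in small functions that return a value and do not call the next stage themselves, and to join the results only once at the end. The following is a minimal sketch of that idea, not the module from the question; the function names local:to-binary and local:to-byte are made up for this example, and it assumes every codepoint is below 256.

```xquery
xquery version "3.0";

(: Binary digits of a non-negative integer, most significant bit first :)
declare function local:to-binary($n as xs:integer) as xs:string {
  if ($n lt 2)
  then string($n)
  else concat(local:to-binary($n idiv 2), $n mod 2)
};

(: Left-pad the binary digits with zeros to eight characters :)
declare function local:to-byte($n as xs:integer) as xs:string {
  let $bits := local:to-binary($n)
  let $pad  := string-join(for $i in 1 to 8 - string-length($bits) return '0', '')
  return concat($pad, $bits)
};

(: Convert each character once, join once, then take the first six bits :)
let $bits := string-join(
  for $cp in string-to-codepoints('Cat') return local:to-byte($cp),
  '')
return substring($bits, 1, 6)  (: '010000' for 'C' (codepoint 67) :)
```

Since the digits are produced most significant bit first, no string reversal is needed in this variant.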
I want to pad a string with whitespace to a certain length in XQuery on the OSB platform.
I tried string-join and concat, but neither of them pads with whitespace, as they seem to treat it as an empty string.
Sample input:
<root-element xmlns="">
<string-to-pad>abc</string-to-pad>
</root-element>
**Expected output:**
<root-element>
<paddedString>abc </paddedString>
</root-element>
Yes, there is not much to say without a code sample. This is how the functx library solves your problem in XQuery. Either import it as a module (its URI is stable), or search for the function name.
declare namespace functx = "http://www.functx.com";
declare function functx:pad-string-to-length
( $stringToPad as xs:string? ,
$padChar as xs:string ,
$length as xs:integer ) as xs:string {
substring(
string-join (
($stringToPad, for $i in (1 to $length) return $padChar)
,'')
,1,$length)
} ;
See this fiddle: http://xqueryfiddle.liberty-development.net/jyyiVhe/2
It will generate the desired output, but Oracle JDeveloper will not display it with proper spacing.
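For illustration, here is a sketch of how the function might be applied to the sample input from the question. The target width of 10 is an assumption, since the question does not state the desired length, and the input is inlined as a variable to keep the example self-contained.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

(: Copy of the functx function shown above :)
declare function functx:pad-string-to-length(
  $stringToPad as xs:string?,
  $padChar as xs:string,
  $length as xs:integer
) as xs:string {
  substring(
    string-join(($stringToPad, for $i in (1 to $length) return $padChar), ''),
    1, $length)
};

let $input :=
  <root-element>
    <string-to-pad>abc</string-to-pad>
  </root-element>
(: Assumed width of 10: 'abc' followed by seven spaces :)
let $padded := functx:pad-string-to-length(string($input/string-to-pad), ' ', 10)
return
  <root-element>
    <paddedString>{$padded}</paddedString>
  </root-element>
```

If a tool collapses the trailing spaces on display, string-length($padded) can be used to confirm that the value really has the requested length.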
My question seems trivial, but I could not figure out how to parse a string that contains a list of dates separated by commas. Parsing the individual dates is not the issue; the empty values are. The trouble is that the order of the dates is significant and some dates can be omitted. The dates are expected to be formatted as YYYY-MM-DD.
So, the following are valid inputs and expected return values:
,2000-12-12,2012-05-03, ➔ ( NULL, 2000-12-12, 2012-05-03, NULL )
2000-12-12,,2012-05-03 ➔ ( 2000-12-12, NULL, 2012-05-03 )
And here is my function signature
declare function local:assert-date-array-param(
$input as xs:string
, $accept-nulls as xs:boolean?
) as xs:date*
I recognised the problem after realizing that XQuery seems to have no equivalent of NULL that could serve as a placeholder for omitted dates in the returned values, at least if you want to return a sequence, because empty sequences nested inside sequences are flattened to nothing.
I suppose my fallback would be to use a date like 1900-01-01 as the placeholder, or to return a map instead of a sequence, but I sure hope to find a more elegant way.
Thank you,
K.
PS. I am working with MarkLogic v8 (and v9 soon) and any solution should execute with their XQuery processor.
UPDATE: thanks for both answers. In the end I chose to go with a placeholder date, as XQuery works so nicely with sequences and anything else would have required changes in other places. But the problem remains for cases where the required return values are numeric; there, placeholder values would probably not be feasible. A null literal for xs:anyAtomicType would have solved the problem nicely, but alas.
You could consider returning a json:array(), or an array-node{} with null-node{}'s inside. But perhaps a null-date placeholder is not as bad as it sounds:
declare variable $null-date := xs:date("0001-01-01");
declare function local:assert-date-array-param(
$input as xs:string,
$accept-nulls as xs:boolean?
) as xs:date*
{
for $d in fn:tokenize($input, "\s*,\s*")
return
if ($d eq "") then
if ($accept-nulls) then
$null-date
else
fn:error(xs:QName("NULL-NOT-ALLOWED"), "Date is required")
else
if ($d castable as xs:date) then
xs:date($d)
else if ($d castable as xs:dateTime) then
xs:date(xs:dateTime($d))
else
fn:error(xs:QName("INVALID-DATE"), "Invalid date format: " || $d)
};
declare function local:print-date-array($dates) {
string-join(for $d in $dates return if ($d eq $null-date) then "NULL" else fn:string($d), ", ")
};
local:print-date-array(
local:assert-date-array-param(",2000-12-12,2012-05-03,", fn:true())
),
local:print-date-array(
local:assert-date-array-param("2000-12-12,,2012-05-03", fn:true())
)
HTH!
There are multiple options, in addition to the above.
Return a sequence of functions which, when invoked, return dates:
for $i in string-tokenize-to-sequence-of-strings()
let $dt := my-parse-date($i)
return function() { $dt }
or
return function() { my-parse-date($i) }
Return tokenized and validated, but not parsed, strings. Use "" for 'not valid', e.g.:
( "2014-01-22","","2017-03-30","" )
Then there are arrays, maps, arrays of maps, and ... XML:
parseFunction() as xs:element()*:
for ... return <date>{ parse-and-validate($value) } </date>
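The sequence-of-functions option can be sketched as follows. This is only an illustration under assumptions: it requires a processor with XQuery 3.0 inline functions, the name local:parse-dates is made up, and tokens that are not castable to xs:date (including empty ones) are silently treated as missing, with no error raised.

```xquery
xquery version "3.0";

(: Returns one zero-argument function per comma-separated token.
   Each function yields the parsed date, or the empty sequence for a
   missing or invalid token, so positions are preserved. :)
declare function local:parse-dates($input as xs:string) as function(*)* {
  for $token in tokenize($input, ',')
  let $t := normalize-space($token)
  let $date := if ($t castable as xs:date) then xs:date($t) else ()
  return function() as xs:date? { $date }
};

(: Yields 'NULL', '2000-12-12', '2012-05-03', 'NULL' :)
for $f in local:parse-dates(',2000-12-12,2012-05-03,')
let $d := $f()
return if (empty($d)) then 'NULL' else string($d)
```

Wrapping each value in a function item is what stops the empty results from being flattened away, at the cost of callers having to invoke every item.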
My attempt to ask this before was apparently too convoluted, trying again!
I am composing a search in Xquery. In one of the fields (title) it should be possible to enter multiple keywords. At the moment only ONE keyword works. When there is more than one there is the error ERROR XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: concat($atomizable-values as xs:anyAtomicType?, ...) xs:string?. Expected cardinality: zero or one, got 2.
In my XQuery I am trying to tokenize the keywords by \s and then match them individually. I think this method is probably wrong, but I am not sure what other method to use. I am obviously a beginner!
Here is the example XML to be searched:
<files>
<file>
<identifier>
<institution>name1</institution>
<idno>signature</idno>
</identifier>
<title>Math is fun</title>
</file>
<file>
<identifier>
<institution>name1</institution>
<idno>signature1</idno>
</identifier>
<title>philosophy of math</title>
</file>
<file>
<identifier>
<institution>name2</institution>
<idno>signature2</idno>
</identifier>
<title>i like cupcakes</title>
</file>
</files>
Here is the XQuery with the example input 'math' for the search field title and 'name1' for the search field institution. This works: the search returns the titles 'Math is fun' and 'philosophy of math'. What doesn't work is changing the input ($title) to 'math fun'; then you get the error message. The desired output is the title 'Math is fun'.
xquery version "3.0";
let $institution := 'name1'
let $title := 'math' (:change to 'math fun' and doesn't work anymore, only a single word works:)
let $title-predicate :=
if ($title)
then
if (contains($title, '"'))
then concat("[contains(lower-case(title), '", replace($title, '["]', ''), "')]") (:This works fine:)
else
for $title2 in tokenize($title, '\s') (:HERE IS THE PROBLEM, this only works when the input is a single word, for instance 'math' not 'math fun':)
return
concat("[matches(lower-case(title), '", $title2, "')]")
else ()
let $institution-predicate := if ($institution) then concat('[lower-case(string-join(identifier/institution))', " = '", $institution, "']") else ()
let $eval-string := concat
("doc('/db/Unbenannt.xml')//file",
$institution-predicate,
$title-predicate
)
let $records := util:eval($eval-string)
let $test := count($records)
let $content :=
<inner_container>
<div>
<h2>Search Results</h2>
<ul>
{
for $record in $records
return
<li id="searchList">
<span>{$record//institution/text()}</span> <br/>
<span>{$record//title/text()}</span>
</li>
}
</ul>
</div>
</inner_container>
return
$content
You have to wrap your FLWOR expression with string-join():
string-join(
for $title2 in tokenize($title, '\s')
return
concat("[matches(lower-case(title), '", $title2, "')]")
)
If tokenize($title) returns a sequence of strings, then
for $title2 in tokenize($title, '\s')
return concat("[matches(lower-case(title), '", $title2, "')]")
will also return a sequence of strings
Therefore $title-predicate will be a sequence of strings, and you can't supply a sequence of strings as one of the arguments to concat().
So it's clear what's wrong, but fixing it requires a deeper understanding of your query than I have time to acquire.
I find it hard to believe that the approach of generating a query as a string and then doing dynamic evaluation of that query is really necessary.
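As a rough illustration of that last point, the same search can probably be written with ordinary predicates and a quantified expression, with no string-built query and no util:eval. This sketch makes some assumptions: the sample data is inlined in place of doc('/db/Unbenannt.xml'), contains() is used instead of matches() so that keywords are not interpreted as regular expressions, and the quoted-phrase branch of the original is left out.

```xquery
xquery version "3.0";

let $files :=
  <files>
    <file>
      <identifier><institution>name1</institution><idno>signature</idno></identifier>
      <title>Math is fun</title>
    </file>
    <file>
      <identifier><institution>name1</institution><idno>signature1</idno></identifier>
      <title>philosophy of math</title>
    </file>
    <file>
      <identifier><institution>name2</institution><idno>signature2</idno></identifier>
      <title>i like cupcakes</title>
    </file>
  </files>
let $institution := 'name1'
let $title := 'math fun'
(: One lower-cased keyword per whitespace-separated word :)
let $keywords := tokenize(lower-case(normalize-space($title)), '\s+')
return
  $files/file
    (: An empty $institution disables this filter :)
    [not($institution)
       or lower-case(string-join(identifier/institution, '')) = lower-case($institution)]
    (: Every keyword must occur somewhere in the title :)
    [every $kw in $keywords satisfies contains(lower-case(title), $kw)]
    /title/string()
```

With the input 'math fun' this selects only 'Math is fun'; with 'math' it selects both name1 titles.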
I'm trying to generate a tree view from a collection (file system). Unfortunately, some files have special characters like ü, ä and ö in their names, and I'd like to have them HTML-encoded, e.g. ä as &auml;.
When I get them from the variable, they are URL-encoded. First I decode them to UTF-8 and then... I don't know how to go further.
<li>{util:unescape-uri($child, "UTF-8")}
The function util:parse does the exact opposite of what I want.
Here is the recursive function:
xquery version "3.0";
declare namespace ls="ls";
declare option exist:serialize "method=html media-type=text/html omit-xml-declaration=yes indent=yes";
declare function ls:ls($collection as xs:string, $subPath as xs:string) as element()* {
if (xmldb:collection-available($collection)) then
(
for $child in xmldb:get-child-collections($collection)
let $path := concat($collection, '/', $child)
let $sPath := concat($subPath, '/', $child)
order by $child
return
<li>{util:unescape-uri($child, "UTF-8")}
<ul>
{ls:ls($path,$sPath)}
</ul>
</li>,
for $child in xmldb:get-child-resources($collection)
let $sPath := concat($subPath, '/', $child)
order by $child
return
<li> {util:unescape-uri($child, "UTF-8")}</li>
)
else ()
};
let $collection := request:get-parameter('coll', '/db/apps/ebner-online/resources/xss/xml')
return
<ul>{ls:ls($collection,"")}</ul>
Rather than util:unescape-uri(), I would suggest using xmldb:encode-uri() and xmldb:decode-uri(). Use the encode version on a collection or document name when creating/storing it. Use the decode version when displaying the collection or document name. See the function documentation for the xmldb module.
As to forcing an entity such as &auml; instead of the literal ä, this is an even trickier serialization issue. Both, along with the numeric character reference &#228;, are equivalent representations of the same character. Why not just let the character through as ä?