Numbers in cts:word-query in MarkLogic - XQuery

I have a cts:word-query whose text value is a number:
cts:search(fn:doc(),cts:word-query("226"))
This query only matches documents that contain 226. But I also need to get the documents that contain 00226.
Example:
This is abc.xml
<a>
<b>00226</b>
</a>
This is abc1.xml
<a>
<b>226</b>
</a>
If I use the query cts:search(fn:doc(),cts:word-query("226")), it returns only abc1.xml; if I use cts:search(fn:doc(),cts:word-query("00226")), it returns only abc.xml.
But I need to get both the documents, irrespective of leading zeros.

The simplest way is to use a wildcard character (*) and add the "wildcarded" option:
cts:search(fn:doc(),cts:word-query("*226", ('wildcarded')))
EDIT:
Although this matches the example documents, as Kishan points out in the comments, the wildcard also matches unwanted documents (e.g. containing "226226").
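To see why the wildcard over-matches, here is a rough Python illustration. It uses fnmatch.fnmatchcase as a stand-in for MarkLogic's wildcard matching; treating the two as equivalent for a simple pattern like *226 is an assumption, and the candidate words are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical words that might appear in <b> elements
candidates = ["226", "00226", "226226", "1226"]

# "*" matches zero or more characters, so every word ending in 226 matches,
# including the unwanted "226226" and "1226"
wildcard_hits = [w for w in candidates if fnmatchcase(w, "*226")]
print(wildcard_hits)  # ['226', '00226', '226226', '1226']
```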
Since range indexes are not an option in this case because the data is mixed, here is an alternative hack:
cts:search(
fn:doc(),
cts:word-query(
for $lead in ('', '0', '00', '000')
return $lead || "226"))
Obviously, this depends on how many leading zeros there can be and will only work if this is known and limited.
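The hack simply enumerates every allowed zero-padded spelling, which is what the ('', '0', '00', '000') sequence does. A minimal Python sketch of that enumeration; the helper name and the max_leading_zeros bound are hypothetical and must match what your data can actually contain:

```python
def zero_padded_variants(number: str, max_leading_zeros: int) -> list[str]:
    """Return number with 0..max_leading_zeros leading zeros (hypothetical helper)."""
    return ["0" * k + number for k in range(max_leading_zeros + 1)]

variants = zero_padded_variants("226", 3)
print(variants)  # ['226', '0226', '00226', '000226']

# Exact matching against the list avoids the wildcard's false positives
words = ["226", "00226", "226226", "1226"]
exact_hits = [w for w in words if w in variants]
print(exact_hits)  # ['226', '00226']
```

Anything padded beyond the bound (for example "0000226" with a bound of 3) is still missed, which is the limitation noted above.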

You can add an element range index on the <b> element in the database with scalar type int or long. Then the following query should return both documents:
let $query := cts:element-range-query(xs:QName("b"),"=",00226)
return cts:search(fn:doc(),$query)
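The range-index approach works because the index stores typed values rather than words: with an int index, "00226" and "226" are both cast to the integer 226. A rough Python analogue of that casting (a sketch, not MarkLogic's implementation) also shows why mixed, non-numeric content is a problem for a numeric index:

```python
def as_int_or_none(text: str):
    # Cast the way a numeric index would; a non-numeric value has no integer form
    try:
        return int(text)
    except ValueError:
        return None

values = ["00226", "226", "226226", "abc"]
range_hits = [v for v in values if as_int_or_none(v) == 226]
print(range_hits)             # ['00226', '226']
print(as_int_or_none("abc"))  # None
```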

Related

Custom sorting issue in MarkLogic?

xquery version "1.0-ml";
declare function local:sortit(){
  for $i in ('a','e','f','b','d','c')
  order by $i
  return
    element Result{
      element N{1},
      element File{$i}
    }
};
local:sortit()
The code above is a sample; I need the data in this format. This sorting function is used in multiple places: in some places I need only the N elements, and in others only the File elements.
But as soon as I use local:sortit()//File, the sort order is lost and the output comes back in an arbitrary order. Please let me know the best way to handle this.
All the data in the File elements is calculated and comes from multiple files; after all the joins and calculations, it is built as XML with many elements. So sorting with an index is not possible here; only an order by clause can be used.
The results of an XPath path expression are always returned in document order, and the relative order of separately constructed elements (each Result here is its own tree) is stable but implementation-dependent.
You lose the sorting when you apply an XPath to the sequence returned from that function call.
If you want to select only the File elements in sorted order, use the simple map operator ! and select the File element from each item as the sequence is mapped:
local:sortit() ! File
Or, if you like typing, you can use a FLWOR to iterate over the sequence and return the File:
for $result in local:sortit()
return $result/File
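A small Python model of the difference, under stated assumptions: each Result is a dict standing in for a separately constructed element tree, and a seeded shuffle stands in for the implementation-dependent relative order of distinct trees (the XDM specification only guarantees that this order is stable, not that it follows construction order).

```python
import random

def sortit():
    # Mirrors local:sortit(): one Result per letter, built in "order by" order
    return [{"N": 1, "File": f} for f in sorted(["a", "e", "f", "b", "d", "c"])]

results = sortit()

# Simple map (local:sortit() ! File): evaluated item by item, keeping sequence order
mapped = [r["File"] for r in results]

# Path step (local:sortit()//File): results come back in document order.
# Hypothetical tree order: a seeded shuffle models the implementation-dependent order
tree_order = list(range(len(results)))
random.Random(42).shuffle(tree_order)
by_document_order = [r["File"] for _, r in sorted(zip(tree_order, results), key=lambda p: p[0])]

print(mapped)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(by_document_order)
```

The same File values come back either way; only the map keeps the order produced by the order by clause.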

Splitting up a particular field in XQuery

I have an input field coming as
<BID>12-ABS-65789345</BID>
I need to adjust the XQuery so that it captures only the last part of the field, i.e. the part after the second - symbol.
In the above case, I need the output of the XQuery to be:
<BID>65789345</BID>
Any help is appreciated.
Thanks
Assuming your requirement can be interpreted as taking the content after the "last" hyphen, you can take the last item in the sequence formed by splitting the string on hyphen:
let $x := <BID>12-ABS-65789345</BID>
return
<BID>{tokenize($x,'-')[last()]}</BID>
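If you always need the content after the second hyphen, and you can guarantee there will always be at least two hyphens, then you can take the third item after splitting the string (if there can be more than two hyphens, note that this returns only the segment between the second and third hyphen):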
let $x := <BID>12-ABS-65789345</BID>
return
<BID>{tokenize($x,'-')[3]}</BID>
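A Python sketch of the same two approaches, useful for checking where they diverge. The second input, "12-ABS-657-89", is made up to show a value with more than two hyphens; note that Python indexes from 0 while XQuery's [3] is 1-based.

```python
def after_hyphens(bid: str) -> dict:
    parts = bid.split("-")
    return {
        "last": parts[-1],                     # like tokenize($x,'-')[last()]
        "third": parts[2],                     # like tokenize($x,'-')[3]
        "after_second": bid.split("-", 2)[2],  # everything after the second hyphen
    }

print(after_hyphens("12-ABS-65789345"))
print(after_hyphens("12-ABS-657-89"))
```

With exactly two hyphens all three agree ("65789345"); with more, they return "89", "657" and "657-89" respectively.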

PostgreSQL query to select a value in an array by index

My data consists of strings like:
'湯姆 is a boy.'
or '梅isagirl.'
or '約翰,is,a,boy.'.
I want to split each string and keep only the Chinese name.
In R, I can use the commands
tmp=strsplit(string,"[A-z% ]")
unlist(lapply(tmp,function(x)x[1]))
and then get the Chinese name I want.
But in PostgreSQL
select regexp_split_to_array(string,'[A-z% ]') from db.table
I get an array like {'湯姆','','',''},{'梅','','',''},...
and I don't know how to select an item from the array.
I tried the command
select regexp_split_to_array(string,'[A-z% ]')[1] from db.table
and I get an error.
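The error comes from the subscript syntax: PostgreSQL only lets you index the result of a function call when the call is wrapped in parentheses, e.g. (regexp_split_to_array(string, '[A-z% ]'))[1]. But I don't think regexp_split_to_array is the appropriate function for what you are trying to do here. Instead, use regexp_replace to selectively remove all ASCII characters: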
SELECT string, regexp_replace(string, '[[:ascii:]~:;,"]+', '', 'g') AS name
FROM yourTable;
Note that you might have to adjust the set of characters to be removed, depending on what other non-Chinese characters you expect in the string column. This answer gives you a general suggestion for how you might proceed here.
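For comparison, a Python sketch of both ideas on the sample strings: taking the first piece of a split leaves stray punctuation such as the comma in '約翰,', while deleting every run of ASCII characters (code points 0 to 127) leaves just the name. The helper names are illustrative, and Python's re module stands in for PostgreSQL's regex engine.

```python
import re

samples = ["湯姆 is a boy.", "梅isagirl.", "約翰,is,a,boy."]

def first_split_piece(s: str) -> str:
    # Mirrors the split approach: split on [A-z% ] and keep the first piece
    return re.split(r"[A-z% ]", s)[0]

def strip_ascii(s: str) -> str:
    # Mirrors regexp_replace(..., '', 'g'): remove every run of ASCII characters
    return re.sub(r"[\x00-\x7f]+", "", s)

print([first_split_piece(s) for s in samples])  # ['湯姆', '梅', '約翰,']
print([strip_ascii(s) for s in samples])        # ['湯姆', '梅', '約翰']
```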

An XDMP-NOTANODE error using XQuery in MarkLogic

I'm getting the XDMP-NOTANODE error when I try to run an XQuery in MarkLogic. When I loaded my XML documents, I loaded metadata files along with them. I'm a student and I don't have experience with XQuery.
error:
[1.0-ml] XDMP-NOTANODE: (err:XPTY0019) $article/article/front/article-meta/title-group/article-title -- xs:untypedAtomic("
") is not a node
Stack Trace
At line 3 column 77:
In xdmp:eval("(for $article in fn:distinct-values(/article/text()) &#1...", (), <options xmlns="xdmp:eval"><database>4206169969988859108</database> <root>C:\mls-projects\pu...</options>)
$article := xs:untypedAtomic("
")
1. (for $article in fn:distinct-values(/article/text())
2.
3. return (fn:distinct-values($article/article/front/article-meta/title-group/article-title)
4.
5.
Code:
(
for $article in fn:distinct-values(/article/text())
return (
fn:distinct-values($article/article/front/article-meta/title-group/article-title/text())
)
)
Each $article is bound to an atomic value (fn:distinct-values() returns a sequence of atomic values). You then apply a path expression (the / operator) to $article, which is forbidden: the path operator requires its left-hand operand to be a sequence of nodes.
I'm afraid your code does not make enough sense for me to suggest an actual solution; I can only pinpoint where the error is.
Furthermore, using text() at the end of a path is usually a bad idea, and if /article is a complex document it is certainly not what you want. One of the text nodes you select (most likely the first one) is simply a single newline character.
What do you want to achieve?
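To see where that lone newline comes from, here is a Python sketch with a made-up, minimal article document (not the asker's data). In ElementTree, the text-node children of <article>, which is roughly what /article/text() selects, are the element's .text plus the .tail of each direct child; in a document laid out on several lines, those are whitespace only.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal article; real documents have many more elements
article = ET.fromstring(
    "<article>\n"
    "<front><article-meta><title-group>"
    "<article-title>A sample title</article-title>"
    "</title-group></article-meta></front>\n"
    "</article>"
)

# Text-node children of <article>, i.e. roughly /article/text()
text_children = [article.text] + [child.tail for child in article]
print(text_children)       # ['\n', '\n']
print(set(text_children))  # {'\n'}: distinct values reduce to a lone newline
```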
Your $article variable is bound to an atomic value, not a node() from the article document. You can only use an XPath axis on a node.
When you apply distinct-values() in the for clause, it returns atomic values (here, the string contents of the text nodes), not the article document or nodes from it.
You can probably make things work by using the values in a predicate filter like this:
for $article-text in fn:distinct-values(/article/text())
return
fn:distinct-values(/article[text()=$article-text]/front/article-meta/title-group/article-title/text())
Note: The above XQuery should avoid the XDMP-NOTANODE error, but there are likely easier (and more efficient) solutions for achieving your goal. If you were to post a sample of your document and describe what you are trying to achieve, we could suggest alternatives.
Bit of a wild guess, but you have two distinct-values calls in your code. That makes me think you want a unique list of articles, and then finally a unique list of article-titles. I would hope you already have unique articles in your database, unless you are explicitly attempting to de-duplicate them.
In case you just want the overall unique list of article titles, I would do something like:
distinct-values(
for $article in collection()/article
return
$article/front/article-meta/title-group/article-title
)
HTH!
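The same idea in Python, as a sketch only: the three documents are made-up stand-ins for collection()/article, and titles are compared by their string value, which is what distinct-values() does after atomizing the elements. Python keeps first-seen order here, whereas the order of fn:distinct-values() results is implementation-dependent.

```python
import xml.etree.ElementTree as ET

TITLE_PATH = "front/article-meta/title-group/article-title"

def distinct_titles(documents: list[str]) -> list[str]:
    titles = {}  # dict keys give order-preserving de-duplication
    for xml_text in documents:
        article = ET.fromstring(xml_text)
        for title in article.findall(TITLE_PATH):
            # String value of the element, similar to XQuery atomization
            titles.setdefault("".join(title.itertext()), None)
    return list(titles)

def make_doc(title: str) -> str:
    # Hypothetical helper building a minimal article document
    return ("<article><front><article-meta><title-group>"
            f"<article-title>{title}</article-title>"
            "</title-group></article-meta></front></article>")

docs = [make_doc("Alpha"), make_doc("Beta"), make_doc("Alpha")]
print(distinct_titles(docs))  # ['Alpha', 'Beta']
```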

SQLite: which character can be ignored with FTS match in one word

I need to find a special character that SQLite FTS MATCH will ignore, as if it did not exist, when I put it in the middle of a word. For example:
Text Body: book's
If my match string is 'books', I need to get "book's" as a result.
Using either the porter or the simple tokenizer is fine.
I tried many characters for this, such as book!s, book?s, book|s, book,s, book:s…, but a MATCH search for 'books' returns none of them.
I don't understand why.
I am using contentless FTS4 tables and external content FTS4 tables. My text body contains many such characters inside words, and they should be ignored when searching.
I cannot change the match query because I do not know where in the word the special character is. Also, the original word length must stay equal to the length of the word in the FTS index so that I can use matchinfo() or snippet(); as such, I cannot remove these characters from the text body.
The default tokenizers do not ignore punctuation characters but treat them as word separators.
So the text body or match string book's will end up as two words, book and s.
These will never match a single word like books.
To ignore characters like ', you have to install your own custom tokenizer.
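A Python model of the behaviour described above, under assumptions: simple_tokens follows the FTS4 simple tokenizer's documented rule (terms are runs of ASCII alphanumerics or characters with code point 128 and above, ASCII letters are lowercased, everything else only separates terms), and ignoring_tokens is a hypothetical custom tokenizer that drops apostrophes inside words while reporting offsets that span the original text, which is what snippet() needs to highlight the stored text. This is not SQLite's C tokenizer interface; a real custom tokenizer has to be written against that API and registered with the database.

```python
import re

def _fold_ascii(term: str) -> str:
    # The simple tokenizer lowercases ASCII letters only
    return "".join(c.lower() if c < "\x80" else c for c in term)

# Eligible term characters: ASCII alphanumerics and code points >= 128
_SIMPLE_TERM = re.compile(r"[A-Za-z0-9\u0080-\U0010FFFF]+")

def simple_tokens(text: str):
    """(term, start, end) tuples in the style of the FTS4 simple tokenizer."""
    return [(_fold_ascii(m.group()), m.start(), m.end()) for m in _SIMPLE_TERM.finditer(text)]

IGNORABLE = "'"  # hypothetical: characters to skip when they occur inside a word

_WORD_WITH_IGNORABLE = re.compile(
    r"[A-Za-z0-9\u0080-\U0010FFFF" + re.escape(IGNORABLE) + r"]+"
)

def ignoring_tokens(text: str):
    """Hypothetical custom tokenizer: strips IGNORABLE characters from each term
    but keeps start/end offsets over the original text (character offsets here;
    SQLite tokenizers report byte offsets)."""
    tokens = []
    for m in _WORD_WITH_IGNORABLE.finditer(text):
        term = _fold_ascii("".join(c for c in m.group() if c not in IGNORABLE))
        if term:  # a run made only of ignorable characters yields no term
            tokens.append((term, m.start(), m.end()))
    return tokens

print(simple_tokens("book's"))    # [('book', 0, 4), ('s', 5, 6)]
print(ignoring_tokens("book's"))  # [('books', 0, 6)]
```

With the simple rule the indexed terms are book and s, so a query for books finds nothing; the hypothetical tokenizer indexes books while its offsets still cover all six characters of book's.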
