How to find unique elements with XQuery?

How can I get the titles of the books that have a location in a country that appears only once in the following XML?
<BooksLib>
<Book Title="Murder in NY" Year="1980">
<BookLocations>
<Location City="New York" Country="USA"/>
<Location City="Virginia" Country="USA"/>
</BookLocations>
</Book >
<Book Title="Dracula" Year="2000">
<BookLocations>
<Location City="Sydney" Country="Australia"/>
<Location City="Moab" Country="USA"/>
<Location City="Calvados" Country="France"/>
</BookLocations>
</Book>
<Book Title="Romance in calvados" Year="2012">
<BookLocations>
<Location City="Calvados" Country="France"/>
</BookLocations>
</Book >
</BooksLib>
For example, for this XML the result would be "Dracula", because Australia appears only once.
This is what I have so far:
for $book in doc("books.xml")//Book
where count(distinct-values($Book/BookLocations/Location/@Country)) eq 1
return $Book/data(@Title)
But this gives me the titles of the books whose locations are all in the same country.

I would do it slightly differently: first, because it matches how one naturally thinks about the problem, and second, because it should be faster on a large data set. I would first identify the countries that occur only once, and then filter the books based on that. The following should work:
let $doc := doc("books.xml")
let $single-loc :=
  for $loc in $doc//Location/@Country/string()
  where count($doc//Location/@Country[. = $loc]) = 1
  return $loc
for $book in $doc//Book
where $book/BookLocations/Location/@Country = $single-loc
return $book/data(@Title)
Please note that your input XML was not well-formed (I edited your post). Also, your XQuery is wrong because $book and $Book are two different variables.
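The same idea can be tried without a books.xml file. The following is a minimal, self-contained sketch that inlines the sample data; the variable names $lib, $countries and $single are only illustrative, and the use of distinct-values with index-of is a variation on the approach above, not the exact query from the answer:

```xquery
(: Sample data from the question, inlined so no external file is needed :)
let $lib :=
  <BooksLib>
    <Book Title="Murder in NY" Year="1980">
      <BookLocations>
        <Location City="New York" Country="USA"/>
        <Location City="Virginia" Country="USA"/>
      </BookLocations>
    </Book>
    <Book Title="Dracula" Year="2000">
      <BookLocations>
        <Location City="Sydney" Country="Australia"/>
        <Location City="Moab" Country="USA"/>
        <Location City="Calvados" Country="France"/>
      </BookLocations>
    </Book>
    <Book Title="Romance in calvados" Year="2012">
      <BookLocations>
        <Location City="Calvados" Country="France"/>
      </BookLocations>
    </Book>
  </BooksLib>
(: Every country value, duplicates included :)
let $countries := $lib//Location/@Country/string()
(: Keep only the countries that occur exactly once :)
let $single := distinct-values($countries)[count(index-of($countries, .)) = 1]
(: A book qualifies if any of its locations is in such a country :)
return $lib/Book[BookLocations/Location/@Country = $single]/@Title/string()
```

For this data, $single contains only "Australia", so the query returns "Dracula".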

Related

cts search ignoring index order with cts:element-attribute-reference date

Background
I'm using a cts search in MarkLogic and it is not sorting by the passed sort option.
For example the following produces unsorted results
xdmp:document-insert("/test/test1",<test attrDate="2016-1-10"></test>);
xdmp:document-insert("/test/test2",<test attrDate="2015-1-10"></test>);
xdmp:document-insert("/test/test3",<test attrDate="2017-1-10"></test>);
cts:search(
xdmp:directory("/test/", "infinity")/test,
cts:true-query(),
(
cts:index-order(cts:element-attribute-reference(xs:QName("test"), xs:QName("attrDate")), ("ascending"))
)
);
This returns the following:
<test attrDate="2016-1-10">
</test>
element
<test attrDate="2015-1-10">
</test>
element
<test attrDate="2017-1-10">
</test>
So the results are correct, but unsorted.
Question
How can I sort by an attribute in a MarkLogic cts query?
Further Background
I have an index set up on that attribute, here is the config:
(This index can be created at http://localhost:8001/ > summary > YOURDATABASE-content > Attribute Range Indexes > Add, although I added it via Roxy)

It turns out this was a simple data issue (which I found in the last 5 seconds before posting this)
2016-01-10 is the 10th of January 2016
2016-1-10 is a malformed string that MarkLogic just ignores
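The difference between the two strings can be checked in plain XQuery. This is a small sketch that assumes the range index has the scalar type date (the index configuration is not shown above); the xs:date lexical form requires a two-digit month and day:

```xquery
(: Test whether each string is a valid xs:date lexical value :)
for $value in ("2016-01-10", "2016-1-10")
return concat($value, " castable as xs:date: ", $value castable as xs:date)
```

The first string is reported as castable and the second as not castable.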

Filtering by attribute

Below is an excerpt from an XML file with 65 lectures:
<?xml version="1.0" encoding="iso-8859-1" ?>
<university>
<lecture>
<class>English</class>
<hours>3</hours>
<pupils>30</pupils>
</lecture>
<lecture>
<class>Math</class>
<hours>4</hours>
<pupils>27</pupils>
</lecture>
<lecture>
<class>Science</class>
<hours>2</hours>
<pupils>25</pupils>
</lecture>
</university>
I need a where clause that returns the lectures with more pupils than the English lecture. However, I don't want to hard-code the value 30; I want to refer to the English lecture's pupils value instead.
E.g., I want a where clause with a condition like pupils > English.pupils instead of pupils > 30.
(The "pupils > English.pupils" is just pseudocode as an example.)

A where clause isn't strictly necessary, but to use one you would make it part of a for iterator:
let $lectures := doc("lectures.xml")/university/lecture
let $english-pupils := $lectures[class = "English"]/pupils/xs:integer(.)
for $lecture in $lectures
where ($lecture/pupils/xs:integer(.) gt $english-pupils)
return $lecture
You could also avoid the FLWOR expression altogether by using an XPath predicate:
let $lectures := doc("lectures.xml")/university/lecture
let $english-pupils := $lectures[class = "English"]/pupils/xs:integer(.)
return $lectures[pupils/xs:integer(.) gt $english-pupils]
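Both queries above assume exactly one English lecture, because the value comparison gt raises a type error if $english-pupils holds more than one value. If the full file might contain several English lectures, one option is to compare against the largest of them. This is a sketch with hypothetical inline data (the pupil counts are invented so that one lecture exceeds the English ones), and $english-max is an illustrative name:

```xquery
(: Hypothetical data: two English lectures, and Math larger than both :)
let $lectures :=
  <university>
    <lecture><class>English</class><pupils>30</pupils></lecture>
    <lecture><class>Math</class><pupils>34</pupils></lecture>
    <lecture><class>English</class><pupils>32</pupils></lecture>
    <lecture><class>Science</class><pupils>25</pupils></lecture>
  </university>/lecture
(: max() collapses any number of English lectures to a single value :)
let $english-max := max($lectures[class = "English"]/pupils/xs:integer(.))
return $lectures[xs:integer(pupils) gt $english-max]/class/string()
```

With this data the query returns "Math". If there is no English lecture at all, max() returns the empty sequence and the result is empty.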

How to add more chapters to the chapter metadata XML

A chapter-metadata.xml file is stored in each book's ISBN folder on a MarkLogic database server (there are 100 ISBN folders, so there are 100 chapter-metadata.xml files). Each chapter-metadata.xml either contains the data of one chapter or is empty. If it contains only one chapter's information, I want to add the information for the remaining chapters (my chapter information is the same for every chapter), each under a chapter element whose attribute holds that chapter's number, for as many chapters as are stored in the book's ISBN folder (I can fetch those and store them in a variable $chapter_sequence, like ch001 ch002 ch003 ch004 ...). If chapter-metadata.xml does not have any chapter information, a chapter element with the chapter number as its attribute value should be created and my information added. Below is the XML structure when there is one chapter's information; my information is the keywordset element.
<?xml version="1.0" encoding="UTF-8" ?>
<chaptermetadata>
<bookisbn>isbn number</bookisbn>
<booktitle>Copyright</booktitle>
<chapter id="ch001">
<keywordset>
<keyword role="primary">context</keyword>
<keyword role="secondary">Copyright</keyword>
<keyword role="tertiary">subject</keyword>
</keywordset>
</chapter>
</chaptermetadata>
I want like below:
<?xml version="1.0" encoding="UTF-8" ?>
<chaptermetadata>
<bookisbn>isbn number</bookisbn>
<booktitle>Copyright</booktitle>
<chapter id="ch001">
<keywordset>
<keyword role="primary">context</keyword>
<keyword role="secondary">Copyright</keyword>
<keyword role="tertiary">subject</keyword>
</keywordset>
</chapter>
<chapter id="ch002">
<keywordset>
<keyword role="primary">context</keyword>
<keyword role="secondary">Copyright</keyword>
<keyword role="tertiary">subject</keyword>
</keywordset>
</chapter>
... and so on, up to the last chapter stored in my variable
</chaptermetadata>
thanks,
raj

The question is hard to follow, but start with http://docs.marklogic.com/xdmp:directory and a FLWOR expression. Let's say you put this into a function. I'll handwave a few helper functions that you would also have to implement, but the function might look something like this:
declare function local:chaptermetadata($isbn as xs:string)
as element(chaptermetadata) {
  <chaptermetadata>
    <bookisbn>{ $isbn }</bookisbn>
    <booktitle>{ local:title($isbn) }</booktitle>
    {
      (: local:title and local:isbn-uri are the hand-waved helpers :)
      for $chapter in xdmp:directory(local:isbn-uri($isbn), 'infinity')
      return element { fn:node-name($chapter) } {
        $chapter/@*,
        $chapter/keywordset }
    }
  </chaptermetadata>
};
Now, this code won't help much unless you understand everything that it's doing, so that you can modify it to suit your needs. This is a variation on one of the XQuery use cases, so you might find it helpful to work through and understand those: http://www.w3.org/TR/xquery-use-cases/
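As a starting point for the construction step alone, here is a minimal sketch in plain XQuery. It assumes the chapter ids are already available in $chapter_sequence, as the question describes, and that the common keywordset is held in a variable; both are filled with hypothetical values here, and writing the result back to the database is not shown:

```xquery
(: Hypothetical values: in practice $chapter_sequence would be fetched from
   the ISBN folder and $keywordset taken from the existing file :)
let $chapter_sequence := ("ch001", "ch002", "ch003")
let $keywordset :=
  <keywordset>
    <keyword role="primary">context</keyword>
    <keyword role="secondary">Copyright</keyword>
    <keyword role="tertiary">subject</keyword>
  </keywordset>
return
  <chaptermetadata>
    <bookisbn>isbn number</bookisbn>
    <booktitle>Copyright</booktitle>
    {
      (: One chapter element per id, each with a copy of the keywordset :)
      for $id in $chapter_sequence
      return <chapter id="{ $id }">{ $keywordset }</chapter>
    }
  </chaptermetadata>
```

Because every chapter element is rebuilt from the sequence, the same expression covers both cases in the question: a file that already has one chapter and a file that has none.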

Attributes in where condition

Let's say this is the XML file I have.
<bookstore>
<book year="1994">
<title>blah</title>
<price>66</price>
</book>
<book year="1998">
<title>blahblah</title>
<price>99</price>
</book>
</bookstore>
How do I select all books where the year attribute is less than 1995 and the price is less than 70?
This is what I have:
for $x in doc("bkstr.xml")/bookstore/book
where $x/price<70 and ??
return $x
How do I check the value of the year attribute?

Attributes are addressed using @.
for $x in doc("bkstr.xml")/bookstore/book
where $x/price < 70 and $x/@year < 1995
return $x
You can also use the much shorter equivalent:
doc("bkstr.xml")/bookstore/book[price < 70 and @year < 1995]
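The general comparison < converts untyped values to numbers automatically. If you prefer the value comparison lt, the values need an explicit cast, because untyped data would otherwise be compared as a string. A self-contained sketch with the sample data inlined (the casts to xs:decimal and xs:integer are my choice of types, not something the question specifies):

```xquery
(: Sample data from the question, inlined :)
let $bookstore :=
  <bookstore>
    <book year="1994"><title>blah</title><price>66</price></book>
    <book year="1998"><title>blahblah</title><price>99</price></book>
  </bookstore>
for $x in $bookstore/book
(: Explicit casts make the numeric intent visible and allow lt :)
where xs:decimal($x/price) lt 70 and xs:integer($x/@year) lt 1995
return $x/title/string()
```

For this data only the 1994 book qualifies, so the query returns "blah".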

MarkLogic Join Query

Hi, I am new to MarkLogic and to the XQuery world. I can't think of a starting point for writing the following logic in MarkLogic XQuery. I would be thankful if somebody could give me an idea or a sample so that I can achieve the following:
I want to query A.XML based on a word lookup in B.XML. The query should produce C.XML. The logic should be as follows:
A.XML
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>
B.XML
<WordLookUp>
<companies>
<company name="Vodafone">Vodafone</company>
<company name="Nokia">Nokia</company>
</companies>
<topics>
<topic group="Sports">Cricket</topic>
<topic group="Entertainment">HBO</topic>
<topic group="Finance">GDP</topic>
</topics>
<moods>
<mood number="4">Growth</mood>
<mood number="-5">Depression</mood>
<mood number="-3">Recession</mood>
</moods>
</WordLookUp>
C.XML (Result XML)
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Search for each company/text() from B.xml in A.xml; if a match is found, create the tag:
{company count="number of occurrences of that word"}company/@name{/company}
Search for each topic/text() from B.xml in A.xml; if a match is found, create the tag:
{topic count="number of occurrences of that word"}topic/@group{/topic}
Search for each mood/text() from B.xml in A.xml; if matches are found, compute:
[occurrences of first word * {/mood[first word]/@number}] + [occurrences of second word * {/mood[second word]/@number}] ...
Finally, get the word count of the content element.

This was a fun one, and I learned a few things in the process. Thanks!
Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").
The following solution uses two MarkLogic-specific functions: cts:highlight (for replacing matching text with nodes, which you can then count) and cts:tokenize (for breaking up a given string into word, space, and punctuation parts).
It also includes some powerful magic specific to those two functions, respectively: the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and the data model extension which adds these subtypes of xs:string: cts:word, cts:space, and cts:punctuation.
Enjoy!
xquery version "1.0-ml";
(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
//MATCH
};
(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
cts:tokenize($text)[. instance of cts:word]
};
(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
$lookup := doc("B.xml")/WordLookUp
return
<root>
{$content}
<updatedElement>
<companies>{
for $company in $lookup/companies/company
let $results := local:find-matches($content, string($company))
where exists($results)
return
<company count="{count($results)}">{string($company/@name)}</company>
}</companies>
<mood>{
sum(
for $mood in $lookup/moods/mood
let $results := local:find-matches($content, string($mood))
return count($results) * $mood/@number
)
}</mood>
<topics>{
for $topic in $lookup/topics/topic
let $results := local:find-matches($content, string($topic))
where exists($results)
return
<topic count="{count($results)}">{string($topic/@group)}</topic>
}</topics>
<word-count>{
count(local:get-words($content))
}</word-count>
</updatedElement>
</root>
Let me know if you have any follow-up questions about how all the above works. At first, I was inclined to use cts:search or cts:contains, which are the bread and butter for search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as it was about looking up matching text within an already-given document. If you needed to extend this somehow to aggregate across a large number of documents, then you'd want to look into the additional use of cts:search or cts:contains.
One final caveat: if you think your content might have <MATCH> elements already, you'll want to use a different element name when calling cts:highlight (a name which you can guarantee won't conflict with your content's existing element names). Otherwise, you'll potentially get the wrong number of results (higher than the accurate count).
ADDENDUM:
I was curious if this could be done without cts:highlight, given that cts:tokenize already breaks up the text into all the words for you. The same result is produced using this alternative implementation of local:find-matches (provided you swap the order of the function declarations because one depends on the other):
(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};
It uses cts:stem to normalize the given word to its stem, so, for example searching for "pass" will match "passed", etc. However, this still won't work for multi-word (phrase) searches. So to be safe, I'd stick with using cts:highlight, which, like cts:search and cts:contains, can handle any cts:query you give it (including simple word/phrase searches like we do above).

It might make sense to step back and ask whether you would be better served by modeling your data and/or documents for use with a document-oriented database instead of an RDBMS.

This is simpler and shorter, and it is fully compliant XQuery without any implementation extensions, which makes it work with any compliant XQuery 1.0 processor:
let $content := doc('file:///c:/temp/delete/A.xml')/*/*,
$lookup := doc('file:///c:/temp/delete/B.xml')/*,
$words := tokenize($content, '\W+')[.]
return
<root>
{$content}
<updatedElement>
<companies>
{for $c in $lookup/companies/*,
$occurs in count(index-of($words, $c))
return
if($occurs)
then
<company count="{$occurs}">
{$c/text()}
</company>
else ()
}
</companies>
<mood>
{
sum($lookup/moods/*[false or index-of($words, data(.))]/@number)
}
</mood>
<topics>
{for $t in $lookup/topics/*,
$occurs in count(index-of($words, $t))
return
if($occurs)
then
<topic count="{$occurs}">
{data($t/@group)}
</topic>
else ()
}
</topics>
<word-count>{count($words)}</word-count>
</updatedElement>
</root>
When applied on the provided files A.xml and B.XML (contained in the local directory c:/temp/delete), the wanted, correct result is produced:
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
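The building blocks of this approach can be tried on their own with a shortened, hypothetical content string. The sketch below shows the tokenize(...)[.] idiom, which drops the empty strings produced by leading and trailing separators, and counting occurrences with index-of. The last expression is a variation, not part of the answer above: it weights each mood by its number of occurrences, as the question's formula asks, whereas the query above adds each matching mood's number once.

```xquery
(: Hypothetical shortened content and lookup entries :)
let $content := " Nokia Vodafone Nokia Growth Recession "
let $moods := (<mood number="4">Growth</mood>,
               <mood number="-3">Recession</mood>)
(: [.] keeps only non-empty tokens :)
let $words := tokenize($content, "\W+")[.]
return (
  count($words),                      (: number of words :)
  count(index-of($words, "Nokia")),   (: occurrences of one word :)
  sum(for $m in $moods
      return count(index-of($words, string($m))) * xs:integer($m/@number))
)
```

For this input the three results are 5, 2 and 1 (1 * 4 plus 1 * -3).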
