I have documents containing structures like:
<Reviews>
<Review complete="false">
<StartDate>2019-03-05T06:00:00Z</StartDate>
<EndDate>2019-03-12T05:00:00Z</EndDate>
<Reviewers>
<Reviewer userName="jdoe">
<ReviewStatus>Completed</ReviewStatus>
</Reviewer>
</Reviewers>
</Review>
<Review complete="false">
<StartDate>2019-03-06T06:00:00Z</StartDate>
<EndDate>2019-03-13T05:00:00Z</EndDate>
<Reviewers>
<Reviewer userName="jsmith">
<ReviewStatus>Pending</ReviewStatus>
</Reviewer>
<Reviewer userName="jdoe">
<ReviewStatus>Completed</ReviewStatus>
</Reviewer>
</Reviewers>
</Review>
</Reviews>
Using MarkLogic XQuery, I want to search for docs that have a Reviewer element for jsmith AND with his ReviewStatus=Completed. I.e., I do not want to see this sample above in my results because jsmith's ReviewStatus is not Completed.
I have tried a couple of different query types where cts:and-query() uses combinations of attribute value, element word, and even path range queries. But I have not figured out how to find only those docs containing a Reviewer element where both the userName attribute value matches "jsmith" AND the ReviewStatus child element value matches "Completed" in the same Reviewer element. Can anyone suggest an approach for this?
You are looking for scoping queries, like cts:element-query. It lets you pick a common ancestor within which all sub-queries must match. Here is some code that shows how it works:
let $search-name := "jsmith"
let $search-status := "Completed"
let $xml := <Reviews>
<Review complete="false">
<StartDate>2019-03-05T06:00:00Z</StartDate>
<EndDate>2019-03-12T05:00:00Z</EndDate>
<Reviewers>
<Reviewer userName="jsmith">
<ReviewStatus>Completed</ReviewStatus>
</Reviewer>
<Reviewer userName="jdoe">
<ReviewStatus>Pending</ReviewStatus>
</Reviewer>
</Reviewers>
</Review>
<Review complete="false">
<StartDate>2019-03-06T06:00:00Z</StartDate>
<EndDate>2019-03-13T05:00:00Z</EndDate>
<Reviewers>
<Reviewer userName="jsmith">
<ReviewStatus>Pending</ReviewStatus>
</Reviewer>
<Reviewer userName="jdoe">
<ReviewStatus>Completed</ReviewStatus>
</Reviewer>
</Reviewers>
</Review>
</Reviews>
for $rev in $xml//Reviewer
where cts:contains(
$rev,
cts:element-query(
xs:QName("Reviewer"),
cts:and-query((
cts:element-attribute-value-query(xs:QName("Reviewer"), xs:QName("userName"), $search-name),
cts:element-value-query(xs:QName("ReviewStatus"), $search-status)
))
)
)
return $rev
Note that you will need to enable position indexes if you need accurate results for unfiltered searches.
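To apply the same constraint to documents stored in the database rather than to an in-memory node, a minimal sketch could look like the one below. It assumes the documents are in the current database and relies on the default (filtered) behaviour of cts:search, so each candidate should be checked against the actual content; the literal values "jsmith" and "Completed" are only illustrative.

```xquery
xquery version "1.0-ml";

(: Sketch only: return documents in which a single Reviewer element
   carries both the userName attribute and the ReviewStatus value. :)
cts:search(
  fn:doc(),
  cts:element-query(
    xs:QName("Reviewer"),
    cts:and-query((
      cts:element-attribute-value-query(
        xs:QName("Reviewer"), xs:QName("userName"), "jsmith"),
      cts:element-value-query(xs:QName("ReviewStatus"), "Completed")
    ))
  )
)
```

If you later pass the "unfiltered" option for speed, the note about position indexes above becomes relevant.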
HTH!
I have a listPers.xml (a TEI list containing persons, obviously). I want to write a function to update the listPers.xml.
My function looks like this:
declare function app:addPerson($node as node(), $model as map(*)) {
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
let $list := doc(concat($config:app-root, '/resources/listPers_test.xml'))
return
update insert $person into $list//tei:listPerson
};
The listPerson.xml looks more or less like a typical list with person entries. It has a tei:header (omitted here) followed by:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
</listPerson>
</body>
</text>
</TEI>
(sorry for ruining the indentation, it's just an excerpt)
I do not get an error, which means that my app:addPerson should be fine, right?
I want the listPers_test to look like this:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
<!-- here comes the output that I wish to have :-) -->
<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>
</listPerson>
</body>
</text>
</TEI>
In the long run, I aim for an HTML form that allows users to input names etc., where IDs are generated using something like
lower-case(concat($surname, "_", $forename))
But I will not get into my questions about forms and XQuery yet, as I have barely done a quick Google search on HTML forms and XQuery!
Can anyone give me a hint as to why the listPers_test.xml file is not updated with the second entry? :-)
All the best and thanks in advance to everyone,
K
Alright, I have a solution for anyone interested in it:
In my first snippet, $person := ... holds a STRING, not an element. Changing the line
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
to this one actually solves the issue:
let $person := <tei:person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></tei:person>
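For anyone who wants to combine this fix with the ID generation mentioned in the question, here is a hedged sketch. It assumes eXist-db's update extension (as used in the question) and the standard TEI namespace; the function name local:add-person and its parameters are hypothetical, and the generated value must be a valid xml:id (no spaces, for example).

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: builds a person ELEMENT (not a string) and
   inserts it as the last child of the listPerson in $list. :)
declare function local:add-person(
  $list as document-node(),
  $forename as xs:string,
  $surname as xs:string
) {
  let $id := lower-case(concat($surname, "_", $forename))
  let $person :=
    <person xmlns="http://www.tei-c.org/ns/1.0" xml:id="{$id}">
      <persName>
        <forename>{$forename}</forename>
        <surname>{$surname}</surname>
      </persName>
    </person>
  return
    update insert $person into $list//tei:listPerson
};
```

Declaring the default namespace on the constructor puts the child elements (persName, forename, surname) into the TEI namespace as well, which a prefix on the outer element alone would not do.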
Below is the XML structure from which I want to get the entries where the element co:isbn is not present:
<tr:trackingRecord xmlns:tr="https://www.mla.org/Schema/Tracking/tr"
xmlns:co="https://www.mla.org/Schema/commonModule/co"
xmlns:r="http://www.rsuitecms.com/rsuite/ns/metadata">
<tr:journal>
<tr:trackingDetails>
<tr:entry>
<co:trackingEntryID>2015323313</co:trackingEntryID>
<co:publicationDate>2015</co:publicationDate>
<co:volume>21</co:volume>
</tr:entry>
<tr:entry>
<co:trackingEntryID>2015323314</co:trackingEntryID>
<co:publicationDate>2015</co:publicationDate>
<co:isbn>
<co:entry>NA</co:entry>
<co:value>1234567890128</co:value>
</co:isbn>
</tr:entry>
<tr:entry>
<co:trackingEntryID>2015323315</co:trackingEntryID>
<co:publicationDate>2015</co:publicationDate>
<co:volume>21</co:volume>
<co:isbn></co:isbn>
</tr:entry>
<tr:entry>
<co:trackingEntryID>2015323316</co:trackingEntryID>
<co:publicationDate>2015</co:publicationDate>
<co:volume>21</co:volume>
</tr:entry>
</tr:trackingDetails>
</tr:journal>
</tr:trackingRecord>
Please suggest a cts:query for this.
If you can edit the XML structure, add an attribute to the entry element, like
<tr:entry isbnPresent="yes"> when an ISBN is present,
<tr:entry isbnPresent="no"> when it is absent,
and then search on that attribute with
cts:element-attribute-value-query
OR
without editing the schema, try something like:
for $i in cts:search(//tr:entry, "2015")
return if (fn:exists($i//co:isbn)) then () else $i
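The second variant needs the tr and co prefixes to be declared before it will run. A minimal sketch, assuming the namespace URIs from the sample document and keeping the illustrative "2015" word search from above:

```xquery
xquery version "1.0-ml";

declare namespace tr = "https://www.mla.org/Schema/Tracking/tr";
declare namespace co = "https://www.mla.org/Schema/commonModule/co";

(: Entries matching "2015" that have no co:isbn child at all :)
cts:search(//tr:entry, cts:word-query("2015"))[fn:empty(co:isbn)]
```

Note that the sample contains an entry with an empty <co:isbn></co:isbn> element. If that should also count as "not available", a predicate such as [fn:empty(co:isbn/co:value)] might be closer to what is wanted; this is an assumption about the intended meaning.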
New to XQuery and probably a noob question. I installed a BaseX db as my sandbox (which included a sample file etc/factbook.xml). I constructed a simple query which I thought would return all 'cities' with a population > 10 million.
for $x in doc("etc/factbook.xml")/mondial/country
where $x/city/population > 10000000.0
return $x/city
but I'm getting cities with lower populations, any insight?
<city id="f0_1726" country="f0_553" longitude="126.967" latitude="37.5667">
<name>Seoul</name>
<population year="95">10229262</population>
</city>
<city id="f0_10300" country="f0_553">
<name>Kunsan</name>
<population year="95">266517</population>
</city>
(I've only included the first two, but there are many more, both < and > 10 million)
Your where clause selects every country that has at least one city with a population larger than 10 million, and you then return all cities of those countries. Loop over the cities instead (and please, use meaningful variable names):
for $city in doc("etc/factbook.xml")/mondial/country/city
where $city/population > 10000000
return $city
Or just go for an XPath expression doing the same:
doc("etc/factbook.xml")/mondial/country/city[population > 10000000]
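The underlying reason is that a general comparison such as > is existential: it is true as soon as any item on the left satisfies it. A small self-contained sketch, using an inline element instead of factbook.xml:

```xquery
let $country :=
  <country>
    <city><name>Seoul</name><population>10229262</population></city>
    <city><name>Kunsan</name><population>266517</population></city>
  </country>
return (
  (: true: at least one of the two populations exceeds the threshold :)
  $country/city/population > 10000000,
  (: the predicate is evaluated per city, so only "Seoul" is returned :)
  $country/city[population > 10000000]/name/string()
)
```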
First off, yes, this is homework. Please suggest where I am going wrong, but please do not do my homework for me.
I am learning XQuery, and one of my tasks is to take a list of song IDs for a performance and determine the total duration of the performance. Given the snippets below, can anyone point me to where I can find out how to cross-reference the songID from the performance to the duration of the song?
I've listed my attempts at the end of the question.
My current XQuery code looks like:
let $songIDs := doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//SongSet/Song
for $performance in doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//ContestantSet/Contestant/Performance
return if($performance/SongRef[. =$songIDs/@SongID])
then <performanceDuration>{
data($performance/SongRef)
}</performanceDuration>
else ()
Which outputs:
<performanceDuration>S005 S003 S004</performanceDuration>
<performanceDuration>S001 S007 S002</performanceDuration>
<performanceDuration>S008 S009 S006</performanceDuration>
<performanceDuration>S002 S004 S007</performanceDuration>
Each S00x is the ID of a song, which is found in the referenced XML document (partial document):
<SongSet>
<Song SongID="S001">
<Title>Bah Bah Black Sheep</Title>
<Composer>Mother Goose</Composer>
<Duration>2.99</Duration>
</Song>
<Song SongID="S005">
<Title>Thank You Baby</Title>
<Composer>Shania Twain</Composer>
<Duration>3.02</Duration>
</Song>
</SongSet>
The performance section looks like:
<Contestant Name="Fletcher Gee" Hometown="Toronto">
<Repertoire>
<SongRef>S001</SongRef>
<SongRef>S002</SongRef>
<SongRef>S007</SongRef>
<SongRef>S010</SongRef>
</Repertoire>
<Performance>
<SongRef>S001</SongRef>
<SongRef>S007</SongRef>
<SongRef>S002</SongRef>
</Performance>
</Contestant>
My Attempts
I thought I would use nested loops, but that fails:
let $songs := doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//SongSet/Song
for $performance in doc("C:/Users/rob/Downloads/A4_FLOWR.xml")
//ContestantSet/Contestant/Performance
return if($performance/SongRef[. =$songs/@SongID])
for $song in $songIDs
(: gives an error in BaseX about incomplete if :)
then <performanceDuration>{
data($performance/SongRef)
}</performanceDuration>
else ()
--Edit--
I've fixed the inner loop; however, I am getting all the songs' durations, not just the ones that match the IDs. I have a feeling that this is due to variable scope, but I'm not sure:
let $songs := doc("C:/Users/rob/Downloads/A4_FLOWR.xml")//SongSet/Song
for $performance in doc("C:/Users/rob/Downloads/A4_FLOWR.xml")//ContestantSet/Contestant/Performance
return if($performance/SongRef[. =$songs/@SongID])
then <performanceDuration>{
for $song in $songs
return if($performance/SongRef[. =$songs/@SongID])
then
sum($song/Duration)
else ()
}</performanceDuration>
else ()
Output:
<performanceDuration>2.99 1.15 3.15 2.2 3.02 2.25 3.45 1.29 2.33 3.1</performanceDuration>
Your immediate problem is syntactic: you've inserted your inner loop between the condition and the keyword 'then' in a conditional. Fix that first:
return if ($performance/SongRef = $songs/@SongID) then
<performanceDuration>{
(: put your inner loop HERE :)
}</performanceDuration>
else ()
Now think yourself into the situation of the query evaluator inside the performanceDuration element. You have the variable $performance, you can find all the song references using $performance/SongRef, and for each song reference in the performance element, you can find the corresponding song element by matching the SongRef value with $songs/@SongID.
My next step at this point would be to ask myself:
For a given song reference, how do I find the song element for that song, and then the duration for that song?
Is there a way to get the sum of some set of durations? Is there, for example, a sum() function? (I'm pretty sure there is, but at this point I always pull up the Functions and Operators spec and look it up to be sure of the signature.)
What type does the duration info have? I'd expect it to be minutes and seconds, and I'd be worrying about duration arithmetic, but your sample makes it look like decimals, which will be easy.
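For readers who are past the homework stage, the three questions above can be sketched on a tiny inline data set as follows. The element names follow the question, the sample values are taken from its SongSet excerpt, and treating Duration as xs:decimal is an assumption based on how the sample looks.

```xquery
let $songs := (
  <Song SongID="S001"><Duration>2.99</Duration></Song>,
  <Song SongID="S005"><Duration>3.02</Duration></Song>
)
let $performance :=
  <Performance>
    <SongRef>S001</SongRef>
    <SongRef>S005</SongRef>
  </Performance>
return
  <performanceDuration>{
    sum(
      (: look up the Song for ONE reference at a time, then its duration :)
      for $ref in $performance/SongRef
      return xs:decimal($songs[@SongID = $ref]/Duration)
    )
  }</performanceDuration>
```

The key difference from the attempt in the question is that the lookup compares against the single $ref in scope, not against the whole $songs sequence.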
Hi, I am new to MarkLogic and to the XQuery world. I cannot think of a starting point for writing the following logic in MarkLogic XQuery. I would be thankful if somebody could give me an idea or a sample so I can achieve the following:
I want to query A.XML based on a word lookup in B.XML. The query should produce C.XML. The logic should be as follows:
A.XML
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>
B.XML
<WordLookUp>
<companies>
<company name="Vodafone">Vodafone</company>
<company name="Nokia">Nokia</company>
</companies>
<topics>
<topic group="Sports">Cricket</topic>
<topic group="Entertainment">HBO</topic>
<topic group="Finance">GDP</topic>
</topics>
<moods>
<mood number="4">Growth</mood>
<mood number="-5">Depression</mood>
<mood number="-3">Recession</mood>
</moods>
</WordLookUp>
C.XML (Result XML)
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Search for each company/text() from B.xml in A.xml; if a match is found, create the tag:
<company count="number of occurrences of that word">company/@name</company>
Search for each topic/text() from B.xml in A.xml; if a match is found, create the tag:
<topic count="number of occurrences of that word">topic/@group</topic>
Search for each mood/text() from B.xml in A.xml; if matches are found, compute:
[occurrences of first word * /mood[first word]/@number] + [occurrences of second word * /mood[second word]/@number] + ...
Get the word count of the content element.
This was a fun one, and I learned a few things in the process. Thanks!
Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").
The following solution uses two MarkLogic-specific functions:
cts:highlight (for replacing matching text with nodes which you can then count)
cts:tokenize (for breaking up a given string into word, space, and punctuation parts)
It also includes some powerful magic specific to those two functions, respectively:
the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and
the data model extension which adds these subtypes of xs:string:
cts:word,
cts:space, and
cts:punctuation.
Enjoy!
xquery version "1.0-ml";
(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
//MATCH
};
(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
cts:tokenize($text)[. instance of cts:word]
};
(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
$lookup := doc("B.xml")/WordLookUp
return
<root>
{$content}
<updatedElement>
<companies>{
for $company in $lookup/companies/company
let $results := local:find-matches($content, string($company))
where exists($results)
return
<company count="{count($results)}">{string($company/@name)}</company>
}</companies>
<mood>{
sum(
for $mood in $lookup/moods/mood
let $results := local:find-matches($content, string($mood))
return count($results) * $mood/@number
)
}</mood>
<topics>{
for $topic in $lookup/topics/topic
let $results := local:find-matches($content, string($topic))
where exists($results)
return
<topic count="{count($results)}">{string($topic/@group)}</topic>
}</topics>
<word-count>{
count(local:get-words($content))
}</word-count>
</updatedElement>
</root>
Let me know if you have any follow-up questions about how all the above works. At first, I was inclined to use cts:search or cts:contains, which are the bread and butter for search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as it was about looking up matching text within an already-given document. If you needed to extend this somehow to aggregate across a large number of documents, then you'd want to look into the additional use of cts:search or cts:contains.
One final caveat: if you think your content might have <MATCH> elements already, you'll want to use a different element name when calling cts:highlight (a name which you can guarantee won't conflict with your content's existing element names). Otherwise, you'll potentially get the wrong number of results (higher than the accurate count).
ADDENDUM:
I was curious if this could be done without cts:highlight, given that cts:tokenize already breaks up the text into all the words for you. The same result is produced using this alternative implementation of local:find-matches (the order of the function declarations does not matter, since functions declared in the prolog are visible throughout the module):
(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};
It uses cts:stem to normalize the given word to its stem, so, for example searching for "pass" will match "passed", etc. However, this still won't work for multi-word (phrase) searches. So to be safe, I'd stick with using cts:highlight, which, like cts:search and cts:contains, can handle any cts:query you give it (including simple word/phrase searches like we do above).
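To see what local:get-words is working with, here is a small sketch of cts:tokenize on its own. The input string is made up for illustration; the filter is the same instance-of test used in the function above.

```xquery
xquery version "1.0-ml";

let $tokens := cts:tokenize("Nokia, Vodafone and Nokia.")
return (
  (: keep only the word tokens, dropping spaces and punctuation :)
  $tokens[. instance of cts:word],
  (: number of word tokens :)
  count($tokens[. instance of cts:word])
)
```

This should return the word tokens followed by their count, with the comma, the full stop and the spaces filtered out.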
It might make sense to step back and ask whether you would be better served by modeling your data and/or documents for use with a document-oriented database instead of an RDBMS.
This is simpler/shorter and fully compliant XQuery that contains no implementation extensions, which makes it work with any compliant XQuery 1.0 processor:
let $content := doc('file:///c:/temp/delete/A.xml')/*/*,
$lookup := doc('file:///c:/temp/delete/B.xml')/*,
$words := tokenize($content, '\W+')[.]
return
<root>
{$content}
<updatedElement>
<companies>
{for $c in $lookup/companies/*,
$occurs in count(index-of($words, $c))
return
if($occurs)
then
<company count="{$occurs}">
{$c/text()}
</company>
else ()
}
</companies>
<mood>
{
(: weight each mood by its number of occurrences in the content :)
sum(for $m in $lookup/moods/*
return count(index-of($words, string($m))) * number($m/@number))
}
</mood>
<topics>
{for $t in $lookup/topics/*,
$occurs in count(index-of($words, $t))
return
if($occurs)
then
<topic count="{$occurs}">
{data($t/@group)}
</topic>
else ()
}
</topics>
<word-count>{count($words)}</word-count>
</updatedElement>
</root>
When applied on the provided files A.xml and B.XML (contained in the local directory c:/temp/delete), the wanted, correct result is produced:
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
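The counting technique in this answer rests on two standard functions, tokenize and index-of. A minimal sketch on a shortened, made-up string shows how they combine:

```xquery
let $words := tokenize(" Nokia Vodafone Nokia Growth", "\W+")[.]
return (
  (: the leading space yields an empty first token, which [.] removes :)
  count($words),                     (: 4 :)
  (: index-of returns every position at which the value occurs :)
  index-of($words, "Nokia"),         (: 1, 3 :)
  (: so counting those positions gives the number of occurrences :)
  count(index-of($words, "Nokia"))   (: 2 :)
)
```

Unlike the cts:highlight approach, this comparison is exact and case-sensitive, and it only works for single words, not for phrases.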