I have attached an xml document called hamlet.xml found here (http://www.ibiblio.org/xml/examples/shakespeare/hamlet.xml).
I am looking to get the following output using xquery:
<speaker name="BERNARDO" lines="38"/>
<speaker name="FRANCISCO" lines="10"/>
<speaker name="HORATIO" lines="291"/>
...and so on for all the distinct speakers.
I have been able to get the name of the speaker using
========================
for $s in distinct-values(doc("hamlet")//SPEAKER
return <speaker name="$s}" />
=========================
But I don't know how to pull up the lines.
Any help will be appreciated.
Simply enough:
for $speaker in
distinct-values(//SPEAKER/text())
return
<speaker
name="{$speaker}"
count="{count(//LINE[../SPEAKER=$speaker])}"/>
Related
I have a listPers.xml (TEI List containing persons, obviously ) . I want to write a function to update the listPers.xml
My function looks like this:
declare function app:addPerson($node as node(), $model as map(*)) {
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
let $list := doc(concat($config:app-root, '/resources/listPers_test.xml'))
return
update insert $person into $list//tei:listPerson
};
And the listPerson.xml
looks more or less like a typical list with person-entries
I have a tei:header (here omitted) followed by
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
</body>
</text>
</TEI>
(sorry for ruining indentions, it's just an excerpt )
I do not get an error, which means that my app:addPerson should be fine, right?
I want the listPers_test to look like this:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
<!-- here comes the output that I wish to have :-) -->
<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>
</body>
</text>
</TEI>
In the long run, I aim for an html-form that allows users to input names etc., where ids are generated using sth like
to-lowercase(concat($surname, "_", $forename));
But I will not get into my questions regarding forms and xquery, as I have barely done a quick Google-trip regarding html forms and xquery!
Can anyone hint me at why I do not get the listPers_test.xml file updated with the second value? :-)
All the best and thanks in advance to everyone,
K
Alright, I have a solution for anyone interested in it:
My first snippet $person:= ... contains a STRING, not an element.Changing the line
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
to this one actually solves the issue:
let $person := <tei:person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></tei:person>
I try to parse out the xmlValue for the attribute "NAME" in an XML Document in R.
<NN ID_NAME="107232" ID_NTYP="6" NAME="dSpace_ECat1Error.STS" KOMMENTAR="dSpace_ECat1Error.STS" IS_SYSTEM="0" IS_LOCKED="0" DTYP="Ganzzahl" ADIM="" AFMT=""/><NN ID_NAME="107233" ID_NTYP="6" NAME="dSpace_ECat2Error.STS" KOMMENTAR="dSpace_ECat2Error.STS" IS_SYSTEM="0" IS_LOCKED="0" DTYP="Ganzzahl" ADIM="" AFMT=""/>
The result should be like this:
dSpace_ECat1Error.STS
dSpace_ECat2Error.STS
I use this function:
xpathSApply(root,"//NN[#NAME]",xmlValue)
But as a result, I get just empty "" (Quotes)
What have I done wrong?
Thank's in advance!
I just found out by using:
erg<-xpathSApply(root,"//NN",xmlGetAttr,'NAME')
There should be a better tutorial for this particular XML-function in R....
Hi I am new to marklogic and in Xquery world. I am not able to think of starting point to write the following logic in Marklogic Xquery. I would be thankful if somebody can give me idea/sample so I can achieve the following:
I want to Query A.XML based on a word lookup in B.XML. Query should produce C.XML. The logic should be as follows:
A.XML
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>
B.XML
<WordLookUp>
<companies>
<company name="Vodafone">Vodafone</company>
<company name="Nokia">Nokia</company>
</companies>
<topics>
<topic group="Sports">Cricket</topic>
<topic group="Entertainment">HBO</topic>
<topic group="Finance">GDP</topic>
</topics>
<moods>
<mood number="4">Growth</mood>
<mood number="-5">Depression</mood>
<mood number="-3">Recession</mood>
</moods>
C.XML (Result XML)
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Search each company/text() of A.xml in B.xml, if match found create tag:
TAG {company count="Number of occurrence of that word"}company/#name
{/company}
Search each topic/text() of A.xml in B.xml, if match found create tag
TAG {topic topic="Number of occurrences of that word"}topic/#group{/topic}
Search each mood/text() of A.xml in B.xml, if match found
[occurrences of first word * {/mood[first word]/#number}] + [occurrences of second word * {/mood[second word]/#number})]....
get the word count of element.
This was a fun one, and I learned a few things in the process. Thanks!
Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").
The following solution uses two MarkLogic-specific functions:
cts:highlight (for replacing matching text with nodes which you can then count)
cts:tokenize (for breaking up a given string into word, space, and punctuation parts)
It also includes some powerful magic specific to those two functions, respectively:
the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and
the data model extension which adds these subtypes of xs:string:
cts:word,
cts:space, and
cts:punctuation.
Enjoy!
xquery version "1.0-ml";
(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
//MATCH
};
(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
cts:tokenize($text)[. instance of cts:word]
};
(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
$lookup := doc("B.xml")/WordLookUp
return
<root>
{$content}
<updatedElement>
<companies>{
for $company in $lookup/companies/company
let $results := local:find-matches($content, string($company))
where exists($results)
return
<company count="{count($results)}">{string($company/#name)}</company>
}</companies>
<mood>{
sum(
for $mood in $lookup/moods/mood
let $results := local:find-matches($content, string($mood))
return count($results) * $mood/#number
)
}</mood>
<topics>{
for $topic in $lookup/topics/topic
let $results := local:find-matches($content, string($topic))
where exists($results)
return
<topic count="{count($results)}">{string($topic/#group)}</topic>
}</topics>
<word-count>{
count(local:get-words($content))
}</word-count>
</updatedElement>
</root>
Let me know if you have any follow-up questions about how all the above works. At first, I was inclined to use cts:search or cts:contains, which are the bread and butter for search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as it was about looking up matching text within an already-given document. If you needed to extend this somehow to aggregate across a large number of documents, then you'd want to look into the additional use of cts:search or cts:contains.
One final caveat: if you think your content might have <MATCH> elements already, you'll want to use a different element name when calling cts:highlight (a name which you can guarantee won't conflict with your content's existing element names). Otherwise, you'll potentially get the wrong number of results (higher than the accurate count).
ADDENDUM:
I was curious if this could be done without cts:highlight, given that cts:tokenize already breaks up the text into all the words for you. The same result is produced using this alternative implementation of local:find-matches (provided you swap the order of the function declarations because one depends on the other):
(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};
It uses cts:stem to normalize the given word to its stem, so, for example searching for "pass" will match "passed", etc. However, this still won't work for multi-word (phrase) searches. So to be safe, I'd stick with using cts:highlight, which, like cts:search and cts:contains, can handle any cts:query you give it (including simple word/phrase searches like we do above).
Might make sense to step back and ask if you might be better served modeling your data and or documents for use with a document oriented database instead of an rdbms
This is simpler/shorter and fully compliant XQuery not containing any implementation extensions, which make it work with any compliant XQuery 1.0 processor:
let $content := doc('file:///c:/temp/delete/A.xml')/*/*,
$lookup := doc('file:///c:/temp/delete/B.xml')/*,
$words := tokenize($content, '\W+')[.]
return
<root>
{$content}
<updatedElement>
<companies>
{for $c in $lookup/companies/*,
$occurs in count(index-of($words, $c))
return
if($occurs)
then
<company count="{$occurs}">
{$c/text()}
</company>
else ()
}
</companies>
<mood>
{
sum($lookup/moods/*[false or index-of($words, data(.))]/#number)
}
</mood>
<topics>
{for $t in $lookup/topics/*,
$occurs in count(index-of($words, $t))
return
if($occurs)
then
<topic count="{$occurs}">
{data($t/#group)}
</topic>
else ()
}
</topics>
<word-count>{count($words)}</word-count>
</updatedElement>
</root>
When applied on the provided files A.xml and B.XML (contained in the local directory c:/temp/delete), the wanted, correct result is produced:
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Using Xquery, how can I search the file below (consisting of many items), for all items with 'XC' in the part-number (there are many), then for matches return all 3 of the interesting data elements (part-number, part-name, and name)? The return is the main problem--my attempts result in every permutation of the interesting data elements. Thank you!
<catalog>
<item>
<description>
<partref>
<part-id>
<part-number>XC51222</part-number>
</part-id>
</partref>
<part-name>DSP, Network Vectoring<part-name>
<vendors>
<vendor1>
<pay-to>
<name>JCOF Industries</name>
</pay-to>
</vendor1>
</vendors>
</description>
</item>
<item>
</item>
[many items…]
</catalog>
xquery version "1.0";
let $sep := ','
for $x in catalog/item
where fn:matches($x/description/partref/part-id/part-number, 'XC')
return fn:string-join( ($x/description/partref/part-id/part-number/text(), $x/description/part-name/text(), $x/description/vendors/vendor1/pay-to/name/text()), $sep)
Say I have the following documents in my database: a_doc1, a_doc2, b_doc1, and b_doc2
All these documents are of the following format
<doc>
....
<updatedTime>2011-02-07T14:41:02.133-05:00</updatedTime>
....
</doc
>
The value of the "updatedTime" element is inserted when the document is created using the fn:current-dateTime()
Now I am trying to do the following:
find all documents whose name starts with "a_"
order these documents by their <updatedTime> element in descending order
Return the first document name from the descending order
I tried the following:
for $doc_name in db:list()
where fn:starts-with($doc_name, 'a_')
order by xs:dateTime(doc($doc_name)/updatedTime) descending
return $doc_name
Say "a_doc1" is created at "2011-02-07T14:40:00.78-05:00" and "a_doc2" is created at "2011-02-07T14:41:02.133-05:00", the desired output is a_doc2. In short the document name(starting with a_) of the most recent document created must be returned.
When I try my sample code, the output returned is : [a_doc1, a_doc2].
The expected output is: [a_doc2, a_doc1].
Thanks,
Sony
order by dateTime(doc($doc_name)/updatedTime/text())
Instead of this use:
order by xs:dateTime(doc($doc_name)/updatedTime)
you should use (xs: http://www.w3.org/2001/XMLSchema):
xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
Example:
xs:dateTime("1999-12-31T12:00:00")
instead of (fn: http://www.w3.org/2005/xpath-functions)
fn:dateTime($arg1 as xs:date?, $arg2 as xs:time?) as xs:dateTime?
Example:
fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))