Below is a excerpt from a XML file with 65 lectures:
<?xml version="1.0" encoding="iso-8859-1" ?>
<university>
<lecture>
<class>English</class>
<hours>3</hours>
<pupils>30</pupils>
</lecture>
<lecture>
<class>Math</class>
<hours>4</hours>
<pupils>27</pupils
</lecture>
<lecture>
<class>Science</class>
<hours>2</hours>
<pupils>25</pupils>
</lecture>
</university>
I need a where clause that gives me a list of lectures with more pupils than an English lecture. However, not with the attribute "30" used, but calling the English's lecture attribute instead
E.g., I want to use a where clause with a condition like pupils > English.pupils, instead of pupils > 30.
(The "pupils > English.pupils" is just puesdo code as an example)
A where clause isn't strictly necessary, but to use one you would make it part of a for iterator:
let $lectures := doc("lectures.xml")/university/lecture
let $english-pupils := $lectures[class = "English"]/pupils/xs:integer(.)
for $lecture in $lectures
where ($lecture/pupils/xs:integer(.) gt $english-pupils)
return $lecture
You could also avoid the flwor altogether by using an XPath predicate.
let $lectures := doc("lectures.xml")/university/lecture
let $english-pupils := $lectures[class = "English"]/pupils/xs:integer(.)
return $lectures[pupils/xs:integer(.) gt $english-pupils]
Related
At the moment in Xquery 3.1 (in eXist 4.7) I receive XML fragments that look like the following (from eXist's Lucene full text search):
let $text :=
<tei:text>
<front>
<tei:div>
<tei:listBibl>
<tei:bibl>There is some</tei:bibl>
<tei:bibl>text in certain elements</tei:bibl>
</tei:listBibl>
</tei:div>
<tei:div>
<tei:listBibl>
<tei:bibl>which are subject <exist:match>to</exist:match> a Lucene search</tei:bibl>
<tei:bibl></tei:bibl>
<tei:listBibl>
</tei:div>
<tei:front>
<tei:body>
<tei:p>and often produces</tei:p>
<tei:p>a hit.</tei:p>
<tei:body>
<tei:text>
Currently I have Xquery send this fragment to an XSLT stylesheet in order to transform it into HTML like this:
<td>...elements which are subject <span class="search-hit">to</span> a Lucene search and often p...
Where the stylesheet's job is to return 30 characters of text before and after <exist:match/> and put the content of <exist:match/> into a span. There is only one <exist:match/> per transformation.
This all works fine. However, it's occurred to me that it is a very small job with effectively a single transformation of only one element, the rest being a sort of string-join. I therefore wonder if this can't be done efficiently in Xquery.
In trying to do this, I'm can't seem to find a way to handle the string content up to the <exist:match/> and then the string content after <exist:match/>. My idea is, in pseudo code, to output a result like:
let $textbefore := some function to get the text before <exist:match/>
let $textafter := some function to get text before <exist:match/>
return <td>...{$textbefore}
<span class="search-hit">
{$text//exist:match/text()}
</span> {$textafter}...</td>
Is this even worth doing in Xquery vs the current Xquery -> XSLT pipeline I have?
Many thanks.
I think it can be done as
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace tei = "http://example.com/tei";
declare namespace exist = "http://example.com/exist";
declare option output:method 'html';
let $text :=
<tei:text>
<tei:front>
<tei:div>
<tei:listBibl>
<tei:bibl>There is some</tei:bibl>
<tei:bibl>text in certain elements</tei:bibl>
</tei:listBibl>
</tei:div>
<tei:div>
<tei:listBibl>
<tei:bibl>which are subject <exist:match>to</exist:match> a Lucene search</tei:bibl>
<tei:bibl></tei:bibl>
</tei:listBibl>
</tei:div>
</tei:front>
<tei:body>
<tei:p>and often produces</tei:p>
<tei:p>a hit.</tei:p>
</tei:body>
</tei:text>
,
$match := $text//exist:match,
$text-before-all := normalize-space(string-join($match/preceding::text(), ' ')),
$text-before := substring($text-before-all, string-length($text-before-all) - 30),
$text-after := substring(normalize-space(string-join($match/following::text(), ' ')), 1, 30)
return
<td>...{$text-before}
<span class="search-hit">
{$match/text()}
</span> {$text-after}...</td>
which is not really much of a query in XQuery either but just some XPath selection plus some possibly expensive string joining and extraction on the preceding and following axis.
This is the xml file.
<?xml version="1.0" encoding="UTF-8"?>
<root>
<AtcoCode> System-Start-Date= 2018-05-16T12:35:48.6929328-04:00, " ", System-End-Date = 9999-12-31, " ", 150042010003</AtcoCode>
<NaptanCode>esxatgjd</NaptanCode>
<PlateCode>
</PlateCode>
<CleardownCode>
</CleardownCode>
<CommonName>Upper Park</CommonName>
<CommonNameLang>
</CommonNameLang>
<ShortCommonName>
</ShortCommonName>
<ShortCommonNameLang>
</ShortCommonNameLang>
<Landmark>Upper Park</Landmark>
<LandmarkLang>
</LandmarkLang>
<Street>High Road</Street>
<StreetLang>
</StreetLang>
<Crossing>
</Crossing>
<CrossingLang>
</CrossingLang>
<Indicator>adj</Indicator>
<IndicatorLang>
</IndicatorLang>
<Bearing>NE</Bearing>
<NptgLocalityCode>E0046286</NptgLocalityCode>
<LocalityName>Loughton</LocalityName>
<ParentLocalityName>
</ParentLocalityName>
<GrandParentLocalityName>
</GrandParentLocalityName>
<Town>Loughton</Town>
<TownLang>
</TownLang>
<Suburb>
</Suburb>
<SuburbLang>
</SuburbLang>
<LocalityCentre>1</LocalityCentre>
<GridType>U</GridType>
<Easting>541906</Easting>
<Northing>195737</Northing>
<Co-ordinates>51.64255,0.04944</Co-ordinates>
<StopType>BCT</StopType>
<BusStopType>MKD</BusStopType>
<TimingStatus>OTH</TimingStatus>
<DefaultWaitTime>
</DefaultWaitTime>
<Notes>
</Notes>
<NotesLang>
</NotesLang>
<AdministrativeAreaCode>080</AdministrativeAreaCode>
<CreationDateTime>2006-11-06T00:00:00</CreationDateTime>
<ModificationDateTime>2010-01-16T07:58:02</ModificationDateTime>
<RevisionNumber>5</RevisionNumber>
<Modification>rev</Modification>
<Status>act</Status>
</root>
How to achieve this?
Question: Create the path range index for the status element and fetch all the documents that has status del
after fetching all the documents, you need to create the new element called currentreservationnumber under RevisionNumber element.
The value of the currentrevisionnumber will be +1 to the RevisionNumber.
I think the warning about sequential numbers is related to system-wide unique numbers/ids (like Oracle sequence), so not a worry in this case?
If you only ever have one RevisionNumber, and you can find it without a path index, you can maybe get by with element-value query on the RevisionNumber since it's already indexed.
Given that you get the document somehow, it could be as simple as:
let $doc := fn:doc ('/foo.xml')
let $rev-node := $doc/root/RevisionNumber
return xdmp:node-insert-after ($rev-node, <currentreservationnumber>{$rev-node + 1}</currentreservationnumber>)
though remember to consider locking if you are doing a big query/update. And you might need to switch to node-replace if there is already a currentreservationnumber.
I am extracting data from an XML file and I need to extract a delimited list of sub-elements. I have the following:
for $record in //record
let $person := $record/person/names
return concat($record/#uid/string()
,",", $record/#category/string()
,",", $person/first_name
,",", $person/last_name
,",", $record/details/citizenships
,"
")
The element "citizenships" contains sub-elements called "citizenship" and as the query stands it sticks them all together in one string, e.g. "UKFrance". I need to keep them in one string but separate them, e.g. "UK|France".
Thanks in advance for any help!
fn:string-join($arg1 as xs:string*, $arg2 as xs:string) is what you're looking for here.
In your currently desired usage, that would look something like the following:
fn:string-join($record/details/citizenships/citizenship, "|")
Testing outside your document, with:
fn:string-join(("UK", "France"), "|")
...returns:
UK|France
Notably, ("UK", "France") is a sequence of strings, just as a query returning multiple citizenships would likewise be a sequence (the entries in which will be evaluated for their string value when passed to fn:string-join(), which is typed as taking a sequence of strings for its first argument).
Consider the following (simplified) query:
declare context item := document { <root>
<record uid="1">
<person>
<citizenships>
<citizenship>France</citizenship>
<citizenship>UK</citizenship>
</citizenships>
</person>
</record>
</root> };
for $record in //record
return concat(fn:string-join($record//citizenship, "|"), "
")
...and its output:
France|UK
Hi I am new to marklogic and in Xquery world. I am not able to think of starting point to write the following logic in Marklogic Xquery. I would be thankful if somebody can give me idea/sample so I can achieve the following:
I want to Query A.XML based on a word lookup in B.XML. Query should produce C.XML. The logic should be as follows:
A.XML
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>
B.XML
<WordLookUp>
<companies>
<company name="Vodafone">Vodafone</company>
<company name="Nokia">Nokia</company>
</companies>
<topics>
<topic group="Sports">Cricket</topic>
<topic group="Entertainment">HBO</topic>
<topic group="Finance">GDP</topic>
</topics>
<moods>
<mood number="4">Growth</mood>
<mood number="-5">Depression</mood>
<mood number="-3">Recession</mood>
</moods>
C.XML (Result XML)
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Search each company/text() of A.xml in B.xml, if match found create tag:
TAG {company count="Number of occurrence of that word"}company/#name
{/company}
Search each topic/text() of A.xml in B.xml, if match found create tag
TAG {topic topic="Number of occurrences of that word"}topic/#group{/topic}
Search each mood/text() of A.xml in B.xml, if match found
[occurrences of first word * {/mood[first word]/#number}] + [occurrences of second word * {/mood[second word]/#number})]....
get the word count of element.
This was a fun one, and I learned a few things in the process. Thanks!
Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").
The following solution uses two MarkLogic-specific functions:
cts:highlight (for replacing matching text with nodes which you can then count)
cts:tokenize (for breaking up a given string into word, space, and punctuation parts)
It also includes some powerful magic specific to those two functions, respectively:
the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and
the data model extension which adds these subtypes of xs:string:
cts:word,
cts:space, and
cts:punctuation.
Enjoy!
xquery version "1.0-ml";
(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
//MATCH
};
(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
cts:tokenize($text)[. instance of cts:word]
};
(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
$lookup := doc("B.xml")/WordLookUp
return
<root>
{$content}
<updatedElement>
<companies>{
for $company in $lookup/companies/company
let $results := local:find-matches($content, string($company))
where exists($results)
return
<company count="{count($results)}">{string($company/#name)}</company>
}</companies>
<mood>{
sum(
for $mood in $lookup/moods/mood
let $results := local:find-matches($content, string($mood))
return count($results) * $mood/#number
)
}</mood>
<topics>{
for $topic in $lookup/topics/topic
let $results := local:find-matches($content, string($topic))
where exists($results)
return
<topic count="{count($results)}">{string($topic/#group)}</topic>
}</topics>
<word-count>{
count(local:get-words($content))
}</word-count>
</updatedElement>
</root>
Let me know if you have any follow-up questions about how all the above works. At first, I was inclined to use cts:search or cts:contains, which are the bread and butter for search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as it was about looking up matching text within an already-given document. If you needed to extend this somehow to aggregate across a large number of documents, then you'd want to look into the additional use of cts:search or cts:contains.
One final caveat: if you think your content might have <MATCH> elements already, you'll want to use a different element name when calling cts:highlight (a name which you can guarantee won't conflict with your content's existing element names). Otherwise, you'll potentially get the wrong number of results (higher than the accurate count).
ADDENDUM:
I was curious if this could be done without cts:highlight, given that cts:tokenize already breaks up the text into all the words for you. The same result is produced using this alternative implementation of local:find-matches (provided you swap the order of the function declarations because one depends on the other):
(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};
It uses cts:stem to normalize the given word to its stem, so, for example searching for "pass" will match "passed", etc. However, this still won't work for multi-word (phrase) searches. So to be safe, I'd stick with using cts:highlight, which, like cts:search and cts:contains, can handle any cts:query you give it (including simple word/phrase searches like we do above).
Might make sense to step back and ask if you might be better served modeling your data and or documents for use with a document oriented database instead of an rdbms
This is simpler/shorter and fully compliant XQuery not containing any implementation extensions, which make it work with any compliant XQuery 1.0 processor:
let $content := doc('file:///c:/temp/delete/A.xml')/*/*,
$lookup := doc('file:///c:/temp/delete/B.xml')/*,
$words := tokenize($content, '\W+')[.]
return
<root>
{$content}
<updatedElement>
<companies>
{for $c in $lookup/companies/*,
$occurs in count(index-of($words, $c))
return
if($occurs)
then
<company count="{$occurs}">
{$c/text()}
</company>
else ()
}
</companies>
<mood>
{
sum($lookup/moods/*[false or index-of($words, data(.))]/#number)
}
</mood>
<topics>
{for $t in $lookup/topics/*,
$occurs in count(index-of($words, $t))
return
if($occurs)
then
<topic count="{$occurs}">
{data($t/#group)}
</topic>
else ()
}
</topics>
<word-count>{count($words)}</word-count>
</updatedElement>
</root>
When applied on the provided files A.xml and B.XML (contained in the local directory c:/temp/delete), the wanted, correct result is produced:
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Using Xquery, how can I search the file below (consisting of many items), for all items with 'XC' in the part-number (there are many), then for matches return all 3 of the interesting data elements (part-number, part-name, and name)? The return is the main problem--my attempts result in every permutation of the interesting data elements. Thank you!
<catalog>
<item>
<description>
<partref>
<part-id>
<part-number>XC51222</part-number>
</part-id>
</partref>
<part-name>DSP, Network Vectoring<part-name>
<vendors>
<vendor1>
<pay-to>
<name>JCOF Industries</name>
</pay-to>
</vendor1>
</vendors>
</description>
</item>
<item>
</item>
[many items…]
</catalog>
xquery version "1.0";
let $sep := ','
for $x in catalog/item
where fn:matches($x/description/partref/part-id/part-number, 'XC')
return fn:string-join( ($x/description/partref/part-id/part-number/text(), $x/description/part-name/text(), $x/description/vendors/vendor1/pay-to/name/text()), $sep)