Complicated (?) nesting - xquery

I'm new to this and it's easy to get stuck in SQL mode.
I've prepared a small example to illustrate my problem.
<bigXML>
<Product>
<Attributes>
<Attribute>
<Descriptions>
<Description languageCode='DE'>Farbe</Description>
<Description languageCode='EN'>Color</Description>
</Descriptions>
<Value>0000ff</Value>
</Attribute>
<Attribute>
<Descriptions>
<Description languageCode='DE'>Länge</Description>
<Description languageCode='EN'>Length</Description>
</Descriptions>
<Value>2 mm</Value>
</Attribute>
<Attribute>
<Descriptions>
<Description languageCode='DE'>Name</Description>
<Description languageCode='EN'>Name</Description>
</Descriptions>
<Value>Circle</Value>
</Attribute>
</Attributes>
</Product>
</bigXML>
I want to get to the VALUE of 0000ff.
Here's my attempt:
<Reply>{for $i in //Product where $i/Attributes/Attribute/Descriptions/Description="Color" return $i/Attributes/Attribute/Value}</Reply>
It returns the values of all Value elements even though I specifically (maybe?) asked for the one where Description is Color.
Please tell me what part of the WHERE syntax I'm getting wrong.

Since you only fix the Product node in the for loop and then traverse down to the Attributes twice (once to check, and once to retrieve the Value), you cannot make sure that you only get values of Color attributes. Your query says: "For every Product that has one or more Color attributes, return all attribute values."
One easy fix is to just iterate over attributes instead of products:
<Reply>{
for $attr in //Product/Attributes/Attribute
where $attr/Descriptions/Description="Color"
return $attr/Value
}</Reply>
If you also need the product reference, you can use nested loops:
<Reply>{
for $i in //Product
for $attr in $i/Attributes/Attribute
where $attr/Descriptions/Description="Color"
return $attr/Value
}</Reply>
You can also replace the where with an XPath predicate and make the whole expression shorter:
<Reply>{
//Product/Attributes/Attribute[Descriptions/Description="Color"]/Value
}</Reply>
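To see why the original query returns every value, it can help to replay both versions outside XQuery. The following is a minimal Python sketch using the standard-library ElementTree module. The function names and the trimmed sample document are made up for illustration and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (English descriptions only).
SAMPLE = """
<bigXML>
  <Product>
    <Attributes>
      <Attribute>
        <Descriptions><Description languageCode='EN'>Color</Description></Descriptions>
        <Value>0000ff</Value>
      </Attribute>
      <Attribute>
        <Descriptions><Description languageCode='EN'>Length</Description></Descriptions>
        <Value>2 mm</Value>
      </Attribute>
    </Attributes>
  </Product>
</bigXML>
"""


def has_description(attribute, text):
    """True if any Description under this Attribute has the given text."""
    return any(d.text == text for d in attribute.iterfind("Descriptions/Description"))


def values_per_product(root, text):
    """Mirrors the original query: the test is made once per Product,
    so every Value of a matching Product is returned."""
    result = []
    for product in root.iter("Product"):
        attributes = product.findall("Attributes/Attribute")
        if any(has_description(a, text) for a in attributes):
            result.extend(a.findtext("Value") for a in attributes)
    return result


def values_per_attribute(root, text):
    """Mirrors the fixed query: the test is made on each Attribute individually."""
    return [a.findtext("Value") for a in root.iter("Attribute") if has_description(a, text)]


root = ET.fromstring(SAMPLE)
print(values_per_product(root, "Color"))    # every value of the matching product
print(values_per_attribute(root, "Color"))  # only the value of the Color attribute
```

The only difference between the two functions is which node the condition is attached to, which is exactly what changing the loop variable from Product to Attribute does in the XQuery versions.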

Try
//Attribute[Descriptions[Description='Color']]/Value
It selects 0000ff.

Related

Exist-db Add node with XQUERY

I have a listPers.xml (a TEI list containing persons, obviously). I want to write a function to update the listPers.xml.
My function looks like this:
declare function app:addPerson($node as node(), $model as map(*)) {
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
let $list := doc(concat($config:app-root, '/resources/listPers_test.xml'))
return
update insert $person into $list//tei:listPerson
};
And the listPerson.xml looks more or less like a typical list of person entries. I have a tei:header (omitted here) followed by:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
</listPerson>
</body>
</text>
</TEI>
(Sorry for the ruined indentation; it's just an excerpt.)
I do not get an error, which means that my app:addPerson should be fine, right?
I want the listPers_test to look like this:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
<!-- here comes the output that I wish to have :-) -->
<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>
</listPerson>
</body>
</text>
</TEI>
In the long run, I aim for an HTML form that allows users to input names etc., where ids are generated using something like
lower-case(concat($surname, "_", $forename))
But I will not get into my questions about forms and XQuery, as I have barely done a quick Google trip on HTML forms and XQuery!
Can anyone give me a hint as to why the listPers_test.xml file is not updated with the second entry? :-)
All the best and thanks in advance to everyone,
K
Alright, I have a solution for anyone interested in it:
My first snippet, $person := ..., contains a string, not an element. Changing the line
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
to this one actually solves the issue:
let $person := <tei:person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></tei:person>
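The string-versus-node distinction is not specific to eXist-db. As a rough illustration only, here is a Python sketch with the standard-library ElementTree module (it does not use eXist's update extension): the new entry is built as a real element before it is appended to the list. The helper names `person_id` and `make_person` are hypothetical, the id rule follows the lower-case/concat idea from the question, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the reserved "xml" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Tiny stand-in for the person list from the question (TEI namespace omitted).
LIST = "<listPerson><person xml:id='abbadie_jacques'/></listPerson>"


def person_id(surname, forename):
    """Build an id such as 'test_albert' from the name parts."""
    return f"{surname}_{forename}".lower()


def make_person(forename, surname):
    """Build an element tree (not a string) for one person entry."""
    person = ET.Element("person", {f"{{{XML_NS}}}id": person_id(surname, forename)})
    pers_name = ET.SubElement(person, "persName")
    ET.SubElement(pers_name, "forename").text = forename
    ET.SubElement(pers_name, "surname").text = surname
    return person


list_person = ET.fromstring(LIST)
list_person.append(make_person("Albert", "Test"))
print(ET.tostring(list_person, encoding="unicode"))
```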

XmlReader how to read or skip a specific child that does not always exist

I have a big XML file that I must read with XmlReader because it cannot be loaded into memory. The XML is formatted like this (this is a reduced version):
<?xml version="1.0" encoding="windows-1252"?>
<Products>
<Product>
<Code>A14</Code>
<Name>Name1</Name>
<Manufacturer>
<Name>ManufacturerName</Name>
</Manufacturer>
<ProdCategories>
<ProdCategory>
<Code>015</Code>
<Name>ProdCategoryName</Name>
</ProdCategory>
</ProdCategories>
<Barcodes> <!-- note this line -->
</Barcodes>
</Product>
<Product>
<Code>A15</Code>
<Name>Name2</Name>
<Manufacturer>
<Name>ManufacturerName</Name>
</Manufacturer>
<ProdCategories>
<ProdCategory>
<Code>016</Code>
<Name>ProdCategoryName</Name>
</ProdCategory>
</ProdCategories>
<Barcodes>
<Barcode>
<Code>1234567890</Code> <!-- note this line -->
</Barcode>
</Barcodes>
</Product>
</Products>
Note the <Barcode><Code> elements: they are missing in the first <Product>.
This is the code that I use to read the file and put the data into a database:
XmlReader reader = XmlReader.Create("Products.xml");
reader.MoveToContent();
do
{
reader.ReadToFollowing("Code");
code = reader.ReadElementContentAsString();
reader.ReadToFollowing("Name");
Name = reader.ReadElementContentAsString();
reader.ReadToFollowing("Name");
ManufacturerName = reader.ReadElementContentAsString();
reader.ReadToFollowing("Code");
ProdCategoryCode = reader.ReadElementContentAsString();
reader.ReadToFollowing("Code");
BarcodeCode = reader.ReadElementContentAsString();
//Here I use "code", "Name", "ManufacturerName" variables to insert into a database
} while (reader.Read());
reader.Close();
Every XML tag is present in every product except the children of <Barcodes> (<Barcode><Code>), which are present only in some products. So I cannot jump to the next "Code" with the last ReadToFollowing, because if the barcode is missing I capture the <Code> of the next <Product> instead.
I can't control the XML output and can't modify it (it comes from a third party).
Is there a way to do something like "ReadToFollowing('<Barcodes><Barcode><Code>')", so that I can specify exactly what to look for and skip it if it is not found?
Thank you for your help, and excuse my bad English.
I would suggest pulling each Product element into a tree model, using either XNode.ReadFrom (https://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom(v=vs.110).aspx) or XmlDocument.ReadNode (https://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode(v=vs.110).aspx). Then you can use LINQ to XML query methods or XPath to read out the data of each Product safely, because a missing element simply yields no result instead of moving the reader into the next Product, while still maintaining a low memory footprint.
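The same "stream the file, but materialize one Product at a time" idea can be sketched in a language-neutral way. The following Python example uses the standard-library `xml.etree.ElementTree.iterparse`; it is an analogue of the suggestion above, not the .NET code itself. The function name `read_products` and the dictionary keys are made up for illustration, and the sample omits the XML declaration for brevity.

```python
import io
import xml.etree.ElementTree as ET

# Reduced sample in the shape shown in the question.
SAMPLE = b"""<Products>
  <Product>
    <Code>A14</Code>
    <Name>Name1</Name>
    <Manufacturer><Name>ManufacturerName</Name></Manufacturer>
    <ProdCategories>
      <ProdCategory><Code>015</Code><Name>ProdCategoryName</Name></ProdCategory>
    </ProdCategories>
    <Barcodes></Barcodes>
  </Product>
  <Product>
    <Code>A15</Code>
    <Name>Name2</Name>
    <Manufacturer><Name>ManufacturerName</Name></Manufacturer>
    <ProdCategories>
      <ProdCategory><Code>016</Code><Name>ProdCategoryName</Name></ProdCategory>
    </ProdCategories>
    <Barcodes><Barcode><Code>1234567890</Code></Barcode></Barcodes>
  </Product>
</Products>
"""


def read_products(source):
    """Yield one dict per Product, looking only inside that Product's subtree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Product":
            continue
        yield {
            "code": elem.findtext("Code"),
            "name": elem.findtext("Name"),
            "manufacturer": elem.findtext("Manufacturer/Name"),
            "category_code": elem.findtext("ProdCategories/ProdCategory/Code"),
            # findtext returns None when the optional element is absent.
            "barcode": elem.findtext("Barcodes/Barcode/Code"),
        }
        elem.clear()  # release the children of the Product just processed


products = list(read_products(io.BytesIO(SAMPLE)))
for product in products:
    print(product)
```

Because every lookup is relative to the current Product, a missing barcode can never be confused with the Code of the following Product, which is the failure mode of chained ReadToFollowing calls.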

Filtering by attribute

Below is an excerpt from an XML file with 65 lectures:
<?xml version="1.0" encoding="iso-8859-1" ?>
<university>
<lecture>
<class>English</class>
<hours>3</hours>
<pupils>30</pupils>
</lecture>
<lecture>
<class>Math</class>
<hours>4</hours>
<pupils>27</pupils>
</lecture>
<lecture>
<class>Science</class>
<hours>2</hours>
<pupils>25</pupils>
</lecture>
</university>
I need a where clause that gives me a list of lectures with more pupils than the English lecture. However, I don't want to hard-code the value "30"; I want to reference the pupils value of the English lecture instead.
E.g., I want to use a where clause with a condition like pupils > English.pupils, instead of pupils > 30.
(The "pupils > English.pupils" is just pseudocode as an example.)
A where clause isn't strictly necessary, but to use one you would make it part of a for iterator:
let $lectures := doc("lectures.xml")/university/lecture
let $english-pupils := $lectures[class = "English"]/pupils/xs:integer(.)
for $lecture in $lectures
where ($lecture/pupils/xs:integer(.) gt $english-pupils)
return $lecture
You could also avoid the for and where clauses altogether by using an XPath predicate:
let $lectures := doc("lectures.xml")/university/lecture
let $english-pupils := $lectures[class = "English"]/pupils/xs:integer(.)
return $lectures[pupils/xs:integer(.) gt $english-pupils]
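For comparison, the same "look up the reference value first, then filter" logic can be written as a short Python sketch with the standard-library ElementTree module. The function name `lectures_larger_than` is hypothetical, and the sample is the three-lecture excerpt from the question; the sketch assumes that exactly one lecture carries the reference class name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<university>
  <lecture><class>English</class><hours>3</hours><pupils>30</pupils></lecture>
  <lecture><class>Math</class><hours>4</hours><pupils>27</pupils></lecture>
  <lecture><class>Science</class><hours>2</hours><pupils>25</pupils></lecture>
</university>
"""


def lectures_larger_than(root, class_name):
    """Return the class names of all lectures with more pupils than `class_name`."""
    lectures = root.findall("lecture")
    # Look up the reference value instead of hard-coding it.
    # Raises StopIteration if no lecture has that class name.
    reference = next(
        int(lecture.findtext("pupils"))
        for lecture in lectures
        if lecture.findtext("class") == class_name
    )
    return [
        lecture.findtext("class")
        for lecture in lectures
        if int(lecture.findtext("pupils")) > reference
    ]


root = ET.fromstring(SAMPLE)
print(lectures_larger_than(root, "English"))
print(lectures_larger_than(root, "Science"))
```

In this three-lecture excerpt no lecture has more pupils than English, so the first call yields an empty list; using Science as the reference shows a non-empty result.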

XQuery: update insert attribute failed

test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
<name>French Toast aaa</name>
<price>$5.95</price>
<description>Our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>Thick slices made from our homemade sourdough bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu>
test.xqy:
for $x in doc('test.xml')//*
return update insert attribute id {'abcd'} into $x
I want to add a new attribute to every element.
The xqy file is pretty simple, yet I get:
[XPST0003] Unexpected end of query: 'insert attribut...'.
Any help?
You have two issues here: misuse of the update statement and a missing node keyword.
The BaseX-specific update statement is only meant to be used with the copy/modify construct; you don't need it here. Also, the expression for inserting any kind of node is always insert node $node [positional clause] into $target, where the [positional clause] is optional. Instead of a node variable $node, you can of course also use a node constructor like attribute id {'abcd'}.
The correct query is:
for $x in doc('test.xml')//*
return insert node attribute id {'abcd'} into $x

MarkLogic Join Query

Hi, I am new to MarkLogic and to the XQuery world. I can't think of a starting point for writing the following logic in MarkLogic XQuery. I would be thankful if somebody could give me an idea or a sample so I can achieve the following:
I want to query A.XML based on a word lookup in B.XML. The query should produce C.XML. The logic should be as follows:
A.XML
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>
B.XML
<WordLookUp>
<companies>
<company name="Vodafone">Vodafone</company>
<company name="Nokia">Nokia</company>
</companies>
<topics>
<topic group="Sports">Cricket</topic>
<topic group="Entertainment">HBO</topic>
<topic group="Finance">GDP</topic>
</topics>
<moods>
<mood number="4">Growth</mood>
<mood number="-5">Depression</mood>
<mood number="-3">Recession</mood>
</moods>
</WordLookUp>
C.XML (Result XML)
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
Search for each company/text() of B.xml in A.xml; if a match is found, create the tag:
TAG {company count="number of occurrences of that word"}company/@name{/company}
Search for each topic/text() of B.xml in A.xml; if a match is found, create the tag:
TAG {topic count="number of occurrences of that word"}topic/@group{/topic}
Search for each mood/text() of B.xml in A.xml; if matches are found, compute:
[occurrences of first word * {/mood[first word]/@number}] + [occurrences of second word * {/mood[second word]/@number}] + ...
Finally, get the word count of the content element.
This was a fun one, and I learned a few things in the process. Thanks!
Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").
The following solution uses two MarkLogic-specific functions:
cts:highlight (for replacing matching text with nodes which you can then count)
cts:tokenize (for breaking up a given string into word, space, and punctuation parts)
It also includes some powerful magic specific to those two functions, respectively:
the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and
the data model extension which adds these subtypes of xs:string:
cts:word,
cts:space, and
cts:punctuation.
Enjoy!
xquery version "1.0-ml";
(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
//MATCH
};
(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
cts:tokenize($text)[. instance of cts:word]
};
(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
$lookup := doc("B.xml")/WordLookUp
return
<root>
{$content}
<updatedElement>
<companies>{
for $company in $lookup/companies/company
let $results := local:find-matches($content, string($company))
where exists($results)
return
<company count="{count($results)}">{string($company/@name)}</company>
}</companies>
<mood>{
sum(
for $mood in $lookup/moods/mood
let $results := local:find-matches($content, string($mood))
return count($results) * $mood/@number
)
}</mood>
<topics>{
for $topic in $lookup/topics/topic
let $results := local:find-matches($content, string($topic))
where exists($results)
return
<topic count="{count($results)}">{string($topic/@group)}</topic>
}</topics>
<word-count>{
count(local:get-words($content))
}</word-count>
</updatedElement>
</root>
Let me know if you have any follow-up questions about how all the above works. At first, I was inclined to use cts:search or cts:contains, which are the bread and butter for search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as it was about looking up matching text within an already-given document. If you needed to extend this somehow to aggregate across a large number of documents, then you'd want to look into the additional use of cts:search or cts:contains.
One final caveat: if you think your content might have <MATCH> elements already, you'll want to use a different element name when calling cts:highlight (a name which you can guarantee won't conflict with your content's existing element names). Otherwise, you'll potentially get the wrong number of results (higher than the accurate count).
ADDENDUM:
I was curious if this could be done without cts:highlight, given that cts:tokenize already breaks up the text into all the words for you. The same result is produced using this alternative implementation of local:find-matches (provided you swap the order of the function declarations because one depends on the other):
(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};
It uses cts:stem to normalize the given word to its stem, so, for example searching for "pass" will match "passed", etc. However, this still won't work for multi-word (phrase) searches. So to be safe, I'd stick with using cts:highlight, which, like cts:search and cts:contains, can handle any cts:query you give it (including simple word/phrase searches like we do above).
It might make sense to step back and ask whether you would be better served by modeling your data and/or documents for use with a document-oriented database instead of an RDBMS.
This is simpler/shorter and fully compliant XQuery that contains no implementation extensions, which makes it work with any compliant XQuery 1.0 processor:
let $content := doc('file:///c:/temp/delete/A.xml')/*/*,
$lookup := doc('file:///c:/temp/delete/B.xml')/*,
$words := tokenize($content, '\W+')[.]
return
<root>
{$content}
<updatedElement>
<companies>
{for $c in $lookup/companies/*,
$occurs in count(index-of($words, $c))
return
if($occurs)
then
<company count="{$occurs}">
{$c/text()}
</company>
else ()
}
</companies>
<mood>
{
sum($lookup/moods/*[false or index-of($words, data(.))]/@number)
}
</mood>
<topics>
{for $t in $lookup/topics/*,
$occurs in count(index-of($words, $t))
return
if($occurs)
then
<topic count="{$occurs}">
{data($t/@group)}
</topic>
else ()
}
</topics>
<word-count>{count($words)}</word-count>
</updatedElement>
</root>
When applied on the provided files A.xml and B.XML (contained in the local directory c:/temp/delete), the wanted, correct result is produced:
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
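For readers who want to check the counting logic outside XQuery, here is a small Python sketch of the same tokenize-and-count approach. It is an illustration only: the lookup tables are hand-copied from B.xml, the name `summarize` is made up, and the mood score follows the formula from the question (occurrences times number), which for this input gives the same value as both answers. As in the answers, the "Creicket" typo is corrected to "Cricket".

```python
import re
from collections import Counter

# Content of A.xml with the typo fixed.
CONTENT = (
    " The state passed its first ban on using a handheld cellphone while "
    "driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO"
)

# Lookup data copied by hand from B.xml.
COMPANIES = ["Vodafone", "Nokia"]
TOPICS = {"Cricket": "Sports", "HBO": "Entertainment", "GDP": "Finance"}
MOODS = {"Growth": 4, "Depression": -5, "Recession": -3}


def summarize(content):
    """Count lookup words in the content and compute the mood score."""
    # Split on runs of non-word characters and drop empty tokens.
    words = [word for word in re.split(r"\W+", content) if word]
    counts = Counter(words)
    return {
        "companies": {name: counts[name] for name in COMPANIES if counts[name]},
        "topics": {group: counts[word] for word, group in TOPICS.items() if counts[word]},
        "mood": sum(counts[word] * number for word, number in MOODS.items()),
        "word_count": len(words),
    }


result = summarize(CONTENT)
print(result)
```

Like the second XQuery answer, this compares whole tokens exactly, so it does no stemming and cannot match multi-word phrases.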
