I'm still new to XQuery / MarkLogic and I'm having trouble understanding how to query based on the number of elements in an XML document. For example, imagine I have a database of XML documents roughly similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
As you can see in book[2], price is missing. Most documents in the database I'm working with would either have the child element price for each book or no price element attached to any of the book elements. My goal is to find only the documents where some of the child elements are missing (like the above XML); and ignore the documents where either all the child elements exist or where none of the child elements exist. So in my head the logic is something along the lines of "return results where the number of price elements is < the number of book elements AND > 0."
The best I can do so far is the following query:
let $some-docs := cts:search(fn:collection('/my/collection'),
cts:and-query((
cts:element-query(xs:QName("book"), cts:true-query()),
cts:not-query(cts:element-query(xs:QName("price"), cts:true-query()))
)))
return (xdmp:node-uri($some-docs))
But this obviously only returns documents where book elements exist and no price elements exist. I need a way of indicating I want the documents where the price element exists, but is missing for some books.
I would prefer a solution that uses the cts:search function, but any help is appreciated.
I need a way of indicating I want the documents where the price element exists, but is missing for some books.
So basically you need to find documents that contain both book elements with a price child (<bookstore><book><price/></book></bookstore>) and book elements that are missing the child <price/> element?
The simplest thing to do is modify the existing documents using a tool like CORB to include an element indicating that document matches your criteria or perhaps place them in a distinct collection. Then just use CTS to return documents with that added indicator.
If you don't want to touch the dataset you could create a field range index on /bookstore/book/price and /bookstore/book[not(./price)]/title. Then you just need to query for documents where both indexes are present with something like:
cts:and-query((
cts:field-word-query("field1", "*", ("wildcarded")),
cts:field-word-query("field2", "*", ("wildcarded"))
))
The count of elements within a document isn't exposed to the query layer, so you can't query on it directly. You could instead apply a predicate filter to the bookstore docs returned from the search and test whether any book element lacks a price:
cts:search(fn:collection('/my/collection'),
cts:element-query(xs:QName("book"), cts:true-query())
)[bookstore/book[not(price)]]
return results where the number of price elements is < the number of book elements AND > 0
You could write not(count(//price) = (count(//book), 0))
or perhaps
not(empty(//price) or empty(//book[not(price)]))
It seems a very strange query though. Perhaps you should be using a schema for validation?
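The "some but not all" condition can be checked outside MarkLogic as well. The following is a minimal Python sketch, not part of the original answers, that applies the same logic as the count-based expression above; the function name has_partial_prices and the three sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def has_partial_prices(xml_text):
    """Return True when some, but not all, book elements have a price child."""
    root = ET.fromstring(xml_text)
    books = root.findall(".//book")
    priced = [b for b in books if b.find("price") is not None]
    # Same idea as: number of priced books is > 0 AND < number of books
    return 0 < len(priced) < len(books)


# Hypothetical sample documents: partial, all priced, none priced
SOME = "<bookstore><book><price>30.00</price></book><book/></bookstore>"
ALL = "<bookstore><book><price>30.00</price></book></bookstore>"
NONE = "<bookstore><book/><book/></bookstore>"

if __name__ == "__main__":
    for name, doc in (("some", SOME), ("all", ALL), ("none", NONE)):
        print(name, has_partial_prices(doc))
```

Only the first document is reported as a match; the documents where every book or no book has a price are ignored.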
I am trying to understand difference between cts:element-query, cts:element-value-query and cts:element-word-query using cts:search().
If someone can achieve the same thing using all three, why were so many created?
I am sure I am missing something here. I have the following data:
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
<CD>
<TITLE>Greatest Hits</TITLE>
<ARTIST>Dolly Parton</ARTIST>
<COUNTRY>EU</COUNTRY>
<COMPANY>RCA</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1982</YEAR>
</CD>
</CATALOG>
I want to filter the data for country say "EU". I can achieve the same thing with any query listed below.
cts:search(//CD,cts:element-query(xs:QName("COUNTRY"),"EU"))
cts:search(//CD,cts:element-value-query(xs:QName("COUNTRY"),"EU"))
cts:search(//CD,cts:element-word-query(xs:QName("COUNTRY"),"EU"))
So what is the difference? When to use what? Can someone help me understand?
My understanding was that I should use cts:search with cts:element-query. I was checking whether I could get the same result with the other queries too. (I have gone through the documentation and I still don't understand.) Can someone please give me a simple explanation?
Those three cts:element-* query functions have some overlapping functionality, and it is possible to get the same results, but there are some key differences that affect what is possible and how efficient the query may be for your system.
cts:element-query() is a container query. It matches the element named in the first parameter, and the query in the second parameter is applied to that element and all of its descendants. A plain string such as "EU" is treated as a cts:word-query, so it would match text in COUNTRY or in any descendant element, if the structure were more complex.
Using xdmp:plan() to see the query plan,
xdmp:plan(cts:search(//CD,cts:element-query(xs:QName("COUNTRY"),"EU")))
you can see the plan has criteria with an unconstrained word-query being applied:
<qry:term-query weight="1">
<qry:key>17785254954065741518</qry:key>
<qry:annotation>word("EU")</qry:annotation>
</qry:term-query>
cts:element-value-query() only matches against simple elements (that is, elements that contain only text and have no element children) with text content matching the phrase from the second parameter.
The xdmp:plan() for that query:
xdmp:plan( cts:search(//CD,cts:element-value-query(xs:QName("COUNTRY"),"EU")) )
reveals that there is a value being applied specifically to the COUNTRY element:
<qry:term-query weight="1">
<qry:key>9358511946618902997</qry:key>
<qry:annotation>element(COUNTRY,value("EU"))</qry:annotation>
</qry:term-query>
cts:element-word-query() is similar to a cts:element-value-query except that it searches only through immediate text node children of the specified element as well as any text node children of child elements defined in the Admin Interface as element-word-query-throughs or phrase-throughs. It does not search through any other children of the specified element.
The xdmp:plan() for that query:
xdmp:plan( cts:search(//CD,cts:element-word-query(xs:QName("COUNTRY"),"EU")) )
shows that there is a word query applied specifically to the COUNTRY element:
<qry:term-query weight="1">
<qry:key>6958980695756965065</qry:key>
<qry:annotation>element(COUNTRY,word("EU"))</qry:annotation>
</qry:term-query>
cts:element-word-query is most helpful if you had mixed content and a known vocabulary of specific elements that you want to be able to "see through" when searching. One example is MS Word or XHTML markup in which there are elements wrapping text that are used for applying styling and formatting, such as <b>, <i>, and <u> inside of a <p> and you wanted to search for a word in a given paragraph and search through the b, i, and u child elements.
For this specific instance, looking to search for a value in a specific element, you should use:
cts:search(//CD,cts:element-value-query(xs:QName("COUNTRY"),"EU"))
It is the most specific and efficient means of telling MarkLogic that you want to search for the value "EU" in the COUNTRY element (and not in any of its children or descendants).
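To make the value versus word distinction concrete, here is a rough Python sketch of the matching semantics only. It is a simplification and not how MarkLogic resolves these queries: it ignores indexes, stemming, case and punctuation options, and the "query-throughs" configuration. The helper names tokens, value_match and word_match are made up for illustration.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens (a crude stand-in for tokenization)."""
    return re.findall(r"\w+", text.lower())


def value_match(element_text, phrase):
    """Value-style match: the element's whole text must equal the phrase."""
    return tokens(element_text) == tokens(phrase)


def word_match(element_text, phrase):
    """Word-style match: the phrase must appear somewhere in the element's text."""
    hay, needle = tokens(element_text), tokens(phrase)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


if __name__ == "__main__":
    print(value_match("EU", "EU"), word_match("EU", "EU"))
    print(value_match("Hide your heart", "Hide"), word_match("Hide your heart", "Hide"))
```

For COUNTRY both styles match "EU", which is why the three queries in the question return the same result. For a TITLE such as "Hide your heart", a value-style match on "Hide" fails while a word-style match succeeds.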
I am trying to map parts of the following source structure that has two sets of properties - one flat and one looped:
Source Document
<root>
<flat>
<prop1>foo</prop1>
<prop2>bar</prop2>
...
</flat>
<loop>
<prop>
<qual>propA</qual>
<data>baz</data>
<more>blah</more>
</prop>
<prop>
<qual>propB</qual>
<data>qux</data>
<more>bhal</more>
</prop>
...
</loop>
</root>
Specifically, the flat part is the PO1 segment of an X12 850 EDI document, and the looping properties are the subsequent REF segments.
These should be mapped to a looping destination structure of key-value pairs that looks like this:
Destination Document
<root>
<props>
<prop>
<name>prop1</name>
<value>foo</value>
</prop>
<prop>
<name>propA</name>
<value>baz</value>
</prop>
</props>
</root>
I would like to map only some of the values, depending on the property name.
What I've Tried
I have successfully mapped the flat portion to the destination using a table looping functoid and two table extractor functoids:
I have also successfully mapped the looping portion to the destination using a looping functoid and some equality checks to select only certain qual values:
When I attempt to include both of these mappings at the same time, the map succeeds, but doesn't generate the combined output.
The Question
How can I map both sections of the source document to the same looping section in the destination document?
Update 1
Turns out I had oversimplified the problem; the flat group of properties actually contains the property name in one node and the value in another node. This is what they actually look like:
<flat>
<name1>prop1</name1>
<value1>foo</value1>
<name2>prop2</name2>
<value2>bar</value2>
...
</flat>
The concept of @Dijkgraaf's answer still works with this change if you use a Value Mapping (Flattening) functoid to get the property name from the correct location.
Usually the only ways to solve this are:
Inline Custom XSLT via the Scripting functoid
Custom XSLT for the whole map, set through the map's Custom XSLT Path property
An intermediate schema that contains two Option nodes, used with two maps. The first map sends the flat structure to one node and the looping structure to the other. The second map then loops across both and maps them to the same node.
In your case, however, you can link both the flat nodes (prop1, prop2, ...) and the looping prop node to the same Looping functoid, link them to name and value in the destination, and set the link properties on the links from prop1, prop2, etc. to name so that they copy the node name instead of the value.
With your sample input that gives
<root>
<props>
<prop>
<name>prop1</name>
<value>foo</value>
</prop>
<prop>
<name>prop2</name>
<value>bar</value>
</prop>
<prop>
<name>propA</name>
<value>baz</value>
</prop>
<prop>
<name>propB</name>
<value>qux</value>
</prop>
</props>
</root>
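For readers who want to check the expected output without BizTalk, the combined mapping can be sketched in a few lines of Python. This is only an illustration of the transformation, not a BizTalk map: the functions to_pairs and build_destination are hypothetical, and the sample source omits the elided ("...") parts of the original document.

```python
import xml.etree.ElementTree as ET

# Sample source from the question, without the elided parts
SOURCE = """<root>
  <flat><prop1>foo</prop1><prop2>bar</prop2></flat>
  <loop>
    <prop><qual>propA</qual><data>baz</data><more>blah</more></prop>
    <prop><qual>propB</qual><data>qux</data><more>bhal</more></prop>
  </loop>
</root>"""


def to_pairs(source_xml):
    """Collect (name, value) pairs from the flat part, then from the loop."""
    src = ET.fromstring(source_xml)
    # Flat part: the element name is the property name ("copy name")
    pairs = [(child.tag, child.text) for child in src.find("flat")]
    # Looping part: qual holds the name, data holds the value
    pairs += [(p.findtext("qual"), p.findtext("data")) for p in src.findall("loop/prop")]
    return pairs


def build_destination(pairs):
    """Build the destination structure of name/value prop elements."""
    root = ET.Element("root")
    props = ET.SubElement(root, "props")
    for name, value in pairs:
        prop = ET.SubElement(props, "prop")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(build_destination(to_pairs(SOURCE)), encoding="unicode"))
```

Filtering to only some property names, as the question asks, would be an extra condition on the pairs before building the destination.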
I want to find all movies which don't have styles of anthology and art.
To achieve this I am using the following query
for $movie in db:open("movies","movies.xml")/movies/movie
where not(deep-equal(($movie/styles/style),("anthology","art")))
return $movie
However, all nodes are getting selected instead of filtering them.
What is going wrong?
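Your query doesn't make much sense, and deep-equal isn't useful here at all: a sequence of style element nodes is never deep-equal to a sequence of strings, so the negation is true for every movie. The following will return all movies that have no style equal to anthology or art: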
db:open("movies", "movies.xml")/movies/movie[not(styles/style = ("anthology", "art"))]
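The predicate relies on the general comparison "=" being existential: it is true if any style equals any of the listed values. A small Python sketch of the same filter, assuming a hypothetical movies document with a title child per movie (the real structure of movies.xml is not shown in the question):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the title element is an assumption
MOVIES = """<movies>
  <movie><title>A</title><styles><style>anthology</style><style>drama</style></styles></movie>
  <movie><title>B</title><styles><style>comedy</style></styles></movie>
  <movie><title>C</title><styles><style>art</style></styles></movie>
</movies>"""

EXCLUDED = {"anthology", "art"}


def movies_without_styles(xml_text, excluded):
    """Return titles of movies that have no style in the excluded set."""
    root = ET.fromstring(xml_text)
    keep = []
    for movie in root.findall("movie"):
        styles = [s.text for s in movie.findall("styles/style")]
        # Mirrors not(styles/style = (...)): reject if ANY style is excluded
        if not any(s in excluded for s in styles):
            keep.append(movie.findtext("title"))
    return keep


if __name__ == "__main__":
    print(movies_without_styles(MOVIES, EXCLUDED))
```

Movie A is rejected even though it also has the style "drama", because one excluded style is enough to make the comparison true.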
I am using BaseX to store XML data with multiple nodes in the following format:
<root>
<item id="65816" parent_id="45761" type="test">
<content>
<name>Name of my node on the tree</name>
</content>
</item>
</root>
The code above is essentially one typical node under 'root'.
Now, I am trying to delete a node based on the 'id' property of the 'Item' object.
I looked at the documentation on BaseX.org, but it does not explicitly tell me how to deal with nodes that have IDs attached to them. I am trying to do something like this:
XQUERY delete node //root/item.id="65816"
Note: The above line doesn't work. That is just to give an idea of what I am trying to achieve.
This is a newbie MarkLogic question. Imagine an xml structure like this, a condensation of my real business problem:
<Person id="1">
<Name>Bob</Name>
<City>Oakland</City>
<Phone>2122931022</Phone>
<Phone>3123032902</Phone>
</Person>
Note that a document can and will have multiple Phone elements.
I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.
I have tried this:
let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383")
let $c := cts:word-query("3933849383")
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)
which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:
<results>
<match id="18" phone="2123339494" name="bob" city="oakland"/>
<match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>
So I think I need some form of cts:search that allows both this boolean capability but also allows me to specify what part of each document gets returned. At that point then I could further process the result with XPATH. I need to do this efficiently so for example I think it would NOT be efficient to return a list of document uri's and then query for each document in a loop. Thanks!
Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.
First of all, you are better off using cts:element-value-query instead of cts:word-query. It will allow you to limit the searched values to a specific element. It performs best when you add an element range index for that element, but it is not required. It can rely on the always present word index as well.
Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query functions (as well as all other related functions) accept multiple search strings as one sequence argument. They are automatically treated as or-query.
Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone elements are still aware of where they came from. You can easily use XPath to navigate to the parent and siblings.
Fourthly, there is nothing wrong with looping over the search results. It may sound a bit weird, but it doesn't cost much extra performance. In MarkLogic Server, at least, the cost is pretty much negligible. Most performance is lost when you try to return many results (more than several thousand), in which case most of the time goes into serializing it all. And if you are likely to handle lots of search results, it is wise to start using pagination straight away.
To get what you ask, you could use the following code:
<results>{
for $phone in
cts:search(
doc()/Person/Phone,
cts:element-value-query(
xs:QName("Phone"),
("3738494044", "2373839383", "3933849383")
)
)
return
<match id="{data($phone/../@id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>
Best of luck.
Here's what I would do:
let $numbers := ("3738494044", "2373839383", "3933849383")
return
<results>{
for $person in cts:search(/Person, cts:element-value-query(xs:QName("Phone"),$numbers))
return
<match id="{data($person/@id)}" name="{data($person/Name)}" city="{data($person/City)}">
{
for $phone in $person/Phone[cts:contains(.,$numbers)]
return element phone {$phone}
}
</match>
}</results>
First, there's an implicit OR when passing multiple values into word-query and value-query and their cousins, and this query is more efficiently resolved from the indexes, so do this when you can.
Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.
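The grouping described above (one match per person, with all of that person's matching numbers inside it) can be illustrated with a short Python sketch. It only mirrors the shape of the result and says nothing about MarkLogic performance; the function name matches and the second sample person are made up for illustration.

```python
import xml.etree.ElementTree as ET

# First document is from the question; the second is a hypothetical non-match
PEOPLE = [
    '<Person id="1"><Name>Bob</Name><City>Oakland</City>'
    '<Phone>2122931022</Phone><Phone>3123032902</Phone></Person>',
    '<Person id="2"><Name>Mary</Name><City>Denver</City>'
    '<Phone>5550000000</Phone></Person>',
]


def matches(docs, numbers):
    """Return one record per person who has at least one wanted phone number."""
    wanted = set(numbers)
    results = []
    for doc in docs:
        person = ET.fromstring(doc)
        # Inner loop: keep every phone of this person that is in the list
        hits = [p.text for p in person.findall("Phone") if p.text in wanted]
        if hits:
            results.append({
                "id": person.get("id"),
                "name": person.findtext("Name"),
                "city": person.findtext("City"),
                "phones": hits,
            })
    return results


if __name__ == "__main__":
    print(matches(PEOPLE, ["2122931022", "3123032902", "9999999999"]))
```

A person who matches on two numbers appears once with both numbers, rather than twice as in a per-Phone result.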
I would not create a range index for this - no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.
You could do all of this with the SearchAPI and a little XSLT. That would make it easy to start combining names and numbers and other conditions in a single query.