How to reduce duplicated nodes in XQuery result?

Currently, I have a problem with duplicate nodes.
Here is the query that produces the duplicate node result:
for $cityA in doc("countries.xml")//city
for $cityB in doc("countries.xml")//city
where not ($cityA is $cityB) and $cityA/name = $cityB/name
return $cityA/name
The result of my query is:
<name>Hyderabad</name>
<name>Hyderabad</name>
But what I want is this:
<name>Hyderabad</name>
I understand why the duplicates occur in my query. But how can I get a result without duplicates?
The countries.xml file is available for download.

The problem is that you calculate the cross product, which you're then filtering. There are different ways to mitigate this. An obvious one would be to return only distinct values:
for $city in distinct-values(
for $cityA in doc("countries.xml")//city
for $cityB in doc("countries.xml")//city
where not ($cityA is $cityB) and $cityA/name = $cityB/name
return $cityA/name
)
return <name>{ $city }</name>
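To see where the duplicates come from, here is a small Python sketch (not XQuery) that mimics the two nested for loops with the standard library's xml.etree.ElementTree. The sample document and function names are hypothetical stand-ins for countries.xml. The cross product yields each matching pair twice, once as (A, B) and once as (B, A), and de-duplicating the names afterwards, similar in spirit to distinct-values, leaves one entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for countries.xml: two cities share the name Hyderabad.
SAMPLE = """
<countries>
  <country><city><name>Hyderabad</name></city><city><name>Delhi</name></city></country>
  <country><city><name>Hyderabad</name></city></country>
</countries>
"""

def cross_product_names(root):
    """Mimic the nested for loops: every ordered pair of distinct cities with equal names."""
    cities = list(root.iter("city"))
    return [a.findtext("name")
            for a in cities
            for b in cities
            if a is not b and a.findtext("name") == b.findtext("name")]

def distinct_names(names):
    """Order-preserving de-duplication, comparable to applying distinct-values()."""
    return list(dict.fromkeys(names))

root = ET.fromstring(SAMPLE)
print(cross_product_names(root))                  # ['Hyderabad', 'Hyderabad']
print(distinct_names(cross_product_names(root)))  # ['Hyderabad']
```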
But this feels like a horrible hack. It is better to make sure that you return only the "first" result, which can be done using the node order operator << in the where clause:
for $cityA in doc("countries.xml")//city
for $cityB in doc("countries.xml")//city
where not ($cityA is $cityB)
  and $cityA/name = $cityB/name
  and $cityA << $cityB
return $cityA/name
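The << comparison keeps only the pair in which $cityA comes first in document order. The Python sketch below models that idea, using list positions from ElementTree's document-order iteration as a stand-in for node order; the sample data and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two cities named Hyderabad, one named Delhi.
SAMPLE = """
<countries>
  <country><city><name>Hyderabad</name></city><city><name>Delhi</name></city></country>
  <country><city><name>Hyderabad</name></city></country>
</countries>
"""

def names_from_ordered_pairs(root):
    """Keep a matching pair (a, b) only when a comes before b, like `$cityA << $cityB`."""
    cities = list(root.iter("city"))  # iter() yields elements in document order
    return [a.findtext("name")
            for i, a in enumerate(cities)
            for b in cities[i + 1:]
            if a.findtext("name") == b.findtext("name")]

print(names_from_ordered_pairs(ET.fromstring(SAMPLE)))  # ['Hyderabad']
```

With exactly two cities per shared name, as in the question, each name comes back once.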
But this still has the unnecessary explicit cross product. You can avoid it by changing the query:
for $city in doc("countries.xml")//city
where $city/following::city[name=$city/name]
return $city/name
This one loops over all cities and selects those that have another city occurring later in the document with the same name. You could even use predicates to perform the same query in a single line of plain XPath (a subset of XQuery):
doc("countries.xml")//city[following::city/name=name]/name
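A rough Python analogue of the following:: predicate, assuming cities are never nested inside one another (so "later in document order" is simply "later in the list"): keep a city's name if some later city has the same name. The sample document and function name are hypothetical; this models the logic only and is not how an XPath engine evaluates the axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one repeated city name.
SAMPLE = """
<countries>
  <country><city><name>Hyderabad</name></city><city><name>Delhi</name></city></country>
  <country><city><name>Hyderabad</name></city></country>
</countries>
"""

def cities_with_later_namesake(root):
    """Like //city[following::city/name = name]/name: keep a name if a later city shares it."""
    names = [city.findtext("name") for city in root.iter("city")]  # document order
    return [name for i, name in enumerate(names) if name in names[i + 1:]]

print(cities_with_later_namesake(ET.fromstring(SAMPLE)))  # ['Hyderabad']
```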

In XQuery 3.0 you can use grouping:
for $city in doc('countries.xml')//city
group by $name := $city/name
where count($city) ge 2
return <name>{$name}</name>
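The group by version counts how many cities share each name and keeps the names with at least two members. The same counting idea in Python, using collections.Counter as a stand-in for the grouping step (sample data and function name are hypothetical):

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample with one repeated city name.
SAMPLE = """
<countries>
  <country><city><name>Hyderabad</name></city><city><name>Delhi</name></city></country>
  <country><city><name>Hyderabad</name></city></country>
</countries>
"""

def names_occurring_at_least(root, minimum=2):
    """Group cities by name and keep names whose group has `minimum` or more members."""
    counts = Counter(city.findtext("name") for city in root.iter("city"))
    return [name for name, n in counts.items() if n >= minimum]

print(names_occurring_at_least(ET.fromstring(SAMPLE)))  # ['Hyderabad']
```

Because each group is visited once, every shared name is returned exactly once, however many cities carry it.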

Related

MarkLogic optic query using two indexes returns no results

I want to use the MarkLogic optic API to join two range indexes but somehow they don't join. Is the query I wrote wrong or can't I compare the indexes used?
I have two indexes defined:
an element-attribute range index x/@refid
a field range index 'id'
Both are of type string and have the same collation defined. Both indexes have data that I can retrieve with the cts:values() function. Both are huge indexes, and I want to join them using Optic, so I have constructed the following query:
import module namespace op="http://marklogic.com/optic"
at "/MarkLogic/optic.xqy";
let $subfrag := op:fragment-id-col("subfrag")
let $notfrag := op:fragment-id-col("notfrag")
let $query :=
cts:and-query((
cts:collection-query("latest")
))
let $subids := op:from-lexicons(
map:entry("subid", cts:field-reference("id")), (), $subfrag) => op:where($query)
let $notids := op:from-lexicons(
map:entry("notid", cts:element-attribute-reference(xs:QName("x"), xs:QName("refid"))),
(),
$notfrag)
return $subids
=> op:join-cross-product($notids)
=> op:where(op:eq($notfrag, $subfrag))
=> op:result()
This query uses join-cross-product, and when I remove the op:where clause I get all values, left and right. I verified that some are equal, so the clause should keep only the rows I'm actually interested in. But somehow it doesn't work and I get an empty result. Also, if I replace one of the values in op:eq with a string value, it doesn't return a result.
When I use the same variable on both sides of op:eq (like op:eq($notfrag, $notfrag)), I get results back, so the statement itself works; just not the comparison between the two indexes.
I have also used variants with join-inner and left-outer-join but those are also returning no results.
Am I comparing two incomparable indexes, or am I missing some statement? (The documentation and examples are a bit thin.)
(of course I can solve by not using optics but in this case it would be a perfect fit)
[update]
I eventually got it working by changing the final statement:
return $subids
=> op:join-cross-product($notids)
=> op:where(op:eq(op:col('subid'), op:col('notid')))
=> op:result()
So somehow you cannot use the fragment definitions in the condition. After this I replaced the join-cross-product with a join-inner construction which should be a bit more efficient.
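Why join-inner on the two value columns should return the same rows as join-cross-product followed by op:where(op:eq(...)), while doing less work, can be illustrated with a Python sketch in which plain lists stand in for the two lexicon row sets (the values are hypothetical). Filtering the full cross product compares every pair; a hash-based inner join only looks up matching keys. This models the idea only and says nothing about how the Optic engine actually executes joins.

```python
def cross_product_where(left, right):
    """Form every (left, right) pair, then keep the pairs with equal values."""
    return [(s, n) for s in left for n in right if s == n]

def inner_join(left, right):
    """Hash join: index the right side by value, then probe it once per left value."""
    index = {}
    for n in right:
        index.setdefault(n, []).append(n)
    return [(s, n) for s in left for n in index.get(s, [])]

# Hypothetical subid / notid values.
SUBIDS = ["a", "b", "c"]
NOTIDS = ["b", "c", "d"]

print(cross_product_where(SUBIDS, NOTIDS))  # [('b', 'b'), ('c', 'c')]
print(inner_join(SUBIDS, NOTIDS))           # [('b', 'b'), ('c', 'c')]
```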
And to be complete: I initially used the example from the MarkLogic documentation found here (https://docs.marklogic.com/guide/app-dev/OpticAPI#id_87356), specifically the last example, where a fragment column definition is passed as a parameter to the join-inner statement. That approach didn't work in my case.
Cross products are typically useful only for small row sets.
Putting both references in the same from-lexicons() accessor does an implicit join, meaning that the engine forms rows by constructing a local cross-product of the values indexed for each document.
Such a query could be expressed by:
op:from-lexicons(
  map:entry("subid", cts:field-reference("id"))
  => map:with("notid", cts:element-attribute-reference(xs:QName("x"),
                                                        xs:QName("refid"))))
=> op:where(cts:collection-query("latest"))
=> op:result()
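The "local cross-product of the values indexed for each document" can be pictured with the Python sketch below. Dictionaries keyed by hypothetical document URIs stand in for the index entries, and itertools.product pairs every subid with every notid from the same document only; values from different documents are never combined. This is an illustration of the row shape, not of MarkLogic internals.

```python
from itertools import product

# Hypothetical per-document index values.
DOCS = {
    "/doc1.xml": {"subid": ["s1"], "notid": ["n1", "n2"]},
    "/doc2.xml": {"subid": ["s2"], "notid": ["n3"]},
}

def implicit_join_rows(docs):
    """Build rows from the cross product of values *within* each document."""
    rows = []
    for values in docs.values():
        for subid, notid in product(values["subid"], values["notid"]):
            rows.append({"subid": subid, "notid": notid})
    return rows

for row in implicit_join_rows(DOCS):
    print(row)
# {'subid': 's1', 'notid': 'n1'}
# {'subid': 's1', 'notid': 'n2'}
# {'subid': 's2', 'notid': 'n3'}
```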
Making the joins explicitly could be done with:
let $subids := op:from-lexicons(
map:entry("subid", cts:field-reference("id")), (), $subfrag)
=> op:where($query)
let $notids := op:from-lexicons(
map:entry("notid", cts:element-attribute-reference(xs:QName("x"),
xs:QName("refid"))),
(),
$notfrag)
return $subids
=> op:join-inner($notids, op:on($notfrag, $subfrag))
=> op:result()
Hoping that helps,

searching in multiple collections joined by common fields in xquery marklogic

I have two collections ('A' and 'B') with millions of transport insurance documents. The two collections have four elements in common (customer-no, date-of-insurance, insurance-no, accident-number), and one element (license-no) exists only in collection 'A'. I want to extract all the documents that are present in both collections and also have the element from collection 'A'. I am able to retrieve all the customer-nos from 'A' with cts:search. Then I loop through each of these customer-nos to look for license-no in 'A'. This gives an empty sequence, but I know that is not possible. Could someone guide me to the appropriate search logic?
let $col-A := cts:search(
doc(),
cts:and-query((
cts:collection-query('col-A'),
cts:element-value-query(xs:QName('abc:Acusno'), '*', (("wildcarded")))
)))
for $each in $col-A
let $col-B := cts:search(doc(),
cts:and-query((cts:collection-query('col-B'),
cts:element-value-query(xs:QName('abc:Bcusno'), $each)
)))
return $col-B
This returns an empty sequence.
Your first cts:search is returning entire documents, which you are then passing in as argument into the value-query. You probably want to pass in just the value of abc:Acusno. You could do that with something like $each//abc:Acusno.
Your code is not using a very efficient approach though, and what if certain Acusno values occur multiple times?
I would recommend putting a range index on abc:Acusno and using cts:values to pull up the unique values that match a given query. Then feed that entire list, without any looping, as one argument to a query against abc:Bcusno. You don't have to use a range index and a range query on Bcusno, but it could be useful to have that index anyway. The code would then look something like this:
let $query :=
cts:and-query((
cts:collection-query('col-A'),
cts:element-query(xs:QName('abc:Acusno'), cts:true-query())
))
let $customerNrs :=
cts:values(
cts:element-reference(xs:QName("abc:Acusno")),
(),
(),
$query
)
return cts:search(
collection(),
cts:and-query((
cts:collection-query('col-B'),
cts:element-range-query(xs:QName('abc:Bcusno'), '=', $customerNrs)
))
)
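The shape of this approach, one pass to collect the unique customer numbers and one query that accepts the whole list, can be sketched in Python with lists of dictionaries standing in for the two collections (all data and function names are hypothetical). A sorted set plays the role of cts:values, and a membership test against the whole set plays the role of a single range query given a sequence of values.

```python
# Hypothetical documents from collections A and B.
COL_A = [{"Acusno": "C1", "license-no": "L1"}, {"Acusno": "C2"}, {"Acusno": "C1"}]
COL_B = [{"Bcusno": "C1"}, {"Bcusno": "C3"}, {"Bcusno": "C2"}]

def unique_values(docs, field):
    """Distinct, sorted values of `field`, similar in spirit to cts:values on a range index."""
    return sorted({d[field] for d in docs if field in d})

def docs_matching_any(docs, field, values):
    """One filter with the whole value list instead of one query per value."""
    wanted = set(values)
    return [d for d in docs if d.get(field) in wanted]

customer_nrs = unique_values(COL_A, "Acusno")
print(customer_nrs)                                   # ['C1', 'C2']
print(docs_matching_any(COL_B, "Bcusno", customer_nrs))
# [{'Bcusno': 'C1'}, {'Bcusno': 'C2'}]
```

Collecting distinct values first also answers the concern about repeated Acusno values: C1 appears twice in COL_A but is looked up only once.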
Note: be careful when returning full search lists like this. You might want to paginate the response.
HTH!

number inside [] after order by in XQuery

The following is the query:
for $x in $books
where $x/price>=38
order by ($x/price)[l]
return ($x/title, $x/price)
What is denoted by the [1] located after order by ($x/price)?
It looks to me like a lowercase L rather than the digit one.
If it's really a one [1] then it means select the first item in the sequence $x/price. I suspect each book has only one price, in which case it's completely redundant.
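When a book can have several price elements, the [1] does matter: an XQuery order by key must atomize to at most one value, and where $x/price >= 38 is a general comparison that is true if any price qualifies. The Python sketch below mirrors that with a hypothetical catalog in which book A has two prices; it sorts by the first price numerically, which is a simplification for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; book A has two price elements.
BOOKS = """
<books>
  <book><title>B</title><price>45</price></book>
  <book><title>A</title><price>38</price><price>50</price></book>
  <book><title>C</title><price>20</price></book>
</books>
"""

def titles_by_first_price(root, minimum=38):
    """Filter like `where $x/price >= 38`, then order by the first price, like ($x/price)[1]."""
    selected = []
    for book in root.iter("book"):
        prices = [float(p.text) for p in book.findall("price")]
        if any(p >= minimum for p in prices):  # general comparison: any qualifying price
            selected.append((prices[0], book.findtext("title")))  # first price is the sort key
    return [title for _, title in sorted(selected)]

print(titles_by_first_price(ET.fromstring(BOOKS)))  # ['A', 'B']
```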

Compare two elements of the same document in MarkLogic

I have a MarkLogic 8 database in which there are documents which have two date time fields:
created-on
active-since
I am trying to write an XQuery query to find all the documents for which the value of active-since is less than the value of created-on.
Currently I am using the following FLWOR expression:
for $entity in fn:collection("entities")
let $id := fn:data($entity//id)
let $created-on := fn:data($entity//created-on)
let $active-since := fn:data($entity//active-since)
where $active-since < $created-on
return
(
$id,
$created-on,
$active-since
)
The above query takes too long to execute, and its execution time will keep growing as the number of documents increases.
Also, I have element range indexes on both of the above-mentioned dateTime fields, but they are not being used here. The cts:element-range-query function only compares one element with a set of atomic values; in my case I am trying to compare two elements of the same document.
I think there should be a better and optimized solution for this problem.
Please let me know if there is any search function or other approach that would be suitable in this scenario.
This may be efficient enough for you.
Take the values of one element and build a range query per value. This all uses the range indexes, so in that sense it is efficient. However, at some point a large query is built; it reads similar to a FLWOR statement. If you really wanted to be a bit more efficient, you could find out which of your elements has fewer unique values (the size of the index) and use that one for the iteration, thus building a smaller query. Also note that on the cts:element-values call I constrain it to your collection, in case you happen to have that element in documents outside the collection. This keeps the list to only those values you know are in your collection:
let $q := cts:or-query(
  for $created-on in cts:element-values(xs:QName("created-on"), (), (), cts:collection-query("entities"))
  return cts:element-range-query(xs:QName("active-since"), "<", $created-on)
)
return
cts:search(
fn:collection("entities"),
$q
)
So, let's explain what is happening with a simple example:
Let's say I have elements A and B, each with a range index defined.
Let's pretend we have these combinations in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9
let $q := cts:or-query(
  for $a in cts:element-values(xs:QName("A"))
  return cts:element-range-query(xs:QName("B"), "<", $a)
)
This would create the following query:
cts:or-query((
  cts:element-range-query(xs:QName("B"), "<", 2),
  cts:element-range-query(xs:QName("B"), "<", 4),
  cts:element-range-query(xs:QName("B"), "<", 5)
))
And in the example above, the only match would be the document with the combination: (5,4)
You might try using cts:value-tuples(). Pass in three references: active-since, created-on, and the URI reference. Then iterate over the results looking for ones where active-since is less than created-on, and you'll have the URI of each matching document.
It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
I am now using the following script to get the count of documents for which the value of active-since is less than the value of created-on:
fn:sum(
for $value-pairs in cts:value-tuples(
(
cts:element-reference(xs:QName("created-on")),
cts:element-reference(xs:QName("active-since"))
),
("fragment-frequency"),
cts:collection-query("entities")
)
let $created-on := json:array-values($value-pairs)[1]
let $active-since := json:array-values($value-pairs)[2]
return
if($active-since lt $created-on) then cts:frequency($value-pairs) else 0
)
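The counting logic of this script can be checked in isolation with a Python sketch. Here hypothetical (created-on, active-since, frequency) tuples stand in for what cts:value-tuples and cts:frequency provide: each tuple is compared once, and its fragment frequency is added only when active-since is earlier than created-on.

```python
from datetime import datetime

# Hypothetical (created-on, active-since, fragment frequency) tuples.
TUPLES = [
    (datetime(2016, 1, 10), datetime(2016, 1, 5), 3),  # active-since earlier: counted
    (datetime(2016, 2, 1), datetime(2016, 3, 1), 2),   # active-since later: skipped
    (datetime(2016, 4, 1), datetime(2016, 4, 1), 1),   # equal: skipped
]

def count_active_before_created(tuples):
    """Sum the frequencies of tuples whose active-since is before created-on."""
    return sum(freq for created_on, active_since, freq in tuples
               if active_since < created_on)

print(count_active_before_created(TUPLES))  # 3
```

Since each distinct value pair is examined once, the work grows with the number of distinct tuples rather than the number of documents.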
Sorry, I don't have enough reputation to comment, so I need to reply here on your answer. Why do you think that ML will not return (2,3) and (4,2)? I believe we are using an or-query, which will treat any single matching query as true and return the document.
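The question raised in this comment can be checked by modeling the generated cts:or-query on the five example (A, B) documents in Python. The sketch assumes each range query matches a document whenever its B value satisfies the comparison; it is a model of the query semantics, not MarkLogic code. The or-query is then true for any B below the largest A value, which differs from comparing A and B within the same document.

```python
# The (A, B) pairs from the five example documents.
DOCS = [(2, 3), (4, 2), (2, 7), (5, 4), (2, 9)]

def or_query_matches(docs):
    """Model OR(B < a for each distinct A value), as built by the cts:or-query."""
    a_values = sorted({a for a, _ in docs})  # distinct A values: 2, 4, 5
    return [(a, b) for a, b in docs if any(b < v for v in a_values)]

def per_document_matches(docs):
    """The intended check: compare B with the A of the *same* document."""
    return [(a, b) for a, b in docs if b < a]

print(or_query_matches(DOCS))      # [(2, 3), (4, 2), (5, 4)]
print(per_document_matches(DOCS))  # [(4, 2), (5, 4)]
```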

How to set the value of one field to another field using XQuery

I'm very new to XQuery and MarkLogic. What is the XQuery version of the following statement?
update all_the_records
set B_field = A_field
where B_field is null and A_field is not null
Something like this might get you started. But remember that you're working with trees, not tables. Things are generally more complicated because of that extra dimension.
for $doc in collection()/doc[not(b)][a]
let $a as element() := $doc/a
return xdmp:node-insert-child($doc, element b { $a/@*, $a/node() })
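For readers who want to see the tree-shaped version of "set B = A where B is null and A is not null" outside MarkLogic, here is a Python sketch using xml.etree.ElementTree. The function name and sample documents are hypothetical. Like the XQuery above, it only touches documents that have an a but no b, and appends a new last child b that copies a's attributes, text, and child nodes.

```python
import copy
import xml.etree.ElementTree as ET

def copy_a_into_missing_b(doc):
    """If `doc` has <a> but no <b>, append <b> with a's attributes and content."""
    a = doc.find("a")
    if doc.find("b") is None and a is not None:
        b = ET.SubElement(doc, "b", dict(a.attrib))  # appended as the last child
        b.text = a.text                               # leading text node of <a>
        for child in a:
            b.append(copy.deepcopy(child))            # child elements (with their tails)
    return doc

doc = ET.fromstring('<doc><a x="1">hello</a></doc>')
print(ET.tostring(copy_a_into_missing_b(doc)))
# b'<doc><a x="1">hello</a><b x="1">hello</b></doc>'
```

Documents that already have a b are left unchanged, matching the [not(b)][a] filter.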
