XQuery on MarkLogic using OR

This is a newbie MarkLogic question. Imagine an xml structure like this, a condensation of my real business problem:
<Person id="1">
<Name>Bob</Name>
<City>Oakland</City>
<Phone>2122931022</Phone>
<Phone>3123032902</Phone>
</Person>
Note that a document can and will have multiple Phone elements.
I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.
I have tried this:
let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383")
let $c := cts:word-query("3933849383")
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)
which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:
<results>
<match id="18" phone="2123339494" name="bob" city="oakland"/>
<match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>
So I think I need some form of cts:search that allows this boolean capability and also lets me specify which part of each document is returned. I could then process the result further with XPath. I need to do this efficiently; for example, I think it would NOT be efficient to return a list of document URIs and then query for each document in a loop. Thanks!

Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.
First of all, you are better off using cts:element-value-query instead of cts:word-query. It lets you limit the search to the values of a specific element. It does not require an element range index; it can rely on the indexes that are always present.
Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query functions (as well as all other related functions) accept multiple search strings as one sequence argument. They are automatically treated as or-query.
Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone elements are still aware of where they came from. You can easily use XPath to navigate to their parent and siblings.
Fourthly, there is nothing wrong with looping over the search results. It may sound a bit odd, but it costs very little extra; in MarkLogic Server the overhead is pretty much negligible. Most of the time is lost when you try to return many results (more than several thousand), and then it goes into serializing them all. If you are likely to handle lots of search results, it is wise to use pagination from the start.
To get what you ask, you could use the following code:
<results>{
  for $phone in
    cts:search(
      doc()/Person/Phone,
      cts:element-value-query(
        xs:QName("Phone"),
        ("3738494044", "2373839383", "3933849383")
      )
    )
  return
    <match id="{data($phone/../@id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>
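For the pagination suggested above, a minimal sketch might look like the following. The $page and $page-size variables and the page attribute are hypothetical names introduced for illustration; in practice the values would come from the request. It uses fn:subsequence so that only one page of matches is turned into match elements.

```xquery
xquery version "1.0-ml";

(: Hypothetical paging values, not part of the original question. :)
let $page := 1
let $page-size := 25
let $start := ($page - 1) * $page-size + 1
let $numbers := ("3738494044", "2373839383", "3933849383")
return
  <results page="{$page}">{
    (: Only the requested slice of matching Phone elements is processed. :)
    for $phone in
      fn:subsequence(
        cts:search(
          doc()/Person/Phone,
          cts:element-value-query(xs:QName("Phone"), $numbers)
        ),
        $start,
        $page-size
      )
    return
      <match id="{data($phone/../@id)}" phone="{data($phone)}"
             name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
  }</results>
```

Changing $page to 2 would return matches 26 through 50, and so on.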
Best of luck.

Here's what I would do:
let $numbers := ("3738494044", "2373839383", "3933849383")
return
  <results>{
    for $person in cts:search(/Person, cts:element-value-query(xs:QName("Phone"), $numbers))
    return
      <match id="{data($person/@id)}" name="{data($person/Name)}" city="{data($person/City)}">
      {
        for $phone in $person/Phone[cts:contains(., $numbers)]
        return element phone {$phone}
      }
      </match>
  }</results>
First, there's an implicit OR when passing multiple values into word-query and value-query and their cousins, and this query is more efficiently resolved from the indexes, so do this when you can.
Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.
I would not create a range index for this - no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.
You could do all of this with the SearchAPI and a little XSLT. That would make it easy to start combining names and numbers and other conditions in a single query.
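As a rough sketch of the Search API route (without the XSLT step), a value constraint on Phone could be declared in the options and combined with OR in the query string. The constraint name "phone" is an assumption made for this example, and the Phone element is assumed to be in no namespace, as in the sample document.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: "phone" is a hypothetical constraint name chosen for this sketch. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="phone">
      <value>
        <element ns="" name="Phone"/>
      </value>
    </constraint>
  </options>
return
  (: The default grammar treats OR as a boolean operator between constraints. :)
  search:search("phone:3738494044 OR phone:2373839383", $options)
```

The call returns a search:response element, which is what you would then reshape into the match elements, for instance with XSLT as suggested above.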

Related

Incorrect work of autocomplete with Cyrillic

When I send a request to https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json?query=Вильнюс with Cyrillic input, nothing comes back, but with Latin input (https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json?query=Viln) all is well. What is the problem, or what am I doing wrong?
You're not doing anything wrong. Autocomplete is designed to give you addresses that contain (perfectly match) your input string, and the results are sorted by relevance.
When you make your query in Russian and provide only "Вильнюс" as input, the service finds a lot of results (street names) that it considers more relevant than the city. The city name is also found, but since the service doesn't think this is what you're searching for, it puts the city much lower in the results list. You don't see it because you're limiting your query to the first 10 matches (with the maxresults=10 parameter). If you change the maxresults parameter to 20, for example, you will see that Vilnius appears in 16th place in the API response.
If you want the service to better understand what is the thing you're querying for, you'll need to provide additional information. For example, if you continue typing and your input string is now "Вильнюс " (with a space at the end) or "Вильнюс Л" (a space and another letter), the service will understand what you mean and will return the result you want.
Another way of providing more information to change the way the service ranks the results is by adding a spatial filter, like the country, mapview, or prox parameters mentioned in the API Reference section of the documentation. Alternatively, the resultType parameter can help you filter out all the results with street names and return only city names, if that's what you want. These are just some options available, the one that is right for you will depend on your use case.

Aggregations in Marklogic 8 Java

I'm trying to group all the documents based on an element value. With XQuery, I'm able to get each element value and its corresponding count, but with the Java API I'm not able to do that.
XQuery:
for $name in distinct-values(doc()/document/<element_name>)
return fn:concat("Element Value:",$name,", Count:",fn:count(doc()/document[<element_name> eq $name]));
Output:
Element Value:A, Count:100
Element Value:B, Count:200
Java:
QueryManager qryMgr = client.newQueryManager();
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition querydef = qb.containerQuery(qb.element("<element_name>"), qb.term("A"));
SearchHandle handle = new SearchHandle();
qryMgr.search(querydef, handle);
System.out.println(handle.getTotalResults());
With this method, I'm able to get the document count only for one particular value. Is there any way to get the counts for all values at once? Kindly help!
If I understand your use case, you want to know all the values of a particular element and how many documents have each value. That's exactly what a range index is for.
Try adding a range index on "element_name" - you can use the ML Admin app for that - go to your database and click on Element Range Indexes.
In XQuery, you can then do something like this:
for $val in cts:element-values(xs:QName("element_name"))
return text{$val, cts:frequency($val)}
With the Java Client, you can do the same by adding a range-based constraint to a search options file; the response from the QueryManager will then contain all of the values and frequencies that match your query. Check the REST API docs for how to construct such a search options file.
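To give an idea of what such options might contain, here is a sketch that declares a values specification over the range index and evaluates it from XQuery with search:values. The specification name "by-element" is made up for this example, and the sketch assumes a string range index on element_name whose collation matches the database default.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: "by-element" is a hypothetical name; element_name stands in for the real element. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <values name="by-element">
      <range type="xs:string">
        <element ns="" name="element_name"/>
      </range>
    </values>
  </options>
return
  (: Each distinct value in the response carries a frequency attribute. :)
  search:values("by-element", $options)
```

The same options node could be stored on the server and referenced by name from the Java client, so the value counts come back in a single request instead of one search per value.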

MarkLogic cts:element-query false positives?

Given this document :-
<items>
<item><type>T1</type><value>V1</value></item>
<item><type>T2</type><value>V2</value></item>
</items>
unsurprisingly, I find that this will pull back the page in a cts:uris() :-
cts:and-query((
cts:element-query(xs:QName('item'),
cts:element-value-query(xs:QName('type'),'T1')
),
cts:element-query(xs:QName('item'),
cts:element-value-query(xs:QName('value'),'V2')
)
))
but somewhat surprisingly (to me at least) I also find that this will too :-
cts:element-query(xs:QName('item'),
cts:and-query((
cts:element-value-query(xs:QName('type'),'T1'),
cts:element-value-query(xs:QName('value'),'V2')
))
)
This doesn't seem right, as there is no single item with type=T1 and value=V2.
To me this seems like a false positive.
Have I misunderstood how cts:element-query works?
(I have to say that the documentation isn't particularly clear in this area).
Or is this something where MarkLogic strives to give me the result I expect, and had I had more or better indexes in place, I would be less likely to get a false positive match.
In addition to the answer by @wst: you only need to enable element value positions to get accurate results from an unfiltered search. Here is some code to show this:
xdmp:document-insert("/items.xml", <items>
<item><type>T1</type><value>V1</value></item>
<item><type>T2</type><value>V2</value></item>
</items>);
cts:search(collection(),
cts:element-query(xs:QName('item'),
cts:and-query((
cts:element-value-query(xs:QName('type'),'T1'),
cts:element-value-query(xs:QName('value'),'V2')
))
), 'unfiltered'
)
Without element value positions enabled this returns the test document. After enabling the positions, the query returns nothing.
As @wst said, cts:search() runs filtered by default, whereas cts:uris() (and, for instance, xdmp:estimate()) only run unfiltered.
HTH!
Yes, I think this is a slight misunderstanding of how queries work. In cts:search, the default behavior is to enable the filtered option. In this case ML will evaluate the query using only indexes, and then once candidate documents have been selected, it will load them into memory, inspect, and filter out false positives. This is more time consuming, but more accurate.
cts:uris is a lexicon function, so queries passed to it will only resolve via indexes, and there is no option to filter false positives.
The simple way to handle this query via indexes would be to change your schema such that documents are based on <item> instead of <items>. Then each item would have a separate index entry, and results would not be commingled before filtering.
Another way that doesn't involve updating documents is to wrap the queries you expect to occur in the same element in a cts:near-query. That would prevent a <type> in one <item> from matching with a <value> in a different <item>. I suggest reading the documentation because you may need to enable one or more position-based indexes for cts:near-query to be accurate.
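To see the difference between index-only and filtered resolution on your own data, a small sketch like this could help. It runs the query from the question both ways and returns the two counts; xdmp:estimate resolves from the indexes only, while the filtered search inspects the candidate documents.

```xquery
xquery version "1.0-ml";

let $query :=
  cts:element-query(xs:QName("item"),
    cts:and-query((
      cts:element-value-query(xs:QName("type"), "T1"),
      cts:element-value-query(xs:QName("value"), "V2")
    ))
  )
return (
  (: Index-only resolution; may count false positives. :)
  xdmp:estimate(cts:search(fn:collection(), $query)),
  (: Filtered resolution; candidates are checked against the actual documents. :)
  fn:count(cts:search(fn:collection(), $query, "filtered"))
)
```

If the first number is larger than the second, the indexes alone cannot resolve the query accurately, and one of the options above (restructuring the documents or enabling position indexes) is worth considering.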

element-attribute-range-query fetching result but element-attribute-value-query is not fetching any result

I wanted to fetch the document which have the particular element attribute value.
So, I tried the cts:element-attribute-value-query but I didn't get any result. But the same element attribute value, I am able to get using cts:element-attribute-range-query.
Here the sample snippet used.
let $s-query := cts:element-attribute-range-query(xs:QName("tit:title"),xs:QName("name"),"=",
"SampleTitle",
("collation=http://marklogic.com/collation/codepoint"))
let $s-query := cts:element-attribute-value-query(xs:QName("tit:title"),xs:QName("name"),
"SampleTitle",
())
return cts:search(fn:doc(),($s-query))
The problem with range-query is it needs the range index. I have hundreds of DB's in multiple hosts. I need to create range indexes on each DB.
What could be the problem with attribute-value-query?
I found the issue after some research.
The result document is actually a French-language document. It has the following structure (this is a sample):
<doc xml:lang="fr:CA" xmlns:tit="title">
<tit:title name="SampleTitle"/>
</doc>
cts:element-attribute-value-query is a language-dependent query. To get the French-language results, the language needs to be specified in the options as follows.
cts:element-attribute-value-query(xs:QName("tit:title"),xs:QName("name"), "SampleTitle",("lang=fr"))
But cts:element-attribute-range-query doesn't require the language option.
Thanks for the effort.

Filtering a multivalued attribute in StringTemplate

I have a template which uses the same multivalued attribute in various places. I often find myself in a situation where I would like to filter the attribute before a template is applied to the individual values.
I can do this:
<#col:{c|<if(cond)><# c.Attribute2 #><endif>};separator=\",\"#>
but that is not what I want, because then there are separators in the output separating "skipped" entries, like:
2,4,,,6,,4,5,,
I can modify it to
<#col:{c|<if(c.Attribute1)><# c.Attribute2 #>,<endif>};separator=\"\"#>
Which is almost OK, but I get an additional separator after the last number, which sometimes does not matter (usually when the separator is whitespace), but sometimes does:
2,4,6,4,5,
I sometimes end up doing:
<#first(col):{c|<if(cond)><# c.Attribute2 #><endif>};separator=\"\"#>
<#rest(col):{c|<if(cond)>,<# c.Attribute2 #><endif>};separator=\"\"#>
But this approach fails if the first member does not satisfy the condition, then there is an extra separator in the beginning:
,2,4,6,4,5
Can someone give me a better solution?
First, let me point out that I think you are trying to do logic inside your template. Any time you hear things like "filter my list according to some condition based upon the data", it might be time to compute that filtered list in the model and then push it in. That said, something like this might work, where we filter the list first:
<col:{c | <if(c.cond)>c<endif>}:{c2 | <c2.c.attribute>}>
c2.c accesses the c parameter from the first application
The answer by "The ANTLR Guy" didn't help in my case, and I found another workaround. See "Filter out empty strings in ST4".
