MarkLogic cts:element-query false positives? - xquery

Given this document :-
<items>
<item><type>T1</type><value>V1</value></item>
<item><type>T2</type><value>V2</value></item>
</items>
unsurprisingly, I find that this will pull back the page in a cts:uris() :-
cts:and-query((
cts:element-query(xs:QName('item'),
cts:element-value-query(xs:QName('type'),'T1')
),
cts:element-query(xs:QName('item'),
cts:element-value-query(xs:QName('value'),'V2')
)
))
but somewhat surprisingly (to me at least) I also find that this will too :-
cts:element-query(xs:QName('item'),
cts:and-query((
cts:element-value-query(xs:QName('type'),'T1'),
cts:element-value-query(xs:QName('value'),'V2')
))
)
This doesn't seem right, as there is no single item with type=T1 and value=V2.
To me this seems like a false positive.
Have I misunderstood how cts:element-query works?
(I have to say that the documentation isn't particularly clear in this area).
Or is this something where MarkLogic strives to give me the result I expect, and had I had more or better indexes in place, I would be less likely to get a false positive match.

In addition to the answer by #wst, you only need to enable element value positions to get accurate results from unfiltered search. Here some code to show this:
xdmp:document-insert("/items.xml", <items>
<item><type>T1</type><value>V1</value></item>
<item><type>T2</type><value>V2</value></item>
</items>);
cts:search(collection(),
cts:element-query(xs:QName('item'),
cts:and-query((
cts:element-value-query(xs:QName('type'),'T1'),
cts:element-value-query(xs:QName('value'),'V2')
))
), 'unfiltered'
)
Without element value positions enabled this returns the test document. After enabling the positions, the query returns nothing.
As said by #wst, cts:search() runs filtered by default, whereas cts:uris() (and for instance xdmp:estimate() only runs unfiltered.
HTH!

Yes, I think this is a slight misunderstanding of how queries work. In cts:search, the default behavior is to enable the filtered option. In this case ML will evaluate the query using only indexes, and then once candidate documents have been selected, it will load them into memory, inspect, and filter out false positives. This is more time consuming, but more accurate.
cts:uris is a lexicon function, so queries passed to it will only resolve via indexes, and there is no option to filter false positives.
The simple way to handle this query via indexes would be to change your schema such that documents are based on <item> instead of <items>. Then each item would have a separate index entry, and results would not be commingled before filtering.
Another way that doesn't involve updating documents is to wrap the queries you expect to occur in the same element in a cts:near-query. That would prevent a <type> in one <item> from matching with a <value> in a different <item>. I suggest reading the documentation because you may need to enable one or more position-based indexes for cts:near-query to be accurate.

Related

How can I be notified of an index being updated on DynamoDB?

I have a table (key=username, value=male or female) and an index on the values.
After I add an item to the table, I want to update the counts of males and females. However, after a successful write, as the index is a Global Secondary Index, the count query is not consistent.
Is there a way (dynamo db Streams, Lambda, ...) to monitor when the index is up to date?
Note that Im not looking for a solution that involves something else (keep count of increments in redis or ...), what I describe here is a simplified problem to especially ask a question about how can I monitor an index in dynamo.
Thanks!
I am not sure if there is any mechanism currently provided to check this but, you can easily solve this problem by adding a single line of code to your query.
ConsistentRead = True
DynamoDB has a parameter when set as true will make sure that you get latest updated value.
Now, when you add/update the item and then query the data add ConsistentRead option in it, this will ensure that you will have latest count value.
Here is the reference link.
If you are able to accomplish using other technique then please do share it.
Hope that helps.

element-attribute-range-query fetching result but element-attribute-value-query is not fetching any result

I wanted to fetch the document which have the particular element attribute value.
So, I tried the cts:element-attribute-value-query but I didn't get any result. But the same element attribute value, I am able to get using cts:element-attribute-range-query.
Here the sample snippet used.
let $s-query := cts:element-attribute-range-query(xs:QName("tit:title"),xs:QName("name"),"=",
"SampleTitle",
("collation=http://marklogic.com/collation/codepoint"))
let $s-query := cts:element-attribute-value-query(xs:QName("tit:title"),xs:QName("name"),
"SampleTitle",
())
return cts:search(fn:doc(),($s-query))
The problem with range-query is it needs the range index. I have hundreds of DB's in multiple hosts. I need to create range indexes on each DB.
What could be the problem with attribute-value-query?
I found the issue with a couple of research.
Actually the result document is a french language document. It has the structure as follows. This is a sample.
<doc xml:lang="fr:CA" xmlns:tit="title">
<tit:title name="SampleTitle"/>
</doc>
The cts:element-attribute-value-query is a language dependent query. To get the french language results, then language needs to be mentioned in the option as follows.
cts:element-attribute-value-query(xs:QName("tit:title"),xs:QName("name"), "SampleTitle",("lang=fr"))
But cts:element-attribute-range-query don't require the language option.
Thanks for the effort.

Is it just me, or does Firebase's default LIMIT() behavior work backwards?

This is not a big deal, but I wanted to throw it out there - if you had 100 items with incremental priorities, you would have a list like this:
item#1: { .priority: 1 }
...
item#100: { .priority: 100 }
The first item is #1 with priority 1, and the last item is #100 with priority 100. Now if you LIMIT() the list to 3 items, like this:
firebaseRef.limit(3).once(...)
Rather than being returned items 1-3, you would be returned items 97-100. Do most people expect that? It's the opposite of how limits generally work in other environments. In SQL for example, you start at the beginning of the set and stop when you hit the limit.
Now this isn't a technical limitation or anything (I believe), because we can actually get records 1-3 pretty easily by using STARTAT() on the first item:
firebaseRef.startAt(1).limit(3).once(...)
In fact, when LIMIT() is used without STARTAT() or ENDAT() it actually behaves like you specified ENDAT() with the last item. For example these produce the same results:
firebaseRef.limit(3).once(...)
firebaseRef.endAt(100).limit(3).once(...)
Doesn't it seem like the default behavior should be to mimic STARTAT() from the first position, rather than ENDAT() from the last position, if only LIMIT() is specified?
You are absolutely right in concluding that the default behavior works as if endAt() was used (and thus the latest items will be returned). This is because in the most common use cases you'd want to display the latest data, not the oldest; eg: chat history or notifications.

Filtering a multivalued attribute in StringTemplate

I have a template which uses the same multivalued attribute in various places. I often find myself in a situation where I would like to filter the attribute before a template is applied to the individual values.
I can do this:
<#col:{c|<if(cond)><# c.Attribute2 #><endif>};separator=\",\"#>
but that is not what I want, because then there are separators in the output separating "skipped" entries, like:
2,4,,,6,,4,5,,
I can modify it to
<#col:{c|<if(c.Attribute1)><# c.Attribute2 #>,<endif>};separator=\"\"#>
Which is almost OK, but I get an additional separator after the last number, which sometimes does not matter (usually when the separator is whitespace), but sometimes does:
2,4,6,4,5,
I sometimes end up doing:
<#first(col):{c|<if(cond)><# c.Attribute2 #><endif>};separator=\"\"#>
<#rest(col):{c|<if(cond)>,<# c.Attribute2 #><endif>};separator=\"\"#>
But this approach fails if the first member does not satisfy the condition, then there is an extra separator in the beginning:
,2,4,6,4,5
Can someone give me a better solution?
First, let me point out that I think you are trying to do logic inside your template. Any time you hear things like "filter my list according to some condition based upon the data" it might be time to compute that filtered list in the model and then push it in. That said something like this might work where we filter the list first:
<col:{c | <if(c.cond)>c<endif>}:{c2 | <c2.c.attribute>}>
c2.c accesses the c parameter from the first application
The answer by "The ANTLR Guy" didn't help in my case and I found another workaround. See at Filter out empty strings in ST4

Xquery on MarkLogic using OR

This is a newbie MarkLogic question. Imagine an xml structure like this, a condensation of my real business problem:
<Person id="1">
<Name>Bob</Name>
<City>Oakland</City>
<Phone>2122931022</Phone>
<Phone>3123032902</Phone>
</Person>
Note that a document can and will have multiple Phone elements.
I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.
I have tried this:
let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383")
let $c := cts:word-query("3933849383")
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)
which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:
<results>
<match id="18" phone="2123339494" name="bob" city="oakland"/>
<match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>
So I think I need some form of cts:search that allows both this boolean capability but also allows me to specify what part of each document gets returned. At that point then I could further process the result with XPATH. I need to do this efficiently so for example I think it would NOT be efficient to return a list of document uri's and then query for each document in a loop. Thanks!
Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.
First of all, you are better off using cts:element-value-query instead of cts:word-query. It will allow you to limit the searched values to a specific element. It performs best when you add an element range index for that element, but it is not required. It can rely on the always present word index as well.
Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query functions (as well as all other related functions) accept multiple search strings as one sequence argument. They are automatically treated as or-query.
Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone element are still aware of where they came from. You can easily use XPath to navigate to parent and siblings.
Fourthly, there is nothing against looping over the search results. It may sound a bit weird, but it doesn't cost much extra performance. Actually, it is pretty much negligable, in MarkLogic Server that is. Most performance could be lost when you try to return many results (more than several thousands), in which case most time is lost in serializing it all. And if it is likely you will have to handle lots of search results, it is wise to start using pagination straight away.
To get what you ask, you could use the following code:
<results>{
for $phone in
cts:search(
doc()/Person/Phone,
cts:element-value-query(
xs:QName("Phone"),
("3738494044", "2373839383", "3933849383")
)
)
return
<match id="{data($phone/../#id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>
Best of luck.
Here's what I would do:
let $numbers := ("3738494044", "2373839383", "3933849383")
return
<results>{
for $person in cts:search(/Person, cts:element-value-query(xs:QName("Phone"),$numbers))
return
<match id="{data($person/#id)}" name="{data($person/Name)}" city="{data($person/City)}">
{
for $phone in $person/Phone[cts:contains(.,$numbers)]
return element phone {$phone}
}
</match>
}
First, there's an implicit OR when passing multiple values into word-query and value-query and their cousins, and this query is more efficiently resolved from the indexes, so do this when you can.
Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.
I would not create a range index for this - no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.
You could do all of this with the SearchAPI and a little XSLT. That would make it easy to start combining names and numbers and other conditions in a single query.

Resources