DocumentDB: ST_DISTANCE function does not work with '<' (smaller than) sign - azure-cosmosdb

I have a perfectly working query that looks as follows:
SELECT p.id FROM place p WHERE ST_DISTANCE(p.geometry, {'type': 'Point', 'coordinates':[52.0826443333333, 5.11771783333333]} ) > 6000
It returns a list of id's of documents that are more than 6000 m from the geospatial point. Everything seems fine. However, if I turn around the '>' (greater than) sign to '<' (smaller than), it does not give any result. Interestingly, it does return false/true statements if I put the WHERE clause in a SELECT statement, as follows:
SELECT ST_DISTANCE(p.geometry, {'type': 'Point', 'coordinates':[52.0826443333333, 5.11771783333333]}) < 6000 AS result FROM place p
It generates both true and false statements as expected. So the evaluation seems to work, but it does not return any output. Currently, I just use this latter work around, and also select the computed distances. But now I have to compute the points that are within a certain distance somewhere else (like on the client side or in a stored procedure).
UPDATE
I tested with a specified index policy (thanks to this example):
'indexingPolicy': {'includedPaths': [{'path': '/"geometry"/?', 'indexes': [ {'kind': 'Spatial', 'dataType': 'LineString'}]}, {'path': '/'}]}
And that solved the problem. I still think it is odd that the spatial function did work on 'greater than' and not on 'smaller than', but I think it is solved with this.

You should specify a Spatial index on that field like this:
'indexingPolicy': {
'includedPaths': [
{
'path': '/"geometry"/?',
'indexes': [
{'kind': 'Spatial', 'dataType': 'LineString'}
]
},
{'path': '/'}
]
}

Related

Multiple add if doesn't exist steps Gremlin

I have an injected array of values. I'm I want to add vertices if they don't exist. I use the fold and coalesce step, but it doesn't work in this instance since I'm trying to do it for multiple vertices. Since 1 vertex exists I can no longer get a null value, and the the unfold inside the coalesce step returns a value from there on. This leads to vertices that don't exist yet not to be added.
This is my current traversal:
const traversal = await g
?.inject([
{ twitterPostId: 'kay', like: true, retweet: false },
{ twitterPostId: 'fay', like: true, retweet: false },
{ twitterPostId: 'nay', like: true, retweet: false },
])
.unfold()
.as('a')
.aggregate('ta')
.V()
.as('b')
.where('b', p.eq('a'))
.by(__.id())
.by('twitterPostId')
.fold()
.coalesce(__.unfold(), __.addV().property(t.id, __.select('ta').unfold().select('twitterPostId')))
.toList();
Returns:
[Bn { id: 'kay', label: 'vertex', properties: undefined }]
Without using coalesce you can do conditional upserts using what we often refer to as "map injection". The Gremlin does get a little advanced, but here is an example
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V('3','4','xyz','abc').
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
That query will look for a set of vertices, figure out which ones exist, and for the rest use the ID values and properties from the map called 'p' to create the new vertices. You can build on this pattern a great many ways and I find it very useful until mergeV and mergeE are more broadly available
You can also use the list of IDs in the query to check which ones exist. However, this may lead to inefficient query plans depending on the given implementation:
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V().
where(within('ids')).
by(id).
by().
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
This is trickier than the first query, as the V step cannot take a traversal. So you cannot do V(select('ids')) in Gremlin today.

MarkLogic optic query using two indexes returns no results

I want to use the MarkLogic optic API to join two range indexes but somehow they don't join. Is the query I wrote wrong or can't I compare the indexes used?
I have two indexes defined:
an element-attribute range index x/#refid
a range field index 'id'
Both are of type string and have the same collation defined. Both indexes have data that I can retrieve with cts:values() function. Both are huge indexes and I want to join them using optics so I have constructed the following query :
import module namespace op="http://marklogic.com/optic"
at "/MarkLogic/optic.xqy";
let $subfrag := op:fragment-id-col("subfrag")
let $notfrag := op:fragment-id-col("notfrag")
let $query :=
cts:and-query((
cts:collection-query("latest")
))
let $subids := op:from-lexicons(
map:entry("subid", cts:field-reference("id")), (), $subfrag) => op:where($query)
let $notids := op:from-lexicons(
map:entry("notid", cts:element-attribute-reference(xs:QName("x"), xs:QName("refid"))),
(),
$notfrag)
return $subids
=> op:join-cross-product($notids)
=> op:where(op:eq($notfrag, $subfrag))
=> op:result()
This query uses the join-cross-product and when I remove the op:where clause I get all values left and right. I verified and some are equal so the clause should filter only those rows i'm actually interested in. But somehow it doesn't work and I get an empty result. Also, if I replace one of the values in the op:eq with a string value it doesn't return a result.
When I use the same variable in the op:eq operator (like op:eq($notfrag, $notfrag)) I get results back so the statement as is works. Just not the comparison between the two indexes.
I have also used variants with join-inner and left-outer-join but those are also returning no results.
Am I comparing two incomparable indexes or am I missing some statement (as documentation/example is a bit thin).
(of course I can solve by not using optics but in this case it would be a perfect fit)
[update]
I got it working by eventually by changing the final statement:
return $subids
=> op:join-cross-product($notids)
=> op:where(op:eq(op:col('subid'), op:col('notid')))
=> op:result()
So somehow you cannot use the fragment definitions in the condition. After this I replaced the join-cross-product with a join-inner construction which should be a bit more efficient.
And to be complete, I initially used the example from the MarkLogic documentation found here (https://docs.marklogic.com/guide/app-dev/OpticAPI#id_87356), specifically the last example where they use a fragment column definition to be used as param in the join-inner statement that didn't work in my case.
Cross products are typically useful only for small rows sets.
Putting both reference in the same from-lexicons() accessor does an implicit join, meaning that the engine forms rows by constructing a local cross-product of the values indexed for each document.
Such a query could be expressed by:
op:from-lexicons(
map:entry("subid", cts:field-reference("id"))
=>map:with("notid", cts:element-attribute-reference(xs:QName("x"),
xs:QName("refid")))
=>op:where(cts:collection-query("latest"))
=>op:result()
Making the joins explicitly could be done with:
let $subids := op:from-lexicons(
map:entry("subid", cts:field-reference("id")), (), $subfrag)
=> op:where($query)
let $notids := op:from-lexicons(
map:entry("notid", cts:element-attribute-reference(xs:QName("x"),
xs:QName("refid"))),
(),
$notfrag)
return $subids
=> op:join-inner($notids, op:on($notfrag, $subfrag))
=> op:result()
Hoping that helps,

Xquery - How to match two sequences within a quantifier expression

Like many, I'm tackling the Mondial database on XML. It would be a piece of cake, if XQuery syntax wasn't doing its best to sabotage.
let $inland := //province/#id
where every $sea in //sea satisfies
$sea/located/#province != $inland
return $inland
What I am trying to do in the above is find all "inland" provinces, the provinces that don't have a sea next to it. This, however, doesn't work, because the $sea/located/province is a big string, with every single province that it borders in it.
So I tried to modify into.
let $inland := //province/#id
where every $sea in //sea satisfies
not(contains($sea/located/#province, $inland))
return $inland
Where I would like it to only find the provinces that are a part of the sea's bordering provinces. Simple and straightforward.
Error message:
Stopped at C:/Users/saffekaffe/Desktop/mondial/xml/country_without_island.xml, 2/1:
[XPTY0004] Item expected, sequence found: (attribute id {"prov-Greece-2"},....
How do I get around this?
Example of //sea/located/province#
province="prov-France-5 prov-France-20 prov-France-89 prov-France-99"
Example of //province/#id
id="prov-Greece-2"
There are multiple ways in which XQuery works in a different way than you seem to expect.
The comparison operators = and != have existential semantics if at least one of their arguments is a sequence instead of a single item. This means that $seq1 = $seq2 is equivalent to some $x in $seq1, $y in $seq2 satisfies $x = $y. The query ('foo', 'bar') = ('bar', 'baz', 'quuz') returns true because there is at least one common item.
An XQuery exception like //province/#id evaluates to a sequence of all matching nodes. In your case that would be a sequence of over 1000 province IDs: (id="prov-cid-cia-Greece-2", id="prov-cid-cia-Greece-3", id="prov-cid-cia-Greece-4", [...]). This sequence is then bound to the variable $inland in your let clause. Since you don't iterate over individual items in $inland (for example using a for clause), the where condition then works on the whole sequence of all provinces worldwide at once. So your condition every $sea in //sea satisfies
$sea/located/#province != $inland now means:
"For every sea there is a province located next to it that has an #id that is not equal to at least one of all existing province IDs."
Th is returns false because there are seas with no located children, e.g.the Gulf of Aden.
contains($str, $sub) is not a good fit for checking if a substring is contained in a space-delimited string, because it also matches parts of entries: contains("foobar baz quux", "oob") returns true.
Instead you should either split the string into its parts using tokenize($str) and look through its parts, or use contains-token($str, $token).
Putting it all together, a correct query very similar to your original one is:
for $inland in //province/#id
where
every $sea in //sea
satisfies not(contains-token($sea/located/#province, $inland))
return $inland
Another approach would be to first gather all (unique) provinces that are next to seas and then return all provinces not in that sequence:
let $next-to-sea := distinct-values(//sea/located/#province/tokenize(.))
return //province/#id[not(. = $next-to-sea)]
Even more compact (but potentially less efficient):
//province/#id[not(. = //sea/located/#province/tokenize(.))]
On the other end of the spectrum you can use XQuery 3.0 maps to replace the potentially linear search through all seaside provinces by a single lookup:
let $seaside :=
map:merge(
for $id in //sea/located/#province/tokenize(.)
return map{ $id: () }
)
return //province/#id[not(map:contains($seaside, .))]

Compare two elements of the same document in MarkLogic

I have a MarkLogic 8 database in which there are documents which have two date time fields:
created-on
active-since
I am trying to write an Xquery to search all the documents for which the value of active-since is less than the value of created-on
Currently I am using the following FLWOR exression:
for $entity in fn:collection("entities")
let $id := fn:data($entity//id)
let $created-on := fn:data($entity//created-on)
let $active-since := fn:data($entity//active-since)
where $active-since < $created-on
return
(
$id,
$created-on,
$active-since
)
The above query takes too long to execute and with increase in the number of documents the execution time of this query will also increase.
Also, I have
element-range-index for both the above mentioned dateTime fields but they are not getting used here. The cts-element-query function only compares one element with a set of atomic values. In my case I am trying to compare two elements of the same document.
I think there should be a better and optimized solution for this problem.
Please let me know in case there is any search function or any other approach which will be suitable in this scenario.
This may be efficient enough for you.
Take one of the values and build a range query per value. This all uses the range indexes, so in that sense, it is efficient. However, at some point, there is a large query that us built. It reads similiar to a flword statement. If really wanted to be a bit more efficient, you could find out which if your elements had less unique values (size of the index) and use that for your iteration - thus building a smaller query. Also, you will note that on the element-values call, I also constrain it to your collection. This is just in case you happen to have that element in documents outside of your collection. This keeps the list to only those values you know are in your collection:
let $q := cts:or-query(
for $created-on in cts:element-values(xs:QName("created-on"), (), cts:collection-query("entities"))
return cts:element-value-range-query(xs:Qname("active-since"), "<" $created-on)
)
return
cts:search(
fn:collection("entities"),
$q
)
So, lets explain what is happening in a simple example:
Lets say I have elements A and B - each with a range index defined.
Lets pretend we have the combinations like this in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9
let $ := cts:or-query(
for $a in cts:element-values(xs:QName("A"))
return cts:element-value-range-query(xs:Qname("B"), "<" $a)
)
This would create the following query:
cts:or-query(
(
cts:element-value-range-query(xs:Qname("B"), "<" 2),
cts:element-value-range-query(xs:Qname("B"), "<" 4),
cts:element-value-range-query(xs:Qname("B"), "<" 5)
)
)
And in the example above, the only match would be the document with the combination: (5,4)
You might try using cts:tuple-values(). Pass in three references: active-since, created-on, and the URI reference. Then iterate the results looking for ones where active-since is less than created-on, and you'll have the URI of the doc.
It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
I am now using the following script to get the count of documents for which the value of active-since is less than the value of created-on:
fn:sum(
for $value-pairs in cts:value-tuples(
(
cts:element-reference(xs:QName("created-on")),
cts:element-reference(xs:QName("active-since"))
),
("fragment-frequency"),
cts:collection-query("entities")
)
let $created-on := json:array-values($value-pairs)[1]
let $active-since := json:array-values($value-pairs)[2]
return
if($active-since lt $created-on) then cts:frequency($value-pairs) else 0
)
Sorry for not having enough reputation, hence I need to comment here on your answer. Why do you think that ML will not return (2,3) and (4,2). I believe we are using an Or-query which will take any single query as true and return the document.

Using Parameters in Neo4j Relationship Queries

I'm struggling to work around a small limitation of Neo4j in that I am unable to use a parameter in the Relationship section of a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately as r is a Collection of relationships and not a single relationship, this fails with an error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1Name), an ending point (n2Name), and a single path to travel along (relType). I need a list of nodes to come out of the query (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns you the nodes along the path :
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, it is returning nodes Points 5,6,7,8
I guess in your case that using a common relationship type name for connecting your Point nodes would be more efficient.
If having a Path1, Path2, .. is for knowing the distance between two points, you can easily know the distance by asking for the length of the path, like this query related to your console link :
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[]->(n2))
RETURN length(p)
If you need to return only paths having a defined relationship length, you can use it without the shortestPath by specifying a strict depth :
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT1
As you can see here, the need to specify the relationship is not mandatory, you can just omit it or add the :NEXT type if you have other relationship types in your graph
If you need to match on the type, for e.g. the path from point 5 to point 8 in your console link, and the path can only have a PATH_TWO relationship, then you can do this :
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE type(r[0])= 'PATH_TWO'
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED to have the Path1, Path2 style, maybe a short explanation on the need could help us find the more appropriate query
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p

Resources