Is fuzzy search in CosmosDB ARRAY_CONTAINS possible? - azure-cosmosdb

I am trying to search a CosmosDB Array using array contains as follows
SELECT c.id, c.PromotionName FROM c where ARRAY_CONTAINS(c.Variants, '0828570001006')
This works fine but I cannot figure out how to make the value fuzzy so that I can match on only the first 7 digits so the query would look like below
SELECT c.id, c.PromotionName FROM c where ARRAY_CONTAINS(c.Variants, '0828570')
But still match the same as the query above ( and more )
Is this possible?

Related

How to get distinct values as a single document in CosmosDB

When I do a SELECT DISTINCT c.Damage.Cause_of_Damage FROM c, where Cause_of_Damage is an array, I get a lot of documents with individual values back. Is there a way to concat all of them to a single array in a single result document?
It works using a self-join:
SELECT DISTINCT VALUE i
FROM c
JOIN i IN c.Damage.Cause_of_Damage

How are you supposed to think of joins in cosmosdb?

I am very confused by the cosmosdb documentation on joins. When I think of a join conventionally, I think of 2 tables, with 1 shared id, on which I perform the join. These 2 tables have different schemas, but the result of the join is a combined table with a merge of the columns from both tables. The join for cosmosdb does not seem to me intuitively congruent with that.
I have a collection with heterogenous data. Each document can have a different structure from the next. I want to count the number of documents that have a value that is present in the result set of a subquery. Intuitively, I want to do something like this:
SELECT COUNT(1) as c
FROM CollectionName as outer
where outer.type = "table"
JOIN ((SELECT c.id from c where c.type = "database") as inner) on outer.databaseId == t.id
// count the number of tables that are in deleted databases
It would seem like I would need to do a join on the result of the subquery with the result of the outer query, and then process that resulting table. But I am not understanding right now how to do that:
Select COUNT(1)
from Collection outer
where outer.type = 'table'
JOIN (select c.id from c IN outer.databaseId where c.type = "database" and c.state = "deleted")
I am constantly getting a 400 with the above query. So how am I supposed to think about joins in cosmosdb?
Cosmos is a document database. It stores and operates on json data which can be in hierarchical format. Joins in Cosmos reference tuples within these hierarchies where they can be projected with other data in the document.
There is a really good article that talks through this at pretty deep level but also have lots of examples too, Joins in Cosmos DB.
This takes some getting used to writing queries like this but once you get the hang of it you'll be ok. You can easily practice queries using the Query Playground that has a bunch of sample queries for nutrition dataset with food and ingredients. Or follow along with the families data in the docs. You can create additional items and then write some queries to see how joins work.
Hope that is helpful.

Cosmos db Order by on 'computed field'

I am trying to select data based on a status which is a string. What I want is that status 'draft' comes first, so I tried this:
SELECT *
FROM c
ORDER BY c.status = "draft" ? 0:1
I get an error:
Unsupported ORDER BY clause. ORDER BY item expression could not be mapped to a document path
I checked Microsoft site and I see this:
The ORDER BY clause requires that the indexing policy include an index for the fields being sorted. The Azure Cosmos DB query runtime supports sorting against a property name and not against computed properties.
Which I guess makes what I want to do impossible with queries... How could I achieve this? Using a stored procedure?
Edit:
About stored procedure: actually, I am just thinking about this, that would mean, I need to retrieve all data before ordering, that would be bad as I take max 100 value from my database... IS there any way I can do it so I don t have to retrieve all data first? Thanks
Thanks!
ORDER BY item expression could not be mapped to a document path.
Basically, we are told we can only sort with properties of document, not derived values. c.status = "draft" ? 0:1 is derived value.
My idea:
Two parts of query sql: The first one select c.* from c where c.status ='draft',second one select c.* from c where c.status <> 'draft' order by c.status. Finally, combine them.
Or you could try to use stored procedure you mentioned in your question to process the data from the result of select * from c order by c.status. Put draft data in front of others by if-else condition.

Use views and table valued functions as node or edge tables in match clauses

I like to use Table Valued functions in MATCH clauses in the same way as is possible with Node tables. Is there a way to achieve this?
The need for table valued functions
There can be various use cases for using table valued functions or views as Node tables. For instance mine is the following.
I have Node tables that contain NVarChar(max) fields that I would like to search for literal text. I need only equality searching and no full text searching, so I opted for using a index on the hash value of the text field. As suggested by Remus Rusanu in his answer to SQL server - worth indexing large string keys? and https://www.brentozar.com/archive/2013/05/indexing-wide-keys-in-sql-server/. A table valued function handles using the CHECKSUM index; see Msg 207 Invalid column name $node_id for pseudo column in inline table valued function.
Example data definitions
CREATE TABLE [Tags](
[tag] NVarChar(max),
[tagHash] AS CHECKSUM([Tag]) PERSISTED NOT NULL
) as Node;
CREATE TABLE [Sites](
[endPoint] NVarChar(max),
[endPointHash] AS CHECKSUM([endPoint]) PERSISTED NOT NULL
) as Node;
CREATE TABLE [Links] as Edge;
CREATE INDEX [IX_TagsByName] ON [Tags]([tagHash]);
GO
CREATE FUNCTION [TagsByName](
#tag NVarChar(max))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT
$node_id AS [NodeId],
[tag],
[tagHash]
FROM [dbo].[Tags]
WHERE [tagHash] = CHECKSUM(#tag) AND
[tag] = #tag;
[TagsByName] returns the $node_id with an alias NodeId as suggested by https://stackoverflow.com/a/45565410/814206. However, real Node tables contain two more internal columns which I do not know how to export.
Desired query
I would like to query the database similar to this:
SELECT *
FROM [TagsByName]('important') as t,
[Sites] as s,
[Links] as l
WHERE MATCH ([t]-([l])->[s])
However, this results in the error1:
Msg 13901, Level 16, State 2, Line ...
Identifier 't' in a MATCH clause is not a node table or an alias for a node table.
I there a way to do this?
PS. There are some workarounds but they do not look as elegant as the MATCH-query; especially considering that my actual query involves matching more relations and more string equality tests. I will post these workarounds as answers and hope that someone comes with a better idea.
1 This gives a very specific difference between views and tables for Difference between View and table in sql; which only occurs in sql-server-2017 and only when using SQL Graph.
Workaround
Revert to traditional relational joins via JOIN clauses or FROM with <table_or_view_name> and WHERE clauses. In queries that match on more relations, the latter has the advantage that sql-server-2017-graph can MATCH on FROM <table_or_view_name> but not on FROM <table_source> JOIN <table_source>.
SELECT *
FROM [TagsByName]('important') as t
[Sites] as s,
[Links] as l
WHERE t.NodeId = l.$from_id AND
l.$to_id = s.$node_id;
Workaround
Add the Node table twice to the from clause: once as table and once as table valued function and join them via the $node_id in the where clause:
SELECT *
FROM [TagsByName]('important') as t1,
[Tags] as t2,
[Sites] as s,
[Links] as l
WHERE MATCH ([t2]-([l])->[s]) AND
t1.[NodeId] = t2.$node_id
Does this affect performance?
Workaround
Do not use the table valued function, but include its expression in the WHERE clause:
SELECT *
FROM [Tags] as t,
[Sites] as s,
[Links] as l
WHERE MATCH ([t]-([l])->[s]) AND
[t].[tagHash] = CHECKSUM('important') AND
[t].[tag] = 'important'
Downside: This is easy to get wrong; for example by forgetting to join on the CHECKSUM

Sqlite FTS, Using OR between match operators

When I execute the following query in a sqlite engine (android or sqlitebrowser) it throws an exception that says unable to use function MATCH in the requested context.
select
a.Body1,
b.Body2
from
tbl1_fts as a,
tbl2_fts as b
where
a.ID = b.ParentID and
(
a.Body1 match('value') or
b.Body2 match('value')
)
-Both tables have fts.
-Using And operator between two matches (instead of OR) runs normally.
How can I fix this or change the query to find rows with above condition?
you can not use OR Operation, just change your Match Keyword.
like
SELECT * FROM docs WHERE docs MATCH 'sqlite OR database';
OR maybe you can use union
SELECT docid FROM docs WHERE docs MATCH 'sqlite AND database'
UNION
SELECT docid FROM docs WHERE docs MATCH 'library';
MATCH as a function would have two parameters:
... WHERE match('value', SomeColumn) ...
However, the usual method of using MATCH is as an operator:
... WHERE SomeColumn MATCH 'value' ...
MATCH has to be used without parentheses.
The AND and OR operators must be in capital letters when used with FTS.

Resources