How to do a Case Insensitive search on Azure DocumentDb? - azure-cosmosdb

Is it possible to perform a case-insensitive search on DocumentDb?
Let's say I have a record with a 'name' key whose value is "Timbaktu".
This will work:
select * from json j where j.name = "Timbaktu"
This won't:
select * from json j where j.name = "timbaktu"
So how do you do a case-insensitive search?
Thanks in advance.
Regards.

There are two ways to do this. The first is to use the built-in LOWER/UPPER function, for example:
select * from json j where LOWER(j.name) = 'timbaktu'
This will require a scan, though. The second, more efficient way is to store a "canonicalized" form (e.g. lowercase) alongside the original value and query against that. For example, the JSON would be
{ "name": "Timbaktu", "nameLowerCase": "timbaktu" }
Then use it for querying like:
select * from json j WHERE j.nameLowerCase = "timbaktu"
Hope this helps.
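A minimal sketch of the canonicalized-field approach, in JavaScript. The helper names withLowerCaseField and buildLowerCaseQuery are hypothetical, and the { query, parameters } shape is assumed to match what your client library expects. The point is that the stored value and the search term must both be lowercased the same way.

```javascript
// Hypothetical helper: returns a copy of the document with an extra
// lowercased copy of one string property, e.g. name -> nameLowerCase.
function withLowerCaseField(doc, field) {
  const value = doc[field];
  if (typeof value !== "string") {
    return { ...doc }; // nothing to canonicalize
  }
  return { ...doc, [`${field}LowerCase`]: value.toLowerCase() };
}

// Hypothetical helper: builds a parameterized query against the
// canonicalized field. The search term is lowercased the same way
// as the stored value, so the equality match ignores case.
function buildLowerCaseQuery(field, term) {
  return {
    query: `SELECT * FROM json j WHERE j.${field}LowerCase = @term`,
    parameters: [{ name: "@term", value: term.toLowerCase() }],
  };
}
```

Call withLowerCaseField(doc, "name") before every insert or update so the two fields never drift apart, and run the query spec from buildLowerCaseQuery("name", userInput) when searching.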

Cosmos recently added a case-insensitive option for string functions:
"You now have an option to make these string comparisons case-insensitive: Contains, EndsWith, StringEquals, and StartsWith."
"Significant performance improvements have been realized for these string system functions. Each of these four string system functions now benefit from an index and will therefore have much lower latency and request unit (RU) consumption."
Announcement

This is an old question, but I want to offer a workaround.
You could use a UDF (user-defined function) in Azure Cosmos DB.
UDF (registered with the id lowerConvert, which is the name the query refers to):
function lowerConvert(str){
return str.toLowerCase();
}
Then use the SQL below to query the results:
SELECT c.firstName FROM c where udf.lowerConvert(c.firstName) = udf.lowerConvert('John')

Related

Cosmos DB, C# SQL Api - case-insensitive WHERE clause

I am working on a project with Azure Cosmos DB using the C# SQL Api (DocumentDB) and need to know if it's possible to have a case-insensitive WHERE clause. From what I can find online it doesn't appear to be possible yet.
I want to write a query like:
SELECT l.CustomerName, l.LogDetail
FROM Logs l
WHERE l.CustomerName = 'Acme'
and have documents returned with CustomerName equal to "ACME", "Acme", or even "aCmE". I don't want to take a performance hit of a scan. I'd prefer to have the query use an index.
I know I could create a second CustomerName field with all lowercase values to filter on, but I'm looking to see if I can avoid that. Is this possible?
Unfortunately, unless it was added in the past two months, this is not possible.
If you use ToLower() or ToUpper() on an indexed field it will result in a scan, so that is not an option.
Some valid solutions are like you said to add another field with a case-insensitive string, or to only insert data with a certain case. It sounds like your DB is case insensitive anyway, so why not ensure that the cases really are insensitive?
At the time of this writing, there is now a LOWER function that can be used in Cosmos SQL API queries. This would enable you to write your query like this:
SELECT l.CustomerName, l.LogDetail
FROM Logs l
WHERE LOWER(l.CustomerName) = 'acme'
Here are the docs for the LOWER function.
There is a StringEquals function now which can be used to do case insensitive compares.
SELECT STRINGEQUALS("abc", "abc", false) AS c1, STRINGEQUALS("abc", "ABC", false) AS c2, STRINGEQUALS("abc", "ABC", true) AS c3
returns
[{
"c1": true,
"c2": false,
"c3": true
}]
Here is the documentation - https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-stringequals

Azure Cosmos DB Graph Wildcard search

Is it possible to search Vertex properties with a contains in Azure Cosmos Graph DB?
For example, I would like to find all persons which have 'Jr' in their name?
g.V().hasLabel('person').has('name',within('Jr')).values('name')
Seems like the within('') function only filters values that are exactly equal to 'Jr'. I am looking for a contains. Ideally case insensitive.
None of the text matching functions are available for CosmosDB at this time. However, I was able to implement wildcard search functionality by using a UDF (User Defined Function) which uses the JavaScript match() function:
function userDefinedFunction(input, pattern) { return input.match(pattern) !== null; };
Then you'd have to write your query as SQL and use the UDF that you defined (the example below assumes you called your function 'REGEX'):
SELECT * FROM c where(udf.REGEX(c.name[0]._value, '.*Jr.*') and c.label='person')
The performance will be far from ideal so you need to decide if the solution is acceptable or not based on your latency and cost perspectives.
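Since the question also asked for case-insensitive matching, here is a hedged variant of the UDF above that takes regular expression flags. The name regexMatch and the third argument are assumptions, not part of the original answer, and I have only reasoned about it as plain JavaScript, not measured it inside Cosmos DB.

```javascript
// Hypothetical variant of the REGEX UDF: the optional third argument
// carries RegExp flags, so passing "i" makes the match case-insensitive.
function regexMatch(input, pattern, flags) {
  if (typeof input !== "string") {
    return false; // property missing or not a string
  }
  return new RegExp(pattern, flags || "").test(input);
}
```

If it were registered under an id such as REGEX_I (again hypothetical), the query would look like udf.REGEX_I(c.name[0]._value, 'jr', 'i'). The same performance caveat applies, because every candidate document is still evaluated by the UDF.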
The Azure team has now implemented the TinkerPop predicates for strings (TextP).
The Azure team "announced" this to a user here on their feedback website.
I haven't tested all of them, but containing works for me (it is case sensitive, though):
g.V().hasLabel('doc').or(__.has('title', containing('truc')), __.has('tags', containing('truc')))
TextP.startingWith(string): Does the incoming String start with the provided String?
TextP.endingWith(string): Does the incoming String end with the provided String?
TextP.containing(string): Does the incoming String contain the provided String?
TextP.notStartingWith(string): Does the incoming String not start with the provided String?
TextP.notEndingWith(string): Does the incoming String not end with the provided String?
TextP.notContaining(string): Does the incoming String not contain the provided String?

EF Core with SQLite, Code First, case insensitive Contains for one column, sensitive for other

I have EF Core 1.1.0 preview with a SQLite database and a code-first approach. I have two string columns in the table, A and B. I have checked that if I have
IQueryable<T> query = dbSet;
query = query.Where(i => i.A.Contains(pattern));
the matching is case sensitive. That is what I need for string A, but when I want to query about string B, I would like to specify case insensitive matching.
Is it possible?
My current workaround is using ToLower:
query = query.Where(i => i.B.ToLower().Contains(pattern.ToLower()));
this works well, but I'm not sure it is the best solution.
In my OnModelCreating override, I have tried to use
modelBuilder.Entity<E>().Property(i => i.B).HasAnnotation("CaseSensitive", false);
but this had no effect and the matching (without using ToLower) was still case sensitive.
Is there any way to say that this B string inside the entity should be always compared case insensitively (i.e. operations like ==, Contains, StartsWith, EndsWith)?
From the logs, I can see the query translates to
SELECT ...
FROM ... AS "i"
WHERE ((instr("i"."B", @__pattern_0) > 0) OR (@__pattern_0 = ''))

Is there a way to do the google did you mean in linq?

I have a list of words. I type in a misspelled word. Can I query the list using LINQ to get words that sound like (Soundex) the misspelled word?
I believe you can.
A quick google search came up with this link:
Code Snippet
from elt in SomeTable.AsEnumerable()
where SoundEx(elt.SomeWordsSoundExCode) == SoundEx("MyWord")
select elt;
If you want to use LINQ to SQL to query database, then you'll probably want to run the comparison on SQL side. You could use AsEnumerable, but then you'll need to implement the algorithm in C# and process the data in-memory.
I believe that LINQ to SQL doesn't provide any built-in method that would be translated to a call to the SOUNDEX function in SQL. However, you can add mapping for a user-defined SQL function (See for example this article). So, you could define your SQL function that performs the comparison and then write something like:
var db = new MyDatabaseContext();
var q = from w in db.Products
where db.SimilarSoundEx(w.Name, searchInput)
select w;
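For the in-memory route mentioned above, you have to implement the algorithm yourself. Below is a minimal sketch of American Soundex written in JavaScript rather than C#, so treat it as an illustration of the logic to port, not as LINQ code. The names soundex and findSoundAlike are hypothetical.

```javascript
// Digit codes used by American Soundex; vowels, y, h and w have no code.
const SOUNDEX_CODES = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

// Hypothetical helper: returns the four-character Soundex code of a word.
function soundex(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) {
    return "";
  }
  let result = letters[0].toUpperCase(); // the first letter is kept as-is
  let previous = SOUNDEX_CODES[letters[0]] || "";
  for (let i = 1; i < letters.length && result.length < 4; i++) {
    const ch = letters[i];
    if (ch === "h" || ch === "w") {
      continue; // h and w do not separate letters with the same code
    }
    const code = SOUNDEX_CODES[ch] || "";
    if (code !== "" && code !== previous) {
      result += code;
    }
    previous = code; // a vowel resets this, so a repeated code counts again
  }
  return result.padEnd(4, "0");
}

// Hypothetical helper: keeps the words whose code matches the input's code.
function findSoundAlike(words, input) {
  const target = soundex(input);
  return words.filter((w) => soundex(w) === target);
}
```

For example, findSoundAlike(["Robert", "Rupert", "Rubin"], "Robbert") keeps "Robert" and "Rupert", because all three spellings encode to R163 while "Rubin" encodes to R150.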

Asp.net fulltext multiple search terms methodology

I've got a search box that users can type terms into. I have a table set up with full-text searching on a string column. Let's say a user types this: "word, office, microsoft" and clicks "search".
Is this the best way to deal with multiple search terms?
(pseudocode)
foreach (string searchWord in searchTerms){
select col1 from myTable where contains(fts_column, 'searchWord')
}
Is there a way of including the search terms in the sql and not iterating? I'm trying to reduce the amount of calls to the sql server.
FREETEXT might work for you. It will separate the string into individual words based on word boundaries (word-breaking). Then you'd only have a single SQL call.
MSDN -- FREETEXT
Well you could just build your SQL Query Dynamically...
string[] searchWords = searchTerm.Split(',');
string SQL = "SELECT col1 FROM myTable WHERE 1=2";
foreach (string word in searchWords)
{
    // Trim the whitespace left over from input like "word, office, microsoft".
    SQL = string.Format("{0} OR contains(fts_column, '{1}')", SQL, word.Trim());
}
//EXEC SQL...
Obviously this comes with the usual warnings/disclaimers about SQL injection etc., but the principle is that you would dynamically build up all your clauses and apply them in one query.
Depending on how you're interacting with your DB, it might be feasible for you to pass the entire un-split search term into a SPROC and then split and build the dynamic SQL inside the stored procedure.
You could do it similar to what you have there: just parse the search terms based on delimiter, and then make a call on each, joining the results together. Alternatively, you can do multiple CONTAINS:
SELECT Name FROM Products WHERE CONTAINS(Name, @Param1) OR CONTAINS(Name, @Param2) etc.
Maybe try both and see which is faster in your environment.
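Another option in the same spirit is to fold all the terms into one search condition and pass it as a single parameter, since CONTAINS accepts OR between quoted terms. A minimal sketch in JavaScript: the helper name buildContainsCondition and the @condition parameter name are assumptions, and myTable / fts_column are the placeholder names from the question.

```javascript
// Hypothetical helper: turns 'word, office, microsoft' into
// '"word" OR "office" OR "microsoft"' for a single CONTAINS call.
function buildContainsCondition(searchInput) {
  return searchInput
    .split(",")
    .map((term) => term.trim().replace(/"/g, "")) // drop quotes that would break the phrase syntax
    .filter((term) => term.length > 0)
    .map((term) => `"${term}"`)
    .join(" OR ");
}

// One statement, one round trip; bind the condition string as a parameter
// instead of concatenating it into the SQL text.
const sql = "SELECT col1 FROM myTable WHERE CONTAINS(fts_column, @condition)";
```

Skip the query when the condition comes back empty (for example when the user typed only commas), because an empty search condition is not something CONTAINS can evaluate.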
I use this class for Normalizing SQL Server Full-text Search Conditions
