Technical reason for no support for case-insensitive searches in Firebase

There is currently no support for case-insensitive searches in Firebase. The best way to handle this is to store the lowercase string alongside the original string and then query the lowercase string instead.
But what is the technical reason behind this? Why can't I do a case-insensitive query?
Is there a chance that this will be implemented someday?

As Jay said in the comments: performing a case-insensitive search requires logic, i.e. knowing to search for 'A' OR 'a'. Firebase nodes do not contain logic of any kind, so no computation is involved, which is one of the reasons it's so darn fast: it's just raw data.
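The workaround from the question (store a lowercase copy next to the original and query that copy) can be sketched in plain JavaScript. This is a minimal, hypothetical in-memory stand-in, not the Firebase SDK: the `name_lower` field and the helper names are assumptions. In the Realtime Database the same idea would be an `orderByChild` query on the lowercase child; the prefix helper mirrors the common `startAt(term)` / `endAt(term + "\uf8ff")` range pattern.

```javascript
// Hypothetical in-memory stand-in for a Firebase node. "name_lower" is an
// assumed field that holds a normalized copy written alongside "name".
function withLowercase(record) {
  return { ...record, name_lower: record.name.toLowerCase() };
}

// Exact case-insensitive match: the client normalizes the search term once,
// then the lookup is a plain equality check on stored data (no "A OR a" logic).
function findByName(records, term) {
  const needle = term.toLowerCase();
  return records.filter((r) => r.name_lower === needle);
}

// Prefix match, mirroring the startAt(term) / endAt(term + "\uf8ff") pattern:
// every string that starts with `start` sorts between the two bounds.
function findByPrefix(records, term) {
  const start = term.toLowerCase();
  const end = start + "\uf8ff";
  return records.filter((r) => r.name_lower >= start && r.name_lower <= end);
}

const users = [{ name: "Alice" }, { name: "ALICIA" }, { name: "Bob" }].map(withLowercase);
console.log(findByName(users, "aLiCe").map((u) => u.name)); // [ 'Alice' ]
console.log(findByPrefix(users, "ali").map((u) => u.name)); // [ 'Alice', 'ALICIA' ]
```

Because both lookups compare against data that was already normalized at write time, nothing has to be computed on the database side, which is the point Jay makes.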

Related

In Azure Search, how do you run a "contains" search with multiple terms?

I am using Azure Search in full query mode on top of CosmosDB and I want to run a query for any documents with a field that contains the string "azy do". This should match, for example, a document containing "lazy dog".
From the Azure Search documentation, it looks like this is impossible due to the term-based indexes it uses.
Rejected solutions
Returns 0 matches, since it looks for whole words:
"azy do"
Doesn't work, since regexes are not allowed to span multiple terms:
/.*azy do.*/
This "works" to the extent that it matches "lazy dog", but it does not respect the ordering of the query and will also match, for example, "dog lazy":
/.*azy.*/ AND /.*do.*/
Is there any way of doing this correctly in Azure Search?
If you want to test a potential solution, you can modify the search variable in this JSFiddle that is using a demo Microsoft instance of Azure Search.
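A simplified model helps show why the rejected queries behave as described. The sketch below is an assumption for illustration, not how Azure Search is implemented: it reduces a document to lowercase terms and, as I understand Lucene regex queries, tests each pattern against one whole term at a time. All helper names are hypothetical.

```javascript
// Simplified, hypothetical model of a term-based index: text is reduced to
// lowercase terms and every pattern is tested against one whole term at a time.
function toTerms(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function anyTermMatches(terms, pattern) {
  return terms.some((term) => pattern.test(term));
}

// Mirrors /.*azy.*/ AND /.*do.*/: the clauses are checked independently,
// so nothing requires "azy" to appear before "do".
function matchesBothClauses(text) {
  const terms = toTerms(text);
  return anyTermMatches(terms, /^.*azy.*$/) && anyTermMatches(terms, /^.*do.*$/);
}

// Mirrors /.*azy do.*/: no single term contains a space, so this never matches.
function matchesSpanningPattern(text) {
  return anyTermMatches(toTerms(text), /^.*azy do.*$/);
}

for (const doc of ["lazy dog", "dog lazy"]) {
  console.log(doc, matchesBothClauses(doc), matchesSpanningPattern(doc));
}
// lazy dog true false
// dog lazy true false
```

In this model the `AND` version accepts both word orders because each clause sees the term list independently, and the spanning pattern fails because the space that separates the words never survives tokenization.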
If I understand correctly, this exact scenario doesn't appear to be possible.
Without the regex wildcard, you could do a proximity search:
var search = '"business budget"~3';
Here's a link for more reading:
https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search
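If the proximity query is built from user input, a small helper keeps the syntax consistent. This is a hypothetical sketch: `buildProximityQuery` is not part of any Azure SDK, and it only produces the Lucene query string, which Azure Search interprets when the request uses the full Lucene syntax (`queryType=full`). The terms still have to be whole indexed terms, so a fragment like `azy` will not match.

```javascript
// Hypothetical helper: builds a Lucene proximity (sloppy phrase) query of the
// form "term1 term2"~N, as in the '"business budget"~3' example above.
function buildProximityQuery(terms, maxDistance) {
  if (!Number.isInteger(maxDistance) || maxDistance < 0) {
    throw new RangeError("maxDistance must be a non-negative integer");
  }
  // Escape backslashes and double quotes so a term cannot close the phrase early.
  const escaped = terms.map((t) => t.replace(/["\\]/g, "\\$&"));
  return `"${escaped.join(" ")}"~${maxDistance}`;
}

const search = buildProximityQuery(["business", "budget"], 3);
console.log(search); // "business budget"~3
```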

Misspelled words in Watson Conversation?

How do I handle misspelled words in the Watson Conversation API? The NLP technique/algorithm used in the Conversation API calculates word rankings and matches the trained data based on rank. But how do I handle misspelled words or short names in English?
At the moment there is nothing special to handle misspellings. The best approach is to use the 'Synonyms' option within entities to add what you expect the user to type, including misspellings, short names, and acronyms.
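Conceptually, the synonyms approach maps every expected variant to one canonical entity value. The sketch below is a hypothetical illustration, not the Watson API: the shape of the entity object is an assumption, and `buildSynonymIndex` and `resolve` are made-up helpers. It also shows the limitation from the answer: a misspelling nobody listed is not recognized.

```javascript
// Assumed entity shape: each value lists synonyms, including expected
// misspellings and short forms, as the answer suggests.
const cityEntity = {
  entity: "city",
  values: [
    { value: "New York", synonyms: ["NYC", "new yrok", "newyork"] },
    { value: "San Francisco", synonyms: ["SF", "san fransisco", "frisco"] },
  ],
};

// Map every value and synonym (lowercased) to its canonical value.
// This only illustrates the idea; Watson performs its own matching.
function buildSynonymIndex(entity) {
  const index = new Map();
  for (const { value, synonyms } of entity.values) {
    for (const form of [value, ...synonyms]) {
      index.set(form.toLowerCase(), value);
    }
  }
  return index;
}

// Hypothetical lookup: normalize user text the same way before matching.
function resolve(index, text) {
  return index.get(text.trim().toLowerCase());
}

const index = buildSynonymIndex(cityEntity);
console.log(resolve(index, "new yrok")); // New York
console.log(resolve(index, "SF")); // San Francisco
console.log(resolve(index, "new yokr")); // undefined: unlisted misspelling
```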

Is the Google Id/Subject string returned from a GoogleIDToken validation actually a number?

I'm currently looking into server-side validation of a GoogleIDToken for Google Sign-in (Android & iOS). Documentation here
In the example, the "sub" field in the object returned by the Google API endpoint is read as a string, but it looks like it may actually be a (really big) number.
Tests with some other users on my side also return big numbers.
Looking deeper in the Payload documentation, it looks like this value could be null, but outside of this possibility, can we assume that this string is actually a number?
This is important because we want to store it in a database, and saving it as a number might actually be more efficient than a string.
I work on the team at Google: this value should be stored as a string. It may be parseable as a number, but there is no guarantee, so do not rely on that assumption!
If you're going to do arithmetic on it, then store it as a number. Otherwise, don't.
This is a general rule.
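The advice to keep the value as a string can be illustrated in JavaScript, where numbers are IEEE 754 doubles that are exact only up to `Number.MAX_SAFE_INTEGER` (2^53 - 1). Both IDs below are hypothetical 21-digit values chosen for the demonstration; converting them to numbers loses digits, and two different IDs end up equal.

```javascript
// Hypothetical 21-digit "sub" values; real ones are opaque strings.
const sub = "112233445566778899001";
const other = "112233445566778899002";

// JavaScript numbers are doubles, exact only up to 2^53 - 1.
const asNumber = Number(sub);
console.log(asNumber > Number.MAX_SAFE_INTEGER); // true
console.log(String(asNumber) === sub); // false: the digits did not survive

// Two different IDs collapse to the same number but stay distinct as strings.
console.log(Number(other) === asNumber); // true
console.log(other === sub); // false
```

A 21-digit value would also exceed the range of a 64-bit integer (a signed 64-bit integer tops out at about 9.22 × 10^18), so a typical integer column would not be a safe alternative either.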

Constructing a Windows Search query

We have a website on which we will be using the Windows Search feature to allow users to search the pages and documents of the site. I would like to make the search as intuitive as possible, given that most users are already familiar with Google-style search syntax. However, using Windows Search seems to present two problems.
If I use the FREETEXT() predicate, users can enter certain Google-style syntax options, such as double quotes for exact phrase matching or the minus sign to exclude a certain word. These are features I consider necessary. However, the FREETEXT() predicate seems to demand that every search term appear somewhere in the page / document in order for it to be returned in the results.
If I use the CONTAINS() predicate, then users can enter search terms using boolean operators, and they can execute wildcard searches using the * character. However, all search terms must be joined by one of the logical operators or enclosed in double quotation marks.
What I would like is a combination of the two. Users should be able to search for exact phrases using double quotation marks, exclude words using the minus sign, and also have anything not enclosed in quotation marks be subject to wildcard matching (e.g. searching for civ would return documents containing the words civil, civility or civilization).
How could I go about implementing this?
I followed some of the instructions at http://www.codeproject.com/Articles/21142/How-to-Use-Windows-Vista-Search-API-from-a-WPF-App to create the Interop.SearchAPI.dll assembly for .NET. I then used the ISearchQueryHelper.GenerateSQLFromUserQuery() method to build the SQL command.
The generated SQL uses the CONTAINS() predicate, but it builds the CONTAINS() predicate numerous times with different combinations of the search terms, including wildcards. This allows the user to enter exact phrases using double quotation marks, exclude words using the minus sign, and get automatic wildcard matching, as I described in the original question.
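To make the desired mapping concrete, here is a hypothetical sketch of translating a Google-style query into a single CONTAINS() condition. It is not the output of `ISearchQueryHelper.GenerateSQLFromUserQuery()`, which builds several CONTAINS() clauses; the helper names are made up, and the condition syntax used here (`"civ*"` prefix terms, `AND` / `OR` / `AND NOT`) is an assumption to verify against the Windows Search SQL documentation.

```javascript
// Hypothetical parser for Google-style input: "quoted phrase", -excluded, bare word.
function parseUserQuery(input) {
  const phrases = [];
  const excluded = [];
  const prefixes = [];
  // Each match is an optionally negated "quoted phrase" (groups 1-2)
  // or an optionally negated bare word (groups 3-4).
  const tokenRe = /(-?)"([^"]+)"|(-?)([^\s"]+)/g;
  for (const m of input.matchAll(tokenRe)) {
    const isPhrase = m[2] !== undefined;
    const text = isPhrase ? m[2] : m[4];
    const negated = (isPhrase ? m[1] : m[3]) === "-";
    if (negated) excluded.push(text);
    else if (isPhrase) phrases.push(text);
    else prefixes.push(text);
  }
  return { phrases, excluded, prefixes };
}

// Builds one CONTAINS() condition string (assumed syntax, see the lead-in).
function toContainsCondition({ phrases, excluded, prefixes }) {
  // Exact phrases are required.
  const required = phrases.map((p) => `"${p}"`);
  if (prefixes.length > 0) {
    // Bare words become prefix terms and are ORed, so a document does not
    // have to contain every one of them.
    required.push(`(${prefixes.map((w) => `"${w}*"`).join(" OR ")})`);
  }
  if (required.length === 0) {
    throw new Error("the query needs at least one term that is not excluded");
  }
  // Excluded words are matched exactly and removed with AND NOT.
  const exclusions = excluded.map((t) => ` AND NOT "${t}"`).join("");
  const condition = required.join(" AND ") + exclusions;
  // Double single quotes so the condition can sit inside a SQL string literal.
  return `CONTAINS('${condition.replace(/'/g, "''")}')`;
}

console.log(toContainsCondition(parseUserQuery('"civil war" civ -battle')));
// CONTAINS('"civil war" AND ("civ*") AND NOT "battle"')
```

ORing the bare words is a design choice aimed at the FREETEXT() complaint in the question; joining them with AND instead would make every word mandatory.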

How to check if user input data is in other than English language?

I am using the Facebook API in my app for user authentication and then save the user data into the DB. I use the same (i.e. Facebook) username for my app if it exists; otherwise I create the username from the user's name. The problem is that some users don't have their display name in English. How can I check for such input on the server side?
My app is written in ASP.NET.
You can use regular expressions to check if the characters are only a, b, c...z or A, B, C...Z:
using System.Text.RegularExpressions;
// ^ and $ anchor the whole string; [a-zA-Z]+ allows one or more unaccented English letters only
Regex rgx = new Regex("^[a-zA-Z]+$");
if (rgx.IsMatch(inputData))
{
    // input data is in the English alphabet; take appropriate action...
}
else
{
    // input data is not in the English alphabet; take appropriate action...
}
It may be overkill for this task, but the correct way to detect the input language is to use something like the Extended Linguistic Services APIs or a service like the Free Language Detection API.
In your case, I suggest saving user names in an appropriate encoding (like UTF-8 or UTF-16, which should be fine for Facebook user names).
Your problem isn't that the usernames are in a foreign language, but rather that you are trying to store data in a database without using the appropriate character encoding (the only time I've ever seen those ??? is when the character encoding was at least one level too low for the current problem).
At a minimum, you should be using UTF-8, but you probably want to use UTF-16 (or even UTF-32 if you're being really conservative). I also recommend this mandatory reading.
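The "???" symptom can be reproduced with a small simulation. This is an illustrative assumption, not how any particular database behaves: the hypothetical `storeAsAscii` stands in for a column whose character set cannot represent the name and substitutes "?", while `storeAsUtf8` round-trips through UTF-8 bytes. UTF-8, UTF-16 and UTF-32 can all represent every Unicode character; the sketch uses UTF-8 only because `TextEncoder` always encodes UTF-8.

```javascript
// Simulates a column that can only hold ASCII: every character above 0x7F
// is replaced with "?", producing the kind of "???" corruption described above.
function storeAsAscii(text) {
  return Array.from(text, (ch) => (ch.codePointAt(0) <= 0x7f ? ch : "?")).join("");
}

// Round-trips the text through UTF-8 bytes, as a Unicode-capable column would.
function storeAsUtf8(text) {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder("utf-8").decode(bytes);
}

const name = "Владимир";
console.log(storeAsAscii(name)); // ????????
console.log(storeAsUtf8(name) === name); // true
console.log(new TextEncoder().encode(name).length); // 16 (2 bytes per Cyrillic letter)
```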
Determining whether a username is in English or not is impossible. There are too many possible variants on proper nouns to provide any reliable metric, and then there are transplanted names and the like. You can try to detect whether there are non-ASCII characters (I believe /[^ -~]/ should match all of them: space is the lowest "typeable" character in ASCII and ~ is the highest), but then you are compensating for the Unicode problem instead of letting the computer handle it gracefully.
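Both character checks from this thread can be tried side by side. In this minimal sketch the helper names are hypothetical: `hasNonPrintableAscii` uses the `/[^ -~]/` pattern from this answer, and `isAsciiLettersOnly` mirrors the `^[a-zA-Z]+$` regex from the C# answer. Neither can tell whether a name is "English"; they only report which characters it uses.

```javascript
// [ -~] is the printable ASCII range (space 0x20 through tilde 0x7E),
// so [^ -~] matches any character outside it.
function hasNonPrintableAscii(name) {
  return /[^ -~]/.test(name);
}

// Same idea as the C# ^[a-zA-Z]+$ check: unaccented letters only.
function isAsciiLettersOnly(name) {
  return /^[a-zA-Z]+$/.test(name);
}

for (const name of ["John", "Mary Ann", "José", "Владимир"]) {
  console.log(`${name}: nonAscii=${hasNonPrintableAscii(name)}, lettersOnly=${isAsciiLettersOnly(name)}`);
}
// John: nonAscii=false, lettersOnly=true
// Mary Ann: nonAscii=false, lettersOnly=false
// José: nonAscii=true, lettersOnly=false
// Владимир: nonAscii=true, lettersOnly=false
```

Note that the letters-only pattern also rejects ordinary names that contain a space or hyphen, such as "Mary Ann".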
