We have a website on which we will use the Windows Search feature to let users search the site's pages and documents. I would like to make the search as intuitive as possible, given that most users are already familiar with Google-style search syntax. However, using Windows Search seems to present two problems.
If I use the FREETEXT() predicate, then users can enter certain Google-style syntax options, such as double quotes for exact phrase matching or the minus sign to exclude a certain word. These are features I consider necessary. However, the FREETEXT() predicate seems to demand that every search term appear somewhere in the page / document in order for it to be returned in the results.
If I use the CONTAINS() predicate, then users can enter search terms using boolean operators, and they can execute wildcard searches using the * character. However, all search terms must be joined by one of the logical operators or enclosed in double quotation marks.
What I would like is a combination of the two. Users should be able to search for exact phrases using double quotation marks and exclude words using the minus sign, but also have anything not enclosed in quotation marks be subject to wildcard matching (e.g. searching for civ would return documents containing the words civil or civility or civilization).
How could I go about implementing this?
I followed some of the instructions at http://www.codeproject.com/Articles/21142/How-to-Use-Windows-Vista-Search-API-from-a-WPF-App to create the Interop.SearchAPI.dll assembly for .NET. I then used the ISearchQueryHelper.GenerateSQLFromUserQuery() method to build the SQL command.
The generated SQL uses the CONTAINS() predicate, but it builds the CONTAINS() predicate numerous times with different combinations of the search terms, including wild cards. This allows the user to enter exact phrases using double quotation marks, exclude words using the minus sign, and perform automatic wildcard matching as I mentioned in the original question.
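To make the idea concrete, here is a minimal sketch of how a Google-style query could be turned into CONTAINS() predicates by hand. It is not the SQL that GenerateSQLFromUserQuery() actually emits (that output is not shown here and will differ); the function name build_where_clause and the choice to join everything with AND are assumptions made only for illustration.

```python
import re


def build_where_clause(user_query: str) -> str:
    """Translate a Google-style query into CONTAINS() predicates (illustrative only)."""
    # Each token is an optional leading minus followed by a quoted phrase or a bare word.
    tokens = re.findall(r'(-?)(?:"([^"]+)"|(\S+))', user_query)
    clauses = []
    for minus, phrase, word in tokens:
        if phrase:
            # Quoted text is matched as an exact phrase.
            term = f'"{phrase}"'
        else:
            word = word.strip('"')  # tolerate a stray, unbalanced quote
            if not word:
                continue
            # Bare words get a trailing * for prefix matching; excluded words stay exact.
            term = f'"{word}"' if minus else f'"{word}*"'
        term = term.replace("'", "''")  # escape single quotes for the SQL literal
        clause = f"CONTAINS('{term}')"
        clauses.append(f"NOT {clause}" if minus else clause)
    return " AND ".join(clauses)


print(build_where_clause('civ "exact phrase" -draft'))
```

The resulting string would go into the WHERE clause of a query against the index. Joining with AND means every term must match, which may or may not be the behaviour you want.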
There is currently no support for case-insensitive searches in Firebase. The best way to handle this would be to store a lowercase copy of the string alongside the original string and then query the lowercase copy instead.
But what is the technical reason behind this? Why can't I do a case-insensitive query?
Is there a chance that this will be implemented someday?
As Jay said in the comments: performing a case-insensitive search requires logic, i.e. knowing to search for 'A' OR 'a', and Firebase nodes do not contain logic of any kind. There's no computing involved, which is one of the reasons it's so darn fast: it's just raw data.
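The workaround from the question (keeping a lowercase copy next to the original value and querying that copy) can be sketched without any Firebase code at all. The example below uses plain Python dictionaries in place of database nodes; the helper names and the name_lower field are hypothetical.

```python
def with_search_key(record: dict, field: str) -> dict:
    """Return a copy of record with a lowercase shadow of `field` for querying."""
    shadow = dict(record)
    shadow[f"{field}_lower"] = record[field].lower()
    return shadow


def find_by_name(records: list, query: str) -> list:
    """Exact match on the shadow field, which is all an equality query needs."""
    key = query.lower()  # normalise the query the same way as the stored copy
    return [r for r in records if r["name_lower"] == key]


records = [with_search_key({"name": "Alice"}, "name"),
           with_search_key({"name": "BOB"}, "name")]
print(find_by_name(records, "alice"))
```

The normalisation happens once at write time and once on the query string, so the lookup itself stays a plain equality comparison on stored data.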
I am using Azure Search in full query mode on top of CosmosDB and I want to run a query for any documents with a field that contains the string "azy do". This should match, for example, a document containing "lazy dog".
Reading the Azure Search documentation, it looks like this is impossible due to the term-based indexes it uses.
Rejected solutions
0 matches since it is looking for whole words:
"azy do"
Doesn't work since regexes are not allowed to span multiple terms:
/.*azy do.*/
This "works", to the extent that it will match "lazy dog", but it does not respect the ordering of the query and will also match "dog lazy", for example:
/.*azy.*/ AND /.*do.*/
Is there any way of doing this correctly in Azure Search?
If you want to test a potential solution, you can modify the search variable in this JSFiddle that is using a demo Microsoft instance of Azure Search.
It doesn't appear that this exact scenario is possible if I understand correctly.
Without the regex wildcard you could do a proximity search:
var search = '"business budget"~3';
Here's a link for more reading:
https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search
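One possible workaround, which is not an Azure Search feature, is to run the broader query from the question (/.*azy.*/ AND /.*do.*/) and then drop the false positives on the client. The sketch below shows only that client-side step; it assumes the results arrive as a list of dictionaries, and the function name and field name are made up for the example.

```python
def filter_by_substring(documents, field, needle):
    """Keep only documents whose `field` contains `needle` as a contiguous substring."""
    needle = needle.lower()
    return [
        doc for doc in documents
        if needle in str(doc.get(field, "")).lower()
    ]


# Pretend these came back from the broader AND query.
candidates = [{"text": "the lazy dog"}, {"text": "dog lazy"}]
print(filter_by_substring(candidates, "text", "azy do"))
```

This is only practical when the broader query returns a manageable number of candidates, and paging and result counts would no longer reflect the filtered set.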
When I put 'chromosome' into Google, it shows the meaning, phonetic notation and usage of 'chromosome', among other things. I think this is quite useful and I often use it to look up words. But not every word you put into the engine gives you the dictionary result, so I am wondering if there is a way to force Google to show it.
Prefix your search query with define to get definitions.
For example,
define chromosome
I need to recognize complex chemical names from a scanned document (PDF). They contain special characters and are written in a table format. I also have an Excel document that contains ALL possible names (I would say rows, because there are no combinations) that I may encounter during scanning. Is there a way to create ligatures (so that FineReader will recognize an entire row instead of dissecting it into separate characters)? I tried creating a user dictionary, but FineReader does not treat an entry as one row.
The only way to create ligatures is to use "user pattern training". In FineReader, go to Tools -> Options -> Read tab (changes slightly depending on FR version) and enable User pattern training. During training extend your box to include several combined characters, thus creating a ligature.
Recognizing formulas with this method is tough but may be possible.
I have done this many times in my work at www.wisetrend.com. I am a former ABBYY support employee and current integrator and OCR consulting specialist. I will be glad to help if you need more specific assistance.
Let's say that, abstracting from any particular language, we have some ontology made of triples (e.g. subject (S) - predicate (P) - object (O)).
Now if I want to, for some reason, annotate any of these triples (nodes), then I'd like to keep links to them that I can use in web documents.
Here are some conditions:
1) Such link must be in a form of one line of text
2) Such link should be easily parseable both by machine and person
3) Sections of such links should be delimited
4) Such link must be easy to grep, which IMO means they should be wrapped in some distinct letters or characters to make them easy to regex from any web or other document
5) Such link can be used in URL pathnames or query strings, thus has to comply with URL syntax
6) Characters used in such link must not be reserved for URL pathnames, query strings or hashes (e.g. not "/", ";", "?", "#")
My ideas so far were as follows:
a) Start and end such link with some distinct, constant set of letters, e.g. STK_....._OVRFLW
b) Separate sections with dashes "-", e.g. Subject-Predicate-Object
So it would look like:
STK_S1234-P123-O1234_OVRFLW
Do you have better ideas?
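For what it's worth, the grep requirement in condition 4 is easy to check against the proposed format. Below is a minimal sketch in Python; the assumption that each section consists only of ASCII letters and digits is mine, as is the function name.

```python
import re

# STK_ <subject> - <predicate> - <object> _OVRFLW, each section alphanumeric (assumed).
LINK_RE = re.compile(r"STK_([A-Za-z0-9]+)-([A-Za-z0-9]+)-([A-Za-z0-9]+)_OVRFLW")


def find_triple_links(text):
    """Return (subject, predicate, object) tuples for every link found in text."""
    return [match.groups() for match in LINK_RE.finditer(text)]


page = "See STK_S1234-P123-O1234_OVRFLW for details."
print(find_triple_links(page))
```

Because the sections exclude "-" and "_", the delimiters stay unambiguous; identifiers that may contain those characters would need a different separator.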
I'm with @msalvadores on this one - this seems to be a classic use of semantic web / linked data (albeit in a rather complex form), and your example seems to be more related to URI design than anything else.
This is dealt with extensively in the semantic web literature, and there are also JavaScript libraries for querying RDF through SPARQL - it just makes more sense to stick with the standard.
To link to a triple, the standard method is to use reification - essentially naming a triple (to keep with the triple model, it ends up creating 4 triples, but I would consider it the "correct" method in this situation). There is also the "named graph" method, which isn't a standard, but probably has more widespread adoption.
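As a rough illustration of what reification produces, the sketch below builds the four triples that name a statement, using plain Python tuples rather than an RDF library. The rdf: terms are the standard reification vocabulary; the example.org URIs and the function name are placeholders.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_uri, subject, predicate, obj):
    """Return the four triples that describe one statement under a URI of its own."""
    return [
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", subject),
        (statement_uri, RDF + "predicate", predicate),
        (statement_uri, RDF + "object", obj),
    ]


# Placeholder URIs, for illustration only.
triples = reify(
    "http://example.org/statement/S1234_P123_O123",
    "http://example.org/S1234",
    "http://example.org/P123",
    "http://example.org/O123",
)
for triple in triples:
    print(triple)
```

The statement URI is then the link: annotations are ordinary triples whose subject is that URI.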
1) The link will be one line of text.
2) It will be easily machine-parseable; to make it human-parseable, it might be necessary to give some thought to URI design.
3) Delimitation once again comes down to URI design.
4) Easy grepping: URI design again.
5) URL syntax: tick.
6) No "/", ";", "?", "#": I would try to incorporate it into a URL instead of pushing it out.
I would consider www.stackoverflow.com/statement/S1234_P123_O123, where S1234 etc. are unique labels (I don't necessarily agree with human-readable URIs, but I guess they'll have to stay until humans don't have to read URIs). The beautiful thing is that it should dereference and give a nice human-readable or machine-readable representation, as appropriate.