I am using Python with sqlite3.
Is there an advantage to using FTS3 or FTS4 if I only want to search for words in one column, or could I just use LIKE '%word%'?
While yes, you could handle the simplest of cases with LIKE '%word%', FTS allows user queries like clown NOT circus (matches rows talking about clowns but not circuses). It handles stemming (well, in English) so that break would match against breaks but not breakdance. It also builds indexes in a different way so that searching is much faster for this sort of query, and can return not just where the match happened but also a snippet of the matched text.
Finally, all the parsing of those potentially-complex user queries is done for you; that's code you don't have to write at all.
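To make the difference concrete, here is a minimal sketch using Python's bundled sqlite3 module (the docs table and its contents are made up for the example). Note that a default FTS3/FTS4 build uses the standard query syntax, where exclusion is written with a leading -; clown NOT circus works when the library is compiled with the enhanced query syntax.

import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# An FTS4 virtual table indexing a single text column;
# tokenize=porter enables the English stemming mentioned above.
cur.execute("CREATE VIRTUAL TABLE docs USING fts4(body, tokenize=porter)")
cur.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [("the clown juggled at the party",),
     ("the circus clown rode an elephant",)],
)

# MATCH uses the full-text index; snippet() returns the matched context
# with the hit highlighted. 'clown -circus' excludes rows mentioning circus
# (write 'clown NOT circus' under the enhanced query syntax).
for row in cur.execute(
        "SELECT snippet(docs) FROM docs WHERE body MATCH 'clown -circus'"):
    print(row[0])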
Is there a way to search for a keyword across all columns of all tables in Azure Data Explorer? I know "* has" syntax works for searching in all columns in a table but if I want to search for a keyword across all tables, how do I do it?
From an efficiency standpoint, it's better, whenever possible, to scope your query to only the specific table(s) relevant to your use case.
That said, you may want to look at the find operator: https://learn.microsoft.com/en-us/azure/kusto/query/findoperator
You can try the option below:
search in (*) "NPA*" or "CSA*"
It appears that SQLite, apparently as a "compatibility feature", parses double quoted identifiers as string literals if no matching column is found.
I understand that it does so for people who write improper sql, and for backwards compatibility with legacy projects created by such people, but it makes debugging very difficult for those of us writing proper sql on brand new projects.
For example,
SELECT * FROM "users" WHERE "usernme" = 'joe';
returns 0 rows, since the string 'usernme' does not equal the string 'joe'.
This leaves me scratching my head wondering why I'm not getting joe's row even though I know there's a user by that name, until I painstakingly backtrack through my code and realize that I left out an a.
Is there any "strict mode" PRAGMA or API option to enforce quoting rules and treat all double-quoted strings as identifiers so that it will inform me immediately if one is misspelled?
(And please, no answers telling me not to quote identifiers if I don't need to, because any such answer is basically telling me that in order to get proper debugging, you have to write bad code in the first place.)
This is hardcoded in the SQLite parser and cannot be changed from the outside.
I also asked in the SQLite channel and someone there was kind enough to look through the source code and create a patch, and even started a thread on the mailing list describing the patch:
http://www.mail-archive.com/sqlite-users#sqlite.org/msg73832.html
It's not an answer that works for the official builds, but it may be someday. For the moment, I'm just going to recompile it myself with this patch.
Ten years later, and this doesn't completely meet your criteria about "strict mode" kinds of things, but here's a trick I used to make some queries safer, if you can remember to use it: give your table an alias and reference columns through it:
SELECT t."nosuch_column" FROM some_table t;
I suppose in this form, it's clear to SQLite that a literal isn't desired.
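As a quick illustration of both behaviours, here is a sketch using Python's sqlite3 module (the users table is hypothetical): the unqualified typo silently becomes a string literal, while the aliased reference fails immediately.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (username TEXT)")
con.execute("INSERT INTO users VALUES ('joe')")

# The misspelled identifier falls back to the string literal 'usernme',
# so this compares 'usernme' = 'joe' and silently returns no rows.
print(con.execute(
    "SELECT * FROM users WHERE \"usernme\" = 'joe'").fetchall())  # []

# Qualified with a table alias, the typo can only be a column reference,
# so SQLite raises an error like: no such column: u.usernme
try:
    con.execute("SELECT * FROM users u WHERE u.\"usernme\" = 'joe'")
except sqlite3.OperationalError as exc:
    print(exc)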
I'm trying to build a search that is similar to that on Google (with regards to exact match encapsulated in double quotes).
Let's use the following phrase for an example
"phrase search" single terms [different phrase]
Currently if I use the following code
Dim searchTermsArray As String() = searchTerms.Split(New String() {" ", ",", ";"}, StringSplitOptions.RemoveEmptyEntries)
For Each entry In searchTermsArray
    Response.Write(entry & "<br>")
Next
my output is
"phrase
search"
single
terms
[different
phrase]
but what I really need is to build a key value pair
phrase search | table1
single | table1
terms | table1
different phrase | table2
where table1 is a table with general info, and table2 is a table of "tags" similar to that on stackoverflow.
Can anybody point me in the right direction on how to properly capture the input?
What you are trying to do is not that trivial. Implementing a search "similar to Google's" goes far beyond parsing the search string.
I'd suggest you not reinvent the wheel and instead use production-ready solutions such as Apache Lucene.NET or Apache Solr. Those handle both parsing and full-text search.
But if you only need to parse this kind of string, then you should really consider the solution Pete pointed to.
Regex is your friend. See this question
Depending on how fancy you plan in getting, you might consider the search grammar/implementation that's included with Irony.
http://irony.codeplex.com/
Search string parsing is a non-regular problem. That means that while a regular expression can get deceptively close, it won't take you all the way there without using proprietary extensions, building an unmaintainable mess of an expression, leaving nasty edge cases open that don't work how you'd like, or some combination of the three.
Instead, there are three correct ways to handle this:
Use a third-party solution like Lucene.
Build a grammar via something like antlr.
Build your own state machine.
For a problem of this level (and assuming that search is core enough to what you're doing to really want to implement it yourself), I'd probably go with option 3. This makes more sense when you realize that regular expressions are themselves instructions for how to set up state machines. All you're doing is building that right into your code. This should give you the ability to tune performance and features as well, without requiring adding a larger lexer component into your code.
For an example of how you might do this take a look at my answer to this question:
Reading CSV files in C#
What I would do is build a state machine to parse the string character by character. This will be the easiest way to implement a fully correct solution, and should also result in the fastest code.
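To illustrate the idea (sketched in Python rather than VB.NET just to keep it short), here is a character-by-character parser for the example query above; the table1/table2 buckets simply mirror the placeholders in the question.

def parse_terms(search_terms):
    """Split a raw query into (term, bucket) pairs, character by character.

    "Double-quoted" phrases stay intact and go to table1, like single words;
    [bracketed] phrases go to table2, mirroring the example above.
    """
    results, buf, mode = [], [], None   # mode is None, '"' or '['
    for ch in search_terms:
        if mode is None:
            if ch == '"':
                mode = '"'
            elif ch == '[':
                mode = '['
            elif ch in ' ,;':
                if buf:
                    results.append(("".join(buf), "table1"))
                    buf = []
            else:
                buf.append(ch)
        elif (mode == '"' and ch == '"') or (mode == '[' and ch == ']'):
            results.append(("".join(buf), "table1" if mode == '"' else "table2"))
            buf, mode = [], None
        else:
            buf.append(ch)
    if buf:
        results.append(("".join(buf), "table1"))
    return results

print(parse_terms('"phrase search" single terms [different phrase]'))
# [('phrase search', 'table1'), ('single', 'table1'),
#  ('terms', 'table1'), ('different phrase', 'table2')]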
I have a website that utilizes MS SQL 2008's FTS (Full-Text Search). The search works fine if the user searches for a string with an apostrophe like that's - it returns any results that contain that's. However, it will not return a result if the user searches for thats, and the database stores that's.
Also, ideally a search for that's should also return thats and that, etc.
Anybody know if FTS supports this, and what I can do to enable it?
WT
use a parameter:
myCommand.Parameters.AddWithValue("@searchterm", txtSearch.Text.Trim());
and it will be handled for you, without any hassle.
Regarding that, that's and thats, you should look up the SOUNDEX keyword. 4GuysfromRolla have an old article about it as well.
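For intuition, here is a small Python sketch of the classic Soundex algorithm (an illustration only, not necessarily byte-for-byte identical to SQL Server's SOUNDEX), showing why thats and that's collapse to the same code while exact comparison treats them as different.

def soundex(word):
    # Standard American Soundex digit groups.
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
             "l": "4", "mn": "5", "r": "6"}

    def code_of(ch):
        for letters, digit in codes.items():
            if ch in letters:
                return digit
        return ""  # vowels, h, w and y map to no digit

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    last = code_of(word[0])
    for ch in word[1:]:
        digit = code_of(ch)
        if digit and digit != last:
            result += digit
        if ch not in "hw":  # h and w do not separate duplicate codes
            last = digit
    return (result + "000")[:4]

print(soundex("thats"), soundex("that's"))  # T320 T320
print(soundex("Michael"))                   # M240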
Updated:
There is a great post from one of the FTS team members regarding this here.
Quoting him:
Daniel is correct: Full-Text Search (FTS) does not use SOUNDEX directly, but it can be used in combination with SOUNDEX. Additionally, you may want to review the following links as well as the TSQL example below combining CONTAINS and SOUNDEX.
You may want to look at some of the improved Soundex algorithms as well as the Levenshtein Distance algorithm. You should be able to search Google to find more code examples, for example: 'METAPHONE soundex "sql server" fuzzy name search', and I quickly found "Double Metaphone Sounds Great" at http://www.winnetmag.com/Article/ArticleID/26094/26094.html
You can freely download the code in a zip file that has several user-defined functions (UDFs) that implement Double Metaphone. Below are some additional SOUNDEX links:
http://www.merriampark.com/ld.htm
http://www.bcs-mt.org.uk/nala_006.htm
(further links omitted as out of scope)
use pubs
-- Combined SOUNDEX OR CONTAINS query that
-- searches for names that sound like "Michael".
SELECT au_lname, au_fname
FROM authors  -- returns 2 rows
WHERE CONTAINS(au_fname, '"Mich*"') OR SOUNDEX(au_fname) = 'M240'
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
I have an SQL database with multiple tables, and I am working on creating a searching feature. Other than having multiple queries for the different tables, is there a different way to go about said searching function?
I should probably add that a lot of my content is database driven to make upkeep easier. Lucene will not work for this, correct?
Different approaches to consider:
1) Multiple queries pre-baked, like you described.
2) Dynamic SQL that you put together on the fly based on user-entered criteria.
3) If text is involved, full-text search using SQL Server full-text search or Lucene.
In my open source app BugTracker.NET, I do both 2 and 3 (using Lucene.NET).
I documented how I use Lucene.NET here:
http://www.ifdefined.com/blog/post/2009/02/Full-Text-Search-in-ASPNET-using-LuceneNET.aspx
Since you have tagged the question with ASP.NET, I suppose you want to search your web pages. In that case you can use Indexing Server to easily perform freetext searches over the generated HTML and any keywords you have set up.
As Corey Trager suggested, using Lucene.NET is also an option. It has a good reputation of being fast and quite easy to use.
Although the other answers provide good suggestions such as using Lucene, I have much preferred using a custom caching method.
So for a website that I helped create, we cached the searchable data every couple of hours, from many tables, into one simple table with columns such as:
URL
Item/Page Name
Main Keywords
Text Only Contents
Date Updated
I would then write my SQL statement to search this table, using different functions to determine the rank.
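As a rough sketch of that idea, here is a version using Python's sqlite3 so it runs standalone; the articles source table, the column names, and the crude LIKE-based ranking are assumptions for illustration only.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE articles (url TEXT, title TEXT, body TEXT, updated TEXT);
    CREATE TABLE search_cache (
        url TEXT, name TEXT, keywords TEXT, contents TEXT, updated TEXT
    );
""")
con.execute("INSERT INTO articles VALUES "
            "('/clowns', 'About clowns', 'clowns juggle at the circus', '2024-01-01')")

def rebuild_cache(con):
    # Re-flatten every searchable source table into the one cache table;
    # run this on a schedule (e.g. every couple of hours).
    con.execute("DELETE FROM search_cache")
    con.execute("""
        INSERT INTO search_cache (url, name, keywords, contents, updated)
        SELECT url, title, title, body, updated FROM articles
    """)
    con.commit()

rebuild_cache(con)

# A crude rank: a hit in the name column counts double a hit in the contents.
term = "clown"
rows = con.execute("""
    SELECT url, name,
           (name LIKE '%' || :t || '%') * 2 + (contents LIKE '%' || :t || '%') AS score
    FROM search_cache
    WHERE name LIKE '%' || :t || '%' OR contents LIKE '%' || :t || '%'
    ORDER BY score DESC
""", {"t": term}).fetchall()
print(rows)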
You might want to check out this post I wrote on writing full-text queries. It's in C#, but it's easily portable, or you can just stick it in a library and use it as is:
How to build an SQL full text index search term in c#