full text searching - asp.net

I have an application that allows users to search across multiple columns (prod_name, prod_desc).
I set up full-text search as shown below, but it does not return all the records. For example, when I search for the character 'o' in both columns (prod_name, prod_desc), some records that contain it are not returned.
Also, when I do not use a wildcard around the 'o' character, it finds nothing at all, even though I expected CONTAINS to behave like LIKE '%o%'.
I am a bit confused about full-text search.
What is the problem?
CREATE FULLTEXT CATALOG catalog_crashcourse3;
CREATE FULLTEXT INDEX ON products(prod_name,prod_desc)
KEY INDEX pk_products ON catalog_crashcourse3;
SELECT prod_name, prod_desc
FROM products
WHERE CONTAINS((prod_name,prod_desc), '"*o*"');

SQL Server FTS is a word-based search process. When you create a full-text index on a column, the indexing engine crawls the content and breaks it into individual words in a process known as tokenization. The index then stores each word, the primary key of the row it was found in, and the word's position in the content (i.e. whether it is the first word in the field, the 57th word, or whatever).
When you specify a CONTAINS predicate such as
CONTAINS((prod_name,prod_desc), '"o"');
the SQL Server FTS engine looks for tokens (i.e. words) in its index that are "o". If your content does not have the word "o" in it (which it probably doesn't) then no matches will be found.
As you point out, you can do wildcard searches, where you try to match patterns in the indexed words. For example, if you specify a predicate such as
CONTAINS((prod_name,prod_desc), '"o*"');
then the search will return all rows whose indexed content contains a word that starts with the letter "o".
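To make the word-based behaviour concrete, here is a rough Python sketch of a word index. It is only an illustration of the idea described above, not how SQL Server implements FTS; the sample rows and the function names (tokenize, build_index, search) are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens (a rough stand-in for a word breaker)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(rows):
    """Map each token to a list of (row_key, position) postings."""
    index = defaultdict(list)
    for key, text in rows.items():
        for position, token in enumerate(tokenize(text), start=1):
            index[token].append((key, position))
    return index

def search(index, term):
    """Match a whole word, or a word prefix if the term ends in '*'."""
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return {key for token, postings in index.items()
                if token.startswith(prefix) for key, _ in postings}
    return {key for key, _ in index.get(term.lower(), [])}

# Hypothetical sample data, keyed by primary key.
rows = {1: "Fish bean bag toy", 2: "Oil can", 3: "Raggedy Ann doll"}
index = build_index(rows)

word_hits = search(index, "o")     # no row contains the word "o"
prefix_hits = search(index, "o*")  # only row 2 has a word starting with "o"
like_hits = [key for key, text in rows.items() if "o" in text.lower()]  # LIKE '%o%' style

print(word_hits, prefix_hits, like_hits)
```

All three rows contain the letter "o" somewhere, which is what a LIKE '%o%' scan would find, but only one of them has a word that begins with it, and none has "o" as a word on its own.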
FTS is best used when you want to search for groups of words in your indexed content. It can do sophisticated word stemming (such as searching for "ran" and "running" when you specify "run"). It also provides a ranking of the search result content so that you can find the best match. If you just want to search for a specified word in your content and your content is not too large, you may not need FTS. As MikeSmithDev pointed out in the comments, you may be able to just get away with a LIKE clause.
Note added: In response to your comment, if you have a table with 8 columns that you want to search using FTS, then you would include all 8 columns in the table's full-text index (SQL Server allows only one full-text index per table) and search them as follows:
CONTAINS(*, '"Word"')
where the asterisk indicates that all 8 indexed columns in the table should be included in the search.

You have two issues:
You are using a leading wildcard (*o), which SQL Server FTS cannot handle. It only supports trailing wildcards (prefix terms) such as word*.
You are using a single-character search term. Single-character words are excluded from the full-text index by default, which is a good thing. Unless specified otherwise, SQL Server associates the system full-text stoplist with the index when it is created.
To see the default stoplist your database is using behind your back, use this query:
Select SysStop.stopword, Langs.name
From sys.fulltext_system_stopwords SysStop
Inner Join sys.fulltext_languages Langs
On Langs.lcid = SysStop.language_id;
If you really want to search for single characters, you can drop and
recreate the FT index using the option WITH STOPLIST OFF, but be prepared
for a lot of noise. See Create FullText Index.

Related

How can I implement a junction index in DynamoDB?

Given two DynamoDB tables: Books and Words, how can I create an index that associates the two? Specifically, I'd like to query to get all Books that contain a certain Word, and query to get all Words that appear in a specific Book.
The objective is to avoid scanning an entire table for these queries.
Based on your question I can't tell if you only care about unique words or if you want every word including duplicates. I'll assume unique words.
This can be done with a single table and a Global Secondary Index.
Create a table called BookWords with a Hash key of bookId and a Sort key of word. If you Query this table with a bookId you will get all of the unique words in that book.
Create a Global Secondary Index with a Hash key of word and a Sort key of bookId. If you Query this index with a word you will get all of the bookIds of books that contain that word.
Depending on your use case, you will probably want to normalize the words. For example, is "Word" the same as "word"?
If you want all words, not just unique words, you can use a similar approach with a few small changes. Let me know.
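As a rough illustration of the two access patterns, here is an in-memory Python model of the layout described above. It does not call DynamoDB at all: the BookWords class, its method names and the sample ids are invented for the sketch, and the second dictionary only plays the role that the Global Secondary Index would play (in DynamoDB the index is maintained for you when items are written to the table).

```python
from collections import defaultdict

class BookWords:
    """In-memory stand-in for a table keyed by (bookId, word) with a mirror keyed by (word, bookId)."""

    def __init__(self):
        self.by_book = defaultdict(set)  # plays the role of the base table
        self.by_word = defaultdict(set)  # plays the role of the secondary index

    def put(self, book_id, word):
        word = word.lower()              # normalisation choice: treat "Word" and "word" as equal
        self.by_book[book_id].add(word)
        self.by_word[word].add(book_id)

    def words_in(self, book_id):
        """Equivalent of a Query on the table by bookId."""
        return sorted(self.by_book.get(book_id, ()))

    def books_with(self, word):
        """Equivalent of a Query on the index by word."""
        return sorted(self.by_word.get(word.lower(), ()))

table = BookWords()
for book_id, word in [("b1", "Whale"), ("b1", "sea"), ("b2", "sea"), ("b2", "Sea")]:
    table.put(book_id, word)

print(table.words_in("b1"))
print(table.books_with("SEA"))
```

Neither lookup touches items for other keys, which is the point of the design: both directions are key lookups rather than scans.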

CustTableListPage filtering is too slow

When I try to filter on the CustAccount field on CustTableListPage, the filter takes too long. Other fields show no latency. I am filtering on just part of the account number, like "*123".
I have reindexed CustTable and updated statistics, but there was no appreciable difference.
When I added the list page's query to a view, filtering on the CustAccount field performed normally, like the other fields.
Any suggestions?
Edit:
Our version is AX 2012 R2 CU8. It is not a user-specific problem; it occurs for every user. The interaction class has some customizations, but only for setting enable/disable properties on some buttons, etc. I tried to look at the query execution, but what I found is not clear: something like FETCH_API_CURSOR_000000..x
Record a trace of this execution and locate the bottleneck.
Keep in mind that wildcards (such as *) have to be used with care. A filter string that starts with a wildcard kills performance because the SQL indexes cannot be used to seek to the matching rows.
Using a wildcard at the end
Imagine that you have a dictionary and have to list all the words starting with 'Foo'. You can skip all entries before 'F', then all those before 'Fo', then all those before 'Foo' and start your result list from there.
Similarly, asking the underlying SQL engine to list all CustAccount entries starting with '123' (= filter string '123*') allows using an index on CustAccount to quickly skip to the relevant data.
Using a wildcard at the start
Imagine that you still have that dictionary and have to list all the words ending with 'ing'. You would have no choice but to go through the entire dictionary and check the ending of every word, because the alphabetical sorting does not help.
This explains why asking the SQL engine to list all CustAccount entries ending with '123' (= filter string '*123') means that all CustAccount values must be investigated. So the AOS loops through all the entries and uses an SQL cursor to do this. That is the FETCH_API_CURSOR statement you see on the SQL level.
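The dictionary analogy can be sketched in a few lines of Python. This is only a model of the idea, with a sorted list standing in for the index on CustAccount; the account numbers and function names are made up.

```python
from bisect import bisect_left

def starts_with(sorted_values, prefix):
    """Jump to the first candidate using the sort order, then read until the prefix stops matching."""
    matches = []
    i = bisect_left(sorted_values, prefix)
    while i < len(sorted_values) and sorted_values[i].startswith(prefix):
        matches.append(sorted_values[i])
        i += 1
    return matches

def ends_with(sorted_values, suffix):
    """The sort order does not help here: every value has to be checked."""
    return [value for value in sorted_values if value.endswith(suffix)]

# Hypothetical account numbers, kept sorted like an index.
accounts = sorted(["100123", "123000", "123456", "200123", "300999"])

prefix_matches = starts_with(accounts, "123")  # filter string '123*'
suffix_matches = ends_with(accounts, "123")    # filter string '*123'
print(prefix_matches, suffix_matches)
```

The first function only touches the matching entries plus one more; the second always visits every entry, however few of them match.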
Possible solutions
Educate your end user that using a wildcard at the beginning of a filter string will always be slow on a large table.
Step up the SQL server hardware / allocated resources (faster CPU, more RAM, faster disk, ...).
Create a full text index on CustAccount (not a fan of this one and performance impact should be thoroughly investigated).
I've solved the problem. The CustTableListPage query had a sort on the DirPartyTable.Name field. After I removed this sort, filtering with a wildcard works like a charm.

Is this normal behavior for a unique index in Sqlite?

I'm working with SQLite in Flash.
I have this unique index:
CREATE UNIQUE INDEX songsIndex ON songs ( DiscID, Artist, Title )
I have a parameterised recursive function set up to insert any new rows (single or multiple).
It works fine if I try to insert a row with the same DiscID, Artist and Title as an existing row, i.e. it skips the existing row and tells me that 0 out of 1 records were updated - GOOD.
However, if, for example, the DiscID is blank but the Artist and Title are not, a new record is created even when there is already one with a blank DiscID and the same Artist and Title - BAD.
I traced out the disc id prior to the insert, and Flash is telling me it's undefined. So I've coded it to set anything undefined to "" (an empty string) to make sure it's truly an empty string being inserted - but subsequent inserts still ignore the unique index and add a brand new row even though the same row exists.
What am I misunderstanding?
Thanks for your time and help.
SQLite allows NULLable fields to participate in UNIQUE indexes. If you have such an index, and if you add records such that two of the three columns have identical values and the other column is NULL in both records, SQLite will allow that, matching the behavior you're seeing.
Therefore the most likely explanation is that despite your effort to INSERT zero-length strings, you're actually still INSERTing NULLs.
Also, unless you've explicitly included OR IGNORE in your INSERT statements, the expected behavior of SQLite is to throw an error when you attempt to insert a duplicate INDEX value into a UNIQUE INDEX. Since you're not seeing that behavior, I'm guessing that Flash provides some kind of wrapper around SQLite that's hiding the true behavior from you (and could also be translating empty strings to NULL).
Larry's answer is great. To anyone having the same problem here's the SQLite docs citation explaining that in this case all NULLs are treated as different values:
For the purposes of unique indices, all NULL values are considered
different from all other NULL values and are thus unique. This is one
of the two possible interpretations of the SQL-92 standard (the
language in the standard is ambiguous). The interpretation used by
SQLite is the same and is the interpretation followed by PostgreSQL,
MySQL, Firebird, and Oracle. Informix and Microsoft SQL Server follow
the other interpretation of the standard, which is that all NULL
values are equal to one another.
See here: https://www.sqlite.org/lang_createindex.html
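If you want to see the difference for yourself, a small Python script using the standard sqlite3 module reproduces it. The table definition is an assumption (the question only shows the index), and the artist and title values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table layout; only the index definition comes from the question.
conn.execute("CREATE TABLE songs (DiscID TEXT, Artist TEXT, Title TEXT)")
conn.execute("CREATE UNIQUE INDEX songsIndex ON songs (DiscID, Artist, Title)")

def try_insert(disc_id):
    """Insert one row and report whether the unique index accepted it."""
    try:
        conn.execute("INSERT INTO songs VALUES (?, 'Artist', 'Title')", (disc_id,))
        return True
    except sqlite3.IntegrityError:
        return False

null_results = [try_insert(None), try_insert(None)]  # NULL DiscID, twice
empty_results = [try_insert(""), try_insert("")]     # empty-string DiscID, twice

# Counting NULLs is a quick way to check what the wrapper really stored.
null_count = conn.execute("SELECT COUNT(*) FROM songs WHERE DiscID IS NULL").fetchone()[0]

print(null_results)   # [True, True]: NULLs never collide with each other
print(empty_results)  # [True, False]: the second empty string is rejected
print(null_count)     # 2
```

Running the same SELECT ... WHERE DiscID IS NULL against the real database should show whether the "blank" rows are NULLs rather than empty strings.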

Why does SQLite full-text search (FTS4) treat angle brackets differently in a compound search?

I have an SQLite database using FTS4. It is used to store emails with message id's of the form:
Searching for messages using the FTS MATCH syntax, I get a result from:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io>'
This returns the correct row. But when I try to find multiple emails, I get an empty response:
SELECT rowid FROM emails WHERE emails MATCH '<8200#comms.io> OR <8188#comms.io>'
Strangely though, I can search without the angle bracket characters. This returns both rows:
SELECT rowid FROM emails WHERE emails MATCH '8200#comms.io OR 8188#comms.io'
This works even though the angle brackets are present in the stored columns. I can find no mention of these being special characters in SQLite, and without the 'OR', the single-term search works fine.
Why are these characters treated differently in my compound search?
The default (simple) tokenizer reads alphanumerical characters and treats all others as word separators to be ignored.
So when searching for a message ID, you have to actually search for a phrase with multiple words (8200, comms, and io).
If you want to treat the entire message ID as a word, you have to write a custom tokenizer.
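Short of writing a tokenizer, you can see what the message ID turns into and build a phrase query from it. The Python below is an ASCII-only approximation of the tokenization described above, not SQLite's own code, and the helper names (simple_tokens, phrase_query) are invented for the sketch. Whether the resulting query suits your data is something to verify against your own table.

```python
import re

def simple_tokens(text):
    """Approximate the tokenization: lowercase, keep runs of ASCII letters and digits, drop the rest."""
    return re.findall(r"[0-9a-z]+", text.lower())

def phrase_query(message_ids):
    """Build a MATCH string in which each message ID becomes a quoted phrase, joined with OR."""
    phrases = ['"' + " ".join(simple_tokens(mid)) + '"' for mid in message_ids]
    return " OR ".join(phrases)

tokens = simple_tokens("<8200#comms.io>")
query = phrase_query(["<8200#comms.io>", "<8188#comms.io>"])
print(tokens)  # ['8200', 'comms', 'io']
print(query)   # "8200 comms io" OR "8188 comms io"
```

The query string would then be passed as a bound parameter, for example SELECT rowid FROM emails WHERE emails MATCH ?, rather than pasted into the SQL text.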

SQLite3, FTS3 and stop-words

How do you prevent SQLite3 from indexing certain keywords, or "stop-words", during the build of a virtual FTS3 table?
Examples I'd like to not index include "is", "the", "a" etc.
Unfortunately there is no built-in tokenizer that handles stop words, so you have three options: implement your own tokenizer in C that filters out the stop words; insert pre-tokenized, pre-filtered text into the relevant FTS table column; or use a somewhat convoluted scheme where you insert the text into the FTS column, fetch it back after it's been tokenized, filter it and then update the column value.
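The pre-filtering option is probably the least work. A minimal Python sketch, assuming the filtering happens in application code before the INSERT; the stop list is just the three examples from the question, and the function name is made up.

```python
import re

# Example stop list taken from the question; a real list would be longer.
STOP_WORDS = {"is", "the", "a"}

def strip_stop_words(text, stop_words=STOP_WORDS):
    """Split text into words and drop the stop words, keeping the original casing of the rest."""
    words = re.findall(r"[0-9A-Za-z]+", text)
    return " ".join(word for word in words if word.lower() not in stop_words)

filtered = strip_stop_words("The cat is on a mat")
print(filtered)  # cat on mat
```

The filtered string is what you would insert into the FTS column. Note that the column then no longer holds the original text, so keep the unfiltered version in another table if you need to display it, and apply the same filter to search terms before running a MATCH.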

Resources