How do I handle an apostrophe (') in MS SQL 2008 FTS? - asp.net

I have a website that utilizes MS SQL 2008's FTS (Full-Text Search). The search works fine if the user searches for a string with an apostrophe like that's - it returns any results that contain that's. However, it will not return a result if the user searches for thats, and the database stores that's.
Also, ideally a search for that's should also return thats and that, etc.
Anybody know if FTS supports this, and what I can do to enable it?
WT

Use a parameter:
myCommand.Parameters.AddWithValue("@searchterm", txtSearch.Text.Trim());
The apostrophe will then be handled for you, without any hassle.
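To illustrate why a parameter removes the quoting problem, here is a minimal sketch. It swaps in Python's built-in sqlite3 module rather than SqlCommand, and the table name notes and column body are made up for the example. The point is only that the apostrophe travels as data and never has to be escaped by hand.

```python
import sqlite3

# In-memory database with a hypothetical table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# The value is passed separately from the SQL text, so the apostrophe
# in "that's" cannot terminate the string literal early.
conn.execute("INSERT INTO notes (body) VALUES (?)", ("that's fine",))

search_term = "that's"
rows = conn.execute(
    "SELECT body FROM notes WHERE body LIKE ?",
    ("%" + search_term + "%",),
).fetchall()
```

Note that this only solves the escaping issue. It does not make a search for thats match that's.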
Regarding that, that's and thats, you should look up the SOUNDEX keyword. 4GuysFromRolla has an old article about it as well.
Update:
There is a great talk from one of the FTS team members regarding this here.
Quoting him:
Daniel is correct Full-text Search (FTS) does not use SOUNDEX directly,
but it can be used in combination with SOUNDEX. Additionally, you may
want to review the following links as well as the below TSQL
examples of combining CONTAINS & SOUNDEX. You may want to look at
some of the improved soundex algorithms as well as the
Levenshtein Distance algorithm. You
should be able to search Google to find more code examples, for example:
'METAPHONE soundex "sql server" fuzzy name search' and I quickly found -
"Double Metaphone Sounds Great" at http://www.winnetmag.com/Article/ArticleID/26094/26094.html
You can freely download the code in a zip file that has several user-defined functions (UDFs) that implement Double Metaphone. Below
are some additional SOUNDEX links:
http://www.merriampark.com/ld.htm
http://www.bcs-mt.org.uk/nala_006.htm
(omitted as out of scope)
use pubs
-- Combined SOUNDEX OR CONTAINS query that
-- Searches for names that sound like "Michael".
SELECT
au_lname, au_fname
FROM
authors -- returns 2 rows
WHERE
contains(au_fname, 'Mich*') or SOUNDEX(au_fname) = 'M240'
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
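To see where a code like 'M240' in the query above comes from, here is a rough sketch of the classic American Soundex rules in Python. It is not SQL Server's implementation, and the built-in SOUNDEX function may differ in edge cases (for instance in how it treats non-letter characters such as apostrophes, which this sketch simply ignores).

```python
# Letter-to-digit table for classic Soundex; vowels, h, w and y have no code.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return a four-character Soundex code, or "" if word has no letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

With these rules "Michael" encodes to M240, matching the literal in the query. Because the sketch drops the apostrophe, "that's" and "thats" get the same code here, but that should be verified against the real SOUNDEX function before relying on it.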

Related

Accent insensitive search with SQLite for html5sql

I'm a little bit confused with a quite simple looking problem.
I'm working on an offline HTML5 web SQL database for mobile, using html5sql.js.
So far, everything works fine, except the search with accents.
Here is an example:
Records: "Céline", "Elisa"
Search "el" --> "Elisa"
Search "él" --> No result
I would expect to find both results in both searches.
So far, I don't have any encoding specs in my queries. I read several posts about collation, but I did not manage to use any of them.
Would you have a typical example of how to write a query for this search?
PhoneGap does not have any portable way to do case-insensitive comparisons of non-ASCII characters.
You could remove accents from your strings and store those separately.
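A minimal sketch of that idea, written in Python rather than JavaScript just to show the principle: decompose each string with Unicode NFD normalization, drop the combining marks, and store the folded value next to the original. The helper names strip_accents, search_key and search are made up for this example.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after NFD decomposition (e.g. 'é' becomes 'e')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text):
    """Accent-free, case-folded form to store in a separate column."""
    return strip_accents(text).casefold()


records = ["Céline", "Elisa"]
# Store the folded key alongside the original value.
index = [(search_key(name), name) for name in records]


def search(term):
    key = search_key(term)
    return [name for folded, name in index if key in folded]
```

Here both search("el") and search("él") return both records, because the search term is folded the same way as the stored key. In the database this would correspond to a second column holding the folded text and a LIKE query against that column.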
You could try my open source library, ydn-db, which has full text search. A demo app can be found here. Currently, it only has English language normalization though. You can find more languages in the NaturalNode repo and recompile it.

What is a strategy for a simple site search in a SQL Server 2008 and ASP.NET MVC environment?

I am trying to hash out a strategy for implementing a very simple site search in ASP.NET MVC and SQL Server 2008.
Really, all I want to do is to be able to rank search results based on the number of times a search word or phrase is found in the webpage. I attempted to do this using LINQ to SQL but I ran into a lot of issues where some LINQ commands don't have a SQL equivalent. This was a few months ago so I don't remember specific errors.
So, I'm just trying to figure out an approach. What I'm thinking is this:
Approach 1:
I should probably write a program to spider the site and somehow index the site's text - I'm thinking I should save information in a table like:
ID
Word
URL
I could then query that and rank based on how many times that word is associated with a certain URL. But then I realized that this technique would completely break down if a user was searching for a phrase.
Approach 2:
Then I was toying with the idea of using SPROCs to create a temporary table with a record for each URL. The SPROC would somehow parse the text and determine how many times the phrase or word appeared in each individual URL, and then we would return the results from the temp table. I am thinking the temporary table would look something like this:
ID
SearchText
URL
Frequency
And then select * from temptable order by Frequency desc (most frequent first) or something like that.
However, I'm not sure if SPROCs are capable of parsing text like that, or if simultaneous searching would be possible.
I am looking for something very lightweight. I'm not really interested in using Lucene or Solr or anything like that because the learning curve seems very steep and those applications' features are far more than what I need.
Any thoughts on how I should approach this problem? Is there a different approach that I should consider?
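For what it's worth, the ranking in Approach 2 is small enough to sketch outside the database. The following Python is only an illustration of the idea (count how often the word or phrase occurs in each page's text, then sort by that count). The pages dictionary and the rank_pages name are made up, and a real version would read the page text from wherever the spider stored it.

```python
def rank_pages(pages, phrase):
    """Return (url, frequency) pairs, most frequent first.

    pages maps a URL to the plain text of that page. Matching is
    case-insensitive and counts non-overlapping occurrences.
    """
    needle = phrase.lower()
    if not needle:
        return []
    results = []
    for url, text in pages.items():
        frequency = text.lower().count(needle)
        if frequency:
            results.append((url, frequency))
    # Highest frequency first; ties broken by URL for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


pages = {
    "/a": "Apple pie and apple tart",
    "/b": "No fruit here",
    "/c": "One apple",
}
ranked = rank_pages(pages, "apple")
```

Because the whole phrase is counted as a substring, this handles multi-word phrases as well as single words, which is where the word-per-row table in Approach 1 breaks down. It does a full scan of every page per search, so it only suits small sites.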
For your phrase versus word issue, why not use wildcards and LIKE operators?
Select Count(*) from temptable where SearchPhrase LIKE '%Apple%'
Maybe not exactly what you want, but Windows SharePoint Search Server isn't all that bad.
Yes, it has the word 'SharePoint' in it, which would usually make me grab the scissors on my desk and start stabbing my eyes out, but having to use it once in a pinch, I was actually somewhat impressed with it.
It's free, so maybe worth a couple of hours playing with it for comparison to writing something custom.
After a little poking around, it looks like SQL Server 2008's Full Text Search is what I would want to use. I'm not 100% sure yet, but it looks promising.
http://msdn.microsoft.com/en-us/library/ms142547.aspx
If you're considering Full Text Search, then also check out lucene.net.
I used FTS for one project, and later used lucene.net for another, and although the requirements were different from yours, I'd never go back to FTS now.

ASP.NET - How to properly split a string for search?

I'm trying to build a search that is similar to that on Google (with regards to exact match encapsulated in double quotes).
Let's use the following phrase for an example
"phrase search" single terms [different phrase]
Currently if I use the following code
Dim searchTermsArray As String() = searchTerms.Split(New String() {" ", ",", ";"}, StringSplitOptions.RemoveEmptyEntries)
For Each entry In searchTermsArray
Response.Write(entry & "<br>")
Next
my output is
"phrase
search"
single
terms
[different
phrase]
but what I really need is to build a key value pair
phrase search | table1
single | table1
terms | table1
different phrase | table2
where table1 is a table with general info, and table2 is a table of "tags" similar to that on stackoverflow.
Can anybody point me in the right direction on how to properly capture the input?
What you are trying to do is not that trivial. Implementing a search "similar to Google's" is far beyond parsing the search string.
I'd suggest that you not reinvent the wheel and instead use production-ready solutions such as Apache Lucene.NET or Apache Solr. Those cope with both parsing and full-text search.
But if you only need to parse this kind of string then you should really consider the solution Pete pointed to.
Regex is your friend. See this question
Depending on how fancy you plan on getting, you might consider the search grammar/implementation that's included with Irony.
http://irony.codeplex.com/
Search string parsing is a non-regular problem. That means that while a regular expression can get deceptively close, it won't take you all the way there without using proprietary extensions, building an unmaintainable mess of an expression, leaving nasty edge cases open that don't work how you'd like, or some combination of the three.
Instead, there are three correct ways to handle this:
Use a third-party solution like Lucene.
Build a grammar via something like antlr.
Build your own state machine.
For a problem of this level (and assuming that search is core enough to what you're doing to really want to implement it yourself), I'd probably go with option 3. This makes more sense when you realize that regular expressions are themselves instructions for how to set up state machines. All you're doing is building that right into your code. This should give you the ability to tune performance and features as well, without requiring adding a larger lexer component into your code.
For an example of how you might do this take a look at my answer to this question:
Reading CSV files in C#
What I would do is build a state machine to parse the string character by character. This will be the easiest way to implement a fully-correct solution, and should also result in the fastest code.
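As a rough illustration of the state-machine option for the example in the question, here is a minimal sketch in Python (the question uses VB.NET, but the structure carries over directly). It assumes three states: plain text, inside double quotes, and inside square brackets. The names parse_search, table1 and table2 are placeholders taken from or made up for the example, and nested or escaped delimiters are not handled.

```python
def parse_search(query):
    """Split a search string into (term, table) pairs.

    Quoted phrases and bare words map to "table1"; text inside
    square brackets maps to "table2".
    """
    pairs = []
    buf = []
    state = "plain"  # one of: "plain", "quoted", "tag"

    def flush(table):
        # Emit the buffered characters as one term, if there are any.
        term = "".join(buf).strip()
        if term:
            pairs.append((term, table))
        buf.clear()

    for ch in query:
        if state == "plain":
            if ch == '"':
                flush("table1")
                state = "quoted"
            elif ch == "[":
                flush("table1")
                state = "tag"
            elif ch in " ,;":
                flush("table1")  # separators end a bare word
            else:
                buf.append(ch)
        elif state == "quoted":
            if ch == '"':
                flush("table1")
                state = "plain"
            else:
                buf.append(ch)
        else:  # state == "tag"
            if ch == "]":
                flush("table2")
                state = "plain"
            else:
                buf.append(ch)

    # Unterminated quote or bracket: keep whatever was collected.
    flush("table2" if state == "tag" else "table1")
    return pairs


result = parse_search('"phrase search" single terms [different phrase]')
```

For the sample input this yields the four pairs listed in the question: phrase search, single and terms for table1, and different phrase for table2. Adding a new delimiter means adding one more state and its transitions.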

Is there a utility for finding SQL statements in multiple files and listing any referenced tables and stored procedures

I'm currently looking at a terrible legacy ColdFusion app written with very few stored procedures and lots of nasty inline SQL statements (it has a similarly bad database too).
Does anyone know of any app which could be used to search the files of the app picking out any SQL statements and listing the tables/stored procedures which are referenced?
Dreamweaver will allow you to search the code of the entire site. If the site is set up properly, including the RDS password and a data source, it can tell you a lot of information. I've only set it up once so I can't remember exactly what information it gives you; I think maybe just the DB structure (Application window > Databases). Even if it isn't set up properly, just searching for "cfquery" will quickly find all your queries.
You could also write a CF script using CFDirectory/CFFile to loop the .cfm files and parse everything between cfquery and /cfquery tags.
CFBuilder may have some features like that but I'm not too familiar with it yet.
Edit: I've heard that CFBuilder can't natively find all your cfqueries that don't have cfqueryparam, but you can use CF to extend CFB to do so. I imagine you could find/write something for CFB to help you with your problem.
Another edit:
I know it isn't indexing the contents of the query, but you can use regex to search using the editor as well. Searching for <cfquery.+(select|insert|update|delete) with the regex box checked should find the queries that aren't using cfstoredproc (be sure to uncheck the match case option if there is one). I know Dreamweaver and Eclipse can both search with regex.
HTH
As mentioned above I would try a grep with a regex looking for
"<cfquery*" "</cfquery>" and "<cfstoredproc*" "</cfstoredproc>"
In addition, if you have tests with good code coverage, or even just feel that the app is fully exercised in production, you could try turning on "Log Database Calls" in Admin -> Datasources, or maybe even logging at the JDBC driver level. Just monitor performance to make sure it does not slow the site down unacceptably.
In short: no. You'd have to do a lot of tricky parsing to make sure you get all the SQL. And because you can glob SQL together from lots of strings, you'll almost always miss some of it.
The best you're likely to do will be a case-insensitive grep for "SELECT|INSERT|UPDATE|DELETE" and then manually pulling out the table names.
Depending on how the code is structured, you might be able to get the table names by regexing the SQL FROM clause. But that's not foolproof. A lot of people use string concatenation to build SQL statements. This is bad because it can introduce SQL injection attacks, and it also makes this particular problem harder.
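To make the regex idea concrete, here is a hedged sketch in Python of the two steps discussed in this thread: pull out everything between cfquery tags, then guess table names from the words following FROM, JOIN, INTO or UPDATE. The function names are made up, and as noted above it will miss SQL that is assembled by string concatenation or hidden behind variables.

```python
import re

# Everything between <cfquery ...> and </cfquery>, case-insensitive,
# spanning multiple lines.
CFQUERY = re.compile(r"<cfquery\b[^>]*>(.*?)</cfquery>",
                     re.IGNORECASE | re.DOTALL)

# A naive guess: the identifier right after FROM, JOIN, INTO or UPDATE.
TABLE = re.compile(r"\b(?:from|join|into|update)\s+([A-Za-z_][\w.]*)",
                   re.IGNORECASE)


def find_queries(source):
    """Return the SQL text of each cfquery block in a template's source."""
    return [sql.strip() for sql in CFQUERY.findall(source)]


def guess_tables(sql):
    """Return a sorted list of lower-cased names that look like tables."""
    return sorted({name.lower() for name in TABLE.findall(sql)})


sample = (
    '<cfquery name="q" datasource="#dsn#">\n'
    "SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id\n"
    "</cfquery>\n"
    '<CFQUERY name="u">UPDATE Users SET active = 1</CFQUERY>'
)
queries = find_queries(sample)
tables = [guess_tables(sql) for sql in queries]
```

To run this over a whole app you would read each .cfm file and pass its contents to find_queries. Treat the output as a starting list to review by hand, not as a complete inventory.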

Best way to create a search function ASP.NET and SQL server

I have an SQL database with multiple tables, and I am working on creating a searching feature. Other than having multiple queries for the different tables, is there a different way to go about said searching function?
I should probably add that a lot of my content is database driven to make upkeep easier. Lucene will not work for this, correct?
Different approaches to consider:
1) Multiple queries pre-baked, like you described.
2) Dynamic sql that you put together on the fly based on user-entered criteria.
3) If text is involved, based on SQL Server full text search or Lucene.
In my open source app BugTracker.NET, I do both 2 and 3 (using Lucene.NET).
I documented how I use Lucene.NET here:
http://www.ifdefined.com/blog/post/2009/02/Full-Text-Search-in-ASPNET-using-LuceneNET.aspx
Since you have tagged the question with Asp.net I suppose you want to search your webpages. In that case you can use Indexing Server to perform freetext searches easily that search the generated html and any keywords you have set up.
As Corey Trager suggested, using Lucene.NET is also an option. It has a good reputation of being fast and quite easy to use.
Although the other answers provide good suggestions such as using Lucene, I have much preferred using a custom caching method.
So for a website that I helped create, we cached the searchable data every couple of hours, from many tables, into one simple table with columns such as:
URL
Item/Page Name
Main Keywords
Text Only Contents
Date Updated
I would then write my SQL statement to search this field using different functions to determine the rank.
You might want to check out this post I wrote on writing full-text queries. It's in C#, but it's easily portable, or just stick it in a library and use it as is.
How to build an SQL full text index search term in c#
