Keyword search with SQL Server - ASP.NET

I have a scenario where I need to search for cars by keyword using a single search field. The keywords can relate to any of the car's attributes, e.g. the make, the model, or the body style. In the database there is a table named 'Car' with foreign keys referencing the tables that represent makes, models, and body styles.
What would be the best way of doing this? Specifically, how should I parse the query from the user (it must support exact-phrase search as well as OR and AND), and how do I actually perform the search?
I am using SQL Server and ASP.NET 3.5 (Data access using LINQ)

Easily the best and most comprehensive article on the subject: http://www.sommarskog.se/dyn-search-2005.html
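For the "how do I actually do the search" part, one static-SQL pattern covered in that article handles optional criteria like this (a sketch; the schema and parameter names are assumptions, not from the question):

-- Pass NULL for any criterion the user did not specify.
SELECT c.CarId, mk.MakeName, md.ModelName, bs.StyleName
FROM   Car c
JOIN   Make mk      ON mk.MakeId  = c.MakeId
JOIN   Model md     ON md.ModelId = c.ModelId
JOIN   BodyStyle bs ON bs.StyleId = c.BodyStyleId
WHERE  (@make  IS NULL OR mk.MakeName  = @make)
  AND  (@model IS NULL OR md.ModelName = @model)
  AND  (@style IS NULL OR bs.StyleName = @style)
-- On SQL Server 2008 SP1 and later, OPTION (RECOMPILE) lets the plan
-- be built for the actual parameter values at each execution.
OPTION (RECOMPILE);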

Regardless of which implementation you pick from Sommarskog's article, I always log the search criteria and execution time in this situation. Just because you provide search flexibility, it doesn't mean most users will make use of it. You usually find most searches occur on a limited number of fields, and logging the search criteria will allow you to create targeted indexes.
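A minimal sketch of such a log (all names here are illustrative, not from the question):

-- One row per search; review it periodically to see which
-- criteria actually get used, then index those columns.
CREATE TABLE SearchLog
(
    SearchLogId int IDENTITY PRIMARY KEY,
    Criteria    nvarchar(1000) NOT NULL,
    DurationMs  int            NOT NULL,
    SearchedOn  datetime       NOT NULL DEFAULT GETDATE()
);

-- The most frequent (and slowest) searches float to the top:
SELECT Criteria, COUNT(*) AS Searches, AVG(DurationMs) AS AvgMs
FROM   SearchLog
GROUP BY Criteria
ORDER BY Searches DESC;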

Related

How to set up a data model for a customizable application

I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I'm in need of suggestions for handling the data model for this. I have two thoughts on how to handle it. The first would be to create a new table for each new module for each client. This is pretty clean, but I don't particularly like it. My other thought is to have one table with columns for each custom data point for each client. That table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback you have will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary)
Benefits:
Infinitely flexible
Easy to implement
Easy to index
Non-existent values take no space
Disadvantage:
Showing a list of all records with the complete field list is a very dirty query (see the sketch below)
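To illustrate, flattening the key-value rows back into one record per client takes a pivot-style query that must be edited every time a field is added (a sketch, assuming two fields named 'Color' and 'Weight'):

-- One output column per field; every new field means editing this query.
SELECT clientId,
       MAX(CASE WHEN fieldName = 'Color'  THEN CAST(fieldValue AS varchar(50)) END) AS Color,
       MAX(CASE WHEN fieldName = 'Weight' THEN CAST(fieldValue AS varchar(50)) END) AS Weight
FROM   CustomFields
GROUP BY clientId;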
The Microsoft way
The Microsoft way of handling this kind of problem is "sparse columns", introduced in SQL Server 2008 (a sketch follows after the lists below).
Benefits:
Blessed by the people who design SQL Server
Records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantages:
Many technical restrictions
A new field requires DDL (ALTER TABLE)
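The sketch promised above (table and column names are illustrative):

-- SPARSE columns cost no storage when NULL; the optional column set
-- exposes all sparse columns as a single XML blob.
CREATE TABLE ClientModuleData
(
    recordId  int IDENTITY PRIMARY KEY,
    clientId  int NOT NULL,
    Color     varchar(50)  SPARSE NULL,
    Weight    decimal(9,2) SPARSE NULL,
    AllFields xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Queries look like ordinary SQL, no pivoting required:
SELECT recordId, Color, Weight
FROM   ClientModuleData
WHERE  clientId = 42;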
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
Unlimited flexibility
Can be indexed
Storage efficient (when it fits in a page)
With some XPath gymnastics the fields can be included in a flat recordset.
Schema can be enforced with XML schema collections
Disadvantages:
It's not directly visible what's in the field
XQuery support in SQL Server has gaps, which sometimes makes getting your data out a real nightmare (see the query sketch below)
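Assuming the extra fields live in an xml column named ExtraFields under a flat <fields> element (all names here are assumptions), pulling them into a recordset looks roughly like this:

-- One .value() call per field you want back as a column.
SELECT recordId,
       ExtraFields.value('(/fields/Color/text())[1]',  'varchar(50)')  AS Color,
       ExtraFields.value('(/fields/Weight/text())[1]', 'decimal(9,2)') AS Weight
FROM   CustomData;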
There may be more solutions, but to me these are the main contenders. Which one to choose:
Key-value seems appropriate when the number of extra fields is limited (say, no more than 10-20 or so).
Sparse columns are more suitable for data with many properties that are filled out infrequently; they sound more appropriate when you can have many extra fields.
The xml column is very flexible but a pain to query. It's appropriate for solutions that write rarely and query rarely, i.e. don't run aggregates etc. on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't overthink it. The second option you outlined would be a bad idea in my opinion.
If there are fields common to all the modules you're adding to the system, you should consider keeping those in a single table, then have other tables with the fields specific to a particular module related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.
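A rough sketch of the shape, with illustrative names:

-- Fields common to every module live in one table...
CREATE TABLE ModuleEntry
(
    EntryId   int IDENTITY PRIMARY KEY,
    ClientId  int NOT NULL,
    CreatedOn datetime NOT NULL DEFAULT GETDATE()
);

-- ...and each custom module gets its own table keyed back to it.
CREATE TABLE ClientAModuleEntry
(
    EntryId       int PRIMARY KEY REFERENCES ModuleEntry(EntryId),
    InspectorName varchar(100) NULL,
    ReadingValue  decimal(9,2) NULL
);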

Find Unique Words in One or More Columns?

I'm looking at implementing tags in my ASP.NET website. After looking at several algorithms, I'm leaning towards having a couple of database columns that contain one or more tag words. I will then use full-text search to locate rows with specified tags.
All of this seems pretty straightforward except for one thing: I need to be able to generate a list of available tags, which the user can select from.
I know I can write a C# program to build the list of available tags, and then run it once every week or so, but I was just wondering if there's any SQL-method for doing stuff like this more efficiently.
Also, I can't help but notice that the words will be extracted anyway as part of building the full-text index. I don't suppose there's any way to access that information?
This isn't how I'd choose to structure this but to answer the actual question...
In SQL Server 2008 you can query the sys.dm_fts_index_keywords and sys.dm_fts_index_keywords_by_document table-valued functions to get the information that you want.
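A minimal example of the first one (the table name dbo.Articles is an assumption):

-- Lists the distinct terms the full-text engine extracted, with
-- the number of documents each term appears in.
SELECT display_term, document_count
FROM   sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('dbo.Articles'));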
Why not use a separate table for tags, with a many-to-many relationship to the tagged items table?
I mean something like this:
--Articles
ArticleId
Text
--Tags
TagId
Name
--TagsToArticles
ArticleRef
TagRef
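With that structure, the list of available tags (with usage counts) falls out of a simple join instead of a weekly batch job, e.g.:

-- Every tag plus how many articles use it.
SELECT t.Name, COUNT(ta.ArticleRef) AS UsageCount
FROM   Tags t
LEFT JOIN TagsToArticles ta ON ta.TagRef = t.TagId
GROUP BY t.TagId, t.Name
ORDER BY UsageCount DESC;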

What is the best way to implement multilingual domain objects using NHibernate?

What is the best way to design domain objects that can have multilingual fields? An example would be a Product class whose Description is multilingual.
I have found a few links but could not decide which one is the best approach:
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from SQL.)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Does anyone have experience using NHibernate with multilingual data? Is there a better way?
The third option looks great. The NHibernate mapping is given, but not the database schema; if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity where you want translatable text, e.g. the Title property or the Description property.
I think the article describes a great approach, and it is the one that I would go for.
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (Say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
I only have experience with Hibernate, but since NHibernate is so similar:
One option is to define a component type MultilingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient place for a getter that returns the string for a given language id.
class MultiLingualString {
    String english;
    String chinese;
    String klingon;

    String forLanguage(Language lang) {
        switch (lang) {
            case ENGLISH: return english;
            case CHINESE: return chinese;
            case KLINGON: return klingon;
            default: throw new IllegalArgumentException("Unknown language: " + lang);
        }
    }
}
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.
From a strictly database-oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc.) and one table with all of the translatable string data. Let's call the two tables Base and Base_Description.
Base ensures that there is a single key for each record; the key might be a string or an auto-generated id, depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.languages, because we can set the language of the connection with SET LANGUAGE and then grab it with @@LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language, and it also allows you to add other languages more easily. We are also using SQL Server full-text indexing, and it fully works with this method. You should index in the neutral language, and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
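A minimal sketch of the shape (the column names other than LangID are illustrative):

CREATE TABLE Base
(
    BaseId    int IDENTITY PRIMARY KEY,
    CreatedOn datetime NOT NULL,
    Price     decimal(9,2) NOT NULL
);

CREATE TABLE Base_Description
(
    BaseId      int NOT NULL REFERENCES Base(BaseId),
    LangID      smallint NOT NULL,   -- matches sys.languages.langid
    Description nvarchar(max) NOT NULL,
    PRIMARY KEY (BaseId, LangID)
);

-- Fetch a record in the language of the current connection:
SELECT b.BaseId, b.Price, d.Description
FROM   Base b
JOIN   Base_Description d ON d.BaseId = b.BaseId
WHERE  d.LangID = @@LANGID;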
Do your requirements include the domain objects actually having multiple-language properties on the same object? If so, are the translations stored in the object unlimited (in a collection, say, in which case I would say it needs to be treated just like any master/detail or parent/child collection) or fixed, in which case the languages (and thus the mapping to the results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

Storing frequent search strings and their results in the ASP.NET cache

I have search functionality on my site that is accessible from every page: the typical top-of-the-masterpage textbox-and-button deal. I'm looking for a better way to cache the most common search strings and their results using System.Web.Caching.Cache.
I was thinking of concatenating the search string with some applicable user group permission data and using that as the cache key with the value being the List.
example cache key: Microsoft Visual Studio 2008 Service Pack 1--usergroup2,3,6,17,89
But that got me thinking about the max length of a cache key. Is there a maximum length that the key can be? Storing things this way can end up with some pretty lengthy key names, and it really doesn't do anything about keeping the most common searches, or the most recently used ones.
Is there already a commonly used method to accomplish what I'm trying to do? Does my question even make sense? Thanks for any help.
The maximum length of the key is just the maximum length of a string itself. According to the documentation at http://msdn.microsoft.com/en-us/library/system.web.caching.cache.add.aspx, the key is a string and the value is of type Object.
I would suggest tagging a custom object with a unique key, so that when you query the Cache you get back your custom object, with the more complex information carried along inside it.
EDIT 11072009_1154
After carefully reading your requirement again, I noticed that your objective is to cache the frequently searched strings.
In your given example, the frequently searched string might be "Microsoft Visual Studio 2008 Service Pack 1". In my opinion this should be the key, while the value is a custom object with additional properties to hold your other necessary attributes.
In summary, this might be an example:
Key: "Microsoft Visual Studio 2008 Service Pack 1"
Value: a CustomObjectInstance where CustomObjectInstance.UserLanguage = "English", CustomObjectInstance.UserLocalization = "USA", CustomObjectInstance.UserKeyboardLayout = "UK", etc.
AFAIK, the Cache implements a dictionary type of data structure, so the key must be unique. If your key is "Microsoft Visual Studio 2008 Service Pack 1--usergroup2,3,6,17,89", how can you reproduce that exact key from your ASP.NET web app? A user typing in the search textbox will not enter usergroup2,3,6,17,89.
Think of the StackOverflow site search functionality: users will enter a common search string, e.g. "learn jquery material", so in my opinion your cache should have an entry keyed on "learn jquery material".
EDIT 11072009_1250
Thanks for the additional information. I can also suggest a solution with multiple layers; what I mean is, rather than cramming all the information into one layer of cache, why not store an additional layer?
That means your cache will have a key (a string) and a value which points to another dictionary.
Another possible solution is to build this feature on SQL Server full-text index search. I am not very familiar with SQL Server full-text search, but it can be good to leverage existing infrastructure where possible.
Caching search results is a fairly common technique. The ASP.NET Cache will store all the cached data in memory for faster access; it all depends on how much memory is available to you for caching. If you want to deviate from the ASP.NET Cache approach, another method is to store the data retrieved from searches in a database table.
Searching a table with billions of records is really expensive, so you can store the data for the most-searched keywords in a table for faster access. You can also create a job to refresh that table at regular intervals based on some fairly simple algorithms; with a Least Recently Used policy, for example, you remove the search results which have not been used recently.
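An illustrative sketch of such a table with LRU-style cleanup (all names are assumptions):

CREATE TABLE CachedSearch
(
    SearchTerms nvarchar(450) PRIMARY KEY,
    ResultsXml  xml NOT NULL,
    HitCount    int NOT NULL DEFAULT 1,
    LastUsedOn  datetime NOT NULL DEFAULT GETDATE()
);

-- Periodic job: evict entries not used in the last 7 days.
DELETE FROM CachedSearch
WHERE  LastUsedOn < DATEADD(day, -7, GETDATE());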
EDIT: As for your question about the length of the cache key: it is a string, and the maximum length of a string depends on the memory available to store it.

SQLite Query - Need Help with Full Text Search

Here's what I'm trying to do.
User (a) enters data in two fields, (description-1) and (description-2).
User (b) enters similar data in the opposite fields.
A search by user (a) or user (b) on both fields would find the match.
A good analogy would be a dating search: user (a) enters a description of themselves and of the match they are looking for, user (b) does the same, and both would be able to do a search and find the match.
So, in pseudo-query English...
Select name from data where me = 'target' and target = 'me'
The catch would be, some of the words in the field would match, but not all.
This type of matching is hard no matter what the technology. You may have bitten off more than you can chew.
My recommendation to you is to read up on the Text Search data types in PostgreSQL.
PostgreSQL offers a flexible and powerful solution for full-text search, and it may do what you need, whereas SQLite probably won't.
Using the PostgreSQL tsquery and tsvector data types, you could turn one user's description into a form that queries the description of another user. Both tsquery and tsvector can be generated dynamically or saved in database columns and indexed.
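For instance, a rough sketch in PostgreSQL (the profiles table and its columns are assumptions, not from the question):

-- Find users whose "me" text matches A's "target" text and vice versa.
SELECT b.name
FROM   profiles a, profiles b
WHERE  a.name = 'alice'
  AND  to_tsvector('english', b.description_me)
         @@ plainto_tsquery('english', a.description_target)
  AND  to_tsvector('english', a.description_me)
         @@ plainto_tsquery('english', b.description_target);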
If you still need to use SQLite, you need to learn about the various FTS virtual table types. These are all experimental and are not enabled by default. So you need to recompile SQLite, enabling FTS1, FTS2, or FTS3.
Documentation for these features is pretty sparse. Here's all I have found:
http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex
http://www.sqlite.org/cvstrac/wiki?p=FtsUsage
http://www.sqlite.org/cvstrac/wiki?p=FtsOne
http://www.sqlite.org/cvstrac/wiki?p=FtsTwo
http://www.sqlite.org/cvstrac/wiki?p=CompilingFts
http://www.sqlite.org/cvstrac/wiki?p=CompilingFtsThree
