SQLite Query - Need Help with Full Text Search - sqlite

Here's what I'm trying to do.
User (a):
Enters data in two fields (description-1) and (description-2).
User (b) Enters similar data in opposite fields.
User (a) or (b) search on both fields would find a match.
A good analogy would be a dating search. User (a) enters a description of themselves and the match they are looking for, and User (b) enters a description of themselves and the match they are looking for and Both would be able to do a search and find a match.
So in psuedo query english...
Select name from data where me = 'target' and target = 'me'
The catch would be, some of the words in the field would match, but not all.

This type of matching is hard no matter what the technology. You may have bitten off more than you can chew.
My recommendation to you is to read up on the Text Search data types in PostgreSQL.
PostgreSQL offers a flexible and powerful solution for full-text search, and it may do what you need, whereas SQLite probably won't.
Using the PostgreSQL tsquery and tsvector data types, you could turn one user's description into a form that queries the description of another user. Both tsquery and tsvector can be generated dynamically or saved in database columns and indexed.
If you still need to use SQLite, you need to learn about the various FTS virtual table types. These are all experimental and are not enabled by default. So you need to recompile SQLite, enabling FTS1, FTS2, or FTS3.
Documentation for these features is pretty sparse. Here's all I have found:
http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex
http://www.sqlite.org/cvstrac/wiki?p=FtsUsage
http://www.sqlite.org/cvstrac/wiki?p=FtsOne
http://www.sqlite.org/cvstrac/wiki?p=FtsTwo
http://www.sqlite.org/cvstrac/wiki?p=CompilingFts
http://www.sqlite.org/cvstrac/wiki?p=CompilingFtsThree

Related

Single multi-tenanted firestore or many single tenanted firestores?

I'm building a SaaS system that allows users to define their own data models and enter data according to those models. It's a bit like airtable.
One user might model a bookshop, and would have a Book model, with title and ISBN fields. Another user might model medical records, and would have "date of last visit" as a field.
In the case of the bookshop, I want users to be able to search on title and ISBN. In the case of the medical records, I want users to be able to search on the date of the last visit.
I am using Firestore as my backend.
Firestore requires an index to enable a search. So that approach will not scale as # of customers increases.
My thought therefore was to have a Firestore instance for each customer, and those specific instances would have the necessary indexes.
I'm sure there are downsides to doing this though.
What would folks recommend to best solve this need?
What you are trying to achieve is some kind of weird, since you will not provide at least a few standard common properties for each user of your Bookshop.
When you want to perform a search in a Cloud Firestore database, you need the exact name of the property on which you want to search for. Having dynamic properties might not help you solve the search feature. However, you can create a document with a property of type array that can hold the name of all properties the users have chosen and perform a search on every property, but this solution will be much too expensive.
In my opinion, a possible solution might be to create at least a few common properties, so you can have the properties on which you can search. When someone creates, for example, a book shop you can display at the beginning all available properties a user can choose. Once you create a shop, you can have different users with different shop properties. This means that if a user does not choose a property, when you perform a search on that property, the results won't contain his/her products. This will work, only if you have predefined properties.

Sqlite linked tables in Access give #deleted values, again

Situation: MS Access (happens to be 2010) using SQLite ODBC driver (0.997) to link to tables in a SQLite (3.x) database.
Problem: data values in all columns in all rows display as "#Deleted".
Solution: This is a "answer my own question" kind of post, with a solution, below.
Edited: to move solution to Answers section.
Earlier, I searched in stackoverflow, found a similar question (sqlite linked tables in Access give #deleted values) with a good answer that turns out to be inapplicable in my case. So I'm adding some info here.
Half of the problem is explained here: http://support.microsoft.com/kb/128809 '"#Deleted" errors with linked ODBC tables.'
The above link was no longer available in Jul-2021. However you may find a good explanation for '#DELETED# Records Reported by Access' in https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-errors.html
This explains that Access (Jet) wants a table to have a unique index in order to be able to insert/update the table if necessary.
If your SQLite table doesn't have a unique index (or primary key), then Access will only allow read access to the table -- you can't edit the table's data in Access, but the data displays fine.
To make the table updateable you might revise your SQLite code (or using a SQLite tool) to add an index to the table.
If your PK/unique index happens to use a TEXT field, that's fine for SQLite. However, when you link to it in Access, Access will show the #Deleted indications.
The chain of events appears to be:
Access/Jet notices the unique index, and tries to use it. However, SQLite TEXT fields are variable length and possibly BLOBs. This apparently doesn't fulfill Access's requirements for a unique index field, hence the #Delete indication.
To avoid that problem, the index has to be a SQLite field type that Access will accept. I don't know the complete list of types that are acceptable, but INTEGER works.
Hope this helps someone.

What is the best way to implement multilingual domain objects using NHibernate?

What is the best way to design the Domain objects which can have multi-lingual fields. An example can be a Product class with Description being multi-lingual.
I have found few links but could not decide which one is the best way.
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from Sql)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Anyone having experience with using NHibernate with multi-lingual data. Is there a better way?
The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity where you want translatable text. E.g. the title property or decription property.
I think the article describes a great approach, and is the one that I would go
for.
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (Say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
I only have experience for Hibernate, but since nHibernate is so similar:
One option is to define a component type MultilingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient location to place an getter for the string by language id.
class MultiLingualString {
String english;
String chinese;
String klingon;
String forLanguage(Language lang) {
switch (lang) {
// you can guess what goes here
}
}
}
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.
From a strictly database oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc) and one table with all of the translatable string data. Let call the two tables Base and Base_Description.
Base ensures that there is a single key for each record, the key might be a string or auto-generated id depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.languages because we can set the language of the connection with and then grab it with ##LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language, it also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing and it fully works with this method. You should index in the neutral language and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

Keyword search with SQL Server

I have a scenario where I need to search for cars by keywords using a single search field. The keywords can relate to any of the car's attributes for e.g. the make or the model or the body style. In the database there is a table named 'Car' with foreign keys referencing tables that represent models or makes or body style.
What would be the best way of doing this? Specifically, How should I take the query from user(must support exact phrase search, or, and) and how do I actually do the search.
I am using SQL Server and ASP.NET 3.5 (Data access using LINQ)
Easily the best and most comprehensive article on the subject : http://www.sommarskog.se/dyn-search-2005.html
Regardless of which implementation you pick from Aaron's article, I always log the search criteria and execution time in this situation. Just because you provide search flexibility, it doesn't mean most users will make use of it. You usally find most searches occur on a limited number of fields and logging the search criteria will allow you to create targetted indexes.

Allowing nulls vs default values

I'm working on an ASP.NET project that replaces many existing paper forms. One of the requirements is that the user can save the form in any state, i.e. they could create a new blank form and immediately save it with no data or with partial data. I'm validating for data type on every save but validation for required fields does not occur until the user marks the form as completed.
I'm not sure what the best approach is to handle this requirement in the database and domain model. As I see it, I have two options:
Allow nulls for any field that may not have data. This feels like the "correct" approach but it requires that almost every database field allow nulls and I have to code around a lot of nullable types. Also, when the form is finalized none of the required fields are enforced in the database.
Populate my business objects with meaningful default values. In some cases, there are meaningful default values for many (but not all) fields that I could use. This approach verges on "magic numbers" which makes me uncomfortable.
Which approach is best? Or is there a third way? I'm not willing to go to extremes, such as splitting the tables.
Edited to add: I wanted to expand on this a bit since I accepted a response. The primary reason that I'm not interested in splitting the tables is that once a project is submitted, the data on the forms is used to generate data for another system that is the system of record. At that point the original form data is unlikely to be revised or used for reporting.
I don't understand why you don't want to split the tables. I don't know what domain you're in but in any I could imagine there are two classes of people:
people who have submitted the form
people who haven't
And as a business executive I don't care about the second. But the first I care deeply about, and they need to have all their data in correctly.
It also improves efficiency - most of your queries about aggregate data will be over the first table, not the second. The second table will only be used for index seeks.
If splitting the table(s) (are there more than one?) is not an option, I would consider creating single table to store serialisations of objects of incomplete forms, and only commit a form to the "real" tables when the form is fully submitted by the user.
If there isn't a sensible default, and you don't want to split the data, then nulls are almost certainly your best option. Re the db not being to verify that they are not null when completed... well, if you don't want to split the table there isn't much you can do (short of using a CHECK constraint, or an INSTEAD OF trigger to run validation). But the DB isn't the only place responsible for data validation. Your app logic can do that too.
You could use a temporary table with "allow nulls" on every column to store the form containing partial or no data and copy / move the data to the final table when the user marks the form as completed. This way, you do not depend on default values (which the user may forget to change), you can save in any state, and you still have the validation in the end.
This is a situation that cries out for split tables. I know you said you don't want to do that, and in a comment even said "this project doesn't warrant that level of effort". but it's really the best solution.
Set up preliminary table(s) with everything except your key nullable. When the user marks the form complete, and it passes validation, move it to the final table(s). not only is this The Right Thing To Do, but it's probably less effort than "coding around nullable values" when working with finished forms.
If you need to see all forms, finished or not, make a Union view.
I'd take the first option but add a column to the database tables so that when the form is completed this is flagged. Then for anything using the form data it merely needs to check that the form has been completed.
That's my suggestion for a way around this.
NULL values are not searchable by the indexes.
If you'll need to issue a query like "select first 10 forms with a certain field unfilled", this query will use a FULL TABLE SCAN which may be not efficient.
Oracle does not distinguish between NULL and empty string, but other databases do. You'll probably want to make an empty string to be the DEFAULT for unfilled fields and use it in a search.
If you don't need to search on unfilled fields, then just make them NULL.
NULL generally means "Don't Know" (in a database) whereas an empty string could actually represent an empty string.
I would tend to use NULL as the "Don't Know" value in your case. When you print out data you'll just have to assume that any NULL value means an empty string.
CHECK CONSTRAINT + VIEW
if you don't have a status field add one so you can tell that it is finished.
add a check constraint on that status field so it can't be marked finished if any of the columns are null.
When you write your queries on "finished" forms you can ignore checking for nulls everywhere if you do one of these two options:
just add Status="F"inished in the where clause
make a view of only finished ones
when using the "finished view" you don't have to do all the validation checks or worry about unfinished ones showing up in the results
I've had a similar situation, and while I haven't yet come up with a solution, I have been toying with the idea of just using simple XML serialization to store the temporary document data. If you generate simple classes that model the data in the objects (using nullable types where needed, perhaps), it would be easy to stuff data from the screen into those objects, serialize them to XML and then store them in a temporary "staging" table. When your users are done working and want to submit or finalize the document, then you perform all of your needed validation against the serialized data, eventually putting into the "real" table with the proper data structures and constraints.

Resources