Full-text search in SQL Server with multiple tables and ranking - ASP.NET

We have a website running on DNN 7.1 with SQL Server, and we implemented full-text search to show search results. We need to search several tables and show the combined results to the user. In the current implementation, the user enters one or more search words and clicks Search; the code-behind then creates several threads to search the different tables and merges the data. We are currently using the CONTAINS predicate. The problem is that it gives no ranking, so after the merge the results on the first page are sometimes not the best matches. I thought I could use CONTAINSTABLE and order the results by rank, but I read that the rank has no meaning by itself: it only tells you which rows match best within the current result set. In my scenario I have multiple result sets, so how can I tell which rows are the best matches across all of them? Or am I going about this the wrong way? What is a good way to handle this scenario? We need to improve the response time as well as the quality of the results. Any help is greatly appreciated.

This is how we implemented full-text search across multiple tables:
1) Create a new table with one column for the primary key of each of the other tables, another column that stores the concatenated values of all the searchable fields from each table, and another column that stores the checksum of the concatenated value.
2) Create the full-text index on this new table, and create a job that regularly synchronizes the concatenated search values, updating a row only if its BINARY_CHECKSUM value has changed.
3) Use the CONTAINS predicate on this new table and, based on the results, join back to the corresponding tables using the primary keys returned.
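A minimal sketch of the three steps above, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names (articles, products, search_index, search_text, text_checksum) are hypothetical, zlib.crc32 stands in for BINARY_CHECKSUM, and a LIKE match stands in for CONTAINS, so this version does not rank. In SQL Server you would query the single search_index table with CONTAINSTABLE instead; because every candidate then comes from one result set, its RANK values can be compared with each other, which is what the per-table threads could not give you.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, description TEXT);
-- Step 1: one column per source primary key, the concatenated text, and its checksum.
CREATE TABLE search_index (
    article_id INTEGER,
    product_id INTEGER,
    search_text TEXT,
    text_checksum INTEGER
);
""")


def current_search_rows(conn):
    """Concatenate the searchable fields of every source row."""
    rows = [(a, None, f"{title} {body}") for a, title, body in
            conn.execute("SELECT article_id, title, body FROM articles").fetchall()]
    rows += [(None, p, f"{name} {desc}") for p, name, desc in
             conn.execute("SELECT product_id, name, description FROM products").fetchall()]
    return rows


def sync_search_index(conn):
    """Step 2: insert new rows; rewrite existing ones only if the checksum changed."""
    changed = 0
    for article_id, product_id, text in current_search_rows(conn):
        checksum = zlib.crc32(text.encode("utf-8"))  # stand-in for BINARY_CHECKSUM
        row = conn.execute(
            "SELECT text_checksum FROM search_index WHERE article_id IS ? AND product_id IS ?",
            (article_id, product_id)).fetchone()
        if row is None:
            conn.execute("INSERT INTO search_index VALUES (?, ?, ?, ?)",
                         (article_id, product_id, text, checksum))
            changed += 1
        elif row[0] != checksum:
            conn.execute(
                "UPDATE search_index SET search_text = ?, text_checksum = ? "
                "WHERE article_id IS ? AND product_id IS ?",
                (text, checksum, article_id, product_id))
            changed += 1
    conn.commit()
    return changed


def search(conn, term):
    """Step 3: match against the single index table, then join back by primary key."""
    return conn.execute("""
        SELECT COALESCE(a.title, p.name)
        FROM search_index AS s
        LEFT JOIN articles AS a ON a.article_id = s.article_id
        LEFT JOIN products AS p ON p.product_id = s.product_id
        WHERE s.search_text LIKE '%' || ? || '%'  -- stand-in for CONTAINS
        ORDER BY 1
    """, (term,)).fetchall()


conn.execute("INSERT INTO articles VALUES (1, 'Solar news', 'Panels are getting cheaper')")
conn.execute("INSERT INTO products VALUES (1, 'Solar panel', '400 W module')")
conn.execute("INSERT INTO products VALUES (2, 'Inverter', 'Grid tie')")
print(sync_search_index(conn))  # 3 new rows
print(sync_search_index(conn))  # 0, nothing changed
print(search(conn, "solar"))    # [('Solar news',), ('Solar panel',)]
```

The checksum comparison is what keeps the sync job cheap: unchanged rows are never rewritten, so the full-text index only has to re-process rows whose text actually changed.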

Related

Pagination with Filtering using Query Operation in DynamoDB Template

I would like to filter a paginated result from a Query operation before the limit is applied. Is there any way to get correct pagination on filtered results?
I would like to implement a DynamoDB Scan or Query with the following logic:
Scanning -> Filtering (boolean true or false) -> Limiting (for pagination)
However, I have only been able to implement a Scan or Query with this logic:
Scanning -> Limiting (for pagination) -> Filtering (boolean true or false)
Note: I have already tried a Global Secondary Index, but it didn't work in my case because I have 5 different attributes to filter and limit on.
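Unfortunately, DynamoDB cannot do this. When you Query one of your indexes, it reads every item that satisfies your partition key and sort key condition; a filter expression is applied only after the items have been read, and Limit caps the number of items read, not the number returned.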
Let's look at your example. You have a boolean and an index over that field. Say 50% of the items are false and 50% are true. When you search by that index, you will read through 50% of all the items in the table (so it's almost like a Scan). If you set a limit, DynamoDB reads only that number of items and then stops. You cannot combine limit with skip/page/offset as you can in other databases.
There is some level of pagination (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.Pagination.html), but it does not let you jump to, e.g., page 10; it only lets you go through the pages one by one. As for pricing, each page is a separate request that consumes read capacity for every item it reads, including items the filter then discards, so you pay for the pages you actually fetch; if you stop iterating early, the remaining pages are not read.
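Because Limit is applied before the filter, the usual workaround is to keep requesting pages until enough filtered items have been collected. The sketch below simulates this in plain Python so it runs offline: query_page is a hypothetical in-memory stand-in for a Query call with Limit, a filter on a boolean, ExclusiveStartKey and LastEvaluatedKey, and the data and attribute names are made up. Keys are simplified to the integer sort key; with the real API they are dicts of key attributes.

```python
from typing import Optional

# Toy stand-in for one partition, sorted by sort key "sk"; "active" is the boolean filter.
ITEMS = [{"pk": "user", "sk": i, "active": i % 3 == 0} for i in range(1, 31)]


def query_page(limit: int, start_key: Optional[int] = None) -> dict:
    """Simulate one Query call with Limit and a filter on "active".

    As in DynamoDB, Limit caps how many items are *read*; the filter is applied
    afterwards, so a page may contain fewer than `limit` matches, or none.
    """
    remaining = [it for it in ITEMS if start_key is None or it["sk"] > start_key]
    read = remaining[:limit]
    page = {"Items": [it for it in read if it["active"]]}
    if len(remaining) > limit:
        page["LastEvaluatedKey"] = read[-1]["sk"]
    return page


def filtered_page(page_size: int, start_key: Optional[int] = None, read_limit: int = 10):
    """Keep issuing queries until page_size matches are collected or the data runs out."""
    results = []
    key = start_key
    while len(results) < page_size:
        page = query_page(read_limit, key)
        results.extend(page["Items"])
        key = page.get("LastEvaluatedKey")
        if key is None:
            break
    if len(results) > page_size:
        # Over-collected: resume right after the last item actually returned.
        results = results[:page_size]
        key = results[-1]["sk"]
    return results, key


page1, next_key = filtered_page(4)
print([it["sk"] for it in page1], next_key)  # [3, 6, 9, 12] 12
```

Every extra query_page call is a real read in DynamoDB, so a sparse filter can cost many requests per visible page; this only makes filter-then-limit pagination correct, not cheap.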
There is also the limitation that an index key can have at most 2 attributes (partition key and sort key).
EXAMPLE
You wrote that you have 5 parameters you want to query. The usual workaround for these limitations is to create and maintain extra fields that combine the parameters you want to query. Let's say you have a table of users with gender, age, name, surname and position, and it's a huge database, so you have to think about how much data you can load. Then, if you want to use DynamoDB, you have to think through all the queries you want to run up front.
You most likely want to search by name and surname, so you create an index with surname as the partition key and name as the sort key (that way you can search by surname alone, or by both surname and name). That works for a lot of names, but you find that some name combinations are too common and you need to filter by position as well. In that case you create a new field (column), e.g. name-surname, and whenever you create or update an item, your app must make sure this field contains both values, e.g. will-smith. Then you can create another index with name-surname as the partition key and position as the sort key, and use it for such searches.
However, you then find that some name-surname-position combinations still return too many results, and you don't want to handle that at the application level, so you want to limit the results by age as well. Then you can create an index with name-surname-position as the partition key and age as the sort key. At this point you may also realize that the old name-surname field and its index can be removed, as they no longer serve any purpose (name and surname are covered by the first index, and searches by just name-surname-position can use this new index).
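The derived-attribute approach can be sketched in plain Python as below. ToyIndex is a hypothetical in-memory stand-in for a DynamoDB index (partition key mapped to items ordered by an age sort key), not the DynamoDB API, and the attribute name name_surname_position and the "#" separator are assumptions. The text above uses a hyphen (will-smith), but a separator that cannot occur inside the values avoids collisions such as "anne-marie" + "smith" versus "anne" + "marie-smith".

```python
from collections import defaultdict

SEP = "#"  # assumed separator; pick one that cannot appear inside the values


def name_surname_position(name: str, surname: str, position: str) -> str:
    """Build the composite partition-key value, e.g. 'will#smith#actor'."""
    return SEP.join(part.strip().lower() for part in (name, surname, position))


def with_derived_keys(item: dict) -> dict:
    """Return a copy of the item with the derived attribute filled in.

    Call this on every create/update so the derived value never drifts
    from the fields it is built from.
    """
    derived = dict(item)
    derived["name_surname_position"] = name_surname_position(
        item["name"], item["surname"], item["position"])
    return derived


class ToyIndex:
    """In-memory stand-in for an index: partition key -> items sorted by age."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, item: dict) -> None:
        item = with_derived_keys(item)
        part = self.partitions[item["name_surname_position"]]
        part.append(item)
        part.sort(key=lambda it: it["age"])  # age plays the role of the sort key

    def query(self, name, surname, position, min_age, max_age):
        key = name_surname_position(name, surname, position)
        return [it for it in self.partitions.get(key, [])
                if min_age <= it["age"] <= max_age]


idx = ToyIndex()
idx.put({"name": "Will", "surname": "Smith", "position": "Actor", "age": 55, "gender": "male"})
idx.put({"name": "Will", "surname": "Smith", "position": "Actor", "age": 30, "gender": "male"})
idx.put({"name": "Will", "surname": "Smith", "position": "Chef", "age": 40, "gender": "male"})
print([it["age"] for it in idx.query("Will", "Smith", "Actor", 25, 60)])  # [30, 55]
```

Normalizing case in one helper matters as much as the separator: writes and queries must build the key the same way, or items silently land in a different partition.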
Do you sometimes want to query by gender as well? It's probably better to handle that at the application level (or with an extra filter in the DB query) rather than creating a new index that must be maintained and paid for. Gender has very few distinct values (the vast majority of items will be either male or female), so it's probably cheaper to load all matching items and hide the ones that don't match at the application level when someone wants to see only one gender. An extra index costs you on every single write, whereas this filter will only be used from time to time. Also, when someone already searches by name, surname and position, you don't expect many results anyway, so getting 20 (all genders) or just 10 (one gender) results doesn't make much difference.
This was just an example of how you can think about and work with DynamoDB. How exactly you use it depends on your business logic.
Very important note: DynamoDB is a very simple database that can only do very simple queries. It has a little more functionality than Redis but a lot less than traditional databases. A valid outcome of thinking through your business model and use cases may be that you should NOT use DynamoDB at all, because it simply cannot satisfy your needs and queries.
Some basic reasoning might look like this:
Is a persistent key-value store enough? Use DynamoDB.
Is a persistent key-value store, where one item can have multiple keys and I search and filter by at most 2 fields, enough? Use DynamoDB.
Do I need persistent storage where I can search a single table/collection by many different keys with lots of options? Use MongoDB.
Do I need to search through multiple tables, do complex joins, or use transactions? Use a traditional SQL database.

Ionic storage - tables with key-value pairs?

Is there a way to use something like tables combined with key-value pairs in Ionic 2+?
Explanation: I know Ionic supports SQLite, but I don't need actual SQL queries or table structures. However, plain key-value pairs quickly hit a dead end.
For example, if I have records of posts, all with unique IDs (e.g. UUIDs), I could save every post as a key-value pair like this:
let posts = [post1, post2, post3];
posts.forEach(post => {
  this.storage.set(post.id, post);
});
However, then I cannot retrieve the posts, because I don't know their IDs.
Alternatively, I could store the whole array:
let posts = [post1,post2,post3]
this.storage.set("posts",posts)
However, then I cannot add, remove or edit a single post without first loading and then saving the whole array again. As I noticed, with a lot of entries the rewrite becomes quite slow.
It would be nice to have the option to group the key-value pairs into something like a table. Is there any way to do so without using actual SQL commands à la CREATE TABLE...?
I've seen that the storage offers the option to create different instances, but I'm unsure whether this fits the purpose.
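One common workaround is to namespace the keys with a prefix so that a group of keys behaves like a table, and to enumerate them through the store's key listing (Ionic Storage exposes a keys() method). The sketch below is in Python, with a dict-backed FakeStorage as a synchronous stand-in for the real, Promise-based Ionic API; the post: prefix and the helper names are hypothetical. Each post can then be added, edited or removed individually without rewriting an array.

```python
class FakeStorage:
    """Dict-backed stand-in for a key-value store such as Ionic Storage.

    Only set/get/remove/keys are modelled, synchronously; the real Ionic
    methods return Promises.
    """

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def remove(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


POST_PREFIX = "post:"  # hypothetical prefix that makes the "posts" group act like a table


def save_post(storage, post):
    """Add or overwrite one post without touching any other post."""
    storage.set(POST_PREFIX + post["id"], post)


def delete_post(storage, post_id):
    storage.remove(POST_PREFIX + post_id)


def all_posts(storage):
    """List every post by filtering the store's keys on the prefix."""
    return [storage.get(k) for k in storage.keys() if k.startswith(POST_PREFIX)]


storage = FakeStorage()
save_post(storage, {"id": "a1", "title": "First"})
save_post(storage, {"id": "b2", "title": "Second"})
storage.set("settings", {"theme": "dark"})  # unrelated key, ignored by all_posts
save_post(storage, {"id": "a1", "title": "First (edited)"})  # edit a single post
delete_post(storage, "b2")
print(all_posts(storage))  # [{'id': 'a1', 'title': 'First (edited)'}]
```

Listing by prefix scans every key in the store; if that ever becomes slow, keeping a small separate entry that holds only the post IDs is an alternative, at the cost of keeping it in sync on every add and delete.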

Ordering results (from a search) by number of matches

I'm implementing search for an Android app using an SQLite database, and I want to order the results of a search by the number of matches in a TEXT column.
For example, let's say my table is called article, the TEXT column I want to search is called article_main_text, and the user searches for the word "friend". Then I want the rows with the highest number of occurrences of "friend" in their article_main_text column to be shown first.
I already know that I can provide a custom SQLite sort order to my CursorLoader (for example, by passing it as the sortOrder argument of the CursorLoader constructor), so I'd like to ask: how can I write SQLite code that orders the results by the number of matches?
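A direct (if naive) answer is to compute the occurrence count in SQL: the length lost when every occurrence of the term is removed with replace(), divided by the term's length. The sketch below runs that expression through Python's sqlite3 module, using the table and column names from the question plus an assumed _id key and made-up rows. replace() is case-sensitive and also counts substrings ("friendly" contains "friend"). A CursorLoader sortOrder string takes no bind arguments, so on Android the term would have to be embedded as an escaped literal (for example with DatabaseUtils.sqlEscapeString).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (_id INTEGER PRIMARY KEY, article_main_text TEXT)")
conn.executemany("INSERT INTO article (article_main_text) VALUES (?)", [
    ("a friend of a friend",),
    ("no match here",),
    ("friend, friend, friend!",),
    ("my friend",),
])

# occurrences = (length(text) - length(text with every occurrence removed)) / length(term)
# SQLite divides integers with integer division, which is exact here.
HITS = ("(length(article_main_text) - length(replace(article_main_text, ?, ''))) "
        "/ length(?)")


def search_by_match_count(conn, term):
    """Return (_id, hits) for rows containing term, most occurrences first."""
    sql = (f"SELECT _id, {HITS} AS hits FROM article "
           "WHERE instr(article_main_text, ?) > 0 "
           "ORDER BY hits DESC, _id")
    return conn.execute(sql, (term, term, term)).fetchall()


print(search_by_match_count(conn, "friend"))  # [(3, 3), (1, 2), (4, 1)]
```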
Counting just the occurrences of a word might not be the most accurate or the most efficient way of doing it. What you really need is full-text search, which is supported by SQLite and available on Android.
Fortunately, there is a tutorial on the Android Developers site, and I am sure there is a full working sample in their sample code collection.

SQL Server: inserting lots of data from ASP.NET?

I have an application with parent-child tables where customers can order products. The whole structure is too complex to post here, but suffice it to say there is one Order table and one OrderDetails table for storing the orders. Currently we INSERT one record into the Order table and then, for each item the customer added, insert that item into the OrderDetails table in a loop. This does not scale: it works fine for 100 or so items, but once a user goes over 1000 items (or a quantity of 1000 for a single item), the application starts to feel unresponsive.
A couple of solutions come to mind, but I am not sure which one would scale well. One is to use a bulk insert (for example SqlBulkCopy) from my ASP.NET application into the OrderDetails table. The second is to generate XML, pass it to a stored procedure, and extract and insert the data into OrderDetails from that XML, but that has the associated memory overhead of the generated XML. I know I could benchmark and see for myself what suits my application best, but I would like to know which strategy is most common and which scales better. Also, if there is another technique that performs better than these two (I know "performance" is a subjective word, so let me narrow it down to speed), I could use that. Which is generally used the most? What do you use in your application?
You could consider using a table-valued parameter. You will have to create a table type whose structure mimics that of the OrderDetails table. The stored procedure that inserts the data then accepts an input parameter of this type (such parameters must always be declared READONLY).
In your server-side code, you can build a DataTable containing all the order details and map it to the stored procedure's input parameter. Make sure the column order in the DataTable exactly matches the column order of the table type. When the procedure executes, all the data is inserted in one shot. This saves you from looping over each row and also avoids the overhead of XML parsing, though it does involve sending the entire set of rows over the network in one call.
You can read more about it here: MSDN Table-Valued Parameters
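SQLite has no table-valued parameters, so the sketch below uses Python's sqlite3 module only to illustrate the shape of the approach: all detail rows are built up front as tuples in the exact column order of a hypothetical OrderDetails table type, then handed to the database in a single call inside one transaction, instead of one INSERT round trip per item. executemany is a swapped-in mechanism, not a TVP; with SQL Server the same rows would be bound to the READONLY table-valued parameter of the stored procedure (for example as a DataTable from ADO.NET). Table and column names are assumptions.

```python
import sqlite3

# Assumed column order of the OrderDetails table type; rows must follow it exactly.
DETAIL_COLUMNS = ("order_id", "product_id", "quantity", "unit_price")


def to_detail_rows(order_id, items):
    """Build every detail row up front (the DataTable step in the answer)."""
    return [(order_id, it["product_id"], it["quantity"], it["unit_price"]) for it in items]


def insert_order(conn, customer, items):
    """Insert the order header and all its details in one transaction."""
    placeholders = ", ".join("?" for _ in DETAIL_COLUMNS)
    with conn:  # commits on success, rolls back on error
        order_id = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)", (customer,)).lastrowid
        conn.executemany(
            f"INSERT INTO order_details ({', '.join(DETAIL_COLUMNS)}) VALUES ({placeholders})",
            to_detail_rows(order_id, items))
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_details (
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER,
    quantity INTEGER,
    unit_price REAL
);
""")
items = [{"product_id": p, "quantity": 1, "unit_price": 9.5} for p in range(1000)]
order_id = insert_order(conn, "ACME", items)
print(conn.execute("SELECT COUNT(*) FROM order_details WHERE order_id = ?",
                   (order_id,)).fetchone()[0])  # 1000
```

Wrapping header and details in one transaction also means a failure part-way through leaves no half-written order behind.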
1000 items for an order does seem quite excessive!
Would it be feasible to introduce a limit of 100 items per order into the business logic of the application?

Do we have to use the fact table for reports?

I am working on building a data mart for reporting purposes.
I am new to this field and looking for help.
I have a fact table and two dimension tables.
The fact table has only 3 fields, its primary key and foreign key references to two dimension tables.
The two dimension tables hold data related to 1) phone numbers and 2) extension numbers.
(I cannot combine these dimension tables because they have different information)
As you see my fact table does not have any quantitative columns.
I want to generate a report that displays phonenumbers and corresponding extensions.
I can get this information by performing a join on the two dimension tables.
So my question is: do I have to use the fact table for the report? I.e., should I first get the key from the phone number table, join to the fact table to get the extension key, and then join to the extension table?
OR
Can I simply join the two dimension tables to generate the report, since that is possible in this case?
Do we have to involve the fact table?
Thanks for reading.
Any help is appreciated.
Do I have to use the fact table for the report? I.e., should I first get the key from the phone number table, join to the fact table to get the extension key, and then join to the extension table?
Often, this is necessary.
Can I simply join the two dimension tables to generate the report, since that is possible in this case?
Sometimes, this works, also.
Do we have to involve the fact table?
Depends on the relationships.
If you have a "hierarchy" of dimensional information, then the two dimensions could be directly related. In this case, the fact table doesn't tie them together. The fact ties to the detailed dimension; the detailed dimension ties to the summary. This is rare.
Dimensions change.
If you have two or more Slowly Changing Dimensions, then your dimensions may include lots of "previous" relationship information.
Fact 1: Phone xxx-xxx-xxxx, Extension yyyy
Fact 2: Phone xxx-xxx-xxxx, Extension zzzz
Then another load applies an SCD rule that changes zzzz to aaaa as of 7/1/11. You may now have the old dimension values available as well as the new ones, each with an applicable date range.
Now, the fact (and the date range) are required to define which copy of the dimension value you're going to get.
Fact 2: Phone xxx-xxx-xxxx, Extension zzzz, from beginning to before 7/1/11.
Fact 2: Phone xxx-xxx-xxxx, Extension aaaa, from 7/1/11 to end.
So, you may need the fact, dimensions and time to sort out the relationships.
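The point about needing the fact plus a date can be illustrated with a toy type 2 dimension in plain Python. All names are hypothetical: each extension version carries a durable ID and a half-open [valid_from, valid_to) range, and each fact row records which phone and which extension (by durable ID) belong together, mirroring Fact 1 and Fact 2 above.

```python
from datetime import date

PHONE_DIM = {1: "xxx-xxx-xxxx"}  # phone_key -> phone number

# Type 2 extension dimension: one row per version, valid in [valid_from, valid_to).
EXTENSION_DIM = [
    {"ext_id": "E1", "extension": "yyyy", "valid_from": date.min, "valid_to": date.max},
    {"ext_id": "E2", "extension": "zzzz", "valid_from": date.min, "valid_to": date(2011, 7, 1)},
    {"ext_id": "E2", "extension": "aaaa", "valid_from": date(2011, 7, 1), "valid_to": date.max},
]

# The fact rows are what tie a phone to an extension.
FACTS = [
    {"fact_id": 1, "phone_key": 1, "ext_id": "E1"},
    {"fact_id": 2, "phone_key": 1, "ext_id": "E2"},
]


def report_as_of(as_of: date):
    """Return (fact_id, phone, extension) using the dimension version valid on as_of."""
    rows = []
    for fact in FACTS:
        for version in EXTENSION_DIM:
            if (version["ext_id"] == fact["ext_id"]
                    and version["valid_from"] <= as_of < version["valid_to"]):
                rows.append((fact["fact_id"], PHONE_DIM[fact["phone_key"]],
                             version["extension"]))
    return rows


print(report_as_of(date(2011, 6, 30)))  # [(1, 'xxx-xxx-xxxx', 'yyyy'), (2, 'xxx-xxx-xxxx', 'zzzz')]
print(report_as_of(date(2011, 7, 1)))   # [(1, 'xxx-xxx-xxxx', 'yyyy'), (2, 'xxx-xxx-xxxx', 'aaaa')]
```

Without the fact rows nothing links a phone to an extension, and without the date both zzzz and aaaa would qualify for fact 2.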
