Store in DB or not to store? - asp.net

There are a few string lists in my web application that I don't know whether to store in the DB or just in a class.
E.g. I have 7 major browsers with which users enter the site. I want to save these stats, so I need to create a browser column in the UserLogin table. I don't want to waste space and resources, so I can't save the full browser name in each login row. So I either need to save a browserID field and hook it up with a Browsers table which stores the names, following DB normalization rules, or have a sort of Dataholder abstract class which has a list of browsers from which I can retrieve a browser name by its ID...
The question is: what should I do? These few data lists contain no more than 200 items each, so I think it makes sense to have them in an abstract class, but then again I don't know whether MS-SQL will handle multiple joins well. Think of a case where I have a user with country, IP, language, browser and a few more stats...
thanks

I have been on both sides of the fence about this.
My rule of thumb is:
If one of these lists changes, will I have to do changes to the code, too?
(e.g. in your case, if someone writes "yet another browser" tomorrow, will I need to write code that caters for it?)
If the answer is "most probably yes" or "definitely", you can leave it inside the code.
In all other cases (even just a "maybe, 50%-50%") you had better put it in the DB, or at the very least in a property file.
And please consider this, too: if you expect to have to provide statistics based on this data (e.g.: "how many users use Explorer") you better put it in the DB anyway: it becomes part of your domain data and therefore it must be there.
About the "domain data" part.
The information stored in your DB is the "domain data" of your application. It is, in a sense, a (hopefully consistent) representation of what your application is about - it represents the "known universe" for your application.
If you agree to this definition, then you must also accept that it does not make sense to have 99.9% of your "reality" in the DB, and 0.1% outside of it - if nothing else, it makes some operations cumbersome (if you only store the smallint you can't create meaningful reports without either post-processing them using the class to decode "1" into "Firefox" or providing some other key for the end-user).
It also makes it impossible for you to leverage some inherent DB techniques like foreign keys (if you just use a smallint without correlating it to any other table, who guarantees that "10" is an acceptable value in your domain?)
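To make the foreign key point concrete, here is a minimal sketch. It uses Python with an in-memory SQLite database purely as a stand-in for MS SQL; the Browsers and UserLogin table names come from the question, while the column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE Browsers (
        BrowserID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE UserLogin (
        LoginID   INTEGER PRIMARY KEY,
        UserName  TEXT NOT NULL,
        BrowserID INTEGER NOT NULL REFERENCES Browsers(BrowserID)
    );
""")
conn.executemany("INSERT INTO Browsers (BrowserID, Name) VALUES (?, ?)",
                 [(1, "Firefox"), (2, "Explorer")])
conn.executemany("INSERT INTO UserLogin (UserName, BrowserID) VALUES (?, ?)",
                 [("alice", 1), ("bob", 2), ("carol", 2)])

# Reporting is a plain join; nothing has to be decoded in application code.
report = conn.execute("""
    SELECT b.Name, COUNT(*)
    FROM UserLogin AS u
    JOIN Browsers AS b ON b.BrowserID = u.BrowserID
    GROUP BY b.Name
    ORDER BY b.Name
""").fetchall()

# The foreign key rejects an id that is not part of the domain.
try:
    conn.execute("INSERT INTO UserLogin (UserName, BrowserID) VALUES (?, ?)",
                 ("dave", 10))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With the lookup table in place, "how many users use Explorer" is a single query, and a login row with browser id 10 cannot be stored until a matching Browsers row exists.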

MS SQL handles multiple joins really well; it's up to you where you want to store the data. You can also consider XML as another option. I would consider the database or XML; it is easier to change the values there than if the values are in code (where you have to recompile and redeploy to change them in production).
HTH.

Related

What's the best way to cache complicated search queries in a .NET webapp?

I have a website that allows users to query for specific recipes using various search criteria. For example, you can say "Show me all recipes that I can make in under 30 minutes that will use chicken, garlic and pasta but not olive oil."
This query is sent to the web server over JSON, and deserialized into a SearchQuery object (which has various properties, arrays, etc).
The actual database query itself is fairly expensive, and there's a lot of default search templates that would be used quite frequently. For this reason, I'd like to start caching common queries. I've done a little investigation into various caching technologies and read plenty of other SO posts on the subject, but I'm still looking for advice on which way to go. Right now, I'm considering the following options:
Built in System.Web.Caching: This would provide a lot of control over how many items are in the cache, when they expire, and their priority. However, cached objects are keyed by a string, rather than a hashable object. Not only would I need to be able to convert a SearchQuery object into a string, but the hash would have to be perfect and not produce any collisions.
Develop my own InMemory cache: What I'd really like is a Dictionary<SearchQuery, Results> object that persists in memory across all sessions. Since search results can start to get fairly large, I'd want to be able to cap how many queries would be cached and provide a way for older queries to expire. Something like a FIFO queue would work well here. I'm worried about things like thread safety, and am wondering if writing my own cache is worth the effort here.
I've also looked into some other third party cache providers such as NCache and Velocity. These are both distributed cache providers and are probably completely overkill for what I need at the moment. Plus, it seems every cache system I've seen still requires objects to be keyed by a string. Ideally, I want something that holds a cache in process, allows me to key by an object's hash value, and allows me to control expiration times and priorities.
I'd appreciate any advice or references to free and preferably open source solutions that could help me out here. Thanks!
Based on what you are saying, I recommend you use System.Web.Caching and build that into your DataAccess layer, shielding it from the rest of your system. When called, you can make your real-time query or pull from a cached object based on your business/application needs. I do this today, but with Memcached.
An in-memory cache should be pretty easy to implement. I can't think of any reason why you should have particular concerns about validating the uniqueness of a SearchQuery object versus any other - that is, while the key must be a string, you can just store the original object along with the results in the cache, and validate equality directly after you've got a hit on the hash. I would use System.Web.Caching for the benefits you've noted (expiration, etc.). If there happened to be a collision, then the 2nd one would just not get cached. But this would be extremely rare.
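A minimal sketch of the "string key plus equality check on a hit" idea, written in Python for brevity. The plain dict stands in for System.Web.Caching (so there is no expiration here), and the SearchQuery fields, cache_key and SearchCache are hypothetical names, not part of any real API.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SearchQuery:
    # Hypothetical fields; the real SearchQuery has more properties.
    include: Tuple[str, ...]
    exclude: Tuple[str, ...]
    max_minutes: int


def cache_key(query: SearchQuery) -> str:
    """Derive a string key from the query; collisions are possible in principle."""
    return hashlib.sha256(repr(query).encode("utf-8")).hexdigest()


class SearchCache:
    def __init__(self) -> None:
        # key -> (original query, results)
        self._entries: Dict[str, Tuple[SearchQuery, list]] = {}

    def get(self, query: SearchQuery) -> Optional[list]:
        entry = self._entries.get(cache_key(query))
        if entry is None:
            return None
        stored_query, results = entry
        # Validate equality after the hash hit, so a collision counts as a miss.
        return results if stored_query == query else None

    def put(self, query: SearchQuery, results: list) -> None:
        # setdefault keeps the first entry, so a colliding second query is not cached.
        self._entries.setdefault(cache_key(query), (query, results))


cache = SearchCache()
q = SearchQuery(("chicken", "garlic", "pasta"), ("olive oil",), 30)
cache.put(q, [101, 205])
hit = cache.get(q)
miss = cache.get(SearchQuery(("beef",), (), 10))
```

Because the original query is stored next to the results, the hash never has to be perfect: a collision only costs one uncached query.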
Also, the amount of memory needed to store search results should be trivial. You don't need to keep the data of every single field, of every single row, in complete detail. You just need to keep a fast way to access each result, e.g. an int primary key.
Finally, if there are possibly thousands of results for a search that could be cached, you don't even need to keep an ID for each one - just keep the first 100 or something (as well as the total number of hits). I suspect if you analyzed how people use search results, it's a rare person that goes beyond a few pages. If someone did, then you can just run the query again.
So basically you're just storing a primary key for the first X records of each common search, and then if you get a hit on your cache, all you have to do is run a very inexpensive lookup of a handful of indexed keys.
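The "first X primary keys plus the total count" idea can be sketched as follows. This is Python for illustration only; the cutoff of 100 follows the "first 100 or something" suggestion above, and CachedSearch, summarize and page_ids are made-up names.

```python
from typing import List, NamedTuple, Optional, Sequence

MAX_CACHED_IDS = 100  # assumed cutoff


class CachedSearch(NamedTuple):
    first_ids: List[int]  # primary keys of the first results only
    total_hits: int       # full result count, useful for paging display


def summarize(all_ids: Sequence[int], limit: int = MAX_CACHED_IDS) -> CachedSearch:
    """Keep only the first `limit` keys and remember how many hits there were."""
    return CachedSearch(list(all_ids[:limit]), len(all_ids))


def page_ids(cached: CachedSearch, page: int, page_size: int) -> Optional[List[int]]:
    """Return the keys for a zero-based page, or None if the cache does not cover it."""
    start = page * page_size
    if start >= cached.total_hits:
        return []  # past the end of the result set
    end = min(start + page_size, cached.total_hits)
    if end > len(cached.first_ids):
        return None  # not covered by the cache: run the real query again
    return cached.first_ids[start:end]


cached = summarize(range(1, 251))  # pretend the search matched ids 1..250
```

A None result is the signal to rerun the expensive query, which should only happen for the rare user who pages past the cached keys.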
Take a quick look at the Enterprise Library Caching Application Block. Assuming you want a web-application-wide cache, this might be the solution you're looking for.
I'm assuming that generating a database query from a SearchQuery object is not expensive, and you want to cache the result (i.e. rowset) obtained from executing the query.
You could generate the query text from your SearchQuery object and use that text as the key for a lookup using System.Web.Caching.
From a quick reading of the documentation for the Cache class, it appears that the keys have to be unique, which they would be if you used the query text itself rather than a hash of it.
EDIT
If you are concerned about long cache keys then check the following links:
Cache key length in asp.net
Maximum length of cache keys in HttpRuntime.Cache object?
It seems that the Cache class stores the cached items in an internal dictionary, which uses the key's hash. Keys (query text) with the same hash would end up in the same bucket in the dictionary, where it's just a quick linear search to find the required one when you do a cache lookup. So I think you'd be okay with long key strings.
The asp.net caching is pretty well thought out, and I don't think this is a case where you need something else.

How to implement gapless, user-friendly IDs in NHibernate?

I'm designing an application where my Order objects need to have a sequential and user-friendly Id field. I'm avoiding the HiLo algorithm because of the rather large gaps it produces (see here). Naturally, Guid values would make my corporate users go bananas. I'm also avoiding Oracle sequences because of their major disadvantages:
(From: NHibernate POID Generators revealed)
Post insert generators, as the name suggest, assigns the id’s after the entity is stored in the database. A select statement is executed against database. They have many drawbacks, and in my opinion they must be used only on brownfield projects. Those generators are what WE DO NOT SUGGEST as NH Team.
Some of the drawbacks are the following:
Unit Of Work is broken with the use of those strategies. It doesn’t matter if you’re using FlushMode.Commit, each Save results in an insert statement against DB. As a best practice, we should defer insertions to the commit, but using a post insert generator makes it commit on save (which is what UoW doesn’t do).
Those strategies nullify batcher, you can’t take the advantage of sending multiple queries at once (as it must go to database at the time of Save).
Any ideas/experience on implementing user-friendly IDs without major gaps between them?
Edit:
User friendly Id fields are ones my corporate users can memorize and even discuss and/or have phone conversations talking about a particular Order by its code, e.g. "I'm calling to know why the order #1625 was denied.".
The Id doesn't need to be strictly gapless, but I am worried that my users would get confused when they see gaps like 100, 201, 305. For my older projects, I currently implement NHibernate using Oracle sequences which occasionally lose a few sequences when exceptions are thrown, but yet keep a rather tidy order to them. The downside to them is how they break the Unit of Work which results in additional hits to the database for every Save command with or without the Session.Flush.
One option would be to keep a key-table that simply stores an incrementing value. This can introduce a few problems, namely possible locking issues as well as additional hits to the database.
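A rough sketch of such a key table, using Python and SQLite only for illustration (it is not NHibernate code, and the OrderNumber and Orders tables and the create_order helper are invented for the example). The number is taken inside the same transaction that saves the order, so a failed save rolls the counter back and leaves no gap; the price is that the UPDATE holds a write lock on the counter until the transaction ends, which is the locking issue mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderNumber (NextValue INTEGER NOT NULL);
    INSERT INTO OrderNumber (NextValue) VALUES (1);
    CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
""")


def create_order(conn, customer, fail=False):
    # "with conn" commits on success and rolls back on an exception, so a
    # failed order also gives its number back.
    with conn:
        conn.execute("UPDATE OrderNumber SET NextValue = NextValue + 1")
        (order_no,) = conn.execute(
            "SELECT NextValue - 1 FROM OrderNumber").fetchone()
        if fail:
            raise RuntimeError("simulated failure while saving the order")
        conn.execute("INSERT INTO Orders (OrderNo, Customer) VALUES (?, ?)",
                     (order_no, customer))
    return order_no


first = create_order(conn, "acme")
try:
    create_order(conn, "initech", fail=True)  # rolled back, number is reused
except RuntimeError:
    pass
second = create_order(conn, "globex")
```

Here the failed order in the middle does not consume a number, so the two stored orders are numbered consecutively.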
Another option might be to refine what you mean by "User-friendly Id". This could consist of a combination of a Date/Time and a customer-specific sequence (or including the customer id as well). Also, your order id does not necessarily have to be the actual key on the table. There is nothing to say that you can't use a surrogate key with a separate "calculated" column which represents the order id.
The bottom-line is that it sounds like you want to use a surrogate key, but have the benefits of a natural key. It can be very difficult to have it both ways and a lot comes down to how you actually plan on using the data, how users interpret the data, and personal preference.

What is the best way to implement multilingual domain objects using NHibernate?

What is the best way to design the Domain objects which can have multi-lingual fields. An example can be a Product class with Description being multi-lingual.
I have found few links but could not decide which one is the best way.
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from Sql)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Anyone having experience with using NHibernate with multi-lingual data. Is there a better way?
The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity where you want translatable text, e.g. the title property or description property.
I think the article describes a great approach, and it is the one that I would go for.
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (Say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
I only have experience with Hibernate, but since NHibernate is so similar:
One option is to define a component type MultiLingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient location to place a getter for the string by language id.
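class MultiLingualString {
    String english;
    String chinese;
    String klingon;

    // Language is assumed to be an enum with ENGLISH, CHINESE and KLINGON.
    String forLanguage(Language lang) {
        switch (lang) {
            case ENGLISH: return english;
            case CHINESE: return chinese;
            case KLINGON: return klingon;
            default: throw new IllegalArgumentException("Unsupported language: " + lang);
        }
    }
}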
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.
From a strictly database-oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc.) and one table with all of the translatable string data. Let's call the two tables Base and Base_Description.
Base ensures that there is a single key for each record, the key might be a string or auto-generated id depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.syslanguages because we can set the language of the connection with SET LANGUAGE and then grab it with @@LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language, it also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing and it fully works with this method. You should index in the neutral language and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
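The two-table layout can be sketched like this. Python with SQLite is used only as a stand-in for SQL Server; Base and Base_Description are the names from the answer, while the columns, the sample rows and the describe helper are assumptions. The LangID values are illustrative and not read from sys.syslanguages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Base (
        BaseID INTEGER PRIMARY KEY,
        Price  REAL NOT NULL
    );
    CREATE TABLE Base_Description (
        BaseID      INTEGER NOT NULL REFERENCES Base(BaseID),
        LangID      INTEGER NOT NULL,
        Description TEXT NOT NULL,
        PRIMARY KEY (BaseID, LangID)  -- one row per record and language
    );
""")
conn.execute("INSERT INTO Base (BaseID, Price) VALUES (1, 9.99)")
conn.executemany(
    "INSERT INTO Base_Description (BaseID, LangID, Description) VALUES (?, ?, ?)",
    [(1, 0, "Garlic press"), (1, 2, "Presse-ail")])


def describe(conn, base_id, lang_id):
    """Return (price, description) in the requested language, or None."""
    return conn.execute("""
        SELECT b.Price, d.Description
        FROM Base AS b
        JOIN Base_Description AS d ON d.BaseID = b.BaseID
        WHERE b.BaseID = ? AND d.LangID = ?
    """, (base_id, lang_id)).fetchone()


french = describe(conn, 1, 2)
missing = describe(conn, 1, 5)
```

Adding a language is then just a matter of inserting more Base_Description rows, with no schema change.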
Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

Allowing nulls vs default values

I'm working on an ASP.NET project that replaces many existing paper forms. One of the requirements is that the user can save the form in any state, i.e. they could create a new blank form and immediately save it with no data or with partial data. I'm validating for data type on every save but validation for required fields does not occur until the user marks the form as completed.
I'm not sure what the best approach is to handle this requirement in the database and domain model. As I see it, I have two options:
Allow nulls for any field that may not have data. This feels like the "correct" approach but it requires that almost every database field allow nulls and I have to code around a lot of nullable types. Also, when the form is finalized none of the required fields are enforced in the database.
Populate my business objects with meaningful default values. In some cases, there are meaningful default values for many (but not all) fields that I could use. This approach verges on "magic numbers" which makes me uncomfortable.
Which approach is best? Or is there a third way? I'm not willing to go to extremes, such as splitting the tables.
Edited to add: I wanted to expand on this a bit since I accepted a response. The primary reason that I'm not interested in splitting the tables is that once a project is submitted, the data on the forms is used to generate data for another system that is the system of record. At that point the original form data is unlikely to be revised or used for reporting.
I don't understand why you don't want to split the tables. I don't know what domain you're in but in any I could imagine there are two classes of people:
people who have submitted the form
people who haven't
And as a business executive I don't care about the second. But the first I care deeply about, and they need to have all their data in correctly.
It also improves efficiency - most of your queries about aggregate data will be over the first table, not the second. The second table will only be used for index seeks.
If splitting the table(s) (is there more than one?) is not an option, I would consider creating a single table to store serialisations of the objects of incomplete forms, and only commit a form to the "real" tables when the form is fully submitted by the user.
If there isn't a sensible default, and you don't want to split the data, then nulls are almost certainly your best option. Re the DB not being able to verify that they are not null when completed... well, if you don't want to split the table there isn't much you can do (short of using a CHECK constraint, or an INSTEAD OF trigger to run validation). But the DB isn't the only place responsible for data validation. Your app logic can do that too.
You could use a temporary table with "allow nulls" on every column to store the form containing partial or no data and copy / move the data to the final table when the user marks the form as completed. This way, you do not depend on default values (which the user may forget to change), you can save in any state, and you still have the validation in the end.
This is a situation that cries out for split tables. I know you said you don't want to do that, and in a comment even said "this project doesn't warrant that level of effort", but it's really the best solution.
Set up preliminary table(s) with everything except your key nullable. When the user marks the form complete, and it passes validation, move it to the final table(s). Not only is this The Right Thing To Do, but it's probably less effort than "coding around nullable values" when working with finished forms.
If you need to see all forms, finished or not, make a Union view.
I'd take the first option but add a column to the database tables so that when the form is completed this is flagged. Then for anything using the form data it merely needs to check that the form has been completed.
That's my suggestion for a way around this.
In some databases NULL values are not searchable through indexes (Oracle, for example, does not store entirely NULL keys in an ordinary B-tree index).
If you need to issue a query like "select the first 10 forms with a certain field unfilled", such a query may have to use a FULL TABLE SCAN, which may not be efficient.
Oracle does not distinguish between NULL and an empty string, but other databases do. You'll probably want to make an empty string the DEFAULT for unfilled fields and use it in a search.
If you don't need to search on unfilled fields, then just make them NULL.
NULL generally means "Don't Know" (in a database) whereas an empty string could actually represent an empty string.
I would tend to use NULL as the "Don't Know" value in your case. When you print out data you'll just have to assume that any NULL value means an empty string.
CHECK CONSTRAINT + VIEW
If you don't have a status field, add one so you can tell that a form is finished.
Add a check constraint on that status field so it can't be marked finished if any of the columns are null.
When you write your queries on "finished" forms you can ignore checking for nulls everywhere if you do one of these two options:
just add Status="F"inished in the where clause
make a view of only finished ones
When using the "finished view" you don't have to do all the validation checks or worry about unfinished ones showing up in the results.
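The check constraint plus view combination might look like the following sketch. It uses Python with SQLite as a stand-in for the real database; the Forms table, its two data columns and the 'D' (draft) status value are assumptions, and only the "F" for finished comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Forms (
        FormID    INTEGER PRIMARY KEY,
        Applicant TEXT,   -- nullable so drafts can be saved in any state
        Amount    REAL,
        Status    TEXT NOT NULL DEFAULT 'D' CHECK (Status IN ('D', 'F')),
        -- a form may only be finished when every required column is filled
        CHECK (Status <> 'F' OR (Applicant IS NOT NULL AND Amount IS NOT NULL))
    );
    CREATE VIEW FinishedForms AS
        SELECT FormID, Applicant, Amount FROM Forms WHERE Status = 'F';
""")
conn.execute("INSERT INTO Forms (FormID) VALUES (1)")  # blank draft
conn.execute(
    "INSERT INTO Forms (FormID, Applicant, Amount) VALUES (2, 'alice', 120.0)")
conn.execute("UPDATE Forms SET Status = 'F' WHERE FormID = 2")

# Marking the blank draft as finished violates the constraint.
try:
    conn.execute("UPDATE Forms SET Status = 'F' WHERE FormID = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

finished = conn.execute(
    "SELECT FormID, Applicant, Amount FROM FinishedForms").fetchall()
```

Queries against FinishedForms never see the blank draft, and the constraint guarantees that nothing in the view has a null in a required column.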
I've had a similar situation, and while I haven't yet come up with a solution, I have been toying with the idea of just using simple XML serialization to store the temporary document data. If you generate simple classes that model the data in the objects (using nullable types where needed, perhaps), it would be easy to stuff data from the screen into those objects, serialize them to XML and then store them in a temporary "staging" table. When your users are done working and want to submit or finalize the document, then you perform all of your needed validation against the serialized data, eventually putting into the "real" table with the proper data structures and constraints.

Bulk Collection Manipulation through a REST (RESTful) API

I'd like some advice on designing a REST API which will allow clients to add/remove large numbers of objects to a collection efficiently.
Via the API, clients need to be able to add items to the collection and remove items from it, as well as manipulating existing items. In many cases the client will want to make bulk updates to the collection, e.g. adding 1000 items and deleting 500 different items. It feels like the client should be able to do this in a single transaction with the server, rather than requiring 1000 separate POST requests and 500 DELETEs.
Does anyone have any info on the best practices or conventions for achieving this?
My current thinking is that one should be able to PUT an object representing the change to the collection URI, but this seems at odds with the HTTP 1.1 RFC, which seems to suggest that the data sent in a PUT request should be interpreted independently from the data already present at the URI. This implies that the client would have to send a complete description of the new state of the collection in one go, which may well be very much larger than the change, or even be more than the client would know when they make the request.
Obviously, I'd be happy to deviate from the RFC if necessary but would prefer to do this in a conventional way if such a convention exists.
You might want to think of the change task as a resource in itself. So you're really PUT-ing a single object, which is a Bulk Data Update object. Maybe it's got a name, owner, and big blob of CSV, XML, etc. that needs to be parsed and executed. In the case of CSV you might want to also identify what type of objects are represented in the CSV data.
List jobs, add a job, view the status of a job, update a job (probably in order to start/stop it), delete a job (stopping it if it's running) etc. Those operations map easily onto a REST API design.
Once you have this in place, you can easily add different data types that your bulk data updater can handle, maybe even mixed together in the same task. There's no need to have this same API duplicated all over your app for each type of thing you want to import, in other words.
This also lends itself very easily to a background-task implementation. In that case you probably want to add fields to the individual task objects that allow the API client to specify how they want to be notified (a URL they want you to GET when it's done, or send them an e-mail, etc.).
Yes, PUT creates/overwrites, but does not partially update.
If you need partial update semantics, use PATCH. See http://greenbytes.de/tech/webdav/draft-dusseault-http-patch-14.html.
You should use AtomPub. It is specifically designed for managing collections via HTTP. There might even be an implementation for your language of choice.
For the POSTs, at least, it seems like you should be able to POST to a list URL and have the body of the request contain a list of new resources instead of a single new resource.
As far as I understand it, REST means REpresentational State Transfer, so you should transfer the state from client to server.
If that means too much data going back and forth, perhaps you need to change your representation. A collectionChange structure would work, with a series of deletions (by id) and additions (with embedded full xml Representations), POSTed to a handling interface URL. The interface implementation can choose its own method for deletions and additions server-side.
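As a rough illustration of the collectionChange idea, here is how a server-side handler might apply such a structure. This is a Python sketch with invented field names ("deletions", "additions", "id"); the answer does not prescribe a format, and JSON-like dicts are used here instead of XML for brevity.

```python
from typing import Any, Dict


def apply_collection_change(collection: Dict[str, Any],
                            change: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new collection with the deletions and additions applied."""
    deletions = list(change.get("deletions", []))
    additions = list(change.get("additions", []))

    # Validate first so that a bad request changes nothing at all.
    unknown = [item_id for item_id in deletions if item_id not in collection]
    if unknown:
        raise KeyError(f"cannot delete unknown ids: {unknown}")

    updated = dict(collection)  # work on a copy; the original stays untouched
    for item_id in deletions:
        updated.pop(item_id, None)
    for item in additions:
        updated[item["id"]] = item  # additions carry their full representation
    return updated


items = {"a": {"id": "a", "name": "A"}, "b": {"id": "b", "name": "B"}}
change = {
    "deletions": ["a"],
    "additions": [{"id": "c", "name": "C"}],
}
result = apply_collection_change(items, change)
```

Validating before mutating gives the whole change all-or-nothing behaviour, which is what makes a single POST preferable to hundreds of separate requests.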
The purest version would probably be to define the items by URL, and the collection contain a series of URLs. The new collection can be PUT after changes by the client, followed by a series of PUTs of the items being added, and perhaps a series of deletions if you want to actually remove the items from the server rather than just remove them from that list.
You could introduce a meta-representation of existing collection elements that don't need their entire state transferred, so in some abstract code your update could look like this:
{existing elements 1-100}
{new element foo with values "bar", "baz"}
{existing element 105}
{new element foobar with values "bar", "foo"}
{existing elements 110-200}
Adding (and modifying) elements is done by defining their values, deleting elements is done by not mentioning them in the new collection, and reordering elements is done by specifying the new order (if order is stored at all).
This way you can easily represent the entire new collection without having to re-transmit the entire content. Using an If-Unmodified-Since header makes sure that your idea of the content indeed matches the server's idea (so that you don't accidentally remove elements that you simply didn't know about when the request was submitted).
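One possible way for a server to expand such a meta-representation is sketched below in Python. The entry format ({"existing": [low, high]} and {"new": ..., "values": ...}) is an assumption made for this example, and the If-Unmodified-Since check is left out.

```python
from typing import Any, Dict, List, Tuple


def expand_update(current: Dict[int, Any],
                  update: List[Dict[str, Any]]) -> List[Tuple[Any, Any]]:
    """Build the new ordered collection from references and new elements."""
    result: List[Tuple[Any, Any]] = []
    for entry in update:
        if "existing" in entry:
            low, high = entry["existing"]
            for key in range(low, high + 1):
                if key not in current:
                    # The client's idea of the collection differs from ours.
                    raise KeyError(f"element {key} does not exist")
                result.append((key, current[key]))  # state is not re-sent
        else:
            result.append((entry["new"], entry["values"]))
    return result


current = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
update = [
    {"existing": [1, 2]},
    {"new": "foo", "values": ["bar", "baz"]},
    {"existing": [4, 5]},  # element 3 is not mentioned, so it is deleted
]
new_collection = expand_update(current, update)
```

Only the new element travels with its full values; everything else is a short reference, and the omitted element 3 disappears from the result.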
The best way is:
1. Pass only an array of the IDs of the objects to delete from the front-end application to the Web API.
2. Then you have two options:
2.1. Web API way: find all collections/entities using the ID array and delete them in the API, but you need to take care of dependent entities too, such as data in foreign-key related tables.
2.2. Database way: pass the IDs to your database side, find all records in the foreign-key tables and primary-key tables, and delete them in that order, i.e. foreign-key table records first, then primary-key table records.

Resources