Best Practices for updating multiple check boxes on a web form to a database - asp.net

A sample case scenario: I have a form with one question and multiple answers as checkboxes, so you can choose more than one. The table for storing answers is as below:
QuestionAnswers
(
UserID int,
QuestionID int,
AnswerID int
)
What is the best way of updating those answers in the database using a stored proc? At different jobs I've seen the whole spectrum, from simply deleting all previous answers and inserting new ones, to passing a list of answers to remove and a list of answers to add to the stored proc.
In my current project, performance and scalability are pretty important, so I'm wondering: what's the best way of doing it?
Thanks!
Andrey

If I had a choice of table design, and the following statements are true:
You know the maximum number of choices per question.
Each choice is a simple checked/unchecked.
Each answer can be classified as correct/wrong rather than marked on some scale (like 70% right).
Then, with performance in mind, I would consider the following table instead of the one you presented:
QuestionAnswers
(
UserID int,
QuestionID int,
Choice1 bool,
Choice2 bool,
...
ChoiceMax bool
)
Yes, it is ugly in terms of normalization, but that denormalization buys performance and simplifies queries -- just one update/insert per question. (And I would update first and insert only if the number of affected rows equals zero.)
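A minimal sketch of that update-then-insert pattern (assuming three choices for brevity; parameter names are illustrative):
UPDATE QuestionAnswers
SET Choice1 = @Choice1, Choice2 = @Choice2, Choice3 = @Choice3
WHERE UserID = @UserID AND QuestionID = @QuestionID;
-- no existing row for this user/question, so insert one
IF @@ROWCOUNT = 0
    INSERT INTO QuestionAnswers (UserID, QuestionID, Choice1, Choice2, Choice3)
    VALUES (@UserID, @QuestionID, @Choice1, @Choice2, @Choice3);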
Also, detecting whether the answer was correct becomes simpler -- with the following table:
QuestionCorrectAnswers
(
QuestionID int,
Choice1 bool,
Choice2 bool,
...
ChoiceMax bool
)
All you need to do is look up the row in QuestionCorrectAnswers with the same combination of choices as the user answered.
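For example, a sketch of that lookup (again with three choices; returns 1 when the user's stored choices exactly match the correct combination):
SELECT CASE WHEN EXISTS (
    SELECT 1
    FROM QuestionAnswers ua
    INNER JOIN QuestionCorrectAnswers ca
        ON  ca.QuestionID = ua.QuestionID
        AND ca.Choice1 = ua.Choice1
        AND ca.Choice2 = ua.Choice2
        AND ca.Choice3 = ua.Choice3
    WHERE ua.UserID = @UserID AND ua.QuestionID = @QuestionID
) THEN 1 ELSE 0 END AS AnsweredCorrectly;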

If the questions are always the same, then you'd never delete anything - just run an update query on all changed Answers.
Update QuestionAnswers
SET AnswerID = @AnswerID
WHERE UserID = @UserID AND QuestionID = @QuestionID
If for some reason you still need to do some delete/insert, I'd check which QuestionIDs already exist (for the given UserID) so you do the minimum of deletes/inserts.
Updates are just far faster than delete-then-insert, plus you don't make any identity columns skyrocket.
I presume you load the QuestionAnswers from the DB upon entering the page, so the user can see which answers he/she gave last time - if you do, you already have the necessary data in memory for determining what to delete, insert, and update.

Andrey: If the user (userid=1) selects choices a (answerid=1) and b (answerid=2) for question 1 (questionid=1) and later switches to c (answerid=3) and d (answerid=4), you would have to check, then delete the old rows and add the new ones. If you follow the other approach, you would not check whether a particular record exists (so that you can update it); you would just delete the old records and insert new ones. Anyway, since you are not storing any identity columns, I would go with the latter approach.

It is a simple solution:
Every [Answer] gets an integer bit value, and this value is unique within the current Question.
For example, you have Question1 and four predefined answers:
[Answer] [Bit value]
answer1 0x00001
answer2 0x00002
answer3 0x00004
answer4 0x00008
...
So, your SQL INSERT/UPDATE will be:
declare @checkedMask int
set @checkedMask = 0x00009 -- answer 1 and answer 4 are checked
declare @questionId int
set @questionId = 1
-- delete answers that are no longer checked
delete r
--select r.*
from QuestionResult r
inner join QuestionAnswer a
    on r.QuestionId = a.QuestionId and r.AnswerId = a.AnswerId
where r.QuestionId = @questionId
    and (a.mask & @checkedMask) = 0
-- insert newly checked answers that are not already stored
insert QuestionResult (AnswerId, QuestionId)
select
    AnswerId,
    QuestionId
from QuestionAnswer a
where a.QuestionId = @questionId
    and (a.mask & @checkedMask) > 0
    and not exists (select AnswerId from QuestionResult r
        where r.QuestionId = @questionId and r.AnswerId = a.AnswerId)

Sorry to resurrect an old thread. I would have thought the only realistic solution is to delete all responses for that question and create new rows where the checkbox is ticked. Having a column per answer may be efficient as far as updates go, but the inflexibility of that approach is just not an option: you need to be able to add options to a question without having to redesign your database.
Just delete and re-insert. That's what databases are designed to do: store and retrieve lots of rows of data.
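A minimal sketch against the original QuestionAnswers table (the @SelectedAnswers table variable holding the checked AnswerIDs is an assumption):
-- remove every previous response for this user/question
DELETE FROM QuestionAnswers
WHERE UserID = @UserID AND QuestionID = @QuestionID;
-- re-insert one row per ticked checkbox
INSERT INTO QuestionAnswers (UserID, QuestionID, AnswerID)
SELECT @UserID, @QuestionID, AnswerID
FROM @SelectedAnswers;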

I disagree that regent's answer is denormalized. As long as each answer is not dependent on another non-key column, and is only dependent on the key, it is in third normal form. It is no different from a table with the following fields for a customer name:
CustomerName
(
name_prefix
name_first
name_mi
name_last
name_suffix
city
state
zip
)
Same as
QuestionAnswers
(
Q1answer1
Q1answer2
Q1answerN
)
There really is no difference between the "question" of a name, with its multiple answers that may or may not be filled out, and the "question" on the form, with its multiple answers that may or may not be selected.


Efficient insertion of row and foreign table row if it does not exist

Similar to this question and this solution for PostgreSQL (in particular "INSERT missing FK rows at the same time"):
Suppose I am making an address book with a "Groups" table and a "Contact" table. When I create a new Contact, I may want to place them into a Group at the same time. So I could do:
INSERT INTO Contact VALUES (
    'Bob',
    (SELECT group_id FROM Groups WHERE name = 'Friends')
)
But what if the "Friends" Group doesn't exist yet? Can we insert this new Group efficiently?
The obvious thing is to do a SELECT to test whether the Group exists already; if not, do an INSERT. Then do an INSERT into Contacts with the sub-SELECT above.
Or I can constrain Group.name to be UNIQUE, do an INSERT OR IGNORE, then INSERT into Contacts with the sub-SELECT.
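For example, that variant might look like this (SQLite syntax, reusing the tables from above):
INSERT OR IGNORE INTO Groups (name) VALUES ('Friends');
INSERT INTO Contact VALUES (
    'Bob',
    (SELECT group_id FROM Groups WHERE name = 'Friends')
);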
I can also keep my own cache of which Groups exist, but that seems like I'm duplicating functionality of the database in the first place.
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct? What is the best practice here?
My guess is that there is no way to do this in one query, since INSERT does not return anything and cannot be used in a subquery. Is that intuition correct?
You could use a trigger and a little modification of the tables, and then you could do it with a single query.
For example, consider the following.
Purely for convenience of producing the demo:-
DROP TRIGGER IF EXISTS add_group_if_not_exists;
DROP TABLE IF EXISTS contact;
DROP TABLE IF EXISTS groups;
One-time setup SQL :-
CREATE TABLE IF NOT EXISTS groups (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
INSERT INTO groups VALUES(-1,'NOTASSIGNED');
CREATE TABLE IF NOT EXISTS contact (id INTEGER PRIMARY KEY, contact TEXT, group_to_use TEXT, group_reference INTEGER DEFAULT -1 REFERENCES groups(id));
CREATE TRIGGER IF NOT EXISTS add_group_if_not_exists
AFTER INSERT ON contact
BEGIN
INSERT OR IGNORE INTO groups (group_name) VALUES(new.group_to_use);
UPDATE contact SET group_reference = (SELECT id FROM groups WHERE group_name = new.group_to_use), group_to_use = NULL WHERE id = new.id;
END;
SQL that would be used on an ongoing basis :-
INSERT INTO contact (contact,group_to_use) VALUES
('Fred','Friends'),
('Mary','Family'),
('Ivan','Enemies'),
('Sue','Work colleagues'),
('Arthur','Fellow Rulers'),
('Amy','Work colleagues'),
('Henry','Fellow Rulers'),
('Canute','Fellow Ruler')
;
The number of values and the actual values would vary.
SQL Just for demonstration of the result
SELECT * FROM groups;
SELECT contact,group_name FROM contact JOIN groups ON group_reference = groups.id;
Results
This results in :-
1) The groups (noting that the group "NOTASSIGNED" is intrinsic to the working of the above and hence added initially; its id of -1 is used because it would not be a normal, automatically generated value). You have to be careful regarding mistakes like 'Fellow Ruler' instead of 'Fellow Rulers', which silently creates an extra group.
2) The contacts, each shown with its respective group.
Efficient insertion
That could likely be debated from here to eternity, so I leave it for the fence sitters/destroyers to decide :). However, some considerations :-
It works and appears to do what is wanted.
It's a little wasteful due to the additional working column.
It tries to minimise the waste by clearing that column once the trigger has resolved the group (the trigger sets it to NULL, which may be even more efficient than an empty string, though for some that can be confusing).
There will obviously be an overhead, BUT in comparison to the alternatives it is probably negligible (perhaps important if you were extracting every Facebook user, but if it's user-input driven, likely irrelevant).
What is the best practice here?
Fences again. :)
Note: Hopefully obvious, but the DROP statements are purely for convenience, and all other SQL up until the INSERT is run once to set up the tables and trigger in preparation for the single INSERT that adds a group if necessary.

To CRUD or not to CRUD

I know the benefits of using CRUD, and that there are also some disadvantages, but I'd like to get some more expert feedback and advice on the process below for writing data to a database, particularly regarding best practice and possible pro's and con's.
I've come across two basic methods of creating records in my time as a developer. The first (and usually least helpful in most of the works I've seen) is to create a stub and use the various populated fields (including the PK) wherever it is needed. This usually leads to a raft of disowned records floating around the database with no real purpose.
The second way is to only hold a stub in memory, giving (what would be) the object's PK field a default value of, for instance, -1 to represent a new record. This keeps database access to a minimum, especially if the record is not needed later.
Personally, I've found the second way a lot more forgiving and straightforward than the first. The question I'd like to pose, though, is whether to rule out CRUD in favour of a stored procedure that carries out both the INSERT and UPDATE aspects of the CRUD process based on the aforementioned default value, something like...
BEGIN
    IF @record_id = -1
        INSERT ....
    ELSE
        UPDATE ....
END
Any feedback would be appreciated.
As a rule of thumb, I tend to write upsert procedures... but I base the "match" on the unique constraint, not the surrogate key.
For example.
dbo.Employee
EmployeeUUID is the PK, Surrogate Key
SSN is a unique constraint.
dbo.uspEmployeeUpsert would look something like this:
Insert into dbo.Employee (EmployeeUUID, LastName, FirstName, SSN)
Select NEWID(), LastName, FirstName, SSN
from #SomeHolderTable holder
where not exists (select null from dbo.Employee innerRealTable
    where innerRealTable.SSN = holder.SSN)

Update e
Set EmployeeUUID = holder.EmployeeUUID -- omit this line if the holder rows don't carry the surrogate key
  , LastName = ISNULL(holder.LastName, e.LastName) /* or COALESCE */
  , FirstName = COALESCE(holder.FirstName, e.FirstName)
from dbo.Employee e, #SomeHolderTable holder
Where e.SSN = holder.SSN
You can also use the MERGE statement.
You can also replace the SSN with the SurrogateKey (EmployeeUUID in this case)
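For example, a minimal MERGE sketch of the same upsert (assuming the dbo.Employee and #SomeHolderTable shapes above):
MERGE dbo.Employee AS target
USING #SomeHolderTable AS holder
    ON target.SSN = holder.SSN
WHEN MATCHED THEN
    UPDATE SET LastName  = COALESCE(holder.LastName, target.LastName),
               FirstName = COALESCE(holder.FirstName, target.FirstName)
WHEN NOT MATCHED THEN
    INSERT (EmployeeUUID, LastName, FirstName, SSN)
    VALUES (NEWID(), holder.LastName, holder.FirstName, holder.SSN);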
What is #SomeHolderTable, you ask?
I like to pass XML to the stored procedure, shred it into a @table variable or #temp table, then write the logic for the C(reate) and U(pdate). D(elete) is possible as well, but I usually isolate it in a separate procedure.
Why do I do it this way?
Because I can update 1 or 100 or 1000 or N records with one db hit.
My logic seldom changes, and is isolated to one place.
Now, there is a small performance hit for shredding the Xml.
But I find it acceptable 99% of the time.
Every once in a while, I write a non "set based" Upsert routine. But that is for heavy hitter procedures for heavy hitting usage.
That's my take.
You can see the "set based" part of this approach (with the older OPENXML syntax) at this article:
http://msdn.microsoft.com/en-us/library/ff647768.aspx
Find the phrase : "Perform bulk updates and inserts by using OpenXML"
Here is the "more code" version of what the above URL talks about:
http://support.microsoft.com/kb/315968
EDIT
if exists (select 1 from #SomeHolderTable holder
    where not exists (select 1 from dbo.Employee e where e.SSN = holder.SSN))
BEGIN
    Insert into dbo.Employee (EmployeeUUID, LastName, FirstName, SSN)
    Select NEWID(), LastName, FirstName, SSN
    from #SomeHolderTable holder
    where not exists (select null from dbo.Employee innerRealTable
        where innerRealTable.SSN = holder.SSN)
END
I wouldn't necessarily do this, but it's an option if you want a "boolean check".
So, with my uniqueidentifier setup, I will pass down an "Empty Guid" (00000000-0000-0000-0000-000000000000) (Guid.Empty in C#) to the procedure, when I know I have a new item. That would be my "-1" check in your scenario.
That's one method, that you could check for an "if exists".
It kinda depends on how many hands you have in the pot.
Also, I didn't mention that when I have a lot of hands in the pot, I'll shred the xml... then I'll do a BEGIN TRAN and COMMIT TRAN around my CU statements (with ROLLBACK in there as well; a sketch follows). That way my CU is atomic, all or nothing.
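A minimal sketch of that wrapper, assuming SQL Server 2012+ for THROW (use RAISERROR on older versions):
BEGIN TRY
    BEGIN TRAN;
    -- INSERT ... (as above)
    -- UPDATE ... (as above)
    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW; -- rethrow so the caller sees the failure
END CATCH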
The MERGE statement will do this as well, but the pros and cons of MERGE are a different topic.

How do you write a good stored procedure for update?

I want to write a stored procedure (SQL server 2008r2), say I have a table:
person
Columns:
Id int (pk)
Date_of_birth date not null
Phone int allow null
Address int allow null
Name nvarchar(50) not null
Sample data:
Id=1,Date_of_birth=01/01/1987,phone=88888888,address=null,name='Steve'
Update statement in the stored procedure (assume the parameters are already declared):
Update person set
    Date_of_birth=@dob, phone=@phone, address=@address, name=@name
where id=@id
The table has a trigger to log any changes.
Now I have an asp.net update page for updating the above person table
The question is: if the user just wants to update address='apple street', the above update statement will update all the fields rather than checking whether the original value equals the new value and skipping unchanged fields. So my log table will log an event even for columns that are not actually being updated.
At this point, my candidate solutions are:
1. Select all the values by id and store them in local variables. Using if-else checks, generate the update statement, and at last dynamically run the generated SQL (sp_executesql).
2. Select all the values by id and store them in local variables. Using if-else checks, update each field separately:
If @dob <> @ori_dob
Begin
    Update person set date_of_birth=@dob where id=@id
End
Maybe this is a stupid question, but please advise me if you have a better idea, thanks!
This is an answer to a comment by the OP and does not address the original question. It would, however, be a rather ugly comment.
You can use a statement like this to find the changes to Address within an UPDATE trigger:
select i.Id, d.Address as OldAddress, i.Address as NewAddress
from inserted as i
inner join deleted as d
    on d.Id = i.Id
where d.Address <> i.Address
One such statement would be needed for each column that you want to log.
You could accumulate the results of the SELECTs into a single table variable, then summarize the results for each Id. Or you can use INSERT/SELECT to save the results directly to your log table.
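For instance, a hedged sketch of saving directly to the log from an UPDATE trigger (the PersonLog table and its columns are illustrative, not from the question):
CREATE TRIGGER trg_person_log ON person
AFTER UPDATE
AS
BEGIN
    -- one INSERT/SELECT like this per column you want to log
    INSERT INTO PersonLog (Id, ColumnName, OldValue, NewValue)
    SELECT i.Id, 'Address', d.Address, i.Address
    FROM inserted AS i
    INNER JOIN deleted AS d ON d.Id = i.Id
    WHERE d.Address <> i.Address;
END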

Comment system database design

I'm developing a system like SO (completely different topic) and replies and comments are alike with the system we see everyday on StackOverflow.
My question is: I'm loading the question with one stored proc and the replies with another, and now I'm adding the comment system. Do I need to fetch the comments one by one for each of the replies on the topic?
That would mean that with my page size set to 20 replies, I'd be doing 22 database operations, which is more than I was expecting.
I don't think I need to add my database diagram for this question but still here it is:
Questions
-----------
QUESTION_ID
USER_ID
QUESTION_TEXT
DATE
REPLIES
-----------
REPLY_ID
QUESTION_ID
USER_ID
REPLY_TEXT
DATE
COMMENTS
------------
REPLY_ID (fk replies)
USER_ID
TEXT
DATE
You should get all your comments at once.
Then make DataViews from the result with a filter for each reply and bind to that DataView. You could also use LINQ to Entities and just filter out new sets on each bind. Here is a basic pseudo-code example:
Get all comments for all replies to question
Bind replies
Implement the OnDataBinding for the reply control that will display the comments
In the OnDataBinding add a filter to the result set for the comments with the same reply ID
Bind the filtered list of comments to the display control for comments
This should work, and I have implemented the same scenario for similar types of data structures.
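A sketch of the single query behind step 1, using the tables from the diagram above (@QuestionID is the page's question):
SELECT c.REPLY_ID, c.USER_ID, c.TEXT, c.DATE
FROM COMMENTS c
INNER JOIN REPLIES r
    ON r.REPLY_ID = c.REPLY_ID
WHERE r.QUESTION_ID = @QuestionID
ORDER BY c.REPLY_ID, c.DATE;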
Pabuc,
For your initial question, why not get all the results using a single query for the given question?
select reply_text, user_id
from REPLIES
where QUESTION_ID = :question_id
order by DATE asc
Also, as you pointed out, except for minor differences, the question and answer have almost the same attributes as a post.
Wouldn't a model like the one below make more sense? The question and answer are both "posts", the only difference being that an answer has the question as its parent while the question has no parent.
Create table post ( -- question/reply
    post_id number,
    parent_post_id number, -- null if it is the question; holds the question id
                           -- if it is a reply to a question
    post_text varchar2(4000),
    user_id number,
    post_date date);
-- self-referential foreign key
Alter table post
    add constraint fk_post_parent foreign key (parent_post_id) references post(post_id);
-- comments on all posts (questions/replies)
create table comments(
    comment_id number,
    post_id number,
    comment_txt varchar2(140),
    comment_user_id number,
    comment_date date
);
alter table comments add constraint fk_comments_post
    foreign key (post_id) references post(post_id);
-- for a given question (post) id, you can get all the replies and their comments using...
select replies.*,
       comments.*
from post replies,
     comments
where replies.parent_post_id = :question_id --input
  and comments.post_id = replies.post_id
You might have to add an order by clause to get the results based on points, updated_timestamp or any other attribute as needed.

Hierarchical Database Select / Insert Statement (SQL Server)

I have recently stumbled upon a problem with selecting relationship details from one table and inserting them into another table; I hope someone can help.
I have a table structure as follows:
ID (PK)   Name        ParentID
1         Myname      0
2         nametwo     1
3         namethree   2
This is the table I need to select from to get all the relationship data, as there could be an unlimited number of sub-links. (Is there a function I can create to handle that loop?)
Then, once I have all the data, I need to insert it into another table, and the IDs will have to change, since the IDs must go in order (e.g. I cannot have id 2 be a sub-item of 3). I am hoping I can use the same function used for selecting to do the inserting.
If you are using SQL Server 2005 or above, you may use recursive queries to get your information. Here is an example:
With tree (id, Name, ParentID, [level])
As (
Select id, Name, ParentID, 1
From [myTable]
Where ParentID = 0
Union All
Select child.id
,child.Name
,child.ParentID
,parent.[level] + 1 As [level]
From [myTable] As [child]
Inner Join [tree] As [parent]
On [child].ParentID = [parent].id)
Select * From [tree];
This query will return the row requested by the first portion (Where ParentID = 0) and all sub-rows recursively. Does this help you?
I'm not sure I understand what you want to have happen with your insert. Can you provide more information in terms of the expected result when you are done?
Good luck!
For the retrieval part, you can take a look at Common Table Expressions. This feature provides recursive operation in SQL.
For the insertion part, you can use the CTE above to regenerate the IDs and insert accordingly, as sketched below.
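A hedged sketch of that insert (the target table [myNewTable] is illustrative; ROW_NUMBER ordered by [level] numbers every parent before its children, and the self-join remaps each ParentID to the new numbering):
With tree (id, Name, ParentID, [level])
As (
    Select id, Name, ParentID, 1
    From [myTable]
    Where ParentID = 0
    Union All
    Select child.id, child.Name, child.ParentID, parent.[level] + 1
    From [myTable] As [child]
    Inner Join [tree] As [parent]
        On [child].ParentID = [parent].id
),
numbered As (
    Select id, Name, ParentID,
        Row_Number() Over (Order By [level], id) As new_id
    From tree
)
Insert Into [myNewTable] (ID, Name, ParentID)
Select n.new_id, n.Name, IsNull(p.new_id, 0)
From numbered As n
Left Join numbered As p
    On n.ParentID = p.id;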
I hope this URL helps: Self-Joins in SQL
This is the problem of finding the transitive closure of a graph in sql. SQL does not support this directly, which leaves you with three common strategies:
use a vendor specific SQL extension
store the Materialized Path from the root to the given node in each row
store the Nested Sets, that is the interval covered by the subtree rooted at a given node when nodes are labeled depth first
The first option is straightforward, and if you don't need database portability is probably the best. The second and third options have the advantage of being plain SQL, but require maintaining some de-normalized state. Updating a table that uses materialized paths is simple, but for fast queries your database must support indexes for prefix queries on string values. Nested sets avoid needing any string indexing features, but can require updating a lot of rows as you insert or remove nodes.
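For example, a minimal sketch of a materialized-path lookup (the Path column, storing e.g. '/1/2/' for the root-to-node chain, is an assumption you would maintain on every insert or move):
-- all descendants of node 1: a prefix query that a string index can serve
Select *
From [myTable]
Where Path Like '/1/%';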
If you're fine with always using MSSQL, I'd use the vendor specific option Adrian mentioned.
