I need to write a web application using SQL Server 2005, asp.net, and ado.net. Much of the user data stored in this application must be encrypted (read HIPAA).
In the past for projects that required encryption, I encrypted/decrypted in the application code. However, this was generally for encrypting passwords or credit card information, so only a handful of columns in a couple tables. For this application, far more columns in several tables need to be encrypted, so I suspect pushing the encryption responsibilities into the data layer will be better performing, especially given SQL Server 2005's native support for several encryption types. (I could be convinced otherwise if anyone has real, empirical evidence.)
I've consulted BOL, and I'm fairly adept at using google. So I don't want links to online articles or MSDN documentation (its likely I've already read it).
One approach I've wrapped my head around so far is to use a symmetric key which is opened using a certificate.
So the one time setup steps are (performed by a DBA in theory):
Create a Master Key
Backup the Master Key to a file, burn to CD and store off site.
Open the Master Key and create a certificate.
Backup the certificate to a file, burn to CD and store off site.
Create the Symmetric key with encryption algorithm of choice using the certificate.
Then anytime a stored procedure (or a human user via Management Studio) needs to access encrypted data you have to first open the symmetric key, execute any tsql statements or batches, and then close the symmetric key.
Then as far as the asp.net application is concerned, and in my case the application code's data access layer, the data encryption is entirely transparent.
So my questions are:
Do I want to open, execute tsql statements/batches, and then close the symmetric key all within the sproc? The danger I see is, what if something goes wrong with the tsql execution, and code sproc execution never reaches the statement that closes the key. I assume this means the key will remain open until sql kills the SPID that sproc executed on.
Should I instead consider making three database calls for any given procedure I need to execute (only when encryption is necessary)? One database call to open the key, a second call to execute the sproc, and a third call to close the key. (Each call wrapped in its own try catch loop in order to maximize the odds that an open key ultimately is closed.)
Any considerations should I need to use client side transactions (meaning my code is the client, and initiates a transaction, executes several sprocs, and then commits the transaction assuming success)?
1) Look into using TRY..CATCH in SQL 2005. Unfortunately there is no FINALLY, so you'll have to handle both the success and error cases individually.
2) Not necessary if (1) handles the cleanup.
3) There isn't really a difference between client and server transactions with SQL Server. Connection.BeginTransaction() more or less executes "BEGIN TRANSACTION" on the server (and System.Transactions/TransactionScope does the same, until it's promoted to a distributed transaction). As for concerns with open/closing the key multiple times inside a transaction, I don't know of any issues to be aware of.
I'm a big fan of option 3.
Pretend for a minute you were going to set up transaction infrastructure anyways where:
Whenever a call to the datastore was about to be made if an existing transaction hadn't been started then one was created.
If a transaction is already in place then calls to the data store hook into that transaction. This is often useful for business rules that are raised by save/going-to-the-database events. IE. If you had a rule that whenever you sold a widget you needed to update a WidgetAudit table, you'd probably want to wrap the widget audit insert call in the same transaction as that which is telling the datastore a widget has been sold.
Whenever a the original caller to the datastore (from step 1) is finished it commits/rollbacks the transaction, which affects all the database actions which happened during its call (using a try/catch/finally).
Once this type of transactioning is created then it becomes simple to tack on a open key at the beginning (when the transaction opens) and close the key at the end (just before the transaction ends). Making "calls" to the datastore isn't nearly as expensive as opening a connection to the database. It's really things like SQLConnection.Open() that burns resources (even if ADO.NET is pooling them for you).
If you want an example of these types of codes I would consider looking at NetTiers. It has quite an elegant solution for the transactioning that we just described (assuming you don't already have something in mind).
Just 2 cents. Good luck.
you can use ##error to see if any errors occured during the call to a sproc in SQL.
No to complicated.
You can but I prefer to use transactions in SQL Server itself.
Related
I need to persist my domain objects into two different databases. This use case is purely write-only. I don't need to read back from the databases.
Following Domain Driven Design, I typically create a repository for each aggregate root.
I see two alternatives. I can create one single repository for my AG, and implement it so that it persists the domain object into the two databases.
The second alternative is to create two repositories, one each for each database.
From a domain driven design perspective, which alternative is correct?
My requirement is that it must persist the AR in both databases - all or nothing. So if the first one goes through and the second fails, I would need to remove the AG from the first one.
If you had a transaction manager that were to span across those two databases, you would use that manager to automatically roll back all of the transactions if one of them fails. A transaction manager like that would necessarily add overhead to your writes, as it would have to ensure that all transactions succeeded, and while doing so, maintain a lock on the tables being written to.
If you consider what the transaction manager is doing, it is effectively writing to one database and ensuring that write is successful, writing to the next, and then committing those transactions. You could implement the same type of process using a two-phase commit process. Unfortunately, this can be complicated because the process of keeping two databases in sync is inherently complex.
You would use a process manager or saga to manage the process of ensuring that the databases are consistent:
Write to the first database and leave the record in a PENDING status (not visible to user reads).
Make a request to second database to write the record in a PENDING status.
Make a request to the first database to leave the record in a VALID status (visible to user reads).
Make a request to the second database to leave the record in a VALID status.
The issue with this approach is that the process can fail at any point. In this case, you would need to account for those failures. For example,
You can have a process that comes through and finds records in PENDING status that are older than X minutes and continues pushing them through the workflow.
You can can have a process that cleans up any PENDING records after X minutes and purges them from the database.
Ideally, you are using something like a queue based workflow that allows you to fire and forget these commands and a saga or process manager to identify and react to failures.
The second alternative is to create two repositories, one each for each database.
Based on the above, hopefully you can understand why this is the correct option.
If you don't need to write why don't build some sort of commands log?
The log acts as a queue, you write the operation in it, and two processes pulls new command from it and each one update a database, if you can accept that in worst case scenario the two dbs can have different version of the data, with the guarantees that eventually they will be consistent it seems to me much easier than does transactions spanning two different dbs.
I'm not sure how much DDD is your use case, as if you don't need to read back you don't have any state to manage, so no need for entities/aggregates
I am working on an ASP.NET webforms application with Entity Framework. Also for some reports it uses a dll and in that we have explicit query to get the records from SQL Server (such as ADO).
The problem is that when I change a column such as ParentID in SQL Server, I must to reset the website in IIS to see it and this solves the problem. This dependency is not logical and I want to know why this happens? Is there any relation to caching because of calling method in the dll?
How can I solve this problem?
When you run a query against SQL server (or any database, really), the result that you see is not the data "in the database", so to speak. The query returns a copy of that data that belongs only to you. The copy of the data gets sent over the network, to the client - in your case, an ASP.NET web application - and the application does whatever it needs to do, such as show it to a user.
Once the query which retrieved the data is complete, there is no longer any link between the data in the client, and the data in the database. There is no continuous, "live" connection between the two, even if your actual database connection is still open. The database connection is merely a way to send queries to the server, and for it to send copies of the data back.
It's like taking a copy of a file from a different machine. If you copy a file from my machine, and then I update my copy, your copy doesn't instantly get updated.
If you want data in some user interface to stay perfectly up to date with the data that actually exists in the database, you have a difficult problem to solve. There is no "easy" way to do this. Or perhaps more accurately, there is no simple or efficient way to do this.
This might seem odd to you. You're thinking "well, why not? Why doesn't it just show me the values as they actually exist?". The reason is that these systems need to be able to support many users - often thousands at once - who are all both reading the database and writing to it. Imagine someone was in the middle of updating data in the database, but then they rollback their transaction. Should you see the data as it was being modified, but not committed? What if two users are trying to update "the same" data at once? All sorts of concurrency questions come into play, which basically boils down to questions about locking.
What you are encountering here is a basic principle of multi-threaded environments, which translates to systems with multiple clients: Data can't be accessed directly by multiple people at the same time. Instead, you give each person their own immutable copy.
In a web application things are even more disconnected. When the browser requests the web page, the server side of the web application gets a copy of the data from the database, and then transmits that to the browser. Once the page is loaded there is no longer any link between the web server and the database server, or any link between the web server and the web browser at the client, and certainly no link between the web browser and the database.
Ultimately, this is one of the "hard problems" in computer science. You want to know how to tell the client to invalidate their "cache", and refresh their local data. There are a few mechanisms provided by .NET to do this with SQL Server, but they are quite technical. One of them is query notifications
AFAIK, Memcached does not support synchronization with database (at least SQL Server and Oracle). We are planning to use Memcached (it is free) with our OLTP database.
In some business processes we do some heavy validations which requires lot of data from database, we can not keep static copy of these data as we don't know whether the data has been modified so we fetch the data every time which slows the process down.
One possible solution could be
Write triggers on database to create/update prefixed-postfixed (table-PK1-PK2-PK3-column) files on change of records
Monitor this change of file using FileSystemWatcher and expire the key (table-PK1-PK2-PK3-column) to get updated data
Problem: There would be around 100,000 users using any combination of data for 10 hours. So we will end up having a lot of files e.g. categ1-subcateg5-subcateg-78-data100, categ1-subcateg5-subcateg-78-data250, categ2-subcateg5-subcateg-78-data100, categ1-subcateg5-subcateg-33-data100, etc.
I am expecting 5 million files at least. Now it looks a pathetic solution :(
Other possibilities are
call a web service asynchronously from the trigger passing the key
to be expired
call an exe from trigger without waiting it to finish and then this
exe would expire the key. (I have got some success with this approach on SQL Server using xp_cmdsell to call an exe, calling an exe from oracle's trigger looks a bit difficult)
Still sounds pathetic, isn't it?
Any intelligent suggestions please
It's not clear (to me) if the use of Memcached is mandatory or not. I would personally avoid it and use instead SqlDependency and OracleDependency. The two both allow to pass a db command and get notified when the data that the command would return changes.
If Memcached is mandatory you can still use this two classes to trigger the invalidation.
MS SQL Server has "Change Tracking" features that maybe be of use to you. You enable the database for change tracking and configure which tables you wish to track. SQL Server then creates change records on every update, insert, delete on a table and then lets you query for changes to records that have been made since the last time you checked. This is very useful for syncing changes and is more efficient than using triggers. It's also easier to manage than making your own tracking tables. This has been a feature since SQL Server 2005.
How to: Use SQL Server Change Tracking
Change tracking only captures the primary keys of the tables and let's you query which fields might have been modified. Then you can query the tables join on those keys to get the current data. If you want it to capture the data also you can use Change Capture, but it requires more overhead and at least SQL Server 2008 enterprise edition.
Change Data Capture
I have no experience with Oracle, but i believe it may also have some tracking functionality as well. This article might get you started:
20 Using Oracle Streams to Record Table Changes
From sqlite FAQ I've known that:
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
So, as far as I understand I can:
1) Read db from multiple threads (SELECT)
2) Read db from multiple threads (SELECT) and write from single thread (CREATE, INSERT, DELETE)
But, I read about Write-Ahead Logging that provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.
Finally, I've got completely muddled when I found it, when specified:
Here are other reasons for getting an SQLITE_LOCKED error:
Trying to CREATE or DROP a table or index while a SELECT statement is
still pending.
Trying to write to a table while a SELECT is active on that same table.
Trying to do two SELECT on the same table at the same time in a
multithread application, if sqlite is not set to do so.
fcntl(3,F_SETLK call on DB file fails. This could be caused by an NFS locking
issue, for example. One solution for this issue, is to mv the DB away,
and copy it back so that it has a new Inode value
So, I would like to clarify for myself, when I should to avoid the locks? Can I read and write at the same time from two different threads? Thanks.
For those who are working with Android API:
Locking in SQLite is done on the file level which guarantees locking
of changes from different threads and connections. Thus multiple
threads can read the database however one can only write to it.
More on locking in SQLite can be read at SQLite documentation but we are most interested in the API provided by OS Android.
Writing with two concurrent threads can be made both from a single and from multiple database connections. Since only one thread can write to the database then there are two variants:
If you write from two threads of one connection then one thread will
await on the other to finish writing.
If you write from two threads of different connections then an error
will be – all of your data will not be written to the database and
the application will be interrupted with
SQLiteDatabaseLockedException. It becomes evident that the
application should always have only one copy of
SQLiteOpenHelper(just an open connection) otherwise
SQLiteDatabaseLockedException can occur at any moment.
Different Connections At a Single SQLiteOpenHelper
Everyone is aware that SQLiteOpenHelper has 2 methods providing access to the database getReadableDatabase() and getWritableDatabase(), to read and write data respectively. However in most cases there is one real connection. Moreover it is one and the same object:
SQLiteOpenHelper.getReadableDatabase()==SQLiteOpenHelper.getWritableDatabase()
It means that there is no difference in use of the methods the data is read from. However there is another undocumented issue which is more important – inside of the class SQLiteDatabase there are own locks – the variable mLock. Locks for writing at the level of the object SQLiteDatabase and since there is only one copy of SQLiteDatabase for read and write then data read is also blocked. It is more prominently visible when writing a large volume of data in a transaction.
Let’s consider an example of such an application that should download a large volume of data (approx. 7000 lines containing BLOB) in the background on first launch and save it to the database. If the data is saved inside the transaction then saving takes approx. 45 seconds but the user can not use the application since any of the reading queries are blocked. If the data is saved in small portions then the update process is dragging out for a rather lengthy period of time (10-15 minutes) but the user can use the application without any restrictions and inconvenience. “The double edge sword” – either fast or convenient.
Google has already fixed a part of issues related to SQLiteDatabase functionality as the following methods have been added:
beginTransactionNonExclusive() – creates a transaction in the “IMMEDIATE mode”.
yieldIfContendedSafely() – temporary seizes the transaction in order to allow completion of tasks by other threads.
isDatabaseIntegrityOk() – checks for database integrity
Please read in more details in the documentation.
However for the older versions of Android this functionality is required as well.
The Solution
First locking should be turned off and allow reading the data in any situation.
SQLiteDatabase.setLockingEnabled(false);
cancels using internal query locking – on the logic level of the java class (not related to locking in terms of SQLite)
SQLiteDatabase.execSQL(“PRAGMA read_uncommitted = true;”);
Allows reading data from cache. In fact, changes the level of isolation. This parameter should be set for each connection anew. If there are a number of connections then it influences only the connection that calls for this command.
SQLiteDatabase.execSQL(“PRAGMA synchronous=OFF”);
Change the writing method to the database – without “synchronization”. When activating this option the database can be damaged if the system unexpectedly fails or power supply is off. However according to the SQLite documentation some operations are executed 50 times faster if the option is not activated.
Unfortunately not all of PRAGMA is supported in Android e.g. “PRAGMA locking_mode = NORMAL” and “PRAGMA journal_mode = OFF” and some others are not supported. At the attempt to call PRAGMA data the application fails.
In the documentation for the method setLockingEnabled it is said that this method is recommended for using only in the case if you are sure that all the work with the database is done from a single thread. We should guarantee than at a time only one transaction is held. Also instead of the default transactions (exclusive transaction) the immediate transaction should be used. In the older versions of Android (below API 11) there is no option to create the immediate transaction thru the java wrapper however SQLite supports this functionality. To initialize a transaction in the immediate mode the following SQLite query should be executed directly to the database, – for example thru the method execSQL:
SQLiteDatabase.execSQL(“begin immediate transaction”);
Since the transaction is initialized by the direct query then it should be finished the same way:
SQLiteDatabase.execSQL(“commit transaction”);
Then TransactionManager is the only thing left to be implemented which will initiate and finish transactions of the required type. The purpose of TransactionManager – is to guarantee that all of the queries for changes (insert, update, delete, DDL queries) originate from the same thread.
Hope this helps the future visitors!!!
Not specific to SQLite:
1) Write your code to gracefully handle the situation where you get a locking conflict at the application level; even if you wrote your code so that this is 'impossible'. Use transactional re-tries (ie: SQLITE_LOCKED could be one of many codes that you interpret as "try again" or "wait and try again"), and coordinate this with application-level code. If you think about it, getting a SQLITE_LOCKED is better than simply having the attempt hang because it's locked - because you can go do something else.
2) Acquire locks. But you have to be careful if you need to acquire more than one. For each transaction at the application level, acquire all of the resources (locks) you will need in a consistent (ie: alphabetical?) order to prevent deadlocks when locks get acquired in the database. Sometimes you can ignore this if the database will reliably and quickly detect the deadlocks and throw exceptions; in other systems it may just hang without detecting the deadlock - making it absolutely necessary to take the effort to acquire the locks correctly.
Besides the facts of life with locking, you should try to design the data and in-memory structures with concurrent merging and rolling back planned in from the beginning. If you can design data such that the outcome of a data race gives a good result for all orders, then you don't have to deal with locks in that case. A good example is to increment a counter without knowing its current value, rather than reading the value and submitting a new value to update. It's similar for appending to a set (ie: adding a row, such that it doesn't matter which order the row inserts happened).
A good system is supposed to transactionally move from one valid state to the next, and you can think of exceptions (even in in-memory code) as aborting an attempt to move to the next state; with the option to ignore or retry.
You're fine with multithreading. The page you link lists what you cannot do while you're looping on the results of your SELECT (i.e. your select is active/pending) in the same thread.
Are there well-known best practices for synchronizing tasks across a server farm? For example if I have a forum based website running on a server farm, and there are two moderators trying to do some action which requires writing to multiple tables in the database, and the requests of those moderators are being handled by different servers in the server farm, how can one implement some locking functionality to ensure that they can't take that action on the same item at the same time?
So far, I'm thinking about using a table in the database to sync, e.g. check the id of the item in the table if doesn't exsit insert it and proceed, otherwise return. Also probably a shared cache could be used for this but I'm not using this at the moment.
Any other way?
By the way, I'm using MySQL as my database back-end.
Your question implies data level concurrency control -- in that case, use the RDBMS's concurrency control mechanisms.
That will not help you if later you wish to control application level actions which do not necessarily map one to one to a data entity (e.g. table record access). The general solution there is a reverse-proxy server that understands application level semantics and serializes accordingly if necessary. (That will negatively impact availability.)
It probably wouldn't hurt to read up on CAP theorem, as well!
You may want to investigate a distributed locking service such as Zookeeper. It's a reimplementation of a Google service that provides very high speed distributed resource locking coordination for applications. I don't know how easy it would be to incorporate into a web app, though.
If all the state is in the (central) database then the database transactions should take care of that for you.
See http://en.wikipedia.org/wiki/Transaction_(database)
It may be irrelevant for you because the question is old, but it still may be useful for others so i'll post it anyway.
You can use a "SELECT FOR UPDATE" db query on a locking object, so you actually use the db for achieving the lock mechanism.
if you use ORM, you can also do that. for example, in nhibernate you can do:
session.Lock(Member, LockMode.Upgrade);
Having a table of locks is a OK way to do it is simple and works.
You could also have the code as a Service on a Single Server, more of a SOA approach.
You could also use the the TimeStamp field with Transactions, if the timestamp has changed since you last got the data you can revert the transaction. So if someone gets in first they have priority.