Is it a best practice to use a stored procedure for every single SQL call in .NET applications?
Is it encouraged for performance reasons and to reduce surface area for SQL injection attacks (in web applications)?
Stored procedures have a few advantages over parameterized queries:
When used exclusively, you can revoke direct CREATE, INSERT, SELECT, UPDATE, ALTER, DROP, DELETE, etc. permissions from your application accounts and grant only EXECUTE on the procedures, which adds a small amount of security.
They provide a consistent, manageable interface when you have multiple applications using the same database.
Using procedures allows a DBA to manage and tune queries even after an application is deployed.
Deploying small changes and bug fixes is much simpler.
They also have a few disadvantages:
The number of procedures can quickly grow to the point where maintaining them is difficult, and current tools don't provide a simple method for adequate documentation.
Parameterized queries put the database code next to the place where it's used. Stored procedures keep it far separated, making finding related code more difficult.
Stored procedures are harder to version.
You'll need to weigh those costs/benefits for your system.
No.
If you send your queries to SQL Server as parameterized queries, SQL Server will cache the execution plan, AND the parameter values travel separately from the SQL text, so they are always treated as data rather than as code. That is what protects you from SQL injection attacks.
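To make the difference concrete, here is a minimal sketch of concatenated versus parameterized SQL. It uses Python's built-in sqlite3 module instead of SQL Server and ADO.NET, purely so that it is self-contained; the principle is the same. The `users` table and both function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # String concatenation: the input becomes part of the SQL text itself.
    sql = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user(name):
    # Placeholder: the input is bound as a value and is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(malicious)  # the OR clause matches every row
safe_rows = find_user(malicious)           # no user has that literal name
```

A stored procedure called with parameters gets the same protection, as long as the procedure itself does not build dynamic SQL from its arguments.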
I prefer stored procs over inline SQL, because this way the SQL lives in one consolidated place. However, I'd rather use a tool like NHibernate, which generates the SQL for me, so there is no SQL to worry about!
There is one more advantage - when it comes to tuning, especially per customer, it can be easily done with SP (by adding hints or even rewriting the code). With embedded SQL it is practically impossible.
It's just one way of doing things. Upsides include keeping all your SQL code in one place, procs being verified for syntax at creation time, and being able to set permissions on procs, which usually represent some kind of "action" and are well suited to a conceptual security model.
Downsides include massive numbers of procs for any medium or larger application, and all the housekeeping that comes with that.
My employer's product uses procs for everything, and I must say with the right practices in place it's quite bearable.
I'm about to begin designing the architecture of a personal project that has the following characteristics:
Essentially a "game" containing several concurrent users based on a sport.
Matches in this sport are simulated on a regular basis and their results stored in a database.
Users can view the details of a simulated match "live" when it is occurring as well as see results after they have occurred.
I developed a similar web application with a much smaller scope as the previous iteration of this project. In that case, however, I chose to go with SQLite as my DB provider since I also had a redistributable desktop application that could be used to manually simulate matches (and in fact that ran as a standalone simulator outside of the web application). My constraints have now shifted to be only a web application, so I don't have to worry about this additional level of complexity.
My main problem with my previous implementation was handling concurrent requests. I made the mistake of using one database (which was represented by a single file on disk) to power both the simulation aspect (which ran in a separate process on the server) and the web application. Hence, when users were accessing the website concurrently with a live simulation happening, there were all sorts of database access issues since it was getting locked by one process. I fixed this by implementing a cross-process mutex on database operations but this drastically slowed down the performance of the website.
The tools I will be using are:
ASP.NET for the web application.
SQL Server 2008 R2 for the database... probably with an NHibernate layer for object relational mapping.
My question is, how do I design this so I will achieve optimal efficiency as well as concurrent access? Obviously shifting to an actual DB server from a file will have its positives, but do I need to have two redundant servers--one for the simulation process and one for the web server process?
Any suggestions would be appreciated!
Thanks.
You should be fine doing both on the same database. Concurrent access is what modern database engines are designed for. Concurrent reads are usually no problem at all; concurrent writes lock the minimum possible amount of data (a table, or even just a number of rows), not the entire database.
A few things you should keep in mind though:
Use transactions wisely. On the one hand, a transaction is an important tool in making sure your database is always consistent - in short, a transaction either happens completely, or not at all. On the other hand, two concurrent transactions can cause deadlocks, and those buggers can be extremely hard to debug.
Normalize, and use constraints to protect your data integrity. Enforcing foreign keys can save the day, even though it often leads to more cumbersome administration.
Minimize the amount of time spent on data access: don't keep connections around when you don't need them, make absolutely sure you're not leaking any connections, don't fetch data you know you don't need, and do as much data-related processing as possible (especially things that can be solved using joins, subqueries, groupings, views, etc.) in SQL instead of in code.
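As a rough illustration of the first two points (a transaction happens completely or not at all, and a foreign key stops bad data getting in), here is a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server so that it runs on its own; the `teams` and `matches` tables and the `record_result` function are hypothetical names for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute(
    "CREATE TABLE teams (id INTEGER PRIMARY KEY, points INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    """CREATE TABLE matches (
           id INTEGER PRIMARY KEY,
           winner_id INTEGER NOT NULL REFERENCES teams(id),
           loser_id INTEGER NOT NULL REFERENCES teams(id))"""
)
conn.executemany("INSERT INTO teams (id) VALUES (?)", [(1,), (2,)])
conn.commit()


def record_result(winner_id, loser_id):
    """Award points and store the match as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE teams SET points = points + 3 WHERE id = ?", (winner_id,)
            )
            conn.execute(
                "INSERT INTO matches (winner_id, loser_id) VALUES (?, ?)",
                (winner_id, loser_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The foreign key rejected the insert, so the points update is undone too.
        return False


def points(team_id):
    return conn.execute(
        "SELECT points FROM teams WHERE id = ?", (team_id,)
    ).fetchone()[0]
```

Calling `record_result(1, 99)` with a team that does not exist returns False and leaves team 1's points untouched, because the earlier UPDATE is rolled back together with the failed INSERT.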
Python --> SQLite --> ASP.NET C#
I am looking for an in memory database application that does not have to write the data it receives to disc. Basically, I'll be having a Python server which receives gaming UDP data and translates the data and stores it in the memory database engine.
I want to stay away from writing to disc as it takes too long. The data is not important, if something goes wrong, it simply flushes and fills up with the next wave of data sent by players.
Next, another ASP.NET server must be able to connect to this in memory database via TCP/IP at regular intervals, say once every second, or 10 seconds. It has to pull this data, and this will in turn update on a website that displays "live" game data.
I'm looking at SQLite and wondering: is this the right tool for the job? Does anyone have any suggestions?
Thanks!!!
This sounds like a premature optimization (apologies if you've already done the profiling). What I would suggest is to go ahead and write the system in the simplest, cleanest way, but put a bit of abstraction around the database bits so they can easily be swapped out. Then profile it and find your bottleneck.
If it turns out it is the database, optimize the database in the usual way (indexes, query optimizations, etc.). If it's still too slow, most databases support an in-memory table format. Or you can create a RAM disk and put individual tables or the whole database on it.
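The "bit of abstraction" could look something like the sketch below: the packet-handling code only talks to `put` and `get`, so the backend can be swapped after profiling. Everything here (`DictStore`, `SqliteStore`, `handle_packet`, the `scores` table) is a made-up name for illustration, and the SQLite backend uses the special `:memory:` database so nothing is written to disk.

```python
import sqlite3


class DictStore:
    """Plain in-process dictionary; nothing touches the disk."""

    def __init__(self):
        self._data = {}

    def put(self, player, score):
        self._data[player] = score

    def get(self, player):
        return self._data.get(player)


class SqliteStore:
    """SQLite database held entirely in memory via ":memory:"."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE scores (player TEXT PRIMARY KEY, score INTEGER)"
        )

    def put(self, player, score):
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO scores (player, score) VALUES (?, ?)",
                (player, score),
            )

    def get(self, player):
        row = self._conn.execute(
            "SELECT score FROM scores WHERE player = ?", (player,)
        ).fetchone()
        return row[0] if row else None


def handle_packet(store, player, score):
    # The caller does not know or care which backend it was given.
    store.put(player, score)
```

Note that both backends live inside the Python process, so a separate ASP.NET server could not read them directly; a networked store would need to implement the same two methods.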
Totally not my field, but I think Redis is along these lines.
Whether SQLite is applicable depends on your data complexity.
If you need to perform complex queries on relational data, then it might be a viable option. If your data is flat (i.e. not relational) and processed as a whole, then some python-internal data structures might be applicable.
Perhaps AppFabric would work for you?
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
SQLite doesn't allow remote "connections" as far as I know, it only supports being invoked as an in-process library. However, you could try to use MySQL which, while heavier, supports remote connections and does have in-memory tables.
See http://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.html
So I have a challenge to build a site that people online can use to interact with organizations: an ASP.NET MVC customer application.
One of the requirements is financial processing and accounting.
I'm very comfortable using SQL Transactions and stored procedures to do this; i.e. CreateCustomer also creates an entity, and an account record. We have a stored procedure to do this, that does a begin transaction, creates some setup records we need, then does a commit. I'm not seeing a good way to do this with an ORM, and after reading some great blog articles I'm starting to wonder if I'm going down the wrong path.
Part of the complexity here is the data itself:
I'm querying x databases (one per existing customer) to get some of my data, though my app has its own data store as well. I need to query the x databases, run stored procedures on the x databases, and also to my own datastore.
I'm not seeing strong support for things like stored procedures and thereby transactions, though it does seem to be present.
Maybe I'm just trying to make my app a nail here, cause the MVC hammer is sooo shiny. I'm plenty comfortable with raw ADO.NET of course, but I'm in love with the expressive feel to writing Linq code in C# and I'd rather not give up on it.
Down to the question:
Is this a bad idea? Should I try to use Linq / Entity Framework, or something like nHibernate... and stick with the ORM pattern or should I trash it and use raw ADO.NET data access?
Edit: a note on scale; from a queries per second standpoint this app is not "huge". But, from a data complexity perspective, it does need to query against 50+ databases (all identical, or close to it) to read data from an external application and publish data back to that application. ORM feels right when dealing with "my" data store, but feels very wrong for accessing the data from the external application.
From a certain size (number of databases) up, you have to change the paradigm. Are you at that size?
When you deploy what ultimately is a distributed application and yet try to control it as an ordinary local application, you are going to run into a set of fundamental issues around availability, scalability and correctness. If you use concepts like 'distributed transactions', 'linked servers' and 'ORM', you are down the wrong path. True distributed applications will use terms like 'message', 'queue' and 'service'. Terms like Linq, EF, nHibernate are all fine and good, but none will bring you anything extra over what a simple Transact-SQL SELECT statement brings. In other words, if a SELECT solves your issues, then the various client-side ORMs will work. If not, they won't add any miraculous value.
I recommend you go over the slides on the SQLCAT: High Performance Distributed Applications in Real World Deployments which explain how a site like MySpace manages to read and write into a store of nearly 500 servers and thousands of databases.
Ultimately what you need to internalize is this: one database can have 95% availability (uptime and acceptable service response time). A system consisting of 10 databases, each with 95% availability, has roughly 60% availability if it needs all of them to be up. A system of 100 databases each with 99.5% availability also has about 60% availability, and so does a system of 1000 databases each with 99.95% availability (5 min downtime per week). And this is for an ideal situation. In reality there is always a snowball effect caused by resource consumption (e.g. threads blocked on trying to access an unavailable or slow resource) that makes things far worse.
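The arithmetic behind those figures is just the per-database availability raised to the number of databases, assuming failures are independent and the system needs every database to be up. A quick sketch to check the numbers (the function name is made up):

```python
def system_availability(per_database, count):
    """Probability that all `count` independent databases are up at once."""
    return per_database ** count


cases = [(0.95, 10), (0.995, 100), (0.9995, 1000)]
for availability, count in cases:
    combined = system_availability(availability, count)
    print(f"{count} databases at {availability:.2%} each: {combined:.1%}")
```

All three cases land at roughly 60%, which is the point of the paragraph above: adding a nine to each database only buys back what the extra databases take away.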
This means that one cannot write a large distributed system relying on synchronous, tightly coupled operations and transactions. It is simply impossible. You always rely on asynchronous operations (usually messaging and queues), which is something completely different from your run-of-the-mill database application.
Use the TransactionScope class available in System.Transactions.
What I have chosen is to use Entity Framework to allow access to the application's main data store, and create a custom DAL for access to external application data and for access to stored procedures within the application.
Here's hoping Entity Framework 4.0 fixes the issue. For now, I'm using the concept listed here.
http://social.msdn.microsoft.com/forums/en-US/adodotnetentityframework/thread/44a0a7c2-7c1b-43bc-98e0-4d072b94b2ab/
We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
It may be best to store a timestamp or datetime alongside the DB data, recording when it was last updated. This way, the check for "is my cache still fresh" is typically very lightweight: just fetch that "latest update" value and compare it with the one you saw when you last rebuilt the local cache. Kind of like an HTTP "if modified since" caching strategy, except you'd be implementing most of it DB-client-side ;-).
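A rough sketch of that idea, combining the "check every N hours" interval with the "latest update" comparison. It is written in Python with sqlite3 standing in for the real database; the `table_versions` table, the `TaxRateCache` class and its parameters are all assumptions made for the example, not an existing API.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tax_rates (region TEXT PRIMARY KEY, rate REAL)")
conn.execute(
    "CREATE TABLE table_versions (name TEXT PRIMARY KEY, last_updated INTEGER)"
)
conn.execute("INSERT INTO tax_rates VALUES ('north', 0.20)")
conn.execute("INSERT INTO table_versions VALUES ('tax_rates', 1)")
conn.commit()


class TaxRateCache:
    def __init__(self, conn, check_every_seconds=3600, clock=time.monotonic):
        self._conn = conn
        self._check_every = check_every_seconds
        self._clock = clock
        self._next_check = None  # None means "never checked yet"
        self._stamp = None
        self._rates = {}
        self.reloads = 0  # exposed only so the behaviour can be observed

    def _refresh_if_stale(self):
        # Cheap query: one small row says whether the big table changed.
        stamp = self._conn.execute(
            "SELECT last_updated FROM table_versions WHERE name = 'tax_rates'"
        ).fetchone()[0]
        if stamp != self._stamp:
            self._rates = dict(
                self._conn.execute("SELECT region, rate FROM tax_rates")
            )
            self._stamp = stamp
            self.reloads += 1

    def get(self, region):
        now = self._clock()
        if self._next_check is None or now >= self._next_check:
            self._refresh_if_stale()
            self._next_check = now + self._check_every
        return self._rates.get(region)  # served from memory between checks
```

Between checks, `get` never touches the database, and when a check does happen the full table is only reloaded if the stored "latest update" value has actually changed.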
If you decide to cache the data (vs. make a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides functionality for expiry and handles multiple concurrent requests, and it can even invalidate the cache automatically using the query notification features of SQL Server 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
With caching, a dbms is plenty efficient with static data anyway, especially only 5M of it.
True, but the point here is to avoid the database roundtrip at all.
ASP.NET Cache is the right tool for this job.
You didn't state how you will find the matching data for a user. If it is as simple as looking up a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging or, worse, searching, then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and Linq makes things easy too, but it is still not SQL.
(try to group by 2 columns)
Sometimes a good approach is to have the db return only the keys of a result set and use the cache to fill in the complete rows.
Think: Premature Optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be leaving an "unusual design pattern".
With even default caching, a dbms is plenty efficient with static data anyway, especially only 5M of it. And the dbms partitioning you're describing is often described as an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.
Are there well-known best practices for synchronizing tasks across a server farm? For example if I have a forum based website running on a server farm, and there are two moderators trying to do some action which requires writing to multiple tables in the database, and the requests of those moderators are being handled by different servers in the server farm, how can one implement some locking functionality to ensure that they can't take that action on the same item at the same time?
So far, I'm thinking about using a table in the database to sync, e.g. check for the id of the item in the table; if it doesn't exist, insert it and proceed, otherwise return. A shared cache could probably also be used for this, but I'm not using one at the moment.
Any other way?
By the way, I'm using MySQL as my database back-end.
Your question implies data level concurrency control -- in that case, use the RDBMS's concurrency control mechanisms.
That will not help you if later you wish to control application level actions which do not necessarily map one to one to a data entity (e.g. table record access). The general solution there is a reverse-proxy server that understands application level semantics and serializes accordingly if necessary. (That will negatively impact availability.)
It probably wouldn't hurt to read up on CAP theorem, as well!
You may want to investigate a distributed locking service such as ZooKeeper. It's a reimplementation of a Google service that provides very high speed distributed resource locking coordination for applications. I don't know how easy it would be to incorporate into a web app, though.
If all the state is in the (central) database then the database transactions should take care of that for you.
See http://en.wikipedia.org/wiki/Transaction_(database)
It may be irrelevant for you because the question is old, but it may still be useful for others, so I'll post it anyway.
You can use a "SELECT FOR UPDATE" db query on a locking object, so you actually use the db for achieving the lock mechanism.
If you use an ORM, you can also do that. For example, in NHibernate you can do:
session.Lock(Member, LockMode.Upgrade);
Having a table of locks is an OK way to do it; it is simple and it works.
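A minimal sketch of the lock-table idea, relying on a primary key so that the "check, then insert" step from the question becomes a single atomic INSERT. It uses Python's sqlite3 instead of MySQL so it can run standalone; `item_locks`, `try_lock` and `unlock` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_locks (item_id INTEGER PRIMARY KEY, owner TEXT NOT NULL)"
)


def try_lock(item_id, owner):
    """Return True if this caller now holds the lock on item_id."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO item_locks (item_id, owner) VALUES (?, ?)",
                (item_id, owner),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: someone else holds the lock.
        return False


def unlock(item_id, owner):
    """Release the lock, but only if this caller is the one holding it."""
    with conn:
        conn.execute(
            "DELETE FROM item_locks WHERE item_id = ? AND owner = ?",
            (item_id, owner),
        )
```

One thing to plan for with this approach: if a server dies while holding a lock, the row stays behind, so in practice you would probably want some kind of expiry or cleanup as well.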
You could also have the code run as a service on a single server, which is more of an SOA approach.
You could also use a TimeStamp field with transactions: if the timestamp has changed since you last read the data, you roll back the transaction. So whoever gets in first has priority.
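The timestamp approach is usually called optimistic concurrency. Here is a small sketch of it, using an integer `version` column in place of a TimeStamp field and Python's sqlite3 in place of MySQL; the `posts` table and the function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, status TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO posts VALUES (1, 'open', 1)")
conn.commit()


def read_post(post_id):
    return conn.execute(
        "SELECT status, version FROM posts WHERE id = ?", (post_id,)
    ).fetchone()


def update_status(post_id, new_status, seen_version):
    """Apply the change only if nobody modified the row since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE posts SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_status, post_id, seen_version),
        )
    # Zero rows updated means another writer got there first.
    return cur.rowcount == 1
```

If two moderators both read version 1, the first `update_status` call succeeds and bumps the version, and the second call returns False, at which point the application can reload the item and decide what to do.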