I am using SignalR with my ASP.NET application. My application needs to persist group data that is updated from various servers. According to the SignalR documentation, this is my responsibility. That means I need an external server/service that collects the data from one or more servers, so that I can query it from a single place.
I first thought that Memcached was the best candidate, because it's fast and the data I need to put there is volatile. The problem is that I need to store collections, for example collection A with user IDs: collection A might hold 2,000 user IDs and collection B 40,000. I also need to update these collections, removing and inserting IDs, very quickly. I'm afraid that because the commands will be initiated from several servers, and because I might need to read an entire collection and update it on any of the web servers, the data won't be consistent. Web server A might update the data, but server B could read it before server A has finished updating it. That is a concurrency conflict.
I'm searching for the best way to implement this kind of strategy in my ASP.NET 4.5 application. An in-memory database might be one option, or something else that ensures data integrity.
I want to ask you what the best solution for my problem is.
Here's an example of my problem:
Memcached server - stores the collections (e.g. Collection A, B, C, D); each collection stores user IDs, and can hold thousands of IDs or far more.
Web servers - my Amazon EC2 web servers with SignalR installed, possibly behind a load balancer. These servers need to access the Memcached server and fetch a collection's complete set of items by collection name (e.g. "Collection_23"). They also need to be able to remove items (user IDs) and add items. All of this should be as fast as possible.
I hope that I explained myself right. Thanks.
Alternatively, you can use Redis; like Memcached, everything is served from memory. Redis has many other capabilities beyond a simple key-value datastore; for your specific case you might use Redis transactions, which ensure data consistency.
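Since the data is exactly "collections of user IDs", Redis sets map onto it naturally, and each set command is atomic on the server. Here is a minimal sketch, assuming the StackExchange.Redis client and a local Redis instance (neither is mentioned above); for multi-step updates you could wrap the calls in a Redis transaction via db.CreateTransaction() (MULTI/EXEC):

    // Sketch only: Redis sets holding user IDs per collection.
    // Assumes the StackExchange.Redis client and a Redis server on localhost.
    using StackExchange.Redis;

    public class CollectionStore
    {
        private static readonly ConnectionMultiplexer Redis =
            ConnectionMultiplexer.Connect("localhost:6379");

        public void AddUser(string collectionName, long userId)
        {
            IDatabase db = Redis.GetDatabase();
            db.SetAdd(collectionName, userId);        // SADD - atomic on the server
        }

        public void RemoveUser(string collectionName, long userId)
        {
            IDatabase db = Redis.GetDatabase();
            db.SetRemove(collectionName, userId);     // SREM - atomic on the server
        }

        public RedisValue[] GetCollection(string collectionName)
        {
            IDatabase db = Redis.GetDatabase();
            return db.SetMembers(collectionName);     // SMEMBERS - the whole collection in one call
        }
    }

Because every web server talks to the same Redis instance and each command executes atomically there, the Server A / Server B race described above goes away for single add/remove operations.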
A comment on another post links to a Redis provider. That link is broken; it seems the provider is now integrated into the main SignalR project: https://github.com/SignalR/SignalR/tree/master/src/Microsoft.AspNet.SignalR.Redis
The Redis NuGet package is here:
http://www.nuget.org/packages/Microsoft.AspNet.SignalR.Redis
and documentation here:
http://www.asp.net/signalr/overview/signalr-20/performance-and-scaling/scaleout-with-redis
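The scaleout documentation above essentially comes down to registering the Redis backplane at startup. Roughly, assuming SignalR 2.x with the Microsoft.AspNet.SignalR.Redis package (the server name, port, password and event key below are placeholders):

    // Sketch of the Redis backplane registration described in the scaleout docs.
    // The server name, port, password and event key are placeholders.
    using Microsoft.AspNet.SignalR;
    using Owin;

    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            GlobalHost.DependencyResolver.UseRedis("redis.example.com", 6379, "password", "MyApp");
            app.MapSignalR();
        }
    }

Note that the backplane only forwards messages between servers; persisting your own group data, as asked in the question, still needs a store of your own such as the sets sketched above.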
Related
For client security and privacy reasons, we want to deploy a unique database for each client while using the same website.
I envision that during the session_start event, we would determine which database to use for them (by looking at the subdomain they come in on) and set the connection string in a session variable. Then on every page_init, we'd dynamically set any object's connection string. In code behind, we'd do the same thing with the connection string.
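A minimal sketch of that idea, assuming Web Forms and Global.asax; the subdomain-to-connection-string lookup (ConnectionStringMap) is hypothetical:

    // Global.asax.cs - sketch only; ConnectionStringMap is a hypothetical lookup,
    // e.g. a dictionary loaded from config or from a common database.
    using System;
    using System.Web;

    public class Global : HttpApplication
    {
        protected void Session_Start(object sender, EventArgs e)
        {
            // e.g. "clienta" from "clienta.example.com"
            string subdomain = Request.Url.Host.Split('.')[0].ToLowerInvariant();

            Session["ConnectionString"] = ConnectionStringMap.ForSubdomain(subdomain);
        }
    }

A base Page class could then read Session["ConnectionString"] in Page_Init and push it into whatever data access objects need it, which is the Page_Init step described above.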
Is there a better approach to doing this and will setting the connection string in page_init work? Is using a session variable wise? I've tended not to ever use them except when no other solution was possible.
The problem with that model is that it is really complex and can leave you with errors, especially when we are talking about changes to the database. Imagine that you need to add an extra field to the interface: if you have 100 clients, that means updating 100 different databases. When we talk about dealing with downtime, things get even worse.
I would approach this slightly differently: abstract your database layer and create one API that calls the database. From the website you always call the API, passing the domain you want the data to come from.
You might ask what advantage this gives you. The biggest one shows up when doing upgrades and maintenance: having one API per client is a lot easier to reason about than one database per client. And if you really want just one API (I would still recommend one per client, deployed automatically), you can put a switch in the call: based on parameters you pass to the API (for example the subdomain in a header), you choose which database to connect to.
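A minimal sketch of that switch, assuming ASP.NET Web API; the header name and the per-client connection-string lookup are illustrative, not from the post:

    // Sketch only: pick the client database from a custom header on each API call.
    // "X-Client-Subdomain" and ConnectionStrings.ForClient are illustrative.
    using System.Data.SqlClient;
    using System.Linq;
    using System.Web.Http;

    public class CustomersController : ApiController
    {
        [HttpGet]
        public IHttpActionResult GetCustomerName(int id)
        {
            string subdomain = Request.Headers.GetValues("X-Client-Subdomain").First();
            string connectionString = ConnectionStrings.ForClient(subdomain);  // hypothetical lookup

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT Name FROM Customers WHERE Id = @id", connection))
            {
                command.Parameters.AddWithValue("@id", id);
                connection.Open();
                return Ok((string)command.ExecuteScalar());
            }
        }
    }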
Let me give you a sample scenario and how I would suggest approaching it (this holds whether you go through the database or the API).
Say I want to include a new data field. The first thing is to add this field on the backend (API or database) and deploy it. If it is an API, you can even test it by calling the API and seeing that the new field is now returned; that is not a problem for your UI, because it is simply a field the UI does not use yet. After that, you change the UI to actually use the field and deploy that to production.
I have an ASP.NET application that I'm moving to Azure. In the application, there's a query that joins 9 tables to produce a user record. Each record is then serialized to JSON and sent back and forth with the client. To increase query performance, the first time the 9-join query runs and the record is serialized to JSON, the resulting string is saved to a table called JsonUserCache. The table only has 2 columns: JsonUserRecordID (which is unique) and JsonRecord. Each time a user record is requested from the client, the JsonUserCache table is queried first, to avoid having to run the query with the 9 joins. When the user logs off, the records he created in JsonUserCache are deleted.
The JsonUserCache table lives in SQL Server. I could simply leave everything as is, but I'm wondering if there's a better way. I'm thinking about creating a simple dictionary that stores the key/values and putting that dictionary in AppFabric. I'm also considering a NoSQL provider, and wondering whether there's an option for Azure or whether I should just stick to a dictionary in AppFabric. Or is there another alternative?
Thanks for your suggestions.
"There are only two hard problems in Computer Science: cache invalidation and naming things."
Phil Karlton
You are clearly talking about a cache and as a general principle, you should not persist any cached data (in SQL or anywhere else) as you have the problem of expiring the cache and having to do the deletes (as you currently are). If you insist on storing your result somewhere and don't mind the clearing up afterwards, then look at putting it in an Azure blob - this is easily accessible from the browser and doesn't require that the request be handled by your own application.
To implement it as a traditional cache, look at these options.
Use out-of-the-box ASP.NET caching, where you cache in memory on the web role. This means that your join will be re-run on every instance the user lands on, but depending on the number of instances and the duration of the average session, it may be the simplest to implement (see the sketch after the edit below).
Use AppFabric Cache. This is an extra API to learn and has additional costs which may get quite high if you have lots of unique visitors.
Use a specialised distributed cache such as Memcached. This has the added cost/hassle of having to run it all yourself, but gives you lots of flexibility in the long run.
Edit: All are RAM based. Using ASP.NET caching is simpler to implement and is faster to retrieve the data from cache because it is on the same machine - BUT requires the cache to be populated for each instance of the web role (i.e. it is not distributed). AppFabric caching is distributed but is also a bit slower (network latency) and, depending what you mean by scalable, AppFabric caching currently behaves a bit erratically at scale - so make sure you run tests. If you want scalable, feature rich distributed caching, and it is a big part of your application, go and put in Memcached.
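A minimal sketch of the first option (plain in-role ASP.NET caching), assuming System.Runtime.Caching; BuildUserRecordJson stands in for the 9-join query plus serialization:

    // Sketch only: per-instance in-memory cache for the serialized user record.
    // BuildUserRecordJson (the 9-join query + JSON serialization) is a placeholder.
    using System;
    using System.Runtime.Caching;

    public static class JsonUserCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static string GetUserRecord(int userId)
        {
            string key = "JsonUser_" + userId;
            var cached = Cache.Get(key) as string;
            if (cached != null)
                return cached;

            string json = BuildUserRecordJson(userId);
            Cache.Set(key, json, DateTimeOffset.UtcNow.AddMinutes(20));
            return json;
        }

        public static void RemoveUserRecord(int userId)
        {
            Cache.Remove("JsonUser_" + userId);       // mirror of the current delete-on-logoff
        }

        private static string BuildUserRecordJson(int userId)
        {
            // Placeholder for the real 9-join query and serialization.
            throw new NotImplementedException();
        }
    }

The time-based expiry replaces the explicit delete-on-logoff, though RemoveUserRecord keeps that option open.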
I am building an ASP.NET web application that will use SQL Server for data storage. I am inheriting an existing structure and I am not able to modify it very much. The people who use this application are individual companies who have paid to use the application. Each company has about 5 or 10 people who will use the application. There are about 1000 companies. The way that the system is currently structured, every company has their own unique database in the SQL Server instance. The structure of each database is the same. I don't think that this is a good database design but there is nothing I can do about it. There are other applications that hit this database and it would be quite an undertaking to rewrite the DB interfaces for all of those apps.
So my question is how to design the architecture for the new web app. There are times of the month when the site gets a lot of traffic. My feeling is that the site will not perform well at those times, because when 500 people from different companies access the site simultaneously, each will have their own unique database connection, since they are accessing different SQL Server databases with different connection strings; SQL Server will not benefit from any connection pooling. My impression is that this is bad.
What happens if they double their number of customers? How many unique database connections can SQL Server handle? Is this a situation where I should tell the client that they must redesign this if they want to remain scalable?
Thanks,
Corey
You don't have to create separate connections for every DB.
I have an app that uses multiple DBs on the same server. I prefix each query with a "USE dbName; "
I've even run queries on two separate DB's in the same call.
As for calling stored procs, it's a slightly different process. Since you can't do
Use myDB; spBlahBLah
Instead you have to explicitly change the DB on the connection object. In .NET it looks something like this:
myConnection.ChangeDatabase("otherDBName");
then call your stored procedure.
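A rough sketch of both techniques with System.Data.SqlClient; the database, table and procedure names are placeholders:

    // Sketch only: one connection to the server, switching databases per query.
    using System.Data;
    using System.Data.SqlClient;

    class MultiDbQueries
    {
        static void Run(string serverConnectionString)
        {
            using (var connection = new SqlConnection(serverConnectionString))
            {
                connection.Open();

                // Ad-hoc SQL: prefix the query with USE to pick the client database.
                using (var cmd = new SqlCommand("USE ClientA_Db; SELECT COUNT(*) FROM Orders;", connection))
                {
                    int orderCount = (int)cmd.ExecuteScalar();
                }

                // Stored procedures: switch the database on the connection object instead.
                connection.ChangeDatabase("ClientB_Db");
                using (var proc = new SqlCommand("spBlahBlah", connection))
                {
                    proc.CommandType = CommandType.StoredProcedure;
                    proc.ExecuteNonQuery();
                }
            }
        }
    }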
Hopefully, you have a single database for common items. Here, I hope you have a Clients table with IsEnabled, Logo, PersonToCallWhenTheyDontPayBills, etc. Add a column for Database (i.e. catalog) and, while you're at it, Server. Your web application will point to the common database when starting up and build the list of database connections per client. Programmatically build your database connection strings with the Server and Database columns in the table.
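A sketch of building those connection strings from the Clients table, assuming System.Data.SqlClient; the column names and the authentication choice are placeholders:

    // Sketch only: build per-client connection strings from the common Clients table.
    using System.Collections.Generic;
    using System.Data.SqlClient;

    static class ClientConnections
    {
        public static Dictionary<int, string> Load(string commonDbConnectionString)
        {
            var result = new Dictionary<int, string>();

            using (var connection = new SqlConnection(commonDbConnectionString))
            using (var command = new SqlCommand(
                "SELECT ClientId, [Server], [Database] FROM Clients WHERE IsEnabled = 1", connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        var builder = new SqlConnectionStringBuilder
                        {
                            DataSource = reader.GetString(1),      // Server column
                            InitialCatalog = reader.GetString(2),  // Database (catalog) column
                            IntegratedSecurity = true              // or set UserID/Password instead
                        };
                        result[reader.GetInt32(0)] = builder.ConnectionString;
                    }
                }
            }
            return result;
        }
    }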
UPDATE:
After my discussion with @Neil, I want to point out that my method assumes a singleton database connection. If you don't do this, then it would be silly to follow my advice.
Scaling is a complex issue. However, why are you not scaling the web aspect as well? Then the connection pooling is limited to the web application.
edit:
I'm talking about the general case here. I know that pooling occurs at many levels, not just the IDbConnection (http://stackoverflow.com/questions/3526617/are-ado-net-2-0-connection-pools-pre-application-domain-or-per-process). I was wondering whether the questioner had considered scaling at the web application level.
We are thinking to make some architectural changes in our application, which might affect the technologies we'll be using as a result of those changes.
The change that I'm referring in this post is like this:
We've found out that some parts of our application have common data and common services, so we extracted those into a GlobalServices service, with its own master data db.
Now, this service will probably have its own cache, so that it won't have to retrieve data from the db on each call.
So, when one client makes a call to that service that updates data, other clients might be interested in that change, or not. Now that depends on whether we decide to keep a cache on the clients too.
Meaning that if the clients will have their own local cache, they will have to be notified somehow (and first register for notifications). If not, they will always get the data from the GlobalServices service.
I need your educated advice here guys:
1) Is it a good idea to keep a local cache on the clients to begin with?
2) If we do decide to keep a local cache on the clients, would you use SqlCacheDependency to notify the clients, or would you use WCF for notifications? (Each might have its pros and cons.)
Thanks a lot folks,
Avi
I like the sound of your SqlCacheDependency, but I will answer this from a different perspective, as I have worked with a team on a similar scenario. We created a master database and used triggers to create XML representations of data that was being changed in the master, storing them in a TransactionQueue table with a bit of metadata about what changed, when, and who changed it. The client databases would periodically check the queue for items they were interested in, process the XML, and update their own tables as necessary.
We also did the same in reverse for the client to update the master. We set up triggers and a TransactionQueue table on the client databases to send data back to the master. This in turn would update all of the other client databases when they next poll.
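A rough sketch of the polling side of that approach, assuming a TransactionQueue table like the one described; the table, the column names and ApplyChange are illustrative:

    // Sketch only: a client periodically polls the master's TransactionQueue
    // and applies the changes it is interested in to its own tables.
    using System;
    using System.Data.SqlClient;

    class TransactionQueuePoller
    {
        static long Poll(string masterConnectionString, long lastProcessedId)
        {
            using (var connection = new SqlConnection(masterConnectionString))
            using (var command = new SqlCommand(
                "SELECT Id, EntityName, ChangeXml FROM TransactionQueue WHERE Id > @lastId ORDER BY Id",
                connection))
            {
                command.Parameters.AddWithValue("@lastId", lastProcessedId);
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        ApplyChange(reader.GetString(1), reader.GetString(2));  // illustrative
                        lastProcessedId = reader.GetInt64(0);
                    }
                }
            }
            return lastProcessedId;
        }

        static void ApplyChange(string entityName, string changeXml)
        {
            // In the real system this parses the XML and updates the client's own tables.
            Console.WriteLine("Applying change to " + entityName);
        }
    }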
The nice thing about this is that it is fairly agnostic on client platform, and client data structure, so we were able to use the method on a range of legacy and third party systems. The other great point here is that you can take any of the databases out of the loop (including the master - e.g. connection failure) and the others will still work fine. This worked well for us as our master database was behind our corporate firewall, and the simpler web databases were sitting with our ISP.
There are obviously cons to this approach, like race hazard, so we were careful with the order of transaction processing, error handling, de-duping etc. We also built a management GUI to provide a human interaction layer before important data was changed in the master.
Good luck! Tim
Are there well-known best practices for synchronizing tasks across a server farm? For example if I have a forum based website running on a server farm, and there are two moderators trying to do some action which requires writing to multiple tables in the database, and the requests of those moderators are being handled by different servers in the server farm, how can one implement some locking functionality to ensure that they can't take that action on the same item at the same time?
So far, I'm thinking about using a table in the database to sync: e.g. check the ID of the item in the table; if it doesn't exist, insert it and proceed, otherwise return. A shared cache could probably also be used for this, but I'm not using one at the moment.
Any other way?
By the way, I'm using MySQL as my database back-end.
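A minimal sketch of the lock-table idea from the question, assuming the MySQL back-end, the MySql.Data ADO.NET provider, and a hypothetical action_locks table whose primary key is the item ID:

    // Sketch only: acquire a farm-wide lock by inserting into a lock table.
    // The action_locks table and the provider choice are assumptions.
    using MySql.Data.MySqlClient;

    static class FarmLock
    {
        // True if this server acquired the lock for the item; false if another server holds it.
        public static bool TryAcquire(string connectionString, long itemId)
        {
            using (var connection = new MySqlConnection(connectionString))
            using (var command = new MySqlCommand(
                "INSERT IGNORE INTO action_locks (ItemId) VALUES (@id)", connection))
            {
                command.Parameters.AddWithValue("@id", itemId);
                connection.Open();
                return command.ExecuteNonQuery() == 1;   // 0 rows affected => lock already held
            }
        }

        public static void Release(string connectionString, long itemId)
        {
            using (var connection = new MySqlConnection(connectionString))
            using (var command = new MySqlCommand(
                "DELETE FROM action_locks WHERE ItemId = @id", connection))
            {
                command.Parameters.AddWithValue("@id", itemId);
                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }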
Your question implies data level concurrency control -- in that case, use the RDBMS's concurrency control mechanisms.
That will not help you if later you wish to control application level actions which do not necessarily map one to one to a data entity (e.g. table record access). The general solution there is a reverse-proxy server that understands application level semantics and serializes accordingly if necessary. (That will negatively impact availability.)
It probably wouldn't hurt to read up on CAP theorem, as well!
You may want to investigate a distributed locking service such as Zookeeper. It's a reimplementation of a Google service that provides very high speed distributed resource locking coordination for applications. I don't know how easy it would be to incorporate into a web app, though.
If all the state is in the (central) database then the database transactions should take care of that for you.
See http://en.wikipedia.org/wiki/Transaction_(database)
It may be irrelevant for you because the question is old, but it may still be useful for others, so I'll post it anyway.
You can use a "SELECT FOR UPDATE" DB query on a locking object, so you actually use the DB to achieve the lock mechanism.
If you use an ORM, you can also do that. For example, in NHibernate you can do:
session.Lock(Member, LockMode.Upgrade);
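A slightly fuller sketch of that NHibernate pattern, assuming an open ISession and a Member entity; the entity and its properties are illustrative:

    // Sketch only: pessimistic lock (SELECT ... FOR UPDATE) through NHibernate.
    using NHibernate;

    public class Member
    {
        public virtual int Id { get; set; }
        public virtual string Name { get; set; }
    }

    public static class MemberService
    {
        public static void Rename(ISession session, int memberId, string newName)
        {
            using (ITransaction tx = session.BeginTransaction())
            {
                // Issues SELECT ... FOR UPDATE, so other servers block until this transaction ends.
                var member = session.Get<Member>(memberId, LockMode.Upgrade);

                member.Name = newName;
                session.Update(member);

                tx.Commit();   // commits and releases the row lock
            }
        }
    }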
Having a table of locks is an OK way to do it; it is simple and works.
You could also run the code as a service on a single server, more of an SOA approach.
You could also use a timestamp field with transactions: if the timestamp has changed since you last read the data, you can revert the transaction. So if someone gets in first, they have priority.
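A minimal sketch of that timestamp/version check, assuming a version column on the row and the same MySQL provider as above; the table and column names are illustrative:

    // Sketch only: optimistic concurrency - the update wins only if the row's
    // Version value is unchanged since it was read.
    using MySql.Data.MySqlClient;

    static class OptimisticUpdate
    {
        // True if the update won; false if someone else changed the row first.
        public static bool TryUpdateTitle(string connectionString, long threadId,
                                          long originalVersion, string newTitle)
        {
            const string sql =
                "UPDATE threads SET Title = @title, Version = Version + 1 " +
                "WHERE Id = @id AND Version = @version";

            using (var connection = new MySqlConnection(connectionString))
            using (var command = new MySqlCommand(sql, connection))
            {
                command.Parameters.AddWithValue("@title", newTitle);
                command.Parameters.AddWithValue("@id", threadId);
                command.Parameters.AddWithValue("@version", originalVersion);
                connection.Open();
                return command.ExecuteNonQuery() == 1;   // 0 rows => someone got in first
            }
        }
    }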