How robust is the ASP.NET profile system? Is it ready for prime time?

I've used ASP.NET profiles (using the AspNetSqlProfileProvider) for holding small bits of information about my users. I started to wonder how it would handle a robust profile for a large number of users. Does anyone have experience using this on a large website with large numbers of simultaneous users? What are the performance implications? How about maintenance?

Querying this data directly via SQL is a bit tricky, I've found, but I have worked with clients that have scaled it up to a few hundred properties and 10K+ users without difficulty. Granted, that's not a lot of users, but it is working thus far.
I think it really depends on the specific project and your exact needs when it comes to working with the profile information. Do you need to query it regularly via SQL? Or do you just need it for user display? These kinds of details would help provide a more solid answer for your needs.
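For context, here is roughly what that access pattern looks like in code; the "FavoriteColor" property is a made-up example that would be declared under <profile><properties> in web.config:

    // Typical profile access in ASP.NET. "FavoriteColor" is a hypothetical
    // property declared under <profile><properties> in web.config.
    using System.Web.Profile;

    public static class ProfileHelper
    {
        public static string GetFavoriteColor(ProfileBase profile)
        {
            // AspNetSqlProfileProvider serializes all properties into a blob
            // column, which is why ad-hoc SQL queries against them are awkward.
            return (string)profile.GetPropertyValue("FavoriteColor");
        }

        public static void SetFavoriteColor(ProfileBase profile, string color)
        {
            profile.SetPropertyValue("FavoriteColor", color);
            profile.Save();
        }
    }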

The SQL provider's performance is closely tied to big-iron throughput: it is more or less directly proportional to a single SQL Server's ability to handle the query volume. Scale-up is the only option, so as such it's not really five-nines robust out of the box.
You'll have to figure out whether you need scale-out performance and availability, e.g. through partitioning, replication, redundancy, etc., and at what cost to performance. Some of those capabilities are possible as is - the current implementation is aimed more at the middle market and enterprise.
The good thing is that you can plug in your own implementation of the profile provider and attach it to services and systems with the capabilities outlined above.
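As a rough illustration (not the actual implementation anyone here describes), a skeleton of a custom provider might look like this; BackingStore is a hypothetical stand-in for whatever scalable store you attach:

    // A rough skeleton of a custom profile provider, assuming the backing
    // store is some scalable service (LDAP, a NoSQL cluster, etc.). Only the
    // two hot-path methods are fleshed out; the management methods throw
    // until implemented.
    using System;
    using System.Configuration;
    using System.Web.Profile;

    public class ScalableProfileProvider : ProfileProvider
    {
        public override string ApplicationName { get; set; }

        public override SettingsPropertyValueCollection GetPropertyValues(
            SettingsContext context, SettingsPropertyCollection properties)
        {
            var values = new SettingsPropertyValueCollection();
            var userName = (string)context["UserName"];
            foreach (SettingsProperty property in properties)
            {
                values.Add(new SettingsPropertyValue(property)
                {
                    PropertyValue = BackingStore.Read(userName, property.Name)
                });
            }
            return values;
        }

        public override void SetPropertyValues(
            SettingsContext context, SettingsPropertyValueCollection values)
        {
            var userName = (string)context["UserName"];
            foreach (SettingsPropertyValue value in values)
            {
                if (value.IsDirty)
                    BackingStore.Write(userName, value.Name, value.PropertyValue);
            }
        }

        // Profile management methods; implement as your store allows.
        public override int DeleteProfiles(ProfileInfoCollection profiles) => throw new NotSupportedException();
        public override int DeleteProfiles(string[] usernames) => throw new NotSupportedException();
        public override int DeleteInactiveProfiles(ProfileAuthenticationOption option, DateTime sinceDate) => throw new NotSupportedException();
        public override int GetNumberOfInactiveProfiles(ProfileAuthenticationOption option, DateTime sinceDate) => throw new NotSupportedException();
        public override ProfileInfoCollection GetAllProfiles(ProfileAuthenticationOption option, int pageIndex, int pageSize, out int totalRecords) => throw new NotSupportedException();
        public override ProfileInfoCollection GetAllInactiveProfiles(ProfileAuthenticationOption option, DateTime sinceDate, int pageIndex, int pageSize, out int totalRecords) => throw new NotSupportedException();
        public override ProfileInfoCollection FindProfilesByUserName(ProfileAuthenticationOption option, string usernameToMatch, int pageIndex, int pageSize, out int totalRecords) => throw new NotSupportedException();
        public override ProfileInfoCollection FindInactiveProfilesByUserName(ProfileAuthenticationOption option, string usernameToMatch, DateTime sinceDate, int pageIndex, int pageSize, out int totalRecords) => throw new NotSupportedException();
    }

    // Hypothetical stand-in for the real storage client.
    internal static class BackingStore
    {
        public static object Read(string user, string property) => null;
        public static void Write(string user, string property, object value) { }
    }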
We wrote custom authn, authz, and profile providers and strapped them to a large AD/LDS LDAP cluster across 3 datacenters. We're in the comScore Top 10, so you could say we deal with a good slice of the internet every day. Thousands of profile queries per second and hundreds of millions of profiles - it can scale with good planning, engineering, and operations.

Related

Is Graph Database a good use case for a messaging system?

I am diving into the universe of graph databases and I'm simply amazed by how powerful they are. I chose OrientDB to start my first use case, but I'm not certain whether my domain suits this specific section of my app.
A User follows another User.
A User can be part of a Conversation.
A Message can be sent (with a timestamp) to a Conversation.
A Message can be read (with a timestamp) by a User.
I'm worried about ending up with millions (even billions) of Message nodes and sent or read edges, affecting the overall performance of the system. The messaging section is not the main concept of the app; it is just a small portion of it.
Would it be a problem for OrientDB to handle? Is it a good application for a Graph Database?
Thank you all for your patience,
Vinicius
I don't think a graph database is the best candidate for a messaging system. Messaging systems are relational in nature and suit the likes of MySQL.
You wouldn't be surprised to hear, though, that Facebook uses document-oriented databases for their messaging system.
Facebook is currently the largest installation of Cassandra, which is excellent for scalability and, thanks to its distributed nature, great for storing messages.
Take a look at the suggested way to use OrientDB with a similar use case:
http://orientdb.com/docs/last/Chat-use-case.html
The choice of a graph database ultimately depends on what you are going to do with the data.
In your case, do you plan to use any graph-processing algorithms, or graph traversals?
An edge in graph theory represents a relationship between nodes (objects). A read or sent timestamp does not really fit that model, and you will end up with billions of edges, killing the performance of the system.
The follower concept fits the database perfectly. As for the Conversation, it could be an attribute of the node. Do you need to create an edge to represent ownership just to query the Conversation ID?
If the messaging is just a small part of your application, I suggest using the best tool for each need: combine a column-oriented database (Cassandra) with OrientDB to represent relationships, or use OrientDB as in the Chat use case (thanks #Lvca), which advises:
"What we suggest is to avoid using Edges or Vertices connected with edges for messages. The best way is using the document API by creating one class per chat room, with no index, to have super fast access to the last X messages."
I'm also wondering about this topic, but I think any RDBMS would be better for this task.
Also, chat is kind of a log, so ElasticSearch (and similar) can be a perfect match for storing terabytes of chat data.
A lot of dissonant answers here. Speaking from experience, I've built a few messaging systems on plain MongoDB instances with no issues whatsoever handling hundreds or thousands of concurrent users (with chat groups).
I'd say go either with Cassandra, a battle-tested database with scalability practically built in, if you're very worried about scale, or with a newcomer like MongoDB, which is constantly being upgraded and on top of which you can fairly easily add search via ElasticSearch. MongoDB supports scaling via sharding, so it can scale horizontally to meet your needs.
Just be sure not to bottleneck your speed on your backend service; implement as many asynchronous operations as possible.
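To make the MongoDB route concrete (and to keep things asynchronous end to end), here is a minimal sketch using the official C# driver (MongoDB.Driver); the database, collection, and field names are illustrative:

    // Storing and reading chat messages with the official MongoDB C# driver.
    // Database, collection, and field names are illustrative.
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using MongoDB.Driver;

    public class ChatMessage
    {
        public string ConversationId { get; set; }
        public string SenderId { get; set; }
        public string Body { get; set; }
        public DateTime SentAt { get; set; }
    }

    public class ChatStore
    {
        private readonly IMongoCollection<ChatMessage> _messages;

        public ChatStore(string connectionString)
        {
            var client = new MongoClient(connectionString);
            _messages = client.GetDatabase("chat").GetCollection<ChatMessage>("messages");
        }

        public Task SendAsync(ChatMessage message) => _messages.InsertOneAsync(message);

        // Last N messages of a conversation, newest first; an index on
        // (ConversationId, SentAt) keeps this fast as the collection grows.
        public Task<List<ChatMessage>> LastMessagesAsync(string conversationId, int count) =>
            _messages.Find(m => m.ConversationId == conversationId)
                     .SortByDescending(m => m.SentAt)
                     .Limit(count)
                     .ToListAsync();
    }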
Now, you can even go as far as implementing a streaming platform like Kafka, which is excellent for CDC (change data capture) and will persist your message log until it is read by a service that actually writes the messages to your database of choice, adding to your resiliency.
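A sketch of that pattern with the Confluent.Kafka client; the topic name and the SaveToDatabase helper are hypothetical:

    // A message-writer service that drains a Kafka topic and persists each
    // message to the database of choice, using the Confluent.Kafka client.
    using System.Threading;
    using Confluent.Kafka;

    class MessageWriter
    {
        static void Main()
        {
            var config = new ConsumerConfig
            {
                BootstrapServers = "localhost:9092",
                GroupId = "chat-db-writer",
                AutoOffsetReset = AutoOffsetReset.Earliest,
                EnableAutoCommit = false // commit only after a successful write
            };

            using var consumer = new ConsumerBuilder<string, string>(config).Build();
            consumer.Subscribe("chat-messages");

            var cts = new CancellationTokenSource();
            while (!cts.IsCancellationRequested)
            {
                var result = consumer.Consume(cts.Token);
                SaveToDatabase(result.Message.Key, result.Message.Value);
                consumer.Commit(result); // the log stays replayable until committed
            }
        }

        static void SaveToDatabase(string conversationId, string body)
        {
            // Insert into Cassandra/MongoDB/SQL here (hypothetical).
        }
    }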

ASP.NET MembershipProvider - SQL Server vs. Active Directory

There are several options for storing user info when dealing with ASP.NET membership providers. I would like to ask whether they are comparable in terms of performance, especially ActiveDirectoryMembershipProvider and SqlMembershipProvider when there are, e.g., 100,000 users recorded.
Both providers can handle the workload. The question is whether the infrastructure below them can handle it. An AD server with 100,000 accounts should be big enough to handle it.
So, the real question in my eyes is, do you write the app for an intranet and want to provide SSO functionality? Then, by all means, go with ActiveDirectory!
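For what it's worth, the application code is largely the same either way, because both providers sit behind the same Membership facade; switching is a web.config change, not a code change. A minimal sketch:

    // Application code talks to the static Membership facade, so moving from
    // SqlMembershipProvider to ActiveDirectoryMembershipProvider is a
    // web.config change rather than a code change.
    using System.Web.Security;

    public static class LoginService
    {
        public static bool SignIn(string userName, string password)
        {
            // Delegates to whichever provider web.config declares as default.
            if (!Membership.ValidateUser(userName, password))
                return false;

            FormsAuthentication.SetAuthCookie(userName, false);
            return true;
        }

        public static MembershipUser Lookup(string userName) => Membership.GetUser(userName);
    }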
Your question is unanswerable, as "performance" depends greatly upon many factors: for instance, network speed, network latency, network saturation, the power of your AD server vs. your SQL Server, the disk subsystems in use in either, etc.
There is no way to say one way or the other without thoroughly evaluating each environment, and even at that point, you should just benchmark each and determine what works best for you.
In most cases, though, the decision between SQL and AD has nothing to do with performance and everything to do with the features offered by each. I strongly doubt you have 100,000 users in your Active Directory, as that would cost millions of dollars in licensing.

To Multi-Tenant, or Not To Multi-Tenant

I have a difficult database design decision to make regarding multi-tenancy for the growing number of branches of my client's web-based CRM, which I actively maintain.
I made the decision early on to use separate applications with separate databases for each branch, because it was the simplest way to cater for three different branches with disparate data and code requirements. I also wanted to avoid managing Tenant IDs in every query, like I had to with the legacy Classic ASP (cringe) application I built in 2007...the horror.
But now the data requirements for branches are converging and as the business expands, I need to be able to roll out new branches quickly and share global product SKUs.
Since tables and views are the same for all branches and better ORM tools are now available to manage multi-tenant applications, I wonder if it would be better to have a shared database for multiple branches.
Considerations for a centralised database:
Global product SKUs
Simplified inventory requisitions
Easier to backup
Deploy once instead of for every branch
Considerations against a centralised database:
Easier to differentiate branch requirements with separate DBs
Modular deployment (one downed branch doesn't break all)
Harder to manage and develop for shared DB
I have to re-design invoice numbering (sequence generated by seed)
Fewer WHERE clauses everywhere
Restoring one broken branch has plenty of implications for other branches
It is unlikely there will ever be as many as 10 branches. Right now there are 3.
Developers with real-world experience in this area, what would you do in my situation? Keep apps & DBs separate, or combine into one giant system?
Edit: Great Microsoft article on the pros and cons of multi-tenancy. I should note that data isolation between branches is not a major issue.
Bite the bullet and merge them. Add your tenant ID where it needs to be, and change your queries.
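One way to scope every query to a tenant without hand-written WHERE clauses (assuming an EF Core stack; the entity and property names are illustrative) is a global query filter:

    // A global query filter (EF Core) applies the tenant predicate to every
    // query against the entity automatically.
    using Microsoft.EntityFrameworkCore;

    public class Invoice
    {
        public int Id { get; set; }
        public int TenantId { get; set; }
        public decimal Total { get; set; }
    }

    public class CrmContext : DbContext
    {
        private readonly int _tenantId;

        public CrmContext(DbContextOptions<CrmContext> options, int tenantId)
            : base(options)
        {
            _tenantId = tenantId; // resolved per request, e.g. from the branch's hostname
        }

        public DbSet<Invoice> Invoices => Set<Invoice>();

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // Every query against Invoice is automatically tenant-scoped.
            modelBuilder.Entity<Invoice>().HasQueryFilter(i => i.TenantId == _tenantId);
        }
    }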
For customizations, look into a plugin type architecture that would allow you to deploy specific screens for particular clients.
We have a software product that is built in just such a fashion. Sometimes it's deployed on a client site, sometimes we host it. For all intents and purposes it is an order of magnitude easier to deal with a single code base that has client specific extensions than dealing with multiple branches of the code.
For one, when we fix a problem, we fix it for everyone. Sure, if we break it, we break it for everyone but that's what unit tests are for. And it is a heck of a lot easier to maintain a set of unit tests against one code base than it is to maintain them for multiple branches.
We've been doing multi-tenant for over 10 years and not once have I looked back. Generally speaking, queries aren't that different if you are already security conscious in verifying that the person retrieving the record is actually allowed to get it.
I disagree with the issues brought up by Corbin. The one around versioning should already be handled by having an attribute-based security structure in place. That way you can turn things on and off via user or tenant configuration. Also, I find it very rare that client A doesn't want the same new feature that client B asked for.
The second one, about data mingling, is also a non-issue. Just look at salesforce.com or any of the other large-scale sites. They absolutely use a multi-tenant architecture, and judging by the sheer number of clients that use them, this doesn't seem to be a problem. The main thing here is being able to assure your clients that their data is secured.
If you're talking about 10 branches, multi-tenancy seems like a big cost with little benefit.
There are complications with multi-tenancy you don't mention:
Versioning becomes difficult. Clients X, Y, and Z may want a new feature while clients A, B, and C don't. A multi-tenant app makes accommodating everyone difficult, especially if a new feature requires database schema changes. It's not impossible, it's just more difficult.
Some clients are very uncomfortable with their data mingling in the same tables as other clients. Even though we know better, it feels like a security risk to them. Legal departments hate it. In addition, if you ever dump raw data for a client, a shared database requires caution.
You can eliminate a few of your pain points with better practices:
Automate deployment. This should make it easier to add a new client or upgrade/downgrade an existing client. Database maintenance (backups, rebuilding indexes) should be set up automatically as well.
Store shared data (SKUs, inventory) in a central database and have every application instance access it either directly or through a service (see the sketch after this list).
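A hypothetical sketch of the service route for shared SKUs; the URL and JSON shape are assumptions:

    // A hypothetical client for a central product-catalog service, letting
    // each branch instance read shared SKUs without owning the data.
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Text.Json;
    using System.Threading.Tasks;

    public class Sku
    {
        public string Code { get; set; }
        public string Description { get; set; }
    }

    public class SkuCatalogClient
    {
        private static readonly HttpClient Http = new HttpClient();

        public async Task<List<Sku>> GetAllAsync()
        {
            // Endpoint is an assumption; any transport (WCF, REST, gRPC) works.
            string json = await Http.GetStringAsync("https://catalog.example.internal/api/skus");
            return JsonSerializer.Deserialize<List<Sku>>(
                json, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
        }
    }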
Don't get me wrong, one of the more interesting apps I worked on was multi-tenant. There can be huge benefits, but you'll more likely see them with thousands of clients versus ten.
Honestly, this is a business question. In a multi-tenant setup you will be able to deliver more customized features to smaller user groups, but with more IT overhead. That is, you will need more people and hardware (management reads this: money), but you deliver greater flexibility.
If you are in one GIANT Borg situation, then you lower your IT overhead (again, people and things; to management, money), but your end users have to live with less flexibility in their software. All bugs are problems for all users, so big ones get whacked fast. However, new features impact all users as well, so they happen more slowly.
If you personally have the juice to make this call and the business just has to listen to what you say, or you can nudge management one way or another, I'd suggest asking YOURSELF a series of questions about which scenario you prefer:
A) Do you want to have more people managing this and share the salary/responsibility?
B) To the best of your knowledge, is there going to be a 4th user group soon?
C) How long do you want to stay at this company?
If you answer yes to the first two, then you probably want multi-tenant.
I work in a situation where, for regulatory/legal reasons, we have to keep each client's data in a separate database. However, there is certain information that must be shared, mostly related to things like a lookup table for which client's URL corresponds to which database. Also, a client can choose to have multiple databases if they wish to separate their data in some logical way. So, for each of our products, we really have three types of databases:
ApplicationData, which has just a few tables that contain information about the clients themselves, like which MasterData database (see below) to use when reached by a certain URL and which features are available to that client. Each product has just one ApplicationData, no matter how many different clients are using that product.
MasterData, which contains client-specific information such as users, roles, and permissions (in our case, the tables that aspnet_regsql creates are here). Among the permissions specified here are which ClientData databases are available to a given user (see below). The schema for all MasterData databases (for the same product) are the same.
ClientData, which contains the data with which the user interacts. In one product, this is data that the client can search based on a large number of criteria, create reports about, etc. In another product, this contains the dynamic data that a client can upload so that other users can contact people to take surveys over the phone, etc. The schema for all ClientData databases for the same product is the same.
Now, one caveat: We actually use the same schema, and often the same actual database, for MasterData and ClientData. This is for historical reasons, as the ability to allow a client to have one authentication database (MasterData) corresponding to a number of ClientData databases is a relatively new feature that only applies to one of our products. Also, this structure simplifies deployment, since most clients only use one ClientData database. However, MasterData and ClientData have separate entity models under Entity Framework in our projects, and we have to ensure that there are no direct relationships between MasterData and ClientData such as foreign keys.
This setup works pretty well for us. One major advantage is that there is no problem with putting different ClientData databases on different servers. This helps greatly with load balancing, and it provides a natural way to partition data. We can essentially offer a client with a huge amount of data a dedicated database server if they are willing to pay for it.
One more thing that has really helped us in this situation are Red Gate's tools, specifically tools like Multi-Script, SQL Source Control, and Schema Compare. When we upgrade something, and the schema changes, we have to deploy the changes to all the relevant databases. These tools have more than paid for themselves in time saved. Note that I have no affiliation with Red Gate other than as a satisfied user.
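To make the routing concrete, here is a hypothetical sketch of the first hop of that lookup chain: ApplicationData maps a request's host name to the client's MasterData server and database. The table and column names are invented:

    // Hypothetical first hop: look up a client's MasterData location from
    // the product's single ApplicationData database.
    using System.Data.SqlClient;

    public static class TenantRouter
    {
        public static string ResolveMasterDataConnection(string applicationDataConnection, string host)
        {
            using (var conn = new SqlConnection(applicationDataConnection))
            using (var cmd = new SqlCommand(
                "SELECT ServerName, DatabaseName FROM Clients WHERE Url = @url", conn))
            {
                cmd.Parameters.AddWithValue("@url", host);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    if (!reader.Read()) return null;
                    // MasterData in turn holds the routes to ClientData databases.
                    return $"Server={reader.GetString(0)};Database={reader.GetString(1)};Integrated Security=true";
                }
            }
        }
    }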
Edit: (in response to comment)
ApplicationData is one database per product. The three web-based products we have use the same schema for ApplicationData, since they record basically the same types of information. However, there is no reason it would have to stay that way. The ApplicationData databases are all on the same server. One of the tables in ApplicationData points to the correct server and database name for the client's MasterData, so MasterData for a given client can reside on any server.
MasterData has server and database name information for each ClientData database, so again, the databases can reside on any server. In practice, for now, we only have two production database servers total for these products. The MasterData schema is similar per product, but I do not think they are exactly the same (I would have to check). Each client has its own MasterData. If a client purchases multiple products, there is a MasterData for each product for that client; the products interact in other ways (through web services, basically) if a client has purchased that feature (or requests custom development of such a feature). ClientData for a given product always has the same schema.
So, in summary:
ApplicationData is per product and happens to have the same schema in each product.
MasterData is per client for a product.
There are one or more ClientData instances for a client within a product.
I did oversimplify slightly in that only one of our products supports multiple ClientData instances per client. For a second product, that will probably be implemented eventually. For a third product, it would make no sense at all as a feature and will likely just remain as is.
I hope that answers your question!
Well, if the tendency is towards sharing information and data among different branches, you're probably better off having one central database.
Otherwise the hoops you'll have to go through to achieve the ability to share data will be far worse than the extra WHERE clauses needed for a shared DB.
You could, of course, have a DB per branch and an extra database (a fourth database as of now) as a centralized storage for the information that needs sharing. Although you'd have to see if the over-complication makes this a best or worst of both worlds solution :)
If we're talking about CRM, then what are the chances of one customer being in multiple databases? If there's even the slightest chance of you being asked to combine customer details across branches then I'd definitely go with one centralised database.
IMO decentralization is becoming a tenet of maintainable and scalable design. The only centralized database I use is for authentication security, which I'm currently growing into a decentralized database for authorization. That way all authorization can stay at the same edge where the application physically sits, with no network traversals, since authorization is not a great candidate for caching.
Reading that you're specifically interested in multiple branches of the same application, as opposed to truly disparate applications, it sounds like a great option would be to build your database around a seeding process (Entity Framework supports this). That would let you simply deploy your new branch code to ASP.NET and then, during the initial database build-up when the tables are physically created, poll the "blessed" server and dump all needed data to the edge server.
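A sketch of what that seeding step could look like with Entity Framework 6 migrations; Product, BranchContext, and CentralStore are hypothetical names:

    // When the branch database is first built or migrated, Seed pulls the
    // shared data down from the central ("blessed") server.
    using System.Data.Entity;
    using System.Data.Entity.Migrations;

    public class Product
    {
        public int Id { get; set; }
        public string Sku { get; set; }
        public string Name { get; set; }
    }

    public class BranchContext : DbContext
    {
        public DbSet<Product> Products { get; set; }
    }

    public static class CentralStore
    {
        // Stand-in for polling the central server for the global catalog.
        public static Product[] FetchSharedProducts() => new Product[0];
    }

    internal sealed class Configuration : DbMigrationsConfiguration<BranchContext>
    {
        protected override void Seed(BranchContext context)
        {
            // Upsert the global product catalog into the new branch database.
            foreach (var product in CentralStore.FetchSharedProducts())
                context.Products.AddOrUpdate(p => p.Sku, product);

            context.SaveChanges();
        }
    }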
After this you would need some replication setup if new products are being added to the primary data store and are expected to make it to each edge store. You could accomplish this with direct replication of your database, or look at tools like the Microsoft Sync Framework.
You may think today that you will only have a few customers, but a few years from now you may realize that the product has the potential to be sold to hundreds of customers. If that happens, you will regret having used a single-tenant approach.
Compare the costs of:
Converting a production system from single-tenant to multi-tenant where databases are populated with customer data
Developing a multi-tenant system despite thinking you won't need the benefits
Converting a production system is a daunting and very expensive task.
Using the second approach may cost you more initially, but it does give you a very valuable option to be able to add more customers in the future at low cost. The price of that option could be worth paying.

Is it possible to "measure" the usage (e.g. in MB) per user of an SQL Server database in web-farm conditions?

I have an ASP.NET web application hosted in a web-farm environment, and I need a way to be able to indicate how much a user is using my database.
There are several reasons for this, and I mention a couple. First, because I pay for the database space per month, I want to have a reasonable way to charge my users. Second, it would be nice to know (again, on a per-user basis) when to inform the user to upgrade his subscription.
I don't have enough experience with RDBMSs; I come from a different background (Windows applications, graphics), so I can't figure out if this is possible, and if it is, how it can be handled: through SQL or ASP.NET (some tool, library, etc.).
If you have some other idea, I'd also like to hear what you suggest.
Any other advice on this subject, including good places to learn, would also be appreciated.
It depends on your schema. If you use a database-per-user multi-tenant layout, then it is very easy: the size of the database is the space consumed, which is easy to measure and, more importantly, to enforce. If you use a shared database schema, then you'll need to track in each table which rows belong to which user and do the accounting yourself. Both measurement and enforcement are more difficult in that case, and there is no general answer; you will have to write code to account for the bytes used and to enforce any maximum-size-per-user constraint.
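In the database-per-user case, SQL Server's sp_spaceused stored procedure reports the size directly. A minimal sketch, assuming one database per tenant; the server name and naming convention are assumptions:

    // Measuring a tenant database's size with SQL Server's sp_spaceused,
    // assuming a database-per-user layout.
    using System.Data;
    using System.Data.SqlClient;

    public static class UsageMeter
    {
        public static string GetDatabaseSize(string server, string tenantDatabase)
        {
            string connectionString =
                $"Server={server};Database={tenantDatabase};Integrated Security=true";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("sp_spaceused", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    reader.Read();
                    // First result set: database_name, database_size, unallocated space.
                    return (string)reader["database_size"]; // e.g. "12.50 MB"
                }
            }
        }
    }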

How to upgrade my ASP.NET app to support more users?

When an ASP.NET website has about 1,000 active users, it works good.
What should I do if the website has about 100,000 active users?
How do I upgrade my ASP.NET app to support that many users?
Changing the web app's architecture?
Or buying more web servers?
I just wonder how, in the real world, other people build ASP.NET websites that support millions of users. What's the application architecture of such a website?
Any suggestion will be welcome.
First, make sure you're with a first-rate hosting provider.
Second, download a performance profiler (I always suggest Red Gate Performance Profiler) and profile your app. Find the bottlenecks and eliminate them. Repeat until you get your desired performance metric.
If your application queries a database or other web services, try to use asynchronous methods. Using async methods frees up the web server to handle many more client requests while it waits for a response from the database server or web service.
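A minimal sketch of such an asynchronous method in ASP.NET MVC; the connection string and query are illustrative:

    // An async MVC action: the worker thread is released back to the pool
    // while the database round-trip is in flight, so the server can handle
    // other requests in the meantime.
    using System.Data.SqlClient;
    using System.Threading.Tasks;
    using System.Web.Mvc;

    public class ReportsController : Controller
    {
        public async Task<ActionResult> Index()
        {
            using (var conn = new SqlConnection("Server=.;Database=App;Integrated Security=true"))
            using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Orders", conn))
            {
                await conn.OpenAsync();
                int orderCount = (int)await cmd.ExecuteScalarAsync();
                return View(orderCount);
            }
        }
    }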
You say it "works good" at the moment. It's impossible to know the point at which this may change without knowing a whole lot more about the nature of your traffic, your current setup, what else runs on the server, etc. It could be that it continues to "work good" with a million users as it is.
When you need to make changes (and slowly degrading performance will alert you), that's when you need to worry. And then, as Justin says, knowing the potential bottlenecks will give you pointers to the solution you need.
Buying more servers is one strategy; so is changing the architecture. The easiest and most cost-effective is throwing more servers at it. It does depend a little on the current application architecture, but there's nothing that can't be easily overcome.
What I suggest is to load test your application and see what happens as you increase the number of active users. Who knows, it might handle 100k active users; maybe it won't, but at least you will know the tipping point.
In regard to what you should do, that really depends on your business needs. If your company has the $$ and this is a core product, then it makes sense to architect a robust application. If it's not, maybe throwing hardware at the problem is good enough.
It would also help if you could define an active user. Is it someone who is visiting your site and has a session? Is it 100k concurrent requests to the server...?
In terms of hardware scaling: scaling up or scaling out.
For software scaling: profile your app.
