Data aggregation - multiple websites, single superuser site

I have a requirement for a set of asp.net MVC websites as follows:
1. Multiple sites, using the same codebase, but each site will have a separate database (this is a requirement), and users will log in and enter data.
2. A single site for super users where they log in and work on data aggregated from each of the individual sites.
The number of sites in point one is liable to expand as we roll it out to more clients.
My question is about the architecture of the above: how to manage the data aggregation, given that it needs to be real time. Do we maintain this at the database level (e.g. a view that is essentially a union across the individual site databases), or at the application level?
A few infrastructure points:
We have complete control over the database server and naming of databases.
All these websites are deployed onto a server that we manage.
I'd appreciate any input/ideas from folks that may have done this before.

Does the data aggregation have to be completely real-time, or can you get away with almost real-time? If "almost real-time" is acceptable, then you can write a service application that harvests the data from the sites' databases into your single central database. As long as the process runs continuously and you don't have too many sites to gather data from, the delay should be more or less invisible to the user.
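A minimal sketch of such a harvesting pass, in Python with in-memory SQLite standing in for the real site and central databases. The table names (`entries`, `all_entries`), the `harvest` function and the per-site "last seen id" watermark are assumptions made for illustration; the watermark only picks up new rows, so edits and deletes on the sites would need something extra such as a modified-date column.

```python
import sqlite3


def harvest(site_conn, central_conn, site_name, last_seen_id):
    """Copy rows newer than last_seen_id from one site into the central store."""
    rows = site_conn.execute(
        "SELECT id, payload FROM entries WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    central_conn.executemany(
        "INSERT INTO all_entries (site, source_id, payload) VALUES (?, ?, ?)",
        [(site_name, row_id, payload) for row_id, payload in rows],
    )
    central_conn.commit()
    # Return the new watermark: the highest id copied so far.
    return rows[-1][0] if rows else last_seen_id


def make_site(payloads):
    """Build a throwaway site database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO entries (payload) VALUES (?)", [(p,) for p in payloads]
    )
    conn.commit()
    return conn


sites = {"site_a": make_site(["a1", "a2"]), "site_b": make_site(["b1"])}
central = sqlite3.connect(":memory:")
central.execute(
    "CREATE TABLE all_entries (site TEXT, source_id INTEGER, payload TEXT)"
)
watermarks = {name: 0 for name in sites}


def run_pass():
    """One sweep over every site; a real service would repeat this in a loop."""
    for name, conn in sites.items():
        watermarks[name] = harvest(conn, central, name, watermarks[name])


run_pass()
```

Adding a new client site then only means adding one more entry to `sites`; nothing in the central schema changes.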
Having a view that accumulates the data from all the databases doesn't sound like a good solution. Not only will it probably be very slow, but you will also have to update the view whenever you add a new site.
What is the intention of the super user site, btw? Is it only for reporting or should super users edit the data across all sites as well? That may affect which solution you choose.

Related

Data access in ASP.NET: In-memory collection vs database

I'm new to ASP.Net, MVC and the Entity Framework.
I'd like to understand the best practice for small databases. For example, say at Contoso University we know there are only going to be a few hundred or a few thousand students and courses. So all the data would comfortably fit in memory. So then is it better to use an in-memory collection and avoid potentially high-latency database operations?
I am thinking of small-scale production web sites deployed to Windows Azure.
To be more specific, the particular scenario I am thinking of has a few thousand records that are read-only, although users can create their own items too. Think of a collection of movies, albums or song lyrics that has been assembled offline from a list of a few thousand popular titles. The user can browse the collection (read-only), and most of the time they find what they are looking for there. However the user can also add their own records.
Since the popular titles fit in memory, and these are read-only, is it maybe better not to use a database for the popular titles? How would you organize data and code for this scenario?
Thanks for any thoughts and pointers.

I think a database is good place to store your information.
However, you are concerned about database latency.
You can mitigate that with caching - the data is stored in memory.
In short, it isn't an either/or scenario...
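To make the "database plus cache" idea concrete, here is a rough sketch in Python with SQLite as a stand-in database. The `TitleCache` class, the `titles` table and the `db_reads` counter are hypothetical names invented for this example, not part of any framework. The list is loaded from the database on first use, kept once for the whole application, and guarded by a lock.

```python
import sqlite3
import threading


class TitleCache:
    """Application-wide, lazily loaded copy of a read-only table."""

    def __init__(self, conn):
        self._conn = conn
        self._lock = threading.Lock()
        self._titles = None
        self.db_reads = 0  # only here to show how often the database is hit

    def get_titles(self):
        with self._lock:
            if self._titles is None:
                cursor = self._conn.execute("SELECT name FROM titles ORDER BY name")
                self._titles = [row[0] for row in cursor]
                self.db_reads += 1
            # Hand out a copy so callers cannot mutate the shared list.
            return list(self._titles)

    def invalidate(self):
        """Drop the cached copy, e.g. after the offline list is re-imported."""
        with self._lock:
            self._titles = None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (name TEXT)")
conn.executemany("INSERT INTO titles VALUES (?)", [("Brazil",), ("Alien",)])
conn.commit()

cache = TitleCache(conn)
first = cache.get_titles()
second = cache.get_titles()  # served from memory, no second query
```

The database stays the source of truth, so an app restart simply triggers one reload.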

You should definitely store your data in some persistent storage medium (SQL, Azure Tables, XML file, etc). The issues with storing items in memory are:
You have to find a way to store them once for the application and not once per user. Else, you will have potentially several copies of a 2-5 MB dataset floating around your memory space.
Users can add records: are these for everyone to see, or just for them? How would you handle user-specific data?
If your app pool recycles, server gets moved by the Azure engineers, etc, you have to repopulate that data.
As said above, caching can really help to alleviate any SQL Azure latency (which btw, is not that high, we use SQL Azure and web roles and have not had any issues).
Complex queries. Sure, you can use LINQ to process in-memory lists, but SQL is literally built to perform relational queries in a fast, efficient, data-safe manner.
Thread safe operations on an in-memory collection could be troublesome.
Edit/Addendum
The key, we have found, to working with SQL Azure is to not issue tons of little tiny queries, but rather, get the data you need in as few queries as possible. This is something all web applications should do, but it becomes much more apparent when using SQL Azure rather than a locally hosted database. Lastly, as far as performance/caching/etc, don't prematurely optimize! Get your application working, then identify bottlenecks. More often than not, it will be a code solution to fix the bottleneck and not necessarily a hardware/infrastructure issue.
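As an illustration of "few queries instead of many tiny ones", here is a small Python sketch against SQLite. The `courses` table and both function names are made up for the example. The first function makes one round trip per id, the second fetches the same rows in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [(1, "Chemistry"), (2, "Calculus"), (3, "Literature")],
)
conn.commit()


def titles_one_by_one(conn, ids):
    """Chatty version: one query (one round trip) per id."""
    result = {}
    for course_id in ids:
        row = conn.execute(
            "SELECT title FROM courses WHERE id = ?", (course_id,)
        ).fetchone()
        if row:
            result[course_id] = row[0]
    return result


def titles_batched(conn, ids):
    """Batched version: a single query with one placeholder per id."""
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM courses WHERE id IN ({placeholders})", ids
    )
    return dict(rows)
```

Against a local file the difference is tiny; against a remote database every extra round trip adds network latency, which is where the batched form pays off.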

Best caching framework for asp.net application

I have an order system developed on asp.net 4 web forms. I need to store order details (order object) for a user on the cache in order to manage it till I save it in the DB.
I want to install my site on at least two servers, with the option to scale to more in the future.
As you know, the two servers are located behind a load balancer, so I need the cached order object to be shared across both servers.
I have heard about AppFabric.
Any recommendations for good frameworks to do that? I hope for one that is simple and easy to maintain.
Thanks in advance...

"I need to store order details (order object) for a user on the cache in order to manage it till I save it in the DB."
If your data is not persisted, SQL Server-based Session state will work across machines on a per-user basis and can be configured with a minimum of fuss.
However, I would suggest regularly saving the order to your application database (not just the Session database) so that the user doesn't lose it. This is fairly standard practice on e-commerce sites. Unless the order process is very short, inevitably the user will want to pause and return, or accidentally close the browser, spill coffee into their computer, etc.
Either way, the database makes a good intermediate and/or permanent location for this data.
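A rough sketch of the "save the in-progress order regularly" idea, in Python with SQLite standing in for the application database. The `draft_orders` table and the `save_draft` / `load_draft` helpers are hypothetical. Because the draft lives in the database rather than in one server's memory, any server behind the load balancer can pick it up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE draft_orders (user_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
)


def save_draft(conn, user_id, order):
    """Insert or overwrite the single draft order kept for this user."""
    conn.execute(
        "INSERT OR REPLACE INTO draft_orders (user_id, body) VALUES (?, ?)",
        (user_id, json.dumps(order)),
    )
    conn.commit()


def load_draft(conn, user_id):
    """Return the user's draft order, or None if there is none."""
    row = conn.execute(
        "SELECT body FROM draft_orders WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Called after each step of the order process, not only at checkout.
save_draft(conn, "user-42", {"items": [{"sku": "SKU-1", "qty": 1}]})
save_draft(conn, "user-42", {"items": [{"sku": "SKU-1", "qty": 2}]})
```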

Sending notifications according to database value changes

I am working on a vendor portal. The owner of a shop will log in, and in the navigation bar (similar to Facebook) I would like the number of items sold to appear INSTANTLY, WITHOUT ANY REFRESH. On Facebook, new notifications pop up immediately. I am using SQL Azure as my database. Is it possible to detect a change in the database and INSTANTLY INFORM the user?
Part 2 of my project will consist of a mobile phone app for the vendor. In this app, too, I would like to have the same notification mechanism. In this case, would I be correct to look into push notifications and apply them?
At the moment my main aim is to solve the problem in paragraph 1. I am able to retrieve the number of notifications, but how on earth is it possible to show the changes INSTANTLY? Thank you very much.

First you need to define what INSTANT means to you. For some, it means within a second 90% of the time. For others, they would be happy to have a 10-20 second gap on average. And more importantly, you need to understand the implications of your requirements; in other words, is it worth it to have near zero wait time for your business? The more relaxed your requirements, the cheaper it will be to build and the easier it will be to maintain.
You should know that having near-real-time notification can be very expensive in terms of computing and locking resources. The more you refresh, the more web roundtrips are needed (even if they are minimal in this case). Having data fresh to the second can also be costly to the database because you are potentially creating a high volume of requests, which in turn could affect otherwise well-performing requests. For example, if your website runs with 1000 users logged on, you may need 1000 database requests per second (assuming that's your definition of INSTANT), which could in turn create a throttling condition in SQL Azure if not designed properly.
An approach I used in the past, for a similar requirement (although the precision wasn't to the second; more like to the minute) was to load all records from a table in memory in the local website cache. A background thread was locking and refreshing the in memory data for all records in one shot. This allowed us to reduce the database traffic by a factor of a thousand since the data presented on the screen was coming from the local cache and a single database connection was needed to refresh the cache (per web server). Because we had multiple web servers, and we needed the data to be exactly the same on all web servers within a second of each other, we synchronized the requests of all the web servers to refresh the cache every minute. Putting this together took many hours, but it allowed us to build a system that was highly scalable.
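The general shape of that approach can be sketched as follows. This is Python rather than the original ASP.NET code, and it is an illustration under assumptions, not the system described above: `RefreshingCache`, `load_sold_counts` and the in-memory `sold_counts` dictionary (a stand-in for the table) are all invented names. One loader call fetches every record, the result is swapped in under a lock, and page requests read only from memory.

```python
import threading


class RefreshingCache:
    """Keeps a full in-memory copy of a table, refreshed on a timer."""

    def __init__(self, load_all, interval_seconds):
        self._load_all = load_all
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._data = {}
        self._stop = threading.Event()
        self.refresh()  # populate once before serving any request

    def refresh(self):
        fresh = self._load_all()  # one query for all records, outside the lock
        with self._lock:
            self._data = fresh  # swap the whole snapshot in one step

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def start(self):
        """Start the background thread that refreshes every interval."""
        def loop():
            # wait() returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval):
                self.refresh()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()


# Stand-in for the database table: items sold per vendor.
sold_counts = {"vendor_1": 3}
load_calls = []


def load_sold_counts():
    load_calls.append(1)
    return dict(sold_counts)


cache = RefreshingCache(load_sold_counts, interval_seconds=60)
```

With this shape the database sees one query per interval per web server, regardless of how many users are logged on; the price is that readers can be up to one interval behind.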
The above technique may not work for your requirements, but my point is that the higher the need for fresh data, the more design/engineering work you will need to make sure your system isn't too impacted by the freshness requirement.
Hope this helps.

To Multi-Tenant, or Not To Multi-tenant

I have a difficult database design decision to make regarding multi-tenancy for the growing number of branches of my client's web-based CRM, which I actively maintain.
I made the decision early on to use separate applications with separate databases for each branch, because it was the simplest way to cater for three different branches with disparate data and code requirements. I also wanted to avoid managing Tenant IDs in every query, like I had to with the legacy Classic ASP (cringe) application I built in 2007...the horror.
But now the data requirements for branches are converging and as the business expands, I need to be able to roll out new branches quickly and share global product SKUs.
Since tables and views are the same for all branches and better ORM tools are now available to manage multi-tenant applications, I wonder if it would be better to have a shared database for multiple branches.
Considerations for a centralised database:
Global product SKUs
Simplified inventory requisitions
Easier to back up
Deploy once instead of for every branch
Considerations against a centralised database:
Easier to differentiate branch requirements with separate DBs
Modular deployment (one downed branch doesn't break all)
Harder to manage and develop for shared DB
I have to re-design invoice numbering (sequence generated by seed)
Fewer WHERE clauses everywhere (separate DBs need no tenant filter)
Restoring one broken branch has plenty of implications for other branches
It is unlikely there will ever be as many as 10 branches. Right now there are 3.
Developers with real-world experience in this area, what would you do in my situation? Keep apps & DBs separate, or combine into one giant system?
Edit: Great Microsoft article on multi-tenancy pros and cons. I should note that data isolation between branches is not a major issue.

Bite the bullet and merge them. Add your tenant ID where it needs to be, and change your queries.
For customizations, look into a plugin type architecture that would allow you to deploy specific screens for particular clients.
We have a software product that is built in just such a fashion. Sometimes it's deployed on a client site, sometimes we host it. For all intents and purposes it is an order of magnitude easier to deal with a single code base that has client specific extensions than dealing with multiple branches of the code.
For one, when we fix a problem, we fix it for everyone. Sure, if we break it, we break it for everyone but that's what unit tests are for. And it is a heck of a lot easier to maintain a set of unit tests against one code base than it is to maintain them for multiple branches.
We've been doing multi-tenant for over 10 years and not once have I looked back. Generally speaking, queries aren't that different if you are already security conscious in verifying that the person retrieving the record is actually allowed to get it.
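One way to keep the tenant filter from being forgotten is to route every query through a small wrapper that is bound to a single tenant. The sketch below is Python with SQLite; the `TenantScope` class and the `customers` table are hypothetical, made up for the example rather than taken from our product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, name TEXT)"
)
conn.executemany(
    "INSERT INTO customers (tenant_id, name) VALUES (?, ?)",
    [(1, "Acme"), (1, "Birch"), (2, "Cobalt")],
)
conn.commit()


class TenantScope:
    """All data access for one tenant; every query carries the tenant id."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def customers(self):
        cursor = self._conn.execute(
            "SELECT name FROM customers WHERE tenant_id = ? ORDER BY name",
            (self._tenant_id,),
        )
        return [row[0] for row in cursor]

    def get_customer(self, customer_id):
        # Filtering on both id and tenant_id means a guessed id from
        # another tenant simply comes back as "not found".
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ? AND tenant_id = ?",
            (customer_id, self._tenant_id),
        ).fetchone()
        return row[0] if row else None


branch_one = TenantScope(conn, 1)
branch_two = TenantScope(conn, 2)
```

This is the same check you would already be doing to verify that a user may see a record, just applied at the tenant level.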
I disagree with the issues brought up by Corbin. The one around versioning should already be handled by having an attribute based security structure in place. That way you can turn things on/off via user or tenant configuration. Also, I find it very rare that client A doesn't want the same new feature that client B asked for.
The second one about data mingling is also a non-issue. Just look at salesforce.com or any of the other large scale sites. They absolutely use a multi-tenant architecture and judging by the sheer number of clients that use them this doesn't seem to be a problem. The main thing here is being able to ensure to your clients that their data is secured.

If you're talking about 10 branches, multi-tenancy seems like a big cost with little benefit.
There are complications with multi-tenancy you don't mention:
Versioning becomes difficult. Clients X, Y, and Z may want a new feature while clients A, B, and C don't. A multi-tenant app makes accommodating everyone difficult, especially if a new feature requires database schema changes. It's not impossible, it's just more difficult.
Some clients are very uncomfortable with their data mingling in the same tables as other clients. Even though we know better, it feels like a security risk to them. Legal departments hate it. In addition, if you ever dump raw data for a client, a shared database requires caution.
You can eliminate a few of your pain points with better practices:
Automate deployment. This should make it easier to add a new client or upgrade/downgrade an existing client. Database maintenance (backups, rebuilding indexes) should be set up automatically as well.
Store shared data (SKUs, inventory) in a central database and have every application instance access it either directly or through a service.
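The "central database for shared data" option can be sketched like this. The example uses Python and SQLite, where `ATTACH DATABASE` plays the role that cross-database queries or a shared service would play on a real database server; the file names, the `skus` and `stock` tables and the helper functions are all assumptions for illustration.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
shared_path = os.path.join(workdir, "shared.db")

# The central database holds the global product SKUs once.
shared = sqlite3.connect(shared_path)
shared.execute("CREATE TABLE skus (sku TEXT PRIMARY KEY, description TEXT)")
shared.executemany(
    "INSERT INTO skus VALUES (?, ?)", [("SKU-1", "Widget"), ("SKU-2", "Gadget")]
)
shared.commit()
shared.close()


def open_branch(name):
    """Open a branch's own database and attach the shared one alongside it."""
    conn = sqlite3.connect(os.path.join(workdir, f"{name}.db"))
    conn.execute("ATTACH DATABASE ? AS shared", (shared_path,))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock (sku TEXT PRIMARY KEY, quantity INTEGER)"
    )
    return conn


def stock_report(conn):
    """Join branch-local stock against the centrally managed SKU list."""
    return conn.execute(
        "SELECT s.sku, k.description, s.quantity "
        "FROM stock AS s JOIN shared.skus AS k ON k.sku = s.sku "
        "ORDER BY s.sku"
    ).fetchall()


north = open_branch("north")
north.execute("INSERT INTO stock VALUES ('SKU-2', 5)")
north.commit()
south = open_branch("south")
```

Each branch keeps its own data and schema freedom, while a new SKU added to the shared database is visible to every branch at once.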
Don't get me wrong, one of the more interesting apps I worked on was multi-tenant. There can be huge benefits, but you'll more likely see them with thousands of clients versus ten.

Honestly, this is a business question. Either you deliver more customized features to a smaller user group in a multi-tenant setup, but with more IT overhead. That is, you will need more people and hardware (management reads this as money) but deliver greater flexibility.
Or you are in a one GIANT Borg situation, where you lower your IT overhead (again, people and things; to management, money) but your end users have to accept less flexibility in their software. All bugs are problems for all users, so big ones get whacked fast. However, new features impact all users as well, so they happen more slowly.
If you personally have the juice to make this call and the business just has to listen to what you say, or you can nudge management one way or another I'd suggest asking YOURSELF a series of questions about which scenario you prefer:
A) Do you want to have more people managing this and share salary/responsibility?
B) To the best of your knowledge is there going to be a 4th user group soon?
C) How long do you want to stay at this company?
If you answer yes to the first two, then you probably want multi-tenant.

I work in a situation where, for regulatory/legal reasons, we have to keep each client's data in a separate database. However, there is certain information that must be shared, mostly related to things like a lookup table for which client's URL corresponds to which database. Also, a client can choose to have multiple databases if they wish to separate their data in some logical way. So, for each of our products, we really have three types of databases:
ApplicationData, which has just a few tables that contain information about the clients themselves, like which MasterData database (see below) to use when reached by a certain URL and which features are available to that client. Each product has just one ApplicationData, no matter how many different clients are using that product.
MasterData, which contains client-specific information such as users, roles, and permissions (in our case, the tables that aspnet_regsql creates are here). Among the permissions specified here are which ClientData databases are available to a given user (see below). The schema for all MasterData databases (for the same product) is the same.
ClientData, which contains the data with which the user interacts. In one product, this is data that the client can search based on a large number of criteria, create reports about, etc. In another product, this contains the dynamic data that a client can upload so that other users can contact people to take surveys over the phone, etc. The schema for all ClientData databases for the same product is the same.
Now, one caveat: We actually use the same schema, and often the same actual database, for MasterData and ClientData. This is for historical reasons, as the ability to allow a client to have one authentication database (MasterData) corresponding to a number of ClientData databases is a relatively new feature that only applies to one of our products. Also, this structure simplifies deployment, since most clients only use one ClientData database. However, MasterData and ClientData have separate entity models under Entity Framework in our projects, and we have to ensure that there are no direct relationships between MasterData and ClientData such as foreign keys.
This setup works pretty well for us. One major advantage is that there is no problem with putting different ClientData databases on different servers. This helps greatly with load balancing, and it provides a natural way to partition data. We can essentially offer a client with a huge amount of data a dedicated database server if they are willing to pay for it.
One more thing that has really helped us in this situation is Red Gate's tooling, specifically tools like Multi-Script, SQL Source Control, and Schema Compare. When we upgrade something and the schema changes, we have to deploy the changes to all the relevant databases. These tools have more than paid for themselves in time saved. Note that I have no affiliation with Red Gate other than as a satisfied user.
Edit: (in response to comment)
ApplicationData is one database per product. The three web-based products we have use the same schema for ApplicationData, since they record basically the same types of information. However, there is no reason it would have to stay that way. The ApplicationData databases are all on the same server. One of the tables in ApplicationData points to the correct server and database name for the client's MasterData, so MasterData for a given client can reside on any server.
MasterData has server and database name information for each ClientData database, so again, the databases can reside on any server. In practice, for now, we only have two production database servers total for these products. The MasterData schemas are similar across products, but I do not think they are exactly the same (I would have to check). Each client has its own MasterData. If a client purchases multiple products, there is a MasterData for each product for that client; the products interact in other ways (through web services, basically) if a client has purchased that feature (or requests custom development of such a feature). ClientData for a given product always has the same schema.
So, in summary:
ApplicationData is per product and happens to have the same schema in each product.
MasterData is per client for a product.
There are one or more ClientData instances for a client within a product.
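The lookup chain can be sketched as two dictionary lookups. This is Python with plain dictionaries standing in for the ApplicationData and MasterData tables; the host names, server names, database names and the `resolve_client_databases` function are all invented for the example and do not reflect our real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbLocation:
    server: str
    database: str


# ApplicationData (one per product): which MasterData serves which URL.
application_data = {
    "clienta.example.com": DbLocation("sql01", "Master_ClientA"),
    "clientb.example.com": DbLocation("sql02", "Master_ClientB"),
}

# MasterData (one per client): which ClientData databases each user may use.
master_data = {
    DbLocation("sql01", "Master_ClientA"): {
        "alice": [
            DbLocation("sql01", "Client_A_Main"),
            DbLocation("sql02", "Client_A_Archive"),
        ],
        "bob": [DbLocation("sql01", "Client_A_Main")],
    },
    DbLocation("sql02", "Master_ClientB"): {
        "carol": [DbLocation("sql02", "Client_B_Main")],
    },
}


def resolve_client_databases(host, username):
    """URL -> MasterData location -> ClientData locations for this user."""
    master = application_data.get(host)
    if master is None:
        raise LookupError(f"no client is registered for host {host!r}")
    return master_data[master].get(username, [])
```

Because each level stores a server name as well as a database name, moving one client to a dedicated server is a data change rather than a code change.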
I did oversimplify slightly in that only one of our products supports multiple ClientData instances per client. For a second product, that will probably be implemented eventually. For a third product, it would make no sense at all as a feature and will likely just remain as is.
I hope that answers your question!

Well, if the tendency is towards sharing information and data among different branches you're probably better off having one central database.
Otherwise the hoops you'll have to go through to achieve the ability to share data will be far worse than the extra WHERE clauses needed for a shared DB.
You could, of course, have a DB per branch and an extra database (a fourth database as of now) as a centralized storage for the information that needs sharing. Although you'd have to see if the over-complication makes this a best or worst of both worlds solution :)

If we're talking about CRM, then what are the chances of one customer being in multiple databases? If there's even the slightest chance of you being asked to combine customer details across branches then I'd definitely go with one centralised database.

IMO decentralization is becoming a tenet of maintainable and scalable design. The only centralized database I use is for authentication, which I'm currently growing into a decentralized database for authorization. That way all authorization can stay at the same edge where the application physically sits, with no network traversals, since authorization is not a great candidate for caching.
Reading that you're specifically interested in multiple branches of the same application as opposed to truly disparate applications, it sounds like a great option would be to build your database around a seeding process (Entity Framework supports this). That would allow you to just deploy your new branch code to ASP.NET, and then, during the initial build-up of the database when the tables are physically created, poll the "blessed" server and dump all needed data to the edge server.
After this you would need some replication setup if new products are being added to the primary data store and those are expected to make it to each edge store. You could accomplish this with direct replication of your database or look at tools like the Microsoft Sync framework.

You may think today that you will only have a few customers, but a few years from now you may realize that the product has the potential to be sold to hundreds of customers. If that happens you will regret that you used a single-tenant approach.
Compare the costs of:
Converting a production system from single-tenant to multi-tenant where databases are populated with customer data
Developing a multi-tenant system despite thinking you won't need the benefits
Converting a production system is a daunting and very expensive task.
Using the second approach may cost you more initially, but it does give you a very valuable option to be able to add more customers in the future at low cost. The price of that option could be worth paying.

SQLite: use it for websites, but not for client/server apps?

After reading this question and the suggested link explaining when it is more appropriate to use SQLite vs. another DB, one simple thing is still unclear to me, and I hope someone can clarify it.
They say:
Situations Where SQLite Works Well
Websites
SQLite usually will work great as the database engine for low to medium traffic websites...
...
Situations Where Another RDBMS May Work Better
Client/Server Applications...
If you have many client programs accessing a common database over a network...
Isn't a website also a client/server app?
I mean, I don't understand: a website is exactly a situation where I have many client programs (users with their web browsers) concurrently accessing a common DB via one server application.
Just to keep it simple: at the end of the day, is it possible for instance to use this SQLite for an ecommerce site or an online catalog or a CMS site with about 1000 products/pages?

The users' web browsers don't directly access the database; the web application does. And normally the request/response cycle for each page the user views will be very fast, usually lasting a fraction of a second.
IIRC, a transaction in SQLite locks the whole database file, meaning that if a web app request requires a blocking transaction, all traffic will effectively be serialized. This is fine for a low-to-medium traffic website, because many requests per second can still be handled.
In a client-server database application, however, multiple users may need to keep connections open for longer periods of time, and may also need to perform transactions. This is far less of a problem for bigger RDBMS systems because locking can be performed in a more fine-grained way.

SQLite can allow multiple client reads but only single client write. See: https://www.sqlite.org/faq.html
Client/server is when multiple clients do simultaneous writes to the database, such as order entry where there are multiple users simultaneously inserting and updating information, or a multi-user blog where there are multiple simultaneous editors.
A website, in the case of read-only, is not client/server but rather simply a server with multiple requests. In many cases, a website is heavily cached and the database is not even accessed, or rarely.
In the case of a slightly used ecommerce website, say a few simultaneous shoppers, this could be supported by SQLite, or by MySQL. Somewhere there is a line where performance is better for a highly-concurrent database as opposed to SQLite.
Note that the number of products/pages is not a great way to determine the requirement for MySQL over SQLite, rather it is the number of concurrent users, and at what point their concurrent behavior experiences slowness due to waiting for locks to clear.
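The "multiple readers, single writer" behaviour is easy to see with two connections to the same file. The sketch below uses Python's `sqlite3` module; the `orders` table and the variable names are made up. It assumes SQLite's default rollback-journal mode and sets `timeout=0` so that a blocked writer fails immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shop.db")

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
# timeout=0: report "database is locked" at once rather than retrying.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
other = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
writer.execute("INSERT INTO orders (item) VALUES ('book')")

# The first connection opens a write transaction and holds it.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO orders (item) VALUES ('lamp')")

# A second connection can still read; it sees only committed rows.
rows_seen_by_reader = other.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]

# A second writer is refused while the first write transaction is open.
try:
    other.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError:
    second_writer_blocked = True

writer.execute("COMMIT")

# Once the first writer commits, the second one can proceed.
other.execute("BEGIN IMMEDIATE")
other.execute("INSERT INTO orders (item) VALUES ('desk')")
other.execute("COMMIT")
total_rows = writer.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]
```

In a real web app you would normally leave a non-zero timeout so that short write transactions queue up behind each other; the waiting only becomes noticeable once writes are frequent or long-running.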

A website isn't necessarily a client server application in the context of use.
I think when they say website, they mean that the web application will directly manage the database. That is, the database file will live within the web site and will not be accessed via any other means. (A single point of access, put simply.)
In contrast, a client/server app may have the web site accessing the data store as well as another web site, a SOAP client or even a smart client. In this context, you have multiple clients accessing one database (server). This is where the web site would become (yet another) client.
Another aspect to consider when contrasting the two is the percentage of writes compared to reads. I think SQLite will perform happily when there is little writing going on compared to the amount of reading. SQLite, I understand, doesn't do well in a multiple-write scenario. It's intended for a single process (a handful?) to be manipulating it.

I mainly use SQLite only in embedded applications (iOS, Android). For larger, more complex websites (like the one you're describing) I would use something like MySQL.
