I have an application which works well on a single server. My customer now wants to move it to a load-balanced environment. Which things are likely to bite me when doing this?
Currently I know of:
Session state, and
Machine key.
Both of these are described here, for example, so I'm looking for things beyond these.
There are similar questions, but the first addresses load balancing in general, whereas I'm looking for a migration guide, and the second addresses a specific problem.
One thing that you might experience is an increased load on your database server. If you implement any server-side caching of data, you will be used to your site hitting the database once for a given dataset, caching the data, and then not going back to the database until the cache times out. This is a good strategy for commonly accessed data.
In a load-balanced environment, subsequent requests might go to different servers, so the database will be hit twice for the same data, or more if you have more than two servers. This is not bad in itself, but you would be well advised to keep an eye on it: if the database starts queueing requests, the benefits of running a web farm might be negated. As always, benchmark, profile and run tests.
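To make the effect concrete, here is a minimal Python simulation (not ASP.NET code) in which each web server keeps its own in-process cache. The `Database` and `WebServer` classes and the round-robin routing are hypothetical stand-ins; the point is only to count how often the database is queried for the same dataset.

```python
import itertools


class Database:
    """Hypothetical stand-in for the real database; counts queries."""

    def __init__(self) -> None:
        self.query_count = 0

    def load(self, key: str) -> str:
        self.query_count += 1
        return f"data for {key}"


class WebServer:
    """A web server with its own private, in-process cache."""

    def __init__(self, name: str, db: Database) -> None:
        self.name = name
        self.db = db
        self.cache: dict[str, str] = {}

    def handle(self, key: str) -> str:
        if key not in self.cache:  # a miss on *this* server only
            self.cache[key] = self.db.load(key)
        return self.cache[key]


def simulate(num_servers: int, num_requests: int) -> int:
    """Send requests for one key round-robin across servers; return DB hits."""
    db = Database()
    servers = [WebServer(f"web{i}", db) for i in range(num_servers)]
    rotation = itertools.cycle(servers)  # naive round-robin balancer
    for _ in range(num_requests):
        next(rotation).handle("popular-products")
    return db.query_count


print(simulate(num_servers=1, num_requests=100))  # 1
print(simulate(num_servers=4, num_requests=100))  # 4
```

Each server misses once, so the database load for a given dataset grows with the number of servers, and the cycle repeats every time the per-server caches expire.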
One way around this is to use sticky sessions. This is a router-based approach: once a user session is established, all requests from that user are routed to the same server. This has its drawbacks, most notably a potential decrease in the efficiency of the load balancing, not to mention the problems when you lose a server. Furthermore, it will only help when you are caching mostly user-specific data, such as paged search results, which are only cached for a short time.
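Load balancers usually implement stickiness with a cookie or by hashing something about the client; the Python sketch below hashes a hypothetical session ID as an illustration of the idea, not of any particular router. It also shows the drawback mentioned above: when a server leaves the pool, users are remapped and whatever that server cached for them is lost.

```python
import hashlib


def pick_server(session_id: str, servers: list[str]) -> str:
    """Map a session ID to a server deterministically (sticky routing)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


pool = ["web1", "web2", "web3"]
sessions = [f"session-{n}" for n in range(1000)]

# While the pool is stable, a session always lands on the same server.
before = {s: pick_server(s, pool) for s in sessions}
assert all(pick_server(s, pool) == before[s] for s in sessions)

# Lose web3: everyone on web3 must move, and with simple modulo hashing
# many users on web1/web2 are remapped as well.
after = {s: pick_server(s, ["web1", "web2"]) for s in sessions}
moved = sum(before[s] != after[s] for s in sessions)
print(f"{moved} of {len(sessions)} sessions changed server")
```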
Another solution is to use a distributed, in-memory cache, such as memcached, Velocity or NCache. There are others as well, but these are the ones that I have worked with.
Another thing to look out for, unrelated to the above, is how you deal with file uploads from your users. Many sites allow users to upload files. If such files are not already saved to a central store, such as a database or a common file share, they will need to be.
Look at this article, which describes some tips for preparing an ASP.NET application for load balancing.
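A minimal Python sketch of the idea, assuming the central store is a directory every web server can reach (for example a mounted network share); `SHARED_UPLOAD_ROOT`, `save_upload` and `open_upload` are hypothetical names. The file goes to the shared location and the application keeps only a server-independent ID, never a path on the local disk of whichever server received the upload.

```python
import os
import uuid
from pathlib import Path

# Hypothetical: a directory mounted on every web server (e.g. a network share).
SHARED_UPLOAD_ROOT = Path(os.environ.get("SHARED_UPLOAD_ROOT", "/mnt/shared/uploads"))


def save_upload(data: bytes, original_name: str, root: Path = SHARED_UPLOAD_ROOT) -> str:
    """Store an uploaded file centrally and return a server-independent ID."""
    root.mkdir(parents=True, exist_ok=True)
    suffix = Path(original_name).suffix  # keep the extension, drop the user's path
    file_id = f"{uuid.uuid4().hex}{suffix}"
    tmp_path = root / (file_id + ".part")
    tmp_path.write_bytes(data)  # write completely before publishing
    tmp_path.replace(root / file_id)  # rename into place (atomic on POSIX)
    return file_id


def open_upload(file_id: str, root: Path = SHARED_UPLOAD_ROOT) -> bytes:
    """Any server in the farm can read the file back by its ID."""
    if Path(file_id).name != file_id:
        raise ValueError("invalid file id")  # refuse path traversal
    return (root / file_id).read_bytes()
```

Storing the bytes in the database is the other option mentioned above; either way, the upload must not live on only one server.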
I have an application that needs to send data to a cloud database (DynamoDb).
The app runs on a computer that can lose internet connectivity or be switched off at any time, but I must ensure that all data eventually gets to the cloud database.
I can assume the application will eventually be switched on, and will eventually get internet access back.
The app is written in VB.NET.
What are some schemes for achieving this, and are there any ready-made products that already achieve this?
You could implement a write-through cache using a local DynamoDB instance (or even using SQLite). But without specific details about what kind of data you'd be storing in the database, and what data should be available "offline", it's hard to say exactly how you should structure your application. You'll definitely not want to keep everything local, unless the overall volume of data is really small.
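One common scheme is store-and-forward (sometimes called an outbox): every record is first committed to a local SQLite file, and a loop uploads pending rows and marks them sent only after the upload succeeds. The Python sketch below uses the standard `sqlite3` module; `upload` is a hypothetical callable standing in for the real DynamoDB write (for example through the AWS SDK). It should be idempotent, because a crash between a successful upload and the local `UPDATE` means the row is sent again.

```python
import json
import sqlite3
from typing import Callable


def open_outbox(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, record: dict) -> None:
    """Record the item locally; it is durable once the commit returns."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),))


def flush(conn: sqlite3.Connection, upload: Callable[[dict], None]) -> int:
    """Upload pending rows in order; stop at the first failure. Returns rows sent."""
    sent = 0
    rows = conn.execute("SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            upload(json.loads(payload))  # hypothetical remote write
        except OSError:  # assumes connectivity errors surface as OSError
            break  # offline: keep the row and retry later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

You would call `flush` at startup and periodically (or when connectivity returns), and adapt the `except` clause to whatever exceptions your cloud client actually raises. Rows marked as sent can be purged later.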
Then there is the problem of resolving conflicts that may occur during network partitions (i.e. a client goes offline and makes some database modifications while other clients also modify the database; these changes need to be reconciled, and it's up to you and your users to determine how).
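As one illustration of a reconciliation policy (not the only one, and often not the right one), here is a last-writer-wins merge in Python: each record carries an update timestamp, the newer version of each key wins, and ties are broken by a client ID so every replica picks the same winner. `Version` and `merge_lww` are hypothetical names. Last-writer-wins silently discards the losing edit and relies on reasonably synchronized clocks, which is exactly why many applications need field-level merges or user review instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    updated_at: float  # e.g. UTC epoch seconds recorded by the writing client
    client_id: str  # tie-breaker so all replicas choose the same winner


def merge_lww(local: dict[str, Version], remote: dict[str, Version]) -> dict[str, Version]:
    """Last-writer-wins merge of two replicas keyed by record ID."""
    merged = dict(remote)
    for key, mine in local.items():
        theirs = merged.get(key)
        if theirs is None or (mine.updated_at, mine.client_id) > (theirs.updated_at, theirs.client_id):
            merged[key] = mine
    return merged
```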
It's not a simple problem to solve.
We are preparing to scale the API side of an API-heavy web application. My (technically savvy) client proposes a rather unconventional approach to this: instead of balancing the load to several app servers, which would talk to a sharded database, he wants us to:
“shard the app servers”, putting both app server code and db on each physical server, so that the app server only connects to its own db shard;
have the app servers talk to each other when they need to access other shards (instead of talking to another shard's DB directly);
have the API client pick an app shard itself (on the client side, based on some stable hash) and talk directly to it.
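For concreteness, the client-side selection in the last point might look like the Python sketch below: hash a stable key (here a hypothetical user ID) and take it modulo the number of app shards. The shard URLs are made up. A cryptographic hash is used because Python's built-in `hash()` for strings is randomized per process and therefore not stable across clients.

```python
import hashlib

# Hypothetical list of app shards, which every API client would have to know.
APP_SHARDS = ["https://app0.example.com", "https://app1.example.com", "https://app2.example.com"]


def shard_for(user_id: str, shards: list[str] = APP_SHARDS) -> str:
    """Pick an app shard from a stable hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()  # same result in every process
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]
```

Every client now embeds both the shard list and the hashing rule, and that knowledge is what has to change whenever the shard layout does.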
The underlying reasoning is that this is the most natural thing to do, and that it would allow us to move to a multisite distributed system in the future.
(The stack is PHP + Node.js on MySQL, although at this point a transition to MongoDB is considered too.)
Now, I don't see huge problems with it at first glance. It might get somewhat cumbersome to code these server-to-server interactions, but then it will surely have its own benefits. Basically, I'm at a loss as to whether this is a good idea or not.
What pros and cons come to your mind? I'm looking for technical issues and advantages here. Thanks!
This is just plain bad for many reasons.
The API client should not know which app shard to talk to. This will limit you in ways you probably can't foresee now, but may/will become a problem in the future. The API client should play dumb so you can route requests appropriately if an app server dies, changes, gets sharded again etc.
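A quick way to see why baking the shard choice into clients is brittle: with simple modulo hashing, adding a single shard changes the target for most keys, so every deployed client has to be updated in lockstep while the data is moved. The Python sketch below counts how many of 10,000 hypothetical user IDs move when going from 3 to 4 shards; for uniformly distributed hashes you should expect roughly 75%. Consistent hashing reduces the movement, but clients would still need to know the new layout.

```python
import hashlib


def shard_index(key: str, num_shards: int) -> int:
    """Stable modulo hashing, as a client-side router might do it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user-{n}" for n in range(10_000)]
moved = sum(shard_index(k, 3) != shard_index(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys change shard when going from 3 to 4 shards")
```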
What happens if your app code or database architecture is slow? (Not both at the same time, just one.) Now you have a db shard slowing down an app shard.
Your db+app shards will need to keep both app code+memory and db code+memory in RAM. This means the CPUs will spend more time swapping code and memory in and out to perform both sets of tasks.
I'm finding it hard to put into words, but this type of architecture screams 'bad coupling' and 'no separation of concerns' (probably not the right terminology, but I hope you understand what I mean). You are putting two distinctly different types of applications (app server and database) onto one box. Updating them and routing around failed instances will be a management nightmare.
I hate to argue my point this way, but a lot of very smart people have dealt with these problems before and I've never heard of this type of architecture. There's probably a reason for it. Not to mention there's a lot of technology and resources out there that can help you handle traditional sharding and load balancing of app and database servers. If you go with your client's suggested architecture you're on your own.
After reading this question and the suggested link explaining when it is more appropriate to use SQLite vs. another DB, one simple thing is still unclear to me, and I hope someone can clarify it.
They say:
Situations Where SQLite Works Well
Websites
SQLite usually will work great as the database engine for low to medium traffic websites...
...
Situations Where Another RDBMS May Work Better
Client/Server Applications...
If you have many client programs accessing a common database over a network...
Isn't a website also a client/server app?
I mean, I don't understand: a website is exactly a situation where I have many client programs (users with their web browsers) concurrently accessing a common DB via one server application.
Just to keep it simple: at the end of the day, is it possible, for instance, to use SQLite for an e-commerce site, an online catalog or a CMS site with about 1000 products/pages?
The users' web browsers don't directly access the database; the web application does. And normally the request/response cycle for each page the user views will be very fast, usually lasting a fraction of a second.
IIRC, a write transaction in SQLite locks the whole database file, meaning that if a web app request requires a blocking transaction, all writes will effectively be serialized. This is fine for a low-to-medium traffic website, because many requests per second can still be handled.
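As a rough, back-of-the-envelope bound: if writes are fully serialized and each write transaction holds the lock for an average time t_w, the database can commit at most about 1/t_w writes per second. The 5 ms below is an arbitrary example value, not a measurement:

```latex
\text{writes/s}_{\max} \approx \frac{1}{t_w},
\qquad t_w = 5\,\text{ms} \;\Rightarrow\; \frac{1}{0.005\,\text{s}} = 200\ \text{writes/s}
```

Page views that only read are not capped by this figure in the same way, which is why mostly-read websites fit SQLite well.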
In a client-server database application, however, multiple users may need to keep connections open for longer periods of time, and may also need to perform transactions. This is far less of a problem for bigger RDBMS systems because locking can be performed in a more fine-grained way.
SQLite allows multiple clients to read at the same time, but only one client to write at a time. See: https://www.sqlite.org/faq.html
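The readers/writer rule is easy to observe with Python's built-in `sqlite3` module. In the sketch below (default rollback-journal mode, arbitrary file name), one connection opens a write transaction. A second connection can still read, but its own attempt to start a write transaction fails immediately with "database is locked", because `timeout=0` tells it not to wait.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: we issue BEGIN/COMMIT ourselves; timeout=0: don't wait for locks.
writer = sqlite3.connect(path, isolation_level=None, timeout=0)
other = sqlite3.connect(path, isolation_level=None, timeout=0)

writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
writer.execute("BEGIN IMMEDIATE")  # take the write lock now
writer.execute("INSERT INTO orders (item) VALUES ('widget')")

# A second client can still read (it sees the last committed state)...
rows_seen = other.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# ...but cannot start its own write transaction while the first one is open.
try:
    other.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError as exc:
    second_writer_blocked = "locked" in str(exc)

writer.execute("COMMIT")
print(rows_seen, second_writer_blocked)
```

In a real web app you would leave a non-zero `timeout` (Python's default is 5 seconds) so that writers queue for the lock instead of failing.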
Client/server is when multiple clients do simultaneous writes to the database, such as order entry where multiple users are simultaneously inserting and updating information, or a multi-user blog with several simultaneous editors.
A website, in the read-only case, is not client/server but rather simply a server handling multiple requests. In many cases, a website is heavily cached and the database is rarely or never accessed.
In the case of a lightly used e-commerce website, say a few simultaneous shoppers, this could be supported by SQLite or by MySQL. Somewhere there is a line beyond which a database designed for high concurrency performs better than SQLite.
Note that the number of products/pages is not a great way to decide whether you need MySQL over SQLite; rather, it is the number of concurrent users, and the point at which their concurrent activity becomes slow because of waiting for locks to clear.
A website isn't necessarily a client server application in the context of use.
I think when they say website, they mean that the web application will directly manage the database. That is, the database file will live within the website and will not be accessed by any other means (a single point of access, put simply).
In contrast, a client/server app may have the website accessing the data store, as well as another website, a SOAP client or even a smart client. In this context, you have multiple clients accessing one database (server), and the website becomes (yet another) client.
Another aspect to consider when contrasting the two is the percentage of writes compared to reads. I think SQLite will perform happily when there is little writing going on compared to the amount of reading. SQLite, as I understand it, doesn't do well in a scenario with multiple writers. It's intended for a single process (or a handful?) to be manipulating it.
I mainly use SQLite only in embedded applications (iOS, Android). For larger, more complex websites (like the one you're describing) I would use something like MySQL.
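If you do use SQLite for a read-heavy site, one setting worth knowing is write-ahead logging (WAL) mode, in which readers do not block the writer and the writer does not block readers (there is still only one writer at a time). A minimal Python sketch with the standard `sqlite3` module; the file name is arbitrary:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "site.db")
conn = sqlite3.connect(path)

# WAL mode is persistent: it is recorded in the database file, so setting it once is enough.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # "wal" on a local file system
```

WAL relies on shared memory between connections on the same host, so it is not suitable for a database file on a network share.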
I build ASP.NET websites (hosted under IIS 6 usually, often with SQL Server backends and forms authentication).
Clients sometimes ask if I can check whether there are people currently browsing (and/or whether there are users currently logged in to) their website at a given moment, usually so they can safely do a deployment (they want a hotfix, for example).
I know the web is basically stateless so I can't be sure whether someone has closed the browser window, but I imagine there'd be some count of not-yet-timed-out sessions or something, and surely logged-in-users...
Is there a standard and/or easy way to check this?
Jakob's answer is correct but does rely on installing and configuring the Membership features.
A crude but simple way of tracking users online would be to store a counter in the Application object. This counter could be incremented/decremented upon their sessions starting and ending. There's an example of this on the MSDN website:
Session-State Events (MSDN Library)
Because the default Session Timeout is 20 minutes the accuracy of this method isn't guaranteed (but then that applies to any web application due to the stateless and disconnected nature of HTTP).
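The pattern is just a shared counter protected against concurrent updates. In ASP.NET it lives in Global.asax (`Session_Start` / `Session_End`, wrapped in `Application.Lock()` / `Application.UnLock()`); the Python sketch below shows the same logic with a lock, using hypothetical hook names. Keep in mind that `Session_End` only fires with the in-process (InProc) session-state mode, so this counter will not work as-is once session state moves out of process, for example in a web farm.

```python
import threading


class OnlineCounter:
    """Approximate count of live sessions, like an Application-level counter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def on_session_start(self) -> None:  # analogous to Session_Start
        with self._lock:
            self._count += 1

    def on_session_end(self) -> None:  # analogous to Session_End (fires on timeout)
        with self._lock:
            self._count = max(0, self._count - 1)

    @property
    def current(self) -> int:
        with self._lock:
            return self._count
```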
I know this is a pretty old question, but I figured I'd chime in. Why not use Google Analytics and view their real time dashboard? It will require minor code modifications (i.e. a single script import) and will do everything you're looking for...
You may be looking for the Membership.GetNumberOfUsersOnline method, although I'm not sure how reliable it is.
Sessions, suggested by other users, are a basic way of doing things, but are not too reliable. They can also work well in some circumstances, but not in others.
For example, if users are downloading large files, watching videos or listening to podcasts, they may stay on the same page for hours (unless the requests for the binary data are also tracked by ASP.NET), but they are still using your website.
Thus, my suggestion is to use the server logs to detect if the website is currently used by many people. It gives you the ability to:
See what sort of requests are done. It's quite easy to detect humans and crawlers, and with some experience, it's also possible to see if the human is currently doing something critical (such as writing a comment on a website, editing a document, or typing her credit card number and ordering something) or not (such as browsing).
See who is doing those requests. For example, if Google is crawling your website, it is a very bad idea to go offline, unless search ranking doesn't matter to you. On the other hand, if a bot has been trying for two hours to crack your website by requesting different pages, you can certainly go offline.
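As a starting point for the log-based approach, the Python sketch below reads an IIS W3C extended log, uses the `#Fields:` directive to locate the columns, and collects the distinct client IPs seen in the last few minutes, separating obvious crawlers by user agent. It assumes the site logs the `date`, `time`, `c-ip` and `cs(User-Agent)` fields (IIS writes these times in UTC); the bot keywords are a crude, hypothetical heuristic.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

BOT_HINTS = ("bot", "crawler", "spider", "slurp")  # crude heuristic, adjust as needed


def recent_visitors(lines: Iterable[str], now: datetime, window: timedelta) -> dict[str, set[str]]:
    """Return {'humans': ips, 'bots': ips} for requests within `window` of `now` (aware UTC)."""
    fields: list[str] = []
    result: dict[str, set[str]] = {"humans": set(), "bots": set()}
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        entry = dict(zip(fields, line.split()))
        stamp = datetime.strptime(f"{entry['date']} {entry['time']}", "%Y-%m-%d %H:%M:%S")
        if now - stamp.replace(tzinfo=timezone.utc) > window:
            continue  # too old to count as "currently using the site"
        agent = entry.get("cs(User-Agent)", "").lower()
        kind = "bots" if any(hint in agent for hint in BOT_HINTS) else "humans"
        result[kind].add(entry["c-ip"])
    return result
```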
Note: if a website has some critical areas (for example, while writing this long answer, I would be angry if Stack Overflow went offline a few seconds before I submitted it), you can also send regular AJAX requests to the server while the user stays on the page. Of course, you must be careful when implementing such a feature, and take into account that it will increase the bandwidth used and will not work if the user has JavaScript disabled.
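On the server side, such heartbeats can be turned into an "active right now" count by remembering when each user was last seen and counting those seen within a short window. A minimal Python sketch; the class name, the 60-second window and keying by a user or session ID are assumptions for illustration. In a web farm this state would have to live in a shared store rather than in one server's memory.

```python
import threading
import time
from typing import Optional


class ActivityTracker:
    """Tracks last-seen times from page requests or AJAX heartbeats."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window = window_seconds
        self._last_seen: dict[str, float] = {}
        self._lock = threading.Lock()

    def heartbeat(self, user_id: str, now: Optional[float] = None) -> None:
        with self._lock:
            self._last_seen[user_id] = time.monotonic() if now is None else now

    def active_users(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        with self._lock:
            # Forget users whose last heartbeat is older than the window.
            self._last_seen = {u: t for u, t in self._last_seen.items() if now - t <= self.window}
            return len(self._last_seen)
```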
You can run the netstat command and see how many active connections exist to your website's ports.
Default port for http is *:80.
Default port for https is *:443.
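If you want a number rather than eyeballing the output, the established connections can be counted from `netstat -an` text. The Python sketch below parses output in the usual Windows column layout (`Proto  Local Address  Foreign Address  State`). It is only a rough signal: one browser may hold several keep-alive connections and idle ones linger, so treat it as "is anybody there" rather than as a user count.

```python
def count_established(netstat_output: str, ports: tuple[int, ...] = (80, 443)) -> int:
    """Count ESTABLISHED TCP connections whose local port is one of `ports`."""
    count = 0
    for line in netstat_output.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0].upper() != "TCP" or parts[-1] != "ESTABLISHED":
            continue
        local_port = parts[1].rsplit(":", 1)[-1]  # handles 10.0.0.5:80 and [::1]:443
        if local_port.isdigit() and int(local_port) in ports:
            count += 1
    return count
```

On the server you could feed it the text from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`.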
When an asp.net website has about 1,000 active users, it works good.
What should I do if the website has about 100,000 active users?
How do I upgrade my ASP.NET app to support a larger number of users?
Changing the webApp's architecture?
Or buying more web servers?
I just wonder: in the real world, how do people build an ASP.NET website supporting millions of users? What app architecture does a website need to support that?
Any suggestion will be welcome.
First, make sure you're with a first rate hosting provider.
Second, download a performance profiler (I always suggest Red Gate's ANTS Performance Profiler) and profile your app. Find the bottlenecks and eliminate them. Repeat until you reach your desired performance metric.
If your application is querying a database or other web services, try to use asynchronous methods. Using async methods frees up the web server to handle many more client requests while it is waiting for a response from the database server or web service.
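The gain comes from not tying up a thread while a request waits on I/O. The same idea in Python's `asyncio` (an analogue, not ASP.NET code): ten simulated requests, each waiting 0.2 s on a database call, are served by a single thread in about 0.2 s in total, because the thread is free to serve other requests while each one waits. `fake_db_query` is a hypothetical stand-in for a real asynchronous driver call.

```python
import asyncio
import time


async def fake_db_query(n: int) -> int:
    await asyncio.sleep(0.2)  # simulated I/O wait; the event loop is free meanwhile
    return n * n


async def handle_request(n: int) -> str:
    # While this request awaits the database, the same thread serves other requests.
    result = await fake_db_query(n)
    return f"request {n}: {result}"


async def serve(num_requests: int) -> list[str]:
    return await asyncio.gather(*(handle_request(n) for n in range(num_requests)))


start = time.perf_counter()
responses = asyncio.run(serve(10))
elapsed = time.perf_counter() - start
print(len(responses), f"{elapsed:.2f}s")
```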
You say it "works good" at the moment. It's impossible to know the point at which this may change without knowing a whole lot more about the nature of your traffic, your current setup, what else runs on the server, etc. It could be that it continues to "work good" with a million users as it is.
When you need to make changes (and slowly degrading performance will alert you), that's when you need to worry. And then, as Justin says, knowing the potential bottlenecks will give you pointers as to what solution you need.
Buying more servers is one strategy; so is changing the architecture. The easiest and most cost-effective option is throwing more servers at it. It does depend a little on the current application architecture, but nothing that can't be easily overcome.
What I suggest is to load test your application. See what happens as you increase the number of active users. Who knows, it might handle 100k active users; maybe it won't, but at least you will know the tipping point.
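Dedicated load-testing tools are the usual choice, but the shape of the measurement is simple: fire requests at increasing concurrency and watch the latency percentiles. The Python sketch below uses a thread pool; `send_request` is a hypothetical callable you would replace with a real HTTP call against a test environment (for example with `urllib.request.urlopen`), and the p95 is a simple nearest-rank approximation.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def timed(send_request: Callable[[], None]) -> float:
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start


def load_test(send_request: Callable[[], None], concurrency: int, total: int) -> dict[str, float]:
    """Run `total` requests with `concurrency` workers; return latency stats in ms."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed(send_request), range(total)))
    ms = sorted(s * 1000 for s in latencies)
    return {
        "median_ms": statistics.median(ms),
        "p95_ms": ms[int(0.95 * (len(ms) - 1))],
        "max_ms": ms[-1],
    }
```

Running it for concurrency levels such as 10, 50, 100 and 200 and noting where the median and p95 start to climb sharply gives you the tipping point described above.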
In regards to what you should do, that really depends on your business needs. If your company has the $$ and this is a core product, then it makes sense to architect a robust application. If it's not, maybe throwing hardware at the problem is good enough.
It would also help if you could define an active user. Is it someone who is visiting your site and has a session? Is it 100k concurrent requests to the server...?
In terms of hardware scaling: Scaling Up or Scaling out
Software scaling - Profile your app