What's your disaster recovery plan? [closed] - asp.net

What would you recommend for an ASP.NET web application with a not-so-large SQL Server database (around 10 GB)?
I was just wondering: is it a good idea to have an Amazon EC2 instance configured and ready to host your app in an emergency?
In this scenario, what would be the best approach to keeping the database updated (log shipping? manual backup and restore?), and what is the easiest and fastest way to change the DNS settings?
Edit: the acceptable downtime would be somewhere between 4 and 6 hours; that's why I considered the Amazon EC2 option, given its lower cost compared to renting a secondary server.

Update - Just saw your comment. Amazon EC2 with log shipping is definitely the way to go. Don't use mirroring, because that normally assumes the standby database is always available. Changing your DNS should not take more than half an hour if you set your TTL that low; that would give you time to apply any pending logs. You might turn the server on once a week or so just to apply pending logs (or less often, to avoid racking up hourly costs).
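As a rough illustration of the copy step in a log-shipping job (the paths and the `ship_new_log_backups` helper are hypothetical; in practice SQL Server's built-in log shipping jobs or `RESTORE LOG` statements do the real work), a sketch might look like:

```python
import os
import shutil

def ship_new_log_backups(source_dir, dest_dir, already_shipped):
    """Copy any .trn transaction-log backups that have not been shipped yet.

    `already_shipped` is a set of file names copied on previous runs; a real
    job would persist this (or compare timestamps) between invocations.
    """
    shipped = []
    for name in sorted(os.listdir(source_dir)):
        if name.endswith(".trn") and name not in already_shipped:
            shutil.copy2(os.path.join(source_dir, name),
                         os.path.join(dest_dir, name))
            already_shipped.add(name)
            shipped.append(name)
    return shipped
```

On the standby side you would periodically restore these files WITH NORECOVERY, and only bring the database online (WITH RECOVERY) at failover time.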
Your primary hosting location should have redundancy at all levels:
Multiple internet connections,
Multiple firewalls set to failover,
Multiple clustered web servers,
Multiple clustered database servers,
If you store files, use a SAN or Amazon S3,
Every server should have some form of RAID depending on the server's purpose,
Every server can have multiple PSUs connected to separate power sources/breakers,
External and internal server monitoring software,
Power generator that automatically turns on when the power goes out, and a backup generator for good measure.
That'll keep you running at your primary location in the event of most failure scenarios.
Then have a single server set up at a remote location that is kept updated using log shipping and include it in your deployment script (after your normal production servers are updated...) A colocated server on the other side of the country does nicely for these purposes. To minimize downtime of having to switch to the secondary location keep your TTL on the DNS records as low as you are comfortable.
Of course, that much hardware is going to be expensive, so you'll need to determine what it is worth to avoid being down for 1 second, 1 minute, 10 minutes, etc., and adjust accordingly.

It all depends on what your downtime requirements are. If you've got to be back up in seconds in order not to lose your multi-billion-dollar business, then you'll do things a lot differently than if you've got a site that makes you maybe $1000/month and whose revenue won't be noticeably affected if it's down for a day.
I know that's not a particularly helpful answer, but this is a big area, with a lot of variables, and without more information it's almost impossible to recommend something that's actually going to work for your situation (since we don't really know what your situation is).

The starting point for a rock solid DR Strategy is to first work out what the true cost is to the business of your server/platform downtime.
The following article will get you started along the right lines.
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1038783.html
If you require further guidelines good old Google can provide plenty more reading.
A project of this nature requires you to collaborate with your key business decision makers and you will need to communicate to them what the associated costs of downtime are and what the business impact would be. You will likely need to collaborate with several business units in order to gather the required information. Collectively you then need to come to a decision as to what is considered acceptable downtime for your business. Only then can you devise a DR strategy to accommodate these requirements.
You will also find that conducting this exercise may highlight shortcomings in your platform's current configuration with regard to high availability, and these may need to be reviewed as a side project.
The key point to take away from all of this is that the decision about what constitutes an acceptable period of downtime is not for the DBA alone to make; rather, the DBA's role is to provide the information and expert knowledge necessary so that a realistic decision can be reached. Your task is then to implement a strategy that meets the business requirements.
Don’t forget to test your DR strategy by conducting a test scenario in order to validate your recovery times and to practice the process. Should the time come when you need to implement your DR strategy you will likely be under pressure, your phone will be ringing frequently and people will be hovering around you like mosquitoes. Having already honed and practiced your DR response, you can be confident in taking control of the situation and implementing the recovery will be a smooth process.
Good luck with your project.

I haven't worked with many different third-party tools, but I have experience with CloudEndure, and as for the replica you get, I can say it is a really high-end product. Replication is done at really tiny time intervals, which makes your replica very reliable. But since you don't need your site back up within seconds, asking them for a price quote, or going with a different vendor, might both be worth considering.

How to speed .NET MVC site deployed on AZURE? [closed]

Of course I want to reach maximum performance.
What can I do for it?
Use bundles for CSS & JS files? OK.
What kind of storage should I use? Right now it's SQL Database.
But the site and the DB are placed in different regions. The DB won't be too big; 1 GB is enough. And how do I reduce query time? Right now it's too long.
Should I turn on the "Always On" feature for my site?
Is there anything else? Is there any article to read?
Thanks in advance.
There is only so much optimization you can do - if you really want "maximum performance" then you'd rewrite your site in C/C++ as a kext or driver-service and store all of your data in memcached, or maybe encode your entire website as a series of millions of individual high-frequency electronic logic-gates all etched into an integrated circuit and hooked-up directly to a network interface...
...now that we're on realistic terms ;) your posting has the main performance-issue culprit right there: your database and webserver are not local to each other, which is a problem: every webpage a user requests is going to trigger a database request, and if the database is more than a few milliseconds away then it's going to cause problems (MSSQL Server has a rather chatty network protocol too, which multiplies the latency effect considerably).
Ideally, total page generation time from request-sent to response-arrived should be under 100 ms before users will notice your site being "slow". Considering that a webserver might be 30 ms or more from the client, that means you have approximately 50-60 ms to generate the page, which means your database server has to be within 0-3 ms of your webserver. Even 5 ms latency is too great, because something as innocuous as 3-4 database queries is going to incur a delay of at least 4 × (5 ms + DB read time). DB read time can vary from 0 ms (if the data is in memory) up to 20 ms if it's on a slow platter drive, or even slower depending on server load. That's how you can easily find a "simple" website taking over 100 ms just to generate on the server, let alone send to the client.
In short: move your DB to a server on the same local network as your webserver to reduce the latency.
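The arithmetic behind the argument above can be made concrete with a crude model (the numbers below are the illustrative ones from the text, and the sequential-query assumption is a simplification):

```python
def page_time_ms(client_rtt_ms, generate_ms, db_queries, db_rtt_ms, db_read_ms):
    """Crude model of total page time: client round trip, server work,
    plus one network round trip and one read per (sequential) DB query."""
    return client_rtt_ms + generate_ms + db_queries * (db_rtt_ms + db_read_ms)

# 4 sequential queries at 5 ms network latency and 10 ms read time each
# already cost 60 ms before the 30 ms client round trip and 20 ms of
# page generation are counted: over the 100 ms "feels slow" budget.
print(page_time_ms(client_rtt_ms=30, generate_ms=20,
                   db_queries=4, db_rtt_ms=5, db_read_ms=10))  # 110
```

With the database on the same LAN (sub-millisecond round trips), the same page comfortably fits the budget, which is the whole point of the answer.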
The immediate and simplest way to start in your conditions is to move the database and the site in the same datacenter.
Later you may think to:
INSTRUMENT YOUR CODE
Add (Azure Redis) Cache
Load balance your web site (if it is under enough load)
And everything around compacting/bundling/minimizing your code.
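The "add a cache" step usually means the cache-aside pattern: check the cache first, fall back to the database, then remember the result with an expiry. A minimal sketch, using an in-process dict as a stand-in for Azure Redis Cache and a caller-supplied `load_from_db` function (both names are illustrative):

```python
import time

_cache = {}  # stand-in for a shared cache such as Redis

def get_cached(key, load_from_db, ttl_seconds=60):
    """Cache-aside: return a cached value if it is still fresh, otherwise
    load it from the database and remember it with an expiry time."""
    entry = _cache.get(key)
    now = time.monotonic()
    if entry is not None and entry[1] > now:
        return entry[0]              # cache hit, no DB round trip
    value = load_from_db(key)        # cache miss: pay the DB latency once
    _cache[key] = (value, now + ttl_seconds)
    return value
```

With the site and DB in different regions, every avoided `load_from_db` call saves a cross-region round trip, which is where most of the win comes from.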
Hope it helps,

Is it realistic for a dedicated server to send out many requests per second?

TL;DR
Is it appropriate for a (dedicated) web server to be sending many requests out to other servers every second (naturally with permission from said server)?
I'm asking this purely to save myself spending a long time implementing an idea that won't work, as I hope that people will have some more insight into this than me.
I'm developing a solution which will allow clients to monitor the status of their server. I need to constantly (24/7) obtain more recent logs from these servers. Unfortunately, I am limited to getting the last 150 entries to their logs. This means that for busy clients I will need to poll their servers more.
I'm trying to make my solution scalable so that if it gets a number of customers, I won't need to concern myself with rewriting it, so my benchmark is 1000 clients, as I think this is a realistic upper limit.
If I have 1000 clients, and I need to poll their servers every, let's give it a number, two minutes, I'm going to be sending requests off to more than 8 servers every second. The returned result will be on average about 15,000 characters, however it could go more or less.
Bearing in mind this server will also need to cope with clients visiting it to see their server information, and thus will need to be lag-free.
Some optimisations I've been considering, which I would probably need to implement relatively early on:
Only asking for 50 log items. If we find one already stored (They are returned in chronological order), we can terminate. If not, we throw out another request for the other 100. This should cut down traffic by around 3/5ths.
Detecting which servers get more traffic and requesting their logs less commonly (i.e. if a server only gets 10 logged events every hour, we don't want to keep asking for 150 every few minutes)
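The first optimisation above could be sketched like this (the `fetch_logs(count)` API, the tuple format, and the 50/150 batch sizes are the question's hypothetical ones, not a real interface):

```python
def poll_new_entries(fetch_logs, seen_ids):
    """Ask for the newest 50 entries first; only pull the remaining 100
    if none of the first batch has been seen before.

    `fetch_logs(count)` is assumed to return the newest `count` entries,
    most recent first, each as a (log_id, message) tuple.
    """
    batch = fetch_logs(50)
    new = [e for e in batch if e[0] not in seen_ids]
    if len(new) == len(batch):          # no overlap: we may have missed some
        rest = fetch_logs(150)[50:]     # fetch the remaining 100 entries
        new += [e for e in rest if e[0] not in seen_ids]
    seen_ids.update(e[0] for e in new)
    return new
```

When the poller is keeping up, the 50-entry batch overlaps what is already stored and the second request is skipped, which is where the claimed traffic saving comes from.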
I'm basically asking if sending out this many requests per second is considered a bad thing and whether my future host might start asking questions or trying to throttle my server. I'm aiming to go shared for the first few customers, then if it gets popular enough, move to a dedicated server.
I know this involves a slight degree of opinion, so I fear it might be a candidate for closure, but I do feel there is a definite degree of factuality required in the answer that should make it an okay question.
I'm not sure if there's a networking SE or if this might be more appropriate on SuperUser or something, but it feels right on SO. Drop me a comment ASAP if it's not appropriate here and I'll delete it and post to a suggested new location instead.
You might want to read about the C10K Problem. The article compares several I/O strategies. A certain number of threads that each handle several connections using nonblocking I/O is the best approach imho.
Regarding your specific project I think it is a bad idea to poll for a limited number of log items. When there is a peak in log activity you will miss potentially critical data, especially when you apply your optimizations.
It would be way better if the clients you are monitoring pushed their new log items to your server. That way you won't miss something important.
I am not familiar with the performance of ASP.NET, so I can't say whether a single dedicated server is enough, especially since I do not know your server's specs. With a reasonably strong server it should be possible. If it turns out not to be enough, you should distribute your project across multiple servers.

How common is web farming/gardens? Should i design my website for it? [closed]

I'm running an ASP.NET website. The server both reads and writes data to a database, but it also stores some frequently accessed data directly in process memory as a cache. When new requests come in, they are processed depending on data in the cache before it's written to the DB.
My hosting provider suddenly decided to put their servers behind a load balancer. This means that my caching system will go bananas as several servers randomly process the requests. So I have to rewrite a big chunk of my application only to get worse performance, since I now have to query the database instead of doing a lightning-fast in-memory variable check.
First, I don't really see the point of distributing the load on the IIS server, as in my experience DB queries are most often the bottleneck; now the DB has to take even more of a banging. Second, it seems like these things would require careful planning, not just something a hosting provider would set up for all their clients while expecting all applications to be written to suit.
Are these sorts of things common, or was I stupid to use process memory as a cache in the first place?
Should I start looking for a new hosting provider, or can I expect web farming to arrive sooner or later anywhere? Should I keep transitions like this in mind for all future apps I write and avoid in-process caching and similar designs completely?
(Please don't turn this into a farming vs. not-farming battle; I'm just wondering if it's so common that I have to keep it in mind when developing.)
I am definitely more of a developer than a network/deployment guru. So while I have a reasonably good overall understanding of these concepts (and some firsthand experience with pitfalls/limitations), I'll rely on other SO'ers to more thoroughly vet my input. With that caveat...
First thing to be aware of: a "web farm" is different from a "web garden". A web farm is usually a series of (physical or virtual) machines, usually each with a unique IP address, behind some sort of load-balancer. Most load balancers support session-affinity, meaning a given user will get a random machine on their first hit to the site, but will get that same machine on every subsequent hit. Thus, your in-memory state-management should still work fine, and session affinity will make it very likely that a given session will use the same application cache throughout its lifespan.
My understanding is a "web garden" is specific to IIS, and is essentially "multiple instances" of the webserver running in parallel on the same machine. It serves the same primary purpose as a web farm (supporting a greater number of concurrent connections). However, to the best of my knowledge it does not support any sort of session affinity. That means each request could end up in a different logical application, and thus each could be working with a different application cache. It also means that you cannot use in-process session handling - you must go to an ASP Session State Service, or SQL-backed session configuration. Those were the big things that bit me when my client moved to a web-garden model.
"First i don't really see the point of distributing the load on the iis server as in my experience DB queries are most often the bottleneck". IIS has a finite number of worker threads available (configurable, but still finite), and can therefore only serve a finite number of simultaneous connections. Even if each request is a fairly quick operation, on busy websites, that finite ceiling can cause slow user experience. Web farms/gardens increases that number of simultaneous requests, even if it doesn't perfectly address leveling of CPU load.
"Are these sort of things common or was i stupid using the process memory as cache in the first place?" This isn't really an "or" question. Yes, in my experience, web farms are very common (web gardens less so, but that might just be the clients I've worked with). Regardless, there is nothing wrong with using memory caches - they're an integral part of ASP.NET. Of course, there are numerous ways to use them incorrectly and cause yourself problems - but that's a much larger discussion, and isn't really specific to whether or not your system will be deployed on a web farm.
IN MY OPINION, you should design your systems assuming:
they will have to run on a web farm/garden
you will have session-affinity
you will NOT have application-level-cache-affinity
This is certainly not an exhaustive guide to distributed deployment. But I hope it gets you a little closer to understanding some of the farm/garden landscape.

Networking problems in games

I am looking for networking designs and tricks specific to games. I know about a few problems and I have some partial solutions to some of them but there can be problems I can't see yet. I think there is no definite answer to this but I will accept an answer I really like. I can think of 4 categories of problems.
Bad network
The messages sent by the clients take some time to reach the server. The server can't just process them FCFS, because that is unfair to players with higher latency. A partial solution would be timestamps on the messages, but you need two things for that:
The ability to trust the client's clock. (I think this is impossible.)
Constant latencies you can measure. What can you do about variable latency?
A lot of games use UDP which means messages can be lost. In that case they try to estimate the game state based on the information they already have. How do you know if the estimated state is correct or not after the connection is working again?
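The usual estimation technique here is dead reckoning: extrapolate from the last authoritative position and velocity, then blend (or snap) back to the server's state once packets arrive again. A minimal 1-D sketch of both halves, with illustrative function names:

```python
def dead_reckon(last_pos, last_vel, seconds_since_update):
    """Extrapolate position from the last authoritative state received."""
    return last_pos + last_vel * seconds_since_update

def reconcile(predicted_pos, server_pos, blend=0.5):
    """When the connection recovers, move toward the server's state over a
    few frames rather than teleporting (blend=1.0 snaps immediately)."""
    return predicted_pos + (server_pos - predicted_pos) * blend
```

The answer to "how do you know the estimate is correct?" is that you don't; you find out when the next authoritative update arrives, and `reconcile` is what hides the correction from the player.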
In MMO games the server handles a large amount of clients. What is the best way for distributing the load? Based on location in game? Bind a groups of clients to servers? Can you avoid sending everything through the server?
Players leaving
I have seen 2 different behaviours when this happens. In most FPS games, if the player who hosted the game (I guess he is the server) leaves, the others can't play. In most RTS games, if any player leaves, the others can continue playing without him. How is that possible without a dedicated server? Does everyone know the full state? Are they transferring the role of the server somehow?
Access to information
The next problem can be solved by a dedicated server but I am curious if it can be done without one. In a lot of games the players should not know the full state of the game. Fog-of-war in RTS and walls in FPS are good examples. However, they need to know if an action is valid or not. (Eg. can you shoot me from there or are you on the other side of the map.) In this case clients need to validate changes to an unknown state. This sounds like something that can be solved with clever use of cryptographic primitives. Any ideas?
Cheating
Some of the above problems are easy in a trusted client environment but that can not be assumed. Are there solutions which work for example in a 80% normal user - 20% cheater environment? Can you really make an anti-cheat software that works (and does not require ridiculous things like kernel modules)?
I did read this question and some of its answers https://stackoverflow.com/questions/901592/best-game-network-programming-articles-and-books but other answers link to unavailable/restricted content. This is a platform/OS-independent question, but solutions for specific platforms/OSs are welcome as well.
Thinking cryptography will solve this kind of problem is a very common and very bad mistake: the client itself, of course, has to be able to decrypt it, so it is completely pointless. You are not adding security, you're just adding obscurity (and that will be cracked).
Cheating is too game-specific. There are some kinds of games where it can't be totally eliminated (aimbots in FPS), and some where, if you didn't screw up, it will not be possible at all (server-based turn games).
In general network problems like those are deeply related to prediction which is a very complicated subject at best and is very well explained in the famous Valve article about it.
The server can't just process them FCFS because that is unfair against players with higher latency.
Yes it can. Trying to guess exactly how much latency someone has is no fairer, since latency varies.
In that case they try to estimate the game state based on the information they already have. How do you know if the estimated state is correct or not after the connection is working again?
The server doesn't have to guess at all - it knows the state. The client only has to guess while the connection is down - when it's back up, it will be sent the new state.
In MMO games the server handles a large amount of clients. What is the best way for distributing the load? Based on location in game?
There's no "best way". Geographical partitioning works fairly well, however.
Can you avoid sending everything through the server?
Only for untrusted communications, which generally are so low on bandwidth that there's no point.
In most RTS games if any player leaves the others can continue playing without him. How is it possible without dedicated server? Does everyone know the full state?
Many RTS games maintain the full state simultaneously across all machines.
Some of the above problems are easy in a trusted client environment but that can not be assumed.
Most games open to the public need to assume a 100% cheater environment.
Bad network
Players with high latency should buy a new modem. I don't think it's a good idea to add even more latency because one person in the game has a bad connection. Or, if you mean minor latency differences: who cares? You will only make things slower and more complicated if you refuse to process FCFS.
Cheating: aimbots and similar
Can you really make an anti-cheat software that works? No, you can not. You can't know if they are running your program or another program that acts like yours.
Cheating: access to information
If you have a secure connection with a dedicated server you can trust, then cheating, like seeing more state than allowed, should be impossible.
There are a few games where cryptography can prevent cheating. Card games like poker, where every player gets a chance to 'shuffle the deck'. Details on wikipedia : Mental Poker.
With a RTS or FPS you could, in theory, encrypt your part of the game state. Then send it to everyone and only send decryption keys for the parts they are allowed to see or when they are allowed to see it. However, I doubt that in 2010 we can do this in real time.
For example, if I want to verify, that you could indeed be at location B. Then I need to know where you came from and when you were there. But if you've told me that before, I knew something I was not allowed to know. If you tell me afterwards, you can tell me anything you want me to believe. You could have told me before, encrypted, and give me the decryption key when I need to verify it. That would mean, you'll have to encrypt every move you make with a different encryption key. Ouch.
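A simpler primitive than the encrypt-everything scheme sketched above is a hash commitment: publish a hash of each move plus a random nonce up front, and reveal the move and nonce only when the other side is entitled to verify it. This proves you didn't change your move after the fact, though unlike encryption it lets others verify only what you later reveal. A sketch (function names are illustrative):

```python
import hashlib
import secrets

def commit(move: str):
    """Publish the digest now; keep (move, nonce) secret until reveal time."""
    nonce = secrets.token_hex(16)  # random salt so moves can't be brute-forced
    digest = hashlib.sha256((nonce + move).encode()).hexdigest()
    return digest, nonce

def verify(digest: str, move: str, nonce: str) -> bool:
    """Anyone holding the earlier digest can check the revealed move."""
    return hashlib.sha256((nonce + move).encode()).hexdigest() == digest
```

This is the core trick behind Mental Poker-style protocols, and it avoids the per-move encryption-key bookkeeping described above, at the cost of only supporting after-the-fact verification.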
If you're not implementing a poker site, cheating won't be your biggest problem anyway.
With a lot of people accessing games on mobile devices, a "bad network" can occur when a player is in an area of poor reception or connected to slow Wi-Fi. So it's not just a problem of people connecting in sparsely populated areas. With mobile clients, "bad networks" can occur very often, and they're usually EXTREMELY hard to diagnose.
UDP results in packet loss, but even games built on TCP or HTTP can experience problems where client/server communication slows to a crawl while packets are verified as delivered. With UDP, compensating for packet loss usually depends on what the packets contain. For motion data, if packets aren't received, the server typically interpolates along the previous trajectory and applies a position change. How this is handled is usually custom to the game, which is why people often avoid UDP unless their game type requires it. To handle high network latency, games will often automatically degrade the features available to users so they can still interact with the game without being kicked or experiencing too many broken features.
Optimally you want to have a logging tool like Loggly available that can help you find errors related to bad connection and latency and show you the conditions on the clients and server at the time they happened, this visibility lets you diagnose common problems users experience and develop strategies to address them.
Players leaving
Most games these days have dedicated servers, so this issue is mostly moot. However, sometimes yes, the server can be changed to another client.
Cheating
It's extremely hard to anticipate how players will cheat and to create a cheat-proof system no one can hack. These days, a lot of cheat-detection strategies are based on heuristic analysis of logging and behavioral-analytics data to spot abnormalities when they happen and flag them for review. You definitely should try to cheat-proof as much as is reasonable, but you also really need an early detection system that can spot new flaws people are exploiting.

ASP.Net Web Farm Monitoring

I am looking for suggestions on doing some simple monitoring of an ASP.Net web farm as close to real-time as possible. The objectives of this question are to:
Identify the best way to monitor several Windows Server production boxes during short (minutes-long) periods of ridiculous load
Receive near-real-time feedback on a few key metrics about each box. These are simple metrics available via WMI, such as CPU, memory and disk paging. I am defining my time constraint as "as soon as possible", with a 120-second delay being the absolute upper limit.
Monitor whether any given box is up (with "up" defined as responding to web requests in a reasonable amount of time)
Here are more details, things I've tried, etc.
I am not interested in logging. We have logging solutions in place.
I have looked at solutions such as ELMAH which don't provide much in the way of hardware monitoring and are not visible across an entire web farm.
ASP.Net Health Monitoring is too broad, focuses too much on logging and is not acceptable for deep analysis.
We are on Amazon Web Services and we have looked into CloudWatch. It looks great, but messages in the forum indicate that the metrics are often a few minutes behind, with one thread citing 2 minutes as the absolute soonest you could expect to receive the feedback. This would be good to have for later analysis, but it does not help us in real time.
Stuff like JetBrains profiler is good for testing but again, not helpful during real-time monitoring.
The closest out-of-box solution I've seen is Nagios which is free and appears to measure key indicators on any kind of box, including Windows. However, it appears to require a Linux box to run itself on and a good deal of manual configuration. I'd prefer to not spend my time mining config files and then be up a creek when it fails in production since Linux is not my main (or even secondary) environment.
Are there any out-of-box solutions that I am missing? Obviously a windows-based solution that is easy to setup is ideal. I don't require many bells and whistles.
In the absence of an out-of-box solution, it seems easy for me to write something simple to handle what I need. I've been thinking a simple client-server setup where the server requests a few WMI metrics from each client over http and sticks them in a database. We could then monitor the metrics via a query or a dashboard or something. If the client doesn't respond, it's effectively down.
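The client/server polling idea can be sketched in a few lines. In this sketch the `fetch_metrics` callable and the metric names are placeholders for whatever WMI values your agents actually expose over HTTP; SQLite stands in for "a database":

```python
import sqlite3
import time

def poll_once(boxes, fetch_metrics, db):
    """Poll each box once; store its metrics, or record it as down on failure.

    `fetch_metrics(box)` is assumed to return a dict such as
    {"cpu": 42.0, "mem": 71.5}, or to raise on timeout/error.
    """
    db.execute("CREATE TABLE IF NOT EXISTS samples "
               "(ts REAL, box TEXT, metric TEXT, value REAL)")
    now = time.time()
    status = {}
    for box in boxes:
        try:
            metrics = fetch_metrics(box)
        except Exception:
            status[box] = "down"   # no response => effectively down
            continue
        for name, value in metrics.items():
            db.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                       (now, box, name, value))
        status[box] = "up"
    db.commit()
    return status
```

Run this on a timer (well under your 120-second ceiling), point a dashboard or ad-hoc queries at the table, and the "client doesn't respond means it's down" rule falls out of the exception path for free.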
Any problems with this, best practices, or other ideas?
Thanks for any help/feedback.
UPDATE: We looked into CloudWatch a bit more and we may focus on trying it out. This forum post is the most official thing I can find. In it, an Amazon representative says that the official delay window for data is 4 minutes. However, the user says that 2-minute-old data is always reliable and 1-minute-old data is sometimes reliable. We're going to try it out and hope it is enough for our needs.
We used Quest Software and it seemed to be a good monitoring solution. Here is a link:
http://www.quest.com/application-performance-monitoring-solutions/
Windows' built-in performance monitoring may also help.
