I am looking for suggestions on doing some simple monitoring of an ASP.Net web farm as close to real-time as possible. The objectives of this question are to:
Identify the best way to monitor several Windows Server production boxes during short (minutes long) period of ridiculous load
Receive near-real-time feedback on a few key metrics about each box. These are simple metrics available via WMI such as CPU, Memory and Disk Paging. I am defining my time constraints as soon as possible with 120 seconds delayed being the absolute upper limit.
Monitor whether any given box is up (with "up" being defined as responding web requests in a reasonable amount of time)
Here are more details, things I've tried, etc.
I am not interested in logging. We have logging solutions in place.
I have looked at solutions such as ELMAH which don't provide much in the way of hardware monitoring and are not visible across an entire web farm.
ASP.Net Health Monitoring is too broad, focuses too much on logging and is not acceptable for deep analysis.
We are on Amazon Web Services and we have looked into CloudWatch. It looks great but messages in the forum indicate that the metrics are often a few minutes behind, with one thread citing 2 minutes as the absolute soonest you could expect to receive the feedback. This would be good to have for later analysis but does not help us real-time
Stuff like JetBrains profiler is good for testing but again, not helpful during real-time monitoring.
The closest out-of-box solution I've seen is Nagios which is free and appears to measure key indicators on any kind of box, including Windows. However, it appears to require a Linux box to run itself on and a good deal of manual configuration. I'd prefer to not spend my time mining config files and then be up a creek when it fails in production since Linux is not my main (or even secondary) environment.
Are there any out-of-box solutions that I am missing? Obviously a windows-based solution that is easy to setup is ideal. I don't require many bells and whistles.
In the absence of an out-of-box solution, it seems easy for me to write something simple to handle what I need. I've been thinking a simple client-server setup where the server requests a few WMI metrics from each client over http and sticks them in a database. We could then monitor the metrics via a query or a dashboard or something. If the client doesn't respond, it's effectively down.
Any problems with this, best practices, or other ideas?
Thanks for any help/feedback.
UPDATE: We looked into Cloudwatch a bit more and we may focus on trying it out. This forum post is the most official thing I can find. In it, an Amazon representative says that the offical delay window for data is 4 minutes. However, the user says that 2 minute old data is always reliable and 1 minute is sometimes reliable. We're going to try it out and hope it is enough for our needs.
Used Quest software and it seemed to be a good monitoring solution. Here is a link.
http://www.quest.com/application-performance-monitoring-solutions/
Also performance monitoring of Windows may also help.
Related
I was wondering if someone could discuss some methods of possibly handling packet loss between a .NET WinForms application and our ASP.NET Web Services. Currently we have sales people who use our software that are going over 4G or 3G air cards and the application has problems occasionally, if they don't have full bars.
I can think of two hypothetical ways to handle this, but not really sure how to implement the first one (assuming it's even possible):
Can HTTP resuming be used somehow to resume an HTTP call?
What about handling an exception and re-issuing the web service call X number of times? Unfortunately I'm not sure what exception they're getting. I've done a fair amount of testing with their air cards and I don't get the errors that they get when they're on the road.
Also, is there some sort of tool that can be used to simulate packet loss? This would make testing a lot easier.
Thanks
-Adam
You can use Wireshark which is a great tool to find packet loss and re-transmissions and also lot other problems pertaining to packets. You can get some good starting guide videos from YouTube too.
Part of this question is I'm not even sure what exactly I'll need to ask, so I'll start with the situation and work out from there.
One project I'm working on involves use of COMET via the aspComet library. The use case of the program is somewhat of a collaborative slideshow. One person runs the bulk of it, with one or more participants able to perform certain actions. Low latency between when an action is performed on screen
Previously, it was just running on one server. Now, we're wanting to scale it out a bit, more for reliability than performance reasons. So, we have some boxes out in Rackspace's cloud and all that fun stuff.
I knew from the start of this that I was going to need to make some changes to the way the COMET stuff works since different people in the same "show" might be on different servers, and I have no way of knowing what "show" they belong to until after they have already arrived on the site.
I initially tackled this using the WCF Mesh provider, which wasn't well documented to start with, now I'm running into issues with dispatching messages to it sometimes get lost, or delayed (I'm not 100% sure what is going on there), but it screws up the long poll for COMET and breaks things in rather strange ways (Clicking a button may trigger an event, or it'll hang for 10 seconds {long poll duration} and not actually do anything).
More research leads me to believe one of the .Net service bus providers may do what I need. However, I can't find examples that would cover what I need:
No single point of failure (outside of a database)
No hardcoding of peers.
Near-realtime (no polling, event based would be best)
My ideal solution would involve that when a server comes up, it lets other servers know of its existence (Even if it's just a row in a table somewhere), and they can start sending broadcast messages among each other, with each server being both a publisher and subscriber. This is what I somewhat had in the WCF Mesh provider, but I'm not overly confident in that code.
Can anyone point me in the right direction with this? Even proper terms to look for in the docs for service bus providers would be good at this point. Or are service buses not what I want? At this point I would settle for setting up a Jabber server on each web server and use that, if it could fit within my constraints.
I can't speak a ton to NServiceBus, but I expect the answers will be similar.
Single point of failure: MSMQ can use multicasting, which means each endpoint will broadcast it's existence and no DB table is needed. RabbitMQ uses this Exchange-to-Queue binding process which means as long as the Rabbit instance or cluster is up then messages still exist. RabbitMQ can be clustered, MSMQ cannot be. *Note: You might have issues with multicasting with Rackspace, no idea how they work. If so, you'll have to fall back on the runtime services for MSMQ (not RabbitMQ), that would create a single point of failure because everyone has a single point to coordinate control messages through.
Hard coding of peers: discussed above a bit; MSMQ's multicast handles it. Rabbit it can also be done, just bind queues to an exchange you want to listen to. MassTransit takes care of this for you.
Near-realtime: These both use messaging which is near real time. There's no polling in your message consumer code.
I think a service bus seems like a reasonable solution for what you're trying. Some more details would likely be needed, but the general messaging approach is correct. There are other more light weight messaging libraries if you decided you just want something on top of RabbitMQ and configure Rabbit to handle most of the stuff.
To get started with MassTransit, we have documentation up: http://readthedocs.org/projects/masstransit/ and mailing list http://groups.google.com/group/masstransit-discuss. Join the mailing list if you have future questions and someone will try and help you out.
I'm profiling a asp(classic) web service. The web service makes database calls, reads/writes to files, and processes xml. On a windows server 2003 box(2.7ghz, 4 core, 4gb ram) how many requests per second should I be able to handle before things start to fail.
I'm building a tool to test this, but I'm looking for a number of requests per second to shoot for.
I know this is fairly vague, but please give the best estimate you can. If you need more information, please ask.
95% of the performance of any data-driven app is dependent on the database: 1) the way you do your calls, 2) the indexes, 3) the hardware under the database (disk subsystem in particular).
I have seen a machine, like you are describing, handle 40 requests per second (2500/minute), but numbers like 10 per second (600/minute) are more common. I would expect even lower if you are running your DB on the same machine, and even lower still if that DB is SQLExpress or MSAccess.
Also, at capacity, your app will probably not fail, but IIS will Queue requests, once it is saturated, and may timeout some of those requests if it can't service them before the timeout expires.
Btw, instead of building a tool to test your app, you may want to look into using a test tool such as Microsoft WCAT. It is pretty smooth and easy to use.
How fast should it be? Fast enough.
How fast is fast enough? That's a question that only you and your users can answer. If your service is horrifically inefficient and keeps up with demand, it's fast enough. If your service is assembly-optimized, lightning-fast, and overwhelmed with requests, it's not fast enough.
If the server is handling its actual workload, then don't worry about how fast it "should" be. When the server is having trouble, or when you anticipate that it soon will, then you should look at improving the code or upgrading the hardware. Remember Knuth's Law – premature optimization is the root of all evil. Any work you do now to make it faster may never pay off, and you may be forced to make compromises with flexivility or maintainability. Remember, too, an older adage – if it ain't broke, don't fix it.
Yes I would also say 10 per second is a good benchmark. For a high performance app you would want to get more than this, but if you have no specific goal you should generally be able to get at least 10 requests per sec for a general web page with a bunch of database queries.
Our app need instant notification, so obvious I should use some some WCF duplex, or socket communication. Problem is the the app is partial trust XBAP, and thus I'm not allowd to use anything but BasicHttpBinding. Therefore I need to poll for changes.
No comes the question: My PM says the update interval should be araound 2 sec, and run on a intranet with 500 users on a single web server.
Does any of you have experience how polling woould influence the web server.
The service is farly simple, it takes a guid as an arg, and returns a list of guids. All data access are cached, so I guess the load on the server is minimal for one single call, but for 500...
Except from the polling, the webserver has little work.
So, based on this little info (assume a standard server HW, whatever that is), is it possible to make a qualified guess?
Is it possible or not to implement this, will it work?
Yes, I know estimating this is difficult, but I'd be really glad if some of you could share some thoughts on this
Regards
Larsi
Don't estimate, benchmark.
Polling.. bad, but if you have no other choice, then it's ideal :)
Bear in mind the fact that you will no doubt have keep-alives on, so you will have 500 permanently-connected users. The memory usage of that will probably be more significant than the processor usage. I can't imagine network access (even in a relatively bloaty web service) would use much network capacity, but your network latency might become an issue - especially as we've all seen web applications 'pause' for a little while.
In the end though, you'll probably be ok, but you'll have to check it yourself. There are plenty of web service stress testers, you can use Microsoft's WAS tool for one, here's a few links to others.
Try using soapui, a web service testing tool, to check the performance of your web service. There is a paid version and an open source version that is free.
cheers
I don't think it will be a particular problem. I'd imagine the response time for each request would be pretty low, unless you're pulling back a hell of a lot of data, so 500 connections spread over 2 seconds shouldn't hit it too hard.
You can use a stress testing tool to verify your webserver can handle the load though, before you commit to this design.
250 qps probably is doable with quite modest hardware and network bandwidth provided you do minimize the data sent back & forth. I assume you're caching these GUID lists on the client so you can just send a small "no updates" response in the normal case.
Should be pretty easy to measure with a simple prototype though to be more confident.
I have a large ASP.NET website on a hosted platform. It shares the machine with a lot of other applications. We do not have access to the machine itself (only an FTP account).
Our client is complaining that it is starting to perform rather badly, particularly around peak hours. I've run some remote measurements (using a JMeter-like tool) that tells me that, yes, it does indeed perform rather badly during peak hours. It doesn't tell me why though. The client is resisting a move to a dedicated server without some hard facts.
As I see it, what I need are hard data about the machine itself. Setting up a local performance test environment would be extremely time-consuming, and I have no way to estimate the server performance.
My question: is there a good way to collect (a lot) of performance measurements when I have limited access to the machine, and certainly no access to the performance monitor? Any code would have to run in the asp.net application itself, without screwing it up too much.
We had a similar problem with our asp.net application hosted on a shared server, which also started to perform badly during peak hours.
Although I don't know of an elegant solution to your question, this is what we did:
Talk to your host providers to see what additional information they can give you - it's in their best interest to keep their clients happy. Our host providers were able to give us some time with one of their network engineers who provided us with some decent CPU and memory utilization stats.
Take your own performance measurements by dumping information to either a log file (using log4net) and/or the database - for example, user sessions, search times, page hits, timing measurements around key functionality. From this information we were able to ascertain what our systems normal behavior was for a set number of automation tests.
Setup a local server (not necessarily same stats as hosted/production server) with your application loaded and give it a full load/performance/capacity testing (we used Red Gate's ANTS Profiler). The stats that you gather from that will give you and your client a good indication of how the system should behave under certain loads with a known environment. Yes, this can be time consuming but it will give you a great performance measuring tool so that you can catch/fix bottlenecks locally rather than on production.
Good luck.