What can I do to speed up my load test using NBomber? (VS LT 250 RPS easily; NBomber maxed with 25 RPS) - .net-core

We've been using Visual Studio Load Test to exercise our .NET Framework 4.7.2 telemetry client, which we can set up to post metrics to RabbitMQ at a rate of about 250 metrics per second. Recently, we've had to migrate our telemetry client to .NET Core, and we need to load test it to verify that it can still post metrics at the same rate. Visual Studio Load Test (VSLT) is being deprecated and has no support for .NET Core, so we've had to look at something like NBomber to use in place of VSLT.
With regard to NBomber, there doesn't seem to be enough documentation or support available: I've tried everything I know and cannot get NBomber to post more than 25 metrics per second. At the same time, I'm seeing 100% CPU usage.
Does anyone have any insight to share? Thanks in advance for your help,
Tien

Turns out, my logic was bad. A senior developer and friend pointed out that I was initializing a telemetry client for each metric I posted. That was the cause of the high CPU consumption and the reason I couldn't reach the performance I was expecting. I'm in the process of re-coding my tests so that NBomber initializes 250 telemetry clients posting a minimum of 1MM metrics within an hour. I ran a fix yesterday that posted 17K metrics within 56 seconds with just 1 telemetry client, a rate of about 300 RPS. I thought VS LT was awesome, but I'm thinking NBomber is quite impressive.
Cheers to Load Testing with NBomber!!
Tien
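To make the fix concrete, here is a minimal sketch of the two patterns. It is in Python only for illustration, and FakeTelemetryClient is a hypothetical stand-in, not the real telemetry client or any NBomber API. The last lines check the quoted figures: the fixed run's rate against the rate needed for 1MM metrics per hour.

```python
class FakeTelemetryClient:
    """Hypothetical stand-in for a telemetry client; not a real API."""
    instances_created = 0

    def __init__(self):
        # A real client would typically open connections or channels here,
        # which is the expensive part that should not be repeated per metric.
        FakeTelemetryClient.instances_created += 1
        self.posted = 0

    def post(self, name, value):
        self.posted += 1


def post_with_new_client_each_time(n):
    """Anti-pattern from the question: one client per posted metric."""
    for i in range(n):
        FakeTelemetryClient().post("metric", i)


def post_with_shared_client(n):
    """Fix: build the client once and reuse it for every metric."""
    client = FakeTelemetryClient()
    for i in range(n):
        client.post("metric", i)
    return client


FakeTelemetryClient.instances_created = 0
post_with_new_client_each_time(1000)
per_call_instances = FakeTelemetryClient.instances_created

FakeTelemetryClient.instances_created = 0
shared = post_with_shared_client(1000)
shared_instances = FakeTelemetryClient.instances_created

observed_rate = 17_000 / 56        # metrics per second in the fixed run
required_rate = 1_000_000 / 3600   # rate needed for 1MM metrics per hour

print(per_call_instances, shared_instances)
print(round(observed_rate, 1), round(required_rate, 1))
```

The first pattern constructs 1000 clients for 1000 metrics, the second constructs one. The observed rate works out to roughly 304 metrics per second, above the roughly 278 per second that 1MM per hour requires.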

If a single instance of NBomber is consuming 100% of the CPU and not generating the necessary load, you will need to set up another machine and run NBomber in distributed cluster mode.
Why do you need a cluster?
You reached the point that the capacity of one node is not enough to create a relevant load.
You want to delegate running multiple scenarios to different nodes. For example, you want to test the database by sending in parallel read and write queries. In this case, one node can send inserts and another one can send read queries.
You want to simulate a real production load that requires several nodes to participate. For example, you may have one node that periodically writes data to the Kafka broker and two nodes that constantly read this data from the Redis cache.
Also, it seems that Microsoft recommends using Apache JMeter™, so it might be worth giving it a try. JMeter is capable of sending messages to various MQ implementations and its documentation is more concise; for example, see Building a JMS Topic Test Plan.

Related

Azure App Insights Operation count is inexplicably high

We are currently monitoring a web API using the data in the Performance page of Application Insights, to give us the number of requests received per operation.
The architecture of our API solution is to use APIM as the frontend and an App Service as the backend. Both instances have App Insights enabled, and we don't see a reasonable correlation between the number of requests to APIM and the requests to the App Service. Also, this is most noticeable only in a couple of operations.
For example,
Apim-GetUsers operation has a count of 60,000 requests per day (APIM's AI instance)
APIM App Insights Performance Page
AS-GetUsers operation has a count of 3,000,000 requests per day (App Service's AI instance)
App Service App Insights Performance Page
Apim-GetUsers routes the request to AS-GetUsers and Apim-GetUsers is the only operation that can call AS-GetUsers.
Given this, I would expect to see ~60,000 requests on the App Service's AI performance page for that operation, instead we see that huge number.
I looked into this issue a little bit and found out about sampling and that some App Insights features use the itemCount property to find the exact number of requests. In summary,
Is my expectation correct, and if so what could cause this? Also, would disabling adaptive sampling and using a fixed sampling rate give me the expected result?
Is my expectation wrong, and if so, what is a good way to get the expected result? Should I not use the Performance page for that metric?
Haven't tried a whole lot yet as I don't have access to play with the settings until I can find a viable solution, but I looked into sampling and itemCount property as mentioned above. APIM sampling is set to 100%.
I ran a query in Log Analytics on the requests table and when I just used the requests count, I got a number that was closer to the one I see in APIM, but when I use a sum of the itemCount, as suggested by some MS docs, I get that huge number as seen in the performance page.
List of NuGet packages and version that you are using:
Microsoft.Extensions.Logging.ApplicationInsights 2.14.0
Microsoft.ApplicationInsights.AspNetCore 2.14.0
Runtime version (e.g. net461, net48, netcoreapp2.1, netcoreapp3.1, etc. You can find this information from the *.csproj file):
netcoreapp3.1
Hosting environment (e.g. Azure Web App, App Service on Linux, Windows, Ubuntu, etc.):
App Service on Windows
Edit 1: Picture of operation_Id and itemCount
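As a rough illustration of why a plain row count and a sum of itemCount can differ so much under sampling, here is a small Python sketch. The numbers and the keep_one_in parameter are made up for the example, and the simulation is a simplified fixed-rate sampler, not the actual Application Insights algorithm.

```python
def sample_requests(total_requests, keep_one_in):
    """Simulate fixed-rate sampling: keep every Nth request and record how
    many original requests each retained row represents."""
    rows = []
    for i in range(total_requests):
        if i % keep_one_in == 0:
            rows.append({"name": "AS-GetUsers", "itemCount": keep_one_in})
    return rows


# Hypothetical figures: 3000 real requests, 1 in 50 retained.
rows = sample_requests(total_requests=3000, keep_one_in=50)

row_count = len(rows)                                 # what a plain count sees
estimated_total = sum(r["itemCount"] for r in rows)   # what sum(itemCount) sees

print(row_count, estimated_total)
```

Here the table holds only 60 rows, while summing itemCount recovers the 3000 original requests. This only shows how the two numbers relate when sampling is active; it does not by itself explain why the row count in the question happened to sit close to the APIM figure.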

How to build a predictive dialer?

I need to build a reliable predictive dialer based on Asterisk. Currently the system we use includes Wombat and Asterisk, and we do not find this solution usable as Wombat provides a poor API and it's impossible to use it without regular manual operations.
The system we want:
Can be used solely via API or direct database queries (adding lists to campaigns, updating lists, starting campaigns, stopping campaigns etc.) so that it can be completely integrated into an existing product
Is free, or paid for annually independent of the usage rate
Is considered stable
Should be able to handle tens of thousands of calls per day, if it matters
Use vicidial.org, or hire a freelancer to build a new core with the API you need.
You can also check out OSDial for this; it is also built on Asterisk.
We have been working with a preview of the next version of Wombat, through the Early Access program, and Wombat has a complete configuration and reporting JSON API and you can deploy it "headless" in order to scale up to thousands of parallel lines. If you ask Loway they can likely get you access to the Early Access program.
BTW, Vicidial is great for agent-based outbound, but imposes quite a large penalty on the number of agents per server - you cannot reasonably use it to do telecasting at the scale we are looking for as it would require too many servers. Wombat is leaner and can drive over one thousand channels per server. YMMV.
This question would be better placed on a "hire-a-freelancer" site like oDesk ... if you need custom programming done, those are the sorts of places to go to get manpower.
Your specifications are well within what is possible with Asterisk. I'd strongly recommend looking at Vici Dial and OS Dial as others have suggested; out of the can, they are pretty good.
The hard part of any auto-dialer is not the dialer, oddly enough. It's the prediction algorithms, the answering machine detection algorithms and the agent UI. Those are what make or break an auto-dialer application for a company.

How to prevent proxy timeouts with SQL Server Reporting Services

We have a system running Windows Server 2008R2 x64 and SQL Server 2008R2 x64 with SSRS installed/configured. This is a shared reporting server used by a large number of people, with some fairly large, inefficient databases (roughly 400-500 GB of data). These users use the system to generate ad-hoc reports based on a reporting model that sits on top of the aforementioned databases. Note that the users are using NTLM to log on and identify themselves when running reports.
Most reports are quick, but if you are running a report for 1 or 2 years' worth of data, it can take a while to return (around 5 minutes). This is fine for most users; however, some of the users are stuck behind a proxy which has a connection timeout set at 2 minutes. As SSRS 2008R2 does not seem to send back a "keep-alive" signal (confirmed via Wireshark), when running one of these long reports the proxy server thinks the connection has died, so it just gives up and kills the connection. This gives the user a 401 or 503 error and obviously cancels the report (the incorrect error is a known bug in SSRS which Microsoft refuse to fix).
We're getting a lot of flak from the users about this, even though it's not really our issue, so I am looking for a creative solution.
So far I have come up with:
1) Discovering some as yet unknown setting for SSRS that can make it keep the connection alive.
2) installing our own proxy in between the users and our reports server, which WILL send a keep-alive back (not sure this will work and it's a bit hacky, just thinking creatively!)
3) re-writing our reports databases to be more efficient (yes this is the best solution, but also incredibly expensive)
4) ask the experts :) :)
We have a call booked in with Microsoft Support to see if they can help - but can any experts on Stack help out? I appreciate that this may be a better question for server fault (and I may post it there) but it's a development question too really :)
Thanks!
A few things:
A. For SSRS overall, at the service level:
I personally use a keep-alive service, as I believe the default recycle is 12 hours for the SSRS server. I use a tool someone turned me onto called 'VisualCron' that can run many tasks automatically. You can also just make a call from a WCF service or similar. Basically, I know the first report from a user for the day is generally slow. Usually you need to hit http:// (servername)/ReportServer to keep it alive.
B. For caching report-level items:
If this does not help, I would suggest caching DataSets where possible. Some people have data that is up to the moment, but for a lot of people that is not the case. You can create a shared dataset in SSRS and then cache it on a schedule. So if you have domain-like tables that only need to be updated once in a blue moon, put them there. The same goes for data that is loaded nightly or in batches. If you are a transaction-based shop whose data is up to the moment this may not help, but for batch-based businesses it can help tremendously.
You can also cache the reports with their data as a continuation of this. Under the 'Manage' drop-down for a report on the /Reports landing page, you can set the data to run on a specific schedule. You can also set up a snapshot, which is an extension of this: it executes on a schedule with some default parameters set and is a copy of the report as of when it was run.
You are mentioning ASP.NET so I am not certain how much some of this will work if you are doing this all through a site you are setting up internally as a pass through. But you could email or save files on a schedule as well through SSRS's subscription service.
C. Change how you store your data for reporting.
You can create a Report Warehouse of select item-level values of queries. Create a small database that holds just a few recent years of data, with only certain fields and certain tables. Then index it to death and report off of that. In my experience this method will fly in terms of performance, but it does take the extra overhead of setting it up. Generally most companies will whine about this, but it often takes a single day to set up; then you create one SSMS job or an SSIS package that does it all nightly, and you don't worry about it. Personally, I like this method because I know my data is isolated and is not being reported off of production.

Determining what is putting pressure on IIS

I got a dedicated server running both IIS 7.5 and SQL Server 2010. Server CPU load is often near 100%. The SQL server does not take too much but the w3wp process is taking a significant amount of CPU (often 70+%).
I'd like to find out, what is causing this pressure:
* Too many requests of static files (a CDN could be added)
* Too many ajax requests (I am thinking about comet/web sockets anyways)
* Single asp.net pages consuming too much processing power (should be easy to optimize)
Where would you start looking to find out where to start optimizing?
The easiest possible way is to profile the app in production. Not sure if that is possible in your case. Some options:
Look into the logs and look at the duration of the requests. Long requests are likely to put load on the system.
Remote debug w3wp with Visual Studio and pause the debugger 10 times to see where it stops most. That is the hot spot.
Use XPerf or PerfView to capture (managed) stacks. This has almost no impact on production performance.
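The first option (looking at request durations in the logs) can be sketched as a short script. This is a minimal Python example that assumes W3C extended logging with the cs-uri-stem and time-taken fields enabled; the sample log lines and URLs are invented for illustration. Keep in mind that time-taken is wall-clock time in milliseconds, not CPU time, so it is a pointer to where to look and not proof of CPU cost.

```python
from collections import defaultdict

# Invented sample in W3C extended log format; real logs carry more fields.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 7.5
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2013-01-01 10:00:00 GET /static/logo.png 200 15
2013-01-01 10:00:01 POST /ajax/poll 200 900
2013-01-01 10:00:02 POST /ajax/poll 200 1100
2013-01-01 10:00:03 GET /default.aspx 200 300
"""


def total_time_by_url(log_text):
    """Return (url, [hits, total_ms]) pairs, slowest total first."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # url -> [hits, total ms]
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives
        entry = dict(zip(fields, line.split()))
        stats = totals[entry["cs-uri-stem"]]
        stats[0] += 1
        stats[1] += int(entry["time-taken"])
    return sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)


ranked = total_time_by_url(SAMPLE_LOG)
for url, (hits, total_ms) in ranked:
    print(url, hits, total_ms)
```

Sorting by total time (hits multiplied by duration) and not by the single slowest request helps surface cheap-looking calls that are simply made very often.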
A good starting point would be to fire up the development tools (F12 in IE / Chrome) and look at the timings under the network tab. That will show you a waterfall-style diagram for how the page has loaded and should help you identify any particularly slow-loading static files which might be sensibly moved off to a cdn, any unnecessary requests being made, how much time is being spent getting the actual page itself, etc.
After that, profile the application with a performance profiler. A good profiler like ANTS Performance Profiler will let you look at things like execution time / hit counts for different methods, as well as what database queries are being run and how long they’re taking. A new version of ANTS (currently in EAP) will also group that activity by http request so you can see if specific pages need optimisation or are being hit too many times.
You'd also do well to check that caching is working as you intend it so that users aren’t unnecessarily re-requesting pages.
There's also a nice article on ASP.NET performance which you might want to read at http://aspalliance.com/1533_ASPNET_Performance_Tips.7.
Disclaimer: I work for Red Gate which makes ANTS.
I found an easy way to see what's going on on the server.
Nevertheless, the professional way is probably to go and use a profiling tool.
What did I do?
In the IIS console you can get a list of all current worker processes, and if you choose one you can see the requests it is currently working on. So I was able to see that the process was handling 100 requests in parallel, 70 of which traced back to the same ajax call.
The immediate solution was to reduce the frequency of that call (from every 10 to every 30 seconds). The next step will be to further optimize the call on the server side since I do have other ajax calls with the same frequency (every 10 seconds) which nearly never showed up in the active requests list since they were so fast.
Probably the easiest way to figure it out would be to install New Relic on the server. The trial lasts 30 days I think so it should give you enough time to get to the bottom of this. It'll show you long-running SQL queries, .NET methods, as well as just about everything else you can think of. It makes it very easy to identify bottlenecks.
By the way, I suggested New Relic because it sounds like your problem is in a production environment. New Relic isn't an incredibly detailed profiler. It gathers enough information to be helpful, but not so much as to slow down the server. That makes it well suited to this purpose.
If, however, you could reproduce the problem in a development environment you might try something like the free Eqatec profiler.

IIS 6.0 wildcard mapping benchmarks?

I'm quickly falling in love with ASP.NET MVC beta, and one of the things I've decided I won't sacrifice in deploying to my IIS 6 hosting environment is the extensionless URL. Therefore, I'm weighing the consideration of adding a wildcard mapping, but everything I read suggests a potential performance hit when using this method. However, I can't find any actual benchmarks!
The first part of this question is, do you know where I might find such benchmarks, or is it just an untested assumption?
The second part of the question is in regard to the 2 load tests I ran using JMeter on our dev server over a 100 Mbps connection.
Background Info
Our hosting provider has a 4 Gbps burstable internet pipe with a 1 Gbps backbone for our VLAN, so anything I can produce over the office LAN should translate well to the hosting environment.
The test scenario was to load several images / css files, since the supposed performance hit comes when requesting files that are now being passed through the ASP.NET ISAPI filter that would not normally pass through it. Each test contained 50 threads (simulated users) running the request script for 1000 iterations each. The results for each test are posted below.
Test Results
Without wildcard mapping:
Samples: 50,000
Average response time: 428ms
Number of errors: 0
Requests per second: 110.1
Kilobytes per second: 11,543
With wildcard mapping:
Samples: 50,000
Average response time: 429ms
Number of errors: 0
Requests per second: 109.9
Kilobytes per second: 11,534
Both tests were run warm (everything was in memory, no initial load bias), and from my perspective, performance was about even. CPU usage was approximately 60% for the duration of both tests, memory was fine, and network utilization held steady around 90-95%.
Is this sufficient proof that wildcard mappings that pass through the ASP.NET filter for ALL content don't really affect performance, or am I missing something?
Edit: 11 hours and not a single comment? I was hoping for more.. lol
Chris, very handy post.
Many who suggest a performance disadvantage infer that the code processed in a web application is somehow different/inferior to code processed in the standard workflow. The base code type may be different, and sure, you'll be needing the MSIL interpreter, but MS has shown that in many cases you'll actually see a performance increase in a .NET runtime over a native one.
It's also wise to consider how IIS has to be a "jack of all trades" - allowing all sorts of configuration and overrides even on static files. Some of those are designed for performance increase (caching, compression) and - indeed - will be lost unless you reimplement them in your code, but many of them are for other purposes and may not ever be used. If you build for your needs (only) you can ignore those other pieces and should be realising some kind of performance advantage, even though there's a potential ASP.NET disadvantage.
In my (non-.NET) MVC testing I'm seeing considerable (10x or more) performance benefits over webforms. Even if there was a small hit on the static content - that wouldn't be a tough pill to swallow.
I'm not surprised the difference is almost negligible in your tests, but I'm happy to see it backed up.
NOTE: You can disable wildcard mapping from static directories (I keep all static files in /static/(pics|styles|...)) in IIS. Switch the folder to an application, remove the wildcard mapping, and switch it back from an application and - voilà - static files are handled by IIS without pestering your ASP.NET.
I think there are several additional things to check:
Since we're using the .Net ISAPI filter, we might be using threads used to run application for serving static assets. I would run the same load test while reviewing performance counters of threads - Review this link
I would run the same load test while running Microsoft Performance Analyzer and compare the reports.
I was looking for a benchmark like this for a long time. Thanx!
In my company we did wildcard mapping on several web sites (standard web forms, .NET 1.1 and 2, IIS 6), and the sys admins told me that they didn't notice any performance issues.
But it seems you stressed the network, not the server. So maybe the scores are so similar because of a network bottleneck? Just thinking...
That's quite an impressive post there, thanks very much for that.
We're also assessing the security and performance concerns with removing a piece of software that's always been in place to filter out unwanted traffic.
Will there be any further benchmarking on your part?
Cheers,
Karl.
Seems the bottleneck in your test is network utilization. If the performance degradation is expected to be on the CPU usage (I'm not sure it is, but it's reasonable), then you wouldn't notice it with the test you did.
Since this is a complex system, with many variables - it does not mean that there is no performance degradation. It means that in your scenario - the performance degradation is probably negligible.
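The network-bottleneck reading can be checked against the posted numbers with a little arithmetic. The Python sketch below converts the reported kilobytes per second into a share of the 100 Mbps link. It assumes that a kilobyte in the report means 1024 bytes and that the link carries 100,000,000 bits per second; both are assumptions, not facts from the question.

```python
LINK_MBPS = 100  # office LAN link speed stated in the question


def link_utilization(kilobytes_per_second, link_mbps=LINK_MBPS, bytes_per_kb=1024):
    """Fraction of the link used by the reported payload throughput.

    bytes_per_kb=1024 is an assumption about how the tool reports KB/s.
    """
    bits_per_second = kilobytes_per_second * bytes_per_kb * 8
    return bits_per_second / (link_mbps * 1_000_000)


without_wildcard = link_utilization(11_543)
with_wildcard = link_utilization(11_534)

print(f"{without_wildcard:.1%} {with_wildcard:.1%}")
```

Both runs come out at roughly 94-95% of the link (about 92% if a kilobyte is taken as 1000 bytes), which matches the 90-95% network utilization reported in the question. With the link that close to saturation in both tests, the near-identical results say more about the network ceiling than about the cost of the wildcard mapping.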

Resources