Performance tip: x% of this request was spent in waiting - azure-application-insights

When reviewing Application Insights for slow API requests, I noticed a message stating: "98.49% of this request was spent in waiting." I'm finding next to no explanation of this online.
What does this mean? What is it waiting for?
How can I fix it?

Application Insights collects performance details for the different operations in your application. By identifying those operations with the longest duration, you can diagnose potential problems or best target your ongoing development to improve the overall performance of the application.
The Performance Tip at the top of the screen supports the assessment that the excessive duration is due to waiting. Click the waiting link for documentation on interpreting the different types of events.
These are all indications of slow server operations.
You can read more about it here. Also, look for the event that accounts for the waiting time and work from there.
Let me know if you need any help fixing the performance issue.

Related

app insights think time does not take into account think times

I am executing web tests in App Insights as availability tests.
The problem is that those web tests contain requests with certain think times.
For the tests I am doing, the think time is crucial.
Application Insights does not seem to take the think time values into account, so I don't see any way to pause between the request calls within a web test.
Is there any way to make think times work in App Insights? Is a fix planned soon? Is there any recommendation or workaround?
This question was answered here on MSDN.
The answer provided there:
"At this point - we do not have plans on supporting arbitrary think times. We ourselves, and some customers work around this by calling a controller that can take a parameter on the duration it waits on before responding, from the web test. Hope this helps."

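The workaround quoted above (an endpoint that waits for a caller-supplied duration before responding) could look roughly like the sketch below. It is written in Java with the JDK's built-in `com.sun.net.httpserver` server rather than an ASP.NET controller, and the class name `DelayEndpoint`, the `/delay` path, the `ms` query parameter, and the 30-second cap are all assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

// Hypothetical helper: an endpoint that waits `ms` milliseconds before answering,
// so a web test can call it between real requests to simulate think time.
final class DelayEndpoint {
    // Assumed upper bound so a bad parameter cannot tie up a thread indefinitely.
    static final long MAX_DELAY_MS = 30_000;

    // Parses "ms=1500" from the query string; returns 0 when missing or invalid.
    static long parseDelayMs(String query) {
        if (query == null) {
            return 0;
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("ms=")) {
                try {
                    long ms = Long.parseLong(pair.substring(3));
                    return Math.max(0, Math.min(ms, MAX_DELAY_MS));
                } catch (NumberFormatException e) {
                    return 0;
                }
            }
        }
        return 0;
    }

    // Sleeps for the requested duration, then replies with a short text body.
    static void handle(HttpExchange exchange) throws IOException {
        long delayMs = parseDelayMs(exchange.getRequestURI().getQuery());
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        byte[] body = ("waited " + delayMs + " ms").getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    // Starts the server; daemon worker threads let concurrent delays overlap
    // without keeping the JVM alive after the server is stopped.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/delay", DelayEndpoint::handle);
        server.setExecutor(Executors.newCachedThreadPool(runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true);
            return thread;
        }));
        server.start();
        return server;
    }
}
```

A web test would then insert a request such as `GET /delay?ms=1500` between two real steps. Note that the wait shows up as request duration in the test results, so it may need to be excluded when reading the timings.
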
Building a scalable http client in Java that fires 10k http requests per minute

In our application we need to make around 10k REST API calls to 10k different endpoint URLs per minute. Earlier I was using a synchronous model, but I quickly realized that I could not scale beyond a limit of roughly 2k, so I am working on switching to an async model. Using the HttpCore-NIO lib I could scale up to 5k or so, but beyond that I randomly get an error 'I/O reactor has been shut down' and the entire app basically stops processing requests. I don't see any stack trace either, which makes it extremely hard to debug.
So I am trying to evaluate what could be the best strategy/library to achieve this scale with Java as the programming language. Any suggestions on which libraries I should look into?
If your machine itself is not stressed on CPU, network, etc., then one thing that can help is horizontal scaling: if you can achieve 5k with one process, try firing requests from two processes and see how far you can scale horizontally. The extra coding needed for that should not be much.
Or try trapping exceptions and ignoring them, to prevent the I/O reactor from shutting down. Experiment with what can be ignored. See the example at https://hc.apache.org/httpcomponents-core-4.4.x/tutorial/html/nio.html#d5e477 and search for "I/O reactor exception handling".
Experiment with the I/O thread count (setIoThreadCount).
Make sure you are not doing too much processing on the HTTP responses; that would hold up the reactor.
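The last point (keeping response processing off the I/O threads) can be illustrated with a small sketch. It deliberately uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of HttpCore-NIO, so it is a swapped-in technique, not the asker's setup. The class name `BoundedAsyncFetcher`, the semaphore-based cap on in-flight requests, and the timeout values are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical wrapper: caps the number of in-flight requests and runs
// response processing on a separate worker pool, not on the client's I/O threads.
final class BoundedAsyncFetcher {
    private final HttpClient client;
    private final Semaphore inFlight;
    private final ExecutorService workers;

    BoundedAsyncFetcher(int maxInFlight, ExecutorService workers) {
        this.client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5)) // assumed value
                .build();
        this.inFlight = new Semaphore(maxInFlight);
        this.workers = workers;
    }

    // Blocks the caller while maxInFlight requests are outstanding (simple backpressure),
    // then sends the request asynchronously and hands the body to `process` on `workers`.
    <T> CompletableFuture<T> fetch(URI uri, Function<String, T> process)
            throws InterruptedException {
        inFlight.acquire();
        CompletableFuture<HttpResponse<String>> sent;
        try {
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .timeout(Duration.ofSeconds(10)) // assumed value
                    .GET()
                    .build();
            sent = client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
        } catch (RuntimeException e) {
            inFlight.release(); // the request never started, so give the permit back
            throw e;
        }
        return sent
                // Release the permit as soon as the exchange ends, success or failure.
                .whenComplete((response, error) -> inFlight.release())
                // Potentially slow processing runs on the worker pool.
                .thenApplyAsync(response -> process.apply(response.body()), workers);
    }

    int availablePermits() {
        return inFlight.availablePermits();
    }
}
```

Failures surface per request through the returned future, so one bad endpoint does not stop the others. Whether this reaches 10k requests per minute on a given machine still has to be measured.
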

Slow Transactions - WebTransaction taking the hit. What does this mean?

I'm trying to work out why response times on some of my application servers have crept up to over 1s, as reported by New Relic. We're using WebApi 2.0 and MVC5.
As you can see below, the bulk of the time is spent under 'WebTransaction'. The throughput figures aren't particularly high. What could be causing this, and what steps can I take to reduce it?
Thanks
EDIT: I added transaction tracing to this function to get some further analysis - see below:
Over 1 second waiting in System.Web.HttpApplication.BeginRequest().
Any insight into this would be appreciated.
Ok - I have now solved the issue.
Cause
One of my logging handlers, which syncs its data to cloud storage, was initializing every time it was instantiated, and that initialization involved a call to Azure table storage. As it was passed into the controller in question, every call to the API resulted in this instantiation.
It was a blocking call, so it added ~1s to every call. Once I configured this initialization to happen once per server life cycle, the delay went away.
Observations
As the blocking call was made at the time the controller was being built (due to Unity resolving the dependencies at this point), New Relic reports this as
System.Web.HttpApplication.BeginRequest()
I would love to see this reported a little more granularly, but as we can see from the transaction trace above, it was in fact the 7 calls to table storage (still not quite sure why it was 7) that led me down this path.
Nice tool - my New Relic subscription is starting to pay for itself.
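The fix described under Cause (running the expensive initialization once per server life cycle instead of once per request) can be sketched as follows. This is a Java illustration, not the poster's Unity/C# code; `Lazy`, `CloudLogHandler`, and `RequestPipeline` are hypothetical names, and the constructor only counts invocations where the real handler would make its blocking storage call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Thread-safe memoizing supplier: runs the factory at most once,
// provided the factory returns a non-null value.
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }
}

// Hypothetical stand-in for the logging handler from the post.
final class CloudLogHandler {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    CloudLogHandler() {
        // The real handler would make its blocking storage call here.
        INIT_COUNT.incrementAndGet();
    }
}

// Every request asks for the handler, but only the first one pays for initialization.
final class RequestPipeline {
    private static final Lazy<CloudLogHandler> SHARED_HANDLER =
            new Lazy<>(CloudLogHandler::new);

    static CloudLogHandler handlerForRequest() {
        return SHARED_HANDLER.get();
    }
}
```

In a dependency injection container the same effect usually comes from registering the handler with a singleton (container-lifetime) scope rather than a per-resolve scope.
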
It appears that the bulk of the time is being spent in Account.NewSession, but it is difficult to say without drilling down into your data. If you need some more insight into a block of code, you may want to consider adding Custom Instrumentation.
If you would like us to investigate this in more depth, please reach out to us at support.newrelic.com, where we will have your account information on hand.

Azure SqlException: Database on server is not currently available

Our site has been running for a few weeks in Azure without getting this error:
SqlException: Database 'database' on server 'server' is not currently
available. Please retry the connection later. If the problem
persists, contact customer support, and provide them the session
tracing ID of 'guid'.
We finally got it one day when there were a little over 2K active (concurrent) users. This is the closest question that I can find on SO. We are not using EF, though; we're using Dapper. I'm out of ideas on how to debug our application to find out what caused the issue, and it's even harder now that the issue has not come up for the past 2 days. I definitely need to be on the lookout, and I'd appreciate any tips on where I should be looking, what I need to do to determine the cause of the issue, and how to possibly fix it.
It sounds like you need to handle transient failures via some sort of transient fault handling mechanism. Here is a post asking a similar question:
SQL Azure Database retry logic. David's answer is similar to the approach we took to deal with the issue.
Here is another link to some code that is similar to David's and to our solution, to help you get your head around it. http://www.getcodesamples.com/src/4A7E4E66/41D6FAD
We had similar issues when we first moved to SQL Azure, but since we implemented back-off retry logic for the transient connection issues, it recovers after a few seconds the majority of the time.
We went down the path of handling transient errors with the Azure Transient Fault Block, but this caused bigger issues - namely, if you reach the SQL connection limit (easy to do), having retry logic in place only makes things worse.
If it only happens once a month, I'd leave it be, and just handle it gracefully higher up the stack. An alternative is to create a custom retry policy to avoid retrying on certain errors, but it may still do more harm than good.
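Both answers can be combined in one sketch: retry with exponential back-off, but only for errors a caller-supplied policy classifies as transient, so that conditions such as an exhausted connection limit fail fast instead of being retried. This is a Java illustration under stated assumptions, not the Transient Fault block or Dapper code. `BackoffRetry` is a hypothetical name, and treating vendor error code 40613 (the "database is not currently available" error) as the only retryable case is an assumption to be adjusted per application.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper: exponential back-off that retries only errors
// the supplied policy considers transient.
final class BackoffRetry {
    static <T> T call(Callable<T> action,
                      Predicate<Exception> isTransient,
                      int maxAttempts,
                      long baseDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on the last attempt, or at once for non-transient errors.
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // double the wait before the next attempt
            }
        }
    }

    // Assumed policy: retry only the "database is not currently available" error.
    static boolean isDatabaseUnavailable(Exception e) {
        return e instanceof SQLException
                && ((SQLException) e).getErrorCode() == 40613;
    }
}
```

A call would look like `BackoffRetry.call(() -> runQuery(), BackoffRetry::isDatabaseUnavailable, 4, 500)`, where `runQuery` is a placeholder for the actual data access. Keeping `maxAttempts` small limits how much extra load the retries add while the database is already struggling.
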

How useful is Response.IsClientConnected?

I was wondering if anyone had experience they could share using the Response.IsClientConnected property as a performance optimization for asp.net websites.
The reason I ask is that I am a bit skeptical about how effective it would be in real-life scenarios. I understand the concept of checking the value before performing a large task, but I just can't see how useful this would be, as clients could disconnect at any point in time.
I think the main usage would be for optimizing the delivery of long processes. For example, if you had to generate a huge report or something, you might run the report in a separate thread and then periodically check to see if the user is still connected. If not, you could kill this long-running process so that it is not running needlessly, since the user is no longer expecting a response.
This helps to prevent users from starting long processes and then making more requests over and over because they think it is slow. If you were not doing this type of checking, all those requests could tax your server even though only one of them is still valid. This scenario could be handled by allowing each user to run only one long-running task at a time, but the check also helps in a multi-user environment, to make sure you are only spending time serving requests where the user is still connected and waiting for the response.
Note: I have never actually used this before, this is just based on my very basic understanding of what I have read.
I have used this extensively in my applications and it can give you a huge saving on resources.
Try this: create a page that needs some time to complete and try refreshing it many times before it completes. You will see that the requests are queued for execution. Imagine a user who has a slow connection and refreshes the page many times, thinking this will fetch the page (a very common way for a site to run out of resources: many users are connected and, for some reason, it becomes slow).
Now change it so that at the start of each page load (or earlier, at page init) you check HttpContext.Current.Response.IsClientConnected and, if the client is not connected, throw a thread abort exception. You will see that your site responds much sooner.
Actually, I check whether the client is connected before any heavy action on the page, so as to prevent needless execution. In production environments, I have seen that this check helps a lot, especially when the system becomes slow.
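The pattern both answers describe (check the connection before each heavy step and stop early once the client has gone) is sketched below in Java. Response.IsClientConnected is an ASP.NET property; here it is modelled by a hypothetical `BooleanSupplier`, and `ReportJob` is an illustrative name, so this shows only the control flow, not the ASP.NET API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical long-running job that is split into chunks and re-checks
// the client connection before starting each one.
final class ReportJob {
    // Returns the number of chunks completed before finishing or abandoning the work.
    static int run(int totalChunks, BooleanSupplier clientConnected, Runnable processChunk) {
        int done = 0;
        while (done < totalChunks) {
            if (!clientConnected.getAsBoolean()) {
                break; // nobody is waiting for the result, so stop spending resources
            }
            processChunk.run();
            done++;
        }
        return done;
    }
}
```

The check only helps at chunk boundaries, which matches the skepticism in the question: a client can still disconnect in the middle of a chunk, so the saving depends on how finely the work is divided.
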
