I have a simple web app with one endpoint which goes through 2 others http call & 2 database access. When I run some naïve load test with siege on my laptop I observe surprising result: a clearly low throughput and when profiling a high level on lock contention. I remove all the http calls and even if the throughput increase but it's not incredible, I observe the same high level of lock contention.
Here is the result of my profiling:
What I misunderstand is that the majority of my lock contention is done outside my user code. How can I diagnose this? Any idea about the method to find the root cause?
We use .net 6 - Ef 6 - SqlClient 3.
I suspect this at the beginning https://github.com/dotnet/SqlClient/issues/422 , upgrade to SQL client 5 and disable MARS but same behavior. My env is macOS.
Related
I am trying to stress test a IIS running a AspNet core App.
to do this i setup a Thread Group with 100 workers
In the thread group I use a Loop Controller
in the loop controller I use a Access Log sampler in order to replay real Get requests obtained from NCSA formatted logfile.
I am amazed to see that i obtain as total throughput only 100 request per sec.
how can i check if this is a limitation of jmeter or if this is a limitation of my web App ?
I would expect jmeter to blast my server and see target CPU shoot at 100%. or shall I increas again already high value of 100 threads ?
Total throughput is 86 requests per second
100 users might be not enough to "blast" your IIS instance, I would rather recommend going for stress test, i.e. start with 1 user and gradually increase the load until throughput starts decreasing or errors start occurring, whatever comes the first. Moreover it might be the case that the bottleneck is not CPU usage, it may be somewhere else, in case of incorrect configuration or inefficient algorithms the web application may not fully utilize underlying OS and hardware resources
Don't use GUI mode for testing, it's only for test development and debugging, when it comes to execution you should be running JMeter tests in command-line non-GUI mode
There are 464 errors, check jmeter.log file for any suspicious entries
I don't think you can really replay your access log, it can be used only for something simple like static HTML pages, if there is authentication or dynamic parameters it might be the case all your requests are hitting the same login page, you can try running your test using View Results Tree listener and inspect the responses to ensure that your test is doing what it's supposed to be doing
I'm executing a load test against an application hosted in Azure. It's a cloud service with 3 instances behind an internal load balancer (Hash based load balancing mode).
When I execute the load test, it queues request even though the req/sec and total current request to IIS is quite low. I'm not sure what could be the problem.
Any suggestions?
Adding few screenshot of performance counters which might help you take decision.
Click on image to view original image.
Edit-1: Per request from Rohit Rajan,
Cloud Service is having 2 instances (meaning 2 VMs), each of them having 14 GBs of RAM and 8 cores.
I'm executing a Step load pattern start with 100 and add 100,150 user every 5 minutes, till 4-5 hours until the load reaches to 10,000 VUs.
Any call to external system are written async. Database calls are synchronous.
There is no straight forward answer to your question. One possible way would be to explore additional investigation options.
Based on your explanation, there seems to be a bottleneck within the application which is causing the requests to queue-up.
In order to investigate this, collect a memory dump when you see the requests queuing up and then use DebugDiag to run a hang analysis on it.
There are several ways to gather the memory dump.
Task Manager
Procdump.exe
Debug Diagnostics
Process Explorer
Once you have the memory dump you can install debug diag and then run analysis on it. It will generate a report which can help you get started.
Debug Diagnostics download: https://www.microsoft.com/en-us/download/details.aspx?id=49924
we're currently having issue in our production servers and would like to try to replicate the issue in our dev. I'm currently awaiting access to our Performance Monitoring Tool, and while waiting would like to play with it a little.
I'm thinking of, since I suspect a host throttling in prod, forcing hosts to throttle in dev and see if it will recreate the issue.
Is there a way to do this?
As others have mentioned, monitoring of the throttling counters and other counters like memory and WIP messages is a must to see what is going on in your production server. Also would recommend that set up a SCOM alert on throttling states of 3+ (publishing + delivery states), if you have SCOM.
Message throughput can grind to a halt on especially the memory (4, 5) and Queue Size (6) states. States 1+2 are generally short lived (e.g. arrival of a large batch of messages) and Biztalk recovers within a few seconds.
Simulating the memory state in your Dev environment should be straightforward by tweaking the throttling thresholds (obviously not something to be taken lightly in production!)
e.g. to trigger the Memory threshold states - AFAIK the lowest memory usage threshold you can set is 101MB. Running a load test in dev should then be able reproduce the throttle.
There is also apparantly a user-based throttling override to set states 10 and 11 although haven't actually tried this.
Some other experience on avoiding throttling:
(Caveat - I don't have an active BizTalk 2006/R2 setup - this is for 2009 / 2010)
If you do a lot of asynchronous processing (e.g. Queue receives), ensure that you have split functionality into separate Hosts for Receive, Processing and Send hosts. This way you can adjust the throttling for asynch Receive hosts to trigger much earlier than the processing and sending hosts - this should have the effect of constricting new incoming messages to the messagebox but allowing existing messages to complete processing.
On 64 bit hosts, the default 25% memory host usage throttling level is usually an unnecessary liability - we increased this using Yossi Dahan's recommendation of 50% on a 4GB server
Note that suspended messages count toward throttling state 6 - ensure that you have a strategy for dealing with suspended messages (and obviously ensure that the Sql Agent jobs are running!).
I have an IIS Web Server that hosts 400 web applications (distributed across 30 application pools). They are both ASP.NET applications and WCF Services end points. The server has 32GB of RAM and is usually running fast; although it's running at 95% memory usage. Worker processes each take between 500MB and 1.5GB of RAM.
I also have another box running SQL Server. That one has plenty of free memory.
Sometimes, the Web Server starts throwing SQL Timeout exceptions. A few per minutes at first and rapidly increasing to hundreds per minute; effectively making the server down. This problem affects applications in all pools. Some requests still complete but most of them don't. While this happens the CPU usage on the server is around 30% (which is the normal load on that box).
While this is happening, we can still use SQL Server Management Studio (from the IIS Server) to execute requests successfully (and fast).
The fix is to restart IIS. And then everything goes back to normal until the next time.
Because the server is running with very low memory, I feel like this is the cause. But I cannot explain the relationship between low memory and sudden bursts of SQL Timeout exceptions.
Any idea?
Memory pressure can trigger paging and garbage collection. Both introduce latency which would not be present otherwise.
GC'ing 32GB of data can take seconds. Why would all app processes GC at the same time? Because at about 95% memory utilization Windows sets a "low memory" event that the CLR listens to. It will try to release memory to help other processes.
If the applications get into a paging frenzy that would also explain huge delays in normal execution.
This is just guessing, though. You can try proving it by looking at the "Hard page faults/sec" counter. There also must be a counter for "full GC" or "Gen 2 GC".
The fix would be running at a higher margin to the physical memory limit.
The first problem is to discover where the timeout is happening. Can you tell from the stack trace if the timeout is happening when executing a request against the database, or when connecting to the database? (Or even connecting to the web server?)
Timeouts executing database requests can be a variety of causes. The problem might be in the database with blocking processes, database maintenance (also locking), deadlocks, etc. When apps are running slowly, do you see a lot of entries in sys.dm_exec_requests, and if so, what are their wait_types?
Even if you can run SQL in the query window while the web server is timing out, that doesn't mean there isn't massive blocking or deadlocking going on.
If it is a timeout connecting to the database, then it is possible the ADO connection pools are overwhelmed and not getting cleaned up, or the database has a connection limit, and the web services are timing out waiting for a connection.
One of the best ways to find out what is going on is to capture a memory dump of the w3wp.exe process and analyze it. Even if you aren't adept at a debugger like WinDbg, Microsoft's DebugDiag tool can produce some nice reports with helpful information.
SqlCommand.CommandTimeout
This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.
It is a client based time out. If stuff is getting queued due to memory constraints then that could cause a timeout.
Are you retrieving a lot of data from these queries?
If some queries return a lot of data consider breaking them up and give the user a next and prior button.
Have you considered asynch like BeginExecuteReader?
The advantage is no timeout.
It does not release the calling thread.
isExecutingFTSindexWordOnce = true;
sqlCmdFTSindexWordOnce.BeginExecuteNonQuery(callbackFTSindexWordOnce, sqlCmdFTSindexWordOnce);
// isExecutingFTSindexWordOnce set to false in the callback
Debug.WriteLine("Calling thread active");
But I agree with your comment how to respond to the request as the answer does not come back to the calling thread.
Sorry I am used to WPF where I just update a public property on the call back.
We have a fairly popular site that has around 4 mil users a month. It is hosted on a Dedicated Box with 16 gb of Ram, 2 procc with 24 cores.
At any given time the CPU is always under 40% and the memory is under 12 GB but at the highest traffic we see a very poor performance. The site is very very slow. We have 2 app pools one for our main site and one for our forum. Only the site is being slow. We don't have any restrictions on cpu or memory per app pool.
I have looked at he Performance counters and I saw something very interesting. At our peek time for some reason Request are being queued. Overall context switching numbers are very high around 30 - 110 000 k.
As i understand high context switching is caused by locks. Can anyone give me an example code that would cause a high number of context switches.
I am not too concerned with the context switching, and i don't think the numbers are huge. You have a lot of threads running in IIS (since its a 24 core machine), and higher context switching numbers re expected. However, I am definitely concerned with the request queuing.
I would do several things and see how it affects your performance counters:
Your server CPU is evidently under-utilized, since you run below 40% all the time. You can try to set a higher value of "Threads per processor limit" in IIS until you get to a 50-60% utilization. An optimal value of threads per core by the books is 20, but it depends on the scenario, and you can experiment with higher or lower values. I would recommend trying setting a value >=30. Low CPU utilization can also be a sign of blocking IO operations.
Adjust the "Queue Length" settings in IIS properties. If you have configured the "Threads per processor limit" to be 20, then you should configure the Queue Length to be 20 x 24 cores = 480. Again, if the requests are getting Queued, that can be a sign that all your threads are blocked serving other requests or blocked waiting for an IO response.
Don't serve your static files from IIS. Move them to a CDN, amazon S3 or whatever else. This will significantly improve your server performance, because 1,000s of Server requests will go somewhere else! If you MUST serve the files from IIS, than configure IIS file compression. In addition use expire headers for your static content, so they get cached on the client, which will save a lot of bandwidth.
Use Async IO wherever possible (reading/writing from disk, db, network etc.) in your ASP.NET controllers, handlers etc. to make sure you are using your threads optimally. Blocking the available threads using blocking IO (which is done in 95% of the ASP.NET apps i have seen in my life) could easily cause the thread pool to be fully utilized under heavy load, and Queuing would occur.
Do a general optimization to prevent the number of requests that hit your server, and the processing time of single requests. This can include Minification and Bundling of your CSS/JS files, refactoring your Javascript to do less roundtrips to the server, refactoring your controller/handler methods to be faster etc. I have added links below to Google and Yahoo recommendations.
Disable ASP.NET debugging in IIS.
Google and Yahoo recommendations:
https://developers.google.com/speed/docs/insights/rules
https://developer.yahoo.com/performance/rules.html
If you follow all these advices, i am sure you will get some improvements!