Getting out-of-memory errors with Doctrine in Symfony, but it's just ~20 rows with pagination. How do I start debugging this?
Related
We have a problem on our website: seemingly at random (every day or so, up to once every 7-10 days), the website will become unresponsive.
We have two web servers on Azure, and we use Redis.
I've managed to run DotNetMemory and catch it when it crashes, and what I observe under "Event handlers leak" is two items whose counts increase into the thousands before the website stops working. Those two items are CaliEventHandlerDelegateProxy and ArglessEventHandlerProxy. Once the site crashes, we get lots of Redis exceptions saying it can't connect to the Redis server. According to the Azure Portal, our Redis server load never goes above 10% at peak times, and we're following all best practices.
I've spent a long time going through our website making sure there are no obvious memory leaks, and have patched a few cases that went under the radar. Anecdotally, these fixes seem to have improved the website's stability a little. Things we've checked:
All IDisposable objects are now wrapped in using blocks (we did this strictly before, but we did find a few that weren't disposed properly)
Event handlers are unsubscribed - there are very few in our code base (both patterns are sketched below)
We use WebUserControls pretty heavily. Each one had the current master page passed in as a parameter. We've removed this dependency, as we thought it might prevent the GC from collecting the page
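For reference, here is the shape of the first two fixes (a minimal sketch with hypothetical names - ProductList, PriceFeed - not code from our actual site):

```csharp
using System;
using System.Data.SqlClient;

public partial class ProductList : System.Web.UI.UserControl
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // IDisposable objects scoped with using, so Dispose always runs
        using (var connection = new SqlConnection("<connection string>"))
        {
            connection.Open();
            // ... query and bind data ...
        }

        PriceFeed.PricesChanged += OnPricesChanged;
    }

    protected void Page_Unload(object sender, EventArgs e)
    {
        // Unsubscribe so the long-lived publisher can't root this control
        PriceFeed.PricesChanged -= OnPricesChanged;
    }

    private void OnPricesChanged(object sender, EventArgs e) { /* re-bind */ }
}

// Hypothetical static event source - exactly the kind of publisher that
// keeps pages alive when a handler is never removed.
public static class PriceFeed
{
    public static event EventHandler PricesChanged;
}
```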
Our latest issue is that the web server runs fine until we run DotNetMemory and attach it to the w3wp.exe process - doing that causes the CaliEventHandlerDelegateProxy and ArglessEventHandlerProxy leaks to increase rapidly until the site crashes! So the crash is reproducible just by running DotNetMemory.
I'm at a loss now. I believe I've exhausted all possibilities of memory leaks in our code base, and our "solution" is to have the app pools recycle every few hours to be on the safe side.
We've even tried upgrading Redis to the Premium tier, and upgraded all the drives on the web servers to SSDs to see if it would help, which it doesn't appear to have.
Can anyone shed any light on what might be causing these issues?
All IDisposable objects are now wrapped in using blocks (we did this strictly before, but we did find a few that weren't disposed properly)
We can't say much about the crash without more information about it, but I have some speculations.
I see 10,000 (!) undisposed objects being handled by the finalization queue. Let's start with them: find all of them and add Dispose calls in your app.
I would also recommend checking how many system handles your application is using. There is an OS limit on the number of handles, and once it is exceeded, no more file handles, network sockets, etc. can be created. I recommend this especially given the number of undisposed objects.
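For example, you can watch the handle count from inside the application (a quick sketch; the same number is visible in Task Manager's Handles column or in Process Explorer):

```csharp
using System;
using System.Diagnostics;

static class HandleMonitor
{
    // Logs the OS handle count of the current process. Call it periodically
    // (from a timer or a health-check page) and watch for steady growth.
    public static void LogHandleCount()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            Console.WriteLine("{0:u} handles: {1}", DateTime.UtcNow, process.HandleCount);
        }
    }
}
```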
Also, if you are hitting timeouts accessing Redis, use a performance profiler to find out why. I recommend JetBrains dotTrace in Timeline mode to get a profile of your app; it shows thread sleeping, thread contention, and much more, which will help you find the root of the problem. You can use the command-line tool to collect profiling data, so you don't have to install the GUI application on the server.
it causes the CaliEventHandlerDelegateProxy and ArglessEventHandlerProxy event leaks to increase rapidly
dotMemory doesn't change your application code and doesn't allocate any managed objects in the profiled process. The Microsoft Profiling API injects a DLL (written in C++) into the profiled process; this is the part of dotMemory named the Profiling Core, which plays the role of the "server" (the standalone dotMemory application, written in C#, is the client). The Profiling Core does some work on the gathered data before sending it to the client side, and this requires some memory, which is allocated, of course, in the address space of the profiled process - but it doesn't affect managed memory.
Memory profiling may affect the performance of your application. For example, the profiling API disables concurrent GC while the application is being profiled, and collecting memory allocation data can significantly slow your application down.
Why do you think that CaliEventHandlerDelegateProxy and ArglessEventHandlerProxy are allocated only under dotMemory profiling? Could you please describe how you investigated this?
Event handlers are unsubscribed - there are very few in our code base
When dotMemory reports an event handler as a leak, it means there is only one reference to it - from the event source - and there is no longer any way to unsubscribe from that event. Check all of these leaks, find the ones from your code, and look at how they happened. In any case, only 110.3 KB is retained by these objects, so why did you conclude that your site crashed because of them?
I'm at a loss now, I believe I've exhausted all possibilities of memory leaks in our code base
Take several snapshots over a period of time while memory consumption is growing, open a full comparison of some of these snapshots, and look at all the surviving objects that should not have survived, then find out why they survived. This is the only way to prove that your app doesn't have a memory leak; reading the code doesn't prove it, sorry.
I hope that if you perform all the activities I've recommended (performance profiling; investigating full snapshots and snapshot comparisons, not just the inspections view; and checking why there is such a huge number of undisposed objects), you will find and fix the root of the problem.
After deploying a new version of a hybrid ASP.NET web application (Framework 4.5.1, IIS 7.5), we immediately noticed that CPU usage was spiking to 100%.
I followed the procedure for debugging CPU spikes with DebugDiag described in this article: http://www.iis.net/learn/troubleshoot/performance-issues/troubleshooting-high-cpu-in-an-iis-7x-application-pool
I now have my report, and every one of the threads identified as a high-CPU-usage problem looks like this, with varying thread numbers:
Thread 1576 - .SNIReadSyncOverAsync(SNI_ConnWrapper*, SNI_Packet**, Int32)
I'm guessing this means the culprit is a LINQ to SQL call. The application uses a lot of LINQ to SQL. Unfortunately, the DebugDiag report gives no clue as to which LINQ to SQL call is causing the difficulty.
Is there any way to use the information in the DebugDiag report to identify the SQL Server calls that cause the high CPU usage?
We never did find an answer to the question. I was hoping for an answer that would tell us what we could add to the performance monitor data collection to see the actual SQL being passed by the threads that were spiking the CPU.
Instead we ran SQL Server performance monitor, duly filtered to cover only traffic from the web application, for about a minute. We dumped all the data collected into a table, then examined statement start and end times to identify statements that were taking an inordinate amount of time. From this collection of sluggish statements we identified the SQL call that was spiking CPU.
Oddly enough, the SQL call (selecting the results of an inline table-valued function) takes 2-3 seconds to complete, but most of that time is taken by SQL Server resetting the connection (sp_reset_connection). The call itself returns in less than a millisecond, and when we execute the same function in SSMS with identical parameters, it completes in less than a millisecond. However, that will be the topic of a separate question.
I have a legacy ASP.NET website consisting of over 230 unique .ASPX files. This website has hundreds of thousands of hits per day across many of the different files. It leaks memory, causing periodic process recycles throughout the day (Windows Event ID 5117: A worker process with process id of '%1' serving application pool '%2' has requested a recycle because it reached its private bytes memory limit.)
I've already tested the 30 most frequently accessed pages and fixed the memory leaks in several of them, resulting in a significant improvement, and load testing shows those pages no longer leak. But that leaves over 200 pages still unchecked, and with that many files left I wonder if there isn't something a little more organized or clever that can be done.
For instance, is there instrumentation that could be added to the Application_BeginRequest or Application_EndRequest event handlers in the Global.asax? If so, what specifically should be monitored? Example code and/or discussion would be most helpful.
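Something like the sketch below is what I have in mind, though I haven't tried it (private bytes are process-wide, so a single request's delta is noisy and only meaningful when aggregated per URL over many requests):

```csharp
using System;
using System.Diagnostics;
using System.Web;

public class Global : HttpApplication
{
    private const string MemKey = "__privateBytesBefore";

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        using (Process p = Process.GetCurrentProcess())
        {
            Context.Items[MemKey] = p.PrivateMemorySize64;
        }
    }

    protected void Application_EndRequest(object sender, EventArgs e)
    {
        object before = Context.Items[MemKey];
        if (before == null) return;

        long after;
        using (Process p = Process.GetCurrentProcess())
        {
            after = p.PrivateMemorySize64;
        }

        // Swap Trace.WriteLine for a real logging framework; the idea is to
        // aggregate the deltas per URL and rank pages by average growth.
        Trace.WriteLine(Context.Request.Url.AbsolutePath +
                        " delta=" + (after - (long)before) + " bytes");
    }
}
```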
The best tool you can use to get organized and plug your biggest leaks first is WinDbg.
It comes with the Windows SDK.
Here's a reference:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff551063(v=vs.85).aspx
It can be a little tough at first to get used to the commands, but here's what you'll want to do.
1. Install WinDbg on a machine that is running the site
2. Load test all the pages and get the memory usage way up
3. (optional) Force a garbage collection with GC.Collect()
4. Attach to w3wp.exe with WinDbg
5. Load your symbols (the .pdb files) and the SOS extension for the CLR
6. Dump the heap statistics (!dumpheap -stat)
This will show you a list sorted by the number of objects in memory. When you have a leak, you start to build up tons of the same object. You then need to dig deeper to get the size of those objects:
1. The first number in the row is the method table; copy it
2. Dump the objects for that method table (!dumpheap -mt #######)
3. Choose a single object's address from the first column and copy it
4. Get the size of the object (!objsize #######)
(# of objects) × (size of a single object) = size of the leak
Find the classes that are taking up the most space and plug them first.
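Put together, a session might look roughly like this (the addresses and counts are invented for illustration - here, 152,000 leaked strings retaining ~412 bytes each would be roughly 60 MB; on .NET 2.x load SOS with .loadby sos mscorwks instead, and !gcroot is a useful final step to see what is keeping an instance alive):

```
0:000> .loadby sos clr
0:000> !dumpheap -stat
...
79330a00   152000    6080000 System.String
0:000> !dumpheap -mt 79330a00
...
01b45c80 79330a00       44
0:000> !objsize 01b45c80
sizeof(01b45c80) = 412 (0x19C) bytes
0:000> !gcroot 01b45c80
```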
This may help too - CLR Profiler:
http://www.microsoft.com/en-us/download/details.aspx?id=14727
Last night I did a load test on a site. I found that one of my shared caches is a bottleneck. I'm using a ReaderWriterLockSlim to control the updates of the data. Unfortunately at one point there are ~200 requests trying to update the data at approximately the same time. This also coincided with CPU usage spikes.
The data being updated is in the ASP.NET Cache. What I'd like to do is skip the cache and hit the database on another machine whenever CPU usage is around 75%.
My problem is that I don't know how expensive it is to create a new performance counter to check the CPU usage. Also, I would probably like the average CPU usage over the last 2 or 3 seconds, but I can't sit there calculating CPU time, as that would take longer than the cache update itself currently does.
Is there an easy way to get the average CPU usage? Are there any drawbacks to this?
I'm also considering totaling the wait count for the lock and switching over to the database once it passes a certain threshold. My concern with that approach is that changing hardware might allow more locks with less strain on the system, finding the right balance for the threshold would be cumbersome, and it doesn't take into account any other load on the machine. But it's a simple approach, and simple is better 99% of the time.
This article from Microsoft covers tuning .NET application performance and highlights which counters to collect and compare to determine whether an application is CPU-bound or I/O-bound.
You sound like you want to monitor this during execution and bypass your cache when things get intensive. Would this not just move the intensive processing from the cache calls to your database calls? Surely you have the cache to avoid expensive DB calls.
Are you trying to repopulate an invalidated cache? What is the effect of serving stale data from the cache? You could just lock on the re-populating function and serve stale data to other requests until the process completes, as sketched below.
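A sketch of that last idea - one request rebuilds while everyone else keeps serving the stale entry (CatalogCache and CatalogData are hypothetical names; it assumes the stale value stays in the cache until the rebuild replaces it):

```csharp
using System.Threading;
using System.Web.Caching;

public static class CatalogCache
{
    private static readonly object RebuildLock = new object();

    public static CatalogData Get(Cache cache)
    {
        var data = (CatalogData)cache["catalog"];
        if (data != null && !data.IsStale)
            return data;

        // Only the first request through here rebuilds; everyone else fails
        // TryEnter immediately and keeps serving the stale entry.
        if (Monitor.TryEnter(RebuildLock))
        {
            try
            {
                cache.Insert("catalog", LoadFromDatabase());
            }
            finally
            {
                Monitor.Exit(RebuildLock);
            }
        }

        // Stale (or just-refreshed) data; only a completely cold cache
        // forces callers to the database directly.
        return (CatalogData)cache["catalog"] ?? LoadFromDatabase();
    }

    private static CatalogData LoadFromDatabase()
    {
        /* ... the expensive call ... */
        return new CatalogData();
    }
}

// Hypothetical payload; IsStale would be set by your invalidation logic.
public class CatalogData
{
    public bool IsStale { get; set; }
}
```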
Based on the above article, we collect the following counter objects during our tests and that gives us all the necessary counters to determine the bottlenecks.
.NET CLR Exceptions
.NET CLR Memory
ASP.NET Applications
ASP.NET
Memory
Paging File
Processor
Thread
The sections in the article for CLR Tuning and ASP.NET Tuning highlight the bottlenecks that can occur and suggest configuration changes to improve performance. We certainly made changes to the thread pool settings to get better performance.
Changing and Retrieving Performance Counter Values might help with accessing the existing Processor counter from code, but it isn't something I've tried personally.
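If you do go the code route, here is a sketch of sampling the existing Processor counter (an untested outline: PerformanceCounter.NextValue() needs two samples some time apart before it returns a meaningful number, so keep one long-lived counter on a background timer instead of creating one per request):

```csharp
using System.Diagnostics;
using System.Threading;

public static class CpuSampler
{
    private static readonly PerformanceCounter Cpu =
        new PerformanceCounter("Processor", "% Processor Time", "_Total");

    private static Timer _timer;            // field keeps the timer alive
    private static volatile float _average;

    // Call once at application start (e.g. in Application_Start).
    public static void Start()
    {
        Cpu.NextValue(); // the first sample always returns 0
        _timer = new Timer(_ =>
        {
            float sample = Cpu.NextValue();
            _average = (_average * 2 + sample) / 3; // crude ~3-second average
        }, null, 1000, 1000);
    }

    // Cheap enough to check on every request.
    public static bool IsBusy
    {
        get { return _average > 75f; }
    }
}
```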
Setup: ASP.NET 3.5, LINQ to SQL. Separate web and DB servers (each 8-core, 8 GB RAM). Four databases. I am running an insert operation with a few million records into DB4 (using LINQ to SQL for now, though I might switch to SqlBulkCopy). Logging shows that records are being inserted consistently at a rate of 600-700 per second (I run DataContext.SubmitChanges() every 1,000 records to keep the transaction size down). The insert runs during one HTTP request (the timeout is set pretty high).
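For reference, the insert loop is shaped roughly like this (Db4DataContext, Records, and Record are hypothetical stand-ins, not the actual code):

```csharp
using System.Collections.Generic;

static void InsertInBatches(string db4ConnectionString, IEnumerable<Record> sourceRecords)
{
    using (var db = new Db4DataContext(db4ConnectionString))
    {
        int pending = 0;
        foreach (Record record in sourceRecords)
        {
            db.Records.InsertOnSubmit(record);
            if (++pending == 1000)
            {
                db.SubmitChanges(); // keeps each transaction to ~1,000 rows
                pending = 0;
            }
        }
        db.SubmitChanges();         // flush the remainder
    }
}
```

One caveat with this shape: the DataContext keeps tracking every inserted entity, so memory still grows across batches unless the context is recreated periodically.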
The problem is that while this insert operation is running, the web application becomes completely unresponsive (both within different browser windows on my machine, and on other browsers in remote locations).
This insert operation is touching one table in DB4. Most pages will only touch DB1 (so I don't think that it is a locking issue - I also checked in through Management Studio, and no objects are being locked unnecessarily). I have checked out performance stats on both the Web and DB servers, and while they may spike from time to time, throughout the inserts they stay well within the "green".
Any idea about what can be causing the app to become unresponsive or suggestions about things that I should do in order to narrow down the issue?
Responses to suggestions:
Suggestion that the inserts are using all DB connections: the inserts are done over a different connection string (and DB) than the other pages in the app use. Also, I checked in SSMS, and there is just one connection open for DB4 and one open for DB1 (so it doesn't look like we're running out of connections).
Suggestion that the inserts are maxing out the CPU on the web server: this is the only application on the server (and there are fewer than 5 users at any one time). Performance monitor shows the CPU staying between 12% and 20%. Memory is hardly being touched.
My first guess would be that the insert operations are using up the available database connections, and the web applications are waiting to get a connection to the database.
You have a few options.
Look in SSMS and see what you have for open and active connections under regular load, and again while doing the inserts, to see if that is the problem.
Use a profiling tool such as ANTS Profiler to see what is going on with the web application at the time of the slowdown; it might help pinpoint the issue.
You could also try manually executing the queries that the web application is using, on the SQL Server and see if you notice a similar behavior.
The other option is a bit less likely, but it could be that the web application doing the bulk insert is taking all of the CPU time from the other web applications on the server, starving them. If you haven't done so already, split the application out into its own app pool so you can monitor its load.
I don't know about LINQ to SQL, but NHibernate specifically states that using it for bulk inserts is a bad idea. I have found array binding in ADO.NET to be very fast. Here is an article explaining how to do it with Oracle, but it should work with other providers too.
It seems like a bad idea to run long operations in a web app (for example, your IIS server can restart your application for next to no reason). Split the system into a Web App and a Service App, do the long operations in the Service App, and communicate between them via WCF over named pipes.
Eventual solution: I changed the data insertion from LINQ to SQL to SqlBulkCopy via a DataTable. The first time I did this, I got an OutOfMemory exception when trying to build a DataTable with 2 million rows in memory. So now I add 50,000 rows at a time, load them into the DB with SqlBulkCopy (batch size: 10,000), and then clear the DataTable's Rows collection. I now insert 2.1 million rows in 108 seconds (about 20,000 per second; the rate last night averaged 200 per second with L2S). With the increased insert performance, the app-wide unresponsiveness has gone away.
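A sketch of that approach (Record, sourceRecords, and the table/column names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void BulkInsert(string db4ConnectionString, IEnumerable<Record> sourceRecords)
{
    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Value", typeof(string));

    using (var bulk = new SqlBulkCopy(db4ConnectionString))
    {
        bulk.DestinationTableName = "dbo.Records";
        bulk.BatchSize = 10000;

        foreach (Record record in sourceRecords)
        {
            table.Rows.Add(record.Id, record.Value);
            if (table.Rows.Count == 50000)
            {
                bulk.WriteToServer(table);
                table.Rows.Clear(); // keep the in-memory DataTable small
            }
        }

        if (table.Rows.Count > 0)
            bulk.WriteToServer(table); // flush the remainder
    }
}

class Record
{
    public int Id;
    public string Value;
}
```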
It's possible that you have a lock statement somewhere in your web application that is blocking some important resource for the whole time you're loading your data into the DB.