Symfony2 - Random Failed to start the session - symfony

We run a Symfony2 site with some traffic.
Every day the site fails with this error for 1 or 2 minutes (15-20 errors). It happens at random hours; I could not find a pattern, and it does not even coincide with peak hours.
2015-10-09 02:23:57.635 [2015-10-09 06:23:38] request.CRITICAL: Uncaught PHP Exception RuntimeException: "Failed to start the session" at /var/www/thing.com/httpdocs/app/cache/prod/classes.php line 121 {"exception":"[object] (RuntimeException(code: 0): Failed to start the session at /var/www/thing.com/httpdocs/app/cache/prod/classes.php:121)"} []
It doesn't seem to be a double header or double session start problem.
The site does not interact with any legacy PHP code that could be interfering with the sessions.
Sessions are stored in the database, so a file problem is ruled out.
I lowered the session duration so that the session table does not grow too big, but the problem persists.
I think it could be a problem with HWIOAuthBundle and its Facebook login, but I cannot find where the conflict is.
The site also uses render_esi heavily for caching with Symfony2's internal cache system.
Update
I emptied the /var/lib/php/sessions folder of old session files that were no longer in use.
I lowered the session lifespan. The rows in the sessions table went from ~3 million to ~1.3 million.
The problem seems to be gone, but this is not a real solution.
My guess is that the pdo_handler in Symfony2 has a performance problem.
Maybe someone with more knowledge of this matter (pdo_handler, table optimization) can point to a real solution for high traffic.

Where does your PHP installation save sessions to?
(You can find this in the session.save_path setting in your php.ini file, assuming you have CLI access.)
It is very likely that PHP uses your server's /tmp folder. If this folder is full at any point, PHP can't create new sessions.
You can see the current size of your /tmp folder with:
du -ch /tmp/ |grep total
If, as is common, the /tmp folder is on its own partition, you can see its maximum size with:
df -h
Some programs can suddenly consume gigabytes of this folder for their own purposes.
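If you would rather check this from a script (for example from a cron job that alerts you), the same idea can be sketched in Python. This is only a sketch: the function names and the 0.9 threshold are my own assumptions, and you would pass whatever directory session.save_path actually points to.

```python
import shutil


def usage_report(path):
    """Return (used_fraction, free_bytes) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        raise ValueError(f"filesystem for {path!r} reports zero capacity")
    return usage.used / usage.total, usage.free


def is_nearly_full(path, threshold=0.9):
    """True when the used fraction is at or above `threshold` (an assumed default)."""
    used_fraction, _ = usage_report(path)
    return used_fraction >= threshold
```

Calling is_nearly_full("/tmp") at regular intervals and logging the result would show whether the failures line up with the partition filling up.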

Related

Google Cloud Composer (Apache Airflow) cannot access log files

I'm running a DAG in Google Cloud Composer (hosted Airflow) which runs fine in Airflow locally. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the error:
*** Log file does not exist: /home/airflow/gcs/logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Fetching from: http://airflow-worker-d775d7cdd-tmzj9:8793/log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-d775d7cdd-tmzj9', port=8793): Max retries exceeded with url: /log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8825920160>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I've also tried making the DAG add data into a database and it actually succeeds 50% of the time. However, it always returns this error message (and no other print statements or logs). Any help much appreciated on why this might be happening.
We faced the same issue, raised a support ticket with GCP, and got the following reply:
The message is related to the latency of syncing logs from Airflow workers to WebServer, it takes at least some minutes (depending on the number of objects and their size)
The total log size seems not large but it’s enough to noticeably slow down synchronization, hence, we recommend cleanup/archive the logs
Basically we recommend relying on Stackdriver logs instead, because of latency due to the design of this sync
I hope this will help you solve the problem.
I have the same problem after upgrading from 1.10.3 to 1.10.6 of Google Composer.
I can see in my logs that Airflow is trying to get the logs from a bucket whose name ends with -tenant, while the bucket in my account ends with -bucket.
In the configuration, I can see something weird too.
## airflow.cfg
[core]
remote_base_log_folder = gs://us-east1-dada-airflow-xxxxx-bucket/logs
## also in the running configuration says
core remote_base_log_folder gs://us-east1-dada-airflow-xxxxx-tenant/logs env var
I wrote to Google support and they said the team is working on a fix.
EDIT:
I've been accessing my logs with gsutil, replacing the bucket name suffix with -bucket:
gsutil cat gs://us-east1-dada-airflow-xxxxx-bucket/logs/...../5.logs
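If you have many log paths to fix up, the suffix swap can be scripted. A minimal sketch, assuming the only difference really is the -tenant / -bucket suffix on the bucket name as in my environment; the function name is made up for illustration.

```python
def swap_bucket_suffix(log_url, wrong="-tenant", right="-bucket"):
    """Rewrite gs://<name>-tenant/<path> to gs://<name>-bucket/<path>."""
    prefix = "gs://"
    if not log_url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {log_url!r}")
    # Split off the bucket name so that only the bucket is touched, never the object path.
    bucket, sep, path = log_url[len(prefix):].partition("/")
    if bucket.endswith(wrong):
        bucket = bucket[: -len(wrong)] + right
    return f"{prefix}{bucket}{sep}{path}"
```

The returned URL can then be passed to gsutil cat as above.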
I faced the same situation on multiple occasions.
When I looked at the log in the Airflow web UI right after the job finished, it gave me the same error. When I checked the same logs in the UI a minute or two later, they displayed properly.
As per the above answers, it's a sync issue between the webserver and the worker node.
In general, the issue described here should be sporadic.
In certain situations, it can help to set default-task-retries to a value that allows a task to be retried at least once.
This issue is resolved at least since Airflow version: 1.10.10+composer.

Why isn't Carbon writing Whisper data points as per updated storage-schema retention?

My original carbon storage-schema config was set to 10s:1w, 60s:1y and was working fine for months. I recently updated it to 1s:7d, 10s:30d, 60s:1y. I resized all my whisper files to reflect the new retention schema using the following bit of bash:
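collectd_dir="/opt/graphite/storage/whisper/collectd/"
retention="1s:7d 1m:30d 15m:1y"
# Resize every whisper file in parallel; $retention is left unquoted so each archive becomes its own argument
find "$collectd_dir" -type f -name '*.wsp' | parallel whisper-resize.py \
--nobackup {} $retention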
I've confirmed that they've been updated using whisper-info.py with the correct retention and data points. I've also confirmed that the storage-schema is valid using a storage-schema validation script.
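For anyone comparing whisper-info.py output against a retention string, the expected number of points per archive is just the duration divided by the precision. A small sketch of that arithmetic follows; the helper names are my own, and it only handles unit-suffixed values (s, m, h, d, w, y), with a year counted as 365 days on the assumption that this matches Whisper's convention.

```python
import re

# Seconds per unit; a year is taken as 365 days (assumed to match Whisper).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400, "y": 365 * 86400}


def to_seconds(token):
    """Convert a value such as '10s' or '30d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", token)
    if match is None:
        raise ValueError(f"unsupported retention value: {token!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_retention(definition):
    """Return a list of (seconds_per_point, points) for each archive."""
    archives = []
    for part in definition.replace(",", " ").split():
        precision, duration = part.split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives
```

For the string used in the resize script, parse_retention("1s:7d 1m:30d 15m:1y") gives [(1, 604800), (60, 43200), (900, 35040)], which is what whisper-info.py should report for each archive.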
The carbon-cache{1..8}, carbon-relay, carbon-aggregator, and collectd services have been stopped before the whisper resizing, then started once the resizing was complete.
However, when checking a Grafana dashboard, some of the collectd plugin charts are empty: they have per-second data points but no values. The charts that do show data still have a data point every 10s (the old retention) instead of every 1s.
The /var/log/carbon/console.log is looking good, and the collectd whisper files all have carbon user access, so no permission denied issues when writing.
When running an ngrep on port 2003 on the graphite host, I'm seeing connections to the relay, along with metrics being sent. Those metrics are then getting relayed to a pool of 8 caches to their pickle port.
Has anyone else experienced similar issues, or can possibly help me diagnose the issue further? Have I missed something here?
It took me a little while to figure this out. It had nothing to do with the local_settings.py file, as some of the older responses suggested; it had to do with the Interval setting in collectd.conf.
A lot of the older responses said that you need to include 'Interval 1' inside each Plugin block. That would have been nice, since it gives control over each metric. However, it created config errors in my logs and broke the metric. Setting 'Interval 1' at the top level of the config resolved my issue.

Random w3wp.exe crashes in .net 4

I have a website which has been up and running absolutely fine for about 8 months now. It's running .NET 4 in integrated mode.
Recently I've started to get some "random" w3wp.exe crashes, and after 5 of them, IIS rapid fail protection kicks in and I have to manually log in to the server and start the application pool again.
Here's what the event viewer says for the Error:
Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
Faulting module name: nlssorting.dll, version: 4.0.30319.296, time stamp: 0x504835c7
Exception code: 0xc00000fd
Fault offset: 0x000000000000191f
Faulting process id: 0x1998
Faulting application start time: 0x01ce6e6b9b80c949
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\nlssorting.dll
Report Id: d9cf3164-da5e-11e2-8cc5-f46d0440f6d5
Straight after each crash, I get an "Information" log in the event viewer which, at the bottom, gives me the location of a .wer file.
This is what the .wer file contains:
Version=1
EventType=APPCRASH
EventTime=130162847687759734
ReportType=2
Consent=1
ReportIdentifier=d7c5e520-da5e-11e2-8cc5-f46d0440f6d5
IntegratorReportIdentifier=d7c5e51f-da5e-11e2-8cc5-f46d0440f6d5
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=w3wp.exe
Sig[1].Name=Application Version
Sig[1].Value=7.5.7601.17514
Sig[2].Name=Application Timestamp
Sig[2].Value=4ce7afa2
Sig[3].Name=Fault Module Name
Sig[3].Value=nlssorting.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=4.0.30319.296
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=504835c7
Sig[6].Name=Exception Code
Sig[6].Value=c00000fd
Sig[7].Name=Exception Offset
Sig[7].Value=000000000000197d
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.1.7601.2.1.0.1296.17
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=2057
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=6141
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=61419d6dee6cf74b8ac2b00b4c3b3373
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=c19b
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=c19b8acf029a3088171b1f5f3dd9dc77
UI[2]=c:\windows\system32\inetsrv\w3wp.exe
UI[5]=Check online for a solution (recommended)
UI[6]=Check for a solution later (recommended)
UI[7]=Close
UI[8]=IIS Worker Process stopped working and was closed
UI[9]=A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
UI[10]=&Close
LoadedModule[0]=c:\windows\system32\inetsrv\w3wp.exe
LoadedModule[1]=C:\Windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\Windows\system32\kernel32.dll
LoadedModule[3]=C:\Windows\system32\KERNELBASE.dll
LoadedModule[4]=C:\Windows\system32\ADVAPI32.dll
LoadedModule[5]=C:\Windows\system32\msvcrt.dll
LoadedModule[6]=C:\Windows\SYSTEM32\sechost.dll
LoadedModule[7]=C:\Windows\system32\RPCRT4.dll
LoadedModule[8]=C:\Windows\system32\pcwum.DLL
LoadedModule[9]=C:\Windows\system32\USER32.dll
LoadedModule[10]=C:\Windows\system32\GDI32.dll
LoadedModule[11]=C:\Windows\system32\LPK.dll
LoadedModule[12]=C:\Windows\system32\USP10.dll
LoadedModule[13]=C:\Windows\system32\ole32.dll
LoadedModule[14]=c:\windows\system32\inetsrv\IISUTIL.dll
LoadedModule[15]=C:\Windows\system32\IMM32.DLL
LoadedModule[16]=C:\Windows\system32\MSCTF.dll
LoadedModule[17]=C:\Windows\system32\CRYPTBASE.dll
LoadedModule[18]=C:\Windows\system32\ntmarta.dll
LoadedModule[19]=C:\Windows\system32\WLDAP32.dll
LoadedModule[20]=c:\windows\system32\inetsrv\w3wphost.dll
LoadedModule[21]=C:\Windows\system32\OLEAUT32.dll
LoadedModule[22]=c:\windows\system32\inetsrv\nativerd.dll
LoadedModule[23]=C:\Windows\system32\CRYPT32.dll
LoadedModule[24]=C:\Windows\system32\MSASN1.dll
LoadedModule[25]=C:\Windows\system32\XmlLite.dll
LoadedModule[26]=C:\Windows\system32\ktmw32.dll
LoadedModule[27]=c:\windows\system32\inetsrv\IISRES.DLL
LoadedModule[28]=C:\Windows\system32\CRYPTSP.dll
LoadedModule[29]=C:\Windows\system32\rsaenh.dll
LoadedModule[30]=C:\Windows\system32\mscoree.dll
LoadedModule[31]=C:\Windows\system32\CLBCatQ.DLL
LoadedModule[32]=C:\Windows\system32\mlang.dll
LoadedModule[33]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\webengine4.dll
LoadedModule[34]=C:\Windows\system32\MSVCR100_CLR0400.dll
LoadedModule[35]=C:\Windows\system32\USERENV.dll
LoadedModule[36]=C:\Windows\system32\profapi.dll
LoadedModule[37]=C:\Windows\system32\PSAPI.DLL
LoadedModule[38]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscoreei.dll
LoadedModule[39]=C:\Windows\system32\SHLWAPI.dll
LoadedModule[40]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
LoadedModule[41]=C:\Windows\system32\inetsrv\iiscore.dll
LoadedModule[42]=c:\windows\system32\inetsrv\W3TP.dll
LoadedModule[43]=c:\windows\system32\inetsrv\w3dt.dll
LoadedModule[44]=C:\Windows\system32\HTTPAPI.dll
LoadedModule[45]=C:\Windows\system32\slc.dll
LoadedModule[46]=C:\Windows\system32\WS2_32.dll
LoadedModule[47]=C:\Windows\system32\NSI.dll
LoadedModule[48]=C:\Windows\system32\Normaliz.dll
LoadedModule[49]=C:\Windows\system32\faultrep.dll
LoadedModule[50]=C:\Windows\system32\Secur32.dll
LoadedModule[51]=C:\Windows\system32\SSPICLI.DLL
LoadedModule[52]=C:\Windows\system32\NLAapi.dll
LoadedModule[53]=C:\Windows\system32\napinsp.dll
LoadedModule[54]=C:\Windows\System32\mswsock.dll
LoadedModule[55]=C:\Windows\system32\DNSAPI.dll
LoadedModule[56]=C:\Windows\System32\winrnr.dll
LoadedModule[57]=C:\Windows\System32\wshtcpip.dll
LoadedModule[58]=C:\Windows\System32\wship6.dll
LoadedModule[59]=C:\Windows\system32\IPHLPAPI.DLL
LoadedModule[60]=C:\Windows\system32\WINNSI.DLL
LoadedModule[61]=C:\Windows\system32\rasadhlp.dll
LoadedModule[62]=C:\Windows\System32\fwpuclnt.dll
LoadedModule[63]=C:\Windows\System32\inetsrv\cachuri.dll
LoadedModule[64]=C:\Windows\System32\inetsrv\cachfile.dll
LoadedModule[65]=C:\Windows\System32\inetsrv\cachtokn.dll
LoadedModule[66]=C:\Windows\System32\inetsrv\cachhttp.dll
LoadedModule[67]=C:\Windows\System32\inetsrv\compdyn.dll
LoadedModule[68]=C:\Windows\System32\inetsrv\compstat.dll
LoadedModule[69]=C:\Windows\System32\inetsrv\defdoc.dll
LoadedModule[70]=C:\Windows\System32\inetsrv\protsup.dll
LoadedModule[71]=C:\Windows\System32\inetsrv\redirect.dll
LoadedModule[72]=C:\Windows\System32\inetsrv\static.dll
LoadedModule[73]=C:\Windows\System32\inetsrv\authanon.dll
LoadedModule[74]=C:\Windows\System32\inetsrv\authbas.dll
LoadedModule[75]=C:\Windows\System32\inetsrv\authsspi.dll
LoadedModule[76]=C:\Windows\system32\NETAPI32.dll
LoadedModule[77]=C:\Windows\system32\netutils.dll
LoadedModule[78]=C:\Windows\system32\srvcli.dll
LoadedModule[79]=C:\Windows\system32\wkscli.dll
LoadedModule[80]=C:\Windows\System32\inetsrv\iprestr.dll
LoadedModule[81]=C:\Windows\System32\inetsrv\modrqflt.dll
LoadedModule[82]=C:\Windows\System32\inetsrv\logcust.dll
LoadedModule[83]=C:\Windows\System32\inetsrv\custerr.dll
LoadedModule[84]=C:\Windows\System32\inetsrv\loghttp.dll
LoadedModule[85]=C:\Windows\System32\inetsrv\isapi.dll
LoadedModule[86]=C:\Windows\System32\inetsrv\filter.dll
LoadedModule[87]=C:\Windows\System32\inetsrv\validcfg.dll
LoadedModule[88]=c:\Windows\Microsoft.NET\Framework64\v4.0.30319\aspnet_filter.dll
LoadedModule[89]=C:\Windows\system32\inetsrv\wbhst_pm.dll
LoadedModule[90]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\webengine.dll
LoadedModule[91]=C:\Windows\assembly\NativeImages_v4.0.30319_64\mscorlib\4f52500ab48877b85e71430f4f46670f\mscorlib.ni.dll
LoadedModule[92]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\nlssorting.dll
LoadedModule[93]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System\a91f32875cb3ba779f1b3ceff1690251\System.ni.dll
LoadedModule[94]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Core\0a8d99339ffe6b25debb8f8201c27664\System.Core.ni.dll
LoadedModule[95]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Web\5b905bd7b71f9fd6bea2d05cc1ae85f8\System.Web.ni.dll
LoadedModule[96]=C:\Windows\system32\sxs.dll
LoadedModule[97]=C:\Windows\system32\RpcRtRemote.dll
LoadedModule[98]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Configuration\fa65f89fd682c459fc5e7bcbd0418317\System.Configuration.ni.dll
LoadedModule[99]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Xml\f4afb233f160b8e55aad4660e45b374c\System.Xml.ni.dll
LoadedModule[100]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clrjit.dll
LoadedModule[101]=C:\Windows\assembly\NativeImages_v4.0.30319_64\Microsoft.Build.Uti#\14e16d61fae3cd1d9a1fa79b789f8438\Microsoft.Build.Utilities.v4.0.ni.dll
LoadedModule[102]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Runtime.Cach#\8fdbe304abab0631b8a4310b35f3d93a\System.Runtime.Caching.ni.dll
LoadedModule[103]=C:\Windows\system32\shfolder.dll
LoadedModule[104]=C:\Windows\system32\SHELL32.dll
LoadedModule[105]=C:\Windows\assembly\NativeImages_v4.0.30319_64\Microsoft.JScript\85204dde340780329b569b025e249c23\Microsoft.JScript.ni.dll
LoadedModule[106]=C:\Windows\system32\version.dll
LoadedModule[107]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\App_Code.cgixlnxh.dll
LoadedModule[108]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Data.Linq\feaa494ad67542d2060b31b9eeb6458b\System.Data.Linq.ni.dll
LoadedModule[109]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Data\b928128fca867546a858a1a39240d85c\System.Data.ni.dll
LoadedModule[110]=C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
LoadedModule[111]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\595a888a\f26c0653_7f81cd01\HtmlAgilityPack.dll
LoadedModule[112]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Drawing\5ae853f556290da9399b15b3619f7e15\System.Drawing.ni.dll
LoadedModule[113]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\85ba5013\f0c8f388_706bce01\TweetSharp.dll
LoadedModule[114]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Web.Extensio#\0180a2d993d2a9699cf07f7163524fff\System.Web.Extensions.ni.dll
LoadedModule[115]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Transactions\7b2099a1386e38ff198a51939304ce6e\System.Transactions.ni.dll
LoadedModule[116]=C:\Windows\Microsoft.Net\assembly\GAC_64\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll
LoadedModule[117]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\App_global.asax.yxdky-qn.dll
LoadedModule[118]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.ServiceModel#\7a5a5ff4a0b3bb4ba4bcc13166918e36\System.ServiceModel.Activation.ni.dll
LoadedModule[119]=C:\Windows\system32\bcrypt.dll
LoadedModule[120]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Runtime.Dura#\799274e49455d0fe4ca563f42143bef2\System.Runtime.DurableInstancing.ni.dll
LoadedModule[121]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Numerics\a66416296451fe6d2d8a5506ca41b23d\System.Numerics.ni.dll
LoadedModule[122]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.ServiceModel\15834d73d2846fc01ed54488ccfff5c8\System.ServiceModel.ni.dll
LoadedModule[123]=C:\Windows\assembly\NativeImages_v4.0.30319_64\SMDiagnostics\31f93b6be386908ff2727bcd825de0ca\SMDiagnostics.ni.dll
LoadedModule[124]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Xaml.Hosting\cf8401f4952deb5303e0d7fd459ce530\System.Xaml.Hosting.ni.dll
LoadedModule[125]=C:\Windows\system32\inetsrv\gzip.dll
LoadedModule[126]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\3d63b311\fe7c9b8a_706bce01\Hammock.ClientProfile.dll
LoadedModule[127]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\6a128bd2\c184e08a_706bce01\Newtonsoft.Json.dll
LoadedModule[128]=C:\Windows\system32\rasapi32.dll
LoadedModule[129]=C:\Windows\system32\rasman.dll
LoadedModule[130]=C:\Windows\system32\rtutils.dll
LoadedModule[131]=C:\Windows\system32\winhttp.dll
LoadedModule[132]=C:\Windows\system32\webio.dll
LoadedModule[133]=C:\Windows\system32\credssp.dll
LoadedModule[134]=C:\Windows\system32\dhcpcsvc6.DLL
LoadedModule[135]=C:\Windows\system32\dhcpcsvc.DLL
LoadedModule[136]=C:\Windows\system32\security.dll
LoadedModule[137]=C:\Windows\system32\schannel.DLL
LoadedModule[138]=C:\Windows\system32\ncrypt.dll
LoadedModule[139]=C:\Windows\system32\bcryptprimitives.dll
LoadedModule[140]=C:\Windows\system32\GPAPI.dll
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=IIS Worker Process
AppPath=c:\windows\system32\inetsrv\w3wp.exe
That nlssorting.dll seems to crop up a lot but I can't seem to find anything online related. The only thing I can find which matches my error is here, but that doesn't really help me.
I'm completely stumped as to where to go from here to fix this. Here's what I've tried:
Loading up the IIS log files and trying every request from about 30 minutes before a crash; none of the pages cause any errors.
Searching my code for any recursion which might cause a stack overflow, but there isn't any.
Trawling online for ANYTHING that might help.
Has anyone else ever had any problems with nlssorting.dll? Can I get some more information from the .wer file that might help me pinpoint where this is happening?
Thanks in advance for any help!
UPDATE
I was using a 3rd party DLL, which was causing a stack overflow exception (0xc00000fd)
After more investigation, it was only happening after a certain chain of events happened - hence the "random" in the title. Removing the DLL fixed the problem.
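On the question of getting more out of the .wer file: the Sig[n].Name / Sig[n].Value pairs are the useful part, and the exception code c00000fd is the Windows status code for a stack overflow. Here is a rough Python sketch that pairs the names with their values and flags that code. The function names are made up, and it assumes you have already read the file into a string (the files I have seen are UTF-16 encoded, but treat that as an assumption too).

```python
import re

# Matches lines such as "Sig[6].Name=Exception Code" or "DynamicSig[1].Value=6.1.7601...".
SIG_LINE = re.compile(r"^(Sig|DynamicSig)\[(\d+)\]\.(Name|Value)=(.*)$")


def parse_wer_signature(text):
    """Pair each Sig[n].Name line with its Sig[n].Value line; return {name: value}."""
    names, values = {}, {}
    for line in text.splitlines():
        match = SIG_LINE.match(line.strip())
        if match is None:
            continue  # UI[...], LoadedModule[...] and other lines are ignored
        kind, index, field, payload = match.groups()
        target = names if field == "Name" else values
        # Sig and DynamicSig reuse the same indices, so the key includes the kind.
        target[(kind, int(index))] = payload
    return {names[key]: values[key] for key in names if key in values}


def is_stack_overflow(signature):
    """True when the exception code is 0xC00000FD (STATUS_STACK_OVERFLOW)."""
    return signature.get("Exception Code", "").lower() == "c00000fd"
```

This does not tell you which code overflowed the stack; for that you still need a crash dump or tracing. It does make it quick to group many crash reports by faulting module and exception code.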
We had the same problem with one of our sites. Using SVN, we tracked it down to a method that was scanning for images within a folder.
I modified the code as follows:
Checking array length of scan results to be > 0 instead of == 1
Adding CultureInfo.InvariantCulture to all Int32.ToString() calls
After this we no longer experienced the error. The exact reason is still unknown.
I do not believe either of the above changes should make a difference in our environment. The problem could have been people modifying image files and folders while the image scanning method was running.
I hope this helps somebody.
For anyone who's curious, this is a PITA to debug. Here are three reasons rumored for this to happen:
(1) Stack overflows, as in the original post.
(2) Too much CPU / memory usage, which becomes obvious and rapid fail protection closes the process.
(3) Unable to respond to pings / requests due to application hogging resources, but in a way that rapid fail protection deems appropriate to end the process, not explicitly because of either (1) or (2).
Our solution was to add manual log tracing in the production environment until we eventually found recursion which was leading the application to be stopped by reliability services (for inability to respond to pings, or process randomly crashing) rather than throwing an in-application exception.
I had an issue where w3wp would throw an unhandled error and crash as soon as I opened the site/API URI in a web browser.
I was able to pinpoint the part of my code that caused it. In my case it was the OWIN Startup class, which reads some configuration records from a database; before that, it gets the connection string from a configuration file outside the web app directory.
I checked the ownership of the folder. It showed my account, but apparently the subfolders were not owned by me, so I set ownership to myself again and clicked OK to apply the permissions to the child objects. After that the w3wp error was gone and the API loaded.
So in my case it was an access denied error on the folder/file that contained the connection string.

Optimize APC Caching

Here is a link showing how my APC is running: [removed]
As you can see, it fills up pretty quickly and my Cache Full Count sometimes goes over 1000.
My website uses Wordpress.
I notice that every time I make a new post or edit a post, 2 things happen.
1) APC Memory "USED" resets
2) I get a whole lot of Fragments
I've tried giving APC more memory (512 MB), but then it sometimes crashes; 384 MB seems to work best. I also have a cron job that restarts Apache every 4 hours, clearing APC's fragments and used memory. Again, Apache crashes if APC runs for a long period of time, I think due to the fragment buildup.
Should I use apc.filters to filter out some things that should not be cached?
I am a real beginner at this sort of thing, so if someone can explain with full instructions, thank you very much!
I work as a Linux systems admin. This WordPress server runs 5 different WordPress installs; I have commented the settings to reconsider if you are running just one.
APC / PHP Versions, 3.1.9 / 5.3.7
Here is my complete apc.conf,
apc.enabled=1
apc.shm_segments=1
; I would try 32M per WP install, go from there
apc.shm_size=128M
; Relative to approx cached PHP files,
apc.num_files_hint=512
; Relative to approx WP size W/ APC Object Cache Backend,
apc.user_entries_hint=4096
apc.ttl=7200
apc.use_request_time=1
apc.user_ttl=7200
apc.gc_ttl=3600
apc.cache_by_default=1
apc.filters
apc.mmap_file_mask=/tmp/apc.XXXXXX
apc.file_update_protection=2
apc.enable_cli=0
apc.max_file_size=2M
;This should be used when you are finished with PHP file changes.
;As you must clear the APC cache to recompile already cached files.
;If you are still developing, set this to 1.
apc.stat=0
apc.stat_ctime=0
apc.canonicalize=1
apc.write_lock=1
apc.report_autofilter=0
apc.rfc1867=0
apc.rfc1867_prefix =upload_
apc.rfc1867_name=APC_UPLOAD_PROGRESS
apc.rfc1867_freq=0
apc.rfc1867_ttl=3600
;This MUST be 0, WP can have errors otherwise!
apc.include_once_override=0
apc.lazy_classes=0
apc.lazy_functions=0
apc.coredump_unmap=0
apc.file_md5=0
apc.preload_path
@Chris_O, your configuration is not optimal in a few respects.
1. apc.shm_segments=3
If you run a modern Linux distro, your SHM limit should be large enough for a single segment.
If it is too small, look up how to raise it with sysctl.conf entries. You can check it like this:
#Check Max Segment size
cat /proc/sys/kernel/shmmax
The exceptions are certain BSDs, other Unixes, and managed hosts you don't control. There are disadvantages to not having a contiguous segment; read the APC documentation for details.
2. apc.enable_cli=1
BAD BAD BAD, this is for debug only! Every time you run php-cli, it clears the APC cache.
3. apc.max_file_size=10M
Unnecessary and ridiculous! If you had a file that big, it would eat 1/3rd of that small 32M SHM. Even though you specify 3, they don't just act like one big segment in three pieces. Regardless WP doesn't even have single PHP files even close to that size.
Hope I helped people with their apc.conf.
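The shmmax check from point 1 can also be scripted, so that you compare the kernel limit with the apc.shm_size you plan to use before restarting Apache. A minimal sketch with made-up function names; it assumes a Linux host and a size written with a K, M or G suffix.

```python
def parse_size(value):
    """Convert an ini-style size such as '128M' to bytes.

    A bare number is treated as bytes here. Some older APC releases read a bare
    apc.shm_size as megabytes, so pass an explicit suffix to avoid surprises.
    """
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def segment_fits(shm_size, shmmax_bytes):
    """True when one segment of `shm_size` fits within the kernel's shmmax limit."""
    return parse_size(shm_size) <= shmmax_bytes


def read_shmmax(path="/proc/sys/kernel/shmmax"):
    """Read the maximum shared memory segment size in bytes (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())
```

On the server, segment_fits("128M", read_shmmax()) returning False means the sysctl limit needs raising before a single 128M segment will work.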
The APC ttl should take care of fragment build up. I usually set it at 7200. I am running it on a small VPS with WordPress and my settings are:
apc.enabled=1
apc.shm_segments=3
apc.shm_size=32
apc.ttl=7200
apc.user_ttl=7200
apc.num_files_hint=2048
apc.mmap_file_mask=/tmp/apc.XXXXXX
apc.enable_cli=1
apc.max_file_size=10M
You will also get a lot more benefit from it by using WordPress's built-in object cache. Mark Jaquith wrote a really good drop-in plugin that should also help with some of your fragmentation issues when saving or editing a post.
You really should set apc.stat=0 on your production server; it prevents APC from going to disk to check whether a file has changed.
Check out documentation first: http://php.net/manual/en/apc.configuration.php

What Tool or Script Can I Use to Find Which Directory Is Invalid When Receiving a "The directory name is invalid" error in IIS 7?

The Goal
I would like only a certain group of users (who are in an Active Directory group composed of users from two domains) to be able to execute a web script, in http://www.site.org/protected, after being challenged for authentication.
The Setup
Windows 2008, IIS 7. User Account Access has been disabled, as it is a pain and sometimes causes perfectly reasonable things to fail. The server is part of a domain I will call LITTLEDOMAIN. We have a trust with BIGDOMAIN.
I have a group, called "LITTLEDOMAIN\can-use-this." In that group are the members LITTLEDOMAIN\me and BIGDOMAIN\me. I did the bit in Active Directory where the server now allows that group to authenticate against another domain (BIGDOMAIN).
The application pool for www.site.org runs as "NetworkService."
The directory has the user SYSTEM, the user NETWORK SERVICE, the group Enterprise Admins, and the group LITTLEDOMAIN\can-use-this with at least Read and Execute permissions.
In IIS 7, I have disabled all forms of authentication for that directory but Windows Authentication. As to Authorization Rules, All Users are Allowed.
The Error
When I use, say, Firefox to visit the URL http://www.site.org/protected and am presented with a challenge, I can enter the username LITTLEDOMAIN\me and my password, then see the minimal HTML generated by my very simple Python script, which is basically a "Hello, World" with a timestamp thrown in so I can make sure caching of the page does not occur. If I use BIGDOMAIN\me, I receive an HTTP 500 error.
Diagnostics Performed
The passwords for LITTLEDOMAIN\me and BIGDOMAIN\me are the same; this has been checked.
I look in the HTTP logs and see "500 0 267" for "sc-status sc-substatus sc-win32-status". Running "net helpmsg 267" from the command line gives me "The directory name is invalid."
I added Failed Request Tracing Rules and see the same unhelpful message in the XML: "The directory name is invalid. (0x8007010b)"
I have turned on file object auditing in the policy for that server, then set the auditing for the directory and the files within it to have all failures for "Everyone," but nothing shows up in the Security section of Event Viewer. I was able to cause other failures, so I know that failure auditing is working. This suggests that the system is not even getting to the point where the file is being accessed.
I gave, temporarily, the group LITTLEDOMAIN\can-use-this full control over the C:\TEMP directory, on the off chance this was in use. I recycled the application pool. The same error occurs. I tried this in C:\Windows\Temp as well, to no avail.
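As a side note on the two codes above, 267 from the HTTP log and 0x8007010b from Failed Request Tracing are the same error. An HRESULT carries a failure bit, a facility number and a 16-bit code, and facility 7 means the code is a plain Win32 error. A short Python sketch of the decoding, with function names of my own choosing:

```python
FACILITY_WIN32 = 7  # facility used when an HRESULT wraps a Win32 error code


def decode_hresult(hresult):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    hresult &= 0xFFFFFFFF
    is_failure = bool(hresult >> 31)      # top bit is the severity flag
    facility = (hresult >> 16) & 0x7FF    # 11-bit facility field
    code = hresult & 0xFFFF               # low 16 bits hold the error code
    return is_failure, facility, code


def win32_code(hresult):
    """Return the wrapped Win32 error code, or raise if the facility is not Win32."""
    _, facility, code = decode_hresult(hresult)
    if facility != FACILITY_WIN32:
        raise ValueError("HRESULT does not wrap a Win32 error code")
    return code
```

win32_code(0x8007010B) returns 267, the number that "net helpmsg" translates to "The directory name is invalid." So the trace adds no new information about which directory is involved.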
The Question
How can I find out "well, WHICH directory name is invalid?" It's pretty obvious that something, somewhere along the line, wants permissions for BIGDOMAIN, but I cannot figure out where.
The missing component, in addition to an audit policy and Failed Request Tracing, is Process Monitor. Not Process Explorer, but Process Monitor.
Run Process Monitor for three or so seconds, just long enough to get your request in, and have it fail. Use Failed Request Tracing to get the process ID that failed. Use Process Monitor's filter to show only events where the process ID appears -- you can then see where it fails.
Then set auditing on that directory to see what account is being used.
It appears that IIS 7 is returning to the root of the webserver when looking at a protected subdirectory. Odd.
