Global isolation SLA in BYON throws "no management space located" exception - cloudify

I'm using BYON with physical machines and a global SLA. How can I ensure that the applications are not installed on the same machine?
I set this SLA in jetty-service.groovy:
isolationSLA {
    global {
        instanceCpuCores 0
        instanceMemoryMB 256
        useManagement false
    }
}
But when I deploy an application with 2 instances, both instances are installed on the same machine. In the end one instance starts successfully and the other fails with the exception "no management space located". I traced this exception to org.cloudifysource.utilitydomain.context.kvstore.AttributesFacadeImpl.getManagementSpace(). The exception is shown in this picture: http://i.stack.imgur.com/9MVF9.png
What can I do? Thank you!

There doesn't seem to be an apparent reason for this to happen.
Most likely it is due to heavy load on the machine; check your CPU and memory readings while you deploy.
It may just be a matter of using a larger machine.
Hope this helps.

Related

Google Cloud VM instance can't start and cannot be moved because of terminated state

I wanted to resize RAM and CPU of my machine, so I stopped the VM instance and when I tried to start it I got an error:
The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not
have enough resources available to fulfill the request. Try a
different zone, or try again later.
Here you can see the screenshot.
I've tried to start VM instance today, but result was the same and I got an error message again:
The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not
have enough resources available to fulfill the request. Try a
different zone, or try again later.
Then I tried to move my instance to a different zone, but I got an error message:
sarkarijobalerts123@cloudshell:~ (freesarkarijobalerts)$ gcloud compute instances move wordpress-2-vm --zone=asia-south1-a --destination-zone=asia-south1-b
Moving gce instance wordpress-2-vm...failed.
ERROR: (gcloud.compute.instances.move) Instance cannot be moved while in state: TERMINATED
My website has been down for a couple of days, please help me.
The standard procedure is to create a snapshot of the stopped VM instance's disk [1] and then create a new instance from that snapshot in another zone [2].
[1] https://cloud.google.com/compute/docs/disks/create-snapshots
[2] https://cloud.google.com/compute/docs/disks/restore-and-delete-snapshots#restore_a_snapshot_of_a_persistent_disk_to_a_new_disk
Let's have a look at the cause of this issue:
When you stop an instance, it releases resources such as vCPUs and memory.
When you start an instance, it requests those resources again, and if there aren't enough resources available in the zone you'll get an error message:
Error: The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
More information is available in the documentation:
If you receive a resource error (such as ZONE_RESOURCE_POOL_EXHAUSTED
or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS) when requesting new
resources, it means that the zone cannot currently accommodate your
request. This error is due to Compute Engine resource obtainability,
and is not due to your Compute Engine quota.
Resource availability depends on demand from other users and is therefore dynamic.
There are a few ways to solve your issue:
Move your instance to another zone, as described above.
Wait for a while and try to start your VM instance again.
Reserve resources for your VM, as described in the documentation, to avoid this issue in the future:
Create reservations for Virtual Machine (VM) instances in a specific
zone, using custom or predefined machine types, with or without
additional GPUs or local SSDs, to ensure resources are available for
your workloads when you need them. After you create a reservation, you
begin paying for the reserved resources immediately, and they remain
available for your project to use indefinitely, until the reservation
is deleted.
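The snapshot-and-recreate route can be sketched as three gcloud calls. The Python helper below only builds the command lines and does not run them. The function name and all resource names in the usage example are placeholders, and it assumes the boot disk has the same name as the VM, which is the default but worth checking first. Machine type, network and other instance settings are left at their defaults here and would need to match the original VM.

```python
import shlex


def relocation_commands(disk, src_zone, dst_zone, snapshot, new_disk, new_vm):
    """Return the gcloud invocations as argv lists; nothing is executed."""
    return [
        # 1. Snapshot the boot disk of the stopped instance.
        ["gcloud", "compute", "disks", "snapshot", disk,
         f"--zone={src_zone}", f"--snapshot-names={snapshot}"],
        # 2. Restore the snapshot to a new disk in the destination zone.
        ["gcloud", "compute", "disks", "create", new_disk,
         f"--source-snapshot={snapshot}", f"--zone={dst_zone}"],
        # 3. Create a new VM that boots from the restored disk.
        ["gcloud", "compute", "instances", "create", new_vm,
         f"--zone={dst_zone}", f"--disk=name={new_disk},boot=yes"],
    ]


if __name__ == "__main__":
    # Placeholder names; the disk name is assumed to equal the VM name.
    for argv in relocation_commands(
        "wordpress-2-vm", "asia-south1-a", "asia-south1-b",
        "wordpress-2-snap", "wordpress-2-disk-b", "wordpress-2-vm-b",
    ):
        print(shlex.join(argv))
```

Review the printed commands before running them by hand. The new VM gets a new IP address unless a static address is attached to it.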

Google Cloud Composer (Apache Airflow) cannot access log files

I'm running a DAG in Google Cloud Composer (hosted Airflow) which runs fine in Airflow locally. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the error:
*** Log file does not exist: /home/airflow/gcs/logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Fetching from: http://airflow-worker-d775d7cdd-tmzj9:8793/log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-d775d7cdd-tmzj9', port=8793): Max retries exceeded with url: /log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8825920160>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I've also tried making the DAG add data into a database and it actually succeeds 50% of the time. However, it always returns this error message (and no other print statements or logs). Any help much appreciated on why this might be happening.
We faced the same issue, raised a support ticket with GCP, and got the following reply:
The message is related to the latency of syncing logs from the Airflow workers to the web server; it takes at least a few minutes (depending on the number of objects and their size).
The total log size does not seem large, but it is enough to noticeably slow down synchronization; hence, we recommend cleaning up or archiving the logs.
Basically, we recommend relying on Stackdriver logs instead, because of the latency inherent in the design of this sync.
I hope this will help you solve the problem.
I have the same problem after upgrading from 1.10.3 to 1.10.6 in Google Composer.
I can see in my logs that Airflow is trying to get the logs from a bucket whose name ends with -tenant, while the bucket in my account ends with -bucket.
In the configuration, I can see something weird too.
## airflow.cfg
[core]
remote_base_log_folder = gs://us-east1-dada-airflow-xxxxx-bucket/logs
## also in the running configuration says
core remote_base_log_folder gs://us-east1-dada-airflow-xxxxx-tenant/logs env var
I wrote to Google support and they said the team is working on a fix.
EDIT:
I've been accessing my logs with gsutil, replacing the bucket name suffix with -bucket:
gsutil cat gs://us-east1-dada-airflow-xxxxx-bucket/logs/...../5.logs
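If you need to do this suffix swap for many log paths, it can be scripted. This is a minimal sketch, assuming the only difference between the two URIs is the -tenant / -bucket suffix on the bucket name, as in the configuration above; the function name is made up for illustration.

```python
def to_bucket_uri(log_uri: str) -> str:
    """Swap a '-tenant' bucket-name suffix for '-bucket' in a gs:// URI."""
    prefix = "gs://"
    if not log_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {log_uri!r}")
    # Split "bucket/path/to/object" into the bucket name and the rest.
    bucket, sep, path = log_uri[len(prefix):].partition("/")
    if bucket.endswith("-tenant"):
        bucket = bucket[: -len("-tenant")] + "-bucket"
    return f"{prefix}{bucket}{sep}{path}"


if __name__ == "__main__":
    print(to_bucket_uri("gs://us-east1-dada-airflow-xxxxx-tenant/logs"))
```

The result can then be passed to gsutil cat as shown above. URIs that already end in -bucket are returned unchanged.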
I faced the same situation on multiple occasions.
When I looked at the log in the Airflow web UI right after the job finished, it gave me the same error. When I checked the same logs in the UI a minute or two later, I could see them properly.
As per the above answers, it's a sync issue between the web server and the worker node.
In general, the issue described here should be sporadic.
In certain situations, it can help to set default-task-retries to a value that allows a task to be retried at least once.
This issue has been resolved since at least Airflow version 1.10.10+composer.

Execution group not starting Websphere message broker 8

We have WebSphere Message Broker v8.0.0.3 in a Red Hat Linux environment, and we encountered the following runtime error message:
BIP2057
Execution Group <insert_1> could not be started: broker name <insert_2>; UUID <insert_3>; label <insert_4>; Pub-Sub server <insert_5> (1=Yes, 0=No).
The situation is that one of the execution groups stopped (without leaving any trace or log entry) and the broker could not restart it automatically, failing with the BIP2057 error.
The weird part came when another EG restarted (for an unrelated reason, some application errors): the first EG then "took its place" and started successfully, and now the second EG could not start, for the same reason.
We have checked user permissions and the logs without any success in identifying the problem.
Any help would be much appreciated.
Start the IBM MQSeries service if it's not already running:
C:\Program Files (x86)\IBM\WebSphere MQ\bin\amqlsvc.exe
If it gives an Access Denied exception, your user probably does not have the right to log on.
Another possible resolution is to restart the DataFlowEngine.exe process.
If neither of the above helps, restart the system.

Random w3wp.exe crashes in .net 4

I have a website which has been up and running absolutely fine for about 8 months now. It's running .NET 4 in integrated mode.
Recently I've started to get some "random" w3wp.exe crashes, and after 5 of them, IIS rapid fail protection kicks in and I have to manually log in to the server and start the application pool again.
Here's what the event viewer says for the Error:
Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
Faulting module name: nlssorting.dll, version: 4.0.30319.296, time stamp: 0x504835c7
Exception code: 0xc00000fd
Fault offset: 0x000000000000191f
Faulting process id: 0x1998
Faulting application start time: 0x01ce6e6b9b80c949
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\nlssorting.dll
Report Id: d9cf3164-da5e-11e2-8cc5-f46d0440f6d5
Straight after the crashes, I get an "Information" log in the event viewer which, at the bottom, gives me the location of a .wer file.
This is what the .wer file contains:
Version=1
EventType=APPCRASH
EventTime=130162847687759734
ReportType=2
Consent=1
ReportIdentifier=d7c5e520-da5e-11e2-8cc5-f46d0440f6d5
IntegratorReportIdentifier=d7c5e51f-da5e-11e2-8cc5-f46d0440f6d5
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=w3wp.exe
Sig[1].Name=Application Version
Sig[1].Value=7.5.7601.17514
Sig[2].Name=Application Timestamp
Sig[2].Value=4ce7afa2
Sig[3].Name=Fault Module Name
Sig[3].Value=nlssorting.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=4.0.30319.296
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=504835c7
Sig[6].Name=Exception Code
Sig[6].Value=c00000fd
Sig[7].Name=Exception Offset
Sig[7].Value=000000000000197d
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.1.7601.2.1.0.1296.17
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=2057
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=6141
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=61419d6dee6cf74b8ac2b00b4c3b3373
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=c19b
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=c19b8acf029a3088171b1f5f3dd9dc77
UI[2]=c:\windows\system32\inetsrv\w3wp.exe
UI[5]=Check online for a solution (recommended)
UI[6]=Check for a solution later (recommended)
UI[7]=Close
UI[8]=IIS Worker Process stopped working and was closed
UI[9]=A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
UI[10]=&Close
LoadedModule[0]=c:\windows\system32\inetsrv\w3wp.exe
LoadedModule[1]=C:\Windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\Windows\system32\kernel32.dll
LoadedModule[3]=C:\Windows\system32\KERNELBASE.dll
LoadedModule[4]=C:\Windows\system32\ADVAPI32.dll
LoadedModule[5]=C:\Windows\system32\msvcrt.dll
LoadedModule[6]=C:\Windows\SYSTEM32\sechost.dll
LoadedModule[7]=C:\Windows\system32\RPCRT4.dll
LoadedModule[8]=C:\Windows\system32\pcwum.DLL
LoadedModule[9]=C:\Windows\system32\USER32.dll
LoadedModule[10]=C:\Windows\system32\GDI32.dll
LoadedModule[11]=C:\Windows\system32\LPK.dll
LoadedModule[12]=C:\Windows\system32\USP10.dll
LoadedModule[13]=C:\Windows\system32\ole32.dll
LoadedModule[14]=c:\windows\system32\inetsrv\IISUTIL.dll
LoadedModule[15]=C:\Windows\system32\IMM32.DLL
LoadedModule[16]=C:\Windows\system32\MSCTF.dll
LoadedModule[17]=C:\Windows\system32\CRYPTBASE.dll
LoadedModule[18]=C:\Windows\system32\ntmarta.dll
LoadedModule[19]=C:\Windows\system32\WLDAP32.dll
LoadedModule[20]=c:\windows\system32\inetsrv\w3wphost.dll
LoadedModule[21]=C:\Windows\system32\OLEAUT32.dll
LoadedModule[22]=c:\windows\system32\inetsrv\nativerd.dll
LoadedModule[23]=C:\Windows\system32\CRYPT32.dll
LoadedModule[24]=C:\Windows\system32\MSASN1.dll
LoadedModule[25]=C:\Windows\system32\XmlLite.dll
LoadedModule[26]=C:\Windows\system32\ktmw32.dll
LoadedModule[27]=c:\windows\system32\inetsrv\IISRES.DLL
LoadedModule[28]=C:\Windows\system32\CRYPTSP.dll
LoadedModule[29]=C:\Windows\system32\rsaenh.dll
LoadedModule[30]=C:\Windows\system32\mscoree.dll
LoadedModule[31]=C:\Windows\system32\CLBCatQ.DLL
LoadedModule[32]=C:\Windows\system32\mlang.dll
LoadedModule[33]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\webengine4.dll
LoadedModule[34]=C:\Windows\system32\MSVCR100_CLR0400.dll
LoadedModule[35]=C:\Windows\system32\USERENV.dll
LoadedModule[36]=C:\Windows\system32\profapi.dll
LoadedModule[37]=C:\Windows\system32\PSAPI.DLL
LoadedModule[38]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscoreei.dll
LoadedModule[39]=C:\Windows\system32\SHLWAPI.dll
LoadedModule[40]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
LoadedModule[41]=C:\Windows\system32\inetsrv\iiscore.dll
LoadedModule[42]=c:\windows\system32\inetsrv\W3TP.dll
LoadedModule[43]=c:\windows\system32\inetsrv\w3dt.dll
LoadedModule[44]=C:\Windows\system32\HTTPAPI.dll
LoadedModule[45]=C:\Windows\system32\slc.dll
LoadedModule[46]=C:\Windows\system32\WS2_32.dll
LoadedModule[47]=C:\Windows\system32\NSI.dll
LoadedModule[48]=C:\Windows\system32\Normaliz.dll
LoadedModule[49]=C:\Windows\system32\faultrep.dll
LoadedModule[50]=C:\Windows\system32\Secur32.dll
LoadedModule[51]=C:\Windows\system32\SSPICLI.DLL
LoadedModule[52]=C:\Windows\system32\NLAapi.dll
LoadedModule[53]=C:\Windows\system32\napinsp.dll
LoadedModule[54]=C:\Windows\System32\mswsock.dll
LoadedModule[55]=C:\Windows\system32\DNSAPI.dll
LoadedModule[56]=C:\Windows\System32\winrnr.dll
LoadedModule[57]=C:\Windows\System32\wshtcpip.dll
LoadedModule[58]=C:\Windows\System32\wship6.dll
LoadedModule[59]=C:\Windows\system32\IPHLPAPI.DLL
LoadedModule[60]=C:\Windows\system32\WINNSI.DLL
LoadedModule[61]=C:\Windows\system32\rasadhlp.dll
LoadedModule[62]=C:\Windows\System32\fwpuclnt.dll
LoadedModule[63]=C:\Windows\System32\inetsrv\cachuri.dll
LoadedModule[64]=C:\Windows\System32\inetsrv\cachfile.dll
LoadedModule[65]=C:\Windows\System32\inetsrv\cachtokn.dll
LoadedModule[66]=C:\Windows\System32\inetsrv\cachhttp.dll
LoadedModule[67]=C:\Windows\System32\inetsrv\compdyn.dll
LoadedModule[68]=C:\Windows\System32\inetsrv\compstat.dll
LoadedModule[69]=C:\Windows\System32\inetsrv\defdoc.dll
LoadedModule[70]=C:\Windows\System32\inetsrv\protsup.dll
LoadedModule[71]=C:\Windows\System32\inetsrv\redirect.dll
LoadedModule[72]=C:\Windows\System32\inetsrv\static.dll
LoadedModule[73]=C:\Windows\System32\inetsrv\authanon.dll
LoadedModule[74]=C:\Windows\System32\inetsrv\authbas.dll
LoadedModule[75]=C:\Windows\System32\inetsrv\authsspi.dll
LoadedModule[76]=C:\Windows\system32\NETAPI32.dll
LoadedModule[77]=C:\Windows\system32\netutils.dll
LoadedModule[78]=C:\Windows\system32\srvcli.dll
LoadedModule[79]=C:\Windows\system32\wkscli.dll
LoadedModule[80]=C:\Windows\System32\inetsrv\iprestr.dll
LoadedModule[81]=C:\Windows\System32\inetsrv\modrqflt.dll
LoadedModule[82]=C:\Windows\System32\inetsrv\logcust.dll
LoadedModule[83]=C:\Windows\System32\inetsrv\custerr.dll
LoadedModule[84]=C:\Windows\System32\inetsrv\loghttp.dll
LoadedModule[85]=C:\Windows\System32\inetsrv\isapi.dll
LoadedModule[86]=C:\Windows\System32\inetsrv\filter.dll
LoadedModule[87]=C:\Windows\System32\inetsrv\validcfg.dll
LoadedModule[88]=c:\Windows\Microsoft.NET\Framework64\v4.0.30319\aspnet_filter.dll
LoadedModule[89]=C:\Windows\system32\inetsrv\wbhst_pm.dll
LoadedModule[90]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\webengine.dll
LoadedModule[91]=C:\Windows\assembly\NativeImages_v4.0.30319_64\mscorlib\4f52500ab48877b85e71430f4f46670f\mscorlib.ni.dll
LoadedModule[92]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\nlssorting.dll
LoadedModule[93]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System\a91f32875cb3ba779f1b3ceff1690251\System.ni.dll
LoadedModule[94]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Core\0a8d99339ffe6b25debb8f8201c27664\System.Core.ni.dll
LoadedModule[95]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Web\5b905bd7b71f9fd6bea2d05cc1ae85f8\System.Web.ni.dll
LoadedModule[96]=C:\Windows\system32\sxs.dll
LoadedModule[97]=C:\Windows\system32\RpcRtRemote.dll
LoadedModule[98]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Configuration\fa65f89fd682c459fc5e7bcbd0418317\System.Configuration.ni.dll
LoadedModule[99]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Xml\f4afb233f160b8e55aad4660e45b374c\System.Xml.ni.dll
LoadedModule[100]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clrjit.dll
LoadedModule[101]=C:\Windows\assembly\NativeImages_v4.0.30319_64\Microsoft.Build.Uti#\14e16d61fae3cd1d9a1fa79b789f8438\Microsoft.Build.Utilities.v4.0.ni.dll
LoadedModule[102]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Runtime.Cach#\8fdbe304abab0631b8a4310b35f3d93a\System.Runtime.Caching.ni.dll
LoadedModule[103]=C:\Windows\system32\shfolder.dll
LoadedModule[104]=C:\Windows\system32\SHELL32.dll
LoadedModule[105]=C:\Windows\assembly\NativeImages_v4.0.30319_64\Microsoft.JScript\85204dde340780329b569b025e249c23\Microsoft.JScript.ni.dll
LoadedModule[106]=C:\Windows\system32\version.dll
LoadedModule[107]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\App_Code.cgixlnxh.dll
LoadedModule[108]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Data.Linq\feaa494ad67542d2060b31b9eeb6458b\System.Data.Linq.ni.dll
LoadedModule[109]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Data\b928128fca867546a858a1a39240d85c\System.Data.ni.dll
LoadedModule[110]=C:\Windows\Microsoft.Net\assembly\GAC_64\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
LoadedModule[111]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\595a888a\f26c0653_7f81cd01\HtmlAgilityPack.dll
LoadedModule[112]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Drawing\5ae853f556290da9399b15b3619f7e15\System.Drawing.ni.dll
LoadedModule[113]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\85ba5013\f0c8f388_706bce01\TweetSharp.dll
LoadedModule[114]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Web.Extensio#\0180a2d993d2a9699cf07f7163524fff\System.Web.Extensions.ni.dll
LoadedModule[115]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Transactions\7b2099a1386e38ff198a51939304ce6e\System.Transactions.ni.dll
LoadedModule[116]=C:\Windows\Microsoft.Net\assembly\GAC_64\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll
LoadedModule[117]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\App_global.asax.yxdky-qn.dll
LoadedModule[118]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.ServiceModel#\7a5a5ff4a0b3bb4ba4bcc13166918e36\System.ServiceModel.Activation.ni.dll
LoadedModule[119]=C:\Windows\system32\bcrypt.dll
LoadedModule[120]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Runtime.Dura#\799274e49455d0fe4ca563f42143bef2\System.Runtime.DurableInstancing.ni.dll
LoadedModule[121]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Numerics\a66416296451fe6d2d8a5506ca41b23d\System.Numerics.ni.dll
LoadedModule[122]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.ServiceModel\15834d73d2846fc01ed54488ccfff5c8\System.ServiceModel.ni.dll
LoadedModule[123]=C:\Windows\assembly\NativeImages_v4.0.30319_64\SMDiagnostics\31f93b6be386908ff2727bcd825de0ca\SMDiagnostics.ni.dll
LoadedModule[124]=C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Xaml.Hosting\cf8401f4952deb5303e0d7fd459ce530\System.Xaml.Hosting.ni.dll
LoadedModule[125]=C:\Windows\system32\inetsrv\gzip.dll
LoadedModule[126]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\3d63b311\fe7c9b8a_706bce01\Hammock.ClientProfile.dll
LoadedModule[127]=C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\587f6661\a99d8ff8\assembly\dl3\6a128bd2\c184e08a_706bce01\Newtonsoft.Json.dll
LoadedModule[128]=C:\Windows\system32\rasapi32.dll
LoadedModule[129]=C:\Windows\system32\rasman.dll
LoadedModule[130]=C:\Windows\system32\rtutils.dll
LoadedModule[131]=C:\Windows\system32\winhttp.dll
LoadedModule[132]=C:\Windows\system32\webio.dll
LoadedModule[133]=C:\Windows\system32\credssp.dll
LoadedModule[134]=C:\Windows\system32\dhcpcsvc6.DLL
LoadedModule[135]=C:\Windows\system32\dhcpcsvc.DLL
LoadedModule[136]=C:\Windows\system32\security.dll
LoadedModule[137]=C:\Windows\system32\schannel.DLL
LoadedModule[138]=C:\Windows\system32\ncrypt.dll
LoadedModule[139]=C:\Windows\system32\bcryptprimitives.dll
LoadedModule[140]=C:\Windows\system32\GPAPI.dll
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=IIS Worker Process
AppPath=c:\windows\system32\inetsrv\w3wp.exe
That nlssorting.dll seems to crop up a lot, but I can't find anything related online. The only thing I can find which matches my error is here, but that doesn't really help me.
I'm completely stumped as to where to go from here to fix this. Here's what I've tried:
Loading up the IIS log files and trying every request from about 30 minutes before a crash; none of the pages cause any errors.
Searching my code for any recursion which might cause a stack overflow, but there isn't any.
Trawling online for ANYTHING that might help.
Has anyone else ever had any problems with nlssorting.dll? Can I get some more information from the .wer file that might help me pinpoint where this is happening?
Thanks in advance for any help!
UPDATE
I was using a 3rd party DLL, which was causing a stack overflow exception (0xc00000fd)
After more investigation, it was only happening after a certain chain of events happened - hence the "random" in the title. Removing the DLL fixed the problem.
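On the question of what else the .wer file can tell you: the Sig[n].Name / Sig[n].Value pairs carry the faulting module and the exception code, and c00000fd is the NTSTATUS value for a stack overflow (STATUS_STACK_OVERFLOW). Below is a minimal Python sketch that pulls those pairs out of the report text. The function names are made up for illustration, only the one code seen in this report is mapped, and reading the file from disk (including its encoding) is left to the caller.

```python
import re

# Matches lines such as "Sig[6].Name=Exception Code" or "Sig[6].Value=c00000fd".
SIG_LINE = re.compile(r"^Sig\[(\d+)\]\.(Name|Value)=(.*)$")

# Only the code from this report is mapped; extend as needed.
KNOWN_CODES = {0xC00000FD: "STATUS_STACK_OVERFLOW"}


def parse_wer_signature(text):
    """Return {signature name: value} built from the Sig[n] pairs."""
    names, values = {}, {}
    for line in text.splitlines():
        match = SIG_LINE.match(line.strip())
        if match:
            index, kind, payload = int(match.group(1)), match.group(2), match.group(3)
            (names if kind == "Name" else values)[index] = payload
    # Keep only indices that have both a name and a value.
    return {names[i]: values[i] for i in sorted(names) if i in values}


def describe_exception(signature):
    """Translate the hex exception code into a symbolic name when known."""
    code = int(signature["Exception Code"], 16)
    return KNOWN_CODES.get(code, hex(code))


if __name__ == "__main__":
    sample = "\n".join([
        "Sig[3].Name=Fault Module Name",
        "Sig[3].Value=nlssorting.dll",
        "Sig[6].Name=Exception Code",
        "Sig[6].Value=c00000fd",
    ])
    signature = parse_wer_signature(sample)
    print(signature["Fault Module Name"], describe_exception(signature))
```

The report does not contain a call stack, so it confirms what kind of crash happened but not where in the code; for that a crash dump or tracing is still needed.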
We had the same problem with one of our sites. Using SVN we tracked it down to a method that was scanning for images within a folder.
I modified the code as follows:
Checking that the array length of the scan results is > 0 instead of == 1
Adding CultureInfo.InvariantCulture to all Int32.ToString() calls
After this we no longer experienced the error. The exact reason is still unknown.
I believe that neither of the above changes should make a difference in our environment. The problem could have been people modifying image files and folders while the image scanning method was being called.
I hope this helps somebody.
For anyone who's curious, this is a PITA to debug. Here are three reasons rumored to cause it:
(1) Stack overflows, as in the original post.
(2) Too much CPU / memory usage, which becomes obvious, and rapid fail protection closes the process.
(3) The application hogs resources and cannot respond to pings / requests, so rapid fail protection ends the process, though not explicitly because of (1) or (2).
Our solution was to add manual log tracing in the production environment until we eventually found recursion that was causing the application to be stopped by reliability services (for inability to respond to pings, or the process randomly crashing) rather than throwing an in-application exception.
I had an issue where w3wp would throw an unhandled error and crash as soon as I opened the site/API URI in the web browser.
I was able to pinpoint the part of my code that caused it. In my case it was in the OWIN Startup class: it reads some configuration records from a database, but before that it gets the connection string from a configuration file outside the web app directory.
I checked the ownership of the folder. It showed my account, but apparently the subfolders were not owned by me, so I set the ownership to myself again and clicked OK to let the permissions propagate to the child objects. Voila: bye bye w3wp error, and the API loaded.
So in my case it was an access denied error on the folder/file that contained the connection string.

How can I remove Host Instance Zombies from BTMessageBox

After moving most of our BizTalk applications from the BizTalk 2009 to the BizTalk 2010 environment, we began removing old applications and unused hosts. In the process we ended up with a zombie host instance.
As a result, the bts_CleanupDeadProcesses job started to fail with the error “Executed as user: RH\sqladmin. Could not find stored procedure 'dbo.int_ProcessCleanup_ProcessLabusHost'. [SQLSTATE 42000] (Error 2812). The step failed.”
After looking at bts_CleanupDeadProcesses, I found the zombie host instance in the BTMsgBox.ProcessHeartBeats table, with dtNextHeartbeatTime set to the time when the host was removed.
(I'm assuming that the Host Instance Processes don't exist in your services any longer, and that the SQL Agent job fails)
From looking at the source of the [dbo].[bts_CleanupDeadProcesses] job, it loops through the dbo.ProcessHeartbeats table with a cursor (btsProcessCurse, lol) looking for 'dead' heartbeats.
Each process instance has its own cleanup sproc, int_ProcessCleanup_[HostName], and a sproc for the heartbeat watchdog to call, viz. bts_ProcessHeartbeat_[HostName] (although, for whatever reason, the sproc calls it @ApplicationName). The cursor is filtered by WHERE (s.dtNextHeartbeatTime < @dtCurrentTime).
It is thus tempting to just delete the record for your deleted / zombie host (or, if you aren't that brave, to simply update dtNextHeartbeatTime on the heartbeat record for your dead host instance to sometime next century). Either way, the SQL Agent job should skip the dead instances.
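To see why either change makes the job skip the zombie, here is a small Python model of the loop as described above. It is not BizTalk code: the function names are made up for illustration and the heartbeat table is reduced to a dict of host name to next heartbeat time. Only the overdue-heartbeat filter and the int_ProcessCleanup_[HostName] naming follow the description.

```python
from datetime import datetime, timedelta


def hosts_to_clean(heartbeats, now):
    """Hosts whose next heartbeat is overdue (models the cursor's WHERE filter)."""
    return [host for host, next_beat in heartbeats.items() if next_beat < now]


def run_cleanup(heartbeats, existing_sprocs, now):
    """Return the cleanup sprocs the job would call; raise if one is missing."""
    called = []
    for host in hosts_to_clean(heartbeats, now):
        sproc = f"int_ProcessCleanup_{host}"
        if sproc not in existing_sprocs:
            # Models the "Could not find stored procedure" failure (Error 2812).
            raise LookupError(f"Could not find stored procedure 'dbo.{sproc}'")
        called.append(sproc)
    return called


if __name__ == "__main__":
    now = datetime(2013, 6, 21)
    # The zombie's heartbeat stopped when the host was removed; its sproc is gone.
    heartbeats = {"ProcessLabusHost": now - timedelta(days=3)}
    try:
        run_cleanup(heartbeats, set(), now)
    except LookupError as error:
        print("job fails:", error)
    # Pushing the next heartbeat far into the future takes it out of the filter.
    heartbeats["ProcessLabusHost"] = datetime(2100, 1, 1)
    print("job calls:", run_cleanup(heartbeats, set(), now))
```

Deleting the row has the same effect in this model, because the host never enters the loop.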
An alternative could be to re-create the host and its instances with the same name through the Admin Console, just to delete them (properly) again. This might, however, cause additional problems, as BizTalk won't be able to create the two sprocs above because of the undeleted objects.
Obviously, I wouldn't do any of this on your prod environment until you've confirmed that it works with a trial run first.
It looks like someone else got stuck with a similar situation here
And there is also a good dive into the details of how the heartbeat mechanism works by XiaoDong Zhu here
Have you tried BTSTerminator? That works for one-off cleanups.
http://www.microsoft.com/en-us/download/details.aspx?id=2846

Resources