WildFly 8.2.X hangs after redeployment and becomes unresponsive - deadlock

We moved an application from JBoss AS 7.1.1 to WildFly 8.2.X (8.2.0-Final and 8.2.1-Final) and discovered the following problem:
The first deployment works fine (slower than with JBoss AS 7.1.1, but that seems to be a separate problem).
After we redeploy the same EAR file (either from Eclipse or from the web interface), JAX-RS requests are processed as long as they arrive sequentially. As soon as two JAX-RS requests arrive in parallel, every JAX-RS request (including those first two) simply times out, no matter which REST resource the HTTP request is dispatched to.
I debugged the RestEasy 3.0.10 library a bit and found that the code simply waits for the dispatched REST method to return. However, once the server hangs, execution never enters my REST method (in my REST resource).
Any ideas on how to debug further? I cannot reproduce this behavior with other EAR applications on exactly the same server.

After checking further with jconsole, I saw that a deadlock occurs: one thread waits in
org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:231)
org.apache.log4j.JBossAppenderHandler.doPublish(JBossAppenderHandler.java:42)
org.jboss.logmanager.ExtHandler.publish(ExtHandler.java:79)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:296)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
org.jboss.logmanager.Logger.logRaw(Logger.java:721)
org.jboss.logmanager.Logger.log(Logger.java:506)
org.jboss.stdio.AbstractLoggingWriter.write(AbstractLoggingWriter.java:71)
- locked java.lang.StringBuilder#497a942
org.jboss.stdio.WriterOutputStream.finish(WriterOutputStream.java:143)
org.jboss.stdio.WriterOutputStream.flush(WriterOutputStream.java:164)
- locked sun.nio.cs.UTF_8$Decoder#e92e69
java.io.PrintStream.write(PrintStream.java:482)
- locked java.io.PrintStream#d4482dd
and another waits in
java.io.PrintStream.flush(PrintStream.java:335)
org.jboss.stdio.StdioContext$DelegatingPrintStream.flush(StdioContext.java:216)
sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
- locked java.io.OutputStreamWriter#7797a41d
java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
The problem seems to be that the EAR application bundles its own log4j library without excluding the one provided by WildFly. Adding the following to the jboss-deployment-structure.xml file seems to solve the problem by excluding the logging subsystem for this deployment:
<jboss-deployment-structure>
    <deployment>
        <!-- exclude-subsystems prevents a subsystem's deployment unit processors from running on a deployment, -->
        <!-- which has basically the same effect as removing the subsystem, but it only affects a single deployment -->
        <exclude-subsystems>
            <subsystem name="logging" />
        </exclude-subsystems>
    </deployment>
</jboss-deployment-structure>
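If jconsole is not at hand, the same deadlock check can be run from code through the standard ThreadMXBean API. The following is only a minimal sketch: the class name DeadlockProbe, the thread names and the two plain lock objects are made up for illustration. They merely mimic the opposite lock order that the two stack traces suggest (PrintStream monitor first and then the appender, versus appender first and then the PrintStream); this is not WildFly or log4j code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class DeadlockProbe {

    /** Returns one description per thread that the JVM reports as deadlocked on a monitor. */
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread ended in the meantime
            }
            out.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return out;
    }

    /** Polls describeDeadlocks() until something is found or the timeout expires. */
    static List<String> awaitDeadlocks(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> found = describeDeadlocks();
        while (found.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(50);
            found = describeDeadlocks();
        }
        return found;
    }

    /** Hypothetical reproduction: two daemon threads take two monitors in opposite order. */
    static void provokeDeadlock() throws InterruptedException {
        Object printStreamLock = new Object(); // stands in for the PrintStream monitor
        Object appenderLock = new Object();    // stands in for the log4j appender monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startDaemon("stdout-writer", printStreamLock, appenderLock, bothHoldFirst);
        startDaemon("log-appender", appenderLock, printStreamLock, bothHoldFirst);
        bothHoldFirst.await();
    }

    private static void startDaemon(String name, Object first, Object second,
                                    CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // make sure the other thread already holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this monitor
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        provokeDeadlock();
        awaitDeadlocks(5000).forEach(System.out::println);
    }
}
```

In a real server you would call only describeDeadlocks(), for example from a scheduled task or a diagnostic endpoint, and log whatever it returns.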

Related

IIS hung requests - can't see CLR stacktraces in memory dump

I have an ASP.NET WebAPI2 application on .NET 4.6.2, hosted on IIS on Windows Server 2016. From time to time, hundreds of requests get stuck for hours (even though I have a 60s request timeout set), with no CPU usage. So I took a memory dump of the w3wp process, copied sos.dll, clr.dll, mscordacwks.dll and all of my project's DLLs and PDBs from the server's bin directory, and used WinDbg as described in many blogs and tutorials. In all of them, however, the authors can see the CLR stack directly by calling ~*e !clrstack. I can see a CLR stack trace for some Redis and ApplicationInsights workers, but for all other managed threads I see only:
OS Thread Id: 0x1124 (3)
Child SP IP Call Site
GetFrameContext failed: 1
0000000000000000 0000000000000000
!dumpstack for any of these gives just this:
0:181> !dumpstack
OS Thread Id: 0x1754 (181)
Current frame: ntdll!NtWaitForSingleObject+0x14
Child-SP RetAddr Caller, Callee
000000b942c7f6a0 00007fff33d63acf KERNELBASE!WaitForSingleObjectEx+0x8f, calling ntdll!NtWaitForSingleObject
000000b942c7f740 00007fff253377a6 clr!CLRSemaphore::Wait+0x8a, calling kernel32!WaitForSingleObjectEx
000000b942c7f7b0 00007fff25335331 clr!GCCoop::GCCoop+0xe, calling clr!GetThread
000000b942c7f800 00007fff25337916 clr!ThreadpoolMgr::UnfairSemaphore::Wait+0xf1, calling clr!CLRSemaphore::Wait
000000b942c7f840 00007fff253378b1 clr!ThreadpoolMgr::WorkerThreadStart+0x2d1, calling clr!ThreadpoolMgr::UnfairSemaphore::Wait
000000b942c7f8e0 00007fff253d952f clr!Thread::intermediateThreadProc+0x86
000000b942c7f9e0 00007fff253d950f clr!Thread::intermediateThreadProc+0x66, calling clr!_chkstk
000000b942c7fa20 00007fff37568364 kernel32!BaseThreadInitThunk+0x14, calling ntdll!LdrpDispatchUserCallTarget
000000b942c7fa50 00007fff3773e821 ntdll!RtlUserThreadStart+0x21, calling ntdll!LdrpDispatchUserCallTarget
So I have no idea where to look for the bug in my code.
(Here is the full result:
https://gist.github.com/rouen-sk/eff11844557521de367fa9182cb94a82
and here are the results of !threads:
https://gist.github.com/rouen-sk/b61cba97a4d8300c08d6a8808c4bff6e)
What can I do? A Google search for "GetFrameContext failed" turns up nothing helpful.
As mentioned, this is not trivial; however, you can find a case study of a similar problem here: https://blogs.msdn.microsoft.com/rodneyviana/2015/03/27/the-case-of-the-non-responsive-mvc-web-application/
In a nutshell:
Download NetExt. It is the zip file here:
https://github.com/rodneyviana/netext/tree/master/Binaries
Open your dump and load NetExt
Run !windex to index the heap
Run !whttp -order -running to see a list of running requests
If a request shows a thread number, you can switch to that thread to see what is happening
If a request shows --- instead of a thread number, it is waiting for a thread, which is a sign that some throttling is happening
If it is a WCF service, run !wservice to see the services
Run !wruntime to see runtime information
Run !wapppool to see Application Pool information
Run !wdae to list all errors
... and so on. Once you have done this a few times, you will be able to spot issues easily.

Running Apache spark job from Spring Web application using Yarn client or any alternate way

I have recently started using Spark and I want to run a Spark job from a Spring web application.
I am running a web application in a Tomcat server using Spring Boot. The application receives a REST web service request and, based on that, needs to trigger a Spark calculation job on a YARN cluster. Since the job can take a long time to run and reads data from HDFS, I want to run it in yarn-cluster mode, and I don't want to keep a Spark context alive in my web layer. Another reason is that the application is multi-tenant: in yarn-cluster mode each tenant's job can start its own driver and run in its own Spark cluster. In the web app, I assume I can't run multiple Spark contexts in one JVM.
I want to trigger Spark jobs in yarn-cluster mode from a Java program in my web application. What is the best way to achieve this? I am exploring various options and am looking for your guidance on which one is best.
1) I can use the spark-submit command-line script to submit my jobs. But to trigger it from my web application I need to use either the Java ProcessBuilder API or some package built on top of it. This has two issues. First, it doesn't sound like a clean way of doing it; I should have a programmatic way of triggering my Spark applications. Second, I will lose the ability to monitor the submitted application and get its status. The only crude way of doing that is reading the output stream of spark-submit, which again doesn't sound like a good approach.
2) I tried using the YARN client to submit the job from the Spring application. This is the code I use to submit the Spark job using the YARN Client:
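// Hadoop configuration; it only knows the cluster if the *-site.xml files are on the classpath
Configuration config = new Configuration();
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf conf = new SparkConf();
// sparkArgs: the String[] of YARN client arguments, defined elsewhere
ClientArguments cArgs = new ClientArguments(sparkArgs, conf);
Client client = new Client(cArgs, config, conf);
client.run();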
But when I run the above code, it only tries to connect to the local machine. I get this error:
5/08/05 14:06:10 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/08/05 14:06:12 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So I don't think it can connect to the remote machine.
Please suggest the best way of doing this with the latest version of Spark. Later I plan to deploy this entire application on Amazon EMR, so the approach should work there as well.
Thanks in advance
Spark JobServer might help: https://github.com/spark-jobserver/spark-jobserver. This project receives RESTful web requests and starts a Spark job. The results are returned as a JSON response.
I also had similar issues trying to run a Spark app that connects to a YARN cluster: with no cluster config, it tried to connect to the local machine as if it were the cluster's main node, which obviously failed.
It worked for me once I placed core-site.xml and yarn-site.xml on the classpath (src/main/resources in a typical sbt or Maven project structure); the application then connected to the cluster correctly.
When using spark-submit, the location of those files is typically specified by the HADOOP_CONF_DIR environment variable, but for a stand-alone application it had no effect.
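If you do end up going the spark-submit route from the web application, the HADOOP_CONF_DIR point above can be handled by setting the variable on the child process. Below is a rough sketch under stated assumptions: the class name SparkSubmitCommand and every path, class name and argument in the usage line are hypothetical, and the method only builds the ProcessBuilder without starting anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SparkSubmitCommand {

    /**
     * Builds (but does not start) a spark-submit invocation in yarn-cluster mode.
     * All arguments are supplied by the caller; nothing here is read from Spark itself.
     */
    static ProcessBuilder build(String sparkHome, String hadoopConfDir,
                                String mainClass, String appJar, String... appArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                sparkHome + "/bin/spark-submit",
                "--master", "yarn-cluster",
                "--class", mainClass,
                appJar));
        cmd.addAll(Arrays.asList(appArgs));

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Directory containing core-site.xml and yarn-site.xml of the target cluster
        pb.environment().put("HADOOP_CONF_DIR", hadoopConfDir);
        // Merge stderr into stdout so a single stream can be read for progress output
        pb.redirectErrorStream(true);
        return pb;
    }
}
```

Usage would look like SparkSubmitCommand.build("/opt/spark", "/etc/hadoop/conf", "com.example.TenantJob", "/opt/jobs/tenant-job.jar", "tenant-42").start(). Note that newer Spark releases spell the mode as --master yarn --deploy-mode cluster instead of --master yarn-cluster.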

Transaction issue while migrating JBoss 5.1 to WildFly 8.2

I am migrating my application from JBoss 5.1.0 to WildFly 8.2. While starting the server, we fetch data from the database and store it in application scope. This worked fine in JBoss 5.1.0 but does not work on WildFly 8.2. It shows the warnings below.
15:04:38,152 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0ab6a4f7:3057cd0:55c9c01b:8 in state RUN
15:04:38,156 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff0ab6a4f7:3057cd0:55c9c01b:8 invoked while multiple threads active within it.
15:04:38,157 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffff0ab6a4f7:3057cd0:55c9c01b:8 aborting with 1 threads active!
15:04:38,158 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffff0ab6a4f7:3057cd0:55c9c01b:8
I have double checked that my data source configuration is correct.
As part of the migration I upgraded Seam 2.2.0 to 2.3.1 and EJB 3.0 to EJB 3.1. I suspect that there might be an issue with the Seam and EJB upgrade.
I don't understand why I am getting the above transaction warnings. Please help if anyone has a solution for this issue.
Thanks,
Sreenath
It looks like one of your transactions is timing out. There are a couple of things I would suggest.
In your standalone.xml file, change the logging level to TRACE to get more details about the issue. You'll need to change the value to
<logger category="com.arjuna">
    <level name="TRACE"/>
</logger>
You can also increase the timeout to something higher; the default value is 300 seconds. To change the timeout:
Log in to the JBoss Management Console (localhost:9990 by default)
Go to Configuration > container > Timeout
Change the Default Timeout value to something higher.
You can look into this thread for some help.
The actual issue was that I was using Hibernate 3.6.3, which is not compatible with Seam 2.3.1. The issue was resolved after migrating to Hibernate 4.0.1.

How can I destroy the threads created by Akka (not just actors) on servlet undeploy?

I'm using a simple Spray-based servlet. After deploying and running this servlet on Tomcat7 I undeploy it (and possibly deploy it again afterwards) without restarting the servlet container (so basically the JVM instance is preserved).
The problem is that the threads created by Akka at each servlet deploy are not destroyed when the servlet is undeployed (i.e. when Akka shuts down), and a new set of threads is created at every deploy. Thus... leakage.
Calling system.shutdown() and system.awaitTermination() is useless.
Is there a way of killing these threads spawned at servlet initialization?
Here is a sample log entry from Tomcat7:
SEVERE: The web application [/...] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal#68871741]) and a value of type [scala.concurrent.forkjoin.ForkJoinPool.Submitter] (value [scala.concurrent.forkjoin.ForkJoinPool$Submitter#155aa3ef]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
Nov 14, 2013 1:53:24 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
Have you tried calling system.shutdown() and system.awaitTermination() at ServletContextListener#contextDestroyed()? That should clear up all resources before moving on to undeploy the app.
If you're using the Scala API, I've created a PR for this: https://github.com/spray/spray/pull/787
Cheers
Tulio
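The pattern being discussed (request shutdown, then block until the worker threads are really gone before the container moves on) can be illustrated without Akka or a servlet container. In the sketch below a plain java.util.concurrent.ExecutorService stands in for the ActorSystem. The class name WorkerLifecycle is made up, and its two methods only borrow their names from ServletContextListener; nothing here implements the servlet interface or calls Akka.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class WorkerLifecycle {

    private ExecutorService pool;

    /** Counterpart of contextInitialized(): start the worker threads. */
    void contextInitialized() {
        pool = Executors.newFixedThreadPool(4);
        pool.submit(() -> { }); // some work, so that at least one thread is created
    }

    /**
     * Counterpart of contextDestroyed(): stop accepting work, then wait until
     * the threads have actually terminated. Returns false if they are still alive.
     */
    boolean contextDestroyed(long timeoutSeconds) throws InterruptedException {
        pool.shutdown();
        if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        pool.shutdownNow(); // interrupt tasks that did not finish in time
        return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

The return value is the interesting part: if it comes back false, threads of the old deployment are still running, and those are the ones that would pile up across redeploys.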

servlet initialization failure in websphere 6.0

I have many servlets in a web application; for some strange reason, exactly one of them always fails during initialization with the following error trace:
00000045 ServletWrappe E SRVE0100E: Did not realize init() exception thrown by servlet MyServletX: java.lang.NullPointerException
at com.ibm.ws.webcontainer.WebAppPmiListener.onServletStartInit(WebAppPmiListener.java:120)
at com.ibm.ws.webcontainer.webapp.FireOnServletStartInit.fireEvent(WebAppEventSource.java:237)
at com.ibm.ws.webcontainer.util.EventListeners.fireEvent(EventListeners.java:48)
at com.ibm.ws.webcontainer.webapp.WebAppEventSource.onServletStartInit(WebAppEventSource.java:105)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.init(ServletWrapper.java:261)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:444)
at com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:2841)
at com.ibm.ws.webcontainer.webapp.WebGroup.handleRequest(WebGroup.java:220)
at com.ibm.ws.webcontainer.VirtualHost.handleRequest(VirtualHost.java:204)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:1681)
at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:77)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:421)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewInformation(HttpInboundLink.java:367)
at com.ibm.ws.http.channel.inbound.impl.HttpICLReadCallback.complete(HttpICLReadCallback.java:94)
at com.ibm.ws.tcp.channel.impl.WorkQueueManager.requestComplete(WorkQueueManager.java:548)
at com.ibm.ws.tcp.channel.impl.WorkQueueManager.attemptIO(WorkQueueManager.java:601)
at com.ibm.ws.tcp.channel.impl.WorkQueueManager.workerRun(WorkQueueManager.java:934)
at com.ibm.ws.tcp.channel.impl.WorkQueueManager$Worker.run(WorkQueueManager.java:1021)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1332)
I could not figure out whether there is anything extraordinary about this servlet. It has no init() method and it extends HttpServlet. Any idea what the reason could be? I am using WebSphere server 6.0.x. How can I get more debugging information in this case?
Well, I still don't know the cause of the above error, but this is how it strangely started working: i) re-applied the fixes recommended by IBM for my WAS version (in particular the IBM JDK upgrade fix patches), ii) created a new server profile, iii) installed the web application to the new profile, and it started working.
I don't think this is a product issue.
To debug this problem, I would suggest deploying a simple servlet (a kind of Hello World) to the server and seeing what happens.
Initialization does not necessarily mean the init() method alone.
If you have a static block in your servlet, or any variables that are initialized at declaration, they are all part of the initialization activity.
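To make that point concrete, here is a small stand-alone sketch (no servlet API involved; StaticInitDemo and BrokenServlet are made-up names) showing that a failing static initializer breaks the class before any init() method gets a chance to run, and that later attempts fail with a different error:

```java
final class StaticInitDemo {

    /** Hypothetical stand-in for a servlet class whose static state fails to initialize. */
    static final class BrokenServlet {
        private static final String CONFIG = null;
        // Runs during class initialization and throws a NullPointerException
        private static final int CONFIG_LENGTH = CONFIG.length();

        void init() {
            System.out.println("init() reached"); // never printed
        }
    }

    /** Tries to create and initialize the class, reporting how it failed. */
    static String tryInit() {
        try {
            new BrokenServlet().init();
            return "ok";
        } catch (ExceptionInInitializerError e) {
            // First use: the static initializer's exception is wrapped
            return "class init failed: " + e.getCause().getClass().getSimpleName();
        } catch (NoClassDefFoundError e) {
            // Later uses: the JVM refuses to retry initialization
            return "class unusable after earlier failure";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInit());
        System.out.println(tryInit());
    }
}
```

So when hunting for the NullPointerException, it is worth checking static fields and field initializers of the failing servlet (and of the classes it loads), not only init().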
Look at the FFDC logs that were generated when this error occurred and that should provide you with clues.
As bkail mentioned, also ensure that you have the latest fix packs, just to eliminate known problems with the product.
If the Hello World servlet works, I suggest you post the servlet code here along with the SystemOut and SystemErr logs that correspond to this issue, plus the relevant FFDC logs, and I am sure most of us will be able to help you out.
HTH
Manglu
