What is the proper @EmbeddedKafka usage for integration tests? - spring-kafka

I've read the spring-kafka documentation, the examples I found, and half of Stack Overflow, but I still fail to understand how @EmbeddedKafka is supposed to work, especially for integration tests. My test class is annotated with:
@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@EnableKafka
@EmbeddedKafka(controlledShutdown = true
// brokerProperties = {"log.dir=${kafka.broker.logs-dir}",
// "listeners=PLAINTEXT://localhost:${kafka.broker.port:3333}",
// "auto.create.topics.enable=${kafka.broker.topics-enable:true}"}
)
and I had to add the following to my test application.properties:
….kafka.bootstrap-servers=${spring.embedded.kafka.brokers}
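For reference, here is a minimal sketch of the kind of test class this setup is attached to (the class name, topic, and assertion are simplified placeholders rather than my real code; it assumes spring-kafka-test is on the test classpath and that the property above points the producer at the embedded broker):

import java.util.concurrent.TimeUnit;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.test.context.EmbeddedKafka;
import org.springframework.test.context.junit4.SpringRunner;

// Minimal sketch only: class name, topic and assertion are placeholders.
@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@EnableKafka
@EmbeddedKafka(controlledShutdown = true, topics = "test-topic")
public class SomeKafkaIntegrationTest {

    // Boot auto-configures this template against the embedded broker
    // via the bootstrap-servers property shown above.
    @Autowired
    private KafkaTemplate<Object, Object> kafkaTemplate;

    @Test
    public void sendsToEmbeddedBroker() throws Exception {
        // block until the send is acknowledged so the test fails fast if the broker is unreachable
        kafkaTemplate.send("test-topic", "key", "value").get(10, TimeUnit.SECONDS);
        // ... then assert on whatever the listener under test does with the record
    }
}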
With all this in place, when the test is executed, it complains with several lines about Kafka not being accessible:
[Producer clientId=producer-2] Connection to node 0 could not be established. Broker may not be available.
but after half a screen of these messages it proceeds, the test succeeds, and it finishes with one last unpleasant message:
2019-06-17 12:31:54.438 WARN [testing,,,] 32448 --- [ost-startStop-1] o.a.c.loader.WebappClassLoaderBase : The web application [testing] appears to have started a thread named [kafka-producer-network-thread | producer-3] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.apache.kafka.common.network.Selector.select(Selector.java:674)
org.apache.kafka.common.network.Selector.poll(Selector.java:396)
org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
java.lang.Thread.run(Thread.java:748)
Questions:
1/ Why are there those messages about broker unavailability at the start of each test method, even if the context wasn't dirtied and reinitialized?
2/ Why is the ugly message about the unclosed thread present? What's wrong?

Related

Google Cloud Composer (Airflow) - Scalability issues

I'm facing some issues on my Cloud Composer instance resulting in failed tasks.
Details of the instance configuration:
Composer image: composer-2.0.29-airflow-2.3.3 / Airflow version: 2.3.3
airflow.cfg:
parallelism = 32
dag_concurrency = 100
worker_concurrency = 24
In terms of resources:
I have 60 DAGs, which can contain up to 55 tasks that need to run in parallel.
They don't do any heavy compute, only some light PythonOperator/GCSOperator/BigQueryOperator work.
I often encounter this type of error:
*** Log file is not found: gs://xxx/xxx/attempt=2.log.
*** The task might not have been executed or worker executing it might have finished abnormally (e.g. was evicted).
*** Please, refer to https://cloud.google.com/composer/docs/how-to/using/troubleshooting-dags#common_issues hints to learn what might be possible reasons for a missing log.
All of my tasks have 3 retries, but when this happens, for some reason it stops at 2 retries and sends a failure notification. I don't understand why. Example of the error in the mail that is sent:
Try 2 out of 3
Exception:
Executor reports task instance finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
I also randomly get tasks reported as Detected as zombie.
My metrics are the following:
When I clear the task, it succeeds as it should.
(I don't have access to GKE, but if it helps I can ask for access.)
Any advice on how to prevent these errors and understand what is happening?

Including Multiple Attachments In Transaction Kills Node

Setup
Corda 4.6
Working from the Java template.
I have been experimenting with adding up to 10 Attachments of small (1K) zip files to a transaction.
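For context, this is roughly the pattern in use, shown as a simplified sketch rather than the real CorDapp code (flow, state, and variable names are placeholders; the zips are uploaded first via CordaRPCOps.uploadAttachmentWithMetadata and the flow then references the returned hashes):

import java.util.List;

import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.crypto.SecureHash;
import net.corda.core.flows.FlowException;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.InitiatingFlow;
import net.corda.core.flows.StartableByRPC;
import net.corda.core.identity.Party;
import net.corda.core.transactions.SignedTransaction;
import net.corda.core.transactions.TransactionBuilder;

// Simplified sketch: the attachment hashes come from a prior
// CordaRPCOps.uploadAttachmentWithMetadata call; the flow only references them.
@InitiatingFlow
@StartableByRPC
public class SendWithAttachmentsFlow extends FlowLogic<SignedTransaction> {

    // up to 10 hashes of ~1K zip files, uploaded beforehand over RPC
    private final List<SecureHash> attachmentIds;

    public SendWithAttachmentsFlow(List<SecureHash> attachmentIds) {
        this.attachmentIds = attachmentIds;
    }

    @Suspendable
    @Override
    public SignedTransaction call() throws FlowException {
        Party notary = getServiceHub().getNetworkMapCache().getNotaryIdentities().get(0);
        TransactionBuilder builder = new TransactionBuilder(notary);
        // ... the usual output state and command go here ...
        for (SecureHash id : attachmentIds) {
            builder.addAttachment(id);  // reference each pre-uploaded attachment by hash
        }
        return getServiceHub().signInitialTransaction(builder);
    }
}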
Error when testing with StartedMockNodes:
io.github.classgraph.ClassGraphException: Uncaught exception during scan
at io.github.classgraph.ClassGraphException.newClassGraphException(ClassGraphException.java:89) ~[classgraph-4.8.90.jar:4.8.90]
at io.github.classgraph.ClassGraph.scan(ClassGraph.java:1555) ~[classgraph-4.8.90.jar:4.8.90]
...
Caused by: java.lang.OutOfMemoryError: Java heap space
at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler.readAllBytesWithSpilloverToDisk(NestedJarHandler.java:815) ~[classgraph-4.8.90.jar:4.8.90]
at nonapi.io.github.classgraph.fastzipfilereader.PhysicalZipFile.<init>(PhysicalZipFile.java:161) ~[classgraph-4.8.90.jar:4.8.90]
at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler.downloadJarFromURL(NestedJarHandler.java:576) ~[classgraph-4.8.90.jar:4.8.90]
...
Error when testing local nodes built with CordForm and connecting with RPC:
The node stops suddenly, with no errors in the log. In the directory of the failed node there will be two files:
hs_err_pid20400.log
java_pid20400.hprof
The hs_err log file shows errors similar to those from the StartedMockNodes failures:
j nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler.readAllBytesWithSpilloverToDisk(Ljava/io/InputStream;Ljava/lang/String;JLnonapi/io/github/classgraph/utils/LogNode;)Lnonapi/io/github/classgraph/fileslice/Slice;+65
j nonapi.io.github.classgraph.fastzipfilereader.PhysicalZipFile.<init>(Ljava/io/InputStream;JLjava/lang/String;Lnonapi/io/github/classgraph/fastzipfilereader/NestedJarHandler;Lnonapi/io/github/classgraph/utils/LogNode;)V+25
j nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler.downloadJarFromURL(Ljava/lang/String;Lnonapi/io/github/classgraph/utils/LogNode;)Lnonapi/io/github/classgraph/fastzipfilereader/PhysicalZipFile;+428
j nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler.access$000(Lnonapi/io/github/classgraph/fastzipfilereader/NestedJarHandler;Ljava/lang/String;Lnonapi/io/github/classgraph/utils/LogNode;)Lnonapi/io/github/classgraph/fastzipfilereader/PhysicalZipFile;+3
j nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$4.newInstance(Ljava/lang/String;Lnonapi/io/github/classgraph/utils/LogNode;)Ljava/util/Map$Entry;+124
Clarification #1: The error occurs during transaction execution, not when originally uploading the files to the node using CordaRPCOps.uploadAttachmentWithMetadata (that works fine).
Clarification #2: The first node to fail is the one constructing the transaction. If you try restarting this node, it will fail again on restart; it takes several restarts to get it up and running again. Then any node that was receiving the transaction will fail, and those nodes will also require several restarts to get up and running again. As a testament to Corda's flow framework, after enough restarts the transaction will eventually be successful and the attachments will be transmitted.
Clarification #3: I can pre-upload the attachments to all the nodes before executing the transaction, and the failures still occur.
StartedMockNodes:
Found this
Added the following to my workflows build.gradle file to stop the errors:
test {
    maxHeapSize = "4096m"
}
Local Nodes & RPC:
??? - haven't found a solution yet

Solr cloud shard splitting fails when index type is native or simple

I am using Solr 4.10.2. I tried to perform a shard split on my SolrCloud test cluster. It fails every time if the index lock type is set to "native" or "simple".
Is that normal? I can perform shard splitting if the lock type is set to "single" or "none".
The documentation advertises that shard splitting can be done while Solr is running, and I can hardly imagine poking around and changing the lock type of a production server...
Here is the test environment:
1 shard, 2 nodes, 1 collection.
Initially the collection was empty. I added a few documents and verified that they had been replicated. All worked.
I issued the split shard command:
server1:port/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1&async=myhandle
I verified that the operation had finished by calling
server1:port/solr/admin/collections?action=REQUESTSTATUS&requestid=myhandle
and the status was "complete".
Here is the log:
OverseerCollectionProcessor.processMessage : splitshard , {
"operation":"splitshard",
"shard":"shard1",
"collection":"mycollection",
"async":"myhandle"}
1/26/2015, 1:49:02 PM
ERROR
CoreContainer
Error creating core [mycollection_shard1_0_replica1]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 9 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/nfs/solr/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:753)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 11 more
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:575)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:199)
at org.apache.solr.handler.admin.CoreAdminHandler$ParallelCoreAdminHandlerThread.run(CoreAdminHandler.java:1234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 9 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/nfs/solr/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:89)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:753)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:279)
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:111)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1528)
... 11 more
1/26/2015, 1:49:26 PM
ERROR
SolrIndexWriter
SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
1/26/2015, 1:49:26 PM
ERROR
SolrIndexWriter
Error closing IndexWriter
java.lang.NullPointerException
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3230)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3203)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:907)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:984)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:954)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:129)
at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:182)
at java.lang.System$2.invokeFinalize(System.java:1213)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:98)
at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:210)
I fixed the problem. Here is how:
When I created the SolrCloud environment, I used the -Dsolr.data.dir property to map the collection storage to a different file system, because I was running VMs with limited storage capacity. Once I removed this property, everything started working.
I think Solr tries to use the same solr.data.dir path for the new cores created by the shard split, causing the lock problem.

How to make an SBT task fail, and hence the build itself?

I wrote an SBT task to run cssLint for my project using Rhino. cssLint returns an exit code to my SBT task.
My question is: how do I make the task fail if the exit code is non-zero?
I don't want to throw any exceptions. I want the last line of the task output to show [Failed] instead of [success], and the exit code of my SBT task to be non-zero.
SAMPLE
MyTask := {
  val exitcode = // rhino functions
  // what to do??
}
The actual intent is to fail the build if CSS errors are present.
The way to fail the build without producing a stack trace on the console is to use exceptions that sbt handles specially:
for sbt.MessageOnlyException, an error message is logged twice (first without the task name, then with it) and the build is stopped
mix in sbt.FeedbackProvidedException or sbt.UnprintableException to implement custom exceptions for which sbt does not print a stack trace. A string with the task name and the exception's toString is logged once at the top level and the build is stopped. It is expected that the essential information for the user has already been logged before throwing these.
Disclaimer: I have not seen this information in the sbt manual; it was extracted from the sources of sbt 0.13.16. sbt.FeedbackProvidedException is used this way by the sbt compiler, sbt tests, and the sbt-web and Play sbt plugins.
My understanding is that the success message is always printed out unless:
showSuccess setting is set to false or
a task throws an exception.
In your particular case you want to report an error, so you should either throw an exception or return a value whose type can represent a failure, like None or Failure.
Say, you've got the following task defined in build.sbt:
lazy val tsk = taskKey[Unit]("Task that always fails")

tsk := {
  throw new IllegalStateException("a message")
}
When you execute the tsk task, the exception is printed out with no [success] afterwards.
[no-success]> tsk
[trace] Stack trace suppressed: run last *:tsk for the full output.
[error] (*:tsk) java.lang.IllegalStateException: a message
[error] Total time: 0 s, completed Feb 15, 2014 11:45:27 PM
I would prefer to avoid this style of programming and rely on Option as a way to report a processing issue.
With the following tskO definition:
lazy val tskO = taskKey[Option[String]]("Task that reports failure as None")
tskO := None
you'd then check the result and if it's None you'd know it's a failure.

Severe error in re-deploying servlet - silly error

I was trying to redeploy a servlet I had recently undeployed, but kept getting an org.apache.catalina.LifecycleException: Tomcat wasn't able to load the context properly.
This is what triggered the search:
FAIL - Unable to delete [/var/lib/tomcat7/conf/Catalina/localhost/app.xml].
The continued presence of this file may cause problems.
Here are the errors from the catalina.out log, rearranged to fit the column width:
WARNING: Calling stop() on failed component [{0}] to trigger clean-up
did not complete.
org.apache.catalina.LifecycleException:
An invalid Lifecycle transition was attempted ([after_stop]) for
component [org.apache.catalina.startup.FailedContext@13150fc] in state
[FAILED]
WARNING: Error while removing context [/app]
java.lang.ClassCastException:
org.apache.catalina.startup.FailedContext cannot be cast to
org.apache.catalina.core.StandardContext
Scrolling back up, I noticed a whole list of SEVERE errors, beginning with:
SEVERE: Parse error in context.xml for /app
org.xml.sax.SAXParseException;
systemId: file:/etc/tomcat7/Catalina/localhost/app.xml/;
lineNumber: 1;
columnNumber: 1;
Content is not allowed in prolog.
This is not related to "Cannot Undeploy a web-app completely in Tomcat 7" or "Installing JSTL results in org.xml.sax.SAXParseException: Content is not allowed in prolog", though I wasted quite some time looking through those. Everything seemed to be as it should be, so I was stumped. Eventually I solved it myself; my answer is below.
It turns out that I had deployed
/path/to/servlet/web/
as the XML configuration file instead of
/path/to/servlet/website.xml
as Tomcat had been expecting.
A quick check of the saved form content in my other browser revealed this kick-myself error, so I am posting a Q&A here in the hope that it could save someone some time.
Please be careful when deploying manually, people.
