Storm Error: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read - r

I am running a storm topology on a local cluster and it works fine for about an hour and then it stops after throwing this error.
backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
The entire error trace is:
ERROR b.s.t.ShellBolt - Halting process: ShellBolt died. Command: [Rscript, stormtest_with_OR1.R], ProcessInfo pid:32381, name:RBolt exitCode:-1, errorString:
java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:
at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:101) ~[storm-core-0.10.0.jar:0.10.0]
at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:321) [storm-core-0.10.0.jar:0.10.0]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_67]
2016-04-05 11:07:04,665 FATAL Unable to register shutdown hook because JVM is shutting down.
51144 [Thread-14-RBolt] ERROR b.s.util - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:336) [storm-core-0.10.0.jar:0.10.0]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:?]
at backtype.storm.daemon.worker$fn__7184$fn__7185.invoke(worker.clj:532) [storm-core-0.10.0.jar:0.10.0]
at backtype.storm.daemon.executor$mk_executor_data$fn__5523$fn__5524.invoke(executor.clj:261) [storm-core-0.10.0.jar:0.10.0]
at backtype.storm.util$async_loop$fn__545.invoke(util.clj:489) [storm-core-0.10.0.jar:0.10.0]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_67]
The Bolt is written in R. In my topology code the cluster.shutdown() is commented which has been cited as the reason for this error in many threads related to this error.

Related

wso2 apimanager Active-Active deployment

I have deploy API manager 4.0.0 All-in-one on 2 VMs, front the system with a load balancer.
When one node shutdown by command "sh api-manager.sh stop", another swithes success and runs well , but there are some error in console like below:
TID: [-1] [] [2022-03-14 10:14:13,270] ERROR {org.wso2.carbon.databridge.agent.endpoint.DataEndpointConnectionWorker} - Error while trying to connect
to the endpoint. Cannot borrow client for ssl://10.32.73.10:9711 org.wso2.carbon.databridge.agent.exception.DataEndpointAuthenticationException: Cannot borrow client for ssl://10.32.73.10:9711 at org.wso2.carbon.databridge.agent.endpoint.DataEndpointConnectionWorker.connect(DataEndpointConnectionWorker.java:147)
at org.wso2.carbon.databridge.agent.endpoint.DataEndpointConnectionWorker.run(DataEndpointConnectionWorker.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.wso2.carbon.databridge.agent.exception.DataEndpointException: Error while opening socket to 10.32.73.10:9711. Connection refused (Conne
ction refused) at org.wso2.carbon.databridge.agent.endpoint.binary.BinarySecureClientPoolFactory.createClient(BinarySecureClientPoolFactory.java:75)
at org.wso2.carbon.databridge.agent.client.AbstractClientPoolFactory.makeObject(AbstractClientPoolFactory.java:39)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1212)
at org.wso2.carbon.databridge.agent.endpoint.DataEndpointConnectionWorker.connect(DataEndpointConnectionWorker.java:137)
... 6 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
at java.net.Socket.connect(Socket.java:606)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:287)
at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:146)
at sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:88)
at org.wso2.carbon.databridge.agent.endpoint.binary.BinarySecureClientPoolFactory.createClient(BinarySecureClientPoolFactory.java:58)
... 9 more
TID: [-1] [] [2022-03-14 10:14:15,158] WARN {org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup} - No receiver is reachable at URL Endpoint/
Endpoints [tcp://10.32.73.10:9611], will try to reconnect every 30 sec
Are there anything wrong in the deployment.toml?
In APIM active-active setup, each node is publishing throttling data to itself and to the other node. When you stop the other node, it can't publish the throttling data to the other node. Hence you see connection refused errors and this is expected. No harm having these error logs. It will recover when the other node is started. If you look at the deployment.toml, can find the other node details under the throttling configurations.

AcquireJobsRunnableImpl trows PSQLException: SSL error: readHandshakeRecord

With a successfully migrated Alfresco (5.2 to) 7.0 repository, I get PSQLException: SSL error: readHandshakeRecord (see stracktrace below) every morning at 4:00, which then causes the repository to stop responding.
Could someone please help me decipher this stack trace? Why is this job running around 4am? I can't find a suitable quartz job. Does anyone know how to manually force this call to fix this problem? At first I thought it might be related to the contentStoreCleaner running at 4:00 am, but disabling this job doesn't change anything.
The only work around I found so far was to disable the activity workflow engine.
2021-08-08 04:35:39,396 ERROR [org.activiti.engine.impl.jobexecutor.AcquireJobsRunnableImpl] [Thread-46] exception during job acquisition: Could not open JDBC Connection for transaction; nested exception is org.postgresql.util.PSQLException: SSL error: readHandshakeRecord
org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is org.postgresql.util.PSQLException: SSL error: readHandshakeRecord
at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:309)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:45)
at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
at org.activiti.engine.impl.jobexecutor.AcquireJobsRunnableImpl.run(AcquireJobsRunnableImpl.java:54)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.postgresql.util.PSQLException: SSL error: readHandshakeRecord
at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:43)
at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:534)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:149)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)
at org.postgresql.Driver.makeConnection(Driver.java:465)
at org.postgresql.Driver.connect(Driver.java:264)
at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1188)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:265)
... 9 more
Caused by: javax.net.ssl.SSLException: readHandshakeRecord
at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1335)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:411)
at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:41)
... 22 more
Suppressed: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:380)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:292)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:450)
... 24 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
at java.base/sun.security.ssl.SSLSocketOutputRecord.flush(SSLSocketOutputRecord.java:251)
at java.base/sun.security.ssl.HandshakeOutStream.flush(HandshakeOutStream.java:89)
at java.base/sun.security.ssl.Finished$T13FinishedProducer.onProduceFinished(Finished.java:679)
at java.base/sun.security.ssl.Finished$T13FinishedProducer.produce(Finished.java:658)
at java.base/sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:436)
at java.base/sun.security.ssl.Finished$T13FinishedConsumer.onConsumeFinished(Finished.java:1011)
at java.base/sun.security.ssl.Finished$T13FinishedConsumer.consume(Finished.java:874)
at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:182)
at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:171)
at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1418)
at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1324)
... 25 more
The stacktrace was misleading - the jdbc connection problem was caused by memory filling up by the Alfresco trashcan cleaner module: OutOfMemoryError: Java heap space due to no limit on getChildAssocs.
We have ~ 4 million nodes in the trashcan and the module retrieves in every batch run all nodes again and again until the memory has been filled up ...

MPI error from very simple example fortran code

I have compiled this code:
program mpisimple
implicit none
integer ierr
include 'mpif.h'
call mpi_init(ierr)
write(6,*) 'Hello World!'
call mpi_finalize(ierr)
end
using the command: mpif90 -o helloworld simplempi.f90
When I run with this command:
$ mpiexec -np 1 ./helloworld
Hello World!
it works fine as you can see. But when I run with any other number of processors (here 4) I get the errors and I basically have to ctrl+C to kill it.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805).....: fail failed
MPID_Init(1859)...........: channel initialization failed
MPIDI_CH3_Init(126).......: fail failed
MPID_nem_init_ckpt(858)...: fail failed
MPIDI_CH3I_Seg_commit(427): PMI_KVS_Get returned 4
In: PMI_Abort(69777679, Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805).....: fail failed
MPID_Init(1859)...........: channel initialization failed
MPIDI_CH3_Init(126).......: fail failed
MPID_nem_init_ckpt(858)...: fail failed
MPIDI_CH3I_Seg_commit(427): PMI_KVS_Get returned 4)
forrtl: severe (174): SIGSEGV, segmentation fault occurred
What could be the problem? I am doing this on a Linux hpc system.
I figured out why this happened. The system I am using does not require users to submit single-core jobs through the scheduler, but does require it for multi-core jobs. Once the mpiexec command was submitted through a PBS bash script, the errors went away and output was as expected.

Airflow: Increase Airflow Task Timeout time

I'm hitting some issues where a few of my tasks are timing out. When the task reruns, or I retry the task, it works fine.
Is there a way to increase the time out either in the config, or within the DAG itself to avoid these errors?
ERROR - Process timed out
ERROR - Failed to import: /home/ec2-user/airflow/dags/dse/mydag.py
airflow.exceptions.AirflowTaskTimeout: Timeout
airflow.exceptions.AirflowException: dag_id could not be found: mydag. Either the dag did not exist or it failed to parse.

Aptana crashes out of memory

Aptana crashes with
java.lang.OutOfMemoryError: PermGen space
Error while logging event loop exception:
Java HotSpot(TM) Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminat
Increase the memory adding/modifing
AptanaStudio3.ini
the following lines
-Xms1024m
-Xmx1024m
-XX:NewSize=256m
-XX:MaxNewSize=356m
-XX:PermSize=256m

Resources