pykafka deadlock in nosetests integration tests on Python 3.6
We are having issues with pykafka 2.7.0 on Python 3.6. Our integration tests produce to some topics, and for some reason a deadlock forms around those topics: nosetests refuses to exit after all tests have finished. This does not happen when we run the same tests on Python 2.7, and upgrading to pykafka 2.8.0 on Python 3.6 did not help.
The only way we got it fixed is a temporary workaround: stopping and deleting the producer right after each message is produced to the topic (the last two lines of the code below), which costs a lot of time just to stop the producer.
def publish(self, topic, message):
    topic = topic.lower()
    self._log.info('publish to topic ' + topic)
    if topic not in self.producer_map:
        # Lazily create and cache one producer per topic
        k_topic = self.__messenger.topics[topic.encode()]
        self._log.info(k_topic)
        new_producer = k_topic.get_producer()
        self.producer_map[topic] = new_producer
    self.producer_map[topic].produce(message)
    # The fix: stop and discard the producer after every message
    self.producer_map[topic].stop()
    del self.producer_map[topic]
Without those last two lines, gdb shows the process stuck right after garbage collection starts: the GC finalizes the producer, Producer.__del__ calls self.stop(), and stop() then joins the OwnedBroker.queue_reader worker thread, hanging forever in _wait_for_tstate_lock. This is preventing our CI in Jenkins from finishing, and we would like to know why it deadlocks when stop() runs during garbage collection but not when we call it explicitly in code.
#9 Frame 0x7f4824fe7958, for file /usr/lib64/python3.6/threading.py, line 1072, in _wait_for_tstate_lock (self=<Thread(_target=<function at remote 0x7f4824ef4ae8>, _name='3: pykafka.OwnedBroker.queue_reader for broker 0', _args=(), _kwargs={}, _daemonic=True, _ident=139948465846016, _tstate_lock=<_thread.lock at remote 0x7f4824de78a0>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7670>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7670>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7670>, _waiters=<collections.deque at remote 0x7f4824ddc2b8>) at remote 0x7f4824de3128>, _flag=True) at remote 0x7f4824de31d0>, _is_stopped=False, _initialized=True, _stderr=<redacted>, <redacted>, <_io.TextIOWrapper at remote 0x7f48549b1708>)) at remote 0x7f48275474a8>)) at remote 0x7f4827547518>)) at remote 0x7f48275476d8>) at remote 0x7f4824de3160>, block=True, timeout=-1, lock=<_thread.lock at remote 0x7f4824de78a0>)
elif lock.acquire(block, timeout):
#13 Frame 0x7f4824e045a0, for file /usr/lib64/python3.6/threading.py, line 1056, in join (self=<Thread(_target=<function at remote 0x7f4824ef4ae8>, _name='3: pykafka.OwnedBroker.queue_reader for broker 0', _args=(), _kwargs={}, _daemonic=True, _ident=139948465846016, _tstate_lock=<_thread.lock at remote 0x7f4824de78a0>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7670>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7670>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7670>, _waiters=<collections.deque at remote 0x7f4824ddc2b8>) at remote 0x7f4824de3128>, _flag=True) at remote 0x7f4824de31d0>, _is_stopped=False, _initialized=True, _stderr=<redacted>, <redacted>, <redacted>, <redacted>)) at remote 0x7f4827547518>)) at remote 0x7f48275476d8>) at remote 0x7f4824de3160>, timeout=None)
self._wait_for_tstate_lock()
#17 (frame information optimized out)
#21 Frame 0x7f484d8bee08, for file /usr/local/lib/python3.6/site-packages/pykafka/producer.py, line 235, in __del__ (self=<Producer(_cluster=<Cluster(_seed_hosts='<redacted>:9092', _socket_timeout_ms=30000, _offsets_channel_socket_timeout_ms=10000, _handler=<ThreadingHandler at remote 0x7f4824e81da0>, _brokers={0: <Broker(_connection=<BrokerConnection(_buff=<bytearray at remote 0x7f4824fdb228>, host=b'<redacted>', port=9092, _handler=<...>, _socket=<socket at remote 0x7f4824dff048>, source_host='', source_port=0, _wrap_socket=<function at remote 0x7f4824dfba60>) at remote 0x7f4824fd5898>, _offsets_channel_connection=None, _id=0, _host=b'<redacted>', _port=9092, _source_host='', _source_port=0, _ssl_config=None, _handler=<...>, _req_handler=<RequestHandler(handler=<...>, shared=<Shared at remote 0x7f4824dd0048>, t=<Thread(_target=<function at remote 0x7f4824dfbae8>, _name="1: pykafka.RequestHandler.worker for b'<redacted>':9092", _args=(), _kwargs={}, _daemonic=True, _ident=139948381951744, _tstate_lock=<_thread.lock at remote 0x7f4824de7120>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de71e8>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de71e8>, release=<built-in method release of _thread.lock object at remote 0x7f4824de71e8>, _waiters=<collections.deque at remote 0x7f4824e079a0>) at remote 0x7f4835420198>, _flag=True) at remote 0x7f4835424358>, _is_stopped=False, _initialized=True, _stderr=<redacted>, <redacted>, <redacted>, <_io.TextIOWrapper at remote 0x7f48549b1708>)) at remote 0x7f48275474a8>)) at remote 0x7f4827547518>)) at remote 0x7f48275476d8>) at remote 0x7f4835424940>) at remote 0x7f4824f62438>, _offsets_channel_req_handler=None, _socket_timeout_ms=30000, _offsets_channel_socket_timeout_ms=10000, _buffer_size=1048576, _req_handlers={}, _broker_version='0.9.0', _api_versions={0: <ApiVersionsSpec at remote 0x7f4824e88408>, 1: <ApiVersionsSpec at remote 0x7f4824e88458>, 2: <ApiVersionsSpec at remote 0x7f4824e884a8>, 3: <ApiVersionsSpec at remote 0x7f4824e884f8>, 4: <ApiVersionsSpec at remote 0x7f4824e88548>, 5: <ApiVersionsSpec at remote 0x7f4824e88598>, 6: <ApiVersionsSpec at remote 0x7f4824e885e8>, 7: <ApiVersionsSpec at remote 0x7f4824e88638>, 8: <ApiVersionsSpec at remote 0x7f4824e88688>, 9: <ApiVersionsSpec at remote 0x7f4824e886d8>, 10: <ApiVersionsSpec at remote 0x7f4824e88728>, 11: <ApiVersionsSpec at remote 0x7f4824e88778>, 12: <ApiVersionsSpec at remote 0x7f4824e887c8>, 13: <ApiVersionsSpec at remote 0x7f4824e88818>, 14: <ApiVersionsSpec at remote 0x7f4824e88868>, 15: <ApiVersionsSpec at remote 0x7f4824e888b8>, 16: <ApiVersionsSpec at remote 0x7f4824e88908>}) at remote 0x7f482500f4e0>}, _topics=<TopicDict(_cluster=<weakref at remote 0x7f4824f9c4f8>, _exclude_internal_topics=True) at remote 0x7f4824fcd888>, _source_address='', _source_host='', _source_port=0, _ssl_config=None, _zookeeper_connect=None, _max_connection_retries=3, _max_connection_retries_offset_mgr=8, _broker_version='0.9.0', _api_versions={...}, controller_broker=None) at remote 0x7f4824ed8198>, _protocol_version=0, _topic=<Topic(_name=b'<redacted>', _cluster=<...>, _partitions={0: <Partition(_id=0, _leader=<...>, _replicas=[<...>], _isr=[<...>], _topic=<weakref at remote 0x7f4824e0cdb8>) at remote 0x7f48353ffbe0>}) at remote 0x7f4824deedd8>, _partitioner=<RandomPartitioner(idx=0) at remote 0x7f48526baba8>, _compression=0, _max_retries=3, _retry_backoff_ms=100, _required_acks=1, _ack_timeout_ms=10000, _max_queued_messages=100000, 
_min_queued_messages=70000, _linger_ms=5000, _queue_empty_timeout_ms=0, _block_on_queue_full=True, _max_request_size=1000012, _synchronous=False, _worker_exception=None, _owned_brokers={0: <OwnedBroker(producer=<weakproxy at remote 0x7f4824e0ce08>, broker=<...>, lock=<_thread.RLock at remote 0x7f4824fd8420>, flush_ready=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7698>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7698>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7698>, _waiters=<collections.deque at remote 0x7f4824ddc118>) at remote 0x7f4824de3320>, _flag=False) at remote 0x7f4824de33c8>, has_message=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7710>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7710>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7710>, _waiters=<collections.deque at remote 0x7f4824ddc0b0>) at remote 0x7f4824de3240>, _flag=True) at remote 0x7f4824de3390>, slot_available=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7788>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7788>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7788>, _waiters=<collections.deque at remote 0x7f4824ddc180>) at remote 0x7f4824de3208>, _flag=True) at remote 0x7f4824de3278>, queue=<collections.deque at remote 0x7f4824ddc1e8>, messages_pending=2, running=False, _auto_start=True, _queue_reader_worker=<Thread(_target=<function at remote 0x7f4824ef4ae8>, _name='3: pykafka.OwnedBroker.queue_reader for broker 0', _args=(...), _kwargs={}, _daemonic=True, _ident=139948465846016, _tstate_lock=<_thread.lock at remote 0x7f4824de78a0>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7670>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7670>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7670>, _waiters=<collections.deque at remote 0x7f4824ddc2b8>) at remote 0x7f4824de3128>, _flag=True) at remote 0x7f4824de31d0>, _is_stopped=False, _initialized=True, _stderr=<...>) at remote 0x7f4824de3160>) at remote 0x7f4824e81cc0>}, _delivery_reports=<_DeliveryReportNone(queue=None) at remote 0x7f4824e814a8>, _pending_timeout_ms=5000, _auto_start=True, _serializer=None, _running=True, _update_lock=<_thread.lock at remote 0x7f4824de7620>) at remote 0x7f4824fdb208>)
self.stop()
#30 Garbage-collecting
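From the frames above, the hang is triggered when garbage collection finalizes the producer: Producer.__del__ (producer.py line 235) calls self.stop(), which joins the queue_reader thread and blocks in _wait_for_tstate_lock. If the problem is that __del__ runs at an unsafe point (during GC or interpreter shutdown), a cheaper variant of our workaround would be to stop all cached producers once, explicitly, when the tests finish instead of once per message. A minimal sketch, assuming nose/unittest-style teardown hooks; stop_all_producers and the messenger object holding producer_map are hypothetical names, not pykafka API:

import atexit

def stop_all_producers(producer_map):
    # Explicitly stop each cached pykafka producer so that Producer.__del__
    # never has to join its worker threads during GC or interpreter shutdown.
    for topic in list(producer_map):
        producer_map[topic].stop()
        del producer_map[topic]

# Either call it from a module-level teardown in the test file:
#     def tearDownModule():
#         stop_all_producers(messenger.producer_map)
# or register it to run before the interpreter starts shutting down:
#     atexit.register(stop_all_producers, messenger.producer_map)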
Related
Unable to access newly set up EJBCA PKI
I have just finished installing EJBCA Community Edition on top of WildFly. The EJBCA server is a VM in the Azure cloud. Everything went fine during the build: "Build successful" for all 3 deployment steps (ant deployear, ant runinstall, ant deploy-keystore).

Versions: WildFly 18.0, EJBCA 7.4.3.2, Ant 1.10.10, MySQL Ver 15.1 Distrib 10.3.27-MariaDB, JDBC connector mariadb 2.7.3, Debian 10 buster.

However, I am unable to reach the destination https://<public ip address>:8443/ejbca/. Error message: "The connection has timed out. The server at <my public ip #> is taking too long to respond."

So I started checking the open ports. Remote nmap scan from my local VM to the remote EJBCA VM:

nmap -Pn8080,22,8442,8443,9990,3306 52.188.59.103
Host is up (0.037s latency).
Not shown: 995 filtered ports
PORT     STATE SERVICE
21/tcp   open  ftp
80/tcp   open  http
443/tcp  open  https
554/tcp  open  rtsp
1723/tcp open  pptp
Nmap done: 1 IP address (1 host up) scanned in 5.62 seconds

On the EJBCA VM, a local port scan shows that ports 8443 and 8080 are open:

rDNS record for 127.0.0.1: localhost
Not shown: 996 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
3306/tcp open  mysql
8080/tcp open  http-proxy
8443/tcp open  https-alt

Azure connectivity tests from my IP to the EJBCA host are OK for every port tested. However, an online port check (https://portchecker.co/) says ports 8443 and 8442 are closed. So I don't know which test to trust. I tried disabling both my local firewall and my proxy, but it didn't make any difference. I ran tcpdump on the EJBCA server while trying to access the EJBCA URL, but nothing was displayed. What am I missing here? What other tests can I perform?

EDIT: server log (errors and warnings):

Web admin error:
2021-06-14 13:00:07,332 ERROR [org.jboss.as.jsf] (MSC service thread 1-2) WFLYJSF0002: Could not load JSF managed bean class: org.ejbca.ui.web.admin.peerconnector.PeerConnectorsMBean
2021-06-14 13:00:07,433 ERROR [org.jboss.as.jsf] (MSC service thread 1-2) WFLYJSF0002: Could not load JSF managed bean class: org.ejbca.ui.web.admin.peerconnector.PeerConnectorMBean

Deprecated lib:
2021-06-14 13:00:14,598 WARN [org.jboss.weld.Bootstrap] (MSC service thread 1-4) WELD-000146: BeforeBeanDiscovery.addAnnotatedType(AnnotatedType<?>) used for class com.sun.faces.flow.FlowDiscoveryCDIHelper is deprecated from CDI 1.1!

Severe errors:
2021-06-14 13:00:15,967 SEVERE [javax.enterprise.resource.webcontainer.jsf.flow] (MSC service thread 1-4) Unable to obtain CDI 1.1 utilities for Mojarra
2021-06-14 13:00:15,971 SEVERE [javax.enterprise.resource.webcontainer.jsf.application.view] (MSC service thread 1-4) Unable to obtain CDI 1.1 utilities for Mojarra

Warnings:
2021-06-14 13:00:16,770 INFO [org.ejbca.core.ejb.StartupSingletonBean] (ServerService Thread Pool -- 94) Init, EJBCA 7.4.3.2 Community (67479006a69140e81d66e39871bed8255362effc) startup.
2021-06-14 13:00:16,780 WARN [io.undertow.servlet] (ServerService Thread Pool -- 66) UT015020: Path /* is secured for some HTTP methods, however it is not secured for [HEAD, POST, GET]
2021-06-14 13:00:16,780 WARN [io.undertow.servlet] (ServerService Thread Pool -- 73) UT015020: Path /* is secured for some HTTP methods [...]
During startup WildFly should log something like the following, so you can verify that WildFly is configured to listen on those ports for all IPs:

16:58:12,890 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0006: Undertow HTTPS listener httpspriv listening on 0.0.0.0:8443
16:58:12,920 INFO [org.wildfly.extension.undertow] (MSC service thread 1-8) WFLYUT0006: Undertow HTTPS listener httpspub listening on 0.0.0.0:8442

You can also try connecting to port 8442, to check that the problem is not simply that you don't have the client certificate in your browser.
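If the listeners are bound correctly, a quick way to tell a WildFly bind problem apart from an Azure NSG/firewall rule is to probe the ports from both inside and outside the VM. A minimal sketch using Python's standard socket module (the IP and ports are taken from the scans above; run it once on the VM against 127.0.0.1 and once from your local machine against the public IP):

import socket

def port_open(host, port, timeout=3.0):
    # Try a plain TCP connect; True means something is listening and reachable.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port in (8080, 8442, 8443):
    print(port, port_open("52.188.59.103", port))

If this prints True locally but False remotely, the block is in front of the VM (NSG or an intermediate firewall), not in WildFly.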
Getting this error: "Status: Failure - Test failed: IO Error: The Network Adapter could not establish the connection"
I am new to Oracle. I installed Oracle SQL Developer, but each time I try to connect I get the error: "Status: Failure - Test failed: IO Error: The Network Adapter could not establish the connection", and the OracleTNSListener service turns off on its own. Each time I start the service, it immediately stops again.
Getting an error while redeploying nodes
I am still using Corda 1.0. When I try to redeploy nodes with existing data, I get the error below during start-up, but I am still able to access the nodes. If I clear the data and redeploy the nodes, I don't see the error.

Logs can be found in: C:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kotlin-source\build\nodes\xxxxxxxx\logs
Database connection url is: jdbc:h2:tcp://xxxxxxxxx/node

E 18:38:46+0530 [main] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]

Incoming connection address: xxxxxxxxxxxx
Listening on port: 10014
RPC service listening on port: 10015
Loaded CorDapps: corda-finance-1.0.0, kotlin-source-0.1, corda-core-1.0.0
Node for "xxxxxxxxxxx" started up and registered in 213.08 sec

Welcome to the Corda interactive shell.
Useful commands include 'help' to see what is available, and 'bye' to shut down the node.

Wed May 23 18:39:20 IST 2018>>> E 18:39:24+0530 [Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3#4a532271)] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
This looks like Artemis failed to connect to the node, which means the node failed to start. You should look at the logs and check whether a previously started Corda node is still occupying the node's ports. If there are any legacy Corda nodes that have not been killed, try ps -ef | grep java to see if any other Java process is still alive; in particular, look at the port numbers and check whether they overlap.
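As a quick cross-check before redeploying, you can also test whether the node's ports are still held by a leftover process. A minimal sketch in Python (the port numbers are taken from the log above; port_in_use is a hypothetical helper, not Corda tooling):

import socket

def port_in_use(port, host="127.0.0.1"):
    # connect_ex returns 0 when something is already accepting connections.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2.0)
        return s.connect_ex((host, port)) == 0

for port in (10014, 10015):
    if port_in_use(port):
        print("port", port, "is already in use - a previous node may still be running")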
openstack: Failed to launch instance from glance
We have set up OpenStack using conjure-up on a single machine (Ubuntu LTS server 16.04.3). All services are up and running, and I am able to upload images to glance successfully. We wanted to store the glance images created by "glance image-create" on a remote machine that runs an NFS server, so we configured glance-api.conf as below:

[glance_store]
filesystem_store_datadir = /var/lib/glance/images
default_store = file

On the glance controller node, I mounted the remote machine's /home/glance/images/ at /var/lib/glance/images and pointed glance-api.conf at that mounted directory. I created two sample private networks (192.168.1.0 and 10.221.50.0) but have not created a public network, as at this moment I don't want to access this VM instance from outside.

When I try to launch the instance, from the dashboard UI as well as through the CLI, I get the error below:

Error: Failed to perform requested operation on instance "Ubuntu_Hawkbit", the instance has an error status: Please try again later [Error: No valid host was found. There are not enough hosts available.].

Note: I have tried associating the instance with each of the private networks, thinking that it may be a network IP address issue, but I get the same error. When I check /var/log/nova/nova-compute.log, I see the errors below:

ERROR nova.image.glance [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] Error contacting glance server 'http://10.206.193.159:9292' for 'data', done trying.
ERROR nova.image.glance CommunicationError: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova_lxd.nova.virt.lxd.image [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Failed to upload 6c30e2ab-1078-45ad-bed2-3e3a75f6af8c to LXD: Connection to glance
ERROR nova_lxd.nova.virt.lxd.operations [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Faild to start container instance-00000020: Connection to glance host http://10.206.193.159:9292 failed: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova.compute.manager [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Instance failed to spawn
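Since nova-compute reports it cannot even get a status line from glance, it can help to reproduce the failure outside of nova by probing the exact endpoint from the compute host. A minimal sketch with Python's urllib (the URL is copied from the log above; an unauthenticated request may legitimately return an HTTP error, but a reset connection or hang would point at glance-api or its NFS-backed store):

import urllib.error
import urllib.request

url = "http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c"
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print("glance answered:", resp.status)
except urllib.error.HTTPError as err:
    # Any HTTP status (even 401/404) means glance-api is at least responding.
    print("glance answered with HTTP", err.code)
except Exception as exc:
    # A connection reset here reproduces the BadStatusLine seen by nova-compute.
    print("glance endpoint not responding:", exc)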
Nodes not starting. Jar manifest missing
I am not able to start nodes on a Linux server. I am getting the following (edited) output:

[user#host nodes]$ ./runnodes
which: no osascript in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/home/user/.local/bin:/home/user/bin)
Starting nodes in /opt/nodes
Starting corda.jar in /opt/nodes/NodeA on debug port 5005
Starting corda-webserver.jar in /opt/nodes/Agent on debug port 5006
Starting corda.jar in /opt/nodes/NodeB on debug port 5007
Starting corda-webserver.jar in /opt/nodes/NodeB on debug port 5008
Starting corda.jar in /opt/nodes/NodeC on debug port 5009
Starting corda-webserver.jar in /opt/nodes/NodeC on debug port 5010
Starting corda.jar in /opt/nodes/NodeZ on debug port 5011
Starting corda-webserver.jar in /opt/nodes/NodeZ on debug port 5012
Started 8 processes
Finished starting nodes
[user#host nodes]$ Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Listening for transport dt_socket at address: 5012
Unknown command line arguments: no-local-shell is not a recognized option
Listening for transport dt_socket at address: 5011
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user/.capsule/apps/net.corda.node.Corda_0.12.1/log4j-slf4j-impl-2.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/nodes/NodeZ/dependencies/log4j-slf4j-impl-2.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

What am I missing to start these nodes?
Did you build the nodes on a Mac, then transfer them to Linux? If so, try building the nodes directly on the Linux machine.
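The error itself points at a corrupt or missing capsule jar. Since a jar is just a zip archive, a quick sanity check on the Linux box can confirm whether the transferred file is readable at all; a minimal sketch (the path is copied from the error output above):

import os
import zipfile

path = "/home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar"
if not os.path.exists(path):
    # Missing file: capsule never extracted it, or the transfer lost it.
    print("jar is missing:", path)
elif zipfile.is_zipfile(path):
    print("jar looks intact:", path)
else:
    # Present but not a valid zip: likely corrupted during the copy between machines.
    print("jar is corrupt:", path)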