pykafka deadlock during nosetests integration tests on Python 3.6

We are having issues with pykafka 2.7.0 on Python 3.6. We have some integration tests, and for some reason a deadlock forms around the topics we produce to: nosetests refuses to exit after all tests have finished. This does not happen when we run the same tests on Python 2.7, and trying pykafka 2.8.0 on Python 3.6 did not help either.
The only way we got it fixed is a temporary workaround of stopping and deleting the producer right after each message is produced to the topic (see the last two lines of the code below), which costs a lot of time just to stop the producer.
def publish(self, topic, message):
    topic = topic.lower()
    self._log.info('publish to topic ' + topic)
    # Create a producer for this topic on first use and cache it.
    if topic not in self.producer_map:
        k_topic = self.__messenger.topics[topic.encode()]
        self._log.info(k_topic)
        new_producer = k_topic.get_producer()
        self.producer_map[topic] = new_producer
    self.producer_map[topic].produce(message)
    # The fix: stop and discard the producer here so its worker
    # threads are joined now rather than during garbage collection.
    self.producer_map[topic].stop()
    del self.producer_map[topic]
With the last two lines removed, gdb shows the process stuck right after garbage collection starts, while trying to stop the producer; it hangs in _wait_for_tstate_lock. This is preventing our CI in Jenkins from finishing, and we would like to know why it deadlocks when the producer is stopped during garbage collection but not when stop() is called explicitly in code.
#9 Frame 0x7f4824fe7958, for file /usr/lib64/python3.6/threading.py, line 1072, in _wait_for_tstate_lock (self=<Thread(_target=<function at remote 0x7f4824ef4ae8>, _name='3: pykafka.OwnedBroker.queue_reader for broker 0', _args=(), _kwargs={}, _daemonic=True, _ident=139948465846016, _tstate_lock=<_thread.lock at remote 0x7f4824de78a0>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7670>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7670>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7670>, _waiters=<collections.deque at remote 0x7f4824ddc2b8>) at remote 0x7f4824de3128>, _flag=True) at remote 0x7f4824de31d0>, _is_stopped=False, _initialized=True, _stderr=<redacted>, <redacted>, <_io.TextIOWrapper at remote 0x7f48549b1708>)) at remote 0x7f48275474a8>)) at remote 0x7f4827547518>)) at remote 0x7f48275476d8>) at remote 0x7f4824de3160>, block=True, timeout=-1, lock=<_thread.lock at remote 0x7f4824de78a0>)
elif lock.acquire(block, timeout):
#13 Frame 0x7f4824e045a0, for file /usr/lib64/python3.6/threading.py, line 1056, in join (self=<Thread(_target=<function at remote 0x7f4824ef4ae8>, _name='3: pykafka.OwnedBroker.queue_reader for broker 0', _args=(), _kwargs={}, _daemonic=True, _ident=139948465846016, _tstate_lock=<_thread.lock at remote 0x7f4824de78a0>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7670>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7670>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7670>, _waiters=<collections.deque at remote 0x7f4824ddc2b8>) at remote 0x7f4824de3128>, _flag=True) at remote 0x7f4824de31d0>, _is_stopped=False, _initialized=True, _stderr=<redacted>, <redacted>, <redacted>, <redacted>)) at remote 0x7f4827547518>)) at remote 0x7f48275476d8>) at remote 0x7f4824de3160>, timeout=None)
self._wait_for_tstate_lock()
#17 (frame information optimized out)
#21 Frame 0x7f484d8bee08, for file /usr/local/lib/python3.6/site-packages/pykafka/producer.py, line 235, in __del__ (self=<Producer(_cluster=<Cluster(_seed_hosts='<redacted>:9092', _socket_timeout_ms=30000, _offsets_channel_socket_timeout_ms=10000, _handler=<ThreadingHandler at remote 0x7f4824e81da0>, _brokers={0: <Broker(_connection=<BrokerConnection(_buff=<bytearray at remote 0x7f4824fdb228>, host=b'<redacted>', port=9092, _handler=<...>, _socket=<socket at remote 0x7f4824dff048>, source_host='', source_port=0, _wrap_socket=<function at remote 0x7f4824dfba60>) at remote 0x7f4824fd5898>, _offsets_channel_connection=None, _id=0, _host=b'<redacted>', _port=9092, _source_host='', _source_port=0, _ssl_config=None, _handler=<...>, _req_handler=<RequestHandler(handler=<...>, shared=<Shared at remote 0x7f4824dd0048>, t=<Thread(_target=<function at remote 0x7f4824dfbae8>, _name="1: pykafka.RequestHandler.worker for b'<redacted>':9092", _args=(), _kwargs={}, _daemonic=True, _ident=139948381951744, _tstate_lock=<_thread.lock at remote 0x7f4824de7120>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de71e8>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de71e8>, release=<built-in method release of _thread.lock object at remote 0x7f4824de71e8>, _waiters=<collections.deque at remote 0x7f4824e079a0>) at remote 0x7f4835420198>, _flag=True) at remote 0x7f4835424358>, _is_stopped=False, _initialized=True, _stderr=<redacted>, <redacted>, <redacted>, <_io.TextIOWrapper at remote 0x7f48549b1708>)) at remote 0x7f48275474a8>)) at remote 0x7f4827547518>)) at remote 0x7f48275476d8>) at remote 0x7f4835424940>) at remote 0x7f4824f62438>, _offsets_channel_req_handler=None, _socket_timeout_ms=30000, _offsets_channel_socket_timeout_ms=10000, _buffer_size=1048576, _req_handlers={}, _broker_version='0.9.0', _api_versions={0: <ApiVersionsSpec at remote 0x7f4824e88408>, 1: <ApiVersionsSpec at remote 0x7f4824e88458>, 2: <ApiVersionsSpec at remote 0x7f4824e884a8>, 3: <ApiVersionsSpec at remote 0x7f4824e884f8>, 4: <ApiVersionsSpec at remote 0x7f4824e88548>, 5: <ApiVersionsSpec at remote 0x7f4824e88598>, 6: <ApiVersionsSpec at remote 0x7f4824e885e8>, 7: <ApiVersionsSpec at remote 0x7f4824e88638>, 8: <ApiVersionsSpec at remote 0x7f4824e88688>, 9: <ApiVersionsSpec at remote 0x7f4824e886d8>, 10: <ApiVersionsSpec at remote 0x7f4824e88728>, 11: <ApiVersionsSpec at remote 0x7f4824e88778>, 12: <ApiVersionsSpec at remote 0x7f4824e887c8>, 13: <ApiVersionsSpec at remote 0x7f4824e88818>, 14: <ApiVersionsSpec at remote 0x7f4824e88868>, 15: <ApiVersionsSpec at remote 0x7f4824e888b8>, 16: <ApiVersionsSpec at remote 0x7f4824e88908>}) at remote 0x7f482500f4e0>}, _topics=<TopicDict(_cluster=<weakref at remote 0x7f4824f9c4f8>, _exclude_internal_topics=True) at remote 0x7f4824fcd888>, _source_address='', _source_host='', _source_port=0, _ssl_config=None, _zookeeper_connect=None, _max_connection_retries=3, _max_connection_retries_offset_mgr=8, _broker_version='0.9.0', _api_versions={...}, controller_broker=None) at remote 0x7f4824ed8198>, _protocol_version=0, _topic=<Topic(_name=b'<redacted>', _cluster=<...>, _partitions={0: <Partition(_id=0, _leader=<...>, _replicas=[<...>], _isr=[<...>], _topic=<weakref at remote 0x7f4824e0cdb8>) at remote 0x7f48353ffbe0>}) at remote 0x7f4824deedd8>, _partitioner=<RandomPartitioner(idx=0) at remote 0x7f48526baba8>, _compression=0, _max_retries=3, _retry_backoff_ms=100, _required_acks=1, _ack_timeout_ms=10000, _max_queued_messages=100000, 
_min_queued_messages=70000, _linger_ms=5000, _queue_empty_timeout_ms=0, _block_on_queue_full=True, _max_request_size=1000012, _synchronous=False, _worker_exception=None, _owned_brokers={0: <OwnedBroker(producer=<weakproxy at remote 0x7f4824e0ce08>, broker=<...>, lock=<_thread.RLock at remote 0x7f4824fd8420>, flush_ready=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7698>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7698>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7698>, _waiters=<collections.deque at remote 0x7f4824ddc118>) at remote 0x7f4824de3320>, _flag=False) at remote 0x7f4824de33c8>, has_message=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7710>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7710>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7710>, _waiters=<collections.deque at remote 0x7f4824ddc0b0>) at remote 0x7f4824de3240>, _flag=True) at remote 0x7f4824de3390>, slot_available=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7788>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7788>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7788>, _waiters=<collections.deque at remote 0x7f4824ddc180>) at remote 0x7f4824de3208>, _flag=True) at remote 0x7f4824de3278>, queue=<collections.deque at remote 0x7f4824ddc1e8>, messages_pending=2, running=False, _auto_start=True, _queue_reader_worker=<Thread(_target=<function at remote 0x7f4824ef4ae8>, _name='3: pykafka.OwnedBroker.queue_reader for broker 0', _args=(...), _kwargs={}, _daemonic=True, _ident=139948465846016, _tstate_lock=<_thread.lock at remote 0x7f4824de78a0>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f4824de7670>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f4824de7670>, release=<built-in method release of _thread.lock object at remote 0x7f4824de7670>, _waiters=<collections.deque at remote 0x7f4824ddc2b8>) at remote 0x7f4824de3128>, _flag=True) at remote 0x7f4824de31d0>, _is_stopped=False, _initialized=True, _stderr=<...>) at remote 0x7f4824de3160>) at remote 0x7f4824e81cc0>}, _delivery_reports=<_DeliveryReportNone(queue=None) at remote 0x7f4824e814a8>, _pending_timeout_ms=5000, _auto_start=True, _serializer=None, _running=True, _update_lock=<_thread.lock at remote 0x7f4824de7620>) at remote 0x7f4824fdb208>)
self.stop()
#30 Garbage-collecting
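For what it's worth, the trace shows why the per-message stop() works: Producer.__del__ calls stop(), which join()s the producer's worker threads, and when that join runs while the interpreter is tearing down during garbage collection it can block forever. One workaround that avoids paying the stop() cost on every publish is to keep the cached producers alive and stop them all explicitly before shutdown, e.g. from an atexit hook. A minimal sketch, reusing the producer_map structure from the snippet above (the Publisher class and the atexit hook are illustrative, not part of pykafka's API):

import atexit

class Publisher:
    def __init__(self, messenger, log):
        self.__messenger = messenger
        self._log = log
        self.producer_map = {}
        # Stop producers while the interpreter is still fully alive,
        # instead of relying on Producer.__del__ during GC.
        atexit.register(self.close)

    def publish(self, topic, message):
        topic = topic.lower()
        if topic not in self.producer_map:
            k_topic = self.__messenger.topics[topic.encode()]
            self.producer_map[topic] = k_topic.get_producer()
        self.producer_map[topic].produce(message)

    def close(self):
        for producer in self.producer_map.values():
            producer.stop()  # joins the producer's worker threads
        self.producer_map.clear()

In a test suite, calling close() from a package-level teardown would achieve the same thing without the atexit hook.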

Related

Access impossible to newly setup EJBCA PKI

I have just finished installing EJBCA Community Edition on top of WildFly.
The EJBCA server is a VM in the Azure cloud.
Everything went fine during the build: it reported success for all 3 deployment steps:
- ant deployear
- ant runinstall
- ant deploy-keystore
Versions:
Wildfly 18.0
EJBCA 7.4.3.2
Ant 1.10.10
MySQL Ver 15.1 Distrib 10.3.27-MariaDB
JDBC connector: mariadb 2.7.3
Debian 10 buster
However, I am unable to reach the destination
https://<public ip address>:8443/ejbca/
Error message:
The connection has timed out
The server at <my public ip #> is taking too long to respond.
So I started checking which ports are open.
Remote nmap scan from my local VM to the remote EJBCA VM:
nmap -Pn8080,22,8442,8443,9990,3306 52.188.59.103
Host is up (0.037s latency).
Not shown: 995 filtered ports
PORT STATE SERVICE
21/tcp open ftp
80/tcp open http
443/tcp open https
554/tcp open rtsp
1723/tcp open pptp
Nmap done: 1 IP address (1 host up) scanned in 5.62 seconds
A local port scan on the EJBCA VM shows that ports 8443 and 8080 are open:
rDNS record for 127.0.0.1: localhost
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
3306/tcp open mysql
8080/tcp open http-proxy
8443/tcp open https-alt
Azure connectivity tests from my IP to the EJBCA host pass for every port tested.
However, an online port checker (https://portchecker.co/) says ports 8443 and 8442 are closed.
So I don't know which test to trust.
I tried disabling both my local firewall and my proxy, but it didn't make any difference.
I ran tcpdump on the EJBCA server while trying to access the EJBCA URL, but nothing was captured.
What am I missing here?
What other tests can I perform?
EDIT:
Server log (errors and warnings):
Web admin errors:
2021-06-14 13:00:07,332 ERROR [org.jboss.as.jsf] (MSC service thread 1-2) WFLYJSF0002: Could not load JSF managed bean class: org.ejbca.ui.web.admin.peerconnector.PeerConnectorsMBean
2021-06-14 13:00:07,433 ERROR [org.jboss.as.jsf] (MSC service thread 1-2) WFLYJSF0002: Could not load JSF managed bean class: org.ejbca.ui.web.admin.peerconnector.PeerConnectorMBean
Deprecated lib:
2021-06-14 13:00:14,598 WARN [org.jboss.weld.Bootstrap] (MSC service thread 1-4) WELD-000146: BeforeBeanDiscovery.addAnnotatedType(AnnotatedType<?>) used for class com.sun.faces.flow.FlowDiscoveryCDIHelper is deprecated from CDI 1.1!
Severe errors:
2021-06-14 13:00:15,967 SEVERE [javax.enterprise.resource.webcontainer.jsf.flow] (MSC service thread 1-4) Unable to obtain CDI 1.1 utilities for Mojarra
2021-06-14 13:00:15,971 SEVERE [javax.enterprise.resource.webcontainer.jsf.application.view] (MSC service thread 1-4) Unable to obtain CDI 1.1 utilities for Mojarra
Warnings:
2021-06-14 13:00:16,770 INFO [org.ejbca.core.ejb.StartupSingletonBean] (ServerService Thread Pool -- 94) Init, EJBCA 7.4.3.2 Community (67479006a69140e81d66e39871bed8255362effc) startup.
2021-06-14 13:00:16,780 WARN [io.undertow.servlet] (ServerService Thread Pool -- 66) UT015020: Path /* is secured for some HTTP methods, however it is not secured for [HEAD, POST, GET]
2021-06-14 13:00:16,780 WARN [io.undertow.servlet] (ServerService Thread Pool -- 73) UT015020: Path /* is secured for some HTTP methods [...]
During startup, WildFly should log something like the following, so you can verify that it is configured to listen on those ports on all interfaces:
16:58:12,890 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0006: Undertow HTTPS listener httpspriv listening on 0.0.0.0:8443
16:58:12,920 INFO [org.wildfly.extension.undertow] (MSC service thread 1-8) WFLYUT0006: Undertow HTTPS listener httpspub listening on 0.0.0.0:8442
You can also try connecting to port 8442, to rule out the possibility that the problem is just a missing client certificate in your browser.
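As another quick check, a small socket probe run once from the EJBCA VM itself and once from an outside machine can separate the two failure modes: if 8443 connects locally but times out externally, the blocker is almost certainly the Azure network security group (or another firewall in front of the VM) rather than WildFly or EJBCA. A minimal sketch in Python, using the public IP from the question:

import socket

HOST = "52.188.59.103"  # public IP of the EJBCA VM (from the question)
for port in (8080, 8442, 8443):
    try:
        # create_connection performs a full TCP handshake, like a browser would.
        with socket.create_connection((HOST, port), timeout=5):
            print(port, "open")
    except OSError as exc:
        print(port, "unreachable:", exc)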

Getting the error "Status: Failure - Test failed: IO Error: The Network Adapter could not establish the connection"

I am new to Oracle. I installed Oracle SQL Developer, but each time I try to connect I get the error:
Status: Failure - Test failed: IO Error: The Network Adapter could not establish the connection
and the OracleTNSListener service turns off on its own. Each time I start the service, it immediately stops again.

Getting error while redeploying nodes

I am still using Corda 1.0. When I try to redeploy nodes with existing data, I get the error below during start-up, but I am still able to access the nodes. If I clear the data and redeploy the nodes, I don't see these error messages.
Logs can be found in:
C:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kotlin-source\build\nodes\xxxxxxxx\logs
Database connection url is: jdbc:h2:tcp://xxxxxxxxx/node
E 18:38:46+0530 [main] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
Incoming connection address : xxxxxxxxxxxx
Listening on port : 10014
RPC service listening on port : 10015
Loaded CorDapps : corda-finance-1.0.0, kotlin-source-0.1, corda-core-1.0.0
Node for "xxxxxxxxxxx" started up and registered in 213.08 sec
Welcome to the Corda interactive shell.
Useful commands include 'help' to see what is available, and 'bye' to shut
down the node.
Wed May 23 18:39:20 IST 2018>>> E 18:39:24+0530 [Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3#4a532271)] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
This looks like Artemis failed to connect to the node, which means the node failed to start.
You should look at the log and check whether a previously started Corda node is still running and occupying the node's ports.
If there are any legacy Corda nodes that have not been killed, try ps -ef | grep java to see whether any other Java processes are still alive. In particular, look at the port numbers and check whether they overlap.
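A quick way to check whether the node's ports are already taken by a leftover process is to try binding them, as in this minimal sketch (the ports come from the start-up output above):

import socket

for port in (10014, 10015):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind() fails if something (e.g. a leftover node or its
        # Artemis broker) is already listening on the port.
        sock.bind(("0.0.0.0", port))
        print(port, "free")
    except OSError:
        print(port, "already in use")
    finally:
        sock.close()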

OpenStack: Failed to launch instance from Glance

We have set up OpenStack using conjure-up on a single machine (Ubuntu LTS server 16.04.3). All services are up and running, and I am able to upload images to Glance successfully.
We want to store the Glance images created by "glance image-create" on a remote machine that runs an NFS server, so we configured the glance-api.conf file as below.
My glance-api.conf looks like this:
[glance_store]
filesystem_store_datadir = /var/lib/glance/images
default_store = file
On the Glance controller node, I have mounted the remote machine's export (its IP:/home/glance/images/) at /var/lib/glance/images, and referenced that same mount path in the glance-api.conf file.
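Assuming that setup, it is worth verifying on the controller node that the NFS mount is actually in place and writable before Glance or Nova ever touch it. A minimal sketch (the path is the filesystem_store_datadir from the config above; run it as the user the glance service runs under, commonly "glance", which is an assumption about your packaging):

import os
import tempfile

path = "/var/lib/glance/images"  # filesystem_store_datadir from glance-api.conf
print("is a mount point:", os.path.ismount(path))
try:
    # Creating (and auto-deleting) a temp file proves the export is writable.
    with tempfile.NamedTemporaryFile(dir=path):
        pass
    print("writable: yes")
except OSError as exc:
    print("writable: no -", exc)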
I have created two sample private networks (192.168.1.0 and 10.221.50.0) but no public network, since at this moment I don't want to access the VM instance from outside.
When I try to launch the instance from the dashboard UI as well as through the CLI, I get the error below.
Error: Failed to perform requested operation on instance "Ubuntu_Hawkbit", the instance has an error status: Please try again later [Error: No valid host was found. There are not enough hosts available.].
Note: I have tried associating the instance with a different private network, thinking it might be a network IP address issue, but I get the same error.
When I check the /var/log/nova/nova-compute.log logs, I see the errors below.
ERROR nova.image.glance [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] Error contacting glance server 'http://10.206.193.159:9292' for 'data', done trying.
ERROR nova.image.glance CommunicationError: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova_lxd.nova.virt.lxd.image [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Failed to upload 6c30e2ab-1078-45ad-bed2-3e3a75f6af8c to LXD: Connection to glance
ERROR nova_lxd.nova.virt.lxd.operations [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Failed to start container instance-00000020: Connection to glance host http://10.206.193.159:9292 failed: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova.compute.manager [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Instance failed to spawn
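Since the BadStatusLine error means the compute node reached 10.206.193.159:9292 but did not get a valid HTTP response back, a direct probe of the Glance endpoint from the compute node can narrow this down. A minimal sketch (host and port come from the log above; /versions is Glance's version-listing path, which normally needs no token):

import http.client

conn = http.client.HTTPConnection("10.206.193.159", 9292, timeout=5)
try:
    # A healthy glance-api answers this without authentication.
    conn.request("GET", "/versions")
    resp = conn.getresponse()
    print("HTTP", resp.status, resp.reason)
except Exception as exc:
    # A reset or garbage status line here points at glance-api itself
    # (crashing workers, wrong service on the port), not at Nova.
    print("request failed:", exc)
finally:
    conn.close()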

Nodes not starting. Jar manifest missing

I am not able to start nodes on a Linux server.
I am getting the following (edited) output:
[user#host nodes]$ ./runnodes
which: no osascript in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/home/user/.local/bin:/home/user/bin)
Starting nodes in /opt/nodes
Starting corda.jar in /opt/nodes/NodeA on debug port 5005
Starting corda-webserver.jar in /opt/nodes/Agent on debug port 5006
Starting corda.jar in /opt/nodes/NodeB on debug port 5007
Starting corda-webserver.jar in /opt/nodes/NodeB on debug port 5008
Starting corda.jar in /opt/nodes/NodeC on debug port 5009
Starting corda-webserver.jar in /opt/nodes/NodeC on debug port 5010
Starting corda.jar in /opt/nodes/NodeZ on debug port 5011
Starting corda-webserver.jar in /opt/nodes/NodeZ on debug port 5012
Started 8 processes
Finished starting nodes
[user#host nodes]$ Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Error opening zip file or JAR manifest missing : /home/user/.capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Listening for transport dt_socket at address: 5012
Unknown command line arguments: no-local-shell is not a recognized option
Listening for transport dt_socket at address: 5011
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user/.capsule/apps/net.corda.node.Corda_0.12.1/log4j-slf4j-impl-2.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/nodes/NodeZ/dependencies/log4j-slf4j-impl-2.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
What am I missing to start these nodes?
Did you build the nodes on Mac, then transfer them to Linux? If so, try building the nodes directly on the Linux machine.
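The errors also point at the quasar-core agent jars under ~/.capsule being absent or unreadable, which fits a cache that was built on one machine and copied to another. A small check of those cached jars, as a sketch (paths taken verbatim from the output above):

import zipfile
from pathlib import Path

jars = [
    Path.home() / ".capsule/apps/net.corda.node.Corda_0.12.1/quasar-core-0.7.6-jdk8.jar",
    Path.home() / ".capsule/apps/net.corda.webserver.WebServer_0.12.1/quasar-core-0.7.6-jdk8.jar",
]
for jar in jars:
    if not jar.exists():
        print("missing:", jar)
    elif not zipfile.is_zipfile(jar):  # a jar is just a zip with a manifest
        print("corrupt:", jar)
    else:
        print("OK:", jar)

If either jar is missing or not a valid zip, deleting the ~/.capsule directory and restarting the nodes should let Capsule re-extract a clean copy.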
