How to deploy a text-index listener on NebulaGraph Database? - nebula-graph

Here are the steps (and the problem):
1. Stop NebulaGraph on 172.16.0.17 and kill -9 the listener process:
sudo /usr/local/nebula/scripts/nebula.service stop all
2. Restart the service:
sudo /usr/local/nebula/scripts/nebula.service start all
3. Start the listener:
./bin/nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged-listener.conf
4. On the 172.16.0.20 NebulaGraph, create a new space, USE it, and add the listener:
ADD LISTENER ELASTICSEARCH 172.16.0.17:9789
5. SHOW LISTENER. Here is the problem: the listener is offline.

The main reason is that one step is missing:
We must sign in to the text service before adding the listener, i.e. before step 4. Otherwise, an error occurs:
Running on machine: k3s01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1130 19:25:45.874213 20331 MetaClient.cpp:636] Send request to "172.16.0.17":9559, exceed retry limit
E1130 19:25:45.874900 20339 MetaClient.cpp:139] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
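For reference, the missing step is the nGQL sign-in statement, run in the nebula console before ADD LISTENER. A rough sketch (the Elasticsearch address 172.16.0.17:9200 is an assumption; use your actual Elasticsearch endpoint):
# run on the 172.16.0.20 graph service, before step 4
SIGN IN TEXT SERVICE (172.16.0.17:9200);
SHOW TEXT SEARCH CLIENTS;  # verify the text service is registered
# then add the listener and check its status again
ADD LISTENER ELASTICSEARCH 172.16.0.17:9789;
SHOW LISTENER;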

Related

Airflow Papermill operator: task externally skipped after 60 minutes

I am using Airflow in a Docker container. I run a DAG with multiple Jupyter notebooks. I get the following error every time after 60 minutes:
[2021-08-22 09:15:15,650] {local_task_job.py:198} WARNING - State of this instance has been externally set to skipped. Terminating instance.
[2021-08-22 09:15:15,654] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 277
[2021-08-22 09:15:15,655] {taskinstance.py:1284} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-08-22 09:15:18,284] {taskinstance.py:1501} ERROR - Task failed with exception
I tried to tweak the config file but could not find the right option to remove the one-hour timeout.
Any help would be appreciated.
The default is no timeout. If your DAG defines dagrun_timeout=timedelta(minutes=60) and the run exceeds 60 minutes, the active task is stopped and the message "State of this instance has been externally set to skipped" is logged.
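A minimal sketch of where that argument sits in the DAG definition (the DAG id and schedule below are placeholders; raise dagrun_timeout or set it to None to remove the 60-minute limit):
from datetime import datetime, timedelta
from airflow import DAG

# placeholder DAG; only dagrun_timeout matters for the 60-minute cut-off
dag = DAG(
    dag_id="papermill_notebooks",        # hypothetical name
    start_date=datetime(2021, 8, 1),
    schedule_interval="@daily",
    dagrun_timeout=timedelta(hours=4),   # was timedelta(minutes=60); None disables the timeout
)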

Rabbitmq: Node down

I am getting a node-down error on RabbitMQ; it happens intermittently.
I see the error below when I run sudo rabbitmqctl status or sudo rabbitmqctl list_queues:
Error: unable to connect to node : nodedown
connected to epmd (port 4369) on host-name
epmd reports node 'rabbit' running on port 25672
can't establish TCP connection, reason: timeout
suggestion: blocked by firewall?
version: {rabbit,"RabbitMQ","3.6.9"}
os: Ubuntu 16.04
I have checked the hostname, which is fine and has not changed since the installation.
I am also able to telnet localhost 25672.
What could be the reason behind this error, and what is a possible solution?
And one more question: I am checking the node status using the API below.
curl -s http://edx:edx@127.0.0.1:15672/api/healthchecks/node/
Is this API suitable for checking the health status of the node? Please suggest if there is anything else. I have set up a shell script that calls this API and restarts the rabbitmq-server service if the status is not ok; the script runs from cron every minute.
Looks like your rabbitmq node is... down. rabbitmqctl needs a running node to perform these commands.
If you're using systemd, you can check the service status:
service rabbitmq-server status
Or just try to restart the node:
rabbitmqctl start_app
A successful telnet on port 25672 only tells you that the Erlang distribution port is open; RabbitMQ itself does not serve clients on that port (by default, it listens on 5672).
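As for the cron health check described in the question, here is a minimal sketch (the edx:edx credentials and the local management port come from the question; be aware that blindly restarting on every failed probe can hide real problems):
#!/bin/sh
# probe the management API health endpoint and restart the broker if it is not ok
STATUS=$(curl -s -u edx:edx http://127.0.0.1:15672/api/healthchecks/node/ | grep -c '"status":"ok"')
if [ "$STATUS" -eq 0 ]; then
    service rabbitmq-server restart
fi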

Getting error while redeploying nodes

I am still using Corda 1.0. When I try to redeploy nodes with existing data, I get the error below during start-up, although I am still able to access the nodes. If I clear the data and redeploy the nodes, I do not see this error message.
Logs can be found in :
C:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kotlin-source\build\nodes\xxxxxxxx\logs
Database connection url is : jdbc:h2:tcp://xxxxxxxxx/node
E 18:38:46+0530 [main] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
Incoming connection address : xxxxxxxxxxxx
Listening on port : 10014
RPC service listening on port : 10015
Loaded CorDapps : corda-finance-1.0.0, kotlin-source-0.1, corda-core-1.0.0
Node for "xxxxxxxxxxx" started up and registered in 213.08 sec
Welcome to the Corda interactive shell.
Useful commands include 'help' to see what is available, and 'bye' to shut down the node.
Wed May 23 18:39:20 IST 2018>>> E 18:39:24+0530 [Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3#4a532271)] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
This looks like Artemis failed to connect to the node, which means the node did not start up cleanly.
You should look at the log and check whether a previously started Corda node is still running and occupying the node's ports.
If there are any legacy Corda nodes that have not been killed, try ps -ef | grep java to see whether any other Java process is still alive. In particular, look at the port numbers and check whether they overlap.
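For example, a quick way to spot a leftover node holding the ports from the start-up output above (10014 and 10015 come from that log; on Windows, netstat -ano and Task Manager serve the same purpose):
# look for Java processes left over from a previous deployNodes run
ps -ef | grep [j]ava
# check whether the node's P2P and RPC ports are already taken
netstat -tlnp | grep -E ':10014|:10015'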

ICp 2.1.0.1: Installation failed with error TASK [master: Waiting for MariaDB service to start]

I am installing ICp 2.1.0.1 and I received an error at the TASK
[master: Waiting for MariaDB service to start] msg: The MariaDB component failed to start.
After this message the installation completed with a failed status.
We are installing ICp with 3 Masters, 3 Proxies and 2 Workers. We have 1 IP for the VIP master and 1 for the VIP proxy.
I tried installing multiple times and every installation failed with the same error.
In prior issues with that error, the correct DB admin password was not used, so check the DB user and password to resolve the issue.
Would you validate whether each master host is able to access port 3306 on the other hosts?
If you run with .. install -vv | tee -a install-log.txt, do you get additional details as well?
The error was solved by following the steps below.
Check whether kubelet is running:
Log in to your master node.
Run the following command to check kubelet status:
systemctl status kubelet
If kubelet is not running, run the following command to get the logs:
journalctl -u kubelet &> kubelet.log
We found this error in kubelet.log:
Error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false.
We found this troubleshoot in this link, and the solution at the ICP issue 4651.
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0/troubleshoot/etcd_fails.html
https://github.ibm.com/IBMPrivateCloud/roadmap/issues/4651
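Based on that kubelet error, the usual remedy (an assumption here; the linked pages describe the same swap problem) is to disable swap on the affected node and re-run the installer:
# turn swap off immediately
sudo swapoff -a
# keep it off across reboots by commenting out swap entries in /etc/fstab
sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab
# restart kubelet and check that it stays up
sudo systemctl restart kubelet
systemctl status kubelet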

Autosys jobs hung

We have jobs getting stuck on the AutoSys R11 screen because the application server is down.
Is there any way to monitor whether AutoSys itself is up and running?
Note: the jobs that got stuck show as completed in the database, but the dependent jobs cannot start, even though from the front end the jobs still appear to be in running status.
Please help.
The chk_auto_up command checks whether the application server, event server, scheduler and agent are working fine.
The chase command checks whether the agent is running fine.
The autoping command checks whether the agent is able to communicate with the application server.
Check the log files of the components with the commands below:
autosyslog -e (scheduler)
autosyslog -s (server)
autosyslog -d j (job)
Check the status of each component manually with the commands below (a simple probe combining these checks is sketched after the list):
unisrvcntr status waae_server.$AUTOSERV
unisrvcntr status waae_agent-$AGENT_NAME
unisrvcntr status waae_webserver.$AUTOSERV
unisrvcntr status waae_sched.$AUTOSERV
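To monitor AutoSys itself as the question asks, these checks can be strung together into a simple probe script run from cron. A sketch only: it assumes the WAAE shell environment is sourced so $AUTOSERV and $AGENT_NAME are set, and that unisrvcntr returns a non-zero exit code when a component is down (grep its output instead if your release does not):
#!/bin/sh
# rough AutoSys health probe; alert or page on any failure line
unisrvcntr status waae_server.$AUTOSERV  || echo "application server down"
unisrvcntr status waae_sched.$AUTOSERV   || echo "scheduler down"
unisrvcntr status waae_agent-$AGENT_NAME || echo "agent down"
autoping -m $AGENT_NAME                  || echo "agent cannot reach the application server"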
