I'm trying to run a Vault instance on AWS. When I run the command vault operator init -key-shares=5 -key-threshold=3 -format json from an Ansible role, I get this error:
fatal: [vault]: FAILED! => {"changed": true, "cmd": "vault operator init -key-shares=5 -key-threshold=3 -format json", "delta": "0:00:00.054870", "end": "2021-12-12 14:30:50.956504", "msg": "non-zero return code", "rc": 2, "start": "2021-12-12 14:30:50.901634", "stderr": "Error initializing: Put \"\": dial tcp connect: connection refused", "stderr_lines": ["Error initializing: Put \"\": dial tcp connect: connection refused"], "stdout": "", "stdout_lines": []}
On the Vault server, service vault status gives this result:
vault.service - a tool for managing secrets
Loaded: loaded (/etc/systemd/system/vault.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2021-12-12 14:19:47 UTC; 6min ago
Process: 5152 ExecStart=/usr/local/bin/vault server -config=/etc/vault.hcl (code=exited, status=213/SECUREBITS)
Main PID: 5152 (code=exited, status=213/SECUREBITS)
Dec 12 14:19:47 ip-172-31-37-194 systemd[1]: Started a tool for managing secrets.
Dec 12 14:19:47 ip-172-31-37-194 systemd[5152]: vault.service: Failed to set process secure bits: Operation not perm
Dec 12 14:19:47 ip-172-31-37-194 systemd[5152]: vault.service: Failed at step SECUREBITS spawning /usr/local/bin/vau
Dec 12 14:19:47 ip-172-31-37-194 systemd[1]: vault.service: Main process exited, code=exited, status=213/SECUREBITS
Dec 12 14:19:47 ip-172-31-37-194 systemd[1]: vault.service: Failed with result 'exit-code'.
Here are my two config files.
vault.hcl:
disable_mlock = true
listener "tcp" {
  address = "http://{{ listener_address }}"
  tls_disable = 1
}
backend "file" {
  path = "/var/lib/vault"
}
My vault.service:
Description=a tool for managing secrets
ExecStart=/usr/local/bin/vault server -config=/etc/vault.hcl
ExecReload=/usr/local/bin/kill --signal HUP $MAINPID
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
I haven't found anything yet that resolves this. Does anyone have an idea?
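The "connection refused" from vault operator init is consistent with the systemd output above: the unit failed at the SECUREBITS step, so nothing is listening yet. As a rough pre-check before calling init, a small helper along these lines can confirm that something accepts TCP connections on the listener address. This is only a sketch; the host and port (127.0.0.1 and 8200, Vault's default API port) are assumptions, since the question templates the address.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed address: adjust to whatever listener_address renders to.
    print(is_port_open("127.0.0.1", 8200))
```

If this prints False, the init command cannot succeed until the service itself starts.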


Fluent-bit can't verify ssl certificate

I'm having issues with SSL certificate verification. When I try to send logs to the nginx server, I get an error message that says:
Feb 14 21:38:53 username td-agent-bit[31178]: [2022/02/14 21:38:53] [error] [tls] /tmp/fluent-bit-1.8.12/src/tls/mbedtls.c:380 X509 - Certificate verification failed, e.g. CRL, CA or signature check
Feb 14 21:38:53 username td-agent-bit[31178]: [2022/02/14 21:38:53] [error] [output:http:http.0] no upstream connections available to
Feb 14 21:38:53 username td-agent-bit[31178]: [2022/02/14 21:38:53] [ warn] [engine] failed to flush chunk '31025-1644867441.221825565.flb', retry in 32 seconds: task_id=20, input=storage_backlog.6 > output=http.0 (out_id=0)
Feb 14 21:38:53 username td-agent-bit[31178]: [2022/02/14 21:38:53] [ info] [output:http:http.0], HTTP status=200
Feb 14 21:38:53 username td-agent-bit[31178]: {"status":200}
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [error] [tls] /tmp/fluent-bit-1.8.12/src/tls/mbedtls.c:380 X509 - Certificate verification failed, e.g. CRL, CA or signature check
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [error] [output:http:http.0] no upstream connections available to
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [ warn] [engine] failed to flush chunk '31025-1644867401.174594241.flb', retry in 37 seconds: task_id=12, input=storage_backlog.6 > output=http.0 (out_id=0)
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [error] [tls] /tmp/fluent-bit-1.8.12/src/tls/mbedtls.c:380 X509 - Certificate verification failed, e.g. CRL, CA or signature check
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [error] [output:http:http.0] no upstream connections available to
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [ warn] [engine] failed to flush chunk '31025-1644867416.136883568.flb', retry in 12 seconds: task_id=15, input=storage_backlog.6 > output=http.0 (out_id=0)
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [error] [tls] /tmp/fluent-bit-1.8.12/src/tls/mbedtls.c:380 X509 - Certificate verification failed, e.g. CRL, CA or signature check
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [error] [output:http:http.0] no upstream connections available to
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [ warn] [engine] failed to flush chunk '31025-1644867481.167299560.flb', retry in 10 seconds: task_id=28, input=storage_backlog.6 > output=http.0 (out_id=0)
Feb 14 21:38:54 username td-agent-bit[31178]: [2022/02/14 21:38:54] [ info] [output:http:http.0], HTTP status=200
Feb 14 21:38:54 username td-agent-bit[31178]: {"status":200}
Feb 14 21:38:55 username td-agent-bit[31178]: [2022/02/14 21:38:55] [error] [tls] /tmp/fluent-bit-1.8.12/src/tls/mbedtls.c:380 X509 - Certificate verification failed, e.g. CRL, CA or signature check
Feb 14 21:38:55 username td-agent-bit[31178]: [2022/02/14 21:38:55] [error] [output:http:http.0] no upstream connections available to
Feb 14 21:38:55 username td-agent-bit[31178]: [2022/02/14 21:38:55] [ warn] [engine] failed to flush chunk '31178-1644867522.155353155.flb', retry in 19 seconds: task_id=3, input=tail.2 > output=http.0 (
Feb 14 21:38:55 username td-agent-bit[31178]: [2022/02/14 21:38:55] [ info] [output:http:http.0], HTTP status=200
Feb 14 21:38:55 username td-agent-bit[31178]: {"status":200}
For some reason, the CRL, CA, or signature verification fails. Verification passes only after a certain number of attempts.
How can I fix it?
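To put a number on "only after a certain number of attempts", the log lines can be tallied. The sketch below counts TLS verification failures against successful HTTP responses, matching on the two message fragments that appear in the excerpt above. The function name and the counter keys are made up for illustration.

```python
import re
from collections import Counter

# Fragments taken from the log lines in the question.
TLS_FAIL = "Certificate verification failed"
HTTP_STATUS = re.compile(r"HTTP status=(\d+)")


def tally(lines):
    """Count TLS verification failures and HTTP status codes in log lines."""
    counts = Counter()
    for line in lines:
        if TLS_FAIL in line:
            counts["tls_verify_failed"] += 1
        match = HTTP_STATUS.search(line)
        if match:
            counts["http_" + match.group(1)] += 1
    return counts
```

Feeding it the service's log output (for example, lines read from a saved journal dump) gives a failure-to-success ratio, which can help tell whether every connection fails once before succeeding or only some of them do.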
[SERVICE]
# Flush
# =====
# set an interval of seconds before to flush records to a destination
flush 5
# Daemon
# ======
# instruct Fluent Bit to run in foreground or background mode.
daemon Off
# Log_Level
# =========
# Set the verbosity level of the service, values can be:
# - error
# - warning
# - info
# - debug
# - trace
# by default 'info' is set, that means it includes 'error' and 'warning'.
log_level info
# Parsers File
# ============
# specify an optional 'Parsers' configuration file
parsers_file parsers.conf
# Plugins File
# ============
# specify an optional 'Plugins' configuration file to load external plugins.
plugins_file plugins.conf
# HTTP Server
# ===========
# Enable/Disable the built-in HTTP Server for metrics
http_server Off
http_port 2020
# Storage
# =======
# Fluent Bit can use memory and filesystem buffering based mechanisms
# -
# storage metrics
# ---------------
# publish storage pipeline metrics in '/api/v1/storage'. The metrics are
# exported only if the 'http_server' option is enabled.
# storage.metrics on
# storage.path
# ------------
# absolute file system path to store filesystem data buffers (chunks).
storage.path /tmp/fluent-bit-storage/
# storage.sync
# ------------
# configure the synchronization mode used to store the data into the
# filesystem. It can take the values normal or full.
storage.sync normal
# storage.checksum
# ----------------
# enable the data integrity check when writing and reading data from the
# filesystem. The storage layer uses the CRC32 algorithm.
storage.checksum off
# storage.backlog.mem_limit
# -------------------------
# if storage.path is set, Fluent Bit will look for data chunks that were
# not delivered and are still in the storage layer, these are called
# backlog data. This option configure a hint of maximum value of memory
# to use when processing these records.
storage.backlog.mem_limit 2M
[INPUT]
name tail
tag log.development.production
path /home/username/production.log
Buffer_Max_Size 2mb
Refresh_interval 5
Offset_Key offset
Path_Key path
storage.type filesystem
DB /tmp/production.db
DB.sync normal
DB.locking false
DB.journal_mode wal
# Read interval (sec) Default: 1
#interval_sec 1
[INPUT]
name tail
tag log.development.nginx
path /home/username/nginx.log
Buffer_Max_Size 2mb
Refresh_interval 5
Offset_Key offset
Path_Key path
storage.type filesystem
DB /tmp/nginx.db
DB.sync normal
DB.locking false
DB.journal_mode wal
# Read interval (sec) Default: 1
#interval_sec 1
[INPUT]
name tail
tag log.development.apache
path /home/username/apache.log
Buffer_Max_Size 2mb
Refresh_interval 5
Offset_Key offset
Path_Key path
storage.type filesystem
DB /tmp/apache.db
DB.sync normal
DB.locking false
DB.journal_mode wal
# Read interval (sec) Default: 1
#interval_sec 1
[INPUT]
name tail
tag log.development.syslog
path /home/username/syslog.log
Buffer_Max_Size 2mb
Refresh_interval 5
Offset_Key offset
Path_Key path
storage.type filesystem
DB /tmp/syslog.db
DB.sync normal
DB.locking false
DB.journal_mode wal
# Read interval (sec) Default: 1
#interval_sec 1
[INPUT]
name tail
tag log.development.postgres
path /home/username/postgres.log
Buffer_Max_Size 2mb
Refresh_interval 5
Offset_Key offset
Path_Key path
storage.type filesystem
DB /tmp/postgres.db
DB.sync normal
DB.locking false
DB.journal_mode wal
# Read interval (sec) Default: 1
#interval_sec 1
[INPUT]
name tail
tag log.development.zabbix
path /home/username/zabbix.log
Buffer_Max_Size 2mb
Refresh_interval 5
Offset_Key offset
Path_Key path
storage.type filesystem
DB /tmp/zabbix.db
DB.sync normal
DB.locking false
DB.journal_mode wal
# Read interval (sec) Default: 1
#interval_sec 1
[OUTPUT]
Name http
Match *
Port 443
http_User fluentbit
http_Passwd fluentbit
tls on
tls.verify on
tls.debug 4
tls.ca_file /home/username/cert/ca_1/CA.pem
tls.crt_file /home/username/cert/ca_1/signed_certificates/server.crt
tls.key_file /home/username/cert/ca_1/signed_certificates/server.key
Format json
Header_tag header_tag_is_here
Header Location localhost
Retry_Limit no_limits
server {
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;
    ssl on;
    ssl_certificate /home/username/cert/ca_1/signed_certificates/server.crt;
    ssl_certificate_key /home/username/cert/ca_1/signed_certificates/server.key;
    ssl_session_cache builtin:1000 shared:SSL:10m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    server_name _;
    location / {
        proxy_pass http://localhost:3000/;
    }
}

Apache Airflow : Dag task marked zombie, with background process running on remote server

**Apache Airflow version:** 1.10.9-composer
Kubernetes Version : Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.6002", GitCommit:"035184604aff4de66f7db7fddadb8e7be76b6717", GitTreeState:"clean", BuildDate:"2020-12-01T23:13:35Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Environment: Airflow, running on top of Kubernetes - Linux version 4.19.112
OS : Linux version 4.19.112+ (builder@7fc5cdead624) (Chromium OS 9.0_pre361749_p20190714-r4 clang version 9.0.0 (/var/cache/chromeos-cache/distfiles/host/egit-src/llvm-project c11de5eada2decd0a495ea02676b6f4838cd54fb) (based on LLVM 9.0.0svn)) #1 SMP Fri Sep 4 12:00:04 PDT 2020
Kernel : Linux gke-europe-west2-asset-c-default-pool-dc35e2f2-0vgz
4.19.112+ #1 SMP Fri Sep 4 12:00:04 PDT 2020 x86_64 Intel(R) Xeon(R) CPU # 2.20GHz GenuineIntel GNU/Linux
What happened?
A running task is marked as a zombie once its execution time passes the latest heartbeat + 5 minutes.
The task runs in the background on another application server, triggered using SSHOperator.
[2021-01-18 11:53:37,491] {} INFO - Executing <Task(SSHOperator): load_trds_option_composite_file> on 2021-01-17T11:40:00+00:00
[2021-01-18 11:53:37,495] {} INFO - Running on host: airflow-worker-6f6fd78665-lm98m
[2021-01-18 11:53:37,495] {} INFO - Running: ['airflow', 'run', 'dsp_etrade_process_trds_option_composite_0530', 'load_trds_option_composite_file', '2021-01-17T11:40:00+00:00', '--job_id', '282759', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/', '--cfg_path', '/tmp/tmpge4_nva0']
Task Executing time:
dag_id dsp_etrade_process_trds_option_composite_0530
duration 7270.47
start_date 2021-01-18 11:53:37,491
end_date 2021-01-18 13:54:47.799728+00:00
Scheduler Logs during that time:
[2021-01-18 13:54:54,432] {} ERROR - <TaskInstance: dsp_etrade_process_etrd.push_run_date 2021-01-18 13:30:00+00:00 [running]> detected as zombie
textPayload: "[2021-01-18 13:54:54,432] {} ERROR - <TaskInstance: dsp_etrade_process_etrd.push_run_date 2021-01-18 13:30:00+00:00 [running]> detected as zombie"
insertId: "1ca8zyfg3zvma66"
resource: {
type: "cloud_composer_environment"
labels: {3}
timestamp: "2021-01-18T13:54:54.432862699Z"
severity: "ERROR"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:55.714437665Z"
Airflow-webserver log :
X.X.X.X - - [18/Jan/2021:13:54:39 +0000] "GET /_ah/health HTTP/1.1" 200 187 "-" "GoogleHC/1.0"
textPayload: " - - [18/Jan/2021:13:54:39 +0000] "GET /_ah/health HTTP/1.1" 200 187 "-" "GoogleHC/1.0"
insertId: "1sne0gqg43o95n3"
resource: {2}
timestamp: "2021-01-18T13:54:45.401670481Z"
logName: "projects/asset-control-composer-prod/logs/airflow-webserver"
receiveTimestamp: "2021-01-18T13:54:50.598807514Z"
Airflow Info logs :
2021-01-18 08:54:47.799 EST
textPayload: "NoneType: None
insertId: "1ne3hqgg47yzrpf"
resource: {2}
timestamp: "2021-01-18T13:54:47.799661030Z"
severity: "INFO"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:50.914461159Z"
[2021-01-18 13:54:47,800] {} INFO - Marking task as FAILED.dag_id=dsp_etrade_process_trds_option_composite_0530, task_id=load_trds_option_composite_file, execution_date=20210117T114000, start_date=20210118T115337, end_date=20210118T135447
textPayload: "[2021-01-18 13:54:47,800] {} INFO - Marking task as FAILED.dag_id=dsp_etrade_process_trds_option_composite_0530, task_id=load_trds_option_composite_file, execution_date=20210117T114000, start_date=20210118T115337, end_date=20210118T135447"
insertId: "1ne3hqgg47yzrpg"
resource: {2}
timestamp: "2021-01-18T13:54:47.800605248Z"
severity: "INFO"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:50.914461159Z"
Airflow Database shows the latest heartbeat as:
select state, latest_heartbeat from job where id=282759
state | latest_heartbeat
running | 2021-01-18 13:48:41.891934
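The timestamps above can be checked against the 5-minute window mentioned earlier. The sketch below is not Airflow's own code; it only reproduces the arithmetic, assuming the heartbeat from the job table and the "Marking task as FAILED" time are in the same timezone and that the window is 300 seconds (in Airflow 1.10 this is, as far as I know, the scheduler_zombie_task_threshold setting).

```python
from datetime import datetime, timedelta


def heartbeat_gap(latest_heartbeat: datetime, now: datetime) -> timedelta:
    """Time elapsed since the job last reported a heartbeat."""
    return now - latest_heartbeat


def exceeds_threshold(latest_heartbeat: datetime, now: datetime,
                      threshold_seconds: int = 300) -> bool:
    """True if the heartbeat is older than the zombie window (assumed 300 s)."""
    return heartbeat_gap(latest_heartbeat, now) > timedelta(seconds=threshold_seconds)


# Values copied from the question: job.latest_heartbeat and the task end_date.
heartbeat = datetime(2021, 1, 18, 13, 48, 41, 891934)
failed_at = datetime(2021, 1, 18, 13, 54, 47, 799728)

print(heartbeat_gap(heartbeat, failed_at).total_seconds())
print(exceeds_threshold(heartbeat, failed_at))
```

The gap comes out at about 366 seconds, just over the window, which suggests the job stopped heartbeating a few minutes before it was marked failed rather than the task being killed at a fixed duration.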
Airflow Configurations:
Kubernetes Cluster :
Worker nodes : 6
What was expected to happen?
The backend process takes around 2 hours 30 minutes to finish. During
such long-running jobs the task is detected as a zombie, even though the
worker node is still processing it and the state of the job is
still marked as 'running'. The state of the task is not known during the
run time.

JFrog Artifactory fails to connect to PostgreSQL database

I followed the guides for installing JFrog Artifactory OSS using RPM/Yum with an external PostgreSQL database.
SELinux is disabled and jfrog-artifactory-oss is installed from the JFrog repository.
Check the service:
[root@jfrog ~]# systemctl status artifactory -l
● artifactory.service - Artifactory service
Loaded: loaded (/usr/lib/systemd/system/artifactory.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2020-08-08 01:56:50 +08; 11min ago
Process: 9714 ExecStop=/opt/jfrog/artifactory/app/bin/ stop (code=exited, status=0/SUCCESS)
Process: 10268 ExecStart=/opt/jfrog/artifactory/app/bin/ start (code=exited, status=0/SUCCESS)
Main PID: 12388 (java)
CGroup: /system.slice/artifactory.service
‣ 12388 /opt/jfrog/artifactory/app/third-party/java/bin/java -Djava.util.logging.config.file=/opt/jfrog/artifactory/app/artifactory/tomcat/conf/ -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -server -Xss256k -XX:+UseG1GC -XX:OnOutOfMemoryError=kill -9 %p --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.desktop/java.awt.font=ALL-UNNAMED -Dfile.encoding=UTF8 -Djruby.compile.invokedynamic=false -Djruby.bytecode.version=1.8 -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true -Dartdist=rpm -Djf.product.home=/opt/jfrog/artifactory -Xms512m -Xmx3g -Djruby.bytecode.version=1.8 -Dartifactory.metadata.native.ui=true -Dignore.endorsed.dirs= -classpath /opt/jfrog/artifactory/app/artifactory/tomcat/bin/bootstrap.jar:/opt/jfrog/artifactory/app/artifactory/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/jfrog/artifactory/app/artifactory/tomcat -Dcatalina.home=/opt/jfrog/artifactory/app/artifactory/tomcat org.apache.catalina.startup.Bootstrap start
Aug 08 01:56:50 jfrog[10268]: 2020-08-07T17:56:50.027Z [shell] [INFO ] [] [ ] [main] - Resolved shared.logging.consoleLog.enabled (true) from /opt/jfrog/artifactory/var/etc/system.yaml
Aug 08 01:56:50 jfrog[10268]: JF_METADATA_ACCESSCLIENT_URL: http://localhost:8081/access
Aug 08 01:56:50 jfrog[10268]: metadata started. PID: 12988
Aug 08 01:56:50 jfrog su[13048]: (to artifactory) root on none
Aug 08 01:56:50 jfrog[10268]: Starting frontend...
Aug 08 01:56:50 jfrog[10268]: frontend not running. Proceed to start it up.
Aug 08 01:56:50 jfrog[10268]: 2020-08-07T17:56:50.317Z [shell] [INFO ] [] [ ] [main] - Resolved shared.logging.consoleLog.enabled (true) from /opt/jfrog/artifactory/var/etc/system.yaml
Aug 08 01:56:50 jfrog[10268]: frontend started. PID: 13147
Aug 08 01:56:50 jfrog systemd[1]: Started Artifactory service.
Aug 08 01:56:51 jfrog[10268]: 2020-08-07T17:56:51.003Z [shell] [INFO ] [] [ ] [main] - Resolved shared.logging.consoleLog.enabled (true) from /opt/jfrog/artifactory/var/etc/system.yaml
[root@jfrog ~]#
[root@jfrog ~]# curl -I http://localhost:8082/ui/
HTTP/1.1 503 Service Unavailable
Date: Fri, 07 Aug 2020 18:08:50 GMT
Content-Length: 19
Content-Type: text/plain; charset=utf-8
[root@jfrog ~]#
/opt/jfrog/artifactory/var/log/console.log shows the following errors:
[DEBUG] Resolved system configuration file path: /opt/jfrog/artifactory/var/etc/system.yaml
No ssl parameter found, falling back to sslmode=disable
2020-08-07T17:56:50.179Z [jfmd ] [INFO ] [1462831a45a25233] [database_bearer.go:84 ] [main ] - Connecting to (db config: {postgresql user='jfroguser' password='***' dbname=jfrogdb port= sslmode=disable}) [database]
2020-08-07T17:56:50.216Z [jfmd ] [ERROR] [1462831a45a25233] [database_bearer.go:68 ] [main ] - Could not initialize database (db config: {postgresql user='jfroguser' password='***' dbname=jfrogdb port= sslmode=disable}): error connecting to database*databaseBearer).init
goroutine 1 [running]:
runtime/debug.Stack(0x38, 0xc00015c040, 0xc00032c080)
/src/runtime/debug/stack.go:24 +0x9d*standardLogger).Panicfc(0xc00043bda0, 0x166e420, 0xc000142750, 0x13eb133, 0x32, 0xc00032c080, 0x2, 0x2)
/src/ +0x6a, 0xc000142750, 0x166f220, 0xc00007f770, 0x1673460, 0xc0000c97c0, 0x1666260, 0xc000011098, 0x16489c0, 0xc00043bd70, ...)
/src/ +0x2d4
/src/ +0x5b7
panic: Could not initialize database (db config: {postgresql user='jfroguser' password='***' dbname=jfrogdb port= sslmode=disable}): error connecting to database*databaseBearer).init
goroutine 1 [running]:
runtime/debug.Stack(0x38, 0xc00015c040, 0xc00032c080)
/src/runtime/debug/stack.go:24 +0x9d*standardLogger).Panicfc(0xc00043bda0, 0x166e420, 0xc000142750, 0x13eb133, 0x32, 0xc00032c080, 0x2, 0x2)
/src/ +0x6a, 0xc000142750, 0x166f220, 0xc00007f770, 0x1673460, 0xc0000c97c0, 0x1666260, 0xc000011098, 0x16489c0, 0xc00043bd70, ...)
/src/ +0x2d4
/src/ +0x5b7
goroutine 1 [running]:*Logger).Panic.func1(0xc000358500, 0x4bb)
/pkg/mod/ +0x4f*Event).msg(0xc0000be240, 0xc000358500, 0x4bb)
/pkg/mod/ +0x200*Event).Msgf(0xc0000be240, 0xc000961dc0, 0x35, 0xc00015c0c0, 0x3, 0x4)
/pkg/mod/ +0x83*standardLogger).logMessage(0xc00043bda0, 0x166e420, 0xc000142750, 0xc0000be240, 0xc000961dc0, 0x35, 0xc00015c0c0, 0x3, 0x4)
/src/ +0x197*standardLogger).Panicfc(0xc00043bda0, 0x166e420, 0xc000142750, 0x13eb133, 0x32, 0xc00015c0c0, 0x3, 0x4)
/src/ +0x1df, 0xc000142750, 0x166f220, 0xc00007f770, 0x1673460, 0xc0000c97c0, 0x1666260, 0xc000011098, 0x16489c0, 0xc00043bd70, ...)
/src/ +0x2d4
/src/ +0x5b7
Any ideas what to check? The server is an up-to-date CentOS 7 machine. Logging in to the external database from it works:
[root@jfrog ~]# psql -h -p 5432 -U jfrog
Password for user jfrog:
psql (11.8)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
jfrog=> SHOW server_version;
(1 row)
jfrog=> \q
[root@jfrog ~]#
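One thing worth comparing is the connection settings the metadata service logs against the ones used in the manual psql test: the excerpt shows the service connecting as jfroguser to jfrogdb, while the psql session logs in as jfrog. The sketch below parses the "db config" string from console.log into key/value pairs and lists any that are empty. The input string is copied from the log as it appears here, so an empty field may simply be an artifact of how the log was pasted. The function names are illustrative.

```python
import re

# key=value pairs where the value is either single-quoted or a bare token
# (possibly empty, as in "port= ").
PAIR = re.compile(r"(\w+)=('[^']*'|\S*)")


def parse_db_config(text: str) -> dict:
    """Parse a 'key=value key2=value2' string into a dict, dropping quotes."""
    return {key: value.strip("'") for key, value in PAIR.findall(text)}


def empty_fields(text: str) -> list:
    """Return the keys whose value is empty."""
    return [key for key, value in parse_db_config(text).items() if value == ""]


line = "postgresql user='jfroguser' password='***' dbname=jfrogdb port= sslmode=disable"
print(parse_db_config(line))
print(empty_fields(line))
```

Running the same parser over the values used for the psql test makes mismatches in user, database name, port, or SSL mode easy to spot.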

AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED

I am trying to add an additional compute node, on a different virtual machine, to a pre-installed OpenStack. I disabled the firewall services and can ping the other virtual machine, but the compute node still cannot register with the RabbitMQ service running on the controller node.
Here is my nova.conf file:
root_helper=sudo nova-rootwrap /etc/nova/rootwrap.conf
rpc_backend = rabbit
auth_strategy = keystone
use_neutron = True
firewall_driver = nova.virt.firewall.NoopFirewallDriver
my_ip = #compute node ip
rabbit_host= #controller_node_ip
rabbit_port = 5672
rabbit_userid = stackrabbit
rabbit_password = devstack
rabbit_use_ssl = False
auth_uri = http://controller_node_ip:5000
auth_url = http://controller_node_ip:35357
memcached_servers = controller_node_ip:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = nova
password = devstack
auth_host = controller_node_ip
auth_port = 35357
auth_protocol = http
enabled = True
vncserver_listen =
vncserver_proxyclient_address = $my_ip
novncproxy_base_url = http://controller_node_ip:6080/vnc_auto.html
api_servers = http://controller_node_ip:9292
lock_path = /var/lib/nova/tmp
Here is my nova-compute.log:
2016-09-20 19:08:57.701 7201 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on localhost:5672
2016-09-20 19:08:57.701 7201 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2016-09-20 19:08:58.708 7201 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 30 seconds...
Please suggest how I can resolve this issue.
Thank you in advance.
I encountered this when expanding my nova-compute estate (although I'm not using Devstack).
On my newly created compute server, /var/log/nova/nova-compute.log showed the following:
2017-11-14 11:40:53.287 52408 ERROR oslo.messaging._drivers.impl_rabbit [req-adfd6dc7-fe8c-4de5-8401-58d325c3b4a8 - - - - -] [be6e0302-dfc8-4512-8b48-0d824fc6ea14] AMQP server on is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None
The solution was quite simple. I checked /var/log/syslog (I run Ubuntu; /var/log/messages on Red Hat systems) and saw the following lines:
Nov 14 12:01:48 compute2 systemd[1]: Started OpenStack Compute.
Nov 14 12:01:49 compute2 nova-compute[3222]: Traceback (most recent call last):
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/bin/nova-compute", line 10, in <module>
Nov 14 12:01:49 compute2 nova-compute[3222]: sys.exit(main())
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/lib/python2.7/dist-packages/nova/cmd/", line 42, in main
Nov 14 12:01:49 compute2 nova-compute[3222]: config.parse_args(sys.argv)
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/lib/python2.7/dist-packages/nova/", line 52, in parse_args
Nov 14 12:01:49 compute2 nova-compute[3222]: default_config_files=default_config_files)
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/lib/python2.7/dist-packages/oslo_config/", line 2355, in __call__
Nov 14 12:01:49 compute2 nova-compute[3222]: self._namespace._files_permission_denied)
Nov 14 12:01:49 compute2 nova-compute[3222]: oslo_config.cfg.ConfigFilesPermissionDeniedError: Failed to open some config files: /etc/nova/nova.conf
Nov 14 12:01:49 compute2 systemd[1]: nova-compute.service: Main process exited, code=exited, status=1/FAILURE
This shows that my /etc/nova/nova.conf file was unreadable. It turns out this was because I used scp to copy nova.conf from my first compute node to my new machine, which left the file readable only by the root user. The solution, on my new compute node, was:
cd /etc/nova/
chown nova:nova nova.conf
service nova-compute restart
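The reason chown fixes this can be illustrated with the classic Unix permission check: the owner bits apply if the caller owns the file, otherwise the group bits if the caller is in the file's group, otherwise the "other" bits. The sketch below is a simplified model (it ignores ACLs and directory search permission), and the uid/gid value 999 for the nova user is hypothetical.

```python
import stat


def can_read(mode: int, owner_uid: int, owner_gid: int,
             uid: int, gids: set) -> bool:
    """Simplified Unix read-permission check for a user on a file."""
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


NOVA = 999  # hypothetical uid/gid of the nova service account

# File copied as root with mode 0600: nova cannot read it.
print(can_read(0o600, 0, 0, NOVA, {NOVA}))
# After chown nova:nova with the same mode: nova can read it.
print(can_read(0o600, NOVA, NOVA, NOVA, {NOVA}))
```

On a real host, os.stat on /etc/nova/nova.conf supplies st_mode, st_uid and st_gid for the first three arguments.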

Unable to connect to dynamoDB table - UnknownEndpoint: Inaccessible host:

I'm new to DynamoDB. I have created a table and am trying to insert data into it. It works well when I connect from my home internet, but when I try from my office network I get the error below.
I suspect this is due to proxy issues. Can you please help me resolve this issue? Thank you.
[UnknownEndpoint: Inaccessible host:'. This service may not be available in the `ap-southeast-2' region.]
message: 'Inaccessible host:\'. This service may not be available in the `ap-southeast-2\' region.',
code: 'UnknownEndpoint',
region: 'ap-southeast-2',
hostname: '',
retryable: true,
{ [NetworkingError: getaddrinfo ENOTFOUND]
message: 'getaddrinfo ENOTFOUND',
code: 'NetworkingError',
errno: 'ENOTFOUND',
syscall: 'getaddrinfo',
hostname: '',
host: '',
port: 443,
region: 'ap-southeast-2',
retryable: true,
time: Mon Sep 21 2015 11:19:58 GMT+1000 (AUS Eastern Standard Time) },
time: Mon Sep 21 2015 11:19:58 GMT+1000 (AUS Eastern Standard Time) }
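The inner error, getaddrinfo ENOTFOUND, means the endpoint's hostname could not be resolved from that network, which fits an office network where outbound traffic and DNS go through a proxy. A quick way to confirm this from the affected machine is a plain resolution check, sketched below. The hostname to test is whatever endpoint the SDK reports (it is elided in the output above), so the example passes placeholder names only.

```python
import socket


def can_resolve(hostname: str, port: int = 443) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Corresponds to the ENOTFOUND seen in the Node.js error.
        return False


if __name__ == "__main__":
    # Placeholder names; substitute the endpoint from your own error output.
    print(can_resolve("localhost"))
    print(can_resolve("no-such-host.invalid"))
```

If the real endpoint fails to resolve directly but works through the corporate proxy, configuring the SDK to use that proxy is the usual way out.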
Thank you for the pointers. I managed to solve the issue using the code snippet below.
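var AWS = require('aws-sdk');
var proxy = require('proxy-agent');
// Route all SDK requests through the corporate proxy.
AWS.config.update({
  httpOptions: { agent: proxy('http://{user_name}:{password}@<proxy>:<port>') }
});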
This is documented on Amazon's aws-sdk configuration site:
