JFrog Artifactory fails to connect to PostgreSQL database - artifactory

I followed the following guides on installing JFrog Artifactory OSS using RPM/Yum and using an external PostgreSQL database.
https://www.jfrog.com/confluence/display/JFROG/Installing+Artifactory
https://www.jfrog.com/confluence/display/RTF6X/PostgreSQL
SELinux is disabled and jfrog-artifactory-oss is installed from the JFrog repository [https://jfrog.bintray.com/artifactory-rpms].
Check the service:
[root#jfrog ~]# systemctl status artifactory -l
● artifactory.service - Artifactory service
Loaded: loaded (/usr/lib/systemd/system/artifactory.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2020-08-08 01:56:50 +08; 11min ago
Process: 9714 ExecStop=/opt/jfrog/artifactory/app/bin/artifactoryManage.sh stop (code=exited, status=0/SUCCESS)
Process: 10268 ExecStart=/opt/jfrog/artifactory/app/bin/artifactoryManage.sh start (code=exited, status=0/SUCCESS)
Main PID: 12388 (java)
CGroup: /system.slice/artifactory.service
‣ 12388 /opt/jfrog/artifactory/app/third-party/java/bin/java -Djava.util.logging.config.file=/opt/jfrog/artifactory/app/artifactory/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -server -Xss256k -XX:+UseG1GC -XX:OnOutOfMemoryError=kill -9 %p --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.desktop/java.awt.font=ALL-UNNAMED -Dfile.encoding=UTF8 -Djruby.compile.invokedynamic=false -Djruby.bytecode.version=1.8 -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true -Djava.security.egd=file:/dev/./urandom -Dartdist=rpm -Djf.product.home=/opt/jfrog/artifactory -Xms512m -Xmx3g -Djruby.bytecode.version=1.8 -Dartifactory.metadata.native.ui=true -Dignore.endorsed.dirs= -classpath /opt/jfrog/artifactory/app/artifactory/tomcat/bin/bootstrap.jar:/opt/jfrog/artifactory/app/artifactory/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/jfrog/artifactory/app/artifactory/tomcat -Dcatalina.home=/opt/jfrog/artifactory/app/artifactory/tomcat -Djava.io.tmpdir=/opt/jfrog/artifactory/var/work/artifactory/tomcat/temp org.apache.catalina.startup.Bootstrap start
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: 2020-08-07T17:56:50.027Z [shell] [INFO ] [] [systemYamlHelper.sh:462 ] [main] - Resolved shared.logging.consoleLog.enabled (true) from /opt/jfrog/artifactory/var/etc/system.yaml
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: JF_METADATA_ACCESSCLIENT_URL: http://localhost:8081/access
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: metadata started. PID: 12988
Aug 08 01:56:50 jfrog su[13048]: (to artifactory) root on none
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: Starting frontend...
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: frontend not running. Proceed to start it up.
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: 2020-08-07T17:56:50.317Z [shell] [INFO ] [] [systemYamlHelper.sh:462 ] [main] - Resolved shared.logging.consoleLog.enabled (true) from /opt/jfrog/artifactory/var/etc/system.yaml
Aug 08 01:56:50 jfrog artifactoryManage.sh[10268]: frontend started. PID: 13147
Aug 08 01:56:50 jfrog systemd[1]: Started Artifactory service.
Aug 08 01:56:51 jfrog artifactoryManage.sh[10268]: 2020-08-07T17:56:51.003Z [shell] [INFO ] [] [systemYamlHelper.sh:462 ] [main] - Resolved shared.logging.consoleLog.enabled (true) from /opt/jfrog/artifactory/var/etc/system.yaml
[root#jfrog ~]#
Test:
[root#jfrog ~]# curl -I http://localhost:8082/ui/
HTTP/1.1 503 Service Unavailable
Date: Fri, 07 Aug 2020 18:08:50 GMT
Content-Length: 19
Content-Type: text/plain; charset=utf-8
[root#jfrog ~]#
/opt/jfrog/artifactory/var/log/console.log shows the following errors:
[DEBUG] Resolved system configuration file path: /opt/jfrog/artifactory/var/etc/system.yaml
No ssl parameter found, falling back to sslmode=disable
2020-08-07T17:56:50.179Z [jfmd ] [INFO ] [1462831a45a25233] [database_bearer.go:84 ] [main ] - Connecting to (db config: {postgresql user='jfroguser' password='***' dbname=jfrogdb host=dbserver.example.com port= sslmode=disable}) [database]
2020-08-07T17:56:50.216Z [jfmd ] [ERROR] [1462831a45a25233] [database_bearer.go:68 ] [main ] - Could not initialize database (db config: {postgresql user='jfroguser' password='***' dbname=jfrogdb host=dbserver.example.com port= sslmode=disable}): error connecting to database
jfrog.com/metadata/services/common/db.(*databaseBearer).init
/src/jfrog.com/metadata/services/common/db/database_bearer.go:114
jfrog.com/metadata/services/common/db.NewDatabaseBearer
/src/jfrog.com/metadata/services/common/db/database_bearer.go:66
main.main
/src/jfrog.com/metadata/metadata.go:38
runtime.main
/src/runtime/proc.go:203
runtime.goexit
/src/runtime/asm_amd64.s:1373
goroutine 1 [running]:
runtime/debug.Stack(0x38, 0xc00015c040, 0xc00032c080)
/src/runtime/debug/stack.go:24 +0x9d
jfrog.com/jfrog-go-commons/pkg/log.(*standardLogger).Panicfc(0xc00043bda0, 0x166e420, 0xc000142750, 0x13eb133, 0x32, 0xc00032c080, 0x2, 0x2)
/src/jfrog.com/go-commons/pkg/log/standard_logger.go:42 +0x6a
jfrog.com/metadata/services/common/db.NewDatabaseBearer(0x166e420, 0xc000142750, 0x166f220, 0xc00007f770, 0x1673460, 0xc0000c97c0, 0x1666260, 0xc000011098, 0x16489c0, 0xc00043bd70, ...)
/src/jfrog.com/metadata/services/common/db/database_bearer.go:68 +0x2d4
main.main()
/src/jfrog.com/metadata/metadata.go:38 +0x5b7
[database]
panic: Could not initialize database (db config: {postgresql user='jfroguser' password='***' dbname=jfrogdb host=dbserver.example.com port= sslmode=disable}): error connecting to database
jfrog.com/metadata/services/common/db.(*databaseBearer).init
/src/jfrog.com/metadata/services/common/db/database_bearer.go:114
jfrog.com/metadata/services/common/db.NewDatabaseBearer
/src/jfrog.com/metadata/services/common/db/database_bearer.go:66
main.main
/src/jfrog.com/metadata/metadata.go:38
runtime.main
/src/runtime/proc.go:203
runtime.goexit
/src/runtime/asm_amd64.s:1373
goroutine 1 [running]:
runtime/debug.Stack(0x38, 0xc00015c040, 0xc00032c080)
/src/runtime/debug/stack.go:24 +0x9d
jfrog.com/jfrog-go-commons/pkg/log.(*standardLogger).Panicfc(0xc00043bda0, 0x166e420, 0xc000142750, 0x13eb133, 0x32, 0xc00032c080, 0x2, 0x2)
/src/jfrog.com/go-commons/pkg/log/standard_logger.go:42 +0x6a
jfrog.com/metadata/services/common/db.NewDatabaseBearer(0x166e420, 0xc000142750, 0x166f220, 0xc00007f770, 0x1673460, 0xc0000c97c0, 0x1666260, 0xc000011098, 0x16489c0, 0xc00043bd70, ...)
/src/jfrog.com/metadata/services/common/db/database_bearer.go:68 +0x2d4
main.main()
/src/jfrog.com/metadata/metadata.go:38 +0x5b7
goroutine 1 [running]:
github.com/rs/zerolog.(*Logger).Panic.func1(0xc000358500, 0x4bb)
/pkg/mod/github.com/rs/zerolog#v1.18.0/log.go:338 +0x4f
github.com/rs/zerolog.(*Event).msg(0xc0000be240, 0xc000358500, 0x4bb)
/pkg/mod/github.com/rs/zerolog#v1.18.0/event.go:146 +0x200
github.com/rs/zerolog.(*Event).Msgf(0xc0000be240, 0xc000961dc0, 0x35, 0xc00015c0c0, 0x3, 0x4)
/pkg/mod/github.com/rs/zerolog#v1.18.0/event.go:126 +0x83
jfrog.com/jfrog-go-commons/pkg/log.(*standardLogger).logMessage(0xc00043bda0, 0x166e420, 0xc000142750, 0xc0000be240, 0xc000961dc0, 0x35, 0xc00015c0c0, 0x3, 0x4)
/src/jfrog.com/go-commons/pkg/log/standard_logger.go:61 +0x197
jfrog.com/jfrog-go-commons/pkg/log.(*standardLogger).Panicfc(0xc00043bda0, 0x166e420, 0xc000142750, 0x13eb133, 0x32, 0xc00015c0c0, 0x3, 0x4)
/src/jfrog.com/go-commons/pkg/log/standard_logger.go:43 +0x1df
jfrog.com/metadata/services/common/db.NewDatabaseBearer(0x166e420, 0xc000142750, 0x166f220, 0xc00007f770, 0x1673460, 0xc0000c97c0, 0x1666260, 0xc000011098, 0x16489c0, 0xc00043bd70, ...)
/src/jfrog.com/metadata/services/common/db/database_bearer.go:68 +0x2d4
main.main()
/src/jfrog.com/metadata/metadata.go:38 +0x5b7
Any ideas what to check? The server is an up-to-date Centos 7 server. Login to the external database is also possible:
[root#jfrog ~]# psql -h dbserver.example.com -p 5432 -U jfrog
Password for user jfrog:
psql (11.8)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
jfrog=> SHOW server_version;
server_version
----------------
11.8
(1 row)
jfrog=> \q
[root#jfrog ~]#

Related

vault dial tcp 127.0.0.1:8200: connect: connection refused

I'm try to run vault instance on aws and when i want to run command: vault operator init -key-shares=5 -key-threshold=3 -format json on Ansible role and i have error code :
fatal: [vault]: FAILED! => {"changed": true, "cmd": "vault operator init -key-shares=5 -key-threshold=3 -format json", "delta": "0:00:00.054870", "end": "2021-12-12 14:30:50.956504", "msg": "non-zero return code", "rc": 2, "start": "2021-12-12 14:30:50.901634", "stderr": "Error initializing: Put \"http://127.0.0.1:8200/v1/sys/init\": dial tcp 127.0.0.1:8200: connect: connection refused", "stderr_lines": ["Error initializing: Put \"http://127.0.0.1:8200/v1/sys/init\": dial tcp 127.0.0.1:8200: connect: connection refused"], "stdout": "", "stdout_lines": []}
When i'm on my vault server and when i do service vault status, i have this result :
vault.service - a tool for managing secrets
Loaded: loaded (/etc/systemd/system/vault.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2021-12-12 14:19:47 UTC; 6min ago
Docs: https://vaultproject.io/docs/
Process: 5152 ExecStart=/usr/local/bin/vault server -config=/etc/vault.hcl (code=exited, status=213/SECUREBITS)
Main PID: 5152 (code=exited, status=213/SECUREBITS)
Dec 12 14:19:47 ip-172-31-37-194 systemd[1]: Started a tool for managing secrets.
Dec 12 14:19:47 ip-172-31-37-194 systemd[5152]: vault.service: Failed to set process secure bits: Operation not perm
Dec 12 14:19:47 ip-172-31-37-194 systemd[5152]: vault.service: Failed at step SECUREBITS spawning /usr/local/bin/vau
Dec 12 14:19:47 ip-172-31-37-194 systemd[1]: vault.service: Main process exited, code=exited, status=213/SECUREBITS
Dec 12 14:19:47 ip-172-31-37-194 systemd[1]: vault.service: Failed with result 'exit-code'.
There'is my 2 config files :
vault.hcl :
disable_mlock = true
listener "tcp" {
address = "http://{{ listener_address }}"
tls_disable = 1
}
backend "file" {
path = "/var/lib/vault"
}
my vault.service :
[Unit]
Description=a tool for managing secrets
Documentation=https://vaultproject.io/docs/
After=network.target
ConditionFileNotEmpty=/etc/vault.hcl
[Service]
User=vault
Group=vault
ExecStart=/usr/local/bin/vault server -config=/etc/vault.hcl
ExecReload=/usr/local/bin/kill --signal HUP $MAINPID
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
Capabilities=CAP_IPC_LOCK+ep
SecureBits=keep-caps
NoNewPrivileges=yes
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target
I didn't find anything yet who could unlock this situation, if someone have an idea.

Apache Airflow : Dag task marked zombie, with background process running on remote server

**Apache Airflow version:**1.10.9-composer
Kubernetes Version : Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.6002", GitCommit:"035184604aff4de66f7db7fddadb8e7be76b6717", GitTreeState:"clean", BuildDate:"2020-12-01T23:13:35Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Environment: Airflow, running on top of Kubernetes - Linux version 4.19.112
OS : Linux version 4.19.112+ (builder#7fc5cdead624) (Chromium OS 9.0_pre361749_p20190714-r4 clang version 9.0.0 (/var/cache/chromeos-cache/distfiles/host/egit-src/llvm-project c11de5eada2decd0a495ea02676b6f4838cd54fb) (based on LLVM 9.0.0svn)) #1 SMP Fri Sep 4 12:00:04 PDT 2020
Kernel : Linux gke-europe-west2-asset-c-default-pool-dc35e2f2-0vgz
4.19.112+ #1 SMP Fri Sep 4 12:00:04 PDT 2020 x86_64 Intel(R) Xeon(R) CPU # 2.20GHz GenuineIntel GNU/Linux
What happened ?
A running task is marked as Zombie after the execution time crossed the latest heartbeat + 5 minutes.
The task is running in background in another application server, triggered using SSHOperator.
[2021-01-18 11:53:37,491] {taskinstance.py:888} INFO - Executing <Task(SSHOperator): load_trds_option_composite_file> on 2021-01-17T11:40:00+00:00
[2021-01-18 11:53:37,495] {base_task_runner.py:131} INFO - Running on host: airflow-worker-6f6fd78665-lm98m
[2021-01-18 11:53:37,495] {base_task_runner.py:132} INFO - Running: ['airflow', 'run', 'dsp_etrade_process_trds_option_composite_0530', 'load_trds_option_composite_file', '2021-01-17T11:40:00+00:00', '--job_id', '282759', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/dsp_etrade_trds_option_composite_0530.py', '--cfg_path', '/tmp/tmpge4_nva0']
Task Executing time:
dag_id dsp_etrade_process_trds_option_composite_0530
duration 7270.47
start_date 2021-01-18 11:53:37,491
end_date 2021-01-18 13:54:47.799728+00:00
Scheduler Logs during that time:
[2021-01-18 13:54:54,432] {taskinstance.py:1135} ERROR - <TaskInstance: dsp_etrade_process_etrd.push_run_date 2021-01-18 13:30:00+00:00 [running]> detected as zombie
{
textPayload: "[2021-01-18 13:54:54,432] {taskinstance.py:1135} ERROR - <TaskInstance: dsp_etrade_process_etrd.push_run_date 2021-01-18 13:30:00+00:00 [running]> detected as zombie"
insertId: "1ca8zyfg3zvma66"
resource: {
type: "cloud_composer_environment"
labels: {3}
}
timestamp: "2021-01-18T13:54:54.432862699Z"
severity: "ERROR"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:55.714437665Z"
}
Airflow-webserver log :
X.X.X.X - - [18/Jan/2021:13:54:39 +0000] "GET /_ah/health HTTP/1.1" 200 187 "-" "GoogleHC/1.0"
{
textPayload: "172.17.0.5 - - [18/Jan/2021:13:54:39 +0000] "GET /_ah/health HTTP/1.1" 200 187 "-" "GoogleHC/1.0"
"
insertId: "1sne0gqg43o95n3"
resource: {2}
timestamp: "2021-01-18T13:54:45.401670481Z"
logName: "projects/asset-control-composer-prod/logs/airflow-webserver"
receiveTimestamp: "2021-01-18T13:54:50.598807514Z"
}
Airflow Info logs :
2021-01-18 08:54:47.799 EST
{
textPayload: "NoneType: None
"
insertId: "1ne3hqgg47yzrpf"
resource: {2}
timestamp: "2021-01-18T13:54:47.799661030Z"
severity: "INFO"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:50.914461159Z"
}
[2021-01-18 13:54:47,800] {taskinstance.py:1192} INFO - Marking task as FAILED.dag_id=dsp_etrade_process_trds_option_composite_0530, task_id=load_trds_option_composite_file, execution_date=20210117T114000, start_date=20210118T115337, end_date=20210118T135447
Copy link
{
textPayload: "[2021-01-18 13:54:47,800] {taskinstance.py:1192} INFO - Marking task as FAILED.dag_id=dsp_etrade_process_trds_option_composite_0530, task_id=load_trds_option_composite_file, execution_date=20210117T114000, start_date=20210118T115337, end_date=20210118T135447"
insertId: "1ne3hqgg47yzrpg"
resource: {2}
timestamp: "2021-01-18T13:54:47.800605248Z"
severity: "INFO"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:50.914461159Z"
}
Airflow Database shows the latest heartbeat as:
select state, latest_heartbeat from job where id=282759
--------------------------------------
state | latest_heartbeat
running | 2021-01-18 13:48:41.891934
Airflow Configurations:
celery
worker_concurrency=6
scheduler
scheduler_health_check_threshold=60
scheduler_zombie_task_threshold=300
max_threads=2
core
dag_concurrency=6
Kubernetes Cluster :
Worker nodes : 6
What was expected to happen ?
The backend process takes around 2hrs 30 minutes to finish. During
such long running jobs the task is detected as zombie. Eventhough the
worker node is still processing the task. The state of the job is
still marked as 'running'. State if the task is not known during the
run time.

Advanced AWS CloudFormation - cfn-Init & cfn-Hup not working

I am experimenting with Cloudformation CFN-INIT & CFN-HUP based on below template but the wordpress stack doesn't get created. CFN-HUP process is not started and CFN-Init throws Code1 error. Please see Stack-template and error log details below. Can anyone help me understand what's going wrong here please?
Stack-Template:
**Parameters:
DecideEnvSize:
Type: String
Default: LOW
AllowedValues:
- LOW
- MEDIUM
- HIGH
Description: Select Environment Size (S,M,L)
DatabaseName:
Type: String
Default: DB4wordpress
DatabaseUser:
Type: String
Default: ***************
DatabasePassword:
Type: String
Default: *************
NoEcho: true
TestString:
Type: String
Default: Don't eat yourself up!!!
Mappings:
MyRegionMap:
us-east-1:
"AMALINUX" : "ami-c481fad3" # AMALINUX SEP 2016 - N. Verginia
us-east-2:
"AMALINUX" : "ami-71ca9114" # AMALINUX SEP 2016 - Ohio
InstanceSize:
LOW:
"EC2" : "t2.micro"
"DB" : "db.t2.micro"
MEDIUM:
"EC2" : "t2.small"
"DB" : "db.t2.small"
HIGH:
"EC2" : "t2.medium"
"DB" : "db.t2.medium"
Resources:
DBServer:
Type: "AWS::RDS::DBInstance"
Properties:
AllocatedStorage: 5
StorageType: gp2
DBInstanceClass: !FindInMap [InstanceSize, !Ref DecideEnvSize, DB] # Dynamic mapping + Pseudo Parameter
DBName: !Ref DatabaseName
Engine: MySQL
MasterUsername: !Ref DatabaseUser
MasterUserPassword: !Ref DatabasePassword
DeletionPolicy: Delete
EC2server:
Type: "AWS::EC2::Instance"
DependsOn: DBServer
Properties:
ImageId: !FindInMap [MyRegionMap, !Ref "AWS::Region", AMALINUX] # Dynamic mapping + Pseudo Parameter
InstanceType: !FindInMap [InstanceSize, !Ref DecideEnvSize, EC2]
KeyName: AdvancedCFN
**UserData:
"Fn::Base64":
!Sub |
#!/bin/bash
yum update -y aws-cfn-bootstrap # good practice - always do this.
/opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource EC2server --configsets wordpress --region ${AWS::Region}
yum -y update
Metadata:
AWS::CloudFormation::Init:
configSets:
wordpress:
- "configure_cfn"
- "install_wordpress"
- "config_wordpress"
configure_cfn:
files:
/etc/cfn/cfn-hup.conf:
content: !Sub |
[main-just some name]
stack=${AWS::StackId}
region=${AWS::Region}
verbose=true
interval=5
mode: "000400"
owner: root
group: root
/etc/cfn/hooks.d/cfn-auto-reloader.conf:
content: !Sub |
[cfn-auto-reloader-hook #just a name]
triggers=post.update
path=Resources.EC2server.Metadata.AWS::CloudFormation::Init
action=/opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource EC2server --configsets wordpress --region ${AWS::Region}
mode: "000400"
owner: root
group: root
/var/www/html/index2.html:
content: !Ref TestString
services:
sysvinit:
cfn-hup:
enabled: "true"
ensureRunning: "true"
files:
- "/etc/cfn/cfn-hup.conf"
- "/etc/cfn/hooks.d/cfn-auto-reloader.conf"**
install_wordpress:
packages:
yum:
httpd: []
php: []
mysql: []
php-mysql: []
sources:
/var/www/html: "http://wordpress.org/latest.tar.gz"
services:
sysvinit:
httpd:
enabled: "true"
ensureRunning: "true"
config_wordpress:
commands:
01_clone_config:
cwd: "/var/www/html/wordpress"
test: "test ! -e /var/www/html/wordpress/wp-config.php"
command: "cp wp-config-sample.php wp-config.php"
02_inject_dbhost:
cwd: "/var/www/html/wordpress"
command: !Sub |
sed -i 's/localhost/${DBServer.Endpoint.Address}/g' wp-config.php
03_inject_dbname:
cwd: "/var/www/html/wordpress"
command: !Sub |
sed -i 's/database_name_here/${DatabaseName}/g' wp-config.php
04_inject_dbuser:
cwd: "/var/www/html/wordpress"
command: !Sub |
sed -i 's/username_here/${DatabaseUser}/g' wp-config.php
05_inject_dbpassword:
cwd: "/var/www/html/wordpress"
command: !Sub |
sed -i 's/password_here/${DatabasePassword}/g' wp-config.php
S3blob:
Type: "AWS::S3::Bucket"**
Error & log details
[root#ip-172-31-25-239 ec2-user]# cd /var/log
[root#ip-172-31-25-239 log]# ls
*audit btmp cfn-init-cmd.log cfn-wire.log cloud-init-output.log dmesg lastlog maillog ntpstats spooler wtmp
boot.log cfn-hup.log cfn-init.log cloud-init.log cron dracut.log mail messages secure tallylog yum.log
[root#ip-172-31-25-239 log]# cat cfn-hup.log
2017-12-30 10:48:15,923 [ERROR] Error: [main] section must contain stack option*
===========================================================================================
===========================================================================================
[root#ip-172-31-25-239 log]# cat cfn-init.log
2017-12-30 10:48:15,499 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.us-east-1.amazonaws.com
2017-12-30 10:48:15,501 [DEBUG] Describing resource EC2server in stack Yetagain-init-hup-try10
2017-12-30 10:48:15,616 [INFO] -----------------------Starting build-----------------------
2017-12-30 10:48:15,616 [DEBUG] Not setting a reboot trigger as scheduling support is not available
2017-12-30 10:48:15,617 [INFO] Running configSets: wordpress
2017-12-30 10:48:15,618 [INFO] Running configSet wordpress
2017-12-30 10:48:15,619 [INFO] Running config configure_cfn
2017-12-30 10:48:15,620 [DEBUG] No packages specified
2017-12-30 10:48:15,620 [DEBUG] No groups specified
2017-12-30 10:48:15,620 [DEBUG] No users specified
2017-12-30 10:48:15,620 [DEBUG] No sources specified
2017-12-30 10:48:15,620 [DEBUG] Parent directory /etc/cfn does not exist, creating
2017-12-30 10:48:15,625 [DEBUG] Writing content to /etc/cfn/cfn-hup.conf
2017-12-30 10:48:15,625 [DEBUG] Setting mode for /etc/cfn/cfn-hup.conf to 000400
2017-12-30 10:48:15,626 [DEBUG] Setting owner 0 and group 0 for /etc/cfn/cfn-hup.conf
2017-12-30 10:48:15,626 [DEBUG] Parent directory /etc/cfn/hooks.d does not exist, creating
2017-12-30 10:48:15,626 [DEBUG] Writing content to /etc/cfn/hooks.d/cfn-auto-reloader.conf
2017-12-30 10:48:15,626 [DEBUG] Setting mode for /etc/cfn/hooks.d/cfn-auto-reloader.conf to 000400
2017-12-30 10:48:15,626 [DEBUG] Setting owner 0 and group 0 for /etc/cfn/hooks.d/cfn-auto-reloader.conf
2017-12-30 10:48:15,626 [DEBUG] Parent directory /var/www/html does not exist, creating
2017-12-30 10:48:15,627 [DEBUG] Writing content to /var/www/html/index2.html
2017-12-30 10:48:15,627 [DEBUG] No mode specified for /var/www/html/index2.html. The file will be created with the mode: 0644
2017-12-30 10:48:15,627 [DEBUG] No commands specified
2017-12-30 10:48:15,627 [DEBUG] Using service modifier: /sbin/chkconfig
2017-12-30 10:48:15,627 [DEBUG] Setting service cfn-hup to enabled
2017-12-30 10:48:15,634 [INFO] enabled service cfn-hup
2017-12-30 10:48:15,635 [DEBUG] Restarting cfn-hup due to change detected in dependency
2017-12-30 10:48:15,635 [DEBUG] Using service runner: /sbin/service
*2017-12-30 10:48:15,941 [ERROR] Could not restart service cfn-hup; return code was 1
2017-12-30 10:48:15,941 [DEBUG] Service output: Stopping cfn-hup: [FAILED]
Starting cfn-hup: [FAILED]
2017-12-30 10:48:15,942 [ERROR] Error encountered during build of configure_cfn: Could not restart cfn-hup*
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 542, in run_config
CloudFormationCarpenter(config, self._auth_config).build(worklog)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 270, in build
CloudFormationCarpenter._serviceTools[manager]().apply(services, changes)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 161, in apply
self._restart_service(service)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 185, in _restart_service
raise ToolError("Could not restart %s" % service)
ToolError: Could not restart cfn-hup
2017-12-30 10:48:15,942 [ERROR] -----------------------BUILD FAILED!------------------------
2017-12-30 10:48:15,944 [ERROR] Unhandled exception during build: Could not restart cfn-hup
Traceback (most recent call last):
File "/opt/aws/bin/cfn-init", line 171, in <module>
worklog.build(metadata, configSets)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 129, in build
Contractor(metadata).build(configSets, self)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 530, in build
self.run_config(config, worklog)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 542, in run_config
CloudFormationCarpenter(config, self._auth_config).build(worklog)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 270, in build
CloudFormationCarpenter._serviceTools[manager]().apply(services, changes)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 161, in apply
self._restart_service(service)
File "/usr/lib/python2.7/dist-packages/cfnbootstrap/service_tools.py", line 185, in _restart_service
raise ToolError("Could not restart %s" % service)
ToolError: Could not restart cfn-hup
=============================================================================================================
==============================================================================================================
[root#ip-172-31-25-239 log]# cat /etc/cfn/cfn-hup.conf
[main-just some name]
stack=arn:aws:cloudformation:us-east-1:523324464109:stack/Yetagain-init-hup-try10/908305e0-ed4d-11e7-b9f7-500c285ebefd
region=us-east-1
verbose=true
interval=5
==========================================================================================================
=========================================================================================================
[root#ip-172-31-25-239 log]# cat /etc/cfn/hooks.d/cfn-auto-reloader.conf
[cfn-auto-reloader-hook #just a name]
triggers=post.update
path=Resources.EC2server.Metadata.AWS::CloudFormation::Init
action=/opt/aws/bin/cfn-init -v --stack Yetagain-init-hup-try10 --resource EC2server --configsets wordpress --region us-east-1
Looks like the main error is in cfn-hup.log:
2017-12-30 10:48:15,923 [ERROR] Error: [main] section must contain stack option*
Try changing [main-just some name] to [main] in your cfn-hup.conf. For reference, my /etc/cfn/cfn-hup.conf looks like something like this:
[main]
stack=arn:aws:cloudformation:us-west-1:acccount_id:stack/mystack-dev-ecs-EC2-1VF68LZMOLAIY/cb2a6a80-554a-11e8-b318-503dcab41efa
region=us-west-1
interval=5
verbose=true

AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED

i am trying to add additional compute node on different virtual machine to the pre-installed openstack. I disabled the firewall services,enable to ping other virtual machine.. but still compute node is not able to register with Rabbitmq service running on controller node..
Here it is my nova.conf file...
[DEFAULT]
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
state_path=/var/lib/nova
lock_path=/var/lock/nova
force_dhcp_release=True
iscsi_helper=tgtadm
libvirt_use_virtio_for_bridges=True
connection_type=libvirt
root_helper=sudo nova-rootwrap /etc/nova/rootwrap.conf
verbose=True
ec2_private_dns_show_ip=True
api_paste_config=/etc/nova/api-paste.ini
volumes_path=/var/lib/nova/volumes
enabled_apis=ec2,osapi_compute,metadata
rpc_backend = rabbit
auth_strategy = keystone
use_neutron = True
firewall_driver = nova.virt.firewall.NoopFirewallDriver
my_ip = #compute node ip
rabbit_host= #controller_node_ip
rabbit_port = 5672
rabbit_userid = stackrabbit
rabbit_password = devstack
rabbit_use_ssl = False
rabbit_virtual_host=/
[keystone_authtoken]
auth_uri = http://controller_node_ip:5000
auth_url = http://controller_node_ip:35357
memcached_servers = controller_node_ip:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = nova
password = devstack
auth_host = controller_node_ip
auth_port = 35357
auth_protocol = http
[vnc]
enabled = True
vncserver_listen = 0.0.0.0
vncserver_proxyclient_address = $my_ip
novncproxy_base_url = http://controller_node_ip:6080/vnc_auto.html
[glance]
api_servers = http://controller_node_ip:9292
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
Here it is my nova-compute.log:
2016-09-20 19:08:57.701 7201 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on localhost:5672
2016-09-20 19:08:57.701 7201 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2016-09-20 19:08:58.708 7201 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 30 seconds...
Please suggest me something so that i can resolve this issue...
Thank you in advance...
I encountered this when expanding my nova-compute estate (although I'm not using Devstack).
In my newly created compute server, the following was seen in /var/log/nova/nova-compute.log : -
2017-11-14 11:40:53.287 52408 ERROR oslo.messaging._drivers.impl_rabbit [req-adfd6dc7-fe8c-4de5-8401-58d325c3b4a8 - - - - -] [be6e0302-dfc8-4512-8b48-0d824fc6ea14] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None
The solution was quite simple. I checked /var/log/sysinfo (I run ubuntu; /var/log/messages for those on Redhat systems) and could see the following lines:-
Nov 14 12:01:48 compute2 systemd[1]: Started OpenStack Compute.
Nov 14 12:01:49 compute2 nova-compute[3222]: Traceback (most recent call last):
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/bin/nova-compute", line 10, in <module>
Nov 14 12:01:49 compute2 nova-compute[3222]: sys.exit(main())
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/lib/python2.7/dist-packages/nova/cmd/compute.py", line 42, in main
Nov 14 12:01:49 compute2 nova-compute[3222]: config.parse_args(sys.argv)
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/lib/python2.7/dist-packages/nova/config.py", line 52, in parse_args
Nov 14 12:01:49 compute2 nova-compute[3222]: default_config_files=default_config_files)
Nov 14 12:01:49 compute2 nova-compute[3222]: File "/usr/lib/python2.7/dist-packages/oslo_config/cfg.py", line 2355, in __call__
Nov 14 12:01:49 compute2 nova-compute[3222]: self._namespace._files_permission_denied)
Nov 14 12:01:49 compute2 nova-compute[3222]: oslo_config.cfg.ConfigFilesPermissionDeniedError: Failed to open some config files: /etc/nova/nova.conf
Nov 14 12:01:49 compute2 systemd[1]: nova-compute.service: Main process exited, code=exited, status=1/FAILURE
Which shows that my /etc/nova/nova.conf file was unreadable. It turns out this was because I used scp to copy the nova.conf from my first compute to my new machine, and the file was read-only to the root user. The solution was to (on my new compute)
cd /etc/nova/
chown nova:nova nova.conf
service nova-compute restart

Spring Boot (ConfigServer) is restarting all the time

we have a very simple Spring Boot Service (#EnableConfigServer) running behind a nginx proxy.
The service basically works, but it is restarting all the time (Context is closed and started continuously).
See the log files here: http://pastebin.com/GErCF5x6
The setup is basically just one Java Class and the two configs (bootstrap.properties as well as application.properties).
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.cloud.config.server.EnableConfigServer;
import org.springframework.context.annotation.Configuration;
/**
* Main Application, which starts the Spring Boot context
*/
#Configuration
#EnableAutoConfiguration
#EnableConfigServer
public class Application {
#SuppressWarnings("PMD.SignatureDeclareThrowsException")
public static void main(String[] args) throws Exception {
SpringApplication.run(Application.class, args);
}
}
bootstrap.properties
spring.application.name = configserver
spring.cloud.config.enabled = false
encrypt.failOnError= false
encrypt.key= secret
application.properties
# HTTP Configuration
server.port = 8888
# Management Configuration
management.context-path=/manage
# SSH Based Git-Repository
spring.cloud.config.server.git.uri=git#bitbucket.org:xyz.git
spring.cloud.config.server.git.basedir = cache/config
security.user.name=ads
security.user.password={cipher}secret2
security.basic.realm=Config Server
Log-File
11:13:47.105 [qtp1131266554-101] INFO o.s.boot.SpringApplication - Started application in 0.176 seconds (JVM running for 245.66)
11:13:47.105 [qtp1131266554-101] INFO o.s.c.a.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#7709b120: startup date [Wed Apr 08 11:13:47 UTC 2015]; root of context hierarchy
11:13:47.690 [qtp1131266554-51] INFO o.s.b.a.audit.listener.AuditListener - AuditEvent [timestamp=Wed Apr 08 11:13:47 UTC 2015, principal=ads, type=AUTHENTICATION_SUCCESS, data={details=org.springframework.security.web.authentication.WebAuthenticationDetails#ffffe21a: RemoteIpAddress: 10.10.100.207; SessionId: null}]
11:13:48.324 [qtp1131266554-19] INFO o.s.boot.SpringApplication - Starting application on api01.prd.rbx.xxxx.com with PID 24204 (started by ads in /home/ads/config-server)
11:13:48.328 [qtp1131266554-19] INFO o.s.c.a.AnnotationConfigApplicationContext - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext#473cffd3: startup date [Wed Apr 08 11:13:48 UTC 2015]; root of context hierarchy
11:13:48.332 [qtp1131266554-19] INFO o.s.boot.SpringApplication - Started application in 0.092 seconds (JVM running for 246.887)
11:13:48.332 [qtp1131266554-19] INFO o.s.c.a.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#473cffd3: startup date [Wed Apr 08 11:13:48 UTC 2015]; root of context hierarchy
11:13:48.612 [qtp1131266554-55] INFO o.s.b.a.audit.listener.AuditListener - AuditEvent [timestamp=Wed Apr 08 11:13:48 UTC 2015, principal=ads, type=AUTHENTICATION_SUCCESS, data={details=org.springframework.security.web.authentication.WebAuthenticationDetails#ffffe21a: RemoteIpAddress: 10.10.100.207; SessionId: null}]
11:13:50.601 [qtp1131266554-77] INFO o.s.boot.SpringApplication - Starting application on api01.prd.rbx.xxxx.com with PID 24204 (started by ads in /home/ads/config-server)
11:13:50.604 [qtp1131266554-77] INFO o.s.c.a.AnnotationConfigApplicationContext - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext#44330486: startup date [Wed Apr 08 11:13:50 UTC 2015]; root of context hierarchy
11:13:50.607 [qtp1131266554-77] INFO o.s.boot.SpringApplication - Started application in 0.088 seconds (JVM running for 249.162)
11:13:50.607 [qtp1131266554-77] INFO o.s.c.a.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#44330486: startup date [Wed Apr 08 11:13:50 UTC 2015]; root of context hierarchy
11:13:51.831 [qtp1131266554-55] INFO o.s.boot.SpringApplication - Starting application on api01.prd.rbx.xxxx.com with PID 24204 (started by ads in /home/ads/config-server)
11:13:51.834 [qtp1131266554-55] INFO o.s.c.a.AnnotationConfigApplicationContext - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext#1843040d: startup date [Wed Apr 08 11:13:51 UTC 2015]; root of context hierarchy
11:13:51.840 [qtp1131266554-55] INFO o.s.boot.SpringApplication - Started application in 0.094 seconds (JVM running for 250.395)
Any idea, how to solve the problem?
best,
fritz
That's normal. It's not the application that is starting and stopping, it's just the mini-contexts that are used to create the config resources for remote clients. Completely harmless.
I found out the problem which I am facing here. Basically we have two remote services running on AWS and a configured Load Balanacer, which continously checks the /health endpoint. By invoking this method, the ConfigServerClient always calls our ConfigServer.
I don't understand why there is a HealthIndicator for the ConfigServer, is there a way to disable this HealthIndicator, as every request to this endpoint will query our config server again and again. Another disadvantage is, that the /health request is not responding as fast as possible anymore, which leads to timeouts in the Load Balancer (default 2s).
So, config server isn't restarting. It uses a new Spring Application Context and loads the files it grabbed from git into a new context and the formats the values to send back to the client. Below are the logs from my config server for one client connection. The logs just looks confusing like it is restarting.
2015-04-08 12:22:52.206 INFO 85076 --- [nio-8888-exec-1] o.s.b.a.audit.listener.AuditListener : AuditEvent [timestamp=Wed Apr 08 12:22:52 EDT 2015, principal=user, type=AUTHENTICATION_SUCCESS, data={details=org.springframework.security.web.authentication.WebAuthenticationDetails#957e: RemoteIpAddress: 127.0.0.1; SessionId: null}]
2015-04-08 12:22:52.944 INFO 85076 --- [nio-8888-exec-2] o.s.b.a.audit.listener.AuditListener : AuditEvent [timestamp=Wed Apr 08 12:22:52 EDT 2015, principal=user, type=AUTHENTICATION_SUCCESS, data={details=org.springframework.security.web.authentication.WebAuthenticationDetails#957e: RemoteIpAddress: 127.0.0.1; SessionId: null}]
2015-04-08 12:22:53.490 INFO 85076 --- [nio-8888-exec-1] o.s.boot.SpringApplication : Starting application on sgibb-mbp.local with PID 85076 (started by sgibb in /Users/sgibb/workspace/spring/spring-cloud-samples/configserver)
2015-04-08 12:22:53.494 INFO 85076 --- [nio-8888-exec-1] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext#571e4b84: startup date [Wed Apr 08 12:22:53 EDT 2015]; root of context hierarchy
2015-04-08 12:22:53.497 INFO 85076 --- [nio-8888-exec-1] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2015-04-08 12:22:53.498 INFO 85076 --- [nio-8888-exec-1] o.s.boot.SpringApplication : Started application in 0.151 seconds (JVM running for 23.747)
2015-04-08 12:22:53.499 INFO 85076 --- [nio-8888-exec-1] o.s.c.c.s.NativeEnvironmentRepository : Adding property source: file:/Users/sgibb/workspace/spring/spring-cloud-samples/configserver/target/config/foo.properties
2015-04-08 12:22:53.500 INFO 85076 --- [nio-8888-exec-1] o.s.c.c.s.NativeEnvironmentRepository : Adding property source: file:/Users/sgibb/workspace/spring/spring-cloud-samples/configserver/target/config/application.yml
2015-04-08 12:22:53.500 INFO 85076 --- [nio-8888-exec-1] s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#571e4b84: startup date [Wed Apr 08 12:22:53 EDT 2015]; root of context hierarchy
2015-04-08 12:22:54.090 INFO 85076 --- [nio-8888-exec-2] o.s.boot.SpringApplication : Starting application on sgibb-mbp.local with PID 85076 (started by sgibb in /Users/sgibb/workspace/spring/spring-cloud-samples/configserver)
2015-04-08 12:22:54.096 INFO 85076 --- [nio-8888-exec-2] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext#416d044c: startup date [Wed Apr 08 12:22:54 EDT 2015]; root of context hierarchy
2015-04-08 12:22:54.098 INFO 85076 --- [nio-8888-exec-2] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2015-04-08 12:22:54.099 INFO 85076 --- [nio-8888-exec-2] o.s.boot.SpringApplication : Started application in 0.433 seconds (JVM running for 24.348)
2015-04-08 12:22:54.099 INFO 85076 --- [nio-8888-exec-2] o.s.c.c.s.NativeEnvironmentRepository : Adding property source: file:/Users/sgibb/workspace/spring/spring-cloud-samples/configserver/target/config/foo.properties
2015-04-08 12:22:54.099 INFO 85076 --- [nio-8888-exec-2] o.s.c.c.s.NativeEnvironmentRepository : Adding property source: file:/Users/sgibb/workspace/spring/spring-cloud-samples/configserver/target/config/application.yml
2015-04-08 12:22:54.099 INFO 85076 --- [nio-8888-exec-2] s.c.a.AnnotationConfigApplicationContext : Closing org.springframework.context.annotation.AnnotationConfigApplicationContext#416d044c: startup date [Wed Apr 08 12:22:54 EDT 2015]; root of context hierarchy

Resources