DevStack: failed to create new CentOS instance - openstack

After deploying DevStack, I managed to create cirros instances. Now I want to create a CentOS instance:
I downloaded the image CentOS-7-x86_64-GenericCloud-1608.qcow2 from [here](http://cloud.centos.org/centos/7/images/).
Then I ran nova boot --flavor 75c84ea2-d5b0-4d99-b935-08f654122aa3 --image 997f51bd-1ee2-4cdb-baea-6cef766bf191 --security-groups 207880e9-165f-4295-adfd-1f91ac96aaaa --nic net-id=26c05c99-b82d-403f-a988-fc07d3972b6b centos-1
Then I ran nova list, which gives: b9f97618-085b-4d2b-bc94-34f3b953e2ee | centos-1 | ERROR | - | NOSTATE
It is in ERROR state, so I grepped the logs for b9f97618-085b-4d2b-bc94-34f3b953e2ee (the instance id): grep b9f97618-085b-4d2b-bc94-34f3b953e2ee *.log
The grep returns:
n-api.log:2016-10-13 22:09:27.975 DEBUG nova.compute.api [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] [instance: b9f97618-085b-4d2b-bc94-34f3b953e2ee] block_device_mapping [BlockDeviceMapping(boot_index=0,connection_info=None,created_at=,delete_on_termination=True,deleted=,deleted_at=,destination_type='local',device_name=None,device_type='disk',disk_bus=None,guest_format=None,id=,image_id='997f51bd-1ee2-4cdb-baea-6cef766bf191',instance=,instance_uuid=,no_device=False,snapshot_id=None,source_type='image',tag=None,updated_at=,volume_id=None,volume_size=None), BlockDeviceMapping(boot_index=-1,connection_info=None,created_at=,delete_on_termination=True,deleted=,deleted_at=,destination_type='local',device_name=None,device_type='disk',disk_bus=None,guest_format=None,id=,image_id=None,instance=,instance_uuid=,no_device=False,snapshot_id=None,source_type='blank',tag=None,updated_at=,volume_id=None,volume_size=1)] from (pid=12331) _bdm_validate_set_size_and_instance /opt/stack/nova/nova/compute/api.py:1239
n-api.log:2016-10-13 22:09:28.117 DEBUG nova.compute.api [req-d9327bbd-d333-4d37-8651-57e95d21396b admin admin] [instance: b9f97618-085b-4d2b-bc94-34f3b953e2ee] Fetching instance by UUID from (pid=12331) get /opt/stack/nova/nova/compute/api.py:2215
n-api.log:2016-10-13 22:09:28.184 DEBUG neutronclient.v2_0.client [req-d9327bbd-d333-4d37-8651-57e95d21396b admin admin] GET call to neutron for http://10.61.148.89:9696/v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee used request id req-2b427b03-67d9-474e-be93-b631b6a2ba78 from (pid=12331) _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:127
n-api.log:2016-10-13 22:09:28.195 INFO nova.osapi_compute.wsgi.server [req-d9327bbd-d333-4d37-8651-57e95d21396b admin admin] 10.61.148.89 "GET /v2.1/servers/b9f97618-085b-4d2b-bc94-34f3b953e2ee HTTP/1.1" status: 200 len: 2018 time: 0.0843861
n-api.log:2016-10-13 22:09:52.232 DEBUG neutronclient.v2_0.client [req-415982d6-9ff4-4c80-99a8-46e1765a58d9 admin admin] GET call to neutron for http://10.61.148.89:9696/v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee&device_id=d6c67c2f-0d21-4ef8-bcfe-eba852ed0cc1 used request id req-645a777a-35df-456e-a982-433e97cdb0e7 from (pid=12331) _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:127
n-api.log:2016-10-13 22:17:04.476 DEBUG neutronclient.v2_0.client [req-3b1c4dff-d9e9-41a5-9719-5bbb7c68085c admin admin] GET call to neutron for http://10.61.148.89:9696/v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee&device_id=d6c67c2f-0d21-4ef8-bcfe-eba852ed0cc1 used request id req-eb8bd6ef-1ecb-4c41-9355-26e4edb84d5c from (pid=12330) _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:127
n-cond.log:2016-10-13 22:09:28.170 WARNING nova.scheduler.utils [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] [instance: b9f97618-085b-4d2b-bc94-34f3b953e2ee] Setting instance to ERROR state.
n-cond.log:2016-10-13 22:09:28.304 DEBUG nova.network.neutronv2.api [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] [instance: b9f97618-085b-4d2b-bc94-34f3b953e2ee] deallocate_for_instance() from (pid=19162) deallocate_for_instance /opt/stack/nova/nova/network/neutronv2/api.py:1154
n-cond.log:2016-10-13 22:09:28.350 DEBUG neutronclient.v2_0.client [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] GET call to neutron for http://10.61.148.89:9696/v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee used request id req-9dc53ce3-1f4e-4619-a22e-ce98a6f1c382 from (pid=19162) _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:127
n-cond.log:2016-10-13 22:09:28.351 DEBUG nova.network.neutronv2.api [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] [instance: b9f97618-085b-4d2b-bc94-34f3b953e2ee] Instance cache missing network info. from (pid=19162) _get_preexisting_port_ids /opt/stack/nova/nova/network/neutronv2/api.py:2133
n-cond.log:2016-10-13 22:09:28.362 DEBUG nova.network.base_api [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] [instance: b9f97618-085b-4d2b-bc94-34f3b953e2ee] Updating instance_info_cache with network_info: [] from (pid=19162) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/base_api.py:43
grep: n-dhcp.log: No such file or directory
n-sch.log:2016-10-13 22:09:28.166 DEBUG nova.filters [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] Filtering removed all hosts for the request with instance ID 'b9f97618-085b-4d2b-bc94-34f3b953e2ee'. Filter results: [('RetryFilter', [(u'i-z78fw9mn', u'i-z78fw9mn')]), ('AvailabilityZoneFilter', [(u'i-z78fw9mn', u'i-z78fw9mn')]), ('RamFilter', [(u'i-z78fw9mn', u'i-z78fw9mn')]), ('DiskFilter', None)] from (pid=19243) get_filtered_objects /opt/stack/nova/nova/filters.py:129
n-sch.log:2016-10-13 22:09:28.166 INFO nova.filters [req-6b5bf92a-ce53-46d4-8965-b54e02d21aef admin admin] Filtering removed all hosts for the request with instance ID 'b9f97618-085b-4d2b-bc94-34f3b953e2ee'. Filter results: ['RetryFilter: (start: 1, end: 1)', 'AvailabilityZoneFilter: (start: 1, end: 1)', 'RamFilter: (start: 1, end: 1)', 'DiskFilter: (start: 1, end: 0)']
q-svc.log:2016-10-13 22:09:28.184 INFO neutron.wsgi [req-2b427b03-67d9-474e-be93-b631b6a2ba78 admin 55a846ac28f847eca8521ff71dea8633] 10.61.148.89 - - [13/Oct/2016 22:09:28] "GET /v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee HTTP/1.1" 200 211 0.038510
q-svc.log:2016-10-13 22:09:28.350 INFO neutron.wsgi [req-9dc53ce3-1f4e-4619-a22e-ce98a6f1c382 admin 55a846ac28f847eca8521ff71dea8633] 10.61.148.89 - - [13/Oct/2016 22:09:28] "GET /v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee HTTP/1.1" 200 211 0.042906
q-svc.log:2016-10-13 22:09:52.233 INFO neutron.wsgi [req-645a777a-35df-456e-a982-433e97cdb0e7 admin 55a846ac28f847eca8521ff71dea8633] 10.61.148.89 - - [13/Oct/2016 22:09:52] "GET /v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee&device_id=d6c67c2f-0d21-4ef8-bcfe-eba852ed0cc1 HTTP/1.1" 200 1241 0.041629
q-svc.log:2016-10-13 22:17:04.477 INFO neutron.wsgi [req-eb8bd6ef-1ecb-4c41-9355-26e4edb84d5c admin 55a846ac28f847eca8521ff71dea8633] 10.61.148.89 - - [13/Oct/2016 22:17:04] "GET /v2.0/ports.json?device_id=b9f97618-085b-4d2b-bc94-34f3b953e2ee&device_id=d6c67c2f-0d21-4ef8-bcfe-eba852ed0cc1 HTTP/1.1" 200 1241 0.044646
Now I have no idea what is going wrong with this instance deployment. Could anyone give me some suggestions?

Some suggestions in order to rule out common problems:
The flavor: Is the flavor you are using the same one you used with cirros? If the answer is yes: does that flavor define a specific size for the root disk? If it does, check the minimum disk size required by the CentOS generic image you are using. Either the image needs a bigger disk, or the disk is too big for your box. So check your available HD space, the flavor specs, and the image specs.
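The n-sch.log lines above actually point this way: DiskFilter goes from 1 host to 0, so the only host was rejected on disk space. A quick way to compare the flavor's root disk against the image's requirements, reusing the IDs from your nova boot command (the instances path below is only an assumption based on a default DevStack layout):
# root disk size (in GB) granted by the flavor
openstack flavor show 75c84ea2-d5b0-4d99-b935-08f654122aa3 -c disk
# minimum disk the image declares, plus its size
openstack image show 997f51bd-1ee2-4cdb-baea-6cef766bf191 -c min_disk -c size
# free space on the compute host (default DevStack instances path assumed)
df -h /opt/stack/data/nova/instances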
Network: Let's rule out neutron. Instead of assigning the network, assign a port. Create a port in neutron, and in the nova boot command assign that port to the VM instead of the network (--nic port-id=port-uuid).
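A rough sketch of that, reusing the flavor, image and net IDs from your nova boot command (the port and instance names are just illustrative):
# create the port up front; security groups can be attached to the port itself if needed
neutron port-create --name centos-port 26c05c99-b82d-403f-a988-fc07d3972b6b
# boot against the pre-created port instead of the network
nova boot --flavor 75c84ea2-d5b0-4d99-b935-08f654122aa3 --image 997f51bd-1ee2-4cdb-baea-6cef766bf191 --nic port-id=<uuid-from-port-create-output> centos-2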
Glance image definition: When you created the glance image from the downloaded qcow2 file, did you include any metadata item that forces the image to request a cinder-based disk? Did you include any metadata at all? If so, get rid of all metadata items on the glance image.
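To check what the image carries, something along these lines should work with the glance v2 client (the property name in the second command is only a placeholder for whatever image-show reveals):
# list all properties the image was registered with
glance image-show 997f51bd-1ee2-4cdb-baea-6cef766bf191
# drop an unwanted property, if one shows up
glance image-update --remove-property <property-name> 997f51bd-1ee2-4cdb-baea-6cef766bf191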
Try to launch a cirros instance again. If cirros boots OK, then it's something with the image (maybe any of the above: glance, flavor, disk space).
Let me know what you find!

Related

DynamoDB local behaving erratically

This is a very strange situation that's driving me nuts, and I would really appreciate some help here.
I am using CDK to define the DynamoDB table and associated indices. To test them locally, I installed cdklocal and DynamoDB local using localstack. When the computer (Mac running Ventura 13.1) is restarted, everything works as expected. Here is the script I use to bootstrap and start the stack (this is in a file called startStack.sh):
docker-compose up -d
echo "Waiting for 5 seconds"
sleep 5
cd test-app
cdklocal bootstrap
echo "Waiting for 5 seconds"
sleep 5
cdklocal deploy TestAppStack
#cdklocal deploy TestAppStack/ops-table
DYNAMO_ENDPOINT="http://localhost:4566/" dynamodb-admin &
open http://0.0.0.0:8001
cd ..
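Not related to the root cause, but if the fixed sleep 5 ever races with LocalStack's startup, a small wait loop is a bit more robust. This is just a sketch that polls the edge port (the HTTP health path differs between LocalStack versions, so only the TCP port is checked here):
# wait until something is listening on the LocalStack edge port
until nc -z localhost 4566; do
  echo "Waiting for LocalStack on port 4566..."
  sleep 2
done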
The test-app directory contains a local copy of the DynamoDB (and associated indices) definition. I do not encounter any errors running the cdklocal (or cdk) deploy commands so I am assuming that the CDK definition is not an issue.
The docker-compose looks like this:
version: "3.8"
services:
localstack:
container_name: AWS-DEVELOPMENT-WITH-LOCALSTACK
image: localstack/localstack:latest
network_mode: bridge
ports:
- "127.0.0.1:53:53"
- "127.0.0.1:53:53/udp"
- "127.0.0.1:443:443"
- "127.0.0.1:4566:4566"
- "127.0.0.1:4571:4571"
- "127.0.0.1:${PORT_WEB_UI-8080}:${PORT_WEB_UI-8080}"
environment:
- DYNAMODB_SHARE_DB=1
- DISABLE_CORS_CHECKS=1
- SERVICES=s3,dynamodb,sns,sqs,firehose,kinesis,ses,sts,cloudformation
- DEBUG=1
- DATA_DIR=/tmp/localstack/data
- PORT_WEB_UI=8080
- LAMBDA_EXECUTOR=local
- KINESIS_ERROR_PROBABILITY=1.0
- DOCKER_HOST=unix:///var/run/docker.sock
- HOST_TMP_FOLDER=./.localstack
volumes:
- './.localstack:/var/lib/localstack'
- '/var/run/docker.sock:/var/run/docker.sock'
Everything works as expected when I first run the startStack.sh file - the dynamodb-admin window opens up correctly and other interfaces can interact with the local DynamoDB table. But after some time (and I have not been able to pinpoint the cause), all interactions with local DynamoDB start failing with the following error(s):
Bootstrapping environment aws://000000000000/us-west-2...
❌ Environment aws://000000000000/us-west-2 failed bootstrapping: UnknownEndpoint: Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
at Request.ENOTFOUND_ERROR (/usr/local/lib/node_modules/aws-sdk/lib/event_listeners.js:611:46)
at Request.callListeners (/usr/local/lib/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/usr/local/lib/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/usr/local/lib/node_modules/aws-sdk/lib/request.js:686:14)
at error2 (/usr/local/lib/node_modules/aws-sdk/lib/event_listeners.js:443:22)
at ClientRequest.<anonymous> (/usr/local/lib/node_modules/aws-sdk/lib/http/node.js:99:9)
at ClientRequest.emit (node:events:513:28)
at ClientRequest.emit (node:domain:489:12)
at Socket.socketErrorListener (node:_http_client:494:9)
at Socket.emit (node:events:513:28) {
code: 'UnknownEndpoint',
region: 'us-west-2',
hostname: 'localhost',
retryable: true,
originalError: [Error],
time: 2023-01-15T06:46:40.614Z
}
Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
The script hangs at the following message:
[16:52:01] Retrieved account ID 000000000000 from disk cache
[16:52:01] Assuming role 'arn:aws:iam::000000000000:role/cdk-hnb659fds-deploy-role-000000000000-us-west-2'.
[16:52:01] Assuming role failed: Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region.
[16:52:01] Could not assume role in target account using current credentials Inaccessible host: `localhost' at port `4566'. This service may not be available in the `us-west-2' region. . Please make sure that this role exists in the account. If it doesn't exist, (re)-bootstrap the environment with the right '--trust', using the latest version of the CDK CLI.
current credentials could not be used to assume 'arn:aws:iam::000000000000:role/cdk-hnb659fds-deploy-role-000000000000-us-west-2', but are for the right account. Proceeding anyway.
[16:52:01] Waiting for stack CDKToolkit to finish creating or updating...
Restarting the computer fixes it, but it's not clear what causes the issue in the first place. Restarting Docker does not help either.
Any thoughts on what could be causing the problem and how I can avoid it?
I'm adding this as an answer; although I do not have a definitive answer, I thought I would try to help.
I believe your port is being occupied, so the process you are running is unable to obtain it, resulting in the error. Before running the job, check whether the port is occupied:
sudo lsof -i :4566
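If that shows a listener that is not your LocalStack container, you can narrow it down to the owning process and free the port (the PID is whatever lsof reports):
sudo lsof -nP -iTCP:4566 -sTCP:LISTEN
kill <pid-reported-by-lsof>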

Airflow server constantly restarting - Signal 15

I launch an airflow webserver command on my local machine to start an airflow instance on port 8081. The server starts, however the prompt constantly shows some warning messages in a loop. No error message appears, but the server doesn't work. These are the messages:
/usr/local/lib/python3.8/dist-packages/airflow/configuration.py:361 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
/usr/local/lib/python3.8/dist-packages/airflow/configuration.py:361 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.
/usr/local/lib/python3.8/dist-packages/airflow/configuration.py:361 DeprecationWarning: The processor_poll_interval option in [scheduler] has been renamed to scheduler_idle_sleep_time - the old setting has been used, but please update your config.
[2022-06-13 15:11:57,355] {manager.py:779} WARNING - No user yet created, use flask fab command to do it.
[2022-06-13 15:12:01,925] {manager.py:512} WARNING - Refused to delete permission view, assoc with role exists DAG Runs.can_create User
[2022-06-13 15:12:19 +0000] [1117638] [INFO] Handling signal: ttou
[2022-06-13 15:12:19 +0000] [1120256] [INFO] Worker exiting (pid: 1120256)
[2022-06-13 15:12:19 +0000] [1117638] [WARNING] Worker with pid 1120256 was terminated due to signal 15
[2022-06-13 15:12:22 +0000] [1117638] [INFO] Handling signal: ttin
[2022-06-13 15:12:22 +0000] [1121568] [INFO] Booting worker with pid: 1121568
Do you know what could be happening?
Thank you in advance!

Task fails due to not being able to read log file

Composer is failing a task because it is unable to read a log file; it's complaining about an incorrect encoding.
Here's the log that appears in the UI:
*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)
*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I tried viewing the file in the Google Cloud console and it also throws an error:
Failed to load
Tracking Number: 8075820889980640204
But I am able to download the file via gsutil.
When I view the file, it seems to have text overriding other text.
I can't show the entire file but it looks like this:
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
Where the #-#{} pieces seem to be "on top of" the typical log.
I faced the same problem. In my case the problem was that I had removed the google_cloud_default connection that was being used to retrieve the logs.
Check the configuration and look for the connection name.
[core]
remote_log_conn_id = google_cloud_default
Then check that the credentials used for that connection have the right permissions to access the GCS bucket.
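As a quick sanity check (the bucket name and service account below are placeholders for your own values), you can dump the bucket's IAM policy and confirm the connection's service account shows up under a role that allows object reads, such as roles/storage.objectViewer:
gsutil iam get gs://your-composer-logs-bucket
gsutil iam get gs://your-composer-logs-bucket | grep -B 2 -A 2 your-sa@your-project.iam.gserviceaccount.com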
I'm having a similar problem with viewing logs in GCP Cloud Composer. It doesn't appear to be preventing the failing DAG task from running, though. What it looks like is a permissions error between GKE and the storage bucket where the log files are kept.
You can still view the logs by going into your cluster's storage bucket in the same directory as your /dags folder where you should also see a logs/ folder.
Your helm chart should set up a global env var:
- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
value: "google-cloud-platform://"
Then you should deploy a Dockerfile with the root account only (not the airflow account); additionally, set your helm uid and gid as:
uid: 50000 #airflow user
gid: 50000 #airflow group
Then upgrade the helm chart with the new config.
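For example (release name, chart reference and namespace are whatever you originally deployed with):
helm upgrade airflow apache-airflow/airflow -n airflow -f values.yaml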
*** Unable to read remote log from gs://bucket
1) Found the solution after assigning the roles to the service account.
2) The SA key (json or txt) has to be added and configured for the connection referenced in remote_log_conn_id = google_cloud_default.
3) Restart the scheduler and webserver of airflow.
4) Restart the DAGs in airflow.
You can then find the logs in the GCS bucket where they are configured.

openstack cinder error on liberty

I have an install of Liberty RDO OpenStack. However, when I attempt:
[root@controller ~(keystonerc_admin:admin)]# cinder --insecure quota-defaults edc8225a13404a00b44d8099e060c3d5
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
InsecureRequestWarning)
ERROR: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-aee74e5b-b9da-460a-a4b1-14f67c165e48)
In Horizon, this error manifests itself as:
Error: Unable to retrieve volume limit information.
When navigating to horizon -> admin -> defaults.
The cinder logs show:
2016-03-10 02:07:19.970 30161 WARNING keystoneclient.auth.identity.generic.base [req-89efb8d4-299b-4cf6-bca3-386f6c4e9348 9bf9e8f990624c2ca0c08c1bf02edbdb edc8225a13404a00b44d8099e060c3d5 - - -] Discovering versions from the identity service failed when creating the password plugin. Attempting to determine version from URL.
2016-03-10 02:07:19.970 30161 ERROR cinder.api.middleware.fault [req-89efb8d4-299b-4cf6-bca3-386f6c4e9348 9bf9e8f990624c2ca0c08c1bf02edbdb edc8225a13404a00b44d8099e060c3d5 - - -] Caught error: Could not determine a suitable URL for the plugin
2016-03-10 02:07:19.971 30161 INFO cinder.api.middleware.fault [req-89efb8d4-299b-4cf6-bca3-386f6c4e9348 9bf9e8f990624c2ca0c08c1bf02edbdb edc8225a13404a00b44d8099e060c3d5 - - -] http://192.168.33.11:8776/v2/edc8225a13404a00b44d8099e060c3d5/os-quota-sets/edc8225a13404a00b44d8099e060c3d5/defaults returned with HTTP 500
2016-03-10 02:07:19.972 30161 INFO eventlet.wsgi.server [req-89efb8d4-299b-4cf6-bca3-386f6c4e9348 9bf9e8f990624c2ca0c08c1bf02edbdb edc8225a13404a00b44d8099e060c3d5 - - -] 192.168.33.11 - - [10/Mar/2016 02:07:19] "GET /v2/edc8225a13404a00b44d8099e060c3d5/os-quota-sets/edc8225a13404a00b44d8099e060c3d5/defaults HTTP/1.1" 500 425 0.082927
My cinder config:
[root@controller ~(keystonerc_admin:admin)]# cat /etc/cinder/cinder.conf | grep -vE '(^$|^\#)'
[DEFAULT]
my_ip=192.168.33.11
auth_strategy=keystone
debug=True
verbose=True
rpc_backend=rabbit
glance_host=192.168.33.11
enabled_backends=lvm
[BRCD_FABRIC_EXAMPLE]
[CISCO_FABRIC_EXAMPLE]
[cors]
[cors.subdomain]
[database]
connection=mysql://cinder:change_me@192.168.33.11/cinder
[fc-zone-manager]
[keymgr]
encryption_auth_url=http://localhost:5000/v3
[keystone_authtoken]
insecure=True
auth_uri=https://192.168.33.11:5000
auth_url=https://192.168.33.11:35357
auth_plugin=password
project_domain_id=default
user_domain_id=default
project_name=service
username=cinder
password=change_me
[matchmaker_redis]
[matchmaker_ring]
[oslo_concurrency]
lock_path=/var/lib/cinder/tmp
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_host=192.168.33.11
rabbit_userid=openstack
rabbit_password=change_me
[oslo_middleware]
[oslo_policy]
[oslo_reports]
[profiler]
[lvm]
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group=cinder-volumes
iscsi_protocol=iscsi
iscsi_helper=lioadm
This looks like it could be this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1272572
I don't know how RDO deploys OpenStack, but it looks like you are using the v3 Identity API:
encryption_auth_url=http://localhost:5000/v3
[keystone_authtoken]
insecure=True
auth_uri=https://192.168.33.11:5000
auth_url=https://192.168.33.11:35357
These unversioned auth endpoints present an HTTP 300 'Multiple Choices' response, so they can work with both the cinder python client (v2.0) and the common openstack client (v3).
I would determine what your default keystone endpoint is (no version in the endpoint = v3, otherwise /v2.0).
Also check which version of the identity API Horizon is using ('USE_IDENTITY_API = X' in local_settings.py).
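Two quick ways to check both points from the controller (the local_settings path is the usual one for an RDO/packstack install):
# does the identity endpoint registered in the catalog carry a version suffix?
openstack catalog show identity
# what identity-related settings is Horizon running with?
grep -iE 'identity|api_version' /etc/openstack-dashboard/local_settings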
The newer common openstack client uses a different syntax for quotas if you are on identity API v3:
openstack quota set
# Compute settings
[--cores <num-cores>]
[--fixed-ips <num-fixed-ips>]
[--floating-ips <num-floating-ips>]
[--injected-file-size <injected-file-bytes>]
[--injected-files <num-injected-files>]
[--instances <num-instances>]
[--key-pairs <num-key-pairs>]
[--properties <num-properties>]
[--ram <ram-mb>]
# Volume settings
[--gigabytes <new-gigabytes>]
[--snapshots <new-snapshots>]
[--volumes <new-volumes>]
<project>
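So, for example, raising the volume quotas for your project with the common client would look something like this (the numbers are arbitrary examples):
openstack quota set --volumes 20 --gigabytes 1000 edc8225a13404a00b44d8099e060c3d5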

Failed to add new Host through Cloudera Manager

We're running RedHat 6.4 on 2 of our nodes.
We've installed the new Cloudera Manager 5.5.0 and we've been trying to create a cluster and add a first node to it (the node is initially clean of any Cloudera components). Unfortunately, during the cluster installation, Cloudera Manager gets stuck every time at:
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
Ensure that ports 9000 and 9001 are not in use on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
We looked around and saw that this is usually caused by a misconfigured /etc/hosts file. So we edited ours on both the Cloudera Manager node and the new node, did a service network restart as well as a service cloudera-scm-server restart, but it didn't work either.
Here's what the /etc/hosts file looks like :
127.0.0.1 localhost
10.186.80.86 domain.node2.fr.net host
10.186.80.105 domain.node1.fr.net mgrnode
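To work through the checklist from the error message on the node being added, a few quick checks (hostnames taken from the /etc/hosts above; nc may need to be installed) could be:
# does the node resolve its own name and the manager's name consistently?
hostname -f
getent hosts domain.node2.fr.net domain.node1.fr.net
# is port 7182 on the Cloudera Manager server reachable from this node?
nc -zv domain.node1.fr.net 7182
# are ports 9000/9001 already in use on this node?
netstat -tlnp | grep -E ':(9000|9001) '
# does the agent expect TLS?
grep use_tls /etc/cloudera-scm-agent/config.ini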
We also tried some cleaning up before relaunching the cluster creation by deleting scm_prepare_node.* and .scm_prepare_node.lock.
We also looked at service cloudera-scm-agent status on the new node after each installation failure, and we noticed that the service isn't running (even when we do a service restart, the result is still the same):
service cloudera-scm-agent start
Starting cloudera-scm-agent: [ OK ]
service cloudera-scm-agent status
cloudera-scm-agent dead but pid file exists
Here's the agent logs on the new node side :
tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Agent Logging Level: INFO
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO No command line vars
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Missing database jar: /usr/share/java/mysql-connector-java.jar (normal, if you're not using this database type)
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Agent starting as pid 24529 user cloudera-scm(420) group cloudera-scm(207).
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Because agent not running as root, all processes will run with current user.
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent WARNING Expected mode 0751 for /var/run/cloudera-scm-agent but was 0755
[30/Nov/2015 15:07:27 +0000] 24529 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent
[30/Nov/2015 15:07:29 +0000] 24529 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/cgroups
Is there anything we're doing wrong?
Thanks in advance for your help!
This time we just created the cluster with the root user (we didn't check the single user mode).
Besides, our host had no internet access, and since we had created our own repository we needed to do one last step before launching the cluster creation, which is importing the GPG key on the host using this command:
sudo rpm --import
If anybody finds themselves facing the same problem, hope this helps!
