OS::Heat::SoftwareDeployment is staying stuck in CREATE_IN_PROGRESS status - openstack

I am trying customise new instances created within openstack mikata, using HEAT templates. Using OS::Nova::Server with a script in user_data works fine.
Next the idea is to do additional steps via OS::Heat::SoftwareConfig.
The config is:
type: OS::Nova::Server
....
user_data_format: SOFTWARE_CONFIG
user_data:
str_replace:
template:
get_file: vm_init1.sh
config1:
type: OS::Heat::SoftwareConfig
depends_on: vm
properties:
group: script
config: |
#!/bin/bash
echo "Running $0 OS::Heat::SoftwareConfig look in /var/tmp/test_script.log" | tee /var/tmp/test_script.log
deploy:
type: OS::Heat::SoftwareDeployment
properties:
config:
get_resource: config1
server:
get_resource: vm
The instance is setup nicely (the script vm_init1.sh above runs fine) and one can login, but he "config1" example above is never executed.
Analysis
- The base image is Ubuntu 16.04, created with disk-image-create and including "vm ubuntu os-collect-config os-refresh-config os-apply-config heat-config heat-config-script"
- From "openstack stack resource list $vm" one see that deployment never fisnihe, with OS::Heat::SoftwareDeployment status=CREATE_IN_PROGRESS
- "openstack stack resource show $vm config1" shows resource_status=CREATE_COMPLETE
- Within the vm, /var/log/cloud-init-output.log shows the output of the script vm_init1.sh, but no trace of the 'config1' script. The log os-apply-config.log is empty, is that normal?
How does one troubleshoot OS::Heat::SoftwareDeployment configs?
(I have read https://docs.openstack.org/developer/heat/template_guide/software_deployment.html#software-deployment-resources)

Related

Azure Form Recognizer Label Tool Docker: Missing EULA=accept command line option. You must provide this to continue

I am trying to run the Azure Forms Recognizer Label Tool in Azure Container instance.
I have followed the instructions given in here.
I was able to deploy the container image but when I try to start it, it terminates with the following message:
Missing EULA=accept command line option. You must provide this to continue.
This quite surprising, because this option has been specified in my YAML file (see below).
What can I do to fix this?
My YAML file:
apiVersion: 2018-10-01
location: West Europe
name: renecognitiveservice
imageRegistryCredentials: # This is required when pulling a non-public image
- server: mcr.microsoft.com
username: xxx
password: xxx
properties:
containers:
- name: xxxeamlabelingtool
properties:
image: mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool
environmentVariables: # These env vars are required
- name: eula
value: accept
- name: billing
value: https://rk-formsrecognizer.cognitiveservices.azure.com/
- name: apikey
value: xxx
resources:
requests:
cpu: 2 # Always refer to recommended minimal resources
memoryInGb: 4 # Always refer to recommended minimal resources
ports:
- port: 5000
osType: Linux
restartPolicy: OnFailure
ipAddress:
type: Public
ports:
- protocol: tcp
port: 5000
tags: null
type: Microsoft.ContainerInstance/containerGroups
Apparently you can run it with command:
"command": [
"./run.sh", "eula=accept"
],
Worked from the portal
https://github.com/MicrosoftDocs/azure-docs/issues/46623
This is what you want to add in the Azure portal while creating the container instance.
You will find this in the "Advanced" tab.
Afterwards you can access the IP address of that instance to open the label-tool.
"./run.sh", "eula=accept"

Upgrading Kubernetes NGINX to use StackDriver new resource model in External Metrics

I have successfully set up NGINX as an ingress for my Kubernetes cluster on GKE. I have enabled and configured external metrics (and I am using an external metric in my HPA for auto-scaling). All good there and it's working well.
However, I have a deprecation warning in StackDriver around these external metrics. I have come to discover that these warnings are because of "old" resource types being used.
For example, using this command:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections" | jq
I get this output:
{
"metricName": "custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections",
"metricLabels": {
"metric.labels.controller_class": "nginx",
"metric.labels.controller_namespace": "ingress-nginx",
"metric.labels.controller_pod": "nginx-ingress-controller-[snip]",
"metric.labels.state": "writing",
"resource.labels.cluster_name": "[snip]",
"resource.labels.container_name": "",
"resource.labels.instance_id": "[snip]",
"resource.labels.namespace_id": "ingress-nginx",
"resource.labels.pod_id": "nginx-ingress-controller-[snip]",
"resource.labels.project_id": "[snip]",
"resource.labels.zone": "[snip]",
"resource.type": "gke_container"
},
"timestamp": "2020-01-26T05:17:33Z",
"value": "1"
}
Note that the "resource.type" field is "gke_container". As of the next version of Kubernetes this needs to be "k8s_container".
I have looked through the Kubernetes NGINX configuration to try to determine when (or if) an upgrade has been made to support the new StackDriver resource model, but I have failed so far. And I would rather not "blindly" upgrade NGINX if I can help it (even in UAT).
These are the Docker images that I am currently using:
quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.2
gcr.io/google-containers/prometheus-to-sd:v0.9.0
gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0
Could anyone help out here?
Thanks in advance,
Ben
Ok this has nothing to do with NGINX and everything to do with Prometheus (and specifically the Prometheus sidecar prometheus-to-sd).
For future readers if your Prometheus start-up looks like this:
- name: prometheus-to-sd
image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
ports:
- name: profiler
containerPort: 6060
command:
- /monitor
- --stackdriver-prefix=custom.googleapis.com
- --source=nginx-ingress-controller:http://localhost:10254/metrics
- --pod-id=$(POD_NAME)
- --namespace-id=$(POD_NAMESPACE)
Then is needs to look like this:
- name: prometheus-to-sd
image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
ports:
- name: profiler
containerPort: 6060
command:
- /monitor
- --stackdriver-prefix=custom.googleapis.com
- --source=nginx-ingress-controller:http://localhost:10254/metrics
- --monitored-resource-type-prefix=k8s_
- --pod-id=$(POD_NAME)
- --namespace-id=$(POD_NAMESPACE)
That is, include the --monitored-resource-type-prefix=k8s_ option.

Write a playbook, which access the network devices present in host file and check for host OS of the device (nexus/cisco/arista)

I am writing a playbook, which dynamically enters the network servers provided by user at prompt in host file.Further i want to create a task which will ssh on the server one by one and run show version command to fetch the OS of those network devices.
As of now, I am getting the below error:
1."Unable to automatically determine host network os. Please manually configure ansible_network_os value for this host"
I dont want to define the OS beforehand, rather want the playbook to do it.
I tried entering some configurations in host file as below:
[device]
192.1xx.xxx.xx
[all:vars]
ansible_connection=network_cli
ansible_user=user
ansible_ssh_pass=password
ssh_args = -o GSSAPIAuthentication=no
The below plsybook is taking device name from the user and adding it to the file, further it need to ssh on those servers and fetch the OS details:
---
- name: add host dynamically
hosts: localhost
gather_facts: no
vars:
Server: null_val1
vars_prompt:
- name: "Server"
prompt: "Please enter the non-reporting server/IP"
private: no
default: null_val1
pre_tasks:
- name: Save variable
set_fact:
Server: "{{Server}}"
- name: Copying servers to host file
hosts: localhost
become: true
tasks:
- name: copying variable value
lineinfile:
path: "{{playbook_dir}}/hosts.txt"
line: "{{Server}}"
insertafter: '^\[device\]'
state: present
- name: Main execution task
hosts: device
gather_facts: true
tasks:
- name: Run command on devices
cli_command:
command: show version
register: result
- name: display result
debug:
var: result.stdout_lines
The actual result should show me the OS of network device(s) entered in host file by running show version command remotely.

SSH connectivity issues with ntc-ansible modules

I am trying to using the ntc-ansible module with Ansible running on Ubuntu (WSL). I have ssh connectivity to my remote device (Cisco 2960X) and I can run ansible playbooks to the same remote switch using the built in Ansible networking modules (ios_command) and it works fine.
Issue:
When I try to run any of the ntc-ansible modules, it fails, unable to connect to the device. Probably something simple, but I have hit a wall. There is something I am missing about how to use ntc-ansible modules. Ansible is seeing the modules as I can look at the docs as was suggested as a test in the readme.
I have ntc-ansible module installed here: /home/melshman/.ansible/plugins/modules/ntc-ansible
I am running my playbooks from here: ~/projects/ansible/
The first time I ran the playbook with the ntc-ansible modules it failed and based on error message and some research I installed sshpass (sudo apt-get install sshpass). But still having ssh problems using ntc-ansible… (playbook and traceback below)
I hear folks taking about an index file, but I can’t find that file? Where does it live and what do I need to do with it?
What is my connection supposed to be setup to be? Local? SSH? Netmiko_ssh?
What should I be using for platform? Cisco_ios? cisco_ios_ssh?
Appreciate any help I can get. I have been running in circles for hours and hours.
Ansible Version Info:
VTMNB17024:~/projects/ansible $ ansible --version
ansible 2.5.3
config file = /home/melshman/projects/ansible/ansible.cfg
configured module search path = [u'/home/melshman/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 2.7.12 (default, Dec 4 2017, 14:50:18) [GCC 5.4.0 20160609]
Working playbook (ios_command:) note: ansible_ssh_pass and ansible_user in group var:
- name: Test Net Automation
hosts: ctil-ios-upgrade
connection: local
gather_facts: no
tasks:
- name: Grab run config
ios_command:
commands:
- show run
register: config
- name: Create backup of running configuration
copy:
content: "{{config.stdout[0]}}"
dest: "backups/show_run_{{inventory_hostname}}.txt"
Playbook (not working) using ntc-ansible module (Note: username and password are defined in Group VAR:
- name: Cisco IOS Automation
hosts: ctil-ios-upgrade
connection: local
gather_facts: no
tasks:
- name: GET UPTIME
ntc_show_command:
connection: ssh
platform: "cisco_ios"
command: 'show version | inc uptime'
host: "{{ inventory_hostname }}"
username: "{{ username }}"
password: "{{ password }}"
use_templates: True
template_dir: /home/melshman/.ansible/plugins/modules/ntc-ansible/ntc-templates/templates
Here is the traceback I get when the error occurs:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: netmiko.ssh_exception.NetMikoTimeoutException: Connection to device timed-out: cisco_ios VTgroup_SW:22
fatal: [VTgroup_SW]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_RJRY9m/ansible_module_ntc_save_config.py\", line 279, in \n main()\n File \"/tmp/ansible_RJRY9m/ansible_module_ntc_save_config.py\", line 251, in main\n device = ntc_device(device_type, host, username, password, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/pyntc-0.0.6-py2.7.egg/pyntc/__init__.py\", line 35, in ntc_device\n return device_class(*args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/pyntc-0.0.6-py2.7.egg/pyntc/devices/ios_device.py\", line 39, in __init__\n self.open()\n File \"/usr/local/lib/python2.7/dist-packages/pyntc-0.0.6-py2.7.egg/pyntc/devices/ios_device.py\", line 55, in open\n verbose=False)\n File \"build/bdist.linux-x86_64/egg/netmiko/ssh_dispatcher.py\", line 178, in ConnectHandler\n File \"build/bdist.linux-x86_64/egg/netmiko/base_connection.py\", line 207, in __init__\n File \"build/bdist.linux-x86_64/egg/netmiko/base_connection.py\", line 693, in establish_connection\nnetmiko.ssh_exception.NetMikoTimeoutException: Connection to device timed-out: cisco_ios VTgroup_SW:22\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}
Here is a working solution using ntc_show_command to a Cisco IOS device.
- name: Cisco IOS Automation
hosts: pynet-rtr1
connection: local
gather_facts: no
tasks:
- name: GET UPTIME
ntc_show_command:
connection: ssh
platform: "cisco_ios"
command: 'show version'
host: "{{ ansible_host }}"
username: "{{ ansible_user }}"
password: "{{ ansible_ssh_pass }}"
use_templates: True
template_dir: '/home/kbyers/ntc-templates/templates'
If you are going to use ntc-templates, I probably would not have the '| include uptime' in the 'show version'. In other words, let TextFSM convert the output to structured data first and then grab the uptime from that structured data.
I modified inventory_hostname to ansible_host to be consistent with my inventory format (my inventory_hostname doesn't actually resolve in DNS).
I modified username and password to 'ansible_user' and 'ansible_ssh_pass' to be consistent with my inventory and also to be more consistent with Ansible 2.5/2.6 variable naming.
On your above issue, your exception message does not match your playbook (i.e. are you sure that is the exception you get for that playbook).
Here is my inventory file (I simplified this to remove some unnecessary devices and to hide confidential information)
[all:vars]
ansible_connection=local
ansible_python_interpreter=/home/kbyers/VENV/ansible/bin/python
ansible_user=user
ansible_ssh_pass=password
[local]
localhost ansible_connection=local
[cisco]
pynet-rtr1 ansible_host=cisco1.domain.com
pynet-rtr2 ansible_host=cisco2.domain.com

Locating a valid NGINX template for nginx-ingress-controller? (Kubernetes)

I am trying to follow this tutorial on configuring nginx-ingress-controller for a Kubernetes cluster I deployed to AWS using kops.
https://daemonza.github.io/2017/02/13/kubernetes-nginx-ingress-controller/
When I run kubectl create -f ./nginx-ingress-controller.yml, the pods are created but error out. From what I can tell, the problem lies with the following portion of nginx-ingress-controller.yml:
volumes:
- name: tls-dhparam-vol
secret:
secretName: tls-dhparam
- name: nginx-template-volume
configMap:
name: nginx-template
items:
- key: nginx.tmpl
path: nginx.tmpl
Error shown on the pods:
MountVolume.SetUp failed for volume "nginx-template-volume" : configmaps "nginx-template" not found
This makes sense, because the tutorial does not have the reader create this configmap before creating the controller. I know that I need to create the configmap using:
kubectl create configmap nginx-template --from-file=nginx.tmpl=nginx.tmpl
I've done this using nginx.tmpl files found from sources like this, but they don't seem to work (always fail with invalid NGINX template errors). Log example:
I1117 16:29:49.344882 1 main.go:94] Using build: https://github.com/bprashanth/contrib.git - git-92b2bac
I1117 16:29:49.402732 1 main.go:123] Validated default/default-http-backend as the default backend
I1117 16:29:49.402901 1 main.go:80] mkdir /etc/nginx-ssl: file exists already exists
I1117 16:29:49.402951 1 ssl.go:127] using file '/etc/nginx-ssl/dhparam/dhparam.pem' for parameter ssl_dhparam
F1117 16:29:49.403962 1 main.go:71] invalid NGINX template: template: nginx.tmpl:1: function "where" not defined
The image version used is quite old, but I've tried newer versions with no luck.
containers:
- name: nginx-ingress-controller
image: gcr.io/google_containers/nginx-ingress-controller:0.8.3
This thread is similar to my issue, but I don't quite understand the proposed solution. Where would I use docker cp to extract a usable template from? Seems like the templates I'm using use a language/syntax incompatible with Docker...?
To copy the nginx template file from the ingress controller pod to your local machine, you can first grab the name of the pod with kubectl get pods then run kubectl exec [POD_NAME] -it -- cat /etc/nginx/template/nginx.tmpl > nginx.tmpl.
This will leave you with the nginx.tmpl file you can then edit and push back up as a configmap. I would recommend though keeping custom changes to the template to a minimum as it can make it hard for you to update the controller in the future.
Hope this helps!

Resources