Ansible async module with poll=0 doesn't finish the task - asynchronous

First of all, I'm using Ansible 2.0.0 (which I can't avoid).
I have a task like this. Here I'm using the ping command to send traffic to a destination machine for 2 minutes. The command runs on a remote machine.
- name: Ping destination VM from host VM.
  shell: ping -n -i 0.004 100.1.1.1 -c 30000 | grep "received" | cut -d"," -f2 | xargs | cut -d" " -f1
  delegate_to: 172.25.11.207
  async: 250
  poll: 0
  register: ping_result
  failed_when: ping_result.rc != 0
I have 2-3 other tasks after this, which should not take more than a minute.
Now, after these tasks, I want to capture the output of ping_result. So I check the status of the above task as below:
- name: Check ping status
  async_status:
    jid: "{{ ping_result.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 10
Now this fails with the below error:
FAILED! => {"failed": true, "msg": "ERROR! The conditional check 'job_result.finished' failed. The error was: ERROR! error while evaluating conditional (job_result.finished): ERROR! 'dict object' has no attribute 'finished'"}
From the error, it looks like the original task has not finished. I even tried increasing the async time, to as much as 5000. No luck.
Any help on this would be appreciated.

Answer from the comment:
You should use delegate_to on the async_status task when checking the job, because the original async task was delegated as well.

The following worked for me:
until: job_result.finished is defined and job_result.finished
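Putting both fixes together, the status check would look roughly like this (a sketch only; the delegate_to address is assumed to be the same host the original ping task was delegated to, and the delay value is an assumption chosen to cover the 2-minute ping):

- name: Check ping status
  async_status:
    jid: "{{ ping_result.ansible_job_id }}"
  delegate_to: 172.25.11.207   # same host the async ping was delegated to
  register: job_result
  until: job_result.finished is defined and job_result.finished
  retries: 10
  delay: 15   # assumption: 10 retries x 15 s comfortably outlasts the 2-minute ping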

Related

Ansible async_status fails when the async call interrupts the target network connection

How can I call a task that temporarily interrupts network connectivity to the target when it runs?
I wrote a PowerShell Ansible module that modifies the target Windows network settings, causing it to lose connectivity for around 30 seconds. The connection is to a Windows 2019 server via WinRM. I need to know whether the task succeeded or reported errors.
If I call the task normally, it fails with host unreachable when the network is gone for longer than the read timeout (30 seconds). The operation may have succeeded, but I don't have access to its return values.
- win_vswitch:
    Name: "SomeSwitch"
    VLAN: 123
    state: present
Seems like a good use of async:
- name: Configure VSwitch
  win_vswitch:
    Name: "SomeSwitch"
    VLAN: 123
    state: present
  async: 600
  poll: 0
  register: async_result

- name: vswitch - wait for the async task to finish
  async_status:
    jid: "{{ async_result.ansible_job_id }}"
  register: task_result
  until: task_result.finished
  # ignore_unreachable: yes
  retries: 60
  delay: 5
  # failed_when: False
The problem is that if the async_status poll cannot get a network connection to the target for more than read_timeout seconds, it fails with a host unreachable error, even if the async_status timeouts haven't expired.
Seems to me that I just need to set the read timeout to a higher value. And that's where I'm stuck.
Things I've tried:
timeout in ansible.cfg
ansible_winrm_read_timeout_sec & ansible_winrm_operation_timeout_sec in the inventory host item
ignore_unreachable & failed_when in the async_status module
rescue clause around the async_status
Suggestions?
TIA, Jeff
You can define ansible_winrm_connection_timeout (and other ansible_winrm_* values as well) by defining vars on the task itself:
- name: vswitch - wait for the async task to finish
  async_status:
    jid: "{{ async_result.ansible_job_id }}"
  register: task_result
  until: task_result.finished
  retries: 60
  delay: 5
  vars:
    ansible_winrm_connection_timeout: 300
This may be preferable to using the win_wait_for task, depending upon the use case. win_wait_for might be fine when the task drops network connectivity immediately; however, if the task causes a drop at an unpredictable time (like a task installing network drivers), raising the connection timeout on the async_status poll as above is the more robust option.
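For the immediate-drop case just mentioned, a rough sketch of that alternative pattern (assumptions: the vswitch change has been fired asynchronously as in the question, WinRM listens on the default HTTPS port 5986, and a plain wait_for delegated to the controller stands in for win_wait_for, since the target itself cannot run a module while its network is down):

- name: vswitch - wait for WinRM to come back before polling
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"   # assumption about inventory layout
    port: 5986          # assumption: WinRM over HTTPS
    delay: 30           # give the network change time to take effect
    timeout: 600
  delegate_to: localhost

- name: vswitch - collect the async result
  async_status:
    jid: "{{ async_result.ansible_job_id }}"
  register: task_result
  until: task_result.finished
  retries: 60
  delay: 5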

How to do arithmetic on registered variables in ansible?

I'm writing a health-check playbook, and when a host is clustered (VCS), I want to make sure all cluster Service Groups are running.
The output of hastatus looks like this:
[root@node1 ~]# hastatus -sum
-- SYSTEM STATE
-- System               State          Frozen
A  node1                RUNNING        0
A  node2                RUNNING        0

-- GROUP STATE
-- Group           System   Probed   AutoDisabled   State
B  ClusterService  node1    Y        N              ONLINE
B  ClusterService  node2    Y        N              OFFLINE
B  NFSExport       node1    Y        N              ONLINE
B  NFSExport       node2    Y        N              OFFLINE
B  Database        node1    Y        N              ONLINE
B  Database        node2    Y        N              OFFLINE
B  Application     node1    Y        N              OFFLINE
B  Application     node2    Y        N              ONLINE
[root@node1 ~]#
A Service Group can run on any cluster node, and the status of every service group is reported for every cluster node, so the actual number of services groups is (servicegroups / nodes).
I've tried with and without the double braces {{ }}, but no matter what, the last debug task always produces a divide-by-zero error.
Any help would be appreciated.
# START OF BLOCK
- name: Check cluster status
  block:
    - name: How many cluster nodes?
      shell: hastatus -sum|grep "^A"|wc -l
      register: numnodes

    - name: How many running cluster nodes?
      shell: hastatus -sum|grep "^A"|grep "RUNNING"|wc -l
      register: numrunningnodes

    - name: report if not all nodes are running
      debug:
        msg: "ACTION: Not all cluster nodes are running!"
      when: numnodes.stdout != numrunningnodes.stdout

    # The number of cluster Service Groups == totalsgs / numnodes
    - name: How many SGs ("B" lines)?
      shell: hastatus -sum|grep "^B"|wc -l
      register: totalsgs

    - name: How many running SGs?
      shell: hastatus -sum|grep "^B"|grep "RUNNING"|wc -l
      register: runningsgs

    - name: Is everything running somewhere?
      debug:
        msg: "ACTION: Not all SGs are running!"
      when: {{ runningsgs.stdout|int }} != {{ totalsgs.stdout|int / numnodes.stdout|int }}
It's the second worst Ansible you can write. If you throw in include_role with when and loop on top of that, you will have the worst of the worst.
Ansible is not designed to be a good algorithmic language. It's really good at doing side effects and juggling inventories, but it's terrible at doing math. You can do it, but it will be unreadable, non-debuggable, non-testable, and your namespace will be littered with global variables (which will surely bite you later, when you least expect it).
How to do it right? The best way is to write your own module, which is easier than you think (if you know Python).
The second best (which may be even better than a custom module for small projects) is to use the script or command module.
Just shovel the data into the script via stdin, and get well-processed data back from stdout. The main trick is to produce the stdout output in JSON format and parse it with the |from_json filter.
This is an example of using command to parse the data:
- name: Get data from somewhere
  shell: hastatus -sum
  changed_when: false
  register: hastatus_cmd

- name: Process data
  delegate_to: localhost
  command:
    cmd: process_hastatus_output.py
    stdin: '{{ hastatus_cmd.stdout }}'
  changed_when: false
  register: hastatus_data

- name: Use it
  command: ...?...
  when: hastatus.running != hastatus.nodes
  vars:
    hastatus: '{{ hastatus_data.stdout|from_json }}'
For process_hastatus_output.py you can write tests, you can run it without Ansible to check edge cases, and you'll have a beautiful, cosy language in which to transform your data.
Or, you can do it in a mix of Jinja and Ansible, causing irreparable harm to yourself and everyone reading your code later.
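For completeness, the pure Jinja/Ansible version of the original conditional would look roughly like this (a sketch; note that the int filter silently returns 0 when the registered stdout is empty or non-numeric, which is the usual way the divisor ends up as 0, so guarding it is not optional):

- name: Is everything running somewhere?
  debug:
    msg: "ACTION: Not all SGs are running!"
  when:
    - numnodes.stdout | int > 0        # guard against a zero divisor
    - runningsgs.stdout | int != (totalsgs.stdout | int) // (numnodes.stdout | int)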

Salt stack execute a command/invoke a different state upon state failure

I am trying to run multiple states in an sls file and I have a requirement to execute a command upon failure of a state.
e.g.
test_cmd1:
  cmd.run:
    - name: |
        echo 'Command 1'

test_cmd2:
  cmd.run:
    - name: |
        echo 'Command 2'

on_fail_command:
  cmd.run:
    - name: |
        echo 'On failure'
        exit 1
I want on_fail_command to be executed when either test_cmd1 or test_cmd2 fails, but not when both test commands execute successfully. I have failhard set to True globally in our system.
I tried using onfail, but it does not behave the way I want: onfail executes a state if any of the states listed under it fails, whereas here I am looking to skip the remaining states when one fails and instead jump straight to on_fail_command and then exit.
Set the order of your on_fail_command state so it runs before anything else, and failhard so it fails the whole job.
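For reference, a minimal sketch of how order and failhard are attached to a state in the SLS above (keyword placement only; adjust the ordering to match the behaviour described in the answer):

on_fail_command:
  cmd.run:
    - name: |
        echo 'On failure'
        exit 1
    - order: 1          # run this state before anything else
    - failhard: True    # a failure here aborts the whole run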

Why is Ansible not failing this string in stdout conditional?

I am running Ansible version 2.7 on CentOS 7 using the network_cli connection method.
I have a playbook that:
Instructs a networking device to pull in a new firmware image via TFTP
Instructs the networking device to calculate the md5 hash value
Stores the output of the calculation in .stdout
Has a conditional when: statement that checks for a given md5 value in the .stdout before proceeding with the task block.
No matter what md5 value I give, it always runs the task block.
The conditional statement is:
when: '"new_ios_md5" | string in md5_result.stdout'
Here is the full playbook:
- name: UPGRADE SUP8L-E SWITCH FIRMWARE
  hosts: switches
  connection: network_cli
  gather_facts: no
  vars_prompt:
    - name: "compliant_ios_version"
      prompt: "What is the compliant IOS version?"
      private: no
    - name: "new_ios_bin"
      prompt: "What is the name of the new IOS file?"
      private: no
    - name: "new_ios_md5"
      prompt: "What is the MD5 value of the new IOS file?"
      private: no
    - name: "should_reboot"
      prompt: "Do you want Ansible to reboot the hosts? (YES or NO)"
      private: no
  tasks:
    - name: GATHER SWITCH FACTS
      ios_facts:

    - name: UPGRADE IOS IMAGE IF NOT COMPLIANT
      block:
        - name: COPY OVER IOS IMAGE
          ios_command:
            commands:
              - command: "copy tftp://X.X.X.X/45-SUP8L-E/{{ new_ios_bin }} bootflash:"
                prompt: '[{{ new_ios_bin }}]'
                answer: "\r"
          vars:
            ansible_command_timeout: 1800

        - name: CHECK MD5 HASH
          ios_command:
            commands:
              - command: "verify /md5 bootflash:{{ new_ios_bin }}"
          register: md5_result
          vars:
            ansible_command_timeout: 300

        - name: CONTINUE UPGRADE IF MD5 HASH MATCHES
          block:
            - name: SETTING BOOT IMAGE
              ios_config:
                lines:
                  - no boot system
                  - boot system flash bootflash:{{ new_ios_bin }}
                match: none
                save_when: always

            - name: REBOOT SWITCH IF INSTRUCTED
              block:
                - name: REBOOT SWITCH
                  ios_command:
                    commands:
                      - command: "reload"
                        prompt: '[confirm]'
                        answer: "\r"
                  vars:
                    ansible_command_timeout: 30

                - name: WAIT FOR SWITCH TO RETURN
                  wait_for:
                    host: "{{inventory_hostname}}"
                    port: 22
                    delay: 60
                    timeout: 600
                  delegate_to: localhost

                - name: GATHER ROUTER FACTS FOR VERIFICATION
                  ios_facts:

                - name: ASSERT THAT THE IOS VERSION IS CORRECT
                  assert:
                    that:
                      - compliant_ios_version == ansible_net_version
                    msg: "New IOS version matches compliant version. Upgrade successful."
              when: should_reboot == "YES"
          when: '"new_ios_md5" | string in md5_result.stdout'
      when: ansible_net_version != compliant_ios_version
...
The other two conditionals in the playbook work as expected. I cannot figure out how to get Ansible to fail the when: '"new_ios_md5" | string in md5_result.stdout' conditional and stop the play if the md5 value is wrong.
When you run the play with debug output the value of stdout is:
"stdout": [
".............................................................................................................................................Done!",
"verify /md5 (bootflash:cat4500es8-universalk9.SPA.03.10.02.E.152-6.E2.bin) = c1af921dc94080b5e0172dbef42dc6ba"
]
You can clearly see the calculated md5 in the string but my conditional doesn't seem to care either way.
Does anyone have any advice?
When you write:
when: '"new_ios_md5" | string in md5_result.stdout'
You are looking for the literal string "new_ios_md5" inside the variable md5_result.stdout. Since you actually want to refer to the new_ios_md5 variable, you need to remove the quotes around it:
when: 'new_ios_md5 | string in md5_result.stdout'
Credit goes to zoredache on reddit for the final solution:
BTW, you know that for most of the networking command modules like ios_command the results come back as a list, right? So you need to index into the list relative to the command you ran.
Say you had this task:
ios_command:
  commands:
    - command: "verify /md5 bootflash:{{ new_ios_bin }}"
    - command: show version
    - command: show config
register: results
You would have output in the list like this.
# results.stdout[0] = verify
# results.stdout[1] = show version
# results.stdout[2] = show config
So the correct conditional statement would be:
when: 'new_ios_md5 in md5_result.stdout[0]'
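Put together with the quoting fix, the block-level conditional in the playbook above would read roughly like this (sketch only; the tasks inside the block are unchanged):

- name: CONTINUE UPGRADE IF MD5 HASH MATCHES
  block:
    # ... SETTING BOOT IMAGE and REBOOT tasks exactly as above ...
  when: new_ios_md5 in md5_result.stdout[0]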

SaltStack and GitFS - No Top file or external nodes data matches found

Here is my /etc/salt/master config:
# GitFS
gitfs_provider: pygit2
gitfs_base: DEVELOPMENT
gitfs_env_whitelist:
  - base

fileserver_backend:
  - git

gitfs_remotes:
  - ssh://git@github.com/myrepo/salt-states.git:
    - pubkey: /root/.ssh/my.pub
    - privkey: /root/.ssh/my
    - mountpoint: salt:///srv/salt/salt-states
Here is my directory structure for the repo:
.
|-- README.md
|-- formulas
|   `-- test
|       |-- test.sls
`-- top.sls
Here is my very basic top.sls:
base:
  '*':
    - test
If I try to run highstate on my test node, I get:
root@saltmaster:/etc/salt] salt -v '*' state.highstate
Executing job with jid 1234567890
-------------------------------------------
test-minion.domain:
----------
          ID: states
    Function: no.None
      Result: False
     Comment: No Top file or external nodes data matches found.
     Started:
    Duration:
     Changes:

Summary for test-minion.domain
------------
Succeeded: 0
Failed:    1
------------
Total states run:     1
Total run time:   0.000 ms
I'm not sure why this isn't working and would appreciate any help with this. I've tried just applying test.sls directly, to see if the top file was the issue, but I got this:
root@saltmaster:/etc/salt] salt -v '*' state.sls test
Executing job with jid 1234567890
-------------------------------------------
test-minion.domain:
    Data failed to compile:
----------
    No matching sls found for 'test' in env 'base'
I had a similar problem, which was due to the cache being out of sync and not updating. If I tried to run:
salt-run fileserver.update
I got:
[WARNING ] Update lock file is present for gitfs remote 'git#github.com:mention-me/Salt.git', skipping. If this warning persists, it is possible that the update process was interrupted, but the lock could also have been manually set. Removing /var/cache/salt/master/gitfs/7d8d9790a933949777fd5a58284b8850/.git/update.lk or running 'salt-run cache.clear_git_lock gitfs type=update' will allow updates to continue for this remote.
Deleting the lock file specified and running the above command fixed the problem.
I talked to the folks on the SaltStack IRC and someone helped me fix the problem. It seems that adding a mountpoint was screwing everything up. Credit goes to Thomas Phipps (whytewolf) in the #salt channel on Freenode.
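Based on that, a sketch of the gitfs_remotes section with the mountpoint removed (all other master settings as in the original config):

gitfs_remotes:
  - ssh://git@github.com/myrepo/salt-states.git:
    - pubkey: /root/.ssh/my.pub
    - privkey: /root/.ssh/my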
