Recently I came across a bottleneck in our Ansible playbooks. We were deploying our clusters (e.g. a MongoDB replica set) sequentially, i.e. one VM after another, each waiting for the previous one to be up and running.
This multiplied the deploy time of the whole cluster by the number of members in it.
To solve this, I started digging into Ansible's async actions and polling, and found a few examples of parallel loops and "fire-and-forget" strategies for scenarios like ours.
The catch is that we have defined our own "customize the VM and spawn it" Ansible task file (create_instance.yml). It gets included, receives the customization variables from the playbook, and abstracts the whole process by running various KVM/shell commands.
Using "Parallel task execution in Ansible" as a reference, I ended up with something like:
- name: Generate VMs for DB
  hosts: hypervisor_fe
  tags: platform,mongodb
  tasks:
    - include: tasks/create_instance.yml
      vars:
        vm: "{{ item }}"
      with_items: "{{ mongodb.vms }}"
      register: mongo_instances
      async: 7200
      poll: 0
    - name: Wait for instance creation to complete
      async_status: jid={{ item.ansible_job_id }}
      register: mongo_jobs
      until: mongo_jobs.finished
      retries: 300
      with_items: "{{ mongo_instances.results }}"
However, this setup seems to ignore the new async settings and keeps the old, sequential behavior. I'm guessing this has to do with the number and granularity of tasks inside the included file. If I instead replace the include with a single, explicit long-running task, e.g.
    - name: Test async operation
      shell: ping -c1 {{ item.hostname }} && sleep 20
this does seem to work just fine, running one ping for each item and then moving on to the next action.
Is this assumption right? Does anyone have experience with include and async loops in Ansible? Do I need to move the async declaration to a single task inside the included file?
I advise you to rethink your playbook design in the following way:
- hosts: localhost
  tasks:
    - add_host:
        name: "{{ item.name }}"
        groups: new_vms
        vm: "{{ item }}"
      with_items: "{{ mongodb.vms }}"
- hosts: new_vms
  tasks:
    - include: create_instance.yml
Then, inside create_instance.yml, use delegate_to: hypervisor_fe.
This gives you Ansible's native host loop over the VMs, with each task executed concurrently across them.
I'm writing a health-check playbook, and when a host is clustered (VCS), I want to make sure all cluster Service Groups are running.
The output of hastatus looks like this:
[root@node1 ~]# hastatus -sum
-- SYSTEM STATE
-- System          State     Frozen
A  node1           RUNNING   0
A  node2           RUNNING   0
-- GROUP STATE
-- Group           System    Probed    AutoDisabled    State
B  ClusterService  node1     Y         N               ONLINE
B  ClusterService  node2     Y         N               OFFLINE
B  NFSExport       node1     Y         N               ONLINE
B  NFSExport       node2     Y         N               OFFLINE
B  Database        node1     Y         N               ONLINE
B  Database        node2     Y         N               OFFLINE
B  Application     node1     Y         N               OFFLINE
B  Application     node2     Y         N               ONLINE
[root@node1 ~]#
A service group can run on any cluster node, and the status of every service group is reported once per cluster node, so the actual number of service groups is (number of "B" lines / number of nodes).
I've tried with and without the double braces {{ }}, but no matter what, the last debug task always produces a divide-by-zero error.
Any help would be appreciated.
# START OF BLOCK
- name: Check cluster status
  block:
    - name: How many cluster nodes?
      shell: hastatus -sum|grep "^A"|wc -l
      register: numnodes
    - name: How many running cluster nodes?
      shell: hastatus -sum|grep "^A"|grep "RUNNING"|wc -l
      register: numrunningnodes
    - name: report if not all nodes are running
      debug:
        msg: "ACTION: Not all cluster nodes are running!"
      when: numnodes.stdout != numrunningnodes.stdout
    # The number of cluster Service Groups == totalsgs / numnodes
    - name: How many SGs ("B" lines)?
      shell: hastatus -sum|grep "^B"|wc -l
      register: totalsgs
    - name: How many running SGs?
      shell: hastatus -sum|grep "^B"|grep "RUNNING"|wc -l
      register: runningsgs
    - name: Is everything running somewhere?
      debug:
        msg: "ACTION: Not all SGs are running!"
      when: {{ runningsgs.stdout|int }} != {{ totalsgs.stdout|int / numnodes.stdout|int }}
It's the second-worst Ansible you can write. If you throw in include_role with when and loop on top of that, you get the very worst.
Ansible is not designed to be a good algorithmic language. It's really good at performing side effects and juggling inventories, but it's terrible at math. You can do it, but the result will be unreadable, undebuggable and untestable, and your namespace will be littered with global variables (which will surely bite you later, when you least expect it).
How to do it right? The best way is to write your own module, which is easier than you think (if you know Python).
The second-best way (which may be even better than a custom module for small projects) is to use the script or command module.
Just feed the data to the script's stdin and get well-processed data back from its stdout. The main trick is to emit the output in JSON format and parse it with the |from_json filter.
Here is an example of using command to parse data:
- name: Get data from somewhere
  shell: hastatus -sum
  changed_when: false
  register: hastatus_cmd
- name: Process data
  delegate_to: localhost
  command:
    cmd: process_hastatus_output.py
    stdin: '{{ hastatus_cmd.stdout }}'
  changed_when: false
  register: hastatus_data
- name: Use it
  command: ...?...
  when: hastatus.running != hastatus.nodes
  vars:
    hastatus: '{{ hastatus_data.stdout|from_json }}'
For process_hastatus_output.py you can write tests, run them without Ansible to check edge cases, and you'll have a beautiful, cosy language to transform your data.
Or you can do it in a mix of Jinja and Ansible, causing irreparable harm to yourself and to everyone who reads your code later.
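As a rough illustration, process_hastatus_output.py could look something like the sketch below. It is not the script from the answer, only a guess at its shape: the summarize function and the groups / groups_online keys are made up for this example, while nodes and running match the names used in the playbook's when condition. It assumes the column layout shown in the question ("A" lines for systems, "B" lines for service groups, with the state in the last column).

```python
#!/usr/bin/env python3
"""Hypothetical process_hastatus_output.py: `hastatus -sum` text in, JSON out."""
import json
import sys


def summarize(text):
    """Count nodes and service groups in `hastatus -sum` output."""
    nodes = running = 0
    groups = set()   # distinct service group names
    online = set()   # groups that are ONLINE on at least one node
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "A" and len(fields) >= 3:
            # Assumed layout: A <system> <state> <frozen>
            nodes += 1
            if fields[2] == "RUNNING":
                running += 1
        elif fields[0] == "B" and len(fields) >= 6:
            # Assumed layout: B <group> <system> <probed> <autodisabled> <state>
            groups.add(fields[1])
            if fields[5] == "ONLINE":
                online.add(fields[1])
    return {
        "nodes": nodes,
        "running": running,
        "groups": len(groups),
        "groups_online": len(online),
    }


# Only act as a filter when data is actually piped in on stdin.
if __name__ == "__main__" and not sys.stdin.isatty():
    json.dump(summarize(sys.stdin.read()), sys.stdout)
```

Counting distinct group names sidesteps the division entirely, so an empty or unexpected output yields zeros instead of a divide-by-zero error, and summarize can be unit-tested with canned hastatus output.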
I am not sure how to trace an existing Ansible project, written in YAML, for networking devices.
I have set up the system correctly and it executes all the tasks perfectly, but I want to check what data is being assigned.
Is there a way to trace Ansible the way you can trace Python?
For example, in Python I can use the ipdb module or just a print() statement to inspect all kinds of things.
Ansible provides a Playbook Debugger, which can be used to trace the execution of tasks.
If you want to debug everything in a play, you can set debugger: always on it:
- name: some play
  hosts: all
  debugger: always
  tasks: ...
Then you can use the c command to continue to the next task, p task_vars to see the variables, or p result._result to see the result.
The debugger can also be enabled at the task or role level, like this:
- hosts: all
  roles:
    - role: dj-wasabi.zabbix-agent
      debugger: always
This helps you avoid polluting your roles with debug tasks, while limiting the scope of debugging.
The other method is to use the debug module, which is similar to using print statements in Python. You can use it in your tasks like this:
# Example that prints the product UUID and default gateway for each host
- debug:
    msg: System {{ inventory_hostname }} has uuid {{ ansible_product_uuid }}
- debug:
    msg: System {{ inventory_hostname }} has gateway {{ ansible_default_ipv4.gateway }}
  when: ansible_default_ipv4.gateway is defined
# Example that prints return information from the previous task
- shell: /usr/bin/uptime
  register: result
- debug:
    var: result
    verbosity: 2
In my Ansible playbooks, I often have steps like "create a directory and then do something in it", e.g.:
- name: Create directory
  file:
    path: "{{ tomcat_directory }}"
    state: directory
- name: Extract tomcat
  unarchive:
    src: 'tomcat.tar.gz'
    dest: '{{ tomcat_directory }}'
When I run this playbook normally, it works perfectly fine. However, when I run it in check mode, the first step succeeds (the folder would have been created), but the second one fails because the folder does not exist.
Is there any way to write steps like these, where I create a folder and then operate in it, while still being able to run the playbook in check mode (without skipping such steps)?
Check mode can be a bit of a pain. You really only have two options:
1) Add conditionals to tasks to skip them in check mode, which you don't want to do. For reference, though:
    when: not ansible_check_mode
2) Change the behaviour of the task in check mode. If you set check_mode: no on a task, it will behave in check mode as it would in a normal run. That is to say, even though you asked for check mode, it will actually perform the task and create the directory if it does not already exist. You have to decide whether you are happy for a given task to run for real in check mode, so this tends to be appropriate only for low-risk tasks, but it does give you a way to keep testing the rest of a playbook that depends on the step in question.
Ansible Check Mode Docs
You could make use of the ignore_errors task option, along with the ansible_check_mode variable, to ignore errors with your Extract tomcat task only when running in check mode, e.g.:
- name: Create directory
  file:
    path: "{{ tomcat_directory }}"
    state: directory
- name: Extract tomcat
  unarchive:
    src: 'tomcat.tar.gz'
    dest: '{{ tomcat_directory }}'
  ignore_errors: "{{ ansible_check_mode }}"
Running this in check mode will show the Extract tomcat task failed due to dest not existing. However, instead of failing the playbook, the task failure will be marked as ignored and playbook execution will continue.
Another option is to register the result of the first task and guard the second one with when: result.state is defined:
- name: Create directory
  file:
    path: "{{ tomcat_directory }}"
    state: directory
  register: result
- name: Extract tomcat
  unarchive:
    src: 'tomcat.tar.gz'
    dest: '{{ tomcat_directory }}'
  when: result.state is defined
I'm trying to execute the redis-trib.rb utility (used to configure redis clusters) on the salt orchestrator, after executing salt states on multiple minions to bring up redis processes.
Reading the salt documentation, it looks like the orchestrate runner does what I want for executing the minion states.
Indeed this snippet functions perfectly when executed with sudo salt-run state.orchestrate orch.redis_cluster:
redis_cluster_instances_create:
  salt.state:
    - tgt: '*redis*'
    - highstate: True
The problem is with the next step, which requires me to call redis-trib.rb on the orchestrator. Reading through the documentation it looks like I need to use the salt.runner state (executes another runner), to call the salt.cmd runner (executes a salt state locally), which in turn calls the cmd.run state to actually execute the command.
What I have looks like this:
redis_cluster_setup_masters_{{ cluster }}:
  salt.runner:
    - name: salt.cmd
    - fun: cmd.run
    - args:
      - srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb create {% for shard in shards %}{{ shard['master'] }} {% endfor %}
    - kwargs:
        unless: srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb info {{ shards[0]['master'] }} | grep 'cluster_state:ok'
    - require:
      - salt: redis_cluster_instances_create
But it doesn't work, and salt errors out with:
lab-orchestrator_master:
----------
ID: redis_cluster_instances_create
Function: salt.state
Result: True
Comment: States ran successfully. No changes made to lab-redis04, lab-redis01, lab-redis02, lab-redis03.
Started: 09:54:57.811313
Duration: 14223.204 ms
Changes:
----------
ID: redis_cluster_setup_masters_pdnocg
Function: salt.runner
Name: salt.cmd
Result: False
Comment: Exception occurred in runner salt.cmd: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/salt/client/mixins.py", line 392, in _low
data['return'] = self.functions[fun](*args, **kwargs)
TypeError: cmd() takes at least 1 argument (0 given)
Started: 09:55:12.034716
Duration: 1668.345 ms
Changes:
Can anyone suggest what I'm doing wrong? Or an alternative way of executing commands locally on the orchestrator?
The problem is that you are passing fun et al. to the runner instead of to the execution module. Also note that you have to pass the arguments via arg, not args:
redis_cluster_setup_masters_{{ cluster }}:
salt.runner:
- name: salt.cmd
- arg:
- fun=cmd.run
- cmd='srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb create {% for shard in shards %}{{ shard['master'] }} {% endfor %}'
- unless: srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb info {{ shards[0]['master'] }} | grep 'cluster_state:ok'
- require:
- salt: redis_cluster_instances_create
That should do the trick, though I haven't tested it with the kwargs parameter.
This may seem pretty simple at first, but I can tell you I've been racking my brains over it for a couple of days. I've read a lot of docs, sat on IRC with folks, and spoken to colleagues, and at this point I don't have an answer that I really think holds up.
I've looked into a few possible approaches:
reactor
orchestration runner
I don't like these two because they require top-down execution; they seem tailored to orchestrating states across multiple nodes, not workflows within a single node.
custom states
This is something I would REALLY like to avoid: this is a repeated workflow, and I don't want to build customizations like this. There's too much room for illegibility for my teammates if I go down this path.
requires / watches
These don't have a concept (that I am aware of) of applying a state repeatedly, or in a logical order / workflow.
And a few others I won't mention.
Without further discussion, here's my dilemma.
Goals:
Jenkins Master gets Deployed
We can unit.test the deployment as it proceeds
We only restart tomcat when necessary
We can update plugins on a per package basis
A big emphasis on good clean intuitively clear salt configs
Jenkins deployment is pretty straightforward. We drop in the packages and the configs, and we're set.
Unit testing is harder. As an example, I've got this state file.
actions/version.sls:
# Hits the jenkins CLI interface to check for version info
# This can be used to verify that jenkins is active and the version we want
# Import some info
{%- from 'jenkins/init.sls' import jenkins_home with context %}
# Install plugins in jenkins_plugins list
jenkins_version:
  cmd.run:
    - name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" version
    - cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
    - user: jenkins
actions.version basically verifies that Jenkins is running and queryable. We want to be sure of this at several points during the build.
For example, tomcat takes time to spin up, so we had to add a delay to the restart operation. If you check out start.sls below, you can see that operation occurring. Note the bug open on init_delay.
actions/start.sls:
# Starts the tomcat service
tomcat_start:
  service.running:
    - name: tomcat
    - enable: True
    - full_restart: True
    # Not functional atm see --> https://github.com/saltstack/salt/issues/20631
    # - init_delay: 120
# initiate a 120 second delay after any service start to let tomcat come up.
tomcat_wait:
  module.run:
    - name: test.sleep
    - length: 60
include:
  - jenkins.actions.version
Now we have this restart capability by doing an actions.stop and an actions.start. We have this actions.version state that we can use to verify that the system is ready to proceed with jenkins specific state workflows.
I want to do something like this:
Install Jenkins --> Grab yaml of plugins --> install plugins that need it
Pretty straightforward.
Except that, to loop through the yaml of plugins, I am using Jinja.
And now I have no way to call the start.sls and version.sls states and be sure they are applied repeatedly.
I am looking for a good way to do that.
This would be something akin to a jenkins.sls:
{% set repo_username = "foo" -%}
{% set repo_password = "bar" -%}
include:
  - jenkins.actions.version
  - jenkins.actions.stop
  - jenkins.actions.start
# Install Jenkins
jenkins:
  pkg:
    - installed
# Import Jenkins Plugins as List, and Working Path
{%- from 'jenkins/init.sls' import jenkins_home with context %}
{%- import_yaml "jenkins/plugins.sls" as jenkins_plugins %}
{%- import_yaml "jenkins/custom-plugins.sls" as custom_plugins %}
# Grab updated package list
jenkins-contact-update-server:
  cmd.run:
    - name: curl -L http://updates.jenkins-ci.org/update-center.json | sed '1d;$d' > {{ jenkins_home }}/updates/default.json
    - unless: test -d {{ jenkins_home }}/updates/default.json
    - require:
      - pkg: jenkins
      - service: tomcat
# Install plugins in jenkins_plugins list
{% for plugin in jenkins_plugins %}
jenkins-plugin-{{ plugin }}:
  cmd.run:
    - name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" install-plugin "{{ plugin }}"
    - unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ plugin }}"
    - cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
    - user: jenkins
    - require:
      - pkg: jenkins
      - service: tomcat
Here is where I am stuck. require won't do this. and lists
of actions don't seem to schedule linearly in salt. I need to
be able to just verify that jenkins is up and ready. I need
to be able to restart tomcat after a single plugin in the
iteration is added. I need to be able to do this to satisfy
dependencies in the plugin order.
- sls: jenkins.actions.version
- sls: jenkins.actions.stop
- sls: jenkins.actions.start
# This can't work for several reasons
# - watch_in:
# - sls: jenkins-safe-restart
{% endfor %}
# Install custom plugins in the custom_plugins list
{% for cust_plugin,cust_plugin_url in custom_plugins.iteritems() %}
# manually downloading the plugin, because jenkins-cli.jar doesn't seem to work direct to artifactory URLs.
download-plugin-{{ cust_plugin }}:
  cmd.run:
    - name: curl -o {{ cust_plugin }}.jpi -O "https://{{ repo_username }}:{{ repo_password }}@{{ cust_plugin_url }}"
    - unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ cust_plugin }}"
    - cwd: /tmp
    - user: jenkins
    - require:
      - pkg: jenkins
      - service: tomcat
# installing the plugin ( REQUIRES TOMCAT RESTART AFTER )
custom-plugin-{{ cust_plugin }}:
  cmd.run:
    - name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" install-plugin /tmp/{{ cust_plugin }}.jpi
    - unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ cust_plugin }}"
    - cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
    - user: jenkins
    - require:
      - pkg: jenkins
      - service: tomcat
{% endfor %}
You won't be able to achieve this without using reactors and beacons, and especially not without writing your own Python execution modules.
Jenkins Master gets Deployed
Write a jenkins execution module in python with a function install(...):. In that function you would manage any dependencies by either calling existing execution modules or by writing them yourself.
We can unit.test the deployment as it proceeds
Inside the install function of the jenkins module you would fire specific events depending on the results of the install.
if not _run_deployment_phase(...):
    __salt__['event.send']('jenkins/install/error', {
        'finished': False,
        'message': "Something failed during the deployment!",
    })
You would map that event to reactor sls files and handle it.
We only restart tomcat when necessary
Write a tomcat module. Add an _is_up(...) function where you would check if tomcat is up by parsing the tomcat logs for the result. Call the function inside a state module and add a mod_watch function.
def mod_watch():
    # required dict to return
    return_dict = {
        "name": "Tomcat install",
        "changes": {},
        "result": False,
        "comment": "",
    }
    if __salt__["tomcat._is_up"]():
        return_dict["result"] = True
        return_dict["comment"] = "Tomcat is up."
    if __opts__["test"]:
        return_dict["result"] = None
        return_dict["comment"] = "comment here about what will change"
        return return_dict
    # execute changes now
    return return_dict
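The log-parsing check described for the tomcat module might be sketched roughly as follows. This is only an illustration under assumptions: the function name is_up and the STARTUP_RE pattern are made up here, and the pattern assumes Catalina writes a line such as "Server startup in 1234 ms" (or "Server startup in [1234] milliseconds") once it is ready. Check your own Tomcat version's log format before relying on it.

```python
import re

# Assumption: Tomcat logs one of these forms when startup has completed.
STARTUP_RE = re.compile(r"Server startup in \[?\d+\]? (?:ms|milliseconds)")


def is_up(log_text):
    """Return True if the log text contains a startup-complete line.

    This only detects that a startup happened; it does not notice a later
    shutdown, so a real module would want to inspect only the log lines
    written since the last restart.
    """
    return any(STARTUP_RE.search(line) for line in log_text.splitlines())
```

Taking the log text as an argument, rather than opening a file inside the function, keeps the check testable without a running Tomcat.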
Use your state module inside a state file.
install tomcat:
  tomcat.install:
    - name: ...
    - user: ...
    ...
wait until tomcat is up:
  cmd.run:
    - name: ...
    - watch:
      - tomcat: install tomcat
We can update plugins on a per package basis
Add a function to your jenkins execution module named install_plugin. View pkg.install code to replicate interface.
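A very rough sketch of what such an install_plugin function could look like is shown below. The CLI invocation is copied from the states in the question; everything else is an assumption made for illustration: the run parameter is a hypothetical callable that takes an argument list and returns the command's stdout (inside a real Salt module it would wrap whatever command runner you use), and plugin_installed assumes list-plugins prints one plugin per line with the short name in the first column.

```python
# Command prefix taken from the states in the question.
JENKINS_CLI = ["java", "-jar", "jenkins-cli.jar", "-s", "http://127.0.0.1:8080"]


def plugin_installed(name, listing):
    """Return True if `name` is the first column of any list-plugins line.

    Assumption: one plugin per line, short name first.
    """
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def install_plugin(name, run):
    """Install `name` unless it is already listed; report whether anything changed.

    `run` is a hypothetical callable: it receives an argument list and
    returns the command's stdout as a string.
    """
    if plugin_installed(name, run(JENKINS_CLI + ["list-plugins"])):
        return {"name": name, "changed": False}
    run(JENKINS_CLI + ["install-plugin", name])
    return {"name": name, "changed": True}
```

Matching the first column exactly, instead of grepping the whole listing, avoids treating a plugin as installed just because its name is a substring of another plugin's name. The returned changed flag is the kind of information a state module could use to decide whether a restart is needed.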
A big emphasis on good clean intuitively clear salt configs
Write python execution modules for easy and maintainable configuration logic. Use that execution module inside your own state modules. Inside state files call your own state modules and supply individual configuration with any state renderer you like.
States only execute once, by design. If you need the same action to occur multiple times, you need multiple states. Also, includes are only included a single time.
Rather than all of this include/require stuff you're doing, you should just put all of the code into a single sls file, and generate states through jinja iteration.
If what you're trying to do is add a bunch of plugins and config files and then do the restarts at the end, you should really just execute everything in order, not use require, and use listen or listen_in rather than watch or watch_in.
listen/listen_in cause triggered actions to happen at the end of a state run. They are similar to the concept of handlers in Ansible.
This is a pretty old question, but if you change your Jenkins/tomcat start/stop procedure to be a standard init/systemd/Windows service (as all well-behaved services should be), you could have a service.running state for the Jenkins service and add this to each of your custom-plugin-{{ cust_plugin }} states:
require_in:
  - service: jenkins
watch_in:
  - service: jenkins
You could continue to use the cmd.run module with onchanges. You'd have to add onchanges_in: to each of the custom-plugin-{{ cust_plugin }} states, but you need at least one item in the onchanges list, or the command will fire every time the state runs.
If you use require you cause salt to re-order your states. If you want your states to run in order, just write them in the order you want them to run in.
Watch/watch_in will also re-order your states. If you use listen/listen_in instead, it'll queue the triggered actions to run in the order they were triggered at the end of the state run.
See:
http://ryandlane.com/blog/2014/07/14/truly-ordered-execution-using-saltstack/
http://ryandlane.com/blog/2015/01/06/truly-ordered-execution-using-saltstack-part-2/