I'm writing a health-check playbook, and when a host is clustered (VCS), I want to make sure all cluster Service Groups are running.
The output of hastatus looks like this:
[root#node1 ~]# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A node1 RUNNING 0
A node2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService node1 Y N ONLINE
B ClusterService node2 Y N OFFLINE
B NFSExport node1 Y N ONLINE
B NFSExport node2 Y N OFFLINE
B Database node1 Y N ONLINE
B Database node2 Y N OFFLINE
B Application node1 Y N OFFLINE
B Application node2 Y N ONLINE
[root#node1 ~]#
A Service Group can run on any cluster node, and the status of every service group is reported for every cluster node, so the actual number of services groups is (servicegroups / nodes).
I've tried with and without the double braces {{ }} , but no matter what, the last debug task always produces a divide by zero error.
Any help would be appreciated.
# START OF BLOCK
- name: Check cluster status
block:
- name: How many cluster nodes?
shell: hastatus -sum|grep "^A"|wc -l
register: numnodes
- name: How many running cluster nodes?
shell: hastatus -sum|grep "^A"|grep "RUNNING"|wc -l
register: numrunningnodes
- name: report if not all nodes are running
debug:
msg: "ACTION: Not all cluster nodes are running!"
when: numnodes.stdout != numrunningnodes.stdout
# The number of cluster Service Groups == totalsgs / numnodes
- name: How many SGs ("B" lines)?
shell: hastatus -sum|grep "^B"|wc -l
register: totalsgs
- name: How many running SGs?
shell: hastatus -sum|grep "^B"|grep "RUNNING"|wc -l
register: runningsgs
- name: Is everything running somewhere?
debug:
msg: "ACTION: Not all SGs are running!"
when: {{ runningsgs.stdout|int }} != {{ totalsgs.stdout|int / numnodes.stdout|int }}
It's the second worst Ansible you can write. If you throw in include_role with when and loop on top of that, you will have the worst worst.
Ansible is not designed to be a good algorithmic language. It's really good at doing side-effects and juggling inventories, but it's terrible at doing math. You can do it, but it will be unreadable, non-debuggable, non-testable, and you will have your variables littered with global variables (which will for sure bite you later when you do not expect it).
How to do it right? The best best way is to write own module. Which is easier than you think (if you know Python).
The second best (which may be even better for small projects than custom module) is to use script or command module.
Just shovel data as input to the stdin script, and get well processed data back from stdout. The main trick is to produce stdout output in json format and parse it with |from_json filter.
This is example of use of command to parse data:
- name: Get data from somewhere
shell: hastatus -sum
changed_when: false
register: hastatus_cmd
- name: Process data
delegate_to: localhost
command:
cmd: process_hastatus_output.py
stdin: '{{ hastatus_cmd.stdout }}'
changed_when: false
register: hastatus_data
- name: Use it
command: ...?...
when: hastatus.running != hastatus.nodes
vars:
hastatus: '{{ hastatus_data.stdout|from_json }}'
For process_hastatus_output.py you can write tests, you can run them without ansible to check of edge cases, and you'll have the beautiful, cosy language to transform your data.
Or, you can do it in mix of Jinja and Ansible, causing irreparable harm to yourself, and everyone reading your code later.
Related
I tried to write some orchestration state for converting HBase to high available mode. To do so, there are couple requirements for checking:
Datanode process has to be up and running on ALL nodes.
HRegion process has to be up and running on ALL nodes.
After converting, namenode process has to be up and running on two nodes only.
So I got ALL implicitly for free, e.g.
{# If all minions passed, then this will pass #}
2_1_example_pod_is_dn_running:
salt.function:
- name: cmd.run
- tgt: 'G#stype:hbase and G#pod_name:example_pod'
- tgt_type: compound
- arg:
- ps aux | grep datanode
- failhard: True
But how to check any or two?
{# Pseduo code. This won't work. #}
2_3_example_pod_is_nn_running:
- name: cmd.run
- tgt: 'G#stype:hbase and G#pod_name:example_pod'
- tgt_type: compound
- arg:
- ps aux | grep namenode
- successful_count: 2 {# <== namenode process has to be running on two minions #}
- failhard: True
So, what you want for this is to know which minions should "fail" the check. if you know you can use fail_minions to mark the minions that will not have namenode.
For things like this it is almost always to either be specific. when setting it up so you know how which minions will have namenode. or, to skip the check an use subset to only start namenode on 2 servers. subset will pick the number of servers of a subset of the targeted minions randomly and run the command on only the subset.
I am not very sure how to trace an existing project written in YAML for networking devices.
I have setup the system correctly and its executing all the tasks perfectly. But I want to check what all data are being assigned.
Is there a way to trace ansible just like python?
Ex: In python, I can use ipdb module or just use print() statement to see all kind of things.
Ansible provides a Playbook Debugger, which can be used to trace execution of tasks.
If you want to debug everything in a play, you can pass debugger: always
- name: some play
hosts: all
debugger: always
tasks: ...
Then you can use c command to continue to the next task, p task_vars to see variables or p result._result to see the result.
Debugger can be used on a task or a role level too like this:
- hosts: all
roles:
- role: dj-wasabi.zabbix-agent
debugger: always
It helps to not to pollute your roles with debug tasks, while limiting the scope of debugging.
The other method is to use debug module, which is similar to using print statements in python. You can use in your tasks like this:
# Example that prints the loopback address and gateway for each host
- debug:
msg: System {{ inventory_hostname }} has uuid {{ ansible_product_uuid }}
- debug:
msg: System {{ inventory_hostname }} has gateway {{ ansible_default_ipv4.gateway }}
when: ansible_default_ipv4.gateway is defined
# Example that prints return information from the previous task
- shell: /usr/bin/uptime
register: result
- debug:
var: result
verbosity: 2
Recently I came across a bottleneck in our ansible playbooks' code. We were deploying our clusters (e.g. a mongoDB Replica Set) sequentially - i.e. one VM after another, each waiting for the previous to be up and running.
This slowed down the whole cluster deploy time by a factor of the members on it.
To solve this, I started digging on ansible's async actions and pooling and found out a few examples on parallel loops and "fire-and-forget" strategies for scenarios like ours.
The particular thing is, we have defined our own "customize the VM and spawn it" ansible task (create_instance.yml) that gets included and receives the different customization variables from the playbook and abstracts the whole process by running different KVM/shell commands.
Using "Parallel task execution in Ansible" as reference, I ended up having something like:
- name: Generate VMs for DB
hosts: hypervisor_fe
tags: platform,mongodb
tasks:
- include: tasks/create_instance.yml
vars:
vm: "{{ item }}"
with_items: "{{ mongodb.vms }}"
register: mongo_instances
async: 7200
poll: 0
- name: Wait for instance creation to complete
async_status: jid={{ item.ansible_job_id }}
register: mongo_jobs
until: mongo_jobs.finished
retries: 300
with_items: "{{ mongo_instances.results }}"
However, this setup does seem to ignore all the new async code and keeps the old, sequential behavior. I'm guessing this has to do with the no. and granularity of plays inside the imported task. If I instead replace the include for a single, explicit long-running task - let's say, e.g.
- name: Test async operation
shell: ping -c1 {{ item.hostname }} && sleep 20
This does seem to work just fine, running one ping to each item and then moving on to the next action.
Is this assumption right? Does someone has experience with include and async loops in ansible? Do I need to move the async declaration to a single play inside the imported code?
I advise you to rethink your playbook design in the following way:
- hosts: localhost
tasks:
- add_host:
name: "{{ item.name }}"
groups: new_vms
vm: "{{ item }}"
with_items: "{{ mongodb.vms }}"
- hosts: new_vms
tasks:
- include: create_instance.yml
And inside create_instance.yml use delegate_to: hypervisor_fe.
This gives you native Ansible host loop for every vm with concurrent execution of each task.
I'm trying SaltStack after using Puppet for a while, but I can't understand their use of the word "state".
My understanding is that, for example, a light switch has 2 possible states - on or off. When I write my SLS configuration I am describing what state a server should be in. When I ask SaltStack to provision a server I issue the command salt '*' state.highstate. I understand that a server can be in a highstate (as described in my config) or not. All good so far.
But this page describes other states. It describes lowstate, highstate and overstate (amongst others) as layers. Does this mean a server passes through several states to get to a highstate? Or all states are maintained simultaneously as layers? Or can I configure multiple possible states in my SLS and have SaltStack switch between them? Or are they just layers to SaltStack that have 'state' in the name and I'm confused?
I'm probably missing something obvious, if anyone can nudge me in the right direction I think a lot of the documentation will become clear to me!
Here, top.sls wihch contain,
# cat top.sls
base:
'*':
- httpd_require
and,
# cat httpd_require.sls
install_httpd:
pkg.installed:
- name: httpd
service.running:
- name: httpd
- enable: True
- require:
- file: install_httpd
file.managed:
- name: /var/www/html/index.html
- source: salt://index1.html
- user: root
- group: root
- mode: 644
- require:
- pkg: install_httpd
High state:
We can see all the aspects of high state system while working with state files( .sls), There are three specific components.
High data:
SLS file:
High State
Each individual State represents a piece of high data(pkg.installed:'s block), Salt will compile all relevant SLS inside the top.sls, When these files are tied together using includes, and further glued together for use inside an environment using a top.sls file, they form a High State.
# salt 'remote_minion' state.show_highstate --out yaml
remote_minion:
install_httpd:
__env__: base
__sls__: httpd_require
file:
- name: /var/www/html/index.html
- source: salt://index1.html
- user: root
- group: root
- mode: 644
- require:
- pkg: install_httpd
- managed
- order: 10002
pkg:
- name: httpd
- installed
- order: 10000
service:
- name: httpd
- enable: true
- require:
- file: install_httpd
- running
- order: 10001
First, an order is declared, All States that are set to be first will have their order adjusted accordingly. Salt will then add 10000 to the last defined number (which is 0 by default), and add any States that are not explicitly ordered.
Salt will also add some variables that it uses internally, to know which environment (__env__) to execute the State in, and which SLS file (__sls__) the State declaration came from, Remember that the order is still no more than a starting point; the actual High State will be executed based first on requisites, and then on order.
"In other words, "High" data refers generally to data as it is seen by the user."
Low States:
""Low" data refers generally to data as it is ingested and used by Salt."
Once the final High State has been generated, it will be sent to the State compiler. This will reformat the State data into a format that Salt uses internally to evaluate each declaration, and feed data into each State module (which will in turn call the execution modules, as necessary). As with high data, low data can be broken into individual components:
Low State
Low chunks
State module
Execution module(s)
# salt 'remote_minion' state.show_lowstate --out yaml
remote_minion:
- __env__: base
__id__: install_httpd
__sls__: httpd_require
fun: installed
name: httpd
order: 10000
state: pkg
- __env__: base
__id__: install_httpd
__sls__: httpd_require
enable: true
fun: running
name: httpd
order: 10001
require:
- file: install_httpd
state: service
- __env__: base
__id__: install_httpd
__sls__: httpd_require
fun: managed
group: root
mode: 644
name: /var/www/html/index.html
order: 10002
require:
- pkg: install_httpd
source: salt://index1.html
state: file
user: root
Together, all this comprises a Low State. Each individual item is a Low Chunk. The first Low Chunk on this list looks like this:
- __env__: base
__id__: install_httpd
__sls__: httpd_require
fun: installed
name: http
order: 10000
state: pkg
Each low chunk maps to a State module (in this case, pkg) and a function inside that State module (in this case, installed). An ID is also provided at this level (__id__). Salt will map relationships (that is, requisites) between States using a combination of State and __id__. If a name has not been declared by the user, then Salt will automatically use the __id__ as the name.Once a function inside a State module has been called, it will usually map to one or more execution modules which actually do the work.
salt '\*' state.highstate
'*' refers to all the minions connected to the master.
'state.highstate' is used to run all modules / scripts mentioned in top.sls defined in master
To invoke a specific module / script on all minions, use the following salt command where the state information is defined in state.sls for apache in the example given below.
salt '\*' state.sls apache
To invoke the above salt call only on a specific minion, use the below command.
salt 'minion-name' state.sls apache
I don't know all levels of state, but when you run :
salt '*' state.highstate
Saltstack apply the states you provide in /srv/salt/top.sls.
If you write nothing in it, you can't apply an highstate.
You can apply other state with this command :
salt '*' state.sls state.example
A highstate is just the collection of states that is applied to your server. There is a process in the background where Salt's "state compiler" goes through several stages preparing the data in order to produce the highstate, but you don't really need to worry about those.
Things like the lowstate can help with debugging, but aren't necessary for day to day usage. The highstate is only applied once.
This may seem at first to be pretty simple. But I can tell you I've been wracking my brains for a couple days on this. I've read a lot of docs, sat on IRC with folks, and spoken to colleagues and at this point I don't have an answer I really think holds up.
I've looked into a few possible approaches
reactor
orchestration runner
I don't like these two because of the top down execution necessity... they seem tailored to orchestrating multiple node states, not workflows in a single node.
custom states
This is kind of something I would REALLY like to avoid as this is a repeated workflow, and I don't want to build customizations like this. There's too much room for non legibility if I go down this path with my team mates.
requires / watches
These don't have a concept ( that I am aware of ) of applying a state repeatedly, or in a logical order / workflow.
And a few others I won't mention.
Without further discussion, here's my dilemma.
Goals:
Jenkins Master gets Deployed
We can unit.test the deployment as it proceeds
We only restart tomcat when necessary
We can update plugins on a per package basis
A big emphasis on good clean intuitively clear salt configs
Jenkins deployment is pretty straight forward. We drop in the packages, and the configs, and we're set.
Unit testing is harder. As an example I've got this state file.
actions/version.sls:
# Hit's the jenkins CLI interface to check for version info
# This can be used to verify that jenkins is active and the version we want
# Import some info
{%- from 'jenkins/init.sls' import jenkins_home with context %}
# Install plugins in jenkins_plugins list
jenkins_version:
cmd.run:
- name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" version
- cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
- user: jenkins
actions.version basically verifies that jenkins is running and queryable. we want to be sure of this during the build at several points.
example... tomcat takes time to spin up. we had to add a delay to that restart operation. If you check out start.sls below you can see that operation occurring. Note the bug open on init_delay: .
actions/start.sls:
# Starts the tomcat service
tomcat_start:
service.running:
- name: tomcat
- enable: True
- full_restart: True
# Not functional atm see --> https://github.com/saltstack/salt/issues/20631
# - init_delay: 120
# initiate a 120 second delay after any service start to let tomcat come up.
tomcat_wait:
module.run:
- name: test.sleep
- length: 60
include:
- jenkins.actions.version
Now we have this restart capability by doing an actions.stop and an actions.start. We have this actions.version state that we can use to verify that the system is ready to proceed with jenkins specific state workflows.
I want to do something kinda like this...
Install Jenkins --> Grab yaml of plugins --> install plugins that need it
Pretty straight forward.
Except, to loop through the yaml of plugins I am using Jinja.
And now I have no way to call and be sure that the start.sls and version.sls states can be repeatedly applied.
I am looking for, a good way to do that.
This would be something akin to a jenkins.sls
{% set repo_username = "foo" -%}
{% set repo_password = "bar" -%}
include:
- jenkins.actions.version
- jenkins.actions.stop
- jenkins.actions.start
# Install Jenkins
jenkins:
pkg:
- installed
# Import Jenkins Plugins as List, and Working Path
{%- from 'jenkins/init.sls' import jenkins_home with context %}
{%- import_yaml "jenkins/plugins.sls" as jenkins_plugins %}
{%- import_yaml "jenkins/custom-plugins.sls" as custom_plugins %}
# Grab updated package list
jenkins-contact-update-server:
cmd.run:
- name: curl -L http://updates.jenkins-ci.org/update-center.json | sed '1d;$d' > {{ jenkins_home }}/updates/default.json
- unless: test -d {{ jenkins_home }}/updates/default.json
- require:
- pkg: jenkins
- service: tomcat
# Install plugins in jenkins_plugins list
{% for plugin in jenkins_plugins %}
jenkins-plugin-{{ plugin }}:
cmd.run:
- name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" install-plugin "{{ plugin }}"
- unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ plugin }}"
- cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
- user: jenkins
- require:
- pkg: jenkins
- service: tomcat
Here is where I am stuck. require won't do this. and lists
of actions don't seem to schedule linearly in salt. I need to
be able to just verify that jenkins is up and ready. I need
to be able to restart tomcat after a single plugin in the
iteration is added. I need to be able to do this to satisfy
dependencies in the plugin order.
- sls: jenkins.actions.version
- sls: jenkins.actions.stop
- sls: jenkins.actions.start
# This can't work for several reasons
# - watch_in:
# - sls: jenkins-safe-restart
{% endfor %}
# Install custom plugins in the custom_plugins list
{% for cust_plugin,cust_plugin_url in custom_plugins.iteritems() %}
# manually downloading the plugin, because jenkins-cli.jar doesn't seem to work direct to artifactory URLs.
download-plugin-{{ cust_plugin }}:
cmd.run:
- name: curl -o {{ cust_plugin }}.jpi -O "https://{{ repo_username }}:{{ repo_password }}#{{ cust_plugin_url }}"
- unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ cust_plugin }}"
- cwd: /tmp
- user: jenkins
- require:
- pkg: jenkins
- service: tomcat
# installing the plugin ( REQUIRES TOMCAT RESTART AFTER )
custom-plugin-{{ cust_plugin }}:
cmd.run:
- name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" install-plugin /tmp/{{ cust_plugin }}.jpi
- unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ cust_plugin }}"
- cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
- user: jenkins
- require:
- pkg: jenkins
- service: tomcat
{% endfor %}
You won't be able to achieve this without using reactors, beacons and especially not without writing your own python execution modules.
Jenkins Master gets Deployed
Write a jenkins execution module in python with a function install(...):. In that function you would manage any dependencies by either calling existing execution modules or by writing them yourself.
We can unit.test the deployment as it proceeds
Inside the install function of the jenkins module you would fire specific events depending on the results of the install.
if not _run_deployment_phase(...):
__salt__['event.send']('jenkins/install/error', {
'finished': False,
'message': "Something failed during the deployment!",
})
You would map that event to reactor sls files and handle it.
We only restart tomcat when necessary
Write a tomcat module. Add an _is_up(...) function where you would check if tomcat is up by parsing the tomcat logs for the result. Call the function inside a state module and add a mod_watch function.
def mod_watch():
# required dict to return
return_dict = {
"name": "Tomcat install",
"changes": {},
"result": False,
"comment": "",
}
if __salt__["tomcat._is_up"]():
return_dict["result"] = True
return_dict["comment"] = "Tomcat is up."
if __opts__["test"]:
return_dict["result"] = None
return_dict["comment"] = "comment here about what will change"
return return_dict
# execute changes now
return return_dict
Use your state module inside a state file.
install tomcat:
tomcat.install:
- name: ...
- user: ...
...
wait until tomcat is up:
cmd.run:
- name: ...
- watch:
- tomcat: install tomcat
We can update plugins on a per package basis
Add a function to your jenkins execution module named install_plugin. View pkg.install code to replicate interface.
A big emphasis on good clean intuitively clear salt configs
Write python execution modules for easy and maintainable configuration logic. Use that execution module inside your own state modules. Inside state files call your own state modules and supply individual configuration with any state renderer you like.
States only execute once, by design. If you need the same action to occur multiple times, you need multiple states. Also, includes are only included a single time.
Rather than all of this include/require stuff you're doing, you should just put all of the code into a single sls file, and generate states through jinja iteration.
If what you're trying to do is add a bunch of plugins, add config files, then at the end do restarts, then you should really just execute everything in order, don't use require, and use listen or listen_in, rather than watch or watch_in.
listen/listen_in cause triggered actions to happen at the end of a state run. They are similar to the concept of handlers in Ansible.
This is a pretty old question, but If you change your Jenkins/tomcat start/stop procedure to be a standard init/systemd/windows service (as all well behaved services should be), you could have a service.running for the Jenkins service and add this to each of your custom-plugin-{{ cust_plugin }} states.
require_in:
- svc: jenkins
watch_in:
- svc: jenkins
You could continue to use the cmd.run module with onchanges. You'd have to add onchanges_in: to each of the custom-plugin-{{ cust_plugin }} states, but you need to have at least one item in the on changes list or the command will fire every time the state runs.
If you use require you cause salt to re-order your states. If you want your states to run in order, just write them in the order you want them to run in.
Watch/watch_in will also re-order your states. If you use listen/listen_in instead, it'll queue the triggered actions to run in the order they were triggered at the end of the state run.
See:
http://ryandlane.com/blog/2014/07/14/truly-ordered-execution-using-saltstack/
http://ryandlane.com/blog/2015/01/06/truly-ordered-execution-using-saltstack-part-2/