How to do any or two successful minion return in Saltstack orch state? - salt-stack

I tried to write some orchestration state for converting HBase to high available mode. To do so, there are couple requirements for checking:
Datanode process has to be up and running on ALL nodes.
HRegion process has to be up and running on ALL nodes.
After converting, namenode process has to be up and running on two nodes only.
So I got ALL implicitly for free, e.g.
{# If all minions passed, then this will pass #}
2_1_example_pod_is_dn_running:
salt.function:
- name: cmd.run
- tgt: 'G#stype:hbase and G#pod_name:example_pod'
- tgt_type: compound
- arg:
- ps aux | grep datanode
- failhard: True
But how to check any or two?
{# Pseduo code. This won't work. #}
2_3_example_pod_is_nn_running:
- name: cmd.run
- tgt: 'G#stype:hbase and G#pod_name:example_pod'
- tgt_type: compound
- arg:
- ps aux | grep namenode
- successful_count: 2 {# <== namenode process has to be running on two minions #}
- failhard: True

So, what you want for this is to know which minions should "fail" the check. if you know you can use fail_minions to mark the minions that will not have namenode.
For things like this it is almost always to either be specific. when setting it up so you know how which minions will have namenode. or, to skip the check an use subset to only start namenode on 2 servers. subset will pick the number of servers of a subset of the targeted minions randomly and run the command on only the subset.

Related

SaltStack - mine.get is able to grab mine_function data from master, but not in .sls or jinja variable

I hope you can help me with a rather frustrating issue I have been having. I have been trying to remove static config from some config files and move this to Pillar/Mine data using Salt-Stack.
Everything is going well, with the exception of 1 specific task.
This is grabbing data (custom grain) from 3 specific minions to make 3 different variables in an .sls (context) or a jinja file (direct variable) on other minions, but I cannot seem to get it to work.
(My scenario is flexible as I can call this in either a state file or jinja variable in a config file.)
This is on AWS EC2 instances, but can be replicated away from AWS in my lab. The grain I need is: "public_ipv4" and the reason I cannot use the network.util in salt runner is because this is NAT'd and the box doesn't have a 2nd interface with the public IP assigned to it. (This cannot be changed)
Pillar data works and I have a init.sls for the mine function:
mine_functions:
grains.item:
- location
- environment
- roles
- srvtype
- instance
- az
- public_ipv4
- fqdn
- ipv4
- ipv6
(Also the custom grain: "public_ipv4" works being called by the minion so I know it is the not the grains themselves being incorrect.)
When targeting via the master using the below it brings back the requested information:
my-minion:
----------
minion-with-data-i-want-1:
----------
az:
c
environment:
dev
fqdn:
correct_fqdn
instance:
3
ipv4:
- Correct_local_ip
- 127.0.0.1
ipv6:
- ::1
- Correct_ip
location:
correct_location
public_ipv4:
Correct_public_ip
roles:
Correct_role
srvtype:
None
It is key to note here that the above comes from:
salt '*globbed_target*' mine.get '*minions-with-data-i-need-glob*' grains.item
This is from the master, but I cannot single out a specific grain by using indexing or any args/kwargs etc.
So I put some syntax into a state file and some jinja templates and I cannot get it to work. Here are a few I have tried so far:
Jinja:
{% set ip1 = salt['mine.get']('*minion-with-data-i-need-glob*', 'grains.item')[7] %}
Above returns nothing.
State file:
- context:
- ip1: {{ salt['mine.get']('*minions-with-data-i-need-glob*', 'grains.item') }}
The above returns a dict error:
Context must be formed as a dict
Running latest salt-minion/master from apt.
Steps I have taken:
Running: salt '*' mine.update after every change and checking with: salt '*' mine.valid after every change and they show.
Any help is appreciated.
This looks like you are running into a classic problem. Not knowing what you are getting as the return value.
first your {# set ip1 = salt['mine.get']('*minion-with-data-i-need-glob*', 'grains.item')[7] #} returns nothing because it is a jinja comment. {% set ip1 = salt['mine.get']('*minion-with-data-i-need-glob*', 'grains.item') %}
the next problem you have is that you are passing a list to context. when it is supposed to take a dict. the error isn't even related to mine.
try this instead
- context:
ip1: {{ salt['mine.get']('*minions-with-data-i-need-glob*', 'grains.item') | json}}
next learn to use slsutil.renderer to look at how things are rendered. such as salt minion slsutil.renderer salt://thing/init.sls default_renderer=jinja

How to do arithmetic on registered variables in ansible?

I'm writing a health-check playbook, and when a host is clustered (VCS), I want to make sure all cluster Service Groups are running.
The output of hastatus looks like this:
[root#node1 ~]# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A node1 RUNNING 0
A node2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService node1 Y N ONLINE
B ClusterService node2 Y N OFFLINE
B NFSExport node1 Y N ONLINE
B NFSExport node2 Y N OFFLINE
B Database node1 Y N ONLINE
B Database node2 Y N OFFLINE
B Application node1 Y N OFFLINE
B Application node2 Y N ONLINE
[root#node1 ~]#
A Service Group can run on any cluster node, and the status of every service group is reported for every cluster node, so the actual number of services groups is (servicegroups / nodes).
I've tried with and without the double braces {{ }} , but no matter what, the last debug task always produces a divide by zero error.
Any help would be appreciated.
# START OF BLOCK
- name: Check cluster status
block:
- name: How many cluster nodes?
shell: hastatus -sum|grep "^A"|wc -l
register: numnodes
- name: How many running cluster nodes?
shell: hastatus -sum|grep "^A"|grep "RUNNING"|wc -l
register: numrunningnodes
- name: report if not all nodes are running
debug:
msg: "ACTION: Not all cluster nodes are running!"
when: numnodes.stdout != numrunningnodes.stdout
# The number of cluster Service Groups == totalsgs / numnodes
- name: How many SGs ("B" lines)?
shell: hastatus -sum|grep "^B"|wc -l
register: totalsgs
- name: How many running SGs?
shell: hastatus -sum|grep "^B"|grep "RUNNING"|wc -l
register: runningsgs
- name: Is everything running somewhere?
debug:
msg: "ACTION: Not all SGs are running!"
when: {{ runningsgs.stdout|int }} != {{ totalsgs.stdout|int / numnodes.stdout|int }}
It's the second worst Ansible you can write. If you throw in include_role with when and loop on top of that, you will have the worst worst.
Ansible is not designed to be a good algorithmic language. It's really good at doing side-effects and juggling inventories, but it's terrible at doing math. You can do it, but it will be unreadable, non-debuggable, non-testable, and you will have your variables littered with global variables (which will for sure bite you later when you do not expect it).
How to do it right? The best best way is to write own module. Which is easier than you think (if you know Python).
The second best (which may be even better for small projects than custom module) is to use script or command module.
Just shovel data as input to the stdin script, and get well processed data back from stdout. The main trick is to produce stdout output in json format and parse it with |from_json filter.
This is example of use of command to parse data:
- name: Get data from somewhere
shell: hastatus -sum
changed_when: false
register: hastatus_cmd
- name: Process data
delegate_to: localhost
command:
cmd: process_hastatus_output.py
stdin: '{{ hastatus_cmd.stdout }}'
changed_when: false
register: hastatus_data
- name: Use it
command: ...?...
when: hastatus.running != hastatus.nodes
vars:
hastatus: '{{ hastatus_data.stdout|from_json }}'
For process_hastatus_output.py you can write tests, you can run them without ansible to check of edge cases, and you'll have the beautiful, cosy language to transform your data.
Or, you can do it in mix of Jinja and Ansible, causing irreparable harm to yourself, and everyone reading your code later.

Accessing Mine data immediately after install

I am deploying a cluster via SaltStack (on Azure) I've installed the client, which initiates a reactor, runs an orchestration to push a Mine config, do an update, restart salt-minion. (I upgraded that to restarting the box)
After all of that, I can't access the mine data until I restart the minion
/srv/reactor/startup_orchestration.sls
startup_orchestrate:
runner.state.orchestrate:
- mods: orchestration.startup
orchestration.startup
orchestration.mine:
salt.state:
- tgt: '*'
- sls:
- orchestration.mine
saltutil.sync_all:
salt.function:
- tgt: '*'
- reload_modules: True
mine.update:
salt.function:
- tgt: '*'
highstate_run:
salt.state:
- tgt: '*'
- highstate: True
orchestration.mine
{% if salt['grains.get']('MineDeploy') != 'complete' %}
/etc/salt/minion.d/globalmine.conf:
file.managed:
- source: salt:///orchestration/files/globalmine.conf
MineDeploy:
grains.present:
- value: complete
- require:
- service: rabbit_running
sleep 5 && /sbin/reboot:
cmd.run
{%- endif %}
How can I push a mine update, via a reactor and then get the data shortly afterwards?
I deploy my mine_functions from pillar so that I can update the functions on the fly
then you just have to do salt <target> saltutil.refresh_pillar and salt <target> mine.update to get your mine info on a new host.
Example:
/srv/pillar/my_mines.sls
mine_functions:
aws_cidr:
mine_function: grains.get
delimiter: '|'
key: ec2|network|interfaces|macs|{{ mac_addr }}|subnet_ipv4_cidr_block
zk_pub_ips:
- mine_function: grains.get
- ec2:public_ip
You would then make sure your pillar's top.sls targets the appropriate minions, then do the saltutil.refresh_pillar/mine.update to get your mine functions updated & mines supplied with data. After taking in the above pillar, I now have mine functions called aws_cidr and zk_pub_ips I can pull data from.
One caveat to this method is that mine_interval has to be defined in the minion config, so that parameter wouldn't be doable via pillar. Though if you're ok with the default 60-minute interval, this is a non-issue.

How to run local commands on the salt orchestrator from the orchestrate runner

I'm trying to execute the redis-trib.rb utility (used to configure redis clusters) on the salt orchestrator, after executing salt states on multiple minions to bring up redis processes.
Reading the salt documentation it looks like the the orchestrate runner does what I want to execute the minion states.
Indeed this snippet functions perfectly when executed with sudo salt-run state.orchestrate orch.redis_cluster:
redis_cluster_instances_create:
salt.state:
- tgt: '*redis*'
- highstate: True
The problem is with the next step, which requires me to call redis-trib.rb on the orchestrator. Reading through the documentation it looks like I need to use the salt.runner state (executes another runner), to call the salt.cmd runner (executes a salt state locally), which in turn calls the cmd.run state to actually execute the command.
What I have looks like this:
redis_cluster_setup_masters_{{ cluster }}:
salt.runner:
- name: salt.cmd
- fun: cmd.run
- args:
- srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb create {% for shard in shards %}{{ shard['master'] }} {% endfor %}
- kwargs:
unless: srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb info {{ shards[0]['master'] }} | grep 'cluster_state:ok'
- require:
- salt: redis_cluster_instances_create
But it doesn't work, and salt errors out with:
lab-orchestrator_master:
----------
ID: redis_cluster_instances_create
Function: salt.state
Result: True
Comment: States ran successfully. No changes made to lab-redis04, lab-redis01, lab-redis02, lab-redis03.
Started: 09:54:57.811313
Duration: 14223.204 ms
Changes:
----------
ID: redis_cluster_setup_masters_pdnocg
Function: salt.runner
Name: salt.cmd
Result: False
Comment: Exception occurred in runner salt.cmd: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/salt/client/mixins.py", line 392, in _low
data['return'] = self.functions[fun](*args, **kwargs)
TypeError: cmd() takes at least 1 argument (0 given)
Started: 09:55:12.034716
Duration: 1668.345 ms
Changes:
Can anyone suggest what i'm doing wrong? Or an alternative way of executing commands locally on the orchestrator?
The problem is, that you are passing fun et. al. to the runner instead of the execution module. Also note that you have to pass the arguments via arg and not args:
redis_cluster_setup_masters_{{ cluster }}:
salt.runner:
- name: salt.cmd
- arg:
- fun=cmd.run
- cmd='srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb create {% for shard in shards %}{{ shard['master'] }} {% endfor %}'
- unless: srv/salt/orch/redis_cluster/usr/bin/redis-trib.rb info {{ shards[0]['master'] }} | grep 'cluster_state:ok'
- require:
- salt: redis_cluster_instances_create
That should do the trick, though I haven't tested it with the kwargs parameter.

In saltstack, how do I conditionally, and iteratively ( jinja ) apply an included state

This may seem at first to be pretty simple. But I can tell you I've been wracking my brains for a couple days on this. I've read a lot of docs, sat on IRC with folks, and spoken to colleagues and at this point I don't have an answer I really think holds up.
I've looked into a few possible approaches
reactor
orchestration runner
I don't like these two because of the top down execution necessity... they seem tailored to orchestrating multiple node states, not workflows in a single node.
custom states
This is kind of something I would REALLY like to avoid as this is a repeated workflow, and I don't want to build customizations like this. There's too much room for non legibility if I go down this path with my team mates.
requires / watches
These don't have a concept ( that I am aware of ) of applying a state repeatedly, or in a logical order / workflow.
And a few others I won't mention.
Without further discussion, here's my dilemma.
Goals:
Jenkins Master gets Deployed
We can unit.test the deployment as it proceeds
We only restart tomcat when necessary
We can update plugins on a per package basis
A big emphasis on good clean intuitively clear salt configs
Jenkins deployment is pretty straight forward. We drop in the packages, and the configs, and we're set.
Unit testing is harder. As an example I've got this state file.
actions/version.sls:
# Hit's the jenkins CLI interface to check for version info
# This can be used to verify that jenkins is active and the version we want
# Import some info
{%- from 'jenkins/init.sls' import jenkins_home with context %}
# Install plugins in jenkins_plugins list
jenkins_version:
cmd.run:
- name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" version
- cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
- user: jenkins
actions.version basically verifies that jenkins is running and queryable. we want to be sure of this during the build at several points.
example... tomcat takes time to spin up. we had to add a delay to that restart operation. If you check out start.sls below you can see that operation occurring. Note the bug open on init_delay: .
actions/start.sls:
# Starts the tomcat service
tomcat_start:
service.running:
- name: tomcat
- enable: True
- full_restart: True
# Not functional atm see --> https://github.com/saltstack/salt/issues/20631
# - init_delay: 120
# initiate a 120 second delay after any service start to let tomcat come up.
tomcat_wait:
module.run:
- name: test.sleep
- length: 60
include:
- jenkins.actions.version
Now we have this restart capability by doing an actions.stop and an actions.start. We have this actions.version state that we can use to verify that the system is ready to proceed with jenkins specific state workflows.
I want to do something kinda like this...
Install Jenkins --> Grab yaml of plugins --> install plugins that need it
Pretty straight forward.
Except, to loop through the yaml of plugins I am using Jinja.
And now I have no way to call and be sure that the start.sls and version.sls states can be repeatedly applied.
I am looking for, a good way to do that.
This would be something akin to a jenkins.sls
{% set repo_username = "foo" -%}
{% set repo_password = "bar" -%}
include:
- jenkins.actions.version
- jenkins.actions.stop
- jenkins.actions.start
# Install Jenkins
jenkins:
pkg:
- installed
# Import Jenkins Plugins as List, and Working Path
{%- from 'jenkins/init.sls' import jenkins_home with context %}
{%- import_yaml "jenkins/plugins.sls" as jenkins_plugins %}
{%- import_yaml "jenkins/custom-plugins.sls" as custom_plugins %}
# Grab updated package list
jenkins-contact-update-server:
cmd.run:
- name: curl -L http://updates.jenkins-ci.org/update-center.json | sed '1d;$d' > {{ jenkins_home }}/updates/default.json
- unless: test -d {{ jenkins_home }}/updates/default.json
- require:
- pkg: jenkins
- service: tomcat
# Install plugins in jenkins_plugins list
{% for plugin in jenkins_plugins %}
jenkins-plugin-{{ plugin }}:
cmd.run:
- name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" install-plugin "{{ plugin }}"
- unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ plugin }}"
- cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
- user: jenkins
- require:
- pkg: jenkins
- service: tomcat
Here is where I am stuck. require won't do this. and lists
of actions don't seem to schedule linearly in salt. I need to
be able to just verify that jenkins is up and ready. I need
to be able to restart tomcat after a single plugin in the
iteration is added. I need to be able to do this to satisfy
dependencies in the plugin order.
- sls: jenkins.actions.version
- sls: jenkins.actions.stop
- sls: jenkins.actions.start
# This can't work for several reasons
# - watch_in:
# - sls: jenkins-safe-restart
{% endfor %}
# Install custom plugins in the custom_plugins list
{% for cust_plugin,cust_plugin_url in custom_plugins.iteritems() %}
# manually downloading the plugin, because jenkins-cli.jar doesn't seem to work direct to artifactory URLs.
download-plugin-{{ cust_plugin }}:
cmd.run:
- name: curl -o {{ cust_plugin }}.jpi -O "https://{{ repo_username }}:{{ repo_password }}#{{ cust_plugin_url }}"
- unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ cust_plugin }}"
- cwd: /tmp
- user: jenkins
- require:
- pkg: jenkins
- service: tomcat
# installing the plugin ( REQUIRES TOMCAT RESTART AFTER )
custom-plugin-{{ cust_plugin }}:
cmd.run:
- name: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" install-plugin /tmp/{{ cust_plugin }}.jpi
- unless: java -jar jenkins-cli.jar -s "http://127.0.0.1:8080" list-plugins | grep "{{ cust_plugin }}"
- cwd: /var/lib/tomcat/webapps/ROOT/WEB-INF/
- user: jenkins
- require:
- pkg: jenkins
- service: tomcat
{% endfor %}
You won't be able to achieve this without using reactors, beacons and especially not without writing your own python execution modules.
Jenkins Master gets Deployed
Write a jenkins execution module in python with a function install(...):. In that function you would manage any dependencies by either calling existing execution modules or by writing them yourself.
We can unit.test the deployment as it proceeds
Inside the install function of the jenkins module you would fire specific events depending on the results of the install.
if not _run_deployment_phase(...):
__salt__['event.send']('jenkins/install/error', {
'finished': False,
'message': "Something failed during the deployment!",
})
You would map that event to reactor sls files and handle it.
We only restart tomcat when necessary
Write a tomcat module. Add an _is_up(...) function where you would check if tomcat is up by parsing the tomcat logs for the result. Call the function inside a state module and add a mod_watch function.
def mod_watch():
# required dict to return
return_dict = {
"name": "Tomcat install",
"changes": {},
"result": False,
"comment": "",
}
if __salt__["tomcat._is_up"]():
return_dict["result"] = True
return_dict["comment"] = "Tomcat is up."
if __opts__["test"]:
return_dict["result"] = None
return_dict["comment"] = "comment here about what will change"
return return_dict
# execute changes now
return return_dict
Use your state module inside a state file.
install tomcat:
tomcat.install:
- name: ...
- user: ...
...
wait until tomcat is up:
cmd.run:
- name: ...
- watch:
- tomcat: install tomcat
We can update plugins on a per package basis
Add a function to your jenkins execution module named install_plugin. View pkg.install code to replicate interface.
A big emphasis on good clean intuitively clear salt configs
Write python execution modules for easy and maintainable configuration logic. Use that execution module inside your own state modules. Inside state files call your own state modules and supply individual configuration with any state renderer you like.
States only execute once, by design. If you need the same action to occur multiple times, you need multiple states. Also, includes are only included a single time.
Rather than all of this include/require stuff you're doing, you should just put all of the code into a single sls file, and generate states through jinja iteration.
If what you're trying to do is add a bunch of plugins, add config files, then at the end do restarts, then you should really just execute everything in order, don't use require, and use listen or listen_in, rather than watch or watch_in.
listen/listen_in cause triggered actions to happen at the end of a state run. They are similar to the concept of handlers in Ansible.
This is a pretty old question, but If you change your Jenkins/tomcat start/stop procedure to be a standard init/systemd/windows service (as all well behaved services should be), you could have a service.running for the Jenkins service and add this to each of your custom-plugin-{{ cust_plugin }} states.
require_in:
- svc: jenkins
watch_in:
- svc: jenkins
You could continue to use the cmd.run module with onchanges. You'd have to add onchanges_in: to each of the custom-plugin-{{ cust_plugin }} states, but you need to have at least one item in the on changes list or the command will fire every time the state runs.
If you use require you cause salt to re-order your states. If you want your states to run in order, just write them in the order you want them to run in.
Watch/watch_in will also re-order your states. If you use listen/listen_in instead, it'll queue the triggered actions to run in the order they were triggered at the end of the state run.
See:
http://ryandlane.com/blog/2014/07/14/truly-ordered-execution-using-saltstack/
http://ryandlane.com/blog/2015/01/06/truly-ordered-execution-using-saltstack-part-2/

Resources