I changed my Salt architecture from a single Salt master to multiple salt-master/syndic pairs.
I set up a high-level master of masters to which the syndics connect via syndic_master.
It works well: when I run salt '*' test.ping, minions from different masters are returned.
Now I would like to add a second master of masters. My syndic config now looks like this:
id: salt-syndic1
syndic_master:
- 10.30.2.37
- 10.30.2.38
If I now run salt '*' test.ping on both masters of masters, the returns seem to be split: one MoM returns the minions from one syndic, and the other MoM returns the minions from the other syndic. For the minions that did not respond to each MoM, I get this error:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20201119145521842618
So we can see that the command is sent to all minions from both MoMs, but only one syndic returns results per MoM.
I set the master_id option only on the master-of-masters servers.
I also tried sharing the job cache between the MoMs, without success.
You can try this in your syndic config:
syndic_forward_all_events: True
Ref: https://docs.saltstack.com/en/latest/ref/configuration/master.html#syndic-forward-all-events
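Putting it together, a minimal syndic config with this option might look like the following sketch (the id and IPs mirror the question and are placeholders):

```yaml
# /etc/salt/master on the syndic node (sketch; IPs are placeholders)
id: salt-syndic1
syndic_master:
  - 10.30.2.37
  - 10.30.2.38
# Forward all events from this syndic upward, so both masters of
# masters see returns from all syndics, not just "their" syndic.
syndic_forward_all_events: True
```

Restart the salt-master and salt-syndic services on the syndic node after changing this.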
I have to manage a cluster of ~600 Ubuntu (16.04-20.04) servers using SaltStack 3002.
I decided on a multi-master setup for load distribution and fault tolerance. salt-syndic did not appear to be the right choice for me. Instead, I thought the salt-minions should pick a master at random (?) from a list at minion start. So my config looks as follows (excerpts):
master config:
auto_accept: True
master_sign_pubkey: True
master_use_pubkey_signature: True
minion config:
master:
- saltmaster001
- saltmaster002
- saltmaster003
verify_master_pubkey_sign: True
retry_dns: 0
master_type: failover
random_master: True
(three salt masters as you can see). I basically followed this tutorial: https://docs.saltstack.com/en/latest/topics/tutorials/multimaster_pki.html
Now, it doesn't work really well... For various reasons:
salt 'tnscass*' test.ping
tnscass011.mo-mobile-prod.ams2.cloud:
True
tnscass010.mo-mobile-prod.ams2.cloud:
True
tnscass004.mo-mobile-prod.ams2.cloud:
True
tnscass005.mo-mobile-prod.ams2.cloud:
Minion did not return. [Not connected]
tnscass003.mo-mobile-prod.ams2.cloud:
Minion did not return. [Not connected]
tnscass007.mo-mobile-prod.ams2.cloud:
Minion did not return. [Not connected]
Salt commands run on a master work only if the targeted minions happen to be connected to the master on which you issue the command, and not to any other master. In the above example, the response would be True for different minions if you ran it on a different master.
So the only way is to use salt-call on a particular minion, which is not very useful. And even that does not work well, e.g.:
root@minion:~# salt-call state.apply
[WARNING ] Master ip address changed from 10.48.40.93 to 10.48.42.32
[WARNING ] Master ip address changed from 10.48.42.32 to 10.48.42.35
So the minion decides to switch to another master, and the salt-call takes ages... The rules that determine under which conditions a minion decides to switch are not explained (at least I couldn't find anything). Is it the load on the master? The number of connected minions?
Another problem is the salt mines. I'm using code as follows:
salt.saltutil.runner('mine.get', tgt='role:mopsbrokeraggr', fun='network.get_hostname', tgt_type='grain')
Unfortunately, the values in the mine differ badly from minion to minion, so the mines are unusable as well.
I should mention that my masters are big machines with 16 cores and 128GB RAM, so this is not a matter of resource shortage.
To me, the scenario described in https://docs.saltstack.com/en/latest/topics/tutorials/multimaster_pki.html simply does not work at all.
So, could anybody tell me how to create a proper setup with three Salt masters for load distribution?
Is salt-syndic actually the better approach?
Can salt-syndic be used with randomly assigning the minions to the masters based on load or whatever?
What is the purpose of the mentioned tutorial? Or have I just overlooked something?
There are a couple of statements worth noticing in the documentation about this method. Quoting from the link in the question:
The first master that accepts the minion, is used by the minion. If the master does not yet know the minion, that counts as accepted and the minion stays on that master.
Then
A test.version on the master the minion is currently connected to should be run to test connectivity.
So this seems to indicate that the minion is connected to one master at a time, which means only that master can run test.version on that minion (and no other master can).
One of the primary objectives of your question can be met with a different method of multi-master setup: https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html
In a nutshell, you configure more than one master with the same PKI keypair. In the explanation below, I have a multi-master setup with two servers; I use the following files from my first/primary server on the second server:
/etc/salt/pki/master/master.pub
/etc/salt/pki/master/master.pem
Then configure salt-minion for multiple masters in /etc/salt/minion:
master:
- master1
- master2
Once the respective services have been restarted, you can check that all minions are available on both masters with salt-key -L:
# salt-key -L
Accepted Keys:
Denied Keys:
Unaccepted Keys:
minion1
minion2
minion3
...
Rejected Keys:
Once all minions' keys are accepted on both masters, we can run salt '*' test.version from either master and reach all minions.
There are other considerations on how to keep the file_roots, pillar_roots, minion keys, and configuration consistent between the masters in the link referenced above.
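One common way to keep file_roots consistent between the masters (a sketch I'm adding, not part of the original answer; the repository URL is hypothetical) is to have every master pull its states from the same git repository via gitfs:

```yaml
# /etc/salt/master on every master (sketch); requires a gitfs
# provider such as pygit2 or GitPython to be installed
fileserver_backend:
  - gitfs
gitfs_remotes:
  - https://example.com/ops/salt-states.git  # hypothetical repo
```

The same idea applies to pillar via git_pillar, which leaves only the minion keys and master config to synchronize out of band.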
How do I implement a state that waits for other minions to finish certain jobs before executing another state?
For example, I have a cluster of minions called minion-aha1 to minion-aha3, and I install hadoop and hbase on these 3 minions. Now, I would like to convert them to HA mode. Suppose minion-aha1 is the leader. So the logic flow would be:
Start hadoop and hbase on all 3 minions
-> minion-aha1 waits until hadoop and hbase on the rest of the minions are up and healthy
-> minion-aha1 calls join (e.g. stop namenode, hdfs namenode -initializeSharedEdits, start namenode)
-> the rest of the minions call nn2 (e.g. hdfs namenode -bootstrapStandby, start namenode)
I already know how to convert hbase to HA mode, and I can set the leader in a grain. I'm just curious how to shrink the above procedure to a single line, i.e.
salt 'minion-aha*' state.apply hadoop.hbase_to_ha
Even a salt orchestration state would be acceptable. The above would fail because minion-aha1 never knows the state of the rest of the minions. In other words, it might run successfully once if the developer is lucky, but I am looking for a solution that runs successfully every time.
Thank you.
If you want to solve this without making use of an Orchestration SLS, you could look at one of the following approaches:
use the Salt Mine to publish information from a Minion to the Master, which can then be retrieved by another one
use Peer Communication to allow one Minion to generate a job to be executed on another one
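For the Salt Mine approach, a minimal sketch might look like this (the mined function and targets are illustrative, borrowed from the question's naming):

```yaml
# /etc/salt/minion (or pillar) on the hbase minions: publish the
# hostname to the mine, refreshed every 5 minutes
mine_functions:
  network.get_hostname: []
mine_interval: 5
```

A state or template rendered on minion-aha1 can then read the other minions' published data:

```yaml
{# Jinja in an SLS rendered on minion-aha1: collect peer mine data #}
{% set peers = salt['mine.get']('minion-aha*', 'network.get_hostname') %}
```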
Basically I benefited from this answer, with some modifications:
Trigger event on Master and wait for "response event" on Salt Minion
For a custom event sent to the salt master, e.g. mycompany/hbase/status/*/start, I have to send the event, then run saltutil.sync_all, then wait_for_event.
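As an illustration of the resulting flow, an orchestration SLS built around salt.wait_for_event could look like this sketch (file path, SLS names, and timeout are my own assumptions; the event tag mirrors the question):

```yaml
# /srv/salt/orch/hbase_ha.sls (hypothetical path), run with:
#   salt-run state.orchestrate orch.hbase_ha
start_services:
  salt.state:
    - tgt: 'minion-aha*'
    - sls: hadoop.hbase_start        # hypothetical SLS

# Block until both followers have fired their custom start event
wait_for_followers:
  salt.wait_for_event:
    - name: mycompany/hbase/status/*/start
    - id_list:
        - minion-aha2
        - minion-aha3
    - timeout: 600
    - require:
      - salt: start_services

convert_leader:
  salt.state:
    - tgt: 'minion-aha1'
    - sls: hadoop.hbase_to_ha        # hypothetical SLS
    - require:
      - salt: wait_for_followers
```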
I am trying to run a multi-master setup in our dev environment.
The idea is that every dev team has their own salt master. However, all minions in the entire dev environment should be able to receive salt commands from all salt master servers.
Since not every team needs their salt master 24/7, most of them are turned off for several days during the week.
I'm running 2016.11.4 on the masters, as well as on the minions.
However, I run into the following problem: if one of the hosts listed in the minion's config file is shut down, the minion will not always report back on a 'test.ping' command (not even with -t 60).
My experience is that the more master servers are offline, the longer the minion lags in answering requests.
Especially if you execute a 'test.ping' on MasterX while the minions' log is at this point:
2017-05-19 08:31:44,819 [salt.minion ][DEBUG ][5336] Connecting to master. Attempt 4 (infinite attempts)
If I trigger a 'test.ping' at this point, chances are 50/50 that I will get a 'minion did not return' on my master.
Obviously though, I always want a return to my 'test.ping', regardless from which master I send it.
Can anybody tell me if what I am trying is feasible with Salt? All the articles about Salt multi-master setups that I could find only say: 'put a list of master servers into the minion config and that's it!'
The comment from gtmanfred solved my question:
That is not really the way multi master is meant to work. It is supposed to be used more for failover and not for separating out teams.
I just have a question I can't figure out about SaltStack. It concerns the master and the minion configuration.
Salt is event-driven, but the documentation says (and it works) that we should only open ports on the master, and that events are received on the master.
However, it seems a little ambiguous, as the salt command is run from the master to execute tasks on the minions. I'm unsuccessfully trying to understand how the master does that, and I can't find clear documentation about it.
We also have these statements in the Salt architecture documentation:
More Salt Master facts:
Job publisher with pub/sub and reply channel;
Two open ports on master (default 4505 / 4506);
Salt Mine stores the most recent minion data; cached data is visible
to other minions;
Salt Syndic passes jobs from a higher master for hierarchical system
management;
Multi-master for SaltStack high availability.
and this
More Salt Minion facts: Listens and receives jobs from a remote Salt
Master;
Creates and reports compressed events (job results, system alerts) to
the Salt Master;
No open ports, not chatty;
Shares data with other Salt Minions via the peer system;
Returners deliver minion data to any system (Salt Master by default)
that can receive the data.
I've highlighted what is ambiguous for me in the attached screenshot.
The question being: how can we say that no port is to be opened on the minions, while also saying that the minions are listening to the master?
Minions listen on what? To what?
Thanks for clarifications.
Good question here. By default, Salt uses a zmq pub/sub interface. So there is a slight mismatch between what's literally happening on the network and most people's mental model of how Salt works.
The zmq connection just needs those two ports open on the Salt master for the pub/sub interface to work. The minion reaches out to the master on the pub port; zmq handles all the necessary network communication for you. The Salt Master "publishes" jobs on the pub port.
As far as a mental model of how Salt works, it's helpful to think of the minion "listening" on the pub port and executing commands whenever the Salt Master publishes a job whose target matches the minion.
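In master config terms, those two ports correspond to the following options (these are the documented defaults; shown here as a sketch of /etc/salt/master):

```yaml
# /etc/salt/master (defaults): the only listening sockets in the setup.
# Minions make outbound connections to both; nothing listens on a minion.
publish_port: 4505  # zmq PUB socket the minions subscribe to for jobs
ret_port: 4506      # zmq socket minions use to send back job returns
```

This is why firewall rules only ever need to allow inbound 4505/4506 on the master, and nothing inbound on the minions.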
I'd like to assign a role to various minions based on a process that is running. For example I can do this
salt '*' cmd.run 'ps ax |grep [n]amed'
This returns the servers that are running bind, but since it runs against everyone (*), I get not only the DNS servers but everyone else as well, albeit with blank return data. Is there a way to only return the servers where this is true, and then pipe that into grains.setval role nameserver?
Thanks!
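One way to approach this (a sketch I'm adding; the state file name and grain values are assumptions, not from the question) is to skip the piping entirely and let each minion decide for itself with an onlyif requisite:

```yaml
# /srv/salt/roles/nameserver.sls (hypothetical path); apply with:
#   salt '*' state.apply roles.nameserver
# Only minions where 'pgrep named' exits 0 get the grain set.
set_nameserver_role:
  grains.present:
    - name: role
    - value: nameserver
    - onlyif: pgrep named
```

Afterwards the DNS servers can be targeted directly with salt -G 'role:nameserver' test.ping.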