Salt multi master: does it work with multiple masters offline - salt-stack

I am trying to run a multi-master setup in our dev environment.
The idea is that every dev team has their own salt master. However, all minions in the entire dev environment should be able to receive salt commands from all salt master servers.
Since not every team needs their salt master 24/7, most of them are turned off for several days during the week.
I'm running 2016.11.4 on the masters, as well as on the minions.
However, I run into the following problem: if one of the hosts listed in the minion's config file is shut down, the minion will not always report back on a 'test.ping' command (not even with -t 60).
My experience is that the more master servers are offline, the longer the minion takes to answer requests.
Especially if you execute a 'test.ping' on MasterX while the minions' log is at this point:
2017-05-19 08:31:44,819 [salt.minion ][DEBUG ][5336] Connecting to master. Attempt 4 (infinite attempts)
If I trigger a 'test.ping' at this point, chances are 50/50 that I will get a 'minion did not return' on my master.
Obviously though, I always want a return to my 'test.ping', regardless from which master I send it.
Can anybody tell me if what I am trying to do is feasible with Salt? All the articles about Salt multi-master setups that I could find only say: 'put a list of master servers into the minion config and that's it!'

The comment from gtmanfred solved my question:
That is not really the way multi master is meant to work. It is supposed to be used more for failover and not for separating out teams.
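For reference, the kind of minion configuration those articles describe looks like the sketch below. The hostnames and tuning values are assumptions, not from the question, but `master_alive_interval` in particular governs how quickly a minion notices that a master went offline, which relates to the lag described above.

```yaml
# /etc/salt/minion (sketch; hostnames and values are illustrative)
master:
  - team-a-master
  - team-b-master
  - team-c-master

# Check every 30 seconds whether each configured master is still
# reachable, so a shut-down master is detected sooner.
master_alive_interval: 30

# Give up on authenticating against an unreachable master faster.
auth_timeout: 10
```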

Related

salt-stack multi-master setup - slow and unreliable. What am I doing wrong?

I have to manage a cluster of ~600 Ubuntu (16.04-20.04) servers using SaltStack 3002.
I decided on a multi-master setup for load distribution and fault tolerance. salt-syndic did not seem like the right choice for me. Instead, I thought the salt-minions should pick a master from a list at random (?) at minion start. So my config looks as follows (excerpts):
master:
  auto_accept: True
  master_sign_pubkey: True
  master_use_pubkey_signature: True
minion:
  master:
    - saltmaster001
    - saltmaster002
    - saltmaster003
  verify_master_pubkey_sign: True
  retry_dns: 0
  master_type: failover
  random_master: True
(three salt masters as you can see). I basically followed this tutorial: https://docs.saltstack.com/en/latest/topics/tutorials/multimaster_pki.html
Now, it doesn't work really well... For various reasons:
salt 'tnscass*' test.ping
tnscass011.mo-mobile-prod.ams2.cloud:
True
tnscass010.mo-mobile-prod.ams2.cloud:
True
tnscass004.mo-mobile-prod.ams2.cloud:
True
tnscass005.mo-mobile-prod.ams2.cloud:
Minion did not return. [Not connected]
tnscass003.mo-mobile-prod.ams2.cloud:
Minion did not return. [Not connected]
tnscass007.mo-mobile-prod.ams2.cloud:
Minion did not return. [Not connected]
Salt runs on a master only work if the targeted minions happen to be connected to the master on which you issue the salt command and not to any other master. In the above example the response would be True for different minions if you ran it on a different master.
So the only way is to use salt-call on a particular minion. Not very useful. And even that is not working well, e.g.:
root@minion:~# salt-call state.apply
[WARNING ] Master ip address changed from 10.48.40.93 to 10.48.42.32
[WARNING ] Master ip address changed from 10.48.42.32 to 10.48.42.35
So the minion decides to switch to another master, and the salt-call takes ages... The rules that determine under which conditions a minion decides to switch are not explained (at least I couldn't find anything)... Is it the load on the master? The number of connected minions?
Another problem is the salt mines. I'm using code as follows:
salt.saltutil.runner('mine.get', tgt='role:mopsbrokeraggr', fun='network.get_hostname', tgt_type='grain')
Unfortunately, the values of the mines differ badly from minion to minion, so the mines are unusable as well.
I should mention that my masters are big machines with 16 cores and 128GB RAM, so this is not a matter of resource shortage.
To me, the scenario described in https://docs.saltstack.com/en/latest/topics/tutorials/multimaster_pki.html simply does not work at all.
So, can anybody tell me how to create a proper setup with three salt masters for load distribution?
Is salt-syndic actually the better approach?
Can salt-syndic be used to randomly assign minions to masters based on load or some other criterion?
What is the purpose of the mentioned tutorial? Or have I just overlooked something?
There are a couple of statements worth noticing in the documentation about this method. Quoting from the link in the question:
The first master that accepts the minion, is used by the minion. If the master does not yet know the minion, that counts as accepted and the minion stays on that master.
Then
A test.version on the master the minion is currently connected to should be run to test connectivity.
So this seems to indicate that the minion is connected to one master at a time. Which means only that master can run test.version on that minion (and not any other master).
One of the primary objectives of your question can be met with a different method of multi-master setup: https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html
In a nutshell, you configure more than one master with the same PKI keypair. In the explanation below I have a multi-master setup with two servers; I copy the following files from my first/primary server to the second server:
/etc/salt/pki/master/master.pub
/etc/salt/pki/master/master.pem
Then configure salt-minion for multiple masters in /etc/salt/minion:
master:
- master1
- master2
Once the respective services have been restarted, you can check that all minions are available on both masters with salt-key -L:
# salt-key -L
Accepted Keys:
Denied Keys:
Unaccepted Keys:
minion1
minion2
minion3
...
Rejected Keys:
Once all minions' keys are accepted on both masters, we can run salt '*' test.version from either of the masters and reach all minions.
There are other considerations on how to keep the file_roots, pillar_roots, minion keys, and configuration consistent between the masters in the link referenced above.

Salt - Reach all minions in multimaster mode

I'm migrating salt to salt multimaster.
So in minions config I have my master list, with some multimaster parameters.
I see that each master has its own connected minions and can talk only to them.
I currently have jobs which send salt commands to my master to run tasks on some minions.
With multi-master I would need to connect to each master and run the command there if I want to reach all the desired minions.
Is there a way to run commands on all minions from a single host?
You can use syndics and have another master on top.
https://docs.saltstack.com/en/latest/topics/topology/syndic.html#syndic
This way the minions will connect to your normal master(s), where a syndic is also installed, and can fail over to any of them. A syndic (another form of special minion) connects to the MoM (Master of Masters), from which you can push commands to all your masters. You can also have multiple MoMs to which the syndics are always connected.
This offers HA for your minions, masters/syndics and masters of masters.
It has some performance impact if you plan to deploy a large number of minions and might require tuning many options.
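A minimal sketch of such a topology (hostnames are assumptions, not from the original answer): the top-level master enables order_masters, and each mid-tier master also runs a salt-syndic pointed at it.

```yaml
# /etc/salt/master on the Master of Masters (MoM)
order_masters: True

# /etc/salt/master on each mid-tier master that also runs salt-syndic
syndic_master: mom.example.com

# /etc/salt/minion on the minions: list the mid-tier masters as usual
master:
  - master1.example.com
  - master2.example.com
```

With this in place, a command such as `salt '*' test.ping` issued on the MoM is forwarded through the syndics to all minions.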
Unfortunately, the syndic architecture is not officially supported by SaltStack at the time I'm writing this, but they welcome community patches. The main reason is that they provide HA through their Enterprise product and want customers purchasing the HA option rather than getting it for free.
The last point is just my personal opinion, based on PRs/feature requests I have seen being rejected or dismissed by the SaltStack team. I don't think they made any official announcement about this.

Can a salt master provide a state to unavailable minions?

Background: I have several servers which run a service I develop. All of them should have the same copy of the service.
To ensure deployment and up-to-dateness I use Ansible, with an idempotent playbook which deploys the service. Since the servers are on an unreliable network, I have to run the playbook periodically (in a cron job) to reach the servers which may not have been available before.
Problem: I was under the impression that the SaltStack philosophy is different: I thought I could just "set a state, compile it and offer it to a set of minions. These minions would then, at their own leisure, come to the master and get whatever they need to do".
This does not seem to be the case: the minions which were not available at deployment time are skipped.
Question: is there a mechanism which would allow for an asynchronous deployment, in the sense that a state set on the master one time only would then be pulled and applied by the minions (to themselves) once they are ready / can reach the master?
Specifically, without the need to continuously re-offer the same state to all minions, in the hope that the ones which were unavailable in the past are now capable to get the update.
Each time a minion connects to the master there is an event on the event bus which you can react upon (see the Reactor system).
This is the main difference between Ansible and SaltStack.
In order to do what you want, I would react on each minion's reconnect and apply a state which is idempotent.
You could also set up a scheduled task in SaltStack that runs the state every X minutes and applies the desired configuration.
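A sketch of the reactor approach (the file paths and the choice to apply the highstate are assumptions): react to the minion-start event by applying state to the minion that just connected, and optionally add a minion-side schedule as a safety net.

```yaml
# /etc/salt/master.d/reactor.conf -- map the start event to a reactor SLS
reactor:
  - 'salt/minion/*/start':
    - /srv/reactor/apply_on_start.sls

# /srv/reactor/apply_on_start.sls -- apply state to the minion
# that just (re)connected
apply_on_start:
  local.state.apply:
    - tgt: {{ data['id'] }}

# Optionally, a minion-side schedule (minion config or pillar) that
# re-applies state periodically, catching anything the reactor missed
schedule:
  periodic_apply:
    function: state.apply
    minutes: 30
```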
The answer from Daniel Wallace (Salt developer):
That is not possible.
The minions connect to the publish port/bus and the master puts new jobs on that bus. Then the minion picks it up and runs the job. If the minion is not connected when the job is published, then it will not see the job.

SaltStack File Server Access Control

I am trying to have different security levels for different minions. I already have different pillars, so a secret ssh key for one minion can not be seen from another.
What I want to attain is: that an easy-to-attack minion, say an edge cloud server run by someone else, cannot download or even see the software packages in the file-roots that I am installing on high-security minions in my own data center.
It appears that the Salt file server, apart from overloaded filenames existing in multiple environments, will serve every file to every minion.
Is there really no way, using environments, pillars, or clever file-root includes, to make certain files inaccessible to a particular minion?
By design the salt file server will serve every file to every minion.
There is something you could do to work around this.
Use a syndic. A minion can only see the file_roots of the master it is directly attached to, so you could have your easy-to-attack minions connect to a specific syndic, while still controlling them from the top-level master that the rest of your minions connect to directly.
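A sketch of that layout (paths and hostnames are assumptions): the syndic serving the low-trust minions gets its own restricted file_roots, separate from the top-level master's.

```yaml
# /etc/salt/master on the top-level master (trusted minions attach here)
order_masters: True
file_roots:
  base:
    - /srv/salt/full          # complete set of states and packages

# /etc/salt/master on the syndic serving easy-to-attack edge minions
syndic_master: top-master.example.com
file_roots:
  base:
    - /srv/salt/restricted    # only files safe to expose to edge minions
```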

What happens if orchestration triggers a salt-master service restart?

Suppose I have an orchestration file that runs the salt-formula's salt.master state, among others. Suppose also that I've made some pillar change that results in an update to the master's config file, which in turn causes the salt-master service to restart.
What happens to the rest of the orchestration run? In particular, what happens if the config change is to something like GitFS remotes, where new files may be available to minions after the salt.master state runs?
Once the salt-master service restarts, a highstate stops dead in its tracks. There is no built-in way for a highstate to keep state across salt-master restarts. There are some workarounds where you set a flag on the file system or in a grain and have your highstate check for those flags.
That being said, if you're using the state.orchestrate or state.over runners, those aren't necessarily dependent on the salt-master daemon. I haven't tested this, but the state.orchestrate should most likely continue even if the salt-master daemon restarts.
I may have some time this afternoon to test, but I'd recommend just testing this in your environment.
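One common workaround for restarting the master from a state without killing the run on the spot is to launch the restart in the background, so the state returns before the daemon actually goes down (a sketch; the file names are assumptions):

```yaml
# /srv/salt/salt_master.sls
master_config:
  file.managed:
    - name: /etc/salt/master
    - source: salt://files/master.conf

restart_salt_master:
  cmd.run:
    # bg: True detaches the command so the state run can return
    # before the salt-master service is actually restarted
    - name: 'salt-call --local service.restart salt-master'
    - bg: True
    - onchanges:
      - file: master_config
```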
