SaltStack events got lost? - salt-stack

We set up a Salt state that sends an event to the Salt master when certain files are missing. We also configured a reactor to catch the event and copy the missing files accordingly. This works on any single minion. However, when we apply the state to multiple minions (anywhere from 8 to 80+), only some of them (anywhere from 5 to 40+, seemingly at random) trigger the reactor. We can see that all the events are sent; the reactor just never picks up all of them. With no configuration change, if we run the state again, some of the remaining minions (again, a random number) trigger the reactor and get the missing files. Only after enough reruns do all the minions complete successfully.
We tried adjusting REACTOR_WORKER_THREADS and/or REACTOR_WORKER_HWM, thinking the queue might be too small, but it did not seem to help.

Did you try adding the queue=True parameter to your state file? See below:
{% if data['tag'] == 'picachu' %}
trigger_picachu:
  local.state.apply:
    - tgt: {{ data['id'] }}
    - arg:
      - gitfs.picachu
      - queue=True
{% endif %}

Related

How to store a particular node from an output of a state.sls file in a variable for SaltStack

I'm working on a project where I need to check the status of a service, let's call it RunningService, on multiple Windows servers (more than 500 machines). We use SaltStack extensively for our deployments.
I'm able to check the status of my service using the code below:
status_ser:
  module.run:
    - name: service.status
    - m_name: RunningService
Running this state gives the following output:
----------
          ID: status_ser
    Function: module.run
        Name: service.status
      Result: True
     Comment: Module function service.status executed
     Started: 16:20:58.295237
    Duration: 78.124 ms
     Changes:
              ----------
              ret:
                  True
Summary for minion-3
------------
Succeeded: 1 (changed=1)
Failed:    0
------------
Total states run:     1
Total run time:  78.124 ms
However, I only want the result (True or False) from this output, ideally stored in a variable as part of a larger piece of code. Then I'd check it like this:
if var == 'False'
    then start RunningService
else
    do nothing
endif
How can I get only the status of the service as True or False?
Or, more generally, how can I store one part of the output in a variable or pass it as input to something else?
Thanks in advance.
The short answer: to store the output of a command or Salt module in a variable, use set in a Jinja expression, like this:
{% set service_status = salt['service.status']('RunningService') %}
Either True or False will now be stored in service_status. This can be used in conditional statements. Start service taking example from your question:
{% if not service_status %}
start_service:
  module.run:
    - service.start:
      - name: RunningService
{% endif %}
However, there are a few things to consider:
- All Jinja expressions are evaluated before the states are run.
- SaltStack is better used to define the desired state of a service, i.e. if you want a particular service to be running, just declare it so.
So it's better to use the Salt service state, which does such checks internally. Then this is all the code you'll need:
start_service:
  service.running:
    - name: RunningService
Now, whenever you run this state, the service will be started if it's not running, and nothing will be done if it's already running.

Count attempts in airflow sensor

I have a sensor that waits for a file to appear in an external file system.
The sensor uses mode="reschedule".
I would like to trigger a specific behavior after X failed attempts.
Is there a straightforward way to know how many times the sensor has already run the poke method?
My quick fix so far has been to push an XCom with the attempt number, and increase it every time the poke method returns False. Is there any built-in mechanism for this?
Thank you
I had a similar problem with mode="reschedule": I needed to poke a different file path depending on the current time, without referencing pendulum.now or datetime.now directly.
I used task_reschedules, as the base sensor operator does to get try_number in reschedule mode (https://airflow.apache.org/docs/apache-airflow/stable/_modules/airflow/sensors/base.html#BaseSensorOperator.execute):
from airflow.models import TaskReschedule

# Method of the custom sensor class
def execute(self, context):
    # One TaskReschedule record exists per earlier poke in reschedule mode
    task_reschedules = TaskReschedule.find_for_task_instance(context['ti'])
    self.poke_number = len(task_reschedules) + 1
    super().execute(context)
self.poke_number can then be used within poke(), and the current time is approximately execution_date + (poke_number * poke_interval).
Apparently the XCom approach doesn't work, because XComs pushed in one poke don't seem to be available in the next; they always come back undefined.
try_number on the task_instance doesn't help either, as pokes don't count as new tries.
I ended up computing the attempt number by hand:
attempt_no = math.ceil((pendulum.now(tz='utc') - kwargs['ti'].start_date).total_seconds() / kwargs['task'].poke_interval)
This works as long as individual executions of the poke method don't last longer than the poke interval (which they shouldn't).
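As a rough sketch of the same arithmetic using only the standard library: the function name attempt_number and the sample timestamps are made up for illustration, and the start time stands in for what the answer reads from the task instance's start_date. It uses total_seconds() because timedelta.seconds only holds the remainder within a day, which would undercount once a sensor has been waiting for more than 24 hours.

```python
import math
from datetime import datetime, timedelta, timezone


def attempt_number(start: datetime, now: datetime, poke_interval: float) -> int:
    """Estimate which poke is currently running, counting the first as 1."""
    elapsed = (now - start).total_seconds()
    # ceil() maps (0, interval] to 1, (interval, 2 * interval] to 2, and so on.
    # max() guards the edge case where now == start exactly.
    return max(1, math.ceil(elapsed / poke_interval))


# Hypothetical start time of the sensor's first poke.
t0 = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

first = attempt_number(t0, t0 + timedelta(seconds=0.5), 60)
second = attempt_number(t0, t0 + timedelta(seconds=60.5), 60)
# More than a day of waiting with an hourly poke interval.
late = attempt_number(t0, t0 + timedelta(days=2, seconds=30), 3600)
```

Inside poke(), the result could be compared against a threshold to trigger the special behavior after X attempts.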
Best

Manual DAG run set individual task state

I have a DAG without a schedule (it is run manually as needed). It has many tasks. Sometimes I want to 'skip' some initial tasks by manually changing the task state to SUCCESS. Changing the task state of a manually executed DAG fails, seemingly because of a bug in parsing the execution_date.
Is there another way to set task states individually for a manually executed DAG?
Example run below. The execution date of the task is 01-13T17:27:13.130427, and I believe the fractional seconds are not being parsed correctly.
Traceback
Traceback (most recent call last):
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/site-packages/airflow/www/views.py", line 2372, in set_task_instance_state
    execution_date = datetime.strptime(execution_date, '%Y-%m-%d %H:%M:%S')
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/_strptime.py", line 365, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: ..130427
It doesn't work from the Task Instances page, but you can do it from another page:
- open the DAG graph view
- select the run you need (screen 1) and click Go
- select the task you need
- in the popup window click Mark success (screen 2)
- then confirm.
PS: this applies to Airflow 1.9.
Screen 1
Screen 2
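For what it's worth, the ValueError in the traceback can be reproduced outside Airflow: the format string '%Y-%m-%d %H:%M:%S' has no %f directive, so the fractional seconds are left over as unconverted data. The sketch below is only an illustration of that behavior, not Airflow's own fix. The year in the sample timestamp is an assumption (the question only shows 01-13T17:27:13.130427), and parse_execution_date is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical full timestamp; the year is assumed for illustration.
raw = "2018-01-13 17:27:13.130427"


def parse_execution_date(value: str) -> datetime:
    """Try the format with fractional seconds first, then the plain one."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised execution date: {!r}".format(value))


# The format from the traceback rejects the fractional part.
try:
    datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    plain_format_failed = False
except ValueError:
    plain_format_failed = True

parsed = parse_execution_date(raw)
```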
What you may want to use here is branching, which, as the name suggests, lets you follow different execution paths depending on some condition, just like an if in any programming language.
You can use the BranchPythonOperator for this: the operator is configured with a python_callable, a function that returns the task_id to execute next (which should, of course, be a task directly downstream of the BranchPythonOperator itself).
Using branching sets the skipped tasks to the proper state automatically, as mentioned in the documentation:
All other “branches” or directly downstream tasks are marked with a state of skipped so that these paths can’t move forward. The skipped states are propagated downstream to allow for the DAG state to fill up and the DAG run’s state to be inferred.
The resulting DAG would look something like the example figure in the Airflow documentation (source: apache.org).
Branching is covered in the official Apache Airflow documentation.
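A minimal sketch of what such a python_callable could look like, kept free of Airflow imports so it can be run on its own. Everything named here is hypothetical: the task ids, the skip_initial key, and the idea of passing the flag through the DAG run's conf when triggering the run manually. In a real DAG, a thin wrapper would read the conf from the task context and hand it to this function, and the BranchPythonOperator would be given that wrapper as its python_callable.

```python
from typing import Any, Dict, Optional

# Hypothetical task ids, both assumed to be directly downstream of the branch task.
INITIAL_TASK_ID = "initial_task"
LATER_TASK_ID = "later_task"


def choose_branch(conf: Optional[Dict[str, Any]]) -> str:
    """Return the task_id that should run next.

    conf is assumed to be the dictionary supplied when the run was
    triggered; it may be None when no configuration was given.
    """
    conf = conf or {}
    if conf.get("skip_initial", False):
        # Jump straight to the later task; the initial branch gets skipped.
        return LATER_TASK_ID
    return INITIAL_TASK_ID


skip_choice = choose_branch({"skip_initial": True})
default_choice = choose_branch(None)
```

The design choice is to keep the decision in a plain function so it can be tested without a scheduler; only the wrapper depends on Airflow.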

Making a Salt `onchanges` requisite dependent on what has changed

I want to execute a Salt state not whenever changes happen in another state, but only for specific changes. It seems I would have to make onchanges/onchanges_in dependent on the specific changes.
The respective bug report has been closed saying "this is totally resolved now that states have access to the running dict and the lowstate for a state run". However, I can find no documentation on that and hardly any explanation of what the "running dict" actually is.
So I guess the question could also be rephrased as "How do I access the 'running dict' in an onchanges requisite?", but I'm open to any solutions for the original problem. Thanks for your help!
Update: A comment asked for a specific example, so here is my use case: Like most state modules, user.present may either update fields of an existing (user) object or create a new one. I want to run a second state module if and only if a specific field has been changed and/or the object has just been created. In Ansible, for comparison, I would register a variable and access the module's result through it.
So, why would I want to do that?
Essentially, I want to create user accounts on Linux and have them be able to set their own password (when logged in via an SSH key). user.present supports empty_password for that purpose, but it doesn't play nicely with enforce_password. This means that after a password has been manually set, a repeated state run will clear that password again. One might even consider this a bug in Salt, but the interactions between the different user.present fields are convoluted and debatable.
My solution is to create the accounts first and run a module.run state executing shadow.del_password afterwards. This is realised through an onchanges_in requisite. However, password deletion should not be triggered for any change, but only when the user account is created, which is also the only case my user.present state touches the password at all. Otherwise, things like adding users to a group would clear their password. For that effect, I think I would have to look into the details of the user.present change.
Create user account for dummy:
  user.present:
    - name: dummy
    - gid_from_name: True
    - remove_groups: False
    # TODO: This should be made more specific
    - onchanges_in:
      - module: Allow dummy to set a password
Allow dummy to set a password:
  module.run:
    - name: shadow.del_password
    - m_name: dummy
    # Make sure that this is not executed accidentally if no `onchanges_in` is present
    - onchanges: []
    - require:
      - user: Create user account for dummy
I don't know about specific onchanges or the 'running dict', but, for your particular use case, you can use a condition to enable your password clearing state only when needed, such as:
Create user account for dummy:
  user.present:
    - name: dummy
    - gid_from_name: True
    - remove_groups: False
{% if salt['user.info']('dummy') == {} %}
# Only clear the password if the account didn't exist before
Allow dummy to set a password:
  module.run:
    - name: shadow.del_password
    - m_name: dummy
    - require:
      - user: Create user account for dummy
{% endif %}
I think what you want here is module.wait, not module.run. By default, module.wait does nothing unless something else triggers it. Also, onchanges_in for some reason (I think this issue) doesn't play well with module.wait for me. I tried watch_in and it did the job.
I tried the following code and it seems to work just fine. It creates a user with an empty password and doesn't change anything if the user is already there:
Create user account for dummy:
  user.present:
    - name: dummy
    - gid_from_name: True
    - remove_groups: False
    # TODO: This should be made more specific
    - watch_in:
      - module: Allow dummy to set a password
Allow dummy to set a password:
  module.wait:
    - name: shadow.del_password
    - m_name: dummy
    - require:
      - user: Create user account for dummy

Salt States and grain values

I added the following logic to my state file which basically sets a new grain value after installing a utility called agent for the first time.
{% if salt['grains.get']('agent') != 'installed' %}
..............
agent_status:
  grains.present:
    - name: agent
    - value: installed
{% endif %}
The first time I run salt 'server1' state.highstate it returns the following which is what I expect:
----------
          ID: agent_status
    Function: grains.present
        Name: agent
      Result: True
     Comment: Set grain agent to installed
     Started: 16:03:27.083578
    Duration: 709.795 ms
     Changes:
              ----------
              agent:
                  installed
When I subsequently run salt 'server1' state.highstate it returns:
server1:
----------
          ID: states
    Function: no.None
      Result: False
     Comment: No states found for this minion
     Started:
    Duration:
     Changes:
Summary
------------
Succeeded: 0
Failed:    1
Is this the correct behaviour? I'm a little confused, because I would not have expected this to show as failed. Also, the comment seems a bit misleading here.
Yeah, this is the correct behavior. What's happening is that Salt renders the Jinja first. Since the grain already exists the second time you run this, your minion sees an empty SLS file, hence the "No states found for this minion".
Edit: If you want to avoid the "No states found for this minion" error, you could add an innocuous state at the bottom, outside the Jinja, like this:
/tmp/deletemeplease.txt:
  file.absent
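To make the render-then-run order concrete, here is a toy model in plain Python. It is not Salt code and does not call any Salt API; render_sls and highstate are hypothetical stand-ins for the Jinja pass and the state run, written only to show why the second run ends up with nothing to execute.

```python
from typing import Dict


def render_sls(grains: Dict[str, str]) -> Dict[str, dict]:
    """Toy stand-in for the Jinja pass: decide which states exist at all."""
    states: Dict[str, dict] = {}
    # Mirrors the {% if %} guard: the state is only emitted when the
    # grain is not yet set to 'installed'.
    if grains.get("agent") != "installed":
        states["agent_status"] = {
            "grains.present": {"name": "agent", "value": "installed"}
        }
    return states


def highstate(grains: Dict[str, str]) -> str:
    """Toy stand-in for the state run, which only sees the rendered result."""
    states = render_sls(grains)
    if not states:
        return "No states found for this minion"
    for state in states.values():
        spec = state["grains.present"]
        grains[spec["name"]] = spec["value"]
    return "Succeeded: {}".format(len(states))


minion_grains: Dict[str, str] = {}
first_run = highstate(minion_grains)
second_run = highstate(minion_grains)
```

In this model, adding an unconditional state outside the guard would keep the rendered result non-empty on every run, which is what the extra file.absent state achieves.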

Resources