SaltStack returns inconsistent errors

Saltstack (version 3004) has recently been returning a variety of errors on different SLS files, without those files having changed recently. Different runs complain about different files, or simply succeed. This happens across our fleet of 20 hosts, not just on one host. We're using salt-call in a masterless setup on Ubuntu 20.04 LTS hosts.
The key point is that re-running salt-call usually succeeds without problems. If it doesn't, the next run will. And maybe the run after that will fail, with nothing having changed in our SLS repository. No law of the universe seems to require these failures before successes; it's more like a random roll of the dice.
Needless to say, looking at the SLS files at the point indicated has so far been fruitless.
Some examples:
[myhost.example.com] sudo: salt-call --local state.highstate
[myhost.example.com] out: sudo password:
[myhost.example.com] out: [CRITICAL] Rendering SLS 'base:dulcia' failed: while parsing a block node
[myhost.example.com] out: did not find expected node content
[myhost.example.com] out: in "<unicode string>", line 148, column 17
[myhost.example.com] out: local:
[myhost.example.com] out: Data failed to compile:
[myhost.example.com] out: ----------
[myhost.example.com] out: Rendering SLS 'base:dulcia' failed: while parsing a block node
[myhost.example.com] out: did not find expected node content
[myhost.example.com] out: in "<unicode string>", line 148, column 17
Another:
[otherhost.example.com] sudo: salt-call --local state.highstate
[otherhost.example.com] out: sudo password:
[otherhost.example.com] out: [CRITICAL] Rendering SLS 'base:dulcia' failed: did not find expected comment or line break
[otherhost.example.com] out: local:
[otherhost.example.com] out: Data failed to compile:
[otherhost.example.com] out: ----------
[otherhost.example.com] out: Rendering SLS 'base:dulcia' failed: did not find expected comment or line break
Yet another:
[host-3.example.com] sudo: salt-call --local state.highstate
[host-3.example.com] out: sudo password:
[host-3.example.com] out: [CRITICAL] Rendering SLS 'base:sftp' failed: while parsing a block node
[host-3.example.com] out: did not find expected node content
[host-3.example.com] out: in "<unicode string>", line 235, column 17
[host-3.example.com] out: local:
[host-3.example.com] out: Data failed to compile:
[host-3.example.com] out: ----------
[host-3.example.com] out: Rendering SLS 'base:sftp' failed: while parsing a block node
[host-3.example.com] out: did not find expected node content
[host-3.example.com] out: in "<unicode string>", line 235, column 17
[host-3.example.com] out:
Or even
[host-3.example.com] sudo: salt-call --local state.highstate
[host-3.example.com] out: sudo password:
[host-3.example.com] out: [CRITICAL] Rendering SLS 'base:sftp' failed: did not find expected alphabetic or numeric character
[host-3.example.com] out: local:
[host-3.example.com] out: Data failed to compile:
[host-3.example.com] out: ----------
[host-3.example.com] out: Rendering SLS 'base:sftp' failed: did not find expected alphabetic or numeric character
[host-3.example.com] out:
I'm at a complete loss on this, and it's more than tricky to debug, because more than half the time it doesn't happen.

So, the good news: all of these errors are YAML rendering errors, not Jinja errors. They could still be caused by the Jinja rendering, but the Jinja is finishing its render cycle without throwing an error. Most likely some value is not getting set to what you think it is, or Jinja is not pulling the right value when it should. Maybe a pillar variable that should be set in the masterless config isn't, and on the next run it is. Or pillar is taking too long to render.
The easiest way to start debugging this is to render the Jinja and validate the resulting YAML. This can be done with slsutil.renderer:
salt-call slsutil.renderer salt://sftp/init.sls default_renderer=jinja
Since the problem is intermittent, you are going to have to keep rendering the states that seem to fail most often, over and over, until they fail. Maybe try after a long pause and before running a highstate.
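Because it only fails occasionally, it can help to wrap that call in a loop and stop at the first failure. A rough sketch, assuming a masterless minion; the state path, attempt count, and failure pattern are only examples and should be adjusted to whatever your failing runs actually print:
# loop the render until something goes wrong, then dump the output
for i in $(seq 1 100); do
  out=$(salt-call --local slsutil.renderer salt://sftp/init.sls default_renderer=jinja 2>&1)
  if echo "$out" | grep -Eqi 'failed|error'; then
    echo "attempt $i failed:"
    echo "$out"
    break
  fi
done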
Another thing that can help is inserting logging into the Jinja. This can be done with a simple {% do salt["log.info"]('string to log') %}. I find this useful in cases where the Jinja isn't rendering what you expect.
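For example, at the top of one of the flaky SLS files you could log the values Jinja actually resolved before the YAML gets parsed. A sketch only; the pillar key sftp:users is hypothetical, substitute whatever variables that file really uses:
{# sftp/init.sls -- log what Jinja resolved before the YAML is parsed #}
{% set sftp_users = salt['pillar.get']('sftp:users', []) %}
{% do salt['log.info']('sftp:users resolved to: ' ~ sftp_users) %}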
Also, run the highstate with -l debug. It will show the rendering of each YAML file as it goes through, so you can see what is happening and catch the errors as they happen.
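For example (writing the output to a file is optional, but handy when the failure only shows up one run in ten):
salt-call --local state.highstate -l debug 2>&1 | tee /tmp/highstate-debug.log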

Related

Odd error when attempting net_put via Ansible

Looking for assistance with an odd error I am troubleshooting with a playbook.
I have a working SSH session to the switch, but I'm having difficulty transferring files via SCP through Ansible. I can start an SCP session directly from the same server with no issues and can transfer a text file (the same one referenced below), but it does not seem to work in Ansible.
I enabled verbose logging via Ansible and this is what I am seeing in the logfile generated.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/ansible/utils/jsonrpc.py", line 46, in handle_request
result = rpc_method(*args, **kwargs)
File "/root/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 1282, in copy_file
self.ssh_type_conn.put_file(source, destination, proto=proto)
File "/root/.ansible/collections/ansible_collections/ansible/netcommon/plugins/connection/libssh.py", line 498, in put_file
raise AnsibleError(
ansible.errors.AnsibleError: Error transferring file to flash:test.txt: Initializing SCP session of remote file [flash:test.txt] for w>
2022-10-06 11:58:35,671 p=535932 u=root n=ansible | fatal: [%remoteSwitch%]: FAILED! => {
"changed": false,
"destination": "flash:test.txt",
"msg": "Exception received: Error transferring file to flash:test.txt: Initializing SCP session of remote file [flash:test.txt] fo>
}
Afraid Google is not helping me too much with this one. If it helps, this is on Ubuntu 22.04, with Ansible 2.10.8.
The play being run is:
- hosts: %remoteSwitch%
  vars:
    - firmware_image_name: "test.txt"
  tasks:
    - name: Copying image to the switch... This can take time, please wait...
      net_put:
        src: "/etc/ansible/firmware_images/C2960X/{{ firmware_image_name }}"
        dest: "flash:{{ firmware_image_name }}"
      vars:
        ansible_command_timeout: 20
        protocol: scp
I ran into this same error. I was able to fix it by switching back to Paramiko SSH. This can be accomplished by running pip uninstall ansible-pylibssh (note, this very likely has other side effects).
Alternatively, you can force Paramiko usage at the Ansible play level:
---
- name: Test putting a file onto Cisco IOS/IOS-XE device
  hosts: cisco1
  # ansible-pylibssh errors out here (force paramiko usage)
  vars:
    ansible_network_cli_ssh_type: paramiko
  tasks:
    - name: Copy file
      ansible.netcommon.net_put:
        src: my_file1.txt
        dest: flash:/my_file1.txt
        protocol: scp
It would be helpful to know what type of connection it is and what platform.
I see from the file that it is a Cisco IOS device. Do you have the following settings?
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
The following documentation mentions the need for paramiko. Will it work if you change the ssh_type to paramiko?
https://docs.ansible.com/ansible/latest/collections/ansible/netcommon/net_put_module.html
ssh_type can be set as follows:
Configuration:
  INI entry:
    [persistent_connection]
    ssh_type = paramiko
  Environment variable: ANSIBLE_NETWORK_CLI_SSH_TYPE
  Variable: ansible_network_cli_ssh_type
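For a one-off run, the environment-variable route might look like this (the playbook name is just a placeholder):
ANSIBLE_NETWORK_CLI_SSH_TYPE=paramiko ansible-playbook copy_firmware.yml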

Minion cannot find file on master

On Minion:
ID: run_snmpv3_config
Function: file.managed
Name: /tmp/run_snmpv3_config_cmd.sh
Result: False
Comment: Source file salt://files/run_snmpv3_config_cmd.sh not found in saltenv 'base'
Started: 15:11:56.175325
Duration: 27.084 ms
Changes:
On master we confirm that the minion does in fact see the file:
master # salt minion cp.list_master | grep snmp
- files/run_snmpv3_config_cmd.sh
So why isn't it able to get it?
(In fact I wanted to use cmd.script but that errors out with Unable to cache script, so I tried to just copy the file, which doesn't work either as we see above.)
I called the state for debugging purposes on a client system using
salt-call --local state.apply teststate -l debug
Of course, in this case it will look for the file salt://x inside /srv/salt (or whatever the minion's file_roots is set to) on the minion, and not on the master....
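In other words, dropping --local (so salt:// resolves against the master's fileserver) should work, assuming the minion is actually connected to a master. A quick sanity check:
salt-call cp.cache_file salt://files/run_snmpv3_config_cmd.sh   # should print the local cache path
salt-call state.apply teststate -l debug                        # no --local: sources come from the master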

Unable to apply State files on Salt

Having issues applying state files to minions on salt, they're just basic test ones, nothing complicated.
In my master config file I have the following file roots definition:
file_roots:
  base:
    - /srv/salt/
My /srv/salt/top.sls file looks like this:
base:
  '*':
    - vim
Then at /srv/salt/vim/init.sls I have the following:
vim:
  pkg.installed
So, that should be applied to all minions when applied, so I run the following:
sudo salt '*' state.apply
I get the following output, and nothing is applied; it seems not to be detecting the top.sls file:
salt-master-1:
----------
ID: states
Function: no.None
Result: False
Comment: No Top file or master_tops data matches found.
Changes:
Summary for salt-master-1
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.000 ms
dev-docker-1:
----------
ID: states
Function: no.None
Result: False
Comment: No Top file or master_tops data matches found.
Changes:
Summary for dev-docker-1
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.000 ms
ERROR: Minions returned with non-zero exit code
If I look at the logs for the minion dev-docker-1, nothing is logged as an error; all I see is this:
2018-11-08 18:33:12,993 [salt.minion :1429][INFO ][4883] User sudo_salt Executing command state.apply with jid 20181108183312990343
2018-11-08 18:33:13,015 [salt.minion :1564][INFO ][5438] Starting a new job with PID 5438
2018-11-08 18:33:13,331 [salt.state :933 ][INFO ][5438] Loading fresh modules for state activity
2018-11-08 18:33:13,448 [salt.minion :1863][INFO ][5438] Returning information for job: 20181108183312990343
Any help greatly appreciated as I'm a bit lost as to why this isn't working . . .
Edit 1
I have enabled verbose logging on the minion, and I see the following; it seems it can't find the top.sls file:
[DEBUG ] Could not find file 'salt://top.sls' in saltenv 'base'
[DEBUG ] No contents loaded for saltenv 'base'
[DEBUG ] No contents found in top file. If this is not expected, verify that the 'file_roots' specified in 'etc/master' are accessible. The 'file_roots' configuration is: {u'base': []}
OK, so I worked this out: operator error. I had enabled the gitfs backend in the config file, which overrode the default roots fileserver (the one that serves file_roots), so I just needed to do:
fileserver_backend:
  - gitfs
  - roots
Doh!
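For reference, a minimal /etc/salt/master snippet that keeps both backends active might look like this; running salt-run fileserver.file_list on the master then shows which files are actually being served:
fileserver_backend:
  - gitfs
  - roots        # keeps /srv/salt (file_roots) served alongside gitfs
file_roots:
  base:
    - /srv/salt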

Run downloaded SaltStack formula

I've downloaded the PHP formula by following the instructions here: https://docs.saltstack.com/en/latest/topics/development/conventions/formulas.html
I've changed apache to php. In my salt config file (which I assume is /etc/salt/master), I've set file_roots like so:
file_roots:
  base:
    - /srv/salt
    - /srv/formulas/php-formula
I don't know how I'm supposed to run it now. I've previously managed to run a plain salt state file successfully, but only after discovering that the documentation is incomplete and I'd missed a step I wasn't aware of.
If I try to run the formula the same way I've been running that state, I just get errors:
salt '*' state.apply php-formula
salt-minion:
Data failed to compile:
----------
No matching sls found for 'php-formula' in env 'base'
ERROR: Minions returned with non-zero exit code
I've also tried: sudo salt '*' state.highstate, and it also has errors:
salt-minion:
----------
ID: states
Function: no.None
Result: False
Comment: No Top file or master_tops data matches found.
Changes:
Summary for salt-minion
------------
Succeeded: 0
Failed: 1
------------
Total states run: 1
Total run time: 0.000 ms
ERROR: Minions returned with non-zero exit code
You have to add a top.sls file to /srv/salt/, not just to /srv/pillar/. If you have a file called /srv/salt/php.sls, you have to remove it, otherwise it will interfere with /srv/pillar/php.sls.
Contents of /srv/salt/top.sls:
base:
  '*':
    - php
This is kind of bizarre, because my previous test (which wasn't a formula) used /srv/salt/php.sls and /srv/pillar/top.sls. Now I'm using /srv/pillar/php.sls and /srv/salt/top.sls.
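With that top.sls in place, either a highstate or targeting the formula's state directly should work, since /srv/formulas/php-formula is on file_roots and provides the php state:
sudo salt '*' state.highstate
sudo salt '*' state.apply php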

salt sls to use dnsutil.hosts_append not working

I need to read the host entries from a pillar file and update the /etc/hosts file accordingly.
This is my simple SLS file to update the /etc/hosts file:
#/srv/salt/splunk_dep/hosts.sls
dnsutil:
  dnsutil.hosts-append:
    - hostsfile: '/etc/hosts'
    - ip_addr: '10.10.10.10'
    - entries: 'hostname'
When I execute the SLS file with
salt Minion-name state.apply splunk_dep/hosts
I get the following error:
ID: dnsutil
Function: dnsutil.hosts-append
Result: False
Comment: State 'dnsutil.hosts-append' was not found in SLS 'splunk_dep/hosts'
Reason: 'dnsutil.hosts-append' is not available.
Started:
Duration:
Changes:
If I execute it through the command line it works fine:
salt 'DS-110' dnsutil.hosts_append /etc/hosts 10.10.10.10 hostname
I need to update the /etc/hosts file through an SLS file. Can someone please help me with this?
I am using Salt version 2015.8.3 (Beryllium).
dnsutil is a Salt execution module, not a Salt state module. Therefore it can be used from the command line, but not directly from an SLS state file.
To run execution modules from a state file you'll need module.run. Please note that in this case you'll need an underscore in hosts_append, not a hyphen.
dnsutil:
  module.run:
    - name: dnsutil.hosts_append
    - hostsfile: '/etc/hosts'
    - ip_addr: '10.10.10.10'
    - entries: 'hostname'
Some caveats with modules: even if they don't change your system, they will be reported as "changed" in the summary of your Salt call. Consider using file.blockreplace for managing the hosts file instead, to avoid this.
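A rough file.blockreplace equivalent, in case it helps (the state ID, markers, and content here are just examples):
manage-hosts-entries:
  file.blockreplace:
    - name: /etc/hosts
    - marker_start: "# BEGIN managed by salt"
    - marker_end: "# END managed by salt"
    - content: "10.10.10.10 hostname"
    - append_if_not_found: True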
