I am using salt for last one month. Whenever I run a command say sudo salt '*' test.ping, then the master pings all the minions and the response being the list of all the minions which are up and running. Output looks something like:
{
"minion_1": true
}
{
"minion_2": true
}
{
"minion_3": true
}
In the master's conf file, return type is configured to JSON.
But if I execute an incorrect command through salt master say sudo salt '*' test1.ping, then the master returns something like this
{
"minion_1": "'test1.ping' is not available."
}
{
"minion_2": "'test1.ping' is not available."
}
{
"minion_3": "'test1.ping' is not available."
}
In both the outputs displayed above, the command has given a success exit code on the master's shell/terminal. How do we track which minions were not able to execute the commands. I am not interested in what type of error it is, I just need some or the other way to track the minions which failed to execute the command.
The last solution is to write a parser which will read the complete output and decide for itself. Hope that there is a better solution.
Reasons to despair
I would not rely on Salt's CLI exit code at the moment (version 2014.7.5) - there are still many issues opened to solve this.
Get valid JSON output
There is --static option which fixes JSON output:
If using --out=json, you will probably want --static as well. Without the static option, you will get a JSON string for each minion.
Otherwise the output given by Salt above contains multiple objects (one per minion) which is not a valid JSON (JSON requires single object, array or value per document) and simple way of loading entire output by a standard JSON parser will fail. It is even mentioned in documentation (as of 5188d6c):
Some JSON parsers can guess when an object ends and a new one begins but many can not.
In addition to that, some Salt options (like show_jid) also send strings to STDOUT which mixes it with execution report and invalidates JSON output format. Option --static also solves this problem.
UPDATE: Parser to detect failure in Salt execution
This problem squeezed me so much so I gave quick birth to this Python script # 75e42af with example how it is used # b819961d.
NOTE: This won't address output of arbitrary Salt command (including test.ping above), but issues related to the output of state execution are covered. There is still a solution to test.ping problem above - it can be run from state, then the output can be analysed by the script. See how to call an execution module from within a state or *.sls file in this answer.
Features (details in the code itself):
Handle output from both highstate and orchestrate runners.
Handle output of multiple minions and any number of commands.
Report summary "? of N" and overall result.
Standalone file usable as script and module.
The only limitation is that it requires JSON output (Salt option --out json) simply because it is easy to fix the discussed issues before feeding it to parser.
The above parser will only work for test.ping command.
If multiple commands have to be executed we will have to write a more robust parser.
Related
I just started using Airflow to coordinate our ETL pipeline.
I encountered the pipe error when I run a dag.
I've seen a general stackoverflow discussion here.
My case is more on the Airflow side. According to the discussion in that post, the possible root cause is:
The broken pipe error usually occurs if your request is blocked or
takes too long and after request-side timeout, it'll close the
connection and then, when the respond-side (server) tries to write to
the socket, it will throw a pipe broken error.
This might be the real cause in my case, I have a pythonoperator that will start another job outside of Airflow, and that job could be very lengthy (i.e. 10+ hours), I wonder if what is the mechanism in place in Airflow that I can leverage to prevent this error.
Can anyone help?
UPDATE1 20190303-1:
Thanks to #y2k-shubham for the SSHOperator, I am able to use it to set up a SSH connection successfully and am able to run some simple commands on the remote site (indeed the default ssh connection has to be set to localhost because the job is on the localhost) and am able to see the correct result of hostname, pwd.
However, when I attempted to run the actual job, I received same error, again, the error is from the jpipeline ob instead of the Airflow dag/task.
UPDATE2: 20190303-2
I had a successful run (airflow test) with no error, and then followed another failed run (scheduler) with same error from pipeline.
While I'd suggest you keep looking for a more graceful way of trying to achieve what you want, I'm putting up example usage as requested
First you've got to create an SSHHook. This can be done in two ways
The conventional way where you supply all requisite settings like host, user, password (if needed) etc from the client code where you are instantiating the hook. Im hereby citing an example from test_ssh_hook.py, but you must thoroughly go through SSHHook as well as its tests to understand all possible usages
ssh_hook = SSHHook(remote_host="remote_host",
port="port",
username="username",
timeout=10,
key_file="fake.file")
The Airflow way where you put all connection details inside a Connection object that can be managed from UI and only pass it's conn_id to instantiate your hook
ssh_hook = SSHHook(ssh_conn_id="my_ssh_conn_id")
Of course, if your'e relying on SSHOperator, then you can directly pass the ssh_conn_id to operator.
ssh_operator = SSHOperator(ssh_conn_id="my_ssh_conn_id")
Now if your'e planning to have a dedicated task for running a command over SSH, you can use SSHOperator. Again I'm citing an example from test_ssh_operator.py, but go through the sources for a better picture.
task = SSHOperator(task_id="test",
command="echo -n airflow",
dag=self.dag,
timeout=10,
ssh_conn_id="ssh_default")
But then you might want to run a command over SSH as a part of your bigger task. In that case, you don't want an SSHOperator, you can still use just the SSHHook. The get_conn() method of SSHHook provides you an instance of paramiko SSHClient. With this you can run a command using exec_command() call
my_command = "echo airflow"
stdin, stdout, stderr = ssh_client.exec_command(
command=my_command,
get_pty=my_command.startswith("sudo"),
timeout=10)
If you look at SSHOperator's execute() method, it is a rather complicated (but robust) piece of code trying to achieve a very simple thing. For my own usage, I had created some snippets that you might want to look at
For using SSHHook independently of SSHOperator, have a look at ssh_utils.py
For an operator that runs multiple commands over SSH (you can achieve the same thing by using bash's && operator), see MultiCmdSSHOperator
Can anyone show me an example of script that can be run from sqoop2 client in batch mode?
I refered http://sqoop.apache.org/docs/1.99.2/Sqoop5MinutesDemo.html
and it says we can run sqoop2 client in batch mode using the following command
sqoop.sh client /path/to/your/script.sqoop
but that script.sqoop isn't like sqoop1 script, so how should it be?
Batch file is nothing but a list of the same commands you would otherwise type in interactive mode (plus comment lines starting with pound sign).
However! Some commands require manual input, thus cannot be easily fully automated (e.g., 'create link' command). See this thread for details.
Is there anything out there to monitor SaltStack installations besides halite? I have it installed but it's not really what we are looking for.
It would be nice if we could have a web gui or even a daily email that showed the status of all the minions. I'm pretty handy with scripting but I don't know what to script.
Anybody have any ideas?
In case by monitoring you mean operating salt, you can try one of the following:
SaltStack Enterprise GUI
Foreman
SaltPad
Molten
Halite (DEPRECATED by SaltStack)
These GUI will allow you more than just knowing whether or not minions are alive. They will allow you to operate on them in the same manner you could with the salt client.
And in case by monitoring you mean just whether the salt master and salt minions are up and running, you can use a general-purpose monitoring solutions like:
Icinga
Naemon
Nagios
Shinken
Sensu
In fact, these tools can monitor different services on the hosts they know about. The host can be any machine that has an ip address and the service can be any resource that can be queried via the underlying OS. Example of host can be a server, router, printer... And example of service can be memory, disk, a process, ...
Not an absolute answer, but we're developing saltpad, which is a replacement and improvement of halite. One of its feature is to display the status of all your minions. You can give it a try: Saltpad Project page on Github
You might look into consul while it isn't specifically for SaltStack, I use it to monitor that salt-master and salt-minion are running on the hosts they should be.
Another simple test would be to run something like:
salt --output=json '*' test.ping
And compare between different runs. It's not amazing monitoring, but at least shows your minions are up and communicating with your master.
Another option might be to use the salt.runners.manage functions, which comes with a status function.
In order to print the status of all known salt minions you can run this on your salt master:
salt-run manage.status
salt-run manage.status tgt="webservers" expr_form="nodegroup"
I had to write my own. To my knowledge, there is nothing out there which will do this, and halite didn't work for what I needed.
If you know Python, it's fairly easy to write an application to monitor salt. For example, my app had a thread which refreshed the list of hosts from the salt keys from time to time, and a few threads that ran various commands against that list to verify they were up. The monitor threads updated a dictionary with a timestamp and success/fail for each host after they ran. It had a hacked together HTML display color coded to reflect the status of each node. Took me a about half a day to write it.
If you don't want to use Python, you could, painfully, do something similar to this inefficient, quick, untested hack using command line tools in bash.
minion_list=$(salt-key --out=txt|grep '^minions_pre:.*'|tr ',' ' ') # You'
for minion in ${minion_list}; do
salt "${minion}" test.ping
if [ $? -ne 0 ]; then
echo "${minion} is down."
fi
done
It would be easy enough to modify to write file or send an alert.
halite was depreciated in favour of paid ui version, sad, but true - still saltstack does the job. I'd just guess your best monitoring will be the one you can write yourself, happily there's a salt-api project (which I believe was part of halite, not sure about this), I'd recommend you to use this one with tornado as it's better than cherry version.
So if you want nice interface you might want to work with api once you set it up... when setting up tornado make sure you're ok with authentication (i had some trouble in here), here's how you can check it:
Using Postman/Curl/whatever:
check if api is alive:
- no post data (just see if api is alive)
- get request http://masterip:8000/
login (you'll need to take token returned from here to do most operations):
- post to http://masterip:8000/login
- (x-www-form-urlencoded data in postman), raw:
username:yourUsername
password:yourPassword
eauth:pam
im using pam so I have a user with yourUsername and yourPassword added on my master server (as a regular user, that's how pam's working)
get minions, http://masterip:8000/minions (you'll need to post token from login operation),
get all jobs, http://masterip:8000/jobs (you'll n need to post token from login operation),
So basically if you want to do anything with saltstack monitoring just play with that salt-api & get what you want, saltstack has output formatters so you could get all data even as a json (if your frontend is javascript like) - it lets you run cmd's or whatever you want and the monitoring is left to you (unless you switch from the community to pro versions) or unless you want to use mentioned saltpad (which, sorry guys, have been last updated a year ago according to repo).
btw. you might need to change that 8000 port to something else depending on version of saltstack/tornado/config.
Basically if you want to have an output where you can check the status of all the minions then you can run a command like
salt '*' test.ping
salt --output=json '*' test.ping #To get output in Json Format
salt manage.up # Returns all minions status
Or else if you want to visualize the same with a Dashboard then you can see some of the available options like Foreman, SaltPad etc.
I'm running batch Java on an IBM mainframe under JZOS. The job creates 0 - 6 ".txt" outputs depending upon what it finds in the database. Then, I need to convert those files from Unix to MVS (ebcdic) and I'm using OCOPY command running under IKJEFT01. However, when a particular output was not created, I get a JCL error and the job ends. I'd like to check for the presence or absence of each file name and set a condition code to control whether the IKJEFT01 steps are executed, but don't know what to use that will access the Unix file pathnames.
I have resolved this issue by writing a COBOL program to check the converted MVS files and set return codes to control the execution of subsequent JCL steps. The completed job is now undergoing user acceptance testing. Perhaps it sounds like a kludge, but it does work and I'm happy to share this solution.
The simplest way to do this in JCL is to use BPXBATCH as follows:
//EXIST EXEC PGM=BPXBATCH,
// PARM='pgm /bin/cat /full/path/to/USS/file.txt'
//*
// IF EXIST.RC = 0
//* do whatever you need to
// ENDIF
If the file exists, the step ends with CC 0 and the IF succeeds. If the file does not exist, you get a non-zero CC (256, I believe), and the IF fails.
Since there is no //STDOUT DD statement, there's no output written to JES.
The only drawback is that it is another job step, and if you have a lot of procs (like a compile/assemble job), you can run into the 255 step limit.
I am attempting to kick off a third party program using EXEC command in PeopleSoft. It is returning error code 127. When I kick the program off from Unix command line, I get no error. Does anybody know what code 127 is? Or have a list of all the return codes?
I think it is likely the Unix shell return code, in which case 127 is "command not found".
See http://tldp.org/LDP/abs/html/exitcodes.html
You may need to make sure your Exec call is specifying the correct path, relative or absolute, or that any expected environment variables are available. Possibly test with a simple program to see if calling through Exec is successful at all. On the server it would run under the ID that started the app server, and may be sourced differently than an individual user. If using relative paths I believe it would start in $PS_HOME.
If you can provide the code snippet someone may be able to also provide other suggestions.