I have allowed 4505 and 4506 port on the master node system.
When I want to use 'cp.get_dir' to transfer a directory to minion v3000.5(CentOS6 arch), it took a long time, and then returned:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20221208081502397105
and the minion log shows:
2022-12-08 16:16:25,987 [salt.utils.parsers :1082][WARNING ][21455] Minion received a SIGTERM. Exiting.
2022-12-08 16:16:28,141 [salt.minion :2003][WARNING ][21819] The minion failed to return the job information for job 20221208081547958420. This is often due to the master be
ing shut down or overloaded. If the master is running, consider increasing the worker_threads value.
When I use the same command to transfer the same directory to minion v3005.1(CentOS7 arch), it works.
And I tried several CentOS6 minion nodes, got the same error returns.
It can transfer the name of first file and directory to minion, but with empty content in the file. It seems like the communication between the master and the minion is broken when the first file start to be transfered.
But when I use cmd.run module command, it works.
Related
On Minion:
ID: run_snmpv3_config
Function: file.managed
Name: /tmp/run_snmpv3_config_cmd.sh
Result: False
Comment: Source file salt://files/run_snmpv3_config_cmd.sh not found in saltenv 'base'
Started: 15:11:56.175325
Duration: 27.084 ms
Changes:
On master we confirm that the minion does in fact see the file:
master # salt minion cp.list_master | grep snmp
- files/run_snmpv3_config_cmd.sh
So why isn't it able to get it?
(In fact I wanted to use cmd.script but that errors out with Unable to cache script, so I tried to just copy the file, which doesn't work either as we see above.)
I called the state for debugging purposes on a client system using
salt-call --local state.apply teststate -l debug
Of course in this case it will look for file salt://x inside /srv/salt (or whatever the minion's config is) on the minion and not the master....
I have two machines. Machine1: airflow-webserver, airflow-scheduler. Machine2: airflow-worker on specific queue. I am using CeleryExecutor. Task on machine2 runs successfully (writing and deleting files on local drive), but in web UI on machine1 I didnt read log files.
*** Log file does not exist: /home/airflow/logs/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Fetching from: http://localhost-int.localdomain:8793/log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost-int.localdomain', port=8793): Max retries exceeded with url: /log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
To solve this problem edit your /etc/hosts. Add ip and dns-name for airflow webserver
HTTPConnectionPool means webserver is not able to communicate to the worker node.
Add worker node hostname on /etc/hosts file
Also verify below
base_log_folder = /home/airflow/logs/
sudo chmod -R 777 /home/airflow/logs/
I have SLS files set up to copy things from a network folder to a local directory on a minion.
Looks a little like this:
cmd-test:
cmd.run:
- name: 'ROBOCOPY \\\CygwinSource C:\CygwinSource /E'
and get the following output:
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Tuesday, December 6, 2016 10:50:35 AM
2016/12/06 10:50:35 ERROR 1808 (0x00000710) Getting File System Type of Source \\<Server>\<program>\<file>\
The account used is a computer account. Use your global user account or local user account to access this server.
Source - \\<Server>\<program>\<folder>\
Dest : C:\<path>\<folder>\
Files : *.*
Options : *.* /S /E /DCOPY:DA /COPY:DATS /PURGE /MIR /NP /R:1 /W:1
------------------------------------------------------------------------------
NOTE : NTFS Security may not be copied - Source may not be NTFS.
2016/12/06 10:50:35 ERROR 1808 (0x00000710) Accessing Source Directory \\<Server>\<program>\<file>\
The account used is a computer account. Use your global user account or local user account to access this server.
Waiting 1 seconds... Retrying...
When I run the same thing locally in command line as 'ROBOCOPY \\\CygwinSource C:\CygwinSource /E' and it worked perfectly. I have no idea how to fix this 'use local user account' that Robocopy seems to give when using it through salt.
I also tried adding /MIR and /SEC which didnt't work.
Running Windows 10, Minion 2016.3.3
Master: Red Hat, 2016.3.3
Salt seems to be connecting to the network resource with a computer account. A few possible solutions:
Try changing the Salt Service on the Client (if that's how salt is executing the commands) to run as a domain user.
Try using the salt file server
Implement this hacky workaround where a scheduled task is created - discussed in the github issue that seems related to your problem: https://github.com/saltstack/salt/issues/16340
I am a total beginner with SaltStack but I have managed to setup some states on a machine and run them on a minion.
What I have right now is a Debian machine setup with salt-master as well as another Debian setup as salt-minion.
Since I am using the salt-master also as a development machine, I would like to know if I can somehow apply the states on the master itself as well. And if so, how?
Is there a command I can run to apply the states on the master? (so far I was unable to find it)
Should I install salt-minion on the same machine as well to be able to do this and simply register the same machine as a minion on itself?
Thanks!
Since I am using the salt-master also as a development machine, I would like to know if I can somehow apply the states on the master itself as well. And if so, how?
You can do that by following the following steps:
Install salt-minion on your development machine
Edit /etc/salt/minion to point to your master (vi /etc/salt/minion and change the following : master: salt -> master: 127.0.0.1)
(optional) Edit /etc/salt/minion_id to something that is meaningful to you
Start up your salt-minion
Use salt-key to accept your minion's key
Use your salt-master to control your minion as if it were any other salt-minion
Is there a command I can run to apply the states on the master?
The salt-master doesn't really run the the state files, the salt-minions do. If you followed the above steps then you can target your salt-master to run highstate with the following command:
salt 'the_value_of_/etc/salt/minion_id' state.highstate
Should I install salt-minion on the same machine as well to be able to do this and simply register the same machine as a minion on itself?
Yup. I think you have an idea as to what you need to do and just need a push in the right direction.
Install both Minion and Master on single node
I call such node Master Minion. No steps provided - you already know it based on the question.
Some conceptual info instead:
In short, Master never applies states. Instead, it triggers Minions (the local Master Minion in this case).
Salt Minion and Master are two separate services with independent runtime & configuration.
While instances use common software, runtime talk over the network (location-independent).
If you can apply states on remote Minion, the same mechanism will be used for local Minion one as well.
Additional info
There are two ways to apply states:
Master-side salt command to "push" states to multiple remote minions.
rpm -qf $(which salt)
salt-master-2015.5.3-4.fc22.noarch
Minion-side salt-call command to "pull" states on single local minion.
rpm -qf $(which salt-call)
salt-minion-2015.5.3-4.fc22.noarch
Until more than one minion is involved, it's better to use salt-call for the same effect:
salt-call state.highstate
Minion-side salt-call provides advantages especially for testing, isolation, troubleshooting:
It makes network issues (if any) more obvious.
It safely applies states only on single local minion (no way to specify more than one).
It shows debug output directly in the local terminal:
salt-call -l debug test.ping
The last point, salt-call--local can also be used in masterless setup using no network.
Now it's near end of 2015. Let's review some more possibilities to salt master self-control:
Install a minion aside with salt master on the same box
This one has been widely discussed as above two answers.
Use salt-ssh + salt-run state.orchestrate
Setup steps:
Step 1: install salt-ssh
Step 2: modify roster file (e.g. /etc/salt/roster in CentOS 6). The default installation already provide you some example. Since you probably ssh into salt master, of course username / password / private key setup should not be a problem for you. For example to control salt master vagrant box, this sample should do:
localhost:
host: 127.0.0.1
user: vagrant
passwd: vagrant
sudo: True
Now, steal from official tutorial with a little bit twist:
# /srv/salt/orch/cleanfoo.sls
cmd.run:
salt.function:
- tgt: 'localhost'
- ssh: 'true'
- arg:
- touch /tmp/test.txt
And run it with:
salt-run state.orchestrate orch.cleanfoo
Check your salt master vagrant box /tmp directory if test.txt file is there.
This approach should also work for state. Either way you need to install something. I prefer the second way since in general, calling salt master self control (to provision some work) is just a step before I actually call minion to process other state(s).
How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
MPI_ARGV_NULL,
nprocs,
info,
0,
MPI_COMM_SELF,
&intercom,
MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?
I don't want to start the parent
process with mpirun, so I'm just
looking for a way to dynamically spawn
processes on remote nodes. Is this
possible?
I'm not sure why you don't want to start it with mpirun? You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(), this way you just get to pass it options rather than relying on the default.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require Info host, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the jobs this way, mpirun --hostfile mpihosts.txt -np 1 ./parentjob This is a feature, not a bug; now it's MPIs job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put it in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.