I am considering moving to Salt (currently using Ansible) to manage a set of standalone IoT devices (Raspberry Pis, in practical terms).
The devices would be installed with a generic image, on top of which I would add the installation of Salt (client side) as well as a configuration file pointing to the salt-master, which is going to serve state files to be consumed by the minions.
The state files include an HTTP query for a name, which is then applied to the device as its hostname. The obvious problem is that at that stage the minion has already registered with the salt-master under the previous (generic) name.
How do I handle such a situation? Specifically: how do I propagate the new hostname to the salt-master? (Just changing the hostname and rebooting did not help; I assume the hostname is bundled, on the server, with the ID of the minion.)
The more general question is whether Salt is the right product for such a situation (where setting the state of the minion changes its name, among other things).
Your Minion ID is based on the hostname at the time salt-minion is installed. When you change the hostname after installing salt-minion, the Minion ID will not change.
The Minion ID is stored in /etc/salt/minion_id. When you change the Minion ID:
The Minion will identify itself to the Master with the new ID and stop listening under the old ID.
The Master will detect the new Minion ID as a new Minion and show a new key under Unaccepted Keys.
After accepting the key on the Master you will be able to use the Minion with the new key only. The old key is still accepted on the Master but no longer does anything.
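To clean up after a rename, you would typically delete the stale key and accept the new one on the master. A minimal sketch (the IDs old-minion-id and new-minion-id are placeholders):
salt-key -d old-minion-id -y     # remove the key registered under the old ID
salt-key -a new-minion-id -y     # accept the key the minion re-submitted under its new ID
salt 'new-minion-id' test.ping   # verify the minion answers under the new ID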
I can come up with two solutions for your situation:
Use salt-ssh to provision your minions. The Master will connect to your Raspberry Pi using SSH, set up the correct hostname, and install and configure salt-minion. After this is finished, your minion will connect to the Master with the correct ID. But this requires the Master to know when and where a minion is available...
You mentioned the state where the hostname is set. Change the Minion ID and restart the minion service in that same state. This will change the Minion ID, but you will need to accept the new key afterwards. Note that the minion will never report that state as successfully finished, because you restart the salt-minion service inside it.
Here is a short script to change the hostname/minion_id. It should also work well for batch jobs. Simply call the script like so: sudo ./change-minion-id oldminionid newminionid
change-minion-id:
#! /bin/bash
echo "$2"
salt "$1" cmd.run "echo $2 > /etc/hostname && hostname $2 && hostname > /etc/salt/minion_id" && \
  salt "$1" service.restart "salt-minion" && salt-key -d "$1" -y && salt-key -a "$2" -y
My answer is a direct rip-off of user deput_d's. I modified it a bit for what I needed.
Even my code will return Minion did not return. [No response] (due to the salt-minion restart), but this error should be ignored; just let it run. I wait 40 seconds to be sure that the minion has reconnected:
#!/bin/bash
echo "salt rename $1->$2"
salt "$1" cmd.run "echo $2 > /etc/hostname && hostname $2 && hostname > /etc/salt/minion_id" && \
  salt "$1" cmd.run "/etc/init.d/salt-minion restart &" || true
salt-key -d "$1" -y && echo "Will sleep for 40 seconds while minion reconnects" && sleep 40 && salt-key -a "$2" -y
I have an LXD container named master. I found out that its root filesystem can be found at:
/var/lib/lxd/containers/master/rootfs/home/ubuntu/
So, I transferred my folder, tars, to this location. Note that tars has two files.
Now, I know that the user ID and the group ID of tars are root. On the other hand, the user ID and group ID of every other file in the container are 166536.
So, for the folder and the files, I did sudo chown 166536 <file/folder name> to change the user ID, and sudo chown :166536 <file/folder name> to change the group ID.
Once I did this, I expected tars to be accessible from the container master, but that didn't happen. Can anyone tell me what I am missing?
Here is a method I found on Reddit:
Yeah, this was the answer. Running an unprivileged container you are not able to see the permissions on the LXD host, so they appear as nobody:nobody. In a way it is silly, because you can mount the folder into the container and see the files in it.
For future reference, for anyone having this issue, these are the steps I took (they may not be the correct ones, but it works):
sudo mkdir /tmp/share
adduser subsonic --shell=/bin/false --no-create-home --system --group --uid 6000 (this is a "service account")
sudo chown -R subsonic: /tmp/share
lxc exec Test -- /bin/bash
mkdir /mnt/share
adduser subsonic --shell=/bin/false --no-create-home --system --group --uid 6000 (important that the uid is the same)
exit
lxc stop Test
lxc config edit Test (add the line security.privileged: "true" right below config:, then save and exit)
lxc start Test
lxc config device add MyMusic MyLibrary disk source=/tmp/share path=/mnt/share
lxc exec Test -- /bin/bash
ls /mnt/share/ (note that the subsonic user is there)
exit
It's a shame that I couldn't find a way to map the user inside the unprivileged container. If anyone knows a way, let me know.
Basically, the idea is to create a common user for both the host and the container. Is there anything better than this method available?
When you open the container and do
whoami
You'll get the result:
root
So, the current user is root, not ubuntu. That being said, the address I was sending the folder to was wrong. The correct location is /var/lib/lxd/containers/master/rootfs/root/. Once the folder is sent, check the UID/GID used inside the container (something like 165536). Change the UID with:
sudo chown 165536 <folder-name>
and the GID with:
sudo chown :165536 <folder-name>
Once that is done, the container's root user will be able to access these files.
Or, in a single step: sudo chown 165536:165536 <folder-name>.
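Putting it together, the whole transfer looks roughly like this (a sketch; the folder name tars and the UID 165536 are just the values from this example, so check what ls -ln reports on your host):
sudo cp -r tars /var/lib/lxd/containers/master/rootfs/root/
sudo chown -R 165536:165536 /var/lib/lxd/containers/master/rootfs/root/tars
lxc exec master -- ls -l /root/tars   # inside the container the files should now show up as owned by root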
I'm using ucarp on top of Linux bonding for high availability and automatic failover between two servers.
Here are the commands I used on each server for starting ucarp :
Server 1 :
ucarp -i bond0 -v 2 -p secret -a 10.110.0.243 -s 10.110.0.229 --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh -b 1 -k 1 -r 2 -z
Server 2 :
ucarp -i bond0 -v 2 -p secret -a 10.110.0.243 -s 10.110.0.242 --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh -b 1 -k 1 -r 2 -z
and the content of the scripts :
vip-up.sh :
#!/bin/sh
exec 2> /dev/null
/sbin/ip addr add "$2"/24 dev "$1"
vip-down.sh :
#!/bin/sh
exec 2> /dev/null
/sbin/ip addr del "$2"/24 dev "$1"
Everything works well and the servers switch from one to another correctly when the master becomes unavailable.
The problem arises when I unplug both servers from the switch for too long (approximately 30 minutes). While they are unplugged they both think they are master,
and when I plug them back in, the one with the lowest IP address tries to stay master by sending gratuitous ARPs. The other one switches to backup as expected, but I'm unable to access the master through its virtual IP.
If I unplug the master, the second server goes from backup to master and is accessible through its virtual ip.
My guess is that the switch "forgets" about my servers when they are disconnected for too long, and when I reconnect them, a transition from backup to master is needed to correctly update the switch's ARP cache, even though the gratuitous ARPs sent by the master should do the job. Note that restarting ucarp on the master does fix the problem, but I would need to restart it every time the servers are disconnected for too long...
Any idea why it does not work as I expected, and how I could solve the problem?
Thanks.
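One thing that might be worth trying (a sketch, not part of the original setup): have the up-script send a few explicit gratuitous ARPs for the VIP so the switch relearns it after a long disconnection. This assumes the arping utility from iputils is installed:
#!/bin/sh
exec 2> /dev/null
/sbin/ip addr add "$2"/24 dev "$1"
# announce the VIP so the switch updates its tables ($1 is the interface, $2 the VIP)
arping -c 3 -U -I "$1" "$2"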
After setting up the salt-master and one minion, I am able to accept the key on the master. Running sudo salt-key -L shows that it is accepted. However, when I try the test.ping command, the master shows:
Minion did not return. [No response]
On the master, the log shows:
[ERROR ][1642] Authentication attempt from minion-01 failed, the public key in pending did not match. This may be an attempt to compromise the Salt cluster.
On the minion, the log shows:
[ERROR ][1113] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
I have tried disconnecting and reconnecting, including rebooting both boxes in between.
Minion did not return. [No response]
Makes me think the salt-minion process is not running. (Those other two errors are expected behavior until you accept the key on the master.)
Check if salt-minion is running with (depending on your OS) something like
$ systemctl status salt-minion
or
$ service salt-minion status
If it was not running, start it and try your test again.
$ sudo systemctl start salt-minion
or
$ sudo service salt-minion start
Depending on your installation method, the salt-minion may not have been registered to start upon system boot, and you may run into this issue again after a reboot.
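If that is the case, enable the service so it starts on boot (again depending on your OS, a quick sketch):
$ sudo systemctl enable salt-minion
or
$ sudo chkconfig salt-minion on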
Now, if your salt-minion was in fact running and you are still getting No response, I would stop the process and restart the minion in debug mode so you can watch:
$ sudo systemctl stop salt-minion
$ sudo salt-minion -l debug
Another quick test you can run to test communication between your minion and master is to execute your test from the minion:
$ sudo salt-call test.ping
salt-call does not need the salt-minion process to be running to work. It fetches the states from the master and executes them ad hoc. So, if that works and returns
local:
    True
you can eliminate the handshake between minion and master as the issue.
I just hit this problem when moving the salt master to a new server. To fix it I had to do these things in this order (Debian 9):
root@minion:~# service salt-minion stop
root@master:~# salt-key -d minion
root@minion:~# rm /etc/salt/pki/minion/minion_master.pub
root@minion:~# service salt-minion start
root@master:~# salt-key -a minion
Please note which host (minion or master) each command above is run on.
If you are confident that you are connecting to a valid Salt Master, then
remove the master public key and restart the Salt Minion.
The master public key can be found at:
/etc/salt/pki/minion/minion_master.pub
The minion's public key is stored on the master under
/etc/salt/pki/master/minions
Just compare it with the minion's own public key (under /etc/salt/pki/minion/minion.pub on the minion).
If they are not the same, execute
salt-key -d '*'
to delete the minion's public key from the master (note that '*' deletes all keys; you can also pass a specific minion ID),
then execute service salt-minion restart to restart the salt-minion on the minion client.
After this, the master can control the minion.
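A quick way to do the comparison (a sketch; minion-01 is a placeholder for your minion's ID):
# on the master
md5sum /etc/salt/pki/master/minions/minion-01
# on the minion
md5sum /etc/salt/pki/minion/minion.pub
If the two checksums differ, the cached key is stale.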
I got the same error message.
I changed user: root to user: ubuntu in /etc/salt/minion.
I stopped salt-minion and ran salt-minion -l debug as the ubuntu user; the salt master could then get the salt-minion's response.
But when I ran salt-minion with systemctl start salt-minion, the salt master got the error again (no response).
When I run salt-minion as the root user and start it with systemctl start salt-minion, it works.
I don't know if it is a bug.
I had this exact same problem when I was moving minions from one master to another. Here are the steps I took to resolve it.
Remove the salt-master and salt-minion packages on the master server.
rm -rf /etc/salt on the master hosts
Save your minion config if it has any defined variables
Remove salt-minion on the minion hosts
rm -rf /etc/salt on all minion-hosts
Reinstall salt-master and salt-minion
Start salt-master then salt-minions
My problem was solved after doing this. I know it is not really a solution, but it's likely a conflict with keys that causes this issue.
Run salt-minion in debug mode to see if this is a handshake issue.
If so, adding the salt-master port (4506, or whatever is configured) to the public zone with firewall-cmd should help. Salt also uses port 4505 for its publish channel, so you may need to open that one as well.
`firewall-cmd --permanent --zone=public --add-port=4506/tcp`
`firewall-cmd --reload`
The salt-master key cached on your minion seems to be invalid (possibly due to a master IP or name change).
Steps to troubleshoot (see the command sketch after this list):
From the minion, check that the master is reachable (a simple ping test).
Delete the old master key cached on the minion (/etc/salt/pki/minion/minion_master.pub).
Try pinging the minion from the master again; a new valid key will be auto-populated.
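Roughly, on the command line (a sketch; salt is a placeholder for your master's hostname and minion-01 for the minion's ID):
# on the minion
ping -c 3 salt
sudo rm /etc/salt/pki/minion/minion_master.pub
sudo systemctl restart salt-minion
# on the master
salt-key -a minion-01 -y      # only needed if the key shows up as unaccepted again
salt 'minion-01' test.ping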
I configured the salt-stack environment like below:
machine1 -> salt-master
machine2 -> salt-minion
machine3 -> salt-minion
This setup is working for me and I can publish, e.g., the command ls -l /tmp/ from machine2 to machine3 with
salt-call publish.publish 'machine3' cmd.run 'ls -l /tmp/'
How is it possible to restrict the commands that can be published?
In the current setup it's possible to execute every command on machine3, and that would be very risky. I looked in the salt-stack documentation but unfortunately I didn't find any example of how to configure this.
SOLUTION:
on machine1 create file /srv/salt/_modules/testModule.py
insert some code like:
#!/usr/bin/python
import subprocess

def test():
    return __salt__['cmd.run']('ls -l /tmp/')

if __name__ == "__main__":
    test()
to distribute the new module to the minions run:
salt '*' saltutil.sync_modules
on machine2 run:
salt-call publish.publish 'machine3' testModule.test
The peer configuration in the salt master config can limit what commands certain minions can publish, e.g.:
peer:
  machine2:
    machine1:
      - test.*
      - cmd.run
    machine3:
      - test.*
      - disk.usage
      - network.interfaces
This will allow minion machine2 to publish test.* and cmd.run commands on machine1, and test.*, disk.usage and network.interfaces on machine3.
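With that in place, a quick check from machine2 could look like this (a sketch; per the example config above, the first call should be allowed and the second should come back empty because cmd.run is not whitelisted for machine3):
salt-call publish.publish 'machine3' disk.usage
salt-call publish.publish 'machine3' cmd.run 'ls -l /tmp/'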
P.S. Allowing minions to publish the cmd.run command is generally not a good idea; it is just included here as an example.
I'd really like to be able to use the ec2-init scripts to do some housekeeping when I spin up an instance. Ideally I'd like to be able to pass user data to set the hostname and run a couple of initialization scripts (to configure puppet etc.).
I see a script called ec2-set-hostname but I'm not sure if you can use it to set an arbitrary hostname from user-data or what the format of the user-data would need to be.
Has anyone used these scripts, and do you know whether they can set the hostname and run some scripts at the same time?
Thanks in advance.
In the end I decided to skip the Ubuntu EC2 scripts and do something similar myself. I looked into using Amazon's Route53 service as the name service and it was really easy to get it up and running.
Using Route53
Here is what I did. First, I used the IAM tools to create a user 'route53' with liberal policy permissions for interacting with the Route53 service.
Create the dns group & user
iam-groupcreate -g route53 -v
iam-usercreate -u route53 -g route53
Create keys for the user and note these for later
iam-useraddkey -u route53
Give access to the group to add zones and dns records
iam-grouplistpolicies -g route53
iam-groupaddpolicy -p hostedzone -e Allow -g route53 -a route53:* -r '*'
listing the users and policies for a group
iam-grouplistusers -g route53
iam-grouplistpolicies -g route53
iam-grouplistpolicies -g route53 -p hostedzone
To add and remove DNS record entries I used the excellent Python wrapper library for Route53, cli53. It takes a lot of the pain out of using Route53. You can grab it from here:
https://github.com/barnybug/cli53
In my case the python script is symlinked in /usr/bin as cli53. You'll need to set the following environment variables containing keys created earlier for the route53 user.
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXXX
You need to then create a zone entry for your domain e.g. simple.org
cli53.py create simple.org
This should return you an amazon nameserver address that you can associate with your domain name via your domain name registrar, so that hostname lookups for the domain will be redirected to the Route53 servers.
Once the zone is setup, adding and removing entries to it is simple e.g.
cli53 rrcreate simple.org hostname CNAME ec2-184-73-137-40.compute-1.amazonaws.com
cli53 rrdelete simple.org hostname
We use a CNAME entry with the Public DNS name of the ec2 instance as this hostname will resolve to the public IP externally and the private IP from within EC2. The following adds an entry for a host 'test2.simple.org'.
cli53 rrcreate simple.org test2 CNAME ec2-184-73-137-40.compute-1.amazonaws.com --ttl 60 --replace
Automatically set hostname and update Route53
Now what remains is to set up a script that does this automatically when the machine boots. This solution and the following script owe a huge debt to Marius Ducea's excellent tutorial found here:
http://www.ducea.com/2009/06/01/howto-update-dns-hostnames-automatically-for-your-amazon-ec2-instances/
It basically does the same as Marius' setup, but uses Route53 instead of BIND.
The script uses the simple REST-based metadata service available to each EC2 instance at
http://169.254.169.254
to retrieve the actual public DNS name and to grab the desired hostname for the instance. The hostname is passed to the instance using the customizable 'user-data', which we can specify when we start the instance. The script expects user-data in the format:
hostname=test2
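For reference, the user-data can be supplied at launch time; with the classic EC2 API tools that looks something like this (a sketch, the AMI ID is a placeholder):
ec2-run-instances ami-xxxxxxxx -d "hostname=test2"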
The script will
grab hostname info from the instance user-data
grab the public DNS name from the instance metadata
parse out the hostname
set the hostname to the fully qualified name, e.g. test2.simple.org
add a CNAME record for this FQDN in Route53 pointing to the public DNS name
write an entry into the message of the day (MOTD) so that users can see the domain-to-EC2 mapping when they log in
Copy and save the following as /usr/bin/autohostname.sh
#!/bin/bash
DOMAIN=simple.org
USER_DATA=`/usr/bin/curl -s http://169.254.169.254/latest/user-data`
EC2_PUBLIC=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-hostname`
HOSTNAME=`echo $USER_DATA| cut -d = -f 2`
#set also the hostname to the running instance
FQDN=$HOSTNAME.$DOMAIN
hostname $FQDN
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxxxxxx
# Update Route53 with a CNAME record pointing the hostname to the EC2 public DNS name
# in this way it will resolve correctly to the private ip internally to ec2 and
# the public ip externally
RESULT=`/root/dns/cli53/cli53.py rrcreate $DOMAIN $HOSTNAME CNAME $EC2_PUBLIC --ttl 60 --replace`
logger "Created Route53 record with the result $RESULT"
# write an MOTD file so that the hostname is displayed on login
MESSAGE="Instance has been registered with the Route53 nameservers as '$FQDN' pointing to ec2 domain name '$EC2_PUBLIC'"
logger "$MESSAGE"
cat<<EOF > /etc/update-motd.d/40-autohostname
#!/bin/bash
# auto generated on boot by /usr/bin/autohostname.sh via rc.local
echo "$MESSAGE"
EOF
chmod +x /etc/update-motd.d/40-autohostname
exit 0
To get the script to run at boot time, we add a line in /etc/rc.local e.g.
/usr/bin/autohostname.sh
Change the user-data for the test instance to 'hostname=test2' and reboot the instance. Once it reboots, you should be able to log in to it via test2.simple.org. It may take a couple of minutes for this to resolve correctly, depending on the TTLs you specified. When you log in, you should see an MOTD message telling you:
Instance has been registered with the Route53 nameservers as 'test2.simple.org' pointing to ec2 domain name 'ec2-184-73-137-40.compute-1.amazonaws.com'
Once you have this working with the test instance it would make sense to back it up as an AMI that you can use to create other instances with the same autohostnaming abilities.
HTH
I installed the route53 gem, and wrote a little script:
gem install route53
#!/bin/bash
DATE=`date +%Y%m%d%H%M%S`
export HOME=/root
export DEBIAN_FRONTEND=noninteractive
export PATH=/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/local/aws/bin:/usr/local/node:$PATH
export JAVA_HOME=/usr/java/current
export EC2_HOME=/usr/local/aws
export EC2_PRIVATE_KEY=/root/.ec2/pk-XXXXXXXXXXXXXXXXXXXXXXX
export EC2_CERT=/root/.ec2/cert-XXXXXXXXXXXXXXXXXXXX
export EC2_INSTANCE_ID=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`
echo "$EC2_INSTANCE_ID"
mkdir /root/$EC2_INSTANCE_ID
ec2din $EC2_INSTANCE_ID > /root/$EC2_INSTANCE_ID/$EC2_INSTANCE_ID.txt
export FQDN=`cat /root/$EC2_INSTANCE_ID/$EC2_INSTANCE_ID.txt |grep Name |awk '{print $5}'`
export EC2_DNS_NAME=`cat /root/$EC2_INSTANCE_ID/$EC2_INSTANCE_ID.txt |grep INSTANCE |awk '{print $4}'`
/usr/bin/ruby1.8 /usr/bin/route53 -g -z /hostedzone/XXXXXXXX --name $FQDN. --type CNAME --ttl 60 --values $EC2_DNS_NAME > /tmp/route53.out 2>&1
-Josh
I've taken a similar approach to 'sgargan' in that I allow an instance to create its own DNS record in Route 53, but instead I've used the Python AWS library 'boto', and I have configured 'systemd' (the init/upstart replacement released in Fedora 15/16) to remove the DNS entry when the host is shut down.
Please see the following walk-through on how to do it:
http://www.practicalclouds.com/content/blog/1/dave-mccormick/2012-02-28/using-route53-bring-back-some-dns-lovin-your-cloud
Whilst it isn't ideal to expose your internal IPs in an external DNS zone file, until such time as Amazon creates an internal DNS service I think it is preferable to running your own BIND instances.
The link mentioned in the previous answer is not available any more. But it is still available at the Wayback Machine: http://web.archive.org/web/20140709022644/http://www.practicalclouds.com/content/blog/1/dave-mccormick/2012-02-28/route53-bring-back-some-dns-lovin-ec2