SaltStack: dynamically update /etc/hosts

How can I dynamically update the /etc/hosts file with SaltStack?
There is an example that works great with Ansible, but I don't know how to do it with Salt:
http://xmeblog.blogspot.fr/2013/06/ansible-dynamicaly-update-etchosts.html
- name: add hostname in /etc/hosts
  lineinfile: dest=/etc/hosts regexp='.*{{ item }}$' line="{{ hostvars[item]['ansible_default_ipv4']['address'] }} {{item}}" state=present
  when: hostvars[item]['ansible_default_ipv4']['address'] is defined
  with_items: groups['all']
This updates /etc/hosts with the IP address and hostname of every Ansible host available in the inventory file.
How is this possible with Salt?
I want to collect every minion's IP address and hostname and update /etc/hosts on all minions.
minion1 => ip (192.168.1.1) hostname is (example1.net)
minion2 => ip (192.168.1.2) hostname is (example2.net)
minion3 => ip (192.168.1.3) hostname is (example3.net)
On all minions, the /etc/hosts entries should look like this:
127.0.0.1 localhost
::1 localhost
192.168.1.1 example1.net
192.168.1.2 example2.net
192.168.1.3 example3.net

Please take a look at https://github.com/saltstack-formulas/hostsfile-formula; hopefully it suits your needs.
This particular formula 'automagically' creates /etc/hosts records for all known minions.
Please note: I've noticed the formula's link to the Formula Documentation is broken; try this one instead: Salt Formulas installation and usage instructions.
Salt Formulas explained
Formulas are pre-written Salt States. They are as open-ended as Salt States themselves and can be used for tasks such as installing a package, configuring and starting a service, setting up users or permissions, and many other common tasks.
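If you would rather write the state yourself instead of pulling in the whole formula, the same effect can be achieved with the Salt mine plus the host.present state. The snippet below is only a minimal sketch under a few assumptions of mine: every minion publishes its addresses to the mine via network.ip_addrs, the minion ID is the hostname you want in /etc/hosts, and the file names (mine.conf, hosts.sls) are purely illustrative.

# /etc/salt/minion.d/mine.conf (or pillar) on every minion:
# publish the minion's non-loopback IPs to the mine
mine_functions:
  network.ip_addrs: []

# hosts.sls, applied to all minions:
# loop over the mine data and manage one /etc/hosts entry per known minion
{% for minion_id, addrs in salt['mine.get']('*', 'network.ip_addrs') | dictsort %}
{{ minion_id }}:
  host.present:
    # take the first reported address; filter the mine data yourself
    # if your minions have several interfaces
    - ip: {{ addrs[0] }}
{% endfor %}

After changing mine_functions, refresh the mine with salt '*' mine.update and then apply the state (salt '*' state.sls hosts). Again, this is just a sketch; the hostsfile-formula linked above handles the same mine plumbing in a more configurable way.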

Related

DevStack instances can't be reached outside devstack node

Following the official documentation, I'm trying to deploy DevStack on an Ubuntu 18.04 Server OS in a virtual machine. The devstack node has only one network card (ens160), connected to a network with the CIDR 10.20.30.40/24. I need my instances to be publicly accessible on this network (from 10.20.30.240 to 10.20.30.250). So, again following the official floating-IP documentation, I came up with this local.conf file:
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
PUBLIC_INTERFACE=ens160
HOST_IP=10.20.30.40
FLOATING_RANGE=10.20.30.40/24
PUBLIC_NETWORK_GATEWAY=10.20.30.1
Q_FLOATING_ALLOCATION_POOL=start=10.20.30.240,end=10.20.30.250
This results in a br-ex bridge with the global IP address 10.20.30.40 and the secondary IP address 10.20.30.1 (that gateway already exists on the network; isn't the PUBLIC_NETWORK_GATEWAY parameter supposed to refer to the real gateway on the network?).
Now, after a successful deployment, disabling ufw (according to this), creating a cirros instance with a proper security group for ping and ssh, and attaching a floating IP, I can only access my instance from the devstack node, not from the rest of the network! Also, from within the cirros instance I cannot reach the outside world (even though I can reach it from the devstack node).
Afterwards, watching this video, I modified the local.conf file like this:
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
FLAT_INTERFACE=ens160
HOST_IP=10.20.30.40
FLOATING_RANGE=10.20.30.240/28
After a successful deployment and instance setup, I can still access my instance only from the devstack node and not from the outside! The good news is that I can now reach the outside world from within the cirros instance.
Any help would be appreciated!
Update
On the second configuration, while checking packets with tcpdump and pinging the instance's floating IP, I observed that the who-has broadcast packet for the instance's floating IP reaches the devstack node from the network router; however, no is-at reply is generated, and thus the ICMP packets are never routed to the devstack node and the instance.
So, with some tricks, I crafted the reply myself and everything worked fine afterwards; but this certainly isn't a solution, and I imagine DevStack should work out of the box without any tweaking, so this is probably caused by a misconfiguration of devstack.
After 5 days of tests, research and reading, I found this: Openstack VM is not accessible on LAN
Enter the following commands on devstack node:
echo 1 > /proc/sys/net/ipv4/conf/ens160/proxy_arp
iptables -t nat -A POSTROUTING -o ens160 -j MASQUERADE
That'll do the trick!
Cheers!

Using cloud-init to change resolv.conf

I want my setup of openstack to work such that when I boot a new instance, 8.8.8.8 should be added to dns-nameservers.
This is my old /etc/resolv.conf (in the new VM which was spawned in openstack)-
nameserver 10.0.0.2
search openstacklocal
And this is the new resolv.conf that I want -
nameserver 8.8.8.8
nameserver 10.0.0.2
search openstacklocal
I followed this tutorial and added the necessary resolv.conf settings to the cloud-init config file (/etc/cloud/cloud.cfg):
manage_resolv_conf: true
resolv_conf:
  nameservers: ['8.8.4.4', '8.8.8.8']
  searchdomains:
    - foo.example.com
    - bar.example.com
  domain: example.com
  options:
    rotate: true
    timeout: 1
These changes are made in /etc/cloud/cloud.cfg file of the openstack host.
However, the changes don't seem to get reflected.
Any suggestions?
It will not work this way because cloud-init's network configuration happens too early in the boot process.
See the cloud-init boot stages: https://cloudinit.readthedocs.io/en/latest/topics/boot.html
Network configuration is done in the "Local" stage, but the user-data from OpenStack is only downloaded at the "Config" stage, after the network is up. At this stage, the network configuration is ignored.
Instead, you need to edit the networking files and then bring the interfaces up by passing commands to cloud-init with runcmd.
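As a rough illustration only (the user-data below is my own sketch, not taken from the cloud-init docs linked above), something along these lines prepends the extra nameserver from runcmd, which runs late in boot after the OpenStack metadata has been fetched; it assumes nothing else (resolvconf, NetworkManager) rewrites /etc/resolv.conf afterwards:

#cloud-config
# Sketch: add 8.8.8.8 as the first nameserver once the network is already up,
# matching the resolv.conf shown in the question (8.8.8.8 first, 10.0.0.2 second).
runcmd:
  - sed -i '1i nameserver 8.8.8.8' /etc/resolv.conf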
Cloud-init overwrites the /etc/sysconfig/network file as well as resolv.conf. To disable this, you can create a custom rule for cloud-init by creating a file /etc/cloud/cloud.cfg.d/custom-network-rule.cfg containing:
network: {config: disabled}

Could not resolve hostname, ping works

I have installed Raspbian on my RasPi, and now I can't do ssh or git clone; it seems only local host names are being resolved. And yet ping works:
pi ~ $ ssh test.com
ssh: Could not resolve hostname test.com: Name or service not known
pi ~ $ git clone gitosis@test.com:test.git
Cloning into 'test'...
ssh: Could not resolve hostname test.com: Name or service not known
fatal: The remote end hung up unexpectedly
pi ~ $ ping test.com
PING test.com (174.36.85.72) 56(84) bytes of data.
I sort of worked around it for github by using http://github.com instead of git://github.com, but this is not normal and I would like to pinpoint the problem.
I googled for similar issues, but the solutions offered were either typo corrections or adding domains to the hosts file.
This sounds like a DNS issue. Try switching to another DNS server and see if it works.
OpenDNS
208.67.222.222
208.67.220.220
GoogleDNS
8.8.8.8
8.8.4.4
Try resetting the contents of the DNS client resolver cache.
(For Windows) Fire up a command prompt and type:
ipconfig /flushdns
If you are a Linux or Mac user, those systems have their own ways of flushing the DNS cache.
Had the same error, I just needed to specify a folder:
localmachine $ git pull ssh://someusername@127.0.0.1:38765
ssh: Could not resolve hostname : No address associated with hostname
fatal: The remote end hung up unexpectedly
localmachine $ git pull ssh://someusername@127.0.0.1:38765/
someusername@127.0.0.1's password:
That error message is just misleading.
If you have network-manager installed, check /etc/nsswitch.conf.
If you've got a line
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
remove the [NOTFOUND=return] and restart networking (/etc/init.d/networking restart).
The [NOTFOUND=return] prevents further lookups if the first resolver doesn't respond correctly.
This may be an issue with the proxy. Unset it and try again:
git config --global --unset http.proxy
git config --global --unset https.proxy

Hadoop Datanodes cannot find NameNode

I've set up a distributed Hadoop environment within VirtualBox: 4 virtual Ubuntu 11.10 installations, one acting as the master node and the other three as slaves. I followed this tutorial to get the single-node version up and running and then converted it to the fully-distributed version. It was working just fine when I was running 11.04; however, when I upgraded to 11.10, it broke. Now all my slaves' logs show the following error message, repeated ad nauseam:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 0 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 1 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 2 time(s).
And so on. I've found other instances of this error message on the Internet (and StackOverflow) but none of the solutions have worked (tried changing the core-site.xml and mapred-site.xml entries to be the IP address rather than hostname; quadruple-checked /etc/hosts on all slaves and master; master can SSH password-less into all slaves). I even tried reverting each slave back to a single-node setup, and they would all work fine in this case (on that note, the master always works fine as both a Datanode and the Namenode).
The only symptom I've found that would seem to give a lead is that from any of the slaves, when I attempt a telnet 192.168.1.10 54310, I get Connection refused, suggesting there is some rule blocking access (which must have gone into effect when I upgraded to 11.10).
My /etc/hosts.allow has not changed, however. I tried the rule ALL: 192.168.1., but it did not change the behavior.
Oh yes, and netstat on the master clearly shows tcp ports 54310 and 54311 are listening.
Anyone have any suggestions to get the slave Datanodes to recognize the Namenode?
EDIT #1: In doing some poking around with nmap (see comments on this post), I'm thinking the issue is in my /etc/hosts files. This is what is listed for the master VM:
127.0.0.1 localhost
127.0.1.1 master
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3
For each slave VM:
127.0.0.1 localhost
127.0.1.1 slaveX
192.168.1.10 master
192.168.1.1X slaveX
Unfortunately, I'm not sure what I changed, but the NameNode now always dies with an exception about trying to bind a port "that's already in use" (127.0.1.1:54310). I'm clearly doing something wrong with the hostnames and IP addresses, but I'm really not sure what it is. Thoughts?
I found it! By commenting out the second line of the /etc/hosts file (the one with the 127.0.1.1 entry), netstat shows the NameNode ports binding to the 192.168.1.10 address instead of the local one, and the slave VMs found it. Ahhhhhhhh. Mystery solved! Thanks for everyone's help.
This solution worked for me, i.e. make sure that the name used in the property in core-site.xml and mapred-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <final>true</final>
</property>
i.e. master is defined in /etc/hosts as xyz.xyz.xyz.xyz master on BOTH master and slave nodes.
Then restart the namenode and check using
netstat -tuplen
to see that it is bound to the "external" IP address:
tcp 0 xyz.xyz.xyz.xyz:54310 0.0.0.0:* LISTEN 102 107203 -
and NOT local IP 192.168.x.y or 127.0.x.y
I had the same trouble. @Magsol's solution worked, but it should be noted that the entry that needs to be commented out is
127.0.1.1 masterxyz
on the master machine, not the 127.0.1.1 on the slave, though I did that too. Also, you need to run stop-all.sh and start-all.sh for Hadoop, which is probably obvious.
Once you have restarted hadoop check the nodemaster here: http://masterxyz:50030/jobtracker.jsp
and look at the number of nodes available for jobs.
Though this response is not the solution the author is looking for, other users might land on this page thinking otherwise, so if you are using AWS to set up your cluster, it is likely that ICMP security rules haven't been enabled on the AWS Security Groups page. Look at the following: Pinging EC2 instances
The above solved the connectivity issue from data nodes to master nodes. Ensure that you can ping between each instance.
I am running a 2-node cluster.
192.168.0.24 master
192.168.0.26 worker2
I was facing the same problem of Retrying connect to server: master/192.168.0.24:54310 in my worker2 machine logs. But the people mentioned above encountered errors running this command - telnet 192.168.0.24 54310. However, in my case the telnet command worked fine. Then I checked my /etc/hosts file
master /etc/hosts
127.0.0.1 localhost
192.168.0.24 ubuntu
192.168.0.24 master
192.168.0.26 worker2
worker2 /etc/hosts
127.0.0.1 localhost
192.168.0.26 ubuntu
192.168.0.24 master
192.168.0.26 worker2
When I hit http://localhost:50070 on the master, I saw Live nodes : 2. But when I clicked on it, I saw only one datanode, which was the master's. I checked jps on both master and worker2; the Datanode process was running on both machines.
Then, after several rounds of trial and error, I realized that my master and worker2 machines had the same hostname, "ubuntu". I changed worker2's hostname from "ubuntu" to "worker2" and removed the "ubuntu" entry from the worker2 machine.
Note: To change the hostname edit the /etc/hostname with sudo.
Bingo! It worked :) I was able to see two datanodes on the dfshealth UI page (localhost:50070).
I also faced a similar issue. (I am using Ubuntu 17.0.)
I kept only the entries for the master and slaves in the /etc/hosts file (on both master and slave machines):
127.0.0.1 localhost
192.168.201.101 master
192.168.201.102 slave1
192.168.201.103 slave2
Secondly, I edited /etc/hosts.allow (sudo gedit /etc/hosts.allow)
and added the entry: ALL:192.168.201.
Thirdly, I disabled the firewall using sudo ufw disable.
Finally, I deleted both the namenode and datanode folders from all the nodes in the cluster, and reran:
$HADOOP_HOME/bin> hdfs namenode -format -force
$HADOOP_HOME/sbin> ./start-dfs.sh
$HADOOP_HOME/sbin> ./start-yarn.sh
To check the health report from the command line (which I would recommend):
$HADOOP_HOME/bin> hdfs dfsadmin -report
and I got all the nodes working correctly.

Can't form MPI ring

I am facing a problem configuring and running MPI on my systems.
Here is what I tried:
1) I ran 'mpd &' on one machine and then I ran 'mpdtrace -l' on the same machine. I got this as output: "my-lappy_53430 (127.0.1.1)"
2) On another machine I ran 'mpd -h -p 53430 &' and got this error:
akshey-desktop_39993: conn error in connect_lhs: Connection timed out
akshey-desktop_39993 (connect_lhs 924): failed to connect to lhs at 10.2.28.137 52430
akshey-desktop_39993 (enter_ring 879): lhs connect failed
akshey-desktop_39993 (run 267): failed to enter ring
Can you please help with this issue? I tried to ping and ssh into the first machine (on which mpd is running) from the second machine, and it worked.
After this, I executed 'mpdcheck' on the first machine and got this output:
* * * first ipaddr for this host (via my-lappy) is: 127.0.1.1
These are the contents of /etc/hosts of the first machine:
127.0.0.1 localhost
127.0.1.1 my-lappy
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Then I ran 'mpdcheck -l' and got this as output:
**********
Your unqualified hostname resolves to 127.0.0.1, which is
the IP address reserved for localhost. This likely means that
you have a line similar to this one in your /etc/hosts file:
127.0.0.1 $uqhn
This should perhaps be changed to the following:
127.0.0.1 localhost.localdomain localhost
**********
Even after changing the first line of /etc/hosts to "127.0.0.1 localhost.localdomain localhost" I still got the same output from 'mpdcheck -l'
Please note that I do not have access to the DNS server of the network and these machines do not have a DNS entry in the DNS server. (I think this should not be a problem because we can always use IP addresses instead of hostnames. Isn't it so?)
Two points:
You probably don't want to wire up an MPD ring by hand. Unless you are just doing some troubleshooting with a raw mpd command, you probably want to use mpdboot. Its usage is described in the User's Guide.
Since you are using MPD, you are using MPICH2 or an MPICH2 derivative. Starting with MPICH2 1.1 there is a new process manager available, called "hydra". I encourage you to update to the latest version of MPICH2 that you can and give hydra a try. It is much more robust than MPD and has many more features, including better performance.
From my personal and recent experience, I would say that
127.0.1.1 my-lappy
must be changed to your LAN address and must match your hostname. You can change the hostname with hostname <new hostname> and/or edit /etc/hostname to make it permanent.
Then on host1 you need to start mpd --echo and note the port on which mpd will listen:
mpd_port=N
then on host2 start:
mpd --host=host1 --port=N
It's very important that the /etc/hosts files of all the machines resolve correctly the names to the IPs.
mpdtrace -l will confirm that the ring is correctly set up.
Check for a firewall on your systems that might be blocking the default ports. Turn off the firewall by disabling ipchains and iptables to test whether that is the problem.
In addition, make sure the hostnames/IP addresses are correct and can be successfully resolved.
