Why am I getting the error "Installation failed. Failed to receive heartbeat from agent." during Cloudera installation?

I am installing Cloudera Manager on a local machine.
When trying to add a new host, I get the following error:
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server
(check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.
Check agent logs in /var/log/cloudera-scm-agent/ on the host being
added
(some of the logs can be found in the installation details).
I checked the logs; they show an error like "hostname differs from canonical name".
So I also changed the hostname in /etc/resolv.conf,
but I am still getting the same error.

I had the same error because of a simple mistake in the file /etc/hosts:
Have you checked that you have DNS and reverse DNS configured?
Then, to check whether port 7182 is open, run telnet IP 7182 (replace IP with the address of the Cloudera Manager server).
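For example (the hostnames and IPs below are placeholders for your own setup), the forward and reverse lookups and the port can be checked from the host being added with:
hostname -f
host $(hostname -f)
host <cloudera-manager-ip>
telnet <cloudera-manager-ip> 7182
The forward and reverse lookups should agree with what hostname -f reports.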
If there are still problems, maybe you have forgotten to deactivate the firewall (iptables).
Regards, K.

To resolve this issue, first check which ports are open on your server and which services are listening on them, using the command: sudo netstat -lpten
Check whether anything is already running on port 9000 or 9001. Often a Java service required for the setup is running on port 9000, and the cloudera-scm-agent listener also defaults to port 9000. To work around this you can reconfigure the port in /etc/cloudera-scm-agent/config.ini as below:
--------------------------------------------------
## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
listening_port=9001
-------------------------------------------------
and then restart the cloudera-scm-agent service with the command:
service cloudera-scm-agent restart
To verify that this port is not also used by the sshd service, check the Port entries in /etc/ssh/sshd_config.
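As a quick sanity check (a sketch, not specific to any one setup), you can confirm what is actually bound on 9000/9001, what sshd is configured to use, and whether the agent came back up:
sudo netstat -lpten | grep -E ':(9000|9001) '
grep -i '^Port' /etc/ssh/sshd_config
service cloudera-scm-agent status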
I hope this resolution will work for others too.
Cheers,
Ankit Gupta

Related

Cannot see initial page from nginx in other computers in same LAN on Fedora 35

I installed it with the command "sudo yum install nginx" and it is visible from the host computer using its own IP in the browser, but from another computer on the same LAN (which can ping the host) it does not work and times out. I know there is an /etc/nginx/nginx.conf file, but I did not see any configuration there that would resolve this (or I did not search well enough).
The machine has internet access and can ping the other machines on the LAN.
Could somebody guide me?
I use VirtualBox to run Fedora.
Thank you; I have attached my nginx.conf.
First of all, are you using bridged mode in VirtualBox? If so and it is still not working, check whether Fedora has the firewall enabled by typing in a shell:
systemctl status firewalld.service
If active, check the zone where the main adapter is configured
firewall-cmd --get-active-zones
Add ports 443 and 80 to the zone related to your interface (for instance FedoraWorkstation)
firewall-cmd --zone=FedoraWorkstation --permanent --add-port=80/tcp
firewall-cmd --zone=FedoraWorkstation --permanent --add-port=443/tcp
This should do the trick
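One extra note, since --permanent rules do not change the running configuration on their own: reload firewalld afterwards (or repeat the commands without --permanent) so the ports open immediately:
firewall-cmd --reload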
In the end the problem was that Apache is installed by default on Fedora Workstation, and the "nginx" main page was actually being served by Apache as the current web server, so none of the changes made in nginx were being loaded. Solution: remove Apache from the system and reboot. Now nginx loads as the main web server and the changes are applied.
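For completeness, a minimal sketch of that cleanup on Fedora, assuming the Apache package is httpd and nginx is already installed:
sudo systemctl disable --now httpd
sudo dnf remove httpd
sudo systemctl enable --now nginx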

Set GITLAB to be accessible on LAN

After much research I have not found anything...
I installed GitLab on a CentOS VM. The CentOS IP address is 192.168.100.1.
In the file /etc/gitlab/gitlab.rb, I modified the line:
external_url 'http://192.168.100.1:1234'
I executed the command 'gitlab-ctl reconfigure' and no errors appeared.
Using Firefox on the CentOS machine itself, I can access my GitLab through all of the machine's interfaces:
192.168.100.1:1234
127.0.0.1:1234
That is expected, because when I execute 'netstat -ntlp' I can see:
tcp 0 0 0.0.0.0:1234 0.0.0.0:* LISTEN 22222/nginx: master
What is the problem?
I cannot access GitLab from other machines on the same network, 192.168.100.0/24.
From another VM on the same network (192.168.100.2), I can ping 192.168.100.1 and make an SSH connection to it, but if I run:
curl 192.168.100.1:1234
the result is a timeout.
Thanks,
Vincent
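Given the pattern above (nginx listening on 0.0.0.0:1234 but LAN connections timing out), the host firewall is the usual suspect, as in the other threads on this page. A hedged check on CentOS, assuming firewalld is in use:
sudo firewall-cmd --list-ports
sudo firewall-cmd --permanent --add-port=1234/tcp
sudo firewall-cmd --reload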

Creating docker repo in Artifactory with dedicated port, it says "SocketException: Permission denied"

I am running Artifactory Pro (5.3.1), and was trying to use the docker registry functionality.
I created a docker repository, and gave it a port 5001 in the "Registry Port" config.
However, there's nothing running on port 5001 ("telnet localhost 5001" refuses to connect), and the logs show this:
[http-nio-8081-exec-7] [ERROR] (o.a.s.s.SshAuthServiceImpl:210) - Failed to start SSH server
java.net.SocketException: Permission denied
at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_72-internal]
at sun.nio.ch.Net.bind(Net.java:433) ~[na:1.8.0_72-internal]
at sun.nio.ch.Net.bind(Net.java:425) ~[na:1.8.0_72-internal]
at sun.nio.ch.AsynchronousServerSocketChannelImpl.bind(AsynchronousServerSocketChannelImpl.java:162) ~[na:1.8.0_72-internal]
at org.apache.sshd.common.io.nio2.Nio2Acceptor.bind(Nio2Acceptor.java:66) ~[sshd-core-0.14.0.jar:0.14.0]
Any idea what could cause a "permission denied"? There's nothing running on that port (same error for any other port). It's on Ubuntu 14.04.
I had misunderstood how the docker registry works with Artifactory.
The Artifactory service doesn't actually open the port assigned to the repo (5001 in this case); instead, the reverse proxy listens on it and forwards requests (with the right X-Forwarded-Port header) to the "normal" Artifactory service port (e.g. 8081).
After setting up the reverse proxy for it, it worked fine.
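As an illustration only (the server name, repo key, and paths are assumptions, not the configuration Artifactory generates), such a reverse-proxy rule in nginx might look roughly like this:
server {
    listen 5001;
    server_name artifactory.example.com;   # placeholder
    location / {
        # forward docker registry traffic on 5001 to the normal Artifactory port
        proxy_pass         http://localhost:8081/artifactory/api/docker/docker-local/;
        proxy_set_header   Host               $host;
        proxy_set_header   X-Forwarded-Port   5001;
        proxy_set_header   X-Forwarded-Proto  $scheme;
        proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
    }
}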

Could not resolve hostname, ping works

I have installed Raspbian on a RasPi, and now I can't do ssh or git clone; only local host names seem to be resolved. And yet ping works:
pi ~ $ ssh test.com
ssh: Could not resolve hostname test.com: Name or service not known
pi ~ $ git clone gitosis@test.com:test.git
Cloning into 'test'...
ssh: Could not resolve hostname test.com: Name or service not known
fatal: The remote end hung up unexpectedly
pi ~ $ ping test.com
PING test.com (174.36.85.72) 56(84) bytes of data.
I sort of worked around it for GitHub by using http://github.com instead of git://github.com, but this is not normal and I would like to pinpoint the problem.
I googled for similar issues, but the solutions offered were either typo corrections or adding domains to the hosts file.
This sounds like a DNS issue. Try switching to another DNS server and see if it works.
OpenDNS
208.67.222.222
208.67.220.220
GoogleDNS
8.8.8.8
8.8.4.4
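A quick way to test this (assuming nothing like dhcpcd immediately rewrites the file) is to add a line such as nameserver 8.8.8.8 to /etc/resolv.conf and then retry the lookup:
getent hosts test.com
ssh test.com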
Try resetting the contents of the DNS client resolver cache.
(For Windows) Fire up a command prompt and type:
ipconfig /flushdns
If you are a Linux or Mac user, those systems have their own ways of flushing the DNS cache.
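For example (depending on which resolver the system runs, so treat these as sketches):
sudo systemd-resolve --flush-caches    # Linux with systemd-resolved
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder    # macOS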
Had the same error, I just needed to specify a folder:
localmachine $ git pull ssh://someusername@127.0.0.1:38765
ssh: Could not resolve hostname : No address associated with hostname
fatal: The remote end hung up unexpectedly
localmachine $ git pull ssh://someusername@127.0.0.1:38765/
someusername@127.0.0.1's password:
That error message is just misleading.
If you have NetworkManager installed,
check /etc/nsswitch.conf.
If you have a line
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
remove the [NOTFOUND=return]
and restart networking: /etc/init.d/networking restart
The [NOTFOUND=return] entry prevents further lookups (so the dns source is never tried) when the preceding mdns4_minimal lookup does not return a result.
This may be an issue with the proxy. Unset it and try again:
git config --global --unset http.proxy
git config --global --unset https.proxy
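You can confirm nothing is left over afterwards (and that no proxy environment variable is set) with:
git config --global --list | grep -i proxy
env | grep -i proxy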

Hadoop Datanodes cannot find NameNode

I've set up a distributed Hadoop environment within VirtualBox: 4 virtual Ubuntu 11.10 installations, one acting as the master node, the other three as slaves. I followed this tutorial to get the single-node version up and running and then converted to the fully-distributed version. It was working just fine when I was running 11.04; however, when I upgraded to 11.10, it broke. Now all my slaves' logs show the following error message, repeated ad nauseam:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 0 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 1 time(s).
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.10:54310. Already tried 2 time(s).
And so on. I've found other instances of this error message on the Internet (and StackOverflow) but none of the solutions have worked (tried changing the core-site.xml and mapred-site.xml entries to be the IP address rather than hostname; quadruple-checked /etc/hosts on all slaves and master; master can SSH password-less into all slaves). I even tried reverting each slave back to a single-node setup, and they would all work fine in this case (on that note, the master always works fine as both a Datanode and the Namenode).
The only symptom I've found that would seem to give a lead is that from any of the slaves, when I attempt a telnet 192.168.1.10 54310, I get Connection refused, suggesting there is some rule blocking access (which must have gone into effect when I upgraded to 11.10).
My /etc/hosts.allow has not changed, however. I tried the rule ALL: 192.168.1., but it did not change the behavior.
Oh yes, and netstat on the master clearly shows tcp ports 54310 and 54311 are listening.
Anyone have any suggestions to get the slave Datanodes to recognize the Namenode?
EDIT #1: In doing some poking around with nmap (see comments on this post), I'm thinking the issue is in my /etc/hosts files. This is what is listed for the master VM:
127.0.0.1 localhost
127.0.1.1 master
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3
For each slave VM:
127.0.0.1 localhost
127.0.1.1 slaveX
192.168.1.10 master
192.168.1.1X slaveX
Unfortunately, I'm not sure what I changed, but the NameNode is now always dying with the exception of trying to bind a port "that's already in use" (127.0.1.1:54310). I'm clearly doing something wrong with the hostnames and IP addresses, but I'm really not sure what it is. Thoughts?
I found it! By commenting out the second line of the /etc/hosts file (the one with the 127.0.1.1 entry), netstat shows the NameNode ports binding to the 192.168.1.10 address instead of the local one, and the slave VMs found it. Ahhhhhhhh. Mystery solved! Thanks for everyone's help.
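For reference, with that change the master's /etc/hosts from the edit above would read:
127.0.0.1 localhost
# 127.0.1.1 master
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3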
This solution worked for me, i.e. make sure that the name you use in the property in core-site.xml and mapred-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<final>true</final>
</property>
i.e. master is defined in /etc/hosts as xyz.xyz.xyz.xyz master on BOTH master and slave nodes.
Then restart the namenode and check using
netstat -tuplen
that it is bound to the "external" IP address, e.g.
tcp 0 0 xyz.xyz.xyz.xyz:54310 0.0.0.0:* LISTEN 102 107203 -
and NOT to a local address such as 192.168.x.y or 127.0.x.y.
I had the same trouble. @Magsol's solution worked, but it should be noted that the entry that needs to be commented out is
127.0.1.1 masterxyz
on the master machine, not the 127.0.1.1 on the slave, though I did that too. You also need to run stop-all.sh and start-all.sh for Hadoop, which is probably obvious.
Once you have restarted Hadoop, check the jobtracker page here: http://masterxyz:50030/jobtracker.jsp
and look at the number of nodes available for jobs.
Though this response is not the solution the author is looking for, other users might land on this page thinking otherwise, so if you are using AWS for setting up your cluster, it is likely that ICMP security rules haven't been enabled in AWS Security Groups page. Look at the following: Pinging EC2 instances
The above solved the connectivity issue from data nodes to master nodes. Ensure that you can ping between each instance.
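If you manage the security group from the CLI instead of the console, a hedged sketch of allowing ICMP between the cluster nodes (the group ID and CIDR here are placeholders):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol icmp --port -1 --cidr 192.168.0.0/16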
I am running a 2-nodes cluster.
192.168.0.24 master
192.168.0.26 worker2
I was facing the same problem of Retrying connect to server: master/192.168.0.24:54310 in my worker2 machine logs. But the people mentioned above encountered errors running this command - telnet 192.168.0.24 54310. However, in my case the telnet command worked fine. Then I checked my /etc/hosts file
master /etc/hosts
127.0.0.1 localhost
192.168.0.24 ubuntu
192.168.0.24 master
192.168.0.26 worker2
worker2 /etc/hosts
127.0.0.1 localhost
192.168.0.26 ubuntu
192.168.0.24 master
192.168.0.26 worker2
When I hit http://localhost:50070 on the master, I saw Live Nodes: 2. But when I clicked on it, I saw only one datanode, which was the master's. I checked jps on both master and worker2; the Datanode process was running on both machines.
Then, after several trials and errors, I realized that my master and worker2 machines had the same hostname, "ubuntu". I changed worker2's hostname from "ubuntu" to "worker2" and removed the "ubuntu" entry from the worker2 machine.
Note: To change the hostname edit the /etc/hostname with sudo.
Bingo! It worked :) I was able to see two datanodes on the dfshealth UI page (localhost:50070).
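For reference, a hedged sketch of the hostname change described above on a systemd-based Ubuntu (the name is just this thread's example):
sudo hostnamectl set-hostname worker2
or simply edit /etc/hostname with sudo and reboot, then fix the matching entry in /etc/hosts.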
I also faced a similar issue. (I am using Ubuntu 17.0.)
I kept only the entries for the master and slaves in the /etc/hosts file (on both the master and slave machines):
127.0.0.1 localhost
192.168.201.101 master
192.168.201.102 slave1
192.168.201.103 slave2
Secondly, edit the hosts.allow file: sudo gedit /etc/hosts.allow
and add the entry: ALL: 192.168.201.
Thirdly, disable the firewall using sudo ufw disable.
Finally, I deleted both the namenode and datanode folders from all the nodes in the cluster, and reran:
$HADOOP_HOME/bin> hdfs namenode -format -force
$HADOOP_HOME/sbin> ./start-dfs.sh
$HADOOP_HOME/sbin> ./start-yarn.sh
To check the health report from command line (which I would recommend)
$HADOOP_HOME/bin> hdfs dfsadmin -report
and I got all the nodes working correctly.
