I am trying to up Hadoop in Centos-7 usign CLoudera, but while Cluster Setup process (Single node), I am getting this error stating:
There was an error when communicating with the server. See the log file for more information.
I logged into cloudera-scm-agent.log file using
sudo cat /var/log/cloudera-scm-agent/cloudera-scm-agent.log
And I see Failed directory creation and connection refused errors.
The detailed log file can be found here.
Can someone please assist me on what am I doing wrong here?
Have you installed the cluster with single user mode? if so the system user "cloudera-scm" should have permission to perform read, write operation on service log, pid, data directory. From your log message, all services are refused to start because of improper file system permission.
stacks', u'bytes_free_warning_threshhold_bytes': 0, u'group': u'cloudera-scm', u'user': u'cloudera-scm', u'mode': 493}]
[01/Nov/2018 04:41:11 +0000] 28095 MainThread os_ops ERROR Failed directory creation: /var/log/zookeeper/stacks: [Errno 13] Permission denied: '/var/log/zookeeper'
[01/Nov/2018 04:41:11 +0000] 28095 MainThread process ERROR Could not evaluate resource {u'path': u'/var/log/zookeeper/stacks', u'bytes_free_warning_threshhold_bytes': 0, u'group': u'cloudera-scm', u'user': u'cloudera-scm', u'mode': 493}
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.1-py2.7.egg/cmf/process.py", line 963, in _do_directory_resources
self.osops.mkabsdir(d["path"], user=d["user"], group=d["group"], mode=d["mode"])
File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.1-py2.7.egg/cmf/util/os_ops.py", line 180, in mkabsdir
os.makedirs(path)
File "/usr/lib64/cmf/agent/build/env/lib64/python2.7/os.py", line 150, in makedirs
makedirs(head, mode)
File "/usr/lib64/cmf/agent/build/env/lib64/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/var/log/zookeeper'
Related
I have set up a raspberry webserver running on my school lan network ,other people are connecting with arduino,sometimes when they connect i get this error:
Exception happened during processing of request from ('172.17.17.66', 49153)
Traceback (most recent call last):
File "/usr/lib/pythonz.7/SocketServer.py", line 290, in
_hand1e_request_nobloc k self.process_request(request, client_address) File
"/usr/lib/pythonz.7/SocketServer.py“, line 318, in process_request
self.finish_request(request, client_address) File
"/usr/lib/pythonz.7/SocketServer.py“, line 331, in finish_request
self.RequestHandlerClass(request, client_address, self) File
"/usr/lib/pythonz.7/SocketServer.py", line 652, in __init__ self.hand1e()
File "/usr/lib/pythonz.7/BaseHTTPServer.py“, line 340, in handle
self.handle_one_request() File "/usr/lib/pythonz.7/BaseHTTPServer.py", line
310, in handle_one request self . raw_requestline = self . rfile .
readline(65537) ‘ File "/usr/lib/pythonz.7/socket.py", line 480, in readline
data = self._sock.recv(se1f._rbufsize) error: [Errno 104] Connection reset
by peer
Can someone tell me what means? Is a problem of my server or their socket? If needed i can post my code.
The last line:
error: [Errno 104] Connection reset by peer
Means the client dropped the connection. I'd look into the Arduino code first.
Setup
Windows 10
Docker for Windows v18.09.0
AWS SAM CLI v0.10.0
Python 3.7.0
AWS CLI v1.16.67
dotnet core sdk v2.1.403
Powershell v5.1.17134.407
Problem
I'm following the quickstart for AWS SAM Local (as well as the readme generated once the init command is executed below), using the dotnetcore2.1 runtime.
I've run the following command to initialise AWS SAM for use with dotnetcore2.1
sam init --runtime dtonetcore2.1
Then I created the package by running
build.ps1 --target=package
Finally I start the local API Gateway service by running
sam local start-api
I then open a browser and navigate to http://localhost:3000/hello where I'm presented with the following:
PS C:\Users\user_name\Documents\Workspace\messaround\aws-sam\sam-app> sam local start-api
2019-01-04 10:39:15 Found credentials in shared credentials file: ~/.aws/credentials
2019-01-04 10:39:15 Mounting HelloWorldFunction at http://127.0.0.1:3000/hello [GET]
2019-01-04 10:39:15 You can now browse to the above endpoints to invoke your functions. You do not need to restart/reload SAM CLI while working on your functions changes will be reflected instantly/automatically. You only need to restart SAM CLI if you update your AWS SAM template
2019-01-04 10:39:16 * Running on http://127.0.0.1:3000/ (Press CTRL+C to quit)
2019-01-04 10:40:10 Invoking HelloWorld::HelloWorld.Function::FunctionHandler (dotnetcore2.1)
2019-01-04 10:40:10 Decompressing C:\Users\user_name\Documents\Workspace\messaround\aws-sam\sam-app\artifacts\HelloWorld.zip
Fetching lambci/lambda:dotnetcore2.1 Docker container image......
2019-01-04 10:40:13 Mounting C:\Users\user_name\AppData\Local\Temp\tmpq0zka7a7 as /var/task:ro inside runtime container
2019-01-04 10:40:14 Exception on /hello [GET]
Traceback (most recent call last):
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\docker\api\client.py", line 246, in _raise_for_status
response.raise_for_status()
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\requests\models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localnpipe/v1.35/containers/102dda11417068e01873242be2383c78c7ad4e2739fd4f8b42c1e0ea494d2bbb/start
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\flask\app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\flask\app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\flask\app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\flask\_compat.py", line 35, in reraise
raise value
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\flask\app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\flask\app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\samcli\local\apigw\local_apigw_service.py", line 153, in _request_handler
self.lambda_runner.invoke(route.function_name, event, stdout=stdout_stream_writer, stderr=self.stderr)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\samcli\commands\local\lib\local_lambda.py", line 85, in invoke
self.local_runtime.invoke(config, event, debug_context=self.debug_context, stdout=stdout, stderr=stderr)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\samcli\local\lambdafn\runtime.py", line 86, in invoke
self._container_manager.run(container)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\samcli\local\docker\manager.py", line 98, in run
container.start(input_data=input_data)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\samcli\local\docker\container.py", line 187, in start
real_container.start()
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\docker\models\containers.py", line 390, in start
return self.client.api.start(self.id, **kwargs)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\docker\utils\decorators.py", line 19, in wrapped
return f(self, resource_id, *args, **kwargs)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\docker\api\container.py", line 1075, in start
self._raise_for_status(res)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\docker\api\client.py", line 248, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "C:\Users\user_name\AppData\Roaming\Python\Python37\site-packages\docker\errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("error while creating mount source path '/host_mnt/c/Users/user_name/AppData/Local/Temp/tmpq0zka7a7': mkdir /host_mnt/c/Users/user_name/AppData: permission denied")
2019-01-04 10:40:14 127.0.0.1 - - [04/Jan/2019 10:40:14] "GET /hello HTTP/1.1" 502 -
2019-01-04 10:40:14 127.0.0.1 - - [04/Jan/2019 10:40:14] "GET /favicon.ico HTTP/1.1" 403 -
What I've tried
Resetting the shared drive credentials
Initially I though this was a permissioning error between my Windows drive and the VM running docker... After searching the docker forums I found this article which I've followed. However this doesn't seem to have changed the error message
Any suggestions would be greatly received. Thanks
That's how I fixed my problem:
When SAM CLI sees a zip, it unzip into a temp directory (looks to be C:/Users/user_name/AppData/Local/Temp/tmpq0zka7a7 in your case).
Docker must have access to that folder.
In my case, I've created a local user to give Docker access to shared drives and that local user didn't have access to C:/Users/user_name.
I gave it access and got my problem sorted. Maybe you can fix it the same way.
Try to run the following:
docker run --rm -v c:/Users/user_name:/data alpine ls /data
It should list c:/Users/user_name content if all is fine.
Good luck!
with cloudera install doc step by step I have in trouble with Install Agents
like this:
It said install failed and can not receive signal.
And I find the log like this:
[13/Nov/2018 16:44:19 +0000] 4306 MainThread agent ERROR Heartbeating to ryze-1.bigdata.com:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1371, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
return self.issue_request(call_request, message_name, request_datum)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
call_response = self.transceiver.transceive(call_request)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
result = self.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 489, in read_framed_message
framed_message = response_reader.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 417, in read_framed_message
raise ConnectionClosedException("Reader read 0 bytes.")
ConnectionClosedException: Reader read 0 bytes.
I try to solve it with google I already check these setting.
/etc/cloudera-scm-agent/config.ini the port set 7182 and server_host set ryze-1.bigdata.com.
iptable altready shutdown with sudo service iptables stop
ryze-1.bigdata.com is reachable. and telnet ryze-1.bigdata.com 7183 can succeed.
OS: Centos7.4
Platform: AliCloud
So what can I do? Any one can help me ?
I closed the ssl option.
Everything is fine now.......
I am trying to configure remote github repo as the salt server root but it can't make the authentication successful with the pub/priv keypair. I have given the location of the keys in the /etc/salt/master file as well.
Below are the logs I am getting:
2018-11-05 01:48:32,197 [salt.utils.gitfs :1574][ERROR ][21391] Error occurred fetching gitfs remote 'git#[github-endpoint].git': failed to start SSH session: Unable to exchange encryption keys
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/salt/utils/gitfs.py", line 1552, in _fetch
fetch_results = origin.fetch(**fetch_kwargs)
File "/usr/lib64/python2.7/site-packages/pygit2/remote.py", line 405, in fetch
File "/usr/lib64/python2.7/site-packages/pygit2/errors.py", line 64, in check_error
GitError: failed to start SSH session: Unable to exchange encryption keys
I have checked the keypair and connection to the github endpoint.
I am able to sync the repo manually in the server.
I found with the same issue and I finally solved with the following steps:
I create a new ssh key: ssh-keygen -f gitfs_ssh -C 'test#example.com'
Then, I read that an empty line at the end of the private key could be fatal for libssh2, so I removed the empty lines at the bottom of the file (added by ssh-keygen at creation time) and then the new key began to work.
More info in this link
I'm running Airflow on a clustered environment running on two AWS EC2-Instances. One for master and one for the worker. The worker node though periodically throws this error when running "$airflow worker":
[2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprod.comanyname.io
Traceback (most recent call last):
File "/usr/bin/airflow", line 27, in <module>
args.func(args)
File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 387, in run
run_job.run()
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 198, in run
self._execute()
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2527, in _execute
self.heartbeat()
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 182, in heartbeat
self.heartbeat_callback(session=session)
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2575, in heartbeat_callback
raise AirflowException("Hostname of job runner does not match")
airflow.exceptions.AirflowException: Hostname of job runner does not match
[2018-08-09 16:15:43,671] {celery_executor.py:54} ERROR - Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
[2018-08-09 16:15:43,681: ERROR/ForkPoolWorker-30] Task airflow.executors.celery_executor.execute_command[875a4da9-582e-4c10-92aa-5407f3b46d5f] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
subprocess.check_call(command, shell=True)
File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed
When this error occurs the task is marked as failed on Airflow and thus fails my DAG when nothing actually went wrong in the task.
I'm using Redis as my queue and postgreSQL as my meta-database. Both are external as AWS services. I'm running all of this on my company environment which is why the full name of the server is ip-1.2.3.4.eco.tanonprod.comanyname.io. It looks like it wants this full name somewhere but I have no idea where I need to fix this value so that it's getting ip-1.2.3.4.eco.tanonprod.comanyname.io instead of just ip-1.2.3.4.
The really weird thing about this issue is that it doesn't always happen. It seems to just randomly happen every once in a while when I run the DAG. It's also occurring on all of my DAGs sporadically so it's not just one DAG. I find it strange though how it's sporadic because that means other task runs are handling the IP address for whatever this is just fine.
Note: I've changed the real IP address to 1.2.3.4 for privacy reasons.
Answer:
https://github.com/apache/incubator-airflow/pull/2484
This is exactly the problem I am having and other Airflow users on AWS EC2-Instances are experiencing it as well.
The hostname is set when the task instance runs, and is set to self.hostname = socket.getfqdn(), where socket is the python package import socket.
The comparison that triggers this error is:
fqdn = socket.getfqdn()
if fqdn != ti.hostname:
logging.warning("The recorded hostname {ti.hostname} "
"does not match this instance's hostname "
"{fqdn}".format(**locals()))
raise AirflowException("Hostname of job runner does not match")
It seems like the hostname on the ec2 instance is changing on you while the worker is running. Perhaps try manually setting the hostname as described here https://forums.aws.amazon.com/thread.jspa?threadID=246906 and see if that sticks.
I had a similar problem on my Mac. It fixed it setting hostname_callable = socket:gethostname in airflow.cfg.
Personally when running on my Mac, I found that I got similar errors to this when the Mac would sleep while I was running a long job. The solution was to go into System Preferences -> Energy Saver and then check "Prevent computer from sleeping automatically when the display is off."