We have set up OpenStack using conjure-up on a single machine (Ubuntu 16.04.3 LTS server). All services are up and running, and I am able to upload images to Glance successfully.
We want to save the Glance images created by "glance image-create" on a remote machine that runs an NFS server, so we have configured the glance-api.conf file as below.
My glance-api.conf looks like this:
[glance_store]
filesystem_store_datadir = /var/lib/glance/images
default_store = file
On the Glance controller node, I have mounted the remote machine's export (remote-machine-IP:/home/glance/images/) at /var/lib/glance/images, and I have set that same mounted directory path inside the glance-api.conf file, as sketched below.
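For reference, a minimal sketch of that mount step (the remote IP is a placeholder; the export path and mount point are the ones described above):
# requires the nfs-common package on the controller node
sudo mount -t nfs REMOTE_IP:/home/glance/images /var/lib/glance/images
sudo chown -R glance:glance /var/lib/glance/images
# optional /etc/fstab entry so the mount persists across reboots:
# REMOTE_IP:/home/glance/images  /var/lib/glance/images  nfs  defaults  0  0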
I have created two sample private networks (192.168.1.0 and 10.221.50.0) but have not created a public network, since at this moment I don't need to access the VM instances from outside.
When I try to launch an instance from the dashboard UI, as well as through the CLI, I get the error below.
Error: Failed to perform requested operation on instance "Ubuntu_Hawkbit", the instance has an error status: Please try again later [Error: No valid host was found. There are not enough hosts available.].
Note: I have tried associating the instance with a different private network, thinking it might be a network IP address issue, but I face the same error.
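For reference, a quick way to check whether any compute hosts are actually available to the scheduler behind a "No valid host was found" error (a sketch; assumes the admin credentials are sourced):
openstack compute service list
openstack hypervisor list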
When I check /var/log/nova/nova-compute.log, I see the errors below.
ERROR nova.image.glance [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] Error contacting glance server 'http://10.206.193.159:9292' for 'data', done trying.
ERROR nova.image.glance CommunicationError: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova_lxd.nova.virt.lxd.image [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Failed to upload 6c30e2ab-1078-45ad-bed2-3e3a75f6af8c to LXD: Connection to glance
ERROR nova_lxd.nova.virt.lxd.operations [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Faild to start container instance-00000020: Connection to glance host http://10.206.193.159:9292 failed: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova.compute.manager [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Instance failed to spawn
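For reference, a quick way to confirm from the nova-compute host that the Glance endpoint in the errors above is reachable (a sketch; the URL is taken from the log, and the second command assumes the admin credentials are sourced):
curl -i http://10.206.193.159:9292/
openstack image list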
I am using Ubuntu 20.04. I get the following error while running Jupyter Notebook:
ERROR
The requested URL could not be retrieved
The following error was encountered while trying to retrieve the URL: http://localhost:8888/tree?
Connection to 127.0.0.1 failed.
The system returned: (111) Connection refused
The remote host or network may be down. Please try the request again.
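For reference, two quick checks to see whether the notebook server is actually listening (a sketch; assumes the default port 8888):
jupyter notebook list
ss -ltn | grep 8888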
I have my Airflow Docker containers running with Celery worker nodes, and the jobs are triggered correctly. However, the webserver UI cannot reach the worker for logs, showing:
*** Log file does not exist: /usr/local/airflow/airflow/logs/spark-submit/transform/2021-08-07T04:21:58.646836+00:00/1.log
*** Fetching from: http://localhost.localdomain:8793/log/spark-submit/transform/2021-08-07T04:21:58.646836+00:00/1.log
*** Failed to fetch log file from worker. [Errno 111] Connection refused
I am trying to diagnose why this happens. Does it mean the webserver is not able to resolve the worker's IP address correctly? Can I configure this worker IP mapping somewhere?
I've tried setting the hostname in airflow.cfg as
hostname_callable = airflow.utils.net.get_host_ip_address
but it doesn't help.
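For reference, one common way to make the worker reachable by name in docker-compose looks like this (a sketch; the service name airflow-worker is a placeholder, not taken from the question, and it assumes the webserver and worker share a compose network):
services:
  airflow-worker:
    hostname: airflow-worker   # matches the service name, so the name the worker reports is one the webserver can resolve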
Appreciate any help! Thanks
Version:
airflow==2.1.2
celery[redis]==4.4.2
Composer is failing a task because it cannot read a log file; it complains about incorrect encoding.
Here's the log that appears in the UI:
*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)
*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I tried viewing the file in the Google Cloud console, and it also throws an error:
Failed to load
Tracking Number: 8075820889980640204
But I am able to download the file via gsutil.
When I view the file, it seems to have text written over other text.
I can't show the entire file but it looks like this:
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
The #-#{} pieces seem to be written "on top of" the typical log lines.
I faced the same problem. In my case, the problem was that I had removed the google_cloud_default connection that was being used to retrieve the logs.
Check the configuration and look for the connection name.
[core]
remote_log_conn_id = google_cloud_default
Then check that the credentials used for that connection have the right permissions to access the GCS bucket.
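For example, a quick way to inspect and, if needed, grant read access on the log bucket (a sketch; the service-account email and bucket name are placeholders):
gsutil iam get gs://YOUR_LOG_BUCKET
gsutil iam ch serviceAccount:YOUR_SA@YOUR_PROJECT.iam.gserviceaccount.com:objectViewer gs://YOUR_LOG_BUCKET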
I'm having a similar problem with viewing logs in GCP Cloud Composer, though it doesn't appear to be preventing the failing DAG task from running. It looks like a permissions error between the GKE cluster and the storage bucket where the log files are kept.
You can still view the logs by going into your cluster's storage bucket, in the same directory as your /dags folder, where you should also see a logs/ folder.
Your Helm chart should set up a global env:
- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
value: "google-cloud-platform://"
Then, you should deploy a Dockerfile with the root account only (not the airflow account); additionally, set up your Helm uid and gid as:
uid: 50000 #airflow user
gid: 50000 #airflow group
Then upgrade the Helm chart with the new config, for example as shown below.
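A sketch, assuming the official Apache Airflow chart and a values.yaml containing the settings above (release and namespace names are placeholders):
helm repo add apache-airflow https://airflow.apache.org
helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f values.yaml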
*** Unable to read remote log from gs://bucket
I found the solution:
1) Assign the required roles to the service account (see the example below).
2) Add the SA key (JSON or txt) and configure it on the connection referenced by
remote_log_conn_id = google_cloud_default
3) Restart the Airflow scheduler and webserver.
4) Re-run the DAGs on Airflow.
You can then find the logs in the GCS bucket where remote logging is configured.
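For step 1, the role assignment can be done like this (a sketch; the project ID, service-account email, and role are placeholders):
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:YOUR_SA@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"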
I uploaded a DAG file to the web UI, and when I click 'Graph View' -> ${my_dag} -> 'View Log', it shows:
*** Log file isn't local.
*** Fetching here: http://:8793/log/demo_dag/hello_task/2018-11-14T15:06:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
I have checked airflow.cfg and found this config:
worker_log_server_port = 8793
base_log_folder = /root/airflow/logs
My questions are:
How do I set up the IP address for the log service (only the port is set up)?
I have set up a directory for the log service; why does it still go to /log/.. ?
Any help is appreciated.
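For reference, the remote-logging settings that the "Unsupported remote log location" message relates to live in airflow.cfg and look roughly like this in Airflow 1.x (a sketch; the bucket path is a placeholder):
[core]
remote_logging = True
remote_base_log_folder = gs://YOUR_BUCKET/logs
remote_log_conn_id = google_cloud_default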
This can happen when the task status was manually changed (likely through the "Mark Success" option) and the task never receives a hostname value on the record.
The webserver is attempting to reach out to a server, with no name, to get logs for a task that never ran.
PS: Be careful running processes as the root user.
I've been getting this error and fixed it by correcting the socket volume path:
WARNING - OSError while attempting to symlink the latest log directory
On Windows, the volume needs a double slash, like this:
volumes:
- //var/run/docker.sock:/var/run/docker.sock
See also: "Bind to docker socket on Windows" and "Setting up Airflow to run with Docker Swarm's orchestration".
I am attempting to bootstrap a private OpenStack cloud using Cloudify 2.7.1. It boots the Linux instance correctly, but fails at "Uploading files to 192.168.10.XXX." due to an SFTP problem: "Could not determine the type of file "sftp://root:***#192.168.10.xxx/root/gs-files".".
I can access the instance using SSH (there is no problem with the connection). I tried with other images (CentOS, Ubuntu, CirrOS, ...) but always get the same error.
Can anyone help me, please?
I attached a screenshot of the network topology created by Cloudify, and the stack trace.
Full stack trace:
2015-04-30 10:26:27,470 INFO [org.cloudifysource.shell.commands.AbstractGSCommand] - Setting security profile to "nonsecure".
2015-04-30 10:26:27,589 INFO [org.cloudifysource.shell.commands.AbstractGSCommand] - Bootstrapping cloud openstack-havana. This may take a few minutes.
2015-04-30 10:26:27,677 INFO [org.cloudifysource.esc.driver.provisioning.BaseProvisioningDriver] - Setup network configuration for managers
2015-04-30 10:26:27,677 INFO [org.cloudifysource.esc.driver.provisioning.BaseProvisioningDriver] - Using management network : Cloudify-Management-Network
2015-04-30 10:26:51,536 INFO [org.cloudifysource.esc.shell.listener.CliAgentlessInstallerListener] - Attempting to access Management VM 192.168.10.241.
2015-04-30 10:27:10,551 INFO [org.cloudifysource.esc.shell.listener.CliAgentlessInstallerListener] - Uploading files to 192.168.10.241.
2015-04-30 10:27:15,708 WARNING [com.jcraft.jsch] - Permanently added '192.168.10.241' (RSA) to the list of known hosts.
2015-04-30 10:27:25,998 INFO [org.cloudifysource.esc.shell.installer.CloudGridAgentBootstrapper] - Failed accessing management VM 192.168.10.241 Reason: Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros#192.168.10.241/cirros/gs-files".".; Caused by: org.cloudifysource.esc.installer.InstallerException: Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros#192.168.10.241/cirros/gs-files".".
2015-04-30 10:27:26,210 INFO [org.cloudifysource.esc.driver.provisioning.openstack.OpenStackCloudifyDriver] - Deleting Floating ip: FloatingIp[floatingNetworkId=15578898-5e6b-44d9-a73a-1328ca6ea140,floatingIpAddress=192.168.10.241,portId=4b8dc211-12e8-4383-8799-f783d2786e98,id=593d8424-cfec-41ed-8204-ed8609366416]
2015-04-30 10:27:29,607 SEVERE [org.cloudifysource.shell.commands.AbstractGSCommand] - Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros#192.168.10.241/cirros/gs-files".". : org.cloudifysource.esc.installer.InstallerException: Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros#192.168.10.241/cirros/gs-files".".
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.initialize(VfsFileTransfer.java:206)
at org.cloudifysource.esc.installer.AgentlessInstaller.uploadFilesToServer(AgentlessInstaller.java:306)
at org.cloudifysource.esc.installer.AgentlessInstaller.installOnMachineWithIP(AgentlessInstaller.java:210)
at org.cloudifysource.esc.shell.installer.CloudGridAgentBootstrapper$1.call(CloudGridAgentBootstrapper.java:865)
at org.cloudifysource.esc.shell.installer.CloudGridAgentBootstrapper$1.call(CloudGridAgentBootstrapper.java:860)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.commons.vfs2.FileSystemException: Unknown message with code "Could not determine the type of file "sftp://cirros#192.168.10.241/cirros/gs-files".".
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.refresh(SftpFileObject.java:95)
at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:366)
at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:317)
at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:85)
at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:65)
at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:693)
at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:621)
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.resolveTargetDirectory(VfsFileTransfer.java:218)
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.initialize(VfsFileTransfer.java:203)
... 8 more
Caused by: org.apache.commons.vfs2.FileSystemException: Could not determine the type of file "sftp://cirros#192.168.10.241/cirros/gs-files".
at org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:505)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.refresh(SftpFileObject.java:91)
... 16 more
Caused by: org.apache.commons.vfs2.FileSystemException: Could not connect to SFTP server at "sftp://cirros#192.168.10.241/".
at org.apache.commons.vfs2.provider.sftp.SftpFileSystem.getChannel(SftpFileSystem.java:153)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.statSelf(SftpFileObject.java:151)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.doGetType(SftpFileObject.java:114)
at org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:496)
... 17 more
Caused by: com.jcraft.jsch.JSchException: java.io.IOException: Pipe closed
at com.jcraft.jsch.ChannelSftp.start(ChannelSftp.java:288)
at com.jcraft.jsch.Channel.connect(Channel.java:152)
at com.jcraft.jsch.Channel.connect(Channel.java:145)
at org.apache.commons.vfs2.provider.sftp.SftpFileSystem.getChannel(SftpFileSystem.java:130)
... 20 more
Caused by: java.io.IOException: Pipe closed
at java.io.PipedInputStream.read(PipedInputStream.java:308)
at java.io.PipedInputStream.read(PipedInputStream.java:378)
at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2665)
at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2691)
at com.jcraft.jsch.ChannelSftp.start(ChannelSftp.java:257)
... 23 more
Looks like you are trying to sftp into a cirros instance - I am not sure cirros even supports sftp. You can try this by using the sftp command line utility.
In general, sftp has to be configured and available on the target machine.
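For example, from the machine running the Cloudify CLI, using the address from the stack trace above:
sftp cirros@192.168.10.241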
You can try using the SCP file transfer mode by setting this in your compute template:
fileTransfer org.cloudifysource.domain.cloud.FileTransferModes.SCP
If you are really using cirros, I suspect bootstrapping will fail. Cloudify was never tested on cirros. I think cirros is lacking some very basic utilities (I think it is not running bash. Not sure if it has wget). Cirros was never meant as a generic distribution - it is meant for testing your cloud's basic functionality.
One more thing - Cloudify 2 has reached End-of-Life - it is no longer supported. You should check out Cloudify 3.