question and answer thread I found
How to set Airflow scheduler log file mode/permissions
But I am using the Helm chart
I'm looking for another way
https://artifacthub.io/packages/helm/apache-airflow/airflow/1.6.0
how do i set a permission to log file?
as-is : log file permission 666
to-be : log file permission 775 or 664
Related
I am trying to set up SELinux and an encrypted additional partition that I mount at startup using a systemd service.
If I run SELinux in permissive mode, everything runs ok (partition is correctly mounted, data can be accessed and service runs properly).
If I run SELinux in enforcing mode (enforcing=1), I am not able to mount such partition with the error:
/dev/mapper/temporary-cryptsetup-1808: chown failed: Permission denied
sh[1777]: Failed to open temporary keystore device.
sh[1777]: Command failed with code 5: Input/output error
Any ideas to fix that?
Audit2allow does not return any additional rules to be added
Solved assigning to cryptsetup the lvm_exec_t context.
In the lvm.fc file cryptsetup was defined as /bin/cryptsetup but I had to change it to /usr/sbin/cryptsetup where it actually was.
Followed the following procedure for attaching the EFS file to instances created using EB:
https://aws.amazon.com/premiumsupport/knowledge-center/elastic-beanstalk-mount-efs-volumes/#:~:text=In%20an%20Elastic%20Beanstalk%20environment,scale%20up%20to%20multiple%20instances.
But the logs of Elastic Beanstalk are showing following error:
[Instance: i-06593*****] Command failed on instance. Return code: 1 Output: (TRUNCATED)...fs ... mount -t efs -o tls fs-d9****:/ /efs Failed to resolve "fs-d9****.efs.us-east-1.amazonaws.com" - check that your file system ID is correct. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail. ERROR: Mount command failed!. command 01_mount in .ebextensions/storage-efs-mountfilesystem.config failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
Just used **** in EFS ID for security.
Based on the comments.
The solution was to create new EFS filesystem, instead of using the original one.
Composer is failing a task due to it not being able to read a log file, it's complaining about incorrect encoding.
Here's the log that appears in the UI:
*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)
*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I try viewing the file in the google cloud console and it also throws an error:
Failed to load
Tracking Number: 8075820889980640204
But I am able to download the file via gsutil.
When I view the file, it seems to have text overriding other text.
I can't show the entire file but it looks like this:
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800#-#{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
Where the #-#{} pieces seems to be "on top of" the typical log.
I faced the same problem. In my case the problem was that I removed the google_gcloud_default connection that was being used to retrieve the logs.
Check the configuration and look for the connection name.
[core]
remote_log_conn_id = google_cloud_default
Then check the credentials used for that connection name has the right permissions to access the GCS bucket.
I'm having a similar problem with viewing logs in GCP Cloud Composer. It doesn't appear to be preventing the failing DAG task from running though. What it looks like is a permissions error between the GKE and Storage Bucket where the log files are kept.
You can still view the logs by going into your cluster's storage bucket in the same directory as your /dags folder where you should also see a logs/ folder.
Your helm chart should setup global env:
- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
value: "google-cloud-platform://"
Then, you should deploy a Dockerfile with root account only (not airflow account), additionaly, you set up your helm uid, gid as:
uid: 50000 #airflow user
gid: 50000 #airflow group
Then upgrade helm chart with new config
*** Unable to read remote log from gs://bucket
1)Found the solution after assigning the roles to the service account
2)The SA key(json or txt) to be added and configured to the connection in the
remote_log_conn_id = google_cloud_default
3)restart the scheduler and webserver of the airflow
4)restart the dags on the airflow
you can find the logs on the GCS bucket where its configured
I have two machines. Machine1: airflow-webserver, airflow-scheduler. Machine2: airflow-worker on specific queue. I am using CeleryExecutor. Task on machine2 runs successfully (writing and deleting files on local drive), but in web UI on machine1 I didnt read log files.
*** Log file does not exist: /home/airflow/logs/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Fetching from: http://localhost-int.localdomain:8793/log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='localhost-int.localdomain', port=8793): Max retries exceeded with url: /log/delete_images_by_ttl/delete_images/2018-10-29T12:24:23.299741+00:00/1.log
To solve this problem edit your /etc/hosts. Add ip and dns-name for airflow webserver
HTTPConnectionPool means webserver is not able to communicate to the worker node.
Add worker node hostname on /etc/hosts file
Also verify below
base_log_folder = /home/airflow/logs/
sudo chmod -R 777 /home/airflow/logs/
I have SLS files set up to copy things from a network folder to a local directory on a minion.
Looks a little like this:
cmd-test:
cmd.run:
- name: 'ROBOCOPY \\\CygwinSource C:\CygwinSource /E'
and get the following output:
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Tuesday, December 6, 2016 10:50:35 AM
2016/12/06 10:50:35 ERROR 1808 (0x00000710) Getting File System Type of Source \\<Server>\<program>\<file>\
The account used is a computer account. Use your global user account or local user account to access this server.
Source - \\<Server>\<program>\<folder>\
Dest : C:\<path>\<folder>\
Files : *.*
Options : *.* /S /E /DCOPY:DA /COPY:DATS /PURGE /MIR /NP /R:1 /W:1
------------------------------------------------------------------------------
NOTE : NTFS Security may not be copied - Source may not be NTFS.
2016/12/06 10:50:35 ERROR 1808 (0x00000710) Accessing Source Directory \\<Server>\<program>\<file>\
The account used is a computer account. Use your global user account or local user account to access this server.
Waiting 1 seconds... Retrying...
When I run the same thing locally in command line as 'ROBOCOPY \\\CygwinSource C:\CygwinSource /E' and it worked perfectly. I have no idea how to fix this 'use local user account' that Robocopy seems to give when using it through salt.
I also tried adding /MIR and /SEC which didnt't work.
Running Windows 10, Minion 2016.3.3
Master: Red Hat, 2016.3.3
Salt seems to be connecting to the network resource with a computer account. A few possible solutions:
Try changing the Salt Service on the Client (if that's how salt is executing the commands) to run as a domain user.
Try using the salt file server
Implement this hacky workaround where a scheduled task is created - discussed in the github issue that seems related to your problem: https://github.com/saltstack/salt/issues/16340