Why does Airflow create multiple log files?

I have recently started working with the Airflow scheduler, and I have noticed that it creates multiple log files for each scheduled job. How can I restrict it to just one file? I have checked the airflow.cfg file and wasn't able to find any setting related to the number of log files.
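For context, Airflow writes a separate log file for every task instance and every retry rather than one file per job, and the layout is driven by the log filename template in airflow.cfg. A minimal sketch of the relevant options (the section, option names, and defaults vary between Airflow versions, so treat this as illustrative rather than exact):
[core]
base_log_folder = /path/to/airflow/logs
# One file per DAG / task / execution date / try number:
log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log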

Related

How can I write a file from a DAG to my composer cloud storage bucket?

The accepted answer to this question states that
"...the gs://my-bucket/dags folder is available in the scheduler, web server, and workers at /home/airflow/gcs/dags."
(which is supported by the newer docs)
So I wrote a bash operator like this:
t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt",
)
I thought that by prefacing my file creation with the path specified in the answer it would write to the data folder in my Cloud Composer environment's associated storage bucket. Similarly, touch test.txt also ran successfully but didn't actually create a file anywhere I can see it (I assume it's written to the worker's temp storage, which is then deleted when the worker is shut down following execution of the DAG). It seems I can't persist any data from simple commands run through a DAG. Is it even possible to simply write out some files from a bash script running in Cloud Composer? Thank you in advance.
Bizarrely, I needed to add a space at the end of the string containing the Bash command.
t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt ",
)
The frustrating thing was that the error said the path didn't exist, so I went down a rabbit hole mapping the directories of the Airflow worker until I was absolutely certain it did - then I found a similar issue here. Although I didn't get the 'Jinja Template not Found' error I should have got according to this note.
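For reference, the note being referred to describes how BashOperator renders bash_command as a Jinja template, and a command string ending in one of its template extensions (.sh or .bash) is treated as the path of a script file to load; the documented workaround is the same trailing space. A rough sketch of that documented case, assuming the same mechanism is what applied here (the script path is hypothetical):
from datetime import datetime
from airflow import DAG
from airflow.operators import bash

with DAG(dag_id="trailing_space_sketch", start_date=datetime(2021, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    # Without a trailing space, a bash_command ending in ".sh" is treated as
    # a Jinja template file to load, which fails if the file can't be found:
    #   bash_command="/scripts/run_etl.sh"
    # With a trailing space it is rendered as an inline command instead.
    t_works = bash.BashOperator(
        task_id="runs_script_with_space",
        bash_command="/scripts/run_etl.sh ",
    )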

AWS MWAA (Managed Apache Airflow) where to put the python code used in the dags?

Where do you put your actual code? The DAGs must be thin; the assumption is that when a task starts to run it does the imports and runs some Python code.
When we were on standalone Airflow I could add my project root to PYTHONPATH and do the imports from there, but in the AWS managed Airflow I can't find any clues.
Put your DAGs into S3. When you create your MWAA environment, you specify the S3 bucket that contains your code.
E.g., create a bucket <my-dag-bucket> and place your DAGs in a subfolder dags:
s3://<my-dag-bucket>/dags/
Also make sure to define all Python dependencies in a requirements file and put it in the same bucket as well:
s3://<my-dag-bucket>/requirements.txt
Finally, if you need to provide your own modules, zip them up and put the zip file in the bucket too:
s3://<my-dag-bucket>/plugins.zip
See https://docs.aws.amazon.com/mwaa/latest/userguide/get-started.html
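To illustrate the thin-DAG pattern under that layout, here is a minimal sketch; my_project.etl and run_etl are hypothetical names standing in for code shipped inside plugins.zip, whose extracted contents normally end up on Airflow's import path via the plugins folder:
# dags/etl_example.py -- uploaded to s3://<my-dag-bucket>/dags/
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical package shipped in s3://<my-dag-bucket>/plugins.zip
from my_project.etl import run_etl

with DAG(
    dag_id="thin_dag_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_etl)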

Deleting airflow logs in scheduler folder

Currently I am using Airflow version 1.10.10.
Inside the airflow/logs folder there are many folders named after your DAGs, but there is also a folder named scheduler which contains folders named by date (e.g. 2020/07/08), going back to the date when I first started using Airflow. After searching through multiple forums I'm still not sure what these logs are for.
Anyway, the problem is that I keep wondering whether it is okay to delete the contents of the scheduler folder, since it takes up so much space compared to the other folders, which are named after the DAGs (I'm assuming that's where the logs of each DAG run are stored). Will deleting the contents of the scheduler folder cause any errors or loss of DAG logs?
This might be a silly question, but I want to make sure since this Airflow instance is on a production server. I've tried creating a local Airflow instance and deleting the scheduler folder contents, and no error seems to have occurred. Any feedback or shared experience on handling this issue is welcome.
Thanks in advance.
It contains the logs of the Airflow scheduler, AFAIK. I have only used them once, for a problem with SLAs.
I've been deleting old files in it for over a year and have never encountered a problem. This is my command to delete old scheduler log files:
find /etc/airflow/logs/scheduler -type f -mtime +45 -delete
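If you want to automate that cleanup rather than run it by hand, one option is a small maintenance DAG that wraps the same command; a rough sketch for Airflow 1.10 (the DAG id and schedule are arbitrary, and the path and retention window are simply the ones from the command above):
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10 import path

with DAG(
    dag_id="cleanup_scheduler_logs",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="delete_old_scheduler_logs",
        bash_command="find /etc/airflow/logs/scheduler -type f -mtime +45 -delete",
    )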

Custom logs folder path for each airflow DAG

I know Airflow supports logging to S3/GCS/Azure etc.
But is there a way to log to specific folders inside this storage based on some configuration inside the DAGs?
Airflow does not support this feature yet. There is a centralised log folder, configured in airflow.cfg, where all logs get saved irrespective of the DAG.
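For reference, the centralised destination is set via the remote logging options in airflow.cfg; a minimal sketch for S3 (the section is [core] in Airflow 1.10 and [logging] in 2.x, and the bucket path and connection id below are hypothetical):
[logging]
remote_logging = True
remote_base_log_folder = s3://my-log-bucket/airflow-logs
remote_log_conn_id = my_s3_conn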

Why are my Episerver jobs failing when running automatically or run by the scheduler?

I'm using Episerver to run various jobs on an intranet website, and almost all of these jobs will fail to run automatically, but will work just fine when I run them manually.
99% of the time, this is the error message I get: Could not load file or assembly 'Midco.CMS.Intranet' or one of its dependencies. The system cannot find the file specified.
If I'm actually missing an assembly, then how is my job able to run manually?
The same code ran automatically on previous publishes of our site, but is now failing. I've tried creating a 'dummy' principal role with admin privileges, hoping that would allow the job to run; I've looked online for other solutions and have tried debugging, but I can't find a way to make the jobs run automatically.
Are there any Episerver users out there who know how to get the job to run automatically? Thanks!
In order to get the scheduled Episerver jobs to run again, my team had to copy our existing Episerver database to a new database, then delete the old one. When we scheduled the jobs after copying to the new location, they all ran on time, without error, and have been running without issue for about 8 months. It was drastic, but it worked.
