How does the Oozie timezone work? It does not pick the right timezone - oozie

I have an Oozie job running on a CDH cluster, with the following coordinator:
<coordinator-app name="name" frequency="0 */5 * * *" start="2020-03-05T16:00Z" end="2020-03-07T16:00Z" timezone="America/New_York" xmlns="uri:oozie:coordinator:0.4">
I submitted this job at 15:15 New York time; Oozie started the first run right away, marked it as 15:00 (New York time), and scheduled the next one for 19:00. I don't understand Oozie's time zone handling. Why does it not pick up the time zone I have specified?

You can override the timezone when submitting the Oozie job on the terminal:
-timezone EST -config coordinator.properties -run
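Note that the trailing Z in the coordinator's start/end attributes means those times are read as UTC; the timezone attribute mainly governs daylight-saving adjustment of the frequency, so start="2020-03-05T16:00Z" is 11:00 New York time, not 16:00. A minimal sketch of converting a desired New York wall-clock start into the UTC form the coordinator expects (the helper name is mine; assumes Python 3.9+ for zoneinfo):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib, Python 3.9+

def to_oozie_utc(local_dt_str: str, tz: str) -> str:
    """Convert a local wall-clock time to the UTC 'Z' format that
    Oozie coordinator start/end attributes expect."""
    local = datetime.strptime(local_dt_str, "%Y-%m-%dT%H:%M").replace(tzinfo=ZoneInfo(tz))
    utc = local.astimezone(ZoneInfo("UTC"))
    return utc.strftime("%Y-%m-%dT%HZ").replace("Z", ":%MZ") if False else utc.strftime("%Y-%m-%dT%H:%MZ")

# 16:00 in New York on 2020-03-05 (EST, UTC-5) is 21:00 UTC:
print(to_oozie_utc("2020-03-05T16:00", "America/New_York"))  # 2020-03-05T21:00Z
```

So to get a first materialization at 16:00 New York time, the coordinator would need start="2020-03-05T21:00Z".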

Related

Airflow pipeline doesn't run on scheduled time

I am trying to run a pipeline on, say, the 'first Monday of the month'. My schedule interval is '30 17 * * 1#1' and the start date is '2022-01-01', but it doesn't run. Any pointers on why it doesn't run at the scheduled time? Thank you
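One likely cause (an assumption, since the question doesn't say which Airflow version is used): Airflow launches a run only at the *end* of its schedule interval, so with a monthly "first Monday" schedule the first run would not fire until the *second* first-Monday after the start date. It is also worth checking that your Airflow's cron parser accepts the non-standard 1#1 hash syntax. A stdlib sketch of the interval arithmetic (the helper is mine, not Airflow's scheduler):

```python
from datetime import date, timedelta

def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month."""
    d = date(year, month, 1)
    # Monday is weekday() == 0; advance to the first one in the month.
    return d + timedelta(days=(0 - d.weekday()) % 7)

# With start_date 2022-01-01, the first schedule interval spans
# first Monday of January -> first Monday of February, and Airflow
# only launches that run when the interval ends:
interval_start = first_monday(2022, 1)  # 2022-01-03
interval_end = first_monday(2022, 2)    # 2022-02-07
print(interval_start, interval_end)
```

So a DAG created in early January with this schedule would sit idle until February 7th, which can look like "it doesn't run".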

DataprocSubmitJobOperator Fails Intermittent With Zombie

We are using Airflow as an orchestrator, scheduling a workflow every hour. DataprocSubmitJobOperator is configured to submit Dataproc jobs (it uses Spark). Spark syncs data from source to target (it runs for about 50 minutes and then completes, to avoid overlapping with the next schedule).
Intermittently, the Airflow task fails with a zombie exception. The logs show an assertion failure in pthread_mutex_lock(mu), and the Airflow task exits, while the underlying Dataproc job keeps running without issue.
Please suggest what the potential issue/fix could be.
[2021-12-22 23:01:17,150] {dataproc.py:1890} INFO - Submitting job
[2021-12-22 23:01:17,804] {dataproc.py:1902} INFO - Job 27a2c88d-1308-4407-b965-aa490e2217fb submitted successfully.
[2021-12-22 23:01:17,805] {dataproc.py:1905} INFO - Waiting for job 27a2c88d-1308-4407-b965-aa490e2217fb to complete
E1222 23:45:58.299007027 1267 sync_posix.cc:67] assertion failed: pthread_mutex_lock(mu) == 0
[2021-12-22 23:46:00,943] {local_task_job.py:102} INFO - Task exited with return code Negsignal.SIGABRT
Config
raw_data_sync = DataprocSubmitJobOperator(
    task_id="raw_data_sync",
    job=RAW_DATA_GENERATION,
    location='us-central1',
    project_id='1f780b38bd7b0384e53292de20',
    execution_timeout=timedelta(seconds=3420),
    dag=dag
)
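Since the Dataproc job survives the crash, one common mitigation is to decouple submission from waiting, so that a killed or zombie worker process can be retried without resubmitting the Spark job. This is a sketch, assuming your google provider version exposes asynchronous= on DataprocSubmitJobOperator and a DataprocJobSensor with these parameter names (check your provider's documentation; RAW_DATA_GENERATION and dag come from your existing DAG file):

```python
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.sensors.dataproc import DataprocJobSensor

# Submit and return immediately instead of blocking for ~50 minutes.
submit = DataprocSubmitJobOperator(
    task_id="raw_data_sync_submit",
    job=RAW_DATA_GENERATION,
    region="us-central1",
    project_id="1f780b38bd7b0384e53292de20",
    asynchronous=True,
    dag=dag,
)

# Poll the job in a separate task; if this task dies, retrying it
# only resumes polling rather than resubmitting the Spark job.
wait = DataprocJobSensor(
    task_id="raw_data_sync_wait",
    region="us-central1",
    project_id="1f780b38bd7b0384e53292de20",
    dataproc_job_id=submit.output,  # job id pushed to XCom by the submit task
    poke_interval=60,
    timeout=3420,
    retries=2,
    dag=dag,
)

submit >> wait
```

The sensor task is cheap to retry, which makes the hourly schedule much more tolerant of the intermittent SIGABRT on the worker.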

Airflow - Skip future task instance without making changes to dag file

I have a DAG 'abc' scheduled to run every day at 7 AM CST and there is task 'xyz' in that DAG.
For some reason, I do not want to run one of the tasks 'xyz' for tomorrow's instance.
How can I skip that particular task instance?
I do not want to make any changes to code as I do not have access to Prod code and the task is in Prod environment now.
Is there any way to do that using the command line?
Appreciate any help on this.
You can mark the unwanted task as succeeded using the run command. Tasks marked as succeeded will not be run anymore.
Assume there is a DAG with ID a_dag and three tasks with IDs dummy1, dummy2, and dummy3. We want to skip the dummy3 task in the next DAG run.
First, we get the next execution date:
$ airflow next_execution a_dag
2020-06-12T21:00:00+00:00
Then we mark dummy3 as succeeded for this execution date (-m marks the task as successful without running it; -f, -A, -I, and -i ignore existing state and dependencies):
$ airflow run -fAIim a_dag dummy3 '2020-06-12T21:00:00+00:00'
To be sure, we can check the task state. For the skipped task it will be success:
$ airflow task_state a_dag dummy3 '2020-06-12T21:00:00+00:00'
...
success
For the rest of the tasks the state will be None:
$ airflow task_state a_dag dummy1 '2020-06-12T21:00:00+00:00'
...
None

Airflow: Can't set 'default_timezone' to 'system'

Running puckel/docker-airflow, with the build modified so that both the environment variables and airflow.cfg have:
ENV AIRFLOW__CORE__DEFAULT_TIMEZONE=system
and
default_timezone = system
accordingly.
But in the UI, it still shows UTC, even though system time is EAT. Here is some evidence from the container:
airflow@906d2275235d:~$ echo $AIRFLOW__CORE__DEFAULT_TIMEZONE
system
airflow@906d2275235d:~$ cat airflow.cfg | grep default_timez
default_timezone = system
airflow@906d2275235d:~$ date
Thu 01 Aug 2019 04:54:23 PM EAT
Would appreciate any help, or an advice on your practice with this.
According to Airflow docs:
Please note that the Web UI currently only runs in UTC.
Although the UI uses UTC, Airflow uses local time to launch DAGs. So if you have, for example, schedule_interval set to 0 3 * * *, Airflow will start the DAG at 3:00 EAT, but in the UI you will see it as 0:00.
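The offset arithmetic behind that can be sketched with the stdlib (assuming EAT here means Africa/Nairobi, UTC+3 with no DST, and Python 3.9+ for zoneinfo):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib, Python 3.9+

# A DAG launched at 03:00 local EAT time...
local_run = datetime(2019, 8, 1, 3, 0, tzinfo=ZoneInfo("Africa/Nairobi"))

# ...is displayed by the UTC-only web UI as midnight:
print(local_run.astimezone(timezone.utc).strftime("%H:%M"))  # 00:00
```

So the UI showing UTC is cosmetic; the scheduled wall-clock time still follows the configured timezone.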

How to schedule a job every 10 mins using JIL script

How can I set up a job in AutoSys that will run every evening from 4 pm to 5 pm, every 10 minutes?
Please help me understand how to specify the start time parameter. I am new to JIL scripting.
Also, where can I find complete details about scheduling a job using a JIL script?
You can use the start_mins attribute together with run_window to run every 10 minutes. You can schedule jobs down to the minute, but not down to the second.
date_conditions: 1
days_of_week: all
start_mins: 00,10,20,30,40,50
run_window: "16:00-17:00"
timezone: india
/* ----------------- template ----------------- */
insert_job: template job_type: c
box_name: box1
command: <xxx>
machine: <hostname>
owner: <username>
permission: gx,ge
date_conditions: 1
days_of_week: all
start_mins: 0,30
run_window: "16:00-17:00"
Please refer to the cheat sheet for further details:
http://supportconnectw.ca.com/public/autosys/infodocs/autosys_cheatsheet.asp
Use start_mins: "00,10,20,30,40,50"
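As a sanity check, the firing times implied by those attributes can be enumerated (a quick sketch; for simplicity it treats the 17:00 window edge as exclusive, and whether AutoSys also fires exactly at the boundary depends on how it handles the run_window edges):

```python
# Wall-clock firing times produced by
# start_mins: 00,10,20,30,40,50 inside run_window "16:00-17:00".
start_mins = [0, 10, 20, 30, 40, 50]
window_start, window_end = 16, 17  # hours

times = [
    f"{hour:02d}:{minute:02d}"
    for hour in range(window_start, window_end)
    for minute in start_mins
]
print(times)  # ['16:00', '16:10', '16:20', '16:30', '16:40', '16:50']
```

Six starts per evening, one every 10 minutes, which matches the requirement in the question.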
