Is the DAG runtime displayed anywhere in the UI? - Airflow

I find it strange that after a DAG run completes, there is nowhere in the UI that tells me how long it took. Yes, I can go into (say) the Graph View or the Gantt chart to see when the first task started and when the last task ended, then subtract one from the other, but it would be nice if the run duration were simply displayed somewhere in the UI.
Perhaps it is displayed and I'm simply not seeing it. Does anyone know whether the duration of a DAG run is shown anywhere in the UI?
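The workaround described in the question is simple arithmetic: the run's duration is the latest task end minus the earliest task start. A minimal Python sketch of that calculation, assuming you have copied each task's start and end times as ISO 8601 strings with UTC offsets (the function name run_duration and the timings below are hypothetical, made up for illustration):

```python
from datetime import datetime, timedelta

def run_duration(task_windows: list[tuple[str, str]]) -> timedelta:
    """Return the wall-clock span of a DAG run: the latest task end minus
    the earliest task start. Each tuple is (start, end) in ISO 8601."""
    if not task_windows:
        raise ValueError("no task windows given")
    starts = [datetime.fromisoformat(start) for start, _ in task_windows]
    ends = [datetime.fromisoformat(end) for _, end in task_windows]
    return max(ends) - min(starts)

# Hypothetical task timings, e.g. read off the Gantt chart (made-up values).
windows = [
    ("2023-05-01T02:00:00+00:00", "2023-05-01T02:10:30+00:00"),
    ("2023-05-01T02:10:31+00:00", "2023-05-01T02:45:00+00:00"),
    ("2023-05-01T02:05:00+00:00", "2023-05-01T02:20:00+00:00"),
]
print(run_duration(windows))  # 0:45:00
```

Taking the minimum start and maximum end (rather than the first and last rows) keeps the result correct even when tasks run in parallel or are listed out of order.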

Related

Airflow Rendered Template changes when task starts running?

I'm having a very weird Airflow bug.
Problem
I have a DAG with a BashOperator as step 1 and a KubernetesPodOperator as step 2. The issue concerns the KubernetesPodOperator. I had been giving the task image X for quite some time, and I recently changed the image the task receives to Y.
The issue is that in TaskInstanceDetails the image is correct: Y. In the Rendered Template, however, the image starts out as the correct X, but as soon as the task starts running, it changes to Y.
I know this is very vague and I can't provide a whole lot more; I'm mostly looking for possible explanations, as I'm out of ideas.
What I've Tried
Deleted serialized DAGs from the DB
Deleted rendered task details from the DB
airflow db reset
airflow db init (after nuking the whole thing)
Deleted the EC2 nodes and tried again with new ones
EDIT
So, I tried running airflow tasks render dag_id task_id execution_date, and the result is image X! Image Y only shows up on DAG runs.
Answering my own question in case anyone else runs into this issue. The fix was very simple: I had carelessly set a different image name for the workers to run with in the kubernetes_pod_template file. Changing it solved the issue.

Is Airflow a good fit for a DAG that doesn’t care about execution date/time?

Airflow's API seems to suggest it is built around backfilling, catching up, and scheduled runs at regular intervals.
I have an ETL pipeline that extracts data on S3 along with the versions of the previous node (where the data comes from) in the DAG. For example, here are the nodes of the DAG:
ImageNet-mono
ImageNet-removed-red
ImageNet-mono-scaled-to-100x100
ImageNet-removed-red-scaled-to-100x100
where ImageNet-mono is the previous node of ImageNet-mono-scaled-to-100x100, and
ImageNet-removed-red is the previous node of ImageNet-removed-red-scaled-to-100x100.
Both go through the same scaled-to-100x100 transformation but produce different data, since their inputs differ.
As you can see, no date is involved. Is Airflow a good fit?
EDIT
Currently, the graph is simple enough (fewer than 10 nodes) to be managed manually. The nodes won't run at regular intervals. Instead, as soon as someone updates the code for a node, I have to run the downstream nodes manually, one by one: python GetImageNet.py removed-red, then python scale.py 100 100 ImageNet-removed-red, then python scale.py 100 100 ImageNet-mono. I am looking for a way to manage the graph so that a run can be triggered with one click.
I think it's fine to use Airflow as long as you find the DAG representation useful. If your DAG does not need to run on a regular schedule, you can set the schedule to None instead of a crontab. You can then trigger the DAG via the API or manually from the web interface.
If you want to run specific tasks, you can trigger your DAG and mark tasks as success or clear them using the web interface.
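The manual procedure from the question, rerunning a changed node and then everything downstream of it in dependency order, is the kind of bookkeeping a DAG tool handles for you. As a rough illustration (plain Python with the standard-library graphlib module, not Airflow code; rerun_plan and DEPENDENCIES are hypothetical names), the node graph from the question can be modeled like this:

```python
from graphlib import TopologicalSorter

# Nodes from the question, each mapped to the upstream nodes it reads from.
DEPENDENCIES = {
    "ImageNet-mono": set(),
    "ImageNet-removed-red": set(),
    "ImageNet-mono-scaled-to-100x100": {"ImageNet-mono"},
    "ImageNet-removed-red-scaled-to-100x100": {"ImageNet-removed-red"},
}

def rerun_plan(changed: str, deps: dict[str, set[str]]) -> list[str]:
    """Return the changed node plus everything downstream of it, ordered so
    that each node comes after all of its upstream nodes."""
    # Invert the graph so we can walk from a node to its children.
    children: dict[str, set[str]] = {node: set() for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children.setdefault(parent, set()).add(node)
    # Depth-first walk collecting the changed node and all descendants.
    affected: set[str] = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        if node not in affected:
            affected.add(node)
            stack.extend(children.get(node, ()))
    # Topologically order the whole graph, then keep only affected nodes.
    order = TopologicalSorter(deps).static_order()
    return [node for node in order if node in affected]

print(rerun_plan("ImageNet-removed-red", DEPENDENCIES))
# ['ImageNet-removed-red', 'ImageNet-removed-red-scaled-to-100x100']
```

In Airflow, clearing a task with the Downstream option in the web interface should give a similar effect without maintaining this logic yourself.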

Control-M: is it possible to continue running if the first job fails?

I have several jobs that run in sequence. Is it possible to create a dependency between them only on completion, without requiring the prior job to complete successfully?
If a job fails, it should remain red, and the next job should still start and continue running.
These jobs must run in sequence, not in parallel.
As Mark outlined, you can simply create an On/Do action within the parent job that adds the condition when the job ends Not OK. The parent job will still go red, and the successor job will kick off.
See below for an example:
Yes, on the Actions tab you create an On/Do step specifying that when the job ends Not OK, it should add the output condition. This way the next job will run (in sequence) regardless of what happens to the predecessor job.
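To make the mechanism concrete, here is a small Python simulation of the pattern described above. It is not Control-M syntax: the job names and the -DONE condition naming are hypothetical. Each job waits on its predecessor's out condition; normally the condition is added only when the predecessor ends OK, and the On/Do Not OK action is modeled by also adding it when the job ends Not OK.

```python
from typing import Callable, Dict, List, Optional, Tuple

def run_sequence(jobs: List[Tuple[str, Callable[[], bool]]],
                 add_condition_on_notok: bool = True) -> Dict[str, str]:
    """Simulate jobs that must run strictly in sequence.

    Each job waits on the out condition of the job before it. The condition
    is always added when a job ends OK; with add_condition_on_notok=True it
    is also added when the job ends Not OK, mimicking the On/Do action.
    """
    statuses: Dict[str, str] = {}
    conditions = set()
    previous: Optional[str] = None
    for name, action in jobs:
        # In-condition: the predecessor must have added "<previous>-DONE".
        if previous is not None and f"{previous}-DONE" not in conditions:
            statuses[name] = "WAIT"  # never released
        else:
            ok = action()
            statuses[name] = "OK" if ok else "NOTOK"  # NOTOK stays "red"
            if ok or add_condition_on_notok:
                conditions.add(f"{name}-DONE")
        previous = name
    return statuses

jobs = [("JOB_A", lambda: True), ("JOB_B", lambda: False), ("JOB_C", lambda: True)]
print(run_sequence(jobs))                                # with the On/Do action
print(run_sequence(jobs, add_condition_on_notok=False))  # without it
```

With the On/Do action, JOB_B ends NOTOK but JOB_C still runs; without it, JOB_C is never released. Because each job starts only after its predecessor finishes, the jobs never overlap.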

Autosys box job not finishing

I am new to Autosys and having difficulty setting up some jobs. I have a box job containing a few command jobs. One of those command jobs may or may not run. The problem is that when this job doesn't run (it remains in the ACTIVATED state), it keeps the box running. I have to terminate this job or the box every time this happens.
Is there a way to handle this?
Thanks
The answer depends on how your jobs are supposed to work, so posting JIL or a more detailed description would help, but:
You can add a "box_success" JIL attribute to your box job to define conditions under which the box completes even if not all of its jobs run or complete.
You could also consider moving the optional job outside the box so that this issue goes away. Bear in mind, though, that the box could then complete while the optional job is still running, so make sure you don't end up with jobs running out of sequence or overlapping when they shouldn't.
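A simplified Python model of the box_success idea described above, under stated assumptions: it is not AutoSys itself, it ignores box_failure and other attributes, it treats ACTIVATED or RUNNING children as keeping the box open, and the job names are hypothetical. The success helper loosely mirrors a JIL condition such as s(extract) & s(load).

```python
from typing import Callable, Dict, Optional

Statuses = Dict[str, str]

def success(statuses: Statuses, job: str) -> bool:
    """Loose analogue of a JIL s(job) condition."""
    return statuses.get(job) == "SUCCESS"

def box_status(children: Statuses,
               box_success: Optional[Callable[[Statuses], bool]] = None) -> str:
    """Simplified model of a box job's status (box_failure etc. ignored)."""
    if box_success is not None:
        # A custom condition decides success, so optional jobs can be left out.
        if box_success(children):
            return "SUCCESS"
    elif all(status == "SUCCESS" for status in children.values()):
        # Default in this model: every child must succeed.
        return "SUCCESS"
    # A child that is still ACTIVATED or RUNNING keeps the box open.
    if any(status in ("ACTIVATED", "RUNNING") for status in children.values()):
        return "RUNNING"
    return "FAILURE"

children = {"extract": "SUCCESS", "load": "SUCCESS", "optional_step": "ACTIVATED"}
print(box_status(children))  # RUNNING: the optional job holds the box open
print(box_status(children, lambda s: success(s, "extract") and success(s, "load")))
```

The second call returns SUCCESS because the custom condition only names the required jobs, which is the effect box_success is meant to give for an optional job.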

Can I have Autosys send an alert if a job doesn't end at a specific time?

I have a box job that is dependent on another job finishing. The first job normally finishes by 11pm and my box job then kicks off and finishes in about 15 minutes. Occasionally, however, the job may not finish until much later. If it finishes later than 4am, I'd like to have it send an alert.
My admin told me that since it is dependent on a prior job, and not set to start at a specific time, it is not possible to set a time-based alert. Is this true? Does anybody have a workaround they can suggest? I'd rather not set the alert on the prior job (suggested by my admin) as that may not always catch those instances when my job runs longer.
Thanks!
You can set a max run alarm (the max_run_alarm JIL attribute, specified in minutes), which raises an alarm if the job runs longer than that.
We ended up adding a job to the box with a start time of 4am that checks for the existence of the files the other jobs create. We also did this for the job's predecessors, to make sure we are notified if we are at risk of not finishing by 4am.
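The 4am check job can be a small script. A minimal Python sketch, assuming the box's jobs write output files whose paths are passed on the command line (the paths and function names are hypothetical): it reports a non-zero exit status when any expected file is missing, so Autosys would mark the check job as failed and an alarm (for example via alarm_if_fail) can notify you.

```python
import sys
from pathlib import Path
from typing import Iterable, List

def missing_outputs(paths: Iterable[str]) -> List[str]:
    """Return the expected output files that do not exist yet."""
    return [p for p in paths if not Path(p).exists()]

def main(paths: List[str]) -> int:
    """Report missing files on stderr; return the exit status for the job."""
    missing = missing_outputs(paths)
    for p in missing:
        print(f"MISSING: {p}", file=sys.stderr)
    # A non-zero exit status makes the scheduler treat the check as failed.
    return 1 if missing else 0
```

To use it as the job's command, end the file with if __name__ == "__main__": sys.exit(main(sys.argv[1:])) and schedule it with a 4am start time, listing the files the rest of the box is expected to have produced by then.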
