Sonatype Nexus cleanup task in Waiting status

We've configured cleanup policies in Sonatype Nexus 3.22.1-02 and created the tasks Admin - Cleanup repositories (the cleanup service) and Admin - Compact blob store, then ran them manually, but they are still in Waiting status after a couple of hours. These are not scheduled tasks.
Is there a way to understand what is holding them up and give them a push, or cancel and rerun? TIA.

A task status of Waiting means the task has completed and is waiting for its next run. Whether the last run succeeded or failed is shown in the Last result field.
In the screenshot below you can see that the last time the task ran was on January 21; it completed successfully and took 0 seconds. Now the task is waiting for the next run, which will happen on January 22.
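If the UI alone isn't enough, the same information can be read over Nexus 3's tasks REST API. A minimal sketch, assuming the v1 tasks endpoint with placeholder host and credentials; the field names (currentState, lastRunResult) are as I recall them, so verify against your instance:

    import requests

    NEXUS_URL = "http://nexus.example.com:8081"   # placeholder host
    AUTH = ("admin", "admin123")                  # placeholder credentials

    # GET /service/rest/v1/tasks lists scheduled tasks with their current state
    resp = requests.get(f"{NEXUS_URL}/service/rest/v1/tasks", auth=AUTH)
    resp.raise_for_status()

    for task in resp.json()["items"]:
        # currentState is e.g. WAITING or RUNNING; lastRunResult shows the last outcome
        print(task["name"], task["currentState"], task["lastRunResult"])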

Related

Airflow task improperly has an `upstream_failed` status after previous task succeeded after 1 retry

I have two tasks, A and B. Task A failed once, but the retry succeeded and it is marked as a success (green). I would expect Task B to run normally since Task A's retry succeeded, but it is marked as upstream_failed and was not triggered. Is there a way to fix this behavior?
Task B has an ALL_SUCCESS trigger rule.
I am using Airflow 2.0.2 on AWS (MWAA).
I have tried restarting the scheduler.
upstream_failed is set by the scheduler flow, or when dependencies are set to a failed state; you can check the states from Task Instances.
In retry mode:
Task A will be in the up_for_retry state until it exceeds its retries number.
If the trigger_rule is set to all_success (the default trigger rule), Task B will not trigger until Task A has finished, assuming everything is running correctly.
Could you add the DAG implementation?
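While the original DAG implementation isn't shown, here is a minimal sketch of the setup being described (DAG and task names are illustrative, not taken from the original DAG): Task A with one retry and Task B on the default all_success trigger rule. If A's retry ends in success, B should be scheduled; comparing the real DAG against something like this can help narrow down the difference.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def work_a():
        pass  # placeholder; raise an exception here to simulate the first failed attempt

    def work_b():
        pass  # placeholder downstream work

    with DAG(
        dag_id="retry_trigger_rule_example",      # illustrative name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@hourly",
        catchup=False,
    ) as dag:
        task_a = PythonOperator(
            task_id="task_a",
            python_callable=work_a,
            retries=1,                            # a successful retry marks A as success
            retry_delay=timedelta(minutes=5),
        )
        task_b = PythonOperator(
            task_id="task_b",
            python_callable=work_b,
            trigger_rule="all_success",           # default; B waits on A's final state
        )
        task_a >> task_b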

Airflow Dependencies Blocking Task From Getting Scheduled

I have an Airflow instance that had been running with no problems for two months until Sunday. There was a blackout in a system on which my Airflow tasks depend, and some tasks were queued for two days. After that we decided it was better to mark all the tasks for that day as failed and just lose that data.
Nevertheless, now all the new tasks get triggered at the proper time but they are never set to any state (neither queued nor running). I checked the logs and I see this output:
Dependencies Blocking Task From Getting Scheduled
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
The scheduler is down or under heavy load
The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
This task instance already ran and had its state changed manually (e.g. cleared in the UI)
I get the impression the third item is the reason why it is not working.
The scheduler and the webserver were working; I restarted the scheduler anyway and I am still getting the same outcome. I also deleted the data in the MySQL database for one job and it is still not running.
I also saw a couple of posts saying that a task will not run when depends_on_past is set to true and the previous runs failed, so the next one never gets executed. I checked that as well and it is not my case.
Any ideas? Any input would be really appreciated. Thanks.
While debugging a similar issue I found this setting: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE (see http://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#max-dagruns-per-loop-to-schedule). Looking at the Airflow code, the scheduler queries for DAG runs to examine (i.e. to consider running task instances for), and that query is limited to this number of rows (20 by default). So if you have more than 20 DAG runs that are blocked in some way (in our case because task instances were in up_for_retry), the scheduler won't consider the other DAG runs even though they could run fine.
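If that limit turns out to be the bottleneck, raising it is a one-line change. A sketch with an illustrative value (60), either as an environment variable or in airflow.cfg:

    # environment variable form
    AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE=60

    # airflow.cfg form
    [scheduler]
    max_dagruns_per_loop_to_schedule = 60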

task must be cleared before being run

I have a task that's scheduled to run hourly, however it's not being triggered. When I look at the Task Instance Details it says:
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable processes: parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count
- This task instance already ran and had its state changed manually (e.g. cleared in the UI)
If this task instance does not start soon please contact your Airflow administrator for assistance.
If I clear the task in the UI, I am able to execute it through the terminal, but it does not run when scheduled.
Why do I have to manually clear it after every run?

Scheduled tasks in Windows Server 2016 don't run after being disabled then enabled

I am having a problem with the Task Scheduler on Windows Server 2016 not running repetitive tasks after they are disabled and re-enabled.
I create a task with a “Triggers->Start” of five minutes in the future and set the task to run every five minutes. The “Settings->Run task as soon as possible after scheduled start is missed” option is checked.
Everything works fine – task runs at the scheduled Start time and then runs every five minutes.
Disable the task and wait more than five minutes.
Enable the task, it does not run again.
The Next Run time continues to update every five minutes but the task does not run and the Last Run Time is never updated.
There are no entries in the Task History once the task is re-enabled and no events in the Windows event logs.
The task was set to run every day and repeat either indefinitely or daily. I changed it to Run Once and repeat indefinitely, and it resumes running on the schedule once it is enabled again.
I believe this works because the scheduler shows that the task will "After triggered, run every x minutes indefinitely" even when the task is disabled.
The task stays in the "has been triggered" state while disabled, and so it resumes running once it is enabled.

Task with no status leads to DAG failure

I have a DAG that fetches data from Elasticsearch and ingests it into the data lake. The first task, BeginIngestion, fans out into several tasks (one for each resource), and each of those fans out into more tasks (one for each shard). After the shards are fetched, the data is uploaded to S3 and the flow converges into a task EndIngestion, followed by a task AuditIngestion.
It was executing correctly, but now all upstream tasks execute successfully while the "closing task" EndIngestion remains with no status. When I refresh the webserver's page, the DAG is marked as Failed.
This image shows successful upstream tasks, with the task end_ingestion with no status and the DAG marked as Failed.
I also dug into the task instance details and found
Dagrun Running: Task instance's dagrun was not in the 'running' state but in the state 'failed'.
Trigger Rule: Task's trigger rule 'all_success' requires all upstream tasks to have succeeded, but found 1 non-success(es). upstream_tasks_state={'failed': 0, 'upstream_failed': 0, 'skipped': 0, 'done': 49, 'successes': 49}, upstream_task_ids=['s3_finish_upload_ingestion_raichucrud_complain', 's3_finish_upload_ingestion_raichucrud_interaction', 's3_finish_upload_ingestion_raichucrud_company', 's3_finish_upload_ingestion_raichucrud_user', 's3_finish_upload_ingestion_raichucrud_privatecontactinteraction', 's3_finish_upload_ingestion_raichucrud_location', 's3_finish_upload_ingestion_raichucrud_companytoken', 's3_finish_upload_ingestion_raichucrud_indexevolution', 's3_finish_upload_ingestion_raichucrud_companyindex', 's3_finish_upload_ingestion_raichucrud_producttype', 's3_finish_upload_ingestion_raichucrud_categorycomplainsto', 's3_finish_upload_ingestion_raichucrud_companyresponsible', 's3_finish_upload_ingestion_raichucrud_category', 's3_finish_upload_ingestion_raichucrud_additionalfieldoption', 's3_finish_upload_ingestion_raichucrud_privatecontactconfiguration', 's3_finish_upload_ingestion_raichucrud_phone', 's3_finish_upload_ingestion_raichucrud_presence', 's3_finish_upload_ingestion_raichucrud_responsible', 's3_finish_upload_ingestion_raichucrud_store', 's3_finish_upload_ingestion_raichucrud_socialprofile', 's3_finish_upload_ingestion_raichucrud_product', 's3_finish_upload_ingestion_raichucrud_macrorankingpresenceto', 's3_finish_upload_ingestion_raichucrud_macroinfoto', 's3_finish_upload_ingestion_raichucrud_raphoneproblem', 's3_finish_upload_ingestion_raichucrud_macrocomplainsto', 's3_finish_upload_ingestion_raichucrud_testimony', 's3_finish_upload_ingestion_raichucrud_additionalfield', 's3_finish_upload_ingestion_raichucrud_companypageblockitem', 's3_finish_upload_ingestion_raichucrud_rachatconfiguration', 's3_finish_upload_ingestion_raichucrud_macrorankingitemto', 's3_finish_upload_ingestion_raichucrud_purchaseproduct', 's3_finish_upload_ingestion_raichucrud_rachatproblem', 's3_finish_upload_ingestion_raichucrud_role', 's3_finish_upload_ingestion_raichucrud_requestmoderation', 's3_finish_upload_ingestion_raichucrud_categoryproblemto', 's3_finish_upload_ingestion_raichucrud_companypageblock', 's3_finish_upload_ingestion_raichucrud_problemtype', 's3_finish_upload_ingestion_raichucrud_key', 's3_finish_upload_ingestion_raichucrud_macro', 's3_finish_upload_ingestion_raichucrud_url', 's3_finish_upload_ingestion_raichucrud_document', 's3_finish_upload_ingestion_raichucrud_transactionkey', 's3_finish_upload_ingestion_raichucrud_catprobitemcompany', 's3_finish_upload_ingestion_raichucrud_privatecontactinteraction', 's3_finish_upload_ingestion_raichucrud_categoryinfoto', 's3_finish_upload_ingestion_raichucrud_marketplace', 's3_finish_upload_ingestion_raichucrud_macroproblemto', 's3_finish_upload_ingestion_raichucrud_categoryrankingto', 's3_finish_upload_ingestion_raichucrud_macrorankingto', 's3_finish_upload_ingestion_raichucrud_categorypageto']
As you can see, the Trigger Rule field says that one of the tasks is in a non-successful state, but at the same time the stats show that all upstream tasks are marked as successful.
If I reset the database it doesn't happen, but I can't reset it before every (hourly) execution, and I don't want to.
Can anyone shed some light on this?
PS: I am running in an EC2 instance (c4.xlarge) with LocalExecutor.
[EDIT]
I found in the scheduler log that the DAG is in deadlock:
[2017-08-25 19:25:25,821] {models.py:4076} DagFileProcessor157 INFO - Deadlock; marking run failed
I guess this may be due to some exception handling.
I have had this exact issue before; in my case my code was generating duplicate task IDs. And it looks like in your case there is also a duplicate ID:
s3_finish_upload_ingestion_raichucrud_privatecontactinteraction
This is probably a year late for you, but hopefully this will save others, lots of debugging time :)
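If the task list is generated dynamically, a quick sanity check for duplicate IDs can catch this before the scheduler deadlocks. A sketch with a made-up resource list that reproduces the duplicate seen in the trigger-rule message above:

    from collections import Counter

    # hypothetical subset of the resources used to build the upload tasks;
    # "privatecontactinteraction" appears twice, as in the message above
    resources = ["complain", "interaction", "privatecontactinteraction",
                 "location", "privatecontactinteraction"]
    task_ids = [f"s3_finish_upload_ingestion_raichucrud_{r}" for r in resources]

    duplicates = [tid for tid, count in Counter(task_ids).items() if count > 1]
    if duplicates:
        raise ValueError(f"Duplicate task ids generated: {duplicates}")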
