How to clear failing DAGs using the CLI in Airflow


I have some failing DAGs, let's say from 1st-Feb to 20th-Feb. From that date onward, all of them succeeded.
I tried to use the cli (instead of doing it twenty times with the Web UI):
airflow clear -f -t * my_dags.my_dag_id
But I have a weird error:
airflow: error: unrecognized arguments: airflow-webserver.pid airflow.cfg airflow_variables.json my_dags.my_dag_id
EDIT 1:
As @tobi6 explained, the * was indeed causing trouble.
Knowing that, I tried this command instead:
airflow clear -u -d -f -t ".*" my_dags.my_dag_id
But it only returns failed task instances (-f flag). The -d and -u flags don't seem to work, because task instances downstream and upstream of the failed ones are ignored (not returned).
EDIT 2:
As @tobi6 suggested, using -s and -e makes it possible to select all DAG runs within a date range. Here is the command:
airflow clear -s "2018-04-01 00:00:00" -e "2018-04-01 00:00:00" my_dags.my_dag_id
However, adding the -f flag to the command above only returns failed task instances. Is it possible to select all failed task instances of all failed DAG runs within a date range?

If you use an asterisk * in Bash on Linux, the shell automatically expands it to the contents of the directory.
That is, it replaces the asterisk with the names of all files in the current working directory and then executes your command.
Quoting the pattern avoids the automatic expansion (-t takes a regular expression, so ".*" matches every task):
airflow clear -f -t ".*" my_dags.my_dag_id
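The expansion can be reproduced without Airflow. This is a minimal sketch, assuming a POSIX-like shell with mktemp available; the scratch directory and its file names are made up to mimic the files listed in the error message, and the quoted case is included only for contrast.

```shell
# Work in a scratch directory containing three files (illustrative names).
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch airflow-webserver.pid airflow.cfg airflow_variables.json

# Unquoted: the shell replaces * with the file names before the command runs.
set -- *
unquoted_count=$#

# Quoted: the pattern reaches the command as one literal argument.
set -- ".*"
quoted_count=$#
quoted_value=$1

echo "unquoted * became $unquoted_count arguments"
echo "quoted pattern stayed as $quoted_count argument: $quoted_value"

# Clean up and return to where we started.
cd "$start_dir" || exit 1
rm -rf "$workdir"
```

The three extra arguments in the unquoted case are what airflow reported as "unrecognized arguments".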

One solution I've found so far is to execute SQL (MySQL in my case):
update task_instance t left join dag_run d on d.dag_id = t.dag_id and d.execution_date = t.execution_date
set t.state=null,
d.state='running'
where t.dag_id = '<your_dag_id>'
and t.execution_date > '2020-08-07 23:00:00'
and d.state='failed';
It clears all task states on failed dag_runs, as if the 'Clear' button had been pressed for the entire DAG run in the web UI.

In Airflow 2.2.4 the old airflow clear command is no longer available (the Airflow 2 CLI moved it under the tasks group).
You could now run:
airflow tasks clear -s <your_start_date> -e <end_date> <dag_id>

Related

AWS Code Deploy - Script at specified location: scripts/validate_service.sh failed with exit code 1

My deployments fail on the last step, Validate Service, with the error message:
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.
Events log
No lines are selected.
My validate_service.sh contains:
#!/bin/bash
# verify we can access our webpage successfully
curl -v --silent localhost:80 2>&1 | grep Welcome
Can someone advise what I should change?

The script's return value matters. Yours looks good to me. I just added a few seconds of waiting so the application can start up.
If you use bash -x together with a pipeline of commands, you'd better add set -o pipefail so that the whole pipeline fails when one of the commands fails.
Check out my script:
#!/bin/bash
sleep 5
curl http://localhost:3009 | grep Welcome
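The effect of pipefail on a script's return value can be seen in a small sketch. It assumes bash is installed; false stands in for a failing first stage (such as a curl that cannot connect) and cat stands in for a later stage that succeeds.

```shell
# Each variant runs in its own bash process so the option does not leak
# into the calling shell.

# Default behaviour: the pipeline's status is that of the last command (cat).
status_default=$(bash -c 'false | cat; echo $?')

# With pipefail: the status is that of the last command that failed (false).
status_pipefail=$(bash -c 'set -o pipefail; false | cat; echo $?')

echo "without pipefail: $status_default"
echo "with pipefail:    $status_pipefail"
```

Without the option the failure of the first stage is hidden (status 0); with it the pipeline reports status 1.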

How to prevent a step failing in Bitbucket Pipelines?

I am running all my test cases, and some of them sometimes fail. The pipeline detects this and fails the step and the build, which blocks the next step (zipping the report folder) from executing. I want to send that zip file as an email attachment.
Here is my bitbucket-pipelines.yml file
custom: # Pipelines that can only be triggered manually
  QA2: # The name that is displayed in the list in the Bitbucket Cloud GUI
    - step:
        image: openjdk:8
        caches:
          - gradle
        size: 2x # double resources available for this step to 8G
        script:
          - apt-get update
          - apt-get install zip
          - cd config/geb
          - ./gradlew -DBASE_URL=qa2 clean BSchrome_win # This step fails
          - cd build/reports
          - zip -r testresult.zip BSchrome_winTest
        after-script: # On test execution completion or build failure, send test report to e-mail lists
          - pipe: atlassian/email-notify:0.3.11
            variables:
              <<: *email-notify-config
              TO: 'email@email.com'
              SUBJECT: "Test result for QA2 environment"
              BODY_PLAIN: |
                Please find the attached test result report to the email.
              ATTACHMENTS: config/geb/build/reports/testresult.zip
The steps:
- cd build/reports
and
- zip -r testresult.zip BSchrome_winTest
do not get executed because - ./gradlew -DBASE_URL=qa2 clean BSchrome_win failed
I don't want Bitbucket to fail the step and stop the queued steps from executing.

The bitbucket-pipelines.yml file is just running bash/shell commands on Unix. The script runner looks for the return status codes of each command, to see if it succeeded (status = 0) or failed (status = non-zero). So you can use various techniques to control this status code:
Add " || true" to the end of your command
./gradlew -DBASE_URL=qa2 clean BSchrome_win || true
When you add "|| true" to the end of a shell command, it means "ignore any errors, and always return a success code 0". More info:
Bash ignoring error for a particular command
https://www.cyberciti.biz/faq/bash-get-exit-code-of-command/
Use "gradlew --continue" flag
./gradlew -DBASE_URL=qa2 clean BSchrome_win --continue
The "--continue" flag can be used to prevent a single test failure from stopping the whole task. So if one test or sub-step fails, gradle will try to continue running the other tests until all are run. However, it may still return an error, if an important step failed. More info: Ignore Gradle Build Failure and continue build script?
Move the 2 steps to the after-script section
after-script:
  - cd config/geb # You may need this, if the current working directory is reset. Check with 'pwd'
  - cd build/reports
  - zip -r testresult.zip BSchrome_winTest
If you move the 2 steps for zip creation to the after-script section, then they will always run, regardless of the success/fail status of the previous step.
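The first technique above ("|| true") can be tried outside a pipeline. This is a minimal sketch; failing_command is a hypothetical stand-in for the Gradle invocation, and its exit status 3 is arbitrary.

```shell
# A stand-in for a failing build command (hypothetical; replace with your own).
failing_command() {
  return 3
}

# Capture the command's own status without letting it abort this script.
plain_status=0
failing_command || plain_status=$?

# With the guard, the overall status of the line is 0 even though the command failed.
failing_command || true
guarded_status=$?

echo "plain: $plain_status, guarded: $guarded_status"
```

The script runner only sees the status of the whole line, so the guarded form looks like a success and the following commands still run.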

A better solution
If you want all the commands in your script to execute regardless of errors then put set +e at the top of your script.
If you just want to ignore the error for one particular command then put set +e before that command and set -e after it.
Example:
- set +e
- ./gradlew -DBASE_URL=qa2 clean BSchrome_win # This step fails
- set -e
Also valid for group of commands:
- set +e
- cd config/geb
- ./gradlew -DBASE_URL=qa2 clean BSchrome_win # This step fails
- cd config/geb
- set -e
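The set +e / set -e bracket can be checked locally with a short sketch. It assumes the surrounding script runs in "exit on error" mode (set -e), which is the strict behaviour this answer implies; sh -c 'exit 3' is a stand-in for the failing test run.

```shell
# The body runs in a subshell so the errexit changes stay local.
report=$(
  set -e            # strict mode: abort on the first failing command
  set +e            # temporarily tolerate failures
  sh -c 'exit 3'    # stand-in for the failing test run
  test_status=$?    # remember the real status for later use
  set -e            # restore strict mode
  echo "tests exited with $test_status, continuing"
)
echo "$report"
```

Saving $? right after the failing command keeps the real result available, for example to fail the step deliberately once the report has been zipped.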

I had a similar problem: a command that normally takes 1 minute sometimes stalls and hits the 2-hour max build timeout (and corrupts my Cypress installation).
I wrapped my command with the timeout command and then ORed the result with true.
E.g. I changed this:
- yarn
to this:
- timeout 5m yarn || yarn cypress install --force || true # Sometimes this stalls, so kill it if it takes more than 5m and reinstall cypress
- timeout 5m yarn # Try again (in case it failed on previous line). Should be quick
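The timeout wrapper can be tried in isolation. This is a small sketch, assuming a timeout command such as the one from GNU coreutils is installed; sleep 5 stands in for the command that stalls, and the 1-second limit is only there to keep the demonstration short.

```shell
# sleep 5 stands in for a stalling command; the limit of 1 second is illustrative.
if timeout 1 sleep 5; then
  outcome="finished in time"
else
  outcome="timed out or failed"
fi
echo "$outcome"
```

GNU coreutils timeout reports a timeout with exit status 124, which a script can check if it needs to tell a stall apart from an ordinary failure.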

How to set timers for console commands in symfony 2.x?

In my app I've got one task that needs to be done every 48 hours on the server side. I've created a console command in order to automate the job. However, I don't know how to set a timer that keeps invoking that command. Can you point me to a way to do that?

You should look into cron.
Cron will run your command at a fixed frequency.
To create a cron job (on Unix), use: crontab -e
For example
0 0 */2 * * bin/console app:command >/dev/null 2>&1
will run bin/console app:command at midnight on every odd day of the month.
To help you generate a cron expression:
https://crontab-generator.org/
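To see which days the */2 day-of-month field selects, the step rule can be spelled out by hand. This is a rough sketch of how the field is interpreted (start at day 1, step by 2), not a call into cron itself; the 31-day month is an assumption.

```shell
# Enumerate the days of a 31-day month matched by a day-of-month field of */2.
matched_days=""
day=1
while [ "$day" -le 31 ]; do
  matched_days="$matched_days $day"
  day=$((day + 2))
done
matched_days=${matched_days# }   # drop the leading space
echo "$matched_days"
```

Because the 31st is followed by the 1st of the next month, the job runs on two consecutive days at that boundary, so the interval is not always exactly 48 hours.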

Airflow will keep showing example dags even after removing it from configuration

Airflow example DAGs remain in the UI even after I set load_examples = False in the config file.
The system reports that the DAGs are not present in the DAG folder, but they remain in the UI because the scheduler has marked them as active in the metadata database.
I know one way to remove them would be to delete these rows directly in the database, but of course this is not ideal. How should I proceed to remove these DAGs from the UI?

There is currently no way of stopping a deleted DAG from being displayed on the UI except manually deleting the corresponding rows in the DB. The only other way is to restart the server after an initdb.

Airflow 1.10+:
Edit airflow.cfg and set load_examples = False
For each example dag run the command airflow delete_dag example_dag_to_delete
This avoids resetting the entire airflow db.
(Since Airflow 1.10 there is the command to delete dag from database, see this answer )

Assuming you have installed airflow through Anaconda.
Else look for airflow in your python site-packages folder and follow the below.
After you follow the instructions https://stackoverflow.com/a/43414326/1823570
Go to $AIRFLOW_HOME/lib/python2.7/site-packages/airflow directory
Remove the directory named example_dags, or just rename it so that you can revert later
Restart your webserver
cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9
airflow webserver -p [port-number]

Definitely airflow resetdb works here.
What I do is to create multiple shell scripts for various purposes like start webserver, start scheduler, refresh dag, etc. I only need to run the script to do what I want. Here is the list:
(venv) (base) [pchoix@hadoop02 airflow]$ cat refresh_airflow_dags.sh
#!/bin/bash
cd ~
source venv/bin/activate
airflow resetdb
(venv) (base) [pchoix@hadoop02 airflow]$ cat start_airflow_scheduler.sh
#!/bin/bash
cd /home/pchoix
source venv/bin/activate
cd airflow
nohup airflow scheduler >> "logs/schd/$(date +'%Y%m%d%I%M%p').log" &
(venv) (base) [pchoix@hadoop02 airflow]$ cat start_airflow_webserver.sh
#!/bin/bash
cd /home/pchoix
source venv/bin/activate
cd airflow
nohup airflow webserver >> "logs/web/$(date +'%Y%m%d%I%M%p').log" &
(venv) (base) [pchoix@hadoop02 airflow]$ cat start_airflow.sh
#!/bin/bash
cd /home/pchoix
source venv/bin/activate
cd airflow
nohup airflow webserver >> "logs/web/$(date +'%Y%m%d%I%M%p').log" &
nohup airflow scheduler >> "logs/schd/$(date +'%Y%m%d%I%M%p').log" &
Don't forget to chmod +x those scripts.
I hope this helps.

Run a service automatically in a docker container

I'm setting up a simple image: one that holds Riak (a NoSQL database). The image starts the Riak service with riak start as a CMD. Now, if I run it as a daemon with docker run -d quintenk/riak-dev, it does start the Riak process (I can see that in the logs). However, it closes automatically after a few seconds. If I run it using docker run -i -t quintenk/riak-dev /bin/bash the riak process is not started (UPDATE: see answers for an explanation for this). In fact, no services are running at all. I can start it manually using the terminal, but I would like Riak to start automatically. I figure this behavior would occur for other services as well, Riak is just an example.
So, running/restarting the container should automatically start Riak. What is the correct approach of setting this up?
For reference, here is the Dockerfile with which the image can be created (UPDATE: altered using the chosen answer):
FROM ubuntu:12.04
RUN apt-get update
RUN apt-get install -y openssh-server curl
RUN curl http://apt.basho.com/gpg/basho.apt.key | apt-key add -
RUN bash -c "echo deb http://apt.basho.com precise main > /etc/apt/sources.list.d/basho.list"
RUN apt-get update
RUN apt-get -y install riak
RUN perl -p -i -e 's/(?<=\{http,\s\[\s\{")127\.0\.0\.1/0.0.0.0/g' /etc/riak/app.config
EXPOSE 8098
CMD /bin/riak start && tail -F /var/log/riak/erlang.log.1
EDIT: -f changed to -F in CMD, in accordance with sesm's remark
MY OWN ANSWER
After working with Docker for some time, I picked up the habit of using supervisord to run my processes. If you would like example code for that, check out https://github.com/Krijger/docker-cookbooks. I use my supervisor image as a base for all my other images. I blogged on using supervisor here.

To keep docker containers running, you need to keep a process active in the foreground.
So you could probably replace that last line in your Dockerfile with
CMD /bin/riak console
Or even
CMD /bin/riak start && tail -F /var/log/riak/erlang.log.1
Note that you can't have multiple lines of CMD statements, only the last one gets run.

Using tail to keep the container alive is a hack. Also note that with the -f option the container will terminate when log rotation happens (this can be avoided by using -F instead).
A better solution is to use supervisor. Take a look at this tutorial about running Riak in a Docker container.

The explanation for:
If I run it using docker run -i -t quintenk/riak-dev /bin/bash the riak process is not started
is as follows. Using CMD in the Dockerfile is actually the same functionality as starting the container using docker run {image} {command}. As Gigablah remarked only the last CMD is used, so the one written in the Dockerfile is overwritten in this case.
By using CMD /bin/riak start && tail -f /var/log/riak/erlang.log.1 in the Dockerfile, you can start the container as a background process using docker run -d {image}, which works like a charm.

"If I run it using docker run -i -t quintenk/riak-dev /bin/bash the riak process is not started"
It sounds like you only want to be able to monitor the log when you attach to the container. My use case is a little different in that I want commands started automatically, but I want to be able to attach to the container and be in a bash shell. I was able to solve both of our problems as follows:
In the image/container, add the commands you want automatically started to the end of the /etc/bash.bashrc file.
In your case just add the line /bin/riak start && tail -F /var/log/riak/erlang.log.1, or put /bin/riak start and tail -F /var/log/riak/erlang.log.1 on separate lines depending on the functionality desired.
Now commit your changes to your container, and run it again with: docker run -i -t quintenk/riak-dev /bin/bash. You'll find the commands you put in the bashrc are already running as you attach.

Because I want a clean way to have the process exit later I make the last command a call to the shell's read which causes that process to block until I later attach to it and hit enter.
arthur@macro:~/docker$ sudo docker run -d -t -i -v /raid:/raid -p 4040:4040 subsonic /bin/bash -c 'service subsonic start && read -p "waiting"'
WARNING: Docker detected local DNS server on resolv.conf. Using default external servers: [8.8.8.8 8.8.4.4]
f27229a260c9
arthur@macro:~/docker$ sudo docker ps
[sudo] password for arthur:
ID IMAGE COMMAND CREATED STATUS PORTS
35f253bdf45a subsonic:latest /bin/bash -c service 2 days ago Up 2 days 4040->4040
arthur@macro:~/docker$ sudo docker attach 35f253bdf45a
arthur@macro:~/docker$ sudo docker ps
ID IMAGE COMMAND CREATED STATUS PORTS
As you can see, the container exits after you attach to it and unblock the read.
You can of course use a more sophisticated script than read -p if you need to do other clean up, such as stopping services and saving logs etc.

I use a simple trick whenever I start building a new docker container. To keep it alive, I use a ping in the entrypoint script.
So in the Dockerfile, when using debian, for instance, I make sure I can ping.
This is btw, always nice, to check what is accessible from within the container.
...
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
&& apt-get install -y iputils-ping
...
ENTRYPOINT ["entrypoint.sh"]
And in the entrypoint.sh file
#!/bin/bash
...
ping 10.10.0.1 >/dev/null 2>/dev/null
I use this instead of CMD bash, as I always wind up using a startup file.
