I want to trigger an Airflow DAG in the future so that the execution date is tomorrow.
This will help us test a file with tomorrow's date.
When I click the execute button I see an option "Trigger DAG w/ config", but I could not find any documentation about it.
Go to "trigger with config" and change the date there, next to the calendar icon. Leave the config editor as is.
The Airflow UI doesn't let you specify an execution date; it always triggers "right now". However, the REST API and CLI do allow you to specify an execution date.
CLI (docs):
airflow dags trigger -e/--exec-date EXECUTION_DATE DAG_ID
# For example:
airflow dags trigger -e 2022-04-05 mydag
REST API (docs):
curl -X 'POST' \
'http://localhost:8080/api/v1/dags/mydag/dagRuns' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"logical_date": "2022-04-05T00:00:00Z"
}'
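If you'd rather script this than use curl, the request body for the stable API can be built in Python. A minimal sketch, assuming the Airflow 2.x endpoint /api/v1/dags/{dag_id}/dagRuns shown above; actually sending it with your HTTP client of choice is omitted:

```python
import json

def build_dagrun_request(dag_id, logical_date, conf=None):
    # Build the path and JSON body for triggering a DAG run with a
    # specific logical (execution) date via the stable REST API.
    path = f"/api/v1/dags/{dag_id}/dagRuns"
    body = {"logical_date": logical_date}  # ISO-8601 UTC, e.g. "...T00:00:00Z"
    if conf:
        body["conf"] = conf  # optional run-level configuration
    return path, json.dumps(body)

path, body = build_dagrun_request("mydag", "2022-04-05T00:00:00Z")
```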
Related
I have been working with Airflow for a while with no problems with the scheduler, but now I have run into one.
Basically I have a script and DAG ready for a task, but the task doesn't run periodically. Instead it needs to be activated at random times. (External parties will tell us it's time and we will run it. This may happen many times in the following months.)
Is there any way to trigger the DAG manually? Any other directions/suggestions are welcome as well.
Thanks.
You have a number of options here:
UI: Click the "Trigger DAG" button, either on the main DAGs page or on a specific DAG's page.
CLI: Run airflow trigger_dag <dag_id>, see docs at https://airflow.apache.org/docs/stable/cli.html#trigger_dag. Note that later versions of Airflow use the syntax airflow dags trigger <dag_id>.
API: Call POST /api/experimental/dags/<dag_id>/dag_runs, see docs in https://airflow.apache.org/docs/stable/api.html#post--api-experimental-dags--DAG_ID--dag_runs.
Operator: Use the TriggerDagRunOperator, see docs in https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/operators/trigger_dagrun/index.html#airflow.operators.trigger_dagrun.TriggerDagRunOperator and an example in https://github.com/apache/airflow/blob/master/airflow/example_dags/example_trigger_controller_dag.py.
You'll probably go with the UI or CLI if this is truly going to be 100% manual. The API or Operator would be options if you are looking to let the external party indirectly trigger it themselves. Remember to set schedule_interval=None on the DAG.
So a DAG can be triggered in the following ways:
Using the REST API (see documentation)
Endpoint:
> POST /api/experimental/dags/<DAG_ID>/dag_runs
Using curl:
curl -X POST \
http://localhost:8080/api/experimental/dags/<DAG_ID>/dag_runs \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{"conf": "{\"key\": \"value\"}"}'
Using Python requests:
import json, requests
url = 'http://localhost:8080/api/experimental/dags/<DAG_ID>/dag_runs'
headers = {'Content-Type': 'application/json'}
data = json.dumps({'conf': json.dumps({'key': 'value'})})
response = requests.post(url, data=data, headers=headers)
Using the "Trigger DAG" option in the UI, as mentioned by @Daniel
Airflow has a REST API. The method you need is POST /api/experimental/dags/<DAG_ID>/dag_runs. With this method you can also pass config params for the DAG run.
We use Jenkins to trigger DAGs manually. If you are using Jenkins you could check our Jenkins pipeline library.
The examples given in the other answers use the "experimental" API. This REST API is deprecated since version 2.0. Please consider using the stable REST API instead.
Use the method POST /api/v1/dags/<DAG_ID>/dagRuns.
curl -X POST 'http://localhost:8080/api/v1/dags/<DAG_ID>/dagRuns' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--user '<user>:<password>' \
--data '{}'
(The --user option is only needed if authentication is enabled.)
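If you call this endpoint from a script instead of curl, the header that --user generates is plain HTTP Basic auth; a small sketch in Python (the credentials are placeholders):

```python
import base64

def basic_auth_header(user, password):
    # HTTP Basic auth: base64-encode "user:password" (RFC 7617),
    # which is what curl's --user option does under the hood.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

headers = basic_auth_header("user", "pass")
```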
According to https://airflow.apache.org/api.html I can trigger an Airflow DAG like so:
curl -X POST \
http://localhost:8080/api/experimental/dags/<DAG_ID>/dag_runs \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{"conf":"{\"key\":\"value\"}"}'
This seems to work ok for me, but I cannot figure out how to access the key/value stuff in the conf object being passed in.
I tried this:
something = dag.params.get("key", "unknown")
But it doesn't seem to work.
Does anyone know how to do this?
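For what it's worth, in Airflow the config passed at trigger time lives on the DagRun, not on dag.params, so inside a task it is usually read from the dag_run object in the task context. A minimal sketch of such a task callable (the surrounding DAG and operator wiring are omitted, and context passing differs slightly between Airflow 1.x and 2.x):

```python
def read_conf(**context):
    # The conf dict sent with the trigger request is attached to the
    # DagRun object, available as "dag_run" in the task context.
    dag_run = context.get("dag_run")
    conf = dag_run.conf if dag_run is not None and dag_run.conf else {}
    return conf.get("key", "unknown")
```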
I have some R and Python scripts in CDSW ("Cloudera Data Science Workbench"). I created a shell script to run them with curl -v -XPOST.
How can I get the status of a job from the CDSW API?
Hi, it's been a while since this question was posted, but hopefully the answer can still be useful to someone :)
After you run:
curl -v -XPOST http://cdsw.example.com/api/v1/projects/<$USERNAME>/<$PROJECT_NAME>/jobs/<$JOB_ID>/start --user "API_KEY:" --header "Content-type: application/json"
You should be able to see in the output a URL that looks like this:
http://cdsw.example.com/api/v1/projects/<$USERNAME>/<$PROJECT_NAME>/dashboards/<$ID>
You can then use that URL to retrieve the job status, for example by piping the output through jq (or without it, so you also see the status alongside the other returned fields):
curl -v http://cdsw.example.com/api/v1/projects/<$USERNAME>/<$PROJECT_NAME>/dashboards/<$ID> --user "API_KEY:" | jq '.status'
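If you'd rather poll from a script than with jq, the same extraction is a single json call in Python (fetching the dashboard URL with your API key is the same HTTP request as above and is omitted here):

```python
import json

def job_status(dashboard_response_body):
    # Equivalent of piping the dashboards response through `jq '.status'`
    return json.loads(dashboard_response_body).get("status")
```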
Nginx doesn't have native log rotation, so an external tool such as logrotate is required. Nginx presents a challenge in that the logs have to be reopened after rotation. You can send it a USR1 signal if the pid is available in /var/run.
But when running in a Docker container, the pid file is missing from /var/run (and the pid actually belongs to the host, since nginx is technically a host process).
If you don't reopen the logs, nginx doesn't log anything at all, though it otherwise continues to function as a web server, reverse proxy, etc.
You can get the process id from the Pid attribute using docker inspect and use kill -USR1 {pid} to have nginx reopen the logs.
Here's the /etc/logrotate.d/nginx file I created:
/var/log/nginx/access.log
{
size 2M
rotate 10
missingok
notifempty
compress
delaycompress
postrotate
docker inspect -f '{{ .State.Pid }}' nginx | xargs kill -USR1
endscript
}
If you want to run logrotate in a dedicated container (e.g. to rotate both the nginx logs and Rails' file log) rather than on the host machine, here's how I did it. The trickiest part by far was, as above, getting the reload signals to nginx, Rails, etc. so that they would create and log to fresh logfiles post-rotation.
Summary:
put all the logs on a single shared volume
export docker socket to the logrotate container
build a logrotate image with logrotate, cron, curl, and jq
build logrotate.conf with postrotate calls using docker exec API as detailed below
schedule logrotate using cron in the container
The hard part:
To get nginx (et cetera) to reload and thus connect to fresh log files, I sent exec commands to the other containers using Docker's API via its socket. It expects a POST with the command in JSON format, to which it responds with an exec instance ID. You then need to explicitly start that instance.
An example postrotate section from my logrotate.conf file:
postrotate
exec_id=`curl -X POST --unix-socket /var/run/docker.sock \
-H "Content-Type: application/json" \
-d '{"cmd": ["nginx", "-s", "reopen"]}' \
http:/v1.41/containers/hofg_nginx_1/exec \
| jq -r '.Id'`
curl -X POST --unix-socket /var/run/docker.sock \
-H "Content-Type: application/json" \
-d '{"Detach": true}' \
http:/v1.41/exec/"$exec_id"/start
endscript
Commentary on the hard part:
exec_id=`curl -X POST --unix-socket /var/run/docker.sock \
This is the first of two calls to curl, saving the result into a variable to use in the second. Also don't forget to (insecurely) mount the socket into the container, '/var/run/docker.sock:/var/run/docker.sock'
-H "Content-Type: application/json" \
-d '{"cmd": ["nginx", "-s", "reopen"]}' \
Docker's API docs say the command can be a string or array of strings, but it only worked for me as an array of strings. I used the nginx command line tool, but something like 'kill -SIGUSR1 $(cat /var/run/nginx.pid)' would probably work too.
http:/v1.41/containers/hofg_nginx_1/exec \
I hard-coded the container name; if you're dealing with something more complicated, you're probably also using a fancier logging service.
| jq -r '.Id'`
The response is JSON-formatted, I used jq to extract the id (excuse me, 'Id') to use next.
curl -X POST --unix-socket /var/run/docker.sock \
-H "Content-Type: application/json" \
-d '{"Detach": true}' \
The Detach: true is probably not necessary; it was just a placeholder for POST data that was handy while debugging.
http:/v1.41/exec/"$exec_id"/start
Making use of the exec instance ID returned by the first curl to actually run the command.
I'm sure it will evolve (say with error handling), but this should be a good starting point.
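For reference, the two-step create/start exchange can also be sketched in Python. The snippet below only builds the API paths and JSON bodies; actually sending them through /var/run/docker.sock needs a unix-socket-capable HTTP client, which is omitted. The container name and API version are taken from the example above, and "Cmd" is the capitalization used in Docker's API docs, though the lowercase "cmd" above evidently worked too:

```python
import json

DOCKER_API = "/v1.41"  # API version from the example above

def exec_create(container, cmd):
    # Step 1: POST /containers/{name}/exec -> responds with {"Id": "..."}
    return f"{DOCKER_API}/containers/{container}/exec", json.dumps({"Cmd": cmd})

def exec_start(exec_id):
    # Step 2: POST /exec/{id}/start actually runs the created instance
    return f"{DOCKER_API}/exec/{exec_id}/start", json.dumps({"Detach": True})

create_path, create_body = exec_create("hofg_nginx_1", ["nginx", "-s", "reopen"])
```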
I want to add a new post to a WordPress site and set a publishing date in the future, using the WordPress 4.7 REST API with bash, curl and the JSON tool jq on the client side. I have added the basic-auth plugin to WordPress so that I can use this type of authentication.
I initially upload the posting like this:
curl -s --location --basic --user 'name:password' \
--url "https://<server>/wp-json/wp/v2/posts" \
-d "title=Testpost" -d "content=This is a new post." | \
jq -r '"id: \(.id), date: \(.date), status: \(.status), title: \(.title.raw)"'
> id: 45, date: 2017-02-07T10:10:31, status: draft, title: Testpost
Ok, this is as expected. Let's see whether changing the post works:
curl -s --location --basic --user 'name:password' \
--url "https://<server>/wp-json/wp/v2/posts/45" \
-d "title=Other title" | \
jq -r '"date: \(.date), status: \(.status), title: \(.title.raw)"'
> date: 2017-02-07T10:20:32, status: draft, title: Other title
Ok, works.
Now I want to publish the post at 3 PM ("curl" parameters shrunk as they do not change):
curl ... "/wp-json/wp/v2/posts/45" \
-d "date=2017-02-07T15:00:00" | \
jq -r '"id: \(.id), date: \(.date), status: \(.status)"'
> id: 45, date: 2017-02-07T10:29:32, status: draft
Nope. Date changed to the moment the request was issued, not the requested publishing date. Ok, perhaps status "pending"?
curl ... "/wp-json/wp/v2/posts/45" \
-d "status=pending" -d "date=2017-02-07T15:00:00" | \
jq -r '"id: \(.id), date: \(.date), status: \(.status)"'
> id: 45, date: 2017-02-07T10:36:24, status: pending
Well, state is set to "pending", but date is still wrong. Perhaps explicitly setting the state to "future"?
curl ... "/wp-json/wp/v2/posts/45" \
-d "status=future" -d "date=2017-02-07T15:00:00" | \
jq -r '"id: \(.id), date: \(.date), status: \(.status)"'
> id: 45, date: 2017-02-07T10:36:48, status: publish
WHAT??? What is happening here? Now, the post is published with the current date. Precisely what I did not want to have.
But if I now, with the published post, reissue the same request
curl ... "/wp-json/wp/v2/posts/45" \
-d "status=future" -d "date=2017-02-07T15:00:00" | \
jq -r '"id: \(.id), date: \(.date), status: \(.status)"'
> id: 45, date: 2017-02-07T16:00:00, status: future
...it does the job - at least to some extent. Somehow, the time is interpreted as UTC, but finally the post is correctly set into "future" state.
Question 1: How do I get the post into this state without having it published in the first place?
Question 2: Why is the date interpreted as UTC? Wouldn't that be "date_gmt"?
What am I missing here?
Ok, I think I got the status stuff (question 1).
Solution:
You can change the publishing date to a point in the future by temporarily setting the post status to private. The following two commands change the draft's publish date to some future point in time without premature publishing:
curl ... "/wp/v2/posts/45" -d "status=private"
curl ... "/wp/v2/posts/45" -d "status=future" -d "date=2017-02-07T15:00:00"
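The same two-step workaround, sketched with Python's standard library; the base URL and post ID are placeholders, and attaching the Basic auth credentials and actually sending the requests work the same way as in the curl calls above:

```python
import urllib.parse
import urllib.request

BASE = "https://example.com/wp-json/wp/v2"  # placeholder site URL

def update_post(post_id, fields):
    # POST form-encoded fields to the posts endpoint, like curl's -d
    data = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(f"{BASE}/posts/{post_id}", data=data, method="POST")

# Step 1: leave the draft/pending statuses so the core won't clear the date
step1 = update_post(45, {"status": "private"})
# Step 2: now the future date sticks
step2 = update_post(45, {"status": "future", "date": "2017-02-07T15:00:00"})
```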
Explanation:
Ok, and why is that? It's a feature of the WordPress core which prevents the intended operation. Let's dig through it:
The REST API is implemented below wp-includes/rest-api/endpoints. For handling posts, the code is class-wp-rest-posts-controller.php. Updating a post is handled by update_item() beginning in line 670 (in WP 4.7.2).
That function does not do that much. Mainly, it calls wp_update_post() in the WordPress core. That method is implemented in wp-includes/post.php, starting in line 3534 (sic!).
Not far after the method start we find the troublesome lines of code:
// Drafts shouldn't be assigned a date unless explicitly done so by the user.
if ( isset( $post['post_status'] ) &&
in_array($post['post_status'], array('draft', 'pending', 'auto-draft')) &&
empty($postarr['edit_date']) &&
('0000-00-00 00:00:00' == $post['post_date_gmt']) )
$clear_date = true;
else
$clear_date = false;
And this is the problem: all data checked here is the already-stored information of the post. Even if I transfer a new status future with my update request through the REST API, it is not evaluated at this place; $clear_date is set solely from the information in the database. And as our post was inserted as a draft, and all the other conditions match too, it will always be true, which leads the method to drop all updates to the date fields a few lines further down. So there is no way to change the publish date of the post as long as the post's status is one of draft, pending or auto-draft: the core simply overwrites all intended changes to the publishing date with what it feels is "right".
The solution, as written at the beginning of this reply, is to change the post's status temporarily to private. That status neither triggers any publishing actions nor is it on the "special handling list" of wp_update_post(). Therefore we can change the date in a second step - and then also update the status so that the post will be published.
I feel that this is, after all, a bug. In my opinion, the critical part of wp_update_post() should take the new post status of an update into consideration and leave the new date untouched if the new status is publish or future.