In this example, I see that the EmrCreateJobFlowOperator receives the aws/emr connections that were set up in the Airflow UI:
cluster_creator = EmrCreateJobFlowOperator(
    task_id='create_job_flow',
    job_flow_overrides=JOB_FLOW_OVERRIDES,
    aws_conn_id='aws_default',
    emr_conn_id='emr_conn_id',
    dag=dag)
In the Airflow UI, under the Connections tab, how can I add my AWS credentials so the DAG can pick them up? I don't see any connection type for AWS. Any idea?
The aws_default connection picks up credentials from environment variables or from ~/.aws/credentials. You can choose, depending on your deployment mode, where you want to put the secret.
However, if you want to add a connection via the UI, you can go to Admin -> Connections and edit the keys there.
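As a minimal sketch of the environment-variable route: Airflow resolves a connection id from a variable named AIRFLOW_CONN_<CONN_ID> in the conn-type://login:password@ URI format. The variable must be visible to the scheduler and workers; the snippet below only illustrates the naming, and the access key and secret are placeholders.
import os

# Placeholder credentials; URL-encode special characters in the secret.
os.environ["AIRFLOW_CONN_AWS_DEFAULT"] = (
    "aws://AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMEXAMPLEKEY@"
)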
The Airflow UI comes in very handy to check the status and progress of Airflow DAGs.
The Airflow REST API is another way to check the status of Airflow DAGs, but it requires an authentication token.
We can get an authentication token from the Airflow UI, but if the Airflow UI is down it would be difficult to get the token and the DAG status.
Is there any other way to check/monitor and clear task instances from the backend (apart from the REST API and the Airflow UI)?
We have set up status checks on the Airflow health check endpoint, based on the Airflow doc page Checking Airflow Health Status. We have serverless functions running every 5 minutes to check that the statuses for the metadatabase and the scheduler are healthy.
When Airflow is down, you can get alerts routed directly to a Slack channel / email / Opsgenie through another code block defined in the serverless function.
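A minimal sketch of what such a check might do, using the /health endpoint described in that doc page and assuming the webserver is reachable at a placeholder URL (http://localhost:8080 below):
import requests

def airflow_is_healthy(base_url="http://localhost:8080"):
    # /health reports the status of the metadatabase and the scheduler.
    resp = requests.get(f"{base_url}/health", timeout=10)
    resp.raise_for_status()
    health = resp.json()
    return (
        health["metadatabase"]["status"] == "healthy"
        and health["scheduler"]["status"] == "healthy"
    )

if not airflow_is_healthy():
    # route the alert to Slack / email / Opsgenie here
    print("Airflow is unhealthy!")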
from airflow.models.dagrun import DagRun
from airflow.utils.state import DagRunState

# Query the metadata database directly for the runs of a given DAG
# and print the state of the most recent one.
dag_runs = DagRun.find(dag_id='the_dag_id_you_want_to_check')
last_run = dag_runs[-1]
print('the dag state is -->: ', last_run.state)
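If you also need a specific task instance from the backend, the DagRun object exposes it; a sketch, where the task id is a placeholder:
# Inspect one task instance of the last run.
ti = last_run.get_task_instance('the_task_id_you_want_to_check')
if ti is not None:
    print('the task state is -->: ', ti.state)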
We have added an SSH connection in Airflow connections; while testing it we get an error like:
'Hook SSHHook doesn't implement or inherit test_connection method'
The Test button in the UI works only with Hooks that implement test_connection. This means that you cannot test the connection from the UI and will have to create a DAG to test it.
In newer Airflow versions the button will be disabled for connections/hooks that don't support this functionality (See PR)
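A minimal sketch of such a test DAG, assuming the SSH provider is installed and the connection id is ssh_default (a placeholder for your own connection id):
from datetime import datetime

from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator

# A throwaway DAG that runs a trivial command over the SSH connection;
# if the task succeeds, the connection works.
with DAG(
    dag_id="test_ssh_connection",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # trigger manually
    catchup=False,
) as dag:
    SSHOperator(
        task_id="test_ssh",
        ssh_conn_id="ssh_default",  # placeholder connection id
        command="echo 'ssh connection works'",
    )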
I have set up Airflow on my local machine. I am trying to access the below Airflow link:
http://localhost:8080/api/experimental/test/
I am getting the Airflow 404 page (lots of circles).
I have tried to set auth_backend to default, but no luck.
What changes do I need to make in airflow.cfg to be able to make REST API calls to Airflow for triggering DAGs?
The experimental API is disabled by default in Airflow 2. It was used in 1.10, but it has been deprecated and is now disabled by default. Instead you should use the fully-fledged stable REST API, which uses a completely different URL scheme:
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
In the Airflow UI you can even browse and try out the API (just look at the menus of Airflow).
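For example, a minimal sketch of triggering a DAG through the stable REST API, assuming the basic_auth backend is enabled in the [api] section of airflow.cfg and the webserver runs on localhost:8080 (the dag id and credentials are placeholders):
import requests

# POST /api/v1/dags/{dag_id}/dagRuns creates a new DAG run.
# 'example_dag' and the admin/admin credentials are placeholders.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/example_dag/dagRuns",
    auth=("admin", "admin"),
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json())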
When you trigger an Airflow DAG either through the UI (see screenshot) or the API (https://airflow.apache.org/docs/stable/rest-api-ref.html), you have the option of submitting a JSON configuration. However, the usefulness of this isn't clearly documented as far as I can tell. I have two basic questions:
Is this intended for free-form configuration settings at the application level, or is this only for Airflow configuration variables?
If this is for free-form configuration settings, how (in my code) can I access whatever configuration was passed when the DAG was triggered?
Here is the screenshot where you can provide configuration when triggering a DAG:
Yes, it is intended for application-level configuration.
Example -
{"appConfig":"Test"}
To read it in your DAG:
def read_app_configuration(**kwargs):
    print("Read App Config - Task : Start")
    # The payload submitted when triggering the DAG is available as dag_run.conf
    dag_run = kwargs['dag_run']
    app_config = dag_run.conf.get('appConfig')
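To wire that callable into the DAG, a sketch with a PythonOperator (Airflow 2 import path shown; in Airflow 1.10 you would also pass provide_context=True so that dag_run appears in kwargs):
from airflow.operators.python import PythonOperator

read_config = PythonOperator(
    task_id='read_app_configuration',
    python_callable=read_app_configuration,
    dag=dag,
)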
I have a setup where I have an App Engine REST application and a Google Composer / Airflow DAG with a task that is supposed to fetch data from one of the endpoints of the app. The app is protected by IAP. I have added the service account under which Airflow runs to the "IAP-secured Web App User" list; however, each time the step executes, the response to the HTTP call is the Google Sign-In page. Any idea if any additional step is needed?
The code for my DAG step:
import requests

def get_data():
    r = requests.get(url="<url-to-my-app-endpoint>")
    print('stuff:')
    print(r.status_code)
    print(r.content)
    return 1
# ...
python_fetch_data = PythonOperator(
    task_id='python_fetch_data',
    python_callable=get_data,
    dag=dag,
    depends_on_past=True,
    priority_weight=2
)
https://cloud.google.com/iap/docs/authentication-howto#authenticating_from_a_service_account explains how to extend your DAG code so that it sends credentials to the IAP-protected API backend.
A bit of background: Since Composer is built on top of GCP, your Composer deployment has a unique service account identity that it's running as. You can add that service account to the IAP access list for your endpoint.
I don't know if the Composer UI makes it easy to see the "email" address for your service account, but if you add the code above and decode the token it generates, that will show it.
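A minimal sketch of that pattern, assuming the google-auth library is available in the Composer environment; IAP_CLIENT_ID (the OAuth client id of the IAP-protected app) and the endpoint URL are placeholders:
import requests
import google.auth.transport.requests
import google.oauth2.id_token

IAP_CLIENT_ID = "<your-iap-oauth-client-id>"  # placeholder
APP_URL = "<url-to-my-app-endpoint>"          # placeholder

def get_data():
    # Fetch an OIDC identity token for the service account Composer runs as,
    # using the IAP client id as the audience, and pass it as a Bearer token.
    auth_req = google.auth.transport.requests.Request()
    token = google.oauth2.id_token.fetch_id_token(auth_req, IAP_CLIENT_ID)
    r = requests.get(APP_URL, headers={"Authorization": f"Bearer {token}"})
    print(r.status_code)
    print(r.content)
    return 1
Decoding the payload of that token is also one way to see the service account email mentioned above.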